docs(search): append scraped API records to algolia index in CI #9366

Merged (1 commit) on Jun 12, 2024

gforsyth
Member

There's a larger issue here, which is that the quarto `search.json`
doesn't seem to include a bunch of items which we generate using
`quartodoc`, which makes the docs very unhelpful for someone trying to
search (especially) method names.

This is definitely a hack, but I've tried uploading these records
manually and it does make a noticeable improvement.

And yes, I am scraping through QMD files to grab the anchors, and
descriptions, and method names, and yes that's gross, but computers are gross.

I'm planning to spend a bit more time understanding how we can better
augment the algolia index so our search is more useful, but this is
both a start and a proof-of-concept that we can append to our
existing index.
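
As a rough illustration of the scraping step described above, the sketch below runs the pattern from this PR's diff (`HORRID_REGEX`) over a small markdown table. The sample table, its anchors, and the `scrape_rows` helper are made up for the example and are not code or content from the repository.

```python
import re

# Pattern quoted from the PR diff. It matches a markdown table row shaped like
#   | [name](#anchor) | description |
HORRID_REGEX = re.compile(r"\|\s*\[(\w+)\]\((#[\w.]+)\)\s*\|\s*(.*?)\s*\|")

# Hypothetical sample of a generated summary table (illustrative only).
SAMPLE_QMD = """\
| Name | Description |
| --- | --- |
| [day](#ibis.expr.types.temporal.DateValue.day) | Extract the day component. |
| [month](#ibis.expr.types.temporal.DateValue.month) | Extract the month component. |
"""


def scrape_rows(text):
    """Return (method name, anchor, description) tuples found in `text`."""
    return [match.groups() for match in HORRID_REGEX.finditer(text)]
```

The header and separator rows are skipped because neither contains a `[name](#anchor)` link in its first cell.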

@gforsyth
Member Author

I ran this script locally and generated a records.json file which I then uploaded via the algolia dashboard, so you can check that method names now show up prominently in the search bar (so long as another PR doesn't get merged and reset the index...)
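
A `records.json` like the one mentioned here could be produced roughly as follows. This is only a sketch: `objectID` is Algolia's record identifier, but the other field names (`href`, `title`, `text`), the `build_records` and `to_json` helpers, and the sample page path and row are assumptions, not the script's actual schema.

```python
import json


def build_records(page_href, rows):
    """Turn (name, anchor, description) tuples into search records.

    Field names other than `objectID` are assumed for illustration.
    """
    records = []
    for name, anchor, description in rows:
        url = f"{page_href}{anchor}"
        records.append(
            {
                "objectID": url,  # must be unique per record
                "href": url,
                "title": name,
                "text": description,
            }
        )
    return records


def to_json(records):
    """Serialize the records as they might appear in a records.json file."""
    return json.dumps(records, indent=2)


# Hypothetical input: one scraped row from a reference page.
SAMPLE_ROWS = [("day", "#example.DateValue.day", "Extract the day component.")]
SAMPLE_RECORDS = build_records("reference/expression-temporal.html", SAMPLE_ROWS)
```

Using the page path plus anchor as the `objectID` keeps each method's record distinct, so re-uploading the same file would overwrite records rather than duplicate them.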

@gforsyth gforsyth added the docs label (Documentation related issues or PRs) Jun 12, 2024
Member

@jcrist jcrist left a comment

All code is hacks and this is an effective one. +1 from me.

"docs/reference/expression-temporal.qmd",
]

HORRID_REGEX = re.compile(r"\|\s*\[(\w+)\]\((#[\w.]+)\)\s*\|\s*(.*?)\s*\|")
Member

😱

env:
  ALGOLIA_WRITE_API_KEY: ${{ secrets.ALGOLIA_WRITE_API_KEY }}
  ALGOLIA_APP_ID: HS77W8GWM1
  ALGOLIA_INDEX: prod_ibis
Member

Is there a way to avoid duplicating these envs across steps? Like a top-level env mapping? Not a big deal.

Member Author

Probably. I'll try to consolidate in a follow-up when I add a few more tweaks to the algolia index creation.
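
Independent of where the workflow declares them, a script could read the three variables shown in the workflow snippet from its environment along these lines. The `load_algolia_settings` helper and its error handling are hypothetical and not taken from the PR's script.

```python
import os

# Variable names as they appear in the workflow snippet under review.
REQUIRED_VARS = ("ALGOLIA_WRITE_API_KEY", "ALGOLIA_APP_ID", "ALGOLIA_INDEX")


def load_algolia_settings(environ=None):
    """Collect the Algolia settings, failing early if any are unset or empty."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return {name: environ[name] for name in REQUIRED_VARS}
```

Reading everything in one place means the script does not care whether the values come from a per-step `env` block or a shared one.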

@@ -0,0 +1,72 @@
from __future__ import annotations # noqa: INP001
Member

This honestly isn't that horrid of a script. A few more comments/docstrings to explain the method to the madness would help though. "This script generates records for algolia to search for all methods/functions because ...., the records look like ...."

Member Author

Tried to document a bit more. Time to 🚢

Member

@cpcloud cpcloud left a comment

I agree with Jim, I've seen and done much worse 😂

@gforsyth gforsyth enabled auto-merge (squash) June 12, 2024 22:39
@gforsyth gforsyth merged commit 05d9d7a into ibis-project:main Jun 12, 2024
76 checks passed
@cpcloud cpcloud added this to the 9.1 milestone Jun 13, 2024