Support Search in MyST Sites #100

Open
Tracked by #1106 ...
rowanc1 opened this issue Nov 29, 2022 · 3 comments

Comments

@rowanc1
Member

rowanc1 commented Nov 29, 2022

Search is currently not enabled in mystjs sites, but certainly something we want!

A few options:

It would be nice for offline/local search to work, so this should probably be pluggable, so that the theme, deployment, or site config can take it over.
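
As a rough illustration of what "pluggable" could mean here, the sketch below defines a small search interface and one offline implementation backed by an in-memory inverted index. The names `SearchBackend`, `LocalIndexBackend`, and `run_search` are hypothetical and are not part of mystjs; a theme or deployment could supply a different object with the same `search` method.

```python
from typing import Protocol


class SearchBackend(Protocol):
    """Hypothetical interface that a theme, deployment, or site config could implement."""

    def search(self, query: str) -> list[str]: ...


class LocalIndexBackend:
    """Offline backend: a tiny in-memory inverted index over page text."""

    def __init__(self, pages: dict[str, str]) -> None:
        # Map each lowercase word to the set of page URLs that contain it.
        self.index: dict[str, set[str]] = {}
        for url, text in pages.items():
            for word in text.lower().split():
                self.index.setdefault(word, set()).add(url)

    def search(self, query: str) -> list[str]:
        # Return pages containing every query word, sorted for stable output.
        words = query.lower().split()
        if not words:
            return []
        hits = set.intersection(*(self.index.get(w, set()) for w in words))
        return sorted(hits)


def run_search(backend: SearchBackend, query: str) -> list[str]:
    # The caller only depends on the interface, so backends are swappable.
    return backend.search(query)
```

A hosted search service could be wrapped in another class with the same `search` method and passed to `run_search` unchanged.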

References

@stevejpurves
Member

interesting package in this comment executablebooks/jupyter-book#815 and the issue mentioned above

@choldgraf
Member

Idea: search across MyST sites in intersphinx-style config?

One pain point for many organizations is that they host their documentation in multiple places but link to it from a single place. For example, the 2i2c documentation, the Dask documentation, and our myst-tools site all have a top bar that links across pages hosted in different repositories.

One confusion point with this is that the scope of search (in Sphinx) is restricted to the currently-viewed sub-site, while I think many users expect a single search to work across all sites.

I wonder if it is possible to use configuration similar to intersphinx to pull in the search registries of other MyST websites, and include them in the search index for the currently-viewed site (e.g. either as a build-time operation, a server-side operation, or a client-side operation). Might be unrealistic to pull all of the text from those sites, but maybe if we store a registry of keywords and pages that would be enough?
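
A minimal sketch of the registry idea, assuming each site publishes a small keyword-to-pages registry that has already been downloaded. The registry format, the `merge_registries` name, and the example URLs are all hypothetical; nothing here reflects an existing MyST or intersphinx file format.

```python
def merge_registries(
    registries: dict[str, dict[str, list[str]]],
) -> dict[str, list[str]]:
    """Merge per-site registries into one keyword -> absolute URLs index.

    `registries` maps a site's base URL to that site's registry, which in
    turn maps a keyword to the relative paths of pages mentioning it.
    """
    merged: dict[str, list[str]] = {}
    for base_url, registry in registries.items():
        for keyword, pages in registry.items():
            for page in pages:
                # Normalize slashes so base and path join with exactly one "/".
                url = base_url.rstrip("/") + "/" + page.lstrip("/")
                merged.setdefault(keyword.lower(), []).append(url)
    return merged


# Hypothetical registries for two sites, keyed by base URL.
example = {
    "https://a.example.org/": {"Search": ["guide/search"]},
    "https://b.example.org": {"search": ["/faq"], "dask": ["intro"]},
}
combined = merge_registries(example)
```

The same merge could run at build time, on a server, or in the browser; only where the registries are fetched from would change.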

@nthiery

nthiery commented Jun 15, 2024

Out of scope for this ticket, but to keep in the back of our minds: RAG (Retrieval-Augmented Generation) is gaining traction these days. In short: you chat with a generative AI chatbot (e.g. ChatGPT), and you would like it to answer questions in the context of a given collection of documents. How does it work? Pretty much as for search engines, by prebuilding an "index": retraining the chatbot on the documents, or sending the documents together with each question, would be too costly. So instead, once and for all, you split the documents into chunks and compute a vector for each chunk (an embedding of the chunk in some vector space) that summarizes what the chunk is about (the equivalent of an index). Then, when a question is asked, a vector is built for the question and matched against the vectors of all the chunks to retrieve those that could be related. Finally, these chunks are fed back as context to the chatbot.
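
The retrieval step described above can be sketched in a few lines. This is only a toy under stated assumptions: `embed` is a bag-of-words counter standing in for a real embedding model, and all function names are made up for illustration.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(chunks: list[str]) -> list[tuple[str, Counter]]:
    # Done once at build time: one vector per chunk.
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index: list[tuple[str, Counter]], question: str, k: int = 2) -> list[str]:
    # Embed the question and return the k chunks with the closest vectors.
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]
```

The chunks returned by `retrieve` are what would be passed to the chatbot as context alongside the question.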

So we can foresee that, at some point, building and publishing a collection of vectors will be as standard a part of the build process of a MyST website as building and publishing an index for it, along with providing cross-site vectors, tailored chatbots, etc.

And that future might not be so distant. A student of mine is right now building a proof of concept of a chatbot tailored to my course notes, and so is a student of a colleague.


4 participants