Searching HTML contents #224
Comments
Hi @Zloka, thanks for the kind words :) In principle, there is nothing preventing you from indexing and searching HTML content with Should one be able to search for tags like
Thank you for your prompt reply! I went with your suggestion, and it seems to work well! A follow-up question came to mind: In my case, I can pre-process the files in a build step ahead of time. I figured this would be desirable, as I can build the While this works fine, the client-side of my application is a To give back, in case someone else has a similar use-case, here is a TypeScript script that outlines the rough approach I took creating the
You are right @Zloka, at the moment there is no utility function in In general, whether it makes sense to pre-build the index very much depends on the kind of pre-processing needed, and on how frequently the index needs to change. In many cases I have seen, indexing "just in time" is the best solution, as it is much simpler to implement and maintain. That said, since you need pre-processing, if your documents don't change too often it is probably more efficient to pre-build. Thank you for sharing your code, it is always useful for other people landing on an issue!
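To illustrate the pre-build trade-off without relying on any particular library API, here is a rough sketch using a toy inverted index. Everything in it (`buildIndex`, `loadIndex`, `search`, the `Doc` shape) is hypothetical and is not the library's API. The only point is that the build step serializes the index to JSON once, and the client parses that JSON instead of re-indexing every document.

```typescript
// Toy inverted index: term -> ids of the documents containing it.
// Purely illustrative; a real search library stores far more than this.
type SerializedIndex = Record<string, number[]>;

// Hypothetical document shape, assumed to be already pre-processed to plain text.
interface Doc {
  id: number;
  text: string;
}

// Lowercase and split on anything that is not an ASCII letter or digit.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((term) => term.length > 0);
}

// Build step: run once ahead of time, then write JSON.stringify(index) to a file.
function buildIndex(docs: Doc[]): SerializedIndex {
  // Null-prototype object so that terms like "constructor" are not found by accident.
  const index: SerializedIndex = Object.create(null);
  for (const doc of docs) {
    for (const term of tokenize(doc.text)) {
      let ids = index[term];
      if (ids === undefined) {
        ids = [];
        index[term] = ids;
      }
      if (ids.indexOf(doc.id) === -1) ids.push(doc.id);
    }
  }
  return index;
}

// Client side: parse the pre-built JSON instead of indexing again.
function loadIndex(json: string): SerializedIndex {
  return Object.assign(Object.create(null), JSON.parse(json));
}

// AND semantics: return ids of documents that contain every query term.
function search(index: SerializedIndex, query: string): number[] {
  let result: number[] | null = null;
  for (const term of tokenize(query)) {
    const ids: number[] = index[term] ?? [];
    result = result === null ? ids.slice() : result.filter((id) => ids.indexOf(id) !== -1);
  }
  return result ?? [];
}
```

Whether this pays off depends, as noted above, on how expensive the pre-processing is and how often the documents change: the serialized index has to be regenerated on every content change, while just-in-time indexing has no such step.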
I will open an issue there! Myself, I can work around it using For potential future readers, here's a draft of how one could implement it using I decided to create a
I then created a
It should then be straightforward to hook
Hi! First off, thank you for a great library.
A new use-case for me would be to search HTML content. In practice, my data consists of what can be considered "pages", consisting of a title and some HTML content. Do you happen to have any suggestions as to how I should handle searching the HTML content? Will the library handle it well as such, or should I look to e.g. parse it into "plaintext" by stripping away tags and such?
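For readers weighing the second option, a minimal sketch of stripping tags before indexing might look like the following. It assumes a hypothetical `Page` shape with `id`, `title` and `html` fields, and the helper names `htmlToText` and `toSearchDocument` are made up for illustration. The regex approach is deliberately naive and only decodes a handful of entities; a real HTML parser would be more robust for malformed or untrusted markup.

```typescript
// Hypothetical page shape, matching the "title plus HTML content" description.
interface Page {
  id: number;
  title: string;
  html: string;
}

// A handful of common entities; this is not a complete list.
const ENTITIES: Record<string, string> = {
  "&amp;": "&",
  "&lt;": "<",
  "&gt;": ">",
  "&quot;": '"',
  "&#39;": "'",
  "&nbsp;": " ",
};

// Naive regex-based conversion of HTML to plain text.
function htmlToText(html: string): string {
  return html
    // Drop <script> and <style> blocks together with their content.
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, " ")
    // Drop HTML comments.
    .replace(/<!--[\s\S]*?-->/g, " ")
    // Replace every remaining tag with a space so adjacent words stay separate.
    .replace(/<[^>]+>/g, " ")
    // Decode the few entities listed above in a single pass.
    .replace(/&(amp|lt|gt|quot|#39|nbsp);/g, (match) => ENTITIES[match] ?? match)
    // Collapse runs of whitespace.
    .replace(/\s+/g, " ")
    .trim();
}

// Turn a page into a flat document whose fields are plain text and ready to index.
function toSearchDocument(page: Page): { id: number; title: string; text: string } {
  return { id: page.id, title: page.title, text: htmlToText(page.html) };
}
```

The resulting `title` and `text` fields can then be indexed like any other plain-text fields, which keeps markup such as tag names and attributes out of the search results.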