Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: migrate search engine to full-text-search package (#748) #968

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

adamgundry
Copy link
Member

See #748. This removes the search engine code in favour of the full-text-search package (which was derived from the Hackage codebase originally, so is almost exactly compatible). This is a very quick attempt; I haven't verified if there are code changes on either side that need to be dealt with.

It has a couple of awkward places that still need attention:

  • The queryExplain API apparently changed fractionally, which affects searchPackagesExplain.
  • I don't know how to define a MemSize instance, because full-text-search does not expose the internal representation of SearchEngine.

@gbaz
Copy link
Contributor

gbaz commented Aug 18, 2021

Would this let us implement #727 as well?

@adamgundry
Copy link
Member Author

Would this let us implement #727 as well?

It gives one possible strategy, yes.

The full-text-search package offers queryAutosuggest (https://hackage.haskell.org/package/full-text-search-0.2.1.4/docs/Data-SearchEngine.html#v:queryAutosuggest) which given a query with a partial last term, uses the search index to find potential completions of the term and produce both a ranked list of completions and the corresponding result documents. (I can't remember off the top of my head whether the completions are already stemmed; I think they might be.) Thus it's a bit more flexible than just autocompleting based on package name, but on the other hand the result quality depends on how well-tuned the fields/parameters are in the search index.

We'd then need to expose a query API wrapping queryAutosuggest, and have some frontend scripting to issue queries as the user types. I'm not really familiar enough with hackage-server to know how easy that would be. And even with debouncing, I guess that could potentially increase load on the server a fair bit?

@gbaz
Copy link
Contributor

gbaz commented Feb 18, 2022

I think we really should merge this. Sorry for letting it languish. Can you explain what "The queryExplain API apparently changed fractionally, which affects searchPackagesExplain." means in more detail?

(the memsize issue doesn't matter so much, imho)

@adamgundry
Copy link
Member Author

I think we really should merge this. Sorry for letting it languish. Can you explain what "The queryExplain API apparently changed fractionally, which affects searchPackagesExplain." means in more detail?

(the memsize issue doesn't matter so much, imho)

No need to apologise, because this clearly wasn't quite in a mergeable state when I threw it together, and apologies myself for not getting back to this sooner.

I've looked slightly more closely at the queryExplain issue, and I think the issue is that cf0f17e changed the search implementation on the hackage-server side, but this change isn't (yet) present in full-text-search. In particular this needs changes around query as well, but its type didn't change, so I hadn't noticed before.

Also, I noticed that queryExplain was removed from hackage-server in #1021. This seems unfortunate if we want to refine the field weighting in the future? cc @ysangkok

@ysangkok
Copy link
Member

@adamgundry Since the queryExplain code is really a debugging feature, I thought there is no need to keep it around in the codebase. If the search algorithm needs adjusting, these changes can be easily undone, since they were only removing code.

If you think it is better to keep the unreferenced code around, I wouldn't mind too much. I just think it is better to err on the side of removal. To me it is a philosophical choice, one could go either way, but if we choose to keep unused code around, that choice should be explicit so people can take it into consideration.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants