
Brainstorming search results that are autosuggested and shown on results page #2421

Closed
ebarry opened this Issue Feb 28, 2018 · 15 comments

4 participants
ebarry (Member) commented Feb 28, 2018

Update: this is a long conversation and there are some next steps being broken out. Please continue to use this issue for brainstorming! Thanks :)

Original issue continues below:

Please describe the problem

The system that chooses and ranks autosuggested content is mysterious and seems like a black box.

Autosuggested results are limited to 15 items of assorted content types, but do not provide an overview of Public Lab resources on a topic.

What did you expect to see that you didn't?

I expect to understand what the results mean.

Please show us where to look

The Search box in the menu bar

ebarry added this to the API and search improvements milestone Feb 28, 2018

jywarren (Contributor) commented Feb 28, 2018

It is actually a black box! Full-text search is a complex problem, and we solve it with the "fulltext" module of MySQL, our database system; some pretty arcane (but thorough) documentation is here: https://dev.mysql.com/doc/refman/5.7/en/fulltext-search.html

It does seem we can tune/adjust it, though. There is, for example, a "natural language" option which attempts to algorithmically determine "relevance" -- https://dev.mysql.com/doc/refman/5.7/en/fulltext-natural-language.html

We use this fulltext feature on this line:

    Revision.where('MATCH(node_revisions.body, node_revisions.title) AGAINST(?)', query)

It does look like we could "turn on" natural language mode by making that say:

    Revision.where('MATCH(node_revisions.body, node_revisions.title) AGAINST(? IN NATURAL LANGUAGE MODE)', query)

We may also need to then add ordering by relevance -- so, I /think/ that would be:

    quoted = Revision.connection.quote(query.to_s) # escape user input instead of concatenating it (avoids SQL injection)
    match = "MATCH(node_revisions.body, node_revisions.title) AGAINST(#{quoted} IN NATURAL LANGUAGE MODE)"
    Revision.select("node_revisions.body, node_revisions.title, #{match} AS score")
      .where(match).order('score DESC') # highest relevance first

It might take some testing.

Would you like to try this out? I have to point out that I do NOT know what will happen. The documentation for "natural language" says:

Every correct word in the collection and in the query is weighted according to its significance in the collection or query. Thus, a word that is present in many documents has a lower weight, because it has lower semantic value in this particular collection. Conversely, if the word is rare, it receives a higher weight. The weights of the words are combined to compute the relevance of the row. This technique works best with large collections.
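For intuition about the weighting described in that quote, here is a toy Ruby sketch of inverse-document-frequency scoring: a word's weight is log(N / df), where N is the number of documents and df is how many of them contain the word, and a document's relevance is the sum of the weights of the query words it contains. It is a simplified stand-in, not MySQL's actual ranking formula, and the method names and sample documents are hypothetical. Also worth knowing: per the MySQL manual, a MATCH ... AGAINST(?) with no modifier already runs in natural language mode, so adding IN NATURAL LANGUAGE MODE mostly makes the existing behavior explicit.

```ruby
# Toy illustration of IDF-style relevance; this is NOT MySQL's real ranking formula.
# Words found in many documents get a low weight; rare words get a high weight.

# Lowercase a text and return its distinct words.
def distinct_words(text)
  text.downcase.scan(/[a-z0-9]+/).uniq
end

# weight(word) = log(N / df): N = number of documents,
# df = number of documents that contain the word.
def idf_weights(documents)
  n = documents.size.to_f
  df = Hash.new(0)
  documents.each { |doc| distinct_words(doc).each { |w| df[w] += 1 } }
  df.transform_values { |count| Math.log(n / count) }
end

# A document's relevance = sum of the weights of the query words it contains.
def toy_relevance(document, query, weights)
  doc_words = distinct_words(document)
  distinct_words(query).sum { |w| doc_words.include?(w) ? weights.fetch(w, 0.0) : 0.0 }
end

docs = [
  "DIY spectrometer kit",
  "spectrometer calibration notes",
  "open hour schedule",
  "spectrometer in schools"
]
weights = idf_weights(docs)
ranked = docs.sort_by { |d| -toy_relevance(d, "spectrometer schools", weights) }
puts ranked.first
```

Here "spectrometer" appears in three of the four documents and so carries little weight, while "schools" appears in only one, so "spectrometer in schools" ranks first. That mirrors the quoted behavior and hints at why a topic's own name, when it is very common on the site, may do little to separate results.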


As to the second issue:

...but do not provide an overview of Public Lab resources on a topic.

I expect to understand what the results mean.

How might we break this down a bit? Do you mean that you'd like to show a mix of types, or that you'd like to show explanatory information about what different types are?

Thanks!

jywarren (Contributor) commented Feb 28, 2018

I tested the above query and it does run, although again, I'm not super clear on how it works. But it'd be pretty easy to put it into production if you'd like!

bronwen9 commented Feb 28, 2018

What I'd like to see in the auto-suggest is a list of search terms ordered by weight (popular, busy pages first). On the results page, I would like to see keyword results weighted by relevance (popularity, whether the word in question is included in a tag or a title, etc.) and then sorted by type (note, profile, question, comment, etc.). I would then like to be able to search within the keyword results (say, I'm interested in spectrometers but would like to narrow down my search to find examples of how they've been used in schools).

jywarren (Contributor) commented Feb 28, 2018

Hi, Bronwen, thanks. Let's break this into separate features:

  1. auto-suggest search ordered by popularity (is this # of views, or likes, or another preference?)
  2. results page ordered by relevance (popularity, whether the word in question is included in a tag or a title, etc.)
  3. results page displays each type (note, profile, question, comment, etc) separately -- like this, for example? https://publiclab.org/search/dynamic (that page doesn't work well yet)
  4. ability to refine search within the keyword results (say, I'm interested in spectrometers, but would like to narrow down my search to find examples of how they've been used in schools) -- how would you specify this, do you think? Could you continue typing in the search input and see the results narrow more? Or is there another interface you'd like to suggest?
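Features 3 and 4 could be prototyped in plain Ruby before changing any queries. In this minimal sketch, the SearchResult struct and its title, type, and body fields are hypothetical, not Public Lab's actual models: results are grouped by content type for display, and a result set is narrowed by requiring every additional word the user types to appear in the title or body, which is one possible answer to the "continue typing and see the results narrow" question.

```ruby
# Hypothetical search result record; field names are assumptions for illustration.
SearchResult = Struct.new(:title, :type, :body)

# Feature 3: display each content type separately (notes, questions, wikis, ...).
def group_by_type(results)
  results.group_by(&:type)
end

# Feature 4: narrow existing results as the user keeps typing. Every extra word
# must appear (case-insensitively) in the title or body for a result to survive.
def refine_results(results, extra_terms)
  terms = extra_terms.downcase.split
  results.select do |r|
    text = "#{r.title} #{r.body}".downcase
    terms.all? { |t| text.include?(t) }
  end
end

results = [
  SearchResult.new("DIY spectrometer", :note, "Build a spectrometer at home"),
  SearchResult.new("Spectrometry in class", :note, "Using the spectrometer in schools"),
  SearchResult.new("How do I calibrate?", :question, "My spectrometer readings drift"),
  SearchResult.new("Spectrometry", :wiki, "Overview of spectrometers and schools outreach")
]

grouped = group_by_type(results)
grouped.each { |type, items| puts "#{type}: #{items.map(&:title).join(', ')}" }

narrowed = refine_results(results, "schools")
puts narrowed.map(&:title).inspect
```

Substring matching keeps the sketch short, but it also means "school" matches "schools"; a production version could instead add the extra words to the full-text query itself.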

Thanks! This is super helpful.

jywarren (Contributor) commented Feb 28, 2018

And for the second one up there, do you mean not "relevance" as defined in my comment above about "natural language search," but a definition of popularity such as "likes" or "views"?

bronwen9 commented Mar 1, 2018

I think we'd probably want to create a rubric for relevance that includes likes/views but also weights results based on the KIND of page (a wiki page with the search term in its title might always show up higher on a list than, say, a comment).

One example where we're struggling with kinds of results is a search for "open hour." On our website, this search brings up 15 research notes in the auto-suggest and two research notes on the keyword search, but none of them link to our Open Hour page. I do think a popularity ranking would help with this, and might be simpler than introducing a semantic search feature, but I can see either offering improvements.

When I perform the same search on Google (without boolean operators), I see a list of results that starts with our main open hour page, followed by items tagged with "openhour" and "open-hour", followed by links to pages for individual open hours. This seems like a sensible rubric for page-type sorting (provided that it's still possible to browse or narrow searches for all occurrences of a search term on our site).
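The ordering described in this comment (the main page first, then items tagged with the term, then everything else), with likes/views as a tiebreaker, could be expressed as a tiered sort key. This is a minimal Ruby sketch; the Page struct, its fields, the tier rules, and using likes + views as popularity are all hypothetical choices for illustration, not a settled rubric.

```ruby
# Hypothetical page record; every field name here is an assumption.
Page = Struct.new(:title, :kind, :tags, :likes, :views)

# Lowercase and drop everything except letters and digits,
# so "Open Hour", "openhour" and "open-hour" all become "openhour".
def normalize_term(text)
  text.downcase.delete("^a-z0-9")
end

# Lower tiers sort first:
#   0 = wiki page whose title contains the term (e.g. the main Open Hour page)
#   1 = any page tagged with a variant of the term ("openhour", "open-hour")
#   2 = everything else the full-text search returned
def rubric_tier(page, term)
  key = normalize_term(term)
  return 0 if page.kind == :wiki && normalize_term(page.title).include?(key)
  return 1 if page.tags.any? { |tag| normalize_term(tag) == key }
  2
end

# Within a tier, more popular pages (likes + views) come first.
def rubric_sort(pages, term)
  pages.sort_by { |page| [rubric_tier(page, term), -(page.likes + page.views)] }
end

pages = [
  Page.new("Open hour recap, March", :note, ["openhour"], 3, 120),
  Page.new("Community call notes", :note, [], 10, 900),
  Page.new("Open Hour", :wiki, ["open-hour"], 5, 400),
  Page.new("Open hour notes, Feb", :note, ["open-hour"], 8, 300)
]

rubric_sort(pages, "open hour").each { |page| puts page.title }
```

Sorting by an array key compares the tier first and uses popularity only to break ties, so a very popular but loosely related page cannot outrank the main page.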


jywarren (Contributor) commented Mar 1, 2018

bronwen9 commented Mar 7, 2018

Ah, sorry for the late response, but I think that it would be great to try some of these. I think at some point we're going to need the ability to work with boolean operators (whether that's through additional search fields or allowing for more than one word or phrase in the field), but I think any of these options would help get us closer to understanding where things are going haywire in the existing search. Plus-one to trying all three!
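MySQL's full-text search also has a boolean mode (IN BOOLEAN MODE), in which a leading "+" marks a word as required. As a minimal sketch of allowing more than one word in the field, the hypothetical helper below turns whatever the user types into an expression where every word is required. It strips MySQL's boolean operator characters from the input rather than letting users type operators themselves, which would be a separate design decision.

```ruby
# MySQL boolean-mode operator characters; stripped so user input cannot
# inject operators (this sketch does not support typing operators directly).
BOOLEAN_OPERATORS = /[+\-<>()~*"@]/

# Build a boolean-mode expression in which every typed word is required ("+word").
def required_terms_expression(input)
  input.split
       .map { |word| word.gsub(BOOLEAN_OPERATORS, "") }
       .reject(&:empty?)
       .map { |word| "+#{word}" }
       .join(" ")
end

expr = required_terms_expression('spectrometer "schools" +')
puts expr

# In the Rails app, the expression would be bound as a parameter, e.g.:
#   Revision.where('MATCH(node_revisions.body, node_revisions.title) AGAINST(? IN BOOLEAN MODE)', expr)
```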

jywarren (Contributor) commented Mar 20, 2018

Work now ongoing in #2518 -- this will result in:

Soon!

(update: now live on the site!)

jywarren (Contributor) commented Mar 25, 2018

Hi, this needs some review and reorganization now that the above searches work -- @bronwen9 and @ebarry -- thanks for your help so far! Some additional steps might be:

  • create a button or set of links to change the sorting on pages like https://publiclab.org/search/oil-spill
  • choose one of these as the default sorting for the typeahead auto-complete suggestions
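A set of sorting links could each pass a different value in a query parameter, which the server maps through a small whitelist to a sort strategy, falling back to a default for anything unrecognized; the same default could then be chosen for the typeahead suggestions. In this minimal Ruby sketch, the order values (relevance, likes, views, recent), the SortableItem fields, and the relevance default are hypothetical.

```ruby
# Hypothetical result record used only for this sketch.
SortableItem = Struct.new(:title, :score, :likes, :views, :updated_at)

# Whitelisted sort strategies, keyed by the value of a hypothetical ?order= parameter.
# Each lambda returns a sort key; negating it puts larger values first.
SORT_OPTIONS = {
  "relevance" => ->(item) { -item.score },
  "likes"     => ->(item) { -item.likes },
  "views"     => ->(item) { -item.views },
  "recent"    => ->(item) { -item.updated_at.to_i }
}.freeze

DEFAULT_SORT = "relevance" # assumed default, which the typeahead could share

# Unknown or missing values fall back to the default instead of raising an error.
def sort_items(items, order_param)
  key_fn = SORT_OPTIONS.fetch(order_param.to_s, SORT_OPTIONS[DEFAULT_SORT])
  items.sort_by(&key_fn)
end

items = [
  SortableItem.new("Oil spill sampling", 2.5, 4, 300, Time.utc(2018, 3, 1)),
  SortableItem.new("Oil spill mapping",  1.2, 9, 150, Time.utc(2018, 3, 20)),
  SortableItem.new("Oil testing kit",    0.4, 1, 900, Time.utc(2018, 1, 1))
]

puts sort_items(items, "likes").map(&:title).inspect
puts sort_items(items, nil).map(&:title).inspect
```

Whitelisting keeps arbitrary user input out of the sort logic (and, in a real app, out of any ORDER BY clause).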

Also just cleaning up the lead of this issue a bit or starting a new one with our next steps clearly laid out would be helpful! Thanks!

ebarry changed the title from "PLANNING ISSUE: Autosuggested search results" to "Brainstorming search results (autosuggested and on results page)" Mar 26, 2018

ebarry changed the title from "Brainstorming search results (autosuggested and on results page)" to "Brainstorming search results that are autosuggested and shown on results page" Mar 26, 2018

jywarren (Contributor) commented Aug 21, 2018

As the dynamic search work is upcoming (as per your original schedule), I'm not sure if this one is on your radar, @milaaraujo and @stefannibrasil -- what do you think?

stefannibrasil added this to To do in RGSoC 2018 Aug 22, 2018

stefannibrasil (Collaborator) commented Aug 22, 2018

We have a few things to finish this week; we are planning to start working on improving the search next week!

stefannibrasil (Collaborator) commented Sep 5, 2018

@ebarry @bronwen9 @jywarren we started addressing some of your concerns here #3295. Please keep in mind that this PR is mostly on the front-end, but it will help with our planning! :)

stefannibrasil (Collaborator) commented Sep 5, 2018

I have some notes to share with you, but I need to organize them better first xD

jywarren (Contributor) commented Sep 5, 2018

So I left some maybe not super helpful comments on #3286 -- and just pulling it back here, I want to highlight that one of the questions we try to answer may need to be:

What is the best default sorting AND default search type for /each result type/ -- acknowledging that the best ordering for nodes might not make sense for profiles.

Make sense?
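One way to make this question concrete is a per-type settings table that a user's explicit choice (such as a sort link) can override. This is only a hypothetical starting point: the result-type keys, the search modes (natural-language full-text for nodes, prefix matching on usernames for profiles), and the sort values are illustrative assumptions, not decisions from this thread.

```ruby
# Hypothetical per-result-type defaults; every value here is an assumption.
TYPE_DEFAULTS = {
  nodes:    { search: :natural_language, sort: :relevance },      # notes, wikis, questions
  profiles: { search: :prefix,           sort: :recent_activity } # match usernames as typed
}.freeze

# Used for any result type without its own entry.
FALLBACK_SETTINGS = { search: :natural_language, sort: :relevance }.freeze

# Look up the defaults for a result type, letting an explicit user choice
# (e.g. from a sort link) override only the keys it provides.
def search_settings(type, overrides = {})
  TYPE_DEFAULTS.fetch(type, FALLBACK_SETTINGS).merge(overrides)
end

p search_settings(:profiles)
p search_settings(:nodes, sort: :likes)
p search_settings(:comments)
```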

RGSoC 2018 automation moved this from To do to Done Sep 22, 2018
