Smart autocomplete #1047

Open
thatbudakguy opened this issue Jun 7, 2024 · 8 comments

Comments

@thatbudakguy
Member

thatbudakguy commented Jun 7, 2024

This is a brand-new feature to enhance the search experience, both from the homepage and search results page.

To try it out, visit the interactive prototype and click in the search bar to simulate typing.

The suggestions are grouped into categories:

  • Datasets, Maps, etc. (corresponding to values in the resource class field in solr)
  • Places

For the latter, the most basic implementation would be to use values from the "Spatial coverage" field/facet. However, this won't result in a spatial search and won't move the map because those values are just metadata – they aren't tied to geospatial locations. We want to do more than this, if we can.

A more complete/featureful implementation would do geocoding, so that "New York (city)" and "New York (state)" could be matched to geospatial coordinates. Selecting either option would take you to a search with the spatial facet active, as though you had already moved the map to the coordinates/bbox of the place you selected.

This deserves some thought re: implementation. We could query some kind of web service live during the autocomplete process, or we could pre-compute a list of known locations and their coordinates to make the process faster/simpler (but perhaps less complete).
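To make the pre-computed option concrete, here is a minimal sketch of a prefix lookup over a sorted in-memory list. Everything in it is an assumption for illustration: the `GAZETTEER` entries, the rough bounding boxes, and the `suggest_places` helper are hypothetical and not part of the project.

```python
from bisect import bisect_left

# Hypothetical pre-computed gazetteer: (name, (west, south, east, north)).
# The coordinates are rough illustrative values, not authoritative data.
GAZETTEER = sorted(
    [
        ("New York (city)", (-74.26, 40.48, -73.70, 40.92)),
        ("New York (state)", (-79.76, 40.48, -71.86, 45.02)),
        ("Newark", (-74.25, 40.67, -74.11, 40.79)),
        ("India", (68.1, 6.5, 97.4, 35.7)),
    ],
    key=lambda entry: entry[0].lower(),
)

# Lowercased names in the same order, used for binary search.
_KEYS = [name.lower() for name, _ in GAZETTEER]


def suggest_places(prefix, limit=5):
    """Return up to `limit` (name, bbox) entries whose name starts with `prefix`."""
    needle = prefix.strip().lower()
    if not needle:
        return []
    # Jump to the first name that sorts at or after the prefix, then scan
    # forward while names still start with it.
    start = bisect_left(_KEYS, needle)
    matches = []
    for i in range(start, len(_KEYS)):
        if not _KEYS[i].startswith(needle):
            break
        matches.append(GAZETTEER[i])
        if len(matches) == limit:
            break
    return matches
```

Because each entry already carries its bounding box, selecting a suggestion needs no extra network call. The trade-off is the one noted above: the list is only as complete as whatever was pre-computed.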

Note that Stanford maintains its own geocoder service at https://sul-geocoding-web.stanford.edu/. It would be great to take advantage of this! There's more information on that page about how the service works; see in particular this page on the geocoding API. There is also a suggestion service that might be worth investigating.

Whatever we do, we probably want to take advantage of solr's built-in functionality for search suggestions.

@thatbudakguy
Member Author

thatbudakguy commented Jun 7, 2024

Might be related to or synergize with #41.

@thatbudakguy
Member Author

re: the stanford geocoding service, @mapninja says:

Because of the way the data is licensed, we currently have 5 services, rather than a single global service. I’m trying to negotiate a global geocoder but that may not happen until next license cycle in Jan ’25. That means that you would have to figure out how to aggregate the autocomplete from all 5 services, in order to accommodate global placenames.
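One possible shape for that aggregation, as a sketch only: query every regional service in parallel, skip any that fail, and drop case-insensitive duplicates. The `services` mapping, the region labels, and the fetch callables are hypothetical stand-ins; the real services' request and response formats are not assumed here.

```python
from concurrent.futures import ThreadPoolExecutor


def aggregate_suggestions(services, text, limit=5):
    """Query every regional service and merge the results into one list.

    `services` maps a (hypothetical) region label to a callable that takes
    the typed text and returns a list of suggestion strings.
    """
    # Submit all lookups at once; leaving the `with` block waits for them.
    with ThreadPoolExecutor(max_workers=len(services) or 1) as pool:
        futures = {region: pool.submit(fetch, text) for region, fetch in services.items()}

    merged, seen = [], set()
    for region, future in futures.items():
        try:
            results = future.result()
        except Exception:
            continue  # one failing region should not break autocomplete
        for suggestion in results:
            key = suggestion.casefold()
            if key not in seen:  # drop case-insensitive duplicates
                seen.add(key)
                merged.append({"text": suggestion, "region": region})
    return merged[:limit]
```

This keeps results in the order the services are listed, so an early region can crowd out later ones once `limit` applies. Interleaving or scoring the results would be a natural refinement.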

@thatbudakguy
Member Author

Evidently the ArcGIS Online (cloud) version of this API does have a /world endpoint, and because the APIs are the same, we could develop against/use that while the stanford-local one is being set up.

More info at: https://developers.arcgis.com/documentation/mapping-apis-and-services/geocoding/tutorials/search-for-an-address/
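As a rough sketch of how the two calls could fit together against the ArcGIS World geocoder: `suggest` returns candidate texts with a `magicKey`, and `findAddressCandidates` resolves one of them to a location and extent. The helper names are made up for illustration, the code only builds URLs and parses a response rather than making requests, and any authentication the service requires (API key or token) is left out.

```python
from urllib.parse import urlencode

# Base URL of the ArcGIS Online World geocoder; a Stanford-hosted service
# with the same API would presumably swap in its own base URL.
WORLD_GEOCODER = "https://geocode.arcgis.com/arcgis/rest/services/World/GeocodeServer"


def suggest_url(text, max_suggestions=5, base=WORLD_GEOCODER):
    """URL for the `suggest` operation, called as the user types."""
    params = {"text": text, "maxSuggestions": max_suggestions, "f": "json"}
    return f"{base}/suggest?{urlencode(params)}"


def candidates_url(suggestion, base=WORLD_GEOCODER):
    """URL for `findAddressCandidates`, called once a suggestion is picked."""
    params = {
        "SingleLine": suggestion["text"],
        "magicKey": suggestion["magicKey"],
        "f": "json",
    }
    return f"{base}/findAddressCandidates?{urlencode(params)}"


def first_extent(response):
    """Return (west, south, east, north) of the top candidate, or None."""
    candidates = response.get("candidates", [])
    if not candidates:
        return None
    extent = candidates[0]["extent"]
    return (extent["xmin"], extent["ymin"], extent["xmax"], extent["ymax"])
```

The extent from the second call is what would drive the map move and bounding box search once a place is selected.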

@hudajkhan
Contributor

@dnoneill had a great suggestion of looking at Blacklight 8. Some references that might be useful for that:

https://github.com/projectblacklight/blacklight/blob/main/app/components/blacklight/search_bar_component.html.erb#L24
and possibly: https://github.com/projectblacklight/blacklight/blob/main/app/views/layouts/blacklight/base.html.erb#L23

The autocomplete path will default to the suggest handler in Solr. In looking at that, I am wondering if we need to do something separately on the Solr end to build the suggest index (that's separate from the regular Solr index). We should be able to call the suggest endpoint independently with a query and see which values we will get back.
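For checking the suggest endpoint on its own, a small sketch like the following may help. It builds the request URL and pulls the terms out of a response in Solr's suggester format. The base URL, the `/suggest` handler path, and the `mySuggester` dictionary name are assumptions about the local Solr config, and `build=True` relies on Solr's `suggest.build` parameter to (re)build the suggester index.

```python
from urllib.parse import urlencode


def build_suggest_url(base_url, query, dictionary="mySuggester", count=5, build=False):
    """Build a URL for a Solr suggest handler (handler path is an assumption)."""
    params = {
        "suggest": "true",
        "suggest.q": query,
        "suggest.dictionary": dictionary,
        "suggest.count": count,
        "wt": "json",
    }
    if build:
        # Asks Solr to (re)build the suggester index before answering.
        params["suggest.build"] = "true"
    return f"{base_url}/suggest?{urlencode(params)}"


def parse_suggestions(response, dictionary, query):
    """Extract the suggested terms from a Solr suggester JSON response."""
    entry = response.get("suggest", {}).get(dictionary, {}).get(query, {})
    return [suggestion["term"] for suggestion in entry.get("suggestions", [])]
```

Note that in this response shape each suggestion is a term with a weight and an optional payload rather than a full document, which bears on the category question raised next.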

Apart from the Solr stuff, some questions I have are:

  • For the category breakout, would it be enough to change the display of the results based on the types of Solr documents returned? i.e. we get the response from the suggest endpoint, it gets funneled to be displayed. At that point, we are able to group the documents by "Datasets"/"Maps" based on the document resource class field. We then update the display (somehow?) to show those categories along with the results.
  • As far as I know, the suggestion algorithm will return Solr documents as results. How would a place name be returned at all? Would place names really be a separate suggest endpoint or some kind of call, looking specifically at place name values (and not at Solr documents returned by the regular textual suggestion endpoint)? Or maybe looking at what possible place name facet values we have? Or just all place names possible in general?
  • What does a place name coming up in the autocomplete indicate? Does it indicate that we have results that tie to that bounding box? Or does it indicate just that these are possible place names in general?
  • For the dynamic spatial search - when we figure out how to get place names - are we trying to match the action of clicking on a place result to a bounding box query? E.g. The user types in "New Y". Our suggestion comes back with "New York". Clicking on New York will then show results with bounding box related to New York. Does this also do a textual search or is it only a bounding box search?

@hudajkhan
Contributor

Also, how do we integrate lobsters? :)

@hudajkhan
Contributor

Another question is - when we understand what and how we want place names to display: do we need to create a new suggest endpoint/dictionary to support this lookup?

@dbranchini

I'll try to answer the questions I think I can answer. It seems there's a mix of UX and technical in those questions, and we might want to have a discussion around some of these.

For the category breakout, would it be enough to change the display of the results based on the types of Solr documents returned? i.e. we get the response from the suggest endpoint, it gets funneled to be displayed. At that point, we are able to group the documents by "Datasets"/"Maps" based on the document resource class field. We then update the display (somehow?) to show those categories along with the results.

Not a UX question, correct?

As far as I know, the suggestion algorithm will return Solr documents as results. How would a place name be returned at all? Would place names really be a separate suggest endpoint or some kind of call, looking specifically at place name values (and not at Solr documents returned by the regular textual suggestion endpoint)? Or maybe looking at what possible place name facet values we have? Or just all place names possible in general?

First part of this block seems to be more technical, but there's a UX component to the second part. We could go either route: show only locations (or place names) that match our data, or show all locations. Two scenarios come to mind. One: showing only locations that match our data allows users to find datasets matching the location they're interested in. Two: showing all locations supports a different use case. I'll use India and Rajasthan (a state in India). Let's imagine we have datasets matching Rajasthan, but we don't have any matching India. If someone knows they're looking for a state in India, but can't remember the exact name, or maybe they're interested in a couple of different states, they might start by searching for India. Under the first scenario, they'd get zero matches. So I lean toward the latter scenario: the user sees India as a location match, they click it, the map zooms to India, and the search results show all datasets contained within that bounding box.
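The India/Rajasthan case can be worked through with a toy example. The holdings, titles, and bounding boxes here are invented and only roughly approximate the real extents; the point is that a containment test finds the Rajasthan dataset where a text match on "India" finds nothing.

```python
# Hypothetical holdings: one dataset covering Rajasthan, none described as "India".
# Bounding boxes are (west, south, east, north) in degrees, rough illustrative values.
DATASETS = [
    {"title": "Rajasthan district boundaries", "bbox": (69.5, 23.0, 78.3, 30.2)},
    {"title": "New York City parks", "bbox": (-74.26, 40.48, -73.70, 40.92)},
]
INDIA_BBOX = (68.1, 6.5, 97.4, 35.7)


def bbox_within(inner, outer):
    """True when `inner` lies entirely inside `outer` (ignores antimeridian wrap)."""
    west, south, east, north = inner
    o_west, o_south, o_east, o_north = outer
    return o_west <= west and o_south <= south and east <= o_east and north <= o_north


def text_search(term):
    """Titles containing the term, case-insensitively."""
    return [d["title"] for d in DATASETS if term.lower() in d["title"].lower()]


def spatial_search(bbox):
    """Titles of datasets whose bounding box falls inside `bbox`."""
    return [d["title"] for d in DATASETS if bbox_within(d["bbox"], bbox)]
```

With this data, `text_search("India")` returns an empty list, while `spatial_search(INDIA_BBOX)` returns the Rajasthan dataset.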

What does a place name coming up in the autocomplete indicate? Does it indicate that we have results that tie to that bounding box? Or does it indicate just that these are possible place names in general?

I believe I answered that above, so now I'm wondering if I misunderstood the earlier question?

For the dynamic spatial search - when we figure out how to get place names - are we trying to match the action of clicking on a place result to a bounding box query? E.g. The user types in "New Y". Our suggestion comes back with "New York". Clicking on New York will then show results with bounding box related to New York. Does this also do a textual search or is it only a bounding box search?

That's a good question. I think the textual search is shown through the matches for datasets and maps, and the location search is just a location/map search.

I hope this helps get a bigger conversation started around this. I'm open to other ideas.

@thatbudakguy
Member Author

For the category breakout, would it be enough to change the display of the results based on the types of Solr documents returned? i.e. we get the response from the suggest endpoint, it gets funneled to be displayed. At that point, we are able to group the documents by "Datasets"/"Maps" based on the document resource class field. We then update the display (somehow?) to show those categories along with the results.

Technically speaking, this matches my understanding and seems like something we can do now, without regard to external calls or geocoding using a service. Maybe this is the MVP implementation for this ticket: just split up the autocomplete results depending on what type of thing they are.
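A sketch of what that MVP split could look like, assuming each suggestion can be tied to a document that exposes its resource class. The field names `gbl_resourceClass_sm` and `dct_title_s` are assumptions about the index schema, and `group_by_resource_class` is a hypothetical helper.

```python
from collections import defaultdict


def group_by_resource_class(docs, class_field="gbl_resourceClass_sm", title_field="dct_title_s"):
    """Group document titles under each resource class they carry."""
    groups = defaultdict(list)
    for doc in docs:
        # Documents without a resource class fall into a catch-all group.
        classes = doc.get(class_field) or ["Other"]
        if isinstance(classes, str):
            classes = [classes]  # tolerate single-valued fields
        for resource_class in classes:
            groups[resource_class].append(doc.get(title_field, ""))
    return dict(groups)
```

A document with several resource classes appears under each of them here; whether that is desirable in the dropdown is a display decision.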

As far as I know, the suggestion algorithm will return Solr documents as results. How would a place name be returned at all? Would place names really be a separate suggest endpoint or some kind of call, looking specifically at place name values (and not at Solr documents returned by the regular textual suggestion endpoint)?

This is correct. The way to get place names is via an external call to a geocoding service, as detailed in the initial description of the ticket. What we would do is merge solr's suggestion results with the external results and display them as a single list with different headings. This could be considered the more full-featured implementation of the ticket.
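The merge step might look something like this sketch, which turns grouped Solr suggestions and geocoder places into one list of headed sections. The input shapes, the `action` labels, and the `build_sections` helper are all hypothetical.

```python
def build_sections(solr_groups, places, per_section=3):
    """Combine Solr suggestions and geocoded places into headed sections.

    `solr_groups` maps a heading (e.g. "Datasets") to a list of titles;
    `places` is a list of {"name": ..., "bbox": ...} dicts from a geocoder.
    """
    sections = []
    for heading, titles in solr_groups.items():
        if titles:  # skip empty categories so no bare heading is shown
            items = [{"label": t, "action": "text"} for t in titles[:per_section]]
            sections.append({"heading": heading, "items": items})
    if places:
        # Place items keep their bbox so a click can start a spatial search.
        items = [
            {"label": p["name"], "action": "bbox", "bbox": p["bbox"]}
            for p in places[:per_section]
        ]
        sections.append({"heading": "Places", "items": items})
    return sections
```

Tagging each item with an action lets the front end decide what a click does: a normal text search for document suggestions, a bounding box search for places.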

What does a place name coming up in the autocomplete indicate? Does it indicate that we have results that tie to that bounding box? Or does it indicate just that these are possible place names in general?

I think it would just be simpler to do the latter. It's entirely possible that the user enters "New Y...", gets a list of suggestions, and clicks on the one for "New York (city)", but then that search has zero results, because we don't hold any data that falls within the bounding box for NYC. Doing the former seems maybe possible, but hard?

For the dynamic spatial search - when we figure out how to get place names - are we trying to match the action of clicking on a place result to a bounding box query? E.g. The user types in "New Y". Our suggestion comes back with "New York". Clicking on New York will then show results with bounding box related to New York. Does this also do a textual search or is it only a bounding box search?

Yeah, this is the idea. It would not do a textual search, because:

  • we want the search to return any items that fall within the geographic boundary of NYC, including those that may not have "new york" anywhere in their textual data (this is the point of this feature; otherwise these results would be missed)
  • we don't want the search to return all things that have "new york" in their textual data/metadata, because that might result in returning things that apply to the entire state, or even things that were just published in or mention new york
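In Solr terms, a spatial-only search along these lines could be a match-all query plus a spatial filter. This is a sketch: the `locn_geometry` field name and the choice of the `IsWithin` predicate are assumptions about the index, while the `ENVELOPE(minX, maxX, maxY, minY)` argument order follows Solr's spatial syntax.

```python
def spatial_search_params(bbox, field="locn_geometry", predicate="IsWithin"):
    """Build Solr params for a bounding box search with no text query.

    `bbox` is (west, south, east, north) in degrees.
    """
    west, south, east, north = bbox
    # Solr's ENVELOPE takes minX, maxX, maxY, minY, in that order.
    envelope = f"ENVELOPE({west}, {east}, {north}, {south})"
    return {
        "q": "*:*",  # no textual constraint, per the reasoning above
        "fq": f'{field}:"{predicate}({envelope})"',
    }
```

For example, `spatial_search_params((-74.26, 40.48, -73.7, 40.92))` produces a filter restricted to items inside that rough NYC box, regardless of whether "new york" appears anywhere in their metadata.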

Projects
Status: Ready
Development

No branches or pull requests

3 participants