Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow multiple keywords in crate search #1543

Merged
merged 5 commits into from Jan 4, 2020

Conversation

@pwoolcoc
Copy link
Contributor

pwoolcoc commented Oct 28, 2018

This PR allows URLs like crates.io/search?all_keywords=gotham+middleware to enable linking to a search that returns crates that have all keywords specified in the parameter.

Towards #1461

Copy link
Member

jtgeibel left a comment

Thanks for the PR @pwoolcoc! It looks like you need to run cargo +stable fmt. Also, your collecting-hashmap crate includes edition = "2018" which is failing on CI in stable. We deploy with stable, so if you're not using any edition features you'll want to remove that for the next 6 weeks.

query = query.filter(
crates::id.eq_any(
crates_keywords::table
.select(crates_keywords::crate_id)
.inner_join(keywords::table)
.filter(::lower(keywords::keyword).eq(::lower(kw))),
.filter(::lower(keywords::keyword).eq(any(names))),

This comment has been minimized.

Copy link
@jtgeibel

jtgeibel Oct 29, 2018

Member

In #1461 it sounds like the intention is to have the search return only results that include all keywords. I think that means any would be replaced with all here.

This comment has been minimized.

Copy link
@sgrif

sgrif Oct 30, 2018

Contributor

I actually would have expected searching for multiple keywords to be any, not all -- which maybe raises the question of what is the general expectation here, if this should have another name

This comment has been minimized.

Copy link
@pwoolcoc

pwoolcoc Oct 30, 2018

Author Contributor

I asked the OP and they said they expected it to be the intersect operation, not the union...but for some reason changing any to all doesn't "just work" like I was hoping it would so I'm still investigating.

I agree that my intuition was that sending multiple keywords to the search endpoint would be an any operation, so I'm open to suggestions about where to put this...probably in src/controllers/keyword.rs? maybe a new endpoint there?

This comment has been minimized.

Copy link
@jtgeibel

jtgeibel Oct 31, 2018

Member

I'm split on this. I think my initial intuition was any, but that doesn't fit with OP's use case which I think is also reasonable. I guess when I search, I expect results that match all my search terms (although lately Google sometimes seems insistent that I really don't want to include a particular term).

I think its also worth considering how this looks in the URL. Currently, this uses multiple keyword parameters. Maybe a space separated list under a single keyword parameter would be slightly better. Or a new keywords or all_keywords parameter.

So I guess calling all doesn't work directly here because of the join, since a crates_keywords row can't possibly match multiple keywords at the same time. @sgrif how would this be expressed in Diesel if we want only crates that match all keywords? I've seen some SQL examples that would involve joining, grouping by crate, and then HAVING COUNT (DISTINCT crate.keyword_id) = {keyword_count}. That seems a bit... complicated, but I can't think of anything simpler.

This comment has been minimized.

Copy link
@sgrif

sgrif Oct 31, 2018

Contributor

I wouldn't use a having clause for this in PG personally. I'd do something like

.filter(
    crate_keywords::table
        .inner_join(keywords::table)
        .filter(crate_keywords::crate_id.eq(crates::id))
        .select(array_agg(keywords::keyword))
        .single_value()
        .contains(names)
)

This comment has been minimized.

Copy link
@pwoolcoc

pwoolcoc Nov 1, 2018

Author Contributor

So, I know what array_agg does in postgres, but there doesn't seem to be a way to use it through diesel. Should I just drop down to sql_query for that? Or is array_agg in another crate that I just haven't been able to find?

This comment has been minimized.

Copy link
@sgrif

sgrif Nov 2, 2018

Contributor

array_agg is an aggregate function that returns all values it collects as an array. You can declare any SQL function for use with Diesel's query builder by using sql_function!

@carols10cents

This comment has been minimized.

Copy link
Member

carols10cents commented Apr 7, 2019

I've been slowly working on moving this pull request forward by changing the query to return crates that have all the keywords specified, rather than any. So far all I've been successful at accomplishing is merging in master to resolve the merge conflicts and adding a failing test for the ALL functionality.

I've been attempting to turn the pseudocode sketch in this comment into a working diesel query, but I have been unsuccessful. I have a runnable reproduction of the issue extracted into a smaller codebase showing the set of diesel queries I've tried along with notes on why they didn't work in this repo.

I would drop into raw sql, but I don't want to lose the automatic escaping of the queried keywords to prevent against SQLI. I don't know how to generate just an escaped SQL string and insert it into a raw SQL query.

I'm now at a complete loss of how to proceed, and I think this PR is going to need your help to unblock, @sgrif.

@sgrif sgrif force-pushed the pwoolcoc:allow-multiple-keywords-in-search branch from 316002d to d86c231 Jun 14, 2019
@sgrif

This comment has been minimized.

Copy link
Contributor

sgrif commented Jun 14, 2019

I've rebased and added a working query. I do still have concerns that this isn't necessarily the expected behavior when passing multiple keyword params. I agree with @jtgeibel that we may want to introduce a new param for this.

@sgrif sgrif added S-waiting-on-author and removed S-blocked labels Jun 14, 2019
@sgrif sgrif removed their assignment Jun 14, 2019
@bors

This comment has been minimized.

Copy link
Contributor

bors commented Jul 16, 2019

☔️ The latest upstream changes (presumably #1763) made this pull request unmergeable. Please resolve the merge conflicts.

@locks

This comment has been minimized.

Copy link
Contributor

locks commented Sep 4, 2019

What's the status on this effort?

@carols10cents

This comment has been minimized.

Copy link
Member

carols10cents commented Oct 21, 2019

The backend API requests are working, but there is no way to interact with this via the frontend. I'm going to make a param named all_keywords so that at least we can manually create links that use this, but eventually I think we're going to need an advanced search form.

@carols10cents carols10cents self-assigned this Oct 21, 2019
@carols10cents carols10cents force-pushed the pwoolcoc:allow-multiple-keywords-in-search branch from d86c231 to 77ae1b8 Oct 21, 2019
@carols10cents

This comment has been minimized.

Copy link
Member

carols10cents commented Oct 21, 2019

r? @jtgeibel because both sgrif and i have now contributed code to this PR

pwoolcoc and others added 5 commits Oct 28, 2018
This commit allows multiple `keyword=` parameters in the querystring to
enable searching for all crates under the specified keywords

Towards #1461
Diesel has a bug where array expression methods aren't implemented on
nullable expressions, so we can't just call Diesel's `.contains` method
until the fix lands. I've added a workaround for the time being. This
results in a runtime error since `keyword` was defined as `varchar`,
which coerces to `text`, but arrays do not. I've changed the column type
to address this. The keywords table is small enough and read
infrequently enough that I'm not concerned about the exclusive lock in
this migration.
Pass along any specified value for the all_keywords parameter to the
API request to search crates.

This lets us manually construct URLs in the browser like:

- /search?all_keywords=foo+bar
- /search?all_keywords=foo+bar&q=test

but there is no form field in the search form as yet; eventually, we'll
need an "advanced search" form or similar.
@carols10cents carols10cents force-pushed the pwoolcoc:allow-multiple-keywords-in-search branch from 77ae1b8 to 9a3485b Nov 2, 2019
@kzys

This comment has been minimized.

Copy link
Contributor

kzys commented Nov 19, 2019

@jtgeibel - Can you review this PR? It doesn't have any conflicts (so far) and looks good to me.

@jtgeibel

This comment has been minimized.

Copy link
Member

jtgeibel commented Jan 4, 2020

LGTM! Sorry it took so long for me to review. Amazingly, no merge conflicts so...

@bors r+

@bors

This comment has been minimized.

Copy link
Contributor

bors commented Jan 4, 2020

📌 Commit 9a3485b has been approved by jtgeibel

@bors

This comment has been minimized.

Copy link
Contributor

bors commented Jan 4, 2020

⌛️ Testing commit 9a3485b with merge 20a589e...

bors added a commit that referenced this pull request Jan 4, 2020
…tgeibel

Allow multiple keywords in crate search

This PR allows URLs like crates.io/search?all_keywords=gotham+middleware to enable linking to a search that returns crates that have all keywords specified in the parameter.

Towards #1461
@bors

This comment has been minimized.

Copy link
Contributor

bors commented Jan 4, 2020

☀️ Test successful - checks-travis
Approved by: jtgeibel
Pushing 20a589e to master...

@bors bors merged commit 9a3485b into rust-lang:master Jan 4, 2020
3 checks passed
3 checks passed
Travis CI - Pull Request Build Passed
Details
homu Test successful
Details
percy/crates.io Visual review automatically approved, no visual changes found.
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants
You can’t perform that action at this time.