Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve textSearch_profiles result #3134

Merged
merged 15 commits into from Aug 14, 2018

Conversation

Projects
None yet
4 participants
@milaaraujo
Copy link
Collaborator

milaaraujo commented Jul 24, 2018

Hey everyone!

We are opening this request to fix the issue #3069.

  1. We created a new endpoint recentprofiles that returns the profiles with most recent activity.
  2. We fixed recentPeople method. In one of our latest PR (#3045) we did the following change that was not correct. Because in this part of the code the limit is for the node search. So we undid this change and added a count to control the number of profiles returned (line 290).

nodes = Node.all.order("changed DESC").limit(100).distinct -> nodes = Node.all.order("changed DESC").limit(srchString).distinct

  1. We don't think the name of the recentPeople method is appropriate for the peopleLocations endpoint, since the method returns the location of people with most recent activity with a specific tag. Can we change it to peopleLocations? So maybe we can use recentPeople for our new method.

We would like some feedback about our strategy here, thank you! :)

fixes #3069
@publiclab/reviewers @publiclab/soc @jywarren

@milaaraujo

This comment has been minimized.

Copy link
Collaborator Author

milaaraujo commented Jul 24, 2018

Moved from #3126

Hey @jywarren, I implemented a new version of profiles endpoint following your idea.

If we call /api/srch/profiles?srchString=Jeff the api perform the default search.
If we call /api/srch/profiles?srchString=Jeff&order=recentdesc the api returns more recently updated profiles for matching text.
That way I do not take the risk of breaking anything that already uses the endpoint. I'm going to open a new PR with these changes.

@plotsbot

This comment has been minimized.

Copy link
Collaborator

plotsbot commented Jul 24, 2018

1 Warning
⚠️ New migrations added. Please update schema.rb.example by overwriting it with a copy of the up-to-date db/schema.rb. Also, be aware to preserve the MySQL-specific conditions for full-text indices.
1 Message
📖 @milaaraujo Thank you for your pull request! I’m here to help with some tips and recommendations. Please take a look at the list provided and help us review and accept your contribution! And don’t be discouraged if you see errors – we’re here to help.

Generated by 🚫 Danger

@milaaraujo

This comment has been minimized.

Copy link
Collaborator Author

milaaraujo commented Jul 25, 2018

Hey @jywarren, regarding your question from the other PR (# 3126):

Have you tried requesting both and comparing their responses?

I created a silly test to illustrate the difference between the two ways of using the profiles endpoint.

  1. First, I created users, in this order: user5, user4, user1, user2 and user3.
  2. Then I created an activity (question) for users, in this order: user5, user4 and user3.
  3. If I call http://localhost:3000/api/srch/profiles?srchString=test, then:

{
"items": [
{
"docId": 0,
"docType": "user",
"docUrl": "/profile/test3",
"docTitle": "test3",
"docSummary": "",
"docScore": 0
},
{
"docId": 0,
"docType": "user",
"docUrl": "/profile/test2",
"docTitle": "test2",
"docSummary": "",
"docScore": 0
},
{
"docId": 0,
"docType": "user",
"docUrl": "/profile/test1",
"docTitle": "test1",
"docSummary": "",
"docScore": 0
},
{
"docId": 0,
"docType": "user",
"docUrl": "/profile/test4",
"docTitle": "test4",
"docSummary": "",
"docScore": 0
},
{
"docId": 0,
"docType": "user",
"docUrl": "/profile/test5",
"docTitle": "test5",
"docSummary": "",
"docScore": 0
}
],
"srchParams": {
"srchString": "test",
"seq": null,
"showCount": null,
"pageNum": null,
"tagName": null
}
}

  1. If I call http://localhost:3000/api/srch/profiles?srchString=test&order=recentdesc, then:

{
"items": [
{
"docId": 0,
"docType": "user",
"docUrl": "/profile/test3",
"docTitle": "test3",
"docSummary": "",
"docScore": 0
},
{
"docId": 0,
"docType": "user",
"docUrl": "/profile/test4",
"docTitle": "test4",
"docSummary": "",
"docScore": 0
},
{
"docId": 0,
"docType": "user",
"docUrl": "/profile/test5",
"docTitle": "test5",
"docSummary": "",
"docScore": 0
}
],
"srchParams": {
"srchString": "test",
"seq": null,
"showCount": null,
"pageNum": null,
"tagName": null
}
}

@milaaraujo milaaraujo force-pushed the milaaraujo:profiles-endpoint-most-recently-people branch 3 times, most recently from 3970a28 to 6c6c84a Jul 25, 2018

@jywarren

This comment has been minimized.

Copy link
Contributor

jywarren commented Jul 25, 2018

Looks like a good moment to address this optimization issue too! #3147

def getRecentProfiles
sresult = DocList.new

nodes = Node.all.order("changed DESC").limit(100).distinct

This comment has been minimized.

@jywarren

jywarren Jul 25, 2018

Contributor

I'm a little worried about the speed of this query. Could we collect up the uids, then run:

User.find_by(uid: uids.uniq).where(status: 1)

Then, we may also want to get recent edits, so what if we instead of doing Node, we searched Revision - that way we get both notes and wiki revisions?

users = SrchScope.find_users(srchString, limit = nil)
users =
if order == "recentdesc"
filterMatchingUserName(getRecentProfiles, srchString)

This comment has been minimized.

@jywarren

jywarren Jul 25, 2018

Contributor

Here, I worry we might not get any results if we filter after selecting the most recent 100. Is there another way to do this, like finding matching usernames first, sorting by last activity using the user table -- i think there's a last activity column, but not sure when we update that...

This comment has been minimized.

@jywarren

jywarren Jul 25, 2018

Contributor

It might take some kind of clever join and group query, like, joining revisions to the node table, and sorting by the revision timestamp...

@jywarren

This comment has been minimized.

Copy link
Contributor

jywarren commented Jul 25, 2018

Hi, sorry, we've created a conflict while fixing #3148 - but it should be pretty easy to resolve. Thanks!

@stefannibrasil

This comment has been minimized.

Copy link
Collaborator

stefannibrasil commented Jul 25, 2018

@jywarren agree with your comments! we will talk with our coaches today to help us improve the query, will keep you updated!

@stefannibrasil stefannibrasil force-pushed the milaaraujo:profiles-endpoint-most-recently-people branch 2 times, most recently from c01cae5 to 45ccc31 Jul 26, 2018

@stefannibrasil stefannibrasil changed the title New version of 'profiles' endpoint and fixing 'recentPeople' endpoint [WIP] version of 'profiles' endpoint and fixing 'recentPeople' endpoint Jul 26, 2018

@stefannibrasil

This comment has been minimized.

Copy link
Collaborator

stefannibrasil commented Jul 26, 2018

@jywarren we started doing the sorting and ordering today, still need to work on some things, but we believe we are targeting your concerns on them.

We are passing the a order_by and sort_direction as a parameter to the search/profiles? so we can use it on them.

The approach that our coaches are suggesting us to so is searching the profiles, joining them with nodes and then ordering them by the latest activities (ASC or DESC).

@wafflebot wafflebot bot removed the in progress label Jul 26, 2018

@wafflebot wafflebot bot added the in progress label Jul 26, 2018

@stefannibrasil stefannibrasil force-pushed the milaaraujo:profiles-endpoint-most-recently-people branch from 45ccc31 to 90603f5 Jul 27, 2018

@stefannibrasil

This comment has been minimized.

Copy link
Collaborator

stefannibrasil commented Jul 27, 2018

hi @jywarren could you review here, please? We worked today on improving the textSearch_profiles query. We have some questions, could you please take a look at the comments along the code?

We will work on the recentProfiles later after we finish this one. Also tomorrow we will write a test.

@jywarren

This comment has been minimized.

Copy link
Contributor

jywarren commented Jul 27, 2018

Great! First, looks like a missing or extra "end" in this file?

SyntaxError: /app/app/services/search_service.rb:251: syntax error, unexpected keyword_end

Reading now...


if search_criteria.order_by == "recent"
user_scope = user_scope.joins(:revisions)\
.order("timestamp #{search_criteria.sort_direction}")\

This comment has been minimized.

@jywarren

jywarren Jul 27, 2018

Contributor

I guess i have a couple thoughts here. In general, I think it's best not to be adding activeRecord filters or conditions to an object that's had them added at an earlier level of abstraction (like, within SrchScope.find_users), because you can't see them when you're editing, so there's a strong possibility of doing things redundantly or in conflict. Better to do all your ActiveRecord stuff in one place as much as possible.

Second, I wonder if there's a way to reduce the # of levels of abstraction. Here it seems like we do:

  1. A call arrives at app/api/srch/search.rb for get :profiles do
  2. its sent to app/services/execute_search.rb for execute()
  3. that's sent to app/services/search_service.rb for textSearch_profiles()
  4. that's sent to app/services/srch_scope.rb for SrchScope.find_users()
  5. finally an ActiveRecord call is made: User.search()...

Each file is now much more readable, but tracing the whole thing is a bit harder. The readability improvements are super, and the files are cleaner and less redundant. Great work! Now, I wonder if we could try flattening the system without losing those advantages? One way to try (and we must admit it's possible every layer is important! But let's try.) is to list why each layer is needed.

  1. where incoming calls get routed
  2. utility 'execute' function
  3. back-end standard service available inside the app
  4. scope?
  5. do the ActiveRecord call

I'm not saying there is an obvious or easy answer here. And if we stick with 5 layers, that's fine! But this is how I'd recommend we look at the question of "what complexity is necessary" and how readable it is. Make sense? What do you think?

To your narrower question of the optimization, I think it makes sense that you're doing an initial call for users, then making a more complex version of it by joining revisions and then sorting using that new joined table. This looks great! Because the incoming query is already going to be pretty efficient, we think, and then we're just joining something and adding an ordering. How do you like the results it returns?

@stefannibrasil stefannibrasil force-pushed the milaaraujo:profiles-endpoint-most-recently-people branch 2 times, most recently from 04a9b81 to d210f2d Jul 28, 2018

@stefannibrasil stefannibrasil referenced this pull request Jul 29, 2018

Merged

Link addition to email for marking comment as spam #3163

5 of 5 tasks complete
@jywarren

This comment has been minimized.

Copy link
Contributor

jywarren commented Aug 9, 2018

The only difference is that the Typeahead returns results with different info(tagId instead of docId, which I still don't understand why)

As to this, I think it's not a meaningful difference, although originally i think the idea was that it would return some representation of a tag or topic, while docId would represent a page on the site. I think we should aim consolidate them to remove ambiguity...

@jywarren jywarren referenced this pull request Aug 9, 2018

Closed

Profile api test PR #3224

@jywarren

This comment has been minimized.

Copy link
Contributor

jywarren commented Aug 9, 2018

I'm going to open a new PR to get Travis to run against the with-bio version of User.search...

@stefannibrasil

This comment has been minimized.

Copy link
Collaborator

stefannibrasil commented Aug 9, 2018

on travis it works only with bio... so you need to pass a username as the param to see the errors. For example, if you pass "jeff" it returns nil, but if you pass "something interesting about jeff" it returns something. At least that's how it was before

@stefannibrasil

This comment has been minimized.

Copy link
Collaborator

stefannibrasil commented Aug 9, 2018

okay, about the meeting, tomorrow we can all attend at 5:00 pm EET, so can we confirm your -virtual- presence? :)

@jywarren

This comment has been minimized.

Copy link
Contributor

jywarren commented Aug 9, 2018

@jywarren

This comment has been minimized.

Copy link
Contributor

jywarren commented Aug 9, 2018

I think i have an idea about searching for "jeff"... hang on...did it happen only for "jeff" or also for "steff"?

Do you think you could try to reproduce the Travis error in a separate PR? Or was it exactly the error we're seeing here: #3224 ?

@jywarren

This comment has been minimized.

Copy link
Contributor

jywarren commented Aug 9, 2018

And was the error you saw in test/unit/user_test.rb or in /test/functional/search_api_test.rb#L95 like in my PR?

@stefannibrasil

This comment has been minimized.

Copy link
Collaborator

stefannibrasil commented Aug 9, 2018

yes, it's Eastern European Time! So, tomorrow 5:30 EET. See you there!

@jywarren

This comment has been minimized.

Copy link
Contributor

jywarren commented Aug 9, 2018

@stefannibrasil stefannibrasil force-pushed the milaaraujo:profiles-endpoint-most-recently-people branch 4 times, most recently from d42a395 to 10441b4 Aug 10, 2018

@stefannibrasil stefannibrasil force-pushed the milaaraujo:profiles-endpoint-most-recently-people branch from 10441b4 to 42c5e32 Aug 11, 2018

@jywarren

This comment has been minimized.

Copy link
Contributor

jywarren commented Aug 13, 2018

OK, I modified the test PR at #3224 to build on your latest commits from this PR, and see what the issue was with the nil results. But I see you've modified to go back to user.search searching bio and username... did that work for you? It's passing Travis, and we can do the "switch to only username searching via parameter" variant in a separate PR if we want -- let's just get this PR merged!

Can you give me an update and I am happy to check in on it once I land.. it'll be sometime tomorrow ✈️

If it's ready to go, I approve and can merge it!

@stefannibrasil

This comment has been minimized.

Copy link
Collaborator

stefannibrasil commented Aug 14, 2018

Hi, @jywarren thanks for the approval and for the help! Saturday we were able to make it work searching for username and bio!! You can try now with our last commit if you want to check it out too! We don't know exactly where things started going wrong. I guess the errors that Travis was showing confused us and we missed something small and everything became a snowball of errors...

Yeah, I agree that it's better to create a new PR for the autocomplete feature, will start now!

So... I can't believe I'm saying this, LOL, but yes this can be merged!! 🎉 thank you!!

@@ -42,7 +43,7 @@ class User < ActiveRecord::Base
after_destroy :destroy_drupal_user

def self.search(query)
User.where('MATCH(username, bio) AGAINST(?)', query)
User.where('MATCH(bio, username) AGAINST(? IN BOOLEAN MODE)', query + '*')

This comment has been minimized.

@stefannibrasil

stefannibrasil Aug 14, 2018

Collaborator

@jywarren gold lesson from this PR about the MySQL fulltext search: the 'IN BOOLEAN MODE' modifier with the * wildcard.

https://dev.mysql.com/doc/refman/8.0/en/fulltext-boolean.html

really glad that this is working now, thanks for the help!

@stefannibrasil stefannibrasil added ready and removed in progress labels Aug 14, 2018

@jywarren jywarren merged commit 17827d1 into publiclab:master Aug 14, 2018

3 checks passed

codeclimate 2 fixed issues
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
danger/danger ⚠️ 1 Warning. Don't worry, everything is fixable.
Details

@wafflebot wafflebot bot removed the ready label Aug 14, 2018

@jywarren

This comment has been minimized.

Copy link
Contributor

jywarren commented Aug 14, 2018

Congratulations!!! 👍🏼⚡⚡

@milaaraujo milaaraujo deleted the milaaraujo:profiles-endpoint-most-recently-people branch Oct 31, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.