Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wiki index ordering -- with optimization difficulties #259

Closed
jywarren opened this issue Mar 27, 2015 · 22 comments
Closed

wiki index ordering -- with optimization difficulties #259

jywarren opened this issue Mar 27, 2015 · 22 comments
Labels
bug the issue is regarding one of our programs which faces problems when a certain task is executed help wanted requires help by anyone willing to contribute

Comments

@jywarren
Copy link
Member

jywarren commented Mar 27, 2015

Update: see this comment below for the latest -- subtle ordering vs. optimization issue still persists. We could/should create a functional test to reproduce the ordering issue.

I'm trying to nail that bug where on the http://publiclab.org/wiki listing page, some wiki pages show up as recent but don't actually have a recent edit. The crux is that we sort these wiki pages by node.changed -- a timestamp which gets updated any time a wiki page is saved. So, starring/liking a page bumps it to the top

Node.includes(:drupal_node_revision,:drupal_node_counter).where("status = 1").order("changed DESC")

Since sometimes pages are saved without a new revision being made (like starring/liking), when we display the edit date of the last revision, it shows up as being quite old, actually:

Last edit / 7 months ago by Shannon

Can anyone help me craft a query that sorts DrupalNodes by the timestamp of their most recent associated node_revision, rather than by their own timestamp? Note that nodes have multiple node_revisions, linked by the nid.


Still not making much progress on this; the resulting query is now:

SELECT DISTINCT `node`.nid FROM `node` LEFT OUTER JOIN `node_revisions` ON `node_revisions`.`nid` = `node`.`nid` ORDER BY node_revisions.timestamp DESC

but somehow I'm not getting all recent revisions. Timestamps (of assoc. revisions) are:

 => [1427465867, 1427218775, 1426896777, 1426896105, 1426895691, 1426895121, 1426894917, 1426894230, 1426894036, 1426893636] 

I can confirm that

SELECT `node`.* FROM `node` WHERE `node`.`nid` = 10229 LIMIT 1
SELECT `node_revisions`.* FROM `node_revisions` WHERE `node_revisions`.`nid` IN (10229) ORDER BY timestamp

 => 1427412864 

Which means node 10299 should appear in results from the top query, but it doesn't. I think I'm missing something obvious here.

My best guess is that since each node has multiple node_revisions, we are not ensuring that the revision joined is the most recent one, although we are sorting the resulting list by recency (timestamp) once the join has been made.

Is there a way to sort before the join happens?

I'm going to abandon this for now; An alternate solution could be to display either the last edited date, or the last liked/starred date, whichever is more recent, in the list view.

@jywarren jywarren added the bug the issue is regarding one of our programs which faces problems when a certain task is executed label Mar 27, 2015
@Sreyanth
Copy link
Member

Solved in the latest pull request.

@Sreyanth
Copy link
Member

I see from your latest commit that you did the same first and changed it. But the same code works fine for me. And the query seems to be correct too. Can you check this again?

@jywarren
Copy link
Member Author

I'm going to just pull in yours and take a look on the live site; I may have misunderstood the cause of this bug or there may be two overlapping bugs. But your PR has other things I'd like to merge so thanks!

@jywarren
Copy link
Member Author

Hi, Sreyanth - I still see some problems. For example, this page has a revision from 4 days ago; same time as I edited the "Australia" page: http://publiclab.org/wiki/ottawa

But it is not listed. Any idea why? Other pages with recent edits are also not listed.

@Sreyanth
Copy link
Member

I have setup a test server at http://46.101.38.163:3000/wiki

And the latest code changes work perfectly fine here.

Can you post the relevant database entries for the Ottawa wiki entry? May
be the bug is not where we are looking at?

@jywarren
Copy link
Member Author

jywarren commented Mar 30, 2015

https://gist.github.com/jywarren/b4c9286b2e104caeac6f

so it looks like the most recent rev should have the same timestamp as the
parent DrupalNode, but somehow it's not being listed; maybe it's using the
older revision for some reason?

@Sreyanth
Copy link
Member

Nope, I tested it on my localhost and I am sure our code uses the recent
time stamp itself.

Can we get the SQL query generated from the production log?

@jywarren
Copy link
Member Author

jywarren commented Mar 30, 2015

@jywarren
Copy link
Member Author

jywarren commented Mar 30, 2015

That's a simplified query but very close to the original.

@Sreyanth
Copy link
Member

May be the 'LIMIT 30' is causing the problem? I think the consideration set
is limited to 30 and somehow Ottawa is missing in it?

@Sreyanth
Copy link
Member

But no. The SQL query seems to be completely legit.

@jywarren
Copy link
Member Author

jywarren commented Mar 30, 2015

Well, i first noticed this when Ottawa had been listed in the top 10 when
sorted by node.changed before switching to revision.timestamp, and now
Ottawa would not appear in the top 10. This page should be next to it based
on last edits: http://publiclab.org/wiki/australia

However, now you can also see that in http://publiclab.org/wiki it looks as
if there has been only 1 edit since 3 days ago; that can't be true!

Hmm, there is another sneaky bug. Why would it not see more recent edits?

@Sreyanth
Copy link
Member

I guess there is some bug in the way we are writing our query? I am assuming you restarted passenger after the new merge.

If you look at http://publiclab.org/wiki?page=9 and http://publiclab.org/wiki?page=10, the listings are in desc order, but aren't continuous from page 9 to page 10.

While in the gist https://gist.github.com/jywarren/072595dd799ff3373da0, the first query seems legit, there is something fishy about the second one, it seems to select the recent x wikis and sort in desc of their timestamps. Don't you think?

Sreyanth added a commit to Sreyanth/plots2 that referenced this issue Mar 30, 2015
@Sreyanth
Copy link
Member

Can you check the latest pull request Jeff?

@Sreyanth
Copy link
Member

Successfully reproduced the bug and the bug seems to be ironed out. 👍

@jywarren
Copy link
Member Author

+1 :-) in #262

@jywarren
Copy link
Member Author

Reopening; The newer query uses find() and then chains paginate() which doesn't insert the LIMIT 30 and OFFSET into the original query! This increased typical query time from ~150ms to ~900ms, and may account for much increased memory use since April 30.

See 2 queries and time difference: https://gist.github.com/jywarren/6545430f4c2e8d12ce18

But why did reverting to the older-style find() syntax fix the wiki ordering bug we were seeing? The queries look otherwise identical.

I was able to replace find with paginate, without yet using the newer where() syntax. This re-introduces the ordering issue, but does not skip LIMIT and OFFSET, so the query is fast again.

@jywarren jywarren reopened this May 12, 2015
@jywarren jywarren added the help wanted requires help by anyone willing to contribute label Nov 15, 2016
@jywarren jywarren changed the title wiki index ordering wiki index ordering -- with optimization difficulties Dec 28, 2016
jywarren added a commit that referenced this issue Jan 12, 2018
* initial attempt at mysql fulltext search for nodes

* tweak

* major rails 4.x update of schema.rb, added fulltext index

* Update node.rb

* Update node_test.rb

* rename

* self.

* add title to fulltext

* plugged into typeahead

* rev title fulltext index

* fixed fulltext multi index

* updated schema example

* fix

* tweaked working fulltext in console

* removing solr test apparatus

* full text comment and user bio searching

* more conditions

* fixes

* unique results by grouping by nid

* Remove Solr stuff and add secret_token env var (#259)

* Remove Solr stuff and add secret_token env var

* Upgrade and switch database engine to Mariadb.
@grvsachdeva
Copy link
Member

@jywarren is this issue solved?

@jywarren
Copy link
Member Author

Sadly i think not! See how Simple Air Sensor is listed out of order?

image

@grvsachdeva
Copy link
Member

@jywarren I think we should open a new issue for this as it requires attention? What do you think?

@jywarren
Copy link
Member Author

jywarren commented Mar 30, 2019 via email

@grvsachdeva
Copy link
Member

Ok, closing this one as I have reopened the issue at #5317. Thanks!

SrinandanPai pushed a commit to SrinandanPai/plots2 that referenced this issue May 5, 2019
* initial attempt at mysql fulltext search for nodes

* tweak

* major rails 4.x update of schema.rb, added fulltext index

* Update node.rb

* Update node_test.rb

* rename

* self.

* add title to fulltext

* plugged into typeahead

* rev title fulltext index

* fixed fulltext multi index

* updated schema example

* fix

* tweaked working fulltext in console

* removing solr test apparatus

* full text comment and user bio searching

* more conditions

* fixes

* unique results by grouping by nid

* Remove Solr stuff and add secret_token env var (publiclab#259)

* Remove Solr stuff and add secret_token env var

* Upgrade and switch database engine to Mariadb.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug the issue is regarding one of our programs which faces problems when a certain task is executed help wanted requires help by anyone willing to contribute
Projects
No open projects
Development

No branches or pull requests

3 participants