New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Chunk the tokens array to avoid making extremely long queries. #17390

Merged
merged 2 commits into from Dec 18, 2017

Conversation

Projects
None yet
6 participants
@okonomiyaki3000
Contributor

okonomiyaki3000 commented Aug 3, 2017

Pull Request for Issue # .

Summary of Changes

Chunk the tokens array (which can potentially be very long) before using it to generate a query. Run a query for each chunk instead of the whole array.

Testing Instructions

  1. Check that your php max_execution_time is something sensible like 30 seconds.
  2. Turn on finder indexing for content.
  3. Create a super long article. Maybe like 10,000 words. Maybe this can help: http://slipsum.com/
  4. Save the article.

Expected result

Well, I expect the article to save properly and the page to reload without crashing.

Actual result

Most likely, the article will save but in the post-save indexing process there will be a timeout. This happens because we are looping over a huge array of tokens and adding each one to the values array of the query object. The query object handles this rather inefficiently (should be addressed in a different PR) but is mostly fine until the values array gets very large.

One solution is to avoid getting a huge values array by sending several smaller queries instead of one big one. This is that solution.

Documentation Changes Required

nope

@mbabker

mbabker approved these changes Aug 3, 2017

@okonomiyaki3000 okonomiyaki3000 force-pushed the okonomiyaki3000:chunk-tokens branch from 007281a to bf13859 Oct 27, 2017

@franz-wohlkoenig

This comment has been minimized.

franz-wohlkoenig commented Nov 1, 2017

Status is set on "Needs Review".

@okonomiyaki3000 okonomiyaki3000 force-pushed the okonomiyaki3000:chunk-tokens branch from bf13859 to cb3f12c Nov 2, 2017

@okonomiyaki3000

This comment has been minimized.

Contributor

okonomiyaki3000 commented Nov 2, 2017

Rebased with latest staging. I can't imagine why that drone test is failing. There's nothing in this code that would cause it to fail.

@okonomiyaki3000

This comment has been minimized.

Contributor

okonomiyaki3000 commented Nov 8, 2017

Hmm. So, is this getting in? Because this is a legit problem that people in the real world may actually face.

@mbabker

This comment has been minimized.

Member

mbabker commented Nov 8, 2017

If it gets tested, sure. Looking at this again reminded me of something though.

There's already a similar (yet less elegant) implementation of this in the SQL Server subclass and I think it essentially is the same thing, so it looks like the overriding method can be removed.

@okonomiyaki3000

This comment has been minimized.

Contributor

okonomiyaki3000 commented Nov 9, 2017

Yeah, that's weird. Someone noticed this could happen with SQL Server but didn't think it would ever be a problem for other dbs? Anyway, I can remove that override and include it in this PR if you want.

@mbabker

This comment has been minimized.

Member

mbabker commented Nov 9, 2017

That was me. SQL Server actually has (had? this was SQL Server 2008) a hard query limit of inserting 1000 values in a query which is why I had to do the override for that driver.

@okonomiyaki3000 okonomiyaki3000 force-pushed the okonomiyaki3000:chunk-tokens branch from cb3f12c to a972a11 Nov 27, 2017

@okonomiyaki3000

This comment has been minimized.

Contributor

okonomiyaki3000 commented Nov 27, 2017

Rebased with latest staging and went ahead and removed that redundant override.

@okonomiyaki3000

This comment has been minimized.

Contributor

okonomiyaki3000 commented Dec 4, 2017

Just need one more approval...

@ggppdk

This comment has been minimized.

Contributor

ggppdk commented Dec 4, 2017

I have tested this item successfully on a972a11

Tested and did code review too,

i am using similar code for updating DB in a faster way,
also this is not only about updating DB when having a long article,
there should be a considerable performance gain for the indexer in big sites


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/17390.

@ggppdk ggppdk referenced this pull request Dec 4, 2017

Closed

[4.0] Refactoring com_search & com_finder #12664

1 of 6 tasks complete
@Quy

This comment has been minimized.

Contributor

Quy commented Dec 8, 2017

I have tested this item successfully on a972a11


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/17390.

@franz-wohlkoenig

This comment has been minimized.

franz-wohlkoenig commented Dec 9, 2017

Ready to Commit after two successful tests.

@joomla-cms-bot joomla-cms-bot added the RTC label Dec 9, 2017

@mbabker mbabker added this to the Joomla 3.8.4 milestone Dec 11, 2017

@mbabker mbabker merged commit f0abd89 into joomla:staging Dec 18, 2017

5 checks passed

JTracker/HumanTestResults Human Test Results: 2 Successful 0 Failed.
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/drone/pr the build was successful
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
hound No violations found. Woof!

@joomla-cms-bot joomla-cms-bot removed the RTC label Dec 18, 2017

@okonomiyaki3000 okonomiyaki3000 deleted the okonomiyaki3000:chunk-tokens branch Dec 18, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment