Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Chunk the tokens array to avoid making extremely long queries. #17390

Merged
merged 2 commits into from Dec 18, 2017

Conversation

okonomiyaki3000
Copy link
Contributor

@okonomiyaki3000 okonomiyaki3000 commented Aug 3, 2017

Pull Request for Issue # .

Summary of Changes

Chunk the tokens array (which can potentially be very long) before using it to generate a query. Run a query for each chunk instead of the whole array.

Testing Instructions

  1. Check that your php max_execution_time is something sensible like 30 seconds.
  2. Turn on finder indexing for content.
  3. Create a super long article. Maybe like 10,000 words. Maybe this can help: http://slipsum.com/
  4. Save the article.

Expected result

Well, I expect the article to save properly and the page to reload without crashing.

Actual result

Most likely, the article will save but in the post-save indexing process there will be a timeout. This happens because we are looping over a huge array of tokens and adding each one to the values array of the query object. The query object handles this rather inefficiently (should be addressed in a different PR) but is mostly fine until the values array gets very large.

One solution is to avoid getting a huge values array by sending several smaller queries instead of one big one. This is that solution.

Documentation Changes Required

nope

@ghost
Copy link

ghost commented Nov 1, 2017

Status is set on "Needs Review".

@okonomiyaki3000
Copy link
Contributor Author

Rebased with latest staging. I can't imagine why that drone test is failing. There's nothing in this code that would cause it to fail.

@okonomiyaki3000
Copy link
Contributor Author

Hmm. So, is this getting in? Because this is a legit problem that people in the real world may actually face.

@mbabker
Copy link
Contributor

mbabker commented Nov 8, 2017

If it gets tested, sure. Looking at this again reminded me of something though.

There's already a similar (yet less elegant) implementation of this in the SQL Server subclass and I think it essentially is the same thing, so it looks like the overriding method can be removed.

@okonomiyaki3000
Copy link
Contributor Author

Yeah, that's weird. Someone noticed this could happen with SQL Server but didn't think it would ever be a problem for other dbs? Anyway, I can remove that override and include it in this PR if you want.

@mbabker
Copy link
Contributor

mbabker commented Nov 9, 2017

That was me. SQL Server actually has (had? this was SQL Server 2008) a hard query limit of inserting 1000 values in a query which is why I had to do the override for that driver.

@okonomiyaki3000
Copy link
Contributor Author

Rebased with latest staging and went ahead and removed that redundant override.

@okonomiyaki3000
Copy link
Contributor Author

Just need one more approval...

@ggppdk
Copy link
Contributor

ggppdk commented Dec 4, 2017

I have tested this item ✅ successfully on a972a11

Tested and did code review too,

i am using similar code for updating DB in a faster way,
also this is not only about updating DB when having a long article,
there should be a considerable performance gain for the indexer in big sites


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/17390.

@Quy
Copy link
Contributor

Quy commented Dec 8, 2017

I have tested this item ✅ successfully on a972a11


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/17390.

@ghost
Copy link

ghost commented Dec 9, 2017

Ready to Commit after two successful tests.

@joomla-cms-bot joomla-cms-bot added the RTC This Pull Request is Ready To Commit label Dec 9, 2017
@mbabker mbabker added this to the Joomla 3.8.4 milestone Dec 11, 2017
@mbabker mbabker merged commit f0abd89 into joomla:staging Dec 18, 2017
@joomla-cms-bot joomla-cms-bot removed the RTC This Pull Request is Ready To Commit label Dec 18, 2017
@okonomiyaki3000 okonomiyaki3000 deleted the chunk-tokens branch December 18, 2017 08:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants