[3.9] Problems with indexing com_finder. #33126

Closed
rigin opened this issue Apr 13, 2021 · 14 comments

Comments

@rigin
Contributor

rigin commented Apr 13, 2021

Is your feature request related to a problem? Please describe.

  1. When indexing a large site on a slow server with the CLI script, the process takes several days, so for various reasons it sometimes has to be restarted before it finishes.

On restart, indexing does not continue from where it stopped. Instead, it starts re-indexing content that is already in the index, even though a significant part of the site has not been indexed yet.

  2. Also, saving content makes the site freeze because of indexing.
    If the Smart Search content plugin is disabled, saving works normally.

Describe the solution you'd like

  1. Change the order in which content is crawled during indexing: index content that has never been indexed first, then the rest in ascending order of when it was last indexed.
  2. If possible, add a separate setting that disables indexing on save while still allowing indexing via the CLI script, so content can be saved without triggering indexing while the CLI script runs in parallel.
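The crawl order proposed in item 1 amounts to a sort key. Below is a minimal Python sketch of that ordering, not com_finder's code: the `Item` class, its `last_indexed` field (None for content that was never indexed) and `crawl_order` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Item:
    # Hypothetical content item; not com_finder's actual data model.
    item_id: int
    last_indexed: Optional[datetime]  # None = never indexed


def crawl_order(items: List[Item]) -> List[Item]:
    """Never-indexed items first, then the rest by oldest last-indexed date."""
    return sorted(
        items,
        key=lambda it: (
            it.last_indexed is not None,      # False (never indexed) sorts before True
            it.last_indexed or datetime.min,  # oldest previous indexing first
            it.item_id,                       # stable tie-breaker
        ),
    )


items = [
    Item(1, datetime(2021, 4, 1)),
    Item(2, None),
    Item(3, datetime(2021, 3, 1)),
    Item(4, None),
]
ordered = [it.item_id for it in crawl_order(items)]  # [2, 4, 3, 1]
```

Because never-indexed items sort first, a run that is interrupted and restarted would reach the unindexed part of the site before revisiting content that is already in the index.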

Additional context

@brianteeman
Contributor

If the site is so large and on such a slow server that it takes several days, then no amount of optimisation will really help you.

Time to bite the bullet and change to a better server. Problem solved permanently in hours, saving you $$$ in time.

@rigin
Contributor Author

rigin commented Apr 13, 2021

I just love these smart guys who, in response to a specific question, start lecturing like an old grandmother... Kids these days... ))))
The server is at my house, behind the TV. It is weak because that is what I need: it uses less electricity.
And a large site taking many hours to index, even on a fast server, is typical. Indexing not finishing in a single run is also common.

@brianteeman
Contributor

There is only so much juice that you can get from a lemon before you need to get another lemon.

@ghost

ghost commented Apr 14, 2021

@rigin Can you add "[4]" to the title? New features go into Joomla 4, so it's easier to see in the issue view which version they belong to. Thanks.

rigin changed the title "Problems with indexing com_finder." to "[4] ...[3.9] Problems with indexing com_finder." on Apr 14, 2021
@richard67
Member

@Hackwar Is there anything we could do about this?

@Hackwar
Member

Hackwar commented Apr 18, 2021

@rigin, is this indeed an issue you notice on Joomla 4? I improved the indexing process quite a bit in 4.0 and it does index a lot faster. Could you provide a bit more information about your site so we can determine whether the indexing time is to be expected or not?

Generally, it is not exactly trivial to determine which parts haven't been indexed and which ones need updating. However, checking whether an item needs to be reindexed should be a rather quick operation: the indexer takes the result object that is prepared prior to indexing and creates a checksum over it. It then compares that checksum to the one stored in the database and only indexes the content when it has changed. If, after a restart, indexing that part still takes a long time, you might want to look into which plugins are executed during indexing (mainly to process the content triggers) and which of them take up a lot of time.
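The skip-if-unchanged check described above can be sketched roughly as follows. This is a hypothetical Python illustration, not the actual Joomla indexer: hashing a key-sorted JSON encoding with MD5, the `url` key, and the in-memory `stored` dict standing in for the database table are all assumptions.

```python
import hashlib
import json
from typing import Dict, List


def checksum(result: dict) -> str:
    # Assumption: MD5 over a canonical (key-sorted) JSON encoding of the
    # prepared result; the real indexer's serialization may differ.
    payload = json.dumps(result, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return hashlib.md5(payload).hexdigest()


def index_if_changed(result: dict, stored: Dict[str, str], indexed: List[str]) -> bool:
    """Index `result` only when its checksum differs from the stored one."""
    key = result["url"]  # hypothetical unique key identifying the item
    new_sum = checksum(result)
    if stored.get(key) == new_sum:
        return False  # unchanged since the last run: skip the expensive work
    indexed.append(key)  # placeholder for the real (slow) indexing step
    stored[key] = new_sum  # remember the checksum for the next run
    return True


# Usage: the second call is skipped because nothing changed.
stored_sums: Dict[str, str] = {}
done: List[str] = []
first = index_if_changed({"url": "index.php?id=1", "title": "A", "body": "x"}, stored_sums, done)
second = index_if_changed({"url": "index.php?id=1", "title": "A", "body": "x"}, stored_sums, done)
third = index_if_changed({"url": "index.php?id=1", "title": "A", "body": "y"}, stored_sums, done)
```

With a check like this, a restarted run still visits every item, but an item whose prepared result is unchanged costs only a hash and a lookup; the slow part is whatever happens before the checksum (preparing the result and running content plugins).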

In order to disable indexing on saving, you should just be able to disable the smart search content plugin. It should then still be possible to index via the CLI script.

I'm really torn on investing more work into this in 3.x, since the changes from 4.0 can't simply be backported for backwards-compatibility reasons. At the same time, it is more than just a bugfix and thus would have to go into a minor release... I would defer to @HLeithner on whether we want to make some improvements to speed this up in 3.x.

Generally, there are site-specific options to improve indexing speed, and you could also use a plugin I wrote that backports the changes from 4.0 to 3.9. If you want to go that route, please contact me privately and I'll try to help you. Just search for my name and you will find ways to contact me.

@rigin
Contributor Author

rigin commented Apr 19, 2021

I ran into this problem on Joomla 3.9, but as I understand it, com_finder has always behaved this way. ))
It's just that I hit it with the combination of a slow server and a lot of content to index.
@sandramay0905 advised adding the [4.0] tag to draw attention to the problem.

This problem occurred when indexing the site https://rigin.net/, which has approximately 1,500 articles. Hardware-wise, it runs on a weak office computer with a Gigabyte GA-D525TUD board.

I use the standard Joomla 3.9 cron script for indexing.

On the provider's server, the indexing process took about 12 hours, and on this configuration it takes about 2 weeks.

When I edit an article while indexing is running, indexing is interrupted. When the cron script is restarted, indexing can continue from where it was interrupted, but if the session was interrupted, indexing starts over in ascending order of article ID.

This can be seen in the admin panel at /administrator/index.php?option=com_finder&view=index

@Hackwar
Member

Hackwar commented Apr 19, 2021

So I'm running the improved code on a website with ~7,500 items, about 3,500 of which are articles. Indexing it takes about 30 minutes on a good server. Your site should definitely index in less than a day.

However, this is NOT a 4.0 problem.

@rigin
Contributor Author

rigin commented Apr 19, 2021

OK, I'll fix it now.

rigin changed the title "[4] ...[3.9] Problems with indexing com_finder." to "[3.9] Problems with indexing com_finder." on Apr 19, 2021
@ghost

ghost commented Apr 19, 2021

Quoting @Hackwar: "However, this is NOT a 4.0 problem."

Sorry @rigin, I thought it was a feature request.

@rigin
Contributor Author

rigin commented Apr 19, 2021

That's my terrible English... )))

@HLeithner
Member

If needed, we can backport improvements later (maybe after the J4 release), but at this point I would really like to get J4 into a releasable state.

@Hackwar
Member

Hackwar commented May 9, 2021

#33720 should improve indexing a little bit.

@chmst
Contributor

chmst commented Jan 20, 2022

Closing as there is a PR.

chmst closed this as completed on Jan 20, 2022

7 participants