Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rebuild-indexes cmd #7753

Open
wants to merge 13 commits into
base: master
from

Conversation

@guillep2k
Copy link
Member

commented Aug 5, 2019

So, this is the direction I'm taking to implement the rebuild-indexes cmd mentioned in #7733:

gitea [global options] rebuild-indexes {--repositories|--issues|--all}

Only the --repositories option is currently implemented. It's a little bit convoluted because the indexes are normally updated in background on the web server. I didn't want to cause any kind of concurrency problem, so I've decided for a private api route.

I figured a per-repository request would make sense for very big installations, as the requests will go very quickly for the first setting.Indexer.UpdateQueueLength (20) repos. After that it will have to wait in the queue to advance. All in all, there's a risk for the requests to timeout. I'd like your input on how to handle that.

Please feel free to criticize. If you guys like where I'm going, I'll implement the --issues option.

Guillermo Prandi added some commits Aug 5, 2019

@codecov-io

This comment has been minimized.

Copy link

commented Aug 6, 2019

Codecov Report

Merging #7753 into master will decrease coverage by 0.09%.
The diff coverage is 10.22%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master    #7753     +/-   ##
=========================================
- Coverage    41.4%   41.31%   -0.1%     
=========================================
  Files         474      477      +3     
  Lines       63780    63942    +162     
=========================================
+ Hits        26410    26417      +7     
- Misses      33933    34090    +157     
+ Partials     3437     3435      -2
Impacted Files Coverage Δ
modules/private/rebuild_indexes.go 0% <0%> (ø)
modules/indexer/repo.go 66.66% <0%> (-3.57%) ⬇️
modules/indexer/issues/db.go 54.16% <0%> (-4.93%) ⬇️
modules/indexer/issues/bleve.go 60% <0%> (-1.54%) ⬇️
routers/private/rebuild_indexes.go 0% <0%> (ø)
cmd/rebuild_indexes.go 0% <0%> (ø)
models/repo_indexer.go 62.5% <10.52%> (-4.04%) ⬇️
routers/private/internal.go 23.63% <100%> (+2.88%) ⬆️
modules/indexer/indexer.go 44.73% <100%> (-10.53%) ⬇️
modules/indexer/issues/indexer.go 49.05% <26%> (-9.96%) ⬇️
... and 6 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 98fd085...fe655e4. Read the comment docs.

@guillep2k guillep2k changed the title WIP: Rebuild-indexes cmd Rebuild-indexes cmd Aug 8, 2019

@guillep2k

This comment has been minimized.

Copy link
Member Author

commented Aug 8, 2019

I've implemented the issues index rebuild that was previously missing. The PR is now complete (not WIP) except requests by the managers, of course.
My thoughts: it's a lot of code for such a "simple" functionality. Perhaps it would have been better to make an offline command (i.e. one that requires that gitea is not running) that just deletes the index files (bevle) or records (db).

@lunny lunny removed the status/wip label Aug 8, 2019

guillep2k added some commits Aug 8, 2019

@silverwind

This comment has been minimized.

Copy link
Contributor

commented Aug 11, 2019

Edit: Nevermind, it seems to work, I was missing my GITEA_CUSTOM in the command invocation.

@silverwind

This comment has been minimized.

Copy link
Contributor

commented Aug 11, 2019

So, some feedback:

I get a lot of this while running against a pretty much idle server:

Server too busy; backing off...
Server too busy; backing off...
Server too busy; backing off...

Not sure exactly why's that but on a server hosting like 15 repos, it ran like 10 minutes til completion with a lot of those "too busy" in between which seems quite long. Maybe drop the per-repo thing and delete the index files and force-reinitialize bleve? If possible, it should only take a few seconds.

The fact that his has to run while gitea is running on another thread seems non-intuitive to me. In fact I was caught off guard above when I forgot to set GITEA_CUSTOM for the command invocation, so it did not find the config. How about we move that command to the admin web ui only?

Edit: Also, what's interesting is that when I run the rebuild command a second time, it now completes almost instantly, not sure why that is.

@guillep2k

This comment has been minimized.

Copy link
Member Author

commented Aug 11, 2019

So, some feedback:

I get a lot of this while running against a pretty much idle server:

Server too busy; backing off...
Server too busy; backing off...
Server too busy; backing off...

Not sure exactly why's that but on a server hosting like 15 repos, it ran like 10 minutes til completion with a lot of those "too busy" in between which seems quite long. Maybe drop the per-repo thing and delete the index files and force-reinitialize bleve? If possible, it should only take a few seconds.

The fact that his has to run while gitea is running on another thread seems non-intuitive to me. In fact I was caught off guard above when I forgot to set GITEA_CUSTOM for the command invocation, so it did not find the config. How about we move that command to the admin web ui only?

Edit: Also, what's interesting is that when I run the rebuild command a second time, it now completes almost instantly, not sure why that is.

@silverwind Please see my response to Lunny about this option being more future proof.

Perhaps it took different times because the indexes got in fact different content, like with/without the camel case tokenizer? Also, it may seem overkill to do it per repo if you have 15 very big ones, If you have 500 small ones may make more sense.

Anyway, based on you guys' feedback, I think I'll change the strategy:

  • Make the rebuild-indexes cmd global, meaning it will delete all indexes at once and it will require gitea to be down.
  • Postpone the "per-repository" capability for a future version, when it makes more sense.

If you guy agree, I'll drop this PR and start another.

@silverwind

This comment has been minimized.

Copy link
Contributor

commented Aug 12, 2019

require gitea to be down

If it cannot be done at runtime, I guess there's not much sense to have the command in first place, may as well instruct users to stop gitea, manually delete index files and start it back up. Thought my preference would generally be to bump the index version on every change that affects indexing, that way, users will probably never have to manually touch index files.

@guillep2k

This comment has been minimized.

Copy link
Member Author

commented Aug 13, 2019

@silverwind I re-engineered the logic to support both offline and online rebuilds. It also uses the version metadata to force the rebuild, so it's much faster (although I've dropped the per-repository rebuild).

I think this version makes more sense an it's more robust.

guillep2k added some commits Aug 13, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
7 participants
You can’t perform that action at this time.