Reduce repo indexer disk usage #3452

Merged
lafriks merged 1 commit into go-gitea:master from ethantkoenig:repo_indexer_disk_usage on Feb 5, 2018

Conversation

5 participants
@ethantkoenig
Member

ethantkoenig commented Feb 3, 2018

Reduces disk usage of the repo (i.e. code) indexer:

  • Disables bleve's _all field (with it enabled, we were storing everything twice)
  • Uses the bleve unique token filter (blevesearch/bleve#739), since we only display the first occurrence of the search term

I saw a roughly 3x reduction in disk usage (1.5GB -> 500MB) as a result of these changes (of course, mileage will vary depending on what type of text/code you are indexing).
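To illustrate why keeping only the first occurrence of each term shrinks the index, here is a minimal standard-library sketch of a unique token filter. It does not use bleve's API; the `token` type and the `tokenize` and `uniqueFilter` functions are hypothetical names invented for this example, and the whitespace tokenizer is an assumption made for brevity.

```go
package main

import "strings"

// token is a hypothetical stand-in for an analyzer token:
// a term plus its 1-based position in the input.
type token struct {
	Term     string
	Position int
}

// tokenize lowercases the text and splits it on whitespace.
// A real code analyzer would be more elaborate; this is only a sketch.
func tokenize(text string) []token {
	fields := strings.Fields(strings.ToLower(text))
	tokens := make([]token, 0, len(fields))
	for i, f := range fields {
		tokens = append(tokens, token{Term: f, Position: i + 1})
	}
	return tokens
}

// uniqueFilter keeps only the first occurrence of each term, so a term
// repeated many times in a file contributes a single entry to the index.
func uniqueFilter(in []token) []token {
	seen := make(map[string]struct{}, len(in))
	out := make([]token, 0, len(in))
	for _, t := range in {
		if _, ok := seen[t.Term]; ok {
			continue // later occurrences are dropped
		}
		seen[t.Term] = struct{}{}
		out = append(out, t)
	}
	return out
}
```

Calling `uniqueFilter(tokenize(content))` on a file that repeats the same identifiers many times yields far fewer tokens than the unfiltered stream, while the first position of every term is still available for display.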

Also introduces migration-like versions to the issue and repo indexers to facilitate future changes (which will typically require rebuilding the index).
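The general idea of a versioned index can be sketched with the standard library alone. This is not how rupture or Gitea implements it: the `index_version` file name and the `needsRebuild` and `resetIndex` functions are assumptions made up for illustration.

```go
package main

import (
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// versionFile is a hypothetical file name used only in this sketch.
const versionFile = "index_version"

// needsRebuild reports whether the index in dir was built with a version
// other than want, or has no readable version recorded at all.
func needsRebuild(dir string, want int) (bool, error) {
	data, err := os.ReadFile(filepath.Join(dir, versionFile))
	if os.IsNotExist(err) {
		return true, nil // no index (or no version file) yet
	}
	if err != nil {
		return false, err
	}
	got, err := strconv.Atoi(strings.TrimSpace(string(data)))
	if err != nil {
		return true, nil // unparsable version: treat the index as stale
	}
	return got != want, nil
}

// resetIndex deletes any existing index in dir, recreates the directory,
// and records the version the new index will be built with.
func resetIndex(dir string, version int) error {
	if err := os.RemoveAll(dir); err != nil {
		return err
	}
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return err
	}
	path := filepath.Join(dir, versionFile)
	return os.WriteFile(path, []byte(strconv.Itoa(version)), 0o644)
}
```

At startup, a caller might check `needsRebuild(indexDir, currentVersion)` and, when it returns true, call `resetIndex` before repopulating the index from scratch. Bumping the version constant in code is then enough to force a rebuild after a mapping change.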

Yes, this PR shamelessly pulls in https://github.com/ethantkoenig/rupture as a dependency to facilitate tracking indexer versions and migrations; I am aware of no other alternatives.

@@ -70,9 +73,15 @@ func createIssueIndexer() error {
mapping := bleve.NewIndexMapping()
docMapping := bleve.NewDocumentMapping()

numericFieldMapping := bleve.NewNumericFieldMapping()
numericFieldMapping.Store = false
numericFieldMapping.IncludeInAll = false
docMapping.AddFieldMappingsAt("RepoID", bleve.NewNumericFieldMapping())

@lafriks

lafriks Feb 3, 2018

Member

Should this use numericFieldMapping?

@ethantkoenig

ethantkoenig Feb 3, 2018

Member

Yes, fixed

@tboerger tboerger added the lgtm/need 2 label Feb 3, 2018

@lafriks lafriks added this to the 1.5.0 milestone Feb 3, 2018

@lafriks lafriks added the changelog label Feb 3, 2018

@ethantkoenig ethantkoenig force-pushed the ethantkoenig:repo_indexer_disk_usage branch 3 times, most recently from 43870d9 to c90f7af Feb 3, 2018

@codecov-io

codecov-io commented Feb 4, 2018

Codecov Report

Merging #3452 into master will decrease coverage by <.01%.
The diff coverage is 54.32%.

@@            Coverage Diff             @@
##           master    #3452      +/-   ##
==========================================
- Coverage   35.67%   35.67%   -0.01%     
==========================================
  Files         281      281              
  Lines       40697    40671      -26     
==========================================
- Hits        14519    14508      -11     
+ Misses      24031    24020      -11     
+ Partials     2147     2143       -4
Impacted Files              Coverage Δ
models/issue_indexer.go     67.81% <0%> (ø) ⬆️
modules/indexer/indexer.go  63.26% <40.9%> (-14.24%) ⬇️
models/repo_indexer.go      43.85% <50%> (-3.58%) ⬇️
modules/indexer/repo.go     63.47% <53.33%> (+2.6%) ⬆️
modules/indexer/issue.go    67.56% <78.94%> (+8.35%) ⬆️
models/repo_list.go         65.62% <0%> (-1.57%) ⬇️
models/error.go             32.73% <0%> (-0.4%) ⬇️
models/repo.go              42.98% <0%> (+0.18%) ⬆️
... and 2 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 283e87d...55a3db8.

@lafriks

lafriks approved these changes Feb 4, 2018

@tboerger tboerger added lgtm/need 1 and removed lgtm/need 2 labels Feb 4, 2018

"strconv"

"code.gitea.io/gitea/modules/setting"

@appleboy

appleboy Feb 4, 2018

Member

Add empty line

@ethantkoenig

Member

ethantkoenig commented Feb 4, 2018

@appleboy Done

@tboerger tboerger added lgtm/done and removed lgtm/need 1 labels Feb 5, 2018

@lafriks

Member

lafriks commented Feb 5, 2018

@ethantkoenig please resolve conflicts

@ethantkoenig ethantkoenig force-pushed the ethantkoenig:repo_indexer_disk_usage branch from b2d1420 to 55a3db8 Feb 5, 2018

@ethantkoenig

Member

ethantkoenig commented Feb 5, 2018

@lafriks Resolved

@lafriks lafriks merged commit a89592d into go-gitea:master Feb 5, 2018

3 checks passed

Codacy/PR Quality Review Good work! A positive pull request.
approvals/lgtm this commit looks good
continuous-integration/drone/pr the build was successful

@ethantkoenig ethantkoenig deleted the ethantkoenig:repo_indexer_disk_usage branch Feb 21, 2018

aswild added a commit to aswild/gitea that referenced this pull request Jul 6, 2018

Merge tag 'v1.5.0-rc1' into wild/v1.5
* SECURITY
  * Limit uploaded avatar image-size to 4096x3072 by default (go-gitea#4353)
  * Do not allow to reuse TOTP passcode (go-gitea#3878)
* FEATURE
  * Add cli commands to regen hooks & keys (go-gitea#3979)
  * Add support for FIDO U2F (go-gitea#3971)
  * Added user language setting (go-gitea#3875)
  * LDAP Public SSH Keys synchronization (go-gitea#1844)
  * Add topic support (go-gitea#3711)
  * Multiple assignees (go-gitea#3705)
  * Add protected branch whitelists for merging (go-gitea#3689)
  * Global code search support (go-gitea#3664)
  * Add label descriptions (go-gitea#3662)
  * Add issue search via API (go-gitea#3612)
  * Add repository setting to enable/disable health checks (go-gitea#3607)
  * Emoji Autocomplete (go-gitea#3433)
  * Implements generator cli for secrets (go-gitea#3531)
* ENHANCEMENT
  * Add more webhooks support and refactor webhook templates directory (go-gitea#3929)
  * Add new option to allow only OAuth2/OpenID user registration (go-gitea#3910)
  * Add option to use paged LDAP search when synchronizing users (go-gitea#3895)
  * Symlink icons (go-gitea#1416)
  * Improve release page UI (go-gitea#3693)
  * Add admin dashboard option to run health checks (go-gitea#3606)
  * Add branch link in branch list (go-gitea#3576)
  * Reduce sql query times in retrieveFeeds (go-gitea#3547)
  * Option to enable or disable swagger endpoints (go-gitea#3502)
  * Add missing licenses (go-gitea#3497)
  * Reduce repo indexer disk usage (go-gitea#3452)
  * Enable caching on assets and avatars (go-gitea#3376)
  * Add repository search ordered by stars/forks. Forks column in admin repo list (go-gitea#3969)
  * Add Environment Variables to Docker template (go-gitea#4012)
  * LFS: make HTTP auth period configurable (go-gitea#4035)
  * Add config path as an optional flag when changing pass via CLI (go-gitea#4184)
  * Refactor User Settings sections (go-gitea#3900)
  * Allow square brackets in external issue patterns (go-gitea#3408)
  * Add Attachment API (go-gitea#3478)
  * Add EnableTimetracking option to app settings (go-gitea#3719)
  * Add config option to enable or disable log executed SQL (go-gitea#3726)
  * Shows total tracked time in issue and milestone list (go-gitea#3341)
* TRANSLATION
  * Improve English grammar and consistency (go-gitea#3614)
* DEPLOYMENT
  * Allow Gitea to run as different USER in Docker (go-gitea#3961)
  * Provide compressed release binaries (go-gitea#3991)
  * Sign release binaries (go-gitea#4188)