Support elastic search for code search #10273

Merged
merged 11 commits into from Aug 30, 2020

Conversation

lunny
Member

@lunny lunny commented Feb 14, 2020

This PR adds a new code search type: Elasticsearch. For highlighting, it uses the default highlighter, because most code files are small. We find the keyword's location in the text with an index function, which may not be very fast.
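
A minimal sketch of the index-based lookup described above, for illustration only. It is case-sensitive and relies on `strings.Index` from the Go standard library; `matchRange` and `findKeyword` are hypothetical names, not the identifiers used in this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// matchRange is a hypothetical helper type holding the byte offsets
// [Start, End) of one keyword hit inside the file content.
type matchRange struct {
	Start, End int
}

// findKeyword returns the byte range of every non-overlapping,
// case-sensitive occurrence of keyword in content. It scans left to
// right with strings.Index, so the cost grows with the file size.
func findKeyword(content, keyword string) []matchRange {
	if keyword == "" {
		return nil
	}
	var ranges []matchRange
	offset := 0
	for {
		i := strings.Index(content[offset:], keyword)
		if i < 0 {
			return ranges
		}
		start := offset + i
		end := start + len(keyword)
		ranges = append(ranges, matchRange{Start: start, End: end})
		offset = end
	}
}

func main() {
	content := "foo bar foo"
	for _, r := range findKeyword(content, "foo") {
		fmt.Printf("match at [%d,%d): %q\n", r.Start, r.End, content[r.Start:r.End])
	}
}
```

Each lookup is a linear scan of the file, which fits the remark that this may not be very fast on large files.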

Thanks @lafriks.

We put a version number at the end of the index name, for example gitea_codes.v1; the next version creates an index named gitea_codes.v2. The alias gitea_codes is used for both indexing and searching. Migrating data is then easy: copy it over from the previous index version.
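
A rough sketch of the naming scheme, assuming the alias is moved with the Elasticsearch `_aliases` endpoint. The helper names `versionedIndexName` and `aliasSwapBody` are hypothetical, and the code only builds the request body; it does not contact a server or reflect how this PR issues the call.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// versionedIndexName appends the schema version to the alias name,
// e.g. ("gitea_codes", 1) -> "gitea_codes.v1".
func versionedIndexName(alias string, version int) string {
	return fmt.Sprintf("%s.v%d", alias, version)
}

// aliasAction is one entry of the "actions" array accepted by
// POST /_aliases, such as {"add": {"index": ..., "alias": ...}}.
type aliasAction map[string]map[string]string

// aliasSwapBody builds a body that points alias at newIndex and, when
// oldIndex is non-empty, removes the alias from oldIndex in the same request.
func aliasSwapBody(alias, oldIndex, newIndex string) ([]byte, error) {
	actions := []aliasAction{}
	if oldIndex != "" {
		actions = append(actions, aliasAction{"remove": {"index": oldIndex, "alias": alias}})
	}
	actions = append(actions, aliasAction{"add": {"index": newIndex, "alias": alias}})
	return json.Marshal(map[string][]aliasAction{"actions": actions})
}

func main() {
	const alias = "gitea_codes"
	body, err := aliasSwapBody(alias, versionedIndexName(alias, 1), versionedIndexName(alias, 2))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

Because readers and writers only ever use the alias, the versioned index behind it can change without touching the search code.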

This PR does not yet store the index version anywhere. It should not be stored in a file, because we assume there may be more than one Gitea instance. Maybe we should store it in the database, but that needs another PR.

@lunny lunny added the type/feature Completely new functionality. Can only be merged if feature freeze is not active. label Feb 14, 2020
@lunny lunny added this to the 1.12.0 milestone Feb 16, 2020
@lunny lunny changed the title WIP: Support elastic search for code search Support elastic search for code search Feb 17, 2020
@lunny
Member Author

lunny commented Feb 18, 2020

It's ready to review.

@GiteaBot GiteaBot added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Feb 18, 2020
@6543
Member

6543 commented Feb 23, 2020

has conflicts

@lafriks
Member

lafriks commented Feb 23, 2020

Please also add new fields (language, indexed at, commit id)

@lunny lunny added the status/blocked This PR cannot be merged yet, i.e. because it depends on another unmerged PR label Feb 23, 2020
@lunny
Member Author

lunny commented Feb 23, 2020

@6543 resolved. @lafriks added. It should work now.

But I'm still looking for a way to store the indexer version somewhere, so I'm blocking this PR until I find one.

I will also send other PRs related to code search.

@codecov-io

codecov-io commented Feb 23, 2020

Codecov Report

❗ No coverage uploaded for pull request base (master@f422a11).
The diff coverage is 15.73%.


@@            Coverage Diff            @@
##             master   #10273   +/-   ##
=========================================
  Coverage          ?   43.53%           
=========================================
  Files             ?      588           
  Lines             ?    82448           
  Branches          ?        0           
=========================================
  Hits              ?    35895           
  Misses            ?    42100           
  Partials          ?     4453
Impacted Files                            Coverage Δ
modules/indexer/code/elastic_search.go    0% <0%> (ø)
modules/setting/indexer.go                92.68% <100%> (ø)
modules/indexer/code/wrapped.go           40.74% <100%> (ø)
modules/indexer/code/queue.go             40.42% <25%> (ø)
modules/indexer/code/indexer.go           33.01% <25.45%> (ø)
modules/indexer/code/bleve.go             67.8% <73.8%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f422a11...eb09fa6.

@lafriks
Member

lafriks commented Jul 26, 2020

Merge seems to be broken

@codecov-commenter

codecov-commenter commented Jul 28, 2020

Codecov Report

Merging #10273 into master will decrease coverage by 0.13%.
The diff coverage is 16.51%.


@@            Coverage Diff             @@
##           master   #10273      +/-   ##
==========================================
- Coverage   43.43%   43.29%   -0.14%     
==========================================
  Files         645      646       +1     
  Lines       71345    71588     +243     
==========================================
+ Hits        30988    30994       +6     
- Misses      35340    35577     +237     
  Partials     5017     5017              
Impacted Files                                Coverage Δ
modules/indexer/code/elastic_search.go        0.00% <0.00%> (ø)
modules/indexer/code/queue.go                 43.20% <31.25%> (-0.91%) ⬇️
modules/indexer/code/indexer.go               36.17% <34.00%> (+0.10%) ⬆️
modules/indexer/code/bleve.go                 71.42% <72.22%> (+2.65%) ⬆️
modules/indexer/code/wrapped.go               48.88% <100.00%> (ø)
modules/setting/indexer.go                    91.42% <100.00%> (+0.80%) ⬆️
modules/git/utils.go                          73.77% <0.00%> (-3.28%) ⬇️
modules/log/event.go                          57.54% <0.00%> (-1.89%) ⬇️
modules/queue/unique_queue_disk_channel.go    53.84% <0.00%> (-1.54%) ⬇️
modules/queue/workerpool.go                   58.77% <0.00%> (-1.23%) ⬇️
... and 6 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d257485...e1441e5.

@lafriks
Member

lafriks commented Jul 28, 2020

We can put a version number at the end of the index name, for example gitea_codes.v1, create an index named gitea_codes.v2 for the next version, and use the alias gitea_codes for indexing and searching. Migrating data can then be done easily by copying it over from the previous index version.
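
A sketch of the data copy mentioned above, assuming it is done with the Elasticsearch `_reindex` endpoint; the comment does not say which mechanism is meant. The type and function names are hypothetical, and the code only builds the request body for `POST /_reindex`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// indexRef names one index inside a _reindex request.
type indexRef struct {
	Index string `json:"index"`
}

// reindexRequest is the minimal body for POST /_reindex: copy every
// document from Source to Dest.
type reindexRequest struct {
	Source indexRef `json:"source"`
	Dest   indexRef `json:"dest"`
}

// reindexBody builds the JSON body that copies documents from
// <alias>.v<from> to <alias>.v<to>.
func reindexBody(alias string, from, to int) ([]byte, error) {
	return json.Marshal(reindexRequest{
		Source: indexRef{Index: fmt.Sprintf("%s.v%d", alias, from)},
		Dest:   indexRef{Index: fmt.Sprintf("%s.v%d", alias, to)},
	})
}

func main() {
	body, err := reindexBody("gitea_codes", 1, 2)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

A plain copy like this only helps when the old documents already contain everything the new mapping needs; otherwise the repositories would have to be indexed again.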

@lunny lunny force-pushed the lunny/es_code branch 2 times, most recently from fce7bc0 to 17f1837 Compare August 11, 2020 02:54
@lunny
Member Author

lunny commented Aug 11, 2020

@lafriks done.

@lunny lunny removed the status/blocked This PR cannot be merged yet, i.e. because it depends on another unmerged PR label Aug 11, 2020
Review thread: custom/conf/app.example.ini (outdated, resolved)
Review thread: custom/conf/app.example.ini (outdated, resolved)
@lunny
Member Author

lunny commented Aug 13, 2020

@6543 Done.

Review thread: modules/indexer/code/elastic_search.go (outdated, resolved)
Review thread: modules/indexer/code/indexer_test.go (outdated, resolved)
@lunny
Member Author

lunny commented Aug 15, 2020

@zeripath Done.

@GiteaBot GiteaBot added lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. and removed lgtm/need 1 This PR needs approval from one additional maintainer to be merged. labels Aug 30, 2020
@6543
Member

6543 commented Aug 30, 2020

🚀

@lafriks lafriks merged commit 9bc69ff into go-gitea:master Aug 30, 2020
@42wim
Member

42wim commented Aug 30, 2020

🎉

@lunny
Member Author

lunny commented Sep 11, 2020

Fix #6648

@lunny lunny added the type/changelog Adds the changelog for a new Gitea version label Sep 17, 2020
@lunny lunny deleted the lunny/es_code branch November 18, 2020 04:37
@go-gitea go-gitea locked and limited conversation to collaborators Nov 24, 2020
@delvh delvh removed the type/changelog Adds the changelog for a new Gitea version label Oct 7, 2023
Labels
lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. type/feature Completely new functionality. Can only be merged if feature freeze is not active.

9 participants