Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix chardet test and add ordering option #11621

Merged
merged 11 commits into from Jun 2, 2020

Conversation

zeripath
Copy link
Contributor

Add DETECTED_CHARSET_ORDER to repository config to allow setting of tie-breaking for detected charset ordering.

Fixes intermittent failure of chardet test

Signed-off-by: Andrew Thornton art27@cantab.net

Signed-off-by: Andrew Thornton <art27@cantab.net>
Signed-off-by: Andrew Thornton <art27@cantab.net>
Signed-off-by: Andrew Thornton <art27@cantab.net>
Copy link
Member

@lafriks lafriks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is missing Latvian windows-1257

@GiteaBot GiteaBot added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label May 25, 2020
@zeripath
Copy link
Contributor Author

@lafriks it appears that github.com/gogs/chardet doesn't detect or assign to windows-1257

Signed-off-by: Andrew Thornton <art27@cantab.net>
@guillep2k
Copy link
Member

This is the right direction, of course! However, I'm concerned that this will need to be done every time. IMHO the best way to address this is to assign the priority inside gogs/chardet (we would need to take over that library).

I elaborated about this here: #8474 (comment)

Signed-off-by: Andrew Thornton <art27@cantab.net>
Copy link
Member

@guillep2k guillep2k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I believe this is a good compromise.
Nice work!!

priority, has := setting.Repository.DetectedCharsetScore[strings.ToLower(strings.TrimSpace(topResult.Charset))]
for _, result := range results {
if result.Confidence == topConfidence {
resultPriority, resultHas := setting.Repository.DetectedCharsetScore[strings.ToLower(strings.TrimSpace(result.Charset))]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future we could attempt to normalize our list casing to the lib's casing in order to avoid calling ToLower() in a loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hehehe have you looked at the libs casing? I copied the names exactly in to the setting, app.ini.sample and config cheat sheet. There's literally no fixed pattern for them.

@GiteaBot GiteaBot added lgtm/need 1 This PR needs approval from one additional maintainer to be merged. and removed lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. labels May 25, 2020
@guillep2k
Copy link
Member

One thing (sorry about the afterthought): if ANSI_CHARSET is set and DETECTED_CHARSET_ORDER is empty, we should prepend the given ANSI_CHARSET to the default order (i.e. it should have the maximum priority).

@lafriks
Copy link
Member

lafriks commented May 25, 2020

But not before utf-8

@zeripath
Copy link
Contributor Author

Heh ANSI charset simply overrides any detected charset that isn't utf8... It's not a very sensible option. I'm happy to make a breaking change by removing it and adding a default undetected charset or something like that?

@guillep2k
Copy link
Member

guillep2k commented May 25, 2020

@zeripath on second thought you're right, it would be breaking. Charset detection is a sensitive issue. Maybe we should leave the ANSI_CHARSET behavior unchanged after all. (*)

(*) We could deprecate it, but ANSI characters sets should already be a thing of the past. I believe most people dealing with them have really no choice, as they need to deal with ages of old code. Deprecating the option in a way they can't force anymore doesn't sound like a nice thing to do.

@guillep2k guillep2k closed this May 25, 2020
@guillep2k guillep2k reopened this May 25, 2020
@guillep2k
Copy link
Member

Sorry, bad operation! 😓

@zeripath zeripath added the type/feature Completely new functionality. Can only be merged if feature freeze is not active. label Jun 1, 2020
@GiteaBot GiteaBot added lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. and removed lgtm/need 1 This PR needs approval from one additional maintainer to be merged. labels Jun 2, 2020
@guillep2k
Copy link
Member

Ping LG-TM

@guillep2k guillep2k merged commit a1ad188 into go-gitea:master Jun 2, 2020
@zeripath zeripath deleted the fix-chardet-test branch June 3, 2020 06:39
ydelafollye pushed a commit to ydelafollye/gitea that referenced this pull request Jul 31, 2020
* Fix chardet test and add ordering option

Signed-off-by: Andrew Thornton <art27@cantab.net>

* minor fixes

Signed-off-by: Andrew Thornton <art27@cantab.net>

* remove log

Signed-off-by: Andrew Thornton <art27@cantab.net>

* remove log2

Signed-off-by: Andrew Thornton <art27@cantab.net>

* only iterate through top results

Signed-off-by: Andrew Thornton <art27@cantab.net>

* Update docs/content/doc/advanced/config-cheat-sheet.en-us.md

* slight restructure of for loop

Signed-off-by: Andrew Thornton <art27@cantab.net>

Co-authored-by: techknowlogick <techknowlogick@gitea.io>
@go-gitea go-gitea locked and limited conversation to collaborators Nov 24, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. type/bug type/feature Completely new functionality. Can only be merged if feature freeze is not active. type/testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants