Try to catch encoding troubles automatically #393

Closed
gitblit opened this issue Aug 12, 2015 · 10 comments

Comments

@gitblit
Collaborator

gitblit commented Aug 12, 2015

Originally reported on Google Code with ID 97

Content that is not UTF-8 encoded fails to decode properly.  Ideally, Gitblit would
confirm that charset decoding succeeded and, failing that, fall back to ISO-8859-1,
Windows-1252, and/or a user-defined charset.

Reported by James.Moger on 2012-05-10 12:52:26

@gitblit
Collaborator Author

gitblit commented Aug 12, 2015

Please support cp1251 too.
Thanks

Reported by Dmitry.A.Abramov on 2012-06-08 13:56:09

@gitblit
Collaborator Author

gitblit commented Aug 12, 2015

The encodings, and the preferred order, will be user-definable.

Reported by James.Moger on 2012-06-08 14:06:01

@gitblit
Collaborator Author

gitblit commented Aug 12, 2015

I've pushed up my changes for trying different encodings.  If you have a small test
repo that you can share I can double-check my work and add it to my testing repos.

Reported by James.Moger on 2012-06-09 01:03:26

  • Status changed: Queued

@gitblit
Collaborator Author

gitblit commented Aug 12, 2015

Thanks.
It works for me:
    <context-param>
        <param-name>web.blobEncodings</param-name>
        <param-value>UTF-8 Cp1251 ISO-8859-1</param-value>
    </context-param>

However, I think this property should be applied per project rather than per server.

Reported by Dmitry.A.Abramov on 2012-06-11 11:02:03
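
For readers following the thread, here is a minimal sketch of the "try each encoding in order" idea discussed above. It is not Gitblit's actual implementation: the class `FallbackDecoder` and the method `decode` are hypothetical names, and the final ISO-8859-1 fallback is an assumption made for the sketch. The one point it tries to illustrate is that `new String(bytes, charset)` silently substitutes replacement characters, so a strict `CharsetDecoder` is needed to detect a failed decode and move on to the next charset.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not taken from Gitblit.
class FallbackDecoder {

    /**
     * Decodes content with the first charset in the space-separated list
     * (e.g. "UTF-8 Cp1251 ISO-8859-1") that accepts the bytes without error.
     */
    static String decode(byte[] content, String encodings) {
        for (String name : encodings.trim().split("\\s+")) {
            // Skip empty tokens and charsets this JVM does not provide.
            if (name.isEmpty() || !Charset.isSupported(name)) {
                continue;
            }
            try {
                return Charset.forName(name)
                        .newDecoder()
                        // REPORT makes the decoder throw instead of substituting U+FFFD.
                        .onMalformedInput(CodingErrorAction.REPORT)
                        .onUnmappableCharacter(CodingErrorAction.REPORT)
                        .decode(ByteBuffer.wrap(content))
                        .toString();
            } catch (CharacterCodingException e) {
                // The bytes are not valid in this charset; try the next one.
            }
        }
        // Assumed last resort: ISO-8859-1 maps every byte to a character.
        return new String(content, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] latin1 = {0x4b, (byte) 0xe4};              // 'K', 'ä' in ISO-8859-1
        byte[] utf8 = {0x4b, (byte) 0xc3, (byte) 0xa4};   // 'K', 'ä' in UTF-8
        System.out.println(decode(latin1, "UTF-8 ISO-8859-1"));
        System.out.println(decode(utf8, "UTF-8 ISO-8859-1"));
    }
}
```

Order matters in such a list: single-byte charsets such as ISO-8859-1 accept any byte sequence, so anything listed after one of them would never be tried.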

@gitblit
Collaborator Author

gitblit commented Aug 12, 2015

Having to set this for every repository would be very inconvenient.

Reported by robin.rosenberg on 2012-06-11 11:32:26

@gitblit
Collaborator Author

gitblit commented Aug 12, 2015

Here's a bundle with two files.

"om kulneff.txt": both content and commit message are ISO-Latin-1 encoded
"om kulneff2.txt": both content and commit message are UTF-8 encoded

"om kulneff2.txt" starts with the content of "om kulneff.txt" (just differently encoded).

The text comes from a work whose copyright has expired.



Reported by robin.rosenberg on 2012-06-11 11:51:49


- _Attachment: [kulneff.bundle](https://storage.googleapis.com/google-code-attachments/gitblit/issue-97/comment-6/kulneff.bundle)_

@gitblit
Collaborator Author

gitblit commented Aug 12, 2015

Thanks, Robin.  I'll add your bundle to my tests.  BTW, the try/catch approach over the
preferred encodings works as expected for blobs.  To reindex your repos, you'll have to
do one of two things: (1) change the value of LuceneExecutor.INDEX_VERSION or (2) delete
.git/lucene.conf for all repositories.  I have not decided whether to increment the index
version - doing so would force every Gitblit installation to reindex all repositories
when this release is installed.

Your comment suggests that the commit messages are encoded too.  I thought that (J)Git
only stored UTF-8 metadata.  If you can store different commit message encodings, then
JGit must be handling that for me transparently.

As for per-server vs. per-repository, I definitely think this needs to be per-server,
as implemented.  I had not considered per-repository.  I could implement a repository-level
override of the default encodings, but I'm not sure how useful that would be.

Reported by James.Moger on 2012-06-11 13:11:59

@gitblit
Collaborator Author

gitblit commented Aug 12, 2015

JGit decodes commit messages automatically.
It tries UTF-8 first, then the platform default, and finally ISO-8859-1.

Reported by robin.rosenberg on 2012-06-11 19:29:16

@gitblit
Collaborator Author

gitblit commented Aug 12, 2015

I created the test repo using C Git. JGit can encode with non-UTF-8 too,
but I guess few people use that. It's only available through the API and
the only use I know of is a unit test.

Reported by robin.rosenberg on 2012-06-11 19:38:22

@gitblit
Collaborator Author

gitblit commented Aug 12, 2015

Resolved in v1.0.0.

Reported by James.Moger on 2012-07-14 04:44:17

  • Status changed: Done

@gitblit gitblit closed this as completed Aug 12, 2015
@flaix flaix modified the milestone: 1.0.0 Dec 13, 2016