Encoding problem with files with utf-8 char in their names #424

Closed
gitblit opened this Issue Aug 12, 2015 · 10 comments

Owner

gitblit commented Aug 12, 2015

Originally reported on Google Code with ID 128

* What steps will reproduce the problem?

1. Add a file to your Git repository with a UTF-8 character in its name, let's say « bébé.java »
2. Add and commit it
3. Try to diff / view / blame the file in Gitblit

* What is the expected output? 

The same as for other files with only pure ASCII characters in their names: the lovely
output we are used to getting ;-)

* What do you see instead?

All links are broken (css, images, etc.).

When you try a diff on that file, you get the following error in the log file:

ERROR failed to generate commit diff!
java.lang.StringIndexOutOfBoundsException: String index out of range: -1
        at java.lang.String.substring(String.java:1937)
        at com.gitblit.utils.GitBlitDiffFormatter.getHtml(GitBlitDiffFormatter.java:138)
        at com.gitblit.utils.DiffUtils.getDiff(DiffUtils.java:176)
        at com.gitblit.utils.DiffUtils.getDiff(DiffUtils.java:93)


* What version of the product are you using? On what operating system?

v1.1.0, WAR version with Tomcat 5.5, Ubuntu 8.04.4 LTS (hardy)


Thank you in advance for solving that problem: I have more than 56000 documents in
my Git repo and about 20000 have utf-8 chars in their names...

Reported by benoit.mercibe on 2012-09-06 21:09:12

Owner

gitblit commented Aug 12, 2015

Can you point me to a small example repo?

Reported by James.Moger on 2012-09-06 21:44:55

Owner

gitblit commented Aug 12, 2015

Ok so I actually did some real work and executed a few permutations of your recipe to
create a repo with a file named bébé.java.

Created, served, and viewed on Mac OS X.
Cloned from Mac OS X to Win 7.  Served and viewed the clone on Win 7.
Cloned from Mac OS X to Ubuntu 12.04.  Served and viewed the clone on Ubuntu 12.04.

Created, served, and viewed on Win 7.
Cloned from Win 7 to Mac OS X.  Served and viewed the clone on Mac OS X.
Cloned from Win 7 to Ubuntu 12.04.  Served and viewed the clone on Ubuntu 12.04.

Created, served, and viewed on Ubuntu 12.04.

All these worked fine.  That led me to wonder about your choice of the é character
which can be represented in cp1252, iso-8859-1, and utf-8.  So I repeated the above
tests with wprowadź.java which uses an accented z character that can be represented in
neither cp1252 nor iso-8859-1.  This also worked fine.

So long story short, I cannot (yet) reproduce your issue.

Reported by James.Moger on 2012-09-07 13:14:10
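
The claim above about which charsets can represent the two test filenames can be checked with a short Java sketch. It assumes nothing beyond the standard `java.nio.charset` API; the class name `EncodableCheck` is made up for illustration, and windows-1252 (Java's name for cp1252) is only tested if the running JVM supports it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodableCheck {
    /** Returns true if every character of text can be encoded in the given charset. */
    static boolean canEncode(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file pure ASCII:
        // U+00E9 is e with acute, U+017A is z with acute.
        String[] names = {"b\u00E9b\u00E9.java", "wprowad\u017A.java"};
        for (String name : names) {
            System.out.println(name
                    + " ISO-8859-1=" + canEncode(name, StandardCharsets.ISO_8859_1)
                    + " UTF-8=" + canEncode(name, StandardCharsets.UTF_8));
            // windows-1252 is not one of the charsets every JVM must provide.
            if (Charset.isSupported("windows-1252")) {
                System.out.println(name + " windows-1252="
                        + canEncode(name, Charset.forName("windows-1252")));
            }
        }
    }
}
```

The second filename is the more demanding test because a setup that silently falls back to a single-byte Latin charset cannot represent it at all.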

Owner

gitblit commented Aug 12, 2015

Thank you for your quick feedback and thorough tests.  I will try to connect to my repo
with Gitblit GO instead of WAR and will keep you informed.

Reported by benoit.mercibe on 2012-09-07 14:39:12

Owner

gitblit commented Aug 12, 2015

OK - found the problem and solution

The problem is that Tomcat uses ISO 8859-1 by default to decode URLs received from
the browser.  So I modified the connector properties (server.xml) according to http://wiki.apache.org/tomcat/FAQ/CharacterEncoding

Now almost everything is working fine.  Gitblit works perfectly well (view, blame,
history) with the problematic files but still fails to show the diff:

ERROR failed to generate commit diff!
java.lang.StringIndexOutOfBoundsException: String index out of range: -1
        at java.lang.String.substring(String.java:1937)
        at com.gitblit.utils.GitBlitDiffFormatter.getHtml(GitBlitDiffFormatter.java:138)
        at com.gitblit.utils.DiffUtils.getDiff(DiffUtils.java:176)
        at com.gitblit.utils.DiffUtils.getDiff(DiffUtils.java:93)

Reported by benoit.mercibe on 2012-09-07 16:24:26
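
For readers hitting the same Tomcat behaviour, here is a minimal Java sketch of what goes wrong when a percent-encoded UTF-8 path is decoded as ISO-8859-1. It uses only `java.net.URLDecoder`; the class name `UriDecodingDemo` and the sample path are illustrative, and this is not Tomcat's or Gitblit's actual code. The Tomcat FAQ linked above covers the `URIEncoding` attribute of the `<Connector>` element in server.xml, which is presumably the property that was changed here.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class UriDecodingDemo {
    /** Percent-decodes a URL path segment using the named charset. */
    static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // A browser sends the UTF-8 bytes of each non-ASCII character, percent-encoded.
        String segment = "b%C3%A9b%C3%A9.java";

        // Decoding the bytes C3 A9 as ISO-8859-1 yields two characters per accent,
        // so the resulting name no longer matches any file in the repository.
        System.out.println(decode(segment, "ISO-8859-1"));

        // Decoding as UTF-8 restores the original filename.
        System.out.println(decode(segment, "UTF-8"));
    }
}
```

Run as-is, the first line printed is the mangled name (bÃ©bÃ©.java) and the second is bébé.java, provided the console itself displays UTF-8.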

Owner

gitblit commented Aug 12, 2015

Interesting find... yet another issue with Tomcat.

As for the diff: it looks like the diff is formatted differently. Can you try setting
web.diffStyle to gitweb and/or plain and paste the contents here for the same file?

Reported by James.Moger on 2012-09-07 16:36:09

Owner

gitblit commented Aug 12, 2015

With web.diffStyle=gitweb and web.diffStyle=plain everything works fine.

The line that appears before the gitweb/plain diff result is:

diff --git "a/b/b\303\251b\303\251.ad" "b/b/b\303\251b\303\251.ad"

The full path of the file is /home/git/articles/b/bébé.ad, where articles is the repo
root.  The filename seems to be encoded in octal (base-8)!  « é » is the 2 bytes « C3 A9 »
(hex) in UTF-8, which is 303 251 in octal.  This string makes me think of UTF-8 data
URL-encoded and then decoded as Latin-1. Does this pose a problem for Gitblit's diff algorithm?

Here is the full stacktrace:


ERROR failed to generate commit diff!
java.lang.StringIndexOutOfBoundsException: String index out of range: -1
        at java.lang.String.substring(String.java:1937)
        at com.gitblit.utils.GitBlitDiffFormatter.getHtml(GitBlitDiffFormatter.java:138)
        at com.gitblit.utils.DiffUtils.getDiff(DiffUtils.java:176)
        at com.gitblit.utils.DiffUtils.getDiff(DiffUtils.java:93)
        at com.gitblit.wicket.pages.BlobDiffPage.<init>(BlobDiffPage.java:51)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.apache.wicket.session.DefaultPageFactory.createPage(DefaultPageFactory.java:188)
        at org.apache.wicket.session.DefaultPageFactory.newPage(DefaultPageFactory.java:89)
        at org.apache.wicket.request.target.component.BookmarkablePageRequestTarget.newPage(BookmarkablePageRequestTarget.java:305)
        at org.apache.wicket.request.target.component.BookmarkablePageRequestTarget.getPage(BookmarkablePageRequestTarget.java:320)
        at org.apache.wicket.request.target.component.BookmarkablePageRequestTarget.processEvents(BookmarkablePageRequestTarget.java:234)
        at org.apache.wicket.request.AbstractRequestCycleProcessor.processEvents(AbstractRequestCycleProcessor.java:92)
        at org.apache.wicket.RequestCycle.processEventsAndRespond(RequestCycle.java:1279)
        at org.apache.wicket.RequestCycle.step(RequestCycle.java:1358)
        at org.apache.wicket.RequestCycle.steps(RequestCycle.java:1465)
        at org.apache.wicket.RequestCycle.request(RequestCycle.java:545)
        at org.apache.wicket.protocol.http.WicketFilter.doGet(WicketFilter.java:486)
        at org.apache.wicket.protocol.http.WicketFilter.doFilter(WicketFilter.java:319)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.catalina.security.SecurityUtil$1.run(SecurityUtil.java:244)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAsPrivileged(Subject.java:517)
        at org.apache.catalina.security.SecurityUtil.execute(SecurityUtil.java:276)
        at org.apache.catalina.security.SecurityUtil.doAsPrivilege(SecurityUtil.java:218)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:210)
        at org.apache.catalina.core.ApplicationFilterChain.access$0(ApplicationFilterChain.java:192)
        at org.apache.catalina.core.ApplicationFilterChain$1.run(ApplicationFilterChain.java:171)
        at java.security.AccessController.doPrivileged(Native Method)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:167)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:151)
        at org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:834)
        at org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:640)
        at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1286)
        at java.lang.Thread.run(Thread.java:662)

Reported by benoit.mercibe on 2012-09-07 20:44:27

Owner

gitblit commented Aug 12, 2015

Yes, this difference in the "diff --git" line causes Gitblit some trouble.  Specifically,
the paths are quoted which I did not expect.

It turns out that Git's default behavior is to output non-ASCII filenames in octal
(good guess).  Some shells may parse the octal characters and display them as UTF-8.
Others may not.  Since JGit strives to match Git, it also outputs octal filenames for
the diff processor.

I have fixed the Gitblit html diff generation but before I push my fix I will try to
intercept the octal encoded filename and make that UTF-8.

Reported by James.Moger on 2012-09-10 13:01:00

  • Status changed: Started
  • Labels added: Milestone-1.2.0
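
To illustrate the octal quoting described above, here is a minimal Java sketch that turns a quoted path from a "diff --git" line back into a UTF-8 string. It is not the fix that was committed to Gitblit; the class `GitPathUnquoter` and its method names are hypothetical, and only octal escapes plus a few common backslash escapes are handled.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class GitPathUnquoter {
    private static boolean isOctal(char c) {
        return c >= '0' && c <= '7';
    }

    /**
     * Decodes a double-quoted path whose non-ASCII bytes are written as
     * three-digit octal escapes. Unquoted input is returned unchanged.
     */
    static String unquote(String quoted) {
        if (quoted.length() < 2 || !quoted.startsWith("\"") || !quoted.endsWith("\"")) {
            return quoted;
        }
        String body = quoted.substring(1, quoted.length() - 1);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int i = 0;
        while (i < body.length()) {
            char c = body.charAt(i);
            if (c != '\\' || i + 1 >= body.length()) {
                // Literal character: copy its UTF-8 bytes through.
                int cp = body.codePointAt(i);
                byte[] raw = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
                bytes.write(raw, 0, raw.length);
                i += Character.charCount(cp);
            } else if (i + 3 < body.length() && isOctal(body.charAt(i + 1))
                    && isOctal(body.charAt(i + 2)) && isOctal(body.charAt(i + 3))) {
                // Three octal digits stand for one raw byte.
                bytes.write(Integer.parseInt(body.substring(i + 1, i + 4), 8));
                i += 4;
            } else {
                char next = body.charAt(i + 1);
                switch (next) {
                    case 'n': bytes.write('\n'); break;
                    case 't': bytes.write('\t'); break;
                    default:  bytes.write(next); // backslash and double quote stand for themselves
                }
                i += 2;
            }
        }
        // Reassemble the collected bytes as UTF-8 text.
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(unquote("\"a/b/b\\303\\251b\\303\\251.ad\""));
    }
}
```

Collecting raw bytes first and decoding once at the end matters because a single character such as é spans two escapes (303 and 251), so decoding each escape on its own would not work.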

Owner

gitblit commented Aug 12, 2015

Nice to see that you quickly found the problem and solution.  Much appreciated.  I am
looking forward to 1.2.0 ;-)  Till then, the workaround will be to use the gitweb diff.
Thank you!

Reported by benoit.mercibe on 2012-09-10 13:07:23

Owner

gitblit commented Aug 12, 2015

I have committed the fixes.  If you need this anytime soon you will have to build from
source as 1.2.0 is months away.

I have updated the demo site and pushed a test repo.  You can confirm the fix here:
https://demo-gitblit.rhcloud.com/blobdiff/issue424.git/3b7fa5cbf367f37167a06d0395d57027ccffa13f/b%C3%A9b%C3%A9.txt

Reported by James.Moger on 2012-09-10 21:31:07

  • Status changed: Queued
Owner

gitblit commented Aug 12, 2015

v1.2.0 has been deployed.

Reported by James.Moger on 2013-01-01 01:06:25

  • Status changed: Fixed

@gitblit gitblit closed this Aug 12, 2015

@fzs fzs modified the milestone: 1.2.0 Dec 13, 2016
