Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Download ZIP

Loading…

Fix many encoding problems #864

Merged
merged 5 commits into from

5 participants

@brodock

I've appended bug fixes for #725, #782, #648, #782 with a better solution then
force_encoding to UTF-8 as it will not fix every case. A proper reading
of Yehuda Katz's article about 1.9 encoding is a must to fully
understand the problem:
http://yehudakatz.com/2010/05/05/ruby-1-9-encodings-a-primer-and-the-solution-for-rails/.

This also fixes a unreported problem with plain/text blobs being sent
without charset being informed on HTTP HEADER.

brodock added some commits
@brodock brodock Fixed encoding problems with plain/text blobs being sent without char…
…set.
eb5749e
@brodock brodock Better fix for encoding problems on rendering of inline file visualiz…
…ations like README files.
39def0d
@brodock brodock It's dangerous to rescue errors here as it will hide bugs. define_tre…
…e_vars already catch all situations where something may not exist.
48a3685
@brodock brodock Better algorithm to deal with encodings. Moved fallback rescue messag…
…e from view to encode library.

This helps fix cases where UTF-8 is wrongly identified as ISO-8859-1. We will only try to convert strings if we are 100% sure about the charset, otherwise, we will fallback to UTF-8.
50c2c16
@brodock brodock Forgot to refactor a line on lib/gitlabhq/encode.rb 11f90ae
@zx1986

@brodock WOW!
AWESOME JOB!

@randx randx merged commit 11f90ae into gitlabhq:master
@randx
Owner

like it, merged

@mctully

thanks @brodock :)

my #725 issues were fixed by the 2.5 update, but I suspect your fix is more 'correct'!

@zzjin

update to 2.5 but still have encoding error in network/graph

ArgumentError (redundant UTF-8 sequence):
lib/graph_commit.rb:18:in `to_graph'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Commits on May 26, 2012
  1. @brodock
  2. @brodock

    Better fix for encoding problems on rendering of inline file visualiz…

    brodock authored
    …ations like README files.
  3. @brodock

    It's dangerous to rescue errors here as it will hide bugs. define_tre…

    brodock authored
    …e_vars already catch all situations where something may not exist.
  4. @brodock

    Better algorithm to deal with encodings. Moved fallback rescue messag…

    brodock authored
    …e from view to encode library.
    
    This helps fix cases where UTF-8 is wrongly identified as ISO-8859-1. We will only try to convert strings if we are 100% sure about the charset, otherwise, we will fallback to UTF-8.
Commits on May 27, 2012
  1. @brodock
This page is out of date. Refresh to see the latest.
View
14 app/controllers/refs_controller.rb
@@ -1,4 +1,5 @@
class RefsController < ApplicationController
+ include Gitlabhq::Encode
before_filter :project
# Authorize
@@ -43,23 +44,26 @@ def tree
no_cache_headers
end
end
- rescue
- return render_404
end
def blob
if @tree.is_blob?
+ if @tree.text?
+ encoding = detect_encoding(@tree.data)
+ mime_type = encoding ? "text/plain; charset=#{encoding}" : "text/plain"
+ else
+ mime_type = @tree.mime_type
+ end
+
send_data(
@tree.data,
- :type => @tree.text? ? "text/plain" : @tree.mime_type,
+ :type => mime_type,
:disposition => 'inline',
:filename => @tree.name
)
else
head(404)
end
- rescue
- return render_404
end
def blame
View
2  app/views/commits/_commit.html.haml
@@ -8,7 +8,7 @@
%strong.cgray= commit.author_name
&ndash;
= image_tag gravatar_icon(commit.author_email), :class => "avatar", :width => 16
- %span.row_title= truncate(commit.safe_message, :length => 50) rescue "--broken encoding"
+ %span.row_title= truncate(commit.safe_message, :length => 50)
%span.right.cgray
= time_ago_in_words(commit.committed_date)
View
4 app/views/refs/_tree.html.haml
@@ -42,9 +42,9 @@
.readme
- if content.name =~ /\.(md|markdown)$/i
= preserve do
- = markdown(content.data.force_encoding('UTF-8'))
+ = markdown(content.data.detect_encoding!)
- else
- = simple_format(content.data.force_encoding('UTF-8'))
+ = simple_format(content.data.detect_encoding!)
- if params[:path]
- history_path = tree_file_project_ref_path(@project, @ref, params[:path])
View
2  app/views/refs/_tree_file.html.haml
@@ -13,7 +13,7 @@
#tree-readme-holder
.readme
= preserve do
- = markdown(file.data.force_encoding('UTF-8'))
+ = markdown(file.data.detect_encoding!)
- else
.view_file_content
- unless file.empty?
View
21 lib/gitlabhq/encode.rb
@@ -1,3 +1,6 @@
+# Patch Strings to enable detect_encoding! on views
+require 'charlock_holmes/string'
+
module Gitlabhq
module Encode
extend self
@@ -5,16 +8,26 @@ module Encode
def utf8 message
return nil unless message
- hash = CharlockHolmes::EncodingDetector.detect(message) rescue {}
- if hash[:encoding]
- CharlockHolmes::Converter.convert(message, hash[:encoding], 'UTF-8')
+ detect = CharlockHolmes::EncodingDetector.detect(message) rescue {}
+
+ # It's better to default to UTF-8 as sometimes it's wrongly detected as another charset
+ if detect[:encoding] && detect[:confidence] == 100
+ CharlockHolmes::Converter.convert(message, detect[:encoding], 'UTF-8')
else
message
end.force_encoding("utf-8")
+
# Prevent app from crash cause of
# encoding errors
rescue
- ""
+ "--broken encoding: #{encoding}"
+ end
+
+ def detect_encoding message
+ return nil unless message
+
+ hash = CharlockHolmes::EncodingDetector.detect(message) rescue {}
+ return hash[:encoding] ? hash[:encoding] : nil
end
end
end
Something went wrong with that request. Please try again.