Strip invalid byte sequences from input
darcs changes --xml might contain invalid UTF-8 byte sequences,
which breaks XML parsing [http://bugs.darcs.net/issue64].
kerneis authored and purcell committed Mar 9, 2010
1 parent a19b75d commit e6ddd9e
Showing 1 changed file with 7 additions and 1 deletion.
8 changes: 7 additions & 1 deletion darcs-to-git
@@ -14,6 +14,7 @@ require 'rexml/document'
 require 'optparse'
 require 'yaml'
 require 'pathname'
+require 'iconv'

 # Explicitly setting a time zone would cause darcs to only output in
 # that timezone hence we couldn't get the actual patch TZ
@@ -112,11 +113,16 @@ def run(*args)
   system(*args) || raise("Failed to run: #{args.inspect}")
 end

+# cf. Paul Battley, http://po-ru.com/diary/fixing-invalid-utf-8-in-ruby-revisited/
+def validate_utf8(s)
+  return Iconv.iconv('UTF-8//IGNORE', 'UTF-8', (s + ' ') ).first[0..-2]
+end
+
 def output_of(*args)
   puts "Running: #{args.inspect}"
   output = IO.popen(args.map {|a| "'#{a}'"}.join(' '), 'r') { |p| p.read }
   if $?.exitstatus == 0
-    return output
+    return validate_utf8(output)
   else
     raise "Failed to run: #{args.inspect}"
   end
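
Note for readers on current Ruby: the `iconv` library used by `validate_utf8` was removed from the standard library in Ruby 2.0, so `require 'iconv'` only works if the separate gem is installed. The following is a minimal sketch of the same idea (dropping invalid UTF-8 byte sequences from command output) using `String#scrub`, assuming Ruby 2.1 or later. The name `validate_utf8_scrub` and the sample input are hypothetical and not part of this commit.

```ruby
# Hypothetical modern counterpart to validate_utf8 from the diff above.
# Assumes Ruby >= 2.1, where String#scrub is available.
def validate_utf8_scrub(s)
  # Reinterpret the raw bytes as UTF-8 (no transcoding), then remove
  # every byte sequence that is not valid UTF-8.
  s.dup.force_encoding(Encoding::UTF_8).scrub('')
end

# Sample input (an assumption for illustration): a valid two-byte "é"
# followed by a stray 0xFF byte, which can never occur in valid UTF-8.
raw = "caf\xC3\xA9 \xFF ok".b
clean = validate_utf8_scrub(raw)

puts clean.valid_encoding?
puts clean.inspect
```

Unlike the Iconv call, this variant does not need the trick of appending a space and stripping it again, because `scrub` also handles an invalid sequence at the very end of the string.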