`split': invalid byte sequence in UTF-8 (ArgumentError) #188

Closed
jatinganhotra opened this issue Oct 26, 2014 · 7 comments

Comments

@jatinganhotra

Hi,

For a project, I'm storing the diff for each commit as difffiles (one diff per file) and patches, along with the stats for each patch.

My script starts like this:

require 'rubygems'
require 'logger'
require 'rugged'
require 'git'

# Include diff, difffile classes
load 'diff.rb'
load 'difffile.rb'
load 'helper.rb'

# Global array to store all the diffs
diffs_array = []

# Initialize both libraries
working_dir = `pwd`.chomp
rubygit_gem_repo = Git.open(working_dir, :log => Logger.new(STDOUT))
rugged_repo = Rugged::Repository.new(working_dir)

# Get all the commits for the project
commit_list = rubygit_gem_repo.log(nil)

# Get the initial empty tree state
empty_tree=`git hash-object -w -t tree /dev/null`
empty_state = rugged_repo.lookup("#{empty_tree.chomp}")

commit_list_array = commit_list.to_a

For 2 commits, I calculate the diff as follows:

  diff_bw_commits = rubygit_gem_repo.diff(prev_sha, next_sha)
  diff = Diff.new(prev_sha, next_sha, diff_bw_commits)
  diff.generate_difffiles_and_stats

In the generate_difffiles_and_stats function, I'm doing the following:

@difffiles = []
# diff.class => Git::Diff
# Get the stats for the diff, before extracting individual difffiles
@stats = @diff.stats
diff = @diff.to_a

self.generate_stats
@num_difffiles = diff.size
@num_difffiles.times do |i|
  difffile = DiffFile.new( diff[i] )
  @difffiles << difffile
end

My script runs fine on simple commit histories that I created myself, but when I run it on the JSHint project, I get this error:

</CreationDate(D:20120619174250-04'00')/Creator(Adobe Illustrator CS5.1)/ModDat0000000000 65535 f-04'00')/Producer(Adobe PDF library 9.90)/Title(jshint)>>
+0000000016 00000 n
+0000000144 00000 n
+0001070928 00000 n
<</Size 32/Root 1 0 R/Info 31 0 R/ID[<6BDD672972174366B9A561E955D8F759><CE113017%%EOF00efBA27BDB89A4EB25>]>>
\ No newline at end of file
/Users/jatinganhotra/.rvm/gems/ruby-2.1.3@527project/gems/git-1.2.8/lib/git/diff.rb:121:in `split': invalid byte sequence in UTF-8 (ArgumentError)
    from /Users/jatinganhotra/.rvm/gems/ruby-2.1.3@527project/gems/git-1.2.8/lib/git/diff.rb:121:in `process_full_diff'
    from /Users/jatinganhotra/.rvm/gems/ruby-2.1.3@527project/gems/git-1.2.8/lib/git/diff.rb:107:in `process_full'
    from /Users/jatinganhotra/.rvm/gems/ruby-2.1.3@527project/gems/git-1.2.8/lib/git/diff.rb:64:in `each'
    from diff.rb:57:in `to_a'
    from diff.rb:57:in `generate_difffiles_and_stats'
    from script.rb:48:in `block in <main>'
    from script.rb:32:in `each'
    from script.rb:32:in `<main>'

I researched the error and found that it can be fixed with the approach in this StackOverflow answer.
Is it something that I am doing wrong?
Please let me know if you need any more information.
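
For reference, the failure can be reproduced outside the gem. The sketch below assumes the diff output contains raw binary bytes (as the PDF fragment above suggests) and shows one common workaround, `String#scrub`, which is available from Ruby 2.1. The sample string and the `safe_lines` helper are hypothetical and not part of the git gem, and this is not necessarily the approach in the linked answer.

```ruby
# Hypothetical sample: a diff-like string containing one byte (0xFF) that is
# not valid UTF-8, standing in for the binary PDF content in the JSHint diff.
raw = "+text line\n+\xFF binary\n".dup.force_encoding(Encoding::UTF_8)

puts raw.valid_encoding? # false: the string is tagged UTF-8 but is not valid UTF-8

begin
  # On the Ruby versions in this thread, splitting a broken string raises
  # ArgumentError (invalid byte sequence in UTF-8).
  raw.split("\n")
rescue ArgumentError => e
  puts "split failed: #{e.message}"
end

# Hypothetical helper (not part of the git gem): replace invalid bytes with
# "?" before splitting, so the split works on a valid UTF-8 string.
def safe_lines(str)
  str.scrub("?").split("\n")
end

lines = safe_lines(raw)
puts lines.length
```

Scrubbing is lossy: the replaced bytes cannot be recovered, so this suits line-level inspection and stats rather than reconstructing a binary patch.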

P.S. I know that this gem is not under active development, but it very nicely breaks down a Diff to Difffile to Patch. I can also easily access stats for each commit and stats per file. So, I stick to using it for diff purposes. I looked at the Rugged gem, but couldn't find such functionality. So, I just love this gem for this :)

@robertodecurnex
Contributor

I must revisit encoding urgently. There are several problems with it.

Thank you for the detailed info!

@jatinganhotra
Author

Thanks! 👍 Let me know if you need any info

@jatinganhotra
Author

@robertodecurnex Did you get time to look at the encoding issue?
If not, could you suggest a workaround I could apply on my end to avoid the issue?

I can also use Rugged for the diff, but the Rugged API doesn't provide stats for a particular diff and I need both stats and diff patch for each diff.

@matthutchinson

Over in the lolcommits repo we're having a similar problem - but in a slightly different place, when reading the commit logs in the command_lines method.

/Users/henrikhansson/.rvm/rubies/ruby-2.1.2/lib/ruby/gems/2.1.0/gems/git-1.2.9.1/lib/git/lib.rb:846:in `split': invalid byte sequence in US-ASCII (ArgumentError)
from /Users/henrikhansson/.rvm/rubies/ruby-2.1.2/lib/ruby/gems/2.1.0/gems/git-1.2.9.1/lib/git/lib.rb:846:in `command_lines'
from /Users/henrikhansson/.rvm/rubies/ruby-2.1.2/lib/ruby/gems/2.1.0/gems/git-1.2.9.1/lib/git/lib.rb:145:in `full_log_commits'
from /Users/henrikhansson/.rvm/rubies/ruby-2.1.2/lib/ruby/gems/2.1.0/gems/git-1.2.9.1/lib/git/log.rb:119:in `run_log'
from /Users/henrikhansson/.rvm/rubies/ruby-2.1.2/lib/ruby/gems/2.1.0/gems/git-1.2.9.1/lib/git/log.rb:112:in `check_log'
from /Users/henrikhansson/.rvm/rubies/ruby-2.1.2/lib/ruby/gems/2.1.0/gems/git-1.2.9.1/lib/git/log.rb:89:in `first'

I think the same approach as PR #190 from @jatinganhotra would probably fix this too.
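
The US-ASCII variant of the error suggests the command output was tagged as US-ASCII while containing non-ASCII bytes, possibly because of the process locale (an assumption, since the environment is not shown here). A minimal sketch of normalizing such a string before splitting follows. The sample log text and the `normalize_to_utf8` helper are hypothetical, and this is a generic illustration rather than the contents of PR #190.

```ruby
# Hypothetical sample: UTF-8 bytes for an accented author name, but tagged
# as US-ASCII, as could happen when command output is read under a C locale.
log_output = "Author: Ren\u00E9\nfix: encoding\n".b.force_encoding(Encoding::US_ASCII)

puts log_output.valid_encoding? # false: bytes above 0x7F are not valid US-ASCII

# Hypothetical helper (not part of the git gem): re-tag the bytes as UTF-8
# without changing them, then replace anything still invalid with "?".
def normalize_to_utf8(str)
  str.dup.force_encoding(Encoding::UTF_8).scrub("?")
end

lines = normalize_to_utf8(log_output).split("\n")
puts lines.inspect
```

`force_encoding` only changes the encoding label, so valid UTF-8 commit messages survive intact; `scrub` then only affects bytes that are not valid UTF-8.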

@jatinganhotra
Author

Yes. The same approach in PR #190 would work. I thought that the PR had been accepted, after I fixed the Ruby 1.8 syntax issue.

@matthutchinson

I think it's no longer mergeable due to conflicts; maybe rebase onto upstream master and push again.

@stale

stale bot commented Apr 2, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Apr 2, 2018
@stale stale bot closed this as completed Apr 9, 2018