Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Search is broken if there are images in repo #24

Closed
nimag42 opened this issue Jan 29, 2017 · 11 comments
Closed

Search is broken if there are images in repo #24

nimag42 opened this issue Jan 29, 2017 · 11 comments
Labels

Comments

@nimag42
Copy link

nimag42 commented Jan 29, 2017

Hi.

If there's an image in the repository, any research will result in an error "Invalid bytes in UTF-8".

This is caused because the search tries to encode line of files in utf-8 before trying to match a regexp, but as image are binary files, they can not be encoded so it throws an error.

It's line 170 in git_layer_rugged.rb

@gerwitz
Copy link

gerwitz commented Feb 27, 2017

bd2bd8e should be reverted.

@nimag42
Copy link
Author

nimag42 commented Feb 27, 2017

Well, I'm not sure, but if you revert this, won't you break research containing utf-8 stuff ?
For my use I just did the following :

@@ -167,7 +167,7 @@ module Gollum
          blob = @repo.lookup(entry[:oid])
          count = 0
          blob.content.each_line do |line|
-            next unless line.force_encoding("UTF-8").match(/#{Regexp.escape(query)}/i)
+           next unless line.force_encoding("UTF-8").scrub().match(/#{Regexp.escape(query)}/i)
            count += 1
          end
          path = options[:path] ? ::File.join(options[:path], root, entry[:name]) : "#{root}#{entry[:name]}"

I am not an expert in Ruby, and this is clearly an obvious hack since when someone search something, and if the image contain (when bytes have been converted in utf8) the text, it's showed in the results too. It's why I did'nt started a PR :)
Something more nicer would be to check if the file is binary or not before trying searching on it, but I don't know how to do so.

@gerwitz
Copy link

gerwitz commented Feb 28, 2017

Your hack is slight improvement to the prior hack, and of course the real solution is to address encoding across the project.

But at the moment we have a release with a new, broken behavior compared to the prior release, which is preventing users from upgrading. Better to revert to the old behavior until someone can address the issue properly.

(I wish I was qualified to do so. My next goal is to learn implementing proper tests, so maybe I can contribute to the gollum projects in that way.)

@dometto
Copy link
Member

dometto commented Mar 3, 2017

Yes, I think we should unfortunately revert, and suffer the problem that UTF8 doesn't work for a little while longer. Also, we should see if we can implement a spec testing search functionality in the presense of binary files/images.

@dometto
Copy link
Member

dometto commented Mar 9, 2017

@bartkamphorst do you agree with the revert, or do you want to revert and come up with a different fix for unicode at the same time?

gerwitz added a commit to TheArtificial/rails-wiki that referenced this issue Feb 2, 2018
gerwitz added a commit to TheArtificial/gollum-lib that referenced this issue Feb 2, 2018
@jvstein
Copy link

jvstein commented Oct 13, 2019

This seems to have regressed in 066e98d

2019-10-13 20:48:36 - LocalJumpError - no block given (yield):
        /usr/local/bundle/gems/gollum-rugged_adapter-0.4.4/lib/rugged_adapter/git_layer_rugged.rb:178:in `block in grep'
        /usr/local/bundle/gems/rugged-0.28.3.1/lib/rugged/tree.rb:175:in `block in walk_blobs'
        /usr/local/bundle/gems/rugged-0.28.3.1/lib/rugged/tree.rb:175:in `walk'
        /usr/local/bundle/gems/rugged-0.28.3.1/lib/rugged/tree.rb:175:in `walk_blobs'
        /usr/local/bundle/gems/gollum-rugged_adapter-0.4.4/lib/rugged_adapter/git_layer_rugged.rb:174:in `grep'
        /usr/local/bundle/gems/gollum-lib-4.2.10/lib/gollum-lib/wiki.rb:645:in `search'

@dometto
Copy link
Member

dometto commented Oct 13, 2019

@jvstein are you sure it's failing on the presence images? The adapter should be skipping binary files. Also the error message LocalJumpError - no block given (yield) does not seem to point in that direction. Also, can you give us the output of gollum --versions?

@dometto
Copy link
Member

dometto commented Oct 13, 2019

Hmm, on second thought, it looks like it may not be properly skipping binary files (we would have to do a next if blob.binary? instead). @jvstein if you're positive it's a problem with images, it would be great if you could produce a regression test against this!

@jvstein
Copy link

jvstein commented Oct 13, 2019

@dometto It could be either in my repo. I've got both images and other binary files like pdfs. I tested against cc33a24 and verified the problem doesn't exist there.

Tests are here? https://github.com/gollum/adapter_specs

@jvstein
Copy link

jvstein commented Oct 13, 2019

Nope, I misdiagnosed. Not caused by binaries.

$ gollum --version
Gollum 4.1.4

@dometto
Copy link
Member

dometto commented Oct 14, 2019

@jvstein there's your problem -- you're using rugged adapter of master with gollum 4.x. See here for info on 5.x

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants