Workaround for Yajl encoding problems #69
Thanks a lot for this. Before pulling, however, I'd like to know what the actual problem is, as this change is more likely to result in large strings being created (if I am not mistaken). |
As a reminder for myself: http://bugs.ruby-lang.org/issues/2313 |
…coding_workaround
Hi @rogerbraun, sorry for dragging my feet on this one. Do you have a minimal failing case for this, perhaps? |
I can confirm this bug, and that fix works. |
Thanks @BadMinus. My point here is that I first need to understand why it doesn't (and does) work before this goes into Picky. So if you can help me (us) with understanding the issue, that would be great :) I'd pull this in in a second, but it raises new issues with big indexes, sadly. The thing is, most current pull requests or issues are related to encoding. And Picky itself does not do much with encodings, deliberately. So I'd like to know why and how Rails triggers this problem. And with that information, solve it in the best way possible. What do you guys think? |
I don't know where the problem is either. I tried many possible solutions in Rails and in Yajl, but only came up with this:
def dump_json internal
  Yajl::Encoder.encode(internal) do |chunk|
    ::File.open(cache_path, 'w') do |out_file|
      out_file.write chunk.force_encoding Encoding::UTF_8
    end
  end
end
How will that work with big indexes? |
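On the question of big indexes, here is a minimal sketch of the difference between reopening the file for every chunk and opening it once. It assumes the encoder yields more than one chunk for a large index; the `chunks` array is a hypothetical stand-in for those chunks, not real Yajl output.

```ruby
require 'tmpdir'

# Hypothetical stand-in for the chunks a streaming encoder yields.
chunks = ['{"ids":', '[1,2,3]', '}']
dir = Dir.mktmpdir

# Reopening with 'w' for every chunk truncates the file each time,
# so only the last chunk survives.
reopened_path = File.join(dir, 'reopened.json')
chunks.each do |chunk|
  File.open(reopened_path, 'w') { |out_file| out_file.write chunk }
end
reopened_result = File.read(reopened_path)

# Opening once and writing each chunk keeps the whole document
# without first building one large string in memory.
streamed_path = File.join(dir, 'streamed.json')
File.open(streamed_path, 'w:utf-8') do |out_file|
  chunks.each do |chunk|
    out_file.write chunk.dup.force_encoding(Encoding::UTF_8)
  end
end
streamed_result = File.read(streamed_path)
```

If the index fits into a single chunk, both variants write the same file, which may be why the per-chunk version appears to work for small indexes.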
I'm currently trying to come up with a minimal failing test case: 2c50a3b |
@rogerbraun @BadMinus What version of Rails are you guys using? |
With my code I looked at the chunks that Yajl encodes: at the beginning the encoding is ASCII-8BIT, but the next chunk in the same file is UTF-8.
puts cache_path
Yajl::Encoder.encode(internal) do |chunk|
  puts chunk.encoding
  puts chunk
  puts chunk.encoding
  ::File.open(cache_path, 'w:utf-8') do |out_file|
    chunk = chunk.to_s.force_encoding Encoding::UTF_8
    out_file.write chunk
  end
end
|
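To illustrate why chunks tagged with different encodings can cause trouble, here is a small sketch in plain Ruby. It does not use Yajl; the two strings are made-up stand-ins for an ASCII-8BIT chunk and a UTF-8 chunk that both contain non-ASCII bytes.

```ruby
# Both strings hold UTF-8 bytes, but the first is tagged as binary (ASCII-8BIT).
binary_chunk = "Z\u00FCrich".b
utf8_chunk   = "Gen\u00E8ve"

# Combining them fails because each side contains non-ASCII bytes
# under a different encoding tag.
error = nil
begin
  binary_chunk + utf8_chunk
rescue Encoding::CompatibilityError => e
  error = e
end

# force_encoding only changes the tag, not the bytes, so the
# retagged chunk can be combined with the UTF-8 one.
retagged = binary_chunk.dup.force_encoding(Encoding::UTF_8)
combined = retagged + utf8_chunk
```

This only works when the bytes really are valid UTF-8, which `retagged.valid_encoding?` can confirm.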
I just found another solution.
# backends/prepared/text.rb
def retrieve
  id = nil
  token = nil
  ::File.open(cache_path, 'r:utf-8') do |file|
    file.read
    file.each_line do |line|
      id, token = line.split ?,, 2
      yield id, (token.chomp! || token)
    end
  end
end |
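One thing worth checking in a snippet like this is the `file.read` call before `each_line`. The sketch below uses a hypothetical two-line file in the `id,token` format and a simplified `retrieve(path)` helper (not Picky's actual method) to show that reading with an explicit `r:utf-8` mode yields UTF-8 tokens, and that `each_line` has nothing left to yield once `read` has consumed the file.

```ruby
require 'tempfile'

# Hypothetical prepared file in "id,token" format, written as raw bytes.
file = Tempfile.new(['prepared', '.txt'])
file.binmode
file.write "1,z\u00FCrich\n2,bern\n"
file.close
path = file.path

# Simplified helper for illustration only.
def retrieve(path)
  File.open(path, 'r:utf-8') do |f|
    f.each_line do |line|
      id, token = line.split ',', 2
      yield id, token.chomp
    end
  end
end

pairs = []
retrieve(path) { |id, token| pairs << [id, token] }

# After read the position is at the end of the file,
# so each_line yields no lines at all.
lines_after_read = []
File.open(path, 'r:utf-8') do |f|
  f.read
  f.each_line { |line| lines_after_read << line }
end
```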
@BadMinus Regarding your latest comment: Where exactly is the problem here? (I'm not sure I understand) |
So, to restate – the original problem occurred with Picky inside Rails.
|
I used Rails 3.2.0. If I remember correctly, the problem only occurred on OS X with Picky running inside Rails. I'm sorry I can't help much more, I 'solved' this problem by using a separate Picky server and don't really know how to reproduce it. |
@rogerbraun No worries. I am wondering though: Why did you conclude it is Yajl when the same code worked when not using it within Rails? |
I have no idea... |
No worries. |
I'm getting closer to the issue. Rails explicitly sets
Encoding.default_external = Encoding::UTF_8
Encoding.default_internal = Encoding::UTF_8
while Picky doesn't. I'm considering doing this to finally squash the encoding issues. |
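As a rough illustration of what the external default changes, the sketch below reads the same bytes under two different `Encoding.default_external` settings. The file name and content are made up, and the defaults are restored afterwards; this is not how Picky or Rails set things up, only a demonstration of the effect.

```ruby
require 'tempfile'

# Write raw UTF-8 bytes so the file content does not depend on the environment.
file = Tempfile.new('encoding_demo')
file.binmode
file.write "Z\u00FCrich"
file.close
path = file.path

previous_external = Encoding.default_external
previous_internal = Encoding.default_internal
begin
  # Without an internal default, reads are not transcoded,
  # so only the external default matters here.
  Encoding.default_internal = nil

  Encoding.default_external = Encoding::US_ASCII
  read_as_ascii = File.read(path)

  Encoding.default_external = Encoding::UTF_8
  read_as_utf8 = File.read(path)
ensure
  Encoding.default_external = previous_external
  Encoding.default_internal = previous_internal
end
```

The bytes are identical in both cases; only the encoding tag on the resulting string differs, and under US-ASCII the string is not valid.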
Any progress?
def dump_json internal
  ::File.open(cache_path, 'w') do |out_file|
    ::JSON.dump internal, out_file
    #Yajl::Encoder.encode internal, out_file
  end
end
but solution with |
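For reference, a self-contained sketch of the stdlib variant: `JSON.dump` accepts an IO as its second argument and writes to it directly. The `internal` hash and the temp file path are illustrative assumptions, not Picky's actual index data or cache path.

```ruby
require 'json'
require 'tmpdir'

# Illustrative index data with a non-ASCII key.
internal = { "z\u00FCrich" => [1, 2], "bern" => [3] }
cache_path = File.join(Dir.mktmpdir, 'index.json')

# Dump straight into the file, declaring UTF-8 as its encoding.
File.open(cache_path, 'w:utf-8') do |out_file|
  JSON.dump internal, out_file
end

# Read it back with an explicit encoding and parse it.
loaded = JSON.parse(File.read(cache_path, encoding: 'utf-8'))
```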
@BadMinus 0-1 second? Are you indexing your mp3 collection? ;) Some progress information: Picky in its current version has UTF8 as internal encoding, and an external encoding whatever is set by ENV variables etc. This is the basic encoding state of Ruby. It is sensible in that if you have a strange encoding set in your env, anything incoming will be transcoded into the internal encoding. I am currently experimenting with setting the external encoding to UTF8, like Rails does. Sorry about all this! |
@floere Is that fast or not? I don't understand :) I'm indexing around 100 simple objects. The solution works, so it's OK. I just wanted to help and to understand how Picky works. It's still hard for me to understand. |
@BadMinus It's OK, there's a bit of overhead on an indexing run, so even 1 second is fine. Don't get me wrong, I appreciate your help a lot! :) If you have questions about Picky, don't hesitate to ask. It is a big framework, after all, but I hope it is structured well enough to be readable. |
I believe the solution is to open the caching file in binary mode? picky/backends/memory/json.rb: |
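A minimal sketch of what binary mode would mean here, assuming a made-up JSON string and temp file: in `'wb'` mode Ruby writes the bytes as they are, without transcoding, and the reader then decides how to tag them.

```ruby
require 'tmpdir'

# Made-up JSON payload containing a non-ASCII character.
json = "{\"z\u00FCrich\":[1,2]}"
cache_path = File.join(Dir.mktmpdir, 'index.json')

# Binary mode writes the bytes unchanged, whatever the string's encoding tag.
File.open(cache_path, 'wb') { |out_file| out_file.write json }

# Reading in binary mode returns an ASCII-8BIT string with the same bytes.
raw = File.open(cache_path, 'rb') { |in_file| in_file.read }

# Reading with an explicit encoding tags the same bytes as UTF-8.
text = File.read(cache_path, encoding: 'utf-8')
```

The trade-off is that the encoding decision moves to whoever reads the file back.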
Sadly, I cannot reproduce the original problem. See https://github.com/floere/picky/blob/cff4f81264e0171158334619c4850fd58d082715/server/spec/functional/backends/memory_json_utf8_spec.rb#L16-34 for details. Both internal and external encoding are set to the Rails defaults. If I forcefully encode, I get the error mentioned above, but not when saving with Yajl. |
I'm now very interested in: How do you load the data to be indexed, @rogerbraun? Thanks for any info! |
Hi @rogerbraun, @BadMinus, @gokulj (@kschiess), I've just released Picky 4.5.0 – it uses the |
I don't really know what triggers this, but with Picky integrated in my Rails app, I get this error when dumping the index:
This seems to be an error in Yajl, as changing the way the file is written makes it work. This is a workaround for this problem.