Rails not reading the first char of fixture name. #25303

Closed
vitor-mariano opened this issue Jun 6, 2016 · 37 comments

@vitor-mariano

Hey, guys. I'm getting a very strange error in my fixtures.

I generated a model called User in the usual way:

rails generate model User

But when I run my tests

rake test

it returns the error

Errno::ENOENT: No such file or directory @ rb_sysopen - /Users/matheus/Documents/Código/Bemind/apiservice/test/fixtures/sers.yml

Rails isn't reading the first character of my fixture name (it looks for sers.yml instead of users.yml).

Here is the terminal log.

# matheus at MacBook-Pro.local in ~/Documents/Código/Bemind/bmp1-api-service on git:master ● [14:14:43]
→ rails generate model User
Running via Spring preloader in process 4762
      invoke  active_record
      create    db/migrate/20160606171450_create_users.rb
      create    app/models/user.rb
      invoke    test_unit
      create      test/models/user_test.rb
      create      test/fixtures/users.yml

# matheus at MacBook-Pro.local in ~/Documents/Código/Bemind/bmp1-api-service on git:master ✖︎ [14:14:50]
→ rake db:drop && rake db:create && rake db:migrate
== 20160606171450 CreateUsers: migrating ======================================
-- create_table(:users)
   -> 0.0928s
== 20160606171450 CreateUsers: migrated (0.0930s) =============================


# matheus at MacBook-Pro.local in ~/Documents/Código/Bemind/bmp1-api-service on git:master ✖︎ [14:16:13]
→ rake test                                        
Running via Spring preloader in process 4948
Run options: --seed 45192

# Running:

E

Finished in 0.010359s, 96.5334 runs/s, 0.0000 assertions/s.

  1) Error:
UserTest#test_should_refuse_empty_name:
Errno::ENOENT: No such file or directory @ rb_sysopen - /Users/matheus/Documents/Código/Bemind/bmp1-api-service/test/fixtures/sers.yml


1 runs, 0 assertions, 0 failures, 1 errors, 0 skips

Ruby version: ruby 2.3.1p112 (2016-04-26 revision 54768) [x86_64-darwin15]
Rails version: 4.2.6

@amrani

amrani commented Jun 6, 2016

@MatheusMariano can you share your users.yml?

@vitor-mariano
Author

vitor-mariano commented Jun 6, 2016

Yes, sure. I made no changes to this file.

users.yml

# Read about fixtures at http://api.rubyonrails.org/classes/ActiveRecord/FixtureSet.html

# This model initially had no columns defined.  If you add columns to the
# model remove the '{}' from the fixture names and add the columns immediately
# below each fixture, per the syntax in the comments below
#
one: {}
# column: value
#
two: {}
#  column: value

@amrani

amrani commented Jun 6, 2016

Thanks, can you also show your user_test.rb?

@vitor-mariano
Author

user_test.rb

require 'test_helper'

class UserTest < ActiveSupport::TestCase
  test "should refuse empty name" do
    user = User.create email: "john@me.com"
    invalid_name = user.errors.messages.key? :name
    assert invalid_name, "User accepted without name"
  end
end

@vitor-mariano
Author

Could this problem be caused by special characters in parent folders? I've noticed the problem only occurs inside the folder Código, which has the special character ó.

@matthewd
Member

matthewd commented Jun 6, 2016

@MatheusMariano are you able to confirm that temporarily renaming the folder makes it work?

@vitor-mariano
Author

@matthewd Yes. I've changed my path from
~/Documents/Código/Bemind/apiservice
to
~/Documents/Code/Bemind/apiservice
and now it works.

@amrani

amrani commented Jun 7, 2016

ó is actually of length 2, which is causing the first letter to be cut off (see below). I submitted a PR that converts the characters before calculating the length.

fixture_set_names.map! { |f| f[(fixture_path.to_s.size + 1)..-5] }
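As a sketch of how that slice goes wrong, assuming (as later comments in this thread establish) that fixture_path and the globbed file names spell "ó" in different normalization forms, the off-by-one can be reproduced with plain strings. The paths below are hypothetical, and String#unicode_normalize requires Ruby 2.2 or later.

```ruby
# Hypothetical paths mirroring the report: the base directory carries a
# decomposed "ó" (o + U+0301), the globbed file a precomposed one (U+00F3).
fixture_path = "/tmp/Co\u0301digo/test/fixtures"
globbed      = "/tmp/C\u00f3digo/test/fixtures/users.yml"

# Same slicing as above: skip the base dir plus "/", then drop ".yml".
# fixture_path is one character longer, so the first letter is lost.
p globbed[(fixture_path.size + 1)..-5] # => "sers"

# Normalizing both sides to NFC first makes the character counts agree.
base = fixture_path.unicode_normalize(:nfc)
p globbed.unicode_normalize(:nfc)[(base.size + 1)..-5] # => "users"
```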

@fxn
Member

fxn commented Jun 7, 2016

The length of a string is logical; the number of bytes doesn't matter.

Maybe an encoding issue related to paths?

@fxn
Member

fxn commented Jun 7, 2016

I wonder if this may be related to normalization. For example, "À" can be encoded in two different ways:

"\u00c0"       # => "À"
"\u0041\u0300" # => "À"

These have different logical lengths because the second one is printed as the combination of two characters:

"\u00c0".size       # => 1
"\u0041\u0300".size # => 2

and of course they consist of different bytes:

"\u00c0".bytes       # => [195, 128]
"\u0041\u0300".bytes # => [65, 204, 128]

So it could be the case that Dir#[] is returning strings with a different normalization than the one in fixture_path.

Could you please inspect the bytes in each of them to see if it is the case?
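For reference, Ruby 2.2+ can convert between the two forms with String#unicode_normalize, which gives a quick way to produce either byte sequence when inspecting paths. A minimal sketch:

```ruby
composed   = "\u00c0"       # À as a single precomposed code point (NFC)
decomposed = "\u0041\u0300" # À as A + combining grave accent (NFD)

p composed == decomposed                   # => false (different bytes)
p decomposed.unicode_normalize(:nfc).bytes # => [195, 128]
p composed.unicode_normalize(:nfd).bytes   # => [65, 204, 128]
```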

@amrani

amrani commented Jun 7, 2016

@fxn
fixture_path before setting Dir:
ó.bytes -> [111, 204, 129]

fixture_path after setting Dir:
ó.bytes -> [195, 179]

@fxn
Member

fxn commented Jun 7, 2016

Just to be sure, you mean you tested fixture_path on one hand, and then the strings that come up from the Dir#[] call (which are longer), and that in the "common" segment they differed in those bytes?

@amrani

amrani commented Jun 7, 2016

Correct. I found the 'common' segment after inspecting the chars in each and grabbing the substring. Below are the chars from each common segment:

#before Dir
ó.chars -> ["o", "́"]
ó.bytes -> [111, 204, 129]
#after Dir
ó.chars -> ["ó"]
ó.bytes -> [195, 179]
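The procedure described above (compare the chars of both strings and look at where they first differ) can be sketched with a small helper. first_divergence is a hypothetical name, and the paths are made up to mirror the reported bytes:

```ruby
# Hypothetical diagnostic helper: index of the first character at which
# +a+ and +b+ differ, or nil if +a+ is a prefix of (or equal to) +b+.
def first_divergence(a, b)
  a.chars.each_with_index do |char, i|
    return i if char != b[i]
  end
  nil
end

base  = "/tmp/Co\u0301digo/test/fixtures"          # decomposed ó
found = "/tmp/C\u00f3digo/test/fixtures/users.yml" # precomposed ó

i = first_divergence(base, found)
p i                # => 6
p base[i, 2].bytes # => [111, 204, 129]  ("o" + combining acute)
p found[i].bytes   # => [195, 179]       (precomposed "ó")
```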

@fxn
Member

fxn commented Jun 7, 2016

Awesome, it squares. Interesting gotcha!

I am at work right now. I saw the PR, but let's think a bit about whether there are other options to make this robust.

@amrani

amrani commented Jun 8, 2016

Dir defaults to the filesystem's encoding type when one isn't given. What about determining the length of the parent directory first, to ensure the same encoding type is used?

path_length = Dir["#{fixture_path}"][0].length
fixture_set_names = Dir["#{fixture_path}/{**,*}/*.{yml}"]
fixture_set_names.map! { |f| f[(path_length + 1)..-5] }

I am not sure what will happen if no fixture path is present (looking into it).

Any other ideas?

@fxn
Member

fxn commented Jun 8, 2016

Another possibility is to explore doing this in a more encapsulated way using Pathname, where you can ask for the path relative to a directory and have a rich API to manipulate file names.

It could be the case that this is also subject to normalization gotchas, though; we'd need to see.
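A sketch of the Pathname idea, assuming both paths are normalized to NFC first. Without that step, Pathname#relative_path_from compares path components lexically, so a composed and a decomposed "Código" would not match and the result would climb out with ".." segments. fixture_set_name is a hypothetical helper, not a Rails API:

```ruby
require "pathname"

# Hypothetical helper (not a Rails API): the fixture set name of +file+
# relative to the fixtures directory +base+, e.g. "users" or "admin/users".
# Both paths are NFC-normalized first because relative_path_from compares
# path components as plain strings.
def fixture_set_name(base, file)
  base_path = Pathname.new(base.unicode_normalize(:nfc))
  file_path = Pathname.new(file.unicode_normalize(:nfc))
  file_path.relative_path_from(base_path).to_s.sub(/\.yml\z/, "")
end

p fixture_set_name("/tmp/Co\u0301digo/test/fixtures",
                   "/tmp/C\u00f3digo/test/fixtures/users.yml") # => "users"
```

Because relative_path_from never touches the file system, the sketch behaves the same on any OS; what it cannot fix is a mismatch it is never shown, so both inputs must go through it.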

@eliasjpr

eliasjpr commented Jun 8, 2016

@fxn and @MatheusMariano Just a thought: I would look into your Mac filesystem encoding settings, since this looks more like an OS encoding issue.

http://stackoverflow.com/questions/9757843/unicode-encoding-for-filesystem-in-mac-os-x-not-correct-in-python

@fxn
Member

fxn commented Jun 8, 2016

@amrani let's dig a little more into this.

  1. In your application, is fixture_path computed by Rails, or is it set via the environment variable FIXTURES_PATH?
  2. Which operating system?

@eliasjpr we suspect it is a UTF-8 normalization issue (see comments above): the normalization in the string stored in fixture_path and the one in the strings that come from the glob are different. This is analogous to what happens in this script:

require 'fileutils'

filename = "\u0041\u0300.test" # À encoded as capital A + combining diacritic `
FileUtils.touch(filename)
from_glob = Dir['*.test'].first

p filename.bytes  # => [65, 204, 128, ...]
p from_glob.bytes # => [195, 128, ...]

They differ because at some point between touching and globbing, the normalization of the filename changes. Indeed, according to these Apple docs, Mac OS X uses a fully decomposed form, which does not seem to be what globbing is returning. There is some yak shaving to do here, I believe.
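A normalization-insensitive comparison side-steps the question of which form the file system or Ruby hands back. This is a sketch under the assumption that both strings hold UTF-8 text; the helper name is hypothetical, and the force_encoding call only relabels the glob result in case the temporary directory path was tagged with a non-Unicode encoding (for example, under a C locale):

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: compare two names after converting both to NFC.
def same_name_ignoring_normalization?(a, b)
  a.unicode_normalize(:nfc) == b.unicode_normalize(:nfc)
end

Dir.mktmpdir do |dir|
  filename = "\u0041\u0300.test" # À as A + combining grave accent (NFD)
  FileUtils.touch(File.join(dir, filename))

  # Glob results carry the pattern's encoding; relabel them as UTF-8.
  from_glob = File.basename(Dir[File.join(dir, "*.test")].first)
                  .force_encoding(Encoding::UTF_8)

  p filename == from_glob # depends on the OS, file system, and Ruby version
  p same_name_ignoring_normalization?(filename, from_glob) # => true
end
```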

We need to understand why the normalizations are different in the case of fixture_path.

@amrani

amrani commented Jun 8, 2016

Here is where the fixture_path is being set: https://github.com/rails/rails/blob/4-2-stable/railties/lib/rails/test_help.rb#L23
self.fixture_path = "#{Rails.root}/test/fixtures/"
Also, I am using Mac OS X.

@fxn
Member

fxn commented Jun 8, 2016

I'll try to reproduce. Which Ruby version? How do you create the directory? Finder? Terminal.app?

@eliasjpr

eliasjpr commented Jun 8, 2016

@fxn I got different results when trying to reproduce your case. Do you have different keyboard settings? :)

I'm very curious about what could be causing this (I mean, obviously).
(screenshot, 2016-06-08 5:46:43 PM)

It would be interesting to see the solution.

@amrani

amrani commented Jun 9, 2016

Ruby 2.2.4p230, and I created the directory through the Finder.

@fxn
Member

fxn commented Jun 9, 2016

I suspect something has changed in Ruby. Look:

I create a file called "À" in a test directory. To do so, I press Option+` and then A. Then I inspect the bytes in the shell like this:

$ echo -n * | od -t uC
0000000    65 204 128                                                    
0000003

If we assume echo(1) and the shell are just passing bytes around, the file system normalizes to a decomposed form. That squares with the Apple docs linked above (in theory the stored form should be independent of the normalization of the filename given as input, and therefore independent of what the Option key combination produces).

Now, what comes up from globbing depends on the version of Ruby:

$ ruby -v -e 'p Dir["*"].first.bytes'
ruby 2.3.1p112 (2016-04-26 revision 54768) [x86_64-darwin15]
[195, 128]

$ ruby -v -e 'p Dir["*"].first.bytes'
ruby 2.2.4p230 (2015-12-16 revision 53155) [x86_64-darwin15]
[195, 128]

$ /usr/bin/ruby -v -e 'p Dir["*"].first.bytes'
ruby 2.0.0p648 (2015-12-16 revision 53162) [universal.x86_64-darwin15]
[65, 204, 128]

I wonder if this may be a consequence of ruby/ruby@1f30b74.
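To tell quickly which form a given Ruby version's globbing returns, String#unicode_normalized? (Ruby 2.2+) can classify a string. The helper below is a hypothetical diagnostic; pure-ASCII names count as both forms, and a string can also mix the two:

```ruby
# Hypothetical diagnostic: which Unicode normalization form a UTF-8
# string is in. ASCII-only strings are both NFC and NFD.
def normalization_form(str)
  nfc = str.unicode_normalized?(:nfc)
  nfd = str.unicode_normalized?(:nfd)
  if nfc && nfd
    :both
  elsif nfc
    :nfc
  elsif nfd
    :nfd
  else
    :mixed
  end
end

p normalization_form("\u00c0")             # => :nfc
p normalization_form("\u0041\u0300")       # => :nfd
p normalization_form("\u00c0\u0041\u0300") # => :mixed
```

Assuming the names are UTF-8, calling it on each entry of Dir["*"] in the test directory reports which form globbing returned for the "À" file.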

@fxn
Member

fxn commented Jun 9, 2016

@nobu does it ring a bell? The gist of the issue is:

A string (fixture_path) has a diacritic in decomposed form. This string is interpolated in the pattern of a dir glob as a base dir, and then the strings that come up from globbing use precomposed characters.

The OS (Mac OS X) seems to be storing things using a decomposed normalization:

$ echo -n * | od -t uC
0000000    65 204 128                                                    
0000003

and starting with Ruby 2.2, globbing does not seem to be transparent in that sense:

$ ruby -v -e 'p Dir["*"].first.bytes'
ruby 2.2.4p230 (2015-12-16 revision 53155) [x86_64-darwin15]
[195, 128]

$ /usr/bin/ruby -v -e 'p Dir["*"].first.bytes'
ruby 2.0.0p648 (2015-12-16 revision 53162) [universal.x86_64-darwin15]
[65, 204, 128]

Could be related to ruby/ruby@1f30b74 perhaps?

@fxn
Member

fxn commented Jun 9, 2016

Note that fixture_path comes from the file system as well (via Rails.root).

I notice some other inconsistencies that may be related. Within a directory called "À" Dir.pwd returns a composed form, while expand_path returns a decomposed one:

À $ ruby -e 'p Dir.pwd.bytes'
[47, 85, 115, 101, 114, 115, 47, 102, 120, 110, 47, 116, 109, 112, 47, 195, 128]

À $ ruby -e 'p File.expand_path(".").bytes'
[47, 85, 115, 101, 114, 115, 47, 102, 120, 110, 47, 116, 109, 112, 47, 65, 204, 128]

@fxn
Member

fxn commented Jun 9, 2016

To remove the shell from the equation I have written this C program that creates a file called "À" using the precomposed code point, and then prints the bytes returned by readdir() for that file. Yeah, they come up decomposed.

#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <dirent.h>
#include <string.h>

int main(void)
{
    char agrave[3] = { 195, 128, 0 };
    struct dirent *ent;

    int fd = open(agrave, O_RDWR | O_CREAT, 0644); /* O_CREAT requires a mode argument */
    close(fd);

    DIR *dirp = opendir(".");
    while ((ent = readdir(dirp)) != NULL) {
        if (!strstr(ent->d_name, ".")) {
            for (int i = 0; i < strlen(ent->d_name); i++) {
                printf("%u ", (unsigned char) ent->d_name[i]);
            }
            printf("\n");
        }
    }
    closedir(dirp);

    return 0;
}

Output:

$ gcc test.c && ./a.out 
65 204 128 

It could still be the case that the C library does something, but I doubt it. My bet is that Mac OS X normalizes the way the docs for HFS+ say, and that decomposition is not just internal, but what comes up without further interference.

If that is correct, the precomposed characters we are finding would be generated by Ruby I guess.

@eliasjpr

eliasjpr commented Jun 9, 2016

@fxn would using string.force_encoding(Encoding::UTF_8) address the issue? As per the documentation:

The associated Encoding of a String can be changed in two different ways.
First, it is possible to set the Encoding of a string to a new Encoding without changing the internal byte representation of the string, with String#force_encoding. This is how you can tell Ruby the correct encoding of a string.

Second, it is possible to transcode a string, i.e. translate its internal byte representation to another encoding. Its associated encoding is also set to the other encoding. See String#encode for the various forms of transcoding, and the Encoding::Converter class for additional control over the transcoding process.
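As the quoted documentation says, force_encoding only changes the label and never the bytes, so it cannot turn a decomposed "À" into a composed one. A minimal sketch (the string is already UTF-8, so the call is effectively a no-op here):

```ruby
composed   = "\u00c0"       # À, precomposed (NFC)
decomposed = "\u0041\u0300" # À, A + combining grave accent (NFD)

# force_encoding relabels the string; its bytes stay decomposed.
forced = decomposed.dup.force_encoding(Encoding::UTF_8)

p forced.encoding    # => #<Encoding:UTF-8>
p forced.bytes       # => [65, 204, 128]
p forced == composed # => false
```

Reconciling the two forms calls for normalization (for instance String#unicode_normalize) rather than a change of encoding.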

@nobu
Contributor

nobu commented Jun 10, 2016

It is https://bugs.ruby-lang.org/issues/7267 (ruby/ruby@1891b60f2 and its related commits).
The results of Dir.glob are in the encoding of the given pattern.
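A minimal sketch of that contract on an ASCII file name: the same pattern passed once as UTF-8 and once as binary yields results tagged with the pattern's encoding. glob_encodings is a hypothetical helper, and the Mac OS X specific NFC conversion from Bug #7267 is not exercised here:

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: the distinct encodings of Dir.glob's results.
def glob_encodings(pattern)
  Dir.glob(pattern).map(&:encoding).uniq
end

Dir.mktmpdir do |dir|
  FileUtils.touch(File.join(dir, "users.yml"))
  pattern = File.join(dir, "*.yml")

  p glob_encodings(pattern.encode(Encoding::UTF_8)) # UTF-8 pattern
  p glob_encodings(pattern.b)                       # binary (ASCII-8BIT) pattern
end
```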

@fxn
Copy link
Member

fxn commented Jun 10, 2016

@nobu Thanks! I see some messages in the thread do not fully understand normalization (or lack thereof, you know strings may technically mix composed and decomposed characters), but Yui's are very precise.

So the contract is that Dir.glob is going to return a UTF-8 string, but normalization may differ from the one in the original string. I'll see if we can address this issue via APIs, because I guess this particular code is going to be tricky to get right in a portable way.

Why do Dir.glob and File.expand_path return different normalizations for the same file? (See my comment above). Or perhaps that should not be considered to be an inconsistency because no normalization is guaranteed despite the fact that the file system has it decomposed?

@nobu
Contributor

nobu commented Jun 11, 2016

File.expand_path is an oversight.
Thank you for the heads up.
I'll fix it.

@fxn
Member

fxn commented Jun 11, 2016

Awesome, all squared then, appreciate the help @nobu.

hsbt pushed a commit to ruby/ruby that referenced this issue Jun 12, 2016
* file.c (append_fspath): normalize directory name to be appended
  on OS X.  [ruby-core:75957] [Ruby trunk Bug#12483]
  rails/rails#25303 (comment)

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55385 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
@fxn
Member

fxn commented Jun 12, 2016

I have moved forward debugging this, but have not yet found the ultimate culprit of this issue. Let me share what I have found in case it rings a bell for someone.

Setup: We are going to use the accented letter "À" ("A" with a grave accent) as a directory name in Mac OS X, which may be encoded in two ways in UTF-8:

"\u00c0".bytes       # => [195, 128], composed
"\u0041\u0300".bytes # => [65, 204, 128], decomposed

As we have seen above, it does not matter whether the directory is created using a composed or decomposed encoding; the file system is going to normalize it to its internal form anyway. In particular, you can copy & paste the glyph above in Terminal.app and pass it to mkdir(1).

Let's also recall that, from the feedback of @nobu, under Mac OS X in general you expect file names returned by Ruby APIs coming from the filesystem to be UTF-8 with composed characters (at least if everything in your program is UTF-8). Again, this does not depend on the normalization used to create the file: Dir.pwd, Dir.glob, File.realpath, etc., return a composed encoding (with File.expand_path just patched). Composed: good. Decomposed: suspicious.

I have observed that Rails.root.to_s.bytes shows a composed representation in development mode, and a decomposed form while running tests, which is what causes the problem with fixture_path reported in this issue.

Well, the surprising observation is that the problem is not the test environment: what makes the bytes one way or the other is running a Rake task. That is, if you inspect Rails.root with runner under the test environment you get the composed form. If you inspect it in a Rake task that depends on :environment running under the development environment you get the decomposed form. The discriminant is Rake.

The encoding of Rails.root depends on the behaviour of the method Thread::Backtrace::Location#absolute_path, exactly from this spot in the Rails source code. That absolute_path call returns the composed variant when running normally, and the decomposed (suspicious) one when running under Rake.

Having realized that, I have been able to reproduce this with a minimal Ruby script that does not depend on Rails. Let's consider

# À/foo.rb
def x
  p caller_locations.first.absolute_path.bytes
end

x

and this Rakefile:

require_relative 'foo'

task :foo do
end

The bytes shown are different:

$ ruby foo.rb
[..., 47, 195, 128, 47, 102, 111, 111, 46, 114, 98]

$ rake foo
[..., 47, 65, 204, 128, 47, 102, 111, 111, 46, 114, 98]

I can't for the life of me understand why so far.

I'll leave this here and will try to continue debugging later, but if someone has an inspiration please raise a flag! /cc @tenderlove

@fxn
Member

fxn commented Jun 12, 2016

I think I have removed Rake from the equation.

I traced the execution, and the only thing that caught my attention was that Rake loads the Rakefile by calling Kernel#load on the result of a File.expand_path call, which in 2.3 still returns the decomposed variant, and somehow that affects what is stored in the backtrace metadata.

So, this is a minimal way to reproduce that does not use Rake. We need three files:

# foo.rb
load File.expand_path('bar.rb')

# bar.rb
require_relative 'baz'

# baz.rb
def x
  puts caller_locations.first.absolute_path
  p caller_locations.first.absolute_path.bytes
end

x

If we execute baz.rb directly all is fine:

$ ruby baz.rb
/Users/fxn/tmp/issue-25303/À/baz.rb
[..., 195, 128, 47, ...]

but if we load it from foo.rb:

$ ruby foo.rb
/Users/fxn/tmp/issue-25303/À/baz.rb
[..., 65, 204, 128, ...]

Voilà!

@nobu File.expand_path is fixed in trunk, but isn't it strange anyway that loading a file with decomposed characters affects the normalization of an unrelated later location? Is that decomposition being carried somehow internally? Sounds suspicious?

This is the reason Rails.root is decomposed under Rake, and ultimately what explains what this issue is about.

@fxn
Member

fxn commented Jun 12, 2016

I have edited the minimal example to better reflect the actual situation by using three files. That way we get the decomposed character in a file name that is different from the one generated by the expand_path call (which is what happens in this issue, since loading the Rakefile is way above the call stack).

@nobu
Contributor

nobu commented Jun 13, 2016

load and require call File.expand_path internally.
You can see it by __FILE__ in bar.rb.

@fxn
Member

fxn commented Jun 13, 2016

@nobu But the behaviour we see indicates that the normalization used in one load call somehow persists internally and it is used to store subsequent locations. Indeed, if we run bar.rb directly, we get the expected composed variant for baz.rb:

$ ruby bar.rb 
/Users/fxn/tmp/issue-25303/À/baz.rb
[..., 195, 128, ...]

To illustrate my hypothesis, I have prepared this gist. There we have files 1.rb, 2.rb, ..., 6.rb, and when you call 1.rb the six of them are evaluated in order as a chain. The load in 1.rb uses a decomposed file name and makes the inspection in 3.rb to return a decomposed form (for an unrelated file), but the load in 4.rb uses a composed file name and that flips the variant printed by 6.rb, which is composed (again, an unrelated file name).

Rails of course does a ton of require and require_relative and Rails.root is fine (composed) in normal executions. For example, it is fine in the console regardless of the environment. But the fact that Rake is performing a load with a decomposed form to load the Rakefile affects caller location way below the call stack.

The tests suggest that the composed vs. decomposed form kind of sticks internally; it is carried somehow.

@fxn
Member

fxn commented Jun 18, 2016

I think we had better do nothing here.

Debugging has shown the origin of this issue is in Ruby itself. There are two things going on:

  1. In Mac OS X, methods that deal with filenames like Dir.pwd, File.realpath, etc., return UTF-8 strings[*] with precomposed characters, but File.expand_path doesn't, due to an oversight.
  2. When you load a file, the normalization in the argument is carried and reflected in subsequent backtrace metadata (unless toggled).

Rails.root is computed from caller metadata and Rake loads the Rakefile using File.expand_path. Thus, as a consequence of 1) and 2), when we run Rake tasks Rails.root ends up having a decomposed variant that does not match the composed one coming up from Dir.glob in the computation of fixture_path. Note that Rake is doing nothing wrong, it is just using the API.

So, this is an inconsistency in the current version of Ruby that has already been patched in trunk, and it could manifest itself in a myriad of ways. fixture_path is just one particular case in which the factors converge. That's why I lean toward closing this as wontfix; as a workaround, either rename the directory or monkey-patch something. It won't be an issue in future versions of Ruby.

[*] It depends on the runtime encodings, really.

@fxn fxn closed this as completed Jun 18, 2016
hsbt pushed a commit to ruby/ruby that referenced this issue Aug 15, 2016
	* file.c (append_fspath): normalize directory name to be appended
	  on OS X.  [ruby-core:75957] [Ruby trunk Bug#12483]
	  rails/rails#25303 (comment)


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/branches/ruby_2_3@55909 b2dd03c8-39d4-4d8f-98ff-823fe69b080e