Rails not reading the first char of fixture name. #25303
@MatheusMariano can you share your users.yml? |
Yes, sure. I made no changes to this file. users.yml
|
Thanks, can you also show your user_test.rb? |
user_test.rb
|
Could this problem be caused by special characters in parent folders? I've realized the problem only occurs inside the folder Código, which has the special character ó. |
@MatheusMariano are you able to confirm that temporarily renaming the folder makes it work? |
@matthewd Yes. I've changed my path from |
ó actually has length 2 here, which is causing the first letter to be cut off (see below). I submitted a PR that converts the characters before calculating the length.
|
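The off-by-one can be sketched in isolation. This is a minimal illustration, not Rails' actual code: the paths and the `fixture_name` helper are hypothetical, and the slice only mimics the kind of prefix-length arithmetic under discussion. It assumes the configured fixture path holds a decomposed ó while the globbed file name holds a precomposed one.

```ruby
# Hypothetical paths: the same directory spelled with two normalizations.
fixture_path = "/Users/me/Co\u0301digo/app/test/fixtures"          # "o" + combining acute
globbed_file = "/Users/me/C\u00F3digo/app/test/fixtures/users.yml" # precomposed "ó"

# Hypothetical helper: strip "<prefix>/" from the front and ".yml" from the end.
def fixture_name(file, prefix)
  file[(prefix.length + 1)..-5]
end

# The decomposed prefix is one character longer, so the slice starts too late.
naive = fixture_name(globbed_file, fixture_path)

# Normalizing both strings to the same form (NFC here) makes the lengths agree.
fixed = fixture_name(globbed_file.unicode_normalize(:nfc),
                     fixture_path.unicode_normalize(:nfc))

p naive # => "sers"
p fixed # => "users"
```

`String#unicode_normalize` is available from Ruby 2.2 onwards.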
The length of a string is logical, number of bytes doesn't matter. Maybe an encoding issue related to paths? |
Wonder if this may be related to normalization. For example "À" can be encoded in two different ways:
These have different logical lengths because the second one is printed as the combination of two characters:
and of course they consist of different bytes:
So it could be the case that Could you please inspect the bytes in each of them to see if it is the case? |
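For reference, the two encodings of "À" described above can be compared directly in Ruby. This is a small sketch with illustrative variable names; it uses only `String#length`, `String#bytes` and `String#unicode_normalize` (Ruby 2.2+).

```ruby
composed   = "\u00C0"        # single precomposed code point
decomposed = "\u0041\u0300"  # capital A followed by a combining grave accent

p composed.length    # => 1
p decomposed.length  # => 2

p composed.bytes     # => [195, 128]
p decomposed.bytes   # => [65, 204, 128]

# The strings render the same but are not equal until normalized.
p composed == decomposed                         # => false
p composed == decomposed.unicode_normalize(:nfc) # => true
```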
@fxn fixture_path after setting Dir: |
Just to be sure, you mean you tested |
Correct. I found the 'common' segment after inspecting the chars in each and grabbing the substring. Below are the substring chars from each common segment
|
Awesome, it squares. Interesting gotcha! I am working now, saw the PR, but let's think a bit if there are other options to make this robust. |
Dir defaults to the filesystems encoding type when it isn't given. What about determining the length of the parent directory first to ensure the same encoding type is used?
I am not sure what will happen if no fixture path is present (looking into it). Any other ideas? |
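One way to sidestep prefix-length arithmetic altogether is to ask the glob for relative paths. The sketch below is a different technique from the one proposed above, and it relies on the `base:` keyword of `Dir.glob`, which only exists in Ruby 2.5 and later, so it was not an option for the Ruby versions in this thread. The `fixture_names` helper and the file names are hypothetical.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: list fixture names relative to dir, without slicing
# by the length of dir (so its normalization never enters the picture).
def fixture_names(dir)
  Dir.glob("**/*.yml", base: dir).map { |f| f.sub(/\.yml\z/, "") }.sort
end

names = Dir.mktmpdir do |dir|
  FileUtils.mkdir_p(File.join(dir, "admin"))
  FileUtils.touch(File.join(dir, "users.yml"))
  FileUtils.touch(File.join(dir, "admin", "roles.yml"))
  fixture_names(dir)
end

p names # => ["admin/roles", "users"]
```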
Another possibility is to explore doing this in a more encapsulated way using Could be the case that this is also subject to normalization gotchas though, we'd need to see. |
@fxn and @MatheusMariano Just a thought: I would look into your Mac filesystem encoding settings, since this looks more like an OS encoding issue. |
@amrani let's dig a little more into this.
@eliasjpr we suspect it is a UTF-8 normalization issue (see comments above), the normalization in the string stored in
require 'fileutils'
filename = "\u0041\u0300.test" # À encoded as capital A + combining diacritic `
FileUtils.touch(filename)
from_glob = Dir['*.test'].first
p filename.bytes # => [65, 204, 128, ...]
p from_glob.bytes # => [195, 128, ...]
They differ because at some point between touching and globbing the normalization of the filename changes. Indeed, according to these Apple docs Mac OS X uses a fully decomposed form, which does not seem to be what globbing is returning. There is some yak shaving to do here I believe. We need to understand why the normalizations are different in the case of |
Here is where the fixture_path is being set: https://github.com/rails/rails/blob/4-2-stable/railties/lib/rails/test_help.rb#L23 |
I'll try to reproduce. Which Ruby version? How do you create the directory? Finder? Terminal.app? |
@fxn I got different results when trying to reproduce your case. Do you have different keyboard settings? :) I'm very curious about what could be causing this. It would be interesting to see the solution. |
ruby 2.2.4p230 and I created the directory through the finder |
I suspect something has changed in Ruby. Look: I create a file called "À" in a test directory. To do so, I press Option+` and then A. Then I inspect the bytes in the shell like this:
If we assume Now, what comes up from globbing depends on the version of Ruby:
I wonder if this may be a consequence of ruby/ruby@1f30b74. |
@nobu does it ring a bell? The gist of the issue is: A string ( The OS (Mac OS X) seems to be storing things using a decomposed normalization:
and starting with Ruby 2.2 globbing does not seem to be transparent in that sense:
Could be related to ruby/ruby@1f30b74 perhaps? |
Note that I notice some other inconsistencies that may be related. Within a directory called "À"
|
To remove the shell from the equation I have written this C program that creates a file called "À" using the precomposed code point, and then prints the bytes returned by
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <dirent.h>
#include <string.h>
int main(void)
{
char agrave[3] = { 195, 128, 0 };
struct dirent *ent;
int fd = open(agrave, O_RDWR|O_CREAT, 0644); /* O_CREAT requires a mode argument */
close(fd);
DIR *dirp = opendir(".");
while ((ent = readdir(dirp)) != NULL) {
if (!strstr(ent->d_name, ".")) {
for (int i = 0; i < strlen(ent->d_name); i++) {
printf("%u ", (unsigned char) ent->d_name[i]);
}
printf("\n");
}
}
closedir(dirp);
return 0;
}
Output:
It could still be the case that the C library does something, but I doubt it. My bet is that Mac OS X normalizes the way the docs for HFS+ say, and that decomposition is not just internal, but what comes up without further interference. If that is correct, the precomposed characters we are finding would be generated by Ruby I guess. |
@fxn will the use of
|
It is https://bugs.ruby-lang.org/issues/7267 (ruby/ruby@1891b60f2 and its related commits). |
@nobu Thanks! I see some messages in the thread do not fully understand normalization (or lack thereof, you know strings may technically mix composed and decomposed characters), but Yui's are very precise. So the contract is that Why do |
|
Awesome, all squared then, appreciate the help @nobu. |
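Since strings may technically mix composed and decomposed characters, as noted above, it can help to check which form a given path is actually in before comparing it with another. The following is a rough diagnostic sketch; `normalization_form` is a hypothetical helper built on `String#unicode_normalized?` (Ruby 2.2+), and it assumes UTF-8 strings.

```ruby
# Hypothetical helper: report which Unicode normalization form(s) a string is in.
def normalization_form(str)
  nfc = str.unicode_normalized?(:nfc)
  nfd = str.unicode_normalized?(:nfd)

  if nfc && nfd
    :both    # e.g. plain ASCII, where the two forms coincide
  elsif nfc
    :nfc     # composed
  elsif nfd
    :nfd     # decomposed
  else
    :mixed   # contains both composed and decomposed characters
  end
end

p normalization_form("/tmp/fixtures")   # => :both
p normalization_form("\u00C0")          # => :nfc
p normalization_form("A\u0300")         # => :nfd
p normalization_form("\u00C0/e\u0301")  # => :mixed
```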
* file.c (append_fspath): normalize directory name to be appended on OS X. [ruby-core:75957] [Ruby trunk Bug#12483] rails/rails#25303 (comment) git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55385 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
I have moved forward debugging this, but have not yet found the ultimate culprit of this issue. Let me share with you all what I have found in case it rings a bell for someone. Setup: We are going to use the accented letter "À" ("A" with a grave accent) as a directory name in Mac OS X, which may be encoded in two ways in UTF-8:
As we have seen above, it does not matter whether the creation of the directory uses a composed or decomposed encoding; the file system is going to normalize it to its internal form anyway. In particular, you can copy & paste the glyph above in Terminal.app and pass it to

Let's also recall that, from the feedback of @nobu, under Mac OS X in general you expect file names returned by Ruby APIs coming from the filesystem to be UTF-8 with composed characters (at least if all is UTF-8 in your program). Again, this does not depend on the normalization used to create the file, I have observed that

Well, the surprising observation is that the problem is not the test environment: what makes the bytes come out one way or another is running a Rake task. That is, if you inspect The encoding of

Having realized that, I have been able to reproduce this with a minimal Ruby script that does not depend on Rails. Let's consider
# À/foo.rb
def x
p caller_locations.first.absolute_path.bytes
end
x
and this Rakefile:
require_relative 'foo'
task :foo do
end
The bytes shown are different:
I can't for the life of me understand why so far. I'll leave this here and will try to continue debugging later, but if someone has an inspiration please raise a flag! /cc @tenderlove |
I think I have removed Rake from the equation. I traced the execution, and the only thing that caught my attention was that Rake loads the Rakefile by calling So, this is a minimal way to reproduce that does not use Rake. We need three files:
# foo.rb
load File.expand_path('bar.rb')
# bar.rb
require_relative 'baz'
# baz.rb
def x
puts caller_locations.first.absolute_path
p caller_locations.first.absolute_path.bytes
end
x
If we execute baz.rb directly all is fine:
but if we load it from foo.rb:
Voilà! @nobu This is the reason |
I have edited the minimal example to better reflect the actual situation by using three files. That way we get the decomposed character in a file name that is different from the one generated by the |
|
@nobu But the behaviour we see indicates that the normalization used in one
To illustrate my hypothesis, I have prepared this gist. There we have files 1.rb, 2.rb, ..., 6.rb, and when you call 1.rb the six of them are evaluated in order as a chain. The Rails of course does a ton of The tests suggest composed vs decomposed kind of sticks internally, it is carried somehow. |
* file.c (append_fspath): normalize directory name to be appended on OS X. [ruby-core:75957] [Ruby trunk Bug#12483] rails/rails#25303 (comment) git-svn-id: svn+ssh://svn.ruby-lang.org/ruby/trunk@55385 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
I think we had better do nothing here. Debugging has shown the origin of this issue is in Ruby itself. There are two things going on:
So, this is an inconsistency in the current version of Ruby that has been already patched in trunk. And it could manifest itself in a myriad ways. [*] It depends on the runtime encodings, really. |
* file.c (append_fspath): normalize directory name to be appended on OS X. [ruby-core:75957] [Ruby trunk Bug#12483] rails/rails#25303 (comment) git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/branches/ruby_2_3@55909 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Hey, guys. I'm getting a very strange error in my fixtures.
I generate a model called User normally.
rails generate model User
But when I execute my tests
rake test
It returns the error
Because Rails isn't reading the first character of my fixture name.
Here is terminal log.
Ruby version: ruby 2.3.1p112 (2016-04-26 revision 54768) [x86_64-darwin15]
Rails version: 4.2.6