Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make sure a GMS2 generated file is classified as such #4297

Merged
merged 8 commits into from
Dec 1, 2018

Conversation

RobQuistNL
Copy link
Contributor

@RobQuistNL RobQuistNL commented Oct 16, 2018

I'm making sure the generated .yy file is being matches as a generated GMS2 file, instead of Yacc.

Fixes #4293

Checklist:

@@ -588,7 +588,8 @@ def generated_graphql_relay?
def generated_gamemakerstudio?
return false unless ['.yy', '.yyp'].include? extname
return false unless lines.count > 3
return lines[2].match(/\"modelName\"\:\s*\"GM/)
return lines[2].match(/\"modelName\"\:\s*\"GM/) ||
lines[0].start_with?("1.0.0")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would there be a way for me to include the weird character back in there? Its this one: ←(U+2190 - 0xE2 0x86 0x90 in utf-8).

It might make the checking a lot stricter, and then could possible replace starts with 1.0.0 with \d\.\d\.\d← which would then support future versions :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PS: Just throwing the character in there breaks the testsuite;

 1) Error:
TestBlob#test_language:
Encoding::CompatibilityError: incompatible character encodings: ASCII-8BIT and UTF-8
    /home/travis/build/github/linguist/lib/linguist/generated.rb:592:in `start_with?'
    /home/travis/build/github/linguist/lib/linguist/generated.rb:592:in `generated_gamemakerstudio?'
    /home/travis/build/github/linguist/lib/linguist/generated.rb:93:in `generated?'
    /home/travis/build/github/linguist/lib/linguist/generated.rb:12:in `generated?'
    /home/travis/build/github/linguist/lib/linguist/blob_helper.rb:362:in `generated?'
    /home/travis/build/github/linguist/test/test_blob.rb:281:in `block (2 levels) in test_language'
    /home/travis/build/github/linguist/test/test_blob.rb:271:in `each'
    /home/travis/build/github/linguist/test/test_blob.rb:271:in `block in test_language'
    /home/travis/build/github/linguist/test/test_blob.rb:265:in `each'
    /home/travis/build/github/linguist/test/test_blob.rb:265:in `test_language'
  2) Error:
TestFileBlob#test_language:
Encoding::CompatibilityError: incompatible character encodings: ASCII-8BIT and UTF-8
    /home/travis/build/github/linguist/lib/linguist/generated.rb:592:in `start_with?'
    /home/travis/build/github/linguist/lib/linguist/generated.rb:592:in `generated_gamemakerstudio?'
    /home/travis/build/github/linguist/lib/linguist/generated.rb:93:in `generated?'
    /home/travis/build/github/linguist/lib/linguist/generated.rb:12:in `generated?'
    /home/travis/build/github/linguist/lib/linguist/blob_helper.rb:362:in `generated?'
    /home/travis/build/github/linguist/test/test_file_blob.rb:676:in `block (2 levels) in test_language'
    /home/travis/build/github/linguist/test/test_file_blob.rb:666:in `each'
    /home/travis/build/github/linguist/test/test_file_blob.rb:666:in `block in test_language'
    /home/travis/build/github/linguist/test/test_file_blob.rb:660:in `each'
    /home/travis/build/github/linguist/test/test_file_blob.rb:660:in `test_language'

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RobQuistNL Is LANG set in your environment? On OpenBSD, every test belched spurious encoding errors until I executed tests using LANG=en_AU.UTF-8 bundle exec rake (aliased in my shell as simply brake).

It fixed errors which looked extremely similar to the ones you've posted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the reports from the tests ran on the CI: https://travis-ci.org/github/linguist/jobs/442092134

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And it was fixed by removing the character from the matching line

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not an expert on how encodings work in Ruby (especially in CI), but you can express Unicode characters as 7-bit ASCII using:

/\u2190/ # RegExp escape
"\u2190" # Analogous string escape

"\u2190" =~ /\u2190/ # => Returns 0, the index of the first match
"foobar" =~ /\u2190/ # => nil No match

Copy link
Collaborator

@Alhadis Alhadis Oct 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The =~ operator is shameless ripped off from Perl Ruby's way of performing a match between a regular expression and a value.

In PHP, the equivalent syntax is pcre_match("/\\u2190/", "input"); (if memory serves, it's been years since I bothered working with PHP. #Node4Life

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah cool, will try :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've looked into this today and attempting to match the is definitely a bit of a struggle. I can get it working locally on macOS by using that char directly, but it doesn't play nice on Travis.

An alternate solution may be to more closely match the format of the line ignoring the content, for example:

lines[0] =~ /^\d\.\d\.\d.+\|\{/

... matches the 1.0.0 at the beginning of the line and the |{ at the end, ignoring everything in between and does the trick locally and on Travis here. We could probably get more fancy with the regex, but I'm not sure it's worth the effort.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah thanks, I didn't see your comment there for a while. Sorry! Yeah, I think we're going to have to walk this way since the actual char is not going to work.

lib/linguist/generated.rb Outdated Show resolved Hide resolved
@RobQuistNL
Copy link
Contributor Author

Hmm, that explodes again..

TestFileBlob#test_language:
Encoding::CompatibilityError: incompatible encoding regexp match (UTF-8 regexp with ASCII-8BIT string)

@Alhadis
Copy link
Collaborator

Alhadis commented Oct 16, 2018

@lildude Could you check this out, please? You're likely more familiar with this pipeline (and Ruby) than I am.

@RobQuistNL
Copy link
Contributor Author

@Alhadis can you confirm that the tests run on your own machine? I can't since I don't have the project running :x

@Alhadis
Copy link
Collaborator

Alhadis commented Oct 16, 2018

Yeah, no, I'm getting the same errors as CI is. This is with the brake alias I described earlier.

I'll leave this for an expert to figure out. 😜

@RobQuistNL
Copy link
Contributor Author

IMHO we could remove that one character - allthough it would be a pretty solid detection method :)

@lildude
Copy link
Member

lildude commented Oct 16, 2018

I've not had the chance to look into this in-depth but this smells very similar to #4028

@Alhadis
Copy link
Collaborator

Alhadis commented Oct 16, 2018

Is there any chance of it being related to charlock_holmes? I keep seeing this warning emitted whenever I run bundle exec rake …:

/home/travis/build/github/linguist/vendor/bundle/ruby/2.5.0/gems/charlock_holmes-0.7.6/lib/charlock_holmes/encoding_detector.rb:66: warning: instance variable encoding_list not initialized

... and this doesn't look very good:

    # A mapping table of supported encoding names from EncodingDetector
    # which point to the corresponding supported encoding name in Ruby.
    # Like: {"UTF-8" => "UTF-8", "IBM420_rtl" => "ASCII-8BIT"}
    #
    # Note that encodings that can't be mapped between Charlock and Ruby will resolve
    # to "ASCII-8BIT".
    @encoding_table = {}

Since charlock_holmes appears to be consulting a lower-level system dependency (ICU) I'm willing to bet this is undefined behaviour subject to environmental factors. It would explain why OpenBSD gets flooded with errors in volumes considered unacceptable by anybody's standards — whereas users of other platforms don't appear to have ever experienced this issue.

@RobQuistNL
Copy link
Contributor Author

I really wish I could help, but I'm a real big newbie when it comes to Ruby - I will at least try to setup the project on my Xenial machine soon.

@RobQuistNL
Copy link
Contributor Author

RobQuistNL commented Nov 22, 2018

@Alhadis @lildude I have changed the regex to use lildude's /^\d\.\d\.\d.+\|\{/ which seems to make the testsuite run correctly.

Maybe its worth making an issue for the underlying problem - either UTF-8 characters (), or ASCII-written, escaped UTF-8 characters (\u2190) do not seem to work.

For now this PR should be fine and fix the one wrongly detected file in GMS2 projects :)

@Alhadis
Copy link
Collaborator

Alhadis commented Nov 22, 2018

Seems like a decent compromise. 👍 Thanks for your patience!

Copy link
Contributor

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution @RobQuistNL!

Copy link
Member

@lildude lildude left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lildude lildude merged commit 3ff6a9b into github-linguist:master Dec 1, 2018
@github-linguist github-linguist locked as resolved and limited conversation to collaborators Jun 17, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Game Maker Studio 2's broken "json" file is misclassified as Yacc
4 participants