Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding .yy and .yyp (GMS2 Project / metadata files) as JSON, and GML heuristics #4130

Merged

Conversation

RobQuistNL
Copy link
Contributor

@RobQuistNL RobQuistNL commented May 14, 2018

Fixes #4129

This change makes sure the .yy and .yyp files are correctly detected as JSON files within a Game Maker project.

I'm having difficulties running this tool in an example project - I could use some help with running my branch of linguist inside of an example project. (I get an error message Could not locate Gemfile or .bundle/ directory)

Description

Not sure how the Yacc language responds to this. Maybe I should also add some heuristic rules for .yy and .yyp files? They are pretty easily distinguishable from Yacc files.

Checklist:

Copy link
Contributor

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking a stab at this!

I could use some help with running my branch of linguist inside of an example project. (I get an error message Could not locate Gemfile or .bundle/ directory)

Do you have a proper Ruby environment configured?

Maybe I should also add some heuristic rules for .yy and .yyp files? They are pretty easily distinguishable from Yacc files.

👍

I've left a few comments to address below.

@@ -2079,6 +2079,8 @@ JSON:
- ".topojson"
- ".webapp"
- ".webmanifest"
- ".yyp"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The file extensions (except the first) need to be sorted.

@@ -541,5 +541,13 @@ def call(data)
end
end

disambiguate ".yy" do |data|
Language["JSON"] if /\"modelName\"\:\s?\"/.match(data)
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll need to add a case for Yacc here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I thought it would fall through. Would a simple "else" be sufficient in this case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, my need above might be a bit strong. It's better if we have a case for Yacc 😄
Is the presence of the modelName key sufficient to identify GML JSON files? (i.e., is it mandatory in those files?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty much. All the .yy files I've seen have a modelName. As does the .yyp file. But like you said - the .yyp file isn't documented in the languages.rb file so I've removed it in the latest commit. I've extended this testcase to first go for XML, then Graph Markup Language, and then a GameMaker file. The latter is harder because its a generic programming language and has no "required" keywords in the file. XML does, and it looks like Graph Markup Language does too (http://www.fim.uni-passau.de/fileadmin/files/lehrstuhl/brandenburg/projekte/gml/gml-technical-report.pdf) - the graph [ and potentially the node [.


disambiguate ".yyp" do |data|
Language["JSON"] if /\"modelName\"\:\s?\"GMProject\"/.match(data)
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this required? Did you find other languages using .yyp?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, don't think so. Will look this up.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious: is there a guarantee that only one or zero whitespace characters follow modelName:? If not, you should replace \s? with \s* so it'll match 2+ blanks instead of 1 or none. 😉

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good one, the * would be better. They are automatically generated files though, so I'm sure its usually 1 whitespace, but the * is better. Committing as we speak!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've extended it since I've checked pretty much all .yy files, and they all contain a modelName element with a value that starts with GM.

"modelName": "GM a match on that one is what I'm doing for this rule.

@RobQuistNL
Copy link
Contributor Author

RobQuistNL commented May 14, 2018

Would it be OK to add a .gml check like this;

disambiguate ".gml" do |data|
      Language["Game Maker Language"] if /^\s*(var|sprite_|vk_|image_|draw_|visible|sprite_)/.match(data)
    end

to this PR as well? I suppose I will have to add checks for Graph Modeling Language and XML as well?

EDIT: I've added checks for XML and Graph first, then make it fall through to Game Maker Language. Its probably easier to do this way and make sure.

I'm still pretty unfamiliar with Ruby so I'm not even sure if this code runs. I've been testing my regexes with this online tool; http://rubular.com/ but I wouldn't feel calm getting this merged without it being properly tested.

I'm sorry this is a bit slow & sluggish - I'm not familiar with Ruby. @pchaigno is there a IRC / slack / discord channel where I can ask for some guidance or this is place OK?

@RobQuistNL
Copy link
Contributor Author

RobQuistNL commented May 14, 2018

So the test is failing because;

TestSamples#test_Yacc_has_samples [/home/travis/build/github/linguist/test/test_samples.rb:86]:
Missing samples in "samples/Yacc/*.yy".

I'm not sure where I can pull an example file from - I can find some example files online but I'm not sure about the licenses on that one

TestSamples#test_JSON_has_samples [/home/travis/build/github/linguist/test/test_samples.rb:86]:
Missing samples in "samples/Yacc/*.yy".

I suppose this is to check if Yacc files don't get detected as JSON, while GM files do? So thats 2 birds one stone

TestClassifier#test_classify_ambiguous_languages:
NoMethodError: undefined method `[]' for nil:NilClass

I'm not sure what to do with this one :(

@RobQuistNL
Copy link
Contributor Author

I've also taken this opportunity to add .gmx files. They are from GameMaker:Studio 1 (EOL). Haven't found any other .gmx files in the languages.yml so this should do just fine.

GameMakerStudio 2 uses the .gml, .yy and .yyp files.

@RobQuistNL
Copy link
Contributor Author

Hey @pchaigno, are the changes I made valid like this?

@RobQuistNL
Copy link
Contributor Author

Ping @pchaigno :x sorry for my impatience :P

Copy link
Contributor

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For next time, it would be easier to review if the independent changes were in separate pull requests. For example, everything regarding the .yy and .yyp extensions looks ok and could be merged as is.

However, I couldn't find the license for the 3 other samples, including the two .gmx files. Did I miss something? If not, would it be possible to use other sample files, under a permissive license?

@@ -0,0 +1,53 @@
/* File taken from http://home.adelphi.edu/~sbloch/class/271/examples/lexyacc/words2.y . */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't find the license for this. Is it indicated anywhere on the website?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Me neither, I find it very hard to find an example file for the .y files with a license - Since it was published like this online on an .edu site I was assuming. As we all know, assumptions are the mother of all f*ckups, so I'll have to find something else for this.

The GMX files are my own, I'll add a MIT license to the repository there.

@RobQuistNL
Copy link
Contributor Author

@pchaigno Sorry for the inconvenience.

I've included a Yacc file from an MIT repository (https://github.com/zcbenz/yacci/blob/master/examples/funccal/calc.y) and I've added the MIT license to my own repository for the 2 GMX files.

@RobQuistNL
Copy link
Contributor Author

I take it these failures are unrelated to this PR?

  1) Failure:
TestGrammars#test_whitelisted_hashes_dont_have_licenses [/home/travis/build/github/linguist/test/test_grammars.rb:130]:
The following whitelisted license hashes are unused:
* 7dfce11e2e3579ee43b83e69b1b64e77a2e378f0
* 62b97e52b78439c14550a44a3fe51332aeffb3a1
* 0acff2bb1536a3942a39ac74987ffd9c44905a6b
* 41cdc7e9f9d2e62eb8ac68a1a9359b9c39a7a9bf
Please remove them from the hash whitelist.
--- expected
+++ actual
@@ -1 +1 @@
-[]
+["7dfce11e2e3579ee43b83e69b1b64e77a2e378f0", "62b97e52b78439c14550a44a3fe51332aeffb3a1", "0acff2bb1536a3942a39ac74987ffd9c44905a6b", "41cdc7e9f9d2e62eb8ac68a1a9359b9c39a7a9bf"]
  2) Failure:
TestGrammars#test_submodules_have_approved_licenses [/home/travis/build/github/linguist/test/test_grammars.rb:115]:
The following submodules have unapproved licenses:
* vendor/grammars/FreeMarker.tmbundle: b81acf2ba52d312754bf5055845a723123bda388
* vendor/grammars/ant.tmbundle: 4da01d631a29c76456fd0bd16749c71e8d5f6dbf
* vendor/grammars/elixir-tmbundle: 220e011c8d686129e9c4163a7c655b9d64f61e59
* vendor/grammars/mako-tmbundle: 39f092c726491ca6a02354dbc6c3a0920bb44d4c
The license must be added to the LICENSE_WHITELIST in /test/test_grammars.rb once approved.
--- expected
+++ actual
@@ -1 +1 @@
-[]
+["vendor/grammars/FreeMarker.tmbundle: b81acf2ba52d312754bf5055845a723123bda388", "vendor/grammars/ant.tmbundle: 4da01d631a29c76456fd0bd16749c71e8d5f6dbf", "vendor/grammars/elixir-tmbundle: 220e011c8d686129e9c4163a7c655b9d64f61e59", "vendor/grammars/mako-tmbundle: 39f092c726491ca6a02354dbc6c3a0920bb44d4c"]

@lildude
Copy link
Member

lildude commented Jun 18, 2018

The test failures appear to be caused by recent changes to licensee which has resulted in a slightly modified hash for the four licenses in question. I'll address this in #4152

@lildude
Copy link
Member

lildude commented Jun 21, 2018

Merging master into your branch should fix the failing tests.

@RobQuistNL
Copy link
Contributor Author

Thanks @lildude !

@pchaigno We should be good to go now I think :)

Copy link
Contributor

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes @RobQuistNL!

I've got two concerns below, but we're getting closer to something ready to be merged.

@@ -531,5 +541,13 @@ def call(data)
end
end

disambiguate ".yy" do |data|
if /\"modelName\"\:\s*\"GM/.match(data)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these files generated? If not, shouldn't we match on modelName: "GM too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are generated, but I've only seen them with the double quotes. I put the \s in there just in case

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks for the precision. We should probably add a case in generated.rb for those files then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty much all the .yy and .yyp files will be generated. The GML files (which the .yy files will point to) are 1:1 contents from the IDE. .yy contains options and a lot of meta-data for the IDE.

I've put this in place to disambuguate Yacc from GameMaker's JSON files - do you think we need the same check in generated.rb? Pretty much, if its a .yy and it has that "modelName": "GM format in there its a GM generated file for sure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you think we need the same check in generated.rb?

Yes, I think it's worth it. Is that the only indication that the file is generated? We often have clear comments such as Generated by XXXX.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately GM doesn't do that. Its plain JSON, but at least it has a consistent format. I'll add a case to generated.rb.

Language["Graph Modeling Language"]
else
Language["Game Maker Language"]
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can be sure we've eliminated the two other candidate languages here. Are we even sure /^\s*(graph\s?\[|node\s?\[)/i will match all Graph Modeling Language files?

Copy link
Contributor Author

@RobQuistNL RobQuistNL Jun 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I could tell the Graph modeling Languages always have a graph [ or node [ element.

http://www.fim.uni-passau.de/fileadmin/files/lehrstuhl/brandenburg/projekte/gml/gml-technical-report.pdf

I haven't found a lot of docs, but I'm quite sure that it then will not be Game Maker Language. (text [ is invalid syntax), unless its in a comment ofcourse. I'll be searching a bit more for the Graph Modeling Language file formats.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here there is some more documentation. All files I've seen so far have that graph [ and node [ stuff.

http://www.fim.uni-passau.de/index.php?id=17297&L=1

@RobQuistNL
Copy link
Contributor Author

I've edited the Graph Modeling Language regex - it will now match most of the files found here;

https://github.com/search?p=9&q=extension%3Agml+graph&ref=searchresults&type=Code&utf8=%E2%9C%93

I can't find any hard evidence that graph [ or node [ has to be in a graph modeling language file - but all the examples I've found have that setup, and so do the files in the search results.

@RobQuistNL
Copy link
Contributor Author

Hmm, might be an issue, since graph [4] = 20; is actually valid GML and it will match. Most people will write array accessors like graph[4] = 20; without the space though.. :/

@pchaigno
Copy link
Contributor

Hmm, might be an issue, since graph [4] = 20; is actually valid GML and it will match. Most people will write array accessors like graph[4] = 20; without the space though.. :/

We could anchor the [ to the end of the line?

@RobQuistNL
Copy link
Contributor Author

Here goes 🤞

Thanks a ton for your quick responses and your patience @pchaigno! I really appreciate it.

disambiguate ".gml" do |data|
if /^\s*(\<\?xml|xmlns)/i.match(data)
Language["XML"]
elsif /^\s*(graph|node)\s+\[$/i.match(data)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still want to match on:

graph
[
    foobar

right?
Any reason you changed the [\s\n\r]?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought \s would include \n and \r, hence I removed it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

\s matches any whitespace character (equal to [\r\n\t\f\v ])

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This format is in sample.gml and now also in sample3.gml. I think the succeeding tests should prove that the regex works, right? :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, right! my bad!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad for initially writing \s\r\n ;)

@RobQuistNL RobQuistNL changed the title Adding .yy and .yyp (GMS2 Project / metadata files) as JSON Adding .yy and .yyp (GMS2 Project / metadata files) as JSON, and GML heuristics Jul 3, 2018
def test_gml_by_heuristics
assert_heuristics({
"Game Maker Language" => all_fixtures("Game Maker Language", "*.gml"),
"Graph Modeling Language" => all_fixtures("Graph Modeling Language", "*.gml")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add a case for XML too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added the testcases for .yy files as well.

@RobQuistNL
Copy link
Contributor Author

@pchaigno Ping! Congratulations on winning the finals BTW :) Hope your hangover is not too bad ;)

@RobQuistNL
Copy link
Contributor Author

@pchaigno Sorry :x ping again

@pchaigno
Copy link
Contributor

pchaigno commented Aug 8, 2018

Should we add a check in generated.rb?

# Return true or false
def generated_gamemakerstudio?
return false unless extname == '.yy' || extname == '.yyp'
return false unless lines.count > 5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why require min. 6 lines if we only look at the third?

def generated_gamemakerstudio?
return false unless extname == '.yy' || extname == '.yyp'
return false unless lines.count > 5
return /\"modelName\"\:\s*\"GM/.match(lines[3])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the comment you mention the third line, but here you match on the fourth. Typo?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, yeah, was already on it ^_^

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(can't run the tests locally so abusing Travis for it :x )

@RobQuistNL
Copy link
Contributor Author

@pchaigno The generated test should work now ^_^

Copy link
Contributor

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Thanks for your patience @RobQuistNL!

@RobQuistNL
Copy link
Contributor Author

Oh, if anything I should thank you for your patience @pchaigno !

Copy link
Member

@lildude lildude left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the PR and welcome to Linguist.

@lildude lildude merged commit 8b752c8 into github-linguist:master Aug 11, 2018
@RobQuistNL
Copy link
Contributor Author

Awesome! Thanks :)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

GML Projects get marked as Yacc because of .yy files
4 participants