Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Java Properties language for .properties extension #4098

Merged
merged 5 commits into from Oct 5, 2018

Conversation

eager
Copy link
Contributor

@eager eager commented Apr 14, 2018

This pull adds a Java Properties language for .properties.

Description

I noticed that our some of our project’s .properties files had incorrect syntax highlighting after escaped #:

Screen shot of incorrect syntax highlighting

Because INI and Java Properties are very similiar—Codemirror’s properties mode appears to be a super-set of both—the disambiguation heuristic is somewhat complex. It replicates an outer-if block ([key]=[value]) that will match INI, unless the file also matches Java Properties’ comment markers (^[#!]), similar to the following code:

if /^[^#!;][^=]*=/.match(data)
  # Match all key=value
  if /^[;\[]/.match(data)
    # Match INI comment token or sections
    Language["INI"]
  elsif /^[#!]/.match(data)
    # Match Java comment tokens
    Language["Java Properties"]
  else
    # Assume key=value is INI, if no comment tokens
    Language["INI"]
  end
elsif /^[^#!][^:]*:/.match(data)
  # Match key:value
  Language["Java Properties"]
end

Checklist:

Copy link
Contributor

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for submitting the pull request @eager!

Because .properties would remain unambiguous, I didn’t add samples.

We still require a sample file for each extension.

LGTM otherwise!

@pchaigno
Copy link
Contributor

Using @Alhadis' Harvester script, I collected 2913 sample urls (from the 19M total) from the search extension:properties and downloaded the 615 corresponding repositories. According to Linguist, out of those 615 repositories with at least one .properties file, 226 are 100% Java, 457 have some Java (including the 226), and 158 don't contain any Java.

@eager Based on this analysis, I think we'll need to keep the .properties extension as an INI extension too, find a sample for INI, and add a heuristic rule to disambiguate the two.

@eager
Copy link
Contributor Author

eager commented Apr 16, 2018

@pchaigno thanks for the review and pointer to Harvester. I had only been doing spot-checking of about a dozen results, so I’ll grab some more samples from there for disambiguation.

A few questions:

  1. Are there project guidelines for how to handle ambiguous languages that surface during a samples search, but that aren‘t in the original scope?

I noticed in my initial spot-checking that there was some usage of .properties that were neither INI nor Java Properties, such as this. My best guess is they are for the OpenAstexViewer Scripting Language.

  1. Any sense of which language we should default to when it’s impossible to disambiguate? I don’t think it will be possible to disambiguate the most basic sample:
key = value
  1. Would it help to remove group: Java from the language definition, so that if we lean toward Java Properties, for its more robust syntax highlighting, it doesn’t add misleading repo statistics?

@pchaigno
Copy link
Contributor

  1. Are there project guidelines for how to handle ambiguous languages that surface during a samples search, but that aren‘t in the original scope?

Not really. When we do find such languages we consider them for language addition though. That means we then follow the appropriate steps from the contribution guidelines. Of course, we only add them if they meet the in-the-wild criteria.

I noticed in my initial spot-checking that there was some usage of .properties that were neither INI nor Java Properties, such as this. My best guess is they are for the OpenAstexViewer Scripting Language.

Then, we should first try to estimate their number on GitHub. Do you know keywords from that language we could use for the search? (From the documentation you linked to, it's not clear to me whether charge should be considered a keyword.)

  1. Any sense of which language we should default to when it’s impossible to disambiguate? I don’t think it will be possible to disambiguate the most basic sample:

Usually, we don't assign any language in the Heuristic strategy if we can't recognize a known pattern or keyword. Not assigning a language, means we defer to the next strategy, the Classifier strategy. I'm expecting there are much more INI files than Java Properties files, so I'd be tempted to default to INI here, if we identify such a key=value construct.

@lildude @Alhadis What do you think (about defaulting to INI for key=value constructs vs. deferring to Classifier)?

  1. Would it help to remove group: Java from the language definition, so that if we lean toward Java Properties, for its more robust syntax highlighting, it doesn’t add misleading repo statistics?

Yes, I think we should remove it from the Java group in any case. It feels a bit weird to have a data language being marked as a parent programming language (and I'm not sure how that would end up counting the files...).

@lildude
Copy link
Member

lildude commented Apr 17, 2018

@lildude @Alhadis What do you think (about defaulting to INI for key=value constructs vs. deferring to Classifier)?

As this is only for the .properties extension, I think we might be safe to do so. There are definitely a lot of uses of it as an INI-type file which clearly isn't related to Java, but I could be missing something.

@lildude
Copy link
Member

lildude commented Jun 21, 2018

Nudge.

@pchaigno
Copy link
Contributor

pchaigno commented Jul 19, 2018

Sorry for the delay @eager.

Could you add a test for the new heuristic rules in test_heuristics.rb? We should be good to go after that.

EDIT: Actually, there's a failure we need to handle. I'll have a look.

Language["INI"]
end
end
if /^[^#!][^:]*:/.match(data)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be an elsif? It's either key=value or key:value, right? I think that will fix the test failure.

@pchaigno
Copy link
Contributor

@eager Did you get a chance to work again on this lately?

@eager eager changed the title Switch .properties extension from INI to new Java Properties language Add Java Properties language to .properties extension Sep 16, 2018
@eager eager changed the title Add Java Properties language to .properties extension Add Java Properties language for .properties extension Sep 16, 2018
@eager
Copy link
Contributor Author

eager commented Sep 16, 2018

@pchaigno apologies for disappearing on this for a while. Updated, with the test failure fixed.

@lildude lildude merged commit e3eb05e into github-linguist:master Oct 5, 2018
@eager eager deleted the java-properties branch October 19, 2018 19:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants