Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add .cnf to INI #6309

Merged
merged 3 commits into from
Mar 6, 2023
Merged

Add .cnf to INI #6309

merged 3 commits into from
Mar 6, 2023

Conversation

eggplants
Copy link
Contributor

@eggplants eggplants commented Mar 5, 2023

Description

I would like to add .cnf extension as INI.

.cnf is used for MySQL Option File and OpenSSL CONF library configuration file.

Checklist:

@eggplants eggplants requested a review from a team as a code owner March 5, 2023 19:55
Copy link
Collaborator

@Alhadis Alhadis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Distribution appears to check out (searching for .cnf files yields ~109k results, most of which appear to be MySQL configs.

Copy link
Member

@lildude lildude left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add another .cnf sample, maybe one demonstrating the OpenSSL conf file? This one sample you've added seems to be influencing the classifier such that it is now detecting a .pro file as INI:

Proguard/proguard-rules2.pro BAD (INI)

Hopefully another sample will be enough to correct this 🤞

@lildude
Copy link
Member

lildude commented Mar 6, 2023

😢 that didn't do the trick so I've bumped the number of allowed failures.

@lildude lildude merged commit 735f94c into github-linguist:master Mar 6, 2023
@eggplants eggplants deleted the cnf branch March 6, 2023 10:50
@Alhadis
Copy link
Collaborator

Alhadis commented Mar 6, 2023

@lildude Please ⌘+R my memory: does Linguist's classifier take comments into consideration when analysing keyword frequency? Because the long-winded legalese preceding every GPL-licensed bit of code might skew the classifier's judgement if another candidate language's samples are also half-lawyer-speak, half actual source-code.

The problematic sample you mentioned earlier began with the GPL, so it left me wondering…

@lildude
Copy link
Member

lildude commented Mar 6, 2023

Please ⌘+R my memory: does Linguist's classifier take comments into consideration when analysing keyword frequency?

Yes, and thanks to the tokenizer, it detects it as a comment so ignores things after the hash when classifying. With LINGUIST_DEBUG=2 you'll see lines like this (I forced this by adding .cnf to the Proguard entry):

$ LINGUIST_DEBUG=2 bundle exec bin/github-linguist samples/INI/my.cnf
samples/INI/my.cnf: 107 lines (93 sloc)
  type:      Text
  mime type: text/plain
                    #       INI  Proguard
               #   19    29.260         -
              ##    1         -     4.395
               -   18         -    28.547
              .7    1     6.651         -
            .log    1     6.651         -
            .pid    1     6.651         -
           .sock    3    23.248         -
               /   28   302.228         -
               :    2         -     1.737
               ;    2         -     2.936
               =   47   612.767         -
          AES256    1     6.651         -
               C    1     7.344         -
     CAMELLIA256    1     6.651         -
        COMMENT#   28     1.506         -
        COMMENT;    1     8.848         -
CaseSensitiveOpt    1     6.651         -
[...]
win_path_with_po    1     6.651         -
       INI =  -2044.596 +  -5.312 =  -2049.908
  Proguard =  -4755.497 +  -6.852 =  -4762.349
  language:  INI
$

Stripping the GPL license comment makes no difference:

$ LINGUIST_DEBUG=2 bundle exec bin/github-linguist samples/INI/my.cnf
samples/INI/my.cnf: 107 lines (93 sloc)
  type:      Text
  mime type: text/plain
                    #       INI  Proguard
               #   19    29.260         -
              ##    1         -     4.395
               -   18         -    28.547
              .7    1     6.651         -
            .log    1     6.651         -
            .pid    1     6.651         -
           .sock    3    23.248         -
               /   28   302.228         -
               :    2         -     1.737
               ;    2         -     2.936
               =   47   612.767         -
          AES256    1     6.651         -
               C    1     7.344         -
     CAMELLIA256    1     6.651         -
        COMMENT#   28     1.506         -
        COMMENT;    1     8.848         -
CaseSensitiveOpt    1     6.651         -
[...]
win_path_with_po    1     6.651         -
       INI =  -2044.596 +  -5.312 =  -2049.908
  Proguard =  -4755.497 +  -6.852 =  -4762.349
  language:  INI
$

The problematic sample you mentioned earlier began with the GPL, so it left me wondering…

Yeah, this shouldn't make a difference as pointed out above, but I removed the GPL license text and tested again locally and it didn't make a difference. I still got the same failure that needed the number bump.

@github-linguist github-linguist locked as resolved and limited conversation to collaborators Jun 17, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants