Allow specifying an ignore file for language statistics #137

Closed
davidfowl opened this Issue Mar 17, 2012 · 20 comments

@davidfowl

Some repositories (like SignalR) have samples that include common JavaScript libraries such as jQuery, and GitHub ends up classifying the project as JavaScript instead of C# (in this particular case). Nothing is wrong with this at a high level, since jQuery is JavaScript, but project maintainers who want more control over statistics need a way to opt out of this behavior.

I see 2 options:

  • Short term hack: Exclude commonly used js files. This will handle some scenarios but you'll have to exclude multiple versions of the library (unless you had wildcard support).
  • Longer term solution: Allow a repository to have a .lignore or equivalent (I suck at naming) that uses glob syntax to exclude files from being processed for language statistics.
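
To make the second option concrete, here is a rough sketch in Ruby of how glob-based exclusion could work. Everything in it is hypothetical: the pattern list, the file paths and byte counts, and the helper names `ignored?`, `language_stats`, and `primary_language` are illustrations, not Linguist APIs. The only library call is Ruby's standard `File.fnmatch?`.

```ruby
# Hypothetical sketch, NOT how Linguist works. Patterns, paths, and byte
# counts below are made up for illustration.

# Patterns a hypothetical ignore file might contain, one glob per line.
IGNORE_PATTERNS = [
  "**/jquery*.js", # any jQuery build, whatever the version, in any subdirectory
  "docs/*"         # files directly under docs/
].freeze

# True when the path matches at least one ignore pattern.
# FNM_PATHNAME keeps "*" from crossing "/", so "**/" is needed to recurse.
def ignored?(path, patterns = IGNORE_PATTERNS)
  patterns.any? { |pattern| File.fnmatch?(pattern, path, File::FNM_PATHNAME) }
end

# Sums bytes per language, skipping ignored files.
def language_stats(files, patterns = IGNORE_PATTERNS)
  files
    .reject { |file| ignored?(file[:path], patterns) }
    .each_with_object(Hash.new(0)) { |file, totals| totals[file[:language]] += file[:bytes] }
end

# Language with the most bytes, or nil when nothing is left to count.
def primary_language(stats)
  top = stats.max_by { |_language, bytes| bytes }
  top && top.first
end

FILES = [
  { path: "src/Hub.cs", language: "C#", bytes: 40_000 },
  { path: "samples/Scripts/jquery-1.6.4.js", language: "JavaScript", bytes: 230_000 },
  { path: "samples/Scripts/jquery-1.6.4.min.js", language: "JavaScript", bytes: 90_000 },
  { path: "samples/Chat/chat.js", language: "JavaScript", bytes: 3_000 }
].freeze

puts primary_language(language_stats(FILES, [])) # prints "JavaScript" (nothing ignored)
puts primary_language(language_stats(FILES))     # prints "C#" (jQuery copies ignored)
```

A single wildcard pattern covers every jQuery version, which is the advantage over listing each file by hand.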
@DamianEdwards

This is a must have in my view. It seems ridiculous that a repo can't configure the way stats are collected when so many contain 3rd party code.

@abevoelker
Contributor

Just to note, there is some functionality that already exists to exclude certain files and paths. Just mentioning it in case you weren't aware, since you didn't reference it.

--- README snip ---

Ignore vendored files

Checking other code into your git repo is a common practice. But this often inflates your project's language stats and may even cause your project to be labeled as another language. We are able to identify some of these files and directories and exclude them.

Linguist::FileBlob.new("vendor/plugins/foo.rb").vendored? # => true

See Linguist::BlobHelper#vendored? and lib/linguist/vendor.yml.

Generated file detection

Not all plain text files are true source files. Generated files like minified js and compiled CoffeeScript can be detected and excluded from language stats. As an extra bonus, these files are suppressed in Diffs.

Linguist::FileBlob.new("underscore.min.js").generated? # => true

See Linguist::BlobHelper#generated?.

@josh
Member
josh commented Mar 28, 2012

We've talked about this internally and we want to avoid any sort of per repo configuration. Sorry.

@josh josh closed this Mar 28, 2012
@davidfowl

@josh Why? So this means our repository is broken unless we delete things from it. What about adding more libraries to the list of things to ignore?

@skoon
skoon commented Mar 28, 2012

Josh, could you make those internal reasons public? I'm failing to see a downside for per repo configuration in this case.

@Frogging101

What the above asked, basically. Why would you want to actively avoid giving users the ability to configure such things?

@Droogans

I would like it if doc directories were ignored. Minified jQuery typically dominates any project where I use yardoc to publish my docs, and preview.github.io means I can host those docs online as well, which is a huge advantage.

@Chive
Chive commented Sep 4, 2013
+1 for ignoring doc/docs directories
@AnonymousMeerkat

I have 3rd party libraries that are written in a different language, but need to be included in the project itself.

True, language statistics aren't the biggest deal, but it would be very nice if I could just exclude the folder itself from being analyzed.

@danijar
Contributor
danijar commented May 30, 2014

Since the language statistics come from Github rather than from Git, adding an options section to the web interface would also be a solution to think of. Just a textbox with one folder path per line. Those paths inside the repository would be excluded from the language statistics.

You can't cover all libraries with a global exclude list. What about developers of an excluded library? If they use GitHub, their whole project would get excluded. In my opinion some kind of repository-wide configuration is the way to go.

@pgibler
pgibler commented Aug 2, 2014

Any word on this? I have a project done mostly in Ruby. Despite this, GitHub has determined it's a CSS/HTML project, because I wrote documentation and it's included with the project. I should be able to exclude my ./doc folder. GitHub should hook into this functionality once it's in Linguist.

@cbelden
cbelden commented Aug 26, 2014

I would love this. I track my js libraries so I can easily deploy to different environments. Right now, I have a python app that is 98% javascript.. a bit misleading.

@AlbertoMonteiro

I made a workaround to force your repository's main language.
Example:

I want to force my repository to be marked as C# repository:

  1. Create somefile.cs
  2. Write this code
using System;

namespace ForceCSharpGitHub
{
    public class Program
    {
        public static void Main(string[] args)
        {
            Console.WriteLine("This is a C# repository"); //Duplicate this line 500,000 times
        }
    }
}

Commit it, push, and you will see now that your repository is a C# repository!

@danijar
Contributor
danijar commented Oct 12, 2014

I can't see any disadvantage of supporting an optional hint file. Where can we discuss this?

@pchaigno
Collaborator

@danijar @AlbertoMonteiro Look at the current PRs, it's already being developed.

@blakev
blakev commented Dec 3, 2014

What's the status on this? I don't want my web projects being labeled as CSS.... -_-

@pchaigno
Collaborator
pchaigno commented Dec 3, 2014

@blakev It was released a few weeks ago: README#Overrides.
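
For anyone skimming the thread: the overrides are set through a `.gitattributes` file in the repository. Below is a minimal sketch, assuming the `linguist-vendored` and `linguist-language` attributes described in that README section. The paths are made-up examples, so adjust them to your own layout and check the README for the current list of attributes.

```
# .gitattributes (example paths are hypothetical)

# Treat bundled third-party scripts as vendored so they are left out of the stats
samples/Scripts/* linguist-vendored

# Count one specific file again even if it would normally be treated as vendored
jquery.js linguist-vendored=false

# Report every .rb file under a different language
*.rb linguist-language=Java
```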

@danijar
Contributor
danijar commented Dec 3, 2014

Awesome, thanks for posting the link!

@arcanis
arcanis commented Apr 5, 2015

Will this file (.gitattributes) also be used for GitHub stats? It's always a bit disappointing to see all stat charts destroyed when committing vendors to the tree. On one repo I have about 935,827 ++ / 849,501 --. I'm not that productive 😄

@pchaigno
Collaborator

@arcanis It's a good idea I think. It's not something that can be implemented in Linguist however; you should suggest it to GitHub support. Might be a bit more difficult to implement for them though.
