This issue was moved to a discussion.

Mined repositories languages #18

Closed
wolfenmark opened this issue Sep 10, 2021 · 7 comments

Comments

@wolfenmark

Is there a way to see if some important languages are excluded from the mining?

I have seen the language stats report in the link 'Mined Projects'.
Are these the 13 most widespread languages, with everything else ranking 'below Kotlin', or are there gaps where other widespread languages are missing in between?

@emadpres
Collaborator

Our crawler uses the language filter of the GitHub Search API to retrieve repositories (mainly) written in one of the predefined programming languages listed in the Mined Projects box (13 languages as of now). That means the number of repositories under, let's say, Kotlin is the total number of repositories the GitHub Search API returned for the query language: Kotlin.

However, due to GitHub API issues, we sometimes get repositories written in languages other than the one we asked for, which may be the cause of the confusion. In such cases, we stick with what we searched for and classify those repositories under the language we filtered on.

Let me know if you still have any doubts.
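
For illustration only, here is a minimal sketch of the behaviour described above. It is not the crawler's actual code: the function names `build_search_url` and `classify` are hypothetical, and it assumes the public `search/repositories` endpoint with a `language:` qualifier and the `language` and `full_name` fields returned for each repository item.

```python
from urllib.parse import urlencode

# Public GitHub repository search endpoint.
SEARCH_ENDPOINT = "https://api.github.com/search/repositories"


def build_search_url(language, page=1, per_page=100):
    """Build a repository search URL restricted to a single language."""
    query = {"q": f"language:{language}", "page": page, "per_page": per_page}
    return f"{SEARCH_ENDPOINT}?{urlencode(query)}"


def classify(items, searched_language):
    """Label every returned repository with the language that was searched for.

    Also returns the names of repositories whose reported language differs
    from the searched one, so such mismatches can be inspected separately.
    """
    labelled = [(item["full_name"], searched_language) for item in items]
    mismatched = [
        item["full_name"]
        for item in items
        if item.get("language") != searched_language
    ]
    return labelled, mismatched
```

With this scheme, a repository that the API returns for `language:Kotlin` but reports as Java would still be counted under Kotlin, which matches the classification rule stated above.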

@wolfenmark
Author

My question was whether those languages are the most common ones in decreasing order, or a '(semi-)arbitrary' subset.
In other words: is there any language with more repos than Kotlin that is omitted from the search results by design?

@emadpres
Collaborator

Okay, they are chosen based on their popularity, so you can call it a semi-arbitrary design decision.
And to be honest, I don't know the answer to your question. But let me know if you believe an important programming language is missing, maybe Scala?

@wolfenmark
Author

I just noticed that Smalltalk/Pharo was not there, but I'm not sure it can be considered relevant.
Scala might be a better example, but I wonder if there's another way to see the share of repositories for each major language and include the top 13, 15, or 20 accordingly.
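
One way to check this could be sketched as follows, assuming a per-language repository count is available (for example, the `total_count` field of a search response for each language). The function name `omitted_above_floor` and its inputs are hypothetical; no real counts are given here.

```python
def omitted_above_floor(counts, mined):
    """List languages that are not mined but have more repositories
    than the least-populated mined language.

    counts: dict mapping language name -> repository count
    mined:  collection of language names currently mined
    Raises ValueError if none of the mined languages appear in counts.
    """
    # Smallest repository count among the mined languages (the "floor").
    floor = min(counts[lang] for lang in mined if lang in counts)
    omitted = [
        (lang, n)
        for lang, n in counts.items()
        if lang not in mined and n > floor
    ]
    # Largest omitted languages first.
    return sorted(omitted, key=lambda pair: pair[1], reverse=True)
```

An empty result would mean the mined set has no gaps above its smallest language; a non-empty one would name the candidates to consider adding.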

@emadpres
Collaborator

It would be nice to have such a list of languages.

@wolfenmark
Author

I'm not sure it's exactly what we were talking about, but here is a list of languages apparently known to GitHub:

https://github.com/github/linguist/blob/master/lib/linguist/languages.yml

@emadpres
Collaborator

Great, I'll also leave this here for future reference: https://madnight.github.io/githut

@seart-group locked and limited conversation to collaborators on Feb 4, 2022
@dabico converted this issue into discussion #20 on Feb 4, 2022


2 participants