project-looo/looo

Project LOOO (List Of Opensource Organizations)

We have created a list of the top 100K open source organizations: https://www.project-looo.org

This project is built by the community to thank organizations for their open source contributions. The list contains the top 100K organizations, ranked by number of commits.

The method we used

How did we measure the contribution?

We measured contribution solely by commits on GitHub. There are many other ways to contribute to a project, but for this list we chose to focus on commits.

How can we know which company contributed to a repository?

When you inspect a git history, you can see that each commit has an author, which has two parts: a name and an email address. The part of the email address after the @ is usually the company’s domain. We know that in some cases a contributor uses a different email address that does not contain the company’s domain.
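As a minimal sketch of this idea, the function below pulls the domain out of an author email. The function name `email_domain` and the validation rules (a non-empty local part and at least one dot in the domain) are assumptions made for illustration, not the project's actual code.

```python
from typing import Optional


def email_domain(author_email: str) -> Optional[str]:
    """Return the lower-cased domain of an email address, or None if it looks malformed."""
    # Split on the last "@" so unusual local parts do not confuse the split.
    local, sep, domain = author_email.strip().rpartition("@")
    # Assumed validity rules: an "@" is present, the local part is
    # non-empty, and the domain contains at least one dot.
    if not sep or not local or "." not in domain:
        return None
    return domain.lower()


print(email_domain("Jane.Doe@Example.com"))
print(email_domain("not-an-email"))
```

Lower-casing the domain means `Example.com` and `example.com` are counted as the same organization.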

How to assign a commit to a company?

There are around 2.4B public commits on GitHub (since 2011), and we have to analyze every one of them to answer this question. Extracting that amount of data through the GitHub API would be impractical. Thanks to the GitHub Archive Project, all public GitHub events are stored in a publicly available BigQuery database, and using SQL to extract the data makes the process much easier.
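The counting step amounts to grouping commits by email domain. The real work runs as SQL on BigQuery; the snippet below is only an in-memory Python illustration of the same aggregation, and the function name `count_commits_by_domain` and the sample addresses are hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_commits_by_domain(author_emails: Iterable[str]) -> Counter:
    """Count commits per email domain; each email stands for one commit."""
    counts: Counter = Counter()
    for email in author_emails:
        # Take everything after the last "@" as the domain.
        _, sep, domain = email.rpartition("@")
        if sep and domain:
            counts[domain.lower()] += 1
    return counts


# Hypothetical sample data: one author email per commit.
sample_commits = [
    "alice@example.com",
    "bob@example.com",
    "carol@example.org",
    "alice@Example.com",
]
commit_counts = count_commits_by_domain(sample_commits)
print(commit_counts.most_common())
```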

Clean up the data

After we counted the commits for each company, the data needed to be cleaned. We excluded email providers such as gmail, hotmail and yandex. We also excluded commits made by bots.
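The cleanup could look roughly like the sketch below. The block list contents, the `[bot]` name-suffix heuristic, and the names `BLOCKED_DOMAINS`, `is_bot_author` and `drop_blocked_domains` are all assumptions for illustration; the project's real block lists are not shown here.

```python
from typing import Dict, FrozenSet

# Illustrative block list only; the real list would be much longer.
BLOCKED_DOMAINS: FrozenSet[str] = frozenset(
    {"gmail.com", "hotmail.com", "yandex.ru"}
)


def is_bot_author(author_name: str) -> bool:
    """Assumed heuristic: treat author names ending in "[bot]" as bots."""
    return author_name.strip().lower().endswith("[bot]")


def drop_blocked_domains(
    counts: Dict[str, int],
    blocked: FrozenSet[str] = BLOCKED_DOMAINS,
) -> Dict[str, int]:
    """Return a copy of the per-domain counts without blocked domains."""
    return {domain: n for domain, n in counts.items() if domain not in blocked}


raw_counts = {"example.com": 120, "gmail.com": 9000, "example.org": 45}
cleaned = drop_blocked_domains(raw_counts)
print(cleaned)
print(is_bot_author("dependabot[bot]"))
```

Bot commits would be filtered before counting, while the domain block list can be applied to the aggregated counts afterwards.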

Implementation

You can find the code in the etl (Extract, Transform, Load) directory.

Roadmap

  • Add ETL source code
  • Automate ETL
  • Add the block list of domains to the repo and make it part of the ETL
  • Add user block list and make it part of the ETL
  • Add organization descriptions, so that an organization can provide a detailed description or change its logo or name

How to contribute?

This project is far from complete, and there are still a number of things to improve. We rely on the community to help keep the list clean.

Special thanks to