GitHub Gems: Driving Open-Source Investments With Data - A template repository for building a data analytics pipeline to uncover trending open-source projects on GitHub.

edsioufi/github-stars-pipeline

GitHub Gems: Driving Open-Source Investments With Data

Welcome to the GitHub Gems project! This project hosts a data analytics pipeline that enables smarter investment decisions by measuring the popularity of open-source repos on GitHub.

Project Overview

The goal of this project is to develop an efficient data pipeline that streamlines analytics, reduces manual effort, and enables deeper insights into the open-source ecosystem on GitHub. By leveraging modern data tools and best practices, such as dbt (data build tool) and Airflow, we aim to create a scalable and reliable solution for data-driven decision-making.

Getting Started

To get started with the GitHub Gems project, follow these steps (click on the links for guides):

Set up your IDE

ℹ️ Skip some steps if you're already set!

If you already have git, VSCode, and/or Python installed, just skip the corresponding step(s).

  1. If you don't already use git, install it here.

  2. If you don't have a coding editor installed, install VSCode. After that, install the Python extension.

  3. Make sure you have Python 3 installed (or install it here).

Create your personal repo

  1. Create a new repo in your GitHub account and name it github-stars-pipeline.

  2. Clone this repo.

git clone https://github.com/edsioufi/github-stars-pipeline.git

  3. Point your local clone to your own remote (so that you modify your copy of the repo, not the template). Make sure you replace {your_github_username} with your GitHub username.

cd github-stars-pipeline
git remote set-url origin https://github.com/{your_github_username}/github-stars-pipeline.git

  4. Push to your new GitHub repo.

git push origin master
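
Before pushing, it can help to confirm that origin now points at your own copy rather than the template. The following is a minimal sketch, assuming it is run from inside the cloned repository; `check_origin` is a hypothetical helper name introduced here for illustration and is not part of the template.

```shell
# Hypothetical helper: succeeds only if origin points at the given
# GitHub user's copy of github-stars-pipeline.
check_origin() {
  expected="https://github.com/$1/github-stars-pipeline.git"
  # git remote get-url prints the URL currently configured for the remote.
  actual="$(git remote get-url origin)"
  if [ "$actual" = "$expected" ]; then
    echo "origin OK: $actual"
  else
    echo "origin mismatch: $actual (expected $expected)" >&2
    return 1
  fi
}

# Usage (replace the placeholder with your own username):
# check_origin your_github_username
```

If the push itself is rejected because no local branch is named master, `git branch --show-current` prints the branch name your clone actually uses.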

Set up your Python environment and DuckDB

  1. Create a Python virtual environment for your repo:

python -m venv venv
source venv/bin/activate

  2. Install DuckDB (make sure you select the Python option), your first Python dependency.

ℹ️ You might have to install additional dependencies if you're on Windows.

  3. Install DBeaver to explore DuckDB.

  4. Create a new git branch:

git checkout -b add_duck_db

  5. Add your newly installed packages to your requirements file:

pip freeze > requirements.txt

  6. Commit and push:

git add --all
git commit
git push origin -u add_duck_db

  7. Create a Pull Request (PR) in GitHub.

  8. Merge your first PR.
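
Two things are easy to get wrong in the commit step above: `git add --all` stages everything that is not ignored, which can include the `venv` directory, and `pip freeze` only lists DuckDB if the virtual environment was active when it was installed. The following is a minimal sketch of a check to run before committing. It assumes a `requirements.txt` in the repo root and a virtual environment directory named `venv`; `precommit_check` is a hypothetical helper, and this README does not say whether the template already ships a `.gitignore` that covers `venv`.

```shell
# Hypothetical helper: run from the repo root before "git add --all".
precommit_check() {
  status=0
  # pip freeze writes pinned lines of the form "duckdb==<version>".
  if grep -qi '^duckdb==' requirements.txt; then
    echo "requirements.txt lists duckdb"
  else
    echo "duckdb missing from requirements.txt" >&2
    status=1
  fi
  # git check-ignore exits 0 when the path matches an ignore rule.
  if git check-ignore -q venv; then
    echo "venv is ignored"
  else
    echo "venv is not ignored; add 'venv/' to .gitignore first" >&2
    status=1
  fi
  return "$status"
}

# Usage:
# precommit_check && git add --all
```

Chaining the check with `&&` means nothing is staged unless both conditions hold.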
