# Github - Get commits ranking from repository

**Tags:** #github #commits #stats #naas_drivers #plotly #barchart

## Input

In [1]:
import pandas as pd
import plotly.express as px
# from naas_drivers import github
import naas
from github_driver import Github
github = Github()

## Setup Github
**How to find your personal access token on Github?**

- First, create a personal access token here: https://github.com/settings/tokens. It is needed to get the details of your organization.
- You will be asked to select scopes for the token. The scopes you choose determine what information you can read and which actions you can perform against the API.
- Be careful with the scopes prefixed with `write:`, `delete:` and `admin:`, as they allow destructive actions.
- Each scope is described in the docs: https://docs.github.com/en/developers/apps/building-oauth-apps/scopes-for-oauth-apps
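
A token pasted directly into a notebook is easy to leak when the notebook is shared. As a minimal sketch, assuming the token is stored in an environment variable called `GITHUB_TOKEN` (the variable name and the `load_github_token` helper are illustrative assumptions, not part of this notebook), it could be read at runtime instead:

```python
import os


def load_github_token(env_var="GITHUB_TOKEN"):
    """Return the token stored in the given environment variable.

    Hypothetical helper: the notebook itself assigns the token directly.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return token
```

With this in place, the setup cell could use `GITHUB_TOKEN = load_github_token()` rather than a literal string.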

In [2]:
# Github repository url
REPO_URL = "https://github.com/jupyter-naas/awesome-notebooks"

# Github token
GITHUB_TOKEN = "ghp_CEvqR7QauDbNLRiIiwAC1v4xxxxxxxxxxxxx"

## Model

### Get commits from repository url

In [3]:
df_commits = github.connect(GITHUB_TOKEN).repos.get_commits(REPO_URL)
df_commits

| | ID | MESSAGE | AUTHOR_DATE | AUTHOR_NAME | AUTHOR_EMAIL | COMMITTER_DATE | COMMITTER_NAME | COMMITTER_EMAIL | COMMENTS_COUNT | VERIFICATION_REASON | VERIFICATION_STATUS |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 8caf38067458f94717ee16e547ba7b13ba726d44 | generateReadme: Refresh | 2022-02-05 12:04:17 | FlorentLvr | FlorentLvr@users.noreply.github.com | 2022-02-05 12:04:17 | FlorentLvr | FlorentLvr@users.noreply.github.com | 0 | unsigned | False |
| 1 | a12c460908b57314bd5f69151bce2fc293d8294e | Merge pull request #316 from jupyter-naas/LK_N... | 2022-02-05 12:00:23 | FlorentLvr | 48032461+FlorentLvr@users.noreply.github.com | 2022-02-05 12:00:23 | GitHub | noreply@github.com | 0 | valid | True |
| 2 | 7c805664729e069d16b601318f601277e3d32d10 | fix: manage linkedin limit | 2022-02-05 11:57:38 | FlorentLvr | | 2022-02-05 11:57:38 | FlorentLvr | | 0 | unsigned | False |
| 3 | c8b41edfde7689f75293924d34e1f88b8fa7f36e | generateReadme: Refresh | 2022-02-04 14:59:10 | FlorentLvr | FlorentLvr@users.noreply.github.com | 2022-02-04 14:59:10 | FlorentLvr | FlorentLvr@users.noreply.github.com | 0 | unsigned | False |
| 4 | df44df68499601abd73d7b58208ffcf1a61bde72 | Merge pull request #317 from jupyter-naas/Get_... | 2022-02-04 14:55:07 | FlorentLvr | 48032461+FlorentLvr@users.noreply.github.com | 2022-02-04 14:55:07 | GitHub | noreply@github.com | 0 | valid | True |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 707 | 0ea23b89ce2a6066c7109e5ee4114d812378e4e2 | Update README.md | 2020-10-29 08:36:19 | BobCashStory | 47117399+BobCashStory@users.noreply.github.com | 2020-10-29 08:36:19 | GitHub | noreply@github.com | 0 | valid | True |
| 708 | 83ecdbdfbd26bb9ac13b0735d7cc134e38d3b860 | Update README.md | 2020-10-29 08:35:09 | BobCashStory | 47117399+BobCashStory@users.noreply.github.com | 2020-10-29 08:35:09 | GitHub | noreply@github.com | 0 | valid | True |
| 709 | 0a52defaf0c3f9b34f264c48da73d5ed2e40aca8 | Update README.md | 2020-10-29 08:34:28 | BobCashStory | 47117399+BobCashStory@users.noreply.github.com | 2020-10-29 08:34:28 | GitHub | noreply@github.com | 0 | valid | True |
| 710 | 58cf4de85a2c375b3699abf1db312bd300a8eb1c | Update README.md | 2020-10-29 08:34:04 | BobCashStory | 47117399+BobCashStory@users.noreply.github.com | 2020-10-29 08:34:04 | GitHub | noreply@github.com | 0 | valid | True |
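
The driver's internals are not shown in this notebook. As a rough sketch of how a repository URL could map onto GitHub's REST endpoint for listing commits (`GET /repos/{owner}/{repo}/commits`, paginated with `page` and `per_page`), the hypothetical helper below splits the URL into owner and repository name and builds the request URL. The function name and its defaults are assumptions made for illustration only.

```python
from urllib.parse import urlparse


def commits_api_url(repo_url, page=1, per_page=100):
    """Build the REST API URL listing commits for a repository URL.

    Hypothetical helper for illustration; it is not part of the driver.
    """
    # "/owner/repo" -> "owner/repo", tolerating a trailing slash
    path = urlparse(repo_url).path.strip("/")
    # Tolerate clone-style URLs ending in ".git"
    if path.endswith(".git"):
        path = path[:-4]
    parts = path.split("/")
    if len(parts) < 2 or not all(parts[:2]):
        raise ValueError(f"Not a repository URL: {repo_url}")
    owner, repo = parts[:2]
    return (
        f"https://api.github.com/repos/{owner}/{repo}/commits"
        f"?per_page={per_page}&page={page}"
    )
```

Since each response carries at most `per_page` commits, retrieving a full history like the one above would mean requesting successive pages until an empty page comes back.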


## Output

### Get commits ranking by user

In [None]:
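def get_commits(df):
    # Exclude commits whose committer email ends with "github.com",
    # e.g. merges made through the web UI (committer noreply@github.com).
    # Note: this also matches users.noreply.github.com addresses.
    # Rows with a missing committer email are kept.
    df = df[~df["COMMITTER_EMAIL"].str.endswith("github.com", na=False)]

    # Count commits per author
    df = df.groupby(["AUTHOR_NAME"], as_index=False).agg({"ID": "count"})

    # Rename the count column and rank authors by number of commits
    df = df.rename(columns={"ID": "NB_COMMITS"})
    return df.sort_values(by="NB_COMMITS", ascending=False).reset_index(drop=True)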

df = get_commits(df_commits)
df
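
To sanity-check the ranking logic on a small input, the same filter-and-count steps can be sketched in plain Python. The records below are made-up toy data, and `rank_authors` is a hypothetical helper, not part of the notebook.

```python
from collections import Counter

# Toy commit records (made-up data, not taken from the repository)
commits = [
    {"AUTHOR_NAME": "alice", "COMMITTER_EMAIL": "alice@example.com"},
    {"AUTHOR_NAME": "bob", "COMMITTER_EMAIL": "noreply@github.com"},
    {"AUTHOR_NAME": "alice", "COMMITTER_EMAIL": None},
    {"AUTHOR_NAME": "bob", "COMMITTER_EMAIL": "bob@example.com"},
    {"AUTHOR_NAME": "alice", "COMMITTER_EMAIL": "alice@example.com"},
]


def rank_authors(records):
    """Count commits per author, skipping committer emails ending in github.com."""
    counts = Counter(
        r["AUTHOR_NAME"]
        for r in records
        # Records with a missing email are kept, as in the pandas version
        if not (r["COMMITTER_EMAIL"] or "").endswith("github.com")
    )
    # most_common() returns (author, count) pairs, highest count first
    return counts.most_common()


ranking = rank_authors(commits)
print(ranking)  # [('alice', 3), ('bob', 1)]
```

Here bob's second commit is dropped because its committer email is `noreply@github.com`, while alice's commit with no email is still counted.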

### Plot a bar chart of commit activity

In [None]:
def create_barchart(df, repository):
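    # Keep only the repository name (last segment of the URL)
    repository = repository.split("/")[-1]

    # Sort ascending so the top contributor is drawn at the top of the horizontal chart
    df = df.sort_values(by="NB_COMMITS")

    # Total number of commits, displayed in the chart title
    commits = df.NB_COMMITS.sum()

    # Create the horizontal bar chart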
    fig = px.bar(df,
                 y="AUTHOR_NAME",
                 x="NB_COMMITS",
                 orientation='h',
                 title=f"Github - {repository} : Commits by user <br><span style='font-size: 13px;'>Total commits: {commits}</span>",
                 text="NB_COMMITS",
                 labels={"AUTHOR_NAME": "Author",
                         "NB_COMMITS": "Nb commits"}
                 )
    fig.update_traces(marker_color='black')
    fig.update_layout(
        plot_bgcolor="#ffffff",
        width=1200,
        height=800,
        font=dict(family="Arial", size=14, color="black"),
        paper_bgcolor="white",
        xaxis_title=None,
        xaxis_showticklabels=False,
        yaxis_title=None,
        margin_pad=10,
    )
    fig.show()
    return fig

fig = create_barchart(df, REPO_URL)

### Save and export html

In [None]:
output_path = f"{REPO_URL.split('/')[-1]}_commits_ranking.html"
fig.write_html(output_path)
naas.asset.add(output_path, params={"inline": True})