# cicsdev stats

## Introduction

The [github-repo-stats GitHub Action](https://github.com/marketplace/actions/github-repo-stats) collects stats and [generates reports for each repository in the cicsdev organisation](https://github.com/cicsdev/repo-stats/tree/github-repo-stats/cicsdev) individually.

This notebook is an attempt to create a single report from the data for all cicsdev repositories.

## Fetch and load data

The following curl command downloads the latest output of the github-repo-stats action, including the raw data in CSV files, and unpacks it into `../.data`.

In [None]:
!mkdir -p ../.data && curl -sSL "https://github.com/cicsdev/repo-stats/archive/heads/reports.tar.gz" | tar -C ../.data --strip-components=1 -xzf -

Next, some imports and initial setup.

In [8]:
# Import all the things
import glob
import pandas as pd
from datetime import datetime
from dateutil.relativedelta import relativedelta
from functools import reduce
from numpy import int64

import matplotlib.pyplot as plt
import matplotlib.dates as mdates

import ipywidgets as widgets

In [9]:
# Setup graphs
%matplotlib inline
%config InlineBackend.figure_format = 'svg'
plt.rcParams['figure.figsize'] = [10, 5]
plt.rc('axes', axisbelow=True)

Now load and merge the data.

In [None]:

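# One aggregate CSV per repository: column 0 is the timestamp, column 3 the views count.
# The views column is named after the repository directory in the file path.
csv_files = glob.iglob("../.data/**/ghrs-data/views_clones_aggregate.csv", recursive=True)
dfs = [
    pd.read_csv(csv_file, header=0, names=['date', csv_file.split('/')[3]], usecols=[0, 3])
    for csv_file in csv_files
]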

for df in dfs:
    df['date'] = df['date'].astype('datetime64[ns, UTC]')
    df.set_index('date', inplace = True)

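# TODO parse the dates when loading instead of converting them afterwards.
# The attempt below does not work as written: date_format expects a strftime
# string rather than a callable, csv_files is an iterator that the list
# comprehension above has already exhausted, and map() is lazy so set_index never runs.
# dfs = [
#     pd.read_csv(csv_file, header=0, names=['date', csv_file.split('/')[3]], usecols=[0, 3], parse_dates=[0], date_format=lambda t: pd.to_datetime(t, utc=True).to_datetime64()) for csv_file in csv_files
# ]
# map(lambda df: df.set_index('date', inplace = True), dfs)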

merged_df = reduce(lambda left, right: pd.merge(left, right, on='date', how='outer').fillna(0), dfs)
merged_df.tail()
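
To make the merge step easier to follow, here is a minimal sketch on two tiny frames. The repository names `repo-a` and `repo-b` and all of the numbers are made up for illustration; only the `reduce`/`pd.merge` pattern comes from the cell above.

```python
from functools import reduce

import pandas as pd

# Hypothetical per-repository frames, indexed by date like the frames in dfs.
repo_a = pd.DataFrame(
    {"repo-a": [3, 5]},
    index=pd.to_datetime(["2024-01-01", "2024-01-02"], utc=True).rename("date"),
)
repo_b = pd.DataFrame(
    {"repo-b": [7, 1]},
    index=pd.to_datetime(["2024-01-02", "2024-01-03"], utc=True).rename("date"),
)

# An outer merge keeps every date seen in either frame.
# Dates missing from one repository are filled with 0 views.
example_merged = reduce(
    lambda left, right: pd.merge(left, right, on="date", how="outer").fillna(0),
    [repo_a, repo_b],
)
print(example_merged)
```

The result has one row per date that appears in any input and one column per repository, which is the shape the summaries below rely on.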



## One month view summary

Choose a month and a cut-off. Only samples with more views than the cut-off in that month are included in the graph.

In [None]:
from_date = datetime.today().date() - relativedelta(years=1)
dates = [from_date + relativedelta(months=i) for i in range(1, 13)]
options = [(i.strftime('%b %Y'), i) for i in dates]
month = widgets.SelectionSlider(
    options=options,
    index=11,
    description='Month:',
    disabled=False
)
display(month)

cutoff = widgets.IntSlider(
    value=25,
    min=0,
    max=100,
    step=5,
    description='Cut-off:',
    disabled=False,
    continuous_update=False,
    orientation='horizontal',
    readout=True,
    readout_format='d'
)
display(cutoff)

Plot a graph using the settings above.

In [None]:
# Total the views per (year, month) and pick out the selected month
monthly_df = merged_df.groupby([lambda x: x.year, lambda x: x.month]).sum()
month_summary_sf = monthly_df.xs((month.value.year, month.value.month))

# Only keep samples with more views than the cut-off
mask = month_summary_sf > cutoff.value

month_summary_plot = month_summary_sf[mask].sort_values(ascending=False).plot.bar(
    title=month.value.strftime('%b %Y'), xlabel='Sample', ylabel='Views'
)
month_summary_plot.grid(axis='y')
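
The monthly aggregation can be tried in isolation with a small sketch like the one below. The frame `views`, its repository names and its numbers are hypothetical stand-ins for `merged_df`, and the cut-off of 10 is arbitrary.

```python
import pandas as pd

# Hypothetical daily view counts spanning the end of January and start of February.
index = pd.date_range("2024-01-30", periods=4, freq="D", tz="UTC", name="date")
views = pd.DataFrame({"repo-a": [10, 20, 30, 40], "repo-b": [1, 2, 3, 4]}, index=index)

# Each lambda receives one index entry, so the rows are grouped by (year, month).
monthly = views.groupby([lambda ts: ts.year, lambda ts: ts.month]).sum()

# xs with a (year, month) tuple returns that month as a Series indexed by repository.
feb = monthly.xs((2024, 2))

# Keep only the repositories above the cut-off, largest first.
selected = feb[feb > 10].sort_values(ascending=False)
print(selected)
```

Because `xs` raises a `KeyError` when the requested month has no rows, choosing a month with no data in the real notebook would fail at that step.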

## Quarterly view summary

Choose a quarter and a cut-off. Only samples with more views than the cut-off in that quarter are included in the graph.

In [None]:
quarterly_df = merged_df.groupby(lambda x: x.to_period('Q')).sum()

quarters = [(i.strftime('%YQ%q'), i) for i in quarterly_df.index]
quarter = widgets.SelectionSlider(
    options=quarters,
    description='Quarter:',
    disabled=False
)
display(quarter)

qcutoff = widgets.IntSlider(
    value=100,
    min=0,
    max=400,
    step=25,
    description='Cut-off:',
    disabled=False,
    continuous_update=False,
    orientation='horizontal',
    readout=True,
    readout_format='d'
)
display(qcutoff)

Plot a graph using the settings above.

In [None]:
quarterly_summary_sf = quarterly_df.xs(quarter.value)

# Only keep samples with more views than the cut-off
qmask = quarterly_summary_sf > qcutoff.value

# quarter.value is a Period, so convert it to a string for the title
quarterly_summary_plot = quarterly_summary_sf[qmask].sort_values(ascending=False).plot.bar(
    title=str(quarter.value), xlabel='Sample', ylabel='Views'
)
quarterly_summary_plot.grid(axis='y')
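
As a possible variation on the quarterly grouping, the sketch below converts the whole index to quarterly periods in one step instead of calling `to_period` per row. It drops the timezone first, since pandas may warn when a timezone-aware timestamp is converted to a period. The frame `views` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical daily view counts spanning the end of Q1 and the start of Q2.
index = pd.date_range("2024-03-30", periods=4, freq="D", tz="UTC", name="date")
views = pd.DataFrame({"repo-a": [10, 20, 30, 40], "repo-b": [1, 2, 3, 4]}, index=index)

# Remove the timezone, then map every timestamp to its calendar quarter.
quarters = views.index.tz_localize(None).to_period("Q")

# Group the rows by quarter and total the views for each repository.
quarterly = views.groupby(quarters).sum()

# A quarter can be looked up with a Period; str() gives a label such as 2024Q1.
first_quarter = quarterly.loc[pd.Period("2024Q1")]
print(str(quarterly.index[0]), first_quarter.to_dict())
```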

## Future

The notebook is pretty basic at the moment but hopefully it's a useful start! Still to come:

- grand total of all sample views over time

Plus any other requested improvements!
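
One way the grand total might be computed is sketched below. This is not part of the notebook yet: the frame `views` is a hypothetical stand-in for `merged_df`, and summing across columns followed by a cumulative sum is just one assumed interpretation of "grand total over time".

```python
import pandas as pd

# Hypothetical daily view counts for two made-up repositories.
index = pd.date_range("2024-01-01", periods=3, freq="D", tz="UTC", name="date")
views = pd.DataFrame({"repo-a": [1, 2, 3], "repo-b": [10, 20, 30]}, index=index)

# Sum across the repository columns to get the total views for each day.
daily_total = views.sum(axis=1)

# Accumulate the daily totals to get a running grand total over time.
running_total = daily_total.cumsum()
print(running_total)
```

Either series could then be plotted with `.plot()` in the same way as the bar charts above.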