# Cohort Analysis

Track a cohort of projects across a set of metrics over time.

## Getting Started

Before running any analysis, you'll need to set up your environment.

Start your Python notebook with the following:

In [2]:
from dotenv import load_dotenv
import os
import pandas as pd
from pyoso import Client

load_dotenv()
OSO_API_KEY = os.environ['OSO_API_KEY']
client = Client(api_key=OSO_API_KEY)
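
If `OSO_API_KEY` is missing, `os.environ['OSO_API_KEY']` raises a bare `KeyError`. The sketch below shows one way to fail with a clearer message. `require_env` is a hypothetical helper written for this notebook, not part of pyoso or python-dotenv, and `EXAMPLE_API_KEY` is a placeholder variable used only so the snippet runs on its own.

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value, or fail with a clear message.

    Hypothetical helper for this notebook; not a pyoso or python-dotenv API.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it "
            "before starting the notebook."
        )
    return value


# Placeholder variable so the example is self-contained.
os.environ["EXAMPLE_API_KEY"] = "placeholder-key"
api_key = require_env("EXAMPLE_API_KEY")
```

In the notebook itself, the same helper could be called as `require_env('OSO_API_KEY')` right after `load_dotenv()`.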

## Discover Projects Deployed in an Ecosystem from the OSS Directory

The OSS Directory is a curated registry of open-source software projects. The query below returns the first 10 onchain artifacts (contracts, factories, and deployers) that the directory links to projects on Arbitrum One.

In [4]:

query = """
SELECT 
    project_id, 
    artifact_name, 
    artifact_type
FROM int_artifacts_by_project_in_ossd
WHERE 
    artifact_source = 'ARBITRUM_ONE'
    AND artifact_type IN ('CONTRACT', 'FACTORY', 'DEPLOYER')
LIMIT 10
"""

df = client.to_pandas(query)
df

|  | project_id | artifact_name | artifact_type |
| --- | --- | --- | --- |
| 0 | 7ozZvGAAa3yVIqj87k+BjDGoNQX2WIjYuSDjJtL/1kw= | 0x03475494dc89d378c4268e90a62876efb0278a1a | CONTRACT |
| 1 | VCeNot3diK3Kt0Da9SaRDvcNI1sGdrzyjTW62JUeI0U= | 0xbb6f8affdca29aa1a282e6a21b192b6513a57f9a | CONTRACT |
| 2 | VCeNot3diK3Kt0Da9SaRDvcNI1sGdrzyjTW62JUeI0U= | 0x67b208dbb6bdcacf196568083012d170be3e6f0c | CONTRACT |
| 3 | 9f/wyYnV87C9zOzI22LcgbKAO+yQQI2fx9o3tVVjHMg= | 0x37115cbfce229f3d65073dd155bc0cb4a39d9454 | CONTRACT |
| 4 | +pN6WZUkcWuzSjoQjjIzSFWmXIfMRqVYe/odhLWSRjA= | 0x5a5c0c4832828ff878ce3ab4fec44d21200b1496 | CONTRACT |
| 5 | SrAW7CxbX2q6KzPf1xXwJsc5IQZqIkDDOkAx3/nTS0M= | 0xf58d5d56aefdf755dbbf6e636b83f39af2f31977 | CONTRACT |
| 6 | TjD5PhF05YUmsgZxD2V7ZtHxH+xX2TFziydSTO438Z0= | 0x689f6606538d063d2687b0ca38b2369b1ed33e53 | CONTRACT |
| 7 | K2QVL0gZxJtelm0K+6ChSVx7Eu1EIVIaKY+ocdsHUS4= | 0xc5295c6a183f29b7c962df076819d44e0076860e | CONTRACT |
| 8 | vduYow9GAP+Ngi15XpPtR+yidOaSBA/y1b40Uhm+Tf4= | 0x6de33698e9e9b787e09d3bd7771ef63557e148bb | CONTRACT |
| 9 | mxQJCrQluMzq+x8lNrdfNDyQfXJDteM3KYc0+7vtb1w= | 0xe835b7ab7807d1ef33c9fbe1854983292040d7e1 | CONTRACT |
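
Because one project can own several artifacts, a quick follow-up is to count artifacts per project. The sketch below does this with pandas on three rows copied from the output above. `df_sample` and `artifacts_per_project` are illustrative names. With `LIMIT 10` in place the counts only reflect the returned rows, so drop the limit for real totals.

```python
import pandas as pd

# Three rows copied from the query output above.
df_sample = pd.DataFrame(
    {
        "project_id": [
            "7ozZvGAAa3yVIqj87k+BjDGoNQX2WIjYuSDjJtL/1kw=",
            "VCeNot3diK3Kt0Da9SaRDvcNI1sGdrzyjTW62JUeI0U=",
            "VCeNot3diK3Kt0Da9SaRDvcNI1sGdrzyjTW62JUeI0U=",
        ],
        "artifact_name": [
            "0x03475494dc89d378c4268e90a62876efb0278a1a",
            "0xbb6f8affdca29aa1a282e6a21b192b6513a57f9a",
            "0x67b208dbb6bdcacf196568083012d170be3e6f0c",
        ],
        "artifact_type": ["CONTRACT", "CONTRACT", "CONTRACT"],
    }
)

# Count distinct artifacts per project, largest first.
artifacts_per_project = (
    df_sample.groupby("project_id")["artifact_name"]
    .nunique()
    .sort_values(ascending=False)
    .rename("artifact_count")
)
print(artifacts_per_project)
```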


## Track Developer Activity for Arbitrum-Deployed Projects

Joining the Arbitrum project list with the timeseries metrics tables gives the monthly active GitHub developer count for each project across 2024 and 2025.

In [5]:

query = """
WITH arb_projects AS (
    SELECT 
        distinct project_id
    FROM int_artifacts_by_project_in_ossd
    WHERE 
        artifact_source = 'ARBITRUM_ONE'
        AND artifact_type IN ('CONTRACT', 'FACTORY', 'DEPLOYER')
)
SELECT  
    distinct p.display_name as Name,
    m.metric_name as Metric,
    ts.sample_date as Date,
    ts.amount as Value
FROM metrics_v0 m
JOIN timeseries_metrics_by_project_v0 ts
    on m.metric_id = ts.metric_id
JOIN projects_v1 p
    on p.project_id = ts.project_id
JOIN arb_projects a
    on p.project_id = a.project_id
WHERE 
    metric_name = 'GITHUB_active_developers_monthly'
    AND YEAR(ts.sample_date) in (2024, 2025)
"""

df_arb = client.to_pandas(query)
df_arb

|  | Name | Metric | Date | Value |
| --- | --- | --- | --- | --- |
| 0 | OpenSea | GITHUB_active_developers_monthly | 2024-12-01 | 2 |
| 1 | Metamask | GITHUB_active_developers_monthly | 2024-12-01 | 100 |
| 2 | Premia | GITHUB_active_developers_monthly | 2024-12-01 | 1 |
| 3 | Hop Protocol | GITHUB_active_developers_monthly | 2024-12-01 | 3 |
| 4 | dHedge | GITHUB_active_developers_monthly | 2024-12-01 | 2 |
| ... | ... | ... | ... | ... |
| 2093 | Solv Protocol | GITHUB_active_developers_monthly | 2024-10-01 | 1 |
| 2094 | Superfluid | GITHUB_active_developers_monthly | 2024-10-01 | 4 |
| 2095 | Arrakis Finance | GITHUB_active_developers_monthly | 2024-10-01 | 4 |
| 2096 | Tigris-Trade | GITHUB_active_developers_monthly | 2024-10-01 | 1 |
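
To view the cohort as a whole rather than project by project, the result can be rolled up per month. The sketch below uses five rows copied from the output above. `sample` and `cohort_by_month` are illustrative names. Summing active developers across projects may double-count people who contribute to more than one project, so treat the total as an upper bound.

```python
import pandas as pd

# Five rows copied from the query output above.
sample = pd.DataFrame(
    [
        ("OpenSea", "2024-12-01", 2),
        ("Metamask", "2024-12-01", 100),
        ("Premia", "2024-12-01", 1),
        ("Solv Protocol", "2024-10-01", 1),
        ("Superfluid", "2024-10-01", 4),
    ],
    columns=["Name", "Date", "Value"],
)
sample["Date"] = pd.to_datetime(sample["Date"])

# One row per month: how many projects reported, and their combined count.
cohort_by_month = (
    sample.groupby("Date")
    .agg(projects=("Name", "nunique"), total_developers=("Value", "sum"))
    .sort_index()
)
print(cohort_by_month)
```

In the notebook, replacing `sample` with `df_arb` should give the same roll-up for the full result.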


## Analyze GitHub Activity of Arbitrum Stylus Grant Program Projects

The previous query selected projects by their onchain deployments on Arbitrum One. This query instead selects projects by collection membership: it returns all GitHub-related metrics for projects in the 'arb-stylus' collection, which represents participants in the Arbitrum Stylus grant program.

In [6]:
query = """
SELECT  
    distinct p.display_name as Name,
    m.metric_name as Metric,
    ts.sample_date as Date,
    ts.amount as Value
FROM metrics_v0 m
JOIN timeseries_metrics_by_project_v0 ts
    on m.metric_id = ts.metric_id
JOIN projects_v1 p
    on p.project_id = ts.project_id
JOIN projects_by_collection_v1 pc
    on p.project_id = pc.project_id
WHERE 
    metric_name like 'GITHUB_%'
    AND YEAR(ts.sample_date) in (2024, 2025)
    AND pc.collection_name = 'arb-stylus'
"""

df_stylus = client.to_pandas(query)
df_stylus

|  | Name | Metric | Date | Value |
| --- | --- | --- | --- | --- |
| 0 | Trail of Bits Security Reviews | GITHUB_releases_daily | 2024-06-20 | 1.0 |
| 1 | Runtime Verification | GITHUB_releases_daily | 2024-06-20 | 13.0 |
| 2 | Walnut | GITHUB_comments_daily | 2024-06-20 | 56.0 |
| 3 | Open Source Observer | GITHUB_comments_daily | 2024-06-20 | 7.0 |
| 4 | Trail of Bits Security Reviews | GITHUB_repositories_daily | 2024-06-20 | 19.0 |
| ... | ... | ... | ... | ... |
| 46079 | Trail of Bits Security Reviews | GITHUB_opened_pull_requests_daily | 2024-08-13 | 6.0 |
| 46080 | Runtime Verification | GITHUB_merged_pull_requests_daily | 2024-08-13 | 8.0 |
| 46081 | Walnut | GITHUB_opened_issues_daily | 2024-08-13 | 1.0 |
| 46082 | Runtime Verification | GITHUB_closed_issues_daily | 2024-08-13 | 1.0 |
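
Because this query returns many metrics at once, it helps to see which metrics and granularities are present before picking one to plot. The sketch below assumes metric names follow a `<SOURCE>_<name>_<granularity>` pattern, which is inferred from the names in the output above rather than from documentation. The `Base` and `Granularity` columns are illustrative.

```python
import pandas as pd

# Metric names copied from the rows shown in the output above.
sample = pd.DataFrame(
    {
        "Metric": [
            "GITHUB_releases_daily",
            "GITHUB_releases_daily",
            "GITHUB_comments_daily",
            "GITHUB_comments_daily",
            "GITHUB_repositories_daily",
            "GITHUB_opened_pull_requests_daily",
            "GITHUB_merged_pull_requests_daily",
            "GITHUB_opened_issues_daily",
            "GITHUB_closed_issues_daily",
        ]
    }
)

# Split on the last underscore: everything before it is the base metric,
# the final token is the granularity (assumed naming pattern).
parts = sample["Metric"].str.rsplit("_", n=1, expand=True)
sample["Base"] = parts[0]
sample["Granularity"] = parts[1]

# Row count per (granularity, base metric) pair.
rows_per_metric = sample.groupby(["Granularity", "Base"]).size()
print(rows_per_metric)
```

Running the same steps on `df_stylus` should show whether a monthly metric such as `GITHUB_active_developers_monthly` is present before filtering on it.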


## Visualize the Results

This heatmap shows the number of monthly active developers for each project in the collection. To explore other dimensions of developer engagement, replace the metric with another measure, such as GITHUB_commits_monthly or GITHUB_merged_pull_requests_monthly.

In [10]:
# Create a heatmap of active developers by project and month
import plotly.graph_objects as go

# Filter for active developers metric
df_active_devs = df_stylus[df_stylus['Metric'] == 'GITHUB_active_developers_monthly'].copy()

# Convert date to month-year format
df_active_devs['Month'] = pd.to_datetime(df_active_devs['Date']).dt.strftime('%Y-%m')

# Pivot the data for heatmap
heatmap_data = df_active_devs.pivot(index='Name', columns='Month', values='Value')

# Fill NaN values with 0 for visualization
heatmap_data = heatmap_data.fillna(0)

# Create the heatmap
fig = go.Figure(data=go.Heatmap(
    z=heatmap_data.values,
    x=heatmap_data.columns,
    y=heatmap_data.index,
    colorscale='Viridis',
    colorbar=dict(title='Active Developers'),
    hoverongaps=False,
    text=heatmap_data.values,  # Cell labels, rendered via texttemplate
    texttemplate='%{text:.0f}',  # Format as integers
    textfont={"size": 10},
    hovertemplate='Project: %{y}<br>Month: %{x}<br>Active Developers: %{z}<extra></extra>'
))

# Update layout
fig.update_layout(
    title='Monthly Active Developers by Project',
    xaxis_title='Month',
    yaxis_title='Project',
    height=800,  # Adjust height based on number of projects
    width=1200,  # Adjust width based on number of months
    xaxis=dict(tickangle=45),
    margin=dict(l=200)  # Increase left margin for project names
)

# Show the plot
fig.show()
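
To swap in a different metric, as suggested above, the filter and pivot steps can be wrapped in a small function. This is a minimal sketch: `build_heatmap_matrix` is a hypothetical helper, and the sample rows are made up for illustration rather than taken from real query results. It uses `pivot_table` instead of `pivot` so that duplicate project and month pairs are aggregated instead of raising an error. Using `sum` as the aggregation is an assumption and may not suit every metric.

```python
import pandas as pd


def build_heatmap_matrix(df: pd.DataFrame, metric: str) -> pd.DataFrame:
    """Return a project-by-month matrix for one metric (hypothetical helper)."""
    subset = df[df["Metric"] == metric].copy()
    subset["Month"] = pd.to_datetime(subset["Date"]).dt.strftime("%Y-%m")
    # pivot_table aggregates duplicate (Name, Month) pairs; plain pivot raises.
    return subset.pivot_table(
        index="Name",
        columns="Month",
        values="Value",
        aggfunc="sum",
        fill_value=0,
    )


# Made-up rows, for illustration only.
sample = pd.DataFrame(
    [
        ("Walnut", "GITHUB_commits_monthly", "2024-06-01", 10.0),
        ("Walnut", "GITHUB_commits_monthly", "2024-07-01", 12.0),
        ("Runtime Verification", "GITHUB_commits_monthly", "2024-06-01", 40.0),
        ("Walnut", "GITHUB_active_developers_monthly", "2024-06-01", 3.0),
    ],
    columns=["Name", "Metric", "Date", "Value"],
)

matrix = build_heatmap_matrix(sample, "GITHUB_commits_monthly")
print(matrix)
```

The returned frame can be passed to `go.Heatmap` the same way `heatmap_data` is used above, via `matrix.values`, `matrix.columns`, and `matrix.index`.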