GitHub stars and forks are useful signals of a repository's popularity and adoption. A star indicates interest in or appreciation for a project, while a fork is a copy of the repository that a developer can modify, contribute back from, or build upon. Tracking both over time reveals a project's growth trajectory and how engaged its community is.

In this blog post, we will look at how to create a tool for monitoring any public GitHub repository's growth by fetching, processing, and visualizing stars and forks data over time.

## Why Track GitHub Repository Metrics?

Before diving into the implementation, let's understand why tracking these metrics is important:

1. **Project Health Assessment**: Steady growth in stars and forks often indicates a healthy, valuable open-source project.

2. **Marketing Impact**: Spikes in engagement may correlate with marketing efforts, conference talks, or major releases.

3. **Competitive Analysis**: Comparing multiple repositories in the same domain can reveal market trends and preferences.

4. **Strategic Decision Making**: Understanding growth patterns can inform decisions about feature prioritization, release timing, and community outreach, as well as whether to adopt an open-source project in the first place.

## Implementation Overview

Our implementation will follow these key steps:

1. Authenticate with the GitHub API
2. Fetch stars and forks data for specified repositories
3. Process the timeline data
4. Create interactive visualizations
5. Identify key milestones in the repository's history

Let's explore each step in detail.

## Required Libraries

First, we need to install the required libraries. We'll use:
- PyGithub: For interacting with the GitHub API
- pandas: For data manipulation
- plotly: For interactive visualizations
- tqdm: For progress bars (its notebook variant needs ipywidgets to render)

In [2]:
# Install required packages
!pip install PyGithub pandas plotly tqdm jupyter ipywidgets jupyterlab_widgets



## Setting Up GitHub Authentication

To use the GitHub API, we need to authenticate. We'll use a personal access token, which you can create in your GitHub account settings.

**Note**: Keep your token secure and never commit it to version control. For this notebook, we'll use environment variables or prompt for input.



In [5]:
import os
import getpass
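from github import Auth, Github
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from tqdm.notebook import tqdm
from datetime import datetime, timedelta

# Get token from environment variable or prompt
github_token = os.environ.get('GITHUB_TOKEN') or getpass.getpass('Enter your GitHub token: ')

# Initialize GitHub client using the token-based Auth object of recent PyGithub versions.
# per_page=100 is the largest page size the REST API accepts; PyGithub's
# default of 30 needs roughly three times as many paginated requests.
g = Github(auth=Auth.Token(github_token), per_page=100)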
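`getpass` needs an interactive prompt, so it can't help when the notebook runs unattended, for example on a schedule. A minimal sketch of a stricter variant that only reads the environment and fails with a clear message; `read_github_token` is a hypothetical helper, and `GITHUB_TOKEN` is simply the variable name this notebook already uses.

```python
import os


def read_github_token(env=None, var_name="GITHUB_TOKEN"):
    """Return the token from the environment, or raise a clear error."""
    # Accepting a mapping (defaulting to os.environ) keeps the helper testable
    env = os.environ if env is None else env
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {var_name} environment variable to a personal access token")
    return token


# Hypothetical example value, passed as a dict instead of the real environment
print(read_github_token({"GITHUB_TOKEN": "  example-token  "}))  # example-token
```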

## Fetching Repository Data

Now, let's create a function to fetch data for a specific repository. We'll need to:

1. Find the repository by name
2. Get the stargazers (people who have starred the repo)
3. Get the forks
4. Organize the data by date

In [6]:
def fetch_repo_data(repo_name):
    """
    Fetch stars and forks data for a given repository.
    
    Parameters:
    -----------
    repo_name : str
        Repository name in format "owner/repo" (e.g., "facebook/react")
        
    Returns:
    --------
    tuple
        Pandas DataFrames containing stars and forks data
    """
    print(f"Fetching data for repository: {repo_name}")
    
    try:
        # Get repository
        repo = g.get_repo(repo_name)
        
        # Get basic repository info
        print(f"Repository: {repo.full_name}")
        print(f"Description: {repo.description}")
        print(f"Stars: {repo.stargazers_count}")
        print(f"Forks: {repo.forks_count}")
        
        # Fetch stargazers together with the time each star was given.
        # This makes one API request per page, so repos with many stars take a while.
        print("\nFetching stargazers timeline...")
        stars_data = []

        try:
            stars_paginated = repo.get_stargazers_with_dates()

            # Iterate lazily so the progress bar advances as pages arrive
            for star in tqdm(stars_paginated, total=repo.stargazers_count, desc="Fetching stars"):
                stars_data.append({
                    'user': star.user.login,
                    'starred_at': star.starred_at
                })
        except Exception as star_error:
            import traceback
            traceback.print_exc()
            print(f"Error fetching stargazers: {star_error}")
            # Fall back to the plain stargazer list, which has no timestamps
            if len(stars_data) == 0:
                print("Attempting to fetch basic stargazer information...")
                current_time = datetime.now()
                for star in tqdm(repo.get_stargazers(), total=repo.stargazers_count, desc="Fetching stars"):
                    stars_data.append({
                        'user': star.login,
                        'starred_at': current_time  # Placeholder: every star lands in the current period
                    })
                print("Note: Timestamps not available, using current time as placeholder")

        # Fetch forks. Each fork is itself a repository, so its created_at
        # timestamp records when the fork was made.
        print("\nFetching forks timeline...")
        forks_data = []

        for fork in tqdm(repo.get_forks(), total=repo.forks_count, desc="Fetching forks"):
            forks_data.append({
                'user': fork.owner.login,
                'forked_at': fork.created_at
            })
        
        # Convert to DataFrames
        stars_df = pd.DataFrame(stars_data)
        forks_df = pd.DataFrame(forks_data)
        
        return stars_df, forks_df
    
    except Exception as e:
        print(f"Error fetching repository data: {e}")
        return None, None

In [7]:
name="Neoteroi/BlackSheep"
stars_df, forks_df=fetch_repo_data(name)

print(f"Stars data: {len(stars_df)} records")
print(f"Forks data: {len(forks_df)} records")
print("\nStars DataFrame Preview:")
print(stars_df.head())
print("\nForks DataFrame Preview:")
print(forks_df.head())



Fetching data for repository: Neoteroi/BlackSheep
Repository: Neoteroi/BlackSheep
Description: Fast ASGI web framework for Python
Stars: 2161
Forks: 82

Fetching stargazers timeline...




Fetching forks timeline...



Stars data: 2161 records
Forks data: 83 records

Stars DataFrame Preview:
         user                starred_at
0  headsrooms 2018-12-20 09:30:23+00:00
1       trbck 2019-01-03 20:34:14+00:00
2        1st1 2019-02-19 14:05:11+00:00
3  khasanovbi 2019-02-19 14:10:36+00:00
4  hellysmile 2019-02-19 15:58:59+00:00

Forks DataFrame Preview:
           user                 forked_at
0      cve-zh00 2025-04-29 12:58:47+00:00
1  arthurbrenno 2025-03-28 10:46:15+00:00
2      stollero 2025-01-30 07:26:48+00:00
3      mbrukman 2024-12-18 23:22:09+00:00
4    waketzheng 2024-12-14 05:41:58+00:00


## Processing Timeline Data

Once we have the raw data, we need to process it into a timeline view. Let's write a function that will:

1. Group stars and forks by day/week/month
2. Calculate cumulative counts
3. Create a unified timeline dataframe
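To make the three steps concrete, here is the same logic in plain Python for monthly buckets. It's a sketch that assumes the timestamps are `datetime` objects like those returned above; `monthly_timeline` and `next_month` are hypothetical helpers, and the pandas function below does the equivalent work with `pd.Grouper`, `pd.date_range`, and `cumsum`.

```python
from collections import Counter
from datetime import datetime, timezone
from itertools import accumulate


def next_month(year, month):
    """Return the (year, month) pair that follows the given one."""
    return (year + 1, 1) if month == 12 else (year, month + 1)


def monthly_timeline(star_times, fork_times):
    """Rows of ((year, month), new_stars, new_forks, total_stars, total_forks)."""
    # 1. Group: count events per calendar month
    stars = Counter((t.year, t.month) for t in star_times)
    forks = Counter((t.year, t.month) for t in fork_times)
    if not stars and not forks:
        return []

    # List every month between the first and last event so quiet months
    # appear with 0 (a Counter returns 0 for keys it has never seen)
    current = min([*stars, *forks])
    last = max([*stars, *forks])
    months = []
    while current <= last:
        months.append(current)
        current = next_month(*current)

    new_stars = [stars[m] for m in months]
    new_forks = [forks[m] for m in months]
    # 2. Cumulative counts, 3. one unified row per month
    return list(zip(months, new_stars, new_forks,
                    accumulate(new_stars), accumulate(new_forks)))


# Illustrative timestamps modeled on the first rows of the previews above
utc = timezone.utc
star_times = [datetime(2018, 12, 20, tzinfo=utc), datetime(2019, 1, 3, tzinfo=utc),
              datetime(2019, 2, 19, tzinfo=utc), datetime(2019, 2, 19, tzinfo=utc)]
fork_times = [datetime(2019, 2, 19, tzinfo=utc)]
for row in monthly_timeline(star_times, fork_times):
    print(row)  # first row: ((2018, 12), 1, 0, 1, 0)
```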

In [10]:
def process_timeline_data(stars_df, forks_df, freq='D'):
    """
    Process the stars and forks data to create a timeline dataset.
    
    Parameters:
    -----------
    stars_df : pandas.DataFrame
        DataFrame containing stars data
    forks_df : pandas.DataFrame
        DataFrame containing forks data
    freq : str
        Frequency for grouping ('D' for daily, 'W' for weekly, 'ME' for month end;
        use 'M' instead of 'ME' on pandas versions older than 2.2)
        
    Returns:
    --------
    pandas.DataFrame
        DataFrame with timeline data
    """
    if stars_df is None or forks_df is None or stars_df.empty or forks_df.empty:
        print("No data available to process")
        return None
    
    # Ensure datetime format (is_datetime64_any_dtype also accepts tz-aware columns)
    if not pd.api.types.is_datetime64_any_dtype(stars_df['starred_at']):
        stars_df['starred_at'] = pd.to_datetime(stars_df['starred_at'])
    
    if not pd.api.types.is_datetime64_any_dtype(forks_df['forked_at']):
        forks_df['forked_at'] = pd.to_datetime(forks_df['forked_at'])
    
    # Count new stars and forks per period
    stars_count = stars_df.groupby(pd.Grouper(key='starred_at', freq=freq)).size()
    forks_count = forks_df.groupby(pd.Grouper(key='forked_at', freq=freq)).size()
    
    # Build a complete range of periods covering both series, so that
    # periods without any activity still appear
    start_date = min(stars_count.index.min(), forks_count.index.min())
    print(f"Start date: {start_date} {stars_df['starred_at'].min()} {forks_df['forked_at'].min()}")
    end_date = max(stars_count.index.max(), forks_count.index.max())
    date_range = pd.date_range(start=start_date, end=end_date, freq=freq)
    
    print(f"Date range: {start_date} to {end_date}")

    # Create timeline dataframe
    timeline_df = pd.DataFrame(index=date_range)
    timeline_df.index.name = 'date'
    
    # Add per-period counts (aligned on the period labels)
    timeline_df['new_stars'] = stars_count
    timeline_df['new_forks'] = forks_count
    
    # Periods with no activity are NaN after alignment; treat them as 0
    timeline_df.fillna(0, inplace=True)
    
    # Calculate cumulative counts
    timeline_df['total_stars'] = timeline_df['new_stars'].cumsum()
    timeline_df['total_forks'] = timeline_df['new_forks'].cumsum()
    
    # Reset index to make date a column
    timeline_df = timeline_df.reset_index()
    
    return timeline_df

In [11]:
# Process the timeline data
# 'ME' (month end) groups data by calendar month; on pandas < 2.2 use 'M' instead
timeline_df = process_timeline_data(stars_df, forks_df, freq='ME')

# Display a preview of the processed timeline data
print("\nTimeline DataFrame Preview:")
print(timeline_df.head())
print(f"Timeline DataFrame Length: {len(timeline_df)} records")

# Show summary statistics for better understanding of the data
print("\nSummary Statistics:")
print(f"Date range: {timeline_df['date'].min()} to {timeline_df['date'].max()}")
print(f"Total stars accumulated: {timeline_df['total_stars'].max()}")
print(f"Total forks accumulated: {timeline_df['total_forks'].max()}")
print(f"Average new stars per month: {timeline_df['new_stars'].mean():.2f}")
print(f"Average new forks per month: {timeline_df['new_forks'].mean():.2f}")


Start date: 2018-12-31 00:00:00+00:00 2018-12-20 09:30:23+00:00 2019-02-19 20:11:32+00:00
Date range: 2018-12-31 00:00:00+00:00 to 2025-05-31 00:00:00+00:00

Timeline DataFrame Preview:
                       date  new_stars  new_forks  total_stars  total_forks
0 2018-12-31 00:00:00+00:00          1        0.0            1          0.0
1 2019-01-31 00:00:00+00:00          1        0.0            2          0.0
2 2019-02-28 00:00:00+00:00         23        1.0           25          1.0
3 2019-03-31 00:00:00+00:00          2        0.0           27          1.0
4 2019-04-30 00:00:00+00:00         80        0.0          107          1.0
Timeline DataFrame Length: 78 records

Summary Statistics:
Date range: 2018-12-31 00:00:00+00:00 to 2025-05-31 00:00:00+00:00
Total stars accumulated: 2161
Total forks accumulated: 83.0
Average new stars per month: 27.71
Average new forks per month: 1.06
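pandas 2.2 deprecated the month-end alias `'M'` in favor of `'ME'` and emits a `FutureWarning` when `'M'` is used, while older versions don't accept `'ME'`. If the notebook has to run on both, one option is to choose the alias at runtime. A minimal sketch; `month_end_alias` is a hypothetical helper that assumes the version string starts with `major.minor`.

```python
def month_end_alias(version):
    """Return the month-end frequency alias for a pandas version string."""
    # Only the first two components matter, e.g. "2.2.3" -> (2, 2)
    major, minor = (int(part) for part in version.split(".")[:2])
    return "ME" if (major, minor) >= (2, 2) else "M"


print(month_end_alias("1.5.3"))  # M
print(month_end_alias("2.2.0"))  # ME
```

In the notebook this would be called as `month_end_alias(pd.__version__)` and the result passed as `freq`.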




## Visualizing the Data

Now that we have processed the timeline data, let's visualize the growth patterns. We'll use Plotly, whose interactive charts (zoom, pan, and hover) are well suited to exploring time-series data:

In [None]:
def plot_timeline(timeline_df, repo_name, window=3):
    """
    Create visualizations for the repository growth timeline.
    
    Parameters:
    -----------
    timeline_df : pandas.DataFrame
        DataFrame containing timeline data
    repo_name : str
        Name of the repository
    window : int
        Number of periods (days, weeks, or months, matching the frequency the
        timeline was aggregated at) used for the rolling average
    """
    if timeline_df is None or timeline_df.empty:
        print("No data available to plot")
        return
    
    # Plot 1: Cumulative growth
    fig1 = go.Figure()
    
    fig1.add_trace(go.Scatter(
        x=timeline_df['date'],
        y=timeline_df['total_stars'],
        mode='lines',
        name='Stars',
        line=dict(color='gold', width=3)
    ))
    
    fig1.add_trace(go.Scatter(
        x=timeline_df['date'],
        y=timeline_df['total_forks'],
        mode='lines',
        name='Forks',
        line=dict(color='blue', width=3)
    ))
    
    fig1.update_layout(
        title=f'{repo_name} - Cumulative Stars and Forks Over Time',
        xaxis_title='Date',
        yaxis_title='Count',
        legend_title='Metric',
        template='plotly_white',
        height=600
    )
    
    fig1.show()
    
    # Plot 2: New stars and forks per period
    fig2 = go.Figure()
    
    fig2.add_trace(go.Bar(
        x=timeline_df['date'],
        y=timeline_df['new_stars'],
        name='New Stars',
        marker_color='gold'
    ))
    
    fig2.add_trace(go.Bar(
        x=timeline_df['date'],
        y=timeline_df['new_forks'],
        name='New Forks',
        marker_color='blue'
    ))
    
    fig2.update_layout(
        title=f'{repo_name} - New Stars and Forks Over Time',
        xaxis_title='Date',
        yaxis_title='Count',
        barmode='group',
        legend_title='Metric',
        template='plotly_white',
        height=600
    )
    
    fig2.show()
    
    # Plot 3: Growth rate as a rolling average over `window` periods.
    # Kept in local Series so the caller's DataFrame is left unchanged.
    stars_growth_rate = timeline_df['new_stars'].rolling(window=window).mean()
    forks_growth_rate = timeline_df['new_forks'].rolling(window=window).mean()
    
    fig3 = go.Figure()
    
    fig3.add_trace(go.Scatter(
        x=timeline_df['date'],
        y=stars_growth_rate,
        mode='lines',
        name=f'Stars ({window}-period avg)',
        line=dict(color='gold', width=3)
    ))
    
    fig3.add_trace(go.Scatter(
        x=timeline_df['date'],
        y=forks_growth_rate,
        mode='lines',
        name=f'Forks ({window}-period avg)',
        line=dict(color='blue', width=3)
    ))
    
    fig3.update_layout(
        title=f'{repo_name} - Growth Rate (Rolling Average)',
        xaxis_title='Date',
        yaxis_title=f'Average New Count per Period ({window}-period window)',
        legend_title='Metric',
        template='plotly_white',
        height=600
    )
    
    fig3.show()
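The third chart relies on `Series.rolling(window).mean()`. With its default settings, each value is the mean of the current period and the `window - 1` periods before it, and the first `window - 1` positions are NaN because the window isn't full yet. A plain-Python sketch of that computation, useful for sanity-checking the growth-rate line; `rolling_mean` is a hypothetical helper.

```python
import math


def rolling_mean(values, window):
    """Trailing mean over the last `window` values; NaN until the window is
    full (mirrors pandas' default min_periods=window)."""
    if window < 1:
        raise ValueError("window must be at least 1")
    out = []
    total = 0.0
    for i, value in enumerate(values):
        total += value
        if i >= window:
            # Drop the value that just slid out of the window
            total -= values[i - window]
        out.append(total / window if i >= window - 1 else math.nan)
    return out


# New stars for the first five months of the monthly timeline shown earlier
print(rolling_mean([1, 1, 23, 2, 80], 3))
```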

In [None]:


plot_timeline(timeline_df, name)

## Analyzing Key Milestones

Let's create a function to identify key milestones in a repository's growth, such as:
1. Milestone achievements (100, 1k, 10k stars, etc.)
2. Periods (days, weeks, or months, depending on the aggregation frequency) with unusually high star activity
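The activity rule used in the function below flags a period when its new-star count exceeds the mean plus three standard deviations, with a floor of 10 stars. pandas' `Series.std()` is the sample standard deviation (ddof=1), which matches `statistics.stdev`. A stdlib sketch of the same rule; `spike_indices` is a hypothetical helper.

```python
import statistics


def spike_indices(new_counts, k=3.0, floor=10):
    """Indices of periods whose count exceeds max(mean + k * sample std, floor)."""
    if len(new_counts) < 2:
        return []  # the sample standard deviation needs at least two points
    threshold = statistics.mean(new_counts) + k * statistics.stdev(new_counts)
    cutoff = max(threshold, floor)
    return [i for i, count in enumerate(new_counts) if count > cutoff]


# Twenty quiet periods followed by one burst
print(spike_indices([1] * 20 + [100]))  # [20]
```

Applied to `timeline_df['new_stars'].tolist()`, this should select the same rows as the pandas filter in `identify_milestones`. Note that on short series a single outlier inflates the standard deviation it is compared against, so the 3-sigma rule can miss spikes when there are only a handful of periods.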

In [51]:
def identify_milestones(timeline_df):
    """
    Identify key milestones and interesting points in the repository's history.
    
    Parameters:
    -----------
    timeline_df : pandas.DataFrame
        DataFrame containing timeline data
        
    Returns:
    --------
    pandas.DataFrame
        DataFrame with milestone information
    """
    if timeline_df is None or timeline_df.empty:
        return None
    
    milestones = []
    
    # 1. Find dates when the repository hit significant star counts
    significant_counts = [100, 1000, 5000, 10000, 25000, 50000, 100000]
    
    for count in significant_counts:
        milestone_row = timeline_df[timeline_df['total_stars'] >= count].iloc[0] if any(timeline_df['total_stars'] >= count) else None
        if milestone_row is not None:
            milestones.append({
                'date': milestone_row['date'],
                'type': 'achievement',
                'description': f'Reached {count:,} stars',
                'stars': milestone_row['total_stars'],
                'forks': milestone_row['total_forks']
            })
    
    # 2. Find periods with unusually high star activity
    #    (more than 3 standard deviations above the mean, and more than 10 stars)
    mean_stars = timeline_df['new_stars'].mean()
    std_stars = timeline_df['new_stars'].std()
    threshold = mean_stars + (3 * std_stars)
    
    exceptional_periods = timeline_df[timeline_df['new_stars'] > max(threshold, 10)]
    
    for _, row in exceptional_periods.iterrows():
        milestones.append({
            'date': row['date'],
            'type': 'peak_activity',
            'description': f'Exceptional growth: {int(row["new_stars"])} new stars in one period',
            'stars': row['total_stars'],
            'forks': row['total_forks']
        })
    
    # Convert to DataFrame and sort by date
    milestones_df = pd.DataFrame(milestones)
    if not milestones_df.empty:
        milestones_df = milestones_df.sort_values('date')
    
    return milestones_df
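`identify_milestones` covers star-count thresholds and one-off spikes, but not sustained accelerations, where the star rate stays elevated for several periods. One simple heuristic, offered as a sketch rather than a standard definition: compare the mean of the last `window` periods with the mean of the `window` periods before them, and flag the period when the ratio reaches `factor`. `acceleration_points`, `window`, `factor`, and `min_mean` are hypothetical names with thresholds you would tune.

```python
def acceleration_points(new_counts, window=3, factor=2.0, min_mean=1.0):
    """Indices i where the mean of the `window` periods ending at i is at
    least `factor` times the mean of the `window` periods before those."""
    hits = []
    for i in range(2 * window - 1, len(new_counts)):
        recent = new_counts[i - window + 1 : i + 1]
        previous = new_counts[i - 2 * window + 1 : i - window + 1]
        recent_mean = sum(recent) / window
        previous_mean = sum(previous) / window
        # Skip near-zero baselines, where any activity would look like a jump
        if previous_mean >= min_mean and recent_mean >= factor * previous_mean:
            hits.append(i)
    return hits


print(acceleration_points([2, 2, 2, 2, 2, 2, 8, 8, 8]))  # [6, 7, 8]
```

Because `process_timeline_data` ends with `reset_index()`, the returned positions can be mapped back to dates with `timeline_df.loc[hits, 'date']` when the input is `timeline_df['new_stars'].tolist()`.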

In [55]:
# Identify milestones
milestones_df = identify_milestones(timeline_df)
print("\nMilestones DataFrame Preview:")
print(milestones_df.head() if milestones_df is not None else "No milestones found")
# Plot milestones on the timeline

def plot_milestones(timeline_df, milestones_df, repo_name):
    """
    Plot the repository timeline with milestones highlighted.
    
    Parameters:
    -----------
    timeline_df : pandas.DataFrame
        DataFrame containing timeline data
    milestones_df : pandas.DataFrame
        DataFrame containing milestone data
    repo_name : str
        Name of the repository
    """
    if timeline_df is None or milestones_df is None or timeline_df.empty or milestones_df.empty:
        print("No data available to plot")
        return
    
    fig = go.Figure()
    
    # Add cumulative stars and forks
    fig.add_trace(go.Scatter(
        x=timeline_df['date'],
        y=timeline_df['total_stars'],
        mode='lines',
        name='Total Stars',
        line=dict(color='gold', width=3)
    ))
    
    fig.add_trace(go.Scatter(
        x=timeline_df['date'],
        y=timeline_df['total_forks'],
        mode='lines',
        name='Total Forks',
        line=dict(color='blue', width=3)
    ))
    
    # Add milestones, numbered 1, 2, ... in date order (enumerate rather than
    # the DataFrame index, which sort_values does not renumber)
    for milestone_num, (_, milestone) in enumerate(milestones_df.iterrows(), start=1):
        fig.add_trace(go.Scatter(
            x=[milestone['date']],
            y=[milestone['stars']],
            mode='markers+text',
            name=f"M{milestone_num}: {milestone['description']}",
            marker=dict(size=12, color='red', symbol='star'),
            text=[f"M{milestone_num}"],
            textposition='top center',
            hoverinfo='text',
            hovertext=f"Milestone {milestone_num}: {milestone['description']}<br>Date: {milestone['date'].strftime('%Y-%m-%d')}<br>Stars: {milestone['stars']}"
        ))
    
    fig.update_layout(
        title=f'{repo_name} - Repository Growth with Milestones',
        xaxis_title='Date',
        yaxis_title='Count',
        legend_title='Metrics & Milestones',
        template='plotly_white',
        height=600,
        # Position legend to the right of the plot
        legend=dict(
            orientation="v",
            yanchor="top",
            y=1.0,
            xanchor="left",
            x=1.01,
            bordercolor="Black",
            borderwidth=1
        ),
        # Add margin to make room for the legend
        margin=dict(r=250)  # Increased right margin
    )
    
    fig.show()

# Plot the milestones on the timeline
plot_milestones(timeline_df, milestones_df, name)



Milestones DataFrame Preview:
                       date           type  \
0 2019-04-30 00:00:00+00:00    achievement   
1 2022-10-31 00:00:00+00:00    achievement   
2 2023-10-31 00:00:00+00:00  peak_activity   
3 2024-12-31 00:00:00+00:00  peak_activity   

                                    description  stars  forks  
0                             Reached 100 stars    107    1.0  
1                           Reached 1,000 stars   1009   40.0  
2   Exceptional growth: 97 new stars in one day   1443   60.0  
3  Exceptional growth: 147 new stars in one day   2015   80.0  


## Putting It All Together

Now let's build a complete workflow to analyze any GitHub repository's stars and forks timeline.

In [53]:
def analyze_github_repository(repo_name, time_freq='W'):
    """
    Complete analysis of a GitHub repository's stars and forks timeline.
    
    Parameters:
    -----------
    repo_name : str
        Repository name in format "owner/repo" (e.g., "facebook/react")
    time_freq : str
        Time frequency for aggregation ('D' for daily, 'W' for weekly, 'ME' for month end;
        use 'M' on pandas < 2.2)
    """
    print(f"Starting analysis for {repo_name}")
    print("-" * 50)
    
    # 1. Fetch data
    stars_df, forks_df = fetch_repo_data(repo_name)
    
    if stars_df is None or forks_df is None:
        print("Failed to fetch repository data. Please check the repository name and your API access.")
        # Return a pair so callers can always unpack the result
        return None, None
    
    # 2. Process timeline data
    print("\nProcessing timeline data...")
    timeline_df = process_timeline_data(stars_df, forks_df, freq=time_freq)
    
    if timeline_df is None:
        print("Not enough data to build a timeline (no stars or no forks).")
        return None, None
    
    # 3. Identify key milestones
    print("\nIdentifying key milestones...")
    milestones_df = identify_milestones(timeline_df)
    
    # 4. Display results
    print("\n=== ANALYSIS RESULTS ===")
    print(f"Repository: {repo_name}")
    print(f"Total data points: {len(timeline_df)} time periods")
    print(f"First activity date: {timeline_df['date'].min()}")
    print(f"Latest activity date: {timeline_df['date'].max()}")
    print(f"Current stars: {timeline_df['total_stars'].max():,}")
    print(f"Current forks: {timeline_df['total_forks'].max():,}")
    
    # 5. Display milestones
    if milestones_df is not None and not milestones_df.empty:
        print("\n=== KEY MILESTONES ===")
        for _, milestone in milestones_df.iterrows():
            print(f"{milestone['date'].strftime('%Y-%m-%d')} - {milestone['description']}")
    
    # 6. Visualize the data
    print("\nGenerating visualizations...")
    plot_timeline(timeline_df, repo_name)
    plot_milestones(timeline_df, milestones_df, repo_name)
    
    return timeline_df, milestones_df

## Example: Analyzing a Popular Repository

Let's run the complete workflow on the BlackSheep repository we explored above, this time aggregated by month.

In [11]:
# Analyze the BlackSheep repository with monthly aggregation
repo_name = "Neoteroi/BlackSheep"
timeline_df, milestones_df = analyze_github_repository(repo_name, time_freq='ME')

Starting analysis for Neoteroi/BlackSheep
--------------------------------------------------
Fetching data for repository: Neoteroi/BlackSheep
Repository: Neoteroi/BlackSheep
Description: Fast ASGI web framework for Python
Stars: 2160
Forks: 82

Fetching stargazers timeline...


Processing timeline data...

Identifying key milestones...

=== ANALYSIS RESULTS ===
Repository: Neoteroi/BlackSheep
Total data points: 77 time periods
First activity date: 2018-12-31 09:30:23+00:00
Latest activity date: 2025-04-30 09:30:23+00:00
Current stars: 0.0
Current forks: 0.0

Generating visualizations...




## Try Another Repository

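You can analyze any other public GitHub repository by changing the `repo_name` variable. Keep in mind that fetching every stargazer of a very popular repository takes thousands of paginated API requests, so it can run for a long time and may use up your hourly API rate limit. Here's an example with a much larger repository: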

In [None]:
# Let's analyze another popular repository
repo_name = "tensorflow/tensorflow"
timeline_df, milestones_df = analyze_github_repository(repo_name, time_freq='W')

## Comparing Multiple Repositories

We can extend our analysis to compare multiple repositories side by side. This can be useful for competitive analysis or understanding ecosystem trends.
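Repositories created years apart are hard to compare on a calendar axis. One option is to plot each repository against its age instead, measured in months since its first recorded period. A minimal sketch of the month arithmetic; `months_since_start` is a hypothetical helper that assumes monthly data.

```python
from datetime import date


def months_since_start(dates):
    """Offsets in whole calendar months from the earliest date in `dates`."""
    first = min(dates)
    return [(d.year - first.year) * 12 + (d.month - first.month) for d in dates]


# Month-end dates like the ones in the monthly timeline above
print(months_since_start([date(2018, 12, 31), date(2019, 1, 31), date(2019, 4, 30)]))  # [0, 1, 4]
```

With a monthly `timeline_df`, the same arithmetic can be applied to `df['date'].dt.year` and `df['date'].dt.month` and the result used as the x-axis in place of `date`.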

In [None]:
def compare_repositories(repo_list, time_freq='ME'):
    """
    Compare multiple GitHub repositories.
    
    Parameters:
    -----------
    repo_list : list
        List of repository names to compare
    time_freq : str
        Time frequency for aggregation ('D' for daily, 'W' for weekly, 'ME' for month end;
        use 'M' on pandas < 2.2)
    """
    all_data = {}
    
    # Collect data for each repository
    for repo_name in repo_list:
        stars_df, forks_df = fetch_repo_data(repo_name)
        if stars_df is None or forks_df is None:
            print(f"Skipping {repo_name} due to data retrieval issues")
            continue
            
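        timeline_df = process_timeline_data(stars_df, forks_df, freq=time_freq)
        if timeline_df is None:
            print(f"Skipping {repo_name}: not enough data to build a timeline")
            continue
        all_data[repo_name] = timeline_df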
    
    # Create comparison visualizations
    # 1. Stars comparison
    fig1 = go.Figure()
    
    for repo_name, df in all_data.items():
        fig1.add_trace(go.Scatter(
            x=df['date'],
            y=df['total_stars'],
            mode='lines',
            name=repo_name,
            line=dict(width=3)
        ))
    
    fig1.update_layout(
        title='Comparison of Stars Over Time',
        xaxis_title='Date',
        yaxis_title='Total Stars',
        legend_title='Repository',
        template='plotly_white',
        height=600
    )
    
    fig1.show()
    
    # 2. Stars-to-forks ratio
    fig2 = go.Figure()
    
    for repo_name, df in all_data.items():
        # Create a copy to avoid warning, then calculate ratio
        df_copy = df.copy()
        df_copy['stars_to_forks_ratio'] = df_copy['total_stars'] / df_copy['total_forks'].replace(0, 1)
        
        fig2.add_trace(go.Scatter(
            x=df_copy['date'],
            y=df_copy['stars_to_forks_ratio'],
            mode='lines',
            name=repo_name,
            line=dict(width=3)
        ))
    
    fig2.update_layout(
        title='Stars to Forks Ratio Over Time',
        xaxis_title='Date',
        yaxis_title='Stars/Forks Ratio',
        legend_title='Repository',
        template='plotly_white',
        height=600
    )
    
    fig2.show()
    
    return all_data

In [None]:
# Compare multiple machine learning frameworks
repos_to_compare = [
    "tensorflow/tensorflow",
    "pytorch/pytorch",
    "scikit-learn/scikit-learn"
]

comparison_data = compare_repositories(repos_to_compare, time_freq='ME')

## Conclusion

In this notebook, we've built a comprehensive tool for analyzing the growth of GitHub repositories through their stars and forks history. This can be valuable for:

- Open-source project maintainers to understand their project's growth
- Researchers studying the adoption of technologies
- Developers choosing between competing libraries
- Marketing teams measuring the impact of promotional activities

The techniques we've used - API integration, data processing, and interactive visualization - can be extended to analyze other aspects of GitHub repositories such as contributors, issues, and commits.

Feel free to adapt this notebook for your own GitHub repository analysis needs!