## Step 1: Load Articles into a DataFrame

In this step, we will load all published articles associated with our Dev.to account into a Pandas DataFrame. This data will allow us to examine the article titles and publication dates.

The `load_articles_to_dataframe` function fetches articles from the Dev.to API, handles pagination, and organizes the results in a DataFrame for easy analysis.
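The implementation of `load_articles_to_dataframe` is not shown in this notebook, so the following is only a minimal sketch of how a pagination loop of this kind might look. The helper name `fetch_all_pages` and the `fetch_page(page, per_page)` callable are hypothetical stand-ins for whatever request logic `src/api_client.py` actually uses.

```python
from typing import Callable, Dict, List


def fetch_all_pages(
    fetch_page: Callable[[int, int], List[Dict]],
    per_page: int = 100,
    max_pages: int = 1000,
) -> List[Dict]:
    """Collect records from a paginated source until it runs out of data.

    `fetch_page` is a hypothetical callable (an assumption, not part of the
    notebook) that returns the list of records for a 1-based page number.
    """
    records: List[Dict] = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page, per_page)
        if not batch:
            # An empty page means there is nothing left to fetch.
            break
        records.extend(batch)
        if len(batch) < per_page:
            # A short page is the last page, so skip the extra request.
            break
    return records
```

The resulting list of dictionaries could then be passed to `pd.DataFrame(records)` to obtain a DataFrame with one row per article.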


In [None]:
import os
import src.api_client as api_client
import pandas as pd

# Check if the articles file exists
if os.path.exists("articles.parquet"):
    # Load articles from the existing Parquet file
    articles_df = pd.read_parquet("articles.parquet")
    print("Loaded articles from articles.parquet")
else:
    # Fetch articles and save to Parquet file
    articles_df = api_client.load_articles_to_dataframe()
    articles_df.to_parquet("articles.parquet", compression='gzip')
    print("Fetched articles from API and saved to articles.parquet")

articles_df.head(10)
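The load-or-fetch pattern in the cell above recurs in every step of this notebook. As a sketch only, it could be factored into a small helper. The name `load_or_fetch` and its `read`/`write` parameters are assumptions introduced here for illustration; they are not part of `src.api_client`.

```python
import os
from typing import Any, Callable


def load_or_fetch(
    path: str,
    fetch: Callable[[], Any],
    read: Callable[[str], Any],
    write: Callable[[Any, str], None],
) -> Any:
    """Return cached data from `path` if it exists; otherwise fetch and cache it."""
    if os.path.exists(path):
        # Cache hit: skip the slow fetch entirely.
        return read(path)
    data = fetch()
    # Persist the result so that later runs can reuse it.
    write(data, path)
    return data


# Hypothetical usage with the objects from this notebook:
# articles_df = load_or_fetch(
#     "articles.parquet",
#     api_client.load_articles_to_dataframe,
#     pd.read_parquet,
#     lambda df, p: df.to_parquet(p, compression="gzip"),
# )
```

Passing the reader and writer as arguments keeps the helper independent of the storage format. Note that the cache is never invalidated: delete the Parquet file to force a fresh fetch.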

## Step 2: Load Followers into a DataFrame

Next, we’ll load the details of all followers into a Pandas DataFrame. This data provides insights into each follower’s profile, which we can later analyze to assess follower engagement and activity levels.

The `load_extended_followers_to_dataframe` function collects data on each follower, including profile information.


In [None]:
# Check if the followers file exists
if os.path.exists("followers.parquet"):
    # Load followers from the existing Parquet file
    followers_df = pd.read_parquet("followers.parquet")
    print("Loaded followers from followers.parquet")
else:
    # Fetch followers and save to Parquet file
    followers_df = api_client.load_extended_followers_to_dataframe()
    followers_df.to_parquet("followers.parquet", compression='gzip')
    print("Fetched followers from API and saved to followers.parquet")

followers_df.head(10)

In [None]:
print(f"You have {len(articles_df)} articles and {len(followers_df)} followers.")

## Step 3: Enrich Followers Data with Article Information

In this step, we expand our analysis by adding article information for each follower. Using the `update_followers_with_articles` function, we retrieve:

- The total number of articles each follower has published (`article_count`).
- A list of article titles for each follower (`article_titles`).
- A unique list of tags across all articles by each follower (`unique_tags`).

This enriched data will allow us to better understand follower engagement and interests based on their content.
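To make the three derived fields concrete, here is a minimal sketch of how they could be computed for a single follower. It assumes each article is a dictionary with a `title` string and a `tags` list of strings. That shape, and the helper name `summarize_articles`, are assumptions for illustration; the internals of `update_followers_with_articles` are not shown in this notebook.

```python
from typing import Any, Dict, List


def summarize_articles(articles: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build the per-follower summary fields from a list of article records.

    Assumes (hypothetically) that each record has a "title" string and a
    "tags" list of strings; missing keys are treated as empty.
    """
    titles = [article.get("title", "") for article in articles]
    # A set removes duplicate tags; sorting makes the output deterministic.
    unique_tags = sorted(
        {tag for article in articles for tag in article.get("tags", [])}
    )
    return {
        "article_count": len(articles),
        "article_titles": titles,
        "unique_tags": unique_tags,
    }
```

A follower with no published articles gets a count of zero and two empty lists, which keeps the columns consistently typed across rows.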

In [None]:
if os.path.exists("extended_followers.parquet"):
    # Load followers from the existing Parquet file
    extended_followers_df = pd.read_parquet("extended_followers.parquet")
    print("Loaded followers from extended_followers.parquet")
else:
    # Enrich followers with article information
    extended_followers_df = api_client.update_followers_with_articles(followers_df)
    extended_followers_df.to_parquet("extended_followers.parquet", compression='gzip')
    print("Fetched followers from API and saved to extended_followers.parquet")

extended_followers_df.head(10)

## Step 4: Enrich Followers Data with Profile Data from Dev.to

In this step, we enhance our follower dataset by including additional profile data scraped from Dev.to:

- **Badges**: List of badges each follower has earned.
- **Comments Count**: Total number of comments written by each follower.
- **Tags Followed**: Number of tags each follower is following.

In [None]:
# Check if the scraped followers file exists
if os.path.exists("extended_scrapped_followers.parquet"):
    # Load followers from the existing Parquet file
    extended_scrapped_followers_df = pd.read_parquet("extended_scrapped_followers.parquet")
    print("Loaded followers from extended_scrapped_followers.parquet")
else:
    # Enrich followers with scraped profile data and save to Parquet file
    extended_scrapped_followers_df = api_client.update_followers_with_stats(extended_followers_df)
    extended_scrapped_followers_df.to_parquet("extended_scrapped_followers.parquet", compression='gzip')
    print("Scraped follower profile data and saved to extended_scrapped_followers.parquet")

extended_scrapped_followers_df.head(10)