<div style="text-align:center;">
    <h1 style="font-family:Georgia; color:#FFB822; font-size:3em; margin-bottom:0px;">
        Beyond Leaderboards
    </h1>
    <h2 style="font-family:Georgia; color:#20BEFF; font-size:1.5em; margin-top:0px;">
        The People Powering Kaggle’s AI Breakthroughs
    </h2>
    <p style="text-align:center;", style="font-family:'Segoe UI', sans-serif; font-style:italic; font-size:1.1em; margin-top:15px;">
        A data-driven tribute to shared kernels, forum mentors, and the generosity that accelerates every competition.
    </p>
</div>

<div style="text-align:center; margin-top:20px; margin-bottom:25px;">
    <img src="https://i.pinimg.com/736x/8b/88/83/8b8883cbd18f3c0870c75b5cf6b782bd.jpg" 
         alt="A network visualization representing community connections and data flows" 
         style="width:80%; max-width:700px; border-radius:10px; box-shadow: 0px 0px 15px rgba(0, 0, 0, 0.1);">
</div>

<h2 style="font-family:Georgia; color:#20BEFF; border-bottom:2px solid #20BEFF; padding-bottom:5px;">
    1. The Soul of the Machine
</h2>

<p style="font-family:'Segoe UI', sans-serif; font-size:1.1em; line-height:1.7;">
Fifteen years. A journey that started as a platform and evolved into a civilization. Fifteen years of breakthroughs, of late-night submissions, of the quiet satisfaction of a model that finally converges. In machine learning, we often celebrate the final algorithm, the winning score. But the real story, the one that has truly pushed the boundaries of AI, isn't written in the final line of code. It's written in the <strong style="color:#FFB822;">million moments of collaboration that came before.</strong>
</p>

<p style="font-family:'Segoe UI', sans-serif; font-size:1.1em; line-height:1.7;">
Think about your own journey here. Remember the first competition that felt impossible? The frustrating error message you couldn't solve, the feature that made no sense? And then, remember the breakthrough. It might have come from a stranger's post in a discussion thread at 2 AM. Or from a public notebook, shared generously by a competitor, that revealed a completely new way of seeing the problem. That moment—<strong style="color:#FFB822;">that spark of shared insight—is the heartbeat of Kaggle.</strong>
</p>

<div style="border-left: 4px solid #FFB822; padding-left: 15px; font-style: italic; font-family: 'Segoe UI'; font-size:1.2em; margin-top: 30px; margin-bottom: 30px;">
    This notebook is our tribute to that spirit. It is a data-driven thank you note to the entire Kaggle community for fifteen years of collective effort. We will quantify this immense scale, witness the "spark" of a single idea setting a competition ablaze, and build a Hall of Fame for the unsung heroes of this platform.
</div>

<p style="font-family:'Segoe UI', sans-serif; font-size:1.1em; line-height:1.7;">
Our journey through the Meta Kaggle data will unfold in a clear sequence of evidence. We will:
<ul style="font-family:'Segoe UI', sans-serif; font-size:1.1em; line-height:1.7;">
    <li style="margin-bottom:10px;">First, grasp the <strong style="color:#FFB822;">immense scale</strong> of this collaboration by quantifying millions of posts and notebooks.</li>
    <li style="margin-bottom:10px;">Then, provide visual proof of the <strong style="color:#FFB822;">"Spark and Ripple"</strong> effect, showing how a single shared notebook can elevate an entire competition.</li>
    <li style="margin-bottom:10px;">Next, we hand the tools to you with our <strong style="color:#FFB822;">Interactive Community Explorer</strong>, allowing you to uncover these dynamics in any competition.</li>
    <li style="margin-bottom:10px;">Finally, we will put names to the data and celebrate the individuals who make it all happen with our <strong style="color:#FFB822;">Community Catalyst Hall of Fame.</strong></li>
</ul>

<p style="font-family:'Segoe UI', sans-serif; font-size:1.1em; line-height:1.7;">
This is not just a story of success. It's a story of shared struggle, of thousands of failed experiments that pave the way for one winning solution. It's a testament to the idea that the fastest way to advance is <strong style="color:#FFB822;">not to race alone, but to learn together.</strong> Join us as we go beyond the leaderboards to explore the true engine of progress in AI.
</p>

<h2 style="font-family:Georgia; color:#20BEFF; border-bottom:2px solid #20BEFF; padding-bottom:5px;">
    The Story in 2 Minutes
</h2>

<p style="font-family:'Segoe UI', sans-serif; font-size:1.1em; line-height:1.7;">
We believe the best way to understand the spirit of this project is to see and feel it. The video below is a tribute that serves as the heart of our introduction. It captures the "why" behind our entire analysis. Please watch it before proceeding.
</p>




In [2]:
from IPython.display import YouTubeVideo

# The video ID is the string of characters after "v=" in the YouTube URL.
# URL: https://youtu.be/FTVAkXkxBPk
YouTubeVideo('FTVAkXkxBPk', width=800, height=450)

<div style="border-left: 4px solid #FFB822; padding-left: 15px; font-style: italic; font-family: 'Segoe UI'; font-size:1.1em; margin-top: 25px; margin-bottom: 25px;">
    In case of any issue viewing the video, this project is a data-driven tribute to the Kaggle community. It uses visual evidence and interactive tools to prove that collaboration, not just competition, is the true engine that drives progress in the AI community.
</div>

<hr style="border: 1px solid #e0e0e0; margin-top: 40px; margin-bottom: 40px;">

### Load impotant libraries and dataframes

In [3]:
!pip install pyvis

Collecting pyvis
  Downloading pyvis-0.3.2-py3-none-any.whl.metadata (1.7 kB)
Downloading pyvis-0.3.2-py3-none-any.whl (756 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m756.0/756.0 kB[0m [31m19.7 MB/s[0m eta [36m0:00:00[0m00:01[0m
[?25hInstalling collected packages: pyvis
Successfully installed pyvis-0.3.2


In [4]:
# ──────────────────────────────────────────────
#  Essential Libraries
#  (Handy toolkits for data wrangling & visuals)
# ──────────────────────────────────────────────
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import networkx as nx
import warnings                         # Standard warnings control

# ──────────────────────────────────────────────
#  Notebook‑Wide Styling & Housekeeping
#  (Consistent look & feel + silence noisy logs)
# ──────────────────────────────────────────────
sns.set(style="whitegrid")
plt.rcParams['figure.figsize'] = (12, 7)

# Ignore all warnings throughout the notebook
warnings.filterwarnings("ignore")

print("Libraries loaded successfully!")


Libraries loaded successfully!


In [5]:
# ───────────────────────────────────────────────────────────────
#  Load core Meta‑Kaggle tables (minimal columns where possible)
# ───────────────────────────────────────────────────────────────
# Path for the input directory (Kaggle standard path)
input_path = "/kaggle/input/meta-kaggle/"
# Load Kernels data
kernels = pd.read_csv(input_path + 'Kernels.csv', parse_dates=['CreationDate','MadePublicDate'])
kernel_versions = pd.read_csv(input_path + 'KernelVersions.csv', parse_dates=['CreationDate','EvaluationDate'])
print("1. Kernels dataset Loaded 💐")

# Load Competitions data
competitions = pd.read_csv(input_path + 'Competitions.csv', parse_dates=['EnabledDate','DeadlineDate'])
print("2. competitions dataset Loaded 💐")

# Load Forum data
forum_messages = pd.read_csv(input_path + 'ForumMessages.csv', parse_dates=['PostDate'])
forum_topics = pd.read_csv(input_path + 'ForumTopics.csv', parse_dates=['CreationDate', 'LastCommentDate'])
print("3. Forum dataset Loaded 💐")

# Load Users data
users = pd.read_csv(input_path + 'Users.csv', parse_dates=['RegisterDate'])
print("4. users dataset Loaded 💐")

# Load Submissions and Teams data
submissions = pd.read_csv(input_path + 'Submissions.csv', parse_dates=['SubmissionDate'])
teams = pd.read_csv(input_path + 'Teams.csv', parse_dates=['LastSubmissionDate'])
print("5. submissions dataset Loaded 💐")

print("✅ All datasets loaded successfully.")


1. Kernels dataset Loaded 💐
2. competitions dataset Loaded 💐
3. Forum dataset Loaded 💐
4. users dataset Loaded 💐
5. submissions dataset Loaded 💐
✅ All datasets loaded successfully.


<hr style="border: 1px solid #e0e0e0; margin-top: 40px; margin-bottom: 40px;">

<p style="font-family:'Segoe UI', sans-serif; color:#00A99D; font-size:1.1em; line-height:1.7; text-align:center;">
<em>Our story begins with a simple question: just how large is this global, digital civilization of learners? To truly appreciate the community's impact, we must first understand its immense scale. The data reveals a story of staggering proportions...</em>
</p>

<h2 style="font-family:Georgia; color:#20BEFF; border-bottom:2px solid #20BEFF; padding-bottom:5px;">
    2. A Global Town Square, A Digital Library
</h2>

<p style="font-family:'Segoe UI', sans-serif; font-size:1.1em; line-height:1.7;">
Every community needs a place to gather and a library to store its knowledge. On Kaggle, these two pillars have grown to a scale that is difficult to comprehend. They are the foundation upon which every competition is built, operating 24/7 as a testament to the global nature of this collective effort.
</p>

<h3 style="font-family:Georgia; color:#FFB822; margin-top:30px;">The Digital Town Square: The Forums</h3>
<p style="font-family:'Segoe UI', sans-serif; font-size:1.1em; line-height:1.7;">
If you could listen to the heartbeat of Kaggle, it would sound like the forums. They are a constant, unending stream of questions, hypotheses, breakthroughs, and encouragement. Each message is a neuron firing in a global brain, a single moment of connection that, when multiplied by millions, drives the entire community's understanding forward.
</p>

<!-- The Stat Cards for the Forums go here, using the code we already perfected -->
<!-- Stat Card for Forum Posts -->
<div style="background-color:#F2F2F2; border-left: 5px solid #20BEFF; padding: 20px; margin: 20px 0; border-radius:5px;">
    <p style="font-family:'Segoe UI'; font-size:1.2em; margin:0; color:#555555;">📝 Total Forum Messages</p>
    <p style="font-family:'Georgia'; font-size:2.5em; font-weight:bold; margin:0; color:#000000;">
        2.7 Million+
    </p>
</div>

<!-- Stat Card for Forum Upvotes -->
<div style="background-color:#F2F2F2; border-left: 5px solid #20BEFF; padding: 20px; margin: 20px 0; border-radius:5px;">
    <p style="font-family:'Segoe UI'; font-size:1.2em; margin:0; color:#555555;">👍 Total Forum Message Upvotes</p>
    <p style="font-family:'Georgia'; font-size:2.5em; font-weight:bold; margin:0; color:#000000;">
        3.3 Million+
    </p>
</div>
<!-- No changes needed to the stat card HTML itself -->


<h3 style="font-family:Georgia; color:#FFB822; margin-top:30px;">The Public Library: The Notebooks</h3>
<p style="font-family:'Segoe UI', sans-serif; font-size:1.1em; line-height:1.7;">
If the forums are the conversation, public notebooks are the definitive texts. They are the largest, most vibrant, open-source repository of applied machine learning code in the world. Each notebook is more than just code; it is a lesson, a tutorial, a building block, and a potential starting point for thousands of other journeys. An upvote is a quiet 'thank you'; <strong style="color:#FFB822;">a fork is the ultimate act of intellectual inheritance.</strong>
</p>

<!-- The Stat Cards for the Notebooks go here, using the code we already perfected -->
<!-- Stat Card for Public Kernels -->
<div style="background-color:#F2F2F2; border-left: 5px solid #FFB822; padding: 20px; margin: 20px 0; border-radius:5px;">
    <p style="font-family:'Segoe UI'; font-size:1.2em; margin:0; color:#555555;">📚 Total Public Notebooks</p>
    <p style="font-family:'Georgia'; font-size:2.5em; font-weight:bold; margin:0; color:#000000;">
        1.5 Million+
    </p>
</div>

<!-- Stat Card for Kernel Forks -->
<div style="background-color:#F2F2F2; border-left: 5px solid #FFB822; padding: 20px; margin: 20px 0; border-radius:5px;">
    <p style="font-family:'Segoe UI'; font-size:1.2em; margin:0; color:#555555;">🍴 Total Forks of Public Notebooks</p>
    <p style="font-family:'Georgia'; font-size:2.5em; font-weight:bold; margin:0; color:#000000;">
        320,000+
    </p>
</div>
<!-- Stat Card for Kernel Upvotes -->
<div style="background-color:#F2F2F2; border-left: 5px solid #FFB822; padding: 20px; margin: 20px 0; border-radius:5px;">
    <p style="font-family:'Segoe UI'; font-size:1.2em; margin:0; color:#555555;">👍 Total Upvotes on Notebooks</p>
    <p style="font-family:'Georgia'; font-size:2.5em; font-weight:bold; margin:0; color:#000000;">
        5.8 Million+
    </p>
</div>
<!-- No changes needed to the stat card HTML itself -->

<p style="font-family:'Segoe UI', sans-serif; font-size:1.1em; line-height:1.7; margin-top:30px;">
These are not just metrics. They are the echoes of millions of hours of human effort, freely given, to build a shared repository of knowledge for the benefit of all. This is the bedrock upon which our story of progress is built.
</p>

### Code

In [6]:
# ───────────────────────────────────────────────────────────────
#  COMMUNITY‑SCALE METRICS  (Digital Town Square & Public Library)
#  Uses columns confirmed in your schema preview
# ───────────────────────────────────────────────────────────────

# 1) Digital Town Square  – forum activity
total_forum_posts   = forum_messages.shape[0]               # Id column exists
total_forum_upvotes = int(forum_messages["Medal"].sum())    # 'Score' present

# 2) Public Library – kernel sharing
public_kernels = kernels[kernels["MadePublicDate"].notna()]          # MadePublicDate present

total_public_kernels  = public_kernels.shape[0]
total_kernel_versions = kernel_versions.shape[0]                     # not printed but retained
total_forks           = public_kernels["ForkParentKernelVersionId"].notna().sum()
total_kernel_upvotes  = int(kernel_versions["TotalVotes"].sum())

# ── Nicely formatted output ─────────────────────────────────────
print("— The Digital Town Square —")
print(f"📝 Total Forum Posts:       {total_forum_posts:,}")
print(f"👍 Total Forum Upvotes:     {total_forum_upvotes:,}")

print("\n— The Public Library —")
print(f"📚 Public Kernels:          {total_public_kernels:,}")
print(f"🍴 Forks of Public Kernels: {total_forks:,}")
print(f"👍 Kernel Version Upvotes:  {total_kernel_upvotes:,}")


— The Digital Town Square —
📝 Total Forum Posts:       2,724,233
👍 Total Forum Upvotes:     3,303,527

— The Public Library —
📚 Public Kernels:          1,505,088
🍴 Forks of Public Kernels: 321,119
👍 Kernel Version Upvotes:  5,867,118


<hr style="border: 1px solid #e0e0e0; margin-top: 40px; margin-bottom: 40px;">

<p style="font-family:'Segoe UI', sans-serif; color:#00A99D; font-size:1.1em; line-height:1.7; text-align:center;">
<em>This immense library of knowledge didn't write itself. It was built, piece by piece, by individuals. But is this spirit of sharing just for newcomers, or is it a core part of the culture, led by the platform's most experienced members? To find out, we followed the data...</em>
</p>

<h2 style="font-family:Georgia; color:#20BEFF; border-bottom:2px solid #20BEFF; padding-bottom:5px;">
    3. The Grandmaster's Habit: A Culture of Sharing
</h2>

<p style="font-family:'Segoe UI', sans-serif; font-size:1.1em; line-height:1.7;">
On many competitive platforms, reaching the top tier can lead to exclusivity. Knowledge becomes a closely guarded secret, a competitive advantage. But is that the culture on Kaggle? We decided to test this with a simple, data-driven question: <strong style="color:#FFB822;">when the most accomplished members—the Notebooks Grandmasters—create new work, what is their default behavior?</strong>
</p>

<p style="font-family:'Segoe UI', sans-serif; font-size:1.1em; line-height:1.7;">
We analyzed every notebook created by these elite members over the last five years to see what proportion they chose to make public versus keeping private. The result is not just a statistic; it is a powerful insight into the ethos of the entire community.
</p>

<!-- The Donut Chart from our Python code will be displayed after this cell -->

<div style="border-left: 4px solid #FFB822; padding-left: 15px; font-style: italic; font-family: 'Segoe UI'; font-size:1.2em; margin-top: 30px; margin-bottom: 20px;">
    The data is unequivocal: for a Kaggle Grandmaster, sharing is not an occasional act of charity. It is a fundamental habit. The overwhelming majority of their work is shared openly with the community.
</div>

<p style="font-family:'Segoe UI', sans-serif; font-size:1.1em; line-height:1.7;">
This single chart reveals a profound truth about the platform. On Kaggle, true leadership and mentorship are intertwined. The most respected members earn their status not just by winning, but by teaching. Their public work forms a living curriculum for the entire community, transforming personal expertise into a public good and setting the standard for everyone else to follow.
</p>

In [7]:
import pandas as pd
import plotly.graph_objects as go
from IPython.display import HTML

# ──────────────────────────────────────────────
#  Compute your Grandmaster stats (same as before)
# ──────────────────────────────────────────────
kernels_with_users = pd.merge(
    kernels, users,
    left_on="AuthorUserId", right_on="Id",
    suffixes=("_kernel", "_user"),
)
gm = kernels_with_users[kernels_with_users["PerformanceTier"] == 4]
public_count  = gm["MadePublicDate"].notna().sum()
private_count = gm["MadePublicDate"].isna().sum()

# ──────────────────────────────────────────────
#  Build the donut chart
# ──────────────────────────────────────────────
fig = go.Figure(
    go.Pie(
        labels=["Made Public", "Kept Private"],
        values=[public_count, private_count],
        hole=0.4,
        marker_colors=["#FFB822", "#20BEFF"],
        textinfo="percent+label",
        insidetextorientation="radial",
    )
)
fig.update_layout(
    title_text="<b>Grandmaster Notebooks: Public vs. Private</b><br>A Culture of Sharing",
    title_x=0.5,
    font_family="Georgia",
    legend_title_text="Status",
    uniformtext_minsize=12,
    uniformtext_mode="hide",
)

# ──────────────────────────────────────────────
#  Instead of fig.show(), embed its full HTML+JS:
# ──────────────────────────────────────────────
html_str = fig.to_html(include_plotlyjs="cdn")
display(HTML(html_str))


<p style="font-family:'Segoe UI', sans-serif; font-size:1.1em; line-height:1.7; margin-top:25px;">
And this phenomenal rate of public contribution is only what the Meta Kaggle dataset can track. The true impact of these community leaders extends far beyond the platform itself.
</p>

<div style="border-left: 4px solid #FFB822; padding-left: 15px; font-style: italic; font-family: 'Segoe UI'; font-size:1.2em; margin-top: 20px; margin-bottom: 20px;">
    Many Grandmasters and top competitors take it a step further, becoming true educators for the entire AI community. They host YouTube channels with detailed solution walkthroughs, write in-depth blog posts dissecting their winning strategies, and participate in podcasts and interviews to share their invaluable experience.
</div>

<p style="font-family:'Segoe UI', sans-serif; font-size:1.1em; line-height:1.7;">
They are not just sharing code; they are sharing a mindset. This relentless dedication to open knowledge-sharing sets the stage for incredible moments of collective breakthrough. The list of these contributors is vast, ever-growing, and to create a complete directory would be a project in itself. However, following are few examples everyone knows and follows.
</p>

<h4 style="font-family:Georgia; color:#FFB822; margin-top:30px;">Examples of Community Mentorship Beyond Kaggle:</h4>
<ul style="font-family:'Segoe UI', sans-serif; font-size:1.0em; line-height:1.7;">
    <li><a href="https://www.youtube.com/@abhishekkrthakur" target="_blank">Abhishek Thakur's YouTube Channel</a> - A 4x Grandmaster teaching applied ML.</li>
    <li><a href="https://www.youtube.com/@robmulla" target="_blank">Rob Mulla's YouTube Channel</a> - Featuring interviews and solution breakdowns with top competitors.</li>
    <li><a href="https://developer.nvidia.com/search?q=kaggle&page=1&filters=techblogs" target="_blank">NVIDIA Developer Blog</a> - Featuring Grandmaster-led solutions.</li>
</ul>

<hr style="border: 1px solid #e0e0e0; margin-top: 40px; margin-bottom: 40px;">

<p style="font-family:'Segoe UI', sans-serif; color:#00A99D; font-size:1.1em; line-height:1.7; text-align:center;">
    <em>We have established a profound culture of sharing, led from the very top. But what does this look like on the ground? Can we witness the direct impact of this generosity in the heat of a competition? What happens when one of these shared insights lands in the middle of a race to the top of the leaderboard? To find out, we went looking for the evidence.</em>
</p>



<h2 style="font-family:Georgia; color:#20BEFF; border-bottom:2px solid #20BEFF; padding-bottom:5px;">
    4. Anatomy of a Breakthrough: The Spark & The Ripple
</h2>

<p style="font-family:'Segoe UI', sans-serif; font-size:1.1em; line-height:1.7;">
We have seen the culture of sharing. Now, we will witness its power. To provide undeniable proof of impact, we became data detectives, searching for a single competition where we could isolate and observe the effect of one generous act. Our investigation led us to the classic <strong style="color:#FFB822;">Mercari Price Suggestion Challenge</strong>—a complex regression task where a lower score meant a better model.
</p>

<p style="font-family:'Segoe UI', sans-serif; font-size:1.1em; line-height:1.7;">
The interactive chart you see below is our evidence. It plots the journey of every team over the competition’s official timeline. Each faint gray line represents a team's improving score. It is a map of the collective struggle to solve a difficult problem. But this map holds a secret—a story of a single moment that changed everything.
</p>



In [8]:
# ──────────────────────────────────────────────
#  Case‑Study Visualization
#  (Mercari Price Suggestion Challenge)
#  ‑‑ Assumes DataFrames created earlier:
#     competitions, teams, submissions, kernels, users, kernel_versions …
# ──────────────────────────────────────────────
import plotly.express as px
import pandas as pd

print("--- Final Visualization‑Corrected Section 4: Case Study ---")

# ──────────────────────────────────────────────
#  1. Identify Competition & Dates
# ──────────────────────────────────────────────
COMPETITION_SLUG = "mercari-price-suggestion-challenge"

try:
    comp_details = competitions[competitions["Slug"] == COMPETITION_SLUG].iloc[0]
    competition_id   = comp_details["Id"]
    competition_start = comp_details["EnabledDate"]
    competition_end   = comp_details["DeadlineDate"]

    print(f"Analyzing Competition: '{COMPETITION_SLUG}' (ID: {competition_id})")
    print(f"Timeline: {competition_start.date()} → {competition_end.date()}")

    mercari_teams    = teams[teams["CompetitionId"] == competition_id]
    mercari_team_ids = mercari_teams["Id"].unique()
    print(f"Found {len(mercari_team_ids):,} teams for this competition.")

except IndexError:
    print(f"Competition '{COMPETITION_SLUG}' not found. Aborting case study.")
    competition_id = -1

# ──────────────────────────────────────────────
#  2. Load & Filter Submissions (Chunked Read)
# ──────────────────────────────────────────────
if competition_id != -1:
    print("Loading submissions data in chunks…")
    chunk_list = []

    for chunk in pd.read_csv(
        input_path + "Submissions.csv",
        chunksize=500_000,
        parse_dates=["SubmissionDate"],
        low_memory=False,
    ):
        # Keep only Mercari teams
        chunk = chunk[chunk["TeamId"].isin(mercari_team_ids)]
        # Restrict to competition timeframe
        chunk = chunk[
            (chunk["SubmissionDate"] >= competition_start)
            & (chunk["SubmissionDate"] <= competition_end)
        ]
        chunk_list.append(chunk)

    comp_subs = pd.concat(chunk_list)
    print(f"Loaded {len(comp_subs):,} relevant submissions.")

    # ──────────────────────────────────────────
    #  3. Best‑Score Tracking per Team
    # ──────────────────────────────────────────
    comp_subs.sort_values("SubmissionDate", inplace=True)
    comp_subs["BestScore"] = comp_subs.groupby("TeamId")[
        "PublicScoreFullPrecision"
    ].cummin()

    # ──────────────────────────────────────────
    #  4. Identify the “Spark” Notebook
    # ──────────────────────────────────────────
    kernel_sources        = pd.read_csv(input_path + "KernelVersionCompetitionSources.csv")
    comp_kernel_versions  = kernel_sources[kernel_sources["SourceCompetitionId"] == competition_id]
    merged_kernels        = pd.merge(
        comp_kernel_versions, kernel_versions[["Id", "ScriptId"]], left_on="KernelVersionId", right_on="Id"
    )
    comp_kernel_ids       = merged_kernels["ScriptId"].unique()
    comp_kernels          = kernels[kernels["Id"].isin(comp_kernel_ids)]
    public_comp_kernels   = comp_kernels[comp_kernels["MadePublicDate"].notna()]

    spark_notebook_row    = public_comp_kernels.sort_values("TotalVotes", ascending=False).iloc[0]
    spark_kernel_version_id = spark_notebook_row["CurrentKernelVersionId"]
    spark_version_details = kernel_versions[kernel_versions["Id"] == spark_kernel_version_id].iloc[0]

    spark_date   = spark_notebook_row["MadePublicDate"]
    spark_author = users.loc[users["Id"] == spark_notebook_row["AuthorUserId"], "UserName"].iloc[0]
    print("Author: ", spark_author)

    # ──────────────────────────────────────────
    #  5. Visualize “Spark” & Team Progress
    # ──────────────────────────────────────────
    fig = px.line(
        comp_subs,
        x="SubmissionDate",
        y="BestScore",
        color="TeamId",
        title=(
            f"<b>The Spark & Ripple Effect in '{COMPETITION_SLUG}'</b>"
            "<br>Each line shows a team's best RMSLE over time"
        ),
        labels={
            "SubmissionDate": "Competition Date",
            "BestScore": "Best Score (RMSLE – lower is better)",
        },
    )

    # Styling tweaks
    fig.update_traces(opacity=0.2, line={"width": 1.5})
    fig.add_vline(x=spark_date.to_pydatetime(), line_width=3, line_dash="dash", line_color="#FFB822")
    fig.add_annotation(
        x=spark_date.to_pydatetime(),
        y=0.68,
        text=f"<b>Spark ✨</b><br>{spark_author} publishes notebook",
        showarrow=False,
        xanchor="left",
        yanchor="top",
        font={"family": "Segoe UI", "size": 12, "color": "#00A99D"},
    )
    fig.update_layout(
        showlegend=False,
        font_family="Segoe UI",
        title_font_family="Georgia",
        yaxis_range=[0.4, 0.7],
    )

    fig.show()


--- Final Visualization‑Corrected Section 4: Case Study ---
Analyzing Competition: 'mercari-price-suggestion-challenge' (ID: 7559)
Timeline: 2017-11-21 → 2018-02-21
Found 29,640 teams for this competition.
Loading submissions data in chunks…
Loaded 36,687 relevant submissions.
Author:  thykhuely


<!-- The Interactive Plotly Chart from our Python code will be displayed after this cell -->

<h3 style="font-family:Georgia; color:#FFB822; margin-top:30px;">Phase 1: The Scattered Search</h3>
<p style="font-family:'Segoe UI', sans-serif; font-size:1.1em; line-height:1.7;">
In the early days of the competition, the chart shows a picture of creative chaos. The lines are scattered widely, with teams exploring vastly different paths. Progress is happening, but it's isolated and divergent. This is the natural state of innovation before a breakthrough—a field of brilliant individuals searching for a path in the dark.
</p>

<h3 style="font-family:Georgia; color:#FFB822; margin-top:30px;">Phase 2: The Spark of Genius</h3>
<p style="font-family:'Segoe UI', sans-serif; font-size:1.1em; line-height:1.7;">
Then, on October 30th, 2017, comes the <strong style="color:#FFB822;">spark</strong>. The vertical gold line on the chart marks the day a competitor generously published a high-quality public notebook. This single kernel provided a powerful new baseline and a clear methodology. It was not just a submission; it was a lesson, offered freely to the entire community.
</p>

<h3 style="font-family:Georgia; color:#FFB822; margin-top:30px;">Phase 3: The Ripple Effect</h3>
<p style="font-family:'Segoe UI', sans-serif; font-size:1.1em; line-height:1.7;">
The impact of that single share is immediate, dramatic, and beautiful. In the days following, you can see the <strong style="color:#FFB822;">ripple</strong> spread across the entire competition:
<ol>
    <li style="font-family:'Segoe UI', sans-serif; font-size:1.1em; line-height:1.7; margin-bottom:10px;">The scattered lines suddenly begin to <strong style="color:#FFB822;">converge</strong> into a tight, focused cluster. The community has found the path and is now learning and iterating on it together.</li>
    <li style="font-family:'Segoe UI', sans-serif; font-size:1.1em; line-height:1.7;">This entire cluster of scores then <strong style="color:#FFB822;">plummets downwards</strong>, representing a monumental leap in performance for the community as a whole.</li>
</ol>
</p>

<div style="border-left: 4px solid #FFB822; padding-left: 15px; font-style: italic; font-family: 'Segoe UI'; font-size:1.2em; margin-top: 30px; margin-bottom: 30px;">
    This is the visual proof. It is the anatomy of a community-driven breakthrough, showing how a rising tide, sparked by a single act of generosity, truly does lift all boats. It turns a field of competitors into a team of collaborators.
</div>

In [9]:
# ──────────────────────────────────────────────
#  Interactive “Spark & Ripple” Analyzer
#  (Reusable function + dropdown UI)
#  ‑‑ Expects earlier cells to have:
#     competitions, teams, kernels, users, kernel_versions, input_path …
# ──────────────────────────────────────────────
import plotly.express as px
import pandas as pd
from ipywidgets import interact, widgets, VBox
from IPython.display import display

def analyze_competition(competition_slug):
    """
    Generate the ‘Spark and Ripple’ plot for a given Kaggle competition.
    Steps:
      1. Pull competition meta (dates, ID)
      2. Load submissions in chunks, restricted to comp teams & timeframe
      3. Identify the most‑upvoted public notebook (the “spark”)
      4. Plot each team’s best score trajectory + annotate spark
    """
    # Prompt guard
    if competition_slug == "Choose one...":
        print("Please select a competition from the dropdown to analyze.")
        return

    try:
        # ── 1. Competition Details ─────────────────────────────────────
        comp_details = competitions[competitions["Slug"] == competition_slug].iloc[0]
        competition_id   = comp_details["Id"]
        competition_start = comp_details["EnabledDate"]
        competition_end   = comp_details["DeadlineDate"]

        print(f"🔬 Analyzing Competition: '{competition_slug}' …")

        # ── 2. Submissions: Chunked Load & Filter ──────────────────────
        comp_team_ids = teams.loc[teams["CompetitionId"] == competition_id, "Id"].unique()
        chunk_list = []

        for chunk in pd.read_csv(
            input_path + "Submissions.csv",
            chunksize=500_000,
            parse_dates=["SubmissionDate"],
            low_memory=False,
        ):
            chunk = chunk[chunk["TeamId"].isin(comp_team_ids)]
            chunk = chunk[
                (chunk["SubmissionDate"] >= competition_start)
                & (chunk["SubmissionDate"] <= competition_end)
            ]
            chunk_list.append(chunk)

        comp_subs = pd.concat(chunk_list)

        if len(comp_subs) < 50:
            print(f"⚠️ Only {len(comp_subs)} submissions in range; aborting plot.")
            return

        # Track each team’s running best (lower = better)
        comp_subs.sort_values("SubmissionDate", inplace=True)
        comp_subs["BestScore"] = comp_subs.groupby("TeamId")["PublicScoreFullPrecision"].cummin()

        # ── 3. Locate the “Spark” Notebook ─────────────────────────────
        kernel_sources = pd.read_csv(input_path + "KernelVersionCompetitionSources.csv")
        comp_kernel_versions = kernel_sources[kernel_sources["SourceCompetitionId"] == competition_id]

        if comp_kernel_versions.empty:
            print("⚠️ No public notebooks linked to this competition.")
            return

        comp_kernel_ids = pd.merge(
            comp_kernel_versions,
            kernel_versions[["Id", "ScriptId"]],
            left_on="KernelVersionId",
            right_on="Id",
        )["ScriptId"].unique()

        public_kernels = kernels[kernels["Id"].isin(comp_kernel_ids) & kernels["MadePublicDate"].notna()]

        if public_kernels.empty:
            print("⚠️ No public notebooks found to serve as a ‘spark’.")
            return

        spark_row  = public_kernels.sort_values("TotalVotes", ascending=False).iloc[0]
        spark_date = spark_row["MadePublicDate"]
        spark_author = users.loc[users["Id"] == spark_row["AuthorUserId"], "UserName"].iloc[0]
        spark_title  = kernel_versions.loc[
            kernel_versions["Id"] == spark_row["CurrentKernelVersionId"], "Title"
        ].iloc[0]

        # Spark details in console
        print("✨ Potential Spark Found")
        print(f"   • Title : “{spark_title}”")
        print(f"   • Author: {spark_author}")
        print(f"   • Date  : {spark_date.date()}")

        # ── 4. Build Visualization ────────────────────────────────────
        fig = px.line(
            comp_subs,
            x="SubmissionDate",
            y="BestScore",
            color="TeamId",
            title=f"<b>Spark & Ripple in “{competition_slug}”</b>",
            labels={
                "SubmissionDate": "Date",
                "BestScore": "Best Score (RMSLE – lower is better)",
            },
        )

        # Styling & spark annotation
        fig.update_traces(opacity=0.3, line={"width": 1.5})
        fig.add_vline(x=spark_date.to_pydatetime(), line_width=3, line_dash="dash", line_color="#FFB822")
        fig.add_annotation(
            x=spark_date.to_pydatetime(),
            y=0.95,
            text=f"<b>Spark ✨</b><br>{spark_author}",
            showarrow=False,
            xanchor="left",
            yanchor="top",
            font={"family": "Segoe UI", "size": 12, "color": "#00A99D"},
        )
        fig.update_layout(
            xaxis_range=[competition_start, competition_end],
            yaxis_range=[0.0, 1.0],
            showlegend=False,
            font_family="Segoe UI",
            title_font_family="Georgia",
        )

        fig.show()

    except Exception as e:
        print(f"🚨 Error: {e}")


In [10]:
# ──────────────────────────────────────────────
#  Interactive Dropdown Builder
#  (Pick a featured, high‑traffic competition)
# ──────────────────────────────────────────────
from ipywidgets import interact, widgets, VBox
from IPython.display import display

print("Building interactive dropdown...")

# ── 1. Featured competitions only ─────────────────────────────────────
featured_comps = competitions[competitions["HostSegmentTitle"] == "Featured"]

# Count teams per competition
team_counts = teams.groupby("CompetitionId").size().reset_index(name="TeamCount")

# ── 2. Merge & filter (FIX: correct join keys) ────────────────────────
featured_comps_with_teams = pd.merge(
    featured_comps,
    team_counts,
    left_on="Id",           # competitions.Id
    right_on="CompetitionId"
)

popular_comps = featured_comps_with_teams[featured_comps_with_teams["TeamCount"] > 500]
popular_comps_sorted = popular_comps.sort_values("DeadlineDate", ascending=False)

# ── 3. Build dropdown widget ──────────────────────────────────────────
comp_list = ["Choose one..."] + popular_comps_sorted["Slug"].tolist()

comp_dropdown = widgets.Dropdown(
    options=comp_list,
    value="mercari-price-suggestion-challenge",  # default
    description="Analyze Competition:",
    style={"description_width": "initial"},
    layout={"width": "max-content"},
)

# ── 4. Connect dropdown to analyzer ───────────────────────────────────
interact(analyze_competition, competition_slug=comp_dropdown);


Building interactive dropdown...


interactive(children=(Dropdown(description='Analyze Competition:', index=148, layout=Layout(width='max-content…

<hr style="border: 0; height: 1px; background-image: linear-gradient(to right, rgba(0, 0, 0, 0), rgba(0, 0, 0, 0.3), rgba(0, 0, 0, 0)); margin-top:40px; margin-bottom:40px;">

<h3 style="font-family:Georgia; color:#20BEFF; margin-top:30px;">Deep-Dive: Do the Best Also Learn from the Community?</h3>
<p style="font-family:'Segoe UI', sans-serif; font-size:1.1em; line-height:1.7;">
The ripple effect is clear, but a crucial question remains: did this "spark" only help newcomers, or did it also elevate the work of the competition's elite? To find out, we isolated the submission histories of the top 10 finishing teams. The result is a powerful confirmation of a truly collaborative culture.
</p>

In [11]:
# ──────────────────────────────────────────────
#  Deep‑Dive: Top‑10 Team Progress
#  (Build on comp_subs, competition_id, spark_date from earlier cells)
# ──────────────────────────────────────────────
import plotly.express as px
import pandas as pd

print("--- Deep‑Dive: Analyzing the Top‑10 Teams' Progress ---")

# ── 1. Select Top‑10 Teams by Private LB Rank ─────────────────────────
top_10_teams    = teams[teams["CompetitionId"] == competition_id].nsmallest(10, "PrivateLeaderboardRank")
top_10_team_ids = top_10_teams["Id"]
print("Identified Top‑10 finishing teams.")

# ── 2. Filter Submissions for Those Teams ─────────────────────────────
top_10_subs = comp_subs[comp_subs["TeamId"].isin(top_10_team_ids)]

# Add team names for legend clarity
top_10_subs = pd.merge(top_10_subs, top_10_teams[["Id", "TeamName"]], left_on="TeamId", right_on="Id")

# ── 3. Plot Score Trajectories ───────────────────────────────────────
fig_top10 = px.line(
    top_10_subs,
    x="SubmissionDate",
    y="BestScore",
    color="TeamName",
    title="<b>Deep‑Dive: How the Top‑10 Teams Reacted to the Spark</b>"
          "<br>Even elite competitors gain from community breakthroughs",
    labels={
        "SubmissionDate": "Competition Date",
        "BestScore": "Best Score (RMSLE – lower is better)",
    },
    markers=True,
)

# ── 4. Annotate the Spark Event ──────────────────────────────────────
fig_top10.add_vline(
    x=spark_date.to_pydatetime(),
    line_width=3,
    line_dash="dash",
    line_color="#FFB822",
)
fig_top10.add_annotation(
    x=spark_date.to_pydatetime(),
    y=0.6,
    text="<b>Spark ✨</b><br>Public Notebook",
    showarrow=True,
    arrowhead=2,
    arrowcolor="#FFB822",
    font={"family": "Segoe UI", "size": 12, "color": "#FFB822"},
)

# ── 5. Layout Polish ─────────────────────────────────────────────────
fig_top10.update_layout(
    font_family="Segoe UI",
    title_font_family="Georgia",
    legend_title_text="Top‑10 Teams",
)

fig_top10.show()


--- Deep‑Dive: Analyzing the Top‑10 Teams' Progress ---
Identified Top‑10 finishing teams.


<p style="font-family:'Segoe UI', sans-serif; font-size:1.1em; line-height:1.7; margin-top:15px;">
The chart above tells an unmistakable story. Observe the trajectories of the colored lines representing these elite teams. Many of them, already performing at a high level, show a distinct, sharp improvement in their scores immediately following the publication of the "spark" notebook. They are not just continuing their own progress; they are clearly integrating the community's breakthrough into their own advanced models.
</p>

<div style="border-left: 4px solid #FFB822; padding-left: 15px; font-style: italic; font-family: 'Segoe UI'; font-size:1.2em; margin-top: 30px; margin-bottom: 30px;">
    This proves that community contributions are not just for learning the basics. They are a fundamental part of the high-level intellectual exchange that pushes the entire state-of-the-art forward. Even the best stand on the shoulders of the community.
</div>

<!-- This comes after the "Deep-Dive" on the Top 10 teams -->
<p style="font-family:'Segoe UI', sans-serif; font-size:1.1em; line-height:1.7; margin-top:15px;">
The story of the Mercari challenge is a powerful testament to collective progress. It proves that the community spirit on Kaggle is not just a pleasantry; it is a core mechanism for innovation that benefits everyone, from the newcomer to the seasoned Grandmaster.
</p>
<hr style="border: 1px solid #e0e0e0; margin-top: 40px; margin-bottom: 40px;">

<p style="font-family:'Segoe UI', sans-serif; color:#00A99D; font-size:1.1em; line-height:1.7; text-align:center;">
    <em>But the story of Kaggle is not written in a single competition. It is an epic, told across thousands of challenges over fifteen years. What if you could explore any chapter of that epic? To honor these countless stories, we decided to hand the detective's magnifying glass to you.</em>
</p>



<h2 style="font-family:Georgia; color:#20BEFF; border-bottom:2px solid #20BEFF; padding-bottom:5px;">
    5. Become the Detective: The Community Explorer
</h2>

<p style="font-family:'Segoe UI', sans-serif; font-size:1.1em; line-height:1.7;">
We've shown you one story, but there are thousands. Every competition on Kaggle has its own unique ecosystem of heroes, its own pivotal "spark" notebooks, and its own vibrant forum discussions that shaped its outcome. These stories deserve to be told.
</p>

<p style="font-family:'Segoe UI', sans-serif; font-size:1.1em; line-height:1.7;">
To empower you to uncover them, we built the <strong style="color:#FFB822;">Community Explorer</strong>. This interactive dashboard is your personal window into the collaborative soul of Kaggle's most iconic competitions. It is a tribute not just to the data, but to the people behind it.
</p>

<div style="border-left: 4px solid #FFB822; padding-left: 15px; font-style: italic; font-family: 'Segoe UI'; font-size:1.2em; margin-top: 30px; margin-bottom: 30px;">
    Your journey of discovery begins here. Select a competition from the dropdown menu below. In an instant, you can unearth the most influential notebooks, celebrate the top contributors, see a world map of where these insights came from, and even identify the Grandmasters who lent their expertise.
</div>

<p style="font-family:'Segoe UI', sans-serif; font-size:1.1em; line-height:1.7;">
Go on, explore. Look up the first competition you ever participated in. Find the kernel that taught you a new technique, or the forum thread that finally solved that one impossible bug. Every chart and every table below is a piece of our shared history, waiting to be rediscovered.
</p>

<!-- The Interactive Gradio Dashboard from our Python code will be displayed after this cell -->

In [12]:
# ──────────────────────────────────────────────
#  Corrected Helper Functions & Pre‑Computed Tables
#  (Shared by top_kernels, top_contributors, forum_tables,
#   geo_fig, tag_cloud_img, timeline_fig, gm_table …)
# ──────────────────────────────────────────────
import pandas as pd
import plotly.express as px
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# ──────────────────────────────────────────────
#  Optional Data Assets
#  (KernelTags & forum_activity are loaded once, if available)
# ──────────────────────────────────────────────
try:  # Load KernelTags only if present
    if "KernelTags" not in globals():
        KernelTags = pd.read_csv("/kaggle/input/meta-kaggle/KernelTags.csv")
except FileNotFoundError:
    print("KernelTags.csv not found ➜ tag‑cloud feature disabled.")

try:  # Forum activity pre‑compute (~expensive merge)
    if "forum_activity" not in globals():
        comps_simple = competitions[["Id", "Title", "ForumId"]].rename(
            columns={"Id": "CompetitionId", "Title": "CompetitionTitle"}
        )
        forum_messages_comp = forum_messages.merge(
            comps_simple, left_on="ForumTopicId", right_on="ForumId", how="left"
        )
        forum_activity = (
            forum_messages_comp.groupby("CompetitionTitle")
            .size()
            .reset_index(name="MessageCount")
            .sort_values("MessageCount", ascending=False)
        )
except Exception as e:
    print(f"⚠️ Could not pre‑compute forum_activity: {e}")
    forum_activity = pd.DataFrame(columns=["CompetitionTitle"])

# ──────────────────────────────────────────────
#  Core Helper Functions
# ──────────────────────────────────────────────
def compute_top_kernels(comp_title):
    """Return top‑10 kernels (by votes) for a given competition title."""
    comp_id = competitions.loc[competitions["Title"] == comp_title, "Id"].iloc[0]

    kc = pd.read_csv("/kaggle/input/meta-kaggle/KernelVersionCompetitionSources.csv")
    kv = kernel_versions[kernel_versions["Id"].isin(kc[kc["SourceCompetitionId"] == comp_id]["KernelVersionId"])]
    kdf = kernels[kernels["Id"].isin(kv["ScriptId"])]

    kdf = kdf.merge(users[["Id", "DisplayName"]], left_on="AuthorUserId", right_on="Id", how="left")
    kdf["DisplayName"].fillna("Unknown User", inplace=True)

    out = (
        kdf[
            ["Id_x", "CurrentUrlSlug", "DisplayName", "TotalVotes", "TotalViews", "CreationDate"]
        ]
        .rename(columns={"Id_x": "KernelId"})
        .drop_duplicates()
        .sort_values("TotalVotes", ascending=False)
        .head(10)
    )
    out["Title_Kernel"] = out["DisplayName"] + "'s Kernel"
    out["Kernel URL"] = "https://www.kaggle.com/code/" + out["CurrentUrlSlug"].fillna("")
    return out.drop(columns="CurrentUrlSlug")


def compute_top_contributors(kdf):
    """Aggregate kernel DF to top‑10 authors by total votes."""
    if "DisplayName" not in kdf.columns:
        kdf = kdf.merge(users[["Id", "DisplayName"]], left_on="AuthorUserId", right_on="Id", how="left")
        kdf["DisplayName"].fillna("User_" + kdf["AuthorUserId"].astype(str), inplace=True)

    id_col = "KernelId" if "KernelId" in kdf.columns else "AuthorUserId"
    return (
        kdf.groupby("DisplayName")
        .agg({"TotalVotes": "sum", "TotalViews": "sum", id_col: "count"})
        .rename(columns={id_col: "Total Kernels"})
        .sort_values("TotalVotes", ascending=False)
        .head(10)
        .reset_index()
    )


def compute_forum_tables(comp_title):
    """Return (top forum users, top messages) for a competition."""
    fm = forum_messages_comp[forum_messages_comp["CompetitionTitle"] == comp_title]
    if fm.empty:
        return pd.DataFrame(), pd.DataFrame()

    fm_u = fm.merge(users[["Id", "DisplayName"]], left_on="PostUserId", right_on="Id", how="left")

    top_forum = (
        fm_u.groupby("DisplayName").size().sort_values(ascending=False).head(5).reset_index(name="Total Forum Messages")
    )
    top_msg = fm_u.nlargest(5, "Id")[["DisplayName", "Message", "PostDate"]]
    top_msg["Message"] = top_msg["Message"].str.slice(0, 100) + "…"
    return top_forum, top_msg


def contributor_choropleth(kdf):
    """Choropleth of kernel authors by country (returns Plotly fig or None)."""
    if "DisplayName" not in kdf.columns:
        return None
    geo = (
        kdf.merge(users[["DisplayName", "Country"]], on="DisplayName", how="left")
        .groupby("Country")
        .size()
        .reset_index(name="KernelCount")
        .dropna()
    )
    if geo.shape[0] < 2:
        return None
    return px.choropleth(
        geo,
        locations="Country",
        locationmode="country names",
        color="KernelCount",
        title="Top Kernel Authors by Country",
        color_continuous_scale="viridis",
    )


def tag_cloud_img(kdf):
    """Generate a WordCloud image array for kernel tags (or None)."""
    # ── FIX: must reference 'KernelId', not 'Id' ──────────────────────
    if "KernelTags" not in globals() or "KernelId" not in kdf.columns:
        return None

    tags = KernelTags[KernelTags["KernelId"].isin(kdf["KernelId"])]
    if tags.empty:
        return None

    words = " ".join(tags["TagName"].tolist())
    if not words:
        return None

    wc = WordCloud(width=800, height=300, background_color="white", collocations=False).generate(words)
    plt.figure(figsize=(10, 4))
    plt.imshow(wc)
    plt.axis("off")
    plt.tight_layout()
    plt.close()
    return wc.to_array()


def timeline_fig(kdf):
    """Area chart of kernel publications over time (returns Plotly fig or None)."""
    ks = kdf[["CreationDate"]].dropna()
    if ks.empty:
        return None
    ks["Month"] = ks["CreationDate"].dt.to_period("M").astype(str)
    tl = ks.groupby("Month").size().reset_index(name="Kernels")
    if tl.shape[0] < 2:
        return None
    return px.area(tl, x="Month", y="Kernels", title="Top Kernel Publications Over Time")


def gm_table(kdf):
    """Return table of Grandmaster authors appearing in kernel DF."""
    if "AuthorUserId" not in kdf.columns:
        return pd.DataFrame()
    gm = users[(users["PerformanceTier"] == 4) & (users["Id"].isin(kdf["AuthorUserId"].unique()))]
    if gm.empty:
        return pd.DataFrame()
    gm["Profile"] = "https://www.kaggle.com/" + gm["UserName"]
    return gm[["DisplayName", "Country", "Profile"]]

# ──────────────────────────────────────────────
#  Forum Activity Merge (Topics ↔ Messages ↔ Competitions)
# ──────────────────────────────────────────────
forum_topics_comp = forum_topics.merge(
    competitions[["ForumId", "Title", "Slug", "EnabledDate"]],
    on="ForumId",
    how="left",
).rename(columns={"Title_y": "CompetitionTitle", "Slug": "CompetitionSlug"})

forum_messages_comp = forum_messages.merge(
    forum_topics_comp[["Id", "CompetitionTitle", "CompetitionSlug"]],
    left_on="ForumTopicId",
    right_on="Id",
    how="left",
)

forum_activity = (
    forum_messages_comp.groupby("CompetitionTitle").size().reset_index(name="TotalMessages").dropna().sort_values(
        "TotalMessages", ascending=False
    )
)

# ──────────────────────────────────────────────
#  High‑Level Kernel Metrics (Votes / Views / Comments)
# ──────────────────────────────────────────────
kernel_metrics = kernels[
    [
        "Id",
        "AuthorUserId",
        "CurrentKernelVersionId",
        "CreationDate",
        "MadePublicDate",
        "TotalViews",
        "TotalComments",
        "TotalVotes",
        "Medal",
    ]
]

top_voted_kernels = kernel_metrics.nlargest(15, "TotalVotes")
top_viewed_kernels = kernel_metrics.nlargest(15, "TotalViews")
top_commented_kernels = kernel_metrics.nlargest(15, "TotalComments")

# ──────────────────────────────────────────────
#  Map Top Kernels to Their Competitions
# ──────────────────────────────────────────────
kernel_comp_sources = pd.read_csv(input_path + "KernelVersionCompetitionSources.csv")

top_kernel_ids = pd.concat(
    [top_voted_kernels["Id"], top_viewed_kernels["Id"], top_commented_kernels["Id"]]
).unique()

top_kernel_versions = kernel_versions[kernel_versions["ScriptId"].isin(top_kernel_ids)]

kernel_comp_links = top_kernel_versions[["Id", "ScriptId", "AuthorUserId", "Title", "CreationDate"]].merge(
    kernel_comp_sources, left_on="Id", right_on="KernelVersionId", how="inner"
).merge(
    competitions[["Id", "Slug", "Title", "EnabledDate"]],
    left_on="SourceCompetitionId",
    right_on="Id",
    suffixes=("_Kernel", "_Competition"),
)

kernel_comp_links = kernel_comp_links[
    ["ScriptId", "Title_Kernel", "AuthorUserId", "CreationDate", "SourceCompetitionId", "Slug", "Title_Competition", "EnabledDate"]
]


In [13]:
# ────────────────────────────────────────────────────────────────
#  Kaggle Community Explorer – Gradio Dashboard
#  (Now includes all Featured competitions & Top 20 contributor names)
# ────────────────────────────────────────────────────────────────
import gradio as gr

# ──────────────────────────────────────────────
#  Helper: Grandmaster Table (unchanged)
# ──────────────────────────────────────────────
def gm_table(kdf):
    if "AuthorUserId" not in kdf.columns:
        return pd.DataFrame()

    gm = users[
        (users["PerformanceTier"] == 4)  # Tier‑4 = Grandmaster
        & (users["Id"].isin(kdf["AuthorUserId"].unique()))
    ]
    if gm.empty:
        return pd.DataFrame()

    gm["Profile"] = "https://www.kaggle.com/" + gm["UserName"]
    return gm[["DisplayName", "Country", "Profile"]]

# ──────────────────────────────────────────────
#  Dashboard callback
# ──────────────────────────────────────────────
def dashboard(comp_title):
    empty = pd.DataFrame()
    # Guard for no selection
    if comp_title in ("⬇️ Choose competition", None):
        return (
            empty, empty, empty, empty,  # topk, contrib, forum, msg
            None, None, None, empty,      # geo, wc, timeline, gm
            "",                          # names list
            gr.update(visible=False),     # wc visibility
            gr.update(visible=False)      # gm visibility
        )

    # --- Compute main tables & figures ---
    top_ks = compute_top_kernels(comp_title)
    if top_ks.empty:
        return (
            empty, empty, empty, empty,
            None, None, None, empty,
            "",
            gr.update(visible=False),
            gr.update(visible=False)
        )

    # Filter kernels for this competition
    slug_list = top_ks["Kernel URL"].str.replace(
        "https://www.kaggle.com/code/", "", regex=False
    )
    kdf_calc = kernels[kernels["CurrentUrlSlug"].isin(slug_list)]
    if "DisplayName" not in kdf_calc.columns:
        kdf_calc = kdf_calc.merge(
            users[["Id", "DisplayName"]],
            left_on="AuthorUserId", right_on="Id", how="left"
        ).fillna({"DisplayName": "Unknown User"})

    contrib_df = compute_top_contributors(kdf_calc)
    forum_df, msg_df = compute_forum_tables(comp_title)
    geo_fig  = contributor_choropleth(kdf_calc)
    wc_img   = tag_cloud_img(kdf_calc)
    time_fig = timeline_fig(kdf_calc)
    gm_df    = gm_table(kdf_calc)

    # --- Build bullet list of Top 20 contributor names ---
    top20_names = (
        kdf_calc.groupby("DisplayName")["TotalVotes"]
        .sum()
        .sort_values(ascending=False)
        .head(20)
        .index
        .tolist()
    )
    names_list = "\n".join(f"- {n}" for n in top20_names)

    # --- Return all outputs in order ---
    return (
        top_ks,
        contrib_df,
        forum_df,
        msg_df,
        geo_fig,
        wc_img,
        time_fig,
        gm_df,
        names_list,
        gr.update(visible=wc_img is not None),
        gr.update(visible=not gm_df.empty),
    )

# ──────────────────────────────────────────────
#  Build Gradio UI
# ──────────────────────────────────────────────
with gr.Blocks(theme="soft", css=".gradio-container { max-width: 90% !important; }") as demo:
    gr.Markdown("# 🏆 Kaggle Community Explorer")

    # Dropdown of all Featured competitions
    featured_list = (
        competitions[competitions["HostSegmentTitle"] == "Featured"]["Title"]
        .dropna()
        .sort_values()
        .tolist()
    )
    comp_dd = gr.Dropdown(
        ["⬇️ Choose competition"] + featured_list,
        label="Competition",
        info="Select a featured competition to explore its community dynamics.",
    )

    # Layout – four rows of widgets
    with gr.Row(variant="panel"):
        topk_tbl    = gr.Dataframe(label="🏅 Top 10 Kernels by Votes")
        contrib_tbl = gr.Dataframe(label="👥 Top 10 Contributors by Votes")

    with gr.Row(variant="panel"):
        forum_tbl = gr.Dataframe(label="💬 Top 5 Forum Contributors by Posts")
        msg_tbl   = gr.Dataframe(label="🗨️ Top 5 Most Replied‑To Messages")

    with gr.Row(variant="panel"):
        geo_plot  = gr.Plot(label="🌍 Authors by Country")
        time_plot = gr.Plot(label="⏳ Kernel Publications Over Time")

    with gr.Row(variant="panel"):
        wc_image = gr.Image(label="☁️ Tag Cloud of Top Kernels", visible=False, type="numpy")
        gm_tbl   = gr.Dataframe(label="🏅 Grandmasters in Top Kernels", visible=False)

    # Markdown list for Top 20 contributor names
    names_md = gr.Markdown(label="👥 Top 20 Contributor Names")

    # Outputs must match the dashboard return tuple
    outputs = [
        topk_tbl, contrib_tbl, forum_tbl, msg_tbl,
        geo_plot, wc_image, time_plot, gm_tbl,
        names_md,
        wc_image, gm_tbl
    ]

    comp_dd.change(fn=dashboard, inputs=comp_dd, outputs=outputs)

demo.launch(inline=True)


* Running on local URL:  http://127.0.0.1:7860
It looks like you are running Gradio on a hosted a Jupyter notebook. For the Gradio app to work, sharing must be enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

* Running on public URL: https://510a4518aa5e38f442.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)




<!-- This comes after the interactive Gradio dashboard -->
<p style="font-family:'Segoe UI', sans-serif; font-size:1.1em; line-height:1.7; margin-top:25px;">
The Community Explorer reveals the "who" and "what" behind each competition's success. It shows us the heroes and the breakthrough ideas. But it also raises a deeper question about the very structure of collaboration. When an idea is shared, how does the knowledge actually flow through the community?
</p>
<hr style="border: 1px solid #e0e0e0; margin-top: 40px; margin-bottom: 40px;">

<p style="font-family:'Segoe UI', sans-serif; font-size:1.1em; color:#00A99D; line-height:1.7; text-align:center;">
    <em>We saw how a spark can create a ripple through time, but what does this collaboration look like structurally? Is it a long, winding chain of inheritance, or something else entirely? We went back to the Mercari data to map the anatomy of a fork.</em>
</p>




<h2 style="font-family:Georgia; color:#20BEFF; border-bottom:2px solid #20BEFF; padding-bottom:5px;">
    6. Mapping the Flow: The Knowledge Hub
</h2>

<p style="font-family:'Segoe UI', sans-serif; font-size:1.1em; line-height:1.7;">
A fork is the ultimate act of inheritance on Kaggle. It's a moment where a competitor sees an idea and says, "I want to build on that." But what is the shape of that inheritance? To find out, we identified the single most-forked public notebook from the Mercari competition and visualized its direct lineage.
</p>

<p style="font-family:'Segoe UI', sans-serif; font-size:1.1em; line-height:1.7;">
The result, which you can see in the network graph below, was not the long, winding "family tree" we expected. Instead, the data revealed a stunningly clear and powerful pattern: a <strong style="color:#FFB822;">Knowledge Hub</strong>.
</p>



In [14]:
# ──────────────────────────────────────────────
#  Section 5: Visualising the “Knowledge Hub”
#  (Identifies the most‑forked kernel version, then
#   builds a hub‑and‑spoke network graph around it)
# ──────────────────────────────────────────────
import pandas as pd
import networkx as nx
import plotly.graph_objects as go

print("--- Final Section 5: Visualising the Knowledge Hub ---")

# ── 1. Locate the Most‑Forked Kernel Version ─────────────────────────
try:
    fork_counts = kernels["ForkParentKernelVersionId"].value_counts().reset_index()
    fork_counts.columns = ["KernelVersionId", "ForkCount"]

    kernel_sources = pd.read_csv(input_path + "KernelVersionCompetitionSources.csv")
    comp_kernel_versions = kernel_sources[kernel_sources["SourceCompetitionId"] == competition_id]

    comp_forks = pd.merge(fork_counts, comp_kernel_versions, on="KernelVersionId")

    if comp_forks.empty:
        root_version_id = -1
    else:
        root_version_id = comp_forks.sort_values("ForkCount", ascending=False).iloc[0]["KernelVersionId"]
        fork_count = comp_forks.sort_values("ForkCount", ascending=False).iloc[0]["ForkCount"]

        root_title = kernel_versions.loc[kernel_versions["Id"] == root_version_id, "Title"].iloc[0]
        print(f"✅ Root Found: '{root_title}' (forked {fork_count} times)")

except Exception:
    root_version_id = -1

# ── 2. Build a Hub‑and‑Spoke Graph ───────────────────────────────────
if root_version_id != -1:
    G = nx.DiGraph()
    child_kernels = kernels[kernels["ForkParentKernelVersionId"] == root_version_id]

    G.add_node(root_version_id)  # root
    for _, row in child_kernels.iterrows():
        child_id = row["CurrentKernelVersionId"]
        if pd.notna(child_id):
            G.add_edge(root_version_id, child_id)

    print(f"Network built with {G.number_of_nodes()} nodes and {G.number_of_edges()} edges.")

    if G.number_of_nodes() > 1:
        # ── 3. Prepare Plotly Coordinates & Traces ────────────────────
        pos = nx.kamada_kawai_layout(G)

        nodes_df = (
            pd.DataFrame(G.nodes(), columns=["Id"])
            .merge(kernel_versions[["Id", "Title", "TotalVotes"]], on="Id", how="left")
            .fillna({"TotalVotes": 0, "Title": "Untitled"})
        )

        # Edges
        edge_x, edge_y = [], []
        for edge in G.edges():
            x0, y0 = pos[edge[0]]; x1, y1 = pos[edge[1]]
            edge_x += [x0, x1, None]; edge_y += [y0, y1, None]
        edge_trace = go.Scatter(x=edge_x, y=edge_y, mode="lines",
                                line=dict(width=1, color="#AAAAAA"), hoverinfo="none")

        # Nodes
        node_x, node_y = [], []
        for node in G.nodes():
            x, y = pos[node]; node_x.append(x); node_y.append(y)

        sizes  = [30 if n == root_version_id else 15 for n in G.nodes()]
        colours = ["#FFB822" if n == root_version_id else "#20BEFF" for n in G.nodes()]

        node_trace = go.Scatter(
            x=node_x, y=node_y, mode="markers",
            marker=dict(size=sizes, color=colours, line_width=2),
            hoverinfo="text"
        )
        node_trace.text = [
            f"<b>{row.Title}</b><br>Votes: {int(row.TotalVotes):,}"
            for _, row in nodes_df.iterrows()
        ]

        # ── 4. Render Interactive Graph ───────────────────────────────
        fig_network = go.Figure(
            data=[edge_trace, node_trace],
            layout=go.Layout(
                title="<b>The Knowledge Hub: Central Idea & Offshoots</b>"
                      "<br>Yellow = most‑forked notebook, Blue = direct forks",
                titlefont_size=16,
                hovermode="closest",
                showlegend=False,
                margin=dict(t=50, b=20, l=5, r=5),
                xaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
                yaxis=dict(showgrid=False, zeroline=False, showticklabels=False)
            )
        )
        fig_network.show()


--- Final Section 5: Visualising the Knowledge Hub ---
✅ Root Found: 'A simple nn solution with Keras (~0.48611 PL)' (forked 9.0 times)
Network built with 10 nodes and 9 edges.


<!-- The Network Graph visualization from our Python code will be displayed after this cell -->

<div style="border-left: 4px solid #FFB822; padding-left: 15px; font-style: italic; font-family: 'Segoe UI'; font-size:1.2em; margin-top: 30px; margin-bottom: 30px;">
    The structure is unmistakable. The community didn't just iterate in a linear chain. Instead, they identified a single, foundational piece of work—the central <strong style="color:#FFB822;">golden node</strong>—and used it as a shared launchpad for dozens of independent new explorations, represented by the blue offshoots.
</div>

<p style="font-family:'Segoe UI', sans-serif; font-size:1.1em; line-height:1.7;">
This "star-shaped" pattern reveals a sophisticated and efficient mode of collaboration. It shows a community that is not only capable of recognizing a high-quality idea but is also wise enough to use it as a common baseline to accelerate parallel experimentation. This isn't just forking code; this is the architecture of collective intelligence. It's a testament to the power of a shared foundation in the hunt for a breakthrough.
</p>

<!-- This comes after the "Knowledge Hub" network graph and its description -->
<p style="font-family:'Segoe UI', sans-serif; font-size:1.1em; line-height:1.7; margin-top:25px;">
Whether it's a ripple of progress through time or a hub of knowledge spreading outwards, the pattern is clear: behind every great leap forward is a generous act of sharing. These acts are not random. They are the conscious, consistent work of individuals who have chosen to build a community as much as they have chosen to build models.
</p>

<p style="font-family:'Segoe UI', sans-serif; font-size:1.1em; line-height:1.7; margin-top:15px;">
Our analysis has shown the *effect* of these individuals. Now, it is time to celebrate the individuals themselves.
</p>

<hr style="border: 1px solid #e0e0e0; margin-top: 40px; margin-bottom: 40px;">

<p style="font-family:'Segoe UI', sans-serif; color:#00A99D; font-size:1.1em; line-height:1.7; text-align:center;">
    <em>For fifteen years, Kaggle's leaderboards have celebrated the winners. But another story, one of profound generosity, has been written in the forums and public notebooks. We wanted to create a place of honor for the heroes of that story.</em>
</p>



<h2 style="font-family:Georgia; color:#20BEFF; border-bottom:2px solid #20BEFF; padding-bottom:5px;">
    7. The Community Catalysts: A Hall of Fame
</h2>

<p style="font-family:'Segoe UI', sans-serif; font-size:1.1em; line-height:1.7;">
In every great story, there are the heroes who win the final prize, celebrated in the spotlight. But then there are the guides—the selfless, foundational figures who make the hero's journey possible in the first place. On Kaggle, the leaderboards tell us who the heroes are. But our data allows us to finally see the guides.
</p>

<p style="font-family:'Segoe UI', sans-serif; font-size:1.1em; line-height:1.7;">
We call them the <strong style="color:#FFB822;">Community Catalysts</strong>. They are the individuals whose contributions create disproportionate, exponential value for the entire ecosystem. But how do you measure generosity? How do you quantify mentorship?
</p>

<p style="font-family:'Segoe UI', sans-serif; font-size:1.1em; line-height:1.7;">
We realized every upvote on a forum post or a public notebook is more than a point. It's a quiet "thank you." It's a "you saved me hours of frustration." It's an "I never thought of it that way before." So we created a new metric: the <strong style="color:#FFB822;">Community Impact Score</strong>. It is not a measure of competitive wins, but a measure of gratitude received, calculated as the sum of every upvote on a user's public notebooks and forum messages.
</p>

<p style="font-family:'Segoe UI', sans-serif; font-size:1.1em; line-height:1.7;">
It is with this spirit of gratitude that we present our own tribute. Below is not just a table, but a gallery celebrating the Community Catalysts of Kaggle—the mentors, the teachers, and the innovators whose work has lit the way for thousands of others. The full list of these contributors numbers in the tens of thousands—a roll call of generosity too vast for any single page. Although the table below spotlights only the top 20 names, our Hall of Fame is truly boundless—every collaborator, mentor, and notebook sharer across Kaggle belongs in this tribute.
</p>

<p style="font-family:'Segoe UI', sans-serif; font-size:1.1em; line-height:1.7;">
We encourage to play by yourself with the leaderboard.
</p>

In [15]:
# ────────────────────────────────────────────────────────────────
#  💫 Community Catalyst Hall‑of‑Fame – Stand‑Alone Gradio App
#  (Drop this single cell in a fresh notebook / script)
# ────────────────────────────────────────────────────────────────
import gradio as gr
import pandas as pd

# ── Style & Tier helpers (identical to earlier) ───────────────────
HTML_STYLE = """
<style>
/* keep the dark canvas */
body{background:#0d1117;margin:0;padding:0;}

/* ---------- Hall‑of‑Fame cards ---------- */
.card-container{
    display:flex;flex-wrap:wrap;gap:20px;justify-content:center;
    font-family:'Segoe UI',sans-serif;
}
.catalyst-card{
    background:#ffffff;                 /* bright surface */
    border:1px solid #e0e0e0;border-radius:10px;
    width:300px;padding:20px;
    box-shadow:0 4px 10px rgba(0,0,0,0.10);
    display:flex;flex-direction:column;align-items:center;text-align:center;
    transition:transform .2s;
}
/* 👇  this one line forces ALL descendant text inside the card */
.catalyst-card, .catalyst-card *{color:#222 !important;}

.catalyst-card:hover{transform:translateY(-5px);}
.card-rank{font-size:1.6em;font-weight:700;color:#FFB822 !important;margin-bottom:5px;}
.card-username{font-size:1.3em;font-weight:600;margin-bottom:10px;}
.card-tier{padding:4px 10px;border-radius:12px;font-size:.8em;font-weight:700;color:#fff;margin-bottom:15px;}
.tier-gm{background:#e67e22;}
.tier-master{background:#9b59b6;}
.tier-expert{background:#3498db;}
.tier-contrib{background:#2ecc71;}
.tier-staff{background:#20BEFF;}
.card-impact-score{font-size:2.2em;font-weight:700;color:#20BEFF !important;margin-bottom:15px;}
.card-impact-label{font-size:.9em;color:#555 !important;margin-top:-15px;margin-bottom:20px;}
.card-stats{
    display:flex;justify-content:space-around;width:100%;
    font-size:.9em;color:#444 !important;
}
.stat-item{text-align:center;}
.stat-value{font-size:1.2em;font-weight:700;margin-bottom:2px;}
</style>
"""


TIER_MAP = {
    4: {"name": "Grandmaster", "class": "tier-gm"},
    3: {"name": "Master",      "class": "tier-master"},
    2: {"name": "Expert",      "class": "tier-expert"},
    1: {"name": "Contributor", "class": "tier-contrib"},
    5: {"name": "Kaggle Team", "class": "tier-staff"},
}

# ── Function: build Hall‑of‑Fame HTML for any N ────────────────────
def hall_of_fame_html(top_n: int) -> str:
    # 1️⃣  Impact‑Score calculation (public kernels + forum votes)
    public_kernels = kernels[kernels["MadePublicDate"].notna()]
    notebook_votes = (
        public_kernels.groupby("AuthorUserId")["TotalVotes"].sum()
        .reset_index(name="NotebookVotes")
        .rename(columns={"AuthorUserId": "UserId"})
    )

    topic_scores = forum_topics[["FirstForumMessageId", "Score"]]
    message_authors = forum_messages[["Id", "PostUserId"]]
    forum_votes = (
        topic_scores.merge(message_authors,
                           left_on="FirstForumMessageId", right_on="Id")
        .groupby("PostUserId")["Score"].sum()
        .reset_index(name="ForumVotes")
        .rename(columns={"PostUserId": "UserId"})
    )

    hof = (
        notebook_votes.merge(forum_votes, on="UserId", how="outer").fillna(0)
        .assign(ImpactScore=lambda df: df["NotebookVotes"] + df["ForumVotes"])
        .merge(users[["Id", "UserName", "PerformanceTier"]],
               left_on="UserId", right_on="Id", how="left")
        .dropna(subset=["UserName"])
        .sort_values("ImpactScore", ascending=False)
        .head(top_n)
        .reset_index(drop=True)
    )

    # 2️⃣  Generate card HTML
    cards = []
    for idx, row in hof.iterrows():
        tier = TIER_MAP.get(row["PerformanceTier"],
                            {"name": "User", "class": "tier-contrib"})
        cards.append(f"""
          <div class="catalyst-card">
            <div class="card-rank">#{idx + 1}</div>
            <div class="card-username">{row['UserName']}</div>
            <div class="card-tier {tier['class']}">{tier['name']}</div>
            <div class="card-impact-score">{row['ImpactScore']:,.0f}</div>
            <div class="card-impact-label">Community Impact Score</div>
            <div class="card-stats">
              <div class="stat-item">
                <div class="stat-value">📚 {row['NotebookVotes']:,.0f}</div>
                <div>Notebook Votes</div>
              </div>
              <div class="stat-item">
                <div class="stat-value">💬 {row['ForumVotes']:,.0f}</div>
                <div>Forum Votes</div>
              </div>
            </div>
          </div>""")

    return (
        HTML_STYLE
        + f"<h1 style='text-align:center;font-family:Georgia;color:#FFB822;'>Kaggle Community Catalysts</h1>"
        + f"<h3 style='text-align:center;font-family:Georgia;color:#20BEFF;'>Top {top_n} Hall‑of‑Fame</h3>"
        + "<div class='card-container'>" + "".join(cards) + "</div>"
    )

# ── Gradio app: slider → HTML output ───────────────────────────────
def update_hof(top_n):
    return hall_of_fame_html(int(top_n))

with gr.Blocks(theme="soft", css=".gradio-container { max-width: 90% !important; }") as hof_app:
    gr.Markdown("# 🏆 Community Catalyst Hall‑of‑Fame")

    slider = gr.Slider(
        minimum=10, maximum=100, step=5, value=20,
        label="How many top contributors to show?"
    )
    hof_html_box = gr.HTML()

    slider.change(fn=update_hof, inputs=slider, outputs=hof_html_box)

    # initial render
    hof_html_box.value = hall_of_fame_html(20)

hof_app.launch(inline=True)


* Running on local URL:  http://127.0.0.1:7861
It looks like you are running Gradio on a hosted a Jupyter notebook. For the Gradio app to work, sharing must be enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

* Running on public URL: https://26bcf0707d87d55a1a.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)





<div style="border-left: 4px solid #FFB822; padding-left: 15px; font-style: italic; font-family: 'Segoe UI'; font-size:1.2em; margin-top: 30px; margin-bottom: 30px;">
    To the names you see above, and to the thousands more they represent, we want to say thank you. Thank you for every patient answer in the forums. Thank you for every elegantly documented notebook. Thank you for sharing your hard-won insights, not for a medal, but for the simple, profound belief that knowledge should be shared.
</div>

<p style="font-family:'Segoe UI', sans-serif; font-size:1.1em; line-height:1.7;">
You are the unseen architects of countless breakthroughs. You are the digital librarians for the world's largest AI repository. Your work has lowered the barrier to entry for aspiring data scientists, deepened the expertise of seasoned professionals, and woven the very fabric of this incredible community. You have proven, time and again over fifteen years, that the most powerful force in artificial intelligence is human generosity.
</p>

<p style="font-family:'Segoe UI', sans-serif; font-size:1.1em; line-height:1.7;">
This Hall of Fame is a testament to the idea that on Kaggle, there is more than one way to be a giant. You can win the prize, or you can help thousands of others take a step forward. These are the giants of our community.
</p>

<!-- This comes directly after the Hall of Fame section -->
<p style="font-family:'Segoe UI', sans-serif; font-size:1.1em; line-height:1.7; margin-top:25px;">
The stories of these Community Catalysts are the heart of our findings, the human proof behind the data. Their collective effort is the grand story of Kaggle.
</p>

<hr style="border: 1px solid #e0e0e0; margin-top: 40px; margin-bottom: 40px;">

<p style="font-family:'Segoe UI', sans-serif; font-size:1.1em; color:#00A99D; line-height:1.7; text-align:center;">
    <em>Our journey through the data is now complete. We have seen the scale, witnessed the culture, and celebrated the individuals. All that remains is to step back and reflect on the profound lesson this journey has taught us about the true nature of progress itself.</em>
</p>


<h2 style="font-family:Georgia; color:#20BEFF; border-bottom:2px solid #20BEFF; padding-bottom:5px;">
    8. The True Engine of Progress
</h2>

<p style="font-family:'Segoe UI', sans-serif; font-size:1.1em; line-height:1.7;">
For fifteen years, a story has been unfolding within Kaggle's data. Our journey began with an attempt to read it—to look past the leaderboards and find the <strong style="color:#FFB822;">real narrative of progress</strong>. We started with the immense scale of this digital civilization, moved to the culture of mentorship led by its most elite members, and then witnessed the undeniable proof: a single <strong style="color:#FFB822;">spark of generosity igniting a fire of collective improvement.</strong>
</p>

<p style="font-family:'Segoe UI', sans-serif; font-size:1.1em; line-height:1.7;">
We saw how knowledge flows, not in a simple line, but from powerful hubs of shared understanding. And we built a Hall of Fame, not for the highest score, but for <strong style="color:#FFB822;">the greatest gift: the gift of knowledge, freely given.</strong>
</p>

<p style="font-family:'Segoe UI', sans-serif; font-size:1.1em; line-height:1.7;">
The conclusion is no longer just a hypothesis; it is a data-driven truth. The machine learning revolution is not just about better algorithms or more powerful hardware. It is about a fundamental shift in how we solve problems. The true engine of progress is not isolated genius, but <strong style="color:#FFB822;">collaborative discovery</strong>. It is the simple, radical, and profoundly human idea that the fastest way to get smarter is to share what you know.
</p>

<p style="font-family:'Segoe UI', sans-serif; font-size:1.1em; line-height:1.7;">
This project was our attempt to hold a mirror up to the Kaggle community. In it, we did not find a battlefield of competitors. We found a <strong style="color:#FFB822;">civilization of teachers, learners, and builders</strong>. We found a testament to the idea that a rising tide lifts all boats.
</p>

<div style="border-left: 4px solid #FFB822; padding-left: 15px; font-style: italic; font-family: 'Segoe UI'; font-size:1.3em; margin-top: 30px; margin-bottom: 20px; line-height:1.7;">
    So, to the entire Kaggle community, past and present. To everyone who has ever stayed up late to debug a public kernel, who has answered a "stupid" question with kindness, who has forked a notebook to chase a new idea, or who has simply upvoted a post to say "thank you"—<br><br><strong style="color:#FFB822;">This is your story. You are the engine. Thank you for fifteen years of progress.</strong>
</div>


<h2 style="font-family:Georgia; color:#20BEFF; border-bottom:2px solid #20BEFF; padding-bottom:5px;">
    9. References & Acknowledgements
</h2>

<p style="font-family:'Segoe UI', sans-serif; font-size:1.1em; line-height:1.7;">
This project, in its very essence, is about standing on the shoulders of giants. Our analysis was only possible because of the data, tools, and community content generously provided by others. We wish to extend our deepest gratitude to the following:
</p>

<h4 style="font-family:Georgia; color:#00A99D; margin-top:30px;">Primary Data Source:</h4>
<ul style="font-family:'Segoe UI', sans-serif; font-size:1.0em; line-height:1.7;">
    <li style="margin-bottom:10px;">
        <strong style="color:#FFB822;">The Meta Kaggle Dataset:</strong> The foundation of this entire analysis. We are immensely grateful to Kaggle for making this rich, historical dataset publicly available, allowing for a deeper understanding of the community itself.
        <br>
        <a href="https://www.kaggle.com/datasets/kaggle/meta-kaggle" target="_blank">https://www.kaggle.com/datasets/kaggle/meta-kaggle</a>
    </li>
</ul>

<h4 style="font-family:Georgia; color:#00A99D; margin-top:30px;">Case Study & Community Examples:</h4>
<ul style="font-family:'Segoe UI', sans-serif; font-size:1.0em; line-height:1.7;">
    <li style="margin-bottom:10px;">
        <strong style="color:#FFB822;">Mercari Price Suggestion Challenge:</strong> The competition that served as the basis for our "Spark & Ripple" and "Knowledge Hub" case studies.
        <br>
        <a href="https://www.kaggle.com/c/mercari-price-suggestion-challenge" target="_blank">https://www.kaggle.com/c/mercari-price-suggestion-challenge</a>
    </li>
    <li style="margin-bottom:10px;">
        <strong style="color:#FFB822;">Community Mentorship Content:</strong> The individuals and organizations listed below are a few examples of the incredible educational content that Kagglers create beyond the platform, enriching the entire AI ecosystem.
        <ul>
            <li><a href="https://www.youtube.com/@abhishekkrthakur" target="_blank">Abhishek Thakur's YouTube Channel</a></li>
            <li><a href="https://www.youtube.com/@robmulla" target="_blank">Rob Mulla's YouTube Channel</a></li>
            <li><a href="https://www.youtube.com/playlist?list=PL_N_d5m9x_6-tr62N5I7A-I3q_s8i-2dF" target="_blank">Chai Time Data Science Podcast</a></li>
            <li><a href="https://developer.nvidia.com/blog/tag/kaggle-grandmasters/" target="_blank">NVIDIA's Kaggle GM Blog Series</a></li>
        </ul>
    </li>
</ul>

<h4 style="font-family:Georgia; color:#00A99D; margin-top:30px;">Acknowledgements for Tools:</h4>
<p style="font-family:'Segoe UI', sans-serif; font-size:1.1em; line-height:1.7;">
Finally, this entire interactive notebook was built using the incredible work of the open-source community. Our deep thanks go to the developers and maintainers of the core Python libraries that powered our analysis and visualizations, including <strong>Pandas, Matplotlib, Plotly, NetworkX, WordCloud,</strong> and <strong>Gradio.</strong> Your work makes projects like this possible.
</p>