# How to create a single CSV file that contains all past Kaggle Competition solution writeups.

 - Winners of Kaggle Competitions typically share their winning methods via solution "writeups".
 - This notebook demonstrates how to create a new dataset containing the writeups from all past Featured and Research competitions, using the Meta Kaggle dataset.
 - To learn more about competition solution writeups, please refer to https://www.kaggle.com/discussions/product-feedback/373153.

In [1]:
# Import helpful Python packages
import pandas as pd
import datetime

# Load the data
Teams = pd.read_csv('/kaggle/input/meta-kaggle/Teams.csv', low_memory=False)
Competitions = pd.read_csv('/kaggle/input/meta-kaggle/Competitions.csv', low_memory=False)
ForumTopics = pd.read_csv('/kaggle/input/meta-kaggle/ForumTopics.csv', low_memory=False)
ForumMessages = pd.read_csv('/kaggle/input/meta-kaggle/ForumMessages.csv', low_memory=False)

# Rename a few columns
Teams = Teams.rename(columns={'Id': 'Id_Teams'})
ForumTopics = ForumTopics.rename(columns={'Id': 'Id_ForumTopics',
                                         'Title': 'Title of Writeup',
                                         'CreationDate': 'Date of Writeup'})
ForumMessages = ForumMessages.rename(columns={'Id': 'Id_ForumMessages',
                                              'Message': 'Writeup'})
Competitions = Competitions.rename(columns={'Id': 'Id_Competitions',
                                           'Title': 'Title of Competition',
                                           'EnabledDate': 'Competition Launch Date'})

# Join each team's writeup to its forum topic, that topic's messages, and the competition
df = Teams.merge(right=ForumTopics, how='inner', left_on='WriteUpForumTopicId', right_on='Id_ForumTopics')
df = df.merge(right=ForumMessages, how='inner', left_on='Id_ForumTopics', right_on='ForumTopicId')
df = df.merge(right=Competitions, how='inner', left_on='ForumId', right_on='ForumId')
# Keep only the opening post of each topic (the writeup itself, not the replies)
df = df[df['FirstForumMessageId'] == df['Id_ForumMessages']]
# Keep only Featured and Research competitions
df = df[df['HostSegmentTitle'].isin(['Featured', 'Research'])]

# Add in URLs
df['Id_Competitions'] = df['Id_Competitions'].astype(str)
df['Competition URL'] = 'https://www.kaggle.com/c/'+df['Id_Competitions']
df['Id_ForumTopics'] = df['Id_ForumTopics'].astype(str)
df['Writeup URL'] = 'https://www.kaggle.com/c/'+df['Id_Competitions']+'/discussion/'+df['Id_ForumTopics']

# Final cleanup
df = df[['Competition Launch Date',
         'Title of Competition',
         'Competition URL',
         'Date of Writeup',
         'Title of Writeup',
         'Writeup',
         'Writeup URL']]

print('# of entries: ',df['Writeup'].count())

# of entries:  2977
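
The join-and-filter logic in the cell above can be hard to follow on the full tables, so here is a minimal sketch on toy data. The column names match the ones used in the notebook, but every value (IDs, titles, message text, and the `Playground` segment) is invented for illustration, and the lowercase variable names are chosen only to avoid overwriting the real DataFrames.

```python
import pandas as pd

# Toy stand-ins for the Meta Kaggle tables; all values are invented.
teams = pd.DataFrame({'Id_Teams': [1, 2],
                      'WriteUpForumTopicId': [10, 11]})
forum_topics = pd.DataFrame({'Id_ForumTopics': [10, 11],
                             'ForumId': [100, 200],
                             'FirstForumMessageId': [1000, 1002],
                             'Title of Writeup': ['1st place solution', 'My approach']})
forum_messages = pd.DataFrame({'Id_ForumMessages': [1000, 1001, 1002],
                               'ForumTopicId': [10, 10, 11],
                               'Writeup': ['<p>Our method</p>', '<p>Congrats!</p>', '<p>Notes</p>']})
competitions = pd.DataFrame({'Id_Competitions': [5, 6],
                             'ForumId': [100, 200],
                             'HostSegmentTitle': ['Featured', 'Playground']})

# Same chain of inner joins as in the notebook
toy = teams.merge(forum_topics, how='inner', left_on='WriteUpForumTopicId', right_on='Id_ForumTopics')
toy = toy.merge(forum_messages, how='inner', left_on='Id_ForumTopics', right_on='ForumTopicId')
toy = toy.merge(competitions, how='inner', on='ForumId')
rows_after_joins = len(toy)  # one row per (writeup topic, message) pair, replies included

# Drop replies: keep only the message that opened each topic
toy = toy[toy['FirstForumMessageId'] == toy['Id_ForumMessages']]
# Drop competitions outside the Featured and Research segments
toy = toy[toy['HostSegmentTitle'].isin(['Featured', 'Research'])]

print(rows_after_joins, len(toy))
print(toy[['Title of Writeup', 'Writeup']].to_string(index=False))
```

In this toy example the joins produce three rows (two messages for the first topic, one for the second), and the two filters reduce them to the single opening post from the Featured competition.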


In [2]:
# Preview the data
df.tail(10)

,Competition Launch Date,Title of Competition,Competition URL,Date of Writeup,Title of Writeup,Writeup,Writeup URL
46385,08/30/2022 17:23:07,Feedback Prize - English Language Learning,https://www.kaggle.com/c/38321,12/01/2022 10:01:32,15th Place Solution : 6 kinds of AttentionPool...,<h3>First of all</h3>\n<p>We appreciate this f...,https://www.kaggle.com/c/38321/discussion/369760
46386,08/30/2022 17:23:07,Feedback Prize - English Language Learning,https://www.kaggle.com/c/38321,11/30/2022 12:23:28,Efficiency 7th / Private 84st place solution,<h1>Overview</h1>\n<p>I took the average of 6 ...,https://www.kaggle.com/c/38321/discussion/369540
46391,08/30/2022 17:23:07,Feedback Prize - English Language Learning,https://www.kaggle.com/c/38321,11/30/2022 01:33:39,10th place solution...my quick write-up!,<p><strong>Acknowledgement</strong></p>\n<p>Ph...,https://www.kaggle.com/c/38321/discussion/369373
46400,08/30/2022 17:23:07,Feedback Prize - English Language Learning,https://www.kaggle.com/c/38321,12/07/2022 11:08:06,18th place solution,<h1>18 th place Solution</h1>\n<p>First of all...,https://www.kaggle.com/c/38321/discussion/370974
46413,08/30/2022 17:23:07,Feedback Prize - English Language Learning,https://www.kaggle.com/c/38321,12/01/2022 13:11:33,49th place (Weighted Loss etc.),<p>I'm new to the NLP and this is my first NLP...,https://www.kaggle.com/c/38321/discussion/369793
46417,08/30/2022 17:23:07,Feedback Prize - English Language Learning,https://www.kaggle.com/c/38321,12/08/2022 14:29:20,42th Place Solution,"<p>First of all, thank you for hosting this co...",https://www.kaggle.com/c/38321/discussion/371203
46420,08/30/2022 17:23:07,Feedback Prize - English Language Learning,https://www.kaggle.com/c/38321,12/05/2022 22:06:00,28th Place Efficiency Solution,"<p>First of all, thanks to Kaggle and the comp...",https://www.kaggle.com/c/38321/discussion/370686
46421,08/30/2022 17:23:07,Feedback Prize - English Language Learning,https://www.kaggle.com/c/38321,12/11/2022 07:42:36,37th place solution,"<p>First of all, thanks to the host for an int...",https://www.kaggle.com/c/38321/discussion/371602
46422,08/30/2022 17:23:07,Feedback Prize - English Language Learning,https://www.kaggle.com/c/38321,12/04/2022 04:48:15,"1,782nd place - Multitasking solution","<p>💡 Leaderboard: Top&nbsp;68%&nbsp;(1,782 out...",https://www.kaggle.com/c/38321/discussion/370356
46423,08/30/2022 17:23:07,Feedback Prize - English Language Learning,https://www.kaggle.com/c/38321,12/13/2022 18:24:44,A 0.45 tree-based solution and a few thoughts,"<h1><a href=""https://www.kaggle.com/competitio...",https://www.kaggle.com/c/38321/discussion/372028
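
The date columns in the preview are plain strings. As a rough sketch, they could be parsed into datetimes so the writeups can be sorted or compared against the launch date. The format string is an assumption inferred from the preview (month first, 24-hour time), the two rows are copied from the preview above, and `Days After Launch` is a hypothetical column name.

```python
import pandas as pd

# Two rows copied from the preview (date and title columns only)
sample = pd.DataFrame({
    'Competition Launch Date': ['08/30/2022 17:23:07', '08/30/2022 17:23:07'],
    'Date of Writeup': ['12/01/2022 10:01:32', '11/30/2022 12:23:28'],
    'Title of Writeup': ['15th Place Solution : 6 kinds of AttentionPool...',
                         'Efficiency 7th / Private 84st place solution'],
})

# Assumed format: MM/DD/YYYY HH:MM:SS, as the preview suggests
date_format = '%m/%d/%Y %H:%M:%S'
for col in ['Competition Launch Date', 'Date of Writeup']:
    sample[col] = pd.to_datetime(sample[col], format=date_format)

# Whole days elapsed between competition launch and the writeup (hypothetical column)
sample['Days After Launch'] = (sample['Date of Writeup'] - sample['Competition Launch Date']).dt.days

# Oldest writeup first
sample = sample.sort_values('Date of Writeup')
print(sample[['Date of Writeup', 'Days After Launch']])
```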


In [3]:
# Save all writeups to a single CSV file with a timestamped name (HHMM_MMDDYYYY)
output_name = f'kaggle_writeups_{datetime.datetime.now().strftime("%H%M_%m%d%Y")}'
df.to_csv('/kaggle/working/' + output_name + '.csv', index=False)
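
Because the writeups are HTML with embedded commas and line breaks, it can be worth checking that the saved file reads back unchanged. The sketch below does this on a one-row example written to a temporary directory. `timestamped_csv_path` is a hypothetical helper that reuses the notebook's timestamp pattern, and the sample text is invented.

```python
import datetime
import os
import tempfile

import pandas as pd


def timestamped_csv_path(directory, prefix='kaggle_writeups', now=None):
    """Hypothetical helper: build '<prefix>_<HHMM_MMDDYYYY>.csv' inside `directory`."""
    now = now or datetime.datetime.now()
    return os.path.join(directory, f'{prefix}_{now.strftime("%H%M_%m%d%Y")}.csv')


# Invented one-row sample with a comma and a line break inside the HTML
sample = pd.DataFrame({'Title of Writeup': ['Example solution'],
                       'Writeup': ['<p>Line one, with a comma</p>\n<p>Line two</p>']})

with tempfile.TemporaryDirectory() as tmp_dir:
    # A fixed timestamp keeps the example reproducible
    path = timestamped_csv_path(tmp_dir, now=datetime.datetime(2022, 12, 13, 18, 24))
    sample.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

file_name = os.path.basename(path)
print(file_name)  # kaggle_writeups_1824_12132022.csv
print(reloaded['Writeup'].iloc[0] == sample['Writeup'].iloc[0])
```

In the notebook itself, the same check would apply to the file written under `/kaggle/working/`.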
