### This notebook wrangles the `track` column of the NIPS-2019 data

I will read in the NIPS dataset and split the `track` column into `main_track` and `track`, keeping the original value in `track_original`, where:   
    `main_track`: the parent track under which the NIPS conference grouped a track  
    `track`: the child track    
    `track_original`: a copy of the original conference-assigned track string        

For example, for `track_original` = 'Algorithms -- Adversarial Learning':      
    `track` = 'Adversarial Learning'    
    `main_track` = 'Algorithms'      
        

In [1]:
import pandas as pd
import re

In [11]:
#read in the full NIPS dataset (all years)

nips = pd.read_csv("../data/nips.csv")

In [12]:
nips.groupby("year").size()

year
2016    1724
2017    1946
2018    2846
2019    1821
dtype: int64

In [23]:
#subset 2019 data
nips19 = nips[nips['year'].astype(int) == 2019].copy()

In [24]:
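#split the track column into main_track (parent) and track (child)

#keep a copy of the original conference-assigned track string
nips19['track_original'] = nips19['track']

ot19 = nips19['track'].tolist()

t19 = []   #child tracks
mt19 = []  #parent (main) tracks

for t in ot19:
    #e.g. 'Algorithms -- Adversarial Learning' -> ['Algorithms', 'Adversarial Learning']
    parts = t.split(" -- ")
    mt19.append(parts[0])
    t19.append(parts[1])

nips19['track'] = t19
nips19['main_track'] = mt19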
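
The loop above assumes that every `track` value contains the ` -- ` separator; a value without it would raise an `IndexError`. As a rough sketch only, the same split could be written with pandas string methods. The helper `split_track`, the `has_separator` flag, and the `toy` rows below are made up for illustration and are not part of this notebook or of `nips.csv`.

```python
import pandas as pd

def split_track(df, sep=" -- "):
    """Return a copy of df with track_original, track and main_track columns."""
    out = df.copy()
    out["track_original"] = out["track"]
    # partition splits at the first separator: (before, separator, after)
    parts = out["track_original"].str.partition(sep)
    out["main_track"] = parts[0]
    out["track"] = parts[2]
    # flag rows where the separator was missing instead of raising an error
    out["has_separator"] = parts[1] == sep
    return out

# made-up rows for illustration only (not taken from nips.csv)
toy = pd.DataFrame({"track": ["Algorithms -- Adversarial Learning",
                              "Theory -- Learning Theory",
                              "Optimization"]})
toy_split = split_track(toy)
print(toy_split[["track_original", "main_track", "track", "has_separator"]])
```

Unlike the loop, this version splits only at the first separator and leaves `track` empty when no separator is found, so rows flagged `has_separator == False` can be inspected by hand.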

In [25]:
nips19.head()

Unnamed: 0,title,abstract,pdf_link,year,track,track_original,main_track
6516,A Game Theoretic Approach to Class-wise Select...,Selection of input features such as relevant p...,http://papers.nips.cc/paper/by-source-2019-5315,2019,Adversarial Learning,Algorithms -- Adversarial Learning,Algorithms
6517,A Little Is Enough: Circumventing Defenses For...,Distributed learning is central for large-scal...,http://papers.nips.cc/paper/by-source-2019-4657,2019,Adversarial Learning,Algorithms -- Adversarial Learning,Algorithms
6518,A New Defense Against Adversarial Images: Turn...,Natural images are virtually surrounded by low...,http://papers.nips.cc/paper/by-source-2019-926,2019,Adversarial Learning,Algorithms -- Adversarial Learning,Algorithms
6519,Tight Certificates of Adversarial Robustness f...,Strong theoretical guarantees of robustness ca...,http://papers.nips.cc/paper/by-source-2019-2720,2019,Adversarial Learning,Algorithms -- Adversarial Learning,Algorithms
6520,Adversarial training for free!,"Adversarial training, in which a network is tr...",http://papers.nips.cc/paper/by-source-2019-1853,2019,Adversarial Learning,Algorithms -- Adversarial Learning,Algorithms


In [27]:
nips19.groupby("main_track").size()

main_track
Algorithms                                         465
Applications                                       343
Data, Challenges, Implementations, and Software     11
Deep Learning                                      359
Neuroscience and Cognitive Science                  42
Optimization                                       122
Probabilistic Methods                              115
Reinforcement Learning and Planning                185
Theory                                             179
dtype: int64

In [28]:
nips19.to_csv("../data/nips_with_track_cleaned.csv", index=False)

In [32]:
#save the key: one row per unique (track, track_original, main_track, year) combination

document = nips19[['track', 'track_original', 'main_track', 'year']]

document = document.drop_duplicates()

document.to_csv("../data/nips_yearwise_trackinfo.csv", index=False)
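
Before relying on the saved key, it may be worth checking that it is internally consistent. The sketch below runs two such checks on a small made-up frame `key`, which stands in for `document`; the rows and the variable names `rebuilt`, `round_trip_ok` and `unique_per_year` are illustrative assumptions, not values from the dataset.

```python
import pandas as pd

# made-up key rows for illustration; in the notebook this would be `document`
key = pd.DataFrame({
    "track": ["Adversarial Learning", "Bandit Algorithms"],
    "track_original": ["Algorithms -- Adversarial Learning",
                       "Algorithms -- Bandit Algorithms"],
    "main_track": ["Algorithms", "Algorithms"],
    "year": [2019, 2019],
})

# rebuilding the original string from its parts should reproduce track_original
rebuilt = key["main_track"] + " -- " + key["track"]
round_trip_ok = bool((rebuilt == key["track_original"]).all())

# each original track string should appear only once per year in the key
unique_per_year = not key.duplicated(subset=["track_original", "year"]).any()

print(round_trip_ok, unique_per_year)
```

If either check fails on the real key, the rows where `rebuilt` differs from `track_original` are the ones to look at first.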