# Investigate a Badminton Dataset

## Table of Contents
<ul>
<li><a href="#intro">Introduction</a></li>
<li><a href="#wrangling">Data Wrangling</a></li>
<li><a href="#eda">Exploratory Data Analysis</a></li>
<li><a href="#conclusions">Conclusions</a></li>
</ul>

<a id='intro'></a>
## Introduction

This dataset was downloaded from GitHub: https://github.com/sdaphtardar/bwf-data, which is itself a fork of https://github.com/raywan/bwf-data



Import the general-purpose packages and plotting libraries that are used throughout the notebook.

In [17]:
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os


<a id='wrangling'></a>
## Data Wrangling

> **Tip**: In this section of the report, you will load in the data, check for cleanliness, and then trim and clean your dataset for analysis. Make sure that you document your steps carefully and justify your cleaning decisions.

### General Properties

## Gather

Copy and paste the following cells to build a separate dataframe for each game type as needed.

The loop below prints the name of each file in the `ws` directory as it is loaded.

In [18]:
merged_df = pd.DataFrame()

In [19]:
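# DataFrame.append was removed in pandas 2.0, so collect the frames and concatenate once.
frames = []
for file1 in os.listdir('data/ws'):
    print(file1)
    frames.append(pd.read_csv(os.path.join('data/ws', file1)))
merged_df = pd.concat(frames)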
    
    

bwf_ws_2015w1.csv
bwf_ws_2015w10.csv
bwf_ws_2015w11.csv
bwf_ws_2015w12.csv
bwf_ws_2015w13.csv
bwf_ws_2015w14.csv
bwf_ws_2015w15.csv
bwf_ws_2015w16.csv
bwf_ws_2015w17.csv
bwf_ws_2015w18.csv
bwf_ws_2015w19.csv
bwf_ws_2015w2.csv
bwf_ws_2015w20.csv
bwf_ws_2015w21.csv
bwf_ws_2015w22.csv
bwf_ws_2015w23.csv
bwf_ws_2015w24.csv
bwf_ws_2015w25.csv
bwf_ws_2015w26.csv
bwf_ws_2015w27.csv
bwf_ws_2015w28.csv
bwf_ws_2015w29.csv
bwf_ws_2015w3.csv
bwf_ws_2015w30.csv
bwf_ws_2015w31.csv
bwf_ws_2015w32.csv
bwf_ws_2015w33.csv
bwf_ws_2015w34.csv
bwf_ws_2015w35.csv
bwf_ws_2015w36.csv
bwf_ws_2015w37.csv
bwf_ws_2015w38.csv
bwf_ws_2015w39.csv
bwf_ws_2015w4.csv
bwf_ws_2015w40.csv
bwf_ws_2015w41.csv
bwf_ws_2015w42.csv
bwf_ws_2015w43.csv
bwf_ws_2015w44.csv
bwf_ws_2015w45.csv
bwf_ws_2015w46.csv
bwf_ws_2015w47.csv
bwf_ws_2015w48.csv
bwf_ws_2015w49.csv
bwf_ws_2015w5.csv
bwf_ws_2015w50.csv
bwf_ws_2015w51.csv
bwf_ws_2015w52.csv
bwf_ws_2015w53.csv
bwf_ws_2015w6.csv
bwf_ws_2015w7.csv
bwf_ws_2015w8.csv
bwf_ws_2015w9.csv
bw

In [44]:
merged_df.shape

(68625, 8)
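
The file names printed above appear to encode a year and a week number (for example `bwf_ws_2015w10.csv`). The sketch below assumes that naming pattern holds for every file and parses both values, so that they could later be attached to each weekly snapshot. `parse_snapshot_name` is a hypothetical helper and is not part of the original notebook.

```python
import re

# Assumed file-name pattern: bwf_ws_<year>w<week>.csv
_NAME_RE = re.compile(r'^bwf_ws_(\d{4})w(\d{1,2})\.csv$')


def parse_snapshot_name(filename):
    """Return (year, week) parsed from a file name, or None if it does not match."""
    match = _NAME_RE.match(filename)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


names = ['bwf_ws_2015w10.csv', 'bwf_ws_2015w2.csv', 'bwf_ws_2015w1.csv']

# Sort chronologically rather than lexicographically (w10 otherwise sorts before w2).
ordered = sorted(names, key=parse_snapshot_name)
print(ordered)
```

Keeping the year and week as columns would make the per-year research questions below easier to answer, because the merged dataframe on its own does not record which file a row came from.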

###### Load your data and print out a few lines. Perform operations to inspect data types and look for instances of missing or possibly errant data.


In [45]:
merged_df.sample(5)

Unnamed: 0.1,Unnamed: 0,RANK,COUNTRY,PLAYER,CHANGE +/-,WIN - LOSE,PRIZE MONEY,POINTS / TOURNAMENTS
118,118,119,RUS,Elena KOMENDROVSKAJA,0,0 - 0,,"14,540 / 12"
946,946,947,FRA,Emilie BEAUJEAN,-20,0 - 0,,270 / 2
18,18,19,THA,Busanan ONGBUMRUNGPHAN,0,131 - 62,"$52,695.00","42,937 / 13"
462,462,462,RUS,Olga LIPKINA,-3,0 - 0,,"1,810 / 4"
921,921,915,CHN,CHEN Huilin,8,0 - 0,,360 / 1


## Assess

###### perform rudimentary data assessment

In [26]:
merged_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 68625 entries, 0 to 1239
Data columns (total 8 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   Unnamed: 0            68625 non-null  object
 1   RANK                  68625 non-null  object
 2   COUNTRY               68625 non-null  object
 3   PLAYER                68625 non-null  object
 4   CHANGE +/-            68625 non-null  object
 5   WIN - LOSE            68625 non-null  object
 6   PRIZE MONEY           7890 non-null   object
 7   POINTS / TOURNAMENTS  68625 non-null  object
dtypes: object(8)
memory usage: 4.7+ MB


In [27]:
merged_df.describe()

Unnamed: 0.1,Unnamed: 0,RANK,COUNTRY,PLAYER,CHANGE +/-,WIN - LOSE,PRIZE MONEY,POINTS / TOURNAMENTS
count,68625,68625,68625,68625,68625,68625,7890,68625
unique,1240,1119,117,1677,575,2127,545,5097
top,619,733,INA,SUN Yu,0,0 - 0,$0.00,550 / 1
freq,59,667,3209,59,19245,60759,1837,7879


In [28]:
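# Every column is still of object dtype, so restrict to numeric columns explicitly;
# the result stays empty until the columns are converted during cleaning.
merged_df.select_dtypes('number').corr()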

In [29]:
merged_df.sample(5)

Unnamed: 0.1,Unnamed: 0,RANK,COUNTRY,PLAYER,CHANGE +/-,WIN - LOSE,PRIZE MONEY,POINTS / TOURNAMENTS
468,468,466,CHN,CHEN Xiao Jia,-184,0 - 0,,"1,670 / 1"
1100,1100,1081,BUL,Selin BAKALOVA,7,0 - 0,,170 / 1
1163,1163,1143,POL,Agata ISKRA,-5,0 - 0,,100 / 1
303,303,304,CRO,Katarina GALENIC,-3,0 - 0,,"3,470 / 6"
631,631,559,ETH,TADESSE Samrawit,-6,0 - 0,,920 / 1


###### List of issues identified using rudimentary assessment

1. Extra column of index present at first location
2. Rank column is in object datatype, it should be int
3. Change +/- is in object datatype, it should be int
4. WIN - LOSE should be split in 2 different columns with int datatype
5. PRIZE MONEY is in object datatype, it should be numeric (float, because the amounts include cents)
6. POINTS / TOURNAMENTS should be split in 2 different columns with int datatype
7. All column names should be lower case with underscore (_) as separator

### Cleaning the above issues first so that I can continue with the rest of the assessment

###### Define - An extra index column is present at the first position. The solution is simply to drop that column

###### Code

In [251]:
df=merged_df.copy()

In [252]:
df.shape

(68625, 8)

In [253]:
df.columns.values

array(['Unnamed: 0', 'RANK', 'COUNTRY', 'PLAYER', 'CHANGE +/-',
       'WIN - LOSE', 'PRIZE MONEY', 'POINTS / TOURNAMENTS'], dtype=object)

In [254]:
df=df.drop(['Unnamed: 0'], axis=1)

Test

In [255]:
df.shape

(68625, 7)
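
An alternative to dropping the column afterwards is to tell `read_csv` that the first column is the index. The sketch below assumes that `Unnamed: 0` is a row index that was saved along with the data, which the sample output suggests but the source does not confirm. The CSV text is a made-up stand-in for one weekly file.

```python
import io

import pandas as pd

# Tiny stand-in for one weekly file; the leading unnamed column is a saved index.
csv_text = ",RANK,PLAYER\n0,1,LI Xuerui\n1,2,WANG Shixian\n"

with_extra = pd.read_csv(io.StringIO(csv_text))
without_extra = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(with_extra.columns))     # includes 'Unnamed: 0'
print(list(without_extra.columns))  # only the data columns
```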

###### ---------------------------------------------------------------------------------------------

###### Define - Rank column is in object datatype, it should be int. The solution is to change the datatype of the column using astype('int32')

###### Code

In [256]:
df.dtypes

RANK                    object
COUNTRY                 object
PLAYER                  object
CHANGE +/-              object
WIN - LOSE              object
PRIZE MONEY             object
POINTS / TOURNAMENTS    object
dtype: object

In [257]:
df['RANK'].dtype

dtype('O')

In [258]:
df['RANK']=df['RANK'].astype('int32')

Test

In [259]:
df['RANK'].dtype

dtype('int32')
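
The cast above succeeded, so every value in the real column could be read as an integer. If a column did contain stray text, `astype` would raise an error. The following is a minimal defensive sketch using `pd.to_numeric` with `errors='coerce'`; the sample values are invented for illustration.

```python
import pandas as pd

# Made-up values mixing strings, an integer and one unparseable entry.
ranks = pd.Series(['1', '2', 'n/a', 4], name='RANK')

# Unparseable entries become NaN instead of raising an exception.
numeric = pd.to_numeric(ranks, errors='coerce')

# Inspect the offending rows before deciding how to handle them.
bad_rows = ranks[numeric.isna()]
print(bad_rows)

# Here they are simply dropped, then the rest is cast to int32.
clean = numeric.dropna().astype('int32')
print(clean)
```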

###### ---------------------------------------------------------------------------------------------

###### Define - Change +/- column is in object datatype, it should be int. The solution is to change the datatype of the column using astype('int32')

###### Code

In [260]:
df.dtypes

RANK                     int32
COUNTRY                 object
PLAYER                  object
CHANGE +/-              object
WIN - LOSE              object
PRIZE MONEY             object
POINTS / TOURNAMENTS    object
dtype: object

In [261]:
df.columns.values

array(['RANK', 'COUNTRY', 'PLAYER', 'CHANGE +/-', 'WIN - LOSE',
       'PRIZE MONEY', 'POINTS / TOURNAMENTS'], dtype=object)

In [262]:
df['CHANGE +/-'].dtype

dtype('O')

In [263]:
df['CHANGE +/-']=df['CHANGE +/-'].astype('int32')

###### Test

In [264]:
df['CHANGE +/-'].dtype

dtype('int32')

###### ---------------------------------------------------------------------------------------------

###### Define - WIN - LOSE column should be split in 2 different columns with int data type

###### Code

In [265]:
df.columns.values

array(['RANK', 'COUNTRY', 'PLAYER', 'CHANGE +/-', 'WIN - LOSE',
       'PRIZE MONEY', 'POINTS / TOURNAMENTS'], dtype=object)

In [266]:
df['WIN - LOSE'].sample(7)

188     18 - 27
177       0 - 0
1069      0 - 0
606       0 - 0
285       0 - 0
109       0 - 0
463       0 - 0
Name: WIN - LOSE, dtype: object

In [267]:
df.sample(7)

Unnamed: 0,RANK,COUNTRY,PLAYER,CHANGE +/-,WIN - LOSE,PRIZE MONEY,POINTS / TOURNAMENTS
698,631,ETH,AFEWORK Rakeb,3,0 - 0,,920 / 1
259,260,UKR,Maryna ILYINSKAYA,2,0 - 0,,"4,750 / 7"
1077,1077,TUR,Basak KILIC,0,0 - 0,,100 / 1
167,168,IND,Lalita DAHIYA,18,0 - 0,,"9,120 / 11"
296,297,BAR,Sabrina SCOTT,6,0 - 0,,"3,720 / 4"
814,701,MRI,Sheem SANDOOYEEA,5,0 - 0,,550 / 1
226,227,INA,Intan Dwi JAYANTI,0,0 - 0,,"5,750 / 4"


In [268]:
df.head(7)

Unnamed: 0,RANK,COUNTRY,PLAYER,CHANGE +/-,WIN - LOSE,PRIZE MONEY,POINTS / TOURNAMENTS
0,1,CHN,LI Xuerui,0,221 - 46,"$626,110.00","95,244 / 10"
1,2,CHN,WANG Shixian,0,220 - 64,"$658,775.00","82,677 / 15"
2,3,CHN,WANG Yihan,0,307 - 71,"$738,865.50","75,611 / 15"
3,4,IND,Saina NEHWAL,0,278 - 124,"$570,610.00","71,081 / 15"
4,5,KOR,SUNG Ji Hyun,0,191 - 101,"$298,365.00","70,124 / 17"
5,6,THA,Ratchanok INTANON,0,194 - 93,"$295,955.00","65,142 / 15"
6,7,TPE,TAI Tzu Ying,0,163 - 97,"$226,905.00","64,608 / 17"


In [269]:
len(df['WIN - LOSE'].str.split('-'))

68625

In [270]:
type(df['WIN - LOSE'].str.split('-'))

pandas.core.series.Series

In [271]:
df_test=pd.DataFrame()

In [272]:
df_test=df['WIN - LOSE'].str.split('-')

In [273]:
df_test.shape

(68625,)

In [274]:
df_test[0:1][0]

['221 ', ' 46']

In [275]:
df_test[0:1][0][0]

'221 '

In [276]:
df_test[0:1][0][1]

' 46'

In [277]:
df['win']=df_test.map(lambda x:x[0])

In [278]:
df['loss']=df_test.map(lambda x:x[1])

In [279]:
df['win']=df['win'].astype('int32')

In [280]:
df['loss']=df['loss'].astype('int32')

###### Test

In [281]:
df.dtypes

RANK                     int32
COUNTRY                 object
PLAYER                  object
CHANGE +/-               int32
WIN - LOSE              object
PRIZE MONEY             object
POINTS / TOURNAMENTS    object
win                      int32
loss                     int32
dtype: object

In [284]:
df.sample(7)

Unnamed: 0,RANK,COUNTRY,PLAYER,CHANGE +/-,WIN - LOSE,PRIZE MONEY,POINTS / TOURNAMENTS,win,loss
672,673,GRE,Ioanna KARAPETRIDOU,-9,0 - 0,,880 / 4,0,0
946,931,MRI,Shania LEUNG,-10,0 - 0,,210 / 1,0,0
943,933,GRE,Angeliki GEORGIADOU,-2,0 - 0,,210 / 1,0,0
821,723,CZE,Marcela RUZICKA,0,0 - 0,,550 / 1,0,0
992,991,EST,Mari Ann KARJUS,29,0 - 0,,120 / 2,0,0
568,557,NZL,Maysell DOWNEY,0,0 - 0,,"1,290 / 1",0,0
542,542,NZL,Christine ZHANG,7,0 - 0,,"1,240 / 2",0,0
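
The split-then-map steps above can also be written as a single regular-expression extraction. This is a sketch of an alternative, not the approach used in the notebook, and it assumes every value has the form `<wins> - <losses>` as in the samples shown.

```python
import pandas as pd

win_lose = pd.Series(['221 - 46', '0 - 0', '18 - 27'], name='WIN - LOSE')

# Two capture groups: digits before the hyphen and digits after it,
# with any surrounding whitespace ignored.
parts = win_lose.str.extract(r'^\s*(\d+)\s*-\s*(\d+)\s*$')
parts.columns = ['win', 'loss']
parts = parts.astype('int32')
print(parts)
```

A value that does not match the pattern would produce NaN in both columns, which makes the final cast fail loudly instead of silently storing a wrong number.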


###### ---------------------------------------------------------------------------------------------

###### Define - PRIZE MONEY column should be renamed to prize_money_usd, the dollar sign and thousands separators removed, and the datatype changed to float

###### Code

In [285]:
df.sample(7)

Unnamed: 0,RANK,COUNTRY,PLAYER,CHANGE +/-,WIN - LOSE,PRIZE MONEY,POINTS / TOURNAMENTS,win,loss
936,935,NZL,Rebecca YE,-6,0 - 0,,350 / 1,0,0
754,713,KAZ,Ademi SERIKBAYEVA,-10,0 - 0,,550 / 1,0,0
820,723,TUR,Özge OZER,6,0 - 0,,550 / 1,0,0
942,926,IRI,Yeganeh KERMANI,0,0 - 0,,360 / 1,0,0
1148,1143,AUT,Daniela ADAMEC,0,0 - 0,,70 / 1,0,0
1152,1151,SVK,Lenka DROTAROVA,7,0 - 0,,60 / 1,0,0
1018,998,UGA,Ritah NAKIMERA,-17,0 - 0,,210 / 1,0,0


In [286]:
df=df.rename(columns={'PRIZE MONEY':'prize_money_usd'})

In [287]:
df.columns.values

array(['RANK', 'COUNTRY', 'PLAYER', 'CHANGE +/-', 'WIN - LOSE',
       'prize_money_usd', 'POINTS / TOURNAMENTS', 'win', 'loss'],
      dtype=object)

In [288]:
df.sample(7)

Unnamed: 0,RANK,COUNTRY,PLAYER,CHANGE +/-,WIN - LOSE,prize_money_usd,POINTS / TOURNAMENTS,win,loss
450,449,RUS,Alina DAVLETOVA,1,0 - 0,,"1,840 / 2",0,0
845,716,MAR,Ghita CHAROUITE,-5,0 - 0,,550 / 1,0,0
1008,1004,KOR,Song KIM,-5,0 - 0,,320 / 1,0,0
68,69,HUN,Laura SAROSI,-2,108 - 72,"$3,317.50","22,900 / 24",108,72
1042,1018,MRI,Stephanie KOO TZE MEW,2,0 - 0,,100 / 1,0,0
731,706,PAK,Mahoor SHAHZAD,9,0 - 0,,550 / 1,0,0
698,697,KOR,Min Ji LEE,6,0 - 0,,660 / 1,0,0


In [289]:
df.head()

Unnamed: 0,RANK,COUNTRY,PLAYER,CHANGE +/-,WIN - LOSE,prize_money_usd,POINTS / TOURNAMENTS,win,loss
0,1,CHN,LI Xuerui,0,221 - 46,"$626,110.00","95,244 / 10",221,46
1,2,CHN,WANG Shixian,0,220 - 64,"$658,775.00","82,677 / 15",220,64
2,3,CHN,WANG Yihan,0,307 - 71,"$738,865.50","75,611 / 15",307,71
3,4,IND,Saina NEHWAL,0,278 - 124,"$570,610.00","71,081 / 15",278,124
4,5,KOR,SUNG Ji Hyun,0,191 - 101,"$298,365.00","70,124 / 17",191,101


Strip the dollar sign from the values in prize_money_usd, then remove the thousands separators and convert to float

In [290]:
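# regex=False treats '$' as a literal character rather than the end-of-string anchor.
df['prize_money_usd']=df['prize_money_usd'].str.replace('$','',regex=False)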

In [293]:
df.prize_money_usd=df.prize_money_usd.str.replace(',','')

In [294]:
df['prize_money_usd']=df['prize_money_usd'].astype('float')

In [295]:
df.sample(7)

Unnamed: 0,RANK,COUNTRY,PLAYER,CHANGE +/-,WIN - LOSE,prize_money_usd,POINTS / TOURNAMENTS,win,loss
330,331,NGR,Fatima AZEEZ,0,0 - 0,,"3,270 / 2",0,0
24,25,KOR,KIM Hyo Min,0,95 - 46,12325.5,"36,560 / 16",95,46
741,710,BOT,Maranyane TUELO,-4,0 - 0,,550 / 1,0,0
201,202,ARG,Florencia BERNATENE,0,0 - 0,,"7,092 / 10",0,0
57,58,INA,Hana RAMADHINI,-1,68 - 36,7025.5,"25,220 / 11",68,36
200,201,UGA,Bridget Shamim BANGI,0,55 - 42,235.0,"6,956 / 8",55,42
41,42,IRL,Chloe MAGEE,2,154 - 129,15166.75,"28,392 / 17",154,129
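
The three cleaning steps for prize money can be bundled into one small function. `parse_usd` below is a hypothetical helper written for illustration, and the sample values are taken from the outputs above plus one missing entry.

```python
import pandas as pd


def parse_usd(series):
    """Convert strings such as '$626,110.00' to floats; missing values stay NaN."""
    cleaned = (series.str.replace('$', '', regex=False)
                     .str.replace(',', '', regex=False))
    return pd.to_numeric(cleaned)


prize = pd.Series(['$626,110.00', None, '$3,317.50', '$0.00'])
prize_usd = parse_usd(prize)
print(prize_usd)
```

Missing prize money stays NaN rather than becoming 0, which preserves the distinction visible in the data between an empty value and an explicit `$0.00`.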


###### ---------------------------------------------------------------------------------------------

###### Define - POINTS / TOURNAMENTS column should be split in 2 columns named points & tournaments with int datatype

###### Code

In [296]:
df_test=df['POINTS / TOURNAMENTS'].str.split('/')

In [297]:
df_test

0       [95,244 ,  10]
1       [82,677 ,  15]
2       [75,611 ,  15]
3       [71,081 ,  15]
4       [70,124 ,  17]
             ...      
1235         [60 ,  1]
1236         [60 ,  1]
1237         [60 ,  1]
1238         [60 ,  1]
1239         [60 ,  1]
Name: POINTS / TOURNAMENTS, Length: 68625, dtype: object

In [298]:
df['points']=df_test.map(lambda x:x[0])

In [299]:
df['tournaments']=df_test.map(lambda x:x[1])

In [301]:
df['tournaments']=df['tournaments'].astype('int')

In [303]:
df.points=df.points.str.replace(',','')

In [304]:
df['points']=df['points'].astype('int')

######  Test

In [315]:
df.sample(7)

Unnamed: 0,RANK,COUNTRY,PLAYER,CHANGE +/-,WIN - LOSE,prize_money_usd,POINTS / TOURNAMENTS,win,loss,points,tournaments
553,551,NZL,Courtney TRILLO,0,0 - 0,,"1,290 / 1",0,0,1290,1
125,126,TUR,Cemre FERE,2,0 - 0,,"13,120 / 15",0,0,13120,15
433,427,KEN,Lavina MARTINS,1,0 - 0,,"1,840 / 2",0,0,1840,2
764,765,SWE,Berfin ASLAN,0,0 - 0,,590 / 4,0,0,590,4
210,211,THA,Thamolwan POOPRADUBSIL,0,0 - 0,,"6,050 / 4",0,0,6050,4
24,25,HKG,CHEUNG Ngan Yi,0,72 - 59,16765.0,"38,808 / 18",72,59,38808,18
1113,1107,RUS,Daria DZHEDZHLA,1,0 - 0,,60 / 1,0,0,60,1


###### ---------------------------------------------------------------------------------------------

###### Define - Convert all relevant column names to lower case, with underscore (_) as the word separator.

###### Code

In [316]:
df.columns.values

array(['RANK', 'COUNTRY', 'PLAYER', 'CHANGE +/-', 'WIN - LOSE',
       'prize_money_usd', 'POINTS / TOURNAMENTS', 'win', 'loss', 'points',
       'tournaments'], dtype=object)

In [317]:
df=df.rename(columns={'RANK':'rank', 'COUNTRY':'country', 'PLAYER':'player', 'CHANGE +/-':'change_+_-'})

###### Test

In [318]:
df.columns.values

array(['rank', 'country', 'player', 'change_+_-', 'WIN - LOSE',
       'prize_money_usd', 'POINTS / TOURNAMENTS', 'win', 'loss', 'points',
       'tournaments'], dtype=object)
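
Instead of listing each new name by hand, the renaming rule can be expressed as a function. The sketch below is one possible rule and not the one used above: it drops punctuation entirely, so `CHANGE +/-` becomes `change` rather than `change_+_-`. `to_snake_case` is a hypothetical helper.

```python
import re


def to_snake_case(name):
    """Lower-case a column name and collapse every non-alphanumeric run to one underscore."""
    lowered = name.strip().lower()
    return re.sub(r'[^0-9a-z]+', '_', lowered).strip('_')


columns = ['RANK', 'CHANGE +/-', 'WIN - LOSE', 'PRIZE MONEY', 'POINTS / TOURNAMENTS']
renamed = [to_snake_case(c) for c in columns]
print(renamed)
```

With a dataframe, the same function could be passed directly as `df.rename(columns=to_snake_case)`, since `rename` accepts a callable.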

###### ---------------------------------------------------------------------------------------------

Run value_counts on the columns identified in the earlier step to get a sense of outliers

In [None]:
df_name.column_name.value_counts()

Plot histograms of the dataframe to see the general distribution of each feature

In [None]:
df_name.hist()

Plot a scatter matrix of the dataframe to spot correlations between variables. This gives a first sense of which features could be useful for further analysis.

In [None]:
pd.plotting.scatter_matrix(df_name)

###### List of issues identified using visual assessment

- Issue 1
- Issue 2
- Issue 3
- Issue 4

NaN value detection

The following code fragments can be run to identify NaN (null) values in the dataframe

In [None]:
df_name.isnull()

The following command tells us which columns have at least 1 NaN value in them

In [None]:
df_name.isnull().any(axis=0)

The following command tells us which rows have at least 1 NaN value in them

In [None]:
df_name.isnull().any(axis=1)
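
The boolean masks above say where NaN values occur, but a count per column is often easier to read. A minimal sketch on a made-up dataframe (the column names mirror the cleaned data, the values are invented):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'rank': [1, 2, 3],
    'prize_money_usd': [626110.0, np.nan, np.nan],
})

# Number of NaN values in each column.
nan_per_column = toy.isnull().sum()

# Share of NaN values in each column (mean of a boolean mask).
nan_share = toy.isnull().mean()

print(nan_per_column)
print(nan_share)
```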

Check for duplicate rows

In [None]:
sum(df_name.duplicated())
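
The count above only says how many repeated rows exist. To look at them, `duplicated(keep=False)` marks every copy, including the first one. A small sketch with invented data:

```python
import pandas as pd

toy = pd.DataFrame({
    'player': ['A', 'A', 'B', 'A'],
    'rank': [1, 1, 2, 3],
})

# Rows that repeat an earlier row exactly (the first occurrence is not counted).
n_dupes = int(toy.duplicated().sum())

# All copies of any row that occurs more than once.
all_copies = toy[toy.duplicated(keep=False)]

print(n_dupes)
print(all_copies)
```

Because the merged dataframe stacks weekly files, a player whose row did not change from one week to the next would show up as a duplicate, so duplicates here are not necessarily errors.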

Check for and make note of incorrect datatypes. Prime examples to look for are date columns stored as strings and units embedded in numeric value columns.

In [None]:
df_name.info()

###### List of issues identified using programmatic assessment

- Issue 1
- Issue 2
- Issue 3
- Issue 4

> **Tip**: You should _not_ perform too many operations in each cell. Create cells freely to explore your data. One option that you can take with this project is to do a lot of explorations in an initial notebook. These don't have to be organized, but make sure you use enough comments to understand the purpose of each code cell. Then, after you're done with your analysis, create a duplicate notebook where you will trim the excess and organize your steps so that you have a flowing, cohesive report.

> **Tip**: Make sure that you keep your reader informed on the steps that you are taking in your investigation. Follow every code cell, or every set of related code cells, with a markdown cell to describe to the reader what was found in the preceding cell(s). Try to make it so that the reader can then understand what they will be seeing in the following cell(s).

### Data Cleaning (Replace this with more specific notes!)

In [None]:
# After discussing the structure of the data and any problems that need to be
#   cleaned, perform those cleaning steps in the second part of this section.


<a id='eda'></a>
## Exploratory Data Analysis

> **Tip**: Now that you've trimmed and cleaned your data, you're ready to move on to exploration. Compute statistics and create visualizations with the goal of addressing the research questions that you posed in the Introduction section. It is recommended that you be systematic with your approach. Look at one variable at a time, and then follow it up by looking at relationships between variables.

#### Research Question 1 (Who is the busiest female singles player, i.e. who played the most tournaments in each year?)

In [None]:
# Use this, and more code cells, to explore your data. Don't forget to add
#   Markdown cells to document your observations and findings.
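
One way this question could be approached is a `groupby` followed by `idxmax`. The sketch below runs on made-up data and assumes a `year` column exists, which the cleaned dataframe does not have yet. It would have to be derived first, for example from the file names.

```python
import pandas as pd

# Invented rows; 'year' is an assumed column that the notebook has not created.
toy = pd.DataFrame({
    'year': [2015, 2015, 2016, 2016],
    'player': ['P1', 'P2', 'P1', 'P3'],
    'tournaments': [15, 24, 17, 12],
})

# idxmax returns index labels, so the index must be unique before using .loc.
toy = toy.reset_index(drop=True)

busiest = toy.loc[
    toy.groupby('year')['tournaments'].idxmax(),
    ['year', 'player', 'tournaments'],
]
print(busiest)
```

The `reset_index` step matters for the real data: the merged dataframe keeps each file's own 0-based index, so the same label appears many times.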


#### Research Question 2 - List of female singles players who made the most prize money in each year

In [None]:
# Continue to explore the data to address your additional research
#   questions. Add more headers as needed if you have more questions to
#   investigate.


#### Research Question 3 - List of female singles players who had the largest change in their ranking in each year. The output should contain the respective change in ranking as well.

In [None]:
# Continue to explore the data to address your additional research
#   questions. Add more headers as needed if you have more questions to
#   investigate.


#### Research Question 4 - List of female singles players who had the largest change in their ranking in each year. The output should contain the respective change in ranking as well.

In [None]:
# Continue to explore the data to address your additional research
#   questions. Add more headers as needed if you have more questions to
#   investigate.


#### Research Question 5 - Which female singles player is the "kick addict queen", i.e. has the smallest difference between wins and losses?

In [None]:
# Continue to explore the data to address your additional research
#   questions. Add more headers as needed if you have more questions to
#   investigate.


#### Research Question 6 - Which female singles player is the "kick dominator queen", i.e. has the largest difference between wins and losses?

In [None]:
# Continue to explore the data to address your additional research
#   questions. Add more headers as needed if you have more questions to
#   investigate.
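
Research Questions 5 and 6 both rest on the difference between the `win` and `loss` columns. The sketch below uses invented numbers and makes one assumption that is not stated in the questions: players with a `0 - 0` record are excluded, because otherwise they would trivially have the smallest difference.

```python
import pandas as pd

toy = pd.DataFrame({
    'player': ['P1', 'P2', 'P3', 'P4'],
    'win': [221, 154, 55, 0],
    'loss': [46, 129, 42, 0],
})

# Assumption: only consider players who have at least one recorded match.
played = toy[(toy['win'] + toy['loss']) > 0].copy()
played['win_loss_diff'] = played['win'] - played['loss']

# Question 5: smallest absolute difference between wins and losses.
closest = played.loc[played['win_loss_diff'].abs().idxmin(), 'player']

# Question 6: largest difference in favour of wins.
dominant = played.loc[played['win_loss_diff'].idxmax(), 'player']

print(closest, dominant)
```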


<a id='conclusions'></a>
## Conclusions

> **Tip**: Finally, summarize your findings and the results that have been performed. Make sure that you are clear with regards to the limitations of your exploration. If you haven't done any statistical tests, do not imply any statistical conclusions. And make sure you avoid implying causation from correlation!

> **Tip**: Once you are satisfied with your work here, check over your report to make sure that it satisfies all the areas of the rubric (found on the project submission page at the end of the lesson). You should also probably remove all of the "Tips" like this one so that the presentation is as polished as possible.

## Submitting your Project 

> Before you submit your project, you need to create a .html or .pdf version of this notebook in the workspace here. To do that, run the code cell below. If it worked correctly, you should get a return code of 0, and you should see the generated .html file in the workspace directory (click on the orange Jupyter icon in the upper left).

> Alternatively, you can download this report as .html via the **File** > **Download as** submenu, and then manually upload it into the workspace directory by clicking on the orange Jupyter icon in the upper left, then using the Upload button.

> Once you've done this, you can submit your project by clicking on the "Submit Project" button in the lower right here. This will create and submit a zip file with this .ipynb doc and the .html or .pdf version you created. Congratulations!

In [None]:
from subprocess import call
call(['python', '-m', 'nbconvert', 'Investigate_a_Dataset.ipynb'])