## Pandas Read CSV Tutorial
Here are the read_csv examples for the YouTube video (link [here](https://youtu.be/piCU_gxSF7I)).

In [1]:
import pandas as pd

url_csv = 'http://vincentarelbundock.github.io/Rdatasets/csv/carData/MplsStops.csv'

df = pd.read_csv(url_csv)

df.head()

,Unnamed: 0,idNum,date,problem,MDC,citationIssued,personSearch,vehicleSearch,preRace,race,gender,lat,long,policePrecinct,neighborhood
0,6823,17-000003,2017-01-01 00:00:42,suspicious,MDC,,NO,NO,Unknown,Unknown,Unknown,44.966617,-93.246458,1,Cedar Riverside
1,6824,17-000007,2017-01-01 00:03:07,suspicious,MDC,,NO,NO,Unknown,Unknown,Male,44.98045,-93.27134,1,Downtown West
2,6825,17-000073,2017-01-01 00:23:15,traffic,MDC,,NO,NO,Unknown,White,Female,44.94835,-93.27538,5,Whittier
3,6826,17-000092,2017-01-01 00:33:48,suspicious,MDC,,NO,NO,Unknown,East African,Male,44.94836,-93.28135,5,Whittier
4,6827,17-000098,2017-01-01 00:37:58,traffic,MDC,,NO,NO,Unknown,White,Female,44.979078,-93.262076,1,Downtown West


## Pandas Read CSV Setting Index Column


In [2]:
import pandas as pd

url_csv = 'http://vincentarelbundock.github.io/Rdatasets/csv/carData/MplsStops.csv'

df = pd.read_csv(url_csv, index_col=0)

df.head()

Unnamed: 0,idNum,date,problem,MDC,citationIssued,personSearch,vehicleSearch,preRace,race,gender,lat,long,policePrecinct,neighborhood
6823,17-000003,2017-01-01 00:00:42,suspicious,MDC,,NO,NO,Unknown,Unknown,Unknown,44.966617,-93.246458,1,Cedar Riverside
6824,17-000007,2017-01-01 00:03:07,suspicious,MDC,,NO,NO,Unknown,Unknown,Male,44.98045,-93.27134,1,Downtown West
6825,17-000073,2017-01-01 00:23:15,traffic,MDC,,NO,NO,Unknown,White,Female,44.94835,-93.27538,5,Whittier
6826,17-000092,2017-01-01 00:33:48,suspicious,MDC,,NO,NO,Unknown,East African,Male,44.94836,-93.28135,5,Whittier
6827,17-000098,2017-01-01 00:37:58,traffic,MDC,,NO,NO,Unknown,White,Female,44.979078,-93.262076,1,Downtown West
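The effect of index_col can also be checked offline. The following is a minimal sketch, assuming a small inline sample (`sample_csv` is a hypothetical string built from a few of the rows above, not the full dataset) that is read through `io.StringIO` instead of the URL:

```python
import io

import pandas as pd

# Hypothetical inline sample: the first header field is empty, as in the
# rows shown above, so pandas names that column "Unnamed: 0" by default.
sample_csv = (
    ",idNum,problem\n"
    "6823,17-000003,suspicious\n"
    "6824,17-000007,suspicious\n"
    "6825,17-000073,traffic\n"
)

# Without index_col, the first column stays a regular data column.
default_df = pd.read_csv(io.StringIO(sample_csv))

# With index_col=0, the first column becomes the row index instead.
indexed_df = pd.read_csv(io.StringIO(sample_csv), index_col=0)

print(default_df.columns.tolist())
print(indexed_df.columns.tolist())
print(indexed_df.index.tolist())
```

Comparing the two column lists shows that the auto-named first column is no longer part of the data once it is used as the index.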


## Pandas Reading Certain Columns
First we use the Pandas usecols parameter with a list of integers (column positions):

In [7]:
import pandas as pd

url_csv = 'http://vincentarelbundock.github.io/Rdatasets/csv/carData/MplsStops.csv'

cols = [0, 1, 2, 3, 4]
df = pd.read_csv(url_csv, index_col=0, usecols=cols)

df.head()

Unnamed: 0,idNum,date,problem,MDC
6823,17-000003,2017-01-01 00:00:42,suspicious,MDC
6824,17-000007,2017-01-01 00:03:07,suspicious,MDC
6825,17-000073,2017-01-01 00:23:15,traffic,MDC
6826,17-000092,2017-01-01 00:33:48,suspicious,MDC
6827,17-000098,2017-01-01 00:37:58,traffic,MDC


Then we use the Pandas usecols parameter with a list of strings (column names):

In [10]:
import pandas as pd

url_csv = 'http://vincentarelbundock.github.io/Rdatasets/csv/carData/MplsStops.csv'

cols = ['idNum', 'date', 'problem', 'MDC', 'personSearch']

df = pd.read_csv(url_csv, index_col='idNum', usecols=cols)

df.head()

idNum,date,problem,MDC,personSearch
17-000003,2017-01-01 00:00:42,suspicious,MDC,NO
17-000007,2017-01-01 00:03:07,suspicious,MDC,NO
17-000073,2017-01-01 00:23:15,traffic,MDC,NO
17-000092,2017-01-01 00:33:48,suspicious,MDC,NO
17-000098,2017-01-01 00:37:58,traffic,MDC,NO


In [11]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 51920 entries, 17-000003 to 17-491482
Data columns (total 4 columns):
date            51920 non-null object
problem         51920 non-null object
MDC             51920 non-null object
personSearch    43699 non-null object
dtypes: object(4)
memory usage: 2.0+ MB
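Selecting columns by position and by name should lead to the same DataFrame when both lists point at the same columns. A minimal sketch of that comparison, assuming a hypothetical inline sample (`sample_csv`) in place of the URL:

```python
import io

import pandas as pd

# Hypothetical inline sample with the same leading columns as the dataset.
sample_csv = (
    ",idNum,date,problem,MDC,citationIssued,personSearch\n"
    "6823,17-000003,2017-01-01 00:00:42,suspicious,MDC,,NO\n"
    "6824,17-000007,2017-01-01 00:03:07,suspicious,MDC,,NO\n"
    "6825,17-000073,2017-01-01 00:23:15,traffic,MDC,,NO\n"
)

# Select columns 1-3 by position, using idNum as the index.
by_position = pd.read_csv(
    io.StringIO(sample_csv), usecols=[1, 2, 3], index_col="idNum"
)

# Select the same columns by name.
by_name = pd.read_csv(
    io.StringIO(sample_csv),
    usecols=["idNum", "date", "problem"],
    index_col="idNum",
)

print(by_position.equals(by_name))
print(by_name.columns.tolist())
```

Names are usually the more robust choice, since integer positions silently select different columns if the file layout changes.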


## Remove Unnamed Columns from Pandas Dataframe

In [14]:
csv_url = 'http://vincentarelbundock.github.io/Rdatasets/csv/carData/MplsStops.csv'
cols = pd.read_csv(csv_url, nrows=1).columns # Read only the first row using nrows and take the column names
cols

Index(['Unnamed: 0', 'idNum', 'date', 'problem', 'MDC', 'citationIssued',
       'personSearch', 'vehicleSearch', 'preRace', 'race', 'gender', 'lat',
       'long', 'policePrecinct', 'neighborhood'],
      dtype='object')

In [15]:
df = pd.read_csv(csv_url, usecols=cols[1:]) # Skip the first (unnamed) column and keep the rest
df.iloc[:, 0:6].head()

,idNum,date,problem,MDC,citationIssued,personSearch
0,17-000003,2017-01-01 00:00:42,suspicious,MDC,,NO
1,17-000007,2017-01-01 00:03:07,suspicious,MDC,,NO
2,17-000073,2017-01-01 00:23:15,traffic,MDC,,NO
3,17-000092,2017-01-01 00:33:48,suspicious,MDC,,NO
4,17-000098,2017-01-01 00:37:58,traffic,MDC,,NO


In [16]:
df.drop(df.columns[df.columns.str.contains('unnamed', case=False)],
          axis=1, inplace=True)

# The following line will give us the same result as the line above
# df = df.loc[:, ~df.columns.str.contains('unnamed', case=False)]

df.iloc[:, 0:7].head()

,idNum,date,problem,MDC,citationIssued,personSearch,vehicleSearch
0,17-000003,2017-01-01 00:00:42,suspicious,MDC,,NO,NO
1,17-000007,2017-01-01 00:03:07,suspicious,MDC,,NO,NO
2,17-000073,2017-01-01 00:23:15,traffic,MDC,,NO,NO
3,17-000092,2017-01-01 00:33:48,suspicious,MDC,,NO,NO
4,17-000098,2017-01-01 00:37:58,traffic,MDC,,NO,NO
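The two ways of dropping unnamed columns shown above (drop and the boolean mask with loc) can be compared side by side. This is a minimal sketch on a hypothetical inline sample (`sample_csv`), not on the full dataset:

```python
import io

import pandas as pd

# Hypothetical inline sample; the empty first header field is what
# pandas turns into the "Unnamed: 0" column.
sample_csv = (
    ",idNum,problem\n"
    "6823,17-000003,suspicious\n"
    "6824,17-000007,traffic\n"
)

df_raw = pd.read_csv(io.StringIO(sample_csv))

# Boolean mask: True for every column whose name contains "unnamed".
is_unnamed = df_raw.columns.str.contains("unnamed", case=False)

# Variant 1: drop the matching columns (returns a new DataFrame).
dropped = df_raw.drop(columns=df_raw.columns[is_unnamed])

# Variant 2: keep only the columns that do not match.
masked = df_raw.loc[:, ~is_unnamed]

print(dropped.columns.tolist())
print(dropped.equals(masked))
```

Both variants here return a new DataFrame rather than modifying `df_raw` in place, which differs from the inplace=True call in the cell above.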


## Skipping rows using Pandas read_csv skiprows and header
First we skip rows using the skiprows parameter. Then we use the header parameter to tell the parser which row holds the column names. For this file both approaches give the same result, because the header row comes directly after the three rows we skip.

In [18]:
import pandas as pd

csv_file = 'Simdata/skiprow.csv'

df = pd.read_csv(csv_file, index_col=0, skiprows=3)

df.head()

Unnamed: 0,ID,Name,Day,Age,Response,Gender
0,1,John,Sixth,23,0.5627187141917855,0
1,2,Billie,Sixth,22,Not Available,0
2,3,Robert,Sixth,20,Not Available,0
3,4,Don,Sixth,27,0.5229607643652396,0
4,5,Joseph,Sixth,21,Not Available,0


In [None]:
import pandas as pd

csv_file = 'Simdata/skiprow.csv'

df = pd.read_csv(csv_file, index_col=0, header=3)

df.head()
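Since the file Simdata/skiprow.csv is not included here, the equivalence of skiprows=3 and header=3 can be sketched on an inline sample. The contents of `sample_csv` below are an assumption (three non-empty lines of notes before the header row); the real file may look different:

```python
import io

import pandas as pd

# Hypothetical file contents: three lines of notes, then the header row.
sample_csv = (
    "Simulated data file\n"
    "Created for the tutorial\n"
    "Header starts on the fourth line\n"
    ",ID,Name,Age\n"
    "0,1,John,23\n"
    "1,2,Billie,22\n"
)

# Skip the first three lines; the next line is then used as the header.
with_skiprows = pd.read_csv(io.StringIO(sample_csv), index_col=0, skiprows=3)

# Point the parser at line index 3 (zero-based) as the header row.
with_header = pd.read_csv(io.StringIO(sample_csv), index_col=0, header=3)

print(with_skiprows.columns.tolist())
print(with_skiprows.equals(with_header))
```

Note that header counts lines after blank lines have been dropped (with the default skip_blank_lines=True), so the two parameters can diverge when the leading lines include empty ones.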

## Missing Values Coded Differently than 'Usual'
Now we are going to handle missing values that are coded in a way the parser does not recognize by default. For instance, if they are coded as "Not Available", we have to tell Pandas this with the na_values parameter.

In [19]:
import pandas as pd

csv_file = 'Simdata/skiprow.csv'

df = pd.read_csv(csv_file, index_col=0, header=3, na_values='Not Available')

df.head()

Unnamed: 0,ID,Name,Day,Age,Response,Gender
0,1,John,Sixth,23,0.562719,0
1,2,Billie,Sixth,22,,0
2,3,Robert,Sixth,20,,0
3,4,Don,Sixth,27,0.522961,0
4,5,Joseph,Sixth,21,,0
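The effect of na_values on the column type can be checked on a small example. A minimal sketch, assuming a hypothetical inline sample (`sample_csv`) with shortened values instead of the tutorial file:

```python
import io

import pandas as pd

# Hypothetical inline sample with "Not Available" as the missing-value code.
sample_csv = (
    "ID,Name,Response\n"
    "1,John,0.56\n"
    "2,Billie,Not Available\n"
    "3,Robert,Not Available\n"
)

# Default parsing keeps "Not Available" as text, so the column is not numeric.
raw = pd.read_csv(io.StringIO(sample_csv))

# With na_values, the marker is parsed as NaN and the column becomes float.
clean = pd.read_csv(io.StringIO(sample_csv), na_values="Not Available")

print(raw["Response"].isna().sum())
print(clean["Response"].isna().sum())
print(clean["Response"].dtype)
```

na_values also accepts a list of markers, which helps when a file mixes several codes for missing data.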
