# Read and write CSV files with pandas DataFrames

You can load data from a CSV file directly into a pandas DataFrame, and write a DataFrame back out to a CSV file.

In [1]:
import pandas as pd

## Reading a CSV file into a pandas DataFrame
**read_csv** reads the contents of a CSV file into a DataFrame.

airports.csv contains the following:  

Name,City,Country  
Seattle-Tacoma,Seattle,USA  
Dulles,Washington,USA  
Heathrow,London,United Kingdom  
Schiphol,Amsterdam,Netherlands  
Changi,Singapore,Singapore  
Pearson,Toronto,Canada  
Narita,Tokyo,Japan

In [2]:
airports_df = pd.read_csv('Data/airports.csv')
airports_df

Unnamed: 0,Name,City,Country
0,Seattle-Tacoma,Seattle,USA
1,Dulles,Washington,USA
2,Heathrow,London,United Kingdom
3,Schiphol,Amsterdam,Netherlands
4,Changi,Singapore,Singapore
5,Pearson,Toronto,Canada
6,Narita,Tokyo,Japan
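To try **read_csv** without having `Data/airports.csv` on disk, the same text can be read from an in-memory buffer. This is only a sketch: the `csv_text` variable and the use of `io.StringIO` are assumptions made for illustration and are not part of the original notebook.

```python
import io

import pandas as pd

# Same content as airports.csv, held in memory (illustrative stand-in for the file)
csv_text = """Name,City,Country
Seattle-Tacoma,Seattle,USA
Dulles,Washington,USA
Heathrow,London,United Kingdom
Schiphol,Amsterdam,Netherlands
Changi,Singapore,Singapore
Pearson,Toronto,Canada
Narita,Tokyo,Japan
"""

# read_csv accepts a file path or any file-like object
airports_df = pd.read_csv(io.StringIO(csv_text))

print(airports_df.shape)          # (rows, columns)
print(list(airports_df.columns))  # column names taken from the first row
```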


## Handling rows with errors
By default, a row with an extra comma (or any other mismatch in the number of fields) causes **read_csv** to raise an error.

Note the extra comma in the Heathrow row of airportsInvalidRows.csv:  

Name,City,Country  
Seattle-Tacoma,Seattle,USA  
Dulles,Washington,USA  
Heathrow,London,,United Kingdom  
Schiphol,Amsterdam,Netherlands  
Changi,Singapore,Singapore  
Pearson,Toronto,Canada  
Narita,Tokyo,Japan  

In [3]:
airports_df = pd.read_csv('Data/airportsInvalidRows.csv')
airports_df

ParserError: Error tokenizing data. C error: Expected 3 fields in line 4, saw 4
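If you would rather report the problem than let the notebook stop, one option is to catch the exception. The sketch below assumes the malformed text is held in a hypothetical `bad_csv` string instead of the file, and uses `pd.errors.ParserError`, the exception class named in the message above.

```python
import io

import pandas as pd

# Illustrative copy of the malformed data: the Heathrow row has an extra comma
bad_csv = """Name,City,Country
Seattle-Tacoma,Seattle,USA
Dulles,Washington,USA
Heathrow,London,,United Kingdom
Schiphol,Amsterdam,Netherlands
"""

error_message = None
try:
    pd.read_csv(io.StringIO(bad_csv))
except pd.errors.ParserError as exc:
    # Keep the message so it can be logged or shown to the user
    error_message = str(exc)
    print(f"Could not parse the data: {error_message}")
```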


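Specify **error_bad_lines=False** to skip any rows with errors. This parameter was deprecated in pandas 1.3 in favor of **on_bad_lines='skip'** and removed in pandas 2.0, so the cell below only runs on older versions.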

In [4]:
airports_df = pd.read_csv(
    'Data/airportsInvalidRows.csv', 
    error_bad_lines=False
)
airports_df

b'Skipping line 4: expected 3 fields, saw 4\n'


Unnamed: 0,Name,City,Country
0,Seattle-Tacoma,Seattle,USA
1,Dulles,Washington,USA
2,Schiphol,Amsterdam,Netherlands
3,Changi,Singapore,Singapore
4,Pearson,Toronto,Canada
5,Narita,Tokyo,Japan
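On pandas 1.3 or later, the same result can be obtained with the `on_bad_lines` parameter. The following is a minimal sketch that assumes the malformed data is held in a hypothetical `bad_csv` string instead of `Data/airportsInvalidRows.csv`.

```python
import io

import pandas as pd

# Illustrative copy of airportsInvalidRows.csv (extra comma in the Heathrow row)
bad_csv = """Name,City,Country
Seattle-Tacoma,Seattle,USA
Dulles,Washington,USA
Heathrow,London,,United Kingdom
Schiphol,Amsterdam,Netherlands
Changi,Singapore,Singapore
Pearson,Toronto,Canada
Narita,Tokyo,Japan
"""

# on_bad_lines='skip' drops rows with too many fields instead of raising
airports_df = pd.read_csv(io.StringIO(bad_csv), on_bad_lines="skip")

print(len(airports_df))                 # the Heathrow row is missing
print(airports_df["Name"].tolist())
```

Skipping is silent with `'skip'`; passing `on_bad_lines='warn'` instead emits a warning for each dropped row, which makes data loss easier to notice.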


## Handling files which do not contain column headers
If your file does not have column headers in the first row, then by default the first row of data is treated as the headers.

airportsNoHeaderRows.csv contains airport data but does not have a row specifying the column headers:

Seattle-Tacoma,Seattle,USA  
Dulles,Washington,USA  
Heathrow,London,United Kingdom  
Schiphol,Amsterdam,Netherlands  
Changi,Singapore,Singapore  
Pearson,Toronto,Canada  
Narita,Tokyo,Japan  

In [5]:
airports_df = pd.read_csv('Data/airportsNoHeaderRows.csv')
airports_df

Unnamed: 0,Seattle-Tacoma,Seattle,USA
0,Dulles,Washington,USA
1,Heathrow,London,United Kingdom
2,Schiphol,Amsterdam,Netherlands
3,Changi,Singapore,Singapore
4,Pearson,Toronto,Canada
5,Narita,Tokyo,Japan


Specify **header=None** if your file has no header row, so that the first row of data is not treated as column headers. The columns are then labeled with integers starting at 0.

In [6]:
airports_df = pd.read_csv(
    'Data/airportsNoHeaderRows.csv', 
    header=None
)
airports_df

Unnamed: 0,0,1,2
0,Seattle-Tacoma,Seattle,USA
1,Dulles,Washington,USA
2,Heathrow,London,United Kingdom
3,Schiphol,Amsterdam,Netherlands
4,Changi,Singapore,Singapore
5,Pearson,Toronto,Canada
6,Narita,Tokyo,Japan


If you do not have a header row, you can use the **names** parameter to specify the column names when the data is loaded.

In [7]:
airports_df = pd.read_csv(
    'Data/airportsNoHeaderRows.csv', 
    header=None, 
    names=['Name', 'City', 'Country']
)
airports_df

Unnamed: 0,Name,City,Country
0,Seattle-Tacoma,Seattle,USA
1,Dulles,Washington,USA
2,Heathrow,London,United Kingdom
3,Schiphol,Amsterdam,Netherlands
4,Changi,Singapore,Singapore
5,Pearson,Toronto,Canada
6,Narita,Tokyo,Japan
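The effect of the three variants can be compared side by side by counting rows. The sketch below is only illustrative: it assumes a shortened copy of the header-less data in a hypothetical `no_header_csv` string instead of the file on disk.

```python
import io

import pandas as pd

# Illustrative header-less data (first three rows of airportsNoHeaderRows.csv)
no_header_csv = """Seattle-Tacoma,Seattle,USA
Dulles,Washington,USA
Heathrow,London,United Kingdom
"""

# Default: the first data row is consumed as the header, so one row is lost
default_df = pd.read_csv(io.StringIO(no_header_csv))

# header=None: every row is data and columns are labeled 0, 1, 2
no_header_df = pd.read_csv(io.StringIO(no_header_csv), header=None)

# header=None plus names: every row is data and columns get readable names
named_df = pd.read_csv(
    io.StringIO(no_header_csv),
    header=None,
    names=["Name", "City", "Country"],
)

print(len(default_df), list(default_df.columns))
print(len(no_header_df), list(no_header_df.columns))
print(len(named_df), list(named_df.columns))
```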


## Missing values in data files
Missing values appear in DataFrames as **NaN**.

There is no city listed for Schiphol airport in airportsBlankValues.csv:

Name,City,Country  
Seattle-Tacoma,Seattle,USA  
Dulles,Washington,USA  
Heathrow,London,United Kingdom  
Schiphol,,Netherlands  
Changi,Singapore,Singapore  
Pearson,Toronto,Canada  
Narita,Tokyo,Japan

In [8]:
airports_df = pd.read_csv('Data/airportsBlankValues.csv')
airports_df

Unnamed: 0,Name,City,Country
0,Seattle-Tacoma,Seattle,USA
1,Dulles,Washington,USA
2,Heathrow,London,United Kingdom
3,Schiphol,NaN,Netherlands
4,Changi,Singapore,Singapore
5,Pearson,Toronto,Canada
6,Narita,Tokyo,Japan
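Once the data is loaded, the missing entries can be located with `isna()`. This is a minimal sketch, assuming a shortened copy of the data in a hypothetical `blank_csv` string instead of `Data/airportsBlankValues.csv`.

```python
import io

import pandas as pd

# Illustrative data with no city for Schiphol
blank_csv = """Name,City,Country
Heathrow,London,United Kingdom
Schiphol,,Netherlands
Changi,Singapore,Singapore
"""

airports_df = pd.read_csv(io.StringIO(blank_csv))

# Boolean Series that is True where City is missing (NaN)
missing_city = airports_df["City"].isna()

print(airports_df[missing_city])   # rows with no city
print(airports_df.isna().sum())    # count of missing values per column
```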


## Writing DataFrame contents to a CSV file
**to_csv** writes the contents of a pandas DataFrame to a CSV file.

In [9]:
airports_df

Unnamed: 0,Name,City,Country
0,Seattle-Tacoma,Seattle,USA
1,Dulles,Washington,USA
2,Heathrow,London,United Kingdom
3,Schiphol,NaN,Netherlands
4,Changi,Singapore,Singapore
5,Pearson,Toronto,Canada
6,Narita,Tokyo,Japan


In [10]:
airports_df.to_csv('Data/MyNewCSVFile.csv')

By default, the index column is written to the CSV file as an unnamed first column.

Specify **index=False** if you do not want the index column to be included in the CSV file.

In [11]:
airports_df.to_csv(
    'Data/MyNewCSVFileNoIndex.csv', 
    index=False
)
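The difference between the two calls is easiest to see in the text that gets written. The sketch below relies on `to_csv` returning the CSV as a string when no path is given; the two-row DataFrame is a made-up stand-in for `airports_df`.

```python
import io

import pandas as pd

# Small illustrative DataFrame standing in for airports_df
airports_df = pd.DataFrame(
    {
        "Name": ["Seattle-Tacoma", "Dulles"],
        "City": ["Seattle", "Washington"],
        "Country": ["USA", "USA"],
    }
)

# With no path argument, to_csv returns the CSV text instead of writing a file
with_index = airports_df.to_csv()
without_index = airports_df.to_csv(index=False)

print(with_index.splitlines()[0])     # ,Name,City,Country
print(without_index.splitlines()[0])  # Name,City,Country

# Reading the indexed version back produces an extra 'Unnamed: 0' column
reloaded = pd.read_csv(io.StringIO(with_index))
print(list(reloaded.columns))
```

If the index was written and you want it restored as the index on load, passing `index_col=0` to **read_csv** avoids the extra `Unnamed: 0` column.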