# Importing and Exporting Data


## Introduction 



No analytics tool operates in a vacuum. Most of the time, the systems that generate the data are not the ones where analysis of that data is conducted. Because of this, we must have a way to obtain data from external data sources, load it into Pandas, and also be able to export data or our results for further use or presentation. In this lesson, we will cover ways to import data from, and export data to, a variety of formats and destinations using Pandas.

## Preparation



**Important**: Before we begin, download vehicles.zip, which contains data files in several formats, and extract its contents to your machine. The code in this lesson reads the files from a folder named vehicles. You should see the following extracted files:

- vehicles_messy.csv
- vehicles_pipe.txt
- vehicles_tab.txt
- vehicles.csv
- vehicles.json
- vehicles.xlsx

## Importing and Exporting Delimited Files

**Delimited files** are one of the most common sources of data. Most analytics applications can read and process delimited files, so they are a popular way to pass information from one system to another.

There are a few common file formats you are likely to see out in the real world.

- Comma-separated values (CSV) files;
- Tab-delimited files;
- Pipe-delimited files.



Pandas lets us import any of these with the read_csv function. For files delimited with characters other than commas, we just need to specify the delimiter via the function's *sep parameter* so that Pandas knows how it should separate the values.

In [1]:
import pandas as pd 

In [9]:
# import 'vehicles/vehicles.csv'
data = pd.read_csv('vehicles/vehicles.csv')

In [10]:
data.head()

Unnamed: 0,Make,Model,Year,Engine Displacement,Cylinders,Transmission,Drivetrain,Vehicle Class,Fuel Type,Fuel Barrels/Year,City MPG,Highway MPG,Combined MPG,CO2 Emission Grams/Mile,Fuel Cost/Year
0,AM General,DJ Po Vehicle 2WD,1984,2.5,4.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,19.388824,18,17,17,522.764706,1950
1,AM General,FJ8c Post Office,1984,4.2,6.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
2,AM General,Post Office DJ5 2WD,1985,2.5,4.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,20.600625,16,17,16,555.4375,2100
3,AM General,Post Office DJ8 2WD,1985,4.2,6.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
4,ASC Incorporated,GNX,1987,3.8,6.0,Automatic 4-spd,Rear-Wheel Drive,Midsize Cars,Premium,20.600625,14,21,16,555.4375,2550


In [18]:
# read 'vehicles/vehicles_tab.txt'
data = pd.read_csv('vehicles/vehicles_tab.txt', sep ='\t')

In [19]:
data.head()  

Unnamed: 0,Make,Model,Year,Engine Displacement,Cylinders,Transmission,Drivetrain,Vehicle Class,Fuel Type,Fuel Barrels/Year,City MPG,Highway MPG,Combined MPG,CO2 Emission Grams/Mile,Fuel Cost/Year
0,AM General,DJ Po Vehicle 2WD,1984,2.5,4.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,19.388824,18,17,17,522.764706,1950
1,AM General,FJ8c Post Office,1984,4.2,6.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
2,AM General,Post Office DJ5 2WD,1985,2.5,4.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,20.600625,16,17,16,555.4375,2100
3,AM General,Post Office DJ8 2WD,1985,4.2,6.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
4,ASC Incorporated,GNX,1987,3.8,6.0,Automatic 4-spd,Rear-Wheel Drive,Midsize Cars,Premium,20.600625,14,21,16,555.4375,2550


In [23]:
# 'vehicles/vehicles_pipe.txt'
data = pd.read_csv('vehicles/vehicles_pipe.txt', sep = '|')

In [24]:
data.head() 

Unnamed: 0,Make,Model,Year,Engine Displacement,Cylinders,Transmission,Drivetrain,Vehicle Class,Fuel Type,Fuel Barrels/Year,City MPG,Highway MPG,Combined MPG,CO2 Emission Grams/Mile,Fuel Cost/Year
0,AM General,DJ Po Vehicle 2WD,1984,2.5,4.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,19.388824,18,17,17,522.764706,1950
1,AM General,FJ8c Post Office,1984,4.2,6.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
2,AM General,Post Office DJ5 2WD,1985,2.5,4.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,20.600625,16,17,16,555.4375,2100
3,AM General,Post Office DJ8 2WD,1985,4.2,6.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
4,ASC Incorporated,GNX,1987,3.8,6.0,Automatic 4-spd,Rear-Wheel Drive,Midsize Cars,Premium,20.600625,14,21,16,555.4375,2550
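To see the effect of sep in isolation, here is a minimal sketch that parses the same few rows from in-memory strings rather than from the lesson files. The sample rows are copied from the output above; the variable names and the use of io.StringIO are illustrative assumptions, not part of the original notebook.

```python
import io

import pandas as pd

# Three columns from the first rows of the vehicles data, as comma-separated text.
csv_text = (
    "Make,Model,Year\n"
    "AM General,DJ Po Vehicle 2WD,1984\n"
    "ASC Incorporated,GNX,1987\n"
)

# The same rows with tab and pipe delimiters.
tab_text = csv_text.replace(",", "\t")
pipe_text = csv_text.replace(",", "|")

from_csv = pd.read_csv(io.StringIO(csv_text))
from_tab = pd.read_csv(io.StringIO(tab_text), sep="\t")
from_pipe = pd.read_csv(io.StringIO(pipe_text), sep="|")

# Without sep, the pipe-delimited text is read as a single column.
wrong = pd.read_csv(io.StringIO(pipe_text))

print(from_pipe.equals(from_csv))
print(wrong.shape)
```

All three correctly parsed frames hold the same data, while the call that omits sep ends up with one wide column, which is usually the first sign that the delimiter was not specified.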


**Exporting** data as delimited files is just as easy. Instead of the *read*_csv function, you call the data frame's *to*_csv method.

In [28]:
# export data to 'cars.csv' (index = True also writes the index as a column)
data.to_csv('cars.csv', index = True)

In [29]:
pd.read_csv('cars.csv').head() 

Unnamed: 0.1,Unnamed: 0,Make,Model,Year,Engine Displacement,Cylinders,Transmission,Drivetrain,Vehicle Class,Fuel Type,Fuel Barrels/Year,City MPG,Highway MPG,Combined MPG,CO2 Emission Grams/Mile,Fuel Cost/Year
0,0,AM General,DJ Po Vehicle 2WD,1984,2.5,4.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,19.388824,18,17,17,522.764706,1950
1,1,AM General,FJ8c Post Office,1984,4.2,6.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
2,2,AM General,Post Office DJ5 2WD,1985,2.5,4.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,20.600625,16,17,16,555.4375,2100
3,3,AM General,Post Office DJ8 2WD,1985,4.2,6.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
4,4,ASC Incorporated,GNX,1987,3.8,6.0,Automatic 4-spd,Rear-Wheel Drive,Midsize Cars,Premium,20.600625,14,21,16,555.4375,2550


In [34]:
# export 'vehicles/vehicles_tab.txt'
data.to_csv('vehicles/vehicles_tab.txt', sep = '\t', index = False)

In [33]:
# export 'vehicles/vehicles_pipe.txt'
data.to_csv('vehicles/vehicles_pipe_export_TEST.txt', sep = '|', index = False)

Note that we set the index parameter to False in the tab- and pipe-delimited exports. If we did not do that, **it would export the data frame with an extra column containing its indexes**, as happened with cars.csv, which we exported with index = True. Since the indexes have no meaning to us in this case, we exclude them from these exports.
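The effect of the index parameter can also be checked without touching any files. The sketch below is an illustration under assumptions: it uses a made-up two-row data frame and relies on to_csv returning the CSV text as a string when no path is given.

```python
import io

import pandas as pd

df = pd.DataFrame({"Make": ["AM General", "ASC Incorporated"], "Year": [1984, 1987]})

# With no path argument, to_csv returns the CSV text as a string.
with_index = df.to_csv(index=True)
without_index = df.to_csv(index=False)

# Reading the text back shows the extra, unnamed index column.
cols_with = pd.read_csv(io.StringIO(with_index)).columns.tolist()
cols_without = pd.read_csv(io.StringIO(without_index)).columns.tolist()

# index_col=0 tells read_csv to treat the first column as the index instead.
restored = pd.read_csv(io.StringIO(with_index), index_col=0)

print(cols_with)     # ['Unnamed: 0', 'Make', 'Year']
print(cols_without)  # ['Make', 'Year']
print(restored.columns.tolist())
```

If a file was already exported with its index, passing index_col=0 when reading it is one way to avoid the stray Unnamed: 0 column.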


In [36]:
data.to_csv('cars.csv', index = True)
pd.read_csv('cars.csv').head()  

Unnamed: 0.1,Unnamed: 0,Make,Model,Year,Engine Displacement,Cylinders,Transmission,Drivetrain,Vehicle Class,Fuel Type,Fuel Barrels/Year,City MPG,Highway MPG,Combined MPG,CO2 Emission Grams/Mile,Fuel Cost/Year
0,0,AM General,DJ Po Vehicle 2WD,1984,2.5,4.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,19.388824,18,17,17,522.764706,1950
1,1,AM General,FJ8c Post Office,1984,4.2,6.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
2,2,AM General,Post Office DJ5 2WD,1985,2.5,4.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,20.600625,16,17,16,555.4375,2100
3,3,AM General,Post Office DJ8 2WD,1985,4.2,6.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
4,4,ASC Incorporated,GNX,1987,3.8,6.0,Automatic 4-spd,Rear-Wheel Drive,Midsize Cars,Premium,20.600625,14,21,16,555.4375,2550


## Importing and Exporting Excel



We can also import and export Microsoft Excel spreadsheets with Pandas. This works much like importing and exporting delimited files, but instead of read_csv and to_csv, we use the **read_excel** function and the **to_excel** method. Reading and writing .xlsx files relies on an optional engine package such as openpyxl, which may need to be installed separately.

In [38]:
# read 'vehicles/vehicles.xlsx'
data = pd.read_excel('vehicles/vehicles.xlsx')
data.head()

Unnamed: 0,Make,Model,Year,Engine Displacement,Cylinders,Transmission,Drivetrain,Vehicle Class,Fuel Type,Fuel Barrels/Year,City MPG,Highway MPG,Combined MPG,CO2 Emission Grams/Mile,Fuel Cost/Year
0,AM General,DJ Po Vehicle 2WD,1984,2.5,4,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,19.388824,18,17,17,522.764706,1950
1,AM General,FJ8c Post Office,1984,4.2,6,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
2,AM General,Post Office DJ5 2WD,1985,2.5,4,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,20.600625,16,17,16,555.4375,2100
3,AM General,Post Office DJ8 2WD,1985,4.2,6,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
4,ASC Incorporated,GNX,1987,3.8,6,Automatic 4-spd,Rear-Wheel Drive,Midsize Cars,Premium,20.600625,14,21,16,555.4375,2550


In [39]:
# write 'vehicles/vehicles.xlsx' to excel 
data.to_excel('vehicles/vehicles.xlsx', index = False)

## Importing and Exporting JSON



Another common format for importing and exporting data is JSON. JSON stands for **JavaScript Object Notation**, and it lets you structure data in intuitive ways so that it can be easily read and processed. We can use Pandas to read and write JSON files as follows.

In [40]:
# read 'vehicles/vehicles.json' 
data = pd.read_json('vehicles/vehicles.json')
data.head()

Unnamed: 0,Make,Model,Year,Engine Displacement,Cylinders,Transmission,Drivetrain,Vehicle Class,Fuel Type,Fuel Barrels/Year,City MPG,Highway MPG,Combined MPG,CO2 Emission Grams/Mile,Fuel Cost/Year
0,AM General,DJ Po Vehicle 2WD,1984,2.5,4,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,19.388824,18,17,17,522.764706,1950
1,AM General,FJ8c Post Office,1984,4.2,6,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
2,AM General,Post Office DJ5 2WD,1985,2.5,4,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,20.600625,16,17,16,555.4375,2100
3,AM General,Post Office DJ8 2WD,1985,4.2,6,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
4,ASC Incorporated,GNX,1987,3.8,6,Automatic 4-spd,Rear-Wheel Drive,Midsize Cars,Premium,20.600625,14,21,16,555.4375,2550


In [41]:
# write 'vehicles/vehicles.json' with orient 'records'
data.to_json('vehicles/vehicles_TEST.json', orient = 'records') 

Note that we set the orient parameter to 'records' in the export above. We did this because our JSON file was structured **as a list of dictionaries *where each dictionary represented a complete record of data*.** When working with JSON files in Pandas, the way the data is organized dictates the value you pass to the orient parameter.

Below are a few other common ways that JSON files can be structured and the corresponding value you should pass to the orient parameter for each one.

- 'split': Dictionary containing indexes, columns, and data.
- 'index': Nested dictionaries containing {index:{column:value}}.
- 'columns': Nested dictionaries containing {column:{index:value}}.
- 'values': Nested list where each sublist contains the values for a record.
- 'table': Nested dictionaries containing schema and data (records) like {'schema': {schema}, 'data': {data}}.

Challenge: Try exporting the data passing each of these values to the orient parameter. Open each of the files in a text editor and note the differences in structure.
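As a starting point for the challenge, the sketch below prints every layout for a made-up two-row data frame instead of the full vehicles data. It assumes that to_json returns a string when no path is given, and it uses json.loads only to make the structure easier to inspect.

```python
import json

import pandas as pd

df = pd.DataFrame({"Make": ["AM General", "ASC Incorporated"], "Year": [1984, 1987]})

# With no path argument, to_json returns a string; json.loads turns it back
# into plain Python objects so the structure is easy to inspect.
layouts = {
    orient: json.loads(df.to_json(orient=orient))
    for orient in ["records", "split", "index", "columns", "values", "table"]
}

for orient, parsed in layouts.items():
    print(orient, "->", parsed)
```

Comparing the printed structures side by side makes it easier to recognize which orient value an unfamiliar JSON file calls for.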

## Reading Data From Databases



In addition to reading data from various types of files, Pandas also provides us with the ability to read data from MySQL databases. To do so, we need to import the **pymysql library** and the **create_engine function** from the **sqlalchemy library**.

In [42]:
#!pip install pymysql
#!pip install sqlalchemy 



In [44]:
import pymysql
from sqlalchemy import create_engine 

We must then call the **create_engine function** and pass it a connection string like the one below, replacing the username, password, and database name with those of the MySQL database on your local machine. We will assign the result to a variable called engine.

In [45]:
# 'mysql+pymysql://[USER]:[PASSWORD]@localhost/[DATABASE NAME]'
engine = create_engine('mysql+pymysql://root:1234@localhost/publications')
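One practical detail when filling in this string: a password containing characters such as @ or / has to be URL-encoded first, or the URL will be parsed incorrectly. The following is a minimal sketch, assuming a hypothetical helper named build_mysql_url and made-up credentials. It only builds the string and does not connect to anything.

```python
from urllib.parse import quote_plus


def build_mysql_url(user, password, host, database):
    """Build a 'mysql+pymysql://' connection string (hypothetical helper)."""
    # quote_plus escapes characters such as '@' and '/' that would otherwise
    # be mistaken for parts of the URL.
    return f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}@{host}/{database}"


# Made-up credentials for illustration only.
url = build_mysql_url("root", "p@ss/word", "localhost", "publications")
print(url)
```

The resulting string can be passed to create_engine in place of a hand-written literal.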

From there, we can use the Pandas **read_sql_query function**, pass it a SQL statement, and specify that it is to run that statement on the engine connection we created to our MySQL database. 

In the example below, we query the records in the employee table of our publications database whose pub_id is below 2000.

In [49]:
# run a filtered query against publications.employee using the engine created above
new_table = pd.read_sql_query('SELECT * FROM publications.employee WHERE pub_id < 2000', engine)
new_table.head()

Unnamed: 0,emp_id,fname,minit,lname,job_id,job_lvl,pub_id,hire_date
0,A-C71970F,Aria,,Cruz,10,87,1389,1991-10-26
1,ARD36773F,Anabela,R,Domingues,8,100,877,1993-01-27
2,CGS88322F,Carine,G,Schmitt,13,64,1389,1992-07-07
3,DBT39435M,Daniel,B,Tonini,11,75,877,1990-01-01
4,DWR65030M,Diego,W,Roel,6,192,1389,1991-12-16


In [47]:
pd.read_sql_query('SELECT * FROM publishers', engine) 

Unnamed: 0,pub_id,pub_name,city,state,country
0,736,New Moon Books,Boston,MA,USA
1,877,Binnet & Hardley,Washington,DC,USA
2,1389,Algodata Infosystems,Berkeley,CA,USA
3,1622,Five Lakes Publishing,Chicago,IL,USA
4,1756,Ramona Publishers,Dallas,TX,USA
5,9901,GGG&G,München,,Germany
6,9952,Scootney Books,New York,NY,USA
7,9999,Lucerne Publishing,Paris,,France


In [48]:
pd.read_sql_query('SHOW tables', engine)

Unnamed: 0,Tables_in_publications
0,authors
1,discounts
2,employee
3,jobs
4,pub_info
5,publishers
6,python_query
7,roysched
8,sales
9,stores
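For readers without a MySQL server at hand, the same read_sql_query call can be tried against an in-memory SQLite database from Python's standard library. This is a stand-in, not the lesson's MySQL setup: the table holds only three columns, with two rows copied from the employee output above plus one made-up row. The query also passes the threshold through the params argument (using SQLite's ? placeholder) rather than writing it into the SQL string, which is an assumption about good practice and not something the original notebook does.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database used as a stand-in for the MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id TEXT, fname TEXT, pub_id INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [
        ("A-C71970F", "Aria", 1389),
        ("ARD36773F", "Anabela", 877),
        ("XXX00000X", "Example", 9952),  # made-up row, not from the lesson data
    ],
)
conn.commit()

# '?' is SQLite's placeholder style; the value is supplied via params.
low_pub = pd.read_sql_query(
    "SELECT * FROM employee WHERE pub_id < ?", conn, params=(2000,)
)
print(low_pub)
conn.close()
```

Placeholder syntax differs between database drivers, so the ? used here should not be assumed to work unchanged with the MySQL engine.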


## Writing Data to Databases



Once you have data in a data frame and your MySQL database connection saved to the engine variable, *writing* the data to a table in the database is pretty straightforward. You can use Pandas' **to_sql method** and specify the table name you want to give the data set, the database connection, what should happen if the table already exists (the if_exists parameter, with values such as 'fail', 'replace', or 'append'), and whether to include the indexes (the index parameter). The call below relies on the defaults, so it raises an error if employee2 already exists and writes the index as an extra column.

In [50]:
# write data to sql  
new_table.to_sql('employee2', engine)

If you refresh the publications database, you should now see a table named "employee2."
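The if_exists options are easiest to understand by running them in sequence. The sketch below does so against an in-memory SQLite connection, used here as a stand-in for the MySQL engine, with a made-up two-row data frame. The row_count helper is hypothetical and exists only for this illustration.

```python
import sqlite3

import pandas as pd

df = pd.DataFrame({"emp_id": ["A-C71970F", "ARD36773F"], "pub_id": [1389, 877]})
conn = sqlite3.connect(":memory:")  # stand-in for the MySQL engine


def row_count(connection):
    """Return the number of rows currently in employee2 (hypothetical helper)."""
    result = pd.read_sql_query("SELECT COUNT(*) AS n FROM employee2", connection)
    return int(result["n"].iloc[0])


df.to_sql("employee2", conn, index=False)                       # table is created
df.to_sql("employee2", conn, if_exists="append", index=False)   # rows are added
count_after_append = row_count(conn)

df.to_sql("employee2", conn, if_exists="replace", index=False)  # table is rebuilt
count_after_replace = row_count(conn)

try:
    df.to_sql("employee2", conn, if_exists="fail", index=False)
    raised = False
except ValueError:
    raised = True  # 'fail' (the default) refuses to touch an existing table

print(count_after_append, count_after_replace, raised)
conn.close()
```

Because 'fail' is the default, rerunning a notebook cell that calls to_sql without if_exists will raise an error the second time, which is worth keeping in mind when experimenting.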

## Summary



In this lesson, we covered multiple ways to import data into and export data out of Pandas. First, we learned how to read and write delimited files (CSV, tab-delimited, and pipe-delimited). Then we learned how to read and write Excel and JSON files, and finished up the lesson by reading from and writing to MySQL databases with the help of the pymysql and sqlalchemy libraries.