# Data Loading

Accessing data is a necessary first step for using most of the tools in this book. I'm going to focus on data input and output using pandas, though numerous tools in other libraries can also read and write data in various formats.

This lecture covers only how to read CSV and Excel files into Python with pandas, and how to write a DataFrame back out to a CSV file.

# Table of Contents

- 1.1  **[Reading CSV Files](#CSV)**
- 1.2  **[Writing Data to a CSV File](#Writing)**
- 1.3  **[Reading Microsoft Excel Files](#excel)**

In [1]:
import pandas as pd

<a id="CSV"></a>
## 1.1 Reading CSV Files

Since the example file is comma-delimited, we can use `read_csv` to read it into a DataFrame:

In [18]:
df = pd.read_csv('examples.csv')

In [5]:
# Read the CSV file data from a specified file path into a defined dataframe variable

df = pd.read_csv('/Users/nkohei/Workspace/McDaniel-Repository/522/week4/examples data/examples.csv', index_col=0)

In [6]:
df

|  | a | b | c | d | message |
|---|---|---|---|---|---|
| 0 | 1 | 2 | 3 | 4 | hello |
| 1 | 5 | 6 | 7 | 8 | world |
| 2 | 9 | 10 | 11 | 12 | foo |


We could also have used `read_table` and specified the delimiter:

In [8]:
pd.read_table('/Users/nkohei/Workspace/McDaniel-Repository/522/week4/examples data/examples.csv', sep=',', index_col=0)

|  | a | b | c | d | message |
|---|---|---|---|---|---|
| 0 | 1 | 2 | 3 | 4 | hello |
| 1 | 5 | 6 | 7 | 8 | world |
| 2 | 9 | 10 | 11 | 12 | foo |
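
To try both readers without the file on disk, here is a minimal sketch that parses an in-memory string instead. The contents of `csv_text` are an assumption modeled on the table above, not the actual `examples.csv`, and the sketch leaves out `index_col` for simplicity.

```python
import io

import pandas as pd

# Hypothetical file contents modeled on the output shown above;
# the real examples.csv is not reproduced in the text.
csv_text = "a,b,c,d,message\n1,2,3,4,hello\n5,6,7,8,world\n9,10,11,12,foo\n"

# read_csv uses a comma as its default delimiter.
from_csv = pd.read_csv(io.StringIO(csv_text))

# read_table defaults to a tab delimiter, so the comma is passed explicitly.
from_table = pd.read_table(io.StringIO(csv_text), sep=",")

# Both calls should produce the same DataFrame.
print(from_csv.equals(from_table))
```

Wrapping the text in `io.StringIO` gives pandas a file-like object, so the same calls work on a string as on a path.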


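A file will not always have a header row. Consider a file, `examples2.csv`, that holds the same three rows of data but no line of column names.

To read this file, you have a couple of options. You can allow pandas to assign default column names with `header=None`, or you can specify the names yourself with `names`: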

In [21]:
pd.read_csv('examples2.csv', header=None)

|  | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | 1 | 2 | 3 | 4 | hello |
| 1 | 5 | 6 | 7 | 8 | world |
| 2 | 9 | 10 | 11 | 12 | foo |


In [22]:
pd.read_csv('examples2.csv', names=['a', 'b', 'c', 'd', 'message'])

|  | a | b | c | d | message |
|---|---|---|---|---|---|
| 0 | 1 | 2 | 3 | 4 | hello |
| 1 | 5 | 6 | 7 | 8 | world |
| 2 | 9 | 10 | 11 | 12 | foo |
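
The sketch below compares the two options on an in-memory stand-in for the headerless file, and also shows what happens when neither is used. The `headerless` string is an assumption reconstructed from the outputs above.

```python
import io

import pandas as pd

# Assumed contents of a headerless file: three data rows, no column names.
headerless = "1,2,3,4,hello\n5,6,7,8,world\n9,10,11,12,foo\n"

# Option 1: let pandas number the columns 0..4.
default_names = pd.read_csv(io.StringIO(headerless), header=None)

# Option 2: supply the column names; passing names implies there is no header row.
custom_names = pd.read_csv(
    io.StringIO(headerless), names=["a", "b", "c", "d", "message"]
)

# With neither option, the first data row is consumed as the header.
lost_row = pd.read_csv(io.StringIO(headerless))

print(list(default_names.columns))
print(list(custom_names.columns))
print(len(default_names), len(lost_row))
```

The last call is the case to watch for: it raises no error, but the resulting DataFrame has one row fewer and its column labels are taken from the data.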


Suppose you want the `message` column to be the index of the returned DataFrame. You can identify the column either by its position (4) or by its name (`'message'`) in the `index_col` argument:

In [23]:
names = ['a', 'b', 'c', 'd', 'message']

In [24]:
pd.read_csv('examples2.csv', names=names, index_col='message')

| message | a | b | c | d |
|---|---|---|---|---|
| hello | 1 | 2 | 3 | 4 |
| world | 5 | 6 | 7 | 8 |
| foo | 9 | 10 | 11 | 12 |
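
As a quick check that the two forms of `index_col` are interchangeable, the following sketch reads the same assumed headerless text twice, once by column name and once by position, and then looks up a row by its label.

```python
import io

import pandas as pd

# Assumed stand-in for the headerless file used above.
headerless = "1,2,3,4,hello\n5,6,7,8,world\n9,10,11,12,foo\n"
names = ["a", "b", "c", "d", "message"]

# Select the index column by name...
by_name = pd.read_csv(io.StringIO(headerless), names=names, index_col="message")

# ...or by zero-based position (the fifth column is position 4).
by_position = pd.read_csv(io.StringIO(headerless), names=names, index_col=4)

print(by_name.equals(by_position))

# With message as the index, rows can be selected by their label.
print(by_name.loc["world", "b"])
```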


In the event that you want to form a hierarchical index from multiple columns, pass a list of column numbers or names:

In [26]:
parsed = pd.read_csv('examples3.csv',
                     index_col=['key1', 'key2'])

In [27]:
parsed

| key1 | key2 | value1 | value2 |
|---|---|---|---|
| one | a | 1 | 2 |
| one | b | 3 | 4 |
| one | c | 5 | 6 |
| one | d | 7 | 8 |
| two | a | 9 | 10 |
| two | b | 11 | 12 |
| two | c | 13 | 14 |
| two | d | 15 | 16 |
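
To illustrate what the hierarchical index makes possible, here is a small sketch that builds the same frame from an in-memory string and selects from it by one or both keys. The `text` string is an assumption reconstructed from the output above, not the actual `examples3.csv`.

```python
import io

import pandas as pd

# Assumed contents, reconstructed from the table shown above.
text = (
    "key1,key2,value1,value2\n"
    "one,a,1,2\none,b,3,4\none,c,5,6\none,d,7,8\n"
    "two,a,9,10\ntwo,b,11,12\ntwo,c,13,14\ntwo,d,15,16\n"
)

# A list of column names produces a two-level MultiIndex.
parsed = pd.read_csv(io.StringIO(text), index_col=["key1", "key2"])

# Column positions work the same way as names.
parsed_by_position = pd.read_csv(io.StringIO(text), index_col=[0, 1])
print(parsed.equals(parsed_by_position))

# Selecting on the outer level returns the matching rows, indexed by key2.
print(parsed.loc["one"])

# A tuple selects on both levels at once.
print(parsed.loc[("two", "c"), "value1"])
```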


<a id="Writing"></a>
## 1.2 Writing Data to a CSV File

Data can also be exported to a delimited format. Let’s consider one of the CSV files read before:

In [31]:
df = pd.read_csv('examples.csv')

In [32]:
df

|  | a | b | c | d | message |
|---|---|---|---|---|---|
| 0 | 1 | 2 | 3 | 4 | hello |
| 1 | 5 | 6 | 7 | 8 | world |
| 2 | 9 | 10 | 11 | 12 | foo |


In [36]:
df2 = df[:2]  # keep only the first two rows

Using DataFrame’s `to_csv` method, we can write the data out to a comma-separated file:

In [37]:
df2

|  | a | b | c | d | message |
|---|---|---|---|---|---|
| 0 | 1 | 2 | 3 | 4 | hello |
| 1 | 5 | 6 | 7 | 8 | world |


In [38]:
df2.to_csv('examples5.csv')
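
To see what `to_csv` actually writes, the sketch below calls it without a path, in which case it returns the CSV text as a string. The small frame here is made up for illustration. The `index` and `sep` parameters shown are standard `to_csv` options that the text above does not cover.

```python
import io

import pandas as pd

# A small made-up frame standing in for df2.
frame = pd.DataFrame({"a": [1, 5], "b": [2, 6], "message": ["hello", "world"]})

# With no path, to_csv returns the text instead of writing a file.
# By default the row index is written as an unnamed first column.
with_index = frame.to_csv()
print(with_index)

# index=False leaves the row labels out.
without_index = frame.to_csv(index=False)
print(without_index)

# sep selects a different delimiter.
pipe_delimited = frame.to_csv(sep="|", index=False)
print(pipe_delimited)

# Reading the index-free text back recovers the original columns.
round_trip = pd.read_csv(io.StringIO(without_index))
print(list(round_trip.columns))
```

Because the index is written by default, reading such a file back with a plain `read_csv` call yields an extra column named `Unnamed: 0`. Writing with `index=False`, or reading with `index_col=0`, avoids that.
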

<a id="excel"></a>
## 1.3 Reading Microsoft Excel Files

pandas can also read tabular data stored in Excel files, either through the `ExcelFile` class or the `pandas.read_excel` function. To use `ExcelFile`, create an instance by passing the path to an Excel file:

In [29]:
xlsx = pd.ExcelFile('examples.xlsx')


Data stored in a sheet can then be read into a DataFrame:

In [39]:
pd.read_excel(xlsx, 'Sheet1')

|  | a | b | c | d | message |
|---|---|---|---|---|---|
| 0 | 1 | 2 | 3 | 4 | hello |
| 1 | 5 | 6 | 7 | 8 | world |
| 2 | 9 | 10 | 11 | 12 | foo |


If you are reading multiple sheets from one file, creating the `ExcelFile` once is faster, but you can also pass the filename directly to `pandas.read_excel`:

In [41]:
frame = pd.read_excel('examples.xlsx', 'Sheet1')

In [42]:
frame

|  | a | b | c | d | message |
|---|---|---|---|---|---|
| 0 | 1 | 2 | 3 | 4 | hello |
| 1 | 5 | 6 | 7 | 8 | world |
| 2 | 9 | 10 | 11 | 12 | foo |
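
The multi-sheet case could be sketched as follows. The helper names `write_demo_workbook` and `read_every_sheet`, the sheet names, and the data are all hypothetical. The sketch also assumes an Excel engine such as `openpyxl` is installed, which pandas needs for `.xlsx` files.

```python
import pandas as pd


def write_demo_workbook(path):
    """Write a two-sheet workbook; sheet names and data are made up."""
    first = pd.DataFrame({"a": [1, 5], "message": ["hello", "world"]})
    second = pd.DataFrame({"a": [9], "message": ["foo"]})
    # One ExcelWriter lets several frames go into the same file.
    with pd.ExcelWriter(path) as writer:
        first.to_excel(writer, sheet_name="Sheet1", index=False)
        second.to_excel(writer, sheet_name="Sheet2", index=False)


def read_every_sheet(path):
    """Open the workbook once and parse each sheet into a DataFrame."""
    with pd.ExcelFile(path) as xlsx:
        # sheet_names lists the sheets in workbook order.
        return {
            name: pd.read_excel(xlsx, sheet_name=name)
            for name in xlsx.sheet_names
        }
```

Calling `write_demo_workbook` on a path in a temporary directory and then `read_every_sheet` on the same path should return a dictionary keyed by `Sheet1` and `Sheet2`. For the common case of loading every sheet, `pd.read_excel(path, sheet_name=None)` returns a similar dictionary in a single call.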
