# Pandas DataFrame IO

In [None]:
from pandas import DataFrame
from pandas import Series
import pandas as pd
from numpy.random import randint
import numpy as np

## Reading CSV Files

- The next cell has `data1.csv`

data1.csv

`a,b,c,d,sleep
1,2,3,4,hello
5,6,7,8,world
9,10,11,12,uga`


### Taking All of the Defaults

In [None]:
df1 = pd.read_csv('./data1.csv')
print(df1)

- **NOTICE:** The first line of the file was taken as the column names

### Supplying Column Names with `names`

In [None]:
names = ['A', 'B', 'C', 'D', 'WORD']
df2 = pd.read_csv('./data1.csv', names = names)
print(df2)
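
- When `names` is passed on its own, the file's first line is read as an ordinary data row. The sketch below shows one way to replace the existing header instead, by also passing `header = 0`. It reads the same text as `data1.csv` from an in-memory string, and the variable names `csv_text` and `df_renamed` are only illustrative.

```python
import io
import pandas as pd

# Same contents as data1.csv, held in memory so the sketch is self-contained
csv_text = "a,b,c,d,sleep\n1,2,3,4,hello\n5,6,7,8,world\n9,10,11,12,uga\n"

names = ['A', 'B', 'C', 'D', 'WORD']

# header = 0 marks the first line as a header row; names then overrides it
df_renamed = pd.read_csv(io.StringIO(csv_text), header = 0, names = names)
print(df_renamed)
```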

### Taking the Default Column Names

In [None]:
df3 = pd.read_csv('./data1.csv', header = None)
print(df3)
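
- With `header = None`, pandas labels the columns with integers and keeps the file's first line as data. A minimal sketch of the side effect, assuming the same text as `data1.csv` (the names `csv_text` and `df_no_header` are illustrative): the letters in the first row prevent the otherwise numeric columns from being parsed as numbers.

```python
import io
import pandas as pd

# Same contents as data1.csv, held in memory so the sketch is self-contained
csv_text = "a,b,c,d,sleep\n1,2,3,4,hello\n5,6,7,8,world\n9,10,11,12,uga\n"

df_no_header = pd.read_csv(io.StringIO(csv_text), header = None)

# Column labels are integers starting at 0
print(df_no_header.columns.tolist())

# The first line is now a data row, so every column holds text
print(df_no_header.dtypes)
```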

### Column Names as Row in CSV File

- Execute the next cell to see the file

data2.csv

`A,B,C,D,WORD
1,2,3,4,sleep
5,6,7,8,hello
9,10,11,12,world
13,14,15,16,uga`


- Execute the next cell to see the header in row 0

In [None]:
df4 = pd.read_csv('./data2.csv', header = 0)
print(df4)

### Custom Index from File

- Execute the next cell to see the column (header) and index (index_col) data

data3.csv

`,A,B,C,D,WORD 
one,a,b,c,d,sleep 
two,1,2,3,4,hello 
three,5,6,7,8,world 
four,9,10,11,12,uga` 


- **NOTICE:** The first line of the file starts with a comma. This is because the index column has no heading of its own.

In [None]:
df5 = pd.read_csv('./data3.csv',header = 0, index_col = 0)
print(df5)
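
- Once a column has been promoted to the index, rows can be looked up by label with `.loc`. The sketch below assumes the same text as `data3.csv`, read from an in-memory string; `csv_text` and `df_indexed` are illustrative names.

```python
import io
import pandas as pd

# Same contents as data3.csv, held in memory so the sketch is self-contained
csv_text = (
    ",A,B,C,D,WORD\n"
    "one,a,b,c,d,sleep\n"
    "two,1,2,3,4,hello\n"
    "three,5,6,7,8,world\n"
    "four,9,10,11,12,uga\n"
)

df_indexed = pd.read_csv(io.StringIO(csv_text), header = 0, index_col = 0)

# Select a whole row by its index label
print(df_indexed.loc['two'])

# Select a single cell by row label and column name
print(df_indexed.loc['three', 'WORD'])
```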

### Multiple Delimiters

data4.csv

`
A!B&C    D,WORD
1!2&3    4,sleep
5!6&7    8,hello
9!10&11  12,uga
`

In [None]:
df6 = pd.read_csv('./data4.csv', sep = r'!|,|&|\s+')
print(df6)
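
- A regular-expression separator such as this one is not supported by the default C parser, so pandas falls back to the Python engine and warns about it. One way to avoid the warning, sketched below on the same text as `data4.csv`, is to request `engine = 'python'` explicitly. The names `csv_text`, `caught`, and `df_multi` are illustrative.

```python
import io
import warnings
import pandas as pd

# Same contents as data4.csv, held in memory so the sketch is self-contained
csv_text = (
    "A!B&C    D,WORD\n"
    "1!2&3    4,sleep\n"
    "5!6&7    8,hello\n"
    "9!10&11  12,uga\n"
)

# Record any warnings raised while parsing so they can be inspected afterwards
with warnings.catch_warnings(record = True) as caught:
    warnings.simplefilter("always")
    df_multi = pd.read_csv(io.StringIO(csv_text),
                           sep = r'!|,|&|\s+',
                           engine = 'python')

print(df_multi)
print([str(w.message) for w in caught])
```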

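#### Turning Warnings On and Off

- A regular-expression separator like the one above can make pandas fall back to its Python parsing engine and emit a `ParserWarning`. The `warnings` module can silence a warning inside a `with` block: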

```
import warnings

def fxn():
    warnings.warn("deprecated", DeprecationWarning)

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    fxn()
```

- You could also suppress all warnings with this:

```
import warnings
warnings.filterwarnings("ignore")
```
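
- Ignoring every warning can hide real problems. A narrower option, sketched here under the assumption that only the parser's fallback warning is unwanted, is to filter on the `pd.errors.ParserWarning` category inside a `with` block. The sample text and the names `csv_text` and `df_quiet` are made up for illustration.

```python
import io
import warnings
import pandas as pd

# Small made-up sample that needs a regular-expression separator
csv_text = "A!B&C\n1!2&3\n4!5&6\n"

with warnings.catch_warnings():
    # Ignore only pandas parser warnings; other warnings are still reported
    warnings.simplefilter("ignore", category = pd.errors.ParserWarning)
    df_quiet = pd.read_csv(io.StringIO(csv_text), sep = r'!|&')

# Outside the with block the previous warning filters are restored
print(df_quiet)
```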


### Reading In a Set Number of Rows

- Execute the next cell to create a "large" CSV file

In [None]:
df_large = DataFrame(randint(low = 1, high = 11, size = (1000, 5)),
                        columns = ['a', 'b', 'c', 'd', 'e'])
print(df_large.head())

df_large.to_csv('./data5.csv')

In [None]:
df_10_rows = pd.read_csv('./data5.csv', nrows = 10, index_col = 0, header = 0)
print(df_10_rows)
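
- `nrows` counts data rows only; the header line is not included in the count. A small self-contained check, using an in-memory stand-in for `data5.csv` (the names `frame`, `csv_text`, and `df_first` are illustrative, and the values are sequential rather than random so the result is easy to inspect):

```python
import io
import numpy as np
import pandas as pd

# Stand-in for data5.csv: 1000 rows, 5 columns, written with the default index
frame = pd.DataFrame(np.arange(5000).reshape(1000, 5), columns = list('abcde'))
csv_text = frame.to_csv()

# Read the header plus the first 10 data rows
df_first = pd.read_csv(io.StringIO(csv_text), nrows = 10, index_col = 0, header = 0)
print(df_first.shape)
```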

### Reading CSV by Chunks

In [None]:
totals = Series([], dtype = 'float64')
chunker = pd.read_csv('./data5.csv', chunksize = 10)
for chunk in chunker:
    totals = totals.add(chunk['a'].value_counts(), fill_value = 0)
    print(totals.sort_index())
    return_value = input('Next? ')
    if return_value == '': break

- After two or three iterations, break out of the loop by pressing the `Enter` key at the `Next?` prompt without typing anything. Typing any other text first continues to the next chunk.
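
- The same running total can be computed without the interactive prompt. The sketch below is one possible non-interactive version: it builds an in-memory stand-in for `data5.csv`, sums the value counts of column `a` chunk by chunk, and compares the result with a count over the whole frame. The seed, chunk size, and names (`rng`, `frame`, `buffer`, `expected`) are assumptions made for the example.

```python
import io
import numpy as np
import pandas as pd

# Stand-in for data5.csv: 1000 rows of integers from 1 to 10
rng = np.random.default_rng(0)
frame = pd.DataFrame(rng.integers(1, 11, size = (1000, 5)), columns = list('abcde'))
buffer = io.StringIO()
frame.to_csv(buffer)
buffer.seek(0)

# Accumulate the value counts of column 'a' one chunk at a time
totals = pd.Series(dtype = 'float64')
for chunk in pd.read_csv(buffer, index_col = 0, chunksize = 100):
    totals = totals.add(chunk['a'].value_counts(), fill_value = 0)

totals = totals.sort_index().astype(int)
print(totals)

# The chunked totals should match a single pass over the full frame
expected = frame['a'].value_counts()
print(totals.to_dict() == expected.to_dict())
```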

## Write to CSV File

In [None]:
df_10_rows.to_csv('10_rows.csv')

In [None]:
!cat 10_rows.csv

- **NOTICE:** By default, `to_csv` writes out both the column names and the index.
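
- Both defaults can be switched off with the `index` and `header` arguments. A minimal sketch on a small made-up frame (`df_small` is illustrative); calling `to_csv` without a path returns the CSV text as a string, which makes the difference easy to see.

```python
import pandas as pd

# Small made-up frame for illustration
df_small = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# Default: column names and index are both written
with_index = df_small.to_csv()

# Leave out the index column
without_index = df_small.to_csv(index = False)

# Leave out the index column and the header line
no_header = df_small.to_csv(index = False, header = False)

print(with_index)
print(without_index)
print(no_header)
```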

## Reader and Writer Functions

![read_write](img/read_write.png) 


## Do Now!

1. Read in the local file, `data6.csv`. Row 0 has the column names and column 0 has the index names. The delimiter is a tab. 
2. Print out the data.

In [None]:
# Place your answer here



One possible solution is:

```
df7 = pd.read_csv('./data6.csv', sep = '\t', index_col = 0)
print(df7)
```

# End of Notebook