# Read & Write Data Files

In [None]:
# Import Pandas
import pandas as pd

<br>

## Note:
#### In this notebook we will use the following data files:
* iris.csv
* boston_houses.tsv
* wine.csv
* u.user
* titanic.xls
* pokemon.csv

#### Make sure to place them in the same folder as this Notebook. Otherwise, you will have to adjust the paths accordingly.

---
# Read Files (the Pandas way)

## Load Data from Text files
__Two common text file formats are:__
* __CSV:__ Comma-Separated Values
* __TSV:__ Tab-Separated Values

<br>

__Common Pandas methods to create DataFrames from text files:__
* `read_csv()`: Defaults to reading CSVs
* `read_table()`: Defaults to reading TSVs

__Note:__ Both are wrappers around the same underlying function, so they can be used interchangeably by setting the right parameters (for example, `sep`).
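
A minimal sketch of that interchangeability. It uses small in-memory samples (made-up rows, not taken from the real data files) so that it runs without any file on disk:

```python
import io

import pandas as pd

# Made-up sample rows (assumption: not the contents of any real data file)
csv_text = "a,b,c\n1,2,3\n4,5,6\n"
tsv_text = "a\tb\tc\n1\t2\t3\n4\t5\t6\n"

# Each function with its default separator
from_csv = pd.read_csv(io.StringIO(csv_text))
from_tsv = pd.read_table(io.StringIO(tsv_text))

# Swap the roles by setting `sep` explicitly
csv_via_table = pd.read_table(io.StringIO(csv_text), sep=",")
tsv_via_csv = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# Both routes should produce identical DataFrames
print(from_csv.equals(csv_via_table))
print(from_tsv.equals(tsv_via_csv))
```

With real files, the same calls would take a path such as `'iris.csv'` in place of the `io.StringIO(...)` object.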

### 1. Load data from a CSV file

We will use the __Iris__ dataset (`iris.csv`). First, let's see how the file looks:

In [None]:
less iris.csv

In [None]:
# Path to the Iris data file
iris_fp = 'iris.csv'

In [None]:
# Let's load the Iris dataset into a Pandas DataFrame



In [None]:
# Now let's use .read_table() to load iris.csv



### 2. Load data from a TSV file

We will use the __Boston Housing__ dataset (`boston_houses.tsv`). First, let's see how it looks:

In [None]:
less boston_houses.tsv

In [None]:
# Path to the Boston data file
boston_fp = 'boston_houses.tsv'

In [None]:
# Load the boston_houses.tsv - using .read_table()



In [None]:
# Load the boston_houses.tsv - using .read_csv()



<br>

### 3. Load files with misleading extension

File extensions (__.csv__, __.tsv__, __.txt__, etc.) are just a convention; they do not guarantee how the contents are actually delimited.

<br>

__Disclaimer!__ Not everyone will follow the conventions!

<br><br>

We will use the __Wine__ dataset (`wine.csv`).

First, let's see how it looks:

In [None]:
less wine.csv

In [None]:
# Path to the wine data file
wine_fp = "wine.csv"

In [None]:
# Load the wine file using .read_csv


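The sketch below illustrates the problem with a made-up file, since the actual delimiter of `wine.csv` is not shown here. It assumes a semicolon-separated file saved under a `.csv` name; the file name `sample_wine.csv` and its rows are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Made-up, semicolon-separated rows saved under a misleading .csv name
sample = "alcohol;ash;color\n14.2;2.4;5.6\n13.2;2.1;4.4\n"
tmp_dir = tempfile.mkdtemp()
sample_fp = os.path.join(tmp_dir, "sample_wine.csv")  # hypothetical file name
with open(sample_fp, "w") as f:
    f.write(sample)

# Peek at the first line to see which delimiter is really used
with open(sample_fp) as f:
    first_line = f.readline().rstrip("\n")
print(first_line)

# Default comma separator: every row lands in a single column
wrong = pd.read_csv(sample_fp)
print(wrong.shape)

# Explicit separator: three proper columns
right = pd.read_csv(sample_fp, sep=";")
print(right.shape)
```

Looking at the file first and then passing the matching `sep` is usually more reliable than trusting the extension.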

### 4. Load text file with unknown extension

* Files can have any extension (yes, you can even make up your own)
* Files can lack an extension altogether (Windows does not like that)


__Example:__ Load a text file with a non-conventional extension. Let's see it first:

In [None]:
less u.user

In [None]:
# Path of the u.user file
user_fp = 'u.user'

In [None]:
# Load the file with .read_csv()


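A hedged sketch of loading such a file. It assumes a pipe-delimited layout with no header line; both the sample rows and the column names below are made up for illustration and should be adjusted to what `less` actually shows.

```python
import io

import pandas as pd

# Made-up rows in an assumed pipe-delimited layout with no header line
sample = "1|24|M|technician\n2|53|F|other\n"

# Hypothetical column names chosen for this illustration
cols = ["user_id", "age", "gender", "occupation"]

# sep sets the delimiter, header=None says the first line is data,
# and names supplies the column labels
users = pd.read_csv(io.StringIO(sample), sep="|", header=None, names=cols)
print(users)
```

The extension plays no role here: `read_csv()` only cares about the delimiter and header settings it is given.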

## Load Data from Excel files

__Pandas Method:__ `read_excel()`

__Supported Formats:__ __xls, xlsx, xlsm, xlsb, odf, ods__ and __odt__

<br>

__Note:__ If you get an error (like the one below), you might be missing some dependencies.

__Error Message:__ `ImportError: Missing optional dependency 'xlrd'. Install xlrd >= 1.0.0 for Excel support ...`
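
One way to check for such dependencies before calling `read_excel()` is sketched below. It assumes the two engines commonly used for Excel files, `xlrd` (for `.xls`) and `openpyxl` (for `.xlsx`); other formats may need other packages.

```python
import importlib.util

# Assumed engines of interest: xlrd for .xls files, openpyxl for .xlsx files
engines = {"xlrd": ".xls", "openpyxl": ".xlsx"}

# find_spec returns None when a package is not installed
missing = [name for name in engines if importlib.util.find_spec(name) is None]

for name in missing:
    print(f"Missing '{name}' (used for {engines[name]} files); try: pip install {name}")

if not missing:
    print("Both Excel engines are available.")
```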

In [None]:
# Path to the Excel file
dataset = 'titanic.xls'

# Load data from the Excel file



---
# Slightly more advanced topics



# Write CSV Data

After manipulating a DataFrame, you will often want to save your progress to a file.

__Pandas Method:__ `.to_csv()`

In [None]:
# Remember the Iris dataset
iris

In [None]:
# Let's write it into a new file - with and w/o index


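A minimal sketch of the difference, using a small made-up DataFrame as a stand-in for `iris` and temporary files with hypothetical names:

```python
import os
import tempfile

import pandas as pd

# Made-up stand-in for the iris DataFrame
df = pd.DataFrame({"sepal_length": [5.1, 4.9], "species": ["setosa", "setosa"]})

tmp_dir = tempfile.mkdtemp()
with_index_fp = os.path.join(tmp_dir, "with_index.csv")  # hypothetical names
no_index_fp = os.path.join(tmp_dir, "no_index.csv")

df.to_csv(with_index_fp)             # index is written as a first, unnamed column
df.to_csv(no_index_fp, index=False)  # only the data columns are written

# Reading the files back shows the extra column left by the index
cols_with_index = list(pd.read_csv(with_index_fp).columns)
cols_no_index = list(pd.read_csv(no_index_fp).columns)
print(cols_with_index)  # ['Unnamed: 0', 'sepal_length', 'species']
print(cols_no_index)    # ['sepal_length', 'species']
```

Unless the index carries meaningful labels, `index=False` usually gives the cleaner file.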

---
# Working with large amounts of data


__At some point you will deal with a file bigger than your RAM.__

* Imagine you get a __20GB__ data file.
* Imagine your computer's RAM is __8GB__.

<br>

Now what?

<br>

No need to ask for a new computer just yet!

<br>

We will load the data in __chunks__.

In [None]:
# Load the wine dataset in chunks (using a for-loop and chunksize)


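The idea can be sketched on a tiny in-memory sample (ten made-up rows standing in for a file that would not fit in RAM). With `chunksize` set, `read_csv()` returns an iterator that yields one DataFrame per chunk:

```python
import io

import pandas as pd

# Made-up data: 10 rows standing in for a file too large for memory
sample = "value\n" + "\n".join(str(i) for i in range(10)) + "\n"

total = 0
chunk_sizes = []

# Only one chunk (here at most 4 rows) is held in memory at a time
for chunk in pd.read_csv(io.StringIO(sample), chunksize=4):
    chunk_sizes.append(len(chunk))
    total += int(chunk["value"].sum())

print(chunk_sizes)  # [4, 4, 2]
print(total)        # 45
```

The pattern is to compute a partial result per chunk and combine the partial results, rather than collecting all chunks back into one DataFrame.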

## Other functions when reading data

`.read_csv()` has far too many options to cover them all here.

In [None]:
# Have a look at the documentation
pd.read_csv?

In [None]:
# Medium difficulty: header, skiprows, nrows, index_col

pd.read_csv('pokemon.csv', skiprows=5)
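
A short sketch of how these four parameters interact, using a made-up in-memory sample (the layout of `pokemon.csv` is not shown here, so the rows and column names below are assumptions):

```python
import io

import pandas as pd

# Made-up file: two junk lines, then a header line, then three data rows
sample = (
    "exported by some tool\n"
    "do not edit\n"
    "name,type,hp\n"
    "A,grass,45\n"
    "B,fire,39\n"
    "C,water,44\n"
)

# skiprows=2 drops the junk lines, so the third line becomes the header;
# nrows=2 reads only two data rows; index_col uses "name" as the index
subset = pd.read_csv(io.StringIO(sample), skiprows=2, nrows=2, index_col="name")
print(subset)

# Skipping the header line too and passing header=None gives
# integer column labels (0, 1, 2) instead of names
no_header = pd.read_csv(io.StringIO(sample), skiprows=3, header=None)
print(no_header)
```

Note that an integer `skiprows` counts lines from the top of the file, so it can skip the header line as well.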