## Flat Files
**Flat files** are basic text files that contain records or table data without any structured relationships.

Each row is a record. Each column is an **attribute** (or **feature**), identified by its column name.

Reading a text file:
```
filename = 'huck_finn.txt'
file = open(filename, mode='r')  # 'r' opens the file for reading
text = file.read()
file.close()
```

Mode can be **r** for read and **w** for write.

To avoid calling `file.close()` explicitly, use a context manager, which closes the file automatically when the block ends:
```
with open('huck_finn.txt', 'r') as file:
    print(file.read())
```

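Functions, methods, and attributes for files:
* **open(`filename`, mode=`mode`)** - built-in function that opens a file and returns a file object
* **.read()** - reads the entire file
* **.readline()** - reads a single line of the file
* **.close()** - closes the file
* **.closed** - attribute (not a method) that is `True` once the file is closed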
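
The calls listed above can be tried on a throwaway file. This is a minimal sketch: the file name `sample.txt` and its two lines are made up for illustration, and the file is written to a temporary directory so nothing in the working directory is touched.

```python
import os
import tempfile

# Write a small two-line file to a temporary location (hypothetical content).
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, mode='w') as file:
    file.write('first line\nsecond line\n')

# Read it back one line at a time.
file = open(path, mode='r')
first = file.readline()    # each call returns the next line, newline included
second = file.readline()
print(file.closed)         # the file is still open here
file.close()
print(file.closed)         # closed is an attribute, so no parentheses

print(first.strip(), '|', second.strip())
```

Note that `readline()` keeps the trailing newline, which is why the lines are stripped before printing.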

Before opening a flat file, check if a header exists.

## Numpy for flatfiles

For basic imports use **np.loadtxt(`filename`, delimiter=`delimiter`, dtype=`datatype`)**

Delimiters: **','** for comma and **'\t'** for tab

```
import numpy as np

filename = 'MNIST.csv'
data = np.loadtxt(filename, delimiter=',')
print(data)
```

Output:
```
[[1. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [1. 0. 0. ... 0. 0. 0.]
 ...
 [2. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [5. 0. 0. ... 0. 0. 0.]]
```
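
When the file has a header row or only some columns are needed, `np.loadtxt` also takes `skiprows` and `usecols`. The sketch below is illustrative only: it reads from an in-memory string instead of a file, and the column names and values are invented.

```python
import io
import numpy as np

# Hypothetical tab-delimited data with a header row (stands in for a file on disk).
raw = io.StringIO("label\tpixel0\tpixel1\n1\t0\t255\n0\t128\t64\n")

# skiprows=1 skips the header line; usecols selects columns by position.
data = np.loadtxt(raw, delimiter='\t', skiprows=1, usecols=[0, 2], dtype=int)

print(data.shape)
print(data)
```

Skipping the header matters here because `np.loadtxt` would otherwise try to convert the column names to numbers and fail.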


#### np.genfromtxt(`file`)
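Handles files whose columns have different datatypes. The first parameter is the filename.
Parameters:
* delimiter - the character that separates values
* names - `True` if the first row is a header holding the column names
* dtype - if set to `None`, the type of each column is determined automatically

With `names=True` the result is a structured array: `data[i]` returns row `i`, and data[`column name`] returns a whole column.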
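
A small sketch of the row and column access described above. The data is invented and read from an in-memory string. `encoding='utf-8'` is passed so that text columns come back as ordinary strings rather than bytes.

```python
import io
import numpy as np

# Hypothetical mixed-type CSV with a header row.
raw = io.StringIO("name,age,height\nAda,36,1.65\nLinus,28,1.80\n")

data = np.genfromtxt(raw, delimiter=',', names=True, dtype=None, encoding='utf-8')

print(data.dtype.names)  # column names taken from the header
print(data[0])           # first row (one record)
print(data['age'])       # the whole 'age' column
```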

#### np.recfromcsv(`file`)
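This is np.genfromtxt(`file`) with the defaults `delimiter=','`, `names=True`, and `dtype=None`. NumPy 2.0 and later no longer expose it as `np.recfromcsv`, so on recent versions call `np.genfromtxt` with these arguments directly.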

## Pandas with flat files

```
import pandas as pd
filename = 'MNIST.csv'
data = pd.read_csv(filename)
```

Parameters of pd.read_csv:
* nrows - number of rows to read
* header - row number to use for the column names, or `None` if the file has no header
* sep - delimiter
* comment - the character that starts a comment; the rest of the line is ignored
* na_values - list of strings to recognize as NA/NaN
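
The parameters above can be combined in one call. The following is a sketch on invented data: a semicolon-delimited log with no header, a comment line, and the string `missing` standing in for absent values.

```python
import io
import pandas as pd

# Hypothetical file contents; io.StringIO stands in for a file on disk.
raw = io.StringIO(
    "# hypothetical sensor log\n"
    "1;0.5;missing\n"
    "2;0.7;3.1\n"
    "3;0.9;2.8\n"
    "4;1.1;2.5\n"
)

df = pd.read_csv(
    raw,
    sep=';',                 # values are separated by semicolons
    header=None,             # no header row: columns are numbered 0, 1, 2
    nrows=3,                 # read only the first three data rows
    comment='#',             # ignore everything after '#'
    na_values=['missing'],   # treat 'missing' as NaN
)
print(df)
```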

## Pickled Files

**Pickled files** are serialized: a Python object is stored as a sequence of bytes (a bytestream) and can be loaded back later.

```
import pickle

filename = 'data.pkl'  # hypothetical file name
with open(filename, 'rb') as file:
    data = pickle.load(file)
print(data)
```

The mode `'rb'` opens the file as read-only binary.
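
To see both directions, here is a minimal round trip: an object is written with `pickle.dump` in `'wb'` mode and read back with `pickle.load` in `'rb'` mode. The dictionary and the file name `record.pkl` are made up for illustration.

```python
import os
import pickle
import tempfile

# Hypothetical object to serialize.
record = {'city': 'Oslo', 'temps': [3.5, 4.1, 2.8]}
path = os.path.join(tempfile.mkdtemp(), 'record.pkl')

# 'wb' = write binary: serialize the object to a bytestream on disk.
with open(path, 'wb') as file:
    pickle.dump(record, file)

# 'rb' = read binary: rebuild an equal object from the bytestream.
with open(path, 'rb') as file:
    restored = pickle.load(file)

print(restored == record)
```

Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.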

## Pandas with Excel

```
import pandas as pd
file = 'urbanpop.xlsx'
data = pd.ExcelFile(file)
print(data.sheet_names)
```

To get a particular sheet:
* `df = data.parse(sheet_name)` (sheet name as a string)
* `df = data.parse(int)` (sheet index as int)

Parameters of `parse`:
* usecols - which columns to parse (list of ints)
* skiprows - rows to skip (list of ints)
* names - column names to use (list of str)

## Current Working Directory
```
import os
wd = os.getcwd()
os.listdir(wd)
```

## Pandas with SAS and Stata

For SAS:
```
import pandas as pd
from sas7bdat import SAS7BDAT

with SAS7BDAT('urbanpop.sas7bdat') as file:
    df_sas = file.to_data_frame()
```

For Stata:
```
import pandas as pd
data = pd.read_stata('urbanpop.dta')
```

## Pandas with HDF5

**HDF5** stands for Hierarchical Data Format version 5. The files are opened here with the `h5py` package.

```
import h5py
filename = 'filename.hdf5'
data = h5py.File(filename, 'r')
```

To get keys:
```
for key in data.keys():
    print(key)
```

To access values of a key, use numpy:
```
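import numpy as np
# 'meta' is one of the keys listed above; column_name is a dataset name inside it
values = np.array(data['meta'][column_name])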
```

## Pandas with MATLAB

```
import scipy.io
filename = 'workspace.mat'
mat = scipy.io.loadmat(filename)
print(type(mat))  # a dict whose keys are the MATLAB variable names
```

## Relational Databases

### SQLite with SQLAlchemy

Firing up the database:
```
from sqlalchemy import create_engine
engine = create_engine('sqlite:///Northwind.sqlite')
```

Getting table names:
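`table_names = engine.table_names()` (SQLAlchemy 1.3 and earlier; from 1.4 on, use `inspect(engine).get_table_names()` after `from sqlalchemy import inspect`)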

Connecting to engine:
`con = engine.connect()`

Executing queries:
`rs = con.execute(text("SELECT * FROM Orders"))` (after `from sqlalchemy import text`; SQLAlchemy 2.0 no longer accepts a plain string here)

Saving the result set to a Pandas dataframe:
`df = pd.DataFrame(rs.fetchall())`

Assigning Column Names:
`df.columns = rs.keys()`

Closing the connection:
`con.close()`

#### Using Context Manager:
```
from sqlalchemy import text

with engine.connect() as con:
    rs = con.execute(text("SELECT OrderID, OrderDate FROM Orders"))
    df = pd.DataFrame(rs.fetchmany(size=5))  # first 5 rows only
    df.columns = rs.keys()
```

#### 1-liner:
`df = pd.read_sql_query("SELECT * FROM Orders", engine)`
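
The steps in this section can be sketched end to end without the Northwind file. This example assumes an in-memory SQLite database and a made-up two-row `Orders` table. It wraps queries in `text()` and uses `inspect()` for table names, which should work on both older and current SQLAlchemy versions.

```python
import pandas as pd
from sqlalchemy import create_engine, inspect, text

# In-memory SQLite database; the table and its rows are invented for illustration.
engine = create_engine('sqlite://')
with engine.begin() as con:  # begin() commits automatically when the block exits
    con.execute(text("CREATE TABLE Orders (OrderID INTEGER, OrderDate TEXT)"))
    con.execute(text("INSERT INTO Orders VALUES (1, '2020-01-01'), (2, '2020-01-02')"))

# List the tables in the database.
table_names = inspect(engine).get_table_names()

# Run a query and collect rows and column names before the connection closes.
with engine.connect() as con:
    rs = con.execute(text("SELECT OrderID, OrderDate FROM Orders"))
    columns = list(rs.keys())
    rows = [tuple(row) for row in rs.fetchall()]

df = pd.DataFrame(rows, columns=columns)
print(table_names)
print(df)
```

Converting each row to a plain tuple keeps the DataFrame construction independent of how a given SQLAlchemy version represents result rows.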