# Python Pandas I/O
### Creating DataFrames, Reading and Writing to CSV & JSON files  
[Documentation](https://pandas.pydata.org/docs/index.html)

In [1]:
import numpy as np
import pandas as pd
import random

### Creating DataFrames from Lists and Dicts
▶ New DataFrame from a **List**  
Pandas automatically assigns an integer row index (0, 1, 2, ...) and names the single column 0.

In [2]:
data1 = [random.random() for i in range(10000)]
df = pd.DataFrame(data1)
df.head()

Unnamed: 0,0
0,0.027684
1,0.174996
2,0.743153
3,0.628041
4,0.658552


▶ New DataFrame from a **2D List**  
Each sublist becomes a row, and column names default to integers.

In [3]:
data2 = [[i, random.randint(10,99)] for i in 'ABCDEFGHIJKLMNOPQRSTUVWXYZ']
df = pd.DataFrame(data2)
df.head()

Unnamed: 0,0,1
0,A,28
1,B,78
2,C,85
3,D,65
4,E,98
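Because the integer column labels above are rarely what you want, here is a minimal sketch of naming the columns at construction time with the `columns` parameter. The sample rows and the names `Letter` and `Value` are illustrative assumptions, not part of the original data.

```python
import pandas as pd

# Hypothetical sample rows; each inner list becomes one row.
rows = [['A', 28], ['B', 78], ['C', 85]]

# columns= supplies names in place of the default integer labels 0 and 1.
df_named = pd.DataFrame(rows, columns=['Letter', 'Value'])
print(df_named)
```

Columns can then be selected by name, for example `df_named['Value']`.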


▶ New DataFrame from a **Dictionary**  
Dictionary keys become column names.

In [4]:
data3 = {
    'Model':['T57','T61','T64','T65'],
    'Price':[1.42,1.48,1.73,1.95],
    'Size':['57 inches','61 inches','64 inches','65 inches']}
df = pd.DataFrame(data3)
df

Unnamed: 0,Model,Price,Size
0,T57,1.42,57 inches
1,T61,1.48,61 inches
2,T64,1.73,64 inches
3,T65,1.95,65 inches


Change the previous example to use the model number as the index.  

In [5]:
df = pd.DataFrame(
    {'Price':data3['Price'],'Size':data3['Size']}, 
    index=data3['Model'])
df

Unnamed: 0,Price,Size
T57,1.42,57 inches
T61,1.48,61 inches
T64,1.73,64 inches
T65,1.95,65 inches
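An alternative sketch of the same idea: build the DataFrame from the whole dictionary first, then promote the Model column to the index with `set_index`. The variable name `df_by_model` is an assumption used only for illustration.

```python
import pandas as pd

data3 = {
    'Model': ['T57', 'T61', 'T64', 'T65'],
    'Price': [1.42, 1.48, 1.73, 1.95],
    'Size': ['57 inches', '61 inches', '64 inches', '65 inches']}

# set_index returns a new DataFrame whose row labels come from the Model column.
df_by_model = pd.DataFrame(data3).set_index('Model')

# Rows can now be looked up by model number with .loc.
print(df_by_model.loc['T61', 'Price'])
```

This avoids rebuilding the dictionary by hand, at the cost of one extra method call.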


▶ New DataFrame from a List of Dictionaries  
Note that the missing Len value in the second row is populated with NaN (not a number).

In [6]:
data4 = [
    {'Ht':63, 'Len':45, 'Wt':2.6}, 
    {'Ht':29, 'Wt':1.7},
    {'Ht':37, 'Len':71, 'Wt':4.2}]
df = pd.DataFrame(data4)
df

Unnamed: 0,Ht,Len,Wt
0,63,45.0,2.6
1,29,,1.7
2,37,71.0,4.2
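The output above shows Len as 45.0 and 71.0 rather than 45 and 71. A short sketch of why: NaN is a floating-point value, so pandas stores a column containing it as floats. The snippet below checks the column types and locates the missing entry with `isna`.

```python
import pandas as pd

data4 = [
    {'Ht': 63, 'Len': 45, 'Wt': 2.6},
    {'Ht': 29, 'Wt': 1.7},
    {'Ht': 37, 'Len': 71, 'Wt': 4.2}]
df = pd.DataFrame(data4)

# Ht stays an integer column; Len is upcast to float because it holds NaN.
print(df.dtypes)

# isna() marks the missing entries with True.
len_missing = df['Len'].isna().tolist()
print(len_missing)
```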


### Reading & Writing DataFrames to CSV Files
[Documentation](https://pandas.pydata.org/docs/user_guide/io.html#csv-text-files) of numerous optional parameters.  

▶ Write DataFrame to CSV file 

In [7]:
df = pd.DataFrame(data4)
df.to_csv('outfile.csv', index=False)  # add sep=';' to use a semicolon delimiter

▶ Read CSV file into DataFrame  
Missing numerical values are read back as NaN by default.

In [8]:
df = pd.read_csv('outfile.csv')
df

Unnamed: 0,Ht,Len,Wt
0,63,45.0,2.6
1,29,,1.7
2,37,71.0,4.2
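As a sketch of the optional `sep` parameter, the round trip below writes with a semicolon delimiter and reads it back with the same setting. It uses an in-memory `io.StringIO` buffer instead of a file, which is an assumption made only to keep the example self-contained.

```python
import io
import pandas as pd

data4 = [
    {'Ht': 63, 'Len': 45, 'Wt': 2.6},
    {'Ht': 29, 'Wt': 1.7},
    {'Ht': 37, 'Len': 71, 'Wt': 4.2}]
df = pd.DataFrame(data4)

# Write to an in-memory buffer using ';' as the delimiter.
buffer = io.StringIO()
df.to_csv(buffer, index=False, sep=';')
csv_text = buffer.getvalue()
print(csv_text)

# The same sep must be passed when reading, or the row is parsed as one column.
df_back = pd.read_csv(io.StringIO(csv_text), sep=';')
print(df_back)
```

The missing Len value is written as an empty field and comes back as NaN.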


▶ Convert DataFrame to a string with to_string

In [9]:
df = pd.DataFrame(data4)
d4str = df.to_string()
d4str

'   Ht   Len   Wt\n0  63  45.0  2.6\n1  29   NaN  1.7\n2  37  71.0  4.2'

### Reading & Writing DataFrames to JSON files
[Documentation](https://pandas.pydata.org/docs/user_guide/io.html#json) of numerous optional parameters.  

▶ Convert DataFrame to **JSON** string  
With no arguments, the default is orient='columns', which structures the JSON as {column -> {index -> value}}.

In [10]:
data4_json = df.to_json()
data4_json

'{"Ht":{"0":63,"1":29,"2":37},"Len":{"0":45.0,"1":null,"2":71.0},"Wt":{"0":2.6,"1":1.7,"2":4.2}}'

Use orient='index' to structure the json by rows, {index -> {column -> value}}.  
You can also strip out the row indices by using orient='records'.

In [11]:
data4_json = df.to_json(orient='index')
data4_json

'{"0":{"Ht":63,"Len":45.0,"Wt":2.6},"1":{"Ht":29,"Len":null,"Wt":1.7},"2":{"Ht":37,"Len":71.0,"Wt":4.2}}'
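The orient='records' option mentioned above is not demonstrated in the cells, so here is a minimal sketch. It parses the result with the standard `json` module purely to make the structure easy to inspect; the variable names are illustrative.

```python
import json
import pandas as pd

data4 = [
    {'Ht': 63, 'Len': 45, 'Wt': 2.6},
    {'Ht': 29, 'Wt': 1.7},
    {'Ht': 37, 'Len': 71, 'Wt': 4.2}]
df = pd.DataFrame(data4)

# orient='records' produces a JSON array of row objects, with no index labels.
records_json = df.to_json(orient='records')
print(records_json)

# Parsing it back gives a plain list of dicts; NaN was written as null (None).
records = json.loads(records_json)
print(records[1])
```

This layout is often convenient when the row index carries no meaning and the consumer expects a list of objects.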

▶ Write to a text file in JSON format.

In [12]:
df.to_json('outjson.txt')  # returns None when given a path; the JSON is written to the file

▶ Read the same JSON data back into a DataFrame.

In [13]:
df = pd.read_json('outjson.txt')  # use a new name so the original data4 list is not overwritten
df

Unnamed: 0,Ht,Len,Wt
0,63,45.0,2.6
1,29,,1.7
2,37,71.0,4.2
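The same read can be sketched without touching the disk by wrapping a JSON string in `io.StringIO`. This is an illustrative variation, assuming the orient='records' layout; the `orient` passed to `read_json` should match the one used when writing.

```python
import io
import pandas as pd

data4 = [
    {'Ht': 63, 'Len': 45, 'Wt': 2.6},
    {'Ht': 29, 'Wt': 1.7},
    {'Ht': 37, 'Len': 71, 'Wt': 4.2}]
df = pd.DataFrame(data4)

# Serialize to a JSON string, then read it back from an in-memory buffer.
json_text = df.to_json(orient='records')
df_back = pd.read_json(io.StringIO(json_text), orient='records')
print(df_back)
```

The null written for the missing Len value is read back as NaN.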
