## CSV to Pandas Data Frame

Let us see how we can create **Pandas Data Frames** using data from files. `read_csv` is the most commonly used API for creating a Data Frame from a delimited text file.
* Here are some of the important options.
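  * `sep` or `delimiter`: the character that separates fields (`,` by default)
  * `header` or `names`: the row to use as column names, or an explicit list of column names
  * `index_col`: the column or columns to use as the row index
  * `dtype`: the data type for all columns or for individual columns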
  * and many more
* Pandas provides several other APIs that we can use to create Data Frames.
  * `read_fwf`
  * `read_table`
  * `pandas.io.json`
  * and more
* Here is how we can create a Data Frame for the orders dataset.
  * The delimiter in our data is **,** which is the default for Pandas `read_csv`.
  * The file has no header line, hence we have to set the keyword argument `header` to `None`.
  * We can pass the column names as a list using the keyword argument `names`.
  * Data types of each column are typically inferred from the data; however, we can specify them explicitly using `dtype`.
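
The options listed above can be tried out without access to the actual file. The following is a minimal sketch that reads a few rows from an in-memory string instead of `/data/retail_db/orders/part-00000`. The names `sample_text`, `sample_schema`, `sample_orders` and `indexed_orders` are made up for this illustration, and the three rows are copied from the orders output shown later in this notebook.

```python
import io
import pandas as pd

# Three comma-delimited rows without a header line,
# mimicking the layout of the orders file.
sample_text = (
    "1,2013-07-25 00:00:00.0,11599,CLOSED\n"
    "2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT\n"
    "3,2013-07-25 00:00:00.0,12111,COMPLETE\n"
)

# Column names supplied by us, because the data has no header.
sample_schema = ["order_id", "order_date", "order_customer_id", "order_status"]

sample_orders = pd.read_csv(
    io.StringIO(sample_text),
    sep=",",              # the default separator, shown only for clarity
    header=None,          # the first line is data, not column names
    names=sample_schema,  # use our list as the column names
    dtype={               # explicit types instead of inference
        "order_id": "int64",
        "order_customer_id": "int64",
        "order_status": "str",
    },
)

# Same data, but with order_id used as the row index.
indexed_orders = pd.read_csv(
    io.StringIO(sample_text),
    header=None,
    names=sample_schema,
    index_col="order_id",
)
```

Inspecting `sample_orders.dtypes` shows the type of each column, and `indexed_orders.loc[2]` looks up an order by its id rather than by position.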
 

In [2]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/Vh0HMbIdZs0?rel=0&amp;controls=1&amp;showinfo=0" frameborder="0" allowfullscreen></iframe>

 
```{note}
We will be running this notebook from other notebooks to create orders and order_items data frames while exploring Pandas libraries. 

Make sure you comment out all the informational lines, so that output is not printed when we invoke this notebook from other notebooks.
```

In [4]:
import pandas as pd

In [None]:
# pd.read_csv?

In [None]:
%%sh

# ls -ltr /data/retail_db/orders/part-00000

In [None]:
%%sh

# tail /data/retail_db/orders/part-00000

In [None]:
%%sh

# head /data/retail_db/orders/part-00000

In [8]:
orders_path = "/data/retail_db/orders/part-00000"

In [7]:
orders_schema = [
    "order_id",
    "order_date",
    "order_customer_id",
    "order_status"
]

In [9]:
orders = pd.read_csv(orders_path,
                     delimiter=',',
                     header=None,
                     names=orders_schema
                    )

In [10]:
orders

| | order_id | order_date | order_customer_id | order_status |
|---|---|---|---|---|
| 0 | 1 | 2013-07-25 00:00:00.0 | 11599 | CLOSED |
| 1 | 2 | 2013-07-25 00:00:00.0 | 256 | PENDING_PAYMENT |
| 2 | 3 | 2013-07-25 00:00:00.0 | 12111 | COMPLETE |
| 3 | 4 | 2013-07-25 00:00:00.0 | 8827 | CLOSED |
| 4 | 5 | 2013-07-25 00:00:00.0 | 11318 | COMPLETE |
| ... | ... | ... | ... | ... |
| 68878 | 68879 | 2014-07-09 00:00:00.0 | 778 | COMPLETE |
| 68879 | 68880 | 2014-07-13 00:00:00.0 | 1117 | COMPLETE |
| 68880 | 68881 | 2014-07-19 00:00:00.0 | 2518 | PENDING_PAYMENT |
| 68881 | 68882 | 2014-07-22 00:00:00.0 | 10000 | ON_HOLD |
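
The `order_date` values above look like timestamps, but `read_csv` leaves such a column as text unless we ask for a conversion. As a hedged sketch, the `parse_dates` keyword argument of `read_csv` can be used for that. The example reads three rows copied from the output above from an in-memory string; `date_sample_text`, `date_sample_schema`, `dates_as_text` and `dates_parsed` are names made up for this illustration.

```python
import io
import pandas as pd

# Three rows copied from the orders output (no header line).
date_sample_text = (
    "1,2013-07-25 00:00:00.0,11599,CLOSED\n"
    "2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT\n"
    "3,2013-07-25 00:00:00.0,12111,COMPLETE\n"
)
date_sample_schema = ["order_id", "order_date", "order_customer_id", "order_status"]

# Default behaviour: order_date is kept as text.
dates_as_text = pd.read_csv(
    io.StringIO(date_sample_text),
    header=None,
    names=date_sample_schema,
)

# With parse_dates, the listed columns are converted to datetime values.
dates_parsed = pd.read_csv(
    io.StringIO(date_sample_text),
    header=None,
    names=date_sample_schema,
    parse_dates=["order_date"],
)
```

With a datetime column, accessors such as `dates_parsed["order_date"].dt.year` become available. Whether to parse dates while reading or later is a design choice; this notebook keeps the default.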


In [None]:
# orders.head(10)

In [None]:
order_items_path = "/data/retail_db/order_items/part-00000"

In [None]:
%%sh

# ls -ltr /data/retail_db/order_items/part-00000

In [None]:
%%sh

# tail /data/retail_db/order_items/part-00000

In [None]:
%%sh

# head /data/retail_db/order_items/part-00000

In [None]:
order_items_schema = [
    "order_item_id",
    "order_item_order_id",
    "order_item_product_id",
    "order_item_quantity",
    "order_item_subtotal",
    "order_item_product_price"
]

In [None]:
order_items = pd.read_csv(order_items_path,
                          delimiter=',',
                          header=None,
                          names=order_items_schema
                         )

In [None]:
# order_items

In [None]:
# order_items.head(10)

In [2]:
customers_path = "/data/retail_db/customers/part-00000"

In [1]:
customer_schema = [
    "customer_id",
    "customer_fname",
    "customer_lname",
    "customer_email",
    "customer_password",
    "customer_street",
    "customer_city",
    "customer_state",
    "customer_zipcode"
]

In [5]:
customers = pd.read_csv(customers_path,
                        delimiter=',',
                        header=None,
                        names=customer_schema
                       )

In [6]:
customers

| | customer_id | customer_fname | customer_lname | customer_email | customer_password | customer_street | customer_city | customer_state | customer_zipcode |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Richard | Hernandez | XXXXXXXXX | XXXXXXXXX | 6303 Heather Plaza | Brownsville | TX | 78521 |
| 1 | 2 | Mary | Barrett | XXXXXXXXX | XXXXXXXXX | 9526 Noble Embers Ridge | Littleton | CO | 80126 |
| 2 | 3 | Ann | Smith | XXXXXXXXX | XXXXXXXXX | 3422 Blue Pioneer Bend | Caguas | PR | 725 |
| 3 | 4 | Mary | Jones | XXXXXXXXX | XXXXXXXXX | 8324 Little Common | San Marcos | CA | 92069 |
| 4 | 5 | Robert | Hudson | XXXXXXXXX | XXXXXXXXX | 10 Crystal River Mall | Caguas | PR | 725 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 12430 | 12431 | Mary | Rios | XXXXXXXXX | XXXXXXXXX | 1221 Cinder Pines | Kaneohe | HI | 96744 |
| 12431 | 12432 | Angela | Smith | XXXXXXXXX | XXXXXXXXX | 1525 Jagged Barn Highlands | Caguas | PR | 725 |
| 12432 | 12433 | Benjamin | Garcia | XXXXXXXXX | XXXXXXXXX | 5459 Noble Brook Landing | Levittown | NY | 11756 |
| 12433 | 12434 | Mary | Mills | XXXXXXXXX | XXXXXXXXX | 9720 Colonial Parade | Caguas | PR | 725 |
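
The Puerto Rico rows above show `customer_zipcode` as `725`. This suggests that the column was inferred as an integer and that leading zeros were dropped. A minimal sketch of how `dtype` can keep such values as text follows. It assumes the raw file stores the zip code as `00725`, which we have not verified here, and the names `zip_sample_text`, `zip_sample_schema`, `zip_inferred` and `zip_as_text` are made up for this illustration.

```python
import io
import pandas as pd

# Two rows based on the customers output, with the Puerto Rico zip code
# written as 00725 (assumption about the raw file contents).
zip_sample_text = (
    "3,Ann,Smith,XXXXXXXXX,XXXXXXXXX,3422 Blue Pioneer Bend,Caguas,PR,00725\n"
    "4,Mary,Jones,XXXXXXXXX,XXXXXXXXX,8324 Little Common,San Marcos,CA,92069\n"
)
zip_sample_schema = [
    "customer_id",
    "customer_fname",
    "customer_lname",
    "customer_email",
    "customer_password",
    "customer_street",
    "customer_city",
    "customer_state",
    "customer_zipcode",
]

# Inferred types: the zip code column becomes an integer, so 00725 turns into 725.
zip_inferred = pd.read_csv(
    io.StringIO(zip_sample_text),
    header=None,
    names=zip_sample_schema,
)

# Explicit type: the zip code column is read as text, so 00725 is preserved.
zip_as_text = pd.read_csv(
    io.StringIO(zip_sample_text),
    header=None,
    names=zip_sample_schema,
    dtype={"customer_zipcode": str},
)
```

Only the column named in the `dtype` dictionary is affected; all other columns are still inferred from the data.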
