# Pandas Demo
This demo covers some of the basic Pandas functionality used in the course. Pandas is a Python library used mainly for data analysis.

In [1]:
import pandas as pd
import numpy as np

## DataFrames
**DataFrames** are two-dimensional, labeled data structures for working with tabular data: each column has a name and each row has an index label. For more information, see the [Pandas documentation for DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html).
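
As a minimal sketch of what a DataFrame holds, the cell below builds a small one from a Python dictionary. The name `df_example` and all values are made up for illustration and are not part of the course data.

```python
import pandas as pd

# Hypothetical data for illustration only: each dictionary key becomes a column
df_example = pd.DataFrame({
    "label": [1.0, -1.0, 1.0],
    "feature 1": [0.2, 0.5, 0.9],
})

print(df_example.shape)          # (number of rows, number of columns)
print(list(df_example.columns))  # column names
print(list(df_example.index))    # default row labels 0, 1, 2
```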


### Reading Data From a CSV File

In [2]:
# read csv file and put into a dataframe
df1 = pd.read_csv('mydf1.csv')

# print first few rows (each row corresponds to a "data point")
df1.head()

| | label | feature 1 | feature 2 |
|---|---|---|---|
| 0 | 0.123 | 0.715279 | -1.5454 |
| 1 | 1.23 | 0.5 | -0.720086 |
| 2 | -1.45 | 0.5 | 0.004291 |
| 3 | 0.51 | 0.433026 | 1.203037 |


In [3]:
# print last 2 rows
df1.tail(2)

| | label | feature 1 | feature 2 |
|---|---|---|---|
| 2 | -1.45 | 0.5 | 0.004291 |
| 3 | 0.51 | 0.433026 | 1.203037 |
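
The contents of `mydf1.csv` are not shown here, so the sketch below reads a small made-up CSV from an in-memory string instead of a file. The names `csv_text` and `df_demo` and the values are assumptions for illustration; `pd.read_csv` accepts a file-like object as well as a path.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a file on disk
csv_text = """label,feature 1,feature 2
1.0,0.25,-0.5
-1.0,0.75,1.5
0.5,0.5,0.0
"""

# The first line of the text is used as the column names
df_demo = pd.read_csv(io.StringIO(csv_text))

print(df_demo.head(2))  # first 2 rows
print(df_demo.tail(1))  # last row
```

Both `head` and `tail` take the number of rows to return and default to 5 when no argument is given.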


### Indexing by Column Name

In [4]:
# select the "label" column (a single column name returns a Series, not a DataFrame) and print it
df1_label = df1["label"]
df1_label.head()

0    0.123
1    1.230
2   -1.450
3    0.510
Name: label, dtype: float64

In [None]:
# create a new DataFrame from the "feature 1" and "feature 2" columns and print it
df1_features = df1[["feature 1","feature 2"]]
df1_features.head()
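
The two indexing forms above return different types, which the following sketch makes explicit. It uses a small made-up DataFrame (`df_demo` is a hypothetical name, not the course data).

```python
import pandas as pd

# Hypothetical data for illustration only
df_demo = pd.DataFrame({
    "label": [1.0, -1.0],
    "feature 1": [0.25, 0.75],
    "feature 2": [-0.5, 1.5],
})

one_col = df_demo["label"]                      # single name -> Series
two_cols = df_demo[["feature 1", "feature 2"]]  # list of names -> DataFrame
one_col_df = df_demo[["label"]]                 # one-element list -> DataFrame

print(type(one_col).__name__, one_col.shape)        # Series (2,)
print(type(two_cols).__name__, two_cols.shape)      # DataFrame (2, 2)
print(type(one_col_df).__name__, one_col_df.shape)  # DataFrame (2, 1)
```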

### Converting to NumPy Arrays

In [5]:
# convert df1_label into a 1D NumPy array
Y = np.array(df1_label)
print("Y: {}".format(Y))
# add a second dimension to turn Y into a 2D array (column vector)
Y = np.expand_dims(Y, axis=1)
print("Y: \n{}".format(Y))

Y: [ 0.123  1.23  -1.45   0.51 ]
Y: 
[[ 0.123]
 [ 1.23 ]
 [-1.45 ]
 [ 0.51 ]]


In [8]:
# drop the "label" column so that only the feature columns remain
df1 = df1.drop(columns="label")
df1.head()

| | feature 1 | feature 2 |
|---|---|---|
| 0 | 0.715279 | -1.5454 |
| 1 | 0.5 | -0.720086 |
| 2 | 0.5 | 0.004291 |
| 3 | 0.433026 | 1.203037 |


In [10]:
# convert the remaining columns of df1 into a 2D NumPy array
X = df1.values
print("X: \n{}".format(X))

X: 
[[ 0.71527897 -1.54540029]
 [ 0.5        -0.72008556]
 [ 0.5         0.00429143]
 [ 0.43302619  1.20303737]]
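
The same conversion can also be written with `to_numpy()` and `reshape`, as sketched below. This is an alternative spelling under the assumption of a small made-up DataFrame (`df_demo`, `X_demo`, and `Y_demo` are hypothetical names), not a replacement for the cells above.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only
df_demo = pd.DataFrame({
    "label": [1.0, -1.0],
    "feature 1": [0.25, 0.75],
    "feature 2": [-0.5, 1.5],
})

# Label column as a column vector: reshape(-1, 1) infers the number of rows
Y_demo = df_demo["label"].to_numpy().reshape(-1, 1)

# Feature matrix: everything except the label column
X_demo = df_demo.drop(columns="label").to_numpy()

print(Y_demo.shape)  # (2, 1)
print(X_demo.shape)  # (2, 2)

# Same result as np.expand_dims on the 1D array
same = np.array_equal(Y_demo, np.expand_dims(np.array(df_demo["label"]), axis=1))
print(same)  # True
```

Note that `drop` returns a new DataFrame here, so `df_demo` itself still contains the `label` column afterwards.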
