# Pandas Demo
This demo covers some of the basic Pandas functionality used in the course. Pandas is a Python library used mainly for data analysis. To learn more, visit the [documentation](here).

In [283]:
import pandas as pd
import numpy as np

## DataFrames
**DataFrames** are data structures for working with tabular data. For more information about DataFrames, click [here](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) to check out the Pandas documentation.
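To make the idea of tabular data concrete, here is a minimal sketch that builds a small DataFrame from a Python dictionary. The values are made up for illustration and only mirror the column names used later in this demo; they are not the course data.

```python
import pandas as pd

# Hypothetical values; each dictionary key becomes a column name
df = pd.DataFrame({
    "label": ["spam", "not spam"],
    "feature 1": [0.7, 0.5],
    "feature 2": [-1.5, 0.0],
})

print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # one dtype per column
```

Each column of a DataFrame is a Series with its own dtype, which is why text and numeric columns can sit side by side.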


### Reading Data From a CSV File

In [287]:
# read csv file from specified path
data = pd.read_csv('../Data/mydf.csv')

# print first few rows
data.head()

| | label | feature 1 | feature 2 |
|---|---|---|---|
| 0 | spam | 0.715279 | -1.5454 |
| 1 | not spam | 0.5 | -0.720086 |
| 2 | not spam | 0.5 | 0.004291 |
| 3 | spam | 0.433026 | 1.203037 |


In [288]:
# print last 2 rows
data.tail(2)

| | label | feature 1 | feature 2 |
|---|---|---|---|
| 2 | not spam | 0.5 | 0.004291 |
| 3 | spam | 0.433026 | 1.203037 |
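The course CSV file is not included here, so the sketch below reads comparable data from an in-memory string instead. The CSV text and the variable names (`csv_text`, `demo`) are assumptions made for illustration. It also shows that `head` and `tail` accept the number of rows to return.

```python
import io
import pandas as pd

# Made-up CSV text standing in for a file on disk
csv_text = """label,feature 1,feature 2
spam,0.7,-1.5
not spam,0.5,-0.7
not spam,0.5,0.0
spam,0.4,1.2
"""

# read_csv accepts a file path or a file-like object such as StringIO
demo = pd.read_csv(io.StringIO(csv_text))

first_two = demo.head(2)  # first 2 rows
last_one = demo.tail(1)   # last row only

print(first_two)
print(last_one)
```

With no argument, `head()` and `tail()` return up to 5 rows.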


### Indexing by Column Name

In [301]:
# obtain the "label" column and assign it to a variable called labels
labels = data['label']

labels

0        spam
1    not spam
2    not spam
3        spam
Name: label, dtype: object
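Indexing with a single column name returns a Series, while indexing with a list of names returns a DataFrame. The sketch below illustrates the difference on a small made-up DataFrame (`demo` is a hypothetical stand-in, not the course data).

```python
import pandas as pd

# Hypothetical data with the same column names as the demo
demo = pd.DataFrame({
    "label": ["spam", "not spam"],
    "feature 1": [0.7, 0.5],
    "feature 2": [-1.5, 0.0],
})

one_col = demo["label"]                      # single name -> Series
two_cols = demo[["feature 1", "feature 2"]]  # list of names -> DataFrame

print(type(one_col).__name__)
print(type(two_cols).__name__)
```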

### Encoding/Mapping Values

In [295]:
# encode "not spam" labels as 0s and "spam" labels as 1s
Y = labels.map({"not spam": 0, "spam": 1})

Y

0    1
1    0
2    0
3    1
Name: label, dtype: int64
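One thing to watch for when encoding with `map` and a dictionary: any value that is not a key in the dictionary becomes `NaN`. The sketch below shows this with a made-up Series containing an extra `"unknown"` label, which is an assumption added for illustration.

```python
import pandas as pd

# Hypothetical labels, including one value missing from the mapping
labels_demo = pd.Series(["spam", "not spam", "unknown"], name="label")

encoded = labels_demo.map({"not spam": 0, "spam": 1})

print(encoded)
print(encoded.isna().sum())  # number of labels that were not mapped
```

Checking `isna().sum()` after mapping is a quick way to catch misspelled or unexpected labels.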

### Removing Columns

In [298]:
# drop the label column; the remaining feature columns form the feature matrix X
X = data.drop(columns='label')

X

| | feature 1 | feature 2 |
|---|---|---|
| 0 | 0.715279 | -1.5454 |
| 1 | 0.5 | -0.720086 |
| 2 | 0.5 | 0.004291 |
| 3 | 0.433026 | 1.203037 |
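By default, `drop` returns a new DataFrame and leaves the original unchanged. The sketch below checks this on a small made-up DataFrame (`demo` and `features` are hypothetical names).

```python
import pandas as pd

# Hypothetical data with the same column names as the demo
demo = pd.DataFrame({
    "label": ["spam", "not spam"],
    "feature 1": [0.7, 0.5],
    "feature 2": [-1.5, 0.0],
})

# drop returns a copy without the named column
features = demo.drop(columns="label")

print(list(features.columns))  # label column removed
print(list(demo.columns))      # original still has all columns
```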


### Extracting Values

In [299]:
# extract the values as a NumPy array, without the index and column names
X = X.values

X

array([[ 0.71527897, -1.54540029],
       [ 0.5       , -0.72008556],
       [ 0.5       ,  0.00429143],
       [ 0.43302619,  1.20303737]])
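The Pandas documentation recommends `DataFrame.to_numpy()` over `.values` for this conversion. The sketch below uses a small made-up DataFrame (`features` is a hypothetical name) to show that both give the same array, and inspects its shape and dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical feature columns
features = pd.DataFrame({
    "feature 1": [0.7, 0.5],
    "feature 2": [-1.5, 0.0],
})

arr = features.to_numpy()  # NumPy array without index or column names

print(arr.shape)  # (rows, columns)
print(arr.dtype)
print(np.array_equal(arr, features.values))  # same contents as .values
```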