# Hacktivist Session: Pandas

## What is Pandas
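Pandas is basically Excel for Python: a library for loading, cleaning, and analyzing tabular data, and an essential tool for data science. It has also become a standard for data I/O: many other libraries accept or return pandas objects, and it is the usual way to read in external data.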

## Installation and Importing

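Pandas comes preinstalled with the Anaconda distribution. Otherwise, install it with either conda or pip:
```
conda install pandas    # if you use Anaconda/Miniconda
pip install pandas      # if you use plain Python
```
To use pandas, import it (by convention as `pd`):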

```python
import pandas as pd
import numpy as np  # usually a good idea to import NumPy as well
```

## Usage

### Series
A Series is very similar to a list or NumPy array, except that every value has an index label. Lists, NumPy arrays, and dictionaries can all be turned into a Series.

> If a dictionary is turned into a Series, its keys become the index.
>
> A Series can hold many types of data, including:
> - Floats (`float64`)
> - Datetimes (`datetime64`)
> - Strings
> - Booleans (`bool`)
> - Integers (`int64`)
> - Arbitrary Python objects, even functions

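To see these types in practice, here is a small sketch that builds one Series per kind of data and prints the dtype pandas infers for each. The sample values are made up, and the exact dtype names (for strings and datetimes in particular) can differ between pandas versions.

```python
import pandas as pd

# One Series per kind of data; pandas infers a dtype for each (values are made up)
floats = pd.Series([1.5, 2.5])
dates = pd.Series(pd.to_datetime(["2021-01-01", "2021-06-30"]))
strings = pd.Series(["spam", "eggs"])
bools = pd.Series([True, False])
ints = pd.Series([1, 2])
funcs = pd.Series([len, print])  # functions are stored as generic Python objects

# Print the inferred dtype of each Series
for s in (floats, dates, strings, bools, ints, funcs):
    print(s.dtype)
```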
```python
labels = ['a', 'b', 'c']
my_list = [10, 20, 30]
arr = np.array([10, 20, 30])
d = {'a': 10, 'b': 20, 'c': 30}
```

```python
pd.Series(data=my_list)  # from a list: pandas adds a default integer index (0, 1, 2)
```
```
0    10
1    20
2    30
dtype: int64
```

```python
pd.Series(data=my_list, index=labels)  # data is a list and index is a list
```
```
a    10
b    20
c    30
dtype: int64
```

```python
pd.Series(data=arr, index=labels)  # data is a NumPy array
```
```
a    10
b    20
c    30
dtype: int64
```

```python
pd.Series(d)  # data is a dictionary (its keys become the index)
```
```
a    10
b    20
c    30
dtype: int64
```

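Because every value carries a label, you can look values up by label, and arithmetic between two Series matches values by label rather than by position. A minimal sketch with made-up country labels; a label that appears in only one of the Series produces `NaN` in the result:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['USA', 'Germany', 'Japan'])
s2 = pd.Series([1, 2, 5], index=['USA', 'Germany', 'Italy'])

print(s1['Japan'])  # look up a value by its label -> 3

total = s1 + s2     # values are paired by label, not by position
print(total)        # 'Japan' and 'Italy' appear in only one Series, so they become NaN
```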
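### DataFrames

A DataFrame is a two-dimensional table: a collection of Series, one per column, that all share the same index.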

```python
df = pd.DataFrame(np.random.randn(5, 4),             # a 5x4 matrix of random normal numbers
                  index=["A", "B", "C", "D", "E"],   # row labels
                  columns=["W", "X", "Y", "Z"])      # column labels
df  # the values are random, so yours will differ
```

|   | W | X | Y | Z |
|---|---|---|---|---|
| A | 0.268302 | -0.078997 | 0.090428 | -0.343149 |
| B | 0.595843 | -1.116327 | -1.373282 | -1.501304 |
| C | 0.973139 | 0.318253 | 0.091077 | 0.410944 |
| D | 0.359129 | -1.124782 | 1.236682 | 1.231424 |
| E | -0.872105 | 0.201475 | -0.931926 | 0.937558 |

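A DataFrame can also be built from a dictionary, where each key becomes a column name. This sketch uses made-up values and NumPy's seeded random generator so the numbers are the same on every run; `people` is just an illustrative name. `shape` and `dtypes` are handy for a first look at any DataFrame.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)  # seeded, so every run gives the same numbers

people = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cy"],  # each key becomes a column
        "age": [31, 25, 40],
        "score": rng.normal(size=3),   # 3 random normal numbers
    },
    index=["p1", "p2", "p3"],          # row labels
)

print(people.shape)   # (rows, columns) -> (3, 3)
print(people.dtypes)  # one dtype per column
```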

#### Indexing
Grabbing a column from a DataFrame works like looking up a key in a dictionary: put the column name in square brackets.

```python
df['W']  # grab the 'W' column
```
```
A    0.268302
B    0.595843
C    0.973139
D    0.359129
E   -0.872105
Name: W, dtype: float64
```

```python
df[['W', 'Y']]  # to grab multiple columns, give it a list of column names
```

|   | W | Y |
|---|---|---|
| A | 0.268302 | 0.090428 |
| B | 0.595843 | -1.373282 |
| C | 0.973139 | 0.091077 |
| D | 0.359129 | 1.236682 |
| E | -0.872105 | -0.931926 |

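One detail worth noticing: single brackets with one name return a Series, while double brackets (a list of names) always return a DataFrame, even when the list holds a single column. A quick check on a tiny made-up frame, named `demo` so the tutorial's `df` is left untouched:

```python
import pandas as pd

demo = pd.DataFrame({"W": [1, 2], "Y": [3, 4]}, index=["A", "B"])

col = demo["W"]     # one name -> Series
cols = demo[["W"]]  # a list of names -> DataFrame (here with one column)

print(type(col).__name__)   # Series
print(type(cols).__name__)  # DataFrame
```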

#### Making A New Column/Operations

```python
df['New'] = df['W'] + df['Y']  # element-wise sum of two columns, row by row
df['New']
```
```
A    0.358730
B   -0.777439
C    1.064216
D    1.595811
E   -1.804032
Name: New, dtype: float64
```

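NOTE: A new column can be built from data that comes from anywhere (a list or array with exactly one value per row, or a Series, which is matched to rows by index label), but most of the time you will build it from columns already in the DataFrame.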

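For example, here is a sketch with a tiny made-up frame (`demo` is an illustrative name). A plain list must have exactly one value per row, while a Series is matched to rows by index label, so any row it does not cover gets `NaN`:

```python
import pandas as pd

demo = pd.DataFrame({"W": [1.0, 2.0, 3.0]}, index=["A", "B", "C"])

demo["from_list"] = [10, 20, 30]                       # one value per row, in row order
demo["from_series"] = pd.Series({"C": 300, "A": 100})  # matched by label; 'B' gets NaN
demo["ratio"] = demo["from_list"] / demo["W"]          # derived from existing columns

try:
    demo["bad"] = [1, 2]  # wrong length: pandas refuses the assignment
except ValueError as err:
    print("ValueError:", err)

print(demo)
```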
#### Removing Rows & Columns

```python
df.drop('New', axis=1)  # axis=1 means 'New' is a column label (axis=0 would mean a row)
```

|   | W | X | Y | Z |
|---|---|---|---|---|
| A | 0.268302 | -0.078997 | 0.090428 | -0.343149 |
| B | 0.595843 | -1.116327 | -1.373282 | -1.501304 |
| C | 0.973139 | 0.318253 | 0.091077 | 0.410944 |
| D | 0.359129 | -1.124782 | 1.236682 | 1.231424 |
| E | -0.872105 | 0.201475 | -0.931926 | 0.937558 |


```python
df  # drop() returned a new DataFrame; df itself still has the 'New' column
```

|   | W | X | Y | Z | New |
|---|---|---|---|---|---|
| A | 0.268302 | -0.078997 | 0.090428 | -0.343149 | 0.358730 |
| B | 0.595843 | -1.116327 | -1.373282 | -1.501304 | -0.777439 |
| C | 0.973139 | 0.318253 | 0.091077 | 0.410944 | 1.064216 |
| D | 0.359129 | -1.124782 | 1.236682 | 1.231424 | 1.595811 |
| E | -0.872105 | 0.201475 | -0.931926 | 0.937558 | -1.804032 |


```python
df.drop('New', axis=1, inplace=True)  # inplace=True modifies df itself (and returns None)
```

```python
df
```

|   | W | X | Y | Z |
|---|---|---|---|---|
| A | 0.268302 | -0.078997 | 0.090428 | -0.343149 |
| B | 0.595843 | -1.116327 | -1.373282 | -1.501304 |
| C | 0.973139 | 0.318253 | 0.091077 | 0.410944 |
| D | 0.359129 | -1.124782 | 1.236682 | 1.231424 |
| E | -0.872105 | 0.201475 | -0.931926 | 0.937558 |


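```python
df.drop('E', axis=0)  # axis=0 (the default) drops a row by its index label
```

|   | W | X | Y | Z |
|---|---|---|---|---|
| A | 0.268302 | -0.078997 | 0.090428 | -0.343149 |
| B | 0.595843 | -1.116327 | -1.373282 | -1.501304 |
| C | 0.973139 | 0.318253 | 0.091077 | 0.410944 |
| D | 0.359129 | -1.124782 | 1.236682 | 1.231424 |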

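Instead of `axis`, `drop` also accepts `columns=` and `index=` keywords, which some people find easier to read, and reassigning the result is a common alternative to `inplace=True`. A sketch on a tiny made-up frame (`demo`, `no_x`, and `no_c` are illustrative names):

```python
import pandas as pd

demo = pd.DataFrame({"W": [1, 2, 3], "X": [4, 5, 6]}, index=["A", "B", "C"])

no_x = demo.drop(columns="X")    # same as demo.drop("X", axis=1)
no_c = demo.drop(index="C")      # same as demo.drop("C", axis=0)
demo = demo.drop(columns=["X"])  # reassign the result instead of using inplace=True

print(list(no_x.columns), list(no_c.index), list(demo.columns))
```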

#### Selection

```python
df.loc['A']  # selecting a single row by its label returns a Series
```
```
W    0.268302
X   -0.078997
Y    0.090428
Z   -0.343149
Name: A, dtype: float64
```

```python
df.loc['B', 'Y']  # selecting by [<row>, <column>] returns a single value
```
```
-1.3732821278541343
```

```python
df.loc[['A', 'B'], ['W', 'Y']]  # selecting multiple rows/columns returns a DataFrame
```

|   | W | Y |
|---|---|---|
| A | 0.268302 | 0.090428 |
| B | 0.595843 | -1.373282 |

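`.loc` selects by label. Its positional counterpart, `.iloc`, takes integer positions instead and follows the same row-then-column pattern. A sketch on a tiny made-up frame (`demo` is an illustrative name):

```python
import pandas as pd

demo = pd.DataFrame(
    {"W": [1, 2, 3], "X": [4, 5, 6], "Y": [7, 8, 9]},
    index=["A", "B", "C"],
)

first_row = demo.iloc[0]           # first row as a Series (same as demo.loc['A'])
value = demo.iloc[1, 2]            # row 1, column 2 -> 8 (same as demo.loc['B', 'Y'])
block = demo.iloc[[0, 1], [0, 2]]  # rows A, B and columns W, Y as a DataFrame

print(first_row, value, block, sep="\n\n")
```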

#### Conditionals and Boolean Dataframes

```python
df > 0  # returns a boolean DataFrame
```

|   | W | X | Y | Z |
|---|---|---|---|---|
| A | True | False | True | False |
| B | True | False | False | False |
| C | True | True | True | True |
| D | True | False | True | True |
| E | False | True | False | True |

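Because `True` counts as 1 and `False` as 0, a boolean DataFrame can be summed to count how many values meet a condition. A sketch with made-up numbers (`demo` is an illustrative name):

```python
import pandas as pd

demo = pd.DataFrame({"W": [0.5, -1.0, 2.0], "X": [-0.3, 0.7, -2.0]})

positive = demo > 0                     # boolean DataFrame, same shape as demo
per_column = positive.sum()             # number of True values in each column
total = int(positive.to_numpy().sum())  # number of True values overall

print(per_column)
print(total)  # 3
```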

```python
df[df > 0]  # same shape as df; values where the condition is False become NaN
```

|   | W | X | Y | Z |
|---|---|---|---|---|
| A | 0.268302 | NaN | 0.090428 | NaN |
| B | 0.595843 | NaN | NaN | NaN |
| C | 0.973139 | 0.318253 | 0.091077 | 0.410944 |
| D | 0.359129 | NaN | 1.236682 | 1.231424 |
| E | NaN | 0.201475 | NaN | 0.937558 |


```python
df[df['W'] > 0]  # filter rows: keep only the rows where column W is positive
```

|   | W | X | Y | Z |
|---|---|---|---|---|
| A | 0.268302 | -0.078997 | 0.090428 | -0.343149 |
| B | 0.595843 | -1.116327 | -1.373282 | -1.501304 |
| C | 0.973139 | 0.318253 | 0.091077 | 0.410944 |
| D | 0.359129 | -1.124782 | 1.236682 | 1.231424 |

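Several conditions can be combined with `&` (and), `|` (or), and `~` (not). Python's `and`/`or` keywords do not work on Series, and each condition needs its own parentheses because `&` and `|` bind more tightly than comparisons. A sketch with made-up numbers (`demo` is an illustrative name):

```python
import pandas as pd

demo = pd.DataFrame(
    {"W": [0.5, -1.0, 2.0, 1.5], "Y": [1.0, -0.3, -0.4, 2.2]},
    index=["A", "B", "C", "D"],
)

both = demo[(demo["W"] > 0) & (demo["Y"] > 0)]        # rows where W AND Y are positive
either = demo[(demo["W"] > 0) | (demo["Y"] > 0)]      # rows where W OR Y is positive
neither = demo[~((demo["W"] > 0) | (demo["Y"] > 0))]  # negate a condition with ~

print(both)
```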

#### Pandas I/O
Pandas can read and write CSV files, Excel spreadsheets, and HTML tables, among many other formats. This is where pandas becomes really useful.

> We will cover this properly in a future session.
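As a small preview, here is a sketch of a CSV round trip. It writes to an in-memory `io.StringIO` buffer instead of a real file so it runs anywhere; with a real file you would pass a path instead (for example a hypothetical `"data.csv"`). `index_col=0` tells `read_csv` to use the first CSV column as the row labels.

```python
import io

import pandas as pd

original = pd.DataFrame({"W": [1.5, 2.5], "X": [10, 20]}, index=["A", "B"])

buffer = io.StringIO()
original.to_csv(buffer)  # writes the index as the first CSV column
buffer.seek(0)           # rewind so read_csv starts reading at the beginning

restored = pd.read_csv(buffer, index_col=0)  # first column becomes the index again
print(restored)
```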