# Python Pandas - Dataframes

This tutorial introduces the pandas DataFrame, the true workhorse of pandas. It covers the basics of DataFrames, including:
- Creating a DataFrame
- Accessing rows and columns of a DataFrame
- Deleting rows and columns of a DataFrame
- Selecting specific parts of a DataFrame

The next lectures look at more advanced DataFrame operations.

In [1]:
# Begin by importing the libraries
import pandas as pd
import numpy as np

In [2]:
from numpy.random import randn

In [3]:
# Let's begin by seeding NumPy's random number generator so the values are reproducible
np.random.seed(123)

In [4]:
df = pd.DataFrame(randn(5,4), ['A', 'B', 'C', 'D', 'E'], ['W', 'X', 'Y', 'Z'])
df # Take a closer look at the dataframe

,W,X,Y,Z
A,-1.085631,0.997345,0.282978,-1.506295
B,-0.5786,1.651437,-2.426679,-0.428913
C,1.265936,-0.86674,-0.678886,-0.094709
D,1.49139,-0.638902,-0.443982,-0.434351
E,2.20593,2.186786,1.004054,0.386186
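The call above passes the data, the row labels, and the column labels positionally, in that order. As a minimal sketch, the same DataFrame can be built with keyword arguments, which may be easier to read. The name `df_kw` is an assumption used only for this illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(123)  # same seed as above, so the values should match

# data, index (row labels), and columns (column labels) spelled out by name
df_kw = pd.DataFrame(
    data=np.random.randn(5, 4),
    index=['A', 'B', 'C', 'D', 'E'],
    columns=['W', 'X', 'Y', 'Z'],
)
print(df_kw.shape)
```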


In [5]:
# A DataFrame is a collection of Series (see last lecture) that share an index. Let's take a closer look at this
# Grabbing a column:
df['W']

A   -1.085631
B   -0.578600
C    1.265936
D    1.491390
E    2.205930
Name: W, dtype: float64

In [6]:
# Let's look at what kind of object this is
type(df['W'])

pandas.core.series.Series

In [7]:
# Whoa! It's a Series! We can also grab multiple columns by passing a list of labels:
df[['W', 'Y']]

,W,Y
A,-1.085631,0.282978
B,-0.5786,-2.426679
C,1.265936,-0.678886
D,1.49139,-0.443982
E,2.20593,1.004054
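The difference between the two selections above is the return type: a single label gives a Series, while a list of labels gives a DataFrame, even when the list holds only one label. A small sketch on a toy frame; `demo`, `single`, and `double` are hypothetical names for illustration.

```python
import pandas as pd

demo = pd.DataFrame({'W': [1.0, 2.0], 'Y': [3.0, 4.0]}, index=['A', 'B'])

single = demo['W']    # one label -> Series
double = demo[['W']]  # list of labels -> DataFrame with one column

print(type(single).__name__)
print(type(double).__name__)
```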


In [8]:
# Let's move on to creating, updating, and deleting data in a DataFrame. First up, creating a new column:
df['new_col'] = df['W'] + df['Y']
df

,W,X,Y,Z,new_col
A,-1.085631,0.997345,0.282978,-1.506295,-0.802652
B,-0.5786,1.651437,-2.426679,-0.428913,-3.005279
C,1.265936,-0.86674,-0.678886,-0.094709,0.58705
D,1.49139,-0.638902,-0.443982,-0.434351,1.047408
E,2.20593,2.186786,1.004054,0.386186,3.209984


In [9]:
# Deleting a column: drop() returns a new DataFrame, so without inplace=True df itself is unchanged
df.drop(columns='new_col')
df.head(1)

,W,X,Y,Z,new_col
A,-1.085631,0.997345,0.282978,-1.506295,-0.802652


In [10]:
# Deleting a column for real: inplace=True modifies df itself
df.drop(columns='new_col', inplace=True)
df.head(1)

,W,X,Y,Z
A,-1.085631,0.997345,0.282978,-1.506295


In [11]:
# Let's do that another way. First, recreate the column:
df['new_col'] = df['W'] + df['Y']
df.head(1)

,W,X,Y,Z,new_col
A,-1.085631,0.997345,0.282978,-1.506295,-0.802652


In [12]:
# axis=1 tells drop() to look for the label among the columns, not the rows (again, df is unchanged without inplace=True)
df.drop('new_col', axis=1)
df.head(1)

,W,X,Y,Z,new_col
A,-1.085631,0.997345,0.282978,-1.506295,-0.802652


In [13]:
# Notice that so far we have had to pass inplace=True in order to alter the DataFrame itself
df.drop('new_col', axis=1, inplace=True)
df.head(1)

,W,X,Y,Z
A,-1.085631,0.997345,0.282978,-1.506295
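An alternative to `inplace=True` is to assign the result of `drop()` back to a name, since `drop()` returns a new DataFrame. A minimal sketch on a toy frame; the names `demo` and `dropped` are assumptions for illustration, not part of the notebook above.

```python
import pandas as pd

demo = pd.DataFrame({'W': [1.0, 2.0], 'new_col': [3.0, 4.0]}, index=['A', 'B'])

# drop() returns a copy without the column; demo still has it at this point
dropped = demo.drop(columns='new_col')
still_there = 'new_col' in demo.columns

# Rebinding the name has the same visible effect as inplace=True
demo = demo.drop(columns='new_col')
print(list(demo.columns))
```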


In [14]:
##### ROWS
# We can also delete rows
# Notice we don't use inplace here because we want to keep the E row
df.drop('E')

,W,X,Y,Z
A,-1.085631,0.997345,0.282978,-1.506295
B,-0.5786,1.651437,-2.426679,-0.428913
C,1.265936,-0.86674,-0.678886,-0.094709
D,1.49139,-0.638902,-0.443982,-0.434351


In [15]:
# Rows can be accessed by their label with loc
df.loc['D']

W    1.491390
X   -0.638902
Y   -0.443982
Z   -0.434351
Name: D, dtype: float64

In [16]:
# We can also access a row by its integer position with iloc (positions start at 0)
df.iloc[2]

W    1.265936
X   -0.866740
Y   -0.678886
Z   -0.094709
Name: C, dtype: float64
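To make the relationship between the two accessors concrete: `loc` looks a row up by its label, `iloc` by its position, and both return the row as a Series. A small sketch on a toy frame, with `demo`, `by_label`, and `by_position` as assumed names.

```python
import pandas as pd

demo = pd.DataFrame({'W': [10, 20, 30], 'X': [1, 2, 3]}, index=['A', 'B', 'C'])

by_label = demo.loc['C']      # row whose label is 'C'
by_position = demo.iloc[2]    # third row, counting from 0

print(by_label.equals(by_position))
```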

In [17]:
# loc also accepts a row label and a column label to get a single value
df.loc['A', 'W']

-1.0856306033005612

In [18]:
# Or lists of row and column labels to get a subset of the DataFrame
df.loc[['A', 'B'], ['W', 'X']]

,W,X
A,-1.085631,0.997345
B,-0.5786,1.651437
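Subsets can also be taken with slices instead of lists. One detail worth knowing is that label slices with `loc` include both endpoints, while position slices with `iloc` exclude the end, as in ordinary Python slicing. A minimal sketch on a toy frame; `demo`, `by_label`, and `by_position` are names assumed for illustration.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame(
    np.arange(12).reshape(3, 4),
    index=['A', 'B', 'C'],
    columns=['W', 'X', 'Y', 'Z'],
)

by_label = demo.loc['A':'B', 'W':'X']   # label slices include both endpoints
by_position = demo.iloc[0:2, 0:2]       # position slices exclude the end

print(by_label.equals(by_position))
```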
