
# DataFrames

DataFrames are the workhorse of pandas and are directly inspired by the R programming language. We can think of a DataFrame as a collection of Series objects that share the same index. Let's use pandas to explore this topic!
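To make the "Series sharing an index" picture concrete, here is a minimal sketch that builds a DataFrame from two Series. The names `heights`, `weights`, and `people`, and the values in them, are made up for illustration.

```python
import pandas as pd

# Two Series with the same index labels (hypothetical example data)
heights = pd.Series([1.5, 1.8, 1.7], index=['a', 'b', 'c'])
weights = pd.Series([50, 80, 65], index=['a', 'b', 'c'])

# Each dict key becomes a column name; the shared index becomes the row labels
people = pd.DataFrame({'height': heights, 'weight': weights})

print(people)
# Every column is a Series that carries the frame's index
print(people['height'].index.equals(people.index))
```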

In [None]:
import pandas as pd
import numpy as np

In [None]:
from numpy.random import randn

In [None]:
np.random.seed(0)
df = pd.DataFrame(randn(5,4),index='A B C D E'.split(),columns='W X Y Z'.split())

In [None]:
'A B C D E'.split()

['A', 'B', 'C', 'D', 'E']

In [None]:
df

Unnamed: 0,W,X,Y,Z
A,1.764052,0.400157,0.978738,2.240893
B,1.867558,-0.977278,0.950088,-0.151357
C,-0.103219,0.410599,0.144044,1.454274
D,0.761038,0.121675,0.443863,0.333674
E,1.494079,-0.205158,0.313068,-0.854096


## Selection and Indexing

Let's look at the various ways to select data from a DataFrame.

In [None]:
df['W']

A    1.764052
B    1.867558
C   -0.103219
D    0.761038
E    1.494079
Name: W, dtype: float64

In [None]:
# Pass a list of column names
df[['W','Z']]

Unnamed: 0,W,Z
A,1.764052,2.240893
B,1.867558,-0.151357
C,-0.103219,1.454274
D,0.761038,0.333674
E,1.494079,-0.854096


In [None]:
# Attribute (dot) access (NOT RECOMMENDED!)
df.W

A    1.764052
B    1.867558
C   -0.103219
D    0.761038
E    1.494079
Name: W, dtype: float64
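One reason dot access is discouraged is that it breaks when a column name collides with a DataFrame attribute or is not a valid Python identifier. The sketch below uses a made-up frame called `sales` to show this; the column names are chosen only to trigger the problem.

```python
import pandas as pd

# Hypothetical frame: one column name collides with a DataFrame method,
# the other contains a space
sales = pd.DataFrame({'count': [3, 5], 'unit price': [2.0, 4.5]})

print(callable(sales.count))   # sales.count is the DataFrame.count method, not the column
print(sales['count'])          # bracket access returns the column
print(sales['unit price'])     # names with spaces only work with brackets

# Assigning through an attribute does not create a column
# (pandas emits a UserWarning here)
sales.total = sales['count'] * sales['unit price']
print('total' in sales.columns)
```

Bracket notation works for every column name, so it is the safer default.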

DataFrame columns are just Series:

In [None]:
type(df['W'])

pandas.core.series.Series

**Creating a new column:**

In [None]:
df['new'] = df['W'] + df['Y']

In [None]:
df

Unnamed: 0,W,X,Y,Z,new
A,1.764052,0.400157,0.978738,2.240893,2.74279
B,1.867558,-0.977278,0.950088,-0.151357,2.817646
C,-0.103219,0.410599,0.144044,1.454274,0.040825
D,0.761038,0.121675,0.443863,0.333674,1.204901
E,1.494079,-0.205158,0.313068,-0.854096,1.807147
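Bracket assignment modifies the DataFrame it is applied to. If you would rather get a new DataFrame back, `DataFrame.assign` is one alternative. The sketch below uses a small stand-in frame named `frame` (hypothetical values) rather than the `df` above.

```python
import pandas as pd

# Small hypothetical frame standing in for df
frame = pd.DataFrame({'W': [1.0, 2.0], 'Y': [0.5, -1.0]}, index=['A', 'B'])

# assign returns a new DataFrame and leaves `frame` unchanged
with_sum = frame.assign(new=frame['W'] + frame['Y'])

print(with_sum)
print(list(frame.columns))  # the original still has only W and Y
```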


**Removing columns:**

In [None]:
df.drop(['new','Z'],axis=1)

Unnamed: 0,W,X,Y
A,1.764052,0.400157,0.978738
B,1.867558,-0.977278,0.950088
C,-0.103219,0.410599,0.144044
D,0.761038,0.121675,0.443863
E,1.494079,-0.205158,0.313068


In [None]:
# drop returns a new DataFrame; df is unchanged unless inplace=True is specified
df

Unnamed: 0,W,X,Y,Z,new
A,1.764052,0.400157,0.978738,2.240893,2.74279
B,1.867558,-0.977278,0.950088,-0.151357,2.817646
C,-0.103219,0.410599,0.144044,1.454274,0.040825
D,0.761038,0.121675,0.443863,0.333674,1.204901
E,1.494079,-0.205158,0.313068,-0.854096,1.807147


In [None]:
df.drop('new',axis=1,inplace=True)

In [None]:
df

Unnamed: 0,W,X,Y,Z
A,1.764052,0.400157,0.978738,2.240893
B,1.867558,-0.977278,0.950088,-0.151357
C,-0.103219,0.410599,0.144044,1.454274
D,0.761038,0.121675,0.443863,0.333674
E,1.494079,-0.205158,0.313068,-0.854096


Rows can be dropped the same way, using axis=0:

In [None]:
df.drop('E',axis=0)

Unnamed: 0,W,X,Y,Z
A,1.764052,0.400157,0.978738,2.240893
B,1.867558,-0.977278,0.950088,-0.151357
C,-0.103219,0.410599,0.144044,1.454274
D,0.761038,0.121675,0.443863,0.333674
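As a possible alternative to remembering which axis number means what, `drop` also accepts `columns=` and `index=` keywords. This is a small sketch on a made-up frame named `frame`; it also shows reassigning the result instead of using `inplace=True`.

```python
import pandas as pd

# Hypothetical frame with three columns and three labelled rows
frame = pd.DataFrame({'W': [1, 2, 3], 'X': [4, 5, 6], 'Z': [7, 8, 9]},
                     index=['A', 'B', 'C'])

# columns= and index= name the axis directly, so no axis number is needed
no_z = frame.drop(columns=['Z'])
no_c = frame.drop(index='C')

# Reassigning the result is an alternative to inplace=True
frame = frame.drop(columns=['X'])

print(list(no_z.columns))
print(list(no_c.index))
print(list(frame.columns))
```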


**Selecting rows:**

In [None]:
df.loc['A']

W    1.764052
X    0.400157
Y    0.978738
Z    2.240893
Name: A, dtype: float64

Or select by integer position instead of label:

In [None]:
df.iloc[3]

W    0.761038
X    0.121675
Y    0.443863
Z    0.333674
Name: D, dtype: float64
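The difference between label-based and position-based selection is easiest to see when the index labels are themselves integers. The following sketch uses a made-up frame whose integer labels are deliberately out of order.

```python
import pandas as pd

# Hypothetical frame whose index labels are integers in a shuffled order
frame = pd.DataFrame({'value': ['x', 'y', 'z']}, index=[2, 0, 1])

by_label = frame.loc[0, 'value']   # the row whose label is 0 (second row)
by_position = frame.iloc[0, 0]     # the row at position 0 (first row)

print(by_label)      # y
print(by_position)   # x
```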

**Selecting a subset of rows and columns:**

In [None]:
df.loc['B','Y']

0.9500884175255894

In [None]:
df.loc[['A','B'],['W','Y']]

Unnamed: 0,W,Y
A,1.764052,0.978738
B,1.867558,0.950088
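Both `loc` and `iloc` also accept slices, with one difference worth keeping in mind: label slices in `loc` include the end label, while position slices in `iloc` exclude the end position. A minimal sketch on a made-up frame of integers:

```python
import numpy as np
import pandas as pd

# Hypothetical 3x4 frame filled with the integers 0..11
frame = pd.DataFrame(np.arange(12).reshape(3, 4),
                     index=['A', 'B', 'C'], columns=['W', 'X', 'Y', 'Z'])

# Label slices include both endpoints: rows A-B, columns W-Y
by_label = frame.loc['A':'B', 'W':'Y']

# Position slices exclude the end: rows 0-1, columns 0-2
by_position = frame.iloc[0:2, 0:3]

print(by_label)
print(by_label.equals(by_position))
```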


### Conditional Selection

An important feature of pandas is conditional selection using bracket notation, very similar to NumPy:

In [None]:
df

Unnamed: 0,W,X,Y,Z
A,1.764052,0.400157,0.978738,2.240893
B,1.867558,-0.977278,0.950088,-0.151357
C,-0.103219,0.410599,0.144044,1.454274
D,0.761038,0.121675,0.443863,0.333674
E,1.494079,-0.205158,0.313068,-0.854096


In [None]:
df>0

Unnamed: 0,W,X,Y,Z
A,True,True,True,True
B,True,False,True,False
C,False,True,True,True
D,True,True,True,True
E,True,False,True,False


In [None]:
df[df>0]

Unnamed: 0,W,X,Y,Z
A,1.764052,0.400157,0.978738,2.240893
B,1.867558,,0.950088,
C,,0.410599,0.144044,1.454274
D,0.761038,0.121675,0.443863,0.333674
E,1.494079,,0.313068,
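Indexing with a boolean DataFrame keeps the original shape and puts NaN wherever the condition is False. As a sketch of what is going on, the same result can be written with `DataFrame.where`, which also lets you supply a replacement value. The frame below is a made-up stand-in for `df`.

```python
import pandas as pd

# Hypothetical 2x2 frame with one negative value per column
frame = pd.DataFrame({'W': [1.0, -2.0], 'X': [-3.0, 4.0]}, index=['A', 'B'])

masked = frame[frame > 0]            # cells that fail the condition become NaN
same = frame.where(frame > 0)        # the same operation spelled with where
filled = frame.where(frame > 0, 0)   # use 0 instead of NaN for failing cells

print(masked)
print(filled)
```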


In [None]:
df[df['W']>0]

Unnamed: 0,W,X,Y,Z
A,1.764052,0.400157,0.978738,2.240893
B,1.867558,-0.977278,0.950088,-0.151357
D,0.761038,0.121675,0.443863,0.333674
E,1.494079,-0.205158,0.313068,-0.854096


In [None]:
df[df['W']>0]['Y']

A    0.978738
B    0.950088
D    0.443863
E    0.313068
Name: Y, dtype: float64

In [None]:
df[df['W']>0][['Y','X']]
# SQL equivalent: SELECT Y, X FROM table WHERE W > 0

Unnamed: 0,Y,X
A,0.978738,0.400157
B,0.950088,-0.977278
D,0.443863,0.121675
E,0.313068,-0.205158
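Chaining two sets of brackets works for reading, but the row filter and the column selection can also be done in a single `loc` call, which is the form to prefer when assigning values. A sketch on a made-up frame:

```python
import pandas as pd

# Hypothetical frame standing in for df
frame = pd.DataFrame({'W': [1.0, -1.0, 2.0],
                      'X': [10, 20, 30],
                      'Y': [0.1, 0.2, 0.3]}, index=['A', 'B', 'C'])

chained = frame[frame['W'] > 0][['Y', 'X']]      # two indexing steps
single = frame.loc[frame['W'] > 0, ['Y', 'X']]   # one step: rows, then columns

print(chained.equals(single))

# With loc, assignment reliably modifies the original frame
frame.loc[frame['W'] > 0, 'Y'] = 0.0
print(frame)
```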


For two conditions, combine them with | (or) and & (and), wrapping each condition in parentheses:

In [None]:
#AND condition while indexing
df[(df['W']>0) & (df['Z'] > 1)]

Unnamed: 0,W,X,Y,Z
A,1.764052,0.400157,0.978738,2.240893


In [None]:
# OR condition while indexing; (1==0) is always False, so only Z > 1 selects rows
df[(1==0) | (df['Z'] > 1)]

Unnamed: 0,W,X,Y,Z
A,1.764052,0.400157,0.978738,2.240893
C,-0.103219,0.410599,0.144044,1.454274


In [None]:
df[(df['W']>df['Z'])]

Unnamed: 0,W,X,Y,Z
B,1.867558,-0.977278,0.950088,-0.151357
D,0.761038,0.121675,0.443863,0.333674
E,1.494079,-0.205158,0.313068,-0.854096
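The element-wise operators are needed because Python's `and` and `or` try to reduce a whole Series to a single True or False, which pandas refuses to do. The parentheses matter because `&` and `|` bind more tightly than comparisons such as `>`. The sketch below, on a made-up frame, shows the error and also the `~` operator for negating a condition.

```python
import pandas as pd

# Hypothetical frame standing in for df
frame = pd.DataFrame({'W': [1.0, -1.0, 2.0],
                      'Z': [2.0, 3.0, 0.5]}, index=['A', 'B', 'C'])

# Python's `and` asks for the truth value of a whole Series, which is ambiguous
try:
    frame[(frame['W'] > 0) and (frame['Z'] > 1)]
except ValueError as err:
    print('ValueError:', err)

# Element-wise AND: only row A satisfies both conditions
both = frame[(frame['W'] > 0) & (frame['Z'] > 1)]

# Element-wise NOT: rows where W > 0 is False
not_positive = frame[~(frame['W'] > 0)]

print(both)
print(not_positive)
```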
