## In this notebook:
- Using conditionals with DataFrames
- More Index Details

In [1]:
# Create the same DataFrame as before
import numpy as np
import pandas as pd
from numpy.random import randn
np.random.seed(101)  # fixed seed so the random values are reproducible
df = pd.DataFrame(randn(5, 4), index='A B C D E'.split(), columns='W X Y Z'.split())
df

,W,X,Y,Z
A,2.70685,0.628133,0.907969,0.503826
B,0.651118,-0.319318,-0.848077,0.605965
C,-2.018168,0.740122,0.528813,-0.589001
D,0.188695,-0.758872,-0.933237,0.955057
E,0.190794,1.978757,2.605967,0.683509


### Conditional Selection
- Works much like boolean indexing in NumPy

In [2]:
df > 0 # returns boolean DataFrame indicating values > 0

,W,X,Y,Z
A,True,True,True,True
B,True,False,False,True
C,False,True,True,False
D,True,False,False,True
E,True,True,True,True


#### The boolean DataFrame can be used to mask out all the negative values

In [3]:
df[df>0] # keeps values where the condition is True and puts NaN where it is False

,W,X,Y,Z
A,2.70685,0.628133,0.907969,0.503826
B,0.651118,,,0.605965
C,,0.740122,0.528813,
D,0.188695,,,0.955057
E,0.190794,1.978757,2.605967,0.683509
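
As a small sketch of the masking behaviour above: indexing a DataFrame with a boolean DataFrame appears to give the same result as `DataFrame.where`, which keeps values where the condition holds and fills the rest with NaN. The two-row frame below is made up for illustration and is not the notebook's `df`.

```python
import pandas as pd

# Illustrative data only (not the notebook's DataFrame).
df = pd.DataFrame({"W": [2.71, -2.02], "X": [-0.32, 0.74]}, index=["A", "C"])

masked = df[df > 0]        # boolean-DataFrame indexing
same = df.where(df > 0)    # explicit form of the same operation

print(masked)
print(masked.equals(same))  # equals() treats NaNs in matching positions as equal

# Count how many values were replaced by NaN.
nan_count = int(masked.isna().sum().sum())
print(nan_count)
```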


#### Instead of passing in the whole DataFrame, we usually pass in a condition on one or more columns
- E.g. to clean up incorrect prices in a DataFrame by removing rows where the price is < 0

In [6]:
df[df['W'] > 0] # drops row C, since its W value is less than 0

,W,X,Y,Z
A,2.70685,0.628133,0.907969,0.503826
B,0.651118,-0.319318,-0.848077,0.605965
D,0.188695,-0.758872,-0.933237,0.955057
E,0.190794,1.978757,2.605967,0.683509
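
The price-cleaning use case mentioned above could look roughly like this. The `products` DataFrame, its column names, and its values are hypothetical and exist only to illustrate row filtering with a column condition.

```python
import pandas as pd

# Hypothetical product data; names and prices are made up for illustration.
products = pd.DataFrame(
    {
        "item": ["pen", "book", "lamp", "desk"],
        "price": [1.50, -3.00, 24.99, -1.00],
    }
)

# Boolean Series: True where the price is not negative.
valid = products["price"] >= 0

# Indexing with the Series keeps only the rows where it is True.
clean = products[valid]
print(clean)
```

Unlike indexing with a boolean DataFrame, a boolean Series drops whole rows rather than inserting NaN.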


In [7]:
df[df['W'] > 0][['Y','Z']] # selects the Y and Z columns for rows where W > 0

,Y,Z
A,0.907969,0.503826
B,-0.848077,0.605965
D,-0.933237,0.955057
E,2.605967,0.683509
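
The chained selection above filters rows first and then picks columns. As a sketch, the same result can be obtained in one step with `.loc`, passing the row condition and the column list together. The frame below uses values rounded from the notebook's DataFrame purely for illustration.

```python
import pandas as pd

# Illustrative frame with values rounded from the DataFrame above.
df = pd.DataFrame(
    {
        "W": [2.71, 0.65, -2.02, 0.19, 0.19],
        "Y": [0.91, -0.85, 0.53, -0.93, 2.61],
        "Z": [0.50, 0.61, -0.59, 0.96, 0.68],
    },
    index=list("ABCDE"),
)

chained = df[df["W"] > 0][["Y", "Z"]]      # two indexing steps
single = df.loc[df["W"] > 0, ["Y", "Z"]]   # rows and columns in one step

print(single)
print(chained.equals(single))
```

The single `.loc` call is often preferred when the selection will later be assigned to, since chained indexing can end up operating on a temporary copy.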


#### To combine two or more conditions use `&` and `|` instead of `and` and `or`
- E.g. `df['W'] > 0 and df['Y'] > 1` is incorrect
    - Python's `and`/`or` need a single True/False value on each side, but each comparison here yields a whole Series of booleans, so pandas raises `ValueError: The truth value of a Series is ambiguous`
    - `&` and `|` instead combine the two Series element by element
- Thus, `(df['W'] > 0) & (df['Y'] > 1)` is correct; the parentheses are required because `&` binds more tightly than `>`
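
A minimal sketch of the difference, using a small made-up frame: combining two comparisons with `and` raises a `ValueError`, while `&` with each comparison in parentheses works element by element.

```python
import pandas as pd

# Illustrative data only.
df = pd.DataFrame(
    {"W": [2.71, 0.65, -2.02], "Y": [0.91, -0.85, 0.53]},
    index=list("ABC"),
)

error_message = None
try:
    # `and` asks Python for the truth value of a whole Series, which is ambiguous.
    df[(df["W"] > 0) and (df["Y"] < 1)]
except ValueError as err:
    error_message = str(err)
    print("and failed:", error_message)

# `&` combines the two boolean Series element by element.
# Parentheses are needed because `&` binds more tightly than `>` and `<`.
mask = (df["W"] > 0) & (df["Y"] < 1)
result = df[mask]
print(result)
```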

In [9]:
df[(df['W'] > 0) & (df['Y'] < 1)]

,W,X,Y,Z
A,2.70685,0.628133,0.907969,0.503826
B,0.651118,-0.319318,-0.848077,0.605965
D,0.188695,-0.758872,-0.933237,0.955057


## More Index Details
- `reset_index()` : resets the index back to 0,1,2,...
- `set_index()` : set a column as new index

### Reset index of DataFrame
- reset_index() resets the index of a DataFrame to the default 0,1,2,..
- the original index labels are kept in a new column (named `index` here)
- by default a new DataFrame is returned and `df` itself is unchanged

`Syntax`
```
    df.reset_index()                # returns a new DataFrame, df is unchanged
    df.reset_index(inplace=True)    # modifies df itself and returns None
```
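
A short sketch of the two behaviours, on a made-up single-column frame: without `inplace=True` the original DataFrame keeps its labels, and the `drop=True` parameter discards the old labels instead of storing them in a column.

```python
import pandas as pd

# Illustrative data only.
df = pd.DataFrame({"W": [2.71, 0.65, -2.02]}, index=["A", "B", "C"])

new_df = df.reset_index()            # returns a new DataFrame
print(list(df.index))                # df still has its original labels
print(list(new_df.columns))          # old labels now live in the "index" column

dropped = df.reset_index(drop=True)  # old labels are discarded entirely
print(list(dropped.columns))
```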

In [10]:
df.reset_index(inplace=True)

In [11]:
df

,index,W,X,Y,Z
0,A,2.70685,0.628133,0.907969,0.503826
1,B,0.651118,-0.319318,-0.848077,0.605965
2,C,-2.018168,0.740122,0.528813,-0.589001
3,D,0.188695,-0.758872,-0.933237,0.955057
4,E,0.190794,1.978757,2.605967,0.683509


### Creating a new index
- set_index() sets an existing column as the index
- the previous index is discarded, and by default a new DataFrame is returned

`Syntax`
```
    df.set_index(columnName)    # columnName can be any column in the DataFrame
```
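
As a sketch on a small made-up frame: `set_index` moves the chosen column out of the columns and into the index, after which rows can be looked up by the new labels with `.loc`. Without `inplace=True`, the original DataFrame is left as it was.

```python
import pandas as pd

# Illustrative data only.
df = pd.DataFrame(
    {"W": [2.71, 0.65, -2.02], "States": ["CA", "NY", "WY"]},
    index=["A", "B", "C"],
)

by_state = df.set_index("States")   # returns a new DataFrame

print(by_state.index.name)          # the index now carries the column's name
print(list(by_state.columns))       # "States" is no longer a regular column
print(by_state.loc["NY", "W"])      # label-based lookup with the new index
print(list(df.columns))             # df itself is unchanged
```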

In [12]:
# Create a new column and set it as the index
df['States'] = 'CA NY WY OR CO'.split()
df.set_index('States',inplace=True)
df

States,index,W,X,Y,Z
CA,A,2.70685,0.628133,0.907969,0.503826
NY,B,0.651118,-0.319318,-0.848077,0.605965
WY,C,-2.018168,0.740122,0.528813,-0.589001
OR,D,0.188695,-0.758872,-0.933237,0.955057
CO,E,0.190794,1.978757,2.605967,0.683509


In [15]:
df.drop('index',axis=1,inplace=True) # remove the 'index' column

In [16]:
df

States,W,X,Y,Z
CA,2.70685,0.628133,0.907969,0.503826
NY,0.651118,-0.319318,-0.848077,0.605965
WY,-2.018168,0.740122,0.528813,-0.589001
OR,0.188695,-0.758872,-0.933237,0.955057
CO,0.190794,1.978757,2.605967,0.683509
