## Pandas DF Slicing

In [67]:
import pandas as pd

df = pd.DataFrame({'col-0' : ['cell-0-0', 'cell-1-0', 'cell-2-0', 'cell-3-0', 'cell-4-0'],
                   'col-1' : ['cell-0-1', 'cell-1-1', 'cell-2-1', 'cell-3-1', 'cell-4-1'],
                   'col-2' : ['cell-0-2', 'cell-1-2', 'cell-2-2', 'cell-3-2', 'cell-4-2']
                  })

df

Unnamed: 0,col-0,col-1,col-2
0,cell-0-0,cell-0-1,cell-0-2
1,cell-1-0,cell-1-1,cell-1-2
2,cell-2-0,cell-2-1,cell-2-2
3,cell-3-0,cell-3-1,cell-3-2
4,cell-4-0,cell-4-1,cell-4-2


## Size of df - Shape

In [68]:
df.shape

(5, 3)

In [69]:
print ('#rows: ', df.shape[0])

#rows:  5


In [70]:
print ('#cols: ', df.shape[1])

#cols:  3


## Access Cells - iloc

`iloc` takes zero-based integer positions as indexes: `df.iloc[row, column]`.

In [71]:
print ('df.iloc[0,0] :', df.iloc[0,0])
print ('df.iloc[0,1] :', df.iloc[0,1])
print ('df.iloc[1,1] :', df.iloc[1,1])

## TODO: try a few other indexes

df.iloc[0,0] : cell-0-0
df.iloc[0,1] : cell-0-1
df.iloc[1,1] : cell-1-1


In [72]:
## TODO: What happens when the index is out of range?
df.iloc[10,0]

IndexError: index 10 is out of bounds for axis 0 with size 5
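As a small sketch of the error above: a single out-of-range position raises `IndexError`, while an out-of-range slice is clipped to the rows that exist, much like slicing a Python list. The three-row `df` here is a shortened stand-in for the one above, and the names `raised` and `clipped` are only for illustration.

```python
import pandas as pd

# Shortened stand-in for the tutorial's df (3 rows, 1 column)
df = pd.DataFrame({'col-0': ['cell-0-0', 'cell-1-0', 'cell-2-0']})

# A single position past the end raises IndexError
try:
    df.iloc[10, 0]
    raised = False
except IndexError:
    raised = True

# A slice past the end is clipped to the rows that exist
clipped = df.iloc[1:10]

print('raised IndexError:', raised)
print('rows in clipped slice:', len(clipped))
```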

In [None]:
## TODO: What is the max index you can go to?
## Hint: see size above

## Access Rows - `iloc`

We can get entire rows using `iloc` as well.

In [73]:
# row 0
df.iloc[0]

col-0    cell-0-0
col-1    cell-0-1
col-2    cell-0-2
Name: 0, dtype: object

In [75]:
# What type is this?
type (df.iloc[0])

pandas.core.series.Series

In [76]:
# Let's get multiple rows
df.iloc[0:2]

Unnamed: 0,col-0,col-1,col-2
0,cell-0-0,cell-0-1,cell-0-2
1,cell-1-0,cell-1-1,cell-1-2


In [77]:
# What type is it?
type (df.iloc[0:2])

pandas.core.frame.DataFrame

## Access Rows - loc

In [78]:
# What do we get here?
# we get the row labelled 0 (with the default index, that is also row 0)
df.loc[0]

col-0    cell-0-0
col-1    cell-0-1
col-2    cell-0-2
Name: 0, dtype: object

In [79]:
# What type is this?
## it is a Series

type (df.loc[0])

pandas.core.series.Series

In [80]:
# Get multiple rows (a loc slice includes the end label, so 0:2 returns 3 rows)
df.loc[0:2]

Unnamed: 0,col-0,col-1,col-2
0,cell-0-0,cell-0-1,cell-0-2
1,cell-1-0,cell-1-1,cell-1-2
2,cell-2-0,cell-2-1,cell-2-2


In [81]:
# what type is it?
## interesting :-)

type (df.loc[0:2])

pandas.core.frame.DataFrame
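The two slices `df.iloc[0:2]` and `df.loc[0:2]` return different numbers of rows. A minimal side-by-side sketch, using a one-column stand-in for `df` (the variable names `by_position` and `by_label` are illustrative):

```python
import pandas as pd

# One-column stand-in for the tutorial's df, with the default 0..4 index
df = pd.DataFrame({'col-0': ['cell-0-0', 'cell-1-0', 'cell-2-0', 'cell-3-0', 'cell-4-0']})

# iloc slices by position and excludes the end, like a Python list slice
by_position = df.iloc[0:2]

# loc slices by label and includes the end label
by_label = df.loc[0:2]

print('iloc[0:2] rows:', list(by_position.index))
print('loc[0:2] rows: ', list(by_label.index))
```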

In [82]:
## What do we get when we use double brackets  [[ ]]
df.loc[[0]]

Unnamed: 0,col-0,col-1,col-2
0,cell-0-0,cell-0-1,cell-0-2


In [83]:
# And what type is it?

type (df.loc[[0]])

pandas.core.frame.DataFrame

In [84]:
# can we get multiple rows?
df.loc[[0,2]]

Unnamed: 0,col-0,col-1,col-2
0,cell-0-0,cell-0-1,cell-0-2
2,cell-2-0,cell-2-1,cell-2-2


In [85]:
type (df.loc[[0,2]])

pandas.core.frame.DataFrame

## iloc vs loc

[loc vs iloc](https://stackoverflow.com/questions/31593201/how-are-iloc-and-loc-different)

* `loc` gets rows (and/or columns) with particular labels.
* `iloc` gets rows (and/or columns) at integer locations.
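With the default index, labels and positions happen to be the same numbers, which hides the difference. The sketch below uses a hypothetical frame `demo` whose integer labels (10, 20, 30) do not match the positions (0, 1, 2), so the two accessors visibly disagree.

```python
import pandas as pd

# Hypothetical frame: integer labels that differ from the row positions
demo = pd.DataFrame({'value': ['x', 'y', 'z']}, index=[10, 20, 30])

# iloc: position 0 is the first row, whatever its label is
first_by_position = demo.iloc[0]['value']

# loc: looks up the row whose label is 20
row_with_label_20 = demo.loc[20]['value']

# loc with 0 fails, because no row carries the label 0
try:
    demo.loc[0]
    label_0_found = True
except KeyError:
    label_0_found = False

print(first_by_position, row_with_label_20, label_0_found)
```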

Let's create a df with a non-numeric index

In [87]:
df2 = pd.DataFrame({'col-0' : ['cell-a-0', 'cell-b-0', 'cell-c-0', 'cell-d-0', 'cell-e-0'],
                   'col-1' : ['cell-a-1', 'cell-b-1', 'cell-c-1', 'cell-d-1', 'cell-e-1'],
                   'col-2' : ['cell-a-2', 'cell-b-2', 'cell-c-2', 'cell-d-2', 'cell-e-2']
                  }, index=['a', 'b', 'c', 'd', 'e'])
df2

Unnamed: 0,col-0,col-1,col-2
a,cell-a-0,cell-a-1,cell-a-2
b,cell-b-0,cell-b-1,cell-b-2
c,cell-c-0,cell-c-1,cell-c-2
d,cell-d-0,cell-d-1,cell-d-2
e,cell-e-0,cell-e-1,cell-e-2


In [None]:
df2.loc['a']

In [None]:
# will this work?
# no: iloc needs integer positions, so a label like 'a' raises a TypeError
df2.iloc['a']

In [None]:
df2.iloc[0:2]

In [None]:
## try these
df2.loc[['a', 'b']]

## Selecting Columns

In [None]:
# select single columns
df['col-0']

In [None]:
# select multiple columns
df[['col-0', 'col-1']]
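As with rows, the bracket style decides the return type for columns. A short sketch on a two-column stand-in for `df` (the names `as_series` and `as_frame` are illustrative):

```python
import pandas as pd

# Two-column stand-in for the tutorial's df
df = pd.DataFrame({'col-0': ['cell-0-0', 'cell-1-0'],
                   'col-1': ['cell-0-1', 'cell-1-1']})

# Single brackets with one column name give a Series
as_series = df['col-0']

# Double brackets (a list of names) give a DataFrame, even for one column
as_frame = df[['col-0']]

print(type(as_series).__name__)
print(type(as_frame).__name__)
```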

## Combine - Select specific rows and columns

In [None]:
df.iloc[1]['col-1']

In [None]:
df.iloc[0:2]['col-1']

In [None]:
df.iloc[0:2][['col-1', 'col-2']]

In [None]:
## TODO: go ahead and try a few combinations
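The cells above select rows first and then columns in a second step (chained indexing). That works for reading values, but the pandas documentation discourages it for assignment. As an alternative sketch, both `iloc` and `loc` accept the row and column selection in a single call. The three-row `df` is a shortened stand-in, and `block_by_position` and `block_by_label` are illustrative names.

```python
import pandas as pd

# Shortened stand-in for the tutorial's df (3 rows, 3 columns)
df = pd.DataFrame({'col-0': ['cell-0-0', 'cell-1-0', 'cell-2-0'],
                   'col-1': ['cell-0-1', 'cell-1-1', 'cell-2-1'],
                   'col-2': ['cell-0-2', 'cell-1-2', 'cell-2-2']})

# Single cell: row position 1, column position 1
cell = df.iloc[1, 1]

# Rows 0-1 and columns 1-2, all by position (end positions excluded)
block_by_position = df.iloc[0:2, 1:3]

# The same block by label (the end label 1 is included)
block_by_label = df.loc[0:1, ['col-1', 'col-2']]

print(cell)
print(block_by_position.equals(block_by_label))
```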

## Setting Custom Index

By default, Pandas uses an integer index starting at 0. This is fine for most use cases, but sometimes we may want to set a custom index. Here is an example:

In [2]:
import pandas as pd

customers = pd.DataFrame({'id' : ['101', '102', '103', '104'],
                   'name' : ['John', 'Tom', 'Jane', 'Liz'],
                   'phone' : ['1111', '2222', '3333', '4444']
                  })

customers

Unnamed: 0,id,name,phone
0,101,John,1111
1,102,Tom,2222
2,103,Jane,3333
3,104,Liz,4444


In [4]:
# As we can see the index is the default index.

customers.iloc[2]

id        103
name     Jane
phone    3333
Name: 2, dtype: object

In [6]:
## Let's set the 'id' column (the customer id) as the index

customers2 = customers.set_index('id')
customers2

id,name,phone
101,John,1111
102,Tom,2222
103,Jane,3333
104,Liz,4444


In [7]:
## now access a row by its customer id (the ids are strings, so use '103', not 103)
customers2.loc['103']

name     Jane
phone    3333
Name: 103, dtype: object

In [None]:
## Remember, set_index returned a new DataFrame above,
## so the original customers still has its original structure
customers

In [None]:
## We can pass inplace=True to modify customers itself instead of getting a new DataFrame
customers.set_index('id', inplace=True)

In [None]:
# original customers changed!
customers
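To recap the two styles in one place, here is a small sketch on a two-row stand-in for `customers`. It also shows `reset_index`, which is not used above but is the usual way to turn the index back into a regular column. The names `indexed`, `result` and `restored` are illustrative.

```python
import pandas as pd

# Two-row stand-in for the tutorial's customers frame
customers = pd.DataFrame({'id': ['101', '102'],
                          'name': ['John', 'Tom']})

# Default behaviour: set_index returns a new DataFrame, the original is untouched
indexed = customers.set_index('id')
original_still_has_id = 'id' in customers.columns

# inplace=True changes customers itself and returns None
result = customers.set_index('id', inplace=True)

# reset_index returns a new frame with 'id' back as a regular column
restored = customers.reset_index()

print(original_still_has_id, result, list(restored.columns))
```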