# Advanced indexing
---
- In pandas, the key building blocks are
    - `Index`: a sequence of labels
    - `Series`: a 1D array with an Index (index + 1 column)
    - `DataFrame`: a 2D array with Series as columns
- Indexes are
    - Immutable (like dictionary keys)
    - Homogeneous in data type (like NumPy arrays)
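
As a rough illustration of how the three building blocks relate, the sketch below builds two Series on one shared Index and combines them into a DataFrame. The `volume` numbers and the column names are made up for the example.

```python
import pandas as pd

# One Index of labels, shared by two Series.
idx = pd.Index(['Mon', 'Tue', 'Wed'])
prices = pd.Series([10, 12, 13], index=idx)
volume = pd.Series([100, 150, 120], index=idx)  # hypothetical numbers

# A DataFrame is a collection of Series (columns) aligned on the same Index.
frame = pd.DataFrame({'price': prices, 'volume': volume})

# Pulling a column back out gives a Series that carries the frame's Index.
column = frame['price']
print(type(column).__name__)             # Series
print(column.index.equals(frame.index))  # True

# An Index holds a single dtype: mixing ints and a float upcasts to float64.
mixed = pd.Index([1, 2, 3.5])
print(mixed.dtype)                       # float64
```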


## Exercise 1: Create index manually

In [13]:
import pandas as pd
prices = [10,12,13,11,9]
type(prices)
print(prices)


[10, 12, 13, 11, 9]


In [14]:
#pd.Series?

### Creating a Series with the `default index`

In [15]:
shares = pd.Series(data=prices)
shares
type(shares)

pandas.core.series.Series

In [16]:
shares

0    10
1    12
2    13
3    11
4     9
dtype: int64

### Creating a Series with a `custom index`

In [17]:
# 02. Creating an index
days = ['Mon','Tue','Wed','Thur','Fri']
shares = pd.Series(data = prices,index = days)
shares

Mon     10
Tue     12
Wed     13
Thur    11
Fri      9
dtype: int64
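
With labels in place, values can be read either by label or by position. A minimal sketch, reusing the `prices` and `days` lists from the cells above (the names `default_shares`, `by_label` and `by_position` are introduced only for this example):

```python
import pandas as pd

prices = [10, 12, 13, 11, 9]
days = ['Mon', 'Tue', 'Wed', 'Thur', 'Fri']

# Without an explicit index, pandas generates a RangeIndex 0..n-1.
default_shares = pd.Series(data=prices)
print(default_shares.index)   # RangeIndex(start=0, stop=5, step=1)

# With custom labels, .loc looks up by label and .iloc by position.
shares = pd.Series(data=prices, index=days)
by_label = shares.loc['Wed']
by_position = shares.iloc[2]
print(by_label, by_position)  # 13 13
```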

### Read Indexes

In [18]:
# 03. Read indexes
shares.index

Index(['Mon', 'Tue', 'Wed', 'Thur', 'Fri'], dtype='object')

In [19]:
shares.index[1]

'Tue'

In [20]:
shares.index[:3]

Index(['Mon', 'Tue', 'Wed'], dtype='object')

In [21]:
shares.index[:-3]  # drops the last 3 labels

Index(['Mon', 'Tue'], dtype='object')

In [22]:
shares.index[-2:]

Index(['Thur', 'Fri'], dtype='object')

### Index names

In [23]:
shares

Mon     10
Tue     12
Wed     13
Thur    11
Fri      9
dtype: int64

In [24]:
# 04. Index names
shares.index.name

In [25]:
shares.index.name = 'weekday'

In [26]:
shares.index.name

'weekday'

In [28]:
shares

weekday
Mon     10
Tue     12
Wed     13
Thur    11
Fri      9
dtype: int64

### Indexes are immutable

In [31]:
# 05. Indexes are immutable
#shares.index[2] = 'Wednesday'  # TypeError: Index does not support mutable operations

In [32]:
#shares.index[:2] = ['Monday', 'Tuesday']  # TypeError: Index does not support mutable operations
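
To see the error without commenting the line out, the assignment can be wrapped in `try`/`except`. This is only a sketch; the exact message text may differ between pandas versions, and the `raised` flag is introduced just for the example.

```python
import pandas as pd

shares = pd.Series([10, 12, 13, 11, 9],
                   index=['Mon', 'Tue', 'Wed', 'Thur', 'Fri'])

try:
    shares.index[2] = 'Wednesday'  # item assignment on an Index
    raised = False
except TypeError as err:
    raised = True
    print('TypeError:', err)

# The failed assignment leaves the labels untouched.
print(list(shares.index))
```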

### Modifying all index entries

In [33]:
shares.index = ['Monday','Tuesday','Wednesday','Thursday','Friday']
shares

Monday       10
Tuesday      12
Wednesday    13
Thursday     11
Friday        9
dtype: int64
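
Replacing the whole index requires a label for every row. When only some labels need to change, `Series.rename` with a mapping is one alternative. The sketch below assumes the short weekday labels from earlier and returns a new Series rather than modifying the original.

```python
import pandas as pd

shares = pd.Series([10, 12, 13, 11, 9],
                   index=['Mon', 'Tue', 'Wed', 'Thur', 'Fri'])

# Labels missing from the mapping are kept as they are.
renamed = shares.rename({'Wed': 'Wednesday', 'Thur': 'Thursday'})

print(list(renamed.index))  # ['Mon', 'Tue', 'Wednesday', 'Thursday', 'Fri']
print(list(shares.index))   # ['Mon', 'Tue', 'Wed', 'Thur', 'Fri']
```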

## Exercise 2: Create index from file import

In [37]:
# Change the working directory to the folder that contains the data files
import os
os.chdir("C:\\Users\\ramreddymyla\\Google Drive\\01 DS ML DL NLP and AI With Python Lab Copy\\02 Lab Data\\Python")
os.getcwd()

'C:\\Users\\ramreddymyla\\Google Drive\\01 DS ML DL NLP and AI With Python Lab Copy\\02 Lab Data\\Python'

In [38]:
import pandas as pd
df = pd.read_csv("pandas_sales.csv",index_col="month")

In [39]:
df

| month | eggs | salt | spam |
|-------|------|------|------|
| Jan | 47 | 12.0 | 17 |
| Feb | 110 | 50.0 | 31 |
| Mar | 221 | 89.0 | 72 |
| Apr | 77 | 87.0 | 20 |
| May | 132 | NaN | 52 |
| Jun | 205 | 60.0 | 55 |


In [42]:
df.index  # month acts as the primary key

Index(['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'], dtype='object', name='month')
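
`pandas_sales.csv` itself is not reproduced in this notebook, so the sketch below assumes its contents match the table displayed above and reads them from an in-memory string instead. It also shows `set_index` as an alternative way to promote a column to the index after loading.

```python
import io
import pandas as pd

# Assumed file contents, reconstructed from the displayed DataFrame.
csv_text = """month,eggs,salt,spam
Jan,47,12.0,17
Feb,110,50.0,31
Mar,221,89.0,72
Apr,77,87.0,20
May,132,,52
Jun,205,60.0,55
"""

# index_col promotes the named column to the row index while reading.
df = pd.read_csv(io.StringIO(csv_text), index_col="month")
print(df.index.name)          # month
print(df.loc["Mar", "eggs"])  # 221

# Equivalent two-step route: read first, then set the index.
df2 = pd.read_csv(io.StringIO(csv_text)).set_index("month")
print(df2.index.equals(df.index))  # True
```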

## Exercise 3: Hierarchical indexing

### Load file as a dataframe

A composite key, in the context of relational databases, is a combination of two or more columns in a table that can be used to uniquely identify each row in the table. Uniqueness is only guaranteed when the columns are combined; when taken individually the columns do not guarantee uniqueness.

<code>hierarchical indexing = composite key</code>

In [43]:
import pandas as pd

In [44]:
df = pd.read_csv("pandas_sales_hierarchical_indexing.csv",
                 index_col=["state","month"])

In [45]:
df

| state | month | eggs | salt | spam |
|-------|-------|------|------|------|
| CA | 1 | 47 | 12.0 | 17 |
| NY | 1 | 221 | 89.0 | 72 |
| NY | 2 | 77 | 87.0 | 20 |
| TX | 1 | 132 | NaN | 52 |
| TX | 2 | 205 | 60.0 | 55 |
| TX | 1 | 22 | 2.0 | 24 |
| CA | 2 | 110 | 50.0 | 31 |


### Read index of dataframe

In [46]:
df.index

MultiIndex([('CA', 1),
            ('NY', 1),
            ('NY', 2),
            ('TX', 1),
            ('TX', 2),
            ('TX', 1),
            ('CA', 2)],
           names=['state', 'month'])
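
Note that the index above contains `('TX', 1)` twice, so it does not behave like a strict composite key: pandas does not enforce uniqueness on an index. A small sketch for detecting such repeats, rebuilding the same index by hand with `MultiIndex.from_tuples` (the variable names are chosen only for this example):

```python
import pandas as pd

# Same (state, month) pairs, in the same order, as the index shown above.
pairs = [('CA', 1), ('NY', 1), ('NY', 2), ('TX', 1),
         ('TX', 2), ('TX', 1), ('CA', 2)]
index = pd.MultiIndex.from_tuples(pairs, names=['state', 'month'])

# is_unique reports whether every (state, month) pair occurs only once.
print(index.is_unique)           # False

# duplicated() flags each repeat after the first occurrence.
mask = index.duplicated()
print(mask.sum())                # 1

# The repeated (state, month) pairs themselves.
repeated = index[mask]
print(list(repeated))
```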

In [47]:
print(df.index.name)

None


In [48]:
df.index.name = "state_month_composite_key"

In [49]:
print(df.index.name)

state_month_composite_key


In [50]:
df.index.names

FrozenList(['state', 'month'])
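
As the outputs above show, assigning to `name` leaves the per-level labels in `names` untouched. For a MultiIndex it is usually `names` that matters, because levels can then be addressed by label. A minimal sketch, again rebuilding the index by hand rather than loading the file:

```python
import pandas as pd

pairs = [('CA', 1), ('NY', 1), ('NY', 2), ('TX', 1),
         ('TX', 2), ('TX', 1), ('CA', 2)]
index = pd.MultiIndex.from_tuples(pairs, names=['state', 'month'])

print(index.nlevels)         # 2
print(list(index.names))     # ['state', 'month']

# All labels of one level, row by row, addressed by the level name.
states = index.get_level_values('state')
print(list(states))          # ['CA', 'NY', 'NY', 'TX', 'TX', 'TX', 'CA']

# The distinct labels stored for the outer level.
print(list(index.levels[0])) # ['CA', 'NY', 'TX']
```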

In [51]:
df

| state | month | eggs | salt | spam |
|-------|-------|------|------|------|
| CA | 1 | 47 | 12.0 | 17 |
| NY | 1 | 221 | 89.0 | 72 |
| NY | 2 | 77 | 87.0 | 20 |
| TX | 1 | 132 | NaN | 52 |
| TX | 2 | 205 | 60.0 | 55 |
| TX | 1 | 22 | 2.0 | 24 |
| CA | 2 | 110 | 50.0 | 31 |


### Sort indexes

In [52]:
df = df.sort_index()

In [53]:
df

| state | month | eggs | salt | spam |
|-------|-------|------|------|------|
| CA | 1 | 47 | 12.0 | 17 |
| CA | 2 | 110 | 50.0 | 31 |
| NY | 1 | 221 | 89.0 | 72 |
| NY | 2 | 77 | 87.0 | 20 |
| TX | 1 | 132 | NaN | 52 |
| TX | 1 | 22 | 2.0 | 24 |
| TX | 2 | 205 | 60.0 | 55 |
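
Sorting matters because label slicing on a MultiIndex, as used in the cells that follow, generally expects a lexicographically sorted index. The sketch below checks sortedness before and after `sort_index` on a small frame that assumes the same (state, month) pairs and `eggs` values as the unsorted data loaded earlier.

```python
import pandas as pd

# (state, month) pairs and eggs values in their original file order.
pairs = [('CA', 1), ('NY', 1), ('NY', 2), ('TX', 1),
         ('TX', 2), ('TX', 1), ('CA', 2)]
index = pd.MultiIndex.from_tuples(pairs, names=['state', 'month'])
df = pd.DataFrame({'eggs': [47, 221, 77, 132, 205, 22, 110]}, index=index)

# Before sorting, the labels are not in lexicographic order.
print(df.index.is_monotonic_increasing)         # False

# sort_index orders by state first, then by month within each state.
sorted_df = df.sort_index()
print(sorted_df.index.is_monotonic_increasing)  # True
```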


### Reading

#### Using index method

In [54]:
df.loc['CA',1]

| state | month | eggs | salt | spam |
|-------|-------|------|------|------|
| CA | 1 | 47 | 12.0 | 17 |


In [55]:
df.loc[('CA',1),'salt']

state  month
CA     1        12.0
Name: salt, dtype: float64

#### Using slice

In [56]:
df.loc['CA']

| month | eggs | salt | spam |
|-------|------|------|------|
| 1 | 47 | 12.0 | 17 |
| 2 | 110 | 50.0 | 31 |


In [57]:
df.loc['CA':'NY']

| state | month | eggs | salt | spam |
|-------|-------|------|------|------|
| CA | 1 | 47 | 12.0 | 17 |
| CA | 2 | 110 | 50.0 | 31 |
| NY | 1 | 221 | 89.0 | 72 |
| NY | 2 | 77 | 87.0 | 20 |


#### Fancy indexing

In [59]:
df.loc[(['CA','TX'],1),:]

| state | month | eggs | salt | spam |
|-------|-------|------|------|------|
| CA | 1 | 47 | 12.0 | 17 |
| TX | 1 | 132 | NaN | 52 |
| TX | 1 | 22 | 2.0 | 24 |


In [60]:
df.loc[(['CA','TX'],1),'eggs']

state  month
CA     1         47
TX     1        132
       1         22
Name: eggs, dtype: int64

In [61]:
df.loc[('CA',[1,2]),:]

| state | month | eggs | salt | spam |
|-------|-------|------|------|------|
| CA | 1 | 47 | 12.0 | 17 |
| CA | 2 | 110 | 50.0 | 31 |
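
The selections above always fix or list the outer `state` level. To select on the inner `month` level across all states, `slice(None)`, `pd.IndexSlice`, or `DataFrame.xs` can be used. A sketch on a frame rebuilt by hand from the sorted data, keeping only the `eggs` column for brevity (the variable names are chosen for this example):

```python
import pandas as pd

# Sorted (state, month) pairs with their eggs values.
pairs = [('CA', 1), ('CA', 2), ('NY', 1), ('NY', 2),
         ('TX', 1), ('TX', 1), ('TX', 2)]
index = pd.MultiIndex.from_tuples(pairs, names=['state', 'month'])
df = pd.DataFrame({'eggs': [47, 110, 221, 77, 132, 22, 205]}, index=index)

# slice(None) means "every label" on that level.
month2 = df.loc[(slice(None), 2), :]

# pd.IndexSlice allows the familiar colon syntax for the same selection.
idx = pd.IndexSlice
month2_alt = df.loc[idx[:, 2], :]

# xs selects on one level and drops that level from the result.
month2_xs = df.xs(2, level='month')

print(month2)
print(month2_xs)
```

The first two forms keep both index levels in the result, while `xs` returns a frame indexed by `state` only.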
