# Advanced indexing
---
- In pandas, the key building blocks are:
    - Indexes: sequence of labels
    - Series: 1D array with an Index (index + 1 column)
    - DataFrames: 2D array with Series as columns
- Indexes are:
    - Immutable (like dictionary keys)
    - Homogeneous in data type (like NumPy arrays)
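
The properties listed above can be checked with a minimal sketch. The labels and values here are made up for illustration and are not part of the exercises that follow.

```python
import pandas as pd

# Hypothetical labels and values, used only for illustration
idx = pd.Index(['a', 'b', 'c'])
s = pd.Series([1, 2, 3], index=idx)
df = pd.DataFrame({'x': s})

# Each DataFrame column is a Series that shares the frame's Index
column_is_series = isinstance(df['x'], pd.Series)

# An Index reports a single dtype for all of its labels
label_dtype = idx.dtype

# Item assignment on an Index raises TypeError
try:
    idx[0] = 'z'
    mutated = True
except TypeError:
    mutated = False

print(column_is_series, label_dtype, mutated)
```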


## 1. Exercise 1: Create index manually

In [2]:
import pandas as pd
prices = [10,12,13,11,9]
type(prices)
print(prices)


[10, 12, 13, 11, 9]


In [0]:
pd.Series?

### 1.1 Creating a series with `default indexes`

In [0]:
shares = pd.Series(data=prices)
shares
type(shares)

pandas.core.series.Series

In [0]:
shares

0    10
1    12
2    13
3    11
4     9
dtype: int64

### 1.2 Creating a series with `custom indexes`

In [0]:
# 02. Creating an index
days = ['Mon','Tue','Wed','Thur','Fri']
shares = pd.Series(data = prices,index = days)
shares

Mon     10
Tue     12
Wed     13
Thur    11
Fri      9
dtype: int64

### 1.3 Read Indexes

In [0]:
# 03. Read indexes
shares.index

Index(['Mon', 'Tue', 'Wed', 'Thur', 'Fri'], dtype='object')

In [0]:
shares.index[1]

'Tue'

In [0]:
shares.index[:3]

Index(['Mon', 'Tue', 'Wed'], dtype='object')

In [0]:
shares.index[:-3]  # everything except the last 3 labels

Index(['Mon', 'Tue'], dtype='object')

In [0]:
shares.index[-2:]

Index(['Thur', 'Fri'], dtype='object')

### 1.4 Index names

In [0]:
shares

Mon     10
Tue     12
Wed     13
Thur    11
Fri      9
dtype: int64

In [0]:
# 04. Index names
shares.index.name

In [0]:
shares.index.name = 'weekday'

In [0]:
shares.index.name

'weekday'

In [0]:
shares

weekday
Mon     10
Tue     12
Wed     13
Thur    11
Fri      9
dtype: int64

### 1.5 Indexes are immutable

In [0]:
# 05. Indexes are immutable
shares.index[2] = 'Wednesday'  # raises TypeError: a single label cannot be reassigned

TypeError: Index does not support mutable operations

In [0]:
shares.index[:2] = ['Monday', 'Tuesday']  # raises TypeError: a slice of labels cannot be reassigned either

TypeError: Index does not support mutable operations
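
When only one label needs to change, one option is `Series.rename`, which builds a new Index instead of mutating the existing one. This is a sketch that recreates `shares` from the values used earlier; it is an alternative to replacing the whole index, not a way around immutability.

```python
import pandas as pd

shares = pd.Series([10, 12, 13, 11, 9],
                   index=['Mon', 'Tue', 'Wed', 'Thur', 'Fri'])

# rename returns a new Series with a new Index; the original is left untouched
renamed = shares.rename(index={'Wed': 'Wednesday'})

print(renamed.index.tolist())
print(shares.index.tolist())
```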

### 1.6 Modifying all index entries

In [0]:
shares.index = ['Monday','Tuesday','Wednesday','Thursday','Friday']
shares

Monday       10
Tuesday      12
Wednesday    13
Thursday     11
Friday        9
dtype: int64

In [0]:
# Analogy: Python strings are immutable in the same way as an Index
a_str = "python"

In [0]:
a_str[0]="J"

TypeError: 'str' object does not support item assignment

In [0]:
a_str = "jython"

In [0]:
a_str

'jython'

## 2. Exercise 2: Create index from file import

In [4]:
# change current working directory to where the files are available 
import os
os.chdir("C:\\Users\\Hi\\Google Drive\\01 DS ML DL NLP and AI With Python Lab Copy\\02 Lab Data\\Python")
os.getcwd()

'C:\\Users\\Hi\\Google Drive\\01 DS ML DL NLP and AI With Python Lab Copy\\02 Lab Data\\Python'

In [0]:
import pandas as pd
df = pd.read_csv("pandas_sales.csv",index_col="month")

In [0]:
df

| month | eggs | salt | spam |
|---|---|---|---|
| Jan | 47 | 12.0 | 17 |
| Feb | 110 | 50.0 | 31 |
| Mar | 221 | 89.0 | 72 |
| Apr | 77 | 87.0 | 20 |
| May | 132 | NaN | 52 |
| Jun | 205 | 60.0 | 55 |


In [0]:
df.index  # month acts like a primary key

Index(['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'], dtype='object', name='month')
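
The same result can be reached in two ways: choosing the index column while reading, or promoting a column afterwards with `set_index`. The sketch below uses an inline string as a stand-in for `pandas_sales.csv`; its contents are an assumption reconstructed from the output shown above, not the actual file.

```python
import io
import pandas as pd

# Inline stand-in for pandas_sales.csv (contents assumed from the displayed output)
csv_text = """month,eggs,salt,spam
Jan,47,12.0,17
Feb,110,50.0,31
Mar,221,89.0,72
Apr,77,87.0,20
May,132,,52
Jun,205,60.0,55
"""

# Option 1: choose the index column while reading
df_a = pd.read_csv(io.StringIO(csv_text), index_col='month')

# Option 2: read first, then promote a column to the index
df_b = pd.read_csv(io.StringIO(csv_text)).set_index('month')

print(df_a.index.name, df_b.index.name)
print(df_a.loc['Mar', 'eggs'])
```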

## 3. Exercise 3: Hierarchical indexing

### 3.1 Load the file as a DataFrame

A composite key, in the context of relational databases, is a combination of two or more columns in a table that can be used to uniquely identify each row in the table. Uniqueness is only guaranteed when the columns are combined; when taken individually the columns do not guarantee uniqueness.

<code>hierarchical indexing = composite key</code>

In [2]:
import pandas as pd

In [5]:
df = pd.read_csv("pandas_sales_hierarchical_indexing.csv",
                 index_col=["state","month"])

In [6]:
df

| state | month | eggs | salt | spam |
|---|---|---|---|---|
| CA | 1 | 47 | 12.0 | 17 |
| CA | 2 | 110 | 50.0 | 31 |
| NY | 1 | 221 | 89.0 | 72 |
| NY | 2 | 77 | 87.0 | 20 |
| TX | 1 | 132 | NaN | 52 |
| TX | 2 | 205 | 60.0 | 55 |
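
The composite-key idea can be checked directly: neither index level is unique on its own, but the combined index is. This is a sketch on a small hand-built frame shaped like the sales data, not on the CSV file itself.

```python
import pandas as pd

# Hypothetical data shaped like the sales table used in this exercise
df = pd.DataFrame({
    'state': ['CA', 'CA', 'NY', 'NY', 'TX', 'TX'],
    'month': [1, 2, 1, 2, 1, 2],
    'eggs': [47, 110, 221, 77, 132, 205],
}).set_index(['state', 'month'])

# Each level repeats values when taken on its own ...
state_unique = df.index.get_level_values('state').is_unique
month_unique = df.index.get_level_values('month').is_unique

# ... but every (state, month) pair occurs exactly once
pair_unique = df.index.is_unique

print(state_unique, month_unique, pair_unique)
```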


### 3.2. Read index of dataframe

In [7]:
df.index

MultiIndex(levels=[['CA', 'NY', 'TX'], [1, 2]],
           labels=[[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]],
           names=['state', 'month'])

In [0]:
print(df.index.name)

None


In [0]:
df.index.name = "state_month_composit_key"

In [0]:
print(df.index.name)

state_month_composit_key


In [0]:
df.index.names

FrozenList(['state', 'month'])
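
On a hierarchical index, the per-level labels live in `names` (plural), one entry per level. A minimal sketch of renaming the levels with `rename_axis` follows; the new names `region` and `period` are hypothetical and chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {'eggs': [47, 110, 221, 77]},
    index=pd.MultiIndex.from_tuples(
        [('CA', 1), ('CA', 2), ('TX', 1), ('TX', 2)],
        names=['state', 'month'],
    ),
)

# rename_axis returns a new frame whose index levels carry the new names
renamed = df.rename_axis(index={'state': 'region', 'month': 'period'})

# Level names can be used to pull out the labels of one level
regions = renamed.index.get_level_values('region')

print(list(df.index.names))
print(list(renamed.index.names))
```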

In [0]:
df

| state | month | eggs | salt | spam |
|---|---|---|---|---|
| CA | 1 | 47 | 12.0 | 17 |
| CA | 2 | 110 | 50.0 | 31 |
| TX | 1 | 221 | 89.0 | 72 |
| TX | 2 | 77 | 87.0 | 20 |
| NY | 1 | 132 | NaN | 52 |
| NY | 2 | 205 | 60.0 | 55 |


### 3.3. Sort indexes

In [0]:
df = df.sort_index()

In [0]:
df

| state | month | eggs | salt | spam |
|---|---|---|---|---|
| CA | 1 | 47 | 12.0 | 17 |
| CA | 2 | 110 | 50.0 | 31 |
| NY | 1 | 132 | NaN | 52 |
| NY | 2 | 205 | 60.0 | 55 |
| TX | 1 | 221 | 89.0 | 72 |
| TX | 2 | 77 | 87.0 | 20 |
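
Sorting matters because label slices on a hierarchical index rely on the labels being in order. The sketch below, on a hand-built frame with the same unsorted state order (CA, TX, NY), checks the ordering before and after `sort_index` and then takes a slice on the sorted result.

```python
import pandas as pd

unsorted = pd.DataFrame(
    {'eggs': [47, 110, 221, 77, 132, 205]},
    index=pd.MultiIndex.from_tuples(
        [('CA', 1), ('CA', 2), ('TX', 1), ('TX', 2), ('NY', 1), ('NY', 2)],
        names=['state', 'month'],
    ),
)

# is_monotonic_increasing reports whether the index labels are in sorted order
before = unsorted.index.is_monotonic_increasing
sorted_df = unsorted.sort_index()
after = sorted_df.index.is_monotonic_increasing

# A label slice across the outer level, taken on the sorted index
subset = sorted_df.loc['CA':'NY']

print(before, after, len(subset))
```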


### 3.4. Reading rows
#### 3.4.1. Using index labels

In [0]:
df.loc[('CA', 1)]  # one row, selected by its (state, month) tuple

eggs    47.0
salt    12.0
spam    17.0
Name: (CA, 1), dtype: float64

In [0]:
df.loc[('CA',1),'salt']

12.0

#### 3.4.2. Using slices

In [0]:
df.loc['CA']

| month | eggs | salt | spam |
|---|---|---|---|
| 1 | 47 | 12.0 | 17 |
| 2 | 110 | 50.0 | 31 |


In [0]:
df.loc['CA':'NY']

| state | month | eggs | salt | spam |
|---|---|---|---|---|
| CA | 1 | 47 | 12.0 | 17 |
| CA | 2 | 110 | 50.0 | 31 |
| NY | 1 | 132 | NaN | 52 |
| NY | 2 | 205 | 60.0 | 55 |


#### 3.4.3. Fancy indexing

In [0]:
df.loc[(['CA','TX'],1),:]

| state | month | eggs | salt | spam |
|---|---|---|---|---|
| CA | 1 | 47 | 12.0 | 17 |
| TX | 1 | 221 | 89.0 | 72 |


In [0]:
df.loc[(['CA','TX'],1),'eggs']

state  month
CA     1         47
TX     1        221
Name: eggs, dtype: int64

In [0]:
df.loc[('CA',[1,2]),:]

| state | month | eggs | salt | spam |
|---|---|---|---|---|
| CA | 1 | 47 | 12.0 | 17 |
| CA | 2 | 110 | 50.0 | 31 |
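
The tuple-of-lists selections above pick specific labels on each level. To take every label on the outer level while filtering on the inner one, two options are `pd.IndexSlice` and `DataFrame.xs`. This is a sketch on a hand-built, already sorted frame with only the `eggs` column; it is not part of the original exercise.

```python
import pandas as pd

df = pd.DataFrame(
    {'eggs': [47, 110, 132, 205, 221, 77]},
    index=pd.MultiIndex.from_product(
        [['CA', 'NY', 'TX'], [1, 2]], names=['state', 'month']
    ),
)

idx = pd.IndexSlice

# All states, month 1 only; both index levels are kept in the result
month1 = df.loc[idx[:, 1], :]

# Cross-section on the inner level; the 'month' level is dropped from the result
month1_xs = df.xs(1, level='month')

print(month1.index.tolist())
print(month1_xs.index.tolist())
```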
