The key building blocks of pandas data structures are:

a) Index: a sequence of labels.

b) Series: a 1D array of values paired with an Index (index + 1 column).

c) DataFrame: a 2D table whose columns are Series that share one Index.

Indexes

a) Immutable (like dictionary keys)

b) Homogeneous in data type (like NumPy arrays)
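
The two Index properties listed above can be checked directly. This is a minimal sketch, assuming a reasonably recent pandas version; the variable names are illustrative only.

```python
import pandas as pd

# An Index built from integers has a single integer dtype.
int_index = pd.Index([1, 2, 3])
print(int_index.dtype)            # int64

# Mixing integers and strings falls back to the generic object dtype,
# so the Index still has exactly one dtype for all of its labels.
mixed_index = pd.Index([1, 'two', 3])
print(mixed_index.dtype)          # object

# Item assignment is rejected because an Index is immutable.
try:
    int_index[0] = 10
    assignment_failed = False
except TypeError:
    assignment_failed = True
print(assignment_failed)          # True
```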

## Exercise 1 - Create index manually

In [2]:
import pandas as pd

In [3]:
prices = [10, 12, 13, 11, 9]

In [4]:
type(prices)

list

#### 01. creating a series

In [5]:
shares = pd.Series(prices)

In [6]:
shares

0    10
1    12
2    13
3    11
4     9
dtype: int64

In [7]:
type(shares)

pandas.core.series.Series

#### 02. creating an index

In [8]:
days = ['Mon', 'Tues', 'Wed', 'Thur', 'Fri']

In [9]:
shares = pd.Series(data = prices, index = days)

In [10]:
shares

Mon     10
Tues    12
Wed     13
Thur    11
Fri      9
dtype: int64

#### 03. read indexes

In [11]:
shares.index

Index(['Mon', 'Tues', 'Wed', 'Thur', 'Fri'], dtype='object')

In [12]:
shares.index[1]

'Tues'

In [13]:
shares.index[:3]

Index(['Mon', 'Tues', 'Wed'], dtype='object')

In [14]:
shares.index[:-3]

Index(['Mon', 'Tues'], dtype='object')

In [15]:
shares.index[-2:]

Index(['Thur', 'Fri'], dtype='object')
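
The index is mainly useful for looking up values. The sketch below rebuilds the same `shares` Series and contrasts label-based selection (`.loc`) with position-based selection (`.iloc`); the result variable names are made up for illustration.

```python
import pandas as pd

prices = [10, 12, 13, 11, 9]
days = ['Mon', 'Tues', 'Wed', 'Thur', 'Fri']
shares = pd.Series(data=prices, index=days)

# Label-based lookup of a single value.
wed_price = shares.loc['Wed']             # 13

# Label-based slices include the end label.
mon_to_wed = shares.loc['Mon':'Wed']      # Mon, Tues, Wed

# Position-based slices exclude the end position, as with Python lists.
first_two = shares.iloc[0:2]              # Mon, Tues
```

The inclusive end point of a label slice is easy to overlook when switching between `.loc` and `.iloc`.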

#### 04. index names

In [16]:
shares.index.name

In [17]:
shares.index.name = 'weekday'

In [18]:
shares.index.name

'weekday'

In [19]:
shares

weekday
Mon     10
Tues    12
Wed     13
Thur    11
Fri      9
dtype: int64

#### 05. indexes are immutable

In [21]:
shares.index[2] = 'Wednesday'

TypeError: Index does not support mutable operations

In [22]:
shares.index[:2] = ['Monday', 'Tuesday']

TypeError: Index does not support mutable operations
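
Because single labels cannot be assigned in place, one way to change a few of them is `Series.rename`, which returns a new Series with a new Index. A minimal sketch, using the same data as above:

```python
import pandas as pd

shares = pd.Series([10, 12, 13, 11, 9],
                   index=['Mon', 'Tues', 'Wed', 'Thur', 'Fri'])

# rename() builds a new Series with a new Index; labels missing from the
# mapping are left unchanged, and the original Series is not modified.
renamed = shares.rename({'Wed': 'Wednesday'})

print(list(renamed.index))   # ['Mon', 'Tues', 'Wednesday', 'Thur', 'Fri']
print(list(shares.index))    # ['Mon', 'Tues', 'Wed', 'Thur', 'Fri']
```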

#### 06. modifying all index entries

In [23]:
shares.index = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']

In [24]:
shares

Monday       10
Tuesday      12
Wednesday    13
Thursday     11
Friday        9
dtype: int64
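
Note that the output above no longer shows the `weekday` name. The sketch below illustrates why: assigning a list replaces the whole Index, name included. It also shows that the replacement must have the same length as the Series.

```python
import pandas as pd

shares = pd.Series([10, 12, 13, 11, 9],
                   index=['Mon', 'Tues', 'Wed', 'Thur', 'Fri'])
shares.index.name = 'weekday'

# Assigning a plain list creates a brand-new Index, so the old name is lost.
shares.index = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
print(shares.index.name)     # None

# The name can be set again on the new Index.
shares.index.name = 'weekday'

# A replacement with the wrong number of labels is rejected.
try:
    shares.index = ['Monday', 'Tuesday']
    length_error = False
except ValueError:
    length_error = True
print(length_error)          # True
```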

## Exercise 2 - Create index from file import

In [25]:
df = pd.read_csv('C:\\Users\\rs\\Downloads\\01 Data Science Lab Copy\\02 Lab Data\\Python\\pandas_sales.csv', index_col = 'month')

In [26]:
df

| month | eggs | salt | spam |
|---|---|---|---|
| Jan | 47 | 12.0 | 17 |
| Feb | 110 | 50.0 | 31 |
| Mar | 221 | 89.0 | 72 |
| Apr | 77 | 87.0 | 20 |
| May | 132 | NaN | 52 |
| Jun | 205 | 60.0 | 55 |
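
The same step can be tried without the file. The sketch below assumes the CSV has the columns `month`, `eggs`, `salt`, `spam` and uses an inline string as a stand-in for `pandas_sales.csv`; the actual file contents are not shown in this notebook, so the text is reconstructed from the table above.

```python
import io
import pandas as pd

# Inline stand-in for the CSV file (assumed layout, values from the table above).
csv_text = """month,eggs,salt,spam
Jan,47,12.0,17
Feb,110,50.0,31
Mar,221,89.0,72
Apr,77,87.0,20
May,132,,52
Jun,205,60.0,55
"""

# index_col promotes the named column to the row index while reading.
df = pd.read_csv(io.StringIO(csv_text), index_col='month')
print(df.index.name)                     # month
print(df.loc['Mar', 'eggs'])             # 221

# The empty salt field for May is read as NaN (missing).
print(pd.isna(df.loc['May', 'salt']))    # True

# Alternative: read first, then promote a column to the index.
df2 = pd.read_csv(io.StringIO(csv_text)).set_index('month')
print(df.equals(df2))                    # True
```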


## Exercise 3 - Hierarchical Indexing

#### 01. load data

In [27]:
import pandas as pd

In [29]:
df = pd.read_csv('C:\\Users\\rs\\Downloads\\01 Data Science Lab Copy\\02 Lab Data\\Python\\pandas_sales_hierarchical_indexing.csv')

In [30]:
df

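| | state | month | eggs | salt | spam |
|---|---|---|---|---|---|
| 0 | CA | 1 | 47 | 12.0 | 17 |
| 1 | CA | 2 | 110 | 50.0 | 31 |
| 2 | TX | 1 | 221 | 89.0 | 72 |
| 3 | TX | 2 | 77 | 87.0 | 20 |
| 4 | NY | 1 | 132 | NaN | 52 |
| 5 | NY | 2 | 205 | 60.0 | 55 |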


#### 02. create index using two columns (similar to composite key in RDBMS)

In [31]:
df = df.set_index(['state', 'month'])

In [32]:
df

| state | month | eggs | salt | spam |
|---|---|---|---|---|
| CA | 1 | 47 | 12.0 | 17 |
| CA | 2 | 110 | 50.0 | 31 |
| TX | 1 | 221 | 89.0 | 72 |
| TX | 2 | 77 | 87.0 | 20 |
| NY | 1 | 132 | NaN | 52 |
| NY | 2 | 205 | 60.0 | 55 |
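
To make the composite-key analogy concrete, the sketch below builds a reduced version of the data in memory (only the `eggs` column, as an illustrative stand-in for the CSV) and shows that `set_index` moves both columns into a two-level index, and that `reset_index` reverses the step.

```python
import pandas as pd

# Small stand-in for the sales data shown above.
df = pd.DataFrame({
    'state': ['CA', 'CA', 'TX', 'TX', 'NY', 'NY'],
    'month': [1, 2, 1, 2, 1, 2],
    'eggs':  [47, 110, 221, 77, 132, 205],
})

indexed = df.set_index(['state', 'month'])
print(type(indexed.index).__name__)   # MultiIndex
print(indexed.index.nlevels)          # 2
print(list(indexed.columns))          # ['eggs']

# Like a composite key, each (state, month) pair identifies one row here.
print(indexed.index.is_unique)        # True

# reset_index() moves the index levels back into ordinary columns.
restored = indexed.reset_index()
print(list(restored.columns))         # ['state', 'month', 'eggs']
```

Unlike a primary key in an RDBMS, pandas does not enforce uniqueness by default, so `is_unique` is worth checking when the analogy matters.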


#### 03. read index of dataframe

In [33]:
df.index

MultiIndex(levels=[['CA', 'NY', 'TX'], [1, 2]],
           labels=[[0, 0, 2, 2, 1, 1], [0, 1, 0, 1, 0, 1]],
           names=['state', 'month'])

In [34]:
df.index.name

In [35]:
df.index.names

FrozenList(['state', 'month'])
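
`df.index.name` prints nothing above because a MultiIndex carries one name per level rather than a single name. The sketch below builds an equivalent MultiIndex by hand (an illustrative stand-in for `df.index`) and reads its levels.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('CA', 1), ('CA', 2), ('TX', 1), ('TX', 2), ('NY', 1), ('NY', 2)],
    names=['state', 'month'],
)

# A MultiIndex has one name per level, so .name is None and .names holds them.
print(index.name)                               # None
print(list(index.names))                        # ['state', 'month']

# get_level_values() returns the labels of one level, one entry per row.
print(list(index.get_level_values('state')))    # ['CA', 'CA', 'TX', 'TX', 'NY', 'NY']

# The distinct labels of each level are stored in .levels.
print(list(index.levels[0]))                    # ['CA', 'NY', 'TX']
```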

#### 04. sort indexes

In [36]:
df = df.sort_index()

In [37]:
df

| state | month | eggs | salt | spam |
|---|---|---|---|---|
| CA | 1 | 47 | 12.0 | 17 |
| CA | 2 | 110 | 50.0 | 31 |
| NY | 1 | 132 | NaN | 52 |
| NY | 2 | 205 | 60.0 | 55 |
| TX | 1 | 221 | 89.0 | 72 |
| TX | 2 | 77 | 87.0 | 20 |
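
Sorting matters for the label slices used in the next section. The sketch below, again on a reduced in-memory stand-in for the data, checks the ordering before and after `sort_index()`.

```python
import pandas as pd

df = pd.DataFrame({
    'state': ['CA', 'CA', 'TX', 'TX', 'NY', 'NY'],
    'month': [1, 2, 1, 2, 1, 2],
    'eggs':  [47, 110, 221, 77, 132, 205],
}).set_index(['state', 'month'])

# In file order (CA, TX, NY) the index is not sorted.
print(df.index.is_monotonic_increasing)           # False

# sort_index() returns a new DataFrame ordered by state, then month.
sorted_df = df.sort_index()
print(sorted_df.index.is_monotonic_increasing)    # True

# Label slices such as 'CA':'NY' rely on this ordering.
print(sorted_df.loc['CA':'NY', 'eggs'].tolist())  # [47, 110, 132, 205]
```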


#### 05. reading

##### 05-1. using index labels

In [38]:
df.loc['CA', 1]

eggs    47.0
salt    12.0
spam    17.0
Name: (CA, 1), dtype: float64

In [39]:
df.loc[('CA', 1), 'salt'] 

12.0
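
The row lookup above prints `47.0` for `eggs` even though the column holds integers. A short sketch of why, on a two-row stand-in for the data: a single row is returned as one Series, which needs one dtype for all of its values.

```python
import pandas as pd

df = pd.DataFrame({
    'state': ['CA', 'CA'],
    'month': [1, 2],
    'eggs':  [47, 110],
    'salt':  [12.0, 50.0],
}).set_index(['state', 'month'])

# A full key (one label per level) selects a single row as a Series.
# The integer eggs value is upcast to float so the Series has a single dtype.
row = df.loc[('CA', 1)]
print(row.dtype)                     # float64
print(row['eggs'])                   # 47.0

# Adding a column label returns a scalar with the column's own dtype.
print(df.loc[('CA', 1), 'eggs'])     # 47
```

Writing the row key as an explicit tuple, `df.loc[('CA', 1)]`, also avoids any confusion with the `df.loc[row, column]` form.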

##### 05-2. using slice

In [40]:
df.loc['CA']

| month | eggs | salt | spam |
|---|---|---|---|
| 1 | 47 | 12.0 | 17 |
| 2 | 110 | 50.0 | 31 |


In [41]:
df.loc['CA' : 'NY']

| state | month | eggs | salt | spam |
|---|---|---|---|---|
| CA | 1 | 47 | 12.0 | 17 |
| CA | 2 | 110 | 50.0 | 31 |
| NY | 1 | 132 | NaN | 52 |
| NY | 2 | 205 | 60.0 | 55 |


##### 05-3. fancy indexing (lists of labels)

In [42]:
df.loc[(['CA', 'TX'], 1), :]

| state | month | eggs | salt | spam |
|---|---|---|---|---|
| CA | 1 | 47 | 12.0 | 17 |
| TX | 1 | 221 | 89.0 | 72 |


In [43]:
df.loc[(['CA', 'TX'], 1), 'eggs']

state  month
CA     1         47
TX     1        221
Name: eggs, dtype: int64

In [44]:
df.loc[('CA', [1, 2]), :]

| state | month | eggs | salt | spam |
|---|---|---|---|---|
| CA | 1 | 47 | 12.0 | 17 |
| CA | 2 | 110 | 50.0 | 31 |
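
The examples above list the outer labels explicitly. To select one month across all states without naming them, `slice(None)`, `pd.IndexSlice`, and `DataFrame.xs` are possible options. A minimal sketch on a reduced, sorted stand-in for the data:

```python
import pandas as pd

df = pd.DataFrame({
    'state': ['CA', 'CA', 'NY', 'NY', 'TX', 'TX'],
    'month': [1, 2, 1, 2, 1, 2],
    'eggs':  [47, 110, 132, 205, 221, 77],
}).set_index(['state', 'month']).sort_index()

# Select month 1 for every state; slice(None) means "all labels" on that level.
month_one = df.loc[(slice(None), 1), :]
print(month_one['eggs'].tolist())          # [47, 132, 221]

# pd.IndexSlice offers the same selection with the usual colon syntax.
idx = pd.IndexSlice
same_rows = df.loc[idx[:, 1], :]
print(same_rows.equals(month_one))         # True

# xs() takes a cross-section and drops the selected level from the result.
cross_section = df.xs(1, level='month')
print(list(cross_section.index))           # ['CA', 'NY', 'TX']
```

The `.loc` forms keep both index levels in the result, while `xs` returns a frame indexed by `state` only.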
