# The Index Objects

In [1]:
import numpy as np
import pandas as pd

**Reindexing**

In [2]:
ser = pd.Series([2,5,7,4], index=['one','two','three','four'])


In [3]:
ser

one      2
two      5
three    7
four     4
dtype: int64

In [4]:
ser.reindex(['three','four','five','one'])

three    7.0
four     4.0
five     NaN
one      2.0
dtype: float64

As you can see from the value returned, the labels now follow the order of the list passed to reindex(). The value for the label 'two' has been dropped, and the new label 'five', which did not exist in the original Series, has been added with the value NaN. Because NaN is a floating-point value, the dtype changes from int64 to float64. The original ser is not modified: reindex() returns a new Series.

Keep in mind that arithmetic on NaN produces NaN; for example, 80 + NaN = NaN.

Writing out the full list of labels can be awkward, however, especially for a large DataFrame. Instead, reindex() can fill in values for the new labels automatically. To see how this automatic filling works, define the following Series.

In [5]:
ser3 = pd.Series([1,5,6,3],index=[0,3,5,6])
ser3


0    1
3    5
5    6
6    3
dtype: int64

In [6]:
ser3.reindex(range(6))

0    1.0
1    NaN
2    NaN
3    5.0
4    NaN
5    6.0
dtype: float64
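Because arithmetic on NaN yields NaN, the gaps introduced by reindex() carry through any later calculation. The sketch below redefines ser3 so it runs on its own; it also shows that reductions such as sum() skip NaN by default (skipna=True), which is the main exception to that rule.

```python
import numpy as np
import pandas as pd

ser3 = pd.Series([1, 5, 6, 3], index=[0, 3, 5, 6])
gaps = ser3.reindex(range(6))   # labels 1, 2 and 4 become NaN; label 6 is dropped

print(80 + np.nan)              # scalar arithmetic with NaN gives nan
shifted = gaps + 10             # element-wise: NaN + 10 is still NaN
print(shifted)

# Reductions skip NaN by default (skipna=True) ...
print(gaps.sum())               # 1 + 5 + 6 = 12.0
# ... unless told otherwise.
print(gaps.sum(skipna=False))   # nan
```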

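Instead of leaving NaN for the new labels, the method parameter of reindex() can fill them from neighboring labels. The two most common options are:

a. ffill (forward fill): use the value of the closest preceding label

b. bfill (backward fill): use the value of the closest following label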
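Besides method, reindex() also accepts a fill_value argument that puts a constant in every new label. A minimal sketch, reusing the ser3 values from this notebook (redefined here); because no NaN is introduced, the integer dtype is kept.

```python
import pandas as pd

ser3 = pd.Series([1, 5, 6, 3], index=[0, 3, 5, 6])

# Every label missing from the original index gets the constant 0 instead of NaN.
filled = ser3.reindex(range(6), fill_value=0)
print(filled)
print(filled.dtype)   # int64: no NaN was introduced, so no upcast to float
```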

In [7]:
ser3.reindex(range(6), method='ffill')

0    1
1    1
2    1
3    5
4    5
5    6
dtype: int64

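The labels that were not present in the original Series (1, 2 and 4) have been added, and label 6 has been dropped because it lies outside range(6). With ffill, each new label takes the value of the closest preceding label in the original Series: labels 1 and 2 get the value 1, which belongs to label 0, and label 4 gets the value 5, which belongs to label 3.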
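Filling during reindexing should give the same values as reindexing first and then forward-filling with Series.ffill(). A quick check of that, assuming the same ser3 as above; the two-step version passes through NaN along the way, which is why the one-step form is handy when you want to keep integer values.

```python
import pandas as pd

ser3 = pd.Series([1, 5, 6, 3], index=[0, 3, 5, 6])

# One step: fill while reindexing.
one_step = ser3.reindex(range(6), method='ffill')

# Two steps: reindex (introducing NaN), then forward-fill the gaps.
two_step = ser3.reindex(range(6)).ffill()

print(one_step.tolist())   # [1, 1, 1, 5, 5, 6]
print(two_step.tolist())
```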

If you want each new label to take the value of the closest following label instead, use the bfill method.

In [8]:
ser3.reindex(range(6), method='bfill')

0    1
1    5
2    5
3    5
4    6
5    6
dtype: int64

In [9]:
ser3.index

Index([0, 3, 5, 6], dtype='int64')

In this case, labels 1 and 2 get the value 5, which belongs to label 3, and label 4 gets the value 6, which belongs to label 5.
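A third option, method='nearest', takes the value of whichever original label is closest. A minimal sketch with ser3; the target labels are chosen so that no label is equally distant from two original labels (tie-breaking is not covered here). As with ffill and bfill, the original index must be sorted (monotonic) for this to work.

```python
import pandas as pd

ser3 = pd.Series([1, 5, 6, 3], index=[0, 3, 5, 6])

# Label 1 is closest to 0 (distance 1 vs 2); label 2 is closest to 3 (distance 1 vs 2).
nearest = ser3.reindex([0, 1, 2, 3], method='nearest')
print(nearest)
```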

# Dropping

In [11]:
ser = pd.Series(np.arange(4.), index=['red','blue','yellow','white'])
ser

red       0.0
blue      1.0
yellow    2.0
white     3.0
dtype: float64

In [12]:
ser.drop('white')

red       0.0
blue      1.0
yellow    2.0
dtype: float64

In [13]:
ser

red       0.0
blue      1.0
yellow    2.0
white     3.0
dtype: float64

As you can see, drop() does not modify the original Series: it returns a new Series without the specified label. To remove several labels at once, pass them as a list.

In [14]:
ser.drop(['white', 'blue'])

red       0.0
yellow    2.0
dtype: float64
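By default, drop() raises a KeyError when a label is not in the index. The sketch below shows the errors='ignore' option, which skips missing labels instead; 'green' is only a hypothetical label chosen because it does not exist in ser.

```python
import numpy as np
import pandas as pd

ser = pd.Series(np.arange(4.), index=['red', 'blue', 'yellow', 'white'])

raised = False
try:
    ser.drop('green')               # 'green' is not a label of ser
except KeyError:
    raised = True
print('KeyError raised:', raised)

# With errors='ignore', missing labels are skipped and existing ones are dropped.
trimmed = ser.drop(['green', 'white'], errors='ignore')
print(trimmed)
```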

In [17]:
frame = pd.DataFrame(np.arange(16).reshape((4,4)))


In [18]:
frame

    0   1   2   3
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15


In [19]:
frame.drop(2)

    0   1   2   3
0   0   1   2   3
1   4   5   6   7
3  12  13  14  15


In [20]:
frame.drop([1,3])

   0  1   2   3
0  0  1   2   3
2  8  9  10  11


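By default, drop() on a DataFrame removes the rows whose index labels match the given values (axis=0), and, as with a Series, it returns a new DataFrame.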

**Note:** To delete columns, you pass the column labels in the same way, but you must also tell drop() which axis to delete from using the axis option. To refer to the column labels, specify axis=1.
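A short sketch of dropping columns from the same 4×4 frame, first with axis=1 and then with the equivalent columns keyword; the choice of columns 0 and 3 is just for illustration.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(16).reshape((4, 4)))

# Drop columns 0 and 3 by naming the axis explicitly ...
by_axis = frame.drop([0, 3], axis=1)
# ... or, equivalently, with the columns keyword.
by_keyword = frame.drop(columns=[0, 3])

print(by_axis)
```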

# Arithmetic and data alignment

In [22]:
s1 = pd.Series([3,2,5,1],['white','yellow','green','blue'])
s2 = pd.Series([1,4,7,2,1],['white','yellow','black','blue','brown'])
s1, s2

(white     3
 yellow    2
 green     5
 blue      1
 dtype: int64,
 white     1
 yellow    4
 black     7
 blue      2
 brown     1
 dtype: int64)

In [23]:
s1+s2

black     NaN
blue      3.0
brown     NaN
green     NaN
white     4.0
yellow    6.0
dtype: float64

As you can see from the two Series just declared, some labels are present in both, while others are present in only one of the two. When a label is present in both operands, the two values are added; when it is present in only one, the label still appears in the result (a new Series), but with the value NaN. The index of the result is the union of both sets of labels, shown here in sorted order.

Pandas performs this **data alignment** by matching values by label rather than by position, so two Series with different labels or different lengths can be combined without raising an error; the mismatches show up as NaN instead.
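If you would rather treat a missing label as 0 than get NaN, the method form Series.add() accepts a fill_value argument. A minimal sketch using the same s1 and s2:

```python
import pandas as pd

s1 = pd.Series([3, 2, 5, 1], index=['white', 'yellow', 'green', 'blue'])
s2 = pd.Series([1, 4, 7, 2, 1], index=['white', 'yellow', 'black', 'blue', 'brown'])

# Operator form: labels found in only one Series give NaN.
plain = s1 + s2

# Method form: a label missing from one side is treated as 0 before adding.
filled = s1.add(s2, fill_value=0)
print(filled)
```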

In [25]:
frame1 = pd.DataFrame(np.arange(16).reshape((4,4)),
index=['red','blue','yellow','white'],
columns=['ball','pen','pencil','paper'])
frame2 = pd.DataFrame(np.arange(12).reshape((4,3)),
index=['blue','green','white','yellow'],
columns=['mug','pen','ball'])

In [28]:
frame1


        ball  pen  pencil  paper
red        0    1       2      3
blue       4    5       6      7
yellow     8    9      10     11
white     12   13      14     15


In [29]:
frame2

        mug  pen  ball
blue      0    1     2
green     3    4     5
white     6    7     8
yellow    9   10    11


In [30]:
frame1 + frame2

        ball  mug  paper   pen  pencil
blue     6.0  NaN    NaN   6.0     NaN
green    NaN  NaN    NaN   NaN     NaN
red      NaN  NaN    NaN   NaN     NaN
white   20.0  NaN    NaN  20.0     NaN
yellow  19.0  NaN    NaN  19.0     NaN
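With DataFrames, alignment happens on both axes: a cell gets a number only if its row label and its column label appear in both frames, which is why only 'ball' and 'pen' for 'blue', 'white' and 'yellow' are filled above. As with Series, the add() method accepts fill_value. A sketch using the same frame1 and frame2; note that a cell missing from both frames (for example row 'green', column 'paper') still comes out as NaN.

```python
import numpy as np
import pandas as pd

frame1 = pd.DataFrame(np.arange(16).reshape((4, 4)),
                      index=['red', 'blue', 'yellow', 'white'],
                      columns=['ball', 'pen', 'pencil', 'paper'])
frame2 = pd.DataFrame(np.arange(12).reshape((4, 3)),
                      index=['blue', 'green', 'white', 'yellow'],
                      columns=['mug', 'pen', 'ball'])

# fill_value=0 stands in for a cell that exists in only one frame;
# cells missing from both frames remain NaN.
total = frame1.add(frame2, fill_value=0)
print(total)
```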
