<h1 align='center'> 5.2 Essential Functionality Part I

<h3>Reindexing

In [1]:
import numpy as np
import pandas as pd

obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])
obj

Calling reindex on this Series rearranges the data according to the new index, introducing missing values (NaN) for any index labels that were not already present.

In [2]:
obj2 = obj.reindex(['a', 'b', 'c', 'd', 'e'])
obj2

a   -5.3
b    7.2
c    3.6
d    4.5
e    NaN
dtype: float64

For ordered data like time series, it may be desirable to interpolate or fill values when reindexing. The method option does this: 'ffill' carries the last valid value forward, and 'bfill' fills backward from the next valid value.

In [3]:
obj3 = pd.Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])
obj3

0      blue
2    purple
4    yellow
dtype: object

In [4]:
obj3.reindex(range(6), method='ffill')  # forward fill: each new label takes the last preceding value

0      blue
1      blue
2    purple
3    purple
4    yellow
5    yellow
dtype: object
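
As a small sketch of the two fill directions mentioned above, the following compares method='ffill' with method='bfill' on the same data. The names colors, forward and backward are illustrative and do not appear elsewhere in this notebook.

```python
import pandas as pd

# Same data as obj3, under an illustrative name.
colors = pd.Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])

# Forward fill: each new label takes the value of the closest earlier label.
forward = colors.reindex(range(6), method='ffill')

# Backward fill: each new label takes the value of the closest later label.
# Label 5 has nothing after it, so it stays missing.
backward = colors.reindex(range(6), method='bfill')

print(forward.tolist())
print(backward.tolist())
```

In both cases reindex returns a new Series; colors itself is left unchanged.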

With DataFrame, reindex can alter the (row) index, the columns, or both. When passed only a sequence, it reindexes the rows in the result.

In [5]:
yo = frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                          index=['a', 'c', 'd'],
                          columns=['Ohio', 'Texas', 'California'])
yo

   Ohio  Texas  California
a     0      1           2
c     3      4           5
d     6      7           8


The columns can be reindexed with the columns keyword. Column labels that are absent from the original (here Utah) come back filled with NaN.

In [6]:
states = ['Texas', 'Utah', 'California']
yo.reindex(columns=states)

   Texas  Utah  California
a      1   NaN           2
c      4   NaN           5
d      7   NaN           8
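
The text above says reindex can alter rows, columns, or both, but only the columns case is shown. A minimal sketch of the other two cases follows; the row label 'b' and the variable names rows_only and both are made up for illustration.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=['a', 'c', 'd'],
                     columns=['Ohio', 'Texas', 'California'])

# A bare sequence is interpreted as new row labels; 'b' is new, so its row is NaN.
rows_only = frame.reindex(['a', 'b', 'c', 'd'])

# The index and columns keywords reindex both axes in one call.
both = frame.reindex(index=['a', 'b', 'c', 'd'],
                     columns=['Texas', 'Utah', 'California'])

print(rows_only)
print(both)
```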


<h3>Dropping Entries from an Axis

In [7]:
data = pd.DataFrame(np.arange(16).reshape((4, 4))
                     ,index=['Ohio', 'Colorado', 'Utah', 'New York']
                     ,columns=['one', 'two', 'three', 'four'])

The drop method returns a new object with the indicated value or values deleted from an axis.

Calling drop with a sequence of labels drops those rows (axis 0).

In [8]:
data.drop(['Colorado', 'Ohio'])

          one  two  three  four
Utah        8    9     10    11
New York   12   13     14    15


You can drop values from the columns by passing axis=1 or axis='columns'

In [9]:
data.drop('two', axis=1)

          one  three  four
Ohio        0      2     3
Colorado    4      6     7
Utah        8     10    11
New York   12     14    15


Passing inplace=True modifies the object in place instead of returning a new one; the dropped data is lost.

In [10]:
data.drop('Utah', inplace=True)
data

          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7
New York   12   13     14    15
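
To make the difference between the two styles of drop concrete, here is a short sketch. The names trimmed and result are illustrative; the point is only that the default call leaves the original untouched, while inplace=True mutates it and returns None.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# Default: a new DataFrame comes back and data keeps all four rows.
trimmed = data.drop('Utah')

# inplace=True: data itself loses the row and the call returns None.
result = data.drop('Ohio', inplace=True)

print(list(trimmed.index))
print(list(data.index))
print(result)
```

Because the in-place call returns None, assigning its result back to data would discard the DataFrame.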


<h3> Indexing | Selection | Filtering

Note
* Slicing with labels behaves differently from normal Python slicing in that the endpoint is inclusive.

In [11]:
obj = pd.Series(np.arange(5.), index=['a', 'b', 'c', 'd','e'])

In [12]:
obj

a    0.0
b    1.0
c    2.0
d    3.0
e    4.0
dtype: float64

In [13]:
obj[2:3]

c    2.0
dtype: float64

In [14]:
obj['c':'d']

c    2.0
d    3.0
dtype: float64

In [15]:
data = pd.DataFrame(np.arange(16).reshape((4, 4))
                    ,index=['Ohio', 'Colorado', 'Utah', 'New York']
                    ,columns=['one', 'two', 'three', 'four'])
data

          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7
Utah        8    9     10    11
New York   12   13     14    15


In [16]:
data[:2]

          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7


In [17]:
data[data['three'] > 5]

          one  two  three  four
Colorado    4    5      6     7
Utah        8    9     10    11
New York   12   13     14    15


Another option is indexing with a boolean DataFrame, such as one produced by a scalar comparison.

In [18]:
data>5

            one    two  three   four
Ohio      False  False  False  False
Colorado  False  False   True   True
Utah       True   True   True   True
New York   True   True   True   True
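
The comparison above only builds the boolean DataFrame. As a sketch of actually indexing with one, the snippet below uses such a mask to overwrite every matching cell. The threshold 5 and the name mask are chosen for illustration.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# Boolean DataFrame with the same shape and labels as data.
mask = data < 5

# Assigning through the mask touches only the cells where mask is True.
data[mask] = 0

print(data)
```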


<h3>Selection with loc | iloc

loc and iloc let you select a subset of the rows and columns from a DataFrame with NumPy-like notation, using either:

1. Axis labels (loc)

2. Integers (iloc)

In [19]:
data.loc['Colorado', ['two', 'three']]

two      5
three    6
Name: Colorado, dtype: int32

In [20]:
data.iloc[1, [1,2]]

two      5
three    6
Name: Colorado, dtype: int32

Both indexing functions work with slices in addition to single labels or lists of labels

In [21]:
data.loc[:'Utah', 'two'][data.two >= 5]

Colorado    5
Utah        9
Name: two, dtype: int32

In [22]:
data.iloc[:, :3][data.three > 3]

          one  two  three
Colorado    4    5      6
Utah        8    9     10
New York   12   13     14
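
Since both indexers accept slices, it may help to see the endpoint rule side by side: label slices with loc include the stop label, while integer slices with iloc exclude the stop position. The names by_label and by_position are illustrative.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# Label slices are inclusive: rows Ohio through Utah, columns one through two.
by_label = data.loc['Ohio':'Utah', 'one':'two']

# Integer slices are half-open: rows 0 and 1, columns 0 and 1.
by_position = data.iloc[0:2, 0:2]

print(by_label.shape)
print(by_position.shape)
```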


With an integer index such as 0, 1, 2, it is hard to infer whether the user wants label-based or position-based indexing, so ser[-1] below raises an error. If an axis index contains integers, data selection is always label-oriented.

In [23]:
ser = pd.Series(np.arange(3.))
ser
ser[-1]

KeyError: -1

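On the other hand, with a non-integer index there is no potential for ambiguity, so an integer key falls back to positional access. Recent pandas releases deprecate this fallback, so ser2.iloc[-1] is the more explicit spelling.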

In [24]:
ser2 = pd.Series(np.arange(3.), index=['a', 'b', 'c'])
ser2[-1]

2.0
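
One way to sidestep the ambiguity entirely is to always say which kind of lookup is meant: loc for labels, iloc for positions. The sketch below assumes a second Series, ser_shifted, with made-up integer labels 10, 20, 30 so that labels and positions visibly differ.

```python
import numpy as np
import pandas as pd

ser = pd.Series(np.arange(3.))
ser_shifted = pd.Series(np.arange(3.), index=[10, 20, 30])

# iloc is always positional, so -1 means "last element".
last_value = ser.iloc[-1]

# loc is always label-based, so 20 is looked up as a label.
by_label = ser_shifted.loc[20]
by_position = ser_shifted.iloc[-1]

# A label that does not exist raises KeyError under loc.
missing_label = False
try:
    ser.loc[-1]
except KeyError:
    missing_label = True

print(last_value, by_label, by_position, missing_label)
```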

<b> Arithmetic and Data Alignment

When you add objects together, if any index pairs are not the same, the index in the result is the union of the index pairs.

This is similar to an automatic outer join on the index labels.

The internal data alignment introduces missing values in the label locations that don’t overlap.

In [25]:
s1 = pd.Series([7.3, -2.5, 3.4, 1.5], index=['a', 'c', 'd', 'e'])
s2 = pd.Series([-2.1, 3.6, -1.5, 4, 3.1],index=['a', 'c', 'e', 'f', 'g'])
s1+s2

a    5.2
c    1.1
d    NaN
e    0.0
f    NaN
g    NaN
dtype: float64

In the case of DataFrame, alignment is performed on both the rows and the columns.

In [39]:
df1 = pd.DataFrame(np.arange(9.).reshape((3, 3)), columns=list('bcd'),index=['Ohio', 'Texas', 'Colorado'])

df2 = pd.DataFrame(np.arange(12.).reshape((4, 3)), columns=list('bde'),index=['Utah', 'Ohio', 'Texas', 'Oregon'])

y=df1+df2
y

            b    c     d    e
Colorado  NaN  NaN   NaN  NaN
Ohio      3.0  NaN   6.0  NaN
Oregon    NaN  NaN   NaN  NaN
Texas     9.0  NaN  12.0  NaN
Utah      NaN  NaN   NaN  NaN


In [40]:
df2.loc["Ohio", 'b'] = np.nan
df2

          b     d     e
Utah    0.0   1.0   2.0
Ohio    NaN   4.0   5.0
Texas   6.0   7.0   8.0
Oregon  9.0  10.0  11.0

Passing fill_value to an arithmetic method such as add substitutes that value wherever exactly one operand is missing; positions missing in both objects stay NaN.


In [41]:
x=df1.add(df2, fill_value=5)
x

             b     c     d     e
Colorado  11.0  12.0  13.0   NaN
Ohio       5.0   6.0   6.0  10.0
Oregon    14.0   NaN  15.0  16.0
Texas      9.0   9.0  12.0  13.0
Utah       5.0   NaN   6.0   7.0
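
The same option works on Series. As a rough sketch, the snippet below reuses the s1 and s2 values from earlier and compares the + operator with add(..., fill_value=0). The names plain and filled are illustrative.

```python
import pandas as pd

s1 = pd.Series([7.3, -2.5, 3.4, 1.5], index=['a', 'c', 'd', 'e'])
s2 = pd.Series([-2.1, 3.6, -1.5, 4, 3.1], index=['a', 'c', 'e', 'f', 'g'])

# The operator leaves NaN wherever a label exists on only one side.
plain = s1 + s2

# With fill_value=0 the missing side is treated as 0 before adding.
filled = s1.add(s2, fill_value=0)

print(plain)
print(filled)
```

Choosing 0 as the fill value keeps the one-sided entries unchanged; any other value would shift them.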


When reindexing a Series or DataFrame, you can also specify a different fill value.

In [42]:
df1.reindex(columns=df2.columns, fill_value=0)

            b    d  e
Ohio      0.0  2.0  0
Texas     3.0  5.0  0
Colorado  6.0  8.0  0


In [46]:
df1 = pd.DataFrame(np.arange(9.).reshape((3, 3)), columns=list('bcd'),index=['Ohio', 'Texas', 'Colorado'])

df2 = pd.DataFrame(np.arange(9).reshape((3, 3)), columns=list('bcd'),index=['Ohio', 'Texas', 'Colorado'])

y=df1+df2
y

             b     c     d
Ohio       0.0   2.0   4.0
Texas      6.0   8.0  10.0
Colorado  12.0  14.0  16.0


In [47]:
df1.radd(df2)

             b     c     d
Ohio       0.0   2.0   4.0
Texas      6.0   8.0  10.0
Colorado  12.0  14.0  16.0


In [48]:
df1.rdiv(df2)

            b    c    d
Ohio      NaN  1.0  1.0
Texas     1.0  1.0  1.0
Colorado  1.0  1.0  1.0


In [50]:
df2=df2*5.1
df1.rfloordiv(df2)

            b    c    d
Ohio      NaN  5.0  5.0
Texas     5.0  5.0  5.0
Colorado  5.0  5.0  5.0
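
The radd, rdiv and rfloordiv calls above are the reversed ("r") counterparts of add, div and floordiv: they swap the operands, so df1.rdiv(df2) computes df2 / df1. A small sketch with a scalar makes this easier to see; the frame values and variable names here are made up for illustration.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(1., 7.).reshape((2, 3)), columns=list('bcd'))

# 1 / frame and frame.rdiv(1) are the same computation.
recip_operator = 1 / frame
recip_method = frame.rdiv(1)

# Likewise 10 - frame and frame.rsub(10).
diff_operator = 10 - frame
diff_method = frame.rsub(10)

print(recip_method)
print(diff_method)
```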


<b>Operations between DataFrame and Series

In [54]:
arr = np.arange(12.).reshape((3, 4))
arr

array([[ 0.,  1.,  2.,  3.],
       [ 4.,  5.,  6.,  7.],
       [ 8.,  9., 10., 11.]])

In [57]:
arr-arr[1]

array([[-4., -4., -4., -4.],
       [ 0.,  0.,  0.,  0.],
       [ 4.,  4.,  4.,  4.]])

When we subtract arr[1] from arr, the subtraction is performed once for each row. This is referred to as <b>broadcasting</b>.

By default, arithmetic between a DataFrame and a Series matches the index of the Series on the DataFrame’s columns, broadcasting down the rows.

If an index value is not found in either the DataFrame’s columns or the Series’s index, the objects will be reindexed to form the union.

In [61]:
series = df1.iloc[0]
series

b    0.0
c    1.0
d    2.0
Name: Ohio, dtype: float64

In [62]:
df1

            b    c    d
Ohio      0.0  1.0  2.0
Texas     3.0  4.0  5.0
Colorado  6.0  7.0  8.0


In [60]:
df1 - series 

            b    c    d
Ohio      0.0  0.0  0.0
Texas     3.0  3.0  3.0
Colorado  6.0  6.0  6.0
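
The union behaviour described earlier is not exercised by the cells above, because series shares all its labels with df1's columns. A minimal sketch with a partially overlapping Series follows; partial and its labels 'b', 'e', 'f' are hypothetical.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(9.).reshape((3, 3)),
                     columns=list('bcd'),
                     index=['Ohio', 'Texas', 'Colorado'])

# Only 'b' overlaps with the frame's columns.
partial = pd.Series([10., 20., 30.], index=['b', 'e', 'f'])

# The result's columns are the union of both label sets;
# any column missing on one side is all NaN.
result = frame + partial

print(result)
```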


If you want to instead broadcast over the columns, matching on the rows, you have to use one of the arithmetic methods and pass axis=0 (or axis='index').

In [63]:
series3 = df1['d']

In [64]:
df1.sub(series3,axis=0)

            b    c    d
Ohio     -2.0 -1.0  0.0
Texas    -2.0 -1.0  0.0
Colorado -2.0 -1.0  0.0
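
A common use of row-wise matching is centering each row on its own mean. The sketch below is one way to do that under the same setup; row_means and centered are illustrative names, and axis='index' is used as the spelled-out equivalent of axis=0.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(9.).reshape((3, 3)),
                     columns=list('bcd'),
                     index=['Ohio', 'Texas', 'Colorado'])

# One mean per row, indexed by the row labels.
row_means = frame.mean(axis=1)

# Match row_means on the row index and broadcast across the columns.
centered = frame.sub(row_means, axis='index')

print(centered)
```

Writing frame - row_means instead would try to match the row labels against the column labels and produce only NaN.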
