<span style="color:deeppink">
        <h1> Index Objects </h1>
</span>


In [1]:
from pandas import Series, DataFrame

In [2]:
import pandas as pd

pandas’s Index objects are responsible for holding the axis labels and other metadata. Any array or other sequence of labels used when constructing a Series or DataFrame is internally converted to an Index: 

In [3]:
obj = Series(range(3), index=['a', 'b', 'c'])

In [4]:
index = obj.index

In [5]:
obj

a    0
b    1
c    2
dtype: int64

In [6]:
index

Index(['a', 'b', 'c'], dtype='object')

In [7]:
index[1:] 

Index(['b', 'c'], dtype='object')

In [8]:
index[1] = 'd'  # raises TypeError: Index objects are immutable

TypeError: Index does not support mutable operations

Immutability is important so that Index objects can be safely shared among data structures: 

In [9]:
import numpy as np

In [10]:
index = pd.Index(np.arange(3))

In [11]:
obj2 = Series([1.5, -2.5, 0], index=index)

In [12]:
obj2.index is index

True
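Since an Index cannot be modified in place, any change produces a new Index. The sketch below uses arbitrary labels and the standard Index methods insert, append and drop to show that each call returns a new object while the original stays untouched:

```python
import pandas as pd

index = pd.Index(['a', 'b', 'c'])

# Each call builds and returns a new Index; `index` itself is never changed.
inserted = index.insert(1, 'd')           # new label at position 1
appended = index.append(pd.Index(['e']))  # labels from another Index added at the end
dropped = index.drop('b')                 # label 'b' removed

print(index.tolist())     # ['a', 'b', 'c']
print(inserted.tolist())  # ['a', 'd', 'b', 'c']
print(appended.tolist())  # ['a', 'b', 'c', 'e']
print(dropped.tolist())   # ['a', 'c']
```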

In [13]:
data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}

In [14]:
frame = DataFrame(data)
frame

    state  year  pop
0    Ohio  2000  1.5
1    Ohio  2001  1.7
2    Ohio  2002  3.6
3  Nevada  2001  2.4
4  Nevada  2002  2.9


An Index also behaves like a fixed-size set, so the in operator checks whether a label is present:

In [15]:
'pop' in frame.columns

True

In [16]:
2003 in frame.index

False
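The set-like behavior goes beyond membership tests. A minimal sketch with made-up labels (the 'debt' label is purely illustrative): intersection, union and difference each return a new Index, and unlike a Python set an Index may hold duplicate labels. The results are compared as sets here because the ordering of set-operation results can depend on the sort argument:

```python
import pandas as pd

cols = pd.Index(['state', 'year', 'pop'])
other = pd.Index(['year', 'pop', 'debt'])

# Membership is tested against the labels, as with a Python set.
print('pop' in cols)  # True

# Set-style operations return new Index objects.
common = cols.intersection(other)   # labels in both
combined = cols.union(other)        # labels in either
only_cols = cols.difference(other)  # labels in cols but not in other

print(set(common))     # the labels 'year' and 'pop'
print(set(only_cols))  # the label 'state'

# Unlike a Python set, an Index may contain duplicate labels.
dup = pd.Index(['foo', 'foo', 'bar'])
print(dup.is_unique)  # False
```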

<span style="color:deeppink">
        <h1> Reindexing </h1>
</span>


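An important method on pandas objects is reindex, which creates a new object with the data conformed to a new index. Calling reindex on a Series rearranges the data according to the new index, introducing missing values (NaN) for any labels that were not already present; the fill_value argument supplies a different value for those labels. Consider a simple example: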

In [17]:
obj = Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])
obj

d    4.5
b    7.2
a   -5.3
c    3.6
dtype: float64

In [18]:
obj2 = obj.reindex(['a', 'b', 'c', 'd', 'e'])
obj2

a   -5.3
b    7.2
c    3.6
d    4.5
e    NaN
dtype: float64

In [19]:
obj.reindex(['a', 'b', 'c', 'd', 'e'], fill_value=0)

a   -5.3
b    7.2
c    3.6
d    4.5
e    0.0
dtype: float64
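A short sketch, reusing the Series from above, to confirm two points: reindex leaves the original object alone, and reindexing by the sorted labels gives the same result as sort_index. The extra label 'e' is only there to show the NaN-versus-fill_value behavior:

```python
import pandas as pd

obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])

# reindex builds a new Series; obj keeps its original label order.
reordered = obj.reindex(['a', 'b', 'c', 'd'])
print(obj.index.tolist())                  # ['d', 'b', 'a', 'c']
print(reordered.equals(obj.sort_index()))  # True

# A label that obj lacks ('e') becomes NaN unless fill_value is given.
with_nan = obj.reindex(['a', 'e'])
filled = obj.reindex(['a', 'e'], fill_value=0)
print(with_nan.isna().tolist())  # [False, True]
print(filled.tolist())           # [-5.3, 0.0]
```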

<span style="color:black">For ordered data such as time series, it may be desirable to interpolate or fill in values when reindexing. The <span style="color:violet">method</span> option allows this, using a method such as <span style="color:violet">ffill</span>, which forward-fills the values:</span>

In [20]:
obj3 = Series(['blue', 'purple', 'yellow'], index=[0, 3, 6])
obj3   

0      blue
3    purple
6    yellow
dtype: object

In [21]:
obj3.reindex(range(8), method='ffill')

0      blue
1      blue
2      blue
3    purple
4    purple
5    purple
6    yellow
7    yellow
dtype: object
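Forward filling has a backward counterpart. The sketch below, on the same obj3 Series, uses method='bfill', which takes each missing value from the next existing label, and shows the limit argument, which caps how many consecutive new labels get filled. Both options assume the index is monotonic, as it is here:

```python
import pandas as pd

obj3 = pd.Series(['blue', 'purple', 'yellow'], index=[0, 3, 6])

# 'bfill' copies values backward from the next existing label,
# so label 7 (past the last existing label) stays missing.
backward = obj3.reindex(range(8), method='bfill')

# With limit=1, only one new label after each existing one is forward-filled;
# labels 2 and 5 are left missing.
limited = obj3.reindex(range(8), method='ffill', limit=1)

print(backward.tolist())
print(limited.isna().tolist())
```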

<span style="color:black">With DataFrame, <span style="color:violet">reindex</span> can alter the (row) index, the columns, or both. When passed just a sequence, it reindexes the rows:</span>

In [22]:
frame = DataFrame(np.arange(9).reshape((3, 3)), index=['a', 'c', 'd'],
                  columns=['Ohio', 'Texas', 'California'])
frame

   Ohio  Texas  California
a     0      1           2
c     3      4           5
d     6      7           8


In [23]:
frame2 = frame.reindex(['a', 'b', 'c', 'd'])
frame2

   Ohio  Texas  California
a   0.0    1.0         2.0
b   NaN    NaN         NaN
c   3.0    4.0         5.0
d   6.0    7.0         8.0
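The integer values above turned into floats because the new row 'b' is all NaN. A sketch on the same frame showing that passing fill_value=0 avoids missing values, so the integer dtype is kept (checked with pandas' dtype helpers rather than a fixed name such as int64, since the default integer width can vary by platform):

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=['a', 'c', 'd'],
                     columns=['Ohio', 'Texas', 'California'])

# The new row 'b' is all NaN, so every integer column is upcast to float.
frame2 = frame.reindex(['a', 'b', 'c', 'd'])

# With fill_value=0 nothing is missing, so the integer columns stay integer.
filled = frame.reindex(['a', 'b', 'c', 'd'], fill_value=0)

print(frame2.dtypes.tolist())
print(filled.dtypes.tolist())
print(filled.loc['b'].tolist())  # [0, 0, 0]
```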


<span style="color:black">The columns can be reindexed using the <span style="color:violet">columns</span> keyword:</span>

In [25]:
states = ['Texas', 'Utah', 'California']

In [26]:
frame.reindex(columns=states)

   Texas  Utah  California
a      1   NaN           2
c      4   NaN           5
d      7   NaN           8
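The columns keyword is one of two equivalent spellings. A sketch, assuming pandas 0.21 or later where reindex also accepts a label list together with an axis argument, comparing the two forms on the frame from above:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=['a', 'c', 'd'],
                     columns=['Ohio', 'Texas', 'California'])
states = ['Texas', 'Utah', 'California']

# Keyword form, as above.
by_keyword = frame.reindex(columns=states)
# Same result: a label list plus an explicit axis.
by_axis = frame.reindex(states, axis='columns')

print(by_keyword.equals(by_axis))       # True
print(by_keyword['Utah'].isna().all())  # True: 'Utah' was not a column of frame
```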


Rows and columns can also be reindexed in a single call. Older code did this with label indexing through .ix, which has been removed from pandas, and .loc raises a KeyError for labels that do not exist, so reindex is the tool for this:

In [27]:
frame.reindex(index=['a', 'b', 'c', 'd'], columns=states)

   Texas  Utah  California
a    1.0   NaN         2.0
b    NaN   NaN         NaN
c    4.0   NaN         5.0
d    7.0   NaN         8.0
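A sketch of the difference between .loc and reindex, assuming pandas 1.0 or later (where .loc with missing labels raises KeyError instead of inserting NaN). .loc is fine when every requested label exists; reindex is the way to ask for labels that may be absent:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=['a', 'c', 'd'],
                     columns=['Ohio', 'Texas', 'California'])
states = ['Texas', 'Utah', 'California']

# .loc works when every requested row and column label exists.
subset = frame.loc[['a', 'c'], ['Texas', 'California']]

# Asking .loc for labels that do not exist ('b' and 'Utah') raises KeyError.
try:
    frame.loc[['a', 'b', 'c', 'd'], states]
    raised = False
except KeyError:
    raised = True

# reindex conforms both axes and fills the gaps with NaN.
both = frame.reindex(index=['a', 'b', 'c', 'd'], columns=states)

print(subset.shape, raised, both.shape)  # (2, 2) True (4, 3)
```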


<span style="color:deeppink">
        <h1> Dropping entries from an axis </h1>
</span>

Dropping one or more entries from an axis is easy if you already have an index array or list without those entries, but building one can require a bit of munging and set logic. The drop method instead returns a new object with the indicated value or values deleted from an axis:

In [28]:
obj = Series(np.arange(5.), index=['a', 'b', 'c', 'd', 'e'])

In [29]:
new_obj = obj.drop('a')
new_obj    

b    1.0
c    2.0
d    3.0
e    4.0
dtype: float64
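drop also accepts a list of labels. A sketch on the same Series showing that, and the errors='ignore' option, which skips labels that are not present instead of raising KeyError (the label 'z' is a deliberately missing, illustrative label):

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(5.), index=['a', 'b', 'c', 'd', 'e'])

# A list drops several labels at once; obj itself is unchanged.
fewer = obj.drop(['d', 'c'])
print(fewer.index.tolist())  # ['a', 'b', 'e']

# A missing label ('z') normally raises KeyError; errors='ignore' skips it.
lenient = obj.drop(['a', 'z'], errors='ignore')
print(lenient.index.tolist())  # ['b', 'c', 'd', 'e']

print(len(obj))  # 5
```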

With DataFrame, index values can be deleted from either axis: 

In [30]:
data = DataFrame(np.arange(16).reshape((4, 4)),
                 index=['Ohio', 'Colorado', 'Utah', 'New York'],
                 columns=['one', 'two', 'three', 'four'])


In [31]:
data.drop(['Colorado', 'Ohio'])

          one  two  three  four
Utah        8    9     10    11
New York   12   13     14    15


In [32]:
data.drop('two', axis=1)

          one  three  four
Ohio        0      2     3
Colorado    4      6     7
Utah        8     10    11
New York   12     14    15


In [33]:
data.drop(['two', 'four'], axis=1)

          one  three
Ohio        0      2
Colorado    4      6
Utah        8     10
New York   12     14
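Instead of axis=0 or axis=1, drop can also take index= and columns= keywords that name the axis directly (available since pandas 0.21). A sketch on the same data frame comparing the forms and confirming that drop returns new objects:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# index= and columns= name the axis directly instead of axis=0/1.
no_cols = data.drop(columns=['two', 'four'])
no_rows = data.drop(index=['Colorado', 'Ohio'])

print(no_cols.equals(data.drop(['two', 'four'], axis=1)))  # True
print(no_rows.index.tolist())  # ['Utah', 'New York']
print(data.shape)              # (4, 4): data itself is unchanged
```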


<span style="color:deeppink">
        <h1> Indexing, selection, and filtering </h1>
</span>

Series indexing (obj[...]) works analogously to NumPy array indexing, except that you can use the Series’s index values instead of only integers. Here are some examples of this:

In [34]:
obj = Series(np.arange(4.), index=['a', 'b', 'c', 'd'])

In [35]:
obj['b']

1.0

In [36]:
obj.iloc[1]  # by position; plain obj[1] on a label index is deprecated in recent pandas versions

1.0

In [37]:
obj[2:4]

c    2.0
d    3.0
dtype: float64

In [38]:
obj[['b', 'a', 'd']]

b    1.0
a    0.0
d    3.0
dtype: float64

In [39]:
obj.iloc[[1, 3]]  # positions 1 and 3

b    1.0
d    3.0
dtype: float64

In [40]:
obj[obj < 2]

a    0.0
b    1.0
dtype: float64
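Plain [] decides from the key whether it means labels or positions, which can be ambiguous. A sketch of the explicit alternatives: .loc always works with labels and .iloc always with positions. The second Series, with a hypothetical integer index, shows where the distinction matters most, since there [] is label-based:

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(4.), index=['a', 'b', 'c', 'd'])

# .loc interprets keys as labels, .iloc as positions.
print(obj.loc['b'], obj.iloc[1])          # 1.0 1.0
print(obj.loc[['b', 'a', 'd']].tolist())  # [1.0, 0.0, 3.0]
print(obj.iloc[[1, 3]].index.tolist())    # ['b', 'd']
print(obj.loc[obj < 2].index.tolist())    # ['a', 'b']

# With an integer index, plain [] looks up labels, not positions.
ints = pd.Series([10, 20, 30], index=[2, 0, 1])
print(ints[0], ints.iloc[0])              # 20 10
```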

Slicing with labels behaves differently than normal Python slicing in that the endpoint is inclusive: 

In [41]:
obj['b':'c']

b    1.0
c    2.0
dtype: float64

Setting values through a label slice modifies the corresponding section of the Series:

In [42]:
obj['b':'c'] = 5
obj    

a    0.0
b    5.0
c    5.0
d    3.0
dtype: float64
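A side-by-side sketch of the endpoint rule on a fresh copy of the same Series: the label slice 'b':'c' includes its stop label, while the positional slice 1:2 excludes its stop position, as in ordinary Python slicing:

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(4.), index=['a', 'b', 'c', 'd'])

# A label slice includes its stop label; a positional slice excludes it.
by_label = obj.loc['b':'c'].index.tolist()  # ['b', 'c']
by_position = obj.iloc[1:2].index.tolist()  # ['b']

# Assigning through a label slice writes into obj itself.
obj.loc['b':'c'] = 5
print(obj.tolist())  # [0.0, 5.0, 5.0, 3.0]
```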

Indexing into a DataFrame with [] retrieves one or more columns, using either a single label or a sequence of labels:

In [43]:
data = DataFrame(np.arange(16).reshape((4, 4)),
                 index=['Ohio', 'Colorado', 'Utah', 'New York'],
                 columns=['one', 'two', 'three', 'four'])
data

          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7
Utah        8    9     10    11
New York   12   13     14    15


In [44]:
data['two']

Ohio         1
Colorado     5
Utah         9
New York    13
Name: two, dtype: int64


In [45]:
data[['three', 'one']]

          three  one
Ohio          2    0
Colorado      6    4
Utah         10    8
New York     14   12


Slicing with [] is a special case: it selects rows rather than columns.

In [46]:
data[:2]

          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7
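Selecting with a boolean array is the other way [] picks rows. A sketch on the same data frame: a boolean Series built from one column keeps the rows where it is True, and data[:2] matches the explicit positional form data.iloc[:2]:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# A boolean Series aligned with the rows selects the rows where it is True.
big = data[data['three'] > 5]
print(big.index.tolist())  # ['Colorado', 'Utah', 'New York']

# data[:2] slices rows by position; .iloc states that explicitly.
print(data[:2].equals(data.iloc[:2]))  # True
```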


A scalar comparison produces a boolean DataFrame, which can then be used to select or set values:

In [47]:
data < 5

            one    two  three   four
Ohio       True   True   True   True
Colorado   True  False  False  False
Utah      False  False  False  False
New York  False  False  False  False


In [48]:
data[data < 5] = 0
data

          one  two  three  four
Ohio        0    0      0     0
Colorado    0    5      6     7
Utah        8    9     10    11
New York   12   13     14    15
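The assignment above modifies data in place. When the original should be kept, DataFrame.mask and DataFrame.where produce the same values as a new object. A sketch on a fresh copy of the frame: mask replaces values where the condition is True, and where keeps values where it is True and replaces the rest:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# mask replaces values where the condition holds; where keeps them.
masked = data.mask(data < 5, 0)
kept = data.where(data >= 5, 0)

# The in-place boolean assignment, applied to a copy for comparison.
assigned = data.copy()
assigned[assigned < 5] = 0

print((masked == assigned).all().all())  # True
print((kept == assigned).all().all())    # True
print(data.loc['Ohio'].tolist())         # [0, 1, 2, 3]: data is unchanged
```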
