<a href="https://colab.research.google.com/github/shreejitp/Python/blob/master/Pandas_part1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [0]:
import pandas as pd 


In [0]:
from pandas import Series,DataFrame

# Series

In [0]:
# A Series is a one-dimensional array-like object containing a sequence of values
# and an associated array of data labels, called its index
obj=pd.Series([4,7,-5,3])
obj

0    4
1    7
2   -5
3    3
dtype: int64

In [0]:
# A default index from 0 to n-1 was created above, where n is the length of the Series
obj.values

array([ 4,  7, -5,  3])

In [0]:
obj.index #Just like range(4)

RangeIndex(start=0, stop=4, step=1)

In [0]:
#It is desirable to create a Series with an index identifying each data point with a label
obj2 = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])

In [0]:
obj2

d    4
b    7
a   -5
c    3
dtype: int64

In [0]:
#Using Index values to access values in a series  
obj2['a']
obj2[['a','b']] # Accessing multiple values 

a   -5
b    7
dtype: int64

Using NumPy functions or NumPy-like operations, such as filtering with a boolean array, scalar multiplication, or applying math functions, will preserve the index-value link

In [0]:
obj2[obj2 > 0]

d    4
b    7
c    3
dtype: int64

In [0]:
obj2*2

d     8
b    14
a   -10
c     6
dtype: int64

In [0]:
import numpy as np
np.exp(obj2)

d      54.598150
b    1096.633158
a       0.006738
c      20.085537
dtype: float64

Another way to think about a Series is as a fixed-length, ordered dict, as it is a mapping of index values to data values. It can be used in many contexts where you might use a dict.

In [0]:
#Creating a series from a Dictionary
sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj3=pd.Series(sdata)
obj3

Ohio      35000
Texas     71000
Oregon    16000
Utah       5000
dtype: int64
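
As a small sketch of the dict-like behaviour described above, the snippet below checks label membership with `in` and converts a Series back to a plain dict. The names `sdata_demo` and `s_demo` are made up for this illustration and are not part of the original notebook.

```python
import pandas as pd

# Illustrative data mirroring sdata above (hypothetical names)
sdata_demo = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
s_demo = pd.Series(sdata_demo)

# Membership tests look at the index labels, much like dict keys
has_texas = 'Texas' in s_demo
has_california = 'California' in s_demo

# A Series can be converted back into a plain dict
round_trip = s_demo.to_dict()

print(has_texas, has_california)
print(round_trip == sdata_demo)
```

Note that `in` tests the labels, not the values, which is the same convention a dict follows for its keys.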

In [0]:
obj3.index

Index(['Ohio', 'Texas', 'Oregon', 'Utah'], dtype='object')

In [0]:
# The results in the Series appear in the order they were defined in Dict 
# How do you make them appear the way you want ?
# Two ways are shown below 
states=['Ohio','Oregon','Texas']
#obj4=pd.Series(sdata,index=states)
obj4=pd.Series(sdata,index=['Ohio','Oregon','Texas','California'])
obj4

Ohio          35000.0
Oregon        16000.0
Texas         71000.0
California        NaN
dtype: float64

In [0]:
#Checking for missing (NaN) values in a Series
pd.isnull(obj4)
#pd.notnull(obj4)

Ohio          False
Oregon        False
Texas         False
California     True
dtype: bool

In [0]:
# isnull is also available as a Series method; the parentheses are needed to call it
obj4.isnull()

Ohio          False
Oregon        False
Texas         False
California     True
dtype: bool

A useful Series feature for many applications is that it automatically aligns by index
label in arithmetic operations

In [0]:
#If you have experience with databases, you can think about this as being similar to a join operation
obj3 + obj4

California         NaN
Ohio           70000.0
Oregon         32000.0
Texas         142000.0
Utah               NaN
dtype: float64

In [0]:
#Both the Series object itself and its index have a name attribute, which integrates with other key areas of pandas functionality
obj4.name='population'
obj4.index.name='state'
obj4

state
Ohio          35000.0
Oregon        16000.0
Texas         71000.0
California        NaN
Name: population, dtype: float64

Altering the Index of Series 

In [0]:
obj.index = ['Bob', 'Steve', 'Jeff', 'Ryan']
obj

Bob      4
Steve    7
Jeff    -5
Ryan     3
dtype: int64

# DataFrame

While a DataFrame is physically two-dimensional, you can use it to represent higher-dimensional data in a tabular format using hierarchical indexing, a subject we will discuss in Chapter 8 and an ingredient in some of the more advanced data-handling features in pandas.

In [0]:
#Creating a dataframe 
# The most common method is from a dict of equal-length lists or NumPy arrays
data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
            'year': [2000, 2001, 2002, 2001, 2002, 2003],
            'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]}

# Note that it is constructed much like a Series built from a dict
# Here each key maps to a list of values (one column) instead of a single value


In [0]:
frame=pd.DataFrame(data)

In [0]:
# Look at the index for the rows. Columns all share a common index 
frame

Unnamed: 0,state,year,pop
0,Ohio,2000,1.5
1,Ohio,2001,1.7
2,Ohio,2002,3.6
3,Nevada,2001,2.4
4,Nevada,2002,2.9
5,Nevada,2003,3.2


In [0]:
#Head selects the first 5 rows 
frame.head()

Unnamed: 0,state,year,pop
0,Ohio,2000,1.5
1,Ohio,2001,1.7
2,Ohio,2002,3.6
3,Nevada,2001,2.4
4,Nevada,2002,2.9


In [0]:
#Rearranging the columns 
pd.DataFrame(data, columns=['year', 'state', 'pop'])

Unnamed: 0,year,state,pop
0,2000,Ohio,1.5
1,2001,Ohio,1.7
2,2002,Ohio,3.6
3,2001,Nevada,2.4
4,2002,Nevada,2.9
5,2003,Nevada,3.2


In [0]:
#If you pass a column that isn’t contained in the dict, 
#it will appear with missing values in the result
frame2 = pd.DataFrame(data, columns=['year', 'state', 'pop', 'debt'],
                              index=['one', 'two', 'three', 'four',
                                    'five', 'six'])

In [0]:
frame2

Unnamed: 0,year,state,pop,debt
one,2000,Ohio,1.5,
two,2001,Ohio,1.7,
three,2002,Ohio,3.6,
four,2001,Nevada,2.4,
five,2002,Nevada,2.9,
six,2003,Nevada,3.2,


Retrieving a specific column

In [0]:
#A column in a DataFrame can be retrieved as a Series either by dict-like notation or by attribute
frame2['year']

one      2000
two      2001
three    2002
four     2001
five     2002
six      2003
Name: year, dtype: int64

In [0]:
# Works only when column name is a valid Python Variable Name 
frame2.year

one      2000
two      2001
three    2002
four     2001
five     2002
six      2003
Name: year, dtype: int64

Retrieving a row using loc 

In [0]:
# See how the name attribute is automatically set 
frame2.loc['three']

year     2002
state    Ohio
pop       3.6
debt      NaN
Name: three, dtype: object

Columns can be modified by assignment. For example, introducing a debt column

In [0]:
frame2['debt']=16.5

In [0]:
frame2

Unnamed: 0,year,state,pop,debt
one,2000,Ohio,1.5,16.5
two,2001,Ohio,1.7,16.5
three,2002,Ohio,3.6,16.5
four,2001,Nevada,2.4,16.5
five,2002,Nevada,2.9,16.5
six,2003,Nevada,3.2,16.5


In [0]:
#Using ordered values in debt column 
frame2['debt']=np.arange(6.)
frame2

Unnamed: 0,year,state,pop,debt
one,2000,Ohio,1.5,0.0
two,2001,Ohio,1.7,1.0
three,2002,Ohio,3.6,2.0
four,2001,Nevada,2.4,3.0
five,2002,Nevada,2.9,4.0
six,2003,Nevada,3.2,5.0


In [0]:
#When you are assigning lists or arrays to a column, the values length must match the length of the dataframe 
#If you assign a Series, its labels will be realigned exactly to the DataFrame’s index, inserting missing values in any holes
val = pd.Series([-1.2, -1.5, -1.7], index=['two', 'four', 'five'])


In [0]:
frame2['debt'] = val
frame2 # Null values for index which didn't have any values assigned 

Unnamed: 0,year,state,pop,debt
one,2000,Ohio,1.5,
two,2001,Ohio,1.7,-1.2
three,2002,Ohio,3.6,
four,2001,Nevada,2.4,-1.5
five,2002,Nevada,2.9,-1.7
six,2003,Nevada,3.2,


In [0]:
#Assigning a column that doesn’t exist will create a new column. Creating a new state eastern 
#The del keyword will delete columns as with a dict
frame2['eastern'] = frame2.state == 'Ohio'
frame2

# NOTE: new columns cannot be created with the frame2.eastern attribute syntax


Unnamed: 0,year,state,pop,debt,eastern
one,2000,Ohio,1.5,,True
two,2001,Ohio,1.7,-1.2,True
three,2002,Ohio,3.6,,True
four,2001,Nevada,2.4,-1.5,False
five,2002,Nevada,2.9,-1.7,False
six,2003,Nevada,3.2,,False


In [0]:
# The del statement can then be used to remove this column
del frame2['eastern'] 

In [0]:
frame2.columns

Index(['year', 'state', 'pop', 'debt'], dtype='object')
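
The note above about attribute syntax can be checked with a short sketch. The frame `demo` and its contents are invented for this illustration; the point is only that attribute assignment sets a plain Python attribute while bracket assignment creates a column.

```python
import warnings
import pandas as pd

# Small illustrative frame (hypothetical data)
demo = pd.DataFrame({'state': ['Ohio', 'Nevada'], 'pop': [1.5, 2.4]})

# Attribute assignment only sets an attribute on the object; pandas may
# emit a UserWarning here, which is silenced to keep the output clean
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    demo.eastern = demo['state'] == 'Ohio'

created_by_attribute = 'eastern' in demo.columns

# Bracket assignment creates the column
demo['eastern'] = demo['state'] == 'Ohio'
created_by_brackets = 'eastern' in demo.columns

print(created_by_attribute, created_by_brackets)
```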

#Another common form of data is a nested dict of dicts

In [0]:
pop = {'Nevada': {2001: 2.4, 2002: 2.9},
       'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}
# This way you can define the row indexes        

In [0]:
frame3 = pd.DataFrame(pop)
frame3

Unnamed: 0,Nevada,Ohio
2001,2.4,1.7
2002,2.9,3.6
2000,,1.5


Transposing a dataframe 

In [0]:
#You can transpose the DataFrame (swap rows and columns) with similar syntax to a NumPy array
# The keys in the inner dicts are combined to form the row index of frame3.
# In the output shown above they appear in order of first appearance (2001, 2002, 2000); older pandas versions may sort them instead.
# Neither applies if an explicit index is specified
frame3.T

Unnamed: 0,2001,2002,2000
Nevada,2.4,2.9,
Ohio,1.7,3.6,1.5


In [0]:
pd.DataFrame(pop,index=[2001, 2002, 2003])

Unnamed: 0,Nevada,Ohio
2001,2.4,1.7
2002,2.9,3.6
2003,,


Dicts of Series 

For a complete list of things you can pass the DataFrame constructor, see Table 5-1 in the book 

In [0]:
pdata = {'Ohio': frame3['Ohio'][:-1],
         'Nevada': frame3['Nevada'][:2]}
        

In [0]:
pd.DataFrame(pdata)

Unnamed: 0,Ohio,Nevada
2001,1.7,2.4
2002,3.6,2.9


In [0]:
frame3.index.name = 'year'
frame3.columns.name = 'state'
frame3

state,Nevada,Ohio
year,Unnamed: 1_level_1,Unnamed: 2_level_1
2001,2.4,1.7
2002,2.9,3.6
2000,,1.5


In [0]:
# As with Series, the values attribute returns the data contained in the DataFrame as a two-dimensional ndarray
frame3.values

array([[2.4, 1.7],
       [2.9, 3.6],
       [nan, 1.5]])

In [0]:
frame2.values

array([[2000, 'Ohio', 1.5, nan],
       [2001, 'Ohio', 1.7, -1.2],
       [2002, 'Ohio', 3.6, nan],
       [2001, 'Nevada', 2.4, -1.5],
       [2002, 'Nevada', 2.9, -1.7],
       [2003, 'Nevada', 3.2, nan]], dtype=object)

# Index Objects

In [0]:
obj = pd.Series(range(3), index=['a', 'b', 'c'])

In [0]:
obj

a    0
b    1
c    2
dtype: int64

In [0]:
index=obj.index
index    # The index is itself an object (a pandas Index)

Index(['a', 'b', 'c'], dtype='object')

In [0]:
#Accessing Index Object 
index[1:]

Index(['b', 'c'], dtype='object')

In [0]:
index[2:]

Index(['c'], dtype='object')

In [0]:
# Index objects are immutable and thus can’t be modified by the user
index[1]='d' #TypeError 

In [0]:
'Ohio' in frame3.columns
# 2003 in frame3.index

True

Unlike Python sets, a pandas Index can contain duplicate **labels**

In [0]:
dup_labels = pd.Index(['foo', 'foo', 'bar', 'bar'])
dup_labels

Index(['foo', 'foo', 'bar', 'bar'], dtype='object')

**Each Index has a number of methods and properties for set logic, which answer other common questions about the data it contains. Some useful ones are summarized in Table 5-2**, such as `append`, `difference`, `intersection`, and `union`.
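
A minimal sketch of the set-logic methods named above follows. The two indexes `left` and `right` are made up for the example.

```python
import pandas as pd

# Two small illustrative indexes (hypothetical labels)
left = pd.Index(['a', 'b', 'c'])
right = pd.Index(['b', 'c', 'd'])

common = left.intersection(right)   # labels present in both
combined = left.union(right)        # labels present in either
only_left = left.difference(right)  # labels in left but not in right
appended = left.append(right)       # plain concatenation, duplicates kept

print(list(common))
print(list(combined))
print(list(only_left))
print(list(appended))
```

Unlike `union`, `append` does not deduplicate, which is consistent with an Index being allowed to hold duplicate labels.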


# **5.2 ESSENTIAL FUNCTIONALITY**

Reindexing

In [0]:
obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])

In [0]:
obj

d    4.5
b    7.2
a   -5.3
c    3.6
dtype: float64

In [0]:
obj2 = obj.reindex(['a', 'b', 'c', 'd', 'e'])

In [0]:
obj2

a   -5.3
b    7.2
c    3.6
d    4.5
e    NaN
dtype: float64

For ordered data like time series, it may be desirable to do some interpolation or filling of values when reindexing. The method option allows us to do this, using a method such as ffill, which forward-fills the values.

In [0]:
obj3 = pd.Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])


In [0]:
obj3.reindex(range(6), method='ffill')

0      blue
1      blue
2    purple
3    purple
4    yellow
5    yellow
dtype: object
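
For comparison, here is a hedged sketch contrasting forward fill with backward fill (`method='bfill'`) on the same kind of data. The variable names are illustrative only.

```python
import pandas as pd

# Same shape of data as obj3 above (hypothetical name)
colors = pd.Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])

# ffill carries the previous valid value forward into the gaps
forward = colors.reindex(range(6), method='ffill')

# bfill pulls the next valid value backward; label 5 has no later
# value to borrow from, so it stays missing
backward = colors.reindex(range(6), method='bfill')

print(forward)
print(backward)
```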

In [0]:
# Another popular way of creating a dataframe 
import numpy as np
frame = pd.DataFrame(np.arange(9).reshape((3, 3)),index=['a', 'c', 'd'],columns=['Ohio', 'Texas', 'California'])

In [0]:
frame

Unnamed: 0,Ohio,Texas,California
a,0,1,2
c,3,4,5
d,6,7,8


In [0]:
#Reindexing using index 
frame.reindex(['a', 'b', 'c', 'd'])

Unnamed: 0,Ohio,Texas,California
a,0.0,1.0,2.0
b,,,
c,3.0,4.0,5.0
d,6.0,7.0,8.0


In [0]:
states = ['Texas', 'Utah', 'California']

In [0]:
#Reindexing using columns # Note this is not the same as transpose 
frame.reindex(columns=states)

Unnamed: 0,Texas,Utah,California
a,1,,2
c,4,,5
d,7,,8


Dropping entries from an axis 


In [0]:
#Dropping is easy if you have an array or list of indexes 
obj=pd.Series(np.arange(5.),index=['a','b','c','d','e'])

In [0]:
obj

a    0.0
b    1.0
c    2.0
d    3.0
e    4.0
dtype: float64

In [0]:
new_obj=obj.drop('c')
new_obj

a    0.0
b    1.0
d    3.0
e    4.0
dtype: float64

In [0]:
#Dropping labels c and d # A Series has only one axis, so only index entries can be removed
obj.drop(['c','d'])

a    0.0
b    1.0
e    4.0
dtype: float64

In [0]:
#With a dataframe, Index values can be removed from either axis 
data = pd.DataFrame(np.arange(16).reshape((4, 4)),  # reshape turns the 1-D range into a 4x4 array; without it the shape would not match the 4 index and 4 column labels
index=['Ohio', 'Colorado', 'Utah', 'New York'],
columns=['one', 'two', 'three', 'four'])

In [0]:
data

Unnamed: 0,one,two,three,four
Ohio,0,1,2,3
Colorado,4,5,6,7
Utah,8,9,10,11
New York,12,13,14,15


In [0]:
data.drop(['Colorado', 'Ohio'])

Unnamed: 0,one,two,three,four
Utah,8,9,10,11
New York,12,13,14,15


In [0]:
#You can drop values from the columns by passing axis=1 or axis='columns':
# Note that drop returns a new object, so data still contains Ohio and Colorado after the previous cell
data.drop('two',axis=1)#data.drop('two',axis='columns')

Unnamed: 0,one,three,four
Ohio,0,2,3
Colorado,4,6,7
Utah,8,10,11
New York,12,14,15


**Using the inplace argument to make changes directly in the object**

In [0]:
obj.drop('c', inplace=True)

In [0]:
## See how c has been dropped from the object 
obj

a    0.0
b    1.0
d    3.0
e    4.0
dtype: float64
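
The difference between the default behaviour and `inplace=True` can be sketched as follows. The Series `s` is a fresh illustrative object, not the `obj` used above.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(5.), index=['a', 'b', 'c', 'd', 'e'])

# Default: drop returns a new Series and leaves s untouched
trimmed = s.drop('c')
still_has_c = 'c' in s

# inplace=True modifies s directly and returns None
result = s.drop('d', inplace=True)

print(still_has_c, result, list(s.index))
```

Because the in-place form returns `None`, assigning its result back to a variable would discard the data.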

Indexing, Selection and Filtering

In [0]:
obj = pd.Series(np.arange(4.), index=['a', 'b', 'c', 'd'])

In [0]:
obj['a']

0.0

In [0]:
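obj[0] # integer used as a position because the index holds strings; recent pandas versions prefer obj.iloc[0]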

0.0

In [0]:
obj[2:4]

c    2.0
d    3.0
dtype: float64

In [0]:
obj[['b', 'a', 'd']]

b    1.0
a    0.0
d    3.0
dtype: float64

In [0]:
obj[obj < 2]

a    0.0
b    1.0
dtype: float64

In [0]:
#Slicing with labels behaves differently than normal Python slicing in that the end‐point is inclusive
obj['b':'c']

b    1.0
c    2.0
dtype: float64

In [0]:
obj['b':'c'] = 5
obj

a    0.0
b    5.0
c    5.0
d    3.0
dtype: float64
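
To make the inclusive-endpoint rule concrete, the sketch below slices the same Series once by label and once by position. It uses `loc` and `iloc` explicitly so the intent is unambiguous; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(4.), index=['a', 'b', 'c', 'd'])

# Label slice: both endpoints 'b' and 'c' are included
by_label = s.loc['b':'c']

# Positional slice: the stop position 2 is excluded, as in plain Python
by_position = s.iloc[1:2]

print(by_label.tolist())
print(by_position.tolist())
```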

Indexing Dataframe

In [0]:
data = pd.DataFrame(np.arange(16).reshape((4, 4)),
index=['Ohio', 'Colorado', 'Utah', 'New York'],
columns=['one', 'two', 'three', 'four'])

In [0]:
#data['two']
data[['two','three']]

Unnamed: 0,two,three
Ohio,1,2
Colorado,5,6
Utah,9,10
New York,13,14


In [0]:
data[:2] # First 2 rows

Unnamed: 0,one,two,three,four
Ohio,0,1,2,3
Colorado,4,5,6,7


In [0]:
#Subsetting based on some condition 
data[data['three'] > 5]

Unnamed: 0,one,two,three,four
Colorado,4,5,6,7
Utah,8,9,10,11
New York,12,13,14,15


In [0]:
data < 5 

Unnamed: 0,one,two,three,four
Ohio,True,True,True,True
Colorado,True,False,False,False
Utah,False,False,False,False
New York,False,False,False,False


Selection with loc and iloc 

In [0]:
data

Unnamed: 0,one,two,three,four
Ohio,0,1,2,3
Colorado,4,5,6,7
Utah,8,9,10,11
New York,12,13,14,15


In [0]:
#Using loc
# Row label first, then column labels (loc selects by label)
data.loc['Colorado', ['two', 'three']]

two      5
three    6
Name: Colorado, dtype: int64

In [0]:
#Using for loop and printing actual values 
for x in data.loc['Colorado', ['two', 'three']]:
  print(x)

5
6


In [0]:
#Using iloc
# Works like loc, but iloc selects by integer position instead of by label
# Note the name attribute in the output
data.iloc[2, [3, 0, 1]]

four    11
one      8
two      9
Name: Utah, dtype: int64

In [0]:
#Getting first two rows and first two columns 
data.iloc[[0,1],[0,1]]

Unnamed: 0,one,two
Ohio,0,1
Colorado,4,5


In [0]:
data.loc[:'Utah', 'two']

Ohio        1
Colorado    5
Utah        9
Name: two, dtype: int64

In [0]:
data.iloc[:, :3][data.three > 5]

Unnamed: 0,one,two,three
Colorado,4,5,6
Utah,8,9,10
New York,12,13,14


Integer Indexes

In [0]:
ser = pd.Series(np.arange(3.))
ser

0    0.0
1    1.0
2    2.0
dtype: float64

In [0]:
ser[-1] # KeyError: with an integer index, -1 is looked up as a label, not a position

In [0]:
ser2 = pd.Series(np.arange(3.), index=['a', 'b', 'c'])
ser2[-1]

2.0

In [0]:
ser[:1]

0    0.0
dtype: float64

In [0]:
ser.loc[:1]

0    0.0
1    1.0
dtype: float64
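
One way to sidestep the ambiguity shown above is to always state the intent with `loc` (labels) or `iloc` (positions). The sketch below assumes the same integer-indexed Series; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

ser = pd.Series(np.arange(3.))

# Position-based: the last element, regardless of the labels
last_by_position = ser.iloc[-1]

# Label-based slice: labels 0 and 1, endpoint included
by_label = ser.loc[:1]

# Position-based slice: only position 0, endpoint excluded
by_pos = ser.iloc[:1]

print(last_by_position, len(by_label), len(by_pos))
```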

**Arithmetic and Data Alignment**


In [0]:
s1 = pd.Series([7.3, -2.5, 3.4, 1.5], index=['a', 'c', 'd', 'e'])

In [0]:
s2 = pd.Series([-2.1, 3.6, -1.5, 4, 3.1],index=['a', 'c', 'e', 'f', 'g'])

In [0]:
# Similar to an outer join on the index: values with matching labels are added, labels found in only one Series give NaN
s1 + s2

a    5.2
c    1.1
d    NaN
e    0.0
f    NaN
g    NaN
dtype: float64

In [0]:
# In the case of DataFrame, alignment is performed on both the rows and the columns
df1 = pd.DataFrame(np.arange(9.).reshape((3, 3)), columns=list('bcd'),
                   index=['Ohio', 'Texas', 'Colorado'])
df2 = pd.DataFrame(np.arange(12.).reshape((4, 3)), columns=list('bde'),
                   index=['Utah', 'Ohio', 'Texas', 'Oregon'])


In [0]:
df1

Unnamed: 0,b,c,d
Ohio,0.0,1.0,2.0
Texas,3.0,4.0,5.0
Colorado,6.0,7.0,8.0


In [0]:
df2

Unnamed: 0,b,d,e
Utah,0.0,1.0,2.0
Ohio,3.0,4.0,5.0
Texas,6.0,7.0,8.0
Oregon,9.0,10.0,11.0


In [0]:
# Adding these together returns a DataFrame whose index and columns are the unions of the ones in each DataFrame
df1 + df2

Unnamed: 0,b,c,d,e
Colorado,,,,
Ohio,3.0,,6.0,
Oregon,,,,
Texas,9.0,,12.0,
Utah,,,,


Arithmetic Methods with fill values

In arithmetic operations between differently indexed objects, you might want to fill with a special value, like 0, when an axis label is found in one object but not the other

In [0]:
df1 = pd.DataFrame(np.arange(12.).reshape((3, 4)),
                   columns=list('abcd'))
df2 = pd.DataFrame(np.arange(20.).reshape((4, 5)),
                   columns=list('abcde'))
df2.loc[1, 'b'] = np.nan
df1

Unnamed: 0,a,b,c,d
0,0.0,1.0,2.0,3.0
1,4.0,5.0,6.0,7.0
2,8.0,9.0,10.0,11.0


In [0]:
df2

Unnamed: 0,a,b,c,d,e
0,0.0,1.0,2.0,3.0,4.0
1,5.0,,7.0,8.0,9.0
2,10.0,11.0,12.0,13.0,14.0
3,15.0,16.0,17.0,18.0,19.0


In [0]:
df1 + df2
#What if you want to treat the NaN in df2 as 0

Unnamed: 0,a,b,c,d,e
0,0.0,2.0,4.0,6.0,
1,9.0,,13.0,15.0,
2,18.0,20.0,22.0,24.0,
3,,,,,


In [0]:
df1.add(df2,fill_value=0)
#Table 5.5 lists other flexible arithmetic methods
### sub, div, floordiv, mul, pow

Unnamed: 0,a,b,c,d,e
0,0.0,2.0,4.0,6.0,4.0
1,9.0,5.0,13.0,15.0,9.0
2,18.0,20.0,22.0,24.0,14.0
3,15.0,16.0,17.0,18.0,19.0
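
Each of the flexible arithmetic methods also has a reversed counterpart prefixed with `r` (for example `rdiv` for `div`), which swaps the operands. The following sketch uses a small made-up frame to show that `1 / df` and `df.rdiv(1)` agree.

```python
import numpy as np
import pandas as pd

# Illustrative frame with no zeros, so the division is well defined
df = pd.DataFrame(np.arange(1., 7.).reshape((2, 3)), columns=list('abc'))

operator_form = 1 / df      # scalar on the left-hand side
method_form = df.rdiv(1)    # reversed division: computes 1 / df

print(operator_form.equals(method_form))
print(method_form)
```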


Operations between a dataframe and Series 

In [0]:
arr = np.arange(12.).reshape((3, 4))

In [0]:
arr

array([[ 0.,  1.,  2.,  3.],
       [ 4.,  5.,  6.,  7.],
       [ 8.,  9., 10., 11.]])

In [0]:
arr[0]

array([0., 1., 2., 3.])

In [0]:
#When we subtract arr[0] from arr, the subtraction is performed once for each row. This is referred to as broadcasting and is explained in more detail
#as it relates to general NumPy arrays in Appendix A
arr - arr[0]

array([[0., 0., 0., 0.],
       [4., 4., 4., 4.],
       [8., 8., 8., 8.]])

In [0]:
frame = pd.DataFrame(np.arange(12.).reshape((4, 3)),
                     columns=list('bde'),
                     index=['Utah', 'Ohio', 'Texas', 'Oregon'])

In [0]:
#Selecting the first row of the DataFrame as a Series
series = frame.iloc[0]

In [0]:
frame - series

Unnamed: 0,b,d,e
Utah,0.0,0.0,0.0
Ohio,3.0,3.0,3.0
Texas,6.0,6.0,6.0
Oregon,9.0,9.0,9.0
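
The subtraction above matches the Series index against the DataFrame's columns and broadcasts down the rows. To broadcast over the columns instead, matching on the row index, one option is the `sub` method with `axis='index'`. The sketch below rebuilds the same frame so it is self-contained; `col` and `by_rows` are illustrative names.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12.).reshape((4, 3)),
                     columns=list('bde'),
                     index=['Utah', 'Ohio', 'Texas', 'Oregon'])

# Take one column as a Series; its index holds the row labels
col = frame['d']

# axis='index' matches col against the row labels, so column 'd'
# is subtracted from every column
by_rows = frame.sub(col, axis='index')

print(by_rows)
```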


Function application and Mapping 

Another frequent operation is applying a function on one-dimensional arrays to each column or row. DataFrame’s apply method does exactly this

In [0]:
f = lambda x: x.max() - x.min()

In [0]:
frame.apply(f)

b    9.0
d    9.0
e    9.0
dtype: float64

In [0]:
#If you pass axis='columns' to apply, the function will be invoked once per row instead
frame.apply(f, axis='columns')

Utah      2.0
Ohio      2.0
Texas     2.0
Oregon    2.0
dtype: float64

In [0]:
def f(x):
  return pd.Series([x.min(), x.max()], index=['min', 'max'])

In [0]:
frame.apply(f)

Unnamed: 0,b,d,e
min,0.0,1.0,2.0
max,9.0,10.0,11.0
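
`apply` works on whole rows or columns. For element-wise functions, a Series offers the `map` method. The sketch below formats each value of one column as a string with two decimals; the frame is rebuilt here so the block stands alone, and `formatted` is an illustrative name.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12.).reshape((4, 3)),
                     columns=list('bde'),
                     index=['Utah', 'Ohio', 'Texas', 'Oregon'])

# map calls the function once per element of the Series
formatted = frame['e'].map(lambda x: f'{x:.2f}')

print(formatted)
```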


Sorting and Ranking 

In [0]:
import pandas as pd
import numpy as np
obj = pd.Series(range(4), index=['d', 'a', 'b', 'c'])

In [0]:
#Sorting by index 
# sort_index() function 
obj.sort_index()

a    1
b    2
c    3
d    0
dtype: int64

In [0]:
 #With a DataFrame, you can sort by index on either axis
frame = pd.DataFrame(np.arange(8).reshape((2, 4)),
index=['three', 'one'],
columns=['d', 'a', 'b', 'c'])

frame.sort_index()

Unnamed: 0,d,a,b,c
one,4,5,6,7
three,0,1,2,3


In [0]:
#axis=0 (the row index) is the default; axis=1 sorts by the column labels
frame.sort_index(axis=1)

Unnamed: 0,a,b,c,d
three,1,2,3,0
one,5,6,7,4


In [0]:
#Sorting is ascending by default; pass ascending=False to sort in descending order
frame.sort_index(axis=1, ascending=False)

Unnamed: 0,d,c,b,a
three,0,3,2,1
one,4,7,6,5


In [0]:
obj = pd.Series([4, 7, -3, 2])
obj.sort_values()

2   -3
3    2
0    4
1    7
dtype: int64

In [0]:
#Any missing values are sorted to the end of the Series by default
obj = pd.Series([4, np.nan, 7, np.nan, -3, 2])
obj.sort_values()

4   -3.0
5    2.0
0    4.0
2    7.0
1    NaN
3    NaN
dtype: float64
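
Two options of `sort_values` control the behaviour described above: `ascending` and `na_position`. The sketch below is a small illustration on the same data; the result names are made up.

```python
import numpy as np
import pandas as pd

obj = pd.Series([4, np.nan, 7, np.nan, -3, 2])

# Descending order; missing values still go to the end by default
desc = obj.sort_values(ascending=False)

# Ascending order, but with missing values placed first
na_first = obj.sort_values(na_position='first')

print(desc)
print(na_first)
```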

When sorting a DataFrame, you can use the data in one or more columns as the sort keys. To do so, pass one or more column names to the by option of sort_values

In [0]:
frame = pd.DataFrame({'b': [4, 7, -3, 2], 'a': [0, 1, 0, 1]})
frame

Unnamed: 0,b,a
0,4,0
1,7,1
2,-3,0
3,2,1


In [0]:
frame.sort_values(by='b')

Unnamed: 0,b,a
2,-3,0
3,2,1
0,4,0
1,7,1


In [0]:
#To sort by multiple columns, pass a list of names
frame.sort_values(by=['a', 'b'])

Unnamed: 0,b,a
2,-3,0
0,4,0
3,2,1
1,7,1


Ranking 

In [0]:
obj = pd.Series([7, -5, 7, 4, 2, 0, 4])
obj.rank()

0    6.5
1    1.0
2    6.5
3    4.5
4    3.0
5    2.0
6    4.5
dtype: float64

In [0]:
# One way of breaking the ties 
#Here, instead of using the average rank 6.5 for the entries 0 and 2, 
#they instead have been set to 6 and 7 because label 0 precedes label 2 in the data

#Available tie-breaking methods: 'average' (the default), 'min', 'max', 'first', 'dense'
obj.rank(method='first')

0    6.0
1    1.0
2    7.0
3    4.0
4    3.0
5    2.0
6    5.0
dtype: float64

In [0]:
# Assign tie values the maximum rank in the group
obj.rank(ascending=False, method='max')

0    2.0
1    7.0
2    2.0
3    4.0
4    5.0
5    6.0
6    4.0
dtype: float64
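
Two of the other tie-breaking methods, `'min'` and `'dense'`, can be compared on the same data. This is only a sketch; `minimum` and `dense` are illustrative names.

```python
import pandas as pd

obj = pd.Series([7, -5, 7, 4, 2, 0, 4])

# 'min': tied values all get the lowest rank in their group,
# and the following rank skips ahead
minimum = obj.rank(method='min')

# 'dense': like 'min', but ranks increase by 1 between groups,
# so no rank numbers are skipped
dense = obj.rank(method='dense')

print(minimum.tolist())
print(dense.tolist())
```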

DataFrame can compute ranks over the rows or the columns

In [0]:
frame = pd.DataFrame({'b': [4.3, 7, -3, 2], 'a': [0, 1, 0, 1],
                      'c': [-2, 5, 8, -2.5]})

In [0]:
frame.rank(axis='columns')

Unnamed: 0,b,a,c
0,3.0,2.0,1.0
1,3.0,1.0,2.0
2,1.0,2.0,3.0
3,3.0,2.0,1.0
