Sorting a dataset by some criterion is another important built-in operation. To sort
lexicographically by row or column index, use the sort_index method, which returns
a new, sorted object:

In [2]:
import pandas as pd
from pandas import Series, DataFrame
import numpy as np

In [4]:
#Series
obj = pd.Series(range(4), index=['d', 'a', 'b', 'c'])
obj

d    0
a    1
b    2
c    3
dtype: int64

In [5]:
obj.sort_index()

a    1
b    2
c    3
d    0
dtype: int64
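
Because sort_index returns a new object, the original Series keeps its order. A minimal sketch of that behavior, reusing the same data as above (the names sorted_obj, original_order and sorted_order are only illustrative):

```python
import pandas as pd

obj = pd.Series(range(4), index=['d', 'a', 'b', 'c'])

# sort_index returns a sorted copy; obj itself is left untouched
sorted_obj = obj.sort_index()

original_order = list(obj.index)
sorted_order = list(sorted_obj.index)
print(original_order)
print(sorted_order)
```

To keep the sorted result, assign it back to a name, as with sorted_obj here.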

In [10]:
#DataFrame
frame = pd.DataFrame(np.arange(8).reshape((2, 4)),
                     index=['three', 'one'],
                     columns=['d', 'a', 'b', 'c'])
frame

       d  a  b  c
three  0  1  2  3
one    4  5  6  7


In [9]:
frame.sort_index()

       d  a  b  c
one    4  5  6  7
three  0  1  2  3


In [11]:
frame.sort_index(axis=1)

       a  b  c  d
three  1  2  3  0
one    5  6  7  4


In [12]:
# Data is sorted in ascending order by default; pass ascending=False to reverse it
frame.sort_index(axis=1, ascending=False)

       d  c  b  a
three  0  3  2  1
one    4  7  6  5


In [5]:
# To sort a Series by its values, use its sort_values method.
# Any missing values are sorted to the end of the Series by default.
obj = pd.Series([4, 7, np.nan, -3, 2, np.nan])
obj.sort_values()

3   -3.0
4    2.0
0    4.0
1    7.0
2    NaN
5    NaN
dtype: float64
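
The placement of missing values can be changed. The sketch below uses the ascending and na_position parameters of sort_values on the same data; the variable names desc and nan_first are only illustrative:

```python
import numpy as np
import pandas as pd

obj = pd.Series([4, 7, np.nan, -3, 2, np.nan])

# Descending order: NaN values still go to the end by default
desc = obj.sort_values(ascending=False)

# na_position='first' moves the missing values to the front instead
nan_first = obj.sort_values(na_position='first')

print(desc)
print(nan_first)
```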

When sorting a DataFrame, you can use the data in one or more columns as the sort
keys. To do so, pass one or more column names to the by option of sort_values.
When you pass a list, rows are ordered by the first column, and later columns
break ties:

In [16]:
frame = pd.DataFrame({'b': [4, 7, -3, 2], 'a': [0, 1, 0, 1]})
frame

   b  a
0  4  0
1  7  1
2 -3  0
3  2  1


In [17]:
frame.sort_values(by='b')

   b  a
2 -3  0
3  2  1
0  4  0
1  7  1


In [18]:
frame.sort_values(by=['a','b'])

   b  a
2 -3  0
0  4  0
3  2  1
1  7  1
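
Each sort key can also have its own direction. As a minimal sketch, passing a list to ascending (one entry per column in by) sorts a in ascending order and b in descending order within each value of a; the name mixed is only illustrative:

```python
import pandas as pd

frame = pd.DataFrame({'b': [4, 7, -3, 2], 'a': [0, 1, 0, 1]})

# 'a' ascending, then 'b' descending within each group of equal 'a'
mixed = frame.sort_values(by=['a', 'b'], ascending=[True, False])
print(mixed)
```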


Ranking assigns ranks from one through the number of valid data points in an array.
The rank methods for Series and DataFrame are the place to look; by default, rank
breaks ties by assigning each group the mean rank:

In [19]:
obj = pd.Series([7, -5, 7, 4, 2, 0, 4])
obj

0    7
1   -5
2    7
3    4
4    2
5    0
6    4
dtype: int64

In [20]:
# rank orders the values and assigns each one its position in that order, starting at 1.
# By default, tied values share the average of the positions they occupy.
obj.rank()

0    6.5
1    1.0
2    6.5
3    4.5
4    3.0
5    2.0
6    4.5
dtype: float64

In [21]:
# Rank in descending order and assign tied values the maximum rank in the group
obj.rank(ascending=False, method='max')

0    2.0
1    7.0
2    2.0
3    4.0
4    5.0
5    6.0
6    4.0
dtype: float64
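
To see how the tie-breaking options differ, the sketch below ranks the same Series with several values of the method parameter ('average', 'min', 'max', 'first' and 'dense') and collects the results side by side. The names methods and comparison are only illustrative:

```python
import pandas as pd

obj = pd.Series([7, -5, 7, 4, 2, 0, 4])

# One column per tie-breaking method, all ranking in ascending order
methods = ['average', 'min', 'max', 'first', 'dense']
comparison = pd.DataFrame({m: obj.rank(method=m) for m in methods})
print(comparison)
```

With 'first', ties are broken by the order in which the values appear in the data; with 'dense', ranks increase by one between groups, so no rank is skipped after a tie.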

A DataFrame can compute ranks over the rows or the columns:

In [23]:
frame = pd.DataFrame({'b': [4.3, 7, -3, 2], 'a': [0, 1, 0, 1],
                      'c': [-2, 5, 8, -2.5]})
frame

     b  a    c
0  4.3  0 -2.0
1  7.0  1  5.0
2 -3.0  0  8.0
3  2.0  1 -2.5


In [24]:
frame.rank(axis='columns')

     b    a    c
0  3.0  2.0  1.0
1  3.0  1.0  2.0
2  1.0  2.0  3.0
3  3.0  2.0  1.0
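
For comparison, calling rank with no axis argument uses the default axis=0, which ranks the values within each column rather than across each row. A minimal sketch on the same data (the name col_ranks is only illustrative):

```python
import pandas as pd

frame = pd.DataFrame({'b': [4.3, 7, -3, 2], 'a': [0, 1, 0, 1],
                      'c': [-2, 5, 8, -2.5]})

# Default axis=0: each column is ranked independently, top to bottom
col_ranks = frame.rank()
print(col_ranks)
```

Column a contains ties, so its values receive the mean ranks 1.5 and 3.5.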


Axis Indexes with Duplicate Labels

In [25]:
obj = pd.Series(range(5), index=['a', 'a', 'b', 'b', 'c'])
obj

a    0
a    1
b    2
b    3
c    4
dtype: int64

In [29]:
#The index’s is_unique property can tell you whether its labels are unique or not:
obj.index.is_unique

False

Data selection is one of the main things that behaves differently with duplicates.
Indexing a label with multiple entries returns a Series, while single entries return a
scalar value:

In [27]:
obj['a']

a    0
a    1
dtype: int64

In [28]:
obj['c']

4
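
Because the return type depends on whether a label repeats, code that must handle both cases may want a predictable result. The sketch below shows two possible approaches, not the only ones: dropping repeated labels with Index.duplicated, and selecting with a list of labels so that the result is always a Series. The names dup_mask, first_only and always_series are only illustrative:

```python
import pandas as pd

obj = pd.Series(range(5), index=['a', 'a', 'b', 'b', 'c'])

# duplicated() marks every occurrence of a label after the first one
dup_mask = obj.index.duplicated()

# Keep only the first entry for each label
first_only = obj[~dup_mask]

# Selecting with a list of labels returns a Series even for a unique label
always_series = obj.loc[['c']]

print(first_only)
print(always_series)
```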

The same logic extends to indexing rows in a DataFrame:

In [30]:
df = pd.DataFrame(np.random.randn(4, 3), index=['a', 'a', 'b', 'b'])
df

          0         1         2
a  0.279547 -0.538628  1.782460
a -1.646260  0.704786  0.847498
b  1.765474  1.527301  0.106846
b -1.272557  2.168799 -1.294881


In [32]:
df.loc['a']

          0         1         2
a  0.279547 -0.538628  1.782460
a -1.646260  0.704786  0.847498
