In [2]:
import numpy as np 
import pandas as pd 

In [3]:
obj = pd.Series(np.arange(4), index=["d", "a", "b", "c"])
obj

d    0
a    1
b    2
c    3
dtype: int64

In [4]:
# sort_index arranges the labels (row index or column names) in lexicographic (dictionary) order
obj.sort_index()

a    1
b    2
c    3
d    0
dtype: int64

In [8]:
# with a DataFrame, you can sort by index on either axis:

frame = pd.DataFrame(np.arange(8).reshape((2,4)), index=['three', 'one'], columns=['d', 'a', 'b', 'c'])
frame

       d  a  b  c
three  0  1  2  3
one    4  5  6  7


In [9]:
frame.sort_index()

       d  a  b  c
one    4  5  6  7
three  0  1  2  3


In [11]:
frame.sort_index(axis="columns")

       a  b  c  d
three  1  2  3  0
one    5  6  7  4


The data is sorted in ascending order by default, but it can be sorted in descending
order, too.

In [12]:
frame.sort_index(axis='columns', ascending=False)

       d  c  b  a
three  0  3  2  1
one    4  7  6  5


In [13]:
# to sort a Series by its values, use its sort_values method:
obj = pd.Series([4, 7, -3, 2])
obj.sort_values()

2   -3
3    2
0    4
1    7
dtype: int64

In [14]:
# any missing values are sorted to the end of the Series by default:

obj = pd.Series([4, np.nan, 7, np.nan, -3, 2])
obj.sort_values()

4   -3.0
5    2.0
0    4.0
2    7.0
1    NaN
3    NaN
dtype: float64

In [15]:
# missing values can be sorted to the start instead by using the na_position option
obj.sort_values(na_position='first')

1    NaN
3    NaN
4   -3.0
5    2.0
0    4.0
2    7.0
dtype: float64

When sorting a DataFrame, you can use the data in one or more columns as the sort
keys. To do so, pass one or more column names to `sort_values`.

In [16]:
frame = pd.DataFrame({"b": [4, 7, -3, 2], "a": [0, 1, 0, 1]})
frame

   b  a
0  4  0
1  7  1
2 -3  0
3  2  1


In [17]:
frame.sort_values('b')

   b  a
2 -3  0
3  2  1
0  4  0
1  7  1


In [18]:
# to sort by multiple columns, pass a list of names
frame.sort_values(['a', 'b'])

   b  a
2 -3  0
0  4  0
3  2  1
1  7  1
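
When sorting by several columns, each key can have its own direction. The sketch below reuses the `frame` from the cells above and passes a list to `ascending`, one flag per column name; the name `result` is only illustrative.

```python
import pandas as pd

frame = pd.DataFrame({"b": [4, 7, -3, 2], "a": [0, 1, 0, 1]})

# Sort by "a" ascending, then break ties within each "a" group by "b" descending.
result = frame.sort_values(["a", "b"], ascending=[True, False])
print(result)
```

The list passed to `ascending` must have the same length as the list of column names.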



### Key Differences
| Feature                | `sort_values()`                             | `sort_index()`                              |
|------------------------|---------------------------------------------|---------------------------------------------|
| **Sorts by**           | Column or row values                       | Row or column labels (index)               |
| **Use case**           | Rearrange data based on content values      | Rearrange structure based on labels        |
| **Default axis**       | `axis=0` (rows)                            | `axis=0` (rows)                            |
| **Multi-column support** | Yes (via `by` parameter)                  | Not by columns; a MultiIndex can be sorted by `level` |
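
A minimal sketch of the contrast in the table, using made-up data (the `score` column, the names in the index, and the `outer`/`inner` level names are all hypothetical). It sorts the same frame by content and by label, then sorts a MultiIndex by one named level.

```python
import pandas as pd

frame = pd.DataFrame({"score": [88, 95, 70]}, index=["carol", "alice", "bob"])

by_value = frame.sort_values(by="score")  # order rows by the data in "score"
by_label = frame.sort_index()             # order rows by the index labels

print(by_value.index.tolist())
print(by_label.index.tolist())

# sort_index has no `by` parameter; with a MultiIndex, pick a level instead.
multi = pd.DataFrame(
    {"value": [1, 2, 3, 4]},
    index=pd.MultiIndex.from_tuples(
        [("b", 2), ("a", 2), ("b", 1), ("a", 1)], names=["outer", "inner"]
    ),
)
by_inner = multi.sort_index(level="inner")
print(by_inner)
```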


### When to Use Which
1. **Use `sort_values()`**:
   - When the focus is on the data and not on labels.
   - To prioritize data importance, ranks, or patterns.
   - For custom ordering of rows/columns based on content.

2. **Use `sort_index()`**:
   - When working with labeled data and the order of labels is significant.
   - To prepare for indexing, lookups, or chronological alignment.
   - For cleaning or organizing data by labels.
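
To illustrate the two use cases, here is a small sketch with invented temperature readings indexed by date (the values, dates, and variable names are assumptions for the example). `sort_index` restores chronological order, while `sort_values` ranks the readings by content.

```python
import pandas as pd

readings = pd.Series(
    [21.5, 19.0, 20.2],
    index=pd.to_datetime(["2024-03-03", "2024-03-01", "2024-03-02"]),
)

# Labels matter: put the observations back in date order.
chronological = readings.sort_index()

# Content matters: put the highest reading first.
hottest_first = readings.sort_values(ascending=False)

print(chronological)
print(hottest_first)
```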


----- 

## Ranking

Ranking in pandas assigns a **rank** to each valid value in a Series or DataFrame, starting from the smallest value. If there are **ties** (duplicate values), the default method assigns the **average rank** to the tied values.

### Key Points:
1. **Ranking starts from 1**: the smallest value gets rank 1 when ranking in ascending order (the default).
2. **Tied values** get the **mean rank** by default.
3. You can change how ties are handled using the `method` parameter (e.g., assign smallest, largest, or consecutive ranks).
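
As a quick sketch of these points, the example below ranks a small Series that contains a tie and a missing value (the data and variable names are made up for illustration). Only valid values receive a rank; with the default settings the missing entry stays `NaN`.

```python
import numpy as np
import pandas as pd

scores = pd.Series([30, np.nan, 10, 30])

# Default: tied values share the mean of the ranks they span; NaN is not ranked.
default_ranks = scores.rank()

# method="min": tied values all take the smallest rank in their group.
min_ranks = scores.rank(method="min")

print(default_ranks)
print(min_ranks)
```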

In [20]:
obj = pd.Series([7, -5, 7, 4, 2, 0, 4])
obj

0    7
1   -5
2    7
3    4
4    2
5    0
6    4
dtype: int64

In [21]:
obj.rank()

0    6.5
1    1.0
2    6.5
3    4.5
4    3.0
5    2.0
6    4.5
dtype: float64

In [22]:
# Ranks can also be assigned according to the order in which they’re observed in the data
obj.rank(method='first')

0    6.0
1    1.0
2    7.0
3    4.0
4    3.0
5    2.0
6    5.0
dtype: float64

In [23]:
# rank in descending order
obj.rank(ascending=False)

0    1.5
1    7.0
2    1.5
3    3.5
4    5.0
5    6.0
6    3.5
dtype: float64

In [25]:
# a DataFrame can compute ranks over the rows or the columns
frame = pd.DataFrame({"b": [4.3, 7, -3, 2], "a": [0, 1, 0, 1], "c": [-2, 5, 8, -2.5]})
frame

     b  a    c
0  4.3  0 -2.0
1  7.0  1  5.0
2 -3.0  0  8.0
3  2.0  1 -2.5


In [26]:
frame.rank(axis='columns')

     b    a    c
0  3.0  2.0  1.0
1  3.0  1.0  2.0
2  1.0  2.0  3.0
3  3.0  2.0  1.0


### Summary Table

| Method    | How Ties Are Ranked                       | Example Ranks for `[10, 20, 20, 40]` |
|-----------|------------------------------------------|-------------------------------------|
| `average` | Assign the average of tied ranks         | `[1.0, 2.5, 2.5, 4.0]`             |
| `min`     | Assign the smallest rank to ties         | `[1.0, 2.0, 2.0, 4.0]`             |
| `max`     | Assign the largest rank to ties          | `[1.0, 3.0, 3.0, 4.0]`             |
| `first`   | Assign ranks in the order of appearance  | `[1.0, 2.0, 3.0, 4.0]`             |
| `dense`   | Assign same rank, next rank is consecutive | `[1.0, 2.0, 2.0, 3.0]`            |
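
The table can be reproduced with a short loop over the `method` values; this is a sketch in which `values` and `ranks_by_method` are illustrative names.

```python
import pandas as pd

values = pd.Series([10, 20, 20, 40])

# Compute the ranks once per tie-breaking method and collect them as lists.
ranks_by_method = {
    method: values.rank(method=method).tolist()
    for method in ["average", "min", "max", "first", "dense"]
}

for method, ranks in ranks_by_method.items():
    print(f"{method:>7}: {ranks}")
```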

### Choosing a Method
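- Use **`average`** (the default) when tied values should be treated equally and share the mean of the ranks they span.
- Use **`min`** or **`max`** when every tied value should take the lowest or the highest rank in its group.
- Use **`dense`** when ranks must remain consecutive, with no gaps after a tie.
- Use **`first`** when ties should be broken by order of appearance, so every value gets a unique rank.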
