# Data Transformation
Once your data is clean, the next step is to reshape, reformat, and reorder it as needed for analysis. Pandas gives you plenty of flexible tools to do this.

## Sorting & Ranking
### Sort by Values

In [1]:
import pandas as pd

In [3]:
df = pd.read_csv("data.csv")

In [4]:
df

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb
0,Shah Rukh Khan,Pathaan,2023,Action,1050,7.2
1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0
2,Aamir Khan,Dangal,2016,Biography,2024,8.4
3,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6
4,Ranveer Singh,Padmaavat,2018,Historical,585,7.0
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3
6,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5
7,Hrithik Roshan,War,2019,Action,475,6.5
8,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0
9,Kartik Aaryan,Bhool Bhulaiyaa 2,2022,Horror Comedy,266,5.9


In [5]:
df.sort_values("Year")                   # Ascending sort

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb
2,Aamir Khan,Dangal,2016,Biography,2024,8.4
1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0
10,Varun Dhawan,Badrinath Ki Dulhania,2017,Romantic Comedy,201,6.1
4,Ranveer Singh,Padmaavat,2018,Historical,585,7.0
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3
6,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5
7,Hrithik Roshan,War,2019,Action,475,6.5
8,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0
11,Vicky Kaushal,Uri: The Surgical Strike,2019,Action,342,8.2
3,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6


In [6]:
df.sort_values("Year", ascending=False)  # Descending

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb
0,Shah Rukh Khan,Pathaan,2023,Action,1050,7.2
3,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6
9,Kartik Aaryan,Bhool Bhulaiyaa 2,2022,Horror Comedy,266,5.9
7,Hrithik Roshan,War,2019,Action,475,6.5
8,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0
11,Vicky Kaushal,Uri: The Surgical Strike,2019,Action,342,8.2
4,Ranveer Singh,Padmaavat,2018,Historical,585,7.0
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3
6,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5
1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0


In [17]:
d2 = df.sort_values(["Year", "IMDb"]).copy()

In [18]:
d2

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb
2,Aamir Khan,Dangal,2016,Biography,2024,8.4
1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0
10,Varun Dhawan,Badrinath Ki Dulhania,2017,Romantic Comedy,201,6.1
4,Ranveer Singh,Padmaavat,2018,Historical,585,7.0
6,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3
7,Hrithik Roshan,War,2019,Action,475,6.5
8,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0
11,Vicky Kaushal,Uri: The Surgical Strike,2019,Action,342,8.2
3,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6
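
When sorting by several columns, each column can have its own direction. The sketch below is one way to do that, assuming a small hand-built frame (the name `films` is illustrative) that reuses five rows from the table above rather than reading data.csv:

```python
import pandas as pd

# Five rows copied from the table above; `films` is an illustrative name
films = pd.DataFrame({
    "Film": ["Padmaavat", "Andhadhun", "Stree", "War", "Good Newwz"],
    "Year": [2018, 2018, 2018, 2019, 2019],
    "IMDb": [7.0, 8.3, 7.5, 6.5, 7.0],
})

# Year ascending, then IMDb descending within each year
by_year_then_rating = films.sort_values(["Year", "IMDb"], ascending=[True, False])
print(by_year_then_rating)
```

Passing a list to ascending lets the first sort key run oldest to newest while the second puts the highest-rated film of each year first.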


### Reset Index
After sorting, each row keeps its original index label, so the index is no longer in order. If you want the index to start from 0 and be sequential, you can reset it using reset_index().

In [26]:
d2.reset_index(drop=True, inplace=True)  # Reset the index and discard the old one

In [27]:
d2

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb
0,Aamir Khan,Dangal,2016,Biography,2024,8.4
1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0
2,Varun Dhawan,Badrinath Ki Dulhania,2017,Romantic Comedy,201,6.1
3,Ranveer Singh,Padmaavat,2018,Historical,585,7.0
4,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3
6,Hrithik Roshan,War,2019,Action,475,6.5
7,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0
8,Vicky Kaushal,Uri: The Surgical Strike,2019,Action,342,8.2
9,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6
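
If you would rather not modify a DataFrame in place, the same result can be reached by chaining the calls or by asking sort_values to renumber the rows itself. This is a minimal sketch on a three-row frame built from the table above; the variable names are illustrative:

```python
import pandas as pd

# Three rows copied from the original table; names here are illustrative
films = pd.DataFrame({
    "Film": ["Pathaan", "Tiger Zinda Hai", "Dangal"],
    "Year": [2023, 2017, 2016],
})

# Option 1: sort, then reset the index, returning a new DataFrame
sorted_films = films.sort_values("Year").reset_index(drop=True)

# Option 2: let sort_values assign a fresh 0..n-1 index directly
sorted_films_alt = films.sort_values("Year", ignore_index=True)

print(sorted_films)
```

Both options leave `films` untouched and produce the same sorted frame with a fresh 0-based index.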


- The df.sort_index() method sorts a DataFrame by its index labels. If the index is out of order (for example, after sorting by values or another operation that rearranges rows), you can use sort_index() to put the rows back in index order.
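
As a small illustration of sort_index(), the sketch below builds a frame whose index labels are out of order, as they would be after a sort by value, and then restores index order. The frame and its names are assumptions made for the example:

```python
import pandas as pd

# Rows in Year order, still carrying their original index labels 2, 1, 0
films = pd.DataFrame(
    {"Film": ["Dangal", "Tiger Zinda Hai", "Pathaan"], "Year": [2016, 2017, 2023]},
    index=[2, 1, 0],
)

# Order the rows by index label again: 0, 1, 2
restored = films.sort_index()
print(restored)
```

Unlike reset_index(), this keeps the original labels and only changes the row order.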

### Ranking
The .rank() method in pandas assigns ranks to the numeric values in a column, such as scores or ratings. By default (method='average'), tied values each receive the average of the ranks they would otherwise occupy, which can produce fractional ranks. For example, if two people share the top score, they both get a rank of 1.5, and the next person gets rank 3. You can change this behavior with the method parameter. One useful option is method='dense', which gives tied values the same rank but doesn't leave gaps in the sequence. This is helpful when you want consecutive ranks without skips.
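
To see how the tie-handling options differ, the sketch below ranks five IMDb ratings taken from the table (two of them tied at 7.0) with three methods side by side. The names `ratings` and `comparison` are illustrative, and method='min' is included only for contrast:

```python
import pandas as pd

# Ratings for Dangal, Andhadhun, Padmaavat, Good Newwz, War
ratings = pd.Series([8.4, 8.3, 7.0, 7.0, 6.5], name="IMDb")

comparison = pd.DataFrame({
    "IMDb": ratings,
    "average": ratings.rank(ascending=False),                # ties share the mean rank
    "min": ratings.rank(method="min", ascending=False),      # ties share the lowest rank
    "dense": ratings.rank(method="dense", ascending=False),  # ties share a rank, no gaps
})
print(comparison)
```

The two 7.0 ratings get 3.5 under the average method and 3 under both min and dense; the rating after them gets 5 under average and min but 4 under dense.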

In [37]:
df["Rank"] = df["IMDb"].rank(ascending=False)  # Default: average method

In [38]:
df

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb,Rank
0,Shah Rukh Khan,Pathaan,2023,Action,1050,7.2,5.0
1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0,10.0
2,Aamir Khan,Dangal,2016,Biography,2024,8.4,1.0
3,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6,12.0
4,Ranveer Singh,Padmaavat,2018,Historical,585,7.0,6.5
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3,2.0
6,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5,4.0
7,Hrithik Roshan,War,2019,Action,475,6.5,8.0
8,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0,6.5
9,Kartik Aaryan,Bhool Bhulaiyaa 2,2022,Horror Comedy,266,5.9,11.0


In [39]:
df["Rank"] = df["IMDb"].rank(method="dense", ascending=False)  # Dense: ties share a rank, no gaps

In [36]:
df

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb,Rank
0,Shah Rukh Khan,Pathaan,2023,Action,1050,7.2,5.0
1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0,9.0
2,Aamir Khan,Dangal,2016,Biography,2024,8.4,1.0
3,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6,11.0
4,Ranveer Singh,Padmaavat,2018,Historical,585,7.0,6.0
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3,2.0
6,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5,4.0
7,Hrithik Roshan,War,2019,Action,475,6.5,7.0
8,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0,6.0
9,Kartik Aaryan,Bhool Bhulaiyaa 2,2022,Horror Comedy,266,5.9,10.0


## Renaming Columns & Index

In [40]:
df.rename(columns={"BoxOffice(INR Crore)": "BoxOffice"}, inplace=True)

In [41]:
df

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice,IMDb,Rank
0,Shah Rukh Khan,Pathaan,2023,Action,1050,7.2,5.0
1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0,9.0
2,Aamir Khan,Dangal,2016,Biography,2024,8.4,1.0
3,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6,11.0
4,Ranveer Singh,Padmaavat,2018,Historical,585,7.0,6.0
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3,2.0
6,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5,4.0
7,Hrithik Roshan,War,2019,Action,475,6.5,7.0
8,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0,6.0
9,Kartik Aaryan,Bhool Bhulaiyaa 2,2022,Horror Comedy,266,5.9,10.0


In [42]:
df.rename(index={0: "ActorName", 1: "IMDb_Rank"}, inplace=True)  # Relabel index rows 0 and 1; other labels are unchanged

In [43]:
df

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice,IMDb,Rank
ActorName,Shah Rukh Khan,Pathaan,2023,Action,1050,7.2,5.0
IMDb_Rank,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0,9.0
2,Aamir Khan,Dangal,2016,Biography,2024,8.4,1.0
3,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6,11.0
4,Ranveer Singh,Padmaavat,2018,Historical,585,7.0,6.0
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3,2.0
6,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5,4.0
7,Hrithik Roshan,War,2019,Action,475,6.5,7.0
8,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0,6.0
9,Kartik Aaryan,Bhool Bhulaiyaa 2,2022,Horror Comedy,266,5.9,10.0
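
Besides a dictionary, rename also accepts a function that is applied to every label, and it returns a new DataFrame when inplace is not set. The sketch below assumes a one-row frame built from the first table; the variable names are illustrative:

```python
import pandas as pd

# One row copied from the original table; names here are illustrative
films = pd.DataFrame({
    "Actor": ["Aamir Khan"],
    "BoxOffice(INR Crore)": [2024],
    "IMDb": [8.4],
})

# Dictionary mapping: rename only the listed column, returning a new DataFrame
renamed = films.rename(columns={"BoxOffice(INR Crore)": "BoxOffice"})

# Function mapping: apply str.lower to every column name
lowered = renamed.rename(columns=str.lower)

print(list(lowered.columns))
```

Because neither call uses inplace=True, `films` keeps its original column names.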
