# Data Transformation

Once your data is clean, the next step is to **reshape, reformat, and reorder** it as needed for analysis. Pandas gives you plenty of flexible tools to do this.

---

## Sorting & Ranking

### Sort by Values

```python
df.sort_values("Age")                   # Ascending sort
df.sort_values("Age", ascending=False)  # Descending
df.sort_values(["Age", "Salary"])       # Sort by multiple columns
```

`df.sort_values(["Age", "Salary"])` sorts the DataFrame by the `"Age"` column first. Where rows tie on `"Age"` (two or more rows with the same value), the tie is broken by the `"Salary"` column.
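
The tie-breaking described above can be sketched on a small frame. The data and the names `people`, `by_age_then_salary`, and `mixed` are made up for illustration. The second call also shows that `ascending` accepts a list, one direction per column.

```python
import pandas as pd

# Hypothetical sample data, not taken from the lesson's dataset.
people = pd.DataFrame({
    "Name": ["Asha", "Ben", "Chen", "Dev"],
    "Age": [30, 25, 30, 25],
    "Salary": [70000, 50000, 60000, 55000],
})

# Rows are ordered by "Age"; rows with the same "Age" are ordered by "Salary".
by_age_then_salary = people.sort_values(["Age", "Salary"])
print(by_age_then_salary["Name"].tolist())  # ['Ben', 'Dev', 'Chen', 'Asha']

# A list for `ascending` sets the direction per column:
# youngest first, and within the same age the highest salary first.
mixed = people.sort_values(["Age", "Salary"], ascending=[True, False])
print(mixed["Name"].tolist())  # ['Dev', 'Ben', 'Asha', 'Chen']
```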

### Reset Index

If you want the index to start at 0 and run sequentially, reset it with `reset_index()`. Passing `drop=True` discards the old index instead of adding it as a column.

```python
df.reset_index(drop=True, inplace=True)  # Reset the index and drop the old index
```
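
As a minimal sketch of why this matters after a sort, the example below uses a made-up `scores` frame (the names are illustrative). It compares `drop=True` with the default, which keeps the old labels in a column named `index`.

```python
import pandas as pd

# Hypothetical sample data for illustration.
scores = pd.DataFrame({"Player": ["A", "B", "C"], "Score": [10, 30, 20]})

# Sorting by value carries the original index labels along with the rows.
ranked = scores.sort_values("Score", ascending=False)
print(ranked.index.tolist())  # [1, 2, 0]

# drop=True discards the old labels and numbers the rows 0..n-1.
clean = ranked.reset_index(drop=True)
print(clean.index.tolist())  # [0, 1, 2]

# Without drop=True, the old labels are kept as a new column named "index".
kept = ranked.reset_index()
print(kept.columns.tolist())  # ['index', 'Player', 'Score']
```

Both calls here return a new DataFrame and leave `ranked` untouched, which is an alternative to `inplace=True`.
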

### Sort by Index

```python
df.sort_index()
```
`df.sort_index()` sorts the DataFrame by its index labels. If the index is out of order (for example, after sorting by values or dropping rows), `sort_index()` puts the rows back in index order. It does not renumber the index, so any gaps left by dropped rows remain.
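
A small sketch of this behaviour, using a made-up `scores` frame (all names here are illustrative): after dropping a row and sorting by value, `sort_index()` restores index order but keeps the gap.

```python
import pandas as pd

# Hypothetical sample data for illustration.
scores = pd.DataFrame({"Player": ["A", "B", "C", "D"], "Score": [10, 30, 20, 40]})

# Dropping row 1 and sorting by value leaves the index out of order, with a gap.
shuffled = scores.drop(index=1).sort_values("Score", ascending=False)
print(shuffled.index.tolist())  # [3, 2, 0]

# sort_index() orders rows by index label; label 1 is still missing.
restored = shuffled.sort_index()
print(restored.index.tolist())  # [0, 2, 3]
print(restored["Player"].tolist())  # ['A', 'C', 'D']
```

To get a gap-free 0, 1, 2 index instead, `reset_index(drop=True)` is the tool to reach for.
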

### Ranking

`.rank()` assigns ranks to the numeric values in a column, such as scores or points. By default, ranking is ascending (the smallest value gets rank 1), and tied values receive the average of the ranks they span, which can produce fractional ranks. For example, if two values tie for positions 1 and 2, both get rank 1.5. Pass `ascending=False` to give rank 1 to the highest value instead.

The `method` parameter controls how ties are handled. One useful option is `method="dense"`, which gives tied values the same rank and leaves no gaps in the sequence. This is helpful when you want consecutive ranks without skips.

```python
df["Rank"] = df["Score"].rank()                 # Default: average method
df["Rank"] = df["Score"].rank(method="dense")   # 1, 2, 2, 3
```
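
To make the difference between tie-handling methods concrete, here is a short sketch on a made-up `scores` Series (the variable names are illustrative). It uses `ascending=False` so that the highest score gets rank 1, and compares the default `average` method with `min` and `dense`.

```python
import pandas as pd

# Hypothetical scores with one tie (the two 80s).
scores = pd.Series([90, 80, 80, 70], name="Score")

# Default method="average": the tied values share the mean of ranks 2 and 3.
average = scores.rank(ascending=False)
print(average.tolist())  # [1.0, 2.5, 2.5, 4.0]

# method="min": ties take the lowest rank in the group; the next rank is skipped.
minimum = scores.rank(ascending=False, method="min")
print(minimum.tolist())  # [1.0, 2.0, 2.0, 4.0]

# method="dense": ties share a rank and the sequence has no gaps.
dense = scores.rank(ascending=False, method="dense")
print(dense.tolist())  # [1.0, 2.0, 2.0, 3.0]
```

Ranks come back as floats; when there are no missing values, `dense.astype(int)` gives whole-number ranks.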

---

## Renaming Columns & Index

```python
df.rename(columns={"oldName": "newName"}, inplace=True)
df.rename(index={0: "row1", 1: "row2"}, inplace=True)
```
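
A few details of `rename()` are easy to miss, so here is a hedged sketch on a made-up frame (`df_demo` and the column names are illustrative). Without `inplace=True` the method returns a new DataFrame, keys that match no column are ignored by default, and a function can be passed instead of a dict.

```python
import pandas as pd

# Hypothetical frame for illustration.
df_demo = pd.DataFrame({"oldName": [1, 2], "Other": [3, 4]})

# Without inplace=True, rename() returns a new frame and leaves the original alone.
renamed = df_demo.rename(columns={"oldName": "newName"})
print(df_demo.columns.tolist())  # ['oldName', 'Other']
print(renamed.columns.tolist())  # ['newName', 'Other']

# A key that matches no column is ignored silently by default...
unchanged = df_demo.rename(columns={"typo": "x"})
print(unchanged.columns.tolist())  # ['oldName', 'Other']

# ...while errors="raise" turns the mismatch into a KeyError.
try:
    df_demo.rename(columns={"typo": "x"}, errors="raise")
    raised = False
except KeyError:
    raised = True
print(raised)  # True

# A function is applied to every column label.
lowered = df_demo.rename(columns=str.lower)
print(lowered.columns.tolist())  # ['oldname', 'other']
```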

To rename all columns:

```python
df.columns = ["Name", "Age", "City"]
```
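
Assigning to `df.columns` replaces the names by position, so the list needs exactly one name per existing column. The sketch below uses a made-up `df_demo` frame to show what happens when the lengths do not match.

```python
import pandas as pd

# Hypothetical three-column frame used only for illustration.
df_demo = pd.DataFrame({"n": ["Asha"], "a": [31], "c": ["Pune"]})

# Names are applied by position, so the list order must match the column order.
df_demo.columns = ["Name", "Age", "City"]
names_after = df_demo.columns.tolist()
print(names_after)  # ['Name', 'Age', 'City']

# A list of the wrong length is rejected with a ValueError.
try:
    df_demo.columns = ["Name", "Age"]
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
print(mismatch_rejected)  # True
```

When only a few names change, `rename(columns={...})` is usually safer because it matches by name rather than by position.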

---

## Changing Column Order

Select the columns with a list in the order you want. Any column left out of the list is dropped from the result:

```python
df = df[["City", "Name", "Age"]]   # Reorder as desired
```

You can also move one column to the front:

```python
cols = ["Name"] + [col for col in df.columns if col != "Name"]
df = df[cols]
```
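
The same idea can be wrapped in a small helper. `move_to_front` below is a hypothetical function written for this example, not part of pandas, and `df_demo` is made-up data. It checks that the column exists before reordering.

```python
import pandas as pd

def move_to_front(frame, column):
    """Return a new DataFrame with `column` first; other columns keep their order."""
    if column not in frame.columns:
        raise KeyError(f"{column!r} is not a column")
    rest = [col for col in frame.columns if col != column]
    return frame[[column] + rest]

# Hypothetical frame for illustration.
df_demo = pd.DataFrame({"City": ["Pune"], "Age": [31], "Name": ["Asha"]})

reordered = move_to_front(df_demo, "Name")
print(reordered.columns.tolist())  # ['Name', 'City', 'Age']
```

The helper returns a new frame, so `df_demo` keeps its original column order.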

---



## Summary

- Sort, rank, and rename to prepare your data    
- Reordering and reshaping are key for EDA and visualization

 

In [1]:
import pandas as pd

In [2]:
df = pd.read_csv("data.csv")

In [3]:
df

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb
0,Shah Rukh Khan,Pathaan,2023,Action,1050,7.2
1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0
2,Aamir Khan,Dangal,2016,Biography,2024,8.4
3,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6
4,Ranveer Singh,Padmaavat,2018,Historical,585,7.0
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3
6,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5
7,Hrithik Roshan,War,2019,Action,475,6.5
8,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0
9,Kartik Aaryan,Bhool Bhulaiyaa 2,2022,Horror Comedy,266,5.9


In [8]:
df.sort_values("Year")  # ascending sort (the default)

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb
2,Aamir Khan,Dangal,2016,Biography,2024,8.4
1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0
10,Varun Dhawan,Badrinath Ki Dulhania,2017,Romantic Comedy,201,6.1
4,Ranveer Singh,Padmaavat,2018,Historical,585,7.0
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3
6,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5
7,Hrithik Roshan,War,2019,Action,475,6.5
8,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0
11,Vicky Kaushal,Uri: The Surgical Strike,2019,Action,342,8.2
3,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6


In [9]:
df.sort_values("Year", ascending=False)  # descending sort

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb
0,Shah Rukh Khan,Pathaan,2023,Action,1050,7.2
3,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6
9,Kartik Aaryan,Bhool Bhulaiyaa 2,2022,Horror Comedy,266,5.9
7,Hrithik Roshan,War,2019,Action,475,6.5
8,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0
11,Vicky Kaushal,Uri: The Surgical Strike,2019,Action,342,8.2
4,Ranveer Singh,Padmaavat,2018,Historical,585,7.0
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3
6,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5
1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0


In [10]:
df.sort_values(["Year", "IMDb"])  # rows that tie on Year are ordered by IMDb

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb
2,Aamir Khan,Dangal,2016,Biography,2024,8.4
1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0
10,Varun Dhawan,Badrinath Ki Dulhania,2017,Romantic Comedy,201,6.1
4,Ranveer Singh,Padmaavat,2018,Historical,585,7.0
6,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3
7,Hrithik Roshan,War,2019,Action,475,6.5
8,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0
11,Vicky Kaushal,Uri: The Surgical Strike,2019,Action,342,8.2
3,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6


In [11]:
df2 = df.sort_values(["Year","IMDb"]).copy()

In [14]:
df2.reset_index(drop=True)  # returns a new DataFrame; df2 itself is not modified

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb
0,Aamir Khan,Dangal,2016,Biography,2024,8.4
1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0
2,Varun Dhawan,Badrinath Ki Dulhania,2017,Romantic Comedy,201,6.1
3,Ranveer Singh,Padmaavat,2018,Historical,585,7.0
4,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3
6,Hrithik Roshan,War,2019,Action,475,6.5
7,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0
8,Vicky Kaushal,Uri: The Surgical Strike,2019,Action,342,8.2
9,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6


To apply the new index to `df2` itself, pass `inplace=True`:

In [15]:
df2.reset_index(drop=True, inplace=True)  # modifies df2 in place (permanent change)

In [16]:
df2

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb
0,Aamir Khan,Dangal,2016,Biography,2024,8.4
1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0
2,Varun Dhawan,Badrinath Ki Dulhania,2017,Romantic Comedy,201,6.1
3,Ranveer Singh,Padmaavat,2018,Historical,585,7.0
4,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3
6,Hrithik Roshan,War,2019,Action,475,6.5
7,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0
8,Vicky Kaushal,Uri: The Surgical Strike,2019,Action,342,8.2
9,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6


In [22]:
df3 = df.sort_values("Year").copy()

In [23]:
df3

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb
2,Aamir Khan,Dangal,2016,Biography,2024,8.4
1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0
10,Varun Dhawan,Badrinath Ki Dulhania,2017,Romantic Comedy,201,6.1
4,Ranveer Singh,Padmaavat,2018,Historical,585,7.0
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3
6,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5
7,Hrithik Roshan,War,2019,Action,475,6.5
8,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0
11,Vicky Kaushal,Uri: The Surgical Strike,2019,Action,342,8.2
3,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6


In [24]:
df3.sort_index()

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb
0,Shah Rukh Khan,Pathaan,2023,Action,1050,7.2
1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0
2,Aamir Khan,Dangal,2016,Biography,2024,8.4
3,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6
4,Ranveer Singh,Padmaavat,2018,Historical,585,7.0
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3
6,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5
7,Hrithik Roshan,War,2019,Action,475,6.5
8,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0
9,Kartik Aaryan,Bhool Bhulaiyaa 2,2022,Horror Comedy,266,5.9


In [26]:
df2["Rank"] = df2["IMDb"].rank()

In [27]:
df2

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb,Rank
0,Aamir Khan,Dangal,2016,Biography,2024,8.4,12.0
1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0,3.0
2,Varun Dhawan,Badrinath Ki Dulhania,2017,Romantic Comedy,201,6.1,4.0
3,Ranveer Singh,Padmaavat,2018,Historical,585,7.0,6.5
4,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5,9.0
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3,11.0
6,Hrithik Roshan,War,2019,Action,475,6.5,5.0
7,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0,6.5
8,Vicky Kaushal,Uri: The Surgical Strike,2019,Action,342,8.2,10.0
9,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6,1.0


In [29]:
df2["Rank"] = df2["IMDb"].rank(ascending = False)

In [30]:
df2

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb,Rank
0,Aamir Khan,Dangal,2016,Biography,2024,8.4,1.0
1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0,10.0
2,Varun Dhawan,Badrinath Ki Dulhania,2017,Romantic Comedy,201,6.1,9.0
3,Ranveer Singh,Padmaavat,2018,Historical,585,7.0,6.5
4,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5,4.0
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3,2.0
6,Hrithik Roshan,War,2019,Action,475,6.5,8.0
7,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0,6.5
8,Vicky Kaushal,Uri: The Surgical Strike,2019,Action,342,8.2,3.0
9,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6,12.0


In [31]:
df2.sort_values("Rank") 

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb,Rank
0,Aamir Khan,Dangal,2016,Biography,2024,8.4,1.0
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3,2.0
8,Vicky Kaushal,Uri: The Surgical Strike,2019,Action,342,8.2,3.0
4,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5,4.0
11,Shah Rukh Khan,Pathaan,2023,Action,1050,7.2,5.0
3,Ranveer Singh,Padmaavat,2018,Historical,585,7.0,6.5
7,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0,6.5
6,Hrithik Roshan,War,2019,Action,475,6.5,8.0
2,Varun Dhawan,Badrinath Ki Dulhania,2017,Romantic Comedy,201,6.1,9.0
1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0,10.0


### Ties get the average rank by default; use `method="dense"` for a dense rank

In [32]:
df2["Rank"] = df2["IMDb"].rank(ascending=False, method="dense")

In [33]:
df2.sort_values("Rank")

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb,Rank
0,Aamir Khan,Dangal,2016,Biography,2024,8.4,1.0
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3,2.0
8,Vicky Kaushal,Uri: The Surgical Strike,2019,Action,342,8.2,3.0
4,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5,4.0
11,Shah Rukh Khan,Pathaan,2023,Action,1050,7.2,5.0
3,Ranveer Singh,Padmaavat,2018,Historical,585,7.0,6.0
7,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0,6.0
6,Hrithik Roshan,War,2019,Action,475,6.5,7.0
2,Varun Dhawan,Badrinath Ki Dulhania,2017,Romantic Comedy,201,6.1,8.0
1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0,9.0


In [34]:
df2.rename(columns={"Rank": "IMDb Rank"}, inplace=True)

In [35]:
df2

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb,IMDb Rank
0,Aamir Khan,Dangal,2016,Biography,2024,8.4,1.0
1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0,9.0
2,Varun Dhawan,Badrinath Ki Dulhania,2017,Romantic Comedy,201,6.1,8.0
3,Ranveer Singh,Padmaavat,2018,Historical,585,7.0,6.0
4,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5,4.0
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3,2.0
6,Hrithik Roshan,War,2019,Action,475,6.5,7.0
7,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0,6.0
8,Vicky Kaushal,Uri: The Surgical Strike,2019,Action,342,8.2,3.0
9,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6,11.0


In [37]:
df2.columns = ["Actor Name","Film Name","Film Release Year","Film Genre","BoxOffice","IMDB","IMDB Rank"]

In [38]:
df2

Unnamed: 0,Actor Name,Film Name,Film Release Year,Film Genre,BoxOffice,IMDB,IMDB Rank
0,Aamir Khan,Dangal,2016,Biography,2024,8.4,1.0
1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0,9.0
2,Varun Dhawan,Badrinath Ki Dulhania,2017,Romantic Comedy,201,6.1,8.0
3,Ranveer Singh,Padmaavat,2018,Historical,585,7.0,6.0
4,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5,4.0
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3,2.0
6,Hrithik Roshan,War,2019,Action,475,6.5,7.0
7,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0,6.0
8,Vicky Kaushal,Uri: The Surgical Strike,2019,Action,342,8.2,3.0
9,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6,11.0


In [40]:
df2 = df2[["Film Name","Actor Name","Film Release Year","Film Genre","BoxOffice","IMDB Rank"]]  # reorder columns; "IMDB" is left out of the list, so it is dropped

In [41]:
df2

Unnamed: 0,Film Name,Actor Name,Film Release Year,Film Genre,BoxOffice,IMDB Rank
0,Dangal,Aamir Khan,2016,Biography,2024,1.0
1,Tiger Zinda Hai,Salman Khan,2017,Action,565,9.0
2,Badrinath Ki Dulhania,Varun Dhawan,2017,Romantic Comedy,201,8.0
3,Padmaavat,Ranveer Singh,2018,Historical,585,6.0
4,Stree,Rajkummar Rao,2018,Horror Comedy,180,4.0
5,Andhadhun,Ayushmann Khurrana,2018,Thriller,111,2.0
6,War,Hrithik Roshan,2019,Action,475,7.0
7,Good Newwz,Akshay Kumar,2019,Comedy,318,6.0
8,Uri: The Surgical Strike,Vicky Kaushal,2019,Action,342,3.0
9,Brahmastra,Ranbir Kapoor,2022,Fantasy,431,11.0
