---

### 🔄 Data Transformation

Once your data is cleaned, the next step is to **reshape, reformat, and reorder** it as needed for further analysis. Pandas provides a wide array of flexible tools to help with this process.

---

### 🔢 Sorting & Ranking

#### **Sort by Values**

You can sort a DataFrame by one or more columns with `sort_values`. Sorting arranges rows in a meaningful order, such as ascending or descending by age or salary. When you pass several columns, rows are ordered by the first column, and each later column only breaks ties left by the columns before it.
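
The tie-breaking rule is easiest to see on data that actually contains ties. The sketch below uses a small hypothetical frame (`films` is not part of the dataset used later) and assumes you want years oldest first and, within a year, the highest rating first.

```python
import pandas as pd

# Hypothetical mini-dataset with a repeated year so the tie-break is visible
films = pd.DataFrame({
    "Film": ["Andhadhun", "Raazi", "Dangal", "Stree"],
    "Year": [2018, 2018, 2016, 2018],
    "IMDB": [8.2, 7.7, 8.4, 7.5],
})

# One direction per column: Year ascending, IMDB descending within each year
by_year_then_rating = films.sort_values(["Year", "IMDB"], ascending=[True, False])

print(by_year_then_rating["Film"].tolist())
```

`sort_values` returns a new DataFrame here, so `films` keeps its original row order.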

#### **Reset Index**

After operations such as sorting or filtering, the index labels are often no longer sequential. `reset_index` assigns a fresh index running from 0 to n-1. By default the old index is kept as a new column; pass `drop=True` to discard it.
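
As a minimal sketch of both behaviours, the example below filters a small hypothetical frame (`scores`, not part of the dataset used later) so that the index has a gap, then resets it with and without keeping the old labels.

```python
import pandas as pd

# Hypothetical data; labels 0, 1, 2 are assigned automatically
scores = pd.DataFrame({
    "Film": ["Queen", "Stree", "Dangal"],
    "IMDB": [8.2, 7.5, 8.4],
})

# Filtering keeps the original labels, so the index is now 0, 2
top = scores[scores["IMDB"] > 8.0]

# Default: the old labels are preserved in a new column named "index"
kept = top.reset_index()

# drop=True: the old labels are discarded
fresh = top.reset_index(drop=True)

print(kept.columns.tolist())
print(fresh.index.tolist())
```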

#### **Sort by Index**

When rows are identified by index values that may be unsorted, sorting by index can restore a predictable row order. This is especially useful after row deletions or concatenation.
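
The concatenation case can be sketched as follows. The two frames and their index labels are hypothetical and chosen only so that the combined index comes out unsorted.

```python
import pandas as pd

# Two hypothetical pieces whose index labels interleave
part_a = pd.DataFrame({"Film": ["Raazi", "Queen"]}, index=[6, 8])
part_b = pd.DataFrame({"Film": ["Dangal", "Barfi!"]}, index=[1, 3])

# concat stacks the rows as given, so the index reads 6, 8, 1, 3
combined = pd.concat([part_a, part_b])

# sort_index reorders the rows by label; the labels themselves are unchanged
ordered = combined.sort_index()
reverse_ordered = combined.sort_index(ascending=False)

print(ordered.index.tolist())
```

Unlike `reset_index`, `sort_index` keeps every row attached to its original label.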

#### **Ranking**

Ranking assigns a rank to numeric values in a column. The default behavior gives tied values the average rank, resulting in decimal values. Other methods like 'dense' can assign the same rank to ties while keeping the sequence without gaps. Ranking is useful for scoring, leaderboards, and comparative analysis.
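
To compare the tie-handling methods side by side, here is a small sketch on a hypothetical Series with one tie. The `"min"` method is included as an extra point of comparison that the text above does not mention.

```python
import pandas as pd

# Hypothetical ratings with one tie (8.2 appears twice)
imdb = pd.Series([8.4, 8.2, 8.2, 8.1, 6.9], name="IMDB")

ranks = pd.DataFrame({
    "IMDB": imdb,
    # default method="average": the tied values share the mean of ranks 2 and 3
    "average": imdb.rank(ascending=False),
    # "min": the tied values both take the lowest rank, and rank 3 is skipped
    "min": imdb.rank(ascending=False, method="min"),
    # "dense": the tied values share a rank and the next rank follows with no gap
    "dense": imdb.rank(ascending=False, method="dense"),
})

print(ranks)
```

With `ascending=False` the highest rating receives rank 1, which is usually what a leaderboard needs.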

---

### 🏷 Renaming Columns & Index

You can rename specific columns or index values to make them more descriptive or standardized. Renaming improves the clarity of your data, especially when dealing with multiple sources or preparing data for visualization.

To rename all columns at once, assign a new list of names to `df.columns`. The list must contain exactly one name per column, in the existing column order.
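
Both approaches can be sketched on a small hypothetical frame. The names `df_demo`, `"Lead"`, `"lead_actor"`, and `"imdb_rating"` are illustrative only.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Actor": ["Alia Bhatt", "Rajkummar Rao"],
    "IMDB": [7.7, 7.5],
})

# Rename selected columns and index labels with mappings;
# anything not listed in a mapping keeps its current name
renamed = df_demo.rename(
    columns={"Actor": "Lead"},
    index={0: "first", 1: "second"},
)

# Rename every column at once; the list length must match the column count
all_new = df_demo.copy()
all_new.columns = ["lead_actor", "imdb_rating"]

print(renamed)
print(all_new.columns.tolist())
```

Without `inplace=True`, `rename` returns a new DataFrame and leaves `df_demo` untouched.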

---

### 📊 Changing Column Order

Reordering columns helps prioritize which data to focus on. You can rearrange the entire column layout or bring specific columns (like 'Name' or 'City') to the front for better readability.
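
One way to bring several columns to the front is a small helper like the one below. `move_to_front` is a hypothetical function written for this sketch, not a pandas API, and the sample frame is made up.

```python
import pandas as pd

def move_to_front(frame, front):
    """Return frame with the columns in `front` first and the rest in their current order."""
    rest = [col for col in frame.columns if col not in front]
    return frame[list(front) + rest]

# Hypothetical data
people = pd.DataFrame({
    "Age": [31, 45],
    "Name": ["Asha", "Ravi"],
    "Salary": [50000, 72000],
    "City": ["Pune", "Delhi"],
})

reordered = move_to_front(people, ["Name", "City"])
print(reordered.columns.tolist())
```

Selecting with a list of column names changes only the layout; the values and the index stay the same.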

---

### ✅ Summary

* Use **sorting** to organize your data by value or index.
* Use **ranking** to understand relative positions or performance.
* Use **renaming** to make column headers or index labels more meaningful.
* Use **reordering** to present the most important columns first.

These transformations are essential steps in **Exploratory Data Analysis (EDA)**.


In [1]:
import pandas as pd

# Create a clean sample dataset
data = {
    "Actor": [
        "Shah Rukh Khan", "Aamir Khan", "Salman Khan", "Ranbir Kapoor", "Ayushmann Khurrana",
        "Deepika Padukone", "Alia Bhatt", "Akshay Kumar", "Kangana Ranaut", "Rajkummar Rao"
    ],
    "Film": [
        "Chennai Express", "Dangal", "Bajrangi Bhaijaan", "Barfi!", "Andhadhun",
        "Padmaavat", "Raazi", "Toilet: Ek Prem Katha", "Queen", "Stree"
    ],
    "Year": [
        2013, 2016, 2015, 2012, 2018,
        2018, 2018, 2017, 2014, 2018
    ],
    "Genre": [
        "Action-Comedy", "Sports-Drama", "Drama", "Romantic-Comedy", "Thriller",
        "Historical-Drama", "Spy-Thriller", "Comedy-Drama", "Comedy-Drama", "Horror-Comedy"
    ],
    "BoxOffice (INR Crore)": [
        423, 2024, 969, 175, 456,
        585, 197, 311, 108, 180
    ],
    "IMDB": [
        6.9, 8.4, 8.1, 8.1, 8.2,
        7.0, 7.7, 7.2, 8.2, 7.5
    ]
}

# Create DataFrame
df = pd.DataFrame(data)



In [2]:
df

,Actor,Film,Year,Genre,BoxOffice (INR Crore),IMDB
0,Shah Rukh Khan,Chennai Express,2013,Action-Comedy,423,6.9
1,Aamir Khan,Dangal,2016,Sports-Drama,2024,8.4
2,Salman Khan,Bajrangi Bhaijaan,2015,Drama,969,8.1
3,Ranbir Kapoor,Barfi!,2012,Romantic-Comedy,175,8.1
4,Ayushmann Khurrana,Andhadhun,2018,Thriller,456,8.2
5,Deepika Padukone,Padmaavat,2018,Historical-Drama,585,7.0
6,Alia Bhatt,Raazi,2018,Spy-Thriller,197,7.7
7,Akshay Kumar,Toilet: Ek Prem Katha,2017,Comedy-Drama,311,7.2
8,Kangana Ranaut,Queen,2014,Comedy-Drama,108,8.2
9,Rajkummar Rao,Stree,2018,Horror-Comedy,180,7.5


In [3]:
df.sort_values("Actor")

,Actor,Film,Year,Genre,BoxOffice (INR Crore),IMDB
1,Aamir Khan,Dangal,2016,Sports-Drama,2024,8.4
7,Akshay Kumar,Toilet: Ek Prem Katha,2017,Comedy-Drama,311,7.2
6,Alia Bhatt,Raazi,2018,Spy-Thriller,197,7.7
4,Ayushmann Khurrana,Andhadhun,2018,Thriller,456,8.2
5,Deepika Padukone,Padmaavat,2018,Historical-Drama,585,7.0
8,Kangana Ranaut,Queen,2014,Comedy-Drama,108,8.2
9,Rajkummar Rao,Stree,2018,Horror-Comedy,180,7.5
3,Ranbir Kapoor,Barfi!,2012,Romantic-Comedy,175,8.1
2,Salman Khan,Bajrangi Bhaijaan,2015,Drama,969,8.1
0,Shah Rukh Khan,Chennai Express,2013,Action-Comedy,423,6.9


In [4]:
df.sort_values("Actor", ascending=False)

,Actor,Film,Year,Genre,BoxOffice (INR Crore),IMDB
0,Shah Rukh Khan,Chennai Express,2013,Action-Comedy,423,6.9
2,Salman Khan,Bajrangi Bhaijaan,2015,Drama,969,8.1
3,Ranbir Kapoor,Barfi!,2012,Romantic-Comedy,175,8.1
9,Rajkummar Rao,Stree,2018,Horror-Comedy,180,7.5
8,Kangana Ranaut,Queen,2014,Comedy-Drama,108,8.2
5,Deepika Padukone,Padmaavat,2018,Historical-Drama,585,7.0
4,Ayushmann Khurrana,Andhadhun,2018,Thriller,456,8.2
6,Alia Bhatt,Raazi,2018,Spy-Thriller,197,7.7
7,Akshay Kumar,Toilet: Ek Prem Katha,2017,Comedy-Drama,311,7.2
1,Aamir Khan,Dangal,2016,Sports-Drama,2024,8.4


In [5]:
df.sort_values(["Actor", "IMDB"])  # IMDB would break ties in Actor; every actor is unique here, so the order matches In [3]

,Actor,Film,Year,Genre,BoxOffice (INR Crore),IMDB
1,Aamir Khan,Dangal,2016,Sports-Drama,2024,8.4
7,Akshay Kumar,Toilet: Ek Prem Katha,2017,Comedy-Drama,311,7.2
6,Alia Bhatt,Raazi,2018,Spy-Thriller,197,7.7
4,Ayushmann Khurrana,Andhadhun,2018,Thriller,456,8.2
5,Deepika Padukone,Padmaavat,2018,Historical-Drama,585,7.0
8,Kangana Ranaut,Queen,2014,Comedy-Drama,108,8.2
9,Rajkummar Rao,Stree,2018,Horror-Comedy,180,7.5
3,Ranbir Kapoor,Barfi!,2012,Romantic-Comedy,175,8.1
2,Salman Khan,Bajrangi Bhaijaan,2015,Drama,969,8.1
0,Shah Rukh Khan,Chennai Express,2013,Action-Comedy,423,6.9


In [6]:
df2 = df.sort_values(["Actor", "IMDB"]).copy()  # keep the sorted result as df2; df itself is unchanged

In [9]:
df2.reset_index()

,index,Actor,Film,Year,Genre,BoxOffice (INR Crore),IMDB
0,1,Aamir Khan,Dangal,2016,Sports-Drama,2024,8.4
1,7,Akshay Kumar,Toilet: Ek Prem Katha,2017,Comedy-Drama,311,7.2
2,6,Alia Bhatt,Raazi,2018,Spy-Thriller,197,7.7
3,4,Ayushmann Khurrana,Andhadhun,2018,Thriller,456,8.2
4,5,Deepika Padukone,Padmaavat,2018,Historical-Drama,585,7.0
5,8,Kangana Ranaut,Queen,2014,Comedy-Drama,108,8.2
6,9,Rajkummar Rao,Stree,2018,Horror-Comedy,180,7.5
7,3,Ranbir Kapoor,Barfi!,2012,Romantic-Comedy,175,8.1
8,2,Salman Khan,Bajrangi Bhaijaan,2015,Drama,969,8.1
9,0,Shah Rukh Khan,Chennai Express,2013,Action-Comedy,423,6.9


In [10]:
df.reset_index(drop=True, inplace=True)  # original data: assign a fresh 0..n-1 index and discard the old one

In [11]:
df

,Actor,Film,Year,Genre,BoxOffice (INR Crore),IMDB
0,Shah Rukh Khan,Chennai Express,2013,Action-Comedy,423,6.9
1,Aamir Khan,Dangal,2016,Sports-Drama,2024,8.4
2,Salman Khan,Bajrangi Bhaijaan,2015,Drama,969,8.1
3,Ranbir Kapoor,Barfi!,2012,Romantic-Comedy,175,8.1
4,Ayushmann Khurrana,Andhadhun,2018,Thriller,456,8.2
5,Deepika Padukone,Padmaavat,2018,Historical-Drama,585,7.0
6,Alia Bhatt,Raazi,2018,Spy-Thriller,197,7.7
7,Akshay Kumar,Toilet: Ek Prem Katha,2017,Comedy-Drama,311,7.2
8,Kangana Ranaut,Queen,2014,Comedy-Drama,108,8.2
9,Rajkummar Rao,Stree,2018,Horror-Comedy,180,7.5


In [12]:
df2

,Actor,Film,Year,Genre,BoxOffice (INR Crore),IMDB
1,Aamir Khan,Dangal,2016,Sports-Drama,2024,8.4
7,Akshay Kumar,Toilet: Ek Prem Katha,2017,Comedy-Drama,311,7.2
6,Alia Bhatt,Raazi,2018,Spy-Thriller,197,7.7
4,Ayushmann Khurrana,Andhadhun,2018,Thriller,456,8.2
5,Deepika Padukone,Padmaavat,2018,Historical-Drama,585,7.0
8,Kangana Ranaut,Queen,2014,Comedy-Drama,108,8.2
9,Rajkummar Rao,Stree,2018,Horror-Comedy,180,7.5
3,Ranbir Kapoor,Barfi!,2012,Romantic-Comedy,175,8.1
2,Salman Khan,Bajrangi Bhaijaan,2015,Drama,969,8.1
0,Shah Rukh Khan,Chennai Express,2013,Action-Comedy,423,6.9


In [13]:
df2.sort_index()

,Actor,Film,Year,Genre,BoxOffice (INR Crore),IMDB
0,Shah Rukh Khan,Chennai Express,2013,Action-Comedy,423,6.9
1,Aamir Khan,Dangal,2016,Sports-Drama,2024,8.4
2,Salman Khan,Bajrangi Bhaijaan,2015,Drama,969,8.1
3,Ranbir Kapoor,Barfi!,2012,Romantic-Comedy,175,8.1
4,Ayushmann Khurrana,Andhadhun,2018,Thriller,456,8.2
5,Deepika Padukone,Padmaavat,2018,Historical-Drama,585,7.0
6,Alia Bhatt,Raazi,2018,Spy-Thriller,197,7.7
7,Akshay Kumar,Toilet: Ek Prem Katha,2017,Comedy-Drama,311,7.2
8,Kangana Ranaut,Queen,2014,Comedy-Drama,108,8.2
9,Rajkummar Rao,Stree,2018,Horror-Comedy,180,7.5


In [14]:
df2["Rank"] = df2["IMDB"]  # copies the IMDB scores unchanged; no ranking is applied yet

In [16]:
df2

,Actor,Film,Year,Genre,BoxOffice (INR Crore),IMDB,Rank
1,Aamir Khan,Dangal,2016,Sports-Drama,2024,8.4,8.4
7,Akshay Kumar,Toilet: Ek Prem Katha,2017,Comedy-Drama,311,7.2,7.2
6,Alia Bhatt,Raazi,2018,Spy-Thriller,197,7.7,7.7
4,Ayushmann Khurrana,Andhadhun,2018,Thriller,456,8.2,8.2
5,Deepika Padukone,Padmaavat,2018,Historical-Drama,585,7.0,7.0
8,Kangana Ranaut,Queen,2014,Comedy-Drama,108,8.2,8.2
9,Rajkummar Rao,Stree,2018,Horror-Comedy,180,7.5,7.5
3,Ranbir Kapoor,Barfi!,2012,Romantic-Comedy,175,8.1,8.1
2,Salman Khan,Bajrangi Bhaijaan,2015,Drama,969,8.1,8.1
0,Shah Rukh Khan,Chennai Express,2013,Action-Comedy,423,6.9,6.9


In [20]:
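# Default method="average": tied scores share the mean rank (2.5 and 4.5 in the output); method="dense" would give gap-free whole-number ranks instead
df2["rank"] = df2["IMDB"].rank(ascending=False)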

In [19]:
df2

,Actor,Film,Year,Genre,BoxOffice (INR Crore),IMDB,Rank,rank
1,Aamir Khan,Dangal,2016,Sports-Drama,2024,8.4,8.4,1.0
7,Akshay Kumar,Toilet: Ek Prem Katha,2017,Comedy-Drama,311,7.2,7.2,8.0
6,Alia Bhatt,Raazi,2018,Spy-Thriller,197,7.7,7.7,6.0
4,Ayushmann Khurrana,Andhadhun,2018,Thriller,456,8.2,8.2,2.5
5,Deepika Padukone,Padmaavat,2018,Historical-Drama,585,7.0,7.0,9.0
8,Kangana Ranaut,Queen,2014,Comedy-Drama,108,8.2,8.2,2.5
9,Rajkummar Rao,Stree,2018,Horror-Comedy,180,7.5,7.5,7.0
3,Ranbir Kapoor,Barfi!,2012,Romantic-Comedy,175,8.1,8.1,4.5
2,Salman Khan,Bajrangi Bhaijaan,2015,Drama,969,8.1,8.1,4.5
0,Shah Rukh Khan,Chennai Express,2013,Action-Comedy,423,6.9,6.9,10.0


In [24]:
df.rename(columns={"Actor": "Actors"}, inplace=True)  # rename one column; all other columns keep their names

In [25]:
df

,Actors,Film,Year,Genre,BoxOffice (INR Crore),IMDB
0,Shah Rukh Khan,Chennai Express,2013,Action-Comedy,423,6.9
1,Aamir Khan,Dangal,2016,Sports-Drama,2024,8.4
2,Salman Khan,Bajrangi Bhaijaan,2015,Drama,969,8.1
3,Ranbir Kapoor,Barfi!,2012,Romantic-Comedy,175,8.1
4,Ayushmann Khurrana,Andhadhun,2018,Thriller,456,8.2
5,Deepika Padukone,Padmaavat,2018,Historical-Drama,585,7.0
6,Alia Bhatt,Raazi,2018,Spy-Thriller,197,7.7
7,Akshay Kumar,Toilet: Ek Prem Katha,2017,Comedy-Drama,311,7.2
8,Kangana Ranaut,Queen,2014,Comedy-Drama,108,8.2
9,Rajkummar Rao,Stree,2018,Horror-Comedy,180,7.5


In [26]:
df = df[["Actors", "Film", "Year", "Genre", "IMDB", "BoxOffice (INR Crore)"]]  # reorder: IMDB now comes before BoxOffice

In [27]:
df

,Actors,Film,Year,Genre,IMDB,BoxOffice (INR Crore)
0,Shah Rukh Khan,Chennai Express,2013,Action-Comedy,6.9,423
1,Aamir Khan,Dangal,2016,Sports-Drama,8.4,2024
2,Salman Khan,Bajrangi Bhaijaan,2015,Drama,8.1,969
3,Ranbir Kapoor,Barfi!,2012,Romantic-Comedy,8.1,175
4,Ayushmann Khurrana,Andhadhun,2018,Thriller,8.2,456
5,Deepika Padukone,Padmaavat,2018,Historical-Drama,7.0,585
6,Alia Bhatt,Raazi,2018,Spy-Thriller,7.7,197
7,Akshay Kumar,Toilet: Ek Prem Katha,2017,Comedy-Drama,7.2,311
8,Kangana Ranaut,Queen,2014,Comedy-Drama,8.2,108
9,Rajkummar Rao,Stree,2018,Horror-Comedy,7.5,180


In [29]:
# Move "Year" to the front and keep the remaining columns in their current order
cols = ["Year"] + [col for col in df.columns if col != "Year"]
df = df[cols]

In [30]:
df

,Year,Actors,Film,Genre,IMDB,BoxOffice (INR Crore)
0,2013,Shah Rukh Khan,Chennai Express,Action-Comedy,6.9,423
1,2016,Aamir Khan,Dangal,Sports-Drama,8.4,2024
2,2015,Salman Khan,Bajrangi Bhaijaan,Drama,8.1,969
3,2012,Ranbir Kapoor,Barfi!,Romantic-Comedy,8.1,175
4,2018,Ayushmann Khurrana,Andhadhun,Thriller,8.2,456
5,2018,Deepika Padukone,Padmaavat,Historical-Drama,7.0,585
6,2018,Alia Bhatt,Raazi,Spy-Thriller,7.7,197
7,2017,Akshay Kumar,Toilet: Ek Prem Katha,Comedy-Drama,7.2,311
8,2014,Kangana Ranaut,Queen,Comedy-Drama,8.2,108
9,2018,Rajkummar Rao,Stree,Horror-Comedy,7.5,180
