*Once your data is clean, the next step is to reshape, reformat, and reorder it as needed for analysis. Pandas provides plenty of flexible tools to do this.*

**Ways to Transform Data**
1. Sorting and Ranking
2. Renaming and Reordering columns


In [45]:
import pandas as pd
df = pd.read_csv("../assets/data/practical-Imp.csv")
df

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb
0,Shah Rukh Khan,Pathaan,2023,Action,1050,7.2
1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0
2,Aamir Khan,Dangal,2016,Biography,2024,8.4
3,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6
4,Ranveer Singh,Padmaavat,2018,Historical,585,7.0
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3
6,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5
7,Hrithik Roshan,War,2019,Action,475,6.5
8,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0
9,Kartik Aaryan,Bhool Bhulaiyaa 2,2022,Horror Comedy,266,5.9


#### 1. Sorting & Ranking

>📍 Resetting the Index:  
> After sorting or filtering, the index keeps the original row labels, so it is no longer sequential. The `reset_index()` method replaces it with the default sequential index (0, 1, 2, ...).
>  * `drop=True` discards the old index instead of keeping it as a new 'index' column.
>  * `inplace=True` modifies the original DataFrame directly, so no reassignment is needed.
>    ```python
>    df.reset_index(drop=True, inplace=True)
>    ```
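
To make the effect of `drop=True` concrete, here is a minimal sketch. The three-row `sample` DataFrame and the variable names are made up for illustration and are not part of the tutorial's CSV.

```python
import pandas as pd

# Hypothetical three-row sample (illustration only, not the tutorial's CSV)
sample = pd.DataFrame({"Film": ["Pathaan", "Dangal", "Stree"], "IMDb": [7.2, 8.4, 7.5]})

# Sorting keeps the original row labels, so the index is now out of order
ranked = sample.sort_values("IMDb", ascending=False)

# Default behaviour: the old labels are preserved in a new 'index' column
with_old_index = ranked.reset_index()

# drop=True: the old labels are discarded
clean = ranked.reset_index(drop=True)

print(list(ranked.index))            # [1, 2, 0]
print(list(with_old_index.columns))  # ['index', 'Film', 'IMDb']
print(list(clean.index))             # [0, 1, 2]
```

Both calls here return new DataFrames; `ranked` itself only changes if `inplace=True` is passed.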


In [20]:
"""Sorting"""
# Sorting by index
df.sort_index()

# Ascending order
df.sort_values(['IMDb'])

# Descending order
df.sort_values(['IMDb'], ascending=False)

# Sorting by multiple columns
# sort_values() returns a new DataFrame and leaves df unchanged, so no .copy() is needed
df_copy = df.sort_values(['IMDb', 'Year'], ascending=[False, True])
# reset_index() returns a new DataFrame with the old index kept in an 'index' column
df_copy.reset_index()
# drop=True discards the old index instead of keeping it as an 'index' column
# inplace=True modifies df_copy directly instead of returning a new DataFrame
df_copy.reset_index(drop=True, inplace=True)

Unnamed: 0,index,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb
0,2,Aamir Khan,Dangal,2016,Biography,2024,8.4
1,5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3
2,11,Vicky Kaushal,Uri: The Surgical Strike,2019,Action,342,8.2
3,6,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5
4,0,Shah Rukh Khan,Pathaan,2023,Action,1050,7.2
5,4,Ranveer Singh,Padmaavat,2018,Historical,585,7.0
6,8,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0
7,7,Hrithik Roshan,War,2019,Action,475,6.5
8,10,Varun Dhawan,Badrinath Ki Dulhania,2017,Romantic Comedy,201,6.1
9,1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0
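
As a small self-contained check of how multi-column sorting behaves, the sketch below uses four rows taken from the table above. The name `movies` is only for illustration, and `ignore_index=True` (available in pandas 1.0 and later) is used here as a shortcut for sorting followed by `reset_index(drop=True)`.

```python
import pandas as pd

# Four rows copied from the dataset shown above
movies = pd.DataFrame({
    "Film": ["Padmaavat", "Good Newwz", "Dangal", "War"],
    "Year": [2018, 2019, 2016, 2019],
    "IMDb": [7.0, 7.0, 8.4, 6.5],
})

# Highest rating first; ties on IMDb are broken by the earlier Year.
# ignore_index=True relabels the result 0..n-1 in one step.
by_rating = movies.sort_values(["IMDb", "Year"], ascending=[False, True], ignore_index=True)

print(list(by_rating["Film"]))  # ['Dangal', 'Padmaavat', 'Good Newwz', 'War']
print(list(by_rating.index))    # [0, 1, 2, 3]

# The source DataFrame keeps its original row order
print(list(movies["Film"]))     # ['Padmaavat', 'Good Newwz', 'Dangal', 'War']
```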


In [26]:
"""
    Ranking
    The `.rank()` method assigns a rank to each entry in a column, which is useful for understanding the 
    relative position of each data point.
"""
# Add a new column with the rank of each entry in the 'IMDb' column
df['Rank'] = df['IMDb'].rank(ascending=False)
df
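# The default method ('average') gives tied entries the mean of their positions, which can produce fractional ranks such as 6.5.
# method='dense' gives ties the same rank and leaves no gaps; the ranks are still floats (add .astype(int) for integers).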
df['Rank'] = df['IMDb'].rank(ascending=False, method='dense')
df

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb,Rank
0,Shah Rukh Khan,Pathaan,2023,Action,1050,7.2,5.0
1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0,9.0
2,Aamir Khan,Dangal,2016,Biography,2024,8.4,1.0
3,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6,11.0
4,Ranveer Singh,Padmaavat,2018,Historical,585,7.0,6.0
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3,2.0
6,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5,4.0
7,Hrithik Roshan,War,2019,Action,475,6.5,7.0
8,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0,6.0
9,Kartik Aaryan,Bhool Bhulaiyaa 2,2022,Horror Comedy,266,5.9,10.0
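
The difference between the tie-handling methods is easiest to see side by side. The sketch below is a small illustration using four ratings from the table above; the `ratings` and `comparison` names are made up for this example.

```python
import pandas as pd

# Four ratings from the dataset above; Padmaavat and Good Newwz are tied at 7.0
ratings = pd.Series([8.4, 7.0, 7.0, 6.5], index=["Dangal", "Padmaavat", "Good Newwz", "War"])

comparison = pd.DataFrame({
    "average": ratings.rank(ascending=False),                  # ties share the mean position
    "min": ratings.rank(ascending=False, method="min"),        # ties share the lowest position, gap after
    "dense": ratings.rank(ascending=False, method="dense"),    # ties share a rank, no gap after
    "first": ratings.rank(ascending=False, method="first"),    # ties broken by order of appearance
})
print(comparison)

# rank() returns floats; cast when integer ranks are wanted (assumes no missing values)
dense_int = ratings.rank(ascending=False, method="dense").astype(int)
print(list(dense_int))  # [1, 2, 2, 3]
```

With these values, the tied films get 2.5 under `average`, 2 under `min` and `dense`, and 2 and 3 under `first`; only `dense` ranks War as 3 rather than 4.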



#### 2. Renaming & Reordering Columns

In [53]:
"""Renaming"""
# Renaming specific columns (keys are case-sensitive and must match the existing names)
df.rename(columns={"Film": "rajat Ki Films", "IMDb": "IMDb RatingX"}, inplace=True)

# Renaming via Function
# This will convert all column names to lowercase, strip whitespace, and replace spaces with underscores
df.rename(columns=lambda x: x.strip().lower().replace(" ", "_"), inplace=True)
df

# Alternative: assign a new list of names to df.columns using a list comprehension
# Here the underscores are replaced back with spaces
df.columns = [col.strip().lower().replace("_", " ") for col in df.columns]
df

Unnamed: 0,actor,rajat ki films,year,genre,boxoffice(inr crore),imdb ratingx
0,Shah Rukh Khan,Pathaan,2023,Action,1050,7.2
1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0
2,Aamir Khan,Dangal,2016,Biography,2024,8.4
3,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6
4,Ranveer Singh,Padmaavat,2018,Historical,585,7.0
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3
6,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5
7,Hrithik Roshan,War,2019,Action,475,6.5
8,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0
9,Kartik Aaryan,Bhool Bhulaiyaa 2,2022,Horror Comedy,266,5.9
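
One thing worth knowing about `rename()` is that keys which do not match an existing column are ignored by default. The sketch below illustrates this on a hypothetical one-row DataFrame; the name `frame` and the target name "Title" are made up for the example.

```python
import pandas as pd

# Hypothetical one-row frame (illustration only)
frame = pd.DataFrame({"Film": ["Pathaan"], "IMDb": [7.2]})

# "film" does not match "Film", so nothing is renamed and no error is raised
unchanged = frame.rename(columns={"film": "Title"})
print(list(unchanged.columns))  # ['Film', 'IMDb']

# errors="raise" turns the silent miss into a KeyError
try:
    frame.rename(columns={"film": "Title"}, errors="raise")
    raised = False
except KeyError:
    raised = True
print(raised)  # True

# With the exact column name, the rename takes effect
renamed = frame.rename(columns={"Film": "Title"})
print(list(renamed.columns))  # ['Title', 'IMDb']
```

Passing `errors="raise"` can be a useful guard when the column names come from a file whose capitalization is not known in advance.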


In [69]:
"""Reordering"""
# Reload the original DataFrame, because the renaming cell above changed the column names
df = pd.read_csv("../assets/data/practical-Imp.csv")

# Select the columns in the desired order, using the names from the loaded DataFrame
df = df[["Film", "IMDb", "Year", "Genre"]]
df

# Move Genre to the first position using a plain Python list comprehension 🥹
cols = ["Genre"] + [col for col in df.columns if col != "Genre"]
df = df[cols]
df

Unnamed: 0,Genre,Film,IMDb,Year
0,Action,Pathaan,7.2,2023
1,Action,Tiger Zinda Hai,6.0,2017
2,Biography,Dangal,8.4,2016
3,Fantasy,Brahmastra,5.6,2022
4,Historical,Padmaavat,7.0,2018
5,Thriller,Andhadhun,8.3,2018
6,Horror Comedy,Stree,7.5,2018
7,Action,War,6.5,2019
8,Comedy,Good Newwz,7.0,2019
9,Horror Comedy,Bhool Bhulaiyaa 2,5.9,2022
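
As an alternative to rebuilding the column list, a column can also be moved with `pop()` and `insert()`. This is a minimal sketch on a two-row sample built from the values above; the name `frame` is made up, and unlike the list-based approach this variant modifies the DataFrame in place.

```python
import pandas as pd

# Two rows from the dataset above, with Genre as the last column
frame = pd.DataFrame({
    "Film": ["Pathaan", "Dangal"],
    "IMDb": [7.2, 8.4],
    "Year": [2023, 2016],
    "Genre": ["Action", "Biography"],
})

# pop() removes the column and returns it as a Series;
# insert() then places that Series at position 0. Both act in place.
frame.insert(0, "Genre", frame.pop("Genre"))

print(list(frame.columns))  # ['Genre', 'Film', 'IMDb', 'Year']
```

The list-based approach is easier to read when several columns move at once, while `pop()` plus `insert()` is handy for moving a single column to a known position.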
