# Data Transformation in Pandas

Once your data is **clean**, the next step is to **reshape, reformat, and reorder** it for analysis.  
Pandas provides flexible tools for sorting, ranking, renaming, and rearranging data.

---

## 🔽 Sorting & Ranking

### Sort by Values

```python
df.sort_values("Age")                     # Ascending sort
df.sort_values("Age", ascending=False)    # Descending sort
df.sort_values(["Age", "Salary"])         # Sort by multiple columns
```

**Explanation:**  
`df.sort_values(["Age", "Salary"])` sorts first by **Age**, and if there are ties (same Age), it sorts by **Salary**.
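
Each sort column can also have its own direction by passing a list to `ascending`. The sketch below uses a small made-up DataFrame (the names and values are invented for illustration) to sort by **Age** ascending and break ties by **Salary** descending:

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration
people = pd.DataFrame({
    "Name": ["Asha", "Ben", "Chen", "Dina"],
    "Age": [30, 25, 30, 25],
    "Salary": [52000, 48000, 61000, 45000],
})

# Age ascending; within the same Age, Salary descending
sorted_people = people.sort_values(["Age", "Salary"], ascending=[True, False])
print(sorted_people)
```

`sort_values` returns a new DataFrame, so `people` keeps its original row order unless the result is assigned back.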

---

### Reset Index

If you want the index to start from 0 and be sequential, use:

```python
df.reset_index(drop=True, inplace=True)
```
✅ The `drop=True` option discards the old index instead of inserting it into the DataFrame as a new column.
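
To see the difference `drop` makes, here is a minimal sketch on made-up data (the `scores` frame and its index labels are assumptions for illustration). It sorts first, then resets the index both ways:

```python
import pandas as pd

# Hypothetical data with a non-default index
scores = pd.DataFrame({"Score": [70, 95, 82]}, index=[10, 11, 12])

# Sorting keeps the original labels attached to their rows: 11, 12, 10
by_score = scores.sort_values("Score", ascending=False)

# Default (drop=False): the old index is kept as a column named "index"
kept = by_score.reset_index()

# drop=True: the old index is discarded entirely
dropped = by_score.reset_index(drop=True)

print(kept)
print(dropped)
```

Both calls return new DataFrames here; `inplace=True` would modify `by_score` directly instead.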

---

### Sort by Index

Sort the DataFrame based on its **index values**:

```python
df.sort_index()
```

Useful when the index is unordered after dropping rows or performing operations that shuffle data.
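
As a small sketch of that situation (the `df_demo` frame is invented for illustration), sorting by a column leaves the index out of order, and `sort_index()` restores the original row order:

```python
import pandas as pd

# Hypothetical sample data
df_demo = pd.DataFrame({"Item": ["a", "b", "c", "d"], "Qty": [3, 1, 4, 2]})

# Sorting by Qty reorders the rows, so the index becomes 1, 3, 0, 2
shuffled = df_demo.sort_values("Qty")

# sort_index() puts the rows back in index order: 0, 1, 2, 3
restored = shuffled.sort_index()

# ascending=False sorts the index from largest to smallest
restored_desc = shuffled.sort_index(ascending=False)

print(shuffled.index.tolist())
print(restored.index.tolist())
```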

---

## 🏅 Ranking Data

The `.rank()` method assigns **ranks** to the values in a column. By default the smallest value gets rank 1; pass `ascending=False` to rank the largest value first.

```python
df["Rank"] = df["Score"].rank()                 # Default (average method)
df["Rank"] = df["Score"].rank(method="dense")   # Dense method: 1, 2, 2, 3
```

### 📘 About Ranking Methods

| Method | Description |
|---------|--------------|
| `average` | Default. Tied values get the average of their ranks. |
| `min` | Tied values get the minimum rank. |
| `max` | Tied values get the maximum rank. |
| `first` | Tied values are ranked in the order they appear in the data. |
| `dense` | Ties get the same rank, and the next rank is consecutive (no gaps). |

Example (using `method="dense"` with `ascending=False`, so the highest score gets rank 1):

| Score | Rank (dense) |
|--------|--------------|
| 90 | 1 |
| 85 | 2 |
| 85 | 2 |
| 80 | 3 |
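
To compare the methods side by side, the following sketch ranks the same four scores with each method. It uses `ascending=False` so the highest score ranks first; the `comparison` frame is just a convenience for display:

```python
import pandas as pd

scores = pd.Series([90, 85, 85, 80], name="Score")
methods = ["average", "min", "max", "first", "dense"]

# One column per ranking method, all ranking the highest score first
comparison = pd.DataFrame({"Score": scores})
for method in methods:
    comparison[method] = scores.rank(method=method, ascending=False)

print(comparison)
```

The two tied scores of 85 occupy positions 2 and 3, so they receive 2.5 under `average`, 2 under `min`, 3 under `max`, 2 and 3 under `first`, and 2 under `dense`. Only `dense` gives the final score rank 3 instead of 4.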

---

## ✏️ Renaming Columns & Index

Rename columns or index labels easily using `.rename()`:

```python
df.rename(columns={"oldName": "newName"}, inplace=True)
df.rename(index={0: "row1", 1: "row2"}, inplace=True)
```

### Rename All Columns at Once

```python
df.columns = ["Name", "Age", "City"]
```
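
`.rename()` also accepts a function, which is handy for normalizing every column name in one pass. The sketch below uses a made-up `raw` frame (its column names are assumptions for illustration) and also shows that assigning to `df.columns` fails when the list length does not match the number of columns:

```python
import pandas as pd

# Hypothetical data with inconsistent column names
raw = pd.DataFrame({"Full Name": ["Ana"], "AGE": [22], "Home City": ["Pune"]})

# A mapping renames only the listed columns
by_mapping = raw.rename(columns={"AGE": "Age"})

# A function is applied to every column name
normalized = raw.rename(columns=lambda c: c.strip().lower().replace(" ", "_"))
print(normalized.columns.tolist())

# Assigning a list of the wrong length raises ValueError
length_error = None
try:
    raw.columns = ["Name", "Age"]  # raw has 3 columns, not 2
except ValueError as err:
    length_error = str(err)
print(length_error)
```

The mapping form is usually safer than assigning to `df.columns`, because it does not depend on column position.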

---

## 🔄 Changing Column Order

You can reorder columns by passing a new list of column names:

```python
df = df[["City", "Name", "Age"]]
```

### Move a Specific Column to the Front

```python
cols = ["Name"] + [col for col in df.columns if col != "Name"]
df = df[cols]
```

This ensures **"Name"** appears first, while the order of the remaining columns stays the same.
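
If this pattern comes up often, it can be wrapped in a small helper. `move_to_front` below is a hypothetical function written for this example, not part of pandas, and the sample data is invented:

```python
import pandas as pd

def move_to_front(frame: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return a new DataFrame with `column` first and the rest in their original order."""
    if column not in frame.columns:
        raise KeyError(f"{column!r} is not a column of the DataFrame")
    rest = [c for c in frame.columns if c != column]
    return frame[[column] + rest]

# Hypothetical sample data
people = pd.DataFrame({"Age": [25, 31], "City": ["Delhi", "Pune"], "Name": ["John", "Mona"]})

reordered = move_to_front(people, "Name")
print(reordered.columns.tolist())
```

The explicit `KeyError` check gives a clearer message than the error pandas would raise from the column selection itself.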

---

## ✅ Summary

| Task | Method | Description |
|------|---------|-------------|
| Sort by column | `df.sort_values()` | Sorts rows by one or more columns |
| Sort by index | `df.sort_index()` | Sorts based on DataFrame index |
| Reset index | `df.reset_index()` | Replaces the index with a default 0, 1, 2, ... index (row order is unchanged) |
| Rank data | `df["col"].rank()` | Assigns ranks to numeric values |
| Rename columns | `df.rename(columns={})` | Changes column names |
| Reorder columns | `df = df[new_order]` | Rearranges columns manually |

---

### 💡 Key Takeaway

- Use **sorting** to organize your data logically.  
- Use **ranking** to assign positions or scores.  
- Use **renaming and reordering** to keep data structured and readable.  
- These transformations are essential for **EDA (Exploratory Data Analysis)** and **visualization preparation**.


In [1]:
import pandas as pd
import numpy as np

In [2]:
df = pd.read_csv('p2.csv')
df

Unnamed: 0,Name,Age,City,Gender,Email,Join_Date,Purchase_Amount
0,John,25.0,Delhi,M,john@example.com,2023-01-15,5000.0
1,Ana,,mumbai,F,ana123@gmail.com,15/02/2023,3500.0
2,Rahul,19.0,Delhi,M,rahul@xyz,2023/03/01,
3,Sara,17.0,del,F,sara@example.com,2023-03-10,2000.0
4,John,25.0,Delhi,M,john@example.com,2023-01-15,5000.0
5,Mona,22.0,Mum,f,mona@mail.com,2023/3/20,4500.0


In [3]:
df2 = df.fillna(0)

In [4]:
df2

Unnamed: 0,Name,Age,City,Gender,Email,Join_Date,Purchase_Amount
0,John,25.0,Delhi,M,john@example.com,2023-01-15,5000.0
1,Ana,0.0,mumbai,F,ana123@gmail.com,15/02/2023,3500.0
2,Rahul,19.0,Delhi,M,rahul@xyz,2023/03/01,0.0
3,Sara,17.0,del,F,sara@example.com,2023-03-10,2000.0
4,John,25.0,Delhi,M,john@example.com,2023-01-15,5000.0
5,Mona,22.0,Mum,f,mona@mail.com,2023/3/20,4500.0


In [5]:
df2.sort_values('Age')                            # Sort by Age in ascending order (the default)

Unnamed: 0,Name,Age,City,Gender,Email,Join_Date,Purchase_Amount
1,Ana,0.0,mumbai,F,ana123@gmail.com,15/02/2023,3500.0
3,Sara,17.0,del,F,sara@example.com,2023-03-10,2000.0
2,Rahul,19.0,Delhi,M,rahul@xyz,2023/03/01,0.0
5,Mona,22.0,Mum,f,mona@mail.com,2023/3/20,4500.0
0,John,25.0,Delhi,M,john@example.com,2023-01-15,5000.0
4,John,25.0,Delhi,M,john@example.com,2023-01-15,5000.0


In [6]:
df2.sort_values('Age', ascending=False)                 # Sort by Age in descending order

Unnamed: 0,Name,Age,City,Gender,Email,Join_Date,Purchase_Amount
0,John,25.0,Delhi,M,john@example.com,2023-01-15,5000.0
4,John,25.0,Delhi,M,john@example.com,2023-01-15,5000.0
5,Mona,22.0,Mum,f,mona@mail.com,2023/3/20,4500.0
2,Rahul,19.0,Delhi,M,rahul@xyz,2023/03/01,0.0
3,Sara,17.0,del,F,sara@example.com,2023-03-10,2000.0
1,Ana,0.0,mumbai,F,ana123@gmail.com,15/02/2023,3500.0


In [7]:
df2.sort_values(['Age', 'Purchase_Amount'])                # Sort by Age; ties are broken by Purchase_Amount

Unnamed: 0,Name,Age,City,Gender,Email,Join_Date,Purchase_Amount
1,Ana,0.0,mumbai,F,ana123@gmail.com,15/02/2023,3500.0
3,Sara,17.0,del,F,sara@example.com,2023-03-10,2000.0
2,Rahul,19.0,Delhi,M,rahul@xyz,2023/03/01,0.0
5,Mona,22.0,Mum,f,mona@mail.com,2023/3/20,4500.0
0,John,25.0,Delhi,M,john@example.com,2023-01-15,5000.0
4,John,25.0,Delhi,M,john@example.com,2023-01-15,5000.0


In [8]:
df2.reset_index(drop=True)                           # Returns a new DataFrame with a fresh 0..n-1 index; df2 is unchanged

Unnamed: 0,Name,Age,City,Gender,Email,Join_Date,Purchase_Amount
0,John,25.0,Delhi,M,john@example.com,2023-01-15,5000.0
1,Ana,0.0,mumbai,F,ana123@gmail.com,15/02/2023,3500.0
2,Rahul,19.0,Delhi,M,rahul@xyz,2023/03/01,0.0
3,Sara,17.0,del,F,sara@example.com,2023-03-10,2000.0
4,John,25.0,Delhi,M,john@example.com,2023-01-15,5000.0
5,Mona,22.0,Mum,f,mona@mail.com,2023/3/20,4500.0


In [9]:
df2.sort_index()                           # Returns a new DataFrame sorted by index; df2 itself is not modified

Unnamed: 0,Name,Age,City,Gender,Email,Join_Date,Purchase_Amount
0,John,25.0,Delhi,M,john@example.com,2023-01-15,5000.0
1,Ana,0.0,mumbai,F,ana123@gmail.com,15/02/2023,3500.0
2,Rahul,19.0,Delhi,M,rahul@xyz,2023/03/01,0.0
3,Sara,17.0,del,F,sara@example.com,2023-03-10,2000.0
4,John,25.0,Delhi,M,john@example.com,2023-01-15,5000.0
5,Mona,22.0,Mum,f,mona@mail.com,2023/3/20,4500.0


In [10]:
# Pass inplace=True to modify the original DataFrame instead of returning a new one
# df2.reset_index(drop=True, inplace=True)        # Modifies df2 directly and returns None

In [11]:
# Add a new column with the dense rank of Age (ascending: the smallest Age gets rank 1)
df2['Rank'] = df2['Age'].rank(method='dense')
df2

Unnamed: 0,Name,Age,City,Gender,Email,Join_Date,Purchase_Amount,Rank
0,John,25.0,Delhi,M,john@example.com,2023-01-15,5000.0,5.0
1,Ana,0.0,mumbai,F,ana123@gmail.com,15/02/2023,3500.0,1.0
2,Rahul,19.0,Delhi,M,rahul@xyz,2023/03/01,0.0,3.0
3,Sara,17.0,del,F,sara@example.com,2023-03-10,2000.0,2.0
4,John,25.0,Delhi,M,john@example.com,2023-01-15,5000.0,5.0
5,Mona,22.0,Mum,f,mona@mail.com,2023/3/20,4500.0,4.0


In [12]:
df2['Rank'] = df2['Age'].rank(ascending=False, method='dense')
df2

Unnamed: 0,Name,Age,City,Gender,Email,Join_Date,Purchase_Amount,Rank
0,John,25.0,Delhi,M,john@example.com,2023-01-15,5000.0,1.0
1,Ana,0.0,mumbai,F,ana123@gmail.com,15/02/2023,3500.0,5.0
2,Rahul,19.0,Delhi,M,rahul@xyz,2023/03/01,0.0,3.0
3,Sara,17.0,del,F,sara@example.com,2023-03-10,2000.0,4.0
4,John,25.0,Delhi,M,john@example.com,2023-01-15,5000.0,1.0
5,Mona,22.0,Mum,f,mona@mail.com,2023/3/20,4500.0,2.0


### Renaming Columns & Index

In [13]:
df2.rename(columns={"Purchase_Amount": "Amount"}, inplace=True)
df2

Unnamed: 0,Name,Age,City,Gender,Email,Join_Date,Amount,Rank
0,John,25.0,Delhi,M,john@example.com,2023-01-15,5000.0,1.0
1,Ana,0.0,mumbai,F,ana123@gmail.com,15/02/2023,3500.0,5.0
2,Rahul,19.0,Delhi,M,rahul@xyz,2023/03/01,0.0,3.0
3,Sara,17.0,del,F,sara@example.com,2023-03-10,2000.0,4.0
4,John,25.0,Delhi,M,john@example.com,2023-01-15,5000.0,1.0
5,Mona,22.0,Mum,f,mona@mail.com,2023/3/20,4500.0,2.0


### Changing Column Order

In [14]:
df2 = df2[['Name', 'Rank', 'City', 'Age', 'Gender', 'Email', 'Join_Date', 'Amount']]
df2

Unnamed: 0,Name,Rank,City,Age,Gender,Email,Join_Date,Amount
0,John,1.0,Delhi,25.0,M,john@example.com,2023-01-15,5000.0
1,Ana,5.0,mumbai,0.0,F,ana123@gmail.com,15/02/2023,3500.0
2,Rahul,3.0,Delhi,19.0,M,rahul@xyz,2023/03/01,0.0
3,Sara,4.0,del,17.0,F,sara@example.com,2023-03-10,2000.0
4,John,1.0,Delhi,25.0,M,john@example.com,2023-01-15,5000.0
5,Mona,2.0,Mum,22.0,f,mona@mail.com,2023/3/20,4500.0


### Practice 

In [15]:
df2.sort_values('Age')

Unnamed: 0,Name,Rank,City,Age,Gender,Email,Join_Date,Amount
1,Ana,5.0,mumbai,0.0,F,ana123@gmail.com,15/02/2023,3500.0
3,Sara,4.0,del,17.0,F,sara@example.com,2023-03-10,2000.0
2,Rahul,3.0,Delhi,19.0,M,rahul@xyz,2023/03/01,0.0
5,Mona,2.0,Mum,22.0,f,mona@mail.com,2023/3/20,4500.0
0,John,1.0,Delhi,25.0,M,john@example.com,2023-01-15,5000.0
4,John,1.0,Delhi,25.0,M,john@example.com,2023-01-15,5000.0


In [16]:
df2.reset_index(drop=True)

Unnamed: 0,Name,Rank,City,Age,Gender,Email,Join_Date,Amount
0,John,1.0,Delhi,25.0,M,john@example.com,2023-01-15,5000.0
1,Ana,5.0,mumbai,0.0,F,ana123@gmail.com,15/02/2023,3500.0
2,Rahul,3.0,Delhi,19.0,M,rahul@xyz,2023/03/01,0.0
3,Sara,4.0,del,17.0,F,sara@example.com,2023-03-10,2000.0
4,John,1.0,Delhi,25.0,M,john@example.com,2023-01-15,5000.0
5,Mona,2.0,Mum,22.0,f,mona@mail.com,2023/3/20,4500.0


In [17]:
df2.sort_values(['Name', 'Age'], ascending=True)

Unnamed: 0,Name,Rank,City,Age,Gender,Email,Join_Date,Amount
1,Ana,5.0,mumbai,0.0,F,ana123@gmail.com,15/02/2023,3500.0
4,John,1.0,Delhi,25.0,M,john@example.com,2023-01-15,5000.0
0,John,1.0,Delhi,25.0,M,john@example.com,2023-01-15,5000.0
5,Mona,2.0,Mum,22.0,f,mona@mail.com,2023/3/20,4500.0
2,Rahul,3.0,Delhi,19.0,M,rahul@xyz,2023/03/01,0.0
3,Sara,4.0,del,17.0,F,sara@example.com,2023-03-10,2000.0


In [18]:
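# df2 was created by column selection in In [14], which is why pandas emits the warning below;
# adding .copy() to that selection would avoid it.
df2['Rank1'] = df2['Amount'].rank(ascending = False,method='dense')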
df2

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df2['Rank1'] = df2['Amount'].rank(ascending = False,method='dense')


Unnamed: 0,Name,Rank,City,Age,Gender,Email,Join_Date,Amount,Rank1
0,John,1.0,Delhi,25.0,M,john@example.com,2023-01-15,5000.0,1.0
1,Ana,5.0,mumbai,0.0,F,ana123@gmail.com,15/02/2023,3500.0,3.0
2,Rahul,3.0,Delhi,19.0,M,rahul@xyz,2023/03/01,0.0,5.0
3,Sara,4.0,del,17.0,F,sara@example.com,2023-03-10,2000.0,4.0
4,John,1.0,Delhi,25.0,M,john@example.com,2023-01-15,5000.0,1.0
5,Mona,2.0,Mum,22.0,f,mona@mail.com,2023/3/20,4500.0,2.0
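
The warning printed above is pandas' `SettingWithCopyWarning`: `df2` came from a column selection, so pandas cannot tell whether the assignment was meant to affect the original frame. A common way to avoid it, sketched here on hypothetical data (the `orders` frame is invented for illustration), is to take an explicit `.copy()` of the selection before adding columns:

```python
import pandas as pd

# Hypothetical sample data
orders = pd.DataFrame({
    "Name": ["John", "Ana", "Rahul", "Sara"],
    "City": ["Delhi", "Mumbai", "Delhi", "Delhi"],
    "Amount": [5000.0, 3500.0, 0.0, 2000.0],
})

# .copy() makes the selection an independent DataFrame
subset = orders[["Name", "Amount"]].copy()

# Adding a column to the copy is unambiguous and leaves `orders` untouched
subset["AmountRank"] = subset["Amount"].rank(ascending=False, method="dense")
print(subset)
```

Whether the warning appears at all depends on the pandas version and its copy-on-write settings, so treat this as a defensive habit and not as a required step.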


In [19]:
df2.rename(columns={'Rank1' : 'ARank'})        # Returns a renamed copy; without inplace=True, df2 keeps 'Rank1'

Unnamed: 0,Name,Rank,City,Age,Gender,Email,Join_Date,Amount,ARank
0,John,1.0,Delhi,25.0,M,john@example.com,2023-01-15,5000.0,1.0
1,Ana,5.0,mumbai,0.0,F,ana123@gmail.com,15/02/2023,3500.0,3.0
2,Rahul,3.0,Delhi,19.0,M,rahul@xyz,2023/03/01,0.0,5.0
3,Sara,4.0,del,17.0,F,sara@example.com,2023-03-10,2000.0,4.0
4,John,1.0,Delhi,25.0,M,john@example.com,2023-01-15,5000.0,1.0
5,Mona,2.0,Mum,22.0,f,mona@mail.com,2023/3/20,4500.0,2.0


In [20]:
df2.columns = [
    "Full_Name", "Rank_Position", "City_Name", "Age_Years",
    "Gender_Type", "Email_Address", "Joining_Date",
    "Total_Amount", "Actual_Rank"
]                                                                # Rename all columns at once; the list length must match the number of columns

In [21]:
df2

Unnamed: 0,Full_Name,Rank_Position,City_Name,Age_Years,Gender_Type,Email_Address,Joining_Date,Total_Amount,Actual_Rank
0,John,1.0,Delhi,25.0,M,john@example.com,2023-01-15,5000.0,1.0
1,Ana,5.0,mumbai,0.0,F,ana123@gmail.com,15/02/2023,3500.0,3.0
2,Rahul,3.0,Delhi,19.0,M,rahul@xyz,2023/03/01,0.0,5.0
3,Sara,4.0,del,17.0,F,sara@example.com,2023-03-10,2000.0,4.0
4,John,1.0,Delhi,25.0,M,john@example.com,2023-01-15,5000.0,1.0
5,Mona,2.0,Mum,22.0,f,mona@mail.com,2023/3/20,4500.0,2.0


#### Practice Questions

In [22]:
import pandas as pd

data = {
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [23, 35, 29, 35, 30],
    "City": ["Delhi", "Mumbai", "Chennai", "Delhi", "Mumbai"],
    "Salary": [50000, 70000, 62000, 70000, 58000],
    "Score": [85, 92, 78, 92, 88]
}

df0 = pd.DataFrame(data)

In [23]:
df0

Unnamed: 0,Name,Age,City,Salary,Score
0,Alice,23,Delhi,50000,85
1,Bob,35,Mumbai,70000,92
2,Charlie,29,Chennai,62000,78
3,David,35,Delhi,70000,92
4,Eva,30,Mumbai,58000,88


In [25]:
# Q1. Sort the DataFrame in descending order of Salary.

df0.sort_values('Salary', ascending=False)

Unnamed: 0,Name,Age,City,Salary,Score
1,Bob,35,Mumbai,70000,92
3,David,35,Delhi,70000,92
2,Charlie,29,Chennai,62000,78
4,Eva,30,Mumbai,58000,88
0,Alice,23,Delhi,50000,85


In [26]:
# Q2. Sort the DataFrame by Age, and if Age is same, sort by Name.

df0.sort_values(['Age', 'Name'])

Unnamed: 0,Name,Age,City,Salary,Score
0,Alice,23,Delhi,50000,85
2,Charlie,29,Chennai,62000,78
4,Eva,30,Mumbai,58000,88
1,Bob,35,Mumbai,70000,92
3,David,35,Delhi,70000,92


In [32]:
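# Q3. Reset the index after sorting, without keeping the old index.
# Note: the sorts in Q1 and Q2 were not assigned back, so df0 is still in its original order here.
# To reset after a sort, chain the calls: df0.sort_values('Age').reset_index(drop=True)

df0.reset_index(drop=True)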

Unnamed: 0,Name,Age,City,Salary,Score
0,Alice,23,Delhi,50000,85
1,Bob,35,Mumbai,70000,92
2,Charlie,29,Chennai,62000,78
3,David,35,Delhi,70000,92
4,Eva,30,Mumbai,58000,88


In [34]:
# Q4. Rank the Score column using the dense method and store it in a new column Rank

df0['Rank'] = df0['Score'].rank(method='dense')
df0

Unnamed: 0,Name,Age,City,Salary,Score,Rank
0,Alice,23,Delhi,50000,85,2.0
1,Bob,35,Mumbai,70000,92,4.0
2,Charlie,29,Chennai,62000,78,1.0
3,David,35,Delhi,70000,92,4.0
4,Eva,30,Mumbai,58000,88,3.0


In [36]:
# Q5. Rename column Score to Performance.

df0.rename(columns={'Score': 'Performance'})   # Returns a renamed copy; df0 itself is unchanged

Unnamed: 0,Name,Age,City,Salary,Performance,Rank
0,Alice,23,Delhi,50000,85,2.0
1,Bob,35,Mumbai,70000,92,4.0
2,Charlie,29,Chennai,62000,78,1.0
3,David,35,Delhi,70000,92,4.0
4,Eva,30,Mumbai,58000,88,3.0


In [38]:
# Q7. Sort the DataFrame by index (after all changes).

df0.sort_index()

Unnamed: 0,Name,Age,City,Salary,Score,Rank
0,Alice,23,Delhi,50000,85,2.0
1,Bob,35,Mumbai,70000,92,4.0
2,Charlie,29,Chennai,62000,78,1.0
3,David,35,Delhi,70000,92,4.0
4,Eva,30,Mumbai,58000,88,3.0
