# **`Data Science Learners Hub`**

**Module : Python**

**email** : [datasciencelearnershub@gmail.com](mailto:datasciencelearnershub@gmail.com)

## **`#3: Data Manipulation with Pandas`**
7. **Data Filtering and Selection**
   - Conditional selection
   - Using boolean indexing

8. **Data Sorting and Ranking**
   - Sorting by columns
   - Ranking data

9. **Grouping and Aggregation**
   - GroupBy operations
   - Aggregation functions (sum, mean, count, etc.)

### **`8. Data Sorting and Ranking`**

#### **`Sorting a Pandas DataFrame by Columns`**

#### Sorting Basics:

Sorting in Pandas involves arranging the rows of a DataFrame based on the values in one or more columns. The `sort_values()` method is commonly used for this purpose.

#### Sorting by a Single Column:

```python
import pandas as pd

# Sample DataFrame
data = {'Name': ['Laxman', 'Rajesh', 'Ganga', 'Jamuna'],
        'Age': [25, 30, 22, 35],
        'Salary': [50000, 60000, 45000, 70000]}

df = pd.DataFrame(data)

# Sorting by the 'Age' column in ascending order
sorted_df_age_asc = df.sort_values(by='Age', ascending=True)

# Displaying the sorted DataFrame
print("DataFrame Sorted by Age in Ascending Order:")
print(sorted_df_age_asc)
```

```text
DataFrame Sorted by Age in Ascending Order:
     Name  Age  Salary
2   Ganga   22   45000
0  Laxman   25   50000
1  Rajesh   30   60000
3  Jamuna   35   70000
```


#### Sorting by Multiple Columns:

```python
import pandas as pd

# Sample DataFrame
data = {'Name': ['Laxman', 'Rajesh', 'Ganga', 'Jamuna'],
        'Age': [25, 30, 22, 35],
        'Salary': [50000, 60000, 45000, 70000],
        'Department': ['HR', 'IT', 'Marketing', 'IT']}

df = pd.DataFrame(data)

# Sorting by 'Department' in ascending order, then 'Salary' in descending order
sorted_df_multi_columns = df.sort_values(by=['Department', 'Salary'], ascending=[True, False])

# Displaying the sorted DataFrame with multiple columns
print("DataFrame Sorted by Department (Asc) and Salary (Desc):")
print(sorted_df_multi_columns)
```

```text
DataFrame Sorted by Department (Asc) and Salary (Desc):
     Name  Age  Salary Department
0  Laxman   25   50000         HR
3  Jamuna   35   70000         IT
1  Rajesh   30   60000         IT
2   Ganga   22   45000  Marketing
```


#### Explanation:

- **Single Column Sorting:**
  - `df.sort_values(by='Column_Name')` sorts the DataFrame based on the specified column.
  - `ascending=True` sorts in ascending order; set to `False` for descending order.

- **Multiple Columns Sorting:**
  - `by=['Column1', 'Column2']` sorts by multiple columns in the specified order.
  - `ascending=[True, False]` controls the sorting order for each column.

#### Scenarios for Sorting:

1. **Top Performers Analysis:**
   - Sort by a performance metric (e.g., sales, ratings) to identify top performers.

2. **Time Series Data:**
   - Sort time series data by date or timestamp for chronological analysis.

3. **Hierarchical Sorting:**
   - Sort by one column, then another, for hierarchical analysis (e.g., sorting by department, then salary).

4. **Identifying Outliers:**
   - Sort by numerical columns to identify outliers or extremes in the dataset.
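
As a rough sketch of the first, second, and fourth scenarios, the snippet below sorts a small made-up dataset. The names `sales`, `log`, and their values are invented for illustration and do not come from any real dataset.

```python
import pandas as pd

# Hypothetical sales figures, invented for this example
sales = pd.DataFrame({
    'Rep': ['Asha', 'Bala', 'Chitra', 'Dev', 'Esha'],
    'Sales': [120, 340, 90, 410, 275],
})

# Top performers: sort in descending order, then keep the first three rows
top3 = sales.sort_values(by='Sales', ascending=False).head(3)
print(top3)

# Extremes: after an ascending sort, the smallest and largest values
# sit at the two ends of the DataFrame
ordered = sales.sort_values(by='Sales')
lowest, highest = ordered.iloc[0], ordered.iloc[-1]
print(lowest['Rep'], highest['Rep'])  # Chitra Dev

# Time series: convert to datetime first so the sort is chronological
log = pd.DataFrame({
    'Date': pd.to_datetime(['2024-03-01', '2024-01-15', '2024-02-10']),
    'Visits': [30, 10, 20],
})
chronological = log.sort_values(by='Date')
print(chronological)
```

Sorting only places extreme values at the ends; deciding whether they are true outliers still needs a separate check.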

#### Considerations:

- **Preserving Index:**
  - **`Note :`** Sorting reorders the rows, but each row keeps its original index label, so the index is no longer in sequence. Pass `ignore_index=True` to relabel the sorted rows 0, 1, ..., n-1.

- **In-Place Sorting:**
  - `inplace=True` sorts the original DataFrame itself and returns `None`. Without it, `sort_values()` returns a new sorted DataFrame and leaves the original unchanged.
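
The two considerations above can be checked with a short sketch. It reuses the names and ages from the earlier examples; the variable names `kept`, `renumbered`, and `result` are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Laxman', 'Rajesh', 'Ganga', 'Jamuna'],
                   'Age': [25, 30, 22, 35]})

# Default behaviour: rows move, but each row keeps its original index label
kept = df.sort_values(by='Age')
print(kept.index.tolist())        # [2, 0, 1, 3]

# ignore_index=True relabels the sorted rows 0..n-1
renumbered = df.sort_values(by='Age', ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]

# inplace=True modifies df itself; the call returns None
result = df.sort_values(by='Age', inplace=True)
print(result)                     # None
print(df['Name'].tolist())        # ['Ganga', 'Laxman', 'Rajesh', 'Jamuna']
```

Because the in-place call returns `None`, assigning its result back to `df` would discard the DataFrame.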

#### Tips:

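- **Chaining Sorting Conditions:**
  - Prefer a single call with a list in `by`. If you chain `sort_values()` calls instead, sort by the least significant column first and use a stable algorithm (`kind='stable'`) for the later sorts; otherwise the last sort can scramble the earlier ordering.

- **Numeric vs. Lexicographic Sorting:**
  - Numbers stored as strings sort lexicographically (`'10'` comes before `'9'`). Convert such columns with `pd.to_numeric()` before sorting when numeric order is intended.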
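
Both tips can be illustrated with a small sketch. The data and the `Code` column are made up for this example; `kind='stable'` is a documented option of `sort_values()` that applies when sorting on a single column.

```python
import pandas as pd

df = pd.DataFrame({'Department': ['IT', 'HR', 'IT', 'HR'],
                   'Salary': [60000, 50000, 70000, 55000]})

# Preferred: one call with a list of keys
one_call = df.sort_values(by=['Department', 'Salary'], ascending=[True, False])

# Chained version: least significant key first, then a stable sort on the
# primary key so the salary order inside each department survives
chained = (df.sort_values(by='Salary', ascending=False)
             .sort_values(by='Department', kind='stable'))

print(one_call.equals(chained))  # True

# Numbers stored as strings are compared character by character
codes = pd.DataFrame({'Code': ['10', '9', '100', '2']})
as_text = codes.sort_values(by='Code')['Code'].tolist()
print(as_text)     # ['10', '100', '2', '9']

# Converting to a numeric dtype restores numeric order
as_number = (codes.assign(Code=pd.to_numeric(codes['Code']))
                  .sort_values(by='Code')['Code'].tolist())
print(as_number)   # [2, 9, 10, 100]
```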

Sorting puts data in an order that makes patterns and extremes easy to inspect. Understanding the `sort_values()` method and its `by`, `ascending`, `ignore_index`, and `inplace` parameters covers most sorting tasks in Pandas.


#### **`Ranking Data in a Pandas DataFrame`**

#### Ranking Basics:

Ranking in Pandas involves assigning ranks to the values in a DataFrame based on certain criteria. The `rank()` method is commonly used for this purpose.

#### Basic Ranking:


```python
import pandas as pd

# Sample DataFrame
data = {'Name': ['Laxman', 'Rajesh', 'Ganga', 'Jamuna'],
        'Score': [85, 90, 85, 92]}

df = pd.DataFrame(data)

# Ranking by 'Score' in ascending order
df['Rank'] = df['Score'].rank()

# Displaying the DataFrame with ranks
print("DataFrame with Ranks:")
print(df)
```

```text
DataFrame with Ranks:
     Name  Score  Rank
0  Laxman     85   1.5
1  Rajesh     90   3.0
2   Ganga     85   1.5
3  Jamuna     92   4.0
```


#### Explanation:

- By default, `rank()` assigns ranks in ascending order, so the smallest value gets rank 1. Tied values receive the average of the ranks they would otherwise occupy: the two scores of 85 would take ranks 1 and 2, so each is assigned (1 + 2) / 2 = 1.5.

#### Handling Ties:

```python
# Handling ties by averaging ranks
df['Rank_Avg'] = df['Score'].rank(method='average')

# Displaying the DataFrame with averaged ranks for ties
print("\nDataFrame with Averaged Ranks for Ties:")
print(df)
```

```text
DataFrame with Averaged Ranks for Ties:
     Name  Score  Rank  Rank_Avg
0  Laxman     85   1.5       1.5
1  Rajesh     90   3.0       3.0
2   Ganga     85   1.5       1.5
3  Jamuna     92   4.0       4.0
```


#### Explanation:

- The `rank()` method is available on both Series and DataFrames. Its `method` parameter controls how ties (two or more equal values) are ranked. The default is `'average'`, which is why `Rank_Avg` matches `Rank` above; the other options are `'min'`, `'max'`, `'first'`, and `'dense'`.

#### Customizing Ranking Method:

```python
# Customizing ranking method to assign the lowest rank to tied values
df['Rank_Min'] = df['Score'].rank(method='min')

# Displaying the DataFrame with the lowest ranks for ties
print("\nDataFrame with Lowest Ranks for Ties:")
print(df)
```

```text
DataFrame with Lowest Ranks for Ties:
     Name  Score  Rank  Rank_Avg  Rank_Min
0  Laxman     85   1.5       1.5       1.0
1  Rajesh     90   3.0       3.0       3.0
2   Ganga     85   1.5       1.5       1.0
3  Jamuna     92   4.0       4.0       4.0
```

#### `rank()` with Different Methods:

The next example rebuilds `df` with more rows and more ties so that all five methods can be compared side by side.

```python
import pandas as pd

# Sample DataFrame with additional data
data = {'Name': ['Laxman', 'Rajesh', 'Ganga', 'Jamuna', 'Ram', 'Sita', 'Hanuman'],
        'Score': [85, 90, 85, 92, 88, 90, 92]}

df = pd.DataFrame(data)

# Ranking by 'Score' with various methods
df['Rank'] = df['Score'].rank()                       # Uses the default method ('average')
df['Rank_min'] = df['Score'].rank(method='min')        # Assigns the minimum rank to tied values
df['Rank_max'] = df['Score'].rank(method='max')        # Assigns the maximum rank to tied values
df['Rank_first'] = df['Score'].rank(method='first')    # Breaks ties by order of appearance in the data
df['Rank_dense'] = df['Score'].rank(method='dense')    # Like 'min', but the rank increases by exactly 1 between groups of tied values

# Displaying the DataFrame with ranks using different methods
print("DataFrame with Ranks:")
print(df)
```

```text
DataFrame with Ranks:
      Name  Score  Rank  Rank_min  Rank_max  Rank_first  Rank_dense
0   Laxman     85   1.5       1.0       2.0         1.0         1.0
1   Rajesh     90   4.5       4.0       5.0         4.0         3.0
2    Ganga     85   1.5       1.0       2.0         2.0         1.0
3   Jamuna     92   6.5       6.0       7.0         6.0         4.0
4      Ram     88   3.0       3.0       3.0         3.0         2.0
5     Sita     90   4.5       4.0       5.0         5.0         3.0
6  Hanuman     92   6.5       6.0       7.0         7.0         4.0
```

#### Explanation:

- **Basic Ranking:**
  - `df['Score'].rank()` assigns ranks to values in the 'Score' column in ascending order.

- **Handling Ties:**
  - `method='average'` (default) averages ranks for tied values.
  - Other methods include `'min'`, `'max'`, `'first'`, and `'dense'`.

- **Customizing Ranking Method:**
  - `method='min'` assigns the minimum rank to tied values.

#### Scenarios for Ranking:

1. **Competition Results:**
   - Rank participants based on their scores in a competition.

2. **Academic Performance:**
   - Rank students based on their exam scores.

3. **Sales Performance:**
   - Rank sales representatives based on their sales figures.

4. **Sporting Events:**
   - Rank teams or athletes based on their performance.
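
A minimal sketch of the competition scenario, assuming that a higher score is better and that tied participants should share the better place. The `Place` column and `leaderboard` name are introduced here for illustration.

```python
import pandas as pd

results = pd.DataFrame({'Name': ['Laxman', 'Rajesh', 'Ganga', 'Jamuna'],
                        'Score': [85, 90, 85, 92]})

# ascending=False gives rank 1 to the highest score;
# method='min' lets tied scores share the better (lower) place
results['Place'] = results['Score'].rank(ascending=False, method='min').astype(int)

# Order the table by place, breaking ties alphabetically for display
leaderboard = results.sort_values(by=['Place', 'Name'], ignore_index=True)
print(leaderboard)
```

With `method='min'` the two scores of 85 both take place 3 and no place 4 is awarded, which matches common competition standings.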

#### Considerations:

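- **Ties Handling:**
  - Choose the tie method to suit the analysis: `'min'` gives competition-style standings, `'dense'` leaves no gaps between ranks, and `'first'` gives every row a unique rank.

- **Numeric vs. Lexicographic Ranking:**
  - Numbers stored as strings are ranked as text, so `'10'` ranks below `'9'`. Convert them with `pd.to_numeric()` before ranking when numeric order is intended.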
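
The difference between lexicographic and numeric ranking can be seen in a short sketch. The Series `scores_as_text` is a made-up example of numbers that were read in as strings.

```python
import pandas as pd

# Hypothetical scores that arrived as text, e.g. from a CSV read as strings
scores_as_text = pd.Series(['9', '10', '100', '2'])

# Ranked as strings: '10' < '100' < '2' < '9'
text_rank = scores_as_text.rank().tolist()
print(text_rank)     # [4.0, 1.0, 2.0, 3.0]

# Ranked as numbers after conversion: 2 < 9 < 10 < 100
numeric_rank = pd.to_numeric(scores_as_text).rank().tolist()
print(numeric_rank)  # [2.0, 3.0, 4.0, 1.0]
```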

#### Tips:

- **Ascending vs. Descending Order:**
  - Use `ascending=False` so that the largest value receives rank 1.

- **Ranking Specific Columns:**
  - Call `rank()` on a single column (a Series) or on a subset such as `df[['A', 'B']]` to rank only the columns of interest.
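
As an illustration of both tips, the sketch below ranks two selected columns in descending order. The `Math` and `Science` columns and their values are invented for this example.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Laxman', 'Rajesh', 'Ganga'],
                   'Math': [85, 90, 78],
                   'Science': [88, 75, 92]})

# Rank only the chosen columns; each column is ranked independently
subject_cols = ['Math', 'Science']
ranks = (df[subject_cols]
         .rank(ascending=False, method='min')  # highest mark gets rank 1
         .astype(int)
         .add_suffix('_Rank'))

# Attach the rank columns to the original data
report = df.join(ranks)
print(report)
```

Using `method='min'` keeps every rank a whole number, so the conversion to `int` is safe even if ties appear.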

Ranking is a valuable tool for analyzing and comparing values within a dataset. Understanding the various ranking methods and how to handle ties provides flexibility in adapting ranking to different scenarios.
