## Experiment 2

### Data Filtering

**Objective:** Filter rows or columns based on specified criteria, such as removing outliers or selecting data within a certain range.

**Data Filtering** is the process of selecting a subset of a larger dataset based on predefined conditions or criteria. It removes irrelevant, redundant, or erroneous data so that only relevant and accurate records remain for analysis, which makes patterns and trends easier to identify. Filtering can be applied to both rows and columns, and is commonly used for:

- **Excluding outliers**: Removing extreme values that might skew the results.
- **Selecting specific ranges**: Filtering data that falls within a certain numerical or categorical range.
- **Matching conditions**: Keeping only rows or columns that satisfy given logical conditions (e.g., selecting data where a column value is greater than a specified threshold).

Filtering keeps the analysis focused on relevant data and reduces the amount of data that later processing steps have to handle.
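As a minimal sketch of the mechanism behind these uses: in pandas, a condition on a column produces a boolean Series (often called a mask), and indexing the DataFrame with that mask keeps only the rows where it is `True`. The `orders` table, its column names, and its values are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Hypothetical data, used only for this illustration
orders = pd.DataFrame({
    "city": ["Pune", "Delhi", "Mumbai", "Delhi"],
    "amount": [250, 900, 120, 430],
})

# A comparison on a column yields a boolean Series (the "mask")
mask = orders["amount"] > 200
print(mask.tolist())

# Indexing with the mask keeps only the rows where it is True
above_200 = orders[mask]
print(above_200)

# Categorical criteria: keep rows whose value belongs to a given set
in_cities = orders[orders["city"].isin(["Delhi", "Pune"])]
print(in_cities)
```

The same pattern applies to any condition that can be evaluated per row.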

### Importing Required Libraries

In [2]:
import pandas as pd

### Creating Dataframe

In [3]:
# Sample DataFrame

# Creating Dictionary
data = {
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Score": [85, 92, 78, 88, 90],
    "age": [20, 21, 19, 22, 20]
}

# Creating Dataframe
df = pd.DataFrame(data)

# Printing Dataframe
print("Dataframe is:\n")
print(df)

Dataframe is:

      Name  Score  age
0    Alice     85   20
1      Bob     92   21
2  Charlie     78   19
3    David     88   22
4      Eva     90   20


### 1. Filtering Rows Based on a Condition
#### Let's filter rows where the score is greater than or equal to 80.

In [8]:
# Filter rows where the score is greater than or equal to 80
filtered_df = df[df["Score"] >= 80]

# Printing Filtered Dataframe
print("Dataframe Filtered based on Score/Condition is:\n")
print(filtered_df)

Dataframe Filtered based on Score/Condition is:

    Name  Score  age
0  Alice     85   20
1    Bob     92   21
3  David     88   22
4    Eva     90   20
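
A single condition can be extended to several. The sketch below rebuilds the same sample DataFrame and combines conditions with `&` and `~`; the age threshold of 20 is an assumption chosen only for illustration. `DataFrame.query` is shown as an alternative way to write the same filter.

```python
import pandas as pd

# Same sample data as above
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Score": [85, 92, 78, 88, 90],
    "age": [20, 21, 19, 22, 20],
})

# Combine conditions with & (and) or | (or); wrap each condition in parentheses
high_and_young = df[(df["Score"] >= 80) & (df["age"] <= 20)]
print(high_and_young)

# The same filter written as a query string
same_with_query = df.query("Score >= 80 and age <= 20")
print(same_with_query)

# Negate a condition with ~ to keep the rows that do NOT match
below_80 = df[~(df["Score"] >= 80)]
print(below_80)
```

Python's `and`/`or` keywords do not work on Series, which is why the element-wise operators `&`, `|`, and `~` are used.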


### 2. Removing Outliers
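#### Now, we'll remove outliers by keeping only scores within a specific range (e.g., 70 to 100).

Every score in the sample data already lies between 70 and 100, so this filter keeps all five rows.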

In [7]:
# Lower bound of the range
lower_bound = 70

# Upper bound of the range
upper_bound = 100

# Filter rows based on the score range
filtered_df = df[(df["Score"] >= lower_bound) & (df["Score"] <= upper_bound)]

# Printing Filtered Dataframe
print("Dataframe Filtered based on range for a single column is:\n")
print(filtered_df)

Dataframe Filtered based on range for a single column is:

      Name  Score  age
0    Alice     85   20
1      Bob     92   21
2  Charlie     78   19
3    David     88   22
4      Eva     90   20
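
The fixed 70 to 100 range keeps all five rows of the sample data. When suitable bounds are not known in advance, one common alternative (not used in this experiment) is the interquartile range (IQR) rule, which derives the bounds from the data. The sketch below is a rough illustration under stated assumptions: the extra row for "Frank" with a score of 12 is hypothetical, added only so that there is an outlier to remove, and the factor 1.5 is the conventional choice rather than a requirement.

```python
import pandas as pd

# Sample data plus one hypothetical extreme value ("Frank", 12)
with_outlier = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva", "Frank"],
    "Score": [85, 92, 78, 88, 90, 12],
})

# Quartiles and interquartile range of the Score column
q1 = with_outlier["Score"].quantile(0.25)
q3 = with_outlier["Score"].quantile(0.75)
iqr = q3 - q1

# Bounds derived from the data: 1.5 * IQR beyond each quartile
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

# Series.between is inclusive on both ends by default
kept = with_outlier[with_outlier["Score"].between(lower_bound, upper_bound)]
print(kept)
```

`Series.between(lower, upper)` is also a shorter way to write the two-sided comparison used in the cell above.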


### 3. Selecting Specific Columns
#### Finally, let's select only the 'Name' and 'Score' columns.

In [5]:
# Select only the 'Name' and 'Score' columns
selected_columns_df = df[["Name", "Score"]]

# Printing Filtered Dataframe
print("Dataframe Filtered based on column names Name and Score is:\n")
print(selected_columns_df)

Dataframe Filtered based on column names Name and Score is:

      Name  Score
0    Alice     85
1      Bob     92
2  Charlie     78
3    David     88
4      Eva     90
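
Row and column filtering can also be combined. The sketch below rebuilds the sample DataFrame and shows three related options: `.loc` with a row condition and a column list, `drop` to exclude columns instead of listing the ones to keep, and `select_dtypes` to select columns by type. The score threshold of 90 is an assumption chosen only for illustration.

```python
import pandas as pd

# Same sample data as above
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Score": [85, 92, 78, 88, 90],
    "age": [20, 21, 19, 22, 20],
})

# Row condition and column selection in a single .loc call
top_scores = df.loc[df["Score"] >= 90, ["Name", "Score"]]
print(top_scores)

# Exclude a column rather than naming every column to keep
without_age = df.drop(columns=["age"])
print(without_age)

# Keep only the numeric columns
numeric_only = df.select_dtypes(include="number")
print(numeric_only)
```

None of these calls modify `df`; each returns a new DataFrame.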
