# **Activity 2: Python-Pandas Exercise**

Objectives:
- Understand Python syntax (variables, loops, functions).
- Learn Pandas basics (Series, DataFrames, reading files).
- Perform data cleaning (handling missing values, correcting formats, removing duplicates).
- Apply concepts in a real-world case study.

# Part 1: Hands-on Python & Pandas Basics

1. Install the Pandas library in your environment.

2. Import the pandas package under the name `pd`.

3. Print the pandas version.

4. Create a variable `x` with the value 10 and a string variable `y` with the value "Fortes in Fide!".

5. Define a list with the numbers `[1, 2, 3, 4, 5]` and a dictionary with the keys `name` and `age`.

6. Write a function `greet(name)` that returns "Magis, (name)!", where (name) is the argument passed to the function.

7. Write a Python function that takes a user’s name as input and prints a personalized greeting.

8. Modify your solution to **Number 7** so that the name defaults to "Guest" if the user does not enter one.

9. Create a Pandas Series from `[10, 20, 30, 40]`.

10. Create a DataFrame with columns `A` and `B`.
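
The items above can be sketched roughly as follows. This is one possible approach, not a model answer. It assumes pandas is already installed (for example with `pip install pandas`, or `%pip install pandas` inside a notebook cell). The dictionary values, the DataFrame contents, the function name `personalized_greeting`, and the wording "Hello" are placeholders chosen for illustration.

```python
import pandas as pd

# Item 3: the installed pandas version
print(pd.__version__)

# Item 4: an integer and a string variable
x = 10
y = "Fortes in Fide!"

# Item 5: a list and a dictionary (the dictionary values are placeholders)
numbers = [1, 2, 3, 4, 5]
person = {"name": "Juan", "age": 20}


# Item 6: a function that returns a greeting
def greet(name):
    return f"Magis, {name}!"


# Items 7 and 8: print a personalized greeting, defaulting to "Guest".
# The name is a parameter here; in a notebook it could come from
# input("Enter your name: ") instead.
def personalized_greeting(name=""):
    name = name.strip() or "Guest"  # an empty or blank name becomes "Guest"
    message = f"Hello, {name}!"
    print(message)
    return message


# Item 9: a Series built from a list
series = pd.Series([10, 20, 30, 40])

# Item 10: a DataFrame with columns A and B (placeholder contents)
df = pd.DataFrame({"A": [1, 2, 3], "B": ["x", "y", "z"]})

print(greet("Ana"))
personalized_greeting("")
print(series)
print(df)
```

Returning the message as well as printing it is a design choice that makes the function easier to check; printing alone would also satisfy item 7.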

# Part 2: Working with a Dataset 🛥️

1. Load the Titanic dataset from a local file and display the first five rows.

2. Display the dataset's column names and data types.

3. Display the dataset's missing values.

4. Display the first 10 rows of the `Name`, `Age`, and `Fare` columns.

5. Print the descriptive statistics of the Titanic dataset.

6. Remove rows with missing values in the `Age` column.

7. Remove duplicate rows from the dataset.

8. Compute and display the correlation matrix of the dataset.
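
A minimal sketch of the steps above follows. To keep it self-contained, it reads a few made-up rows from an in-memory string instead of the real file, so the passenger names and values are placeholders and not actual Titanic records. With the real dataset, the `pd.read_csv` call would receive the path to your local file instead. The column names follow the commonly distributed Titanic CSV and are an assumption about your copy.

```python
import io

import pandas as pd

# Made-up stand-in rows; replace io.StringIO(csv_text) with your file path.
csv_text = """PassengerId,Survived,Pclass,Name,Sex,Age,Fare
1,0,3,Passenger A,male,22.0,7.25
2,1,1,Passenger B,female,38.0,71.28
3,1,3,Passenger C,female,,7.92
4,1,1,Passenger D,female,35.0,53.10
4,1,1,Passenger D,female,35.0,53.10
5,0,3,Passenger E,male,,8.05
"""

# 1. Load the data and show the first five rows
titanic = pd.read_csv(io.StringIO(csv_text))
print(titanic.head())

# 2. Column names and data types
print(titanic.columns.tolist())
print(titanic.dtypes)

# 3. Number of missing values per column
print(titanic.isnull().sum())

# 4. First 10 rows of selected columns
print(titanic[["Name", "Age", "Fare"]].head(10))

# 5. Descriptive statistics
print(titanic.describe())

# 6. Drop rows where Age is missing
without_missing_age = titanic.dropna(subset=["Age"])

# 7. Drop exact duplicate rows
cleaned = without_missing_age.drop_duplicates()

# 8. Correlation matrix, restricted to the numeric columns
correlation = cleaned.select_dtypes("number").corr()
print(correlation)
```

Selecting the numeric columns before calling `corr()` avoids errors caused by text columns such as `Name` and `Sex`.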

# Part 3: Working with Case Studies

When working on these case studies, **always ensure that your code is properly documented and clearly presented**. Follow these key principles:  

### **1. Always Show Your Code**  
- Every step of data exploration, cleaning, and analysis should include the **code and its visible output**.  
- Do not skip showing your process, as transparency is essential for reproducibility.  

### **2. Proper Documentation is Necessary**  
- Use **comments (`#`) in Python** to explain your code clearly.  
- Add **Markdown cells** to describe each step before executing the code.  
- Explain key findings in simple language to make the analysis easy to understand.  

### **3. Use Readable and Organized Code**  
- Follow a **step-by-step approach** to keep the notebook structured.  
- Use **proper variable names** and avoid hardcoding values where possible.
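
As an illustration of these principles, the short sketch below uses comments, a docstring, descriptive names, and named constants in place of hardcoded values. The column name `age`, the threshold, and the sample data are hypothetical and serve only to demonstrate the style.

```python
import pandas as pd

# Named constants make assumptions visible and easy to change in one place.
AGE_COLUMN = "age"
MIN_VALID_AGE = 0


def remove_invalid_ages(df, column=AGE_COLUMN, minimum=MIN_VALID_AGE):
    """Return a copy of df without rows whose age is missing or below minimum."""
    is_valid = df[column].notna() & (df[column] >= minimum)
    return df.loc[is_valid].copy()


# Hypothetical sample data with one missing and one impossible age
sample = pd.DataFrame({"age": [25, None, -3, 40]})

cleaned_sample = remove_invalid_ages(sample)
print(cleaned_sample)
```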

# **Case Study 1: Iris Flower Classification** 🌸  

### **Background**  
A botanical research institute wants to develop an automated system that classifies different species of **iris flowers** based on their **sepal and petal measurements**.  The dataset consists of **150 samples**, labeled as **Setosa, Versicolor, or Virginica**.  

### **Problem Statement**  
Can we use **sepal and petal dimensions** to correctly classify the **species of an iris flower**?  

### **Task Description**  

#### **1. Data Exploration**  
- Load the dataset and display the first few rows.  
- Identify any missing or inconsistent values.  

#### **2. Data Cleaning**  
- Check for missing values and handle them appropriately.  
- Convert categorical species labels into a format suitable for analysis.  

#### **3. Basic Data Analysis**  
- Find the average sepal and petal dimensions for each species.  
- Identify correlations between different flower measurements.  
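
Steps 1 to 3 above might look something like the sketch below. It uses a handful of illustrative rows in place of the full 150-sample dataset, and the column names (`sepal_length`, `sepal_width`, `petal_length`, `petal_width`, `species`) are assumptions that may differ from the file you are given.

```python
import pandas as pd

# A few illustrative rows standing in for the full dataset
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petal_width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})
measurement_columns = ["sepal_length", "sepal_width",
                       "petal_length", "petal_width"]

# 1. Exploration: first rows and missing values per column
print(iris.head())
missing_counts = iris.isnull().sum()
print(missing_counts)

# 2. Cleaning: encode the species labels as integer codes
iris["species_code"] = iris["species"].astype("category").cat.codes

# 3. Analysis: mean measurements per species and correlations
species_means = iris.groupby("species")[measurement_columns].mean()
correlation = iris[measurement_columns].corr()
print(species_means)
print(correlation)
```

Integer codes are only one way to encode the labels; one-hot encoding with `pd.get_dummies` is another common option.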

#### **4. Visualization**  
- Create simple visualizations (e.g., histograms, scatter plots) to understand data distribution.  

#### **5. Insights & Interpretation**  
- Summarize key findings, such as which features best distinguish flower species.  

# **Case Study 2: Netflix Content Analysis** 🎬  

## **Background**  
Netflix is a leading streaming platform with a vast collection of movies and TV shows. The company wants to analyze its **content library** to understand trends in **genres, release years, and regional distribution**.  

## **Problem Statement**  
How can we use **Netflix’s dataset** to gain insights into content distribution, popular genres, and release trends over time?  

## **Task Description**  

### **1. Data Exploration**  
- Load the dataset and inspect its structure.  
- Identify key columns such as title, genre, release year, and country.  

### **2. Data Cleaning**  
- Check for missing or incorrect values in key columns.  
- Remove duplicates and format the date-related data properly.  

### **3. Basic Data Analysis**  
- Count the number of movies vs. TV shows.  
- Identify the most common genres and countries producing content.  
- Analyze the number of releases per year to observe trends.  
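
The cleaning and analysis steps above could be approached along these lines. The rows are made up for illustration, and the column names (`title`, `type`, `country`, `date_added`, `release_year`, `listed_in`) are assumptions based on a commonly shared version of the Netflix titles dataset. Check them against your own file before reusing any of this.

```python
import pandas as pd

# Made-up rows standing in for the real dataset
netflix = pd.DataFrame({
    "title": ["Show A", "Movie B", "Movie B", "Movie C", "Show D"],
    "type": ["TV Show", "Movie", "Movie", "Movie", "TV Show"],
    "country": ["United States", "India", "India", None, "United States"],
    "date_added": ["September 25, 2021", "January 1, 2020",
                   "January 1, 2020", "March 3, 2019", "July 7, 2021"],
    "release_year": [2021, 2019, 2019, 2018, 2021],
    "listed_in": ["Dramas, Comedies", "Dramas", "Dramas",
                  "Documentaries", "Comedies"],
})

# Cleaning: inspect missing values, drop duplicates, parse dates
print(netflix.isnull().sum())
netflix = netflix.drop_duplicates().copy()
netflix["date_added"] = pd.to_datetime(
    netflix["date_added"], format="%B %d, %Y", errors="coerce"
)
netflix["country"] = netflix["country"].fillna("Unknown")

# Analysis: movies vs. TV shows
type_counts = netflix["type"].value_counts()

# Analysis: genres are comma-separated, so split them before counting
genre_counts = netflix["listed_in"].str.split(", ").explode().value_counts()

# Analysis: countries and releases per year
country_counts = netflix["country"].value_counts()
releases_per_year = netflix["release_year"].value_counts().sort_index()

print(type_counts)
print(genre_counts)
print(country_counts)
print(releases_per_year)
```

Filling missing countries with "Unknown" is one choice among several; dropping those rows or leaving them as missing may suit your analysis better.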

### **4. Insights & Interpretation**  
- Summarize key findings, such as trends in Netflix's content production over time.  
