# Pandas Quiz

Here is some sample data for the exercises:

In [None]:
import pandas as pd

In [None]:
# Sample data for demonstration
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, None, 35, 45, 29],
    'JoinDate': ['2023-01-15', '2022-05-10', '2023-07-19', '2021-11-03', '2022-08-21'],
    'Department': ['HR', 'IT', 'IT', 'HR', 'Marketing'],
    'Salary': [55000, 60000, 75000, 50000, 70000]
}

# Convert data into a DataFrame
df = pd.DataFrame(data)

display(df)

## Task 1: Filtering Data

*Key Concepts: DataFrame filtering, logical conditions.*

- Filter `df` to show only the employees working in the "IT" department.
- Then filter this subset further to include only employees with a salary above 60,000.
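
As a reminder of how boolean filtering works, here is a minimal sketch on a separate toy DataFrame. The `fruits` data and all variable names are made up for illustration and are not part of the quiz data.

```python
import pandas as pd

# Hypothetical toy data, unrelated to the quiz DataFrame
fruits = pd.DataFrame({
    'Fruit': ['apple', 'banana', 'cherry', 'date'],
    'Color': ['red', 'yellow', 'red', 'brown'],
    'Price': [1.2, 0.5, 3.0, 2.5],
})

# Comparing a column with a value yields a boolean Series (a "mask")
is_red = fruits['Color'] == 'red'

# Indexing with the mask keeps only the rows where it is True
red_fruits = fruits[is_red]

# Conditions can be combined with & (and) or | (or);
# each condition needs its own parentheses
expensive_red = fruits[(fruits['Color'] == 'red') & (fruits['Price'] > 2)]
print(expensive_red)
```

Filtering in two steps (first one mask, then another on the result) selects the same rows as combining both conditions with `&`.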

In [None]:
# solve the task here:

## Task 2: Handling Missing Data

*Key Concepts: Checking for missing values, handling missing data, calculating averages.*

- Check for any missing values in each column of the DataFrame.
- If there are missing values in the "Age" column, fill these with the average age of all employees (`df['Age'].mean()`).
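
The same pattern can be sketched on a separate toy DataFrame. The `scores` data below is invented for illustration; only `isna`, `sum`, `mean`, and `fillna` are the pandas calls that matter here.

```python
import pandas as pd

# Hypothetical toy data, unrelated to the quiz DataFrame
scores = pd.DataFrame({
    'Student': ['Ann', 'Ben', 'Cat', 'Dan'],
    'Score': [80.0, None, 90.0, 70.0],
})

# isna() marks missing cells as True; sum() then counts them per column
missing_per_column = scores.isna().sum()
print(missing_per_column)

# mean() skips missing values by default: (80 + 90 + 70) / 3 = 80.0
mean_score = scores['Score'].mean()

# fillna() returns a new Series, so assign it back to the column
scores['Score'] = scores['Score'].fillna(mean_score)
print(scores)
```

Note that the mean is computed before filling, so the filled value does not influence the average it is derived from.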

In [None]:
# solve the task here:

## Task 3: Date and Time Manipulation and Aggregation with Groupby

*Key Concepts: Date and time conversion, creating new columns, groupby with aggregation.*

- Convert the "JoinDate" column to a datetime format.
- Calculate the number of days each employee has been with the company, based on the "JoinDate" column, and add the result as a new column called "DaysInCompany". (Hint: The current date is `pd.Timestamp.now()`)
- Use `groupby` to calculate the average salary and maximum "DaysInCompany" for each department.
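
The three steps can be sketched on a separate toy DataFrame. The `orders` data, the column names, and the fixed reference date are assumptions chosen for illustration; a fixed date is used instead of `pd.Timestamp.now()` only so that the result is reproducible.

```python
import pandas as pd

# Hypothetical toy data, unrelated to the quiz DataFrame
orders = pd.DataFrame({
    'Region': ['North', 'South', 'North', 'South'],
    'OrderDate': ['2024-01-01', '2024-01-05', '2024-01-10', '2024-01-03'],
    'Amount': [100, 200, 300, 400],
})

# Convert the string column to datetime64 values
orders['OrderDate'] = pd.to_datetime(orders['OrderDate'], format='%Y-%m-%d')

# Subtracting datetimes gives timedeltas; .dt.days extracts whole days
reference = pd.Timestamp('2024-01-31')  # fixed date, assumed for this sketch
orders['DaysSinceOrder'] = (reference - orders['OrderDate']).dt.days

# Named aggregation: new_column=(source_column, aggregation_function)
summary = orders.groupby('Region').agg(
    MeanAmount=('Amount', 'mean'),
    MaxDaysSinceOrder=('DaysSinceOrder', 'max'),
)
print(summary)
```

Named aggregation keeps the result columns flat and self-describing, which is usually easier to read than passing a dictionary of functions.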

In [None]:
# solve the task here:

## Solutions

In [None]:
# Task 1: Filtering Data
# ----------------------
# Step 1: Filter to show only employees in the 'IT' department
it_department = df[df['Department'] == 'IT']
print(it_department)

# Step 2: Further filter to include only employees with Salary > 60000
high_salary_it = it_department[it_department['Salary'] > 60000]
print(f"Employees in IT department with salary above 60000:\n{high_salary_it}")

# Task 2: Handling Missing Data
# -----------------------------
# Step 1: Check for missing values in each column
print(f"\nMissing values in each column:\n{df.isna().sum(axis=0)}")

# Step 2: Fill missing 'Age' values with the average age
average_age = df['Age'].mean()
df['Age'] = df['Age'].fillna(average_age)
print(f"\nData after filling missing 'Age' values with the average:\n{df}")

# Task 3: Date and Time Manipulation and Aggregation with Groupby
# ---------------------------------------------------------------
# Step 1: Convert 'JoinDate' to a datetime format
df['JoinDate'] = pd.to_datetime(df['JoinDate'], format="%Y-%m-%d")

# Step 2: Calculate 'DaysInCompany' based on 'JoinDate'
df['DaysInCompany'] = (pd.Timestamp.now() - df['JoinDate']).dt.days
print(f"\nData with 'DaysInCompany' calculated:\n{df}")

# Step 3: Group by 'Department' and calculate average salary and max 'DaysInCompany'
department_summary = df.groupby('Department').agg(
    AverageSalary=('Salary', 'mean'),
    MaxDaysInCompany=('DaysInCompany', 'max')
)
print(f"\nDepartment summary (average salary and max days in company):\n{department_summary}")