# Assignment 2 - Working with Pandas

**Objective:** Learn to create and manipulate DataFrames for data analysis.

This assignment covers:
- Creating and displaying DataFrames
- Computing summary statistics
- Adding new columns
- Filtering data
- Grouping and aggregating data
- Sorting and saving data to CSV

In [None]:
# Import Pandas library
import pandas as pd

## Task 1: Create and Explore a DataFrame

Create a DataFrame with employee information and perform basic operations.

In [None]:
# Create the DataFrame with employee data
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Age': [24, 27, 22, 32, 29],
    'Department': ['HR', 'Finance', 'IT', 'Marketing', 'HR'],
    'Salary': [45000, 54000, 50000, 62000, 47000]
}

df = pd.DataFrame(data)
print("Employee DataFrame:")
print(df)
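
Before moving on to the subtasks, it can help to confirm the size of the frame and the type pandas inferred for each column. The following is a minimal sketch that rebuilds the same employee data so it runs on its own; the variable names `n_rows` and `n_cols` are illustrative and not part of the assignment.

```python
import pandas as pd

# Rebuild the employee data from Task 1 so this snippet is self-contained
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Age': [24, 27, 22, 32, 29],
    'Department': ['HR', 'Finance', 'IT', 'Marketing', 'HR'],
    'Salary': [45000, 54000, 50000, 62000, 47000],
})

# shape is a (rows, columns) tuple
n_rows, n_cols = df.shape
print(f"{n_rows} rows x {n_cols} columns")

# dtypes lists the inferred type of every column
print(df.dtypes)
```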

### Task 1a: Print the first five rows

In [None]:
# Display the first five rows (head() returns 5 rows by default)
print("First five rows of the DataFrame:")
print(df.head())

### Task 1b: Get summary statistics of 'Age' and 'Salary' columns

In [None]:
# Get summary statistics for Age and Salary columns
print("Summary statistics for Age:")
print(df['Age'].describe())

print("\nSummary statistics for Salary:")
print(df['Salary'].describe())
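
As an alternative to describing each column separately, both columns can be selected with a list and summarized in one call, which returns a single table. A sketch under the same data as Task 1; the name `summary` is only illustrative.

```python
import pandas as pd

# Only the numeric columns are needed here; values match Task 1
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Age': [24, 27, 22, 32, 29],
    'Salary': [45000, 54000, 50000, 62000, 47000],
})

# Passing a list of column names returns a DataFrame,
# so describe() produces one column of statistics per selected column
summary = df[['Age', 'Salary']].describe()
print(summary)

# Individual statistics can be read back by row label and column name
print("Mean age:", summary.loc['mean', 'Age'])
```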

### Task 1c: Calculate the average salary of employees in the 'HR' department

In [None]:
# Select HR salaries with a boolean mask via .loc, then average them
hr_avg_salary = df.loc[df['Department'] == 'HR', 'Salary'].mean()
print(f"Average salary of employees in HR department: ${hr_avg_salary:,.2f}")

## Task 2: Add a New Column

Add a 'Bonus' column equal to 10% of each employee's salary.

In [None]:
# Add a new column 'Bonus' which is 10% of Salary
df['Bonus'] = df['Salary'] * 0.10

print("DataFrame with Bonus column:")
print(df)
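
Assigning to `df['Bonus']` changes `df` in place. If the original frame should stay untouched, `DataFrame.assign` returns a new frame with the extra column instead. A minimal sketch, assuming the same salaries as Task 1; `df_with_bonus` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Salary': [45000, 54000, 50000, 62000, 47000],
})

# assign() returns a copy with the new column; df itself is not modified
df_with_bonus = df.assign(Bonus=df['Salary'] * 0.10)

print(df_with_bonus)
print("Original columns:", list(df.columns))
```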

## Task 3: Filter the DataFrame

Filter the DataFrame to show employees aged 25 to 30, with both endpoints included.

In [None]:
# Filter employees aged between 25 and 30 (inclusive)
filtered_df = df[(df['Age'] >= 25) & (df['Age'] <= 30)]

print("Employees aged between 25 and 30:")
print(filtered_df)
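
The same inclusive range check can be written with `Series.between`, which includes both endpoints by default. A sketch using the Task 1 ages; `age_mask` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Age': [24, 27, 22, 32, 29],
})

# between() builds a boolean mask that is True where 25 <= Age <= 30
age_mask = df['Age'].between(25, 30)
filtered_df = df[age_mask]

print(filtered_df)
```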

## Task 4: Group by Department

Group the data by 'Department' and calculate the average salary for each department.

In [None]:
# Group by Department and calculate average salary
dept_avg_salary = df.groupby('Department')['Salary'].mean()

print("Average salary by Department:")
print(dept_avg_salary)

# Alternative: Show as a DataFrame with better formatting
print("\nAverage salary by Department (as DataFrame):")
dept_stats = df.groupby('Department')['Salary'].mean().reset_index(name='Average Salary')
print(dept_stats)
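
An average alone hides how many employees it is based on. As a sketch, several aggregates can be computed in one pass with `agg`; the output column names `average_salary` and `headcount` are chosen here for illustration and are not part of the assignment.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Department': ['HR', 'Finance', 'IT', 'Marketing', 'HR'],
    'Salary': [45000, 54000, 50000, 62000, 47000],
})

# Named aggregation: each keyword becomes an output column,
# and its value names the aggregate function to apply
dept_summary = df.groupby('Department')['Salary'].agg(
    average_salary='mean',
    headcount='count',
)

print(dept_summary)
```

With this data, only HR has more than one employee, so every other department's average is simply that one person's salary.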

## Task 5: Sort and Save to CSV

Sort the DataFrame by 'Salary' in ascending order and save the result to a new CSV file.

In [None]:
# Sort the DataFrame by Salary in ascending order
sorted_df = df.sort_values(by='Salary', ascending=True)

print("DataFrame sorted by Salary (ascending):")
print(sorted_df)

# Save to CSV file
sorted_df.to_csv('employees_sorted_by_salary.csv', index=False)
print("\nDataFrame saved to 'employees_sorted_by_salary.csv'")
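
One way to check the export is to read the file back with `pd.read_csv` and compare it with what was written. The sketch below rebuilds the Task 1 data (without the 'Bonus' column) and writes to a temporary directory, an assumption made here so the check does not overwrite the assignment's output file.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Age': [24, 27, 22, 32, 29],
    'Department': ['HR', 'Finance', 'IT', 'Marketing', 'HR'],
    'Salary': [45000, 54000, 50000, 62000, 47000],
})
sorted_df = df.sort_values(by='Salary', ascending=True)

# Write to a throwaway directory, then load the file again
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, 'employees_sorted_by_salary.csv')
    sorted_df.to_csv(csv_path, index=False)
    loaded_df = pd.read_csv(csv_path)

# Because index=False was used, the reloaded frame gets a fresh 0..4 index
print(loaded_df)
```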

## Summary

This assignment covered:
- Creating DataFrames from dictionaries
- Displaying and exploring DataFrames with `head()` and `describe()`
- Filtering data based on conditions
- Adding new columns with computed values
- Grouping data and calculating aggregates
- Sorting DataFrames
- Saving DataFrames to CSV files

Together, these operations cover a typical pandas workflow for structured data: build a DataFrame, inspect it, transform and summarize it, and export the result.