# Module 10: Advanced Python Tools Assignment (Data Science)

This notebook contains solutions for all three parts of the Module 10 assignment:
1. **NumPy**
2. **Pandas**
3. **Matplotlib**

Each section includes explanations, code, and output visualizations.

## Assignment 1: Working with NumPy

**Objective:** Understand the basics of NumPy, array creation, and manipulation.

### Step 1: Import and Setup

In [None]:
import numpy as np
print('NumPy imported successfully!')

### 1. Create a 1D NumPy array with integers from 1 to 20 and perform operations

In [None]:
# Create a 1D array
arr = np.arange(1, 21)
print("Array:", arr)

# Compute statistics
print("Sum:", np.sum(arr))
print("Mean:", np.mean(arr))
print("Median:", np.median(arr))
# np.std uses the population formula (ddof=0) by default
print("Standard Deviation:", np.std(arr))

# Find indices of elements > 10
# np.where returns a tuple with one index array per dimension; [0] extracts the 1D index array
indices = np.where(arr > 10)[0]
print("Indices of elements greater than 10:", indices)
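
A note on the standard deviation computed above: `np.std` divides by N (the population formula) unless told otherwise. If the 20 values were treated as a sample rather than a full population, `ddof=1` would be the usual choice. The sketch below compares the two on the same array; the names `pop_std` and `sample_std` are illustrative and not part of the assignment.

```python
import numpy as np

arr = np.arange(1, 21)

# Population standard deviation: divides by N (NumPy's default, ddof=0)
pop_std = np.std(arr)

# Sample standard deviation: divides by N - 1
sample_std = np.std(arr, ddof=1)

print("Population std:", pop_std)
print("Sample std:", sample_std)
```

The sample value is slightly larger because the divisor is smaller.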

### 2. Create a 2D NumPy array of shape 4x4 with numbers 1–16

In [None]:
arr2d = np.arange(1, 17).reshape(4, 4)
print("4x4 Array:\n", arr2d)

# Transpose
print("\nTranspose:\n", arr2d.T)

# Row-wise and column-wise sums
print("\nRow-wise sums:", np.sum(arr2d, axis=1))
print("Column-wise sums:", np.sum(arr2d, axis=0))

### 3. Create two 3x3 arrays of random integers (1–20) and perform arithmetic operations

In [None]:
A = np.random.randint(1, 21, (3,3))
B = np.random.randint(1, 21, (3,3))

print("Matrix A:\n", A)
print("Matrix B:\n", B)

# Element-wise operations
print("\nA + B =\n", A + B)
print("\nA - B =\n", A - B)
print("\nA * B =\n", A * B)

# Dot product: for 2D arrays, np.dot performs matrix multiplication
print("\nDot Product (A.B) =\n", np.dot(A, B))
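
Because the matrices above are drawn without a seed, the printed values change on every run. One possible way to make the output repeatable is NumPy's `Generator` API with a fixed seed. This is a sketch, not the assignment's required approach, and the seed value 42 is an arbitrary assumption. It also shows that the `@` operator gives the same result as `np.dot` for 2D arrays.

```python
import numpy as np

# A seeded generator produces the same "random" matrices on every run
rng = np.random.default_rng(42)

# integers() excludes the upper bound, so 21 yields values from 1 to 20
A = rng.integers(1, 21, size=(3, 3))
B = rng.integers(1, 21, size=(3, 3))

elementwise = A * B   # multiplies matching entries
matrix_prod = A @ B   # matrix multiplication, same as np.dot(A, B) for 2D arrays

print("Element-wise product:\n", elementwise)
print("Matrix product:\n", matrix_prod)
```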

### 4. Reshape a 1D array of size 12 into a 3x4 2D array and slice first 2 rows and last 2 columns

In [None]:
arr_1d = np.arange(1, 13)
arr_2d = arr_1d.reshape(3, 4)
print("Original 3x4 array:\n", arr_2d)

# Slice first 2 rows and last 2 columns
sliced = arr_2d[:2, -2:]
print("\nSliced array (first 2 rows, last 2 columns):\n", sliced)
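
One detail worth knowing about the slice above: basic slicing returns a view onto the original array, not a copy, so writing to the slice also changes the source array. The sketch below illustrates this; the names `view` and `independent` are illustrative.

```python
import numpy as np

arr_2d = np.arange(1, 13).reshape(3, 4)

view = arr_2d[:2, -2:]                 # shares memory with arr_2d
independent = arr_2d[:2, -2:].copy()   # owns its own data

# Writing through the view changes the original array as well
view[0, 0] = 99

print("Original after modifying the view:\n", arr_2d)
print("Independent copy is unchanged:\n", independent)
```

Calling `.copy()` is the safer choice when the slice will be modified later.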

## Assignment 2: Working with Pandas

**Objective:** Learn to create and manipulate DataFrames for data analysis.

In [None]:
import pandas as pd
print('Pandas imported successfully!')

### 1. Create a DataFrame with employee data

In [None]:
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Age': [24, 27, 22, 32, 29],
    'Department': ['HR', 'Finance', 'IT', 'Marketing', 'HR'],
    'Salary': [45000, 54000, 50000, 62000, 47000]
}

df = pd.DataFrame(data)
print(df)

# Summary statistics
print("\nSummary Statistics (Age and Salary):")
print(df[['Age', 'Salary']].describe())

# Average salary of HR employees
avg_hr_salary = df[df['Department'] == 'HR']['Salary'].mean()
print("\nAverage HR Salary:", avg_hr_salary)

### 2. Add a new column 'Bonus' (10% of Salary)

In [None]:
df['Bonus'] = df['Salary'] * 0.10
print(df)

### 3. Filter employees aged between 25 and 30

In [None]:
filtered_df = df[(df['Age'] >= 25) & (df['Age'] <= 30)]
print(filtered_df)
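
The same filter can also be written with `Series.between`, which includes both endpoints by default. This is an equivalent alternative, shown here on a self-contained copy of the employee data (only the columns needed for the filter are included).

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Age': [24, 27, 22, 32, 29],
})

# between(25, 30) keeps rows where 25 <= Age <= 30
filtered = df[df['Age'].between(25, 30)]
print(filtered)
```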

### 4. Group by Department and calculate average salary

In [None]:
grouped = df.groupby('Department')['Salary'].mean()
print(grouped)
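
With only five employees, most departments contain a single person, so the group mean is just that person's salary. A possible extension, beyond what the assignment asks for, is to report the group size alongside the mean using `agg`. The variable name `summary` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Department': ['HR', 'Finance', 'IT', 'Marketing', 'HR'],
    'Salary': [45000, 54000, 50000, 62000, 47000],
})

# Compute several statistics per department in one pass
summary = df.groupby('Department')['Salary'].agg(['mean', 'count'])
print(summary)
```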

### 5. Sort by Salary and save to CSV

In [None]:
sorted_df = df.sort_values(by='Salary', ascending=True)
print(sorted_df)

# Save to CSV file
sorted_df.to_csv('sorted_employees.csv', index=False)
print("\nCSV file 'sorted_employees.csv' created successfully!")
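
To confirm that the file was written as intended, it can be read back with `pd.read_csv`. The sketch below is a self-contained round trip. It writes to a temporary directory (an assumption made here to avoid leaving files behind) instead of the working directory used above.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Age': [24, 27, 22, 32, 29],
    'Department': ['HR', 'Finance', 'IT', 'Marketing', 'HR'],
    'Salary': [45000, 54000, 50000, 62000, 47000],
})
sorted_df = df.sort_values(by='Salary', ascending=True)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'sorted_employees.csv')
    # index=False keeps the old row labels out of the file
    sorted_df.to_csv(path, index=False)
    # Read the file back before the temporary directory is removed
    loaded = pd.read_csv(path)

print(loaded)
```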

## Assignment 3: Working with Matplotlib

**Objective:** Practice basic plot types (line, bar, pie, and histogram) for presenting data clearly.

In [None]:
import matplotlib.pyplot as plt
print('Matplotlib imported successfully!')

### 1. Line Plot

In [None]:
x = [1, 2, 3, 4, 5]
y = [10, 15, 25, 30, 50]

plt.plot(x, y, marker='o', linestyle='-', color='b')
plt.title('Simple Line Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.grid(True)
plt.show()
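
The line plot above uses the `pyplot` state-based interface. The same figure can be built with Matplotlib's object-oriented interface, which keeps explicit `fig` and `ax` handles and tends to scale better once a notebook contains several plots. This is an alternative sketch, not a correction. The `Agg` backend and the in-memory PNG buffer are assumptions made so the example also runs outside a notebook; inside a notebook, `plt.show()` works as before.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [10, 15, 25, 30, 50]

# subplots() returns the figure and a single Axes object to draw on
fig, ax = plt.subplots()
ax.plot(x, y, marker='o', linestyle='-', color='b')
ax.set_title('Simple Line Plot')
ax.set_xlabel('X-axis')
ax.set_ylabel('Y-axis')
ax.grid(True)

# Render to an in-memory PNG instead of opening a window
buf = io.BytesIO()
fig.savefig(buf, format='png')
plt.close(fig)
```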

### 2. Bar Graph for Student Marks

In [None]:
students = ['John', 'Jane', 'Alice', 'Bob']
marks = [75, 85, 60, 90]

plt.bar(students, marks, color=['skyblue', 'orange', 'green', 'purple'])
plt.title('Student Marks in a Subject')
plt.xlabel('Students')
plt.ylabel('Marks')
plt.show()

### 3. Pie Chart for Revenue Distribution

In [None]:
regions = ['North America', 'Europe', 'Asia', 'Others']
revenue = [45, 25, 20, 10]

# Offset the largest slice from the centre to highlight it
explode = [0.1 if r == max(revenue) else 0 for r in revenue]

plt.pie(revenue, labels=regions, autopct='%1.1f%%', startangle=90, explode=explode)
plt.title('Revenue Distribution by Region')
plt.show()

### 4. Histogram for Frequency Distribution

In [None]:
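# Named `values` so the employee `data` dict from Assignment 2 is not overwritten.
# Relies on `np` from the NumPy import cell in Assignment 1.
values = np.random.randint(1, 101, 1000)

plt.hist(values, bins=20, color='teal', edgecolor='black')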
plt.title('Frequency Distribution of Random Integers')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()
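
With `bins=20`, the bin edges are derived from the smallest and largest values that happen to be sampled. A hedged alternative is to fix the edges explicitly so that every bin always covers exactly five integers. The sketch below computes the counts with `np.histogram`; the seed and the half-integer edges are assumptions chosen for illustration.

```python
import numpy as np

# Seeded generator so the counts are repeatable (the seed value is arbitrary)
rng = np.random.default_rng(0)
values = rng.integers(1, 101, size=1000)   # integers from 1 to 100

# Edges at 0.5, 5.5, ..., 100.5 place five integers in each of the 20 bins
edges = np.arange(0.5, 101, 5)

counts, _ = np.histogram(values, bins=edges)
print("Counts per bin:", counts)
```

The same `edges` array can be passed to `plt.hist(values, bins=edges)` to draw the plot with these fixed bins.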

### ✅ Summary
This notebook includes complete implementations of all three assignments:
1. NumPy operations
2. Pandas data manipulation
3. Matplotlib visualizations

Run the cells in order to see the outputs and plots; later cells reuse the imports and the `df` DataFrame defined in earlier ones.