To practice the questions in this Colab notebook, choose one of the following options:

1. **Download the File:**
   - Once the notebook is open, go to the "File" menu at the top left.
   - Select "Download", then "Download .py" from the dropdown menu.
   - The notebook will be downloaded in Python script format (`.py`).
2. **Copy and Create New Google Colab File:**
   - Copy the practice question you want to work on.
   - Open a new Google Colab notebook.
   - Create a new code cell by clicking the "+ Code" button.
   - Paste the copied question into the code cell.


# ***Python***

**Functions**

- **`Functions`**: Functions are blocks of code that perform a specific task. They take input, process it, and return an output. Functions help in organizing code, promoting reusability, and making the code more modular.


https://docs.python.org/3/tutorial/controlflow.html#defining-functions

In [None]:
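# Function to calculate the factorial of a non-negative integer (recursive)
def factorial(n):
    if n < 0:
        # Without this check a negative input would recurse forever
        raise ValueError("factorial is not defined for negative numbers")
    if n == 0 or n == 1:
        return 1
    else:
        return n * factorial(n - 1)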

num = int(input("Enter a number: "))
result = factorial(num)
print("Factorial of", num, "is", result)

Enter a number: 5
Factorial of 5 is 120
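
Functions can also give their parameters default values, so callers only pass what they want to change. The sketch below is an illustration only; the `greet` function and its `greeting` parameter are hypothetical names that do not appear elsewhere in this notebook.

```python
def greet(name, greeting="Hello"):
    """Return a greeting for the given name."""
    return f"{greeting}, {name}!"

print(greet("Asha"))                       # uses the default greeting
print(greet("Asha", greeting="Welcome"))   # overrides the default
```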


Practice Question : Write a function that takes two arguments and returns their sum.

**If-Else**

- **`If-Else Statements`**: If-else statements are used to make decisions in code. They allow you to execute different blocks of code based on whether a condition is true or false. If the condition is true, the code in the "if" block is executed; otherwise, the code in the "else" block is executed.



https://docs.python.org/3/tutorial/controlflow.html#if-statements

In [None]:
# Check if a person is eligible to vote
age = int(input("Enter your age: "))

if age >= 18:
    print("You are eligible to vote.")
else:
    print("You are not eligible to vote.")
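
When there are more than two outcomes, `elif` branches can be chained between `if` and `else`; Python runs the first branch whose condition is true. This is a minimal sketch, and the function name and temperature thresholds are made up for illustration.

```python
def describe_temperature(celsius):
    """Classify a temperature using an if-elif-else chain."""
    if celsius < 0:
        return "freezing"
    elif celsius < 20:
        return "cool"
    else:
        return "warm"

print(describe_temperature(-5))
print(describe_temperature(12))
print(describe_temperature(30))
```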



Practice Question : Write a program that checks if a given number is even or odd using if-else statements.

**Loops**


- **`Loops`**: Loops execute a block of code repeatedly, either once per item in a sequence (`for`) or for as long as a condition holds (`while`). They are essential for automating repetitive tasks and iterating over data structures.

In [None]:

# Function to calculate the sum of numbers from 1 to n using a loop
def calculate_sum(n):
    total = 0
    for i in range(1, n + 1):
        total += i
    return total

# Input from the user
num = int(input("Enter a number: "))

# Call the function and print the result
result = calculate_sum(num)
print("Sum of numbers from 1 to", num, "is", result)

Enter a number: 5
Sum of numbers from 1 to 5 is 15
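
A `for` loop suits a known range, as above. When the number of iterations is not known in advance, a `while` loop repeats until its condition becomes false. The `count_digits` helper below is a hypothetical example, not part of the original notebook.

```python
def count_digits(n):
    """Count the decimal digits of a non-negative integer with a while loop."""
    count = 1
    while n >= 10:
        n //= 10      # drop the last digit
        count += 1
    return count

print(count_digits(7))
print(count_digits(12345))
```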


Practice Question: Write a Python loop that prints the first 10 positive integers in ascending order.


**Match-Case**

- **`Match-Case`**: The `match` statement (available from Python 3.10) compares an expression's value against a sequence of patterns. Each `case` keyword is followed by a value or pattern, and the code block of the first matching `case` executes. If none of the cases match, the wildcard `_` case, when present, acts as the default. This makes pattern-based branching more concise than a long `if`/`elif` chain.


https://www.geeksforgeeks.org/python-match-case-statement/

In [None]:
# Using the match statement (requires Python 3.10 or later)

def identify_day(day_number):
    match day_number:
        case 1:
            day_name = "Sunday"
        case 2:
            day_name = "Monday"
        case 3:
            day_name = "Tuesday"
        case 4:
            day_name = "Wednesday"
        case 5:
            day_name = "Thursday"
        case 6:
            day_name = "Friday"
        case 7:
            day_name = "Saturday"
        case _:
            day_name = "Invalid day"
    return day_name

user_input = int(input("Enter a day number (1-7): "))
print(identify_day(user_input))
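
Because `match` needs Python 3.10 or later, older interpreters need a different approach. One common alternative, sketched here as an assumption rather than as part of the notebook, is a dictionary lookup with a default value; `DAY_NAMES` and `identify_day_with_dict` are hypothetical names.

```python
DAY_NAMES = {
    1: "Sunday", 2: "Monday", 3: "Tuesday", 4: "Wednesday",
    5: "Thursday", 6: "Friday", 7: "Saturday",
}

def identify_day_with_dict(day_number):
    """Return the day name, or "Invalid day" when the number is not 1-7."""
    return DAY_NAMES.get(day_number, "Invalid day")

print(identify_day_with_dict(3))
print(identify_day_with_dict(9))
```

The dictionary only covers exact-value lookups; `match` remains the better fit when cases involve structural patterns.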

Practice Question : Use match-case to determine the type of a value (integer, float, or string).
```python
value = 42
```




# ***Numpy***

**1. Array Creation**

- `np.array`: Creates a NumPy array, a multi-dimensional, homogeneous data structure: every element shares a single data type (`dtype`), which can be numeric, boolean, string, and so on.

https://numpy.org/doc/stable/reference/generated/numpy.array.html

In [None]:
import numpy as np

# Create a NumPy array
arr = np.array([1, 2, 3, 4, 5])
print("NumPy Array:", arr)

Practice Question : Create a NumPy array containing the values [1, 2, 3, 4, 5].

- `np.zeros`: Generates an array filled with zeros of a specified shape.

https://numpy.org/doc/stable/reference/generated/numpy.zeros.html

In [None]:
# Create an array of zeros
zeros_arr = np.zeros((3, 4))  # 3 rows, 4 columns
print("Zeros Array:")
print(zeros_arr)

Practice Question : Create a 2x3 NumPy array filled with zeros.


- `np.ones`: Creates an array filled with ones of a specified shape.

https://numpy.org/doc/stable/reference/generated/numpy.ones.html


In [None]:
# Create an array of ones
ones_arr = np.ones((2, 3))  # 2 rows, 3 columns
print("Ones Array:")
print(ones_arr)

Practice Question : Create a 3x2 NumPy array filled with ones.

- `np.random.rand`: Generates random numbers in a specified shape from a uniform distribution over [0, 1).

https://numpy.org/doc/stable/reference/random/generated/numpy.random.rand.html

In [None]:

# Generate random numbers between 0 and 1
random_arr = np.random.rand(3, 2)  # 3 rows, 2 columns
print("Random Array:")
print(random_arr)


Practice Question : Generate a 1D NumPy array with 5 random values between 0 and 1.

**2. Array Operations**

In [None]:
# Create two NumPy arrays
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

- `np.add`: Element-wise addition of two arrays or a scalar and an array.

https://numpy.org/doc/stable/reference/generated/numpy.add.html

In [None]:
# Perform element-wise addition
add_result = np.add(arr1, arr2)
print("Addition Result:", add_result)


Practice Question : Add two NumPy arrays element-wise.
```python
array1 = np.array([16, 27, 83])
array2 = np.array([40, 15, 26])
```


- `np.subtract`: Element-wise subtraction of one array from another or a scalar from an array.

https://numpy.org/doc/stable/reference/generated/numpy.subtract.html

In [None]:
# Perform element-wise subtraction
subtract_result = np.subtract(arr2, arr1)
print("Subtraction Result:", subtract_result)
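
The arithmetic functions in this section have operator equivalents, and a scalar operand is applied to every element. The snippet below is a small sketch of both points using illustrative arrays `a` and `b`.

```python
import numpy as np

a = np.array([4, 5, 6])
b = np.array([1, 2, 3])

diff_func = np.subtract(a, b)   # function form
diff_op = a - b                 # operator form, same result
shifted = a - 1                 # the scalar 1 is subtracted from every element

print(diff_func, diff_op, shifted)
```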

Practice Question: Subtract one NumPy array from another element-wise.
```python
array1 = np.array([4, 5, 6])
array2 = np.array([1, 2, 3])
```


- `np.multiply`: Element-wise multiplication of two arrays or a scalar and an array.

https://numpy.org/doc/stable/reference/generated/numpy.multiply.html

In [None]:
# Perform element-wise multiplication
multiply_result = np.multiply(arr1, arr2)
print("Multiplication Result:", multiply_result)

Practice Question: Multiply two NumPy arrays element-wise.
```python
array1 = np.array([2, 3, 4])
array2 = np.array([5, 6, 7])
```


- `np.divide`: Element-wise division of one array by another or a scalar by an array.

https://numpy.org/doc/stable/reference/generated/numpy.divide.html

In [None]:
# Perform element-wise division
divide_result = np.divide(arr2, arr1)
print("Division Result:", divide_result)

Practice Question : Divide one NumPy array by another element-wise.
```python
array1 = np.array([10, 20, 30])
array2 = np.array([2, 4, 6])
```


- `np.sqrt`: Computes the element-wise square root of an array.

https://numpy.org/doc/stable/reference/generated/numpy.sqrt.html

In [None]:
# Compute element-wise square root
sqrt_result = np.sqrt(arr1)
print("Square Root Result:", sqrt_result)

Practice Question : Calculate the square root of a NumPy array's elements.
```python
array = np.array([9, 16, 25])
```


**3. Array Attributes**

In [None]:
import numpy as np

# Create a 2D array
arr = np.array([[1, 2, 3], [4, 5, 6]])

- `ndarray.shape`: Returns a tuple representing the dimensions of the array.

https://numpy.org/doc/stable/reference/generated/numpy.ndarray.shape.html

In [None]:
# Get the shape of the array
shape = arr.shape
print("Array Shape:", shape)

Practice Question : Get the shape of a 2D NumPy array.
```python
array = np.array([[1, 2, 3], [4, 5, 6]])
```


- `ndarray.size`: Returns the total number of elements in the array.

https://numpy.org/doc/stable/reference/generated/numpy.ndarray.size.html

In [None]:
# Get the total number of elements
size = arr.size
print("Array Size:", size)

Practice Question : Find the number of elements in a 1D NumPy array.
```python
array = np.array([10, 20, 30, 40, 50])
```


- `ndarray.ndim`: Returns the number of dimensions (axes) of the array.

https://numpy.org/doc/stable/reference/generated/numpy.ndarray.ndim.html

In [None]:
# Get the number of dimensions
ndim = arr.ndim
print("Number of Dimensions:", ndim)

Practice Question : Determine the number of dimensions in a 3D NumPy array.
```python
array = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
```


- `ndarray.dtype`: Returns the data type of the elements in the array.

https://numpy.org/doc/stable/reference/generated/numpy.ndarray.dtype.html

In [None]:
# Get the data type of the elements
dtype = arr.dtype
print("Data Type:", dtype)

Practice Question : Get the data type of elements in a NumPy array.
```python
array = np.array([1.5, 2.5, 3.5])
```


**4. Indexing and Slicing:**

- Indexing (`ndarray[index]`): Allows you to access a specific element in a NumPy ndarray using its index value. The index can be an integer or a tuple of integers for multidimensional arrays.

https://numpy.org/doc/stable/reference/arrays.indexing.html

In [None]:
# Create a 1D array
arr_1d = np.array([1, 2, 3, 4, 5])

# Indexing: Access a specific element
element = arr_1d[2]  # Access the element at index 2 (3rd element)
print("Indexed Element:", element)

# Create a 2D array
arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Indexing: Access an element in a 2D array
element_2d = arr_2d[1, 2]  # Access the element at row 1, column 2 (6)
print("Indexed 2D Element:", element_2d)
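
Indices can also be negative, counting from the end, and a single index on a 2D array selects a whole row. The following sketch illustrates both with a hypothetical `grid` array.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

last_row = grid[-1]          # negative indices count from the end
last_value = grid[-1, -1]    # last row, last column
second_row = grid[1]         # a single index on a 2D array selects a whole row

print(last_row, last_value, second_row)
```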



Practice Question : Get the element at index 2 from a NumPy array.
```python
array = np.array([10, 20, 30, 40, 50])
```


- Slicing (`ndarray[start:end]`): Allows you to extract a portion of a NumPy ndarray using a range of indices. The `start` index is inclusive, and the `end` index is exclusive. Basic slicing returns a view onto the original array rather than a copy, so modifying the slice also modifies the original; call `.copy()` when an independent array is needed.

https://numpy.org/doc/stable/reference/arrays.indexing.html

In [None]:
# Slicing: Extract a portion of a 1D array
sliced_1d = arr_1d[1:4]  # Extract elements at indices 1 to 3 (index 4 is excluded)
print("Sliced 1D Array:", sliced_1d)

# Slicing: Extract a portion of a 2D array
sliced_2d = arr_2d[0:2, 1:3]  # Extract rows 0 to 1 and columns 1 to 2
print("Sliced 2D Array:")
print(sliced_2d)

Practice Question : Extract elements from index 1 to 3 (inclusive) from a NumPy array.
```python
array = np.array([10, 20, 30, 40, 50])
```
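
A point worth checking when slicing: a basic slice in NumPy shares memory with the array it came from. The sketch below, using illustrative variable names, shows a write through a slice changing the original and an explicit `.copy()` avoiding that.

```python
import numpy as np

original = np.array([10, 20, 30, 40, 50])

view = original[1:4]            # basic slicing returns a view
view[0] = 99                    # writes through to `original`

independent = original[1:4].copy()   # an explicit copy owns its data
independent[1] = -1                  # `original` is unaffected

print(original)
print(np.shares_memory(original, view), np.shares_memory(original, independent))
```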


**5. Aggregation and Statistics:**

- `np.sum`: Computes the sum of array elements along a specified axis or over the entire array.

https://numpy.org/doc/stable/reference/generated/numpy.sum.html

In [None]:
# Create a 1D array
arr = np.array([4, 2, 9, 5, 1, 8, 6, 3, 7])

# Compute the sum of array elements
sum_result = np.sum(arr)
print("Sum:", sum_result)
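
The aggregation functions in this section accept an `axis` argument. As a minimal sketch on a hypothetical 2D `matrix`, `axis=0` collapses the rows (one result per column) and `axis=1` collapses the columns (one result per row).

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])

total = np.sum(matrix)                # all elements
column_sums = np.sum(matrix, axis=0)  # collapse rows: one sum per column
row_sums = np.sum(matrix, axis=1)     # collapse columns: one sum per row

print(total, column_sums, row_sums)
```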

Practice Question : Calculate the sum of elements in a NumPy array.
```python
array = np.array([10, 20, 30, 40, 50])
```


- `np.mean`: Calculates the mean (average) of array elements along a specified axis or over the entire array.

https://numpy.org/doc/stable/reference/generated/numpy.mean.html

In [None]:
# Calculate the mean of array elements
mean_result = np.mean(arr)
print("Mean:", mean_result)

Practice Question : Calculate the mean (average) of elements in a NumPy array.
```python
array = np.array([15, 25, 35, 45, 55])
```


- `np.median`: Computes the median (middle value) of array elements along a specified axis or over the entire array.

https://numpy.org/doc/stable/reference/generated/numpy.median.html

In [None]:
# Calculate the median of array elements
median_result = np.median(arr)
print("Median:", median_result)

Practice Question : Find the median of elements in a NumPy array.
```python
array = np.array([8, 10, 12, 14, 16])
```


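- `np.std`: Calculates the standard deviation of array elements along a specified axis or over the entire array. By default this is the population standard deviation (`ddof=0`); pass `ddof=1` for the sample standard deviation.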

https://numpy.org/doc/stable/reference/generated/numpy.std.html

In [None]:
# Calculate the standard deviation of array elements
std_result = np.std(arr)
print("Standard Deviation:", std_result)

Practice Question : Calculate the standard deviation of elements in a NumPy array.
```python
array = np.array([5, 10, 15, 20, 25])
```


- `np.min`: Finds the minimum value among array elements along a specified axis or over the entire array.

https://numpy.org/doc/stable/reference/generated/numpy.min.html

In [None]:
# Find the minimum value in the array
min_result = np.min(arr)
print("Minimum:", min_result)

Practice Question : Find the minimum value in a NumPy array.
```python
array = np.array([18, 12, 9, 23, 15])
```


- `np.max`: Finds the maximum value among array elements along a specified axis or over the entire array.

https://numpy.org/doc/stable/reference/generated/numpy.max.html

In [None]:
# Find the maximum value in the array
max_result = np.max(arr)
print("Maximum:", max_result)

Practice Question : Find the maximum value in a NumPy array.
```python
array = np.array([32, 45, 27, 54, 38])
```


**6. Array Manipulation:**

- `ndarray.reshape()`: Reshapes an ndarray to a specified new shape while maintaining the original data. The total number of elements must remain unchanged.

https://numpy.org/doc/stable/reference/generated/numpy.ndarray.reshape.html

In [None]:
import numpy as np

# Create a 2D array
arr = np.array([[1, 2, 3], [4, 5, 6]])

# Reshape the array to a different shape
reshaped_arr = arr.reshape(3, 2)  # Reshaping to 3 rows, 2 columns
print("Reshaped Array:")
print(reshaped_arr)
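
Two details of `reshape` are easy to miss: one dimension may be given as `-1` so NumPy infers it, and a shape whose element count does not match raises `ValueError`. The sketch below uses a hypothetical `flat` array to show both.

```python
import numpy as np

flat = np.arange(12)             # values 0..11

inferred = flat.reshape(3, -1)   # -1 lets NumPy infer the 4 columns
print(inferred.shape)

try:
    flat.reshape(5, 3)           # 15 slots for 12 elements: not allowed
except ValueError as err:
    print("reshape failed:", err)
```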

Practice Question : Reshape a 1D NumPy array into a 2D array with 3 rows and 2 columns.
```python
array = np.array([1, 2, 3, 4, 5, 6])
```


- `ndarray.transpose()`: Transposes the dimensions of an ndarray, effectively flipping rows and columns.

https://numpy.org/doc/stable/reference/generated/numpy.ndarray.transpose.html

In [None]:
# Transpose the array
transposed_arr = arr.transpose()  # Transpose rows and columns
print("Transposed Array:")
print(transposed_arr)

Practice Question : Transpose a 2D NumPy array (swap rows and columns).
```python
array = np.array([[1, 2, 3], [4, 5, 6]])
```


**7. Loading and Saving Data**

- `np.save`: Saves a single ndarray to a binary file with the `.npy` extension, preserving the array's data, shape, and dtype.

https://numpy.org/doc/stable/reference/generated/numpy.save.html

In [None]:
# Create an ndarray
arr = np.array([1, 2, 3, 4, 5])

# Save the ndarray to a file
np.save('saved_array.npy', arr)

print("Original Array:", arr)

- `np.load`: Loads and returns an ndarray from a `.npy` file created with `np.save`. When given a `.npz` archive (created with `np.savez`), it returns a dictionary-like object of ndarrays.

https://numpy.org/doc/stable/reference/generated/numpy.load.html

In [None]:
# Load the saved ndarray from the file
loaded_arr = np.load('saved_array.npy')

print("Loaded Array:", loaded_arr)
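
To store several arrays in one file, NumPy provides `np.savez`, which writes a `.npz` archive that `np.load` opens as a dictionary-like object. This is a minimal sketch; the array names, the `bundle.npz` file name, and the use of a temporary directory are assumptions made for the example.

```python
import os
import tempfile

import numpy as np

temps = np.array([20.5, 21.0, 19.8])
counts = np.array([3, 1, 4])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "bundle.npz")
    np.savez(path, temps=temps, counts=counts)   # keyword names become keys

    with np.load(path) as bundle:                # dictionary-like NpzFile
        loaded_temps = bundle["temps"]
        loaded_counts = bundle["counts"]

print(loaded_temps, loaded_counts)
```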

**8. Handling Missing Data:**

- `np.nan`: Represents a floating-point "Not a Number" value, often used to indicate missing or undefined data in arrays.

https://numpy.org/doc/stable/reference/constants.html#numpy.nan

In [None]:
# Create an array with NaN values
arr_with_nan = np.array([1.0, np.nan, 3.0, np.nan, 5.0])

Practice Question : Create a NumPy array with a NaN (Not a Number) value.

- `np.isnan`: Returns a boolean array indicating which elements of an array are NaN (Not a Number).


https://numpy.org/doc/stable/reference/generated/numpy.isnan.html

In [None]:
# Check for NaN values using np.isnan
nan_mask = np.isnan(arr_with_nan)
print("NaN Mask:", nan_mask)
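
NaN values propagate through ordinary aggregations, which is why the mask matters. The sketch below, on a hypothetical `readings` array, compares `np.mean` with the NaN-aware `np.nanmean` and uses the inverted mask to keep only valid values.

```python
import numpy as np

readings = np.array([1.0, np.nan, 3.0, np.nan, 5.0])

plain_mean = np.mean(readings)               # NaN propagates
nan_aware_mean = np.nanmean(readings)        # ignores NaN: (1 + 3 + 5) / 3
valid_only = readings[~np.isnan(readings)]   # boolean mask drops NaN

print(plain_mean, nan_aware_mean, valid_only)
```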

Practice Question : Check for NaN values in a NumPy array.
```python
array = np.array([1, 2, np.nan, 4, 5])
```


- `np.where`: Returns the indices where a specified condition is met in an array, or returns values from two arrays based on a condition.

https://numpy.org/doc/stable/reference/generated/numpy.where.html

In [None]:
# Create an array with conditions
condition = np.array([True, False, True, False, True])

# Use np.where to get indices where the condition is True
indices = np.where(condition)
print("Indices where condition is True:", indices)

# Use np.where to get values based on a condition
values = np.where(condition, arr_with_nan, 0)
print("Values based on condition:", values)

Practice Question : Find the indices of non-NaN values in a NumPy array.
```python
array = np.array([1, 2, np.nan, 4, 5])
```


# ***Pandas***


**1. DataFrame Creation:**

- `pd.DataFrame()`: Creates a two-dimensional tabular data structure (DataFrame) in pandas, where data is organized in rows and columns. You can use it to manually structure and input data.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html

In [None]:
import pandas as pd

# Creating a DataFrame from a dictionary
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 28],
        'City': ['New York', 'Los Angeles', 'Chicago']}

df = pd.DataFrame(data)
print(df)

- `pd.read_csv()`: Reads data from a CSV file and constructs a DataFrame. By default the file's first row is treated as the header and supplies the column names, while each column's data type is inferred from its values.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html


In [None]:
df = pd.read_csv('File name')  # replace 'File name' with the path to your CSV file
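
To try `pd.read_csv()` without a file on disk, CSV text can be wrapped in `io.StringIO`, which `read_csv` accepts in place of a path. The CSV content below is made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = "Name,Age,City\nAlice,25,New York\nBob,30,Los Angeles\n"

people = pd.read_csv(io.StringIO(csv_text))
print(people)
print(people.dtypes)   # column types are inferred from the values
```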

- **Practice Question**

In [None]:
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [25, 30, 22, 28, 24],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Miami']
}

Practice Question : Create a DataFrame using the data above.

**2. Data Cleaning**

In [None]:
data = {'Product': ['A', 'B', 'C', 'D'],
        'Sales': [100, None, 150, 200]}

df = pd.DataFrame(data)

In [None]:
# Table Name - Student Grades

dataset = {
    'Name': ['Alice', 'Bob', 'Carol', 'David', 'Alice'],
    'Math': [85, 70, 92, 64, 78],
    'English': [90, 78, 88, 72, 85],
    'Science': [88, 82, 95, 68, 80]
}

* `df.rename()`: Renames columns or index labels of a DataFrame.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rename.html



In [None]:
# Rename columns
df_renamed = df.rename(columns={'Product': 'Item', 'Sales': 'Revenue'})

print(df_renamed)

**Practice Question :**
Rename the columns of the "Student Grades" table to 'Student', 'Math Score', 'English Score', and 'Science Score'. Display the modified table.

* `df.replace()`: Replaces specified values in a DataFrame with new values.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.replace.html

In [None]:
import numpy as np

# Replace missing values (a None in a numeric column is stored as NaN)
df_replaced = df.replace(np.nan, 0)

**Practice Question :**
Replace all occurrences of 'Alice' with 'Alicia' in the 'Name' column. Display the modified table.

* `df.dropna()`: Removes rows with missing values from a DataFrame.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html

In [None]:
# Drop rows with missing values
df_dropped = df.dropna()

* `df.fillna()`: Fills missing values in a DataFrame with specified values or using specific methods.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html




In [None]:
# Fill missing values with a specific value
df_filled = df.fillna(0)

**Practice Question :**
Fill in the missing values in the 'Math' column with the average math score (rounded to the nearest whole number). Display the modified table.

* `df.drop_duplicates()`: Removes duplicate rows from a DataFrame.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop_duplicates.html

In [None]:
# Drop duplicate rows
df_duplicates_removed = df.drop_duplicates()

**Practice Question :**
Remove duplicate rows from the "Student Grades" table based on the student's name. Display the modified table.

* `df.astype()`: Converts the data type of one or more columns in a DataFrame.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.astype.html

In [None]:
# Convert 'Sales' column to float data type
df_converted = df.astype({'Sales': float})

- `df.isnull()`: Returns a DataFrame of the same shape as the input, with Boolean values indicating the presence of missing values.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.isnull.html

In [None]:
# Check for missing values
missing_values_check = df.isnull()
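
The Boolean frame returned by `isnull()` is usually summarized rather than read cell by cell. As a sketch, summing it counts missing values per column, and `any(axis=1)` flags rows containing at least one; the `sales` frame mirrors the example data of this section.

```python
import pandas as pd

sales = pd.DataFrame({"Product": ["A", "B", "C", "D"],
                      "Sales": [100, None, 150, 200]})

missing_per_column = sales.isnull().sum()               # True counts as 1
rows_with_missing = sales[sales.isnull().any(axis=1)]   # rows with any missing cell

print(missing_per_column)
print(rows_with_missing)
```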

**Practice Question :**
Create a new DataFrame that shows whether each cell in the "Student Grades" table is null (True) or not (False). Display the DataFrame.


**3. Data Manipulation**

In [None]:
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [22, 21, 23],
        'Grade': ['A', 'B', 'B']}

df = pd.DataFrame(data)

In [None]:
# Practice question table
dataset = {
    'Name': ['Alice', 'Bob', 'Carol', 'David', 'Emily'],
    'Age': [25, 30, 22, 28, 24],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Miami']
}


- `df.loc[]`: Allows you to access a group of rows and columns by labels or a boolean array along both axes.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html

In [None]:
# Using df.loc[] to access rows and columns by labels
alice_data = df.loc[0]  # Access data for Alice
grades = df.loc[:, 'Grade']  # Access 'Grade' column

print(alice_data)

print(grades)
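
The "boolean array" form of `df.loc[]` is worth a quick illustration: a comparison on a column yields a Boolean Series, and `loc` keeps the rows where it is True. The `students` frame below repeats this section's example data, and the filter on 'Grade' is an arbitrary choice.

```python
import pandas as pd

students = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"],
                         "Age": [22, 21, 23],
                         "Grade": ["A", "B", "B"]})

is_b = students["Grade"] == "B"                   # Boolean Series
b_students = students.loc[is_b, ["Name", "Age"]]  # rows by mask, columns by label

print(b_students)
```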

Practice Question : Select the rows where the 'Age' column is greater than 25 from the practice table above.

- `df.iloc[]`: Provides integer-based indexing to access rows and columns of a DataFrame.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iloc.html

In [None]:
# Using df.iloc[] to access rows and columns by integer indexing
first_row = df.iloc[0]  # Access the first row
ages = df.iloc[:, 1]  # Access the 'Age' column

print(first_row)
print(ages)

Practice Question : Select the data in the second row and third column from the practice table above.

- `df.apply()`: Applies a function along an axis of the DataFrame.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html

In [None]:
# Using Series.apply() to apply a function to each value of the 'Age' column
def add_years(age):
    return age + 1

df['Age_Plus_One'] = df['Age'].apply(add_years)  # Apply the function to 'Age' column

print(df)
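
Applying "along an axis" becomes visible when the function needs several columns at once. In the sketch below, `axis=1` passes each row to the function as a Series; the `orders` frame and the `line_total` helper are hypothetical.

```python
import pandas as pd

orders = pd.DataFrame({"Price": [2.5, 4.0, 1.0],
                       "Quantity": [4, 1, 10]})

def line_total(row):
    """Compute price times quantity for one row."""
    return row["Price"] * row["Quantity"]

orders["Total"] = orders.apply(line_total, axis=1)   # axis=1: one call per row
print(orders)
```

For simple arithmetic like this, the vectorized form `orders["Price"] * orders["Quantity"]` gives the same result and is usually faster; `apply` is most useful when the per-row logic is more involved.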

Practice Question : Calculate the total score (sum of Math, English, and Science) for each student in the given DataFrame.
```python
dataset = {
    'Name': ['Alice', 'Bob', 'Carol', 'David', 'Emily'],
    'Math': [85, 70, 92, 64, 78],
    'English': [90, 78, 88, 72, 85],
    'Science': [88, 82, 95, 68, 80]
}
```



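- `Series.map()`: Maps each value of a Series (such as a single DataFrame column) through a function, dictionary, or Series. `DataFrame.map()`, available from pandas 2.1, applies a function element-wise to a whole DataFrame.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.map.html

In [None]:
# Using Series.map() to map each value of a column through a dictionary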
grade_mapping = {'A': 'Excellent', 'B': 'Good'}
df['Grade_Description'] = df['Grade'].map(grade_mapping)  # Map grades to descriptions

print(df['Grade_Description'])

Practice Question : Replace the city names with their corresponding city codes in the 'City' column of the given DataFrame.
```python
dataset = {
    'Name': ['Alice', 'Bob', 'Carol', 'David', 'Emily'],
    'Age': [25, 30, 22, 28, 24],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Miami']
}
```


**4. Sorting and Aggregating**

In [None]:
data = {'Product': ['A', 'B', 'A', 'B', 'A'],
        'Sales': [100, 150, 200, 120, 180],
        'Region': ['East', 'West', 'East', 'West', 'East']}

df = pd.DataFrame(data)

In [None]:
# practice question table
dataset = {
    'Name': ['Alice', 'Bob', 'Carol', 'David', 'Emily'],
    'Age': [25, 30, 22, 28, 24],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Miami']
}


- `df.sort_values()`: Sorts the DataFrame rows based on specified column(s) values.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_values.html

In [None]:
# Sort the DataFrame by 'Sales' in ascending order
df_sorted = df.sort_values(by='Sales')

print(df_sorted)

Practice Question : Sort the DataFrame by the 'Age' column in ascending order.

- `df.groupby()`: Groups the DataFrame rows based on unique values in one or more columns.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html

In [None]:
# Group the DataFrame by 'Product'
grouped = df.groupby('Product')

Practice Question : Group the DataFrame by the 'City' column and calculate the mean age and maximum age for each city.

- `group.agg()`: Performs aggregation operations on groups created using `groupby()`.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.GroupBy.agg.html

In [None]:
# Aggregate sales data for each product group
agg_results = grouped['Sales'].agg(['sum', 'mean'])

print(agg_results)
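
When the result columns need descriptive names, pandas also supports named aggregation, where each keyword argument maps an output name to a `(column, function)` pair. The output names `total_sales` and `average_sales` below are arbitrary choices for this sketch.

```python
import pandas as pd

sales = pd.DataFrame({"Product": ["A", "B", "A", "B", "A"],
                      "Sales": [100, 150, 200, 120, 180]})

summary = sales.groupby("Product").agg(
    total_sales=("Sales", "sum"),
    average_sales=("Sales", "mean"),
)
print(summary)
```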

- `df.reset_index()`: Resets the index of the DataFrame to the default integer index. The old index is moved into regular columns unless `drop=True` is passed.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.reset_index.html

In [None]:
# Reset the index of the grouped DataFrame
agg_results_reset = agg_results.reset_index()

Practice Question : Reset the index of the grouped DataFrame from the `df.groupby()` practice question to the default index.

**5. Data Visualization**

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

data = {'Category': ['A', 'B', 'C', 'D', 'E'],
        'Value': [10, 20, 15, 25, 30]}

df = pd.DataFrame(data)


- `df.plot()`: Creates various types of plots directly from a DataFrame using the `matplotlib` library.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.html

In [None]:
# Create a line plot using df.plot()
df.plot(x='Category', y='Value', kind='line')
plt.title('Line Plot')
plt.show()

**Practice Question :** Create a line plot of the 'Age' column against the 'Name' column from the given DataFrame.
```python
data = {
    'Name': ['Alice', 'Bob', 'Carol', 'David', 'Emily'],
    'Age': [25, 30, 22, 28, 24],
}
```


- `plt.hist()`: Plots a histogram to visualize the distribution of data using `matplotlib`.

https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.hist.html

In [None]:
# Create a histogram using plt.hist()
plt.hist(df['Value'], bins=3, edgecolor='black')
plt.title('Histogram')
plt.show()

**Practice Question :** Create a histogram of the 'Age' column from the given DataFrame.
```python
data = {
    'Name': ['Alice', 'Bob', 'Carol', 'David', 'Emily'],
    'Age': [25, 30, 22, 28, 24],
}
```


- `plt.boxplot()`: Generates a box plot to display the distribution and identify outliers using `matplotlib`.

https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.boxplot.html

In [None]:
# Create a box plot using plt.boxplot()
plt.boxplot(df['Value'])
plt.title('Box Plot')
plt.show()

**Practice Question :** Generate a box plot of the 'Age' column from the given DataFrame.
```python
data = {
    'Name': ['Alice', 'Bob', 'Carol', 'David', 'Emily'],
    'Age': [25, 30, 22, 28, 24],
}
```


- `plt.bar()`: Creates a bar chart to compare categorical data using `matplotlib`.

https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.bar.html

In [None]:
# Create a bar chart using plt.bar()
plt.bar(df['Category'], df['Value'])
plt.title('Bar Chart')
plt.show()

**Practice Question :** Create a bar plot of the 'Name' column against the 'Age' column using the given DataFrame.
```python
data = {
    'Name': ['Alice', 'Bob', 'Carol', 'David', 'Emily'],
    'Age': [25, 30, 22, 28, 24],
}
```


- `plt.scatter()`: Produces a scatter plot to show the relationship between two numerical variables using `matplotlib`.

https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.scatter.html

In [None]:
# Create a scatter plot using plt.scatter()
plt.scatter(df['Category'], df['Value'])
plt.title('Scatter Plot')
plt.show()

Practice Question : Generate a scatter plot of the 'Math' column against the 'Science' column from the given DataFrame.
```python
data = {
    'Name': ['Alice', 'Bob', 'Carol', 'David', 'Emily'],
    'Math': [85, 70, 92, 64, 78],
    'Science': [88, 82, 95, 68, 80]
}
```


- `sns.barplot()`: Generates a bar plot using the `seaborn` library, often used for statistical estimation.

https://seaborn.pydata.org/generated/seaborn.barplot.html

In [None]:
# Create a bar plot using sns.barplot()
sns.barplot(x='Category', y='Value', data=df)
plt.title('Seaborn Bar Plot')
plt.show()

Practice Question : Use Seaborn's `sns.barplot()` to create a bar plot of the average 'Age' for each 'Name' from the given DataFrame.
```python
data = {
    'Name': ['Alice', 'Bob', 'Carol', 'David', 'Emily'],
    'Age': [25, 30, 22, 28, 24],
}
```


- `sns.lineplot()`: Creates a line plot using the `seaborn` library to visualize trends or relationships in data.

https://seaborn.pydata.org/generated/seaborn.lineplot.html

In [None]:
# Create a line plot using sns.lineplot()
sns.lineplot(x='Category', y='Value', data=df)
plt.title('Seaborn Line Plot')
plt.show()

Practice Question : Generate a line plot of the 'Age' column against the 'Name' column from the given DataFrame.
```python
data = {
    'Name': ['Alice', 'Bob', 'Carol', 'David', 'Emily'],
    'Age': [25, 30, 22, 28, 24],
}
```


**6. Data Transformation**

In [None]:
data = {'Product': ['A', 'A', 'B', 'B'],
        'Region': ['East', 'West', 'East', 'West'],
        'Sales': [100, 150, 200, 120]}

df = pd.DataFrame(data)

- `pd.pivot_table()`: Creates a pivot table from a DataFrame, allowing you to summarize and aggregate data based on multiple dimensions.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.pivot_table.html

In [None]:
# Create a pivot table to summarize sales by product and region
pivot_table = pd.pivot_table(df, values='Sales', index='Product', columns='Region', aggfunc='sum')

print("Pivot Table:")
print(pivot_table)

Practice Question : Create a pivot table from the given DataFrame that shows the average 'Score' for each 'Subject' and 'Student'.
```python
data = {
    'Student': ['Alice', 'Bob', 'Alice', 'Bob', 'Alice', 'Bob'],
    'Subject': ['Math', 'Math', 'English', 'English', 'Science', 'Science'],
    'Score': [85, 70, 90, 78, 88, 82]
}
```


- `df.stack()`: Reshapes the DataFrame by "stacking" the columns to create a multi-level index.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.stack.html

In [None]:
# Stack the DataFrame to create a multi-level index
stacked_df = df.set_index(['Product', 'Region']).stack()

print("\nStacked DataFrame:")
print(stacked_df)

Practice Question : Transform the given DataFrame into a stacked format.
```python
data = {
    'Student': ['Alice', 'Bob'],
    'Math': [85, 70],
    'English': [90, 78],
    'Science': [88, 82]
}
```


- `df.unstack()`: Reshapes a multi-level index DataFrame by "unstacking" one level of the index to create columns.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.unstack.html

In [None]:
# Unstack the DataFrame to reshape it back
unstacked_df = stacked_df.unstack()

print("\nUnstacked DataFrame:")
print(unstacked_df)

Practice Question : Unstack the 'Subject' index level of the pivot table from the `pd.pivot_table()` practice question above.
```python
data = {
    'Student': ['Alice', 'Bob'],
    'Math': [85, 70],
    'English': [90, 78],
    'Science': [88, 82]
}
```


**7. Handling DateTime Data**

In [None]:
import pandas as pd

data = {'Date': ['2023-01-15', '2023-03-20', '2023-06-10']}
df = pd.DataFrame(data)

- `pd.to_datetime()`: Converts a column or array of date-like values into datetime objects.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html

In [None]:
# Convert the 'Date' column to datetime objects
df['Date'] = pd.to_datetime(df['Date'])

print(df['Date'])
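
Real date columns often contain values that cannot be parsed. As a hedged sketch, passing an explicit `format` together with `errors="coerce"` turns unparseable entries into `NaT` instead of raising; the `raw_dates` Series is made-up sample data.

```python
import pandas as pd

raw_dates = pd.Series(["2023-01-15", "not a date", "2023-06-10"])

# Invalid entries become NaT (pandas' missing value for datetimes)
parsed = pd.to_datetime(raw_dates, format="%Y-%m-%d", errors="coerce")
print(parsed)
```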

Practice Question : Convert the 'Date' column in the given DataFrame to datetime format.
```python
data = {
    'Name': ['Alice', 'Bob', 'Carol'],
    'Date': ['2022-03-15', '2021-09-10', '2023-01-25']
}
```


- `Series.dt.year`: Extracts the year component from a datetime column (a Series) through the `.dt` accessor.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.dt.year.html

In [None]:
# Extract the year components
df['Year'] = df['Date'].dt.year

print(df['Year'])

Practice Question : Extract the year from the 'Date' column in the given DataFrame.
```python
data = {
    'Name': ['Alice', 'Bob', 'Carol'],
    'Date': ['2022-03-15', '2021-09-10', '2023-01-25']
}
```


- `Series.dt.month`: Extracts the month component from a datetime column (a Series) through the `.dt` accessor.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.dt.month.html

In [None]:
# Extract the month components
df['Month'] = df['Date'].dt.month

print(df['Month'])

Practice Question : Extract the month from the 'Date' column in the given DataFrame.
```python
data = {
    'Name': ['Alice', 'Bob', 'Carol'],
    'Date': ['2022-03-15', '2021-09-10', '2023-01-25']
}
```


**8. Merging Data**

- `pd.merge()`: Combines two DataFrames based on common columns or indices using various types of joins (`inner`, `left`, `right`, `outer`), similar to SQL join operations.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.merge.html

In [None]:
data1 = {'ID': [1, 2, 3],
         'Name': ['Alice', 'Bob', 'Charlie']}
df1 = pd.DataFrame(data1)

data2 = {'ID': [1, 2, 4],
         'Age': [25, 30, 28]}
df2 = pd.DataFrame(data2)

# Merge the DataFrames based on the 'ID' column using an inner join
merged_df = pd.merge(df1, df2, on='ID', how='inner')

print(merged_df)


Practice Question : Perform an inner merge between the two given DataFrames ('left_df' and 'right_df') on the 'ID' column.
```python
left_data = {
    'ID': [1, 2, 3],
    'Name': ['Alice', 'Bob', 'Carol']
}

right_data = {
    'ID': [2, 3, 4],
    'Age': [25, 30, 22]
}
```
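
Beyond the inner join, the `how` parameter selects other join types. The sketch below uses illustrative `names` and `ages` frames to show a left join, which keeps every row of the left frame, and an outer join with `indicator=True`, which adds a `_merge` column recording where each row came from.

```python
import pandas as pd

names = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Alice", "Bob", "Charlie"]})
ages = pd.DataFrame({"ID": [1, 2, 4], "Age": [25, 30, 28]})

# Left join: all IDs from `names`; Age is NaN where `ages` has no match
left_joined = pd.merge(names, ages, on="ID", how="left")

# Outer join: IDs from both frames; `_merge` marks the source of each row
outer_joined = pd.merge(names, ages, on="ID", how="outer", indicator=True)

print(left_joined)
print(outer_joined)
```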


**9. Handling Large Datasets**

- `pd.concat()`: Concatenates multiple DataFrames along a specified axis, either row-wise (`axis=0`, the default) or column-wise (`axis=1`), so data from different sources can be combined into one DataFrame.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html

In [None]:
data1 = {'ID': [1, 2, 3],
         'Name': ['Alice', 'Bob', 'Charlie']}
df1 = pd.DataFrame(data1)

data2 = {'ID': [4, 5, 6],
         'Name': ['David', 'Eve', 'Frank']}
df2 = pd.DataFrame(data2)

# Concatenate the DataFrames row-wise
concatenated_df = pd.concat([df1, df2], ignore_index=True)

print(concatenated_df)
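
The example above stacks rows. For the column-wise case, a minimal sketch with `axis=1` places the frames side by side, aligning them on the row index; the `names` and `ages` frames are made up for illustration.

```python
import pandas as pd

names = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"]})
ages = pd.DataFrame({"Age": [25, 30, 28]})

side_by_side = pd.concat([names, ages], axis=1)   # aligned on the row index
print(side_by_side)
```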


Practice Question : Concatenate the two given DataFrames ('df1' and 'df2') vertically.
```python
data1 = {
    'Name': ['Alice', 'Bob', 'Carol'],
    'Age': [25, 30, 22]
}

data2 = {
    'Name': ['David', 'Emily'],
    'Age': [28, 24]
}
```
