# Q1. List any five functions of the pandas library with execution.  

## **Here are five commonly used functions in Pandas with examples:**

1. **pd.read_csv()**: Reads a CSV file and loads it into a DataFrame.  
2. **df.head()**: Returns the first `n` rows of the DataFrame (5 by default).  
3. **df.describe()**: Computes summary statistics (count, mean, standard deviation, min, quartiles, max) for the numerical columns.  
4. **df.drop()**: Returns a copy of the DataFrame with the given rows or columns removed.  
5. **df.sort_values()**: Returns the DataFrame sorted by one or more columns.

### **Example with execution:**

In [1]:
import pandas as pd

# 1. pd.read_csv() - Reading a CSV file into a DataFrame (assuming 'data.csv' exists)
# df = pd.read_csv('data.csv')

# 2. df.head() - Displaying the first 3 rows of the DataFrame
data = {'Name': ['Alice', 'Bob', 'Clarie', 'David'],
        'Age': [25, 30, 27, 22],
        'Gender': ['Female', 'Male', 'Female', 'Male']}
df = pd.DataFrame(data)
print("Head of DataFrame:\n", df.head(3))

# 3. df.describe() - Displaying summary statistics
print("\nSummary Statistics:\n", df.describe())

# 4. df.drop() - Dropping the 'Gender' column
df_dropped = df.drop(columns=['Gender'])
print("\nDataFrame after dropping 'Gender' column:\n", df_dropped)

# 5. df.sort_values() - Sorting by 'Age' column
sorted_df = df.sort_values(by='Age', ascending=False)
print("\nDataFrame sorted by 'Age':\n", sorted_df)

Head of DataFrame:
      Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Clarie   27  Female

Summary Statistics:
              Age
count   4.000000
mean   26.000000
std     3.366502
min    22.000000
25%    24.250000
50%    26.000000
75%    27.750000
max    30.000000

DataFrame after dropping 'Gender' column:
      Name  Age
0   Alice   25
1     Bob   30
2  Clarie   27
3   David   22

DataFrame sorted by 'Age':
      Name  Age  Gender
1     Bob   30    Male
2  Clarie   27  Female
0   Alice   25  Female
3   David   22    Male
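
The `pd.read_csv()` call above is commented out because it depends on a file that may not exist. As a minimal sketch, the same function can be exercised without a file by passing an in-memory text buffer. The CSV text and the name `csv_df` are made up for illustration.

```python
import io
import pandas as pd

# In-memory stand-in for a CSV file (hypothetical sample data)
csv_text = """Name,Age,Gender
Alice,25,Female
Bob,30,Male
"""

# pd.read_csv() accepts a file path or any file-like object
csv_df = pd.read_csv(io.StringIO(csv_text))

print(csv_df)
print(csv_df.dtypes)  # 'Age' is parsed as an integer column
```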


# Q2. Given a Pandas DataFrame df with columns 'A', 'B', and 'C', write a Python function to re-index the DataFrame with a new index that starts from 1 and increments by 2 for each row.

## **Solution:**  
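We can use the `pd.Index` constructor to build a custom index that starts at 1 and increases by 2 for each row, so a DataFrame with `n` rows gets the labels 1, 3, ..., 2n - 1. We then assign this new index to the DataFrame through its `df.index` attribute. Note that this assignment modifies the DataFrame that was passed in.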

### **Function Implementation:**

In [2]:
import pandas as pd

# Creating a sample DataFrame
data = {
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8],
    'C': [9, 10, 11, 12]
}
df = pd.DataFrame(data)

# Function to re-index DataFrame with custom index
def reindex_dataframe(df):
    # Creating a new index that starts from 1 and increments by 2
    new_index = pd.Index(range(1, len(df) * 2 + 1, 2))

    # Re-indexing the DataFrame
    df.index = new_index
    return df

# Applying the function to re-index the DataFrame
df_reindexed = reindex_dataframe(df)
print(df_reindexed)

   A  B   C
1  1  5   9
3  2  6  10
5  3  7  11
7  4  8  12
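
The function above changes the caller's DataFrame in place. If that is not wanted, one possible variant works on a copy. This is only a sketch: the name `reindex_copy` and the sample data are hypothetical, and `pd.RangeIndex` is used here as an equivalent way to describe "start at 1, step 2".

```python
import pandas as pd

def reindex_copy(df):
    """Return a copy of df whose index is 1, 3, 5, ... (start 1, step 2)."""
    new_index = pd.RangeIndex(start=1, stop=2 * len(df) + 1, step=2)
    result = df.copy()        # leave the caller's DataFrame untouched
    result.index = new_index
    return result

sample = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
reindexed = reindex_copy(sample)

print(reindexed)
print(list(sample.index))  # the original index is unchanged
```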


# Q3. You have a Pandas DataFrame df with a column named 'Values'. Write a Python function that iterates over the DataFrame and calculates the sum of the first three values in the 'Values' column. The function should print the sum to the console.  

## **Solution:**  
We can use the `head()` function to get the first three rows of the 'Values' column and then calculate the sum. Alternatively, we can also use slicing to directly select the first three values from the column.

### **Function Implementation:**

In [3]:
import pandas as pd

# Creating a sample DataFrame
data = {'Values': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

# Function to calculate the sum of the first three values in the 'Values' column
def sum_first_three(df):
    # Selecting the first three values in the 'Values' column
    first_three_values = df['Values'].head(3)

    # Calculating the sum of the first three values
    sum_values = first_three_values.sum()

    # Printing the sum
    print("Sum of the first three values:", sum_values)

# Calling the function to calculate and print the sum
sum_first_three(df)

Sum of the first three values: 60
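
The question asks for a function that iterates, and the solution text also mentions slicing. Both alternatives can be sketched as follows. The names `values_df` and `sum_first_three_loop` are illustrative, and the explicit loop is shown only to match the wording of the question; the vectorised `head(3).sum()` above is usually preferable.

```python
import pandas as pd

values_df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})

def sum_first_three_loop(df):
    """Add up the first three entries of 'Values' with an explicit loop."""
    total = 0
    for position, value in enumerate(df['Values']):
        if position == 3:   # stop once three values have been added
            break
        total += value
    print("Sum of the first three values:", total)
    return total

loop_total = sum_first_three_loop(values_df)

# Positional slicing gives the same result without a loop
slice_total = values_df['Values'].iloc[:3].sum()
```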


# Q4. Given a Pandas DataFrame df with a column 'Text', write a Python function to create a new column 'Word_Count' that contains the number of words in each row of the 'Text' column.

## **Solution:**  
We can use the `str.split()` method to split the text in the 'Text' column into words (with no arguments it splits on any run of whitespace) and then use `str.len()` to count the words in each row. The result is stored in a new column called 'Word_Count'.

### **Function Implementation:**


In [4]:
import pandas as pd

# Creating a sample DataFrame
data = {'Text': ['Hello world', 'Python is great', 'Pandas is awesome', 'Data science is fun']}
df = pd.DataFrame(data)

# Function to create 'Word_Count' column
def count_words(df):
    # Creating a new column 'Word_Count' by counting the number of words in each row of 'Text'
    df['Word_Count'] = df['Text'].str.split().str.len()

    return df

# Applying the function to the DataFrame
df_with_word_count = count_words(df)
print(df_with_word_count)

                  Text  Word_Count
0          Hello world           2
1      Python is great           3
2    Pandas is awesome           3
3  Data science is fun           4
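
The sample data above is clean. As a hedged sketch of how the same expression behaves on messier input (a missing value, extra spaces, an empty string), consider the made-up DataFrame `text_df` below. A missing value yields `NaN` from `str.len()`, so this version fills it with 0; whether 0 is the right choice depends on the use case.

```python
import pandas as pd

# Hypothetical data with a missing value, padded whitespace and an empty string
text_df = pd.DataFrame({'Text': ['Hello world', None, '  extra   spaces  ', '']})

counts = text_df['Text'].str.split().str.len()   # NaN where 'Text' is missing

# Treat missing text as zero words and store whole numbers
text_df['Word_Count'] = counts.fillna(0).astype(int)

print(text_df)
```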


# Q5. How are `DataFrame.size()` and `DataFrame.shape()` different?

## **Explanation:**

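Both `DataFrame.size` and `DataFrame.shape` describe how large a DataFrame is, but they return different types of information. Both are attributes rather than methods, so they are accessed without parentheses (`df.size`, not `df.size()`).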

### **Differences:**

1. **`DataFrame.size`**:
   - Returns the **total number of elements** in the DataFrame.
   - The value is calculated as the **product of the number of rows and columns**.
   - It is a single integer value representing the total count of cells (values) in the DataFrame.
   
2. **`DataFrame.shape`**:
   - Returns a **tuple** of two values: **(number of rows, number of columns)**.
   - It provides the **dimensions** of the DataFrame.
   - It is often used to check the structure of the DataFrame (how many rows and columns it has).
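
The difference can be checked on a small example. The DataFrame `dims_df` below is made up for illustration.

```python
import pandas as pd

dims_df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

print(dims_df.shape)  # (3, 2): 3 rows, 2 columns
print(dims_df.size)   # 6: total number of cells

# size is the product of the two entries of shape
rows, cols = dims_df.shape
print(rows * cols == dims_df.size)
```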


# Q6. Which function of pandas do we use to read an Excel file?

## **Answer:**  
To read an Excel file in Pandas, we use the `pd.read_excel()` function.
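
A minimal sketch of how `pd.read_excel()` might be wrapped is shown below. The helper name `load_first_sheet` and the file name `sales.xlsx` are placeholders, not part of the original answer. Reading `.xlsx` files also requires an Excel engine such as `openpyxl` to be installed, so the call itself is left commented out.

```python
import pandas as pd

def load_first_sheet(path, sheet_name=0):
    """Read one worksheet of an Excel workbook into a DataFrame.

    sheet_name=0 selects the first sheet; a sheet's name can be passed instead.
    """
    return pd.read_excel(path, sheet_name=sheet_name)

# Hypothetical usage; 'sales.xlsx' is a placeholder file name:
# df = load_first_sheet('sales.xlsx')
# print(df.head())
```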


# Q7. You have a Pandas DataFrame df that contains a column named 'Email' that contains email addresses in the format 'username@domain.com'. Write a Python function that creates a new column 'Username' in df that contains only the username part of each email address.

The username is the part of the email address that appears before the '@' symbol. For example, if the email address is 'john.doe@example.com', the 'Username' column should contain 'john.doe'. Your function should extract the username from each email address and store it in the new 'Username' column.  

## **Solution:**  
We can use the `str.split()` function in Pandas to split the email address by the '@' symbol and then extract the first part, which is the username. We will store the result in a new column called 'Username'.

### **Function Implementation:**

In [5]:
import pandas as pd

# Creating a sample DataFrame with email addresses
data = {'Email': ['john.doe@example.com', 'jane.smith@domain.com', 'alex.lee@company.org']}
df = pd.DataFrame(data)

# Function to extract the username from the email and create a 'Username' column
def extract_username(df):
    # Creating a new 'Username' column by splitting the email address and getting the part before '@'
    df['Username'] = df['Email'].str.split('@').str[0]

    return df

# Applying the function to the DataFrame
df_with_username = extract_username(df)
print(df_with_username)

                   Email    Username
0   john.doe@example.com    john.doe
1  jane.smith@domain.com  jane.smith
2   alex.lee@company.org    alex.lee
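
One thing to be aware of: if a value contains no '@', `str.split('@').str[0]` returns the whole string as the "username". As a sketch of a stricter alternative, `str.extract()` with a regular expression yields a missing value in that case. The DataFrame `emails` and the column names below are illustrative only.

```python
import pandas as pd

emails = pd.DataFrame({'Email': ['john.doe@example.com', 'not-an-email']})

# Split-based approach: a value without '@' is returned unchanged
emails['Username_split'] = emails['Email'].str.split('@').str[0]

# Regex-based approach: capture everything before the first '@',
# producing NaN when the value contains no '@' at all
emails['Username_extract'] = emails['Email'].str.extract(r'^([^@]+)@', expand=False)

print(emails)
```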


# Q8. You have a Pandas DataFrame df with columns 'A', 'B', and 'C'. Write a Python function that selects all rows where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10. The function should return a new DataFrame that contains only the selected rows.
   For example, if df contains the following values:  

   

## **Solution:**  
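To select rows based on conditions on multiple columns, we can use boolean indexing in Pandas. The conditions are combined with the `&` operator for 'AND' logic, and each condition must be wrapped in parentheses because `&` binds more tightly than the comparison operators.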

### **Function Implementation:**

In [11]:
import pandas as pd

# Creating a sample DataFrame
data1 = {'A': [3, 8, 6, 2, 9], 'B': [5, 2, 9, 3, 1], 'C': [1, 7, 4, 5, 2]}
df = pd.DataFrame(data1)

# Function to select rows based on conditions
def select_rows(df):
    # Applying the conditions: A > 5 and B < 10
    selected_rows = df[(df['A'] > 5) & (df['B'] < 10)]

    return selected_rows

# Applying the function to the DataFrame
selected_df = select_rows(df)
print(selected_df)

   A  B  C
1  8  2  7
2  6  9  4
4  9  1  2
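
The same selection can also be written with `DataFrame.query()`, which takes the condition as a string. The sketch below uses the same sample values; the names `abc_df`, `mask_result` and `query_result` are illustrative.

```python
import pandas as pd

abc_df = pd.DataFrame({'A': [3, 8, 6, 2, 9],
                       'B': [5, 2, 9, 3, 1],
                       'C': [1, 7, 4, 5, 2]})

# Boolean indexing, as in the function above
mask_result = abc_df[(abc_df['A'] > 5) & (abc_df['B'] < 10)]

# Equivalent selection expressed as a query string
query_result = abc_df.query('A > 5 and B < 10')

print(query_result)
print(query_result.equals(mask_result))
```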


# Q9. Given a Pandas DataFrame df with a column 'Values', write a Python function to calculate the mean, median, and standard deviation of the values in the 'Values' column.

## **Solution:**  
We can use the `mean()`, `median()`, and `std()` methods to calculate the mean, median, and standard deviation, respectively. Selecting a single column with `df['Values']` gives a Series, and these methods are called on that Series. Note that `std()` returns the sample standard deviation (it divides by n - 1) by default.

### **Function Implementation:**


In [12]:
import pandas as pd

# Creating a sample DataFrame
data = {'Values': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

# Function to calculate mean, median, and standard deviation
def calculate_statistics(df):
    mean_value = df['Values'].mean()
    median_value = df['Values'].median()
    std_value = df['Values'].std()

    # Printing the calculated values
    print(f"Mean: {mean_value}")
    print(f"Median: {median_value}")
    print(f"Standard Deviation: {std_value}")

# Calling the function to calculate statistics
calculate_statistics(df)

Mean: 30.0
Median: 30.0
Standard Deviation: 15.811388300841896
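
The function above prints its results. As a sketch of a variant that returns them instead, and that makes the sample versus population distinction explicit through the `ddof` parameter of `std()`, consider the following. The name `summarize` and the dictionary keys are hypothetical.

```python
import pandas as pd

def summarize(series):
    """Return the mean, median and both flavours of standard deviation."""
    return {
        'mean': series.mean(),
        'median': series.median(),
        'std_sample': series.std(),            # default ddof=1 (divides by n - 1)
        'std_population': series.std(ddof=0),  # divides by n
    }

stats = summarize(pd.Series([10, 20, 30, 40, 50]))
for name, value in stats.items():
    print(f"{name}: {value}")
```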


# Q10. Given a Pandas DataFrame df with a column 'Sales' and a column 'Date', write a Python function to create a new column 'MovingAverage' that contains the moving average of the sales for the past 7 days for each row in the DataFrame. The moving average should be calculated using a window of size 7 and should include the current day.

## **Solution:**  
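To calculate the moving average of the sales for the past 7 days, we can use the `rolling()` method provided by Pandas, which slides a window over the data. By specifying a window size of 7 and calling `.mean()`, we get the average of the current row and the six rows before it. Setting `min_periods=1` lets the first six rows average however many values are available instead of producing `NaN`. Because `window=7` counts rows rather than calendar days, this matches "the past 7 days" only when the data has exactly one row per day in date order.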

### **Function Implementation:**

In [13]:
import pandas as pd

# Creating a sample DataFrame with 'Date' and 'Sales'
data = {'Date': pd.date_range(start='2023-01-01', periods=10, freq='D'),
        'Sales': [100, 120, 130, 140, 150, 160, 170, 180, 190, 200]}
df = pd.DataFrame(data)

# Function to calculate the moving average of sales for the past 7 days
def calculate_moving_average(df):
    # Using 'Date' as the index so each average is labelled by its date
    # (assigning the result avoids modifying the caller's DataFrame in place)
    df = df.set_index('Date')

    # Calculating the 7-day moving average using a rolling window of size 7
    df['MovingAverage'] = df['Sales'].rolling(window=7, min_periods=1).mean()

    return df

# Applying the function to the DataFrame
df_with_moving_average = calculate_moving_average(df)
print(df_with_moving_average)

            Sales  MovingAverage
Date                            
2023-01-01    100     100.000000
2023-01-02    120     110.000000
2023-01-03    130     116.666667
2023-01-04    140     122.500000
2023-01-05    150     128.000000
2023-01-06    160     133.333333
2023-01-07    170     138.571429
2023-01-08    180     150.000000
2023-01-09    190     160.000000
2023-01-10    200     170.000000
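
An integer window counts rows, not days. If some days are missing from the data, a time-based window such as `rolling('7D')` on a sorted datetime index may be closer to "the past 7 days". The sketch below compares the two on made-up data with a gap; the names `gappy`, `by_date`, `RowWindow` and `TimeWindow` are illustrative.

```python
import pandas as pd

# Hypothetical data with no rows between 2023-01-03 and 2023-01-10
gappy = pd.DataFrame({
    'Date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-10']),
    'Sales': [100, 120, 130, 200],
})

# A time-based window needs a sorted datetime index
by_date = gappy.sort_values('Date').set_index('Date')

# Row-based: the last 7 rows, regardless of how far apart their dates are
by_date['RowWindow'] = by_date['Sales'].rolling(window=7, min_periods=1).mean()

# Time-based: only rows whose date falls in the 7 days ending at the current row
by_date['TimeWindow'] = by_date['Sales'].rolling('7D').mean()

print(by_date)
```

On the last row, the row-based window still averages all four sales figures, while the time-based window only sees the sale from 2023-01-10.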


# Q11. You have a Pandas DataFrame df with a column 'Date'. Write a Python function that creates a new column 'Weekday' in the DataFrame. The 'Weekday' column should contain the weekday name (e.g. Monday, Tuesday) corresponding to each date in the 'Date' column.

## **Solution:**  
We can use the `dt.day_name()` method to extract the weekday name from the 'Date' column. It returns the full name of the day (e.g., 'Monday', 'Tuesday') for each date. The `.dt` accessor is only available when the column has a datetime dtype.

### **Function Implementation:**

In [14]:
import pandas as pd

# Creating a sample DataFrame with 'Date'
data = {'Date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05'])}
df = pd.DataFrame(data)

# Function to add 'Weekday' column
def add_weekday_column(df):
    # Creating a new column 'Weekday' by extracting weekday name from 'Date' column
    df['Weekday'] = df['Date'].dt.day_name()

    return df

# Applying the function to the DataFrame
df_with_weekday = add_weekday_column(df)
print(df_with_weekday)

        Date    Weekday
0 2023-01-01     Sunday
1 2023-01-02     Monday
2 2023-01-03    Tuesday
3 2023-01-04  Wednesday
4 2023-01-05   Thursday
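
If the 'Date' column arrives as plain strings, for example after reading a CSV file, it has to be converted with `pd.to_datetime()` before the `.dt` accessor can be used. A minimal sketch, with the made-up DataFrame `raw` and an extra illustrative column `WeekdayNumber`:

```python
import pandas as pd

# 'Date' starts out as strings, not datetimes
raw = pd.DataFrame({'Date': ['2023-01-01', '2023-01-02']})

# Convert to datetime so that the .dt accessor becomes available
raw['Date'] = pd.to_datetime(raw['Date'])

raw['Weekday'] = raw['Date'].dt.day_name()
raw['WeekdayNumber'] = raw['Date'].dt.dayofweek  # Monday=0 ... Sunday=6

print(raw)
```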


# Q12. Given a Pandas DataFrame df with a column 'Date' that contains timestamps, write a Python function to select all rows where the date is between '2023-01-01' and '2023-01-31'.

## **Solution:**  
To filter rows based on a date range, we can use boolean indexing. The `pd.to_datetime()` function converts the boundary strings into datetime objects, and we then compare the 'Date' column against both boundaries (inclusive at each end) and keep the rows that fall within the range.

### **Function Implementation:**

In [15]:
import pandas as pd

# Creating a sample DataFrame with 'Date' column containing timestamps
data = {'Date': pd.to_datetime(['2023-01-01', '2023-01-05', '2023-01-10', '2023-02-01', '2023-01-20'])}
df = pd.DataFrame(data)

# Function to select rows with dates between '2023-01-01' and '2023-01-31'
def select_date_range(df):
    # Defining the start and end date
    start_date = pd.to_datetime('2023-01-01')
    end_date = pd.to_datetime('2023-01-31')

    # Filtering the DataFrame to select rows within the date range
    selected_rows = df[(df['Date'] >= start_date) & (df['Date'] <= end_date)]

    return selected_rows

# Applying the function to the DataFrame
selected_df = select_date_range(df)
print(selected_df)

        Date
0 2023-01-01
1 2023-01-05
2 2023-01-10
4 2023-01-20
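
A caveat when the column holds timestamps with a time of day: `pd.to_datetime('2023-01-31')` means midnight at the start of 31 January, so a `<=` comparison drops rows later on that day. One possible way around this, sketched below on made-up data, is a half-open range that ends just before 1 February. The names `stamps`, `closed` and `half_open` are illustrative.

```python
import pandas as pd

stamps = pd.DataFrame({'Date': pd.to_datetime([
    '2023-01-15 08:00', '2023-01-31 15:30', '2023-02-01 00:00'])})

start = pd.Timestamp('2023-01-01')
end_inclusive = pd.Timestamp('2023-01-31')   # midnight at the start of Jan 31
end_exclusive = pd.Timestamp('2023-02-01')   # first instant after January

# Closed range: misses 2023-01-31 15:30 because it is after midnight
closed = stamps[(stamps['Date'] >= start) & (stamps['Date'] <= end_inclusive)]

# Half-open range: keeps every timestamp that falls within January
half_open = stamps[(stamps['Date'] >= start) & (stamps['Date'] < end_exclusive)]

print(closed)
print(half_open)
```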


# Q13. To use the basic functions of pandas, what is the first and foremost necessary library that needs to be imported?

## **Answer:**  
To use the basic functions of pandas, the library that must be imported first is `pandas` itself. By convention it is imported under the alias `pd`, using the statement `import pandas as pd`.
