Q1. List any five functions of the pandas library with execution.

Answer = read_csv: Reads a CSV file into a DataFrame.

In [None]:
import pandas as pd

# Replace 'file.csv' with the path to an existing CSV file
df = pd.read_csv('file.csv')


head: Displays the first n rows of a DataFrame (default is 5).

In [None]:
print(df.head())


info: Provides a concise summary of a DataFrame, including data types and missing values.

In [None]:
df.info()


describe: Generates descriptive statistics of a DataFrame.

In [None]:
print(df.describe())


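groupby: Groups the rows of a DataFrame by the values of one or more columns. It returns a GroupBy object; an aggregation such as sum() or mean() is then applied to produce a result.

In [None]:
# Replace 'column_name' with the name of a column in df
grouped_data = df.groupby('column_name')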
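
Since groupby on its own only prepares the groups, a short sketch of a full group-and-aggregate step may help. The DataFrame `scores` and its columns 'Team' and 'Points' are hypothetical names chosen for illustration.

```python
import pandas as pd

# Hypothetical sample data; 'Team' and 'Points' are illustrative column names
scores = pd.DataFrame({
    'Team': ['red', 'blue', 'red', 'blue'],
    'Points': [10, 20, 30, 40],
})

# Group rows by team, then sum the points within each group
points_per_team = scores.groupby('Team')['Points'].sum()
print(points_per_team)
```

The result is a Series indexed by the group keys, one entry per team.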


Q2. Given a Pandas DataFrame df with columns 'A', 'B', and 'C', write a Python function to re-index the
DataFrame with a new index that starts from 1 and increments by 2 for each row.

In [None]:
import pandas as pd

def reindex_with_increment(df):
    # Create a new index starting from 1 and incrementing by 2
    new_index = range(1, 2 * len(df) + 1, 2)
    
    # Set the new index for the DataFrame
    df_reindexed = df.set_index(pd.Index(new_index))
    
    return df_reindexed

# Example usage with a small sample DataFrame:
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# Call the function to re-index the DataFrame
df_reindexed = reindex_with_increment(df)

# Display the re-indexed DataFrame
print(df_reindexed)


Q3. You have a Pandas DataFrame df with a column named 'Values'. Write a Python function that
iterates over the DataFrame and calculates the sum of the first three values in the 'Values' column. The
function should print the sum to the console.
For example, if the 'Values' column of df contains the values [10, 20, 30, 40, 50], your function should
calculate and print the sum of the first three values, which is 60.

In [None]:
import pandas as pd

def calculate_sum_of_first_three_values(df):
    # Ensure 'Values' column exists in the DataFrame
    if 'Values' not in df.columns:
        print("Error: 'Values' column not found.")
        return

    # Select the first three values from the 'Values' column
    first_three_values = df['Values'].head(3)

    # Calculate and print the sum
    sum_of_first_three_values = first_three_values.sum()
    print("Sum of the first three values:", sum_of_first_three_values)

# Example usage:
df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})

# Call the function to calculate and print the sum of the first three values
calculate_sum_of_first_three_values(df)
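
The question asks for a function that iterates over the DataFrame, while the cell above uses head(3). A minimal sketch of an explicit loop is given here for comparison; the function name `sum_first_three_by_iteration` and the variable `example_df` are hypothetical.

```python
import pandas as pd

def sum_first_three_by_iteration(df):
    # Walk through the 'Values' column and stop after three rows
    total = 0
    for position, value in enumerate(df['Values']):
        if position >= 3:
            break
        total += value
    print("Sum of the first three values:", total)
    return total

example_df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})
result = sum_first_three_by_iteration(example_df)
```

Both versions also work when the column has fewer than three rows; they simply sum whatever is there.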


Q4. Given a Pandas DataFrame df with a column 'Text', write a Python function to create a new column
'Word_Count' that contains the number of words in each row of the 'Text' column.

In [None]:
import pandas as pd

def add_word_count_column(df):
    # Ensure 'Text' column exists in the DataFrame
    if 'Text' not in df.columns:
        print("Error: 'Text' column not found.")
        return

    # Create a new 'Word_Count' column by applying a lambda function to count words
    df['Word_Count'] = df['Text'].apply(lambda x: len(str(x).split()))

# Example usage:
df = pd.DataFrame({'Text': ['This is a sample text.', 'Another example.']})

# Call the function to add the 'Word_Count' column
add_word_count_column(df)

# Display the DataFrame with the new 'Word_Count' column
print(df)
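
One caveat with the apply-based version: str(x) turns a missing value into a string such as 'nan' or 'None', which is then counted as one word. The sketch below uses the vectorized string accessor instead, which leaves the count missing for missing text. The sample frame `texts` is hypothetical.

```python
import pandas as pd

# Hypothetical sample data with one missing entry
texts = pd.DataFrame({'Text': ['This is a sample text.', 'Another example.', None]})

# Split each string on whitespace, then take the length of each resulting list
texts['Word_Count'] = texts['Text'].str.split().str.len()
print(texts)
```

Because of the missing entry, the new column is stored as floating point rather than integer.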


Q5. How are DataFrame.size() and DataFrame.shape() different?

Answer = Both are attributes rather than methods, so they are accessed without parentheses (df.size and df.shape).

DataFrame.size:

Returns the total number of elements in the DataFrame as a single integer.
It equals the number of rows multiplied by the number of columns.

In [None]:
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Get the size of the DataFrame
size_of_df = df.size
print("Size of DataFrame:", size_of_df)


DataFrame.shape:

Returns a tuple representing the dimensions of the DataFrame.
The tuple contains two elements: the number of rows and the number of columns, in that order.
Unlike size, it reports each dimension separately.

In [None]:
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Get the shape of the DataFrame
shape_of_df = df.shape
print("Shape of DataFrame:", shape_of_df)
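
The relationship between the two attributes can be checked directly. This is a small sketch; the variable names `frame`, `rows`, and `cols` are illustrative only.

```python
import pandas as pd

frame = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# shape unpacks into (rows, columns); size is their product
rows, cols = frame.shape
print(frame.shape)               # (3, 2)
print(frame.size)                # 6
print(rows * cols == frame.size) # True
```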


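Q6. Which pandas function do we use to read an Excel file?

Answer = read_excel: Reads an Excel file into a DataFrame.

In [None]:
import pandas as pd

# Replace 'your_file.xlsx' with the actual path or URL to your Excel file.
# Reading .xlsx files requires an Excel engine such as openpyxl to be installed.
df = pd.read_excel('your_file.xlsx')

# Now, 'df' is a DataFrame containing the data from the Excel file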


Q7. You have a Pandas DataFrame df that contains a column named 'Email' that contains email
addresses in the format 'username@domain.com'. Write a Python function that creates a new column
'Username' in df that contains only the username part of each email address.
The username is the part of the email address that appears before the '@' symbol. For example, if the
email address is 'john.doe@example.com', the 'Username' column should contain 'john.doe'. Your
function should extract the username from each email address and store it in the new 'Username'
column.

In [None]:
import pandas as pd

def extract_username_from_email(df):
    # Ensure 'Email' column exists in the DataFrame
    if 'Email' not in df.columns:
        print("Error: 'Email' column not found.")
        return

    # Extract the username from the 'Email' column and create a new 'Username' column
    df['Username'] = df['Email'].str.split('@').str[0]

# Example usage:
df = pd.DataFrame({'Email': ['john.doe@example.com', 'jane.smith@example.org']})

# Call the function to add the 'Username' column
extract_username_from_email(df)

# Display the DataFrame with the new 'Username' column
print(df)


Q8. You have a Pandas DataFrame df with columns 'A', 'B', and 'C'. Write a Python function that selects
all rows where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10. The
function should return a new DataFrame that contains only the selected rows.
For example, if df contains the following values:
   A  B  C
0  3  5  1
1  8  2  7
2  6  9  4
3  2  3  5
4  9  1  2
Your function should select the following rows:
   A  B  C
1  8  2  7
2  6  9  4
4  9  1  2

In [None]:
import pandas as pd

def filter_dataframe(df):
    # Ensure columns 'A' and 'B' exist in the DataFrame
    if 'A' not in df.columns or 'B' not in df.columns:
        print("Error: column 'A' or 'B' not found.")
        return None

    # Use boolean indexing to select rows based on the conditions
    selected_rows = df[(df['A'] > 5) & (df['B'] < 10)]

    return selected_rows

# Example usage:
df = pd.DataFrame({'A': [3, 8, 6, 2, 9], 'B': [5, 2, 9, 3, 1], 'C': [1, 7, 4, 5, 2]})

# Call the function to get the selected rows
selected_rows_df = filter_dataframe(df)

# Display the new DataFrame containing the selected rows
print(selected_rows_df)


Q9. Given a Pandas DataFrame df with a column 'Values', write a Python function to calculate the mean,
median, and standard deviation of the values in the 'Values' column.

In [None]:
import pandas as pd

def calculate_stats(df):
    # Ensure 'Values' column exists in the DataFrame
    if 'Values' not in df.columns:
        print("Error: 'Values' column not found.")
        return None

    # Calculate mean, median, and standard deviation of the 'Values' column
    mean_value = df['Values'].mean()
    median_value = df['Values'].median()
    std_deviation = df['Values'].std()

    # Print the results
    print("Mean:", mean_value)
    print("Median:", median_value)
    print("Standard Deviation:", std_deviation)

# Example usage:
df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})

# Call the function to calculate and print the statistics
calculate_stats(df)


Q10. Given a Pandas DataFrame df with a column 'Sales' and a column 'Date', write a Python function to
create a new column 'MovingAverage' that contains the moving average of the sales for the past 7 days
for each row in the DataFrame. The moving average should be calculated using a window of size 7 and
should include the current day.

In [None]:
import pandas as pd

def calculate_moving_average(df):
    # Ensure 'Sales' and 'Date' columns exist in the DataFrame
    if 'Sales' not in df.columns or 'Date' not in df.columns:
        print("Error: 'Sales' or 'Date' column not found.")
        return None

    # Sort the DataFrame by the 'Date' column if it's not already sorted
    df = df.sort_values(by='Date')

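    # Average each row with up to 6 preceding rows (7 rows total, current row included).
    # This equals a 7-day average only when there is exactly one row per day;
    # min_periods=1 lets the first rows average over fewer values.
    df['MovingAverage'] = df['Sales'].rolling(window=7, min_periods=1).mean()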

    return df

# Example usage:
df = pd.DataFrame({'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05'],
                   'Sales': [10, 15, 20, 25, 30]})

# Convert 'Date' column to datetime format
df['Date'] = pd.to_datetime(df['Date'])

# Call the function to calculate the moving average
df_with_ma = calculate_moving_average(df)

# Display the DataFrame with the new 'MovingAverage' column
print(df_with_ma)
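
A row-count window of 7 matches "the past 7 days" only when every day has exactly one row. If dates can be missing, a time-based window is one possible alternative. The sketch below assumes a sorted datetime 'Date' column; the frame `sales` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical sample data with a gap between 2023-01-02 and 2023-01-10
sales = pd.DataFrame({
    'Date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-10']),
    'Sales': [10, 20, 30],
})
sales = sales.sort_values('Date').reset_index(drop=True)

# Index the sales by date so the window can be given as a time span ('7D')
by_date = sales.set_index('Date')['Sales']
sales['MovingAverage'] = by_date.rolling('7D').mean().to_numpy()
print(sales)
```

Here the last row averages only itself, because no other sale falls within the 7 days ending on 2023-01-10.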


Q11. You have a Pandas DataFrame df with a column 'Date'. Write a Python function that creates a new
column 'Weekday' in the DataFrame. The 'Weekday' column should contain the weekday name (e.g.
Monday, Tuesday) corresponding to each date in the 'Date' column.
For example, if df contains the following values:
         Date
0  2023-01-01
1  2023-01-02
2  2023-01-03
3  2023-01-04
4  2023-01-05
Your function should create the following DataFrame:

         Date    Weekday
0  2023-01-01     Sunday
1  2023-01-02     Monday
2  2023-01-03    Tuesday
3  2023-01-04  Wednesday
4  2023-01-05   Thursday
The function should return the modified DataFrame.

In [None]:
import pandas as pd

def add_weekday_column(df):
    # Ensure 'Date' column exists in the DataFrame
    if 'Date' not in df.columns:
        print("Error: 'Date' column not found.")
        return None

    # Convert 'Date' column to datetime format
    df['Date'] = pd.to_datetime(df['Date'])

    # Create a new 'Weekday' column containing the weekday names
    df['Weekday'] = df['Date'].dt.day_name()

    return df

# Example usage:
df = pd.DataFrame({'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05']})

# Call the function to add the 'Weekday' column
df_with_weekday = add_weekday_column(df)

# Display the modified DataFrame with the new 'Weekday' column
print(df_with_weekday)


Q12. Given a Pandas DataFrame df with a column 'Date' that contains timestamps, write a Python
function to select all rows where the date is between '2023-01-01' and '2023-01-31'.

In [None]:
import pandas as pd

def filter_by_date_range(df):
    # Ensure 'Date' column exists in the DataFrame
    if 'Date' not in df.columns:
        print("Error: 'Date' column not found.")
        return None

    # Convert 'Date' column to datetime format
    df['Date'] = pd.to_datetime(df['Date'])

    # Define the date range
    start_date = '2023-01-01'
    end_date = '2023-01-31'

    # Use boolean indexing to select rows within the date range
    selected_rows = df[(df['Date'] >= start_date) & (df['Date'] <= end_date)]

    return selected_rows

# Example usage:
df = pd.DataFrame({'Date': ['2023-01-01', '2023-01-15', '2023-01-25', '2023-02-05']})

# Call the function to get the selected rows within the date range
selected_rows_df = filter_by_date_range(df)

# Display the new DataFrame containing the selected rows
print(selected_rows_df)
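
When the 'Date' column holds timestamps with a time of day, comparing against '2023-01-31' means midnight at the start of that day, so later times on 31 January are left out. A minimal sketch of a half-open range that keeps them, assuming the intent is to cover all of January; the frame `events` is hypothetical.

```python
import pandas as pd

# Hypothetical timestamps, including one late on 31 January
events = pd.DataFrame({'Date': pd.to_datetime([
    '2023-01-01 09:00', '2023-01-15 12:00', '2023-01-31 18:30', '2023-02-05 08:00',
])})

# Inclusive upper bound at midnight: drops the 18:30 row on 31 January
closed_range = events[(events['Date'] >= '2023-01-01') & (events['Date'] <= '2023-01-31')]

# Half-open range up to (but excluding) 1 February: keeps all of January
january = events[(events['Date'] >= '2023-01-01') & (events['Date'] < '2023-02-01')]

print(len(closed_range), len(january))
```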


Q13. To use the basic functions of pandas, what is the first and foremost necessary library that needs to
be imported?

Answer = The pandas library itself must be imported before any of its functions can be used. By convention it is imported under the alias pd:


In [None]:
import pandas as pd
