Q1. List any five functions of the pandas library with execution.

import pandas as pd

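df = pd.read_csv('data.csv')  # load a CSV file into a DataFrame
df.head()                     # first five rows
df.info()                     # column dtypes and non-null counts
df.shape                      # (rows, columns); an attribute, so no parentheses
df.describe()                 # summary statistics for the numeric columns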
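
The calls above need a file named 'data.csv' on disk. As a minimal sketch that runs without one, the same five calls can be tried on a small in-memory CSV; the column names and values below are made up for illustration and are not part of the original exercise.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for 'data.csv'
csv_text = "name,score\nAna,90\nBen,75\nCho,82\n"

df = pd.read_csv(io.StringIO(csv_text))  # read_csv accepts file-like objects
first_rows = df.head(2)                  # first two rows
df.info()                                # prints dtypes and non-null counts
dims = df.shape                          # attribute: (rows, columns)
summary = df.describe()                  # statistics for the numeric column

print(first_rows)
print(dims)
print(summary)
```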


Q2. Given a Pandas DataFrame df with columns 'A', 'B', and 'C', write a Python function to re-index the
DataFrame with a new index that starts from 1 and increments by 2 for each row.

In [2]:
import pandas as pd

def reindex_dataframe(df):
    # Relabel the rows as 1, 3, 5, ... on a copy. df.reindex() would instead
    # look these labels up in the old index and fill missing ones with NaN.
    df = df.copy()
    df.index = pd.RangeIndex(start=1, stop=2 * len(df), step=2)
    return df


df = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60], 'C': [70, 80, 90]})
print("Original DataFrame:")
print(df)
print("\nReindexed DataFrame:")
print(reindex_dataframe(df))


Original DataFrame:
    A   B   C
0  10  40  70
1  20  50  80
2  30  60  90

Reindexed DataFrame:
    A   B   C
1  10  40  70
3  20  50  80
5  30  60  90
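
Note that `DataFrame.reindex` aligns existing rows to the requested labels rather than renaming them, so any label that is absent from the old index comes back as a row of NaN. A minimal sketch contrasting `reindex` with assigning a new index, using the same example data (the variable names `aligned` and `relabeled` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60], 'C': [70, 80, 90]})
new_index = pd.RangeIndex(start=1, stop=2 * len(df), step=2)  # 1, 3, 5

# reindex: looks up labels 1, 3, 5 in the old index (0, 1, 2).
# Only label 1 exists, so the rows for 3 and 5 are all NaN.
aligned = df.reindex(new_index)

# Index assignment: keeps every row and only replaces the labels.
relabeled = df.copy()
relabeled.index = new_index

print(aligned)
print(relabeled)
```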


Q3. You have a Pandas DataFrame df with a column named 'Values'. Write a Python function that
iterates over the DataFrame and calculates the sum of the first three values in the 'Values' column. The
function should print the sum to the console.
For example, if the 'Values' column of df contains the values [10, 20, 30, 40, 50], your function should
calculate and print the sum of the first three values, which is 60.

In [3]:
import pandas as pd

def calculate_sum(df):
    values = df['Values'].tolist()
    sum_of_first_three = sum(values[:3])
    print("Sum of the first three values:", sum_of_first_three)

# Example usage:
df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})
calculate_sum(df)


Sum of the first three values: 60
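
The question asks for iteration over the DataFrame, while the solution above slices a list. If an explicit loop is wanted, one possible sketch uses `itertuples`; the function name `sum_first_three_by_iteration` is hypothetical, and returning the total in addition to printing it is an added convenience.

```python
import pandas as pd

def sum_first_three_by_iteration(df):
    total = 0
    # Walk the rows in order and stop once three have been added
    for position, row in enumerate(df.itertuples(index=False)):
        if position == 3:
            break
        total += row.Values
    print("Sum of the first three values:", total)
    return total

df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})
result = sum_first_three_by_iteration(df)
```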


Q4. Given a Pandas DataFrame df with a column 'Text', write a Python function to create a new column
'Word_Count' that contains the number of words in each row of the 'Text' column.

In [5]:
import pandas as pd

def add_word_count(df):
    df['Word_Count'] = df['Text'].apply(lambda x: len(x.split()))
    return df


df = pd.DataFrame({'Text': ['Hello, how are you?', 'I am fine.', 'Python is great!']})
print("Original DataFrame:")
print(df)
print("\nDataFrame with Word_Count:")
print(add_word_count(df))


Original DataFrame:
                  Text
0  Hello, how are you?
1           I am fine.
2     Python is great!

DataFrame with Word_Count:
                  Text  Word_Count
0  Hello, how are you?           4
1           I am fine.           3
2     Python is great!           3
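
The `apply` version raises an error if 'Text' contains a missing value, because `None` and NaN have no `split` method. A hedged alternative, assuming missing text should count as zero words, uses the vectorized `.str` accessor; the function name `add_word_count_safe` is illustrative.

```python
import pandas as pd

def add_word_count_safe(df):
    out = df.copy()
    # Treat missing text as an empty string, then split on whitespace
    # and count the resulting pieces for each row.
    out['Word_Count'] = out['Text'].fillna('').str.split().str.len()
    return out

df = pd.DataFrame({'Text': ['Hello, how are you?', None, 'Python is great!']})
result = add_word_count_safe(df)
print(result)
```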


Q5. How are DataFrame.size() and DataFrame.shape() different?

DataFrame.size and DataFrame.shape are attributes, not methods, so they are written without parentheses. They report different information about the DataFrame:

DataFrame.size: It returns the total number of elements in the DataFrame, which is equal to the product of the number of rows and columns.
DataFrame.shape: It returns a tuple containing the dimensions of the DataFrame, where the first element represents the number of rows and the second element represents the number of columns.
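
The relationship between the two can be checked directly: `size` equals the product of the two entries of `shape`. The short sketch below also shows what happens if `shape` is called with parentheses as in the question's wording; the variable name `error` is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

rows, cols = df.shape
print(df.size == rows * cols)  # size is rows * columns

# shape is a plain tuple, so calling it like a function fails
error = None
try:
    df.shape()
except TypeError as exc:
    error = exc
print("Calling df.shape() raised:", type(error).__name__)
```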

In [6]:
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
size = df.size
print("DataFrame size:", size)


DataFrame size: 6


import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
shape = df.shape
print("DataFrame shape:", shape)


DataFrame shape: (3, 2)


Q6. Which function of pandas do we use to read an excel file?

The function used to read an Excel file in pandas is pd.read_excel():

import pandas as pd


df = pd.read_excel('data.xlsx')


Q7. You have a Pandas DataFrame df that contains a column named 'Email' that contains email
addresses in the format 'username@domain.com'. Write a Python function that creates a new column
'Username' in df that contains only the username part of each email address.
The username is the part of the email address that appears before the '@' symbol. For example, if the
email address is 'john.doe@example.com', the 'Username' column should contain 'john.doe'. Your
function should extract the username from each email address and store it in the new 'Username'
column.

In [12]:
import pandas as pd

def extract_username(df):
    df['Username'] = df['Email'].str.split('@').str[0]
    return df


df = pd.DataFrame({'Email': ['john.doe@example.com', 'jane.smith@example.com']})
print("Original DataFrame:")
print(df)
print("\nDataFrame with Username:")
print(extract_username(df))


Original DataFrame:
                    Email
0    john.doe@example.com
1  jane.smith@example.com

DataFrame with Username:
                    Email    Username
0    john.doe@example.com    john.doe
1  jane.smith@example.com  jane.smith
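
With `str.split('@').str[0]`, a value that contains no '@' is returned unchanged as the "username". If such values should instead be flagged as missing, one possible sketch uses `str.extract` with a regular expression; the function name `extract_username_strict` and the malformed sample value are assumptions for illustration.

```python
import pandas as pd

def extract_username_strict(df):
    out = df.copy()
    # Capture everything before the first '@'; rows without '@' become NaN
    out['Username'] = out['Email'].str.extract(r'^([^@]+)@', expand=False)
    return out

df = pd.DataFrame({'Email': ['john.doe@example.com', 'not-an-email']})
result = extract_username_strict(df)
print(result)
```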


Q8. You have a Pandas DataFrame df with columns 'A', 'B', and 'C'. Write a Python function that selects
all rows where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10. The
function should return a new DataFrame that contains only the selected rows.

In [13]:
import pandas as pd

def select_rows(df):
    selected_rows = df[(df['A'] > 5) & (df['B'] < 10)]
    return selected_rows

# Example usage:
df = pd.DataFrame({'A': [3, 8, 6, 2, 9], 'B': [5, 2, 9, 3, 1], 'C': [1, 7, 4, 5, 2]})
print("Original DataFrame:")
print(df)
print("\nSelected Rows:")
print(select_rows(df))


Original DataFrame:
   A  B  C
0  3  5  1
1  8  2  7
2  6  9  4
3  2  3  5
4  9  1  2

Selected Rows:
   A  B  C
1  8  2  7
2  6  9  4
4  9  1  2
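
The same filter can also be written with `DataFrame.query`, which some readers find easier to scan. This is a sketch of an equivalent form, not a replacement for the boolean-mask version; the parameters `a_min` and `b_max` are added here so the thresholds are not hard-coded.

```python
import pandas as pd

def select_rows_query(df, a_min=5, b_max=10):
    # '@name' refers to a local Python variable inside the query string
    return df.query('A > @a_min and B < @b_max')

df = pd.DataFrame({'A': [3, 8, 6, 2, 9], 'B': [5, 2, 9, 3, 1], 'C': [1, 7, 4, 5, 2]})
result = select_rows_query(df)
print(result)
```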


Q9. Given a Pandas DataFrame df with a column 'Values', write a Python function to calculate the mean,
median, and standard deviation of the values in the 'Values' column.

In [14]:
import pandas as pd

def calculate_stats(df):
    values = df['Values']
    mean = values.mean()
    median = values.median()
    std_dev = values.std()
    print("Mean:", mean)
    print("Median:", median)
    print("Standard Deviation:", std_dev)

# Example usage:
df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})
calculate_stats(df)


Mean: 30.0
Median: 30.0
Standard Deviation: 15.811388300841896
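
The value 15.81... is the sample standard deviation, because `Series.std` divides by n - 1 by default (`ddof=1`). If the population standard deviation is wanted instead, `ddof=0` can be passed, as in this short sketch:

```python
import pandas as pd

values = pd.Series([10, 20, 30, 40, 50])

sample_std = values.std()            # default ddof=1, divides by n - 1
population_std = values.std(ddof=0)  # divides by n

print("Sample std:", sample_std)
print("Population std:", population_std)
```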


Q10. Given a Pandas DataFrame df with a column 'Sales' and a column 'Date', write a Python function to
create a new column 'MovingAverage' that contains the moving average of the sales for the past 7 days
for each row in the DataFrame. The moving average should be calculated using a window of size 7 and
should include the current day.

In [15]:
import pandas as pd

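def calculate_moving_average(df):
    # Window of 7 rows including the current one; this equals "the past 7 days"
    # only if df has exactly one row per day, sorted by 'Date'.
    df['MovingAverage'] = df['Sales'].rolling(window=7, min_periods=1).mean()
    return df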

# Example usage:
df = pd.DataFrame({'Sales': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]})
print("Original DataFrame:")
print(df)
print("\nDataFrame with MovingAverage:")
print(calculate_moving_average(df))


Original DataFrame:
   Sales
0     10
1     20
2     30
3     40
4     50
5     60
6     70
7     80
8     90
9    100

DataFrame with MovingAverage:
   Sales  MovingAverage
0     10           10.0
1     20           15.0
2     30           20.0
3     40           25.0
4     50           30.0
5     60           35.0
6     70           40.0
7     80           50.0
8     90           60.0
9    100           70.0
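
The example above has no 'Date' column and counts rows rather than days. If dates may be missing or unsorted, a time-based window is one option. The sketch below assumes 'Date' can be parsed by `pd.to_datetime`; the function name `calculate_moving_average_by_date` is hypothetical.

```python
import pandas as pd

def calculate_moving_average_by_date(df):
    out = df.copy()
    out['Date'] = pd.to_datetime(out['Date'])
    out = out.sort_values('Date')  # time-based windows need sorted dates
    # '7D' covers the 7 calendar days ending at, and including, each row's date
    rolled = out.set_index('Date')['Sales'].rolling('7D').mean()
    out['MovingAverage'] = rolled.to_numpy()
    return out

df = pd.DataFrame({
    'Date': pd.date_range('2023-01-01', periods=10, freq='D'),
    'Sales': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
})
result = calculate_moving_average_by_date(df)
print(result)
```

With one row per day, as here, the result matches the row-based window of size 7.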


Q11. You have a Pandas DataFrame df with a column 'Date'. Write a Python function that creates a new
column 'Weekday' in the DataFrame. The 'Weekday' column should contain the weekday name (e.g.
Monday, Tuesday) corresponding to each date in the 'Date' column.

In [17]:
import pandas as pd

def add_weekday(df):
    df['Weekday'] = pd.to_datetime(df['Date']).dt.day_name()
    return df

# Example usage:
df = pd.DataFrame({'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05']})
print("Original DataFrame:")
print(df)
print("\nDataFrame with Weekday:")
print(add_weekday(df))


Original DataFrame:
         Date
0  2023-01-01
1  2023-01-02
2  2023-01-03
3  2023-01-04
4  2023-01-05

DataFrame with Weekday:
         Date    Weekday
0  2023-01-01     Sunday
1  2023-01-02     Monday
2  2023-01-03    Tuesday
3  2023-01-04  Wednesday
4  2023-01-05   Thursday


Q12. Given a Pandas DataFrame df with a column 'Date' that contains timestamps, write a Python
function to select all rows where the date is between '2023-01-01' and '2023-01-31'.

In [18]:
import pandas as pd

def select_rows_between_dates(df):
    # Compare as datetimes rather than strings so non-ISO date formats also work
    dates = pd.to_datetime(df['Date'])
    mask = (dates >= '2023-01-01') & (dates <= '2023-01-31')
    return df[mask]

# Example usage:
df = pd.DataFrame({'Date': ['2022-12-31', '2023-01-01', '2023-01-15', '2023-01-31', '2023-02-01']})
print("Original DataFrame:")
print(df)
print("\nSelected Rows:")
print(select_rows_between_dates(df))


Original DataFrame:
         Date
0  2022-12-31
1  2023-01-01
2  2023-01-15
3  2023-01-31
4  2023-02-01

Selected Rows:
         Date
1  2023-01-01
2  2023-01-15
3  2023-01-31
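
When the 'Date' values carry a time of day, an upper bound of '2023-01-31' means midnight at the start of that day, so later timestamps on January 31 would be excluded. A minimal sketch of one way around this uses a half-open interval ending at February 1; the function name `select_january_2023` and the sample timestamps are made up for illustration.

```python
import pandas as pd

def select_january_2023(df):
    dates = pd.to_datetime(df['Date'])
    start = pd.Timestamp('2023-01-01')
    end_exclusive = pd.Timestamp('2023-02-01')
    # Half-open interval [start, end_exclusive) keeps all of January 31
    return df[(dates >= start) & (dates < end_exclusive)]

df = pd.DataFrame({'Date': ['2022-12-31 23:59', '2023-01-01 00:00',
                            '2023-01-31 18:30', '2023-02-01 00:00']})
result = select_january_2023(df)
print(result)
```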


Q13. To use the basic functions of pandas, what is the first and foremost necessary library that needs to
be imported?

In [19]:
import pandas as pd
