In [None]:
Q1. List any five functions of the pandas library with execution.

In [None]:
read_csv(): Read a CSV file into a DataFrame.
head(): Display the first n rows of the DataFrame.
groupby(): Group rows based on a column and perform aggregate functions.
plot(): Plot data from a DataFrame.
to_csv(): Write the DataFrame to a CSV file.
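
The question asks for execution, so here is a minimal sketch that exercises four of the five functions listed above. The in-memory CSV text and its column names (city, sales) are made up for illustration. plot() is left out because it also needs matplotlib to be installed.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV so the example needs no file on disk.
csv_text = "city,sales\nPune,10\nDelhi,20\nPune,30\nDelhi,40\n"

df = pd.read_csv(io.StringIO(csv_text))      # read_csv(): CSV -> DataFrame
first_two = df.head(2)                       # head(): first n rows
totals = df.groupby('city')['sales'].sum()   # groupby(): aggregate per group

buffer = io.StringIO()
totals.to_csv(buffer)                        # to_csv(): write the result as CSV text

print(first_two)
print(totals)
print(buffer.getvalue())
```

With a real file, the StringIO objects would simply be replaced by file paths.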

In [None]:
Q2. Given a Pandas DataFrame df with columns 'A', 'B', and 'C', write a Python function to re-index the 
DataFrame with a new index that starts from 1 and increments by 2 for each row.

In [4]:
import pandas as pd

def reindex_dataframe(df):
    df.reset_index(drop=True, inplace=True)
    df['NewIndex'] = (df.index * 2) + 1

    df.set_index('NewIndex', inplace=True)

    return df

data = {'A': [10, 20, 30], 'B': [40, 50, 60], 'C': [70, 80, 90]}
df = pd.DataFrame(data)

df = reindex_dataframe(df)

print(df)


           A   B   C
NewIndex            
1         10  40  70
3         20  50  80
5         30  60  90
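
The function above modifies the DataFrame it receives and names the index 'NewIndex'. If that side effect is unwanted, one possible variant (the name reindex_copy is hypothetical) assigns an odd-numbered RangeIndex to a copy instead:

```python
import pandas as pd

def reindex_copy(df):
    # Work on a copy so the caller's DataFrame keeps its original index.
    out = df.copy()
    # Labels 1, 3, 5, ...: one per row, starting at 1 and stepping by 2.
    out.index = pd.RangeIndex(start=1, stop=2 * len(out) + 1, step=2)
    return out

original = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60], 'C': [70, 80, 90]})
reindexed = reindex_copy(original)

print(reindexed.index.tolist())
print(original.index.tolist())
```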


In [None]:
Q3. You have a Pandas DataFrame df with a column named 'Values'. Write a Python function that 
iterates over the DataFrame and calculates the sum of the first three values in the 'Values' column. The 
function should print the sum to the console.

For example, if the 'Values' column of df contains the values [10, 20, 30, 40, 50], your function should 
calculate and print the sum of the first three values, which is 60.

In [6]:
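import pandas as pd

def sum_of_first_three_values(df):
    # Iterate over the first three entries of 'Values' and print their sum.
    print(sum(value for value in df['Values'].head(3)))

data = {'Values': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)
sum_of_first_three_values(df)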


60


In [None]:
Q4. Given a Pandas DataFrame df with a column 'Text', write a Python function to create a new column 
'Word_Count' that contains the number of words in each row of the 'Text' column.

In [8]:
import pandas as pd

def add_word_count_column(df):
    df['Word_Count'] = df['Text'].apply(lambda x: len(str(x).split()))
    return df
data = {'Text': [' text.', ' example sentence.', ' words.']}
df = pd.DataFrame(data)
df = add_word_count_column(df)
print(df)


                 Text  Word_Count
0               text.           1
1   example sentence.           2
2              words.           1
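
One caveat with str(x): a missing value becomes the text 'nan' or 'None', which the function above counts as one word. A possible vectorized sketch that counts missing text as zero words instead (the function name and sample data are hypothetical):

```python
import pandas as pd

def add_word_count_vectorized(df):
    out = df.copy()
    # Replace missing text with '' so it splits into an empty list (0 words).
    out['Word_Count'] = out['Text'].fillna('').str.split().str.len()
    return out

sample = pd.DataFrame({'Text': ['hello world', None, 'one two three']})
counted = add_word_count_vectorized(sample)
print(counted)
```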


In [None]:
Q5. How are DataFrame.size and DataFrame.shape different?

In [None]:
Both DataFrame.size and DataFrame.shape are attributes, not methods, so they are accessed without parentheses.
DataFrame.size returns the total number of elements in the DataFrame as an integer (rows × columns).
                 The count covers every cell, including NaN (missing) values.
DataFrame.shape returns a tuple representing the dimensionality of the DataFrame.
                 The tuple contains two elements: the number of rows and the number of columns.
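
A quick check on a small, made-up DataFrame. Note that both are read as attributes, without parentheses, and that the NaN cell still counts toward size:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({'A': [1, 2, np.nan], 'B': [4, 5, 6]})

print(frame.shape)  # tuple: (rows, columns)
print(frame.size)   # integer: rows * columns, NaN cells included
```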

In [None]:
Q6. Which function of pandas do we use to read an excel file?

In [None]:
We use the pd.read_excel() function to read data from an Excel file into a DataFrame.

In [None]:
Q7. You have a Pandas DataFrame df that contains a column named 'Email' that contains email 
addresses in the format 'username@domain.com'. Write a Python function that creates a new column 
'Username' in df that contains only the username part of each email address.

The username is the part of the email address that appears before the '@' symbol. For example, if the 
email address is 'john.doe@example.com', the 'Username' column should contain 'john.doe'. Your 
function should extract the username from each email address and store it in the new 'Username' 
column.

In [11]:
import pandas as pd

def extract_username(df):
    df['Username'] = df['Email'].apply(lambda email: email.split('@')[0])
    return df

data = {'Email': ['john.doe@example.com', 'jane.smith@example.com', 'bob@gmail.com']}
df = pd.DataFrame(data)

df = extract_username(df)

print(df)


                    Email    Username
0    john.doe@example.com    john.doe
1  jane.smith@example.com  jane.smith
2           bob@gmail.com         bob
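
The lambda above raises an AttributeError if the 'Email' column contains a missing value. A possible alternative, sketched with the pandas string accessor, lets missing values pass through as NaN (the function name and the None entry are illustrative assumptions):

```python
import pandas as pd

def extract_username_vectorized(df):
    out = df.copy()
    # Split once at the first '@' and keep the part before it.
    out['Username'] = out['Email'].str.split('@', n=1).str[0]
    return out

emails = pd.DataFrame({'Email': ['john.doe@example.com', None, 'bob@gmail.com']})
result = extract_username_vectorized(emails)
print(result)
```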


In [None]:
Q8. You have a Pandas DataFrame df with columns 'A', 'B', and 'C'. Write a Python function that selects 
all rows where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10. The 
function should return a new DataFrame that contains only the selected rows.

For example, if df contains the following values:

   A   B   C
0  3   5   1
1  8   2   7
2  6   9   4
3  2   3   5
4  9   1   2

Your function should select the following rows (row 2 qualifies as well, since 6 > 5 and 9 < 10):

   A   B   C
1  8   2   7
2  6   9   4
4  9   1   2

The function should return a new DataFrame that contains only the selected rows.

In [12]:
import pandas as pd

def filter_dataframe(df):
    selected_rows = df[(df['A'] > 5) & (df['B'] < 10)]
    return selected_rows

data = {'A': [3, 8, 6, 2, 9],
        'B': [5, 2, 9, 3, 1],
        'C': [1, 7, 4, 5, 2]}
df = pd.DataFrame(data)

selected_df = filter_dataframe(df)

print(selected_df)


   A  B  C
1  8  2  7
2  6  9  4
4  9  1  2
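
The same filter can also be written with DataFrame.query, which some readers find easier to scan. In this sketch the thresholds are passed as parameters (a_min and b_max are hypothetical names) and referenced inside the expression with @:

```python
import pandas as pd

def filter_with_query(df, a_min=5, b_max=10):
    # '@name' refers to a local Python variable inside the query string.
    return df.query('A > @a_min and B < @b_max')

table = pd.DataFrame({'A': [3, 8, 6, 2, 9],
                      'B': [5, 2, 9, 3, 1],
                      'C': [1, 7, 4, 5, 2]})
selected = filter_with_query(table)
print(selected)
```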


In [None]:
Q9. Given a Pandas DataFrame df with a column 'Values', write a Python function to calculate the mean, 
median, and standard deviation of the values in the 'Values' column.

In [15]:
import pandas as pd

def calculate_statistics(df):
    if 'Values' in df:
        return df['Values'].agg(['mean', 'median', 'std']).to_dict()
    else:
        print("DataFrame doesn't have a 'Values' column.")
        return None

data = {'Values': [10, 15, 20, 25, 30]}
df = pd.DataFrame(data)

result = calculate_statistics(df)

if result is not None:
    for stat, value in result.items():
        print(f"{stat.capitalize()}: {value}")


Mean: 20.0
Median: 20.0
Std: 7.905694150420948
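
The standard deviation printed above is the sample standard deviation, because pandas uses ddof=1 by default. If the population standard deviation is wanted instead, ddof=0 can be passed, as in this short sketch on the same values:

```python
import pandas as pd

values = pd.Series([10, 15, 20, 25, 30], name='Values')

sample_std = values.std()            # divides by n - 1 (pandas default)
population_std = values.std(ddof=0)  # divides by n

print(sample_std)
print(population_std)
```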


In [None]:
Q10. Given a Pandas DataFrame df with a column 'Sales' and a column 'Date', write a Python function to 
create a new column 'MovingAverage' that contains the moving average of the sales for the past 7 days 
for each row in the DataFrame. The moving average should be calculated using a window of size 7 and 
should include the current day.

In [16]:
import pandas as pd

def calculate_moving_average(df):
    df['Date'] = pd.to_datetime(df['Date'])
    df = df.sort_values(by='Date')
    df['MovingAverage'] = df['Sales'].rolling(window=7, min_periods=1).mean()
    return df

data = {'Date': ['2022-01-01', '2022-01-02', '2022-01-03', '2022-01-04', '2022-01-05', '2022-01-06', '2022-01-07'],
        'Sales': [10, 15, 20, 25, 30, 35, 40]}
df = pd.DataFrame(data)

df = calculate_moving_average(df)

print(df)


        Date  Sales  MovingAverage
0 2022-01-01     10           10.0
1 2022-01-02     15           12.5
2 2022-01-03     20           15.0
3 2022-01-04     25           17.5
4 2022-01-05     30           20.0
5 2022-01-06     35           22.5
6 2022-01-07     40           25.0
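
rolling(window=7) counts rows, so it matches "the past 7 days" only when there is exactly one row per day. If dates can be missing, a time-based window is one possible alternative. The sketch below (function name and sample data are hypothetical) uses the '7D' offset, which covers the current day and the six days before it:

```python
import pandas as pd

def moving_average_by_calendar_days(df):
    out = df.copy()
    out['Date'] = pd.to_datetime(out['Date'])
    out = out.sort_values('Date')  # time-based windows need sorted dates
    # Roll over a 7-calendar-day window on a Date index.
    rolled = out.set_index('Date')['Sales'].rolling('7D').mean()
    out['MovingAverage'] = rolled.to_numpy()
    return out

# The third date is more than a week after the first two.
gappy = pd.DataFrame({'Date': ['2022-01-01', '2022-01-02', '2022-01-10'],
                      'Sales': [10, 20, 30]})
averaged = moving_average_by_calendar_days(gappy)
print(averaged)
```

Here the last row averages only its own value, because no other sale falls inside its 7-day window.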


In [None]:
Q11. You have a Pandas DataFrame df with a column 'Date'. Write a Python function that creates a new 
column 'Weekday' in the DataFrame. The 'Weekday' column should contain the weekday name (e.g. 
Monday, Tuesday) corresponding to each date in the 'Date' column.

For example, if df contains the following values:

         Date
0  2023-01-01
1  2023-01-02
2  2023-01-03
3  2023-01-04
4  2023-01-05

Your function should create the following DataFrame:

         Date    Weekday
0  2023-01-01     Sunday
1  2023-01-02     Monday
2  2023-01-03    Tuesday
3  2023-01-04  Wednesday
4  2023-01-05   Thursday

The function should return the modified DataFrame.

In [17]:
import pandas as pd

def add_weekday_column(df):
    df['Date'] = pd.to_datetime(df['Date'])
    
    df['Weekday'] = df['Date'].dt.strftime('%A')

    return df
data = {'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05']}
df = pd.DataFrame(data)
df = add_weekday_column(df)
print(df)

        Date    Weekday
0 2023-01-01     Sunday
1 2023-01-02     Monday
2 2023-01-03    Tuesday
3 2023-01-04  Wednesday
4 2023-01-05   Thursday
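
pandas also offers dt.day_name() for this, which avoids the format string. A small sketch on made-up dates, with dt.dayofweek shown alongside for the numeric form (Monday is 0, Sunday is 6):

```python
import pandas as pd

dates = pd.DataFrame({'Date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03'])})

dates['Weekday'] = dates['Date'].dt.day_name()      # full weekday name
dates['WeekdayNumber'] = dates['Date'].dt.dayofweek  # Monday=0 ... Sunday=6

print(dates)
```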


In [None]:
Q12. Given a Pandas DataFrame df with a column 'Date' that contains timestamps, write a Python 
function to select all rows where the date is between '2023-01-01' and '2023-01-31'.

In [18]:
import pandas as pd

def filter_date_range(df):
    df['Date'] = pd.to_datetime(df['Date'])
    
    start_date = '2023-01-01'
    end_date = '2023-01-31'

    filtered_df = df[df['Date'].between(start_date, end_date)]

    return filtered_df
data = {'Date': ['2023-01-01', '2023-01-15', '2023-01-30', '2023-02-05']}
df = pd.DataFrame(data)
filtered_df = filter_date_range(df)
print(filtered_df)


        Date
0 2023-01-01
1 2023-01-15
2 2023-01-30
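
between() is inclusive on both ends, but the end date '2023-01-31' is read as midnight, so a timestamp such as 2023-01-31 18:30 would be excluded. If the column carries times of day, a half-open range is one way to keep all of January 31. The function name and sample timestamps below are hypothetical:

```python
import pandas as pd

def filter_january_2023(df):
    dates = pd.to_datetime(df['Date'])
    # Half-open range: from Jan 1 inclusive up to, but excluding, Feb 1.
    mask = (dates >= pd.Timestamp('2023-01-01')) & (dates < pd.Timestamp('2023-02-01'))
    return df[mask]

stamps = pd.DataFrame({'Date': ['2023-01-31 18:30', '2023-02-01 00:00', '2022-12-31 23:59']})
january = filter_january_2023(stamps)
print(january)
```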


In [None]:
Q13. To use the basic functions of pandas, which library must be imported first?

In [None]:
import pandas as pd
