Q1. List any five functions of the pandas library with execution.

In [1]:
'''Here are five commonly used pandas functions (methods) with examples, followed by three
DataFrame attributes that are accessed without parentheses:

head() and tail():

head(n): Returns the first n rows (n=5 by default).
tail(n): Returns the last n rows (n=5 by default).

info(): Prints a concise summary of the DataFrame, including each column's data type,
non-null count, and memory usage. It prints the summary itself and returns None.

describe(): Generates descriptive statistics (count, mean, standard deviation, min, quartiles, max)
for the numeric columns, summarizing the central tendency, dispersion, and shape of their distribution.

dropna(): Returns a copy of the DataFrame with every row that contains a missing value removed;
the original DataFrame is not changed.

Attributes:

shape: A tuple with the dimensions of the DataFrame (rows, columns).

columns: The column labels of the DataFrame.

index: The index (row labels) of the DataFrame.
'''

#Example
import pandas as pd
data = {
    'Product': ['A', 'B', 'C', 'D'],
    'Sales': [100, 150, None, 200],
    'Profit': [10, 15, 5, None]
}
df = pd.DataFrame(data)
print(df.describe())
print()
print(df.head())
print()
print(df.tail())
print()
print(df.info())  # info() prints the summary itself; print() then shows its return value, None
print()
print(df.shape)
print()
print(df.columns)
print()
print(df.index)
print()
print(df.dropna())
print()
print(df)

    

       Sales  Profit
count    3.0     3.0
mean   150.0    10.0
std     50.0     5.0
min    100.0     5.0
25%    125.0     7.5
50%    150.0    10.0
75%    175.0    12.5
max    200.0    15.0

  Product  Sales  Profit
0       A  100.0    10.0
1       B  150.0    15.0
2       C    NaN     5.0
3       D  200.0     NaN

  Product  Sales  Profit
0       A  100.0    10.0
1       B  150.0    15.0
2       C    NaN     5.0
3       D  200.0     NaN

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   Product  4 non-null      object 
 1   Sales    3 non-null      float64
 2   Profit   3 non-null      float64
dtypes: float64(2), object(1)
memory usage: 224.0+ bytes
None

(4, 3)

Index(['Product', 'Sales', 'Profit'], dtype='object')

RangeIndex(start=0, stop=4, step=1)

  Product  Sales  Profit
0       A  100.0    10.0
1       B  150.0    15.0

  Product  Sales  Profit
0       A  100.0    10.0
1       B  150.0    15.0
2       C    NaN     5.0
3       D  200.0     NaN
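With only four rows, head() and tail() above both print the whole DataFrame, so the difference between them is not visible. A short sketch passing an explicit n to each (the value 2 is an arbitrary choice for illustration):

```python
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({
    'Product': ['A', 'B', 'C', 'D'],
    'Sales': [100, 150, None, 200],
    'Profit': [10, 15, 5, None],
})

first_two = df.head(2)  # rows labelled 0 and 1
last_two = df.tail(2)   # rows labelled 2 and 3

print(first_two)
print()
print(last_two)
```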

Q2. Given a Pandas DataFrame df with columns 'A', 'B', and 'C', write a Python function to re-index the
DataFrame with a new index that starts from 1 and increments by 2 for each row.

In [2]:
import pandas as pd

def reindex_dataframe(df):
    # Create a new index starting from 1 and incrementing by 2
    new_index = range(1, 2 * len(df) + 1, 2)
    
    # Set the new index for the DataFrame
    df_reindexed = df.set_index(pd.Index(new_index))
    
    return df_reindexed

# Example DataFrame
df = pd.DataFrame({
    'A': [10, 20, 30],
    'B': [40, 50, 60],
    'C': [70, 80, 90]
})

print("Original DataFrame:")
print(df)

# Re-index the DataFrame using the function
df_reindexed = reindex_dataframe(df)

print("\nDataFrame after re-indexing:")
print(df_reindexed)



Original DataFrame:
    A   B   C
0  10  40  70
1  20  50  80
2  30  60  90

DataFrame after re-indexing:
    A   B   C
1  10  40  70
3  20  50  80
5  30  60  90
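The word "re-index" can be misleading here: DataFrame.reindex() does not relabel rows, it looks up the requested labels among the existing ones and fills labels that do not exist with NaN. The sketch below contrasts it with set_axis(), which, like the set_index() call above, simply replaces the labels. pd.RangeIndex is used here only as an equivalent way to write the 1, 3, 5, ... sequence.

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60], 'C': [70, 80, 90]})
new_labels = pd.RangeIndex(start=1, stop=2 * len(df) + 1, step=2)  # 1, 3, 5

# Relabelling: every row keeps its data, only the labels change
relabelled = df.set_axis(new_labels, axis=0)

# Conforming: the new labels are looked up among the existing ones.
# Label 1 exists (the 20/50/80 row); labels 3 and 5 do not, so those rows are NaN.
conformed = df.reindex(new_labels)

print(relabelled)
print()
print(conformed)
```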


Q3. You have a Pandas DataFrame df with a column named 'Values'. Write a Python function that
iterates over the DataFrame and calculates the sum of the first three values in the 'Values' column. The
function should print the sum to the console.

In [3]:
import pandas as pd

def calculate_sum_of_first_three(df):
    # Check if the DataFrame has the 'Values' column
    if 'Values' not in df.columns:
        print("The DataFrame does not have a column named 'Values'.")
        return
    
    # Calculate the sum of the first three values in the 'Values' column
    sum_of_first_three = df['Values'].iloc[:3].sum()
    
    # Print the sum to the console
    print(f"The sum of the first three values in the 'Values' column is: {sum_of_first_three}")

# Example DataFrame
df = pd.DataFrame({
    'Values': [10,20,30,40,50]
})

print("Original DataFrame:")
print(df)

# Call the function to calculate the sum of the first three values
calculate_sum_of_first_three(df)


Original DataFrame:
   Values
0      10
1      20
2      30
3      40
4      50
The sum of the first three values in the 'Values' column is: 60
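The question asks for a function that iterates over the DataFrame, whereas the solution above slices with iloc. If an explicit loop is wanted, a minimal sketch could look like this (sum_first_three_by_iteration is a hypothetical name). It stops after the third value and, like the iloc version, simply sums fewer values when the column has fewer than three rows.

```python
import pandas as pd

def sum_first_three_by_iteration(df):
    """Hypothetical loop-based variant: walk the 'Values' column and stop after three items."""
    total = 0
    for position, value in enumerate(df['Values']):
        if position == 3:  # positions 0, 1 and 2 are the first three values
            break
        total += value
    print(f"The sum of the first three values in the 'Values' column is: {total}")
    return total

df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})
sum_first_three_by_iteration(df)
```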


Q4. Given a Pandas DataFrame df with a column 'Text', write a Python function to create a new column
'Word_Count' that contains the number of words in each row of the 'Text' column.

In [4]:
import pandas as pd

def calculate_word_count(df):
    # Check if the DataFrame has the 'Text' column
    if 'Text' not in df.columns:
        print("The DataFrame does not have a column named 'Text'.")
        return
    
    # Create a new column 'Word_Count' containing the number of words in each row of the 'Text' column
    df['Word_Count'] = df['Text'].apply(lambda x: len(str(x).split()))
    
    return df

# Example DataFrame
df = pd.DataFrame({
    'Text': ['Hello, world!', 'Python is great', 'Data Science']
})

print("Original DataFrame:")
print(df)

# Call the function to create the 'Word_Count' column
df_with_word_count = calculate_word_count(df)

print("\nDataFrame with Word_Count:")
print(df_with_word_count)


Original DataFrame:
              Text
0    Hello, world!
1  Python is great
2     Data Science

DataFrame with Word_Count:
              Text  Word_Count
0    Hello, world!           2
1  Python is great           3
2     Data Science           2
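One caveat of the apply(lambda x: len(str(x).split())) approach: str() turns a missing value into the text 'None' or 'nan', which is then counted as one word. The vectorised .str.split().str.len() chain keeps missing values as NaN instead. A small comparison, using a None entry added purely for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Text': ['Hello, world!', 'Python is great', None]})

# apply + str(): the missing entry is stringified and counted as one word
df['WC_apply'] = df['Text'].apply(lambda x: len(str(x).split()))

# .str accessor: split on whitespace, then count the pieces; missing stays missing
df['WC_str'] = df['Text'].str.split().str.len()

print(df)
```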


Q5. How are DataFrame.size() and DataFrame.shape() different?

In [5]:
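'''DataFrame.size:

Note: size and shape are attributes, not methods, so they are accessed without
parentheses (df.size, df.shape); writing df.size() raises a TypeError.

Returns the total number of elements (cells) in the DataFrame.
It is calculated as the number of rows multiplied by the number of columns.
Every cell is counted, including duplicate and missing (NaN) values.'''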

#Example

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Print the size of the DataFrame
print("DataFrame.size:", df.size)  # Output will be 6 (3 rows * 2 columns)
print()

'''DataFrame.shape:

Returns a tuple representing the dimensions of the DataFrame.
The tuple contains two elements: the number of rows followed by the number of columns.
Unlike size, it keeps the two dimensions separate: shape[0] is the row count and shape[1] is the column count.'''

#Example (reuses the df created above)

# Print the shape of the DataFrame
print("DataFrame.shape:", df.shape)  # Output will be (3, 2) indicating 3 rows and 2 columns


DataFrame.size: 6

DataFrame.shape: (3, 2)
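Two consequences of these definitions can be checked directly: size always equals the product of the two numbers in shape, and it counts missing cells too (DataFrame.count() is the method that skips them). The None in the sample data is added only to show the NaN behaviour.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, None], 'B': [4, 5, 6]})

rows, cols = df.shape          # unpack the (rows, columns) tuple
print("shape:", df.shape)      # shape: (3, 2)
print("size:", df.size)        # size: 6, i.e. rows * cols
print("non-missing cells:", df.count().sum())  # non-missing cells: 5
```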


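Q6. Which function of pandas do we use to read an excel file?

Ans:

    We use pd.read_excel(), which reads an Excel file into a DataFrame. Reading .xlsx files
    also requires an Excel engine such as openpyxl to be installed.

    import pandas as pd

    # Specify the path to the Excel file
    excel_file_path = 'path_to_your_excel_file.xlsx'  # Replace with your file path

    # Read the first sheet of the Excel file into a DataFrame
    # (pass sheet_name=... to read a different sheet)
    df = pd.read_excel(excel_file_path)

    # Display the first few rows of the DataFrame
    print(df.head())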


Q7. You have a Pandas DataFrame df that contains a column named 'Email' that contains email
addresses in the format 'username@domain.com'. Write a Python function that creates a new column
'Username' in df that contains only the username part of each email address.
The username is the part of the email address that appears before the '@' symbol. For example, if the
email address is 'john.doe@example.com', the 'Username' column should contain 'john.doe'. Your
function should extract the username from each email address and store it in the new 'Username'
column.

In [6]:
import pandas as pd

def extract_username(df):
    # Check if the DataFrame has the 'Email' column
    if 'Email' not in df.columns:
        print("The DataFrame does not have a column named 'Email'.")
        return
    
    # Extract the username from the 'Email' column
    df['Username'] = df['Email'].str.split('@').str[0]
    
    return df

# Example DataFrame
df = pd.DataFrame({
    'Email': ['john.doe@example.com', 'jane.smith@example.com', 'robert.johnson@example.com']
})

print("Original DataFrame:")
print(df)

# Call the function to extract the username and create the 'Username' column
df_with_username = extract_username(df)

print("\nDataFrame with Username:")
print(df_with_username)



Original DataFrame:
                        Email
0        john.doe@example.com
1      jane.smith@example.com
2  robert.johnson@example.com

DataFrame with Username:
                        Email        Username
0        john.doe@example.com        john.doe
1      jane.smith@example.com      jane.smith
2  robert.johnson@example.com  robert.johnson
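str.split('@').str[0] assumes every entry really is an address: a value without '@' comes back unchanged rather than being flagged. If malformed entries are possible, a regular-expression variant with str.extract() is one option. The pattern ^([^@]+)@ is an assumption about what counts as a username, and entries that do not match become NaN; the sample values are illustrative.

```python
import pandas as pd

emails = pd.Series(['john.doe@example.com', 'not-an-email', None])

# Split approach: text before the first '@', or the whole string if there is none
split_version = emails.str.split('@').str[0]

# Extract approach: capture the text before '@'; entries with no match become NaN
extract_version = emails.str.extract(r'^([^@]+)@', expand=False)

print(split_version)
print(extract_version)
```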


Q8. You have a Pandas DataFrame df with columns 'A', 'B', and 'C'. Write a Python function that selects
all rows where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10. The
function should return a new DataFrame that contains only the selected rows.
For example, if df contains the following values:

   A  B  C
0  3  5  1
1  8  2  7
2  6  9  4
3  2  3  5
4  9  1  2

Your function should select the following rows (row 2 qualifies as well, since 6 > 5 and 9 < 10):

   A  B  C
1  8  2  7
2  6  9  4
4  9  1  2

The function should return a new DataFrame that contains only the selected rows.

In [7]:
import pandas as pd

def filter_dataframe(df):
    # Check if the DataFrame has columns 'A' and 'B'
    if 'A' not in df.columns or 'B' not in df.columns:
        print("The DataFrame does not have columns 'A' and 'B'.")
        return
    
    # Filter the DataFrame based on the conditions
    filtered_df = df[(df['A'] > 5) & (df['B'] < 10)]
    
    return filtered_df

# Create the original DataFrame
df = pd.DataFrame({
    'A': [3, 8, 6, 2, 9],
    'B': [5, 2, 9, 3, 1],
    'C': [1, 7, 4, 5, 2]
})

print("Original DataFrame:")
print(df)

# Call the function to filter the DataFrame based on the conditions
filtered_df = filter_dataframe(df)

print("\nFiltered DataFrame:")
print(filtered_df)


Original DataFrame:
   A  B  C
0  3  5  1
1  8  2  7
2  6  9  4
3  2  3  5
4  9  1  2

Filtered DataFrame:
   A  B  C
1  8  2  7
2  6  9  4
4  9  1  2
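The same filter can also be written with DataFrame.query(), which takes the conditions as a string; this is a stylistic alternative to the boolean mask and gives the same rows. Row 2 (A=6, B=9) satisfies both conditions, so it is correctly included.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [3, 8, 6, 2, 9],
    'B': [5, 2, 9, 3, 1],
    'C': [1, 7, 4, 5, 2],
})

# Boolean mask: each condition needs parentheses because & binds tighter than > and <
mask_result = df[(df['A'] > 5) & (df['B'] < 10)]

# query(): the same conditions expressed as a string
query_result = df.query('A > 5 and B < 10')

print(query_result)
```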


Q9. Given a Pandas DataFrame df with a column 'Values', write a Python function to calculate the mean,
median, and standard deviation of the values in the 'Values' column.

In [8]:
import pandas as pd

def calculate_stats(df):
    # Check if the DataFrame has the 'Values' column
    if 'Values' not in df.columns:
        print("The DataFrame does not have a column named 'Values'.")
        return
    
    # Calculate the mean, median, and standard deviation of the 'Values' column
    mean_value = df['Values'].mean()
    median_value = df['Values'].median()
    std_deviation = df['Values'].std()  # sample standard deviation (ddof=1 by default)
    
    # Return the calculated statistics
    return mean_value, median_value, std_deviation

# Create the original DataFrame
df = pd.DataFrame({
    'Values': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
})

print("Original DataFrame:")
print(df)

# Call the function to calculate the mean, median, and standard deviation
mean_value, median_value, std_deviation = calculate_stats(df)

print("\nMean of 'Values':", mean_value)
print("Median of 'Values':", median_value)
print("Standard Deviation of 'Values':", std_deviation)


Original DataFrame:
   Values
0      10
1      20
2      30
3      40
4      50
5      60
6      70
7      80
8      90
9     100

Mean of 'Values': 55.0
Median of 'Values': 55.0
Standard Deviation of 'Values': 30.276503540974915
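Series.std() returns the sample standard deviation (ddof=1, dividing by n - 1), which is why the result above is about 30.28. Passing ddof=0 gives the population standard deviation instead. For these values the squared deviations from the mean of 55 sum to 8250, so the sample value is sqrt(8250 / 9) ≈ 30.28 and the population value is sqrt(8250 / 10) ≈ 28.72:

```python
import math

import pandas as pd

values = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])

sample_std = values.std()            # ddof=1 (pandas default): divide by n - 1 = 9
population_std = values.std(ddof=0)  # divide by n = 10

print(sample_std)
print(population_std)

# Cross-check against the hand calculation
print(math.sqrt(8250 / 9), math.sqrt(8250 / 10))
```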


Q10. Given a Pandas DataFrame df with a column 'Sales' and a column 'Date', write a Python function to
create a new column 'MovingAverage' that contains the moving average of the sales for the past 7 days
for each row in the DataFrame. The moving average should be calculated using a window of size 7 and
should include the current day.

In [9]:
import pandas as pd

def calculate_moving_average(df):
    # Check if the DataFrame has the 'Sales' and 'Date' columns
    if 'Sales' not in df.columns or 'Date' not in df.columns:
        print("The DataFrame is missing either 'Sales' or 'Date' column.")
        return
    
    # Sort the DataFrame by the 'Date' column if it's not already sorted
    df = df.sort_values(by='Date')
    
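    # Moving average over a window of 7 rows (the current row and the 6 before it).
    # This equals "the past 7 days" only when there is exactly one row per day.
    # min_periods=1 lets the first rows average over however many values are available.
    df['MovingAverage'] = df['Sales'].rolling(window=7, min_periods=1).mean()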
    
    return df

# Create the original DataFrame
df = pd.DataFrame({
    'Date': pd.date_range(start='2022-01-01', end='2022-01-10'),
    'Sales': [100, 110, 105, 115, 120, 125, 130, 135, 140, 145]
})

print("Original DataFrame:")
print(df)

# Call the function to calculate the moving average
df_with_moving_average = calculate_moving_average(df)

print("\nDataFrame with Moving Average:")
print(df_with_moving_average)


Original DataFrame:
        Date  Sales
0 2022-01-01    100
1 2022-01-02    110
2 2022-01-03    105
3 2022-01-04    115
4 2022-01-05    120
5 2022-01-06    125
6 2022-01-07    130
7 2022-01-08    135
8 2022-01-09    140
9 2022-01-10    145

DataFrame with Moving Average:
        Date  Sales  MovingAverage
0 2022-01-01    100     100.000000
1 2022-01-02    110     105.000000
2 2022-01-03    105     105.000000
3 2022-01-04    115     107.500000
4 2022-01-05    120     110.000000
5 2022-01-06    125     112.500000
6 2022-01-07    130     115.000000
7 2022-01-08    135     120.000000
8 2022-01-09    140     124.285714
9 2022-01-10    145     130.000000
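rolling(window=7) counts rows, not days, so it matches "the past 7 days" only when there is exactly one row per calendar day. If dates can be missing or repeated, a time-based window is closer to the wording of the question. A sketch using an offset window of '7D' on a date-indexed Series (the irregular sample dates are made up for illustration): by default such a window covers the current timestamp and the preceding 7 days excluding the left edge, i.e. the current day plus the 6 days before it, and the dates must be sorted.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': pd.to_datetime(['2022-01-01', '2022-01-02', '2022-01-10']),
    'Sales': [100, 200, 300],
})
df = df.sort_values('Date')  # offset-based windows need a monotonic index

# Row-based: the last 7 rows, however far apart their dates are
df['MA_rows'] = df['Sales'].rolling(window=7, min_periods=1).mean()

# Time-based: rows whose Date falls in (current date - 7 days, current date]
df['MA_7D'] = df.set_index('Date')['Sales'].rolling('7D').mean().to_numpy()

print(df)
```

On 2022-01-10 the row-based average still includes the January 1 and 2 sales, while the time-based one does not.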


Q11. You have a Pandas DataFrame df with a column 'Date'. Write a Python function that creates a new
column 'Weekday' in the DataFrame. The 'Weekday' column should contain the weekday name (e.g.
Monday, Tuesday) corresponding to each date in the 'Date' column.
For example, if df contains the following values:

         Date
0  2023-01-01
1  2023-01-02
2  2023-01-03
3  2023-01-04
4  2023-01-05

Your function should create the following DataFrame:

         Date    Weekday
0  2023-01-01     Sunday
1  2023-01-02     Monday
2  2023-01-03    Tuesday
3  2023-01-04  Wednesday
4  2023-01-05   Thursday

The function should return the modified DataFrame.

In [10]:
import pandas as pd

def add_weekday_column(df):
    # Check if the DataFrame has the 'Date' column
    if 'Date' not in df.columns:
        print("The DataFrame does not have a column named 'Date'.")
        return
    
    # Convert the 'Date' column to datetime format if it's not already in datetime format
    df['Date'] = pd.to_datetime(df['Date'])
    
    # Create the 'Weekday' column containing the weekday names
    df['Weekday'] = df['Date'].dt.day_name()
    
    return df

# Create the original DataFrame
df = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05']
})

print("Original DataFrame:")
print(df)

# Call the function to add the 'Weekday' column
df_with_weekday = add_weekday_column(df)

print("\nDataFrame with Weekday Column:")
print(df_with_weekday)


Original DataFrame:
         Date
0  2023-01-01
1  2023-01-02
2  2023-01-03
3  2023-01-04
4  2023-01-05

DataFrame with Weekday Column:
        Date    Weekday
0 2023-01-01     Sunday
1 2023-01-02     Monday
2 2023-01-03    Tuesday
3 2023-01-04  Wednesday
4 2023-01-05   Thursday
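The function above converts df['Date'] in place, so the caller's DataFrame is modified as a side effect. If that is not wanted, one option is to build the result with DataFrame.assign(), which returns a new DataFrame. add_weekday_copy is a hypothetical name:

```python
import pandas as pd

def add_weekday_copy(df):
    """Hypothetical non-mutating variant: returns a new DataFrame and leaves df untouched."""
    dates = pd.to_datetime(df['Date'])
    return df.assign(Date=dates, Weekday=dates.dt.day_name())

df = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05']
})
result = add_weekday_copy(df)

print(result)
```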


Q12. Given a Pandas DataFrame df with a column 'Date' that contains timestamps, write a Python
function to select all rows where the date is between '2023-01-01' and '2023-01-31'.

In [11]:
import pandas as pd

def filter_by_date_range(df):
    # Check if the DataFrame has the 'Date' column
    if 'Date' not in df.columns:
        print("The DataFrame does not have a column named 'Date'.")
        return
    
    # Convert the 'Date' column to datetime format if it's not already in datetime format
    df['Date'] = pd.to_datetime(df['Date'])
    
    # Keep rows from 2023-01-01 through the whole of 2023-01-31; the half-open upper bound
    # (< '2023-02-01') also keeps timestamps later in the day on January 31
    filtered_df = df[(df['Date'] >= '2023-01-01') & (df['Date'] < '2023-02-01')]
    
    return filtered_df

# Create the original DataFrame
df = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-15', '2023-01-31', '2023-02-01', '2023-02-15']
})

print("Original DataFrame:")
print(df)

# Call the function to filter rows based on the date range
filtered_df = filter_by_date_range(df)

print("\nDataFrame with Date between '2023-01-01' and '2023-01-31':")
print(filtered_df)


Original DataFrame:
         Date
0  2023-01-01
1  2023-01-15
2  2023-01-31
3  2023-02-01
4  2023-02-15

DataFrame with Date between '2023-01-01' and '2023-01-31':
        Date
0 2023-01-01
1 2023-01-15
2 2023-01-31
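The upper bound matters when the column holds times as well as dates: the string '2023-01-31' is compared as midnight on 31 January, so a timestamp later that day falls outside `<= '2023-01-31'`. A half-open bound of `< '2023-02-01'` keeps the whole last day. The sample timestamps below are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': pd.to_datetime([
        '2023-01-01 09:00', '2023-01-31 00:00', '2023-01-31 18:30', '2023-02-01 00:00',
    ])
})

# Closed upper bound: '2023-01-31' means midnight, so 18:30 that day is excluded
closed = df[(df['Date'] >= '2023-01-01') & (df['Date'] <= '2023-01-31')]

# Half-open upper bound: everything before 1 February is kept
half_open = df[(df['Date'] >= '2023-01-01') & (df['Date'] < '2023-02-01')]

print(closed)
print()
print(half_open)
```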


Q13. To use the basic functions of pandas, what is the first and foremost necessary library that needs to
be imported?

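Ans:

    The library to import first is pandas itself; by convention it is imported under the alias pd:

    import pandas as pd

    NumPy, on which pandas is built, is installed automatically as a dependency and does not need to be imported separately to use the basic pandas functions.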