## Q1. List any five functions of the pandas library with execution.

1. read_csv()

Reads a CSV file into a DataFrame.

In [1]:
import pandas as pd

# Example CSV content
csv_content = """Name, Age, City
John, 28, New York
Anna, 22, London
Peter, 34, Berlin
"""

# Writing the CSV content to a file for demonstration
with open('example.csv', 'w') as file:
    file.write(csv_content)

# Reading the CSV file into a DataFrame
df = pd.read_csv('example.csv')
print(df)


    Name   Age       City
0   John    28   New York
1   Anna    22     London
2  Peter    34     Berlin
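
The sample file has a space after each comma, and by default pandas keeps that space as part of the parsed column names and string values. The following is a minimal sketch, assuming the same kind of CSV text held in memory (the names csv_text, raw and clean are only illustrative), of how the skipinitialspace parameter of read_csv changes the result:

```python
import io
import pandas as pd

# Same layout as the example file: a space follows every comma
csv_text = "Name, Age, City\nJohn, 28, New York\nAnna, 22, London\n"

# Default parsing keeps the leading space in names and string values
raw = pd.read_csv(io.StringIO(csv_text))
print(list(raw.columns))    # ['Name', ' Age', ' City']

# skipinitialspace=True drops whitespace that follows the delimiter
clean = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)
print(list(clean.columns))  # ['Name', 'Age', 'City']
```

With the default settings, a lookup such as df['Age'] raises a KeyError because the actual column name is ' Age'.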


2. head()

Returns the first n rows of a DataFrame.

In [2]:
# Display the first 2 rows of the DataFrame
print(df.head(2))


   Name   Age       City
0  John    28   New York
1  Anna    22     London


3. describe()

Generates descriptive statistics of a DataFrame.

In [3]:
# Generate descriptive statistics
print(df.describe())


        Age
count   3.0
mean   28.0
std     6.0
min    22.0
25%    25.0
50%    28.0
75%    31.0
max    34.0


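4. groupby()

Groups the rows of a DataFrame by the values in a specified column.

In [4]:
# The CSV has a space after each comma; re-read it so the column names are clean
df = pd.read_csv('example.csv', skipinitialspace=True)

# Group by 'City' and calculate the mean age
grouped_df = df.groupby('City')['Age'].mean()
print(grouped_df)


City
Berlin      34.0
London      22.0
New York    28.0
Name: Age, dtype: float64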

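5. plot()

Plots the data in the DataFrame.

In [5]:
import matplotlib.pyplot as plt

# Strip stray spaces from the column names (the CSV has a space after each comma)
df.columns = df.columns.str.strip()

# Plot each person's age as a bar chart
df.plot(kind='bar', x='Name', y='Age')
plt.show()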

## Q2. Given a Pandas DataFrame df with columns 'A', 'B', and 'C', write a Python function to re-index the DataFrame with a new index that starts from 1 and increments by 2 for each row.

In [6]:
import pandas as pd

def reindex_dataframe(df):
    # Create a new index starting from 1 and incrementing by 2
    new_index = range(1, 2 * len(df) + 1, 2)
    # Assign the new index to the DataFrame
    df.index = new_index
    return df

# Example usage
data = {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

df_reindexed = reindex_dataframe(df)
print("\nRe-indexed DataFrame:")
print(df_reindexed)


Original DataFrame:
   A  B  C
0  1  4  7
1  2  5  8
2  3  6  9

Re-indexed DataFrame:
   A  B  C
1  1  4  7
3  2  5  8
5  3  6  9
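
Note that reindex_dataframe assigns to df.index, so the caller's DataFrame is changed as well. Where that side effect is unwanted, a variant that works on a copy may be preferable. This is a sketch under that assumption, and the name reindex_copy is hypothetical:

```python
import pandas as pd

def reindex_copy(df):
    # Work on a copy so the caller's DataFrame keeps its original index
    out = df.copy()
    # RangeIndex(1, 2n + 1, 2) yields 1, 3, 5, ... with one label per row
    out.index = pd.RangeIndex(start=1, stop=2 * len(df) + 1, step=2)
    return out

original = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
reindexed = reindex_copy(original)
print(list(reindexed.index))  # [1, 3, 5]
print(list(original.index))   # [0, 1, 2]
```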


## Q3. You have a Pandas DataFrame df with a column named 'Values'. Write a Python function that iterates over the DataFrame and calculates the sum of the first three values in the 'Values' column. The function should print the sum to the console.

In [7]:
import pandas as pd

def sum_first_three_values(df):
    # Calculate the sum of the first three values in the 'Values' column
    total_sum = df['Values'].head(3).sum()
    # Print the sum to the console
    print("Sum of the first three values:", total_sum)

# Example usage
data = {'Values': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

sum_first_three_values(df)


Sum of the first three values: 60
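
The question asks for a function that iterates over the DataFrame, whereas the solution above relies on the vectorized head(3).sum(). For comparison, here is a sketch of an explicit loop that gives the same result (the function name is hypothetical, and returning the total in addition to printing it is an extra convenience):

```python
import pandas as pd

def sum_first_three_by_iteration(df):
    total = 0
    # Walk the column in order and stop once three values have been added
    for position, value in enumerate(df['Values']):
        if position >= 3:
            break
        total += value
    print("Sum of the first three values:", total)
    return total

demo = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})
result = sum_first_three_by_iteration(demo)
```

The vectorized form is generally the better choice in pandas; the loop is shown only to match the wording of the question.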


## Q4. Given a Pandas DataFrame df with a column 'Text', write a Python function to create a new column 'Word_Count' that contains the number of words in each row of the 'Text' column.

In [8]:
import pandas as pd

def add_word_count_column(df):
    # Create the 'Word_Count' column by applying a lambda function to count the words in each row of the 'Text' column
    df['Word_Count'] = df['Text'].apply(lambda x: len(x.split()))
    return df

# Example usage
data = {'Text': ['This is a sentence.', 'Another sentence.', 'Yet another one.']}
df = pd.DataFrame(data)

df_with_word_count = add_word_count_column(df)
print(df_with_word_count)


                  Text  Word_Count
0  This is a sentence.           4
1    Another sentence.           2
2     Yet another one.           3
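
The lambda above calls x.split() on every value, which fails with an AttributeError if the column contains missing values. A possible alternative, sketched here on the assumption that missing text should count as zero words, uses the vectorized .str accessor (the function name is hypothetical):

```python
import pandas as pd

def add_word_count_vectorized(df):
    out = df.copy()
    # .str.split() splits on whitespace; .str.len() counts the resulting pieces.
    # Missing values stay missing, so fill them with 0 before converting to int.
    counts = out['Text'].str.split().str.len()
    out['Word_Count'] = counts.fillna(0).astype(int)
    return out

demo = pd.DataFrame({'Text': ['This is a sentence.', None, '  two   words ']})
counted = add_word_count_vectorized(demo)
print(counted['Word_Count'].tolist())  # [4, 0, 2]
```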


## Q5. How are DataFrame.size() and DataFrame.shape() different?

DataFrame.size and DataFrame.shape are attributes of a Pandas DataFrame, not methods, so they are accessed without parentheses. Both describe the DataFrame's dimensions, but they return different things.

DataFrame.size

- Definition: Returns the number of elements in the DataFrame.
- Calculation: It is the product of the number of rows and columns.
- Type: Returns a single integer.
- Example:

In [9]:
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})
print(df.size)  # Output: 6


6


DataFrame.shape

- Definition: Returns a tuple representing the dimensions of the DataFrame.
- Contents: The tuple contains the number of rows and the number of columns.
- Type: Returns a tuple of two integers.
- Example:

In [10]:
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})
print(df.shape)  # Output: (3, 2)


(3, 2)
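
The relationship between the two attributes can be checked directly. The short sketch below (the variable names are illustrative) also shows what happens when they are called with parentheses, as the question title writes them:

```python
import pandas as pd

df_demo = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# size is always rows * columns, i.e. the product of the shape tuple
rows, cols = df_demo.shape
print(df_demo.size == rows * cols)  # True

# Both are attributes, so adding parentheses tries to call an int or a tuple
try:
    df_demo.size()
    called_ok = True
except TypeError:
    called_ok = False
print(called_ok)  # False
```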


## Q6. Which function of pandas do we use to read an excel file?

To read an Excel file into a Pandas DataFrame, use the pandas.read_excel() function. It can read both .xls and .xlsx files, provided a matching engine package is installed (typically xlrd for .xls and openpyxl for .xlsx).

In [12]:
import pandas as pd

# Read an Excel file into a DataFrame
# (left commented out because 'file_path.xlsx' is only a placeholder path)
# df = pd.read_excel('file_path.xlsx')


Parameters

- io: The path to the Excel file (string), or a file-like object, or an xlrd workbook.
- sheet_name: The name or index of the sheet to be read (default is 0, the first sheet).
- header: The row (0-indexed) to use as the column names (default is 0).
- names: List of column names to use. If the file has no header row, also pass header=None.
- usecols: The columns to read (can be a list of column names or indices).
- skiprows: The number of rows to skip at the beginning, or a list of 0-indexed row numbers to skip (default is None, which skips nothing).

## Q7. You have a Pandas DataFrame df that contains a column named 'Email' that contains email addresses in the format 'username@domain.com'. Write a Python function that creates a new column 'Username' in df that contains only the username part of each email address.

In [13]:
import pandas as pd

def extract_username(df):
    # Create the 'Username' column by applying a lambda function to extract the username part of each email address
    df['Username'] = df['Email'].apply(lambda x: x.split('@')[0])
    return df

# Example usage
data = {'Email': ['john.doe@example.com', 'jane.smith@domain.com', 'user123@service.org']}
df = pd.DataFrame(data)

df_with_usernames = extract_username(df)
print(df_with_usernames)


                   Email    Username
0   john.doe@example.com    john.doe
1  jane.smith@domain.com  jane.smith
2    user123@service.org     user123
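
The same extraction can also be written with the vectorized .str accessor, which leaves missing email values as missing instead of raising an error. This is a sketch only, and the function name is hypothetical:

```python
import pandas as pd

def extract_username_vectorized(df):
    out = df.copy()
    # Split once at the first '@' and keep the part before it
    out['Username'] = out['Email'].str.split('@', n=1).str[0]
    return out

demo = pd.DataFrame({'Email': ['john.doe@example.com', None, 'user123@service.org']})
with_usernames = extract_username_vectorized(demo)
print(with_usernames)
```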


## Q8. You have a Pandas DataFrame df with columns 'A', 'B', and 'C'. Write a Python function that selects all rows where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10. The function should return a new DataFrame that contains only the selected rows. For example, if df contains the following values: A B C 0 3 5 1 1 8 2 7 2 6 9 4 3 2 3 5 4 9 1 2 Your function should select the following rows: A B C 1 8 2 7 4 9 1 2 The function should return a new DataFrame that contains only the selected rows.

In [14]:
import pandas as pd

def select_rows(df):
    # Apply the conditions to filter the DataFrame
    selected_df = df[(df['A'] > 5) & (df['B'] < 10)]
    return selected_df

# Example usage
data = {'A': [3, 8, 6, 2, 9], 'B': [5, 2, 9, 3, 1], 'C': [1, 7, 4, 5, 2]}
df = pd.DataFrame(data)

selected_df = select_rows(df)
print(selected_df)


   A  B  C
1  8  2  7
2  6  9  4
4  9  1  2
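
Row 2 (A = 6, B = 9) satisfies both conditions, which is why it appears in the output even though the expected result quoted in the question lists only rows 1 and 4. The same filter can also be expressed with DataFrame.query, sketched here with a hypothetical function name:

```python
import pandas as pd

def select_rows_query(df):
    # query() evaluates the expression against the column names
    # and returns a new DataFrame containing the matching rows
    return df.query('A > 5 and B < 10')

demo = pd.DataFrame({'A': [3, 8, 6, 2, 9], 'B': [5, 2, 9, 3, 1], 'C': [1, 7, 4, 5, 2]})
selected = select_rows_query(demo)
print(selected.index.tolist())  # [1, 2, 4]
```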


## Q9. Given a Pandas DataFrame df with a column 'Values', write a Python function to calculate the mean, median, and standard deviation of the values in the 'Values' column.

In [15]:
import pandas as pd

def calculate_statistics(df):
    # Calculate the mean, median, and standard deviation of the 'Values' column
    mean_value = df['Values'].mean()
    median_value = df['Values'].median()
    std_dev_value = df['Values'].std()
    
    # Print the results
    print(f"Mean: {mean_value}")
    print(f"Median: {median_value}")
    print(f"Standard Deviation: {std_dev_value}")
    
    # Return the results as a dictionary (optional)
    return {
        'mean': mean_value,
        'median': median_value,
        'std_dev': std_dev_value
    }

# Example usage
data = {'Values': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

statistics = calculate_statistics(df)


Mean: 30.0
Median: 30.0
Standard Deviation: 15.811388300841896
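
The value printed above is the sample standard deviation, because Series.std() uses ddof=1 by default. The sketch below (the variable names are illustrative) compares it with the population standard deviation and shows agg() as a compact way to compute all three statistics at once:

```python
import pandas as pd

values = pd.Series([10, 20, 30, 40, 50], name='Values')

# All three statistics in one call; the result is a Series keyed by name
summary = values.agg(['mean', 'median', 'std'])
print(summary)

# ddof=1 (the default) divides by n - 1; ddof=0 divides by n
sample_std = values.std()
population_std = values.std(ddof=0)
print(sample_std, population_std)
```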


## Q10. Given a Pandas DataFrame df with a column 'Sales' and a column 'Date', write a Python function to create a new column 'MovingAverage' that contains the moving average of the sales for the past 7 days for each row in the DataFrame. The moving average should be calculated using a window of size 7 and should include the current day.

In [16]:
import pandas as pd

def add_moving_average(df):
    # Ensure the 'Date' column is in datetime format
    df['Date'] = pd.to_datetime(df['Date'])
    
    # Sort the DataFrame by 'Date' to ensure the rolling calculation is done correctly
    df = df.sort_values('Date')
    
    # Calculate the 7-day moving average of the 'Sales' column
    df['MovingAverage'] = df['Sales'].rolling(window=7, min_periods=1).mean()
    
    return df

# Example usage
data = {
    'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05', 
             '2023-01-06', '2023-01-07', '2023-01-08', '2023-01-09', '2023-01-10'],
    'Sales': [100, 150, 200, 250, 300, 350, 400, 450, 500, 550]
}
df = pd.DataFrame(data)

df_with_moving_average = add_moving_average(df)
print(df_with_moving_average)


        Date  Sales  MovingAverage
0 2023-01-01    100          100.0
1 2023-01-02    150          125.0
2 2023-01-03    200          150.0
3 2023-01-04    250          175.0
4 2023-01-05    300          200.0
5 2023-01-06    350          225.0
6 2023-01-07    400          250.0
7 2023-01-08    450          300.0
8 2023-01-09    500          350.0
9 2023-01-10    550          400.0
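
rolling(window=7) counts rows, not calendar days, so it matches "the past 7 days" only when there is exactly one row per day. If dates can be missing, a time-based window is one option. The sketch below assumes that a window covering the current day and the six days before it is what is wanted, and the function name is hypothetical:

```python
import pandas as pd

def add_moving_average_by_time(df):
    out = df.copy()
    out['Date'] = pd.to_datetime(out['Date'])
    out = out.sort_values('Date')
    # A '7D' window on a datetime index covers the 7 days ending at the
    # current row, regardless of how many rows fall inside that span
    rolled = out.set_index('Date')['Sales'].rolling('7D').mean()
    # out and rolled share the same row order, so assign by position
    out['MovingAverage'] = rolled.to_numpy()
    return out

# 2023-01-03 through 2023-01-09 are missing on purpose
demo = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-02', '2023-01-10'],
    'Sales': [100, 200, 300],
})
by_time = add_moving_average_by_time(demo)
print(by_time['MovingAverage'].tolist())  # [100.0, 150.0, 300.0]
```

With the row-based window, the last value would instead be 200.0, because all three rows would be averaged together.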


## Q11. You have a Pandas DataFrame df with a column 'Date'. Write a Python function that creates a new column 'Weekday' in the DataFrame. The 'Weekday' column should contain the weekday name (e.g. Monday, Tuesday) corresponding to each date in the 'Date' column. For example, if df contains the following values: Date 0 2023-01-01 1 2023-01-02 2 2023-01-03 3 2023-01-04 4 2023-01-05 Your function should create the following DataFrame: Date Weekday 0 2023-01-01 Sunday 1 2023-01-02 Monday 2 2023-01-03 Tuesday 3 2023-01-04 Wednesday 4 2023-01-05 Thursday The function should return the modified DataFrame.

In [17]:
import pandas as pd

def add_weekday_column(df):
    # Ensure 'Date' column is in datetime format
    df['Date'] = pd.to_datetime(df['Date'])
    
    # Create 'Weekday' column by extracting weekday names
    df['Weekday'] = df['Date'].dt.day_name()
    
    return df

# Example usage
data = {'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05']}
df = pd.DataFrame(data)
df = add_weekday_column(df)
print(df)


        Date    Weekday
0 2023-01-01     Sunday
1 2023-01-02     Monday
2 2023-01-03    Tuesday
3 2023-01-04  Wednesday
4 2023-01-05   Thursday


## Q12. Given a Pandas DataFrame df with a column 'Date' that contains timestamps, write a Python function to select all rows where the date is between '2023-01-01' and '2023-01-31'.

In [18]:
import pandas as pd

def filter_dates(df):
    # Ensure 'Date' column is in datetime format
    df['Date'] = pd.to_datetime(df['Date'])
    
    # Define the date range
    start_date = '2023-01-01'
    end_date = '2023-01-31'
    
    # Filter rows where 'Date' is between start_date and end_date
    filtered_df = df[(df['Date'] >= start_date) & (df['Date'] <= end_date)]
    
    return filtered_df

# Example usage
data = {'Date': ['2023-01-01', '2023-01-10', '2023-01-20', '2023-02-01']}
df = pd.DataFrame(data)
filtered_df = filter_dates(df)
print(filtered_df)


        Date
0 2023-01-01
1 2023-01-10
2 2023-01-20
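
Because the string '2023-01-31' is compared as midnight at the start of that day, the filter above drops timestamps that fall later on January 31. If the whole day should be included (an assumption about the intended behavior), a half-open interval ending at February 1 avoids the problem. The function name below is hypothetical:

```python
import pandas as pd

def filter_january_2023(df):
    dates = pd.to_datetime(df['Date'])
    start = pd.Timestamp('2023-01-01')
    end_exclusive = pd.Timestamp('2023-02-01')
    # start <= date < end_exclusive keeps every timestamp on January 31
    return df[(dates >= start) & (dates < end_exclusive)]

demo = pd.DataFrame({'Date': ['2023-01-01 00:00', '2023-01-31 18:30', '2023-02-01 00:00']})
january = filter_january_2023(demo)
print(january.index.tolist())  # [0, 1]
```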


## Q13. To use the basic functions of pandas, what is the first and foremost necessary library that needs to be imported?

The pandas library itself must be imported first, conventionally under the alias pd.

In [19]:
import pandas as pd