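Q1. To print the data present in the second row of the DataFrame df, you can use either the .iloc[] or the .loc[] indexer. Both return the same row here because the default index labels (0, 1, 2) coincide with the integer positions: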

In [3]:
import pandas as pd

# Data for the DataFrame
data = {'Name': ['Alice', 'Bob', 'Claire'],
        'Age': [25, 30, 27],
        'Gender': ['Female', 'Male', 'Female']}

# Create the DataFrame
df = pd.DataFrame(data)

# Using iloc to access the second row
print(df.iloc[1])

# Using loc to access the second row
print(df.loc[1])


Name       Bob
Age         30
Gender    Male
Name: 1, dtype: object
Name       Bob
Age         30
Gender    Male
Name: 1, dtype: object


Q2. The difference between loc and iloc in pandas.DataFrame lies in how they select rows and columns:

loc: Label-based. It selects rows and columns by their index and column labels, and label slices include the end label.
iloc: Integer position-based. It selects rows and columns by their 0-based positions, and position slices exclude the end position.
For example (with the default index, the label 1 and the position 1 refer to the same row, so both calls print the same result):

In [4]:
# Using loc
print(df.loc[1])  # Access the row with label/index name 1

# Using iloc
print(df.iloc[1])  # Access the row with integer position 1


Name       Bob
Age         30
Gender    Male
Name: 1, dtype: object
Name       Bob
Age         30
Gender    Male
Name: 1, dtype: object
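
The example above cannot show the difference because labels and positions coincide. The following minimal sketch uses a hypothetical DataFrame named people, with string row labels that are not part of the original exercise, to make the two indexers behave differently:

```python
import pandas as pd

# Hypothetical frame whose row labels are strings instead of 0..n-1
people = pd.DataFrame(
    {"Age": [25, 30, 27]},
    index=["alice", "bob", "claire"],
)

by_label = people.loc["bob", "Age"]   # label-based lookup
by_position = people.iloc[1, 0]       # position-based lookup (row 1, column 0)

# Label slices include the end label; position slices exclude the end position
label_slice = people.loc["alice":"bob"]
position_slice = people.iloc[0:1]

print(by_label, by_position)
print(len(label_slice), len(position_slice))
```

Both lookups reach Bob's age, but the slices return a different number of rows: two for loc and one for iloc.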


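Q3. To reindex the given DataFrame df using the list [3, 0, 1, 2] and store it in the variable new_df, you can use the reindex() function as shown in the next cell. Because the label 3 does not exist in df, that row is filled with NaN. new_df.loc[2] then selects the row labeled 2 (Claire), while new_df.iloc[2] selects the third row by position, which carries the label 1 (Bob):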

In [5]:
reindex = [3, 0, 1, 2]
new_df = df.reindex(reindex)


In [6]:
print(new_df.loc[2])
print(new_df.iloc[2])


Name      Claire
Age         27.0
Gender    Female
Name: 2, dtype: object
Name       Bob
Age       30.0
Gender    Male
Name: 1, dtype: object
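
To see why the two outputs differ, it can help to inspect the reindexed frame directly. This sketch rebuilds the same df as in Q1 so that it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Claire'],
                   'Age': [25, 30, 27],
                   'Gender': ['Female', 'Male', 'Female']})
new_df = df.reindex([3, 0, 1, 2])

print(new_df)
print(new_df.index.tolist())        # row labels in their new order
print(new_df.loc[3].isna().all())   # label 3 did not exist, so its row is all NaN
print(new_df['Age'].dtype)          # NaN forces the integer column to float
```

The NaN row is also the reason the ages print as 27.0 and 30.0 rather than 27 and 30.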


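Q4. To find the mean of every column in the DataFrame df1, call the mean() function on the DataFrame. The standard deviation of 'column_2' is computed in the following cell with std(). The data is random, so the printed values change on every run: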

In [8]:
import pandas as pd
import numpy as np

# Data for the DataFrame
columns = ['column_1', 'column_2', 'column_3', 'column_4', 'column_5', 'column_6']
indices = [1, 2, 3, 4, 5, 6]
data = np.random.rand(6, 6)

# Create the DataFrame
df1 = pd.DataFrame(data, columns=columns, index=indices)

# Mean of each column
mean_of_columns = df1.mean()
print(mean_of_columns)



column_1    0.367104
column_2    0.334614
column_3    0.341966
column_4    0.485167
column_5    0.540367
column_6    0.514063
dtype: float64


In [9]:
# Standard deviation of column 'column_2'
std_column_2 = df1['column_2'].std()
print(std_column_2)


0.32537879060614244
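
As a reproducible variation, the sketch below uses fixed values instead of random ones (the frame demo is an assumption made for illustration) and computes both statistics in one call with agg(). It also shows that std() returns the sample standard deviation (ddof=1) by default:

```python
import numpy as np
import pandas as pd

columns = [f'column_{i}' for i in range(1, 7)]
values = np.arange(36, dtype=float).reshape(6, 6)   # fixed, non-random data
demo = pd.DataFrame(values, columns=columns, index=range(1, 7))

# Mean and standard deviation of every column in one table
stats = demo.agg(['mean', 'std'])
print(stats)

sample_std = demo['column_2'].std()            # ddof=1 (default)
population_std = demo['column_2'].std(ddof=0)  # divide by n instead of n - 1
print(sample_std, population_std)
```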


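Q5. If you replace the data present in the second row of column 'column_2' with a string variable, the column no longer holds purely numeric data: it becomes an object column (recent pandas versions warn about this implicit upcast and may reject the assignment). Calling mean() on such a column raises a TypeError, because pandas cannot add a string to floats.

For example, the code below replaces the second row of column 'column_2' with a string, then converts the column back to numeric with errors='coerce', which turns the string into NaN so that mean() can skip it: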

In [11]:
import pandas as pd
import numpy as np

# Data for the DataFrame
columns = ['column_1', 'column_2', 'column_3', 'column_4', 'column_5', 'column_6']
indices = [1, 2, 3, 4, 5, 6]
data = np.random.rand(6, 6)

# Create the DataFrame
df1 = pd.DataFrame(data, columns=columns, index=indices)

# Replace the data in the second row of 'column_2' with a string
df1.loc[2, 'column_2'] = 'abc'

# Convert 'column_2' back to a numeric data type; 'abc' cannot be parsed and becomes NaN
df1['column_2'] = pd.to_numeric(df1['column_2'], errors='coerce')

# Calculate the mean of 'column_2' (NaN values are skipped by default)
mean_column_2 = df1['column_2'].mean()
print(mean_column_2)



0.6061926261105203
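
The failure itself is hidden in the cell above because the column is coerced before mean() is called. The following sketch isolates it on a small hypothetical Series named mixed, built directly with object dtype so that it does not depend on how a given pandas version handles the assignment:

```python
import pandas as pd

# A column that mixes floats with a string, stored as object dtype
mixed = pd.Series([0.2, 'abc', 0.4], dtype=object)

try:
    mixed.mean()
    mean_failed = False
except (TypeError, ValueError) as exc:
    mean_failed = True
    print(f"mean() failed: {type(exc).__name__}")

# Coerce to numeric: the unparseable string becomes NaN
cleaned = pd.to_numeric(mixed, errors='coerce')
clean_mean = cleaned.mean()   # NaN is skipped, so this averages 0.2 and 0.4
print(clean_mean)
```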


Q6. In pandas, window functions are used for performing calculations over a sliding window of data points in a time series or a DataFrame. Window functions help to analyze and process data in rolling or expanding windows.

Types of window functions in pandas:

Rolling: Provides the ability to perform calculations over a fixed-size window of data points.
Expanding: Calculates statistics over an expanding window, where the size of the window increases with each data point.
Exponentially weighted (ewm): Computes statistics such as the exponential moving average (EMA), with weights that decay exponentially so that recent data points count more.
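
The three window types can be compared on a small Series. This is a minimal sketch; the window size of 3 and the span of 3 are arbitrary choices made for illustration:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Fixed-size window: mean of the current value and the two before it
rolling_mean = s.rolling(window=3).mean()

# Growing window: mean of all values seen so far
expanding_mean = s.expanding().mean()

# Exponentially weighted mean; span=3 corresponds to alpha = 2 / (span + 1) = 0.5
ewm_mean = s.ewm(span=3, adjust=False).mean()

print(pd.DataFrame({'rolling': rolling_mean,
                    'expanding': expanding_mean,
                    'ewm': ewm_mean}))
```

The first two rolling values are NaN because a full window of three points is not yet available.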

Q7. To print only the current month and year at the time of answering this question, you can call pd.Timestamp.now() and read its month and year attributes:

In [13]:
import pandas as pd

# Get the current date and time
current_datetime = pd.Timestamp.now()

# Print the current month and year
print(f"Current Month: {current_datetime.month}")
print(f"Current Year: {current_datetime.year}")


Current Month: 7
Current Year: 2023
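
If the month and year are wanted as a single value, two options are strftime() and to_period(). The sketch below uses a fixed, hypothetical timestamp instead of the current time so that the result is reproducible:

```python
import pandas as pd

# Fixed timestamp chosen for illustration; pd.Timestamp.now() works the same way
stamp = pd.Timestamp('2023-07-15 10:30')

label = stamp.strftime('%B %Y')   # month name and year as text (locale-dependent)
period = stamp.to_period('M')     # monthly Period object

print(label)
print(period)
```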


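Q8. Here's a Python program that takes two dates as input and calculates the difference between them in days, hours, and minutes using a pandas Timedelta. The Timedelta's seconds attribute holds only the part of the difference that is shorter than one day, so the hours and minutes are derived from it: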

In [14]:
import pandas as pd

# Function to calculate the difference between two dates
def date_difference(start_date, end_date):
    start_datetime = pd.to_datetime(start_date)
    end_datetime = pd.to_datetime(end_date)
    
    time_difference = end_datetime - start_datetime
    days = time_difference.days
    hours = time_difference.seconds // 3600
    minutes = (time_difference.seconds // 60) % 60
    
    return days, hours, minutes

# Prompt user to enter dates
start_date = input("Enter the start date (YYYY-MM-DD): ")
end_date = input("Enter the end date (YYYY-MM-DD): ")

# Calculate the difference
days, hours, minutes = date_difference(start_date, end_date)

# Display the result
print(f"Difference: {days} days, {hours} hours, {minutes} minutes")


Enter the start date (YYYY-MM-DD): 2021-09-12
Enter the end date (YYYY-MM-DD): 2021-10-12
Difference: 30 days, 0 hours, 0 minutes
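
An alternative to the manual arithmetic is the components attribute of a Timedelta, which already splits the difference into days, hours, and minutes. The timestamps below are hypothetical and include a time of day so that the hours and minutes are non-zero:

```python
import pandas as pd

start = pd.Timestamp('2021-09-12 08:30')
end = pd.Timestamp('2021-10-12 15:45')

# components is a named tuple with days, hours, minutes, seconds, ...
parts = (end - start).components
print(f"Difference: {parts.days} days, {parts.hours} hours, {parts.minutes} minutes")
```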


Q9. To write a Python program that reads a CSV file containing categorical data and converts a specified column to a categorical data type, you can use the pandas read_csv() function and then convert the column with the astype() function. Here's an example:

In [None]:
import pandas as pd

# Function to convert a specified column to categorical data type
def convert_to_categorical(file_path, column_name, category_order):
    # Read the CSV file into a DataFrame
    df = pd.read_csv(file_path)

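    # Convert the specified column to an ordered categorical data type.
    # astype() no longer accepts categories/ordered keywords, so build a CategoricalDtype.
    category_dtype = pd.CategoricalDtype(categories=category_order, ordered=True)
    df[column_name] = df[column_name].astype(category_dtype)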

    # Sort the DataFrame based on the specified column
    df.sort_values(by=column_name, inplace=True)

    return df

# Prompt user to enter file path, column name, and category order
file_path = input("Enter the file path: ")
column_name = input("Enter the column name: ")
category_order = [c.strip() for c in input("Enter the category order (comma-separated): ").split(',')]

# Call the function and display the sorted data
sorted_df = convert_to_categorical(file_path, column_name, category_order)
print(sorted_df)
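
The effect of the category order on sorting can be checked without a file on disk. This sketch reads a small in-memory CSV; the column names Item and Size and the three sizes are assumptions made for illustration:

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a real file
csv_text = "Item,Size\nshirt,large\nhat,small\ncoat,medium\n"
df_sizes = pd.read_csv(io.StringIO(csv_text))

# Ordered categorical: small < medium < large
size_dtype = pd.CategoricalDtype(categories=['small', 'medium', 'large'], ordered=True)
df_sizes['Size'] = df_sizes['Size'].astype(size_dtype)

# Sorting follows the category order, not alphabetical order
sorted_sizes = df_sizes.sort_values('Size')
print(sorted_sizes)
```

Values that are not listed in the categories become NaN after the conversion, so the category list should cover every value in the column.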


Q10. To write a Python program that reads a CSV file containing sales data for different products and visualizes the sales of each product category over time as a stacked bar chart, you can read the file with the pandas read_csv() function and call the plot() function with kind='bar' and stacked=True. The example assumes the CSV has a 'Date' column plus one numeric column per product category:



In [None]:
import pandas as pd
import matplotlib.pyplot as plt

# Function to visualize sales data using a stacked bar chart
def visualize_sales_data(file_path):
    # Read the CSV file into a DataFrame
    df = pd.read_csv(file_path)

    # Set the 'Date' column as the index
    df['Date'] = pd.to_datetime(df['Date'])
    df.set_index('Date', inplace=True)

    # Plot the stacked bar chart
    df.plot(kind='bar', stacked=True, figsize=(10, 6))
    plt.xlabel('Date')
    plt.ylabel('Sales')
    plt.title('Sales Data - Stacked Bar Chart')
    plt.show()

# Prompt user to enter file path
file_path = input("Enter the file path: ")

# Call the function to visualize the data
visualize_sales_data(file_path)
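
If the CSV instead stores one row per date and category (long format), it needs to be reshaped before plotting. The sketch below assumes hypothetical column names Date, Category, and Sales and uses pivot_table() to produce one column per category:

```python
import io
import pandas as pd

# Hypothetical long-format sales data
csv_text = """Date,Category,Sales
2023-01-01,Books,100
2023-01-01,Toys,50
2023-02-01,Books,120
2023-02-01,Toys,80
"""
long_df = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'])

# One row per date, one column per category, summing duplicate entries
wide_df = long_df.pivot_table(index='Date', columns='Category',
                              values='Sales', aggfunc='sum')
print(wide_df)
```

The resulting wide_df has the layout that the stacked bar chart in the function above expects, so it could be passed to plot(kind='bar', stacked=True).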


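Q11. To write a Python program that reads a CSV file containing student data and calculates the mean, median, and mode of the test scores, you can use the pandas read_csv() function to read the CSV file, and then use the mean(), median(), and mode() functions to calculate the respective statistics. Because mode() returns a Series (there can be more than one mode), its values are joined into a single string for display. The example assumes the scores are stored in a column named 'Test Score':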

In [None]:
import pandas as pd

# Function to calculate mean, median, and mode of test scores
def calculate_statistics(file_path):
    # Read the CSV file into a DataFrame
    df = pd.read_csv(file_path)

    # Calculate the mean, median, and mode of the test scores
    mean_score = df['Test Score'].mean()
    median_score = df['Test Score'].median()
    mode_score = df['Test Score'].mode()

    # Display the results in a table
    statistics_table = pd.DataFrame({'Statistic': ['Mean', 'Median', 'Mode'],
                                     'Value': [mean_score, median_score, ', '.join(map(str, mode_score))]})

    print(statistics_table)

# Prompt user to enter file path
file_path = input("Enter the file path: ")

# Call the function to calculate and display statistics
calculate_statistics(file_path)
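
The multi-mode case can be checked on a small in-memory CSV. The student names and scores below are made up for illustration:

```python
import io
import pandas as pd

# Hypothetical student data with two equally frequent scores
csv_text = "Student,Test Score\nA,85\nB,90\nC,85\nD,90\nE,70\n"
scores = pd.read_csv(io.StringIO(csv_text))['Test Score']

mean_score = scores.mean()
median_score = scores.median()
modes = scores.mode().tolist()   # every value tied for the highest frequency

print(mean_score, median_score, modes)
```

Here 85 and 90 both occur twice, so mode() returns both values in ascending order.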
