### Q1: Print the data present in the second row of the dataframe, df

In [None]:
import pandas as pd
course_name = ['Data Science', 'Machine Learning', 'Big Data', 'Data Engineer']
duration = [2,3,6,4]
df = pd.DataFrame(data = {'course_name' : course_name, 'duration' : duration})
# Printing the data in the second row
df.iloc[1]  # Second row, index 1


### Q2: Difference between loc and iloc in pandas.DataFrame

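The `.loc[]` indexer is label-based: it selects rows and columns by their index and column labels. The `.iloc[]` indexer is integer-position-based: it selects rows and columns by zero-based position, regardless of their labels.  
For example:
- `df.loc[1]` returns the row whose index label is `1`, while `df.iloc[1]` returns the row at position `1` (the second row). With the default integer index the two coincide; they differ once the index is reordered or uses other labels.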
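
The distinction is easiest to see on an index that is not in its default order. The following is a minimal sketch; the `demo` frame and its values are made up for illustration and are not part of the exercise data.

```python
import pandas as pd

# Hypothetical frame whose integer labels are deliberately out of order
demo = pd.DataFrame({'value': ['a', 'b', 'c']}, index=[2, 0, 1])

by_label = demo.loc[1, 'value']   # row whose label is 1, i.e. the third row
by_position = demo.iloc[1, 0]     # row at position 1, i.e. the second row

print(by_label, by_position)
```

Here the label `1` sits in the last row, so `.loc` and `.iloc` return different values for the same number.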

### Q3: Reindexing the dataframe and finding new_df.loc[2] and new_df.iloc[2]

In [None]:
reindex = [3, 0, 1, 2]
new_df = df.reindex(reindex)
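# loc[2] looks up the row labelled 2 ('Big Data', 6);
# iloc[2] takes the third row in the new order, which carries label 1 ('Machine Learning', 3)
new_df.loc[2], new_df.iloc[2]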


### Q4: Statistical measurements

In [None]:
import numpy as np

# Creating a dataframe for statistical measurements
columns = ['column_1', 'column_2', 'column_3', 'column_4', 'column_5', 'column_6']
indices = [1,2,3,4,5,6]
df1 = pd.DataFrame(np.random.rand(6,6), columns = columns, index = indices)

# (i) Mean of each column
print(df1.mean())  # printed explicitly: a cell only displays its last expression

# (ii) Standard deviation of 'column_2'
df1['column_2'].std()


### Q5: Replace data in second row of 'column_2' and find mean

In [None]:
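# Replace the second row of column_2 with a string.
# Casting to object first avoids the incompatible-dtype warning that recent pandas versions emit
df1['column_2'] = df1['column_2'].astype(object)
df1.loc[2, 'column_2'] = 'string_value'

# Calculating the mean of column_2 is now expected to raise a TypeError
try:
    df1['column_2'].mean()
except TypeError as e:
    print(e)  # a bare str(e) inside the block would display nothing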


Explanation: Replacing a numeric value with a string leaves `column_2` holding mixed types (floats and a string), so `mean` fails with a `TypeError`: it cannot add a string to a number. To calculate the mean, every entry in the column must be numeric.
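
One possible way to recover a mean from such a column is to coerce the non-numeric entries to `NaN` first. This is a sketch under assumptions: the `mixed` series below is a made-up stand-in for `column_2`, and dropping the string from the average is a choice, not something the exercise asks for.

```python
import pandas as pd

# Hypothetical object-dtype column mixing numbers and one string
mixed = pd.Series([0.2, 'string_value', 0.4, 0.6], dtype=object)

# errors='coerce' turns anything that cannot be parsed as a number into NaN
numeric = pd.to_numeric(mixed, errors='coerce')

# mean() skips NaN by default, so only the three numbers are averaged
clean_mean = numeric.mean()
print(clean_mean)
```

Whether silently discarding the bad entry is acceptable depends on the data; inspecting `numeric.isna()` shows which rows were coerced.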

### Q6: Window functions in pandas

Window functions in pandas apply a calculation, such as a moving average, to a subset (window) of the observations around each position instead of to the whole column at once. Types include:
1. **Rolling**: Applies functions over a fixed-size moving window (e.g., rolling mean).
2. **Expanding**: Applies functions over a window that grows with each observation (e.g., cumulative sum).
3. **EWM (Exponentially Weighted Moving)**: Applies functions with weights that decay exponentially, so recent observations count the most (e.g., exponentially weighted mean).
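
The three window types listed above can be sketched on a small series. The series `s`, the window size of 3 and the `span=3` setting are illustrative assumptions, not values taken from the exercise.

```python
import pandas as pd

# Hypothetical series used only to illustrate the three window types
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Rolling: mean of the current value and the two before it
# (the first two results are NaN because the window is not yet full)
rolling_mean = s.rolling(window=3).mean()

# Expanding: the window grows from the first value, giving a cumulative sum
expanding_sum = s.expanding().sum()

# EWM: recursive weighted mean; span=3 corresponds to a smoothing factor of 0.5
ewm_mean = s.ewm(span=3, adjust=False).mean()

print(pd.DataFrame({'rolling_mean': rolling_mean,
                    'expanding_sum': expanding_sum,
                    'ewm_mean': ewm_mean}))
```

Printing the three results side by side makes it clear how each window treats the earlier observations.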


### Q7: Print current month and year

In [None]:
# Getting the current month and year
current_date = pd.Timestamp.now()
current_date.strftime('%B %Y')


### Q8: Calculate difference between two dates in days, hours, and minutes

In [None]:
# Function to calculate date2 - date1 as (days, hours, minutes).
# Leftover seconds are dropped; date2 is assumed to be on or after date1.
def date_difference(date1, date2):
    date1 = pd.to_datetime(date1)
    date2 = pd.to_datetime(date2)
    delta = date2 - date1
    days = delta.days
    hours, remainder = divmod(delta.seconds, 3600)
    minutes, _ = divmod(remainder, 60)
    return days, hours, minutes

# Example
date1 = '2024-01-01'
date2 = '2024-01-03'
date_difference(date1, date2)


### Q9: Convert column in CSV to categorical data

In [None]:
# Sample code to read CSV and convert a column to categorical
file_path = input("Enter CSV file path: ")
column_name = input("Enter column name to convert to categorical: ")
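# Strip whitespace so that input like "low, medium, high" matches the values in the column
categories = [c.strip() for c in input("Enter category order separated by commas: ").split(',')]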

# Read CSV and convert specified column to categorical
df_cat = pd.read_csv(file_path)
df_cat[column_name] = pd.Categorical(df_cat[column_name], categories=categories, ordered=True)
df_cat.sort_values(by=column_name, inplace=True)
df_cat.head()


### Q10: Visualize sales data using stacked bar chart

In [None]:
import matplotlib.pyplot as plt

# Read CSV file for sales data and create a stacked bar chart
file_path = input("Enter CSV file path: ")
df_sales = pd.read_csv(file_path)

# Assuming the CSV has columns 'Date', 'Product', 'Sales'
df_sales['Date'] = pd.to_datetime(df_sales['Date'])
df_pivot = df_sales.pivot_table(index='Date', columns='Product', values='Sales', aggfunc='sum')
df_pivot.plot(kind='bar', stacked=True, figsize=(10, 6))
plt.xlabel('Date')
plt.ylabel('Sales')
plt.title('Sales by Product Category Over Time')
plt.show()


### Q11: Calculate and display mean, median, and mode for test scores

In [None]:
# Sample code to calculate statistics from CSV
from tabulate import tabulate  # third-party package, must be installed separately

file_path = input("Enter the CSV file path: ")
df_student = pd.read_csv(file_path)

mean_score = df_student['Test Score'].mean()
median_score = df_student['Test Score'].median()
mode_score = df_student['Test Score'].mode().tolist()  # there may be more than one mode

# Display results in a table
table = [["Mean", mean_score], ["Median", median_score], ["Mode", ', '.join(map(str, mode_score))]]
print(tabulate(table, headers=["Statistic", "Value"], tablefmt="grid"))
