## Q1. List any five functions of the pandas library with execution.

Ans= Here are five commonly used functions of the Pandas library, each with a short example:

1) read_csv(): This function is used to read data from a CSV file and create a DataFrame.

import pandas as pd

df = pd.read_csv('data.csv')

print(df)

2) head(): This function is used to display the first n rows of a DataFrame (5 by default).

import pandas as pd

data = {'Name': ['John', 'Alice', 'Bob', 'Emily', 'Jack'],
        'Age': [25, 28, 32, 19, 45],
        'City': ['New York', 'Paris', 'London', 'Tokyo', 'Sydney']}
        
df = pd.DataFrame(data)

print(df.head(3))

3) info(): This function provides a concise summary of a DataFrame, including the data types and memory usage.

import pandas as pd

data = {'Name': ['John', 'Alice', 'Bob'],
        'Age': [25, 28, 32],
        'City': ['New York', 'Paris', 'London']}
        
df = pd.DataFrame(data)

df.info()  # info() prints the summary itself and returns None, so no print() is needed

4) describe(): This function generates descriptive statistics of a DataFrame, including count, mean, standard deviation, minimum, maximum, and quartile values. By default only the numeric columns are summarized.

import pandas as pd

data = {'Name': ['John', 'Alice', 'Bob', 'Emily', 'Jack'],
        'Age': [25, 28, 32, 19, 45],
        'City': ['New York', 'Paris', 'London', 'Tokyo', 'Sydney']}
        
df = pd.DataFrame(data)

print(df.describe())

5) groupby(): This function is used for grouping rows of a DataFrame based on a specific column or multiple columns.

import pandas as pd

data = {'Name': ['John', 'Alice', 'Bob', 'Emily', 'Jack'],
        'Age': [25, 28, 32, 19, 45],
        'City': ['New York', 'Paris', 'London', 'Tokyo', 'Sydney'],
        'Salary': [5000, 6000, 4500, 5500, 7000]}
        
df = pd.DataFrame(data)

grouped_df = df.groupby('City')['Salary'].mean()

print(grouped_df)
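
In the groupby() example above every city appears only once, so each group mean is just that row's salary. The following is a minimal sketch with repeated keys, showing grouping by one column and by multiple columns. The 'Dept' column and all values are made up for illustration.

```python
import pandas as pd

# Hypothetical data: cities and departments repeat, so groups hold several rows
df = pd.DataFrame({
    'City': ['Paris', 'Paris', 'London', 'London', 'London'],
    'Dept': ['HR', 'IT', 'HR', 'IT', 'IT'],
    'Salary': [5000, 6000, 4500, 5500, 7000],
})

# Group by a single column: one mean per city
mean_by_city = df.groupby('City')['Salary'].mean()

# Group by multiple columns: one mean per (City, Dept) pair
mean_by_city_dept = df.groupby(['City', 'Dept'])['Salary'].mean()

print(mean_by_city)
print(mean_by_city_dept)
```

Grouping by a list of columns returns a result with a MultiIndex, so a single group is looked up with a tuple such as ('London', 'IT').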


## Q2. Given a Pandas DataFrame df with columns 'A', 'B', and 'C', write a Python function to re-index the DataFrame with a new index that starts from 1 and increments by 2 for each row.

Ans=

In [3]:
import pandas as pd

def reindex_dataframe(df):
    new_index = pd.Index(range(1, len(df) * 2, 2))
    df_reindexed = df.set_index(new_index)
    return df_reindexed

data = {'A': [10, 20, 30],
        'B': [40, 50, 60],
        'C': [70, 80, 90]}
df = pd.DataFrame(data)

df_reindexed = reindex_dataframe(df)

print(df_reindexed)



    A   B   C
1  10  40  70
3  20  50  80
5  30  60  90
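
As an alternative sketch, the same labels can be assigned with pd.RangeIndex on a copy of the DataFrame. The function name reindex_with_range is hypothetical. Note that DataFrame.reindex() would not do the job here, because it aligns rows to existing labels and fills labels that do not exist yet with NaN.

```python
import pandas as pd

def reindex_with_range(df):
    # Work on a copy so the caller's DataFrame keeps its original index
    out = df.copy()
    # Labels 1, 3, 5, ... with exactly one label per row
    out.index = pd.RangeIndex(start=1, stop=2 * len(df) + 1, step=2)
    return out

df = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60], 'C': [70, 80, 90]})
result = reindex_with_range(df)
print(result)
```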


## Q3. You have a Pandas DataFrame df with a column named 'Values'. Write a Python function that iterates over the DataFrame and calculates the sum of the first three values in the 'Values' column. The function should print the sum to the console.

Ans=

In [4]:
import pandas as pd

def calculate_sum(df):
    values_column = df['Values']  
    sum_first_three = sum(values_column[:3])  
    print("Sum of the first three values:", sum_first_three)

data = {'Values': [10, 20, 30, 40, 50, 60]}
df = pd.DataFrame(data)

calculate_sum(df)


Sum of the first three values: 60
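
The answer above relies on slicing and the built-in sum(). Since the question asks for iteration, here is a sketch that walks the rows explicitly with itertuples(). The function name, the n parameter, and the returned value are assumptions added for illustration. The question itself only asks for the printed sum.

```python
import pandas as pd

def calculate_sum_iterative(df, n=3):
    # Walk the rows one at a time and stop once n values have been added
    total = 0
    for position, row in enumerate(df.itertuples(index=False)):
        if position >= n:
            break
        total += row.Values
    print(f"Sum of the first {n} values:", total)
    return total

df = pd.DataFrame({'Values': [10, 20, 30, 40, 50, 60]})
result = calculate_sum_iterative(df)
```

If the DataFrame has fewer than n rows, the loop simply sums whatever is available.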


## Q4. Given a Pandas DataFrame df with a column 'Text', write a Python function to create a new column 'Word_Count' that contains the number of words in each row of the 'Text' column.

Ans=

In [5]:
import pandas as pd

def add_word_count(df):
    df['Word_Count'] = df['Text'].apply(lambda x: len(str(x).split()))
    return df

data = {'Text': ['This is a sample text', 'Hello world', 'Python programming is fun']}
df = pd.DataFrame(data)

df_with_word_count = add_word_count(df)

print(df_with_word_count)


                        Text  Word_Count
0      This is a sample text           5
1                Hello world           2
2  Python programming is fun           4
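
One caveat with the apply() version is that str(x) turns a missing value into a string such as 'None' or 'nan', which is then counted as one word. A possible vectorized sketch using the .str accessor keeps missing values as missing. The function name add_word_count_vectorized and the sample data are hypothetical.

```python
import pandas as pd

def add_word_count_vectorized(df):
    out = df.copy()
    # str.split() with no arguments splits on any run of whitespace;
    # str.len() then counts the pieces. Missing text stays missing.
    out['Word_Count'] = out['Text'].str.split().str.len()
    return out

df = pd.DataFrame({'Text': ['This is a sample text', 'Hello world', None]})
result = add_word_count_vectorized(df)
print(result)
```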


## Q5. How are DataFrame.size() and DataFrame.shape() different?

Ans=

DataFrame.size and DataFrame.shape are attributes, not methods, so they are accessed without parentheses. They provide different information about the dimensions of a DataFrame:

DataFrame.size: This attribute returns the total number of elements in the DataFrame, which is calculated by multiplying the number of rows (DataFrame.shape[0]) by the number of columns (DataFrame.shape[1]). In other words, it represents the total count of cells in the DataFrame.

DataFrame.shape: This attribute returns a tuple representing the dimensions of the DataFrame. The tuple consists of two elements: the number of rows and the number of columns, respectively. So, DataFrame.shape[0] represents the number of rows, and DataFrame.shape[1] represents the number of columns.


In [6]:
import pandas as pd

data = {'A': [1, 2, 3],
        'B': [4, 5, 6],
        'C': [7, 8, 9]}
df = pd.DataFrame(data)

# Calculating DataFrame size
df_size = df.size

# Retrieving DataFrame shape
df_shape = df.shape

print("DataFrame Size:", df_size)
print("DataFrame Shape:", df_shape)

DataFrame Size: 9
DataFrame Shape: (3, 3)
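
The relationship between the two, and the fact that neither can be called like a method, can be checked with a short sketch. The sample data is made up.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

rows, cols = df.shape          # shape is a (rows, columns) tuple
assert df.size == rows * cols  # size is the total number of cells

# size and shape are attributes; calling them like methods raises TypeError
called_ok = True
try:
    df.size()
except TypeError as exc:
    called_ok = False
    print("TypeError:", exc)
```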


## Q6. Which function of pandas do we use to read an excel file?

Ans= In Pandas, the function used to read an Excel file is pd.read_excel(). It reads a sheet of the workbook (the first sheet by default) and creates a DataFrame. Reading .xlsx files requires an Excel engine such as openpyxl to be installed.

import pandas as pd

df = pd.read_excel('data.xlsx')

print(df)


## Q7. You have a Pandas DataFrame df that contains a column named 'Email' that contains email addresses in the format 'username@domain.com'. Write a Python function that creates a new column 'Username' in df that contains only the username part of each email address. The username is the part of the email address that appears before the '@' symbol. For example, if the email address is 'john.doe@example.com', the 'Username' column should contain 'john.doe'. Your function should extract the username from each email address and store it in the new 'Username' column.

Ans=

In [9]:
import pandas as pd

def extract_username(df):
    df['Username'] = df['Email'].str.split('@').str[0]
    return df

data = {'Email': ['john@example.com', 'alice@example.com', 'bob@example.com']}
df = pd.DataFrame(data)

df_with_username = extract_username(df)

print(df_with_username)


               Email Username
0   john@example.com     john
1  alice@example.com    alice
2    bob@example.com      bob


## Q8. You have a Pandas DataFrame df with columns 'A', 'B', and 'C'. Write a Python function that selects all rows where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10. The function should return a new DataFrame that contains only the selected rows.

Ans=

In [10]:
import pandas as pd

def select_rows(df):
    selected_rows = df[(df['A'] > 5) & (df['B'] < 10)]
    return selected_rows

data = {'A': [3,8,6,2,9],
        'B': [5,2,9,3,1],
        'C': [1,7,4,5,2]}
df = pd.DataFrame(data)

selected_df = select_rows(df)

print(selected_df)


   A  B  C
1  8  2  7
2  6  9  4
4  9  1  2
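
The same selection can also be written with DataFrame.query(), which some readers find easier to scan than a boolean mask. This is an equivalent sketch, and the function name select_rows_query is hypothetical.

```python
import pandas as pd

def select_rows_query(df):
    # query() evaluates the boolean expression against the column names
    return df.query('A > 5 and B < 10')

df = pd.DataFrame({'A': [3, 8, 6, 2, 9],
                   'B': [5, 2, 9, 3, 1],
                   'C': [1, 7, 4, 5, 2]})
selected = select_rows_query(df)
print(selected)
```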


## Q9. Given a Pandas DataFrame df with a column 'Values', write a Python function to calculate the mean, median, and standard deviation of the values in the 'Values' column.

Ans=

In [11]:
import pandas as pd

def calculate_stats(df):
    mean_value = df['Values'].mean()
    median_value = df['Values'].median()
    std_value = df['Values'].std()
    return mean_value, median_value, std_value

data = {'Values': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

mean, median, std = calculate_stats(df)

print("Mean:", mean)
print("Median:", median)
print("Standard Deviation:", std)


Mean: 30.0
Median: 30.0
Standard Deviation: 15.811388300841896
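
The value 15.81 is the sample standard deviation, because Series.std() uses ddof=1 by default (it divides by n - 1). The sketch below contrasts it with the population standard deviation and shows agg() as a way to compute several statistics in one call.

```python
import pandas as pd

values = pd.Series([10, 20, 30, 40, 50])

sample_std = values.std()            # ddof=1: divides by n - 1
population_std = values.std(ddof=0)  # ddof=0: divides by n

# agg() computes several statistics in one call
summary = values.agg(['mean', 'median', 'std'])

print("Sample std:", sample_std)
print("Population std:", population_std)
print(summary)
```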


## Q10. Given a Pandas DataFrame df with a column 'Sales' and a column 'Date', write a Python function to create a new column 'MovingAverage' that contains the moving average of the sales for the past 7 days for each row in the DataFrame. The moving average should be calculated using a window of size 7 and should include the current day.

Ans=

In [12]:
import pandas as pd

def calculate_moving_average(df):
    df['MovingAverage'] = df['Sales'].rolling(window=7, min_periods=1).mean()
    return df

data = {'Date': ['2022-01-01', '2022-01-02', '2022-01-03', '2022-01-04', '2022-01-05', '2022-01-06', '2022-01-07'],
        'Sales': [10, 15, 12, 8, 11, 14, 16]}
df = pd.DataFrame(data)

df['Date'] = pd.to_datetime(df['Date'])

df.sort_values('Date', inplace=True)

df_with_ma = calculate_moving_average(df)

print(df_with_ma)


        Date  Sales  MovingAverage
0 2022-01-01     10      10.000000
1 2022-01-02     15      12.500000
2 2022-01-03     12      12.333333
3 2022-01-04      8      11.250000
4 2022-01-05     11      11.200000
5 2022-01-06     14      11.666667
6 2022-01-07     16      12.285714
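
rolling(window=7) counts rows, not days, so the answer above matches "the past 7 days" only when there is exactly one row per day. If dates can be missing, a time-based window is one option. The sketch below assumes the 'Date' column is already datetime64, and the function name and sample data are hypothetical.

```python
import pandas as pd

def calculate_moving_average_by_date(df):
    out = df.sort_values('Date').copy()
    # With dates on the index, '7D' covers the current day and the six before it
    rolled = out.set_index('Date')['Sales'].rolling('7D').mean()
    out['MovingAverage'] = rolled.to_numpy()
    return out

# Hypothetical data with a gap: 2022-01-03 through 2022-01-08 are missing
df = pd.DataFrame({
    'Date': pd.to_datetime(['2022-01-01', '2022-01-02', '2022-01-09']),
    'Sales': [10, 20, 30],
})
result = calculate_moving_average_by_date(df)
print(result)
```

For the last row, the time-based window contains only 2022-01-09 itself, whereas a row-based window of 7 with min_periods=1 would average all three rows.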


## Q11. You have a Pandas DataFrame df with a column 'Date'. Write a Python function that creates a new column 'Weekday' in the DataFrame. The 'Weekday' column should contain the weekday name (e.g. Monday, Tuesday) corresponding to each date in the 'Date' column.

Ans=

In [14]:
import pandas as pd

def add_weekday_column(df):
    df['Weekday'] = df['Date'].dt.strftime('%A')
    return df

data = {'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05']}
df = pd.DataFrame(data)

df['Date'] = pd.to_datetime(df['Date'])

df_with_weekday = add_weekday_column(df)

print(df_with_weekday)


        Date    Weekday
0 2023-01-01     Sunday
1 2023-01-02     Monday
2 2023-01-03    Tuesday
3 2023-01-04  Wednesday
4 2023-01-05   Thursday
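
strftime('%A') formats the name according to the locale of the running process. As an alternative sketch, dt.day_name() returns English weekday names by default. The function name below is hypothetical.

```python
import pandas as pd

def add_weekday_column_day_name(df):
    out = df.copy()
    # to_datetime leaves an existing datetime64 column unchanged
    out['Weekday'] = pd.to_datetime(out['Date']).dt.day_name()
    return out

df = pd.DataFrame({'Date': ['2023-01-01', '2023-01-02', '2023-01-03']})
result = add_weekday_column_day_name(df)
print(result)
```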


## Q12. Given a Pandas DataFrame df with a column 'Date' that contains timestamps, write a Python function to select all rows where the date is between '2023-01-01' and '2023-01-31'.

Ans=

In [15]:
import pandas as pd

def select_rows_by_date_range(df, start_date, end_date):
    mask = (df['Date'] >= start_date) & (df['Date'] <= end_date)
    selected_rows = df.loc[mask]
    return selected_rows

data = {'Date': ['2022-12-31', '2023-01-01', '2023-01-15', '2023-01-31', '2023-02-10']}
df = pd.DataFrame(data)

df['Date'] = pd.to_datetime(df['Date'])

start_date = pd.to_datetime('2023-01-01')
end_date = pd.to_datetime('2023-01-31')
selected_df = select_rows_by_date_range(df, start_date, end_date)

print(selected_df)


        Date
1 2023-01-01
2 2023-01-15
3 2023-01-31
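
The question mentions timestamps. If the values carry a time of day, comparing with `<= '2023-01-31'` means "up to midnight at the start of 31 January", so a row such as 2023-01-31 18:30 would be dropped. A sketch with a half-open upper bound avoids that. The function name and sample data are hypothetical.

```python
import pandas as pd

def select_rows_in_january_2023(df):
    start = pd.Timestamp('2023-01-01')
    # Exclusive upper bound keeps every time of day on 2023-01-31
    end_exclusive = pd.Timestamp('2023-02-01')
    mask = (df['Date'] >= start) & (df['Date'] < end_exclusive)
    return df.loc[mask]

df = pd.DataFrame({'Date': pd.to_datetime([
    '2022-12-31 23:59', '2023-01-01 00:00',
    '2023-01-31 18:30', '2023-02-01 00:00',
])})
selected = select_rows_in_january_2023(df)
print(selected)
```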


## Q13. To use the basic functions of pandas, what is the first and foremost necessary library that needs to be imported?

Ans= The first and foremost library that needs to be imported to use the basic functions of pandas is the pandas library itself. It provides the data structures and functions needed to work efficiently with tabular data, such as the DataFrame object.

To import the pandas library in Python, you can use the following import statement:
    
import pandas as pd
