### <b>Question No. 1</b>

Here are five commonly used pandas functions, each with a short usage example:

1. **`read_csv()`**: This function is used to read a CSV file into a DataFrame.

```python
import pandas as pd

# Read a CSV file into a DataFrame
df = pd.read_csv('data.csv')
print(df.head())
```

2. **`groupby()`**: This function is used to group data in a DataFrame based on one or more columns.

```python
# Grouping the data by 'Gender' and computing the average age for each gender
avg_age_by_gender = df.groupby('Gender')['Age'].mean()
print(avg_age_by_gender)
```

3. **`merge()`**: This function is used to merge two DataFrames based on a common column.

```python
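# Two small example DataFrames sharing an 'ID' column (illustrative data)
df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['A', 'B', 'C']})
df2 = pd.DataFrame({'ID': [1, 2, 4], 'Score': [90, 85, 70]})
# Merging the two DataFrames on the 'ID' column (inner join by default)
merged_df = pd.merge(df1, df2, on='ID')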
print(merged_df.head())
```

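4. **`fillna()`**: This function is used to fill missing values in a DataFrame or Series with a specified value. For forward or backward filling, recent pandas versions favor the dedicated `ffill()` and `bfill()` methods.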

```python
# Filling missing values in the 'Age' column with the mean age
mean_age = df['Age'].mean()
df['Age'] = df['Age'].fillna(mean_age)
print(df.head())
```

5. **`pivot_table()`**: This function creates a spreadsheet-style pivot table as a DataFrame.

```python
# Creating a pivot table to show the average age of each gender for each city
pivot_table = df.pivot_table(index='City', columns='Gender', values='Age', aggfunc='mean')
print(pivot_table)
```
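
The snippets above assume a `data.csv` file with `Gender`, `Age`, and `City` columns. As a self-contained sketch, the same five calls can be run on a small in-memory CSV. The column values, the `scores` table, and all variable names here are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for 'data.csv'
csv_text = """ID,Gender,Age,City
1,F,30,Delhi
2,M,,Delhi
3,F,40,Patna
4,M,20,Patna
"""

# read_csv() accepts any file-like object, not only a file path
people = pd.read_csv(io.StringIO(csv_text))

# fillna(): replace the one missing age with the column mean
people['Age'] = people['Age'].fillna(people['Age'].mean())

# groupby(): average age per gender
avg_age_by_gender = people.groupby('Gender')['Age'].mean()

# merge(): attach a second, made-up table on the shared 'ID' column
scores = pd.DataFrame({'ID': [1, 2, 3, 4], 'Score': [88, 92, 75, 60]})
merged = pd.merge(people, scores, on='ID')

# pivot_table(): average age by city (rows) and gender (columns)
age_pivot = merged.pivot_table(index='City', columns='Gender', values='Age', aggfunc='mean')

print(avg_age_by_gender)
print(age_pivot)
```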

### <b>Question No. 2</b>

In [1]:
import pandas as pd

def reindex_with_increment(df):
    # Reset the index to default (0, 1, 2, ...)
    df = df.reset_index(drop=True)
    # Create a new index starting from 1 and incrementing by 2
    new_index = pd.Series(range(1, 2 * len(df), 2))
    # Set the new index
    df.index = new_index
    return df

# Example DataFrame
df = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60], 'C': [70, 80, 90]})

# Reindex the DataFrame
df_reindexed = reindex_with_increment(df)

print(df_reindexed)

    A   B   C
1  10  40  70
3  20  50  80
5  30  60  90
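
As an alternative sketch, the same odd-numbered index can be built with `pd.RangeIndex`, which avoids creating an intermediate Series. The names `odd_index` and `df_odd` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60], 'C': [70, 80, 90]})

# Work on a copy so the original DataFrame keeps its default index
df_odd = df.copy()
# RangeIndex(1, 2 * n, 2) yields 1, 3, 5, ... with exactly n labels
odd_index = pd.RangeIndex(start=1, stop=2 * len(df), step=2)
df_odd.index = odd_index

print(df_odd)
```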


### <b>Question No. 3</b>

In [2]:
import pandas as pd

def calculate_sum_of_first_three_values(df):
    sum_values = 0
    # Iterate over the DataFrame rows, tracking position rather than index label
    for position, (_, row) in enumerate(df.iterrows()):
        # Add the value to the sum
        sum_values += row['Values']
        # Break the loop after adding the first three values
        if position == 2:
            break
    # Print the sum to the console
    print("Sum of the first three values:", sum_values)

# Example DataFrame
df = pd.DataFrame({'Values': [10, 20, 30, 40, 50, 60, 70]})

# Call the function with the DataFrame
calculate_sum_of_first_three_values(df)


Sum of the first three values: 60
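
If an explicit loop is not required, a shorter sketch is to slice the column by position and sum it. The variable name `first_three_sum` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Values': [10, 20, 30, 40, 50, 60, 70]})

# iloc selects by position, so this works whatever the index labels are
first_three_sum = df['Values'].iloc[:3].sum()
print("Sum of the first three values:", first_three_sum)
```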


### <b>Question No. 4</b>

In [3]:
import pandas as pd

def count_words(text):
    # Split the text into words using whitespace as a delimiter
    words = text.split()
    # Return the number of words
    return len(words)

def add_word_count_column(df):
    # Apply the count_words function to each row in the 'Text' column
    df['Word_Count'] = df['Text'].apply(count_words)
    return df

# Example DataFrame
data = {'Text': ['Abhishek Kumar Singh', 'Ravi Ranjan', 'Prajapati Aditya Nath Gautam Vats']}
df = pd.DataFrame(data)

# Add the 'Word_Count' column to the DataFrame
df = add_word_count_column(df)

print(df)


                                Text  Word_Count
0               Abhishek Kumar Singh           3
1                        Ravi Ranjan           2
2  Prajapati Aditya Nath Gautam Vats           5
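
A vectorized sketch of the same idea uses the `.str` accessor instead of `apply`. It should give the same counts for plain string data like this; rows containing missing values would need separate handling.

```python
import pandas as pd

df = pd.DataFrame({'Text': ['Abhishek Kumar Singh', 'Ravi Ranjan',
                            'Prajapati Aditya Nath Gautam Vats']})

# str.split() splits each string on whitespace; str.len() counts the resulting pieces
df['Word_Count'] = df['Text'].str.split().str.len()
print(df)
```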


### <b>Question No. 5</b>

In Pandas, `DataFrame.size` and `DataFrame.shape` are attributes that provide different information about the DataFrame:

1. **`DataFrame.size`**: This attribute returns the total number of elements in the DataFrame, which is equal to the product of the number of rows and columns. Cells holding NaN/missing values are included in this count.

2. **`DataFrame.shape`**: This attribute returns a tuple representing the dimensions of the DataFrame. The tuple contains two elements: the number of rows and the number of columns.

Here's a simple example to illustrate the difference:

```python
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Print DataFrame size and shape
print("DataFrame size:", df.size)
print("DataFrame shape:", df.shape)
```

Output:
```
DataFrame size: 6
DataFrame shape: (3, 2)
```

In this example, the DataFrame has 3 rows and 2 columns. Therefore, the size is 3 * 2 = 6, and the shape is (3, 2).
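
To see how `size` treats missing values, here is a small sketch with two NaN cells. `size` counts every cell, while `count()` counts only the non-missing ones. The names `df_nan`, `total_cells`, and `non_missing` are illustrative.

```python
import numpy as np
import pandas as pd

# A 3 x 2 DataFrame with two missing cells
df_nan = pd.DataFrame({'A': [1.0, np.nan, 3.0], 'B': [4.0, 5.0, np.nan]})

total_cells = df_nan.size                 # every cell, missing or not
non_missing = int(df_nan.count().sum())   # count() skips NaN, column by column

print("size:", total_cells)
print("non-missing cells:", non_missing)
print("shape:", df_nan.shape)
```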

### <b>Question No. 6</b>

To read an Excel file in pandas, we use the `pd.read_excel()` function, which returns the sheet's contents as a DataFrame.
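
A minimal sketch of a call is shown below. The helper name `load_first_sheet` and the file name `report.xlsx` are hypothetical, and reading `.xlsx` files typically requires an engine such as `openpyxl` to be installed.

```python
import pandas as pd

def load_first_sheet(path):
    # sheet_name=0 selects the first sheet in the workbook
    return pd.read_excel(path, sheet_name=0)

# Example usage (assumes a file named 'report.xlsx' exists; hypothetical name):
# df = load_first_sheet('report.xlsx')
# print(df.head())
```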

### <b>Question No. 7</b>

In [4]:
import pandas as pd

def extract_username(email):
    # Split the email address at the '@' symbol and return the first part
    return email.split('@')[0]

def add_username_column(df):
    # Apply the extract_username function to each row in the 'Email' column
    df['Username'] = df['Email'].apply(extract_username)
    return df

# Example DataFrame
data = {'Email': ['jalalpur.aks@gmail.com', 'hacrjit@outlook.com']}
df = pd.DataFrame(data)

# Add the 'Username' column to the DataFrame
df = add_username_column(df)

print(df)

                    Email      Username
0  jalalpur.aks@gmail.com  jalalpur.aks
1     hacrjit@outlook.com       hacrjit
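
The same extraction can be sketched with the vectorized `.str` accessor, assuming every entry is a string containing an `@`:

```python
import pandas as pd

df = pd.DataFrame({'Email': ['jalalpur.aks@gmail.com', 'hacrjit@outlook.com']})

# Split each address at '@' and keep the first piece
df['Username'] = df['Email'].str.split('@').str[0]
print(df)
```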


### <b>Question No. 8</b>

In [5]:
import pandas as pd

def select_rows(df):
    # Select rows where value in column 'A' is greater than 5 and value in column 'B' is less than 10
    selected_rows = df[(df['A'] > 5) & (df['B'] < 10)]
    return selected_rows

# Example DataFrame
data = {'A': [3, 7, 2, 6, 8], 'B': [9, 5, 3, 8, 10], 'C': [1, 2, 3, 4, 5]}
df = pd.DataFrame(data)

# Select rows
selected_df = select_rows(df)

print(selected_df)

   A  B  C
1  7  5  2
3  6  8  4
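
As a sketch of an equivalent form, `DataFrame.query()` expresses the same condition as a string, which some readers find easier to scan than chained boolean masks. The name `selected` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 7, 2, 6, 8], 'B': [9, 5, 3, 8, 10], 'C': [1, 2, 3, 4, 5]})

# Same filter as the boolean mask: A greater than 5 and B less than 10
selected = df.query('A > 5 and B < 10')
print(selected)
```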


### <b>Question No. 9</b>

In [6]:
import pandas as pd

def calculate_statistics(df):
    # Calculate mean, median, and standard deviation of the 'Values' column
    mean_value = df['Values'].mean()
    median_value = df['Values'].median()
    std_value = df['Values'].std()
    return mean_value, median_value, std_value

# Example DataFrame
data = {'Values': [10, 28, 30, 41, 50]}
df = pd.DataFrame(data)

# Calculate statistics
mean_value, median_value, std_value = calculate_statistics(df)

print("Mean:", mean_value)
print("Median:", median_value)
print("Standard Deviation:", std_value)

Mean: 31.8
Median: 30.0
Standard Deviation: 15.073154945133417
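
The three statistics can also be computed in one call with `agg`, sketched below. Note that `std()` returns the sample standard deviation (`ddof=1`) by default; passing `ddof=0` gives the population value instead. The names `stats` and `population_std` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Values': [10, 28, 30, 41, 50]})

# One call returning a Series indexed by statistic name
stats = df['Values'].agg(['mean', 'median', 'std'])

# Population standard deviation, for comparison with the default sample version
population_std = df['Values'].std(ddof=0)

print(stats)
print("Population std:", population_std)
```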


### <b>Question No. 10</b>

In [7]:
import pandas as pd

def calculate_moving_average(df):
    # Calculate the moving average of the 'Sales' column with a window size of 7
    df['MovingAverage'] = df['Sales'].rolling(window=7, min_periods=1).mean()
    return df

# Example DataFrame
data = {'Date': pd.date_range(start='2022-01-01', periods=10),
        'Sales': [100, 150, 200, 250, 300, 350, 400, 450, 500, 550]}
df = pd.DataFrame(data)

# Calculate moving average
df = calculate_moving_average(df)
print(df)

        Date  Sales  MovingAverage
0 2022-01-01    100          100.0
1 2022-01-02    150          125.0
2 2022-01-03    200          150.0
3 2022-01-04    250          175.0
4 2022-01-05    300          200.0
5 2022-01-06    350          225.0
6 2022-01-07    400          250.0
7 2022-01-08    450          300.0
8 2022-01-09    500          350.0
9 2022-01-10    550          400.0
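
The output above has values in the first six rows only because of `min_periods=1`, which averages over however many observations are available so far. As a sketch of the contrast, leaving `min_periods` at its default makes the result NaN until a full 7-value window exists. The names `sales` and `strict_avg` are illustrative.

```python
import pandas as pd

sales = pd.Series([100, 150, 200, 250, 300, 350, 400, 450, 500, 550])

# Default min_periods equals the window size, so the first 6 results are NaN
strict_avg = sales.rolling(window=7).mean()
print(strict_avg)
```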


### <b>Question No. 11</b>

In [8]:
import pandas as pd

def add_weekday_column(df):
    # Convert 'Date' column to datetime format if not already
    df['Date'] = pd.to_datetime(df['Date'])
    # Add a new column 'Weekday' containing the weekday name
    df['Weekday'] = df['Date'].dt.day_name()
    return df

# Example DataFrame
data = {'Date': ['2024-01-01', '2024-01-02', '2004-01-03', '2024-01-04', '2028-01-05']}
df = pd.DataFrame(data)

# Add 'Weekday' column
df = add_weekday_column(df)

print(df)

        Date    Weekday
0 2024-01-01     Monday
1 2024-01-02    Tuesday
2 2004-01-03   Saturday
3 2024-01-04   Thursday
4 2028-01-05  Wednesday
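
When a numeric weekday is more convenient than a name, for example to flag weekends, `dt.dayofweek` can be used. This is a small sketch with made-up dates; the names `dates`, `weekday_number`, and `is_weekend` are illustrative.

```python
import pandas as pd

dates = pd.to_datetime(pd.Series(['2024-01-01', '2024-01-06']))

# dayofweek numbers the days Monday=0 through Sunday=6
weekday_number = dates.dt.dayofweek
is_weekend = weekday_number >= 5

print(weekday_number.tolist())
print(is_weekend.tolist())
```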


### <b>Question No. 12</b>

In [9]:
import pandas as pd

def select_rows_between_dates(df):
    # Convert 'Date' column to datetime format if not already
    df['Date'] = pd.to_datetime(df['Date'])
    # Select rows where 'Date' is between '2020-01-01' and '2023-01-31' (inclusive)
    selected_rows = df[(df['Date'] >= '2020-01-01') & (df['Date'] <= '2023-01-31')]
    return selected_rows

# Example DataFrame
data = {'Date': ['2002-12-31', '2021-01-05', '2019-01-15', '2022-02-01', '2023-02-15']}
df = pd.DataFrame(data)

# Select rows between '2020-01-01' and '2023-01-31'
selected_df = select_rows_between_dates(df)

print(selected_df)

        Date
1 2021-01-05
3 2022-02-01
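
The same range filter can be sketched with `Series.between`, which is inclusive on both ends by default. The date bounds below match the ones used in the code above; the names `mask` and `selected` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Date': pd.to_datetime(
    ['2002-12-31', '2021-01-05', '2019-01-15', '2022-02-01', '2023-02-15'])})

# between() builds the same boolean mask as combining >= and <= with &
mask = df['Date'].between('2020-01-01', '2023-01-31')
selected = df[mask]
print(selected)
```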


### <b>Question No. 13</b>

To use the basic functions of pandas, we first need to import the pandas library, conventionally under the alias `pd`.
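
A minimal sketch of the import followed by a basic call (the Series values are arbitrary):

```python
import pandas as pd

# Once imported, pandas objects and functions are reached through the 'pd' alias
s = pd.Series([1, 2, 3])
total = s.sum()
print(total)
```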