## Q1. List any five functions of the pandas library with execution.

Five commonly used functions from the pandas library, each with a short example:

1. **`read_csv()`**: This function is used to read data from a CSV file and create a DataFrame.

```python
import pandas as pd

# Read data from a CSV file into a DataFrame
df = pd.read_csv('data.csv')

print(df.head())
```

2. **`groupby()`**: Used to group data based on one or more columns and perform aggregate operations on the groups.

```python
# Assuming 'df' is a DataFrame
grouped = df.groupby('Category')['Price'].mean()

print(grouped)
```

3. **`dropna()`**: Removes rows or columns with missing values (NaN).

```python
# Assuming 'df' is a DataFrame
cleaned_df = df.dropna()

print(cleaned_df.head())
```

4. **`merge()`**: Combines two DataFrames based on a common column or index.

```python
# Assuming 'df1' and 'df2' are DataFrames
merged_df = pd.merge(df1, df2, on='ID')

print(merged_df.head())
```

5. **`pivot_table()`**: Creates a pivot table from a DataFrame, allowing you to summarize and rearrange data.

```python
# Assuming 'df' is a DataFrame
pivot_table = df.pivot_table(index='Category', values='Revenue', aggfunc='sum')

print(pivot_table)
```

Please note that these examples assume that you have the necessary data or DataFrames (`df`, `df1`, `df2`) and corresponding CSV files (`data.csv`) to work with.
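Since the snippets above depend on files and DataFrames that are not shown, here is a minimal self-contained sketch that runs all five functions end to end. The CSV text, the column names, and the `owners` lookup table are hypothetical stand-ins invented for illustration, not data from the original exercise.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for 'data.csv'
csv_text = """ID,Category,Price,Revenue
1,Books,10.0,100
2,Books,20.0,
3,Toys,5.0,50
4,Toys,15.0,150
"""

# read_csv(): accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

# groupby(): mean price per category
mean_price = df.groupby('Category')['Price'].mean()

# dropna(): drop the one row whose Revenue is missing
cleaned_df = df.dropna()

# merge(): inner join with a hypothetical lookup table on 'ID'
owners = pd.DataFrame({'ID': [1, 2, 3], 'Owner': ['Ann', 'Bob', 'Cy']})
merged_df = pd.merge(df, owners, on='ID')

# pivot_table(): total revenue per category (missing values are skipped)
revenue_by_category = df.pivot_table(index='Category', values='Revenue', aggfunc='sum')

print(mean_price)
print(cleaned_df)
print(merged_df)
print(revenue_by_category)
```

Passing a `StringIO` object to `read_csv()` keeps the example runnable without any file on disk.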

## Q2. Given a Pandas DataFrame df with columns 'A', 'B', and 'C', write a Python function to re-index the DataFrame with a new index that starts from 1 and increments by 2 for each row.

```python
import pandas as pd

def reindex_with_custom_increment(df):
    new_index = range(1, 2 * len(df) + 1, 2)  # New index starting from 1 and incrementing by 2
    df_reindexed = df.copy()
    df_reindexed.index = new_index
    return df_reindexed

# Sample DataFrame
data = {
    'A': [10, 20, 30],
    'B': [5, 15, 25],
    'C': [100, 200, 300]
}

df = pd.DataFrame(data)

# Re-index the DataFrame
reindexed_df = reindex_with_custom_increment(df)

print(reindexed_df)
```

Output:

```
    A   B    C
1  10   5  100
3  20  15  200
5  30  25  300
```


## Q3. You have a Pandas DataFrame df with a column named 'Values'. Write a Python function that iterates over the DataFrame and calculates the sum of the first three values in the 'Values' column. The function should print the sum to the console.

```python
import pandas as pd

def calculate_sum_of_first_three(df):
    values_column = df['Values']  # Extract the 'Values' column
    sum_first_three = values_column.head(3).sum()  # Sum of the first three values
    print("Sum of the first three values:", sum_first_three)

# Sample DataFrame
data = {
    'Values': [10, 20, 30, 40, 50]
}

df = pd.DataFrame(data)

# Calculate and print the sum of the first three values
calculate_sum_of_first_three(df)
```

Output:

```
Sum of the first three values: 60
```
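The question asks for a function that iterates over the DataFrame, whereas the solution above uses the vectorized `head(3).sum()`. A minimal sketch of an explicit loop follows, for comparison. The function name `sum_first_three_by_iteration` is hypothetical, and the vectorized form is generally preferable in practice.

```python
import pandas as pd

def sum_first_three_by_iteration(df):
    total = 0
    # iterrows() yields (index, row) pairs; head(3) limits the loop to three rows
    for _, row in df.head(3).iterrows():
        total += row['Values']
    print("Sum of the first three values:", total)
    return total

df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})
result = sum_first_three_by_iteration(df)
```

Because `head(3)` returns at most three rows, the loop also works when the DataFrame has fewer than three rows.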


## Q4. Given a Pandas DataFrame df with a column 'Text', write a Python function to create a new column 'Word_Count' that contains the number of words in each row of the 'Text' column.

```python
import pandas as pd

def count_words(text):
    return len(text.split())  # Split the text into words and count them

# Sample DataFrame
data = {
    'Text': [
        "This is a sample sentence.",
        "Another example for word count.",
        "Just a few words here."
    ]
}

df = pd.DataFrame(data)

# Apply the count_words function to create the 'Word_Count' column
df['Word_Count'] = df['Text'].apply(count_words)

print(df)
```

Output:

```
                              Text  Word_Count
0       This is a sample sentence.           5
1  Another example for word count.           5
2           Just a few words here.           5
```
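The `apply`-based version above raises an error if a row holds a missing value, because `None` and `NaN` have no `split()` method. A possible alternative, sketched here under the assumption that missing text should count as zero words, uses the vectorized `.str` accessor. The sample rows are made up for illustration.

```python
import pandas as pd

# Hypothetical sample including a missing value and irregular whitespace
df = pd.DataFrame({'Text': ["This is a sample sentence.", None, "  two   words "]})

# str.split() splits on runs of whitespace; str.len() counts the resulting words.
# Missing rows stay NaN through both steps, so fill them with 0 afterwards.
df['Word_Count'] = df['Text'].str.split().str.len().fillna(0).astype(int)

print(df)
```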


## Q5. How are DataFrame.size() and DataFrame.shape() different?

`DataFrame.size` and `DataFrame.shape` are both attributes of a pandas DataFrame, not methods, so they are accessed without parentheses. Both describe how large the DataFrame is, but they report different things:

1. **`DataFrame.size`**:
   - `DataFrame.size` returns the total number of elements in the DataFrame, which is equal to the number of rows multiplied by the number of columns.
   - It provides the total count of cells in the DataFrame, regardless of their content.

Example:
```python
import pandas as pd

data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)

print(df.size)  # Output: 6 (3 rows * 2 columns)
```

2. **`DataFrame.shape`**:
   - `DataFrame.shape` returns a tuple representing the dimensions of the DataFrame. The tuple contains two values: the number of rows and the number of columns.
   - It provides information about the structure of the DataFrame in terms of rows and columns.

Example:
```python
import pandas as pd

data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)

print(df.shape)  # Output: (3, 2) (3 rows, 2 columns)
```

In summary, `DataFrame.size` gives you the total number of elements (cells) in the DataFrame, while `DataFrame.shape` gives you the dimensions of the DataFrame in terms of rows and columns.
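The relationship between the two attributes can be checked directly. The short sketch below reuses the sample data from the examples above. It also shows what happens when the attributes are called with parentheses, as the question's wording suggests.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

n_rows, n_cols = df.shape   # shape is a (rows, columns) tuple
total_cells = df.size       # size is a single integer

# For a DataFrame, size always equals rows * columns
print(total_cells == n_rows * n_cols)

# For a Series, shape is a one-element tuple and size is its length
column = df['A']
print(column.shape, column.size)

# Both are attributes; calling them like methods raises TypeError
try:
    df.size()
    called_ok = True
except TypeError:
    called_ok = False
print(called_ok)
```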

## Q6. Which function of pandas do we use to read an excel file?

To read an Excel file using Pandas, you can use the `pd.read_excel()` function. This function reads data from an Excel file and returns it as a Pandas DataFrame.

Here's how you can use the `pd.read_excel()` function to read an Excel file:

```python
import pandas as pd

# Read data from an Excel file into a DataFrame
df = pd.read_excel('file.xlsx')

print(df.head())
```

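In this example, replace `'file.xlsx'` with the path to your Excel file. The function reads the data from the Excel file and creates a DataFrame containing the data. Reading `.xlsx` files requires an Excel engine such as `openpyxl` to be installed alongside pandas.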

## Q7. You have a Pandas DataFrame df that contains a column named 'Email' that contains email addresses in the format 'username@domain.com'. Write a Python function that creates a new column 'Username' in df that contains only the username part of each email address.

```python
import pandas as pd

def extract_username(email):
    return email.split('@')[0]  # Split by '@' and take the first part

# Sample DataFrame
data = {
    'Email': [
        'john.doe@example.com',
        'jane.smith@example.com',
        'alice.wonderland@example.com'
    ]
}

df = pd.DataFrame(data)

# Apply the extract_username function to create the 'Username' column
df['Username'] = df['Email'].apply(extract_username)

print(df)
```

Output:

```
                          Email          Username
0          john.doe@example.com          john.doe
1        jane.smith@example.com        jane.smith
2  alice.wonderland@example.com  alice.wonderland
```
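The same result can be obtained without `apply` by using the vectorized `.str` accessor. This is a sketch of one possible alternative, wrapped in a hypothetical helper named `add_username_column`. It returns a copy rather than modifying the input.

```python
import pandas as pd

def add_username_column(df):
    result = df.copy()
    # Split each address once at '@' and keep the part before it
    result['Username'] = result['Email'].str.split('@', n=1).str[0]
    return result

df = pd.DataFrame({
    'Email': [
        'john.doe@example.com',
        'jane.smith@example.com',
        'alice.wonderland@example.com'
    ]
})

df_with_username = add_username_column(df)
print(df_with_username)
```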


## Q8. You have a Pandas DataFrame df with columns 'A', 'B', and 'C'. Write a Python function that selects all rows where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10. The function should return a new DataFrame that contains only the selected rows.

```python
import pandas as pd

# Sample DataFrame
data = {
    'A': [3, 8, 6, 2, 9],
    'B': [5, 2, 9, 3, 1],
    'C': [1, 7, 4, 5, 2]
}

df = pd.DataFrame(data)

def filter_dataframe(df):
    filtered_df = df[(df['A'] > 5) & (df['B'] < 10)]
    return filtered_df

# Call the function to get the filtered DataFrame
filtered_df = filter_dataframe(df)

print(filtered_df)
```

Output:

```
   A  B  C
1  8  2  7
2  6  9  4
4  9  1  2
```
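The boolean-mask filter above can also be written with `DataFrame.query()`. The sketch below is one possible variant. The function name `filter_with_query` and the parameters `a_min` and `b_max` are hypothetical additions that make the thresholds configurable, and the `.copy()` call returns an independent DataFrame, as the question requests.

```python
import pandas as pd

def filter_with_query(df, a_min=5, b_max=10):
    # '@name' refers to a local Python variable inside the query string
    return df.query('A > @a_min and B < @b_max').copy()

df = pd.DataFrame({
    'A': [3, 8, 6, 2, 9],
    'B': [5, 2, 9, 3, 1],
    'C': [1, 7, 4, 5, 2]
})

selected = filter_with_query(df)
print(selected)
```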


## Q9. Given a Pandas DataFrame df with a column 'Values', write a Python function to calculate the mean, median, and standard deviation of the values in the 'Values' column.

```python
import pandas as pd

def calculate_statistics(df):
    values_column = df['Values']  # Extract the 'Values' column

    mean = values_column.mean()  # Calculate the mean
    median = values_column.median()  # Calculate the median
    std_dev = values_column.std()  # Calculate the standard deviation

    return mean, median, std_dev

# Sample DataFrame
data = {
    'Values': [10, 20, 30, 40, 50]
}

df = pd.DataFrame(data)

# Calculate statistics using the function
mean, median, std_dev = calculate_statistics(df)

print("Mean:", mean)
print("Median:", median)
print("Standard Deviation:", std_dev)
```

Output:

```
Mean: 30.0
Median: 30.0
Standard Deviation: 15.811388300841896
```
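The value 15.81 above is the sample standard deviation, because pandas' `std()` defaults to `ddof=1` and divides by n - 1. If the population standard deviation is wanted instead, `ddof=0` can be passed. The short sketch below shows both on the same sample values.

```python
import pandas as pd

values = pd.Series([10, 20, 30, 40, 50])

sample_std = values.std()            # ddof=1 (pandas default): divides by n - 1
population_std = values.std(ddof=0)  # divides by n

print("Sample standard deviation:", sample_std)
print("Population standard deviation:", population_std)
```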


## Q10. Given a Pandas DataFrame df with a column 'Sales' and a column 'Date', write a Python function to create a new column 'MovingAverage' that contains the moving average of the sales for the past 7 days for each row in the DataFrame. The moving average should be calculated using a window of size 7 and should include the current day.

```python
import pandas as pd

def calculate_moving_average(df):
    window_size = 7
    df['MovingAverage'] = df['Sales'].rolling(window=window_size, min_periods=1).mean()
    return df

# Sample DataFrame
data = {
    'Date': ['2023-08-01', '2023-08-02', '2023-08-03', '2023-08-04', '2023-08-05'],
    'Sales': [100, 150, 200, 120, 180]
}

df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])  # Convert 'Date' column to datetime

# Calculate moving average using the function
df_with_moving_avg = calculate_moving_average(df)

print(df_with_moving_avg)
```

Output:

```
        Date  Sales  MovingAverage
0 2023-08-01    100          100.0
1 2023-08-02    150          125.0
2 2023-08-03    200          150.0
3 2023-08-04    120          142.5
4 2023-08-05    180          150.0
```
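The solution above uses a window of 7 rows, which matches "the past 7 days" only when the data has exactly one row per day with no gaps. If dates can be missing, a time-based window is one possible alternative. The sketch below assumes that interpretation. The function name `add_time_based_moving_average` and the sample data with a gap are hypothetical.

```python
import pandas as pd

def add_time_based_moving_average(df):
    # Time-based windows need the dates in increasing order
    result = df.sort_values('Date').copy()
    # '7D' covers the 7 calendar days ending at (and including) each row's date
    result['MovingAverage'] = result.rolling('7D', on='Date')['Sales'].mean()
    return result

# Hypothetical sample with a gap between 2023-08-02 and 2023-08-10
df = pd.DataFrame({
    'Date': pd.to_datetime(['2023-08-01', '2023-08-02', '2023-08-10']),
    'Sales': [100, 150, 200]
})

df_time_based = add_time_based_moving_average(df)
print(df_time_based)
```

In this sample the last row averages only its own value, because the earlier sales fall outside the 7-day window. A 7-row window would have averaged all three rows.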


## Q11. You have a Pandas DataFrame df with a column 'Date'. Write a Python function that creates a new column 'Weekday' in the DataFrame. The 'Weekday' column should contain the weekday name (e.g. Monday, Tuesday) corresponding to each date in the 'Date' column.

```python
import pandas as pd

def add_weekday_column(df):
    df['Weekday'] = df['Date'].dt.strftime('%A')  # Extract weekday name and add to 'Weekday' column
    return df

# Sample DataFrame
data = {
    'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05']
}

df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])  # Convert 'Date' column to datetime

# Add 'Weekday' column using the function
df_with_weekday = add_weekday_column(df)

print(df_with_weekday)
```

Output:

```
        Date    Weekday
0 2023-01-01     Sunday
1 2023-01-02     Monday
2 2023-01-03    Tuesday
3 2023-01-04  Wednesday
4 2023-01-05   Thursday
```
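pandas also provides `dt.day_name()`, which returns the weekday name directly and defaults to English. The sketch below is an alternative to the `strftime('%A')` call above. The helper name `add_weekday_with_day_name` is hypothetical, and it returns a copy instead of modifying the input.

```python
import pandas as pd

def add_weekday_with_day_name(df):
    result = df.copy()
    # day_name() returns English weekday names unless a locale is passed
    result['Weekday'] = result['Date'].dt.day_name()
    return result

df = pd.DataFrame({
    'Date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03'])
})

df_named = add_weekday_with_day_name(df)
print(df_named)
```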


## Q12. Given a Pandas DataFrame df with a column 'Date' that contains timestamps, write a Python function to select all rows where the date is between '2023-01-01' and '2023-01-31'.

```python
import pandas as pd

def filter_by_date_range(df):
    start_date = '2023-01-01'
    end_date = '2023-01-31'
    filtered_df = df[(df['Date'] >= start_date) & (df['Date'] <= end_date)]
    return filtered_df

# Sample DataFrame
data = {
    'Date': [
        '2023-01-01', '2023-01-15', '2023-01-20', '2023-01-31', '2023-02-10'
    ]
}

df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])  # Convert 'Date' column to datetime

# Filter by date range using the function
filtered_df = filter_by_date_range(df)

print(filtered_df)
```

Output:

```
        Date
0 2023-01-01
1 2023-01-15
2 2023-01-20
3 2023-01-31
```
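The comparison `<= '2023-01-31'` treats the end date as midnight at the start of January 31, so a timestamp later that day (for example 18:30) would be excluded. If the column carries time-of-day information and the whole of January 31 should be included, one option is a half-open range that ends just before February 1. The sketch below assumes that interpretation. The function name `filter_january_2023` and the sample timestamps are hypothetical.

```python
import pandas as pd

def filter_january_2023(df):
    start = pd.Timestamp('2023-01-01')
    end_exclusive = pd.Timestamp('2023-02-01')
    # Half-open interval: start <= Date < end_exclusive
    mask = (df['Date'] >= start) & (df['Date'] < end_exclusive)
    return df[mask]

# Hypothetical timestamps with a time-of-day component
df = pd.DataFrame({
    'Date': pd.to_datetime(['2023-01-01 00:00', '2023-01-31 18:30', '2023-02-01 00:00'])
})

january = filter_january_2023(df)
print(january)
```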


## Q13. To use the basic functions of pandas, what is the first and foremost necessary library that needs to be imported? 

To use the basic functions of Pandas, you need to import the `pandas` library itself. You typically import it using the following line:

```python
import pandas as pd
```

By convention, most users alias `pandas` as `pd` to make the code shorter and more readable. This is the first and foremost step you should take before using any Pandas functions in your Python code.