Q1. List any five functions of the pandas library with execution.

Here are five commonly used functions of the pandas library, each with an example of its execution:

1. **`read_csv()`**: This function is used to read a CSV file into a pandas DataFrame.

```python
import pandas as pd

# Read a CSV file into a DataFrame
df = pd.read_csv('data.csv')
print(df.head())
```

2. **`head()`**: This function returns the first `n` rows of a DataFrame (5 by default), which is handy for a quick look at the data.

```python
import pandas as pd

# Create a sample DataFrame
data = {'A': [1, 2, 3, 4, 5], 'B': ['a', 'b', 'c', 'd', 'e']}
df = pd.DataFrame(data)

# Display the first 3 rows of the DataFrame
print(df.head(3))
```

3. **`describe()`**: This function is used to generate descriptive statistics of the DataFrame's numerical columns.

```python
import pandas as pd

# Create a sample DataFrame
data = {'A': [1, 2, 3, 4, 5], 'B': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

# Generate descriptive statistics
print(df.describe())
```

4. **`groupby()`**: This function groups the rows of a DataFrame by the values in one or more columns (or by a mapper), so that an aggregation such as `mean()` can be computed for each group.

```python
import pandas as pd

# Create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Alice', 'Bob'],
        'Score': [80, 90, 85, 95]}
df = pd.DataFrame(data)

# Group DataFrame by 'Name' and calculate mean score
grouped = df.groupby('Name').mean()
print(grouped)
```

5. **`to_csv()`**: This function is used to write a DataFrame to a CSV file.

```python
import pandas as pd

# Create a sample DataFrame
data = {'A': [1, 2, 3, 4, 5], 'B': ['a', 'b', 'c', 'd', 'e']}
df = pd.DataFrame(data)

# Write DataFrame to a CSV file
df.to_csv('output.csv', index=False)
``` 
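Because `data.csv` from item 1 may not exist on your machine, `read_csv()` and `to_csv()` can also be tried together without touching the disk. A minimal sketch, assuming only pandas and the standard-library `io` module: when no path is given, `to_csv()` returns the CSV text as a string, and `read_csv()` accepts any file-like object such as `io.StringIO`.

```python
import io

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'c']})

# With no path argument, to_csv() returns the CSV text as a string
csv_text = df.to_csv(index=False)

# read_csv() accepts any file-like object, so wrap the string in StringIO
df_back = pd.read_csv(io.StringIO(csv_text))
print(df_back.equals(df))  # True
```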

These are just a few of the many functions available in the pandas library for data manipulation and analysis.
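Returning to `groupby()` from item 4: selecting the column to aggregate before calling the aggregation keeps the intent explicit, and in pandas 2.x it avoids a `TypeError` that `mean()` can raise when non-numeric columns are left in the grouped DataFrame. A sketch using `agg()` to compute several statistics per group on the same sample data:

```python
import pandas as pd

# Same sample data as the groupby() example in item 4
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Alice', 'Bob'],
                   'Score': [80, 90, 85, 95]})

# Select the 'Score' column, then apply several aggregations per group
summary = df.groupby('Name')['Score'].agg(['mean', 'min', 'max', 'count'])
print(summary)
```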

Q2. Given a Pandas DataFrame df with columns 'A', 'B', and 'C', write a Python function to re-index the DataFrame with a new index that starts from 1 and increments by 2 for each row.

```python
import pandas as pd

def reindex_dataframe(df):
    # Generate a new index starting from 1 and incrementing by 2 (len(df) labels)
    new_index = range(1, len(df) * 2, 2)

    # Replace the row labels (note: this modifies df in place)
    df.index = new_index

    return df

# Example usage
# Assuming 'df' is your DataFrame with columns 'A', 'B', and 'C'

# Create a sample DataFrame
data = {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}
df = pd.DataFrame(data)

# Call the function to re-index the DataFrame
df_reindexed = reindex_dataframe(df)

print(df_reindexed)
```

Output:

```
   A  B  C
1  1  4  7
3  2  5  8
5  3  6  9
```
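Assigning to `df.index` changes the caller's DataFrame as well. If the original should stay untouched, one option is `set_axis()`, which returns a new DataFrame. A sketch of that variant, with `pd.RangeIndex` producing exactly `len(df)` labels 1, 3, 5, ...:

```python
import pandas as pd

def reindex_dataframe(df):
    # Build the labels 1, 3, 5, ... with exactly len(df) entries
    new_index = pd.RangeIndex(start=1, stop=2 * len(df) + 1, step=2)
    # set_axis() returns a new DataFrame, so the caller's df keeps its index
    return df.set_axis(new_index, axis=0)

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
result = reindex_dataframe(df)
print(result)
print(df.index.tolist())  # [0, 1, 2]: the original index is unchanged
```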


Q3. You have a Pandas DataFrame df with a column named 'Values'. Write a Python function that iterates over the DataFrame and calculates the sum of the first three values in the 'Values' column. The function should print the sum to the console.
For example, if the 'Values' column of df contains the values [10, 20, 30, 40, 50], your function should calculate and print the sum of the first three values, which is 60.

```python
import pandas as pd

def calculate_sum(df):
    # Select the first three values in the 'Values' column using iloc and calculate their sum
    sum_first_three = df['Values'].iloc[:3].sum()

    # Print the sum to the console
    print("Sum of the first three values:", sum_first_three)

# Example usage
# Assuming 'df' is your DataFrame with a column 'Values'

# Create a sample DataFrame
data = {'Values': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

# Call the function to calculate the sum of the first three values
calculate_sum(df)
```

Output:

```
Sum of the first three values: 60
```
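The question asks for a function that *iterates* over the DataFrame, whereas the solution above slices with `iloc`. A sketch of an explicit loop that meets that wording literally; the function name `calculate_sum_by_iteration` is hypothetical, and unlike the original it also returns the total so it can be checked:

```python
import pandas as pd

def calculate_sum_by_iteration(df):
    total = 0
    # Iterating over a Series yields its values in order; stop after the third
    for position, value in enumerate(df['Values']):
        if position == 3:
            break
        total += value
    print("Sum of the first three values:", total)
    return total

df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})
calculate_sum_by_iteration(df)
```

If the column has fewer than three values, the loop simply sums whatever is there.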


Q4. Given a Pandas DataFrame df with a column 'Text', write a Python function to create a new column 'Word_Count' that contains the number of words in each row of the 'Text' column.

```python
import pandas as pd

def count_words(df):
    # Split text into words and count the number of words
    df['Word_Count'] = df['Text'].str.split().apply(len)

    return df

# Example usage
# Assuming 'df' is your DataFrame with a column 'Text'

# Create a sample DataFrame
data = {'Text': ['This is a sample text', 'Another example', 'Yet another text']}
df = pd.DataFrame(data)

# Call the function to count words
df_with_word_count = count_words(df)

print(df_with_word_count)
```

Output:

```
                    Text  Word_Count
0  This is a sample text           5
1        Another example           2
2       Yet another text           3
```
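The `.apply(len)` version raises a `TypeError` if a row of 'Text' is missing, because splitting a missing value yields `NaN`, which has no length. A sketch of a variant that uses the vectorized `.str.len()` instead, so missing text produces `NaN` and an empty string counts as 0 words:

```python
import pandas as pd

def count_words(df):
    # .str.len() on the split lists returns NaN for missing text instead of raising
    df['Word_Count'] = df['Text'].str.split().str.len()
    return df

df = pd.DataFrame({'Text': ['This is a sample text', '', None]})
print(count_words(df))
```

Because of the `NaN`, the resulting column has a float dtype.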


Q5. How are DataFrame.size() and DataFrame.shape() different?

Both `DataFrame.size` and `DataFrame.shape` are attributes of a Pandas DataFrame that provide information about the structure of the DataFrame, but they provide different kinds of information:

1. **DataFrame.size**:
   - `DataFrame.size` returns the total number of elements in the DataFrame, which is calculated by multiplying the number of rows by the number of columns.
   - It represents the total number of cells or entries in the DataFrame.
   - The returned value is an integer.

2. **DataFrame.shape**:
   - `DataFrame.shape` returns a tuple representing the dimensions of the DataFrame, i.e., the number of rows and columns.
   - It provides the number of rows as the first element and the number of columns as the second element.
   - The returned value is a tuple of integers.

Here's a summary of the differences:

- `DataFrame.size` returns the total number of elements (rows * columns).
- `DataFrame.shape` returns a tuple of the number of rows and columns.

For example:
```python
import pandas as pd

# Create a DataFrame
data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)

# Using DataFrame.size
print("DataFrame size:", df.size)  # Output: 6

# Using DataFrame.shape
print("DataFrame shape:", df.shape)  # Output: (3, 2)
```

In this example, the DataFrame `df` has 3 rows and 2 columns. Therefore, `df.size` returns 6 (3 rows * 2 columns), while `df.shape` returns `(3, 2)`, indicating 3 rows and 2 columns.
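A few related checks make the relationship concrete. Since both are attributes, writing `df.size()` (as in the question) fails because the integer that `size` returns is not callable. Note also that `size` counts every cell, including missing ones, whereas `count()` counts only non-missing values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, np.nan, 6]})

rows, cols = df.shape            # shape is a (rows, columns) tuple
print(rows, cols)                # 3 2
print(df.size == rows * cols)    # True: size is rows times columns
print(len(df) == rows)           # True: len() gives the row count
print(df.count().sum())          # 5: non-missing cells only, unlike size

try:
    df.size()                    # size is an attribute, so calling it fails
except TypeError as exc:
    print("size is an attribute, not a method:", exc)
```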

Q6. Which function of pandas do we use to read an excel file?

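Use `pd.read_excel()`, which reads a sheet of an Excel file into a DataFrame (by default, the first sheet). Reading `.xlsx` files requires the optional `openpyxl` package to be installed.

```python
import pandas as pd

# Read data from an Excel file
df = pd.read_excel('file.xlsx')
```

This example assumes `file.xlsx` exists in the working directory; when it was run without that file, pandas raised `FileNotFoundError: [Errno 2] No such file or directory: 'file.xlsx'`.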

Q7. You have a Pandas DataFrame df that contains a column named 'Email' that contains email addresses in the format 'username@domain.com'. Write a Python function that creates a new column 'Username' in df that contains only the username part of each email address.

The username is the part of the email address that appears before the '@' symbol. For example, if the email address is 'john.doe@example.com', the 'Username' column should contain 'john.doe'. Your function should extract the username from each email address and store it in the new 'Username' column.

```python
import pandas as pd

def extract_username(df):
    # Split email addresses at '@' symbol and select the first part (the username)
    df['Username'] = df['Email'].str.split('@').str[0]

    return df

# Example usage
# Assuming 'df' is your DataFrame with a column 'Email'

# Create a sample DataFrame
data = {'Email': ['john.doe@example.com', 'jane.doe@example.com', 'bob.smith@example.com']}
df = pd.DataFrame(data)

# Call the function to extract usernames
df_with_username = extract_username(df)

print(df_with_username)
```
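With the split approach, a malformed entry that has no '@' is returned whole as its own "username". An alternative sketch uses `str.extract()` with a regular expression, so such entries become `NaN` and are easy to spot; the function name `extract_username_regex` and the sample data are illustrative:

```python
import pandas as pd

def extract_username_regex(df):
    # Capture everything before the first '@'; expand=False returns a Series
    df['Username'] = df['Email'].str.extract(r'^([^@]+)@', expand=False)
    return df

df = pd.DataFrame({'Email': ['john.doe@example.com', 'not-an-email']})
print(extract_username_regex(df))
```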

Q8. You have a Pandas DataFrame df with columns 'A', 'B', and 'C'. Write a Python function that selects all rows where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10. The function should return a new DataFrame that contains only the selected rows.
For example, if df contains the following values:

```
   A  B  C
0  3  5  1
1  8  2  7
2  6  9  4
3  2  3  5
4  9  1  2
```

Your function should select the following rows (row 2 qualifies as well, since 6 > 5 and 9 < 10):

```
   A  B  C
1  8  2  7
2  6  9  4
4  9  1  2
```

```python
import pandas as pd

def select_rows(df):
    # Select rows where 'A' is greater than 5 and 'B' is less than 10
    selected_rows = df[(df['A'] > 5) & (df['B'] < 10)]

    return selected_rows

# Example usage
# Assuming 'df' is your DataFrame with columns 'A', 'B', and 'C'

# Create a sample DataFrame
data = {'A': [3, 8, 6, 2, 9],
        'B': [5, 2, 9, 3, 1],
        'C': [1, 7, 4, 5, 2]}
df = pd.DataFrame(data)

# Call the function to select rows
selected_df = select_rows(df)

print(selected_df)
```
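The same filter can be written with `DataFrame.query()`, which evaluates a condition string against the column names. A sketch (the name `select_rows_query` is illustrative); on the sample data it keeps rows 1, 2 and 4, because row 2 (A = 6, B = 9) also satisfies both conditions:

```python
import pandas as pd

def select_rows_query(df):
    # query() parses the expression and refers to columns by name
    return df.query('A > 5 and B < 10')

df = pd.DataFrame({'A': [3, 8, 6, 2, 9],
                   'B': [5, 2, 9, 3, 1],
                   'C': [1, 7, 4, 5, 2]})
selected = select_rows_query(df)
print(selected)
print(selected.index.tolist())  # [1, 2, 4]
```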

Q9. Given a Pandas DataFrame df with a column 'Values', write a Python function to calculate the mean, median, and standard deviation of the values in the 'Values' column.

```python
import pandas as pd

def calculate_stats(df):
    # Calculate mean, median, and standard deviation
    mean_value = df['Values'].mean()
    median_value = df['Values'].median()
    std_value = df['Values'].std()

    return mean_value, median_value, std_value

# Example usage
# Assuming 'df' is your DataFrame with a column 'Values'

# Create a sample DataFrame
data = {'Values': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

# Call the function to calculate statistics
mean_value, median_value, std_value = calculate_stats(df)

print("Mean:", mean_value)
print("Median:", median_value)
print("Standard Deviation:", std_value)
```
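Two details are worth knowing here. First, `agg()` can compute all three statistics in one call. Second, pandas' `std()` returns the *sample* standard deviation (`ddof=1`) by default, while NumPy's `np.std()` defaults to the population version (`ddof=0`), so the two libraries can disagree on the same data. A short sketch:

```python
import pandas as pd

values = pd.Series([10, 20, 30, 40, 50], name='Values')

# One call returns all three statistics as a Series indexed by function name
stats = values.agg(['mean', 'median', 'std'])
print(stats)

# Sample (ddof=1, pandas default) versus population (ddof=0) standard deviation
print(values.std(ddof=1), values.std(ddof=0))
```

For this data the squared deviations from the mean sum to 1000, so the sample standard deviation is the square root of 1000 / 4 and the population one is the square root of 1000 / 5.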

Q10. Given a Pandas DataFrame df with a column 'Sales' and a column 'Date', write a Python function to create a new column 'MovingAverage' that contains the moving average of the sales for the past 7 days for each row in the DataFrame. The moving average should be calculated using a window of size 7 and should include the current day.

```python
import pandas as pd

def add_moving_average_column(df):
    # Convert 'Date' column to datetime data type
    df['Date'] = pd.to_datetime(df['Date'])

    # Sort DataFrame by date (sort_values returns a new DataFrame)
    df = df.sort_values(by='Date')

    # Rolling window of the last 7 rows, including the current row;
    # min_periods=1 lets the first six rows average over the rows available so far
    df['MovingAverage'] = df['Sales'].rolling(window=7, min_periods=1).mean()

    return df

# Example usage
# Assuming 'df' is your DataFrame with columns 'Date' and 'Sales'

# Create a sample DataFrame
data = {'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05'],
        'Sales': [100, 200, 300, 400, 500]}
df = pd.DataFrame(data)

# Call the function to add the 'MovingAverage' column
df_with_ma = add_moving_average_column(df)

print(df_with_ma)
```
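A window of 7 rows equals "the past 7 days" only when there is exactly one row per consecutive day. If dates can be missing, a time-based window is closer to the question's wording. A sketch assuming the dates are sorted: with an offset string such as `'7D'`, pandas includes rows whose date lies in (current date minus 7 days, current date], i.e. the current day and the six before it. The gap in the sample dates is made up to show the difference:

```python
import pandas as pd

df = pd.DataFrame({'Date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-10']),
                   'Sales': [100, 200, 300]})

# Row-based: the last 7 rows, whatever dates they fall on
df['MA_rows'] = df['Sales'].rolling(window=7, min_periods=1).mean()

# Time-based: needs a sorted DatetimeIndex, so roll over a Date-indexed Series
ma_7d = df.set_index('Date')['Sales'].rolling('7D').mean()
df['MA_7D'] = ma_7d.to_numpy()
print(df)
```

On 2023-01-10 the row-based average still includes January 1 and 2, while the time-based one uses only that day's sales.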

Q11. You have a Pandas DataFrame df with a column 'Date'. Write a Python function that creates a new column 'Weekday' in the DataFrame. The 'Weekday' column should contain the weekday name (e.g. Monday, Tuesday) corresponding to each date in the 'Date' column.

For example, if df contains the following values:

```
         Date
0  2023-01-01
1  2023-01-02
2  2023-01-03
3  2023-01-04
4  2023-01-05
```

Your function should create the following DataFrame:

```
         Date    Weekday
0  2023-01-01     Sunday
1  2023-01-02     Monday
2  2023-01-03    Tuesday
3  2023-01-04  Wednesday
4  2023-01-05   Thursday
```

The function should return the modified DataFrame.

```python
import pandas as pd

def add_weekday_column(df):
    # Convert 'Date' column to datetime data type
    df['Date'] = pd.to_datetime(df['Date'])

    # Extract weekday name and create new column 'Weekday'
    df['Weekday'] = df['Date'].dt.strftime('%A')

    return df

# Example usage
# Assuming 'df' is your DataFrame with a column 'Date'

# Create a sample DataFrame
data = {'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05']}
df = pd.DataFrame(data)

# Call the function to add the 'Weekday' column
df_with_weekday = add_weekday_column(df)

print(df_with_weekday)
```
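`strftime('%A')` formats the name according to the system locale. An alternative is the `.dt.day_name()` accessor, which returns English names unless a `locale` argument is passed. A minimal sketch on the first three sample dates:

```python
import pandas as pd

df = pd.DataFrame({'Date': ['2023-01-01', '2023-01-02', '2023-01-03']})
df['Date'] = pd.to_datetime(df['Date'])

# day_name() returns English weekday names by default, independent of locale
df['Weekday'] = df['Date'].dt.day_name()
print(df)
```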

Q12. Given a Pandas DataFrame df with a column 'Date' that contains timestamps, write a Python function to select all rows where the date is between '2023-01-01' and '2023-01-31'.

```python
import pandas as pd

def select_rows_by_date(df, start_date, end_date):
    # Convert 'Date' column to datetime data type
    df['Date'] = pd.to_datetime(df['Date'])

    # Filter rows based on the date range (both bounds inclusive)
    mask = (df['Date'] >= start_date) & (df['Date'] <= end_date)
    selected_rows = df.loc[mask]

    return selected_rows

# Example usage
# Assuming 'df' is your DataFrame with a column 'Date'

# Create a sample DataFrame
data = {'Date': ['2023-01-01', '2023-01-15', '2023-01-20', '2023-02-01'],
        'Value': [10, 20, 30, 40]}
df = pd.DataFrame(data)

# Call the function to select rows between '2023-01-01' and '2023-01-31'
start_date = '2023-01-01'
end_date = '2023-01-31'
selected_rows = select_rows_by_date(df, start_date, end_date)

print(selected_rows)
```

Output:

```
        Date  Value
0 2023-01-01     10
1 2023-01-15     20
2 2023-01-20     30
```
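Because the question mentions timestamps, note that the string '2023-01-31' is interpreted as midnight at the start of January 31, so `<= '2023-01-31'` drops any row later on that day. A sketch comparing the inclusive bound with a half-open range that ends at the start of the next day; the sample timestamps are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Date': pd.to_datetime(['2023-01-01 00:00', '2023-01-31 15:30',
                                           '2023-02-01 00:00']),
                   'Value': [10, 20, 30]})

# '2023-01-31' means midnight, so <= excludes anything later on 31 January
inclusive_end = df[(df['Date'] >= '2023-01-01') & (df['Date'] <= '2023-01-31')]

# A half-open range [start, next day) keeps the whole final day
half_open = df[(df['Date'] >= '2023-01-01') & (df['Date'] < '2023-02-01')]

print(len(inclusive_end), len(half_open))  # 1 2
```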


Q13. To use the basic functions of pandas, what is the first and foremost necessary library that needs to be imported?

The pandas library itself must be imported first, conventionally under the alias `pd`:

```python
import pandas as pd
```