### Q1. List any five functions of the pandas library with execution.

### Ans:
Here are five commonly used pandas functions, each with an execution example:

1. read_csv(): Used to read data from a CSV file into a DataFrame.

In [12]:
import pandas as pd
data = pd.read_csv('https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv')

2. head(): Returns the first n rows of a DataFrame (5 by default), which is handy for a quick look at the data.

In [13]:
data.head()

| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | NaN | S |


3. info(): Prints a concise summary of the DataFrame: the index range, each column's data type and non-null count, and the memory usage.

In [14]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PassengerId  891 non-null    int64  
 1   Survived     891 non-null    int64  
 2   Pclass       891 non-null    int64  
 3   Name         891 non-null    object 
 4   Sex          891 non-null    object 
 5   Age          714 non-null    float64
 6   SibSp        891 non-null    int64  
 7   Parch        891 non-null    int64  
 8   Ticket       891 non-null    object 
 9   Fare         891 non-null    float64
 10  Cabin        204 non-null    object 
 11  Embarked     889 non-null    object 
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB


4. describe(): Generates summary statistics of the numeric columns in the DataFrame.

In [15]:
data.describe()

| | PassengerId | Survived | Pclass | Age | SibSp | Parch | Fare |
|---|---|---|---|---|---|---|---|
| count | 891.0 | 891.0 | 891.0 | 714.0 | 891.0 | 891.0 | 891.0 |
| mean | 446.0 | 0.383838 | 2.308642 | 29.699118 | 0.523008 | 0.381594 | 32.204208 |
| std | 257.353842 | 0.486592 | 0.836071 | 14.526497 | 1.102743 | 0.806057 | 49.693429 |
| min | 1.0 | 0.0 | 1.0 | 0.42 | 0.0 | 0.0 | 0.0 |
| 25% | 223.5 | 0.0 | 2.0 | 20.125 | 0.0 | 0.0 | 7.9104 |
| 50% | 446.0 | 0.0 | 3.0 | 28.0 | 0.0 | 0.0 | 14.4542 |
| 75% | 668.5 | 1.0 | 3.0 | 38.0 | 1.0 | 0.0 | 31.0 |
| max | 891.0 | 1.0 | 3.0 | 80.0 | 8.0 | 6.0 | 512.3292 |


5. groupby(): Allows you to group data based on one or more columns and perform operations like aggregation.

In [17]:
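g = data.groupby('Pclass')
# numeric_only=True averages only the numeric columns; without it, pandas 2.x
# raises a TypeError on text columns such as Name and Sex
g.mean(numeric_only=True)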



| Pclass | PassengerId | Survived | Age | SibSp | Parch | Fare |
|---|---|---|---|---|---|---|
| 1 | 461.597222 | 0.62963 | 38.233441 | 0.416667 | 0.356481 | 84.154687 |
| 2 | 445.956522 | 0.472826 | 29.87763 | 0.402174 | 0.380435 | 20.662183 |
| 3 | 439.154786 | 0.242363 | 25.14062 | 0.615071 | 0.393075 | 13.67555 |
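groupby() is not limited to a single aggregate per column. A minimal sketch of named aggregation, using a small hypothetical DataFrame in place of the Titanic data so it runs without a download; the output column names (survival_rate, avg_fare, passengers) are illustrative choices:

```python
import pandas as pd

# Small hypothetical dataset standing in for the Titanic columns used above
df = pd.DataFrame({
    "Pclass": [1, 1, 2, 3, 3, 3],
    "Survived": [1, 0, 1, 0, 0, 1],
    "Fare": [80.0, 60.0, 20.0, 8.0, 7.0, 9.0],
})

# Named aggregation: each keyword becomes an output column built from
# an (input column, aggregation function) pair
summary = df.groupby("Pclass").agg(
    survival_rate=("Survived", "mean"),
    avg_fare=("Fare", "mean"),
    passengers=("Survived", "count"),
)
print(summary)
```

Each output column is computed per Pclass group, so one call replaces several separate groupby(...).mean() or .count() calls.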


### Q2. Given a Pandas DataFrame df with columns 'A', 'B', and 'C', write a Python function to re-index the DataFrame with a new index that starts from 1 and increments by 2 for each row.

### Ans: 
Re-indexing here means replacing the row labels with 1, 3, 5, and so on. A range object with a step of 2 generates exactly these labels, and assigning it to the index of a copy leaves the original DataFrame unchanged. Here's a Python function to achieve this:

In [18]:
import pandas as pd

def reindex_with_increment(df):
    # New row labels 1, 3, 5, ...: one odd number per row
    new_index = range(1, 2 * len(df) + 1, 2)

    # Work on a copy so the caller's DataFrame keeps its original index
    df_new = df.copy()
    df_new.index = new_index
    return df_new

data = {'A': [10, 20, 30],
        'B': [40, 50, 60],
        'C': [70, 80, 90]}
df = pd.DataFrame(data)

result = reindex_with_increment(df)
print(result)


    A   B   C
1  10  40  70
3  20  50  80
5  30  60  90
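Despite its name, DataFrame.reindex() is not the right tool for this task: it looks rows up by label, so labels that do not already exist come back as NaN. A minimal sketch contrasting it with set_axis(), which relabels the existing rows by position:

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60], 'C': [70, 80, 90]})
new_labels = range(1, 2 * len(df) + 1, 2)  # 1, 3, 5

# reindex() conforms the DataFrame to the new labels: only label 1 exists in
# the original 0..2 index, so rows 3 and 5 are filled with NaN
looked_up = df.reindex(new_labels)

# set_axis() keeps every row and simply replaces the labels in order
relabelled = df.set_axis(new_labels, axis=0)

print(looked_up)
print(relabelled)
```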


### Q3. You have a Pandas DataFrame df with a column named 'Values'. Write a Python function that iterates over the DataFrame and calculates the sum of the first three values in the 'Values' column. The function should print the sum to the console. 
### For example, if the 'Values' column of df contains the values [10, 20, 30, 40, 50], your function should calculate and print the sum of the first three values, which is 60.

### Ans:

In [20]:
import pandas as pd

def calculate_sum_of_first_three(df):
    if 'Values' not in df.columns:
        print("The 'Values' column does not exist in the DataFrame.")
        return

    first_three_values = df['Values'].head(3)
    sum_of_first_three = first_three_values.sum()

    print("Sum of the first three values:", sum_of_first_three)


data = {'Values': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

calculate_sum_of_first_three(df)


Sum of the first three values: 60
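The question asks for iteration, while the answer above uses the vectorised head(3).sum(). If an explicit loop is required, a minimal sketch follows; the helper name sum_first_n_by_iteration and its column/n parameters are hypothetical:

```python
import pandas as pd

def sum_first_n_by_iteration(df, column='Values', n=3):
    # Hypothetical helper: walks the rows one at a time instead of head(n).sum()
    total = 0
    for position, row in enumerate(df.itertuples(index=False)):
        if position == n:  # stop once the first n rows have been added
            break
        total += getattr(row, column)
    print(f"Sum of the first {n} values:", total)
    return total

df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})
sum_first_n_by_iteration(df)
```

For large DataFrames the vectorised version is preferable; the loop is shown only because the task asks for iteration. If the DataFrame has fewer than n rows, the loop simply sums what is there.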


### Q4. Given a Pandas DataFrame df with a column 'Text', write a Python function to create a new column 'Word_Count' that contains the number of words in each row of the 'Text' column.

### Ans:

In [21]:
import pandas as pd

def add_word_count_column(df):
    # Check if the 'Text' column exists in the DataFrame
    if 'Text' not in df.columns:
        print("The 'Text' column does not exist in the DataFrame.")
        return

    # Use the str.split() method to split each text row into words and count them
    df['Word_Count'] = df['Text'].str.split().apply(len)

# Example usage:
# Create a sample DataFrame 'df' with a 'Text' column
data = {'Text': ["This is a sample sentence.",
                 "Count the words in this text.",
                 "Pandas is great for data analysis."]}
df = pd.DataFrame(data)

# Call the function to add the 'Word_Count' column
add_word_count_column(df)

# Print the updated DataFrame with the 'Word_Count' column
print(df)


                                 Text  Word_Count
0          This is a sample sentence.           5
1       Count the words in this text.           6
2  Pandas is great for data analysis.           6
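The answer above calls len() on every split result, so a missing entry in 'Text' (NaN after str.split()) would raise a TypeError. A minimal sketch of a hypothetical NaN-tolerant variant that counts missing or empty text as 0 words:

```python
import pandas as pd

def add_word_count_column_safe(df):
    # Hypothetical variant of add_word_count_column:
    # str.split() leaves missing entries as NaN and str.len() keeps them NaN,
    # so fillna(0) turns them into a count of 0 before casting to int
    counts = df['Text'].str.split().str.len()
    df['Word_Count'] = counts.fillna(0).astype(int)
    return df

df = pd.DataFrame({'Text': ["This is a sample sentence.", None, ""]})
print(add_word_count_column_safe(df))
```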


### Q5. How are DataFrame.size() and DataFrame.shape() different?

### Ans:
In pandas, DataFrame.size and DataFrame.shape are attributes rather than methods, so they are accessed without parentheses (df.size, df.shape); calling df.size() or df.shape() raises a TypeError.

DataFrame.size returns the total number of elements in the DataFrame, which equals the number of rows multiplied by the number of columns.

DataFrame.shape returns a tuple (number_of_rows, number_of_columns) describing the dimensions of the DataFrame.
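A small example makes the difference concrete. The DataFrame below is hypothetical; note that size counts every cell, including missing ones, whereas count() skips them:

```python
import numpy as np
import pandas as pd

# Hypothetical 3-row, 2-column DataFrame with one missing value
df = pd.DataFrame({'A': [1.0, 2.0, np.nan], 'B': ['x', 'y', 'z']})

print(df.shape)          # (3, 2): (rows, columns)
print(df.size)           # 6: rows * columns, the NaN cell included
print(df.count().sum())  # 5: non-missing cells only, for contrast
```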

### Q6. Which function of pandas do we use to read an excel file?

### Ans:



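Use pd.read_excel() to read an Excel file into a DataFrame:

df = pd.read_excel('your_excel_file.xlsx')

Reading .xlsx files requires an Excel engine such as openpyxl to be installed. By default the first sheet is loaded; the sheet_name parameter selects a different one.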


In [24]:
df = pd.read_excel('players_data.xlsx')
df

### Q7. You have a Pandas DataFrame df that contains a column named 'Email' that contains email addresses in the format 'username@domain.com'. Write a Python function that creates a new column 'Username' in df that contains only the username part of each email address.
### The username is the part of the email address that appears before the '@' symbol. For example, if the email address is 'john.doe@example.com', the 'Username' column should contain 'john.doe'. Your function should extract the username from each email address and store it in the new 'Username' column.

### Ans:

In [25]:
import pandas as pd

def extract_username(df):
    # Check if the 'Email' column exists in the DataFrame
    if 'Email' not in df.columns:
        print("The 'Email' column does not exist in the DataFrame.")
        return

    # Use the str.split() method to split each email address at '@' and extract the username part
    df['Username'] = df['Email'].str.split('@').str[0]

# Example usage:
# Create a sample DataFrame 'df' with an 'Email' column
data = {'Email': ["john.doe@example.com",
                 "jane.smith@example.com",
                 "bob.johnson@example.com"]}
df = pd.DataFrame(data)

# Call the function to extract and add the 'Username' column
extract_username(df)

# Print the updated DataFrame with the 'Username' column
print(df)


                     Email     Username
0     john.doe@example.com     john.doe
1   jane.smith@example.com   jane.smith
2  bob.johnson@example.com  bob.johnson


### Q8. You have a Pandas DataFrame df with columns 'A', 'B', and 'C'. Write a Python function that selects all rows where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10. The function should return a new DataFrame that contains only the selected rows.
### For example, if df contains the following values:

|   | A | B | C |
|---|---|---|---|
| 0 | 3 | 5 | 1 |
| 1 | 8 | 2 | 7 |
| 2 | 6 | 9 | 4 |
| 3 | 2 | 3 | 5 |
| 4 | 9 | 1 | 2 |

### Your function should select the following rows:

|   | A | B | C |
|---|---|---|---|
| 1 | 8 | 2 | 7 |
| 4 | 9 | 1 | 2 |

### The function should return a new DataFrame that contains only the selected rows.

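### Ans:

With the conditions as stated, row 2 (A = 6, B = 9) is also selected, since 6 > 5 and 9 < 10. The example in the question omits it, so the correct result contains rows 1, 2 and 4, as the output below shows.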

In [26]:
import pandas as pd

def select_rows(df):
    # Use boolean indexing to select rows that meet the specified conditions
    selected_rows = df[(df['A'] > 5) & (df['B'] < 10)]
    
    return selected_rows

# Example usage:
# Create a sample DataFrame 'df' with columns 'A', 'B', and 'C'
data = {'A': [3, 8, 6, 2, 9],
        'B': [5, 2, 9, 3, 1],
        'C': [1, 7, 4, 5, 2]}
df = pd.DataFrame(data)

# Call the function to select rows based on the conditions
selected_df = select_rows(df)

# Print the selected DataFrame
print(selected_df)


   A  B  C
1  8  2  7
2  6  9  4
4  9  1  2
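The same filter can be written as an expression string with DataFrame.query(), which some readers find easier to scan than chained boolean masks. A minimal sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 8, 6, 2, 9],
                   'B': [5, 2, 9, 3, 1],
                   'C': [1, 7, 4, 5, 2]})

# query() evaluates the condition string against the column names;
# 'and' plays the role of & in the boolean-mask version
selected_df = df.query('A > 5 and B < 10')
print(selected_df)
```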


### Q9. Given a Pandas DataFrame df with a column 'Values', write a Python function to calculate the mean, median, and standard deviation of the values in the 'Values' column.

### Ans:

In [27]:
import pandas as pd

def calculate_stats(df):
    # Check if the 'Values' column exists in the DataFrame
    if 'Values' not in df.columns:
        print("The 'Values' column does not exist in the DataFrame.")
        return

    # Calculate mean, median, and standard deviation
    mean_value = df['Values'].mean()
    median_value = df['Values'].median()
    std_deviation = df['Values'].std()

    return mean_value, median_value, std_deviation

# Example usage:
# Create a sample DataFrame 'df' with a 'Values' column
data = {'Values': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

# Call the function to calculate statistics
mean, median, std_dev = calculate_stats(df)

# Print the calculated statistics
print("Mean:", mean)
print("Median:", median)
print("Standard Deviation:", std_dev)


Mean: 30.0
Median: 30.0
Standard Deviation: 15.811388300841896
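Two details are worth knowing here. The three statistics can be computed in one call with Series.agg(), and std() returns the sample standard deviation (ddof=1, dividing by n - 1), which is why the result above is about 15.81 rather than the population value. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})

# One call returning a Series indexed by 'mean', 'median' and 'std'
stats = df['Values'].agg(['mean', 'median', 'std'])
print(stats)

# ddof=0 gives the population standard deviation (divide by n instead of n - 1)
population_std = df['Values'].std(ddof=0)
print(population_std)
```

For these values the squared deviations sum to 1000, so the sample variance is 1000 / 4 = 250 and the population variance is 1000 / 5 = 200.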


### Q10. Given a Pandas DataFrame df with a column 'Sales' and a column 'Date', write a Python function to create a new column 'MovingAverage' that contains the moving average of the sales for the past 7 days for each row in the DataFrame. The moving average should be calculated using a window of size 7 and should include the current day.

### Ans:

In [28]:
import pandas as pd

def calculate_moving_average(df):
    # Check if the 'Sales' and 'Date' columns exist in the DataFrame
    if 'Sales' not in df.columns or 'Date' not in df.columns:
        print("The required columns ('Sales' and 'Date') do not exist in the DataFrame.")
        return

    # Sort the DataFrame by 'Date' in ascending order (if it's not already sorted)
    df = df.sort_values(by='Date')

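    # Average the current row and the 6 rows before it (min_periods=1 lets the
    # first rows average whatever is available); this equals a 7-day window
    # only when there is exactly one row per day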
    df['MovingAverage'] = df['Sales'].rolling(window=7, min_periods=1).mean()

    return df

# Example usage:
# Create a sample DataFrame 'df' with 'Sales' and 'Date' columns
data = {'Sales': [100, 150, 200, 180, 250, 220, 300, 280, 350, 320],
        'Date': pd.date_range(start='2023-08-01', periods=10)}
df = pd.DataFrame(data)

# Call the function to calculate the moving average and add the 'MovingAverage' column
df = calculate_moving_average(df)

# Print the updated DataFrame
print(df)


   Sales       Date  MovingAverage
0    100 2023-08-01     100.000000
1    150 2023-08-02     125.000000
2    200 2023-08-03     150.000000
3    180 2023-08-04     157.500000
4    250 2023-08-05     176.000000
5    220 2023-08-06     183.333333
6    300 2023-08-07     200.000000
7    280 2023-08-08     225.714286
8    350 2023-08-09     254.285714
9    320 2023-08-10     271.428571
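The row-based window above matches "the past 7 days" only when every day has exactly one row. If dates can be missing, a time-based window is safer. A minimal sketch, assuming 'Date' already has a datetime dtype; the function name is hypothetical:

```python
import pandas as pd

def calculate_time_based_moving_average(df):
    # Hypothetical variant: the window covers the current day and the
    # 6 calendar days before it, so a gap in the dates shrinks the window
    # instead of reaching further back in time
    df = df.sort_values(by='Date')
    rolled = df.set_index('Date')['Sales'].rolling('7D').mean()
    # rolled follows the same sorted row order as df, so assign positionally
    df['MovingAverage'] = rolled.to_numpy()
    return df

data = {'Sales': [100, 200, 300],
        'Date': pd.to_datetime(['2023-08-01', '2023-08-02', '2023-08-10'])}
print(calculate_time_based_moving_average(pd.DataFrame(data)))
```

Here the 2023-08-10 row averages only itself, because no sales fall in the preceding six days; a fixed 7-row window would have averaged all three rows.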


### Q11. You have a Pandas DataFrame df with a column 'Date'. Write a Python function that creates a new column 'Weekday' in the DataFrame. The 'Weekday' column should contain the weekday name (e.g. Monday, Tuesday) corresponding to each date in the 'Date' column.
### For example, if df contains the following values:

|   | Date |
|---|---|
| 0 | 2023-01-01 |
| 1 | 2023-01-02 |
| 2 | 2023-01-03 |
| 3 | 2023-01-04 |
| 4 | 2023-01-05 |

### Your function should create the following DataFrame:

|   | Date | Weekday |
|---|---|---|
| 0 | 2023-01-01 | Sunday |
| 1 | 2023-01-02 | Monday |
| 2 | 2023-01-03 | Tuesday |
| 3 | 2023-01-04 | Wednesday |
| 4 | 2023-01-05 | Thursday |

### The function should return the modified DataFrame.

### Ans:

In [29]:
import pandas as pd

def add_weekday_column(df):
    # Check if the 'Date' column exists in the DataFrame
    if 'Date' not in df.columns:
        print("The 'Date' column does not exist in the DataFrame.")
        return

    # Convert the 'Date' column to a datetime data type if it's not already
    df['Date'] = pd.to_datetime(df['Date'])

    # Extract the weekday names and add them to a new 'Weekday' column
    df['Weekday'] = df['Date'].dt.strftime('%A')

    return df

# Example usage:
# Create a sample DataFrame 'df' with a 'Date' column
data = {'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05']}
df = pd.DataFrame(data)

# Call the function to add the 'Weekday' column
df = add_weekday_column(df)

# Print the modified DataFrame
print(df)


        Date    Weekday
0 2023-01-01     Sunday
1 2023-01-02     Monday
2 2023-01-03    Tuesday
3 2023-01-04  Wednesday
4 2023-01-05   Thursday
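pandas also provides a dedicated accessor for this: Series.dt.day_name() returns the weekday name directly, without a strftime format string. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({'Date': ['2023-01-01', '2023-01-02', '2023-01-03']})
df['Date'] = pd.to_datetime(df['Date'])

# dt.day_name() returns English weekday names by default
df['Weekday'] = df['Date'].dt.day_name()
print(df)
```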


### Q12. Given a Pandas DataFrame df with a column 'Date' that contains timestamps, write a Python function to select all rows where the date is between '2023-01-01' and '2023-01-31'.

### Ans:

In [30]:
import pandas as pd

def select_rows_between_dates(df):
    # Check if the 'Date' column exists in the DataFrame
    if 'Date' not in df.columns:
        print("The 'Date' column does not exist in the DataFrame.")
        return

    # Convert the 'Date' column to a datetime data type if it's not already
    df['Date'] = pd.to_datetime(df['Date'])

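    # Define the date range. '2023-01-31' would parse to midnight, so an
    # inclusive <= comparison against it would drop any later time on
    # 31 January; use an exclusive upper bound of 1 February instead
    start_date = pd.to_datetime('2023-01-01')
    end_date = pd.to_datetime('2023-02-01')

    # Use boolean indexing to select rows in [start_date, end_date)
    selected_rows = df[(df['Date'] >= start_date) & (df['Date'] < end_date)]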

    return selected_rows

# Example usage:
# Create a sample DataFrame 'df' with a 'Date' column containing timestamps
data = {'Date': ['2023-01-05 10:00:00', '2023-01-15 14:30:00', '2023-02-02 08:45:00']}
df = pd.DataFrame(data)

# Call the function to select rows within the date range
selected_df = select_rows_between_dates(df)

# Print the selected DataFrame
print(selected_df)

                 Date
0 2023-01-05 10:00:00
1 2023-01-15 14:30:00
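Because a date string such as '2023-01-31' parses to midnight, comparing full timestamps against it excludes later times on that day. One way to keep both endpoints inclusive at day granularity is to compare calendar dates only. A minimal sketch; the helper name select_january_2023 is hypothetical:

```python
import pandas as pd

def select_january_2023(df):
    # Hypothetical helper: normalize() sets every timestamp to midnight, so
    # any time on 31 January passes the inclusive upper bound of between()
    dates = pd.to_datetime(df['Date']).dt.normalize()
    mask = dates.between(pd.Timestamp('2023-01-01'), pd.Timestamp('2023-01-31'))
    return df[mask]

df = pd.DataFrame({'Date': ['2022-12-31 23:59:59', '2023-01-01 00:00:00',
                            '2023-01-31 18:00:00', '2023-02-01 00:00:00']})
print(select_january_2023(df))
```

This version also leaves the caller's 'Date' column untouched, since the converted dates are kept in a local variable.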


### Q13. To use the basic functions of pandas, what is the first and foremost necessary library that needs to be imported?

### Ans:
To use the basic functions of pandas, the first library to import is pandas itself. It is conventionally imported with the alias pd, which is standard practice in the Python data science community.

In [31]:
import pandas as pd
