Q1. List any five functions of the pandas library with execution.

#Answer

Here are five functions from the pandas library, each with an example and its output:

1. `read_csv()`: This function is used to read data from a CSV file and create a pandas DataFrame.

In [8]:
import pandas as pd

# Read a CSV file and create a DataFrame
data = pd.read_csv('data.csv')
print(data.head())


     name           mail   phone_num
0  sanjay   as@gmail.com   455455554
1   sannn  sds@gmail.com  7878878878
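The example above depends on a local data.csv that is not included here. The sketch below passes read_csv() an in-memory buffer (io.StringIO) holding the same two rows shown in the output instead. It also adds a dtype argument, an illustrative choice, that keeps phone_num as text so identifiers are not parsed as numbers:

```python
import io

import pandas as pd

# In-memory stand-in for data.csv (same columns and rows as the output above)
csv_text = """name,mail,phone_num
sanjay,as@gmail.com,455455554
sannn,sds@gmail.com,7878878878
"""

# read_csv accepts any file-like object; dtype keeps phone numbers as strings
data = pd.read_csv(io.StringIO(csv_text), dtype={'phone_num': str})
print(data.head())
print(data.dtypes)
```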


2. `head()`: This function returns the first n rows of a DataFrame (5 by default), which is useful for a quick look at the data.


In [9]:
import pandas as pd

# Create a DataFrame
data = {'Name': ['John', 'Alice', 'Bob', 'Emma', 'Alex'],
        'Age': [25, 30, 35, 40, 45],
        'City': ['New York', 'London', 'Paris', 'Sydney', 'Tokyo']}
df = pd.DataFrame(data)

# Display the first 3 rows of the DataFrame
print(df.head(3))


    Name  Age      City
0   John   25  New York
1  Alice   30    London
2    Bob   35     Paris


3. `info()`: This function prints a concise summary of the DataFrame: the index range, the column names, the non-null count and data type of each column, and the memory usage.


In [10]:
import pandas as pd

# Create a DataFrame
data = {'Name': ['John', 'Alice', 'Bob', 'Emma', 'Alex'],
        'Age': [25, 30, 35, 40, 45],
        'City': ['New York', 'London', 'Paris', 'Sydney', 'Tokyo']}
df = pd.DataFrame(data)

# Display information about the DataFrame
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    5 non-null      object
 1   Age     5 non-null      int64 
 2   City    5 non-null      object
dtypes: int64(1), object(2)
memory usage: 248.0+ bytes


4. `groupby()`: This function splits the rows into groups based on the values in one or more columns, so that an aggregation such as mean(), sum(), or count() can be computed for each group.


In [11]:
import pandas as pd

# Create a DataFrame
data = {'Name': ['John', 'Alice', 'Bob', 'Emma', 'Alex'],
        'Age': [25, 30, 35, 40, 45],
        'City': ['New York', 'London', 'Paris', 'Sydney', 'Tokyo']}
df = pd.DataFrame(data)

# Group the data by 'City' column and calculate the mean age
grouped_data = df.groupby('City')['Age'].mean()
print(grouped_data)

City
London      30.0
New York    25.0
Paris       35.0
Sydney      40.0
Tokyo       45.0
Name: Age, dtype: float64
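In the example above every city appears only once, so each group holds a single row and its "mean" is just that row's age. A sketch with repeated keys shows grouping actually combining rows, and agg() computing several statistics per group. The 'Dept' and 'Salary' columns are made-up illustration data:

```python
import pandas as pd

# Hypothetical data: several employees share a department
df = pd.DataFrame({
    'Dept': ['Sales', 'IT', 'Sales', 'IT', 'HR'],
    'Salary': [50, 70, 60, 90, 40],
})

# Split by 'Dept', then aggregate each group's salaries three ways
summary = df.groupby('Dept')['Salary'].agg(['mean', 'sum', 'count'])
print(summary)
```

The group keys come back sorted (HR, IT, Sales) because groupby sorts them by default.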


5. `fillna()`: This function replaces missing (NaN or None) values in a DataFrame or Series with a specified value, such as a constant or a column's mean.


In [13]:
import pandas as pd
import numpy as np

# Create a DataFrame with missing values
data = {'Name': ['John', 'Alice', 'Bob', np.nan, 'Alex'],
        'Age': [25, 30, np.nan, 40, 45],
        'City': ['New York', 'London', 'Paris', 'Sydney', 'Tokyo']}
df = pd.DataFrame(data)

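# Fill missing values in the 'Age' column with the mean age; assigning the result back
# avoids a chained inplace=True call, which may not update df under newer pandas (Copy-on-Write)
df['Age'] = df['Age'].fillna(df['Age'].mean())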
print(df)

    Name   Age      City
0   John  25.0  New York
1  Alice  30.0    London
2    Bob  35.0     Paris
3    NaN  40.0    Sydney
4   Alex  45.0     Tokyo
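fillna() also accepts a dictionary that maps column names to fill values, so each column can get its own replacement in one call. Filling from the previous row is done with the separate ffill() method. A short sketch on the same data ('Unknown' as the placeholder name is an arbitrary choice):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob', np.nan, 'Alex'],
                   'Age': [25, 30, np.nan, 40, 45],
                   'City': ['New York', 'London', 'Paris', 'Sydney', 'Tokyo']})

# One fill value per column; fillna returns a new DataFrame and leaves df unchanged
filled = df.fillna({'Name': 'Unknown', 'Age': df['Age'].mean()})
print(filled)

# Forward fill: each missing value takes the last valid value above it
forward = df.ffill()
print(forward)
```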


                      -------------------------------------------------------------------

Q2. Given a Pandas DataFrame df with columns 'A', 'B', and 'C', write a Python function to re-index the
DataFrame with a new index that starts from 1 and increments by 2 for each row.

#Answer

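We can build the new labels with range(1, len(df) * 2, 2), which yields 1, 3, 5, ... (one label per row), and assign them to the DataFrame's index. Calling reset_index(drop=True) first discards the old index and returns a copy, so the DataFrame passed in is left unchanged. Here's a Python function that accomplishes this: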

In [14]:
import pandas as pd

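def reindex_dataframe(df):
    # Labels 1, 3, 5, ...: one label per row, stepping by 2
    new_index = pd.Index(range(1, len(df) * 2, 2))
    # Drop the old index; reset_index returns a copy, so the caller's df is not modified
    df = df.reset_index(drop=True)
    df.index = new_index
    return df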

# Example usage
data = {'A': [10, 20, 30, 40],
        'B': [50, 60, 70, 80],
        'C': [90, 100, 110, 120]}
df = pd.DataFrame(data)

# Re-index the DataFrame
df_reindexed = reindex_dataframe(df)
print(df_reindexed)


    A   B    C
1  10  50   90
3  20  60  100
5  30  70  110
7  40  80  120
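One alternative sketch builds the labels directly as a pd.RangeIndex (which stores only start, stop, and step rather than every label) and attaches them with set_axis(), which returns a new DataFrame. DataFrame.reindex() would not do this job: it looks the new labels up in the existing index and fills rows with NaN where a label is missing. The function name is illustrative:

```python
import pandas as pd

def reindex_by_two(df: pd.DataFrame) -> pd.DataFrame:
    # Labels 1, 3, 5, ...; stop is exclusive, so this yields exactly one label per row
    new_index = pd.RangeIndex(start=1, stop=1 + 2 * len(df), step=2)
    # set_axis returns a new DataFrame with these row labels; df itself is unchanged
    return df.set_axis(new_index, axis=0)

df = pd.DataFrame({'A': [10, 20, 30, 40],
                   'B': [50, 60, 70, 80],
                   'C': [90, 100, 110, 120]})
print(reindex_by_two(df))
```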


                      -------------------------------------------------------------------

Q3. You have a Pandas DataFrame df with a column named 'Values'. Write a Python function that
iterates over the DataFrame and calculates the sum of the first three values in the 'Values' column. The
function should print the sum to the console.
For example, if the 'Values' column of df contains the values [10, 20, 30, 40, 50], your function should
calculate and print the sum of the first three values, which is 60.

#Answer

We can use the iterrows() function in pandas to iterate over the DataFrame row by row and add up the 'Values' entries of the first three rows. Here's a Python function that achieves this:

In [15]:
import pandas as pd

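def calculate_sum(df):
    total_sum = 0
    # enumerate gives the row position, so this works even if the index is not 0, 1, 2, ...
    for position, (_, row) in enumerate(df.iterrows()):
        if position >= 3:
            break  # stop once the first three rows have been added
        total_sum += row['Values']
    print("Sum of the first three values:", total_sum)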

# Example usage
data = {'Values': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

# Calculate the sum
calculate_sum(df)


Sum of the first three values: 60
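Looping with iterrows() matches the question's wording, but the same sum can be taken without an explicit loop by slicing the first three positions with iloc. Because iloc selects by position, the result does not depend on the index labels. A sketch, where the shuffled index is chosen only to illustrate that point:

```python
import pandas as pd

def sum_first_three(df: pd.DataFrame):
    # iloc selects by position, so the index labels do not matter
    return df['Values'].iloc[:3].sum()

# Non-default index labels on purpose
df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]}, index=[7, 3, 9, 1, 5])
print("Sum of the first three values:", sum_first_three(df))
```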


                      -------------------------------------------------------------------

Q4. Given a Pandas DataFrame df with a column 'Text', write a Python function to create a new column
'Word_Count' that contains the number of words in each row of the 'Text' column.

#Answer

We can use the apply() function in pandas along with a lambda function to create a new column 'Word_Count' that contains the number of words in each row of the 'Text' column. Here's a Python function that accomplishes this:

In [16]:
import pandas as pd

def add_word_count(df):
    df['Word_Count'] = df['Text'].apply(lambda x: len(str(x).split()))
    return df

# Example usage
data = {'Text': ['This is a sample sentence.',
                 'Another sentence with more words.',
                 'A short phrase.']}
df = pd.DataFrame(data)

# Add the 'Word_Count' column
df = add_word_count(df)
print(df)


                                Text  Word_Count
0         This is a sample sentence.           5
1  Another sentence with more words.           5
2                    A short phrase.           3
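A vectorized sketch of the same count chains the .str accessor: str.split() turns each text into a list of words and str.len() counts them. One behavioural difference is that in the apply() version str(x) turns a missing value into the string 'nan', which counts as one word, whereas here missing text yields NaN. Filling that with 0 (treating missing text as zero words) is an assumption of this sketch:

```python
import pandas as pd

def add_word_count_vectorized(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    # Split on whitespace and count the pieces; missing text gives NaN
    counts = df['Text'].str.split().str.len()
    # Assumption: missing text counts as zero words
    df['Word_Count'] = counts.fillna(0).astype(int)
    return df

df = pd.DataFrame({'Text': ['This is a sample sentence.',
                            'Another sentence with more words.',
                            'A short phrase.',
                            None]})
print(add_word_count_vectorized(df))
```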


                      -------------------------------------------------------------------

Q5. How are DataFrame.size() and DataFrame.shape() different?

#Answer

DataFrame.size and DataFrame.shape are both attributes (properties) of a pandas DataFrame rather than methods, so they are accessed without parentheses. They provide different information about the DataFrame:

DataFrame.size:

DataFrame.size returns the total number of elements in the DataFrame, which equals the number of rows multiplied by the number of columns.
It counts every cell, including both non-null and null values.
It takes no arguments and is accessed without parentheses (df.size).

DataFrame.shape:

DataFrame.shape returns a tuple (number of rows, number of columns) describing the dimensions of the DataFrame.
It also takes no arguments and is accessed without parentheses (df.shape); writing df.shape() raises a TypeError because a tuple is not callable.
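A quick check on a small DataFrame with missing values (made-up data) shows the difference between the two attributes, and contrasts size, which counts null cells, with count(), which does not:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan], 'B': ['x', None, 'z']})

print(df.shape)          # (3, 2): 3 rows, 2 columns
print(df.size)           # 6: rows * columns, missing values included
print(df.count().sum())  # 4: only the non-null values

# shape is a plain tuple, so calling it fails
try:
    df.shape()
except TypeError as exc:
    print("TypeError:", exc)
```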

                       -------------------------------------------------------------------

Q6. Which function of pandas do we use to read an excel file?

#Answer

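In pandas, Excel files are read with read_excel(), which loads a worksheet into a DataFrame. By default it reads the first sheet; the sheet_name parameter selects another one by name or position. Reading .xlsx files requires an engine package such as openpyxl to be installed.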
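A round-trip sketch: it writes a small DataFrame to an in-memory .xlsx workbook with to_excel() and reads it back with read_excel(), so no file on disk is needed. It assumes the openpyxl package is installed; the sheet name 'People' and the data are made up:

```python
import io

import pandas as pd

# Build a small workbook in memory (assumes openpyxl is installed for .xlsx support)
buffer = io.BytesIO()
pd.DataFrame({'Name': ['John', 'Alice'], 'Age': [25, 30]}).to_excel(
    buffer, sheet_name='People', index=False)
buffer.seek(0)  # rewind so read_excel starts at the beginning of the workbook

# sheet_name picks the worksheet; without it, the first sheet (0) is read
df = pd.read_excel(buffer, sheet_name='People')
print(df)
```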

                        -------------------------------------------------------------------

Q7. You have a Pandas DataFrame df that contains a column named 'Email' that contains email
addresses in the format 'username@domain.com'. Write a Python function that creates a new column
'Username' in df that contains only the username part of each email address.
The username is the part of the email address that appears before the '@' symbol. For example, if the
email address is 'john.doe@example.com', the 'Username' column should contain 'john.doe'. Your
function should extract the username from each email address and store it in the new 'Username'
column.

#Answer

We can use apply() with a lambda function that calls Python's string split('@') method on each email address and keeps the first piece, i.e. the text before the '@' symbol, storing the result in a new 'Username' column. Here's a Python function that achieves this:

In [18]:
import pandas as pd

def extract_username(df):
    df['Username'] = df['Email'].apply(lambda x: x.split('@')[0])
    return df

# Example usage
data = {'Email': ['john.doe@example.com', 'alice.smith@example.com', 'bob.johnson@example.com']}
df = pd.DataFrame(data)

# Extract the username
df = extract_username(df)
print(df)


                     Email     Username
0     john.doe@example.com     john.doe
1  alice.smith@example.com  alice.smith
2  bob.johnson@example.com  bob.johnson
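The same result can be sketched with the pandas .str accessor instead of apply(): Series.str.split('@', n=1) splits each address at its first '@', and .str[0] keeps the piece before it. Unlike x.split('@') inside a lambda, which raises an AttributeError on a missing (NaN or None) email, the accessor simply propagates the missing value. The function name is illustrative:

```python
import pandas as pd

def extract_username_vectorized(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    # Split once at the first '@' and keep the part before it; missing emails stay missing
    df['Username'] = df['Email'].str.split('@', n=1).str[0]
    return df

df = pd.DataFrame({'Email': ['john.doe@example.com', 'alice.smith@example.com', None]})
print(extract_username_vectorized(df))
```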


                        -------------------------------------------------------------------

Q8. You have a Pandas DataFrame df with columns 'A', 'B', and 'C'. Write a Python function that selects
all rows where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10. The
function should return a new DataFrame that contains only the selected rows.
For example, if df contains the following values:

   A  B  C
0  3  5  1
1  8  2  7
2  6  9  4
3  2  3  5
4  9  1  2

Your function should select the following rows:

   A  B  C
1  8  2  7
4  9  1  2

The function should return a new DataFrame that contains only the selected rows.

#Answer 

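We can use boolean (conditional) indexing in pandas to select rows that satisfy both conditions. Each comparison is wrapped in parentheses because & binds more tightly than comparison operators such as > and <. Note that row 2 (A = 6, B = 9) also satisfies both conditions, so it appears in the result even though the example in the question leaves it out. Here's a Python function that selects rows from the DataFrame df where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10: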

In [21]:
import pandas as pd

def select_rows(df):
    selected_rows = df[(df['A'] > 5) & (df['B'] < 10)]
    return selected_rows

# Example usage
data = {'A': [3, 8, 6, 2, 9],
        'B': [5, 2, 9, 3, 1],
        'C': [1, 7, 4, 5, 2]}
df = pd.DataFrame(data)

# Select rows based on conditions
selected_df = select_rows(df)
print(selected_df)


   A  B  C
1  8  2  7
2  6  9  4
4  9  1  2
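DataFrame.query() can express the same filter as a string, which some readers find easier to scan. A sketch equivalent to the boolean mask above for these column names (query() refers to columns by name, so names that are not valid Python identifiers would need backticks):

```python
import pandas as pd

def select_rows_query(df: pd.DataFrame) -> pd.DataFrame:
    # 'and' inside the query string plays the role of & with parenthesized comparisons
    return df.query('A > 5 and B < 10')

df = pd.DataFrame({'A': [3, 8, 6, 2, 9],
                   'B': [5, 2, 9, 3, 1],
                   'C': [1, 7, 4, 5, 2]})
print(select_rows_query(df))
```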


                        -------------------------------------------------------------------

Q9. Given a Pandas DataFrame df with a column 'Values', write a Python function to calculate the mean,
median, and standard deviation of the values in the 'Values' column.

#Answer

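We can use the mean(), median(), and std() methods of the 'Values' column to calculate these statistics. Note that std() returns the sample standard deviation by default (ddof=1, dividing by n - 1), which is why the result below is about 15.81 rather than the population value of about 14.14. Here's a Python function that accomplishes this: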

In [22]:
import pandas as pd

def calculate_statistics(df):
    values = df['Values']
    mean_value = values.mean()
    median_value = values.median()
    std_value = values.std()
    return mean_value, median_value, std_value

# Example usage
data = {'Values': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

# Calculate statistics
mean, median, std = calculate_statistics(df)
print("Mean:", mean)
print("Median:", median)
print("Standard Deviation:", std)


Mean: 30.0
Median: 30.0
Standard Deviation: 15.811388300841896
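The three statistics can also be computed in one call with agg(), which returns a Series labelled by statistic name. The sketch also shows ddof=0 for the population standard deviation, in case that is the definition required:

```python
import pandas as pd

df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})

# agg() with a list of function names returns one labelled Series
stats = df['Values'].agg(['mean', 'median', 'std'])
print(stats)

# ddof=0 divides by n instead of n - 1, giving the population standard deviation
population_std = df['Values'].std(ddof=0)
print("Population standard deviation:", population_std)
```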


                        -------------------------------------------------------------------

Q10. Given a Pandas DataFrame df with a column 'Sales' and a column 'Date', write a Python function to
create a new column 'MovingAverage' that contains the moving average of the sales for the past 7 days
for each row in the DataFrame. The moving average should be calculated using a window of size 7 and
should include the current day.

#Answer

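To create a new column 'MovingAverage' in the DataFrame 'df' that contains the moving average of the sales for the past 7 days, you can use the rolling() function in pandas. rolling(window=7) groups each row with the six rows before it, and mean() averages them, so the current day is included. Setting min_periods=1 lets the first six rows, which have fewer than six predecessors, average over whatever rows are available instead of returning NaN. Because the window counts rows, it matches "the past 7 days" only when the DataFrame has one row per day, sorted by date. Here's a Python function that achieves this: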

In [23]:
import pandas as pd

def calculate_moving_average(df):
    df['MovingAverage'] = df['Sales'].rolling(window=7, min_periods=1).mean()
    return df

# Example usage
data = {'Sales': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
        'Date': pd.date_range(start='2022-01-01', periods=10)}
df = pd.DataFrame(data)

# Calculate the moving average
df = calculate_moving_average(df)
print(df)


   Sales       Date  MovingAverage
0     10 2022-01-01           10.0
1     20 2022-01-02           15.0
2     30 2022-01-03           20.0
3     40 2022-01-04           25.0
4     50 2022-01-05           30.0
5     60 2022-01-06           35.0
6     70 2022-01-07           40.0
7     80 2022-01-08           50.0
8     90 2022-01-09           60.0
9    100 2022-01-10           70.0
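If some dates are missing, a row-based window of 7 can reach back more than seven calendar days. A time-based window avoids that. The sketch below assumes 'Date' holds datetimes; it sorts by date and applies the offset string '7D' to a date-indexed copy of 'Sales'. With the default closed='right', each window covers the seven days ending at the current date. The function name and the gapped example data are illustrative:

```python
import pandas as pd

def calculate_moving_average_by_date(df: pd.DataFrame) -> pd.DataFrame:
    # Time-based windows need the dates in ascending order
    df = df.sort_values('Date')
    sales_by_date = df.set_index('Date')['Sales']
    # '7D' covers the 7 calendar days ending at each date (current day included);
    # min_periods defaults to 1 for time-based windows, so early rows use what exists
    df['MovingAverage'] = sales_by_date.rolling('7D').mean().to_numpy()
    return df

# Example with a gap: nothing is recorded between 2022-01-02 and 2022-01-10
df = pd.DataFrame({'Sales': [10, 20, 30],
                   'Date': pd.to_datetime(['2022-01-01', '2022-01-02', '2022-01-10'])})
print(calculate_moving_average_by_date(df))
```

On the gapped data, the row for 2022-01-10 averages only itself, whereas rolling(window=7, min_periods=1) would also include the two earlier rows.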


                        -------------------------------------------------------------------

Q11. You have a Pandas DataFrame df with a column 'Date'. Write a Python function that creates a new
column 'Weekday' in the DataFrame. The 'Weekday' column should contain the weekday name (e.g.
Monday, Tuesday) corresponding to each date in the 'Date' column.
For example, if df contains the following values:

         Date
0  2023-01-01
1  2023-01-02
2  2023-01-03
3  2023-01-04
4  2023-01-05

Your function should create the following DataFrame:

         Date    Weekday
0  2023-01-01     Sunday
1  2023-01-02     Monday
2  2023-01-03    Tuesday
3  2023-01-04  Wednesday
4  2023-01-05   Thursday

The function should return the modified DataFrame.

#Answer

To create a new column 'Weekday' in the DataFrame 'df' that contains the weekday name corresponding to each date in the 'Date' column, you can use the dt accessor in pandas to access the datetime properties of the 'Date' column. Here's a Python function that accomplishes this:

In [26]:
import pandas as pd

def add_weekday(df):
    df['Weekday'] = df['Date'].dt.day_name()
    return df

# Example usage
data = {'Date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05'])}
df = pd.DataFrame(data)

# Add the 'Weekday' column
df = add_weekday(df)
print(df)


        Date    Weekday
0 2023-01-01     Sunday
1 2023-01-02     Monday
2 2023-01-03    Tuesday
3 2023-01-04  Wednesday
4 2023-01-05   Thursday
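The example builds 'Date' with pd.to_datetime(), but when the column holds plain strings (as it often does after reading a CSV), the .dt accessor raises an AttributeError. A sketch that converts the column first and works on a copy (the function name is illustrative):

```python
import pandas as pd

def add_weekday_from_any(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    # .dt only works on datetime columns, so convert strings first
    df['Date'] = pd.to_datetime(df['Date'])
    df['Weekday'] = df['Date'].dt.day_name()
    return df

# Dates stored as plain strings
df = pd.DataFrame({'Date': ['2023-01-01', '2023-01-02', '2023-01-03',
                            '2023-01-04', '2023-01-05']})
print(add_weekday_from_any(df))
```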


                        -------------------------------------------------------------------