# Q1. List any five functions of the pandas library with execution.

Here are five functions of the Pandas library with example code:

1)DataFrame(): The DataFrame() constructor creates a pandas DataFrame, a 2-dimensional labeled data structure whose columns can hold different types. A DataFrame can be created from a variety of sources, such as a dictionary, a NumPy array, or a CSV file (via read_csv()).

In [1]:
import pandas as pd

# Creating a DataFrame from a dictionary
data = {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]}
df = pd.DataFrame(data)
df

      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35
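
The description above also mentions NumPy arrays as a source. As a minimal sketch, assuming a hypothetical 3x2 array and illustrative column names "X" and "Y" (neither appears in the original notebook), the same constructor can be used like this:

```python
import numpy as np
import pandas as pd

# Hypothetical 3x2 array; the column names "X" and "Y" are illustrative only
arr = np.array([[1, 2], [3, 4], [5, 6]])

# Each row of the array becomes a row of the DataFrame
df_from_array = pd.DataFrame(arr, columns=["X", "Y"])
print(df_from_array)
```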


2)describe(): The describe() function generates descriptive statistics for a pandas DataFrame, such as the count, mean, standard deviation, minimum, quartiles, and maximum of each numeric column. This function is useful for quickly understanding the distribution and range of values in a dataset.

Example

In [3]:
import seaborn as sns

df = sns.load_dataset('tips')
df.describe()

       total_bill       tip      size
count       244.0     244.0     244.0
mean    19.785943  2.998279  2.569672
std      8.902412  1.383638    0.9511
min          3.07       1.0       1.0
25%       13.3475       2.0       2.0
50%        17.795       2.9       2.0
75%       24.1275    3.5625       3.0
max         50.81      10.0       6.0
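
By default describe() summarizes only the numeric columns. A small sketch, assuming a hypothetical DataFrame named `people` that needs no external dataset, shows how include="all" also brings in non-numeric columns:

```python
import pandas as pd

# Hypothetical data, used here so the example runs without downloading anything
people = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]})

# Default behaviour: only the numeric "Age" column is summarized
numeric_summary = people.describe()
print(numeric_summary)

# include="all" adds rows such as "unique", "top" and "freq" for text columns
full_summary = people.describe(include="all")
print(full_summary)
```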


3)pivot_table(): This function is used to create a spreadsheet-style pivot table based on a Pandas DataFrame, which is useful for summarizing and aggregating data.

Example

In [5]:
import pandas as pd

# Creating a DataFrame
data = {"Name": ["Alice", "Bob", "Charlie", "David", "Ella", "Frank"],
        "Sex": ["F", "M", "M", "M", "F", "M"],
        "Age": [25, 30, 35, 40, 25, 30],
        "Salary": [50000, 60000, 70000, 80000, 55000, 65000]}
df = pd.DataFrame(data)

# Creating a pivot table
pivot_df = pd.pivot_table(df, index=["Sex", "Age"], values="Salary", aggfunc="mean")

pivot_df

Sex  Age  Salary
F    25    52500
M    30    62500
M    35    70000
M    40    80000
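
For comparison, the same aggregation can be written with groupby(). This is only a sketch; the names `staff`, `pivoted`, and `grouped` are illustrative and not part of the original notebook:

```python
import pandas as pd

# Same sample data as the pivot table example
staff = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie", "David", "Ella", "Frank"],
                      "Sex": ["F", "M", "M", "M", "F", "M"],
                      "Age": [25, 30, 35, 40, 25, 30],
                      "Salary": [50000, 60000, 70000, 80000, 55000, 65000]})

# Pivot table: mean salary for each (Sex, Age) combination
pivoted = pd.pivot_table(staff, index=["Sex", "Age"], values="Salary", aggfunc="mean")

# Equivalent groupby: returns a Series with the same (Sex, Age) index
grouped = staff.groupby(["Sex", "Age"])["Salary"].mean()
print(grouped)
```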


4)merge(): The merge() function merges two pandas DataFrames based on one or more common columns. This function is useful for combining data from multiple sources based on a shared key.

In [6]:
import pandas as pd

# Creating two DataFrames
df1 = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})
df2 = pd.DataFrame({"Name": ["Bob", "Charlie"], "Salary": [50000, 60000]})

# Merging two DataFrames based on a common column
merged_df = pd.merge(df1, df2, on="Name")
merged_df

  Name  Age  Salary
0  Bob   30   50000
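
Only "Bob" appears in the result because merge() performs an inner join by default. A minimal sketch, assuming the same two DataFrames under the illustrative names `left` and `right`, shows how how="outer" keeps unmatched rows and how indicator=True labels where each row came from:

```python
import pandas as pd

left = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})
right = pd.DataFrame({"Name": ["Bob", "Charlie"], "Salary": [50000, 60000]})

# Outer join: keep every Name from both sides, filling gaps with NaN.
# indicator=True adds a "_merge" column: left_only, right_only, or both.
outer_df = pd.merge(left, right, on="Name", how="outer", indicator=True)
print(outer_df)
```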


5)apply(): The apply() function applies a given function to each element of a pandas Series, or to each column (or each row, with axis=1) of a DataFrame. This function is useful for transforming data in a DataFrame or Series according to a specified operation.

In [11]:
import pandas as pd

# Creating a DataFrame
data = {"Name": ["Alice", "Bob", "Charlie"],
        "Age": [25, 30, 35]}
df = pd.DataFrame(data)

# Adding 20 to every value in the "Age" column of the DataFrame
df["Age"] = df["Age"].apply(lambda x : x+20)
df

      Name  Age
0    Alice   45
1      Bob   50
2  Charlie   55
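
The cell above applies a function to a Series. On a DataFrame, apply() passes whole columns (the default) or whole rows (axis=1) to the function. A short sketch, assuming a hypothetical `scores` DataFrame with made-up values:

```python
import pandas as pd

# Hypothetical marks for two students
scores = pd.DataFrame({"Math": [80, 90], "Physics": [70, 60]})

# Default (axis=0): the function receives each column as a Series
col_range = scores.apply(lambda col: col.max() - col.min())
print(col_range)

# axis=1: the function receives each row as a Series
row_total = scores.apply(lambda row: row["Math"] + row["Physics"], axis=1)
print(row_total)
```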


# Q2. Given a Pandas DataFrame df with columns 'A', 'B', and 'C', write a Python function to re-index the DataFrame with a new index that starts from 1 and increments by 2 for each row.

Here's a Python function that re-indexes the DataFrame df with a new index that starts from 1 and increments by 2 for each row:

In [13]:
import pandas as pd

# Function to re-index the DataFrame with the labels 1, 3, 5, ...
def reindex_df(df):
    new_index = range(1, 2*len(df)+1, 2)
    df = df.set_index(pd.Index(new_index))
    return df

#Driver Code
# Creating a sample DataFrame
df = pd.DataFrame({'A': [10, 20, 30, 40],
                   'B': [50, 60, 70, 80],
                   'C': [90, 100, 110, 120]})
df = reindex_df(df)
print(df)

    A   B    C
1  10  50   90
3  20  60  100
5  30  70  110
7  40  80  120


# Q3. You have a Pandas DataFrame df with a column named 'Values'. Write a Python function that iterates over the DataFrame and calculates the sum of the first three values in the 'Values' column. The function should print the sum to the console. For example, if the 'Values' column of df contains the values [10, 20, 30, 40, 50], your function should calculate and print the sum of the first three values, which is 60.

In [14]:
def sum_first_three_values(df):
    values_col = df['Values']
    first_three_values = values_col.iloc[:3]
    sum_first_three = sum(first_three_values)
    print(f"The sum of the first three values in the 'Values' column is {sum_first_three}.")
    
#Driver Code
# Creating a sample DataFrame
df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})

# Calculating the sum of the first three values in the 'Values' column using the sum_first_three_values() function
sum_first_three_values(df)

The sum of the first three values in the 'Values' column is 60.
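
The question asks for a function that iterates, while the cell above uses slicing. A minimal sketch with an explicit loop follows; the function name `sum_first_three_iterative` and the returned value are additions for illustration, not part of the original answer:

```python
import pandas as pd

def sum_first_three_iterative(df):
    total = 0
    # Walk the column in order and stop after three values
    for position, value in enumerate(df["Values"]):
        if position >= 3:
            break
        total += value
    print(f"The sum of the first three values in the 'Values' column is {total}.")
    return total

sample = pd.DataFrame({"Values": [10, 20, 30, 40, 50]})
iterative_total = sum_first_three_iterative(sample)
```

The loop also works when the column has fewer than three rows, in which case it sums whatever is there.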


# Q4. Given a Pandas DataFrame df with a column 'Text', write a Python function to create a new column 'Word_Count' that contains the number of words in each row of the 'Text' column.

In [16]:
def add_word_count_column(df):
    # Split the 'Text' column on whitespace characters and count the resulting list
    word_count = df['Text'].str.split().apply(len)

    # Add the new 'Word_Count' column to the DataFrame
    df['Word_Count'] = word_count
    return df

#Driver code
# Creating a sample DataFrame
df = pd.DataFrame({'Text': ['This is a sample sentence',
                            'Here is another sentence',
                            'One more sentence for the example']})

# Adding 'Word_Count' column to the DataFrame
df = add_word_count_column(df)

# Displaying the updated DataFrame
df

                                Text  Word_Count
0          This is a sample sentence           5
1           Here is another sentence           4
2  One more sentence for the example           6
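
One caveat: apply(len) raises a TypeError if the 'Text' column contains missing values. A possible variant, sketched under the assumption that missing text should count as zero words (the name `add_word_count_safe` is hypothetical):

```python
import pandas as pd

def add_word_count_safe(df):
    out = df.copy()  # leave the caller's DataFrame untouched
    # .str.len() returns NaN for missing text instead of raising an error
    counts = out["Text"].str.split().str.len()
    out["Word_Count"] = counts.fillna(0).astype(int)
    return out

sample = pd.DataFrame({"Text": ["This is a sample sentence", None, ""]})
result = add_word_count_safe(sample)
print(result)
```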


# Q5. How are DataFrame.size() and DataFrame.shape() different?

DataFrame.size and DataFrame.shape both describe the dimensions of a Pandas DataFrame, but they return different information. Both are attributes rather than methods, so they are accessed without parentheses.

DataFrame.size returns the total number of elements in the DataFrame, which is equal to the product of the number of rows and the number of columns. It is equivalent to DataFrame.shape[0] * DataFrame.shape[1].

On the other hand, DataFrame.shape returns a tuple containing the number of rows and the number of columns in the DataFrame, respectively. For example, if a DataFrame has 5 rows and 3 columns, DataFrame.shape returns (5, 3) and DataFrame.size returns 15.

In summary, DataFrame.size is a single integer (the element count), while DataFrame.shape is a (rows, columns) tuple.

Here are examples of how to use DataFrame.size and DataFrame.shape:
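
As a minimal sketch, assuming a hypothetical DataFrame named `frame` with 5 rows and 3 columns (not from the original notebook), both attributes are read without parentheses:

```python
import pandas as pd

# Hypothetical 5-row, 3-column DataFrame
frame = pd.DataFrame({"A": range(5), "B": range(5), "C": range(5)})

n_rows, n_cols = frame.shape   # tuple of (rows, columns)
print(frame.shape)
print(frame.size)              # rows * columns

# Calling the attribute like a method fails, because a tuple is not callable
try:
    frame.shape()
except TypeError as exc:
    print(f"TypeError: {exc}")
```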

# Q6. Which function of pandas do we use to read an Excel file?

We use the read_excel() function of pandas to read an Excel file. This function reads the data from one sheet of an Excel file and returns a pandas DataFrame.

The basic syntax of read_excel() function is as follows:

In [None]:
pd.read_excel('file_name.xlsx', sheet_name='Sheet1')

# Q7. You have a Pandas DataFrame df that contains a column named 'Email' that contains email addresses in the format 'username@domain.com'. Write a Python function that creates a new column 'Username' in df that contains only the username part of each email address.

In [21]:
import pandas as pd

# Function to extract username from email address
def extract_username(email):
    return email.split('@')[0]

#Driver Code
# Creating a sample DataFrame
df = pd.DataFrame({'Email': ['officialaakash@example.com', 'singhvirat@example.com', 'jimmy.smith@example.com']})

# Creating a new 'Username' column by applying the 'extract_username' function to the 'Email' column
df['Username'] = df['Email'].apply(extract_username)

print(df)

                        Email        Username
0  officialaakash@example.com  officialaakash
1      singhvirat@example.com      singhvirat
2     jimmy.smith@example.com     jimmy.smith
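
The same result can be obtained without a helper function by using the vectorized string methods on the column. This is a sketch only; `emails` is an illustrative name:

```python
import pandas as pd

emails = pd.DataFrame({"Email": ["officialaakash@example.com",
                                 "singhvirat@example.com",
                                 "jimmy.smith@example.com"]})

# Split each address once at '@' and keep the part before it
emails["Username"] = emails["Email"].str.split("@", n=1).str[0]
print(emails)
```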


# Q8. You have a Pandas DataFrame df with columns 'A', 'B', and 'C'. Write a Python function that selects all rows where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10. The function should return a new DataFrame that contains only the selected rows. For example, if df contains the following values:

   A  B  C
0  3  5  1
1  8  2  7
2  6  9  4
3  2  3  5
4  9  1  2

Your function should select the following rows:

   A  B  C
1  8  2  7
4  9  1  2

Here's the code for selecting the desired rows. Note that row 2 (A = 6, B = 9) also satisfies both conditions, so the function returns three rows rather than the two listed above:

In [22]:
import pandas as pd

#Function for returning a new DataFrame of the selected rows
def select_rows(df):
    selected_rows = df[(df['A'] > 5) & (df['B'] < 10)]
    return selected_rows


#Driver code
# Creating a sample DataFrame
df = pd.DataFrame({'A': [3, 8, 6, 2, 9],
                   'B': [5, 2, 9, 3, 1],
                   'C': [1, 7, 4, 5, 2]})

new_df = select_rows(df)

print(new_df)

   A  B  C
1  8  2  7
2  6  9  4
4  9  1  2
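
The same filter can also be expressed with DataFrame.query(), which takes the condition as a string. A minimal sketch on the same sample data; `table` and `selected` are illustrative names:

```python
import pandas as pd

table = pd.DataFrame({"A": [3, 8, 6, 2, 9],
                      "B": [5, 2, 9, 3, 1],
                      "C": [1, 7, 4, 5, 2]})

# Column names are referenced directly inside the query string
selected = table.query("A > 5 and B < 10")
print(selected)
```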


# Q9. Given a Pandas DataFrame df with a column 'Values', write a Python function to calculate the mean, median, and standard deviation of the values in the 'Values' column.

In [23]:
import pandas as pd
import numpy as np

#Function to compute the mean, median, and standard deviation
def calculate_stats(df):
    values_col = df['Values']
    mean = np.mean(values_col)
    median = np.median(values_col)
    # np.std defaults to ddof=0, i.e. the population standard deviation
    std_dev = np.std(values_col)
    print(f"Mean: {mean}, Median: {median}, Standard Deviation: {std_dev}")
    
#Driver Code   

# Creating a sample DataFrame
df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})

# Calculating mean, median, and standard deviation of the values in the 'Values' column using the calculate_stats() function
calculate_stats(df)

Mean: 30.0, Median: 30.0, Standard Deviation: 14.142135623730951
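
The value 14.14 is the population standard deviation. The pandas method Series.std() defaults to the sample standard deviation (ddof=1) and therefore gives a different number. A short sketch of the difference, with `values` as an illustrative name:

```python
import pandas as pd

values = pd.Series([10, 20, 30, 40, 50])

sample_std = values.std()            # ddof=1: divides by n - 1
population_std = values.std(ddof=0)  # ddof=0: divides by n, matches np.std

print(f"Mean: {values.mean()}, Median: {values.median()}")
print(f"Sample std: {sample_std:.4f}, Population std: {population_std:.4f}")
```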


# Q10. Given a Pandas DataFrame df with a column 'Sales' and a column 'Date', write a Python function to create a new column 'MovingAverage' that contains the moving average of the sales for the past 7 days for each row in the DataFrame. The moving average should be calculated using a window of size 7 and should include the current day.

In [24]:
import pandas as pd

def moving_average(df):
    # Sort the DataFrame by date
    df = df.sort_values(by='Date')
    
    # Calculate the rolling mean with window size 7
    df['MovingAverage'] = df['Sales'].rolling(window=7, min_periods=1).mean()
    
    return df


#Driver Code
# Create a sample DataFrame
df = pd.DataFrame({'Sales': [10, 15, 20, 25, 30, 35, 40, 45, 50],
                   'Date': pd.date_range('2022-01-01', periods=9)})

# Calculate the moving average
df = moving_average(df)

# Print the result
print(df)

   Sales       Date  MovingAverage
0     10 2022-01-01           10.0
1     15 2022-01-02           12.5
2     20 2022-01-03           15.0
3     25 2022-01-04           17.5
4     30 2022-01-05           20.0
5     35 2022-01-06           22.5
6     40 2022-01-07           25.0
7     45 2022-01-08           30.0
8     50 2022-01-09           35.0
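
A window of 7 rows equals 7 days only when there is exactly one row per day. If dates can be missing, a time-based window is one alternative. The sketch below assumes hypothetical data with a gap in the dates and uses the offset string "7D":

```python
import pandas as pd

# Hypothetical sales with a gap between 2 January and 10 January
sales = pd.DataFrame({"Sales": [10, 20, 30],
                      "Date": pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-10"])})
sales = sales.sort_values("Date").reset_index(drop=True)

# "7D" averages over the 7 calendar days ending at (and including) each row's date
by_date = sales.set_index("Date")["Sales"].rolling("7D").mean()
sales["MovingAverage"] = by_date.to_numpy()
print(sales)
```

For the last row only the 10 January sale falls inside the window, whereas a row-based window of 7 would also have averaged in the two sales from early January.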


# Q11. You have a Pandas DataFrame df with a column 'Date'. Write a Python function that creates a new column 'Weekday' in the DataFrame. The 'Weekday' column should contain the weekday name (e.g. Monday, Tuesday) corresponding to each date in the 'Date' column.

In [26]:
import pandas as pd

# Function to add weekday column
def add_weekday_column(df):
    # Convert 'Date' column to datetime
    df['Date'] = pd.to_datetime(df['Date'])
    # Add new 'Weekday' column with weekday name
    df['Weekday'] = df['Date'].dt.day_name()
    return df

#Driver Code
# Create example DataFrame
df = pd.DataFrame({'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05']})

# Call function to add weekday column to DataFrame
df = add_weekday_column(df)

# Print modified DataFrame
print(df)

        Date    Weekday
0 2023-01-01     Sunday
1 2023-01-02     Monday
2 2023-01-03    Tuesday
3 2023-01-04  Wednesday
4 2023-01-05   Thursday


# Q12. Given a Pandas DataFrame df with a column 'Date' that contains timestamps, write a Python function to select all rows where the date is between '2023-01-01' and '2023-01-31'.

In [28]:
import pandas as pd

# create function to select rows within date range
def select_dates(df, start_date, end_date):
    return df[(df['Date'] >= start_date) & (df['Date'] <= end_date)]



# Driver Code
# create dataframe with date range
df = pd.DataFrame({
    'Date': pd.date_range(start='2023-01-01', end='2023-02-28', freq='D')
})

# select rows within date range
selected_df = select_dates(df, '2023-01-01', '2023-01-31')
print(selected_df)

         Date
0  2023-01-01
1  2023-01-02
2  2023-01-03
3  2023-01-04
4  2023-01-05
5  2023-01-06
6  2023-01-07
7  2023-01-08
8  2023-01-09
9  2023-01-10
10 2023-01-11
11 2023-01-12
12 2023-01-13
13 2023-01-14
14 2023-01-15
15 2023-01-16
16 2023-01-17
17 2023-01-18
18 2023-01-19
19 2023-01-20
20 2023-01-21
21 2023-01-22
22 2023-01-23
23 2023-01-24
24 2023-01-25
25 2023-01-26
26 2023-01-27
27 2023-01-28
28 2023-01-29
29 2023-01-30
30 2023-01-31
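
The sample data above contains only midnight timestamps. When timestamps carry a time of day, '2023-01-31' is interpreted as midnight, so later readings on 31 January are excluded by a closed range. A sketch of the difference, using hypothetical timestamps and a half-open range as one possible fix:

```python
import pandas as pd

# Hypothetical timestamps that include a time of day
stamps = pd.DataFrame({"Date": pd.to_datetime(["2022-12-31 23:00",
                                               "2023-01-01 00:00",
                                               "2023-01-31 15:30",
                                               "2023-02-01 00:00"])})

start = pd.Timestamp("2023-01-01")
end_inclusive = pd.Timestamp("2023-01-31")
end_exclusive = pd.Timestamp("2023-02-01")

# Closed range: misses 15:30 on 31 January because the end bound is midnight
closed = stamps[stamps["Date"].between(start, end_inclusive)]

# Half-open range: everything from 1 January up to, but not including, 1 February
january = stamps[(stamps["Date"] >= start) & (stamps["Date"] < end_exclusive)]
print(january)
```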


# Q13. To use the basic functions of pandas, what is the first and foremost necessary library that needs to be imported?

In [None]:
import pandas as pd