In [None]:
# Ans-1

In [None]:
Five functions of the pandas library, with an example of how to use each:

read_csv() - This function is used to read data from a CSV file and create a pandas DataFrame object.

In [None]:
import pandas as pd

# read a CSV file and create a DataFrame
df = pd.read_csv('data.csv')
print(df.head())
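
In [None]:
read_csv() accepts any file-like object, so a variant can be tried without a file on disk. The sketch below uses made-up CSV text and the usecols parameter to load only some of the columns; the column names and values are illustrative assumptions.

```python
import io
import pandas as pd

# hypothetical CSV content; the second row has no age
csv_text = "name,age,city\nAlice,25,Paris\nBob,,Rome\n"

# read from an in-memory buffer and keep only two of the three columns
people = pd.read_csv(io.StringIO(csv_text), usecols=['name', 'age'])
print(people)
```

The empty age field is read as NaN, which is the kind of missing value that dropna() removes.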

In [None]:
dropna() - This function is used to remove any rows with missing values (NaN) from a DataFrame.

In [None]:
import pandas as pd

# create a DataFrame with some missing values
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'], 'Age': [25, 30, None, 35]}
df = pd.DataFrame(data)

# drop any rows with missing values
df = df.dropna()
print(df)
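
In [None]:
dropna() also takes parameters that narrow which rows are dropped. A minimal sketch with made-up data, showing subset (only look at certain columns) and thresh (keep rows that have at least that many non-missing values):

```python
import pandas as pd

# hypothetical data: 'Age' and 'City' each have missing entries
people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, None, 35, None],
    'City': ['Paris', 'Rome', None, None],
})

# drop a row only when 'Age' is missing
has_age = people.dropna(subset=['Age'])

# keep rows that have at least two non-missing values
mostly_complete = people.dropna(thresh=2)

print(has_age)
print(mostly_complete)
```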

In [None]:
groupby() - This function is used to group data in a DataFrame by one or more columns and perform aggregate functions on the groups.

In [None]:
import pandas as pd

# create a DataFrame with some data
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'], 'Age': [25, 30, 35, 40], 'Gender': ['F', 'M', 'M', 'M']}
df = pd.DataFrame(data)

# group the data by Gender and calculate the mean age
grouped = df.groupby('Gender')
mean_age = grouped['Age'].mean()
print(mean_age)
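
In [None]:
Several aggregates can be computed in one pass with agg(). The sketch below uses named aggregation on the same sample data; the output column names (mean_age, oldest, count) are illustrative choices.

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Gender': ['F', 'M', 'M', 'M'],
})

# named aggregation: each keyword is an output column,
# given as (source column, aggregation function)
summary = people.groupby('Gender').agg(
    mean_age=('Age', 'mean'),
    oldest=('Age', 'max'),
    count=('Name', 'count'),
)
print(summary)
```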

In [None]:
merge() - This function is used to combine two DataFrames on one or more common columns. By default it performs an inner join, so only keys present in both DataFrames appear in the result.

In [None]:
import pandas as pd

# create two DataFrames
df1 = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})
df2 = pd.DataFrame({'Name': ['Bob', 'Charlie', 'David'], 'Occupation': ['Doctor', 'Engineer', 'Lawyer']})

# merge the two DataFrames on the 'Name' column
merged_df = pd.merge(df1, df2, on='Name')
print(merged_df)
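
In [None]:
The how parameter controls which keys survive the merge. A short sketch comparing the default inner join with an outer join on the same two DataFrames; indicator=True adds a _merge column showing where each row came from.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})
df2 = pd.DataFrame({'Name': ['Bob', 'Charlie', 'David'],
                    'Occupation': ['Doctor', 'Engineer', 'Lawyer']})

# inner join (the default): only names found in both DataFrames
inner = pd.merge(df1, df2, on='Name')

# outer join: every name from either side, with NaN where data is missing
outer = pd.merge(df1, df2, on='Name', how='outer', indicator=True)

print(inner)
print(outer)
```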

In [None]:
pivot_table() - This function is used to create a pivot table from a DataFrame, which is a table that summarizes the data based on one or more columns.

In [None]:
import pandas as pd

# create a DataFrame with some data
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'], 'Age': [25, 30, 35, 40], 'Gender': ['F', 'M', 'M', 'M'], 'Salary': [50000, 60000, 70000, 80000]}
df = pd.DataFrame(data)

# create a pivot table that shows the average salary by gender and age group
pivot = df.pivot_table(index='Gender', columns=pd.cut(df['Age'], [20, 30, 40]), values='Salary', aggfunc='mean')
print(pivot)

In [None]:
# Ans-2

In [None]:
Python function that re-indexes the DataFrame with a new index that starts from 1 and increments by 2 for each row:

In [None]:
import pandas as pd

def reindex_df(df):
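    # work on a copy so the caller's DataFrame is left unchanged
    df = df.copy()

    # assign a new index that starts from 1 and increments by 2 for each row
    # (1, 3, 5, ...); stop is exclusive, so 2 * len(df) yields one label per row.
    # Assigning to df.index relabels the rows, whereas df.reindex() would look up
    # the new labels in the old index and fill the missing ones with NaN.
    df.index = pd.RangeIndex(start=1, stop=2 * len(df), step=2)

    return df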

In [None]:
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
df_reindexed = reindex_df(df)
print(df_reindexed)
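
In [None]:
Note that reindex() aligns rows to labels that already exist, while assigning a new index relabels the rows in place. A small sketch contrasting the two with the same 1, 3, 5 index; set_axis() is used here as one way to relabel without modifying the original.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})   # default index 0, 1, 2
new_index = pd.RangeIndex(start=1, stop=2 * len(df), step=2)   # 1, 3, 5

# reindex looks up labels 1, 3, 5 in the old index; 3 and 5 do not exist
aligned = df.reindex(new_index)

# set_axis keeps every row and only replaces the labels
relabelled = df.set_axis(new_index, axis=0)

print(aligned)
print(relabelled)
```

Only label 1 exists in the original index, so aligned keeps that one row and fills the other two with NaN, while relabelled keeps all three values.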

In [None]:
# Ans-3

In [None]:
Python function that takes the first three values in the 'Values' column of the DataFrame and prints their sum:

In [None]:
import pandas as pd

def sum_first_three_values(df):
    # get the first three values in the 'Values' column by position,
    # regardless of how the rows are labelled
    values = df['Values'].iloc[:3]
    
    # calculate the sum of the values
    total = values.sum()
    
    # print the sum to the console
    print(total)

In [None]:
df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})
sum_first_three_values(df)
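
In [None]:
If explicit iteration over the column is wanted, one possible sketch uses itertools.islice to stop after three values. The function name is illustrative, and it returns the total instead of printing it.

```python
from itertools import islice

import pandas as pd

def sum_first_three_by_iteration(df):
    # walk the 'Values' column and stop after at most three items
    total = 0
    for value in islice(df['Values'], 3):
        total += value
    return total

df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})
print(sum_first_three_by_iteration(df))
```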

In [None]:
# Ans-4

In [None]:
import pandas as pd

def add_word_count_column(df):
    # split the 'Text' column on whitespace and count the words in each row;
    # the .str accessor passes missing values through as NaN instead of raising
    df['Word_Count'] = df['Text'].str.split().str.len()
    
    return df

In [None]:
df = pd.DataFrame({'Text': ['This is a sentence', 'This is another sentence', 'And this is a third sentence']})
df = add_word_count_column(df)
print(df)

In [None]:
# Ans-5

In [None]:
Both DataFrame.size and DataFrame.shape are attributes in Pandas (not methods, so they are accessed without parentheses) that describe the dimensions of a DataFrame, but they differ in what they return:

DataFrame.size returns the total number of elements in the DataFrame, which is equal to the number of rows multiplied by the number of columns. It is a single integer.
DataFrame.shape returns a tuple containing the number of rows and the number of columns in the DataFrame, in that order.
For example, consider the following DataFrame:

In [None]:
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

In [None]:
>>> df.size
6

In [None]:
>>> df.shape
(3, 2)

In [None]:
This is a tuple containing the number of rows (3) and the number of columns (2) in the DataFrame.

So in summary, DataFrame.size gives the total number of elements in the DataFrame as a single integer, while DataFrame.shape gives a tuple containing the number of rows and columns in the DataFrame.
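
The relationship between the two can be checked directly. A minimal sketch on the same DataFrame, also showing len() and the shape of a single column:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# shape unpacks into (rows, columns); both are read without parentheses
n_rows, n_cols = df.shape
print(df.size == n_rows * n_cols)

# len() gives the number of rows, the same as df.shape[0]
print(len(df))

# a single column is a one-dimensional Series, so its shape has one entry
column_shape = df['A'].shape
print(column_shape)
```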

In [None]:
# Ans-6

In [None]:
To read an Excel file into a Pandas DataFrame, you can use the read_excel() function in Pandas.

Here's an example:

In [None]:
import pandas as pd

# read the Excel file into a DataFrame
df = pd.read_excel('filename.xlsx')

# print the first 5 rows of the DataFrame
print(df.head())

In [None]:
In this example, filename.xlsx is the name of the Excel file you want to read. The read_excel() function reads the file and returns a DataFrame.

Note that the read_excel() function has many optional parameters, such as sheet_name (which sheet to read), skiprows (rows to skip at the top), and na_values (extra strings to treat as missing). Reading .xlsx files also requires an Excel engine such as openpyxl to be installed. You can find more information about these parameters in the Pandas documentation.
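
As a rough sketch of how some of those parameters fit together, the helper below wraps read_excel(). The function name, its arguments, and the placeholder strings passed to na_values are assumptions made for illustration.

```python
import pandas as pd

def load_sheet(path, sheet_name=0, rows_to_skip=0):
    # sheet_name: a sheet name or a zero-based sheet position
    # skiprows: number of lines to skip at the top of the sheet
    # na_values: extra strings to treat as missing (hypothetical placeholders)
    return pd.read_excel(
        path,
        sheet_name=sheet_name,
        skiprows=rows_to_skip,
        na_values=['n/a', '-'],
    )
```

It could be called as load_sheet('filename.xlsx', sheet_name=0, rows_to_skip=1) to read the first sheet while skipping one leading row.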

In [None]:
# Ans-7

In [None]:
Python function that extracts the username from each email address in the 'Email' column of a Pandas DataFrame and stores it in a new 'Username' column:

In [None]:
import pandas as pd

def extract_username(df):
    # create a new 'Username' column in the DataFrame
    df['Username'] = ''

    # iterate over the 'Email' column by index label and extract the username
    for label, email in df['Email'].items():
        username = email.split('@')[0]
        df.at[label, 'Username'] = username

    # return the updated DataFrame
    return df

In [None]:
In this function, we first create a new 'Username' column in the DataFrame with empty strings as the initial values. Then we loop over the 'Email' column with items(), which yields each row's index label together with its value. Using the label with at[] (rather than a positional counter from enumerate()) keeps the assignment correct even when the DataFrame does not have the default 0, 1, 2, ... index. For each email address, we use the split() method to separate the username from the domain name at the '@' symbol, and then assign the extracted username to the corresponding row in the 'Username' column using at[]. Finally, we return the updated DataFrame with the new 'Username' column.

You can call this function with your Pandas DataFrame as the input, like this:

In [None]:
# create a sample DataFrame
df = pd.DataFrame({'Email': ['john.doe@example.com', 'jane.smith@example.com']})

# call the extract_username function
df = extract_username(df)

# print the updated DataFrame
print(df)
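
In [None]:
The same result can be obtained without a Python loop by using the .str accessor. A minimal sketch, assuming every entry in 'Email' is a string; the function name is illustrative, and it works on a copy so the input is not modified.

```python
import pandas as pd

def extract_username_vectorized(df):
    result = df.copy()
    # split each address once at the first '@' and keep the part before it
    result['Username'] = result['Email'].str.split('@', n=1).str[0]
    return result

emails = pd.DataFrame(
    {'Email': ['john.doe@example.com', 'jane.smith@example.com']},
    index=['a', 'b'],   # non-default labels, to show they do not matter
)
with_usernames = extract_username_vectorized(emails)
print(with_usernames)
```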

In [None]:
# Ans-8

In [None]:
import pandas as pd

def select_rows(df):
    # use boolean indexing to select the rows where 'A' > 5 and 'B' < 10
    new_df = df[(df['A'] > 5) & (df['B'] < 10)]
    
    # return the new DataFrame
    return new_df

In [None]:
# create a sample DataFrame
df = pd.DataFrame({'A': [1, 6, 3, 8], 'B': [7, 2, 9, 4], 'C': [5, 6, 7, 8]})

# call the select_rows function
new_df = select_rows(df)

# print the new DataFrame
print(new_df)
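
In [None]:
The same filter can also be written with DataFrame.query(), which takes the condition as a string. A short sketch on the same sample data:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 6, 3, 8], 'B': [7, 2, 9, 4], 'C': [5, 6, 7, 8]})

# column names can be used directly inside the query expression
selected = df.query('A > 5 and B < 10')
print(selected)
```

Both forms select the rows labelled 1 and 3 here; query() tends to read more easily when several conditions are combined.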