# Q1. List any five functions of the pandas library with execution.

1) head: This function is used to display the first n rows of a DataFrame. By default, it shows the first 5 rows.

In [2]:
import pandas as pd

# Create a sample DataFrame
data = {'Name': ['John', 'Emma', 'Peter', 'Lisa', 'Mike'],
        'Age': [25, 28, 22, 30, 35]}
df = pd.DataFrame(data)

# Display the first 3 rows
print(df.head(3))


    Name  Age
0   John   25
1   Emma   28
2  Peter   22


2) info: This function prints a concise summary of the DataFrame, including the index range, column names, data types, non-null counts, and memory usage.

In [3]:
import pandas as pd

# Create a sample DataFrame
data = {'Name': ['John', 'Emma', 'Peter', 'Lisa', 'Mike'],
        'Age': [25, 28, 22, 30, 35]}
df = pd.DataFrame(data)

# info() prints the summary itself and returns None, so it is not wrapped in print()
df.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    5 non-null      object
 1   Age     5 non-null      int64 
dtypes: int64(1), object(1)
memory usage: 208.0+ bytes


3) describe: This function generates descriptive statistics for the numeric columns of the DataFrame (by default), such as count, mean, standard deviation, minimum, maximum, and quartile values.

In [4]:
import pandas as pd

# Create a sample DataFrame
data = {'Name': ['John', 'Emma', 'Peter', 'Lisa', 'Mike'],
        'Age': [25, 28, 22, 30, 35]}
df = pd.DataFrame(data)

# Generate descriptive statistics
print(df.describe())


             Age
count   5.000000
mean   28.000000
std     4.949747
min    22.000000
25%    25.000000
50%    28.000000
75%    30.000000
max    35.000000
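By default `describe()` summarizes only numeric columns, which is why `Name` does not appear in the output above. A minimal sketch on the same sample data shows how `include='all'` adds the count/unique/top/freq rows for non-numeric columns:

```python
import pandas as pd

data = {'Name': ['John', 'Emma', 'Peter', 'Lisa', 'Mike'],
        'Age': [25, 28, 22, 30, 35]}
df = pd.DataFrame(data)

# include='all' summarizes every column; statistics that do not apply
# to a column (for example the mean of 'Name') are shown as NaN
summary = df.describe(include='all')
print(summary)
```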


4) groupby: This function is used for grouping data based on specified criteria. It allows you to perform operations on specific groups within the DataFrame.

In [5]:
import pandas as pd

# Create a sample DataFrame
data = {'Name': ['John', 'Emma', 'Peter', 'Lisa', 'Mike'],
        'Department': ['IT', 'HR', 'IT', 'HR', 'IT'],
        'Salary': [5000, 6000, 4500, 5500, 4000]}
df = pd.DataFrame(data)

# Group the DataFrame by Department and calculate the average salary
grouped_df = df.groupby('Department')['Salary'].mean()
print(grouped_df)


Department
HR    5750.0
IT    4500.0
Name: Salary, dtype: float64
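Beyond a single statistic, `groupby` can compute several aggregations per group in one pass. The sketch below uses named aggregation on the same sample data; the output column names `avg_salary`, `max_salary`, and `headcount` are illustrative choices, not anything pandas requires:

```python
import pandas as pd

data = {'Name': ['John', 'Emma', 'Peter', 'Lisa', 'Mike'],
        'Department': ['IT', 'HR', 'IT', 'HR', 'IT'],
        'Salary': [5000, 6000, 4500, 5500, 4000]}
df = pd.DataFrame(data)

# Named aggregation: each keyword becomes an output column,
# defined as (source column, aggregation function)
summary = df.groupby('Department').agg(
    avg_salary=('Salary', 'mean'),
    max_salary=('Salary', 'max'),
    headcount=('Name', 'count'),
)
print(summary)
```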


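5) fillna: This function is used to fill missing values (NaN) in a DataFrame or Series with a specified value. Forward-fill and backward-fill were historically done with `fillna(method='ffill')` or `fillna(method='bfill')`; since pandas 2.1 that `method` parameter is deprecated in favor of the dedicated `ffill()` and `bfill()` methods.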


In [6]:
import pandas as pd
import numpy as np

# Create a sample DataFrame with missing values
data = {'A': [1, np.nan, 3, np.nan, 5],
        'B': [np.nan, 2, np.nan, 4, np.nan]}
df = pd.DataFrame(data)

# Fill missing values with 0
filled_df = df.fillna(0)
print(filled_df)


     A    B
0  1.0  0.0
1  0.0  2.0
2  3.0  0.0
3  0.0  4.0
4  5.0  0.0
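The example above fills with a constant. For the forward-fill and backward-fill variants, a minimal sketch on the same data using `ffill()` and `bfill()`:

```python
import numpy as np
import pandas as pd

data = {'A': [1, np.nan, 3, np.nan, 5],
        'B': [np.nan, 2, np.nan, 4, np.nan]}
df = pd.DataFrame(data)

# Forward fill: each NaN takes the last valid value above it
forward = df.ffill()
# Backward fill: each NaN takes the next valid value below it
backward = df.bfill()

print(forward)
print(backward)
```

Note that a leading NaN has nothing to copy forward and a trailing NaN has nothing to copy backward, so those cells stay NaN (for example the first value of `B` after `ffill()`).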


# Q2. Given a Pandas DataFrame df with columns 'A', 'B', and 'C', write a Python function to re-index the DataFrame with a new index that starts from 1 and increments by 2 for each row.

In [7]:
import pandas as pd

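def reindex_dataframe(df):
    # Labels 1, 3, 5, ...: a range from 1 up to (but excluding) 2*n in steps of 2 has exactly n labels
    new_index = pd.RangeIndex(start=1, stop=len(df) * 2, step=2)
    # Work on a copy with a clean 0..n-1 index, then replace the row labels
    df = df.reset_index(drop=True)
    df.index = new_index
    return df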


In [8]:
# Example usage
df = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60], 'C': [70, 80, 90]})
print("Original DataFrame:")
print(df)

df_reindexed = reindex_dataframe(df)
print("\nReindexed DataFrame:")
print(df_reindexed)


Original DataFrame:
    A   B   C
0  10  40  70
1  20  50  80
2  30  60  90

Reindexed DataFrame:
    A   B   C
1  10  40  70
3  20  50  80
5  30  60  90
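Despite the wording of the question, pandas' own `df.reindex()` is not the right tool here: it looks up the new labels in the existing index and fills any label it cannot find with NaN. Replacing the labels positionally can also be written with `set_axis`; `reindex_with_set_axis` is a hypothetical helper name used only for this sketch:

```python
import pandas as pd

def reindex_with_set_axis(df):
    # set_axis assigns new row labels by position and returns a new DataFrame,
    # leaving the original df unchanged
    return df.set_axis(range(1, 2 * len(df), 2), axis=0)

df = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60], 'C': [70, 80, 90]})
result = reindex_with_set_axis(df)
print(result)
```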


# Q3. You have a Pandas DataFrame df with a column named 'Values'. Write a Python function that iterates over the DataFrame and calculates the sum of the first three values in the 'Values' column. The function should print the sum to the console. For example, if the 'Values' column of df contains the values [10, 20, 30, 40, 50], your function should calculate and print the sum of the first three values, which is 60.

In [9]:
import pandas as pd

def calculate_sum_of_first_three(df):
    values = df['Values'].values[:3]  # Extract the first three values
    sum_of_first_three = sum(values)  # Calculate the sum
    print("Sum of the first three values:", sum_of_first_three)

# Example usage
df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})
print("Original DataFrame:")
print(df)

calculate_sum_of_first_three(df)


Original DataFrame:
   Values
0      10
1      20
2      30
3      40
4      50
Sum of the first three values: 60
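The solution above slices the column rather than iterating over it. If the exercise really calls for an explicit loop, one possible sketch walks the column value by value and stops after three items; the function name and its `return` value are additions for illustration:

```python
import pandas as pd

def calculate_sum_of_first_three_iter(df):
    total = 0
    # Iterating over a Series yields its values in order
    for position, value in enumerate(df['Values']):
        if position == 3:  # stop once three values have been added
            break
        total += value
    print("Sum of the first three values:", total)
    return total

df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})
total = calculate_sum_of_first_three_iter(df)
```

Outside of an exercise, the vectorized `df['Values'].head(3).sum()` is the idiomatic equivalent. Like the loop, it also works when the column has fewer than three values.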


# Q4. Given a Pandas DataFrame df with a column 'Text', write a Python function to create a new column 'Word_Count' that contains the number of words in each row of the 'Text' column.




In [10]:
import pandas as pd

def count_words(df):
    df['Word_Count'] = df['Text'].apply(lambda x: len(str(x).split()))
    return df


In [11]:
# Example usage
df = pd.DataFrame({'Text': ['This is a sentence.', 'Python programming is fun!', 'Data analysis with pandas.']})
print("Original DataFrame:")
print(df)

df_with_word_count = count_words(df)
print("\nDataFrame with Word_Count:")
print(df_with_word_count)


Original DataFrame:
                         Text
0         This is a sentence.
1  Python programming is fun!
2  Data analysis with pandas.

DataFrame with Word_Count:
                         Text  Word_Count
0         This is a sentence.           4
1  Python programming is fun!           4
2  Data analysis with pandas.           4
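One subtlety of `len(str(x).split())` is that a missing value becomes the string `'nan'` and is counted as one word. A vectorized sketch that treats missing text as zero words follows; `count_words_vectorized` is a hypothetical name:

```python
import numpy as np
import pandas as pd

def count_words_vectorized(df):
    # Replace missing text with '' so it counts as 0 words instead of 'nan' (1 word)
    text = df['Text'].fillna('').astype(str)
    # str.split() with no pattern splits on runs of whitespace, like Python's str.split()
    df['Word_Count'] = text.str.split().str.len()
    return df

df = pd.DataFrame({'Text': ['This is a sentence.', '  extra   spaces  ', np.nan]})
result = count_words_vectorized(df)
print(result)
```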


# Q5. How are DataFrame.size() and DataFrame.shape() different?

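- DataFrame.size returns the total number of elements (cells) in the DataFrame, i.e. rows × columns.
- DataFrame.shape returns a tuple representing the dimensions of the DataFrame: (number of rows, number of columns).
- Both are attributes (properties), not methods, so they are accessed without parentheses as `df.size` and `df.shape`. Calling `df.size()` raises a `TypeError` because an integer is not callable.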

In [12]:
import pandas as pd

# Create a sample DataFrame
data = {'A': [1, 2, 3],
        'B': [4, 5, 6]}
df = pd.DataFrame(data)

# Print the size and shape of the DataFrame
print("DataFrame size:", df.size)
print("DataFrame shape:", df.shape)


DataFrame size: 6
DataFrame shape: (3, 2)
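A few related checks make the relationship concrete: `size` is the product of the two entries of `shape`, `len()` gives only the row count, and `size` counts every cell, missing or not:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

n_rows, n_cols = df.shape          # shape is a (rows, columns) tuple
assert df.size == n_rows * n_cols  # size is the product of the two dimensions
print(len(df))                     # len() returns the number of rows only

# For a Series, shape is a one-element tuple
print(df['A'].size, df['A'].shape)

# size counts the missing cell as well; count() skips it
df_nan = pd.DataFrame({'A': [1, None, 3]})
print(df_nan.size, df_nan['A'].count())
```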


# Q6. Which function of pandas do we use to read an excel file?

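The function is `pd.read_excel()`. It loads a sheet of an Excel workbook into a DataFrame. The `sheet_name` parameter selects the sheet, and the first sheet is read by default. Reading `.xlsx` files also requires an Excel engine such as `openpyxl` to be installed.

In [13]:
import pandas as pd
# Read 'Sheet1' of data.xlsx into a DataFrame (commented out because data.xlsx is not included here)
#df = pd.read_excel('data.xlsx', sheet_name='Sheet1')
#print(df)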


# Q7. You have a Pandas DataFrame df that contains a column named 'Email' that contains email addresses in the format 'username@domain.com'. Write a Python function that creates a new column 'Username' in df that contains only the username part of each email address. The username is the part of the email address that appears before the '@' symbol. For example, if the email address is 'john.doe@example.com', the 'Username' column should contain 'john.doe'. Your function should extract the username from each email address and store it in the new 'Username' column.

In [15]:
import pandas as pd

def extract_username(df):
    df['Username'] = df['Email'].str.split('@').str.get(0)
    return df


In [16]:
# Example DataFrame
data = {'Email': ['john.doe@example.com', 'jane.smith@example.com']}
df = pd.DataFrame(data)

# Extract usernames
df = extract_username(df)

# Print the updated DataFrame
print(df)


                    Email    Username
0    john.doe@example.com    john.doe
1  jane.smith@example.com  jane.smith
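With `str.split('@').str.get(0)`, a value that contains no '@' is kept whole as its own "username". A regex-based sketch marks such rows as missing instead; `extract_username_regex` is a hypothetical name and the test addresses are made up:

```python
import pandas as pd

def extract_username_regex(df):
    # Capture everything before the first '@'; values without an '@'
    # (or missing values) become NaN instead of keeping the whole string
    df['Username'] = df['Email'].str.extract(r'^([^@]+)@', expand=False)
    return df

df = pd.DataFrame({'Email': ['john.doe@example.com', 'not-an-email', None]})
result = extract_username_regex(df)
print(result)
```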


# Q8. You have a Pandas DataFrame df with columns 'A', 'B', and 'C'. Write a Python function that selects all rows where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10. The function should return a new DataFrame that contains only the selected rows.
#### For example, if df contains the following values:
   A  B  C
0  3  5  1
1  8  2  7
2  6  9  4
3  2  3  5
4  9  1  2
#### Your function should select the following rows:
   A  B  C
1  8  2  7
2  6  9  4
4  9  1  2
- Row 2 (A = 6, B = 9) meets both conditions as well, so it belongs in the result alongside rows 1 and 4.

In [17]:
import pandas as pd

def select_rows(df):
    selected_rows = df[(df['A'] > 5) & (df['B'] < 10)]
    return selected_rows


In [18]:
# Example DataFrame
data = {'A': [3, 1, 7, 6, 3, 5, 4],
        'B': [5, 8, 2, 9, 2, 4, 9],
        'C': [1, 2, 3, 4, 5, 1, 2]}
df = pd.DataFrame(data)

# Select rows
selected_df = select_rows(df)

# Print the selected DataFrame
print(selected_df)


   A  B  C
2  7  2  3
3  6  9  4
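Running the same filter on the example table from the question confirms which rows qualify. This sketch writes the condition with `DataFrame.query` and calls `.copy()` so the returned DataFrame is independent of `df`. The name `select_rows_query` is hypothetical:

```python
import pandas as pd

def select_rows_query(df):
    # Same condition as the boolean mask, written as a query string;
    # .copy() makes the result independent of df, so later assignments
    # to it do not trigger SettingWithCopyWarning
    return df.query('A > 5 and B < 10').copy()

# The example table from the question
df = pd.DataFrame({'A': [3, 8, 6, 2, 9],
                   'B': [5, 2, 9, 3, 1],
                   'C': [1, 7, 4, 5, 2]})
result = select_rows_query(df)
print(result)
```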


# Q9. Given a Pandas DataFrame df with a column 'Values', write a Python function to calculate the mean, median, and standard deviation of the values in the 'Values' column.

In [19]:
import pandas as pd

def calculate_statistics(df):
    mean_value = df['Values'].mean()
    median_value = df['Values'].median()
    std_value = df['Values'].std()
    return mean_value, median_value, std_value


In [20]:
# Example DataFrame
data = {'Values': [5, 8, 2, 9, 2, 4, 9]}
df = pd.DataFrame(data)

# Calculate statistics
mean, median, std = calculate_statistics(df)

# Print the results
print("Mean:", mean)
print("Median:", median)
print("Standard Deviation:", std)


Mean: 5.571428571428571
Median: 5.0
Standard Deviation: 3.1014589500826255
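The three statistics can also be computed with a single `agg` call. The sketch also shows why pandas and NumPy can disagree on the standard deviation: pandas' `std()` divides by n − 1 (`ddof=1`, sample standard deviation), while `np.std()` divides by n by default (`ddof=0`, population standard deviation):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Values': [5, 8, 2, 9, 2, 4, 9]})

# One call, returning a Series indexed by statistic name
stats = df['Values'].agg(['mean', 'median', 'std'])
print(stats)

# pandas std() defaults to ddof=1; np.std() defaults to ddof=0
population_std = df['Values'].std(ddof=0)
numpy_std = np.std(df['Values'].to_numpy())
print(population_std, numpy_std)
```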


# Q10. Given a Pandas DataFrame df with a column 'Sales' and a column 'Date', write a Python function to create a new column 'MovingAverage' that contains the moving average of the sales for the past 7 days for each row in the DataFrame. The moving average should be calculated using a window of size 7 and should include the current day.

In [21]:
import pandas as pd

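def calculate_moving_average(df):
    # rolling(window=7) covers the current row and the 6 rows before it, which equals
    # 7 days only when there is exactly one row per consecutive day;
    # min_periods=1 lets the first 6 rows average over the rows available so far
    df['MovingAverage'] = df['Sales'].rolling(window=7, min_periods=1).mean()
    return df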


In [22]:
# Example DataFrame
data = {'Sales': [10, 12, 15, 11, 9, 13, 14, 16, 18, 20, 17, 19]}
dates = pd.date_range(start='2023-01-01', periods=12)
df = pd.DataFrame({'Date': dates, 'Sales': data['Sales']})

# Calculate moving average
df = calculate_moving_average(df)

# Print the updated DataFrame
print(df)


         Date  Sales  MovingAverage
0  2023-01-01     10      10.000000
1  2023-01-02     12      11.000000
2  2023-01-03     15      12.333333
3  2023-01-04     11      12.000000
4  2023-01-05      9      11.400000
5  2023-01-06     13      11.666667
6  2023-01-07     14      12.000000
7  2023-01-08     16      12.857143
8  2023-01-09     18      13.714286
9  2023-01-10     20      14.428571
10 2023-01-11     17      15.285714
11 2023-01-12     19      16.714286
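Because `rolling(window=7)` counts rows, it matches "the past 7 days" only when every day has exactly one row. If dates can be missing, a time-based window follows the calendar instead. The sketch below assumes `Date` holds datetime64 values, and it uses made-up data with a gap. `calculate_moving_average_by_time` is a hypothetical name:

```python
import pandas as pd

def calculate_moving_average_by_time(df):
    # Time-based windows need the dates in ascending order
    df = df.sort_values('Date').reset_index(drop=True)
    sales = df.set_index('Date')['Sales']
    # '7D' covers (current date - 7 days, current date], i.e. today plus the previous 6 days;
    # offset-based windows default to min_periods=1
    df['MovingAverage'] = sales.rolling('7D').mean().to_numpy()
    return df

# Hypothetical data with a gap: no rows for Jan 6 to Jan 9
df = pd.DataFrame({
    'Date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03',
                            '2023-01-04', '2023-01-05', '2023-01-10']),
    'Sales': [10, 20, 30, 40, 50, 60],
})
result = calculate_moving_average_by_time(df)
print(result)
```

For the last row (Jan 10) the time-based window holds only Jan 4, Jan 5, and Jan 10. A 7-row window would instead average all six rows, reaching back to Jan 1.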


# Q11. You have a Pandas DataFrame df with a column 'Date'. Write a Python function that creates a new column 'Weekday' in the DataFrame. The 'Weekday' column should contain the weekday name (e.g. Monday, Tuesday) corresponding to each date in the 'Date' column.
#### For example, if df contains the following values:
        Date
0 2023-01-01
1 2023-01-02
2 2023-01-03
3 2023-01-04
4 2023-01-05
#### Your function should create the following DataFrame:

        Date    Weekday
0 2023-01-01     Sunday
1 2023-01-02     Monday
2 2023-01-03    Tuesday
3 2023-01-04  Wednesday
4 2023-01-05   Thursday
- The function should return the modified DataFrame.

In [23]:
import pandas as pd

def add_weekday_column(df):
    df['Weekday'] = df['Date'].dt.strftime('%A')
    return df


In [24]:
# Example DataFrame
dates = pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05'])
df = pd.DataFrame({'Date': dates})

# Add weekday column
df = add_weekday_column(df)

# Print the updated DataFrame
print(df)


        Date    Weekday
0 2023-01-01     Sunday
1 2023-01-02     Monday
2 2023-01-03    Tuesday
3 2023-01-04  Wednesday
4 2023-01-05   Thursday
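An alternative to `strftime('%A')` is the `dt.day_name()` accessor. It returns English names by default, whereas `strftime` output can depend on the system locale. The sketch also converts the column with `pd.to_datetime` first, in case `Date` holds strings. `add_weekday_column_day_name` is a hypothetical name:

```python
import pandas as pd

def add_weekday_column_day_name(df):
    # Convert first in case 'Date' holds strings rather than datetime64 values
    dates = pd.to_datetime(df['Date'])
    # day_name() returns English weekday names unless a locale is passed
    df['Weekday'] = dates.dt.day_name()
    return df

df = pd.DataFrame({'Date': ['2023-01-01', '2023-01-02', '2023-01-03']})
result = add_weekday_column_day_name(df)
print(result)
```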


# Q12. Given a Pandas DataFrame df with a column 'Date' that contains timestamps, write a Python function to select all rows where the date is between '2023-01-01' and '2023-01-31'.

In [25]:
import pandas as pd

def select_rows_within_date_range(df):
    start_date = pd.to_datetime('2023-01-01')
    # Exclusive upper bound of Feb 1, so timestamps later in the day on Jan 31 are kept
    end_date = pd.to_datetime('2023-02-01')
    selected_rows = df[(df['Date'] >= start_date) & (df['Date'] < end_date)]
    return selected_rows


In [26]:
# Example DataFrame
dates = pd.date_range(start='2023-01-01', end='2023-02-15')
df = pd.DataFrame({'Date': dates})

# Select rows within date range
selected_df = select_rows_within_date_range(df)

# Print the selected DataFrame
print(selected_df)


         Date
0  2023-01-01
1  2023-01-02
2  2023-01-03
3  2023-01-04
4  2023-01-05
5  2023-01-06
6  2023-01-07
7  2023-01-08
8  2023-01-09
9  2023-01-10
10 2023-01-11
11 2023-01-12
12 2023-01-13
13 2023-01-14
14 2023-01-15
15 2023-01-16
16 2023-01-17
17 2023-01-18
18 2023-01-19
19 2023-01-20
20 2023-01-21
21 2023-01-22
22 2023-01-23
23 2023-01-24
24 2023-01-25
25 2023-01-26
26 2023-01-27
27 2023-01-28
28 2023-01-29
29 2023-01-30
30 2023-01-31
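Because the column holds timestamps, an inclusive comparison against '2023-01-31' treats that bound as midnight. Any row later on Jan 31 then falls outside the range. The sketch below uses made-up timestamps to show the effect, plus one fix: normalize each timestamp to its calendar date before calling `Series.between`.

```python
import pandas as pd

df = pd.DataFrame({'Date': pd.to_datetime([
    '2022-12-31 23:59', '2023-01-15 08:00', '2023-01-31 18:30', '2023-02-01 00:00',
])})

start = pd.Timestamp('2023-01-01')
end = pd.Timestamp('2023-01-31')

# Inclusive comparison on full timestamps: 18:30 on Jan 31 is after midnight Jan 31, so it is dropped
naive = df[df['Date'].between(start, end)]

# Compare calendar dates only by normalizing each timestamp to midnight
by_day = df[df['Date'].dt.normalize().between(start, end)]

print(naive)
print(by_day)
```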


# Q13. To use the basic functions of pandas, what is the first and foremost necessary library that needs to be imported?

The first and foremost necessary library that needs to be imported to use the basic functions of pandas is the Pandas library itself. The Pandas library is a powerful data manipulation and analysis tool, and it provides various data structures and functions to work with structured data, such as DataFrames and Series.

- To import the Pandas library, you can use the following import statement:

```python
import pandas as pd
```

- In this import statement, `import pandas` imports the Pandas library, and `as pd` provides an alias "pd" to refer to the Pandas library in your code. This is a common convention used by the Pandas community and makes it easier to refer to Pandas functions and objects throughout your code.

- Once you have imported the Pandas library, you reach its module-level functions and classes through the `pd.` prefix, for example `pd.DataFrame()` to create a DataFrame or `pd.read_csv()` to load a file. The objects these return have their own indexing and methods, so you then access columns with `df['column_name']` and perform data manipulation and analysis with methods such as `df.groupby()` without the prefix.