### Q1. List any five functions of the pandas library with execution.

#### Five commonly used functions of the pandas library with their execution:

###### pandas.read_csv(): This function reads a CSV file and creates a DataFrame object.

In [2]:
import pandas as pd

# Read a CSV file
df = pd.read_csv('UpdatedResumeDataSet.csv')

# Display the first five rows of the DataFrame
print(df.head())

       Category                                             Resume
0  Data Science  Skills * Programming Languages: Python (pandas...
1  Data Science  Education Details \r\nMay 2013 to May 2017 B.E...
2  Data Science  Areas of Interest Deep Learning, Control Syste...
3  Data Science  Skills • R • Python • SAP HANA • Table...
4  Data Science  Education Details \r\n MCA   YMCAUST,  Faridab...


###### pandas.DataFrame(): This function creates a DataFrame object from a Python dictionary.

In [3]:
import pandas as pd

# Create a dictionary
data = {'Name': ['John', 'Alice', 'Bob'],
        'Age': [30, 25, 35],
        'Country': ['USA', 'Canada', 'Australia']}

# Create a DataFrame from the dictionary
df = pd.DataFrame(data)

# Display the DataFrame
print(df)

    Name  Age    Country
0   John   30        USA
1  Alice   25     Canada
2    Bob   35  Australia


###### pandas.Series(): This function creates a Series object from a Python list.

In [4]:
import pandas as pd

# Create a list
data = [10, 20, 30, 40, 50]

# Create a Series from the list
s = pd.Series(data)

# Display the Series
print(s)

0    10
1    20
2    30
3    40
4    50
dtype: int64


###### pandas.concat(): This function concatenates two or more DataFrames along a specified axis.

In [5]:
import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [4, 5, 6], 'B': [7, 8, 9]})

# Concatenate the DataFrames along the rows
df = pd.concat([df1, df2], axis=0)

# Display the concatenated DataFrame
print(df)

   A  B
0  1  4
1  2  5
2  3  6
0  4  7
1  5  8
2  6  9
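
The output above keeps the original row labels, so 0, 1, and 2 each appear twice. A minimal sketch of one way to get a fresh index, assuming duplicate labels are not wanted (the name `combined` is only illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [4, 5, 6], 'B': [7, 8, 9]})

# ignore_index=True discards the original row labels and renumbers the rows from 0
combined = pd.concat([df1, df2], axis=0, ignore_index=True)

print(combined.index.tolist())
```

Unique labels matter later on, because label-based lookups such as `df.loc[0]` return every row carrying that label.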


###### pandas.DataFrame.drop(): This function drops one or more specified columns or rows from a DataFrame.

In [6]:
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# Drop the 'B' column
df = df.drop('B', axis=1)

# Display the DataFrame
print(df)

   A  C
0  1  7
1  2  8
2  3  9
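
The description mentions rows as well as columns, while the cell above only drops a column. A short sketch of both cases using the `index=` and `columns=` keywords of `drop()` (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# index= targets row labels: drop the rows labelled 0 and 2
rows_dropped = df.drop(index=[0, 2])

# columns= targets column labels: drop 'A' and 'C' in one call
cols_dropped = df.drop(columns=['A', 'C'])

print(rows_dropped)
print(cols_dropped)
```

`drop()` returns a new DataFrame by default, so `df` itself still has all three rows and columns here.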


### Q2. Given a Pandas DataFrame df with columns 'A', 'B', and 'C', write a Python function to re-index the DataFrame with a new index that starts from 1 and increments by 2 for each row.

In [10]:
import pandas as pd

def reindex_df(df):
    # Reset the index of the DataFrame to start from 0
    df.reset_index(drop=True, inplace=True)
    # Set the new index to start from 1 and increment by 2 for each row
    df.index = pd.RangeIndex(start=1, stop=2*len(df), step=2)
    return df

In [11]:
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
reindex_df(df)

   A  B  C
1  1  4  7
3  2  5  8
5  3  6  9
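
Note that `reindex_df` modifies the DataFrame passed to it, because it resets and assigns the index in place. If the caller's DataFrame should stay untouched, a variant along these lines may be preferable (a sketch; `reindex_df_copy` is a hypothetical name, not part of the original solution):

```python
import pandas as pd

def reindex_df_copy(df):
    # Work on a copy so the caller's DataFrame keeps its original index
    out = df.copy()
    # Same index as before: starts at 1 and increases by 2 for each row
    out.index = pd.RangeIndex(start=1, stop=2 * len(out), step=2)
    return out

original = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
reindexed = reindex_df_copy(original)

print(original.index.tolist())
print(reindexed.index.tolist())
```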


### Q3. You have a Pandas DataFrame df with a column named 'Values'. Write a Python function that iterates over the DataFrame and calculates the sum of the first three values in the 'Values' column. The function should print the sum to the console.
### For example, if the 'Values' column of df contains the values [10, 20, 30, 40, 50], your function should calculate and print the sum of the first three values, which is 60.

In [15]:
import pandas as pd

def sum_first_three_values(df):

    # Get the first three values of the 'Values' column
    first_three_values = df['Values'].iloc[:3]
    # Calculate the sum of the first three values
    sum_first_three = sum(first_three_values)
    # Print the sum to the console
    print(f"The sum of the first three values is: {sum_first_three}")

In [23]:
data = {'Name': ['John', 'Alice', 'Bob', 'Salman', 'Shaikh'],
        'Values': [10, 20, 30, 40, 50],
        'Country': ['USA', 'Canada', 'Australia','India','Germany']}
df1 = pd.DataFrame(data)

In [24]:
sum_first_three_values(df1)

The sum of the first three values is: 60
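
The question asks for a function that iterates over the DataFrame, whereas the solution above slices the column. For comparison, here is a sketch with an explicit loop; the function name and the extra `return` are assumptions added for illustration:

```python
import pandas as pd

def sum_first_three_by_iteration(df):
    total = 0
    # enumerate() supplies the position, so the loop can stop after three values
    for position, value in enumerate(df['Values']):
        if position >= 3:
            break
        total += value
    print(f"The sum of the first three values is: {total}")
    return total

sum_first_three_by_iteration(pd.DataFrame({'Values': [10, 20, 30, 40, 50]}))
```

The slicing version is usually the more idiomatic choice in pandas; the loop is shown only because the question mentions iteration.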


### Q4. Given a Pandas DataFrame df with a column 'Text', write a Python function to create a new column 'Word_Count' that contains the number of words in each row of the 'Text' column.

In [25]:
import pandas as pd

def add_word_count_column(df):
    # Split the text in the 'Text' column on any run of whitespace and count the words
    # (split() without arguments avoids counting empty strings for repeated spaces)
    df['Word_Count'] = df['Text'].apply(lambda x: len(str(x).split()))
    return df

In [31]:
data = {'Name': ['John', 'Alice', 'Bob', 'Salman', 'Shaikh'],
        'Values': [10, 20, 30, 40, 50],
        'Text': ['United States of America', 'Canada', 'Great Britain','Republic of India','Germany']}
df2 = pd.DataFrame(data)

In [32]:
add_word_count_column(df2)

     Name  Values                      Text  Word_Count
0    John      10  United States of America           4
1   Alice      20                    Canada           1
2     Bob      30             Great Britain           2
3  Salman      40         Republic of India           3
4  Shaikh      50                   Germany           1
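
The same count can also be computed with the vectorized `.str` accessor instead of `apply`. A minimal sketch, assuming every entry in 'Text' is a string (`add_word_count_vectorized` is a hypothetical name):

```python
import pandas as pd

def add_word_count_vectorized(df):
    # str.split() with no pattern splits on any run of whitespace;
    # str.len() then counts the items in each resulting list
    df['Word_Count'] = df['Text'].str.split().str.len()
    return df

sample = pd.DataFrame({'Text': ['United States of America', 'Canada', 'Great  Britain']})
print(add_word_count_vectorized(sample))
```

The third entry deliberately contains two spaces in a row to show that repeated whitespace does not inflate the count.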


### Q5. How are DataFrame.size() and DataFrame.shape() different?

#### Although the question writes them with parentheses, DataFrame.size and DataFrame.shape are attributes, not methods, so they are accessed without parentheses. Both describe how large a DataFrame is, but they report it differently:

#### DataFrame.size returns the total number of elements (cells) in the DataFrame as a single integer, equal to the number of rows multiplied by the number of columns.

#### DataFrame.shape returns a tuple (number of rows, number of columns), that is, the dimensions of the DataFrame.

In [33]:
import pandas as pd

# Create a DataFrame with 3 rows and 2 columns
data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)

# Get the size of the DataFrame
df_size = df.size

# Get the shape of the DataFrame
df_shape = df.shape

print(f"Size of the DataFrame: {df_size}")
print(f"Shape of the DataFrame: {df_shape}")

Size of the DataFrame: 6
Shape of the DataFrame: (3, 2)
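
The relationship between the two attributes, and what happens when they are called like methods, can be checked directly. A small sketch (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# shape unpacks into (rows, columns); size is their product
rows, cols = df.shape
print(df.size == rows * cols)

# Calling an attribute as if it were a method raises a TypeError
called_ok = True
try:
    df.shape()
except TypeError as exc:
    called_ok = False
    print(f"TypeError: {exc}")
```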


### Q6. Which function of pandas do we use to read an excel file?

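#### To read an Excel file in pandas, we use the read_excel() function. It relies on an optional engine package being installed: openpyxl for .xlsx files and xlrd for legacy .xls files.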

In [36]:
import pandas as pd

# Read the Excel file 'file_example.xls' into a pandas DataFrame
df = pd.read_excel('file_example.xls')

# Display the first five rows of the DataFrame
print(df.head())

   0 First Name  Last Name  Gender        Country  Age        Date    Id
0  1      Dulce      Abril  Female  United States   32  15/10/2017  1562
1  2       Mara  Hashimoto  Female  Great Britain   25  16/08/2016  1582
2  3     Philip       Gent    Male         France   36  21/05/2015  2587
3  4   Kathleen     Hanner  Female  United States   25  15/10/2017  3549
4  5    Nereida    Magwood  Female  United States   58  16/08/2016  2468


### Q7. You have a Pandas DataFrame df with a column named 'Email' that holds email addresses in the format 'username@domain.com'. Write a Python function that creates a new column 'Username' in df that contains only the username part of each email address.
#### The username is the part of the email address that appears before the '@' symbol. For example, if the email address is 'john.doe@example.com', the 'Username' column should contain 'john.doe'. Your function should extract the username from each email address and store it in the new 'Username' column.

In [37]:
import pandas as pd

def extract_username(df):
    # Split the email address in the 'Email' column into username and domain using the '@' symbol
    df['Username'] = df['Email'].apply(lambda x: x.split('@')[0])
    return df

In [40]:
data = {'Name': ['John', 'Alice', 'Bob', 'Salman', 'Shaikh'],
        'Values': [10, 20, 30, 40, 50],
        'Email': ['john.doe@example.com', 'alice23@gmail.com', 'bob4323@gmail.com','salman89743@gmail.com','shaikh42343@gmail.com']}
df3 = pd.DataFrame(data)

In [41]:
extract_username(df3)

     Name  Values                  Email     Username
0    John      10   john.doe@example.com     john.doe
1   Alice      20      alice23@gmail.com      alice23
2     Bob      30      bob4323@gmail.com      bob4323
3  Salman      40  salman89743@gmail.com  salman89743
4  Shaikh      50  shaikh42343@gmail.com  shaikh42343
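
The `apply` with a lambda above raises an error if any entry in 'Email' is missing, because a missing value has no `split` method. A sketch of an alternative built on the `.str` accessor, which passes missing values through (`extract_username_vectorized` and the sample data are hypothetical):

```python
import pandas as pd

def extract_username_vectorized(df):
    # Split once at the first '@' and keep the part before it;
    # missing emails stay missing instead of raising an error
    df['Username'] = df['Email'].str.split('@', n=1).str[0]
    return df

emails = pd.DataFrame({'Email': ['john.doe@example.com', None, 'alice23@gmail.com']})
print(extract_username_vectorized(emails))
```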


### Q8. You have a Pandas DataFrame df with columns 'A', 'B', and 'C'. Write a Python function that selects all rows where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10. The function should return a new DataFrame that contains only the selected rows.
#### For example, if df contains the following values:
#### A B C
#### 0 3 5 1
#### 1 8 2 7
#### 2 6 9 4
#### 3 2 3 5
#### 4 9 1 2

#### Your function should select the following rows (row 2 qualifies as well, since 6 > 5 and 9 < 10):
#### A B C
#### 1 8 2 7
#### 2 6 9 4
#### 4 9 1 2

In [42]:
import pandas as pd

def select_rows(df):

    # Use boolean indexing to select rows where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10
    selected_rows = df[(df['A'] > 5) & (df['B'] < 10)]
    return selected_rows

In [43]:
df5 = pd.DataFrame({'A': [3,8,6,2,9], 'B': [5,2,9,3,1], 'C': [1,7,4,5,2]})

In [45]:
df_new = select_rows(df5)

In [46]:
df_new

   A  B  C
1  8  2  7
2  6  9  4
4  9  1  2
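
The same filter can be written with `DataFrame.query()`, which some readers find easier to scan than chained boolean masks. A minimal sketch (`select_rows_query` is a hypothetical name):

```python
import pandas as pd

def select_rows_query(df):
    # The query string refers to column names directly
    return df.query('A > 5 and B < 10')

df5 = pd.DataFrame({'A': [3, 8, 6, 2, 9], 'B': [5, 2, 9, 3, 1], 'C': [1, 7, 4, 5, 2]})
selected = select_rows_query(df5)
print(selected)
```

Both forms return a new DataFrame and leave the input unchanged.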


### Q9. Given a Pandas DataFrame df with a column 'Values', write a Python function to calculate the mean, median, and standard deviation of the values in the 'Values' column.

In [47]:
data = {'Name': ['John', 'Alice', 'Bob', 'Salman', 'Shaikh'],
        'Values': [10, 20, 30, 40, 50],
        'Email': ['john.doe@example.com', 'alice23@gmail.com', 'bob4323@gmail.com','salman89743@gmail.com','shaikh42343@gmail.com']}

In [75]:
import pandas as pd

def calculate_statistics(df):

    values = df['Values']
    mean = values.mean()
    median = values.median()
    std = values.std()
    return mean, median, std

In [77]:
df6 = pd.DataFrame(data)
mean, median, std = calculate_statistics(df6)

In [78]:
print("Mean:",mean)
print("Median:",median)
print("Standard Deviation:",std)

Mean: 30.0
Median: 30.0
Standard Deviation: 15.811388300841896
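
The value 15.81 is the sample standard deviation, because `Series.std()` uses `ddof=1` by default (it divides by n - 1). If the population standard deviation is wanted instead, `ddof=0` can be passed. A short sketch using the same five values:

```python
import pandas as pd

values = pd.Series([10, 20, 30, 40, 50])

# Default: sample standard deviation (divides by n - 1)
sample_std = values.std()

# ddof=0: population standard deviation (divides by n)
population_std = values.std(ddof=0)

print("Sample std:", sample_std)
print("Population std:", population_std)
```

Here the squared deviations sum to 1000, so the sample figure is the square root of 1000 / 4 = 250 and the population figure is the square root of 1000 / 5 = 200.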


### Q10. Given a Pandas DataFrame df with a column 'Sales' and a column 'Date', write a Python function to create a new column 'MovingAverage' that contains the moving average of the sales for the past 7 days for each row in the DataFrame. The moving average should be calculated using a window of size 7 and should include the current day.

In [80]:
import pandas as pd

def calculate_moving_average(df):
    
    # Sort the DataFrame by date
    df = df.sort_values('Date')

    # Calculate the moving average using a rolling window of size 7
    ma = df['Sales'].rolling(window=7, min_periods=1).mean()

    # Create a new DataFrame with the original columns and the new 'MovingAverage' column
    new_df = pd.DataFrame({'Date': df['Date'], 'Sales': df['Sales'], 'MovingAverage': ma})

    return new_df

In [82]:
df_order = pd.read_excel('OrdersData.xlsx')

In [84]:
df_order_new = calculate_moving_average(df_order)

In [85]:
df_order_new

           Date    Sales  MovingAverage
7980 2014-01-03   16.448      16.448000
739  2014-01-04   11.784      14.116000
740  2014-01-04  272.736     100.322667
741  2014-01-04    3.540      76.127000
1759 2014-01-05   19.536      64.808800
...         ...      ...            ...
5091 2017-12-30    3.024     110.107143
908  2017-12-30   52.776     114.846571
907  2017-12-30   90.930      84.640000
1296 2017-12-30   13.904      56.669143
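
`rolling(window=7)` averages over the last 7 rows, not the last 7 calendar days, which differ whenever dates repeat (as in the output above) or are missing. If a window measured in days is intended, a time-based offset such as `'7D'` is one option. A sketch under that assumption, with a hypothetical function name and made-up sample data:

```python
import pandas as pd

def calculate_moving_average_7d(df):
    # Work on a copy, make sure 'Date' is datetime, and sort chronologically
    out = df.copy()
    out['Date'] = pd.to_datetime(out['Date'])
    out = out.sort_values('Date')
    # '7D' averages the rows whose date lies in the 7 days ending at the current row's date
    out['MovingAverage'] = out.rolling('7D', on='Date')['Sales'].mean()
    return out

sales = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-10'],
    'Sales': [10.0, 20.0, 30.0, 40.0],
})
result = calculate_moving_average_7d(sales)
print(result)
```

In this sample the last row is a week after the previous one, so its 7-day average covers only its own sale, whereas a 7-row window would have averaged all four rows.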


### Q11. You have a Pandas DataFrame df with a column 'Date'. Write a Python function that creates a new column 'Weekday' in the DataFrame. The 'Weekday' column should contain the weekday name (e.g. Monday, Tuesday) corresponding to each date in the 'Date' column. For example, if df contains the following values:
#### Date
#### 0 2023-01-01
#### 1 2023-01-02
#### 2 2023-01-03
#### 3 2023-01-04
#### 4 2023-01-05
#### Your function should create the following DataFrame:

#### Date Weekday
#### 0 2023-01-01 Sunday
#### 1 2023-01-02 Monday
#### 2 2023-01-03 Tuesday
#### 3 2023-01-04 Wednesday
#### 4 2023-01-05 Thursday
#### The function should return the modified DataFrame.

In [86]:
import pandas as pd

def add_weekday_column(df):
    # Convert 'Date' column to datetime if necessary
    if not pd.api.types.is_datetime64_any_dtype(df['Date']):
        df['Date'] = pd.to_datetime(df['Date'])
    
    # Extract weekday name from 'Date' column and create new column 'Weekday'
    df['Weekday'] = df['Date'].dt.day_name()
    
    return df

In [89]:
df = pd.DataFrame({'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05']})
df = add_weekday_column(df)
print(df)

        Date    Weekday
0 2023-01-01     Sunday
1 2023-01-02     Monday
2 2023-01-03    Tuesday
3 2023-01-04  Wednesday
4 2023-01-05   Thursday


### Q12. Given a Pandas DataFrame df with a column 'Date' that contains timestamps, write a Python function to select all rows where the date is between '2023-01-01' and '2023-01-31'.

In [90]:
import pandas as pd

def select_rows_between_dates(df):
    # Convert 'Date' column to datetime
    df['Date'] = pd.to_datetime(df['Date'])

    # Select rows between '2023-01-01' and '2023-01-31'
    selected_rows = df.loc[(df['Date'] >= '2023-01-01') & (df['Date'] <= '2023-01-31')]

    return selected_rows

In [98]:
data = {'Name': ['John', 'Alice', 'Bob', 'Salman', 'Shaikh'],
        'Values': [10, 20, 30, 40, 50],
        'Email': ['john.doe@example.com', 'alice23@gmail.com', 'bob4323@gmail.com','salman89743@gmail.com','shaikh42343@gmail.com'],
        'Date': ['2023-01-01', '2021-01-02', '2023-01-03', '2023-01-04', '2022-01-05']
       }
dfdate = pd.DataFrame(data)
print(dfdate)
print("\n########################__After Running Function__####################\n")
df_rows = select_rows_between_dates(dfdate)
print(df_rows)

     Name  Values                  Email        Date
0    John      10   john.doe@example.com  2023-01-01
1   Alice      20      alice23@gmail.com  2021-01-02
2     Bob      30      bob4323@gmail.com  2023-01-03
3  Salman      40  salman89743@gmail.com  2023-01-04
4  Shaikh      50  shaikh42343@gmail.com  2022-01-05

########################__After Running Function__####################

     Name  Values                  Email       Date
0    John      10   john.doe@example.com 2023-01-01
2     Bob      30      bob4323@gmail.com 2023-01-03
3  Salman      40  salman89743@gmail.com 2023-01-04
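
When the 'Date' column holds timestamps with a time of day, the comparison `<= '2023-01-31'` stops at midnight at the start of 31 January and excludes anything later that day. One way around this is a half-open interval that ends before 1 February. A sketch with a hypothetical function name and sample data, which also leaves the input DataFrame unmodified:

```python
import pandas as pd

def select_january_2023(df):
    # Parse into a separate Series so the caller's 'Date' column is not overwritten
    dates = pd.to_datetime(df['Date'])
    # Half-open interval: from 1 January inclusive up to, but not including, 1 February
    mask = (dates >= pd.Timestamp('2023-01-01')) & (dates < pd.Timestamp('2023-02-01'))
    return df.loc[mask]

stamps = pd.DataFrame({'Date': ['2023-01-31 18:30:00',
                                '2023-02-01 00:00:00',
                                '2022-12-31 23:59:59']})
january = select_january_2023(stamps)
print(january)
```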


### Q13. To use the basic functions of pandas, what is the first and foremost necessary library that needs to be imported?

#### To use the basic functions of pandas, the pandas library itself must be imported first. The conventional import statement is:

In [99]:
import pandas as pd

###### The "pd" alias is often used in the pandas community as a shorthand for "pandas." By importing the pandas library, you gain access to all of the basic functions and data structures that pandas provides, such as Series, DataFrame, and read_csv().