'''Q1. List any five functions of the pandas library with execution.'''
Five commonly used pandas functions, each with a short example:

1. **`read_csv()`**: This function is used to read data from a CSV file into a DataFrame.
   
   
   ```python
   import pandas as pd

   # Read data from a CSV file into a DataFrame
   df = pd.read_csv('data.csv')
   ```

2. **`head()`**: This function is used to display the first few rows of a DataFrame.

   ```python
   # Display the first 5 rows of the DataFrame
   print(df.head())
   ```

3. **`info()`**: This function is used to print a concise summary of a DataFrame including the data types of each column and the number of non-null values.

   ```python
   # Print a concise summary of the DataFrame (info() prints directly and returns None)
   df.info()
   ```

4. **`describe()`**: This function generates descriptive statistics for numerical columns in the DataFrame, such as count, mean, standard deviation, minimum, maximum, and quartile values.

   ```python
   # Generate descriptive statistics for numerical columns
   print(df.describe())
   ```

5. **`groupby()`**: This function is used to group data in the DataFrame based on one or more columns, and perform operations (e.g., aggregation, transformation) on the grouped data.

   ```python
   # Group data by 'Category' column and calculate the mean of 'Value' column within each group
   grouped_data = df.groupby('Category')['Value'].mean()
   print(grouped_data)
   ```

These functions are commonly used in data analysis tasks with pandas to perform various operations such as data loading, data exploration, and data manipulation.
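
Since `data.csv` is not included here, the snippets above cannot be run as written. The following is a minimal self-contained sketch that chains the same five functions on a small in-memory CSV. The column names `Category` and `Value` match the `groupby()` example; the sample rows and the `csv_text` variable are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for 'data.csv'
csv_text = """Category,Value
A,10
B,20
A,30
B,40
C,50
"""

# read_csv() accepts any file-like object, so StringIO works in place of a path
df = pd.read_csv(io.StringIO(csv_text))

print(df.head(3))      # first three rows
df.info()              # column dtypes and non-null counts (prints directly)
print(df.describe())   # summary statistics for the numeric 'Value' column

# Mean of 'Value' within each 'Category'
group_means = df.groupby('Category')['Value'].mean()
print(group_means)
```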

In [18]:
'''Q2. Given a Pandas DataFrame df with columns 'A', 'B', and 'C', write a Python function to re-index the
DataFrame with a new index that starts from 1 and increments by 2 for each row.'''
import pandas as pd

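data = {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}
df = pd.DataFrame(data)
print(df)

def reindex_dataframe(df):
    # Build an index that starts at 1 and increases by 2 per row: 1, 3, 5, ...
    new_index = [i * 2 + 1 for i in range(len(df))]
    df_reindexed = df.set_index(pd.Index(new_index))
    return df_reindexed

reindex_dataframe(df)

   A  B  C
0  1  4  7
1  2  5  8
2  3  6  9


Unnamed: 0,A,B,C
1,1,4,7
3,2,5,8
5,3,6,9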


In [19]:
'''Q3. You have a Pandas DataFrame df with a column named 'Values'. Write a Python function that
iterates over the DataFrame and calculates the sum of the first three values in the 'Values' column. The
function should print the sum to the console.'''

import pandas as pd

data = {'Values':[10, 20, 30, 40, 50]}
df = pd.DataFrame(data)
df

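def calcSum(df):
    # Iterate over the first three rows and accumulate the 'Values' column
    sum_of_three_nums = 0
    for _, row in df.head(3).iterrows():
        sum_of_three_nums += row['Values']
    # The task asks for the sum to be printed rather than returned
    print(sum_of_three_nums)

calcSum(df)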

60

In [42]:
'''Q4. Given a Pandas DataFrame df with a column 'Text', write a Python function to create a new column
'Word_Count' that contains the number of words in each row of the 'Text' column.'''

import pandas as pd

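data = {'Text': ['hello world', 'pandas makes data analysis easy', 'text']}
df = pd.DataFrame(data)
df

def createColumn(df, columnName):
    # Split each row on whitespace and count the resulting words
    df[columnName] = df['Text'].str.split().str.len()
    return df

createColumn(df, 'Word_Count')

Unnamed: 0,Text,Word_Count
0,hello world,2
1,pandas makes data analysis easy,5
2,text,1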


### Q5. How are DataFrame.size() and DataFrame.shape() different?

`DataFrame.size` and `DataFrame.shape` are both attributes of a pandas DataFrame, but they serve different purposes:

1. **DataFrame.size**: This attribute returns the total number of elements in the DataFrame, which is calculated by multiplying the number of rows by the number of columns. It returns an integer representing the total size of the DataFrame.

   ```python
   import pandas as pd

   # Create a DataFrame
   df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

   # Get the total number of elements in the DataFrame
   total_elements = df.size
   print("Total elements in DataFrame:", total_elements)
   ```

   Output:
   ```
   Total elements in DataFrame: 6
   ```

2. **DataFrame.shape**: This attribute returns a tuple representing the dimensions of the DataFrame, where the first element of the tuple is the number of rows and the second element is the number of columns. It provides a convenient way to get the number of rows and columns of the DataFrame.

   ```python
   import pandas as pd

   # Create a DataFrame
   df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

   # Get the dimensions of the DataFrame
   dimensions = df.shape
   print("Dimensions of DataFrame (rows, columns):", dimensions)
   ```

   Output:
   ```
   Dimensions of DataFrame (rows, columns): (3, 2)
   ```

In summary, `DataFrame.size` returns the total number of elements (rows * columns) in the DataFrame, while `DataFrame.shape` returns a tuple containing the number of rows and columns in the DataFrame.
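
The relationship between the two attributes can be checked directly. This is a small sketch reusing the same example DataFrame; the variable names `n_rows`, `n_cols`, `elements`, and `column` are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

n_rows, n_cols = df.shape   # unpack the (rows, columns) tuple
elements = df.size          # total number of cells

# For a DataFrame, size equals rows * columns
print(elements == n_rows * n_cols)

# A single column is a Series: its shape has one entry and size equals its length
column = df['A']
print(column.shape, column.size)
```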

### Q6. Which function of pandas do we use to read an excel file?
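**`read_excel()`**: This function reads data from an Excel file (`.xls` or `.xlsx`) into a DataFrame, for example `pd.read_excel('data.xlsx', sheet_name=0)`. Reading `.xlsx` files requires an engine such as `openpyxl` to be installed.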
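
As a rough sketch of reading an Excel file with `pd.read_excel()`, the example below writes a small DataFrame to a temporary `.xlsx` file and reads it back. The helper name `roundtrip_excel` and the file name `example.xlsx` are illustrative only. It assumes the optional `openpyxl` package is available and skips the round trip when it is not.

```python
import importlib.util
import os
import tempfile

import pandas as pd

def roundtrip_excel(df):
    """Write df to a temporary .xlsx file and read it back with read_excel()."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.xlsx")  # hypothetical file name
        df.to_excel(path, index=False)
        # sheet_name=0 selects the first sheet in the workbook
        return pd.read_excel(path, sheet_name=0)

original = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

if importlib.util.find_spec("openpyxl") is not None:
    loaded = roundtrip_excel(original)
    print(loaded)
else:
    loaded = None
    print("openpyxl is not installed; skipping the Excel round trip")
```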


In [65]:
'''Q7. You have a Pandas DataFrame df that contains a column named 'Email' that contains email
addresses in the format 'username@domain.com'. Write a Python function that creates a new column
'Username' in df that contains only the username part of each email address.

The username is the part of the email address that appears before the '@' symbol. For example, if the
email address is 'john.doe@example.com', the 'Username' column should contain 'john.doe'. Your
function should extract the username from each email address and store it in the new 'Username'
column.'''

import pandas as pd

data = {'Email':['pardeep.rj90@gmail.com','test@test.com']}

df = pd.DataFrame(data)

def userName(df, columnName, textSeparator):
    # Keep the part of each address before the separator
    usernames = df['Email'].str.split(textSeparator).str[0]
    df[columnName] = usernames
    return df

userName(df, 'Username', '@')

Unnamed: 0,Email,Username
0,pardeep.rj90@gmail.com,pardeep.rj90
1,test@test.com,test


In [43]:
'''Q8. You have a Pandas DataFrame df with columns 'A', 'B', and 'C'. Write a Python function that selects
all rows where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10. The
function should return a new DataFrame that contains only the selected rows.'''

import pandas as pd

data = {'A': [3, 8, 6, 2, 9], 'B': [5, 2, 9, 3, 1], 'C': [1, 7, 4, 5, 2]}
df = pd.DataFrame(data)
# print(df)

def select_rows(df):
    # Apply the conditions to filter the DataFrame
    selected_rows = df[(df['A'] > 5) & (df['B'] < 10)]
    return selected_rows

# Example usage
result = select_rows(df)
print(result)



   A  B  C
1  8  2  7
2  6  9  4
4  9  1  2


In [48]:
'''Q9. Given a Pandas DataFrame df with a column 'Values', write a Python function to calculate the mean,
median, and standard deviation of the values in the 'Values' column.'''

import pandas as pd
data = {'Values':[10,154,258,9899999,52]}
df = pd.DataFrame(data)

def calcMeanMedianDeviation(df):
    mean_value = df['Values'].mean()
    median_value = df['Values'].median()
    std_deviation = df['Values'].std()
    return mean_value, median_value, std_deviation

mean, median, std_deviation = calcMeanMedianDeviation(df)
mean, median, std_deviation

(1980094.6, 154.0, 4427361.154467163)
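
Note that `Series.std()` computes the sample standard deviation by default (`ddof=1`), whereas NumPy's `np.std()` defaults to the population form (`ddof=0`). The sketch below compares the two on a small made-up series; the values are chosen only because the population standard deviation comes out to exactly 2.

```python
import numpy as np
import pandas as pd

# Illustrative data: mean is 5, squared deviations sum to 32
values = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])

sample_std = values.std()               # pandas default: ddof=1 (divide by n - 1)
population_std = values.std(ddof=0)     # divide by n
numpy_std = np.std(values.to_numpy())   # NumPy default: ddof=0

print(sample_std, population_std, numpy_std)
```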

In [50]:
'''Q10. Given a Pandas DataFrame df with a column 'Sales' and a column 'Date', write a Python function to
create a new column 'MovingAverage' that contains the moving average of the sales for the past 7 days
for each row in the DataFrame. The moving average should be calculated using a window of size 7 and
should include the current day.'''

import pandas as pd

def calculate_moving_average(df):
    # Ensure the Date column is in datetime format
    df['Date'] = pd.to_datetime(df['Date'])
    
    # Sort the DataFrame by Date in case it's not sorted
    df = df.sort_values(by='Date')
    
    # Calculate the moving average with a window size of 7, including the current day
    df['MovingAverage'] = df['Sales'].rolling(window=7, min_periods=1).mean()
    
    return df

# Example usage:
data = {
    'Date': [
        '2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04', '2024-01-05', 
        '2024-01-06', '2024-01-07', '2024-01-08', '2024-01-09', '2024-01-10'
    ],
    'Sales': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
}
df = pd.DataFrame(data)

result = calculate_moving_average(df)
print(result)


        Date  Sales  MovingAverage
0 2024-01-01     10           10.0
1 2024-01-02     20           15.0
2 2024-01-03     30           20.0
3 2024-01-04     40           25.0
4 2024-01-05     50           30.0
5 2024-01-06     60           35.0
6 2024-01-07     70           40.0
7 2024-01-08     80           50.0
8 2024-01-09     90           60.0
9 2024-01-10    100           70.0
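
The solution above uses a window of 7 rows, which equals 7 days only when there is exactly one row per day. If dates can be missing, one possible alternative is a time-based window. The sketch below is an assumption about how that could look, not part of the original answer; the function name `calendar_moving_average` and the sample data with a gap are made up for illustration.

```python
import pandas as pd

def calendar_moving_average(df):
    # Work on a sorted copy so the caller's DataFrame is left untouched
    out = df.copy()
    out['Date'] = pd.to_datetime(out['Date'])
    out = out.sort_values('Date')
    # '7D' averages the rows in the 7 calendar days ending at,
    # and including, each row's date
    rolling_mean = out.set_index('Date')['Sales'].rolling('7D').mean()
    out['MovingAverage'] = rolling_mean.to_numpy()
    return out

# Hypothetical data with a gap between 2024-01-03 and 2024-01-10
gappy = pd.DataFrame({
    'Date': ['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-10', '2024-01-11'],
    'Sales': [10, 20, 30, 40, 50],
})

gap_result = calendar_moving_average(gappy)
print(gap_result)
```

With this data, the row for 2024-01-10 averages only itself, because no other sale falls within the preceding 7 days; a 7-row window would have mixed in the early-January values.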


In [59]:
'''Q11. You have a Pandas DataFrame df with a column 'Date'. Write a Python function that creates a new
column 'Weekday' in the DataFrame. The 'Weekday' column should contain the weekday name (e.g.
Monday, Tuesday) corresponding to each date in the 'Date' column.'''
import pandas as pd

def add_weekday_column(df):

    df['Date'] = pd.to_datetime(df['Date'])
    df['Weekday'] = df['Date'].dt.day_name()
    
    return df

data = {'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05']}
df = pd.DataFrame(data)

result = add_weekday_column(df)
print(result)


        Date    Weekday
0 2023-01-01     Sunday
1 2023-01-02     Monday
2 2023-01-03    Tuesday
3 2023-01-04  Wednesday
4 2023-01-05   Thursday


In [61]:
'''Q12. Given a Pandas DataFrame df with a column 'Date' that contains timestamps, write a Python
function to select all rows where the date is between '2023-01-01' and '2023-01-31'.'''

import pandas as pd

def select_date_range(df):
    # Parse the 'Date' column so comparisons are made on timestamps, not strings
    df['Date'] = pd.to_datetime(df['Date'])

    start_date = '2023-01-01'
    end_date = '2023-01-31'

    # Boolean mask for rows whose date falls inside the range (both ends inclusive)
    in_range = (df['Date'] >= start_date) & (df['Date'] <= end_date)
    selected_rows = df[in_range]

    return selected_rows

# Example usage:
data = {
    'Date': [
        '2022-12-31', '2023-01-01', '2023-01-15', '2023-01-31', '2023-02-01'
    ],
    'Value': [10, 20, 30, 40, 50]
}
df = pd.DataFrame(data)

result = select_date_range(df)
print(result)


        Date  Value
1 2023-01-01     20
2 2023-01-15     30
3 2023-01-31     40
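
One caveat when the column holds timestamps with a time of day: a bound given as `'2023-01-31'` is interpreted as midnight at the start of that day, so a row stamped `2023-01-31 18:30:00` would fail a `<=` comparison against it. A minimal sketch of one way to handle this, assuming the `'Date'` column name from the question, is to compare against the start of the following day instead. The function name `select_january_2023` and the sample timestamps are illustrative.

```python
import pandas as pd

def select_january_2023(df, start='2023-01-01', end='2023-01-31'):
    dates = pd.to_datetime(df['Date'])
    start_ts = pd.Timestamp(start)
    # Half-open upper bound: midnight of the day after `end`
    end_exclusive = pd.Timestamp(end) + pd.Timedelta(days=1)
    return df[(dates >= start_ts) & (dates < end_exclusive)]

# Hypothetical timestamps around the range boundaries
stamps = pd.DataFrame({
    'Date': ['2022-12-31 23:59:59', '2023-01-01 00:00:00',
             '2023-01-31 18:30:00', '2023-02-01 00:00:00'],
    'Value': [10, 20, 30, 40],
})

jan_rows = select_january_2023(stamps)
print(jan_rows)
```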


### Q13. To use the basic functions of pandas, what is the first and foremost necessary library that needs to be imported?

To use the basic functions of pandas, the first library that needs to be imported is pandas itself, conventionally with `import pandas as pd`.