## Introduction to Functions

Definition: A function is a block of organized, reusable code that performs a single, well-defined task. Functions make a program more modular and let you reuse the same logic without rewriting it.


In [None]:
"""
def function_name(parameters):
    '''
    Docstring explaining the function.
    '''
    # Function body
    return result
"""


## Key Concepts
1. Function Definition and Calling:
- Define a function using the def keyword.
- Call a function using its name followed by parentheses.
2. Parameters and Arguments:
- Parameters: Variables listed inside the parentheses in the function definition.
- Arguments: Values passed to the function when it is called.
3. Return Statement:
- The return statement exits the function and sends a value (or None, if no value is given) back to the place from where it was called.
4. Default Arguments:
- Function arguments can have default values.
5. Variable Scope:
- Variables defined inside a function are local to that function.
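
A minimal sketch of concepts 1, 2, 3, and 5 above. The names `rectangle_area`, `log_message`, `width`, and `height` are made up for illustration only.

```python
def rectangle_area(width, height):      # width and height are parameters
    area = width * height               # area is local to this function
    return area                         # hand the result back to the caller

def log_message(text):
    print(text)                         # no return statement here

result = rectangle_area(3, 4)           # 3 and 4 are arguments
print(result)                           # 12

# A function without a return statement returns None
print(log_message("done") is None)      # prints "done", then True

# 'area' only exists inside rectangle_area, so this lookup fails
try:
    print(area)
except NameError as error:
    print(f"NameError: {error}")
```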

In [5]:
# Function with default Arguments
def greet(name="World"):
    """
    Function to greet a person with a default name.
    """
    return f"Hello, {name}!"

# Calling the function
print(greet())
print(greet("Alice"))

Hello, World!
Hello, Alice!


In [9]:
# Function Returning Multiple Values
def arithmetic_operations(a, b):
    """
    Function to perform arithmetic operations.
    """
    addition = a + b
    subtraction = a - b
    multiplication = a * b
    division = a / b if b != 0 else None  # None instead of a ZeroDivisionError
    return addition, subtraction, multiplication, division

# Calling the function
add, sub, mul, div = arithmetic_operations(10, 5)
print(f"Add: {add}, Subtract: {sub}, Multiply: {mul}, Divide: {div}")

Add: 15, Subtract: 5, Multiply: 50, Divide: 2.0


## Functions in Data Science

In [2]:
# Reading Data
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder
def read_data(file_path):
    """
    Function to read CSV data.
    """
    column_names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']
    return pd.read_csv(file_path, header=None, names=column_names)
                    
# Example usage
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
df = read_data(url)
print(df.head())


   sepal_length  sepal_width  petal_length  petal_width      species
0           5.1          3.5           1.4          0.2  Iris-setosa
1           4.9          3.0           1.4          0.2  Iris-setosa
2           4.7          3.2           1.3          0.2  Iris-setosa
3           4.6          3.1           1.5          0.2  Iris-setosa
4           5.0          3.6           1.4          0.2  Iris-setosa


In [3]:
# Data Preprocessing
def preprocess_data(df):
    """
    Function to preprocess data.
    """
    # Encode the target variable
    label_encoder = LabelEncoder()
    df['species'] = label_encoder.fit_transform(df['species'])
    return df
    
# Preprocess the data
df_clean = preprocess_data(df)
print(df_clean.head())

   sepal_length  sepal_width  petal_length  petal_width  species
0           5.1          3.5           1.4          0.2        0
1           4.9          3.0           1.4          0.2        0
2           4.7          3.2           1.3          0.2        0
3           4.6          3.1           1.5          0.2        0
4           5.0          3.6           1.4          0.2        0


In [4]:
# Model Training
def train_model(X, y):
    """
    Function to train a logistic regression model.
    """
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    model = LogisticRegression(max_iter=200)
    model.fit(X_train, y_train)
    return model, X_test, y_test
    
# Example usage
X = df_clean.drop("species", axis=1)
y = df_clean["species"]
model, X_test, y_test = train_model(X, y)
print(f"Model accuracy: {model.score(X_test, y_test)}")

Model accuracy: 1.0


## Exercise
A. Basic Function Creation:
1. Write a function square that takes a number and returns its square.
2. Write a function cube that takes a number and returns its cube.
   
B. Data Manipulation Functions:

3.  Write a function filter_above_threshold that takes a DataFrame and a threshold, and returns a DataFrame with rows where a specified
column's value is above the threshold.
4. Write a function normalize_column that takes a DataFrame and a column name, and returns the DataFrame with the specified column
normalized (values between 0 and 1).

C. Model Evaluation:

5. Write a function evaluate_model that takes a model, test features, and test labels, and returns the model's performance metrics (e.g.,
accuracy, precision, recall).


In [5]:
#A1:
def square(number):
    return number * number

print(square(6))

36


In [9]:
#A2:
def cube(number):

    answer= pow(number,3)
    return answer

print(cube(2))

8


In [10]:
#B3:
def filter_above_threshold(df,column, threshold):
    
    filter_df=df[df[column]>threshold]
    return filter_df
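
One way to try `filter_above_threshold` is on a small, made-up DataFrame. The function is repeated here so the snippet runs on its own; the `scores` data is hypothetical.

```python
import pandas as pd

def filter_above_threshold(df, column, threshold):
    # Keep only the rows whose value in `column` is strictly greater than `threshold`
    return df[df[column] > threshold]

# Hypothetical sample data
scores = pd.DataFrame({"Name": ["Ann", "Ben", "Cy"], "Score": [55, 72, 90]})
high_scores = filter_above_threshold(scores, "Score", 70)
print(high_scores)
```

The original `scores` DataFrame is not changed; boolean indexing returns a new DataFrame.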
 

In [13]:
#B4:
# normalized_value = (val - min) / (max - min)

def normalize_column(df,column_name):
    column_min=df[column_name].min()
    column_max=df[column_name].max()

    df[column_name]=(df[column_name]-column_min)/(column_max-column_min)
    return df
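
Min-max normalization divides by `max - min`, which is 0 when every value in the column is the same. The sketch below is one possible variant that works on a copy and guards against that case. The name `normalize_column_safe` is hypothetical, and mapping a constant column to 0.0 is an assumption, not the only reasonable choice.

```python
import pandas as pd

def normalize_column_safe(df, column_name):
    """Min-max scale one column on a copy of the DataFrame."""
    result = df.copy()                       # leave the caller's DataFrame untouched
    column_min = result[column_name].min()
    column_max = result[column_name].max()
    value_range = column_max - column_min
    if value_range == 0:
        # Assumption: map a constant column to 0.0 instead of dividing by zero
        result[column_name] = 0.0
    else:
        result[column_name] = (result[column_name] - column_min) / value_range
    return result

# Hypothetical sample data
data = pd.DataFrame({"height": [150, 160, 170], "flag": [1, 1, 1]})
print(normalize_column_safe(data, "height"))
print(normalize_column_safe(data, "flag"))
```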


In [None]:
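#C5: GO OVER THIS
from sklearn.metrics import accuracy_score, precision_score, recall_score

def evaluate_model(model, test_features, test_labels, average="macro"):  # "macro" averaging is an assumption
    predictions = model.predict(test_features)
    return {"accuracy": accuracy_score(test_labels, predictions),
            "precision": precision_score(test_labels, predictions, average=average),
            "recall": recall_score(test_labels, predictions, average=average)}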

## Sample Questions

D. Theoretical Questions:

6. What is the difference between a parameter and an argument?
7. Explain the concept of variable scope with examples.

E. Coding Questions:

8. Implement a function that calculates the factorial of a number using recursion.
9. Write a function that takes a list of numbers and returns a dictionary with the mean, median, and mode of the list

F. Data Science Specific:

10. Write a function to compute the Root Mean Squared Error (RMSE) for a set of predictions and actual values.
11. Implement a function to split a DataFrame into training and testing sets with a specified ratio.


#### D6 Answer:
- A parameter is a variable named in the function definition; it is a placeholder for a value the function expects to receive.
- An argument is the actual value passed to the function when it is called.
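
A short sketch of the distinction. The function `power` and its parameter names are made up for illustration.

```python
def power(base, exponent=2):        # base and exponent are parameters
    return base ** exponent

print(power(5))                     # 5 is the argument bound to base -> 25
print(power(2, 10))                 # two positional arguments -> 1024
print(power(exponent=3, base=2))    # keyword arguments, order does not matter -> 8
```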

#### D7 Answer:
- Variable scope is the part of your code where a variable can be used or accessed; it marks the boundaries within which the variable exists. The examples below cover local scope, nonlocal (enclosing) scope, and global scope.

In [20]:
#D7 ex)

#Local Scope: variable created inside of a function, which is only able 
#to be used inside the function

def myfunc():
  x = 300
  print(x)

myfunc()
# print(x)  # This would cause an error because x is not accessible here



#Nonlocal scope: the nonlocal keyword lets a nested function rebind a variable that belongs to the enclosing function

def outer_func():
    outer_var = "Hi"

    def inner_func():
        nonlocal outer_var
        outer_var = "Hello"

    inner_func()
    print(outer_var)

outer_func()
#Output: Hello



#Global Scope: variables defined at the top level of the file, readable from any scope, global and local

x = 300

def myfunc():
  print(x)

myfunc()
print(x)

#Output: 
#300
#300

300
Hello
300
300


In [32]:
#E8:

def factorial(number):
    if number == 0 or number == 1:
        return 1
    else:
        return number * factorial(number - 1)
  

print(factorial(8))


40320
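
The recursive version never reaches its base case for a negative number, and very large inputs can hit Python's recursion limit. As a sketch of one alternative, the loop-based variant below rejects invalid input first. The name `factorial_checked` is hypothetical; `math.factorial` from the standard library is used only as a cross-check.

```python
import math

def factorial_checked(number):
    """Iterative factorial that rejects invalid input."""
    if not isinstance(number, int) or number < 0:
        raise ValueError("number must be a non-negative integer")
    result = 1
    for factor in range(2, number + 1):
        result *= factor
    return result

print(factorial_checked(8))                        # 40320
print(factorial_checked(8) == math.factorial(8))   # True
```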


In [75]:
#E9:

import numpy as np

def calculate_stats(numbers):
    # Calculate Mean
    mean = np.mean(numbers)
    
    # Calculate Median
    median = np.median(numbers)
    
    # Calculate Mode (np.bincount only accepts non-negative integers)
    mode = int(np.bincount(numbers).argmax())
    
    # Create and return dictionary with results
    stats = {
        'mean': float(mean),
        'median': float(median),
        'mode': mode
    }
    
    return stats

# Named 'numbers' so the built-in list type is not shadowed
numbers = [1, 2, 3, 4, 5, 5, 6, 6, 6, 7]
stats = calculate_stats(numbers)
print(stats)
    

{'mean': 4.5, 'median': 5.0, 'mode': 6}
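
`np.bincount` only accepts non-negative integers, so the mode step above fails for negative numbers or floats. One possible alternative is the standard-library `statistics` module, sketched below. The name `calculate_stats_stdlib` is hypothetical.

```python
import statistics

def calculate_stats_stdlib(values):
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # In Python 3.8+, ties resolve to the first most-common value encountered
        "mode": statistics.mode(values),
    }

print(calculate_stats_stdlib([1, 2, 3, 4, 5, 5, 6, 6, 6, 7]))
print(calculate_stats_stdlib([-1.5, -1.5, 2.0]))   # negative floats work here
```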


In [85]:
#F10:

def rmse(predict, actual):
    n=len(predict)
    square_error=0

    #sum of squared errors 
    for i in range(n): 
        square_error +=(predict[i]-actual[i])**2

    mean_square_error=square_error/n

    rmse_val=mean_square_error**0.5

    return rmse_val 
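
A usage sketch for RMSE, written compactly with `zip` so it runs on its own. The input check and the sample values are assumptions added for illustration.

```python
import math

def rmse(predict, actual):
    # Assumption: treat empty or mismatched inputs as an error
    if len(predict) != len(actual) or len(predict) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predict, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

predictions = [2, 4, 6]
actuals = [1, 4, 8]
# Errors are 1, 0, -2, so the result is sqrt(5/3), about 1.29
print(rmse(predictions, actuals))
```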



In [None]:
#F11:

#ASK
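
F11 is left open above. A minimal sketch of one way to approach it with plain pandas, assuming the DataFrame has a unique index (needed for `drop` to remove exactly the sampled rows). The name `split_dataframe` and the parameters `train_ratio` and `seed` are hypothetical.

```python
import pandas as pd

def split_dataframe(df, train_ratio=0.8, seed=42):
    """Randomly split a DataFrame into training and testing parts."""
    if not 0 < train_ratio < 1:
        raise ValueError("train_ratio must be between 0 and 1")
    # Sample the training rows, then keep everything else for testing
    train_df = df.sample(frac=train_ratio, random_state=seed)
    test_df = df.drop(train_df.index)
    return train_df, test_df

# Hypothetical sample data
sample_df = pd.DataFrame({"value": range(10)})
train_df, test_df = split_dataframe(sample_df, train_ratio=0.8)
print(len(train_df), len(test_df))
```

`train_test_split` from scikit-learn, already used in the model training cell, also accepts DataFrames and is another option.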

## Real-World Questions on Functions


In [89]:
# Question 1: Calculate the Monthly Sales Average
""" You are given a dictionary where the keys are product names and the values are lists of monthly
sales figures for each product. Write a function to calculate the average monthly sales for each
product. """

import numpy as np

# Sample data
sales_data = {
 'Product_A': [150, 200, 250, 300],
 'Product_B': [400, 500, 600, 700],
 'Product_C': [100, 150, 200, 250]
}


def avg_monthly_sales(sales_data):
    # Use the parameter, not the global variable, so any dictionary can be passed in
    averages = {}
    for product, sales in sales_data.items():
        averages[product] = float(np.mean(sales))
    return averages
        
# Example usage
averages = avg_monthly_sales(sales_data)
print(averages)


{'Product_A': 225.0, 'Product_B': 550.0, 'Product_C': 175.0}


In [90]:
# Question 2: Count Unique Values in a DataFrame Column
"""Given a DataFrame, write a function to count the number of unique values in a specified column."""

import pandas as pd

# Sample data
data = {
 'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Alice'],
 'Age': [25, 30, 35, 40, 25]
}
df = pd.DataFrame(data)

def count_unique(df,column):
    return df[column].nunique()

# Example usage
unique_names_count = count_unique(df, 'Name')
print(unique_names_count)


4


In [97]:
# Question 3: Normalize Data in a DataFrame

""" Write a function to normalize the data in a DataFrame such that each value
in a column is scaled to a range of 0 to 1."""

import pandas as pd
import numpy as np

# Sample data
data = {
 'A': [1, 2, 3, 4, 5],
 'B': [10, 20, 30, 40, 50]
}
df = pd.DataFrame(data)

def normalize_data(df):
    result=df.copy()
    for column in result.columns:
        min_value=result[column].min()
        max_value=result[column].max()
        result[column]=((result[column]-min_value)/(max_value-min_value))
    return result

#Example usage:
normalized_df = normalize_data(df)
print(normalized_df)      

      A     B
0  0.00  0.00
1  0.25  0.25
2  0.50  0.50
3  0.75  0.75
4  1.00  1.00


In [99]:
# Question 4: Find Missing Values

""" Given a DataFrame, write a function to find and return the number of
missing values in each column."""

import pandas as pd

# Sample data
data = {
 'A': [1, 2, None, 4, 5],
 'B': [None, 2, 3, 4, None],
 'C': [1, None, 3, None, 5]
}
df = pd.DataFrame(data)

def find_missing(df):
    return df.isnull().sum()

#Example usage
missing_counts = find_missing(df)
print(missing_counts)


A    1
B    2
C    2
dtype: int64


In [116]:
# Question 5: Selecting Specific Rows and Columns

""" Given a DataFrame, how do you select rows where a specific column's
value is greater than a given threshold and only select certain columns?"""

import pandas as pd
# Sample data

data = {
 'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
 'Age': [24, 27, 22, 32, 29],
 'Score': [85, 95, 70, 60, 80]
}
df = pd.DataFrame(data)


def select_rows_and_columns(df, threshold, columns):
    """
    Function to select rows where 'Age' is greater than the threshold
    and select specified columns.
    """
    specified_data = df[df['Age'] > threshold]
    return specified_data[columns]



#Example Usage
filtered_data=select_rows_and_columns(df, 25, ['Name', 'Age'])
print(filtered_data)



    Name  Age
1    Bob   27
3  David   32
4    Eve   29


In [117]:
# Question 6: Grouping and Aggregation

"""How do you group a DataFrame by a column and compute the sum of another
column?"""

import pandas as pd
# Sample data

data = {
 'Department': ['HR', 'IT', 'HR', 'IT', 'Finance'],
 'Salary': [50000, 60000, 45000, 65000, 70000]
}
df = pd.DataFrame(data)

def group_and_aggregate(df):
    """
    Function to group by 'Department' and compute the sum of 'Salary'.
    """
    return df.groupby('Department')['Salary'].sum().reset_index()
    
# Example usage
result = group_and_aggregate(df)
print(result)


  Department  Salary
0    Finance   70000
1         HR   95000
2         IT  125000


## Pandas Questions
- Question 1: Filter DataFrame by Multiple Conditions: Given a DataFrame with columns Name, Age, and Score, write a function to filter rows
where Age is greater than 25 and Score is greater than 80.
- Question 2: Pivot Table Creation: Given a DataFrame with columns Department, Employee, and Salary, write a function to create a pivot
table that shows the average salary for each department.
- Question 3: Merge DataFrames: Given two DataFrames, df1 and df2, which both have a common column ID, write a function to merge
these DataFrames on the ID column and return the merged DataFrame.
- Question 4: Handle Missing Values: Write a function to fill missing values in a DataFrame. If the column is numerical, fill with the mean of the
column. If the column is categorical, fill with the mode of the column.
- Question 5: Calculate Rolling Mean: Given a DataFrame with a time series column Value, write a function to calculate the rolling mean with
a window size of 3 and add it as a new column to the DataFrame.


In [138]:
#Q1:

import pandas as pd

data={
    'Name': ['Sam','Paul','Joe'],
    'Age':[25,29,30],
    'Score': [80,82,90]
}

df=pd.DataFrame(data)

def filter_df(df):
    filtered=df[(df['Age']>25) & (df['Score']>80)]
    return filtered

answer=filter_df(df)
print(answer)
    

   Name  Age  Score
1  Paul   29     82
2   Joe   30     90


In [139]:
#Q2:

import pandas as pd

data={
'Department': ['HR', 'IT', 'HR', 'IT', 'Finance'],
'Employee':['Alice' , 'Bob', 'Joe', 'Lucy', 'Steven'],
'Salary': [50000, 60000, 45000, 65000, 70000]
}

df=pd.DataFrame(data)

def avg_pivot(df):
    pivot_table=pd.pivot_table(df, values='Salary', index='Department', aggfunc='mean')
    return pivot_table

answer_pivot=avg_pivot(df)
print(answer_pivot)
    

             Salary
Department         
Finance     70000.0
HR          47500.0
IT          62500.0


In [137]:
#Q3:

data1 = {
    'ID': [1, 2, 3],
    'Name': ['Alice', 'Bob', 'Charlie']
}
df1 = pd.DataFrame(data1)

# Sample data for df2
data2 = {
    'ID': [1, 2, 4],
    'Score': [80, 85, 90],
}
df2 = pd.DataFrame(data2)

def merge_dataframes(df1, df2):
    """
    Function to merge two DataFrames on the 'ID' column.
    """
    merged_df = pd.merge(df1, df2, on='ID', how='inner')
    return merged_df

# Example usage
result = merge_dataframes(df1, df2)
print(result)

   ID   Name  Score
0   1  Alice     80
1   2    Bob     85


In [151]:
#Q4:

import pandas as pd

data={
    'Name': ['Sam','Paul',None],
    'Age':[25,None,30],
    'Score': [80,82,90]
}

df=pd.DataFrame(data)

def fill_missing(df):
    for column in df.columns:
        if df[column].isnull().any():
            # A numeric column with missing values is stored as float64,
            # so check for any numeric dtype instead of only 'int64'
            if pd.api.types.is_numeric_dtype(df[column]):  # Numerical columns
                df[column] = df[column].fillna(df[column].mean())
            else:  # Categorical columns
                df[column] = df[column].fillna(df[column].mode()[0])
    return df


filled_data=fill_missing(df)
print(filled_data)

   Name   Age  Score
0   Sam  25.0     80
1  Paul  27.5     82
2  Paul  30.0     90


In [152]:
#Q5:

data = {
    'Date': pd.date_range(start='2023-01-01', periods=10, freq='D'),
    'Value': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
}

df = pd.DataFrame(data)

def add_rolling_mean(df):
    """
    Function to calculate the rolling mean with a window size of 3
    and add it as a new column to the DataFrame.
    """
    df['RollingMean'] = df['Value'].rolling(window=3).mean()
    return df

# Example usage
result = add_rolling_mean(df)
print(result)

        Date  Value  RollingMean
0 2023-01-01     10          NaN
1 2023-01-02     20          NaN
2 2023-01-03     30         20.0
3 2023-01-04     40         30.0
4 2023-01-05     50         40.0
5 2023-01-06     60         50.0
6 2023-01-07     70         60.0
7 2023-01-08     80         70.0
8 2023-01-09     90         80.0
9 2023-01-10    100         90.0


## Lists Questions

- Question 1: Chunk a List: Write a function to split a list into chunks of a given size. For example, given the list [1, 2, 3, 4, 5, 6, 7, 8, 9] and
chunk size 3, the function should return [[1, 2, 3], [4, 5, 6], [7, 8, 9]].
- Question 2: List Intersection: Write a function to find the intersection of two lists, returning a list of elements that are present in both lists.
- Question 3: Rotate List: Write a function to rotate a list n positions to the left. For example, rotating [1, 2, 3, 4, 5] by 2 positions should result
in [3, 4, 5, 1, 2].
- Question 4: Find Duplicates: Write a function to find all duplicate elements in a list. The function should return a list of duplicates.
- Question 5: Cumulative Sum: Write a function to compute the cumulative sum of a list. For example, given the list [1, 2, 3, 4], the function
should return [1, 3, 6, 10].

In [156]:
#Q1:

# Named 'numbers' so the built-in list type is not shadowed
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9]

def split(input_list, chunk_size):
    return [input_list[i:i+chunk_size] for i in range(0, len(input_list), chunk_size)]

final_list = split(numbers, 3)
print(final_list)


[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
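
When the list length is not a multiple of the chunk size, the slicing approach leaves a shorter final chunk. A chunk size of 0 raises an error inside `range`, and a negative size silently returns an empty list. The sketch below adds an explicit check; the name `chunk_list` is hypothetical.

```python
def chunk_list(items, chunk_size):
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]

print(chunk_list([1, 2, 3, 4, 5, 6, 7], 3))   # [[1, 2, 3], [4, 5, 6], [7]]
print(chunk_list([], 3))                       # []
```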


In [164]:
#Q2:

def find_intersection(list1, list2):

    # Using list comprehension to find common elements
    intersection = [value for value in list1 if value in list2]
    
    return intersection

# Example usage
list1 = [1, 2, 3, 4, 5]
list2 = [4, 5, 6, 7, 8]
final_intersect_list = find_intersection(list1, list2)
print(final_intersect_list)

[4, 5]
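
`value in list2` scans list2 for every element of list1, which gets slow for long lists. A possible variant, assuming the elements are hashable, looks values up in a set instead. Unlike the version above, this sketch also drops repeated values from the result; the name `find_intersection_fast` is hypothetical.

```python
def find_intersection_fast(list1, list2):
    lookup = set(list2)        # set membership checks are fast on average
    seen = set()               # values already added to the result
    result = []
    for value in list1:
        if value in lookup and value not in seen:
            result.append(value)
            seen.add(value)
    return result

print(find_intersection_fast([1, 2, 2, 3, 4, 5], [2, 4, 5, 6]))   # [2, 4, 5]
```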


In [165]:
#Q3:

def rotate_list_left(lst, n):
    
    # Perform the rotation using list slicing
    rotated_list = lst[n:] + lst[:n]
    
    return rotated_list

# Example usage
original_list = [1, 2, 3, 4, 5]
rotated_list = rotate_list_left(original_list, 2)
print(rotated_list)

[3, 4, 5, 1, 2]
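
With plain slicing, an n larger than the list length returns the list unchanged (rotating the five-element list by 7 gives the original order back). A sketch that wraps n with the modulo operator handles that case, and also treats a negative n as a rotation to the right. The name `rotate_left` is hypothetical.

```python
def rotate_left(lst, n):
    if not lst:
        return []                 # nothing to rotate
    shift = n % len(lst)          # wrap n into the range 0..len(lst)-1
    return lst[shift:] + lst[:shift]

print(rotate_left([1, 2, 3, 4, 5], 7))    # [3, 4, 5, 1, 2]
print(rotate_left([1, 2, 3, 4, 5], -1))   # [5, 1, 2, 3, 4]
```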


In [166]:
#Q4:

def find_duplicates(lst):

    seen = set()
    duplicates = []
    
    # Iterate through the list
    for item in lst:
        # If element is already in the set, it's a duplicate
        if item in seen:
            duplicates.append(item)
        else:
            seen.add(item)
    
    return duplicates

# Example usage
my_list = [1, 2, 2, 3, 4, 4, 5, 5, 5]
duplicate_elements = find_duplicates(my_list)
print(duplicate_elements)

[2, 4, 5, 5]
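
The result above lists 5 twice because it appears three times. If each duplicate should be reported only once, one option is to count occurrences with `collections.Counter`. The name `find_unique_duplicates` is hypothetical.

```python
from collections import Counter

def find_unique_duplicates(lst):
    counts = Counter(lst)   # maps each element to its number of occurrences
    # Counter keeps first-seen order, so the result follows the input order
    return [item for item, count in counts.items() if count > 1]

print(find_unique_duplicates([1, 2, 2, 3, 4, 4, 5, 5, 5]))   # [2, 4, 5]
```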


In [167]:
#Q5:

def cumulative_sum(lst):
    # Initialize an empty list to store cumulative sums
    cumulative = []
    current_sum = 0
    
    # Iterate through the list and calculate cumulative sums
    for num in lst:
        current_sum += num
        cumulative.append(current_sum)
    
    return cumulative

# Example usage
my_list = [1, 2, 3, 4]
cumulative_sum_list = cumulative_sum(my_list)
print(cumulative_sum_list)

[1, 3, 6, 10]
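
The standard library offers the same running total through `itertools.accumulate`. A short sketch, with the hypothetical name `cumulative_sum_accumulate`:

```python
from itertools import accumulate

def cumulative_sum_accumulate(lst):
    # accumulate yields running totals; list() collects them
    return list(accumulate(lst))

print(cumulative_sum_accumulate([1, 2, 3, 4]))   # [1, 3, 6, 10]
```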
