[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/gdsaxton/GDAN5400/blob/main/Week%204%20Notebooks/GDAN%205400%20-%20Week%204%20Notebooks%20%28I%29%20-%20Using%Functions.ipynb)

This notebook provides recipes for using built-in and custom functions in Python.

In [None]:
%%time
import datetime
print ("Current date and time : ", datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"), '\n')

# Load Packages and Set Working Directory
Import several necessary Python packages. We will be using the <a href="http://pandas.pydata.org/">Python Data Analysis Library,</a> or <i>PANDAS</i>, extensively for our data manipulations in this and future tutorials.

In [None]:
import numpy as np
import pandas as pd
from pandas import DataFrame
from pandas import Series

<br>
PANDAS allows you to set various options for, among other things, inspecting the data. I like to be able to see all of the columns. Therefore, I typically include this line at the top of all my notebooks.

In [None]:
#http://pandas.pydata.org/pandas-docs/stable/options.html
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', 250)
pd.set_option('display.max_info_columns', 500)

# Read in Data

In [None]:
import pandas as pd
import requests

# NOTE: replace `https://github.com/` with `https://raw.githubusercontent.com/` and drop `/blob` from the path
# https://github.com/gdsaxton/GDAN5400/blob/main/Coding%20Assignment%201/final_insurance_fraud.xlsx
url = 'https://raw.githubusercontent.com/gdsaxton/GDAN5400/main/Coding%20Assignment%201/final_insurance_fraud.xlsx'

# Download the file
response = requests.get(url)
response.raise_for_status()  # stop here if the download failed (e.g., 404)
with open('final_insurance_fraud.xlsx', 'wb') as f:
    f.write(response.content)

# Load the Excel file
df = pd.read_excel('final_insurance_fraud.xlsx', engine='openpyxl')

df[:2]
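
The URL rewrite described in the comment above is itself a good candidate for a small function. The sketch below is one possible way to do it; `to_raw_url` is a hypothetical helper name, not part of `requests` or pandas, and it assumes the link follows the usual `github.com/<user>/<repo>/blob/<branch>/<path>` layout.

```python
def to_raw_url(blob_url):
    """Convert a GitHub 'blob' page URL into a raw-content URL (hypothetical helper)."""
    # Swap the host once, then drop the first '/blob/' path segment
    raw = blob_url.replace("https://github.com/", "https://raw.githubusercontent.com/", 1)
    return raw.replace("/blob/", "/", 1)

page_url = "https://github.com/gdsaxton/GDAN5400/blob/main/Coding%20Assignment%201/final_insurance_fraud.xlsx"
print(to_raw_url(page_url))
```

The printed address should match the `url` value used in the cell above.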

In [None]:
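# Apply data cleaning operations from Coding Assignment 1
# .copy() gives us an independent DataFrame, so the column assignments below
# do not raise a SettingWithCopyWarning
df = df[df['Policy Number'].notnull()].copy()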
df['Estimated cost to repair'] = df['Estimated cost to repair'].fillna(0)
df['Estimated cost to replace'] = df['Estimated cost to replace'].fillna(0)

# Working with Python `Functions`

**[ChatGPT prompt]** `What are *functions* in Python?`

# Built-In Functions

### **Example: Math and Numeric Operations**

- **`abs()`** – Returns the absolute value of a number.

- **`round()`** – Rounds a number to the nearest integer or specified number of decimal places.

- **`pow()`** – Returns the value of a number raised to a power.

- **`sum()`** – Sums the elements of an iterable (e.g., a list).

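- **`mean()`** – Returns the average value. Note that this is not a Python built-in; it is available as `statistics.mean()` in the standard library or as a pandas/NumPy method such as `Series.mean()`.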

- **`min()`, `max()`** – Return the minimum or maximum value from an iterable.
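
As a quick illustration of the functions listed above, here is a minimal sketch on a small made-up list. The numbers are invented for the example, and `mean()` is taken from the standard-library `statistics` module because plain Python has no built-in `mean()`.

```python
import statistics

readings = [-3.5, 2.25, 4.25]  # made-up numbers for illustration

print(abs(-3.5))                     # absolute value: 3.5
print(round(3.14159, 2))             # two decimal places: 3.14
print(pow(2, 3))                     # 2 raised to the power 3: 8
print(sum(readings))                 # total: 3.0
print(min(readings), max(readings))  # -3.5 4.25
print(statistics.mean(readings))     # average via the statistics module: 1.0
```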

Apply `max()` to find the maximum hail diameter. The cell below uses the pandas `Series.max()` method on the column.

In [None]:
max_hail = df['Hail Diameter'].max()
print("Maximum Hail Diameter:", max_hail)

Use `min()` to find the minimum wind speed.

In [None]:
min_wind_speed = df['Wind Speed'].min()
print("Minimum Wind Speed:", min_wind_speed)
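
The two cells above call the pandas methods `.max()` and `.min()` on a column rather than Python's built-in `max()` and `min()`. The sketch below, using a small hypothetical Series rather than the insurance data, shows both styles side by side. Missing values are dropped before calling the built-in, since the pandas methods skip `NaN` by default and the built-in does not.

```python
import pandas as pd

wind = pd.Series([12.0, float("nan"), 30.5, 8.0])  # hypothetical wind speeds

print(wind.min())          # pandas method, skips NaN by default: 8.0
print(min(wind.dropna()))  # built-in min() on the non-missing values: 8.0
print(wind.max())          # 30.5
print(max(wind.dropna()))  # 30.5
```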

### **Creating New Columns with Built-In Functions**
- Use `round()` to round the rainfall values to one decimal place and store them in a new column.

In [None]:
df['Rounded_Rainfall'] = df['Rainfall'].round(1)
df[['Rainfall', 'Rounded_Rainfall']].head()
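
The cell above uses the pandas `.round()` method. For comparison, the sketch below applies Python's built-in `round()` to each value with `.apply()`. The `rain` DataFrame and its values are hypothetical, chosen so that both approaches give the same result.

```python
import pandas as pd

rain = pd.DataFrame({"Rainfall": [0.04, 1.26, 3.14159, 12.0]})  # hypothetical values

# pandas method: rounds the whole column at once
rain["Rounded_method"] = rain["Rainfall"].round(1)

# built-in round(): applied to one value at a time
rain["Rounded_builtin"] = rain["Rainfall"].apply(lambda x: round(x, 1))

print(rain)
```

On larger columns the pandas method is usually the faster choice, because it avoids calling a Python function once per row.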

# Custom Functions
### What are they?
- Reusable blocks of code defined using the `def` keyword
- Take inputs (parameters), perform operations, return outputs
- Follow DRY principle (Don't Repeat Yourself)
- Enable modular, organized, and maintainable code
- Can be called multiple times with different inputs

In [None]:
def is_large_repair(cost):
    return "Yes" if cost > 5000 else "No"

<br>Apply the function directly

In [None]:
large_repair_check = df['Estimated cost to repair'].apply(is_large_repair)
print(large_repair_check.head())
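
One way to make a function like this reusable with different inputs is to turn the cutoff into a parameter. The sketch below is a hypothetical variation: the `threshold` parameter and the `costs` values are assumptions added for illustration. Extra keyword arguments given to `Series.apply()` are passed through to the function.

```python
import pandas as pd

def is_large_repair(cost, threshold=5000):
    """Return "Yes" when cost exceeds threshold, otherwise "No"."""
    return "Yes" if cost > threshold else "No"

costs = pd.Series([1200, 5000, 7500, 15000])  # hypothetical repair costs

print(costs.apply(is_large_repair).tolist())                   # default cutoff of 5000
print(costs.apply(is_large_repair, threshold=10000).tolist())  # stricter cutoff
```

With the default cutoff, this version behaves the same as the one-parameter function defined above.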

<br>Another Example: Categorize Age of Roof

In [None]:
def categorize_roof_age(age):
    if age <= 10:
        return "New"
    elif age <= 20:  # anything above 10 and up to 20, including fractional ages such as 10.5
        return "Middle-aged"
    else:
        return "Old"

<br>Apply the function directly

In [None]:
roof_age_categories = df['Age of roof'].apply(categorize_roof_age)
print(roof_age_categories.head())
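
For binning a numeric column into categories, pandas also offers `pd.cut()`, which can replace a custom function like the one above. The sketch below is an alternative technique, not the approach this notebook uses, and the `ages` values are hypothetical. The bin edges are chosen so that ages up to 10 are "New", ages above 10 and up to 20 are "Middle-aged", and anything older is "Old".

```python
import numpy as np
import pandas as pd

ages = pd.Series([3, 10, 15, 20, 35])  # hypothetical roof ages in years

categories = pd.cut(
    ages,
    bins=[-np.inf, 10, 20, np.inf],  # intervals: (-inf, 10], (10, 20], (20, inf)
    labels=["New", "Middle-aged", "Old"],
)
print(categories.tolist())  # ['New', 'New', 'Middle-aged', 'Middle-aged', 'Old']
```

One difference to keep in mind: `pd.cut()` leaves missing ages as missing, whereas an `if`/`else` function sends them to the final `else` branch.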

### **Example: Determine Large Square Footage**
- Write a function that labels a home "Large" if it has more than 3,000 square feet and "Small" otherwise, then print the result.

In [None]:
def is_large_square_footage(square_feet):
    return "Large" if square_feet > 3000 else "Small"

<br>Apply the function directly

In [None]:
square_footage_check = df['Home Square Feet'].apply(is_large_square_footage)
print(len(square_footage_check))
print(square_footage_check.head())

<br>Now apply it to dataframe as a new variable

In [None]:
df['square_footage_type'] = df['Home Square Feet'].apply(is_large_square_footage)
df[['Home Square Feet', 'square_footage_type']][:5]

### `lambda` custom functions vs. formal named functions
Create a binary variable using a `lambda` (an anonymous, single-expression function).

In [None]:
df['High_Hail_Flag'] = df['Hail Diameter'].apply(lambda x: 1 if x > 1.0 else 0)
df[['Hail Diameter', 'High_Hail_Flag']].head()
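
For a simple threshold like this, the same 0/1 flag can also be built without `.apply()` by comparing the whole column at once. The sketch below uses a small hypothetical Series to show that the two approaches agree.

```python
import pandas as pd

hail = pd.Series([0.5, 1.0, 1.75, 2.5])  # hypothetical hail diameters

flag_lambda = hail.apply(lambda x: 1 if x > 1.0 else 0)  # one value at a time
flag_vector = (hail > 1.0).astype(int)                   # whole column at once

print(flag_lambda.tolist())  # [0, 0, 1, 1]
print(flag_vector.tolist())  # [0, 0, 1, 1]
```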

In [None]:
df['Hail Diameter'].value_counts().sort_index()

<br>Create an alternative version of the variable using a named function

In [None]:
def high_hail(hail_diameter):
    return "High Diameter" if hail_diameter > 1 else "Low Diameter"

In [None]:
df['High_Hail_Flag_v2'] = df['Hail Diameter'].apply(high_hail)
df[['Hail Diameter', 'High_Hail_Flag', 'High_Hail_Flag_v2']].sample(10)
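
To make the comparison concrete, the sketch below writes the same labeling logic both ways and runs them on a few hypothetical diameters. The name `high_hail_lambda` is introduced here only for illustration. The two versions return the same labels; the named function additionally carries a name and a docstring, which helps when reusing or debugging it.

```python
def high_hail(hail_diameter):
    """Label a hail diameter as high (greater than 1) or low."""
    return "High Diameter" if hail_diameter > 1 else "Low Diameter"

# Same logic as an anonymous function, bound to a variable for comparison
high_hail_lambda = lambda d: "High Diameter" if d > 1 else "Low Diameter"

for d in [0.75, 1.0, 2.0]:  # hypothetical diameters
    print(d, high_hail(d), high_hail_lambda(d))

print(high_hail.__name__)         # high_hail
print(high_hail_lambda.__name__)  # <lambda>
```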