![alt text](<../images/just enough.png>)

# Just Enough Python for AI/Data Science
## Module 2: Thinking Like a Data Scientist
> This module helps you organize and work with data like a pro. You’ll master lists, tuples, dictionaries, and sets to store and retrieve data efficiently. Then, you’ll learn to write functions so you’re not rewriting the same code over and over—because real data scientists keep it clean and reusable!
### Day 6 - Functions: Don’t Repeat Yourself 
----

##### Overview:

- Functions are the lifeblood of clean, reusable code. 
- This lesson covers how to define functions, pass arguments, return values, and use functions to analyze data.
- Bonus: a first look at lambda functions, as a preview of what's ahead.

#### What Are Functions and Why Do We Love Them?
Do you ever feel like you’re typing the same thing over and over? 
(Tedious, right?) 

Imagine you’re writing dozens of preprocessing steps for multiple datasets—it’s easier to just create **one function** to do the repetitive work. That’s where functions come swooping in to save your sanity.

Functions are **blocks of reusable code** that let you accomplish a task without rewriting it every time. They take input (if needed), do something with it, and often return a result. Think of them as mini-programs inside your program.

#### 1: Anatomy of a Python Function
- Let’s break down how to create your first function. 

In [1]:
def function_name(parameters):
    # Code block
    return value  # (Optional) Give something back


- `def`: The keyword to define a function.
- `function_name`: Be descriptive! Name your function based on what it does.
- `parameters`: Inputs for the function (optional).
- `return`: Outputs from the function, which you can store or print (also optional).
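Because `return` is optional, it helps to know what a function gives back when it has no `return` statement: Python returns `None` automatically. A minimal sketch (the function names `shout` and `shout_and_return` are hypothetical, chosen only for illustration):

```python
def shout(word):
    # No return statement: Python implicitly returns None
    print(word.upper())

def shout_and_return(word):
    # Explicit return: the caller gets the string back
    return word.upper()

printed = shout("data")              # prints DATA
returned = shout_and_return("data")  # prints nothing

print(printed)   # None
print(returned)  # DATA
```

If you want to store or reuse a function's result, it needs a `return`; printing alone only shows the value on screen.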

**Example: Your First Function**

In [2]:
def greet(name):
    print(f"Hello, {name}!")


Now, use the function by calling it:

In [3]:
greet("Ada")   # Outputs: Hello, Ada!
greet("Alan")  # Outputs: Hello, Alan!



Hello, Ada!
Hello, Alan!


#### 2: Parameters and Arguments

**Parameters**

Functions can have **parameters**, which are placeholders for input values. You define them inside the parentheses.

**Arguments**

When calling the function, you provide **arguments**, which replace the parameters with actual values.

Let’s expand the greeting function to accept more info using parameters:

In [4]:
def greet(name, mood):
    print(f"Hello, {name}! You seem {mood} today.")


In [5]:
greet("Ada", "brilliant")  # Outputs: Hello, Ada! You seem brilliant today.
greet("Alan", "curious")   # Outputs: Hello, Alan! You seem curious today.


Hello, Ada! You seem brilliant today.
Hello, Alan! You seem curious today.
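By default, arguments are matched to parameters by position, but you can also pass them by name (keyword arguments). This makes a call self-documenting and lets you list the arguments in any order. A small sketch using a hypothetical `describe()` function that returns the greeting string instead of printing it, so the two calls can be compared:

```python
def describe(name, mood):
    # Build the greeting string instead of printing it
    return f"Hello, {name}! You seem {mood} today."

positional = describe("Ada", "brilliant")         # matched by position
keyword = describe(mood="brilliant", name="Ada")  # matched by name, order doesn't matter

print(positional)
print(keyword)
print(positional == keyword)  # True
```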


#### 3: Returning Values

Functions don’t always just print stuff—they can also give you results back using `return`. This is essential for Data Science tasks where you might compute something like averages or predictions.

- Example

In [6]:
def add(x, y):
    return x + y


In [7]:
result = add(10, 5)
print(result)  # Outputs: 15

15


Now you can reuse the `add()` function for any numbers—much better than rewriting the same code!
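Since this section mentions averages, here is a sketch of a small reusable `mean()` helper. The name and the sample sales numbers are made up for illustration; Python's standard library also provides `statistics.mean`, used here only to cross-check the result. Because the function returns its result, the caller can store it, compare it, or pass it on:

```python
import statistics

def mean(values):
    # Guard against dividing by zero on an empty list
    if not values:
        raise ValueError("mean() needs at least one value")
    # Sum the values and divide by how many there are
    return sum(values) / len(values)

daily_sales = [120, 95, 130, 110]  # sample data
avg = mean(daily_sales)
print(avg)  # 113.75

# The standard library gives the same answer
print(statistics.mean(daily_sales))  # 113.75
```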

#### 4: Default Parameters
- Sometimes you want to provide a **default** value in case an argument isn’t passed. Python's got you covered:
- Example

In [8]:
def greet(name, mood="great"):
    print(f"Hello, {name}! You seem {mood} today.")


In [9]:
greet("Ada")        # Outputs: Hello, Ada! You seem great today.
greet("Alan", "excited")  # Outputs: Hello, Alan! You seem excited today.


Hello, Ada! You seem great today.
Hello, Alan! You seem excited today.


Default parameters make functions flexible and user-friendly.
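One caution worth knowing: a default value is evaluated once, when the `def` line runs, not on every call. With a mutable default such as a list, every call that omits the argument shares the same list. A common fix is to default to `None` and create a fresh object inside the function. The function names below are hypothetical:

```python
def add_reading_buggy(value, readings=[]):
    # The same list object is reused across calls that omit `readings`
    readings.append(value)
    return readings

def add_reading(value, readings=None):
    # Create a new list on each call that doesn't pass one in
    if readings is None:
        readings = []
    readings.append(value)
    return readings

print(add_reading_buggy(1))  # [1]
print(add_reading_buggy(2))  # [1, 2]  <- leftover from the previous call
print(add_reading(1))        # [1]
print(add_reading(2))        # [2]
```

Immutable defaults like `"great"` in `greet()` above are safe, which is why the problem only shows up with lists, dictionaries, and other mutable objects.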

#### 5: Variable Scope (Where Do Variables Live?)

- Variables created inside a function are **local**: they exist only while the function runs and can't be accessed outside it. Variables defined at the top level of your script (**global** variables), however, can be read inside a function.

In [10]:
global_number = 5

def multiply_numbers(x):
    y = 12
    return x * y * global_number

result = multiply_numbers(3)
print(result)  # Outputs: 180 (3 * 12 * 5)

# Attempting to access the local variable y outside the function
# would raise NameError: name 'y' is not defined
# print(y)


180


In [11]:
# Modifying a global variable from within a function
counter = 0

def increment_counter():
    global counter
    counter += 1
    print(counter)  # Outputs: 1 (printed inside the function)

increment_counter()
print(counter)  # Outputs: 1 (the global itself was changed)


1
1


- Access: You can read global variables inside a function without declaring them as global.
- Modification: To modify a global variable inside a function, you must explicitly declare it with the `global` keyword. Otherwise, any assignment to that name inside the function makes Python treat it as a new local variable. A statement like `counter += 1` then raises `UnboundLocalError`, because it tries to read the local `counter` before it has a value.
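Here is what that error looks like in practice, followed by a pattern that avoids globals entirely: pass the value in and return the new value. Both function names are hypothetical, and the exact error message text varies between Python versions:

```python
counter = 0

def increment_without_global():
    # Assigning to `counter` makes it local to this function,
    # so reading it first (counter + 1) fails
    counter += 1

try:
    increment_without_global()
except UnboundLocalError as err:
    print("UnboundLocalError:", err)

def incremented(value):
    # No globals involved: take the value in, hand the result back
    return value + 1

counter = incremented(counter)
print(counter)  # 1
```

The second style is usually easier to test and debug, since the function's result depends only on its inputs.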


#### 6: Anonymous Functions (a.k.a. Lambdas)
- **Lambdas** are small anonymous functions whose body is a single expression. Use them when you need a quick, throwaway function and a full `def` definition would be overkill.
- Example: Lambda for Squaring a Number

In [12]:
square = lambda x: x**2
print(square(5))  # Outputs: 25


25


In [13]:
numbers = [1, 2, 3, 4]
squared = map(lambda x: x**2, numbers)
print(list(squared))  # Outputs: [1, 4, 9, 16]
print(squared)  # Outputs: <map object at 0x...> (map() is lazy; list() above already consumed its values)


[1, 4, 9, 16]
<map object at 0x00000227043EC5B0>


![Lambda expression syntax](../images/fig1_lambda-expression.jpg)

Lambdas are perfect for short, one-off operations, but stick to `def` for more complex tasks.
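A very common place to meet a lambda is the `key` argument of `sorted()`, which tells Python what to sort by. A sketch with made-up model names and scores:

```python
model_scores = [("logistic_regression", 0.81), ("random_forest", 0.87), ("knn", 0.78)]

# Sort by the second item of each tuple (the score), highest first
ranked = sorted(model_scores, key=lambda pair: pair[1], reverse=True)
print(ranked)
# [('random_forest', 0.87), ('logistic_regression', 0.81), ('knn', 0.78)]
```

The lambda is used once, right where it's needed, which is exactly the kind of short, one-off job lambdas are suited for.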

#### 🚀 Quick Summary:

| **Lambda Syntax**               | **Equivalent `def` Function**            |
|----------------------------------|-----------------------------------------|
| `lambda x: x + 10`               | `def add_ten(x): return x + 10`        |
| `lambda a, b: a * b`             | `def multiply(a, b): return a * b`     |
| `lambda x: x % 2 == 0`           | `def is_even(x): return x % 2 == 0`    |

- Readability should always come first: if a lambda starts getting hard to read, turn it into a named `def` function. 🚀
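To check that the two columns of the table really behave the same, you can call each pair with the same inputs and compare the results. A quick sketch; the `_fn` names are only for this check, and the one-line `def` style mirrors the table rather than typical style:

```python
# Lambda versions from the table, bound to names for testing
add_ten_fn = lambda x: x + 10
multiply_fn = lambda a, b: a * b
is_even_fn = lambda x: x % 2 == 0

# Equivalent def versions from the table
def add_ten(x): return x + 10
def multiply(a, b): return a * b
def is_even(x): return x % 2 == 0

# Compare each pair on a small range of inputs
for n in range(-3, 4):
    assert add_ten_fn(n) == add_ten(n)
    assert multiply_fn(n, 7) == multiply(n, 7)
    assert is_even_fn(n) == is_even(n)

print("All pairs agree")
```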


#### 7: Best Practices for Functions
- **Keep Functions Short and Sweet:** Each function should do one thing (avoid multitasking!).
- **Name Functions Clearly:** Use names that explain what the function does—your future self will thank you.
    - `def calculate_mean()`✅ 
    - `def do_stuff()`❌ 
- **Comment Your Code:** Add comments to explain what the function does or any tricky parts.
- **Test Before Using:** Test your function with different inputs to make sure it behaves as expected.
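Putting these practices together, here is a sketch of a short, clearly named function with a docstring and a few quick checks. `percent_missing()` is a hypothetical helper, and treating an empty list as 0% missing is a design choice made for this example; the `assert` lines are one lightweight way to test before using:

```python
def percent_missing(values):
    """Return the percentage of entries in `values` that are None."""
    if not values:
        return 0.0  # design choice: an empty list has nothing missing
    missing = sum(1 for v in values if v is None)
    return 100 * missing / len(values)

# Quick tests with different inputs before relying on it
assert percent_missing([1, None, 3, None]) == 50.0
assert percent_missing([1, 2, 3]) == 0.0
assert percent_missing([]) == 0.0
print("percent_missing passed its checks")
```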

#### 8: Why Functions Matter
- They reduce code repetition. Instead of copy-pasting the same snippet 10 times, define it once and call it 10 times.
- They make your code easier to read, understand, and debug.
- They pave the way for modular code—especially useful in bigger Data Science or AI projects where you might have data cleaning steps, feature engineering, model training, etc.
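As a rough illustration of that modularity, here is a sketch of a tiny pipeline where each step is its own function. The step names and the sample weights are hypothetical:

```python
def clean(values):
    # Data cleaning: drop missing entries
    return [v for v in values if v is not None]

def scale(values, divisor=1000):
    # Feature engineering example: convert grams to kilograms
    return [v / divisor for v in values]

def summarize(values):
    # Return a few simple statistics as a dictionary
    return {"count": len(values), "min": min(values), "max": max(values)}

raw_weights_g = [1200, None, 850, 990, None]
summary = summarize(scale(clean(raw_weights_g)))
print(summary)
```

Each step can be tested on its own, and swapping one out (say, a different cleaning rule) doesn't require touching the others.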

#### Quick Exercises 
1. Create a greeting function that takes a name and age and says:

    - `"Hello, [name]! You are [age] years old."`

2. Write a function to calculate the area of a rectangle with default parameters `length=10` and `width=5`.

3. Create a function `is_even()` that takes a number and returns `True` if it's even, and `False` otherwise.

4. Write a function `process_list()` that takes a list of numbers and:
    - Removes duplicates,
    - Sorts the list in ascending order, and
    - Returns the cleaned list.

5. Use a lambda function inside Python's `filter()` to extract all even numbers from the list:
    - `numbers = [3, 6, 8, 7, 5, 9, 2]`

**Please Note:** Solutions to the exercises above will be provided at the end of the next session's notebook (Day 7: Meet NumPy).

---

### Day 5 Exercise Solutions

1. Create a dictionary to store some metadata about a dataset:
    - Total rows, total columns, and the type of analysis done (e.g., regression or classification).
    - Add a new key-value pair to store the dataset source (e.g., 'CSV file', 'database', or a URL).
    - Update the analysis type to "clustering."


In [14]:
# Step 1: Create a dictionary holding sample dataset metadata
dataset_metadata = {
    "total_rows": 10000,
    "total_columns": 12,
    "analysis_type": "regression"
}

print(dataset_metadata)

{'total_rows': 10000, 'total_columns': 12, 'analysis_type': 'regression'}


In [15]:
# Step 2: Add a new key for dataset source
dataset_metadata["source"] = "CSV file"
print(dataset_metadata)

{'total_rows': 10000, 'total_columns': 12, 'analysis_type': 'regression', 'source': 'CSV file'}


In [16]:
# Step 3: Update the analysis type to 'clustering'
dataset_metadata["analysis_type"] = "clustering"

print(dataset_metadata)

{'total_rows': 10000, 'total_columns': 12, 'analysis_type': 'clustering', 'source': 'CSV file'}


2. Use a set to remove duplicates from the following list:
    - `sample_data = [1, 2, 3, 1, 4, 2, 5, 3]`


In [17]:

# Given list with duplicates
sample_data = [1, 2, 3, 1, 4, 2, 5, 3]

# Convert list to set to remove duplicates
unique_data = set(sample_data)

print(sample_data)  # Original list with duplicates
print(unique_data)  # Set with unique values



[1, 2, 3, 1, 4, 2, 5, 3]
{1, 2, 3, 4, 5}
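Note that a set does not preserve the original order of the list. If order matters, one common idiom is `dict.fromkeys()`, which works because dictionaries keep insertion order (guaranteed since Python 3.7). A sketch with a different made-up list, chosen so the order is visible:

```python
sample_data = [3, 1, 3, 2, 1]

# dict keys are unique and keep first-seen order
unique_in_order = list(dict.fromkeys(sample_data))
print(unique_in_order)  # [3, 1, 2]
```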


3. Write a dictionary that maps some feature names to their descriptions (e.g., "Age" → "Customer Age in Years"). Loop through and print all the features with their descriptions.

In [18]:
# Step 1: Create a dictionary mapping feature names to their descriptions
feature_descriptions = {
    "Age": "Customer Age in Years",
    "Income": "Monthly Income in USD",
    "Education": "Highest Degree Achieved",
    "Gender": "Customer's Gender",
    "City": "City of Residence"
}

print(feature_descriptions)

{'Age': 'Customer Age in Years', 'Income': 'Monthly Income in USD', 'Education': 'Highest Degree Achieved', 'Gender': "Customer's Gender", 'City': 'City of Residence'}


In [19]:
# Step 2: Loop through the dictionary and print each feature with its description
for feature, description in feature_descriptions.items():
    print(f"{feature}: {description}")

Age: Customer Age in Years
Income: Monthly Income in USD
Education: Highest Degree Achieved
Gender: Customer's Gender
City: City of Residence


4. Create two sets of numbers (set_a = {10, 20, 30}, set_b = {20, 30, 40}), and find:

    - Their union
    - Their intersection
    - The numbers in set_a but not set_b

In [20]:
# Define the two sets
set_a = {10, 20, 30}
set_b = {20, 30, 40}

# Union: All unique elements from both sets
union_set = set_a | set_b  # or set_a.union(set_b)


# Intersection: Common elements in both sets
intersection_set = set_a & set_b  # or set_a.intersection(set_b)


# Difference: Elements in set_a but not in set_b
difference_set = set_a - set_b  # or set_a.difference(set_b)



# Print results
print("Union:", union_set)
print("Intersection:", intersection_set)
print("Difference (set_a - set_b):", difference_set)

Union: {20, 40, 10, 30}
Intersection: {20, 30}
Difference (set_a - set_b): {10}
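Tying this back to Day 6: if you need these comparisons for many pairs of sets (for example, the column names of two versions of a dataset), you can wrap them in a function instead of repeating the same lines each time. A sketch with a hypothetical `compare_sets()` and made-up column names:

```python
def compare_sets(a, b):
    # Return union, intersection, and both differences in one dictionary
    return {
        "union": a | b,
        "intersection": a & b,
        "only_in_a": a - b,
        "only_in_b": b - a,
    }

result = compare_sets({10, 20, 30}, {20, 30, 40})
print(result["intersection"])  # {20, 30}
print(result["only_in_a"])     # {10}

columns_v1 = {"age", "income", "city"}
columns_v2 = {"age", "income", "education"}
print(compare_sets(columns_v1, columns_v2)["only_in_b"])  # {'education'}
```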


# HAPPY LEARNING