Write a function in Python

In [7]:
def greet(name):
    return f"Hello {name}"

print(greet("Jaspal"))

Hello Jaspal


A function can also take optional arguments by giving a parameter a default value, as shown below.

In [24]:
def greet(name, age = None):
    if age:
        return f"Hello my name is {name} and age is {age}"
    return f"Hello {name}"

print(greet("Jaspal"))
print(greet("Jaspal",20))

Hello Jaspal
Hello my name is Jaspal and age is 20
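
One caveat with the `if age:` check above: `0` is falsy in Python, so an age of 0 would be treated like a missing value. A minimal sketch of a variant that compares against `None` explicitly (the name `greet_with_age` is hypothetical, chosen only to avoid overwriting `greet`):

```python
def greet_with_age(name, age=None):
    # Compare against None explicitly so that a legitimate age of 0 is kept.
    if age is not None:
        return f"Hello my name is {name} and age is {age}"
    return f"Hello {name}"

print(greet_with_age("Jaspal"))
print(greet_with_age("Jaspal", 0))
```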


Instantiate a class in python

In [8]:
class Person:
    def __init__(self, name):
        self.name = name
    def greet(self):
        return f"Hello my name is {self.name}"

In [9]:
p = Person("jaspal")

In [10]:
p.greet()

'Hello my name is jaspal'

In [11]:
Person("Tom").greet()

'Hello my name is Tom'

Work on list comprehensions - List comprehensions make Python code more compact and often more readable

In [12]:
numbers = [1, 2, 3, 4, 5]

squares = [n * n for n in numbers]

print(squares)

[1, 4, 9, 16, 25]


In [13]:
even_numbers = [n for n in numbers if n % 2 == 0]

print(even_numbers)

[2, 4]


In [14]:
people = [
    Person("Alice"),
    Person("Bob"),
    Person("Charlie")
]

names = [p.name for p in people]

print(names)

['Alice', 'Bob', 'Charlie']


Dictionaries in Python - A dictionary is a key-value data structure

In [15]:
user_ages = {
    "Alice": 30,
    "Bob": 25,
    "Charlie": 35
}

In [17]:
print (user_ages["Alice"])

30


In [18]:
for name, age in user_ages.items():
    print(name, age)


Alice 30
Bob 25
Charlie 35
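
Looking up a key that does not exist with square brackets raises a `KeyError`. A short sketch of two common ways to handle that, using `dict.get` with a default and the `in` operator ("Dave" is just an illustrative missing key):

```python
user_ages = {"Alice": 30, "Bob": 25, "Charlie": 35}

# get() returns None (or the supplied default) instead of raising KeyError.
print(user_ages.get("Dave"))
print(user_ages.get("Dave", 0))

# Membership test on the keys before indexing.
if "Alice" in user_ages:
    print("Alice is", user_ages["Alice"])
```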


Sets in Python - As in Java, a set is a data structure that does not contain duplicates

In [19]:
numbers = [1, 2, 2, 3, 4, 4, 5]
unique_numbers = set(numbers)

In [20]:
print (unique_numbers)

{1, 2, 3, 4, 5}
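
Besides removing duplicates, sets support fast membership tests and the usual set algebra. A small sketch with made-up values; a set has no defined order, so `sorted()` is used wherever a stable order matters:

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

print(3 in a)          # membership test
print(sorted(a | b))   # union
print(sorted(a & b))   # intersection
print(sorted(a - b))   # difference: elements in a but not in b
```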


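Error handling - try/except block - Python handles errors with try/except, the counterpart of Java's try/catch. Unlike Java, Python has no checked exceptions: every exception is unchecked, so a function never has to declare what it raises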

In [21]:
def divide(a, b):
    try:
        return a / b
    except ZeroDivisionError:
        return "Cannot divide by zero"


print(divide(10, 2))
print(divide(10, 0))

5.0
Cannot divide by zero
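
A `try` statement can also carry `else` and `finally` clauses, which roughly correspond to "run on success" and Java's `finally`. A minimal sketch (the function name `safe_divide` is hypothetical):

```python
def safe_divide(a, b):
    try:
        result = a / b
    except ZeroDivisionError as exc:
        # Runs only when the division raises ZeroDivisionError.
        print(f"error: {exc}")
        return None
    else:
        # Runs only when no exception was raised.
        return result
    finally:
        # Runs in every case, even when a return statement was hit above.
        print("division attempted")

print(safe_divide(10, 2))
print(safe_divide(10, 0))
```

Returning `None` on failure, rather than an error string as in `divide` above, keeps the return type numeric-or-nothing, which is usually easier for callers to check.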


File handling - How to read/write from files in Python

In [23]:
# Write to file
with open("sample.txt", "w") as f:
    f.write("Hello from Python\n")

# Read from file
with open("sample.txt", "r") as f:
    content = f.read()

print(content)

Hello from Python



We can also read and write structured data as JSON files

In [25]:
import json
user = {
    "name": "Alice",
    "age": 30,
    "active": True
}

# Write JSON
with open("user.json", "w") as f:
    json.dump(user, f)

# Read JSON
with open("user.json", "r") as f:
    loaded_user = json.load(f)

print(loaded_user)

{'name': 'Alice', 'age': 30, 'active': True}
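
When no file is involved, `json.dumps` and `json.loads` do the same conversion to and from a string. A short sketch that also shows one thing to watch for: JSON object keys are always strings, so non-string dictionary keys do not round-trip unchanged.

```python
import json

user = {"name": "Alice", "age": 30, "active": True}

# dumps() serializes to a string; indent makes it human-readable.
text = json.dumps(user, indent=2)
print(text)

# loads() parses the string back into Python objects.
restored = json.loads(text)
print(restored == user)

# The integer key 1 comes back as the string "1".
round_tripped = json.loads(json.dumps({1: "one"}))
print(round_tripped)
```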


Imperative vs Pythonic - The next two examples show how the same logic can be written as a one-liner in Python. First let's look at the old-school way, imperative programming

In [27]:
def get_active_users(users):
    active_users = []
    for user in users:
        if user["active"] is True:
            active_users.append(user)
    return active_users

users = [{"name": "J", "active": True},
        {"name": "K", "active":False}]

In [29]:
print(get_active_users(users))

[{'name': 'J', 'active': True}]


Now let's rewrite the above logic in a Pythonic way

In [33]:
def get_active_users_pythonic(users):
    return [user for user in users if user["active"]]

In [34]:
print (get_active_users_pythonic(users))

[{'name': 'J', 'active': True}]


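How to take user inputs in Python - In Java you might pass inputs to a program as command-line arguments at run time, for example java Hello.java input1 input2. Python's built-in input function does something slightly different: it prompts the user and reads a line of text interactively while the program runs. (Python's direct equivalent of command-line arguments is sys.argv.)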

In [35]:
x = input ("enter your name")
print (f"name entered is {x}")

enter your name abcd


name entered is abcd
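
For inputs passed on the command line, closer to the Java example, the standard library offers `argparse`. A minimal sketch, assuming a hypothetical script invoked as `python hello.py Jaspal --age 20`; an explicit argument list is passed here so the example also runs inside a notebook:

```python
import argparse

def parse_args(argv):
    parser = argparse.ArgumentParser(description="Greet a user")
    parser.add_argument("name", help="name to greet")
    parser.add_argument("--age", type=int, default=None, help="optional age")
    return parser.parse_args(argv)

# In a real script you would pass sys.argv[1:] instead of this list.
args = parse_args(["Jaspal", "--age", "20"])
print(args.name, args.age)
```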


<b>Modules and imports</b> - How to import from another Python file. First let's create a new file that provides the function we want to import - see utils.py

In [53]:
from utils import is_adult
ages = [15, 18, 21]

for age in ages:
    print(age, is_adult(age))

15 False
18 True
21 True
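
utils.py itself is not shown in this notebook. A hypothetical sketch of what `is_adult` might look like, consistent with the output above (15 is not an adult, 18 and 21 are); the constant name and the exact threshold are assumptions:

```python
# Hypothetical contents of utils.py (the real file is not shown here).
ADULT_AGE = 18  # assumed threshold, consistent with the output above

def is_adult(age: int) -> bool:
    """Return True when age is at or above the adult threshold."""
    return age >= ADULT_AGE

for age in [15, 18, 21]:
    print(age, is_adult(age))
```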


We can also import the whole module and call the function through the module name

In [60]:
import utils
for age in ages:
    print(age, utils.is_adult(age))

15 False
18 True
21 True


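Sometimes, after you change utils.py, calling a newly added function fails with an error such as AttributeError, saying the module has no such attribute. This happens because Python caches imported modules, so the Jupyter kernel keeps using the version of the module it loaded first. To pick up the changes without restarting the kernel, reload the module as shown below.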

In [90]:
import importlib
import utils
importlib.reload(utils)

<module 'utils' from '/Users/jaspalsingh/Goal2026/AI-Transition-2025/01-python/utils.py'>
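
To see the stale-module behavior and the fix in isolation, here is a self-contained sketch that writes a throwaway module to a temporary directory, imports it, edits the file, and reloads it. The module name `demo_utils` and its functions are hypothetical, and bytecode caching is switched off only to keep the demo deterministic.

```python
import importlib
import sys
import tempfile
from pathlib import Path

old_flag = sys.dont_write_bytecode
sys.dont_write_bytecode = True  # keep the demo free of cached .pyc files

with tempfile.TemporaryDirectory() as tmp:
    module_path = Path(tmp) / "demo_utils.py"
    module_path.write_text("def is_adult(age):\n    return age >= 18\n")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    import demo_utils

    # Add a new function to the file on disk after the first import.
    with module_path.open("a") as f:
        f.write("\ndef is_teen(age):\n    return 13 <= age < 18\n")

    before = hasattr(demo_utils, "is_teen")   # in-memory module is stale
    importlib.reload(demo_utils)              # re-executes the file
    after = hasattr(demo_utils, "is_teen")
    print(before, after)

    sys.path.remove(tmp)

sys.dont_write_bytecode = old_flag
```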

How do you confirm that the new function was loaded? Print the path of the file Python actually imported and list the names the module defines.

In [91]:
print(utils.__file__)

dir(utils)

/Users/jaspalsingh/Goal2026/AI-Transition-2025/01-python/utils.py


['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'age_group',
 'build_age_group_lookup',
 'enrich_people_with_age_group',
 'is_adult',
 'is_adult_here',
 'prepare_people_data',
 'prepare_people_data_validated']

In [61]:
for age in ages:
    print(age, utils.is_adult(age))

15 False
18 True
21 True


In [62]:
adults = [age for age in ages if utils.is_adult(age)]
print(adults)

[18, 21]


Let's say your AI training dataset is a list of people with their ages, and you think a derived feature called age group will help the ML model learn better. We can add the age group (senior, teen, etc.) to each record like this

In [63]:
people = [
    {"name": "Aman", "age": 12},
    {"name": "Riya", "age": 17},
    {"name": "Karan", "age": 25},
    {"name": "Simran", "age": 65},
]

In [65]:
people_with_agegroup = [
    {"name": p["name"],
    "age": p["age"],
    "age_group": utils.age_group(p["age"])} 
    for p in people
]
print (people_with_agegroup)

[{'name': 'Aman', 'age': 12, 'age_group': 'child'}, {'name': 'Riya', 'age': 17, 'age_group': 'teen'}, {'name': 'Karan', 'age': 25, 'age_group': 'adult'}, {'name': 'Simran', 'age': 65, 'age_group': 'senior'}]
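
The body of `utils.age_group` is not shown in this notebook. A hypothetical sketch that reproduces the labels in the output above; the boundaries (13, 18, 65) are assumptions, since the output only pins down four sample ages:

```python
def age_group(age: int) -> str:
    # Boundaries are assumptions chosen to match the sample output above.
    if age < 13:
        return "child"
    if age < 18:
        return "teen"
    if age < 65:
        return "adult"
    return "senior"

for sample_age in [12, 17, 25, 65]:
    print(sample_age, age_group(sample_age))
```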


Now let's say we want to filter only adults

In [68]:
adults = [p["name"] for p in people_with_agegroup if p["age_group"]=="adult"]
print (adults)

['Karan']


What if we want a lookup table that maps each name to its age group?

In [72]:
name_with_age_group = {p["name"]:p["age_group"] for p in people_with_agegroup}
print (name_with_age_group)

{'Aman': 'child', 'Riya': 'teen', 'Karan': 'adult', 'Simran': 'senior'}


In [77]:
enriched_people = utils.enrich_people_with_age_group(people)

In [78]:
adults = [p for p in enriched_people if p["age_group"] == "adult"]
adults

[{'name': 'Karan', 'age': 25, 'age_group': 'adult'}]

In [84]:
assert len(enriched_people) == len(people)
assert len(adults) != len(people)

In [86]:
enriched_people, age_group = utils.prepare_people_data(people)
enriched_people
age_group

{'Aman': 'child', 'Riya': 'teen', 'Karan': 'adult', 'Simran': 'senior'}

In [87]:
assert age_group['Aman']=='child'

What happens when we get invalid data, such as a record with a missing name or age, or an invalid age value? Remember that in AI the quality of the data fed into the model is as important as the model architecture. To handle this we create a function that separates valid records from invalid ones - see utils.prepare_people_data_validated

In [92]:
invalid_people = [
    {"name": "Aman", "age": 12},
    {"name": "Riya", "age": -1},
    {"name": "Karan"},
    {"name": "Simran", "age": 65},
]

In [99]:
valid, errors = utils.prepare_people_data_validated(invalid_people)

print(valid)
print(errors)

[{'name': 'Aman', 'age': 12, 'age_group': 'child'}, {'name': 'Riya', 'age': -1, 'age_group': 'child'}, {'name': 'Simran', 'age': 65, 'age_group': 'senior'}]
['Index 1: Age is invalid', 'Index 2: missing required fields']
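
The implementation of `prepare_people_data_validated` lives in utils.py and is not shown here. A hypothetical sketch of the general pattern, collecting valid records and per-index error messages in a single pass (the function name `split_valid_and_invalid` and the exact checks are assumptions):

```python
def split_valid_and_invalid(people):
    """Hypothetical sketch: return (valid_records, error_messages)."""
    valid, errors = [], []
    for index, person in enumerate(people):
        if "name" not in person or "age" not in person:
            errors.append(f"Index {index}: missing required fields")
            continue
        age = person["age"]
        if not isinstance(age, int) or age < 0:
            errors.append(f"Index {index}: Age is invalid")
            continue
        valid.append(person)
    return valid, errors

sample = [
    {"name": "Aman", "age": 12},
    {"name": "Riya", "age": -1},
    {"name": "Karan"},
]
valid_sample, error_sample = split_valid_and_invalid(sample)
print(valid_sample)
print(error_sample)
```

In this sketch a rejected record is skipped with `continue`, so it never reaches the valid list.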


What we did above was clean the data after receiving it. What if we instead reject bad data at the point of input, in case someone sends it to us? To do this we can combine type hints with explicit runtime checks that act as a contract for the input data

In [105]:
from typing import List

def get_unique_adults(ages: List[int]) -> List[int]:
    """
    Take a list of ages (integers) and return the unique adult ages, sorted ascending.
    An adult is anyone aged 18 or older.
    """
    if not isinstance(ages, list):
        raise TypeError("ages must be a list")
    if not all(isinstance(age, int) for age in ages):
        raise TypeError("all ages must be integers")
    if not all(age >= 0 for age in ages):
        raise ValueError("ages cannot be negative")
    adults = {age for age in ages if age >= 18}
    return sorted(adults)

In [111]:
get_unique_adults([12,67])

[67]

Let's write some tests to ensure our function does what it should

In [113]:
!pip install pytest

Collecting pytest
  Downloading pytest-9.0.2-py3-none-any.whl.metadata (7.6 kB)
Collecting iniconfig>=1.0.1 (from pytest)
  Downloading iniconfig-2.3.0-py3-none-any.whl.metadata (2.5 kB)
Collecting pluggy<2,>=1.5 (from pytest)
  Downloading pluggy-1.6.0-py3-none-any.whl.metadata (4.8 kB)
Downloading pytest-9.0.2-py3-none-any.whl (374 kB)
Downloading pluggy-1.6.0-py3-none-any.whl (20 kB)
Downloading iniconfig-2.3.0-py3-none-any.whl (7.5 kB)
Installing collected packages: pluggy, iniconfig, pytest
Successfully installed iniconfig-2.3.0 pluggy-1.6.0 pytest-9.0.2


In [118]:
import pytest
def test_basic_case():
    assert get_unique_adults([10, 18, 20, 25]) == [18, 20, 25]


def test_duplicates_removed():
    assert get_unique_adults([18, 18, 20, 20]) == [18, 20]


def test_negative_age_raises():
    with pytest.raises(ValueError):
        get_unique_adults([18, -1, 20])


def test_non_int_raises():
    with pytest.raises(TypeError):
        get_unique_adults([18, "20"])

In [119]:
test_basic_case()
test_duplicates_removed()
test_negative_age_raises()
test_non_int_raises()

Data in production can be messy. The exercise below is about writing a function that cleans such data - the goal is to find the people who are adults

In [120]:
raw_people = [
    {"name": "Aman", "age": 20},
    {"name": " Ravi ", "age": "25"},
    {"name": "neha", "age": 17},
    {"name": "John", "age": None},
    {"name": "", "age": 30},
    {"age": 40},
    {"name": "Sara", "age": 18},
]

We can also define a class using the <b>@dataclass</b> decorator. The decorator generates boilerplate such as __init__, __repr__ and __eq__ for us, much like Lombok does in Java.

In [166]:
from dataclasses import dataclass

@dataclass(frozen=True)
class Person:
    name: str
    age: int
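
A short sketch of what the decorator provides, using a hypothetical class name `PersonRecord` that mirrors the `Person` dataclass above: a generated `__repr__`, value-based equality, and, with `frozen=True`, protection against reassignment.

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class PersonRecord:  # hypothetical name, mirrors the Person dataclass above
    name: str
    age: int

a = PersonRecord("Sara", 18)
b = PersonRecord("Sara", 18)

print(a)        # uses the generated __repr__
print(a == b)   # generated __eq__ compares field values

try:
    a.age = 19  # frozen=True forbids assignment after creation
except FrozenInstanceError as exc:
    print(f"cannot modify: {exc}")
```

Note that a dataclass does not validate field types at runtime: `PersonRecord("Sara", "18")` is accepted without complaint.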

<a id="normalize_people"></a>
Normalize People

In [178]:
def normalize_people(raw: list[dict]) -> tuple[list[Person], list[str]]:
    valid_people: list[Person] = []
    errors: list[str] = []
    if not isinstance(raw,list):
        return [],["Input must be a list of dictionaries"]
    for people in raw:
        if not isinstance(people,dict):
            errors.append(f"Invalid record type: {people}")
            continue
        name = people.get("name")
        age = people.get("age")
        if not isinstance(name, str) or not name.strip():
            errors.append(f"Invalid name in record {people}")
            continue
        try:
            age = int(age)
        except (TypeError, ValueError):
            errors.append(f"Invalid age in record {people}")
            continue
        if age < 18:
            errors.append(f"Underage person in record {people}")
            continue
        valid_people.append(
            Person(name=name.strip().title(), age=age)
        )
    return (valid_people, errors)
            
        

In [179]:
normalized_people = normalize_people(raw_people)

In [180]:
print (normalized_people[0], normalized_people[1])

[Person(name='Aman', age=20), Person(name='Ravi', age=25), Person(name='Sara', age=18)] ["Underage person in record {'name': 'neha', 'age': 17}", "Invalid age in record {'name': 'John', 'age': None}", "Invalid name in record {'name': '', 'age': 30}", "Invalid name in record {'age': 40}"]


## Introducing Pydantic

Pydantic is a runtime data validation and parsing library driven by Python type hints. In plain Python we can annotate a function with hints such as List[str] -> List[int], but the interpreter does not enforce them, so callers can still pass values of the wrong type. Pydantic closes this gap by validating data against the declared schema at runtime.
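
To make the "hints are not enforced" point concrete, here is a tiny sketch in plain Python (the function `double` is a made-up example):

```python
def double(value: int) -> int:
    # The annotation documents intent; Python does not check it at call time.
    return value * 2

print(double(4))
print(double("ab"))  # a str slips through and is repeated instead
```

The second call returns a string even though the signature promises an int, and no error is raised.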

Pydantic also saves us from writing boilerplate such as repeated isinstance() checks. The example below shows this.

First install pydantic

In [181]:
!pip install pydantic



Now we will write the same function normalize_people that we wrote above using pydantic

In [182]:
from pydantic import BaseModel, Field, ValidationError, ConfigDict, field_validator

class Person(BaseModel):
    name: str = Field(..., min_length=1)  # name is a required string; Field(...) makes it mandatory, with a minimum length of 1
    age: int

    @field_validator("name")
    @classmethod
    def normalize_name(cls, value: str) -> str:
        value = value.strip()  # strip whitespace so that blank names like '  ' are rejected below
        if not value:
            raise ValueError("name cannot be empty")
        return value.title()

    @field_validator("age")
    @classmethod
    def validate_age(cls, value: int) -> int:
        if value < 18:
            raise ValueError("age must be >= 18")
        return value

    model_config = ConfigDict(frozen=True) #immutable

Looking at the class definition of Person above, the Field declaration and the validator methods now put schema validation constraints on the two fields, name and age. This is similar to Java's Jakarta Bean Validation annotations: Field(..., min_length=1) plays roughly the same role as @NotNull @Size(min = 1).
It's interesting to see this similarity between Python and Java.

Now with the Pydantic class definition above, let's rewrite the normalize_people() function - Notice how small that function is now compared to [Prev Normalize People](#normalize_people)

In [183]:
from typing import List, Tuple

def normalize_people(raw: List[dict]) -> Tuple[List[Person], List[str]]:
    valid_people: List[Person] = []
    errors: List[str] = []

    for record in raw:
        try:
            person = Person(**record)
            valid_people.append(person)
        except ValidationError as e:
            errors.append(f"{record} → {e.errors()}")

    return valid_people, errors

In [184]:
raw_people = [
    {"name": "Aman", "age": 20},
    {"name": " Ravi ", "age": "25"},
    {"name": "neha", "age": 17},
    {"name": "John", "age": None},
    {"name": "", "age": 30},
    {"age": 40},
    {"name": "Sara", "age": 18},
]

valid, errors = normalize_people(raw_people)

In [185]:
print (valid)
print (errors)

[Person(name='Aman', age=20), Person(name='Ravi', age=25), Person(name='Sara', age=18)]
["{'name': 'neha', 'age': 17} → [{'type': 'value_error', 'loc': ('age',), 'msg': 'Value error, age must be >= 18', 'input': 17, 'ctx': {'error': ValueError('age must be >= 18')}, 'url': 'https://errors.pydantic.dev/2.12/v/value_error'}]", "{'name': 'John', 'age': None} → [{'type': 'int_type', 'loc': ('age',), 'msg': 'Input should be a valid integer', 'input': None, 'url': 'https://errors.pydantic.dev/2.12/v/int_type'}]", "{'name': '', 'age': 30} → [{'type': 'string_too_short', 'loc': ('name',), 'msg': 'String should have at least 1 character', 'input': '', 'ctx': {'min_length': 1}, 'url': 'https://errors.pydantic.dev/2.12/v/string_too_short'}]", "{'age': 40} → [{'type': 'missing', 'loc': ('name',), 'msg': 'Field required', 'input': {'age': 40}, 'url': 'https://errors.pydantic.dev/2.12/v/missing'}]"]
