# Lab (Day 1) — Data Types with a CURP-like Identifier

## Goal
You will build a **CURP-like identifier** using Python **data types** and **string operations**.

This is a simplified **training** version of the real CURP logic, not the official algorithm. We focus on:
- `str`, `int`, `list`, `dict`, `bool`, `None`
- indexing and slicing strings
- composing outputs with f-strings
- using `assert` as a first step toward unit testing

---

## Business Inputs (per person)
Each person is represented as a `dict` with required fields:
- `first_name` (str)
- `paternal_last` (str)
- `maternal_last` (str)
- `dob` (str in format `"YYYY-MM-DD"`)
- `sex` (str: `"H"` for hombre/male or `"M"` for mujer/female)
- `state` (str: state code like `"DF"`, `"BC"`; if the code is longer, only the first 2 letters are used)
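
As a small illustration of working with such a `dict`, the sketch below checks whether all required keys are present. The names `REQUIRED_FIELDS` and `missing_fields` are hypothetical helpers invented for this example; the lab itself does not require them.

```python
# Hypothetical helper for illustration; not part of the lab's required code.
REQUIRED_FIELDS = [
    "first_name", "paternal_last", "maternal_last", "dob", "sex", "state",
]

def missing_fields(person: dict) -> list:
    """Return the required keys that are absent from `person`."""
    return [field for field in REQUIRED_FIELDS if field not in person]

complete = {
    "first_name": "Luis",
    "paternal_last": "Ochoa",
    "maternal_last": "Ávila",
    "dob": "1985-12-30",
    "sex": "H",
    "state": "NL",
}
incomplete = {"first_name": "Luis", "dob": "1985-12-30"}

print(missing_fields(complete))    # []
print(missing_fields(incomplete))  # ['paternal_last', 'maternal_last', 'sex', 'state']
```

The `in` operator on a `dict` tests its keys, which is why `field not in person` works without calling `.keys()`.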

---

## Simplified CURP-like Rules (for this lab)
We will produce an 18-character string:

1. **Block 1 (4 chars)**  
   - 1st letter of paternal last name  
   - 1st internal vowel of paternal last name (skip position 0)  
   - 1st letter of maternal last name  
   - 1st letter of first name

2. **Date (6 chars)**: `YYMMDD` extracted from `"YYYY-MM-DD"`

3. **Block 2 (3 chars)**: `sex` + first two letters of `state`

4. **Block 3 (3 chars)**  
   - 1st internal consonant of paternal last name  
   - 1st internal consonant of maternal last name  
   - 1st internal consonant of first name

5. **Suffix (2 chars)**: `"00"` (placeholder for later labs)

Total: 4 + 6 + 3 + 3 + 2 = 18 chars
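
To see how the five pieces fit together, here is a minimal sketch that composes a key with an f-string. The block values are written out by hand for the first person in today's dataset (Concepción Salgado Briseño); in the exercises you will compute them instead.

```python
# Block values written by hand for illustration (no functions yet).
block1 = "SABC"        # S (paternal), A (internal vowel), B (maternal), C (first name)
date_block = "560626"  # from "1956-06-26"
block2 = "MDF"         # sex "M" + state "DF"
block3 = "LRN"         # internal consonants of SALGADO, BRISENO, CONCEPCION
suffix = "00"

key = f"{block1}{date_block}{block2}{block3}{suffix}"
print(key)       # SABC560626MDFLRN00
print(len(key))  # 18
```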

---

## How to work
- Read each exercise cell carefully.
- Complete the `TODO` sections.
- Run the cell to validate your solution with `assert`.
- If an `assert` fails, read the error and fix the logic.
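
If `assert` is new to you, the sketch below shows what a passing and a failing check look like. The function `double` is a made-up example, unrelated to the lab's functions.

```python
def double(n: int) -> int:  # hypothetical function, for illustration only
    return n * 2

# Passing asserts produce no output at all.
assert double(2) == 4
assert double(0) == 0, "double(0) should be 0"

# A failing assert raises AssertionError with the optional message.
try:
    assert double(3) == 7, "double(3) is 6, so this check fails"
except AssertionError as err:
    print("AssertionError:", err)
```

In the exercise cells the asserts are not wrapped in `try`/`except`, so a failure stops the cell and shows the line that failed.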


## Quick Warm-up — Data Types

Run the dataset cell below (the one that defines `people`) and answer:
1) What is the type of `people`?  
2) What is the type of `people[0]`?  
3) What is the type of `people[0]["dob"]`?


---

### Indexing

In [1]:
names = ["Concepcion", "Juan"]
# Indexes [0, 1]

In [4]:
names[1]

'Juan'

In [7]:
names_dict = {"P1": "Concepcion", 
              "P2": "Juan"}

In [9]:
names_dict["P1"]

'Concepcion'

### Slicing

In [10]:
names = ["Concepcion", "Juan", "Carlos", "Laura"]

In [11]:
# sequence[start:stop:step]  (stop is exclusive)
names[1:]

['Juan', 'Carlos', 'Laura']

In [13]:
names[1:2 + 1]

['Juan', 'Carlos']
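
Strings support the same indexing and slicing syntax as lists, which is what most of today's exercises rely on. A short sketch:

```python
word = "SALGADO"

print(word[0])    # S       (first character)
print(word[-1])   # O       (last character)
print(word[1:])   # ALGADO  (everything after position 0)
print(word[:2])   # SA      (first two characters)
print(word[2:4])  # LG      (positions 2 and 3; stop is exclusive)
print(word[::2])  # SLAO    (every second character)
```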

---

In [14]:
# Dataset for today (list of dicts)

people = [
    {
        "first_name": "Concepción",
        "paternal_last": "Salgado",
        "maternal_last": "Briseño",
        "dob": "1956-06-26",
        "sex": "M",
        "state": "DF",
    },
    {
        "first_name": "Juan Carlos",
        "paternal_last": "Hernández",
        "maternal_last": "López",
        "dob": "1998-11-03",
        "sex": "H",
        "state": "BC",
    },
    {
        "first_name": "María José",
        "paternal_last": "Muñoz",
        "maternal_last": "De la O",
        "dob": "2004-01-09",
        "sex": "M",
        "state": "BCS",  # For today we will use only the first 2 letters -> "BC"
    },
    {
        "first_name": "Luis",
        "paternal_last": "Ochoa",
        "maternal_last": "Ávila",
        "dob": "1985-12-30",
        "sex": "H",
        "state": "NL",
    },
]

p0 = people[0]
# Inspect the types asked about in the warm-up questions
print("people:", type(people))
print("people[0]:", type(people[0]))
print("people[0]['dob']:", type(people[0]['dob']))
print("Keys in people[0]:", list(people[0].keys()))


people: <class 'list'>
people[0]: <class 'dict'>
people[0]['dob']: <class 'str'>
Keys in people[0]: ['first_name', 'paternal_last', 'maternal_last', 'dob', 'sex', 'state']


In [17]:
list(people[0].keys())

['first_name', 'paternal_last', 'maternal_last', 'dob', 'sex', 'state']

In [15]:
len(people)

4

## Helper — Normalize Text (Provided)

Real-world names can contain:
- extra spaces
- uppercase/lowercase differences
- accents / diacritics (Á É Í Ó Ú Ñ)

To make our logic consistent, we normalize to:
- uppercase
- no accents
- single spaces

You do **not** need to modify this function today.


In [18]:
import unicodedata

VOWELS = set("AEIOU")

def normalize_text(s: str) -> str:
    """Uppercase, trim, remove diacritics, collapse spaces."""
    s = s.strip().upper() # What is happening here?
    s = unicodedata.normalize("NFD", s)  # What does NFD mean?
    s = "".join(ch for ch in s if unicodedata.category(ch) != "Mn")  # remove diacritics
    s = " ".join(s.split())  # collapse multiple spaces
    return s

# Quick sanity checks (do not edit)
assert normalize_text("  María José  ") == "MARIA JOSE"
assert normalize_text("Briseño") == "BRISENO"
print("normalize_text OK")


normalize_text OK
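
To answer the NFD question in the comments: NFD (canonical decomposition) splits an accented character into its base letter plus one or more combining marks. Those marks have Unicode category `"Mn"`, which is why filtering them out removes the accents. The sketch below shows this on a single character, using an escape sequence so the starting form is unambiguous.

```python
import unicodedata

precomposed = "\u00f1"  # "ñ" stored as a single code point
decomposed = unicodedata.normalize("NFD", precomposed)

print(len(precomposed), len(decomposed))                # 1 2
print([unicodedata.name(ch) for ch in decomposed])
# ['LATIN SMALL LETTER N', 'COMBINING TILDE']
print([unicodedata.category(ch) for ch in decomposed])  # ['Ll', 'Mn']

# Dropping the "Mn" characters leaves only the base letter.
stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
print(stripped)  # n
```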


## Exercise 0 — Dictionary Access + Types (5 min)

1) Print the first person's normalized fields.  
2) Use `assert` to confirm they are strings.

Expected:
- `first_name` becomes `"CONCEPCION"`
- `paternal_last` becomes `"SALGADO"`
- `maternal_last` becomes `"BRISENO"`


In [None]:
# TODO: Normalize these three fields from p0 and print them
first_name = normalize_text()
paternal = normalize_text()
maternal = normalize_text()

print(, , )

# Asserts (unit-test style)
assert isinstance(first_name, str)
assert isinstance(paternal, str)
assert isinstance(maternal, str)

assert first_name == "CONCEPCION"
assert paternal == "SALGADO"
assert maternal == "BRISENO"

print("Exercise 0 OK")


## Exercise 1 — First Internal Vowel (10–15 min)

Implement `first_internal_vowel(s)`:

- Input: a **normalized** string (uppercase, no accents)
- Return: the first vowel found in `s[1:]` (skip position 0)
- If no vowel is found, return `"X"`

Example:
- `"SALGADO"` → `"A"` (because we search `"ALGADO"`)
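
The pattern you need here is "loop, return on the first match, otherwise return a default". The sketch below shows that pattern on a different problem (finding a digit) so the exercise stays yours to solve; `first_internal_digit` is a hypothetical helper, not part of the lab.

```python
def first_internal_digit(s: str) -> str:
    """Hypothetical helper: first digit found in s[1:], or "X" if there is none."""
    for ch in s[1:]:       # skip position 0
        if ch.isdigit():
            return ch      # leave the function as soon as we find a match
    return "X"             # reached only if the loop found nothing

print(first_internal_digit("7AB42"))  # 4
print(first_internal_digit("9ABC"))   # X
```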


In [None]:
def first_internal_vowel(s: str) -> str:
    # TODO:
    # - loop through s[1:]
    # - if the character is in VOWELS, return it
    # - if none found, return "X"
    pass

# Tests
assert first_internal_vowel("SALGADO") == "A"
assert first_internal_vowel("BRISENO") == "I"
assert first_internal_vowel("BCDFG") == "X"  # no vowels

print("Exercise 1 OK")


## Exercise 2 — Date Slicing: `"YYYY-MM-DD"` → `"YYMMDD"` (10 min)

Implement `extract_yymmdd(dob)` using **string slicing** (no `datetime` today).

Example:
- `"1956-06-26"` → `"560626"`
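
Slicing works well when the input has a fixed layout, because each piece always sits at the same positions. As an analogous sketch on a different format, the hypothetical helper `extract_hhmm` below turns `"HH:MM:SS"` into `"HHMM"`.

```python
def extract_hhmm(timestamp: str) -> str:
    """Hypothetical helper: "HH:MM:SS" -> "HHMM" using fixed positions."""
    hours = timestamp[0:2]    # positions 0 and 1
    minutes = timestamp[3:5]  # positions 3 and 4 (position 2 is ":")
    return hours + minutes

print(extract_hhmm("09:45:30"))  # 0945
```

Note that the result is still a `str`, so the leading zero is preserved.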


In [None]:
def extract_yymmdd(dob: str) -> str:
    # TODO:
    # dob example: "1956-06-26"
    # Use slicing to get YY, MM, DD and return YYMMDD
    pass

# Tests
assert extract_yymmdd("1956-06-26") == "560626"
assert extract_yymmdd("1998-11-03") == "981103"
assert extract_yymmdd("2004-01-09") == "040109"

print("Exercise 2 OK")


## Exercise 3 — First Internal Consonant (10–15 min)

Implement `first_internal_consonant(s)`:

- Search in `s[1:]`
- Return the first character that:
  - is a letter (`ch.isalpha()`)
  - is **not** a vowel (`ch not in VOWELS`)
- If none found, return `"X"`

Examples:
- `"SALGADO"` → `"L"` (search `"ALGADO"`, first consonant is `L`)
- `"CONCEPCION"` → `"N"` (search `"ONCEPCION"`, skip `O` (vowel), then `N`)
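
The `ch.isalpha()` check matters because normalized names can still contain spaces, for example `"DE LA O"`. The sketch below only inspects characters; it is not the full solution.

```python
VOWELS = set("AEIOU")
name = "DE LA O"

# Characters after position 0 that are letters (spaces are dropped).
letters = [ch for ch in name[1:] if ch.isalpha()]
print(letters)  # ['E', 'L', 'A', 'O']

# Of those letters, the ones that are not vowels.
consonants = [ch for ch in letters if ch not in VOWELS]
print(consonants)  # ['L']

print(" ".isalpha())  # False: a space is not a letter
```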


In [None]:
def first_internal_consonant(s: str) -> str:
    # TODO:
    # - loop through s[1:]
    # - return first letter that is not a vowel
    # - if none, return "X"
    pass

# Tests
assert first_internal_consonant("SALGADO") == "L"
assert first_internal_consonant("BRISENO") == "R"
assert first_internal_consonant("CONCEPCION") == "N"
assert first_internal_consonant("AEIOU") == "X"  # all vowels

print("Exercise 3 OK")


## Exercise 4 — Assemble the CURP-like Key for `people[0]` (15–20 min)

Implement `build_curp_like(person)` using your functions.

Steps:
1) Normalize `first_name`, `paternal_last`, `maternal_last`
2) Build `block1`:
   - paternal[0] + first_internal_vowel(paternal) + maternal[0] + first_name[0]
3) Date block: `extract_yymmdd(dob)`
4) Block2: `sex + state[:2]` (uppercase)
5) Block3: internal consonants (paternal + maternal + first_name)
6) Add suffix `"00"`

Expected for Concepción Salgado Briseño (1956-06-26, M, DF):
`SABC560626MDFLRN00`
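
As a small sketch of step 4 only, the snippet below builds Block 2 for a person whose state code has three letters. The `person` dict here is a reduced, illustrative version containing just the two fields involved.

```python
# Reduced dict for illustration: only the fields needed for Block 2.
person = {"sex": "M", "state": "BCS"}

block2 = person["sex"].upper() + person["state"][:2].upper()
print(block2)       # MBC
print(len(block2))  # 3
```

Slicing with `[:2]` never raises an error, even on a short string, so a 1-letter state would silently produce a 2-character block. Checking the length afterwards is one way to catch that.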


In [None]:
def build_curp_like(person: dict) -> str:
    # TODO:
    # - normalize fields
    # - build blocks as described
    # - return the final 18-character string
    pass

key0 = build_curp_like(people[0])
print("people[0] key:", key0)

assert key0 == "SABC560626MDFLRN00"
assert len(key0) == 18

print("Exercise 4 OK")


## Exercise 5 — Generate Keys for All People + Detect Duplicates (15–20 min)

1) Generate a list `keys` with a key for each person in `people`.
2) Create a dictionary `index` mapping `key -> person`.
3) If a key is duplicated, print a warning message.

Hints:
- Use a `for` loop
- Use `if key in index:` to detect duplicates

At the end:
- print the list of keys
- assert the list length matches the number of people
- assert all keys have length 18
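
The `if key in index:` hint is enough for this exercise. As an alternative sketch, `collections.Counter` from the standard library can report duplicates after the fact. The keys below are made up for illustration and do not come from today's dataset.

```python
from collections import Counter

# Made-up 18-character keys; the first and third are identical on purpose.
sample_keys = [
    "AAAA000000HDFBBB00",
    "CCCC000000MNLDDD00",
    "AAAA000000HDFBBB00",
]

counts = Counter(sample_keys)
duplicates = [key for key, n in counts.items() if n > 1]
print(duplicates)  # ['AAAA000000HDFBBB00']
```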


In [None]:
# TODO: Build keys for all people
keys = []
index = {}

for person in people:
    key = build_curp_like(person)

    # Duplicate detection
    if key in index:
        print(f"WARNING: Duplicate key generated: {key}")

    index[key] = person
    keys.append(key)

print("All keys:")
for k in keys:
    print(" -", k)

# Asserts
assert len(keys) == len(people)
assert all(isinstance(k, str) for k in keys)
assert all(len(k) == 18 for k in keys)

print("Exercise 5 OK")


## Wrap-up

If you finished early, try these extensions (optional):
1) Print a small report: `first_name -> key`
2) Add a check that `sex` is only `"H"` or `"M"` and raise an error otherwise.
3) Add a check that `dob` has length 10 and has `'-'` in positions 4 and 7.
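
One possible approach to extensions 2 and 3 is sketched below. The function name `validate_person` and the choice of `ValueError` are assumptions made for this example; other designs are equally valid.

```python
def validate_person(person: dict) -> None:
    """Hypothetical validator: raise ValueError if `sex` or `dob` look wrong."""
    sex = person["sex"]
    if sex not in ("H", "M"):
        raise ValueError(f"sex must be 'H' or 'M', got {sex!r}")

    dob = person["dob"]
    if len(dob) != 10 or dob[4] != "-" or dob[7] != "-":
        raise ValueError(f"dob must look like 'YYYY-MM-DD', got {dob!r}")

validate_person({"sex": "H", "dob": "1985-12-30"})  # valid: no output

try:
    validate_person({"sex": "X", "dob": "1985-12-30"})
except ValueError as err:
    print("Rejected:", err)
```

This only checks the shape of `dob`, not whether it is a real calendar date.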

Tomorrow we will refactor this into cleaner functions and introduce more structured flow control.
