# Python Basics for ML (Data Types → Data Structures)

This notebook is a **hands-on refresher** of Python fundamentals you’ll use constantly in ML workflows.

## How to use this notebook
- Run cells top to bottom.
- Each code cell contains **numbered examples** (Example 1, Example 2, …).
- Feel free to modify values and re-run.

## What you’ll learn
- Core **data types**: `int`, `float`, `bool`, `str`, `NoneType`
- Type conversion and common pitfalls
- **Strings** (indexing, slicing, methods, formatting)
- **Data structures**: `list`, `tuple`, `set`, `dict`
- Iteration patterns (`for`, `while`, `range`, `enumerate`, `zip`) that show up in data work


In [1]:
# Quick environment check (optional)
import sys

print("Python version:", sys.version.split()[0])


Python version: 3.11.4


## 1) Variables and core data types

### Key ideas
- A **variable** is just a name pointing to an object.
- Python is **dynamically typed**: the type belongs to the object, not the variable name.
- You’ll constantly inspect types in data work to avoid subtle bugs (e.g., strings that look like numbers).
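A small sketch of dynamic typing and the "string that looks like a number" pitfall. The variable name `value` is illustrative. The same operator (`*`) does different things depending on the type of the object the name currently points to.

```python
# Dynamic typing: one name can point to objects of different types over time
value = "7"                                # looks like a number, but it's a str
print(type(value).__name__, value * 2)     # str 77  (string repetition)

value = int(value)                         # rebind the name to a new int object
print(type(value).__name__, value * 2)     # int 14  (arithmetic)
```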

### Core types you must know
- `int`: integers of arbitrary precision
- `float`: double-precision floating point numbers
- `bool`: `True` / `False` (a subclass of `int`)
- `str`: Unicode text
- `None`: special singleton value meaning “no value / missing / not set”
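These type properties have practical consequences. The sketch below uses illustrative values: floats should be compared with a tolerance rather than with `==`, booleans can be summed because `bool` subclasses `int`, and `None` is checked by identity.

```python
import math

# float: binary floating point can't represent 0.1 or 0.2 exactly
total = 0.1 + 0.2
print(total == 0.3)                # False
print(math.isclose(total, 0.3))    # True: compare floats with a tolerance

# bool subclasses int, so sum() counts the True values
correct = [True, False, True, True]    # illustrative per-sample results
accuracy = sum(correct) / len(correct)
print(accuracy)                    # 0.75

# None is a singleton, so test it by identity
label = None
print(label is None)               # True
```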

### Type conversion
- Use `int(...)`, `float(...)`, `str(...)`, `bool(...)` to convert.
- Conversion can **fail**: `int("3.2")` raises a `ValueError` because `int()` won't parse a decimal string, so be ready to handle exceptions with `try`/`except`.
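When you parse raw text columns, a common pattern is to try the strictest conversion first and fall back. `parse_number` below is a hypothetical helper, not a library function. It returns an `int` if possible, then a `float`, and `None` for text that isn't numeric. The last line shows a related pitfall: `bool()` on any non-empty string is `True`.

```python
def parse_number(raw):
    """Hypothetical helper: str -> int, else float, else None if not numeric."""
    text = raw.strip()
    try:
        return int(text)        # succeeds only for integer literals like "42"
    except ValueError:
        pass
    try:
        return float(text)      # handles "3.5", "1e3", ...
    except ValueError:
        return None             # treat unparseable text as missing

raw_values = ["42", " 3.5 ", "1e3", "n/a", ""]
print([parse_number(v) for v in raw_values])  # [42, 3.5, 1000.0, None, None]

# Pitfall: bool() of a non-empty string is always True, even for "False"
print(bool("False"), bool(""))                # True False
```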


In [2]:
# Examples 1–7: variables + core types

# Example 1: variable assignment + type inspection
x = 42
print("Example 1:", x, type(x))

# Example 2: ints are arbitrary precision
big = 10**50
print("Example 2:", big)

# Example 3: float basics
pi = 3.14159
print("Example 3:", pi, type(pi))

# Example 4: bool is a subtype of int
flag = True
print("Example 4:", flag, int(flag), isinstance(flag, int))

# Example 5: None is a real value
missing = None
print("Example 5:", missing, type(missing))

# Example 6: type conversion
s = "123"
print("Example 6:", int(s) + 7, float(s) / 2)

# Example 7: conversions can fail; handle with try/except
try:
    int("3.5")
except ValueError as e:
    print("Example 7: ValueError ->", e)


Example 1: 42 <class 'int'>
Example 2: 100000000000000000000000000000000000000000000000000
Example 3: 3.14159 <class 'float'>
Example 4: True 1 True
Example 5: None <class 'NoneType'>
Example 6: 130 61.5
Example 7: ValueError -> invalid literal for int() with base 10: '3.5'


## 2) Operators and comparisons

### Arithmetic operators
- `+`, `-`, `*`, `/` (true division)
- `//` (floor division), `%` (remainder/modulo), `**` (power)

### Comparison operators
- `==` compares values; `is` checks identity (whether both names refer to the same object). Use `is` for `None` checks (`x is None`).
- `<`, `<=`, `>`, `>=`

### Why this matters in ML
- You’ll compare numeric values, filter rows, and build boolean masks.
- `/` always returns a `float`, while `//` floors the result; using `//` by mistake can silently truncate values when scaling or normalizing features.
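Here is a minimal sketch of both points on an illustrative feature column. Min-max scaling needs true division, and `//` gives the wrong answer without raising any error. A comparison applied to each element produces a boolean mask, which can then select rows.

```python
values = [2, 4, 6, 10]              # illustrative feature column
lo, hi = min(values), max(values)

# Min-max scaling needs true division
scaled = [(v - lo) / (hi - lo) for v in values]
print(scaled)                       # [0.0, 0.25, 0.5, 1.0]

# Using // by mistake silently floors every result
floored = [(v - lo) // (hi - lo) for v in values]
print(floored)                      # [0, 0, 0, 1]

# // rounds toward negative infinity, not toward zero
print(-7 // 2, int(-7 / 2))         # -4 -3

# A boolean mask from a comparison, then used to filter
mask = [v > 5 for v in values]
kept = [v for v, keep in zip(values, mask) if keep]
print(mask, kept)                   # [False, False, True, True] [6, 10]
```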


In [3]:
# Examples 8–12: operators + comparisons

# Example 8: / vs //
print("Example 8:", 7 / 2, 7 // 2)

# Example 9: modulo is useful for periodic patterns
print("Example 9:", [i % 3 for i in range(10)])

# Example 10: power
print("Example 10:", 2**10)

# Example 11: chained comparisons
x = 7
print("Example 11:", 0 < x < 10)

# Example 12: == vs is
a = [1, 2]
b = [1, 2]
print("Example 12:", a == b, a is b)


Example 8: 3.5 3
Example 9: [0, 1, 2, 0, 1, 2, 0, 1, 2, 0]
Example 10: 1024
Example 11: True
Example 12: True False


## 3) Strings (`str`)

### Key ideas
- Strings are **immutable** (you can’t change them in place).
- You can **index** and **slice** them like sequences.
- Built-in methods (`lower`, `strip`, `replace`, `split`, `join`, …) are essential for text preprocessing.
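Immutability in practice: assigning to an index raises `TypeError`, so you build a new string instead. String methods behave the same way and return new strings.

```python
word = "python"
try:
    word[0] = "P"                   # strings don't support item assignment
except TypeError as e:
    print("TypeError:", e)

fixed = "P" + word[1:]              # build a new string instead
print(word, fixed)                  # python Python
print(word.capitalize())            # Python (methods also return new strings)
```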

### Common ML/data tasks involving strings
- Cleaning: trimming whitespace, normalizing case
- Parsing: splitting CSV-ish text, extracting tokens
- Formatting: building readable logs or file names
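The three tasks above often appear together. In this sketch, `clean_token` is a hypothetical helper, the input line is made up, and the checkpoint naming scheme is only illustrative. Format specs like `:03d` and `:.4f` are standard f-string syntax.

```python
def clean_token(raw):
    """Hypothetical helper: trim whitespace and lowercase one token."""
    return raw.strip().lower()

# Parsing + cleaning: split a CSV-ish line, then normalize each piece
line = " Cat, DOG ,bird ,  cat "
tokens = [clean_token(t) for t in line.split(",")]
print(tokens)        # ['cat', 'dog', 'bird', 'cat']

# Formatting: zero-padded epoch and rounded loss in an (illustrative) file name
epoch, val_loss = 7, 0.123456
filename = f"model_epoch{epoch:03d}_loss{val_loss:.4f}.ckpt"
print(filename)      # model_epoch007_loss0.1235.ckpt
```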


In [4]:
# Examples 13–20: strings

text = "  Machine Learning  "

# Example 13: stripping whitespace
print("Example 13:", text.strip())

# Example 14: case normalization
print("Example 14:", text.strip().lower())

# Example 15: indexing
s = "python"
print("Example 15:", s[0], s[-1])

# Example 16: slicing
print("Example 16:", s[1:4], s[::-1])

# Example 17: replace
print("Example 17:", "a,b,c".replace(",", " | "))

# Example 18: split + join
tokens = "red green blue".split()
print("Example 18:", tokens, "-".join(tokens))

# Example 19: f-strings (preferred formatting)
name, score = "Alice", 0.92341
print("Example 19:", f"{name} scored {score:.2%}")

# Example 20: immutability (creates a new string)
original = "data"
modified = original + " science"
print("Example 20:", original, "->", modified)


Example 13: Machine Learning
Example 14: machine learning
Example 15: p n
Example 16: yth nohtyp
Example 17: a | b | c
Example 18: ['red', 'green', 'blue'] red-green-blue
Example 19: Alice scored 92.34%
Example 20: data -> data science


## 4) Lists (`list`)

### Key ideas
- Lists are **ordered** and **mutable**.
- They’re a go-to container for batches of values (features, tokens, rows).
- You’ll use list comprehensions heavily for quick transformations.
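One place lists, slicing, and comprehensions meet is splitting samples into mini-batches. `make_batches` is a hypothetical helper written for this sketch. It steps through the list with `range(start, stop, step)` and slices out one chunk per step.

```python
def make_batches(items, batch_size):
    """Hypothetical helper: split a list into consecutive chunks.

    The last chunk may be shorter if len(items) isn't a multiple of batch_size.
    """
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

samples = list(range(10))
print(make_batches(samples, 4))   # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```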

### Important methods
- Add: `append`, `extend`, `insert`
- Remove: `remove`, `pop`, `clear`
- Inspect: `index`, `count`
- Reorder: `sort`, `reverse`

### Copying warning
- `b = a` does **not** copy a list; it creates a second reference to the same list.
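- Use `a.copy()`, `a[:]`, or `list(a)` to get a **shallow** copy (a new list whose elements are the same objects). Use `copy.deepcopy(a)` when the list contains nested mutable objects such as inner lists.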
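The difference between a shallow copy and a deep copy only shows up with nested containers. Here, `list.copy()` creates a new outer list that still shares the inner lists, while `copy.deepcopy` copies them too.

```python
import copy

grid = [[0, 0], [0, 0]]          # a list that contains lists
shallow = grid.copy()            # new outer list, same inner lists
deep = copy.deepcopy(grid)       # new outer list AND new inner lists

grid[0][0] = 99                  # mutate an inner list
print(shallow[0][0], deep[0][0]) # 99 0
print(shallow is grid, shallow[0] is grid[0])  # False True
```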


In [5]:
# Examples 21–29: lists

# Example 21: create + index
nums = [10, 20, 30, 40]
print("Example 21:", nums[0], nums[-1])

# Example 22: slicing
print("Example 22:", nums[1:3], nums[:2], nums[::2])

# Example 23: append vs extend
xs = [1, 2]
xs.append(3)
xs.extend([4, 5])
print("Example 23:", xs)

# Example 24: insert
xs.insert(1, 99)
print("Example 24:", xs)

# Example 25: pop returns the removed element
last = xs.pop()
print("Example 25:", last, xs)

# Example 26: remove deletes first matching value
xs.remove(99)
print("Example 26:", xs)

# Example 27: sorted() vs list.sort()
ys = [3, 1, 2]
print("Example 27:", sorted(ys), ys)  # sorted returns new list
ys.sort()
print("Example 27 (cont):", ys)       # sort mutates in place

# Example 28: list comprehension (map + filter in one)
squares_of_even = [n*n for n in range(10) if n % 2 == 0]
print("Example 28:", squares_of_even)

# Example 29: aliasing vs copying
a = [1, 2, 3]
b = a          # alias
c = a.copy()   # shallow copy
b.append(4)
print("Example 29:", "a=", a, "b=", b, "c=", c)


Example 21: 10 40
Example 22: [20, 30] [10, 20] [10, 30]
Example 23: [1, 2, 3, 4, 5]
Example 24: [1, 99, 2, 3, 4, 5]
Example 25: 5 [1, 99, 2, 3, 4]
Example 26: [1, 2, 3, 4]
Example 27: [1, 2, 3] [3, 1, 2]
Example 27 (cont): [1, 2, 3]
Example 28: [0, 4, 16, 36, 64]
Example 29: a= [1, 2, 3, 4] b= [1, 2, 3, 4] c= [1, 2, 3]


## 5) Tuples (`tuple`)

### Key ideas
- Tuples are **ordered** and **immutable**.
- They’re great for fixed-size records (e.g., `(x, y)` coordinates) or returning multiple values.
- Tuple unpacking is a very common Python pattern.
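A short sketch of these ideas. `min_max` is a hypothetical function that returns two values, which Python packs into a tuple. Star unpacking collects leftover items, and item assignment on a tuple raises `TypeError`.

```python
def min_max(values):
    """Hypothetical helper: return two results at once (packed into a tuple)."""
    return min(values), max(values)

result = min_max([3, 1, 4, 1, 5])
lo, hi = result
print(result, lo, hi)            # (1, 5) 1 5

# Star unpacking collects the remaining items into a list
first, *rest = (10, 20, 30, 40)
print(first, rest)               # 10 [20, 30, 40]

# Immutability: item assignment raises TypeError
point = (3, 7)
try:
    point[0] = 0
except TypeError as e:
    print("TypeError:", e)
```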


In [6]:
# Examples 30–33: tuples

# Example 30: tuple creation
point = (3, 7)
print("Example 30:", point, type(point))

# Example 31: single-element tuple needs a trailing comma
single = (42,)
print("Example 31:", single, type(single))

# Example 32: unpacking
x, y = point
print("Example 32:", x, y)

# Example 33: swap values (idiomatic)
a, b = 1, 2
a, b = b, a
print("Example 33:", a, b)


Example 30: (3, 7) <class 'tuple'>
Example 31: (42,) <class 'tuple'>
Example 32: 3 7
Example 33: 2 1


## 6) Sets (`set`)

### Key ideas
- Sets are **unordered** collections of **unique** elements.
- Use them for fast membership checks (average constant time, unlike a linear scan of a list) and for deduplication.

### Set operations you should know
- Union: `|`
- Intersection: `&`
- Difference: `-`
- Symmetric difference: `^`

These operations show up often in feature engineering and cleaning categorical values.
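For example, set difference finds categorical values that appear at evaluation time but never appeared in training. The color values below are illustrative. Because sets don't keep order, the sketch also shows `dict.fromkeys` as an order-preserving way to deduplicate (dicts keep insertion order).

```python
# Illustrative category values for a train/test split
train_colors = {"red", "green", "blue"}
test_colors = {"green", "blue", "purple"}

unseen = test_colors - train_colors   # categories the model never saw
shared = test_colors & train_colors
print(unseen, sorted(shared))         # {'purple'} ['blue', 'green']

# set() loses order; dict.fromkeys deduplicates while keeping first-seen order
raw = ["b", "a", "b", "c", "a"]
print(list(dict.fromkeys(raw)))       # ['b', 'a', 'c']
```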


In [None]:
# Examples 34–38: sets

# Example 34: deduplication
values = [1, 2, 2, 3, 3, 3]
unique = set(values)
print("Example 34:", unique)

# Example 35: membership is fast and readable
print("Example 35:", 2 in unique, 99 in unique)

# Example 36: add / discard
unique.add(10)
unique.discard(3)  # discard does NOT error if missing
print("Example 36:", unique)

# Example 37: remove errors if missing
try:
    unique.remove(999)
except KeyError as e:
    print("Example 37: KeyError ->", e)

# Example 38: set operations
A = {1, 2, 3}
B = {3, 4, 5}
print("Example 38 union:", A | B)
print("Example 38 intersect:", A & B)
print("Example 38 diff:", A - B)
print("Example 38 symdiff:", A ^ B)


## 7) Dictionaries (`dict`)

### Key ideas
- Dicts map **keys → values**.
- Keys must be **hashable** (e.g., `str`, `int`, or a `tuple` whose elements are all hashable); mutable containers like `list` and `dict` can't be keys.
- Dicts are essential for:
  - Counting (frequency tables)
  - Label mappings (`class_name -> id`)
  - Structured records (JSON-like data)

### Common patterns
- Safe access with `get`
- Iteration using `.items()`
- Building dicts with comprehensions
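This sketch combines the patterns on illustrative class labels. It builds a label mapping and its inverse with comprehensions, then uses `collections.Counter` (a dict subclass for counting) and `collections.defaultdict` (which creates a default value for a missing key on first access) from the standard library.

```python
from collections import Counter, defaultdict

labels = ["cat", "dog", "cat", "bird"]      # illustrative class labels

# Label mapping and its inverse, built with dict comprehensions
class_to_id = {name: i for i, name in enumerate(sorted(set(labels)))}
id_to_class = {i: name for name, i in class_to_id.items()}
encoded = [class_to_id[name] for name in labels]
print(class_to_id)    # {'bird': 0, 'cat': 1, 'dog': 2}
print(encoded)        # [1, 2, 1, 0]

# Counter replaces the manual frequency-dict loop
counts = Counter(labels)
print(counts.most_common(1))   # [('cat', 2)]

# defaultdict(list): a missing key starts as an empty list
positions = defaultdict(list)
for idx, name in enumerate(labels):
    positions[name].append(idx)
print(dict(positions))  # {'cat': [0, 2], 'dog': [1], 'bird': [3]}
```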


In [12]:
# Examples 39–44: dictionaries

# Example 39: create + access
person = {"name": "Ada", "age": 36}
print("Example 39:", person["name"], person["age"])

# Example 40: get with default
print("Example 40:", person.get("city", "(unknown)"))

# Example 41: update / merge
person.update({"city": "London"})
print("Example 41:", person)

# Example 42: iterate keys/values/items
for k, v in person.items():
    print("Example 42:", k, "->", v)

# Example 43: counting (frequency dict)
words = ["ml", "ai", "ml", "data", "ai", "ml"]
counts = {}
for w in words:
    counts[w] = counts.get(w, 0) + 1
print("Example 43:", counts)

# Example 44: dict comprehension
squares = {n: n*n for n in range(6)}
print("Example 44:", squares)


Example 39: Ada 36
Example 40: (unknown)
Example 41: {'name': 'Ada', 'age': 36, 'city': 'London'}
Example 42: name -> Ada
Example 42: age -> 36
Example 42: city -> London
Example 43: {'ml': 3, 'ai': 2, 'data': 1}
Example 44: {0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25}


## 8) Control flow and iteration

### `if` / `elif` / `else`
- Lets you branch logic based on conditions.

### `for` loops
- Iterate over any iterable: lists, strings, dicts, ranges, generators.

### `range`, `enumerate`, `zip`
- `range(n)`: the integers `0` through `n - 1`, produced lazily rather than stored in memory
- `enumerate(iterable)`: get `(index, value)` pairs
- `zip(a, b)`: pair elements position by position (useful for feature/label pairing); stops at the shortest input

### `while` loops
- Repeats while a condition stays true; make sure the loop body eventually makes it false, or the loop never ends.


In [13]:
# Examples 45–50: control flow + iteration

# Example 45: if/elif/else
score = 83
if score >= 90:
    grade = "A"
elif score >= 75:
    grade = "B"
else:
    grade = "C"
print("Example 45:", grade)

# Example 46: for + range
s = 0
for i in range(5):
    s += i
print("Example 46:", s)

# Example 47: enumerate
colors = ["red", "green", "blue"]
for idx, color in enumerate(colors, start=1):
    print("Example 47:", idx, color)

# Example 48: zip
xs = [1, 2, 3]
ys = [10, 20, 30]
for a, b in zip(xs, ys):
    print("Example 48:", a, b, "sum=", a + b)

# Example 49: while loop
n = 3
while n > 0:
    print("Example 49:", n)
    n -= 1

# Example 50: break / continue
for i in range(6):
    if i == 2:
        continue
    if i == 5:
        break
    print("Example 50:", i)


Example 45: B
Example 46: 10
Example 47: 1 red
Example 47: 2 green
Example 47: 3 blue
Example 48: 1 10 sum= 11
Example 48: 2 20 sum= 22
Example 48: 3 30 sum= 33
Example 49: 3
Example 49: 2
Example 49: 1
Example 50: 0
Example 50: 1
Example 50: 3
Example 50: 4
