# Python Basics

## Data Types

Basic: int, float, str, bool

In [None]:
x = 10 # int
pi = 3.14 # float
name = "Alice" # string
is_valid = True # boolean
print(x, type(x)) # Output: 10 <class 'int'>
print(pi, type(pi)) # Output: 3.14 <class 'float'>
print(name, type(name)) # Output: Alice <class 'str'>
print(is_valid, type(is_valid)) # Output: True <class 'bool'>

Advanced: list, dict, tuple, set

In [None]:
# List: An ordered, mutable collection of elements. Allows duplicates.
fruits = ["apple", "banana", "cherry"]

print(fruits)

# Modify
fruits.append("orange")
print(fruits)

# Loop
for fruit in fruits:
    print(fruit)

# 2D list
matrix = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
]
print(matrix[0])        # First row → [1, 2, 3]
print(matrix[0][1])     # Row 0, Column 1 → 2
print(matrix[2][2])     # Row 2, Column 2 → 9

print(matrix[0:2])    # First two rows

In [None]:
# Dictionary: A collection of key–value pairs used for fast lookup by key. Keys must be unique.
student = {
    "name": "Alice",
    "age": 25,
    "major": "Data Science"
}

print(student)
print(student["name"])

# Add new key
student["GPA"] = 3.8
print(student)

In [None]:
# Set: An unordered collection of unique elements. 
numbers = {1, 2, 3, 3, 2}
print(numbers)   # Duplicates removed

# Add element
numbers.add(5)
print(numbers)

In [None]:
# Tuple: An ordered, immutable collection of elements. Allows duplicates.
coordinates = (10, 20)

print(coordinates)
print(type(coordinates))
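
The tuple cell above only creates and prints a tuple. As a minimal sketch of what sets tuples apart from lists, the example below shows unpacking, the error raised on item assignment, and the trailing comma needed for a one-element tuple. The names `point` and `single` are illustrative only.

```python
point = (10, 20)

# Unpacking assigns each element to its own name
x, y = point
print(x, y)

# Tuples are immutable: item assignment raises TypeError
try:
    point[0] = 99
except TypeError as exc:
    print("TypeError:", exc)

# A one-element tuple needs a trailing comma; (5) is just the int 5
single = (5,)
print(type(single), type((5)))
```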

## Mutability

- **Immutable**: `int`, `float`, `bool`, `str`, `tuple`, `frozenset`
- **Mutable**: `list`, `dict`, `set`, most user-defined objects

Immutable objects can't be changed "in place"; operations create new objects.

In [None]:
s = "hi"
t = s
s += "!"  # creates a new string object

print("s:", s)
print("t:", t)
print("s is t:", s is t)  # False: s now refers to a new string object

## Mutable Objects (References)

In Python, assignment never copies an object; it binds another name to the same object. With mutable objects, a change made through one name is therefore visible through every other name.

Let's see how this creates surprising behavior with lists.

In [None]:
a = [1, 2, 3]
b = a          # b points to the same list object
b.append(4)

print("a:", a)
print("b:", b)
print("a is b:", a is b)

### Copying: shallow vs deep
- `list(a)` or `a.copy()` makes a *shallow* copy (copies the container, not nested objects).
- For nested structures, consider `copy.deepcopy`.

In [None]:
import copy

x = [[1, 2], [3, 4]]
y = x.copy()           # shallow copy
z = copy.deepcopy(x)   # deep copy

y[0].append(99)        # modifies the nested list shared with x
z[1].append(88)        # modifies only z

print("x:", x)
print("y:", y)
print("z:", z)

## Selection and Loop

Simple `if`

In [None]:
age = 20

if age >= 18:
    print("You are an adult.")

`if-elif-else`

In [None]:
score = 85

if score >= 90:
    print("Grade: A")
elif score >= 80:
    print("Grade: B")
elif score >= 70:
    print("Grade: C")
else:
    print("Grade: Below C")

Logical operators: `and`, `or`, `not`

In [None]:
x = 10

if x > 5 and x < 20:
    print("x is between 5 and 20")
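
The cell above uses only `and`. A short sketch of the other two operators follows, along with the chained-comparison form that Python accepts for range checks. The variables `age` and `has_ticket` are made up for illustration.

```python
age = 20
has_ticket = False

# or: True if at least one condition is True
if age < 12 or age >= 65:
    print("Discount applies")
else:
    print("No discount")

# not: inverts a boolean
if not has_ticket:
    print("Please buy a ticket")

# Chained comparison: equivalent to x > 5 and x < 20
x = 10
in_range = 5 < x < 20
print(in_range)
```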

`for` loop: iterates over a sequence (such as a list or string) or any other iterable object

In [None]:
numbers = [1, 2, 3, 4]

for num in numbers:
    print(num)

In [None]:
for i in range(5):
    print(i)

Example: Count Positive Numbers

In [None]:
nums = [-3, 5, -1, 8, 2]

count = 0

for n in nums:
    if n > 0:
        count += 1

print("Positive count:", count)

## Comprehensions
- More concise (less code)
- Often more readable (e.g. `[n**2 for n in range(10)]` reads as "give me n squared for each n in range(10)")
- Usually faster than the equivalent `for` loop with `append`


In [None]:
# Traditional loop
numbers = [1, 2, 3, 4]
squares = []

for n in numbers:
    squares.append(n**2)

print(squares)

In [None]:
# list comprehension
numbers = [1, 2, 3, 4]
squares = [n**2 for n in numbers]

print(squares)

In [None]:
# with condition (filtering)
numbers = [1, 2, 3, 4, 5, 6]
evens = [n for n in numbers if n % 2 == 0]
print(evens)

In [None]:
# remove negatives
data = [10, -5, 7, -3]
cleaned = [x for x in data if x > 0]
print(cleaned)
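
The same syntax also works with curly braces to build dictionaries and sets. The sketch below is a small illustration under that assumption of scope; the variable names are hypothetical.

```python
names = ["Alice", "Bob", "Cara"]

# Dict comprehension: key: value pairs inside curly braces
name_lengths = {name: len(name) for name in names}
print(name_lengths)

# Set comprehension: duplicates collapse automatically
remainders = {n % 3 for n in [1, 2, 3, 4, 5, 6]}
print(remainders)
```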

### Mini exercise
1) Create a list of strings `"item-0" ... "item-9"` using a comprehension.  
2) Create a generator that yields numbers divisible by 3 from 0..99 and sum them.

In [None]:
# Exercise 1
items = [f"item-{i}" for i in range(10)]
print(items)

# Exercise 2
g = (n for n in range(100) if n % 3 == 0)
print(sum(g))

## Generators vs. Comprehensions

- List comprehension:
    - Computes everything immediately
    - Stores all results in memory
    - Returns a list
- Generator expression:
    - Does NOT compute everything at once
    - Produces values one at a time
    - Returns a generator object

In [None]:
nums = range(10)

squares_list = [n * n for n in nums if n % 2 == 0]
squares_gen  = (n * n for n in nums if n % 2 == 0)

print("list:", squares_list)
print("gen:", squares_gen)  # generator object
print("consume gen:", list(squares_gen))
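
Two consequences of the differences listed above can be checked directly: a generator object stays small no matter how many values it will produce, and it can be consumed only once. This is a rough sketch; `sys.getsizeof` measures only the container itself (not the integers it refers to), and the exact byte counts vary by Python version.

```python
import sys

squares_list = [n * n for n in range(100_000)]
squares_gen = (n * n for n in range(100_000))

# The list stores a reference to every result; the generator stores only its state
print("list bytes:", sys.getsizeof(squares_list))
print("gen bytes: ", sys.getsizeof(squares_gen))

# A generator can be consumed only once
first_total = sum(squares_gen)
second_total = sum(squares_gen)  # already exhausted, so this sums nothing
print(first_total, second_total)
```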

## Function

In [None]:
def multiply(x, y):
    return x * y

print(multiply(3, 4))

In [None]:
def calculate(a, b):
    return a + b, a * b

sum_result, product_result = calculate(5, 10)
print("Sum:", sum_result)  
print("Product:", product_result)

A lambda is a small anonymous function (one expression).

In [None]:
add = lambda a, b: a + b
print(add(2, 3))

Example: with `sorted`

In [None]:
students = [("Ann", 90), ("Ben", 75), ("Cara", 88)]
students_sorted = sorted(students, key=lambda x: x[1])
print(students_sorted)  # sort by score

In [None]:
# Without a lambda, we need to define a separate named function
def get_num(arr):
    return arr[1]

students = [("Ann", 90), ("Ben", 75), ("Cara", 88)]
students_sorted = sorted(students, key=get_num)
print(students_sorted)  # sort by score

Mini exercise: handling missing values

In [None]:
def fill_missing(values, fill_value=0):
    return [fill_value if v is None else v for v in values]

data = [10, None, 5, None, 7]
print(fill_missing(data, fill_value=0))  # [10, 0, 5, 0, 7]

Mini exercise: accuracy score

In [None]:
def accuracy(y_true, y_pred):
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

y_true = [1, 0, 1, 1]
y_pred = [1, 0, 0, 1]
print(accuracy(y_true, y_pred)) 

Mini exercise: Compute Final Grade Average for Each Student (final_grade = 0.4 * midterm + 0.6 * final)

In [None]:
students = [
    ["Alice", [85, 92], "Data Science"],
    ["Bob", [78, 81], "Business"],
    ["Cara", [90, 88], "Data Science"],
    ["David", [65, 70], "Economics"],
    ["Emma", [95, 94], "Data Science"]
]

In [None]:
final_grades = []

for student in students:
    name = student[0]
    midterm = student[1][0]
    final = student[1][1]
    
    grade = 0.4 * midterm + 0.6 * final
    final_grades.append((name, grade))

print(final_grades)

In [None]:
# using list comprehension
final_grades = [
    (student[0], 0.4 * student[1][0] + 0.6 * student[1][1])
    for student in students
]

print(final_grades)

In [None]:
def mean(values: list[float]) -> float:
    if not values:
        raise ValueError("values must be non-empty")
    return sum(values) / len(values)

print(mean([1.0, 2.0, 3.0]))

## Common Methods and Built-in Functions

In [None]:
# String Methods
text = "  Hello Data Science "
print(text.strip())  # Remove leading/trailing whitespace
print(text.lower())  # Convert to lowercase
words = text.strip().lower().split()  # Split into words on any whitespace
print(words)

In [None]:
# List Methods
nums = [3, 1, 4]

nums.append(5)      # add one item
nums.extend([6, 7]) # add multiple
nums.insert(0, 10)  # insert at position
nums.remove(3)      # remove value
nums.pop()          # remove last item
nums.sort()         # sort in place
nums.reverse()      # reverse list
nums.count(4)       # count occurrences
nums.index(1)       # find index

In [None]:
# Dictionary Methods
person = {"name": "Alice", "age": 25}

person.keys()
person.values()
person.items()
person.get("age")         # returns None instead of raising KeyError if the key is missing
person.update({"GPA":3.8})
person.pop("age")
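
The calls above discard their return values, so the difference between `get` and square-bracket indexing is not visible. A minimal sketch, using a key (`"GPA"`) that is deliberately absent:

```python
person = {"name": "Alice", "age": 25}

# Indexing with a missing key raises KeyError
try:
    person["GPA"]
except KeyError as exc:
    print("KeyError:", exc)

# get() returns None (or a default you choose) instead of raising
print(person.get("GPA"))
print(person.get("GPA", 0.0))

# items() yields (key, value) pairs, which is convenient in a loop
for key, value in person.items():
    print(key, "->", value)
```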

In [None]:
# Set Methods
s1 = {1, 2, 3}
s2 = {3, 4, 5}

s1.add(6)
s1.remove(2)
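s1.union(s2)         # elements in either set (returns a new set)
s1.intersection(s2)  # elements in both sets (returns a new set)
s1.difference(s2)    # elements in s1 but not in s2 (returns a new set)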
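
As a sketch of how the three combining methods behave, the example below prints their results, shows the equivalent operators, and confirms that the original set is left unchanged. It also contrasts `remove` with `discard`, which does not raise for a missing element.

```python
s1 = {1, 2, 3}
s2 = {3, 4, 5}

# Each call returns a new set and leaves s1 unchanged
print(s1.union(s2))         # same as s1 | s2
print(s1.intersection(s2))  # same as s1 & s2
print(s1.difference(s2))    # same as s1 - s2
print(s1)

# remove() raises KeyError for a missing element; discard() does not
s1.discard(99)
try:
    s1.remove(99)
except KeyError:
    print("99 is not in the set")
```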

In [None]:
# Important Built-in Functions

nums = [3, 1, 4, 1, 5]
names = ["Alice", "Bob", "Cara"]
scores = [85, 90, 78]

print("len:", len(nums)) # len() — number of elements
print("sum:", sum(nums)) # sum() — sum of elements
print("min:", min(nums)) # min() — smallest value
print("max:", max(nums)) # max() — largest value

# sorted() — returns new sorted list
print("sorted:", sorted(nums))
print("sorted reverse:", sorted(nums, reverse=True))

# round() — round number
pi = 3.14159
print("round:", round(pi, 2)) # round to 2 decimal places
print("type:", type(nums)) # type() — check data type
print("range:", list(range(5))) # range() — generate sequence
print("range with step:", list(range(1, 10, 2)))
print("abs:", abs(-10)) # absolute value

# all() — True if all values True
print("all > 0:", all(n > 0 for n in nums))

# any() — True if any value True
print("any > 4:", any(n > 4 for n in nums))

# zip() — combine sequences
paired = list(zip(names, scores))
print("zip:", paired)

# enumerate() — index + value
for i, name in enumerate(names):
    print("enumerate:", i, name)

# f-string: formatted output, passed to print()
print("Formatted:", f"{names[0]} scored {scores[0]}")


## Error Handling

Python handles errors using `try`/`except` blocks.

In [None]:
try:
    x = 10 / 0
except:  # a bare except catches every error, which can hide bugs
    print("Something went wrong.")

Catch Specific Errors (Best Practice).
Common errors:
- ZeroDivisionError
- ValueError
- TypeError
- IndexError
- KeyError
- FileNotFoundError

In [None]:
try:
    x = 10 / 0
except ZeroDivisionError:
    print("Cannot divide by zero.")

Multiple Except Blocks

In [None]:
try:
    num = int(input("Enter a number: "))
    result = 10 / num
except ValueError:
    print("Invalid input.")
except ZeroDivisionError:
    print("Cannot divide by zero.")
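
The cell above relies on `input()`, so each run exercises only one path. A minimal sketch that wraps the same logic in a function (the name `safe_divide` is hypothetical) makes it possible to try all three cases in one go:

```python
def safe_divide(text):
    """Return 10 / int(text), or None if text is not a usable number."""
    try:
        num = int(text)
        return 10 / num
    except ValueError:
        print(f"Invalid input: {text!r}")
    except ZeroDivisionError:
        print("Cannot divide by zero.")
    return None


# One valid value, one non-number, one zero
for raw in ["5", "abc", "0"]:
    print(raw, "->", safe_divide(raw))
```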

## File I/O

Use `with open(...)` so files close reliably.
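
The cells in this section read a file named `students.csv` that is not included here. The sketch below creates a small stand-in and confirms that the `with` block closes the file. The header names are an assumption inferred from how the lines are parsed later in this section, and the rows reuse students from the grade exercise, so treat the contents as sample data only.

```python
import tempfile
from pathlib import Path

# Hypothetical sample data: header names are assumed
rows = [
    "name,major,midterm,final",
    "Alice,Data Science,85,92",
    "Bob,Business,78,81",
    "Cara,Data Science,90,88",
]

# Written to a temporary directory here; use Path("students.csv") instead
# to create the file next to the notebook
target = Path(tempfile.mkdtemp()) / "students.csv"

with open(target, "w") as f:
    f.write("\n".join(rows) + "\n")

# Leaving the with block closed the file automatically
print("closed after with:", f.closed)

with open(target, "r") as f:
    print(f.read())
```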

In [None]:
# read entire file
with open("students.csv", "r") as f:
    content = f.read()

print(content)


In [None]:
# read line by line
with open("students.csv", "r") as f:
    next(f)  # skip header
    for line in f:
        print(line.strip())

Writing files

In [None]:
# 'a' - append mode
# 'w' - overwrite mode
with open("students.csv", "a") as f:
    f.write("Nicol,Math,89,82\n")

Mini exercise: Count how many students are in the file (ignore the header).

In [None]:
count = 0

with open("students.csv", "r") as f:
    next(f)  # skip header
    
    for line in f:
        count += 1

print("Number of students:", count)

Mini exercise: Count how many students are in each major.

In [None]:
majors = {}

with open("students.csv", "r") as f:
    next(f)  # skip header
    
    for line in f:
        name, major, midterm, final = line.strip().split(",")
        
        if major not in majors:
            majors[major] = 0
        
        majors[major] += 1

print(majors)
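
One possible alternative to the manual dictionary above uses two standard-library tools: `csv.DictReader` to parse rows by header name and `collections.Counter` to do the tallying. This is a sketch, not a replacement for the exercise; it reads from an in-memory string with assumed column names so that it runs without the real file.

```python
import csv
import io
from collections import Counter

# In-memory stand-in for students.csv (hypothetical contents, assumed columns)
sample = io.StringIO(
    "name,major,midterm,final\n"
    "Alice,Data Science,85,92\n"
    "Bob,Business,78,81\n"
    "Cara,Data Science,90,88\n"
)

reader = csv.DictReader(sample)  # uses the header row as dict keys
majors = Counter(row["major"] for row in reader)

print(majors)
print(majors.most_common(1))  # the single most frequent major
```

Unlike `split(",")`, the `csv` module also copes with quoted fields that contain commas.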


## Module
A module is a Python file (`.py`) containing functions, classes, or variables: reusable code in a file. Benefits:
- Code Organization
- Reusability
- Collaboration
- Abstraction (you don't reinvent the wheel)
- Testing and Maintainability
- Performance
- Scalability
- ... and more
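
To make "a module is a Python file" concrete, the sketch below writes a tiny module to a temporary directory and imports it. The module name `grade_utils` and its function are hypothetical; in a real project you would simply save the file next to your notebook or script and skip the `sys.path` step.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a one-function module file (hypothetical name: grade_utils.py)
module_dir = Path(tempfile.mkdtemp())
(module_dir / "grade_utils.py").write_text(
    "def final_grade(midterm, final):\n"
    "    return 0.4 * midterm + 0.6 * final\n"
)

# Make the directory importable, then import the module by its file name
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()
import grade_utils

print(grade_utils.final_grade(85, 92))
print(grade_utils.__file__)
```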

Import the `math` module

In [None]:
import math
print(math.sqrt(16))

In [None]:
# Import specific function
from math import sqrt
print(sqrt(16))

Alias import (very common in DS)

In [None]:
import numpy as np
import pandas as pd

Install modules (packages) using pip. In a notebook, the `%pip` magic installs into the environment of the running kernel:

In [None]:
%pip install numpy

In [None]:
%pip install pandas

In [None]:
# install specific version
# pip install pandas==2.2.0

# upgrade package
# pip install --upgrade pandas

## Python Data Science Modules

Standard Library Modules (ship with Python, no installation needed)

math            → basic math functions
random          → random numbers and sampling
statistics      → mean, median, standard deviation
datetime        → date and time handling
os              → interact with operating system
pathlib         → modern file path handling
sys             → Python interpreter information
json            → read/write JSON data
csv             → read/write CSV files
collections     → Counter, defaultdict, advanced containers
itertools       → efficient looping tools
logging         → structured program logging
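
A brief sketch of three of the modules listed above (`statistics`, `collections`, and `json`), none of which needs installing. The sample names and scores are reused from earlier cells; the grouping is illustrative only.

```python
import json
import statistics
from collections import defaultdict

scores = [85, 90, 78]
print(statistics.mean(scores))
print(statistics.median(scores))

# defaultdict(list) creates an empty list the first time a key is seen
by_major = defaultdict(list)
pairs = [("Alice", "Data Science"), ("Bob", "Business"), ("Cara", "Data Science")]
for name, major in pairs:
    by_major[major].append(name)
print(dict(by_major))

# json.dumps / json.loads round-trip a dict through a JSON string
text = json.dumps(by_major)
print(json.loads(text) == by_major)
```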


Core Data Science Modules 

numpy           → numerical arrays and linear algebra foundation
pandas          → DataFrame data manipulation and cleaning
matplotlib      → base plotting library
seaborn         → statistical visualization built on matplotlib
scipy           → scientific computing tools
scikit-learn    → machine learning (regression, classification, clustering)

Statistics / Econometrics Modules

statsmodels     → statistical modeling and hypothesis testing
pymc            → Bayesian statistical modeling

Data Engineering / Utilities Modules

sqlalchemy      → database connections and ORM
requests        → HTTP/API requests
tqdm            → progress bars
joblib          → model saving and parallel processing

Deep Learning Modules

torch           → PyTorch deep learning framework
tensorflow      → production-oriented deep learning framework
keras           → high-level neural network API
torchvision     → image datasets and models (PyTorch)
torchaudio      → audio tools (PyTorch)

Modern AI / NLP Modules

transformers    → pretrained NLP and LLM models (Hugging Face)
datasets        → NLP datasets library
langchain       → LLM application orchestration


## Python Virtual Environment

WHY?
Avoid package conflicts.
One project = one environment.

### OPTION 1 — Using venv (Built-in Python, isolates Python packages only)

1) Create environment
python -m venv venv

2) Activate
Mac/Linux:
source venv/bin/activate

Windows:
venv\Scripts\activate

You should see:
(venv)

3) Install packages
pip install numpy pandas

4) Check environment
which python      (Mac/Linux)
where python      (Windows)

Or inside Python:
import sys
print(sys.executable)

The printed path should be inside your venv/ directory.

5) Save dependencies
pip freeze > requirements.txt

Reinstall later:
pip install -r requirements.txt

Deactivate when done:
deactivate
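
Besides inspecting `sys.executable` (step 4), a running program can check whether a venv is active by comparing `sys.prefix` with `sys.base_prefix`. This is a minimal sketch; the comparison detects environments created with `venv`, and it should not be assumed to detect Conda environments.

```python
import sys

# In a venv, sys.prefix points at the environment, while
# sys.base_prefix points at the Python installation it was created from
in_venv = sys.prefix != sys.base_prefix

print("interpreter:", sys.executable)
print("inside a venv:", in_venv)
```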

### OPTION 2 — Using Conda (Popular for Data Science, isolates Python + system dependencies (C libraries, MKL, CUDA, etc.))

1) Create environment
conda create -n myenv python=3.11

2) Activate
conda activate myenv

3) Install packages
conda install numpy pandas

(Optional: you can also use pip inside conda)

4) List environments
conda env list

5) Deactivate
conda deactivate

In [None]:
%pip install numpy pandas