# Phase 4: Object-Oriented Programming
## The architecture of real software

Everything in Python is an object. When you write `name = "customers"`, that string is an object. 
When you call `name.upper()`, you are calling a method on that object. You have been using OOP 
since Phase 1 without knowing it.

Now you will learn to CREATE your own objects. This is how Odibi is built -- every engine, 
every connection, every pattern is a class. By the end of this notebook, you will understand 
how they all fit together.

**This is the most important phase for interviews.** OOP questions come up frequently in 
Python interviews for mid-level roles and above.

---
## Section 1: What is a Class?

A **class** is a blueprint. An **object** (or **instance**) is something built from that blueprint.

Think of it like this:
- Class = the blueprint for a house
- Object = an actual house built from that blueprint
- You can build many houses (objects) from one blueprint (class)

### The `__init__` method and `self`

Most classes define a special method called `__init__` (pronounced "dunder init"). 
It runs automatically when you create a new object. This is where you set up the object's data.

`self` is a reference to the object being created. Every method in a class receives `self` 
as its first parameter. Python passes it automatically -- you just need to include it in the definition.

Think of `self` as the object saying "me" -- `self.name` means "my name".

In [None]:
# Your first class
class Node:
    """Represents a single data pipeline node."""

    def __init__(self, name, source, write_mode="overwrite"):
        # These are INSTANCE ATTRIBUTES -- each Node object has its own copy
        self.name = name
        self.source = source
        self.write_mode = write_mode
        self.row_count = 0
        self.success = False

# Creating objects (instances) from the class
customers = Node("customers", "raw_customers.csv", "upsert")
orders = Node("orders", "raw_orders.csv", "append")

# Each object has its own data
print(f"Node: {customers.name}, Source: {customers.source}")
print(f"Node: {orders.name}, Mode: {orders.write_mode}")

# Modify an attribute
customers.row_count = 1542
customers.success = True
print(f"{customers.name}: {customers.row_count} rows, success={customers.success}")

Notice what just happened:
1. `class Node:` defines the blueprint
2. `Node("customers", "raw_customers.csv", "upsert")` creates an actual object
3. Python automatically calls `__init__` and passes the new object as `self`
4. Each object (customers, orders) has its own separate data
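
Point 4 can be checked directly. The sketch below repeats the `Node` class so the cell runs on its own, then uses the built-in `vars()` to show each object's attribute dictionary. The variable names `first` and `second` are only for illustration.

```python
class Node:
    """Same Node as above, repeated so this cell is self-contained."""

    def __init__(self, name, source, write_mode="overwrite"):
        self.name = name
        self.source = source
        self.write_mode = write_mode
        self.row_count = 0
        self.success = False


first = Node("customers", "raw_customers.csv", "upsert")
second = Node("orders", "raw_orders.csv")  # write_mode falls back to the default

# Changing one object leaves the other untouched
first.row_count = 1542

print(vars(first))
print(vars(second))
print(first is second)  # False: two distinct objects built from one class
```

Because every attribute is assigned on `self` inside `__init__`, nothing is shared between the two objects.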

### Exercise 1.1: Create a Connection class

Create a class called `Connection` with:
- `__init__` that takes: `name` (str), `base_path` (str), `connection_type` (str, default "local")
- Set `self.is_connected = False` in `__init__`
- Create two Connection objects and print their attributes

In [None]:
# Exercise 1.1
# YOUR CODE HERE





# Test:
# local = Connection("local_dev", "./data", "local")
# azure = Connection("prod_azure", "abfss://container@account", "azure_blob")
# print(f"{local.name}: {local.connection_type}, connected={local.is_connected}")
# print(f"{azure.name}: {azure.connection_type}, connected={azure.is_connected}")

---
## Section 2: Methods -- Giving Objects Behavior

An attribute is data an object HAS. A method is something an object DOES.

Methods are just functions defined inside a class. The methods in this notebook are instance methods: they always take `self` as the first parameter.
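
One way to see what "Python passes `self` automatically" means is to write the same call both ways. This is a minimal sketch; `Counter` is a hypothetical class invented for the illustration and is not part of Odibi.

```python
class Counter:
    """Hypothetical example class used only to illustrate `self`."""

    def __init__(self):
        self.count = 0

    def increment(self, step=1):
        self.count += step
        return self.count


c = Counter()

# The usual call: Python fills in `self` with `c`
c.increment()

# The same kind of call written out explicitly: `self` is passed by hand
Counter.increment(c, 5)

print(c.count)  # 6
```

You would rarely write the second form in real code, but it shows that a method is an ordinary function whose first argument is the object.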

In [None]:
class Node:
    """A pipeline node with behavior."""

    def __init__(self, name, source, write_mode="overwrite"):
        self.name = name
        self.source = source
        self.write_mode = write_mode
        self.row_count = 0
        self.success = False
        self.duration = 0.0

    def execute(self, rows):
        """Simulate executing this node."""
        self.row_count = rows
        self.success = True
        print(f"[OK] {self.name}: processed {self.row_count:,} rows")

    def get_summary(self):
        """Return a formatted summary string."""
        status = "SUCCESS" if self.success else "PENDING"
        return f"[{status}] {self.name}: {self.row_count:,} rows ({self.write_mode})"

# Use it
node = Node("customers", "raw_customers.csv", "upsert")
print(node.get_summary())  # [PENDING]

node.execute(1542)
print(node.get_summary())  # [SUCCESS]

### Exercise 2.1: Add methods to Connection

Take your Connection class from Exercise 1.1 and add:
- `connect()` -- sets is_connected to True, prints a message
- `disconnect()` -- sets is_connected to False
- `get_path(table_name)` -- returns base_path + "/" + table_name

In [None]:
# Exercise 2.1
# YOUR CODE HERE









# Test:
# conn = Connection("local_dev", "./data")
# conn.connect()
# print(conn.get_path("customers.csv"))
# conn.disconnect()
# print(f"Connected: {conn.is_connected}")

---
## Section 3: Properties -- Computed Attributes

A **property** makes a method look like an attribute. You access it without `()` 
but it runs a calculation behind the scenes.

Why use properties?
- Computed values that always stay up-to-date
- Validation when setting a value
- Read-only attributes
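
The last two bullets can be sketched as follows. `Batch` is a hypothetical class made up for this illustration; the pattern itself (`@property` plus an optional `@<name>.setter`) is standard Python.

```python
class Batch:
    """Hypothetical class used only to illustrate property setters."""

    def __init__(self, name, size):
        self._name = name   # leading underscore: internal by convention
        self.size = size    # goes through the setter below

    @property
    def name(self):
        """Read-only: no setter is defined, so assignment fails."""
        return self._name

    @property
    def size(self):
        return self._size

    @size.setter
    def size(self, value):
        """Validate before storing."""
        if value < 0:
            raise ValueError("size must be >= 0")
        self._size = value


batch = Batch("customers", 100)
batch.size = 250  # passes validation

try:
    batch.size = -1  # rejected by the setter
except ValueError as e:
    print(f"Rejected: {e}")

try:
    batch.name = "orders"  # no setter defined
except AttributeError as e:
    print(f"Read-only: {e}")
```

After both failed assignments the object still holds its last valid state: `batch.size` is 250 and `batch.name` is `"customers"`.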

Odibi uses `@property` in `context.py` for the `columns` property.

In [None]:
class ValidationResult:
    """Result of a data validation check."""

    def __init__(self, test_name, total_rows, failed_rows=0):
        self.test_name = test_name
        self.total_rows = total_rows
        self.failed_rows = failed_rows

    @property
    def passed_rows(self):
        """Computed: total - failed."""
        return self.total_rows - self.failed_rows

    @property
    def pass_rate(self):
        """Computed: pass rate as a decimal."""
        if self.total_rows == 0:
            return 0.0
        return self.passed_rows / self.total_rows

    @property
    def passed(self):
        """Did the validation pass? (100% pass rate)."""
        return self.failed_rows == 0

# Use it -- notice NO parentheses, it looks like an attribute
result = ValidationResult("not_null", 1000, 23)
print(f"Test: {result.test_name}")
print(f"Pass rate: {result.pass_rate:.1%}")  # 97.7%
print(f"Passed: {result.passed}")  # False (23 failures)

### Exercise 3.1: Engine class with properties

Create an Engine class with:
- `__init__` takes `engine_type` (str)
- `@property engine_name` returns the type capitalized ("pandas" -> "Pandas")
- `@property is_distributed` returns True if engine_type is "spark", False otherwise

In [None]:
# Exercise 3.1
# YOUR CODE HERE






# Test:
# e = Engine("pandas")
# print(f"{e.engine_name}, distributed={e.is_distributed}")  # Pandas, False
# e2 = Engine("spark")
# print(f"{e2.engine_name}, distributed={e2.is_distributed}")  # Spark, True

---
## Section 4: Inheritance

**Inheritance** lets one class get all the attributes and methods of another class.

- The **parent** (or base) class provides shared behavior
- The **child** (or derived) class inherits it and can add or change behavior

This is how Odibi's connection system works:
- `BaseConnection` defines the interface (resolve_path, get_uri)
- `LocalConnection` inherits from it and implements local file system logic
- `AzureConnection` inherits from it and implements Azure blob storage logic

### The `super()` function

When a child class has its own `__init__`, it needs to call the parent's `__init__` too. 
That is what `super().__init__()` does -- it calls the parent's version.
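
To see why the call matters, here is a small sketch of what happens when it is left out. The class names `Base`, `ForgetfulChild`, and `CarefulChild` are hypothetical and exist only for this comparison.

```python
class Base:
    def __init__(self, name):
        self.name = name
        self.is_connected = False


class ForgetfulChild(Base):
    def __init__(self, path):
        # No super().__init__() call: Base.__init__ never runs
        self.path = path


class CarefulChild(Base):
    def __init__(self, path):
        super().__init__(name="careful")  # sets name and is_connected
        self.path = path


good = CarefulChild("./data")
print(good.name, good.is_connected)  # careful False

bad = ForgetfulChild("./data")
try:
    print(bad.name)
except AttributeError as e:
    print(f"Missing attribute: {e}")
```

The child still inherits the parent's methods either way; what it loses without `super().__init__()` is the attributes the parent would have set up.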

In [None]:
# Inheritance example
class BaseConnection:
    """Base class for all connections."""

    def __init__(self, name, base_path):
        self.name = name
        self.base_path = base_path
        self.is_connected = False

    def connect(self):
        self.is_connected = True
        print(f"Connected to {self.name}")

    def resolve_path(self, relative_path):
        """Override this in child classes."""
        return f"{self.base_path}/{relative_path}"


class LocalConnection(BaseConnection):
    """Connection to local filesystem."""

    def __init__(self, base_path="./data"):
        # Call parent's __init__
        super().__init__(name="local", base_path=base_path)

    def resolve_path(self, relative_path):
        """Join base_path and relative_path with a forward slash."""
        return f"{self.base_path}/{relative_path}"


class AzureConnection(BaseConnection):
    """Connection to Azure Blob Storage."""

    def __init__(self, container, account):
        base_path = f"abfss://{container}@{account}.dfs.core.windows.net"
        super().__init__(name="azure", base_path=base_path)
        self.container = container
        self.account = account

    def resolve_path(self, relative_path):
        """Azure paths include the full URI."""
        return f"{self.base_path}/{relative_path}"


# Both types work the same way
local = LocalConnection("./data")
azure = AzureConnection("bronze", "mystorageaccount")

local.connect()
azure.connect()

print(local.resolve_path("customers.parquet"))
print(azure.resolve_path("customers.parquet"))

### Exercise 4.1: Build a class hierarchy

Create:
1. A `BaseTransformer` class with `name` attribute and a `transform(self, data)` method 
   that just returns data unchanged
2. A `RenameTransformer(BaseTransformer)` that takes a `mapping` dict and has a 
   `transform()` that prints which columns would be renamed
3. A `FilterTransformer(BaseTransformer)` that takes a `condition` string and has a 
   `transform()` that prints the filter being applied

In [None]:
# Exercise 4.1
# YOUR CODE HERE













# Test:
# r = RenameTransformer({"cust_id": "customer_id", "nm": "name"})
# r.transform(["cust_id", "nm", "email"])
# f = FilterTransformer("amount > 0")
# f.transform(["id", "amount"])

---
## Section 5: Abstract Base Classes (ABC)

An **Abstract Base Class** is a class that:
1. Cannot be instantiated directly (you cannot create an object from it)
2. Defines methods that child classes MUST implement

This is how you create a **contract**. If you say "every engine must have a `read()` method", 
the ABC enforces that. If someone creates a new engine and forgets `read()`, Python raises a `TypeError` as soon as they try to instantiate it.
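
The enforcement can be sketched in a few lines. `SketchEngine` and `IncompleteEngine` are hypothetical names used only here; they are not Odibi classes.

```python
from abc import ABC, abstractmethod


class SketchEngine(ABC):
    """Hypothetical two-method contract for illustration."""

    @abstractmethod
    def read(self, path):
        ...

    @abstractmethod
    def write(self, data, path):
        ...


class IncompleteEngine(SketchEngine):
    """Implements write() but forgets read()."""

    def write(self, data, path):
        print(f"Writing {len(data)} rows to {path}")


# Defining the class is fine; the error appears on instantiation
try:
    IncompleteEngine()
except TypeError as e:
    print(f"Contract violated: {e}")
```

Note that the check happens when the object is created, not when the class is defined, so a missing method surfaces the first time someone tries to use the incomplete engine.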

This is the **core architecture pattern** of Odibi. Look at `odibi/engine/base.py`:
```python
class Engine(ABC):
    @abstractmethod
    def read(self, connection, format, ...):
        pass

    @abstractmethod
    def write(self, df, connection, format, ...):
        pass
```

Then `PandasEngine(Engine)`, `SparkEngine(Engine)`, and `PolarsEngine(Engine)` each implement 
these methods differently. But any code that uses an Engine can work with ANY of them 
because they all have the same interface.

In [None]:
from abc import ABC, abstractmethod

# Define the contract
class BaseEngine(ABC):
    """Abstract base class for execution engines."""

    @abstractmethod
    def read(self, path, format):
        """Read data from a source. Must be implemented."""
        pass

    @abstractmethod
    def write(self, data, path, format):
        """Write data to a destination. Must be implemented."""
        pass

    @abstractmethod
    def count_rows(self, data):
        """Count rows in a dataset. Must be implemented."""
        pass

    def describe(self):
        """Non-abstract method -- shared by all engines."""
        return f"Engine: {self.__class__.__name__}"


# Try to instantiate the ABC directly -- this FAILS
try:
    engine = BaseEngine()
except TypeError as e:
    print(f"Cannot create BaseEngine: {e}")

In [None]:
from abc import ABC, abstractmethod

# BaseEngine is re-declared in trimmed form (without describe()) so this cell runs on its own
class BaseEngine(ABC):
    @abstractmethod
    def read(self, path, format):
        pass

    @abstractmethod
    def write(self, data, path, format):
        pass

    @abstractmethod
    def count_rows(self, data):
        pass

# Implement the contract with Pandas
class PandasEngine(BaseEngine):
    """Pandas implementation of the engine contract."""

    def read(self, path, format):
        print(f"[Pandas] Reading {format} from {path}")
        return [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]  # Simulated

    def write(self, data, path, format):
        print(f"[Pandas] Writing {len(data)} rows to {path} as {format}")

    def count_rows(self, data):
        return len(data)

# Use it
engine = PandasEngine()
data = engine.read("data/customers.csv", "csv")
print(f"Row count: {engine.count_rows(data)}")
engine.write(data, "output/customers.parquet", "parquet")

### The power of ABC: Write code that works with ANY engine

Because all engines follow the same contract, you can write functions that accept 
any engine without knowing which one it is. This is **polymorphism**.
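
As a self-contained sketch of the idea with two different implementations behind one interface, consider the following. `RowCounter`, `ListCounter`, `DictCounter`, and `summarize` are hypothetical names chosen for the illustration.

```python
from abc import ABC, abstractmethod


class RowCounter(ABC):
    @abstractmethod
    def count_rows(self, data):
        ...


class ListCounter(RowCounter):
    """Counts items in a plain list of rows."""

    def count_rows(self, data):
        return len(data)


class DictCounter(RowCounter):
    """Counts rows in a column-oriented dict such as {"id": [1, 2, 3]}."""

    def count_rows(self, data):
        first_column = next(iter(data.values()), [])
        return len(first_column)


def summarize(counter, data):
    """Works with any RowCounter; never checks the concrete type."""
    return f"{type(counter).__name__}: {counter.count_rows(data)} rows"


print(summarize(ListCounter(), [{"id": 1}, {"id": 2}]))
print(summarize(DictCounter(), {"id": [1, 2, 3], "name": ["a", "b", "c"]}))
```

`summarize` calls `count_rows()` and lets each object decide how to answer, which is the essence of polymorphism.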

In [None]:
# This function works with ANY engine (Pandas, Spark, Polars)
def process_node(engine, source_path, target_path):
    """Process a node using whatever engine is provided."""
    data = engine.read(source_path, "csv")
    print(f"Read {engine.count_rows(data)} rows")
    engine.write(data, target_path, "parquet")

# It does not matter which engine we pass -- they all work
engine = PandasEngine()
process_node(engine, "input/sales.csv", "output/sales.parquet")

### Exercise 5.1: Build a validator ABC

Create:
1. `BaseValidator(ABC)` with an abstract method `validate(self, data)` that returns a list of error strings
2. `NotNullValidator(BaseValidator)` that takes a column name and checks if any values in a list are None
3. `RangeValidator(BaseValidator)` that takes a column name, a min, and a max, and checks if all numbers are between min and max

Test both validators on sample data.

In [None]:
# Exercise 5.1
# YOUR CODE HERE
from abc import ABC, abstractmethod












# Test:
# data_with_nulls = [1, None, 3, None, 5]
# v1 = NotNullValidator("id")
# print(v1.validate(data_with_nulls))
#
# numbers = [10, 200, 30, -5, 50]
# v2 = RangeValidator("amount", 0, 100)
# print(v2.validate(numbers))

---
## Section 6: Dunder (Magic) Methods

Methods with double underscores (like `__init__`) are called "dunder" methods. 
They let your objects work with Python's built-in operations.

| Method | What it enables | Example |
|--------|-----------------|---------|
| `__repr__` | Developer string | `repr(obj)` |
| `__str__` | User string | `str(obj)`, `print(obj)` |
| `__len__` | Length | `len(obj)` |
| `__eq__` | Equality | `obj1 == obj2` |
| `__getitem__` | Indexing | `obj[0]`, `obj['key']` |
| `__iter__` | Iteration | `for x in obj` |
| `__contains__` | Membership | `x in obj` |

In [None]:
class Pipeline:
    """A collection of nodes with Python magic methods."""

    def __init__(self, name):
        self.name = name
        self._nodes = []

    def add_node(self, node_name):
        self._nodes.append(node_name)

    def __repr__(self):
        return f"Pipeline('{self.name}', nodes={len(self._nodes)})"

    def __len__(self):
        return len(self._nodes)

    def __getitem__(self, index):
        return self._nodes[index]

    def __contains__(self, node_name):
        return node_name in self._nodes

    def __iter__(self):
        return iter(self._nodes)

# Use it with Python built-ins
pipe = Pipeline("sales")
pipe.add_node("customers")
pipe.add_node("orders")
pipe.add_node("products")

print(repr(pipe))              # Pipeline('sales', nodes=3)
print(f"Length: {len(pipe)}")   # 3
print(f"First: {pipe[0]}")     # customers
print(f"Has orders: {'orders' in pipe}")  # True

# Iterate like a list!
for node in pipe:
    print(f"  Node: {node}")

### Exercise 6.1: DataBatch class

Create a `DataBatch` class that:
- Takes `name` (str) and `rows` (list of dicts)
- `__len__` returns number of rows
- `__repr__` returns `DataBatch('name', rows=N)`
- `__getitem__` allows indexing individual rows
- `__iter__` allows iterating over rows

In [None]:
# Exercise 6.1
# YOUR CODE HERE










# Test:
# data = [
#     {"id": 1, "name": "Alice"},
#     {"id": 2, "name": "Bob"},
#     {"id": 3, "name": "Charlie"},
# ]
# batch = DataBatch("customers", data)
# print(repr(batch))      # DataBatch('customers', rows=3)
# print(len(batch))       # 3
# print(batch[0])         # {'id': 1, 'name': 'Alice'}
# for row in batch:
#     print(row)

---
## Section 7: Composition vs Inheritance

Two ways to combine classes:

- **Inheritance**: "is-a" -- PandasEngine IS an Engine
- **Composition**: "has-a" -- Pipeline HAS an Engine, HAS Nodes

The rule of thumb: **prefer composition over inheritance** unless the "is-a" relationship 
is genuinely true.

Odibi uses both:
- Inheritance: Engine ABC -> PandasEngine, SparkEngine, PolarsEngine
- Composition: Pipeline HAS an Engine, HAS a list of Nodes, HAS a Context
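
One practical benefit of composition is that a component can be swapped without touching the class that holds it. The sketch below uses hypothetical classes (`UpperFormatter`, `TitleFormatter`, `Report`) that are unrelated to Odibi.

```python
class UpperFormatter:
    def format(self, text):
        return text.upper()


class TitleFormatter:
    def format(self, text):
        return text.title()


class Report:
    """HAS a formatter; it does not inherit from one."""

    def __init__(self, formatter):
        self.formatter = formatter

    def render(self, text):
        return self.formatter.format(text)


doc = Report(UpperFormatter())
print(doc.render("sales pipeline"))  # SALES PIPELINE

# Swap the component at runtime; Report itself is unchanged
doc.formatter = TitleFormatter()
print(doc.render("sales pipeline"))  # Sales Pipeline
```

With inheritance, the same change would mean creating a new subclass of `Report` for every formatter.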

In [None]:
# Composition example -- Pipeline is composed of other objects
class SimpleEngine:
    def process(self, data):
        return f"Processed {len(data)} items"

class SimpleNode:
    def __init__(self, name, source):
        self.name = name
        self.source = source

class Pipeline:
    """Pipeline is COMPOSED of an engine and nodes."""

    def __init__(self, name, engine):
        self.name = name
        self.engine = engine      # HAS an engine
        self.nodes = []           # HAS nodes

    def add_node(self, node):
        self.nodes.append(node)

    def run(self):
        for node in self.nodes:
            result = self.engine.process([1, 2, 3])  # Use the engine
            print(f"  {node.name}: {result}")

# Build a pipeline from components
engine = SimpleEngine()
pipe = Pipeline("sales", engine)
pipe.add_node(SimpleNode("customers", "data/customers.csv"))
pipe.add_node(SimpleNode("orders", "data/orders.csv"))
pipe.run()

---
## Section 8: Interview Drill


### Drill 1: Shape hierarchy

Create:
- `Shape(ABC)` with abstract method `area()`
- `Circle(Shape)` with radius, implement area (pi * r^2)
- `Rectangle(Shape)` with width and height, implement area

Print the area of each.

In [None]:
# Drill 1
# YOUR CODE HERE
import math
from abc import ABC, abstractmethod


### Drill 2: What is the difference between a class and an instance?
Write your answer as a comment, then demonstrate with code.

In [None]:
# Drill 2
# YOUR ANSWER AND CODE HERE


### Drill 3: When would you use @property vs a regular method?
Write your answer and give an example.

In [None]:
# Drill 3
# YOUR ANSWER AND CODE HERE


### Drill 4: Explain composition vs inheritance
When would you use each? Give a real-world example.

In [None]:
# Drill 4
# YOUR ANSWER AND CODE HERE


---
## Checkpoint

You now understand Object-Oriented Programming in Python:

- Classes and instances (`__init__`, `self`)
- Methods (instance methods that take `self`)
- Properties (@property for computed attributes)
- Inheritance (parent -> child, super())
- Abstract Base Classes (ABC, @abstractmethod)
- Dunder methods (`__repr__`, `__len__`, `__getitem__`, `__contains__`, `__iter__`)
- Composition vs Inheritance

This is the architecture of Odibi. When you see `class PandasEngine(Engine)` in the real code, 
you now understand exactly what that means and why it is designed that way.

**Next:** Phase 5 -- Decorators, Generators, and Advanced Patterns.