### Slots

`__slots__` restricts instances of a class to a fixed set of attributes, which removes the per-instance `__dict__` and the memory overhead that comes with it. The trade-off is that attributes not listed in `__slots__` can no longer be added at runtime. In a data engineering context, where you often create many instances of small custom classes, `__slots__` can reduce the memory footprint considerably and may also speed up attribute access during transformations.

In [5]:
from pympler import asizeof  # third-party package: pip install pympler

class WithoutSlots:
    def __init__(self, name, age):
        self.name = name
        self.age = age

class WithSlots:
    __slots__ = ['name', 'age']
    def __init__(self, name, age):
        self.name = name
        self.age = age

obj1 = WithoutSlots('Alice', 30)
obj2 = WithSlots('Bob', 40)

print(f"Size without slots: {asizeof.asizeof(obj1)} bytes")
print(f"Size with slots: {asizeof.asizeof(obj2)} bytes")
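
Besides the size difference, the fixed attribute set changes how instances behave. The following is a minimal sketch using only the standard library; `SlottedRecord` and the `email` attribute are hypothetical names chosen for illustration.

```python
class SlottedRecord:
    """Hypothetical record class used only for illustration."""
    __slots__ = ("name", "age")

    def __init__(self, name, age):
        self.name = name
        self.age = age


record = SlottedRecord("Alice", 30)

# Slotted instances do not carry a per-instance __dict__.
has_dict = hasattr(record, "__dict__")

# Assigning an attribute that is not listed in __slots__ raises AttributeError.
try:
    record.email = "alice@example.com"
    assignment_failed = False
except AttributeError:
    assignment_failed = True

print(has_dict, assignment_failed)
```

If a class needs both compact storage and occasional ad hoc attributes, one option is to add `"__dict__"` to `__slots__`, although that gives back part of the memory saving.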


### Generator Expressions

Generator expressions provide a memory-efficient way to handle large data sets by yielding items one at a time, instead of loading all into memory. In data engineering, this is useful for streaming and transforming large data files or query results.

In [8]:
gen_exp = (x**2 for x in range(10))
for val in gen_exp:
    print(val) 
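
To make the memory argument concrete, the sketch below compares a list comprehension with the equivalent generator expression. It assumes `sys.getsizeof` is an acceptable rough measure; it reports only the size of the container object itself, not of the integers it refers to.

```python
import sys

n = 100_000
squares_list = [x ** 2 for x in range(n)]  # materializes every value up front
squares_gen = (x ** 2 for x in range(n))   # produces values only on demand

list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)

# Aggregations can consume the generator directly, one item at a time.
total = sum(squares_gen)

# A generator can be consumed only once; afterwards it is exhausted.
leftover = list(squares_gen)

print(list_size > gen_size, leftover)
```

Because a generator is single-use, code that needs several passes over the data has to recreate it or fall back to a list.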

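### Walrus Operator

The walrus operator (`:=`), introduced in Python 3.8, assigns a value to a variable as part of an expression, so the value can be bound and used in one step. In data engineering tasks such as filtering or transformation, this avoids computing the same value twice. Note that a name bound with `:=` inside a comprehension belongs to the enclosing scope, which is why `square` in the cells below ends up holding the last value computed.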

In [29]:
square = 0

In [30]:
filtered_data = [(n, square := n ** 2) for n in range(10)]
filtered_data

In [31]:
square
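
The saving shows up most clearly when a computed value is needed both in a filter condition and in the output. This is a minimal sketch; `expensive_score` is a hypothetical stand-in for a costly computation, and `call_log` exists only to count how often it runs.

```python
call_log = []  # records every call so the two approaches can be compared


def expensive_score(n):
    """Hypothetical stand-in for a costly computation."""
    call_log.append(n)
    return n ** 2 - 20


values = range(10)

# Without the walrus operator the function runs again for every kept item.
twice = [expensive_score(n) for n in values if expensive_score(n) > 0]
calls_without_walrus = len(call_log)

call_log.clear()

# With it, the score is computed once in the condition and reused.
once = [score for n in values if (score := expensive_score(n)) > 0]
calls_with_walrus = len(call_log)

print(once, calls_without_walrus, calls_with_walrus)
```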

### Custom Context Managers

Custom context managers help in abstracting setup and teardown activities, making the code more readable and maintainable. In data engineering, they can manage database connections, temporary files, or other resources efficiently.

In [34]:
from contextlib import contextmanager

@contextmanager
def managed_resource():
    print("Setup")
    try:
        yield
    finally:
        # Runs even if the body of the with block raises.
        print("Teardown")

with managed_resource():
    print("Do work")  # Setup -> Do work -> Teardown
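
Teardown should run even when the body of the `with` block raises, which is what wrapping the `yield` in `try`/`finally` guarantees. The sketch below illustrates this with a simulated failure; `safe_resource`, `staging_table`, and the `events` list are hypothetical names used only to record the order of operations.

```python
from contextlib import contextmanager

events = []  # records the order in which each step runs


@contextmanager
def safe_resource(name):
    """Hypothetical resource manager that always cleans up."""
    events.append(f"open {name}")
    try:
        yield name  # the value bound by "as" in the with statement
    finally:
        events.append(f"close {name}")


try:
    with safe_resource("staging_table") as resource:
        events.append(f"use {resource}")
        raise ValueError("simulated failure")
except ValueError:
    events.append("error handled")

print(events)
```

The cleanup step is recorded before the exception reaches the outer handler, which is the behavior one would want for database connections or temporary files.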

### `__repr__` Method

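The `__repr__` method returns the "official" string representation of an object. By convention it should, where possible, look like a valid Python expression that recreates an object with the same state when passed to `eval()`; when that is not practical, an unambiguous description is the usual fallback. It is mainly intended for debugging and development.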

In [35]:

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f"Point({self.x}, {self.y})"

p = Point(2, 3)
print(repr(p))
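
The `eval()` round trip only works if each field is itself rendered with `repr()`, which matters as soon as a field holds a string. The following is a minimal sketch under that assumption; `Label` and its fields are hypothetical, and `eval()` should only ever be applied to trusted strings.

```python
class Label:
    """Hypothetical class used to illustrate an eval()-friendly __repr__."""

    def __init__(self, text, weight):
        self.text = text
        self.weight = weight

    def __repr__(self):
        # !r applies repr() to each field, so strings keep their quotes.
        return f"{type(self).__name__}({self.text!r}, {self.weight!r})"

    def __eq__(self, other):
        if not isinstance(other, Label):
            return NotImplemented
        return (self.text, self.weight) == (other.text, other.weight)


original = Label("north", 2.5)
text = repr(original)
rebuilt = eval(text)  # acceptable here only because the string is trusted

print(text, rebuilt == original)
```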

### `__str__` Method

The `__str__` method returns a string that provides an "informal" or nicely printable representation of the object. This makes the object's printout more human-readable.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f"Point({self.x}, {self.y})"

    def __str__(self):
        return f"A point at ({self.x}, {self.y})"

p = Point(2, 3)
print(str(p))  # A point at (2, 3)
```

### `defaultdict`

`defaultdict` is a subclass of Python's `dict` that takes a factory function (such as `list` or `int`). When a missing key is looked up with `dd[key]`, the factory is called and its result is stored under that key and returned. In data engineering, this is useful for building frequency counters, group-by operations, or adjacency lists, where the keys are not known in advance.

In [40]:
from collections import defaultdict

dd = defaultdict(list)
dd['key1'].append(1)
dd['key2'].append(2)
print(dd)
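
The frequency-counter and group-by uses mentioned above can be sketched as follows. The `records` data and the variable names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical (source, action) pairs standing in for real event data.
records = [("web", "click"), ("app", "view"), ("web", "view"), ("web", "click")]

counts = defaultdict(int)       # missing keys start at 0
by_source = defaultdict(list)   # missing keys start as an empty list

for source, action in records:
    counts[action] += 1
    by_source[source].append(action)

# Indexing a missing key inserts it; .get() does not.
before = "purchase" in counts
counts.get("purchase")
after_get = "purchase" in counts
counts["purchase"]
after_index = "purchase" in counts

print(dict(counts), dict(by_source))
```

The last few lines show a common surprise: merely reading `counts["purchase"]` adds the key, so `.get()` or `in` is the safer choice when a lookup should leave the dictionary unchanged.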

### Decorators

Decorators modify or extend the behavior of functions or methods without changing their code. In data engineering, decorators can be used for logging, memoization, or access control.

In [42]:
import functools

def logger(func):
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        print(f"Calling {func.__name__}")
        return func(*args, **kwargs)
    return wrapper

@logger
def add(a, b):
    return a + b

add(1, 2)
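
Memoization, one of the uses mentioned above, follows the same wrapper pattern. This is a simplified sketch that assumes positional, hashable arguments only; `memoize`, `slow_square`, and `call_log` are hypothetical names, and in practice the standard library's `functools.lru_cache` covers this use case.

```python
import functools


def memoize(func):
    """Cache results by positional arguments (a simplified sketch)."""
    cache = {}

    @functools.wraps(func)  # preserve the original name and docstring
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]

    return wrapper


call_log = []  # records each time the undecorated function actually runs


@memoize
def slow_square(n):
    """Hypothetical stand-in for an expensive computation."""
    call_log.append(n)
    return n * n


first = slow_square(4)
second = slow_square(4)  # served from the cache, so the function is not re-run

print(first, second, len(call_log))
```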

### Decorators with Arguments

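A decorator that takes arguments is really a function that returns a decorator: the outer call receives the configuration, and the decorator it returns wraps the target function. This extra layer makes decorators more flexible, which is useful in data engineering for setting behaviors such as caching policies or operation modes at definition time.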

In [44]:
def multiplier(factor):
    def decorator(func):
        def wrapper(*args, **kwargs):
            return func(*args, **kwargs) * factor
        return wrapper
    return decorator

@multiplier(4)
def add(a, b):
    return a + b

print(add(1, 2))  # (1 + 2) * 4 = 12
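
A retry policy is one way such configuration might be used in a pipeline. The sketch below is illustrative rather than production-ready (it has no backoff or logging); `retry`, `flaky_fetch`, and `attempts` are hypothetical names.

```python
import functools


def retry(max_attempts, exceptions=(Exception,)):
    """Return a decorator that retries the function up to max_attempts times."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts:
                        raise  # out of attempts: propagate the last error
        return wrapper
    return decorator


attempts = []  # records each call to the undecorated function


@retry(max_attempts=3, exceptions=(ConnectionError,))
def flaky_fetch():
    """Hypothetical call that fails twice before succeeding."""
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("temporary failure")
    return "payload"


result = flaky_fetch()
print(result, len(attempts))
```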

### Data Classes

Data classes in Python automatically generate special methods like `__init__`, `__repr__`, and `__eq__`. They make it easier to create classes for storing data. In data engineering, this simplifies the definition of complex data structures.

In [46]:
from dataclasses import dataclass

@dataclass
class DataPoint:
    x: int
    y: int

dp = DataPoint(1, 2)
print(dp)  # DataPoint(x=1, y=2)
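
The generated methods and a few commonly used options can be sketched as follows. `Reading` and its fields are hypothetical; `frozen=True`, `field(default_factory=...)`, and `asdict` are part of the standard `dataclasses` module.

```python
from dataclasses import FrozenInstanceError, asdict, dataclass, field


@dataclass(frozen=True)
class Reading:
    """Hypothetical immutable record for a sensor reading."""
    sensor: str
    value: float
    # default_factory gives each instance its own list.
    tags: list = field(default_factory=list)


a = Reading("s1", 20.5)
b = Reading("s1", 20.5)

same = a == b          # the generated __eq__ compares field by field
as_dict = asdict(a)    # convenient for serialization

# frozen=True makes attribute assignment raise an error.
try:
    a.value = 0.0
    mutated = True
except FrozenInstanceError:
    mutated = False

print(a, same, mutated)
```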

### Abstract Base Classes (ABCs)

Abstract Base Classes declare methods and properties that every concrete subclass must implement. A subclass that leaves any abstract method unimplemented cannot be instantiated; Python raises a `TypeError` instead. This lets you set up a blueprint for other classes and ensures a consistent interface. In data engineering, ABCs can be used to define interfaces for plug-and-play components in a pipeline.

In [None]:
from abc import ABC, abstractmethod

class DataProcessor(ABC):
    @abstractmethod
    def process(self, data):
        """Transform or consume one unit of data."""

class MyProcessor(DataProcessor):
    # Every abstract method must be overridden before the class can be instantiated.
    def process(self, data):
        print(f"Processing {data}")

processor = MyProcessor()
processor.process("some data")  # Output: Processing some data
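
The contract is enforced at instantiation time, not when the subclass is defined. This minimal sketch shows both outcomes; `Extractor`, `IncompleteExtractor`, and `ListExtractor` are hypothetical names for illustration.

```python
from abc import ABC, abstractmethod


class Extractor(ABC):
    """Hypothetical interface for a pipeline source."""

    @abstractmethod
    def extract(self):
        """Return a list of records."""


class IncompleteExtractor(Extractor):
    pass  # does not override extract()


class ListExtractor(Extractor):
    def __init__(self, rows):
        self.rows = rows

    def extract(self):
        return list(self.rows)


# Defining IncompleteExtractor is allowed; instantiating it is not.
try:
    IncompleteExtractor()
    error_message = None
except TypeError as exc:
    error_message = str(exc)

rows = ListExtractor([1, 2, 3]).extract()
print(error_message, rows)
```
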

Data classes make it easier to manage data in a structured form, while ABCs ensure that certain classes adhere to a specific contract, making your data engineering pipelines more modular and easier to understand.