# Typing Theory (A Very Rudimentary Guide)

Although the reference guide is written by the authors of the Python language,
the concepts of type theory, including subtype relationships and type safety,
apply across many programming languages, not just Python. These principles have
deep roots in computer science and are essential for understanding static type
checking in languages like Java, C#, TypeScript, and many others.

In Java, for instance, the understanding of types and subtypes is crucial for
working with class hierarchies, interface implementation, and generics. The
concepts of polymorphism, inheritance, and interface implementation in Java all
rely on a clear understanding of type relationships.

These concepts are therefore not confined to Python. They are fundamental to
computer science and software engineering, providing a foundation for
understanding type systems in various programming languages and their role in
software design and architecture.

In [2]:
from typing import List, Literal

## Criterion for Subtype Relationships

Let $\mathcal{T}_1$ and $\mathcal{T}_2$ represent two types. Then
$\mathcal{T}_2$ is a subtype of $\mathcal{T}_1$ (denoted
$\mathcal{T}_2 <: \mathcal{T}_1$) if and only if it satisfies the following two
criteria:

1. **Value Inclusion**:

   $$
   \forall v \in \mathcal{T}_2, v \in \mathcal{T}_1
   $$

   Here, one can read a value $v$ of a type as an instance of that type. We will
   see this in an example later.

2. **Function Applicability**:

   Given a function $f$, if $f$ is applicable to $\mathcal{T}_1$, then it should
   also be applicable to $\mathcal{T}_2$.

   $$
   \forall f, \left(f: \mathcal{T}_1 \rightarrow \dots \right) \Rightarrow \left(f: \mathcal{T}_2 \rightarrow \dots \right)
   $$

   In other words, suppose $\mathcal{T}_1$ supports $N$ operations
   $f_1, f_2, \ldots, f_N$; then $\mathcal{T}_2$ must also support these $N$
   operations, and possibly more.

Since the above defines a mathematical _relation_ between types, we have:

- **Consequences of Subtype Relation**:
  - Reflexivity: $\mathcal{T} <: \mathcal{T}$ for any type $\mathcal{T}$, since
    both criteria hold trivially when $\mathcal{T}_1 = \mathcal{T}_2$.
  - Narrowing Values, Widening Functions: when moving to a subtype, the set of
    values of $\mathcal{T}_2$ is a subset of (or equal to) that of
    $\mathcal{T}_1$, and the set of functions applicable to $\mathcal{T}_2$ is a
    superset of (or equal to) those applicable to $\mathcal{T}_1$.

Using this formal notation, subtype relationships in type theory can be
rigorously defined and reasoned about. The notations $\mathcal{T}_1$ and
$\mathcal{T}_2$ represent the sets of values defining the types, while $f$
represents a function applicable to these types.

Let's look at a few examples.

1. Consider the real number system $\mathbb{R}$. We say that the integers
   (whole numbers) $\mathbb{Z}$ are a **_subtype_** of $\mathbb{R}$ because they
   fulfill both criteria:
   1. For all values $v \in \mathbb{Z}$, it follows that $v \in \mathbb{R}$ as
      well.
   2. Every operation well defined on $\mathbb{R}$ is also applicable to
      $\mathbb{Z}$. For example, addition, subtraction, multiplication and
      division apply to $\mathbb{R}$ and therefore also to $\mathbb{Z}$ (though
      dividing two integers may yield a non-integer real). $\mathbb{Z}$ can also
      support more, for instance the bit-shift operator $<<$, which is not
      defined on $\mathbb{R}$.

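In Python, $\mathbb{R}$ roughly corresponds to the type `float` and
$\mathbb{Z}$ to `int`. Strictly speaking, `int` is not a subclass of `float` at
runtime; instead, PEP 484 specifies that type checkers accept an `int` wherever
a `float` is expected.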
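A small sketch of how this plays out in Python (`shift_left` and `ratio` are illustrative names, not from the original): at runtime an `int` is not a `float` instance, a type checker still accepts an `int` where `float` is annotated, and `<<` works on `int` but not on `float`.

```python
def shift_left(n: int, k: int) -> int:
    # `<<` is defined for int operands
    return n << k


ratio: float = 3  # type checkers accept an int where a float is expected (PEP 484)

print(isinstance(3, float))  # False: at runtime an int is not a float instance
print(shift_left(5, 1))      # 10

try:
    result = 2.0 << 1  # type: ignore
except TypeError as err:
    print("TypeError:", err)  # floats do not support bit shifts
```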

## Safe

If $S$ is a subtype of $T$, the subtyping relation (written as $S <: T$,
$S \sqsubseteq T$,[1] or $S \leq: T$) means that any term of type $S$ can safely
be used in any context where a term of type $T$ is expected.

What does this mean in practice?

In [3]:
old: float = 3.01
new: int = 5
old = new  # Safe because int <: float

And conversely:

In [4]:
old: int = 3
new: float = 3.03
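old = new  # Unsafe because float is not a subtype of int
# A type checker such as mypy rejects this assignment. In a statically typed
# language that allowed it implicitly, the value could be silently truncated
# to 3; Python itself does not truncate, so `old` simply ends up holding 3.03.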

In [6]:
class Animal:
    def describe(self) -> str:
        return str(self.__class__.__name__)
        

class Dog(Animal):
    def bark(self) -> Literal["woof"]:
        return "woof"

generic_animal: Animal = Animal()
generic_dog: Dog = Dog()

In the `Dog` ($\mathcal{T}_2$) and `Animal` ($\mathcal{T}_1$) example, `Dog` is a subtype of `Animal`. 

Why? Because:

1. It fulfills that for all $v \in \mathcal{T}_2$, we have $v \in \mathcal{T}_1$. This means every instance of `Dog` is an instance of `Animal`.

In [10]:
isinstance(generic_dog, Animal), isinstance(generic_dog, Dog), isinstance(generic_animal, Dog)

(True, True, False)

2. It also fulfills that every functionality defined on $\mathcal{T}_1$ (`Animal`), here `describe`, is also applicable to $\mathcal{T}_2$ (`Dog`). This is trivially true through inheritance, since we did not override any existing method with different behaviour.

In [11]:
generic_animal = generic_dog  # Safe because Dog <: Animal

Therefore, it's safe to assign an instance of `Dog` to a variable of type
`Animal`: `Dog` supports every functionality of `Animal` (`describe`) and
possibly more (`bark`), so there won't be any surprises. Conversely, it is
unsafe to assign `generic_animal` to `generic_dog`, because not every `Animal`
is a `Dog`. While every `Dog` instance is an `Animal` (fulfilling the subtype
criteria), the reverse isn't true. An `Animal` instance might lack some
functionalities of a `Dog` (like `bark()`), leading to errors if it is treated
as a `Dog`. This violates the principle that a subtype must support everything
its supertype supports, and possibly more.


This concept is fundamental in static type checking, ensuring type safety by
validating that assignments and function calls do not violate the established
type hierarchy.
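To see the unsafe direction concretely, here is a self-contained sketch that repeats the `Animal`/`Dog` classes from above and adds a hypothetical helper `make_noise`. A function written for `Dog` breaks at runtime when it receives a plain `Animal`, which is exactly the call a static checker would flag.

```python
class Animal:
    def describe(self) -> str:
        return self.__class__.__name__


class Dog(Animal):
    def bark(self) -> str:
        return "woof"


def make_noise(dog: Dog) -> str:
    # Relies on a Dog-only method
    return dog.bark()


print(make_noise(Dog()))  # woof

generic_animal = Animal()
try:
    # A static checker rejects this call; at runtime it fails inside make_noise
    make_noise(generic_animal)  # type: ignore[arg-type]
except AttributeError as err:
    print("AttributeError:", err)  # 'Animal' object has no attribute 'bark'
```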


## Tricky Subtype example

The following example highlights a subtle issue in type theory related to
subtype relationships and the concept of variance in generic types.

1. **Subtyping Condition**: While every `int` is accepted as a `float`, making
   one inclined to think `List[int]` is a subtype of `List[float]`, this is not
   the case. The reason lies in the second condition of subtyping, function
   applicability.

2. **Function Applicability Issue**: `List[int]` and `List[float]` differ in how
   they react to certain operations. You can append a `float` to a
   `List[float]`, but not to a `List[int]`. Therefore, passing `List[int]` where
   `List[float]` is expected can lead to operations that are invalid for
   `List[int]`.

3. **Example Breakdown**: In the code below, `append_pi` adds `3.14` (a
   `float`) to `my_list` (typed as `List[int]`). This operation is invalid for a
   `List[int]`, yet Python performs it at runtime, silently breaking the list's
   type integrity. The subsequent operation `my_list[-1] << 5` expects an
   integer (because of the bitwise shift), so it raises a `TypeError` now that
   `my_list[-1]` is a `float`.

This example underscores the importance of understanding variance (covariance
and contravariance) in generic types and why certain intuitive subtype
relationships might not hold in practice, especially for mutable types like
lists.

It fails the second criterion of subtyping (function applicability) because
`List[int]` does not support every operation that `List[float]` supports.
Specifically, appending a `float` to a `List[int]` is not valid. So although a
`List[int]` could stand in for a `List[float]` if the list were only read (it
contains only numerical values), it cannot be used where operations specific to
`List[float]`, such as appending a `float`, are required. This violation of
function applicability makes `List[int]` not a subtype of `List[float]`; in
other words, `List` is *invariant* in its element type.

In [None]:
def append_pi(lst: List[float]) -> None:
    lst += [3.14]

my_list = [1, 3, 5]  # type: List[int]

append_pi(my_list)   # Naively, this should be safe... (a type checker rejects this call)

my_list[-1] << 5     # ... but this fails: TypeError, since my_list[-1] is now 3.14
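By contrast, `typing.Sequence` exposes only read operations and is declared covariant, so type checkers do accept a `List[int]` where a `Sequence[float]` is expected. A minimal sketch, with `total` as a hypothetical helper:

```python
from typing import List, Sequence


def total(values: Sequence[float]) -> float:
    # Sequence has no append/extend, so the caller's list cannot be corrupted
    return sum(values)


my_list: List[int] = [1, 3, 5]
print(total(my_list))    # 9; accepted because Sequence[int] <: Sequence[float]
print(my_list[-1] << 5)  # 160; the list still holds only ints
```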

## Subtyping Schemes

### Nominal Subtyping

**Definition**: In nominal subtyping, the subtype relationship is explicitly
declared. It is based on the class hierarchy or inheritance structure.

**Example**: In a class-based object-oriented language like Java, a class `Dog`
is a subtype of `Animal` if `Dog` explicitly extends `Animal`.

In [12]:
class Animal:
    def __init__(self, name: str) -> None:
        self.name = name

    def describe(self) -> str:
        return f"This is an animal named {self.name}."

    def speak(self) -> str:
        return "Some generic animal sound."

class Dog(Animal):
    def __init__(self, name: str, breed: str) -> None:
        super().__init__(name)
        self.breed = breed

    def speak(self) -> str:
        return "Woof!"

# Usage
generic_animal = Animal("GenericAnimal")
buddy = Dog("Buddy", "Golden Retriever")

print(generic_animal.describe())  # Outputs: This is an animal named GenericAnimal.
print(buddy.describe())           # Outputs: This is an animal named Buddy.
print(buddy.speak())              # Outputs: Woof!

# Assigning Dog to Animal type
another_animal: Animal = buddy  # This is nominal subtyping
print(another_animal.speak())   # Still outputs: Woof! because another_animal is actually a Dog.

This is an animal named GenericAnimal.
This is an animal named Buddy.
Woof!
Woof!


The assignment of `buddy` (a `Dog` instance) to `another_animal` (typed as
`Animal`) is an example of nominal subtyping because `Dog` is a subclass of
`Animal` through explicit class inheritance. This inheritance is the "nominal"
part – it's based on the names and explicit declarations in the class
definitions. In this example, `Dog` inherits from `Animal`, so it can be used
wherever an `Animal` is expected, adhering to nominal subtyping's principle that
subtype relationships are based on explicit, named inheritance hierarchies.

In [14]:
another_buddy: Dog = generic_animal

But if you do this, a static checker such as mypy will raise an error similar to the following:

```
Incompatible types in assignment (expression has type "Animal", variable has type "Dog")  [assignment]
```

because we are assigning an expression of type `Animal` to the variable
`another_buddy`, which is annotated as `Dog`. This is not allowed in nominal
subtyping because `Dog` is a subclass of `Animal`, not the other way round. The
assignment still runs in Python (annotations are not enforced at runtime), but
it is deemed ***unsafe***, so a static checker like `mypy` reports the
violation.
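When code genuinely needs a `Dog` from an `Animal`-typed value, one safe option is an explicit runtime check, which static checkers such as mypy use to narrow the type (`typing.cast` also exists, but it performs no runtime check). A sketch reusing the nominal `Animal`/`Dog` classes from above, with a hypothetical `breed_of` helper:

```python
class Animal:
    def __init__(self, name: str) -> None:
        self.name = name


class Dog(Animal):
    def __init__(self, name: str, breed: str) -> None:
        super().__init__(name)
        self.breed = breed


def breed_of(animal: Animal) -> str:
    # isinstance narrows `animal` from Animal to Dog inside this branch,
    # so accessing the Dog-only attribute `breed` type-checks
    if isinstance(animal, Dog):
        return animal.breed
    return "unknown"


print(breed_of(Dog("Buddy", "Golden Retriever")))  # Golden Retriever
print(breed_of(Animal("GenericAnimal")))           # unknown
```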

### Structural Subtyping

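**Definition**: Structural subtyping, the static counterpart of "duck typing"
(PEP 544 calls it "static duck typing"), is based on the structure or behavior
of types. A type `A` is a subtype of type `B` if `A` has at least all the
methods and properties of `B`, whether or not `A` explicitly declares any
relationship to `B`.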

**Example**: In a language like Python, if `Bird` and `Airplane` both have a
method `fly()`, they could be considered structurally equivalent for contexts
requiring `fly()`.



In [None]:
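from typing import Literal, Protocol


class Flyable(Protocol):
    def fly(self) -> None: ...


class Bird:
    def fly(self) -> None:
        print("Bird flying")


class Airplane:
    def fly(self) -> None:
        print("Airplane flying")


class Boat:
    """Has no `fly` method, so it does not structurally match `Flyable`."""

    def sail(self) -> None:
        print("Boat sailing")


# Now, try to use Boat in a context requiring Flyable
def can_you_fly(obj: Flyable) -> Literal["Yes I can!", "No I cannot!"]:
    # Runtime check; a static checker already rejects a Boat argument
    if hasattr(obj, "fly"):
        return "Yes I can!"
    return "No I cannot!"


print(can_you_fly(Bird()))      # Yes I can!
print(can_you_fly(Airplane()))  # Yes I can!
print(can_you_fly(Boat()))      # No I cannot! (a type checker rejects this call)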

**Code Illustration**:

```python
def test_flying(obj: Flyable) -> None:
    obj.fly()

test_flying(Bird())     # Works
test_flying(Airplane()) # Also works, although neither class inherits from Flyable
```

In this case, `Bird` and `Airplane` are structurally similar due to the `fly()`
method, satisfying structural subtyping.

**Trade-offs**: Structural subtyping offers flexibility and fits naturally with
Python's dynamic typing, but it can lead to confusion when types are
structurally similar yet conceptually different.
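The standard library's `typing.runtime_checkable` decorator lets `isinstance` test a Protocol structurally at runtime. It only checks that the methods exist (not their signatures), which makes the "conceptually different" caveat easy to see. A sketch; `Rumor` is a deliberately silly, hypothetical class:

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Flyable(Protocol):
    def fly(self) -> str: ...


class Bird:
    def fly(self) -> str:
        return "Bird flying"


class Rumor:
    # Conceptually unrelated to flight, but it happens to have fly()
    def fly(self) -> str:
        return "Rumor spreading"


class Boat:
    def sail(self) -> str:
        return "Boat sailing"


print(isinstance(Bird(), Flyable))   # True
print(isinstance(Rumor(), Flyable))  # True: only the structure is checked
print(isinstance(Boat(), Flyable))   # False: no fly() method
```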

These two approaches offer different perspectives on subtype relationships, each
with their own advantages and complexities in programming language design and
type systems.


## Sentinel Types

[OpenAI's `NotGiven`](https://github.com/openai/openai-python/blob/7367256070a975921ed4430f55d17dc0a9319f21/src/openai/_types.py#L273)

In [1]:
"""
## `NotGiven`

- **Purpose**: Indicates that a parameter was not provided at all. It's used to
  distinguish between a parameter being explicitly set to `None` and not being
  provided.
- **Use Case**: Common in APIs where default behavior is triggered when a
  parameter is not given, but `None` might be a valid, meaningful input. For
  example, `None` might mean "disable timeout", while `NotGiven` means "use a
  default timeout".
- Another use: when the natural default would be a mutable empty list or dict
  (which should not be used as a default argument value) and `None` does not
  make sense, `NotGiven` can serve as the default instead.
- **Behavior**: Functions can check for `NotGiven` to apply default behavior.

## `Omit`

- **Purpose**: Used to explicitly remove or omit a default value that would
  otherwise be applied. It's not just about a value being absent, but rather
  about actively removing a pre-existing default.
- **Use Case**: Useful in situations where the default behavior or value needs
  to be explicitly overridden or disabled, and where `None` is not a suitable
  option. For example, removing a default HTTP header.
- **Behavior**: Functions can check for `Omit` to actively remove or ignore a
  default setting or value.

### Comparison

- **Similarity**: Both are used to signal special cases in the absence of normal
  parameter values.
- **Difference**: `NotGiven` is about the absence of a value where a default may
  apply, while `Omit` is about actively overriding a default.
"""

from __future__ import annotations

from typing import Any, Literal, Type, Union

from typing_extensions import override, TypeAlias


class _NotGiven:
    """
    A sentinel singleton class used to distinguish omitted keyword arguments
    from those passed in with the value None (which may have different behavior).

    Quite similar to `dataclasses.MISSING`.

    This is used to differentiate between cases where a parameter is not
    provided and where a parameter is provided with the value None. The class
    provides a more descriptive representation than None or other placeholders.

    NOTE: another use is when the natural default would be a mutable empty
    list or dict, and `None` does not make sense as the default.

    It is a singleton because `None` is also a singleton, so we mimic this
    behaviour. No matter how many times `None` appears in any function or
    method, it refers to the same unique object.

    More importantly, because `None` is a singleton, we can use the `is`
    operator to check for object identity. This is why the idiomatic way
    to check if a variable is `None` is to do `if var is None`.

    That is why we make `_NotGiven` a singleton: every call to `_NotGiven()`
    returns the same object, so identity checks with `is` keep working across
    modules and imports.

    We further make this class immutable to behave a bit like `None`.

    Example
    -------
    ```python
    def get(timeout: Union[int, NotGiven, None] = NOT_GIVEN) -> Response:
        if timeout is NOT_GIVEN:
            ...  # Default timeout behavior
        elif timeout is None:
            ...  # No timeout
        else:
            ...  # Specific timeout given

    get(timeout=1)     # 1s timeout
    get(timeout=None)  # No timeout
    get()              # Default timeout behavior, which may not be statically
                       # known at the method definition.
    ```
    """

    _instance: _NotGiven | None = None

    def __new__(cls: Type[_NotGiven]) -> _NotGiven:  # noqa: PYI034
        if cls._instance is None:
            cls._instance = super(_NotGiven, cls).__new__(cls)  # noqa: UP008
        return cls._instance

    def __bool__(self) -> Literal[False]:
        """
        This method is used to define the boolean value of an instance of `_NotGiven`.
        By returning `False`, it allows `_NotGiven` to be used in boolean contexts (like
        `if` statements) to signify the absence of a value. This is especially useful
        for checking if an argument was provided or not in a function.
        """
        return False

    @override
    def __repr__(self) -> Literal["_NOT_GIVEN"]:
        return "_NOT_GIVEN"

    def __setattr__(self, key: str, value: Any) -> None:
        raise AttributeError("_NotGiven instances are immutable")

    def __delattr__(self, key: str) -> None:
        raise AttributeError("_NotGiven instances are immutable")


NOT_GIVEN = _NotGiven()
NotGiven: TypeAlias = _NotGiven

In Python, when using the `requests` library to make HTTP requests, the
`timeout` parameter specifies the maximum number of seconds to wait for a
response. If `timeout` is set to `None`, it means that there is no timeout
limit for the request. In other words, the request will wait indefinitely
until the server responds or the connection is closed.

Here, we will use a relatively simple example to illustrate. Consider the
following function `get`, which takes an argument `timeout` that defines how
many seconds to wait before raising a `TimeoutError` (simplified here to just
return the effective timeout). If the user specifies `None`, the program should
have no timeout and therefore wait indefinitely until the server responds.

In [3]:
def get(timeout: int | None = 2) -> int | float:
    if timeout is None:
        actual_timeout = float("inf")
    else:
        actual_timeout = timeout
    return actual_timeout

print(get())
print(get(timeout=2))
print(get(timeout=3))
print(get(timeout=None))

2
2
3
inf


What is the issue here? Not much. But one quirk is that the program has no
elegant way to tell whether the user omitted `timeout` or explicitly passed the
default value.

```python
print(get())
print(get(timeout=2))
```

The above two will yield the same result, because the `timeout` has a default
value of `2`, so when the function is called without specifying `timeout`, it
automatically takes the value of `2` - which is the standard behaviour for
default values.

This approach does not distinguish between a user not providing the argument at
all and a user explicitly setting the argument to its default value.

Why does it matter? Besides expressing user intent explicitly, we may want more
fine-grained control over our program's behaviour. If users pass in their own
values, we may want to check that those values are within bounds (in other
words, legitimate), while the default needs no such check.

The key motivations for using a singleton sentinel class are primarily centered
around distinguishing between different states of function arguments, especially
in the context of **default** values and **optional** arguments.

1. **Differentiating Between 'None', 'Default Values' and 'Not Provided':** In Python, `None` is
   often used as a default value for function arguments. However, there are
   situations where `None` is a meaningful value distinct from the absence of a
   value. The `NotGiven` singleton allows you to differentiate between a user
   explicitly passing `None` (which might have a specific intended behavior) and
   not passing any value at all.
2. **Default Behavior Control:** By using a sentinel like `NotGiven`, we can
   implement a default behavior that is only triggered when an argument is
   **not** provided. This is different from setting a default value in the
   function definition, as it allows the function to check if the user has
   explicitly set the argument, even if it's set to `None`.
3. **Semantic Clarity:** In complex APIs or libraries, using a sentinel value
   can provide clearer semantics. It makes the intention of the code more
   explicit, both for the developer and for users of the API. It indicates that
   thought has been given to the different states an argument can be in, and
   different behaviors are intentionally designed for each state.

In [4]:
def get_with_not_given(timeout: int | NotGiven | None = NOT_GIVEN) -> int | float:
    actual_timeout: Union[int, float]
    if timeout is NOT_GIVEN:
        actual_timeout = 2
    elif timeout is None:
        actual_timeout = float("inf")
    else:
        assert isinstance(timeout, int)
        actual_timeout = timeout
    return actual_timeout

In [5]:
print(get_with_not_given())
print(get_with_not_given(timeout=2))
print(get_with_not_given(timeout=3))
print(get_with_not_given(timeout=None))

2
2
3
inf
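With the omitted case now distinguishable, the validation idea from earlier becomes straightforward: check only values the caller actually supplied. A sketch that swaps in a bare `object()` sentinel (`_UNSET`) as a lighter stand-in for `NOT_GIVEN`; `get_validated`, `DEFAULT_TIMEOUT` and the `MAX_TIMEOUT` bound are hypothetical, for illustration only:

```python
DEFAULT_TIMEOUT = 2
MAX_TIMEOUT = 60  # hypothetical upper bound

_UNSET = object()  # bare-object sentinel, a lightweight stand-in for NOT_GIVEN


def get_validated(timeout: object = _UNSET) -> float:
    if timeout is _UNSET:
        return DEFAULT_TIMEOUT  # argument omitted: trusted default, no checks needed
    if timeout is None:
        return float("inf")  # explicit request for no timeout
    # Only values the caller actually supplied are validated
    if not isinstance(timeout, int) or not 0 < timeout <= MAX_TIMEOUT:
        raise ValueError(f"timeout must be an int in (0, {MAX_TIMEOUT}], got {timeout!r}")
    return timeout


print(get_validated())      # 2
print(get_validated(30))    # 30
print(get_validated(None))  # inf
try:
    get_validated(0)
except ValueError as err:
    print(err)  # timeout must be an int in (0, 60], got 0
```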


### A More Practical Example: Database Query with Optional Filters

Consider a function that constructs a database query. In this scenario, the
function might accept several optional parameters that act as filters for the
query. The use of `NOT_GIVEN` allows us to differentiate between a filter being
intentionally set to `None` (indicating the desire to include records where the
field is `NULL`), a filter being set to a specific value, or the filter not
being used at all.

#### Example: `query_database`

```python
from typing import Union


class NotGiven:
    """Minimal singleton sentinel (a simplified version of `_NotGiven` above)."""
    _instance = None

    def __new__(cls) -> "NotGiven":
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = NotGiven()


def query_database(name_filter: Union[str, None, NotGiven] = NOT_GIVEN,
                   age_filter: Union[int, None, NotGiven] = NOT_GIVEN) -> str:
    query = "SELECT * FROM users"

    where_clauses = []
    # Compare against the sentinel *instance* with `is`, not against the class
    if name_filter is not NOT_GIVEN:
        if name_filter is None:
            where_clauses.append("name IS NULL")
        else:
            # String formatting is for illustration only; real code should use
            # parameterized queries to avoid SQL injection
            where_clauses.append(f"name = '{name_filter}'")

    if age_filter is not NOT_GIVEN:
        if age_filter is None:
            where_clauses.append("age IS NULL")
        else:
            where_clauses.append(f"age = {age_filter}")

    if where_clauses:
        query += " WHERE " + " AND ".join(where_clauses)

    return query


# Usage examples
print(query_database(name_filter="Alice"))       # Filter by name 'Alice'
print(query_database(age_filter=None))           # Filter by age being NULL
print(query_database())                          # No filters applied
print(query_database(name_filter=None, age_filter=30))  # Filter by name being NULL and age 30
```

#### Explanation

-   **Specific Value Provided:** If a specific value is provided (like `"Alice"`
    for `name_filter`), the function includes this in the query as a filter.
-   **`None` Value Provided:** If `None` is passed (like `None` for
    `age_filter`), the function interprets this as a requirement to include
    records where the corresponding field is `NULL`.
-   **No Value Provided (`NOT_GIVEN`):** If no value is provided, the function
    does not include the corresponding filter in the query. This is different
    from filtering where the field is `NULL`.

This example showcases a scenario where the distinction made by `NOT_GIVEN`
significantly alters the behavior of the function, demonstrating its practical
utility in a real-world context.
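The same three-way distinction can be kept while avoiding string-formatted SQL. A sketch against an in-memory `sqlite3` database with made-up sample rows; `build_query` and `run` are hypothetical helpers, and values are bound through `?` placeholders while `None` still maps to `IS NULL`:

```python
import sqlite3
from typing import Any, List, Tuple, Union


class NotGiven:
    """Minimal singleton sentinel, as in the example above."""
    _instance = None

    def __new__(cls) -> "NotGiven":
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance


NOT_GIVEN = NotGiven()


def build_query(name_filter: Union[str, None, NotGiven] = NOT_GIVEN,
                age_filter: Union[int, None, NotGiven] = NOT_GIVEN) -> Tuple[str, List[Any]]:
    clauses: List[str] = []
    params: List[Any] = []
    # Column names are fixed literals here; only values come from the caller
    for column, value in (("name", name_filter), ("age", age_filter)):
        if value is NOT_GIVEN:
            continue  # filter not used at all
        if value is None:
            clauses.append(f"{column} IS NULL")
        else:
            clauses.append(f"{column} = ?")  # placeholder; sqlite3 binds the value
            params.append(value)
    sql = "SELECT name, age FROM users"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("Alice", 30), ("Bob", None), (None, 30)])


def run(query: Tuple[str, List[Any]]) -> List[Tuple[Any, ...]]:
    sql, params = query
    return conn.execute(sql, params).fetchall()


print(run(build_query(name_filter="Alice")))              # [('Alice', 30)]
print(run(build_query(age_filter=None)))                  # [('Bob', None)]
print(run(build_query(name_filter=None, age_filter=30)))  # [(None, 30)]
print(len(run(build_query())))                            # 3 (no filters)
```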


### Threading Example

In [12]:
import time
import threading

def perform_task(timeout: int | None, thread_id: int) -> None:
    # Note: TimeoutError is raised by the caller (`get`), never inside this
    # worker, so there is nothing to catch here.
    print(f"Thread {thread_id}: Starting a task...")
    if timeout is None:
        # Simulate a task that runs indefinitely
        while True:
            time.sleep(1)
    else:
        time.sleep(timeout)
    print(f"Thread {thread_id}: Task completed successfully.")

def get(timeout: int | None = 2) -> None:
    # Id of the *calling* thread, used only to label the log lines
    thread_id = threading.get_ident()
    # daemon=True so an indefinite task does not keep the interpreter alive on exit
    task_thread = threading.Thread(target=perform_task, args=(timeout, thread_id), daemon=True)
    task_thread.start()

    if timeout is not None:
        buffer_time = 0.1
        task_thread.join(timeout + buffer_time)
        if task_thread.is_alive():
            # Python cannot kill a thread; we only stop waiting for it
            print(f"Thread {thread_id}: Raising TimeoutError.")
            raise TimeoutError
    else:
        # No timeout: wait for as long as the task runs (forever, in this demo)
        task_thread.join()


# Usage examples
get(10)   # Task with a 10-second timeout
#time.sleep(3)
get(None) # Task with no timeout (indefinite task); this call never returns


Thread 32224: Starting a task...
Thread 32224: Task completed successfully.
Thread 32224: Starting a task...
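In the demo above, the task sleeps exactly `timeout` seconds while `get` waits `timeout + 0.1`, so the `TimeoutError` branch never fires. A small sketch of the case where it would, using hypothetical `slow_task` and `finished_within` helpers: `Thread.join(timeout)` always returns `None`, so `is_alive()` is the only way to tell whether the wait timed out, and the worker itself keeps running.

```python
import threading
import time


def slow_task(duration: float) -> None:
    time.sleep(duration)  # stand-in for real work


def finished_within(duration: float, timeout: float) -> bool:
    """Return True if a task of `duration` seconds finishes within `timeout` seconds."""
    worker = threading.Thread(target=slow_task, args=(duration,), daemon=True)
    worker.start()
    worker.join(timeout)          # waits at most `timeout` seconds; always returns None
    return not worker.is_alive()  # still alive means the wait timed out


print(finished_within(duration=0.1, timeout=1.0))  # True
print(finished_within(duration=1.0, timeout=0.1))  # False: the worker keeps running in the background
```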



## NotGiven vs Missing

The choice between a sentinel like `MISSING` and one like `NOT_GIVEN` often depends on the specific context and semantics you want to convey in your code. Let's explore the typical use cases for each to see when one is more appropriate than the other.

### `NOT_GIVEN`

- **Typical Use Case:** `NOT_GIVEN` is generally used to represent the absence of a value in scenarios where `None` is a valid and meaningful input. This is particularly relevant in function arguments where you need to distinguish between "no argument provided" and "argument explicitly set to None."

- **Example Context:** Consider a function with an optional parameter where `None` has a specific semantic meaning (like turning off a feature). If you also need a different default behavior when the user does not provide any value, `NOT_GIVEN` can be used to make this distinction.

- **Code Example:**
  ```python
  # NOT_GIVEN is the sentinel instance defined earlier
  def configure(setting=None, flag=NOT_GIVEN):
      if flag is NOT_GIVEN:
          ...  # Apply some default behavior
      elif flag is None:
          ...  # Disable the feature
      else:
          ...  # Use the provided flag value
  ```

### `MISSING`

- **Typical Use Case:** `MISSING` is often used in data structures or configurations to indicate that a value is missing or has not been set. It's particularly useful in contexts like dictionaries, APIs, or data processing where you need to differentiate between a value that is intentionally set to `None` and a value that is not provided at all.

- **Example Context:** In a configuration dictionary where each key is supposed to map to a specific value, `MISSING` could be used to represent keys that have not been assigned a value yet. It signals that the value is expected but not available, which is different from being intentionally set to `None`.

- **Code Example:**
  ```python
  MISSING = object()  # module-level sentinel for this snippet

  config = {
      "timeout": 30,
      "mode": MISSING,  # Indicates that the mode setting is yet to be configured
  }
  if config["mode"] is MISSING:
      ...  # Handle the case where mode is not set
  ```
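The same idea covers dictionary lookups: passing a private sentinel as the default of `dict.get` separates "key absent" from "key present with value `None`". A sketch with a hypothetical `describe_setting` helper and made-up config keys:

```python
from typing import Any, Dict

_MISSING = object()  # private sentinel: cannot collide with any real config value


def describe_setting(config: Dict[str, Any], key: str) -> str:
    value = config.get(key, _MISSING)
    if value is _MISSING:
        return f"{key}: not configured"
    if value is None:
        return f"{key}: explicitly disabled"
    return f"{key}: {value!r}"


config: Dict[str, Any] = {"timeout": 30, "proxy": None}
print(describe_setting(config, "timeout"))  # timeout: 30
print(describe_setting(config, "proxy"))    # proxy: explicitly disabled
print(describe_setting(config, "mode"))     # mode: not configured
```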

### Summary

- Use `NOT_GIVEN` to explicitly indicate that no value has been provided for a parameter, especially when `None` is a valid input with a specific meaning.
- Use `MISSING` to represent an absent or unassigned value in data structures or configurations, where you need to differentiate between an unassigned state and a value explicitly set to `None`.

The choice depends on what you're trying to communicate: `NOT_GIVEN` emphasizes the behavior of function arguments, while `MISSING` emphasizes the state of data or configuration.
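A real instance of the `MISSING` pattern is the standard library's `dataclasses.MISSING`, which `dataclasses` uses to mark fields that have no default. A sketch with a hypothetical `ServerConfig` that uses it to list which fields are required:

```python
from dataclasses import MISSING, dataclass, field, fields
from typing import List


@dataclass
class ServerConfig:
    host: str                                      # no default: required
    port: int = 8080                               # plain default
    tags: List[str] = field(default_factory=list)  # mutable default via a factory


for f in fields(ServerConfig):
    # dataclasses stores its MISSING sentinel when no default was given
    required = f.default is MISSING and f.default_factory is MISSING
    print(f.name, "required" if required else "optional")
# host required
# port optional
# tags optional
```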


## Overload

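A sentinel *type* also combines well with `typing.overload`: `Unsupervised` lets a single `fit` method advertise separate signatures for supervised learning (with targets `y`) and unsupervised learning (without them).

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import TypeVar, Union, overload

T = TypeVar("T")


class Unsupervised:
    def __repr__(self) -> str:
        return "Unsupervised()"

UNSUPERVISED = Unsupervised()

class BaseEstimator(ABC):
    @overload
    def fit(self, X: T, y: T) -> BaseEstimator:
        """Overload for supervised learning."""

    @overload
    def fit(self, X: T, y: Unsupervised = UNSUPERVISED) -> BaseEstimator:
        """Overload for unsupervised learning."""

    @abstractmethod
    def fit(self, X: T, y: Union[T, Unsupervised] = UNSUPERVISED) -> BaseEstimator:
        """
        Fit the model according to the given training data.

        For supervised learning, y should be the target data.
        For unsupervised learning, y should be Unsupervised.
        """

# Example subclass
class MyEstimator(BaseEstimator):
    def fit(self, X: T, y: Union[T, Unsupervised] = UNSUPERVISED) -> BaseEstimator:
        # isinstance (rather than `is`) also narrows the type for checkers
        if isinstance(y, Unsupervised):
            # Unsupervised learning logic
            ...
        else:
            # Supervised learning logic
            ...
        return self
```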

## Covariant?

In the context of generics and `TypeVar` in Python, variance describes how subtype relationships between type arguments carry over to the generic types built from them; covariance means they carry over in the same direction, in a type-safe manner.

To understand covariance, it's important to first grasp the concepts of subtyping and inheritance. In object-oriented programming, a class can inherit from another class, becoming a subtype of it. For instance, if you have a class `Animal` and a class `Dog` that inherits from `Animal`, `Dog` is a subtype of `Animal`.

Now, let's consider generics and `TypeVar`:

- `TypeVar` is a utility for defining generic types in Python. You can create a `TypeVar` with or without constraints, and it can be invariant, covariant, or contravariant.
- A type variable is covariant if it allows a subtype relationship to be transferred from its base types to the constructed types. In simpler terms, if `Dog` is a subtype of `Animal`, and you have a generic type `Container[T]`, then `Container[Dog]` can be considered a subtype of `Container[Animal]` if `T` is defined as covariant.

Here's a basic example:

```python
from typing import TypeVar, Generic

T = TypeVar('T', covariant=True)

class Container(Generic[T]):
    ...
```

In this example, `T` is covariant. This means if you have a function that expects `Container[Animal]`, you can safely pass `Container[Dog]` to it, because `Dog` is a subtype of `Animal`, and due to covariance, `Container[Dog]` is considered a subtype of `Container[Animal]`.
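A slightly fuller sketch of a covariant container (the `Box` class and `describe_box` function are hypothetical). `T_co` appears only in return positions, which is what makes covariance safe; type checkers such as mypy reject a covariant type variable used as a method parameter, such as a `set(self, item: T_co)` method, because that would reintroduce the `List` problem from the tricky example.

```python
from typing import Generic, TypeVar

T_co = TypeVar("T_co", covariant=True)


class Animal:
    def describe(self) -> str:
        return type(self).__name__


class Dog(Animal):
    def bark(self) -> str:
        return "woof"


class Box(Generic[T_co]):
    """Read-only container: T_co is only ever returned, never accepted later."""

    def __init__(self, item: T_co) -> None:
        self._item = item

    def get(self) -> T_co:
        return self._item


def describe_box(box: Box[Animal]) -> str:
    return box.get().describe()


print(describe_box(Box(Dog())))  # Dog; Box[Dog] is accepted as a Box[Animal]
```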

Covariance is particularly useful for return types. If a method in a base class returns a type `T`, and in a derived class this method returns a subtype of `T`, this is a safe and natural use of covariance.

However, covariance has its limitations and isn't suitable for all situations, especially when dealing with method arguments where contravariance might be more appropriate. Proper use of covariance in generics ensures type safety and consistency, adhering to the Liskov Substitution Principle in object-oriented design.
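Contravariance shows up naturally in callables: parameter types of `Callable` are contravariant, so a handler written for any `Animal` is accepted where a handler for `Dog` is expected. A sketch with hypothetical `greet_animal` and `greet_all_dogs` functions:

```python
from typing import Callable, List


class Animal:
    def describe(self) -> str:
        return type(self).__name__


class Dog(Animal):
    def bark(self) -> str:
        return "woof"


def greet_animal(animal: Animal) -> str:
    # Needs only what every Animal has
    return f"Hello, {animal.describe()}"


def greet_all_dogs(dogs: List[Dog], handler: Callable[[Dog], str]) -> List[str]:
    return [handler(d) for d in dogs]


# Callable[[Animal], str] is accepted where Callable[[Dog], str] is expected:
# a handler that copes with any Animal certainly copes with a Dog
print(greet_all_dogs([Dog(), Dog()], greet_animal))  # ['Hello, Dog', 'Hello, Dog']
```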

# Transformer

Let's use the sentence "The cat walks by the bank" to walk through the
self-attention mechanism with analogies and to clarify how it works step by
step.

**Setting the Scene (Embedding the Sentence):** Imagine each word in the
sentence is a person at a party (our tokens). They start by telling a basic fact
about themselves (their initial embedding).

**The Roles:**

-   **Q (Seekers)**: Each person (word) is curious about the stories (contexts)
    of others at the party. They have their own perspective or question (Q
    vector).
-   **K (Holders)**: At the same time, each person has a name tag with keywords
    that describe their story (K vector).
-   **V (Retrievers)**: They also hold a bag of their experiences (V vector),
    ready to share.

**Transformations (Applying W Matrices):** We give each person a set of glasses
(the matrices $W_Q, W_K, W_V$) that changes how they see the world (the space
they project to).

-   With $W_Q$ glasses, they focus on what they want to know from others.
-   With $W_K$ glasses, they highlight their name tag details, making some
    features stand out more.
-   With $W_V$ glasses, they prepare to share the contents of their bag
    effectively.

**Attention (Calculating $QK^\top$):** Now, each person looks around the room
(sequence) with their $W_Q$ glasses and sees the highlighted name tags (after
$W_K$ transformation) of everyone else. They measure how similar their question
is to the others' name tags; this is the dot product $QK^\top$.

For "cat," let’s say it’s curious about the notion of "walking" and "bank." It
will measure the similarity (attention scores) between its curiosity and the
name tags of "walks," "by," "the," "bank."

**Normalization (Softmax):** After measuring, "cat" decides how much to focus on
each story; softmax turns the raw similarity scores into positive weights that
sum to 1. Some stories are very relevant ("walks"), some moderately ("by,"
"the"), and some might be highly relevant depending on context ("bank": is it a
river bank or a financial institution?).

**Retrieval (Applying Attention to V):** Now "cat" decides to listen to the
stories in proportion to its focus. It takes pieces (weighted by attention
scores) from each person's experience bag (V vectors) and combines them into a
richer, contextual understanding of itself in the sentence. This combination
gives us the new representation of "cat," informed by the entire context of the
sentence.
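The steps above can be sketched with plain Python lists (so it runs without NumPy). Two assumptions to flag: the division by $\sqrt{d_k}$ comes from the standard scaled dot-product formulation and is not part of the analogy above, and the embeddings and $W$ matrices are random toy values, so the printed weights carry no linguistic meaning. All function names here are illustrative.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # (n x m) @ (m x p) -> (n x p)
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(X: Matrix, W_q: Matrix, W_k: Matrix, W_v: Matrix) -> Tuple[Matrix, Matrix]:
    Q, K, V = matmul(X, W_q), matmul(X, W_k), matmul(X, W_v)  # put on the "glasses"
    d_k = len(K[0])
    # Q K^T: how well each token's question matches every name tag, scaled by sqrt(d_k)
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(Q, transpose(K))]
    weights = [softmax(row) for row in scores]  # each row sums to 1
    return matmul(weights, V), weights  # weighted mix of the value "bags"


tokens = ["The", "cat", "walks", "by", "the", "bank"]
d_model = 4
rng = random.Random(0)  # random toy embeddings and weights, not trained values


def rand_matrix(rows: int, cols: int) -> Matrix:
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


X = rand_matrix(len(tokens), d_model)
W_q, W_k, W_v = (rand_matrix(d_model, d_model) for _ in range(3))
output, weights = self_attention(X, W_q, W_k, W_v)

cat = tokens.index("cat")
for token, w in zip(tokens, weights[cat]):
    print(f"cat -> {token:>5}: {w:.3f}")
```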

In essence:

-   **Q (Query):** What does "cat" want to know?
-   **K (Key):** Who has relevant information to "cat"’s curiosity?
-   **V (Value):** What stories does "cat" gather from others, and how much does
    it take from each to understand its role in the sentence?

The output of self-attention for "cat" now encapsulates not just "cat" but its
relationship and relevance to "walks," "by," "the," "bank" in a way that no
single word could convey alone. This output then becomes the input to the next
layer, where the process can repeat, enabling the model to develop an even more
nuanced understanding.
