# OpenDP Framework Deep Dive

This notebook is an extension of `basic_data_analysis.ipynb`.
We'll be going into greater detail on the internals of the OpenDP framework.

Great! We've successfully transformed the income column of the CSV data into an integer vector.

### Transformation Structure
The general approach is that we can build up lengthy computation chains out of isolated transformations.
Each constituent transformation represents an isolated unit of computation with provable stability properties.

The following snippet shows the definition of a `Transformation` in the core Rust library.
```rust
pub struct Transformation<DI: Domain, DO: Domain, MI: Metric, MO: Metric> {
    pub input_domain: DI,
    pub output_domain: DO,
    pub function: Function<DI, DO>,
    pub input_metric: MI,
    pub output_metric: MO,
    pub stability_relation: StabilityRelation<MI, MO>,
}
```

Let's go through each of the struct members.
```rust
    ...
    pub input_domain: DI,
    pub output_domain: DO,
    ...
```
The input and output domains strictly define the permissible input and output values.
When you chain two transformations, the output domain of the first transformation must match the input domain of the second.
The resulting chained transformation has the input domain of the first transformation, the output domain of the second, and the composition of the two functions.
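To make domain matching concrete, here is a minimal Python sketch. `AllDomain`, `VectorDomain`, and `can_chain` are hypothetical, simplified stand-ins, not OpenDP's Python API: a domain is modeled as a membership test, and chaining is allowed only when the two domains compare equal.

```python
from dataclasses import dataclass


# Hypothetical, simplified stand-ins for the Rust AllDomain and VectorDomain.
@dataclass(frozen=True)
class AllDomain:
    atom_type: type

    def member(self, value) -> bool:
        # Every value of the atomic type is permitted.
        return isinstance(value, self.atom_type)


@dataclass(frozen=True)
class VectorDomain:
    element_domain: AllDomain

    def member(self, value) -> bool:
        # A list is permitted if all of its elements lie in the element domain.
        return isinstance(value, list) and all(self.element_domain.member(v) for v in value)


def can_chain(output_domain_1, input_domain_2) -> bool:
    # Chaining requires the domains to match exactly.
    return output_domain_1 == input_domain_2


str_vec = VectorDomain(AllDomain(str))
int_vec = VectorDomain(AllDomain(int))

print(int_vec.member([0, 0, 2, 456]))  # True
print(int_vec.member(["2"]))           # False
print(can_chain(int_vec, int_vec))     # True
print(can_chain(str_vec, int_vec))     # False
```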

```rust
    ...
    pub function: Function<DI, DO>,
    ...
```
When we invoke the following transformation, the Python data structure is translated into a low-level C representation, the Rust `function` is evaluated, and the result is shipped back out as familiar Python data structures.

```python
cast_str_int(["null", "1.", "2", "456"])
# [0, 0, 2, 456]
```
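The cast's behavior on this input can be approximated in pure Python. `cast_str_int_sketch` is a hypothetical illustration, not the library function: it parses each record independently and falls back to the type's default value (`0`) when parsing fails. It assumes the Rust parser rejects `"null"` and `"1."` the same way Python's `int()` does; the two parsers differ in edge cases (Python accepts surrounding whitespace, for instance).

```python
def cast_str_int_sketch(data: list[str]) -> list[int]:
    """Hypothetical pure-Python analogue of the cast: each record is parsed
    on its own, and records that fail to parse become int's default, 0."""
    out = []
    for v in data:
        try:
            out.append(int(v))
        except ValueError:
            # "null" and "1." are not valid integer literals.
            out.append(int())  # int() == 0, the default value for the type
    return out


print(cast_str_int_sketch(["null", "1.", "2", "456"]))  # [0, 0, 2, 456]
```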

We also have input and output metrics.
```rust
    ...
    pub input_metric: MI,
    pub output_metric: MO,
    ...
```
Examples of metrics are `HammingDistance`, `SymmetricDistance`, `AbsoluteDistance` and `L1Distance`.
When chaining, they behave just like the domains: the output metric of the first transformation must match the input metric of the second.
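For intuition, here are textbook definitions of these four distances as plain Python functions. In the library the metrics are types whose exact semantics are fixed by OpenDP's documentation, which may differ in details; these functions are hypothetical illustrations only.

```python
from collections import Counter


def symmetric_distance(x: list, y: list) -> int:
    """Records that must be added or removed to turn x into y
    (the size of the multiset symmetric difference)."""
    cx, cy = Counter(x), Counter(y)
    return sum(((cx - cy) + (cy - cx)).values())


def hamming_distance(x: list, y: list) -> int:
    """Number of positions at which two equal-length vectors differ."""
    if len(x) != len(y):
        raise ValueError("Hamming distance requires vectors of equal length")
    return sum(a != b for a, b in zip(x, y))


def absolute_distance(a: float, b: float) -> float:
    """Distance between two scalars, such as two aggregate outputs."""
    return abs(a - b)


def l1_distance(x: list, y: list) -> float:
    """Sum of absolute element-wise differences between equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("L1 distance requires vectors of equal length")
    return sum(abs(a - b) for a, b in zip(x, y))


print(symmetric_distance([1, 2, 3], [1, 2, 4]))  # 2: remove 3, add 4
print(hamming_distance([1, 2, 3], [1, 2, 4]))    # 1: one position changed
print(absolute_distance(10, 7))                  # 3
print(l1_distance([1.0, 2.0], [1.5, 0.0]))       # 2.5
```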

```rust
    ...
    pub stability_relation: StabilityRelation<MI, MO>,
    ...
```
Finally, we have the stability relation.
It is a function that takes an input distance and an output distance, each in its respective metric space, and returns a boolean.
If it returns True for (`d_in`, `d_out`), then any two inputs within `d_in` of each other are mapped to outputs within `d_out` of each other; it returns False if the output distance is too small with respect to the input distance.

For example, the casting transformation operates row by row: adding or removing one input record adds or removes exactly one output record. So for any symmetric distance `a`, we should expect the pair of distances (`a`, `a`) to be related.

```python
a = 3
cast_str_int.check(d_in=a, d_out=a)      # True
cast_str_int.check(d_in=a, d_out=a - 1)  # False
```
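The intuition behind the (`a`, `a`) pair can be checked empirically. This sketch uses a hypothetical pure-Python stand-in for the cast (not the library function) and the multiset definition of symmetric distance. It measures the distance between two datasets before and after casting: the output distance never exceeds the input distance, and it can be smaller when distinct invalid strings collapse to the same default value.

```python
from collections import Counter


def symmetric_distance(x, y):
    # Size of the multiset symmetric difference between two datasets.
    cx, cy = Counter(x), Counter(y)
    return sum(((cx - cy) + (cy - cx)).values())


def cast_str_int_sketch(data):
    # Hypothetical row-by-row cast; unparseable records become 0.
    def cast_one(v):
        try:
            return int(v)
        except ValueError:
            return 0
    return [cast_one(v) for v in data]


x = ["1", "2", "null", "4"]
x_prime = ["1", "2", "7", "8", "9"]  # remove "null" and "4", add "7", "8", "9"

d_in = symmetric_distance(x, x_prime)
d_out = symmetric_distance(cast_str_int_sketch(x), cast_str_int_sketch(x_prime))
print(d_in, d_out)  # 5 5

# Distinct invalid strings both cast to 0, so the output distance shrinks.
print(symmetric_distance(["null"], ["n/a"]),
      symmetric_distance(cast_str_int_sketch(["null"]), cast_str_int_sketch(["n/a"])))  # 2 0
```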

When any two compatible transformations are chained, the resulting transformation contains a functional composition of the relations.

Ultimately, all pieces are used to construct the new transformation:

| Taken from transformation 1 | Chaining requirement / operation | Taken from transformation 2 |
|---:|:---:|:---|
| input_domain_1 | output_domain_1 == input_domain_2 | output_domain_2 |
| function_1 | composed with | function_2 |
| input_metric_1 | output_metric_1 == input_metric_2 | output_metric_2 |
| stability_relation_1 | composed with | stability_relation_2 |
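The table can be sketched in Python under a simplifying assumption: every relation here is `c`-stable, so each transformation carries a single stability constant instead of a general relation. If the first transformation is `c1`-stable and the second is `c2`-stable, the chain is `(c1 * c2)`-stable. The `Transformation` dataclass, `chain`, the string domain/metric labels, and the clamping `bounded_sum` are all hypothetical; composing arbitrary relations in the library is more involved.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Transformation:
    """Hypothetical simplified transformation with a c-stable relation."""
    input_domain: str
    output_domain: str
    function: Callable[[Any], Any]
    input_metric: str
    output_metric: str
    stability_constant: float

    def check(self, d_in, d_out) -> bool:
        # c-stable relation: (d_in, d_out) are related when d_out >= c * d_in.
        return d_out >= self.stability_constant * d_in


def chain(t1: Transformation, t2: Transformation) -> Transformation:
    if t1.output_domain != t2.input_domain:
        raise ValueError("output_domain_1 must equal input_domain_2")
    if t1.output_metric != t2.input_metric:
        raise ValueError("output_metric_1 must equal input_metric_2")
    return Transformation(
        input_domain=t1.input_domain,
        output_domain=t2.output_domain,
        # Apply t1 first, then t2.
        function=lambda arg: t2.function(t1.function(arg)),
        input_metric=t1.input_metric,
        output_metric=t2.output_metric,
        # c1-stable followed by c2-stable is (c1 * c2)-stable.
        stability_constant=t1.stability_constant * t2.stability_constant,
    )


def _cast_str_int(data):
    # Row-by-row cast; unparseable records become 0.
    out = []
    for v in data:
        try:
            out.append(int(v))
        except ValueError:
            out.append(0)
    return out


cast = Transformation("VectorDomain<AllDomain<String>>", "VectorDomain<AllDomain<i32>>",
                      _cast_str_int, "SymmetricDistance", "SymmetricDistance", 1)
# Clamp each record to [0, 100] and sum: one added or removed record moves
# the sum by at most 100, so this is 100-stable.
bounded_sum = Transformation("VectorDomain<AllDomain<i32>>", "AllDomain<i32>",
                             lambda xs: sum(min(max(x, 0), 100) for x in xs),
                             "SymmetricDistance", "AbsoluteDistance", 100)

pipeline = chain(cast, bounded_sum)
print(pipeline.function(["null", "1.", "2", "456"]))  # 102
print(pipeline.check(d_in=1, d_out=100))              # True
print(pipeline.check(d_in=1, d_out=99))               # False
```

Chaining in the other order, `chain(bounded_sum, cast)`, raises a `ValueError` because the domains do not match.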

As you've seen above, when we want to create a transformation, we use "constructor" functions. These are, by convention, prefixed with `make_`.

Below is an example implementation of a casting transformation constructor, which I'll break down into three parts.

```rust
// 1.
pub fn make_cast_default<TIA, TOA>() -> Fallible<Transformation<
    VectorDomain<AllDomain<TIA>>, VectorDomain<AllDomain<TOA>>, 
    SymmetricDistance, SymmetricDistance>>

    // 2.
    where TIA: 'static + Clone + CheckNull, 
          TOA: 'static + RoundCast<TIA> + Default + CheckNull {

    // 3.
    Ok(Transformation::new(
        VectorDomain::new(AllDomain::new()),
        VectorDomain::new(AllDomain::new()),
        Function::new(move |arg: &Vec<TIA>|
            arg.iter().map(|v| TOA::round_cast(v.clone()).unwrap_or_default()).collect()),
        SymmetricDistance::new(),
        SymmetricDistance::new(),
        StabilityRelation::new_from_constant(1)))
}
```

The first part is the function signature:
```rust
pub fn make_cast_default<TIA, TOA>() -> Fallible<Transformation<
    VectorDomain<AllDomain<TIA>>, VectorDomain<AllDomain<TOA>>, 
    SymmetricDistance, SymmetricDistance>>
    ...
```
Most of the signature consists of types.
Rust is statically typed, so the code must state explicitly the types of the constructor's inputs and outputs.

This is a generic function with two type parameters, `TIA` and `TOA`, standing for "atomic input type" and "atomic output type".
It takes no value arguments, as indicated by the empty parameter list `()`.

The constructor returns a `Fallible` transformation: a `Result` that holds either the transformation or an error.
The last two lines of the signature specify the types of the input and output domains and metrics.

The second part is the where clause:
```rust
    ...
    where TIA: 'static + Clone + CheckNull, 
          TOA: 'static + RoundCast<TIA> + Default + CheckNull {
    ...
```
A where clause lists bounds on the types that may be used with the function.
You can read this as: "the compiler will enforce that `TIA` is some type that implements the `Clone` and `CheckNull` traits."
In other words, while I don't specify up-front what `TIA` must be, I can restrict it to types that are cloneable and have some concept of null-checking.
`TOA` must additionally implement `RoundCast<TIA>`, which casts values from type `TIA` to `TOA`, and `Default`, which supplies the fallback value when a cast fails.
The `'static` bound on both types means they may not contain borrowed references with a shorter lifetime.

The final part is the function body, which constructs a `Transformation`, wraps it in `Ok`, and implicitly returns it (in Rust, the final expression of a function body is its return value).
```rust
    ...
    Ok(Transformation::new(
        VectorDomain::new(AllDomain::new()),
        VectorDomain::new(AllDomain::new()),
        Function::new(move |arg: &Vec<TIA>|
            arg.iter().map(|v| TOA::round_cast(v.clone()).unwrap_or_default()).collect()),
        SymmetricDistance::new(),
        SymmetricDistance::new(),
        StabilityRelation::new_from_constant(1)))
}
```
Each argument to `Transformation::new` corresponds to a struct member.
We take advantage of Rust's closure syntax for creating anonymous functions.
For example, the addition closure `|a, b| a + b` takes two arguments, `a` and `b`, and its body is `a + b`.

With this shorthand in hand, we create a function that iterates over each record `v`, casts it to `TOA`, and replaces any value that fails to cast (such as a null) with the default value for the type.

We also take advantage of a convenient constructor for building `c`-stable relations, which relate (`d_in`, `d_out`) whenever `d_out >= c * d_in`.
Since the cast function operates row by row, it is 1-stable.
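A `c`-stable relation is just a closure over the constant `c`. The sketch below is a hypothetical Python analogue of `StabilityRelation::new_from_constant`, assuming the relation is `d_out >= c * d_in`; it reproduces the two `check` results from the notebook and shows that any larger `d_out` is also accepted.

```python
from typing import Callable


def new_from_constant(c: float) -> Callable[[float, float], bool]:
    """Hypothetical analogue of StabilityRelation::new_from_constant:
    the returned relation accepts (d_in, d_out) whenever d_out >= c * d_in."""
    if c < 0:
        raise ValueError("stability constant must be non-negative")

    def relation(d_in: float, d_out: float) -> bool:
        return d_out >= c * d_in

    return relation


cast_relation = new_from_constant(1)
print(cast_relation(3, 3), cast_relation(3, 2), cast_relation(3, 10))  # True False True
```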

### Measurement Structure

Measurements are very similar to Transformations, with two key differences.

```rust
pub struct Measurement<DI: Domain, DO: Domain, MI: Metric, MO: Measure> {
    pub input_domain: DI,
    pub output_domain: DO,
    pub function: Function<DI, DO>,
    pub input_metric: MI,
    pub output_measure: MO,
    pub privacy_relation: PrivacyRelation<MI, MO>,
}
```

First, the `output_metric` is replaced with an `output_measure`. Because a measurement's output is random, distances in the output space are measured as divergences between probability distributions.

Second, the stability relation is replaced by a privacy relation.
This is because the output distance is now a privacy loss: if the relation returns True for (`d_in`, `d_out`), then any two inputs within `d_in` of each other induce output distributions within `d_out` of each other under the output measure.
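As an illustration, here is a hypothetical Laplace measurement sketched in Python (not the OpenDP constructor). It relies on the standard result that adding Laplace noise with scale `b` to a query whose absolute-distance sensitivity is `d_in` satisfies pure differential privacy with epsilon `d_in / b`, so the privacy relation accepts (`d_in`, `d_out`) when `d_out >= d_in / b`. The sampler uses Python's `random` module for readability only; it is not a secure floating-point sampler and should not be used to protect real data.

```python
import random


def make_laplace_sketch(scale: float):
    """Hypothetical measurement: adds Laplace(0, scale) noise to a scalar.
    Returns the (function, privacy_relation) pair."""
    if scale <= 0:
        raise ValueError("scale must be positive")

    def function(arg: float) -> float:
        # The difference of two i.i.d. exponentials with mean `scale`
        # is Laplace-distributed with scale `scale`.
        return arg + random.expovariate(1 / scale) - random.expovariate(1 / scale)

    def privacy_relation(d_in: float, d_out: float) -> bool:
        # d_in: absolute distance between query answers (sensitivity).
        # d_out: epsilon. Scale-b Laplace noise gives (d_in / b)-DP.
        return d_out >= d_in / scale

    return function, privacy_relation


function, relation = make_laplace_sketch(scale=100.0)
print(relation(100, 1.0))  # True: sensitivity 100 at scale 100 costs epsilon = 1
print(relation(100, 0.5))  # False
noisy_sum = function(102.0)
```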

For more information, see:
- Docs website: https://docs.opendp.org
- GitHub repo: https://github.com/opendp/opendp


### Contrib, Vetting and Proofs

As mentioned before, much of the library is still in "contrib".
A requirement of the vetting process is having the code supported by a proof document. 
The library is designed to make this as easy as possible, because it consists of modular building blocks (Transformations and Measurements) for which encapsulated proofs may be written.

Each Transformation or Measurement proof must show the following:
1. That the function, when evaluated on any element in the input domain, emits a value in the output domain.
2. That, for every pair (d_in, d_out), the relation returns false whenever the function is not (d_in, d_out)-close.
3. That your choices of metrics/measures are compatible with your domains.
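Random testing cannot replace these proofs, but it can catch a broken claim early. The sketch below searches random dataset pairs for counterexamples to obligations 1 and 2 for a hypothetical pure-Python cast with a claimed 1-stable relation. All names are illustrative assumptions; passing the check is evidence, not proof, and obligation 3 is not testable this way.

```python
import random
from collections import Counter


def symmetric_distance(x, y):
    # Size of the multiset symmetric difference.
    cx, cy = Counter(x), Counter(y)
    return sum(((cx - cy) + (cy - cx)).values())


def cast_str_int_sketch(data):
    # Hypothetical row-by-row cast; unparseable records become 0.
    out = []
    for v in data:
        try:
            out.append(int(v))
        except ValueError:
            out.append(0)
    return out


def relation(d_in, d_out):
    # The claimed 1-stable relation for the cast.
    return d_out >= d_in


def random_dataset(rng, max_len=6):
    alphabet = ["0", "1", "2", "null", "1.", "abc"]
    return [rng.choice(alphabet) for _ in range(rng.randint(0, max_len))]


def spot_check(trials=1_000, seed=0):
    """Return False on the first counterexample found, else True."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, x_prime = random_dataset(rng), random_dataset(rng)
        out, out_prime = cast_str_int_sketch(x), cast_str_int_sketch(x_prime)
        # Obligation 1: outputs lie in the output domain (vectors of ints).
        if not all(isinstance(v, int) for v in out + out_prime):
            return False
        # Obligation 2: whenever the relation accepts (d_in, d_out),
        # the outputs must actually be within d_out of each other.
        d_in = symmetric_distance(x, x_prime)
        d_out_observed = symmetric_distance(out, out_prime)
        for d_out in range(d_in + 3):
            if relation(d_in, d_out) and d_out_observed > d_out:
                return False
    return True


print(spot_check())  # True
```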

