## Fake data to visualize MWEM's histograms
MWEM starts from a uniform histogram over the domain of the real data, then iteratively refines it using noisy measurements of the real data. In each iteration, the DP exponential mechanism selects a query on which the current histogram disagrees with the original data, and the multiplicative weights update adjusts the histogram's bin weights toward a noisy answer to that query.
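
The loop described above can be sketched in plain Python. This is a simplified illustration, not the `MWEMSynthesizer` implementation: the function name `mwem_sketch`, the flat 8-bin histogram, and the prefix-count query workload are all assumptions made for the example, and the budget is assumed to be split evenly between query selection and measurement in each round.

```python
import math
import random

def mwem_sketch(true_hist, queries, epsilon, iterations, rng):
    """Toy MWEM over a flat histogram. Each query is a 0/1 list marking the bins it counts."""
    n = sum(true_hist)
    bins = len(true_hist)
    # Start from a uniform histogram carrying the same total mass as the data.
    synth = [n / bins] * bins
    # Assumption: each round splits its budget evenly between selection and measurement.
    eps_half = epsilon / (2 * iterations)

    def answer(query, hist):
        return sum(q * h for q, h in zip(query, hist))

    for _ in range(iterations):
        # Exponential mechanism: prefer queries the synthetic histogram answers badly.
        errors = [abs(answer(q, true_hist) - answer(q, synth)) for q in queries]
        top = max(errors)
        weights = [math.exp(eps_half * (e - top) / 2) for e in errors]
        query = rng.choices(queries, weights=weights, k=1)[0]

        # Laplace mechanism: the difference of two exponentials is Laplace noise
        # with scale 1 / eps_half.
        noise = rng.expovariate(eps_half) - rng.expovariate(eps_half)
        measurement = answer(query, true_hist) + noise

        # Multiplicative weights: grow the covered bins if synth undercounts, shrink otherwise.
        gap = measurement - answer(query, synth)
        synth = [h * math.exp(q * gap / (2 * n)) for q, h in zip(query, synth)]
        # Renormalize so the total mass stays equal to n.
        total = sum(synth)
        synth = [h * n / total for h in synth]
    return synth

rng = random.Random(0)
true_hist = [40.0, 30.0, 20.0, 10.0, 0.0, 0.0, 0.0, 0.0]
# Hypothetical query workload: prefix counts over the 8 bins.
queries = [[1 if j <= i else 0 for j in range(8)] for i in range(8)]
synth = mwem_sketch(true_hist, queries, epsilon=10.0, iterations=30, rng=rng)
```

The synthetic histogram keeps the same total mass as the real one. Only the noisy selections and measurements ever touch `true_hist`, which is where the privacy guarantee comes from.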

Here, we render both histograms as heatmaps: the histogram built from the real data and the differentially private histogram. Brighter values correspond to higher-probability bins in each histogram.
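
To make "probability bins" concrete, here is a minimal sketch of how rows with two integer-coded columns could be turned into a normalized 2-D histogram. The helper `histogram_2d` and the toy rows are hypothetical and only illustrate the idea; the synthesizer builds its own histograms internally.

```python
from collections import Counter

def histogram_2d(rows, shape):
    """Count (x, y) pairs into a grid and normalize the grid to probabilities."""
    counts = Counter((x, y) for x, y in rows)
    total = len(rows)
    return [[counts[(i, j)] / total for j in range(shape[1])] for i in range(shape[0])]

# Hypothetical toy rows; each row is one record with two integer-coded columns.
rows = [(0, 0), (0, 0), (0, 1), (1, 1), (2, 2), (2, 2), (2, 2), (1, 0)]
probs = histogram_2d(rows, shape=(3, 3))
```

Passing a grid like `probs` to `plt.imshow` should draw cell (2, 2) as the brightest under the default colormap, since it holds the largest share of the rows.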

In [None]:
import os
import pandas as pd
import numpy as np

import matplotlib as mpl
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

from opendp.whitenoise.synthesizers.mwem import MWEMSynthesizer

In [None]:
def plot_histo(title, histo):
    """Show a 2-D histogram as a heatmap with a horizontal colorbar."""
    fig = plt.figure(figsize=(6, 6))
    ax = fig.add_subplot(111)
    ax.set_title(title)
    plt.imshow(histo)
    ax.set_aspect('equal')
    # Extra axes above the plot for the colorbar; hide its ticks and frame.
    cax = fig.add_axes([0.1, 1.0, 1., 0.1])
    cax.get_xaxis().set_visible(False)
    cax.get_yaxis().set_visible(False)
    cax.set_frame_on(False)
    plt.colorbar(orientation='horizontal')
    plt.show()


In [None]:
df = pd.read_csv('datasets/fake_data_2d.csv')
nf = df.to_numpy()

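# Positional arguments, in order: Q_count, epsilon, iterations,
# mult_weights_iterations, splits (here: one split over columns 0 and 1).
synth = MWEMSynthesizer(400, 10.0, 30, 20, [[0, 1]])
synth.fit(nf)

# For split 0, index 1 holds the histogram of the real data
# and index 0 holds the differentially private histogram.
plot_histo('"Real" Data', synth.synthetic_histograms[0][1])
plot_histo('"Fake" Data', synth.synthetic_histograms[0][0])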

## Effect of Bin Count
Here we visualize the effect of setting max_bin_count. The original data has 100 bins. If we halve that to 50, the histogram still captures the overall distribution well.
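
One way to see why fewer bins can still capture the overall shape: merging adjacent bins sums their counts, so the total mass and the coarse trend are preserved while fine detail is lost. The sketch below is a hypothetical illustration of that idea on a 1-D histogram. It is not how `MWEMSynthesizer` applies max_bin_count internally.

```python
def coarsen(hist, factor):
    """Merge every `factor` adjacent bins of a 1-D histogram by summing their counts."""
    return [sum(hist[i:i + factor]) for i in range(0, len(hist), factor)]

# Hypothetical 8-bin histogram, halved to 4 bins.
fine = [5, 7, 9, 11, 8, 6, 3, 1]
coarse = coarsen(fine, 2)
```

Here `coarse` has half as many bins as `fine` but the same total count, and it still peaks in the same region.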

In [None]:
df = pd.read_csv('datasets/fake_data_2d.csv')
nf = df.to_numpy()

synth = MWEMSynthesizer(400, 10.0, 30, 20, [[0, 1]], max_bin_count=50)
synth.fit(nf)

plot_histo('"Real" Data', synth.synthetic_histograms[0][1])
plot_histo('"Fake" Data', synth.synthetic_histograms[0][0])