1 change: 1 addition & 0 deletions .gitignore
@@ -9,3 +9,4 @@ src/policies/__pycache__/
src/apps/__pycache__/
src/.coverage
src/maths/__pycache__/
src/networks/__pycache__/
84 changes: 2 additions & 82 deletions README.md
@@ -3,86 +3,6 @@

# RL anonymity (with Python)

An experimental effort to use reinforcement learning techniques for data anonymization.

## Conceptual overview

The term data anonymization refers to techniques that can be applied to a given dataset, D, such that it becomes
difficult for a third party to identify or infer the existence of specific individuals in D. Anonymization techniques
typically result in some sort of distortion of the original dataset. This means that in order to maintain some utility
of the transformed dataset, the transformations applied should be constrained in some sense. In the end, it can be argued
that data anonymization is an optimization problem, namely striking the right balance between data utility and privacy.

Reinforcement learning is a learning framework based on accumulated experience. In this paradigm, an agent learns by interacting with an environment
without (to a large extent) any supervision. The following image describes, schematically, the reinforcement learning framework.

![RL paradigm](images/agent_environment_interface.png "Reinforcement learning paradigm")

The agent chooses an action, ```a_t```, to perform out of a predefined set of actions ```A```. The chosen action is executed by the environment
instance, which returns to the agent a reward signal, ```r_t```, as well as the new state, ```s_t```, that the environment is in.
The framework has been used successfully in many recent advances in control, robotics, games and elsewhere.


Let's assume that we have at our disposal two numbers: a minimum distortion, ```MIN_DIST```, that should be applied to the dataset
in order to achieve privacy, and a maximum distortion, ```MAX_DIST```, that may be applied to the dataset while still maintaining some utility.
Let's also assume that any overall dataset distortion in ```[MIN_DIST, MAX_DIST]``` is acceptable in order to cast the dataset as
both privacy preserving and utility preserving. We can then train a reinforcement learning agent to distort the dataset
such that the aforementioned objective is achieved.

Overall, this is shown in the image below.

![RL anonymity paradigm](images/general_concept.png "Reinforcement learning anonymity schematics")

The images below show the overall running distortion average and running reward average achieved by using the
<a href="https://en.wikipedia.org/wiki/Q-learning">Q-learning</a> algorithm and various policies.

**Q-learning with epsilon-greedy policy and constant epsilon**
![RL anonymity paradigm](images/q_learn_epsilon_greedy_avg_run_distortion.png "Epsilon-greedy constant epsilon ")
![RL anonymity paradigm](images/q_learn_epsilon_greedy_avg_run_reward.png "Reinforcement learning anonymity schematics")

**Q-learning with epsilon-greedy policy and decaying epsilon per episode**
![RL anonymity paradigm](images/q_learn_epsilon_greedy_decay_avg_run_distortion.png "Reinforcement learning anonymity schematics")
![RL anonymity paradigm](images/q_learn_epsilon_greedy_decay_avg_run_reward.png "Reinforcement learning anonymity schematics")


**Q-learning with epsilon-greedy policy with decaying epsilon at constant rate**
![RL anonymity paradigm](images/q_learn_epsilon_greedy_decay_rate_avg_run_distortion.png "Reinforcement learning anonymity schematics")
![RL anonymity paradigm](images/q_learn_epsilon_greedy_decay_rate_avg_run_reward.png "Reinforcement learning anonymity schematics")

**Q-learning with softmax policy**
![RL anonymity paradigm](images/q_learn_softmax_avg_run_distortion.png "Reinforcement learning anonymity schematics")
![RL anonymity paradigm](images/q_learn_softmax_avg_run_reward.png "Reinforcement learning anonymity schematics")


## Dependencies

The following packages are required.

- <a href="#">NumPy</a>
- <a href="https://www.sphinx-doc.org/en/master/">Sphinx</a>
- <a href="#">Python Pandas</a>

You can use

```
pip install -r requirements.txt
```

## Examples

- <a href="src/examples/qlearning_three_columns.py"> Qlearning agent on a three columns dataset</a>
- <a href="src/examples/nstep_semi_grad_sarsa_three_columns.py"> n-step semi-gradient SARSA on a three columns dataset</a>

## Documentation

You will need <a href="https://www.sphinx-doc.org/en/master/">Sphinx</a> in order to generate the API documentation. Assuming that Sphinx is already installed
on your machine execute the following commands (see also <a href="https://www.sphinx-doc.org/en/master/tutorial/index.html">Sphinx tutorial</a>).

```
sphinx-quickstart docs
sphinx-build -b html docs/source/ docs/build/html
```

## References
An experimental effort to use reinforcement learning techniques for data anonymization. The project documentation
can be found at <a href="https://rl-anonymity-with-python.readthedocs.io/en/latest/index.html">RL anonymity (with Python)</a>

7 changes: 7 additions & 0 deletions docs/source/API/networks/a2c_networks.rst
@@ -0,0 +1,7 @@
a2c\_networks
=============

.. automodule:: a2c_networks

.. autoclass:: A2CNetSimpleLinear
:members: __init__, forward
File renamed without changes.
File renamed without changes.
File renamed without changes.
16 changes: 13 additions & 3 deletions docs/source/Examples/a2c_three_columns.rst
@@ -15,9 +15,19 @@ However, the true objective of reinforcement learning is to directly learn a policy

The main advantage of learning a parametrized policy is that it can be any learnable function e.g. a linear model or a deep neural network.

The A2C algorithm falls under the umbrella of actor-critic methods [REF]. In these methods, we estimate a parametrized policy; the actor
and a parametrized value function; the critic.
The A2C algorithm is a synchronous version of A3C. Both algorithms fall under the umbrella of actor-critic methods [REF]. In these methods, we estimate a parametrized policy (the actor)
and a parametrized value function (the critic). The role of the policy or actor network is to indicate which action to take in a given state. In our implementation below,
the policy network returns a probability distribution over the action space, specifically a tensor of probabilities. The role of the critic model is to evaluate how good
the selected action is.
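
Below is a rough illustrative sketch of such an actor-critic pair with a shared body. The class name, layer sizes and activation are assumptions made for illustration only; this is not the project's ``A2CNetSimpleLinear`` implementation (see the Code section below for the actual model).

.. code-block:: python

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ToyActorCritic(nn.Module):
        """Illustrative actor-critic model: a shared linear body feeds an
        actor head (action probabilities) and a critic head (state value)."""

        def __init__(self, n_features: int, n_actions: int, n_hidden: int = 64):
            super().__init__()
            self.shared = nn.Linear(n_features, n_hidden)   # shared body
            self.actor = nn.Linear(n_hidden, n_actions)     # policy head
            self.critic = nn.Linear(n_hidden, 1)            # value head

        def forward(self, x: torch.Tensor):
            h = F.relu(self.shared(x))
            probs = F.softmax(self.actor(h), dim=-1)        # tensor of action probabilities
            value = self.critic(h)                          # estimate of how good the state is
            return probs, value

    # one forward pass on a dummy state with 3 features
    net = ToyActorCritic(n_features=3, n_actions=5)
    probs, value = net(torch.randn(1, 3))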

In A2C there is a single agent that interacts with multiple instances of the environment. In other words, we create a number of workers where each worker loads its own instance
of the data set to anonymize. A shared model is then optimized by each worker.

We can use neural networks to approximate both models.


Specifically, we will use a weight-sharing model. Moreover, the environment is a multi-process class that gathers samples from multiple
emvironments at once
environments at once.

Code
----
2 changes: 2 additions & 0 deletions docs/source/conf.py
@@ -20,6 +20,8 @@
sys.path.append(os.path.abspath("../../src/policies/"))
sys.path.append(os.path.abspath("../../src/maths/"))
sys.path.append(os.path.abspath("../../src/utils/"))
sys.path.append(os.path.abspath("../../src/datasets/"))
sys.path.append(os.path.abspath("../../src/networks/"))
print(sys.path)


Binary file added docs/source/images/general_concept.png
5 changes: 3 additions & 2 deletions docs/source/index.rst
@@ -3,10 +3,11 @@
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.

Welcome to RL Anonymity (with Python)'s documentation!
======================================================
RL Anonymity (with Python)
==========================

An experimental effort to use reinforcement learning techniques for data anonymization.
The project repository is at `RL anonymity (with Python) <https://github.com/pockerman/rl_anonymity_with_python>`_.

Contents
--------
5 changes: 4 additions & 1 deletion docs/source/install.rst
@@ -6,14 +6,17 @@ The following packages are required:
- `NumPy <https://numpy.org/>`_
- `Sphinx <https://www.sphinx-doc.org/en/master/>`_
- `Python Pandas <https://pandas.pydata.org/>`_
- `PyTorch <https://pytorch.org/>`_

.. code-block:: console

pip install -r requirements.txt

Run tests
---------

Generate documentation
======================
----------------------

You will need `Sphinx <https://www.sphinx-doc.org/en/master/>`_ in order to generate the API documentation. Assuming that Sphinx is already installed
on your machine execute the following commands (see also `Sphinx tutorial <https://www.sphinx-doc.org/en/master/tutorial/index.html>`_).
24 changes: 14 additions & 10 deletions docs/source/modules.rst
@@ -4,20 +4,24 @@ API
.. toctree::
:maxdepth: 4

API/actions
API/action_space
API/state
API/time_step
API/epsilon_greedy_policy
API/epsilon_greedy_q_estimator
API/q_learning
API/trainer
API/optimizer_type
API/pytorch_optimizer_builder
API/datasets/column_type
API/exceptions/exceptions
API/maths/optimizer_type
API/maths/pytorch_optimizer_builder
API/networks/a2c_networks
API/spaces/actions
API/spaces/action_space
API/spaces/state
API/spaces/discrete_state_environment
API/spaces/tiled_environment
API/spaces/time_step
API/replay_buffer
API/a2c
API/exceptions
API/column_type
API/discrete_state_environment
API/tiled_environment




72 changes: 60 additions & 12 deletions docs/source/overview.rst
@@ -1,25 +1,73 @@
Conceptual overview
===================

The term data anonymization refers to techiniques that can be applied on a given dataset, D, such that after
the latter has been submitted to such techniques, it makes it difficult for a third party to identify or infer the existence
of specific individuals in D. Anonymization techniques, typically result into some sort of distortion
The term data anonymization refers to techniques that can be applied to a given dataset, :math:`D`, such that it becomes difficult for a third party to identify or infer the existence
of specific individuals in :math:`D`. Anonymization techniques typically result in some sort of distortion
of the original dataset. This means that in order to maintain some utility of the transformed dataset, the transformations
applied should be constrained in some sense. In the end, it can be argued that data anonymization is an optimization problem,
namely striking the right balance between data utility and privacy.

Reinforcement learning is a learning framework based on accumulated experience. In this paradigm, an agent is learning by iteracting with an environment
Reinforcement learning is a learning framework based on accumulated experience. In this paradigm, an agent learns by interacting with an environment
without (to a large extent) any supervision. The following image describes, schematically, the reinforcement learning framework.

![RL paradigm](images/agent_environment_interface.png "Reinforcement learning paradigm")
.. figure:: images/agent_environment_interface.png

The agent chooses an action, ```a_t```, to perform out of predefined set of actions ```A```. The chosen action is executed by the environment
instance and returns to the agent a reward signal, ```r_t```, as well as the new state, ```s_t```, that the enviroment is in.
Reinforcement learning paradigm.


The agent chooses an action, :math:`A_t \in \mathbb{A}`, to perform out of a predefined set of actions :math:`\mathbb{A}`. The chosen action is executed by the environment
instance, which returns to the agent a reward signal, :math:`R_{t+1}`, as well as the new state, :math:`S_{t+1}`, that the environment is in.
The overall goal of the agent is to maximize the expected total reward, i.e.

.. math::

\max \mathbb{E}\left[ R \right]


The framework has been used successfully in many recent advances in control, robotics, games and elsewhere.
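
As a toy illustration of this interaction loop, consider the sketch below. The ``ToyEnv`` and ``ToyAgent`` classes are invented stand-ins for illustration only; they are not part of this project's code base.

.. code-block:: python

    import random

    class ToyEnv:
        """Stand-in environment returning a random reward signal."""
        def reset(self):
            return 0                                   # initial state S_0
        def step(self, action):
            reward = random.random()                   # reward R_{t+1}
            next_state = action                        # new state S_{t+1}
            done = random.random() < 0.1               # episode termination flag
            return next_state, reward, done

    class ToyAgent:
        """Stand-in agent choosing actions uniformly at random."""
        def act(self, state):
            return random.randrange(5)                 # A_t out of a set of 5 actions

    env, agent = ToyEnv(), ToyAgent()
    state, total_reward = env.reset(), 0.0
    for t in range(100):
        action = agent.act(state)
        state, reward, done = env.step(action)
        total_reward += reward                         # accumulate the return
        if done:
            break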

In this work we are interested in applying reinforcement learning techniques in order to train agents to optimally anonymize a given
data set. In particular, we want to consider the following two scenarios:

- A tabular data set is to be publicly released
- A data set is behind a restrictive API that allows users to perform certain queries on the hidden data set.

For the first scenario, let's assume that we have at our disposal two numbers :math:`DIST_{min}` and :math:`DIST_{max}`. The former indicates
the minimum total data set distortion that should be applied in order to satisfy some minimum safety criteria. The latter indicates
the maximum total data set distortion that should be applied in order to satisfy some utility criteria. Note that the same idea can be
applied to enforce constraints on how much a column should be distorted. Furthermore, let's assume that the most common transformations
applied for data anonymization are the following:

- Generalization
- Suppression
- Permutation
- Perturbation
- Anatomization

We can conceive the above transformations as our action set :math:`\mathbb{A}`. We can now cast the data anonymity problem into a form
suitable for reinforcement learning. Specifically, our goal, and the agent's goal for that matter, is to obtain a policy :math:`\pi` of transformations such that by following :math:`\pi`,
the total data set distortion falls within the interval :math:`[DIST_{min}, DIST_{max}]`. This is done by choosing actions/transformations from :math:`\mathbb{A}`.
This is shown schematically in the figure below.

.. figure:: images/general_concept.png

Data anonymization using reinforcement learning.

Thus, the environment in our case is an entity that encapsulates the original data set and controls the actions applied on it, as well as the
reward signal :math:`R_{t+1}` and the next state :math:`S_{t+1}` to be presented to the agent.
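
For instance, an end-of-episode reward could be assigned along the following lines. This is only a sketch; the function name and the reward values are assumptions, not the reward policy actually used in this project.

.. code-block:: python

    def distortion_reward(total_distortion: float,
                          dist_min: float, dist_max: float) -> float:
        """Illustrative reward: positive inside [dist_min, dist_max], negative outside."""
        if dist_min <= total_distortion <= dist_max:
            return 1.0    # acceptable privacy/utility trade-off
        return -1.0       # either too little privacy or too much utility lost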

Nevertheless, there are some caveats that we need to take into account. We summarize these below.

First, we need a reward policy. The way we assign rewards implicitly specifies the degree of supervision we allow. For instance, we could
allow a reward to be assigned every time a transformation is applied. This strategy allows for faster learning, but it leaves little room
for the agent to come up with novel strategies. In contrast, returning a reward only at the end of the episode, although it increases the
training time, allows the agent to explore novel strategies. Related to the reward assignment is also the following issue: we need to reward
the agent in a way that convinces it to explore transformations. This is important, as we don't want the agent to simply exploit around the
zero distortion point.

Second, the metric we use to measure the data set distortion plays an important role.

Third, we need to hold in memory two copies of the data set: one copy to which no distortion is applied and one copy that we distort during
an episode. We need this setting so that we are able to compute the column distortions.

Fourth, we need to establish the episode termination criteria, i.e. when we consider an episode to be complete.

Finally, as we assume that a data set may contain strings, floating point numbers as well as integers, the computed distortions are normalized.
This is needed in order to avoid having disproportionately large column distortions, e.g. consider a salary column being distorted, and also
to be able to sum all the column distortions in a meaningful way.
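
As an illustration of the normalization point, a numeric column distortion could be scaled by the magnitude of the original column before summation. The functions below are a sketch under that assumption, not the project's actual distortion metrics; string columns would need, e.g., a normalized string distance instead.

.. code-block:: python

    import numpy as np

    def normalized_numeric_distortion(original: np.ndarray, distorted: np.ndarray) -> float:
        """Illustrative distortion of a numeric column, scaled by the column magnitude
        so that e.g. a salary column does not dominate the total."""
        denom = np.abs(original).sum()
        if denom == 0.0:
            return float(np.abs(distorted).sum() > 0.0)
        return float(np.abs(original - distorted).sum() / denom)

    def total_distortion(column_distortions):
        """Total distortion as the sum of the already normalized column distortions."""
        return float(sum(column_distortions))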

Let's assume that we have in our disposal two numbers a minimum distortion, ```MIN_DIST``` that should be applied to the dataset
for achieving privacy and a maximum distortion, ```MAX_DIST```, that should be applied to the dataset in order to maintain some utility.
Let's assume also that any overall dataset distortion in ```[MIN_DIST, MAX_DIST]``` is acceptable in order to cast the dataset as
preserving privacy and preserving dataset utility. We can then train a reinforcement learning agent to distort the dataset
such that the aforementioned objective is achieved.