pockerman · Feb 17, 2022 · Feb 16, 2022
diff --git a/README.md b/README.md
@@ -56,7 +56,7 @@ The images below show the overall running distortion average and running reward
 
 The following packages are required. 
 
-- NumPy
+- <a href="https://numpy.org/">NumPy</a>
 - <a href="https://www.sphinx-doc.org/en/master/">Sphinx</a> 
 - <a href="#">Python Pandas</a>
 

diff --git a/build_sphinx_doc.sh b/build_sphinx_doc.sh
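@@ -0,0 +1,8 @@
+#!/bin/bash
+# Run once to create the Sphinx project skeleton under docs/
+#sphinx-quickstart docs
+
+# Regenerate the .rst stubs for the package modules (-f overwrites existing files)
+sphinx-apidoc -f -o docs/source docs/projectdir
+# Build the HTML documentation into docs/build/html
+#sphinx-build -b html docs/source/ docs/build/html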
diff --git a/docs/source/API/actions.rst b/docs/source/API/actions.rst
@@ -0,0 +1,27 @@
+actions
+=======
+
+.. automodule:: actions
+
+   .. rubric:: Classes
+
+   .. autosummary::
+
+      ActionBase
+      ActionIdentity
+      ActionNumericBinGeneralize
+      ActionNumericStepGeneralize
+      ActionRestore
+      ActionStringGeneralize
+      ActionSuppress
+      ActionTransform
+      ActionType
+
+.. autoclass:: ActionBase
+   :members: __init__, act
+
+.. autoclass:: ActionIdentity
+   :members: __init__, act
+
+.. autoclass:: ActionStringGeneralize
+   :members: __init__, act, add
diff --git a/docs/source/examples.rst b/docs/source/examples.rst
@@ -0,0 +1,7 @@
+Examples
+========
+
+Some examples can be found below:
+
+- `Q-learning agent on a three-column dataset <src/examples/qlearning_three_columns.py>`_
+- `N-step semi-gradient SARSA on a three-column dataset <src/examples/nstep_semi_grad_sarsa_three_columns.py>`_
diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -0,0 +1,30 @@
+.. RL Anonymity (with Python) documentation master file, created by
+   sphinx-quickstart on Tue Feb 15 17:55:45 2022.
+   You can adapt this file completely to your liking, but it should at least
+   contain the root `toctree` directive.
+
+Welcome to RL Anonymity (with Python)'s documentation!
+======================================================
+
+An experimental effort to use reinforcement learning techniques for data anonymization. 
+
+Contents
+--------
+
+.. toctree::
+   :maxdepth: 2
+
+
+   overview
+   install
+   examples
+   modules
+
+Indices and tables
+------------------
+
+* :ref:`genindex`
+* :ref:`modindex`
+* :ref:`search`
+
+
diff --git a/docs/source/install.rst b/docs/source/install.rst
@@ -0,0 +1,27 @@
+Installation
+============
+
+The following packages are required:
+
+- `NumPy <https://numpy.org/>`_
+- `Sphinx <https://www.sphinx-doc.org/en/master/>`_
+- `Python Pandas <https://pandas.pydata.org/>`_
+
+.. code-block:: console
+
+	pip install -r requirements.txt
+
+
+Generate documentation
+----------------------
+
+You will need `Sphinx <https://www.sphinx-doc.org/en/master/>`_ to generate the API documentation. Assuming that Sphinx is already installed
+on your machine, execute the following commands (see also the `Sphinx tutorial <https://www.sphinx-doc.org/en/master/tutorial/index.html>`_).
+The first command creates the Sphinx project skeleton under ``docs`` and only needs to be run once; the second builds the HTML pages into ``docs/build/html``.
+
+.. code-block:: console
+
+	sphinx-quickstart docs
+	sphinx-build -b html docs/source/ docs/build/html
+
+
+
diff --git a/docs/source/modules.rst b/docs/source/modules.rst
@@ -0,0 +1,22 @@
+API
+===
+
+.. toctree::
+   :maxdepth: 4
+
+   API/actions
+   generated/action_space
+   generated/q_estimator
+   generated/q_learning
+   generated/trainer
+   generated/sarsa_semi_gradient
+   generated/exceptions
+   generated/actions
+   generated/column_type
+   generated/discrete_state_environment
+   generated/observation_space
+   generated/state
+   generated/time_step
+   generated/tiled_environment
+
diff --git a/docs/source/overview.rst b/docs/source/overview.rst
@@ -0,0 +1,25 @@
+Conceptual overview
+===================
+
+The term data anonymization refers to techniques that can be applied to a given dataset, D, such that, once D
+has been transformed, it is difficult for a third party to identify, or infer the existence of, specific
+individuals in D. Anonymization techniques typically introduce some distortion into the original dataset.
+This means that, in order to maintain some utility of the transformed dataset, the transformations applied
+must be constrained. Data anonymization can therefore be viewed as an optimization problem: striking the right
+balance between data utility and privacy.
+
+Reinforcement learning is a learning framework based on accumulated experience. In this paradigm, an agent learns by interacting with an environment
+without (to a large extent) any supervision. The following image describes the reinforcement learning framework schematically.
+
+.. figure:: images/agent_environment_interface.png
+   :alt: RL paradigm
+
+   Reinforcement learning paradigm
+
+The agent chooses an action, ``a_t``, from a predefined set of actions, ``A``. The environment executes the chosen action
+and returns to the agent a reward signal, ``r_t``, as well as the new state, ``s_t``, that the environment is in.
+The framework has underpinned many recent advances in control, robotics, games and elsewhere.
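The interaction loop described above can be sketched in a few lines of Python. This is a minimal, self-contained illustration under stated assumptions, not the project's implementation: the chain environment ``ToyEnv`` and the function ``train`` are hypothetical, and the agent uses the standard tabular Q-learning update with a step size ``alpha`` and a discount ``gamma`` (mirroring the ``alpha`` and ``gamma`` fields of ``QLearnConfig``).

```python
import random
from collections import defaultdict


class ToyEnv:
    """Hypothetical chain environment: states 0..n-1, actions 0 (left) and 1 (right).

    Reaching the last state ends the episode with reward 1; every other step gives 0.
    """

    def __init__(self, n_states: int = 5):
        self.n_states = n_states
        self.state = 0

    def reset(self) -> int:
        self.state = 0
        return self.state

    def step(self, action: int):
        # Move one cell left or right, clipped to the ends of the chain.
        delta = 1 if action == 1 else -1
        self.state = min(max(self.state + delta, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def train(env, n_episodes=300, alpha=0.1, gamma=0.9, epsilon=0.1, max_steps=100, seed=0):
    """Tabular Q-learning (hypothetical helper); returns {(state, action): value estimate}."""
    rng = random.Random(seed)
    actions = [0, 1]
    q = defaultdict(float)

    for _ in range(n_episodes):
        s = env.reset()
        for _ in range(max_steps):
            # Choose a_t: explore with probability epsilon, otherwise act greedily
            # (ties broken at random so the untrained agent does not get stuck).
            if rng.random() < epsilon:
                a = rng.choice(actions)
            else:
                best = max(q[(s, b)] for b in actions)
                a = rng.choice([b for b in actions if q[(s, b)] == best])
            # The environment executes a_t and returns the reward and the new state.
            s_next, r, done = env.step(a)
            # Q-learning update towards r + gamma * max_b Q(s_next, b).
            target = r if done else r + gamma * max(q[(s_next, b)] for b in actions)
            q[(s, a)] += alpha * (target - q[(s, a)])
            s = s_next
            if done:
                break
    return q
```

For example, ``q = train(ToyEnv())`` should end up with ``q[(s, 1)] > q[(s, 0)]`` in every non-terminal state, i.e. the greedy policy walks towards the rewarding end of the chain.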
+
+
+Let's assume that we have at our disposal two numbers: a minimum distortion, ``MIN_DIST``, that should be applied to the dataset
+to achieve privacy, and a maximum distortion, ``MAX_DIST``, that should not be exceeded in order to maintain some utility.
+Let's also assume that any overall dataset distortion in ``[MIN_DIST, MAX_DIST]`` is acceptable, i.e. the distorted dataset
+both preserves privacy and retains its utility. We can then train a reinforcement learning agent to distort the dataset
+such that its overall distortion falls within this interval.
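One simple way to turn this objective into a reward signal is sketched below. It is a hypothetical illustration, not the reward used by the project: the function name ``interval_reward`` and its shape (+1 inside the interval, minus the distance to the interval outside it) are assumptions, and the interval bounds in the usage lines are illustrative values only.

```python
def interval_reward(distortion: float, min_dist: float, max_dist: float) -> float:
    """Hypothetical reward: +1 when the overall distortion lies in [min_dist, max_dist],
    otherwise minus the distance to the nearest end of the interval."""
    if min_dist > max_dist:
        raise ValueError("min_dist must not exceed max_dist")
    if distortion < min_dist:
        # Too little distortion: privacy is not yet achieved.
        return -(min_dist - distortion)
    if distortion > max_dist:
        # Too much distortion: utility is lost.
        return -(distortion - max_dist)
    # Inside the acceptable interval.
    return 1.0


MIN_DIST, MAX_DIST = 0.25, 0.75  # illustrative values only
print(interval_reward(0.5, MIN_DIST, MAX_DIST))  # 1.0
print(interval_reward(0.0, MIN_DIST, MAX_DIST))  # -0.25
print(interval_reward(1.0, MIN_DIST, MAX_DIST))  # -0.25
```

Making the penalty grow with the distance from the interval tells the agent which way to move from either side; a constant penalty would also encode the objective but carries no sense of direction.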
diff --git a/requirements.txt b/requirements.txt
@@ -2,3 +2,4 @@ numpy==1.20.2
 pandas==1.1.3
 gym==0.18.0
 textdistance==4.2.0
+numpydoc==1.2
diff --git a/src/algorithms/q_learning.py b/src/algorithms/q_learning.py
@@ -14,9 +14,7 @@
 
 
 class QLearnConfig(object):
-    """
-    Configuration  for Q-learning
-    """
+    """Configuration for Q-learning"""
     def __init__(self):
         self.gamma: float = 1.0
         self.alpha: float = 0.1
@@ -25,11 +23,13 @@ def __init__(self):
 
 
 class QLearning(WithMaxActionMixin):
-    """
-    Q-learning algorithm implementation
-    """
+    """Q-learning algorithm implementation"""
 
     def __init__(self, algo_config: QLearnConfig):
+        """Construct an untrained agent.
+
+        :param algo_config: Configuration parameters
+        """
         super(QLearning, self).__init__()
         self.q_table = {}
         self.config = algo_config