Merge pull request #27 from sankhaMukherjee/dev
added documentation
sankhaMukherjee committed Jun 5, 2019
2 parents dafce6b + 65f1e05 commit 620442f
Showing 5 changed files with 139 additions and 2 deletions.
48 changes: 48 additions & 0 deletions docs/_modules/lib/agents/Agent_DQN.html
@@ -152,6 +152,54 @@ Source code for lib.agents.Agent_DQN
<span class="kn">import</span> <span class="nn">sys</span>

<div class="viewcode-block" id="Agent_DQN"><a class="viewcode-back" href="../../../lib.agents.html#lib.agents.Agent_DQN.Agent_DQN">[docs]</a><span class="k">class</span> <span class="nc">Agent_DQN</span><span class="p">:</span>
<span class="sd">&#39;&#39;&#39;A class allowing the training of the DQN</span>

<span class="sd"> This class is intended to be used by functions within the ``lib.agents.trainAgents``</span>
<span class="sd"> module.</span>
<span class="sd"> </span>
<span class="sd"> The DQN algorithm was first proposed over some years ago and was slated to be used for </span>
<span class="sd"> improving the state of affairs of traditional reinforcement learning and extending it</span>
<span class="sd"> to deep reinforcement learning. This class allows you to easily set up a DQN learning</span>
<span class="sd"> framework. This class does not care about the type of environment. Just that the action</span>
<span class="sd"> an agent is able to take is one of a finite number of actions, each action at a particular</span>
<span class="sd"> state has an associated Q-value. This algorithm attempts to find theright Q value for each</span>
<span class="sd"> action.</span>

<span class="sd"> The class itself does not care about the specifics of the state, and the Qnetworks that </span>
<span class="sd"> calculate the results. It is up to the user to specify the right environment and the </span>
<span class="sd"> associated networks that will allow the algorithm to solve the Bellman equation.</span>
<span class="sd"> </span>
<span class="sd"> [link to paper](https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf)</span>
<span class="sd"> </span>
<span class="sd"> Parameters</span>
<span class="sd"> ----------</span>
<span class="sd"> env : instance of an Env class</span>
<span class="sd"> The environment that will be used for generating the result of a particulat action</span>
<span class="sd"> in the current state</span>
<span class="sd"> memory : instance of the Memory class</span>
<span class="sd"> The environment that will allow one to store and retrieve previously held states that</span>
<span class="sd"> can be used to train upon.</span>
<span class="sd"> qNetworkSlow : neural network instance</span>
<span class="sd"> This is a neural network instance that can be used for converting a state into a</span>
<span class="sd"> set of Q-values. This is the slower version, used for making a prediction, and is </span>
<span class="sd"> never trained. Its parameters are slowly updated over time to slowly allow it to </span>
<span class="sd"> converge to the right value</span>
<span class="sd"> qNetworkFast : neural network instance</span>
<span class="sd"> This is the instance of the faster network that can be used for training Q-learning</span>
<span class="sd"> algorithm. This is the main network that implements the Bellman equation.</span>
<span class="sd"> numActions : int</span>
<span class="sd"> The number of discrete actions that the current environment can accept.</span>
<span class="sd"> gamma : float</span>
<span class="sd"> The discount factor. currently not used</span>
<span class="sd"> device : str, optional</span>
<span class="sd"> the device where you want to run your algorithm, by default &#39;cpu&#39;. If you want to run</span>
<span class="sd"> the optimization of a particular GPU, you may specify that. For example with &#39;cuda:0&#39;</span>
<span class="sd"> </span>
<span class="sd"> Raises</span>
<span class="sd"> ------</span>
<span class="sd"> type</span>
<span class="sd"> [description]</span>
<span class="sd"> &#39;&#39;&#39;</span>

<span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">env</span><span class="p">,</span> <span class="n">memory</span><span class="p">,</span> <span class="n">qNetworkSlow</span><span class="p">,</span> <span class="n">qNetworkFast</span><span class="p">,</span> <span class="n">numActions</span><span class="p">,</span> <span class="n">gamma</span><span class="p">,</span> <span class="n">device</span><span class="o">=</span><span class="s1">&#39;cpu&#39;</span><span class="p">):</span>
<span class="sd">&#39;&#39;&#39;A class allowing the training of the DQN</span>
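Given the constructor signature documented above, usage would presumably look something like the sketch below. This is only an illustration, not code from the repository: SimpleEnv, SimpleMemory, makeQNetwork, stateSize, and the gamma value are hypothetical stand-ins, since the commit does not show the Env, Memory, or network interfaces.

    # Minimal usage sketch (assumptions marked); only the Agent_DQN import path
    # and its constructor signature come from the documentation above.
    import torch.nn as nn
    from lib.agents.Agent_DQN import Agent_DQN

    class SimpleEnv:        # hypothetical placeholder for the user-supplied Env class
        pass

    class SimpleMemory:     # hypothetical placeholder for the user-supplied Memory class
        pass

    stateSize, numActions = 8, 4    # assumed sizes for illustration

    def makeQNetwork():
        # Two networks with the same architecture are created: the slow (target)
        # copy used for predictions and the fast copy that is actually trained.
        return nn.Sequential(
            nn.Linear(stateSize, 64), nn.ReLU(),
            nn.Linear(64, numActions))

    agent = Agent_DQN(
        env          = SimpleEnv(),
        memory       = SimpleMemory(),
        qNetworkSlow = makeQNetwork(),   # used for predictions, never trained directly
        qNetworkFast = makeQNetwork(),   # trained; its weights drift into the slow copy
        numActions   = numActions,
        gamma        = 0.99,             # discount factor (documented as currently unused)
        device       = 'cpu')            # or e.g. 'cuda:0' to run on a GPU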
2 changes: 1 addition & 1 deletion docs/_sources/index.rst.txt
@@ -1,5 +1,5 @@
.. src documentation master file, created by
-   sphinx-quickstart on Wed Jun 5 12:34:41 2019.
+   sphinx-quickstart on Wed Jun 5 12:41:08 2019.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
41 changes: 41 additions & 0 deletions docs/lib.agents.html
@@ -189,6 +189,47 @@ Submodules
<dt id="lib.agents.Agent_DQN.Agent_DQN">
<em class="property">class </em><code class="descclassname">lib.agents.Agent_DQN.</code><code class="descname">Agent_DQN</code><span class="sig-paren">(</span><em>env</em>, <em>memory</em>, <em>qNetworkSlow</em>, <em>qNetworkFast</em>, <em>numActions</em>, <em>gamma</em>, <em>device='cpu'</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/lib/agents/Agent_DQN.html#Agent_DQN"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#lib.agents.Agent_DQN.Agent_DQN" title="Permalink to this definition"></a></dt>
<dd><p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">object</span></code></p>
<p>A class allowing the training of the DQN</p>
<p>This class is intended to be used by functions within the <code class="docutils literal notranslate"><span class="pre">lib.agents.trainAgents</span></code>
module.</p>
<p>The DQN algorithm was first proposed over some years ago and was slated to be used for
improving the state of affairs of traditional reinforcement learning and extending it
to deep reinforcement learning. This class allows you to easily set up a DQN learning
framework. This class does not care about the type of environment. Just that the action
an agent is able to take is one of a finite number of actions, each action at a particular
state has an associated Q-value. This algorithm attempts to find theright Q value for each
action.</p>
<p>The class itself does not care about the specifics of the state, and the Qnetworks that
calculate the results. It is up to the user to specify the right environment and the
associated networks that will allow the algorithm to solve the Bellman equation.</p>
<p>[link to paper](<a class="reference external" href="https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf">https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf</a>)</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
<li><strong>env</strong> (<em>instance of an Env class</em>) – The environment that will be used for generating the result of a particulat action
in the current state</li>
<li><strong>memory</strong> (<em>instance of the Memory class</em>) – The environment that will allow one to store and retrieve previously held states that
can be used to train upon.</li>
<li><strong>qNetworkSlow</strong> (<em>neural network instance</em>) – This is a neural network instance that can be used for converting a state into a
set of Q-values. This is the slower version, used for making a prediction, and is
never trained. Its parameters are slowly updated over time to slowly allow it to
converge to the right value</li>
<li><strong>qNetworkFast</strong> (<em>neural network instance</em>) – This is the instance of the faster network that can be used for training Q-learning
algorithm. This is the main network that implements the Bellman equation.</li>
<li><strong>numActions</strong> (<em>int</em>) – The number of discrete actions that the current environment can accept.</li>
<li><strong>gamma</strong> (<em>float</em>) – The discount factor. currently not used</li>
<li><strong>device</strong> (<em>str</em><em>, </em><em>optional</em>) – the device where you want to run your algorithm, by default ‘cpu’. If you want to run
the optimization of a particular GPU, you may specify that. For example with ‘cuda:0’</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Raises:</th><td class="field-body"><p class="first last"><code class="xref py py-exc docutils literal notranslate"><span class="pre">type</span></code> – [description]</p>
</td>
</tr>
</tbody>
</table>
<dl class="method">
<dt id="lib.agents.Agent_DQN.Agent_DQN.checkTrainingMode">
<code class="descname">checkTrainingMode</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="reference internal" href="_modules/lib/agents/Agent_DQN.html#Agent_DQN.checkTrainingMode"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#lib.agents.Agent_DQN.Agent_DQN.checkTrainingMode" title="Permalink to this definition"></a></dt>
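The qNetworkSlow/qNetworkFast description above follows the usual DQN pattern: the fast network is trained against Bellman targets computed with the slow (target) network, and the slow network's weights are then nudged toward the fast network's. The sketch below is a generic PyTorch illustration of that update, not the repository's implementation; the function name dqnUpdate, the batch layout, and the gamma and tau values are assumptions.

    # Generic DQN update sketch; not code from this commit.
    import torch
    import torch.nn.functional as F

    def dqnUpdate(qNetworkFast, qNetworkSlow, optimizer, batch, gamma=0.99, tau=1e-3):
        # batch is assumed to hold column tensors: states (N, stateSize),
        # actions (N, 1) long, rewards (N, 1), nextStates (N, stateSize), dones (N, 1)
        states, actions, rewards, nextStates, dones = batch

        # Bellman target: r + gamma * max_a Q_slow(s', a), cut off at terminal states.
        with torch.no_grad():
            qNext    = qNetworkSlow(nextStates).max(dim=1, keepdim=True)[0]
            qTargets = rewards + gamma * qNext * (1 - dones)

        # Q-values the fast network currently assigns to the actions that were taken.
        qExpected = qNetworkFast(states).gather(1, actions)

        loss = F.mse_loss(qExpected, qTargets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Soft update: move the slow network's parameters a small step toward the fast one's.
        for slowParam, fastParam in zip(qNetworkSlow.parameters(), qNetworkFast.parameters()):
            slowParam.data.copy_(tau * fastParam.data + (1.0 - tau) * slowParam.data)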
