
Documentation updates

lepisma committed Feb 26, 2018
1 parent 63e89d0 commit 5684b44e2f66827326493c35ca9666d95d02a004
Binary file not shown.
BIN +29.3 KB (1400%) docs/build/doctrees/usage.doctree
@@ -1,4 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: 09e68fdd25639ff87cbe404713fd80d9
config: 5d2bee81836ce809ffce0b0d913cf158
tags: 645f666f9bcd5a90fca523b33c5a78b7

@@ -0,0 +1,93 @@

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Overview: module code &#8212; diffport 0.2.3 documentation</title>
<link rel="stylesheet" href="../_static/alabaster.css" type="text/css" />
<link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
<script type="text/javascript">
var DOCUMENTATION_OPTIONS = {
URL_ROOT: '../',
VERSION: '0.2.3',
COLLAPSE_INDEX: false,
FILE_SUFFIX: '.html',
HAS_SOURCE: true,
SOURCELINK_SUFFIX: '.txt'
};
</script>
<script type="text/javascript" src="../_static/jquery.js"></script>
<script type="text/javascript" src="../_static/underscore.js"></script>
<script type="text/javascript" src="../_static/doctools.js"></script>
<link rel="index" title="Index" href="../genindex.html" />
<link rel="search" title="Search" href="../search.html" />

<link rel="stylesheet" href="../_static/custom.css" type="text/css" />


<meta name="viewport" content="width=device-width, initial-scale=0.9, maximum-scale=0.9" />

</head>
<body>


<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body" role="main">

<h1>All modules for which code is available</h1>
<ul><li><a href="diffport/watchers.html">diffport.watchers</a></li>
</ul>

</div>
</div>
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<h1 class="logo"><a href="../index.html">diffport</a></h1>








<h3>Navigation</h3>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../usage.html">Usage</a></li>
<li class="toctree-l1"><a class="reference internal" href="../development.html">Development</a></li>
</ul>


<div id="searchbox" style="display: none" role="search">
<h3>Quick search</h3>
<form class="search" action="../search.html" method="get">
<div><input type="text" name="q" /></div>
<div><input type="submit" value="Go" /></div>
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
<script type="text/javascript">$('#searchbox').show(0);</script>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="footer">
&copy;2018, Reich Lab.

|
Powered by <a href="http://sphinx-doc.org/">Sphinx 1.6.5</a>
&amp; <a href="https://github.com/bitprophet/alabaster">Alabaster 0.7.10</a>

</div>




</body>
</html>
@@ -0,0 +1,112 @@
.. _development:

Development
===========

Diffport consists of the following units:

1. *Command line interface*. The code for this is in the file ``cli.py``.
2. *Core* module, which reads the config and delegates tasks to watchers. This
   is in the file ``core.py``.
3. *Store* is an abstraction over the area where diffport saves snapshots. It
   is defined in ``store.py``. Adding a new store means adding another class
   inheriting from the ``Store`` abstract class. As an example, see the class
   ``StoreDirectory``, which keeps snapshots in a directory.
4. *DB connection*. A few functions related to the database connection are in
   ``connection.py``.
5. *Watchers*. The actual watchers are defined in ``watchers.py``, along with
   their report templates in ``templates.py``. We dissect watchers in more
   detail below.

Watchers
--------

A watcher is defined by a set of functions grouped together (as static
methods) in a class. We don't maintain any state in a watcher; the class
structure is there only to modularize the functionality. The abstract class
``Watcher`` enforces the following structure on these methods:

.. autoclass:: diffport.watchers.Watcher
   :members:
   :private-members:
   :special-members:

After reading the main ``config.yaml`` file, the ``core`` module of diffport
invokes each configured watcher to take a snapshot, passing it a ``db`` object
(which is a `dataset <http://dataset.readthedocs.io/>`_ instance) and that
watcher's config as read from the yaml file.
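
As a rough illustration of the contract described above, the following sketch
shows an abstract class with three static methods, a toy subclass, and a loop
that calls each watcher with a ``db`` object and its config. The method
signatures, the ``Echo`` watcher, and the ``take_all_snapshots`` helper are
assumptions made for this example, not diffport's actual code; the autodoc
output for ``diffport.watchers.Watcher`` is the reference for the real
interface.

```python
from abc import ABC, abstractmethod


class Watcher(ABC):
    """Illustrative stand-in for the Watcher base class (signatures assumed)."""

    @staticmethod
    @abstractmethod
    def take_snapshot(db, config):
        """Return a serializable dict describing the current state."""

    @staticmethod
    @abstractmethod
    def diff(old, new):
        """Return an object describing what changed between two snapshots."""

    @staticmethod
    @abstractmethod
    def report(diff):
        """Return a string report for a diff object."""


class Echo(Watcher):
    """Hypothetical toy watcher: its snapshot data is simply its config."""

    @staticmethod
    def take_snapshot(db, config):
        return {"config": config, "data": list(config)}

    @staticmethod
    def diff(old, new):
        return {"config": new["config"], "changed": old["data"] != new["data"]}

    @staticmethod
    def report(diff):
        return "changed" if diff["changed"] else "no change"


def take_all_snapshots(db, watchers, configs):
    """Hypothetical core loop: one snapshot per watcher, keyed by class name."""
    return {w.__name__: w.take_snapshot(db, configs[w.__name__]) for w in watchers}
```

Because every method is static, the loop calls them on the class itself and
never creates a watcher instance, which matches the stateless design described
above.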

Any new watcher needs to implement a new class with internally consistent
methods (meaning that its ``diff`` method accepts the output from its own
``take_snapshot`` method). In what follows, we describe the general structure of
these methods using the example of a simple watcher ``SchemaTables`` with the
following config passed in::

    # Input config to SchemaTables
    config = ["raw_tables", "processed_tables"]

``take_snapshot``
~~~~~~~~~~~~~~~~~

Snapshot output from a watcher is expected to be a *serializable* dictionary
object. Although not required, it is useful to include the *config* used for
taking the snapshot in the output so that the diffing function can run quick
checks or use some metadata from it. As an example of a snapshot returned from
a watcher, here is the output from our ``SchemaTables`` example::

    # Output snapshot
    {
      "config": ["raw_tables", "processed_tables"],
      "data": [("raw_tables", ["table_one_raw", "table_two_raw"]),
               ("processed_tables", ["the_only_processed_table"])]
    }

Diffport core will now save this snapshot in its store along with the
snapshots collected from other watchers.
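
The sketch below shows one way a snapshot of this shape could be built. It is
not the ``SchemaTables`` implementation: ``FAKE_DB`` and ``list_tables`` are
hypothetical stand-ins for a real database lookup, and the JSON round trip at
the end assumes a JSON-like serialization format, which the text above does not
specify.

```python
import json

# Hypothetical stand-in for a database: schema name -> table names.
FAKE_DB = {
    "raw_tables": ["table_two_raw", "table_one_raw"],
    "processed_tables": ["the_only_processed_table"],
}


def list_tables(db, schema):
    """Hypothetical helper; a real watcher would query the database here."""
    return sorted(db.get(schema, []))


def take_snapshot(db, config):
    """Build a snapshot dict that carries both the config and the data."""
    return {
        "config": config,
        "data": [(schema, list_tables(db, schema)) for schema in config],
    }


snapshot = take_snapshot(FAKE_DB, ["raw_tables", "processed_tables"])

# Round-trip through JSON to check that the snapshot is serializable.
restored = json.loads(json.dumps(snapshot))
```

Note that JSON has no tuple type, so under this assumption the
``(schema, tables)`` pairs come back as two-element lists. A ``diff`` method
that reads stored snapshots should therefore accept either form.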

``diff``
~~~~~~~~

The ``diff`` method of a watcher takes in two snapshots generated by its own
``take_snapshot`` method and returns an object representing the difference
between them. As an example, suppose our ``SchemaTables`` watcher saved the
following two snapshots at two different points in time::

    # Snapshot old
    old = {
      "config": ["raw_tables", "processed_tables"],
      "data": [("raw_tables", ["table_one_raw", "table_two_raw"]),
               ("processed_tables", ["the_only_processed_table"])]
    }

    # Snapshot new
    new = {
      "config": ["raw_tables", "processed_tables"],
      "data": [("raw_tables", ["table_one_raw", "table_two_raw", "table_three_raw"]),
               ("processed_tables", [])]
    }

After finding the difference, the ``diff`` method might return a diff object
like so (the current implementation actually does return this)::

    # Diff output
    {
      "config": ["raw_tables", "processed_tables"],
      "data": [["raw_tables", { "removed": [], "added": ["table_three_raw"] }],
               ["processed_tables", { "removed": ["the_only_processed_table"], "added": [] }]]
    }

Notice that we also pass along the config. This is not required for this
watcher, but some watchers (like ``NumberOfRows``) use some information from
config to generate the final report.
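
A minimal sketch of a ``diff`` function that produces the output shown above
from the two example snapshots follows. It is written for illustration and is
not copied from ``watchers.py``; in particular, iterating over the new
snapshot's config and treating a schema missing from one snapshot as empty are
assumptions.

```python
def diff(old, new):
    """Compare two SchemaTables-style snapshots, schema by schema."""
    # dict() accepts both tuples and two-element lists as key/value pairs.
    old_tables = dict(old["data"])
    new_tables = dict(new["data"])

    data = []
    for schema in new["config"]:
        before = old_tables.get(schema, [])
        after = new_tables.get(schema, [])
        data.append([schema, {
            "removed": [t for t in before if t not in after],
            "added": [t for t in after if t not in before],
        }])
    return {"config": new["config"], "data": data}


old = {
    "config": ["raw_tables", "processed_tables"],
    "data": [("raw_tables", ["table_one_raw", "table_two_raw"]),
             ("processed_tables", ["the_only_processed_table"])],
}
new = {
    "config": ["raw_tables", "processed_tables"],
    "data": [("raw_tables", ["table_one_raw", "table_two_raw", "table_three_raw"]),
             ("processed_tables", [])],
}

result = diff(old, new)
```

List comprehensions are used instead of set differences so that table names
keep the order in which they appear in the snapshots.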

``report``
~~~~~~~~~~

After a diff is calculated, the ``report`` function generates a string report
for the diff. The reports from all the enabled watchers are concatenated and
returned as the final report by diffport. To generate their own chunk of the
diff report, watchers rely on jinja2 templates present in ``templates.py``. The
expected format of a template is markdown, since it's easy to maintain and can
be converted to other formats easily using tools like `pandoc
<http://pandoc.org/>`_.
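
To give a feel for what a ``report`` function returns, here is a sketch that
renders the ``SchemaTables`` diff from the previous section as markdown. To
stay dependency-free it builds the string with plain Python instead of a jinja2
template, so it is a substitute technique rather than what ``templates.py``
does; the heading text and bullet layout are also invented for the example.

```python
def report(diff):
    """Render a SchemaTables-style diff object as a markdown string."""
    lines = ["# Schema tables", ""]
    for schema, changes in diff["data"]:
        # Skip schemas where nothing was added or removed.
        if not changes["added"] and not changes["removed"]:
            continue
        lines.append(f"## {schema}")
        lines.extend(f"- added `{table}`" for table in changes["added"])
        lines.extend(f"- removed `{table}`" for table in changes["removed"])
        lines.append("")
    return "\n".join(lines)


example_diff = {
    "config": ["raw_tables", "processed_tables"],
    "data": [
        ["raw_tables", {"removed": [], "added": ["table_three_raw"]}],
        ["processed_tables", {"removed": ["the_only_processed_table"], "added": []}],
    ],
}

text = report(example_diff)
```

Since each watcher returns a plain string, the final report can be assembled by
joining the per-watcher strings in order.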