Merge pull request #60 from scrapinghub/switch_versioning
Switch versioning
manycoding committed Apr 13, 2019
2 parents 0b2335e + deb1a8d commit 401063e
Showing 107 changed files with 4,085 additions and 2,000 deletions.
2 changes: 1 addition & 1 deletion .bumpversion.cfg
@@ -1,5 +1,5 @@
[bumpversion]
current_version = 2019.03.25
current_version = 0.3.0
commit = True
tag = True
parse = (?P<major>\d+).(?P<minor>\d+).(?P<patch>\d+)
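A minimal sketch of how the `parse` pattern above splits a version string, assuming it is evaluated with Python `re` semantics. The helper name `parse_version` is hypothetical and is not part of bumpversion or Arche. Note that the dots in the pattern are unescaped, so each one matches any single character, not only a literal `.`.

```python
import re

# Pattern copied verbatim from the `parse` line in .bumpversion.cfg.
PARSE = re.compile(r"(?P<major>\d+).(?P<minor>\d+).(?P<patch>\d+)")


def parse_version(text):
    """Hypothetical helper: return the named parts of a version, or None."""
    match = PARSE.fullmatch(text)
    return match.groupdict() if match else None


# Both the old date-based value and the new semantic value fit the pattern.
print(parse_version("2019.03.25"))
print(parse_version("0.3.0"))

# Because "." is unescaped, other separators are accepted as well.
print(parse_version("1x2y3"))
```

If only literal dots should be accepted, the separators would need to be written as `\.` in the pattern.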
9 changes: 6 additions & 3 deletions CHANGES.md
@@ -9,10 +9,13 @@ Note that the top-most release is changes in the unreleased master branch on Git

[Keep a Changelog](https://keepachangelog.com/en/1.0.0/), [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]
## [0.4.0.dev] (Work In Progress)


## [0.3.0] (2019-04-12)

### Fixed
- Big notebook size, replaced cufflinks with plotly, #39
- Big notebook size, replaced cufflinks with plotly and ipython, #39

### Changed
- *Fields Coverage* now is printed as a bar plot, #9
@@ -26,7 +29,7 @@ Note that the top-most release is changes in the unreleased master branch on Git

### Removed
- `cufflinks` dependency

- Deprecated `category_field` tag


## [2019.03.25]
2 changes: 1 addition & 1 deletion docs/.buildinfo
@@ -1,4 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: 93bc4220b6963adfbd692d21c6631cd2
config: 43d05a8956c22fbc8d7f999aef2282d6
tags: 645f666f9bcd5a90fca523b33c5a78b7
Binary file modified docs/.doctrees/api/arche.arche.doctree
Binary file not shown.
Binary file modified docs/.doctrees/api/arche.data_quality_report.doctree
Binary file not shown.
Binary file modified docs/.doctrees/api/arche.doctree
Binary file not shown.
Binary file modified docs/.doctrees/api/arche.readers.items.doctree
Binary file not shown.
Binary file modified docs/.doctrees/api/arche.readers.schema.doctree
Binary file not shown.
Binary file modified docs/.doctrees/api/arche.report.doctree
Binary file not shown.
Binary file added docs/.doctrees/api/arche.rules.category.doctree
Binary file not shown.
Binary file modified docs/.doctrees/api/arche.rules.category_coverage.doctree
Binary file not shown.
Binary file modified docs/.doctrees/api/arche.rules.coverage.doctree
Binary file not shown.
Binary file modified docs/.doctrees/api/arche.rules.doctree
Binary file not shown.
Binary file modified docs/.doctrees/api/arche.rules.duplicates.doctree
Binary file not shown.
Binary file modified docs/.doctrees/api/arche.rules.garbage_symbols.doctree
Binary file not shown.
Binary file modified docs/.doctrees/api/arche.rules.json_schema.doctree
Binary file not shown.
Binary file modified docs/.doctrees/api/arche.rules.other_rules.doctree
Binary file not shown.
Binary file added docs/.doctrees/api/arche.rules.others.doctree
Binary file not shown.
Binary file modified docs/.doctrees/api/arche.rules.price.doctree
Binary file not shown.
Binary file modified docs/.doctrees/api/arche.rules.result.doctree
Binary file not shown.
Binary file modified docs/.doctrees/api/arche.tools.helpers.doctree
Binary file not shown.
Binary file modified docs/.doctrees/api/arche.tools.json_schema_validator.doctree
Binary file not shown.
Binary file modified docs/.doctrees/api/arche.tools.schema.doctree
Binary file not shown.
Binary file modified docs/.doctrees/index.doctree
Binary file not shown.
Binary file modified docs/.doctrees/nbs/API.doctree
Binary file not shown.
Binary file added docs/.doctrees/nbs/DQR.doctree
Binary file not shown.
Binary file modified docs/.doctrees/nbs/basics.doctree
Binary file not shown.
Binary file modified docs/.doctrees/nbs/compare.doctree
Binary file not shown.
Binary file added docs/.doctrees/nbs/in-short.doctree
Binary file not shown.
Binary file modified docs/.doctrees/nbs/notebooks.doctree
Binary file not shown.
22 changes: 9 additions & 13 deletions docs/Handling-items-of-different-types.html
@@ -1,12 +1,10 @@

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!DOCTYPE html>

<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
<head>
<meta http-equiv="X-UA-Compatible" content="IE=Edge" />
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Handling items of different types &#8212; Arche 2019.03.18 documentation</title>
<meta charset="utf-8" />
<title>Handling items of different types &#8212; Arche 2019.03.25 documentation</title>
<link rel="stylesheet" href="_static/alabaster.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<script type="text/javascript" id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
@@ -38,14 +36,14 @@
<h1>Handling items of different types<a class="headerlink" href="#handling-items-of-different-types" title="Permalink to this headline"></a></h1>
<p>A data source could contain items of different types, which can be handled with some preparation:</p>
<ul class="simple">
<li>Create one schema per item type choosing homogeneous items <code class="docutils literal notranslate"><span class="pre">basic_json_schema('job_key',</span> <span class="pre">items_number=[0,1)</span></code></li>
<li>Pass a filter argument to choose items of one type - <code class="docutils literal notranslate"><span class="pre">Arche(source='000000/000/0',</span> <span class="pre">schema=schema,</span> <span class="pre">filter=[(&quot;_type&quot;,</span> <span class="pre">&quot;=&quot;,</span> <span class="pre">[&quot;ItemType&quot;])])</span></code></li>
<li>Repeat the previous steps for each item type</li>
<li><p>Create one schema per item type choosing homogeneous items <code class="docutils literal notranslate"><span class="pre">basic_json_schema('job_key',</span> <span class="pre">items_number=[0,1)</span></code></p></li>
<li><p>Pass a filter argument to choose items of one type - <code class="docutils literal notranslate"><span class="pre">Arche(source='000000/000/0',</span> <span class="pre">schema=schema,</span> <span class="pre">filter=[(&quot;_type&quot;,</span> <span class="pre">&quot;=&quot;,</span> <span class="pre">[&quot;ItemType&quot;])])</span></code></p></li>
<li><p>Repeat the previous steps for each item type</p></li>
</ul>
<p>The library also supports other API arguments, such as <code class="docutils literal notranslate"><span class="pre">count,</span> <span class="pre">start</span></code>. The complete list of arguments is available in Scrapinghub Python API documentation:</p>
<ul class="simple">
<li>https://python-scrapinghub.readthedocs.io/en/latest/client/apidocs.html#scrapinghub.client.items.Items.iter</li>
<li>https://python-scrapinghub.readthedocs.io/en/latest/client/apidocs.html#scrapinghub.client.collections.Collection.iter</li>
<li><p>https://python-scrapinghub.readthedocs.io/en/latest/client/apidocs.html#scrapinghub.client.items.Items.iter</p></li>
<li><p>https://python-scrapinghub.readthedocs.io/en/latest/client/apidocs.html#scrapinghub.client.collections.Collection.iter</p></li>
</ul>
</div>

@@ -91,8 +89,6 @@ <h3>Quick search</h3>
<form class="search" action="search.html" method="get">
<input type="text" name="q" />
<input type="submit" value="Go" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div>
@@ -113,7 +109,7 @@ <h3>Quick search</h3>
&copy;2018-2019, Arche developers.

|
Powered by <a href="http://sphinx-doc.org/">Sphinx 1.8.5</a>
Powered by <a href="http://sphinx-doc.org/">Sphinx 2.0.0</a>
&amp; <a href="https://github.com/bitprophet/alabaster">Alabaster 0.7.12</a>

|
28 changes: 12 additions & 16 deletions docs/Quickstart.html
@@ -1,12 +1,10 @@

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!DOCTYPE html>

<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
<head>
<meta http-equiv="X-UA-Compatible" content="IE=Edge" />
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Quickstart &#8212; Arche 2019.03.18 documentation</title>
<meta charset="utf-8" />
<title>Quickstart &#8212; Arche 2019.03.25 documentation</title>
<link rel="stylesheet" href="_static/alabaster.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<script type="text/javascript" id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
@@ -39,11 +37,11 @@ <h1>Quickstart<a class="headerlink" href="#quickstart" title="Permalink to this
<div class="section" id="basic-usage">
<h2>Basic Usage<a class="headerlink" href="#basic-usage" title="Permalink to this headline"></a></h2>
<ul class="simple">
<li>make sure required <a class="reference external" href="#environment-variables">environment variables</a> are set</li>
<li>create a schema with <code class="docutils literal notranslate"><span class="pre">arche.basic_json_schema(job_key)</span></code></li>
<li>create an Arche instance <code class="docutils literal notranslate"><span class="pre">g</span> <span class="pre">=</span> <span class="pre">Arche(source=job_key,</span> <span class="pre">schema=temp_schema)</span></code></li>
<li>run and report all tests with <code class="docutils literal notranslate"><span class="pre">g.report_all()</span></code></li>
<li>run DQR with <code class="docutils literal notranslate"><span class="pre">g.data_quality_report()</span></code></li>
<li><p>make sure required <a class="reference external" href="#environment-variables">environment variables</a> are set</p></li>
<li><p>create a schema with <code class="docutils literal notranslate"><span class="pre">arche.basic_json_schema(job_key)</span></code></p></li>
<li><p>create an Arche instance <code class="docutils literal notranslate"><span class="pre">g</span> <span class="pre">=</span> <span class="pre">Arche(source=job_key,</span> <span class="pre">schema=temp_schema)</span></code></p></li>
<li><p>run and report all tests with <code class="docutils literal notranslate"><span class="pre">g.report_all()</span></code></p></li>
<li><p>run DQR with <code class="docutils literal notranslate"><span class="pre">g.data_quality_report()</span></code></p></li>
</ul>
<p><code class="docutils literal notranslate"><span class="pre">job_key</span></code> can be either a usual job key, e.g. <code class="docutils literal notranslate"><span class="pre">000001/1/1</span></code>, or a collection key - <code class="docutils literal notranslate"><span class="pre">00001/collections/s/reviews</span></code></p>
<p><code class="docutils literal notranslate"><span class="pre">schema</span></code> argument accepts either a dict or a s3 bucket link to a schema.</p>
@@ -64,12 +62,12 @@ <h2>Schema Validation<a class="headerlink" href="#schema-validation" title="Perm
<h2>Environment variables<a class="headerlink" href="#environment-variables" title="Permalink to this headline"></a></h2>
<p>Next env variables are required:</p>
<ul class="simple">
<li>SH_APIKEY - This key should have read permissions for the project you want to get items from.</li>
<li><p>SH_APIKEY - This key should have read permissions for the project you want to get items from.</p></li>
</ul>
<p>If you also wish access your schemas from S3, set AWS credentials</p>
<ul class="simple">
<li>AWS_ACCESS_KEY_ID</li>
<li>AWS_SECRET_ACCESS_KEY</li>
<li><p>AWS_ACCESS_KEY_ID</p></li>
<li><p>AWS_SECRET_ACCESS_KEY</p></li>
</ul>
</div>
</div>
@@ -121,8 +119,6 @@ <h3>Quick search</h3>
<form class="search" action="search.html" method="get">
<input type="text" name="q" />
<input type="submit" value="Go" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div>
@@ -143,7 +139,7 @@ <h3>Quick search</h3>
&copy;2018-2019, Arche developers.

|
Powered by <a href="http://sphinx-doc.org/">Sphinx 1.8.5</a>
Powered by <a href="http://sphinx-doc.org/">Sphinx 2.0.0</a>
&amp; <a href="https://github.com/bitprophet/alabaster">Alabaster 0.7.12</a>

|
28 changes: 12 additions & 16 deletions docs/Strategy.html
@@ -1,12 +1,10 @@

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!DOCTYPE html>

<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
<head>
<meta http-equiv="X-UA-Compatible" content="IE=Edge" />
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Strategy &#8212; Arche 2019.03.18 documentation</title>
<meta charset="utf-8" />
<title>Strategy &#8212; Arche 2019.03.25 documentation</title>
<link rel="stylesheet" href="_static/alabaster.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<script type="text/javascript" id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
@@ -38,8 +36,8 @@
<h1>Strategy<a class="headerlink" href="#strategy" title="Permalink to this headline"></a></h1>
<p>The purpose of this page is to highlight the key points which shape the library, targeting both:</p>
<ol class="simple">
<li>Stakeholders (e.g. QA teams and developers), to help making it clear why things are done in such way</li>
<li>Maintainers - to not deviate from the main purpose</li>
<li><p>Stakeholders (e.g. QA teams and developers), to help making it clear why things are done in such way</p></li>
<li><p>Maintainers - to not deviate from the main purpose</p></li>
</ol>
<p>While the library is an ongoing job and a lot has been changed the underlying principles should stay more or less the same.</p>
<p>By no means the information listed here should limit any work for the sake of the rules or processes. Instead it should be an additional instrument along with stakeholders’ needs to evolve the framework.</p>
@@ -57,12 +55,12 @@ <h2>Short-term<a class="headerlink" href="#short-term" title="Permalink to this
<h2>Long-term<a class="headerlink" href="#long-term" title="Permalink to this headline"></a></h2>
<p>6 months and more</p>
<ul class="simple">
<li><strong>Choose tools wisely.</strong> Any libraries and even the language should be chosen because it’s one of the most suitable tools for a given task keeping in mind the following points, not just because it works.</li>
<li><strong>Add unit tests.</strong> They are cheap and bring efficient results. All new features should be covered, old features should be covered and refactored if needed, in other words test coverage should strive for 100%.</li>
<li><strong>Manage data of any sane size with size growth in mind.</strong> I.e. the framework should allow to test required data in a sensible time.</li>
<li><strong>Support the most common input data sources.</strong> A good reference is <a class="reference external" href="https://pandas.pydata.org/">pandas</a> which allows a wide number of way to read and transform data to DataFrame with simply API <code class="docutils literal notranslate"><span class="pre">pd.DataFrame()</span></code>.</li>
<li><strong>UX. The framework is used by people, so it should be simple.</strong> Take pains to keep it backwards compatible with at least the previous version, deprecate code noticeably, keep it simple. Another point is integration, for example in Scrapy Cloud.</li>
<li><strong>Do not depend on custom rules, e.g. JSON schema.</strong> Data should be enough by itself to verify it, custom rules should clarify verification and be optional.</li>
<li><p><strong>Choose tools wisely.</strong> Any libraries and even the language should be chosen because it’s one of the most suitable tools for a given task keeping in mind the following points, not just because it works.</p></li>
<li><p><strong>Add unit tests.</strong> They are cheap and bring efficient results. All new features should be covered, old features should be covered and refactored if needed, in other words test coverage should strive for 100%.</p></li>
<li><p><strong>Manage data of any sane size with size growth in mind.</strong> I.e. the framework should allow to test required data in a sensible time.</p></li>
<li><p><strong>Support the most common input data sources.</strong> A good reference is <a class="reference external" href="https://pandas.pydata.org/">pandas</a> which allows a wide number of way to read and transform data to DataFrame with simply API <code class="docutils literal notranslate"><span class="pre">pd.DataFrame()</span></code>.</p></li>
<li><p><strong>UX. The framework is used by people, so it should be simple.</strong> Take pains to keep it backwards compatible with at least the previous version, deprecate code noticeably, keep it simple. Another point is integration, for example in Scrapy Cloud.</p></li>
<li><p><strong>Do not depend on custom rules, e.g. JSON schema.</strong> Data should be enough by itself to verify it, custom rules should clarify verification and be optional.</p></li>
</ul>
</div>
</div>
@@ -114,8 +112,6 @@ <h3>Quick search</h3>
<form class="search" action="search.html" method="get">
<input type="text" name="q" />
<input type="submit" value="Go" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div>
@@ -136,7 +132,7 @@ <h3>Quick search</h3>
&copy;2018-2019, Arche developers.

|
Powered by <a href="http://sphinx-doc.org/">Sphinx 1.8.5</a>
Powered by <a href="http://sphinx-doc.org/">Sphinx 2.0.0</a>
&amp; <a href="https://github.com/bitprophet/alabaster">Alabaster 0.7.12</a>

|
7 changes: 7 additions & 0 deletions docs/_sources/api/arche.rules.category.rst.txt
@@ -0,0 +1,7 @@
arche.rules.category module
===========================

.. automodule:: arche.rules.category
:members:
:undoc-members:
:show-inheritance:
7 changes: 7 additions & 0 deletions docs/_sources/api/arche.rules.others.rst.txt
@@ -0,0 +1,7 @@
arche.rules.others module
=========================

.. automodule:: arche.rules.others
:members:
:undoc-members:
:show-inheritance:
5 changes: 2 additions & 3 deletions docs/_sources/api/arche.rules.rst.txt
@@ -11,13 +11,12 @@ Submodules

.. toctree::

arche.rules.category_coverage
arche.rules.category
arche.rules.coverage
arche.rules.duplicates
arche.rules.garbage_symbols
arche.rules.json_schema
arche.rules.metadata
arche.rules.other_rules
arche.rules.others
arche.rules.price
arche.rules.result

44 changes: 4 additions & 40 deletions docs/_sources/nbs/API.ipynb.txt

Large diffs are not rendered by default.
