Skip to content

Commit

Permalink
Documentation, related readability refactors and Task caching
Browse files Browse the repository at this point in the history
  • Loading branch information
mikicz committed Mar 23, 2018
1 parent 29a048b commit 0c75c5e
Show file tree
Hide file tree
Showing 27 changed files with 1,652 additions and 491 deletions.
4 changes: 4 additions & 0 deletions .readthedocs.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
build:
image: latest
python:
version: 3.6
281 changes: 51 additions & 230 deletions README.rst
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
arca
Arca
====

.. image:: https://img.shields.io/travis/mikicz/arca.svg
Expand All @@ -13,270 +13,90 @@ arca
.. image:: https://img.shields.io/github/license/mikicz/arca.svg?style=flat
:target: https://github.com/mikicz/arca/blob/master/LICENSE

.. image:: https://img.shields.io/readthedocs/arca.svg
:target: https://arca.readthedocs.io/

Arca is a library for running Python scripts from git repositories in various states of isolation.
Arca can also cache the results of these scripts using `dogpile.cache <https://dogpilecache.readthedocs.io/en/latest/>`_.

Getting started
***************

Requirements
++++++++++++

* Python >= 3.6
* Internet connection


Optional requirements for certain backends, they need to be runnable by the user using the library:

* Docker
* Vagrant

Optional for some caching backends:

* Redis
* Memcached

Hello World
+++++++++++

To run a Hello World you'll only need the ``arca.Arca`` and ``arca.Task`` classes.
``Task`` is used for defining the task that's supposed to be run in the repositories.
``Arca`` takes care of all the settings and provides the basic API for running tasks.

Lets presume that we have a repository ``https://example.com/hello_word.git`` with branch ``master`` that contains a
file ``hello_world.py`` with function ``say_hello``. In that case you would use the following example to launch the function
in a isolated environment:

.. code-block:: python
from arca import Arca, Task
task = Task("hello_world:say_hello")
arca = Arca()
Glossary
++++++++

result = arca.run("https://example.com/hello_word.git", "master", task)
* **Arca** - name of the library. When written as ``Arca``, the main interface class is being referenced.
* **Task** - definition of the script, consists of a reference to a callable object and arguments.
* **Backend** - a way of running tasks.

Result will be a ``arca.Result`` instance which currently only has one attribute ``output`` with the output of the function call.
Should running of the task fail, ``arca.exceptions.BuildError`` will be raised.
Installation
++++++++++++

By default, the ``CurrentEnvironmentBackend`` is used to run tasks, which uses the python installation which is used to run
arca, launching the code in a subprocess. You can learn about using different backends bellow.
Requirements
------------

Configuring arca
****************
* Python >= 3.6

There are three ways to configure ``Arca``.
Requirements for certain backends:

1. You can initialize the class and backends directly and set it's options via constructor arguments.
For example changing the base directory where repositories are cloned and other arca-related things are saved you would call Arca like this:
* `Docker <https://www.docker.com/>`_ (for `Docker Backend <https://arca.readthedocs.io/en/latest/backends.html#docker>`_
and `Vagrant Backend <https://arca.readthedocs.io/en/latest/backends.html#vagrant>`_)
* `Vagrant <https://www.vagrantup.com/>`_ (for the `Vagrant Backend <https://arca.readthedocs.io/en/latest/backends.html#vagrant>`_)

.. code-block:: python
Installation
------------

arca = Arca(base_dir=".custom_arca_dir")
To install the last stable version:

This option is the most direct but it has one caveat - options set by this method cannot be overidden by the following methods.
.. code-block:: bash
2. You can pass a dict with settings. The keys are always uppercase and prefixed with ``ARCA_``.
For example the same setting as above would be written as:
python -m pip install arca
.. code-block:: python
Or if you wish to install the upstream version:

arca = Arca(settings={
"ARCA_BASE_DIR": ".custom_arca_dir"
})
.. code-block:: bash
3. You can configure ``Arca`` with environ variables, with keys being the same as in the second method.
Environ variables override settings from the second method.
python -m pip install git+https://github.com/mikicz/arca.git#egg=arca
You can combine these methods as long as you remember that options explicitly specified in constructors
cannot be overridden by the settings and environ methods.
Example
+++++++

Backends
********
To run a Hello World example you'll only need the ``arca.Arca`` and ``arca.Task`` classes.
``Task`` is used for defining the task that's supposed to be run in the repositories.
``Arca`` takes care of all the settings and provides the basic API for running the tasks.

There are currently four different backends. They can also be initialized in few different ways, consistent with general settings.
You can use the ``ARCA_BACKEND`` setting or you can pass a ``backend`` keyword directly to ``Arca``.
This setting can be either a string, class or an instance. String is used to load the class and initialize the instance,
class is used to initialize and a instance is used as is. All the initializations shown bellow are equivalent, but again,
as mentioned above, the straight ``backend`` keyword cannot be overridden by settings or environ variables.
Let's day we have the following file, called ``hello_world.py``,
in a repository ``https://example.com/hello_word.git``, on branch ``master``.

.. code-block:: python
from arca import Arca, DockerBackend
def say_hello():
return "Hello World!"
Arca(settings={"ARCA_BACKEND": "arca.backend.DockerBackend"})
Arca(settings={"ARCA_BACKEND": DockerBackend})
Arca(backend="arca.backend.DockerBackend")
Arca(backend=DockerBackend)
Arca(backend=DockerBackend())
Setting up backends is based on the same principle as setting up ``Arca``. You can either pass keyword arguments when initializing class
or you can use settings with the prefix ``ARCA_BACKEND_``. For example these two calls are equivalent:
To call the function using Arca, the following example would do so:

.. code-block:: python
from arca import Arca, DockerBackend
Arca(settings={
"ARCA_BACKEND": "arca.backend.DockerBackend",
"ARCA_BACKEND_PYTHON_VERSION": "3.6.4"
})
Arca(backend=DockerBackend(python_version="3.6.4"))
There are two options common for all backends:

* **requirements_location**: Tells backends where they should look for requirements in the repositories, the default is ``requirements.txt``.
* **cwd**: Tells backends in which folder they should launch the code, default is the root folder of the repository.

Current Environment
+++++++++++++++++++

*arca.backend.CurrentEnvironmentBackend*

This backend is the default option, it runs the tasks with the same Python that's used to run arca in a subprocess.
There are two settings for this backend, to determine how the backend should treat requirements in the repositories.

* **current_environment_requirements**: a path to the requirements of the current environment, the default is ``requirements.txt``.
``None`` would indicate there are no requirements for the current environment.
* **requirements_strategy**: Which approach the backend should take. There are three, the default being ``raise``.

Requirements strategies:

* ``raise``: Raise an ``arca.exceptions.RequirementsMismatch`` if there are any extra requirements in the target repository.
* ``ignore``: Ignore any extra requirements.
* ``install_extra``: Install the requirements that are extra in the target repository as opposed to the current environment.

Virtual Environment
+++++++++++++++++++

*arca.backend.VenvBackend*

This backend uses the Python virtual environments to run the tasks. The environments are created from the Python
used to run Arca and they are shared between repositories that have the same exact requirement file.
The virtual environments are stored in folder ``venv`` in folder determined by the ``Arca`` ``base_dir`` setting, usually ``.arca``.

Docker
++++++

*arca.backend.DockerBackend*

This backend runs tasks in docker containers. To use this backend the user running Arca needs to be able to interact
with ``docker`` (see `documentation <https://docs.docker.com/install/linux/linux-postinstall/>`_).

This backend firstly creates an image with requirements and dependencies installed so the installation only runs one.
By default the images are based on `custom images <https://hub.docker.com/r/mikicz/arca/tags/>`_, which have Python
and several build tools pre-installed.
These images are based on ``alpine`` and use `pyenv <https://github.com/pyenv/pyenv>`_ to install Python.
You can specify you want to base your images on a different image with the ``inherit_image`` setting.

Once arca has an image with the requirements installed, it launches a container for each task and kills it when the task finishes.
This can be modify by setting ``keep_container_running`` to ``True``, then the container is not killed and can be used
by different tasks running from the same repository, branch and commit. This can save time on starting up containers before each task.
You can then kill the containers by calling ``DockerBackend`` method ``stop_containers``.

If you're using arca on a CI/CD tool or somewhere docker images are not kept long-term, you can setup pushing
images with the installed requirements and dependencies to a docker registry and they will be pulled next time instead
of building them each time. It's set using ``push_to_registry_name`` and you'll have to be logged in to docker
using ``docker login``.

Settings:

* **python_version**: What Python version should be used.
In theory any of `these versions <https://github.com/pyenv/pyenv/tree/master/plugins/python-build/share/python-build>`_ could be used,
but only CPython 3.6 has been tested. The default is the Python version of the current environment.
This setting is ignored if ``inherit_image`` is set.
* **keep_container_running**: When ``True``, containers aren't killed once the task finishes. Default is ``False``.
* **apk_dependencies**: For some python libraries, system dependencies are required,
for example ``libxml2-dev`` and ``libxslt-dev`` are needed for ``lxml``.
With this settings you can specify a list of system dependencies that will be installed via alpine ``apk``.
This setting is ignored if ``inherit_image`` is set since arca can't determine how to install requirements on an unknown system.
* **disable_pull**: Disable pulling prebuilt arca images from Docker Hub and build even the base images locally.
* **inherit_image**: If you don't wish to use the arca images you can specify what image should be used instead.
* **push_to_registry_name**: Pushes all built images with installed requirements and dependencies to docker registry with this name,
tries to pull image from the registry before building it locally to save time.

Vagrant
+++++++

*arca.backend.VagrantBackend*

**This backend might be reworked completely, consider its API unstable.**

If you're extra paranoid you can use Vagrant to completely isolate the tasks.
This backend is actually a subclass of ``DockerBackend`` and uses docker in the VM to run the tasks. Currently the backend
works by building the image with requirements and dependencies locally and pushing it to registry
(``push_to_registry_name`` is required), spinning up a VM for each repository, pulling the image in the VM and running the task.
Docker and Vagrant must be runnable by the current user.

The backend inherits all the settings of ``DockerBackend`` except for ``keep_container_running`` and has these extra settings:

* **box**: Vagrant box used in the VM. Either has to have docker version >= 1.8 or not have docker at all, in which case
it will be installed when spinning up the VM. The default is `ailispaw/barge <https://app.vagrantup.com/ailispaw/boxes/barge>`_.
* **provider**: Vagrant provider, default is ``virtualbox``. Visit `vagrant docs <https://www.vagrantup.com/docs/providers/>`_ for more.
* **quiet**: Tells Vagrant and Fabric (which is used to run the task in the VM) to be quiet. Default is ``True``. Vagrant and Docker output is
logged in separate files for each run in a folder ``logs`` in the ``Arca`` ``base_dir``. The filename is logged in the arca logger (see bellow)
* **destroy**: Destroy the VM right after the task is finished if ``True`` (default). If ``False`` is set, the VM is only halted.

Your own
++++++++

You can also create your own backend and pass it to ``Arca``. It has be a subclass of ``arca.backend.base.BaseBackend`` and
it has to implement its ``run`` method.

Other options and functionality
*******************************

Static files
++++++++++++

With method ``static_filename`` you can request a file from a repository.
The method accepts a relative path (can be a ``pathlib.Path`` or ``str``) to the file in the repository
and it returns an absolute path to the file is currently stored. The method raises ``arca.exceptions.FileOutOfRangeError``
when the relative path goes outside the scope of the repository and ``FileNotFoundError`` if the file doesn't exist in the
repository.

Singe pull
++++++++++

You might not want the repositories to be pulled everytime ``run`` is called.
You can specify that you want each repository and branch combination should be pulled only once
per initialization of ``Arca`` with the ``ARCA_SINGLE_PULL`` setting or ``single_pull`` keyword argument.

You can tell ``Arca`` to pull again by calling the method ``pull_again``.

Caching
+++++++

Arca can cache results of the tasks using ``dogpile.cache``. The default cache backend is ``dogpile.cache.null``, so no caching.
You can setup caching with the backend setting ``ARCA_CACHE_BACKEND`` and all the arguments needed for setup can be set
using ``ARCA_CACHE_BACKEND_ARGUMENTS`` which can either be a dict or or a json string.

Example setup:
from arca import Arca, Task
.. code-block:: python
task = Task("hello_world:say_hello")
arca = Arca()
arca = Arca(settings={
"ARCA_CACHE_BACKEND": "dogpile.cache.redis",
"ARCA_CACHE_BACKEND_ARGUMENTS": {
"host": "localhost",
"port": 6379,
"db": 0,
}
})
result = arca.run("https://example.com/hello_word.git", "master", task)
print(result.output)
To see all available cache backends and their settings,
please visit the ``dogpile.cache`` `documentation <https://dogpilecache.readthedocs.io/en/latest/>`_.
The code would print ``Hello World!``.
``result`` would be a ``arca.Result`` instance which currently only has one attribute,
``output``, with the output of the function call.
If the task fails, ``arca.exceptions.BuildError`` would be raised.

When ``Arca`` is being initialized, a check is made if the cache backend is writable and readable,
which raises an ``arca.exceptions.ArcaMisconfigured`` if it's not.
If you wish to ignore this error, set ``ARCA_IGNORE_CACHE_ERRORS`` to True or initialize ``Arca`` with ``ignore_cache_errors`` keyword argument.
By default, the `Current Environment Backend <https://arca.readthedocs.io/en/latest/backends.html#current-environment>`_ is used to run tasks,
which uses the current Python, launching the code in a subprocess. You can learn about backends `here <https://arca.readthedocs.io/en/latest/backends.html>`_.

Logging
+++++++
Further reading
***************

Arca logs debug information via standard Python logging. The logger is called ``arca``.
You can read the full documentation on `Read The Docs <https://arca.readthedocs.io/>`_.

Running tests
**************
Expand All @@ -303,6 +123,7 @@ Links
- PyPi package: `arca <https://pypi.python.org/pypi/arca>`_
- CI: `Travis <https://travis-ci.org/mikicz/arca>`_
- Test coverage: `Codecov <https://codecov.io/gh/mikicz/arca>`_
- Documentation: `Read The Docs <https://arca.readthedocs.io/>`_

License
*******
Expand Down

0 comments on commit 0c75c5e

Please sign in to comment.