docs: add developer oriented docs

This adds docs meant to aid those wishing to contribute to the project and to help on-board new developers.
savannahghi · Aug 24, 2022 · 0e98c4f
1 parent 8273884
commit 0e98c4f

Showing 3 changed files with 200 additions and 17 deletions.
diff --git a/README.md b/README.md
@@ -81,7 +81,7 @@ pip install -r requirements/dev.txt
 
 And then create the binary using the following command:-
 ```bash
-pyinstaller app/__main__.py  --hidden-import apps/imp --collect-all app --name idr_client_temp -F
+pyinstaller app/__main__.py --collect-all app --name idr_client_temp -F
 ```
 This will create an executable but the executable will still depend on the
 target system/computer having the correct system libraries. More details on this
@@ -96,22 +96,15 @@ The executable binary can be found on the `dist` directory of the project. To
 learn more about the `staticx` command, check the docs [here](https://staticx.readthedocs.io).
 
 
-## Concepts
-This section is for the curious and those wishing to contribute. It provides a
-summary description of how the app works and the concepts and terms used in the
-project. These are:
-* __Data Source Type__ - A data source type is just that, it describes a kind
-  of data source together with the operations that can be performed around those
-  data sources. Each data source type can have multiple *data sources*.
-* __Data Source__ - A data source represents an entity that contains data of
-  interest such as a database or a file. Each data source has multiple
-  *extra metadata*.
-* __Extract Metadata__ - This a description of the data to be extracted from a
-  data source. An extract metadata also defines how data is extracted from a
-  data source.
-* __Upload Metadata__ - This describes data that has been extracted and how
-  it's packaged for uploading to the remote server. Each upload metadata is
-  always associated with a given *extract metadata*.
+## Contributing
+This section is for the curious and those wishing to contribute. For those who
+are curious about how the app works and the architecture of the project,
+check out the [architecture docs](https://github.com/savannahghi/idr-client/blob/develop/docs/ARCHITECTURE.rst).
+For those wishing to contribute, it is highly recommended that they start by
+reading the [contribution guidelines](https://github.com/savannahghi/idr-client/blob/develop/docs/CONTRIBUTING.rst).
+
+All contributions are welcome.
+
 
 ## License
 

diff --git a/docs/ARCHITECTURE.rst b/docs/ARCHITECTURE.rst
@@ -0,0 +1,132 @@
+=======================
+IDR Client Architecture
+=======================
+
+This document describes, among other things, the core concepts used throughout
+this project, the layout of the project and the class hierarchy of the main
+domain models in the project. All these are important in order to understand
+how the application is structured and how the different components of the
+application fit together and interact with each other.
+
+
+Concepts and Terminology
+------------------------
+
+This section contains important terms and concepts central to the project.
+
+Core Domain
+~~~~~~~~~~~
+
+The core domain of the project is mainly composed of the following components:
+
+* **Data Source Type** - A data source type describes a kind of data source
+  together with the operations that can be performed on data sources of that
+  kind. Each data source type can have multiple *data sources*.
+  As well as being a container for *data sources*, a data source type also
+  exposes concrete implementations of the other core domain models that define
+  properties and behaviours that are useful when working with data of the
+  given *type*. This allows the application to work with data of different
+  types and from different sources.
+* **Data Source** - A data source represents an entity that contains data of
+  interest such as a database or a file. Each data source has multiple
+  *extract metadata* associated with it.
+* **Extract Metadata** - This is a description of the data to be extracted from
+  a *data source*. This description can include *(but is not limited to)*
+  properties such as the scope, depth and amount of data to be extracted from a
+  data source. An extract metadata also defines how data is extracted from its
+  parent *data source*.
+* **Upload Metadata** - This describes the attributes of the extracted data and
+  how it's packaged for uploading to the remote server. Each upload metadata is
+  always associated with a given *extract metadata*. Note that an upload
+  metadata doesn't contain the actual data to be uploaded, just information
+  about the data. The actual data is contained by the *upload chunks*
+  associated with the given upload metadata.
+* **Upload Chunk** - Before data is uploaded to the server, it is partitioned
+  into smaller units *(for transmission efficiency reasons)* which are referred
+  to as upload chunks. These chunks are then uploaded to the server.
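The partitioning step described for upload chunks can be illustrated with a minimal sketch. The function name ``partition`` and the fixed-size strategy are assumptions made for illustration only; the project may chunk data differently.

```python
from typing import Iterator


def partition(data: bytes, chunk_size: int) -> Iterator[bytes]:
    """Yield consecutive slices of ``data`` of at most ``chunk_size`` bytes.

    Hypothetical helper: fixed-size chunking is an assumption, not
    necessarily the strategy used by the project.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


# The last chunk may be shorter than the others.
chunks = list(partition(b"abcdefg", 3))
```

Each element of ``chunks`` would then be uploaded to the server on its own.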
+
+These domain components are defined in the ``app.core.domain`` module as
+interfaces meant to be implemented for each *data source type* that the
+application needs to support. The default implementations that ship with the
+application can be found in the ``app.imp`` package. This design is meant to
+resemble the `Service Provider Interface <spi_>`_ pattern in
+Java.
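The split between interfaces and per-type implementations might look roughly like the sketch below. Every class, attribute and method name here is hypothetical and chosen for illustration; none of them is taken from ``app.core.domain`` or ``app.imp``.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class ExtractMetadata(ABC):
    """Describes what to extract from a data source and how to extract it."""

    @abstractmethod
    def extract(self, source: "DataSource") -> Any:
        """Extract data from the given parent data source."""


class DataSource(ABC):
    """An entity that holds data of interest, e.g. a database or a file."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.extract_metadata: List[ExtractMetadata] = []


class DataSourceType(ABC):
    """A kind of data source; also a container for its data sources."""

    def __init__(self, code: str) -> None:
        self.code = code
        self.data_sources: Dict[str, DataSource] = {}


# A hypothetical "provider": concrete implementations for in-memory data.
class InMemoryDataSource(DataSource):
    def __init__(self, name: str, rows: List[Any]) -> None:
        super().__init__(name)
        self.rows = rows


class FirstNRows(ExtractMetadata):
    def __init__(self, limit: int) -> None:
        self.limit = limit

    def extract(self, source: InMemoryDataSource) -> List[Any]:
        return source.rows[: self.limit]


class InMemoryDataSourceType(DataSourceType):
    pass
```

Code written against the abstract classes does not need to know which concrete data source type it is working with, which is the property the Service Provider Interface comparison refers to.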
+
+Transport
+~~~~~~~~~
+
+A transport in the project represents the flow of data to and from the IDR
+Client. Specifically, a transport connects the IDR Client to a metadata source
+and also connects the client to the final destination of the extracted data. If
+it helps, a transport can be thought of as an interface composed of two other
+interfaces, ``MetadataProvider`` and ``DataSink``. In the future, the transport
+interface might be split into those two interfaces if the need arises, but for
+now it remains a single interface. The application receives metadata through a
+transport and uploads the final data using a transport. A transport can be
+anything from an HTTP API to a filesystem API. The transport interface is
+defined in the ``app.core.transport`` module whereas the ``app.lib.transports``
+package contains common transport implementations.
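The idea of a transport as the union of a ``MetadataProvider`` and a ``DataSink`` can be sketched as follows. Only those two interface names come from the text above; the method names ``fetch_metadata`` and ``upload``, and the ``InMemoryTransport`` class, are assumptions for illustration and are not the actual ``app.core.transport`` API.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List


class MetadataProvider(ABC):
    """Where the client gets its metadata from."""

    @abstractmethod
    def fetch_metadata(self) -> List[Dict[str, str]]:
        """Return the metadata describing what to extract."""


class DataSink(ABC):
    """Where the client sends the extracted data."""

    @abstractmethod
    def upload(self, chunk: bytes) -> None:
        """Send one chunk of extracted data to its destination."""


class Transport(MetadataProvider, DataSink):
    """A single interface covering both directions of data flow."""


class InMemoryTransport(Transport):
    """A hypothetical transport that keeps everything in memory."""

    def __init__(self, metadata: Iterable[Dict[str, str]]) -> None:
        self._metadata = list(metadata)
        self.uploaded: List[bytes] = []

    def fetch_metadata(self) -> List[Dict[str, str]]:
        return list(self._metadata)

    def upload(self, chunk: bytes) -> None:
        self.uploaded.append(chunk)
```

An HTTP or filesystem transport would implement the same two methods, so the rest of the application would not need to change when the transport does.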
+
+Task
+~~~~
+
+A task is a job or an action that takes an input and returns an output. Most
+actions and processes in the project are modelled by composing different tasks
+to achieve the desired objective. The task interface is defined in the
+``app.core.task`` module whereas the ``app.lib.tasks`` package provides
+implementations of the most common tasks as well as tasks that can be used to
+compose multiple tasks.
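Task composition as described above could be sketched like this. The names ``Task``, ``execute``, ``FunctionTask`` and ``Pipeline`` are hypothetical and used only for illustration; they are not claimed to match ``app.core.task`` or ``app.lib.tasks``.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Sequence


class Task(ABC):
    """A job that takes an input and returns an output."""

    @abstractmethod
    def execute(self, an_input: Any) -> Any:
        """Run the task on the given input."""


class FunctionTask(Task):
    """Wraps a plain callable as a task."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self._func = func

    def execute(self, an_input: Any) -> Any:
        return self._func(an_input)


class Pipeline(Task):
    """Composes tasks: the output of each task is the input of the next."""

    def __init__(self, tasks: Sequence[Task]) -> None:
        self._tasks = tuple(tasks)

    def execute(self, an_input: Any) -> Any:
        result = an_input
        for task in self._tasks:
            result = task.execute(result)
        return result


clean_up = Pipeline([FunctionTask(str.strip), FunctionTask(str.upper)])
result = clean_up.execute("  idr client ")
```

Because a pipeline is itself a task, pipelines can be nested inside other pipelines to model larger processes.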
+
+Project Layout
+--------------
+
+The project is laid out as follows.
+
+::
+
+    .
+    idr-client
+    ├── ...other project configuration files.
+    └── app - The root src directory.
+    |   ├── core - The core application components.
+    |   |   ├── domain - Interfaces describing the services and essential processes provided by the application.
+    |   |   ├── exceptions - Defines key application errors and exceptions used throughout the project.
+    |   |   ├── mixins - Defines components and interfaces used to model common behaviours and reusable functionality.
+    |   |   ├── serializers - Defines interfaces that convert python objects into simple native types for easy storage and/or transmission.
+    |   |   ├── task - Defines the interface that models a job or piece of work in the application.
+    |   |   └── transport - Defines an interface that models the flow of data to and from the application.
+    |   |
+    |   ├── imp - Implementations of the core services.
+    |   |
+    |   ├── lib - Utilities and helpers.
+    |   |   ├── config - Classes and functions needed to configure the application.
+    |   |   ├── tasks - Implementations of common utility tasks.
+    |   |   ├── transports - Different implementations of the transport interface.
+    |   |   ├── app_registry - Contains the implementation of the main application registry.
+    |   |   ├── checkers - Defines validators used throughout the application.
+    |   |   └── module_loading - Defines utilities used for dynamic module loading.
+    |   |
+    |   ├── use_cases - These are application-specific operations.
+    |   |   ├── fetch_metadata - Defines fetch metadata operations.
+    |   |   ├── main_pipeline - The main application pipeline operations.
+    |   |   ├── run_extraction - Defines data extraction operations.
+    |   |   ├── types - Defines common typings used within the use cases package.
+    |   |   └── upload_extracts - Defines data upload operations.
+    |   |
+    |   ├── __init__ - Defines the application setup operations.
+    |   ├── __main__ - The main application entry point.
+    |   └── __version__ - Metadata about the application.
+    |
+    ├── docs - Documentation for the project.
+    |
+    ├── logs - A directory that can be used to store log directories during development. This is not needed to run the application but is there for convenience.
+    |
+    ├── requirements - Defines dependencies needed by the application.
+    |   ├── base - The key dependencies needed for the application to run.
+    |   ├── dev - Dependencies needed to set up a development environment for the project.
+    |   └── test - Dependencies needed to test the application.
+    |
+    └── tests - Tests for the application.
+
+
+.. _spi: https://docs.oracle.com/javase/tutorial/sound/SPI-intro.html
diff --git a/docs/CONTRIBUTING.rst b/docs/CONTRIBUTING.rst
@@ -0,0 +1,58 @@
+===================
+Contributor's Guide
+===================
+
+If you are reading this, you're probably interested in contributing to this
+project. All contributions are welcome and your efforts are greatly
+appreciated. This document lays out guidelines and advice for contributing to
+the project.
+
+Note that the project maintainers have the final say on whether or not a
+contribution is accepted. All contributions will be considered carefully, but
+occasionally, some contributions will be rejected because they do not suit the
+current goals or needs of the project.
+
+If your contribution is rejected, don't despair! As long as you followed these
+guidelines, you will have a much better chance of getting your next
+contribution accepted.
+
+Steps for Submitting Code
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Use the checklist below when contributing code:
+
+1. Fork the repository on `GitHub`_.
+2. Run the tests to confirm they all pass on your system. If they don't, you'll
+   need to investigate why they fail. If you're unable to diagnose this
+   yourself, raise it as a bug report by creating a new issue on GitHub.
+3. Write tests that demonstrate your bug or feature. Ensure that they fail.
+4. Make your change.
+5. Run the entire test suite again, confirming that all tests pass including
+   the ones you just added.
+6. Send a GitHub Pull Request to the main repository's ``main`` branch. GitHub
+   Pull Requests are the expected method of code collaboration on this project.
+
+Code Review
+~~~~~~~~~~~
+
+Contributions will not be merged until they've been code reviewed. You should
+implement any code review feedback unless you strongly object to it. In the
+event that you object to the code review feedback, you should make your case
+clearly and calmly. If, after doing so, the feedback is judged to still apply,
+you must either apply the feedback or withdraw your contribution.
+
+Code Style
+~~~~~~~~~~
+
+This project uses a collection of tools to ensure the code base has a
+consistent style as it grows. We have these orchestrated using a tool called
+`pre-commit`_. This can be installed locally and run over your changes prior
+to opening a PR, and will also be run as part of the CI approval process
+before a change is merged.
+
+You can find the full list of formatting requirements specified in the
+`.pre-commit-config.yaml`_ in the top-level directory of this project.
+
+.. _GitHub: https://github.com/savannahghi/idr-client
+.. _pre-commit: https://pre-commit.com/
+.. _.pre-commit-config.yaml: https://github.com/savannahghi/idr-client/blob/develop/.pre-commit-config.yaml