Add dockerfile #83

Merged · 8 commits · Nov 8, 2016
49 changes: 41 additions & 8 deletions CONTRIBUTING.md
@@ -17,6 +17,8 @@ A lot of discussions about ideas take place in the [Issues](https://github.com/datasciencebr/serenata-de-amor/issues)

## Environment

##### Local Installation Environment

The recommended way of setting your environment up is with [Anaconda](https://www.continuum.io/), a Python distribution with useful packages for Data Science. [Download it](https://www.continuum.io/downloads) and create an _environment_ for the project.

@@ -29,14 +31,45 @@
```console
$ ./setup
```

The `activate serenata_de_amor` command must be run every time you enter the project folder to start working.
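
For reference, a typical conda workflow for this step looks like the sketch below (the exact commands live in the collapsed diff lines above; the environment name comes from this guide, the rest is standard conda usage):

```console
$ conda create --name serenata_de_amor python=3
$ source activate serenata_de_amor
$ ./setup
```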

**For Pyenv users:** If you installed Anaconda via [pyenv](https://github.com/yyuu/pyenv), `source activate serenata_de_amor` will probably fail _unless_ you explicitly use the path to the Anaconda `activate` script. For example:

```console
$ source /usr/local/var/pyenv/versions/anaconda3-4.1.1/bin/activate serenata_de_amor
```
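
If you are unsure which version pyenv installed or where it lives, standard pyenv commands can locate the `activate` script (the `anaconda3-4.1.1` version below is just the example from above):

```console
$ pyenv versions
$ ls "$(pyenv root)/versions/anaconda3-4.1.1/bin/activate"
```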

##### Docker Installation Environment

Requirements:

* [Docker](https://docs.docker.com/engine/installation/)
* [Docker-compose](https://docs.docker.com/compose/install/)
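
Once both are installed, a quick version check confirms they are on your `PATH` (standard CLI flags, nothing project-specific):

```console
$ docker --version
$ docker-compose --version
```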

Start the environment (the first run may take a while; the Docker image has 4GB):

```console
$ docker-compose up -d
```
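
The first run builds and downloads the image; standard Compose commands let you follow progress and confirm the service is up:

```console
$ docker-compose logs -f jupyter
$ docker-compose ps
```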

Create your `config.ini` file from the example:

```console
$ cp config.ini.example config.ini
```

Run the script to fetch the Quota for Exercising Parliamentary Activity (CEAP) datasets:

```console
$ docker-compose run --rm jupyter python src/fetch_datasets.py
```
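
Since the repository is mounted into the container (see `docker-compose.yml` below), the fetched datasets land in `data/` on your host and can be inspected directly:

```console
$ ls data/
```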

If you want a console inside the container:

```console
$ docker-compose run --rm jupyter bash
```

Then access the Jupyter Notebook at [localhost:8888](http://localhost:8888).
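
If the page does not load, a quick check that the server is answering on that port (standard tooling, not project-specific):

```console
$ curl -sI http://localhost:8888 | head -n 1
```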

## Best practices

In order to avoid tons of conflicts when trying to merge [Jupyter Notebooks](http://jupyter.org), there are some [guidelines we follow](http://www.svds.com/jupyter-notebook-best-practices-for-data-science/).
@@ -46,7 +79,7 @@

Basically we have four big directories with different purposes:
| Directory | Purpose | File naming |
|-----------|---------|-------------|
| **`develop/`** | This is where we _explore_ data, feel free to create your own notebook for your exploration. | `[ISO 8601 date]-[author-initials]-[2-4 word description].ipynb` (e.g. `2016-05-13-ec-air-tickets.ipynb`) |
|**`report/`** | This is where we write up the findings and results; here is where we put together different data, analyses and strategies to make a point. Feel free to jump in. | Meaningful title for the report (e.g. `Transport-allowances.ipynb`) |
| **`src/`** | This is where our auxiliary scripts lie: code to scrape data, to convert files, etc. | Small caps, no special character, `-` instead of spaces. |
| **`data/`** | This is not supposed to be committed, but it is where saved databases will be stored locally (scripts from `src/` should be able to get this data for you); a copy of this data will be available elsewhere (_just in case_). | Small caps, no special character, `-` instead of spaces. |

@@ -56,13 +89,13 @@

Here we explain what each script from `src/` does for you:

##### One script to rule them all

1. `src/fetch_datasets.py` downloads all the available datasets to `data/` in `.xz` compressed CSV format, with headers translated to English.


##### Quota for Exercising Parliamentary Activity (CEAP)

1. `src/fetch_datasets.py --from-source` downloads all CEAP datasets to `data/` from the official source (in XML format, in Portuguese).
1. `src/fetch_datasets.py` downloads the CEAP datasets into `data/`; it can download them from the official source (in XML format, in Portuguese) or from our backup server (`.xz` compressed CSV format, with headers translated to English). For a combined run of the scripts below, see the sketch after this list.
1. `src/xml2csv.py` converts the original XML datasets to `.xz` compressed CSV format.
1. `src/translate_datasets.py` translates the dataset file names and the labels of the variables within these files.
1. `src/translation_table.py` creates a `data/YYYY-MM-DD-ceap-datasets.md` file with details of the meaning and of the translation of each variable from the _Quota for Exercising Parliamentary Activity_ datasets.
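
Run in sequence, these scripts form a small pipeline from the official XML to translated CSVs. A sketch of a typical local run (assuming no arguments beyond the documented `--from-source` flag are needed):

```console
$ python src/fetch_datasets.py --from-source  # official XML, in Portuguese
$ python src/xml2csv.py                       # convert to .xz compressed CSV
$ python src/translate_datasets.py            # translate names and labels
$ python src/translation_table.py             # document the translation
```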
@@ -99,6 +132,6 @@

The project basically happens in four moments, and contributions are welcomed in all of them.

## Jarbas

As soon as we started _Serenata de Amor_ [we felt the need for a simple webservice](https://github.com/datasciencebr/serenata-de-amor/issues/34) to browse our data and refer to documents we analyze. This is how [Jarbas](https://github.com/datasciencebr/jarbas) was created.

If you fancy web development, feel free to check Jarbas' source code, browse [Jarbas' own Issues](https://github.com/datasciencebr/jarbas/issues) and contribute there too.
4 changes: 2 additions & 2 deletions README.md
@@ -8,9 +8,9 @@

The Serenata de Amor Operation arose from a combination of needs, from many people

We are building an intelligence capable of analyzing public spending and estimating, with reliability, the likelihood of each receipt being unlawful. This information will be used beyond the code, in the world outside of GitHub. Everything is open source from the beginning, allowing others to fork the project when their ideas diverge from those of Operation Serenata de Amor.

Our current milestone is to create the means for this kind of automation with the Quota for Exercising Parliamentary Activity (CEAP), from the Brazilian Chamber of Deputies. This job includes the development of APIs, data cleaning and analyses, conception and validation of scientific hypotheses, and confirmation of illicit acts via investigation and reports - to the population and to legal authorities.

To achieve this unprecedented goal, we invite everyone to train the intelligence, collect information, cross databases, validate hypotheses and apply Machine Learning with models competing against each other and getting combined in ensembles with higher precision than any previous option.

## Before contributing

12 changes: 12 additions & 0 deletions docker-compose.yml
@@ -0,0 +1,12 @@
version: '2'

services:
  jupyter:
    build:
      context: .                    # build from the repository root
      dockerfile: docker/Dockerfile
    ports:
      - 8888:8888                   # expose the Jupyter Notebook server
    volumes:
      - .:/notebook                 # mount the repository into the container
    working_dir: /notebook
20 changes: 20 additions & 0 deletions docker/Dockerfile
@@ -0,0 +1,20 @@
FROM jupyter/datascience-notebook:latest
MAINTAINER Serenata de Amor "datasciencebr@gmail.com"

# System packages must be installed as root
USER root

RUN apt-get update && apt-get install -y \
    unzip

# Drop back to the image's default unprivileged user
USER jovyan

COPY requirements.txt ./
COPY conda_requirements.txt ./

RUN pip install --upgrade pip
RUN pip install -r requirements.txt

RUN conda update --yes conda
RUN conda config --add channels Rufone
RUN conda config --add channels conda-forge
RUN conda install --yes --file conda_requirements.txt
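
After changing `requirements.txt` or `conda_requirements.txt`, the image can be rebuilt through the Compose service defined above (standard Compose usage, not part of this PR):

```console
$ docker-compose build jupyter
$ docker-compose up -d
```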