
# Airflow on Heroku

[Deploy to Heroku button]

Airflow is a framework to programmatically author, schedule and monitor data workflows. Heroku is a great place to run Airflow because you can focus on building the workflows without having to worry about the infrastructure. As you start building more and more workflows, Heroku will allow you to easily scale.

## Getting Started

This project is a reference for what your project should look like when you deploy to Heroku. The easiest way to start is to deploy the project via the Heroku Button. Here are all of the steps you need to handle to get the project up and running:

* Generate a new Fernet key. Fernet is used to encrypt the database connections
  stored in the metadata database. The new key will have to be saved in the
  config var AIRFLOW_FERNET_KEY:

  ```
  $ python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
  pB_dxwr55GbGp0rKAU0LhfEGHx77CAlTRV6g5vvSZWI=
  ```


* The Procfile will need to change to include scheduler and worker processes like so:

  ```
  web: airflow webserver -p $PORT
  worker: airflow worker
  scheduler: airflow scheduler
  ```

  When you deploy these extra processes, make sure to scale the scheduler to
  exactly one dyno.

* If you deployed via the Heroku Button, the metadata database should already be
  set up. If you didn't, you'll need to log into a Heroku bash session and
  initialize the database. Make sure that a Heroku Postgres database has already
  been attached to the project:

  ```
  $ heroku run bash
  ~ $ airflow initdb
  ```


* A new user will have to be created in the Airflow metadata database. Airflow
  has [great documentation](https://airflow.incubator.apache.org/security.html#web-authentication)
  on how to do this; a sketch based on those docs follows this list.
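
If you use the password auth backend, the user-creation snippet from the linked docs looks roughly like the following. This is a sketch rather than part of this repo: it assumes the `airflow[password]` extra is installed and password authentication is enabled in `airflow.cfg`, and the exact imports can vary by Airflow version. Run it in a `python` shell inside `heroku run bash`:

```python
# Sketch based on the linked Airflow web-authentication docs (password backend).
# Assumes airflow[password] is installed; replace the username/email/password.
import airflow
from airflow import models, settings
from airflow.contrib.auth.backends.password_auth import PasswordUser

user = PasswordUser(models.User())
user.username = 'new_user_name'
user.email = 'new_user_email@example.com'
user.password = 'set_the_password'

session = settings.Session()
session.add(user)
session.commit()
session.close()
```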

Once you get through the above steps, you're all set up and ready to go. Start
creating new DAGs!
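
As a starting point, here is what a minimal DAG could look like once it is dropped into the app's DAGs folder. Everything in it is illustrative (the DAG id, schedule, and command are made up), but it shows the basic structure:

```python
# An illustrative first DAG (not part of this repo): prints a greeting once a day.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    'hello_heroku',  # made-up DAG id
    default_args={
        'owner': 'airflow',
        'start_date': datetime(2017, 1, 1),
    },
    schedule_interval='@daily')

hello = BashOperator(
    task_id='say_hello',
    bash_command='echo "Hello from Airflow on Heroku"',
    dag=dag)
```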

## Best Practices

While Airflow does save connection information in the metadata database, when you
build your DAGs you should use environment variables to reference the connection
to the database where your data is saved. This example DAG shows how to connect
to your source system:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators import PostgresOperator

dag = DAG(
    'transformation',
    default_args={
        'owner': 'airflow',
        'start_date': datetime(2017, 1, 1),  # a start_date is required for tasks to run
    })

task1 = PostgresOperator(
    sql='dim_get_count.sql',
    task_id='get_count',
    postgres_conn_id='DATABASE_URL',
    dag=dag)
```
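
The same idea also works without a dedicated operator: Heroku Postgres exposes the connection string as the DATABASE_URL config var, so a task can read it straight from the environment. The following sketch builds on the `dag` defined above; it is not part of this repo, it assumes psycopg2 is installed, and the table name in the query is made up:

```python
# Illustrative sketch: read the Heroku-provided DATABASE_URL from the environment
# inside a PythonOperator. Assumes psycopg2 is installed; the table name is made up.
import os

import psycopg2
from airflow.operators.python_operator import PythonOperator


def get_count():
    conn = psycopg2.connect(os.environ['DATABASE_URL'])
    with conn:
        with conn.cursor() as cur:
            cur.execute('SELECT count(*) FROM example_table')
            return cur.fetchone()[0]


task2 = PythonOperator(
    task_id='get_count_py',
    python_callable=get_count,
    dag=dag)
```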
