Your machine should have the following installed:
- Python (at least 2.7.10 for Python 2). If you are on OSX, installing Homebrew and the Homebrew Python is highly recommended as well.
- SQLite (it should already be installed on most systems)
Upgrading pip is also recommended:
pip install --upgrade pip
NOTE: Apache Airflow won't work on Windows. If you have a Windows machine, install VirtualBox/VMware with a Linux OS and then follow the instructions below, or use Docker and run the Docker Airflow image.
I would recommend working inside a virtualenv so this installation stays isolated from your system Python.
pip install --upgrade virtualenv
rm -rf airflow_workshop
virtualenv airflow_workshop
source airflow_workshop/bin/activate
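A quick way to confirm the environment is active (paths assume the airflow_workshop directory created above):

```shell
# With the virtualenv active, python and pip should resolve inside
# airflow_workshop/bin rather than the system-wide locations.
which python
which pip
```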
The easiest way to install the latest stable version of Airflow is with pip:
# You will need to export an environment variable due to a licensing issue.
export SLUGIFY_USES_TEXT_UNIDECODE=yes
pip install apache-airflow
The current stable version is 1.10.0. You can install this version specifically by using:
export SLUGIFY_USES_TEXT_UNIDECODE=yes
pip install apache-airflow==1.10.0
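To confirm the install succeeded, you can ask pip which version it put in place (the package name is apache-airflow):

```shell
# Verify the installed Airflow version with the pip used above.
pip show apache-airflow | grep '^Version:'
```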
Before you can use Airflow you have to initialize its database. The database contains information about historical & running workflows, connections to external data sources, user management, etc. Once the database is set up, Airflow's UI can be accessed by running a web server and workflows can be started.
The default database is a SQLite database, which is fine for this tutorial. In a production setting you'll probably be using something like MySQL or PostgreSQL. You'll probably want to back it up as this database stores the state of everything related to Airflow.
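As a sketch of that backup advice, assuming the default SQLite database at $AIRFLOW_HOME/airflow.db (the timestamped backup file name is just an example):

```shell
# Copy the metadata database to a timestamped backup file.
# Assumption: AIRFLOW_HOME is set and the database has been initialized.
cp "$AIRFLOW_HOME/airflow.db" "$AIRFLOW_HOME/airflow.db.$(date +%Y%m%d).bak"
```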
Airflow will use the directory set in the environment variable AIRFLOW_HOME to store its configuration and our SQLite database. This directory will be created after your first Airflow command. If you don't set AIRFLOW_HOME, Airflow will create the directory ~/airflow/ to put its files in.
Set the environment variable AIRFLOW_HOME to e.g. your current directory $(pwd):
export AIRFLOW_HOME=~/airflow
or any other suitable directory.
Next, initialize the database:
airflow initdb
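After initdb finishes you can sanity-check what was created. On the default setup, AIRFLOW_HOME now holds airflow.cfg and the SQLite file airflow.db, and the sqlite3 shell can list the tables (names like dag, dag_run and task_instance are part of Airflow's schema):

```shell
ls "$AIRFLOW_HOME"                            # airflow.cfg, airflow.db, ...
sqlite3 "$AIRFLOW_HOME/airflow.db" ".tables"  # dag, dag_run, task_instance, ...
```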
Now start the web server and go to localhost:8080 to check out the UI:
airflow webserver --port 8080
Start the Scheduler in a different terminal session:
airflow scheduler
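With the web server and scheduler both running, a quick way to confirm the UI is reachable (assuming the port chosen above) is an HTTP status check:

```shell
# Should print 200 (or a 302 redirect to the DAG list) once the
# web server has finished starting.
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8080/
```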