
Set up your machine for the workshop

What you should have

Your machine should have the following installed:

  • Python (at least 2.7.10 if you are using Python 2). If you are on macOS, installing Homebrew and the Homebrew Python is highly recommended as well.

  • SQLite (it comes preinstalled on most systems)

  • Upgrading pip is recommended.

    pip install --upgrade pip
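
A quick way to confirm these prerequisites are on your PATH (adjust the names if your Python is installed as python3):

```shell
# Report any missing prerequisite; prints nothing when all are present
for tool in python sqlite3 pip; do
  command -v "$tool" >/dev/null 2>&1 || echo "missing: $tool"
done
```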

NOTE: Apache Airflow won't run natively on Windows. If you have a Windows machine, please install VirtualBox/VMware with a Linux guest OS and then follow the instructions below, or use Docker and run a Docker Airflow image.
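
For the Docker route, one common approach (an assumption on my part; use whichever image your workshop recommends) is the community puckel/docker-airflow image:

```shell
# Pull a community-maintained Airflow image and start the web UI on port 8080.
# The image name is an assumption; substitute the image your workshop uses.
docker pull puckel/docker-airflow
docker run -d -p 8080:8080 --name airflow puckel/docker-airflow webserver
```

The UI is then reachable at localhost:8080, the same address used by the native install below.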

Setup

Get Virtualenv

I recommend using virtualenv so that the workshop's dependencies stay isolated from your system Python:

```shell
pip install --upgrade virtualenv
```

Virtualenv

```shell
rm -rf airflow_workshop
virtualenv airflow_workshop
source airflow_workshop/bin/activate
```
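
Once activated, python and pip should resolve inside the virtualenv rather than your system installation. A quick way to confirm:

```shell
# With the virtualenv active, this should print a path ending in
# airflow_workshop/bin/python; outside it, your system interpreter.
command -v python || command -v python3
```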

Installing Airflow

The easiest way to install the latest stable version of Airflow is with pip:

```shell
# You will need to export an environment variable due to a licensing issue.
export SLUGIFY_USES_TEXT_UNIDECODE=yes
pip install apache-airflow
```

The current stable version is 1.10.0. You can install this version specifically by using:

```shell
export SLUGIFY_USES_TEXT_UNIDECODE=yes
pip install apache-airflow==1.10.0
```
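
To confirm the installation worked, ask Airflow for its version (this assumes the pip install above completed successfully):

```shell
# Should print 1.10.0 if you pinned the version above
airflow version
```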

Run Airflow

Before you can use Airflow you have to initialize its database. The database contains information about historical & running workflows, connections to external data sources, user management, etc. Once the database is set up, Airflow's UI can be accessed by running a web server and workflows can be started.

The default database is a SQLite database, which is fine for this tutorial. In a production setting you'll probably be using something like MySQL or PostgreSQL. You'll probably want to back it up as this database stores the state of everything related to Airflow.

Airflow will use the directory set in the environment variable AIRFLOW_HOME to store its configuration and our SQLite database. The directory is created the first time you run an Airflow command. If you don't set the environment variable AIRFLOW_HOME, Airflow will create the directory ~/airflow/ to put its files in.

Set the environment variable AIRFLOW_HOME, for example to ~/airflow:

```shell
export AIRFLOW_HOME=~/airflow
```

or any other suitable directory, such as your current directory ($(pwd)).

Next, initialize the database:

```shell
airflow initdb
```
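
Once initdb finishes, AIRFLOW_HOME holds the generated configuration and the SQLite database; on Airflow 1.10 the contents typically include airflow.cfg, airflow.db, unittests.cfg, and a logs directory:

```shell
# Inspect what initdb created in AIRFLOW_HOME
ls "$AIRFLOW_HOME"
```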

Now start the web server and go to localhost:8080 in your browser to check out the UI:

```shell
airflow webserver --port 8080
```

Start the scheduler in a different terminal session (remember to activate the virtualenv and export AIRFLOW_HOME there as well):

```shell
airflow scheduler
```
