Set Schedule

icarus edited this page Feb 10, 2019 · 4 revisions

Introduction

Datalab uses Airflow to run scheduled tasks. Detailed documentation is available at https://airflow.apache.org/. This page walks through one simple example: scheduling Jupyter notebooks in Datalab.

Create the dags folder

The DAG directory is set to /app/dags. If it does not exist yet, create a new folder named dags under /app.

Create first dag file

Inside your dags folder, create a new file named after its purpose, such as hello_world.py. It is recommended to give the file the same name as the dag_id defined inside it.

from datetime import datetime
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.python_operator import PythonOperator
from airflow.operators.bash_operator import BashOperator
from dsutil import NbExecuter

# Define default args
default_args = {
    'owner': 'user',
    'on_failure_callback': lambda context: True
}    

# Define DAG setting
dag = DAG('hello_world', description='First ',
          schedule_interval='04 20 * * *',
          start_date=datetime(2017, 10, 19), 
          default_args=default_args,
          catchup=False)

# Define DAG components
first_nb = PythonOperator(
    task_id='first_nb',
    python_callable=NbExecuter.execute_nb2,
    provide_context=True,
    op_kwargs={'path': '/app/user-ws/hello_world/notebook/first_nb.ipynb'},
    dag=dag
)

second_nb = PythonOperator(
    task_id='second_nb',  # task_id must be unique within a DAG
    python_callable=NbExecuter.execute_nb2,
    provide_context=True,
    op_kwargs={'path': '/app/user-ws/hello_world/notebook/second_nb.ipynb'},
    dag=dag
)


# Define dependencies
first_nb >> second_nb
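
The schedule_interval in the DAG above is a five-field cron expression. As a rough illustration of how to read it, the sketch below splits the expression into named fields. The helper name describe_cron is hypothetical and is not part of Airflow; it only handles the plain five-field form.

```python
def describe_cron(expr):
    """Split a five-field cron expression into named fields.

    Hypothetical helper for illustration only; not an Airflow API.
    """
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected 5 cron fields, got %d" % len(fields))
    names = ("minute", "hour", "day_of_month", "month", "day_of_week")
    return dict(zip(names, fields))


# Same expression as in the hello_world DAG
schedule = describe_cron('04 20 * * *')
print(schedule["hour"] + ":" + schedule["minute"])
```

Read this way, '04 20 * * *' means minute 04 of hour 20 on every day, so the DAG is scheduled daily at 20:04 in the scheduler's time zone (UTC by default in Airflow, unless Datalab configures it otherwise).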

Check the status

After you save the file, Airflow detects and registers it automatically. This can take a couple of minutes. You can then find the registered DAG on the Airflow webserver on port 9090.
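
If the DAG does not show up after a few minutes, a syntax error in the file is one possible cause. The following is a minimal sketch, using only the Python standard library, for checking that a DAG file at least parses. The helper name check_dag_file is hypothetical, and the path assumes the hello_world.py example on this page.

```python
def check_dag_file(path):
    """Return None if the file parses as Python, else an error message.

    Hypothetical helper; it only catches syntax errors and unreadable
    files, not failing imports or Airflow-specific problems.
    """
    try:
        with open(path) as f:
            source = f.read()
        # compile() parses the source without executing it
        compile(source, path, 'exec')
    except (OSError, SyntaxError) as err:
        return str(err)
    return None


# Assumed location of the example DAG file from this page
problem = check_dag_file('/app/dags/hello_world.py')
print(problem or 'syntax OK')
```

Because the file is parsed but never executed, this check does not verify that airflow or dsutil can be imported.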

Execute the dag

On the Airflow web interface, toggle the DAG from Off to On. The DAG will start within about a minute.

Debug

Besides using Airflow's debug portal, you can also run a notebook directly with the following Python code. It executes immediately, without waiting for Airflow to refresh.

from dsutil import NbExecuter
NbExecuter.execute_nb('/app/user-ws/hello_world/notebook/first_nb.ipynb', '2018-10-02')
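
To debug several notebooks in the same order as the DAG, the call above can be wrapped in a small loop. This is only a sketch under assumptions: run_notebooks and fake_execute are hypothetical names, the second argument is treated as a date string as in the call above, and it assumes the executor raises an exception when a notebook fails, which this page does not confirm. The executor is passed in as a parameter so the sketch runs without dsutil.

```python
def run_notebooks(execute, paths, run_date):
    """Call execute(path, run_date) for each path, stopping at the first failure.

    Hypothetical helper; assumes `execute` raises on failure.
    """
    results = []
    for path in paths:
        try:
            execute(path, run_date)
        except Exception as err:
            results.append((path, "failed: %s" % err))
            break  # mirror the DAG: downstream notebooks do not run
        results.append((path, "ok"))
    return results


def fake_execute(path, run_date):
    """Stand-in for the real executor so the sketch runs anywhere."""
    if path.endswith('second_nb.ipynb'):
        raise RuntimeError('cell error')


first = '/app/user-ws/hello_world/notebook/first_nb.ipynb'
second = '/app/user-ws/hello_world/notebook/second_nb.ipynb'
results = run_notebooks(fake_execute, [first, second], '2018-10-02')
for path, status in results:
    print(path, status)
```

In Datalab, passing NbExecuter.execute_nb in place of fake_execute would run the real notebooks, provided it behaves as assumed here.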