### Introduction

When we deploy code to production for batch processing, we often need the job to run automatically at a fixed time or on a recurring schedule. That is where **Composer** comes in. It is built on [**Apache Airflow**](https://airflow.apache.org/), an open-source platform to author, schedule and monitor workflows.

Key features:


> Fully managed workflow orchestration


> Integrates with other Google Cloud products

Workflows are written in pure Python. Let's get started with **Composer**.






In [1]:
# first auth
from google.colab import auth

auth.authenticate_user()

In [2]:
# set project
! gcloud config set project cloudtutorial-279003

Updated property [core/project].


In [3]:
# In Composer, the main resource is the environment that Airflow runs in,
# so first let's create one.
# This takes a long time to finish, because Composer is container based
# and also has to create a GKE cluster.
! gcloud composer environments create first-composer --location us-central1

In [26]:
# Let's create a Python file that trains a model with sklearn.
# This is just an example; a real job could do much more.
%%writefile training.py
import datetime
import logging
import airflow
from airflow.operators.python_operator import PythonOperator
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

logger = logging.getLogger(__name__)

# When I first deployed this DAG to Composer, its runs always showed no_status.
# According to https://www.astronomer.io/blog/7-common-errors-to-check-when-debugging-airflow-dag/
# start_date should not be the current timestamp, so we start one day in the past.
now = datetime.datetime.now() - datetime.timedelta(days=1)

default_args = {
    'owner': 'lugq',
    'depends_on_past': False,
    'email': [''],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': datetime.timedelta(minutes=5),
    'start_date': now,
}


# Run the training once a day and do not backfill past intervals.
dag = airflow.DAG("training_sklearn", catchup=False, default_args=default_args, schedule_interval=datetime.timedelta(days=1))

x, y = load_iris(return_X_y=True)
lr = LogisticRegression()

# This is our training function; the task below calls it, so the training logic goes here.
def train_model():
    print("Start to train model")
    lr.fit(x, y)

    score = lr.score(x, y)
    print("Model test score: {}".format(score))

PythonOperator(dag=dag,
               task_id='Task_with_python',
               provide_context=False,
               python_callable=train_model)

# This line runs whenever Airflow parses the file, not when the training task finishes.
print("DAG file parsed.")

Overwriting training.py
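
The DAG above combines `start_date`, `schedule_interval` and `catchup`. As a rough mental model, Airflow triggers a run only after its schedule interval has ended, so the first run fires at `start_date + schedule_interval`. The sketch below uses only the standard library and is not Airflow's scheduler code; the helper names `first_trigger_time` and `completed_intervals` and the dates are made up for illustration.

```python
import datetime


def first_trigger_time(start_date, schedule_interval):
    """Earliest moment a run can be triggered: the end of the first interval."""
    return start_date + schedule_interval


def completed_intervals(start_date, schedule_interval, now):
    """Count the whole intervals that have ended between start_date and now.

    With catchup=True the scheduler would create one run per completed
    interval; with catchup=False it only creates a run for the latest one.
    """
    if now < start_date:
        return 0
    return (now - start_date) // schedule_interval


# Illustrative values only (hypothetical dates, daily interval as in the DAG).
start = datetime.datetime(2020, 6, 1)
one_day = datetime.timedelta(days=1)

print(first_trigger_time(start, one_day))
print(completed_intervals(start, one_day, datetime.datetime(2020, 6, 4, 12)))
```

This is also why a `start_date` of "right now" never seems to run: the first interval has not ended yet.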


In [5]:
# Before we go on: sklearn is currently not installed in the Composer environment's Python,
# so we have to provide a requirements.txt file that lists the libraries we use.
%%writefile requirements.txt
scikit-learn == 0.23.0

Writing requirements.txt


#### Install dependencies

We can install the packages our code uses into the Composer environment, as described [here](https://cloud.google.com/composer/docs/how-to/using/installing-python-dependencies#install-package).

In [6]:
# Then we update the current environment with gcloud.
# If we don't want to wait, we can add: --async
# Be warned, this takes a long time!
! gcloud composer environments update first-composer --update-pypi-packages-from-file requirements.txt --location us-central1
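
If we want to run the same update from a Python script instead of a notebook cell, one option is to build the argument list first and hand it to `subprocess`. This is only a sketch: the helper name `build_update_command` is hypothetical, and it simply reassembles the flags used in the cell above plus the optional `--async`.

```python
import shlex


def build_update_command(environment, location, requirements_file, run_async=False):
    """Return the gcloud argument list that updates the PyPI packages."""
    command = [
        "gcloud", "composer", "environments", "update", environment,
        "--update-pypi-packages-from-file", requirements_file,
        "--location", location,
    ]
    if run_async:
        # Return immediately instead of waiting for the update to finish.
        command.append("--async")
    return command


command = build_update_command(
    "first-composer", "us-central1", "requirements.txt", run_async=True
)
# Print the command in a copy-pasteable, shell-quoted form.
print(" ".join(shlex.quote(part) for part in command))
```

Passing the list to `subprocess.run(command, check=True)` would then execute it without going through a shell.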

In [7]:
# Next we upload our code to GCS, into the dags folder of the environment's bucket;
# Composer will then create the schedule for us automatically.

# But first let's check which bucket we should upload our file to.
! gsutil ls gs://

gs://artifacts.cloudtutorial-279003.appspot.com/
gs://asia-northeast1-first-compo-a1a2973a-bucket/
gs://asia.artifacts.cloudtutorial-279003.appspot.com/
gs://cloudtutorial-279003.appspot.com/
gs://cloudtutorial-279003_cloudbuild/
gs://staging.cloudtutorial-279003.appspot.com/
gs://us-central1-first-composer-ca40dafb-bucket/
gs://us.artifacts.cloudtutorial-279003.appspot.com/
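
Picking the right bucket by eye works for a short listing. For a longer one, a small filter can help. The sketch below assumes the naming pattern seen in the output above (the location, then part of the environment name, then a hash, then `-bucket`); that pattern is an observation from this listing, not a documented guarantee, and the helper names are hypothetical.

```python
def find_environment_buckets(bucket_urls, location):
    """Keep the buckets that look like Composer buckets for the given location."""
    prefix = "gs://{}-".format(location)
    return [
        url for url in bucket_urls
        if url.startswith(prefix) and url.rstrip("/").endswith("-bucket")
    ]


def dags_folder(bucket_url):
    """Return the dags folder inside an environment bucket."""
    return bucket_url.rstrip("/") + "/dags/"


# The listing printed by `gsutil ls gs://` above.
listing = [
    "gs://artifacts.cloudtutorial-279003.appspot.com/",
    "gs://asia-northeast1-first-compo-a1a2973a-bucket/",
    "gs://asia.artifacts.cloudtutorial-279003.appspot.com/",
    "gs://cloudtutorial-279003.appspot.com/",
    "gs://cloudtutorial-279003_cloudbuild/",
    "gs://staging.cloudtutorial-279003.appspot.com/",
    "gs://us-central1-first-composer-ca40dafb-bucket/",
    "gs://us.artifacts.cloudtutorial-279003.appspot.com/",
]

for bucket in find_environment_buckets(listing, "us-central1"):
    print(dags_folder(bucket))
```

If parsing names feels fragile, `gcloud composer environments describe first-composer --location us-central1 --format="get(config.dagGcsPrefix)"` should print the DAGs folder directly.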


In [28]:
# The bucket is gs://us-central1-first-composer-ca40dafb-bucket/,
# so let's upload our file into its dags folder.
! gsutil copy training.py gs://us-central1-first-composer-ca40dafb-bucket/dags/training.py

Copying file://training.py [Content-Type=text/x-python]...
/ [1 files][  1.4 KiB/  1.4 KiB]                                                
Operation completed over 1 objects/1.4 KiB.                                      


#### View our application in Airflow

Once everything has finished, open the **Composer** module of the project and click the **environment** first-composer. From there, the **environment variables** link leads to the link for the **Airflow web UI**.

In the web UI we can see the DAG we just created. To learn more about Airflow, just browse the site: it lets us monitor runs, re-run them and inspect their details, but that is beyond the scope of this tutorial.

Here is my result in Airflow:
![My airflow](https://docs.google.com/uc?export=download&id=1GjmW4arrii7a3wx8NYzDrpaKRXCIP7YD)

#### DAG with Airflow

Let's try something more advanced with **Composer**: a DAG with several dependent tasks.

In [41]:
%%writefile training.py
import datetime
import logging

import airflow
from airflow.operators.python_operator import PythonOperator
from airflow.operators.bash_operator import BashOperator

from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

logger = logging.getLogger(__name__)

# When I first deployed this DAG to Composer, its runs always showed no_status.
# According to https://www.astronomer.io/blog/7-common-errors-to-check-when-debugging-airflow-dag/
# start_date should not be the current timestamp, so we start one day in the past.
now = datetime.datetime.now() - datetime.timedelta(days=1)

default_args = {
    'owner': 'lugq',
    'depends_on_past': False,
    'email': [''],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': datetime.timedelta(minutes=5),
    'start_date': now,
}


# Run the training once a day and do not backfill past intervals.
dag = airflow.DAG("training_sklearn", catchup=False, default_args=default_args, schedule_interval=datetime.timedelta(days=1))

x, y = load_iris(return_X_y=True)
lr = LogisticRegression()

# This is our training function; the Python task below calls it, so the training logic goes here.
def train_model():
    print("Start to train model")
    lr.fit(x, y)

    score = lr.score(x, y)
    print("Model test score: {}".format(score))


# Let's build the DAG with the DAG context manager.
with airflow.models.DAG("Combine_bash_python", default_args=default_args,
                        schedule_interval=datetime.timedelta(days=1)) as dag:
  echo_command = BashOperator(task_id='start_task', bash_command="echo Start training with echo")
  python_training = PythonOperator(task_id='training_task', python_callable=train_model)
  same_command = BashOperator(task_id='simulate_task', bash_command="echo This runs simultaneously with training")

  # Then we define the order of the steps: >> means "runs before", so both
  # python_training and same_command run after echo_command.
  echo_command >> [python_training, same_command]

# This line runs whenever Airflow parses the file, not when the training task finishes.
print("DAG file parsed.")

Overwriting training.py
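
The line `echo_command >> [python_training, same_command]` is what turns three separate tasks into a graph. To see how such an operator can record dependencies, here is a toy model in plain Python. It is a simplified illustration, not Airflow's implementation: the `Task` class is hypothetical and only keeps a list of downstream tasks.

```python
class Task:
    """A toy task that remembers which tasks must run after it."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = []

    def __rshift__(self, other):
        # Accept either a single task or a list of tasks on the right-hand side.
        targets = other if isinstance(other, list) else [other]
        for target in targets:
            self.downstream.append(target)
        # Return the right-hand side so that chains like a >> b >> c work.
        return other


start = Task("start_task")
training = Task("training_task")
simulate = Task("simulate_task")

# Fan out: both tasks depend on start_task and are independent of each other.
start >> [training, simulate]

print([task.task_id for task in start.downstream])
# prints ['training_task', 'simulate_task']
```

Because `training_task` and `simulate_task` have no edge between them, a scheduler is free to run them at the same time once `start_task` has finished.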


In [42]:
# so let's upload our file into the bucket folder: dags
! gsutil copy training.py gs://us-central1-first-composer-ca40dafb-bucket/dags/training.py

Copying file://training.py [Content-Type=text/x-python]...
/ [1 files][  1.9 KiB/  1.9 KiB]                                                
Operation completed over 1 objects/1.9 KiB.                                      


#### Result

You can see the result of the DAG in my Airflow web UI:
![DAG result](https://docs.google.com/uc?export=download&id=1GH7HRDHyTA_2AEXVIUhcfAbpoUBoJNgU)

### Last word

We now have our job scheduled with **Composer**. The last step is to delete our resources.

In [46]:
! gcloud composer environments delete first-composer --location us-central1

Deleting the following environments: 
 - [first-composer] in [us-central1]

Do you want to continue (Y/n)?  y

Delete in progress for environment [projects/cloudtutorial-279003/locations/us-central1/environments/first-composer] with operation [projects/cloudtutorial-279003/locations/us-central1/operations/03512721-fe13-43f5-8e30-a97b0b26d653].
ERROR: (gcloud.composer.environments.delete) Aborting wait for operation projects/cloudtutorial-279003/locations/us-central1/operations/03512721-fe13-43f5-8e30-a97b0b26d653.

