### **8.3 - [Practice] Setting up custom logging**

In the previous video, we discovered how the logging system in Airflow works.

In this video I’m going to show you the logging system in action and how you can customise it.

Let's get started.

First, check that you are under the folder airflow-materials/airflow-section-8 and start the docker containers running Airflow with the command docker-compose -f docker-compose-CeleryExecutor.yml up -d, then press Enter.

Ok, check that the containers are running with the command docker ps.

Perfect. Now we are going to see the logs produced by the scheduler. Type the command docker logs, then copy and paste the container id of the scheduler, and press Enter. Here you obtain the logs of the scheduler.

Nothing new, you should already be familiar with them.

Let’s move to the code editor. Open the file airflow.cfg in the folder airflow-materials/airflow-section-8/mnt/airflow.

Ok. From there, let me point out some important parameters.

First, the parameter base_log_folder specifies where the log files will be stored.

Then if you scroll down, we have the parameter logging_level, which is set to INFO by default.

Let's change this value to WARNING and save the file. From your terminal, execute the script restart.sh.

Now that Airflow is restarted, type docker ps, then docker logs with the container id of the scheduler.

As you can see, we now have far fewer logs since we only keep the messages of level WARNING and above.
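To make the effect of the level more concrete, here is a minimal sketch using only the Python logging module, which Airflow builds on. The logger name demo.scheduler and the messages are made up for the illustration; they are not real Airflow output.

```python
import io
import logging

# Collect the output in memory so we can inspect it afterwards.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s - %(message)s"))

# Hypothetical logger name, used only for this illustration.
logger = logging.getLogger("demo.scheduler")
logger.addHandler(handler)
logger.propagate = False  # keep the demo messages away from the root logger
logger.setLevel(logging.WARNING)  # comparable to switching INFO to WARNING

logger.info("Starting the scheduler")          # dropped: below WARNING
logger.warning("DAG file took too long")       # kept
logger.error("Could not reach the database")   # kept

output = stream.getvalue()
print(output)
```

Only the WARNING and ERROR lines end up in the output, which is why the scheduler logs get so much shorter.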

Okay, back to your code editor, change WARNING back to INFO as before.

Next, fab_logging_level defines the logging level of the Flask-AppBuilder UI. Since the webserver of Airflow is based on Flask and built using the Flask-AppBuilder framework, you could get additional logs depending on this level.

Nonetheless, keep the default value. Just below, there is the parameter logging_config_class. This parameter allows you to define the class describing your logging configuration.

Don't worry, I will come back to it in a minute.

Finally we have the parameters related to the log format and the log filename format.

These three lines are applied only if your terminal is a TTY, in which case the logs are displayed with fancy colours. The parameter log_format with the following string defines the format of your log.

So basically, the format given here with the date and time, the filename, the log level and so on, gives the log line that you have seen in the logs. In your terminal, if you scroll up, at some point you will see the following line corresponding to the format.

Ok, back to the code editor, let’s change the format.

The keywords that you can see here are actually LogRecord attributes provided by the Python logging module.

If you want the exhaustive list, please check the link below.

Okay, let's say we would like to know which process id produced the log as well as the name of the logger used to log the call. To do this, just after asctime, type the following string: [ %%(process)s - %%(name)s ].
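If you want to try this format outside of Airflow, here is a small sketch with the plain Python logging module. The format string is an assumption modelled on the one described in this video, and the logger name demo.settings is made up. Note that in airflow.cfg each % is written %% because the configuration file requires the percent sign to be escaped, whereas plain Python uses a single %.

```python
import io
import logging
import os

# Assumed format: the addition from the video placed right after asctime.
# airflow.cfg writes %% where plain Python uses a single %.
LOG_FORMAT = (
    "[%(asctime)s] [ %(process)s - %(name)s ] "
    "{%(filename)s:%(lineno)d} %(levelname)s - %(message)s"
)

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(LOG_FORMAT))

# Hypothetical logger name, used only for this illustration.
logger = logging.getLogger("demo.settings")
logger.addHandler(handler)
logger.propagate = False
logger.setLevel(logging.INFO)

logger.info("Configured default timezone")
line = stream.getvalue()
print(line)
```

The printed line contains the id of the current process followed by the logger name, in the same position as in the scheduler logs.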

Save the file and go back to your terminal. Execute the script restart.sh in order to apply the modifications.

All right, now that Airflow is running again, type docker ps, and docker logs with the container id of the scheduler.

As you can see here, we have the process id as well as the name of the logger, airflow.settings. By the way, notice that not all the logs are affected by the format we just defined. As shown by the logs above, those logs are system logs and so depend on another logger object.

Alright, we have seen the very basics of how to customise the logging system of Airflow, now it’s time to move to the advanced part. When you really want to customize the logging system of Airflow, you have to create a Python file describing the configuration you want.

This Python file holds the configuration as a dictionary, and its default content can be found in the repository of Airflow. Open your web browser and go to the following link: https://github.com/apache/airflow/blob/v1-10-stable/airflow/config_templates/airflow_local_settings.py.

This file looks pretty complicated but don’t worry I’m gonna explain everything you need step by step.

airflow_local_settings.py contains the formatters, the type of handler to use according to the logger object printing out the log event, the level of log messages you want and so on.

For example, the logger “airflow.task”, which is the one producing the logs related to the tasks, is defined with the handler “task” and the logging level LOG_LEVEL, which is INFO in our case.

Notice that LOG_LEVEL here refers to the logging_level parameter in airflow.cfg.

Now, if we check the handler “task” just above, we can see that the FileTaskHandler is used with the formatter “airflow”. The logs will be stored at the location defined by BASE_LOG_FOLDER with the filename templated with FILENAME_TEMPLATE, as defined in the airflow.cfg configuration file.
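To see how a formatter, a handler and a logger fit together in a dictionary, here is a reduced sketch in the same spirit. It is not the content of airflow_local_settings.py: the handler class is the standard logging.FileHandler used as a stand-in for Airflow's own task handler, the log folder is a temporary directory, and DEMO_LOGGING_CONFIG is a made-up name.

```python
import logging
import logging.config
import os
import tempfile

LOG_LEVEL = "INFO"
LOG_FORMAT = "[%(asctime)s] {%(filename)s:%(lineno)d} %(levelname)s - %(message)s"
BASE_LOG_FOLDER = tempfile.mkdtemp()  # stand-in for base_log_folder
LOG_FILE = os.path.join(BASE_LOG_FOLDER, "task.log")

# Hypothetical, reduced configuration: one formatter, one handler, one logger.
DEMO_LOGGING_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {"airflow": {"format": LOG_FORMAT}},
    "handlers": {
        "task": {
            # Standard-library handler, a stand-in for Airflow's task handler.
            "class": "logging.FileHandler",
            "formatter": "airflow",
            "filename": LOG_FILE,
        },
    },
    "loggers": {
        "airflow.task": {
            "handlers": ["task"],
            "level": LOG_LEVEL,
            "propagate": False,
        },
    },
}

logging.config.dictConfig(DEMO_LOGGING_CONFIG)
logging.getLogger("airflow.task").info("Task t1 started")

with open(LOG_FILE) as f:
    content = f.read()
print(content)
```

The logger points to the handler by name, and the handler points to the formatter by name, which is the same chain we just followed in the Airflow file.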

Let me quickly show you those values.

Here and here.

All right, last thing before moving forward, if we take a look at the formatter “airflow”, we can see that the LOG_FORMAT parameter we set earlier is used as well. As a best practice, you should use this file instead of the airflow.cfg file in order to customize the logging system of Airflow.

Ok, now it’s time to create your own custom configuration. First, copy the content of this file and go to your code editor.

From there, create a new file called log_config.py under the folder mnt/airflow/conf.

Like that.

Then paste the code you copied and save the file.

All right, several things to explain here.

When you want to customize the logging system of Airflow, you have to create a folder called “conf” at the default location /usr/local/airflow.

In this folder you have to put two files: log_config.py, corresponding to the custom configuration of the logging system, and __init__.py, which makes log_config.py importable.

Notice that the folder conf here is bound to the docker containers running Airflow.

Indeed, if you open the Docker Compose file docker-compose-CeleryExecutor.yml, the services running Airflow have the volume mnt/airflow/conf from the host bound to /usr/local/airflow/conf in their containers, as shown here.

Notice the PYTHONPATH set here, to tell Python where to find the log_config module.

Alright, now that everything is set up, open the file airflow.cfg and look for the parameter logging_config_class.

Here, you have to specify the class describing the logging configuration. In our case it’s log_config.DEFAULT_LOGGING_CONFIG.

Save the file.

Notice that this class is the big dictionary that you can find in the file log_config.py we created.

So back to the file look for DEFAULT_LOGGING_CONFIG,

and here it is with the formatters, handlers and loggers defined.
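To understand why both the PYTHONPATH and the value log_config.DEFAULT_LOGGING_CONFIG matter, here is a sketch of how a dotted path like this one can be resolved in Python. This is only an illustration, not the code Airflow runs: the resolve helper is hypothetical, and a temporary folder added to sys.path plays the role of the conf folder on the PYTHONPATH.

```python
import importlib
import os
import sys
import tempfile

# Create a throwaway "conf" folder holding a tiny log_config.py.
conf_dir = tempfile.mkdtemp()
with open(os.path.join(conf_dir, "log_config.py"), "w") as f:
    f.write("DEFAULT_LOGGING_CONFIG = {'version': 1, 'loggers': {}}\n")

# Adding the folder to sys.path plays the role of PYTHONPATH.
sys.path.insert(0, conf_dir)
importlib.invalidate_caches()


def resolve(dotted_path):
    """Split 'module.ATTRIBUTE' and return the attribute (hypothetical helper)."""
    module_name, attribute = dotted_path.rsplit(".", 1)
    module = importlib.import_module(module_name)
    return getattr(module, attribute)


config = resolve("log_config.DEFAULT_LOGGING_CONFIG")
print(type(config).__name__, config["version"])
```

If the folder is not on the path, the import of log_config fails, which is the kind of error you would look for in the scheduler logs when the custom configuration is not picked up.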

All right, let’s check if the custom configuration file is taken into account by Airflow.

Go to your terminal and execute the script restart.sh.

Now type docker ps and docker logs with the container id of the scheduler.

If you see the following line telling you that the user-defined logging config has been successfully imported, then well done!

It means that everything works.

Otherwise, rewatch the video to check whether you made any mistakes.

All right, I would like to show you one more thing before moving to the next video. From your web browser, open a new tab and go to localhost:8080.

Then, turn on the toggle of the DAG logger_dag so that the tasks start executing.

Now, in your code editor, open the folder “logs” from the left panel here. There are three folders in it.

“dag_processor_manager” contains the logs related to the processing of your DAGs by Airflow. Given a list of DAG definition files, the dag processor manager is in charge of parsing and analyzing your DAGs to see which tasks should run, creating the appropriate task instances in the database, recording any errors, killing any task instances belonging to DAGs that haven’t issued a heartbeat in a while, and so on.

If you open the file dag_processor_manager.log, you can see that the DAGs get processed at an interval of time defined by the parameter min_file_process_interval in airflow.cfg.

Okay, next, the folder “scheduler” contains the logs related to the scheduling of your tasks.

If we look for the task t1, which is a task of the DAG logger_dag, you can see when the task has been scheduled and triggered.

Finally, when you schedule a DAG, a folder with the same name as that DAG is created in the logs.

Then a folder is created for each task with its corresponding logs.

For example, if we click on logger_dag, t1, and the most recent execution date, then open the file 1.log, we obtain the output produced by the execution of the task t1. Notice that the 1 here corresponds to the try number of the task.

So here, t1 has been executed one time.

If the task had been tried a second time, we would also have a file 2.log for that second try.
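The folder layout we just browsed can be summed up with a small helper. This is a sketch of the layout described above, not Airflow's own code: the function task_log_path and the execution date used below are made up for the illustration.

```python
import os


def task_log_path(base_log_folder, dag_id, task_id, execution_date, try_number):
    """Build <base>/<dag_id>/<task_id>/<execution_date>/<try_number>.log."""
    return os.path.join(
        base_log_folder,
        dag_id,
        task_id,
        execution_date,
        "{}.log".format(try_number),
    )


# Hypothetical values, for illustration only.
execution_date = "2020-01-01T00:00:00+00:00"
first_try = task_log_path("logs", "logger_dag", "t1", execution_date, 1)
second_try = task_log_path("logs", "logger_dag", "t1", execution_date, 2)

print(first_try)
print(second_try)
```

Both tries of the same task instance live in the same execution date folder, and only the file name changes with the try number.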

Ok now, go back to the file log_config.py and scroll down until you reach DEFAULT_DAG_PARSING_LOGGING_CONFIG.

This dictionary allows you to customise the logs of the dag processor manager. In the file dag_processor_manager.log, we have the logs with level INFO. Let’s modify this. In log_config.py, change the level here from LOG_LEVEL to the string ‘CRITICAL’ and save the file. In your terminal, stop the containers by executing stop.sh.

Ok, now in your code editor, delete the file dag_processor_manager.log. Like that.

And start the docker containers again with start.sh.

All right, back to the code editor, the file dag_processor_manager.log has been recreated, but if you open it, this time it is empty.

Why?

Because thanks to the modification we made, we only keep the critical log messages and we don't have any yet.

That’s an example of how you can customize each logger of Airflow and set its own log level, which is not possible from the file airflow.cfg.
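The same idea can be reproduced with the plain Python logging module. In this sketch, the logger names demo.processor_manager and demo.task, the ListHandler class and the BASE_CONFIG dictionary are all made up for the illustration. We copy a base configuration, raise the level of one logger only, and check that the other logger is unaffected.

```python
import copy
import logging
import logging.config

records = []  # formatted log lines collected by the demo handler


class ListHandler(logging.Handler):
    """Hypothetical handler that stores formatted records in a list."""

    def emit(self, record):
        records.append(self.format(record))


BASE_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {"simple": {"format": "%(name)s %(levelname)s - %(message)s"}},
    "handlers": {"memory": {"()": ListHandler, "formatter": "simple"}},
    "loggers": {
        "demo.processor_manager": {
            "handlers": ["memory"], "level": "INFO", "propagate": False,
        },
        "demo.task": {
            "handlers": ["memory"], "level": "INFO", "propagate": False,
        },
    },
}

# Copy the base configuration and change the level of a single logger.
custom = copy.deepcopy(BASE_CONFIG)
custom["loggers"]["demo.processor_manager"]["level"] = "CRITICAL"
logging.config.dictConfig(custom)

logging.getLogger("demo.processor_manager").info("Processing DAG files")  # dropped
logging.getLogger("demo.task").info("Running task t1")                    # kept

print(records)
```

Working on a copy keeps the base dictionary intact, so reverting the change is as simple as applying the base configuration again.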

Okay so let's revert the modifications we made.

Save the file.

And restart the containers by executing restart.sh.

All right, that’s it for this long video, a lot of information was given here. In the next video we are going to discover how to write and read the Airflow logs with AWS S3.

Take a quick break and see you for the next video.

