### **8.2 - How the logging system works in Airflow**

In this video you are going to learn exactly how the logging system of Airflow works.

Let's begin with the basics.

The logging system of Airflow is based on the logging module of the Python standard library, which offers you a lot of

flexibility in terms of configuration.

Just to give you a little reminder, logs are described as the stream of aggregated, time-ordered events

collected from the output streams of all running processes and backing services.

As you know, Airflow relies on many components such as a web server, a scheduler and one to many workers.

Each component generates its own stream of logs which will be stored into a file by default by using

the logging module.

This module allows you to create a logger object which is in charge of obtaining the logs we want according

to a defined log level such as INFO, DEBUG, ERROR or WARNING.

Then, the logs are formatted according to the configuration set in the file airflow.cfg as we will see

later.

Finally, these logs are redirected to a specified destination depending on the Handler used. A Handler

is basically an object deciding what happens with the log.
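
To make the idea of log levels a bit more concrete, here is a minimal sketch using only the Python standard library. The logger name `demo.levels` is made up for this illustration and is not something Airflow defines.

```python
import logging

# Hypothetical logger name, used only for this illustration.
logger = logging.getLogger("demo.levels")
logger.setLevel(logging.INFO)

# Levels are ordered integers: DEBUG < INFO < WARNING < ERROR.
ordered = [logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR]

# A logger set to INFO accepts INFO and above, and drops DEBUG.
enabled = {logging.getLevelName(lvl): logger.isEnabledFor(lvl) for lvl in ordered}
print(enabled)
```

So the level you choose acts as a threshold: anything below it is dropped before it ever reaches a handler.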

There are many Handlers available but the most important ones are the FileHandler, StreamHandler and

NullHandler. The FileHandler writes output to a disk file. The StreamHandler writes output to a stream

like the standard output, and the NullHandler does nothing.

It is useful only for testing and development, so you should never use it unless you are a contributor

to Airflow. By default,

the FileHandler is set and the logs are stored at the destination specified in the parameter

base_log_folder in airflow.cfg.
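
Here is a small sketch of the three handlers just described, using plain Python logging rather than Airflow itself. The logger name, the temporary file and the in-memory buffer are assumptions chosen so the example can run anywhere.

```python
import io
import logging
import os
import tempfile

logger = logging.getLogger("demo.handlers")  # hypothetical name
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the demo output away from the root logger

# FileHandler: writes records to a file on disk.
log_path = os.path.join(tempfile.mkdtemp(), "demo.log")
file_handler = logging.FileHandler(log_path)

# StreamHandler: writes records to a stream (an in-memory buffer here,
# sys.stderr when no stream is given).
buffer = io.StringIO()
stream_handler = logging.StreamHandler(buffer)

# NullHandler: accepts records and discards them.
null_handler = logging.NullHandler()

for handler in (file_handler, stream_handler, null_handler):
    logger.addHandler(handler)

logger.info("hello from the handlers demo")
file_handler.close()  # release the file before reading it back

with open(log_path) as f:
    file_content = f.read()
stream_content = buffer.getvalue()
```

The same record ends up in the file and in the stream, while the NullHandler silently ignores it.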

It is possible to change the handler in order to customize your Airflow logging system. Just to give

you a better idea of how the logging system is created,

here are the different steps as shown directly from the code.

First, a logger object is fetched from the logging module.

Then the handler and formatter are instantiated.

Notice the parameter SIMPLE_LOG_FORMAT given to the formatter. This parameter

can be found in the airflow.cfg file and it allows you to specify how to format the output of your

logs, for example if you want to

add the time, the log level and so on.

Next, the formatter is set on the handler and the handler is added to the logger object.

Finally, the LOG_LEVEL parameter is applied to the logger in order to filter the logs according to the

level set such as INFO, WARNING, DEBUG and so on.
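
The steps above can be sketched roughly as follows. This is not the Airflow source code, only a plain standard-library version of the same sequence. The helper `build_logger`, the logger name, the format string and the use of a StreamHandler writing to a buffer instead of a FileHandler are all assumptions made for the illustration.

```python
import io
import logging

# Assumed values; in Airflow they would come from airflow.cfg.
SIMPLE_LOG_FORMAT = "%(asctime)s %(levelname)s - %(message)s"
LOG_LEVEL = "INFO"


def build_logger(name, stream):
    """Hypothetical helper mirroring the steps described above."""
    # 1. Fetch a logger object from the logging module.
    logger = logging.getLogger(name)
    # 2. Instantiate the handler and the formatter.
    handler = logging.StreamHandler(stream)
    formatter = logging.Formatter(SIMPLE_LOG_FORMAT)
    # 3. Set the formatter on the handler, then add the handler to the logger.
    handler.setFormatter(formatter)
    logger.addHandler(handler)
    # 4. Apply the log level so that lower-level records are filtered out.
    logger.setLevel(LOG_LEVEL)
    logger.propagate = False  # keep the demo output away from the root logger
    return logger


stream = io.StringIO()
logger = build_logger("demo.setup", stream)
logger.info("setup complete")
output = stream.getvalue()
print(output, end="")
```

The printed line starts with a timestamp, followed by the level and the message, which is what the format string asks for.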

Now that we have seen how the logging system is set up,

here is an overview of how it works.

So, the logger object is created and initialised as we have seen previously. When a component of Airflow wants

to print out a log,

it calls the logger object with the method corresponding to the log level of that log.

In this example, “dependencies all met” is an informational log so the method info is called.

If the LOG_LEVEL parameter in your airflow.cfg file is set to the level INFO then you will be

able to see that log.

Then, before the log gets stored at the default location /usr/local/airflow/logs by

the FileHandler, it is processed by the Formatter to apply the correct format.
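
As a rough sketch of that flow, again with plain Python logging and not Airflow's own code: a temporary directory stands in for the log folder, and the logger name, file name and format string are assumptions for the example.

```python
import logging
import os
import tempfile

# Stand-in for base_log_folder; a temporary directory is used so the sketch
# does not depend on /usr/local/airflow/logs existing.
log_dir = tempfile.mkdtemp()
log_file = os.path.join(log_dir, "task.log")  # hypothetical file name

logger = logging.getLogger("demo.flow")  # hypothetical name
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the demo output away from the root logger

handler = logging.FileHandler(log_file)
handler.setFormatter(logging.Formatter("%(levelname)s - %(message)s"))
logger.addHandler(handler)

logger.debug("not written: DEBUG is below the INFO threshold")
logger.info("dependencies all met")  # informational log, so info() is called

handler.close()  # release the file before reading it back
with open(log_file) as f:
    lines = f.read().splitlines()
print(lines)
```

Only the informational message reaches the file, already formatted, because the debug message is filtered out by the level.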

Okay.

Many handlers are available in Airflow. Thanks to them, we can write the logs at different locations such

as the standard output, AWS S3, Elasticsearch, GCS and so on.

Some of these handlers require a connection such as S3 and GCS.

Since these are external storage systems, the parameter REMOTE_LOG_CONN_ID

must be set to the connection for that system, and the parameter REMOTE_LOGGING must

be set to true as well.
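
To illustrate these two settings, here is a small sketch that parses a configuration fragment with Python's configparser and checks that both are set. The fragment is hypothetical: the connection id `my_s3_conn` is made up, the helper `remote_logging_ready` is not part of Airflow, and the section holding these options depends on your Airflow version, so `[core]` is only an assumption here.

```python
import configparser

# Hypothetical airflow.cfg fragment. The section name is an assumption and
# "my_s3_conn" is a made-up connection id.
CFG = """
[core]
remote_logging = True
remote_log_conn_id = my_s3_conn
"""

parser = configparser.ConfigParser()
parser.read_string(CFG)


def remote_logging_ready(cfg, section="core"):
    """Hypothetical check: remote logging is on and a connection id is set."""
    enabled = cfg.getboolean(section, "remote_logging", fallback=False)
    conn_id = cfg.get(section, "remote_log_conn_id", fallback="").strip()
    return enabled and bool(conn_id)


ready = remote_logging_ready(parser)
print(ready)
```

If either setting is missing, the check returns False, which matches the idea that both are needed for remote logging.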

Alright, now that you know how the logging system works,

it’s time to move on to the practice part. See you in the next video.
