### **9.2 - [Practice] Encrypting sensitive data with Fernet**

In this video we are going to discover how encryption works in Airflow. So far in the course, we have created many connections and variables without worrying about security or whether the data was safe. Time to change this, so let’s get started.

From your terminal, check that you are in the folder airflow-materials/airflow-section-9 and execute the script start.sh in order to start the Docker containers running Airflow.

As you can see, Docker is building a new Docker image. Since we are going to modify the packages installed

with Airflow in the Dockerfile, each time it gets changed, Docker will rebuild the image.

But what did I change actually?

Well, from your code editor, check that you are in the folder airflow-materials/airflow-section-9 and open the Dockerfile in docker/airflow. Then look for the pip install command where Airflow is installed.

Here, we have the usual packages: celery, postgres and ssh.

Okay, nothing special, except that I removed one package that is very important for the security of Airflow.

If you guessed which one, well done. Otherwise, let’s create a connection from the Airflow UI.

But first, go back to the terminal, and wait for the build to finish.

I’m going to pause the video right now and I will be back when it is done. Ok, the build is done. Type docker ps to check that the containers are running.

Ok, let’s create a connection.

Open your web browser, and type localhost:8080.

Enter.

From there, click on “Admin”, “Connections” and “Create”. Let’s name the connection “my_conn”.

Select HTTP for the connection type.

In login type “my_login”

and for the password

“my_password”.

Finally, in the extra field, type the following string: {"access_key": "my_key", "secret_key": "my_secret"}.

Click on “Save”. Ok the new connection has been created.
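A quick side note on the extra field: it is stored as a plain string, and it is only usable as structured data if it parses as JSON, which means straight double quotes rather than typographic ones. Here is a minimal sketch using Python's standard json module. The variable names are illustrative and do not come from the course files.

```python
import json

# The string typed into the extra field of the connection form.
extra = '{"access_key": "my_key", "secret_key": "my_secret"}'

# Parsing succeeds only when the string is valid JSON.
extra_fields = json.loads(extra)
print(extra_fields["access_key"])   # my_key
print(extra_fields["secret_key"])   # my_secret

# The same kind of text with typographic quotes is rejected by the parser.
bad_extra = '{ “access_key”: “my_key” }'
try:
    json.loads(bad_extra)
    is_valid = True
except json.JSONDecodeError:
    is_valid = False
print(is_valid)  # False
```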

Now if we click on it, the first thing to notice is that the password field looks empty.

Don’t worry it is still there, but not shown from the UI.

So let’s take a look at this connection in the metadatabase of Airflow.

Go back to your terminal and type the command “docker exec -it”, copy and paste the container id of postgres, then “psql -d airflow -U airflow”.

Basically, we are going to run the psql client and connect to the database “airflow” with the user “airflow”. Enter. Ok, now we are connected, type “\dt”, and we get the different tables used by Airflow such as dag, dag_run, job, slot_pool and so on.

In our case, we are going to focus on the table “connection”. Type “\d connection” to show the columns.

First we get conn_id, which identifies the connection.

This is the unique identifier that we specify when we need a connection, to a remote storage for example.

Then come host, schema, login, password and so on.

Next, execute the request “SELECT login, password, extra FROM connection WHERE conn_id='my_conn';” with single quotes around my_conn and the semicolon at the end.

Enter. And we have a big issue. As you can see here, neither the password nor the extra field is encrypted.

So anybody with access to the metadatabase can potentially steal your credentials.

Worse, go to the Airflow UI and create a new connection. Name it “postgres”, select the type Postgres, and set the host to “postgres”.

Type “airflow” for the schema, the login and the password.

Finally, set the port to 5432.

Check that you have the same values as mine and click on “Save”. Go to “Data Profiling” and “Ad Hoc Query”.

You should already know this view from previous videos, but as a quick reminder, Airflow lets you query any database connection saved in its metastore.

So, select the connection we just created, which is “postgres”, type the request “SELECT login, password, extra FROM connection WHERE conn_id='my_conn'” and click on “Run!”.

As you can see we obtain the password and the extra field in clear as well,

but this time directly from the UI which is even worse than from Postgres.

So how can we make things more secure?

First let’s deactivate this view. In my opinion, since Airflow is your orchestrator,

it should not be allowed to make requests to explore data.

There are other dedicated tools for this task so as a best practice I strongly advise you to turn off

this feature. Go to your code editor and open the file airflow.cfg in mnt/airflow. From there,

look for the parameter “secure_mode”.

When set to “True”, this parameter disables unsecure features like Charts and Ad Hoc Queries.

So replace “False” with “True”.

Like that. Save the file and go to your terminal.
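If you prefer to see this change as code rather than as a manual edit, here is a small sketch using Python's standard configparser. The configuration fragment is illustrative, and placing secure_mode under a [core] section is an assumption to check against your own airflow.cfg. The function name is hypothetical.

```python
import configparser

# Illustrative fragment of airflow.cfg; the [core] section is an assumption.
CFG_TEXT = """
[core]
secure_mode = False
"""


def enable_secure_mode(cfg_text: str) -> configparser.ConfigParser:
    """Parse a configuration text and switch secure_mode to True."""
    # interpolation=None keeps any '%' characters in the file untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    parser.set("core", "secure_mode", "True")
    return parser


parser = enable_secure_mode(CFG_TEXT)
print(parser.getboolean("core", "secure_mode"))  # True
```

The sketch only changes the value in memory. The file on disk still has to be saved and the web server restarted for the setting to take effect.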

Exit the docker container by hitting control-D. Then restart the webserver by executing “docker-compose -f docker-compose-CeleryExecutor.yml restart webserver”.

Enter.

Ok, back to the UI, if you refresh Airflow, and click on “Data Profiling”,

as you can see, Ad Hoc Queries and Charts are not accessible anymore.

Perfect. Now what about the passwords and extras in clear in connections?

Let’s go back to the connection panel.

As you can see on the right, there are two columns, is encrypted and is extra encrypted.

The first column means that a password value exists and is encrypted with a fernet key.

Don’t worry, I will come back to it in a minute.

Then, the second column indicates whether the extra field, where we put the JSON string, is encrypted.

If we look at the connection ‘my_conn’, the values in these columns indicate that the password and the extra field are not encrypted. Let’s fix this.

When you want to encrypt your sensitive data, the first thing you have to do is to install the package

crypto along with airflow.

So, from your code editor, open the Dockerfile and add the package ‘crypto’ to the instruction where the pip install of Airflow is done.

Like that. Save the file. Go to your terminal and restart the docker containers by executing restart.sh.

As you can see, since we added a new package to the Dockerfile, the image is rebuilt.

I’m going to pause the video right now and I will be back when it’s done. Ok, the build is done.

Now that the package crypto is installed,

the second step is to define a fernet key that will be used to encrypt the sensitive data. Back to your

code editor,

open the file airflow.cfg and look for the parameter “fernet_key”.

So what is a fernet key? Well, without diving too much into the details,

Fernet is a symmetric encryption method which makes sure that an encrypted value cannot be read or manipulated without the fernet key.

The key is a URL-safe base64-encoded 32-byte key, and each encrypted value carries the time at which it was encrypted. When a value needs to be encrypted, a Fernet object is instantiated based on that key and its encrypt method is called.
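To make this description concrete, here is a minimal sketch of a Fernet round trip. It assumes the third-party cryptography package is installed, the function name is hypothetical, and the position of the timestamp inside the encrypted value follows the public Fernet specification. It is not the code Airflow runs internally.

```python
import base64


def encrypt_and_inspect(plaintext: str):
    """Encrypt a value with a fresh Fernet key and inspect the result."""
    # Imported here so the sketch loads even where the package is missing;
    # running the function requires the third-party cryptography package.
    from cryptography.fernet import Fernet

    key = Fernet.generate_key()   # URL-safe base64 text wrapping 32 raw bytes
    fernet = Fernet(key)          # the Fernet object built from that key
    token = fernet.encrypt(plaintext.encode("utf-8"))

    raw = base64.urlsafe_b64decode(token)
    # Layout per the Fernet spec: 1 version byte, then an 8-byte big-endian
    # timestamp recording when the value was encrypted.
    created_at = int.from_bytes(raw[1:9], "big")

    decrypted = fernet.decrypt(token).decode("utf-8")
    return key, token, created_at, decrypted


if __name__ == "__main__":
    key, token, created_at, decrypted = encrypt_and_inspect("my_password")
    print(len(base64.urlsafe_b64decode(key)))  # 32
    print(decrypted)                           # my_password
```

Decrypting the value with a different key raises an error instead of returning data, which is what makes the stored value useless to someone who only has access to the database.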

So how can we generate a Fernet key?

Well, there is a little code snippet to execute. In your terminal, type “docker ps”, then “docker exec -it”, copy and paste the id of the web server, then “/bin/bash”.

Enter.

In your code editor, open the file generate_fernet_key in the folder docs, and copy

the command.

This command simply creates a Fernet object in order to generate a Fernet key based on the module cryptography.
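The content of generate_fernet_key is not reproduced here. To give an idea of what a Fernet key looks like, the following standard-library sketch builds a value with the same shape: 32 random bytes encoded as URL-safe base64. The function names are hypothetical, and in practice you would let the cryptography module generate the key, as the command from the course does.

```python
import base64
import os


def generate_key_like_fernet() -> str:
    """Return 32 random bytes encoded as URL-safe base64 (44 characters)."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


def looks_like_fernet_key(key: str) -> bool:
    """Check the shape of a key: URL-safe base64 that decodes to 32 bytes."""
    try:
        raw = base64.urlsafe_b64decode(key.encode("ascii"))
    except (ValueError, UnicodeEncodeError):
        return False
    return len(raw) == 32


key = generate_key_like_fernet()
print(len(key))                        # 44
print(looks_like_fernet_key(key))      # True
print(looks_like_fernet_key("short"))  # False
```

A check like looks_like_fernet_key can be handy before pasting a value into airflow.cfg, since a truncated copy and paste is an easy mistake to make.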

Ok, back to the terminal, paste the command and hit Enter.

As you can see here,

a new fernet key has been generated.

Copy the key, go back to your code editor, open the file airflow.cfg

and paste it. Like that.

Perfect, now everything is set up to start encrypting the data. Save the file.

Then in your terminal, exit the container by hitting control-d.

And restart the web server by executing “docker-compose -f docker-compose-CeleryExecutor.yml restart webserver”.

Enter.

Now the web server is restarted, type “docker ps”

then “docker exec -it”,

the container id of postgres,

“psql -d airflow -U airflow”.

Enter.

Execute the same request as before: “SELECT login, password, extra FROM connection WHERE conn_id='my_conn';”.

Enter. And the password is still in clear. Well, this is not what you expected, is it?

Actually, when you set a new fernet key, you have to edit the connections that were defined before the key was added.

Otherwise, the encryption is applied only to new connections.
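The reason is that encryption happens when a value is written, not when the key is configured. Here is a simplified model of that behaviour. The class and function names are hypothetical stand-ins and not Airflow's own code, and fernet_encryptor assumes the third-party cryptography package.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StoredConnection:
    """Hypothetical stand-in for one row of the connection table."""
    conn_id: str
    password: str = ""
    is_encrypted: bool = False


def save_password(row: StoredConnection, password: str,
                  encrypt: Optional[Callable[[str], str]]) -> None:
    """Write a password; encryption happens only here, at save time."""
    if encrypt is None:      # no fernet key configured
        row.password, row.is_encrypted = password, False
    else:                    # fernet key configured
        row.password, row.is_encrypted = encrypt(password), True


def fernet_encryptor(key: bytes) -> Callable[[str], str]:
    """Build an encrypt function; needs the cryptography package to run."""
    from cryptography.fernet import Fernet
    fernet = Fernet(key)
    return lambda value: fernet.encrypt(value.encode("utf-8")).decode("ascii")


row = StoredConnection("my_conn")
save_password(row, "my_password", encrypt=None)  # saved before the key existed
print(row.is_encrypted)  # False, and adding a key later does not touch the row
```

Calling save_password again with an encryptor built from the new key is the equivalent of editing the connection and saving it from the UI.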

Let’s fix this. From the Airflow UI, click on this little icon here in order to edit the connection. Then retype

the password

“my_password”

and click on “Save”.

Now,

back to your terminal,

if you execute the request again, as you can see, the password and the extra field have been encrypted

as expected.

Well done!

You have made your Airflow instance more secure by disabling unsecure features like Ad Hoc Queries or

Charts, and by encrypting your sensitive data with the Fernet encryption. All right, keep everything

running,

I have one more thing to show you about the Fernet key.

See you in the next video.
