# The environment explained

In this notebook we walk through our environment. The commands in this notebook can be executed in the Terminal tab of the Instruqt challenge.

## Components

In our environment we have several Docker containers available, each with a different role.

In a picture this is our environment:

<img src="images/environment.png" style="width:100%;height:600px;"/>

This is a set of Docker containers stitched together with docker-compose.

Within the Instruqt Terminal tab you can execute the command `docker ps` to see them running. The output looks like this (some columns omitted):

```
CONTAINER ID   IMAGE                                      PORTS                                                  NAMES
6207c5d00a08   datamesh_python                                                                                   datamesh_vm_python_1
779e6e00057e   azurite:3.15.0                             0.0.0.0:10000-10002->10000-10002/tcp                   datamesh_vm_azserver_1
d0f94cd5c8d9   datamesh_delta                             0.0.0.0:38080->8080/tcp                                datamesh_vm_delta_1
9e541b534128   fsouza/fake-gcs-server:1.37                4443/tcp, 0.0.0.0:4443->443/tcp                        datamesh_gcsserver_1
7fb0ec49b86f   jamesdbloom/mockserver:mockserver-5.11.1   0.0.0.0:1080-1081->1080-1081/tcp                       datamesh_mockserver_1
798b63f05e93   datamesh_sparkshell                                                                               datamesh_vm_sparkshell_1
d88a7a2abbd2   datamesh_sparkprepare                                                                             datamesh_vm_sparkprepare_1
d133e76d0f6c   datamesh_jupyter                           0.0.0.0:14040->4040/tcp, 0.0.0.0:18888->8888/tcp       datamesh_vm_jupyter_1
0d1d6016fa4e   localstack/localstack-full:0.12.15         5678/tcp, 0.0.0.0:4563-4584->4563-4584/tcp, 8080/tcp   datamesh_vm_s3server_1
```
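If you prefer to inspect the running containers from Python, the following is a minimal sketch. It assumes the `docker` CLI is on the `PATH` and uses the `--format` option of `docker ps` to print only the container names and published ports. The helper names `parse_ps` and `running_containers` are hypothetical and not part of the workshop.

```python
import subprocess


def parse_ps(output):
    """Map container name -> published ports from tab-separated `docker ps` output."""
    containers = {}
    for line in output.splitlines():
        if not line.strip():
            continue  # skip empty lines
        # Each line is "<name>\t<ports>"; ports may be empty.
        name, _, ports = line.partition("\t")
        containers[name.strip()] = ports.strip()
    return containers


def running_containers():
    """Run `docker ps` and return {name: ports}. Requires the docker CLI."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}\t{{.Ports}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_ps(result.stdout)
```

Calling `running_containers()` in the Terminal tab should return one entry per container listed above; containers without published ports map to an empty string.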

### S3server

A [localstack](https://localstack.cloud/) Docker container acting as Amazon S3 object storage.  
Our datasets are stored here in a bucket called `demodata`.

### azserver

An [azurite](https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azurite) Docker container acting as Azure Blob object storage.  
Our datasets are stored here in storage containers called `world` and `sales`.

### gcsserver

A [fake-gcs-server](https://github.com/fsouza/fake-gcs-server) Docker container acting as Google Cloud Storage.  
Our datasets are stored here in a bucket called `storage-bucket`.

### Sparkprepare

A Scala spark-shell with unrestricted S3 access, so it can interact directly with the datasets.  
This container is mainly used to prepare the datasets used in the workshop.  

It also has the Azure CLI installed, which is used to put copies of the S3 data in the Azure Blob storage.

At the end of the workshop this container can be used to update or transform the data without going through the Delta Sharing server.

### Jupyter

A JupyterLab notebook server on which the notebooks are running.  


### Sparkshell

A Scala spark-shell configured to interact with the Delta Sharing server.  
It has no direct S3/Azure access, so datasets can only be loaded through the Delta Sharing server.

### Delta

A [Delta Sharing](https://delta.io/sharing/) server.

### Python

A Python container configured to interact with the Delta Sharing server.
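Delta Sharing clients locate the server through a small JSON profile file and address a table as `<profile-file>#<share>.<schema>.<table>`. The sketch below builds both using only the standard library. The endpoint, bearer token, share name, and schema/table names are placeholders chosen for illustration, not the values configured in this workshop.

```python
import json
import tempfile
from pathlib import Path


def write_profile(path, endpoint, bearer_token):
    """Write a Delta Sharing profile file and return its path."""
    profile = {
        "shareCredentialsVersion": 1,
        "endpoint": endpoint,
        "bearerToken": bearer_token,
    }
    path = Path(path)
    path.write_text(json.dumps(profile, indent=2))
    return path


def table_url(profile_path, share, schema, table):
    """Build the '<profile>#<share>.<schema>.<table>' URL used by the clients."""
    return f"{profile_path}#{share}.{schema}.{table}"


# Assumed values: replace them with the ones from your own configuration.
profile_path = write_profile(
    Path(tempfile.mkdtemp()) / "workshop.share",
    endpoint="http://delta:8080/delta-sharing",  # hypothetical endpoint
    bearer_token="<token>",                      # hypothetical token
)
url = table_url(profile_path, "demo_share", "world", "cities")  # hypothetical names
print(url)
```

With the `delta-sharing` Python package installed, a URL of this shape can be passed to `delta_sharing.load_as_pandas(url)` to load the table as a pandas DataFrame.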

## Available datasets

We have two datasets stored in S3 and Azure Blob storage:

- cities: A dataset with the world's cities and their properties
- sales: A sample sales dataset with orders and order lines


### Cities
Raw data available at: `s3://demodata/rawdata/world/cities/sample.csv`  
Data is transformed into Delta Lake format and available at: `s3://demodata/silver/world/cities` and `wasbs://world@devstoreaccount1.blob.azserver:10000/cities/cities`
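To make the structure of these locations explicit, here is a small sketch that splits an `s3://` or `wasbs://` URI into its parts with the standard library. In an S3 URI the host part is the bucket; in a `wasbs` URI the part before the `@` is the storage container and the part after it is the account endpoint. The function name `describe_location` is hypothetical.

```python
from urllib.parse import urlparse


def describe_location(uri):
    """Split a dataset URI into store type, bucket/container and path."""
    parts = urlparse(uri)
    path = parts.path.lstrip("/")
    if parts.scheme == "s3":
        # s3://<bucket>/<path>
        return {"store": "s3", "bucket": parts.netloc, "path": path}
    if parts.scheme in ("wasb", "wasbs"):
        # wasbs://<container>@<account endpoint>[:port]/<path>
        return {
            "store": "azure-blob",
            "container": parts.username,
            "host": parts.hostname,
            "port": parts.port,
            "path": path,
        }
    raise ValueError(f"unsupported scheme: {parts.scheme!r}")


for uri in (
    "s3://demodata/silver/world/cities",
    "wasbs://world@devstoreaccount1.blob.azserver:10000/cities/cities",
):
    print(describe_location(uri))
```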

### Sales

Raw data available at: `s3://demodata/rawdata/sales/sample.csv`  
Data is transformed into Delta Lake format and available at: `s3://demodata/silver/sales` and `wasbs://sales@devstoreaccount1.blob.azserver:10000/sales`  
In addition to the format change, the dataset is also partitioned by `year`, `month` and `day`, based on the `orderdate` field.
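As an illustration of what this partitioning means on disk, the sketch below derives a partition directory from an order date. It assumes the common Hive-style `column=value` directory layout with unpadded integer values; the actual directory names in the workshop data may differ.

```python
from datetime import date


def partition_dir(order_date):
    """Return the assumed partition directory for a given order date."""
    return (
        f"year={order_date.year}/"
        f"month={order_date.month}/"
        f"day={order_date.day}"
    )


# An order placed on 5 March 2021 would land in this (assumed) directory.
print(partition_dir(date(2021, 3, 5)))
```

Partitioning this way lets a query that filters on `year`, `month` or `day` skip the directories that cannot contain matching rows.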

# Even more in depth?

Want to know how the Docker containers are configured to work together?  
In the final challenge we have documented the configuration of each Docker container in the `COMPOSE_README.md` file.