# How to use the Data Catalogue file

Originally created by Duncan Leng, 2021-11-16

Added to Data Science DaSH template by QA team, 2022-01-12

## Introduction

The data catalogue file (`/data_catalogue.ini`) consists of a set of key-value pairs that store parameters related to our data marts. These values are the same for every user in a team, but might change in the future as the team moves to different servers or systems.

By parameterising the parts of the database connections, the whole team can use the same connection strings and can switch to future systems easily.

As these values are the same for all users and are not sensitive, this file can be kept under version control. As new mart details are required, the data_catalogue will grow.

## Example /data_catalogue.ini

This is an example of the data_catalogue.ini. It consists of a series of sections, one for each data mart we connect to. In future the keys could grow to include other details we need about the data source.

```ini
[s3_connection_bucket_example]
bucket = dash-123456789-prod-s3-data-wip
cases_folder = PROJECT/NAME/source_data/cases
deaths_folder = PROJECT/NAME/source_data/deaths

[s3_connection_file_example]
path = s3://dash-123456789-prod-s3-data-wip/PROJECT/NAME/source_data/lookups/encodings.json

[db_connection_had]
database = HealthAnalysisDirectorate
server = sae-prd-mart-sql.database.windows.net

[db_connection_ref]
database = reference
server = sae-prd-mart-sql.database.windows.net
```

Over time, new key-value pairs might be added.


## Using in Python

We recommend using configparser: https://docs.python.org/3/library/configparser.html

To use the data_catalogue file you can do the following:

1. `import configparser`
2. make the config object
3. define the path to the data_catalogue file - this can be relative!
4. read in the data_catalogue file
5. access the data_catalogue 

As the data_catalogue file will always be in the same place in the repo for everyone, you can use relative paths to read in the file. 

```python
import configparser
import os

# make config object
data_catalogue = configparser.ConfigParser()

# define the relative path to the data_catalogue file
# (you might need to update this for your system)
path_to_data_catalogue = os.path.join(
    "..", "..", "data_catalogue", "data_catalogue.ini"
)

# read in the data_catalogue file
data_catalogue.read(path_to_data_catalogue)
```
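
One thing to be aware of: `ConfigParser.read` skips files it cannot find without raising an error, and returns the list of files it did manage to read. A wrong relative path therefore only shows up later as a `KeyError`. The sketch below is one possible guard against this; `load_data_catalogue` is a hypothetical helper name and is not part of the template.

```python
import configparser
from pathlib import Path


def load_data_catalogue(path):
    """Read an ini file and fail loudly if it could not be read.

    Hypothetical helper for illustration, not part of the template.
    """
    catalogue = configparser.ConfigParser()
    # read() returns the list of files it parsed successfully;
    # a missing file is skipped without raising
    files_read = catalogue.read(path)
    if not files_read:
        raise FileNotFoundError(
            f"Could not read data catalogue at: {Path(path).resolve()}"
        )
    return catalogue
```

It could then be used in place of the plain `read` call, for example `data_catalogue = load_data_catalogue(path_to_data_catalogue)`.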

The key-value pairs within the ini file are read in as a structure similar to a nested dictionary.
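
To make the "nested dictionary" comparison concrete, here is a small sketch that parses a fragment of the example file from a string (so it runs without the real file) and converts it to a plain nested `dict`. The variable names `example_ini` and `as_dict` are only illustrative.

```python
import configparser

# a fragment of the example file above, parsed from a string for illustration
example_ini = """
[db_connection_had]
database = HealthAnalysisDirectorate
server = sae-prd-mart-sql.database.windows.net
"""

data_catalogue = configparser.ConfigParser()
data_catalogue.read_string(example_ini)

# convert every section to a plain dict of {key: value}
as_dict = {
    section: dict(data_catalogue[section])
    for section in data_catalogue.sections()
}

print(as_dict["db_connection_had"]["database"])
# → HealthAnalysisDirectorate
```

Note that configparser returns every value as a string and, by default, lower-cases the keys within a section. For keys that might not exist yet, a section's `get` method accepts a `fallback` argument, for example `data_catalogue["db_connection_had"].get("port", fallback=None)`.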

Once read, you can access different elements of the configuration file like this:

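```python
# look up a single value by section and key
data_catalogue["s3_connection_bucket_example"]["bucket"]
```

```python
# list the names of all sections
data_catalogue.sections()
```

```python
# list the keys within one section
list(data_catalogue["db_connection_had"])
```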
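
As an illustration of how the parameterised values could be combined into the shared connection details mentioned in the introduction, the sketch below builds an S3 URI and an ODBC-style connection string from the example sections. The helper names `s3_uri` and `odbc_connection_string` are hypothetical, the driver name is an assumption, and authentication settings are deliberately left out; adapt these to whatever your team actually uses.

```python
import configparser

# a fragment of the example file, parsed from a string for illustration
example_ini = """
[s3_connection_bucket_example]
bucket = dash-123456789-prod-s3-data-wip
cases_folder = PROJECT/NAME/source_data/cases

[db_connection_had]
database = HealthAnalysisDirectorate
server = sae-prd-mart-sql.database.windows.net
"""

data_catalogue = configparser.ConfigParser()
data_catalogue.read_string(example_ini)


def s3_uri(section, folder_key):
    """Join the bucket and a folder key into an s3:// URI (hypothetical helper)."""
    return f"s3://{section['bucket']}/{section[folder_key]}"


def odbc_connection_string(section, driver="ODBC Driver 17 for SQL Server"):
    """Build an ODBC-style connection string (hypothetical helper).

    The default driver name is an assumption; authentication is omitted.
    """
    return (
        f"DRIVER={{{driver}}};"
        f"SERVER={section['server']};"
        f"DATABASE={section['database']}"
    )


print(s3_uri(data_catalogue["s3_connection_bucket_example"], "cases_folder"))
print(odbc_connection_string(data_catalogue["db_connection_had"]))
```

Because the server and database names come from the catalogue, moving to a new system should only require editing the ini file rather than every script that connects.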