<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#The-Catalog-Service-API" data-toc-modified-id="The-Catalog-Service-API-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>The Catalog Service API</a></span></li><li><span><a href="#Importing-the-configuration-file" data-toc-modified-id="Importing-the-configuration-file-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Importing the configuration file</a></span></li><li><span><a href="#Catalog-Module" data-toc-modified-id="Catalog-Module-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Catalog Module</a></span></li><li><span><a href="#Get-Datasets" data-toc-modified-id="Get-Datasets-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Get Datasets</a></span></li></ul></div>

In [1]:
import aepp
from aepp import catalog

# The Catalog Service API

The Catalog API is a service that helps you explore the catalog of your data lake.\
With the Catalog API, you can access the datasets and the batches that are ingested in your organization.\
Catalog is the system of record for data location and lineage within Adobe Experience Platform.\
Catalog Service does not contain the actual files or directories that hold the data. Instead, it holds the metadata and descriptions of those files and directories.

Catalog acts as a metadata store or "catalog" where you can find information about your data within Experience Platform.

Use Catalog to answer the following questions: 
* Where is my data located? 
* At what stage of processing is this data? 
* What systems or processes have acted on my data? 
* What errors occurred during processing? 
* If successful, how much data was processed?

# Importing the configuration file

A complete explanation of how to prepare the config file can be found in the first template of this series.\
To understand how the file used here is prepared, you can either read that template or the [getting started](https://github.com/adobe/aepp/blob/main/docs/getting-started.md) page of the aepp module on GitHub.

In [2]:
import aepp
prod = aepp.importConfigFile('myconfigFile.json',sandbox='prod',connectInstance=True)

# Catalog Module

Once your configuration is loaded, you can instantiate the `Catalog` class of the `catalog` module by passing the configuration through the `config` parameter.

In [3]:
from aepp import catalog

Each submodule has a class to instantiate in order to create the API connection with the service, in this case the Catalog API. The instantiation generates a token for the API connection and takes care of generating a new one when needed.\
It also connects you to the sandbox provided in the config file, or to the sandbox passed when importing the config file (as in this example, where `sandbox='prod'` was used).

In [4]:
myCatalog = catalog.Catalog(config=prod)

The class has several data attributes that can be useful to you:
* sandbox : the sandbox connected to this instance
* header : the request header, in case you want to copy it to another application (ex: Postman)
* data : some dictionaries that are populated once you have run the `getDataSets()` method
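As a sketch of the `header` use case, the helper below turns a header dictionary into `-H` flags for a `curl` command, which you could run in a terminal or import into Postman as raw cURL. The function name `header_to_curl` and the sample header values are hypothetical placeholders; with a live instance you would pass `myCatalog.header` instead.

```python
import shlex


def header_to_curl(header: dict, url: str) -> str:
    """Build a curl GET command that reuses the given HTTP headers.

    Hypothetical helper: with a live connection you would pass
    myCatalog.header instead of the sample dictionary below.
    """
    parts = ["curl", "-X", "GET", shlex.quote(url)]
    for key, value in header.items():
        # Quote each "Key: value" pair so values containing spaces survive the shell.
        parts.extend(["-H", shlex.quote(f"{key}: {value}")])
    return " ".join(parts)


# Placeholder values only; a real header carries a live bearer token.
sample_header = {
    "Authorization": "Bearer <token>",
    "x-api-key": "<client_id>",
    "x-gw-ims-org-id": "<org_id>",
    "x-sandbox-name": "prod",
}
print(header_to_curl(sample_header, "https://platform.adobe.io/data/foundation/catalog/dataSets"))
```

Keep in mind that the copied header contains an access token, so it stops working once that token expires.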

In [5]:
myCatalog.sandbox

'prod'

# Get Datasets

One use case for the Catalog API is to retrieve the list of datasets.\
You can do this with the `getDataSets` method.

In [7]:
mydatasets = myCatalog.getDataSets()

In [8]:
len(mydatasets)

155

In [10]:
type(mydatasets)

dict

As you can see, the Catalog API returns a dictionary where each key is a dataset ID and each value is the description of that dataset.\
This can be cumbersome to handle, because you usually do not know the ID of the dataset you are looking for.\
For that reason, the `data` attribute is automatically populated when the `getDataSets` method is executed.
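To illustrate the problem, here is a minimal sketch of what finding a dataset by name looks like without the `data` attribute: a linear scan over the raw response. It assumes each dataset description exposes a `name` field; the function `find_dataset_id` and the sample dictionary with its IDs are made up for illustration.

```python
from typing import Optional


def find_dataset_id(datasets: dict, name: str) -> Optional[str]:
    """Return the ID of the first dataset whose 'name' matches, else None."""
    for dataset_id, description in datasets.items():
        if description.get("name") == name:
            return dataset_id
    return None


# Hypothetical response shaped like getDataSets(): {dataset ID: description}.
raw = {
    "id_001": {"name": "web events"},
    "id_002": {"name": "crm profiles"},
}
print(find_dataset_id(raw, "crm profiles"))  # id_002
```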

The `data` attribute exposes 3 dictionaries, each keyed by dataset name:
* ids : maps the name of each dataset to its ID
* schema_ref : maps the name of each dataset to its schema reference
* table_names : maps the name of each dataset to its table name in Query Service
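The three dictionaries can be approximated from the raw response with dictionary comprehensions. This is a sketch, not aepp's own code: it assumes each dataset description has a `name`, a `schemaRef` object with an `id`, and a Query Service table name stored as a one-item list under `tags['adobe/pqs/table']`. Missing fields fall back to `None`, and the sample payload is hypothetical.

```python
def build_lookups(datasets: dict) -> dict:
    """Build name-keyed lookups similar to the ones exposed by myCatalog.data."""
    ids = {desc["name"]: ds_id for ds_id, desc in datasets.items()}
    schema_ref = {
        desc["name"]: desc.get("schemaRef", {}).get("id")
        for desc in datasets.values()
    }
    table_names = {
        # Assumed location of the Query Service table name (a one-item list).
        desc["name"]: (desc.get("tags", {}).get("adobe/pqs/table") or [None])[0]
        for desc in datasets.values()
    }
    return {"ids": ids, "schema_ref": schema_ref, "table_names": table_names}


# Hypothetical payload: the second dataset has no schema or table tag.
sample = {
    "id_001": {
        "name": "web events",
        "schemaRef": {"id": "https://ns.adobe.com/tenant/schemas/abc"},
        "tags": {"adobe/pqs/table": ["web_events"]},
    },
    "id_002": {"name": "crm profiles"},
}
lookups = build_lookups(sample)
print(lookups["ids"]["web events"])           # id_001
print(lookups["table_names"]["web events"])   # web_events
print(lookups["schema_ref"]["crm profiles"])  # None
```

Because the keys are dataset names, two datasets sharing the same name would collide; in this sketch the last one read wins.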

Knowing the name of a dataset, you can easily access its `id` with the following selector:

In [15]:
myCatalog.data.ids['datanalyst 1']

'6059fd4fc52f8819484a7c1c'

The same selector can be used with `schema_ref` or `table_names`.
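Because several datasets can be built on the same schema, inverting `schema_ref` is a handy way to list them. A minimal sketch: the function `datasets_by_schema` is hypothetical, and the stand-in dictionary mimics the shape of `myCatalog.data.schema_ref` with made-up names and schema IDs.

```python
from collections import defaultdict


def datasets_by_schema(schema_ref: dict) -> dict:
    """Invert a {dataset name: schema ID} mapping into {schema ID: [dataset names]}."""
    grouped = defaultdict(list)
    for dataset_name, schema_id in schema_ref.items():
        grouped[schema_id].append(dataset_name)
    return dict(grouped)


# Stand-in for myCatalog.data.schema_ref; names and schema IDs are hypothetical.
schema_ref = {
    "web events": "schema_a",
    "web events backup": "schema_a",
    "crm profiles": "schema_b",
}
print(datasets_by_schema(schema_ref)["schema_a"])  # ['web events', 'web events backup']
```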