<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Building-the-Solution-Design" data-toc-modified-id="Building-the-Solution-Design-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Building the Solution Design</a></span><ul class="toc-item"><li><span><a href="#Cleaning-the-elements" data-toc-modified-id="Cleaning-the-elements-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Cleaning the elements</a></span><ul class="toc-item"><li><span><a href="#Fixing-null-values" data-toc-modified-id="Fixing-null-values-1.1.1"><span class="toc-item-num">1.1.1&nbsp;&nbsp;</span>Fixing null values</a></span></li></ul></li><li><span><a href="#Fixing-metrics-table" data-toc-modified-id="Fixing-metrics-table-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Fixing metrics table</a></span></li><li><span><a href="#Metrics-and-dimensions-SDR" data-toc-modified-id="Metrics-and-dimensions-SDR-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Metrics and dimensions SDR</a></span><ul class="toc-item"><li><span><a href="#Optional-:-Concat-dataframe" data-toc-modified-id="Optional-:-Concat-dataframe-1.3.1"><span class="toc-item-num">1.3.1&nbsp;&nbsp;</span>Optional : Concat dataframe</a></span></li></ul></li><li><span><a href="#Saving-your-file" data-toc-modified-id="Saving-your-file-1.4"><span class="toc-item-num">1.4&nbsp;&nbsp;</span>Saving your file</a></span></li></ul></li><li><span><a href="#Connecting-with-AEP-Schema" data-toc-modified-id="Connecting-with-AEP-Schema-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Connecting with AEP Schema</a></span><ul class="toc-item"><li><span><a href="#Schema-Manager" data-toc-modified-id="Schema-Manager-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Schema Manager</a></span></li><li><span><a href="#Merging-SDR-with-Schema-definition" data-toc-modified-id="Merging-SDR-with-Schema-definition-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>Merging SDR with Schema definition</a></span></li><li><span><a 
href="#Saving-your-file" data-toc-modified-id="Saving-your-file-2.3"><span class="toc-item-num">2.3&nbsp;&nbsp;</span>Saving your file</a></span></li></ul></li></ul></div>

In very large organizations, many people may be working on Data Views at the same time.\
Your data views are the core representation of your data store.\
They are what your workspaces and reports query.\
Having a clear view of what has been defined in your data view is very important, and in the next steps we will create a script that extracts that information easily.

To build that view and document it, you can use the cjapy module to create a solution design for your analysts.

We first load cjapy and the configuration used.

In [None]:
import cjapy
cjapy.importConfigFile('myconfig.json')

Once we have done that, we can instantiate the connection to the CJA API via the `CJA` class.

In [None]:
cja = cjapy.CJA()

# Building the Solution Design

In order to build a solution design, you need a complete view of what has been set up in your data view.

From the `cja` connection that we have built, we will select the data view that we want to document and extract all of its dimensions and metrics.

In [None]:
dataviews = cja.getDataViews()

In [None]:
dataviews.head(2)

Selecting a data view by using its name

In [None]:
dv_id = dataviews.at[dataviews[dataviews['name']=='Adobe Store - Prod'].index[0],'id']
dv_id
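
The lookup above raises an `IndexError` when no data view carries that name. A minimal sketch of a more defensive lookup follows. It uses a small hand-made dataframe in place of the result of `getDataViews()`; the helper name `get_dataview_id` and the ids are hypothetical.

```python
import pandas as pd

# Toy stand-in for the dataframe returned by cja.getDataViews();
# the ids and the second name are made up for illustration.
dataviews_demo = pd.DataFrame({
    "id": ["dv_111", "dv_222"],
    "name": ["Adobe Store - Prod", "Adobe Store - Dev"],
})

def get_dataview_id(df, name):
    """Return the id of the data view called `name`, or raise a clear error."""
    matches = df.loc[df["name"] == name, "id"]
    if matches.empty:
        raise ValueError(f"No data view named {name!r}")
    if len(matches) > 1:
        raise ValueError(f"{len(matches)} data views are named {name!r}")
    return matches.iloc[0]

dv_id_demo = get_dataview_id(dataviews_demo, "Adobe Store - Prod")
print(dv_id_demo)
```

Failing early with an explicit message is usually easier to debug than an index error raised a few cells later.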

Now that we have its id, we can use it to retrieve the different components associated with it

In [None]:
dimensions = cja.getDimensions(dv_id,full=True)
metrics = cja.getMetrics(dv_id,full=True)

The methods return pandas dataframes, which are easy to manipulate and to save.

Using the `head` method, you can see the first 5 rows that have been returned.

In [None]:
dimensions.head()

You can see that you have lots of details about each of these dimensions.\
You can get the number of elements by using the `len()` function.

In [None]:
len(dimensions)

## Cleaning the elements

First of all, your data view may include default dimensions or metrics that are not important because they do not contain data.\
Removing them cleans up the table.\
This task is not glamorous, but it is crucial to understand and carry out before doing any sort of data analysis.\
We are using the simple task of creating a Solution Design to introduce some concepts such as:
* identifying null values
* cleaning null values

There are also some columns that you may not want to carry over into your solution design, so we will remove them as well.

### Fixing null values

Not a Number (NaN) or undefined (NA) values can break some basic conditional logic, so we clean them up first.

In [None]:
dimensions.isnull().sum()

In [None]:
dimensions['hasData'] = dimensions['hasData'].fillna(False) ## if no information, just place False as default
dimensions['derivedFieldCompatible'] = dimensions['derivedFieldCompatible'].fillna(False) ## if no information, just place False as default
dimensions['dataSetType'] = dimensions['dataSetType'].fillna("system")  ## if no information, just place "system" as default
dimensions['sourceFieldId'] = dimensions['sourceFieldId'].fillna("cja")  ## if no information, just place "cja" as default

Based on this information (`hasData`, `derivedFieldCompatible` and `dataSetType`), you can already do a good first round of filtering for your solution design.
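
The counting and filling steps can be tried out on a small hand-made dataframe. This is only a sketch: the rows below are made up, and passing a dictionary to `fillna` is an alternative to the column-by-column calls used above, not the method of this notebook.

```python
import pandas as pd

# Toy stand-in for a few columns of the dimensions dataframe (values are made up).
dims_demo = pd.DataFrame({
    "id": ["variables/a", "variables/b", "variables/c"],
    "hasData": [True, None, False],
    "dataSetType": ["event", None, None],
})

# Count missing values per column before cleaning.
null_counts = dims_demo.isnull().sum()
print(null_counts)

# One fillna call with a dict applies a different default to each column.
dims_clean = dims_demo.fillna({"hasData": False, "dataSetType": "system"})
# Make the flag a real boolean column so it can be used directly as a row mask.
dims_clean["hasData"] = dims_clean["hasData"].astype(bool)
print(dims_clean)
```

Converting `hasData` to a boolean dtype after filling keeps later filters such as `df[df['hasData']]` unambiguous.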

## Fixing metrics table

You can also look at random rows in your dataframe by using the `sample()` method; its argument gives the number of rows to return.

In [None]:
metrics.sample(4)

In [None]:
metrics.isnull().sum()

In [None]:
metrics['hasData'] = metrics['hasData'].fillna(False) ## if no information, just place False as default
metrics['derivedFieldCompatible'] = metrics['derivedFieldCompatible'].fillna(False) ## if no information, just place False as default
metrics['dataSetType'] = metrics['dataSetType'].fillna("system")  ## if no information, just place "system" as default
metrics['sourceFieldId'] = metrics['sourceFieldId'].fillna("cja")  ## if no information, just place "cja" as default

## Metrics and dimensions SDR

The Solution Design Reference based on the CJA implementation can be exported once we have restricted it to the columns we want to keep.\
You can select columns by placing their names in a list.\
If you want a copy of your dataframe, use the `copy()` method; it avoids modifying your original dataframe.

In our example here, we will only select attributes that we find important for the usage of that notebook.

In [None]:
dimensions_sdr = dimensions[dimensions['hasData']][['id','name','dataSetType','sourceFieldId']].copy() ## filtering for dimensions that contain data
metrics_sdr = metrics[metrics['hasData']][['id','name','dataSetType','sourceFieldId']].copy()## filtering for metrics that contain data
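
To illustrate why `copy()` matters, here is a small sketch on a hand-made dataframe (the rows are made up). It uses `.loc` to apply the row filter and the column list in one step, which is an equivalent alternative to the chained selection above.

```python
import pandas as pd

# Toy stand-in for the cleaned metrics dataframe (values are made up).
metrics_demo = pd.DataFrame({
    "id": ["metrics/orders", "metrics/visits", "metrics/unused"],
    "name": ["Orders", "Visits", "Unused"],
    "hasData": [True, True, False],
    "description": ["", "", ""],
})

keep_columns = ["id", "name"]

# .loc takes the row mask and the column list in a single step;
# .copy() detaches the result from the original dataframe.
metrics_demo_sdr = metrics_demo.loc[metrics_demo["hasData"], keep_columns].copy()

# Changing the copy leaves the original untouched.
metrics_demo_sdr["name"] = metrics_demo_sdr["name"].str.upper()
print(metrics_demo_sdr)
print(metrics_demo["name"].tolist())
```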

### Optional : Concat dataframe

You can combine two dataframes via the `concat()` function of the pandas module.

In [None]:
import pandas as pd ## using the pd alias

The `concat` function takes an iterable of dataframes and concatenates them.\
`ignore_index` resets the index of the result.

In [None]:
df_cja = pd.concat([dimensions_sdr,metrics_sdr],ignore_index=True)
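
Once the two tables are stacked, only the `id` prefix tells you whether a row is a dimension or a metric. A possible variation, sketched here on made-up rows, tags each dataframe before concatenating; the column name `component_type` is a hypothetical choice, not something returned by cjapy.

```python
import pandas as pd

# Toy stand-ins for dimensions_sdr and metrics_sdr (values are made up).
dims_part = pd.DataFrame({"id": ["variables/page", "variables/device"],
                          "name": ["Page", "Device"]})
mets_part = pd.DataFrame({"id": ["metrics/orders"], "name": ["Orders"]})

# assign() returns a new dataframe, so the inputs are not modified.
combined_demo = pd.concat(
    [dims_part.assign(component_type="dimension"),
     mets_part.assign(component_type="metric")],
    ignore_index=True,  # renumber rows 0..n-1 instead of keeping 0, 1, 0
)
print(combined_demo)
```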

In [None]:
df_cja.sample(5)

As you can see, the `sourceFieldId` column could be cleaned up further, as it provides information about the path used for data ingestion.\
In the interest of time, we will not cover that part, but note that lookup and profile datasets store the path with a prefix to avoid collisions.

## Saving your file

You can always save a dataframe as a CSV or as an Excel file.

In [None]:
df_cja.to_excel('my_sdr.xlsx',sheet_name='sdr_combine',index=False) ## to excel

In [None]:
df_cja.to_csv('my_sdr.csv',index=False) ## to csv

# Connecting with AEP Schema

Customer Journey Analytics loads its data from the datasets that are used in Adobe Experience Platform.\
Getting to know and understand the schema that is used to capture the data is important.\
To do that, you can always log in to Adobe Experience Platform via the UI, but you can also retrieve more useful information by using the `aepp` module.

The `aepp` module is divided into different services that can be used for analysing your Adobe Experience Platform implementation.\
In our scenario, we only need to load the `schema` and `schemamanager` submodules.

In [None]:
import aepp
from aepp import schema, schemamanager

Adobe Experience Platform is itself divided into different sandboxes.\
While loading the configuration, you can specify which sandbox you want to use.\
It is also recommended to store the configuration in a variable, which we will name `prod` because we are using the prod sandbox.\
Passing `True` to the `connectInstance` parameter returns the configuration so that we can store it.

In [None]:
prod = aepp.importConfigFile('myconfig.json',sandbox='bighouse',connectInstance=True)

You can then use the configuration to instantiate your connection to your schema for the `prod` sandbox.

In [None]:
schemaProd = schema.Schema(config=prod)

We will retrieve all schemas.

In [None]:
allSchemas = schemaProd.getSchemas()

Retrieving the schemas also fills lookup stores on the instance, which make it easy to find the schema ID to use:
* `schemaProd.data.schemas_altId`
* `schemaProd.data.schemas_id`

Using the name of our schema, we can easily extract its id: 

In [None]:
schemaProd.data.schemas_id['Adobestore']

## Schema Manager

We can use a native functionality of aepp to build a dataframe representation of the schema.\
Using the `SchemaManager` class simplifies the extraction of the fields.

In [None]:
adobeStore = schemamanager.SchemaManager(schemaProd.data.schemas_id['Adobestore'])

In [None]:
df_schema = adobeStore.to_dataframe(queryPath=True,excludeObjects=True)

You can see that the paths have been flattened and are provided in 2 columns:
* path : the flattened path, with extra notation for lists `[]` and arrays of objects `[]{}`
* querypath : the same path without the notation that indicates its type

The `excludeObjects` parameter, when set to `True`, hides the nodes that only serve as object nodes.

To map the path to the one displayed in CJA, we will use the query path.
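
To make the difference between the two notations concrete, here is a small sketch that derives a query path from an annotated path by stripping the `[]{}` and `[]` markers. It assumes the markers are the only difference between the two columns, as described above; the helper name `to_query_path` is hypothetical, and the second and third example paths are made up for illustration.

```python
def to_query_path(path: str) -> str:
    """Strip the `[]{}` and `[]` type markers from a flattened path."""
    # Remove the longer marker first so that no stray `{}` is left behind.
    return path.replace("[]{}", "").replace("[]", "")

examples = [
    "commerce.purchases.value",   # plain field, nothing to strip
    "productListItems[]{}.SKU",   # illustrative array of objects
    "tags[]",                     # illustrative list of simple values
]
for p in examples:
    print(p, "->", to_query_path(p))
```

In practice the `querypath` column already provides this value, so the function is only meant to show how the two columns relate.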

In [None]:
df_schema

In [None]:
len(df_schema) ## checking the size

## Merging SDR with Schema definition

Once you have the cleaned-up dataframe from the schema manager, you can merge it with the solution design.\
We will build a new dataframe in which a path is repeated when it is used by both a dimension and a metric.

In [None]:
from copy import deepcopy

In [None]:
new_dataframe = []
for index, row in df_schema.iterrows():
    data = {}
    flag_found = False
    for index_cja, row_cja in df_cja.iterrows():
        ## substring test: the XDM query path must appear inside the CJA component id
        if row['querypath'] in row_cja['id']:
            data['xdm_path'] = row['querypath']
            data['xdm_title'] = row['title']
            data['xdm_type'] = row['type']
            data['cja_id'] = row_cja['id']
            data['cja_name'] = row_cja['name']
            ## ids starting with "variables" are dimensions, anything else is treated as a metric
            data['cja_type'] = 'dimension' if row_cja['id'].startswith('variables') else 'metric'
            new_dataframe.append(deepcopy(data))
            flag_found = True
            data = {}
    if not flag_found:
        ## no CJA component uses this path: keep the XDM field with empty CJA columns
        data['xdm_path'] = row['querypath']
        data['xdm_title'] = row['title']
        data['xdm_type'] = row['type']
        data['cja_id'] = None
        data['cja_name'] = None
        data['cja_type'] = None
        new_dataframe.append(deepcopy(data))
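
Note that the `in` test is a plain substring match, so a path such as `device.type` also matches an id that contains `device.typeID`. A stricter check could require the path to start and end on a segment boundary. The sketch below is one possible way to do that, not the method used in this notebook; the helper name `path_in_component_id` is hypothetical and the ids are shortened, illustrative values.

```python
import re

def path_in_component_id(query_path: str, component_id: str) -> bool:
    """True if `query_path` appears in `component_id` as whole dot-separated segments."""
    # Reject matches that start or end in the middle of a field name.
    pattern = r"(?<![A-Za-z0-9_])" + re.escape(query_path) + r"(?![A-Za-z0-9_])"
    return re.search(pattern, component_id) is not None

# Plain substring test: matches even though the field is `typeID`, not `type`.
print("device.type" in "variables/device.typeID")                      # True
# Segment-aware test rejects the partial match and keeps the exact one.
print(path_in_component_id("device.type", "variables/device.typeID"))  # False
print(path_in_component_id("commerce.purchases.value",
                           "metrics/commerce.purchases.value"))        # True
```

This check still accepts ids in which the path is followed by further segments, so child fields of an object would continue to match their parent path.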

Transforming the new data object into a proper dataframe.

In [None]:
df_new = pd.DataFrame(new_dataframe)

We can show some variables that are defined in the XDM and their corresponding variable name in CJA.

In [31]:
df_new[df_new['cja_id'].astype(bool)].sample(3) ## extract

| | xdm_path | xdm_title | xdm_type | cja_id | cja_name | cja_type |
|---|---|---|---|---|---|---|
| 2569 | device.type | Type | string | variables/device.typeID.global-classify-string... | Audio Support | dimension |
| 2445 | _experience.decisioning.propositions | Involved Propositions | array | variables/66577f485e767a2c9e40b94d._experience... | Experience Name | dimension |
| 3480 | commerce.purchases.value | | number | metrics/commerce.purchases.value | Commerce Purchases | metric |


## Saving your file

You can always save the data contained in a dataframe by using the `to_csv()` method.

In [None]:
df_new.to_csv('my_sdr_schema.csv',index=False)