# 🚀 Generate a DomoStats style Dataset

Arguably, this tutorial is an example of:

- using DomoJupyter for ETL
- using Domo as a replacement for custom connectors

Generating the Dataset is just a case of getting your data into the shape you want.

Mentally (and physically) separate the act of retrieving data from an API vs restructuring your data for use.


## ▶️ Where do we begin?

PROBLEM: "I want a dataset that shows me information about accounts"

CONVERT that into a statement of granularity: "what does one row of the dataset represent"

PLAN the project.

- look at the data you receive, and describe how it differs from what you want it to look like.
- stub out some function names
- add appropriate parameters and expected response object
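
The plan above might be stubbed out as follows. This is only a sketch: the function names match those used later in this notebook, but the signatures and docstrings shown here are assumptions until the real implementation is written. Note that retrieval and restructuring live in separate functions.

```python
from typing import Callable, List


def get_instance_accounts(session_token: str, domo_instance: str) -> List[dict]:
    """Retrieve: return the raw API response, one dict per data provider type."""
    raise NotImplementedError


def format_account(account_obj: dict, dataprovider_obj: dict) -> dict:
    """Restructure: return one row of the dataset (one row = one account)."""
    raise NotImplementedError


def format_domostats_accounts(
    api_response: List[dict], format_fn: Callable
) -> List[dict]:
    """Restructure: apply format_fn to every account in the API response."""
    raise NotImplementedError
```

Writing the granularity statement into the docstring ("one row = one account") keeps the goal visible while the body is still empty.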


In [1]:
### this nested dictionary represents one of many 'data_provider_type' objects, where each data_provider_type has multiple accounts associated with it

test = [
    {
        "key": "abstract-credential-store",
        "name": "Abstract Credential Store",
        "authenticationScheme": "fields",
        "unassociatedDataSourceCount": 0,
        "accounts": [
            {
                "id": 71,
                "name": "domo_creds",
                "userId": "1893952720",
                "displayName": "DomoLibrary - testrename 2024-03-20",
                "type": "data",
                "dataProviderType": "abstract-credential-store",
                "valid": True,
                "dateOfExpiration": None,
                "dataSourceCount": 0,
                "daysToExpiry": None,
                "expired": None,
            },
            {
                "id": 87,
                "name": "Abstract Credential Store Account",
                "userId": "1893952720",
                "displayName": "jw_creds",
                "type": "data",
                "dataProviderType": "abstract-credential-store",
                "valid": True,
                "dateOfExpiration": None,
                "dataSourceCount": 0,
                "daysToExpiry": None,
                "expired": None,
            },
            {
                "id": 88,
                "name": "fake_account",
                "userId": "1893952720",
                "displayName": "fake_account",
                "type": "data",
                "dataProviderType": "abstract-credential-store",
                "valid": True,
                "dateOfExpiration": None,
                "dataSourceCount": 0,
                "daysToExpiry": None,
                "expired": None,
            },
            {
                "id": 94,
                "name": "Abstract Credential Store Account",
                "userId": "1893952720",
                "displayName": "my_domo_community_access_token - updated 2024-02-23",
                "type": "data",
                "dataProviderType": "abstract-credential-store",
                "valid": True,
                "dateOfExpiration": None,
                "dataSourceCount": 0,
                "daysToExpiry": None,
                "expired": None,
            },
        ],
    }
]

### utils

Because the `api_response` contains nested lists, we need a nested (double) list comprehension to flatten it into a single list of dictionaries.


In [2]:
# add me to functions/utils.py

from typing import Any, List


def flatten_list_of_lists(list_of_lists) -> List[Any]:
    # the commented-out loop below is equivalent to the comprehension that follows it.

    # gather = []
    # for ls in list_of_lists:      # for each dataprovider_type in entire_list.
    #     for row in ls:            # for each account in dataprovider_type.
    #         gather.append(row)    # accumulate the account in a new list called gather

    gather = [row for ls in list_of_lists for row in ls]  # nested list comprehension
    return gather


flatten_list_of_lists([["a", "b", "c"], [1, 2, 3], ["john", "jacob", "jingle"]])

['a', 'b', 'c', 1, 2, 3, 'john', 'jacob', 'jingle']

DEVELOPER_NOTE

- While it is common for APIs to return JSON in `camelCase`, in Python we will rewrite everything in `snake_case`.
- We can also take the liberty of renaming properties to something more user friendly.
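
A minimal sketch of the renaming idea, assuming a hypothetical `RENAME_MAP` and helper `rename_keys` (neither is part of this notebook's code). It covers only the "rename to something friendly" half; it does not convert the remaining keys to `snake_case`.

```python
# Hypothetical mapping from API property names to friendlier column names.
RENAME_MAP = {"id": "account_id", "displayName": "account_name"}


def rename_keys(obj: dict, rename_map: dict) -> dict:
    """Return a copy of obj with keys renamed; unmapped keys are kept as-is."""
    return {rename_map.get(key, key): value for key, value in obj.items()}


print(rename_keys({"id": 71, "displayName": "jw_creds", "valid": True}, RENAME_MAP))
# {'account_id': 71, 'account_name': 'jw_creds', 'valid': True}
```

Keeping the mapping in one dictionary makes the renames easy to review and change without touching the function.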


In [3]:
import re


def format_str_camel_case(text):
    # https://www.w3resource.com/python-exercises/string/python-data-type-string-exercise-97.php
    # NOTE: despite its name, this function converts camelCase text to snake_case.
    # Replace hyphens with spaces, insert a space before each capitalized word,
    # split on whitespace, join the words with underscores, and lowercase the result.

    return "_".join(
        re.sub(
            "([A-Z][a-z]+)", r" \1", re.sub("([A-Z]+)", r" \1", text.replace("-", " "))
        ).split()
    ).lower()


print(format_str_camel_case("doesThisWork?"))
print(format_str_camel_case("what about This?"))

does_this_work?
what_about_this?


### make a new notebook for your solution in implementations/monit_accounts.ipynb


In [4]:
# move your code to functions/implementation/monit_accounts.py


def main():
    pass


main()

## 🚀 SOLUTION


### Step 1. Handle Authentication


In [5]:
from dotenv import load_dotenv

load_dotenv(".env")

True

In [6]:
from solutions.auth import get_session_token
import os

DOMO_INSTANCE = "domo-community"
domo_password = os.environ["DOMO_PASSWORD"]
domo_username = os.environ["DOMO_USERNAME"]


def get_instance_session_token(domo_username, domo_password, domo_instance):
    return get_session_token(
        domo_instance=domo_instance,
        domo_password=domo_password,
        domo_username=domo_username,
    )


test_session_token = get_instance_session_token(
    domo_username, domo_password=domo_password, domo_instance=DOMO_INSTANCE
)

### Step 2. Get Data


In [7]:
from solutions.accounts import get_accounts
from typing import List


def get_instance_accounts(
    session_token, domo_instance, debug_api: bool = False
) -> List[dict]:

    res = get_accounts(
        domo_instance=domo_instance, session_token=session_token, debug_api=debug_api
    )

    account_ls = res.response

    return account_ls


test_dataproviders_ls = get_instance_accounts(
    session_token=test_session_token, domo_instance=DOMO_INSTANCE, debug_api=False
)[0:1]

test_dataproviders_ls

[{'key': 'abstract-credential-store',
  'name': 'Abstract Credential Store',
  'authenticationScheme': 'fields',
  'unassociatedDataSourceCount': 0,
  'accounts': [{'id': 71,
    'name': 'domo_creds',
    'userId': '1893952720',
    'displayName': 'domolibrary test account - updated 2024-03-23',
    'type': 'data',
    'dataProviderType': 'abstract-credential-store',
    'valid': True,
    'dateOfExpiration': None,
    'dataSourceCount': 0,
    'daysToExpiry': None,
    'expired': None},
   {'id': 87,
    'name': 'Abstract Credential Store Account',
    'userId': '1893952720',
    'displayName': 'jw_creds',
    'type': 'data',
    'dataProviderType': 'abstract-credential-store',
    'valid': True,
    'dateOfExpiration': None,
    'dataSourceCount': 0,
    'daysToExpiry': None,
    'expired': None},
   {'id': 88,
    'name': 'fake_account',
    'userId': '1893952720',
    'displayName': 'fake_account',
    'type': 'data',
    'dataProviderType': 'abstract-credential-store',
    'valid'

Instead of complex nested `for` loops, we will build a function that handles data transformation at the row granularity.<br>
This approach improves testability because we can test the output of a single row.
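
To illustrate the testability claim, here is a sketch of a row-level test. `format_row` is a hypothetical, simplified formatter written only for this example; the same pattern would apply to whichever formatter you end up writing.

```python
def format_row(account_obj: dict, dataprovider_obj: dict) -> dict:
    """Hypothetical row-level formatter: one account in, one dataset row out."""
    return {
        "account_id": account_obj["id"],
        "account_name": account_obj["displayName"],
        "data_provider_name": dataprovider_obj["name"],
    }


def test_format_row():
    # One small, hand-written input is enough to check the row shape.
    row = format_row(
        account_obj={"id": 71, "displayName": "jw_creds"},
        dataprovider_obj={"name": "Abstract Credential Store"},
    )
    assert row == {
        "account_id": 71,
        "account_name": "jw_creds",
        "data_provider_name": "Abstract Credential Store",
    }


test_format_row()
```

Because the formatter takes plain dictionaries, the test needs no API call and no session token.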


In [8]:
"get_instance_accounts returns a list of account types which we'll capture as account_ls"
"each member of account_ls is a data_provider_type"
"each data_provider_type has a list of accounts"

test_dataprovider_obj = test_dataproviders_ls[0]
test_dataprovider_obj

{'key': 'abstract-credential-store',
 'name': 'Abstract Credential Store',
 'authenticationScheme': 'fields',
 'unassociatedDataSourceCount': 0,
 'accounts': [{'id': 71,
   'name': 'domo_creds',
   'userId': '1893952720',
   'displayName': 'domolibrary test account - updated 2024-03-23',
   'type': 'data',
   'dataProviderType': 'abstract-credential-store',
   'valid': True,
   'dateOfExpiration': None,
   'dataSourceCount': 0,
   'daysToExpiry': None,
   'expired': None},
  {'id': 87,
   'name': 'Abstract Credential Store Account',
   'userId': '1893952720',
   'displayName': 'jw_creds',
   'type': 'data',
   'dataProviderType': 'abstract-credential-store',
   'valid': True,
   'dateOfExpiration': None,
   'dataSourceCount': 0,
   'daysToExpiry': None,
   'expired': None},
  {'id': 88,
   'name': 'fake_account',
   'userId': '1893952720',
   'displayName': 'fake_account',
   'type': 'data',
   'dataProviderType': 'abstract-credential-store',
   'valid': True,
   'dateOfExpiration': No

In [9]:
test_account = test_dataprovider_obj["accounts"][0]
test_account

{'id': 71,
 'name': 'domo_creds',
 'userId': '1893952720',
 'displayName': 'domolibrary test account - updated 2024-03-23',
 'type': 'data',
 'dataProviderType': 'abstract-credential-store',
 'valid': True,
 'dateOfExpiration': None,
 'dataSourceCount': 0,
 'daysToExpiry': None,
 'expired': None}

### Step 3. Format Data


In [37]:
def format_account_v1(account_obj, dataprovider_obj, **kwargs):
    """most granular level"""

    s = {**account_obj, "data_provider_name": dataprovider_obj["name"]}

    # rename a field and remove the old field
    s["account_id"] = s.pop("id")
    s["account_name"] = s.pop("displayName")

    # remove fields
    s.pop("name")
    s.pop("type")
    s.pop("daysToExpiry")
    s.pop("valid")
    s.pop("expired")

    return {format_str_camel_case(key): value for key, value in s.items()}


format_account_v1(account_obj=test_account, dataprovider_obj=test_dataprovider_obj)

{'user_id': '1893952720',
 'data_provider_type': 'abstract-credential-store',
 'date_of_expiration': None,
 'data_source_count': 0,
 'data_provider_name': 'Abstract Credential Store',
 'account_id': 71,
 'account_name': 'domolibrary test account - updated 2024-03-23'}

In [38]:
def format_account_v2(account_obj, dataprovider_obj, **kwargs):
    """most granular level"""

    s = {**account_obj, "data_provider_name": dataprovider_obj["name"]}

    # rename a field and remove the old field
    s["account_id"] = s.pop("id")
    s["account_name"] = s.pop("displayName")

    # remove fields
    s.pop("name")
    s.pop("valid")

    return {format_str_camel_case(key): value for key, value in s.items()}


format_account_v2(account_obj=test_account, dataprovider_obj=test_dataprovider_obj)

{'user_id': '1893952720',
 'type': 'data',
 'data_provider_type': 'abstract-credential-store',
 'date_of_expiration': None,
 'data_source_count': 0,
 'days_to_expiry': None,
 'expired': None,
 'data_provider_name': 'Abstract Credential Store',
 'account_id': 71,
 'account_name': 'domolibrary test account - updated 2024-03-23'}

In [11]:
# %pip install pandas

In [36]:
import pandas as pd
from typing import Callable

def format_domostats_accounts(api_response, format_fn: Callable, is_dataframe: bool = True):

    account_ls = [
        format_fn(account_obj=account_obj, dataprovider_obj=dataprovider_obj)
        for dataprovider_obj in api_response
        for account_obj in dataprovider_obj["accounts"]
    ]  # produces a flat list of formatted rows (one dict per account)

    if not is_dataframe:
        return account_ls

    return pd.DataFrame(account_ls)


format_domostats_accounts(test_dataproviders_ls, format_fn=format_account_v1, is_dataframe=True)[0:5]

# passing functions into functions allows us to configure results without significantly refactoring code.
# notice that by passing the format function as an argument (instead of calling it outside of format_domostats_accounts),
# we can produce different permutations of the accounts report!
# any idea what adding kwargs does for us?

format_domostats_accounts(test_dataproviders_ls, format_fn=format_account_v2, is_dataframe=True)[0:5]

Unnamed: 0,user_id,type,data_provider_type,date_of_expiration,data_source_count,days_to_expiry,expired,data_provider_name,account_id,account_name
0,1893952720,data,abstract-credential-store,,0,,,Abstract Credential Store,71,domolibrary test account - updated 2024-03-23
1,1893952720,data,abstract-credential-store,,0,,,Abstract Credential Store,87,jw_creds
2,1893952720,data,abstract-credential-store,,0,,,Abstract Credential Store,88,fake_account
3,1893952720,data,abstract-credential-store,,0,,,Abstract Credential Store,92,jw_username_password_auth
4,1893952720,data,abstract-credential-store,,0,,,Abstract Credential Store,94,my_domo_community_access_token - updated 2024-...
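
One possible answer to the `**kwargs` question in the cell above: because `format_domostats_accounts` always calls `format_fn(account_obj=..., dataprovider_obj=...)`, a formatter that accepts `**kwargs` can ignore keyword arguments it does not need, and it keeps working if the caller later passes additional ones. A sketch, using a hypothetical `format_account_minimal`:

```python
def format_account_minimal(account_obj: dict, **kwargs) -> dict:
    """Hypothetical formatter that only needs the account itself.

    dataprovider_obj (and any future keyword argument) lands in kwargs
    and is ignored instead of raising a TypeError.
    """
    return {
        "account_id": account_obj["id"],
        "account_name": account_obj["displayName"],
    }


row = format_account_minimal(
    account_obj={"id": 71, "displayName": "jw_creds"},
    dataprovider_obj={"name": "Abstract Credential Store"},  # absorbed by **kwargs
)
print(row)
# {'account_id': 71, 'account_name': 'jw_creds'}
```

Without `**kwargs` in the signature, the same call would fail with an unexpected keyword argument error.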


In [44]:
def generate_monit_instance_accounts(
    session_token: str,
    domo_instance: str,
    format_fn: Callable,
    is_dataframe: bool = True,
    return_raw: bool = False,
    debug_api: bool = False,
) -> pd.DataFrame:  # returns a list instead when return_raw=True or is_dataframe=False

    api_response = get_instance_accounts(
        domo_instance=domo_instance, session_token=session_token, debug_api=debug_api
    )

    if return_raw:
        return api_response

    return format_domostats_accounts(
        api_response=api_response, is_dataframe=is_dataframe, format_fn=format_fn
    )

generate_monit_instance_accounts(
    domo_instance=DOMO_INSTANCE,
    session_token=test_session_token,
    format_fn=format_account_v1,
    is_dataframe=True,
)[0:5]

Unnamed: 0,user_id,data_provider_type,date_of_expiration,data_source_count,data_provider_name,account_id,account_name
0,1893952720,abstract-credential-store,,0,Abstract Credential Store,71,domolibrary test account - updated 2024-03-23
1,1893952720,abstract-credential-store,,0,Abstract Credential Store,87,jw_creds
2,1893952720,abstract-credential-store,,0,Abstract Credential Store,88,fake_account
3,1893952720,abstract-credential-store,,0,Abstract Credential Store,92,jw_username_password_auth
4,1893952720,abstract-credential-store,,0,Abstract Credential Store,94,my_domo_community_access_token - updated 2024-...


### 🎓 USE CASES TO CONSIDER

1. Recall that a `session_token` mimics the access rights and permissions of the user it is based on. Under what circumstances would the list of account objects retrieved NOT represent the entire list of account objects that exist in the instance? How might you address that issue?

2. Recall that the base behavior in Domo when updating datasets is a full REPLACE operation. How would that impact your ability to track changes over time in account objects? What steps might you take to modify your code to track history?

3. Notice that `get_accounts()` does not retrieve account configuration (that's a different API). How might you approach building a dataset that monitors account configuration?

- Recall that you cannot see account secret fields in plain text unless you are in DomoJupyter. What kind of workflow might you need to accurately see account configuration and build a dataset off of it?
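
For question 2, one possible starting point is to stamp every row with the time of the run, so that rows from successive runs can be told apart once they are accumulated. The sketch below assumes a hypothetical helper `add_snapshot_timestamp` and column name `snapshot_at`; how the rows are then appended rather than replaced depends on how the dataset is written and is not shown here.

```python
from datetime import datetime, timezone
from typing import List, Optional


def add_snapshot_timestamp(
    rows: List[dict], snapshot_at: Optional[datetime] = None
) -> List[dict]:
    """Return copies of rows with a snapshot_at column (hypothetical column name)."""
    snapshot_at = snapshot_at or datetime.now(timezone.utc)
    return [{**row, "snapshot_at": snapshot_at.isoformat()} for row in rows]


rows = add_snapshot_timestamp(
    [{"account_id": 71}, {"account_id": 87}],
    snapshot_at=datetime(2024, 3, 23, tzinfo=timezone.utc),  # fixed for the example
)
print(rows[0])
# {'account_id': 71, 'snapshot_at': '2024-03-23T00:00:00+00:00'}
```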


### 🧪 Extra Challenge

Notice that `user_id` is just an ID, not a name.

1. Construct a function, `get_user_by_id` that retrieves user information
2. Create a function, `format_account`, that receives an account_obj and adds decorator information (like the user display_name).
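
If you want a starting shape for the challenge, here is a rough sketch. `FAKE_USERS` and its contents are placeholders standing in for a real user API call, and the `get_user_fn` parameter is an assumption made so that the lookup can be swapped out; none of this is the notebook's own implementation.

```python
from typing import Callable

# Placeholder data standing in for a real user API; a real get_user_by_id would query Domo.
FAKE_USERS = {"1893952720": {"id": "1893952720", "display_name": "Example User"}}


def get_user_by_id(user_id: str) -> dict:
    """Hypothetical lookup; returns an empty dict when the user is unknown."""
    return FAKE_USERS.get(user_id, {})


def format_account(
    account_obj: dict, get_user_fn: Callable = get_user_by_id, **kwargs
) -> dict:
    """Return a copy of the account with the owner's display name added."""
    user = get_user_fn(account_obj["userId"])
    return {**account_obj, "user_display_name": user.get("display_name")}


print(format_account({"id": 71, "userId": "1893952720"}))
# {'id': 71, 'userId': '1893952720', 'user_display_name': 'Example User'}
```

Passing the lookup in as a function follows the same pattern as `format_fn` earlier: the formatter can be tested with a fake lookup and run later with a real one.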


In [None]:
import domojupyter as dj


def main(session_token, domo_instance):

    df = generate_monit_instance_accounts(
        domo_instance=domo_instance,
        session_token=session_token,
        format_fn=format_account_v1,
        is_dataframe=True,
    )

    dj.write_dataframe(df, "YOUR_DATASET_NAME_v1")

    df = generate_monit_instance_accounts(
        domo_instance=domo_instance,
        session_token=session_token,
        format_fn=format_account_v2,
        is_dataframe=True,
    )

    dj.write_dataframe(df, "YOUR_DATASET_NAME_v2")