# Generate a DomoStats style Dataset
In this tutorial we will investigate how to use Python skills to produce a DomoStats / DomoGovernance style dataset.

Generating the Dataset is just a case of getting your data into the shape that you want it.

Mentally (and physically) separate the act of retrieving data from an API from the act of restructuring that data for use.

## ▶️ how might we organize our project?

PROBLEM: "I want a dataset that shows me information about accounts"

CONVERT that into a statement of granularity.

PLAN the project.
- look at the data you receive and describe how it differs from what you want it to look like
- stub out some function names
- add a docstring for what each function should do
- add appropriate parameters and the expected response object
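
The planning steps above might produce a skeleton like the following. The function names mirror ones used later in this tutorial, but the signatures are simplified and the bodies are intentionally unimplemented; treat it as a sketch of what stubbing can look like, not a prescribed layout.

```python
from typing import List


def get_instance_accounts(session_token, domo_instance: str) -> List[dict]:
    """Retrieve the raw account data from the API (one dict per data provider)."""
    raise NotImplementedError


def process_account(account_obj: dict, data_provider_name: str) -> dict:
    """Reshape ONE account into ONE output row (the target granularity)."""
    raise NotImplementedError


def format_domostats_accounts(api_response: List[dict]) -> List[dict]:
    """Turn the full API response into a flat list of account rows."""
    raise NotImplementedError
```

Writing the docstrings first forces a decision about what each function receives and returns before any transformation logic exists.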

### utils

Because the `api_response` contains nested lists (each data provider holds its own list of accounts), we use a double list comprehension to flatten it into a single list of dictionaries.

In [66]:
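def flatten_list_of_lists(list_of_lists):
    """Flatten one level of nesting: [[a, b], [c]] -> [a, b, c]."""

    # The comprehension below is equivalent to this nested loop:
    #
    # gather = []
    # for ls in list_of_lists:
    #     for row in ls:
    #         gather.append(row)
    # return gather

    return [row for ls in list_of_lists for row in ls]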

flatten_list_of_lists([['a','b','c'],[1,2,3],['john','jacob','jingle']])

['a', 'b', 'c', 1, 2, 3, 'john', 'jacob', 'jingle']
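
The standard library offers an equivalent to the comprehension above. A minimal sketch using `itertools.chain.from_iterable`; the name `flatten_with_chain` is made up for this example.

```python
from itertools import chain


def flatten_with_chain(list_of_lists):
    """Flatten exactly one level of nesting, like the comprehension above."""
    return list(chain.from_iterable(list_of_lists))


nested = [["a", "b"], [1, 2], []]
flat = flatten_with_chain(nested)
print(flat)
```

Both approaches flatten one level only; a list nested more deeply would keep its inner lists.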

DEVELOPER_NOTE
- while it is common for APIs to return JSON with `camelCase` keys, in Python we will rewrite every key in `snake_case`.
- we can also take the liberty of renaming properties to something more user friendly

In [32]:
import re

def format_str_camel_case(text):
    """Convert a camelCase (or hyphenated / spaced) string to snake_case.

    Despite the function name, the output is snake_case,
    e.g. 'displayName' -> 'display_name'.
    """
    # adapted from https://www.w3resource.com/python-exercises/string/python-data-type-string-exercise-97.php
    # 1. replace hyphens with spaces
    # 2. insert a space before each run of capitals and each capitalized word
    # 3. split on whitespace, join the words with underscores, and lowercase the result

    return '_'.join(
        re.sub('([A-Z][a-z]+)', r' \1',
        re.sub('([A-Z]+)', r' \1',
        text.replace('-', ' '))).split()).lower()

print(format_str_camel_case("doesThisWork?"))
print(format_str_camel_case("what about This?"))

does_this_work?
what_about_this?
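
Renaming properties to something user friendly, as the developer note suggests, can be handled with a small lookup table. The sketch below is one possible approach; `RENAME_MAP`, `rename_keys`, and the sample row are made up for illustration.

```python
# Hypothetical mapping from API field names to friendlier names.
RENAME_MAP = {"id": "account_id", "displayName": "account_name"}


def rename_keys(obj: dict, rename_map: dict) -> dict:
    """Return a new dict with keys renamed per rename_map; other keys pass through."""
    return {rename_map.get(key, key): value for key, value in obj.items()}


# Made-up sample row, not real API output.
raw = {"id": 1, "displayName": "example account", "userId": "42"}
renamed = rename_keys(raw, RENAME_MAP)
print(renamed)
```

Keeping the mapping in one place makes it easy to review which fields were renamed, and the original dictionary is left unchanged.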


## SPOILER!! -- SOLUTION --

### for local development use python-dotenv

- the domolibrary is not available for use outside of DomoJupyter
- so, to handle authentication during local development, we will store credentials in a `.env` file

In [None]:
# %pip install python-dotenv

In [2]:
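from dotenv import load_dotenv
import os

# read key=value pairs from .env into os.environ; override=True replaces values that are already set
load_dotenv('.env', override=True)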

True
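
When a variable is missing from the `.env` file, `os.environ[...]` raises a bare `KeyError`. A small helper can fail with a clearer message instead. This is only a sketch; the name `require_env` and the demo variable are assumptions, not part of the tutorial's solution files.

```python
import os


def require_env(name: str) -> str:
    """Return the environment variable's value or raise a descriptive error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value


# Demonstration with a throwaway variable (not a real credential).
os.environ["EXAMPLE_DOMO_SETTING"] = "demo-value"
print(require_env("EXAMPLE_DOMO_SETTING"))
```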

### handle authentication

In [7]:
from solutions.get_full_auth_v1 import get_full_auth

domo_instance = 'domo-community'

# credentials come from the .env file loaded above
session_token = get_full_auth(domo_instance=domo_instance,
                              domo_password=os.environ['DOMO_PASSWORD'],
                              domo_username=os.environ['DOMO_USERNAME'],
                              return_raw=False
                              )



In [67]:
from solutions.get_accounts_v2 import get_accounts
from typing import List

def get_instance_accounts(session_token,
                          domo_instance,
                          debug_api: bool = False) -> List[dict]:
    """Retrieve the accounts visible to the session_token, grouped by data provider."""

    res = get_accounts(domo_instance=domo_instance, session_token=session_token, debug_api=debug_api)

    # the response body is a list of data providers, each with a nested `accounts` list
    account_ls = res.json()

    return account_ls

get_instance_accounts(session_token=session_token, domo_instance=domo_instance, debug_api=False)[0:1]

[{'key': 'abstract-credential-store',
  'name': 'Abstract Credential Store',
  'authenticationScheme': 'fields',
  'unassociatedDataSourceCount': 0,
  'accounts': [{'id': 71,
    'name': 'domo_creds',
    'userId': '1893952720',
    'displayName': 'DomoLibrary - testrename 2024-03-20',
    'type': 'data',
    'dataProviderType': 'abstract-credential-store',
    'valid': True,
    'dateOfExpiration': None,
    'dataSourceCount': 0,
    'daysToExpiry': None,
    'expired': None},
   {'id': 87,
    'name': 'Abstract Credential Store Account',
    'userId': '1893952720',
    'displayName': 'jw_creds',
    'type': 'data',
    'dataProviderType': 'abstract-credential-store',
    'valid': True,
    'dateOfExpiration': None,
    'dataSourceCount': 0,
    'daysToExpiry': None,
    'expired': None},
   {'id': 88,
    'name': 'fake_account',
    'userId': '1893952720',
    'displayName': 'fake_account',
    'type': 'data',
    'dataProviderType': 'abstract-credential-store',
    'valid': True,
  


Instead of writing complex nested `for` loops, we will build a function that handles the data transformation at row granularity.
This approach improves testability because we can check the output for a single row.

In [51]:
test_account_ls = get_instance_accounts(domo_instance=domo_instance,
                                        session_token=session_token)

# get_instance_accounts returns a list of account types, which we capture as account_ls
# each member of account_ls is a data_provider_type
# each data_provider_type has a list of accounts

test_data_provider_type = test_account_ls[0]
test_data_provider_type

{'key': 'abstract-credential-store',
 'name': 'Abstract Credential Store',
 'authenticationScheme': 'fields',
 'unassociatedDataSourceCount': 0,
 'accounts': [{'id': 71,
   'name': 'domo_creds',
   'userId': '1893952720',
   'displayName': 'DomoLibrary - testrename 2024-03-20',
   'type': 'data',
   'dataProviderType': 'abstract-credential-store',
   'valid': True,
   'dateOfExpiration': None,
   'dataSourceCount': 0,
   'daysToExpiry': None,
   'expired': None},
  {'id': 87,
   'name': 'Abstract Credential Store Account',
   'userId': '1893952720',
   'displayName': 'jw_creds',
   'type': 'data',
   'dataProviderType': 'abstract-credential-store',
   'valid': True,
   'dateOfExpiration': None,
   'dataSourceCount': 0,
   'daysToExpiry': None,
   'expired': None},
  {'id': 88,
   'name': 'fake_account',
   'userId': '1893952720',
   'displayName': 'fake_account',
   'type': 'data',
   'dataProviderType': 'abstract-credential-store',
   'valid': True,
   'dateOfExpiration': None,
   'da

In [52]:
test_account = test_data_provider_type['accounts'][0]
test_account

{'id': 71,
 'name': 'domo_creds',
 'userId': '1893952720',
 'displayName': 'DomoLibrary - testrename 2024-03-20',
 'type': 'data',
 'dataProviderType': 'abstract-credential-store',
 'valid': True,
 'dateOfExpiration': None,
 'dataSourceCount': 0,
 'daysToExpiry': None,
 'expired': None}

In [57]:
def process_account(account_obj, data_provider_name):
    """Reshape one account (the most granular level) into one output row."""

    # copy the account so the API response is not mutated, and attach the provider name
    s = {**account_obj, "data_provider_name": data_provider_name}

    # rename a field and remove the old field
    s["account_id"] = s.pop("id")
    s["account_name"] = s.pop("displayName")

    # remove fields we do not want in the dataset
    for key in ("name", "type", "daysToExpiry", "valid", "expired"):
        s.pop(key)

    # convert the remaining camelCase keys to snake_case
    return {format_str_camel_case(key): value for key, value in s.items()}


process_account(
    account_obj=test_account, data_provider_name=test_data_provider_type["name"]
)

{'user_id': '1893952720',
 'data_provider_type': 'abstract-credential-store',
 'date_of_expiration': None,
 'data_source_count': 0,
 'data_provider_name': 'Abstract Credential Store',
 'account_id': 71,
 'account_name': 'DomoLibrary - testrename 2024-03-20'}
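
Because the transformation works on one row, it can be checked without calling the API. The sketch below uses a simplified stand-in (`reshape_row`, a made-up name) and a hand-written sample row rather than `process_account` and real data; it illustrates that the copy-then-pop pattern leaves the input untouched.

```python
def reshape_row(account_obj: dict, data_provider_name: str) -> dict:
    """Simplified stand-in: copy the row, rename one field, drop another."""
    row = {**account_obj, "data_provider_name": data_provider_name}  # shallow copy
    row["account_id"] = row.pop("id")
    row.pop("type", None)  # tolerate rows where the field is absent
    return row


# Made-up sample row, not real API output.
sample = {"id": 1, "type": "data", "userId": "42"}
result = reshape_row(sample, "Example Provider")

assert result == {
    "userId": "42",
    "data_provider_name": "Example Provider",
    "account_id": 1,
}
assert sample == {"id": 1, "type": "data", "userId": "42"}  # input not mutated
```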

In [58]:
import pandas as pd


def process_data_provider(data_provider_obj: dict) -> List[dict]:
    """Receive one data_provider obj and return its accounts as a list of processed rows."""
    account_ls = data_provider_obj["accounts"]

    return [
        process_account(account_obj, data_provider_name=data_provider_obj["name"])
        for account_obj in account_ls
    ]


pd.DataFrame(process_data_provider(test_data_provider_type))[0:5]

| | user_id | data_provider_type | date_of_expiration | data_source_count | data_provider_name | account_id | account_name |
|---|---|---|---|---|---|---|---|
| 0 | 1893952720 | abstract-credential-store | | 0 | Abstract Credential Store | 71 | DomoLibrary - testrename 2024-03-20 |
| 1 | 1893952720 | abstract-credential-store | | 0 | Abstract Credential Store | 87 | jw_creds |
| 2 | 1893952720 | abstract-credential-store | | 0 | Abstract Credential Store | 88 | fake_account |
| 3 | 1893952720 | abstract-credential-store | | 0 | Abstract Credential Store | 92 | jw_username_password_auth |
| 4 | 1893952720 | abstract-credential-store | | 0 | Abstract Credential Store | 94 | my_domo_community_access_token - updated 2024-... |


In [None]:
def format_domostats_accounts(api_response, is_to_dataframe: bool = True):
    """Flatten the API response to one row per account."""

    accounts_by_providers_ls = [
        process_data_provider(data_provider_obj) for data_provider_obj in api_response
    ]  # produces nested list of lists

    accounts_ls = flatten_list_of_lists(accounts_by_providers_ls)

    if not is_to_dataframe:
        # return the flattened rows, not the nested per-provider lists
        return accounts_ls

    return pd.DataFrame(accounts_ls)

In [65]:
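from typing import List, Union

import pandas as pd


def generate_domostats_get_accounts(
    domo_instance,
    session_token,
    is_to_dataframe: bool = True,
    return_raw: bool = False,
    debug_api: bool = False,
) -> Union[pd.DataFrame, List[dict]]:
    """Retrieve accounts and format them as a DomoStats style dataset.

    return_raw=True returns the unmodified API response;
    is_to_dataframe=False returns a list instead of a DataFrame.
    """

    # step 1: retrieve (API call only)
    api_response = get_instance_accounts(
        domo_instance=domo_instance, session_token=session_token, debug_api=debug_api
    )

    if return_raw:
        return api_response

    # step 2: restructure (no API calls)
    return format_domostats_accounts(
        api_response=api_response, is_to_dataframe=is_to_dataframe
    )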

In [None]:
generate_domostats_get_accounts(
    domo_instance=domo_instance, session_token=session_token
)[0:5]

## 🧪 Extra Challenge

Notice that the `user_id` on each account is just an ID, not a name.

1. Construct a function, `get_user_by_id`, that retrieves user information
2. Create a function `format_account` that receives an account_obj and decorates it with additional information (like the user's display_name)
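
One possible shape for `format_account` is sketched below. It does not call any API: the user lookup is a plain dictionary standing in for whatever `get_user_by_id` would return, and the `user_display_name` field name and sample data are assumptions made for illustration.

```python
def format_account(account_row: dict, users_by_id: dict) -> dict:
    """Return a copy of account_row with the owner's display name added.

    users_by_id is assumed to map user_id -> display name; in practice it
    would be built from whatever get_user_by_id returns.
    """
    user_id = account_row.get("user_id")
    return {**account_row, "user_display_name": users_by_id.get(user_id)}


# Made-up sample data for illustration only.
users_by_id = {"100": "Example User"}
row = {"account_id": 1, "user_id": "100"}
print(format_account(row, users_by_id))
```

Using `dict.get` means an account owned by an unknown user still produces a row, with `None` as the display name, rather than raising an error.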

## ▶️ create a function main() that outputs the formatted DataFrame

1. create a function `main` that receives a domojupyter account_name, updates the dataset `YourInitials_MONIT_DomoAccount`, and returns the DataFrame

🎓 USE CASES TO CONSIDER
1. Recall that a `session_token` mimics the access rights and permissions of the user it is based on.  Under what circumstances would the list of account objects retrieved NOT represent every account object that exists in the instance?  How might you address that issue?

2. Recall that the base behavior in Domo when updating a dataset is a full REPLACE operation.  How would that impact your ability to track changes in account objects over time?  What steps might you take to modify your code to track history?

3. Notice that `get_accounts()` does not retrieve account configuration (that's a different API).  How might you approach building a dataset that monitors account configuration?
- Recall that you cannot see account secret fields in plain text unless you are in DomoJupyter.  What kind of workflow might you need to accurately see account configuration and build a dataset from it?
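
For the second use case, one option is to stamp each run's rows with a snapshot date and append them to previously stored rows instead of overwriting them. The sketch below is plain Python; `append_snapshot` and the `snapshot_date` column are assumptions, and how the history is stored and uploaded to Domo is deliberately left out.

```python
import datetime as dt
from typing import List


def append_snapshot(history: List[dict], current_rows: List[dict],
                    snapshot_date: dt.date) -> List[dict]:
    """Return history plus current_rows, each stamped with snapshot_date.

    Rows already stored for the same snapshot_date are dropped first so that
    re-running on the same day does not duplicate data.
    """
    stamp = snapshot_date.isoformat()
    kept = [row for row in history if row.get("snapshot_date") != stamp]
    stamped = [{**row, "snapshot_date": stamp} for row in current_rows]
    return kept + stamped


# Made-up rows for illustration only.
history = append_snapshot([], [{"account_id": 1}], dt.date(2024, 1, 1))
history = append_snapshot(history, [{"account_id": 1}], dt.date(2024, 1, 2))
history = append_snapshot(history, [{"account_id": 1}], dt.date(2024, 1, 2))  # same-day rerun
print(len(history))
```

The granularity of the resulting dataset becomes one row per account per snapshot date, so the statement of granularity for the project changes along with the code.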