# Generate a DomoStats style Dataset

Generating the Dataset is just a case of getting your data into the shape you want.

Developer Tip
- separate the act of retrieving data from the act of restructuring your data for use
- to have recyclable code, consider moving functions into a separate .py file in a subfolder.
- notice the subfolder should contain an `__init__.py` file so that Python treats it as a package you can import from
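
To make the subfolder tip concrete, here is a minimal sketch that builds a throwaway package in a temporary folder and imports a helper from it. The names `demo_functions`, `demo_utils` and `greet` are hypothetical and exist only for this illustration; in the notebook the equivalent folder is `functions`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway project folder so the example is self-contained.
project_dir = Path(tempfile.mkdtemp())
package_dir = project_dir / "demo_functions"  # hypothetical subfolder name
package_dir.mkdir()

# The (possibly empty) __init__.py marks the folder as a regular package.
(package_dir / "__init__.py").write_text("")

# A helper module that lives inside the package.
(package_dir / "demo_utils.py").write_text(
    "def greet(name):\n"
    "    return f'hello {name}'\n"
)

# Make the project folder importable, then import the module from the subfolder.
sys.path.insert(0, str(project_dir))
importlib.invalidate_caches()
demo_utils = importlib.import_module("demo_functions.demo_utils")

print(demo_utils.greet("domo"))  # hello domo
```

In a notebook the working directory is usually already on `sys.path`, so a plain `import functions.utils as utils` is enough once the folder layout is in place.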

# Domo Authentication

In [1]:
import functions.utils as utils

creds = utils.get_account_credentials("username_password_auth", is_abstract_account=True)
domo_instance = creds['DOMO_INSTANCE']

In [2]:
import functions.auth as auth

session_token = auth.get_full_auth(domo_username = creds['DOMO_USERNAME'],
                              domo_password = creds['DOMO_PASSWORD'],
                              domo_instance = creds['DOMO_INSTANCE'] )
assert session_token

# Interact with Domo

In [3]:
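import requests
from typing import List

class DomoAPIRequestError(Exception):
    """raised when a Domo API request returns an unsuccessful status code (defined here because it is not imported above)"""

def get_accounts(domo_instance, session_token) -> List[dict]:
    """retrieves a list of accounts this user has access to"""
    
    headers = {"x-domo-authentication": session_token}    
    url = f'https://{domo_instance}.domo.com/api/data/v2/datasources/providers'
    
    res = requests.request(method = "GET",
                           url = url,
                           headers = headers
                          )
    
    # check the status before parsing; an error response may not contain valid JSON
    if not res.ok:
        raise DomoAPIRequestError(f"{res.status_code} - {res.text}")
        
    return res.json()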

get_accounts(domo_instance = domo_instance, session_token = session_token)[0:1]

[{'key': 'abstract-credential-store',
  'name': 'Abstract Credential Store',
  'authenticationScheme': 'fields',
  'unassociatedDataSourceCount': 0,
  'accounts': [{'id': 71,
    'name': 'domo_creds',
    'userId': '1893952720',
    'displayName': 'domo_creds - update 2024-02-21',
    'type': 'data',
    'dataProviderType': 'abstract-credential-store',
    'valid': True,
    'dateOfExpiration': None,
    'dataSourceCount': 0,
    'daysToExpiry': None,
    'expired': None},
   {'id': 87,
    'name': 'Abstract Credential Store Account',
    'userId': '1893952720',
    'displayName': 'jw_creds',
    'type': 'data',
    'dataProviderType': 'abstract-credential-store',
    'valid': True,
    'dateOfExpiration': None,
    'dataSourceCount': 0,
    'daysToExpiry': None,
    'expired': None},
   {'id': 88,
    'name': 'fake_account',
    'userId': '1893952720',
    'displayName': 'fake_account',
    'type': 'data',
    'dataProviderType': 'abstract-credential-store',
    'valid': True,
    'da

# How should we begin?
Look at what the API function `get_accounts` gives you: a `List[dict]` where each dict represents a `dataProviderType`, and within each `dataProviderType` there is a list of `accounts` that belong to that data provider type.

💡 BIG TIP!  Express your Target granularity
"We want one row for each account"

DEVELOPER_NOTE
- while it is common for APIs to return JSON in `camelCase`, in our code we will rewrite everything in `snake_case`.
- we can also take the liberty of renaming properties to something user friendly
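
The target granularity stated above, one row per account, can also be written down as a check before any transformation code exists. The sketch below uses a hypothetical, trimmed-down `sample_response` shaped like the `get_accounts` output; treating the account `id` as unique across providers is an assumption about the API, not something the response guarantees.

```python
# Hypothetical, trimmed-down sample shaped like the get_accounts response.
sample_response = [
    {"key": "abstract-credential-store", "accounts": [{"id": 71}, {"id": 87}]},
    {"key": "dataset-copy", "accounts": [{"id": 1}]},
]

# "One row for each account" means the final row count equals the total number of accounts.
expected_row_count = sum(len(provider["accounts"]) for provider in sample_response)

# Assumption: the account id identifies a row uniquely at this granularity.
account_ids = [
    account["id"]
    for provider in sample_response
    for account in provider["accounts"]
]
ids_are_unique = len(set(account_ids)) == len(account_ids)

print(expected_row_count)  # 3
print(ids_are_unique)      # True
```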

## utils

In [4]:
import re

def format_str_camel_case(text):
    """converts a camelCase (or hyphenated) string to snake_case"""
    # https://www.w3resource.com/python-exercises/string/python-data-type-string-exercise-97.php
    # Replace hyphens with spaces, then use regular expression substitutions to put a space before each capitalized word,
    # join the words with underscores, and finally convert the result to lowercase

    return '_'.join(
        re.sub('([A-Z][a-z]+)', r' \1',
        re.sub('([A-Z]+)', r' \1',
        text.replace('-', ' '))).split()).lower() 

print(format_str_camel_case("doesThisWork?"))
print(format_str_camel_case("what about This?"))

does_this_work?
what_about_this?
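
As a point of comparison, a shorter converter can be sketched with a single look-ahead regex. `camel_to_snake` is a hypothetical name, not part of the notebook, and it is only meant for simple camelCase keys like the ones in this API response: it splits before every capital letter, so an acronym such as `userID` would become `user_i_d`.

```python
import re

def camel_to_snake(text: str) -> str:
    """Insert an underscore before each uppercase letter (except at the start), then lowercase."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", text).lower()

for key in ["userId", "dataProviderType", "daysToExpiry"]:
    print(key, "->", camel_to_snake(key))

# userId -> user_id
# dataProviderType -> data_provider_type
# daysToExpiry -> days_to_expiry
```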


## process API Response
Instead of a bunch of nested for loops, we will build a function that handles the data transformation at row granularity.<br>
This approach improves testability because we can test the output for a single row.
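
To illustrate the testability point, here is a minimal sketch of a row-level transform with a test around it. `to_account_row` is a hypothetical, simplified stand-in for the function built later in this notebook; unlike that function, it returns a new dict instead of modifying its input.

```python
def to_account_row(account_obj: dict, data_provider_name: str) -> dict:
    """Hypothetical row-level transform: returns a new dict and leaves the input untouched."""
    return {
        "account_id": account_obj["id"],
        "account_name": account_obj["displayName"],
        "user_id": account_obj["userId"],
        "data_provider_name": data_provider_name,
    }

def test_to_account_row():
    raw = {"id": 71, "displayName": "domo_creds", "userId": "1893952720"}
    row = to_account_row(raw, data_provider_name="Abstract Credential Store")

    # One input account produces exactly one output row with the renamed keys.
    assert row == {
        "account_id": 71,
        "account_name": "domo_creds",
        "user_id": "1893952720",
        "data_provider_name": "Abstract Credential Store",
    }
    # The original API object is not mutated.
    assert "id" in raw and "displayName" in raw

test_to_account_row()
print("row-level test passed")
```

Because the test needs only one small dict, it can run without calling the API at all.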

In [5]:
test_data = get_accounts(domo_instance = domo_instance, session_token = session_token)
test_data[0]

{'key': 'abstract-credential-store',
 'name': 'Abstract Credential Store',
 'authenticationScheme': 'fields',
 'unassociatedDataSourceCount': 0,
 'accounts': [{'id': 71,
   'name': 'domo_creds',
   'userId': '1893952720',
   'displayName': 'domo_creds - update 2024-02-21',
   'type': 'data',
   'dataProviderType': 'abstract-credential-store',
   'valid': True,
   'dateOfExpiration': None,
   'dataSourceCount': 0,
   'daysToExpiry': None,
   'expired': None},
  {'id': 87,
   'name': 'Abstract Credential Store Account',
   'userId': '1893952720',
   'displayName': 'jw_creds',
   'type': 'data',
   'dataProviderType': 'abstract-credential-store',
   'valid': True,
   'dateOfExpiration': None,
   'dataSourceCount': 0,
   'daysToExpiry': None,
   'expired': None},
  {'id': 88,
   'name': 'fake_account',
   'userId': '1893952720',
   'displayName': 'fake_account',
   'type': 'data',
   'dataProviderType': 'abstract-credential-store',
   'valid': True,
   'dateOfExpiration': None,
   'dataSou

In [6]:
test_data[0]['accounts'][0]

{'id': 71,
 'name': 'domo_creds',
 'userId': '1893952720',
 'displayName': 'domo_creds - update 2024-02-21',
 'type': 'data',
 'dataProviderType': 'abstract-credential-store',
 'valid': True,
 'dateOfExpiration': None,
 'dataSourceCount': 0,
 'daysToExpiry': None,
 'expired': None}

In [7]:
def process_domostats_get_accounts_account(account_obj, data_provider_name):
    """most granular level: reshapes one account into one output row (mutates account_obj, so pass in a copy)"""
    
    account_obj.update({"data_provider_name": data_provider_name}) 
    
    # rename properties to something user friendly
    account_obj['account_id'] = account_obj.pop('id')
    account_obj['account_name'] = account_obj.pop('displayName')

    # drop the properties we do not want in the output
    account_obj.pop('name')
    account_obj.pop('type')
    account_obj.pop('daysToExpiry')
    account_obj.pop('valid')
    account_obj.pop('expired')

    # convert the remaining camelCase keys to snake_case
    return { format_str_camel_case(key) : value for key, value in account_obj.items()}

test_data_provider_type = test_data[0].copy()
test_account = test_data[0]['accounts'][0].copy()

process_domostats_get_accounts_account(account_obj = test_account , 
                                       data_provider_name = test_data_provider_type['name'])

{'user_id': '1893952720',
 'data_provider_type': 'abstract-credential-store',
 'date_of_expiration': None,
 'data_source_count': 0,
 'data_provider_name': 'Abstract Credential Store',
 'account_id': 71,
 'account_name': 'domo_creds - update 2024-02-21'}

In [8]:
import pandas as pd 

def process_domostats_get_accounts_data_povider(data_provider_obj) -> List[dict]:
    """receives the data_provider obj and flattens to the account obj"""
    account_ls = data_provider_obj['accounts']
    
    return [process_domostats_get_accounts_account(account_obj.copy(), 
                                                   data_provider_name = data_provider_obj['name']) for account_obj in account_ls]
    
test_data_provider_type = test_data[0].copy()

pd.DataFrame(process_domostats_get_accounts_data_povider(test_data_provider_type))[0:5]

| | user_id | data_provider_type | date_of_expiration | data_source_count | data_provider_name | account_id | account_name |
|---|---|---|---|---|---|---|---|
| 0 | 1893952720 | abstract-credential-store | | 0 | Abstract Credential Store | 71 | domo_creds - update 2024-02-21 |
| 1 | 1893952720 | abstract-credential-store | | 0 | Abstract Credential Store | 87 | jw_creds |
| 2 | 1893952720 | abstract-credential-store | | 0 | Abstract Credential Store | 88 | fake_account |
| 3 | 1893952720 | abstract-credential-store | | 0 | Abstract Credential Store | 92 | jw_username_password_auth |


Because the `api_response` contains nested lists, we use a double list comprehension to flatten it into a single list of dictionaries.

```
for list_of_dictionaries in list_of_lists:
    for dict in list_of_dictionaries:
        <do_something_with_dict>
```

is the same as


```
[<do_something_with_dict> for list_of_dictionaries in list_of_lists for dict in list_of_dictionaries]
```
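
The equivalence above can be checked on a small example. The sketch below uses hypothetical sample data and compares the nested loops, the double list comprehension, and `itertools.chain.from_iterable` from the standard library, which is another common way to flatten one level of nesting.

```python
from itertools import chain

# Hypothetical sample: one inner list of account rows per data provider.
list_of_lists = [
    [{"account_id": 71}, {"account_id": 87}],
    [{"account_id": 1}],
]

# Nested for loops.
flat_loop = []
for list_of_dictionaries in list_of_lists:
    for item in list_of_dictionaries:
        flat_loop.append(item["account_id"])

# Double list comprehension: the for clauses appear in the same order as the loops.
flat_comprehension = [
    item["account_id"]
    for list_of_dictionaries in list_of_lists
    for item in list_of_dictionaries
]

# Standard library alternative for flattening one level.
flat_chain = [item["account_id"] for item in chain.from_iterable(list_of_lists)]

print(flat_loop)           # [71, 87, 1]
print(flat_comprehension)  # [71, 87, 1]
print(flat_chain)          # [71, 87, 1]
```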


In [9]:
import pandas as pd


def generate_domostats_get_accounts(
    domo_instance, session_token
) -> pd.DataFrame:
    
    api_response = get_accounts(domo_instance = domo_instance, session_token = session_token) 
    
    accounts_by_providers_ls = [ process_domostats_get_accounts_data_povider(data_provider_obj) for data_provider_obj in api_response] # produces nested list of lists
    
    # return accounts_by_providers_ls # one list of accounts per data_provider_type.  uncomment to see an early return
    
    account_ls = [account for account_ls in accounts_by_providers_ls for account in account_ls] 
    
    # return account_ls # a list of accounts.  uncomment to see an early return
    
    return pd.DataFrame(account_ls)

generate_domostats_get_accounts(
    domo_instance = domo_instance,
    session_token = session_token
)[0:5]
    

| | user_id | data_provider_type | date_of_expiration | data_source_count | data_provider_name | account_id | account_name |
|---|---|---|---|---|---|---|---|
| 0 | 1893952720 | abstract-credential-store | | 0 | Abstract Credential Store | 71 | domo_creds - update 2024-02-21 |
| 1 | 1893952720 | abstract-credential-store | | 0 | Abstract Credential Store | 87 | jw_creds |
| 2 | 1893952720 | abstract-credential-store | | 0 | Abstract Credential Store | 88 | fake_account |
| 3 | 1893952720 | abstract-credential-store | | 0 | Abstract Credential Store | 92 | jw_username_password_auth |
| 4 | 1893952720 | dataset-copy | | 1 | DataSet Copy | 1 | dsa - northshore |


## 🚧 HOMEWORK!!

Notice that `user_id` is just an ID, not a name.  

1. Construct a function, `get_user_by_id` that retrieves user information
2. Modify `process_domostats_get_accounts_account` to call the `get_user_by_id` function for the `userId`
3. Modify the upstream functions to pass the appropriate authentication information
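
One thing to keep in mind while working on this: in the sample output every account shares the same `userId`, so a per-row lookup would repeat the same request many times. The sketch below shows one possible way to avoid that with `functools.lru_cache`. It does not call Domo; `fetch_user_stub` and `get_user_display_name` are hypothetical names, and the shape of the returned user object is an assumption made for illustration.

```python
from functools import lru_cache

lookup_calls = []  # records each simulated API call

def fetch_user_stub(user_id: str) -> dict:
    """Hypothetical stand-in for a real user lookup; no network request is made."""
    lookup_calls.append(user_id)
    return {"id": user_id, "displayName": f"user_{user_id}"}

@lru_cache(maxsize=None)
def get_user_display_name(user_id: str) -> str:
    """Look up each distinct user_id once and reuse the cached result afterwards."""
    return fetch_user_stub(user_id)["displayName"]

accounts = [{"userId": "1893952720"}, {"userId": "1893952720"}, {"userId": "42"}]
names = [get_user_display_name(account["userId"]) for account in accounts]

print(names)              # ['user_1893952720', 'user_1893952720', 'user_42']
print(len(lookup_calls))  # 2
```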

In [10]:
# def get_user_by_id(user_id, access_token, domo_instance):
#     url = f'https://{domo_instance}.domo.com/api/content/v2/users/{user_id}'
    
#     return {"username" : "sample_response"} # delete this row and finish

# test_user_id = 123 # your_user_id
# get_user_by_id( test_user_id, session_token, domo_instance)   

# def process_domostats_get_accounts_account(account_obj, data_provider_name, session_token, domo_instance):
#     """most granular level"""
    
#     account_obj.update({"data_provider_name": data_provider_name}) 
    
#     account_obj.update({"user_name" : get_user_by_id(user_id = account_obj['userId'])}) # finish this line!
                                                     
#     account_obj['dataProviderType'] = account_obj['dataProviderType'] 
#     account_obj['account_id'] = account_obj.pop('id')
#     account_obj['account_name'] = account_obj.pop('displayName')
    
#     account_obj.pop('name')
#     account_obj.pop('type')
#     account_obj.pop('daysToExpiry')
#     account_obj.pop('valid')
#     account_obj.pop('expired')

#     return { format_str_camel_case(key) : value for key, value in account_obj.items()}

# test_data_provider_type = test_data[0].copy()

# pd.DataFrame(process_domostats_get_accounts_data_povider(test_data_provider_type))[0:5]

## Output as a Dataframe

To output the DataFrame as a Domo Dataset, modify the configuration of the Jupyter Workspace to include an output dataset.

1. Add an output dataset for the domojupyter workspace to interact with<br><br>
   Data > Jupyter Workspaces > Edit (Workspace Name) > Output Datasets > Add Output Dataset - "DomoStats - Accounts"

2. Call `domojupyter.write_dataframe` to output the dataset.  Note that it is theoretically possible to apply a PARTITION or UPSERT scheme instead of a straight REPLACE.

In [13]:
import domojupyter as domo

def main():
    
    creds = utils.get_account_credentials('username_password_auth')
    domo_instance = creds['DOMO_INSTANCE']
    
    session_token = auth.get_full_auth(domo_username = creds['DOMO_USERNAME'],
                                  domo_password = creds['DOMO_PASSWORD'],
                                  domo_instance = domo_instance
                                )

    df = generate_domostats_get_accounts(domo_instance = domo_instance,
                                         session_token = session_token)
    

    domo.write_dataframe(df, 'DomoStats - Accounts')
    
    
main()

