Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add and incorporate macro for turning jsons (and only jsons) into strings #129

Open
wants to merge 2 commits into
base: releases/v0.4.latest
Choose a base branch
from

Conversation

fivetran-jamie
Copy link
Contributor

What change does this PR introduce?

  • Adds new macro, get_json_columns_in_relation to return a list of columns that are of type JSON (on BigQuery only, as we haven't rolled out JSON support in other Fivetran destinations).
  • Includes call to get_json_columns_in_relation in fill_staging_columns, so that each JSON field gets converted to a json-formatted STRING. This is necessary because only some connectors have JSONs currently and downstream json functions do not work seamlessly with the actual JSON-type columns.
    • fill_staging_columns was chosen as this is called in every (most?) source package model. therefore, we don't have to mass update a bunch of packages. however, we want this to be right and not break anything!

If this PR introduces a new macro, how did you test the new macro?

get_json_columns_in_relation

I first tested get_json_columns_in_relation by directly calling it in a local test model.

the model i ran it against was just a select * from Joe's seed file
tiktok_ads_source_integration_tests_2.tiktok_adgroup_history_data, which contains a JSON field called AGE_GROUPS.

image

When running, the test model, i was able to get the macro to return and output ['AGE_GROUPS']

fill_staging_columns

I tested the incorporation of get_json_columns_in_relation into fill_staging_columns by running the tiktok ads source package (specifically from the integration_tests folder) locally.

With tiktok_adgroup_history_data as the source, the model where trouble would pop up was stg_tiktok_ads__ad_group_history, specifically on the line where we try to coalesce age_groups and an actual-string field, age.
image

With my additions, the fields CTE in stg_tiktok_ads__ad_group_history compiles to the following

fields as (

    select
       
    
    action_days
   
        as 
    
    action_days
    
, 
    
    adgroup_id
    
        as 
    
    adgroup_id

# ................ cutting stuff out that's not relevant .......................... #

, 
    cast(null as string) as 
    
    age
    
 , 
    TO_JSON_STRING( 
    
    age_groups
    
 )
        as 
    
    age_groups
    
, 
    
    languages
   
        as 
    
    languages
    

, cast('' as string) as source_relation

    from base
), 

And so the coalesce of age_groups and age in the final CTE does not cause an error and the model runs successfully.

and the data looks OK in bigquery
image

If this PR introduces a modification to an existing macro, which packages is the macro currently present in and what steps were taken to test compatibility across packages?

Honestly we should test this with other packages as well -- ones with JSONs and not -- given the far-reaching nature of this macro change. perhaps we could have some customers try this out

Did you update the README to reflect the macro addition/modifications?

  • Yes
  • No (provide further explanation)

@fivetran-jamie
Copy link
Contributor Author

keeping this in our back pocket for now.

if we want to move forward with this PR we should add a variable that is a boolean, telling us whether or not to convert back to strings or maintain the json type

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant