Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Errors when extracting JSON values #45

Closed
2 of 4 tasks
LewisDavies opened this issue Sep 23, 2022 · 12 comments
Closed
2 of 4 tasks

[Bug] Errors when extracting JSON values #45

LewisDavies opened this issue Sep 23, 2022 · 12 comments
Labels
bug Something isn't working

Comments

@LewisDavies
Copy link
Contributor

Is there an existing issue for this?

  • I have searched the existing issues

Describe the issue

I'm trying to use the metadata list variables, e.g. stripe__plan_metadata, to specify the fields to extract but I'm seeing two distinct error pattens:

  • For fields with a full stop, e.g. feature.id, the package tries to use the same value as a column name. This fails because column names can't contain ..
  • I have another field called set. Snowflake expects this to set a session variable, which causes an error.

Relevant error log or model output

-- plan.id
10:51:08  Database Error in model stg_stripe__plan (models/stg_stripe__plan.sql)
10:51:08    001003 (42000): SQL compilation error:
10:51:08    syntax error line 224 at position 21 unexpected '.'.
10:51:08    syntax error line 227 at position 24 unexpected '('.
10:51:08    syntax error line 227 at position 51 unexpected ','.
10:51:08    syntax error line 227 at position 53 unexpected ''feature.id''.
10:51:08    syntax error line 229 at position 11 unexpected 'as'.
10:51:08    compiled SQL at target/run/stripe_source/models/stg_stripe__plan.sql

-- set
10:48:51  Database Error in model stg_stripe__plan (models/stg_stripe__plan.sql)
10:48:51    001003 (42000): SQL compilation error:
10:48:51    syntax error line 259 at position 14 unexpected 'set'.
10:48:51    compiled SQL at target/run/stripe_source/models/stg_stripe__plan.sql

Expected behavior

Ideally, columns should be created from the metadata with valid names. However, I imagine there are a ton of corner cases regarding character substitution and reserved keywords.

An alternative approach would be to pass metadata fields to the final models without modification. This would let users parse the values in a separate view.

dbt Project configurations

vars:
stripe__plan_metadata:
- archived
- feature.handle
- feature.id
- feature.type
- generated
- limit
- plan.id
- plan.handle
- set
- version

Package versions

packages:

  • package: fivetran/stripe
    version: 0.7.1

What database are you using dbt with?

snowflake

dbt Version

Core:

  • installed: 1.2.1
  • latest: 1.2.1 - Up to date!

Plugins:

  • snowflake: 1.2.0 - Up to date!

Additional Context

No response

Are you willing to open a PR to help address this issue?

  • Yes.
  • Yes, but I will need assistance and will schedule time during our office hours for guidance
  • No.
@LewisDavies LewisDavies added the bug Something isn't working label Sep 23, 2022
@fivetran-joemarkiewicz
Copy link
Contributor

Hi @LewisDavies thanks so much for opening this issue. This is a very interesting error you are coming across. I have a few quick questions to better understand the root of the issue:

  • Can you confirm for the full stop case feature.id, that the field name in the JSON object is indeed listed as feature.id. I don't imagine the JSON macro we have that pivots the fields would be causing this, but want to confirm the source is accurate and we just need to account for these edge cases.
  • Similar to the above, do you know if Stripe has any guardrails in place for these metadata fields? For example, can fields be created with special characters? If so, we will need to add regex matching into the code logic to account (find and replace) for these special characters.
  • Would you be able to share the contents of the target/run/stripe_source/models/stg_stripe__plan.sql file? I am curious to see how the compiled sql looks for this model.

In addition to the above questions, I feel the set issue may possibly be able to be addressed by double quoting the fields in the variable list. Would you be able to attempt the following:

vars:
    stripe__plan_metadata: ['"set"','feature.id','etc.']

I am not positive it will work, but would be worth the try! I will keep digging in on my end to see if there is a better way to address the reserved word issue. In the meantime, it would be great if you could help provide details to my above questions as well. Thanks so much again for raising this with our team 😄

@LewisDavies
Copy link
Contributor Author

Hey @fivetran-joemarkiewicz, sorry for the slow reply.

  • The full stops are in the field names taken from Stripe, so the name appears as feature.id in the raw data Fivetran has imported.
  • There don't seem to be any guardrails around the field names other than how many you can add and how long they are.
  • The full file is pretty long but the key part is in the final CTE:
replace(json_extract_path_text(try_parse_json( metadata ), 'plan.id' ), '"', '') as plan.id,
replace(json_extract_path_text(try_parse_json( metadata ), 'plan.handle' ), '"', '') as plan.handle,
replace(json_extract_path_text(try_parse_json( metadata ), 'set' ), '"', '') as set,

It's failing when the package tries to set a column name containing . or a reserved word. Enclosing them in quotes didn't work.

@fivetran-joemarkiewicz
Copy link
Contributor

No worries at all @LewisDavies thanks for sharing more information on the issue you are experiencing. Let me try and fiddle around with this a bit and see what I can come back with regarding this parsing issue. 🤔

@LewisDavies
Copy link
Contributor Author

Hey @fivetran-reneeli, here are a couple of models that don't pull through the metadata fields. The logic is all there, the columns have just been excluded from the final output.

I checked information_schema.columns and found a lot of tables that include a metadata column. The ones at the top are populated in my data, the others are empty:

CUSTOMER
CUSTOMER_BALANCE_TRANSACTION
INVOICE_LINE_ITEM
PLAN
PRICE
PRODUCT

-------------------

APPLICATION_FEE_REFUND
BANK_ACCOUNT
CARD
CHARGE
CHECKOUT_SESSION
COUPON
CREDIT_NOTE
DISPUTE
FILE_LINK
INVOICE
INVOICE_ITEM
ORDER_HISTORY
PAYMENT_INTENT
PAYMENT_METHOD
PAYOUT
PROMOTION_CODE
REFUND
SESSION
SETUP_INTENT
SKU
SOURCE
SUBSCRIPTION_HISTORY
SUBSCRIPTION_ITEM
SUBSCRIPTION_SCHEDULE
TAX_RATE
TRANSFER
TRANSFER_REVERSAL

@fivetran-reneeli
Copy link
Contributor

fivetran-reneeli commented Nov 15, 2022

Hi @LewisDavies ! Thank you for details and appreciate you linking the models.

  • So summing up our slack convo, since we're making many updates to our Stripe package this quarter I plan on including the point about bringing down the metadata fields to the final models. Will be addressing the other requests you and other customers have made as well in our overhaul as well 👍
  • As for the json values, we have the fix although it's going to be within our fivetran utils package, not the stripe package itself. The next release of stripe will have these updates, if you want to keep an eye out for that. Keep in mind the next stripe release is also dependent on the release of dbt v1. I will update the README for more info.

@CraigWilson-ZOE
Copy link

CraigWilson-ZOE commented Dec 5, 2022

Hi All,
I have upgraded to the latest 0.7.4 packages for Stripe (both the source and normal package) and I am still getting very similar errors as the original issue.

11:37:30  Database Error in model stg_stripe__plan (models/stg_stripe__plan.sql)
11:37:30    Syntax error: Unexpected keyword AS at [199:2]
11:37:30    compiled Code at target/run/stripe_source/models/stg_stripe__plan.sql

When I check out the compiled SQL I can see a field called interval listed in the select list:

    interval
    
 as plan_interval 

This is failing because on BigQuery interval is a reserved word and the query parser isn't liking the syntax.

I believe here that, to ensure issues like this do not happen, at least in BigQuery, they should be quoted using the tick so that reserved words can be referenced in a query i.e.:

    `interval`
    
 as plan_interval 

The above SQL now passes and the model is able to run.

Can this be looked into please?

It is worth noting that I am not using any vars or any additional configuration apart from running the model directly.

@CraigWilson-ZOE
Copy link

Looking at the solution above on using the alias function of the vars stripe__plan_metadata this did not work and will not fix this issue as the original code will always fail:

   interval
    
 as plan_interval 


    from base
),

final as (
    
    select 
        id as plan_id,
        active as is_active,
        amount,
        currency,
        plan_interval, -- Field is aliased within get_plan_columns macro
        interval_count,
        metadata,
        nickname,
        product_id

        
        , replace( 

  json_extract_scalar(metadata, '$.{'name': 'interval'}' )

, '"', '') as {'name':_'interval'},
replace( 

  json_extract_scalar(metadata, '$.{'alias': 'craig_interval'}' )

, '"', '') as {'alias':_'craig_interval'}

This is the SQL that is produced, and as you can see at the top of the query, the keyword is still used, whereas the json_extract is then used later. So the query is not fixed.

@CraigWilson-ZOE
Copy link

The interval column is not in the metadata column, but is instead a "normal" column:
Screenshot 2022-12-05 at 13 12 24

@fivetran-joemarkiewicz
Copy link
Contributor

Hi @CraigWilson-ZOE thanks for raising this with out team. I think the issue you are detailing is different from the original issue outlined in this Bug Report. This bug report was detailing reserved words within the metadata fields that are being parsed out via a var.

Your bug however, is a field that is explicitly being selected within the model (regardless of the var) and could not be addressed using the metadata solution outlined in this bug report.

I did try to recreate the issue you had using the latest version of the package(s) and saw the interval field was being backticked in the compiled code and ran successfully on BigQuery 🤔
image

The quoting should be happening within the get_plan_columns macro from the source package. The only way I could see dbt skipping over the quote is if the target.type for your dbt adapter is not BigQuery. Would you be able to confirm that the profiles.yml for your dbt target is a BigQuery type in this instance?

For a more immediate fix, you may be able to downgrade just your stripe_source package to v0.7.3 as the only change from v0.7.4 was some under the hood updates and adjusting this interval cast to be performed within the macro opposed to the select statement.

@CraigWilson-ZOE
Copy link

Ah! I think you have solved the issue for me actually.

We are using the fal adapter for our dbt, so that we can run Python models locally and not have to spin up a data proc cluster. This means that our target.type won't be BigQuery.

Here is a excerpt from our profiles.yml:

default:
  target: development
  outputs:
    development:
      type: fal
      threads: 16
      db_profile: warehouse

    warehouse:
      type: bigquery
      method: oauth

I am wondering how we might be able to get around this or have this changed to deal with these new type of adapters 🤔

@fivetran-joemarkiewicz
Copy link
Contributor

@CraigWilson-ZOE that is definitely interesting and thanks for sharing! I can see how this would cause an error for you with the target.type not matching our supported warehouses target.type names. That being said, you are using BigQuery so I still feel this can be supported. I am just not entirely familiar with the fal target and how it differs from the basic BigQuery target.

As this is a different conversation from the one being had within this issue, I have opened a new FR on the source package (linked above) for us to continue the discussion. Feel free to add more context within that ticket.

@fivetran-reneeli
Copy link
Contributor

Closing this out as we have updated the package to allow for the metadata json field to be pivoted out. For instructions on how to correctly set it up we've included info in the README!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants