
"Unrecognized name: root_id" and "No matching signature for function TIMESTAMP" #70

Closed
wub opened this issue Sep 3, 2019 · 3 comments


@wub commented Sep 3, 2019

Hi, I'm using the GCP Snowplow setup. This puts events into a single BigQuery table, with this schema: https://gist.github.com/wub/9d4b9d27f15f8e7c87906e55cb913f68

When I run with this configuration in dbt 0.14.0:

models:
    snowplow:
        vars:
            'snowplow:timezone': 'Pacific/Auckland'
            'snowplow:page_ping_frequency': 10
            'snowplow:events': "dataset_name.events"
            'snowplow:context:web_page': "dataset_name.events"
            'snowplow:context:performance_timing': false
            'snowplow:context:useragent': false
            'snowplow:pass_through_columns': []
        base:
            optional:
                enabled: false
        page_views:
            optional:
                enabled: false

I get this error:

Found 7 models, 22 tests, 0 snapshots, 0 analyses, 163 macros, 0 operations, 1 seed files, 0 sources

21:34:12 | Concurrency: 2 threads (target='dev')
21:34:12 |
21:34:13 | 1 of 5 START table model dataset_name.snowplow_web_page_context [RUN]
21:34:13 | 2 of 5 START incremental model dataset_name.snowplow_id_map.... [RUN]
21:34:17 | 2 of 5 ERROR creating incremental model dataset_name.snowplow_id_map [ERROR in 2.85s]
21:34:19 | 1 of 5 ERROR creating table model dataset_name.snowplow_web_page_context [ERROR in 3.23s]
21:34:19 | 3 of 5 SKIP relation dataset_name.snowplow_page_views.......... [SKIP]
21:34:19 | 4 of 5 SKIP relation dataset_name.snowplow_sessions_tmp........ [SKIP]
21:34:19 | 5 of 5 SKIP relation dataset_name.snowplow_sessions............ [SKIP]
21:34:19 |
21:34:19 | Finished running 3 incremental models, 2 table models in 10.23s.

Completed with 2 errors:

Database Error in model snowplow_id_map (models\identification\snowplow_id_map.sql)
  No matching signature for function TIMESTAMP for argument types: TIMESTAMP. Supported signatures: TIMESTAMP(STRING, [STRING]); TIMESTAMP(DATE, [STRING]); TIMESTAMP(DATETIME, [STRING]) at [101:13]
  compiled SQL at target\compiled\snowplow\identification\snowplow_id_map.sql

Database Error in model snowplow_web_page_context (models\page_views\snowplow_web_page_context.sql)
  Unrecognized name: root_id at [35:9]
  compiled SQL at target\compiled\snowplow\page_views\snowplow_web_page_context.sql

Done. PASS=0 ERROR=2 SKIP=3 TOTAL=5
  • I understand that it's probably dodgy to simply set "dataset_name.events" as the context table
  • Is this related to #52 or #41?
  • Should I override the given models to try and make this work, or am I simply running it wrong?

Any info/direction would be greatly appreciated - thanks!

@drewbanin (Collaborator) commented Sep 3, 2019

Hey @wub - we find that the names of these columns are (confusingly!) different depending on how Snowplow is hosted. I checked out the schema you sent over above - is contexts_com_snowplowanalytics_snowplow_web_page_1_0_0 a table itself or is it a column in the event table?

At a high level, I think there are two distinct things happening here:

Unrecognized name: root_id

  • dbt is looking for a column called root_id, but it might be called something else in your setup. This column should be the foreign key to event_id in the events table. I'm not so familiar with the expected Snowplow schema on BigQuery. In something like Redshift, the web_page context renders out as a separate table with two columns: root_id and id. It looks like that's not the case in your setup! (See the sketch below for the join the package expects.)
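
For reference, this is roughly the shape the package assumes on warehouses where the web_page context lands as its own table (a sketch with illustrative table names, not your BigQuery layout):

    -- Redshift-style layout: the web_page context is a separate table,
    -- and root_id is the foreign key back to the events table.
    -- Table names here are illustrative only.
    select
        ev.event_id,
        wp.id as page_view_id
    from atomic.events as ev
    join atomic.com_snowplowanalytics_snowplow_web_page_1 as wp
        on wp.root_id = ev.event_id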

No matching signature for function TIMESTAMP for argument types: TIMESTAMP

  • I think this error is coming from these lines of code - BigQuery doesn't let you call timestamp() on a field that is already a TIMESTAMP. The Snowplow datasets we've worked with have provided collector_tstamp as a datetime, and this code is intended to turn it into a timestamp. (A quick workaround is sketched below.)
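
If collector_tstamp is already a TIMESTAMP in your table, one workaround (a sketch for an overridden model, not something the package does out of the box) is to replace the timestamp() call with an identity cast:

    -- BigQuery rejects timestamp(collector_tstamp) when the column is already
    -- a TIMESTAMP; casting (or using the column directly) works instead.
    select
        cast(collector_tstamp as timestamp) as collector_tstamp
    from dataset_name.events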

In general, this package is based on Snowplow's original web data model. It sounds like some models for BQ are coming soon, and it will be super interesting to see how they end up recommending these models be built!

@wub (Author) commented Sep 5, 2019

Hey @drewbanin, thanks for taking the time to look into this.

Everything is in a single table - the loader seems to make use of BigQuery's nested & repeated column types.

There's no info available yet on the timeline for BigQuery support in the original web data model (hence my coming to this repo), so I'll override the relevant models and macros until it works (as the README suggests).

First step will be changing root_id to contexts_com_snowplowanalytics_snowplow_web_page_1_0_0 (roughly along the lines of the sketch below)!
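
Roughly what I have in mind for the overridden snowplow_web_page_context model - a sketch against my single events table, unnesting the repeated context column (the struct field name id is an assumption from the gist schema, not verified):

    -- Sketch: derive event_id -> page_view_id from the nested context column
    -- instead of a separate web_page context table.
    select
        ev.event_id,
        wp.id as page_view_id
    from dataset_name.events as ev,
        unnest(ev.contexts_com_snowplowanalytics_snowplow_web_page_1_0_0) as wp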

Feel free to close this one; we can probably revisit once Snowplow themselves release BQ support for the original model.

@drewbanin (Collaborator) commented Sep 5, 2019

Cool, thanks for the update @wub!

@drewbanin closed this Sep 5, 2019
