Assertions and anonymous tracking #9

Open
awoehrl opened this issue May 6, 2021 · 3 comments

awoehrl commented May 6, 2021

For use cases with anonymous tracking activated, the assertions fail because of the checks on the cookie IDs:

  OR domain_userid IS NULL
  OR domain_sessionid IS NULL
  OR domain_sessionidx IS NULL

Would it make sense to separate these from the other data quality assertions? If I get Dataform notifications every day with "Run failed" because of these, I will probably start ignoring them after a couple of days and miss other issues when they arise.
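To illustrate the separation being asked for, here is a minimal sketch that runs the cookie-ID checks as their own named assertion, apart from the remaining quality checks, so that each is reported separately. It uses Python's built-in sqlite3 rather than Dataform, and the table name `page_views`, the `page_view_id` column, and the assertion names are assumptions made for illustration; only the three cookie-ID columns come from the snippet above.

```python
import sqlite3

# Hypothetical table: only the three cookie-ID columns are taken from the issue.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page_views ("
    "page_view_id TEXT, domain_userid TEXT, "
    "domain_sessionid TEXT, domain_sessionidx INTEGER)"
)
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?, ?, ?)",
    [
        ("pv1", "u1", "s1", 1),  # row with all cookie IDs present
        ("pv2", None, "s2", 1),  # anonymous row: no domain_userid
    ],
)

# Each assertion is a query returning the rows that violate it.
# Keeping the cookie-ID checks in their own entry lets them fail
# (or be muted) without hiding failures of the other checks.
ASSERTIONS = {
    "core_quality": "SELECT * FROM page_views WHERE page_view_id IS NULL",
    "cookie_ids": (
        "SELECT * FROM page_views "
        "WHERE domain_userid IS NULL "
        "OR domain_sessionid IS NULL "
        "OR domain_sessionidx IS NULL"
    ),
}


def failing_row_counts(connection):
    """Return the number of violating rows per assertion name."""
    return {
        name: len(connection.execute(sql).fetchall())
        for name, sql in ASSERTIONS.items()
    }


results = failing_row_counts(conn)
print(results)
```

With this split, a notification could name the failing assertion, so a known `cookie_ids` failure would not mask a new `core_quality` failure.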

Contributor

adatzer commented May 6, 2021

Hello @awoehrl and thank you for raising this!
The data models weren't built with anonymous tracking in mind, which is why those assertions also exist in the dataform-data-models. In other words, those columns are expected to be non-null in the output tables, based on how the web data model is designed to work.
If those assertions fail, it is certainly something to investigate, so it would be great if you could share more details in case you have already encountered this.

Author

awoehrl commented May 7, 2021

Hi @adatzer,
thanks for your feedback!
I don't have the model in production, but I'm testing it with our data right now. This is where I got the failed assertion errors:

We are in the process of implementing anonymous tracking for some entities. As long as we don't have positive consent for a user from our CMP, we use the Snowplow anonymous tracking features (random network_userid, no domain_userid, but domain_sessionid enabled). As soon as consent is given, we switch to tracking with all three cookie identifiers enabled.

Basically, this means the model could work fine for page_views and sessions, because the necessary data is available in both cases. Only users is a bit problematic, because the IDs are missing for some of the users.
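A small sketch of why sessions still work while users do not, again using Python's sqlite3 with made-up data. The `events` table and its rows are hypothetical; the column names are the identifiers mentioned in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events ("
    "domain_userid TEXT, domain_sessionid TEXT, network_userid TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("u1", "s1", "n1"),  # consented user, two events in one session
        ("u1", "s1", "n1"),
        (None, "s2", "n2"),  # anonymous: session ID present, no domain_userid
        (None, "s3", "n3"),
    ],
)

# Session-level aggregation is unaffected: every row has a domain_sessionid.
sessions = conn.execute(
    "SELECT COUNT(DISTINCT domain_sessionid) FROM events"
).fetchone()[0]

# User-level aggregation only sees rows with a domain_userid;
# COUNT(DISTINCT ...) skips NULLs, so anonymous traffic is not counted.
identified_users = conn.execute(
    "SELECT COUNT(DISTINCT domain_userid) FROM events"
).fetchone()[0]

rows_without_user = conn.execute(
    "SELECT COUNT(*) FROM events WHERE domain_userid IS NULL"
).fetchone()[0]

print(sessions, identified_users, rows_without_user)
```

In this toy data all three sessions can be modeled, but the two anonymous events cannot be attributed to any user by domain_userid.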

Is anonymous tracking a use case where you would rather suggest building a custom model from scratch or is this something you are thinking about supporting at a later point maybe?

Contributor

adatzer commented May 13, 2021

Hi @awoehrl and thank you very much for providing more details!

> Is anonymous tracking a use case where you would rather suggest building a custom model from scratch or is this something you are thinking about supporting at a later point maybe?

This is indeed a valid use case, so we do plan to support it through the Snowplow data-models (as also referenced by this issue), to provide an incrementalized and modular way of modeling that data. Meanwhile, users have the flexibility to decide how best to implement their use cases, and Snowplow Insights customers can always reach out directly to their customer success manager about building a custom, use-case-based data modeling solution.

As for the current assertions issue, a proper way to handle them certainly needs to be considered so that, as you mentioned, Dataform users don't end up ignoring the respective test suites. The details you shared were really helpful, so thanks once again, and please keep us posted if you spot anything else!
