Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Making the JSONPaths for mobile_context reflect 1-0-1 #478

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

bernardosrulzon
Copy link
Contributor

@alexanderdean @jbeemster This seems to be breaking the Storage Loader on Android Tracker 0.6.x

Related question: How should we handle JSONPaths when events come in both 1-0-0 and 1-0-1 formats? e.g. 50% of the user base that still haven't updated the app and are still in version 0.5.x

@jbeemster
Copy link
Member

Hey @bernardosrulzon you are correct these assets need to be updated! For additive changes like this the fields need to be appended to the bottom of both the JSONPath and Redshift DDL.

In this way loading of 1-0-0 events will still be able to be loaded into a 1-0-1 table.

@@ -17,6 +17,8 @@
"$.data.deviceManufacturer",
"$.data.deviceModel",
"$.data.carrier",
"$.data.networkType",
"$.data.networkTechnology",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These fields need to be appended to the bottom of the file as they are additive changes.

@@ -32,6 +32,8 @@ CREATE TABLE atomic.com_snowplowanalytics_snowplow_mobile_context_1 (
os_version varchar(255) encode text32k not null,
device_manufacturer varchar(255) encode text255 not null,
device_model varchar(255) encode text32k not null,
network_type varchar(255) encode text32k not null,
network_technology varchar(255) encode text32k not null,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These fields need to be inserted under android_idfa.

@bernardosrulzon
Copy link
Contributor Author

Good point! I've also changed the order on the schema, just for clarity

@jbeemster
Copy link
Member

Hey @bernardosrulzon last bit to this is that we would also generally need a file for helping with upgrading the table like this (https://github.com/snowplow/iglu-central/blob/master/sql/com.amazon.aws.cloudfront/migrate_wd_access_log_1_r4_to_r5.sql)

Which can be generated automatically using igluctl.

@bernardosrulzon
Copy link
Contributor Author

Great tool, never used it before! I've updated the DDL and added the migration query. Please feel free to rename and change files.

@alexanderdean
Copy link
Member

Putting into Blocked Redshift artifacts, as this will break things for 50% as @bernardosrulzon figured...

@bernardosrulzon
Copy link
Contributor Author

@alexanderdean Not really. I had overwritten the JSONPaths file before v0.6.x was merged, and events from v0.5.x keep coming correctly. The two new fields are not required on the schema, and load correctly into Redshift.

@bernardosrulzon
Copy link
Contributor Author

bernardosrulzon commented Jan 20, 2017

Query

SELECT v_tracker,
       count(*)
FROM atomic.com_snowplowanalytics_snowplow_mobile_context_1 mc
INNER JOIN atomic.events ON events.event_id = mc.root_id AND events.collector_tstamp = mc.root_tstamp
WHERE collector_tstamp > '2017-01-10'
GROUP BY 1;

Result

v_tracker,count
andr-0.6.1,6695752
andr-0.5.4,340147

I checked some of the events generated by 0.5.4 manually - they are loaded correctly indeed.

@bernardosrulzon
Copy link
Contributor Author

Ah, now I see your point. This will break the load stage for people that haven't migrated the table yet.

Should we be creating a new major version 2-0-0 whenever the number/type of columns is changed?

@alexanderdean
Copy link
Member

alexanderdean commented Jan 20, 2017

Hey @bernardosrulzon -

Should we be creating a new major version 2-0-0 whenever the number/type of columns is changed?

No, the version 1-0-1 is correct per SchemaVer; the problem is that our current JSON Path naming/retrieval code is not sensitive to REVISION or ADDITION schema changes. We plan on fixing this, at which point all the blocked artifacts can be released. In the meantime, if you want to use the additional fields, you can always host the JSON Paths files in your own hosted assets path (the one in your config.yml)...

@bernardosrulzon
Copy link
Contributor Author

That makes sense, but won't people have to upgrade Snowplow to make the retrieval code sensitive to additions and revisions? This will essentially put us in the same spot - until all users have upgraded, we won't be able to merge the blocked artifacts.

@alexanderdean
Copy link
Member

Hey @bernardosrulzon - great question.

When we fix this problem, we are going to do a huge overhaul of how JSON Paths files are namespaced and where they are hosted. They may in fact not be hosted at all, but instead generated dynamically from the relationship between the JSON Schema and the Redshift table definition. We will leave the existing JSON Paths files in place to support legacy Snowplow versions.

Does that make sense?

@bernardosrulzon
Copy link
Contributor Author

A lot! Very interesting roadmap :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants