EmrEtlRunner: pass etl_tstamp into Hadoop Enrich as an argument #396

Closed
alexanderdean opened this issue Oct 22, 2013 · 5 comments

Comments

@alexanderdean
Member

There is currently no way of knowing that a given event in Redshift or Postgres was generated by a specific ETL run.

@ghost ghost assigned alexanderdean Oct 22, 2013
@alexanderdean
Member Author

On disk, run ids look like this: run=2013-08-22-11-20-46

I'm wondering if we should store the run id in Redshift as:

  1. string, run=2013-08-22-11-20-46
  2. string, 2013-08-22-11-20-46
  3. timestamp, 2013-08-22 11:20:46

Option 3 might be the most flexible for querying... Maybe we call it etl_ts rather than run_id.
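
For reference, here is a minimal Python sketch of how the on-disk run id maps onto each of the three candidate representations. The helper names (`parse_run_id`, `storage_options`) and the dictionary keys are made up for illustration and are not part of EmrEtlRunner or Hadoop Enrich. It only assumes the on-disk format shown above.

```python
from datetime import datetime

# Assumed layout of the on-disk run id, taken from the example above.
RUN_PREFIX = "run="
RUN_FORMAT = "%Y-%m-%d-%H-%M-%S"


def parse_run_id(run_id: str) -> datetime:
    """Parse a run id such as 'run=2013-08-22-11-20-46' into a datetime.

    Hypothetical helper: accepts the id with or without the 'run=' prefix.
    """
    bare = run_id[len(RUN_PREFIX):] if run_id.startswith(RUN_PREFIX) else run_id
    return datetime.strptime(bare, RUN_FORMAT)


def storage_options(run_id: str) -> dict:
    """Render the three candidate representations for a single run id."""
    ts = parse_run_id(run_id)
    return {
        "option_1": RUN_PREFIX + ts.strftime(RUN_FORMAT),  # string with prefix
        "option_2": ts.strftime(RUN_FORMAT),               # string without prefix
        "option_3": ts.strftime("%Y-%m-%d %H:%M:%S"),      # timestamp literal
    }


options = storage_options("run=2013-08-22-11-20-46")
for name, value in options.items():
    print(name, value)
```

Only option 3 is directly usable as a timestamp value. The other two would need parsing like the above at query time.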

@yalisassoon what do you reckon?

@yalisassoon
Member

I think a timestamp would be fastest for querying and richest (because you could analyze, e.g., how regularly queries occur).
I also agree etl_ts is a better name for this in the database.

@alexanderdean
Member Author

Moving this back so that 0.9.1 can be a Ruby-app-only release...

@alexanderdean alexanderdean changed the title from "Add run id to our Snowplow events" to "EmrEtlRunner: pass etl_ts into Scala Hadoop Enrich as an argument" Jun 12, 2014
@alexanderdean alexanderdean modified the milestones: JSON-powered enrichments, Version 0.9.9 Jun 12, 2014
@fblundun fblundun changed the title from "EmrEtlRunner: pass etl_ts into Scala Hadoop Enrich as an argument" to "EmrEtlRunner: pass etl_tstamp into Scala Hadoop Enrich as an argument" Jun 16, 2014
@alexanderdean alexanderdean changed the title from "EmrEtlRunner: pass etl_tstamp into Scala Hadoop Enrich as an argument" to "EmrEtlRunner: pass etl_tstamp into Hadoop Enrich as an argument" Jun 25, 2014
@alexanderdean
Member Author

Missing from CHANGELOG

@alexanderdean
Member Author

Fixed in 0.9.6, closing

peel pushed a commit to snowplow/emr-etl-runner that referenced this issue May 26, 2020
peel pushed a commit to snowplow/emr-etl-runner that referenced this issue May 28, 2020