Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure exceptions are not thrown #60

Conversation

mtibben
Copy link
Contributor

@mtibben mtibben commented Oct 19, 2012

Hey @alexanderdean,

This change is to do exception handling within the serde

It's is a reworked version of what we were discussing at 99designs#1

I've taken the approach suggested by @yalisassoon in that we capture all the lines, even if data is missing.

I've added one config parameter:

snowplow.serde.ignore_errors = 1

which will cause the serde to return null instead of throwing an exception when a parsing exception does occur.

This does break a few tests, as the behaviour is changed when bad data is present - it will just always null out the field instead of returning a nulled object. So this should also address #52

If everyone is happy with this approach I'll fix the rest of the tests.

Any comments and feedback appreciated!

cc @larsyencken

By putting the try/catch inside the foreach loops, we can catch exceptions for each parameter that we are parsing and can avoid try/catching in other parts of the code
The idea is to avoid throwing exceptions where possible, so this removes the big try/catch, and replaces it with smaller try/catch blocks, so we can handle errors on a per-field basis.

* Removes the try / catch block around the whole updateByParsing function
* Adds the general exception handling logic to the EventDeserializer.
* Adds a config option to returns null on an exception
Fixes for failing tests
// Replace our timestamp fields with the client's timestamp
String[] timestamp = value.split(" ");
this.dt = timestamp[0];
this.tm = timestamp[1];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a break; missing here @mtibben ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yes, fixed in 53dda25

@alexanderdean
Copy link
Member

Hey @mtibben - many thanks for putting this together! It looks good. The whole ETL process not dying in the case of a single bad row of data is definitely the direction of travel we want to be moving SnowPlow ETL in.

Two quick things:

  1. I had a question above about a missing break;
  2. Where does the Commons Logging actually log to in a Hive/EMR world?

If you could fix up the rest of the tests that would be awesome, many thanks Michael.

I'm going to merge EmrEtlRunner into master very soon (cutting a new release), so will merge this pull request into a new feature branch once I've done this... Leaving this open in the meantime.

@mtibben
Copy link
Contributor Author

mtibben commented Oct 22, 2012

Where does the Commons Logging actually log to in a Hive/EMR world?

That's a good question. I've been logging the hive jobs to S3 using

:emr:
  :jobflow:
    log_uri: 's3n://my-joblog-bucket/'

and I'm assuming that the commons logging is going to the same place.

I haven't been able to verify this yet however

@mtibben
Copy link
Contributor Author

mtibben commented Oct 22, 2012

Test cases have been fixed.

I noticed that the useragent lib is not parsing some iPhone user agent strings correctly, were you aware of that ?

@mtibben
Copy link
Contributor Author

mtibben commented Oct 22, 2012

I've reported an issue: http://java.net/jira/browse/USER_AGENT_UTILS-30

@alexanderdean
Copy link
Member

Many thanks for the test cases, and thanks for raising useragent issue Mike! On the useragent parsing - we probably need to move from the current library to one which uses a regularly-updated 'database' (YAML file, basically) of useragent strings. I've raised a ticket for this: #62

@alexanderdean
Copy link
Member

Many thanks @mtibben - this is now merged into feature/ice-compat, which will be the next release...

@mtibben
Copy link
Contributor Author

mtibben commented Oct 24, 2012

awesome, great stuff

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants