-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure exceptions are not thrown #60
Ensure exceptions are not thrown #60
Conversation
By putting the try/catch inside the foreach loops, we can catch exceptions for each parameter that we are parsing and can avoid try/catching in other parts of the code
The idea is to avoid throwing exceptions where possible, so this removes the big try/catch, and replaces it with smaller try/catch blocks, so we can handle errors on a per-field basis. * Removes the try / catch block around the whole updateByParsing function * Adds the general exception handling logic to the EventDeserializer. * Adds a config option to returns null on an exception
Fixes for failing tests
// Replace our timestamp fields with the client's timestamp | ||
String[] timestamp = value.split(" "); | ||
this.dt = timestamp[0]; | ||
this.tm = timestamp[1]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a break;
missing here @mtibben ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yes, fixed in 53dda25
Hey @mtibben - many thanks for putting this together! It looks good. The whole ETL process not dying in the case of a single bad row of data is definitely the direction of travel we want to be moving SnowPlow ETL in. Two quick things:
If you could fix up the rest of the tests that would be awesome, many thanks Michael. I'm going to merge EmrEtlRunner into master very soon (cutting a new release), so will merge this pull request into a new feature branch once I've done this... Leaving this open in the meantime. |
That's a good question. I've been logging the hive jobs to S3 using
and I'm assuming that the commons logging is going to the same place. I haven't been able to verify this yet however |
Test cases have been fixed. I noticed that the useragent lib is not parsing some iPhone user agent strings correctly, were you aware of that ? |
I've reported an issue: http://java.net/jira/browse/USER_AGENT_UTILS-30 |
Many thanks for the test cases, and thanks for raising useragent issue Mike! On the useragent parsing - we probably need to move from the current library to one which uses a regularly-updated 'database' (YAML file, basically) of useragent strings. I've raised a ticket for this: #62 |
Many thanks @mtibben - this is now merged into |
awesome, great stuff |
Hey @alexanderdean,
This change is to do exception handling within the serde
It's is a reworked version of what we were discussing at 99designs#1
I've taken the approach suggested by @yalisassoon in that we capture all the lines, even if data is missing.
I've added one config parameter:
which will cause the serde to return null instead of throwing an exception when a parsing exception does occur.
This does break a few tests, as the behaviour is changed when bad data is present - it will just always null out the field instead of returning a nulled object. So this should also address #52
If everyone is happy with this approach I'll fix the rest of the tests.
Any comments and feedback appreciated!
cc @larsyencken