Added some stability changes for empty rows, invalid json, and different types #1

Closed
wants to merge 11 commits into
from

3 participants

@rathboma

Hey,

I'm using your JSON serde, but the JSON we're processing is pretty flaky and isn't always valid.

Basically my changes give the following new behaviors:

  • rows with invalid JSON no longer throw an exception (every column comes back as null instead)
  • you can declare a field as a 'String' even when the underlying value is not one, and it will not throw an exception
  • if a field does not exist in the JSON, it returns null instead of throwing an exception
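
The first and third behaviors above could be sketched roughly as follows. This is only an illustration, not the code in this pull request: `LenientRowReader`, `readRow`, and the `parser` argument are hypothetical names, and the parser is assumed to throw a `RuntimeException` on invalid input.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper for illustration; not the SerDe's actual class.
final class LenientRowReader {
    /**
     * Returns one value per column. A line that fails to parse yields a row
     * of nulls; a key missing from the JSON yields null for that column only.
     */
    static List<Object> readRow(String line,
                                List<String> columns,
                                Function<String, Map<String, Object>> parser) {
        Map<String, Object> parsed;
        try {
            // Assumption: the parser signals invalid JSON with a RuntimeException.
            parsed = parser.apply(line);
        } catch (RuntimeException e) {
            parsed = null;
        }
        if (parsed == null) {
            // Invalid row: null for every column instead of an exception.
            return new ArrayList<>(Collections.<Object>nCopies(columns.size(), null));
        }
        List<Object> row = new ArrayList<>(columns.size());
        for (String column : columns) {
            // Map.get returns null when the key is absent.
            row.add(parsed.get(column));
        }
        return row;
    }
}
```

Passing the parser in as a function keeps the sketch independent of any particular JSON library.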

I probably did things in a screwy way, but let me know and I can make any changes necessary.

matthew added some commits Aug 5, 2011
matthew Lots of stability changes around casting, null rows, and invalid json
- on invalid json input it does not crash
- String as a fallback -- give a column type of "string" and it will always succeed regardless of what the object actually is (it calls toString() if it can't cast the object)
- Given a key that does not exist in the JSON it will not crash, it will just return Null
fab6933
matthew should not check-in jars 4a54951
@rcongiu
Owner
@rathboma

Hey,

I like your solution to the malformed JSON line; that's definitely the better way to do it. I wasn't sure how to do that. Although my hack at least shows that a line was present (if invalid), which might be useful for debugging?

On the string component, I understand your hesitations, but here is my argument (and why we need it ourselves):
JSON is schema-less, so one day a field might be a string, another day it might be an int. Without being able to call it a string, there's no way you can query this data.

Simple example: {"number_calls": "none"} or {"number_calls": 23}
It's stupid but it can happen, and it gives you an easy way to brute force the serde into working.

This is how the other built-in serdes work in practice, but because JSON is typed we get errors at a different place (after deserialization of the type).

@rathboma

Oh, and the cast won't work, because it's trying to cast Int to String at a much lower level in the deserialization stack, which is why I had to add this slightly hacky solution.

@rcongiu
Owner
@rathboma

Yeah, my solution is functional but not very graceful.

Stupid question: could we not just call o.toString()? A string would just return the original string, no?
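
To make the difference concrete, here is a minimal sketch of the two approaches discussed in this thread. The class and method names are made up for illustration and are not taken from the SerDe source.

```java
// Hypothetical helper for illustration; not the SerDe's actual class.
final class StringFallback {
    /** Coerces any parsed JSON value to a String column value; null stays null. */
    static String asString(Object value) {
        // String.toString() returns the same string, so real strings pass
        // through unchanged while numbers and booleans are converted.
        return value == null ? null : value.toString();
    }

    /** The direct cast: fine for a String, ClassCastException for an Integer. */
    static String castToString(Object value) {
        return (String) value;
    }
}
```

With the two example rows from earlier in the thread, `asString("none")` gives `"none"` and `asString(23)` gives `"23"`, whereas `castToString(23)` throws.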

@rathboma rathboma closed this Aug 5, 2011
@rathboma rathboma reopened this Aug 5, 2011
@rcongiu
Owner
@rathboma

Hehe, can't believe I didn't think of that before. Thanks.

Let me know what you decide to do about malformed rows, and we'll start using your version of the jar instead of my clone.

@rcongiu
Owner
matthew and others added some commits Aug 10, 2011
matthew underscores replace dashes, and toString by default 12f17f6
matthew when deserializing, top-level keys will not be deserialized if they are not of the right type.
abbed5e
matthew added another test, just in case 6b7619e
matthew JSON serializer bug
- it was looking for base-16 numbers in a string whenever it saw \u
- this was raising exceptions from the parseInt function
- now it tries to do that, then falls back to just adding the characters.
147760d
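
The fallback described in the commit above might look something like this sketch. It is written from the commit message alone, so `UnicodeEscapes` and `decode` are hypothetical names and the real change may be structured differently.

```java
// Hypothetical helper for illustration; not the SerDe's actual class.
final class UnicodeEscapes {
    /**
     * Decodes backslash-u escapes followed by four hex digits. Anything that
     * merely looks like such an escape is copied through unchanged.
     */
    static String decode(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            boolean startsEscape = c == '\\'
                    && i + 1 < input.length()
                    && input.charAt(i + 1) == 'u';
            if (startsEscape) {
                if (i + 6 <= input.length()) {
                    try {
                        // Try to read the next four characters as a base-16 number.
                        // Note: parseInt also accepts a leading sign, which a
                        // stricter decoder would reject.
                        int codeUnit = Integer.parseInt(input.substring(i + 2, i + 6), 16);
                        out.append((char) codeUnit);
                        i += 6;
                        continue;
                    } catch (NumberFormatException e) {
                        // Not hex: fall through and keep the raw characters.
                    }
                }
                out.append("\\u");
                i += 2;
                continue;
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }
}
```

A Windows-style path containing a backslash followed by `users` is the kind of input that would previously raise from `parseInt`; here it passes through unchanged.
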
matthew Added a test for the stupid deserialization of base-16 numbers that are invalid.

- it doesn't error, hooray!
- it has a test, hooray!
da4d087
matthew SerDe now allows remapping of keynames!
- you can tell the serde that you want 'field1' to be matched by 'newname'
- tests in place, it works
f3a24a2
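
As a rough picture of the key remapping in the commit above, the lookup could be sketched like this. The commit message does not say how the mapping is configured, so `KeyRemapper` and its methods are hypothetical, and the sketch assumes 'field1' is the column name and 'newname' is the key found in the JSON.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; not the SerDe's actual class.
final class KeyRemapper {
    private final Map<String, String> columnToJsonKey = new HashMap<>();

    /** Declares that the given column should be read from the given JSON key. */
    void remap(String column, String jsonKey) {
        columnToJsonKey.put(column, jsonKey);
    }

    /** Looks up a column's value, using the remapped key when one is registered. */
    Object valueFor(String column, Map<String, Object> parsedRow) {
        // Columns without a mapping fall back to their own name as the key.
        String key = columnToJsonKey.getOrDefault(column, column);
        return parsedRow.get(key);
    }
}
```

After `remap("field1", "newname")`, a row parsed from {"newname": "x"} would return "x" for column field1, while unmapped columns are looked up by their own name.
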
@tdyas tdyas handle json nulls in map inspector 09679e0
@rathboma rathboma Merge pull request #1 from tdyas/master
handle json nulls in map inspector
7125350
matthew version bump 312475a
@rcongiu rcongiu closed this Jul 9, 2012