Added some stability changes for empty rows, invalid json, and different types #1

wants to merge 11 commits into

3 participants



I'm using your JSON SerDe, but the JSON we're processing is pretty flaky and isn't always valid.

Basically my changes give the following new behaviors:

  • rows with invalid JSON no longer cause an exception (every column comes back null instead)
  • you can declare a field as 'string' even when the JSON value is not a string, and it will not throw an exception
  • if a field does not exist in the JSON, it returns null instead of throwing an exception
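The first and third behaviors can be sketched as a single "never throw" row reader. This is a minimal illustration, not the code in this PR: the class `LenientRowReader` is hypothetical, and the `parser` argument stands in for whatever JSON library the SerDe actually uses. It is assumed to throw a `RuntimeException` on malformed input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper (not the SerDe's actual code): turns one input line
// into a row of column values without ever throwing.
class LenientRowReader {

    // `parser` is an assumed stand-in for the real JSON library; it is
    // expected to throw a RuntimeException when the line is not valid JSON.
    static List<Object> readRow(String line,
                                List<String> columns,
                                Function<String, Map<String, Object>> parser) {
        Map<String, Object> obj;
        try {
            obj = parser.apply(line);
        } catch (RuntimeException e) {
            obj = null; // malformed (or empty) line: produce an all-null row
        }
        List<Object> row = new ArrayList<>(columns.size());
        for (String col : columns) {
            // Map.get returns null for an absent key, so missing fields become null
            row.add(obj == null ? null : obj.get(col));
        }
        return row;
    }
}
```

With columns `a` and `b`, a line like `{"a":1}` yields `[1, null]` and a garbage line yields `[null, null]`. The query keeps running in both cases.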

I probably did some things in a screwy way, but let me know and I can make any necessary changes.

matthew added some commits Aug 5, 2011
matthew Lots of stability changes around casting, null rows, and invalid json
- on invalid json input it does not crash
- String as a fallback -- give a column type of "string" and it will always succeed regardless of what the object actually is (it calls toString() if it can't cast the object)
- Given a key that does not exist in the JSON it will not crash, it will just return Null
matthew should not check-in jars 4a54951


I like your solution for malformed JSON lines; that's definitely the better way to do it. I wasn't sure how to do that. My hack does at least show that a line was present (even if invalid), which might be useful for debugging?

On the string component, I understand your hesitations, but here is my argument (and why we need it ourselves):
JSON is schema-less, so one day a field might be a string and another day an int. Without being able to declare it as a string, there's no way to query this data.

A simple example: {"number_calls": "none"} in one row and {"number_calls": 23} in another.
It's stupid, but it can happen, and declaring the column as a string gives you an easy way to brute-force the SerDe into working.

This is how the other built-in SerDes work in practice, but because JSON values carry their own types, we get the error in a different place (after the value has already been deserialized into its type).


Oh, and the cast won't work, because it's trying to cast Int to String at a much lower level in the deserialization stack, which is why I had to add this slightly hacky solution.


Yeah, my solution is functional but not very graceful.

Stupid question: couldn't we just call o.toString()? A String would just return itself, no?
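A minimal illustration of why `toString()` is enough, using the `number_calls` values from above. `StringFallback` is a hypothetical helper, not the SerDe's code. The `castSucceeds` method is a plain-Java stand-in for the failing cast; in the thread, that failure actually happened deeper in Hive's deserialization stack.

```java
// Hypothetical helper illustrating the "string as a fallback" idea.
class StringFallback {

    // toString() covers every non-null case: a String returns itself and an
    // Integer such as 23 becomes "23". A Java null (e.g. a missing key) stays null.
    static String asString(Object o) {
        return o == null ? null : o.toString();
    }

    // For contrast: a plain cast only works if the value is already a String.
    static boolean castSucceeds(Object o) {
        try {
            String ignored = (String) o;
            return true;
        } catch (ClassCastException e) {
            return false;
        }
    }
}
```

`asString("none")` and `asString(23)` both succeed, while casting the `Integer` 23 to `String` throws. That is the difference the string-typed column relies on.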

@rathboma rathboma closed this Aug 5, 2011
@rathboma rathboma reopened this Aug 5, 2011

Hehe, can't believe I didn't think of that before. Thanks.

Let me know what you decide to do about malformed rows, and we'll start using your version of the jar instead of my clone.

matthew and others added some commits Aug 10, 2011
matthew underscores replace dashes, and toString by default 12f17f6
matthew when deserializing, top-level keys will not be deserialized if they are not of the right type.
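One way to read the commit above together with "toString by default" is sketched below. A top-level value is handed to Hive only if it already has the column's declared Java type; otherwise the column reads as null. String columns accept anything via `toString()`. `TypedColumns.coerce` is hypothetical, and this interpretation of the commit message is an assumption.

```java
// Hypothetical sketch: keep a top-level value only if it matches the
// column's declared type; string columns fall back to toString().
class TypedColumns {

    static Object coerce(Object value, Class<?> declared) {
        if (value == null) {
            return null;                 // missing or null stays null
        }
        if (declared == String.class) {
            return value.toString();     // "toString by default"
        }
        // wrong type: skip the value rather than throwing
        return declared.isInstance(value) ? value : null;
    }
}
```

This sketch does no numeric widening. If a parser hands back a `Long` for an int column, that column reads as null under these rules.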
matthew added another test, just in case 6b7619e
matthew JSON serializer bug
- it was looking for base-16 numbers in a string whenever it saw \u
- this was raising exceptions from the parseInt function
- now it tries to do that, then falls back to just adding the characters.
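The fallback in this commit can be sketched as follows: decode a `\u` escape with `Integer.parseInt(hex, 16)`, and if that fails, copy the characters through unchanged instead of letting the exception escape. `UnicodeEscapes.decode` is a hypothetical helper, not the SerDe's parser. It handles only the `\u` escape; other JSON escapes (such as an escaped backslash) are out of scope here.

```java
// Hypothetical sketch of "try to decode, else fall back to the raw characters".
class UnicodeEscapes {

    static String decode(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            if (s.startsWith("\\u", i)) {
                // up to four characters after the backslash-u marker
                String hex = s.substring(i + 2, Math.min(i + 6, s.length()));
                // parseInt accepts a leading '+' or '-', so rule those out first
                if (hex.length() == 4 && hex.charAt(0) != '+' && hex.charAt(0) != '-') {
                    try {
                        out.append((char) Integer.parseInt(hex, 16));
                        i += 6;
                        continue;
                    } catch (NumberFormatException e) {
                        // not base-16: fall through and copy the characters as-is
                    }
                }
                out.append("\\u");
                i += 2;
                continue;
            }
            out.append(s.charAt(i));
            i++;
        }
        return out.toString();
    }
}
```

A valid escape such as `\u0041` becomes `A`, while `\uZZZZ` or a truncated `\u12` is left exactly as it appeared in the input.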
matthew Added a test for the stupid deserialization of base-16 numbers that are invalid.

- it doesn't error, hooray!
- it has a test, hooray!
matthew SerDe now allows remapping of keynames!
- you can tell the serde that you want 'field1' to be matched by 'newname'
- tests in place, it works
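A sketch of key remapping combined with the earlier "underscores replace dashes" commit. It reads this commit message as "column `field1` is filled from JSON key `newname`". `ColumnResolver` is hypothetical, and both that reading and the precedence (explicit mapping first, then dash-to-underscore matching) are assumptions, not the SerDe's documented behavior.

```java
import java.util.Map;

// Hypothetical sketch: resolve a Hive column name to a value in a parsed JSON object.
class ColumnResolver {

    private final Map<String, String> columnToJsonKey; // explicit remappings

    ColumnResolver(Map<String, String> columnToJsonKey) {
        this.columnToJsonKey = columnToJsonKey;
    }

    Object lookup(String column, Map<String, Object> json) {
        // 1. An explicit remapping wins (assumed precedence).
        String key = columnToJsonKey.get(column);
        if (key != null) {
            return json.get(key);
        }
        // 2. Otherwise match JSON keys with dashes turned into underscores,
        //    since dashes are awkward in Hive column names.
        for (Map.Entry<String, Object> e : json.entrySet()) {
            if (e.getKey().replace('-', '_').equals(column)) {
                return e.getValue();
            }
        }
        return null; // missing key -> null, consistent with the rest of this PR
    }
}
```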
@tdyas tdyas handle json nulls in map inspector 09679e0
@rathboma rathboma Merge pull request #1 from tdyas/master
handle json nulls in map inspector
matthew version bump 312475a
@rcongiu rcongiu closed this Jul 9, 2012