New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Work on improved Tuple support in lift-json #1768

Merged
merged 22 commits into from Apr 21, 2017

Conversation

Projects
None yet
3 participants
@farmdawgnation
Member

farmdawgnation commented Feb 7, 2016

Introduction

Tuples are a wily beast in Scala land, but one that has its point and its place. Given that Tuples are a very good construct to understand JavaScript's heterogenous arrays, it makes sense that lift-json would interpret them in this fashion when serializing and deserializing JSON.

Ticket: #1757.

Summary

Before this pull request, tuples will be serialized to this format:

{"_1": "value1", "_2": "value2"}

After this pull request, tuples will be serialized to this format:

["value1", "value2"]

However, we've also added support for understanding truly heterogeneous collections from the client. So, if you wanted to transmit data that looked something like this:

[["vector1", 2], ["vector2", 2]]

You should now be able to ask lift-json to interpret this as a List[(String, Int)]. Likewise, you could decompose a List[(String, Int)] to that format. This is a fairly common format in JavaScript land, so this change will let developers using lift-json interact with libraries and APIs expecting these formats more fluidly.

State of this PR

  • All tests are currently passing.
  • Support for scala primitives, like Int, seems to be a bit more spotty than for their Java object counterparts, like Integer.
  • The changes in this PR are still able to understand the old serialized format of tuples, so this shouldn't be a breaking change for deserialization. It will, however, change the wire format for anyone who is using tuples. Perhaps we should consider using a flag to turn on / off the new format?

@farmdawgnation farmdawgnation modified the milestone: 3.1 Feb 21, 2016

@thomasdziedzic

This comment has been minimized.

thomasdziedzic commented Aug 25, 2016

Just encountered an issue related to deserializing tuples in lift, are there any updates with regards to this PR since the last push/comment on the related thread?

@farmdawgnation

This comment has been minimized.

Member

farmdawgnation commented Aug 28, 2016

Hey there,

No I've largely been waiting for 3.0 to get out the door before pushing too hard on this since this is significant enough to have to wait on 3.1.

As I recall this is close - the biggest gap I remember is lack of support for scala primitives.

@farmdawgnation

This comment has been minimized.

Member

farmdawgnation commented Jan 24, 2017

Oh hey it's been almost a year since I started this. Should probably circle back to it at some point. :P

farmdawgnation added some commits Jan 24, 2017

Correctly map tuples to the HCol concept.
The mapping engine here determines what we actually try to instantiate
when we see incoming JSON and we're trying to create a concrete series
of objects through extraction.

Here, we indicate to the rest of the system that if we're looking for a
tuple in the instantiated type we'll expect the JSON representation to
be an array of some sort.
Retain the ability to interpret old tuples.
Before my changes, tuples were serialized as follows:

  {"_1": "value", "_2": "value2"}

After my changes, tuples are serialized like this:

  ["value", "value2"]

But we still will stumble on a lot of tuples that have been written out
to some permanent storage in the old format.

Now, if we expect there to be a tuple in a particular object but we
found a JObject representation we will convert it to a JArray under the
hood and recursively re-invoke the newTuple function.

This will only apply if all the keys in the object start with an
underscore, which should both be sufficiently cheap and sufficiently
idicitive of a tuple since we're expecting a tuple in the field we're
trying to inflate anyway.

@farmdawgnation farmdawgnation changed the title from [WIP] Work on improved Tuple support in lift-json to Work on improved Tuple support in lift-json Jan 29, 2017

@farmdawgnation

This comment has been minimized.

Member

farmdawgnation commented Jan 29, 2017

Alright, I think this is officially ready for some consideration. @Shadowfiend tag you're it!

@farmdawgnation farmdawgnation requested a review from Shadowfiend Jan 29, 2017

@farmdawgnation farmdawgnation added the JSON label Jan 29, 2017

@farmdawgnation

This comment has been minimized.

Member

farmdawgnation commented Feb 2, 2017

Codacy is complaining about things that I'm not super motivated to fix. I want the "X" to go away but it's not going away. 😢

@Shadowfiend

This comment has been minimized.

Member

Shadowfiend commented Feb 2, 2017

I can try to clear those in the mornin' if you'd like.

@farmdawgnation

This comment has been minimized.

Member

farmdawgnation commented Feb 2, 2017

By all means [ shrug ]

@Shadowfiend

Shadowfiend requested changes Mar 9, 2017 edited

Some mostly cosmetic changes, one performance change, one potential breakage, and in addition it seems like we need some specs proving this works as expected, since lift-json is pretty well-covered test-wise.

This looks awesome, btw.

@@ -319,14 +331,54 @@ object Extraction {
}
}
def newTuple(root: JValue, mappings: List[Mapping]): Any = {

This comment has been minimized.

@Shadowfiend

Shadowfiend Mar 9, 2017

Member

Should this be private[this]?

This comment has been minimized.

@farmdawgnation

farmdawgnation Mar 18, 2017

Member

Potentially? I can't think of a reason this would be invoked elsewhere. :)

This comment has been minimized.

@farmdawgnation

farmdawgnation Mar 18, 2017

Member

Ah, sorry, no, the pattern here is that we nest the newX functions inside of the larger extraction handler. So, visibility modifiers aren't allowed here because we're not in the class scope (and nothing outside this function can access it anyway).

At some point I'd like to unwind that behavior, but now is not the right time to do that.

case JArray(items) if items.length >= 1 && items.length <= 22 =>
val builtItems: Seq[Object] = items.zip(mappings).map({
case (i,m) =>
build(i, m).asInstanceOf[Object]

This comment has been minimized.

@Shadowfiend

Shadowfiend Mar 9, 2017

Member

Probably worth giving i and m longer names.

This comment has been minimized.

@farmdawgnation

farmdawgnation Mar 18, 2017

Member

Indeed. I probably copied this from some other spot.

case (i,m) =>
build(i, m).asInstanceOf[Object]
})
val tupleClass = Class.forName("scala.Tuple" + items.length)

This comment has been minimized.

@Shadowfiend

Shadowfiend Mar 9, 2017

Member

We should build this ahead of time in a top-level final val; Class.forName is really expensive. We might also use that to remove the >=1 && <= 22 magic numbers above.

This could go hand-in-hand with our friendly neighborhood tuples set, since it's already got the right Class instance anyway.

This comment has been minimized.

@farmdawgnation

farmdawgnation Mar 18, 2017

Member

Good call. Let me look into that.

typedTupleConstructor.newInstance(builtItems: _*)
case JObject(items) if items.forall(_.name.startsWith("_")) =>
newTuple(JArray(items.map(_.value)), mappings)

This comment has been minimized.

@Shadowfiend

Shadowfiend Mar 9, 2017

Member

We probably need to sort these. For example, if we get a {"_3": 5, "_1": "hello", "_2": "uh-oh"}, that should deserialize to Tuple3[String,String,Int]("hello", "uh-oh", 5), but I think with this code we'd get Tuple3[Int,String,String](5, "hello", "uh-oh").

This comment has been minimized.

@farmdawgnation

farmdawgnation Mar 18, 2017

Member

I have a back-compat test somewhere, I think, that suggests otherwise. If so, I'll point to it here. If not, I'll take a closer look at it.

newTuple(JArray(items.map(_.value)), mappings)
case JArray(items) =>
throw new IllegalArgumentException("Cannot create a tuple of length " + items.length)

This comment has been minimized.

@Shadowfiend

Shadowfiend Mar 9, 2017

Member

It would be nice to move this condition above the JObject one so the two array possibilities are together.

}
def mkHeteroContainer(baseType: Type): Mapping = {
val hereroContainerTypes = baseType match {

This comment has been minimized.

@Shadowfiend

Shadowfiend Mar 9, 2017

Member

Should be hetero with a t.

p.getRawType.asInstanceOf[Class[_]]
case x =>
fail("do not know how to get type parameter from " + x)
}

This comment has been minimized.

@Shadowfiend

Shadowfiend Mar 9, 2017

Member

Do we just want a MatchError if baseType isn't a ParameterizedType, or do we want to fail with our own error message?

This comment has been minimized.

@farmdawgnation

farmdawgnation Apr 15, 2017

Member

Yeah we can add a fail condition of our own here.

This comment has been minimized.

@farmdawgnation

farmdawgnation Apr 15, 2017

Member

Oh wait we already do that. I think that's the right call. :)

deserialized must_== holder
}
"deserialize list of heterogenous tuples" in {

This comment has been minimized.

@Shadowfiend

Shadowfiend Mar 9, 2017

Member

Either deserialize a list or deserialize lists?

implicit val formats = DefaultFormats
// MSF: This currently doesn't work with scala primitives?! The type arguments appear as
// java.lang.Object instead of scala.Int. :/

This comment has been minimized.

@Shadowfiend

Shadowfiend Mar 9, 2017

Member

Is this still true? And if it is still true, is that something that was already missing in the standard object serialization of tuples, or not so much?

This comment has been minimized.

@farmdawgnation

farmdawgnation Mar 18, 2017

Member

Is this still true?

I'm not sure. My memory seems to think "no" is the answer there, but that could be a 2.12 development which would still make it true for 2.11.

And if it is still true, is that something that was already missing in the standard object serialization of tuples, or not so much?

If it is still true, it would be reflective of an issue that has always been latent in scalap, I think. Whether or not we triggered that error condition in the old implementation: I don't know.

Let me do some more research on the above.

@farmdawgnation

This comment has been minimized.

Member

farmdawgnation commented Mar 18, 2017

Okay, following up on the issue with Scala primitives...

The underlying problem here is that (apparently) to Java reflection the Scala primitives just looks like an instance of java.lang.Object. We special case this in a lot of different areas of the extraction code, and I have some ideas of some kinds of special casing here, too, but we've probably long ago crossed that threshold where this code was maintainable. :-(

I think we might want to consider a fuller rewrite of lift-json's parser at this point, unfortunately. This is something we should be able to support, but I'm fighting a lot of opaque code and special cases to try and make it happen without very much quick progress. These are signs that the area of code I'm working in might be due for a wholesale teardown and rebuild.

The question here, then, is what do we want to do with this PR?

@Shadowfiend

Some tiny notes.

I think this PR is still 100% worth it, personally. A new extractor with full feature parity to this one is likely months away at best, and I think this is a strong feature. We may, however, want to make the serialization behavior (old-style vs array-style) togglable somehow, otherwise we lose backward-compatibility.

The other option is saying "screw it, we won't support tuple arrays in this version of the extractor ever, and if you want that look to the new one if/whenever it gets off the ground”.

val tupleClass = Class.forName("scala.Tuple" + items.length)
val tupleIndex = items.length - 1
val tupleClass = tuples.lift(tupleIndex).getOrElse {
throw new IllegalArgumentException(s"Cannot instantiate a tuple of length ${items.length} even though that should be a valid tuple length.")

This comment has been minimized.

@Shadowfiend

Shadowfiend Mar 20, 2017

Member

Still think it may be worth turning tuples into a Map (only doing the .getConstructors()(0) bit once during the singleton intialization).

The error here also doesn't seem quite right (at this point we actually couldn't load the tuple class, so while technically we can't instantiate it there's a deeper reason). Additionally, if we hadn't been able to get this class, we would have blown up while initializing the whole singleton object. I'm wondering if we just do a .get here and point to the initialization of tuples to show that if we've gotten to this point in the code, we know that entry exists.

This comment has been minimized.

@farmdawgnation

farmdawgnation Mar 21, 2017

Member

I'm wondering if we just do a .get here and point to the initialization of tuples to show that if we've gotten to this point in the code, we know that entry exists.

No, I think having a getOrElse with an exception here is a good way of refactor-proofing the code. Having worked in a project that made way-too-ample use of Option.get last year I think I've landed on always justifying a retrieval that has no alternative much like Box's openOrThrowException requires us to do. The alternative is making a bet that this code can't be fouled by an accidental fat finger and then surfacing an opaque error message to a user of the Framework. Certainly, surfacing this error at all would be bad, but making it opaque is just insult to injury. :)

Still think it may be worth turning tuples into a Map (only doing the .getConstructors()(0) bit once during the singleton intialization).

I'm not quite following the entirety of this. What's the relevance of the getConstructors bit? Do we not already do that once?

This comment has been minimized.

@Shadowfiend

Shadowfiend Apr 3, 2017

Member

Sorry, basically doing something like this:

val tupleClasses =
  Seq(classOf[Tuple1[_]]...)
val tupleConstructors =
  tupleClasses.map(clazz => (clazz -> clazz.getConstructors()(0))).toMap

And then looking the constructor up in the map every time.

case JArray(items) =>
throw new IllegalArgumentException("Cannot create a tuple of length " + items.length)
case JObject(items) if items.forall(_.name.startsWith("_")) =>
newTuple(JArray(items.map(_.value)), mappings)

This comment has been minimized.

@Shadowfiend

Shadowfiend Mar 20, 2017

Member

Still wondering about this. Even if we have a backward-compat deal that shows that we used to serialize _1 and friends in order, the old way of dealing with tuples (which is constructor-based) would have dealt with a user furnishing an out-of-order set of _*s (not necessarily serialized from here). If we have a test that shows that you can pass {"_3": "5", "_1": "1", "_2": "8"} (e.g.) to a (String, String, String) and it deserializes the same way with this extraction strategy, I'm game. But I don't see how that would happen, tbh.

This comment has been minimized.

@farmdawgnation

farmdawgnation Mar 21, 2017

Member

I think I'm better understanding what these comments about array ordering were getting at and I think you're right. If so, then fixing this should be as simple massaging the key to a numeral and doing a sort before reducing it to a JArray.

@farmdawgnation

This comment has been minimized.

Member

farmdawgnation commented Mar 21, 2017

We may, however, want to make the serialization behavior (old-style vs array-style) togglable somehow, otherwise we lose backward-compatibility.

That's reasonable. I don't think we have a pattern for this kind of toggle in lift-json already, however. Did you have any concrete ideas about how that might look?

The other option is saying "screw it, we won't support tuple arrays in this version of the extractor ever, and if you want that look to the new one if/whenever it gets off the ground”.

So I'm still struggling with this. There's some variation of the phrase like "the lack of a feature is better than the presence of a feature implemented poorly." That said, for all cases except Scala primitives nested inside a type argument my understanding is that this would work. And that's quite a lot of cases.

@Shadowfiend

This comment has been minimized.

Member

Shadowfiend commented Apr 3, 2017

Did you have any concrete ideas about how that might look?

Not per se, but it could just be a boolean flag to extract for now, perhaps.

@farmdawgnation

This comment has been minimized.

Member

farmdawgnation commented Apr 15, 2017

Alright, I'm going to figure out something for turning this off/on then and we'll ship it as is for now. Which is to say, with caveats.

farmdawgnation added some commits Apr 15, 2017

Add experimentalTupleSupport flag to formats
When this is disabled, the new tuple handling code is also disabled and
we should generate tuples as serialized objects with keys like _1, _2,
_3, etc.

When this is enabled, the new tuples handling is active and tuples
will be rendered as JSON arrays, providing support for heterogenous
collections in JSON, so long as you're willing to avoid using any Scala
primitives in your tuples.
Take the constructor reflection hit for tuples at construction
This avoids an extraction time hit on reflection and takes that hit when
the extractor is first inited.
@farmdawgnation

This comment has been minimized.

Member

farmdawgnation commented Apr 15, 2017

Okay I think this PR is ready for further review and (hopefully) a merge.

I opted to store the flag for the experimental tuple support in the Formats. That object is well suited both semantically and in terms of availability in all the places where I need to make decisions on how to handle tuples. I've added a guard everywhere that I (think) I was introducing new behavior but please please please double check me so we don't accidentally introduce a regression.

For now, we'll accept the tradeoff of not supporting Scala primitives in favor of having an exit hatch for folks who want to use heterogenous collections. I'll prepare a blog post about this for my personal blog so there's something out there that I can point folks to when they inevitably ask questions about our support for this new style of Tuple.

If it looks good to you give it the sign-off and I'll merge once the blog post is ready. :)

@Shadowfiend

Left one more suggestion (regarding the naming of the flag). Other than that, this looks great; I did leave a couple of more questions.

Just to clarify, you now have two options:

  • Serialize tuples as JSON objects (old behavior). Works with primitives.
  • Serialize tuples as arrays (new behavior). Doesn't work with primitives.

Additionally:

  • Deserialization of tuples as objects is supported regardless of flags. This supports primitives.
  • Deserialization of tuples as arrays is supported regardless of flags. This does not support primitives.

Is that understanding correct?

Lastly, blog posts as documentation for Lift seems a bit ~. Can we put at least a blurb about this in the lift-json README? Definitely don't want to stop you from writing a blog post---just want to make sure the basics are easily accessible from Lift sources as well.

* Support for the experimental tuple decomposition/extraction that represents tuples as JSON
* arrays. This provides better support for heterogenous arrays in JSON, but enable it at your
* own risk as it does change the behavior of serialization/deserialization and comes
* with some caveats (such as Scala primitives not being recognized reliably during extraction).

This comment has been minimized.

@Shadowfiend

Shadowfiend Apr 16, 2017

Member

Also, should we really label this “experimental”? We're not planning on iterating on it really, it's just a different serialization approach? Perhaps serializeTuplesAsArrays?

This comment has been minimized.

@farmdawgnation

farmdawgnation Apr 18, 2017

Member

I think there should be a yellow flag of sorts that alerts someone who sees nothing except this flag being used that they should tread carefully. If you'd like to create an alias value named "serializeTuplesAsArrays" that reads from this and reference that directly in the extractor, that seems sensible to me. But at least until we're getting some positive reports from the field on this feature I'd like to have a warning inherent in the name. And calling it experimental fits the bill, imo.

This comment has been minimized.

@Shadowfiend

Shadowfiend Apr 18, 2017

Member

From my perspective, the implementation is tested. It may have bugs, which we will fix because we support the feature. But we don't expect the API to change, and we do intend on supporting it, at least until this whole extraction paradigm is ripped out in some potential future. That doesn't merit calling it experimental IMO.

Additionally, I'm opposed to having this as a temporary name, because then we create headaches when we want to make it the “correct” name.

I think the new work you're doing on a TypeTags-based extractor merits being called experimental, because the support horizon on it differs from that of other features (by virtue of being in a separate codebase hehe). But if we're introducing this I assume we intend to support it, and if we intend to support it I think it's not experimental.

If you agree, cool. If not, maybe let's see if someone else wants to chime in and break the tie. If we're nowhere within a day or two or when you circle back (whichever is longest), let's disagree and commit to do it your way.

This comment has been minimized.

@farmdawgnation

farmdawgnation Apr 19, 2017

Member

I think the most compelling argument here are the headaches for converting to a correct name. I still think there are some dangers to turning this on because JSON is simple, but still manages to be bananas at finding bugs in code. That said, it's turned off by default and perhaps we can provide enough context as to the meaning and impact of this change in the documentation to be sufficient.

I'll change it to be named serializeTuplesAsArrays.

deserialized must_== holder
}
"deserialize an out of order old-style tuple w/ experimental tuples enabled" in {

This comment has been minimized.

@Shadowfiend

Shadowfiend Apr 16, 2017

Member

We have existing tests for these working with experimental tuples disabled as well, right? I assume that was the old default, just want to make sure we didn't delete the tests in question at some point.

This comment has been minimized.

@farmdawgnation

farmdawgnation Apr 18, 2017

Member

The old behavior probably isn't covered explicitly, but would be covered in Lift-Json's handling of constructor arguments in ExtractionExamplesSpec.

@@ -122,4 +122,50 @@ object ExtractionBugs extends Specification {
val deserialized = parse(serialized).extract[Holder2]
deserialized must_== holder
}
"deserialize list of homogonous tuples w/ experimental tuples enabled" in {

This comment has been minimized.

@Shadowfiend

Shadowfiend Apr 16, 2017

Member

Given the enabled/disabled tests are identical here, I wouldn't be opposed to collapsing them into a single test---but don't care too much.

This comment has been minimized.

@farmdawgnation

farmdawgnation Apr 18, 2017

Member

I prefer the way this comes out in the test report.

@farmdawgnation

This comment has been minimized.

Member

farmdawgnation commented Apr 18, 2017

Just to clarify, you now have two options:

  • Serialize tuples as JSON objects (old behavior). Works with primitives.
  • Serialize tuples as arrays (new behavior). Doesn't work with primitives.

Correct.

Additionally:

  • Deserialization of tuples as objects is supported regardless of flags. This supports primitives.

Correct.

  • Deserialization of tuples as arrays is supported regardless of flags. This does not support primitives.

Incorrect. To deserialize an array into a tuple the flag must be enabled in your formats object. Unfortunately, the code paths diverge here in the "planning" stage of extraction, before we have the actual data available. So, I either create a planning instruction that handles a Heterogenous Collection (HCol) or I create one that asks for a constructor.

If I were to try to handle this I would have to break our abstraction a bit by specifically checking constructor's class names and munging based on that. Since that risks introducing edge cases where we break the existing, stable serialization, I decided not to go that route.

Lastly, blog posts as documentation for Lift seems a bit ~. Can we put at least a blurb about this in the lift-json README? Definitely don't want to stop you from writing a blog post---just want to make sure the basics are easily accessible from Lift sources as well.

Yeah, you're right. We need to document the current state of the world in the README and API docs. The purpose of the blog post is to answer the question "How is the new thing different from the old thing?" which is a question I prefer to answer in a medium that has a datestamp firmly attached to it.

I'll look to make code changes after your responses to all of my comments today.

@Shadowfiend

Ok, I'm good here! Will give everyone else a couple more days to look and then merge 'er in.

@Shadowfiend

This comment has been minimized.

Member

Shadowfiend commented Apr 21, 2017

Given no more updates, going to merge this sucker right onnnnn innnn.

@Shadowfiend Shadowfiend merged commit ac4d67e into master Apr 21, 2017

2 checks passed

codacy/pr Good work! A positive pull request.
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@Shadowfiend Shadowfiend deleted the msf_issue_1757 branch Apr 21, 2017

@Shadowfiend Shadowfiend modified the milestones: 3.1.0-M3, 3.1.0-RC1 May 20, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment