Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Sets in FromMappable and ToMappable #40

Closed
lacarvalho91 opened this issue Aug 20, 2019 · 17 comments
Closed

Support Sets in FromMappable and ToMappable #40

lacarvalho91 opened this issue Aug 20, 2019 · 17 comments

Comments

@lacarvalho91
Copy link

lacarvalho91 commented Aug 20, 2019

This can be achieved the same way as done in #11 which would add support for maps too, however I have found that to be able to use a Set I also needed to define a CanBuild[T, Set[T]] so I guess the same would be true for Map

@nevillelyh
Copy link
Owner

@lacarvalho91
Copy link
Author

Yeah, for MapRecord but not FromMappable or ToMappable. This surfaced from using the avro module, when I changed a field type from List to Set I saw that I needed to define FromMappable and ToMappable instances for Iterable (as well as the CanBuild instance for Set)

@nevillelyh
Copy link
Owner

Ah you're right. Mind give it a try and submit a PR?

@lacarvalho91
Copy link
Author

Yeah I plan on doing so 🙂have already defined them in my project and it seems to work. Will raise a PR later today.

@lacarvalho91
Copy link
Author

@nevillelyh how are FromMappable and ToMappable tested? I'm having trouble working that out

@nevillelyh
Copy link
Owner

It's tested in concrete implementations like https://github.com/nevillelyh/shapeless-datatype/blob/master/avro/src/test/scala/shapeless/datatype/avro/AvroTypeSpec.scala

Map type only makes sense for Avro though since it supports that natively. For BigQuery TableRow and Datastore Entity it's just a nested row.

@lacarvalho91
Copy link
Author

@nevillelyh OK cool, thanks!

@lacarvalho91
Copy link
Author

lacarvalho91 commented Aug 22, 2019

@nevillelyh as soon as I change all the Seq instances to Iterable (mainly toSeq: S[V] => Seq[V] becomes toIterable: S[V] => Iterable[V]) then I get ambiguous implicit errors because for some reason that change makes it so that those instances match the expected type for Option. Which kind of makes sense since its just S[_], I'm confused as to how that change makes this surface though (since it is fine with Seq). If I then add a subtype constraint S[_] <: Iterable[_] it can't derive an instance for Array (this I don't understand either since in Predef there is an implicit conversion that turns an Array into a subtype of Seq) 😭 any suggestions?

@lacarvalho91
Copy link
Author

Ah....that would be why

@nevillelyh
Copy link
Owner

Looked at the PR and this a bit. So basically you want Iterable to cover Set cases but that causes confusion with Option? I wonder if there's way to disable the Option => Iterable implicit here, otherwise we'll have to introduce our own type class and implement for all collection types except Option.

scala> implicitly[Option[Int] => Seq[Int]]
<console>:12: error: No implicit view available from Option[Int] => Seq[Int].
       implicitly[Option[Int] => Seq[Int]]
                 ^

scala> implicitly[Option[Int] => Iterable[Int]]
res8: Option[Int] => Iterable[Int] = $$Lambda$1160/1816269091@38e00b47

@nevillelyh
Copy link
Owner

@lacarvalho91 also still curious what's the use case for Sets. None of the use cases in this repo, e.g. Avro, BigQuery, Datastore, TF.Example supports Set data type. Do you intent to use it on something else?

@lacarvalho91
Copy link
Author

lacarvalho91 commented Sep 4, 2019

@nevillelyh we're decoding Avro with an array field but its encoded as a Set in our case class as we don't want duplicates, we'd rather make duplicates impossible with types other than having to distinct or something and would rather we could just decode it as a set instead of explicitly transforming after decoding the avro

@nevillelyh
Copy link
Owner

nevillelyh commented Sep 4, 2019

In that case I'm not sure this is the best place for that logic. By allowing Set <=> List conversion we're changing the semantics of mapping and records are not guaranteed to round trip, and potentially dangerous, e.g. if the source of truth is Avro array and you mistakenly encoded the field as Set, and loose records as a result.

Can you use a plug-in implicit conversion instead? See AvroType.at[T] in this example:
https://github.com/nevillelyh/shapeless-datatype#avrotype

Also just FYI, the AvroType impl is not the most efficient due to the fact that Avro requires Schema at GenericRecord construction time. It might add significant overhead if you're using this in a high volume application like data pipelines.

@lacarvalho91
Copy link
Author

OK I can look at defining our own avro type to do it. Your comment about efficiency seems to be about Avro in general, unless I'm misreading that?

@nevillelyh
Copy link
Owner

Not it's specifically about this library or shapeless design in general mismatch with Avro.
Other types supported, e.g. BigQuery, JSON, Datastore entity, TF Example are essentially schema-less, and I can dynamically construct records knowing just the Scala type. Not the case for Avro, which requires Schema to be present when constructing GenericRecord, that's why we have AvroRecord & AvroBuilder intermediate types to workaround this issue.

https://github.com/nevillelyh/shapeless-datatype/blob/master/avro/src/main/scala/shapeless/datatype/avro/package.scala#L8
https://github.com/nevillelyh/shapeless-datatype/blob/master/avro/src/main/scala/shapeless/datatype/avro/AvroMappableType.scala#L84

@nevillelyh
Copy link
Owner

@lacarvalho91 Just checking if you've tried my suggestion?

@lacarvalho91
Copy link
Author

Hey @nevillelyh sorry about the delay. Yeah we just defined our own avro type. So I guess this can be closed. Thanks for the help!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants