-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Schema override #233
Schema override #233
Conversation
modules/core/src/main/scala/com.snowplowanalytics.iglu/client/IgluCirceClient.scala
Outdated
Show resolved
Hide resolved
/** Error happened during schema resolution step */ | ||
final case class ResolutionError(value: SortedMap[String, LookupHistory]) extends ClientError { | ||
def isNotFound: Boolean = | ||
value.values.flatMap(_.errors).forall(_ == RegistryError.NotFound) | ||
} | ||
|
||
/** Error happened during schema/instance validation step */ | ||
final case class ValidationError(error: ValidatorError) extends ClientError | ||
final case class ValidationError(error: ValidatorError, supersededBy: Option[String]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There might be a problem here.... I think this case class is used to build some of the bad row types. And they are described by a schema e.g. this one. So we might break bad rows if we add an extra field here.
All of that needs checking on Monday, I might be wrong.
Is there any way we can get get the relevant information in there without changing the bad rows schema? For example, stuff the supersededBy details into ValidatorError?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've missed that, good spot! It might not be pretty but I guess we can add a message for superseding info to messages
list of ValidatorError.InvalidData
. I will push the related commit shortly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at this afresh on Monday.... maybe I was too quick to dismiss this idea to add a field to ValidationError
. Sure, it means we need to release a new version of the bad rows schema, but that's probably OK isn't it? It's a non-breaking change to the schema.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I guess that would be okay. In that case, we would have to update the schema_violations schema as well. Also, we need to release new version of snowplow-badrows library with updated schema versions. It look like snowplow-badrows' latest version is on the Cats Effect 3 at the moment. We need to branch out new version from older version that doesn't contains Cats Effect 3.
I also pushed the commit that attaches the superseding info to ValidatorError
. I know it doesn't look good because we are using the fields in message
outside of their original purposes but superseding info is attached at least with this way. If you doesn't like this way at all, I can start to look into updating bad rows schema.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After a bit more thinking, let's do this proper way with updating bad rows schemas instead of taking short cut. Otherwise, it would be very hard to extract superseding info from bad rows. I will make necessary changes on the bad rows.
7f27195
to
981a33f
Compare
* @return a [[Resolver.ResolverResult]] boxing the schema Json on success, or a ResolutionError on failure | ||
*/ | ||
def lookupSchemaResult( | ||
schemaKey: SchemaKey | ||
schemaKey: SchemaKey, | ||
resolveSupersedingSchema: Boolean = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will use superseding schema feature only on Enrich. We won't use it on loaders. Since lookupSchemaResult
function is used in loaders as well, it should be possible to ignore the superseding schema version and fetch the schema itself only. Therefore, we added additional argument to disable resolving superseding schemas.
c4d56a1
to
03b2ede
Compare
03b2ede
to
d94f20b
Compare
I have checked that this PR exactly matches the equivalent commit that got merged in the 1.x branch. So I will merge this into develop. |
Same changes as in this PR but branched out from
master
instead series/1.x.