Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Schema override #233

Merged
merged 1 commit into from
May 25, 2023
Merged

Schema override #233

merged 1 commit into from
May 25, 2023

Conversation

spenes
Copy link
Contributor

@spenes spenes commented Mar 21, 2023

Same changes as in this PR but branched out from master instead series/1.x.

@spenes spenes requested a review from istreeter March 31, 2023 14:24
/** Error happened during schema resolution step */
final case class ResolutionError(value: SortedMap[String, LookupHistory]) extends ClientError {
def isNotFound: Boolean =
value.values.flatMap(_.errors).forall(_ == RegistryError.NotFound)
}

/** Error happened during schema/instance validation step */
final case class ValidationError(error: ValidatorError) extends ClientError
final case class ValidationError(error: ValidatorError, supersededBy: Option[String])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There might be a problem here.... I think this case class is used to build some of the bad row types. And they are described by a schema e.g. this one. So we might break bad rows if we add an extra field here.

All of that needs checking on Monday, I might be wrong.

Is there any way we can get get the relevant information in there without changing the bad rows schema? For example, stuff the supersededBy details into ValidatorError?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've missed that, good spot! It might not be pretty but I guess we can add a message for superseding info to messages list of ValidatorError.InvalidData. I will push the related commit shortly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this afresh on Monday.... maybe I was too quick to dismiss this idea to add a field to ValidationError. Sure, it means we need to release a new version of the bad rows schema, but that's probably OK isn't it? It's a non-breaking change to the schema.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I guess that would be okay. In that case, we would have to update the schema_violations schema as well. Also, we need to release new version of snowplow-badrows library with updated schema versions. It look like snowplow-badrows' latest version is on the Cats Effect 3 at the moment. We need to branch out new version from older version that doesn't contains Cats Effect 3.

I also pushed the commit that attaches the superseding info to ValidatorError. I know it doesn't look good because we are using the fields in message outside of their original purposes but superseding info is attached at least with this way. If you doesn't like this way at all, I can start to look into updating bad rows schema.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After a bit more thinking, let's do this proper way with updating bad rows schemas instead of taking short cut. Otherwise, it would be very hard to extract superseding info from bad rows. I will make necessary changes on the bad rows.

@spenes spenes force-pushed the schema-override-master branch 2 times, most recently from 7f27195 to 981a33f Compare April 3, 2023 10:41
* @return a [[Resolver.ResolverResult]] boxing the schema Json on success, or a ResolutionError on failure
*/
def lookupSchemaResult(
schemaKey: SchemaKey
schemaKey: SchemaKey,
resolveSupersedingSchema: Boolean = false
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will use superseding schema feature only on Enrich. We won't use it on loaders. Since lookupSchemaResult function is used in loaders as well, it should be possible to ignore the superseding schema version and fetch the schema itself only. Therefore, we added additional argument to disable resolving superseding schemas.

@istreeter istreeter changed the base branch from feature/resolver-mutex to develop May 25, 2023 11:03
@istreeter
Copy link
Contributor

I have checked that this PR exactly matches the equivalent commit that got merged in the 1.x branch. So I will merge this into develop.

@istreeter istreeter merged commit d94f20b into develop May 25, 2023
@istreeter istreeter deleted the schema-override-master branch May 25, 2023 19:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants