Add list-schemas for the embedded repo (close #212) #214

Merged
merged 6 commits into from
Nov 21, 2022

Conversation

voropaevp

No description provided.

@snowplowcla

Thanks for your pull request. Is this your first contribution to a Snowplow open source project? Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://docs.snowplowanalytics.com/docs/contributing/contributor-license-agreement/ to learn more and sign.

Once you've signed, please reply here (e.g. I signed it!) and we'll verify. Thanks.

@voropaevp voropaevp changed the base branch from master to develop November 12, 2022 20:00
@istreeter istreeter left a comment

This looks good. If it helps tests pass in other places, then we should merge it.

I spent a while on this review, because I was interested in the limits of where it works and where it doesn't. Until now, the embedded repo implementation has allowed two slightly different types of schema location:

  1. A standalone file on the filesystem, in a sub-directory of the class path.
  2. A schema packaged into a fat jar file, which is on the class path.

Both of the above work for a single schema lookup, because getClass.getResource(path) works for both. But your implementation of listing schemas only works on type 1, not on type 2.

I wondered how difficult it would be to get it working with schemas in fat jar files. This gist I found shows roughly how to handle the two different cases.

Anyway, I'm not saying to change anything. I just thought it was interesting to consider both cases.
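For readers curious about the two cases, here is a minimal sketch of how a listing could branch on the URL protocol. It is not the code from this pull request or from the gist; `EmbeddedListing` and `listFileNames` are hypothetical names, and only standard JDK classes are used.

```scala
import java.io.File
import java.net.{JarURLConnection, URL}

object EmbeddedListing {

  /** Names of the regular files directly under the directory that `url` points at.
    * `url` is assumed to come from something like getClass.getResource(path).
    */
  def listFileNames(url: URL): List[String] =
    url.getProtocol match {
      case "file" =>
        // Type 1: a plain directory on the class path
        val dir = new File(url.toURI)
        Option(dir.listFiles).map(_.toList).getOrElse(Nil)
          .filter(_.isFile)
          .map(_.getName)
          .sorted

      case "jar" =>
        // Type 2: a directory entry packaged inside a jar file
        val conn = url.openConnection().asInstanceOf[JarURLConnection]
        val prefix = Option(conn.getEntryName)
          .map(n => if (n.endsWith("/")) n else n + "/")
          .getOrElse("")
        // The JarFile may be cached by the JDK, so it is deliberately not closed here
        val entries = conn.getJarFile.entries()
        val names = List.newBuilder[String]
        while (entries.hasMoreElements) {
          val entry = entries.nextElement()
          val name = entry.getName
          // Keep only direct children of the prefix, skipping nested directories
          if (!entry.isDirectory && name.startsWith(prefix)) {
            val relative = name.drop(prefix.length)
            if (relative.nonEmpty && !relative.contains("/")) names += relative
          }
        }
        names.result().sorted

      case other =>
        throw new IllegalArgumentException(s"Unsupported protocol: $other")
    }
}
```

The jar branch has to scan every entry of the archive and filter by prefix, which is part of why it is more awkward than the plain-directory case.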

case _ => F.pure(RegistryError.NotFound.asLeft)
case Registry.Embedded(_, base) =>
  val path = toSubpath(base, vendor, name)
  Utils.unsafeEmbeddedList(path, model).pure[F]

Strictly speaking, this should probably be suspended instead of wrapped with pure, because the call reads the file system:

Sync[F].delay(Utils.unsafeEmbeddedList(path, model))
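To illustrate the difference, here is a small sketch using a toy effect type. `Thunk`, `unsafeList` and `reads` are hypothetical stand-ins, not cats-effect's implementation and not code from this pull request; the point is only that an argument to `pure` is evaluated before it is wrapped, while a by-name argument to `delay` is evaluated each time the effect is run.

```scala
object SuspendSketch {

  // Toy effect type standing in for F[_]; NOT how cats-effect is implemented.
  final class Thunk[A](val run: () => A)

  object Thunk {
    // Strict parameter: the caller has already evaluated `a` before this is called
    def pure[A](a: A): Thunk[A] = new Thunk(() => a)

    // By-name parameter: `a` is re-evaluated every time `run()` is invoked
    def delay[A](a: => A): Thunk[A] = new Thunk(() => a)
  }

  // Counts how often the "side-effecting" listing is executed
  var reads: Int = 0

  // Hypothetical stand-in for a side-effecting call such as reading a directory
  def unsafeList(): List[String] = {
    reads += 1
    List("1-0-0")
  }
}
```

With `pure`, the side effect happens while the value is being constructed, outside the control of the effect system; with `delay`, it happens only when the effect is actually run.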

def unsafeEmbeddedList(path: String, modelMatch: Int): Either[RegistryError, SchemaList] =
  try {
    val d = new File(getClass.getResource(path).getPath)
    val schemaFileRegex: Regex = (".*?" + // path to file

Could you make this line just a little bit more strict?

val schemaFileRegex: Regex = (".*?/schemas/" + // path to file

@@ -158,4 +196,5 @@ private[registries] object Utils {
private[resolver] def repoFailure(failure: Throwable): RegistryError =
RegistryError.RepoFailure(failure.getMessage)

implicit val orderingSchemaKey: Ordering[SchemaKey] = SchemaKey.ordering

You have defined this ordering but you are not using it in your implementation.

I proved this by making it private, and then the compiler told me: private val orderingSchemaKey in object Utils is never used

d.listFiles
  .filter(_.isFile)
  .toList
  .filter(_.getName.startsWith(modelMatch.toString))

This does not filter properly if modelMatch is 1 but the name of the file is 100-0-0.

I would remove this filter line, and instead put a check after the regex matcher a couple of lines below.
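A small sketch of the problem, assuming file names of the form model-revision-addition. `ModelFilterSketch`, `byPrefix`, `byParsedModel` and the `SchemaVer` pattern are hypothetical and are not the regex used in this pull request.

```scala
object ModelFilterSketch {

  // Hypothetical pattern for "model-revision-addition" file names
  private val SchemaVer = """(\d+)-(\d+)-(\d+)""".r

  // Prefix check: also keeps names whose model merely starts with the same digits
  def byPrefix(names: List[String], modelMatch: Int): List[String] =
    names.filter(_.startsWith(modelMatch.toString))

  // Parse first, then compare the extracted model as a whole
  def byParsedModel(names: List[String], modelMatch: Int): List[String] =
    names.filter {
      case SchemaVer(model, _, _) => model == modelMatch.toString
      case _                      => false
    }
}
```

For the names 1-0-0, 1-0-1, 100-0-0 and 2-0-0 with modelMatch set to 1, the prefix check keeps 100-0-0 as well, while the parsed check keeps only 1-0-0 and 1-0-1.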

Contributor Author

Good catch! It is a bit awkward to put the check there because there isn't a neutral SchemaKey element, so it would have to go inside another filter. The 100- case can be avoided with s"${modelMatch.toString}-"; see my new commit.

@voropaevp

getResource does not work with jar files; it has to be getResourceAsStream. It also has trouble with a subfolder: the argument has to be a file. That makes the implementation a lot more awkward than the gist referenced above.

I think it is too much effort to make it work for a very uncommon use case (I have never seen it used).

@voropaevp

ready for merge

@istreeter istreeter left a comment

Looks good!

content
  .traverse {
    case schemaFileRegex(vendor, name, format, model, revision, addition)
        if model == modelMatch.toString =>

I'm a big fan of cats triple equals === for some extra type safety.
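As a rough illustration of what that type safety buys, here is a sketch with a hand-rolled type class. `Eq`, `EqOps` and `looseEquals` are minimal hypothetical stand-ins written for this example; they are not the cats implementation, although cats' `===` follows the same idea of requiring both sides to have the same type.

```scala
object TypedEqualsSketch {

  // Minimal stand-in for a cats-style Eq type class (an assumption, not cats itself)
  trait Eq[A] {
    def eqv(x: A, y: A): Boolean
  }

  implicit val stringEq: Eq[String] = new Eq[String] {
    def eqv(x: String, y: String): Boolean = x == y
  }

  // Adds === to any type that has an Eq instance; both operands must share type A
  implicit class EqOps[A](x: A)(implicit ev: Eq[A]) {
    def ===(y: A): Boolean = ev.eqv(x, y)
  }

  // Universal equality accepts any two values, so a String/Int mix-up still compiles
  def looseEquals(a: Any, b: Any): Boolean = a == b

  // Typed comparison of a parsed model against the requested one.
  // Writing `model === modelMatch` here (String vs Int) would be rejected by the compiler.
  def sameModel(model: String, modelMatch: Int): Boolean =
    model === modelMatch.toString
}
```

With plain `==`, forgetting the `.toString` would compile and quietly evaluate to false; with `===`, the same mistake becomes a compile error.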

@voropaevp voropaevp merged commit 3ea1755 into develop Nov 21, 2022