-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds a validate cli command #151
Conversation
This is a very simple cli command that runs validate (for Items) or validate_all (for Collections and Catalogs). There is also an `--only` flag to only run validate for container objects. The output is a simple success message, or the exception message and source. This should hopefully help folks track down validation errors in existing STAC objects that can be hard to parse from exception output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great to me! (but just looking at the cli commands, I don't code python).
Sweet, thanks! I'm going to leave this open for now in case we wanted to add more superpowers (e.g. to help debug #124 and friends). Right now it's pretty naive (e.g. it probably won't give you pretty output if it can't read all linked children). |
Yeah, it'll help with my issues for sure. I'm sure I'll have more feedback as I use it, but seems like a good iterative step. |
Ok, just tried out the validate command and got an error that didn't show what line it encountered it with. It seems like it'd be ideal if the validate command always showed the exact line of the exact file where the link that generated the error came from.
(I think I can figure out the issue, as it's a pretty constrained catalog at this point, but in a more complex catalog I wouldn't be sure |
Right on, thanks for the feedback. I've added some additional logic to walk children and look for missing links. Output looks something like this: There isn't the exact file line (getting the exact line would be some additional lifting), but it gives you the file containing the link and the exact text of the link, which hopefully is easily findable. Is this helpful for your debugging scenario? |
Cool, that helps for some situations. Actual line numbers is likely not necessary in most situations. Would have helped with the last step, but that one was easy to figure out. Doesn't seem to grab my next particular one though (which feels pretty weird):
(I changed the path to just be a completely relative one, but it still seems to have problems with it, though maybe I wrote it wrong? But it gets past the 'validate' command. |
(I think there's also a problem with |
Roger. I've added items to the validate command (and made the output a bit prettier) so you (hopefully) can get something like this instead of a traceback: LMK if that helps. (as an aside, the colorification is with an eye towards #70, so we could e.g. do yellow for best practices, etc) |
This returns a lot of "self" link spam when testing on the data-files catalogs, but according to the spec these are bad links so I guess that's ok? Maybe we'll want to add a flag to quiet self flags later.
Ok, I've been using this a lot. Going to post a number of potential improvements and test cases where more information would help (or it may just be a bug). But it'd be great to get this into the next release, even in its current form, as it's definitely a helpful tool. The first suggestion is to add a 'check-links' option or something like that, which will actually follow all the asset hrefs (probably the link ones too) and tell you if they are actually valid locations. I'm hoping to catch typos and just when people think they're properly linking to the asset. I think this could just be warnings, that it's valid stac, but that the links aren't working. I put basically the same suggestion at https://github.com/sparkgeo/stac-validator as well. |
Ok, I've got a failure I'm stuck on, that doesn't have enough validation information for me to figure it out: unzip the catalog, then:
And then if I try a 'copy' I get:
I'm 95% sure that I got this collection by just using |
This returns a lot of "self" link spam when testing on the data-files catalogs, but according to the spec these are bad links so I guess that's ok? Maybe we'll want to add a flag to quiet self flags later.
I have the link version of this, but not the asset -- I'll add that. There's some |
However, this isn't doing recursive asset validation, so we'll need to rework the flow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! It worked on a couple files I tried.
I like that the link and asset following is optional.
Can you update the CHANGELOG before you merge?
@cuttlefish can you take a look at the codecov error when you get a chance? I don't really know what's going on there. |
@gadomski It looks like it was just a spurious runtime error. Merge away! |
This is a very simple cli command that runs validate (for Items) or validate_all (for Collections and Catalogs). There is also an
--only
flag to only run validate for container objects.The output is a simple success message, or the exception message and source. This should hopefully help folks track down validation errors in existing STAC objects that can be hard to parse from exception output.