Adds a validate cli command #151

Merged
merged 17 commits into from
Jul 27, 2021

Conversation

gadomski
Member

This is a very simple CLI command that runs `validate` (for Items) or `validate_all` (for Collections and Catalogs). There is also an `--only` flag to only run `validate` for container objects.

The output is a simple success message, or the exception message and source. This should hopefully help folks track down validation errors in existing STAC objects that can be hard to parse from exception output.
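
As a rough illustration of the dispatch described above (not the code from this PR), the sketch below calls `validate_all()` on container objects and `validate()` otherwise, then returns either a success message or the exception message and source. The `validate` and `validate_all` method names come from PySTAC; `run_validation`, the `only` parameter, and the stub classes are assumptions, introduced so the example runs without PySTAC installed.

```python
def run_validation(obj, href, only=False):
    """Validate one STAC-like object and return a short report string.

    `obj` is anything exposing `validate()` and, for containers,
    `validate_all()`. Hypothetical helper, not the stactools API.
    """
    try:
        if hasattr(obj, "validate_all") and not only:
            obj.validate_all()  # container: validate self and descendants
        else:
            obj.validate()  # item, or container with only=True
    except Exception as e:  # report the message instead of a traceback
        return f"{type(e).__name__}: {e} (source: {href})"
    return f"OK! STAC object at {href} is valid!"


class StubItem:
    """Stand-in for an Item whose validation fails."""

    def validate(self):
        raise ValueError("missing 'geometry'")


class StubCatalog:
    """Stand-in for a Catalog that is valid itself but has an invalid child."""

    def validate(self):
        pass  # the catalog itself is fine

    def validate_all(self):
        self.validate()
        StubItem().validate()


print(run_validation(StubCatalog(), "catalog.json", only=True))
print(run_validation(StubCatalog(), "catalog.json"))
```

With `only=True` the stub catalog passes because its child is never visited; without it, the child's failure surfaces as a one-line message rather than a traceback.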

@gadomski gadomski linked an issue Jun 23, 2021 that may be closed by this pull request
@gadomski gadomski added this to the v0.2.1 milestone Jun 23, 2021
@gadomski gadomski requested a review from cholmes June 24, 2021 15:24
Contributor

@cholmes cholmes left a comment

Looks great to me! (but just looking at the cli commands, I don't code python).

@gadomski
Member Author

Sweet, thanks! I'm going to leave this open for now in case we want to add more superpowers (e.g. to help debug #124 and friends). Right now it's pretty naive (e.g. it probably won't give you pretty output if it can't read all linked children).

@cholmes
Contributor

cholmes commented Jun 24, 2021

Yeah, it'll help with my issues for sure. I'm sure I'll have more feedback as I use it, but seems like a good iterative step.

@cholmes
Contributor

cholmes commented Jun 28, 2021

Ok, just tried out the validate command and got an error that didn't show which line it was encountered on. Ideally, the validate command would always show the exact line of the exact file containing the link that generated the error.

(venv) cholmes@C02Y3151JHD3 stactools-pete % ./scripts/stac validate /Users/cholmes/Repos/planet-orders/new-stacs/planet-stac/collection.json 
Traceback (most recent call last):
  File "/opt/salt/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/salt/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/cholmes/Repos/stactools-pete/venv/lib/python3.7/site-packages/stactools/cli/__main__.py", line 4, in <module>
    run_cli()
  File "/Users/cholmes/Repos/stactools-pete/venv/lib/python3.7/site-packages/stactools/cli/cli.py", line 40, in run_cli
    cli(prog_name='stac')
  File "/Users/cholmes/Repos/stactools-pete/venv/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/cholmes/Repos/stactools-pete/venv/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/cholmes/Repos/stactools-pete/venv/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/cholmes/Repos/stactools-pete/venv/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/cholmes/Repos/stactools-pete/venv/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/cholmes/Repos/stactools-pete/venv/lib/python3.7/site-packages/stactools/cli/commands/validate.py", line 24, in validate_command
    object.validate_all()
  File "/Users/cholmes/Repos/stactools-pete/venv/lib/python3.7/site-packages/pystac/catalog.py", line 777, in validate_all
    self.validate()
  File "/Users/cholmes/Repos/stactools-pete/venv/lib/python3.7/site-packages/pystac/stac_object.py", line 58, in validate
    return pystac.validation.validate(self)
  File "/Users/cholmes/Repos/stactools-pete/venv/lib/python3.7/site-packages/pystac/validation/__init__.py", line 33, in validate
    stac_dict=stac_object.to_dict(),
  File "/Users/cholmes/Repos/stactools-pete/venv/lib/python3.7/site-packages/pystac/collection.py", line 537, in to_dict
    d = super().to_dict(include_self_link)
  File "/Users/cholmes/Repos/stactools-pete/venv/lib/python3.7/site-packages/pystac/catalog.py", line 459, in to_dict
    "links": [link.to_dict() for link in links],
  File "/Users/cholmes/Repos/stactools-pete/venv/lib/python3.7/site-packages/pystac/catalog.py", line 459, in <listcomp>
    "links": [link.to_dict() for link in links],
  File "/Users/cholmes/Repos/stactools-pete/venv/lib/python3.7/site-packages/pystac/link.py", line 249, in to_dict
    d["href"] = self.get_href()
  File "/Users/cholmes/Repos/stactools-pete/venv/lib/python3.7/site-packages/pystac/link.py", line 121, in get_href
    if href and is_absolute_href(href) and self.owner and self.owner.get_root():
  File "/Users/cholmes/Repos/stactools-pete/venv/lib/python3.7/site-packages/pystac/stac_object.py", line 202, in get_root
    root_link.resolve_stac_object()
  File "/Users/cholmes/Repos/stactools-pete/venv/lib/python3.7/site-packages/pystac/link.py", line 213, in resolve_stac_object
    obj = stac_io.read_stac_object(target_href, root=root)
  File "/Users/cholmes/Repos/stactools-pete/venv/lib/python3.7/site-packages/pystac/stac_io.py", line 225, in read_stac_object
    d = self.read_json(source, *args, **kwargs)
  File "/Users/cholmes/Repos/stactools-pete/venv/lib/python3.7/site-packages/pystac/stac_io.py", line 196, in read_json
    txt = self.read_text(source, *args, **kwargs)
  File "/Users/cholmes/Repos/stactools-pete/venv/lib/python3.7/site-packages/pystac/stac_io.py", line 278, in read_text
    return self.read_text_from_href(href)
  File "/Users/cholmes/Repos/stactools-pete/venv/lib/python3.7/site-packages/stactools/core/io/__init__.py", line 24, in read_text_from_href
    with fsspec.open(href, "r") as f:
  File "/Users/cholmes/Repos/stactools-pete/venv/lib/python3.7/site-packages/fsspec/core.py", line 102, in __enter__
    f = self.fs.open(self.path, mode=mode)
  File "/Users/cholmes/Repos/stactools-pete/venv/lib/python3.7/site-packages/fsspec/spec.py", line 968, in open
    **kwargs,
  File "/Users/cholmes/Repos/stactools-pete/venv/lib/python3.7/site-packages/fsspec/implementations/local.py", line 144, in _open
    return LocalFileOpener(path, mode, fs=self, **kwargs)
  File "/Users/cholmes/Repos/stactools-pete/venv/lib/python3.7/site-packages/fsspec/implementations/local.py", line 235, in __init__
    self._open()
  File "/Users/cholmes/Repos/stactools-pete/venv/lib/python3.7/site-packages/fsspec/implementations/local.py", line 240, in _open
    self.f = open(self.path, mode=self.mode)
FileNotFoundError: [Errno 2] No such file or directory: '/Users/cholmes/Repos/planet-orders/new-stacs/planet-stac/planet-stac/collection.json'

(I think I can figure out the issue, as it's a pretty constrained catalog at this point, but in a more complex catalog I wouldn't be sure.)

@gadomski
Member Author

Right on, thanks for the feedback. I've added some additional logic to walk children and look for missing links. Output looks something like this:

[Image: Screen Shot 2021-06-28 at 11 34 50 AM]

There isn't the exact file line (getting the exact line would be some additional lifting), but it gives you the file containing the link and the exact text of the link, which hopefully is easily findable. Is this helpful for your debugging scenario?
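
To make the "walk children and look for missing links" idea concrete, here is a minimal standard-library sketch, not the logic added in this PR. It assumes local files only and follows `child` and `item` links; the function name `find_missing_links` and its return shape are hypothetical. For each broken link it reports the file containing the link and the exact href text, which is the information described above.

```python
import json
import os


def find_missing_links(path, seen=None):
    """Walk child/item links starting from a local STAC JSON file.

    Returns (containing_file, href_text) pairs for links whose target
    does not exist on disk. Remote hrefs are skipped in this sketch.
    """
    seen = set() if seen is None else seen
    path = os.path.abspath(path)
    if path in seen:  # guard against link cycles
        return []
    seen.add(path)

    with open(path) as f:
        doc = json.load(f)

    missing = []
    for link in doc.get("links", []):
        if link.get("rel") not in ("child", "item"):
            continue
        href = link.get("href", "")
        if "://" in href:
            continue  # remote hrefs are out of scope here
        # Relative hrefs resolve against the directory of the containing file.
        target = os.path.normpath(os.path.join(os.path.dirname(path), href))
        if os.path.isfile(target):
            missing.extend(find_missing_links(target, seen))
        else:
            missing.append((path, href))
    return missing
```

Calling `find_missing_links("collection.json")` returns an empty list when every child and item link resolves, so a caller could print each pair as "file: href" to point at the offending link text.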

@cholmes
Contributor

cholmes commented Jun 28, 2021

Cool, that helps for some situations. Actual line numbers are likely not necessary in most situations. They would have helped with the last step, but that one was easy to figure out. It doesn't seem to catch my next particular one though (which feels pretty weird):

% ./scripts/stac validate /Users/cholmes/Repos/planet-orders/new-stacs/planet-stac/collection.json
OK! STAC object at /Users/cholmes/Repos/planet-orders/new-stacs/planet-stac/collection.json is valid!
% stac move-assets collection.json
Traceback (most recent call last):
  File "/Users/cholmes/Repos/planet-orders/venv/bin/stac", line 8, in <module>
    sys.exit(run_cli())
  File "/Users/cholmes/Repos/planet-orders/venv/lib/python3.7/site-packages/stactools/cli/cli.py", line 40, in run_cli
    cli(prog_name='stac')
  File "/Users/cholmes/Repos/planet-orders/venv/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/cholmes/Repos/planet-orders/venv/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/cholmes/Repos/planet-orders/venv/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/cholmes/Repos/planet-orders/venv/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/cholmes/Repos/planet-orders/venv/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/cholmes/Repos/planet-orders/venv/lib/python3.7/site-packages/stactools/cli/commands/copy.py", line 37, in move_assets_command
    copy=copy)
  File "/Users/cholmes/Repos/planet-orders/venv/lib/python3.7/site-packages/stactools/core/copy.py", line 187, in move_all_assets
    for item in catalog.get_all_items():
  File "/Users/cholmes/Repos/planet-orders/venv/lib/python3.7/site-packages/pystac/catalog.py", line 437, in get_all_items
    yield from self.get_items()
  File "/Users/cholmes/Repos/planet-orders/venv/lib/python3.7/site-packages/pystac/stac_object.py", line 292, in get_stac_objects
    link.resolve_stac_object(root=self.get_root())
  File "/Users/cholmes/Repos/planet-orders/venv/lib/python3.7/site-packages/pystac/stac_object.py", line 202, in get_root
    root_link.resolve_stac_object()
  File "/Users/cholmes/Repos/planet-orders/venv/lib/python3.7/site-packages/pystac/link.py", line 213, in resolve_stac_object
    obj = stac_io.read_stac_object(target_href, root=root)
  File "/Users/cholmes/Repos/planet-orders/venv/lib/python3.7/site-packages/pystac/stac_io.py", line 172, in read_stac_object
    d = self.read_json(source)
  File "/Users/cholmes/Repos/planet-orders/venv/lib/python3.7/site-packages/pystac/stac_io.py", line 151, in read_json
    txt = self.read_text(source, *args, **kwargs)
  File "/Users/cholmes/Repos/planet-orders/venv/lib/python3.7/site-packages/pystac/stac_io.py", line 215, in read_text
    return self.read_text_from_href(href, *args, **kwargs)
  File "/Users/cholmes/Repos/planet-orders/venv/lib/python3.7/site-packages/stactools/core/io/__init__.py", line 24, in read_text_from_href
    with fsspec.open(href, "r") as f:
  File "/Users/cholmes/Repos/planet-orders/venv/lib/python3.7/site-packages/fsspec/core.py", line 102, in __enter__
    f = self.fs.open(self.path, mode=mode)
  File "/Users/cholmes/Repos/planet-orders/venv/lib/python3.7/site-packages/fsspec/spec.py", line 968, in open
    **kwargs,
  File "/Users/cholmes/Repos/planet-orders/venv/lib/python3.7/site-packages/fsspec/implementations/local.py", line 132, in _open
    return LocalFileOpener(path, mode, fs=self, **kwargs)
  File "/Users/cholmes/Repos/planet-orders/venv/lib/python3.7/site-packages/fsspec/implementations/local.py", line 220, in __init__
    self._open()
  File "/Users/cholmes/Repos/planet-orders/venv/lib/python3.7/site-packages/fsspec/implementations/local.py", line 225, in _open
    self.f = open(self.path, mode=self.mode)
FileNotFoundError: [Errno 2] No such file or directory: '/collection.json'

(I changed the path to just be a completely relative one, but it still seems to have problems with it, though maybe I wrote it wrong? But it gets past the 'validate' command.)

@cholmes
Contributor

cholmes commented Jun 28, 2021

(I think there's also a problem with merge, as that seemed to add on relative path prefixes, but I've got more work to do to make a clean bug report on it).

@gadomski
Member Author

> (I think there's also a problem with merge, as that seemed to add on relative path prefixes, but I've got more work to do to make a clean bug report on it).

Roger. I've added items to the validate command (and made the output a bit prettier) so you (hopefully) can get something like this instead of a traceback:

[Image: Screen Shot 2021-06-28 at 4 55 08 PM]

LMK if that helps.

(as an aside, the colorification is with an eye towards #70, so we could e.g. do yellow for best practices, etc)

This returns a lot of "self" link spam when testing on the data-files
catalogs, but according to the spec these are bad links so I guess
that's ok? Maybe we'll want to add a flag to quiet self flags later.
@cholmes
Contributor

cholmes commented Jul 19, 2021

Ok, I've been using this a lot. Going to post a number of potential improvements and test cases where more information would help (or it may just be a bug). But it'd be great to get this into the next release, even in its current form, as it's definitely a helpful tool.

The first suggestion is to add a 'check-links' option or something like that, which will actually follow all the asset hrefs (probably the link ones too) and tell you if they are actually valid locations. I'm hoping to catch typos and cases where people think they're properly linking to the asset but aren't. I think these could just be warnings: it's valid STAC, but the links aren't working. I put basically the same suggestion at https://github.com/sparkgeo/stac-validator as well.
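
One way the suggested asset check could look, as a minimal standard-library sketch rather than an existing stactools feature: it reads a single Item file, resolves each local asset href against the Item's directory, and collects warnings instead of raising. The name `check_asset_hrefs` and the warning format are assumptions, and remote hrefs are skipped since checking them would need a network request.

```python
import json
import os


def check_asset_hrefs(item_path):
    """Return warning strings for local asset hrefs that do not exist.

    Hypothetical helper: the Item can still be valid STAC even when
    this reports warnings.
    """
    with open(item_path) as f:
        item = json.load(f)

    base = os.path.dirname(os.path.abspath(item_path))
    warnings = []
    for key, asset in item.get("assets", {}).items():
        href = asset.get("href", "")
        if "://" in href:
            continue  # remote assets are not checked in this sketch
        # Relative hrefs resolve against the directory of the Item file.
        target = os.path.normpath(os.path.join(base, href))
        if not os.path.exists(target):
            warnings.append(
                f"WARNING: asset '{key}' in {item_path} "
                f"points to missing href '{href}'"
            )
    return warnings
```

Returning a list rather than raising keeps the "valid STAC, but the links aren't working" distinction: a caller can print the warnings and still exit successfully.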

@cholmes
Contributor

cholmes commented Jul 19, 2021

Ok, I've got a failure I'm stuck on, that doesn't have enough validation information for me to figure it out:

test-catalog.zip

unzip the catalog, then:

(venv) cholmes@c02y3151jhd3 test-catalog % stac validate collection.json 
FileNotFound error: [Errno 2] No such file or directory: '/collection.json'
Walking children to find location of missing link(s)...

And then if I try a 'copy' I get:

(venv) cholmes@c02y3151jhd3 test-catalog % stac copy collection.json test
Traceback (most recent call last):
  File "/Users/cholmes/Repos/stactools-pete/venv/bin/stac", line 8, in <module>
    sys.exit(run_cli())
  File "/Users/cholmes/Repos/stactools-pete/venv/lib/python3.7/site-packages/stactools/cli/cli.py", line 40, in run_cli
    cli(prog_name='stac')
  File "/Users/cholmes/Repos/stactools-pete/venv/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/cholmes/Repos/stactools-pete/venv/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/cholmes/Repos/stactools-pete/venv/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/cholmes/Repos/stactools-pete/venv/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/cholmes/Repos/stactools-pete/venv/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/cholmes/Repos/stactools-pete/venv/lib/python3.7/site-packages/stactools/cli/commands/copy.py", line 67, in copy_command
    copy_catalog(source_catalog, dst, catalog_type, copy_assets)
  File "/Users/cholmes/Repos/stactools-pete/venv/lib/python3.7/site-packages/stactools/core/copy.py", line 198, in copy_catalog
    catalog = source_catalog.full_copy()
  File "/Users/cholmes/Repos/stactools-pete/venv/lib/python3.7/site-packages/pystac/collection.py", line 764, in full_copy
    return cast(Collection, super().full_copy(root, parent))
  File "/Users/cholmes/Repos/stactools-pete/venv/lib/python3.7/site-packages/pystac/catalog.py", line 972, in full_copy
    return cast(Catalog, super().full_copy(root, parent))
  File "/Users/cholmes/Repos/stactools-pete/venv/lib/python3.7/site-packages/pystac/stac_object.py", line 373, in full_copy
    link.resolve_stac_object()
  File "/Users/cholmes/Repos/stactools-pete/venv/lib/python3.7/site-packages/pystac/link.py", line 219, in resolve_stac_object
    obj = stac_io.read_stac_object(target_href, root=root)
  File "/Users/cholmes/Repos/stactools-pete/venv/lib/python3.7/site-packages/pystac/stac_io.py", line 224, in read_stac_object
    d = self.read_json(source, *args, **kwargs)
  File "/Users/cholmes/Repos/stactools-pete/venv/lib/python3.7/site-packages/pystac/stac_io.py", line 195, in read_json
    txt = self.read_text(source, *args, **kwargs)
  File "/Users/cholmes/Repos/stactools-pete/venv/lib/python3.7/site-packages/pystac/stac_io.py", line 277, in read_text
    return self.read_text_from_href(href)
  File "/Users/cholmes/Repos/stactools-pete/venv/lib/python3.7/site-packages/stactools/core/io/__init__.py", line 24, in read_text_from_href
    with fsspec.open(href, "r") as f:
  File "/Users/cholmes/Repos/stactools-pete/venv/lib/python3.7/site-packages/fsspec/core.py", line 102, in __enter__
    f = self.fs.open(self.path, mode=mode)
  File "/Users/cholmes/Repos/stactools-pete/venv/lib/python3.7/site-packages/fsspec/spec.py", line 968, in open
    **kwargs,
  File "/Users/cholmes/Repos/stactools-pete/venv/lib/python3.7/site-packages/fsspec/implementations/local.py", line 144, in _open
    return LocalFileOpener(path, mode, fs=self, **kwargs)
  File "/Users/cholmes/Repos/stactools-pete/venv/lib/python3.7/site-packages/fsspec/implementations/local.py", line 235, in __init__
    self._open()
  File "/Users/cholmes/Repos/stactools-pete/venv/lib/python3.7/site-packages/fsspec/implementations/local.py", line 240, in _open
    self.f = open(self.path, mode=self.mode)
FileNotFoundError: [Errno 2] No such file or directory: '/202012_223832_ssc4_u0001/20201211_223832_ssc4_u0001.json'

I'm 95% sure that I got this collection by just using stac planet convert order and then a merge.

gadomski and others added 4 commits July 20, 2021 06:24
@gadomski
Member Author

> The first suggestion is to add a 'check-links' option or something like that, which will actually follow all the asset hrefs (probably the link ones too) and tell you if they are actually valid locations.

I have the link version of this, but not the asset -- I'll add that.

There's some self link resolution that's funny in PySTAC v1.0.0, so once PySTAC v1.0.1 is released (containing stac-utils/pystac#574) I think this PR will be ready to merge, and then we can add features in subsequent PRs.

However, this isn't doing recursive asset validation, so we'll need to
rework the flow.
@gadomski gadomski requested review from cuttlefish and removed request for matthewhanson July 26, 2021 19:53
Collaborator

@cuttlefish cuttlefish left a comment

Looks good! It worked on a couple files I tried.
I like that the link and asset following is optional.
Can you update the CHANGELOG before you merge?

@gadomski
Member Author

@cuttlefish can you take a look at the codecov error when you get a chance? I don't really know what's going on there.

@cuttlefish
Collaborator

> @cuttlefish can you take a look at the codecov error when you get a chance? I don't really know what's going on there.

@gadomski It looks like it was just a spurious runtime error. Merge away!

@cuttlefish cuttlefish merged commit c278e73 into stac-utils:main Jul 27, 2021
@gadomski gadomski deleted the feature/stac-validate branch July 27, 2021 17:29
@gadomski gadomski mentioned this pull request Feb 14, 2022
Labels
enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

Display the name of the file where a validation error occurred.
3 participants