Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data Docs build does not build suite pages if being run via a checkpoint and does not upload assets #1944

Closed
Aylr opened this issue Oct 1, 2020 · 7 comments · Fixed by #1990
Assignees

Comments

@Aylr
Copy link
Contributor

Aylr commented Oct 1, 2020

Describe the bug
Data Docs build does not build suite pages if using a GCS bucket

To Reproduce
Steps to reproduce the behavior:

  1. Create a number of suites
  2. Set up a GCS site
  3. great_expectations docs build
  4. Note a GCS site with an index page only - no other files or assets.

Expected behavior
I expect a full datadocs site

Environment (please complete the following information):

  • OS: [e.g. iOS] macOS
  • GE Version: [e.g. 0.10.0] 0.12.3
@eugmandel eugmandel added the triage Used by the GE core team to flag issues that were not yet triaged label Oct 2, 2020
@Aylr
Copy link
Contributor Author

Aylr commented Oct 2, 2020

Data_Docs_created_by_Great_Expectations

@anthonyburdi anthonyburdi self-assigned this Oct 8, 2020
@anthonyburdi
Copy link
Member

FYI while trying to reproduce this: currently great_expectations docs build seems to work as expected, however this issue does crop up when using great_expectations checkpoint run ...

@anthonyburdi
Copy link
Member

anthonyburdi commented Oct 8, 2020

FYI this issue was reproduced on s3 & Filesystem as well when using great_expectations checkpoint run <checkpoint_name>

@Aylr Aylr changed the title Data Docs build does not build suite pages if using a GCS bucket, and does not upload assets Data Docs build does not build suite pages if being run via a checkpoint and does not upload assets Oct 8, 2020
@eugmandel eugmandel added core-team-priority and removed triage Used by the GE core team to flag issues that were not yet triaged labels Oct 13, 2020
@anthonyburdi
Copy link
Member

anthonyburdi commented Oct 15, 2020

Quick update @Aylr @eugmandel - I have a proposed method to fix this in branch anthony/bugfix/issue_1944 and am adding tests before submitting a PR.
The proposed fix is to check for the existence of a data docs site (via a method on HtmlSiteStore to check for an existing index.html) and if it does not exist, recreating the data docs site from scratch during an SiteBuilder.build() by removing the resource_identifiers. This will keep existing functionality as-is, i.e. running a checkpoint for an existing data docs site will only add a page for the new validation and fix the linked issue by rebuilding data docs in full if they don't exist.
Feedback / other ideas welcome!

@Aylr
Copy link
Contributor Author

Aylr commented Oct 15, 2020

A note of mild warning - I vaguely recall the original intent of the resource identifier being passed was to build only the new page rather than the very slow rebuild of the entire site. If a user has many suites (or few large ones), or many validation results this can take quite some time to do.

@anthonyburdi
Copy link
Member

Makes perfect sense! Just to be clear - the resource_identifier is only removed if there is no existing site, in which case you would likely want to rebuild. However noted that it could take a very long time, and a user may not have set a validation_results_limit.

Maybe there is a middle ground where if there is no existing site during a checkpoint run, we just make sure the index / expectation / static sections of the site are built (in addition to the given resource identifier for the validation). I'm looking into DefaultSiteSectionBuilder.build() to see if we can force a build of the expectations pages if the site does not exist and the given resource_identifier is not an ExpectationSuiteIdentifier (assuming that in this case a user would just want to render that single expectation suite).

@anthonyburdi
Copy link
Member

@Aylr FYI - I changed the functionality to the following: All expectation suites are always rendered unless resource_identifiers contains ExpectationSuiteIdentifier(s). If resource_identifiers contains ExpectationSuiteIdentifier(s) then only those are rendered. PR is now ready for review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment