Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUGFIX] Allows GE to work when installed in a zip file (PEP 273). Fixes issue #3772 #3798

Merged

Conversation

joseignaciorc
Copy link
Contributor

@joseignaciorc joseignaciorc commented Dec 5, 2021

About this PR

This PR closes the issue #3772.

Changes proposed in this pull request:

  • html_site_store.py has been modified. Now when GE is installed in a zip file, the static files are extracted (i.e. unziped) before they are used, so that the context.build_data_docs() function works as expected instead of failing with a FileNotFoundError or NotADirectoryError
  • test-pep273-compatibility.yml has been created. A GA (Github Action) that verifies that the context.build_data_docs() function works when GE is installed in a directory and also in a zip. This verification is done by trying to run the simple_build_data_docs.py (which also created in this PR)

How these changes were tested:

IMPORTANT:
The tests on test-pep273-compatibility are set to run on every push to the repo. The test is lightweight, but I understand that a Github Action may or may not the best way to test this change, so I am open to discuss changes in the GA and also in the code.

Definition of Done

Please delete options that are not relevant.

  • My code follows the Great Expectations style guide
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added unit tests where applicable and made sure that new and existing tests are passing.
  • I have run any local integration tests and made sure that nothing is broken.

This commit adds a Github Action that checks automatically if issue 3772 ('NotADirectoryError occurs when GE is installed in a zip file and build_data_docs() is called') happens.
The issue itself will be fixed in a another commit
…PEP 273)

This commit fixes issue 3772 ('NotADirectoryError occurs when GE is installed in a zip file and build_data_docs() is called')
@netlify
Copy link

netlify bot commented Dec 5, 2021

👷 Deploy request for niobium-lead-7998 pending review.
Visit the deploys page to approve it

🔨 Explore the source changes: fc64df1

@joseignaciorc joseignaciorc changed the title Bugfix/issue3772 pep273 [BUGFIX] Allows GE to work when installed in a zip file (PEP 273). Fixes issue #3772 Dec 6, 2021
@joseignaciorc joseignaciorc marked this pull request as ready for review December 6, 2021 00:30
@cdkini
Copy link
Member

cdkini commented Dec 7, 2021

Hey @joseignaciorc thanks so much for contributing!

Let me confer internally to determine how best to navigate running this in CI/CD. I'll bring it up during our next engineering meeting and get back to you!

Copy link
Member

@cdkini cdkini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments but the core logic looks great! Still conferring internally about best practices around testing.

Comment on lines 392 to 416
# If `static_assets_source_absdir` contains the string ".zip/" (unix) or ".zip\" (win)
# then we check if Great Expectations has been installed inside a zip file (see PEP 273)
# If so, we need to extract all the static files and run this function again
static_assets_source_absdir = os.path.abspath(static_assets_source_dir)
zip_re = re.match(
f"(.+[.]zip){re.escape(os.sep)}(.+)",
static_assets_source_absdir,
flags=re.IGNORECASE,
)

if zip_re:
zip_filename = zip_re.groups()[0] # e.g.: /home/joe/libs/my_python_libs.zip
path_in_zip = zip_re.groups()[1] # great_expectations/render/view/static
unzip_destination = tempfile.mkdtemp()

if is_zipfile(zip_filename):
with ZipFile(zip_filename) as zipfile:
static_files_to_extract = [
file
for file in zipfile.namelist()
if file.startswith(path_in_zip)
]
zipfile.extractall(unzip_destination, static_files_to_extract)
return self.copy_static_assets(unzip_destination)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this logic is very specific to zip-files, could we extract the logic into a separate method? Maybe something like this:

def _handle_zipped_static_assets(self) -> bool:
       # If `static_assets_source_absdir` contains the string ".zip/" (unix) or ".zip\" (win)
        # then we check if Great Expectations has been installed inside a zip file (see PEP 273)
        # If so, we need to extract all the static files and run this function again
        static_assets_source_absdir = os.path.abspath(static_assets_source_dir)
        zip_re = re.match(
            f"(.+[.]zip){re.escape(os.sep)}(.+)",
            static_assets_source_absdir,
            flags=re.IGNORECASE,
        )

        if zip_re:
            zip_filename = zip_re.groups()[0]  # e.g.: /home/joe/libs/my_python_libs.zip
            path_in_zip = zip_re.groups()[1]  # great_expectations/render/view/static
            unzip_destination = tempfile.mkdtemp()

            if is_zipfile(zip_filename):
                with ZipFile(zip_filename) as zipfile:
                    static_files_to_extract = [
                        file
                        for file in zipfile.namelist()
                        if file.startswith(path_in_zip)
                    ]
                    zipfile.extractall(unzip_destination, static_files_to_extract)
                self.copy_static_assets(unzip_destination)
                return True
       return False

That way, we can isolate the zip-specific code and simply invoke it in copy_static_assets

unzipped = self._handle_zipped_static_files()
if unzipped:
    return

Thoughts?

Comment on lines 10 to 12
)

print(f"Great Expectations location: {ge.__file__}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add a quick blurb that explains the purpose of this test?

"""
What does this test and why?

Ensures compatibility with PEP-273: https://www.python.org/dev/peps/pep-0273/
Covers the use-case where GE is installed in a zip file and build_data_docs() is called.
Any other relevant details you may deem relevant!
"""

@talagluck talagluck added community triage Used by the GE core team to flag issues that were not yet triaged labels Dec 7, 2021
@cdkini cdkini added devrel This item is being addressed by the Developer Relations Team and removed triage Used by the GE core team to flag issues that were not yet triaged labels Dec 8, 2021
@cdkini
Copy link
Member

cdkini commented Dec 9, 2021

I just added your test to our integration test matrix; I think that'll probably be the way to go here just so we don't test on each change!

* A new function `_unzip_assets()` is created, to handle the unzipping
* The test in `simple_build_data_docs.py` is now documented
@joseignaciorc
Copy link
Contributor Author

I have added a new commit with the changes you commented, @cdkini

Extracting the zip is now done in its own function (_unzip_assets()) and the test in simple_build_data_docs.py is now documented

@cdkini cdkini self-requested a review December 13, 2021 15:32
@cdkini cdkini enabled auto-merge (squash) December 13, 2021 15:33
@cdkini cdkini merged commit 777d5f1 into great-expectations:develop Dec 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
community devrel This item is being addressed by the Developer Relations Team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants