
Fixes #14215: Add missing decode stage to gz/zip files in json ingestion reader. #14375

Merged
merged 5 commits into open-metadata:main on Dec 14, 2023

Conversation

CKristensen
Contributor

@CKristensen CKristensen commented Dec 13, 2023

Describe your changes:

Fixes #14215

The decode stage was missing from the storage reader for gz/zip JSON files.

I worked on fixing the S3 reader for .json.gz files because it was giving me an error:
[2023-12-04 10:05:23] ERROR {metadata.Utils:datalake_utils:75} - Error fetching file [bucket-name/path/tofolder/part-00062-sda-c000.json.gz] using [S3Config] due to: [Error reading dataframe due to [a bytes-like object is required, not 'str']]
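For context, the message in this log is Python's standard `TypeError` for mixing `bytes` and `str`. The minimal reproduction below assumes (this PR does not confirm it) that the reader applies `str`-based processing, such as splitting on `"\n"`, to the decompressed payload; the sample payload is made up for illustration:

```python
import gzip

# Made-up stand-in for a downloaded part-*.json.gz object body.
compressed = gzip.compress(b'{"id": 1}\n{"id": 2}\n')

# gzip.decompress returns bytes, not str.
decompressed = gzip.decompress(compressed)

error_message = ""
try:
    # Splitting bytes on a str separator is a classic source of this TypeError.
    decompressed.split("\n")
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # a bytes-like object is required, not 'str'

# With a decode stage in between, str-based processing succeeds.
lines = [line for line in decompressed.decode("utf-8").split("\n") if line]
print(lines)  # ['{"id": 1}', '{"id": 2}']
```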

When I dug into the code, I noticed that gz and zip are already supported, but there is a bug in the implementation:
the decode stage was missing for files that were zip/gz.
So I added it.
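A minimal sketch of the decompress-then-decode order described above, assuming the object key's suffix identifies the compression and the content is UTF-8 newline-delimited JSON. `decompress_and_decode` and `parse_json_lines` are hypothetical names for illustration, not OpenMetadata's actual reader functions:

```python
import gzip
import io
import json
import zipfile
from typing import Any, List


def decompress_and_decode(key: str, raw: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical helper: unpack a .gz/.zip object body, then decode it to str."""
    data = raw
    if key.endswith(".gz"):
        data = gzip.decompress(raw)
    elif key.endswith(".zip"):
        with zipfile.ZipFile(io.BytesIO(raw)) as archive:
            # Assumption: the archive holds one JSON member, so read the first entry.
            data = archive.read(archive.namelist()[0])
    # Decode stage: turn bytes into str so str-based parsing works afterwards.
    return data.decode(encoding)


def parse_json_lines(text: str) -> List[Any]:
    """Parse newline-delimited JSON, skipping blank lines."""
    return [json.loads(line) for line in text.split("\n") if line.strip()]


if __name__ == "__main__":
    body = gzip.compress(b'{"id": 1}\n{"id": 2}\n')
    rows = parse_json_lines(decompress_and_decode("part-00062.json.gz", body))
    print(rows)  # [{'id': 1}, {'id': 2}]
```

The point of the ordering is that `gzip.decompress` and `ZipFile.read` both return `bytes`, so decoding has to happen after decompression, not before.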

Type of change:

  • Bug fix

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

Carl Kristensen added 2 commits December 13, 2023 13:55
Files that were zip/gz were not being decoded.
This led to an error when reading them.

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

@CKristensen CKristensen changed the title Fixes 14215: Add decode missing decode stage to gz/zip files in json ingestion reader. Fixes 14215: Add missing decode stage to gz/zip files in json ingestion reader. Dec 13, 2023

@CKristensen CKristensen marked this pull request as ready for review December 13, 2023 13:18
@CKristensen CKristensen requested a review from a team as a code owner December 13, 2023 13:18
@TeddyCr TeddyCr added the safe to test label Dec 13, 2023

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

@CKristensen CKristensen changed the title Fixes 14215: Add missing decode stage to gz/zip files in json ingestion reader. Fixes #14215: Add missing decode stage to gz/zip files in json ingestion reader. Dec 14, 2023

sonarcloud bot commented Dec 14, 2023

Quality Gate passed for 'open-metadata-ingestion'

Kudos, no new issues were introduced!

0 New issues
0 Security Hotspots
40.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

@pmbrull pmbrull merged commit 74df616 into open-metadata:main Dec 14, 2023
15 checks passed
MrVinegar pushed a commit to MrVinegar/OpenMetadata that referenced this pull request Dec 15, 2023
…n json ingestion reader. (open-metadata#14375)

* add decoding stage to gz/zip files.

Files that were zip/gz were not being decoded.
This led to an error when reading them.

* remove unnecessary comment

---------

Co-authored-by: Carl Kristensen <carl.johan.coelho.kristensen@schibsted.com>
Labels
Ingestion, safe to test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for compressed types in s3/datalake connector
3 participants