Why bq client create dataset in data loader? #5197

Closed
jx2lee opened this issue Jun 18, 2024 · 3 comments · Fixed by #5218

Comments


jx2lee commented Jun 18, 2024

My understanding is that a data loader is for fetching remote data or local files.

__write_table, which is called in the BigQuery export.__process function, contains a line where the client creates the dataset. If the data loader is used to export to an existing dataset, I think the dataset creation can be left out. (I haven't seen any other warehouse IO that creates a BigQuery dataset-level resource. If I missed one, please share the link.)

To summarize,

  • Can bigquery:L374 be deleted, unless there's a reason not to? I haven't found any similar issues. If there is a reason why the bq client should create a dataset, please let me know 🙏.
  • If you accept this, I'd like to contribute. Can I create a PR myself (if that line is not needed)?

jx2lee commented Jun 18, 2024

I found that other warehouses use a create_? parameter (where ? is the resource at the same level as a BigQuery dataset)!
In Redshift, a create_schema parameter exists (default true).

Why not add a create_dataset parameter to bigquery.export?

wangxiaoyou1993 (Member) commented

You can add the create_dataset param to the bigquery.export method. Feel free to create a PR for it.


jx2lee commented Jun 26, 2024

@wangxiaoyou1993 Could you review #5218, which my colleague created to fix this issue?

wangxiaoyou1993 linked a pull request Jul 9, 2024 that will close this issue
6 tasks
wangxiaoyou1993 pushed a commit that referenced this issue Jul 9, 2024
# Description
#5197

- Added a `create_dataset` parameter to the BigQuery `export` method to
allow users to create a dataset if desired.
- Previously, the `__write_table` method always attempted to create a
dataset.
- Now, dataset creation is controlled by the `create_dataset` parameter,
which defaults to `False`.

# How Has This Been Tested?
- [x] Tested locally and created test cases to ensure the new parameter
works as expected.
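
A test for the new parameter might look something like the sketch below. It is not the test suite from the PR: `write_table` is a hypothetical stand-in for the export path, `load_rows` is an invented method name, and the client is a `unittest.mock.MagicMock`, so the test only checks which calls are made.

```python
from unittest.mock import MagicMock


def write_table(client, table_id, rows, create_dataset=False):
    """Hypothetical stand-in for the export path under test."""
    if create_dataset:
        client.create_dataset(table_id.rsplit(".", 1)[0])
    client.load_rows(table_id, rows)  # load_rows is an assumed name


def test_dataset_not_created_by_default():
    client = MagicMock()
    write_table(client, "proj.ds.t", [{"a": 1}])
    client.create_dataset.assert_not_called()
    client.load_rows.assert_called_once_with("proj.ds.t", [{"a": 1}])


def test_dataset_created_when_requested():
    client = MagicMock()
    write_table(client, "proj.ds.t", [{"a": 1}], create_dataset=True)
    client.create_dataset.assert_called_once_with("proj.ds")
    client.load_rows.assert_called_once_with("proj.ds.t", [{"a": 1}])
```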


# Checklist
- [ ] The PR is tagged with proper labels (bug, enhancement, feature,
documentation)
- [x] I have performed a self-review of my own code
- [x] I have added unit tests that prove my fix is effective or that my
feature works
- [ ] I have commented my code, particularly in hard-to-understand areas
- [x] I have made corresponding changes to the documentation

cc: @wangxiaoyou1993 @jx2lee