
File client user guide #1060

Merged
merged 222 commits into ploomber:master on Jan 12, 2023

Conversation

neelasha23 (Contributor) opened this pull request:

Describe your changes

Issue ticket number and link

Closes #701

Checklist before requesting a review

  • I have performed a self-review of my code
  • I have added thorough tests (when necessary).
  • I have added the right documentation (when needed). Product update? If yes, write one line about this update.

@idomic idomic requested a review from edublancas January 4, 2023 19:25
idomic (Contributor) commented Jan 4, 2023:

Looks better already, just make sure the CI is passing before review.

neelasha23 (Contributor, Author) replied:

> Looks better already, just make sure the CI is passing before review.

I created the branch from the latest master branch but I see this test failing: tests/io_mod/test_terminalwriter.py::test_code_highlight[with markup and code_highlight]
Is this an intermittent issue?

@idomic @edublancas

edublancas (Contributor) left a comment:

Am I right to assume that these docs mostly gather docs we already had in other places? (That's fine, I'm just wondering where this came from, and whether we should remove the other references to prevent duplication.)

doc/user-guide/file_clients.rst (outdated, resolved)

> The file clients only upload products generated by the pipeline. If you want to work with an external dataset which is in the cloud please follow the below approach:
>
> * Sql script that runs a CREATE TABLE statement to copy a public dataset.
edublancas (Contributor):

I'm a bit confused about this section. I think this is documenting what to do when the dataset we want to work with is in a different place, right? But I'm not following how that connects to this list.

neelasha23 (Contributor, Author):

This was mentioned in the original issue: "Another important concept to explain is how to bring external datasets that live in the cloud already since File clients only upload products generated by the pipeline."

edublancas (Contributor):

Ok, so let's remove these three points since they're confusing and replace them with:

The file clients only upload products generated by the pipeline. If you want to work with an external dataset, you should download it in the pipeline task that uses it as input. If you need help, contact us (add link to our slack).
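The suggested wording above could be illustrated with a short sketch. This is a hypothetical example, not Ploomber's API: the function name, the `product`/`source_url` parameters, and the use of `urllib.request.urlretrieve` are all assumptions; it only shows the idea of a pipeline task downloading the external dataset it needs into its own product path.

```python
from pathlib import Path
from urllib.request import urlretrieve


def download_dataset(product, source_url):
    """Hypothetical pipeline task: fetch an external dataset.

    File clients only upload products generated by the pipeline,
    so a dataset that already lives elsewhere (e.g., in the cloud)
    is downloaded by the task that consumes it, or by a dedicated
    upstream task whose product downstream tasks use as input.
    """
    product = Path(product)
    # ensure the product's parent directory exists before writing
    product.parent.mkdir(parents=True, exist_ok=True)
    # download the remote file into the product path
    urlretrieve(source_url, str(product))
```

A downstream task would then declare this task as upstream and read the downloaded file like any other product.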

edublancas (Contributor):

> I created the branch from the latest master branch but I see this test failing: tests/io_mod/test_terminalwriter.py::test_code_highlight[with markup and code_highlight]
> Is this an intermittent issue?

Yeah, I saw that in an earlier CI run; we can ignore it.

edublancas (Contributor):

Update: I found out why the CI failed and fixed it. Please rebase.

Wxl19980214 and others added 24 commits January 5, 2023 16:35
* Added an Algolia Crawler Section to doc

* Update contributing.md
* add --backend to report command

* update --backend help msg

* Add initial report command tests

* add backend arg to dag.plot

* try using `plot.choose_backend(backend)`

* add `dot -c` to win CIs

* update help msg and remove backend in cli msg

* remove backend assertion

* address feedback
…loomber#825)

* added and tested issue ploomber#805

* fix tepo issues based on Ido's feedback

* pass test

* add psycopg2-binary back

* fix bugs to pass test

* delete unused files

* deleted fileclient/.gitignore

* removed the commented debug

* deleted the blank line at end of file

* no message

* fixed bugs with format checked

* add one test

* checked format with flake8

* removed one test

* fixed mock called twice error

* fixed test_error_unknown_example

* fixed format issues

* solved format issues

* fixed format issues

* fixed CI errors

* removed debug code

* avoid creation of multiple example managers and add more test cases

* add `_suggest_example` and revert arg order change

* refactor: move file read to constructor

* use try-except to bypass exceptions

* print error instead of raising exception

* Update newline breaks

Co-authored-by: shuyang <94rain@msn.com>
Co-authored-by: shuyang <21193371+94rain@users.noreply.github.com>
* handle exception in `jupyter.manager.load_dag`

* Add `test_load_dag_exception_handling` test

* ensure `log.exception.call_args_list` is expected
 (ploomber#920)

* Fixed confusing error when passing something.html as source ploomber#646

* Failing test fixed

* Error message fixed

* _looks_like_file_name check added after import fails

* typo fixed. test fixed.

Co-authored-by: yafim <ifim.vo@gmail.com>
…mber#927)

* start_method added to parallel executor

* tests added to executor with valid start_method values


idomic (Contributor) commented Jan 9, 2023:

@neelasha23 Just a comment: I think instead of rebasing you're doing a merge of the repo. When you do that:

  1. We get tons of unrelated commits into this PR.
  2. When reviewing (for instance, your last PR), it adds tons of unrelated files which aren't relevant and makes the process unnecessarily hard.

Please avoid merging like that so we stay consistent with the rest of the PRs.
Also, please address the review comments so we can mark them as done.

@idomic idomic merged commit f5a2f0e into ploomber:master Jan 12, 2023
Successfully merging this pull request may close these issues.

add a user guide on File clients