
File client user guide #1060

Merged
merged 222 commits into ploomber:master on Jan 12, 2023

Conversation

neelasha23 (Contributor) opened this pull request:

Describe your changes

Issue ticket number and link

Closes #701

Checklist before requesting a review

  • I have performed a self-review of my code
  • I have added thorough tests (when necessary).
  • I have added the right documentation (when needed). Product update? If yes, write one line about this update.

@idomic idomic requested a review from edublancas January 4, 2023 19:25
idomic (Contributor) commented Jan 4, 2023:

Looks better already, just make sure the CI is passing before review.

neelasha23 (Contributor, Author) replied:

> Looks better already, just make sure the CI is passing before review.

I created the branch from the latest master branch but I see this test failing: tests/io_mod/test_terminalwriter.py::test_code_highlight[with markup and code_highlight]
Is this an intermittent issue?

@idomic @edublancas

edublancas (Contributor) left a comment:

Am I right to assume that these docs mostly gather docs we already had in other places? (That's fine, I'm just wondering where this came from, and whether we should remove the other references to prevent duplication.)

doc/user-guide/file_clients.rst (outdated, resolved)

> The file clients only upload products generated by the pipeline. If you want to work with an external dataset which is in the cloud please follow the below approach:
>
> * Sql script that runs a CREATE TABLE statement to copy a public dataset.
edublancas (Contributor):

I'm a bit confused about this section. I think this is documenting what to do when the dataset we want to work with is in a different place, right? But I'm not following how that connects to this list.

neelasha23 (Contributor, Author):

This was mentioned in the original issue: "Another important concept to explain is how to bring external datasets that live in the cloud already since File clients only upload products generated by the pipeline."

edublancas (Contributor):

Ok, so let's remove these three points since they're confusing and replace them with:

The file clients only upload products generated by the pipeline. If you want to work with an external dataset, you should download it in the pipeline task that uses it as input. If you need help, contact us (add link to our slack).
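The suggested wording above could be illustrated with a short sketch. This is a hypothetical example, not Ploomber's API: the function name, the `product`/`source_url` parameters, and the use of `urllib.request.urlretrieve` are all assumptions; it only shows the idea of a pipeline task downloading the external dataset it needs into its own product path.

```python
from pathlib import Path
from urllib.request import urlretrieve


def download_dataset(product, source_url):
    """Hypothetical pipeline task: fetch an external dataset.

    File clients only upload products generated by the pipeline,
    so a dataset that already lives elsewhere (e.g., in the cloud)
    is downloaded by the task that consumes it, or by a dedicated
    upstream task whose product downstream tasks use as input.
    """
    product = Path(product)
    # ensure the product's parent directory exists before writing
    product.parent.mkdir(parents=True, exist_ok=True)
    # download the remote file into the product path
    urlretrieve(source_url, str(product))
```

A downstream task would then declare this task as upstream and read the downloaded file like any other product.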

edublancas (Contributor):

> I created the branch from the latest master branch but I see this test failing: tests/io_mod/test_terminalwriter.py::test_code_highlight[with markup and code_highlight]
> Is this an intermittent issue?

Yeah, I saw that in an earlier CI run; we can ignore it.

edublancas (Contributor):

Update: I found out why the CI failed and fixed it. Please rebase.

Wxl19980214 and others added 24 commits January 5, 2023 16:35
* Added an Algolia Crawler Section to doc

* Update contributing.md
* add --backend to report command

* update --backend help msg

* Add initial report command tests

* add backend arg to dag.plot

* try using `plot.choose_backend(backend)`

* add `dot -c` to win CIs

* update help msg and remove backend in cli msg

* remove backend assertion

* address feedback
…loomber#825)

* added and tested issue ploomber#805

* fix tepo issues based on Ido's feedback

* pass test

* add psycopg2-binary back

* fix bugs to pass test

* delete unused files

* deleted fileclient/.gitignore

* removed the commented debug

* deleted the blank line at end of file

* no message

* fixed bugs with format checked

* add one test

* checked format with flake8

* removed one test

* fixed mock called twice error

* fixed test_error_unknown_example

* fixed format issues

* solved format issues

* fixed format issues

* fixed CI errors

* removed debug code

* avoid creation of multiple example managers and add more test cases

* add `_suggest_example` and revert arg order change

* refactor: move file read to constructor

* use try-except to bypass exceptions

* print error instead of raising exception

* Update newline breaks

Co-authored-by: shuyang <94rain@msn.com>
Co-authored-by: shuyang <21193371+94rain@users.noreply.github.com>
* handle exception in `jupyter.manager.load_dag`

* Add `test_load_dag_exception_handling` test

* ensure `log.exception.call_args_list` is expected
 (ploomber#920)

* Fixed confusing error when passing something.html as source ploomber#646

* Failing test fixed

* Error message fixed

* _looks_like_file_name check added after import fails

* typo fixed. test fixed.

Co-authored-by: yafim <ifim.vo@gmail.com>
…mber#927)

* start_method added to parallel executor

* tests added to executor with valid start_method values


idomic (Contributor) commented Jan 9, 2023:

@neelasha23 Just a comment: I think instead of rebasing you're doing a merge of the repo. When you do that:

  1. We get tons of unrelated commits into this PR.
  2. When reviewing (for instance, your last PR), it adds tons of unrelated files which aren't relevant and makes the process unnecessarily hard.

Please avoid merging like that so we stay consistent with the rest of the PRs.
Also, please address the review comments so we can mark them as done.

@idomic idomic merged commit f5a2f0e into ploomber:master Jan 12, 2023
Successfully merging this pull request may close these issues.

add a user guide on File clients