Add documentation about correct usage of configure_project #3707

Merged 6 commits on Jul 9, 2024
4 changes: 3 additions & 1 deletion RELEASE.md
@@ -53,7 +53,8 @@ Many thanks to the following Kedroids for contributing PRs to this release:
## Documentation changes
* Updated the documentation for deploying a Kedro project with Astronomer Airflow.
* Used `kedro-sphinx-theme` for documentation.

* Added notes on the correct usage of `configure_project` with `multiprocessing`.
*
# Release 0.19.4

## Major features and improvements
@@ -82,6 +83,7 @@ Many thanks to the following Kedroids for contributing PRs to this release:
* Added documentation on best practices for testing nodes and pipelines.
* Clarified docs around using custom resolvers without a full Kedro project.


## Community contributions
Many thanks to the following Kedroids for contributing PRs to this release:
* [ondrejzacha](https://github.com/ondrejzacha)
5 changes: 5 additions & 0 deletions docs/source/kedro_project_setup/session.md
@@ -63,3 +63,8 @@
### `configure_project`

This function reads `settings.py` and `pipeline_registry.py` and registers the configuration before Kedro's run starts. If you have a packaged Kedro project, you only need to run `configure_project` before executing your pipeline.

#### ValueError: Package name not found

[vale] docs/source/kedro_project_setup/session.md#L67 (WARNING): [Kedro.headings] 'ValueError: Package name not found' should use sentence-style capitalization.

> ValueError: Package name not found. Make sure you have configured the project using 'bootstrap_project'. This should happen automatically if you are using Kedro command line interface.

[vale] docs/source/kedro_project_setup/session.md#L68 (WARNING): [Kedro.Spellings] Did you really mean 'bootstrap_project'?

Take care when you use `multiprocessing`. The default [start method](https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods) depends on your operating system. With the `spawn` start method, Python re-imports all the modules in each new process, so the configuration set in the parent process is not carried over and you need to run `configure_project` again at the start of the new process. For example, this is how Kedro handles it in [`ParallelRunner`](https://github.com/kedro-org/kedro/blob/9e883e6a0ba40e3db4497b234dcb3801258e8396/kedro/runner/parallel_runner.py#L84-L85).
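
The following is a minimal sketch of why re-configuration is needed under `spawn`. It is not Kedro's implementation. It assumes a stand-in function, `fake_configure_project`, that only records a package name in module-level state, which is roughly the kind of state a spawned process starts without. The names `fake_configure_project`, `report_package_name`, `run_in_spawned_worker` and `my_package` are hypothetical. In a real project the worker initializer would call `configure_project` from `kedro.framework.project` instead.

```python
import multiprocessing as mp

# Stand-in for the module-level state that `configure_project` sets up.
# In a real project the worker would instead call
# `kedro.framework.project.configure_project("<your_package>")`.
_PACKAGE_NAME = None


def fake_configure_project(package_name):
    """Hypothetical stand-in: record the package name in module state."""
    global _PACKAGE_NAME
    _PACKAGE_NAME = package_name


def report_package_name(_task):
    """Return the package name configured in the current process."""
    return _PACKAGE_NAME


def run_in_spawned_worker(package_name, reconfigure):
    """Ask one spawned worker which package name it sees."""
    ctx = mp.get_context("spawn")
    # The initializer runs once at the start of each worker process.
    initializer = fake_configure_project if reconfigure else None
    initargs = (package_name,) if reconfigure else ()
    with ctx.Pool(1, initializer=initializer, initargs=initargs) as pool:
        return pool.map(report_package_name, [0])
```

When this is run from a script under an `if __name__ == "__main__":` guard, after calling `fake_configure_project("my_package")` in the parent, `run_in_spawned_worker("my_package", reconfigure=False)` should return `[None]`, because the spawned worker re-imports the module and starts from the unconfigured state. With `reconfigure=True` the initializer repeats the configuration in the worker, so the call should return `["my_package"]`.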
noklam marked this conversation as resolved.