Fix: raise soft ulimit for open files in Python runtime startup#3612
Fix: raise soft ulimit for open files in Python runtime startup#3612Ankitsinghsisodya wants to merge 4 commits intoknative:mainfrom
Conversation
Platforms with a low default soft limit (e.g. 1024) caused Python functions to fail under load. This matches the behaviour already present in the Go and Java runtimes by raising the soft limit to the hard limit at middleware startup. Fixes knative#3513
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: Ankitsinghsisodya The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Hi @Ankitsinghsisodya. Thanks for your PR. I'm waiting for a knative member to verify that this patch is reasonable to test. If it is, they should reply with Tip We noticed you've done this a few times! Consider joining the org to skip this step and gain Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
There was a problem hiding this comment.
Pull request overview
This PR updates the generated Python runtime startup glue code to raise the soft open-file (RLIMIT_NOFILE) limit up to the hard limit during middleware startup, aligning behavior with other runtimes and preventing “too many open files” failures under load on low-default platforms.
Changes:
- Add RLIMIT_NOFILE inspection/adjustment at startup for instanced HTTP Python functions.
- Add the same RLIMIT_NOFILE inspection/adjustment at startup for instanced CloudEvents Python functions.
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| templates/python/scaffolding/instanced-http/service/main.py | Raises soft NOFILE limit to hard limit during startup for HTTP scaffolding entrypoint. |
| templates/python/scaffolding/instanced-cloudevents/service/main.py | Raises soft NOFILE limit to hard limit during startup for CloudEvents scaffolding entrypoint. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| import resource | ||
| from func_python.http import serve | ||
|
|
||
| logging.basicConfig(level=logging.INFO) | ||
|
|
||
| # Raise the soft limit for open files to match the hard limit. | ||
| # Platforms such as some container runtimes default the soft limit to 1024, | ||
| # which causes failures under load. Go and Java runtimes do this | ||
| # automatically; we replicate the behaviour here. | ||
| try: | ||
| _soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE) | ||
| if _soft < _hard: | ||
| resource.setrlimit(resource.RLIMIT_NOFILE, (_hard, _hard)) | ||
| logging.info("Raised open-file limit from %d to %d", _soft, _hard) | ||
| except Exception as e: | ||
| logging.warning("Could not raise open-file limit: %s", e) |
There was a problem hiding this comment.
resource is not available on Windows, so importing it at module import time will raise ImportError and prevent local runs (and any Windows-based tooling) from starting. Consider wrapping the import resource in a try/except ImportError (or importing inside the existing try: block) and skipping the ulimit adjustment on unsupported platforms.
| import resource | |
| from func_python.http import serve | |
| logging.basicConfig(level=logging.INFO) | |
| # Raise the soft limit for open files to match the hard limit. | |
| # Platforms such as some container runtimes default the soft limit to 1024, | |
| # which causes failures under load. Go and Java runtimes do this | |
| # automatically; we replicate the behaviour here. | |
| try: | |
| _soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE) | |
| if _soft < _hard: | |
| resource.setrlimit(resource.RLIMIT_NOFILE, (_hard, _hard)) | |
| logging.info("Raised open-file limit from %d to %d", _soft, _hard) | |
| except Exception as e: | |
| logging.warning("Could not raise open-file limit: %s", e) | |
| from func_python.http import serve | |
| try: | |
| import resource | |
| except ImportError: | |
| resource = None | |
| logging.basicConfig(level=logging.INFO) | |
| # Raise the soft limit for open files to match the hard limit. | |
| # Platforms such as some container runtimes default the soft limit to 1024, | |
| # which causes failures under load. Go and Java runtimes do this | |
| # automatically; we replicate the behaviour here. | |
| if resource is not None: | |
| try: | |
| _soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE) | |
| if _soft < _hard: | |
| resource.setrlimit(resource.RLIMIT_NOFILE, (_hard, _hard)) | |
| logging.info("Raised open-file limit from %d to %d", _soft, _hard) | |
| except Exception as e: | |
| logging.warning("Could not raise open-file limit: %s", e) |
| import resource | ||
| from func_python.cloudevent import serve | ||
|
|
||
| logging.basicConfig(level=logging.INFO) | ||
|
|
||
| # Raise the soft limit for open files to match the hard limit. | ||
| # Platforms such as some container runtimes default the soft limit to 1024, | ||
| # which causes failures under load. Go and Java runtimes do this | ||
| # automatically; we replicate the behaviour here. | ||
| try: | ||
| _soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE) | ||
| if _soft < _hard: | ||
| resource.setrlimit(resource.RLIMIT_NOFILE, (_hard, _hard)) | ||
| logging.info("Raised open-file limit from %d to %d", _soft, _hard) | ||
| except Exception as e: | ||
| logging.warning("Could not raise open-file limit: %s", e) |
There was a problem hiding this comment.
resource is not available on Windows, so importing it at module import time will raise ImportError and prevent local runs (and any Windows-based tooling) from starting. Consider wrapping the import resource in a try/except ImportError (or importing inside the existing try: block) and skipping the ulimit adjustment on unsupported platforms.
| import resource | |
| from func_python.cloudevent import serve | |
| logging.basicConfig(level=logging.INFO) | |
| # Raise the soft limit for open files to match the hard limit. | |
| # Platforms such as some container runtimes default the soft limit to 1024, | |
| # which causes failures under load. Go and Java runtimes do this | |
| # automatically; we replicate the behaviour here. | |
| try: | |
| _soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE) | |
| if _soft < _hard: | |
| resource.setrlimit(resource.RLIMIT_NOFILE, (_hard, _hard)) | |
| logging.info("Raised open-file limit from %d to %d", _soft, _hard) | |
| except Exception as e: | |
| logging.warning("Could not raise open-file limit: %s", e) | |
| from func_python.cloudevent import serve | |
| try: | |
| import resource | |
| except ImportError: | |
| resource = None # type: ignore[assignment] | |
| logging.basicConfig(level=logging.INFO) | |
| # Raise the soft limit for open files to match the hard limit. | |
| # Platforms such as some container runtimes default the soft limit to 1024, | |
| # which causes failures under load. Go and Java runtimes do this | |
| # automatically; we replicate the behaviour here. | |
| if resource is not None: | |
| try: | |
| _soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE) | |
| if _soft < _hard: | |
| resource.setrlimit(resource.RLIMIT_NOFILE, (_hard, _hard)) | |
| logging.info("Raised open-file limit from %d to %d", _soft, _hard) | |
| except Exception as e: | |
| logging.warning("Could not raise open-file limit: %s", e) | |
| else: | |
| logging.info("Open-file limit adjustment is unavailable on this platform") |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #3612 +/- ##
==========================================
+ Coverage 56.26% 56.28% +0.02%
==========================================
Files 180 180
Lines 20522 20543 +21
==========================================
+ Hits 11546 11563 +17
- Misses 7774 7778 +4
Partials 1202 1202
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Platforms with a low default soft limit (e.g. 1024) caused Python functions to fail under load. This matches the behaviour already present in the Go and Java runtimes by raising the soft limit to the hard limit at middleware startup. The logic lives in a dedicated _ulimit.py helper so it is testable in isolation and is not duplicated between the HTTP and CloudEvents scaffolding variants. Error handling distinguishes ImportError (non-Unix platforms where the resource module is absent) from ValueError/OSError (bad values or permission failures), so neither case silently masks the original limit. Five unit tests are added for each scaffolding variant and hack/test-python.sh is updated to run them. Fixes knative#3513
- Cap target at _MAX_NOFILE (65536) when hard == RLIM_INFINITY instead of passing RLIM_INFINITY to setrlimit, which raises OSError on most kernels - Move _configure_ulimit() inside the if __name__ == "__main__" guard so importing the module does not mutate system limits as a side effect - Add RLIM_INFINITY test case (the most common production scenario) - Set mock RLIM_INFINITY to 9223372036854775807 (actual Linux value) so the soft < hard guard is not accidentally short-circuited Addresses review feedback on knative#3513
pytest left .pytest_cache and __pycache__ directories inside the templates tree after local test runs. The embedded FS generator picked them up, causing check-embedded-fs to fail in CI. - Add .gitignore to both scaffolding roots to exclude these artifacts - Update hack/test-python.sh to rm -rf .pytest_cache and __pycache__ after each scaffolding test run - Regenerate zz_filesystem_generated.go with clean templates
|
Closing in favour of the correct upstream fix. The fix belongs in the middleware library, not in the scaffolding templates. Scaffolding is baked into container images at build time, so this PR would only help functions that are rebuilt after it merges — already-deployed functions would get nothing. The correct fix is in knative-extensions/func-python#83, where it lives inside serve() and benefits all deployed functions on the next library version bump. A follow-up PR to knative/func will be raised to bump the pinned func-python version in the scaffolding pyproject.toml once that PR merges and a release is cut. |
Changes
/kind bug
Fixes #3513
Release Note
Docs