Fix temp directory permission issue on worker side #5684
Conversation
@@ -503,6 +503,7 @@ def get_or_create_tmp_dir():
         os.makedirs(tmp_dir, exist_ok=True)
     else:
         tmp_dir = tempfile.mkdtemp()
+        os.chmod(tmp_dir, 0o777)
do we need the workers to have write authority over this location? Could we do 0o755 instead?
Never mind, I just checked the docs on os.makedirs() and the default mode there is 0o777, while tempfile.mkdtemp sets 0o700 by default. This looks good to me!
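For reference, here is a minimal standalone sketch of the default-mode behavior discussed above; the paths are illustrative and the printed modes assume a typical umask of 022:

import os
import stat
import tempfile

# os.makedirs defaults to mode=0o777, but the effective permissions are
# reduced by the process umask (with umask 022 the result is 0o755).
made_dir = os.path.join(tempfile.gettempdir(), "demo_makedirs")
os.makedirs(made_dir, exist_ok=True)
print(oct(stat.S_IMODE(os.stat(made_dir).st_mode)))   # e.g. 0o755

# tempfile.mkdtemp always creates the directory with mode 0o700,
# so only the creating user can access it.
tmp_dir = tempfile.mkdtemp()
print(oct(stat.S_IMODE(os.stat(tmp_dir).st_mode)))    # 0o700

# Widening the mode, as this PR does, lets other users (e.g. Spark
# worker processes) access the directory.
os.chmod(tmp_dir, 0o777)
print(oct(stat.S_IMODE(os.stat(tmp_dir).st_mode)))    # 0o777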
LGTM!
Let me add some comments before merging.
@@ -526,6 +529,9 @@ def get_or_create_nfs_tmp_dir():
         os.makedirs(tmp_nfs_dir, exist_ok=True)
     else:
         tmp_nfs_dir = tempfile.mkdtemp(dir=nfs_root_dir)
+        # mkdtemp creates a directory with permission 0o700
+        # change it to be 0o777 to ensure it can be seen in spark UDF
Does tmp_nfs_dir have to be writable in spark UDF?
Because "shutils" makedirs also use "0o777", I keep it the same with it
So it has to be writable in spark UDF?
I don't think it strictly needs to be writeable, but this was the preexisting behavior for the Spark model cache. To avoid regressions, I think it's best to use the same permissions level.
Makes sense! I asked this because "can be seen" sounds like read permission should be enough.
LGTM! Thanks @WeichenXu123!
@WeichenXu123 Before merging, can we resolve the os.makedirs() permission exceptions thrown when operating on Table ACL clusters?
# Test whether the NFS directory is writable.
test_path = os.path.join(nfs_root_dir, uuid.uuid4().hex)
Do we remove this line before the merge?
I think I understood what we're trying to do.
We'll keep this line because we're now testing whether or not we can write to NFS by attempting to create a directory at test_path
Can we use os.access(nfs_root_dir, os.W_OK)?
ref: https://stackoverflow.com/a/2113511/6943581
Or does this approach not necessarily guarantee that nfs_root_dir is writable?
I think we should try to write, as is done in the current PR. The best way to test for write access is to try to write.
From the comments of https://stackoverflow.com/a/2113511/6943581:
Testing a directory for just the write bit isn't enough if you want to write files to the directory. You will need to test for the execute bit as well if you want to write into the directory: os.access('/path/to/folder', os.W_OK | os.X_OK). With os.W_OK by itself you can only delete the directory (and only if that directory is empty).
If we're not careful, we might make a mistake.
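To make that concrete, here is a rough sketch of the try-to-write probe approach discussed above; the function name and structure are illustrative, not the exact PR code:

import os
import shutil
import uuid

def get_writable_nfs_dir(nfs_root_dir):
    # Probe write access by actually creating a directory, rather than
    # relying on os.access(), which only inspects the permission bits.
    test_path = os.path.join(nfs_root_dir, uuid.uuid4().hex)
    try:
        os.makedirs(test_path)
    except OSError:
        # Creating a subdirectory failed, so treat NFS as unavailable.
        return None
    finally:
        # ignore_errors=True swallows the FileNotFoundError raised when
        # test_path was never created.
        shutil.rmtree(test_path, ignore_errors=True)
    return nfs_root_dir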
Sounds good!
LGTM! Thanks a bunch, @WeichenXu123!
    # directory, in this case, return None representing NFS is not available.
    return None
finally:
    shutil.rmtree(test_path, ignore_errors=True)
Does this line fail when test_path doesn't exist, or does ignore_errors=True swallow the exception?
Good catch! Fortunately, it swallows:
In [11]: shutil.rmtree("/tmp/asfsadfsdf", ignore_errors=True)
In [12]: shutil.rmtree("/tmp/asfsadfsdf")
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<ipython-input-12-8bc109b15d91> in <module>
----> 1 shutil.rmtree("/tmp/asfsadfsdf")
~/opt/anaconda3/lib/python3.8/shutil.py in rmtree(path, ignore_errors, onerror)
704 orig_st = os.lstat(path)
705 except Exception:
--> 706 onerror(os.lstat, path, sys.exc_info())
707 return
708 try:
~/opt/anaconda3/lib/python3.8/shutil.py in rmtree(path, ignore_errors, onerror)
702 # lstat()/open()/fstat() trick.
703 try:
--> 704 orig_st = os.lstat(path)
705 except Exception:
706 onerror(os.lstat, path, sys.exc_info())
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/asfsadfsdf'
Confirmed this line doesn't fail.
* update Signed-off-by: Weichen Xu <weichen.xu@databricks.com> * update Signed-off-by: Weichen Xu <weichen.xu@databricks.com> * update Signed-off-by: Weichen Xu <weichen.xu@databricks.com> * update Signed-off-by: Weichen Xu <weichen.xu@databricks.com> * update Signed-off-by: Weichen Xu <weichen.xu@databricks.com> (cherry picked from commit bd8854f) Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
Signed-off-by: Weichen Xu weichen.xu@databricks.com
What changes are proposed in this pull request?
(Please fill in changes proposed in this fix)
How is this patch tested?
(Details)
Does this PR change the documentation?
Check the ci/circleci: build_doc check. If it's successful, proceed to the next step, otherwise fix it.
Click Details on the right to open the job page of CircleCI.
Open the Artifacts tab.
Open docs/build/html/index.html.
Release Notes
Is this a user-facing change?
(Details in 1-2 sentences. You can just refer to another PR with a description if this PR is part of a larger change.)
What component(s), interfaces, languages, and integrations does this PR affect?
Components
area/artifacts: Artifact stores and artifact logging
area/build: Build and test infrastructure for MLflow
area/docs: MLflow documentation pages
area/examples: Example code
area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
area/models: MLmodel format, model serialization/deserialization, flavors
area/projects: MLproject format, project running backends
area/scoring: MLflow Model server, model deployment tools, Spark UDFs
area/server-infra: MLflow Tracking server backend
area/tracking: Tracking Service, tracking client APIs, autologging
Interface
area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
area/windows: Windows support
Language
language/r: R APIs and clients
language/java: Java APIs and clients
language/new: Proposals for new client languages
Integrations
integrations/azure: Azure and Azure ML integrations
integrations/sagemaker: SageMaker integrations
integrations/databricks: Databricks integrations
How should the PR be classified in the release notes? Choose one:
rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
rn/feature - A new user-facing feature worth mentioning in the release notes
rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
rn/documentation - A user-facing documentation change worth mentioning in the release notes