-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change the default deployment layout #799
Conversation
Attached issue: https://pulp.plan.io/issues/7178 |
This is the result of playing around with it and previous experience with another Django project. @bmbouter @mikedep333 please have a look @jlsherrill @ehelms @wbclark you may also find this interesting. If this is agreed on, it'll also need to be reflected in pulpcore-selinux. I don't mind making that patch, but first there should be agreement. |
a9502fb
to
8a1e0ae
Compare
@ekohl I disagree with part of your design. It seems to contradict previous designs, including security (group access is good. World access is insecure.) |
8a1e0ae
to
451bb22
Compare
pulpcore/app/settings.py
Outdated
@@ -28,19 +27,20 @@ | |||
|
|||
ALLOWED_HOSTS = ["*"] | |||
|
|||
MEDIA_ROOT = "/var/lib/pulp/" | |||
DEPLOY_ROOT = Path("/var/lib/pulp") | |||
MEDIA_ROOT = DEPLOY_ROOT / "media" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Offline discussion was to change this to artifact
(which already exists) and then add a migration that updates the database.
MEDIA_ROOT = DEPLOY_ROOT / "media" | |
MEDIA_ROOT = DEPLOY_ROOT / "artifact" |
That means the following code needs updates:
pulpcore/pulpcore/app/models/fields.py
Lines 47 to 66 in 09d05da
already_in_place = file.name in [ | |
artifact_storage_path, | |
os.path.join(settings.MEDIA_ROOT, artifact_storage_path), | |
] | |
is_in_artifact_storage = file.name.startswith(os.path.join(settings.MEDIA_ROOT, "artifact")) | |
if not already_in_place and is_in_artifact_storage: | |
raise ValueError( | |
_( | |
"The file referenced by the Artifact is already present in " | |
"Artifact storage. Files must be stored outside this location " | |
"prior to Artifact creation." | |
) | |
) | |
move = file._committed and file.name != artifact_storage_path | |
if move: | |
if not already_in_place: | |
file._file = TemporaryDownloadedFile(open(file.name, "rb")) | |
file._committed = False |
Before proceeding with this, I'd like to get some more feedback if this is really desired because it looks like the entire ArtifactFileField.pre_save
becomes redundant then.
Another approach would be to use a migration that looks at all models with a FileField
. Then if it doesn't exist, look a directory higher and move it into MEDIA_ROOT
https://github.com/django-extensions/django-extensions/blob/d747e09885e90d4c5f702f0bb4f0ea42e63f4eb0/django_extensions/management/commands/unreferenced_files.py#L27-L43 can be the basis of that.
451bb22
to
16ae3f5
Compare
Updated to at least use |
Moving to WIP due to inactivity. |
For what it's worth: I was waiting on a reply from maintainers on how to proceed @ #799 (comment) |
Summarizing a discussion with @mikedep333 and @dkliban. Keep me honest if I missed anything The easiest migration plan is to keep the To support this, the following steps should be taken:
The migration script is as follows: #!/usr/bin/env python3
from pathlib import Path
DEPLOY_ROOT = Path('/var/lib/pulp')
LEGACY_DIR = DEPLOY_ROOT / 'artifact'
NEW_MEDIA_ROOT = DEPLOY_ROOT / 'media'
DESTINATION = NEW_MEDIA_ROOT / LEGACY_DIR.name
if LEGACY_DIR.is_dir() and not LEGACY_DIR.is_symlink():
if DESTINATION.is_dir():
raise SystemExit(f'Target {DESTINATION} already exists')
NEW_MEDIA_ROOT.mkdir(exist_ok=True)
LEGACY_DIR.rename(DESTINATION)
LEGACY_DIR.symlink_to(DESTINATION) This means that the files are available at both locations at the same time. There is a tiny window between the After all services use the new settings, nothing should be using the symlink anymore and can be cleaned up. |
The installer team reviewed this, and we are good with the design. |
Excellent, I'll try to find time to get Travis green on this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm excited about this PR. To merge it we'll also need updates to this page in the docs please: https://github.com/pulp/pulpcore/blob/master/docs/settings.rst It documents the various settings, so the new ones should be documented there please.
To merge this, we'll also need a corresponding PR for pulp_installer that does the symlink. EDIT: We will just tell pulp_installer users to stop the services before upgrading, instead of the symlink. We will still perform the move though. |
28a901f
to
a829b3d
Compare
I'm converting it back to a draft since right now I'm just working on getting the build to pass. Once I figure that out, I'll look at docs.
Can I ask the pulp_installer maintainers to work on that? Every time I have to write Ansible I get frustrated and generally avoid touching it altogether if possible. |
a829b3d
to
169971f
Compare
Ah, it turns out Django's storage system only gained support for |
0ce38eb
to
dea594b
Compare
I think this is now ready for review. |
if media_root_dev and media_root_dev != upload_temp_dir_dev: | ||
warnings.append( | ||
CheckWarning( | ||
"MEDIA_ROOT and FILE_UPLOAD_TEMP_DIR are on different filesystems. " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not 100% sure if there are file moves between these two. That was the reason that they should be on the same FS, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I filed an installer story to go along with this change. https://pulp.plan.io/issues/8011. @ekohl Can you please take a look at it and provide feedback?
@ekohl It doesn't look like we need to make any changes to the pulpcore-selinux repository, is my understanding correct?
I summarized the steps, including a minimal migration script. I'd recommend to at least link that in the issue, but possibly copy over some info.
pulp/pulpcore-selinux@d7305fb#diff-c0c5c879ce2e9f03d675c8a4efe177ed5f2ee6a8ff41e4a15bc7500d16ae32ef did that already. |
Thank you! I thought it would be easier to not do a move but simply use a hardlink. However, I have now learned that you can't hardlink a directory. So we will have the installer perform the operations in the script you provided.
|
@@ -14,6 +14,6 @@ class Migration(migrations.Migration): | |||
migrations.AlterField( | |||
model_name='upload', | |||
name='file', | |||
field=models.FileField(max_length=255, storage=django.core.files.storage.FileSystemStorage(location='/var/lib/pulp/upload/'), upload_to=''), | |||
field=models.FileField(max_length=255, storage=django.core.files.storage.FileSystemStorage(location='/var/lib/pulp/upload'), upload_to=''), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this change intentional? It's in a migration that has already been released
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. The default for it has changed and if I didn't modify this, Django insisted on a new migration. Since the default is at runtime only (default is not part of the database) I thought it was safe to do so. Given the user may even change the setting, it probably can't even live in the database.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this change! Once we merge this we should also merge the installer PR.
@@ -32,21 +32,22 @@ | |||
|
|||
ALLOWED_HOSTS = ["*"] | |||
|
|||
MEDIA_ROOT = "/var/lib/pulp/" | |||
DEPLOY_ROOT = Path("/var/lib/pulp") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a net-new setting for Pulp? Or is setting and existing Django setting? We need docs added for it here: https://docs.pulpproject.org/pulpcore/settings.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is new but note that it doesn't affect anything if you change it. Since dynaconf are loaded after setting these variables. That means that if a user sets DEPLOY_ROOT
in dynaconf, you would have to set all variables that use DEPLOY_ROOT
after dynaconf. That can be done if you want of course.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bmbouter before I start documenting, could you weigh in if this should actually do something? Documenting a setting that doesn't change anything feels useless.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the questions (and this effort in general).
There's no need to document settings that can't be used. Let me split my answer in two parts:
-
I had imagined
DEPLOY_ROOT
could be set by the user via dynaconf and if so, it would cause all of the variables that use it to also change. If that's not possible and users would just need to set N variables to effectively change/var/lib/pulp/
then I don't think there is anything to document. -
This does change the default value of at least
MEDIA_ROOT
thought right? That default should be updated in the docs if so.
Let me know if there is more I should respond with. I'm excited about this change, ty!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1. I had imagined `DEPLOY_ROOT` could be set by the user via dynaconf and if so, it would cause all of the variables that use it to also change. If that's not possible and users would just need to set N variables to effectively change `/var/lib/pulp/` then I don't think there is anything to document.
I don't think this is true. Though I didn't test it, I can't see how it would work.
from pathlib import Path
DEPLOY_ROOT = Path('/var/lib/pulp')
MEDIA_PATH = DEPLOY_ROOT / 'media'
assert id(DEPLOY_ROOT) != id(MEDIA_ROOT)
assert repr(MEDIA_PATH) == "PosixPath('/var/lib/pulp/media')"
Here MEDIA_PATH
is a new instance of Path
. If you set a new value for DEPLOY_ROOT
via dynaconf, it won't modify anything that copied its value.
Perhaps dynaconf can do something with lazy settings (https://www.dynaconf.com/envvars/#lazy-values) suggests it can do something like that), I have no idea how that would work.
That made me think that perhaps it should be _DEPLOY_ROOT
but if you can use lazy values, perhaps it becomes useful again. I'm looking for advice here since I don't know dynaconf at all.
2\. This does change the default value of at least `MEDIA_ROOT` thought right? That default should be updated in the docs if so.
It does indeed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's ok, I don't think we need "one var to reroot many other vars". This PR didn't have that goal, and I won't want to introduce additional goals. What you've already got looks great.
I think just documenting the change for the defualt of MEDIA_ROOT
is all we need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think just documenting the change for the defualt of
MEDIA_ROOT
is all we need.
FYI: that's already done here:
I also took the opportunity to fix the SELinux context type.
FILE_UPLOAD_TEMP_DIR = os.path.join(MEDIA_ROOT, "tmp/") | ||
WORKING_DIRECTORY = os.path.join(MEDIA_ROOT, "tmp/") | ||
CHUNKED_UPLOAD_DIR = os.path.join(MEDIA_ROOT, "upload/") | ||
FILE_UPLOAD_TEMP_DIR = DEPLOY_ROOT / "tmp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also the docs https://docs.pulpproject.org/pulpcore/settings.html probably need updates to identify their defaults being tied up with DEPLOY_ROOT
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a Pulp only setting that's being added with this PR. +1 on adding it to the docs. I've added a reference to it in the installer docs[0].
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Requesting some docs updates please.
Also does this need any changes to the pulpcore-selinux fc files? https://github.com/pulp/pulpcore-selinux/blob/master/pulpcore.fc |
No, see #799 (comment) |
#799 (comment) summarizes the migration. https://github.com/theforeman/foreman-installer/blob/39003aaed97cb51285761463bdb84cf8bf0cdbca/hooks/pre/34-pulpcore_directory_layout.rb is what foreman-installer uses. pulp/pulp_installer#510 is still open for Pulp itself. |
FYI: I opened #1074 which extracted a small part of this PR to cherry pick as a regression fix in 3.9. For this reason I this PR on it. |
e44ec57
to
857aa97
Compare
This changes the default deployment layout. The main change is that MEDIA_ROOT gets its own directory. This allows limiting the file permissions in a shared Pulp 2 + Pulp 3 deployment and the SELinux file contexts. Another benefit is compatibility with django_extensions' unreferenced_files command which lists all files in MEDIA_ROOT that are not in the database. Other paths are kept on the same absolute paths, but the diff looks bigger because they used derive from MEDIA_ROOT. The documentation is updated to show the latest best practices. fixes #7178
857aa97
to
d5252ed
Compare
Rebased now that a805312 has been merged. I've dropped the database migration changes because of it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for everything you've done for this! I really appreciate both the great contribution and patience to get it right.
I'm glad we got there. |
This changes the default deployment layout. The main change is that MEDIA_ROOT gets its own directory. This allows limiting the file permissions in a shared Pulp 2 + Pulp 3 deployment and the SELinux file contexts. Another benefit is compatibility with django_extensions' unreferenced_files command which lists all files in MEDIA_ROOT that are not in the database.
Other paths are kept on the same absolute paths, but the diff looks bigger because they used derive from MEDIA_ROOT.
The documentation is updated to show the latest best practices.
fixes #7178