Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change the default deployment layout #799

Merged
merged 1 commit into from
Feb 2, 2021

Conversation

ekohl
Copy link
Contributor

@ekohl ekohl commented Jul 21, 2020

This changes the default deployment layout. The main change is that MEDIA_ROOT gets its own directory. This allows limiting the file permissions in a shared Pulp 2 + Pulp 3 deployment and the SELinux file contexts. Another benefit is compatibility with django_extensions' unreferenced_files command which lists all files in MEDIA_ROOT that are not in the database.

Other paths are kept on the same absolute paths, but the diff looks bigger because they used derive from MEDIA_ROOT.

The documentation is updated to show the latest best practices.

fixes #7178

@pulpbot
Copy link
Member

pulpbot commented Jul 21, 2020

Attached issue: https://pulp.plan.io/issues/7178

@ekohl
Copy link
Contributor Author

ekohl commented Jul 21, 2020

This is the result of playing around with it and previous experience with another Django project.

@bmbouter @mikedep333 please have a look

@jlsherrill @ehelms @wbclark you may also find this interesting.

If this is agreed on, it'll also need to be reflected in pulpcore-selinux. I don't mind making that patch, but first there should be agreement.

@mikedep333
Copy link
Member

mikedep333 commented Jul 21, 2020

@ekohl I disagree with part of your design. It seems to contradict previous designs, including security (group access is good. World access is insecure.)

https://pulp.plan.io/issues/7178#change-59743

@@ -28,19 +27,20 @@

ALLOWED_HOSTS = ["*"]

MEDIA_ROOT = "/var/lib/pulp/"
DEPLOY_ROOT = Path("/var/lib/pulp")
MEDIA_ROOT = DEPLOY_ROOT / "media"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Offline discussion was to change this to artifact (which already exists) and then add a migration that updates the database.

Suggested change
MEDIA_ROOT = DEPLOY_ROOT / "media"
MEDIA_ROOT = DEPLOY_ROOT / "artifact"

That means the following code needs updates:

already_in_place = file.name in [
artifact_storage_path,
os.path.join(settings.MEDIA_ROOT, artifact_storage_path),
]
is_in_artifact_storage = file.name.startswith(os.path.join(settings.MEDIA_ROOT, "artifact"))
if not already_in_place and is_in_artifact_storage:
raise ValueError(
_(
"The file referenced by the Artifact is already present in "
"Artifact storage. Files must be stored outside this location "
"prior to Artifact creation."
)
)
move = file._committed and file.name != artifact_storage_path
if move:
if not already_in_place:
file._file = TemporaryDownloadedFile(open(file.name, "rb"))
file._committed = False

Before proceeding with this, I'd like to get some more feedback if this is really desired because it looks like the entire ArtifactFileField.pre_save becomes redundant then.

Another approach would be to use a migration that looks at all models with a FileField. Then if it doesn't exist, look a directory higher and move it into MEDIA_ROOT https://github.com/django-extensions/django-extensions/blob/d747e09885e90d4c5f702f0bb4f0ea42e63f4eb0/django_extensions/management/commands/unreferenced_files.py#L27-L43 can be the basis of that.

@ekohl
Copy link
Contributor Author

ekohl commented Jul 22, 2020

Updated to at least use pathlib instead of path.py correctly so the tests show if I broke something.

@daviddavis
Copy link
Contributor

Moving to WIP due to inactivity.

@ekohl
Copy link
Contributor Author

ekohl commented Aug 27, 2020

For what it's worth: I was waiting on a reply from maintainers on how to proceed @ #799 (comment)

@ekohl
Copy link
Contributor Author

ekohl commented Sep 1, 2020

Summarizing a discussion with @mikedep333 and @dkliban. Keep me honest if I missed anything

The easiest migration plan is to keep the artifact directory inside MEDIA_ROOT. That means no application code change is needed and no DB migration is needed. Both are a good thing. Exact paths inside MEDIA_ROOT are not important to the admin, but a quick migration with minimal to no downtime is.

To support this, the following steps should be taken:

  • Upgrade pulpcore-selinux to have the correct file contexts for the new locations
  • Prepare layout (see script below)
  • Update settings
  • Restart all services
  • Remove symlink

The migration script is as follows:

#!/usr/bin/env python3
from pathlib import Path

DEPLOY_ROOT = Path('/var/lib/pulp')
LEGACY_DIR = DEPLOY_ROOT / 'artifact'
NEW_MEDIA_ROOT = DEPLOY_ROOT / 'media'
DESTINATION = NEW_MEDIA_ROOT / LEGACY_DIR.name

if LEGACY_DIR.is_dir() and not LEGACY_DIR.is_symlink():
    if DESTINATION.is_dir():
        raise SystemExit(f'Target {DESTINATION} already exists')

    NEW_MEDIA_ROOT.mkdir(exist_ok=True)
    LEGACY_DIR.rename(DESTINATION)
    LEGACY_DIR.symlink_to(DESTINATION)

This means that the files are available at both locations at the same time. There is a tiny window between the rename and symlink_to calls where content can't be found, but on a modern storage system that should be negligible.

After all services use the new settings, nothing should be using the symlink anymore and can be cleaned up.

@mikedep333
Copy link
Member

The installer team reviewed this, and we are good with the design.

@ekohl
Copy link
Contributor Author

ekohl commented Sep 2, 2020

Excellent, I'll try to find time to get Travis green on this PR.

CHANGES/7178.feature Outdated Show resolved Hide resolved
Copy link
Member

@bmbouter bmbouter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm excited about this PR. To merge it we'll also need updates to this page in the docs please: https://github.com/pulp/pulpcore/blob/master/docs/settings.rst It documents the various settings, so the new ones should be documented there please.

@mikedep333
Copy link
Member

mikedep333 commented Sep 2, 2020

To merge this, we'll also need a corresponding PR for pulp_installer that does the symlink.

EDIT: We will just tell pulp_installer users to stop the services before upgrading, instead of the symlink. We will still perform the move though.

@ekohl ekohl force-pushed the 7178-deployment-defaults branch 2 times, most recently from 28a901f to a829b3d Compare September 3, 2020 14:31
@ekohl ekohl marked this pull request as draft September 3, 2020 14:31
@ekohl
Copy link
Contributor Author

ekohl commented Sep 3, 2020

I'm converting it back to a draft since right now I'm just working on getting the build to pass. Once I figure that out, I'll look at docs.

To merge this, we'll also need a corresponding PR for pulp_installer that does the symlink.

Can I ask the pulp_installer maintainers to work on that? Every time I have to write Ansible I get frustrated and generally avoid touching it altogether if possible.

@ekohl
Copy link
Contributor Author

ekohl commented Sep 3, 2020

Ah, it turns out Django's storage system only gained support for pathlib.Path in Django 3.1. That's disappointing. Casting that to a str.

@ekohl ekohl force-pushed the 7178-deployment-defaults branch 4 times, most recently from 0ce38eb to dea594b Compare September 4, 2020 11:38
@ekohl
Copy link
Contributor Author

ekohl commented Dec 8, 2020

I think this is now ready for review.

if media_root_dev and media_root_dev != upload_temp_dir_dev:
warnings.append(
CheckWarning(
"MEDIA_ROOT and FILE_UPLOAD_TEMP_DIR are on different filesystems. "
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure if there are file moves between these two. That was the reason that they should be on the same FS, right?

Copy link
Member

@dkliban dkliban left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I filed an installer story to go along with this change. https://pulp.plan.io/issues/8011. @ekohl Can you please take a look at it and provide feedback?

@ekohl It doesn't look like we need to make any changes to the pulpcore-selinux repository, is my understanding correct?

@ekohl
Copy link
Contributor Author

ekohl commented Dec 18, 2020

I filed an installer story to go along with this change. https://pulp.plan.io/issues/8011. @ekohl Can you please take a look at it and provide feedback?

I summarized the steps, including a minimal migration script. I'd recommend to at least link that in the issue, but possibly copy over some info.

@ekohl It doesn't look like we need to make any changes to the pulpcore-selinux repository, is my understanding correct?

pulp/pulpcore-selinux@d7305fb#diff-c0c5c879ce2e9f03d675c8a4efe177ed5f2ee6a8ff41e4a15bc7500d16ae32ef did that already.

@dkliban
Copy link
Member

dkliban commented Dec 18, 2020

I filed an installer story to go along with this change. https://pulp.plan.io/issues/8011. @ekohl Can you please take a look at it and provide feedback?

I summarized the steps, including a minimal migration script. I'd recommend to at least link that in the issue, but possibly copy over some info.

Thank you! I thought it would be easier to not do a move but simply use a hardlink. However, I have now learned that you can't hardlink a directory. So we will have the installer perform the operations in the script you provided.

@ekohl It doesn't look like we need to make any changes to the pulpcore-selinux repository, is my understanding correct?

pulp/pulpcore-selinux@d7305fb#diff-c0c5c879ce2e9f03d675c8a4efe177ed5f2ee6a8ff41e4a15bc7500d16ae32ef did that already.

@@ -14,6 +14,6 @@ class Migration(migrations.Migration):
migrations.AlterField(
model_name='upload',
name='file',
field=models.FileField(max_length=255, storage=django.core.files.storage.FileSystemStorage(location='/var/lib/pulp/upload/'), upload_to=''),
field=models.FileField(max_length=255, storage=django.core.files.storage.FileSystemStorage(location='/var/lib/pulp/upload'), upload_to=''),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this change intentional? It's in a migration that has already been released

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. The default for it has changed and if I didn't modify this, Django insisted on a new migration. Since the default is at runtime only (default is not part of the database) I thought it was safe to do so. Given the user may even change the setting, it probably can't even live in the database.

Copy link
Member

@dkliban dkliban left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this change! Once we merge this we should also merge the installer PR.

pulp/pulp_installer#510

@@ -32,21 +32,22 @@

ALLOWED_HOSTS = ["*"]

MEDIA_ROOT = "/var/lib/pulp/"
DEPLOY_ROOT = Path("/var/lib/pulp")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a net-new setting for Pulp? Or is setting and existing Django setting? We need docs added for it here: https://docs.pulpproject.org/pulpcore/settings.html

Copy link
Contributor Author

@ekohl ekohl Jan 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is new but note that it doesn't affect anything if you change it. Since dynaconf are loaded after setting these variables. That means that if a user sets DEPLOY_ROOT in dynaconf, you would have to set all variables that use DEPLOY_ROOT after dynaconf. That can be done if you want of course.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bmbouter before I start documenting, could you weigh in if this should actually do something? Documenting a setting that doesn't change anything feels useless.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the questions (and this effort in general).

There's no need to document settings that can't be used. Let me split my answer in two parts:

  1. I had imagined DEPLOY_ROOT could be set by the user via dynaconf and if so, it would cause all of the variables that use it to also change. If that's not possible and users would just need to set N variables to effectively change /var/lib/pulp/ then I don't think there is anything to document.

  2. This does change the default value of at least MEDIA_ROOT thought right? That default should be updated in the docs if so.

Let me know if there is more I should respond with. I'm excited about this change, ty!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1. I had imagined `DEPLOY_ROOT` could be set by the user via dynaconf and if so, it would cause all of the variables that use it to also change. If that's not possible and users would just need to set N variables to effectively change `/var/lib/pulp/` then I don't think there is anything to document.

I don't think this is true. Though I didn't test it, I can't see how it would work.

from pathlib import Path
DEPLOY_ROOT = Path('/var/lib/pulp')
MEDIA_PATH = DEPLOY_ROOT / 'media'
assert id(DEPLOY_ROOT) != id(MEDIA_ROOT)
assert repr(MEDIA_PATH) == "PosixPath('/var/lib/pulp/media')"

Here MEDIA_PATH is a new instance of Path. If you set a new value for DEPLOY_ROOT via dynaconf, it won't modify anything that copied its value.

Perhaps dynaconf can do something with lazy settings (https://www.dynaconf.com/envvars/#lazy-values) suggests it can do something like that), I have no idea how that would work.

That made me think that perhaps it should be _DEPLOY_ROOT but if you can use lazy values, perhaps it becomes useful again. I'm looking for advice here since I don't know dynaconf at all.

2\. This does change the default value of at least `MEDIA_ROOT` thought right? That default should be updated in the docs if so.

It does indeed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's ok, I don't think we need "one var to reroot many other vars". This PR didn't have that goal, and I won't want to introduce additional goals. What you've already got looks great.

I think just documenting the change for the defualt of MEDIA_ROOT is all we need.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think just documenting the change for the defualt of MEDIA_ROOT is all we need.

FYI: that's already done here:

https://github.com/pulp/pulpcore/pull/799/files#diff-1d78ba60e1efa4b7b08982e1bbaeceffefea079d402b7d58e57c0cfc7e6c78d8R58

I also took the opportunity to fix the SELinux context type.

FILE_UPLOAD_TEMP_DIR = os.path.join(MEDIA_ROOT, "tmp/")
WORKING_DIRECTORY = os.path.join(MEDIA_ROOT, "tmp/")
CHUNKED_UPLOAD_DIR = os.path.join(MEDIA_ROOT, "upload/")
FILE_UPLOAD_TEMP_DIR = DEPLOY_ROOT / "tmp"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the docs https://docs.pulpproject.org/pulpcore/settings.html probably need updates to identify their defaults being tied up with DEPLOY_ROOT.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a Pulp only setting that's being added with this PR. +1 on adding it to the docs. I've added a reference to it in the installer docs[0].

[0] https://github.com/pulp/pulp_installer/pull/510/files#diff-c26c81d8d68fbf1c0dfa6018bb2b2256d54bae4729158279ab601b6f34247fffR58

Copy link
Member

@bmbouter bmbouter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting some docs updates please.

@bmbouter
Copy link
Member

Also does this need any changes to the pulpcore-selinux fc files? https://github.com/pulp/pulpcore-selinux/blob/master/pulpcore.fc

@dkliban
Copy link
Member

dkliban commented Jan 11, 2021

Also does this need any changes to the pulpcore-selinux fc files? https://github.com/pulp/pulpcore-selinux/blob/master/pulpcore.fc

No, see #799 (comment)

@bmbouter
Copy link
Member

@ekohl @dkliban what handles moving the files on existing installations to make them consistent with the new MEDIA_ROOT location?

@ekohl
Copy link
Contributor Author

ekohl commented Jan 13, 2021

#799 (comment) summarizes the migration. https://github.com/theforeman/foreman-installer/blob/39003aaed97cb51285761463bdb84cf8bf0cdbca/hooks/pre/34-pulpcore_directory_layout.rb is what foreman-installer uses. pulp/pulp_installer#510 is still open for Pulp itself.

@ekohl
Copy link
Contributor Author

ekohl commented Jan 18, 2021

FYI: I opened #1074 which extracted a small part of this PR to cherry pick as a regression fix in 3.9. For this reason I this PR on it.

@ekohl ekohl force-pushed the 7178-deployment-defaults branch 2 times, most recently from e44ec57 to 857aa97 Compare January 18, 2021 18:20
This changes the default deployment layout. The main change is that
MEDIA_ROOT gets its own directory. This allows limiting the file
permissions in a shared Pulp 2 + Pulp 3 deployment and the SELinux file
contexts. Another benefit is compatibility with django_extensions'
unreferenced_files command which lists all files in MEDIA_ROOT that are
not in the database.

Other paths are kept on the same absolute paths, but the diff looks
bigger because they used derive from MEDIA_ROOT.

The documentation is updated to show the latest best practices.

fixes #7178
@ekohl
Copy link
Contributor Author

ekohl commented Jan 22, 2021

Rebased now that a805312 has been merged. I've dropped the database migration changes because of it.

Copy link
Member

@bmbouter bmbouter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for everything you've done for this! I really appreciate both the great contribution and patience to get it right.

@dkliban dkliban merged commit f8a8c64 into pulp:master Feb 2, 2021
@ekohl ekohl deleted the 7178-deployment-defaults branch February 2, 2021 15:45
@ekohl
Copy link
Contributor Author

ekohl commented Feb 2, 2021

I'm glad we got there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants