Improved pre-migration memory consumption #236
Conversation
Might not be enough; however, the failure seems to point to a bulk_create failure.
👍
Katello has a reproducer and is testing it, so I'll leave this as a draft until it's confirmed that the issue is resolved.
This didn't seem to prevent an OOM, but this time postgresql was killed (which may or may not mean anything). I may need to adjust my postgresql settings a bit (I had to increase them to get the customer db to load; it's possible I need a bit more memory or might need to reduce how much memory postgresql was taking). I'll re-test and monitor how much memory the rq workers are actually taking!
pulp_2to3_migration/app/constants.py
Outdated
@@ -8,3 +8,5 @@
 }

 NOT_USED = 'Not Used'
+
+BULK_CREATE_BATCH_SIZE = 1000
I know that this change is not strictly related to the PR; however, what do you think about renaming `BULK_CREATE_BATCH_SIZE` to just `BATCH_SIZE` and also updating the `bulk_update` calls to use this constant?
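A rough sketch of what that could look like (the model and field names in the call sites are only examples, and the `bulk_update` call is purely illustrative; the point is both kinds of calls sharing one constant):

```python
# pulp_2to3_migration/app/constants.py
BATCH_SIZE = 1000  # renamed from BULK_CREATE_BATCH_SIZE, shared by bulk_create and bulk_update

# hypothetical call sites in the pre-migration code:
Pulp2Content.objects.bulk_create(pulp2content_to_save, batch_size=BATCH_SIZE)
Pulp2Content.objects.bulk_update(pulp2content_to_update, ['pulp2_last_updated'], batch_size=BATCH_SIZE)
```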
Will do.
With some additional runs and different findings, here was the memory usage of the rq worker doing the migration, as a percentage of overall memory, taken every 10 seconds. That last reading was 2324444 kB of memory (RSS). Checking the OOM log for this PID, I see:

Killed process 18341 (rq) total-vm:2981060kB, anon-rss:2322684kB, file-rss:0kB, shmem-rss:0kB

So it seems like the process spiked heavily in a very short time, within 10-20 seconds.
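For reference, this is roughly how that sampling can be done (a sketch, assuming `psutil` is available and the rq worker PID is known; not part of this PR):

```python
import time
import psutil

RQ_WORKER_PID = 18341  # PID taken from the OOM log above
proc = psutil.Process(RQ_WORKER_PID)

# Sample RSS and the percentage of total memory every 10 seconds.
while proc.is_running():
    rss_kb = proc.memory_info().rss // 1024
    pct = proc.memory_percent()
    print(f"{time.strftime('%H:%M:%S')} rss={rss_kb} kB ({pct:.1f}% of total memory)")
    time.sleep(10)
```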
Also, I examined the postgresql logs right before the OOM and saw some very, very large queries. Here is the last one, trimmed to the first 400 characters:

2020-09-17 17:38:02 UTC LOG: duration: 5625.576 ms statement: INSERT INTO "pulp_2to3_migration_pulp2rpm" ("pulp_id", "pulp_created", "pulp_last_updated", "pulp2content_id", "name", "epoch", "version", "release", "arch", "checksum", "checksumtype", "repodata", "is_modular", "size", "filename") VALUES ('185b2e88-6123-4ac3-ac43-fa0cc4ca45d4'::uuid, '2020-09-17T17:37:55.487155+00:00'::timestamptz, '

The length of that one line in the log is 267,818,406 characters. I did notice that 'repodata' is in there; I assume that is what is taking up all the data. Does pulp3 still use this information? I don't see it in the rpm_package table.
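One way to sanity-check that assumption (a sketch using PostgreSQL's `pg_column_size`; the table and column names are taken from the log line above):

```python
from django.db import connection

# Rough estimate of how much space the uncompressed repodata column occupies.
with connection.cursor() as cursor:
    cursor.execute(
        "SELECT pg_size_pretty(SUM(pg_column_size(repodata))) "
        "FROM pulp_2to3_migration_pulp2rpm"
    )
    print(cursor.fetchone()[0])
```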
Another note: I did see postgresql jump up quite a bit on one run:

postgres 27415 11.5 17.2 2690156 2385480 ? Ss 19:02 6:28 postgres: pulp pulpcore ::1(54378) INSERT

(so this one process is taking 17% of memory)
The migration uses this information to construct Pulp 3 objects; not everything is available in a parsed form in Pulp 2.

I'm looking into where the memory consumption can spike. I also think we might want to save pulp_2to3_migration_pulp2rpm records specifically in smaller batches than the rest, for example as sketched below.
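Something along these lines (a sketch; the smaller batch size value and the exact call site are illustrative):

```python
# RPM detail records carry large repodata blobs, so save them in smaller batches
# than the generic BATCH_SIZE used for everything else.
RPM_BATCH_SIZE = 50  # illustrative value

Pulp2Rpm.objects.bulk_create(pulp2rpm_to_save, batch_size=RPM_BATCH_SIZE)
```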
I pushed a second commit which should decrease memory consumption at pre-migration time.
        ),
    ]

    # TODO: data migration - for consistency and in case some content hasn't been migrated before the upgrade.
This migration doesn't contain the data migration yet; it is suitable for tests/experiments only.
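For context, the schema-only version looks roughly like this, with the data step still to be slotted in (a sketch, not the actual migration file; the dependency name is a placeholder):

```python
from django.db import migrations


def compress_repodata(apps, schema_editor):
    # TODO: data migration - compress existing repodata into the new
    # *_template_gz fields for content that was pre-migrated but not yet migrated.
    pass


class Migration(migrations.Migration):

    dependencies = [
        ('pulp_2to3_migration', '00XX_previous_migration'),  # placeholder
    ]

    operations = [
        # AddField operations for primary_template_gz / other_template_gz /
        # filelists_template_gz (as generated by makemigrations) go here,
        # followed by the data step once it exists:
        migrations.RunPython(compress_repodata, migrations.RunPython.noop),
    ]
```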
@@ -124,8 +124,6 @@ def pre_migrate_content_detail(cls, content_batch):
                                  *cls.unit_fields)
         pulp2rpm_to_save = []
         for rpm in pulp2_content_batch:
-            rpm['repodata'] = decompress_repodata(rpm['repodata'])
-
             pulp2rpm_to_save.append(
This makes a lot of sense; it also has the benefit that decompression will happen at publish time, so the parallelism can help with it.
@goosemania LGTM pending data migration / any final changes
@dralley , I believe it happens earlier, at content migration time. We use repodata from pulp 2 to create a Package instance.
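In other words, something like this now happens at content migration time instead of pre-migration (a rough sketch; `decompress_repodata` is the helper removed from pre-migration above, and the Package construction call is only a stand-in for what the plugin actually does):

```python
# At content migration time, rebuild the Pulp 3 Package from the stored Pulp 2 repodata.
repodata = decompress_repodata(pulp2rpm.repodata)

package = build_package_from_repodata(repodata)  # hypothetical helper, not the real API
package.save()
```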
Derp, yeah, I think I realized that at some point and then promptly forgot it.
This helped quite a bit and I was able to get past the pre-migration step!
@jlsherrill From your wording I assume you hit another issue later on?
@dralley yeah, I ended up hitting this: https://pulp.plan.io/issues/7538, manually hacked around it, and then hit this: https://pulp.plan.io/issues/7540. Neither is related to this change specifically.
pkg.other_template_gz = gzip.zlib.compress(pkg.repodata.get('other').encode())
pkg.filelists_template_gz = gzip.zlib.compress(pkg.repodata.get('filelists').encode())
pkg.repodata = ''
pkg.save()
Testing; we likely need to switch to bulk_update, roughly as sketched below.
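A sketch of the batched `bulk_update` variant (the batch size is illustrative, and in a real data migration the model would come from `apps.get_model()`):

```python
import gzip

FIELDS = ['primary_template_gz', 'other_template_gz', 'filelists_template_gz', 'repodata']
batch = []
for pkg in Pulp2Rpm.objects.iterator():
    pkg.primary_template_gz = gzip.zlib.compress(pkg.repodata.get('primary').encode())
    pkg.other_template_gz = gzip.zlib.compress(pkg.repodata.get('other').encode())
    pkg.filelists_template_gz = gzip.zlib.compress(pkg.repodata.get('filelists').encode())
    pkg.repodata = ''
    batch.append(pkg)
    if len(batch) == 1000:
        Pulp2Rpm.objects.bulk_update(batch, FIELDS)
        batch = []
if batch:
    Pulp2Rpm.objects.bulk_update(batch, FIELDS)
```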
Force-pushed from 8b32b1a to fbd3787.
pkg.primary_template_gz = gzip.zlib.compress(pkg.repodata.get('primary').encode())
pkg.other_template_gz = gzip.zlib.compress(pkg.repodata.get('other').encode())
pkg.filelists_template_gz = gzip.zlib.compress(pkg.repodata.get('filelists').encode())
TL;DR: This is expensive and takes a lot of time; 20K RPMs take 2-3 minutes to migrate, and the majority of them will never be used. Is it worth migrating only those which will be used?

Pulp2Rpm and Pulp2Srpm currently have repodata in uncompressed form in the repodata field. This migration compresses it and splits it into 3 fields, one for each type.

The repodata field is only used to create a Package object in Pulp 3. When a Pulp2Rpm object is migrated and the created Package is linked to it, I believe Pulp2Rpm.repodata will never be used again. Should we just set all *_template_gz fields to null in such a case and be done with it? Am I forgetting any corner case where the Package can disappear from the Pulp2Rpm relation?

Why is this migration needed? For the case when a user started an RPM migration and it didn't fully finish (e.g. an OOM happened), then they upgrade (because they need this memory fix) and trigger the RPM migration again. The items which were already pre-migrated but not yet migrated need this data migration.
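If we did want to go the "just unset it" route for already-migrated content, it could be as simple as this (a sketch; it assumes the pulp2content__pulp3_content relation is the right way to identify already-migrated content, and the *_template_gz fields would simply stay null for those rows):

```python
# Content that already has a Pulp 3 Package linked should never need its repodata again,
# so its repodata could simply be cleared instead of compressed.
already_migrated = Pulp2Rpm.objects.exclude(pulp2content__pulp3_content=None)
already_migrated.update(repodata='')
```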
To avoid the need for such migrations, I think we might need a reset or force option, so that when something goes south, the user is able to start again. Thoughts?
how will you identify which packages will be used?
+1 on unsetting the repodata; git grep did not show any other usage of it.
what will that reset or force option do?
how will you identify which packages will be used?
Pulp2Rpm.objects.filter(pulp2content__pulp3_content=None)
To avoid the need for such migrations, I think we might need a reset or force option, so that when something goes south, the user is able to start again. Thoughts?
I am in favour of adding such an option; as you mentioned, we should meet and brainstorm.
how will you identify which packages will be used?
Pulp2Rpm.objects.filter(pulp2content__pulp3_content=None)
I think it makes sense to filter only the needed content; this will improve the speed of the migration.

Speaking of whether to remove the 'repodata' field - I would rather leave it, in case we need to get to that info for some reason in the future. Yes, some of the content will have empty repodata and some will not, but I do not see this as a big issue. Thoughts?
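For the filtering part, the data migration could restrict its queryset to not-yet-migrated content and leave everything else alone (a sketch, reusing the queryset from the comment above):

```python
# Compress repodata only for content that still needs to be migrated to Pulp 3.
not_yet_migrated = Pulp2Rpm.objects.filter(pulp2content__pulp3_content=None)
for pkg in not_yet_migrated.iterator():
    # ... same compression / bulk_update handling as in the earlier sketch ...
    pass
```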
@ipanova , after a discussion with @dralley , I think I'm convinced that we can leave this migration as is. It doesn't affect katello users, since it's only for RPM content and RPM migration is not a part of katello yet. As for pure pulp users, there will be very few of them affected (an upgrade in the middle of a broken migration) and only if their setup is very large. So we go with consistency and migrate all data.
@goosemania I'm not sure if it warrants much more thought, but I'm going to test your PR, and I'm going to try a thread pool. I've seen some blog posts to suggest that the Gzip module bypasses the global interpreter lock so compressing / decompressing multiple things in true parallel is possible. But I approve of the current approach regardless.
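A sketch of that idea (hedged: whether zlib actually releases the GIL enough to help here is exactly what would need measuring; the field names match the migration code above):

```python
from concurrent.futures import ThreadPoolExecutor
import gzip

def compress_fields(pkg):
    # The compression itself runs in C (zlib), so threads may get real parallelism here.
    pkg.primary_template_gz = gzip.zlib.compress(pkg.repodata.get('primary').encode())
    pkg.other_template_gz = gzip.zlib.compress(pkg.repodata.get('other').encode())
    pkg.filelists_template_gz = gzip.zlib.compress(pkg.repodata.get('filelists').encode())
    pkg.repodata = ''
    return pkg

with ThreadPoolExecutor(max_workers=4) as pool:
    compressed = list(pool.map(compress_fields, Pulp2Rpm.objects.iterator()))

Pulp2Rpm.objects.bulk_update(
    compressed,
    ['primary_template_gz', 'other_template_gz', 'filelists_template_gz', 'repodata'],
    batch_size=1000,
)
```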
Thanks @dralley. To be honest, I think for the number of cases we discussed, it's likely not worth the effort.
@goosemania On my dev laptop it only took about 30 seconds to run the django migration with 20k RPMs, but I imagine that will be a fair bit faster than the majority of production boxes. I consider it a good sign though. Approved.
pkg.primary_template_gz = gzip.zlib.compress(pkg.repodata.get('primary').encode())
pkg.other_template_gz = gzip.zlib.compress(pkg.repodata.get('other').encode())
pkg.filelists_template_gz = gzip.zlib.compress(pkg.repodata.get('filelists').encode())
pkg.repodata = ''
👍
Pre-migrate compressed repodata for RPMs to save some memory.
Added batch_size to all the bulk_create calls.
Decreased the size of the content batch as well.
closes #7490
https://pulp.plan.io/issues/7490