Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: include parent directories when listing archive members #435

Merged
merged 1 commit into from
Aug 19, 2024

Conversation

weiiwang01
Copy link
Contributor

Zip archives may not include parent directories of files, as these are usually automatically created during extraction. This is exactly what charmcraft does when packing the charm (https://github.com/canonical/charmcraft/blob/b48ef739df4732bb3ed35d67893574b260417a04/charmcraft/utils/file.py#L68-L69). And, when Juju creates the charm manifest from this kind of charm zip file, certain directories are omitted from the manifest. This omission can result in directories from previous charm revisions not being removed during an upgrade. And the residual empty directories may cause problems, for example, an empty python package dist-info directory may cause the package metadata to malfunction.

In this pull request, updating the ArchiveMembers function to add the parent directory to the member list if it does not already exist in the zip archive. This ensures complete cleanup of directories during a charm upgrade.

It is important to note that this change will only ensure the proper cleanup of files for charms that are newly deployed, as charms that are already deployed have their manifests written to the manifest files on disk. While it is possible to modify the manifestDeployer's loadManifest function to include the missing directory entries, this adjustment would only remove residual empty directories from the most recent deployment and would not address problems in older deployments, which seems not worthy it.

Fix: https://bugs.launchpad.net/juju/+bug/2058335

QA Steps

This issue only affects charms deployed from charmhub, as locally uploaded charms are repacked by Juju, which adds the missing directories during this process.

juju deploy github-runner --revision 130 --channel latest/stable
juju refresh github-runner --revision 131

Directories such as openstacksdk-2.1.0.dist-info and openstacksdk-3.0.0.dist-info from two different revisions both exist, causing problem for openstacksdk:

juju ssh github-runner/3 ls "/var/lib/juju/agents/unit*/charm/venv"
...
openstacksdk-2.1.0.dist-info
openstacksdk-3.0.0.dist-info
...

Zip archives can omit files' parent directories, as they will be
automatically created during extraction. When listing members from
a zip archive, ensure that all parent directories are included. This
guarantees that during charm cleanup, directories are not missed.
@jujubot
Copy link
Contributor

jujubot commented Jul 15, 2024

Thanks for opening a pull request! Please follow the instructions here to ensure your pull request is ready for review. Then, a maintainer will review your patch.

@hpidcock @anvial

1 similar comment
@jujubot
Copy link
Contributor

jujubot commented Jul 15, 2024

Thanks for opening a pull request! Please follow the instructions here to ensure your pull request is ready for review. Then, a maintainer will review your patch.

@hpidcock @anvial

@hpidcock hpidcock self-assigned this Jul 23, 2024
@jameinel
Copy link
Member

I'm supportive of this idea, as it makes charms act the way zip and juju would create those .zip files. I have not done the actual validation of the patch, but the mechanism that was discussed I agree with.

@ca-scribner
Copy link

ca-scribner commented Jul 24, 2024

If this was how things always had been I'd like this mechanism a lot. But as someone who has charms deployed in the field right now that are affected by this bug, this gives me pause:

It is important to note that this change will only ensure the proper cleanup of files for charms that are newly deployed, as charms that are already deployed have their manifests written to the manifest files on disk.

I'd prefer if possible a fix (either here, or separately) that fixes the issue for charms that are already deployed (like something that could detect during refresh some of these unmanaged directories and "do the right thing" to clean them up).

As is with this fix, I don't know how to help people that have charms they've already deployed. Even with the newly patched juju, they're going to juju refresh and go into an error state. The only thing I can think of from a charmer standpoint is I can add a hack in upgrade_charm that discovers and deletes these leftover directories for myself, but that feels like a dangerous game.

@weiiwang01
Copy link
Contributor Author

If this was how things always had been I'd like this mechanism a lot. But as someone who has charms deployed in the field right now that are affected by this bug, this gives me pause:

It is important to note that this change will only ensure the proper cleanup of files for charms that are newly deployed, as charms that are already deployed have their manifests written to the manifest files on disk.

I'd prefer if possible a fix (either here, or separately) that fixes the issue for charms that are already deployed (like something that could detect during refresh some of these unmanaged directories and "do the right thing" to clean them up).

As is with this fix, I don't know how to help people that have charms they've already deployed. Even with the newly patched juju, they're going to juju refresh and go into an error state. The only thing I can think of from a charmer standpoint is I can add a hack in upgrade_charm that discovers and deletes these leftover directories for myself, but that feels like a dangerous game.

The situation here is that, let's assume a charm has rev1, rev2, and rev3 installed with the affected Juju. With the current fix rolled out to Juju, we can ensure that future refreshes like rev4 will completely clean up their files. With the approach I mentioned in the description (updating the loadManifest function to insert missing directories), we can ensure rev3's files are also completely cleaned up. However, there's no easy way to guarantee 100% reliable detect files from rev1 and rev2. Therefore, there will still be some cases where the updated loadManifest approach can't resolve the issue.

Another reason I am somewhat reluctant to modify loadManifest is that this modification is unnecessary if everything else is correct. If this change is introduced into the Juju codebase, it raises the question of when it would be appropriate to remove it from the codebase.

For fixing the charm that's already deployed, based on the cause, we know that the residual files are all empty directories. This can be fixed manually by running a command:

juju exec --application the-charm "find /var/lib/juju/agents/unit-*/charm -type d -empty -delete"

What do you think?

@ca-scribner
Copy link

To see if I understand correctly, is this correct?

  • bootstrap current juju (something that is affected by the bug)
  • juju deploy x --revision=1, where rev1 includes package-v1
    • venv contents:
      • package-v1
  • juju refresh x --revision=2, where rev2 includes package-v2
    • venv contents:
      • package-v1 (empty dir)
      • package-v2
  • juju refresh x --revision=3, where rev3 includes package-v3
    • venv contents:
      • package-v1 (empty dir)
      • package-v2 (empty dir)
      • package-v3
  • patch juju to include the proposed fix
  • juju refresh x --revision=4 where rev4 includes package-v4
    • venv contents:
      • package-v1 (empty dir)
      • package-v2 (empty dir)
      • package-v4 <-- no v3 dir at all

In other words, any refresh that occurs in the patched juju will clean up empty packages from the pre-refresh revision and prevent future revisions from creating more, but wont be able to find any leftover directories from the distant past. Does that sound right?

@ca-scribner
Copy link

ca-scribner commented Jul 26, 2024

Just to be clear, I'm not saying the perfect solution exists :) It might be the current solution is as good as can be implemented - I'm just making sure we know the pitfalls. The main issue I worry about is that, once a charm hits this error, there's no automated way out of it.

But rather than speculate about that, can you try these two concrete test cases and see how they go? If they both pass, I'm not worried

case 1 - patching before the error occurs

# With unpatched juju
juju bootstrap lxd lxd
juju add-model test1
juju deploy ubuntu
juju deploy grafana-agent --channel latest/stable --revision 134
juju relate ubuntu grafana-agent
# wait for grafana-agent to deploy and go to Blocked

# patch juju

juju refresh grafana-agent --revision 164
# wait for grafana-agent to refresh and go to Blocked.  It should not go to error.  Blocked here means everything worked

case 2 - patching after the error occurs

# With unpatched juju
juju bootstrap lxd lxd
juju add-model test1
juju deploy ubuntu
juju deploy grafana-agent --channel latest/stable --revision 134
juju relate ubuntu grafana-agent
# wait for grafana-agent to deploy and go to Blocked

juju refresh grafana-agent --revision 164
# wait for grafana-agent to refresh - it will go to Error because of the leftover package

# patch juju
juju refresh grafana-agent --revision 164
# wait for grafana-agent to refresh and go to Blocked.  It should not go to error.  Blocked here means everything worked

@ca-scribner
Copy link

fyi, the case I've seen leaves nested empty directories, eg:

venv/package/
venv/package/licenses/

The suggested human intervention of juju exec --application the-charm "find /var/lib/juju/agents/unit-*/charm -type d -empty -delete" handles this case properly though because find -delete will handle the contents of a directory before the directory itself (so it'll delete the deepest dir, then see the next one up is now empty). ty, its a good easy one-liner to tell people about!

Copy link
Member

@hpidcock hpidcock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank-you.

@hpidcock hpidcock merged commit 6b348d6 into juju:v12 Aug 19, 2024
jujubot added a commit to juju/juju that referenced this pull request Aug 19, 2024
#17939

Updates juju/charm/v12 to v12.1.1 to include fix for charm upgrade without leaving old directories.

juju/charm#435

## QA steps

```
# bootstrap lxd and add a model
$ juju deploy github-runner --revision 130 --channel latest/stable
# wait for deploy and charm to get blocked
$ juju refresh github-runner --revision 131
# wait for upgrade
$ juju ssh 0
ubuntu@juju-32b4eb-0:/var/lib/juju/agents/unit-github-runner-0/charm/venv$ ls | grep openstacksdk
openstacksdk-3.0.0.dist-info
# there should only be one directory with the prefix openstacksdk
```

## Documentation changes

N/A

## Links

**Launchpad bug:** https://bugs.launchpad.net/juju/+bug/2058335

**Jira card:** JUJU-
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants