Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

image-generator: fix image creation edge cases #369

Merged
merged 2 commits into from
Jan 15, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
5 changes: 3 additions & 2 deletions config/peerpods/podvm/aws-VM-image-create-job.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -44,10 +44,10 @@ spec:
value: aws
- name: PODVM_DISTRO
value: rhel
- name: IMAGE_NAME
value: "peer-pod-ami"
# - name: INSTANCE_TYPE
# value: "t3.small" # default is t3.small, uncomment and modify if not available in your region
# - name: IMAGE_NAME
# value: "aws-podvm-image-name" # set custom image name for custom image if you wish to avoid its deletion
envFrom:
- secretRef:
name: peer-pods-secret
Expand All @@ -59,6 +59,7 @@ spec:
- -c
- |
set -e
[[ ! "${IMAGE_NAME}" ]] && UUID=$(uuidgen) && export IMAGE_NAME="peer-pod-ami-${UUID::6}" && echo "IMAGE_NAME:${IMAGE_NAME}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are not requiring IMAGE_NAME to be explicitly unset we probably need to add that to the docs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

User shouldn't even know it's possible to set it, this variable is not mentioned in the docs

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@snir911 I don't understand the motivation behind the [[ ! "${IMAGE_NAME}" ]] check... what is it for ? Don't we want to enforce the name of the image to be something like peer-pod-ami-XXXXXX ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To preserve previous behavior mostly I think, for example if users wants to specify a name that would be easy for him to track later using AWS cli.
TBH i don't recall if i had better reason, but it make sense to me that if user explicitly set it we should comply

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so IMAGE_NAME is a kind of hidden API that we don't want to document for users but we still want to be usable, correct ? It would be very nice to document this in some document for developers at least, or even comments in the yaml files, because the questions in this PR clearly show that it isn't obvious.

dnf install -y make git unzip
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
Expand Down
12 changes: 6 additions & 6 deletions config/peerpods/podvm/aws-VM-image-delete-job.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -26,13 +26,13 @@ spec:
env:
- name: PODVM_DISTRO
value: rhel
- name: IMAGE_NAME
value: "peer-pod-ami"
command:
- /bin/sh
- -c
- |
set -e
[[ ! "${PODVM_AMI_ID}" ]] && echo "PODVM_AMI_ID is missing, it's unknown which image to delete" && exit 1
[[ "${IMAGE_NAME}" ]] && echo "IMAGE_NAME:${IMAGE_NAME} is set, it implies image was not automatically created, delete it manually" && exit 0
Comment on lines +34 to +35
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to make sure it is clear in docs as to what happens with manually created images and that user is responsible for deleting them as the message states.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is custom images supported ATM? AFAIK the reason it's supported is for tech/dev-preview features only

dnf install -y unzip
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
Expand All @@ -41,8 +41,8 @@ spec:
export MAC=$(curl -m 3 -s http://169.254.169.254/latest/meta-data/mac)
[[ ! "${AWS_REGION}" ]] && export AWS_REGION=$(curl -m 30 -s --show-error http://169.254.169.254/latest/meta-data/placement/region)
[[ ! "${AWS_REGION}" ]] && echo "AWS_REGION is missing" && exit 1
export IMAGE_NAME=${IMAGE_NAME:-peer-pod-ami}
PODVM_AMI_ID=$(aws ec2 describe-images --query "Images[*].[ImageId]" --filters "Name=name,Values=${IMAGE_NAME}" --region ${AWS_REGION} --output text) && \
echo "Deleting AMI: ${PODVM_AMI_ID}"
aws ec2 deregister-image --image-id ${PODVM_AMI_ID} --region ${AWS_REGION}
echo "Deleting AMI: ${PODVM_AMI_ID} - DONE"
RES=$(aws ec2 deregister-image --image-id "${PODVM_AMI_ID}" --region "${AWS_REGION}" 2>&1) || ERR=true
echo ${RES}
[[ ${ERR} ]] && [[ "$RES" =~ InvalidAMIID\.(Unavailable|NotFound) ]] # if deregister returned error and image is already deleted, continue
echo "Deleted AMI: ${PODVM_AMI_ID} - DONE"
Comment on lines +45 to +48
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are users supposed to see the error message in the output of the job? Is that captured in a log anywhere? Also a little weird that we say the AMI is deleted when it didn't exist. I feel like there is an "if" statement missing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible to see the error in the job logs, however, in this case the error means image is already deleted and there's nothing else to do which is fine, would it make more sense to change the wording to something like: "image deletion job completed"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are users supposed to see the error message in the output of the job?

Users can see these logs with oc logs but they aren't supposed to look at them. This is an internal job.

Is that captured in a log anywhere?

This should probably be captured by the OSC must-gather if it isn't the case already.

It's possible to see the error in the job logs, however, in this case the error means image is already deleted and there's nothing else to do which is fine, would it make more sense to change the wording to something like: "image deletion job completed"?

IIUC the only case where we get the InvalidAMIID\.(Unavailable|NotFound) error is when someone deleted the image automatically created by us in our back, right ? Assuming this is fine, then the job just needs to be idempotent and report something like "PODVM_AMI_ID: \"$PODVM_AMI_ID\" is deleted".

5 changes: 3 additions & 2 deletions config/peerpods/podvm/azure-VM-image-create-job.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -50,8 +50,8 @@ spec:
value: "rhel-raw"
- name: SKU
value: "9_2"
- name: IMAGE_NAME
value: "peer-pod-vmimage"
# - name: IMAGE_NAME
# value: "azure-podvm-image-name" # set custom image name for custom image if you wish to avoid its deletion
envFrom:
- secretRef:
name: peer-pods-secret
Expand All @@ -63,6 +63,7 @@ spec:
- -c
- |
set -e
[[ ! "${IMAGE_NAME}" ]] && UUID=$(uuidgen) && export IMAGE_NAME="peer-pod-vmimage-${UUID::6}" && echo "IMAGE_NAME:${IMAGE_NAME}"
rpm --import https://packages.microsoft.com/keys/microsoft.asc
dnf install -y https://packages.microsoft.com/config/rhel/9.0/packages-microsoft-prod.rpm
dnf install -y make git azure-cli unzip
Expand Down
6 changes: 3 additions & 3 deletions config/peerpods/podvm/azure-VM-image-delete-job.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -17,8 +17,6 @@ spec:
securityContext:
runAsUser: 0 # needed for container mode dnf access
env:
- name: IMAGE_NAME
value: "peer-pod-vmimage"
envFrom:
- secretRef:
name: peer-pods-secret
Expand All @@ -30,6 +28,8 @@ spec:
- -c
- |
set -e
[[ ! "${AZURE_IMAGE_ID}" ]] && echo "AZURE_IMAGE_ID is missing, it's unknown which image should be deleted" && exit 1
[[ "${IMAGE_NAME}" ]] && echo "IMAGE_NAME:${IMAGE_NAME} is set, it implies image was not automatically created, delete it manually " && exit 0
rpm --import https://packages.microsoft.com/keys/microsoft.asc
dnf install -y https://packages.microsoft.com/config/rhel/9.0/packages-microsoft-prod.rpm
yum install -y azure-cli
Expand All @@ -41,6 +41,6 @@ spec:
[[ ! "${AZURE_CLIENT_SECRET}" ]] && echo "AZURE_CLIENT_SECRET is missing" && exit 1
[[ ! "${AZURE_CLIENT_ID}" ]] && echo "AZURE_CLIENT_ID is missing" && exit 1
az login --service-principal --user=${AZURE_CLIENT_ID} --password=${AZURE_CLIENT_SECRET} --tenant=${AZURE_TENANT_ID} && \
az image delete --image-name "${IMAGE_NAME}" --resource-group "${AZURE_RESOURCE_GROUP}" && \
az image delete --ids "${AZURE_IMAGE_ID}" && \
echo "DONE"
restartPolicy: Never