[AWS] CI/CD pipelining all the way through AWS and errors #78

Closed
ghkdqhrbals opened this issue Sep 23, 2023 · 7 comments · Fixed by #100 or #101
Labels
status: help! Help to solving this problem status: question Further information is requested theme: documentation Improvements or additions to documentation type: bug Something isn't working

Comments

@ghkdqhrbals
Owner

ghkdqhrbals commented Sep 23, 2023

Description

The CI/CD run fails with a broken pipe when deploying through GitHub Actions workflows. See the failing run: https://github.com/ghkdqhrbals/spring-chatting-server/actions/runs/6282696603

  • deploy.yaml
name: Deploy to EC2 with Docker Compose

on:
  push:
    branches:
      - main
  workflow_run:
    workflows: ["Build & Test"]
    types:
      - completed

jobs:
  deploy:
    runs-on: ubuntu-latest
    if: ${{ github.event.workflow_run.conclusion == 'success' }}
    steps:
      - name: Checkout code
        uses: actions/checkout@v2
        with:
          fetch-depth: 0  # get all history so we can checkout any branch
      - name: Get latest tag
        id: latesttag
        run: |
          LATEST_TAG=$(git describe --tags --abbrev=0)
          echo "LATEST_TAG=$LATEST_TAG" >> $GITHUB_ENV 

      # Increment the version number (e.g. 5.0.1 -> 5.0.2)
      # PR title contains "[patch]" -> 5.0.1 -> 5.0.2
      # PR title contains "[minor]" -> 5.0.1 -> 5.1.0
      # PR title contains "[major]" -> 5.0.1 -> 6.0.0
      - name: Increment version based on PR title
        id: increment_version
        env:
          # Pass the title through the environment so spaces or quotes in it cannot break the script
          PR_TITLE: ${{ github.event.pull_request.title }}
        run: |
          current_version=${LATEST_TAG#"v"}
          IFS='.' read -ra version_parts <<< "$current_version"
          
          major=${version_parts[0]}
          minor=${version_parts[1]}
          patch=${version_parts[2]}
          
          pr_title="$PR_TITLE"
          
          if [[ $pr_title == *"[major]"* ]]; then
            major=$(( major + 1 ))
            minor=0
            patch=0
          elif [[ $pr_title == *"[minor]"* ]]; then
            minor=$(( minor + 1 ))
            patch=0
          else
            patch=$(( patch + 1 ))
          fi
          
          new_version="$major.$minor.$patch"
          echo "NEW_VERSION=$new_version" >> $GITHUB_ENV

      - name: Create and push new tag
        run: |
          git config --global user.name 'GitHub Actions'
          git config --global user.email 'actions@github.com'
          git tag v${NEW_VERSION}
          git push origin v${NEW_VERSION}

      - name: Deploy to EC2
        env:
          PRIVATE_KEY: ${{ secrets.EC2_SSH_PRIVATE_KEY }}
          EC2_URL: ${{ secrets.EC2_URL }}
          NEW_VERSION: ${{ env.NEW_VERSION }}
        run: |
          echo "$PRIVATE_KEY" > temp_key.pem
          chmod 600 temp_key.pem
          ssh -o StrictHostKeyChecking=no -i temp_key.pem ${EC2_URL} << EOF
            cd spring-chatting-server
            git pull origin main
            sh run.sh ${NEW_VERSION}
          EOF
          rm temp_key.pem

Here is the CPU usage of the EC2 instance. It hits 100% 😂😂😂😂

image

Assumption

Maybe it's because the connection to EC2 stays open for so long. The deployment builds multiple Gradle projects, downloads Docker images, rebuilds them, runs 12 containers, and so on, so the whole process takes a long time. That may lead EC2 to cut off the connection.

Is there a way to keep the connection between EC2 and GitHub Actions alive?
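
One option that may be worth trying, assuming the pipe breaks because the session stays quiet for too long, is to have the ssh client send keepalive probes. Below is a minimal sketch. The helper name ssh_keepalive_opts and the 60-second / 10-probe defaults are made up for illustration; ServerAliveInterval and ServerAliveCountMax are standard OpenSSH client options.

```shell
# Hypothetical helper: prints ssh client options that enable keepalive probes.
#   $1 = seconds between probes (default 60, an arbitrary choice)
#   $2 = unanswered probes tolerated before ssh gives up (default 10, also arbitrary)
ssh_keepalive_opts() {
  local interval="${1:-60}" max_missed="${2:-10}"
  printf '%s\n' "-o ServerAliveInterval=${interval} -o ServerAliveCountMax=${max_missed}"
}

# Possible use in the "Deploy to EC2" step (left unquoted on purpose so the
# options are split into separate arguments):
#   ssh $(ssh_keepalive_opts 60 10) -o StrictHostKeyChecking=no -i temp_key.pem "${EC2_URL}" ...
```

This only helps if an idle connection is really the cause. It will not help if the instance itself stops responding because its CPU is pinned at 100%.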

@ghkdqhrbals ghkdqhrbals added type: bug Something isn't working status: question Further information is requested status: help! Help to solving this problem labels Sep 23, 2023
@ghkdqhrbals
Owner Author

To handle this, I'm moving 1) the Gradle build and 2) the Docker image build into GitHub Actions workflows, so they no longer run on the EC2 instance.

As part of this CI/CD pipeline, I also built an automated Git tagging system.

When the deployment process begins:

  1. The deployment workflow gets the latest tag version
  2. It removes the v prefix and splits the version into major, minor and patch
  3. It bumps the version according to the PR title and pushes the new tag to GitHub

So if v5.0.1 is the latest tag and the PR title contains [patch], the workflow sets the new version to v5.0.2 in its environment and pushes that tag to GitHub. If v5.0.2 already exists, the commit hash is appended to the new tag, giving something like v5.0.2-abf2154.
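
The fallback for an already-existing tag could look roughly like the sketch below. The function name resolve_tag and its three-argument interface are hypothetical and are not taken from the repository; passing the existing tags in as plain text just makes the logic easy to check without a Git checkout.

```shell
# Hypothetical helper: pick a tag name that does not collide with an existing one.
#   $1 = candidate tag, e.g. v5.0.2
#   $2 = short commit hash, e.g. abf2154
#   $3 = existing tags, one per line (e.g. the output of `git tag --list`)
resolve_tag() {
  local candidate="$1" short_hash="$2" existing="$3"
  # -x matches whole lines only, -F treats the tag as a literal string
  if printf '%s\n' "$existing" | grep -qxF -- "$candidate"; then
    printf '%s-%s\n' "$candidate" "$short_hash"
  else
    printf '%s\n' "$candidate"
  fi
}

# In the workflow this might be called as:
#   resolve_tag "v${NEW_VERSION}" "$(git rev-parse --short HEAD)" "$(git tag --list)"
```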

  4. Run ./gradlew build --build-cache --parallel -Pversion=${new version}, which builds the .jar files and appends ${new version} to the end of each .jar file name. It also generates the Dockerfiles that package those .jar files.

If v5.0.2 is the new version tag, Gradle reads it and builds .jar files whose names end in v5.0.2.jar. After building the .jar, the Gradle scripts create a Dockerfile that copies the matching .jar file (e.g. COPY /build/libs/shop-user-service-v5.0.2.jar /null/app.jar)

  5. Build all the Dockerfiles with Docker Compose and push the resulting images to ECR

ECR is quite expensive if you want to create multiple repositories. So I created only one repository and push every image to it, tagged as ${service_name}_${new_version} to tell them apart.

  6. Connect to EC2 and pull all images from ECR

When pulling, we list all image tags and filter on the version suffix. For example, say ECR has A-service_v5.0.2, A-service_v5.0.1, B-service_v5.0.2 and C-service_v5.0.2. Split each tag on the _ delimiter, keep the ones whose second field is exactly v5.0.2, and create the new tags in a for loop.

  7. Run docker-compose-prod.yaml in the background

Since I run 12 Docker containers on a t2.micro instance, it can easily be overloaded. I'm considering upgrading the EC2 instance, scaling horizontally, or moving to EKS to run multiple servers.
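
The tag filtering described for the ECR pull step can be sketched in plain shell, without calling AWS at all. The function name select_images is hypothetical; the sample tags are the ones from the example above. Comparing the part after the underscore for equality, rather than doing a substring match, means v5.0.1 would not accidentally select a tag such as A-service_v5.0.10.

```shell
# Hypothetical helper: reads ECR image tags on stdin (one per line) and prints
# "<name>:<version>" for every tag whose part after "_" equals the wanted version.
select_images() {
  local wanted="$1" tag name version
  while IFS= read -r tag; do
    name="${tag%%_*}"      # text before the first "_", e.g. A-service
    version="${tag#*_}"    # text after the first "_", e.g. v5.0.2
    if [ "$version" = "$wanted" ]; then
      printf '%s:%s\n' "$name" "$version"
    fi
  done
}

printf '%s\n' A-service_v5.0.2 A-service_v5.0.1 B-service_v5.0.2 C-service_v5.0.2 |
  select_images v5.0.2
# prints:
#   A-service:v5.0.2
#   B-service:v5.0.2
#   C-service:v5.0.2
```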

@ghkdqhrbals ghkdqhrbals pinned this issue Sep 24, 2023
ghkdqhrbals added a commit that referenced this issue Sep 24, 2023
[patch] #78 add regex for pulling image from ECR
@ghkdqhrbals
Owner Author

ghkdqhrbals commented Sep 24, 2023

Gradle doesn't create a new Dockerfile because of the --build-cache option

In CI/CD step 4, the Dockerfile for the new version is not generated, so no new image is pushed to ECR.

  4. Run ./gradlew build --build-cache --parallel -Pversion=${new version}, which builds the .jar files and appends ${new version} to the end of each .jar file name. It also generates the Dockerfiles that package those .jar files.
image

As you can see, chatting has a v5.0.9 image but api-gateway only has v5.0.8.

To fix this issue, we have to remove the --build-cache option temporarily.

@ghkdqhrbals ghkdqhrbals added the theme: documentation Improvements or additions to documentation label Sep 24, 2023
ghkdqhrbals added a commit that referenced this issue Sep 24, 2023
ghkdqhrbals added a commit that referenced this issue Sep 24, 2023
@ghkdqhrbals
Owner Author

Gradle cache problem solved

After removing the Gradle build cache, the workflow successfully pushes the newly tagged images to ECR. But now EC2 cannot read any of those images. 😂😂

Originally I ran the run.sh bash script below on EC2 to pull all the images.

#!/bin/bash

REPOSITORY_URL=${1}
TAG=${2}

echo "0. ECR Login";
# Log in to ECR
aws ecr get-login-password --region ap-northeast-2 | docker login --username AWS --password-stdin ${REPOSITORY_URL}

echo "1. Pull and re-tag, re-name with $TAG";
# Fetch from ECR every image tag that matches the new version in $TAG.
images_to_pull=$(aws ecr list-images --repository-name chat --filter "tagStatus=TAGGED" --query "imageIds[?contains(imageTag, '=${TAG}')].imageTag" --output text)
echo "images_to_pull: $images_to_pull"

# Loop over the image list, pull each image, then set its new tag.
for image in $images_to_pull; do
  docker pull $REPOSITORY_URL:$image

  # Split the tag on "_".
  image_name=$(echo $image | cut -d'_' -f1)
  new_tag=$(echo $image | cut -d'_' -f2)

  # Re-tag the image with the new image name and tag.
  docker tag $REPOSITORY_URL:$image $image_name:$new_tag
done

echo "2. Remove Dangling Docker Images";

sh remove_dangling_image.sh

echo "3. Run Server and DB Container";

docker compose -f docker-compose-prod.yaml up -d

Checking each step, I found that images_to_pull=$(aws ecr list-images --repository-name chat --filter "tagStatus=TAGGED" --query "imageIds[?end-with(imageTag, '=${TAG}')].imageTag" --output text) doesn't return any tagged images from ECR.

The expression imageIds[?end-with(imageTag, '=${TAG}')].imageTag is wrong: the JMESPath built-in is spelled ends_with, and none of the tags contain the stray =. So I changed it to "imageIds[?contains(imageTag, '${TAG}')].imageTag", which finds tagged images containing ${TAG}.

After changing end-with to contains and removing the =, it successfully pulls all the images, as shown below! 👍

image

There are still things to fix, like removing the user-server_ and customer-server_ prefixes from the tag!

@ghkdqhrbals
Owner Author

Untagging ECR pulled images

Adding the lines below finally gives us unified tags!

...
  # Remove the original tag that was pulled from ECR
  docker rmi $REPOSITORY_URL:$image
...
  • Now we can pull the Docker images successfully and automatically!
image

This is the full run.sh script, which pulls the images from ECR, re-tags and renames them, removes orphaned images, and brings up all the servers with Docker Compose.

Full run.sh script

#!/bin/bash

REPOSITORY_URL=${1}
TAG=${2}

echo "0. ECR Login";
# Log in to ECR
aws ecr get-login-password --region ap-northeast-2 | docker login --username AWS --password-stdin ${REPOSITORY_URL}

echo "1. Pull and re-tag, re-name with $TAG";
# Fetch from ECR every image tag that contains $TAG.
images_to_pull=$(aws ecr list-images --repository-name chat --filter "tagStatus=TAGGED" --query "imageIds[?contains(imageTag, '${TAG}')].imageTag" --output text)
echo "images_to_pull: $images_to_pull"

# Loop over the image list, pull each image, then set its new tag.
for image in $images_to_pull; do
  docker pull $REPOSITORY_URL:$image

  # Split the tag on "_".
  image_name=$(echo $image | cut -d'_' -f1)
  new_tag=$(echo $image | cut -d'_' -f2)

  # Re-tag the image with the new image name and tag.
  docker tag $REPOSITORY_URL:$image $image_name:$new_tag

  # Remove the image reference that still carries the original ECR tag.
  docker rmi $REPOSITORY_URL:$image
done

echo "2. Remove Dangling Docker Images";

sh remove_dangling_image.sh

echo "3. Run Server and DB Container";

docker compose -f docker-compose-prod.yaml up -d

@ghkdqhrbals
Owner Author

Add Docker image versioning on the production server

If an image with the same name already has the latest tag, re-tag it as old and tag the newly pulled image as latest.

# Check whether an image with this tag already exists.
if docker inspect $image_name:latest > /dev/null 2>&1; then
  # If the tag already exists, re-tag that image as "old". We could keep more versions, but because of disk space we currently keep only the latest and the previous one.
  docker tag $image_name:latest $image_name:old
fi

# Re-tag the image with the new image name and tag.
docker tag $REPOSITORY_URL:$image $image_name:latest

# Remove the image reference that still carries the original ECR tag.
docker rmi $REPOSITORY_URL:$image
image image

@ghkdqhrbals
Owner Author

Permission Error in deployment workflow

Error response from daemon: pull access denied for configuration-server, repository does not exist or may require 'docker login': denied: requested access to the resource is denied

It runs successfully locally, but not in GitHub Actions.

  • Local Docker container list
gyuminhwangbo@Gyuminui-MacBookPro spring-chatting-server % docker ps
CONTAINER ID   IMAGE                             COMMAND                  CREATED             STATUS             PORTS                                                                                                         NAMES
40d27718ffc1   api-server:latest                 "/bin/sh -c 'java -j…"   11 minutes ago      Up 11 minutes      0.0.0.0:8000->8000/tcp                                                                                        api-server
f59a42a32c04   chatting-server:latest            "/bin/sh -c 'java -j…"   11 minutes ago      Up 11 minutes      0.0.0.0:8030->8030/tcp                                                                                        chatting-server
6d3ed5434a11   customer-server:latest            "/bin/sh -c 'java -j…"   11 minutes ago      Up 1 second        0.0.0.0:8020->8020/tcp                                                                                        customer-server
fdbc6ddec3a8   user-server:latest                "/bin/sh -c 'java -j…"   11 minutes ago      Up 19 seconds      0.0.0.0:8010->8010/tcp                                                                                        user-server
7fb667a6c854   configuration-server:latest       "/bin/sh -c 'java -j…"   12 minutes ago      Up 11 minutes      0.0.0.0:8888->8888/tcp                                                                                        configuration-server
bd2da1b58998   discovery-server:latest           "/bin/sh -c 'java -j…"   12 minutes ago      Up 11 minutes      0.0.0.0:8761->8761/tcp                                                                                        discovery-server
f227fd4bded5   obsidiandynamics/kafdrop          "/kafdrop.sh"            About an hour ago   Up About an hour   0.0.0.0:9010->9000/tcp                                                                                        spring-chatting-server-kafdrop-1
3981506d5dfc   confluentinc/cp-kafka:7.2.1       "/etc/confluent/dock…"   About an hour ago   Up About an hour   0.0.0.0:8097->8097/tcp, 9092/tcp                                                                              kafka1
b4c5e583aaea   rabbitmq:3-management-alpine      "docker-entrypoint.s…"   About an hour ago   Up About an hour   4369/tcp, 5671/tcp, 0.0.0.0:5672->5672/tcp, 15671/tcp, 15691-15692/tcp, 25672/tcp, 0.0.0.0:15672->15672/tcp   rabbitmq
0a1d504b7355   redis:latest                      "docker-entrypoint.s…"   About an hour ago   Up About an hour   0.0.0.0:6379->6379/tcp                                                                                        user-redis
b92614e7c578   confluentinc/cp-zookeeper:7.2.1   "/etc/confluent/dock…"   About an hour ago   Up About an hour   2181/tcp, 2888/tcp, 3888/tcp                                                                                  zookeeper
570e148d7545   postgres:12-alpine                "docker-entrypoint.s…"   About an hour ago   Up About an hour   5432/tcp, 0.0.0.0:5435->5435/tcp                                                                              user-db
7965d672885f   postgres:12-alpine                "docker-entrypoint.s…"   About an hour ago   Up About an hour   5432/tcp, 0.0.0.0:5434->5434/tcp                                                                              customer-db
4762b63646d8   postgres:12-alpine                "docker-entrypoint.s…"   About an hour ago   Up About an hour   5432/tcp, 0.0.0.0:5433->5433/tcp                                                                              chat-db

It's because of an ECR login permission problem. So we have to set AWS_ACCESS_ID and PW using awscli inside EC2.

@ghkdqhrbals ghkdqhrbals self-assigned this Sep 25, 2023
@ghkdqhrbals ghkdqhrbals changed the title [AWS] Broken pipe when running CI/CD with Git workflows [AWS] CI/CD pipelining all the way through AWS and bugs Sep 25, 2023
@ghkdqhrbals ghkdqhrbals changed the title [AWS] CI/CD pipelining all the way through AWS and bugs [AWS] CI/CD pipelining all the way through AWS and errors Sep 25, 2023
@ghkdqhrbals ghkdqhrbals reopened this Sep 28, 2023
@ghkdqhrbals
Owner Author

Description

Again, even though the EC2 instance has ECRFullAccessRole attached, it still cannot log in to Docker 😂 and fails with the error message below

Error response from daemon: pull access denied for chatting-server, repository does not exist or may require 'docker login': denied: requested access to the resource is denied

ghkdqhrbals added a commit that referenced this issue Sep 28, 2023
[fix] #78 reconfiguring push_to_ecr.sh
@ghkdqhrbals2 ghkdqhrbals2 added this to the CI/CD milestone Oct 2, 2023
@ghkdqhrbals ghkdqhrbals unpinned this issue Oct 30, 2023
2 participants