### AWS Configuration Instructions:

1. **Create Root User (Skip if already done)**:
   - Only use the AWS root account to set up the initial environment.
   - To create a root account (if necessary):
     - Go to [AWS Account](https://aws.amazon.com/resources/create-account/) and sign up for a root user.

2. **Set Up an Admin User and IAM Role**:
   - Create an admin user with enough permissions to create roles.
   
   **Steps**:
   - Navigate to **IAM** in the AWS Console.
   - Click on **Users** > **Add User**.
   - Create a user (e.g., `admin-user`) with **Programmatic access** and attach the **AdministratorAccess** policy.
   - Download the **Access Key ID** and **Secret Access Key** for AWS CLI configuration.

   This user will be used to create roles and users.

3. **AWS CLI Configuration**:
   - Configure the AWS CLI for both the admin and developer users:

     ```bash
     aws configure
     ```
   - Enter the **Access Key ID**, **Secret Access Key**, default region (e.g., `eu-north-1`), and output format (e.g., `json`).
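
`aws configure` writes the keys to an INI-style file, by default `~/.aws/credentials`. As a quick sanity check, the following minimal sketch confirms that a profile with both key fields was written. The helper name `has_profile` is hypothetical and not part of this project.

```python
import configparser


def has_profile(credentials_path, profile="default"):
    """Return True if the profile exists and contains both key fields."""
    parser = configparser.ConfigParser()
    # read() skips files that do not exist, leaving the parser empty
    parser.read(credentials_path)
    if not parser.has_section(profile):
        return False
    section = parser[profile]
    return "aws_access_key_id" in section and "aws_secret_access_key" in section


# Example call (path assumed to be the CLI default location):
# from pathlib import Path
# has_profile(Path.home() / ".aws" / "credentials")
```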


In [14]:
## Uncomment and install libs
# %pip install --upgrade pip
# %pip install pandas==2.2.2
# %pip install xgboost-cpu==2.1.1
# %pip install scikit-learn==1.5.1
# %pip install numpy==1.24.4
# %pip install ipytest==0.14.2
# %pip install python-dotenv==1.0.1
# %pip install s3fs

In [1]:
import os
import logging
import time

%load_ext dotenv
%dotenv

# Configure the same console logging for each manager's logger.
# (Previously only the last logger, 'AWSClientManager', received a handler.)
formatter = logging.Formatter('%(name)s - %(levelname)s - %(message)s')

for logger_name in ('RoleManager', 'EcrManager', 'AWSClientManager'):
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.INFO)

    # Attach a handler only once, so re-running the cell does not duplicate output
    if not logger.hasHandlers():
        console_handler = logging.StreamHandler()  # Log to console
        console_handler.setLevel(logging.DEBUG)  # Let the handler pass everything the logger emits
        console_handler.setFormatter(formatter)
        logger.addHandler(console_handler)

In [2]:
from IAMUserManger import IAMUserManager

from AWSClientManager import AWSClientManager
from RoleManager import RoleManager

logger = logging.getLogger('RoleManager')
logger.setLevel(logging.DEBUG)

account_id = os.environ["ACCOUNT_ID"]
region = os.environ["AWS_REGION"]
user_name = os.environ["USER_NAME"]
role_name = os.environ["ROLE_NAME"]
policy_name = os.environ["POLICY_NAME"]
bucket = os.environ["BUCKET"]
access_key = os.environ["ACCESS_KEY"]
secret_key = os.environ["SECRET_KEY"]
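
Reading `os.environ[...]` raises `KeyError` at the first missing variable, so an incomplete `.env` file surfaces one problem at a time. The following sketch collects every problem in one pass. The helper name `find_config_problems` is hypothetical. It assumes `ACCESS_KEY` and `SECRET_KEY` may be empty, because a later cell generates them when `access_key == ''`.

```python
# Variables that must be set to a non-empty value
REQUIRED = ["ACCOUNT_ID", "AWS_REGION", "USER_NAME", "ROLE_NAME", "POLICY_NAME", "BUCKET"]
# Variables that must be present but may be empty until a key is generated
MAY_BE_EMPTY = ["ACCESS_KEY", "SECRET_KEY"]


def find_config_problems(env):
    """Return a list of human-readable problems found in the mapping `env`."""
    problems = []
    for name in REQUIRED + MAY_BE_EMPTY:
        if name not in env:
            problems.append(f"{name} is not set")
        elif name in REQUIRED and not env[name].strip():
            problems.append(f"{name} is empty")
    return problems


# Example call against the real environment:
# import os
# for problem in find_config_problems(os.environ):
#     print(problem)
```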

In [3]:
iam_user_manager = IAMUserManager()
iam_user_manager.create_user(user_name)

time.sleep(5)

role_service = RoleManager(account_id, user_name)
role_arn = role_service.create_role_and_policy(role_name, policy_name, region)
time.sleep(5)

User betonowydzik already exists.


Role "football-project-execution-role" already exists.
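
The fixed `time.sleep(5)` calls above presumably give IAM time to propagate the new user and role. A polling helper with a timeout is one alternative: it returns as soon as the resource is visible and fails clearly if it never appears. This is a generic sketch, not part of `RoleManager`; the name `wait_until` and its parameters are hypothetical.

```python
import time


def wait_until(check, timeout=30.0, initial_delay=0.5, max_delay=5.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call check() until it returns a truthy value or the timeout expires.

    The delay between attempts doubles each time, capped at max_delay.
    Returns True on success and False on timeout.
    """
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(delay)
        delay = min(delay * 2, max_delay)


# Example call (role_is_visible is a hypothetical callable returning a bool):
# if not wait_until(role_is_visible, timeout=60):
#     raise TimeoutError("Role did not become visible in time")
```

boto3 also ships IAM waiters such as `iam_client.get_waiter('role_exists')`, which cover the same need for the resource types they support.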


In [4]:
if access_key == '':
    iam_user_manager.attach_inline_policy(user_name, role_arn)
    access_key_info = iam_user_manager.create_access_key(user_name)

    # Check the result before indexing into it
    if access_key_info:
        access_key = access_key_info['AccessKeyId']
        secret_key = access_key_info['SecretAccessKey']

        print(f"Access Key ID: {access_key}")
        print(f"Secret Access Key: {secret_key}")

Copy the printed values into the `.env` file under the `ACCESS_KEY` and `SECRET_KEY` keys so that later runs reuse them.
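
The cell above calls `attach_inline_policy(user_name, role_arn)`, whose implementation is not shown here. A plausible reading is that it lets the user assume the execution role. Under that assumption, the policy document could look like the sketch below; the function name `assume_role_policy_document` is hypothetical.

```python
import json


def assume_role_policy_document(role_arn):
    """Build an IAM policy (as JSON text) that allows assuming one role."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                "Resource": role_arn,
            }
        ],
    }
    return json.dumps(policy)


# Example call with a boto3 IAM client (iam_client and the policy name
# "allow-assume-role" are assumptions for illustration):
# iam_client.put_user_policy(
#     UserName=user_name,
#     PolicyName="allow-assume-role",
#     PolicyDocument=assume_role_policy_document(role_arn),
# )
```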

In [5]:
aws_client_manager = AWSClientManager(region=region, access_key_id=access_key, secret_access_key=secret_key, account_id=account_id)

In [6]:
from EcrManager import EcrManager

ecr_client = aws_client_manager.get_client('ecr', role_name)
ecr_manager = EcrManager(ecr_client)

AWSClientManager - INFO - Attempting to assume role: arn:aws:iam::284415450706:role/football-project-execution-role
AWSClientManager - INFO - Assumed role football-project-execution-role successfully.
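
The log lines above show `get_client` assuming the execution role before it returns a client. The `AWSClientManager` source is not included here, so the following is only a sketch of how such a step is commonly done with STS, not the project's implementation. The function names and the session name `notebook-session` are hypothetical.

```python
def role_arn_for(account_id, role_name):
    """Build the role ARN in the format shown in the log output."""
    return f"arn:aws:iam::{account_id}:role/{role_name}"


def assume_role_credentials(sts_client, account_id, role_name,
                            session_name="notebook-session"):
    """Assume the role and return temporary credentials as boto3 client kwargs."""
    response = sts_client.assume_role(
        RoleArn=role_arn_for(account_id, role_name),
        RoleSessionName=session_name,
    )
    creds = response["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }


# Example call (requires boto3 and valid base credentials):
# import boto3
# sts = boto3.client("sts", region_name=region)
# ecr = boto3.client("ecr", region_name=region,
#                    **assume_role_credentials(sts, account_id, role_name))
```

The temporary credentials expire, so a long-running notebook may need to request a fresh client.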


In [7]:
processor_image_name = 'sagemaker-processing-container'
train_image_name = 'xgb-clf-training-container'

time.sleep(5)

processor_repository = ecr_manager.create_repository(processor_image_name)
train_repository = ecr_manager.create_repository(train_image_name)

time.sleep(5)

ecr_manager.put_lifecycle_policy(processor_image_name)
ecr_manager.put_lifecycle_policy(train_image_name)
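
The policy that `EcrManager.put_lifecycle_policy` applies is not shown in this notebook. For orientation, an ECR lifecycle policy is a JSON document with a list of rules. The sketch below builds one that expires all but the most recent images; the function name and the default of 5 images are assumptions for illustration.

```python
import json


def keep_last_n_images_policy(max_images=5):
    """Build ECR lifecycle policy text that expires all but the newest images."""
    policy = {
        "rules": [
            {
                "rulePriority": 1,
                "description": f"Keep only the {max_images} most recent images",
                "selection": {
                    "tagStatus": "any",
                    "countType": "imageCountMoreThan",
                    "countNumber": max_images,
                },
                "action": {"type": "expire"},
            }
        ]
    }
    return json.dumps(policy)


# Example call with the boto3 ECR client created earlier:
# ecr_client.put_lifecycle_policy(
#     repositoryName=processor_image_name,
#     lifecyclePolicyText=keep_last_n_images_policy(),
# )
```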

In [9]:
import base64
import subprocess

auth_data = ecr_client.get_authorization_token()['authorizationData'][0]
auth_token = auth_data['authorizationToken']

# The token decodes to "username:password"; split once in case the password contains ':'
username_password = base64.b64decode(auth_token).decode('utf-8')
username, password = username_password.split(':', 1)
registry_uri = auth_data['proxyEndpoint']

# Pass the password on stdin so it does not appear in the process list or shell history
auth_command = ["docker", "login", "--username", username, "--password-stdin", registry_uri]
result = subprocess.run(auth_command, input=password, check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
print(f"Command executed successfully. Output:\n{result.stdout}")


Command executed successfully. Output:
Login Succeeded



In [13]:
print("Docker authenticated successfully to ECR.")
tag = ':latest'
processor_image_uri = '{}.dkr.ecr.{}.amazonaws.com/{}'.format(account_id, region, processor_image_name + tag)
print(f'Processor image name: {processor_image_uri}.')

!docker build -t $processor_image_uri ../containers/preprocessor/docker
push_command = f"docker push {processor_image_uri}"
subprocess.run(push_command, shell=True, check=True)

print(f"Docker image pushed to ECR: {processor_image_uri}")


Docker authenticated successfully to ECR.
Processor image name: 284415450706.dkr.ecr.eu-north-1.amazonaws.com/sagemaker-processing-container:latest.


#0 building with "default" instance using docker driver

#1 [internal] load build definition from Dockerfile
#1 transferring dockerfile: 222B done
#1 DONE 0.0s

#2 [internal] load metadata for docker.io/library/python:3.10-slim
#2 ...

#3 [auth] library/python:pull token for registry-1.docker.io
#3 DONE 0.0s

#2 [internal] load metadata for docker.io/library/python:3.10-slim
#2 DONE 1.1s

#4 [internal] load .dockerignore
#4 transferring context: 2B done
#4 DONE 0.0s

#5 [1/4] FROM docker.io/library/python:3.10-slim@sha256:eb9ca77b1a0ffbde84c1dc333beb3490a2638813cc25a339f8575668855b9ff1
#5 DONE 0.0s

#6 [internal] load build context
#6 transferring context: 92B done
#6 DONE 0.0s

#7 [3/4] RUN pip install --user --upgrade pip
#7 CACHED

#8 [2/4] COPY requirements.txt .
#8 CACHED

#9 [4/4] RUN pip3 install -r requirements.txt
#9 CACHED

#10 exporting to image
#10 exporting layers done
#10 writing image sha256:b8bca269989bf005f9535cce0be0f840aa44dedeec1c6c41b3c1e78588229862 done
#10 naming

Docker image pushed to ECR: 284415450706.dkr.ecr.eu-north-1.amazonaws.com/sagemaker-processing-container:latest


In [10]:
tag = ':latest'
train_image_uri = '{}.dkr.ecr.{}.amazonaws.com/{}'.format(account_id, region, train_image_name + tag)
print(f'Train image name: {train_image_uri}.')

!docker build -t $train_image_uri ../containers/training/docker
push_command = f"docker push {train_image_uri}"
subprocess.run(push_command, shell=True, check=True)
print(f"Docker image pushed to ECR: {train_image_uri}")

Train image name: 284415450706.dkr.ecr.eu-north-1.amazonaws.com/xgb-clf-training-container:latest.


#0 building with "default" instance using docker driver

#1 [internal] load build definition from Dockerfile
#1 transferring dockerfile: 386B done
#1 DONE 0.0s

#2 [internal] load metadata for docker.io/library/python:3.10-slim
#2 DONE 0.7s

#3 [internal] load .dockerignore
#3 transferring context: 2B done
#3 DONE 0.0s

#4 [1/6] FROM docker.io/library/python:3.10-slim@sha256:af6f1b19eae3400ea3a569ba92d4819a527be4662971d51bb798c923bba30a81
#4 DONE 0.0s

#5 [internal] load build context
#5 transferring context: 65B done
#5 DONE 0.0s

#6 [3/6] COPY requirements.txt .
#6 CACHED

#7 [4/6] RUN pip install --user --upgrade pip
#7 CACHED

#8 [5/6] RUN pip3 install -r requirements.txt
#8 CACHED

#9 [2/6] RUN apt-get -y update && apt-get install -y --no-install-recommends     python3     build-essential     libssl-dev
#9 CACHED

#10 [6/6] COPY ../train.py /opt/ml/code/train.py
#10 CACHED

#11 exporting to image
#11 exporting layers done
#11 writing image sha256:6e88f867fc154803f77dc89f00b5dbfb0f

Docker image pushed to ECR: 284415450706.dkr.ecr.eu-north-1.amazonaws.com/xgb-clf-training-container:latest
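
Both build cells above assemble the image URI and the Docker commands in the same way. A small helper could remove the duplication, and passing the commands as argument lists avoids `shell=True`. This is a sketch; the names `ecr_image_uri` and `build_and_push_commands` are hypothetical.

```python
def ecr_image_uri(account_id, region, repository, tag="latest"):
    """Return the full ECR image URI for a repository and tag."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def build_and_push_commands(image_uri, context_dir):
    """Return the docker build and push commands as argument lists."""
    return [
        ["docker", "build", "-t", image_uri, context_dir],
        ["docker", "push", image_uri],
    ]


# Example call (runs Docker, so it needs a prior successful `docker login`):
# import subprocess
# uri = ecr_image_uri(account_id, region, train_image_name)
# for command in build_and_push_commands(uri, "../containers/training/docker"):
#     subprocess.run(command, check=True)
```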


In [5]:
s3_client = aws_client_manager.get_client('s3', role_name)

AWSClientManager - INFO - Attempting to assume role: arn:aws:iam::284415450706:role/football-project-execution-role
AWSClientManager - INFO - Assumed role football-project-execution-role successfully.


In [16]:
from botocore.exceptions import ClientError

try:
    s3_client.create_bucket(
        Bucket=bucket,
        CreateBucketConfiguration={'LocationConstraint': region}
    )
    logging.info(f"S3 bucket {bucket} created successfully.")
except s3_client.exceptions.BucketAlreadyOwnedByYou:
    logging.warning(f"S3 bucket {bucket} already exists and is owned by you.")
except ClientError as e:
    logging.error(f"Error creating S3 bucket: {e}")
    raise
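
One caveat with the call above: S3 rejects an explicit `LocationConstraint` of `us-east-1`, so for that region `CreateBucketConfiguration` has to be omitted. The notebook uses `eu-north-1`, where the call works as written. A sketch that handles both cases follows; the helper name `create_bucket_kwargs` is hypothetical.

```python
def create_bucket_kwargs(bucket, region):
    """Build keyword arguments for s3_client.create_bucket for any region."""
    kwargs = {"Bucket": bucket}
    # us-east-1 is the default location and must not be passed as a constraint
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


# Example call with the S3 client created earlier:
# s3_client.create_bucket(**create_bucket_kwargs(bucket, region))
```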



In [7]:
df_local_path = str(os.environ['DATA_FILEPATH_X'])
y_local_path = str(os.environ['DATA_FILEPATH_Y'])

batch_id = 'batch-20231016'

extra_args = {'Metadata': {'batch-id': batch_id}}  # Tag both objects with the batch they belong to

s3_client.upload_file(Filename=df_local_path, Bucket=bucket, Key="data/df.csv", ExtraArgs=extra_args)
s3_client.upload_file(Filename=y_local_path, Bucket=bucket, Key="data/y.csv", ExtraArgs=extra_args)
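
To confirm that the uploads carry the expected batch tag, the object metadata can be read back with `head_object`, which returns user-defined metadata under the `Metadata` key. This is a minimal sketch; the helper name `uploaded_batch_id` is hypothetical.

```python
def uploaded_batch_id(s3_client, bucket, key):
    """Return the 'batch-id' metadata value of an object, or None if absent."""
    response = s3_client.head_object(Bucket=bucket, Key=key)
    return response["Metadata"].get("batch-id")


# Example call with the S3 client created earlier:
# assert uploaded_batch_id(s3_client, bucket, "data/df.csv") == batch_id
```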