# Analyze container HW usage in ECS per type of workload

- Each ECS task has a unique task ID in the form of a hash
- We perform the analysis for each region (the results should be very similar, if not identical)
- We have two types of log statistics
    - A table that maps each task ID to a concrete type of workload, e.g. a Python script or an invoke target
    - A table with the HW resource usage of a given container, taken from CloudWatch Container Insights
- The data spans a 24-hour time interval
- Strategy to compute the table of HW usage per type of workload
    1. Parse the entrypoint - e.g. remove the parameters from the script/invoke calls because these will differ
    2. Group by type of entrypoint and compute the mean and the maximum of the per-task peak RAM/CPU utilization

In [1]:
import pandas as pd

## Load data

In [2]:
europe_task_ids = pd.read_csv("stockholm-logs-insights-results-tasks.csv")
europe_task_ids["region"] = "eu-north-1"
tokyo_task_ids = pd.read_csv("tokyo-logs-insights-results-tasks.csv")
tokyo_task_ids["region"] = "ap-northeast-1"

In [3]:
task_ids = pd.concat([europe_task_ids, tokyo_task_ids], axis=0)

In [4]:
europe_container_insights = pd.read_csv("stockholm-logs-insights-results-container-insights.csv")
europe_container_insights["region"] = "eu-north-1"
tokyo_container_insights = pd.read_csv("tokyo-logs-insights-results-container-insights.csv")
tokyo_container_insights["region"] = "ap-northeast-1"

In [5]:
container_insights = pd.concat([europe_container_insights, tokyo_container_insights], axis=0)
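
The load-and-tag steps above repeat the same pattern for every file. A minimal sketch of a helper that does both in one place is shown below. The function name `load_with_region` and the in-memory CSV text are hypothetical stand-ins for the exported files, and `ignore_index=True` is an optional choice that gives the combined frame a unique row index.

```python
import io

import pandas as pd


def load_with_region(csv_source, region: str) -> pd.DataFrame:
    """Read one exported CSV and tag every row with its AWS region."""
    df = pd.read_csv(csv_source)
    df["region"] = region
    return df


# Toy stand-ins for the per-region exports (made-up values).
stockholm_csv = io.StringIO("TaskId,max(CpuUtilized)\naaa,255.8\nbbb,506.6\n")
tokyo_csv = io.StringIO("TaskId,max(CpuUtilized)\nccc,411.1\n")

frames = [
    load_with_region(stockholm_csv, "eu-north-1"),
    load_with_region(tokyo_csv, "ap-northeast-1"),
]
# ignore_index=True renumbers the rows instead of repeating each file's index.
combined = pd.concat(frames, axis=0, ignore_index=True)
print(combined)
```

With the default `ignore_index=False`, as used in this notebook, the row labels of each file are kept, so the same label can appear once per region.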

## Pre-process

In [6]:
task_ids.head()

Unnamed: 0,@timestamp,@message,@log,@logStream,cmd,region
0,2024-07-12 12:47:29.977,entrypoint.sh: 'mkdir /.dockerenv && invoke ru...,623860924167:/ecs/cmamp-preprod,ecs/cmamp-preprod/f2b57ebf3efd4005b85c93696a2e...,'mkdir /.dockerenv && invoke run_single_datase...,eu-north-1
1,2024-07-12 12:47:29.830,entrypoint.sh: '/app/amp/im_v2/ccxt/db/archive...,623860924167:/ecs/cmamp-preprod,ecs/cmamp-preprod/f359a05ff7a049a991f1f939da5c...,'/app/amp/im_v2/ccxt/db/archive_db_data_to_s3....,eu-north-1
2,2024-07-12 12:47:10.821,entrypoint.sh: 'mkdir /.dockerenv && invoke ru...,623860924167:/ecs/cmamp-preprod,ecs/cmamp-preprod/d8714db7525f4581941dbdd4f822...,'mkdir /.dockerenv && invoke run_single_datase...,eu-north-1
3,2024-07-12 12:44:22.772,entrypoint.sh: '/app/amp/im_v2/ccxt/db/archive...,623860924167:/ecs/cmamp-preprod,ecs/cmamp-preprod/180fcc2badc24d13a9ccd93ecef2...,'/app/amp/im_v2/ccxt/db/archive_db_data_to_s3....,eu-north-1
4,2024-07-12 12:42:18.427,entrypoint.sh: '/app/amp/im_v2/common/data/ext...,623860924167:/ecs/cmamp-preprod,ecs/cmamp-preprod/0503278ebfe7435aa0dc9307cd74...,'/app/amp/im_v2/common/data/extract/download_b...,eu-north-1


In [7]:
task_ids.shape

(3700, 6)

In [8]:
container_insights.head()

Unnamed: 0,TaskId,max(CpuUtilized),max(CpuReserved),max(MemoryUtilized),max(MemoryReserved),region
0,f073519d79404acea3d9ccd4180422e3,255.7964,256,355,1024,eu-north-1
1,814feea28885429ba67a12017c16b87b,506.5957,512,149,2048,eu-north-1
2,e1ef148de5e446ee86f0e05ca19ecccf,411.1087,512,338,2048,eu-north-1
3,0503278ebfe7435aa0dc9307cd743a0e,275.4289,512,324,1024,eu-north-1
4,744dc8f5455e42e3baeaedc0e4a2a859,256.0683,256,259,1024,eu-north-1


In [9]:
container_insights.shape

(3346, 6)

The number of rows may differ slightly because the Logs Insights queries that generated the two datasets were executed
a few minutes apart

Check the distinct prefixes that the commands start with

In [10]:
task_ids["cmd"].str[:16].unique()

array(["'mkdir /.dockere", "'/app/amp/im_v2/", "'python /app/amp",
       "'amp/im_v2/commo", "'aws s3 sync s3:", "'invoke run_cros",
       "'/app/amp/datafl", "'cd /data/shared", "'invoke run_note"],
      dtype=object)

Remove the surrounding single quotes and the `mkdir /.dockerenv && ` prefix, which is not informative

In [11]:
task_ids["cmd"] = task_ids["cmd"].str.strip("'")
task_ids["cmd"] = task_ids["cmd"].str.strip()

In [12]:
# regex=False treats the pattern as a literal string, so "." is not a wildcard.
task_ids["cmd"] = task_ids["cmd"].str.replace("mkdir /.dockerenv && ", "", regex=False)

In [13]:
task_ids.head()

Unnamed: 0,@timestamp,@message,@log,@logStream,cmd,region
0,2024-07-12 12:47:29.977,entrypoint.sh: 'mkdir /.dockerenv && invoke ru...,623860924167:/ecs/cmamp-preprod,ecs/cmamp-preprod/f2b57ebf3efd4005b85c93696a2e...,invoke run_single_dataset_qa_notebook --stage ...,eu-north-1
1,2024-07-12 12:47:29.830,entrypoint.sh: '/app/amp/im_v2/ccxt/db/archive...,623860924167:/ecs/cmamp-preprod,ecs/cmamp-preprod/f359a05ff7a049a991f1f939da5c...,/app/amp/im_v2/ccxt/db/archive_db_data_to_s3.p...,eu-north-1
2,2024-07-12 12:47:10.821,entrypoint.sh: 'mkdir /.dockerenv && invoke ru...,623860924167:/ecs/cmamp-preprod,ecs/cmamp-preprod/d8714db7525f4581941dbdd4f822...,invoke run_single_dataset_qa_notebook --stage ...,eu-north-1
3,2024-07-12 12:44:22.772,entrypoint.sh: '/app/amp/im_v2/ccxt/db/archive...,623860924167:/ecs/cmamp-preprod,ecs/cmamp-preprod/180fcc2badc24d13a9ccd93ecef2...,/app/amp/im_v2/ccxt/db/archive_db_data_to_s3.p...,eu-north-1
4,2024-07-12 12:42:18.427,entrypoint.sh: '/app/amp/im_v2/common/data/ext...,623860924167:/ecs/cmamp-preprod,ecs/cmamp-preprod/0503278ebfe7435aa0dc9307cd74...,/app/amp/im_v2/common/data/extract/download_bu...,eu-north-1


The last component of the log stream name is the task hash, so parse it into a `TaskId` column

In [14]:
task_ids["TaskId"] = task_ids["@logStream"].str.split("/").str[-1]
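
A quick sanity check of this parsing step could look like the sketch below. It assumes that task IDs are 32-character lowercase hexadecimal strings, which matches the sample rows in this notebook but is not guaranteed for every ECS setup. The first two stream names come from the sample output above and the third is made up.

```python
import pandas as pd

log_streams = pd.Series(
    [
        "ecs/cmamp-preprod/0503278ebfe7435aa0dc9307cd743a0e",
        "ecs/cmamp-preprod/180fcc2badc24d13a9ccd93ecef28a0d",
        "ecs/cmamp-preprod/not-a-task-id",  # made-up malformed entry
    ]
)
# Take the last path component as the task ID.
parsed = log_streams.str.split("/").str[-1]
# Flag values that look like a 32-character hex hash (assumed ID format).
looks_valid = parsed.str.match(r"^[0-9a-f]{32}$")
print(parsed[~looks_valid].tolist())
```

Any value printed here would fail to match a `TaskId` in the Container Insights table and would be dropped by the join.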

## Join datasets and compute statistics

In [15]:
container_insights["max_cpu_ut_pct"] = (
    container_insights["max(CpuUtilized)"] / container_insights["max(CpuReserved)"]
) * 100

In [16]:
container_insights["max_mem_ut_pct"] = (
    container_insights["max(MemoryUtilized)"] / container_insights["max(MemoryReserved)"]
) * 100
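
As a worked check of the two ratios, the sketch below recomputes them for the first two rows of the `container_insights.head()` output above. It assumes the usual Container Insights units: CPU units for the CPU columns (1024 units per vCPU) and MB for the memory columns. The `toy` frame exists only for this illustration.

```python
import pandas as pd

# First two rows of the container_insights.head() output shown earlier.
toy = pd.DataFrame(
    {
        "TaskId": [
            "f073519d79404acea3d9ccd4180422e3",
            "814feea28885429ba67a12017c16b87b",
        ],
        "max(CpuUtilized)": [255.7964, 506.5957],
        "max(CpuReserved)": [256, 512],
        "max(MemoryUtilized)": [355, 149],
        "max(MemoryReserved)": [1024, 2048],
    }
)
# Peak usage as a percentage of what the task reserved.
toy["max_cpu_ut_pct"] = toy["max(CpuUtilized)"] / toy["max(CpuReserved)"] * 100
toy["max_mem_ut_pct"] = toy["max(MemoryUtilized)"] / toy["max(MemoryReserved)"] * 100
print(toy[["TaskId", "max_cpu_ut_pct", "max_mem_ut_pct"]].round(2))
```

Both tasks peak close to their full CPU reservation while using only a fraction of the reserved memory.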

In [17]:
hw_utilization_data = task_ids.merge(container_insights, on="TaskId")

In [18]:
hw_utilization_data.head()

Unnamed: 0,@timestamp,@message,@log,@logStream,cmd,region_x,TaskId,max(CpuUtilized),max(CpuReserved),max(MemoryUtilized),max(MemoryReserved),region_y,max_cpu_ut_pct,max_mem_ut_pct
0,2024-07-12 12:44:22.772,entrypoint.sh: '/app/amp/im_v2/ccxt/db/archive...,623860924167:/ecs/cmamp-preprod,ecs/cmamp-preprod/180fcc2badc24d13a9ccd93ecef2...,/app/amp/im_v2/ccxt/db/archive_db_data_to_s3.p...,eu-north-1,180fcc2badc24d13a9ccd93ecef28a0d,513.7388,512,216,1024,eu-north-1,100.339609,21.09375
1,2024-07-12 12:42:18.427,entrypoint.sh: '/app/amp/im_v2/common/data/ext...,623860924167:/ecs/cmamp-preprod,ecs/cmamp-preprod/0503278ebfe7435aa0dc9307cd74...,/app/amp/im_v2/common/data/extract/download_bu...,eu-north-1,0503278ebfe7435aa0dc9307cd743a0e,275.4289,512,324,1024,eu-north-1,53.794707,31.640625
2,2024-07-12 12:38:07.086,entrypoint.sh: '/app/amp/im_v2/ccxt/db/archive...,623860924167:/ecs/cmamp-preprod,ecs/cmamp-preprod/ec35f6677fa9485283bc23f90908...,/app/amp/im_v2/ccxt/db/archive_db_data_to_s3.p...,eu-north-1,ec35f6677fa9485283bc23f909086e3b,295.4143,2048,244,16384,eu-north-1,14.424526,1.489258
3,2024-07-12 12:34:38.882,entrypoint.sh: '/app/amp/im_v2/ccxt/db/archive...,623860924167:/ecs/cmamp-preprod,ecs/cmamp-preprod/583fbe372d2345bfbad5cfbabdc4...,/app/amp/im_v2/ccxt/db/archive_db_data_to_s3.p...,eu-north-1,583fbe372d2345bfbad5cfbabdc4bc49,502.6486,512,200,1024,eu-north-1,98.173555,19.53125
4,2024-07-12 12:32:47.354,entrypoint.sh: 'mkdir /.dockerenv && invoke ru...,623860924167:/ecs/cmamp-preprod,ecs/cmamp-preprod/015af97ae20a43e4b330340e7f30...,invoke run_cross_dataset_qa_notebook --stage '...,eu-north-1,015af97ae20a43e4b330340e7f300c7f,512.0856,512,691,2048,eu-north-1,100.016719,33.740234


In [19]:
hw_utilization_data.shape

(2271, 14)
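
The inner merge keeps only tasks that appear in both tables, which is why the joined frame has fewer rows than either input. A sketch of how the unmatched rows could be counted is shown below, using toy frames with made-up task IDs. Joining on both `TaskId` and `region` is an assumption of this sketch, not what the cell above does; it also avoids the duplicated `region_x`/`region_y` columns.

```python
import pandas as pd

# Toy inputs with made-up task IDs.
tasks = pd.DataFrame(
    {
        "TaskId": ["a1", "b2", "c3"],
        "region": ["eu-north-1", "eu-north-1", "ap-northeast-1"],
        "cmd": ["invoke x", "aws s3 sync", "/app/script.py"],
    }
)
insights = pd.DataFrame(
    {
        "TaskId": ["b2", "c3", "d4"],
        "region": ["eu-north-1", "ap-northeast-1", "ap-northeast-1"],
        "max_cpu_ut_pct": [78.5, 60.7, 29.3],
    }
)
# An outer merge with indicator=True records where each row came from.
merged = tasks.merge(insights, on=["TaskId", "region"], how="outer", indicator=True)
counts = merged["_merge"].value_counts()
print(counts)
# Rows present in both tables are what the inner merge would return.
matched = merged[merged["_merge"] == "both"].drop(columns="_merge")
```

The `left_only` and `right_only` counts show how many tasks lack a counterpart in the other table.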

In [27]:
# These rules are heuristics derived from the command patterns observed in the logs.
def parse_workload(command: str) -> str:
    if command.startswith("invoke") or command.startswith("python"):
        # invoke my_invoke --arg 1 --arg2
        # ->
        # invoke my_invoke
        return " ".join(command.split(" ")[:2])
    elif command.startswith("aws s3 sync"):
        return "aws s3 sync"
    elif command.startswith("cd /data/shared/ecs_tokyo/preprod/ && tar -czf"):
        return "tar -czf"
    else:
        # /app/amp/my_script.py --arg 1
        # ->
        # /app/amp/my_script.py
        return command.split(" ")[0]
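
To see what each branch returns, the sketch below restates the function from the cell above and applies it to a few made-up commands. The script names, bucket, and archive name are hypothetical.

```python
def parse_workload(command: str) -> str:
    """Reduce a full command line to a coarse workload type."""
    if command.startswith("invoke") or command.startswith("python"):
        # Keep the launcher plus its first argument (target or script).
        return " ".join(command.split(" ")[:2])
    elif command.startswith("aws s3 sync"):
        return "aws s3 sync"
    elif command.startswith("cd /data/shared/ecs_tokyo/preprod/ && tar -czf"):
        return "tar -czf"
    else:
        # Plain script path: drop every argument.
        return command.split(" ")[0]


# Made-up commands, one per branch.
examples = [
    "invoke run_single_dataset_qa_notebook --stage 'preprod'",
    "python /app/amp/my_script.py --arg 1",
    "aws s3 sync s3://my-bucket /data",
    "cd /data/shared/ecs_tokyo/preprod/ && tar -czf out.tgz logs",
    "/app/amp/my_script.py --arg 1",
]
parsed = [parse_workload(cmd) for cmd in examples]
for cmd, task_type in zip(examples, parsed):
    print(f"{task_type!r} <- {cmd!r}")
```

Note that the same script invoked with and without the `python` prefix maps to two different workload types under these rules.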

In [33]:
hw_utilization_data["task_type"] = hw_utilization_data["cmd"].apply(parse_workload) 

In [34]:
hw_utilization_data["task_type"].unique()

array(['/app/amp/im_v2/ccxt/db/archive_db_data_to_s3.py',
       '/app/amp/im_v2/common/data/extract/download_bulk.py',
       'invoke run_cross_dataset_qa_notebook',
       'invoke run_single_dataset_qa_notebook',
       '/app/amp/im_v2/ccxt/data/extract/download_exchange_data_to_db_periodically.py',
       '/app/amp/im_v2/common/data/transform/resample_rt_bid_ask_data_periodically.py',
       'python /app/amp/im_v2/binance/data/extract/download_historical_bid_ask.py',
       '/app/amp/im_v2/ccxt/data/extract/download_cryptocom_bid_ask.py',
       'amp/im_v2/common/data/transform/resample_daily_bid_ask_data.py',
       'aws s3 sync',
       '/app/amp/dataflow_amp/system/Cx/scripts/run_Cx_prod_system.py',
       'tar -czf'], dtype=object)

In [35]:
hw_utilization_data.head()

Unnamed: 0,@timestamp,@message,@log,@logStream,cmd,region_x,TaskId,max(CpuUtilized),max(CpuReserved),max(MemoryUtilized),max(MemoryReserved),region_y,max_cpu_ut_pct,max_mem_ut_pct,task_type
0,2024-07-12 12:44:22.772,entrypoint.sh: '/app/amp/im_v2/ccxt/db/archive...,623860924167:/ecs/cmamp-preprod,ecs/cmamp-preprod/180fcc2badc24d13a9ccd93ecef2...,/app/amp/im_v2/ccxt/db/archive_db_data_to_s3.p...,eu-north-1,180fcc2badc24d13a9ccd93ecef28a0d,513.7388,512,216,1024,eu-north-1,100.339609,21.09375,/app/amp/im_v2/ccxt/db/archive_db_data_to_s3.py
1,2024-07-12 12:42:18.427,entrypoint.sh: '/app/amp/im_v2/common/data/ext...,623860924167:/ecs/cmamp-preprod,ecs/cmamp-preprod/0503278ebfe7435aa0dc9307cd74...,/app/amp/im_v2/common/data/extract/download_bu...,eu-north-1,0503278ebfe7435aa0dc9307cd743a0e,275.4289,512,324,1024,eu-north-1,53.794707,31.640625,/app/amp/im_v2/common/data/extract/download_bu...
2,2024-07-12 12:38:07.086,entrypoint.sh: '/app/amp/im_v2/ccxt/db/archive...,623860924167:/ecs/cmamp-preprod,ecs/cmamp-preprod/ec35f6677fa9485283bc23f90908...,/app/amp/im_v2/ccxt/db/archive_db_data_to_s3.p...,eu-north-1,ec35f6677fa9485283bc23f909086e3b,295.4143,2048,244,16384,eu-north-1,14.424526,1.489258,/app/amp/im_v2/ccxt/db/archive_db_data_to_s3.py
3,2024-07-12 12:34:38.882,entrypoint.sh: '/app/amp/im_v2/ccxt/db/archive...,623860924167:/ecs/cmamp-preprod,ecs/cmamp-preprod/583fbe372d2345bfbad5cfbabdc4...,/app/amp/im_v2/ccxt/db/archive_db_data_to_s3.p...,eu-north-1,583fbe372d2345bfbad5cfbabdc4bc49,502.6486,512,200,1024,eu-north-1,98.173555,19.53125,/app/amp/im_v2/ccxt/db/archive_db_data_to_s3.py
4,2024-07-12 12:32:47.354,entrypoint.sh: 'mkdir /.dockerenv && invoke ru...,623860924167:/ecs/cmamp-preprod,ecs/cmamp-preprod/015af97ae20a43e4b330340e7f30...,invoke run_cross_dataset_qa_notebook --stage '...,eu-north-1,015af97ae20a43e4b330340e7f300c7f,512.0856,512,691,2048,eu-north-1,100.016719,33.740234,invoke run_cross_dataset_qa_notebook


In [36]:
hw_utilization_stats = hw_utilization_data.groupby("task_type")[["max_cpu_ut_pct","max_mem_ut_pct"]].agg(['mean', 'max'])

In [37]:
hw_utilization_stats

task_type,max_cpu_ut_pct (mean),max_cpu_ut_pct (max),max_mem_ut_pct (mean),max_mem_ut_pct (max)
/app/amp/dataflow_amp/system/Cx/scripts/run_Cx_prod_system.py,29.314713,39.564175,3.971354,8.203125
/app/amp/im_v2/ccxt/data/extract/download_cryptocom_bid_ask.py,59.713353,98.944473,7.634277,8.154297
/app/amp/im_v2/ccxt/data/extract/download_exchange_data_to_db_periodically.py,76.636006,100.250703,21.662239,36.71875
/app/amp/im_v2/ccxt/db/archive_db_data_to_s3.py,60.690039,100.344199,13.356254,54.486084
/app/amp/im_v2/common/data/extract/download_bulk.py,68.380907,100.617695,21.64279,58.984375
/app/amp/im_v2/common/data/transform/resample_rt_bid_ask_data_periodically.py,46.394865,92.929473,11.505392,25.0
amp/im_v2/common/data/transform/resample_daily_bid_ask_data.py,48.934553,86.273975,54.335007,78.566081
aws s3 sync,78.475548,99.086875,16.057478,27.734375
invoke run_cross_dataset_qa_notebook,86.425432,100.077168,20.677359,82.983398
invoke run_single_dataset_qa_notebook,98.673216,100.609961,28.315666,37.255859
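
The table above does not show how many tasks back each mean. A possible variant, sketched below on made-up numbers, uses pandas named aggregation to add a task count and to produce flat column names. The output column names (`n_tasks`, `cpu_mean`, and so on) are illustrative choices, not part of the original analysis.

```python
import pandas as pd

# Toy data with made-up utilization values.
toy = pd.DataFrame(
    {
        "TaskId": ["t1", "t2", "t3"],
        "task_type": [
            "aws s3 sync",
            "aws s3 sync",
            "invoke run_single_dataset_qa_notebook",
        ],
        "max_cpu_ut_pct": [70.0, 90.0, 100.0],
        "max_mem_ut_pct": [10.0, 20.0, 30.0],
    }
)
# Named aggregation: output_column=(input_column, aggregation).
stats = toy.groupby("task_type").agg(
    n_tasks=("TaskId", "count"),
    cpu_mean=("max_cpu_ut_pct", "mean"),
    cpu_max=("max_cpu_ut_pct", "max"),
    mem_mean=("max_mem_ut_pct", "mean"),
    mem_max=("max_mem_ut_pct", "max"),
)
print(stats)
```

A mean computed over only a handful of tasks is less reliable, so the count helps when reading the per-workload figures.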


`to_markdown()` requires the `tabulate` package, so install it first

In [None]:
! sudo /venv/bin/pip install tabulate

In [41]:
hw_utilization_stats.to_markdown("hw_utilizations_stats.md")

