- A: CPU
- B: Memory
- C: Network
- D: Epochs

| Factor  | Low (-)         | High (+)           |
| ------- | :-------------: | :----------------: |
| CPU     | 750m            | 1500m              |
| Memory  | 1.5Gi           | 3Gi                |
| Network | FashionMNISTCNN | FashionMNISTResNet |
| Epochs  | 5               | 10                 |

$2^{4-1}$ Experimental Design

| Run (exp ID) | I | A     | B      | C    | D    |
| :----: | :---: | :---: | :----: | :---: | :---: |
| 1 (11) | +     | -     | -      | -    | -    |
| 2 (12) | +     | +     | -      | -    | +    |
| 3 (13) | +     | -     | +      | -    | +    |
| 4 (14) | +     | +     | +      | -    | -    |
| 5 (15) | +     | -     | -      | +    | +    |
| 6 (16) | +     | +     | -      | +    | -    |
| 7 (17) | +     | -     | +      | +    | -    |
| 8 (18) | +     | +     | +      | +    | +    |

Generator $D=ABC$, so $I=ABCD$, Resolution $IV$
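
The defining relation can be checked numerically. The following is a minimal sketch using plain NumPy rather than `pyDOE2`; the variable names (`base`, `design`, `aliases`) are illustrative only. It rebuilds the eight runs of the table above, confirms that the $ABCD$ column is all `+1`, and lists which two-factor interactions share a column.

```python
import itertools
import numpy as np

# Full 2^3 factorial in A, B, C (coded -1/+1), with A varying fastest,
# which reproduces the run order of the design table.
base = np.array([[a, b, c] for c in (-1, 1) for b in (-1, 1) for a in (-1, 1)])
A, B, C = base.T
D = A * B * C  # generator D = ABC

design = np.column_stack([A, B, C, D])

# Defining relation: the ABCD column is all +1, i.e. I = ABCD.
assert np.all(A * B * C * D == 1)

# In this resolution IV design, every two-factor interaction column is
# identical to the column of the complementary pair (AB = CD, AC = BD, AD = BC).
cols = {"A": A, "B": B, "C": C, "D": D}
aliases = []
for x, y in itertools.combinations("ABCD", 2):
    rest = [k for k in "ABCD" if k not in (x, y)]
    if np.array_equal(cols[x] * cols[y], cols[rest[0]] * cols[rest[1]]):
        aliases.append((x + y, rest[0] + rest[1]))

print(design)
print(aliases)
```

Because two-factor interactions are confounded in pairs, an effect estimated for `AB` cannot be separated from `CD` without additional runs.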


In [159]:
import os
import numpy as np
import pandas as pd

from tbparse import SummaryReader
import statsmodels.api as sm
from statsmodels.formula.api import ols
from pyDOE2 import fracfact  # only fracfact is used below

LOG_DIR = "/home/engineer/fltk-testbed/logging"
EXP_NAME = "exp_11"
FACTORS = ["cpu", "memory", "network", "epochs"]
LEVELS = [
    ["750m", "1500m"],
    ["1.5Gi", "3Gi"],
    ["FashionMNISTCNN", "FashionMNISTResNet"],
    ["5", "10"],
]

In [160]:
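# Build the 2^3 design in A, B, C together with all interaction columns,
# then keep A, B, C and use the ABC column as factor D (D = ABC, so I = ABCD).
doe = fracfact("A B C AB AC BC ABC")
doe = doe[:, [0, 1, 2, -1]]
doe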

array([[-1., -1., -1., -1.],
       [ 1., -1., -1.,  1.],
       [-1.,  1., -1.,  1.],
       [ 1.,  1., -1., -1.],
       [-1., -1.,  1.,  1.],
       [ 1., -1.,  1., -1.],
       [-1.,  1.,  1., -1.],
       [ 1.,  1.,  1.,  1.]])

In [186]:
# Map each coded run (-1 / +1) to the concrete low / high factor levels.
combinations = [
    [level[1] if run[i] == 1 else level[0] for i, level in enumerate(LEVELS)]
    for run in doe
]

# Each design point is replicated 4 times.
combinations = np.repeat(combinations, 4, axis=0)
combinations

array([['750m', '1.5Gi', 'FashionMNISTCNN', '5'],
       ['750m', '1.5Gi', 'FashionMNISTCNN', '5'],
       ['750m', '1.5Gi', 'FashionMNISTCNN', '5'],
       ['750m', '1.5Gi', 'FashionMNISTCNN', '5'],
       ['1500m', '1.5Gi', 'FashionMNISTCNN', '10'],
       ['1500m', '1.5Gi', 'FashionMNISTCNN', '10'],
       ['1500m', '1.5Gi', 'FashionMNISTCNN', '10'],
       ['1500m', '1.5Gi', 'FashionMNISTCNN', '10'],
       ['750m', '3Gi', 'FashionMNISTCNN', '10'],
       ['750m', '3Gi', 'FashionMNISTCNN', '10'],
       ['750m', '3Gi', 'FashionMNISTCNN', '10'],
       ['750m', '3Gi', 'FashionMNISTCNN', '10'],
       ['1500m', '3Gi', 'FashionMNISTCNN', '5'],
       ['1500m', '3Gi', 'FashionMNISTCNN', '5'],
       ['1500m', '3Gi', 'FashionMNISTCNN', '5'],
       ['1500m', '3Gi', 'FashionMNISTCNN', '5'],
       ['750m', '1.5Gi', 'FashionMNISTResNet', '10'],
       ['750m', '1.5Gi', 'FashionMNISTResNet', '10'],
       ['750m', '1.5Gi', 'FashionMNISTResNet', '10'],
       ['750m', '1.5Gi', 'FashionMNIST

In [195]:
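accuracies = []
for i in range(11, 19):  # exp_11 .. exp_18, one log directory per design run
    try:
        path = f"{LOG_DIR}/exp_{i}/train_job_0"
        reader = SummaryReader(path, pivot=True)
        df = reader.scalars
        # Keep the value from the last logged row of "accuracy per epoch".
        accuracies.append(df["accuracy per epoch"][-1:].values[0])
    except Exception:  # avoid a bare except, which would also swallow KeyboardInterrupt
        print("Directory not found")
accuracies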

Directory not found
Directory not found
Directory not found
Directory not found


[[89.27999877929688, 88.73999786376953, 89.22000122070312, 88.81999969482422],
 [89.22000122070312, 88.73999786376953, 89.12000274658203, 88.9000015258789],
 [89.0199966430664, 88.68000030517578, 89.66000366210938, 89.16000366210938],
 [89.04000091552734, 88.77999877929688, 88.94000244140625, 89.05999755859375]]

In [217]:
np.append(combinations[:4], np.array(accuracies[0]).reshape(4, 1), axis=1)

array([['750m', '1.5Gi', 'FashionMNISTCNN', '5', '89.27999877929688'],
       ['750m', '1.5Gi', 'FashionMNISTCNN', '5', '88.73999786376953'],
       ['750m', '1.5Gi', 'FashionMNISTCNN', '5', '89.22000122070312'],
       ['750m', '1.5Gi', 'FashionMNISTCNN', '5', '88.81999969482422']],
      dtype='<U32')
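
Note that `np.append` coerces everything to strings (`dtype='<U32'`), so the accuracies can no longer be used numerically. A minimal sketch of an alternative, assuming a pandas DataFrame is acceptable for the later analysis; the column name `accuracy` and the variable names are hypothetical, and the values are copied from the outputs above.

```python
import pandas as pd

FACTORS = ["cpu", "memory", "network", "epochs"]

# Levels of the first design point and its four replicated accuracies,
# copied from the notebook outputs above.
first_run = ["750m", "1.5Gi", "FashionMNISTCNN", "5"]
first_acc = [89.27999877929688, 88.73999786376953,
             89.22000122070312, 88.81999969482422]

# One row per replication; factor levels stay as strings (categorical labels).
results = pd.DataFrame([first_run] * len(first_acc), columns=FACTORS)

# "accuracy" is a hypothetical column name; the column keeps a float dtype.
results["accuracy"] = first_acc

print(results.dtypes)
print(results["accuracy"].mean())
```

Keeping the response numeric means the frame can be passed directly to `ols`, as is done for the code-size data further down.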

In [163]:


code_size = [3.8455, 3.8191, 3.8634, 3.5061, 3.4598, 3.5469, 3.6727, 3.6933, 3.6498, 3.7082, 3.7410, 3.6761, 3.8330, 3.8056, 3.8578,
            4.0807, 4.0717, 4.1164, 3.7095, 3.7507, 3.6635, 3.9735, 3.9510, 3.9984, 3.7492, 3.7743, 3.7127, 4.0879, 4.0790, 4.1131,
            4.4633, 4.4321, 4.4779, 3.9523, 3.9066, 3.9857, 4.2953, 4.2866, 4.3247, 4.3491, 4.3636, 4.3313, 4.4558, 4.4289, 4.4851,
            3.9958, 3.9641, 4.0015, 3.6184, 3.6291, 3.6091, 3.8506, 3.8440, 3.8246, 3.7288, 3.7585, 3.6959, 3.9914, 3.9688, 4.0100]
df = pd.DataFrame({'Code_Size': code_size,
                   # 4 processors, 15 observations each
                   'Processors': np.repeat(['W', 'X', 'Y', 'Z'], 15),
                   # 5 workloads x 3 replications, repeated for each processor
                   'Workloads': np.tile(np.repeat(['I', 'J', 'K', 'L', 'W'], 3), 4)})


In [164]:
# rp.summary_cont(df.groupby(['Processors', 'Workloads']))['Code_Size']

In [165]:
model = ols('Code_Size ~ C(Processors)*C(Workloads)', df).fit()

# Seeing if the overall model is significant
print(f"Overall model F({model.df_model: .0f},{model.df_resid: .0f}) = {model.fvalue: .3f}, p = {model.f_pvalue: .4f}")

Overall model F( 19, 40) =  318.779, p =  0.0000


In [166]:
model.summary()

|                   |               |                     |          |
| ----------------- | ------------- | ------------------- | -------- |
| Dep. Variable:    | Code_Size     | R-squared:          | 0.993    |
| Model:            | OLS           | Adj. R-squared:     | 0.99     |
| Method:           | Least Squares | F-statistic:        | 318.8    |
| Date:             | Sat, 16 Dec 2023 | Prob (F-statistic): | 1.17e-37 |
| Time:             | 21:06:12      | Log-Likelihood:     | 143.76   |
| No. Observations: | 60            | AIC:                | -247.5   |
| Df Residuals:     | 40            | BIC:                | -205.6   |
| Df Model:         | 19            |                     |          |
| Covariance Type:  | nonrobust     |                     |          |

|                                      | coef    | std err | t       | P>\|t\| | [0.025 | 0.975] |
| ------------------------------------ | :-----: | :-----: | :-----: | :-----: | :----: | :----: |
| Intercept                            | 3.8427  | 0.016   | 246.591 | 0.000   | 3.811  | 3.874  |
| C(Processors)[T.X]                   | 0.2469  | 0.022   | 11.205  | 0.000   | 0.202  | 0.291  |
| C(Processors)[T.Y]                   | 0.6151  | 0.022   | 27.911  | 0.000   | 0.571  | 0.660  |
| C(Processors)[T.Z]                   | 0.1445  | 0.022   | 6.555   | 0.000   | 0.100  | 0.189  |
| C(Workloads)[T.J]                    | -0.3384 | 0.022   | -15.355 | 0.000   | -0.383 | -0.294 |
| C(Workloads)[T.K]                    | -0.1707 | 0.022   | -7.747  | 0.000   | -0.215 | -0.126 |
| C(Workloads)[T.L]                    | -0.1342 | 0.022   | -6.091  | 0.000   | -0.179 | -0.090 |
| C(Workloads)[T.W]                    | -0.0105 | 0.022   | -0.478  | 0.635   | -0.055 | 0.034  |
| C(Processors)[T.X]:C(Workloads)[T.J] | -0.0433 | 0.031   | -1.389  | 0.172   | -0.106 | 0.020  |

|                |        |                   |       |
| -------------- | ------ | ----------------- | ----- |
| Omnibus:       | 2.384  | Durbin-Watson:    | 2.945 |
| Prob(Omnibus): | 0.304  | Jarque-Bera (JB): | 1.481 |
| Skew:          | -0.082 | Prob(JB):         | 0.477 |
| Kurtosis:      | 2.248  | Cond. No.         | 27.9  |


In [167]:
# Two-way ANOVA table with Type II sums of squares
res = sm.stats.anova_lm(model, typ=2)
res

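|                            | sum_sq   | df   | F           | PR(>F)       |
| -------------------------- | :------: | :--: | :---------: | :----------: |
| C(Processors)              | 2.929369 | 3.0  | 1340.358949 | 3.778409e-40 |
| C(Workloads)               | 1.328227 | 4.0  | 455.806817  | 8.894693e-33 |
| C(Processors):C(Workloads) | 0.154801 | 12.0 | 17.70766    | 2.3268e-12   |
| Residual                   | 0.02914  | 40.0 |             |              |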
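
To read the ANOVA table in terms of how much of the total variation each source accounts for, the sums of squares can be normalised by their total. This is a minimal sketch, assuming the design is balanced so that the listed sums of squares add up to the total; the numbers are copied from the table above and the column name `pct_variation` is illustrative only.

```python
import pandas as pd

# Sums of squares and degrees of freedom copied from the ANOVA table above.
anova = pd.DataFrame(
    {"sum_sq": [2.929369, 1.328227, 0.154801, 0.02914],
     "df": [3, 4, 12, 40]},
    index=["C(Processors)", "C(Workloads)",
           "C(Processors):C(Workloads)", "Residual"],
)

# Share of the total variation attributed to each source, in percent.
anova["pct_variation"] = 100 * anova["sum_sq"] / anova["sum_sq"].sum()

# F = (mean square of the effect) / (mean square of the residual).
# The Residual row is trivially 1 and can be ignored.
mean_sq = anova["sum_sq"] / anova["df"]
anova["F"] = mean_sq / mean_sq["Residual"]

print(anova.round(3))
```

The recomputed `F` values differ slightly from the table because the inputs here are rounded to six decimals.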
