<br />
<div align="center">
  <a href="https://deepwok.github.io/">
    <img src="../imgs/deepwok.png" alt="Logo" width="160" height="160">
  </a>

  <h1 align="center">Lab 4 for Advanced Deep Learning Systems (ADLS) - Software Stream</h1>

  <p align="center">
    ELEC70109/EE9-AML3-10/EE9-AO25
    <br />
    Written by
    <a href="https://aaron-zhao123.github.io/">Aaron Zhao, Cheng Zhang, Pedro Gimenes </a>
  </p>
</div>

# General introduction
In this lab, you will learn how to use the search functionality in the MASE software stack to implement a Network Architecture Search (NAS).

There are 4 tasks in total that you need to finish, plus 1 optional task.

What is Network Architecture Search?

## A Handwritten JSC Network

We follow a procedure similar to the one you used in lab3 to set up the dataset. Copy and paste the following code snippet into a file and name it `lab4.py`.

In [88]:
import sys
import logging
import os
from pathlib import Path
from pprint import pprint as pp

# figure out the correct path
machop_path = Path(".").resolve().parent.parent /"machop"
assert machop_path.exists(), "Failed to find machop at: {}".format(machop_path)
sys.path.append(str(machop_path))

from chop.dataset import MaseDataModule, get_dataset_info
from chop.tools.logger import set_logging_verbosity, get_logger

from chop.passes.graph.analysis import (
    report_node_meta_param_analysis_pass,
    profile_statistics_analysis_pass,
)
from chop.passes.graph import (
    add_common_metadata_analysis_pass,
    init_metadata_analysis_pass,
    add_software_metadata_analysis_pass,
)
from torch import nn
from chop.passes.graph.utils import get_parent_name
from chop.passes import report_graph_analysis_pass

from chop.tools.get_input import InputGenerator
from chop.ir.graph.mase_graph import MaseGraph

from chop.models import get_model_info, get_model

set_logging_verbosity("info")

logger = get_logger("chop")
logger.setLevel(logging.INFO)

batch_size = 8
model_name = "jsc-tiny"
dataset_name = "jsc"


data_module = MaseDataModule(
    name=dataset_name,
    batch_size=batch_size,
    model_name=model_name,
    num_workers=0,
)
data_module.prepare_data()
data_module.setup()

model_info = get_model_info(model_name)
model = get_model(
    model_name,
    task="cls",
    dataset_info=data_module.dataset_info,
    pretrained=False,
    checkpoint = None)

input_generator = InputGenerator(
    data_module=data_module,
    model_info=model_info,
    task="cls",
    which_dataloader="train",
)

dummy_in = {"x": next(iter(data_module.train_dataloader()))[0]}

# create the graph
_ = model(**dummy_in)

# generate the mase graph and initialize node metadata
mg = MaseGraph(model=model)

INFO    Set logging level to info


In [89]:
print(mg.modules['seq_blocks'])

Module(
  (0): BatchNorm1d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (1): ReLU(inplace=True)
  (2): Linear(in_features=16, out_features=5, bias=True)
  (3): ReLU(inplace=True)
)


This time we are going to use a slightly different network, so we define it as a PyTorch model. Copy and paste this snippet into `lab4.py` as well.

**Note**

MASE integrates seamlessly with native PyTorch models.

In [94]:
from torch import nn
from chop.passes.graph.utils import get_parent_name
from chop.passes import report_graph_analysis_pass

# define a new model
class JSC_Three_Linear_Layers(nn.Module):
    def __init__(self):
        super(JSC_Three_Linear_Layers, self).__init__()
        self.seq_blocks = nn.Sequential(
            nn.BatchNorm1d(16),  # 0
            # nn.ReLU takes no size argument; its only parameter is `inplace`
            nn.ReLU(inplace=True),  # 1
            nn.Linear(16, 16),  # linear  2
            nn.Linear(16, 16),  # linear  3
            nn.Linear(16, 5),   # linear  4
            nn.ReLU(inplace=True),  # 5
        )

    def forward(self, x):
        return self.seq_blocks(x)


model = JSC_Three_Linear_Layers()

# generate the mase graph and initialize node metadata
mg = MaseGraph(model=model)
mg, _ = init_metadata_analysis_pass(mg, None)

# report the graph
print(mg.modules['seq_blocks'])

Module(
  (0): BatchNorm1d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (1): ReLU(inplace=True)
  (2): Linear(in_features=16, out_features=16, bias=True)
  (3): Linear(in_features=16, out_features=16, bias=True)
  (4): Linear(in_features=16, out_features=5, bias=True)
  (5): ReLU(inplace=True)
)


## Model Architecture Modification as a Transformation Pass

Similar to what you have done in `lab2`, one can also implement a change in model architecture as a transformation pass:

In [95]:
def instantiate_linear(in_features, out_features, bias):
    # `bias` is the original layer's bias parameter (or None),
    # whereas nn.Linear expects a boolean flag
    return nn.Linear(
        in_features=in_features,
        out_features=out_features,
        bias=bias is not None)

def redefine_linear_transform_pass(graph, pass_args=None):
    main_config = pass_args.pop('config')
    default = main_config.pop('default', None)
    if default is None:
        raise ValueError(f"default value must be provided.")
    i = 0
    for node in graph.fx_graph.nodes:
        i += 1
        # if node name is not matched, it won't be tracked
        config = main_config.get(node.name, default)['config']
        name = config.get("name", None)
        if name is not None:
            ori_module = graph.modules[node.target]
            in_features = ori_module.in_features
            out_features = ori_module.out_features
            bias = ori_module.bias
            if name == "output_only":
                out_features = out_features * config["channel_multiplier"]
            elif name == "both":
                in_features = in_features * config["channel_multiplier"]
                out_features = out_features * config["channel_multiplier"]
            elif name == "input_only":
                in_features = in_features * config["channel_multiplier"]
            new_module = instantiate_linear(in_features, out_features, bias)
            parent_name, name = get_parent_name(node.target)
            setattr(graph.modules[parent_name], name, new_module)
    return graph, {}



pass_config = {
"by": "name",
"default": {"config": {"name": None}},
"seq_blocks_2": {
    "config": {
        "name": "output_only",
        # weight
        "channel_multiplier": 2,
        }
    },
"seq_blocks_3": {
    "config": {
        "name": "both",
        "channel_multiplier": 2,
        }
    },
"seq_blocks_4": {
    "config": {
        "name": "input_only",
        "channel_multiplier": 2,
        }
    },
}

# this performs the architecture transformation based on the config
mg, _ = redefine_linear_transform_pass(
    graph=mg, pass_args={"config": pass_config})

# report the graph
print(mg.modules['seq_blocks'])

Module(
  (0): BatchNorm1d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (1): ReLU(inplace=True)
  (2): Linear(in_features=16, out_features=32, bias=True)
  (3): Linear(in_features=32, out_features=32, bias=True)
  (4): Linear(in_features=32, out_features=5, bias=True)
  (5): ReLU(inplace=True)
)
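The shape arithmetic performed by the pass can be checked without building any modules. The following is a minimal sketch in plain Python; `scale_linear_shape` and `layers` are hypothetical names used only for illustration and are not part of MASE.

```python
def scale_linear_shape(in_features, out_features, name, multiplier):
    """Return the (in_features, out_features) pair after scaling.

    Mirrors the branches of the transform pass: "output_only",
    "both" and "input_only"; any other name leaves the shape as is.
    """
    if name in ("input_only", "both"):
        in_features *= multiplier
    if name in ("output_only", "both"):
        out_features *= multiplier
    return in_features, out_features


# (original shape, scaling mode) for the three linear layers above
layers = [
    ((16, 16), "output_only"),
    ((16, 16), "both"),
    ((16, 5), "input_only"),
]

new_shapes = [scale_linear_shape(i, o, mode, 2) for (i, o), mode in layers]
print(new_shapes)  # [(16, 32), (32, 32), (32, 5)]
```

The three modes are chosen so that each layer's output width still matches the next layer's input width, while the network's input (16) and output (5) sizes stay untouched.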


The modified network features linear layers expanded to double their size, yet it’s unusual to sequence three linear layers consecutively without interposing any non-linear activations (do you know why?).
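One way to see the answer: without activations in between, a stack of linear layers collapses into a single linear layer, so the extra layers add parameters but no expressive power. The check below is a small plain-Python sketch with hand-picked integer weights (all values are illustrative assumptions).

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def matmul(A, B):
    """Multiply matrix A by matrix B (both lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def add(u, v):
    """Element-wise sum of two vectors."""
    return [a + b for a, b in zip(u, v)]


# two stacked linear layers: y = W2 (W1 x + b1) + b2
W1, b1 = [[1, 2], [3, 4], [5, 6]], [1, 0, -1]
W2, b2 = [[1, 0, 2], [0, 1, -1]], [2, 3]
x = [7, -2]

stacked = add(matvec(W2, add(matvec(W1, x), b1)), b2)

# the same map as ONE linear layer: W = W2 W1, b = W2 b1 + b2
W = matmul(W2, W1)
b = add(matvec(W2, b1), b2)
single = add(matvec(W, x), b)

print(stacked == single)  # True
```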

So we are interested in a modified network:

In [96]:
# define a new model
class JSC_Three_Linear_Layers(nn.Module):
    def __init__(self):
        super(JSC_Three_Linear_Layers, self).__init__()
        self.seq_blocks = nn.Sequential(
            nn.BatchNorm1d(16),  # 0
            nn.ReLU(inplace=True),  # 1
            nn.Linear(16, 16),  # linear seq_2
            nn.ReLU(inplace=True),  # 3
            nn.Linear(16, 16),  # linear seq_4
            nn.ReLU(inplace=True),  # 5
            nn.Linear(16, 5),  # linear seq_6
            nn.ReLU(inplace=True),  # 7
        )

    def forward(self, x):
        return self.seq_blocks(x)
    
model = JSC_Three_Linear_Layers()

# generate the mase graph and initialize node metadata
mg = MaseGraph(model=model)
mg, _ = init_metadata_analysis_pass(mg, None)

# report the graph
print(mg.modules['seq_blocks'])

Module(
  (0): BatchNorm1d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (1): ReLU(inplace=True)
  (2): Linear(in_features=16, out_features=16, bias=True)
  (3): ReLU(inplace=True)
  (4): Linear(in_features=16, out_features=16, bias=True)
  (5): ReLU(inplace=True)
  (6): Linear(in_features=16, out_features=5, bias=True)
  (7): ReLU(inplace=True)
)


1. Can you edit your code so that the layers of the above network are expanded to double their size?

### Question 1

In [73]:
new_pass_config = {
"by": "name",
"default": {"config": {"name": None}},
"seq_blocks_2": {
    "config": {
        "name": "output_only",
        # weight
        "channel_multiplier": 2,
        }
    },
"seq_blocks_4": {
    "config": {
        "name": "both",
        "channel_multiplier": 2,
        }
    },
"seq_blocks_6": {
    "config": {
        "name": "input_only",
        "channel_multiplier": 2,
        }
    },
}

In [74]:
# this performs the architecture transformation based on the config
mg, _ = redefine_linear_transform_pass(
    graph=mg, pass_args={"config": new_pass_config})

# report every module in the graph
print(mg.modules)

{'': GraphModule(
  (seq_blocks): Module(
    (0): BatchNorm1d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (1): ReLU(inplace=True)
    (2): Linear(in_features=16, out_features=32, bias=True)
    (3): ReLU(inplace=True)
    (4): Linear(in_features=32, out_features=32, bias=True)
    (5): ReLU(inplace=True)
    (6): Linear(in_features=32, out_features=5, bias=True)
    (7): ReLU(inplace=True)
  )
), 'seq_blocks': Module(
  (0): BatchNorm1d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (1): ReLU(inplace=True)
  (2): Linear(in_features=16, out_features=32, bias=True)
  (3): ReLU(inplace=True)
  (4): Linear(in_features=32, out_features=32, bias=True)
  (5): ReLU(inplace=True)
  (6): Linear(in_features=32, out_features=5, bias=True)
  (7): ReLU(inplace=True)
), 'seq_blocks.0': BatchNorm1d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True), 'seq_blocks.1': ReLU(inplace=True), 'seq_blocks.2': Linear(in_features=16, out_featur

### Question 2

2. In `lab3`, we implemented a grid search. Can we use it to search for the best channel multiplier value?

In [75]:
#!./ch search --config configs/examples/jsc_lab_4_channel_mul.toml --load ../mase_output/jsc-toy-lab-1_classification_jsc_2024-01-26/software/training_ckpts/best.ckpt --load-type pl

In [76]:
pass_config_2 = {
"by": "name",
"default": {"config": {"name": None}},
"seq_blocks_2": {
    "config": {
        "name": "output_only",
        # weight
        "channel_multiplier": 2,
        }
    },
"seq_blocks_4": {
    "config": {
        "name": "both",
        "channel_multiplier": 2,
        }
    },
"seq_blocks_6": {
    "config": {
        "name": "input_only",
        "channel_multiplier": 2,
        }
    },
}

import copy
# build a new search space
channel_multipliers = [2,4,6,8]

search_spaces = []
for d_config in channel_multipliers:
    pass_config_2['seq_blocks_2']['config']['channel_multiplier'] = d_config
    pass_config_2['seq_blocks_4']['config']['channel_multiplier'] = d_config
    pass_config_2['seq_blocks_6']['config']['channel_multiplier'] = d_config

    # dict.copy() and dict(d) only make shallow copies: the nested
    # 'config' dicts would still be shared between entries, so every
    # entry would end up holding the last multiplier. Use deepcopy instead.
    search_spaces.append(copy.deepcopy(pass_config_2))
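The need for `copy.deepcopy` is easy to reproduce in isolation. This sketch uses a cut-down, hypothetical config `base` with a single layer entry and compares the two kinds of copy.

```python
import copy

base = {"seq_blocks_2": {"config": {"name": "output_only", "channel_multiplier": 2}}}


def multiplier_of(cfg):
    """Read the multiplier back out of a config dict."""
    return cfg["seq_blocks_2"]["config"]["channel_multiplier"]


shallow, deep = [], []
for m in [2, 4, 6, 8]:
    base["seq_blocks_2"]["config"]["channel_multiplier"] = m
    shallow.append(base.copy())        # nested dicts are shared
    deep.append(copy.deepcopy(base))   # nested dicts are duplicated

print([multiplier_of(c) for c in shallow])  # [8, 8, 8, 8]
print([multiplier_of(c) for c in deep])     # [2, 4, 6, 8]
```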

In [77]:
search_spaces

[{'by': 'name',
  'default': {'config': {'name': None}},
  'seq_blocks_2': {'config': {'name': 'output_only', 'channel_multiplier': 2}},
  'seq_blocks_4': {'config': {'name': 'both', 'channel_multiplier': 2}},
  'seq_blocks_6': {'config': {'name': 'input_only', 'channel_multiplier': 2}}},
 {'by': 'name',
  'default': {'config': {'name': None}},
  'seq_blocks_2': {'config': {'name': 'output_only', 'channel_multiplier': 4}},
  'seq_blocks_4': {'config': {'name': 'both', 'channel_multiplier': 4}},
  'seq_blocks_6': {'config': {'name': 'input_only', 'channel_multiplier': 4}}},
 {'by': 'name',
  'default': {'config': {'name': None}},
  'seq_blocks_2': {'config': {'name': 'output_only', 'channel_multiplier': 6}},
  'seq_blocks_4': {'config': {'name': 'both', 'channel_multiplier': 6}},
  'seq_blocks_6': {'config': {'name': 'input_only', 'channel_multiplier': 6}}},
 {'by': 'name',
  'default': {'config': {'name': None}},
  'seq_blocks_2': {'config': {'name': 'output_only', 'channel_multiplier'

In [82]:
#new_mg = copy.deepcopy(mg)
print(new_mg.modules['seq_blocks'])


Module(
  (0): BatchNorm1d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (1): ReLU(inplace=True)
  (2): Linear(in_features=16, out_features=256, bias=True)
  (3): ReLU(inplace=True)
  (4): Linear(in_features=256, out_features=256, bias=True)
  (5): ReLU(inplace=True)
  (6): Linear(in_features=256, out_features=5, bias=True)
  (7): ReLU(inplace=True)
)


In [83]:
import torch
from torchmetrics.classification import MulticlassAccuracy
from chop.passes.graph.transforms import (
    quantize_transform_pass,
    summarize_quantization_analysis_pass,
)

# mg, _ = init_metadata_analysis_pass(mg, None)
# mg, _ = add_common_metadata_analysis_pass(mg, {"dummy_in": dummy_in})
# mg, _ = add_software_metadata_analysis_pass(mg, None)

metric = MulticlassAccuracy(num_classes=5)
num_batchs = 5
# This first loop is basically our search strategy,
# in this case, it is a simple brute force search

recorded_accs = []
for i, pass_config in enumerate(search_spaces):

    # print the config multiplier
    print('Config Multiplier:', pass_config['seq_blocks_4']['config']['channel_multiplier'], '\n')
    
    # we need to make a deep copy of the graph and the config so that the original graph is used when we multiply the channels
    config = copy.deepcopy(pass_config)
    new_mg = copy.deepcopy(mg)

    #Function to redefine the linear transform
    new_mg , _ = redefine_linear_transform_pass(graph=new_mg, pass_args={"config": config})
    print(new_mg.modules['seq_blocks'], '\n')

    j = 0

    # this is the inner loop, which we also call the runner
    acc_avg, loss_avg = 0, 0
    accs, losses = [], []
    
    for inputs in data_module.train_dataloader():
        xs, ys = inputs
        # evaluate the transformed graph, not the original `mg`
        preds = new_mg.model(xs)
        loss = torch.nn.functional.cross_entropy(preds, ys)
        acc = metric(preds, ys)
        accs.append(acc)
        losses.append(loss)
        if j > num_batchs:
            break
        j += 1
    acc_avg = sum(accs) / len(accs)
    loss_avg = sum(losses) / len(losses)
    recorded_accs.append(acc_avg)

Config Multiplier: 2 

Module(
  (0): BatchNorm1d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (1): ReLU(inplace=True)
  (2): Linear(in_features=16, out_features=64, bias=True)
  (3): ReLU(inplace=True)
  (4): Linear(in_features=64, out_features=64, bias=True)
  (5): ReLU(inplace=True)
  (6): Linear(in_features=64, out_features=5, bias=True)
  (7): ReLU(inplace=True)
) 

Config Multiplier: 4 

Module(
  (0): BatchNorm1d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (1): ReLU(inplace=True)
  (2): Linear(in_features=16, out_features=128, bias=True)
  (3): ReLU(inplace=True)
  (4): Linear(in_features=128, out_features=128, bias=True)
  (5): ReLU(inplace=True)
  (6): Linear(in_features=128, out_features=5, bias=True)
  (7): ReLU(inplace=True)
) 

Config Multiplier: 6 

Module(
  (0): BatchNorm1d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (1): ReLU(inplace=True)
  (2): Linear(in_features=16, out_features=192, bias=T

In [84]:
print(recorded_accs)

[tensor(0.0655), tensor(0.1732), tensor(0.1988), tensor(0.2119)]
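To finish the grid search, pair each candidate with its recorded accuracy and take the maximum. The sketch below copies the values printed above as plain floats; only a handful of batches are evaluated per candidate, so the ranking is noisy and should be read as illustrative.

```python
channel_multipliers = [2, 4, 6, 8]
recorded_accs = [0.0655, 0.1732, 0.1988, 0.2119]  # copied from the output above

# pick the (multiplier, accuracy) pair with the highest accuracy
best_multiplier, best_acc = max(
    zip(channel_multipliers, recorded_accs), key=lambda pair: pair[1]
)
print(best_multiplier, best_acc)  # 8 0.2119
```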


### Question 3

3. You may have noticed that one problem with the channel multiplier is that it scales all layers uniformly. Ideally, we would like to be able to construct networks like the following:

In [147]:
# define a new model
class JSC_Three_Linear_Layers(nn.Module):
    def __init__(self):
        super(JSC_Three_Linear_Layers, self).__init__()
        self.seq_blocks = nn.Sequential(
            nn.BatchNorm1d(16),
            nn.ReLU(inplace=True),
            nn.Linear(16, 32),  # output scaled by 2
            nn.ReLU(inplace=True),  # acts on 32 features (scaled by 2)
            nn.Linear(32, 64),  # input scaled by 2 but output scaled by 4
            nn.ReLU(inplace=True),  # acts on 64 features (scaled by 4)
            nn.Linear(64, 5),  # input scaled by 4
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.seq_blocks(x)
    
model = JSC_Three_Linear_Layers()

# generate the mase graph and initialize node metadata
mg = MaseGraph(model=model)
mg, _ = init_metadata_analysis_pass(mg, None)

# report the graph
print(mg.modules['seq_blocks'])

Module(
  (0): BatchNorm1d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (1): ReLU(inplace=True)
  (2): Linear(in_features=16, out_features=32, bias=True)
  (3): ReLU(inplace=True)
  (4): Linear(in_features=32, out_features=64, bias=True)
  (5): ReLU(inplace=True)
  (6): Linear(in_features=64, out_features=5, bias=True)
  (7): ReLU(inplace=True)
)


In [120]:
def instantiate_linear(in_features, out_features, bias):
    if bias is not None:
        bias = True
    return nn.Linear(
        in_features=in_features,
        out_features=out_features,
        bias=bias)

def redefine_linear_transform_pass_non_uniform(graph, pass_args=None):
    print('start')
    main_config = pass_args.pop('config')
    default = main_config.pop('default', None)
    if default is None:
        raise ValueError(f"default value must be provided.")
    i = 0
    for node in graph.fx_graph.nodes:
        i += 1
        # if node name is not matched, it won't be tracked
        config = main_config.get(node.name, default)['config']
        names = config.get("names", None)
        print(names)
        if names is not None:
            ori_module = graph.modules[node.target]
            in_features = ori_module.in_features
            out_features = ori_module.out_features
            bias = ori_module.bias
            for i, name in enumerate(names):
                print(i,name)
                if name == "output_only":
                    out_features = out_features * config["channel_multiplier"][i]
                elif name == "both":
                    in_features = in_features * config["channel_multiplier"][i]
                    out_features = out_features * config["channel_multiplier"][i]
                elif name == "input_only":
                    in_features = in_features * config["channel_multiplier"][i]
            new_module = instantiate_linear(in_features, out_features, bias)
            parent_name, name = get_parent_name(node.target)
            setattr(graph.modules[parent_name], name, new_module)
    return graph, {}

In [121]:
pass_config_3 = {
"by": "name",
"default": {"config": {"name": None}},
"seq_blocks_2": {
    "config": {
        "names": [ "output_only"],
        # weight
        "channel_multiplier": [2],
        }
    },
"seq_blocks_4": {
    "config": {
        "names": ["input_only","output_only"],
        "channel_multiplier": [2,2],
        }
    },
"seq_blocks_6": {
    "config": {
        "names": ["input_only"],
        "channel_multiplier": [2],
        }
    },
}

In [122]:
channel_multipliers = [(1,1), (1,2), (2,1), (2,2),(2,3),(3,2),(3,3),(3,4),(4,3),(4,4),(4,5),(5,4),(5,5)]
  
combinations = []
for multiplier_a,multiplier_b in channel_multipliers:
    new_config = pass_config_3.copy()  # Copy the original config
    print(multiplier_a,multiplier_b)
    for key in pass_config_3:
 
        if key.startswith("seq_blocks"):
            if key.endswith("2"):
                # For each multiplier, create a new dict with updated multiplier
                new_config[key] = new_config[key].copy()  # Copy the seq_block dict
                new_config[key]['config'] = new_config[key]['config'].copy()  # Copy the config dict
                new_config[key]['config']['channel_multiplier'] = multiplier_a  # Update multiplier
 
            if key.endswith("4"):
                # For each multiplier, create a new dict with updated multiplier
                new_config[key] = new_config[key].copy()  # Copy the seq_block dict
                new_config[key]['config'] = new_config[key]['config'].copy()  # Copy the config dict
                new_config[key]['config']['channel_multiplier'] = [multiplier_a,multiplier_b]  # Update multiplier
                # new_config[key]['config']['channel_multiplier'][1] = multiplier_b  # Update multiplier
 
            if key.endswith("6"):
                # For each multiplier, create a new dict with updated multiplier
                new_config[key] = new_config[key].copy()  # Copy the seq_block dict
                new_config[key]['config'] = new_config[key]['config'].copy()  # Copy the config dict
                new_config[key]['config']['channel_multiplier'] = multiplier_b  # Update multiplier
            
    combinations.append(new_config)
 
 
 
for i, d in enumerate(combinations, start=1):
    print(f"Dictionary {i}:")
    for key, value in d.items():
        print(f"  {key}: {value}", '\n')
    
#mg, _ = redefine_linear_transform_pass_non_uniform(graph=mg, pass_args={"config": pass_config})
 
#print(mg.modules['seq_blocks_2'])

1 1
1 2
2 1
2 2
2 3
3 2
3 3
3 4
4 3
4 4
4 5
5 4
5 5
Dictionary 1:
  by: name 

  default: {'config': {'name': None}} 

  seq_blocks_2: {'config': {'names': ['output_only'], 'channel_multiplier': 1}} 

  seq_blocks_4: {'config': {'names': ['input_only', 'output_only'], 'channel_multiplier': [1, 1]}} 

  seq_blocks_6: {'config': {'names': ['input_only'], 'channel_multiplier': 1}} 

Dictionary 2:
  by: name 

  default: {'config': {'name': None}} 

  seq_blocks_2: {'config': {'names': ['output_only'], 'channel_multiplier': 1}} 

  seq_blocks_4: {'config': {'names': ['input_only', 'output_only'], 'channel_multiplier': [1, 2]}} 

  seq_blocks_6: {'config': {'names': ['input_only'], 'channel_multiplier': 2}} 

Dictionary 3:
  by: name 

  default: {'config': {'name': None}} 

  seq_blocks_2: {'config': {'names': ['output_only'], 'channel_multiplier': 2}} 

  seq_blocks_4: {'config': {'names': ['input_only', 'output_only'], 'channel_multiplier': [2, 1]}} 

  seq_blocks_6: {'config': {'names':

In [134]:
import torch
from torchmetrics.classification import MulticlassAccuracy
from chop.passes.graph.transforms import (
    quantize_transform_pass,
    summarize_quantization_analysis_pass,
)

# mg, _ = init_metadata_analysis_pass(mg, None)
# mg, _ = add_common_metadata_analysis_pass(mg, {"dummy_in": dummy_in})
# mg, _ = add_software_metadata_analysis_pass(mg, None)

metric = MulticlassAccuracy(num_classes=5)
num_batchs = 5
# This first loop is basically our search strategy,
# in this case, it is a simple brute force search

recorded_accs = []
for i, pass_config in enumerate(combinations):

    # print the config multiplier
    print('Config Multiplier:', pass_config['seq_blocks_4']['config']['channel_multiplier'], '\n')
    
    # we need to make a deep copy of the graph and the config so that the original graph is used when we multiply the channels
    config = copy.deepcopy(pass_config)
    new_mg = copy.deepcopy(mg)

    #Function to redefine the linear transform
    new_mg , _ = redefine_linear_transform_pass_non_uniform(graph=new_mg, pass_args={"config": config})
    print(new_mg.modules['seq_blocks'], '\n')

    j = 0

    # this is the inner loop, which we also call the runner
    acc_avg, loss_avg = 0, 0
    accs, losses = [], []
    
    for inputs in data_module.train_dataloader():
        xs, ys = inputs
        # evaluate the transformed graph, not the original `mg`
        preds = new_mg.model(xs)
        loss = torch.nn.functional.cross_entropy(preds, ys)
        acc = metric(preds, ys)
        accs.append(acc)
        losses.append(loss)
        if j > num_batchs:
            break
        j += 1
    acc_avg = sum(accs) / len(accs)
    loss_avg = sum(losses) / len(losses)
    recorded_accs.append(acc_avg)



Config Multiplier: [1, 1] 

start
None
None
None
['output_only']
0 output_only
None
['input_only', 'output_only']
0 input_only
1 output_only


TypeError: empty() received an invalid combination of arguments - got (tuple, dtype=NoneType, device=NoneType), but expected one of:
 * (tuple of ints size, *, tuple of names names, torch.memory_format memory_format, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)
 * (tuple of ints size, *, torch.memory_format memory_format, Tensor out, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)


In [151]:
pass_config = {
"by": "name",
"default": {"config": {"name": None}},
"seq_blocks_2": {
    "config": {
        "names": ["output_only"],
        # weight
        "channel_multiplier": [2],
        }
    },
"seq_blocks_4": {
    "config": {
        "names": ["input_only","output_only"],
        "channel_multiplier": [2,2],
        }
    },
"seq_blocks_6": {
    "config": {
        "names": ["input_only"],
        "channel_multiplier": [2],
        }
    },
}
 
 
def redefine_linear_transform_pass_2(graph, pass_args=None):
    #print('start')
    main_config = pass_args.pop('config')
    default = main_config.pop('default', None)
    if default is None:
        raise ValueError(f"default value must be provided.")
    i = 0
    for node in graph.fx_graph.nodes:
        i += 1
        # if node name is not matched, it won't be tracked
        config = main_config.get(node.name, default)['config']
        names = config.get("names", None)
        #print(names)
        if names is not None:
            ori_module = graph.modules[node.target]
            in_features = ori_module.in_features
            out_features = ori_module.out_features
            bias = ori_module.bias
            for i, name in enumerate(names):
                #print(i,name)
                if name == "output_only":
                    out_features = out_features * config["channel_multiplier"][i]
                elif name == "both":
                    in_features = in_features * config["channel_multiplier"][i]
                    out_features = out_features * config["channel_multiplier"][i]
                elif name == "input_only":
                    in_features = in_features * config["channel_multiplier"][i]
            new_module = instantiate_linear(in_features, out_features, bias)
            parent_name, name = get_parent_name(node.target)
            setattr(graph.modules[parent_name], name, new_module)
    return graph, {}
 
import copy

mg = MaseGraph(model=model)
mg, _ = init_metadata_analysis_pass(mg, None)
  
channel_multipliers = [(1,1), (1,2), (2,1), (2,2),(2,3),(3,2),(3,3),(3,4),(4,3),(4,4),(4,5),(5,4),(5,5)]
  
combinations = []
for multiplier_a,multiplier_b in channel_multipliers:
    new_config = pass_config.copy()  # Copy the original config
    #print(multiplier_a,multiplier_b)
    for key in pass_config:
 
        if key.startswith("seq_blocks"):
            if key.endswith("2"):
                # For each multiplier, create a new dict with updated multiplier
                new_config[key] = new_config[key].copy()  # Copy the seq_block dict
                new_config[key]['config'] = new_config[key]['config'].copy()  # Copy the config dict
                new_config[key]['config']['channel_multiplier'] = [multiplier_a]  # Update multiplier
 
            if key.endswith("4"):
                # For each multiplier, create a new dict with updated multiplier           
                new_config[key] = new_config[key].copy()  # Copy the seq_block dict
                new_config[key]['config'] = new_config[key]['config'].copy()  # Copy the config dict
                new_config[key]['config']['channel_multiplier'] = [multiplier_a,multiplier_b]  # Update multiplier
                
            if key.endswith("6"):
                # For each multiplier, create a new dict with updated multiplier
                new_config[key] = new_config[key].copy()  # Copy the seq_block dict
                new_config[key]['config'] = new_config[key]['config'].copy()  # Copy the config dict
                new_config[key]['config']['channel_multiplier'] = [multiplier_b]  # Update multiplier
            
    combinations.append(new_config)
  
for i, d in enumerate(combinations, start=1):
    print(f"Dictionary {i}:")
    for key, value in d.items():
        print(f"  {key}: {value}")
    print()  # Print a blank line for spacing between dictionaries
 
#mg, _ = redefine_linear_transform_pass_2(graph=mg, pass_args={"config": pass_config})
 
#print(mg.modules['seq_blocks'])

Dictionary 1:
  by: name
  default: {'config': {'name': None}}
  seq_blocks_2: {'config': {'names': ['output_only'], 'channel_multiplier': [1]}}
  seq_blocks_4: {'config': {'names': ['input_only', 'output_only'], 'channel_multiplier': [1, 1]}}
  seq_blocks_6: {'config': {'names': ['input_only'], 'channel_multiplier': [1]}}

Dictionary 2:
  by: name
  default: {'config': {'name': None}}
  seq_blocks_2: {'config': {'names': ['output_only'], 'channel_multiplier': [1]}}
  seq_blocks_4: {'config': {'names': ['input_only', 'output_only'], 'channel_multiplier': [1, 2]}}
  seq_blocks_6: {'config': {'names': ['input_only'], 'channel_multiplier': [2]}}

Dictionary 3:
  by: name
  default: {'config': {'name': None}}
  seq_blocks_2: {'config': {'names': ['output_only'], 'channel_multiplier': [2]}}
  seq_blocks_4: {'config': {'names': ['input_only', 'output_only'], 'channel_multiplier': [2, 1]}}
  seq_blocks_6: {'config': {'names': ['input_only'], 'channel_multiplier': [1]}}

Dictionary 4:
  by: na

In [152]:
import torch
from torchmetrics.classification import MulticlassAccuracy
from chop.passes.graph.transforms import (
    quantize_transform_pass,
    summarize_quantization_analysis_pass,
)

# mg, _ = init_metadata_analysis_pass(mg, None)
# mg, _ = add_common_metadata_analysis_pass(mg, {"dummy_in": dummy_in})
# mg, _ = add_software_metadata_analysis_pass(mg, None)

metric = MulticlassAccuracy(num_classes=5)
num_batchs = 5
# This first loop is basically our search strategy,
# in this case, it is a simple brute force search

recorded_accs = []
for i, pass_config in enumerate(combinations):

    # print the config multiplier
    print('Config Multiplier:', pass_config['seq_blocks_4']['config']['channel_multiplier'], '\n')
    
    # we need to make a deep copy of the graph and the config so that the original graph is used when we multiply the channels
    config = copy.deepcopy(pass_config)
    new_mg = copy.deepcopy(mg)

    #Function to redefine the linear transform
    new_mg , _ = redefine_linear_transform_pass_2(graph=new_mg, pass_args={"config": config})
    print(new_mg.modules['seq_blocks'], '\n')

    j = 0

    # this is the inner loop, which we also call the runner
    acc_avg, loss_avg = 0, 0
    accs, losses = [], []
    
    for inputs in data_module.train_dataloader():
        xs, ys = inputs
        # evaluate the transformed graph, not the original `mg`
        preds = new_mg.model(xs)
        loss = torch.nn.functional.cross_entropy(preds, ys)
        acc = metric(preds, ys)
        accs.append(acc)
        losses.append(loss)
        if j > num_batchs:
            break
        j += 1
    acc_avg = sum(accs) / len(accs)
    loss_avg = sum(losses) / len(losses)
    recorded_accs.append(acc_avg)


Config Multiplier: [1, 1] 

Module(
  (0): BatchNorm1d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (1): ReLU(inplace=True)
  (2): Linear(in_features=16, out_features=32, bias=True)
  (3): ReLU(inplace=True)
  (4): Linear(in_features=32, out_features=64, bias=True)
  (5): ReLU(inplace=True)
  (6): Linear(in_features=64, out_features=5, bias=True)
  (7): ReLU(inplace=True)
) 

Config Multiplier: [1, 2] 

Module(
  (0): BatchNorm1d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (1): ReLU(inplace=True)
  (2): Linear(in_features=16, out_features=32, bias=True)
  (3): ReLU(inplace=True)
  (4): Linear(in_features=32, out_features=128, bias=True)
  (5): ReLU(inplace=True)
  (6): Linear(in_features=128, out_features=5, bias=True)
  (7): ReLU(inplace=True)
) 

Config Multiplier: [2, 1] 

Module(
  (0): BatchNorm1d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (1): ReLU(inplace=True)
  (2): Linear(in_features=16, out_feature

In [154]:
print(acc_avg)
print(recorded_accs)

tensor(0.1774)
[tensor(0.1726), tensor(0.2786), tensor(0.1560), tensor(0.2470), tensor(0.1786), tensor(0.1036), tensor(0.2262), tensor(0.2345), tensor(0.1762), tensor(0.1060), tensor(0.2944), tensor(0.1429), tensor(0.1774)]


Can you then design a search that can reach a network with this kind of structure?
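One possible way to build such a search space, sketched under the assumption that the `names` / `channel_multiplier` config layout used above is kept: enumerate every pair of multipliers with `itertools.product`, and tie each layer's input multiplier to the previous layer's output multiplier so the shapes stay consistent. `build_config` is a hypothetical helper, not part of MASE.

```python
import itertools


def build_config(mult_a, mult_b):
    """Config where layer 2 widens by mult_a and layer 4 by mult_b."""
    return {
        "by": "name",
        "default": {"config": {"name": None}},
        "seq_blocks_2": {
            "config": {"names": ["output_only"], "channel_multiplier": [mult_a]}
        },
        "seq_blocks_4": {
            "config": {
                "names": ["input_only", "output_only"],
                "channel_multiplier": [mult_a, mult_b],
            }
        },
        "seq_blocks_6": {
            "config": {"names": ["input_only"], "channel_multiplier": [mult_b]}
        },
    }


choices = [1, 2, 3, 4, 5]
# every (mult_a, mult_b) pair; each config is built fresh, so nothing is shared
search_space = [build_config(a, b) for a, b in itertools.product(choices, repeat=2)]
print(len(search_space))  # 25
```

Building each config from scratch avoids the shared-nested-dict problem altogether, and the full product covers pairs such as `(1, 5)` that a hand-written list can miss.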

### Question 4

4. Integrate the search into the `chop` flow so that we can run it from the command line.

## Optional task (scaling the search to real networks)

We have looked at how to search, at the architecture level, over a simple network built from linear layers. MASE has the following components that you can have a look at:

* [Cifar10 dataset](https://github.com/DeepWok/mase/blob/main/machop/chop/dataset/vision/cifar.py)
* [VGG](https://github.com/DeepWok/mase/blob/main/machop/chop/models/vision/vgg_cifar/vgg_cifar.py), this is a variant used for CIFAR
* [TPE-based Search](https://github.com/DeepWok/mase/blob/main/machop/chop/actions/search/strategies/optuna.py), implemented using [Optuna](https://optuna.readthedocs.io/en/stable/reference/index.html)

Can you define a search space (maybe channel dimension) for the VGG network, and use the TPE-search to tune it?
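As a starting point, here is a rough sketch of what such a search could look like when driving Optuna directly, rather than through the MASE strategy linked above. Everything named here is an assumption for illustration: `scale_widths`, `run_tpe_search`, the `evaluate` callback and the base widths are hypothetical and are not taken from the MASE VGG definition.

```python
def scale_widths(base_widths, multipliers):
    """Scale each stage's channel count by its own multiplier."""
    return [w * m for w, m in zip(base_widths, multipliers)]


def run_tpe_search(evaluate, base_widths, choices=(1, 2, 4), n_trials=20):
    """Tune per-stage channel multipliers with Optuna's TPE sampler.

    `evaluate(widths) -> float` is supplied by the caller, e.g. build a
    network with these widths, train briefly, return validation accuracy.
    """
    import optuna  # third-party; imported here so scale_widths works without it

    def objective(trial):
        # one categorical choice per stage defines the search space
        multipliers = [
            trial.suggest_categorical(f"stage_{i}_multiplier", list(choices))
            for i in range(len(base_widths))
        ]
        return evaluate(scale_widths(base_widths, multipliers))

    study = optuna.create_study(
        direction="maximize", sampler=optuna.samplers.TPESampler(seed=0)
    )
    study.optimize(objective, n_trials=n_trials)
    return study.best_params, study.best_value


# hypothetical base widths, used only to show the scaling step
example_widths = scale_widths([64, 128, 256], [1, 2, 4])
print(example_widths)  # [64, 256, 1024]
```

Unlike the grid search used earlier, TPE proposes new multipliers based on the results of previous trials, so it can be given a larger set of `choices` without evaluating every combination.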