# Recommendation

## Why use Modlee?

> At Modlee, we're on a mission to ensure that everyone, everywhere has access to top-tier machine learning solutions. We're flipping the script on how ML knowledge is shared, going beyond the realms of Hugging Face, GitHub, and Papers with Code. Let's be honest, we're all diving into similar models, right? Modlee is your turbocharged ticket to effortlessly and swiftly connect with the ideal models for your datasets, making your journey smoother, faster, and with minimal effort on your part.

[image]

We're working towards this vision, and would love to give you a sneak peek of our technology. Note that the features shown in this demo are at different stages of development.

## Here's how it works in PyTorch
1) You prepare your dataset:
```
        training_dataloader = torch.utils.data.DataLoader(...)
```
2) Modlee recommends a model close to your target solution by analyzing your dataset and solution requirements:
```
        modlee_model = modlee.Recommender(training_dataloader,max_model_size_MB=10, ...)
```
3) While you train the model, Modlee prepares everything you need for your convenience:
```
        modlee_model.train()
```
4) Modlee auto-documents your experiment locally and learns from non-sensitive details to enhance ML model recommendations for the community:
```
        modlee_model.train_documentation_locations()
```

## Let's see what Modlee recommends for Pascal VOC segmentation ... (~5 mins)

First, let's quickly install the modlee package. This should take about 10 seconds. Thanks for your patience!

In [2]:
import os
SERVER_ENDPOINT = 'http://ec2-3-84-155-233.compute-1.amazonaws.com:7070'

def setup(demo_header='demos_demo04_'):
    os.system(
        f'curl -o modlee-0.0.1.post6-py3-none-any.whl {SERVER_ENDPOINT}/get_wheel/{demo_header}modlee-0.0.1.post6-py3-none-any.whl -O')
    os.system(
        f'curl -o modleesurvey-0.0.1-py3-none-any.whl {SERVER_ENDPOINT}/get_wheel/{demo_header}modleesurvey-0.0.1-py3-none-any.whl -O  > /dev/null 2>&1')
    os.system(
        f'curl -o onnx2torch-1.5.11-py3-none-any.whl {SERVER_ENDPOINT}/get_wheel/{demo_header}onnx2torch-1.5.11-py3-none-any.whl -O  > /dev/null 2>&1')
    os.system("pip3 install -q 'modlee-0.0.1.post6-py3-none-any.whl' 'modleesurvey-0.0.1-py3-none-any.whl' 'onnx2torch-1.5.11-py3-none-any.whl' torch==2.1.0 torchsummary==1.5.1 ipywidgets==7.7.1  > /dev/null 2>&1")
    # os.system("pip3 install -q 'modleesurvey-0.0.1-py3-none-any.whl' 'onnx2torch-1.5.11-py3-none-any.whl' torchsummary==1.5.1 ipywidgets  > /dev/null 2>&1")
    os.system("pip3 install -q onnx_graphsurgeon==0.3.27 --index-url https://pypi.ngc.nvidia.com  > /dev/null 2>&1")
setup()
  
import modlee
modlee.init(api_key="community",run_dir='./')

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 39751  100 39751    0     0  7763k      0 --:--:-- --:--:-- --:--:-- 9704k
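
The `setup()` helper above builds each download URL from `SERVER_ENDPOINT`, the `get_wheel` route, and a `demo_header` prefix. As a minimal sketch of that composition (the names `wheel_url` and `curl_command` are hypothetical helpers, not part of the modlee package), the same commands can be assembled as argument lists, which avoids shell quoting issues if you later pass them to `subprocess.run`:

```python
SERVER_ENDPOINT = 'http://ec2-3-84-155-233.compute-1.amazonaws.com:7070'
WHEELS = [
    'modlee-0.0.1.post6-py3-none-any.whl',
    'modleesurvey-0.0.1-py3-none-any.whl',
    'onnx2torch-1.5.11-py3-none-any.whl',
]

def wheel_url(wheel, demo_header='demos_demo04_', endpoint=SERVER_ENDPOINT):
    """Compose the download URL the same way setup() does (hypothetical helper)."""
    return f'{endpoint}/get_wheel/{demo_header}{wheel}'

def curl_command(wheel, **kwargs):
    """Return a curl argument list that saves the wheel under its own file name."""
    return ['curl', '-s', '-o', wheel, wheel_url(wheel, **kwargs)]

# Build (but do not run) one command per wheel.
commands = [curl_command(w) for w in WHEELS]
for cmd in commands:
    print(' '.join(cmd))
```

Nothing is downloaded here; the sketch only prints the commands so you can inspect them first.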


### 1. You prepare your dataset

In [3]:
import torch, torchvision
import torchvision.transforms as transforms
from torchvision.transforms import v2
# torch.set_default_device('cuda')

transform_train = transforms.Compose([transforms.ToTensor(),transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
# transform_train = transforms.Compose([transforms.ToTensor(),transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),transforms.Resize((300,300))])
def remap_255(x, n_unique=21):
    """Replace label value 255 with n_unique - 1, cast to long, and drop singleton dimensions."""
    mask = x != 255
    x = x.type(torch.LongTensor)
    # Keep the mask on the same device as the converted tensor
    mask = mask.to(x.device)
    x = x.where(mask, n_unique - 1)
    return x.squeeze()
    
# Image pipeline for the VOC dataset: convert to a tensor, then resize to 300x300.
# Named image_transforms so it does not shadow the torchvision.transforms import.
image_transforms = v2.Compose(
    [
        v2.ToTensor(),
        v2.Resize((300,300)),
    ]
)
# CIFAR10 is loaded here but replaced by the VOC segmentation dataset in the next statement
train_dataset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform_train)
train_dataset = torchvision.datasets.VOCSegmentation(
    root='./data', year='2007',
    image_set='test',
    download=True,
    transform=image_transforms,
    # Masks get the same tensor conversion and resize, then the 255 label is remapped
    target_transform=v2.Compose([v2.ToTensor(), v2.Resize((300,300)), v2.Lambda(remap_255)])
    )
train_dataset = torchvision.datasets.wrap_dataset_for_transforms_v2(train_dataset)

# train_dataset.data.to(torch.device('cuda'))
train_dataloader = torch.utils.data.DataLoader(
    train_dataset,
    # batch_size=64,
    batch_size=16,
    pin_memory=True,
    # num_workers=torch.cuda.device_count()*4
    # collate_fn=lambda batch: list(zip(*batch))
    )
# train_dataloader.to(torch.device('cuda'))



Files already downloaded and verified
Using downloaded and verified file: ./data/VOCtest_06-Nov-2007.tar
Extracting ./data/VOCtest_06-Nov-2007.tar to ./data
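
The `target_transform` above passes every segmentation mask through `remap_255`, which replaces the label value 255 with `n_unique - 1`. The following plain-Python sketch illustrates only that remapping rule on a small nested list; `remap_255_list` is a hypothetical stand-in written for illustration and does not perform the tensor casting or squeezing done by the real function:

```python
def remap_255_list(mask, n_unique=21):
    """Map every 255 entry to the last class index; leave other labels unchanged."""
    return [
        [n_unique - 1 if label == 255 else label for label in row]
        for row in mask
    ]

# A toy 2x3 "mask" with two ordinary labels (0 and 15) and the special value 255.
mask = [
    [0, 0, 255],
    [15, 255, 15],
]
remapped = remap_255_list(mask)
print(remapped)  # [[0, 0, 20], [15, 20, 15]]
```

After remapping, every label lies in the range 0 to 20, which lines up with the 21 channels in the model output shape reported in step 2.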


### 2. Modlee recommends a model close to your target solution by analyzing your dataset and solution requirements:

In [4]:
recommender = modlee.recommender.from_modality_task(
    modality='image',
    # task='classification',
    task='segmentation',
    )
recommender.fit(train_dataloader)
modlee_model = recommender.model 


[Modlee] -> Just a moment, analyzing your dataset ...



In [5]:
recommender.get_model_details()

--- Modlee Recommended Model Details --->

[Modlee] -> In case you want to take a deeper look, I saved the summary of my current model recommendation here:
                    file: ./modlee_model.txt

[Modlee] -> I also saved the model as a python editable version (model def, train, val, optimizer):
                    file: ./modlee_model.py
            This is a great place to start your own model exploration!


In [6]:
!cat ./modlee_model.txt
!cat ./modlee_model.py
# train_dataloader.dataset.to('cuda')
b1,b2 = next(iter(train_dataloader))
print(b1.device)
# modlee_model.to(torch.device('cuda'))
print(modlee_model.device, b1.device)
# b1.to(modlee_model.device)
# modlee_model(b1.to(modlee_model.device)).shape
modlee_model(b1).shape

<
   ir_version: 9,
   opset_import: ["" : 17],
   producer_name: "pytorch",
   producer_version: "2.2.0"
>
main_graph (float[input_1_dynamic_axes_1,3,300,300] input_1, float[21,512,1,1] model_classifier_model_4_weight, float[21] model_classifier_model_4_bias, float[64,3,7,7] onnx__Conv_525, float[64] onnx__Conv_526, float[64,64,1,1] onnx__Conv_528, float[64] onnx__Conv_529, float[64,64,3,3] onnx__Conv_531, float[64] onnx__Conv_532, float[256,64,1,1] onnx__Conv_534, float[256] onnx__Conv_535, float[256,64,1,1] onnx__Conv_537, float[256] onnx__Conv_538, float[64,256,1,1] onnx__Conv_540, float[64] onnx__Conv_541, float[64,64,3,3] onnx__Conv_543, float[64] onnx__Conv_544, float[256,64,1,1] onnx__Conv_546, float[256] onnx__Conv_547, float[64,256,1,1] onnx__Conv_549, float[64] onnx__Conv_550, float[64,64,3,3] onnx__Conv_552, float[64] onnx__Conv_553, float[256,64,1,1] onnx__Conv_555, float[256] onnx__Conv_556, float[128,256,1,1] onnx__Conv_558, float[128] onnx__Conv_559, float[128,128,3,3] 

torch.Size([16, 21, 300, 300])

### 3. While you train the model, Modlee prepares everything you need for your convenience:

In [20]:
# Assumes that modlee_model is the recommended model from step 2 (recommender.model)
import inspect
from torch.optim import lr_scheduler

class RecommendedModel(modlee.recommender.RecommendedModel):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        
    def configure_optimizers(self,):
        optimizer = torch.optim.AdamW(
            self.parameters(),
            lr=0.001,
        )
        self.scheduler = lr_scheduler.ReduceLROnPlateau(
            optimizer,
            factor=0.8,
            patience=200,
        )
        return optimizer
    
    def on_train_epoch_end(self) -> None:
        """
        Update the learning rate scheduler
        """
        sch = self.scheduler
        if isinstance(sch, torch.optim.lr_scheduler.ReduceLROnPlateau):
            sch.step(self.trainer.callback_metrics["loss"])
            self.log('scheduler_last_lr',sch._last_lr[0])
        return super().on_train_epoch_end()
    
recd_model = RecommendedModel(modlee_model)

# The built-in configure callbacks function should be the same as the base ModleeModel
print("==== ORIGINAL configure_callbacks ====")
print(inspect.getsource(recd_model.configure_callbacks))
# The updated configure_optimizers, with patience of 200, should be printed
print("==== ORIGINAL configure_optimizers ====")
print(inspect.getsource(modlee.recommender.RecommendedModel.configure_optimizers))
print("==== UPDATED configure_optimizers ====")
print(inspect.getsource(recd_model.configure_optimizers))

==== ORIGINAL configure_callbacks ====
    def configure_callbacks(self):
        base_callbacks = super().configure_callbacks()
        # base_callbacks.append(
        #     pl.callbacks.EarlyStopping(
        #         'val_loss',
        #         patience=10,
        #         verbose=True,)
        # )
        return base_callbacks

==== ORIGINAL configure_optimizers ====
    def configure_optimizers(self,):
        optimizer = torch.optim.AdamW(
            self.parameters(),
            lr=0.001,
        )
        self.scheduler = lr_scheduler.ReduceLROnPlateau(
            optimizer,
            factor=0.8,
            patience=10,
        )
        return optimizer

==== UPDATED configure_optimizers ====
    def configure_optimizers(self,):
        optimizer = torch.optim.AdamW(
            self.parameters(),
            lr=0.001,
        )
        self.scheduler = lr_scheduler.ReduceLROnPlateau(
            optimizer,
            factor=0.8,
            patience=200,
        )
        return optimizer

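The only difference between the original and the overridden `configure_optimizers` is the scheduler's `patience` (10 versus 200). To show why that matters, here is a simplified pure-Python sketch of a reduce-on-plateau rule. It is an illustration written for this walkthrough, not PyTorch's implementation: it ignores the improvement threshold and cooldown options, and `simulate_plateau` is a hypothetical helper.

```python
def simulate_plateau(losses, lr=0.001, factor=0.8, patience=10):
    """Return the final learning rate and the number of reductions (simplified rule)."""
    best = float('inf')
    bad_epochs = 0
    reductions = 0
    for loss in losses:
        if loss < best:
            # Improvement: remember it and reset the counter.
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
        if bad_epochs > patience:
            # Too many epochs without improvement: shrink the learning rate.
            lr *= factor
            reductions += 1
            bad_epochs = 0
    return lr, reductions

# A loss that never improves after the first epoch.
flat_losses = [1.0] * 50
lr_10, n_10 = simulate_plateau(flat_losses, patience=10)
lr_200, n_200 = simulate_plateau(flat_losses, patience=200)
print(n_10, n_200)
```

With a flat loss over 50 epochs, a patience of 10 triggers four reductions under this rule, while a patience of 200 never fires, so the learning rate stays at its initial value.
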
In [6]:
# print(dir(modlee_model))
# import inspect
# print(inspect.getsource(modlee_model.train))
# import lightning.pytorch as pl
# callbacks = modlee_model.configure_callbacks()
# print(callbacks)
# trainer = pl.Trainer(
#     max_epochs=1,
#     # callbacks 2,3,4 (logOutput, logParams, PushAPI) are fine
#     # callback 0 (dataStats) is fine
#     # 1 also seems fine
#     # callbacks=[callbacks[c] for c in [1]], 
#     callbacks=callbacks,
#     enable_model_summary=False,
#     )
# with modlee.start_run() as run:
#     trainer.fit(model=modlee_model, 
#         train_dataloaders=train_dataloader,
#         val_dataloaders=train_dataloader)
recommender.train(max_epochs=1, val_dataloaders=train_dataloader)

INFO: LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
INFO:lightning.pytorch.accelerators.cuda:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


----------------------------------------------------------------
Training your recommended modlee model:
     - Running this model: ./modlee_model.py
     - On the dataloader previously analyzed by the recommender
----------------------------------------------------------------


Sanity Checking: 0it [00:00, ?it/s]



Training: 0it [00:00, ?it/s]



Validation: 0it [00:00, ?it/s]

INFO: Metric val_loss improved. New best score: 16.994
INFO:lightning.pytorch.callbacks.early_stopping:Metric val_loss improved. New best score: 16.994


### 4. Modlee auto-documents your experiment locally and learns from non-sensitive details:
Sharing these details helps improve ML model recommendations for the entire community of Modlee users.

In [7]:
recommender.train_documentation_locations()


-----------------------------------------------------------------------------------------------

Modlee documented all the details about your trained model and experiment here: 

        Path: /home/ubuntu/projects/modlee_survey/notebooks/mlruns/0/23cd9c1a052c49a88e1b73c22a4ad574/
        Experiment_id: automatically assigned to | 0
        Run_id: automatically assigned to | 23cd9c1a052c49a88e1b73c22a4ad574

-----------------------------------------------------------------------------------------------
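
The reported path ends with the experiment id followed by the run id. As a small sketch using only the standard library, the two ids can be recovered from the path string; `parse_run_path` is a hypothetical helper, and the `mlruns/<experiment_id>/<run_id>` layout is assumed from the output above:

```python
from pathlib import PurePosixPath

def parse_run_path(path):
    """Return (experiment_id, run_id) from a path containing mlruns/<exp>/<run>."""
    parts = PurePosixPath(path).parts
    idx = parts.index('mlruns')
    return parts[idx + 1], parts[idx + 2]

run_path = '/home/ubuntu/projects/modlee_survey/notebooks/mlruns/0/23cd9c1a052c49a88e1b73c22a4ad574/'
experiment_id, run_id = parse_run_path(run_path)
print(experiment_id, run_id)
```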

