# FlashNet

We will use Tencent I/O traces that were replayed on our NVMe device. This data is ready to be used for training after applying some preprocessing. As for the raw version of Tencent trace, it can be downloaded here: https://www.googleapis.com/drive/v3/files/1Xj6rFBOsY9Wt_XlCiAiZn7LfSCAYy1AB?alt=media&key=AIzaSyBbUdTut1W8uzPO1nzCmcHFIw0KsuO3Dfo

#### Goal
This notebook walks you through preparing a trace profile and using it to train a neural network (NN) model for per-I/O admission control.

## Setup Connection

Connect to Pre-made Lease

**How do you create a lease on Chameleon Cloud?**

* Create Reservation <br>
    Go to https://chi.tacc.chameleoncloud.org/project/ or https://chi.uc.chameleoncloud.org/project/
    
    Reserve Physical Host
    - Click "Leases" => "+ Create Lease"
    - Lease name = "trovi-ceph"
    - Resource filter: "$node_type" = "storage_nvme"
    - Maximum lease length: 7 days

* Launching an Instance
    - In the sidebar, click Compute, then click Instances
    - Click "Launch Instance"
        + pick the correct reservation 
        + count = 1
        + Image: CC-Ubuntu20.04
    - Choose the ssh key

* Allocate floating IPs
    - Book the IP interface
        + Click "Network -> Floating IPs -> Allocate IP To Project"
        + Write description (optional)
        + Click "Allocate IP"
    - Click "Associate" OR click "attach interface"
        + Click "Network -> Floating IPs"
    - Wait a few minutes until the node has spawned

* If you can't reach the cc user over SSH
    - Open the console terminal via the website
    - Edit ~/.ssh/authorized_keys and add your public key manually
    - Note that the hostname will be different
    - Run "sudo dhclient" from the web-based console

Declare basic variables for ssh

In [1]:
export ip_addr="192.5.86.149"
export SSHKEY_NAME="$USER-flashnet"
export SSHKEY_FILE="$HOME/.ssh/$SSHKEY_NAME"
export USER="$USER"

Create ssh folder

In [2]:
mkdir -p ~/.ssh
echo "Host *" > ~/.ssh/config
echo " StrictHostKeyChecking no" >> ~/.ssh/config
sudo rm -rf output/*

Create ssh key pair

In [3]:
rm -rf "$HOME/.ssh/$USER-"*
ssh-keygen -t rsa -b 4096 -P '' -C "$SSHKEY_NAME" -f "$SSHKEY_FILE" <<< y
echo "SSH key with name: $SSHKEY_NAME created"

Generating public/private rsa key pair.
Your identification has been saved in /home/rani_api3939_gmail_com/.ssh/rani_api3939_gmail_com-flashnet
Your public key has been saved in /home/rani_api3939_gmail_com/.ssh/rani_api3939_gmail_com-flashnet.pub
The key fingerprint is:
SHA256:CGqbJS2f1xOtaLNCkLAiB3pjit8KQJUqudvgaqUZvMc rani_api3939_gmail_com-flashnet
The key's randomart image is:
+---[RSA 4096]----+
|   ..            |
|o ..             |
|.*...            |
|O.Bo . . .       |
|B*=oo . S .      |
|*+ O.. o o       |
|o=Xoo = +        |
|.BoEoo o .       |
|+ o. ..          |
+----[SHA256]-----+
SSH key with name: rani_api3939_gmail_com-flashnet created


Add this Trovi session machine's public key to the reserved node

In [4]:
pubkey=$(cat "$SSHKEY_FILE.pub")
echo "echo \"$pubkey\" >> ~/.ssh/authorized_keys"
echo "chmod 644 ~/.ssh/authorized_keys"
echo "echo 'StrictHostKeyChecking no' > ~/.ssh/config; chmod go-rw ~/.ssh/config"

echo "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQC6yeckn+S3SJsTgdSL48tQkp/QkASNoWNx+U7zdHfatsX1x5FGjWtx4x+yIq4rEBXLr703WiV4zGzZSiji6ywTlDBxIyJqpQvlDpflR7xRBBKPwyfu/TDiMXpPRORfoXtLdp9P/DfY64h5rs/6G1MZd+shH6QXJtSrxhTzc3vcGSK1xBvGHIaDVWZ01Rna3QbpzB3H7kurdu0Ku4F2cShVybRFfGgg+QCcuQLb6vtFk+2uPJslj7tY0bZiLA0WJBq6Zw998YXfaLkH3SP3+nE9Ryy62/xGpM4Itzn6nC0ao5pK7nSfPAtzzpOxfbM4XkAK6Pa6IM+qT0ZwO8rlkaOcVc2ALmPvWTdw8fBvRkL/+AUAPKrgZ5yAUCSUBII5CXZoNTnYaUr1ddkVmtTyXx9+gRZY5oci1jODAPBpTV+zvyX6zVMqMUQ7JuKkM1IpQY/bcCgL6163L5HRKabdjli1s8EgWrLfqBvjQ2dsYA898KyK9KSFUpxcWm/s9KVb8ZA2JKQs8CeLO2DBS9rs3IDmwMe1wX3NVO7yMWgKTzm/PJVMzeDAwoWjIdknW7UF267ERttuVRApqVzUHQ/02QpI3Q+mLxneS4pZh2OKMrnx4SDY+UfNzxR5ZIMh+DIQ/+AtCMPFl+oi++G6aIOvJkXpMQK28o03w9kf5oDUVhuw1w== rani_api3939_gmail_com-flashnet" >> ~/.ssh/authorized_keys
chmod 644 ~/.ssh/authorized_keys
echo 'StrictHostKeyChecking no' > ~/.ssh/config; chmod go-rw ~/.ssh/config


Run the output of the cell above on the reserved node.

Steps:
1. Open your local terminal
2. Connect to the node (ssh cc@xxx.xx.xx)
3. Copy the output of the cell above and paste it into that terminal
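
Pasting the same lines twice appends the key twice. The sketch below shows one way to make that step repeatable, under the assumption that you would rather append a key only when it is missing. The function name `add_authorized_key` and the placeholder key are hypothetical, and the demonstration writes to a temporary directory rather than the real `~/.ssh`.

```shell
# Hypothetical helper: append a public key to authorized_keys only if
# that exact line is not already present.
add_authorized_key() {
    local pubkey="$1"
    local ssh_dir="${2:-$HOME/.ssh}"
    local auth_file="$ssh_dir/authorized_keys"

    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir"
    touch "$auth_file" && chmod 600 "$auth_file"

    # -q: quiet, -x: match the whole line, -F: treat the key as a fixed string
    grep -qxF -- "$pubkey" "$auth_file" || printf '%s\n' "$pubkey" >> "$auth_file"
}

# Demonstration with a placeholder key in a temporary directory
demo_dir="$(mktemp -d)"
add_authorized_key "ssh-rsa AAAAEXAMPLE user-flashnet" "$demo_dir"
add_authorized_key "ssh-rsa AAAAEXAMPLE user-flashnet" "$demo_dir"
wc -l < "$demo_dir/authorized_keys"
```

On the reserved node you would call the function with the real public key and omit the second argument so that it targets `~/.ssh`.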

### Check Connection

In [5]:
ssh_command="ssh -o \"StrictHostKeyChecking no\" -i $SSHKEY_FILE cc@$ip_addr"
eval "$ssh_command" pwd 
wait_ssh "$ip_addr"

/home/cc
Waiting up to 300 seconds for SSH on 192.5.86.149...
SSH is running!
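
The `wait_ssh` helper called in the cell above comes from the notebook environment, and its implementation is not shown here. If it is unavailable in your session, a rough equivalent can be sketched as below, assuming bash (for `/dev/tcp`) and the coreutils `timeout` command. The name `wait_for_ssh` and its optional timeout and port arguments are hypothetical. The sketch only checks that the TCP port accepts connections, not that key authentication works.

```shell
# Hypothetical stand-in for wait_ssh: poll until the SSH port accepts
# a TCP connection or the time budget runs out.
wait_for_ssh() {
    local host="$1"
    local max_wait="${2:-300}"   # seconds to keep trying
    local port="${3:-22}"
    local deadline=$(( SECONDS + max_wait ))

    echo "Waiting up to ${max_wait} seconds for SSH on ${host}..."
    while (( SECONDS < deadline )); do
        # Opening /dev/tcp/<host>/<port> succeeds once the port is reachable
        if timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
            echo "SSH is running!"
            return 0
        fi
        sleep 2
    done
    echo "Timed out waiting for SSH on ${host}" >&2
    return 1
}
```

A typical call would be `wait_for_ssh "$ip_addr"`, optionally followed by a shorter timeout such as `wait_for_ssh "$ip_addr" 60`.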


If you reuse a specific node and it has to be rebuilt, **run this cell after the node has been rebuilt** to remove its stale host key. Otherwise, skip it.

In [None]:
# Run locally: the stale host key is stored in this machine's known_hosts, not on the node
ssh-keygen -f "$HOME/.ssh/known_hosts" -R "$ip_addr"

## Step-by-step Guideline:

In [9]:
eval "$ssh_command" /bin/bash << EOF
    git clone https://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@github.com/daniarherikurniawan/flashnet-trovi.git
    cd flashnet-trovi
    find . -type f -iname "*.sh" -exec chmod +x {} \;
    find . -type f -iname "*.py" -exec chmod +x {} \;
    pwd
    sudo chown cc -R .
EOF

Cloning into 'flashnet-trovi'...
/home/cc/flashnet-trovi


### 0. Install Conda Dependencies

In [10]:
# Creating conda env named "flashnet-trovi-env"
eval "$ssh_command" /bin/bash << EOF
    cd flashnet-trovi
    ./install_conda_deps.sh
EOF

Anaconda has already installed.
no change     /home/cc/anaconda3/condabin/conda
no change     /home/cc/anaconda3/bin/conda
no change     /home/cc/anaconda3/bin/conda-env
no change     /home/cc/anaconda3/bin/activate
no change     /home/cc/anaconda3/bin/deactivate
no change     /home/cc/anaconda3/etc/profile.d/conda.sh
no change     /home/cc/anaconda3/etc/fish/conf.d/conda.fish
no change     /home/cc/anaconda3/shell/condabin/Conda.psm1
no change     /home/cc/anaconda3/shell/condabin/conda-hook.ps1
no change     /home/cc/anaconda3/lib/python3.9/site-packages/xontrib/conda.xsh
no change     /home/cc/anaconda3/etc/profile.d/conda.csh
no change     /home/cc/.zshrc
No action taken.
Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done


  current version: 22.9.0
  latest version: 23.1.0

Please update conda by running

    $ conda update -n base -c defaults conda



## Package Plan ##

  environment location: /home/cc/anaconda3/envs/f

In [11]:
# Install ipykernel
eval "$ssh_command" /bin/bash << EOF
    cd flashnet-trovi
    source ~/.zshrc
    which conda
    conda install -n flashnet-trovi-env ipykernel --update-deps --force-reinstall -y
EOF

/home/cc/anaconda3/condabin/conda
Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... Solving environment: ...working... done
done


  current version: 22.9.0
  latest version: 23.1.0

Please update conda by running

    $ conda update -n base -c defaults conda



## Package Plan ##

  environment location: /home/cc/anaconda3/envs/flashnet-trovi-env

  added / updated specs:
    - _libgcc_mutex
    - _openmp_mutex
    - asttokens
    - backcall
    - ca-certificates
    - certifi
    - comm
    - debugpy
    - decorator
    - entrypoints
    - executing
    - ipykernel
    - ipython
    - jedi
    - jupyter_client
    - jupyter_core
    - ld_impl_linux-64
    - libffi
    - libgcc-ng
    - libgomp
    - libsodium
    - libstdcxx-ng
    - matplotlib-inline
    - ncurses
    - nest-asyncio
    - openssl
    - packaging
    - parso
    - pexpect
    - pickleshare
    - pip
    - platformdirs
    - prompt-toolkit
    - psutil
    - pty

In [12]:
FLASHNET=$(eval "$ssh_command" /bin/bash << EOF
    cd flashnet-trovi
    pwd
EOF
)
export FLASHNET

In [13]:
eval "$ssh_command" /bin/bash << EOF
    export FLASHNET=$FLASHNET
EOF
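
Each `eval "$ssh_command"` call opens a fresh remote shell, so an `export` made in one cell does not carry over to the next. The later cells still work because their heredocs use an unquoted `EOF` delimiter, which makes the local shell substitute `$FLASHNET` before the text is sent. The sketch below illustrates the difference, with a local `bash` child process standing in for the SSH session (an assumption made so the snippet runs without a node).

```shell
unset FLASHNET
FLASHNET="/home/cc/flashnet-trovi"   # set locally and deliberately not exported

# Unquoted delimiter: the local shell replaces $FLASHNET before the
# child shell ever sees the text.
unquoted_result=$(bash << EOF
echo "$FLASHNET"
EOF
)

# Quoted delimiter: the text is passed through literally, so the child
# shell looks up its own FLASHNET, which is not set.
quoted_result=$(bash << 'EOF'
echo "${FLASHNET:-unset}"
EOF
)

echo "unquoted: $unquoted_result"
echo "quoted:   $quoted_result"
```

If you switch a cell to a quoted delimiter, for example to protect other `$` characters, the path has to be set inside the heredoc itself.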

### 1. Run Tail Analyzer for Labeling

In [14]:
eval "$ssh_command" /bin/bash << EOF
    source ~/.zshrc
    cd $FLASHNET/model_collection/1_per_io_admission/tail_analyzer/
    conda activate "flashnet-trovi-env" && ./tail_v2.py -files $FLASHNET/data/trace_profile/nvme0n1/tencent.cut.per_50k*.trace
EOF

trace_profiles = ['/home/cc/flashnet-trovi/data/trace_profile/nvme0n1/tencent.cut.per_50k.most_rand_iops.537.trace', '/home/cc/flashnet-trovi/data/trace_profile/nvme0n1/tencent.cut.per_50k.most_size_thpt.222.trace', '/home/cc/flashnet-trovi/data/trace_profile/nvme0n1/tencent.cut.per_50k.rw_60_40.490.trace', '/home/cc/flashnet-trovi/data/trace_profile/nvme0n1/tencent.cut.per_50k.rw_65_35.211.trace', '/home/cc/flashnet-trovi/data/trace_profile/nvme0n1/tencent.cut.per_50k.rw_75_25.379.trace']

Processing /home/cc/flashnet-trovi/data/trace_profile/nvme0n1/tencent.cut.per_50k.most_rand_iops.537.trace
#IO labeled = 50000
Fast IO = 36216
Slow IO = 13784
===== output file : ../dataset/nvme0n1/tencent.cut.per_50k.most_rand_iops.537/profile_v2.labeled
===== output file : ../dataset/nvme0n1/tencent.cut.per_50k.most_rand_iops.537/profile_v2.stats
===== output figure : ../dataset/nvme0n1/tencent.cut.per_50k.most_rand_iops.537/profile_v2.lat_cdf.png

Processing /home/cc/flashnet-trovi/data/trace_pro

### 2. Run Feature Extractor

In [15]:
# run on multiple profiles
eval "$ssh_command" /bin/bash << EOF
    source ~/.zshrc
    cd $FLASHNET/model_collection/1_per_io_admission/feature_extractor/
    conda activate flashnet-trovi-env && ./feat_v2.py -files ../dataset/nvme0n1/tencent.cut.per_50k*/profile_v2.labeled
EOF

trace_profiles = ['../dataset/nvme0n1/tencent.cut.per_50k.most_rand_iops.537/profile_v2.labeled', '../dataset/nvme0n1/tencent.cut.per_50k.most_size_thpt.222/profile_v2.labeled', '../dataset/nvme0n1/tencent.cut.per_50k.rw_60_40.490/profile_v2.labeled', '../dataset/nvme0n1/tencent.cut.per_50k.rw_65_35.211/profile_v2.labeled', '../dataset/nvme0n1/tencent.cut.per_50k.rw_75_25.379/profile_v2.labeled']

Processing ../dataset/nvme0n1/tencent.cut.per_50k.most_rand_iops.537/profile_v2.labeled
Removed 3 first IOs because they don't have enough historical data
===== output file : ../dataset/nvme0n1/tencent.cut.per_50k.most_rand_iops.537/profile_v2.feat_v2.dataset
===== output file : ../dataset/nvme0n1/tencent.cut.per_50k.most_rand_iops.537/profile_v2.feat_v2.readonly.dataset

Processing ../dataset/nvme0n1/tencent.cut.per_50k.most_size_thpt.222/profile_v2.labeled
Removed 3 first IOs because they don't have enough historical data
===== output file : ../dataset/nvme0n1/tencent.cut.per_50k.most_size_

### 3. Train the NN model

In [16]:
# train on multiple datasets
eval "$ssh_command" /bin/bash << EOF
    source ~/.zshrc
    cd $FLASHNET/model_collection/1_per_io_admission/train/
    conda activate flashnet-trovi-env && ./train_and_eval.py -model model_binary_nn -datasets ../dataset/nvme0n1/tencent*cut*per*/profile*feat*.dataset -train_eval_split 50_50
EOF

2023-02-05 15:39:34.860515: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2023-02-05 15:39:34.860542: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2023-02-05 15:39:36.249793: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2023-02-05 15:39:36.249820: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (rani-flashnet4): /proc/driver/nvidia/version does not exist
2023-02-05 15:39:36.250051: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2

### 4. Analyze the model performance

In [18]:
# First, we will gather all the stats
eval "$ssh_command" /bin/bash << EOF
    cd $FLASHNET/model_collection/1_per_io_admission/script/
    source ~/.zshrc
    conda activate flashnet-trovi-env && ./gather_eval_stats.py -files ../dataset/nvme*/*cut*/profile*/*/eval.stats
EOF

Found 10 stats files
===== output file : ../dataset/models_performance.csv


In [28]:
eval "$ssh_command" /bin/bash << EOF
source ~/.zshrc
cd flashnet-trovi
python3 -c '
import matplotlib
matplotlib.use("Agg")  # headless node: render to files instead of a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.read_csv("model_collection/1_per_io_admission/dataset/models_performance.csv")

def plot_bars(df, column):
    # Plot one metric per trace, using only the rows where read_only is False
    df = df[df["read_only"] == False]
    fig, ax = plt.subplots(figsize=(5, 4))
    # One bar per trace, each with a random color
    ax.bar(df["trace_name"].tolist(), df[column].tolist(), color=[np.random.rand(3) for _ in range(len(df))], width=0.4)
    plt.setp(ax.xaxis.get_majorticklabels(), rotation=30, ha="right")
    ax.set_xlabel("Dataset name")
    ax.set_ylabel(column.upper())
    ax.set_title(column.upper() + " on Various Datasets")
    # Save the figure in the current directory (flashnet-trovi), named after the metric
    fig.savefig(column + ".png", bbox_inches="tight")
    plt.close(fig)

plot_bars(df, "roc_auc")
plot_bars(df, "fnr")
plot_bars(df, "fpr")'

EOF

The plots are saved as roc_auc.png, fnr.png, and fpr.png in $FLASHNET on the reserved node.
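
To look at the plots from the Trovi session, the PNG files written by the plotting cell can be copied back with `scp`. The sketch below is a dry run that only prints the commands. The function name `plot_copy_commands`, the local `./output/` destination, and the fallback values are assumptions for illustration; in the notebook, `ip_addr`, `SSHKEY_FILE`, and `FLASHNET` are already set by earlier cells.

```shell
# Fallback values so the snippet runs on its own; earlier cells normally set these.
ip_addr="${ip_addr:-192.5.86.149}"
SSHKEY_FILE="${SSHKEY_FILE:-$HOME/.ssh/$USER-flashnet}"
FLASHNET="${FLASHNET:-/home/cc/flashnet-trovi}"

# Hypothetical helper: print one scp command per metric plot.
plot_copy_commands() {
    local metric
    for metric in roc_auc fnr fpr; do
        # %q quotes the key path so the printed command is safe to re-run
        printf 'scp -i %q cc@%s:%s/%s.png ./output/\n' \
            "$SSHKEY_FILE" "$ip_addr" "$FLASHNET" "$metric"
    done
}

plot_copy_commands
# To actually copy the files, create the folder and pipe the commands to a shell:
#   mkdir -p output && plot_copy_commands | bash
```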