# Execution (Local)

**Root folder:** /home/bravo-z6/Dropbox/_Exp/_cvd/dino_cov/

**Run Jupyter locally:**

```
jupyter notebook \
  --NotebookApp.allow_origin='https://colab.research.google.com' \
  --port=8888 \
  --NotebookApp.port_retries=0
```

**Connect to the local execution environment**


# **## Training DINO ##**

### Tiny Run

In [None]:
#!python -m torch.distributed.launch --nproc_per_node=8 main_dino.py --arch vit_small --data_path /path/to/imagenet/train --output_dir /path/to/saving_dir

### Data 
#--data_path Data/breast_data/d2/dino_tr/ \

import os

data_dir = "cvd_data/train/dino/"
output_dir = "Results/tiny_cvd/"
os.makedirs(output_dir, exist_ok=True)

!python main_dino.py --arch vit_tiny --batch_size_per_gpu 32 \
                     --epochs 301 \
                     --teacher_temp 0.07 --warmup_teacher_temp_epochs 30 \
                     --norm_last_layer true \
                     --data_path $data_dir \
                     --output_dir $output_dir \
                     --num-channels 1
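The `--teacher_temp 0.07 --warmup_teacher_temp_epochs 30` flags describe a teacher temperature that ramps up linearly and then stays constant. A minimal sketch of that schedule follows. It assumes a starting value of 0.04 (the upstream DINO default for `--warmup_teacher_temp`, which this fork may override), and the function name is illustrative, not part of `main_dino.py`.

```python
def teacher_temp_schedule(warmup_temp, final_temp, warmup_epochs, total_epochs):
    """Per-epoch teacher temperature: linear ramp, then constant."""
    steps = max(warmup_epochs - 1, 1)
    ramp = [
        warmup_temp + (final_temp - warmup_temp) * i / steps
        for i in range(warmup_epochs)
    ]
    return ramp + [final_temp] * (total_epochs - warmup_epochs)


# Values taken from the tiny run above; 0.04 is an assumed starting point.
schedule = teacher_temp_schedule(0.04, 0.07, warmup_epochs=30, total_epochs=301)
print(len(schedule), schedule[0], schedule[-1])
```

With these settings the temperature reaches 0.07 at the 30th epoch and is held there for the remaining epochs.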

### Small Run

In [None]:
import os

data_dir = "cvd_data/train/dino/"
output_dir = "Results/small_cvd/"  # must match the directory passed to --output_dir
os.makedirs(output_dir, exist_ok=True)

!python main_dino.py --arch vit_small \
                     --momentum_teacher 0.9995 \
                     --batch_size_per_gpu 30 \
                     --data_path $data_dir \
                     --output_dir $output_dir \
                     --epochs 101 \
                     --num-channels 1
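`--momentum_teacher 0.9995` sets the starting momentum of the exponential moving average (EMA) that updates the teacher from the student. In upstream DINO this value is increased towards 1 with a cosine schedule. The sketch below illustrates the idea with plain Python lists and one step per epoch for brevity (upstream steps per iteration). The function names are hypothetical and the code is not taken from this fork.

```python
import math


def momentum_schedule(base, final, steps):
    """Cosine schedule that moves from `base` towards `final`."""
    return [
        final + 0.5 * (base - final) * (1 + math.cos(math.pi * i / steps))
        for i in range(steps)
    ]


def ema_update(teacher, student, m):
    """teacher <- m * teacher + (1 - m) * student, element by element."""
    return [m * t + (1 - m) * s for t, s in zip(teacher, student)]


momentum = momentum_schedule(0.9995, 1.0, steps=101)
teacher = ema_update([1.0, 2.0], [0.0, 0.0], m=0.9)
print(momentum[0], momentum[-1], teacher)
```

A momentum this close to 1 means the teacher changes very slowly, which is why small batch sizes are usually paired with a higher starting value.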

# **## Visualizing and Storing attention maps ##**

## --- Visualize Attention Maps ---

```
!python visualize_attention.py --arch vit_small --pretrained_weights Results/checkpoint0080.pth --patch_size 16 \
                                  --image_path Data/breast_data/d2/test/Mri_10_R1_IM-1683-0597.tiff \
                                  --output_dir Attention_maps/
```



### Tiny Run

In [None]:
import os

eval_path = "cvd_data/test/dino/1/" #dino_ts
out_path = "Attention_maps/tiny_cvd/"
os.makedirs(out_path, exist_ok=True)

model_chk = "Results/tiny_cvd/checkpoint0300.pth"
att_names = ["attn-head0.png","attn-head1.png","attn-head2.png","img.png"]

for image in os.listdir(eval_path)[:10]: 
    full_image = eval_path + image
    ##
    !python visualize_attention.py --arch vit_tiny --pretrained_weights $model_chk --patch_size 16 \
                                  --image_path $full_image \
                                  --output_dir $out_path \
                                  --num-channels 1
    ##
    for map_name in att_names: 
        os.rename(out_path + map_name, out_path + os.path.splitext(image)[0] + "_" + map_name)
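The loop above relies on `visualize_attention.py` writing fixed file names, which are then prefixed with the image name so the next image does not overwrite them. A small helper can do the same renaming while tolerating outputs that were not produced (for example, when the script fails on one image). This is a sketch only; `prefix_outputs` and the demo file names are illustrative.

```python
import tempfile
from pathlib import Path


def prefix_outputs(out_dir, image_name, output_names):
    """Rename each expected output to '<image stem>_<output name>'.

    Returns (renamed, missing) as lists of file names.
    """
    out_dir = Path(out_dir)
    stem = Path(image_name).stem
    renamed, missing = [], []
    for name in output_names:
        src = out_dir / name
        if not src.exists():
            missing.append(name)
            continue
        dst = out_dir / f"{stem}_{name}"
        src.replace(dst)
        renamed.append(dst.name)
    return renamed, missing


# Demo in a temporary directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("attn-head0.png", "img.png"):
        (Path(tmp) / name).touch()
    renamed, missing = prefix_outputs(
        tmp, "sample_01.tiff", ["attn-head0.png", "attn-head1.png", "img.png"]
    )
print(renamed, missing)
```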

## --- Store Attention Maps ---

```
!python store_attention.py --arch vit_small --pretrained_weights Results/checkpoint0080.pth --patch_size 16 \
                                  --image_path Data/breast_data/d2/test/Mri_10_R1_IM-1683-0597.tiff \
                                  --output_dir Attention_maps/
```



### Tiny Run

In [None]:
import os

eval_path = "Data/breast_data/d0/dino_ts/test0/"#dino_ts
out_path = "Attention_maps/tiny_d0_gs/arrays/"
model_chk = "Results/tiny_d0_gs/checkpoint0300.pth"

att_names = ["attn-heads.npy"] #,"img.png"

os.makedirs(out_path, exist_ok=True)
for image in os.listdir(eval_path): 
    full_image = eval_path + image
    ##
    !python store_attention.py --arch vit_tiny --pretrained_weights $model_chk --patch_size 16 \
                                  --image_path $full_image \
                                  --output_dir $out_path
    ##
    for map_name in att_names: 
        os.rename(out_path + map_name, out_path + os.path.splitext(image)[0] + "_" + map_name)

# **## Extracting features from DINO ##**

## --- Reorganize features ---

In [None]:
import os
import numpy as np
import pandas as pd
from tqdm import tqdm 

train_meta = pd.read_csv("cvd_data/train_setup_2.csv")
test_meta = pd.read_csv("cvd_data/test_setup_2.csv")

# Reorder test files 
print ("Reorganizing test files.... \nThis might take a while")
for i in tqdm(range(len(test_meta)), ncols=100):
  #
  src = "cvd_data/test/FULL/{0}".format(test_meta.name.iloc[i])
  dst = "cvd_data/test/dino/{0}/{1}".format(test_meta.label.iloc[i], test_meta.name.iloc[i])
  os.makedirs("cvd_data/test/dino/" + str(test_meta.label.iloc[i]), exist_ok=True)
  os.system("cp {0} {1}".format(src, dst))
  #print("cp {0} {1}".format(src, dst))


# Reorder train files 
print ("Reorganizing train files.... \nThis might take a while")
for i in tqdm(range(len(train_meta)), ncols=100):
  #
  src = "cvd_data/train/FULL/{0}".format(train_meta.name.iloc[i])
  dst = "cvd_data/train/dino/{0}/{1}".format(train_meta.label.iloc[i], train_meta.name.iloc[i])
  os.makedirs("cvd_data/train/dino/" + str(train_meta.label.iloc[i]), exist_ok=True)
  os.system("cp {0} {1}".format(src, dst))
  #print("cp {0} {1}".format(src, dst))

Reorganizing test files.... 
This might take a while


100%|█████████████████████████████████████████████████████████████| 870/870 [00:22<00:00, 38.24it/s]


Reorganizing train files.... 
This might take a while


100%|███████████████████████████████████████████████████████████| 6862/6862 [04:11<00:00, 27.26it/s]
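The cell above copies each file into a folder named after its label by shelling out to `cp`, which fails silently on missing files and breaks on names containing spaces. A possible alternative using `shutil` is sketched below. The helper name and the demo files are made up; it takes `(name, label)` pairs, so it does not depend on pandas.

```python
import shutil
import tempfile
from pathlib import Path


def copy_into_class_folders(records, src_root, dst_root):
    """Copy src_root/<name> to dst_root/<label>/<name> for each (name, label)."""
    src_root, dst_root = Path(src_root), Path(dst_root)
    copied, missing = 0, []
    for name, label in records:
        src = src_root / name
        if not src.is_file():
            missing.append(name)  # report instead of failing silently
            continue
        class_dir = dst_root / str(label)
        class_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, class_dir / name)
        copied += 1
    return copied, missing


# Demo with placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src_root, dst_root = Path(tmp) / "FULL", Path(tmp) / "dino"
    src_root.mkdir()
    for name in ("a.png", "b.png"):
        (src_root / name).write_bytes(b"")
    records = [("a.png", 0), ("b.png", 1), ("c.png", 1)]
    copied, missing = copy_into_class_folders(records, src_root, dst_root)
    layout = sorted(p.relative_to(dst_root).as_posix() for p in dst_root.rglob("*.png"))
print(copied, missing, layout)
```

With the metadata frames used in this notebook, the records could be built as `zip(test_meta.name, test_meta.label)`.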


## --- Extract DINO features ---

### Tiny run

In [None]:
import os

train_path = "cvd_data/train/dino/"
eval_path  = "cvd_data/test/dino/"
#eval_path = "Data/breast_data/d0/dino_ts/test0/"#dino_ts
out_path = "Features/tiny_cvd/"
model_chk = "Results/tiny_cvd/checkpoint0300.pth"


!python extract_features.py --arch vit_tiny --imsize 480 --multiscale 0 \
                            --train_data_path $train_path --test_data_path $eval_path \
                            --pretrained_weights $model_chk \
                            --output_dir $out_path --num-channels 1

## --- Find labels after extracting features ---

In [2]:
import numpy as np
import pandas as pd
from tqdm import tqdm 

train_meta = pd.read_csv("cvd_data/train_setup_2.csv")
test_meta = pd.read_csv("cvd_data/test_setup_2.csv")

train_feat = pd.read_csv("Features/small_cvd_s/train_features.csv")
test_feat = pd.read_csv("Features/small_cvd_s/test_features.csv")





# **## Dimensionality reduction on features ##**

## --- Visualize with t-SNE ---

### Tiny run

In [None]:
data = "Features/tiny_cvd/"
subset = "train"
#subset = "test"

mode = "class"
#mode = "birads"

abrv = "tr_" if subset == "train" else "ts_"

# Keep every output under `data` so all modes of a run land in the same folder.
if   mode == "class":
  out_i = data + "tsne/" + abrv + "class_tsne.png"
  out_r = data + "tsne/" + abrv + "class_tsne.npy"
elif mode == "birads": 
  out_i = data + "tsne/" + abrv + "brd_tsne.png"
  out_r = data + "tsne/" + abrv + "brd_tsne.npy"
elif mode == "ID": 
  out_i = data + "tsne/" + abrv + "id_tsne.png"
  out_r = data + "tsne/" + abrv + "id_tsne.npy"

!python visualize_tSNE.py --data_path $data --subset $subset \
                          --num_samples -1 --out_image $out_i --out_results $out_r \
                          --n_components 2 --mode $mode \
                          --noise 50 --n_iter 2000

2022-06-08 19:04:26,079 [INFO ]  Start data loading.... This might take a while.
2022-06-08 19:04:26,095 [INFO ]  Data with shape (6852, 192) successfully loaded! 
2022-06-08 19:04:26,095 [INFO ]  Starting t-SNE with params: 
This might take a while....
[t-SNE] Computing 121 nearest neighbors...
[t-SNE] Indexed 6852 samples in 0.002s...
[t-SNE] Computed neighbors for 6852 samples in 0.867s...
[t-SNE] Computed conditional probabilities for sample 1000 / 6852
[t-SNE] Computed conditional probabilities for sample 2000 / 6852
[t-SNE] Computed conditional probabilities for sample 3000 / 6852
[t-SNE] Computed conditional probabilities for sample 4000 / 6852
[t-SNE] Computed conditional probabilities for sample 5000 / 6852
[t-SNE] Computed conditional probabilities for sample 6000 / 6852
[t-SNE] Computed conditional probabilities for sample 6852 / 6852
[t-SNE] Mean sigma: 0.000000
[t-SNE] KL divergence after 250 iterations with early exaggeration: 30.271259
[t-SNE] KL divergence after 2000 it
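The t-SNE and UMAP cells build their output paths with the same `if`/`elif` pattern. A single helper could derive both paths from the run folder, the method, the subset, and the mode. The sketch below assumes every mode should write under the run's `data` folder; the helper name and lookup tables are illustrative.

```python
from pathlib import Path

SUBSET_PREFIX = {"train": "tr_", "test": "ts_"}
MODE_TAG = {"class": "class", "birads": "brd", "ID": "id"}


def projection_outputs(data_dir, method, subset, mode):
    """Return (image_path, results_path) for a t-SNE or UMAP run."""
    out_dir = Path(data_dir) / method
    stem = f"{SUBSET_PREFIX[subset]}{MODE_TAG[mode]}_{method}"
    return out_dir / f"{stem}.png", out_dir / f"{stem}.npy"


out_i, out_r = projection_outputs("Features/tiny_cvd/", "tsne", "train", "class")
umap_i, umap_r = projection_outputs("Features/tiny_cvd/", "umap", "test", "class")
print(out_i.as_posix(), umap_i.as_posix())
```

An unknown subset or mode raises `KeyError` here, whereas the `if`/`elif` chain would leave `out_i` and `out_r` undefined.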

### Small run

In [None]:
data = "Features/tiny_cvd/"
subset = "train"
#subset = "test"

mode = "class"
#mode = "birads"

abrv = "tr_" if subset == "train" else "ts_"

# Keep every output under `data` so all modes of a run land in the same folder.
if   mode == "class":
  out_i = data + "tsne/" + abrv + "class_tsne.png"
  out_r = data + "tsne/" + abrv + "class_tsne.npy"
elif mode == "birads": 
  out_i = data + "tsne/" + abrv + "brd_tsne.png"
  out_r = data + "tsne/" + abrv + "brd_tsne.npy"
elif mode == "ID": 
  out_i = data + "tsne/" + abrv + "id_tsne.png"
  out_r = data + "tsne/" + abrv + "id_tsne.npy"

!python visualize_tSNE.py --data_path $data --subset $subset \
                          --num_samples -1 --out_image $out_i --out_results $out_r \
                          --n_components 2 --mode $mode \
                          --noise 50 --n_iter 2000


## --- Visualize with UMAP ---



### Tiny run

In [None]:
data = "Features/tiny_cvd/"
#subset = "train"
subset = "test"

mode = "class"

abrv = "tr_" if subset == "train" else "ts_"

# Keep every output under `data` so all modes of a run land in the same folder.
if   mode == "class":
  out_i = data + "umap/" + abrv + "class_umap.png"
  out_r = data + "umap/" + abrv + "class_umap.npy"
elif mode == "birads": 
  out_i = data + "umap/" + abrv + "brd_umap.png"
  out_r = data + "umap/" + abrv + "brd_umap.npy"
elif mode == "ID": 
  out_i = data + "umap/" + abrv + "id_umap.png"
  out_r = data + "umap/" + abrv + "id_umap.npy"

!python visualize_umap.py --data_path $data --subset $subset \
                          --num_samples -1 --out_image $out_i --out_results $out_r \
                          --n_components 2 --n_neighbors 50 --metric euclidean \
                          --mode $mode --noise 0

2022-06-08 19:24:05,203 [INFO ]  Start data loading.... This might take a while.
2022-06-08 19:24:05,213 [INFO ]  Data with shape (870, 192) successfully loaded! 
2022-06-08 19:24:05,213 [INFO ]  Starting UMAP with params: 
This might take a while....
2022-06-08 19:24:49,403 [INFO ]  UMAP done! 
2022-06-08 19:24:49,403 [INFO ]  Results saved! 
2022-06-08 19:24:49,403 [INFO ]  Visualizing.... 
2022-06-08 19:24:49,859 [INFO ]  Figure saved as: Features/tiny_cvd/umap/ts_class_umap.png



# **## Classification on DINO features ##**

## --- Conditioning data ---

In [None]:
import os
import pandas as pd

seq = "d0"
subset = "train"  # val
split = "birads" # acr

#input_path = "Data/breast_data/d2/test/"
#output_path = "Data/breast_data/d2/ts_" + split + "/"

input_path = "Data/breast_data/" + seq + "/train/"
output_path = "Data/breast_data/" + seq + "/tr_" + split + "/"

meta_file = "Data/breast_data/metadata/" + subset + "_acr_birads.csv"

meta = pd.read_csv(meta_file)
files_copied = 0

for file_ in os.listdir(input_path): 
  # File names have four "_"-separated parts: <prefix>_<patient>_<ROI>_<image name>
  _, pat, roi, name_img = file_.split("_")
  # Rows of the metadata table that match this patient and ROI
  indexes = meta.index[(meta["patient"] == "Breast_Mri_" + str(pat)) & (meta['ROI'] == str(roi))].tolist()
  
  if indexes: 
    files_copied += 1
    # Use the first matching row; its `split` column names the target class folder
    found = meta.iloc[indexes[0]][split]
    os.makedirs(os.path.join(output_path, str(found)), exist_ok = True)
    os.system ("cp {0} {1}".format(os.path.join(input_path, file_), os.path.join(output_path, str(found), file_)))
    #print ("cp {0} {1}".format(os.path.join(input_path, file_), os.path.join(output_path, str(found), file_)))

print ("========================================")
print ("Subset: {0}, Sequence: {1}, split: {2} \n".format(subset, seq, split))
print ("Input_path: {0}, \nOutput_path: {1} \n".format(input_path, output_path))
print ("Files copied: {0} \nDone!".format(files_copied))
print ("========================================")

Subset: train, Sequence: d0, split: birads 

Input_path: Data/breast_data/d0/train/, 
Output_path: Data/breast_data/d0/tr_birads/ 

Files copied: 757 
Done!
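The conditioning cell depends on every file name splitting into exactly four `_`-separated parts. A stray file in the input folder would raise `ValueError` and stop the loop. The sketch below parses the name defensively and looks the label up in a dictionary keyed by `(patient, ROI)`. The helper names and the label value in the demo are made up for illustration.

```python
def parse_image_name(file_name):
    """Split '<prefix>_<patient>_<ROI>_<image>' into (patient, roi, image).

    Returns None when the name does not have four parts.
    """
    parts = file_name.split("_", 3)  # the image part may itself contain "_"
    if len(parts) != 4:
        return None
    _, patient, roi, image = parts
    return "Breast_Mri_" + patient, roi, image


def find_label(file_name, labels_by_key):
    """Return the label for a file, or None if unparsable or not in the table."""
    parsed = parse_image_name(file_name)
    if parsed is None:
        return None
    patient, roi, _ = parsed
    return labels_by_key.get((patient, roi))


labels_by_key = {("Breast_Mri_10", "R1"): 4}  # hypothetical label value
print(parse_image_name("Mri_10_R1_IM-1683-0597.tiff"))
print(find_label("Mri_10_R1_IM-1683-0597.tiff", labels_by_key))
print(find_label("notes.txt", labels_by_key))
```

With the `meta` frame from the cell above, the lookup table could be built as `{(p, r): v for p, r, v in zip(meta["patient"], meta["ROI"], meta[split])}`. Note that with duplicate keys the last row wins, whereas the cell above takes the first match.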


## --- kNN ---

### per ACR

In [None]:
model_chk = "Results/tiny_cvd/checkpoint0300.pth"
data_path = "cvd_data/"
split = "class"

train_set = "train/dino/"
val_set = "test/dino/"
dump_features = "Features/dump/"
load_features = "Features/dump/"
patch_size = 16

!python -m torch.distributed.launch --nproc_per_node=1 eval_knn.py \
                --arch vit_tiny --patch_size $patch_size \
                --pretrained_weights $model_chk \
                --checkpoint_key teacher --num_channels 1 \
                --data_path $data_path \
                --load_features $load_features \
                --dump_features $dump_features \
                --train_path $train_set --val_path $val_set

and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects `--local_rank` argument to be set, please
change it to read from `os.environ['LOCAL_RANK']` instead. See 
https://pytorch.org/docs/stable/distributed.html#launch-utility for 
further instructions

| distributed init (rank 0): env://
/usr/bin/nvidia-modprobe: unrecognized option: "-s"

ERROR: Invalid commandline, please run `/usr/bin/nvidia-modprobe --help` for
       usage information.

/usr/bin/nvidia-modprobe: unrecognized option: "-s"

ERROR: Invalid commandline, please run `/usr/bin/nvidia-modprobe --help` for
       usage information.

fatal: not a git repository (or any parent up to mount point /)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
git:
  sha: N/A, status: clean, branch: N/A

arch: vit_tiny
batch_size_per_gpu: 128
checkpoint_key: teacher
data_path: cvd_data/
dist_url: env://
dump_featur

### per BIRADS

In [None]:
model_chk = "Results/tiny_d0_gs/checkpoint0300.pth"
data_path = "Data/breast_data/d0/"
split = "birads"

train_set = "tr_" + split + "/"
val_set = "ts_" + split + "/"
dump_features = "Features/dump/"
load_features = None #"Features/dump/"
patch_size = 16

!python -m torch.distributed.launch --nproc_per_node=1 eval_knn.py \
                --arch vit_tiny --patch_size $patch_size \
                --pretrained_weights $model_chk \
                --checkpoint_key teacher --num_channels 1 \
                --data_path $data_path \
                --load_features $load_features \
                --dump_features $dump_features \
                --train_path $train_set --val_path $val_set

and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects `--local_rank` argument to be set, please
change it to read from `os.environ['LOCAL_RANK']` instead. See 
https://pytorch.org/docs/stable/distributed.html#launch-utility for 
further instructions

| distributed init (rank 0): env://
git:
  sha: cb711401860da580817918b9167ed73e3eef3dcf, status: has uncommited changes, branch: main

arch: vit_tiny
batch_size_per_gpu: 128
checkpoint_key: teacher
data_path: Data/breast_data/d0/
dist_url: env://
dump_features: Features/dump/
gpu: 0
load_features: None
local_rank: 0
nb_knn: [10, 20, 100, 200]
num_channels: 1
num_workers: 10
patch_size: 16
pretrained_weights: Results/tiny_d0_gs/checkpoint0300.pth
rank: 0
temperature: 0.07
train_path: tr_birads/
use_cuda: True
val_path: ts_birads/
world_size: 1
Data loaded with 758 train and 236 val imgs.
Creating ViT with channel size: 1
Model vit_tiny 16x16 built.
Take key teacher in 
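The log above lists `nb_knn: [10, 20, 100, 200]` and `temperature: 0.07`. In upstream DINO these parameterize a weighted k-NN vote: each of the k nearest training features (by cosine similarity) votes for its label with weight `exp(similarity / temperature)`. The following is a minimal NumPy sketch of that idea on toy data. It is not the code in this fork's `eval_knn.py`, and the function name and example arrays are assumptions.

```python
import numpy as np


def weighted_knn_predict(train_feats, train_labels, test_feats, k, temperature, num_classes):
    """Predict labels with a similarity-weighted k-NN vote."""
    train_labels = np.asarray(train_labels)
    # L2-normalize so the dot product equals cosine similarity.
    train = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    test = test_feats / np.linalg.norm(test_feats, axis=1, keepdims=True)
    sims = test @ train.T                              # (n_test, n_train)
    top_idx = np.argsort(-sims, axis=1)[:, :k]         # k most similar per test row
    top_sims = np.take_along_axis(sims, top_idx, axis=1)
    weights = np.exp(top_sims / temperature)           # sharper votes for small T
    votes = np.zeros((test.shape[0], num_classes))
    for c in range(num_classes):
        votes[:, c] = (weights * (train_labels[top_idx] == c)).sum(axis=1)
    return votes.argmax(axis=1)


# Toy 2-D features: two clusters, one per class.
train_feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
train_labels = [0, 0, 1, 1]
test_feats = np.array([[1.0, 0.2], [0.2, 1.0]])
preds = weighted_knn_predict(train_feats, train_labels, test_feats,
                             k=3, temperature=0.07, num_classes=2)
print(preds)
```

A low temperature such as 0.07 makes the closest neighbours dominate the vote, so results tend to vary less across the different `k` values.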