<a href="https://colab.research.google.com/github/facebookresearch/vissl/blob/v0.1.6/tutorials/Feature_Extraction_V0_1_6.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Copyright (c) Facebook, Inc. and its affiliates. All rights reserved.

# Feature Extraction

In this tutorial, we look at a simple example of how to use VISSL to extract features from the [ResNet-50 Torchvision pre-trained model](https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py#L16).

You can make a copy of this tutorial by `File -> Open in playground mode` and make changes there. Please do *NOT* request access to this tutorial.

**NOTE:** Please ensure your Colab notebook has a GPU available. To do so, follow: `Edit -> Notebook Settings -> select GPU.`

# Install VISSL


Installing VISSL is straightforward. We will install VISSL from source using pip, following the instructions [here](https://github.com/facebookresearch/vissl/blob/master/INSTALL.md#install-vissl-pip-package). Note that you can also install VISSL in a conda environment or from our conda/pip binaries.

In [None]:
# Install pytorch version 1.8
!pip install torch==1.8.0+cu101 torchvision==0.9.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html

# install Apex by checking system settings: cuda version, pytorch version, and python version
import sys
import torch
version_str="".join([
    f"py3{sys.version_info.minor}_cu",
    torch.version.cuda.replace(".",""),
    f"_pyt{torch.__version__[0:5:2]}"
])
print(version_str)

# install apex (pre-compiled with optimizer C++ extensions and CUDA kernels)
!pip install apex -f https://dl.fbaipublicfiles.com/vissl/packaging/apexwheels/{version_str}/download.html

# clone the vissl repository and check out the v0.1.6 release.
!git clone --recursive https://github.com/facebookresearch/vissl.git

%cd vissl/

!git checkout v0.1.6
!git checkout -b v0.1.6

# install vissl dependencies
!pip install --progress-bar off -r requirements.txt
!pip install opencv-python

# update classy vision install to commit compatible with v0.1.6
!pip uninstall -y classy_vision
!pip install classy-vision@https://github.com/facebookresearch/ClassyVision/tarball/4785d5ee19d3bcedd5b28c1eb51ea1f59188b54d

# Update fairscale to commit compatible with v0.1.6
!pip uninstall -y fairscale
!pip install fairscale@https://github.com/facebookresearch/fairscale/tarball/df7db85cef7f9c30a5b821007754b96eb1f977b6

# install vissl dev mode (e stands for editable)
!pip install -e .[dev]

VISSL should be successfully installed by now and all the dependencies should be available.
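
The install cell derives the Apex wheel directory from the Python, CUDA, and PyTorch versions. As a minimal sketch of that string construction, the hypothetical helper `apex_wheel_tag` below (not part of VISSL or Apex) takes the three versions as plain arguments, so the result can be checked without importing `torch`:

```python
def apex_wheel_tag(python_minor: int, cuda_version: str, torch_version: str) -> str:
    """Rebuild the version string from the install cell using plain values.

    Hypothetical helper for illustration only; it is not part of VISSL.
    """
    return "".join([
        f"py3{python_minor}_cu",
        cuda_version.replace(".", ""),
        # Characters 0, 2 and 4 of the version string: "1.8.0+cu101" -> "180".
        f"_pyt{torch_version[0:5:2]}",
    ])


print(apex_wheel_tag(7, "10.1", "1.8.0+cu101"))  # py37_cu101_pyt180
```

The `[0:5:2]` slice assumes single-digit major, minor, and patch components, so a version such as `1.10.0` would not produce a meaningful tag with this scheme.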

In [None]:
import vissl
import tensorboard
import apex
import torch

## Download the ResNet-50 weights from Torchvision

We download the weights from the [torchvision ResNet50 model](https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py#L16):

In [None]:
!wget https://download.pytorch.org/models/resnet50-19c8e357.pth -P /content/

## Creating a dummy dataset

For the purpose of this tutorial, since we don't have ImageNet on disk, we will create a dummy dataset by copying an image from the COCO dataset into an ImageNet-style folder layout, as below:

In [None]:
!mkdir -p /content/dummy_data/train/class1
!mkdir -p /content/dummy_data/train/class2
!mkdir -p /content/dummy_data/val/class1
!mkdir -p /content/dummy_data/val/class2

# create 2 classes in train and add 5 images per class
!wget http://images.cocodataset.org/val2017/000000439715.jpg -q -O /content/dummy_data/train/class1/img1.jpg
!wget http://images.cocodataset.org/val2017/000000439715.jpg -q -O /content/dummy_data/train/class1/img2.jpg
!wget http://images.cocodataset.org/val2017/000000439715.jpg -q -O /content/dummy_data/train/class1/img3.jpg
!wget http://images.cocodataset.org/val2017/000000439715.jpg -q -O /content/dummy_data/train/class1/img4.jpg
!wget http://images.cocodataset.org/val2017/000000439715.jpg -q -O /content/dummy_data/train/class1/img5.jpg

!wget http://images.cocodataset.org/val2017/000000439715.jpg -q -O /content/dummy_data/train/class2/img1.jpg
!wget http://images.cocodataset.org/val2017/000000439715.jpg -q -O /content/dummy_data/train/class2/img2.jpg
!wget http://images.cocodataset.org/val2017/000000439715.jpg -q -O /content/dummy_data/train/class2/img3.jpg
!wget http://images.cocodataset.org/val2017/000000439715.jpg -q -O /content/dummy_data/train/class2/img4.jpg
!wget http://images.cocodataset.org/val2017/000000439715.jpg -q -O /content/dummy_data/train/class2/img5.jpg

# create 2 classes in val and add 5 images per class
!wget http://images.cocodataset.org/val2017/000000439715.jpg -q -O /content/dummy_data/val/class1/img1.jpg
!wget http://images.cocodataset.org/val2017/000000439715.jpg -q -O /content/dummy_data/val/class1/img2.jpg
!wget http://images.cocodataset.org/val2017/000000439715.jpg -q -O /content/dummy_data/val/class1/img3.jpg
!wget http://images.cocodataset.org/val2017/000000439715.jpg -q -O /content/dummy_data/val/class1/img4.jpg
!wget http://images.cocodataset.org/val2017/000000439715.jpg -q -O /content/dummy_data/val/class1/img5.jpg

!wget http://images.cocodataset.org/val2017/000000439715.jpg -q -O /content/dummy_data/val/class2/img1.jpg
!wget http://images.cocodataset.org/val2017/000000439715.jpg -q -O /content/dummy_data/val/class2/img2.jpg
!wget http://images.cocodataset.org/val2017/000000439715.jpg -q -O /content/dummy_data/val/class2/img3.jpg
!wget http://images.cocodataset.org/val2017/000000439715.jpg -q -O /content/dummy_data/val/class2/img4.jpg
!wget http://images.cocodataset.org/val2017/000000439715.jpg -q -O /content/dummy_data/val/class2/img5.jpg
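
The same layout can also be built in Python by downloading the image once and copying it. This is a minimal sketch: `build_dummy_dataset` is a hypothetical helper, and the source image path is whatever local file you downloaded.

```python
import shutil
from pathlib import Path


def build_dummy_dataset(source_image, root, splits=("train", "val"),
                        classes=("class1", "class2"), images_per_class=5):
    """Copy one image into an ImageNet-style <root>/<split>/<class>/imgN.jpg layout.

    Hypothetical helper for illustration; returns the list of files it created.
    """
    created = []
    for split in splits:
        for cls in classes:
            class_dir = Path(root) / split / cls
            class_dir.mkdir(parents=True, exist_ok=True)
            for i in range(1, images_per_class + 1):
                target = class_dir / f"img{i}.jpg"
                shutil.copyfile(source_image, target)
                created.append(target)
    return created
```

For example, after saving the COCO image to a local file of your choice, `build_dummy_dataset(<that file>, "/content/dummy_data")` would create the same 20 files as the `wget` commands above.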


## Using the custom data in VISSL

The next step is to register the dummy data we created above with VISSL. Registering the dataset involves telling VISSL the dataset name and the paths for the dataset. For this, we create a simple JSON file with the metadata and save it to the `configs/config/dataset_catalog.json` file.

**NOTE**: VISSL uses the specific `dataset_catalog.json` under the path `configs/config/dataset_catalog.json`.

In [None]:
json_data = {
    "dummy_data_folder": {
      "train": [
        "/content/dummy_data/train", "/content/dummy_data/train"
      ],
      "val": [
        "/content/dummy_data/val", "/content/dummy_data/val"
      ]
    }
}

# use VISSL's api to save or you can use your custom code.
from vissl.utils.io import save_file
save_file(json_data, "/content/vissl/configs/config/dataset_catalog.json", append_to_json=False)

Next, we verify that the dataset is registered with VISSL. For that we query VISSL's dataset catalog as below:

In [None]:
from vissl.data.dataset_catalog import VisslDatasetCatalog

# list all the datasets that exist in catalog
print(VisslDatasetCatalog.list())

# get the metadata of dummy_data_folder dataset
print(VisslDatasetCatalog.get("dummy_data_folder"))

['dummy_data_folder']
{'train': ['/content/dummy_data/train', '/content/dummy_data/train'], 'val': ['/content/dummy_data/val', '/content/dummy_data/val']}
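
If you prefer custom code over `save_file`, the catalog can be written with the standard library. The sketch below assumes only the JSON layout shown above; `make_catalog_entry` and `write_catalog` are hypothetical helpers, not VISSL functions.

```python
import json
from pathlib import Path


def make_catalog_entry(name, train_dir, val_dir):
    """Build one catalog entry in the layout used above.

    Each split lists its folder twice, mirroring the cell above.
    """
    return {name: {"train": [train_dir, train_dir], "val": [val_dir, val_dir]}}


def write_catalog(catalog, path):
    """Write the catalog as JSON, overwriting any existing file at `path`."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(catalog, indent=2))
    return path
```

Calling `write_catalog(make_catalog_entry("dummy_data_folder", "/content/dummy_data/train", "/content/dummy_data/val"), "/content/vissl/configs/config/dataset_catalog.json")` would produce the same entry as the cell above.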


# Loading Pre-trained models in VISSL

VISSL supports Torchvision models out of the box. Generally, for loading any non-VISSL model, one needs to correctly set the following configuration options:

```yaml
WEIGHTS_INIT:
  # path to the .torch weights files
  PARAMS_FILE: ""
  # name of the state dict. checkpoint = {"classy_state_dict": {layername:value}}. Options:
  #   1. classy_state_dict - if model is trained and checkpointed with VISSL.
  #      checkpoint = {"classy_state_dict": {layername:value}}
  #   2. "" - if the model_file is not a nested dictionary for model weights i.e.
  #      checkpoint = {layername:value}
  #   3. key name that your model checkpoint uses for state_dict key name.
  #      checkpoint = {"your_key_name": {layername:value}}
  STATE_DICT_KEY_NAME: "classy_state_dict"
  # specify what layer should not be loaded. Layer names with this key are not copied
  # By default, set to BatchNorm stats "num_batches_tracked" to be skipped.
  SKIP_LAYERS: ["num_batches_tracked"]
  ####### If loading a non-VISSL trained model, set the following two args carefully #########
  # to make the checkpoint compatible with VISSL, if you need to remove some names
  # from the checkpoint keys, specify the name
  REMOVE_PREFIX: ""
  # In order to load the model (if not trained with VISSL) with VISSL, there are 2 scenarios:
  #    1. If you are interested in evaluating the model features and freeze the trunk.
  #       Set APPEND_PREFIX="trunk.base_model." This assumes that your model is compatible
  #       with the VISSL trunks. The VISSL trunks start with "_feature_blocks." prefix. If
  #       your model doesn't have these prefix you can append them. For example:
  #       For TorchVision ResNet trunk, set APPEND_PREFIX="trunk.base_model._feature_blocks."
  #    2. where you want to load the model simply and finetune the full model.
  #       Set APPEND_PREFIX="trunk."
  #       This assumes that your model is compatible with the VISSL trunks. The VISSL
  #       trunks start with "_feature_blocks." prefix. If your model doesn't have these
  #       prefix you can append them.
  #       For TorchVision ResNet trunk, set APPEND_PREFIX="trunk._feature_blocks."
  # NOTE: the prefix is appended to all the layers in the model
  APPEND_PREFIX: ""
  ```
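
To make the effect of these options concrete, here is a rough sketch of the key remapping they describe, applied to a toy dictionary. It follows the comments above and is not VISSL's actual loading code; `remap_checkpoint_keys` and the placeholder values are hypothetical.

```python
def remap_checkpoint_keys(checkpoint, state_dict_key_name="",
                          skip_layers=("num_batches_tracked",),
                          remove_prefix="", append_prefix=""):
    """Illustrative key remapping based on the option descriptions above.

    A sketch only; this is not VISSL's implementation.
    """
    # An empty key name means the checkpoint is already a flat {layername: value} dict.
    state_dict = checkpoint[state_dict_key_name] if state_dict_key_name else checkpoint
    remapped = {}
    for name, value in state_dict.items():
        # Drop any layer whose name contains one of the skip keys.
        if any(skip in name for skip in skip_layers):
            continue
        if remove_prefix and name.startswith(remove_prefix):
            name = name[len(remove_prefix):]
        remapped[append_prefix + name] = value
    return remapped


# Toy torchvision-style checkpoint; the values are placeholders, not real tensors.
torchvision_style = {
    "conv1.weight": "w0",
    "bn1.num_batches_tracked": 0,
    "layer1.0.conv1.weight": "w1",
}
vissl_style = remap_checkpoint_keys(
    torchvision_style, append_prefix="trunk.base_model._feature_blocks."
)
print(sorted(vissl_style))
# ['trunk.base_model._feature_blocks.conv1.weight',
#  'trunk.base_model._feature_blocks.layer1.0.conv1.weight']
```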

## Extract the TRUNK features

We are ready to extract the TRUNK features now. For the purpose of this tutorial, we will use the dummy dataset created above and extract features from its images. VISSL supports a wide range of datasets and allows adding custom datasets. Please see the VISSL documentation on how to use the datasets. To use ImageNet instead, assuming your ImageNet dataset folder path is `/path/to/my/imagenet/folder/`, you can add the following command line input to your command:
```
config.DATA.TRAIN.DATASET_NAMES=[imagenet1k_folder] \
config.DATA.TRAIN.DATA_SOURCES=[disk_folder] \
config.DATA.TRAIN.DATA_PATHS=["/path/to/my/imagenet/folder/train"] \
config.DATA.TRAIN.LABEL_SOURCES=[disk_folder]
```

VISSL provides a [helper python tool](https://github.com/facebookresearch/vissl/blob/main/tools/run_distributed_engines.py) for launching VISSL jobs. This tool supports:
- training and feature extraction.
- running on 1-gpu, multi-gpu, or even multi-machine using PyTorch DDP or Fairscale FSDP.

VISSL provides yaml configuration files for extracting features [here](https://github.com/facebookresearch/vissl/tree/main/configs/config/feature_extraction). 

For the purpose of this tutorial, we will use the config file for extracting features from several layers of the trunk of ResNet-50 supervised model on 1-gpu.


In [None]:
%cd /content/vissl/
!python3 tools/run_distributed_engines.py \
    hydra.verbose=true \
    config=feature_extraction/extract_resnet_in1k_8gpu \
    +config/feature_extraction/trunk_only=rn50_layers.yaml \
    config.DATA.TRAIN.DATA_SOURCES=[disk_folder] \
    config.DATA.TRAIN.LABEL_SOURCES=[disk_folder] \
    config.DATA.TRAIN.DATASET_NAMES=[dummy_data_folder] \
    config.DATA.TRAIN.BATCHSIZE_PER_REPLICA=2 \
    config.DATA.TEST.DATA_SOURCES=[disk_folder] \
    config.DATA.TEST.LABEL_SOURCES=[disk_folder] \
    config.DATA.TEST.DATASET_NAMES=[dummy_data_folder] \
    config.DATA.TEST.BATCHSIZE_PER_REPLICA=2 \
    config.DISTRIBUTED.NUM_NODES=1 \
    config.DISTRIBUTED.NUM_PROC_PER_NODE=1 \
    config.CHECKPOINT.DIR="/content/checkpoints" \
    config.MODEL.WEIGHTS_INIT.PARAMS_FILE="/content/resnet50-19c8e357.pth" \
    config.MODEL.WEIGHTS_INIT.APPEND_PREFIX="trunk.base_model._feature_blocks." \
    config.MODEL.WEIGHTS_INIT.STATE_DICT_KEY_NAME="" \
    config.EXTRACT_FEATURES.CHUNK_THRESHOLD=-1


/content/vissl
** fvcore version of PathManager will be deprecated soon. **
** Please migrate to the version in iopath repo. **
https://github.com/facebookresearch/iopath 

####### overrides: ['hydra.verbose=true', 'config=feature_extraction/extract_resnet_in1k_8gpu', '+config/feature_extraction/trunk_only=rn50_layers.yaml', 'config.DATA.TRAIN.DATA_SOURCES=[disk_folder]', 'config.DATA.TRAIN.LABEL_SOURCES=[disk_folder]', 'config.DATA.TRAIN.DATASET_NAMES=[dummy_data_folder]', 'config.DATA.TRAIN.BATCHSIZE_PER_REPLICA=2', 'config.DATA.TEST.DATA_SOURCES=[disk_folder]', 'config.DATA.TEST.LABEL_SOURCES=[disk_folder]', 'config.DATA.TEST.DATASET_NAMES=[dummy_data_folder]', 'config.DATA.TEST.BATCHSIZE_PER_REPLICA=2', 'config.DISTRIBUTED.NUM_NODES=1', 'config.DISTRIBUTED.NUM_PROC_PER_NODE=1', 'config.CHECKPOINT.DIR=/content/checkpoints', 'config.MODEL.WEIGHTS_INIT.PARAMS_FILE=/content/resnet50-19c8e357.pth', 'config.MODEL.WEIGHTS_INIT.APPEND_PREFIX=trunk.base_model._feature_blocks.', 'config.MODE

And we are done! We have the features for layers `conv1, res2, res3, res4, res5, res5avg` in `checkpoints/*.npy`. Additionally, we save the data indexes and targets for each image.

In [None]:
!ls /content/checkpoints/

log.txt					rank0_chunk0_train_conv1_features.npy
rank0_chunk0_test_conv1_features.npy	rank0_chunk0_train_conv1_inds.npy
rank0_chunk0_test_conv1_inds.npy	rank0_chunk0_train_conv1_targets.npy
rank0_chunk0_test_conv1_targets.npy	rank0_chunk0_train_res2_features.npy
rank0_chunk0_test_res2_features.npy	rank0_chunk0_train_res2_inds.npy
rank0_chunk0_test_res2_inds.npy		rank0_chunk0_train_res2_targets.npy
rank0_chunk0_test_res2_targets.npy	rank0_chunk0_train_res3_features.npy
rank0_chunk0_test_res3_features.npy	rank0_chunk0_train_res3_inds.npy
rank0_chunk0_test_res3_inds.npy		rank0_chunk0_train_res3_targets.npy
rank0_chunk0_test_res3_targets.npy	rank0_chunk0_train_res4_features.npy
rank0_chunk0_test_res4_features.npy	rank0_chunk0_train_res4_inds.npy
rank0_chunk0_test_res4_inds.npy		rank0_chunk0_train_res4_targets.npy
rank0_chunk0_test_res4_targets.npy	rank0_chunk0_train_res5avg_features.npy
rank0_chunk0_test_res5avg_features.npy	rank0_chunk0_train_res5avg_inds.npy
rank0_chunk0_test_res5avg_in
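
The file names in the listing follow a regular pattern. As a sketch, assuming the pattern `rank<R>_chunk<C>_<split>_<layer>_<kind>.npy` inferred from the listing above (it is not taken from the VISSL source), a small hypothetical parser can split a name into its parts:

```python
import re

# Pattern inferred from the listing above; treat it as an assumption.
FEATURE_FILE = re.compile(
    r"^rank(?P<rank>\d+)_chunk(?P<chunk>\d+)_(?P<split>train|test)"
    r"_(?P<layer>.+)_(?P<kind>features|inds|targets)\.npy$"
)


def parse_feature_file(name):
    """Return the parts of an extracted-feature file name, or None if it does not match."""
    match = FEATURE_FILE.match(name)
    if match is None:
        return None
    parts = match.groupdict()
    parts["rank"] = int(parts["rank"])
    parts["chunk"] = int(parts["chunk"])
    return parts


print(parse_feature_file("rank0_chunk0_test_res5avg_features.npy"))
# {'rank': 0, 'chunk': 0, 'split': 'test', 'layer': 'res5avg', 'kind': 'features'}
print(parse_feature_file("log.txt"))  # None
```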

# Loading Extracted Trunk Features


We also offer a clean and easy-to-use [API](https://github.com/facebookresearch/vissl/blob/v0.1.6/vissl/utils/extract_features_utils.py) for loading and manipulating the extracted features. The cell below loads the `res5` test features and prints the shapes of the features, indexes, and targets.

In [None]:
from vissl.utils.extract_features_utils import ExtractedFeaturesLoader

# We will load the res5 test features
features = ExtractedFeaturesLoader.load_features(
  input_dir="/content/checkpoints/",
  split="test", 
  layer="res5"
)

feature_shape = features['features'].shape
indices_shape = features['inds'].shape
targets_shape = features['targets'].shape

print(f"Res5 test features have the following shape: {feature_shape}")
print(f"Res5 test indexes have the following shape: {indices_shape}")
print(f"Res5 test targets have the following shape: {targets_shape}")

Res5 test features have the following shape: (10, 2048, 2, 2)
Res5 test indexes have the following shape: (10,)
Res5 test targets have the following shape: (10, 1)


# Download Torchvision Model Compatible with VISSL Heads 

Next, we will extract the features from the HEAD of the model. First we must download a VISSL-compatible checkpoint: while we can load the torchvision TRUNK into VISSL without any changes, we must slightly reformat the checkpoint to load the HEAD.

See [here](https://github.com/facebookresearch/vissl/blob/main/extra_scripts/convert_vissl_to_torchvision.py) for an example of the VISSL checkpoint format.

In [None]:
!wget https://dl.fbaipublicfiles.com/vissl/tutorials/resnet_50_torchvision_vissl_compatible.torch -P /content/

In [None]:
!python3 tools/run_distributed_engines.py \
    hydra.verbose=true \
    config=feature_extraction/extract_resnet_in1k_8gpu \
    +config/feature_extraction/with_head=rn50_supervised.yaml \
    config.DATA.TRAIN.DATA_SOURCES=[disk_folder] \
    config.DATA.TRAIN.LABEL_SOURCES=[disk_folder] \
    config.DATA.TRAIN.DATASET_NAMES=[dummy_data_folder] \
    config.DATA.TRAIN.BATCHSIZE_PER_REPLICA=2 \
    config.DATA.TEST.DATA_SOURCES=[disk_folder] \
    config.DATA.TEST.LABEL_SOURCES=[disk_folder] \
    config.DATA.TEST.DATASET_NAMES=[dummy_data_folder] \
    config.DATA.TEST.BATCHSIZE_PER_REPLICA=2 \
    config.DISTRIBUTED.NUM_NODES=1 \
    config.DISTRIBUTED.NUM_PROC_PER_NODE=1 \
    config.CHECKPOINT.DIR="/content/checkpoints" \
    config.MODEL.WEIGHTS_INIT.PARAMS_FILE="/content/resnet_50_torchvision_vissl_compatible.torch"

####### overrides: ['hydra.verbose=true', 'config=feature_extraction/extract_resnet_in1k_8gpu', '+config/feature_extraction/with_head=rn50_supervised.yaml', 'config.DATA.TRAIN.DATA_SOURCES=[disk_folder]', 'config.DATA.TRAIN.LABEL_SOURCES=[disk_folder]', 'config.DATA.TRAIN.DATASET_NAMES=[dummy_data_folder]', 'config.DATA.TRAIN.BATCHSIZE_PER_REPLICA=2', 'config.DATA.TEST.DATA_SOURCES=[disk_folder]', 'config.DATA.TEST.LABEL_SOURCES=[disk_folder]', 'config.DATA.TEST.DATASET_NAMES=[dummy_data_folder]', 'config.DATA.TEST.BATCHSIZE_PER_REPLICA=2', 'config.DISTRIBUTED.NUM_NODES=1', 'config.DISTRIBUTED.NUM_PROC_PER_NODE=1', 'config.CHECKPOINT.DIR=/content/checkpoints', 'config.MODEL.WEIGHTS_INIT.PARAMS_FILE=/content/resnet_50_torchvision_vissl_compatible.torch', 'hydra.verbose=true']
INFO 2021-10-14 19:03:35,049 distributed_launcher.py: 18

And we are done! We have the features for the output of the HEAD. The run writes the features, the data indexes, and the targets of each image.

In [None]:
!ls /content/checkpoints/ | grep heads

rank0_chunk0_test_heads_features.npy
rank0_chunk0_test_heads_inds.npy
rank0_chunk0_test_heads_targets.npy
rank0_chunk0_train_heads_features.npy
rank0_chunk0_train_heads_inds.npy
rank0_chunk0_train_heads_targets.npy
rank0_chunk1_test_heads_features.npy
rank0_chunk1_test_heads_inds.npy
rank0_chunk1_test_heads_targets.npy
rank0_chunk1_train_heads_features.npy
rank0_chunk1_train_heads_inds.npy
rank0_chunk1_train_heads_targets.npy
rank0_chunk2_test_heads_features.npy
rank0_chunk2_test_heads_inds.npy
rank0_chunk2_test_heads_targets.npy
rank0_chunk2_train_heads_features.npy
rank0_chunk2_train_heads_inds.npy
rank0_chunk2_train_heads_targets.npy
rank0_chunk3_test_heads_features.npy
rank0_chunk3_test_heads_inds.npy
rank0_chunk3_test_heads_targets.npy
rank0_chunk3_train_heads_features.npy
rank0_chunk3_train_heads_inds.npy
rank0_chunk3_train_heads_targets.npy
rank0_chunk4_test_heads_features.npy
rank0_chunk4_test_heads_inds.npy
rank0_chunk4_test_heads_targets.npy
rank0_chunk4_train_heads_features.

# Extract the Output of the Model Head

We are ready to extract the output of the HEAD now. We will reuse the same dataset and base configuration and change a few configuration options.

In the `run_distributed_engines.py` command used for the TRUNK features, we replace


```
+config/feature_extraction/trunk_only=rn50_layers.yaml \
```

with the following:

```
+config/feature_extraction/with_head=rn50_supervised.yaml \
```

Let's take a look at the differences between the two config options:


```yaml
# feature_extraction/trunk_only/rn50_layers.yaml
# @package _global_
config:
  MODEL:
    FEATURE_EVAL_SETTINGS:
      EVAL_MODE_ON: True
      FREEZE_TRUNK_ONLY: True
      EXTRACT_TRUNK_FEATURES_ONLY: True
      SHOULD_FLATTEN_FEATS: False
      LINEAR_EVAL_FEAT_POOL_OPS_MAP: [
        ["conv1", ["AvgPool2d", [[10, 10], 10, 4]]],
        ["res2", ["AvgPool2d", [[16, 16], 8, 0]]],
        ["res3", ["AvgPool2d", [[13, 13], 5, 0]]],
        ["res4", ["AvgPool2d", [[8, 8], 3, 0]]],
        ["res5", ["AvgPool2d", [[6, 6], 1, 0]]],
        ["res5avg", ["Identity", []]],
      ]
    TRUNK:
      NAME: resnet
      RESNETS:
        DEPTH: 50
  EXTRACT_FEATURES:
    CHUNK_THRESHOLD: -1
```

```yaml
# feature_extraction/with_head/rn50_supervised.yaml
# @package _global_
config:
  MODEL:
    FEATURE_EVAL_SETTINGS:
      EVAL_MODE_ON: True
      FREEZE_TRUNK_AND_HEAD: True
      EVAL_TRUNK_AND_HEAD: True
    TRUNK:
      NAME: resnet
      RESNETS:
        DEPTH: 50
    HEAD:
      PARAMS: [
        ["mlp", {"dims": [2048, 1000]}],
      ]
  EXTRACT_FEATURES:
    CHUNK_THRESHOLD: -1
```

- For both configs we set `EVAL_MODE_ON: True`.
- Since we are not training, we want to freeze the weights. For extracting the TRUNK, we set: `FREEZE_TRUNK_ONLY: True`, whereas for extracting the HEAD, we set `FREEZE_TRUNK_AND_HEAD: True`. 
- To extract the TRUNK features, we set `EXTRACT_TRUNK_FEATURES_ONLY: True`, since we want to preserve the tensor's shape, we set `SHOULD_FLATTEN_FEATS: False`, and finally we specify the layers we want to extract in `LINEAR_EVAL_FEAT_POOL_OPS_MAP`.
- To extract the HEAD, we set `EVAL_TRUNK_AND_HEAD: True`. We also need to specify the HEAD model; here it is a single fully-connected linear layer with dims `[2048, 1000]`, mapping the TRUNK output to the model output.
- Finally, `CHUNK_THRESHOLD` controls how many features to accumulate before writing them to disk. A value of `-1` keeps all features in memory before writing to disk.
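
The pooling entries in `LINEAR_EVAL_FEAT_POOL_OPS_MAP` determine the spatial size of each extracted feature. The sketch below works through the arithmetic, assuming that the three values of each entry are passed to `AvgPool2d` as kernel size, stride, and padding, and that the trunk is a standard ResNet-50 applied to a 224x224 input. The channel counts and pre-pooling sizes are assumptions and do not come from the config.

```python
def pooled_size(size, kernel, stride, padding):
    """Output length of a pooling window along one dimension (floor mode, no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


# (layer, channels, spatial size before pooling, kernel, stride, padding)
# Channels and spatial sizes are ASSUMED: standard ResNet-50 on a 224x224 input.
# Kernel, stride and padding come from LINEAR_EVAL_FEAT_POOL_OPS_MAP above.
layers = [
    ("conv1", 64, 112, 10, 10, 4),
    ("res2", 256, 56, 16, 8, 0),
    ("res3", 512, 28, 13, 5, 0),
    ("res4", 1024, 14, 8, 3, 0),
    ("res5", 2048, 7, 6, 1, 0),
]

for name, channels, size, kernel, stride, padding in layers:
    out = pooled_size(size, kernel, stride, padding)
    print(f"{name}: ({channels}, {out}, {out}) -> {channels * out * out} values per image")
```

Under these assumptions `res5` comes out as `(2048, 2, 2)`, which agrees with the `(10, 2048, 2, 2)` shape printed when loading the res5 test features for the 10 test images.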

As a reminder, please check the `vissl/config/defaults.yaml` file for more information on all config options.

# Loading Extracted Head Features

Using the same [API](https://github.com/facebookresearch/vissl/blob/v0.1.6/vissl/utils/extract_features_utils.py) as above, we can load the HEAD features.

In [None]:
from vissl.utils.extract_features_utils import ExtractedFeaturesLoader

# We will load the head train features
features = ExtractedFeaturesLoader.load_features(
  input_dir="/content/checkpoints/",
  split="train", 
  layer="heads"
)

# Access the shapes of each of the features.
feature_shape = features['features'].shape
indices_shape = features['inds'].shape
targets_shape = features['targets'].shape

print(f"Head train features have the following shape: {feature_shape}")
print(f"Head train indexes have the following shape: {indices_shape}")
print(f"Head train targets have the following shape: {targets_shape}")

Head train features have the following shape: (10, 1000)
Head train indexes have the following shape: (10,)
Head train targets have the following shape: (10, 1)
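
Each row of the head features holds 1000 values, one per output of the `[2048, 1000]` linear layer. As a small sketch of how such a row might be inspected, assuming each value is a class score, the hypothetical helper `top_k` returns the indices of the largest scores. It is shown on a toy row rather than on real extracted features.

```python
def top_k(scores, k=5):
    """Return the indices of the k largest scores, best first."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Toy row with 6 scores standing in for one 1000-dimensional head output.
row = [0.1, 2.5, -1.0, 0.7, 2.4, 0.0]
print(top_k(row, k=3))  # [1, 4, 3]
```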
