# FIBAD Getting Started

In this getting started notebook, we'll create an instance of a FIBAD object, train a built-in model on the CIFAR training dataset, and then use that trained model to run inference on the CIFAR testing dataset.

## Create a FIBAD instance

In [1]:
import fibad

f = fibad.Fibad()

[2025-02-04 12:58:33,912 fibad:INFO] Runtime Config read from: /home/drew/code/fibad/src/fibad/fibad_default_config.toml


## Update the configuration

In [2]:
f.config["model"]["name"] = "ExampleAutoencoder"

For this demo, we'll make a few adjustments to the default configuration settings that the `fibad` object was instantiated with.
By accessing the `.config` attribute of the fibad instance, we can modify any configuration value.
There are many configuration values that can be set, but here, we update only the model to train.
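
When several values need to change at once, the nested indexing can get repetitive. The sketch below shows one possible way to apply a batch of overrides, assuming `f.config` behaves like a nested dictionary (as the indexing above suggests). The helper `apply_overrides` is hypothetical and not part of FIBAD, and the `config` dictionary is only a stand-in for `f.config`.

```python
from collections.abc import Mapping


def apply_overrides(config, overrides):
    """Recursively copy values from `overrides` into `config`, in place.

    Hypothetical helper, not part of FIBAD. Nested mappings are merged
    key by key; any other value replaces the existing one.
    """
    for key, value in overrides.items():
        if isinstance(value, Mapping) and isinstance(config.get(key), Mapping):
            apply_overrides(config[key], value)
        else:
            config[key] = value
    return config


# Stand-in for f.config. "PlaceholderModel" is an invented value;
# the batch size of 512 is the one shown in the training log.
config = {
    "model": {"name": "PlaceholderModel"},
    "data_loader": {"batch_size": 512},
}

apply_overrides(config, {"model": {"name": "ExampleAutoencoder"}})
print(config["model"]["name"])
```

In a live session, the same call would presumably take `f.config` as its first argument. Keys that are not mentioned in the overrides are left untouched.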

## Train a model

In [3]:
f.train()

Files already downloaded and verified


[2025-02-04 12:58:34,729 fibad.models.model_registry:INFO] Using criterion: torch.nn.CrossEntropyLoss with default arguments.
2025-02-04 12:58:34,901 ignite.distributed.auto.auto_dataloader INFO: Use data loader kwargs for dataset 'Dataset CifarDataSet': 
	{'sampler': <torch.utils.data.sampler.SubsetRandomSampler object at 0x7f52c3b66e40>, 'batch_size': 512, 'num_workers': 2, 'pin_memory': True}
2025-02-04 12:58:34,902 ignite.distributed.auto.auto_dataloader INFO: Use data loader kwargs for dataset 'Dataset CifarDataSet': 
	{'sampler': <torch.utils.data.sampler.SubsetRandomSampler object at 0x7f528a178320>, 'batch_size': 512, 'num_workers': 2, 'pin_memory': True}
2025/02/04 12:58:35 INFO mlflow.system_metrics.system_metrics_monitor: Started monitoring system metrics.
[2025-02-04 12:58:35,289 fibad.pytorch_ignite:INFO] Training model on device: cuda


  1%|1         | 1/79 [00:00<?, ?it/s]

[2025-02-04 12:59:25,991 fibad.pytorch_ignite:INFO] Total training time: 50.70[s]
[2025-02-04 12:59:25,992 fibad.pytorch_ignite:INFO] Latest checkpoint saved as: /home/drew/code/fibad/docs/pre_executed/results/20250204-125833-train-ka1H/checkpoint_epoch_10.pt
[2025-02-04 12:59:25,992 fibad.pytorch_ignite:INFO] Best metric checkpoint saved as: /home/drew/code/fibad/docs/pre_executed/results/20250204-125833-train-ka1H/checkpoint_10_loss=-120.1850.pt
2025/02/04 12:59:26 INFO mlflow.system_metrics.system_metrics_monitor: Stopping system metrics monitoring...
2025/02/04 12:59:26 INFO mlflow.system_metrics.system_metrics_monitor: Successfully terminated system metrics monitoring!
[2025-02-04 12:59:26,028 fibad.train:INFO] Finished Training


The training output is stored in a time-stamped directory under `./results/`.
By default, a copy of the final configuration used in training is persisted there as `runtime_config.toml`.
To run fibad again with the same configuration, you can reference this `runtime_config.toml` file.

If running in another notebook, instantiate a fibad object like so:
```
new_fibad_instance = fibad.Fibad(config_file='./results/<timestamped_directory>/runtime_config.toml')
```

Or from the command line:
```
>> fibad train --runtime-config ./results/<timestamped_directory>/runtime_config.toml
```
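
To fill in `<timestamped_directory>` without copying it from the log by hand, something like the following sketch could locate the most recent run. It assumes that run directories are named `<YYYYMMDD-HHMMSS>-<verb>-<suffix>`, as in the log output above, so that sorting by name is also sorting by time. The function `latest_run_config` is a hypothetical helper, not part of FIBAD.

```python
from pathlib import Path


def latest_run_config(results_root="./results", verb="train"):
    """Return the runtime_config.toml of the most recent run for `verb`.

    Hypothetical helper. Assumes directory names begin with a
    YYYYMMDD-HHMMSS timestamp, so a plain string sort is chronological.
    """
    root = Path(results_root)
    runs = sorted(
        run_dir
        for run_dir in root.glob(f"*-{verb}-*")
        if (run_dir / "runtime_config.toml").is_file()
    )
    if not runs:
        raise FileNotFoundError(f"no '{verb}' runs with a runtime_config.toml under {root}")
    return runs[-1] / "runtime_config.toml"


# Possible usage in a notebook, once at least one training run exists:
# new_fibad_instance = fibad.Fibad(config_file=str(latest_run_config()))
```

If the naming scheme differs in your version of FIBAD, the glob pattern would need adjusting.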

Note that we're training on only a small amount of CIFAR data here, but FIBAD has demonstrated that it can scale up to training sets with more than 1M samples.

## Run inference

In [4]:
f.config["data_set"]["test_size"] = 1.0
f.config["data_set"]["train_size"] = 0.0
f.config["data_set"]["validate_size"] = 0.0
f.config["data_loader"]["batch_size"] = 128

f.infer()

Files already downloaded and verified


[2025-02-04 12:59:26,734 fibad.models.model_registry:INFO] Using criterion: torch.nn.CrossEntropyLoss with default arguments.
[2025-02-04 12:59:26,736 fibad.infer:INFO] data set has length 50000
2025-02-04 12:59:26,736 ignite.distributed.auto.auto_dataloader INFO: Use data loader kwargs for dataset 'Dataset CifarDataSet': 
	{'sampler': None, 'batch_size': 128, 'num_workers': 2, 'pin_memory': True}
[2025-02-04 12:59:27,068 fibad.pytorch_ignite:INFO] Evaluating model on device: cuda
[2025-02-04 12:59:27,068 fibad.pytorch_ignite:INFO] Total epochs: 1
[2025-02-04 12:59:44,114 fibad.pytorch_ignite:INFO] Total evaluation time: 17.05[s]
[2025-02-04 12:59:44,164 fibad.infer:INFO] Inference results saved in: /home/drew/code/fibad/docs/pre_executed/results/20250204-125926-infer-xT9b


Once a model has been trained, we can use the model weights file to run inference.
By default, running `infer` will look for the latest available model weights file.
A specific model weights file can be specified with `f.config['infer']['model_weights_file'] = <path_to_model_weights_file>`.
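
As an illustration of choosing a weights file explicitly, the sketch below picks the highest-epoch checkpoint from a training run directory. The filename pattern `checkpoint_epoch_<N>.pt` is inferred from the single training log line above and may differ between FIBAD versions. The function `latest_epoch_checkpoint` is a hypothetical helper, not part of FIBAD.

```python
import re
from pathlib import Path

# Pattern inferred from the training log ("checkpoint_epoch_10.pt").
EPOCH_CHECKPOINT = re.compile(r"checkpoint_epoch_(\d+)\.pt")


def latest_epoch_checkpoint(run_dir):
    """Return the checkpoint_epoch_<N>.pt file with the largest N.

    Hypothetical helper. Files that do not match the pattern, such as
    the "best metric" checkpoint, are ignored.
    """
    candidates = []
    for path in Path(run_dir).iterdir():
        match = EPOCH_CHECKPOINT.fullmatch(path.name)
        if match:
            # Compare epochs numerically so that epoch 10 sorts after epoch 2.
            candidates.append((int(match.group(1)), path))
    if not candidates:
        raise FileNotFoundError(f"no epoch checkpoints found in {run_dir}")
    return max(candidates, key=lambda pair: pair[0])[1]


# Possible usage, with the run directory taken from your own training log:
# f.config["infer"]["model_weights_file"] = str(latest_epoch_checkpoint(run_dir))
```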

Here, we'll use the most recently trained model weights file and update the data set splits so that 100% of the data is used for inference.
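
The three split assignments can also be wrapped in a small function that checks the values before writing them. This is only a sketch: `set_splits` is a hypothetical helper, and the rule that the fractions must each lie in [0, 1] and sum to at most 1.0 is an assumption of this example, not a documented FIBAD constraint. The `config` dictionary stands in for `f.config`.

```python
def set_splits(config, train=0.0, validate=0.0, test=1.0):
    """Write split fractions into config["data_set"], in place.

    Hypothetical helper. The validation rules are assumptions of this
    sketch rather than documented FIBAD behaviour.
    """
    sizes = {"train_size": train, "validate_size": validate, "test_size": test}
    for name, value in sizes.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a fraction in [0, 1], got {value}")
    # Small tolerance so that sums like 0.7 + 0.2 + 0.1 are not rejected
    # because of floating-point rounding.
    if sum(sizes.values()) > 1.0 + 1e-9:
        raise ValueError("split fractions must not sum to more than 1.0")
    config["data_set"].update(sizes)
    return config


# Stand-in for f.config; the starting values are placeholders,
# not FIBAD's defaults.
config = {"data_set": {"train_size": 0.6, "validate_size": 0.2, "test_size": 0.2}}

# The defaults send all of the data to the test split, matching the cell above.
set_splits(config)
print(config["data_set"])
```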

With the configuration updated, we can run inference by calling `f.infer()`.

The results of running inference are saved in the output directory noted in the last log line.
The default output format is batched `.npy` files.
Additionally, a ChromaDB vector database is populated with the inference results to enable efficient similarity search.
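
For a quick look at the batched output, the files can be read back with NumPy. The sketch below is a rough starting point that assumes every file matched by `pattern` holds an array with the same dtype and trailing shape. The actual file names and layout that FIBAD writes are not described here, so the pattern may need narrowing. The function `load_batched_npy` is a hypothetical helper, not part of FIBAD.

```python
from pathlib import Path

import numpy as np


def load_batched_npy(results_dir, pattern="*.npy"):
    """Load every matching .npy file and concatenate along the first axis.

    Hypothetical helper. Files are read in name order, which is plain
    string order (so "batch_10" sorts before "batch_2").
    """
    paths = sorted(Path(results_dir).glob(pattern))
    if not paths:
        raise FileNotFoundError(f"no files matching {pattern!r} in {results_dir}")
    return np.concatenate([np.load(path) for path in paths], axis=0)


# Possible usage, with the directory taken from the last log line of f.infer():
# results = load_batched_npy("<inference_results_directory>")
# print(results.shape)
```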