# Classification (step through)

# Follow the MegaDetector Classification repo first

# Overview


This README describes how to train and run an animal "species" classifier. "Species" is in quotes, because the classifier can be trained to identify animals at arbitrary levels within the biological taxonomy of animals.

This guide is written for internal use at Microsoft AI for Earth. Certain services, such as MegaDB and various private repos, are only accessible internally within Microsoft. However, this guide may still be of interest to more technical users of the AI for Earth Camera Trap services.

The classifiers trained with this pipeline are intended to be used in conjunction with MegaDetector, i.e., we use MegaDetector to find animals and crop them out, and we train/run our classifiers on those crops.



# Setup

### 1. Install Environment through yml file

Install Anaconda or miniconda3. Then create the conda environment using the following command:

In [None]:
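# `!cd` and `!conda activate` each run in a throwaway subshell and have no lasting effect,
# so change directory with %cd and name the target environment explicitly with -n.
!conda create -y -n cameratraps-classifier
%cd git/cameratraps/
!conda env update -n cameratraps-classifier -f environment-classifier.yml --prune
# This can take a while (about 30 minutes).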

### 2. Verify that CUDA is available (and deal with the case where it isn't)


Verify that CUDA is available (assumes that the current working directory is the CameraTraps repo root):

In [None]:
!python sandbox/torch_test.py
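
For reference, a check of this kind can be sketched in a few lines of Python. This is not the contents of `sandbox/torch_test.py`; it is a minimal illustration that assumes PyTorch may or may not be installed, and the helper name `cuda_status` is hypothetical.

```python
def cuda_status() -> str:
    """Return a one-line summary of whether PyTorch can see a CUDA device."""
    try:
        import torch  # imported lazily so the check fails gracefully
    except ImportError:
        return "PyTorch is not installed in this environment"
    if torch.cuda.is_available():
        # Report the first visible GPU by name.
        return f"CUDA available: True ({torch.cuda.get_device_name(0)})"
    return "CUDA available: False"


print(cuda_status())
```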

If CUDA isn't available but should be (i.e., you have an NVIDIA GPU and recent drivers), the script prints `CUDA available: False`.

YMMV, but in at least one Linux environment, the following fixed this issue:



In [None]:
# -y skips the confirmation prompts, which cannot be answered from a notebook cell
!pip uninstall -y torch torchvision
!conda install -y pytorch=1.10.1 torchvision=0.11.2 -c pytorch

### 3. Optional steps to make classification faster in Linux

If you are on Linux, you may also get some speedup by installing the accimage package for accelerated image loading. Because this is Linux-only and optional, we have commented it out of the environment file, but you can install it with:


In [None]:
!conda install -y -c conda-forge accimage

Similarly, on Linux, you may get some speedup by installing Pillow-SIMD:

In [None]:
!pip uninstall -y pillow
!pip install pillow-simd

### 4. Environment Variables

The following environment variables are useful to have in `.bashrc`:

In [1]:
# Python development
export PYTHONPATH="/path/to/repos/CameraTraps:/path/to/repos/ai4eutils"
export MYPYPATH=$PYTHONPATH

# accessing MegaDB
export COSMOS_ENDPOINT="[INTERNAL_USE]"
export COSMOS_KEY="[INTERNAL_USE]"

# running Batch API
export BATCH_DETECTION_API_URL="http://[INTERNAL_USE]/v3/camera-trap/detection-batch"
export CLASSIFICATION_BLOB_STORAGE_ACCOUNT="[INTERNAL_USE]"
export CLASSIFICATION_BLOB_CONTAINER="classifier-training"
export CLASSIFICATION_BLOB_CONTAINER_WRITE_SAS="[INTERNAL_USE]"
export DETECTION_API_CALLER="[INTERNAL_USE]"
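
Before running the pipeline, it can help to confirm that these variables are actually visible to Python. The following is a minimal sketch: the variable names are copied from the exports above, while the helper name `missing_env_vars` is hypothetical.

```python
import os

# Names taken from the exports above; trim the list to the services you use.
REQUIRED_VARS = [
    "PYTHONPATH",
    "COSMOS_ENDPOINT",
    "COSMOS_KEY",
    "BATCH_DETECTION_API_URL",
    "CLASSIFICATION_BLOB_STORAGE_ACCOUNT",
    "CLASSIFICATION_BLOB_CONTAINER",
    "CLASSIFICATION_BLOB_CONTAINER_WRITE_SAS",
    "DETECTION_API_CALLER",
]


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
print("Missing variables:", ", ".join(missing) if missing else "none")
```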

# Animl repo setup


#### Clone relevant repos and build Conda environments

To clone the necessary repos, navigate to the `studio-lab-user`'s `home` directory, and from the sidebar menu select the Git icon > "Clone A Repository". Enter the Git URL for the [microsoft/CameraTraps](https://github.com/microsoft/CameraTraps) repo (https://github.com/microsoft/CameraTraps.git), and click "Clone". If the "Search for environment.yml and build Conda environment." box is checked, it should automatically build the `cameratraps` Conda environment.

Follow the same steps to clone the
- [microsoft/ai4eutils](https://github.com/microsoft/ai4eutils) repo
- [animl-analytics](https://github.com/tnc-ca-geo/animl-analytics) repo
- and this ([animl-ml](https://github.com/tnc-ca-geo/animl-ml)) repo

Next, navigate to the `~/CameraTraps/` project root directory and run `conda env update -f environment-classifier.yml --prune` to build the `cameratraps-classifier` Conda environment, which is the primary one we'll be using.

Finally, activate the `cameratraps-classifier` env and install the `azure-cosmos` dependency (it's required but seemed to be missing from the env):

```
conda activate cameratraps-classifier
conda install -n cameratraps-classifier -c conda-forge azure-cosmos
```


In [None]:
# %cd persists across cells; `!cd` would only affect its own subshell
%cd ~
!git clone https://github.com/microsoft/ai4eutils ai4eutils
!git clone https://github.com/tnc-ca-geo/animl-analytics animl-analytics
!git clone https://github.com/tnc-ca-geo/animl-ml animl-ml


#### Add additional directories

Add additional directories (`~/classifier-training`, `~/images`, `~/crops`, etc.) so that the contents of your `home/` directory match the following structure:

```
ai4eutils/                      # Microsoft's AI for Earth Utils repo

animl-analytics/                # animl-analytics repo (utilities for exporting images)

animl-ml/                       # This repo, contains Animl-specific utilities

CameraTraps/                    # Microsoft's CameraTraps repo
    classification/
        BASE_LOGDIR/            # classification dataset and splits
            LOGDIR/             # logs and checkpoints from a single training run

classifier-training/            
    mdcache/                    # cached "MegaDetector" outputs
        v5.0b/                  #   NOTE: MegaDetector is in quotes because we're
            datasetX.json       #   also storing Animl annotations here too
    megaclassifier/             # files relevant to MegaClassifier

crops/                          # local directory to save cropped images
    datasetX/                   # images are organized by dataset
        img0___crop00.jpg

images/                         # local directory to save full-size images
    datasetX/                   # images are organized by dataset
        img0.jpg

```
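
The directories that are not created by cloning a repo can be made in one go. This is a minimal sketch using `pathlib`; the list mirrors the layout above, and the helper name `make_dirs` is hypothetical.

```python
from pathlib import Path

# Directories from the layout above that cloning the repos does not create.
EXTRA_DIRS = [
    "CameraTraps/classification/BASE_LOGDIR",
    "classifier-training/mdcache/v5.0b",
    "classifier-training/megaclassifier",
    "crops",
    "images",
]


def make_dirs(home, relative_dirs=EXTRA_DIRS):
    """Create each directory under `home` (if missing) and return the paths."""
    created = []
    for rel in relative_dirs:
        path = Path(home) / rel
        path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        created.append(path)
    return created
```

Calling `make_dirs(Path.home())` would apply it to your home directory; per-dataset subdirectories such as `crops/datasetX/` are left to the later steps.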



#### Set up environment variables
The following environment variables are useful to have in `.bashrc`:

```bash
# Python development
export PYTHONPATH="/home/<user>/CameraTraps:/home/<user>/ai4eutils"
export MYPYPATH=$PYTHONPATH
```

It's also helpful to set a `$BASE_LOGDIR` variable for the session:
```bash
export BASE_LOGDIR="/home/<user>/CameraTraps/classification/BASE_LOGDIR"
```


### AWS Configuration 

```bash
aws configure
```
```
AWS Access Key ID [None]: enter your Access Key ID
AWS Secret Access Key [None]: enter your Secret Access Key
Default region name [None]: us-west-2
Default output format [None]: json
```

Then we can test whether it worked by running the following command:
``` 
aws s3 ls s3://animl-images-archive-prod 
```

# Training pipeline

## 1. Select classification labels for training


In the Animl interface, the following label filters make sense when exporting the data:
- fox
- bird
- skunk
- rodent
- lizard

Then download the data with the selected labels from the Animl interface by clicking `EXPORT TO COCO`. This produces the `cct.json` file, which we use next to download the images.

## 2. Download all the full-size images referenced in the cct.json (COCO) file


The `download_images.py` script in `animl-analytics/utils/` downloads image files from Amazon S3 and saves them to a local directory.

The code takes two arguments: "--coco-file", the path to the coco file, and "--output-dir", the local directory to download the images to.

The code uses the "boto3" library, which is the Amazon Web Services (AWS) SDK for Python, to access and interact with AWS services. The AWS profile and region are set as environment variables, and a session is established with boto3.

The "download_image_files" function takes a list of image records, the destination directory, and the source bucket (defaulted to "animl-images-archive-prod"). It prints the number of image files being downloaded and downloads each image by using the "download_file" method of the S3 client.

The "load_json" function loads the data from a file in JSON format.

The code runs the "download_image_files" function if both the "--coco-file" and "--output-dir" arguments are provided. If either argument is missing, a message is displayed to the user to supply both arguments.
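
The download loop described above can be sketched as follows. This is an illustration rather than the actual script: it assumes each COCO image record's `file_name` is also its S3 object key, and it takes the S3 client as a parameter so the function can be exercised without AWS credentials.

```python
import json
import os


def load_json(path):
    """Load a JSON file (for example, the exported cct.json)."""
    with open(path) as f:
        return json.load(f)


def download_image_files(img_records, dest_dir, s3_client,
                         src_bucket="animl-images-archive-prod"):
    """Download every image record from `src_bucket` into `dest_dir`."""
    os.makedirs(dest_dir, exist_ok=True)
    print(f"Downloading {len(img_records)} image files to {dest_dir}")
    for record in img_records:
        key = record["file_name"]  # assumption: S3 key == COCO file_name
        dest = os.path.join(dest_dir, os.path.basename(key))
        s3_client.download_file(src_bucket, key, dest)


# Typical wiring (requires boto3 and configured AWS credentials):
#   import boto3
#   session = boto3.Session(region_name="us-west-2")
#   records = load_json("<dataset_name>_cct.json")["images"]
#   download_image_files(records, "images/<dataset_name>", session.client("s3"))
```

Passing the client in, rather than creating it inside the function, is a design choice made for this sketch; the real script sets up its boto3 session from environment variables as described above.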





In [None]:
!python ~/animl-analytics/utils/download_images.py \
  --coco-file ~/classifier-training/mdcache/v5.0b/<dataset_name>_cct.json \
  --output-dir ~/images/<dataset_name>

# Remember to replace <dataset_name> with the name of your dataset


## 3. Create a classification label specification JSON file and convert the annotations to MegaDetector output format

Create a classification label specification JSON file (usually named label_spec.json). This file defines the labels that our classifier will be trained to distinguish, as well as the original dataset labels and/or biological taxa that will map to each classification label. 

Some of the following steps expect the image annotations to be in the same format that MegaDetector outputs after processing a batch of images. <b>To convert the COCO Camera Traps (CCT) file</b> that we exported from Animl to a MegaDetector results file, navigate to the `/home/studio-lab-user/` directory and run:

In [None]:
!python animl-ml/classification/utils/cct_to_md.py \
  --input_filename ~/classifier-training/mdcache/v5.0b/<dataset_name>_cct.json \
  --output_filename ~/classifier-training/mdcache/v5.0b/<dataset_name>_md.json
#remember to change the dataset_name

The `cct_to_md.py` script defines two functions: `cct_to_md` and `_parse_args`.

cct_to_md takes two arguments, the input_filename which is the path to the cct file in json format and the output_filename which is the path to the output file in MegaDetector format.

The function starts by checking if the input file exists, and if the output file name is not provided, it derives it from the input file name by adding "_md-format" before the file extension.

The function then loads the contents of the input file into a dictionary d. It checks if the input file contains the required keys 'annotations', 'images' and 'categories', and if any of the keys is missing, it raises an error.

The function then prepares metadata for the output file. It initializes a defaultdict image_id_to_annotations to store the annotations for each image. It also creates a dictionary category_id_to_name to store the mapping from category ID to category name.

The function then loops over each image in the input file and for each image, it loops over its annotations. If the annotation has a bounding box (bbox), the code creates a detection dictionary and appends it to a list of detections. The detection dictionary contains the category name, confidence score, and the bounding box in MegaDetector format.

Finally, the function writes the output to a file in the MegaDetector format.

_parse_args is a helper function that defines and parses the command line arguments. It takes the input file path and the output file path as command line arguments and returns them as a Namespace object.
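
The core of the conversion can be sketched on in-memory dictionaries. This is not the `cct_to_md.py` source: it assumes every image record carries `width` and `height`, that COCO boxes are absolute-pixel `[x, y, width, height]`, and that ground-truth boxes are given a confidence of 1.0.

```python
from collections import defaultdict


def cct_to_md(cct):
    """Convert a COCO Camera Traps dict to a MegaDetector-style results dict."""
    for key in ("annotations", "images", "categories"):
        if key not in cct:
            raise ValueError(f"CCT data is missing required key: {key}")

    category_id_to_name = {c["id"]: c["name"] for c in cct["categories"]}
    image_id_to_annotations = defaultdict(list)
    for ann in cct["annotations"]:
        image_id_to_annotations[ann["image_id"]].append(ann)

    images_out = []
    for im in cct["images"]:
        detections = []
        for ann in image_id_to_annotations[im["id"]]:
            if "bbox" not in ann:
                continue  # image-level label without a box
            x, y, w, h = ann["bbox"]
            detections.append({
                "category": category_id_to_name[ann["category_id"]],
                "conf": 1.0,  # assumption: ground truth gets full confidence
                # MegaDetector boxes are normalized [x_min, y_min, width, height]
                "bbox": [x / im["width"], y / im["height"],
                         w / im["width"], h / im["height"]],
            })
        images_out.append({"file": im["file_name"], "detections": detections})
    return {"images": images_out}
```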

## 4. For images without ground-truth bounding boxes, generate bounding boxes using MegaDetector


While some labeled images in MegaDB already have ground-truth bounding boxes, other images do not. For the labeled images without bounding box annotations, we run MegaDetector to get bounding boxes. MegaDetector can be run either locally or via the Batch Detection API.

This step consists of 3 sub-steps:
1. Run MegaDetector (either locally or via Batch API) on the queried images.
2. Cache MegaDetector results on the images to JSON files in `classifier-training/mdcache`.
3. Download and crop the images to be used for training the classifier.


### Crop images

To crop images to their detections' respective bounding boxes, run:



In [None]:
# --threshold is irrelevant for ground-truthed detections, but we pass it in anyhow.
# (A comment placed after a trailing backslash would break the line continuation.)
!python animl-ml/classification/utils/crop_detections.py \
    ~/classifier-training/mdcache/v5.0b/<dataset_name>_md.json \
    ~/crops/<dataset_name> \
    --images-dir ~/images/<dataset_name> \
    --threshold 0 \
    --square-crops \
    --threads 50 \
    --logdir $BASE_LOGDIR
# Remember to replace <dataset_name> with the name of your dataset
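
To make the idea of a square crop concrete, the sketch below turns a normalized MegaDetector box into a square pixel box. It is an assumption about what a square crop means (grow the shorter side around the box centre), not the logic of `crop_detections.py`, and the helper name `square_crop_box` is hypothetical.

```python
def square_crop_box(bbox_norm, img_w, img_h):
    """Convert normalized [x_min, y_min, width, height] to a square pixel box.

    Returns (left, top, right, bottom), the tuple layout that Pillow's
    Image.crop accepts. The box may extend past the image edges.
    """
    x, y, w, h = bbox_norm
    left, top = x * img_w, y * img_h
    box_w, box_h = w * img_w, h * img_h
    side = max(box_w, box_h)
    # Grow the shorter dimension symmetrically around the box centre.
    left -= (side - box_w) / 2
    top -= (side - box_h) / 2
    return (round(left), round(top), round(left + side), round(top + side))


print(square_crop_box([0.25, 0.5, 0.5, 0.25], 200, 200))
```

How out-of-bounds regions are handled (padding versus clamping) is left open here.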

In [None]:
# Debug
!conda install -y azure-storage-blob
!conda update -y libffi

### Convert MegaDetector results file to queried_images.json

Microsoft's `CameraTraps/classification/create_classification_dataset.py` takes the output of `json_validator.py` as an input (see the CameraTraps docs for what that script does). To convert our MegaDetector results file to a `queried_images.json` file, run:

In [None]:
!python animl-ml/classification/utils/md_to_queried_images.py \
  --input_filename ~/classifier-training/mdcache/v5.0b/<dataset_name>_md.json \
  --dataset <dataset_name> \
  --output_filename $BASE_LOGDIR/queried_images.json

The script `md_to_queried_images.py` is used to convert a MegaDetector output file (in JSON format) into a file that can be used as input for the `json_validator.py` script. It accepts three arguments:

- `input_filename`: the path to the MegaDetector output file.
- `dataset`: the name of the dataset.
- `output_filename`: (optional) the filename for the output. If not provided, it is derived from the input file name.

The script performs the following steps:

1. Reads the MegaDetector output file.
2. Filters out images that have more than one detection.
3. Converts the MegaDetector output into the format required by json_validator.py.
4. Writes the output to a file in JSON format.
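
The four steps can be sketched on an in-memory dictionary. The output schema here (a mapping keyed by `dataset/file` with `dataset`, `class`, and `bbox` fields) is an assumption made for illustration; the exact schema expected downstream is defined in the CameraTraps repo and is not reproduced here.

```python
def md_to_queried_images(md_results, dataset):
    """Convert MegaDetector-style results into a queried-images style dict."""
    queried = {}
    for image in md_results["images"]:
        detections = image.get("detections") or []
        if len(detections) > 1:
            continue  # filter out images that have more than one detection
        key = f"{dataset}/{image['file']}"
        queried[key] = {
            "dataset": dataset,
            # assumption: the image-level class is the detection's category
            "class": [d["category"] for d in detections],
            "bbox": [{"category": d["category"], "bbox": d["bbox"]}
                     for d in detections],
        }
    return queried
```

Writing the result with `json.dump(queried, f, indent=1)` would complete step 4.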

### Create classification dataset & split crops into train/val/test sets


This step is well documented in the `microsoft/CameraTraps/classification` [README](https://github.com/microsoft/CameraTraps/tree/main/classification#4-create-classification-dataset-and-split-image-crops-into-trainvaltest-sets-by-location)


Preparing a classification dataset for training involves two steps.

1. Create a CSV file (`classification_ds.csv`) representing our classification dataset, where each row in this CSV represents a single training example, which is an image crop with its label. Along with this CSV file, we also create a `label_index.json` JSON file which defines an integer ordering over the string classification label names.
2. Split the training examples into 3 sets (train, val, and test) based on the geographic location where the images were taken. The split is specified by a JSON file (`splits.json`).
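
As a small illustration of what "an integer ordering over the label names" means, the sketch below builds such an index from a list of labels. Alphabetical ordering is an assumption for this example, and `build_label_index` is a hypothetical helper, not a function from the CameraTraps repo.

```python
import json


def build_label_index(labels):
    """Map consecutive integers to the distinct label names, sorted by name."""
    return {index: name for index, name in enumerate(sorted(set(labels)))}


label_index = build_label_index(["fox", "bird", "skunk", "fox", "rodent", "lizard"])
# JSON object keys are strings, so the integer indices become "0", "1", ...
print(json.dumps(label_index, indent=1))
```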


In [None]:
!python CameraTraps/classification/create_classification_dataset.py \
    $BASE_LOGDIR \
    --mode csv splits \
    --queried-images-json $BASE_LOGDIR/queried_images.json \
    --cropped-images-dir ~/crops \
    --detector-output-cache-dir ~/classifier-training/mdcache --detector-version 5.0b \
    --threshold 0 \
    --min-locs 3 \
    --val-frac 0.2 \
    --test-frac 0.2 \
    --method random

The script does the following:

1. Accepts arguments for the output directory, mode (create the CSV file, the splits, or both), test set, queried images, cropped images, detector version, confidence threshold, minimum locations, validation fraction, test fraction, splits method, and label specification.

2. Reads the object detection results and creates a CSV file with information about the dataset, the location of the object in the image, and the label of the object.

3. Uses the `create_classification_csv` function to generate the CSV file describing the classification dataset. This function first filters the detections based on the confidence score, minimum locations, and test set locations. Then it crops the images and saves the information in the CSV file.

4. Creates a label index JSON file that contains the names of all the labels in the dataset and their indices.

5. If the mode includes creating splits, uses either the `create_splits_random` function to split the data randomly into train, validation, and test splits, or the `create_splits_smallest_label_first` function to split the data smallest-label-first, depending on the selected splits method.

6. Saves the split assignments to the file named by `SPLITS_FILENAME` (`splits.json`).

7. Uses the `tqdm` library to show a progress bar during the creation of the dataset.

The dataset is split into training, validation, and test sets by either random sampling or a smallest-label-first strategy. The random split is created by the `create_splits_random` function, while the smallest-label-first split is created by the `create_splits_smallest_label_first` function.

In the `create_splits_random` function, each example's dataset and location are first merged into a single 'dataset/location' string, and the data is then summarized in a DataFrame holding the number of images for each label and location. Candidate splits into training, validation, and test sets are then generated at random, with the fraction of data for each set determined by the input arguments `val_frac` and `test_frac`. A score is calculated for each candidate, and the one with the lowest score is chosen as the final split. The score is the sum of the squared differences between the target fraction and the actual fraction of images for each label in each split.
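
The scoring idea can be sketched with plain dictionaries instead of a DataFrame. This is an illustration of the description above, not the `create_splits_random` source: the number of trials, the seed, and the way candidate splits are drawn (each location assigned independently with the target fractions as weights) are all assumptions.

```python
import random

SPLITS = ("train", "val", "test")


def split_score(counts, assignment, targets):
    """Sum of squared (actual - target) image fractions per label and split.

    counts: {location: {label: n_images}}; assignment: {location: split}.
    """
    labels = sorted({label for per_loc in counts.values() for label in per_loc})
    score = 0.0
    for label in labels:
        total = sum(per_loc.get(label, 0) for per_loc in counts.values())
        if total == 0:
            continue  # nothing to balance for this label
        for split in SPLITS:
            in_split = sum(per_loc.get(label, 0)
                           for loc, per_loc in counts.items()
                           if assignment[loc] == split)
            score += (in_split / total - targets[split]) ** 2
    return score


def best_random_split(counts, val_frac, test_frac, n_trials=1000, seed=0):
    """Draw random location-level splits and keep the lowest-scoring one."""
    targets = {"train": 1.0 - val_frac - test_frac,
               "val": val_frac, "test": test_frac}
    weights = [targets[s] for s in SPLITS]
    rng = random.Random(seed)
    best_assignment, best_score = None, float("inf")
    for _ in range(n_trials):
        assignment = {loc: rng.choices(SPLITS, weights=weights)[0]
                      for loc in counts}
        score = split_score(counts, assignment, targets)
        if score < best_score:
            best_assignment, best_score = assignment, score
    return best_assignment, best_score
```

Assigning whole locations, rather than individual images, keeps images from the same camera location in a single split.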

In the create_splits_smallest_label_first function, the dataset is first transformed into a DataFrame that has the number of images for each label and location. The DataFrame is then sorted based on the number of images for each label, and the smallest label is added to the training set first. The process of adding labels to the training set is repeated until all the labels have been added to either the training, validation, or testing set. The fraction of the data for each set is determined by the input arguments val_frac and test_frac. A label specification file can also be provided to the function through the label_spec_json_path argument, which is a JSON file that maps each label to a set (training, validation, or testing).

Both functions return a dictionary with keys 'train', 'val', and 'test' that map to lists of (dataset, location) tuples, where each tuple identifies a camera location within a dataset.

### Train the classifier

In [None]:
# Note: --batch-size was set to 80 for this run
!python train_classifier.py \
    $BASE_LOGDIR \
    ~/crops \
    --model-name efficientnet-b3 --pretrained \
    --label-weighted \
    --epochs 50 --batch-size 80 --lr 3e-5 \
    --weight-decay 1e-6 \
    --num-workers 4 \
    --logdir $BASE_LOGDIR --log-extreme-examples 3

### Get the result

Per-label precision and recall for each split:

| split | label | precision | recall |
|---|---|---|---|
| train | bird | 0.9961076873175478 | 0.977247414478918 |
| train | fox | 0.9919623603215055 | 0.998618511940004 |
| train | lizard | 0.9695334814344653 | 0.9938191281717632 |
| train | rodent | 0.9977037887485649 | 0.9949622166246851 |
| train | skunk | 0.9686162624821684 | 0.9970631424375918 |
| val | bird | 0.8967679691268693 | 0.8831353919239905 |
| val | fox | 0.8561484918793504 | 0.983344437041972 |
| val | lizard | 0.9367167919799498 | 0.943217665615142 |
| val | rodent | 0.9760956175298805 | 0.8808423215202876 |
| val | skunk | 0.9041916167664671 | 0.8435754189944135 |
| test | bird | 0.9653996101364523 | 0.9317968015051741 |
| test | fox | 0.9631038417649296 | 0.996066089693155 |
| test | lizard | 0.8991060025542784 | 0.9336870026525199 |
| test | rodent | 0.969416126042632 | 0.9199648197009674 |
| test | skunk | 0.9265175718849841 | 0.9764309764309764 |


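Given precision and recall, the type I and type II error can be calculated as follows:

- `type_I_error = 1 - precision`
- `type_II_error = 1 - recall`

Note that `1 - recall` is the false negative rate (the type II error rate), while `1 - precision` is the false discovery rate: the share of predictions for a label that are wrong. It reflects type I errors (false positives) but differs from the false positive rate, which also depends on the number of true negatives.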
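
As a minimal sketch, the two formulas can be applied to the precision and recall values listed above. The numbers below are the test-split rows copied from this document; the function name `error_rates` is hypothetical.

```python
def error_rates(precision, recall):
    """Apply the two formulas from this guide to one (precision, recall) pair."""
    return {
        "type_I_error": 1 - precision,  # 1 - precision (false discovery rate)
        "type_II_error": 1 - recall,    # 1 - recall (false negative rate)
    }


# (precision, recall) for the test split, copied from the results above.
test_results = {
    "bird": (0.9653996101364523, 0.9317968015051741),
    "fox": (0.9631038417649296, 0.996066089693155),
    "lizard": (0.8991060025542784, 0.9336870026525199),
    "rodent": (0.969416126042632, 0.9199648197009674),
    "skunk": (0.9265175718849841, 0.9764309764309764),
}

for label, (precision, recall) in test_results.items():
    rates = error_rates(precision, recall)
    print(f"{label}: type I = {rates['type_I_error']:.4f}, "
          f"type II = {rates['type_II_error']:.4f}")
```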