<div align="center" dir="auto">
<p dir="auto"><a href="https://colab.research.google.com/github/encord-team/encord-notebooks/blob/main/colab-notebooks/encord_active_neptune.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
<div align="center" dir="auto">
  <div style="flex: 1; padding: 10px;">
    <a href="https://join.slack.com/t/encordactive/shared_invite/zt-1hc2vqur9-Fzj1EEAHoqu91sZ0CX0A7Q" target="_blank" style="text-decoration:none">
      <img alt="Join us on Slack" src="https://img.shields.io/badge/Join_Our_Community-4A154B?label=&logo=slack&logoColor=white">
    </a>
    <a href="https://docs.encord.com/docs/active-overview" target="_blank" style="text-decoration:none">
      <img alt="Documentation" src="https://img.shields.io/badge/docs-Online-blue">
    </a>
    <a href="https://twitter.com/encord_team" target="_blank" style="text-decoration:none">
      <img alt="Twitter Follow" src="https://img.shields.io/twitter/follow/encord_team?label=%40encord_team&amp;style=social">
    </a>
    <img alt="Python versions" src="https://img.shields.io/pypi/pyversions/encord-active">
    <a href="https://pypi.org/project/encord-active/" target="_blank" style="text-decoration:none">
      <img alt="PyPi project" src="https://img.shields.io/pypi/v/encord-active">
    </a>
    <a href="https://docs.encord.com/docs/active-contributing" target="_blank" style="text-decoration:none">
      <img alt="PRs Welcome" src="https://img.shields.io/badge/PRs-Welcome-blue">
    </a>
    <img alt="Licence" src="https://img.shields.io/github/license/encord-team/encord-active">
  </div>
</div>

## 🏁 Overview

👋 Hi there! 

This 📒 notebook covers:
- Creating an Encord Active project
- Exploring your images
- Curating training data
- Training a neural network model and tracking the experiment with neptune.ai

<br>

> 💡 Learn more about 🟣 Encord Active: 
* [GitHub](https://github.com/encord-team/encord-active) 
* [Docs](https://docs.encord.com/docs/active-overview)

## 🔦 Import Necessary Libraries and Modules

In [1]:
from pathlib import Path
from typing import List

# PyTorch core modules
import torch
import torch.nn as nn

# torchvision transforms and datasets
from torchvision.transforms import CenterCrop, Compose, Normalize, Resize, ToTensor
from torchvision import datasets


# Download Caltech101 dataset
datasets.Caltech101(Path.cwd(), target_type="category", download=True)

## 🟣 Initialize Local Encord Active Project

In [None]:
#@title 👇🏽 Run this utility code for Colab notebooks
import sys
sys.stdout.fileno = lambda: 1
sys.stderr.fileno = lambda: 2

In [2]:
from encord_active.lib.metrics.execute import run_metrics_by_embedding_type
from encord_active.lib.metrics.metric import EmbeddingType
from encord_active.lib.project.local import ProjectExistsError, init_local_project
from encord_active.lib.project.project import Project
from encord_active.public.dataset import ActiveClassificationDataset, ActiveObjectDataset



# If you want to include the Caltech101 category as labels in the project
from encord_active.lib.labels.label_transformer import (
    ClassificationLabel,
    DataLabel,
    LabelTransformer,
)


class ClassificationTransformer(LabelTransformer):
    def from_custom_labels(self, _, data_files: List[Path]) -> List[DataLabel]:
        return [DataLabel(f, ClassificationLabel(class_=f.parent.name)) for f in data_files]
    
label_transformer = ClassificationTransformer()


def collect_all_images(root_folder: Path) -> List[Path]:
    image_extensions = {".jpg", ".jpeg", ".png", ".bmp"}
    image_paths = []

    for file_path in root_folder.glob("**/*"):
        if file_path.suffix.lower() in image_extensions:
            image_paths.append(file_path)

    return image_paths

# Enter path to the downloaded torchvision dataset
root_folder = Path("./caltech101")

# Path to the Encord Active project directory
projects_dir = Path("./ea-caltech/")

# Create the project directory if it does not exist yet
projects_dir.mkdir(parents=True, exist_ok=True)

image_files = collect_all_images(root_folder)

try:
    project_path: Path = init_local_project(
        files=image_files,
        target=projects_dir,
        project_name="neptune_ea_project",
        symlinks=False,
        label_transformer=label_transformer,
    )
except ProjectExistsError as e:
    # Reuse the existing project inside `projects_dir`
    project_path = projects_dir / "neptune_ea_project"
    print(e)  # A project already exists with that name at the given path.

run_metrics_by_embedding_type(
    EmbeddingType.IMAGE,
    data_dir=project_path,
    use_cache_only=True
)

ea_project = Project(project_path)

Importing data: 100%|██████████| 9144/9144 [00:06<00:00, 1330.47it/s]
Constructing project: 100%|██████████| 9144/9144 [00:02<00:00, 3374.34it/s]
Saving label rows: 100%|██████████| 9144/9144 [00:32<00:00, 280.50it/s]
2023-12-14 08:12:02.734 | INFO     | encord_active.lib.metrics.execute:_execute_metrics:129 - Running metric Area
2023-12-14 08:12:20.487 | INFO     | encord_active.lib.metrics.execute:_execute_metrics:129 - Running metric Aspect Ratio
2023-12-14 08:12:39.871 | INFO     | encord_active.lib.metrics.execute:_execute_metrics:129 - Running metric Random Values on Images
2023-12-14 08:12:55.865 | INFO     | encord_active.lib.metrics.execute:_execute_metrics:129 - Running metric Image Diversity
2023-12-14 08:12:55.866 | INFO     | encord_active.lib.embeddings.embeddings:get_embeddings:287 - /Users/steve/Code/encord-notebooks/test/ea-caltech/neptune_ea_project/embeddings/cnn_images.pkl not found. Generating embeddings...
2023-12-

Encord Active stores and manages project data locally in a SQLite database. Enter your project hash in the next cell so that the dataset module reads from the right project.

You can find the project hash and related metadata in `ea-caltech/neptune_ea_project/project_meta.yml`.
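
As a rough sketch, the project hash could also be read from `project_meta.yml` in code rather than copied by hand. The helper name `read_project_hash` is hypothetical, and the assumption that the file contains a top-level `project_hash: <value>` entry is not confirmed by this notebook, so check the key name in your own file first.

```python
from pathlib import Path
from typing import Optional


def read_project_hash(meta_file: Path, key: str = "project_hash") -> Optional[str]:
    """Return the value of an unindented `key: value` line, or None if absent."""
    for line in meta_file.read_text(encoding="utf-8").splitlines():
        name, sep, value = line.partition(":")
        # Assumption: the hash is stored as a plain top-level scalar entry
        if sep and name.rstrip() == key:
            return value.strip().strip("'\"")
    return None


meta_file = Path("ea-caltech/neptune_ea_project/project_meta.yml")
if meta_file.exists():
    print(read_project_hash(meta_file))
```

The returned string can then be passed as `project_hash` when constructing `ActiveClassificationDataset`.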


In [5]:
def _convert_image_to_rgb(image):
    return image.convert("RGB")

SIZE = 32
transform = Compose(
    [
        Resize(SIZE),
        CenterCrop(SIZE),
        _convert_image_to_rgb,
        ToTensor(),
        Normalize(
            (0.48145466, 0.4578275, 0.40821073),
            (0.26862954, 0.26130258, 0.27577711),
        ),
    ]
)

train_dataset = ActiveClassificationDataset(
    database_path=Path("ea-caltech/encord-active.sqlite"),
    project_hash="<REPLACE WITH YOUR PROJECT HASH>",  # caltech
    tag_name="train",
    transform=transform,
)

from torch.utils.data import DataLoader

batch_size = 64
train_dataloader = DataLoader(train_dataset, batch_size=batch_size)
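# NOTE: the test loader reuses the training data, so the "test" metrics
# reported later are measured on the training set.
test_dataloader = train_dataloader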
print(len(train_dataloader))

for X, y in test_dataloader:
    print(f"Shape of X [N, C, H, W]: {X.shape}")
    print(f"Shape of y: {y.shape} {y.dtype}")
    break

device = "cpu"

Connection to database: sqlite:////Users/steve/Code/encord-notebooks/test/ea-caltech/encord-active.sqlite
143
Shape of X [N, C, H, W]: torch.Size([64, 3, 32, 32])
Shape of y: torch.Size([64]) torch.int64
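
The cell above uses the same loader for training and testing, so the test metrics are computed on data the model has already seen. A minimal sketch of one way to hold out a test set is shown here; `split_indices` is a hypothetical helper, and the 80/20 ratio and the seed are illustrative assumptions rather than choices made in this notebook.

```python
import random
from typing import List, Tuple


def split_indices(n: int, test_fraction: float = 0.2, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle range(n) deterministically and split it into (train, test) index lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = int(n * test_fraction)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(9144)  # 9144 images, as in the import log
print(len(train_idx), len(test_idx))
```

The two index lists could then be wrapped with `torch.utils.data.Subset(train_dataset, train_idx)` and `Subset(train_dataset, test_idx)` before building separate `DataLoader` objects.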


## 🧪 Track Experiment with neptune.ai

In [9]:
import neptune

run = neptune.init_run(
    project="<ENTER NEPTUNE PROJECT NAME>",
    api_token="<ENTER YOUR neptune.ai API TOKEN>",  # Best practice: store your token in an environment variable
) 

https://app.neptune.ai/stephen-encord/test-encord/e/TES-8
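
One way to keep the token out of the notebook is to read it from environment variables. The sketch below assumes the variables are named `NEPTUNE_PROJECT` and `NEPTUNE_API_TOKEN`, the names the Neptune client looks up by default; `neptune_credentials` is a hypothetical helper introduced only for illustration.

```python
import os
from typing import Dict


def neptune_credentials() -> Dict[str, str]:
    """Collect Neptune settings from the environment, failing early if any is missing."""
    names = {"project": "NEPTUNE_PROJECT", "api_token": "NEPTUNE_API_TOKEN"}
    missing = [env for env in names.values() if not os.environ.get(env)]
    if missing:
        raise RuntimeError(f"Set these environment variables first: {', '.join(missing)}")
    return {arg: os.environ[env] for arg, env in names.items()}


# Usage (requires the `neptune` package):
# run = neptune.init_run(**neptune_credentials())
```

Failing early with a clear message avoids starting a run with a placeholder token left in the cell.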


## 🕸️ Define Neural Network Architecture and Run Training

In [10]:
from neptune_pytorch import NeptuneLogger

class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(SIZE * SIZE * 3, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 102),  # Adjust to the number of classes
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork().to(device)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
classes = train_dataset.class_names

def train(dataloader, model, loss_fn, optimizer, npt_logger):
    size = len(dataloader.dataset)
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)
        pred = model(X)
        loss = loss_fn(pred, y)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        if batch % 100 == 0:
            loss_val = loss.item()
            correct_predictions = (pred.argmax(1) == y).type(torch.float).sum().item()
            accuracy = (correct_predictions / len(X)) * 100
            run[npt_logger.base_namespace]["batch/loss"].append(loss_val)
            run[npt_logger.base_namespace]["batch/accuracy"].append(accuracy)
            print(f"loss: {loss_val:>7f}  [{(batch + 1) * len(X):>5d}/{size:>5d}], Accuracy: {accuracy:>0.1f}%")

    npt_logger.log_checkpoint()
    print("Logged training checkpoint to neptune.ai!")

def test(dataloader, model, loss_fn, npt_logger):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    test_loss /= num_batches
    correct /= size
    accuracy = 100 * correct
    run[npt_logger.base_namespace]["test/loss"].append(test_loss)
    run[npt_logger.base_namespace]["test/accuracy"].append(accuracy)
    print(f"Test Error: \n Accuracy: {accuracy:>0.1f}%, Avg loss: {test_loss:>8f} \n")

# Initialize Neptune Logger here (npt_logger)

npt_logger = NeptuneLogger(
                            run=run,
                            model=model,
                            log_model_diagram=True,
                            log_gradients=True,
                            log_parameters=True,
                            log_freq=30,
                        )

epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer, npt_logger)
    test(test_dataloader, model, loss_fn, npt_logger)
print("Done!")

Epoch 1
-------------------------------
loss: 4.645097  [   64/ 9144], Accuracy: 0.0%
loss: 4.607966  [ 6464/ 9144], Accuracy: 3.1%
Logged training checkpoint to neptune.ai!
Test Error: 
 Accuracy: 10.1%, Avg loss: 4.509915 

Epoch 2
-------------------------------
loss: 4.568168  [   64/ 9144], Accuracy: 7.8%
loss: 4.528150  [ 6464/ 9144], Accuracy: 6.2%
Logged training checkpoint to neptune.ai!
Test Error: 
 Accuracy: 12.0%, Avg loss: 4.360246 

Epoch 3
-------------------------------
loss: 4.486201  [   64/ 9144], Accuracy: 7.8%
loss: 4.462654  [ 6464/ 9144], Accuracy: 6.2%
Logged training checkpoint to neptune.ai!
Test Error: 
 Accuracy: 14.2%, Avg loss: 4.251571 

Epoch 4
-------------------------------
loss: 4.424158  [   64/ 9144], Accuracy: 7.8%
loss: 4.406035  [ 6464/ 9144], Accuracy: 6.2%
Logged training checkpoint to neptune.ai!
Test Error: 
 Accuracy: 17.7%, Avg loss: 4.171848 

Epoch 5
-------------------------------
loss: 4.359495  [   64/ 9144], Accuracy: 9.4%
loss: 4.35
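
To put the logged numbers in context, a classifier that assigns equal probability to each of the 102 classes has a cross-entropy loss of ln(102), roughly 4.62, which is close to the first logged loss. The short calculation below is a sanity check added for illustration, not part of the original run. Caltech101 classes are not balanced, so the accuracy figure is only a rough reference point.

```python
import math

num_classes = 102  # output size of the final nn.Linear layer

# Cross-entropy of a uniform prediction: -log(1 / num_classes)
baseline_loss = math.log(num_classes)

# Expected accuracy of uniform random guessing if classes were balanced
baseline_accuracy = 100.0 / num_classes

print(f"baseline loss: {baseline_loss:.4f}, baseline accuracy: {baseline_accuracy:.2f}%")
```

A loss that stays near this baseline after several epochs suggests the model is learning slowly, which is plausible for a small fully connected network on 32x32 inputs with a learning rate of 1e-3.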

## 🌃 Log Training Images to neptune.ai

In [11]:
from neptune.types import File

dataiter = iter(test_dataloader)
images, labels = next(dataiter)

# Predict batch of n_samples
n_samples = 30
imgs = images[:n_samples].to(device)
probs = torch.nn.functional.softmax(model(imgs), dim=1)

# Decode probs and log tensors as image
for i, ps in enumerate(probs):
    pred = classes[torch.argmax(ps)]
    ground_truth = classes[labels[i]]
    description = f"pred: {pred} | ground truth: {ground_truth}"

    # Log series of tensors as image and predictions
    run[npt_logger.base_namespace]["predictions"].append(
        File.as_image(imgs[i].cpu().squeeze().permute(1, 2, 0).clip(0, 1)),  # CHW -> HWC
        name=f"{i}_{pred}_{ground_truth}",
        description=description,
    )

## 🤖 Log neptune.ai Model

In [8]:
npt_logger.log_model("model")

Shutting down background jobs, please wait a moment...


Done!
Waiting for the remaining 1 operations to synchronize with Neptune. Do not kill this process.
All 1 operations synced, thanks for waiting!
Explore the metadata in the Neptune app:
https://app.neptune.ai/stephen-encord/test-encord/e/TES-6/metadata


## 🛑 Stop The Training Log

In [8]:
run.stop()

Shutting down background jobs, please wait a moment...
Done!
All 0 operations synced, thanks for waiting!
Explore the metadata in the Neptune app:
https://app.neptune.ai/stephen-encord/test-encord/e/TES-7/metadata


## ✅ Wrap up

📓 This Colab notebook showed you how to: 
- Create an Encord Active project.
- Explore your images.
- Curate training data.
- Train a neural network model and track the experiment with neptune.ai.

---

🟣 Encord Active is an open-source framework for improving your computer vision data and model quality. **Check out the project on [GitHub](https://github.com/encord-team/encord-active), leave a star 🌟** if you like it. We welcome you to [contribute](https://docs.encord.com/docs/active-contributing) if you find something is missing.

---

👉 Check out the 📖 [Encord Blog](https://encord.com/blog/) and 📺 [YouTube](https://www.youtube.com/@encord) channel to stay up-to-date with the latest in computer vision, foundation models, active learning, and data-centric AI.

