# **Training Results**

| Epoch | Training Loss | Validation Loss | CER      |
|-------|---------------|-----------------|----------|
| 1     | 0.179400      | 0.066785        | 0.004684 |
| 2     | 0.028600      | 0.029214        | 0.001779 |


Note: CER is the character error rate computed on the validation set; lower is better.
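As a rough illustration of what the CER column measures, the sketch below computes the character error rate as the total character-level edit distance divided by the total number of reference characters. The helper names `edit_distance` and `cer` are hypothetical and written for this example only; the notebook itself relies on `evaluate.load("cer")`, whose exact aggregation may differ in detail.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references, predictions) -> float:
    """Total edits over total reference characters (illustrative helper)."""
    total_edits = sum(edit_distance(r, p) for r, p in zip(references, predictions))
    total_chars = sum(len(r) for r in references)
    return total_edits / total_chars


# Two plates of 7 characters each; one character is misread in the first plate.
refs = ["OX08CBB", "SN40FFE"]
preds = ["OX08CB8", "SN40FFE"]
print(cer(refs, preds))  # 1 edit over 14 reference characters
```

Read this way, a CER of 0.0018 corresponds to roughly one wrong character in every 560 reference characters.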

# **Train Metrics**
  epoch                    =           2.0
<br>total_flos               = 13413832854GF
<br>train_loss               =         0.104
<br>train_runtime            =    0:28:00.20
<br>train_samples_per_second =        11.456
<br>train_steps_per_second   =         0.717

# **Eval Metrics**
epoch                   =        2.0
<br>eval_cer                =     0.0018
<br>eval_loss               =     0.0292
<br>eval_runtime            = 0:03:32.53
<br>eval_samples_per_second =     11.321
<br>eval_steps_per_second   =       0.71
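The throughput figures above can be cross-checked against the dataset sizes reported later in the notebook (9624 training and 2406 test samples) and the batch size of 16 used in the trainer arguments. The following is a minimal sketch of that arithmetic; it assumes a single device, no gradient accumulation, and that the last partial batch counts as one step. The helper `to_seconds` is hypothetical.

```python
import math


def to_seconds(hms: str) -> float:
    """Convert an 'H:MM:SS.ss' runtime string into seconds."""
    hours, minutes, seconds = hms.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


train_samples, eval_samples = 9624, 2406
batch_size, epochs = 16, 2

train_seconds = to_seconds("0:28:00.20")
eval_seconds = to_seconds("0:03:32.53")

# Samples per second: every training sample is seen once per epoch.
train_samples_per_second = train_samples * epochs / train_seconds
eval_samples_per_second = eval_samples / eval_seconds

# Steps per second: one step per batch.
train_steps = math.ceil(train_samples / batch_size) * epochs
eval_steps = math.ceil(eval_samples / batch_size)
train_steps_per_second = train_steps / train_seconds
eval_steps_per_second = eval_steps / eval_seconds

print(round(train_samples_per_second, 3), round(train_steps_per_second, 3))
print(round(eval_samples_per_second, 3), round(eval_steps_per_second, 3))
```

Under these assumptions the computed values agree with the logged metrics to the reported precision, which suggests the logged run used the batch size of 16 rather than 8.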



In [None]:
!pip install --upgrade pip
!pip install torch torchvision torchaudio
!pip install fsspec==2024.6.1
!pip install datasets==3.0.0
!pip install gcsfs==2024.6.0
!pip install jiwer
!pip install evaluate

Collecting pip
  Downloading pip-24.3.1-py3-none-any.whl.metadata (3.7 kB)
Downloading pip-24.3.1-py3-none-any.whl (1.8 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 24.1.2
    Uninstalling pip-24.1.2:
      Successfully uninstalled pip-24.1.2
Successfully installed pip-24.3.1
Collecting fsspec==2024.6.1
  Downloading fsspec-2024.6.1-py3-none-any.whl.metadata (11 kB)
Downloading fsspec-2024.6.1-py3-none-any.whl (177 kB)
Installing collected packages: fsspec
  Attempting uninstall: fsspec
    Found existing installation: fsspec 2024.10.0
    Uninstalling fsspec-2024.10.0:
      Successfully uninstalled fsspec-2024.10.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
gcsfs 2024.10.

In [None]:
from google.colab import drive
import os, sys, itertools
os.environ['TOKENIZERS_PARALLELISM']='false'

import pandas as pd
from sklearn.model_selection import train_test_split

from PIL import Image

import torch
from torch.utils.data import Dataset

import datasets
from datasets import load_dataset

import transformers
from transformers import Seq2SeqTrainingArguments, Seq2SeqTrainer
from transformers import VisionEncoderDecoderModel, TrOCRProcessor, default_data_collator

import evaluate

In [None]:
print("Python:".rjust(15), sys.version[0:6])
print("Pandas:".rjust(15), pd.__version__)
print("Datasets:".rjust(15), datasets.__version__)
print("Transformers:".rjust(15), transformers.__version__)
print("Torch:".rjust(15), torch.__version__)

        Python: 3.10.1
        Pandas: 2.2.2
      Datasets: 3.0.0
  Transformers: 4.46.3
         Torch: 2.5.1+cu121


In [None]:
# Mount Google Drive
drive.mount('/content/drive', force_remount=True)
# Path to your dataset directory
path = '/content/drive/My Drive/CMPE 252 Project/whiteplate_normal/'

# Create the DataFrame
file_names = []
texts = []

# Loop through the directory and extract file names and labels
for file in os.listdir(path):
    if file.endswith(('.jpg', '.png')):  # Adjust extensions as per your dataset
        file_names.append(file)
        # Extract license plate from the file name (assuming the file name is the plate)
        texts.append(os.path.splitext(file)[0])  # Remove file extension

# Create a DataFrame
dataset = pd.DataFrame({'file_name': file_names, 'text': texts})

# Train/test split
train_dataset, test_dataset = train_test_split(dataset, train_size=0.80, random_state=42)

# Reset indices
train_dataset.reset_index(drop=True, inplace=True)
test_dataset.reset_index(drop=True, inplace=True)

# Print dataset information
print("Train Dataset Info:")
print(train_dataset.info())
print("\nTest Dataset Info:")
print(test_dataset.info())

Mounted at /content/drive
Train Dataset Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9624 entries, 0 to 9623
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   file_name  9624 non-null   object
 1   text       9624 non-null   object
dtypes: object(2)
memory usage: 150.5+ KB
None

Test Dataset Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2406 entries, 0 to 2405
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   file_name  2406 non-null   object
 1   text       2406 non-null   object
dtypes: object(2)
memory usage: 37.7+ KB
None


In [None]:
train_dataset.head(12)

|   | file_name   | text    |
|---|-------------|---------|
| 0 | OX08CBB.png | OX08CBB |
| 1 | SN40FFE.png | SN40FFE |
| 2 | RH66ZDD.png | RH66ZDD |
| 3 | TT37EVC.png | TT37EVC |
| 4 | FX07EUZ.png | FX07EUZ |
| 5 | NO34ABY.png | NO34ABY |
| 6 | ZV30IMV.png | ZV30IMV |
| 7 | JP78IEZ.png | JP78IEZ |
| 8 | YX68TFU.png | YX68TFU |
| 9 | EV90FEX.png | EV90FEX |


In [None]:
class License_Plates_OCR_Dataset(Dataset):

    def __init__(self, root_dir, df, processor, max_target_length=128):
        self.root_dir = root_dir
        self.df = df
        self.processor = processor
        self.max_target_length = max_target_length

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        # get file name + text (positional lookup, independent of the DataFrame index)
        file_name = self.df['file_name'].iloc[idx]
        text = self.df['text'].iloc[idx]
        # prepare image (i.e. resize + normalize)
        image = Image.open(os.path.join(self.root_dir, file_name)).convert("RGB")
        pixel_values = self.processor(image, return_tensors="pt").pixel_values
        # add labels (input_ids) by encoding the text
        labels = self.processor.tokenizer(text, padding="max_length", max_length=self.max_target_length).input_ids
        # important: make sure that PAD tokens are ignored by the loss function
        labels = [label if label != self.processor.tokenizer.pad_token_id
                  else -100 for label in labels]

        encoding = {"pixel_values" : pixel_values.squeeze(), "labels" : torch.tensor(labels)}
        return encoding
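The label handling in `__getitem__` replaces padding token ids with -100, which is the default `ignore_index` of PyTorch's cross-entropy loss, so padded positions do not contribute to the loss. The sketch below shows that masking step and its inverse on plain lists. The token ids and the pad id of 1 are made-up values for illustration, and `mask_padding` / `unmask_padding` are hypothetical helpers, not part of the notebook.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_padding(token_ids, pad_token_id, ignore_index=IGNORE_INDEX):
    """Replace padding ids so the loss skips those positions."""
    return [t if t != pad_token_id else ignore_index for t in token_ids]


def unmask_padding(labels, pad_token_id, ignore_index=IGNORE_INDEX):
    """Restore padding ids before decoding labels back to text."""
    return [t if t != ignore_index else pad_token_id for t in labels]


# Hypothetical ids: a short sequence padded to length 7 with pad id 1.
padded = [0, 713, 16, 2, 1, 1, 1]
labels = mask_padding(padded, pad_token_id=1)
print(labels)                                   # [0, 713, 16, 2, -100, -100, -100]
print(unmask_padding(labels, pad_token_id=1))   # [0, 713, 16, 2, 1, 1, 1]
```

The inverse step matters whenever labels are decoded, since -100 is not a valid token id for the tokenizer.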

In [None]:
MODEL_CKPT = "microsoft/trocr-base-printed"
MODEL_NAME =  MODEL_CKPT.split("/")[-1] + "_license_plates_ocr"
NUM_OF_EPOCHS = 2

In [None]:
# Initialize the processor
processor = TrOCRProcessor.from_pretrained(MODEL_CKPT)

# Build the train and test datasets with the License_Plates_OCR_Dataset class defined above.
# root_dir is the dataset path; df is the DataFrame for the corresponding split.

train_ds = License_Plates_OCR_Dataset(
    root_dir=path,
    df=train_dataset,
    processor=processor
)

test_ds = License_Plates_OCR_Dataset(
    root_dir=path,
    df=test_dataset,
    processor=processor
)

NameError: name 'TrOCRProcessor' is not defined

In [None]:
print(f"The training dataset has {len(train_ds)} samples in it.")
print(f"The testing dataset has {len(test_ds)} samples in it.")

The training dataset has 9624 samples in it.
The testing dataset has 2406 samples in it.


In [None]:
encoding = train_ds[0]

for k,v in encoding.items():
    print(k, " : ", v.shape)

pixel_values  :  torch.Size([3, 384, 384])
labels  :  torch.Size([128])


In [None]:
image = Image.open(train_ds.root_dir + train_dataset['file_name'][0]).convert("RGB")

image

NameError: name 'Image' is not defined

In [None]:
labels = encoding['labels']
labels[labels == -100] = processor.tokenizer.pad_token_id
label_str = processor.decode(labels, skip_special_tokens=True)
print(label_str)


NameError: name 'encoding' is not defined

In [None]:
model = VisionEncoderDecoderModel.from_pretrained(MODEL_CKPT)


Config of the encoder: <class 'transformers.models.vit.modeling_vit.ViTModel'> is overwritten by shared encoder config: ViTConfig {
  "attention_probs_dropout_prob": 0.0,
  "encoder_stride": 16,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.0,
  "hidden_size": 768,
  "image_size": 384,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "model_type": "vit",
  "num_attention_heads": 12,
  "num_channels": 3,
  "num_hidden_layers": 12,
  "patch_size": 16,
  "qkv_bias": false,
  "transformers_version": "4.46.3"
}

Config of the decoder: <class 'transformers.models.trocr.modeling_trocr.TrOCRForCausalLM'> is overwritten by shared decoder config: TrOCRConfig {
  "activation_dropout": 0.0,
  "activation_function": "gelu",
  "add_cross_attention": true,
  "attention_dropout": 0.0,
  "bos_token_id": 0,
  "classifier_dropout": 0.0,
  "cross_attention_hidden_size": 768,
  "d_model": 1024,
  "decoder_attention_heads": 16,
  "decoder_ffn_dim": 4096,
  "decoder


In [None]:
model.config.decoder_start_token_id = processor.tokenizer.cls_token_id
model.config.pad_token_id = processor.tokenizer.pad_token_id

model.config.vocab_size = model.config.decoder.vocab_size

model.config.eos_token_id = processor.tokenizer.sep_token_id
model.config.max_length = 64
model.config.early_stopping = True
model.config.no_repeat_ngram_size = 3
model.config.length_penalty = 2.0
model.config.num_beams = 4

In [None]:
cer_metric = evaluate.load("cer")

def compute_metrics(pred):
    label_ids = pred.label_ids
    pred_ids = pred.predictions

    pred_str = processor.batch_decode(pred_ids, skip_special_tokens=True)
    label_ids[label_ids == -100] = processor.tokenizer.pad_token_id
    label_str = processor.batch_decode(label_ids, skip_special_tokens=True)

    cer = cer_metric.compute(predictions=pred_str, references=label_str)

    return {"cer" : cer}



In [None]:
args = Seq2SeqTrainingArguments(
    output_dir = MODEL_NAME,
    num_train_epochs=NUM_OF_EPOCHS,
    predict_with_generate=True,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    logging_first_step=True,
    hub_private_repo=True,
    push_to_hub=True
)




In [None]:
!huggingface-cli login


    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|

    To log in, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible): 
Add token as git credential? (Y/n) n
Token is valid (permission: fineGrained).
The token `testingLicensePlate` has been saved to /root/.cache/huggingface/stored_tokens
Your token has been saved to /root/.cache/huggingface/token
Login successful.
The current active token is:

In [None]:
from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

# Update args for Seq2SeqTrainer
args = Seq2SeqTrainingArguments(
    output_dir="./results",
    eval_strategy="epoch",  # Changed from `evaluation_strategy` to `eval_strategy`
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    weight_decay=0.01,
    save_total_limit=3,
    num_train_epochs=NUM_OF_EPOCHS,
    predict_with_generate=True,
    logging_dir="./logs",
    logging_strategy="epoch",
)

# Initialize the Trainer
trainer = Seq2SeqTrainer(
    model=model,
    processing_class=processor,  # Updated to use `processing_class`
    args=args,
    compute_metrics=compute_metrics,
    train_dataset=train_ds,
    eval_dataset=test_ds,
    data_collator=default_data_collator
)


In [None]:
train_results = trainer.train()

[34m[1mwandb[0m: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.


<IPython.core.display.Javascript object>

[34m[1mwandb[0m: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
[34m[1mwandb[0m: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:

 ··········


[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


| Epoch | Training Loss | Validation Loss | CER      |
|-------|---------------|-----------------|----------|
| 1     | 0.1794        | 0.066785        | 0.004684 |
| 2     | 0.0286        | 0.029214        | 0.001779 |


Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.

In [None]:
trainer.save_model()
trainer.log_metrics("train", train_results.metrics)
trainer.save_metrics("train", train_results.metrics)
trainer.save_state()

***** train metrics *****
  epoch                    =           2.0
  total_flos               = 13413832854GF
  train_loss               =         0.104
  train_runtime            =    0:28:00.20
  train_samples_per_second =        11.456
  train_steps_per_second   =         0.717


In [None]:
metrics = trainer.evaluate()
trainer.log_metrics("eval", metrics)
trainer.save_metrics("eval", metrics)

Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.

***** eval metrics *****
  epoch                   =        2.0
  eval_cer                =     0.0018
  eval_loss               =     0.0292
  eval_runtime            = 0:03:32.53
  eval_samples_per_second =     11.321
  eval_steps_per_second   =       0.71
