# Vision Transformer for G-code Generation

This notebook demonstrates how to train a Vision Transformer model to generate G-code from images using Hugging Face's ecosystem.

In [11]:
# Install necessary libraries
!pip install transformers datasets torch torchvision tokenizers


## Train a Custom Tokenizer

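Train a byte-pair encoding (BPE) tokenizer on the G-code files. A pre-tokenizer splits the text at every boundary between digits and non-digits, so command letters and numeric values become separate tokens.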

In [13]:
import os
from tokenizers import Regex, Tokenizer, models, pre_tokenizers, decoders, trainers, processors
from tokenizers.pre_tokenizers import Split

# Train a custom tokenizer
def train_gcode_tokenizer(gcode_dir, vocab_size=8000, min_frequency=2, special_tokens=['<s>', '<pad>', '</s>', '<unk>', '<mask>']):
    # Initialize a tokenizer
    tokenizer = Tokenizer(models.BPE())
    
    # Pre-tokenizer to split letters and numbers.
    # The pattern must be wrapped in Regex; a plain str is matched literally.
    tokenizer.pre_tokenizer = Split(
        pattern=Regex(r"(?<=\D)(?=\d)|(?<=\d)(?=\D)"),
        behavior="isolated",
    )
    
    # Get a list of all G-code files
    gcode_files = [os.path.join(gcode_dir, f) for f in os.listdir(gcode_dir) if f.endswith('.txt')]
    
    # Train the tokenizer
    trainer = trainers.BpeTrainer(vocab_size=vocab_size, min_frequency=min_frequency, special_tokens=special_tokens)
    tokenizer.train(gcode_files, trainer)
    
    tokenizer.post_processor = processors.TemplateProcessing(
        single="<s> $A </s>",
        special_tokens=[
            ("<s>", tokenizer.token_to_id("<s>")),
            ("</s>", tokenizer.token_to_id("</s>")),
        ],
    )
    tokenizer.decoder = decoders.ByteLevel()
    
    return tokenizer

# Directory containing G-code files
gcode_dir = "dataset/gcode"  # Replace with the path to your G-code directory

# Train and save the tokenizer
tokenizer = train_gcode_tokenizer(gcode_dir)
tokenizer.save("./gcode_tokenizer")
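
To see where the pre-tokenizer pattern places its boundaries, the same regular expression can be tried with Python's built-in `re` module. This is only an approximation for illustration: `tokenizers` uses its own regex engine, and the sample line `G1 X10.5 Y-3` is made up rather than taken from the dataset.

```python
import re

# Same pattern as the pre-tokenizer: a zero-width boundary wherever
# a digit meets a non-digit character.
BOUNDARY = re.compile(r"(?<=\D)(?=\d)|(?<=\d)(?=\D)")

def split_letters_and_numbers(line):
    """Split a G-code line at every digit/non-digit boundary (Python 3.7+)."""
    return BOUNDARY.split(line)

pieces = split_letters_and_numbers("G1 X10.5 Y-3")
print(pieces)  # ['G', '1', ' X', '10', '.', '5', ' Y-', '3']
```

Whitespace and signs stay attached to the neighbouring letters, and a decimal number is broken into its integer part, the point, and its fractional part.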






## Data Preparation

Create a dataset class to handle the image and G-code pairs and a function to load the dataset using the custom tokenizer.

In [15]:
import os
from PIL import Image
import torch
from torch.utils.data import Dataset
import torchvision.transforms as transforms
from transformers import PreTrainedTokenizerFast

# Load the custom tokenizer
tokenizer = PreTrainedTokenizerFast(tokenizer_file="./gcode_tokenizer")
tokenizer.add_special_tokens({'pad_token': '<pad>', 'eos_token': '</s>', 'bos_token': '<s>'})

# Dataset class to handle image and G-code pairs
class ImageGCodeDataset(Dataset):
    def __init__(self, image_dir, gcode_dir, transform=None, tokenizer=None):
        self.image_dir = image_dir
        self.gcode_dir = gcode_dir
        self.transform = transform
        self.tokenizer = tokenizer
        self.image_files = sorted(os.listdir(image_dir))
        self.gcode_files = sorted(os.listdir(gcode_dir))

    def __len__(self):
        return len(self.image_files)

    def __getitem__(self, idx):
        img_path = os.path.join(self.image_dir, self.image_files[idx])
        gcode_path = os.path.join(self.gcode_dir, self.gcode_files[idx])
        
        image = Image.open(img_path).convert("RGB")
        if self.transform:
            image = self.transform(image)
        
        with open(gcode_path, 'r', encoding='utf-8', errors='ignore') as f:
            gcode = f.read()

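        # Tokenize to a fixed length so samples can be stacked into a batch
        encoded = self.tokenizer(gcode, return_tensors='pt', padding='max_length', truncation=True, max_length=512)
        labels = encoded['input_ids'].squeeze(0)
        # Mask padding positions with -100 so the loss ignores them
        labels[encoded['attention_mask'].squeeze(0) == 0] = -100

        return {"pixel_values": image, "labels": labels}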

# Function to load the dataset
def load_dataset(image_dir, gcode_dir, tokenizer):
    # Define the image transformations
    transform = transforms.Compose([
        transforms.Resize((224, 224)),  # Resize the images to 224x224 pixels
        transforms.ToTensor(),          # Convert the images to PyTorch tensors
    ])
    
    # Create the dataset object
    dataset = ImageGCodeDataset(image_dir, gcode_dir, transform, tokenizer)
    return dataset

Exception: data did not match any variant of untagged enum ModelWrapper at line 1884 column 3
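
The dataset class pairs images and G-code files by their position in two sorted directory listings, which silently misaligns the pairs if either directory contains an extra file. One way to guard against that, sketched here under the assumption that each image shares its base name with its G-code file (for example `part1.png` and `part1.txt`), is to match on the file stem. The helper `pair_by_stem` is hypothetical and not part of the notebook.

```python
import os
import tempfile

def pair_by_stem(image_dir, gcode_dir):
    """Return sorted (image_file, gcode_file) pairs whose base names match."""
    images = {os.path.splitext(f)[0]: f for f in os.listdir(image_dir)}
    gcodes = {os.path.splitext(f)[0]: f for f in os.listdir(gcode_dir)}
    shared = sorted(images.keys() & gcodes.keys())
    return [(images[s], gcodes[s]) for s in shared]

# Small demonstration with throwaway directories and empty files
with tempfile.TemporaryDirectory() as root:
    image_dir = os.path.join(root, "images")
    gcode_dir = os.path.join(root, "gcode")
    os.makedirs(image_dir)
    os.makedirs(gcode_dir)
    for name in ("part1.png", "part2.png", "stray.png"):
        open(os.path.join(image_dir, name), "w").close()
    for name in ("part1.txt", "part2.txt"):
        open(os.path.join(gcode_dir, name), "w").close()
    pairs = pair_by_stem(image_dir, gcode_dir)

print(pairs)  # [('part1.png', 'part1.txt'), ('part2.png', 'part2.txt')]
```

The unmatched `stray.png` is dropped instead of shifting every later pair by one position.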

## Model Definition

Pair a pretrained Vision Transformer (ViT) image encoder with a BERT language-model decoder that generates the G-code tokens.

In [None]:
from transformers import VisionEncoderDecoderModel, VisionEncoderDecoderConfig, ViTModel, BertConfig, BertLMHeadModel
import torch

# Load the vision transformer model
encoder = ViTModel.from_pretrained("google/vit-base-patch16-224-in21k")

# Load BERT as a decoder: causal masking plus cross-attention over the image features
decoder_config = BertConfig.from_pretrained("bert-base-uncased", is_decoder=True, add_cross_attention=True)
decoder = BertLMHeadModel.from_pretrained("bert-base-uncased", config=decoder_config)
# Resize the decoder's embeddings to the custom G-code vocabulary
decoder.resize_token_embeddings(len(tokenizer))

# Configuration for the encoder-decoder model
config = VisionEncoderDecoderConfig.from_encoder_decoder_configs(encoder.config, decoder.config)
config.decoder_start_token_id = tokenizer.bos_token_id  # the tokenizer defines <s>, not a CLS token
config.pad_token_id = tokenizer.pad_token_id
config.vocab_size = config.decoder.vocab_size

# Define the model
model = VisionEncoderDecoderModel(encoder=encoder, decoder=decoder, config=config)

# Move model to GPU if available
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model.to(device)
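
The `decoder_start_token_id` and `pad_token_id` settings matter because, when only `labels` are passed, `VisionEncoderDecoderModel` derives the decoder inputs by shifting the labels one position to the right. The following plain-Python sketch mirrors that shift on a list of token IDs. The function name and the IDs (0 for `<s>`, 1 for `<pad>`, 2 for `</s>`) are assumptions that follow the order of `special_tokens` given to the tokenizer trainer.

```python
def shift_right(labels, pad_token_id, decoder_start_token_id):
    """Build decoder inputs from labels: prepend the start token, drop the last label."""
    shifted = [decoder_start_token_id] + list(labels[:-1])
    # Positions masked with -100 in the labels are fed to the decoder as padding
    return [pad_token_id if tok == -100 else tok for tok in shifted]

labels = [0, 57, 58, 2, -100, -100]
decoder_inputs = shift_right(labels, pad_token_id=1, decoder_start_token_id=0)
print(decoder_inputs)  # [0, 0, 57, 58, 2, 1]
```

If `decoder_start_token_id` were `None`, this shift could not be built, which is why the start token has to map to a token the tokenizer actually defines.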

## Training

Define the training loop using Hugging Face's `Trainer`.

In [None]:
from transformers import Trainer, TrainingArguments

# Load dataset
image_dir = "dataset/images"  # Replace with the path to your image directory
gcode_dir = "dataset/gcode"   # Replace with the path to your G-code directory
dataset = load_dataset(image_dir, gcode_dir, tokenizer)

# Split dataset into train and validation
train_size = int(0.9 * len(dataset))
val_size = len(dataset) - train_size
train_dataset, val_dataset = torch.utils.data.random_split(dataset, [train_size, val_size])

# Define training arguments
training_args = TrainingArguments(
    output_dir='./results',          # Output directory
    num_train_epochs=3,              # Total number of training epochs
    per_device_train_batch_size=16,  # Batch size for training
    per_device_eval_batch_size=16,   # Batch size for evaluation
    warmup_steps=500,                # Number of warmup steps for learning rate scheduler
    weight_decay=0.01,               # Strength of weight decay
    logging_dir='./logs',            # Directory for storing logs
    logging_steps=10,                # Logging steps
)

# Define a custom collate function that stacks samples into batch tensors.
# Trainer moves each batch to the model's device, so tensors stay on the CPU here.
def collate_fn(batch):
    pixel_values = torch.stack([item['pixel_values'] for item in batch])
    labels = torch.stack([item['labels'] for item in batch])
    return {'pixel_values': pixel_values, 'labels': labels}

# Initialize Trainer
trainer = Trainer(
    model=model,                         # The instantiated 🤗 Transformers model to be trained
    args=training_args,                  # Training arguments, defined above
    train_dataset=train_dataset,         # Training dataset
    eval_dataset=val_dataset,            # Evaluation dataset
    data_collator=collate_fn             # Custom data collator that batches images and labels
)

# Train the model
trainer.train()
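
With `warmup_steps=500`, the learning rate only reaches its peak after 500 optimizer steps, so it is worth checking how many steps the run has in total. A rough estimate, assuming a single device, no gradient accumulation, and a hypothetical dataset of 1,000 image/G-code pairs:

```python
import math

def total_optimizer_steps(num_examples, batch_size, epochs):
    """Optimizer steps for a run with one device and no gradient accumulation."""
    return math.ceil(num_examples / batch_size) * epochs

dataset_size = 1000                       # hypothetical; use len(dataset)
train_size = int(0.9 * dataset_size)      # same 90/10 split as above
steps = total_optimizer_steps(train_size, batch_size=16, epochs=3)
print(train_size, steps)  # 900 171
```

In this example the run would end after 171 steps, before the 500-step warmup completes, so a smaller `warmup_steps` would be the natural choice for a dataset of that size.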

## Evaluation

Evaluate the model's performance on the validation set.

In [None]:
# Evaluate the model; without a compute_metrics function, Trainer reports the loss only
results = trainer.evaluate()

print(f"Validation Loss: {results['eval_loss']}")
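
`Trainer.evaluate()` reports extra metrics such as an accuracy only when a `compute_metrics` function is passed to the `Trainer`. The core of a token-level accuracy could look like the following sketch, which works on plain lists of token IDs and skips positions labelled -100. The function name and the sample IDs are illustrative assumptions.

```python
def token_accuracy(pred_ids, label_ids, ignore_index=-100):
    """Fraction of non-ignored label positions where the prediction matches."""
    correct = total = 0
    for pred_row, label_row in zip(pred_ids, label_ids):
        for pred, label in zip(pred_row, label_row):
            if label == ignore_index:
                continue
            total += 1
            correct += int(pred == label)
    return correct / total if total else 0.0

preds = [[0, 57, 58, 2, 1], [0, 60, 61, 2, 2]]
labels = [[0, 57, 59, 2, -100], [0, 60, 61, 2, -100]]
accuracy = token_accuracy(preds, labels)
print(accuracy)  # 0.875
```

Token accuracy is a coarse measure for G-code, since a single wrong coordinate can change the toolpath, so it is best read alongside the validation loss.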

## Final Remarks

This setup provides a basic framework to train a Vision Transformer model to generate G-code from images using Hugging Face's ecosystem. The training script initializes the dataset, defines the model, sets up the trainer, and evaluates the model.

You can customize the model architecture, training parameters, and evaluation metrics according to your specific requirements.