samnevis/Wiki-Transformer

Wikipedia Transformer

A PyTorch implementation of a transformer-based language model trained on Wikipedia text. The project is inspired by nanoGPT but simplified for educational purposes.

Overview

This project implements a transformer-based language model that can:

  • Train on Wikipedia text data
  • Generate text based on user prompts
  • Save and load model checkpoints

The model is configured with:

  • 6 transformer layers
  • 6 attention heads
  • 384 embedding dimensions
  • 1536 feedforward dimensions
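
The configuration above can be captured in a small dataclass. This is a minimal sketch: the class name `ModelConfig` and its field names are hypothetical and not taken from the repository; only the four values come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    # Hypothetical field names; the values are the ones listed above.
    n_layer: int = 6     # transformer layers
    n_head: int = 6      # attention heads per layer
    n_embd: int = 384    # embedding dimensions
    d_ff: int = 1536     # feedforward dimensions

    @property
    def head_dim(self) -> int:
        # Each head works on an equal slice of the embedding vector.
        assert self.n_embd % self.n_head == 0, "n_embd must divide evenly by n_head"
        return self.n_embd // self.n_head


config = ModelConfig()
print(config.head_dim)                  # 64
print(config.d_ff == 4 * config.n_embd)  # True
```

With these values each attention head operates on 64 dimensions, and the feedforward layer is four times the embedding width, which is the usual ratio in GPT-style models.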

Features

  • Iteration-based Training: Uses iterations instead of epochs for more flexible training
  • Gradient Accumulation: Accumulates gradients over several micro-batches to simulate a larger effective batch size
  • Learning Rate Scheduling: Includes warmup and cosine decay for better convergence
  • Mixed Precision Training: Uses automatic mixed precision for faster training
  • Checkpoint Saving: Saves the best model based on loss
  • Text Generation: Generates text from user prompts with configurable parameters
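
The training features listed above can be sketched together in one short loop. This is an illustration under assumptions, not the repository's training script: the function name `lr_at`, every hyperparameter value, and the toy `nn.Linear` model with a reconstruction loss are placeholders chosen only so that the example runs on its own.

```python
import math

import torch
import torch.nn as nn


def lr_at(it, max_lr=3e-4, min_lr=3e-5, warmup_iters=100, max_iters=1000):
    """Linear warmup to max_lr, then cosine decay to min_lr (assumed values)."""
    if it < warmup_iters:
        return max_lr * (it + 1) / warmup_iters
    if it >= max_iters:
        return min_lr
    progress = (it - warmup_iters) / (max_iters - warmup_iters)
    return min_lr + 0.5 * (1.0 + math.cos(math.pi * progress)) * (max_lr - min_lr)


device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"                      # mixed precision only on GPU
amp_dtype = torch.float16 if use_amp else torch.bfloat16

model = nn.Linear(8, 8).to(device)              # toy stand-in for the transformer
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)
accum_steps = 4                                 # micro-batches per optimizer step

for it in range(20):                            # iteration-based, no epochs
    # Set the learning rate for this iteration from the schedule.
    for group in optimizer.param_groups:
        group["lr"] = lr_at(it, warmup_iters=5, max_iters=20)

    for _ in range(accum_steps):
        x = torch.randn(16, 8, device=device)   # random placeholder batch
        with torch.autocast(device_type=device, dtype=amp_dtype, enabled=use_amp):
            # Divide by accum_steps so the summed gradients form an average.
            loss = (model(x) - x).pow(2).mean() / accum_steps
        scaler.scale(loss).backward()           # gradients accumulate in .grad

    scaler.step(optimizer)                      # one update per accum_steps batches
    scaler.update()
    optimizer.zero_grad(set_to_none=True)
```

Here each optimizer step sees gradients from 4 × 16 = 64 samples while only 16 are held in memory at a time. On a machine without CUDA the scaler and autocast are disabled and the loop runs in full precision.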

Requirements

  • Python 3.8+
  • PyTorch 2.0+
  • Hugging Face datasets
  • tiktoken (OpenAI's tokenizer)

Example Generation

Here's an example of text generated by the model with the prompt "The best thing is":

The best thing is the attribute of the god, according to Homer from the greatest of the Iliad. In the earliest Greek, he is the most important attribute of the life, especially in the Roman form of the Greek language of the poem.

Aristotle was credited with Phoenicians across the late 18th century. It was considered the basis of Apollo's Egypt and Homer's son Asperipi observed to flee to his mother, and he also gave the Niobids until his death in Troy

This example shows that the model generates grammatical text in the style and vocabulary of Wikipedia articles, particularly those about historical and mythological topics. The factual content of the output is not reliable.
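
Generating text from a prompt, as in the example above, is typically an autoregressive sampling loop. The sketch below is one common way to write it; the function name `generate`, the `temperature` and `top_k` parameters, and the toy embedding model are assumptions for illustration, since the source does not name the configurable generation parameters. In the actual project the prompt would presumably be encoded to token ids and decoded back with tiktoken.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


@torch.no_grad()
def generate(model, idx, max_new_tokens, block_size, temperature=1.0, top_k=None):
    """Append max_new_tokens sampled token ids to idx, shape (batch, time)."""
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]        # crop to the context window
        logits = model(idx_cond)[:, -1, :]     # logits for the next token only
        logits = logits / temperature          # <1 sharpens, >1 flattens
        if top_k is not None:
            k = min(top_k, logits.size(-1))
            kth_value = torch.topk(logits, k).values[:, [-1]]
            # Discard every token below the k-th most likely one.
            logits = logits.masked_fill(logits < kth_value, float("-inf"))
        probs = F.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        idx = torch.cat([idx, next_id], dim=1)
    return idx


# Toy stand-in that maps token ids (B, T) to logits (B, T, vocab_size).
vocab_size = 50
toy_model = nn.Sequential(nn.Embedding(vocab_size, 16), nn.Linear(16, vocab_size))

prompt = torch.zeros((1, 3), dtype=torch.long)  # placeholder prompt token ids
out = generate(toy_model, prompt, max_new_tokens=5, block_size=8, top_k=10)
print(out.shape)  # torch.Size([1, 8])
```

Lower temperatures and smaller `top_k` values make the output more conservative, while higher values increase variety at the cost of coherence.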

Model Architecture

The model uses a standard transformer architecture with:

  • Multi-head self-attention
  • Position-wise feedforward networks
  • Layer normalization
  • Residual connections
  • Dropout for regularization
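
The components above combine into a single transformer block roughly as follows. This is a minimal sketch, not the repository's code: it assumes a pre-norm layout (layer normalization before each sub-layer, as in nanoGPT), a GELU activation, causal masking, and a dropout rate of 0.1, none of which are stated in this README. The class name `Block` is likewise hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Block(nn.Module):
    def __init__(self, n_embd=384, n_head=6, d_ff=1536, dropout=0.1):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.dropout = dropout
        self.ln1 = nn.LayerNorm(n_embd)
        self.qkv = nn.Linear(n_embd, 3 * n_embd)   # query, key, value in one matmul
        self.proj = nn.Linear(n_embd, n_embd)
        self.resid_drop = nn.Dropout(dropout)
        self.ln2 = nn.LayerNorm(n_embd)
        self.ff = nn.Sequential(                   # position-wise feedforward
            nn.Linear(n_embd, d_ff),
            nn.GELU(),
            nn.Linear(d_ff, n_embd),
            nn.Dropout(dropout),
        )

    def attention(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        # Reshape (B, T, C) -> (B, n_head, T, head_dim) for multi-head attention.
        q, k, v = (
            t.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
            for t in (q, k, v)
        )
        y = F.scaled_dot_product_attention(
            q, k, v,
            dropout_p=self.dropout if self.training else 0.0,
            is_causal=True,                        # each token sees only the past
        )
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.resid_drop(self.proj(y))

    def forward(self, x):
        x = x + self.attention(self.ln1(x))        # residual around attention
        x = x + self.ff(self.ln2(x))               # residual around feedforward
        return x


block = Block().eval()
x = torch.randn(2, 10, 384)
print(block(x).shape)  # torch.Size([2, 10, 384])
```

`F.scaled_dot_product_attention` requires PyTorch 2.0 or later, which matches the stated requirements. A full model would stack six such blocks between the token and position embeddings and the output projection.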

Training Progress

The model was trained for 16,000 iterations and reached a training loss of 1.0533, which suggests it is learning the patterns in the Wikipedia text. It contains approximately 150 million parameters, and training took about 1 hour on an NVIDIA RTX 4090 GPU.
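
The total parameter count of a model like this depends mostly on the vocabulary size and on whether the token embedding shares weights with the output projection, neither of which is stated here. The estimator below is a rough sketch that assumes linear layers with biases, learned position embeddings, and a final layer normalization; the function names and the `vocab_size`, `block_size`, and `tied` parameters are hypothetical.

```python
def block_params(n_embd=384, d_ff=1536):
    """Parameters in one transformer block (weights and biases)."""
    attn = 4 * n_embd * n_embd + 4 * n_embd       # q, k, v and output projections
    ffn = 2 * n_embd * d_ff + d_ff + n_embd       # two linear layers
    norms = 2 * 2 * n_embd                        # two LayerNorms, scale and shift
    return attn + ffn + norms


def total_params(vocab_size, block_size, n_layer=6, n_embd=384, d_ff=1536, tied=True):
    """Whole-model estimate; vocab_size and block_size must be supplied."""
    embeddings = vocab_size * n_embd + block_size * n_embd
    head = 0 if tied else vocab_size * n_embd     # separate output projection
    final_norm = 2 * n_embd
    return embeddings + n_layer * block_params(n_embd, d_ff) + final_norm + head


print(6 * block_params())  # 10646784
```

Under these assumptions the six transformer blocks account for about 10.6 million parameters, so the remainder of the total comes from the embedding and output layers, which scale with the tokenizer's vocabulary size.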

Acknowledgments
