Wikipedia Transformer

A PyTorch implementation of a transformer-based language model trained on Wikipedia text. This project is inspired by the nanoGPT architecture but simplified for educational purposes.

Overview

This project implements a transformer-based language model that can:

Train on Wikipedia text data
Generate text based on user prompts
Save and load model checkpoints

The model uses the following transformer configuration:

6 transformer layers
6 attention heads
384 embedding dimensions
1536 feedforward dimensions
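
As a rough illustration of what these numbers imply, the sketch below collects them in a small config object and counts the weights inside the transformer layers. The names `ModelConfig` and `block_params` are hypothetical and not taken from the project code. The count assumes biased linear layers and two layer norms per block, and it leaves out token and position embeddings because the vocabulary size and context length are not stated here.

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:
    # Values from the list above; field names are illustrative.
    n_layer: int = 6
    n_head: int = 6
    n_embd: int = 384
    d_ff: int = 1536


def block_params(cfg: ModelConfig) -> int:
    """Parameter count of one transformer block (assumes biases everywhere)."""
    # Query, key, value and output projections: four n_embd x n_embd matrices plus biases.
    attn = 4 * (cfg.n_embd * cfg.n_embd + cfg.n_embd)
    # Feedforward: n_embd -> d_ff -> n_embd, each with a bias vector.
    ff = (cfg.n_embd * cfg.d_ff + cfg.d_ff) + (cfg.d_ff * cfg.n_embd + cfg.n_embd)
    # Two layer norms, each with a weight and a bias vector.
    ln = 2 * 2 * cfg.n_embd
    return attn + ff + ln


cfg = ModelConfig()
head_dim = cfg.n_embd // cfg.n_head          # size of each attention head
stack_params = cfg.n_layer * block_params(cfg)  # all blocks, embeddings excluded
print(head_dim, block_params(cfg), stack_params)
```

Under these assumptions each head works on 64 dimensions and the six blocks hold about 10.6 million parameters; the embedding tables and output layer come on top of that.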

Features

Iteration-based Training: Uses iterations instead of epochs for more flexible training
Gradient Accumulation: Accumulates gradients over several mini-batches to reach a larger effective batch size
Learning Rate Scheduling: Includes warmup and cosine decay for better convergence
Mixed Precision Training: Uses automatic mixed precision for faster training
Checkpoint Saving: Saves the best model based on loss
Text Generation: Generates text from user prompts with configurable parameters
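
The warmup and cosine decay schedule mentioned above can be sketched as a plain function of the iteration number. This is a minimal sketch, not the project's code: the function name `lr_at` and the values of `max_lr`, `min_lr` and `warmup_iters` are assumptions chosen for illustration, and only `max_iters=16000` matches the iteration count reported in this README.

```python
import math


def lr_at(it, max_lr=3e-4, min_lr=3e-5, warmup_iters=1000, max_iters=16000):
    """Learning rate for iteration `it` (0-based); all defaults are illustrative."""
    # Linear warmup: ramp from near zero up to max_lr.
    if it < warmup_iters:
        return max_lr * (it + 1) / warmup_iters
    # After the schedule ends, hold the learning rate at its floor.
    if it >= max_iters:
        return min_lr
    # Cosine decay from max_lr down to min_lr over the remaining iterations.
    progress = (it - warmup_iters) / (max_iters - warmup_iters)
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + coeff * (max_lr - min_lr)


for step in (0, 999, 1000, 8500, 16000):
    print(step, lr_at(step))
```

In a training loop, the returned value would typically be written into each optimizer parameter group's `lr` entry before the optimizer step.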

Requirements

Python 3.8+
PyTorch 2.0+
Hugging Face datasets
tiktoken (OpenAI's tokenizer)

Example Generation

Here's an example of text generated by the model with the prompt "The best thing is":

The best thing is the attribute of the god, according to Homer from the greatest of the Iliad. In the earliest Greek, he is the most important attribute of the life, especially in the Roman form of the Greek language of the poem.

Aristotle was credited with Phoenicians across the late 18th century. It was considered the basis of Apollo's Egypt and Homer's son Asperipi observed to flee to his mother, and he also gave the Niobids until his death in Troy

This example shows that the model produces fluent text in the style and vocabulary of Wikipedia articles, particularly those about historical and mythological topics. The statements it makes are not factually reliable.
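
Text like the sample above is usually produced by sampling one token at a time from the model's output distribution. This README mentions configurable generation parameters without naming them, so the sketch below assumes two common ones, temperature and top-k. The function name `sample_next_token` is hypothetical, and the code works on a plain list of logits rather than on the project's model.

```python
import math
import random


def sample_next_token(logits, temperature=1.0, top_k=None, rng=random):
    """Pick a token index from raw logits using temperature and optional top-k."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if top_k is not None and top_k < 1:
        raise ValueError("top_k must be at least 1")
    # Lower temperature sharpens the distribution, higher temperature flattens it.
    scaled = [x / temperature for x in logits]
    # Rank token indices by score and optionally keep only the k best.
    indices = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    if top_k is not None:
        indices = indices[:top_k]
    # Softmax weights over the kept indices, shifted by the max for stability.
    peak = scaled[indices[0]]
    weights = [math.exp(scaled[i] - peak) for i in indices]
    return rng.choices(indices, weights=weights, k=1)[0]


logits = [0.1, 2.0, -1.0, 0.5]
print(sample_next_token(logits, temperature=0.8, top_k=2))
```

With `top_k=1` the function always returns the highest-scoring token, which corresponds to greedy decoding.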

Model Architecture

The model uses a standard transformer architecture with:

Multi-head self-attention
Position-wise feedforward networks
Layer normalization
Residual connections
Dropout for regularization
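
The components listed above can be combined into a single transformer block roughly as follows. This is a sketch under stated assumptions, not the project's implementation: the class name `Block`, the pre-norm layout, the GELU activation, the dropout rate of 0.1 and the causal mask are all assumptions, while the dimensions (384, 6 heads, 1536) come from the Overview.

```python
import torch
import torch.nn as nn


class Block(nn.Module):
    """One pre-norm transformer block (layout and defaults are assumptions)."""

    def __init__(self, n_embd=384, n_head=6, d_ff=1536, dropout=0.1):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = nn.MultiheadAttention(n_embd, n_head, dropout=dropout, batch_first=True)
        self.ln2 = nn.LayerNorm(n_embd)
        # Position-wise feedforward network applied to every token independently.
        self.ff = nn.Sequential(
            nn.Linear(n_embd, d_ff),
            nn.GELU(),
            nn.Linear(d_ff, n_embd),
            nn.Dropout(dropout),
        )
        self.drop = nn.Dropout(dropout)

    def forward(self, x):
        # x has shape (batch, sequence, n_embd).
        seq_len = x.size(1)
        # Causal mask: True marks future positions that must not be attended to.
        mask = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device), diagonal=1
        )
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + self.drop(attn_out)      # residual connection around attention
        x = x + self.ff(self.ln2(x))     # residual connection around feedforward
        return x


block = Block().eval()
x = torch.randn(2, 5, 384)
print(block(x).shape)
```

A full model would stack six such blocks between an embedding layer and an output projection over the vocabulary.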

Training Progress

The model was trained for 16,000 iterations and reached a training loss of 1.0533, which suggests it has learned many of the surface patterns in the Wikipedia text. The model contains approximately 150 million parameters and was trained in about 1 hour on an NVIDIA RTX 4090 GPU.
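
To put the loss value in perspective, it can be converted to perplexity. The snippet below assumes the reported loss is the mean cross-entropy per token in nats, which is what PyTorch's cross-entropy loss returns by default; the README does not state this explicitly.

```python
import math

train_loss = 1.0533  # training loss reported above

# Perplexity is the exponential of the mean cross-entropy (in nats).
perplexity = math.exp(train_loss)
print(f"perplexity = {perplexity:.3f}")
```

Under that assumption the training perplexity is a little under 2.9 per token. Because this is a training loss, it describes fit to the training data rather than performance on held-out text.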

Acknowledgments

Inspired by the nanoGPT project
Uses the Hugging Face datasets library
Uses OpenAI's tiktoken for tokenization
