Skip to content

semji/gpt3-tokenizer-php

Repository files navigation

GPT3 tokenizer

PHP Text Tokenizer for GPT models

About

A PHP toolkit to tokenize text like GPT family of models process it.

Forked from https://github.com/CodeRevolutionPlugins/GPT-3-Encoder-PHP to fit our usage, fix bugs and add unit testing.

Usage

The mbstring PHP extension is needed for this tool to work correctly (in case non-ASCII characters are present in the tokenized text): details here on how to install mbstring PHP 8.1 is needed too;

use Semji\GPT3Tokenizer\Encoder;
$prompt = "Many words map";
$encoder = new Encoder();
$encoder->encode($prompt);