TexTeller is a ViT-based model designed for end-to-end formula recognition. It can recognize formulas in natural images and convert them into LaTeX-style formulas.
TexTeller is trained on a larger dataset of image-formula pairs (a 550K dataset, available here) and exhibits superior generalization ability and higher accuracy compared to LaTeX-OCR, which uses approximately 100K data points. This larger dataset enables TexTeller to cover most usage scenarios more effectively.
A TexTeller checkpoint trained on a 5.5M dataset will be released soon.
- python=3.10
- pytorch

Note: Only CUDA versions >= 12.0 have been fully tested, so we recommend using CUDA >= 12.0.
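One possible way to set up the environment (any Python 3.10 environment with PyTorch works):

```bash
conda create -n texteller python=3.10
conda activate texteller
# install PyTorch following the official instructions for your CUDA version
```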
- Clone the repository:

  ```bash
  git clone https://github.com/OleehyO/TexTeller
  ```
- After installing PyTorch, install the remaining required packages:

  ```bash
  pip install -r requirements.txt
  ```
- Navigate to the `TexTeller/src` directory and run the following command to perform inference:

  ```bash
  python inference.py -img "/path/to/image.{jpg,png}"
  # use the -cuda option to enable GPU inference
  # e.g. python inference.py -img "./img.jpg" -cuda
  ```

  Checkpoints will be downloaded on your first run.
You can also run the web demo by navigating to the `TexTeller/src` directory and running:

```bash
./start_web.sh
```

Then go to http://localhost:8501 in your browser to use TexTeller on the web.
You can change the default settings in `start_web.sh`, such as enabling GPU inference (e.g. `USE_CUDA=True`) or increasing the number of beams (e.g. `NUM_BEAM=3`) for higher accuracy.
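For instance, the relevant settings in `start_web.sh` might look like this (variable names taken from the description above):

```bash
USE_CUDA=True   # enable GPU inference
NUM_BEAM=3      # more beams for higher accuracy, at the cost of speed
```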
We use Ray Serve to provide a simple API for using TexTeller in your own projects. To start the server, navigate to the `TexTeller/src` directory and run:

```bash
python server.py  # default settings
```

You can pass the following arguments to the `server.py` script for custom inference settings (e.g. `python server.py --use_gpu` to enable GPU inference):
| Argument | Description |
| --- | --- |
| `-ckpt` | Path to the checkpoint file to load; defaults to the TexTeller pretrained model. |
| `-tknz` | Path to the tokenizer; defaults to the TexTeller tokenizer. |
| `-port` | Port number to run the server on; default is 8000. |
| `--use_gpu` | Whether to use the GPU for inference. |
| `--num_beams` | Number of beams to use for beam-search decoding; default is 1. |
| `--num_replicas` | Number of replicas to run the server on; default is 1. Increase this for higher throughput. |
| `--ncpu_per_replica` | Number of CPU cores to use per replica; default is 1. |
| `--ngpu_per_replica` | Number of GPUs to use per replica; default is 1. You can set this to a value between 0 and 1 to run multiple replicas on a single GPU (e.g. with `--num_replicas 2 --ngpu_per_replica 0.7`, 2 GPUs are required). |
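For example, to serve with GPU inference, 3-beam decoding, and two replicas for higher throughput:

```bash
python server.py --use_gpu --num_beams 3 --num_replicas 2
```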
A client demo can be found in `TexTeller/client/demo.py`; you can refer to `demo.py` for how to send requests to the server.
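As a rough orientation, a client might look like the sketch below. The endpoint path and payload format here are assumptions for illustration; see `TexTeller/client/demo.py` for the exact request format expected by the server.

```python
# Minimal client sketch. SERVER_URL's route ("/predict") is hypothetical;
# check TexTeller/client/demo.py for the actual endpoint and payload.
import requests

SERVER_URL = "http://localhost:8000/predict"  # assumed route

with open("./img.jpg", "rb") as f:
    img = f.read()

# Assumption: the server accepts raw image bytes in the request body
# and returns the predicted LaTeX formula as text.
response = requests.post(SERVER_URL, data=img)
print(response.text)
```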
We provide a dataset example in `TexTeller/src/models/ocr_model/train/dataset`. You can place your own images in the `images` directory and annotate the corresponding formula for each image in `formulas.jsonl`.
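For illustration, one way to generate such an annotation file is sketched below. The field names here are hypothetical; mirror the schema of the example `formulas.jsonl` shipped in the dataset directory.

```python
# Illustrative only: "img_name" and "formula" are placeholder field names.
# Match the schema used by the example dataset in
# TexTeller/src/models/ocr_model/train/dataset.
import json

samples = [
    {"img_name": "0001.jpg", "formula": r"E = mc^2"},
    {"img_name": "0002.jpg", "formula": r"\int_0^1 x^2 \, dx = \frac{1}{3}"},
]

with open("formulas.jsonl", "w", encoding="utf-8") as f:
    for sample in samples:
        f.write(json.dumps(sample) + "\n")
```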
After the dataset is ready, change the `DIR_URL` variable in `.../dataset/loader.py` to the path of your dataset.
If you are using a different dataset, you may need to retrain the tokenizer to match your specific vocabulary. After setting up the dataset, you can do this by:
- Change the line `new_tokenizer.save_pretrained('./your_dir_name')` in `TexTeller/src/models/tokenizer/train.py` to your desired output directory name. To use a different vocabulary size, modify the `VOCAB_SIZE` parameter in `TexTeller/src/models/globals.py`.

- Run the following command in the `TexTeller/src` directory:

  ```bash
  python -m models.tokenizer.train
  ```
To train the model, run the following command in the `TexTeller/src` directory:

```bash
python -m models.ocr_model.train.train
```

You can set your own tokenizer and checkpoint paths in `TexTeller/src/models/ocr_model/train/train.py` (or fine-tune the default model checkpoint if you keep the default tokenizer and the same model architecture).
Please refer to `train.py` for more details.
Model architecture and training hyperparameters can be adjusted in `TexTeller/src/globals.py` and `TexTeller/src/models/ocr_model/train/train_args.py`.
We use the Hugging Face Transformers library for model training, so you can find more details about the training hyperparameters in their documentation.
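As a sketch of the kind of hyperparameters this exposes, a Hugging Face `TrainingArguments` configuration looks like the following; the values here are placeholders, and the settings actually used by TexTeller live in `train_args.py`.

```python
# Placeholder values for illustration; see
# TexTeller/src/models/ocr_model/train/train_args.py for the real settings.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./train_result",      # where checkpoints are written
    per_device_train_batch_size=32,   # adjust to your GPU memory
    learning_rate=5e-5,
    num_train_epochs=3,
    fp16=True,                        # mixed precision on supported GPUs
)
```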
- Train our model with a larger amount of data (5.5M samples, soon to be released).
- Inference acceleration.
- ...
Thanks to LaTeX-OCR, which has brought us a lot of inspiration, and to im2latex-100K, which enriches our dataset.