This repository has been archived by the owner on Sep 14, 2023. It is now read-only.

mrpositron/paper2tex

Paper2Tex (discontinued)

This project is a tool that extracts equations from research papers (images, PDFs, etc.) and converts them into LaTeX code.

This project relies heavily on several other open-source projects. Credit goes to their authors: @MaliParag, @lukas-blecher, and @jjdredd.

How to use?

paper2tex.ipynb is the main notebook. It contains the code to extract equations from a paper. The notebook is self-explanatory.
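For orientation, the overall flow (find equation regions on a page, crop each one, convert the crop to LaTeX) might look roughly like the sketch below. This is not the notebook's code: `extract_equations`, `crop`, and the `detect` and `recognize` callables are hypothetical names introduced here, and the page is modeled as a plain nested list of pixels so the example runs without any third-party library.

```python
from typing import Callable, List, Sequence, Tuple

# Assumption: a box is (left, top, right, bottom) in pixel coordinates.
Box = Tuple[int, int, int, int]
# Assumption: a page is a row-major grid of grayscale pixels (stand-in for a real image type).
Page = Sequence[Sequence[int]]


def crop(page: Page, box: Box) -> List[List[int]]:
    """Return the pixels inside `box` (right and bottom edges exclusive)."""
    left, top, right, bottom = box
    return [list(row[left:right]) for row in page[top:bottom]]


def extract_equations(
    page: Page,
    detect: Callable[[Page], List[Box]],             # hypothetical equation detector
    recognize: Callable[[List[List[int]]], str],     # hypothetical image-to-LaTeX model
) -> List[Tuple[int, Box, str]]:
    """Detect equation boxes, then recognize each crop; ids follow detection order."""
    results = []
    for eq_id, box in enumerate(detect(page)):
        results.append((eq_id, box, recognize(crop(page, box))))
    return results


if __name__ == "__main__":
    # Dummy stand-ins so the sketch runs end to end.
    page = [[0] * 4 for _ in range(4)]
    boxes = [(0, 0, 2, 2), (1, 1, 4, 3)]
    for eq_id, box, latex in extract_equations(
        page, lambda p: boxes, lambda img: f"{len(img)}x{len(img[0])}"
    ):
        print(eq_id, box, latex)
```

Passing the detector and recognizer in as callables keeps the sketch independent of any particular model; in practice they would wrap whatever detection and OCR models the notebook loads.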

Example

In the example output image, extracted equations are outlined in boxes with a yellow border. The number in the top-left corner of each box is the id of the equation.
The extracted equations are:

  • $$\text{id:}0 \Rightarrow {\frac{1}{N}}\sum_{i=1}^{N}\ell(\mathbf{x}_{i},\Theta)$$
  • $$\text{id:}1 \Rightarrow \Theta_{2}\leftarrow\Theta_{2}-\frac{\alpha}{m}\sum_{i=1}^{m}\frac{\partial F_{2}({\bf x}_{i},\Theta_{2})}{\partial\Theta_{2}}$$
  • $$\text{id:}2 \Rightarrow \ell=F_{2}(F_{1}(\mathbf{u},\Theta_{1}),\Theta_{2})$$
  • $$\text{id:}3 \Rightarrow {\frac{1}{m}}{\frac{\partial\ell(\mathbf{x}_{i},\Phi)}{\partial\Theta}}$$
  • $$\text{id:}4 \Rightarrow \ell=F_{2}(\cdot)$$
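A listing in this format can be produced with a small helper like the one below. It is only an illustration of the `id => equation` layout shown above; the function name `format_equation` is an assumption and does not come from the notebook.

```python
def format_equation(eq_id: int, latex: str) -> str:
    """Wrap a recognized LaTeX string in the 'id: N => equation' display form."""
    # Doubled braces emit literal '{' and '}' inside the f-string.
    return rf"$$\text{{id:}}{eq_id} \Rightarrow {latex}$$"


if __name__ == "__main__":
    print(format_equation(4, r"\ell=F_{2}(\cdot)"))
```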

Things to do

  • Add a notebook to extract equations from the paper.
  • Implement a GPU version of the code.
  • Upload it to Colab.
  • Find a way to run LaTeX-OCR inference in batch mode.
  • Detect paper borders

About

Extracting LaTeX equations from PDF
