A JavaScript library that adds text “layers” to images in PDFs, making scanned (image-only) PDFs searchable. It relies on ocrmypdf, a Python application and library.
ocrmypdf must be installed on your system for this library to work.
Debian or Ubuntu users can simply use the following:
sudo apt install ocrmypdf
For installation instructions on other operating systems, see the ocrmypdf installation documentation.
Install ocrmypdf-js with your preferred package manager:
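To confirm from Node.js that ocrmypdf is available before using the library, a minimal sketch like the following may help. It does not use ocrmypdf-js at all; it assumes only that the `ocrmypdf` executable is on your PATH and accepts `--version`. The function name `isOcrmypdfInstalled` is hypothetical.

```javascript
const { execFile } = require("node:child_process");

// Hypothetical helper (not part of ocrmypdf-js).
// Resolves to true if the `ocrmypdf` executable can be run from PATH, false otherwise.
function isOcrmypdfInstalled() {
  return new Promise((resolve) => {
    execFile("ocrmypdf", ["--version"], (error) => {
      // Any error (for example ENOENT when the binary is missing) counts as "not installed".
      resolve(!error);
    });
  });
}

isOcrmypdfInstalled().then((installed) => {
  console.log(installed ? "ocrmypdf found" : "ocrmypdf not found on PATH");
});
```

The promise never rejects, so the check can be awaited safely at application startup.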
npm i ocrmypdf-js
# or
yarn add ocrmypdf-js
# or
pnpm add ocrmypdf-js
Basic example:
import { OcrMyPdf } from "ocrmypdf-js";

(async () => {
  const ocrmypdf = new OcrMyPdf();
  // Paths are passed at execution time here.
  await ocrmypdf.execute({
    inputPath: "path/to/input.pdf",
    outputPath: "path/to/output.pdf",
  });
})();
When creating an instance, you can pass the following options to the constructor:
- args (string[]): command-line arguments passed to ocrmypdf; see the ocrmypdf documentation for the available arguments
- inputPath (string): path of the input PDF
- outputPath (string): path of the output PDF
Tip
💡 If inputPath or outputPath is passed to the constructor, you do not need to pass it again to execute().
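The tip implies that constructor options act as defaults for execute(). The sketch below illustrates one way such a merge could work. It is a hypothetical illustration, not the ocrmypdf-js source: the function name `resolveOptions`, the rule that values passed to execute() take precedence, and the error on missing paths are all assumptions.

```javascript
// Hypothetical sketch (not the ocrmypdf-js implementation).
// `defaults` stands for the constructor options, `overrides` for the execute() options.
// Assumption: values given to execute() win over constructor values.
function resolveOptions(defaults = {}, overrides = {}) {
  const inputPath = overrides.inputPath ?? defaults.inputPath;
  const outputPath = overrides.outputPath ?? defaults.outputPath;
  if (!inputPath || !outputPath) {
    throw new Error("inputPath and outputPath must be set in the constructor or in execute()");
  }
  return { args: defaults.args ?? [], inputPath, outputPath };
}

// Both paths given to the constructor only:
console.log(resolveOptions({ inputPath: "in.pdf", outputPath: "out.pdf" }));

// Input path from the constructor, output path supplied at execution time:
console.log(resolveOptions({ inputPath: "in.pdf" }, { outputPath: "out.pdf" }));
```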
import { resolve } from "path";
import { OcrMyPdf } from "ocrmypdf-js";

(async () => {
  const args = ["-l por"]; // change the default OCR language to Portuguese
  const inputPath = resolve("path/simple.pdf");
  const outputPath = resolve("path/simple-ocr.pdf");
  const ocrmypdf = new OcrMyPdf({ args, inputPath, outputPath });
  // Both paths were given to the constructor, so execute() needs no options.
  await ocrmypdf.execute();
})();
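For context, the ocrmypdf CLI is invoked as `ocrmypdf [options] input.pdf output.pdf`. The sketch below shows how the args, inputPath and outputPath options could map onto that command line when calling ocrmypdf directly from Node.js. It is hypothetical: `buildArgv` and `runOcrmypdf` are illustrative names, and splitting entries such as "-l por" on whitespace is an assumption about how ocrmypdf-js treats them, not documented behavior.

```javascript
const { execFile } = require("node:child_process");

// Hypothetical helper: build the argv for `ocrmypdf [options] input output`.
// Assumption: each args entry may hold several tokens ("-l por"), so it is
// split on whitespace. This breaks for option values that contain spaces.
function buildArgv({ args = [], inputPath, outputPath }) {
  const options = args.flatMap((arg) => arg.trim().split(/\s+/));
  return [...options, inputPath, outputPath];
}

// Hypothetical runner that calls the ocrmypdf executable directly (no ocrmypdf-js).
function runOcrmypdf(options) {
  return new Promise((resolve, reject) => {
    execFile("ocrmypdf", buildArgv(options), (error, stdout, stderr) => {
      if (error) reject(new Error(stderr || error.message));
      else resolve();
    });
  });
}

console.log(buildArgv({ args: ["-l por"], inputPath: "in.pdf", outputPath: "out.pdf" }).join(" "));
```

Passing an argument array to execFile, rather than building a shell string, avoids quoting problems with file paths that contain spaces.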
Note
For the -l por argument to work, the Tesseract language pack for that language must be installed; see the ocrmypdf documentation on installing additional language packs.
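To check whether a language pack is available before running OCR, one option is to ask Tesseract (the OCR engine ocrmypdf uses) with `tesseract --list-langs`. The sketch below assumes the `tesseract` executable is on your PATH; the function names `parseLangs` and `hasTesseractLang` are hypothetical and not part of ocrmypdf-js.

```javascript
const { execFile } = require("node:child_process");

// Parse `tesseract --list-langs` output: a header line
// ("List of available languages ...") followed by one language code per line.
function parseLangs(output) {
  return output
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line && !line.startsWith("List of available languages"));
}

// Hypothetical helper: resolves to true if Tesseract lists the given language code.
function hasTesseractLang(code) {
  return new Promise((resolve) => {
    execFile("tesseract", ["--list-langs"], (error, stdout, stderr) => {
      if (error) return resolve(false); // tesseract missing or failed
      // Some older Tesseract versions print the list to stderr, so check both streams.
      resolve(parseLangs(`${stdout}\n${stderr}`).includes(code));
    });
  });
}

hasTesseractLang("por").then((ok) => console.log(ok ? "por installed" : "por missing"));
```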