Upgrading from v2 to v5 Guide #771

Open
Balearica opened this issue May 29, 2023 · 1 comment

Balearica commented May 29, 2023

Overview

According to npm statistics (and GitHub Issues), many users are still using Tesseract.js v2. Version 2 was released in 2019 and contains many bugs, memory leaks, and performance issues that have been fixed in subsequent versions (in some cases v2 is 20x slower than the current version), so updating is strongly recommended. Additionally, v2 is no longer supported, so updating is required to receive support in GitHub Issues.

While the changes made in each release are fully documented, to make upgrading as easy as possible, below is a guide describing all changes that v2 users may need to make to use the latest version. This guide describes the process of upgrading from v2 to v5. If (for whatever reason) you wish to update from v2 to v4, see the comment below.

Changes Impacting Most Users

  1. createWorker is now async
    • In most code this means worker = Tesseract.createWorker() should be replaced with worker = await Tesseract.createWorker()
  2. The arguments to createWorker have changed: the first two arguments are now language and oem
    • E.g. createWorker('eng', 1, { logger: m => console.log(m) })
  3. worker.load, worker.loadLanguage, and worker.initialize are no longer needed
    • Simply delete the calls to these functions from existing code
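
Taken together, the three changes above might look like the following in practice. This is a minimal sketch rather than an official example: the helper name `recognizeText` is hypothetical, and the `Tesseract` module is passed in as a parameter (e.g. the result of `require('tesseract.js')`) only so the snippet stays self-contained.

```javascript
// v2 (old), shown for comparison only:
//   const worker = Tesseract.createWorker({ logger: m => console.log(m) });
//   await worker.load();
//   await worker.loadLanguage('eng');
//   await worker.initialize('eng');

// v5: createWorker is awaited and takes language and oem as its first two arguments.
// `recognizeText` is a hypothetical helper name, not part of Tesseract.js.
async function recognizeText(Tesseract, image) {
  const worker = await Tesseract.createWorker('eng', 1, {
    logger: m => console.log(m),
  });
  try {
    // No load/loadLanguage/initialize calls are needed before recognize.
    const { data: { text } } = await worker.recognize(image);
    return text;
  } finally {
    // Release the worker even if recognition fails.
    await worker.terminate();
  }
}
```

A caller would use it as `const text = await recognizeText(require('tesseract.js'), 'image.png');`, where the image path is a placeholder.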

Changes Impacting Fewer Users

  1. Electron users
    • Use the browser version of Tesseract.js
      • In v2, many users used the Node.js version
  2. Users of getPDF function
  3. Users who set cacheMethod: 'none' or cacheMethod: 'refresh' as workaround for caching bug
    • This workaround can be removed; the underlying bug has been fixed (see this comment)
  4. Users who set the optional corePath argument
    • corePath must be pointed to a directory containing all 4 of the following files from Tesseract.js-core v5:
      1. tesseract-core.wasm.js
      2. tesseract-core-simd.wasm.js
      3. tesseract-core-lstm.wasm.js
      4. tesseract-core-simd-lstm.wasm.js
  5. Node.js <14 users
    • Node.js v14 is now the earliest version supported
  6. Users of worker.detect function
    1. This function is now disabled by default
    2. To enable, set arguments legacyCore: true and legacyLang: true in createWorker options
      1. E.g. Tesseract.createWorker("eng", 1, {legacyCore: true, legacyLang: true})
  7. Users who implemented progress bars using log messages
    1. The language used in logs was standardized, so any scripts that parse logs may need to be updated
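
For users of worker.detect, the legacy options described in item 6 could be wired up roughly as follows. This is a sketch under assumptions: `detectScript` is a hypothetical helper name, the `Tesseract` module is passed in as a parameter to keep the snippet self-contained, and the exact fields of the returned `data` object are not described in this guide, so it is returned as-is.

```javascript
// `detectScript` is a hypothetical helper, not part of Tesseract.js.
// `Tesseract` is the imported tesseract.js module (Node.js or browser build).
async function detectScript(Tesseract, image) {
  // worker.detect is disabled by default in v5; both legacy options re-enable it.
  const worker = await Tesseract.createWorker('eng', 1, {
    legacyCore: true,
    legacyLang: true,
  });
  try {
    const { data } = await worker.detect(image);
    return data;
  } finally {
    // Release the worker even if detection fails.
    await worker.terminate();
  }
}
```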
@Balearica Balearica pinned this issue May 29, 2023
@Balearica Balearica changed the title Upgrading from v2 Guide Upgrading from v2 to v4 Guide Sep 28, 2023

Balearica commented Sep 29, 2023

[Archive] v2 to v4 Guide

The following comment contains the old guide for upgrading from v2 to v4. Users are encouraged to update to the latest version (v5), but this is still provided for informational purposes.

Changes Impacting Most Users

  1. createWorker is now async
    • In most code this means worker = Tesseract.createWorker() should be replaced with worker = await Tesseract.createWorker()
  2. worker.load is no longer needed (createWorker now returns worker pre-loaded)
    • Simply delete the worker.load call from existing code
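
For comparison with the v5 guide, a v4-style flow after these two changes might look like the sketch below. It assumes that in v4 createWorker still takes an options object as its first argument and that worker.loadLanguage and worker.initialize are still called explicitly (only worker.load is dropped in this archived guide). `recognizeTextV4` is a hypothetical helper name, and the `Tesseract` module is passed in as a parameter to keep the snippet self-contained.

```javascript
// `recognizeTextV4` is a hypothetical helper, not part of Tesseract.js.
async function recognizeTextV4(Tesseract, image) {
  // v4: createWorker is awaited and returns a worker that is already loaded.
  const worker = await Tesseract.createWorker({
    logger: m => console.log(m),
  });
  try {
    // worker.load() is gone, but language loading and initialization remain in v4.
    await worker.loadLanguage('eng');
    await worker.initialize('eng');
    const { data: { text } } = await worker.recognize(image);
    return text;
  } finally {
    // Release the worker even if recognition fails.
    await worker.terminate();
  }
}
```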

Changes Impacting Fewer Users

  1. Electron users
    • Use the browser version of Tesseract.js
      • In v2, many users used the Node.js version
  2. Users of getPDF function
  3. Users who set cacheMethod: 'none' or cacheMethod: 'refresh' as workaround for caching bug
    • This workaround can be removed; the underlying bug has been fixed (see this comment)
  4. Users who set the optional corePath argument
    • You will need to point corePath to a compatible version of Tesseract.js-core (the latest version of Tesseract.js should be used with the latest version of Tesseract.js-core)
    • For significantly faster performance, set corePath to a directory that includes both tesseract-core.wasm.js and tesseract-core-simd.wasm.js
  5. Node.js <14 users
    • Node.js v14 is now the earliest version supported

@Balearica Balearica changed the title Upgrading from v2 to v4 Guide Upgrading from v2 to v5 Guide Sep 29, 2023