add github-workflow building wheels #318

Merged (1 commit) on Apr 16, 2023

Conversation

betaboon
Contributor

@betaboon betaboon commented Apr 11, 2023

resolves #123

Hello.

This PR adds a github-workflow to create wheels containing all the required libraries.
builds include:

  • python: 3.8-3.11
  • os: linux and macos

This now builds leptonica and tesseract from source.

some further information tho:
since tesseract and leptonica are being installed via yum (for manylinux), apk (for musllinux) and brew (for macos) all of the built wheels contain a different version of tesseract.
that might pose a problem to some.
i think the only way to get around that would be to compile tesseract from source.
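
To see which tesseract version a given wheel actually ships, the version banner can be parsed at runtime. This is a minimal sketch, assuming the banner starts with something like `tesseract 5.3.0`; `parse_tesseract_version` and `bundled_tesseract_version` are hypothetical helper names, not part of tesserocr, and the only tesserocr call used is `tesseract_version()`.

```python
import re
from typing import Optional, Tuple


def parse_tesseract_version(banner: str) -> Optional[Tuple[int, ...]]:
    """Extract the numeric version from a banner such as 'tesseract 5.3.0'.

    Assumption: the banner contains the word 'tesseract' followed by a
    dotted version number. Returns None when no such pattern is found.
    """
    match = re.search(r"tesseract\s+v?(\d+(?:\.\d+)*)", banner)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def bundled_tesseract_version() -> Optional[Tuple[int, ...]]:
    """Return the version reported by the installed tesserocr, if any."""
    try:
        import tesserocr  # only available when a wheel is installed
    except ImportError:
        return None
    return parse_tesseract_version(tesserocr.tesseract_version())
```

Comparing the result across the manylinux, musllinux and macos wheels would show whether they really diverge.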

@betaboon betaboon marked this pull request as draft April 12, 2023 11:13
@betaboon betaboon marked this pull request as ready for review April 12, 2023 21:55
@betaboon
Contributor Author

@zdenop i think i addressed all your review comments now.

@betaboon betaboon force-pushed the github-actions-build branch 3 times, most recently from 446360b to 4f47f77 Compare April 13, 2023 10:42
@zdenop
Contributor

zdenop commented Apr 13, 2023

I like your build scripts, but I am not sure if they should be part of tesserocr, especially if inexperienced users would use them.
Here are some comments from perspective of tesserocr builder/user:

  1. On Linux and macOS you should first check whether tesseract (and probably leptonica) is already installed. If it is, it should be uninstalled before building tesseract/leptonica to avoid further problems.
  2. If your script is to be used for creating the tesserocr python package/wheel, you should stick to the system-provided leptonica/tesseract (the API/ABI of different versions is not the same, and programs linked against shared libraries could break if a different version is present). Another solution would be to include your custom-built shared libraries in the tesserocr wheel, as is done for Windows.
  3. pango, cairo and icu4c (installed for macOS) are needed only for the training tools, which tesserocr is not able to use. So the scripts (also for Linux) should build tesseract without training tools if you build it only for tesserocr.
  4. Personally I prefer to build leptonica with only zlib and png support (AKA minimalistic build) - tesserocr is used for OCR, and Python is able to open the rest of the image formats (with PIL or OpenCV). This decreases tesserocr's dependency complexity...
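
The check from point 1 could be sketched as below. This is only an illustration: `find_existing_tools` is a hypothetical helper, and finding the `tesseract` command-line binary on `PATH` is an assumed proxy for "already installed" (it does not detect a library-only installation of tesseract or leptonica).

```python
import shutil
from typing import Callable, Dict, Iterable, Optional


def find_existing_tools(
    names: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Dict[str, str]:
    """Map each executable name found on PATH to its resolved path.

    The lookup function is injectable so the check can be exercised
    without touching the real system.
    """
    found: Dict[str, str] = {}
    for name in names:
        path = which(name)
        if path is not None:
            found[name] = path
    return found
```

A build script could call `find_existing_tools(["tesseract"])` and abort with a message when the result is non-empty, rather than building on top of an existing installation.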

@zdenop
Contributor

zdenop commented Apr 13, 2023

Regarding the GitHub Actions part - this is something I would love to see in tesserocr (with Windows wheels - but that is the more difficult part).

@betaboon
Contributor Author

first off: thanks for taking the time to look at this.

I like your build scripts, but I am not sure if they should be part of tesserocr, especially if inexperienced users would use them.

as they are only intended to be used for the wheel builds in ci, we could move them to .github so that they are not considered for "public consumption".

1. On Linux and macOS you should first check whether tesseract (and probably leptonica) is already installed. If it is, it should be uninstalled before building tesseract/leptonica to avoid further problems.

2. If your script is to be used for creating the tesserocr python package/wheel, you should stick to the system-provided leptonica/tesseract (the API/ABI of different versions is not the same, and programs linked against shared libraries could break if a different version is present). Another solution would be to include your custom-built shared libraries in the tesserocr wheel, as [is done for Windows](https://github.com/simonflueckiger/tesserocr-windows_build/releases).

do i understand this correctly, that these points are only relevant when users would attempt to use the build-scripts locally?

3. `pango`, `cairo` and `icu4c` (installed for macOS) are needed only for the training tools, which tesserocr is not able to use. So the scripts (also for Linux) should build tesseract without training tools if you build it only for tesserocr.

do i get this right: you're suggesting to compile tesseract with -DBUILD_TRAINING_TOOLS=OFF and removing those dependencies?

4. Personally I prefer to build leptonica with only zlib and png support (AKA minimalistic build) - tesserocr is used for OCR, and Python is able to open the rest of the image formats (with PIL or OpenCV). This decreases tesserocr's dependency complexity...

do i get this right: you're suggesting to compile leptonica with -DENABLE_GIF=OFF, -DENABLE_JPEG=OFF, -DENABLE_TIFF=OFF, -DENABLE_WEBP=OFF and -DENABLE_OPENJPEG=OFF and removing those dependencies?

@zdenop
Contributor

zdenop commented Apr 14, 2023

For building release wheels use tesseract and leptonica provided by system.

If you plan to introduce testing CI (e.g. for commits, PRs), then integrate those scripts into GitHub Actions, as tesseract and leptonica do.

do i get this right: you're suggesting to compile tesseract with -DBUILD_TRAINING_TOOLS=OFF and removing those dependencies?

yes

do i get this right: you're suggesting to compile leptonica with -DENABLE_GIF=OFF, -DENABLE_JPEG=OFF, -DENABLE_TIFF=OFF, -DENABLE_WEBP=OFF and -DENABLE_OPENJPEG=OFF and removing those dependencies?

yes.

@betaboon
Contributor Author

betaboon commented Apr 14, 2023

so i just pushed changes doing the following:

  • move the buildscripts to .github as to signify that they are only being used for ci
  • build tesseract with -DBUILD_TRAINING_TOOLS=OFF and removing the appropriate dependencies
  • build leptonica with the discussed flags and removing the dependencies

is there anything left for me to do here to get this merged?
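
For reference, the configure step with the flags discussed in this thread could be assembled roughly as follows. This is a sketch, not the script from the PR: the helper name `cmake_configure_command` and the source/build directory names are hypothetical, and only the `-D` flags come from the conversation above.

```python
from typing import List, Sequence

# Flags taken from the discussion: a minimal leptonica (zlib/png only)
# and tesseract without the training tools.
LEPTONICA_FLAGS = [
    "-DENABLE_GIF=OFF",
    "-DENABLE_JPEG=OFF",
    "-DENABLE_TIFF=OFF",
    "-DENABLE_WEBP=OFF",
    "-DENABLE_OPENJPEG=OFF",
]
TESSERACT_FLAGS = ["-DBUILD_TRAINING_TOOLS=OFF"]


def cmake_configure_command(
    source_dir: str, build_dir: str, flags: Sequence[str]
) -> List[str]:
    """Build the argument list for a CMake configure step.

    Returns a list suitable for subprocess.run; nothing is executed here.
    """
    return ["cmake", "-S", source_dir, "-B", build_dir, *flags]


# Hypothetical directory names, for illustration only.
leptonica_cmd = cmake_configure_command("leptonica", "build/leptonica", LEPTONICA_FLAGS)
tesseract_cmd = cmake_configure_command("tesseract", "build/tesseract", TESSERACT_FLAGS)
```

Keeping the flags in named lists makes it easy to see at a glance which optional dependencies the wheels leave out.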

@betaboon betaboon mentioned this pull request Apr 15, 2023
Owner

@sirfz sirfz left a comment


Looks great to me, thank you for taking the time to do this. Awaiting @zdenop's approval before merging

@sirfz sirfz merged commit 3c9519b into sirfz:master Apr 16, 2023
@betaboon betaboon deleted the github-actions-build branch April 16, 2023 18:43
@betaboon
Contributor Author

@sirfz thanks for merging.

quick question: when can we expect the wheels to be available on pypi?

@sirfz
Owner

sirfz commented Apr 24, 2023

hi @betaboon, I'll include the wheels with the next release as I can't modify an existing release on pypi

@betaboon
Contributor Author

do you have any idea when that might be ?

Successfully merging this pull request may close these issues.

Wheels on Pypi
3 participants