Build ffmpeg with the --enable-libtesseract option #297

Closed
jftuga opened this issue Jan 26, 2018 · 8 comments

Comments

@jftuga

jftuga commented Jan 26, 2018

How hard would it be to build ffmpeg.exe with the --enable-libtesseract option? I am interested in using the OCR feature. I am cross-compiling from Ubuntu 16.04 to win64, and I successfully ran the cross_compile_ffmpeg.sh script to produce a 64-bit ffmpeg.exe. This was just an "out-of-the-box" build, without --enable-libtesseract, because I wanted to make sure my build system was working properly first.

FWIW, OSX has ffmpeg via homebrew with tesseract and OCR:
https://twitter.com/dericed/status/786965160762155009
I tried the OSX version with OCR and it works great, but I really need a win64 version.

Thanks.

@rdp
Owner

rdp commented Feb 23, 2018

OK got as far as this:

diff --git a/cross_compile_ffmpeg.sh b/cross_compile_ffmpeg.sh
index 1fe1a2c..3be9704 100755
--- a/cross_compile_ffmpeg.sh
+++ b/cross_compile_ffmpeg.sh
@@ -620,6 +620,15 @@ build_intel_quicksync_mfx() { # i.e. qsv
   cd ..
 }

+build_leptonica() {
+  do_git_checkout_and_make_install https://github.com/DanBloomberg/leptonica.git
+}
+
+build_libtesseract() {
+  build_leptonica
+  do_git_checkout_and_make_install https://github.com/tesseract-ocr/tesseract.git
+}
+
 build_libzimg() {
   do_git_checkout https://github.com/sekrit-twc/zimg.git zimg_git 8e87f5a4b88e16ccafb2e7ade8ef45
   cd zimg_git
@@ -1919,6 +1928,7 @@ build_dependencies() {
   build_libass # Needs freetype >= 9.10.3 (see https://bugs.launchpad.net/ubuntu/+source/freetype1/+bug/78573 o_O) and fribidi >= 0.19.0. Uses fontconfig >= 2.10.92, iconv and dlfcn.
   build_libxavs
   build_libxvid # FFmpeg now has native support, but libxvid still provides a better image.
+  build_libtesseract
   build_libvpx
   build_libx265
   build_libopenh264

but it gives


../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0x1bd): undefined reference to `_imp__opj_stream_create@8'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0x1df): undefined reference to `_imp__opj_stream_set_user_data@12'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0x230): undefined reference to `_imp__opj_stream_set_user_data_length@12'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0x244): undefined reference to `_imp__opj_stream_set_read_function@8'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0x258): undefined reference to `_imp__opj_stream_set_write_function@8'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0x26c): undefined reference to `_imp__opj_stream_set_skip_function@8'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0x280): undefined reference to `_imp__opj_stream_set_seek_function@8'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0x394): undefined reference to `_imp__opj_version@0'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0x40e): undefined reference to `_imp__opj_version@0'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0x60c): undefined reference to `_imp__opj_set_default_encoder_parameters@4'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0x697): undefined reference to `_imp__opj_create_compress@4'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0x6cc): undefined reference to `_imp__opj_setup_encoder@12'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0x6e0): undefined reference to `_imp__opj_destroy_codec@4'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0x6ec): undefined reference to `_imp__opj_image_destroy@4'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0x7b0): undefined reference to `_imp__opj_start_compress@12'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0x7c4): undefined reference to `_imp__opj_stream_destroy@4'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0x7d0): undefined reference to `_imp__opj_destroy_codec@4'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0x7dc): undefined reference to `_imp__opj_image_destroy@4'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0x835): undefined reference to `_imp__opj_set_info_handler@12'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0x851): undefined reference to `_imp__opj_set_warning_handler@12'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0x86d): undefined reference to `_imp__opj_set_error_handler@12'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0xa10): undefined reference to `_imp__opj_image_create@12'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0xc69): undefined reference to `_imp__opj_image_destroy@4'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0xe06): undefined reference to `_imp__opj_encode@8'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0xe1a): undefined reference to `_imp__opj_stream_destroy@4'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0xe26): undefined reference to `_imp__opj_destroy_codec@4'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0xe36): undefined reference to `_imp__opj_image_destroy@4'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0xe7f): undefined reference to `_imp__opj_destroy_codec@4'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0xe8f): undefined reference to `_imp__opj_image_destroy@4'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0xf39): undefined reference to `_imp__opj_end_compress@8'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0xf47): undefined reference to `_imp__opj_stream_destroy@4'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0xf53): undefined reference to `_imp__opj_destroy_codec@4'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0xf63): undefined reference to `_imp__opj_image_destroy@4'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0x1045): undefined reference to `_imp__opj_version@0'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0x11a9): undefined reference to `_imp__opj_set_default_decoder_parameters@4'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0x1218): undefined reference to `_imp__opj_create_decompress@4'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0x1249): undefined reference to `_imp__opj_set_info_handler@12'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0x1265): undefined reference to `_imp__opj_set_warning_handler@12'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0x1281): undefined reference to `_imp__opj_set_error_handler@12'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0x1291): undefined reference to `_imp__opj_setup_decoder@8'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0x12c7): undefined reference to `_imp__opj_read_header@12'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0x1341): undefined reference to `_imp__opj_set_decode_area@24'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0x1361): undefined reference to `_imp__opj_decode@12'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0x1379): undefined reference to `_imp__opj_end_decompress@8'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0x138c): undefined reference to `_imp__opj_stream_destroy@4'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0x1396): undefined reference to `_imp__opj_destroy_codec@4'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0x1561): undefined reference to `_imp__opj_image_destroy@4'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0x15bb): undefined reference to `_imp__opj_destroy_codec@4'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0x15c5): undefined reference to `_imp__opj_stream_destroy@4'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0x15d3): undefined reference to `_imp__opj_image_destroy@4'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0x16e9): undefined reference to `_imp__opj_destroy_codec@4'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0x1837): undefined reference to `_imp__opj_stream_destroy@4'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0x1841): undefined reference to `_imp__opj_destroy_codec@4'
collect2: error: ld returned 1 exit status

hmm
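
A note on reading the log above: the `_imp__` prefix on every symbol usually means that jp2kio.o was compiled expecting to import the OpenJPEG functions from a DLL, while only a static library was available at link time. With this many lines it can help to collapse the log to the distinct missing symbols. A minimal sketch, assuming the linker output has been saved to a file; the function name `summarize_undefined` and the short sample log are illustrative and not part of cross_compile_ffmpeg.sh:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read linker output on stdin and print each distinct
# missing symbol once, prefixed by how often it was reported.
summarize_undefined() {
  sed -n "s/.*undefined reference to \`\([^']*\)'.*/\1/p" \
    | sort | uniq -c | awk '{print $1, $2}' | sort -rn
}

# Three lines copied from the log above, plus the final collect2 line.
log=$(mktemp)
cat > "$log" <<'EOF'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0x1bd): undefined reference to `_imp__opj_stream_create@8'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0x394): undefined reference to `_imp__opj_version@0'
../src/.libs/liblept.a(jp2kio.o):jp2kio.c:(.text+0x40e): undefined reference to `_imp__opj_version@0'
collect2: error: ld returned 1 exit status
EOF

summary=$(summarize_undefined < "$log")
printf '%s\n' "$summary"
rm -f "$log"
```

If every symbol in the summary starts with `_imp__opj_`, the whole failure comes down to one dependency (OpenJPEG) rather than many separate problems.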

@rdp
Owner

rdp commented Feb 23, 2018

OK overcame that particular problem with generic_configure "--without-libopenjpeg" for build_leptonica

@rdp
Owner

rdp commented Feb 23, 2018

Next came:
undefined reference to delete(void *) etc.
undefined reference to `__gxx_personality_sj0'

Fix:
sed -i.bak 's/-ltesseract.*$/-ltesseract -lstdc++ -lws2_32/' $PKG_CONFIG_PATH/tesseract.pc LOL
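
For context: Tesseract is a C++ library, and these errors are what typically shows up when it is linked by a C compiler driver that does not add libstdc++ on its own. The sed call appends the missing libraries to the `Libs:` line that pkg-config reports. A minimal sketch of what that substitution does, run against a throwaway file; the sample tesseract.pc contents below are an assumption for illustration, not the real generated file:

```shell
#!/usr/bin/env bash
# Work on a scratch copy so nothing in the real sandbox is touched.
tmpdir=$(mktemp -d)
pc="$tmpdir/tesseract.pc"

# Assumed, simplified .pc contents (the real file has more fields).
printf '%s\n' \
  'Name: tesseract' \
  'Libs: -L${libdir} -ltesseract' \
  'Libs.private: -lpthread' > "$pc"

# Same substitution as in the comment above: everything from -ltesseract to
# the end of the line is replaced, so running it twice gives the same result.
sed -i.bak 's/-ltesseract.*$/-ltesseract -lstdc++ -lws2_32/' "$pc"
sed -i.bak 's/-ltesseract.*$/-ltesseract -lstdc++ -lws2_32/' "$pc"

libs_line=$(grep '^Libs:' "$pc")
printf '%s\n' "$libs_line"
rm -rf "$tmpdir"
```

Two caveats with this approach: each run overwrites the `.bak` file, so after the second run the backup no longer holds the unpatched original, and `$PKG_CONFIG_PATH/tesseract.pc` only resolves when the variable contains a single directory.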

@rdp
Owner

rdp commented Feb 23, 2018

hopefully got it in 9261995

@rdp rdp closed this as completed Feb 23, 2018
@jftuga
Author

jftuga commented Feb 23, 2018

Thank you for working on this. I will try it out over the weekend.

@Reino17
Contributor

Reino17 commented Feb 25, 2018

> hopefully got it in 9261995

Have you actually tested it yourself? Half a year ago I made a test compile with Tesseract myself, and it wasn't particularly easy.

+  # autoconf-archive is just for leptonica FWIW

I can compile Leptonica perfectly without it, but maybe it depends on the system, I don't know.

+build_leptonica() {
+  do_git_checkout https://github.com/DanBloomberg/leptonica.git 
+  cd leptonica_git
+    generic_configure "--without-libopenjpeg"
+    do_make_and_make_install
+  cd ..
+}
+
+build_libtiff() {
+  generic_download_and_make_and_install ftp://download.osgeo.org/libtiff/tiff-4.0.9.tar.gz
+}
+
+build_libtesseract() {
+  build_leptonica
+  build_libtiff # no disable option? odd...
+  do_git_checkout_and_make_install https://github.com/tesseract-ocr/tesseract.git
+  sed -i.bak 's/-ltesseract.*$/-ltesseract -lstdc++ -lws2_32/' $PKG_CONFIG_PATH/tesseract.pc # why does it needs winsock? LOL
+}

At first I also thought I could compile Leptonica and Tesseract without any external image library, because hey... there's libavcodec, right? You've probably already noticed that LibTiff is obligatory, or Tesseract won't compile at all. But this way you're only satisfying the Tesseract compilation process, because you're not compiling Leptonica with LibTiff. This results in:

ffprobe.exe -hide_banner -show_entries frame_tags=lavfi.ocr.text -f lavfi -i "movie='input.png',ocr"
Error in pixReadMemTiff: function not present
Error in pixReadMem: tiff: no pix returned
Error in pixaGenerateFontFromString: pix not made
Error in bmfCreate: font pixa not made
Error in pixWriteMemPng: function not present
ObjectCache(020BB810)::~ObjectCache(): WARNING! LEAK! object 02ABD388 still has count 1 (id [...]\tessdata/eng.traineddatalstm-punc-dawg)
ObjectCache(020BB810)::~ObjectCache(): WARNING! LEAK! object 02B92130 still has count 1 (id [...]\tessdata/eng.traineddatalstm-word-dawg)
ObjectCache(020BB810)::~ObjectCache(): WARNING! LEAK! object 02B84AC0 still has count 1 (id [...]\tessdata/eng.traineddatalstm-number-dawg)

LibTiff seems to be rather crucial. The pixWriteMemPng message also worried me, so I compiled Leptonica with LibTiff and LibPNG. This results in:

ffprobe.exe -hide_banner -show_entries frame_tags=lavfi.ocr.text -f lavfi -i "movie='input.png',ocr"
Input #0, lavfi, from 'movie='input.png',ocr':
  Duration: N/A, start: 0.000000, bitrate: N/A
    Stream #0:0: Video: rawvideo (444P / 0x50343434), yuv444p, 636x131 [SAR 1:1 DAR 636:131], 25 tbr, 25 tbn, 25 tbc
[FRAME]
TAG:lavfi.ocr.text=Er komt oorlog.
Ik Weet niet wanneer ... maar er komt oorlog.

[/FRAME]
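
To use the recognized text in a script, the `lavfi.ocr.text` value can be pulled out of the `[FRAME]` sections. A minimal sketch, assuming ffprobe's stdout was saved to a file and that `lavfi.ocr.text` is the only tag printed per frame (as with the `-show_entries` option above); the function name `extract_ocr_text` is made up for this example:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the (possibly multi-line) OCR text of every frame
# from ffprobe's default output format, read on stdin.
extract_ocr_text() {
  awk '
    /^\[\/FRAME\]$/ { in_text = 0 }                 # end of the frame section
    in_text { print }                                # continuation line of the text
    sub(/^TAG:lavfi\.ocr\.text=/, "") { in_text = 1; print }  # first line of the text
  '
}

# Sample taken from the ffprobe output above.
out=$(mktemp)
cat > "$out" <<'EOF'
[FRAME]
TAG:lavfi.ocr.text=Er komt oorlog.
Ik Weet niet wanneer ... maar er komt oorlog.

[/FRAME]
EOF

ocr_text=$(extract_ocr_text < "$out")
printf '%s\n' "$ocr_text"
rm -f "$out"
```

The helper keeps reading until `[/FRAME]` because the OCR result can span several lines, as it does in this sample.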

My code:

build_libleptonica() {
  do_git_checkout https://github.com/DanBloomberg/leptonica.git
  cd leptonica_git
    export PKG_CONFIG="pkg-config --static" # Automatically detect all Leptonica's dependencies.
    generic_configure "--disable-programs"
    do_make_and_make_install
    unset PKG_CONFIG
  cd ..
}

build_libtesseract() {
  do_git_checkout https://github.com/tesseract-ocr/tesseract.git
  cd tesseract_git
    if [[ ! -f tesseract.pc.in.bak ]]; then
      sed -i.bak "s/-lpthread/-lpthread -lstdc++ -lws2_32/" tesseract.pc.in
    fi
    generic_configure_make_install
  cd ..
}
  • I removed OpenJPEG from my repo some time ago. FFmpeg has full support for it already, so there's no need for it anymore, if you ask me.
    If you still want to use it and want to compile Leptonica with it (even though you don't need to), then I suggest you recompile OpenJPEG. Nowadays OpenJPEG automatically generates a pkg-config file, which Leptonica in turn looks for. I've successfully compiled Leptonica this way, so there's no need for --without-libopenjpeg.
    Last year I still had to do this:
export LIBJP2K_CFLAGS="-DOPJ_STATIC -I$mingw_w64_x86_64_prefix/include/openjpeg-2.2 -I$mingw_w64_x86_64_prefix/include"
export LIBJP2K_LIBS="-L$mingw_w64_x86_64_prefix/lib -lopenjp2"
  • export PKG_CONFIG="pkg-config --static" is necessary, or the Tesseract compilation process will complain that it can't find LibTiff's Libs.private: -llzma!
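
The effect of `--static` described in the last bullet can be seen in isolation: without it, pkg-config reports only the `Libs:` field, and with it the `Libs.private:` entries are appended. A minimal sketch using a made-up module; `demo_tiff.pc` and its flags are a stand-in for libtiff-4.pc and not the real file:

```shell
#!/usr/bin/env bash
# Hypothetical demo module; name and flags are illustrative only.
demo_dir=$(mktemp -d)
cat > "$demo_dir/demo_tiff.pc" <<'EOF'
Name: demo_tiff
Description: stand-in for a library with private dependencies
Version: 1.0
Libs: -ltiff
Libs.private: -llzma -lz
EOF

if command -v pkg-config >/dev/null 2>&1; then
  # Default query: private libraries are left out.
  dynamic_libs=$(PKG_CONFIG_PATH="$demo_dir" pkg-config --libs demo_tiff)
  # Static query: Libs.private is included as well.
  static_libs=$(PKG_CONFIG_PATH="$demo_dir" pkg-config --static --libs demo_tiff)
  echo "default:  $dynamic_libs"
  echo "--static: $static_libs"
else
  echo "pkg-config not found; skipping the comparison"
fi
rm -rf "$demo_dir"
```

Exporting `PKG_CONFIG="pkg-config --static"` works because autoconf's pkg-config macros call whatever command `$PKG_CONFIG` holds, so every dependency check during configure picks up the private libraries.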

@rdp
Owner

rdp commented Feb 26, 2018 via email

@Reino17
Contributor

Reino17 commented Feb 27, 2018

As you can see above, I don't have to manually set up the path to LibTiff or LibPNG (nor is it needed anymore for OpenJPEG), so yes it is:

checking for ZLIB... yes
checking for LIBPNG... yes
checking for JPEG... no
checking for jpeg_read_scanlines in -ljpeg... no
checking jpeglib.h usability... no
checking jpeglib.h presence... no
checking for jpeglib.h... no
checking for DGifOpenFileHandle in -lgif... no
checking gif_lib.h usability... no
checking gif_lib.h presence... no
checking for gif_lib.h... no
checking for LIBTIFF... yes
checking for LIBWEBP... yes
checking for LIBJP2K... no
checking for opj_create_decompress in -lopenjp2... no
checking openjpeg-2.3/openjpeg.h usability... no
checking openjpeg-2.3/openjpeg.h presence... no
checking for openjpeg-2.3/openjpeg.h... no
checking openjpeg-2.2/openjpeg.h usability... no
checking openjpeg-2.2/openjpeg.h presence... no
checking for openjpeg-2.2/openjpeg.h... no
checking openjpeg-2.1/openjpeg.h usability... no
checking openjpeg-2.1/openjpeg.h presence... no
checking for openjpeg-2.1/openjpeg.h... no
checking openjpeg-2.0/openjpeg.h usability... no
checking openjpeg-2.0/openjpeg.h presence... no
checking for openjpeg-2.0/openjpeg.h... no

The initial 'lept.pc' (on my system at least):

Libs.private: -L/cygdrive/m/ffmpeg-windows-build-helpers-master/native_build/windows/ffmpeg_local_builds/sandbox/cross_compilers/mingw-w64-i686/i686-w64-mingw32/lib -lz -L/cygdrive/[...] -lpng16 -lz   -L/cygdrive/m/[...] -ltiff -L/cygdrive/m/[...] -lwebp

After export PKG_CONFIG="pkg-config --static":

Libs.private: -L/cygdrive/m/ffmpeg-windows-build-helpers-master/native_build/windows/ffmpeg_local_builds/sandbox/cross_compilers/mingw-w64-i686/i686-w64-mingw32/lib -lz -L/cygdrive/[...] -lpng16 -lz -lm -lz   -L/cygdrive/m/[...] -ltiff -llzma -lz -L/cygdrive/m/[...] -lwebp -lm -pthread

> bizarrely my libtiff-4.pc doesn't mention lzma even though it's present...hmm...

Strange. This is when I build LibTiff with 'liblzma.pc' in $PKG_CONFIG_PATH:

checking for lzma_code in -llzma... yes
checking lzma.h usability... yes
checking lzma.h presence... yes
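
One quick way to compare two builds is to check whether the installed .pc file actually records the library in `Libs.private`. A minimal sketch; the function name `pc_private_has` and the two sample files are made up for illustration, and the real file to inspect would be libtiff-4.pc in whichever directory the build installs it to:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if FILE lists LIBFLAG on its Libs.private line.
# usage: pc_private_has FILE LIBFLAG
pc_private_has() {
  grep '^Libs\.private:' "$1" | grep -qwF -- "$2"
}

# Two assumed sample files: one built with lzma detected, one without.
tmpdir=$(mktemp -d)
printf '%s\n' 'Libs: -ltiff' 'Libs.private: -llzma -lz' > "$tmpdir/with_lzma.pc"
printf '%s\n' 'Libs: -ltiff' 'Libs.private: -lz'        > "$tmpdir/without_lzma.pc"

for f in "$tmpdir"/with_lzma.pc "$tmpdir"/without_lzma.pc; do
  if pc_private_has "$f" -llzma; then
    echo "$(basename "$f"): lists -llzma"
  else
    echo "$(basename "$f"): does not list -llzma"
  fi
done
```

If the file does not list `-llzma` even though liblzma is installed, the configure output of the LibTiff build (the `checking for lzma_code` line above) is the next place to look.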
