doesn't work with Tesseract 4 ? #67

Open
Seegras opened this issue Dec 21, 2017 · 11 comments

@Seegras

Seegras commented Dec 21, 2017

It looks like it can't cope with tesseract 4's language data files:

open("/usr/share/tesseract-ocr/4.00/tessdata/eng.traineddata", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=4113088, ...}) = 0
read(3, "\30\0\0\0\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377"..., 4096) = 4096
write(2, "Failed loading language 'eng'\n", 30Failed loading language 'eng'
) = 30
write(2, "Tesseract couldn't load any lang"..., 39Tesseract couldn't load any languages!

@Seegras Seegras changed the title doesn't work with Tesseracet 4 ? doesn't work with Tesseract 4 ? Dec 21, 2017
@Seegras
Author

Seegras commented Dec 22, 2017

I'm pretty sure now this is the case. This is what happens when trying to compile it.

-- Performing Test TESSERACT_NAMESPACE - Failed
CMake Warning at CMakeModules/FindTesseract.cmake:56 (message):
You are using an old Tesseract version. Support for Tesseract 2 is
deprecated and will be removed in the future!
Call Stack (most recent call first):
CMakeLists.txt:66 (find_package)

Yes, that's tesseract 4 that's getting misidentified as tesseract 2.

@GustavoLafava

Simply add -std=gnu++11 to CMAKE_CXX_FLAGS in CMakeLists.txt.

@Seegras
Author

Seegras commented Feb 23, 2018

-- Build type: Debug
CMake Warning at CMakeModules/FindTesseract.cmake:56 (message):
You are using an old Tesseract version. Support for Tesseract 2 is
deprecated and will be removed in the future!

It still misidentifies tesseract 4 as tesseract 2, and compilation still fails, but for some other (or maybe exactly the same) reason:

[ 60%] Building CXX object src/CMakeFiles/vobsub2srt.dir/vobsub2srt.c++.o
/home/user/git/VobSub2SRT/src/vobsub2srt.c++: In function ‘int main(int, char**)’:
/home/user/git/VobSub2SRT/src/vobsub2srt.c++:218:3: error: ‘TessBaseAPI’ has not been declared
TessBaseAPI::SimpleInit(tess_path, tess_lang, false); // TODO params
^~~~~~~~~~~
/home/user/git/VobSub2SRT/src/vobsub2srt.c++:220:5: error: ‘TessBaseAPI’ has not been declared
TessBaseAPI::SetVariable("tessedit_char_blacklist", blacklist.c_str());
^~~~~~~~~~~
/home/user/git/VobSub2SRT/src/vobsub2srt.c++:275:20: error: ‘TessBaseAPI’ has not been declared
char *text = TessBaseAPI::TesseractRect(image, 1, stride, 0, 0, width, height);
^~~~~~~~~~~
/home/user/git/VobSub2SRT/src/vobsub2srt.c++:314:3: error: ‘TessBaseAPI’ has not been declared
TessBaseAPI::End();
^~~~~~~~~~~
make[2]: *** [src/CMakeFiles/vobsub2srt.dir/build.make:63: src/CMakeFiles/vobsub2srt.dir/vobsub2srt.c++.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:173: src/CMakeFiles/vobsub2srt.dir/all] Error 2

@bubonic

bubonic commented Nov 21, 2018

> Simply add -std=gnu++11 to CMAKE_CXX_FLAGS in CMakeLists.txt.

Had same issue as OP and this worked for me after adding it and doing a make distclean
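
A note on why the clean step probably matters: CMake caches the outcome of configure-time checks such as the TESSERACT_NAMESPACE test shown in the log earlier in this thread, so a stale build tree can keep the old "Failed" result even after CMakeLists.txt has been edited. Below is a minimal sketch of a clean rebuild. It assumes an out-of-source build in a build/ subdirectory; the function name rebuild_clean and that directory layout are illustrative and not part of the project.

```shell
#!/bin/sh
# Sketch only: rebuild from an empty build tree after editing CMakeLists.txt.
# Assumptions: the build lives in "$src/build"; rebuild_clean is a made-up name.

rebuild_clean() {
    src=${1:-.}

    # Guard: stop early if the flag has not been added to CMakeLists.txt yet.
    if ! grep -qF -e '-std=gnu++11' "$src/CMakeLists.txt"; then
        echo "CMakeLists.txt does not mention -std=gnu++11" >&2
        return 1
    fi

    # Removing the build tree also removes CMakeCache.txt and with it the
    # cached results of earlier configure checks.
    rm -rf "$src/build"
    mkdir -p "$src/build" || return 1

    # The subshell keeps the caller's working directory unchanged.
    ( cd "$src/build" && cmake .. && make )
}
```

Calling `rebuild_clean /path/to/VobSub2SRT` would then reconfigure and rebuild from scratch. With an in-source build, `make distclean` as described above serves the same purpose.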

@olaquetal

> > Simply add -std=gnu++11 to CMAKE_CXX_FLAGS in CMakeLists.txt.
>
> Had same issue as OP and this worked for me after adding it and doing a make distclean

Works great! Thanks a lot.

@marshalleq

Can this not be added to the CMakeLists.txt file here, so we don't have to add this manually?

@bubonic

bubonic commented Jun 28, 2019

> Can this not be added to the CMakeLists.txt file here, so we don't have to add this manually?

There hasn't been an update on this git in several years. I cloned the repository and even added a few changes to the code at my own git site. I noticed with tesseract 4, different OEM engines produced significantly different results on subtitle files. I experimented with this and added a --tesseract-oem option to vobsub2srt in my git repository. I updated the README with what expectations you should have with various oem options. Play around with it a bit and test for yourself. Anyway, you can find the new VobSub2SRT git repository here:

https://github.com/bubonic/VobSub2SRT

I am not currently maintaining this project in any way. This was just an add-on I made for myself. I'll update my repository as I see fit and as necessary.
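
One way to try the option out is to run the same subtitle file through each engine mode and keep the results side by side. The sketch below assumes that the fork's --tesseract-oem option accepts Tesseract's numeric engine modes 0 to 3 (0 legacy, 1 LSTM, 2 both combined, 3 default) and that vobsub2srt writes its output to `<base>.srt`. Both are assumptions to check against the fork's README, and compare_oem_modes is a made-up helper name.

```shell
#!/bin/sh
# Sketch only: run vobsub2srt once per engine mode and keep each result.
# Assumptions: --tesseract-oem takes 0..3, and output lands in "<base>.srt".

compare_oem_modes() {
    base=$1    # path to the subtitle files without extension

    for oem in 0 1 2 3; do
        if vobsub2srt --tesseract-oem "$oem" "$base"; then
            # Rename so the next run does not overwrite this result.
            mv "$base.srt" "$base.oem$oem.srt"
        else
            echo "engine mode $oem failed for $base" >&2
        fi
    done
}
```

After `compare_oem_modes video`, the files video.oem0.srt through video.oem3.srt (for the modes that succeeded) can be compared with diff or a text editor.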

@jefro108

jefro108 commented Nov 9, 2020

> There hasn't been an update on this git in several years. I cloned the repository and even added a few changes to the code at my own git site. I noticed with tesseract 4, different OEM engines produced significantly different results on subtitle files. I experimented with this and added a --tesseract-oem option to vobsub2srt in my git repository. I updated the README with what expectations you should have with various oem options. Play around with it a bit and test for yourself. Anyway, you can find the new VobSub2SRT git repository here:
>
> https://github.com/bubonic/VobSub2SRT

@bubonic I added a fork of your repo to homebrew:

brew tap sammys/VobSub2SRT https://github.com/sammys/VobSub2SRT

and installed it by:

wget https://github.com/sammys/VobSub2SRT/raw/master/packaging/vobsub2srt.rb

brew install --HEAD vobsub2srt.rb

Not sure I needed the brew tap though

@bubonic

bubonic commented Nov 13, 2020

@jefro108 I'm unsure how brew works, as I've never owned an Apple. I just cloned my git repo and compiled from source and everything was working as expected. Glad to hear you found a fork that works. Hope it benefits you! Best.

@trufanov-nok

@bubonic
I'm trying to use your fork but getting

$ vobsub2srt --tesseract-oem 0 video
Error: Tesseract (legacy) engine requested, but components are not present in /usr/share/tesseract-ocr/4.00/tessdata/eng.traineddata!!
Failed loading language 'eng'
Tesseract couldn't load any languages!
Failed to initialize tesseract (OCR).

Btw, could you open the Issues page for your github project?

@bubonic

bubonic commented Jun 9, 2022

> @bubonic I'm trying to use your fork but getting
>
> $ vobsub2srt --tesseract-oem 0 video
> Error: Tesseract (legacy) engine requested, but components are not present in /usr/share/tesseract-ocr/4.00/tessdata/eng.traineddata!!
> Failed loading language 'eng'
> Tesseract couldn't load any languages!
> Failed to initialize tesseract (OCR).
>
> Btw, could you open the Issues page for your github project?

You need these data files installed: https://tesseract-ocr.github.io/tessdoc/Data-Files

Also, the Issues page is now open.
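
To check whether an installed eng.traineddata includes the legacy engine at all, one can list its components. This is a sketch, assuming that combine_tessdata from the Tesseract training tools is installed and that a legacy-capable model carries an inttemp component. The helper name has_legacy_engine is illustrative, and the exact listing format may differ between Tesseract versions.

```shell
#!/bin/sh
# Sketch only: report whether a .traineddata file lists a legacy component.
# Assumptions: "combine_tessdata -d FILE" prints the component directory,
# and legacy models include a component named "inttemp".

has_legacy_engine() {
    # The listing may be printed on stderr, so both streams are searched.
    combine_tessdata -d "$1" 2>&1 | grep -q 'inttemp'
}
```

For example, `has_legacy_engine /usr/share/tesseract-ocr/4.00/tessdata/eng.traineddata || echo "no legacy components"` would indicate that `--tesseract-oem 0` cannot work with that file and that a data file containing the legacy model is needed.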
