This repository has been archived by the owner on Mar 17, 2022. It is now read-only.

With English the API works well, but if I use the Arabic trained data the app crashes #174

Closed
yatharthgupta112 opened this issue Sep 17, 2016 · 9 comments

Comments

@yatharthgupta112

yatharthgupta112 commented Sep 17, 2016

```
09-17 15:20:02.050 21768-21778/com.example.sigmaway.homeimage W/art: Suspending all threads took: 28.488ms
09-17 15:20:02.078 21768-25485/com.example.sigmaway.homeimage V/OCR: Ctesseract 1
09-17 15:20:02.085 21768-25485/com.example.sigmaway.homeimage W/linker: /data/app/com.example.sigmaway.homeimage-1/lib/arm64/libjpgt.so: unused DT entry: type 0x6ffffffe arg 0x29b0
09-17 15:20:02.085 21768-25485/com.example.sigmaway.homeimage W/linker: /data/app/com.example.sigmaway.homeimage-1/lib/arm64/libjpgt.so: unused DT entry: type 0x6fffffff arg 0x1
09-17 15:20:02.088 21768-25485/com.example.sigmaway.homeimage W/linker: /data/app/com.example.sigmaway.homeimage-1/lib/arm64/libpngt.so: unused DT entry: type 0x6ffffffe arg 0x58e0
09-17 15:20:02.088 21768-25485/com.example.sigmaway.homeimage W/linker: /data/app/com.example.sigmaway.homeimage-1/lib/arm64/libpngt.so: unused DT entry: type 0x6fffffff arg 0x2
09-17 15:20:02.093 21768-25485/com.example.sigmaway.homeimage W/linker: /data/app/com.example.sigmaway.homeimage-1/lib/arm64/liblept.so: unused DT entry: type 0x6ffffffe arg 0x231d0
09-17 15:20:02.093 21768-25485/com.example.sigmaway.homeimage W/linker: /data/app/com.example.sigmaway.homeimage-1/lib/arm64/liblept.so: unused DT entry: type 0x6fffffff arg 0x2
09-17 15:20:02.097 21768-25485/com.example.sigmaway.homeimage W/linker: /data/app/com.example.sigmaway.homeimage-1/lib/arm64/libtess.so: unused DT entry: type 0x6ffffffe arg 0x67f60
09-17 15:20:02.097 21768-25485/com.example.sigmaway.homeimage W/linker: /data/app/com.example.sigmaway.homeimage-1/lib/arm64/libtess.so: unused DT entry: type 0x6fffffff arg 0x3
09-17 15:20:02.156 21768-25485/com.example.sigmaway.homeimage V/OCR: Ctesseract 2
09-17 15:20:02.157 21768-25485/com.example.sigmaway.homeimage V/OCR: Ctesseract 3
09-17 15:20:02.293 21768-25485/com.example.sigmaway.homeimage A/libc: Fatal signal 6 (SIGABRT), code -6 in tid 25485 (AsyncTask #4)
09-17 15:20:03.802 27329-27329/com.example.sigmaway.homeimage W/art: Before Android 4.1, method android.graphics.PorterDuffColorFilter android.support.graphics.drawable.VectorDrawableCompat.updateTintFilter(android.graphics.PorterDuffColorFilter, android.content.res.ColorStateList, android.graphics.PorterDuff$Mode) would have incorrectly overridden the package-private method in android.graphics.drawable.Drawable
09-17 15:20:04.033 27329-27329/com.example.sigmaway.homeimage A/add home: tess data or Document file found
09-17 15:20:04.037 27329-27329/com.example.sigmaway.homeimage A/add home: tess data or Document file found
09-17 15:20:04.090 27329-27372/com.example.sigmaway.homeimage D/OpenGLRenderer: Use EGL_SWAP_BEHAVIOR_PRESERVED: true
09-17 15:20:04.099 27329-27329/com.example.sigmaway.homeimage D/Atlas: Validating map...
```

```java
import android.content.Context;
import android.content.res.AssetManager;
import android.graphics.Bitmap;
import android.os.Environment;
import android.util.Log;

import com.googlecode.tesseract.android.TessBaseAPI;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class Ocr {
    private static final String TAG = "OCR";
    private static final String DATA_PATH =
            Environment.getExternalStorageDirectory().toString() + "/Sigmaway/";
    private static final String[] LANGUAGES = {"eng", "ara"};

    private Context c;

    // Constructor: declared with a void return type this would be a plain method
    // named Ocr, and none of this setup would run when the object is created.
    public Ocr(Context context) {
        this.c = context;

        // Make sure DATA_PATH and DATA_PATH/tessdata/ exist.
        String[] paths = {DATA_PATH, DATA_PATH + "tessdata/"};
        for (String path : paths) {
            File dir = new File(path);
            if (!dir.exists()) {
                if (!dir.mkdirs()) {
                    Log.v(TAG, "ERROR: Creation of directory " + path + " on sdcard failed");
                    return;
                }
                Log.v(TAG, "Created directory " + path + " on sdcard");
            }
        }

        // Copy each bundled assets/tessdata/<lang>.traineddata to the sdcard if it is missing.
        AssetManager assetManager = c.getAssets();
        for (String lang : LANGUAGES) {
            String target = DATA_PATH + "tessdata/" + lang + ".traineddata";
            if (new File(target).exists()) {
                continue;
            }
            try (InputStream in = assetManager.open("tessdata/" + lang + ".traineddata");
                 OutputStream out = new FileOutputStream(target)) {
                byte[] buf = new byte[8192];
                int len;
                while ((len = in.read(buf)) > 0) {
                    out.write(buf, 0, len);
                }
                Log.v(TAG, "Copied " + lang + " traineddata");
            } catch (IOException e) {
                Log.e(TAG, "Was unable to copy " + lang + " traineddata " + e);
            }
        }
    }

    // Called from an AsyncTask; runs recognition on bmpImg with the given language code.
    public String tesseract(Context context, Bitmap bmpImg, String lang) {
        this.c = context;

        Log.v(TAG, "Ctesseract 1");
        TessBaseAPI baseApi = new TessBaseAPI();
        Log.v(TAG, "Ctesseract 2");
        baseApi.setDebug(true);
        Log.v(TAG, "Ctesseract 3");
        // With lang = "ara" the app dies inside this call (native SIGABRT).
        // A native abort cannot be caught from Java; a false return can be handled.
        if (!baseApi.init(DATA_PATH, lang)) {
            Log.e(TAG, "init failed for " + lang);
            baseApi.end();
            return "";
        }
        Log.v(TAG, "Ctesseract 4");
        baseApi.setImage(bmpImg);
        Log.v(TAG, "Ctesseract 5");
        String recognizedText = baseApi.getUTF8Text();
        Log.v(TAG, "Ctesseract 6");
        baseApi.end();

        if (lang.equalsIgnoreCase("eng")) {
            // For English, collapse everything except ASCII letters and digits into spaces.
            recognizedText = recognizedText.replaceAll("[^a-zA-Z0-9]+", " ");
        }
        return recognizedText;
    }
}
```
This is the class I use to run OCR; I call its method from an AsyncTask in the main activity.
I used the trained data provided in your documentation and added the dependency in Gradle with `compile 'com.rmtheis:tess-two:6.0.4'`.
With English the API works well, but with the Arabic trained data the app crashes on the `baseApi.init(DATA_PATH,lang);` call with the error below:
```
09-17 15:20:02.293 21768-25485/com.example.sigmaway.homeimage A/libc: Fatal signal 6 (SIGABRT), code -6 in tid 25485 (AsyncTask #4)
09-17 15:20:03.802 27329-27329/com.example.sigmaway.homeimage W/art: Before Android 4.1, method android.graphics.PorterDuffColorFilter android.support.graphics.drawable.VectorDrawableCompat.updateTintFilter(android.graphics.PorterDuffColorFilter, android.content.res.ColorStateList, android.graphics.PorterDuff$Mode) would have incorrectly overridden the package-private method in android.graphics.drawable.Drawable
```
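A possible contributing factor, not confirmed anywhere in this thread: the copy loop above leaves a truncated `.traineddata` file behind if the copy fails midway, and because the code only copies when the file is missing, that bad file is never replaced on later runs. A damaged data file may make native `init` abort instead of returning `false`. Below is a minimal plain-Java sketch, assuming you pass in the stream from `AssetManager.open(...)` yourself: it writes to a temporary file in the same directory and only renames it to the real name once the whole stream has been copied. `TessdataCopier` and `copyAtomically` are hypothetical names, and `java.nio.file` is only available on Android API level 26 and later (on older devices the same temp-then-rename idea works with `File.renameTo`).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: copies a trained-data stream into place so that a failed
// copy never leaves a truncated .traineddata file at the final path.
final class TessdataCopier {
    private TessdataCopier() {}

    /**
     * Copies {@code in} to {@code target} via a temp file in the same directory.
     * Returns the number of bytes written. The caller still owns and closes {@code in}.
     */
    static long copyAtomically(InputStream in, Path target) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        Files.createDirectories(dir);
        // The temp file lives next to the target so the final move stays on one filesystem.
        Path tmp = Files.createTempFile(dir, target.getFileName().toString(), ".part");
        try {
            long bytes = Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            // Only a fully written file ever appears under the real name.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
            return bytes;
        } finally {
            // No-op after a successful move; removes the partial file after a failure.
            Files.deleteIfExists(tmp);
        }
    }
}
```

On the device this would be called with the asset stream inside a try-with-resources block and `new File(DATA_PATH + "tessdata/" + lang + ".traineddata").toPath()` as the target.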

@rmtheis
Owner

rmtheis commented Sep 17, 2016

Use Cube: baseApi.init(DATA_PATH, lang, OEM_CUBE_ONLY);

@yatharthgupta112
Author

Shall I use the Cube trained data for it, then?

@yatharthgupta112
Author

By the way, I tried it with the normal ara.traineddata and it is not working. I also tried making my own trained data for ara, and that data works, so I think this trained data has some issue.
`baseApi.init(DATA_PATH, "ara", TessBaseAPI.OEM_CUBE_ONLY);`

@rmtheis
Copy link
Owner

rmtheis commented Sep 20, 2016

Arabic OCR is working fine for me when I run the test cases with the Cube trained data for Arabic and OEM_CUBE_ONLY.

Please reopen with the minimal code needed to reproduce the issue and the image file or test case you're using.

See also tesseract-ocr/tesseract#428.

@rmtheis rmtheis closed this as completed Sep 20, 2016
@yatharthgupta112
Author

Sir, which one is the Cube trained data file for Arabic?

@rmtheis
Copy link
Owner

rmtheis commented Sep 21, 2016

@yatharthgupta112 There are several: ara.cube.*

They all need to be stored together in your data directory.
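Following that advice, a quick pre-flight check before calling `baseApi.init(DATA_PATH, "ara", TessBaseAPI.OEM_CUBE_ONLY)` can confirm that the `ara.cube.*` files actually ended up together in the `tessdata/` directory. This is a plain-Java sketch: `CubeDataCheck` and `problems` are hypothetical names, and it assumes that `ara.traineddata` must also sit alongside the Cube files. It only catches missing or empty files; it cannot tell whether a file is corrupt or built for a different Tesseract version.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-flight check for Cube mode: reports which expected files are missing.
final class CubeDataCheck {
    private CubeDataCheck() {}

    /**
     * Returns the problems found in {@code tessdataDir} for {@code lang};
     * an empty list means the basic files are present.
     */
    static List<String> problems(File tessdataDir, String lang) {
        List<String> problems = new ArrayList<>();
        if (!tessdataDir.isDirectory()) {
            problems.add("missing directory " + tessdataDir);
            return problems;
        }
        // Assumption: the .traineddata file is still needed alongside the Cube files.
        if (!new File(tessdataDir, lang + ".traineddata").isFile()) {
            problems.add("missing " + lang + ".traineddata");
        }
        // Cube data is split across several <lang>.cube.* files in the same directory.
        String prefix = lang + ".cube.";
        File[] cubeFiles = tessdataDir.listFiles((dir, name) -> name.startsWith(prefix));
        if (cubeFiles == null || cubeFiles.length == 0) {
            problems.add("no " + prefix + "* files");
        } else {
            for (File f : cubeFiles) {
                // A zero-length file usually means an interrupted copy.
                if (f.length() == 0) {
                    problems.add("empty file " + f.getName());
                }
            }
        }
        return problems;
    }
}
```

You could log the returned list and skip `init` when it is non-empty, since a native abort inside `init` cannot be caught from Java.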

@yatharthgupta112
Author

yatharthgupta112 commented Sep 21, 2016

@rmtheis The ara.cube.* data files worked, but the ara.traineddata file didn't. However, the results I get from OCR with the Cube data have only about 20% accuracy, maybe less.
Can you help me or suggest how to improve the Arabic OCR results?
And thank you so much, sir, for your help.
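The thread never answers the accuracy question, but any tuning is easier to judge with a repeatable number instead of an estimate like "20% or less". A minimal sketch, assuming you hand-type the correct text for a few test images: character accuracy is computed as 1 minus the Levenshtein edit distance divided by the reference length, clamped at 0, over Unicode code points. `OcrAccuracy` is a hypothetical name; it measures results and does not improve them. Arabic diacritics (harakat) are separate code points, so whether the reference includes them changes the score.

```java
// Hypothetical helper for scoring OCR output against a hand-typed reference.
final class OcrAccuracy {
    private OcrAccuracy() {}

    /** Levenshtein distance over Unicode code points (insert, delete, substitute all cost 1). */
    static int editDistance(String a, String b) {
        int[] x = a.codePoints().toArray();
        int[] y = b.codePoints().toArray();
        int[] prev = new int[y.length + 1];
        int[] cur = new int[y.length + 1];
        for (int j = 0; j <= y.length; j++) prev[j] = j;
        for (int i = 1; i <= x.length; i++) {
            cur[0] = i;
            for (int j = 1; j <= y.length; j++) {
                int cost = x[i - 1] == y[j - 1] ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            // The row just filled becomes the previous row for the next iteration.
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[y.length];
    }

    /** Character accuracy = max(0, 1 - distance / referenceLength); 1.0 is a perfect match. */
    static double characterAccuracy(String reference, String ocrOutput) {
        int n = reference.codePointCount(0, reference.length());
        if (n == 0) return ocrOutput.isEmpty() ? 1.0 : 0.0;
        double acc = 1.0 - (double) editDistance(reference, ocrOutput) / n;
        return Math.max(0.0, acc);
    }
}
```

Scoring the same small image set before and after each change (data files, engine mode, image preprocessing) shows whether the change actually helped.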

@AbdelsalamHaa

AbdelsalamHaa commented Apr 30, 2018

Hi, I'm using Tesseract 4.00 and Leptonica 1.75.3. I used eng.traineddata on some images and it worked very well. Now I'm trying to use the same code for Arabic with ara.traineddata, but it gives weird characters. Is that due to GetUTF8Text(), or does it have nothing to do with it?

This is the same part of the code as in the English version; the only difference is that I changed eng.traineddata to ara.traineddata.
The image is in the textImg variable.

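```cpp
ic.SetImage((uchar*)textImg.data, textImg.size().width, textImg.size().height, textImg.channels(), textImg.step1());  // 8-bit cv::Mat
char* text = ic.GetUTF8Text();  // UTF-8 text; the caller must delete[] it
result = text;                  // assumes result is a std::string
delete[] text;
ic.Clear();
```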
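On the "weird characters" question: `GetUTF8Text()` returns UTF-8 text, so one possible cause (not confirmed in this thread) is that something downstream, such as the console, a file viewer, or a string conversion, interprets those bytes in a different encoding. The short plain-Java sketch below reproduces that symptom with an Arabic word: decoding its UTF-8 bytes as ISO-8859-1 produces mojibake, while decoding them as UTF-8 round-trips. `Utf8Demo` is a hypothetical name used only for illustration.

```java
import java.nio.charset.StandardCharsets;

// Shows how correct UTF-8 OCR output can look "weird" when decoded with the wrong charset.
final class Utf8Demo {
    private Utf8Demo() {}

    /** Stands in for OCR output: the UTF-8 bytes of the recognized text. */
    static byte[] asUtf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Wrong: treats each byte as one Latin-1 character, garbling Arabic letters. */
    static String decodeAsLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    /** Right: decodes the bytes with the encoding they were produced in. */
    static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

If the same OCR result looks correct once written to a file and opened in a UTF-8-aware editor, that points to a display or conversion problem rather than a recognition problem.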

@ibrahimAlii

@AbdelsalamHaa @yatharthgupta112 Did you find a solution for the bad accuracy?

@rmtheis Please help.

4 participants