Getting error Invalid memory access when use Tess4j in JAVA project Netbeans #232

HerickRaposo · 2022-06-01T13:51:14Z

I'm developing a routine in my JAVA Web project (Using netbeans 13) where I extract the texts from a pdf. If it doesn't find a certain term, it converts the pdf to image and tries to extract the text with OCR tesseract.
After several attempts I always got the same error regardless of the configuration I did. Following error:

Error opening data file tessdata/por.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'por'
Tesseract couldn't load any languages!

Caused by: java.lang.Error: Invalid memory access

I'm having difficulties configuring the library in my development environment because, unlike the tutorials where SpringBoot and Eclipse are used, I use Netbeans 13 and I have the following project structure:

First I added the dependency in pom.xml:

<!-- https://mvnrepository.com/artifact/net.sourceforge.tess4j/tess4j -->
        <dependency>
            <groupId>net.sourceforge.tess4j</groupId>
            <artifactId>tess4j</artifactId>
            <version>5.2.0</version>
        </dependency>

After that as indicated in some tutorials, I went to my tessdata folder which is in dependencies/Tes4j and copied the tessdata folder to the web-inf folder (I also tried to the resources folder).

Later I tried to configure the environment variable
TESSDATA_PREFIX. I couldn't find information if this variable was to be defined in the windows system variables or if there is some other place in netbeans to define this. In my project structure I tried three definitions but none worked:

C:\Programacao\myProjectName\Tess4J
C:\Programacao\myProjectName\src\main\webapp\WEB-INF
C:\Programacao\myProjectName\src\main\webapp\resources

In the code of my method I tried to set the datapath passing only the name tessdata, data and I also tried to point out the paths above. Follow created method:

 public String extractText(Anexos anexo) throws Exception {
        File file = new File(anexo.getCaminho());
        PDDocument doc = PDDocument.load(file);

        System.out.println("=================================> Extraindo com pdfBox <=========================");
        PDFTextStripper estripador = new PDFTextStripper();
        estripador.setSortByPosition(false); 
        String pdfTexto = estripador.getText(doc);
        String line = "";

        line = pdfTexto.toLowerCase().replaceAll(AplicacaoBean.CARACTERES_ESPECIAIS_REGEX, "")
                .replaceAll("\\s", " ");
        
        if (line.contains("sped")) {
            return line;
        } else {

            PDFRenderer pdfRenderer = new PDFRenderer(doc);
            StringBuilder out = new StringBuilder();

            Tesseract tesseract = new Tesseract();
            tesseract.setLanguage("por");
            tesseract.setOcrEngineMode(1);

            Path dataDirectory = Paths.get("tessdata");
            tesseract.setDatapath(dataDirectory.toString());

            for (int page = 0; page < doc.getNumberOfPages(); page++) {
                BufferedImage bufferedImage = pdfRenderer.renderImageWithDPI(page, 300, ImageType.RGB);

                // Create a temp image file
                File tempFile = new File(file.getPath().replace(File.separator + anexo.getAnexo(), "") + File.separator + "tempfile_" + anexo.getAnexo().replaceAll("\\..*", "") + "_" + page + ".png");
                ImageIO.write(bufferedImage, "png", tempFile);
                String result = tesseract.doOCR(tempFile);
                out.append(result);

                // Delete temp file
                tempFile.delete();
            }

            line = out.toString().toLowerCase().replaceAll(AplicacaoBean.CARACTERES_ESPECIAIS_REGEX, "")
                    .replaceAll("\\s", " ");
        }

        return line;
    }

My tessdata folder:

My development environment:

Netbeans:13

JDK 15

tess4j:5.2.0

My doubts:

So I would like to know if this environment variable () is configured in windows variables, in netbeans or in some internal place in my code? Also, did I skip any steps? Do I need to download anything else? Please help me, I don't know what else to do!

The text was updated successfully, but these errors were encountered:

nguyenq · 2022-06-09T18:38:29Z

You would set the environment variable TESSDATA_PREFIX in Windows.

https://superuser.com/questions/949560/how-do-i-set-system-environment-variables-in-windows-10

Does your example work with eng pack? I suggest you get it to work with a console application first before moving on to a server application.

nguyenq · 2023-04-05T02:42:08Z

No response from OP.

nguyenq closed this as completed Apr 5, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Getting error Invalid memory access when use Tess4j in JAVA project Netbeans #232

Getting error Invalid memory access when use Tess4j in JAVA project Netbeans #232

HerickRaposo commented Jun 1, 2022 •

edited

nguyenq commented Jun 9, 2022

nguyenq commented Apr 5, 2023

Getting error Invalid memory access when use Tess4j in JAVA project Netbeans #232

Getting error Invalid memory access when use Tess4j in JAVA project Netbeans #232

Comments

HerickRaposo commented Jun 1, 2022 • edited

nguyenq commented Jun 9, 2022

nguyenq commented Apr 5, 2023

HerickRaposo commented Jun 1, 2022 •

edited