Stack overflow error on word graph creation, using text with multiple spaces #3

Closed
npit opened this issue Feb 6, 2018 · 2 comments

npit commented Feb 6, 2018

Hello,
I'm getting a stack overflow error with the code below, using the text from https://pastebin.com/JuFsX5VV.

    String text = "";
    // set text to contents of https://pastebin.com/JuFsX5VV
    // text = ...
    DocumentWordGraph wg = new DocumentWordGraph();
    wg.setDataString(text); // stack overflow error
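The overflow can likely be reproduced without building a graph at all, by calling String.split with the pattern that, according to the owner's reply, utils.splitToWords originally used. The sketch below is a standalone assumption, not JInsect code: the class name, the helper names and the 100,000-character whitespace gap are all hypothetical. Whether it actually overflows depends on the JVM version and the thread stack size, so it reports the outcome instead of assuming it.

```java
import java.util.Arrays;

class SplitOverflowRepro {
    // Pattern that utils.splitToWords reportedly used before the fix.
    static final String ORIGINAL = "(\\s|\\p{Punct})+";

    // Hypothetical input: two words separated by `gap` mixed whitespace characters.
    static String longGapText(int gap) {
        StringBuilder sb = new StringBuilder("alpha");
        for (int i = 0; i < gap; i++) {
            sb.append(i % 2 == 0 ? ' ' : '\n');
        }
        return sb.append("beta").toString();
    }

    // Returns the tokens, or null if the regex engine ran out of stack.
    static String[] trySplit(String text, String regex) {
        try {
            return text.split(regex);
        } catch (StackOverflowError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String[] tokens = trySplit(longGapText(100_000), ORIGINAL);
        System.out.println(tokens == null
                ? "StackOverflowError while splitting"
                : "Split OK: " + Arrays.toString(tokens));
    }
}
```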

ggianna (Owner) commented Feb 7, 2018

DocumentWordGraph uses the utils.splitToWords function to split the text into words.
That function splits on punctuation and whitespace with a regular expression, which originally was:

    String[] sRes = sStr.split("(\\s|\\p{Punct})+");

The files in question contain very long runs of consecutive whitespace characters. This can be fixed by matching whole runs of whitespace or punctuation in each repetition of the group:

    String[] sRes = sStr.split("(\\s+|\\p{Punct}+)+");
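The two patterns above, plus a third alternative, can be compared on ordinary text. A plausible explanation for the overflow (an assumption, not confirmed in this thread) is that Java's regex engine recurses once per repetition of a quantified group, so `(\\s|\\p{Punct})+` needs one level per character of a long run, while `(\\s+|\\p{Punct}+)+` needs one level per switch between whitespace and punctuation. The character class `[\\s\\p{Punct}]+` is a swapped-in alternative, not taken from the project; a quantifier over a single character class has no group to repeat, so it should not depend on run length. The class name is hypothetical.

```java
import java.util.Arrays;

class SplitPatterns {
    static final String ORIGINAL = "(\\s|\\p{Punct})+";    // group repeats once per character
    static final String GROUPED = "(\\s+|\\p{Punct}+)+";   // fix proposed above: group repeats once per run
    static final String CHAR_CLASS = "[\\s\\p{Punct}]+";   // alternative (not from the project): no group at all

    public static void main(String[] args) {
        // On ordinary text all three patterns yield the same tokens.
        String sample = "Hello,  world!! Foo...bar\t\tbaz";
        for (String regex : new String[] {ORIGINAL, GROUPED, CHAR_CLASS}) {
            System.out.println(regex + " -> " + Arrays.toString(sample.split(regex)));
        }

        // A long whitespace run, split with the group-free pattern.
        char[] gap = new char[100_000];
        Arrays.fill(gap, ' ');
        String longRun = "a" + new String(gap) + "b";
        System.out.println(Arrays.toString(longRun.split(CHAR_CLASS)));
    }
}
```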

As a temporary workaround, you can collapse every run of whitespace in the text into a single space before feeding it to the graph:

    String text = utils.loadFileToString("inputText.txt");
    text = text.replaceAll("\\s+", " ");
    DocumentWordGraph wg = new DocumentWordGraph();
    wg.setDataString(text); // stack overflow error - no longer here! :)

    System.out.println(wg.getGraphLevel(0));
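The workaround can be wrapped in a helper so that every input is normalised before it reaches the graph. The class and the helper name `normalizeWhitespace` are hypothetical; only the `replaceAll("\\s+", " ")` call comes from the reply above. Because `\\s+` quantifies a single character class rather than a group, the normalisation itself should not depend on the length of the run. It only collapses whitespace, though, so a long stretch of punctuation would presumably still reach the original splitting pattern unchanged.

```java
import java.util.Arrays;

class WhitespaceWorkaround {
    // Hypothetical helper: collapse every whitespace run (spaces, tabs,
    // newlines, ...) into a single space, as in the workaround above.
    static String normalizeWhitespace(String text) {
        return text.replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        char[] gap = new char[100_000];
        Arrays.fill(gap, '\n');
        String raw = "first" + new String(gap) + "second\t \tthird";
        String clean = normalizeWhitespace(raw);
        System.out.println(clean); // first second third
        // With JInsect on the classpath, `clean` is what would be passed to
        // DocumentWordGraph.setDataString(...).
    }
}
```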

I will try to update the utils class accordingly soon.

ggianna closed this as completed in 91c7574 on Feb 7, 2018

npit (Author) commented Feb 7, 2018

Thank you for the swift reply!
