Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"mississippi" gives "issi" instead of "ssi" as longest repeated substring #4

Closed
graste opened this issue Oct 8, 2016 · 6 comments
Closed

Comments

@graste
Copy link
Contributor

graste commented Oct 8, 2016

Expected ssi instead of issi for input mississippi.

mississippi

@shrink0r
Copy link
Owner

shrink0r commented Oct 8, 2016

Isn't "issi" actually the longest repeated substring?
It is in there twice, as is "iss", but because "issi" is longer it wins.

@graste
Copy link
Contributor Author

graste commented Oct 9, 2016

I'd say "ississi" doesn't contain "issi" twice.

@shrink0r shrink0r added the bug label Oct 12, 2016
@shrink0r
Copy link
Owner

shrink0r commented Oct 12, 2016

The way it is implemented at the moment, just gives the longest repeated substring path of thedeepest internal node. This may lead to an overlapping result.
Can be seen here for the "mississippi" example:
http://www.allisons.org/ll/AlgDS/Tree/Suffix/ (see the section "Longest Repeated Substring")
Not being able to determine the non-overlapping repeated-substring can be considered a bug, because the method name is kinda ambigious concerning the result.
@graste Do you think either a second method or an additional param '$overlap = false' make sense (1aa8b93)?

@graste
Copy link
Contributor Author

graste commented Oct 13, 2016

My expectation obviously was "no overlapping" as there are not "enough" characters to actually write down the repeated substrings. That's why I'd prefer a default of `allow_overlapping = false``. As this is a library l'd prefer two methods (and maybe a private one with the boolean flag to not repeat yourself). Not sure about naming though. :-D

@MrHash
Copy link

MrHash commented Oct 13, 2016

Wouldn't 'iss' be the longest repeated substring matched ahead of 'ssi' in a non-overlapping context?

@shrink0r
Copy link
Owner

@MrHash: yes, as the substrings are built from prefixes of each suffix (l2r), "iss" will win over "ssi".

@shrink0r shrink0r removed the ready label Oct 13, 2016
shrink0r added a commit that referenced this issue Oct 13, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants