Some issue with Unicode characters, maybe #5

Open
Siderite opened this issue Apr 13, 2018 · 2 comments

Got an exception

System.ArgumentOutOfRangeException: startIndex cannot be larger than length of string.
Parameter name: startIndex
   at System.String.Substring(Int32 startIndex, Int32 length)
   at Gma.DataStructures.StringSearch.UkkonenTrie`1.TestAndSplit(Node`1 inputs, String stringPart, Char t, String remainder, T value)
   at Gma.DataStructures.StringSearch.UkkonenTrie`1.Update(Node`1 inputNode, String stringPart, String rest, T value)
   at Gma.DataStructures.StringSearch.UkkonenTrie`1.Add(String key, T value)
   at TPB.Business.PirateBayDumpProcessor.Process(FileInfo file) in D:_Projects\TPB\TPB.Business\PirateBayDumpProcessor.cs:line 57
   at TPB.ConsoleTester.Program.Main(String[] args) in D:_Projects\TPB\TPB.ConsoleTester\Program.cs:line 12} | System.ArgumentOutOfRangeException

when trying (pun not intended): trie.Add(entry.Name, entry);
where entry.Name was Tjockare än vatten (Thicker Than Water) - S02 E08 - 720p x265 H


af-mst commented Feb 13, 2019

Hi,

we have a similar issue.
Add worked fine until someone added a word containing "ss" to the database.
After some digging in the data we found that we normally store those words with the character "ß" (we are from Germany, where "ss" and "ß" are often interchangeable ;)
So we pulled the code and debugged it.

The issue appeared here:

            var newEdge = new Edge<T>(remainder, newNode);
            e.Label = e.Label.Substring(remainder.Length);
            newNode.AddEdge(e.Label[0], e); // !!! HERE !!!
            s.AddEdge(t, newEdge);

(UkkonenTrie.cs -> Line: 207)

word: "walross"
remainder at that point: "oss"
e.label at that point = "oße"
Since both strings are three characters long, "e.Label = e.Label.Substring(remainder.Length);" yields an empty string instead of "e", so the next line fails with an index-out-of-range exception:
"newNode.AddEdge(e.Label[0], e);"

I guess that you internally transform the "ß" to "ss"? Or that the code is interpreting the "ss" as "ß"?
Either way, the code tries to use the "oss" node for the "oß" value :(
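
The failure mode described above can be reproduced in isolation. The following is only a sketch, assuming the trie decides whether a label matches with a culture-sensitive prefix test; `SplitSketch` and `TrySplitLabel` are hypothetical names and not part of the library. The culture-sensitive result depends on the current culture and on the platform's globalization backend, so no output is claimed for that line.

```csharp
using System;

static class SplitSketch
{
    // Hypothetical helper (not the library's code): checks whether `label`
    // starts with `remainder` under the given comparison mode and, if so,
    // strips remainder.Length characters from the front of the label.
    public static bool TrySplitLabel(string label, string remainder,
                                     StringComparison mode, out string rest)
    {
        rest = string.Empty;
        if (!label.StartsWith(remainder, mode))
            return false;

        // Only correct when the matched prefix is exactly remainder.Length
        // chars long, which a culture-sensitive match does not guarantee.
        rest = label.Substring(remainder.Length);
        return true;
    }

    public static void Main()
    {
        // Ordinal comparison works char by char: 'ß' is not 's', so no match.
        bool matched = TrySplitLabel("oße", "oss", StringComparison.Ordinal, out string rest);
        Console.WriteLine($"ordinal: matched={matched}");

        // Culture-sensitive comparison may treat "ß" and "ss" as equal.
        // If it does, rest becomes "" instead of "e", and rest[0] would throw.
        matched = TrySplitLabel("oße", "oss", StringComparison.CurrentCulture, out rest);
        Console.WriteLine($"culture: matched={matched}, rest=\"{rest}\"");
    }
}
```

With an ordinal comparison the "oße" edge is never mistaken for a match of "oss", so the empty-label situation cannot arise from this pair of strings.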

Our current workaround is to replace every "ß" with "ss" and that's it, but it has annoying implications.
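
A minimal sketch of that workaround, assuming keys are normalized in one place before they reach the trie. `KeyNormalizer` and `NormalizeKey` are hypothetical names. One of the implications is that search terms need the same normalization as stored keys, otherwise a query containing "ß" can no longer find them.

```csharp
using System;

static class KeyNormalizer
{
    // Hypothetical helper: replaces every "ß" with "ss".
    // String.Replace(string, string) performs an ordinal replacement,
    // so the result does not depend on the current culture.
    public static string NormalizeKey(string key) => key.Replace("ß", "ss");

    public static void Main()
    {
        Console.WriteLine(NormalizeKey("oße"));     // osse
        Console.WriteLine(NormalizeKey("walross")); // walross
    }
}
```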

Thank you

Kind Regards


a7744hsc commented Apr 26, 2021

This issue seems to be caused by globalization. I solved it by adding the following runtime option:
{ "runtimeOptions": { "configProperties": { "System.Globalization.Invariant": true } } }

UPDATE:

In my case, the root cause is that, at least in the "en-US" and "中文(中国)" cultures, "ANYSTR".StartsWith("ANYSTR\u200B") returns True.
This happens on Linux (in my case Ubuntu 18.04) but not on Windows 10.
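
The zero-width-space case can be checked with a few lines. This is a sketch only: `PrefixCheck` and `StartsWithOrdinal` are hypothetical names, and the single-argument `StartsWith` call is the culture-sensitive overload whose result varies by platform as reported above. If a prefix longer than the string is accepted as a match, a following `Substring(prefix.Length)` is asked for a start index beyond the end of the string, which would be consistent with the exception at the top of this issue.

```csharp
using System;

static class PrefixCheck
{
    // Hypothetical helper: ordinal prefix test. It compares UTF-16 code
    // units one by one and is independent of culture and platform.
    public static bool StartsWithOrdinal(string text, string prefix) =>
        text.StartsWith(prefix, StringComparison.Ordinal);

    public static void Main()
    {
        const string key = "ANYSTR";
        const string withZeroWidthSpace = "ANYSTR\u200B";

        // Culture-sensitive overload (uses the current culture).
        // Platform dependent, so no expected output is given here.
        Console.WriteLine(key.StartsWith(withZeroWidthSpace));

        // The prefix is one char longer than the text, so an ordinal
        // comparison can never report a match.
        Console.WriteLine(StartsWithOrdinal(key, withZeroWidthSpace)); // False
    }
}
```

Passing `StringComparison.Ordinal` explicitly is a narrower alternative to switching the whole process to invariant globalization, at the cost of having to touch each comparison.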
