I think we need a word-splitter for naming properties. Having separate words would also enable us to detect initialisms and capitalize them properly for the target language. For example, the property CONVERT_JSON should be called ConvertJson in C#, but ConvertJSON in Go.
This might work:
Split at non-word characters and remove them, where word characters are letters and digits (so an underscore counts as a separator).
In each remaining string, search for all matches of upper(upper|digit)*, i.e. an uppercase letter followed by any number of uppercase letters and digits. For each match:
Split before the first uppercase letter of the match (unless that's the beginning of the string). This will split myJSON into my and JSON, and myName into my and Name.
If the match is at the end of the string, we're finished splitting it.
If the match is a single character, we're also finished splitting it. This prevents myName being split into my, N, and ame.
If the last character in the match is an uppercase letter, split before it. This splits JSONConverter into JSON and Converter.
If the last character in the match is a digit, split after it. I don't have high confidence in this rule. It would split UTF8encoder into UTF8 and encoder.
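The rules above can be sketched roughly as follows. This is only a minimal illustration, not a proposed implementation: the function names `split_chunk` and `split_words` are made up for this example, and it assumes ASCII letters and digits only.

```python
import re

# Assumption: "word characters" means ASCII letters and digits only.
_NON_WORD = re.compile(r"[^A-Za-z0-9]+")
_UPPER_RUN = re.compile(r"[A-Z][A-Z0-9]*")  # upper(upper|digit)*


def split_chunk(chunk):
    """Split one separator-free string according to the uppercase-run rules."""
    cuts = set()
    for m in _UPPER_RUN.finditer(chunk):
        start, end = m.span()
        if start > 0:
            cuts.add(start)  # split before the first uppercase letter
        if end == len(chunk) or end - start == 1:
            continue  # match at end of string, or single character: done
        if chunk[end - 1].isupper():
            cuts.add(end - 1)  # last char is uppercase: split before it
        else:
            cuts.add(end)  # last char is a digit: split after it
    bounds = [0] + sorted(cuts) + [len(chunk)]
    return [chunk[a:b] for a, b in zip(bounds, bounds[1:])]


def split_words(name):
    """Split at non-word characters, then apply split_chunk to each piece."""
    words = []
    for chunk in _NON_WORD.split(name):
        if chunk:  # drop empty pieces from leading/trailing separators
            words.extend(split_chunk(chunk))
    return words


print(split_words("myJSON"))         # ['my', 'JSON']
print(split_words("JSONConverter"))  # ['JSON', 'Converter']
print(split_words("UTF8encoder"))    # ['UTF8', 'encoder']
```

Collecting cut positions first and slicing once at the end keeps each rule independent of the others, which should make it easy to drop the digit rule if it turns out to be wrong.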
The capitalization of the remaining parts should be ignored. Capitalization should be done according to the target language's rules and conventions, and depending on whether a string part is an initialism.
We can use an additional heuristic to decide whether something should be treated like an initialism:
If a string part is all-uppercase, and the original string contains lowercase letters, treat that string part as an initialism. That would make, for example, the MNIST in ReadMNISTCorpus an initialism, but not the LOVE in I_LOVE_UPPERCASE.
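A rough sketch of how the heuristic could feed into capitalization. Everything here is hypothetical: `KNOWN_INITIALISMS`, `is_initialism`, `pascal_case`, and the `keep_initialisms_upper` flag are made up for illustration, and the fixed list is an assumption on top of the heuristic (without some such list, the JSON in CONVERT_JSON would not be recognized, since the original string has no lowercase letters). Digits are allowed inside an all-uppercase part, which is also an assumption.

```python
# Hypothetical list of well-known initialisms; a real one would be longer.
KNOWN_INITIALISMS = {"JSON", "HTTP", "URL", "ID"}


def is_initialism(part, original):
    """Known initialism, or all-uppercase part inside a mixed-case original."""
    if part.upper() in KNOWN_INITIALISMS:
        return True
    has_letter = any(c.isalpha() for c in part)
    all_upper = has_letter and part == part.upper()
    return all_upper and any(c.islower() for c in original)


def pascal_case(parts, original, keep_initialisms_upper):
    """Join already-split parts, ignoring their original capitalization.

    keep_initialisms_upper=True mimics the Go convention (ConvertJSON),
    False mimics the C# convention (ConvertJson).
    """
    out = []
    for part in parts:
        if keep_initialisms_upper and is_initialism(part, original):
            out.append(part.upper())
        else:
            out.append(part[:1].upper() + part[1:].lower())
    return "".join(out)


parts = ["CONVERT", "JSON"]
print(pascal_case(parts, "CONVERT_JSON", keep_initialisms_upper=False))  # ConvertJson
print(pascal_case(parts, "CONVERT_JSON", keep_initialisms_upper=True))   # ConvertJSON
```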
It should be
This is across all target languages.