enhance splitIdentifierByCaseAndSeparators to support non-case-sensitive languages #170

zrma · 2023-11-18T09:40:13Z

This PR modifies the splitIdentifierByCaseAndSeparators function so that it also handles non-case-sensitive languages, meaning languages whose scripts have no upper/lower case distinction, such as Korean, Chinese, and Japanese.

Key Points:

- It maintains the existing functionality for case-sensitive languages.
- It adds support for languages lacking case distinction, so their identifiers are processed correctly while the fundamental logic of the function is preserved.

However, I had trouble extending the test suite to cover these languages. I am not familiar enough with the current test structure to add new test cases with confidence.

In an attempt to modify the tests, I utilized the go-jsonschema binary, executing the following command:

./go-jsonschema --capitalization HtMl,ID,URL -p test tests/data/misc/capitalization/capitalization.json -o tests/data/misc/capitalization/capitalization.go

The generated struct was named CapitalizationJson instead of Capitalization, so I renamed it by hand.

I am uncertain about the correctness of these steps and would greatly appreciate any advice or guidance. Your feedback or suggestions for improving this feature expansion and the addition of test cases would be invaluable.

Thank you.

codecov · 2023-11-18T09:44:44Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (33ec559) 75.89% compared to head (2aa55c8) 76.51%.
Report is 15 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #170      +/-   ##
==========================================
+ Coverage   75.89%   76.51%   +0.61%     
==========================================
  Files          24       24              
  Lines        1871     1882      +11     
==========================================
+ Hits         1420     1440      +20     
+ Misses        361      353       -8     
+ Partials       90       89       -1


…ive languages

- Refactored the function to accurately handle languages without case distinction (e.g., Korean, Chinese) while maintaining functionality for case-sensitive languages.
- Implemented rune-based string processing for better multi-language support.
- Preserved original logic for case distinction where applicable.

omissis · 2023-12-17T02:40:16Z

Hi @zrma, thanks a lot for the contribution! I will try to take a look at it over the Christmas/New Year holidays and hopefully merge it! 🙏

omissis

Hi, thanks again for the contribution and for the patience in waiting for the review! 🙏
The code looks good to me: if you could just push those little naming fixes I added to the PR I can then merge it! 🥂
Unfortunately I have no knowledge of non-case-sensitive languages, but if a user who does finds a bug, we can fix it when the need arises. ✌️

omissis · 2024-01-17T23:55:20Z

internal/x/text/cases.go

@@ -77,39 +86,43 @@ func splitIdentifierByCaseAndSeparators(s string) []string {
 		stateNothing state = iota
 		stateLower
 		stateUpper
+		stateNonCase


Suggested change

- stateNonCase
+ stateNoCase

omissis · 2024-01-17T23:55:37Z

internal/x/text/cases.go

 			nextState = stateDelimiter
+
+		default: // Non-case sensitive letters.
+			nextState = stateNonCase


Suggested change

- nextState = stateNonCase
+ nextState = stateNoCase

omissis · 2024-01-18T00:02:48Z

internal/x/text/cases.go

 	}

 	return ident
 }

+func isNoneCaseSensitiveLetter(r rune) bool {


Suggested change

- func isNoneCaseSensitiveLetter(r rune) bool {
+ func isNotCaseSensitiveLetter(r rune) bool {

omissis · 2024-01-18T00:02:56Z

internal/x/text/cases.go

-		ident = "A" + ident
+	rIdent := []rune(ident)
+	if len(rIdent) > 0 {
+		if !unicode.IsLetter(rIdent[0]) || isNoneCaseSensitiveLetter(rIdent[0]) {


Suggested change

- if !unicode.IsLetter(rIdent[0]) || isNoneCaseSensitiveLetter(rIdent[0]) {
+ if !unicode.IsLetter(rIdent[0]) || isNotCaseSensitiveLetter(rIdent[0]) {
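For context, Go only exports an identifier whose first character is a Unicode upper-case letter, so an identifier starting with a caseless letter cannot be exported by capitalizing it. The sketch below shows one way the check above could work. The body of `isCaselessLetter` is an assumption (the PR's isNotCaseSensitiveLetter body is not shown in this conversation), `ensureExportable` is a hypothetical wrapper, and the sketch assumes the new branch still prepends "A" as the removed line did.

```go
package main

import (
	"fmt"
	"unicode"
)

// isCaselessLetter is a hypothetical stand-in for the PR's
// isNotCaseSensitiveLetter: a letter that is neither upper nor lower case.
func isCaselessLetter(r rune) bool {
	return unicode.IsLetter(r) && !unicode.IsUpper(r) && !unicode.IsLower(r)
}

// ensureExportable prefixes ident with "A" when its first rune is not a
// letter or is a caseless letter, since neither can be turned into an
// upper-case first character.
func ensureExportable(ident string) string {
	rIdent := []rune(ident) // inspect the first code point, not the first byte
	if len(rIdent) == 0 {
		return ident
	}
	if !unicode.IsLetter(rIdent[0]) || isCaselessLetter(rIdent[0]) {
		return "A" + ident
	}
	return ident
}

func main() {
	for _, ident := range []string{"Foo", "1abc", "한글"} {
		fmt.Println(ensureExportable(ident))
	}
}
```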

zrma · 2024-01-18T01:37:05Z

I've made the suggested naming changes as per your review.
Thank you for the valuable feedback and guidance! 😻 The updated commit has been pushed to the PR.

omissis

LGTM

zrma force-pushed the feature/enhance-caser-non-case-sensitive branch 2 times, most recently from 51f3da2 to dd06caa Compare November 18, 2023 09:43

zrma force-pushed the feature/enhance-caser-non-case-sensitive branch from dd06caa to 7a55cec Compare November 18, 2023 10:28

zrma force-pushed the feature/enhance-caser-non-case-sensitive branch from 7a55cec to 2975b1b Compare November 18, 2023 10:47

omissis added this to the v0.16.0 milestone Jan 17, 2024

omissis assigned zrma Jan 17, 2024

omissis self-requested a review January 17, 2024 23:41

omissis requested changes Jan 18, 2024

refine variable naming as per PR review suggestions

2aa55c8

omissis approved these changes Jan 18, 2024

omissis merged commit 40f48ce into omissis:main Jan 18, 2024
3 checks passed

zrma deleted the feature/enhance-caser-non-case-sensitive branch January 18, 2024 02:16
