Add options to allow unicode character in identifier names #4508

eblis · 2017-01-06T16:51:37Z

PR checklist

Read the contribution guidelines.
Ran the shell/batch script under ./bin/ to update Petstore sample so that CIs can verify the change. (For instance, only need to run ./bin/{LANG}-petstore.sh and ./bin/security/{LANG}-petstore.sh if updating the {LANG} (e.g. php, ruby, python, etc) code generator or {LANG} client's mustache templates)
Filed the PR against the correct branch: master for non-breaking changes and 2.3.0 branch for breaking (non-backward compatible) changes.

Description of the PR

Added option to allow unicode characters in class names, method names, etc.
Added option to allow keeping of underscore characters in class names, method names, etc.

Unicode identifiers are supported by some languages, like Java, but the codegen removes all non-ASCII letters by default. Users might want to separate different parts of names using extended punctuation connector characters, like "_, ‿, ⁀, ⁔, ・, ︳, ︴, ﹍, ﹎, ﹏, ＿, ･"
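A quick way to check which of these characters Java accepts in identifiers is to ask the JDK itself via `Character.isJavaIdentifierPart` and `javax.lang.model.SourceVersion.isIdentifier`. The sketch below only tests the ten characters from the list whose Unicode general category is Pc (connector punctuation). The two katakana middle dots (・ and ･) are not in the Pc category in current Unicode data, so they are left out. The class name `ConnectorCheck` is illustrative.

```java
import javax.lang.model.SourceVersion;

public class ConnectorCheck {
    // The ten characters from the list whose Unicode general category is Pc (connector punctuation),
    // written as escapes so the file compiles regardless of source encoding.
    static final String CONNECTORS =
            "_\u203F\u2040\u2054\uFE33\uFE34\uFE4D\uFE4E\uFE4F\uFF3F";

    public static void main(String[] args) {
        for (char c : CONNECTORS.toCharArray()) {
            System.out.printf("U+%04X Pc=%b javaIdentifierPart=%b%n", (int) c,
                    Character.getType(c) == Character.CONNECTOR_PUNCTUATION,
                    Character.isJavaIdentifierPart(c));
        }
        // A name using FULLWIDTH LOW LINE (U+FF3F) as a separator is a legal Java identifier.
        System.out.println(SourceVersion.isIdentifier("ActiveDirectory\uFF3FOrganizationalUnit"));
    }
}
```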

Underscores cannot be used to separate different parts of a name because the codegen removes them by default; the second option keeps underscore characters if the user so desires.

Both options can be used independently; for example, one might allow extended punctuation connector characters but still disallow underscores. Without these changes there is no way to logically separate parts of identifier names, which is sometimes necessary.

The default values for both options are set such that the generator will behave as it did before.
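To illustrate how two independent flags can compose, here is a hypothetical identifier filter in Java. It is not swagger-codegen's implementation; the method `filter` and its parameters are made up for the example. With both flags off it keeps only ASCII letters and digits, and each flag widens what survives without affecting the other.

```java
public class IdentifierFilter {
    /**
     * Hypothetical filter with two independent flags (not swagger-codegen code).
     * allowUnicodeIdentifiers: keep any Unicode letter/digit and connector punctuation.
     * keepUnderscores: keep ASCII '_' regardless of the other flag.
     */
    static String filter(String name, boolean allowUnicodeIdentifiers, boolean keepUnderscores) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < name.length()) {
            int cp = name.codePointAt(i);
            i += Character.charCount(cp);
            boolean keep;
            if (cp == '_') {
                keep = keepUnderscores; // the underscore has its own switch
            } else if (allowUnicodeIdentifiers) {
                keep = Character.isLetterOrDigit(cp)
                        || Character.getType(cp) == Character.CONNECTOR_PUNCTUATION;
            } else {
                keep = cp < 128 && Character.isLetterOrDigit(cp); // ASCII letters and digits only
            }
            if (keep) {
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String name = "ActiveDirectory\uFF3FOrganizationalUnit";
        System.out.println(filter(name, false, false));        // ActiveDirectoryOrganizationalUnit
        System.out.println(filter("domain_type", false, true)); // domain_type
    }
}
```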


wing328 · 2017-01-07T04:06:41Z

Thanks for the PR but your commit (as shown in the Commits tab) is not linked to your Github account, which means this PR won't count as your contribution in https://github.com/swagger-api/swagger-codegen/graphs/contributors.

Let me know if you need help fixing it.

Ref: https://github.com/swagger-api/swagger-codegen/wiki/FAQ#how-can-i-update-commits-that-are-not-linked-to-my-github-account

eblis · 2017-01-09T07:57:55Z

I've added the email address in the commits as a secondary email, hopefully that should fix it.
Thanks.

If that doesn't work, I'll use the script to edit the history; please let me know if I should do that instead.

wing328 · 2017-01-09T10:00:03Z

Now the commits are correctly linked to your account.

Can you elaborate with a use case in which you need to keep the underscore in the property name? As you're probably aware, keeping the underscore in the variable name won't conform to the C# style guide for variable naming.

eblis · 2017-01-09T10:41:50Z

Sure.

The reason we need to keep the underscores is to logically separate parts of type names. We're building a proxy server that handles requests to a 3rd party system with a very weird API, and we're putting a more RESTful interface in front of it. The problem is that the 3rd party system's types use 2 different names, a domain name and a type name, and when calling the system we need to know which is which. The proxy relies on reflection to do its job (so that we don't have to hard-code calls manually), and reflection keeps us future compatible: any new feature created in the 3rd party system just works without further changes to the proxy. We've used _ to separate the domain and type names, but the underscores get removed by swagger codegen. We'd like to have the option to keep those underscores in certain cases, when the user knows what he's doing and really wants underscores to be present, for any language they might be generating for.

The feature has been implemented at the base layer, so going forward the codegen could have different default values, depending on whether underscores conform to each language's style guides or not, or in cases users really want the underscores for other reasons, like the one I mentioned above.

Alternatively, some languages (like Java, which we're generating for) allow multiple unicode characters as punctuation connector characters, which would alleviate our need for underscores, as we could use one of these extended connectors to logically separate between domain and type names.
Also, some languages support unicode characters in type names, so this feature could be used to allow that support for languages that offer it.

Both features default to false, so neither is active unless the user explicitly enables it. They can also be turned on or off independently, which could be used to further configure the generators for every language swagger codegen supports, according to the style guide of each language.

wing328 · 2017-01-12T09:43:41Z

@eblis thanks for the detailed explanation.

> We've used _ to separate between the domain and type names, but those get removed by the swagger codegen

Let's say the name is "domain_type" and it will become "domainType" in Java, "DomainType" in C#. I would like to point out that these are just variable names but not the actual parameter (path, header, form, query) name. When the API client (e.g. C#) sends the request to the RESTful backend, the actual parameter name (baseName) is used as shown below:

https://github.com/swagger-api/swagger-codegen/blob/master/modules/swagger-codegen/src/main/resources/csharp/api.mustache#L240-L251
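The distinction between the generated identifier and the parameter name sent on the wire can be sketched in Java. The `Param` class and `queryString` method are hypothetical stand-ins, not the generated client's code (the linked C# template is where the real request is built from `baseName`); values are not URL-encoded here for brevity.

```java
public class BaseNameDemo {
    /** Hypothetical pairing of a generated identifier with the parameter name from the spec. */
    static final class Param {
        final String varName;  // identifier in generated code, e.g. domainType (Java) or DomainType (C#)
        final String baseName; // name from the spec that is sent in the request, e.g. domain_type
        final String value;

        Param(String varName, String baseName, String value) {
            this.varName = varName;
            this.baseName = baseName;
            this.value = value;
        }
    }

    /** Builds a query string from baseName only; varName never reaches the wire. No URL encoding. */
    static String queryString(Param... params) {
        StringBuilder sb = new StringBuilder();
        for (Param p : params) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(p.baseName).append('=').append(p.value);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Param domainType = new Param("domainType", "domain_type", "ActiveDirectory");
        System.out.println(domainType.varName + " -> " + queryString(domainType));
        // prints: domainType -> domain_type=ActiveDirectory
    }
}
```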

We definitely welcome enhancement to meet different use cases but I would like to point this out to make sure we're on the same page after merging the additional features you've built.

eblis · 2017-01-12T11:21:16Z

We are more interested in the variable and class names, as we're using reflection to handle the proxy-ing, and need to know the domain name and type name.
I probably didn't explain it well enough the first time. The data types we use have 2 different logical names in them (because the system we're proxying calls to uses this naming convention), so a full name is not just domain_type (or DomainType, domainType) but domain＿type, for example ActiveDirectory＿OrganizationalUnit. We'd like to have a separator there so we can determine, at runtime, that the domain was ActiveDirectory and the type was OrganizationalUnit.
(There are some other ways we could achieve this, like adding an enumerated value to each type definition, but that would just generate a lot of fluff)

We can do the logical separation using underscores, but we can also use a unicode extended separator character (e.g. ＿) to logically separate these parts of the name.
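With a separator like ＿ in the class name, the proxy could recover the domain and type at runtime from the class's simple name. A minimal Java sketch, assuming a generated class that uses FULLWIDTH LOW LINE (U+FF3F) as the separator; `DomainTypeSplit` and `domainAndType` are hypothetical names. The class name is written with a `\uFF3F` Unicode escape, which javac translates before lexing, so the source compiles regardless of file encoding.

```java
public class DomainTypeSplit {
    // FULLWIDTH LOW LINE (U+FF3F): connector punctuation, legal inside Java identifiers.
    static final String SEPARATOR = "\uFF3F";

    // Hypothetical generated model. The Unicode escape in the name is translated by javac
    // before lexing, so this declares a class whose simple name contains U+FF3F.
    static class ActiveDirectory\uFF3FOrganizationalUnit { }

    /** Splits "Domain<sep>Type" into {domain, type}; returns null if there is no separator. */
    static String[] domainAndType(String simpleName) {
        int idx = simpleName.indexOf(SEPARATOR);
        if (idx < 0) {
            return null;
        }
        return new String[] {
            simpleName.substring(0, idx),
            simpleName.substring(idx + SEPARATOR.length())
        };
    }

    public static void main(String[] args) {
        String simpleName = ActiveDirectory\uFF3FOrganizationalUnit.class.getSimpleName();
        String[] parts = domainAndType(simpleName);
        System.out.println("domain=" + parts[0] + ", type=" + parts[1]);
        // prints: domain=ActiveDirectory, type=OrganizationalUnit
    }
}
```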

However, I thought that having the option to strip or not strip underscores might be useful for some people (I noticed other reported issues mentioning that underscores get stripped), and also for some languages that use or approve of this naming style (C, Python, etc.?). The two features should allow more customization for each supported language, e.g. enabling Unicode characters for languages that support them, and enabling underscores for languages (or companies) that use snake case.

eblis · 2017-01-12T12:38:13Z

I just noticed the name template variable for classes, which I missed the first time looking at the variables :(
Looks like it's holding the original name of the class, including any unicode characters and/or underscores so we could probably use this value and annotate all generated types with the original name, instead of relying on the name of the class.
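The annotation idea can be sketched with a runtime-retained annotation that a template would fill from the `name` variable. `OriginalName` is a hypothetical annotation, not something swagger-codegen generates, and the example assumes the original schema name uses `_` between the domain and the type.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class OriginalNameDemo {
    /** Hypothetical annotation carrying the schema name before sanitizing/camelizing. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface OriginalName {
        String value();
    }

    // What a generated model could look like if a template emitted the original name.
    @OriginalName("ActiveDirectory_OrganizationalUnit")
    static class ActiveDirectoryOrganizationalUnit { }

    /** Reads the annotation and splits on the first '_'; null if absent or no separator. */
    static String[] domainAndType(Class<?> cls) {
        OriginalName ann = cls.getAnnotation(OriginalName.class);
        if (ann == null) {
            return null;
        }
        String[] parts = ann.value().split("_", 2);
        return parts.length == 2 ? parts : null;
    }

    public static void main(String[] args) {
        String[] parts = domainAndType(ActiveDirectoryOrganizationalUnit.class);
        System.out.println("domain=" + parts[0] + ", type=" + parts[1]);
    }
}
```

Compared with encoding the separator in the class name, this keeps the generated identifiers conventional and moves the domain/type information into metadata.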

However, I feel like the original pull request is still valid, as you can make use of unicode characters in variable names and allow languages (and companies) that use snake case to generate code using swagger codegen.

bhuvnesharya · 2017-01-16T13:06:03Z

Thanks for the excellent use cases @eblis. I have a use case where data comes from the MATLAB type system; the image was created from it and has been in use for a long time, and the response is sent back to the other system for further analysis. I have to write an application in between that does some transformations and sends the data on in the same format it arrived in from the first system (snake_case). Without snake_case support, I need to convert snake_case to camelCase (already done by swagger codegen) when storing the data, and convert it back to snake_case when retrieving it and sending it to the other system, which adds complexity to the code.
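The back-and-forth conversion described above can also be lossy. In this simplified Java sketch (hypothetical `camelize` and `toSnake` helpers, not the codegen's own functions), `matlab_var_name` survives the round trip, but `field_2d` becomes `field2d` and cannot be restored, because an underscore before a digit leaves no uppercase letter to mark the word boundary.

```java
public class RoundTrip {
    /** Simplified snake_case -> camelCase (hypothetical; not the codegen's camelize). */
    static String camelize(String snake) {
        StringBuilder out = new StringBuilder();
        boolean upperNext = false;
        for (char c : snake.toCharArray()) {
            if (c == '_') {
                upperNext = out.length() > 0; // an underscore marks the start of a new word
                continue;
            }
            out.append(upperNext ? Character.toUpperCase(c) : c);
            upperNext = false;
        }
        return out.toString();
    }

    /** Simplified camelCase -> snake_case: an underscore before each uppercase letter. */
    static String toSnake(String camel) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < camel.length(); i++) {
            char c = camel.charAt(i);
            if (Character.isUpperCase(c)) {
                if (i > 0) {
                    out.append('_');
                }
                out.append(Character.toLowerCase(c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        for (String original : new String[] {"matlab_var_name", "field_2d"}) {
            String back = toSnake(camelize(original));
            System.out.println(original + " -> " + camelize(original) + " -> " + back
                    + (back.equals(original) ? "" : "  (lossy)"));
        }
    }
}
```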

bhuvnesharya · 2017-01-16T13:27:44Z

@eblis: Can you provide more insight into how to use the KEEP_UNDERSCORES parameter with the Maven codegen plugin? Should I set it as a config option?

eblis · 2017-01-16T13:39:54Z

Just a note, the KEEP_UNDERSCORES setting will not prevent camelization of the names, as I wasn't really interested in the case that much, I just wanted to keep the underscores in place. I guess it makes more sense to stop camelization if the KEEP_UNDERSCORES setting is enabled, as you probably don't want to have camel case + snake case.

The easiest way to use the new settings is to enable them in the configuration options, e.g.:
{ "allowUnicodeIdentifiers": "false", "keepUnderscores": "false", "interfaceOnly": false, "java8": true, "serializableModel": false, "async": false, "library": "spring-boot" }
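In the example above some values are JSON strings (`"false"`) while others are JSON booleans (`true`). If you read such options in your own code, a tolerant helper avoids surprises. This is a hypothetical sketch, not how swagger-codegen parses its configuration, and `flag` is a made-up name.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfigFlags {
    /** Hypothetical helper: reads a flag that may arrive as a Boolean (true) or a String ("true"). */
    static boolean flag(Map<String, Object> options, String key, boolean defaultValue) {
        Object v = options.get(key);
        if (v == null) {
            return defaultValue; // option not set at all
        }
        if (v instanceof Boolean) {
            return (Boolean) v;
        }
        return Boolean.parseBoolean(v.toString().trim()); // "true" (any case) -> true, anything else -> false
    }

    public static void main(String[] args) {
        Map<String, Object> options = new HashMap<>();
        options.put("allowUnicodeIdentifiers", "true"); // string value
        options.put("java8", Boolean.TRUE);              // boolean value
        System.out.println(flag(options, "allowUnicodeIdentifiers", false)); // true
        System.out.println(flag(options, "keepUnderscores", false));         // false (absent)
    }
}
```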


eblis · 2017-01-16T15:43:04Z

I have modified the pull request so that it doesn't camelize the names if the KEEP_UNDERSCORES option is selected. The setting now acts like a case selector: if KEEP_UNDERSCORES is false, camel case is used; if KEEP_UNDERSCORES is true, snake case is used instead.

bhuvnesharya · 2017-01-17T06:29:36Z

Thanks @eblis. I am currently using swagger-codegen-maven-plugin-2.2.1, which internally uses swagger-codegen 2.2.1. When will a new release that includes these changes be available?

eblis · 2017-01-17T09:19:40Z

It depends on if and when this pull request is accepted; the code from this pull request isn't part of swagger-codegen yet.
Maybe if there's more demand for the feature it will get in :)

wing328 · 2017-01-18T17:16:11Z

@eblis first of all, thank you for your work. The support for unicode identifier is definitely a nice one and we definitely want to include the enhancement in the upcoming release.

But for the "underscore" feature, I do have the following feedback:

1. The switch/flag is done inside the camelize function. IMO, that's not the best way to do it, because a developer will expect the function to simply convert the input into camel case. A switch/flag inside the function makes the implementation relatively easy, but it will completely change the behaviour of the function, which can be used elsewhere, such as class naming, function naming, filename naming, parameter naming, etc.
2. For some generators (e.g. typescript) we do have an option to keep the original naming, e.g.

	modelPropertyNaming
	    Naming convention for the property: 'camelCase', 'PascalCase', 'snake_case' and 'original', which keeps the original name (Default: camelCase)

(ref: java -jar modules/swagger-codegen-cli/target/swagger-codegen-cli.jar config-help -l typescript-node)

If we implement this option for other generators, it should cover what the "KEEP_UNDERSCORE" option provides. Do you agree?
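One way to read this suggestion is to leave `camelize` unchanged and apply the naming convention as a separate step. The hypothetical Java sketch below splits a name into words once and then renders it in one of the four conventions listed for `modelPropertyNaming`; the `Naming` enum and the `apply` method are illustrative, and acronyms such as `HTTPServer` are not handled specially.

```java
import java.util.ArrayList;
import java.util.List;

public class NamingConventionSketch {
    // Mirrors the four values listed for modelPropertyNaming; the enum itself is hypothetical.
    enum Naming { CAMEL_CASE, PASCAL_CASE, SNAKE_CASE, ORIGINAL }

    /** Splits "domain_type", "domainType" or "DomainType" into ["domain", "type"]. */
    static List<String> words(String name) {
        List<String> words = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c == '_') {
                flush(cur, words); // underscore ends the current word
                continue;
            }
            if (Character.isUpperCase(c) && cur.length() > 0) {
                flush(cur, words); // lower-to-upper boundary starts a new word
            }
            cur.append(Character.toLowerCase(c));
        }
        flush(cur, words);
        return words;
    }

    private static void flush(StringBuilder cur, List<String> words) {
        if (cur.length() > 0) {
            words.add(cur.toString());
            cur.setLength(0);
        }
    }

    private static String capitalize(String w) {
        return w.isEmpty() ? w : Character.toUpperCase(w.charAt(0)) + w.substring(1);
    }

    /** Renders the name in the requested convention; ORIGINAL returns it untouched. */
    static String apply(String name, Naming naming) {
        if (naming == Naming.ORIGINAL) {
            return name;
        }
        List<String> w = words(name);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < w.size(); i++) {
            switch (naming) {
                case SNAKE_CASE:
                    if (i > 0) {
                        out.append('_');
                    }
                    out.append(w.get(i));
                    break;
                case CAMEL_CASE:
                    out.append(i == 0 ? w.get(i) : capitalize(w.get(i)));
                    break;
                default: // PASCAL_CASE
                    out.append(capitalize(w.get(i)));
                    break;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        for (Naming n : Naming.values()) {
            System.out.println(n + ": " + apply("domain_type", n));
        }
    }
}
```

Because the convention is chosen outside the conversion helpers, class, file and parameter naming can keep calling `camelize` with its usual meaning.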

What I would propose is that I'll cherry pick your work related to ALLOW_UNICODE_IDENTIFIERS and add the feature to the latest master while we discuss further on the "KEEP_UNDERSCORE" feature. What do you think?

eblis · 2017-01-18T17:30:55Z

I agree with both points. If swagger already has an option/flag for snake casing, we can use that and make it available for multiple languages (sorry, I didn't know about the feature). Cherry-picking the changes for Unicode support also sounds like a very good idea. If you have any trouble, I can modify the original pull request to keep only the Unicode changes. Thank you, Cristi.



eblis · 2017-01-19T10:01:44Z

I've reworked the pull request to only include changes related to unicode characters support.

wing328 · 2017-01-19T10:05:33Z

@eblis thanks. Looks good to me. I'll perform some tests later and merge if no question/feedback from me or anyone in the community.

wing328 · 2017-01-19T16:47:58Z

Btw, before I forget: for your upcoming PRs (not this one), I would recommend creating a new branch for the change, as per git best practice.

wing328 · 2017-01-20T06:42:29Z

@eblis PR merged into master. Thanks for your contribution.

eblis · 2017-01-20T06:48:16Z

Thank you !



eblis added 3 commits January 6, 2017 18:00

Merge branch 'master' of https://github.com/swagger-api/swagger-codegen

2ca8a9e

Added new command line options to all required unit tests.

e67fee2

eblis mentioned this pull request Jan 6, 2017

underscore("_") in property name of swagger.yaml is getting converted to camelCase in generated code of spring through swagger codegen maven plugin #4233

Closed

wing328 added the Enhancement: Feature label Jan 12, 2017

wing328 added this to the v2.2.2 milestone Jan 12, 2017

eblis added 3 commits January 16, 2017 15:50

Merge branch 'master' of https://github.com/swagger-api/swagger-codegen

228e619

Added KEEP_UNDERSCORES and ALLOW_UNICODE_IDENTIFIERS to Bash tests.

b1244ea

When KEEP_UNDERSCORES is set don't camelize the names, keep the identifier case as is (you probably don't want camel case + snake case, just snake case when KEEP_UNDERSCORES is set). Added unit tests to verify how the case is computed for various scenarios.

b726987

eblis added 3 commits January 19, 2017 11:34

Merge branch 'master' of https://github.com/swagger-api/swagger-codegen

ab6b509

Reworked pull request to only include changes related to supporting unicode characters in identifiers (removed references to keep underscores).

d229475

These methods and classes can be static again.

93d0c46

eblis changed the title from "Options to allow unicode character in identifier names and keep underscore characters" to "Options to allow unicode character in identifier names ~~and keep underscore characters~~" Jan 19, 2017

eblis changed the title from "Options to allow unicode character in identifier names ~~and keep underscore characters~~" to "Options to allow unicode character in identifier names" Jan 19, 2017

wing328 merged commit 3b3f2d2 into swagger-api:master Jan 20, 2017

wing328 changed the title from "Options to allow unicode character in identifier names" to "Add options to allow unicode character in identifier names" Feb 22, 2017

wing328 added the Feature: Codegen label Feb 22, 2017

felixvolz mentioned this pull request May 28, 2017

[Java] snake_case swagger contents are converted to camelCase java code #5721

Closed
