proposal: spec: export uncased identifiers like 日本語 #5763

Closed
leo-liu opened this issue Jun 24, 2013 · 35 comments

Comments

@leo-liu leo-liu commented Jun 24, 2013

In Go (now v1.1.1), an identifier is exported only if it starts with a character in
Unicode class "Lu" (uppercase letter).

The feature works fine for Western languages, but fails for CJK languages. All CJK
characters are letters but they are not uppercase. Therefore, these are not exported:

    var 成本 int = 5        // Chinese ideograph
    func ぶつける() { ... } // Japanese Hiragana (they are indeed letters)

It is very strange to use, say Z成本 or Jぶつける as identifiers.
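
The behavior described above can be checked with the standard library. The following is a minimal sketch, not part of the original report: it assumes that `go/ast`'s `IsExported` reflects the Go 1 rule (first character is an upper-case letter), and the helper name `exportStatus` is made up for illustration.

```go
package main

import (
	"fmt"
	"go/ast"
)

// exportStatus is a hypothetical helper that reports how the Go 1 rule
// classifies an identifier, using ast.IsExported from the standard library.
func exportStatus(name string) string {
	if ast.IsExported(name) {
		return "exported"
	}
	return "unexported"
}

func main() {
	// The CJK names are letters but not upper-case letters, so only the
	// names carrying a Latin capital prefix count as exported.
	names := []string{"Cost", "cost", "成本", "ぶつける", "Z成本", "Jぶつける"}
	for _, name := range names {
		fmt.Printf("%s: %s\n", name, exportStatus(name))
	}
}
```

Under this rule 成本 and ぶつける are reported as unexported, while Z成本 and Jぶつける are exported only because of the Latin prefix.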

I don't know how best to control visibility. But at least I think it is
preferable to treat CJK characters as *uppercase* letters, if we have no other choices
(more keywords, etc.)

Contributor

@cznic cznic commented Jun 24, 2013

Comment 1:

IMHO this is working as intended. Also note that such a change would not be backward
compatible, i.e. it could suddenly export things which were previously package private
(safe from being accessed from other packages).
PS: My native language is also not English, but I, for one, would never ever use a
non-English identifier in my code, except perhaps for some occasional Greek letters in math
stuff.
I suggest tag #Unfortunate

Author

@leo-liu leo-liu commented Jun 24, 2013

Comment 2:

I guess there won't be much old code that uses CJK identifiers, if we support it early.
I myself rarely use non-English identifiers either.  However, I know that CJK
identifiers are often used in C# and Java in some business domains (where no proper
English translation exists).
Unicode identifier support is good, but it may be more useful for CJK languages if we
change it a bit.

Contributor

@ianlancetaylor ianlancetaylor commented Jun 24, 2013

Comment 3:

In languages that have no upper/lower case distinction, we need a
special case for exported symbols or a special case for non-exported
symbols.  The current language has a special case for exported
symbols.  I really don't know what is better.  As far as I know there
has never been a clear consensus either way among people who, unlike
me, speak those languages, so we've just muddled along with the
current approach.
As you say, this cannot change until Go 2 anyhow.  But if there is a
clear consensus for Go 2, then it could certainly change then.

Contributor

@rsc rsc commented Jun 24, 2013

Comment 4:

We have heard from other Go programmers who are native speakers of these
languages who have suggested to leave things as they are.
Our suggestion at least for the lifetime of Go 1 (and perhaps beyond) is to
use "X" as the canonical exporting prefix in these cases.
Russ

Member

@bradfitz bradfitz commented Jun 25, 2013

Comment 5:

Labels changed: added go2, priority-someday, removed priority-triage.

Status changed to Thinking.

Contributor

@chai2010 chai2010 commented Jun 25, 2013

Comment 6:

There is some discussion about this topic:
https://groups.google.com/forum/#!topic/golang-china/h_vxbPHaIvw/discussion
I can accept the current exported identifiers rule.

@gopherbot gopherbot commented Aug 4, 2013

Comment 7 by bronze1man:

My native language is Chinese, and I never use Chinese for variable names.
Sometimes you need to translate a Chinese concept into English just for a variable name.
I think it is a good idea to allow Chinese variable names to be exported.

Member

@bradfitz bradfitz commented Nov 9, 2013

Comment 8:

Issue #6745 has been merged into this issue.

Member

@bradfitz bradfitz commented Nov 9, 2013

Comment 9:

Issue #6745 has been merged into this issue.

Contributor

@rsc rsc commented Nov 27, 2013

Comment 10:

Labels changed: added go1.3maybe.

Contributor

@rsc rsc commented Dec 4, 2013

Comment 11:

Labels changed: added release-none, removed go1.3maybe.

Contributor

@rsc rsc commented Dec 4, 2013

Comment 12:

Labels changed: added repo-main.

Contributor

@robpike robpike commented Dec 25, 2013

Comment 13:

A solution that's been kicking around for a while:
For Go 2 (can't do it before then): Change the definition to "lower case letters and _
are package-local; all else is exported". Then with non-cased languages, such as
Japanese, we can write 日本語 for an exported name and _日本語 for a local name.
This rule has no effect, relative to the Go 1 rule, with cased languages. They behave
exactly the same.

@chencun chencun commented Aug 1, 2016

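The @ sign serves as a notification: on Weibo, for example, you can @ someone to let them know.
Correspondingly, in programming, an identifier marked with @ would mean that it is public, announcing that it is available for use.
If identifiers starting with @ could also be exported, then both identifiers starting with @ and those starting with an upper-case letter would be exported.
However, calling a function (and similar uses) would not need the @; the @ acts like a "public" keyword that is simply written in front of the identifier.
Declaration: func @计算成本(){}
Call: 计算成本()
Because identifiers currently cannot start with a symbol, adding this would remain fully compatible with existing source code.
My English is not good; I hope someone can translate this for the Go core development team. @leo-liu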

@lych77 lych77 commented Sep 9, 2016

Many say that non-English words are rarely used in practical coding, even among people whose native language is not English. That's mainly right. However, please let me give a special "rare" case, for your information.

I work in the online game industry in China, and just like in other industries our code is mainly written by programmers. But the whole product contains resources besides the code. One important portion of online game resources consists of numerical and string values, which play a vital role in the whole gameplay experience. One product can contain thousands of such values, and they are provided, with their STRUCTURE, by game designers, not programmers. The programmers must follow the provided structure to use the values. When these data are loaded from a file or database, the code can access a certain value in a static (Avatar.HP) or dynamic (Avatar["HP"]) manner.

For performance and static checking, the static way is often preferred, and here comes the problem: the type names and field names, as part of the data structure, are created by the game designers, who are often not systematically trained in or adapted to the programmers' conventions. They just compose the values in a spreadsheet editor and similar tools, define type names and field names by sheet names and table headers, and of course they prefer using native words to describe the logic for clarity, especially for invented concepts that are often difficult to translate, which are not rare at all in games.

Their work is then converted by scripts into a form that the program can load, but the identifiers they defined should be preserved anyway; i.e. when the static manner is adopted, it must involve code generation, and nobody wishes to involve manual translation in this step, or to keep a translation dictionary up to date with the revisions of the designers' work. And ... now you will understand what I want to express.

With initial characters of Unicode category "Lo" not treated as exported, the Go language makes this working process impossible, and forces us either to sacrifice performance and type safety by using the dynamic manner, or to force the designers to use English that they are not accustomed to, or to lose the clarity of logic encapsulation provided by the package system. There's no such obstacle in other programming languages.
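
One concrete way this obstacle can show up is when designer data is decoded into generated structs. The sketch below is illustrative only: the JSON file format, the type names and the field name 生命值 are assumptions and do not come from the comment above. It relies on the fact that `encoding/json` only fills exported struct fields, and it uses the "X" prefix suggested earlier in this thread as the workaround.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// avatarUncased keeps the designers' column name as the field name.
// Under the Go 1 rule the field is unexported, so encoding/json skips it.
// (Hypothetical type, for illustration only.)
type avatarUncased struct {
	生命值 int
}

// avatarPrefixed uses the "X" prefix workaround and maps the field back
// to the designers' name with a struct tag. (Hypothetical type.)
type avatarPrefixed struct {
	X生命值 int `json:"生命值"`
}

// decodeBoth decodes the same (assumed) designer data into both structs
// and returns the value each one ends up with.
func decodeBoth() (uncased, prefixed int, err error) {
	data := []byte(`{"生命值": 100}`)

	var a avatarUncased
	if err = json.Unmarshal(data, &a); err != nil {
		return 0, 0, err
	}
	var b avatarPrefixed
	if err = json.Unmarshal(data, &b); err != nil {
		return 0, 0, err
	}
	return a.生命值, b.X生命值, nil
}

func main() {
	uncased, prefixed, err := decodeBoth()
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println("uncased field: ", uncased)
	fmt.Println("prefixed field:", prefixed)
}
```

The uncased field silently stays at its zero value, while the prefixed field receives the value, which is why generated code currently needs either a Latin prefix or the dynamic (map-based) access described above.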

Contributor

@robpike robpike commented Sep 9, 2016

@lych77 Thank you very much for your thoughtful and helpful message. We appreciate getting a more authoritative contribution to this discussion.

Unfortunately the Go 1 guarantee prevents us from changing this rule now, but if there ever is a Go 2, there could be a change as I described above:

"Change the definition to "lower case letters and _
are package-local; all else is exported". Then with non-cased languages, such as
Japanese, we can write 日本語 for an exported name and _日本語 for a local name.
This rule has no effect, relative to the Go 1 rule, with cased languages. They behave
exactly the same."

This is a fairly minor change to the implementation but could have major effect for Chinese programmers. Please let us know what you think about this idea.
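
To make the comparison concrete, here is a minimal sketch of the two rules side by side. It is an illustration under assumptions, not the compiler's implementation: the function names are invented, and "lower case letter" is taken to mean `unicode.IsLower` on the first character.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// exportedGo1 models the Go 1 rule: the first character must be an
// upper-case letter.
func exportedGo1(name string) bool {
	r, _ := utf8.DecodeRuneInString(name)
	return unicode.IsUpper(r)
}

// exportedProposed models the rule quoted above: names starting with a
// lower-case letter or "_" are package-local, everything else is exported.
func exportedProposed(name string) bool {
	r, _ := utf8.DecodeRuneInString(name)
	return r != '_' && !unicode.IsLower(r)
}

func main() {
	names := []string{"Data", "data", "_data", "日本語", "_日本語"}
	for _, name := range names {
		fmt.Printf("%s: Go 1 exported=%v, proposed exported=%v\n",
			name, exportedGo1(name), exportedProposed(name))
	}
}
```

For the sample names, the two rules agree on Data, data and _data; they differ only on 日本語, which becomes exported under the proposed rule while _日本語 stays local.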

@lych77 lych77 commented Sep 9, 2016

@robpike Thanks for your reply. Yes, the underscore way is reasonable. The Go language is already quite popular in China and I would certainly be glad to see it become more popular :)

@linuxjh linuxjh commented Mar 21, 2017

A dot (.) preceding a filename is used to hide the file in Unix.
Could such a dot be used for non-exported identifiers in Go?

Contributor

@griesemer griesemer commented Mar 22, 2017

@xHacking A dot "." is not a valid character in an (unqualified) identifier, so if the dot is part of the identifier the answer is no.

I suppose one could use a dot to mark an identifier as non-exported, but not have the dot be part of the name (I haven't looked into whether this might cause syntactic problems elsewhere, but it might not). But this would be a different approach to naming: As is, in Go, by looking at an identifier we can tell right away if it is exported or not. With an identifier-external marking scheme that would not be true anymore. Also, by default (no dot) identifiers would be exported, which is probably not what we want.

Member

@bradfitz bradfitz commented Mar 22, 2017

@xHacking, "hidden" dot files is one of @robpike's top disliked things about Unix, FWIW.

Member

@mattn mattn commented Mar 22, 2017

I don't feel benefit from this. And I guess this won't be useful if Go will handle unicode case folding for exporting because we don't like to use properly for and . I think that we should use _ for this. Dot seems breaking something for me.

@chencun chencun commented Mar 22, 2017

@mattn The problem is not a and A. Some professional terms can't be described in English, such as the objects in a game or an industry. Take 五行: "fiveline"? "the five elements"? No, no. A literal translation is wrong and unable to express the original meaning.

Member

@mattn mattn commented Mar 22, 2017

@chencun do you mean ? certainly, it is not valid for identifier. https://play.golang.org/p/DTwt4Qs-cO

@chencun chencun commented Mar 22, 2017

@mattn No, I'm just saying that 五行 can't be expressed correctly in English. This is a simple example; 五行 is unique to Chinese culture, and there are many other industry terms like it. My English is not good and this reply was translated from Chinese; I'm sorry about this.

@chencun chencun commented Mar 22, 2017

Korean and Japanese also have many proper nouns that can't be fully expressed in English.

Member

@mattn mattn commented Mar 23, 2017

@chencun Sorry, I don't understand what you mean. Don't you like _五行 as identifier?

@chencun chencun commented Mar 23, 2017

@mattn Sure. We want to be able to use CJK directly for a public variable, rather than only a private one, and _ doesn't belong to CJK either. If 五行 could be public, then _五行 could be used for private. When we write code, public variables would be friendlier in an IDE. As a public API, it would read better.

Member

@mattn mattn commented Mar 24, 2017

@chencun Ah, sorry. I was confused. I thought that current implementation make it public.
As I remember 五行 was public. Hmm, I have no idea. So we should name it as Do五行() or Do火水木金土() in current spec as you mentioned.

@chencun chencun commented Mar 24, 2017

Yes, so we recommend supporting it for public identifiers! Under the current 1.0 standard, the resulting code is ugly. We don't want variable names that mix English and Chinese.

@rsc rsc changed the title spec: CJK identifiers are not exported proposal: spec: export uncased identifiers like 日本語 Jun 17, 2017
@rsc rsc added the Proposal label Jun 17, 2017

@astaxie astaxie commented Jul 19, 2017

Hey, I am the organizer of GopherChina. I created the biggest Gopher community in China, gocn.io. @mpvl reached out to me today and mentioned this issue. I ran a poll in our Gopher WeChat groups.

Title: Do you want to use Chinese names for variables or functions?

  1. No
  2. Yes, and public
  3. Yes, but private

Here is the result:

  1. 94.7%
  2. 3.6%
  3. 1.8%

I hope this poll will help you make a decision. The poll has only been open for 2 hours, but the results are already very clear.

Contributor

@rsc rsc commented Oct 2, 2017

Today identifiers are exported if they begin with an upper-case letter. This issue proposes to change the rule to be unexported if they begin with a lower-case letter. The effect would be that identifiers beginning with uncased characters, such as var 成本 int in the original report, would become exported instead of (as they are today) unexported.

The advantage of changing the rule is that exported identifiers need not all begin with some throwaway cased letter. As also noted in the original report, “It is very strange to use, say Z成本 or Jぶつける as identifiers.”

It seems that there are two main disadvantages of changing the rule.

The first disadvantage is that it will have the effect of retroactively exporting many identifiers, which we might finesse as a not quite breaking change but is at least an unexpected change that would likely require changing essentially all code written using uncased identifiers for top-level consts, funcs, types, and vars, as well as fields of exported types, and expecting those identifiers to be unexported.
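
A rough way to estimate this breakage for a given file is to list the top-level names that are unexported today but would become exported if only lower-case letters and "_" marked a name as local. The following is a sketch under assumptions, not a tool from this discussion: the function names and the sample source are made up, and it only looks at top-level consts, funcs, types and vars, ignoring struct fields and methods.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"unicode"
	"unicode/utf8"
)

// sample is a made-up package used only to exercise the check.
const sample = `package p

var 成本 int = 5
var total int
var Total int

func ぶつける() {}
func helper() {}

type 数据 struct{}
`

// exportedIfNotLower models the proposed rule: only names starting with a
// lower-case letter or "_" stay package-local.
func exportedIfNotLower(name string) bool {
	r, _ := utf8.DecodeRuneInString(name)
	return r != '_' && !unicode.IsLower(r)
}

// newlyExported returns the top-level names in src that are unexported
// under the Go 1 rule but would be exported under the proposed rule.
func newlyExported(src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "sample.go", src, 0)
	if err != nil {
		return nil, err
	}
	var names []string
	check := func(id *ast.Ident) {
		if !ast.IsExported(id.Name) && exportedIfNotLower(id.Name) {
			names = append(names, id.Name)
		}
	}
	for _, decl := range file.Decls {
		switch d := decl.(type) {
		case *ast.FuncDecl:
			if d.Recv == nil { // plain functions only; methods are skipped
				check(d.Name)
			}
		case *ast.GenDecl:
			for _, spec := range d.Specs {
				switch s := spec.(type) {
				case *ast.TypeSpec:
					check(s.Name)
				case *ast.ValueSpec:
					for _, id := range s.Names {
						check(id)
					}
				}
			}
		}
	}
	return names, nil
}

func main() {
	names, err := newlyExported(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("would become exported:", names)
}
```

For the sample above, the names reported are 成本, ぶつける and 数据; total, Total and helper are unaffected.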

The second disadvantage is that it makes the "default export behavior" of an identifier essentially language-dependent in the following way.

When I program using English, I and probably most other programmers write "data" by default but must give an explicit signal - capitalizing the d to get Data - in order to export. (As evidence of this, consider function argument names or local variables, where the choice doesn't matter: essentially everyone defaults to lower case.) When we chose the export rules, we decided intentionally that exporting requires an explicit signal. In fact, the original proposal that was made to us was to export everything by default and use a leading underscore to mean unexported (following a convention from Python). We used upper-case for export instead of underscore for unexported specifically to make exporting something “opt-in” instead of “opt-out,” so that programmers (in this case, using English) would not export fields without making an explicit decision to do so.

If we make uncased identifiers exported by default, the effect will be that programmers writing programs in uncased languages will export by default and be required to reach for an explicit signal to unexport (that is, exporting will be “opt-out”), which is different from cased languages and exactly what we rejected way back in January 2009 for English. While being sensitive to the fact that I am not a native speaker of an uncased language, it nonetheless seems wrong to me for Go to adopt for uncased languages the exact behavior we rejected for English.

To summarize, the two disadvantages to changing the casing rules are (1) it will break a lot of things, and (2) it's probably wrong from a large-scale software engineering point of view, because it makes exporting “opt-out” for some languages (and not others).

I've been willing to try to work around (1), but I only recently realized the full import of (2). The combination of these suggests to me that we should not take this approach, and that 成本, ぶつける, 数据 should remain unexported just like "data".

Even if we decide not to change the uncased export default, though, it may be that we should still address the original objection that “It is very strange to use, say Z成本 or Jぶつける as identifiers,” and we should make sure we have the ability to do so. There are other explicit signals we could adopt. I'm going to enumerate a few below, but this is not intended as a complete list. The point here is that there are things we can do other than changing the default exportedness of uncased identifiers.

A special symbol for marking an uncased identifier as exported could be introduced. For example, 数据 is unexported but maybe $数据 is exported. Uses from other packages would still need to say p.$数据, but that might be less jarring than p.Z数据, because it's a symbol not a roman letter. (Let's not worry about the specific choice of symbol ($) here; it's just a placeholder to discuss the approach. Let's also not worry about the fact that the package is almost certainly not going to be named p if the exported identifier is named 数据; p is a placeholder too.)

An extension of that, suggested by @robpike, would be to require the exporting symbol only at the declaration site, so that var 数据 int is unexported, var $数据 int is exported, but uses outside the package can refer to p.数据 instead of p.$数据. That would essentially completely address the objection to Z数据, at the cost of giving up the property that you can see at each use site whether the identifier being used is an exported or unexported one. (This property is nice - one of my favorites as an English-speaking programmer - but was not intended in the original design, and it may simply not be cost-effective to preserve in uncased identifiers.)

A further extension, suggested by @griesemer, would be to make the exporting-at-declaration-time symbol a period, as in var .数据 int, matching the eventual use p.数据.

I'd like to explicitly not discuss these alternatives further yet but instead loop back to whether we should change the default to make uncased identifiers be opt-out.

I propose that we agree not to change the default rules for uncased identifiers for Go 2 and instead agree to consider only non-breaking changes, based on (1) and (2) above and also the new point (3) there appear to be decent alternatives that avoid those two problems.

The reason I want to reach this partial agreement on the (non-)solution space is that another thing we are considering is to expand the identifier set to allow combining characters (#20706), and the main effect would be to introduce many more identifiers in uncased languages. If we are going to make a breaking change to the exporting of uncased languages, we should do it before expanding the identifier set, to limit the breakage.

If we agree not to make a breaking change to the exporting of uncased languages, then the resolution of a new export signal for uncased languages and the expansion of the identifier set can proceed essentially completely independently.

Contributor

@rsc rsc commented Oct 2, 2017

In short, a proposal for how to proceed here:

Let's leave uncased identifiers unexported and find non-breaking ways to address the "Z成本 or Jぶつける are strange identifiers" problem.

Please thumbs up/thumb down/respond to that specific steering suggestion, but let's defer discussion of details of specific alternatives for the moment. Thanks.

Contributor

@ChimeraCoder ChimeraCoder commented Oct 4, 2017

I've done some work regarding interoperability between code written in Latin and non-Latin scripts. As we design an exporting scheme for uncased languages, one thing we want to keep in mind is that this process requires round-trippable translations of identifiers, given a valid dictionary.

For example, given a stable dictionary mapping between ঘন্টা and hour, we should be able to discern that the exported $ঘন্টা would map to Hour, and not $hour (or even $Hour). Disallowing the $ symbol for cased languages would probably be sufficient here. Preserving that invariant makes it much easier to write code or tools that can be shared between languages, which fits with the overall goal of this proposal.

(There are some edge cases involving identifiers with mixed alphabets, but that's a rather thorny and niche edge case for automatic dictionary translations to begin with, even without bringing case-based exports into the mix).
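
The round-trip property described above can be sketched as follows. This is only an illustration under assumptions: the "$" marker is the placeholder symbol used earlier in this thread, the dictionary has a single entry, and the function names are invented. Identifiers are handled as plain strings here, so the sketch says nothing about which characters Go accepts in identifiers.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

const (
	exportMarker = "$"     // placeholder export symbol, as in the discussion above
	hourBn       = "ঘন্টা" // uncased word from the example above
)

// mapFirst applies f to the first rune of s and leaves the rest unchanged.
func mapFirst(s string, f func(rune) rune) string {
	if s == "" {
		return s
	}
	r, size := utf8.DecodeRuneInString(s)
	return string(f(r)) + s[size:]
}

// toCased translates an uncased identifier to its cased counterpart:
// "$word" becomes a capitalized (exported) name, "word" a lower-case one.
func toCased(id string, dict map[string]string) (string, bool) {
	exported := strings.HasPrefix(id, exportMarker)
	word, ok := dict[strings.TrimPrefix(id, exportMarker)]
	if !ok {
		return "", false
	}
	if exported {
		word = mapFirst(word, unicode.ToUpper)
	}
	return word, true
}

// toUncased is the inverse: a capitalized name gets the export marker,
// a lower-case name does not.
func toUncased(id string, reverse map[string]string) (string, bool) {
	first, _ := utf8.DecodeRuneInString(id)
	word, ok := reverse[mapFirst(id, unicode.ToLower)]
	if !ok {
		return "", false
	}
	if unicode.IsUpper(first) {
		word = exportMarker + word
	}
	return word, true
}

func main() {
	dict := map[string]string{hourBn: "hour"}
	reverse := map[string]string{"hour": hourBn}

	cased, _ := toCased(exportMarker+hourBn, dict)
	back, _ := toUncased(cased, reverse)
	fmt.Println(cased, back)
}
```

Because the marker is never attached to a cased name, each identifier has exactly one counterpart in the other script, which is the invariant the comment asks for.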

Contributor

@rsc rsc commented Oct 9, 2017

OK, closing this issue in favor of #22188, which is explicitly about not changing the existing rules.

@linuxjh linuxjh commented Jun 1, 2018

Maybe this can be solved by a project's own coding rules:

    // If you don't understand the requirements well enough to abstract the
    // concepts and can't come up with good names:

    // E for Export
    var E_成本1 float64
    var E_成本2 float64
    var E_成本3 float64

    // or use a getter
    func Get_成本() float64 { ... }

Sometimes a coder can't come up with a good name in English; this has nothing to do with the programming language.

Variable or function names in Chinese are not used much in source code. The other parts of the source code are still not Chinese: if, for, func, return. You don't use Chinese variable or function names in C or C++ either.

@golang golang locked and limited conversation to collaborators Jun 1, 2019