Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: Go 2: permit file based overloading of unexported names #39530

Closed
henryas opened this issue Jun 11, 2020 · 13 comments
Closed

proposal: Go 2: permit file based overloading of unexported names #39530

henryas opened this issue Jun 11, 2020 · 13 comments

Comments

@henryas
Copy link

@henryas henryas commented Jun 11, 2020

Hi,

I would like to propose polymorphism for unexported ad-hoc types and functions.

Problem

In my project, we lump together the user interface controllers (controller as in MVC architecture) under a single package. Each controller has its own models. The models are unexported, and the users just need to know about the controllers. The package is rapidly becoming unwieldy with these unexported models and helper functions. We are running out of names and have to assign very long names for the newer models and helper functions, such as accountingJournalEntryAdditionPageAccountDetailModel. It is starting to affect the code readability as well as the rate of typo error. The thing is many of these helper types and functions are just file-specific assistive types and functions. They mean nothing to the rest of the package.

We have experimented reorganizing the package into smaller packages, so that we can use shorter names for these helper types and functions. The problem is not everything can be broken down into smaller packages. In our case, it results in the fragmentation of the logical package organization. From the user's point of view, it looks weird too because the number of exported types and functions are few and now you are breaking them further into more packages. What they don't realize is that behind those exported types and functions, there exist tons of helper types and functions, and we are trying to organize those internal stuffs but we end up splitting the exported stuffs too.

We have also explored using the internal folder to organize the internal stuffs. We ended up with many packages under the internal folder. The drawback of this approach is that those internal packages lose the direct connection to the types and functions where they are used. Remember that many of those internal items are ad-hoc types and functions built to support specific exported types and functions. Now they are located in another files and under different folders, away from where they are used. It also seems like a complicated way to do things just so that we can use shorter names and avoid naming collision.

Proposal

Therefore, I am proposing polymorphism for unexported ad-hoc types and functions. Syntax-wise, there is no change. What this proposal is suggesting is that if there are two unexported types (or functions) with the same name, the one in the same file as the calling function should take priority.

For example, let's say I have UserProfilePageController in user_profile_page.go . When UserProfilePageController instantiate pageModel, it should look for pageModel in its own file first (user_profile_page.go), before looking for pageModel in the rest of the package.

Now we will cover the other cases. What if the UserProfilePageController has no definition of pageModel in its own file, but discovers more than one pageModel in the package? Which one should it use? In this case, the compiler should just flag an error and refuse to compile. This automatically implies that exported types and package-level types should have unique names, but internal ad-hoc types are not.

What if UserProfilePageController uses pageModel from its own file, but it also wants to use pageModel from another file in the same package? The answer is: you can't. You need to rename one of them into something else. Remember that package-level types must have unique name. So if you want to use the package level pageModel, there must only be one pageModel within the package. When you define another pageModel in the package, you essentially demote the package level pageModel into ad-hoc pageModel.

The last obvious rule is that the names of types and functions in the same file must be unique. This means that you cannot have more than one pageModel within the same file.

Summary

Implementation-wise, the proposal boils down to the following rules:

  1. If there is a call to an unexported type or function, and there are more than one such definition with the same name in the package, the one in the current file should take priority.
  2. If there is no such definition in the current file, use the definition else where in the package, provided that there is only one such definition in the package.
  3. If there are more than one such definition in the package, throw an error. If there is no such definition in the package, also throw an error.
  4. The names of types and functions in the same file must be unique.

Let me know what you think. Thanks.

Henry

@gopherbot gopherbot added this to the Proposal milestone Jun 11, 2020
@gopherbot gopherbot added the Proposal label Jun 11, 2020
@martisch
Copy link
Member

@martisch martisch commented Jun 11, 2020

Apart from other issues: how would one test multiple adhoc functions without cluttering the namespace again with unique name alias definitions to be used in "_test" files?

@networkimprov
Copy link

@networkimprov networkimprov commented Jun 11, 2020

This is at least the third time this has come up, see #35387 and #33202.

I support the general idea. I already use _name for internal-to-file symbols. But I think a declaration modifier is less confusing than polymorphism.

file  func f() error { ...
local func f() error { ...
_     func f() error { ...

Alas, there may need to be 10x more reports about this to prompt a solution.

@ianlancetaylor ianlancetaylor changed the title Proposal: Polymorphism for Unexported Ad-Hoc Types and Functions proposal: Go 2: permit file based overloading of unexported names Jun 11, 2020
@henryas
Copy link
Author

@henryas henryas commented Jun 12, 2020

Apart from other issues: how would one test multiple adhoc functions without cluttering the namespace again with unique name alias definitions to be used in "_test" files?

In my opinion, you should not test these ad-hoc types and functions directly. These ad-hoc items are partial definition for something more concrete. They themselves are incomplete. They are specific utilities built specifically as a part of a larger algorithm. When taken out of context, they don't mean anything. You should test that larger algorithm instead. What this means is that you should test the exported and the package-level types and functions instead.

In addition, since these ad-hoc items are built specifically as a part of a larger context, and not as general purpose items, they are usually volatile. They change more frequently than the exported and the package level items. If you test these ad-hoc items directly, your tests become fragile.

That being said, there is nothing stopping you from testing these ad-hoc items. We can do something like filename.go and filename_test.go where the ad-hoc items are accessible to test file using the same name of the file where the ad-hoc items are defined, as long as the test file is also under the same package since the ad-hoc items are unexported.

However, if it were up to me, I would take the simplest approach by disallowing tests of ad-hoc items. As soon as the test calls for a function or a type where there are multiple definitions, the compiler should throw an error and stop compiling the tests. I think it is best to stick to this simpler approach first and see it goes. If there is an interest in ad-hoc item testing, then we can figure something out later.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jun 16, 2020

Perhaps you could use generics or code generation or methods to address this problem. Hard to know for sure without more details. (An example might help.)

This proposal would complicate the language for a highly specific and unusual use case. In general this seems more like a design problem than a language problem.

@henryas
Copy link
Author

@henryas henryas commented Jun 17, 2020

Hi Ian,

Thanks for joining the discussion. The problem is not so much about the type visibility. It is more about naming issue, which in turn affects readability.

You are right that this is more about design than language problem. However, Go brought about modularization improvement over C. Modularization is technically a design problem than a language problem, and this proposal's problem is kind of modularization issue.

What this proposal suggests is akin to variable shadowing, but applied to types and functions. You can reuse names and the compiler would be able to infer which one you are referring based on their scopes.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jun 17, 2020

It has some similarities to shadowing, but it's not the same. Given three files, if two of them define X, then the third cannot refer to X. That's not shadowing.

@henryas
Copy link
Author

@henryas henryas commented Jun 17, 2020

They are similar in that they allow greater freedom in naming local ad-hoc items. That is the general idea.

They are not exactly the same because with variables it is easy to demarcate the scope, which Go already supports. The same cannot be said with types and functions, because the narrowest scope Go supports for types and functions is package level scope.

One alternative is to support a new visibility level and add a new keyword to mark the new scope, such as @networkimprov 's suggestion. I tried to avoid introducing new keywords into the language specs, and hence the limitation as @ianlancetaylor mentioned.

Nonetheless, I am open to ideas in case if someone has better suggestions, and feel free to link similar proposals.

@networkimprov
Copy link

@networkimprov networkimprov commented Jun 17, 2020

I suspect the only way to convince the Go team that a solution would be widely useful is research on a code corpus. We have to show that a significant fraction of multi-file packages use symbols that only appear in a single file, and are named in a way that indicates only-this-file scoping. A good place to start would be a search for package level symbols beginning with underscore.

Alternatively, make and publish a preprocessor that finds certain kinds of symbols and prefixes them with the source file name. If people adopt it widely, that would be noticed. That might help your project more than anything you can accomplish here ;-)

@henryas
Copy link
Author

@henryas henryas commented Jun 17, 2020

I did a quick look at the net/http package, I already found things like headerNewlineToSpace, headerSorter, headerSorterPool. These are http header's specific items.

In my own project, I have something like: JournalEntryAddPage, which is supported by journalEntryAddPageModel, journalEntryAddPageAccountModel, journalEntryAddPageDetailModel, journalEntryAddPageAccountModelByType, journalEntryAddPageAccountModelByName. Now that is just one page. When you have more pages under the package, you end up with many of those long-named items. If you try to split the package, you end up with numerous sub-packages that consists of just one or two pages per package, which is messy and awkward.

By adding these name prefixes, it seems like we are trying to reinvent some kind of finer-grain modularization that is currently not supported by Go, and to avoid naming collision (or perhaps to anticipate such collision). I wish there is a greater freedom to name these local items so that they read more naturally.

An example of frequent typo error due to these long names is when you use journalEntryAddPageAccountModel where you are supposed to use journalEntryEditPageAccountModel.

Under this proposal suggestion, in the net/http header example, you can just name newlineToSpace, sorter, sorterPool for use in the http header file. In another file, when you need to create another sorter, you can call it sorter without worrying whether it has been used elsewhere.

If nobody defines their own sorter and they all rely on the original sorter, then that sorter becomes a package-level utility. The original author probably didn't expect his sorter to find utility beyond his intended usage, and he didn't need to do anything anticipate that. There is no extra keyword to add. It is code as you go along. If someone else finds it useful, then it is good. Otherwise, it is fine.

Now that the sorter becomes popular and used in multiple files, and someone else comes along tries to define his own sorter to beat this popular sorter. The compiler will then refuse to compile and say "Sorry, you are outvoted. You will have to rename your sorter to something else".

At least, this is how I envision it to be.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jun 23, 2020

As discussed above, this proposal introduces a new kind of scoping into the language, one that is unlike any other kind of scoping. A name that is defined in only one file is visible in package scope, but a name that is defined in two files is visible only in file scope. There isn't much evidence that this is a common problem. The way that the scope changes based on whether there are multiple definitions seems likely to be error-prone. Therefore, this is a likely decline.

That said, one option might be to use types and methods as a form of namespace. Instead of XFoo, XBar, and YFoo, YBar, write something like

type X struct{}
func (X) Foo() {}
func (X) Bar() {}

type Y struct{}
func (Y) Foo() {}
func (Y) Bar() {}

Here X and Y are types that represent namespaces, and are not meaningful in themselves. I don't know whether this helps much.

@henryas
Copy link
Author

@henryas henryas commented Jun 24, 2020

As discussed above, this proposal introduces a new kind of scoping into the language, one that is unlike any other kind of scoping. A name that is defined in only one file is visible in package scope, but a name that is defined in two files is visible only in file scope. There isn't much evidence that this is a common problem. The way that the scope changes based on whether there are multiple definitions seems likely to be error-prone. Therefore, this is a likely decline.

That said, one option might be to use types and methods as a form of namespace. Instead of XFoo, XBar, and YFoo, YBar, write something like

type X struct{}
func (X) Foo() {}
func (X) Bar() {}

type Y struct{}
func (Y) Foo() {}
func (Y) Bar() {}

Here X and Y are types that represent namespaces, and are not meaningful in themselves. I don't know whether this helps much.

The examples you gave apply only to grouping functions. How about data types? In my earlier reply, I mentioned data types, such as headerSorter, headerSorterPool, in the net/http package. How common is it for us to have to name our data types with certain prefix to introduce some sort of grouping or scoping?

The problem with using naming scheme as a workaround is that, first, you end up with long and sometimes unnatural names. Second, it requires foresight. When you designate certain types to belong to a certain group (via naming scheme) and those data types turn out to be useful for others in the package, it becomes awkward to re-use those types due to their prefixes. You can rename them so that they read more naturally, but that means you have to fix your earlier code.

By using the solution in this proposal, you can freely name those types as you code, with little to no assumption about its future utility. It resolves conflicts by using adoption rate. When a type is already adopted as file scoped (more than one file create their own types with the same name), then the name is taken for file scoped types. When a type is adopted as package scope (more than one file use the type), then the name is reserved for that particular type. Later types will have to use another names.

In addition, these inner types are supposed to be the inner-facing interface. One should be able to freely name them, unlike the outer-facing interface which are exposed and requires much more thoughts in terms of naming, consistency, and design.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jun 25, 2020

For data types, you can use a struct.

I'm not trying to claim that these are perfect solutions. But introducing a new, different, error-prone form of scoping into the language is not a perfect solution either.

I note also that the emoji voting on this issue is not in favor.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jul 21, 2020

No change in consensus.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
5 participants
You can’t perform that action at this time.