Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restrict flag names to ASCII #4696

Closed
wants to merge 1 commit into from
Closed

Conversation

tfausak
Copy link
Collaborator

@tfausak tfausak commented Aug 15, 2017

  • Patches conform to the coding conventions.
  • Any changes that could be relevant to users have been recorded in the changelog.
  • The documentation has been updated, if necessary.

This pull request restricts flag names to ASCII (see #4686). It's not done yet, though. I have a few issues with it:

  • There's both a ReadP-based parser and a Parsec-based parser. Having both seems like a good way for them to get out of sync with each other. I don't know enough about Haskell's parsers to be able to combine them.
  • Even with these changes, I couldn't get flags with leading hyphens to fail parsing. It seems like they should fail, so I must be missing something.
  • Trying to parse a Unicode flag fails for an unrelated reason.

For an example of the second point, consider this Cabal file:

name: example
version: 1
build-type: Simple
cabal-version: >= 1.2
flag -leading-hyphen
library

It build successfully even though I expected it to fail to parse.

For an example of the third point, consider this Cabal file:

name: example
version: 1
build-type: Simple
cabal-version: >= 1.2
flag unicode-letter-א
flag unicode-number-⓪
library

It fails, but the error isn't quite what I expected:

Warning: example.cabal:0:0: "the input" (line 5, column 6):
unexpected character in input '\215'
expecting section parameter, "{", field or section name or end of file
cabal: Failing parsing "./example.cabal".

I am not familiar enough with Cabal's code base to quickly diagnose these problems. I'd appreciate some help sorting them out.

@mention-bot
Copy link

@tfausak, thanks for your PR! By analyzing the history of the files in this pull request, we identified @phadej, @ezyang and @hvr to be potential reviewers.

@hvr
Copy link
Member

hvr commented Aug 15, 2017

Oh dear... can you please state the rationale for disallowing Unicode in the .cabal spec, when this is the exact opposite of my agenda? Specifically, what problem does this solve?

@phadej
Copy link
Collaborator

phadej commented Aug 15, 2017

The unicode flag should be specified as flag "unicode-number-⓪", it seems that parsec's section argument ident lexer is too strict atm.

@hvr, should one be able to write

executable юникод
   ....

what Cabal spec says?

@tfausak
Copy link
Collaborator Author

tfausak commented Aug 15, 2017

@hvr: I decided to open this pull request based on #4686 (comment):

encourage changing Cabal to outright reject flag names that cabal check would warn about?

I'd accept a patch that did that.

@phadej: I didn't know flag names could be quoted. Thanks! I'll see how this branch handles quoted flags with Unicode characters in them.

@tfausak
Copy link
Collaborator Author

tfausak commented Aug 15, 2017

Putting the flag name in quotes changes the result, but it still fails for what seems like an unrelated reason:

# master
Warning: example.cabal:5:17:
unexpected '\147'
expecting white space or end of input: "unicode-number-\226\147\170"
cabal: Failing parsing "./example.cabal".

# this branch
Warning: example.cabal:5:16:
unexpected '\226'
expecting white space or end of input: "unicode-number-\226\147\170"
cabal: Failing parsing "./example.cabal".

I don't know; maybe that's the expected output.

@hvr
Copy link
Member

hvr commented Aug 15, 2017

I think we're conflating two things here. There's the .cabal specification, which specifies what a .cabal file supports, and then there's (seemingly arbitrary) restrictions imposed by Hackage, which are policy restrictions. So while Unicode package names aren't allowed on Hackage for usability, portability, and standardisation reasons (since Hackage is our central point of distribution for Cabal users); there's no reason why I shouldn't be able to i18n my executable/package/module/flag names (if you've ever been to German or French software shops, you may know what I mean). And in fact, I've got a couple open unicode-related Cabal issues I still have on my agenda.

Here's an example of a .cabal file I want to be able to use with cabal -- and cabal 2.0 is able to parse it properly; (I posted this example already in #4687 but it got hidden as the commit was superseeded):

name:                無
version:             0
synopsis:            The canonical non-package 無
build-type:          Simple
cabal-version:       >=1.10

source-repository head
  Type:     git
  Location: https://github.com/hvr/-.git

flag 無
  description: 無

library
  default-language:    Haskell2010

  exposed-modules: Ω

  if !flag(無)
    buildable:False

And your suggested change would deny me this.

PS: @phadej, yes (in this context) I'd expect to be able to use executables whose names aren't limited to the ASCII subset :-)

PS2: Note that the Haskell Report is specified very Unicode-friendly; identifiers (including module names) are deliberately allowed to use more than just the ASCII subset.

@23Skidoo
Copy link
Member

if you've ever been to German or French software shops, you may know what I mean

Hmm, it is a good argument against this change. Though I still think that keeping Hackage-accepted and Cabal-accepted formats as close as possible is a desirable goal.

@hvr
Copy link
Member

hvr commented Aug 15, 2017

@23Skidoo Well, it's the same format, Hackage just restricts the allowed charset for interop reasons; it doesn't restrict the format (ok, actually it does restrict the format somewhere else for different reasons, but let's not go there).

@tfausak
Copy link
Collaborator Author

tfausak commented Aug 15, 2017

I am suggesting that Cabal and Hackage agree with each other. Either: (a) it's useful to have Unicode flag names, and therefore Hackage should allow them; or (b) it's troublesome to deal with Unicode flag names, and therefore Cabal should disallow them. My preference is for restricting Cabal rather than broadening Hackage, but ultimately I just want them to be the same.

Put another way: Either these German/French software shops should be able to release their software as-is on Hackage or they should have to use ASCII for compatibility's sake.

And to support my ASCII viewpoint: Do you want to allow flag names like ーⅾⅰⅿ (U+30FC U+217E U+2170 U+217F)?

@hvr
Copy link
Member

hvr commented Aug 15, 2017

@tfausak if you give me such a false dilemma, I'd go with (a)

PS: Note that I don't agree to either (a) nor (b) because I consider this a false dilemma.

@tfausak
Copy link
Collaborator Author

tfausak commented Aug 15, 2017

Great! How can I propose changing Hackage to allow flag names with Unicode characters?

@hvr
Copy link
Member

hvr commented Aug 15, 2017

@tfausak it's a little known fact, but Hackage already allows it; the point of the recently introduced flag warning was to discourage this before somebody noticed (the main goal of cabal check is to encode Hackage restrictions imposed on top of the cabal spec for the purpose of distribution; but please, let's not start questioning this as well).

@tfausak
Copy link
Collaborator Author

tfausak commented Aug 15, 2017

I'm confused now.

  • Cabal allows this.
  • Hackage allows this.
  • cabal check encodes Hackage restrictions.
  • cabal check complains about this.

Why does cabal check complain about this if Hackage allows it?

@23Skidoo
Copy link
Member

Why does cabal check complain about this if Hackage allows it?

I think that what @hvr meant is that Hackage will stop doing that once the production server has been rebuilt against a version of Cabal that includes that check.

@23Skidoo
Copy link
Member

So we discussed this with @hvr a bit more and decided that it could be useful to have a use strict-like directive for .cabal files that'd tell Cabal to reject non-portable constructs more eagerly. He said that he will file an enhancement request.

@tfausak
Copy link
Collaborator Author

tfausak commented Aug 15, 2017

I can't say I'm a fan of strictness pragmas, but if that's the only way forward here then so be it.

@tfausak tfausak closed this Aug 15, 2017
@tfausak tfausak deleted the gh-4686-flags branch August 15, 2017 17:54
@23Skidoo
Copy link
Member

23Skidoo commented Aug 16, 2017

@tfausak Sorry for changing my mind on this. You (or anyone else) are welcome to work on a patch adding a strictness pragma for .cabal files if you want to continue improving Cabal in this area. @hvr promised to write up an enhancement request with his ideas regarding how such a pragma could work.

@tfausak
Copy link
Collaborator Author

tfausak commented Aug 16, 2017

I am interested in seeing the spec/idea, but at this point I don't think strictness pragmatic are the right solution.

tfausak added a commit to tfausak/cassava that referenced this pull request Sep 8, 2017
This flag name was changed without comment in 6e1ff78. It has caused a few problems:

- commercialhaskell/stack#3345
- commercialhaskell/stackage#2755
- commercialhaskell/stackage#2759
- commercialhaskell/stackage#2842
- haskell/cabal#4686
- haskell-hvr#150

Those problems either caused or accelerated some (attempted) changes in Cabal:

- haskell/cabal#4654
- haskell/cabal#4687
- haskell/cabal#4696

Those problems also caused a change in Stack:

- commercialhaskell/stack#3349

In short: Cabal never had any trouble with this. Stack did. The current master version of Stack can handle flags like this, but no released versions of it can. It's not clear when the next version of Stack will be released. In the meantime, no Stack users can use this package. This commit changes the offending flag to something that Stack can handle. By using a flag that Stack can handle, Stack users can once again use this package, and it can return to Stackage. There are no apparent downsides to using a more compatible flag name.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants