remove non-ascii name #4103

Merged: soulomoon merged 1 commit into haskell:master on Feb 29, 2024


soulomoon
Collaborator

Fix #4102
Since Hackage does not accept non-ASCII filenames.
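A pre-release check along these lines could catch such names before cabal sdist or Hackage does. This is a minimal sketch under the assumption that flagging any non-ASCII file or directory name is the desired policy; the helper name `find_non_ascii_paths` is hypothetical and not part of cabal or HLS, and it only reports names without saying how any tar implementation would encode them.

```python
import os


def find_non_ascii_paths(root):
    """Return paths (relative to `root`) of files and directories whose own
    name contains a non-ASCII character. Hypothetical helper, not part of
    cabal or HLS."""
    offending = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            # str.isascii() (Python 3.7+) is False for any code point above
            # U+007F, including surrogate escapes for undecodable bytes.
            if not name.isascii():
                offending.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(offending)
```

For example, a CI step that fails whenever `find_non_ascii_paths("plugins")` returns a non-empty list would have flagged this module before a release was cut.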

@hasufell
Member

This should be backported to the 2.7.0.0 release branch.

@hasufell
Member

> Since Hackage does not accept non-ASCII filenames.

No, it's because it's a bad idea. There's a reason Hackage rejects it: it's not portable, and tar doesn't specify what to do with non-ASCII characters. That means the names land uninterpreted on your hard drive (and are possibly incompatible with your local encoding). If you go the interpretation route (always store names in the tar as UTF-8, then try to convert to the local encoding), you'll run into all sorts of other problems (no byte equivalence of file paths, etc.).
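To make the byte-equivalence point concrete: the same visible filename can have more than one valid Unicode spelling, and each spelling encodes to different UTF-8 bytes. The sketch below uses Python's standard `unicodedata` module and an arbitrary example name, `é.hs`, which is not the module from this PR.

```python
import unicodedata

name = "\u00e9.hs"  # "é.hs": an arbitrary example name

# Two canonically equivalent spellings of the same visible filename.
nfc = unicodedata.normalize("NFC", name)  # U+00E9 (precomposed é)
nfd = unicodedata.normalize("NFD", name)  # U+0065 'e' + U+0301 combining acute

print(nfc == nfd)           # False: different code point sequences
print(nfc.encode("utf-8"))  # b'\xc3\xa9.hs'
print(nfd.encode("utf-8"))  # b'e\xcc\x81.hs'
```

A tool that always converts names to UTF-8 still has to pick one of these spellings, and a checkout or file system that uses the other one will see a different byte sequence for what looks like the same path.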

@fendor
Collaborator

fendor commented Feb 29, 2024

> This should be backported to the 2.7.0.0 release branch.

It is not necessary for the HLS release to Hackage: cabal sdist all seems to "conveniently" ignore that module already.
Also, it seems we were missing a couple dozen extra-source-files; I have opened #4105 to fix that.
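One way to spot missing extra-source-files is to diff the files inside the generated sdist tarball against the test data on disk. This is a minimal sketch assuming the usual sdist layout, where every member sits under a single `<package>-<version>/` top-level directory; the helper name `missing_from_sdist` and the directory arguments are hypothetical.

```python
import os
import tarfile


def missing_from_sdist(tarball, src_root, subdir):
    """Return files under src_root/subdir that are absent from the sdist.

    Hypothetical helper. Assumes every tarball member is prefixed with one
    top-level '<package>-<version>/' directory.
    """
    packed = set()
    with tarfile.open(tarball, "r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile():
                # Drop the leading '<package>-<version>/' component.
                _, _, rel = member.name.partition("/")
                packed.add(rel)

    on_disk = set()
    for dirpath, _, filenames in os.walk(os.path.join(src_root, subdir)):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Normalise to '/' so the paths compare equal to tar member names.
            on_disk.add(os.path.relpath(full, src_root).replace(os.sep, "/"))

    return sorted(on_disk - packed)
```

Since cabal get unpacks the same tarball, any path this returns is test data that a cabal get checkout would lack.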

@hasufell
Member

> cabal sdist all seems to "conveniently" ignore that module already.

How so? My tests show that it incorrectly truncates the filename. That will cause cabal test to fail.

@fendor
Copy link
Collaborator

fendor commented Feb 29, 2024

> That will cause cabal test to fail.

Yeah, that is true. It seems we have been missing a couple of extra-source-files, so most of our releases can't be tested from a cabal get.

@hasufell
Copy link
Member

> > That will cause cabal test to fail.
>
> Yeah, that is true. It seems we have been missing a couple of extra-source-files, so most of our releases can't be tested from a cabal get.

For the plugins, you have plugins/**/*.hs in extra-source-files, which includes all the test data. If you look at my post here, you can see evidence that cabal is not ignoring the module but storing it as TModula, without a file extension.

@fendor
Copy link
Collaborator

fendor commented Feb 29, 2024

Indeed, it truncates, but it looks like neither cabal get nor cabal sdist chokes on the module name. But I suppose it depends on the tar version of the host system that uploads the file?

Either way, the tests will definitely fail.

@hasufell
Copy link
Member

> Indeed, it truncates, but it looks like neither cabal get nor cabal sdist chokes on the module name.

Well, cabal get just unpacks the already "malformed" (filename-truncated) tar archive. If you somehow manage to upload a tar archive containing a Unicode filename to Hackage, it'll truncate on unpacking.

This would become an absolute nightmare if Hackage allowed Unicode filenames and some cabal versions handled them correctly and others did not.
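Why different readers can disagree: a tar header stores the filename as raw bytes, and the reader's choice of encoding decides which name lands on disk. The sketch below uses Python's `tarfile` module purely as a stand-in (it is not the Haskell tar library cabal uses) and a hypothetical name `Mλ.hs`. It does not reproduce the truncation seen in this thread; it only shows two readers extracting two different names from one archive.

```python
import io
import tarfile


def make_archive(name, encoding):
    """Build an in-memory ustar archive holding one empty file called `name`,
    writing the name into the header with `encoding`."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT,
                      encoding=encoding) as tar:
        info = tarfile.TarInfo(name)
        info.size = 0
        tar.addfile(info, io.BytesIO(b""))
    return buf.getvalue()


def names_as_read(archive, encoding):
    """List member names, decoding the raw header bytes with `encoding`."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r",
                      encoding=encoding) as tar:
        return tar.getnames()


archive = make_archive("M\u03bb.hs", encoding="utf-8")  # hypothetical 'Mλ.hs'
print(names_as_read(archive, "utf-8"))    # ['Mλ.hs']
print(names_as_read(archive, "latin-1"))  # ['MÎ».hs']
```

The header bytes never change; only the interpretation does, which is exactly the situation where one tool version "handles it correctly" and another writes a differently named file.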

@fendor
Copy link
Collaborator

fendor commented Feb 29, 2024

> This would become an absolute nightmare if Hackage allowed Unicode filenames and some cabal versions handled them correctly and others did not.

I agree, and we are getting rid of it!

soulomoon merged commit f4a36ed into haskell:master on Feb 29, 2024 (35 checks passed)
@soulomoon
Collaborator Author

soulomoon commented Feb 29, 2024

> > Since Hackage does not accept non-ASCII filenames.
>
> No, it's because it's a bad idea. There's a reason Hackage rejects it: it's not portable, and tar doesn't specify what to do with non-ASCII characters. That means the names land uninterpreted on your hard drive (and are possibly incompatible with your local encoding). If you go the interpretation route (always store names in the tar as UTF-8, then try to convert to the local encoding), you'll run into all sorts of other problems (no byte equivalence of file paths, etc.).

Thank you for the explanation. So universal UTF-8 support for filenames just sucks :(

Successfully merging this pull request may close these issues.

Non-ascii module names break cabal sdist
3 participants