New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

windows PATH_MAX still a problem #3972

Closed
bitc opened this Issue Oct 11, 2016 · 12 comments

Comments

Projects
None yet
6 participants
@bitc

bitc commented Oct 11, 2016

Issue #3430 was about exceeding PATH_MAX because of long hashes in package-ids.

I am running cabal-install built with #3431 and #3954 applied and am running into a similar issue.

This time the problem is because PATH_MAX is exceeded due to lots of nested directories within the "dist-newstyle" directory. The problem happens when the following 2 condition are met:

  • The users project directory is "medium" length (such as C:\Users\bitc\stuff\experiments\projects\testing\test-project)
  • One of the dependencies has a very long name (such as "data-default-instances-containers")

Here is what happens during cabal new-build:

During the build of the dependency(data-default-instances-containers), suspicious haddock warnings shed light on a very long path:
C:\Users\bitc\stuff\experiments\projects\testing\test-project\dist-newstyle\tmp\data-default-instances-containers-9408\data-default-instances-containers-0.0.1\dist\doc\html\data-default-instances-containers

and the dependency finally fails with:

ghc-pkg.exe: dist\package.conf.inplace\:
openBinaryTempFileWithDefaultPermissions: does not exist (No such file or
directory)

The above error is a sign of a PATH_MAX problem.

In my situation, all of the other dependencies built ok. But the length of the string "data-default-instances-containers" is the longest of all of the dependencies, and apparently is just above the threshold that causes it to fail to build. When I move my project directory down a few levels in the filesystem (making it shorter), everything builds fine.

Recommendation: Dependencies should not be built inside "dist-newstyle/tmp" but rather in some temporary directory completely outside of the project dir, probably in some system tmp dir (which can hopefully be assumed to be quite short)

@dcoutts

This comment has been minimized.

Show comment
Hide comment
@dcoutts

dcoutts Oct 11, 2016

Member

Or alternatively, we can put the dist dir somewhere else, so it's not nested, e.g.

$projroot\dist-newstyle\tmp\data-default-instances-containers-0.0.1\

much like we have for local projects, which would be

$projroot\dist-newstyle\build\data-default-instances-containers-0.0.1\

We also don't need to duplicate the names in tmp, e.g. we could use

$projroot\dist-newstyle\tmp\src-9408\data-default-instances-containers-0.0.1\

Anyway, point is we can make these ones no worse than the ones we have already for local inplace packages, which have their per-pkg dist dirs under dist-newstyle.

Member

dcoutts commented Oct 11, 2016

Or alternatively, we can put the dist dir somewhere else, so it's not nested, e.g.

$projroot\dist-newstyle\tmp\data-default-instances-containers-0.0.1\

much like we have for local projects, which would be

$projroot\dist-newstyle\build\data-default-instances-containers-0.0.1\

We also don't need to duplicate the names in tmp, e.g. we could use

$projroot\dist-newstyle\tmp\src-9408\data-default-instances-containers-0.0.1\

Anyway, point is we can make these ones no worse than the ones we have already for local inplace packages, which have their per-pkg dist dirs under dist-newstyle.

@gbaz

This comment has been minimized.

Show comment
Hide comment
@gbaz

gbaz Oct 11, 2016

Collaborator

Someone remind me again why we don't just use the windows shortnames api universally on windows boxes to derive proper shortnames for all the paths?

Collaborator

gbaz commented Oct 11, 2016

Someone remind me again why we don't just use the windows shortnames api universally on windows boxes to derive proper shortnames for all the paths?

@dcoutts

This comment has been minimized.

Show comment
Hide comment
@dcoutts

dcoutts Oct 11, 2016

Member

@gbaz that sounds rather tricky. We currently do all our decisions about filepath layouts in pure code, whereas using a shortname api would be in IO.

Oh and the other thing we can do is avoid using absolute paths from the root, and use paths relative to the top of the project / package. That'd buy us a lot of headroom.

Member

dcoutts commented Oct 11, 2016

@gbaz that sounds rather tricky. We currently do all our decisions about filepath layouts in pure code, whereas using a shortname api would be in IO.

Oh and the other thing we can do is avoid using absolute paths from the root, and use paths relative to the top of the project / package. That'd buy us a lot of headroom.

@23Skidoo

This comment has been minimized.

Show comment
Hide comment
@23Skidoo

23Skidoo Oct 12, 2016

Member

One other place where this issue pops up is temp directories for components. For example, when I build the log-test-integration test suite component of the log package, the exe has the following path:

./dist-newstyle/build/x86_64-linux/ghc-8.0.1.20161010/log-0.5.4/c/log-test-integration/build/log-test-integration/log-test-integration

I think that c/log-test-integration/build/log-test-integration/log-test-integration can be safely shortened to c/log-test-integration/build/log-test-integration and maybe even to c/log-test-integration/log-test-integration (since there's already a top-level build dir).

Member

23Skidoo commented Oct 12, 2016

One other place where this issue pops up is temp directories for components. For example, when I build the log-test-integration test suite component of the log package, the exe has the following path:

./dist-newstyle/build/x86_64-linux/ghc-8.0.1.20161010/log-0.5.4/c/log-test-integration/build/log-test-integration/log-test-integration

I think that c/log-test-integration/build/log-test-integration/log-test-integration can be safely shortened to c/log-test-integration/build/log-test-integration and maybe even to c/log-test-integration/log-test-integration (since there's already a top-level build dir).

@ezyang

This comment has been minimized.

Show comment
Hide comment
@ezyang

ezyang Oct 12, 2016

Contributor

@23Skidoo There is a tradeoff here. Per-component build requires a dist dir per component in the package; thus ./dist-newstyle/build/x86_64-linux/ghc-8.0.1.20161010/log-0.5.4/c/log-test-integration. That's fine. But then there was a question of whether or not the dist dir layout should change depending on whether or not per-component build is turned on. This is actually not that easy: the layout is hardcoded in many parts of the build system, and also undoubtedly tools, which expect to be able to get a dist dir, and then find an executable by following the legacy layout.

We could remove the redundancy in the path. But this COMES AT THE COST of the dist dir layout changing if per-component build is enabled, and forcing end-user code to compensate accordingly; AS WELL AS a major refactor to Cabal library to make sure paths are computed in a centralized manner, and ironing out all of the consequent bugs. It's possible we should do it, but that's the situation. When @hvr and I discussed it last, we decided not to do this.

Contributor

ezyang commented Oct 12, 2016

@23Skidoo There is a tradeoff here. Per-component build requires a dist dir per component in the package; thus ./dist-newstyle/build/x86_64-linux/ghc-8.0.1.20161010/log-0.5.4/c/log-test-integration. That's fine. But then there was a question of whether or not the dist dir layout should change depending on whether or not per-component build is turned on. This is actually not that easy: the layout is hardcoded in many parts of the build system, and also undoubtedly tools, which expect to be able to get a dist dir, and then find an executable by following the legacy layout.

We could remove the redundancy in the path. But this COMES AT THE COST of the dist dir layout changing if per-component build is enabled, and forcing end-user code to compensate accordingly; AS WELL AS a major refactor to Cabal library to make sure paths are computed in a centralized manner, and ironing out all of the consequent bugs. It's possible we should do it, but that's the situation. When @hvr and I discussed it last, we decided not to do this.

@23Skidoo

This comment has been minimized.

Show comment
Hide comment
@23Skidoo

23Skidoo Oct 12, 2016

Member

It looks like new-build will break most assumptions about the dist dir layout anyway, so changing the per-component dist dir layout makes sense before the 3.0 release.

Member

23Skidoo commented Oct 12, 2016

It looks like new-build will break most assumptions about the dist dir layout anyway, so changing the per-component dist dir layout makes sense before the 3.0 release.

@ezyang

This comment has been minimized.

Show comment
Hide comment
@ezyang

ezyang Oct 12, 2016

Contributor

@23Skidoo Actually, it doesn't! Certainly finding the dist directory requires some new process (the current understanding seems to be, use dist-newstyle/cache/plan.json to get out the dist directory of a package/component you are building), but once you know where the dist directory is, the layout is completely the same.

Contributor

ezyang commented Oct 12, 2016

@23Skidoo Actually, it doesn't! Certainly finding the dist directory requires some new process (the current understanding seems to be, use dist-newstyle/cache/plan.json to get out the dist directory of a package/component you are building), but once you know where the dist directory is, the layout is completely the same.

@23Skidoo

This comment has been minimized.

Show comment
Hide comment
@23Skidoo

23Skidoo Oct 12, 2016

Member

Certainly finding the dist directory requires some new process (the current understanding seems to be, use dist-newstyle/cache/plan.json to get out the dist directory of a package/component you are building)

I think we're in agreement that existing tools and habits will need updating. That's what I meant by broken assumptions. Also splitting the single dist directory into multiple ones under c/is a highly user-visible change.

Member

23Skidoo commented Oct 12, 2016

Certainly finding the dist directory requires some new process (the current understanding seems to be, use dist-newstyle/cache/plan.json to get out the dist directory of a package/component you are building)

I think we're in agreement that existing tools and habits will need updating. That's what I meant by broken assumptions. Also splitting the single dist directory into multiple ones under c/is a highly user-visible change.

@23Skidoo 23Skidoo added this to the 2.2 milestone Oct 18, 2016

@ezyang

This comment has been minimized.

Show comment
Hide comment
@ezyang

ezyang Mar 4, 2017

Contributor

I don't see why we can't use short path API. Isn't it sufficient to just call the Unicode version of GetShortPathName() prior to passing a path to anything? That will let us take any file path less than 32,767 characters in size and compress it into a short path. We will have to redo all of the IO libraries from base we use, but I kind of wanted to do this anyway to get accurate call stacks.

Contributor

ezyang commented Mar 4, 2017

I don't see why we can't use short path API. Isn't it sufficient to just call the Unicode version of GetShortPathName() prior to passing a path to anything? That will let us take any file path less than 32,767 characters in size and compress it into a short path. We will have to redo all of the IO libraries from base we use, but I kind of wanted to do this anyway to get accurate call stacks.

@ezyang

This comment has been minimized.

Show comment
Hide comment
@ezyang
Contributor

ezyang commented Mar 4, 2017

@Mistuke

This comment has been minimized.

Show comment
Hide comment
@Mistuke

Mistuke Dec 25, 2017

Collaborator

I don't see why we can't use short path API. Isn't it sufficient to just call the Unicode version of GetShortPathName() prior to passing a path to anything?

Because this Is a significant overhead, and for this reason it's often turned off.. It also has a technical limitation that you can only create short paths for existing paths. Because of how they're stored, they need to be associated with an inode.

Collaborator

Mistuke commented Dec 25, 2017

I don't see why we can't use short path API. Isn't it sufficient to just call the Unicode version of GetShortPathName() prior to passing a path to anything?

Because this Is a significant overhead, and for this reason it's often turned off.. It also has a technical limitation that you can only create short paths for existing paths. Because of how they're stored, they need to be associated with an inode.

@Mistuke

This comment has been minimized.

Show comment
Hide comment
@Mistuke

Mistuke May 7, 2018

Collaborator

This is fixed in GHC 8.6, at least as much as we can fix it down stream. The remaining work is upstream. GCC won't be able to take long paths. However any pure Haskell library or function should never have this issue anymore.

cabal compiled with GHC 8.6 will "just work"

Collaborator

Mistuke commented May 7, 2018

This is fixed in GHC 8.6, at least as much as we can fix it down stream. The remaining work is upstream. GCC won't be able to take long paths. However any pure Haskell library or function should never have this issue anymore.

cabal compiled with GHC 8.6 will "just work"

@23Skidoo 23Skidoo closed this May 7, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment