Rework data-file handling to be file-embed-like? #6096
Comments
I think this might be a nice solution that should work for a lot of cases on most (all?) platforms. But I think it might run into practical problems when data files get very large. Dumping a 2GB file to disk before being able to pass it to a process doesn't sound like much fun, when you could have just installed it as a separate file from the beginning.
For distributions like Debian there is also the problem of mixing architecture-independent and architecture-dependent data in one file. For architecture-dependent files there has to be one package per architecture, but architecture-independent data files can be placed in an architecture-independent package that works on all architectures. When the architecture-independent data gets bundled into the architecture-dependent executable it becomes impossible to separate it into an architecture-independent package, resulting in a lot of unnecessarily duplicated data across multiple architecture-dependent packages.
I think it would be great if both Cabal install and the cabal-install v2 commands supported methods of installation that feel more native to the target platform.
On Unix (and friends) system-wide things usually get installed across a variety of system directories. The paths required to find installed files are usually hard-coded into the installed binaries. This is basically how v1 installations worked, but it produced problems when running binaries from the dist/ directory during development. I think this could be supported in v2, without bringing back the old v1 problems, by separating builds for development and installation. That way you can compile binaries that work when being executed from dist-newstyle/ and compile different binaries that only work when they are installed in the appropriate location. I think the biggest hurdle here would be that not only the executable, but also all libraries that it depends on, would have to be recompiled to get the correct paths. Also I'm not sure how installing shared libraries would work.
On Windows things are usually installed by putting everything into a folder somewhere. The installed binaries can get relative paths to the required files as described in the prefix-independence feature Cabal already has. I think this could even be implemented in a way that partially lifts the restriction that libraries can't be prefix-independent.
When you compile the executables and libraries for your installation separately, as described above, the libraries can assume that their data-files are installed in the same place as the data-files for the executable.
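To make the prefix-independent variant concrete, here is a minimal sketch of resolving the data directory relative to the running executable instead of from a hard-coded absolute path. The layout (`<prefix>/bin/<exe>` next to `<prefix>/share/mypkg`), the package name `mypkg`, and both function names are assumptions made for illustration; this is not what Cabal generates today.

```haskell
import System.Environment (getExecutablePath)
import System.FilePath ((</>), takeDirectory)

-- Hypothetical layout: <prefix>/bin/<exe> and <prefix>/share/mypkg.
-- Given the executable's path, step up out of bin/ and into share/mypkg.
dataDirRelativeTo :: FilePath -> FilePath
dataDirRelativeTo exePath =
  takeDirectory (takeDirectory exePath) </> "share" </> "mypkg"

-- Resolve the data directory at run time, so the whole install tree
-- can be moved without recompiling.
getRelocatableDataDir :: IO FilePath
getRelocatableDataDir = dataDirRelativeTo <$> getExecutablePath
```

A library compiled for the same installation could call the same function, which is the sense in which libraries could share the executable's data location.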
FWIW, the zinza branch is a first step towards refactoring Paths_ module generation, which should make this kind of change easier.
Something I briefly discussed with Moritz is that it could generate a store-relative Paths_ module (at least when we know that the package is going to be installed into a store), and similarly have a few special cases for things like inplace builds (removing the need for setting the environment in v2-run).
Anyway, should be easier to do when Paths_ is templated.
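For readers who have not looked inside a Paths_ module, the following is a simplified approximation of the lookup it performs: an environment variable override, falling back to an absolute path baked in at build time. The package name `mypkg` and the path in `datadir` are hypothetical, and the real generated module contains more than this.

```haskell
import Control.Exception (IOException, catch)
import System.Environment (getEnv)
import System.FilePath ((</>))

-- Hypothetical absolute path baked in at configure time.
datadir :: FilePath
datadir = "/usr/local/share/mypkg-0.1.0.0"

-- Prefer the <pkg>_datadir environment variable, else the baked-in path.
getDataDir :: IO FilePath
getDataDir = getEnv "mypkg_datadir" `catch` fallback
  where
    fallback :: IOException -> IO FilePath
    fallback _ = pure datadir

-- Resolve one data file against the data directory.
getDataFileName :: FilePath -> IO FilePath
getDataFileName name = do
  dir <- getDataDir
  pure (dir </> name)
```

A store-relative or inplace variant would only need to change how `getDataDir` is computed, which is why templating the module helps.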
… On 20 Jun 2019, at 23.35, Sven Bartscher ***@***.***> wrote:
I think this might be a nice solution that should work for a lot of cases on most (all?) platforms. But I think it might run into practical problems when data files get very large. Dumping a 2GB file to disk before being able to pass it to a process doesn't sound like much fun, when you could have also just installed into a separate file from the beginning.
For distributions like Debian there is also the problem of mixing architecture-independent and architecture-dependent data in one file. For architecture-dependent files there has to be one package per architecture, but for architecture-independent data files there needs to be only one package for all architectures. When the architecture-independent data gets bundled into the architecture-dependent executable it becomes impossible to separate it into an architecture-independent package, resulting in a lot of unnecessarily duplicated data across multiple architecture-dependent packages.
I think it would be great if both Cabal install and the cabal-install v2 commands supported methods of installation that feel more native to the target platform.
On Unix (and friends) system-wide things usually get installed across a variety of system directories. The paths required to find installed files are usually hard-coded into the installed binaries. This is basically how v1 installations worked, but it produced problems when running binaries from the dist/ directory during development. I think this could be supported in v2, without bringing back the old v1 problems, by separating builds for development and installation. That way you can compile binaries that work when being executed from dist-newstyle/ and compile different binaries that only work when they are installed in the appropriate location. I think the biggest hurdle here would be that not only the executable, but also all libraries that it depends on, would have to be recompiled to get the correct paths. Also I'm not sure how installing shared libraries would work.
On Windows things are usually installed by putting everything into a folder somewhere. The installed binaries can get relative paths to the required files as described in the prefix-independence feature Cabal already has. I think this could even be implemented in a way that partially lifts the restriction, that libraries can't be prefix-independent. When you compile the executables and libraries for your installation separately, as described above, the libraries can assume that their data-files are installed in the same place as the data-files for the executable.
@gbaz the idea isn't new, in fact this has been discussed internally a couple times (including how to
@hvr I'm glad it's been kicked around before. I just wanted to create a ticket to capture some of the discussion and issues around this. So is the issue that private scoped executables are treated as data files themselves, and the OS doesn't allow you to write out and then execute programs? Could you point me to some documentation regarding this?
As I just discovered, data-files are currently unusable with ghcjs. The browser has no concept of pwd or environment variables, so local builds don't work at all. Store-installed packages do work, but the usefulness of ghcjs lies in the redistribution of the built package, which is impossible because datadir is an absolute path and once the package is put on a server the path changes.
IIRC there was discussion of doing a file-embed-like thing, using a linker object file plus a generated module. I'm on mobile, so I cannot look up whether it is in some existing issue.
The linker object + generated module approach generalises to ghcjs too.
The reasoning behind using linker objects is efficiency: GHC doesn't have to parse huge literals for nothing.
… On 4. Jan 2020, at 15.14, Francesco Gazzetta ***@***.***> wrote:
As I just discovered, data-files are currently unusable with ghcjs.
The browser has no concept of pwd or environment variables, so local builds don't work at all.
Store-installed packages do work, but the usefulness of ghcjs lies in the redistribution of the built package, which is impossible because datadir is an absolute path and once the package is put on a server the path changes.
Since there's no flag to change the datadir or to make it relative, currently the only solution is to manage assets externally.
This is intended to be a discussion thread. I think the whole concept of data-files needs a rethink. v2 commands highlight the issue, since data-files get installed into a store, and we would like the store to be wipeable but data-files sort of militate against that. However, even in v1, data-files just go to "some" system location disconnected from the executable, and certainly make redistribution of executables more difficult.
Here is a strawman proposal -- data-files no longer just get installed into a location with a Paths_ module pointing to it. Instead, they get embedded into a module directly, and to use them, the app necessarily calls a function to either access them directly, or to unpack them into a temporary location so that both the executable and other programs can access them...
(Let's ignore migration issues for now -- if the proposal finds a nice reception, those can be sorted out).
I know that this is semi-externally-possible with e.g. the file-embed package. However, building this uniformly into cabal has a number of advantages, not least that it bypasses the need for TH, etc. and that the embedded files can still be tracked directly in the cabal file. Further, it at least allows the possibility of designing some sort of backward-compatible API and gradual migration.
Thoughts?
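As a rough illustration of the strawman, the generated module could look something like the sketch below: a table of embedded files, a function to read one directly, and a function to unpack them all under the system temporary directory. Every name here (`embeddedFiles`, `lookupDataFile`, `unpackDataFiles`) and the file contents are hypothetical, and a real implementation would more likely embed the bytes via a linker object than as literals.

```haskell
import           Data.ByteString (ByteString)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC
import           System.Directory (createDirectoryIfMissing, getTemporaryDirectory)
import           System.FilePath ((</>), takeDirectory)

-- Hypothetical table that would be generated from the data-files field.
embeddedFiles :: [(FilePath, ByteString)]
embeddedFiles =
  [ ("templates/index.html", BC.pack "<h1>hello</h1>\n")
  , ("config/default.cfg",   BC.pack "verbose = false\n")
  ]

-- Access an embedded file directly, without touching the disk.
lookupDataFile :: FilePath -> Maybe ByteString
lookupDataFile path = lookup path embeddedFiles

-- Unpack every embedded file under <tmp>/<name> so that other
-- programs can read them; returns the root directory.
unpackDataFiles :: String -> IO FilePath
unpackDataFiles name = do
  tmp <- getTemporaryDirectory
  let root = tmp </> name
  mapM_ (writeOne root) embeddedFiles
  pure root
  where
    writeOne root (rel, bytes) = do
      let dest = root </> rel
      createDirectoryIfMissing True (takeDirectory dest)
      BS.writeFile dest bytes
```

An application would call `lookupDataFile` when it only needs the bytes itself, and `unpackDataFiles` when it has to hand a path to another process.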