Rework data-file handling to be file-embed-like? #6096

gbaz · 2019-06-20T19:11:53Z

This is intended to be a discussion thread. I think the whole concept of data-files needs a rethink. The v2 commands highlight the issue: data-files get installed into a store, and we would like the store to be wipeable, but data-files militate against that. However, even in v1, data-files just go to "some" system location disconnected from the executable, which certainly makes redistribution of executables more difficult.

Here is a strawman proposal: data-files no longer just get installed into a location with a paths module pointing to it. Instead, they get embedded into a module directly, and to use them, the app necessarily calls a function to either access them directly, or to unpack them into a temporary location so that both the executable and other programs can access them...
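
To make the strawman concrete, here is a minimal sketch of what such an embedded-data module could look like. Everything in it is hypothetical: the names `embeddedFiles`, `lookupDataFile`, and `unpackDataFiles` are made up for illustration, and the table is written by hand here, whereas the proposal would have cabal generate it from the `data-files` field. It only assumes the `bytestring`, `directory`, and `filepath` packages.

```haskell
import qualified Data.ByteString.Char8 as BS
import System.Directory (createDirectoryIfMissing, getTemporaryDirectory)
import System.FilePath ((</>), takeDirectory)

-- Hypothetical stand-in for a generated table of embedded data-files,
-- keyed by the relative path as it would appear in the .cabal file.
embeddedFiles :: [(FilePath, BS.ByteString)]
embeddedFiles =
  [ ("templates/greeting.txt", BS.pack "Hello, world!\n")
  , ("config/defaults.ini", BS.pack "[core]\nverbose = false\n")
  ]

-- Direct access: the application reads the bytes without touching the disk.
lookupDataFile :: FilePath -> Maybe BS.ByteString
lookupDataFile name = lookup name embeddedFiles

-- Unpack every embedded file below <system temp dir>/<tag> so that other
-- programs can read them as ordinary files. Returns the unpack root.
unpackDataFiles :: String -> IO FilePath
unpackDataFiles tag = do
  tmp <- getTemporaryDirectory
  let root = tmp </> tag
  mapM_ (writeOne root) embeddedFiles
  pure root
  where
    writeOne root (rel, bytes) = do
      let dest = root </> rel
      createDirectoryIfMissing True (takeDirectory dest)
      BS.writeFile dest bytes
```

An application would call `lookupDataFile` when it only needs the contents, and `unpackDataFiles` when a path has to be handed to another process. A hand-written table like this is also where the approach would start to hurt for very large files.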

(Let's ignore migration issues for now -- if the proposal finds a nice reception, those can be sorted out).

I know that this is semi-externally-possible with e.g. the file-embed package. However, building this uniformly into cabal has a number of advantages, not least that it bypasses the need for TH, etc. and that the embedded files can still be tracked directly in the cabal file. Further, it at least allows the possibility of designing some sort of backward-compat api and gradual migration.

Thoughts?

Kritzefitz · 2019-06-20T20:35:27Z

I think this might be a nice solution that should work for a lot of cases on most (all?) platforms. But I think it might run into practical problems when data files get very large. Dumping a 2GB file to disk before being able to pass it to a process doesn't sound like much fun, when you could have also just installed into a separate file from the beginning.

For distributions like Debian there is also the problem of mixing architecture-independent and architecture-dependent data in one file. For architecture-dependent files there has to be one package per architecture, but architecture-independent data files can be placed in a single architecture-independent package that works on all architectures. When the architecture-independent data gets bundled into the architecture-dependent executable, it becomes impossible to separate it into an architecture-independent package, resulting in a lot of unnecessarily duplicated data across multiple architecture-dependent packages.

I think it would be great if both Cabal install and the cabal-install v2 commands supported methods of installation that feel more native to the target platform.

On Unix (and friends) system-wide things usually get installed across a variety of system directories. The paths required to find installed files are usually hard-compiled into the installed binaries. This is basically how v1 installations worked, but it produced problems when running binaries from the dist/ directory during development. I think this could be supported in v2, without bringing back the old v1 problems, by separating installation builds from development builds. That way you can compile binaries that work when being executed from dist-newstyle/ and compile different binaries that only work when they are installed in the appropriate location. I think the biggest hurdle here would be that not only the executable, but also all libraries that it depends on, would have to be recompiled to get the correct paths. Also, I'm not sure how installing shared libraries would work.

On Windows things are usually installed by putting everything into a folder somewhere. The installed binaries can get relative paths to the required files as described in the prefix-independence feature Cabal already has. I think this could even be implemented in a way that partially lifts the restriction, that libraries can't be prefix-independent. When you compile the executables and libraries for your installation separately, as described above, the libraries can assume that their data-files are installed in the same place as the data-files for the executable.
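
As a rough illustration of the two installation styles described above, here is a small sketch of how a data directory could be resolved either from a path fixed at build time or relative to the running executable. The `DataDirPolicy` type and the function names are hypothetical and are not part of Cabal's `Paths_` module; only `getExecutablePath` (from `System.Environment` in `base`) and the `filepath` functions are real.

```haskell
import System.Environment (getExecutablePath)
import System.FilePath ((</>), takeDirectory)

-- Hypothetical policy chosen when building for installation.
data DataDirPolicy
  = Absolute FilePath       -- Unix style: absolute path compiled into the binary
  | RelativeToExe FilePath  -- Windows style: path relative to the executable's folder

-- Pure core: given the policy and the executable's own path,
-- compute where the data files are expected to live.
resolveDataDir :: DataDirPolicy -> FilePath -> FilePath
resolveDataDir (Absolute dir)      _       = dir
resolveDataDir (RelativeToExe rel) exePath = takeDirectory exePath </> rel

-- IO wrapper that asks the runtime for the location of the running binary.
getDataDirFor :: DataDirPolicy -> IO FilePath
getDataDirFor policy = resolveDataDir policy <$> getExecutablePath
```

Under this sketch, a development build and an installation build would differ only in the policy value that gets compiled in, which is also why the libraries an executable depends on would need to be rebuilt with the matching policy.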

phadej · 2019-06-21T07:31:59Z

FWIW, the zinza branch is a first step towards refactoring Paths_ module generation, so that it becomes easier to change. Something I briefly discussed with Moritz is that it could generate a store-relative Paths_ module (at least when we know that the package is going to be installed into a store), and similarly have a few special cases such as inplace builds (removing the need for setting env in v2-run). Anyway, this should be easier to do once Paths_ is templated.

hvr · 2019-06-21T07:55:34Z

@gbaz the idea isn't new; in fact this has been discussed internally a couple of times, including how to have cabal compile/link blobs into executables portably across platforms (Windows, Linux, macOS, ...) as well as across different compilers (ghcjs, eta, ghc, uhc, ...). But there are some features of cabal which wouldn't work well with this, such as the private scoped executables feature, where private executables can also be associated with a package, and in general you can't expect an operating system to allow you to write those to a temporary location and then execute them.

23Skidoo · 2019-06-21T21:58:44Z

Previously: #142, #839 (read the discussion). I think that this is a good idea, but I don't have time to work on it.

gbaz · 2019-06-21T23:04:34Z

@hvr I'm glad it's been kicked around before. I just wanted to create a ticket to capture some of the discussion and issues around this. So is the issue that private scoped executables are treated as data files themselves, and the OS doesn't allow you to write out programs and then execute them? Could you point me to some documentation regarding this?

fgaz · 2020-01-04T13:14:22Z

As I just discovered, data-files are currently unusable with ghcjs.

The browser has no concept of pwd or environment variables, so local builds don't work at all.

Store-installed packages do work, but the usefulness of ghcjs lies in the redistribution of the built package, which is impossible because datadir is an absolute path and once the package is put on a server the path changes.
Since there's no flag to change the datadir or to make it relative, until this gets resolved the only solution is to manage assets externally.

phadej · 2020-01-04T14:05:52Z

IIRC there was discussion of doing a file-embed-like thing, using a linker object file plus a generated module. I'm on mobile, so I cannot look up whether it is in some existing issue. The linker + generated module approach generalises to ghcjs too. The reasoning behind using linker objects is efficiency: not making GHC parse huge literals for nothing.

gbaz added the type: enhancement label Jun 20, 2019

phadej mentioned this issue Feb 19, 2020

data-files of relocatable libraries are inaccessible in template Haskell #6549

Open

phadej mentioned this issue Jul 7, 2020

Implementation Plan for Windows Resource Files (#142) #6939

Open

gbaz added the type: discussion label Aug 28, 2021

Mikolaj added the data-files label Jul 25, 2022
