Rework data-file handling to be file-embed-like? #6096

Open
gbaz opened this issue Jun 20, 2019 · 7 comments
Comments

@gbaz
Collaborator

gbaz commented Jun 20, 2019

This is intended to be a discussion thread. I think the whole concept of data-files needs a rethink. v2 commands highlight the issue, since data-files get installed into a store, and we would like the store to be wipeable but data-files sort of militate against that. However, even in v1, data-files just go to "some" system location disconnected from the executable, and certainly make redistribution of executables more difficult.

Here is a strawman proposal -- datafiles no longer just get installed into a location with a paths module pointing to it. Instead, they get embedded into a module directly, and to use them, the app necessarily calls a function to either access them directly, or to unpack them into a temp-location so that both the executable and other programs can access them...

(Let's ignore migration issues for now -- if the proposal finds a nice reception, those can be sorted out).

I know that this is semi-externally-possible with e.g. the file-embed package. However, building this uniformly into cabal has a number of advantages, not least that it bypasses the need for TH, etc. and that the embedded files can still be tracked directly in the cabal file. Further, it at least allows the possibility of designing some sort of backward-compat api and gradual migration.
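To make the strawman concrete, here is a minimal sketch of what such an embedded-data module could look like. Everything in it is hypothetical: the names `embeddedFiles`, `lookupDataFile` and `unpackDataFiles`, the package name `mypkg`, and the file contents are made up for illustration, and nothing like this exists in Cabal today. The contents are written by hand here; in the proposal they would be produced by the build tool from the `data-files` field, without TH.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Hypothetical sketch only: stands in for a module a build tool might generate.
module Main (main) where

import qualified Data.ByteString.Char8 as BS
import System.Directory (createDirectoryIfMissing, getTemporaryDirectory)
import System.FilePath ((</>), takeDirectory)

-- | Embedded contents, keyed by the relative path a package would list
-- under data-files (paths and contents are invented for this example).
embeddedFiles :: [(FilePath, BS.ByteString)]
embeddedFiles =
  [ ("templates/hello.txt", "Hello, world!\n")
  , ("config/default.cfg", "verbose = false\n")
  ]

-- | Direct in-memory access to one embedded file.
lookupDataFile :: FilePath -> Maybe BS.ByteString
lookupDataFile path = lookup path embeddedFiles

-- | Write every embedded file below the given directory, so that the
-- executable and other programs can read them from disk.
unpackDataFiles :: FilePath -> IO ()
unpackDataFiles root = mapM_ write embeddedFiles
  where
    write (rel, bytes) = do
      let dest = root </> rel
      createDirectoryIfMissing True (takeDirectory dest)
      BS.writeFile dest bytes

main :: IO ()
main = do
  tmp <- getTemporaryDirectory
  let root = tmp </> "mypkg-data"  -- hypothetical unpack location
  unpackDataFiles root
  BS.putStr =<< BS.readFile (root </> "templates/hello.txt")
```

An application that only needs the bytes would call `lookupDataFile`; one that has to hand a path to another program would call `unpackDataFiles` first.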

Thoughts?

@Kritzefitz

Kritzefitz commented Jun 20, 2019

I think this might be a nice solution that should work for a lot of cases on most (all?) platforms. But I think it might run into practical problems when data files get very large. Dumping a 2GB file to disk before being able to pass it to a process doesn't sound like much fun, when you could have also just installed into a separate file from the beginning.

For distributions like Debian there is also the problem of mixing architecture-independent and architecture-dependent data in one file. For architecture-dependent files there has to be one package per architecture, but architecture-independent data files can be placed in an architecture-independent package that works on all architectures. When the architecture-independent data gets bundled into the architecture-dependent executable it becomes impossible to separate it into an architecture-independent package, resulting in a lot of unnecessarily duplicated data across multiple architecture-dependent packages.

I think it would be great if both Cabal install and the cabal-install v2 commands supported methods of installation that feel more native to the target platform.

On Unix (and friends) system-wide things usually get installed across a variety of system directories. The paths required to find installed files are usually hard-compiled into the installed binaries. This is basically how v1 installations worked, but produced problems when running binaries from the dist/ directory during development. I think this could be supported in v2, without bringing back old v1 problems, by separating installation builds from development builds. That way you can compile binaries that work when being executed from dist-newstyle/ and compile different binaries that only work when they are installed in the appropriate location. I think the biggest hurdle here would be that not only the executable, but also all libraries that it depends on, would have to be recompiled to get the correct paths. Also I'm not sure how installing shared libraries would work.

On Windows things are usually installed by putting everything into a folder somewhere. The installed binaries can get relative paths to the required files as described in the prefix-independence feature Cabal already has. I think this could even be implemented in a way that partially lifts the restriction, that libraries can't be prefix-independent. When you compile the executables and libraries for your installation separately, as described above, the libraries can assume that their data-files are installed in the same place as the data-files for the executable.
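The relative-path idea can be sketched as follows. This is an illustration under assumed conventions, not Cabal's actual prefix-independence code: the layout `<prefix>/bin/<exe>` with data under `<prefix>/share/<pkg>`, the function name `dataDirFor`, and the package name `mypkg` are all hypothetical. Only `getExecutablePath` (from `System.Environment` in base) is an existing API.

```haskell
module Main (main) where

import System.Environment (getExecutablePath)
import System.FilePath ((</>), takeDirectory)

-- | Derive a data directory from the location of the running executable,
-- assuming the hypothetical layout <prefix>/bin/<exe> and
-- <prefix>/share/<pkg>.
dataDirFor :: String -> FilePath -> FilePath
dataDirFor pkg exePath =
  takeDirectory (takeDirectory exePath) </> "share" </> pkg

main :: IO ()
main = do
  exe <- getExecutablePath
  -- "mypkg" is a placeholder package name.
  putStrLn (dataDirFor "mypkg" exe)
```

Because the path is computed at run time from where the executable sits, the whole folder can be moved without rebuilding. A library built for the same installation could use the same function with its own package name.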

@phadej
Collaborator

phadej commented Jun 21, 2019 via email

@hvr
Member

hvr commented Jun 21, 2019

@gbaz the idea isn't new, in fact this has been discussed internally a couple of times (including how to have cabal compile/link blobs into executables portably across platforms (windows, linux, macos, ...) as well as different compilers (ghcjs, eta, ghc, uhc, ...)); but there are some features of cabal which wouldn't work well with this, such as the private scoped executables feature where private executables can also be associated with a package, and in general you can't expect an operating system to allow you to write those to a temporary location and then execute them.

@23Skidoo
Member

23Skidoo commented Jun 21, 2019

Previously: #142, #839 (read the discussion). I think that this is a good idea, but don't have time to work on it.

@gbaz
Collaborator Author

gbaz commented Jun 21, 2019

@hvr I'm glad it's been kicked around before. I wanted to just create a ticket to capture some of the discussion and issues around this. So is the issue that private scoped executables are treated as data files themselves, and the OS doesn't allow you to write out and then execute programs? Could you point me to some documentation regarding this?

@fgaz
Member

fgaz commented Jan 4, 2020

As I just discovered, data-files are currently unusable with ghcjs.

The browser has no concept of pwd or environment variables, so local builds don't work at all.

Store-installed packages do work, but the usefulness of ghcjs lies in the redistribution of the built package, which is impossible because datadir is an absolute path and once the package is put on a server the path changes.
Since there's no flag to change the datadir or to make it relative, until this gets resolved the only solution is to manage assets externally.
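For context, the lookup done by a generated `Paths_<pkg>` module behaves roughly like the simplified stand-in below: an environment variable named `<pkg>_datadir` takes precedence, otherwise an absolute path fixed at build time is used. The package name `mypkg`, the helper `resolveDataDir`, and the baked-in path are placeholders for this sketch, not output copied from a real build.

```haskell
module Main (main) where

import Data.Maybe (fromMaybe)
import System.Environment (lookupEnv)
import System.FilePath ((</>))

-- | Stand-in for the absolute directory fixed at build time
-- (placeholder value, not a real store path).
bakedInDataDir :: FilePath
bakedInDataDir = "/absolute/path/chosen/at/build/time"

-- | Prefer the override if present, otherwise use the baked-in path.
resolveDataDir :: Maybe FilePath -> FilePath
resolveDataDir = fromMaybe bakedInDataDir

-- | Simplified version of the lookup: consult <pkg>_datadir, then fall
-- back to the absolute path. "mypkg" is a hypothetical package name.
getDataFileName :: FilePath -> IO FilePath
getDataFileName name = do
  override <- lookupEnv "mypkg_datadir"
  pure (resolveDataDir override </> name)

main :: IO ()
main = getDataFileName "assets/logo.png" >>= putStrLn
```

In a browser there is no environment to read the override from, so only the fallback branch remains, and that absolute path stops being valid once the build output is copied to a server.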

@phadej
Collaborator

phadej commented Jan 4, 2020 via email
