Neither __SOURCE_DIRECTORY__ nor Paket.Packages are unique for each notebook in a docker container #112

Open
dsyme opened this issue Dec 3, 2016 · 13 comments

Comments

@dsyme
Contributor

dsyme commented Dec 3, 2016

As mentioned in #106, __SOURCE_DIRECTORY__ is the same /home/nbuser for each notebook process in a docker container. Likewise the System.Environment.CurrentDirectory for each notebook is the same.

Also, the directory used for nuget packages is not unique. This would mean that different notebooks may get different nuget package versions, and may alter the paket.dependencies in conflicting ways.

Both can easily lead to different notebooks making conflicting use of the file system, for example if the current directory is used to store and resolve nuget packages. Whether that happens depends on the technique used to get the packages.
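A quick way to check this is a cell like the following, run in two different notebooks in the same container. This is only a minimal sketch; the `/home/nbuser` value mentioned above is what was observed in that container and will differ elsewhere.

```fsharp
// Run this cell in two different notebooks in the same container.
// Per the report above, both values come out the same in each notebook.
let sourceDir = __SOURCE_DIRECTORY__
let currentDir = System.Environment.CurrentDirectory

printfn "__SOURCE_DIRECTORY__ = %s" sourceDir
printfn "CurrentDirectory     = %s" currentDir
```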

@dsyme changed the title from "__SOURCE_DIRECTORY__ is not unique for each notebook in a docker container" to "Neither __SOURCE_DIRECTORY__ nor Paket.Packages are unique for each notebook in a docker container" on Dec 3, 2016
@cgravill
Member

cgravill commented Dec 3, 2016

With Paket, if you have two notebooks with different versions it's going to cause churn but they seem to have coexisted.

We discussed moving to a model with a unique packages folder per notebook but that may require quite a lot of duplication of binaries. I'm usually of the opinion that disc space is cheap but plotting/FsLab is a lot for every experimental notebook.

@cgravill
Member

cgravill commented Dec 3, 2016

Is this causing an issue when you switch between two notebooks?

@dsyme
Contributor Author

dsyme commented Dec 3, 2016

> Is this causing an issue when you switch between two notebooks?

It causes a problem when using Paket.Dependencies.Install, which is non-additive. I did notice that for sure. Also, the "load" files generated from one notebook were picked up by another, so that's an issue to consider.

It would also cause a problem with Paket.Package if versions or resolutions are different. But that's a much less severe problem.

@dsyme
Contributor Author

dsyme commented Dec 3, 2016

@cgravill On Azure notebooks I find myself with 5-6 similar duplicated notebooks, all with slight variations on package lists.

Maybe paket uses a package cache (for downloads) which is independent of the "packages" directory. That cache could be shared, while the individual notebooks get isolated "packages" directories. But yes, the cost could certainly be high in disk space. It's an issue for any F# scripting model that acquires packages, to be honest: how isolated should they be?

@cgravill
Member

To put some figures on the packages directories:

nuget FsLab = 1.0.2
- no framework line: 373 MB (391,801,535 bytes)
- framework: net451: 115 MB (121,290,223 bytes)
- framework: net462: 115 MB (121,290,223 bytes)

nuget XPlot.Plotly = 1.4.2
- no framework line: 262 MB (274,837,915 bytes)
- framework: net451: 9.32 MB (9,778,255 bytes)
- framework: net462: 9.32 MB (9,778,255 bytes)

Measured on Windows 10 with Paket version 3.31.2 (I can provide the .lock file if needed).

@cgravill
Member

cgravill commented Dec 19, 2016

My ideal is a machine-wide immutable store of downloaded packages. Perhaps we should use: https://fsprojects.github.io/Paket/nuget-dependencies.html#Putting-the-version-no-into-the-path though it'll cause annoyance around knowing the version you've got in a given notebook. Perhaps this could be combined with generating referencing scripts.

This still doesn't solve conflicting changes to the dependency file however.

@sylvanc
Collaborator

sylvanc commented Jan 5, 2017

Each notebook needs a logically separate paket.dependencies. This could be represented as a group or as a separate dependencies file. This is true whether or not version numbers go on the directories, to prevent version resolution from churning/conflicting, and to allow reference scripts to work.

As a result, I'm afraid putting version numbers on doesn't help at all.

Unfortunately, when using groups or separate paket.dependencies, the packages are duplicated N + 1 times: once in the cache, and once for each notebook.
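To make the N + 1 cost concrete, here is a rough back-of-the-envelope sketch using the 115 MB FsLab figure measured earlier in this thread. It assumes every copy, including the cached one, is the same size, which probably overstates the cache; the function names are made up for illustration.

```fsharp
// Figure from the measurements earlier in this thread (FsLab 1.0.2, net462).
let fsLabMb = 115.0

/// Isolated packages folders: one copy in the cache plus one per notebook.
/// Assumption: every copy has the same size.
let isolatedTotalMb (perCopyMb: float) (notebooks: int) =
    perCopyMb * float (notebooks + 1)

/// Shared packages folder: one copy in the cache plus one shared copy.
/// Assumption: same per-copy size as above.
let sharedTotalMb (perCopyMb: float) = perCopyMb * 2.0

for n in [ 1; 6; 24 ] do
    printfn "%2d notebooks: isolated %.0f MB, shared %.0f MB"
        n (isolatedTotalMb fsLabMb n) (sharedTotalMb fsLabMb)
```

With six notebooks, for example, the isolated layout comes to 115 × 7 = 805 MB under these assumptions, against 230 MB when the packages folder is shared.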

The core issue, I think, is whether a user's notebooks represent a single project or a collection of independent projects. When they represent a single project, then having a single paket.dependencies, and a single reference script, is not problematic: the notebooks logically share dependencies, and this saves (significantly) on disk space. @cgravill has users whose use case is like this, where their collection of notebooks (possibly many dozens of them) share on the order of 150 - 200 MB of packages (based on the numbers above).

One possible solution is to use the notebook's directory, so that notebooks in a single directory share a paket.dependencies, but other directories or sub-directories do not. This would provide an "organising principle", but I'm afraid it is too subtle, and would leave users confused on both sides: some wondering why notebooks in the same directory have conflicting packages and others wondering why their directory tree of notebooks doesn't share a paket.dependencies.
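As a rough illustration of that directory rule (not an existing IFSharp feature; `dependenciesFileFor` and the notebook paths are hypothetical), the lookup could be as simple as:

```fsharp
open System.IO

/// Hypothetical rule: a notebook uses the paket.dependencies that sits
/// next to it, so only notebooks in the same directory share one.
let dependenciesFileFor (notebookPath: string) =
    let dir = Path.GetDirectoryName notebookPath
    Path.Combine(dir, "paket.dependencies")

// Made-up notebook paths: two in one directory, one in a sub-directory.
let notebooks =
    [ "projects/a/intro.ipynb"
      "projects/a/plots.ipynb"
      "projects/a/sub/extra.ipynb" ]

// Group notebooks by the dependencies file they would end up using.
notebooks
|> List.groupBy dependenciesFileFor
|> List.iter (fun (deps, nbs) -> printfn "%s <- %A" deps nbs)
```

The first two notebooks land on the same file while the one in the sub-directory gets its own, which is exactly the behaviour that might surprise users on either side.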

@sylvanc
Collaborator

sylvanc commented Jan 5, 2017

As a side note: in an IPython notebook, dependencies are installed using pip, and they are installed globally. In other words, effectively the same as the system that's currently in place for IFSharp. That doesn't mean it's a good system, it just means I was hoping IPython had a better approach but it doesn't appear to :)

@cgravill
Member

Yes, a logically separate expression of dependencies seems the right approach. The Dependencies.locate(dir) is convenient, but we could directly construct the Dependencies against, say, .paket.dependencies. We'd also need to figure out something similar for the .lock to ensure it's stable and saved per notebook.

I haven't tested that but I think it would let us get stable dependencies per notebook. However, it would lead to churn on the packages e.g. the packages/Newtonsoft.Json directory would constantly switch between versions causing IO. What I had in mind with adding the version number is that this could be prevented. However, we might run into issues with Paket cleaning up references: https://fsprojects.github.io/Paket/reference/paket-garbagecollection.html

If we can resolve those issues, then combined with the auto-generation of referencing scripts from #121 it would shield the notebook from the noisy changes happening underneath.

I haven't tried any of the above properly yet so it may run into issues but hope it's helpful. If it works it would give one copy of referenced dlls, minimise IO, and give consistent stable dependencies.

@cgravill
Member

The newer storage:none mechanism of Paket would be a nice way to avoid the IO noise while keeping everything safe: https://fsprojects.github.io/Paket/dependencies-file.html#Disable-packages-folder and pretty much the ideal I hoped for! While it's still marked as beta, I've used it elsewhere, particularly on netcore projects.
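For reference, a minimal sketch of what a per-notebook dependencies file with that option might look like, written out from F#. The `notebook1` folder and the temp-directory location are hypothetical, and the package line just reuses the FsLab figure from earlier in the thread.

```fsharp
open System.IO

// Hypothetical layout: a per-notebook folder holding its own paket.dependencies.
let notebookDir = Path.Combine(Path.GetTempPath(), "notebook1")
Directory.CreateDirectory notebookDir |> ignore

// "storage: none" asks Paket not to extract packages into a local
// packages folder, so nothing under the notebook folder churns.
let dependencies =
    [ "storage: none"
      "framework: net462"
      "source https://api.nuget.org/v3/index.json"
      "nuget FsLab = 1.0.2" ]

let dependenciesPath = Path.Combine(notebookDir, "paket.dependencies")
File.WriteAllLines(dependenciesPath, dependencies)
printfn "wrote %s" dependenciesPath
```

This only writes the file; installing from it would still need to go through Paket itself.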

@cgravill
Member

I've done some experiments with storage:none and it works well. There's a snag in that it makes any native dependencies awkward to load, and those are often used in a notebook scenario.

There's also a planned extension to #r which would make this much better:
dotnet/fsharp#5850

@matthid
Member

matthid commented Jun 22, 2019

> There's a snag in that it makes any native dependencies awkward to load

@cgravill I stumbled over this issue. If by "native dependencies" you mean the unmanaged stuff, then there is also an API in paket that handles this just fine. You can take a look at this commit where I added support in FAKE for it.

@cgravill
Member

cgravill commented Jun 24, 2019

Yes, unmanaged platform-specific libraries.

That's very interesting @matthid, and your corresponding Paket change fsprojects/Paket#3593 is great. It would mean loading dependencies would be much easier even without storage:none.

One of the routes we use for loading dependencies is via the general purpose load scripts, e.g. #load @".paket/load/main.group.fsx". There's an existing issue open on adding native dependencies to that: fsprojects/Paket#3222
