Feature Request: Cache Values #16

AnotherCodeArtist · 2023-04-28T10:39:32Z

Hi Jan!

Thanks for the great work. It has already become my favorite Go kernel, and I'm using it on a JupyterHub cluster.
One thing, however, would be great. Functions, types and variables declared in one cell can be reused in later cells. But if a variable holds the result of a function call

var lotsOfData = LoadOverTheInternet("https://.....")

and this variable is used later in another cell

%%
processData(lotsOfData)

then the function call is executed again, instead of reusing the result that was loaded in the previous cell.
Is there any chance to cache the data instead of executing the function over and over again?

BTW: If you need a Dockerfile, I already have one (although it's a bit specific, since it runs in our cluster with some custom modifications).

janpfeifer · 2023-04-28T11:33:22Z

Thanks @AnotherCodeArtist, happy to hear that 😃!

Yes, the idea of having something that carries values over to another cell is high on my want list -- it's the item "Library to easily store/retrieve calculated content." in the TODO list.

The thing is, GoNB works by recompiling and re-executing the program at every cell execution, so values can't stay alive between cells. One workaround that comes to mind is a library that quickly serializes/deserializes values on demand. So your example would look like:

var lotsOfData = CacheValue(func () Data { return LoadOverTheInternet("https://.....") }, "lotsOfData")

Where:

func CacheValue[T any](fn func() T, key string) T {
...
}

It would first try to read the value T from a cache, keyed by key. If it doesn't find it there, it would call fn and save the result in the cache.

So it actually runs LoadOverTheInternet() only at the first execution, and all other cells will simply reload the saved value. And assuming one is actively using the notebook, the cached file will stay in the OS disk cache, so in practice it is read from memory.

For really large blobs of data this may still not be good enough, but then one could memory-map a file with the data in binary format (e.g., a large array of floats). It would require special memory management, but that can be made easy to handle.

Any thoughts? I suppose that's what you meant by caching the data?

Btw, on the Dockerfile: indeed it needs one. @sirliu suggested he/she (?) would do it in #13, but I haven't heard back in a while. How about we follow up on the Dockerfile in that thread?

cheers

AnotherCodeArtist · 2023-04-28T12:07:26Z

Hi Jan!

Sounds like a good first shot. Would be even cooler if there were a chance to provide some cell magic like %% for the main function so that it all becomes transparent to the user. But conceptually that seems to be the way to go (pun not intended!).

Anyhow, here's a working Dockerfile:

ARG BASE_IMAGE=jupyter/base-notebook
ARG BASE_TAG=python-3.10

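FROM ${BASE_IMAGE}:${BASE_TAG}

ENV GOVERSION=1.20

USER root

WORKDIR /root
# Install Go and remove the downloaded archive.
RUN wget https://dl.google.com/go/go$GOVERSION.linux-amd64.tar.gz && \
	tar -C /usr/local -xzf go$GOVERSION.linux-amd64.tar.gz && \
	rm go$GOVERSION.linux-amd64.tar.gz

# System packages used by the build (git, C toolchain, ZeroMQ headers).
RUN apt-get update && apt-get install -y git libtool pkg-config build-essential autoconf automake uuid-dev libzmq3-dev && \
	rm -rf /var/lib/apt/lists/*
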
USER $NB_USER
WORKDIR /home/jovyan

ENV GOROOT=/usr/local/go
ENV GOPATH=/home/jovyan/go
ENV PATH=$PATH:$GOROOT/bin:$GOPATH/bin

# Install GoNB (https://github.com/janpfeifer/gonb)
RUN go install github.com/janpfeifer/gonb@latest && \
    go install golang.org/x/tools/cmd/goimports@latest && \
    go install golang.org/x/tools/gopls@latest && \
    gonb --install

WORKDIR /home/jovyan/work

USER root

Build it with

docker build -t gonb:latest .

Run it with

docker run -p 8888:8888 --rm gonb:latest

janpfeifer · 2023-04-28T13:28:28Z

On the cache: I'm hesitant to create the cell magic -- I'm a big fan of making things explicit, even if it requires a bit more typing. Also, the cache system is useful outside GoNB too. So if normal Go code can achieve the same thing, I think that's a plus (one less thing for the end-user to learn).

Thanks for the Dockerfile! I'll add it this weekend and publish an image on Docker Hub, so folks can simply pull it.

janpfeifer · 2023-04-30T06:57:13Z

Thx again @AnotherCodeArtist. I added a few more things to your initial Dockerfile and pushed it out. Check it out.

Let me see if I can cook up a Cache Values library next.

janpfeifer · 2023-04-30T14:07:35Z

I took an initial stab at it, check it out in c9a1f3198096180f63042cd667675ddee8c7f2bc.

I haven't created a new release yet. I'll give it a few days; if you see any issues, or it doesn't work for you, let me know.

If everything works, I'll add a section about it to the tutorial and create the 0.6 release.

AnotherCodeArtist · 2023-05-01T15:07:53Z

Hi Jan!

I just tried to use the new cache in a notebook, and I get the following error message when running the cell:

 go: downloading github.com/janpfeifer/gonb v0.5.1
 gonb_110feb6a imports
 	github.com/janpfeifer/gonb/cache: cannot find module providing package github.com/janpfeifer/gonb/cache
 
 exit status 1

My dependencies in the docker image are

RUN go install github.com/janpfeifer/gonb@c9a1f3198096180f63042cd667675ddee8c7f2bc && \
    go install golang.org/x/tools/cmd/goimports@latest && \
    go install golang.org/x/tools/gopls@latest && \
    gonb --install

Nevertheless, it seems that 0.5.1 gets downloaded when running

var phonebook = cache.Cache("my_data", func() *Data { return LoadPhoneBookFile() })

Do I need any additional dependencies/imports?

janpfeifer · 2023-05-02T05:59:22Z

Sorry, I probably should have explained. The gonb/cache package you use in your notebook is not the same one that is running the kernel. So you didn't even need to rebuild the Docker image: what matters is the version your notebook uses.

Try this, in 3 different cells:

Tell Go (for the notebook) to download the cache package at the given version. Notice that the !* prefix executes the bash command in the temporary directory where the cell's Go program is being executed (there is an example in the tutorial):

!*go get -u github.com/janpfeifer/gonb/cache@c9a1f3198096180f63042cd667675ddee8c7f2bc

Let's define two variables, one cached and one not, each taking a random value:

import (
    "math/rand"
    "github.com/janpfeifer/gonb/cache"
)

var (
    a = rand.Intn(100)
    b = cache.Cache("b", func() int { return rand.Intn(100) })
)

Then try running this a few times:

%%
fmt.Printf("a=%d, b=%d\n", a, b)

AnotherCodeArtist · 2023-05-02T10:05:51Z

Hi Jan!

Works great for my scenario! Just a thought: would it be possible to use some in-memory database (which would need to be started and controlled by the kernel) for caching, or would that have no influence on execution time anyway?

janpfeifer · 2023-05-02T11:22:57Z

Nice, I'm happy it worked.

So, about having an in-memory database of sorts to store the cache: the OS is pretty good at caching disk contents, so in most cases (unless we're talking GBs of data), when working interactively on the notebook, everything will be in memory anyway. Another inefficiency of the OS filesystem may be the number of files, if you start having thousands or millions of cached values. In those cases, one reasonable option would be to pack collections of values into a container and cache the container instead?

Also, notice that the cache.New() method allows you to create a cache.Storage in an arbitrary location on disk. Depending on your setup, you could also create an in-memory filesystem (with tmpfs, see this article) and store it there?

If none of those work for you, let me know what your scenario is.

The API is flexible enough to support different types of backends -- one could create a NewInDatabase call that uses a database as storage, or something like that.
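A pluggable backend could be expressed as a small interface. In this hypothetical sketch, Backend, memBackend and CacheIn are illustrative names, not part of gonb/cache. A database-backed implementation would provide the same Get and Put methods. Keep in mind that a plain in-process map like memBackend would not survive between cells, since each cell runs as a new process. A real in-memory backend would have to live outside the cell's program, for example in a server managed by the kernel.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"sync"
)

// Backend is a hypothetical storage interface: anything that can store and
// fetch raw bytes by key (a directory, a tmpfs mount, a database table...).
type Backend interface {
	Get(key string) ([]byte, bool)
	Put(key string, data []byte) error
}

// memBackend is a toy Backend that keeps entries in a map; a database-backed
// one (the "NewInDatabase" idea) would implement the same two methods.
type memBackend struct {
	mu   sync.Mutex
	data map[string][]byte
}

func newMemBackend() *memBackend { return &memBackend{data: map[string][]byte{}} }

func (m *memBackend) Get(key string) ([]byte, bool) {
	m.mu.Lock()
	defer m.mu.Unlock()
	b, ok := m.data[key]
	return b, ok
}

func (m *memBackend) Put(key string, data []byte) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.data[key] = data
	return nil
}

// CacheIn returns the value stored under key in b, or computes it with fn and
// stores its gob encoding. The caching logic never depends on the backend type.
func CacheIn[T any](b Backend, key string, fn func() T) T {
	if raw, ok := b.Get(key); ok {
		var v T
		if err := gob.NewDecoder(bytes.NewReader(raw)).Decode(&v); err == nil {
			return v
		}
	}
	v := fn()
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(v); err == nil {
		_ = b.Put(key, buf.Bytes())
	}
	return v
}
```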

AnotherCodeArtist · 2023-05-03T06:40:23Z

Thanks for the hint with the ramdisk. As I've already pointed out, if there's no significant performance gain it's not worth going through the hassle!

janpfeifer · 2023-05-03T08:37:22Z

Nice. Closing the issue then. Next weekend I'll create a new release, and update the tutorial.

janpfeifer added a commit that referenced this issue on Apr 30, 2023: "Added Dockerfile #13 and #16." (935b041)

janpfeifer mentioned this issue on Apr 30, 2023: "Hope to provide gonb Docker images! #13" (Closed)

janpfeifer mentioned this issue on Apr 30, 2023: "Added gonb/cache library, to allow easy caching of values. #17" (Merged)

janpfeifer closed this as completed on May 3, 2023

cczambrano12 mentioned this issue on Jan 15, 2024: "Imports, struct and functions can be reused in other cells but variables don't #87" (Closed)
