
Proxy Cache Fill From Olympus #571

Closed
marwan-at-work opened this issue Aug 26, 2018 · 19 comments
Labels
on hold Issues and pull requests that shouldn't be closed yet, but shouldn't be worked on yet either

Comments

marwan-at-work commented Aug 26, 2018

Issue #495 made it so that we can redirect to Olympus. We should also have the option to save an Olympus module into the proxy storage backend.

  1. This should be configurable.
  2. The design should weigh synchronous vs. asynchronous filling, balancing clean code against performance.
  3. This should be done either through a backend passed to the middleware, or through an Olympus implementation of download.Protocol.
marpio commented Aug 26, 2018

Ref #474

fedepaol commented Aug 28, 2018

Currently, on a storage miss the handlers fetch from the VCS synchronously and fill the storage.

Let's put down some possible approaches:

We let the middleware check the storage and do the redirect in case of miss

In case of a hit we let the handlers serve the data from the storage.
I don't like this approach because the logic is split between the middleware and the handlers.
It relies on the handlers always finding the storage filled, because the middleware checked it for them.

We let the handlers check the storage, redirect and trigger the asynchronous cache filler

This has the advantage of keeping all the logic in the handlers, which is more readable, and it's probably faster than the worst-case scenario of a double miss (if the Olympus storage is empty as well, the whole operation would take twice as long: filling the storage and then serving the data).

We let the handlers check the storage, download & fill the cache from olympus synchronously

Again, all the code is in the handler. We'd need another download protocol (different from the one that serves from Olympus), but the worst-case scenario would probably be slower. We should take into account that filling the cache from the proxy would generally be faster, since the proxy is intended to run in proximity. Another advantage of this solution is that Olympus acts as the source of truth even for the first fetch.

I'm torn between the last two. In any case, the handlers will need to be specialized between the proxy and Olympus and/or the public/private flag.

Would love to hear your thoughts before I start messing with the code.

ghost commented Aug 28, 2018

I'm torn between the same two. My first instinct is to go for the second option:

We let the handlers check the storage, redirect and trigger the asynchronous cache filler

But I also really like the idea of Olympus being the source of truth from the first fetch.

marwan-at-work commented Aug 29, 2018

Haven't fully read the response yet, but my initial thought is this:

  1. Have the download.Protocol be able to cache-fill just like before. It currently does it via go get, but it could also trivially do it from Olympus.
  2. Have the handlers check whether err is KindX (KindNotFound or KindRedirect), and then redirect to Olympus.

My currently open PR (573) makes a few changes so that we can pass a "Stasher" interface to the download.Protocol, which might make it even easier to have an Olympus Stasher (a one-method implementation) instead of an Olympus download.Protocol (a five-method implementation).

Will read more about your suggestions and update on my thoughts when I get the time :)

@michalpristas
I'm on the fence about whether this is a dup of #474.
I would go with 3 until we find out that it doesn't perform well; then we know how to fix it immediately by switching to 2.

@fedepaol
Ok then, I'll go for 3 and try to leverage @marwan-at-work's PR for the Stasher interface.

marpio commented Aug 29, 2018

@fedepaol sorry for the late response.
I also think we should go with the third option. If performance turns out to be a problem, we can switch to option two or apply an optimization we already considered: writing to storage and to the response at the same time.
Also, I think we don't have to implement a whole new download protocol, just swap out the module.Fetcher in the existing one:

mf, err := module.NewGoGetFetcher(goBin, fs)

We could use this:
https://github.com/gomods/athens/blob/master/pkg/module/download.go
Let me know what you think.

@fedepaol
Thanks @marpio for your opinion! I will try to collect all the pieces and start working.

@marwan-at-work
@marpio If I understood you correctly, I don't think we can swap the GoGetFetcher out for an Olympus one, because then how would the download.Protocol get private modules? See my comment for a suggestion where the Stasher can stash from an upstream (Olympus) or locally (go get).

marpio commented Aug 29, 2018

@marwan-at-work my idea was to make it possible to provide a module.Fetcher to the download protocol, so we could pass the GoGetFetcher to fetch private modules and, say, an OlympusFetcher for public mods.

@fedepaol
The download protocol would need to know whether the module is public or private and have both fetchers / stasher + fetcher.
@marwan-at-work you mentioned a redirect, but given that we fetch from Olympus it's not needed, right?

marpio commented Aug 29, 2018

@fedepaol I don't intend to put both module.Fetchers into the same protocol. More like letting the caller of goget.New (where goget wouldn't be the right name anymore) pass an appropriate one. So something like this:
func New(mf module.Fetcher) (download.Protocol, error)
so we can do
gogetProtocol := New(module.NewGoGetFetcher(goBin, fs))
and
olympusProtocol := New(module.NewOlympusFetcher(goBin, fs))
The rest of the code inside pkg/download/goget/goget.go would stay the same.
Implementing an Olympus Stasher doesn't seem consistent to me; we don't have a GoGet Stasher. Why would we treat getting the .mod, .zip and .info bits from Olympus differently than getting them from disk using go mod download? Olympus is just another source, just like the files on disk.

@michalpristas
I would like to avoid using go to download bits from Olympus (in the proxy). It is not an efficient way to download modules; if we could use the API of each VCS, we would.

We have both of them under our control, and Olympus provides a deterministic API, so we can parallelize downloads of the 3 pieces without any additional go magic. Let's treat it as a network disk, which it is.
We could treat the zip as a stream and pipe it to disk, or maybe even directly to the DB, to lower memory consumption. But this is an implementation detail.

marpio commented Aug 30, 2018

I agree 💯 percent with you @michalpristas; we shouldn't use go to download bits from Olympus. This is also not what I had in mind.
If you look at https://github.com/gomods/athens/blob/master/pkg/download/goget/goget.go, the only place where it interacts with the go binary is here:

ref, err := gg.fetcher.Fetch(mod, ver)
if err != nil {
    return nil, errors.E(op, err)
}
v, err := ref.Read()
if err != nil {
    return nil, errors.E(op, err)
}
return v, nil

It does it through the module.Fetcher.
The implementation of the module.Fetcher used is here:
https://github.com/gomods/athens/blob/master/pkg/module/go_get_fetcher.go
This is the code which executes the go binary and provides module.Reference to the files on disk.
What if we provided another module.Fetcher, say pkg/module/olympus_fetcher.go, which would return a module.olympusRef instead of a module.diskRef (https://github.com/gomods/athens/blob/master/pkg/module/disk_ref.go)?
The implementation of the Read() function of module.olympusRef would return a module.Version using https://github.com/gomods/athens/blob/master/pkg/module/download.go

@michalpristas
Ok, I get it now: an Olympus implementation of the Ref interface.
This actually might work.
Ref says Read reads modules into memory, so this is a limitation and avoids using streams for saving Olympus -> DB. Which might not even work.
But as a first step, I think it's good enough: an olympusGoget passed into the filtering middleware which currently redirects to Olympus in case of Include.
But with the stasher in place we will need to figure out how to share workers between the two; Olympus and VCS pulls will be equally resource heavy.

marpio commented Aug 30, 2018

Ref says Read reads modules into memory, so this is a limitation and avoids using streams for saving Olympus -> DB. Which might not even work.

The only thing that Ref.Read() loads into memory is the .mod and .info. The .zip is passed as a stream from disk/Olympus:

sourceFile, err := d.fs.Open(filepath.Join(packagePath, fmt.Sprintf("%s.zip", d.version)))
if err != nil {
    return nil, errors.E(op, err)
}
// note: don't close sourceFile here so that the caller can read directly from disk.
//
// if we close, then the caller will panic, and the alternative to make this work is
// that we read into memory and return an io.ReadCloser that reads out of memory
ver.Zip = &zipReadCloser{sourceFile, d}

@michalpristas
Discussed on Slack.
We can pass resp.Body as ver.Zip.
Watch out for Close.
Since work is limited by memory, it won't allow handling so many jobs that sockets get exhausted.
Once we improve on memory, we'll need to keep an eye on sockets.

fedepaol commented Sep 2, 2018

Discussed on Slack; leaving this open in order to focus the work on the proxy.

@ghost added the on hold label Sep 2, 2018
@arschles
As of #772, we're not going to try to build a registry for the time being, so I'm closing this issue.
