Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http/cookiejar: add way to access all cookies in CookieJar #17587

Open
Xeoncross opened this issue Oct 25, 2016 · 12 comments

Comments

Projects
None yet
10 participants
@Xeoncross
Copy link

commented Oct 25, 2016

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

1.6.3

What operating system and processor architecture are you using (go env)?

darwin / amd64

What did you do?

Tried to access all cookies stored in a http.CookieJar.

What did you expect to see?

A Function for accessing the []*http.Cookie data. The cookiejar struct is missing a way to access all the cookie structs saved in it. You have to know all the matching urls (as url.URL structs) in order to pull cookies out of it.

What did you see instead?

Only a method for reading cookies from pre-known url.URL structs.

@bradfitz bradfitz changed the title Access http.CookieJar.AllCookies()? net/http/cookiejar: add way to access all cookies in CookieJar Oct 25, 2016

@bradfitz bradfitz added this to the Go1.9Maybe milestone Oct 25, 2016

@GuySa

This comment has been minimized.

Copy link

commented Oct 27, 2016

What will be a good way to implement such a function?

I thought about a generator-style function that returns a channel to a custom type containing the key, subkey and the entry, so you could call something like:
for currentEntry := jar.AllCookies() { ... }

I'd gladly do it, if you guys think my solution is good.

By the way, what is an actual use case for this function?
Isn't it risky, considering same-origin policy?

@Xeoncross

This comment has been minimized.

Copy link
Author

commented Oct 27, 2016

The use case in general (from looking around on the web and even a couple github projects) is to persist the cookiejar for later reuse. Some people (like me) want to restart their services while keeping state.

I am currently manually keeping up a second copy of everything in the cookiejar (by checking the http client response each time), but since cookiejar is already doing this it is a waste of resources.

The same-origin policy was added because of the need for sandboxing Javsacript in the browser and requests to different, third-party servers. Unknown/unauthorized code isn't really a problem in most Go applications.

@GuySa

This comment has been minimized.

Copy link

commented Oct 28, 2016

Is an opposite function, that accepts all the cookies and puts them in the jar also needed?
How do you put all the saved cookies in the cookie jar?

@mdlayher

This comment has been minimized.

Copy link
Member

commented Oct 28, 2016

I'd be happy to give this a shot.

@Xeoncross

This comment has been minimized.

Copy link
Author

commented Oct 29, 2016

While nice to have, a function to set all the cookies back on the cookie jar isn't required since you can just range over the map/struct and call cookiejar.SetCookies(u *url.URL, cookies []*http.Cookie).

The main thing is simply getting a JSON/map/struct dump of the cookies inside the cookiejar so it can be persisted.

@dprime

This comment has been minimized.

Copy link

commented Nov 18, 2016

I would like this in the stdlib, too. For now, since CookieJar is an interface, I just wrapped it and keep track of stuff.

import (
    "net/url"
    "net/http/cookiejar"
    "net/http"
    "sync"
)
/*
Implements the normal http cookie jar interface but also usefully
allows you to dump all the stored cookies without
having to know any of the domains involved, which helps a lot
 */
func NewExportableCookieJar() http.CookieJar {
    realJar, _ := cookiejar.New(nil)

    e := &ExportableCookieJar{
        jar: realJar,
        allCookies: make(map[url.URL][]*http.Cookie),
    }

    return e

}

func (jar *ExportableCookieJar) SetCookies(u *url.URL, cookies []*http.Cookie) {
    jar.Lock()
    defer jar.Unlock()
    jar.allCookies[*u] = cookies
    jar.jar.SetCookies(u, cookies)
}
func (jar *ExportableCookieJar) Cookies(u *url.URL) []*http.Cookie {
    return jar.jar.Cookies(u)
}

func (jar *ExportableCookieJar) ExportAllCookies() map[url.URL][]*http.Cookie {
    jar.RLock()
    defer jar.RUnlock()

    copied := make(map[url.URL][]*http.Cookie)
    for u, c := range jar.allCookies {
        copied[u] = c
    }

    return copied
}

type ExportableCookieJar struct {
    jar *cookiejar.Jar
    allCookies map[url.URL][]*http.Cookie
    sync.RWMutex
}
@sgen

This comment has been minimized.

Copy link

commented Nov 18, 2016

The function should return an array instead of a channel. Channels are far to heavy of an iterable, and I cannot see any need for a channel specifically.

@dougnukem

This comment has been minimized.

Copy link

commented Feb 19, 2017

@dprime A couple questions on that proposed ExportableCookieJar:

  • Storing all url.URL requests could become inefficient because it could essentially mean storing ALL unique request URL's even though in terms of Cookies rules they may
    • https://domain.com/abc/1 - cookies may just have path /abc but it stores the full URL
    • https://domain.com/abc/2
    • https://domain.com/abc/3
  • Storing allCookies map[url.URL][]*http.Cookie how would you go about restoring a cookie Jar
    • I think you need to preserve the order of key insertion into that map (e.g. need it to be a sorted map), otherwise future past/future requests that modify the same Cookies wouldn't get properly inserted if a user tried to restore from that exported map[url.URL][]*http.Cookie
  • Minor bug but I think you need to jar.RLock() in the Cookies even just for reading those values

From what I can tell rummaging around the code, it'd be ideal if there were a way to serialize the entries tree that is built up in the standard libraries cookie jar.

type entries map[string]map[string]entry


// entry is the internal representation of a cookie.
//
// This struct type is not used outside of this package per se, but the exported
// fields are those of RFC 6265.
type entry struct {
	Name       string
	Value      string
	Domain     string
	Path       string
	Secure     bool
	HttpOnly   bool
	Persistent bool
	HostOnly   bool
	Expires    time.Time
	Creation   time.Time
	LastAccess time.Time

	// seqNum is a sequence number so that Cookies returns cookies in a
	// deterministic order, even for cookies that have equal Path length and
	// equal Creation time. This simplifies testing.
	seqNum uint64
}

As this data-structure is what preserves and provides the proper lookup of Cookies associated with a URL request in an efficient manner.

In that case maybe the ExportableCookiesJar just needs to duplicate the logic for building the entries (but then it just seems you might as well fork the implementation or provide a public method to export Entries or something)

@vdobler

This comment has been minimized.

Copy link
Contributor

commented Feb 27, 2017

This stuff might be a bit more tricky than one would expect.

The issue is named "add way to access all cookies in CookieJar" but accessing the content is only the first half of what a new API would need to provide: After accessing all stored cookies one may serialise them to disk and probably wants to recreate a Jar from such a serialisation. So we not only need a way to access all cookies (including all attributes) but also a way to refill the Jar. This cannot be done through Jar.SetCookies (as this doesn't handle the Created/Accessed attributes).

So we need at least the following API:

type Entry struct { ... }                   // Export Entry type as container for all cookie attributes.
func (j *Jar) AllEntries() []Entry          // Return a copy of all (non-expired) cookies stored in j.
func (j *Jar) LoadEntries(e []Entry) error  // Load cookies from e into j.

This allows accessing all stored cookies and re-populating A Jar from that information. Deleting cookies from the jar can be simulated by specially crafted Jar.SetCookies calls.

This is sufficient for the following use case:

  • A long running application which has to keep cookies from several domains.
  • The list of domains is not ad hoc known/determined.
  • The application want's to persist its state (including the stored cookies) and restart from this state at some later time.

During the original discussion about a persistent cookiejar (starting points for more might be https://groups.google.com/forum/#!topic/golang-dev/7Jcbgrbah2s/discussion) we found out that such a minimal API might not be suitable: Think about a browser in a mobile device which wants to persist cookies while not generating garbage (to keep GC low) and minimizing disk access. In such a use case reading jar.AllEntries() after each request (to update LastAccess) would be far to heavyweight. In such a scenario some kind of notification mechanism would be preferable: The Jar informs some subscriber each time some change happens, the subscriber filters this stream of notifications and acts on some subset. E.g. ignoring changes to LastAccess and expired cookies and handling only new/deleted/changed cookies. To make this work we would need an API to subscribe to changes to the Jar and query only a subset of all cookies stored in the Jar:

type Notification uint32
const (
    ReadAccess   Notification = 1 << iota // LastAccess modified in Cookies().
    Expiration                            // Deleted because expiration detected in Cookies().
    DeleteCookie                          // Deleted during SetCookies.
    SetCookie                             // Set or updated during SetCookies.
)
type Options struct {
    PublicSuffixList cookiejar.PublicSuffixList  // As existing already

    // Notify is a channel on which changes to the jar are announced.
    Notify chan<- string  // TODO: element type of channel

    // NotifyOn is the bitmap which determines which changes to the
    // jar will result in a notification message on the Notify channel.
    NotifyOn Notification
}
func( j *Jar) GetEntry(domain, path, name string) (Entry, error)  // Retrieve single cookie.

This starts to become a pretty large API change.

I think it will be hard to find the right tradeoff between an API which is convenient to use in the simple use case (dump everything at program exit, reload all at next startup) while enabling to partially act on changes to the Jar content.
I expect the following API to be a sensible compromise:
https://godoc.org/github.com/vdobler/ht/cookiejar

  • Dumping everything is one iteration over jar.ETLDsPlus1(nil).
  • Restoring all is one (or several) call(s) to jar.LoadEntries().
  • Slices can be reused to minimise GC pressure if needed.
  • Notifications and subset access is coarse (limited to everything in a eTLD+1)

The reason for the coarse notification and subset is as follows:

  • LastAccess updates often happen to several cookies of one eTLD+1, it is common to see a Path of "/" in cookies, so distinguishing single cookies seems overkill.
  • I tend to think about the set of cookies for an eTLD+1 / Path combination as some kind of atomic data in which a partial update (e.g. just 2 out of 4 cookies) is more problematic than not updating the 4 cookies.
  • If access grouped by eTLD+1 is too coarse it could be narrowed to (eTLD+1, Path) which probably is the finest (but I doubt that a mobile browser would benefit significantly from this finer access).

Question to @bradfitz and @nigeltao : Are such large API changes okay? Should I prepare a design document? Or is the whole issue something which can be delegated to a 3rd party package?

@nigeltao

This comment has been minimized.

Copy link
Contributor

commented Mar 13, 2017

Large API changes are OK, but it is indeed tricky, as we've discussed before, and unfortunately, I don't have a lot of spare bandwidth to think about cookie jars at the moment. I think the way to start is to work in a 3rd party package.

@tv42

This comment has been minimized.

Copy link

commented Aug 14, 2018

Does this really need to reside in the stdlib? For example, implementing a cookie jar that immediately persists changes in a database is easiest if you know the database you're talking to. You might even use a per-use wrapper that communicates an open SQL transaction to the thing implementing http.CookieJar.

I think instead of wanting "100% all of the features cookiejar implementation" in stdlib, might it not be more useful to focus on this: Why is implementing your own CookieJar difficult? Is it because marshaling a http.Cookie usefully is hard? Is it because of innate complexity of HTTP cookie behavior? Can stdlib provide helpers for these tasks, without having to find a one-size-fits-everyone CookieJar implementation?

@vdobler

This comment has been minimized.

Copy link
Contributor

commented Aug 14, 2018

@tv42

Is it because of innate complexity of HTTP cookie behavior?

This.
Which cookies are allowed by the spec and which are so common in the wild that one has to allow them is ugly and the logic behind domains and paths is complicated.

But I agree: There is not much need for this in the stdlib. Taking the stdlib Jar implementation and adding a few functions/methods is not much work. net/http/cookiejar does not see much commits, actually just a handful during the last 5 years. So forking and maintaining the fork is pretty simple.

Adding helper functions to the stdlib is also complicated: http.Cookie is not really suitable to store all information needed in a cookiejar, e.g. how to distinguish domain-cookies from host-cookies?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.