
implement access control lists for gateway #1551

Closed · wants to merge 3 commits

Conversation

whyrusleeping
Member

License: MIT
Signed-off-by: Jeromy <jeromyj@gmail.com>


if i.config.DenyList != nil && i.config.DenyList.Has(k) {
	w.WriteHeader(451)
	w.Write([]byte("451 - Unavailable For Legal Reasons"))
}
Member:

not always. we might have to block things that are not illegal (code of conduct on the service we run on our own gateway infra). probably should depend on the blocked thing... ok to merge, but this is not correct in the long run

@jbenet
Member

jbenet commented Aug 5, 2015

this PR changes the behavior.

previously, allowing "$foo" on the webui, would allow "$foo" and all its children. BUT the children would not be able to be viewed individually, only through the parent:

/$foo_hash/child  ok
$child_hash  forbidden

we may want the old behavior. or we may want the new one. im not sure. anyone else have opinions on this?

@jbenet
Member

jbenet commented Aug 11, 2015

@mappum @diasdavid o/ thoughts re

we may want the old behavior. or we may want the new one. im not sure. anyone else have opinions on this?

@whyrusleeping
Member Author

the problem with the old behaviour is that if you block X, all i have to do is make Y that has X as a child, and I can access it again. It didnt really provide any sort of real 'blocking'

@jbenet
Member

jbenet commented Aug 11, 2015

the problem with the old behaviour is that if you block X, all i have to do is make Y that has X as a child, and I can access it again. It didnt really provide any sort of real 'blocking'

no-- on the webui it was indeed blocked, because nothing except those things rooted at the allowed roots would work.

the reason i bring it up, too, is that there are implications to how HTTP works. for example, i may want to allow, root X, which has Y at /X/Y, and Y in the context of X is fine. But I may not want to allow /Z/Y, or even Z to use Y as /Y.

I'm leaning towards: "allowing/denying roots" as the right idea. you have to look through and make sure everything is ok to be used that way, but there's no questions otherwise.

@whyrusleeping
Member Author

no-- on the webui it was indeed blocked, because nothing except those things rooted at the allowed roots would work.

are you talking about whitelists or blacklists? there are different implications depending on that context. In the case of a whitelist, what you say is true. In the case of a blocklist, what I'm saying is true.

@jbenet
Member

jbenet commented Aug 12, 2015

i mean the webui "allowlist".

@whyrusleeping
Member Author

okay, the more i think of this, the more i think we're going to have to treat each list differently. the "allowlist" is going to have to use the root checking like you say, but the "do-not-allowlist" is going to need to use the technique i'm using.

@jbenet jbenet mentioned this pull request Aug 22, 2015
func loadKeySetFromURLs(urls []string) (key.KeySet, error) {
	ks := key.NewKeySet()
	for _, url := range urls {
		resp, err := http.Get(url)
Member:


woah, no. we should be caching this locally. the http thing should be only to update the list. if list host is unreachable, then what? no gateway? no blocking?

Member Author:


where should we cache it to? just inside $IPFS_PATH ?
should we name them after the url used to fetch them? or just blocklist.json and allowlist.json ?

Member:



They should be ipfs objects. we can import them from JSON.

Member:


i think they should work kind of like branches in git.

  • there's a remote endpoint with a bunch of named lists (a tree of lists if you will)
  • fetching them locally tends to use the same name, but can rebind them.

@jbenet
Member

jbenet commented Aug 23, 2015

  • cache lists locally, only update with http.
  • maybe the lists should include a reason for blocking/allowing, that way we can select the appropriate error code. not everything will be a 451. some may be just forbidden. etc. may want to brainstorm how to represent the reason.
  • allowlist should use the roots technique
  • denylist should continue using what you're doing

@jbenet
Member

jbenet commented Aug 23, 2015

not suggesting to change it to this, this is only for discussion:

right now, the allowlist and denylist only stack once. other filter systems allow users to compose a stack of filters, allowing expressions like:

# deny everything except things under good/ but not good/nowait,bad
DENY *
ALLOW good/*
DENY good/nowait,bad/
ALLOW *
DENY bad/*
ALLOW bad/butonlyalittlebad/*

and so on.

(this makes sense for IP cidr filters too)

@ghost ghost mentioned this pull request Aug 25, 2015
@whyrusleeping
Member Author

@jbenet question about caching the lists. how should they be stored? as one list for allow and one list for deny on disk? and how often/when should they be updated?

@jbenet
Member

jbenet commented Aug 25, 2015

@whyrusleeping re "how they should be stored":


They should be ipfs objects. we can import them from JSON.

@whyrusleeping
Member Author

@jbenet we already talked about this... thats the reason we arent fetching them from ipfs in the first place. the gateway is set up during node construction, fetching objects before the node is completed is a hassle.

@jbenet
Member

jbenet commented Aug 26, 2015

the gateway is set up during node construction, fetching objects before the node is completed is a hassle

We can fetch ipfs objects over HTTP just fine. (to clarify, i'm suggesting hosting the lists as ipfs objects (either in json or cbor (or both! "list.json" and "list.cbor") and only pulling them with http. add them to local repo then)

@ghost

ghost commented Aug 26, 2015

Can we make sure that we don't have to comply with a DMCA that shuts down the denylist?

@ghost

ghost commented Aug 26, 2015

Can we make sure that we don't have to comply with a DMCA that shuts down the denylist?

What I'm getting at is, it'd make the denylist subject to the same absurd theater that it's supposed to make transparent

@whyrusleeping
Member Author

@jbenet but if we fetch ipfs object over http, what exactly are we caching?

@jbenet
Member

jbenet commented Aug 26, 2015

@jbenet but if we fetch ipfs object over http, what exactly are we caching?

the object. cache it locally because you're not guaranteed to be able to reach the host every time you launch IPFS. Basically:

func start() {
	tryDownloadingNewDenylist()
	loadDenylist()
}

func tryDownloadingNewDenylist() {
	// do http request.
	// if successful, store as an ipfs object locally.
}

func loadDenylist() {
	// load local denylist ipfs object
}

@jbenet
Member

jbenet commented Aug 26, 2015

@lgierth i think we'll be fine with distributing a list of hashes. we generate the denylist, so it's technically our copyright?

@whyrusleeping
Member Author

@lgierth whats in the Uri field of the blocklists?

@ghost

ghost commented Aug 26, 2015

@lgierth whats in the Uri field of the blocklists?

@whyrusleeping The response is an array of notice objects, each consisting of a URI for more information on that notice, and a list of keys to block:

[ { "uri":"http://dmca.ipfs.io/2015-08-03-foobar",
    "keys":["Qmfoobar","Qmsomething","Qmbarbaz"] } ]

As per our chat yesterday, we can just flatten that, because the concept of a notice just isn't needed on the gateway's end.

["Qmfoobar","Qmsomething","Qmbarbaz"]

Would it be a good idea to respond with a URI template for the for-more-information-click-here link?

@whyrusleeping
Member Author

@jbenet so the flow i think i'm getting is:

checkLocalDagstoreFor(listKey)
if thatFailed {
  queryHTTP(listUrl)
  if thatFailedToo {
    die
  }
  writeListToLocalBlockstore(list)
}

question: where does the listKey come from? do we store it in the config? that would likely require a map or something in the config to store list urls and cached keys.

@whyrusleeping
Member Author

also, when should we check for updates to the lists?

@jbenet
Member

jbenet commented Aug 28, 2015

checkLocalDagstoreFor(listKey)
if thatFailed {
  queryHTTP(listUrl)
  if thatFailedToo {
    die
  }
  writeListToLocalBlockstore(list)
}

this o/ wont work because after getting a list you would never update again. i think we need:

if updateCondition() {
  if list, err := getListViaHTTP(url); err == nil {
    dag.Add(list)
  }
}
list, err := dag.Get(listKey)
if err != nil {
  list = emptyList
}

@whyrusleeping
Member Author

@jbenet

also, when should we check for updates to the lists?

@jbenet
Member

jbenet commented Aug 29, 2015

@whyrusleeping

also, when should we check for updates to the lists?

I think we shouldn't go over an hour with a stale one. another consideration is that long-running daemons should poll over time to remain up to date. this is way easier if we make the lists just return ipfs refs. see below.

@jbenet
Member

jbenet commented Aug 29, 2015

(could use the main api endpoint to return a head (i.e. a hash of the head) and then grab the head from the global http gateway itself. so

https://ipfs.io/refs/lists/denylists/dmca
https://ipfs.io/refs/lists/archives/archive.org
https://ipfs.io/refs/lists/archives/arxiv.org

could be just hashes, which we can then get with:

https://ipfs.io/ipfs/api/v0/object/get?arg=/ipfs/$head

that way, the json / cbor / whatever encoding question is handled by the API. (yay!)

we want these changes to propagate fast through the network

@daviddias daviddias mentioned this pull request Sep 2, 2015
@ghost

ghost commented Sep 3, 2015

Should we go with something like this?

> GET https://gateway.ipfs.io/refs/lists/denylists/dmca HTTP/1.1

< HTTP/1.1 302 Found
< Location: /ipfs/$hash
<
< $hash

$hash resolves to a DenyList.

type DenyList struct {
  Objects []Link // all DenyObjects
}

type DenyObject struct {
  Object Link // the object that's supposed to be blocked
  Notice Link // the dmca notice object
}

type DenyNotice struct {
  Value Data // opaque value
}

Get all DenyList.Objects, then populate denylist with all DenyObject.Object link hashes.

> GET https://gateway.ipfs.io/ipfs/$deniedHash

< HTTP/1.1 451 Unavailable For Legal Reasons
<
< <a href="/dmca/$deniedHash">Show DenyNotice object,
<                             and the DenyObjects linking to it</a>

@jbenet
Member

jbenet commented Sep 3, 2015

@lgierth that LGreatTM.

One minor thing, for the href in the returned page, what about maybe generating the page and link its ipfs object instead (so href="/ipfs/$pageShowingDenyNoticeAndLinks/" instead of href="/dmca/$deniedHash"). That way we would not have to implement special route handling for /dmca/<hash> (another route to worry about).

@ghost

ghost commented Sep 3, 2015

@jbenet the gateway would have to keep the reverse-links from Object to DenyObject in order to get the notice. The idea above was that the little gateway-dmca-denylist daemon would take care of that, so that we don't have to introduce any logic to the gateway apart from basic denying.

Or is there an API function for getting objects linking to a given hash?

@jbenet
Member

jbenet commented Sep 4, 2015

@lgierth we could probably prebuild all the pages statically every time we add to the list. and maybe we could link to a server we run then. dont want to put tons of dmca specific stuff in ipfs daemon.

@ghost ghost mentioned this pull request Sep 8, 2015
@RichardLitt RichardLitt mentioned this pull request Sep 23, 2015
@ghost

ghost commented Sep 23, 2015

gateway-dmca-denylist now builds this kind of list: https://ipfs.io/ipfs/QmRER7erZxU63huYgSBryGhKrfHdkDkVjwQTd8RD4RdSW5

... which is a unixfs structure, which I figured is the simplest to implement and look at (thanks to the dir-listing).

It fans out based on the keys associated with each notice, so that e.g. a notice with 3 keys will result in 3 objects linked in the denylist. Each of these objects links to the rendered notice, and the object to be blocked as identified by the key. The rendered notice is the same for all these.

$ go run denylist.go 
QmRER7erZxU63huYgSBryGhKrfHdkDkVjwQTd8RD4RdSW5
$
$ # this is the denylist object
$ ipfs object get QmRER7erZxU63huYgSBryGhKrfHdkDkVjwQTd8RD4RdSW5
{
  "Links": [
    {
      "Name": "2015-08-03-foobar-QmcuJpsniX8GWiz59RMgTg9UUH23zPBATZEe6qKEAzacCz",
      "Hash": "QmSD1bn3LVLwZpdvYiekv7ei6ipx4k5vxLhRRTmzoQFMND",
      "Size": 621
    },
    {
      "Name": "2015-08-03-foobar-Qmem4kaF6Jw8yPgCTaLAsV3deGjahbeN1oymqg4tzBMZZR",
      "Hash": "QmQawxoaW5j3z6VUKAKd8zVqYKgiJ3TouisxB5jNGFxQqb",
      "Size": 640
    }
  ],
  "Data": "\u0008\u0001"
}
$
$ # the first item of the denylist, linking to the rendered notice and the object to be blocked
$ ipfs object get QmSD1bn3LVLwZpdvYiekv7ei6ipx4k5vxLhRRTmzoQFMND
{
  "Links": [
    {
      "Name": "notice",
      "Hash": "QmanbefGzMvgsyhAo47TW7AQ5YkKtVzoVLnYyZgf6Geok3",
      "Size": 476
    },
    {
      "Name": "object",
      "Hash": "QmcuJpsniX8GWiz59RMgTg9UUH23zPBATZEe6qKEAzacCz",
      "Size": 44
    }
  ],
  "Data": "\u0008\u0001"
}
$
$ # this is the rendered notice
$ ipfs cat QmanbefGzMvgsyhAo47TW7AQ5YkKtVzoVLnYyZgf6Geok3
<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8" />
    <title>Unavailable for Legal Reasons</title>
</head>
<body>
    remove my stuff plox

    <h2>Affected Objects</h2>
    <ul>
        <li><a href="/ipfs/QmcuJpsniX8GWiz59RMgTg9UUH23zPBATZEe6qKEAzacCz">/ipfs/QmcuJpsniX8GWiz59RMgTg9UUH23zPBATZEe6qKEAzacCz</a></li>
        <li><a href="/ipfs/Qmem4kaF6Jw8yPgCTaLAsV3deGjahbeN1oymqg4tzBMZZR">/ipfs/Qmem4kaF6Jw8yPgCTaLAsV3deGjahbeN1oymqg4tzBMZZR</a></li>

    </ul>
</body>
</html>

@jbenet
Member

jbenet commented Sep 23, 2015

@lgierth feedback. Looking good!

what if instead of:

denylist/dmca/
denylist/dmca/2015-08-03-foobar-QmcuJpsniX8GWiz59RMgTg9UUH23zPBATZEe6qKEAzacCz
denylist/dmca/2015-08-03-foobar-Qmem4kaF6Jw8yPgCTaLAsV3deGjahbeN1oymqg4tzBMZZR
...
<root>/<date>-<org>-<hash>

instead do:

denylist/dmca/
denylist/dmca/2015-08-03-foobar/QmcuJpsniX8GWiz59RMgTg9UUH23zPBATZEe6qKEAzacCz
denylist/dmca/2015-08-03-foobar/Qmem4kaF6Jw8yPgCTaLAsV3deGjahbeN1oymqg4tzBMZZR
...
<root>/<date>-<org>/<hash>

this could:

  • stave off the cost of not yet sharding unixfs dirs.
  • make it easier to deal with long lists in one notice
  • easier to navigate.

Or is there a reason to have the hashes accessible from the root? is it about finding them easily? (could still do find over levels, just a bit harder. but if find is a problem, then we probably have to land unixfs dir sharding)


there's also something to be said for:

denylist/dmca/
denylist/dmca/2015/08/03/foobar/QmcuJpsniX8GWiz59RMgTg9UUH23zPBATZEe6qKEAzacCz
denylist/dmca/2015/08/03/foobar/Qmem4kaF6Jw8yPgCTaLAsV3deGjahbeN1oymqg4tzBMZZR
...
<root>/<year>/<month>/<day>/<org>/<hash>

but this index can be built later anyway


  • may want to include a link to the previous version of the list. like a dir called <root>/previous or something that's just the hash of the previous thing. (version history pre-commit chains)

@ghost

ghost commented Sep 24, 2015

Or is there a reason to have the hashes accessible from the root

Regarding the hashes being present at all, I'm only appending them to the link name so that I don't end up with name clashes. These suffixes could as well be a counter (yyyy-mm-dd-org-$i). That's probably better, and easy to do.

Regarding this being a flat list, I thought that'd be easier to parse on the client side. I'm happy to make it <root>/yyyy/mm/dd/org/i if that doesn't make it a pain for clients.

may want to include a link to the previous version of the list

Sure

@ghost

ghost commented Sep 24, 2015

Regarding this being a flat list, I thought that'd be easier to parse on the client side. I'm happy to make it <root>/yyyy/mm/dd/org/i if that doesn't make it a pain for clients.

@whyrusleeping wdyt?

@whyrusleeping
Member Author

@lgierth i can probably handle it.

@ghost ghost mentioned this pull request Oct 3, 2015
@RichardLitt RichardLitt mentioned this pull request Oct 13, 2015
@RichardLitt RichardLitt mentioned this pull request Oct 21, 2015
@ghost ghost added the topic/gateway Topic gateway label Dec 22, 2015
@Kubuxu Kubuxu added status/ready Ready to be worked and removed status/in-progress In progress labels Sep 26, 2016
@Kubuxu Kubuxu added need/analysis Needs further analysis before proceeding and removed status/ready Ready to be worked labels Nov 2, 2016
@whyrusleeping
Member Author

closing; rebasing and making this work with today's codebase would require more work than rewriting it from scratch

@whyrusleeping whyrusleeping deleted the feat/lists branch November 16, 2016 04:51
@whyrusleeping whyrusleeping removed the need/analysis Needs further analysis before proceeding label Nov 16, 2016
Labels: topic/gateway
3 participants