Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[IPFS add] "nocopy" option for "add" method doesn't work in a HTTP API call. #4558

Closed
v696973 opened this issue Jan 8, 2018 · 14 comments
Closed
Assignees

Comments

@v696973
Copy link

v696973 commented Jan 8, 2018

Version information:

go-ipfs version: 0.4.13-
Repo version: 6
System version: amd64/linux
Golang version: go1.9.2

Type:

Bug

Severity:

Low-Medium

Description:

I've been trying to use an IPFS HTTP API in order to add files using experimental filestore feature, and found out that "nocopy" option for "add" method doesn't actually prevent IPFS from copying files when used in a HTTP API call. The size of the blockstore gets increased by the size of the file I'm trying to add, effectively duplicating the file.
CLI version still works as expected

Notes on the IPFS version: I'm using a pre-built binary from the last go-ipfs release (0.4.13).

Steps to reproduce:

Let's make sure that we indeed enabled the filestore:

ipfs config --json Experimental.FilestoreEnabled
true

Check the size of our test data file:

du -sh 1.json
82M 1.json

And the size of our blockstore:

du -sh ../ipfs/blocks
2.0M ../ipfs/blocks

Let's try to add our file using CLI:

ipfs add --nocopy 1.json
added QmagqfHSuMrWiQoNQ8hRJwTJejN37UUVHHU1fLK5Acs2GQ 1.json

Checking the blockstore size...

du -sh ../ipfs/blocks
2.1M ../ipfs/blocks

A minor increase in size. Looks like it works, just like it should.

Now, let's do a cleanup and try to add the same file, only now using HTTP API:

ipfs pin rm QmagqfHSuMrWiQoNQ8hRJwTJejN37UUVHHU1fLK5Acs2GQ
unpinned QmagqfHSuMrWiQoNQ8hRJwTJejN37UUVHHU1fLK5Acs2GQ

ipfs repo gc > /dev/null

du -sh ../ipfs/blocks
2.0M ../ipfs/blocks

Add the file using HTTP API:

curl -F file=@1.json "http://localhost:5001/api/v0/add?nocopy=true"
{"Name":"1.json","Hash":"QmagqfHSuMrWiQoNQ8hRJwTJejN37UUVHHU1fLK5Acs2GQ","Size":"85949064"}

du -sh ../ipfs/blocks
85M ../ipfs/blocks

Judging by the size of the blockstore, we've got a copy of our file, despite the "nocopy" option.

So, it looks like it's a bug to me, although I may be missing something important, like any additional parameters that I should pass to the API call. The API is still quite poorly documented, unfortunately.

@whyrusleeping
Copy link
Member

This one does look like a bug.

@kevina or @magik6k could one of you take a look at this?

@kevina
Copy link
Contributor

kevina commented Jan 10, 2018

I am not super familiar with how the HTTP API works but I can look into to. Seams easy enough.

@kevina kevina self-assigned this Jan 10, 2018
@kevina
Copy link
Contributor

kevina commented Jan 16, 2018

@v696973 Sorry for the delay. This should probably error out somewhere but the problem is that the full path of the file is not being sent to the server only the name of the file. In order for the filestore to work it needs to path of the file so it can find it on the filesystem.

@v696973
Copy link
Author

v696973 commented Jan 18, 2018

Thanks, @kevina! So it turns this is my mistake, although this wasn't mentioned in the docs (it still should pretty obvious, but the lack of explicit error message from the API left me confused). I'm closing this issue for now.

@v696973 v696973 closed this as completed Jan 18, 2018
@kevina
Copy link
Contributor

kevina commented Jan 18, 2018

This should at least return an error rather than silently falling back to storing the file in the datastore. Reopening and will close once that is in place.

@magik6k
Copy link
Member

magik6k commented Jan 14, 2019

For anyone wondering how to make this work: The request to add needs to send body as multipart/form-data, and files that should be referenced from filesystem must have Abspath parameter set.

Here is how a correct nocopy request should look like:

POST /api/v0/add?encoding=json&hash=sha2-256&nocopy=true&pin=true&stream-channels=true HTTP/1.1
Host: 127.0.0.1:5001
User-Agent: go-ipfs-cmds/http
Connection: close
Transfer-Encoding: chunked
Content-Type: multipart/form-data; boundary=4dc47aa7aca6a1d475d25f8177538d5d8cbca5266eb72b936089747b01d8
Accept-Encoding: gzip

--4dc47aa7aca6a1d475d25f8177538d5d8cbca5266eb72b936089747b01d8
Abspath: /tmp/file
Content-Disposition: file; filename="file"
Content-Type: application/octet-stream

Hello

--4dc47aa7aca6a1d475d25f8177538d5d8cbca5266eb72b936089747b01d8--

(The issue here is that we allow adding files with no Abspath set, which silently copies them to the datastore)

@parkan
Copy link

parkan commented Feb 12, 2019

@magik6k setting custom header values inside multi-part form fragments seems very non-standard, I don't see how to do this without crafting a custom post body by hand (curl assumes you would only use Content-Disposition and Content-Type)

I guess the issue is that paths are relativized (ie sending -F 'file=@/foo/bar will set filename to bar) but this solution seems inaccessible

@parkan
Copy link

parkan commented Feb 12, 2019

hmm so this approach is in violation of RFC 7578:

4.8.  Other "Content-" Header Fields

   The multipart/form-data media type does not support any MIME header
   fields in parts other than Content-Type, Content-Disposition, and (in
   limited circumstances) Content-Transfer-Encoding.  Other header
   fields MUST NOT be included and MUST be ignored.

which explains why other tools do not support setting these

the restriction on using absolute paths is softer:

 If a "filename" parameter is supplied, the requirements of
   Section 2.3 of [RFC2183] for the "receiving MUA" (i.e., the receiving
   Mail User Agent) apply to receivers of multipart/form-data as well:
   do not use the file name blindly, check and possibly change to match
   local file system conventions if applicable, and do not use directory
   path information that may be present.

it appears that if we are to bend the rules to specify these, using absolute paths in the existing Content-Disposition header is the lesser evil

@magik6k
Copy link
Member

magik6k commented Feb 12, 2019

This restriction only applies to Content- headers, and was already sort-of defined in https://www.ietf.org/rfc/rfc2046.txt:

5.1 [...] The only header fields that have defined meaning for body parts are
   those the names of which begin with "Content-".  All other header
   fields may be ignored in body parts.  Although they should generally
   be retained if at all possible, they may be discarded by gateways if
   necessary.  Such other fields are permitted to appear in body parts
   but must not be depended on.  "X-" fields may be created for
   experimental or private purposes, with the recognition that the
   information they contain may be lost at some gateways.

This also sounds a bit like we should use X-, but other than that it doesn't prohibit this use.

@magik6k
Copy link
Member

magik6k commented Feb 12, 2019

using absolute paths in the existing Content-Disposition header is the lesser evil

It makes it hard to tell where object root is, and ties dag paths to files in the filesystem (though the second point would only be an issue for people who could easily work around it)

@parkan
Copy link

parkan commented Feb 12, 2019

hmm that wasn't my reading of 4.8, though perhaps 5.1 in 2046 does override it (I would prefer X-IPFS-FilestorePath or similar, though)

do you have an example curl invocation or comparable code snippet that creates a valid request with this header?

@magik6k
Copy link
Member

magik6k commented Feb 12, 2019

@Stebalien's example just worked for me on master:

curl 'http://127.0.0.1:5001/api/v0/add?nocopy=true' -F 'file=@SOMEFILE;headers="Abspath: /home/magik6k/full/path/to/SOMEFILE"'

@magik6k
Copy link
Member

magik6k commented Feb 12, 2019

Note that leaving out this header will generate missing file path or URL, can't create filestore reference error on master (not in 0.4.18 iirc)

@magik6k
Copy link
Member

magik6k commented Feb 12, 2019

(Let's move this discussion to #5984)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants