ipfs ls tries to get the contents inside the folder #4229

Open
wpfnihao opened this issue Sep 14, 2017 · 5 comments
Labels
topic/perf Performance

Comments

@wpfnihao

wpfnihao commented Sep 14, 2017

Version information:

go-ipfs version: 0.4.10-
Repo version: 5
System version: amd64/linux
Golang version: go1.8.3

Type:

Medium

Severity:

Low

Description:

Environment:
One folder containing 100,000 files, each 100KB in size.
Two servers on a 1Gb network.
The test folder was added on one server and requested by the other.

I have tested the performance of ipfs ls. It seems that with the option --resolve-type=true, which is the default setting, ipfs fetches the contents of the files inside the folder just to list the folder information (see Fig. 3 and Fig. 4).

I had a quick look at the go-ipfs code in go-ipfs/core/commands/ls.go, which confirms my guess:

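// code snippet
output[i] = LsObject{
	Hash:  paths[i],
	Links: make([]LsLink, len(links)),
}

for j, link := range links {
	// The type starts out as -1 (unknown).
	t := unixfspb.Data_DataType(-1)

	// GetNode loads the block that each link points to; this is the
	// per-entry fetch that makes listing a large remote folder expensive.
	linkNode, err := link.GetNode(req.Context(), dserv)
	if err == merkledag.ErrNotFound && !resolve {
		// not an error
		linkNode = nil
	} else if err != nil {
		res.SetError(err, cmds.ErrNormal)
		return
	}
	// ... (excerpt ends here; the loop continues in ls.go)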

After that, I also tested ipfs ls --resolve-type=false and FUSE + native ls (see Fig. 1-2 and Fig. 5-6). The results show that with --resolve-type=false ipfs fetches only the folder info block, while FUSE + native ls behaves like ipfs ls --resolve-type=true.

I know that ipfs needs the linked blocks to resolve the data type (defined in go-ipfs/unixfs/pb/unixfs.pb.go):
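The difference between the two modes can be sketched with a small simulation. This is not the go-ipfs implementation: `countingStore`, `link`, and `listDir` are hypothetical names, and the store simply counts how many child blocks a listing asks for. Assuming each 100KB file fits into a single block, fetching a child block to read its type amounts to fetching the whole file.

```go
package main

import "fmt"

// link is a simplified directory entry: a name plus the hash of the child block.
type link struct {
	Name string
	Hash string
}

// countingStore is a hypothetical stand-in for a DAG service.
// It maps a hash to the type recorded inside that block and counts fetches.
type countingStore struct {
	blocks  map[string]string
	fetches int
}

// get simulates loading a child block (possibly over the network).
func (s *countingStore) get(hash string) (string, bool) {
	s.fetches++
	t, ok := s.blocks[hash]
	return t, ok
}

// listDir returns "name: type" lines. With resolveType set it fetches
// every child block; otherwise it reports the type as unknown.
func listDir(s *countingStore, links []link, resolveType bool) []string {
	out := make([]string, 0, len(links))
	for _, l := range links {
		t := "unknown"
		if resolveType {
			if bt, ok := s.get(l.Hash); ok {
				t = bt
			}
		}
		out = append(out, l.Name+": "+t)
	}
	return out
}

func main() {
	s := &countingStore{blocks: map[string]string{"h1": "file", "h2": "directory"}}
	links := []link{{"a.txt", "h1"}, {"sub", "h2"}}

	listDir(s, links, false)
	fmt.Println("fetches without resolving:", s.fetches)

	listDir(s, links, true)
	fmt.Println("fetches with resolving:", s.fetches)
}
```

In this model the number of fetches grows linearly with the number of entries when types are resolved, which matches the traffic pattern reported for the 100,000-file folder.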

type Data_DataType int32

const (
	Data_Raw       Data_DataType = 0
	Data_Directory Data_DataType = 1
	Data_File      Data_DataType = 2
	Data_Metadata  Data_DataType = 3
	Data_Symlink   Data_DataType = 4
	Data_HAMTShard Data_DataType = 5
)
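As an illustration of how these values might be turned into listing output, here is a minimal, self-contained sketch. The `DataType` type and `typeName` function are hypothetical and only mirror the constants quoted above; the `-1` value is assumed to mean "not resolved", following the `Data_DataType(-1)` initialisation in the ls.go excerpt.

```go
package main

import "fmt"

// DataType mirrors the values of Data_DataType quoted above.
type DataType int32

const (
	Unresolved DataType = -1 // assumption: sentinel for "child block not inspected"
	Raw        DataType = 0
	Directory  DataType = 1
	File       DataType = 2
	Metadata   DataType = 3
	Symlink    DataType = 4
	HAMTShard  DataType = 5
)

// typeName maps a type value to a label; anything else counts as unresolved.
func typeName(t DataType) string {
	switch t {
	case Raw:
		return "raw"
	case Directory:
		return "directory"
	case File:
		return "file"
	case Metadata:
		return "metadata"
	case Symlink:
		return "symlink"
	case HAMTShard:
		return "hamt-shard"
	default:
		return "unresolved"
	}
}

func main() {
	for _, t := range []DataType{Unresolved, Directory, File} {
		fmt.Printf("%d -> %s\n", t, typeName(t))
	}
}
```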

My questions:

  1. In some cases, I only need the folder info and then fetch only a few files from the folder. However, ipfs downloads the entire folder, which can be extremely slow.
  2. Do you have any plans to store the data type information along with the hash links, so that dealing with folders is less cumbersome?

Thank you.

Related to #3120

Fig. 1: ipfs ls --resolve-type=false Sender
ipfs-ls-resolve-false-135-to-137-100k-100k/sys_fig_135.jpg

Fig. 2: ipfs ls --resolve-type=false Receiver
ipfs-ls-resolve-false-135-to-137-100k-100k/sys_fig_137.jpg

Fig. 3: ipfs ls --resolve-type=true Sender
ipfs-ls-135-to-137-100k-100k/sy_fig_135.jpg

Fig. 4: ipfs ls --resolve-type=true Receiver
ipfs-ls-135-to-137-100k-100k/sys_fig.jpg

Fig. 5: ls with FUSE Sender
fuse-ls-135-to-137-100k-100k/sys_fig_135.jpg

Fig. 6: ls with FUSE Receiver
fuse-ls-135-to-137-100k-100k/sys_fig.jpg

@magik6k
Member

magik6k commented Sep 15, 2017

On 2: It might be possible to store type metadata as multicodec in CID, as is currently done for raw-leaves files.

@Stebalien
Member

> On 2: It might be possible to store type metadata as multicodec in CID, as is currently done for raw-leaves files.

We really shouldn't; that's not what CIDs are for. CIDs are used to determine how to decode some binary blob into an IPLD object. Ideally, one would only perform this CID introspection as an optimization (e.g., raw nodes can't have links). Unfortunately, this isn't currently the case, but I'd rather not make it worse.
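The "introspection as an optimization" idea could look roughly like the sketch below. It is not go-ipfs code: `canSkipFetch` is a hypothetical helper, and the only facts it relies on are the multicodec codes for raw blocks (0x55) and dag-pb nodes (0x70). The codec is used solely to skip a fetch when the answer is already certain, not to carry type metadata.

```go
package main

import "fmt"

const (
	codecRaw   uint64 = 0x55 // multicodec code for raw blocks
	codecDagPB uint64 = 0x70 // multicodec code for dag-pb (protobuf) nodes
)

// canSkipFetch reports whether the codec alone proves the link is a plain
// leaf. Raw nodes cannot have links, so they cannot be directories.
func canSkipFetch(codec uint64) bool {
	return codec == codecRaw
}

func main() {
	fmt.Println(canSkipFetch(codecRaw))   // raw leaf: type is known without fetching
	fmt.Println(canSkipFetch(codecDagPB)) // file or directory: the block must be read
}
```

This only helps for content added with raw leaves; a dag-pb link still has to be fetched to tell a file from a directory.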

A better way would probably be to inline this metadata into directories themselves.
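A minimal sketch of what inlining could mean, purely as an illustration: `typedLink`, `entryType`, and `listTyped` are hypothetical names and do not describe any existing or planned unixfs format. The point is that once the type travels with the directory entry, listing needs only the directory block itself.

```go
package main

import "fmt"

// entryType is a hypothetical, reduced set of entry types.
type entryType int

const (
	typeUnknown entryType = iota
	typeFile
	typeDirectory
)

// String returns a short label for the entry type.
func (t entryType) String() string {
	switch t {
	case typeFile:
		return "file"
	case typeDirectory:
		return "directory"
	default:
		return "unknown"
	}
}

// typedLink is a directory entry that carries its type inline
// (assumption: this field does not exist in the current format).
type typedLink struct {
	Name string
	Hash string
	Type entryType
}

// listTyped formats entries as "type<TAB>hash<TAB>name" without
// touching any child block.
func listTyped(links []typedLink) []string {
	out := make([]string, 0, len(links))
	for _, l := range links {
		out = append(out, l.Type.String()+"\t"+l.Hash+"\t"+l.Name)
	}
	return out
}

func main() {
	dir := []typedLink{
		{Name: "a.txt", Hash: "h1", Type: typeFile},
		{Name: "sub", Hash: "h2", Type: typeDirectory},
	}
	for _, line := range listTyped(dir) {
		fmt.Println(line)
	}
}
```

The trade-off is a slightly larger directory block in exchange for listings whose cost no longer depends on the size of the children.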

@whyrusleeping
Member

Hey @wpfnihao, thanks again for the detailed graphs!

The fuse interface will always load the entire directory. This is unfortunately needed to address one of the previous bugs you reported, where we were setting the type on every dirent to "File".

As @Stebalien says, we should probably store this info in the directory itself.

On that note, we should probably start designing ipld-unixfs. We have a lot of issues with the current unixfs that would be great to fix (including metadata, permissions, and executable bits). If someone wants to start thinking about this and drafting a proposal, that would be really helpful.

@kevina
Contributor

kevina commented Sep 17, 2017

@whyrusleeping I could probably work on that if no one else wants to pick it up. I would really like to see some sort of timestamp stored for directory entries also.

@whyrusleeping
Member

@kevina let's start gathering requirements in an issue in this new repo: https://github.com/ipfs/ipld-unixfs

This is something we need to think about very carefully, as we only really get one shot at changing it.

@momack2 momack2 added this to Inbox in ipfs/go-ipfs May 9, 2019
@Stebalien Stebalien added the topic/perf Performance label Mar 4, 2020