ipfs files write -e -t --raw-leaves --cid-ver 1 ended up with a weird layout #6936

Open
RubenKelevra opened this issue Feb 27, 2020 · 7 comments
Labels
kind/bug A bug in existing code (including security flaws) need/analysis Needs further analysis before proceeding need/author-input Needs input from the original author P2 Medium: Good to have, but can wait until someone steps up topic/MFS Topic MFS

Comments

@RubenKelevra
Contributor

RubenKelevra commented Feb 27, 2020

I'm sorry guys, I'm always hitting the weird ones :/

Version information:

go-ipfs version: 0.4.23-6ce9a355f
Repo version: 7
System version: amd64/linux
Golang version: go1.13.7

Description:

I rewrote a file with

printf "%s\n" "$htmldata" | ipfs files write --create --truncate --raw-leaves --cid-version 1 "/path/to/file"

in a shell-script. This created the following CID, which can neither be read by

ipfs cat /ipfs/bafybeiee2irxixmhpoj3hm67byj7kiwlxetjevbwy37ylzrlpjrek2tuue

nor by

ipfs files read /path/to/file

I've inspected the layout of this cid in the webgui on a different node, and there are 6 empty blocks and a 7th block with some length, which isn't the layout I expected.

I don't set the --flush option, so it's at its default. The garbage collector did run on this node today, but I cannot verify whether that happened at the same time as the write. So there's a chance that the GC caused trouble for --truncate.

Screenshot_20200227_022629


The .ipfs folder is stored on ZFS, which shows no data integrity errors. I'm currently running an ipfs repo verify, but I doubt it will find any issues either.

@RubenKelevra RubenKelevra added the kind/bug A bug in existing code (including security flaws) label Feb 27, 2020
@RubenKelevra
Contributor Author

RubenKelevra commented Feb 27, 2020

Since I have the complete debug log of the shell script, I could recover the content which actually should end up in this node... (hope it's bytewise the same, but that's not absolutely certain).

/ipfs/bafybeifylcgw2bycfxdnlgm7ilzdq6nckuuckmyu6aiqtighlmrzt5loya

And since the repo does not hold the data of the previous CID of this file (from before the script wrote the data that failed to be written properly), it's pretty likely that the GC ran either during or after the write command.

@hsanjuan
Contributor

I am not sure I get it. The original command is missing a ". But fixing that it works well enough for me (am I supposed to set the contents of /ipfs/bafybeifylcgw2bycfxdnlgm7ilzdq6nckuuckmyu6aiqtighlmrzt5loya to $htmldata?).

@RubenKelevra
Contributor Author

The original command is missing a ".

Yeah sorry, fixed that.

That code isn't what's actually running, so I didn't copy it; I just typed it out and made the error.

I am not sure I get it. But fixing that it works well enough for me (am I supposed to set the contents of /ipfs/bafybeifylcgw2bycfxdnlgm7ilzdq6nckuuckmyu6aiqtighlmrzt5loya to $htmldata?).

So I'm basically reading a file, adding stuff to the variable, and writing it back to the same filename with ipfs files write --create --truncate --raw-leaves --cid-version 1 "/path/to/file"

This worked great for a while, then I decided to use cid-version 1 and obviously the GC was running at the same time the script issued this command and I ended up with a file with 6 zero blocks and a 7th block containing the actual file data.

A read on the path or the CID returns no data and hangs indefinitely.
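
For scripts that run into this, one way to keep a hung read from blocking forever is to put a deadline on it. The following is only a minimal sketch, assuming GNU coreutils `timeout` is installed; `read_with_deadline` is a hypothetical helper name and not part of ipfs.

```shell
#!/bin/bash
# Hypothetical helper: run a command with a deadline and report a stall
# instead of blocking forever.
# Usage: read_with_deadline SECONDS COMMAND [ARGS...]
read_with_deadline() {
	local secs=$1
	shift
	local status=0
	# timeout(1) exits with status 124 when the deadline is reached.
	timeout "$secs" "$@" || status=$?
	if [ "$status" -eq 124 ]; then
		echo "stalled after ${secs}s: $*" >&2
		return 124
	fi
	return "$status"
}
```

For example, `read_with_deadline 30 ipfs cat "$cid"` would return 124 if the read has not finished within 30 seconds, which lets a script log the stalled CID and move on.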

@alanshaw
Member

alanshaw commented Mar 2, 2020

@RubenKelevra I tried this out - I saved the HTML for this page and ran this:

cat ~/Desktop/https_github.com_ipfs_go-ipfs_issues_6936.html | ipfs files write --create --truncate --raw-leaves --parents --cid-version 1 /path/to/file

(Note I had to add --parents for it to work)

This returned without error and I was able to ipfs files read /path/to/file as well as read it like ipfs files stat /path/to/file --hash | ipfs cat.

Do you have a file we can use to reproduce this behaviour? Also you said you couldn't use IPFS to read the file that was created - what error message are you getting?

GC should not have run during the operation; IPFS should be taking locks to ensure it doesn't run GC during MFS mutations.

@RubenKelevra
Contributor Author

Sorry, the notification slipped by.

Also you said you couldn't use IPFS to read the file that was created - what error message are you getting?

An ipfs cat would just stall indefinitely.

Do you have a file we can use to reproduce this behaviour?

It was just a simple HTML file, like an Apache listing of a directory. I was adding a list item on each run.

I switched to removing the file from the MFS and using ipfs files cp /ipfs/cid /path/to/file afterwards for a while, which was stable. But since then I have completely removed this function from my production code; the daily throughput was just way too much to keep it on this node for a longer period anyway.
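
The remove-then-copy workaround described above could look roughly like the sketch below. This is an assumption about how the steps fit together, not the production code: `replace_mfs_file` is a hypothetical helper name, and the content is added with `ipfs add` first so that the MFS entry is only swapped once a complete CID exists.

```shell
#!/bin/bash
# Hypothetical helper: replace an MFS file without `ipfs files write --truncate`.
# Reads the new content from stdin.
# Usage: replace_mfs_file /mfs/path < new_content
replace_mfs_file() {
	local mfs_path=$1 cid
	# Add the content first; --quieter prints only the final CID.
	cid=$(ipfs add --quieter --raw-leaves --cid-version 1) || return 1
	# Removing fails when the entry does not exist yet; ignore that case.
	ipfs files rm "$mfs_path" 2>/dev/null || true
	# Link the freshly added CID under the MFS path.
	ipfs files cp "/ipfs/$cid" "$mfs_path"
}
```

Note that between the `rm` and the `cp` the path briefly does not exist, so concurrent readers of the MFS path would need to tolerate that.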

I'm going to write a short example script and run it on my notebook until it happens and report back.

@RubenKelevra
Contributor Author

RubenKelevra commented May 9, 2020

So, this script would replicate the access pattern:

A file gets read by its CID, the content gets modified slightly at the end, and then the whole file gets rewritten. I just use the file's own CID as random data input.

My guess is that there's a race condition bug when a file gets truncated and rewritten while the initial blocks already exist in the datastore. However, my script cannot reproduce this issue with the version from master. I now use badgerds on the system, though, so the performance in certain situations is probably different than with flatfs on ZFS.

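#!/bin/bash
# Repeatedly read a file by its CID, modify its tail, and rewrite it
# in place with --truncate.
set -e
api='--api=/ip6/::1/tcp/5001'
testdata="abcabcabcabcabcabcabcabcabcabcabcabcabcabcabc"
ipfs $api files mkdir --cid-version 1 '/testfolder'
testfile="/testfolder/testfile"
# Seed the file with the initial test data.
printf "%s\n" "$testdata" | ipfs $api files write --create --truncate --raw-leaves --cid-version 1 "$testfile"
while true; do
	# Resolve the current CID of the MFS file and read the content through it.
	cid=$(ipfs $api files stat --hash "$testfile")
	old_file_content=$( ipfs $api cat "$cid" )
	# Drop the last 10 characters, then append 20 characters of the CID and
	# the first 10 characters of the old content.
	new_file_content="${old_file_content::-10}${cid:10:20}${old_file_content:0:10}"
	# Rewrite the whole file under the same MFS path.
	printf "%s\n" "$new_file_content" | ipfs $api files write --create --truncate --raw-leaves --cid-version 1 "$testfile"
done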
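
To make the mutation step in the loop easier to follow, it can be isolated as a pure function. This is just an illustrative sketch: `mutate_content` is a hypothetical name, and the negative length in `${old::-10}` assumes bash 4.2 or newer.

```shell
#!/bin/bash
# Hypothetical helper isolating the string mutation used in the loop.
# Usage: mutate_content OLD_CONTENT CID
mutate_content() {
	local old=$1 cid=$2
	# All but the last 10 characters of the old content,
	# then 20 characters of the CID starting at offset 10,
	# then the first 10 characters of the old content.
	printf '%s' "${old::-10}${cid:10:20}${old:0:10}"
}
```

Each pass drops 10 characters and appends 30, so the content grows by 20 characters per iteration while the whole file is rewritten every time.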

@lidel
Member

lidel commented Jun 16, 2021

I was not able to reproduce "cat stall indefinitely" with the above script, so switching this to P2.
@RubenKelevra if you are still able to reproduce this with 0.9.0-rc2 (or any older version) let us know how.


I was able to replicate a version of this with go-ipfs 0.9.0-rc2 by executing the command below 5 times:

$ echo "hello world" | ipfs files write --create --parents --truncate --raw-leaves --cid-version 1 /aaa_test/file

It produced a weird DAG (bafybeifs4f3fbrix3b5lsk3sexgdh5eoeih2gmxv4rnc6pkn3ewitvuf2q) with 4 copies of empty bafkreihdwdcefgh4dqkjv67uzcmw7ojee6xedzdetojuzjevtenxquvyku and a single raw block with "hello world" (bafkreifjjcie6lypi6ny7amxnfftagclbuxndqonfipmb64f2km2devei4):

2021-06-16--01-49-16

The file can be read via ipfs files read /aaa_test/file or ipfs cat bafybeia4miqqt6qvhkmlsw3ipfiax3c7ntd2y4rnurvuvbayj3zl63otdq and it works fine via gateway (https://ipfs.io/ipfs/bafybeifs4f3fbrix3b5lsk3sexgdh5eoeih2gmxv4rnc6pkn3ewitvuf2q) but indeed, the layout is weird.
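
The duplicated links can also be tallied from the command line rather than read off a screenshot. A minimal sketch, assuming the child CIDs arrive one per line on stdin; `count_links` is a hypothetical helper name:

```shell
#!/bin/bash
# Hypothetical helper: read one CID per line on stdin and print
# "<count> <cid>" for each distinct CID, most frequent first.
count_links() {
	sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}
```

It could be fed with something like `ipfs refs <root-cid> | count_links`, assuming `ipfs refs` is run without `--unique` so that repeated links are listed once per occurrence.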

@lidel lidel added need/analysis Needs further analysis before proceeding P1 High: Likely tackled by core team if no one steps up topic/MFS Topic MFS labels Jun 16, 2021
@lidel lidel changed the title ipfs files write -e -t --raw-leaves --cid-ver 1 ended up with a weird layout and is unreadable ipfs files write -e -t --raw-leaves --cid-ver 1 ended up with a weird layout Jun 16, 2021
@lidel lidel added P2 Medium: Good to have, but can wait until someone steps up and removed P1 High: Likely tackled by core team if no one steps up labels Jun 16, 2021
@aschmahmann aschmahmann added the need/author-input Needs input from the original author label Jul 2, 2021

5 participants