This repository has been archived by the owner on Jun 29, 2022. It is now read-only.

Link same file from different block trees as solid big raw block. #42

Closed
ivan386 opened this issue May 5, 2017 · 6 comments
Labels
status/deferred Conscious decision to pause or backlog

Comments

@ivan386

ivan386 commented May 5, 2017

Problem:
IPLD allows a file to be split into blocks of different sizes and linked together in different ways. As a result, the same file can have many different trees and, accordingly, many different hashes.

Solution:
If the file is split into raw blocks, the root block also records the hash of the entire file, as if the file were one large raw block. This hash is additionally announced to the network.

A client that receives the root block additionally searches for sources of the entire file by that hash and verifies the data it receives against the hashes of the parts listed in the root block.

Once the file has been fully downloaded, its RawLink is re-checked. If the RawLink does not match, the correct RawLink is written to the root block.
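The final re-check step could look roughly like the sketch below. It is only an illustration under simplifying assumptions: the root node is modeled as a plain dict, bare sha256 digests stand in for full CIDs, and `check_and_fix_rawlink` is a hypothetical helper name, not part of any IPFS or IPLD API.

```python
import hashlib


def check_and_fix_rawlink(root: dict, data: bytes) -> bool:
    """Re-check RawLink after a full download (hypothetical helper).

    `root` is a dict with "Links" (list of {"Hash", "Size"}) and "RawLink";
    hashes are bare sha256 digests standing in for CIDs.
    Returns True if RawLink was already correct, otherwise rewrites it
    from the downloaded data and returns False.
    """
    # The parts must match the tree's own Links before the data is trusted.
    offset = 0
    for link in root["Links"]:
        part = data[offset:offset + link["Size"]]
        if hashlib.sha256(part).digest() != link["Hash"]:
            raise ValueError(f"part at offset {offset} does not match its Link")
        offset += link["Size"]
    if offset != len(data):
        raise ValueError("Links do not cover the whole file")

    # Whole-file hash, as if the file were one big raw block.
    actual = hashlib.sha256(data).digest()
    if root["RawLink"]["Hash"] == actual:
        return True
    root["RawLink"] = {"Name": "", "Hash": actual, "Size": len(data)}
    return False
```

Calling it twice on the same root shows the intended behavior: the first call corrects a wrong RawLink and returns `False`, the second confirms it and returns `True`.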

Proposed protocol buffers format (RawLink is the new field):

message PBLink {
    optional bytes  Hash = 1;
    optional string Name = 2;
    optional uint64 Tsize = 3;
}

message PBNode {
    repeated PBLink Links = 2;
    optional PBLink RawLink = 3;
    optional bytes  Data = 1;
}

RawLink is the CIDv1 of the whole file treated as a single raw block. It is used to find additional data sources for the file.
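A sketch of how such a RawLink CID can be computed, assuming sha2-256 and the base58btc multibase that produces the `zb2rh...` strings used in this thread: the CID is the version byte, the `raw` codec code, and a multihash, all single-byte varints here. The function names are illustrative only.

```python
import hashlib

BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58btc(data: bytes) -> str:
    """Bitcoin-style base58; each leading zero byte becomes a '1'."""
    n = int.from_bytes(data, "big")
    digits = []
    while n:
        n, rem = divmod(n, 58)
        digits.append(BASE58_ALPHABET[rem])
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + "".join(reversed(digits))


def raw_cidv1_from_sha256(digest: bytes) -> str:
    """CIDv1 string for a raw block whose sha2-256 digest is `digest`."""
    multihash = bytes([0x12, len(digest)]) + digest  # 0x12 = sha2-256, 0x20 = 32 bytes
    cid = bytes([0x01, 0x55]) + multihash            # 0x01 = CIDv1, 0x55 = raw codec
    return "z" + base58btc(cid)                      # 'z' = base58btc multibase prefix


def raw_cidv1(content: bytes) -> str:
    """CIDv1 of `content` treated as one big raw block."""
    return raw_cidv1_from_sha256(hashlib.sha256(content).digest())
```

Since only the whole-file digest goes into this CID, any chunking of the same bytes leads to the same RawLink.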

Example:

The first participant publishes the file using a block size of 131072 bytes and gets the root block CIDv0 QmAAAA...AAAA:

{
  "Links": [
    {
      "Name": "",
      "Hash": "zb2rhA2A2A2...A2A2A2",
      "Size": 131072
    },
    {
      "Name": "",
      "Hash": "zb2rhB2B2B2...B2B2B2",
      "Size": 131072
    },
    {
      "Name": "",
      "Hash": "zb2rhC2C2C2...C2C2C2",
      "Size": 131072
    },
    {
      "Name": "",
      "Hash": "zb2rhD2D2D2...D2D2D2",
      "Size": 131072
    },	
    {
      "Name": "",
      "Hash": "zb2rhC1C1C1...C1C1C1C1",
      "Size": 59029
    }
  ],
  "RawLink":{
    "Name": "",
    "Hash": "zb2rhR0R0R0R0...R0R0R0R0",
    "Size": 583317
  },
  "Data": "\b\u0002\u0018\ufffd\u0343\f ... \ufffd\ufffd\u0003"
}
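The link sizes above follow from simple fixed-size chunking of a 583317-byte file: four full 131072-byte blocks cover 524288 bytes, leaving 59029 bytes for the last block. A small sketch of that arithmetic (`chunk_sizes` is a hypothetical helper, not an IPFS function):

```python
from typing import List


def chunk_sizes(file_size: int, block_size: int) -> List[int]:
    """Sizes of fixed-size chunks covering a file; the last one may be shorter."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


sizes = chunk_sizes(583317, 131072)
print(sizes)       # [131072, 131072, 131072, 131072, 59029]
print(sum(sizes))  # 583317, the RawLink Size
```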

The second participant publishes the same file using a block size of 262144 bytes and gets the root block CIDv0 QmBBBB...BBBB:

{
  "Links": [
    {
      "Name": "",
      "Hash": "zb2rhA1A1...A1A1",
      "Size": 262144
    },
    {
      "Name": "",
      "Hash": "zb2rhB1B1B1...B1B1B1",
      "Size": 262144
    },
    {
      "Name": "",
      "Hash": "zb2rhC1C1C1...C1C1C1C1",
      "Size": 59029
    }
  ],
  "RawLink":{
    "Name": "",
    "Hash": "zb2rhR0R0R0R0...R0R0R0R0",
    "Size": 583317
  },
  "Data": "\b\u0002\u0018\ufffd\u0343\f ... \ufffd\ufffd\u0003"
}
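The two examples can be reproduced in miniature: chunking the same bytes with 131072- and 262144-byte blocks gives different leaf lists and therefore different roots, while the whole-file digest used for RawLink is identical. Both trees also end with the same 59029-byte leaf, because it starts at offset 524288 in both. In this sketch sha256 digests stand in for leaf CIDs and a hash over the concatenated leaf digests stands in for the real dag-pb root; neither is how IPFS actually builds the root block.

```python
import hashlib


def leaf_digests(data: bytes, block_size: int) -> list:
    """sha256 of every fixed-size chunk (stand-ins for the leaf CIDs)."""
    return [hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


data = bytes(i % 251 for i in range(583317))  # deterministic example content

tree_a = leaf_digests(data, 131072)  # first participant: 5 leaves
tree_b = leaf_digests(data, 262144)  # second participant: 3 leaves

# Toy stand-ins for the root blocks QmAAAA...AAAA and QmBBBB...BBBB.
root_a = hashlib.sha256(b"".join(tree_a)).digest()
root_b = hashlib.sha256(b"".join(tree_b)).digest()

raw_link = hashlib.sha256(data).digest()  # the same for both trees

print(len(tree_a), len(tree_b))  # 5 3
print(tree_a[-1] == tree_b[-1])  # True: same final 59029-byte chunk
print(root_a == root_b)          # False: different trees, different roots
```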

Both participants have the same file, whose CIDv1 is zb2rhR0R0R0R0...R0R0R0R0. This CIDv1 is recorded in RawLink.

A third participant receives the block QmAAAA...AAAA and additionally searches the network for sources of the block zb2rhR0R0R0R0...R0R0R0R0.

They find the second participant via the CIDv1 zb2rhR0R0R0R0...R0R0R0R0 and request parts of the block zb2rhR0R0R0R0...R0R0R0R0 from them, verifying each part against the hashes (CIDv1 Links) in the block QmAAAA...AAAA.
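The third participant can work out which byte ranges of zb2rhR0R0R0R0...R0R0R0R0 to request using only the Size values of the Links in QmAAAA...AAAA. A short sketch; `link_ranges` is a hypothetical helper, and ranges are inclusive to match the "0 to 131071" wording used later in the thread.

```python
from typing import List, Tuple


def link_ranges(link_sizes: List[int]) -> List[Tuple[int, int]]:
    """Inclusive (first_byte, last_byte) of each Link inside the RawLink block."""
    ranges = []
    offset = 0
    for size in link_sizes:
        ranges.append((offset, offset + size - 1))
        offset += size
    return ranges


sizes_a = [131072, 131072, 131072, 131072, 59029]  # Links of QmAAAA...AAAA
for first, last in link_ranges(sizes_a):
    print(f"bytes {first}-{last}")
# bytes 0-131071
# bytes 131072-262143
# bytes 262144-393215
# bytes 393216-524287
# bytes 524288-583316
```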

@mitra42

mitra42 commented Sep 18, 2017

Ivan, two questions on this:

  1. Did this get accepted as a change? I don't see it in the spec in the README.md.

  2. Your format and the one in README.md differ in structure and field names, not just in this added hash. Which is correct?

README.md
  "subfiles": [
    {
      "link": {"/": "QmAAA..."},
      "size": "100324"
    }, 
]

compared to:

Issue #42:

"Links": [
    {
      "Name": "",
      "Hash": "zb2rhA2A2A2...A2A2A2",
      "Size": 131072
    },
]

@ivan386
Author

ivan386 commented Sep 19, 2017

  1. I guess not. The IPFS merkledag.proto is the same as in IPLD:
message PBLink {
   optional bytes  Hash = 1;
   optional string Name = 2;
   optional uint64 Tsize = 3;
}

message PBNode {
   repeated PBLink Links = 2;
   optional bytes  Data = 1;
}
  2. This is the output of the ipfs dag get command in the old format. The output format has since changed.

@Kubuxu
Copy link

Kubuxu commented Sep 19, 2017

@ivan386 this won't solve the problem of Denial of Service that the size limit aims to solve.

Let's say I have a hash, you tell me that it is very big raw blob and send me the "sub-hashes". I fetch those sub-hashes (which could be gigabytes) to just find out that they don't hash to the hash I initially asked for.

Unless I am missing something?

@ivan386
Author

ivan386 commented Sep 19, 2017

@Kubuxu RawLink is only for linking trees together. No "sub-hashes" are needed: verification uses the part hashes of the original tree.

From sources of the large block ("zb2rhR0R0R0R0...R0R0R0R0"), the peer requests the data directly.

Example:

  1. The peer has block "QmAAAA...AAAA" with Links and a RawLink ("zb2rhR0R0R0R0...R0R0R0R0").
  2. The peer searches for sources and requests bytes 0 to 131071 of block "zb2rhR0R0R0R0...R0R0R0R0".
  3. The peer checks that the hash of that data is "zb2rhA2A2A2...A2A2A2", as listed in block "QmAAAA...AAAA".
    If the hash is correct, it requests the next part. If not, it drops the connection to the source.
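The three steps above can be sketched as a loop that checks each part before asking for the next, which is what limits the damage a dishonest source can do to a single requested part. This is an illustration only: `fetch` is a hypothetical callable standing in for whatever transport actually serves byte ranges of the large block, and sha256 digests stand in for the CIDv1 Links.

```python
import hashlib
from typing import Callable, List, Optional

# Hypothetical transport: fetch(first_byte, last_byte) returns that inclusive
# byte range of the large raw block (zb2rhR0R0R0R0...R0R0R0R0).
Fetch = Callable[[int, int], bytes]


def download_via_rawlink(links: List[dict], fetch: Fetch) -> Optional[bytes]:
    """Fetch the file part by part from a RawLink source, checking each part
    against the Link hashes of the tree we already have.

    Returns the file on success, or None right after the first bad part,
    so the source is dropped before anything else is requested from it.
    """
    parts = []
    offset = 0
    for link in links:
        part = fetch(offset, offset + link["Size"] - 1)
        if hashlib.sha256(part).digest() != link["Hash"]:
            return None  # wrong data: drop this source
        parts.append(part)
        offset += link["Size"]
    return b"".join(parts)
```

An honest source yields the full file; a source that returns wrong bytes is abandoned after the very first range request.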

@daviddias daviddias added the status/deferred Conscious decision to pause or backlog label Mar 19, 2018
@ivan386 ivan386 changed the title add big raw block link(RawLink) to root node of file with raw blocks Link same file from different block trees as solid big raw block. Sep 19, 2018
@ivan386
Author

ivan386 commented Sep 19, 2018

At the moment, ipfs can produce many different root and leaf hashes for the same file.

This is a problem for big files.

>ipfs add ruwiki-20180301-pages-articles.xml.bz2
 3.10 GiB / 3.10 GiB [=====================================================================
added QmSLz9gjKrZ9Nh4yo4mYh1mc5RAWKTUSZXoLo6GoLwfye3 ruwiki-20180301-pages-articles.xml.bz2

>ipfs add --raw-leaves ruwiki-20180301-pages-articles.xml.bz2
 3.10 GiB / 3.10 GiB [=====================================================================
added QmNZAa1ceyPNnNhZof3UNtLErqJf7fM3W38sVwDGR6aYWw ruwiki-20180301-pages-articles.xml.bz2

>ipfs add -s"rabin" ruwiki-20180301-pages-articles.xml.bz2
 3.10 GiB / 3.10 GiB [=====================================================================
added QmSNqyH1jSLhoyNWamREpeVsDubnZ11PLS7cbMHYsVgrnr ruwiki-20180301-pages-articles.xml.bz2

>ipfs add -s"rabin" --raw-leaves ruwiki-20180301-pages-articles.xml.bz2
 3.10 GiB / 3.10 GiB [=====================================================================
added QmUgpPvZP9AWcKoiLKGX3KB8i4hDkuhrRXPqKeFsiV1yGQ ruwiki-20180301-pages-articles.xml.bz2

We get different root and leaf hashes.

At the same time, the entire file has only one sha256 hash:

>rhash --sha256 ruwiki-20180301-pages-articles.xml.bz2
028d98f95b2d26ba01232a01d3a7e329386da08fc1f0ae7a587f7ac16c269b7d  ruwiki-20180301-pages-articles.xml.bz2

As a base58 CIDv1, this is: zb2rhWpFEmhVJ7H1myjjsuJn5h1CpQ4s2rnZ1us496ksy4kJY

I propose using that CID to link the different block trees together, so that raw data from one tree can be fetched for another.
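The key property here is that the whole-file sha256 does not depend on how the file is split up. The sketch below hashes a stream incrementally, which is how a multi-GiB file like the dump above would be handled without loading it into memory, and shows that different read sizes (standing in for different chunk sizes) produce the same digest. `sha256_stream` is a hypothetical helper name.

```python
import hashlib
import io
from typing import BinaryIO


def sha256_stream(f: BinaryIO, read_size: int = 1 << 20) -> str:
    """sha256 of a whole stream, read incrementally; independent of read_size."""
    h = hashlib.sha256()
    for block in iter(lambda: f.read(read_size), b""):
        h.update(block)
    return h.hexdigest()


data = bytes(range(256)) * 1000
digests = {sha256_stream(io.BytesIO(data), n) for n in (1000, 131072, 262144)}
print(len(digests))  # 1: the split size does not change the whole-file hash
```

Run on the actual file (opened with `open(path, "rb")`), this computes the same SHA-256 that rhash reports.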

@rvagg
Member

rvagg commented Aug 14, 2019

Closing due to staleness as per team agreement to clean up the issue tracker a bit (ipld/team-mgmt#28). This doesn't mean this issue is off the table entirely, it's just not on the current active stack but may be revisited in the near future. If you feel there is something pertinent here, please speak up, reopen, or open a new issue. [/boilerplate]

@rvagg rvagg closed this as completed Aug 14, 2019

5 participants