Node shows how expensive retrieving given file would be (network fee + margin) #1437

Open
holisticode opened this issue Jun 10, 2019 · 12 comments

Comments

Contributor

@holisticode holisticode commented Jun 10, 2019

This issue is part of the ethersphere/user-stories#23 user story.

In order for gateway incentivization to be possible, the price for retrieving a given file should be shown to the user.

This issue is about implementing this feature.

@nagydani @homotopycolimit @zelig please specify the requirements from the point of view of research for the implementation of this feature.

Contributor

@acud acud commented Jun 10, 2019

@holisticode when you have the root chunk of the content you can assert the price for a certain content referenced by that root hash.
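To illustrate pricing from the root chunk, here is a minimal Python sketch. It assumes that the root chunk starts with an 8-byte little-endian length of the content it covers, that chunks carry 4096 bytes, and that each intermediate chunk holds 128 references. The flat `price_per_chunk` and all function names are hypothetical, not part of any Swarm API.

```python
import struct

CHUNK_SIZE = 4096  # assumed payload bytes per chunk
BRANCHES = 128     # assumed references per intermediate chunk


def span_from_root_chunk(chunk: bytes) -> int:
    """Read the assumed 8-byte little-endian content length from a root chunk."""
    return struct.unpack("<Q", chunk[:8])[0]


def chunk_count(size: int) -> int:
    """Count data chunks plus the intermediate chunks of the tree above them."""
    level = max(1, -(-size // CHUNK_SIZE))  # ceiling division, at least one chunk
    total = level
    while level > 1:
        level = -(-level // BRANCHES)
        total += level
    return total


def retrieval_price(size: int, price_per_chunk: int) -> int:
    """Hypothetical flat pricing: every chunk costs the same amount."""
    return chunk_count(size) * price_per_chunk
```

Only the root chunk has to be fetched to produce this estimate, which is why a standalone file is the easy case; nothing here says anything about what a manifest as a whole would cost.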

that being said, in the current design you cannot make such an assertion for an entire manifest. when you fetch a website, the browser retrieves every file separately, meaning that, with the above method of determining price, you'd have to price every item separately (without downloading it), prompt the user, and then, after approval, start the download. this is no longer so simple once you take into account that there could be cross-references and inter-dependencies within the same manifest (you would have to actually download one file and have the browser parse it before the browser asks for the next file...).

this, per se, is difficult to cater for with our current manifest design (and not only with it). it would mean we need to implement, at least to a certain degree, actual content parsers that scan the content on upload and embed this information at the manifest level (this would also be susceptible to certain spoofing attack vectors). this would also only work for pricing an initial GET from the manifest (e.g. just the default entry and its most basic resources); you cannot price future/other GETs because the resource dependency lattice would be impossible to keep track of and price deterministically. if you take into account that users can generate manifests and fork content, the complexity goes even further: you would have to download forked content in order to embed its inherent pricing into the manifest that uses it (this would also incur a cost, since you'd have to actually fetch everything in the manifest).

so the next question is: how much of a blocker is this?

Member

@vojtechsimetka vojtechsimetka commented Jun 11, 2019

@acud We were looking at this during a meeting and we (@holisticode, @vojtechsimetka, @mortelli) don't fully understand.

However, Unix-based OSes do not show file sizes for directories. Maybe we can take the same approach. The issue itself talks only about files, so maybe we can tackle only that here.

That does not mean the conversation is not valid and should not be researched!

Contributor

@acud acud commented Jun 11, 2019

for standalone files there's not a problem, but a website for example would be multiple standalone files that depend on each other. so you would get a popup to pay for index.html, then another popup for styles.css, then another one for colors.css... you get the point. structures similar to this would be difficult to handle both in terms of UX and of pricing.
we need a mechanism anyway to protect users from their wallet being drained while browsing (and probably another protection when uploading files to swarm)

@homotopycolimit homotopycolimit commented Jun 12, 2019

it's like knowing the price per megabyte before you load a webpage, but not knowing just how big the page is.

@homotopycolimit homotopycolimit commented Jun 12, 2019

a node can show how expensive retrieving any given file would be.
a node cannot show how expensive retrieving any given dapp/homepage would be.

Contributor Author

@holisticode holisticode commented Jun 13, 2019

> it's like knowing the price per megabyte before you load a webpage, but not knowing just how big the page is.

Theoretically we could "scan" each manifest, as it contains the size, but it probably would not scale, and its UX could be awful (delays). However, @acud describes an even worse scenario...

Maybe we could add a field to a root hash's manifest, e.g. deep_size or something, if it is a structured data container (a directory, a webpage, a dapp). When something is uploaded, we should probably know all its content and therefore its deep size, and be able to add this to the manifest?
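A minimal Python sketch of how such a field could be computed at upload time. The nested-dict manifest layout, with `entries` for containers and `size` for files, is a hypothetical stand-in and not the actual Swarm manifest format; only the `deep_size` name comes from the suggestion above.

```python
def annotate_deep_size(node: dict) -> int:
    """Store deep_size on every container and return the size of this node.

    A node with "entries" is treated as a container (directory or nested
    manifest); any other node is a file carrying its own "size".
    """
    if "entries" not in node:
        return node["size"]
    node["deep_size"] = sum(annotate_deep_size(child) for child in node["entries"])
    return node["deep_size"]


# Hypothetical website upload with one subfolder.
site = {
    "entries": [
        {"path": "index.html", "size": 1200},
        {
            "path": "css/",
            "entries": [
                {"path": "styles.css", "size": 300},
                {"path": "colors.css", "size": 100},
            ],
        },
    ]
}
total = annotate_deep_size(site)
```

The walk needs the size of every entry, so it is cheap only when the uploader already has all the content locally.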

Not sure if there are drawbacks with this, just leaving it here.

Contributor

@acud acud commented Jun 13, 2019

@holisticode when i was speccing out a better manifest structure i had the idea to do something similar (i called it complexity), namely to add some measurement of the size of the trie. however, i'm not sure this is worth the time spent writing the code for it, as i'm not sure it has any benefits at all. if i have a website in a manifest and one of the resources is a 1tb file which you can download, does that count towards the price of the download or not? i think this is a corner we would be wise not to get into; the complexity here quickly gets out of hand.

Contributor Author

@holisticode holisticode commented Jun 13, 2019

@acud well the deep_size field isn't strictly related to pricing. It is just information we have and is useful IMHO.

If someone wants to download the whole thing, then querying the manifest in order to know that one file is 1GB is indeed super valuable, I think. Querying is in fact free (or cheap), and the size only affects the price if you download the whole structure.

Even for your own example above: you'd show a popup only once for the complete website.

I can't see the rabbit hole you are describing, but I can see a lot of benefit having that feature. Please let's move the discussion here: #1467

Contributor

@acud acud commented Jun 13, 2019

a complete website with all of its inherent resources might not be relevant to show a popup for. i might publish a website with gigabytes of resources which are very rarely used by very few users; does this website size constitute the price shown to every user?

example, i am the arch linux website operator and have uploaded the archlinux website to swarm with all of the image resources. let's assume the manifest deep_size is at 5tb including historical images. do you show a popup to download and pay for 5tb for every user that wants to enter my website? i'm guessing that the amount of downloads will decrease significantly. i can think of many other examples.

> If someone wants to download the whole thing

this is an edge case that i would not hurry to generalize. i agree that in such case, such a metric would be valuable, but i claim this case would be rarely used

Contributor Author

@holisticode holisticode commented Jun 13, 2019

I understand what you mean now. Thanks for bringing that up.

If we want Swarm to be successful, then the UX will be equally important to the tech. Thus, IMHO we can't ignore the complexity of things to make it useful. Specifically, as soon as we introduce pricing, which is a very sensitive issue to anyone, this should be transparent and as user-friendly as possible.

For a directory upload, it still makes sense to have deep_size nonetheless. Every subfolder could have its own deep_size (not thought through), and I think it is easy to implement.

For the website case, it would then be clear that it is not information about a single page request.

In other words, for actual web pages (and possibly other compounded resources if there are any), we would still need something else.

Contributor

@acud acud commented Jun 13, 2019

> we can't ignore the complexity of things to make it useful

that is true; however, i'm not talking about the complexity of the tech, i am discussing the complexity of the UX. what i claim is that without probing the actual data upon upload (and what happens when you have just forked a manifest and reused its hash in your website? you'll have to download and crawl it all, and pay for it, obviously), the UX you will be able to offer users with this metric is very problematic (my personal opinion is that it would very quickly become more of a burden than anything useful).

let's focus on the problem here: you want to prevent user wallets from being drained by certain content. this is a problematic attack vector to consider, since an attacker will not gain anything from the attack (the payment for the content will just go to the storing nodes, right?).
if anything, the attacker would achieve a kind of DoS, which can easily be mitigated by other means, such as placing soft and hard limits on how much a user should pay for a request (or for a series of requests/data transit per second), which can be roughly measured in wei/sec. once the spend of ongoing retrievals goes over a certain threshold (you can detect when an ongoing http connection is doing many retrievals, and you can conclude when it stops by measuring the incoming http connections on the proxy), the node can use a pop-up to alert the user (and temporarily suspend retrievals).
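The soft/hard limit idea can be sketched as a sliding-window spend limiter. This is a minimal Python illustration under stated assumptions: the class name, the 10-second window, and the "ok" / "alert" / "suspend" outcomes are all hypothetical, and timestamps are passed in explicitly so the behaviour is deterministic.

```python
from collections import deque


class SpendLimiter:
    """Track spend in wei/sec over a sliding window (hypothetical design)."""

    def __init__(self, soft_wei_per_sec, hard_wei_per_sec, window_sec=10.0):
        self.soft = soft_wei_per_sec
        self.hard = hard_wei_per_sec
        self.window = window_sec
        self.events = deque()  # (timestamp, wei) of accepted payments

    def rate(self, now):
        """Average spend per second over the window ending at `now`."""
        while self.events and self.events[0][0] <= now - self.window:
            self.events.popleft()  # forget payments older than the window
        return sum(wei for _, wei in self.events) / self.window

    def check(self, wei, now):
        """Decide on a payment of `wei` at time `now` before it is made."""
        projected = self.rate(now) + wei / self.window
        if projected > self.hard:
            return "suspend"  # over the hard limit: payment is not recorded
        self.events.append((now, wei))
        return "alert" if projected > self.soft else "ok"


limiter = SpendLimiter(soft_wei_per_sec=100, hard_wei_per_sec=200)
results = [
    limiter.check(500, now=0.0),
    limiter.check(600, now=1.0),
    limiter.check(1000, now=2.0),
    limiter.check(1000, now=20.0),
]
```

With this shape the user is only interrupted when the ongoing spend crosses a threshold, rather than being asked to approve every single file.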

Contributor

@acud acud commented Jun 13, 2019

this is very much analogous to using a car. you fill it up and drive to work, but the car doesn't tell you how many euros/pesos the next drive is going to cost and ask for your approval.
i think that providing users with beneficial functionality in this sense is more like something that could be handed out as a thesis (or higher) research topic.

6 participants