Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support manifest traversal #452

Merged
merged 14 commits into from Jul 21, 2020
Merged

Support manifest traversal #452

merged 14 commits into from Jul 21, 2020

Conversation

zbiljic
Copy link
Contributor

@zbiljic zbiljic commented Jul 16, 2020

See #433

NOTE: This is PoC, and uses simple, JSON-based manifest structure, which will be changed to solidity friendly implementation later on. It also lacks various protections against possible attacks.

Includes initial support for resolving the relative path from the bzz URL.
If the referenced address (hash) is of type manifest, it will attempt to lookup noted path from URL in that manifest, and if matched return the content of that file.

Route used is /bzz:/{address}/{path:.*}.

This implementation borrowed heavily from file.go and bytes.go. Initial PoC uses JSON-based manifest (for ease of testing), which is identified by application/bzz-manifest+json MIME type in file metadata. Manifest entry structure contains headers for the referenced file, which will be populated on response. All manifest-based operations are hidden behind interfaces, which should allow for easier switch to any other structure for manifest later on.

Example of usage for PoC website:

step 1 - upload 'index.html'

http POST http://127.0.0.1:8081/files "Content-Type:text/html" < index.html

Used HTML content:

<!doctype html>
<html>
<head>
    <title>Swarm website</title>
</head>
<body>
    <img src="images/ethersphere.png" alt="ethersphere">
    <p>Swarm: Censorship resistant storage and communication infrastructure for a truly sovereign digital society</p>
</body>
</html>

step 2 - upload some image

For example, 'ethersphere.png' was taken.

http POST http://127.0.0.1:8081/files "Content-Type:image/png" < ethersphere.png

step 3 - manually create manifest JSON

References used are the ones received for each file upload.
Note that 'index.html' uses 'ethersphere.png' on specific path.

{
    "entries": {
        "index.html": {
            "reference": "<INDEX_HTML_REFERENCE>",
            "name": "index.html",
            "headers": {
                "Content-Type": ["text/html"]
            }
        },
        "images/ethersphere.png": {
            "reference": "<PNG_REFERENCE>",
            "name": "ethersphere.png",
            "headers": {
                "Content-Type": ["image/png"]
            }
        }
    }
}

step 4 - upload created manifest

We use /files API, and set special content type for manifest file (for this PoC).

http POST http://127.0.0.1:8081/files "Content-Type:application/bzz-manifest+json" < manifest.json

step 5 - check "website"

It should now be possible to open the "website" in browser by going to:

http://127.0.0.1:8081/bzz:/<MANIFEST_FILE_REFERENCE>/index.html

@zbiljic zbiljic added the in progress ongoing development , hold out with review label Jul 16, 2020
@pradovic pradovic linked an issue Jul 16, 2020 that may be closed by this pull request
Copy link
Contributor

@pradovic pradovic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Left a few style-related comments.

pkg/api/bzz.go Outdated Show resolved Hide resolved
pkg/api/bzz.go Outdated Show resolved Hide resolved
Comment on lines 45 to 61
fileName := "sample.html"
filePath := "test/" + fileName
missingFilePath := "test/missing"
sampleHtml := `<!DOCTYPE html>
<html>
<body>

<h1>My First Heading</h1>

<p>My first paragraph.</p>

</body>
</html>`

var err error
var fileReference swarm.Address
var manifestFileReference swarm.Address
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would maybe put this in the TestBzz scope and just have these individual tests below. Don't see a need for sub-tests.

@zbiljic zbiljic added ready for review The PR is ready to be reviewed and removed in progress ongoing development , hold out with review labels Jul 16, 2020
@@ -54,6 +54,10 @@ func (s *server) setupRouting() {
"POST": http.HandlerFunc(s.chunkUploadHandler),
})

handle(router, "/bzz:/{address}/{path:.*}", jsonhttp.MethodHandler{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from these links, it is unclear to me tbh

Copy link
Contributor

@pradovic pradovic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mortelli mortelli mentioned this pull request Jul 16, 2020
Copy link
Member

@zelig zelig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks promising, need to understand why the code is so complex

pkg/api/bzz.go Outdated

entryAddress := entry.GetReference()

toDecryptEntry := len(entryAddress.Bytes()) == (swarm.HashSize + encryption.KeyLength)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we should extract this as a function on addresses

pkg/api/bzz_test.go Outdated Show resolved Hide resolved
pkg/api/bzz.go Outdated Show resolved Hide resolved
Copy link
Contributor

@mortelli mortelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, only minor remarks.

@@ -54,6 +54,10 @@ func (s *server) setupRouting() {
"POST": http.HandlerFunc(s.chunkUploadHandler),
})

handle(router, "/bzz:/{address}/{path:.*}", jsonhttp.MethodHandler{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from these links, it is unclear to me tbh

pkg/manifest/manifest.go Outdated Show resolved Hide resolved
pkg/api/common.go Outdated Show resolved Hide resolved
pkg/api/bzz.go Show resolved Hide resolved
pkg/api/bzz.go Show resolved Hide resolved
pkg/api/bzz.go Outdated Show resolved Hide resolved
pkg/api/bzz_test.go Show resolved Hide resolved
ManifestContentType = "application/bzz-manifest+json"
)

func (s *server) bzzDownloadHandler(w http.ResponseWriter, r *http.Request) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very long function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That it is. It was intentionally left like that, to have linear steps that are being performed. There is also a lot of boilerplate regarding HTTP API.
BTW, not the longest one, the file.go wins in that regard.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well mistakes don't need to be repeated

Copy link
Contributor

@pradovic pradovic Jul 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Terribly sorry but I would say this function is not specially long, and having a "long" function is not bad in any general way. Also, we are trying to isolate the PRs to be small, and limit the change set, so I would not spend to much time here to refactor the files.go code. This function is using a code from files.go, and although I agree it should be refactored, I don't think it needs to be in this specific PR. Especially, since we asked about this on mattermost and got the info that the code inside was not considered business logic and that it should be used in this way: https://beehive.ethswarm.org/swarm/pl/83bayftmzfgqbqgocpyxwkn3xc.
In order to refactor this function we should probably:

  1. Extract specific error types
  2. Remove logging and request rendering form the function and move it to calling functions, based on extracted error types
  3. Maybe move this to a special upload/download/transfer package/component in pkg and let api be thin and only API related

So, anyways, my 2 c are that we should keep this PR minimal and focus on the user-story, and open the separate PR to refactor the code, since this code will move stuff from files.go and possibly other api calls as well. Just my 2 c, I am OK either way.

@mortelli
Copy link
Contributor

@zbiljic thank you very much for addressing all my comments 👍

@zbiljic zbiljic merged commit 1467336 into master Jul 21, 2020
@zbiljic zbiljic deleted the feature/manifest-traversal branch July 21, 2020 13:40
@mortelli
Copy link
Contributor

related follow-up issue: #463

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready for review The PR is ready to be reviewed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support path in url and map to manifest traversal
6 participants