Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docs usage #210

Closed
wants to merge 1 commit into from
Closed

Docs usage #210

wants to merge 1 commit into from

Conversation

vlad-bm
Copy link
Contributor

@vlad-bm vlad-bm commented May 21, 2019

Adds first draft of the usage documentation for the module

(Addresses #198)

docs/usage/permissions.rst Outdated Show resolved Hide resolved
docs/usage/permissions.rst Outdated Show resolved Hide resolved
Copy link

@roksys roksys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should also update README.rst and follow "standard" pattern as in other packages.

Short package description

Features:
- some feature
- another feature

Further documentation is available on ...

https://invenio-records-files.readthedocs.io/en/latest/
https://invenio-records.readthedocs.io/en/latest/

docs/usage/upload_files.rst Outdated Show resolved Hide resolved
.gitignore Outdated Show resolved Hide resolved
docs/usage/migration.rst Outdated Show resolved Hide resolved
--------------------

:code:`Locations` are used to represent different storage systems and possibly
different geographical locations. :code:`Buckets` but also :code:`Objects` are
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not convinced by geographical, but it might make sense and I miss some history when this was created. We might ask Lars confirmation here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its not like we came up with it, I saw it here

used to represent e.g. different storage systems and/or geographical


When we create a bucket we need to provide a Location, otherwise the default
one is used.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Objects are not directly assigned to locations, right? They belongs to a bucket which has a Location.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I understood, they actually do! Let's look into that.

docs/usage/migration.rst Outdated Show resolved Hide resolved

**NOTE:** By default, the file contents are displayed in the browser.
Should you want to download the file, add the :code:`download` keyword in the query parameters.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

`
Retrieve Files

Once we have created our bucket and uploaded a file, it is possible to retrieve it with a GET request.
By default, the latest version will be downloaded.

ex

If you want to retrieve a specific version of the file, you can use the versionId query parameter:

NOTE: By default, the file is server with the header 'Content-Disposition': 'inline', meaning that the browser will try to preview it if possible. If you want to trigger a download, you can use the extra query parameter download, which will change the 'Content-Disposition' header to 'attachment'
`


Storage Backends
----------------
In order to get started we need to setup and configure a storage backend.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would in general first explain concepts. Why do we need to get started with a storage? What is a Storage backend? Why this abstraction?
https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html
https://cloud.google.com/storage/docs/storage-classes

I would mention something like that Invenio provides a default storage that is local storage using standard python lib (PyFSFileStorage). It defines an interface to create your own storage adapter. For example, Invenio-XRootD provides XRootD integration.

Creating Buckets
----------------

In order to upload, modify or delete files, we first need to create a bucket.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html
https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingBucket.html

A bucket is a container for objects. Every object is contained in a bucket.
Buckets have a default location (link to the related section) and storage class (link to the related section).

Each bucket can also define a quota. The size of a bucket is the size of all objects in the bucket (including all versions).

In order to upload, modify or delete files, we first need to create a bucket.
A bucket can be created by a POST request to the url prefix, i.e. /files.
The response will contain the unique ID of the bucket.
A bucket can have one or more tags which store extra metadata for that bucket.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe explain what is a tag: extra metadata to "attach" extra information and describe a bucket. It is a loose key/value.

"id": "cb8d0fa7-2349-484b-89cb-16573d57f09e",
"size": 0
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in general, IMHO, I would first explain the concepts. Imagine I don't know anything about Bucket/ObjectVersion. I need to have an overview of the data model first (we need a location, we need a bucket, then we can upload a file, that is represented by ObjectVersion/FileInstance)
WDYT?

@topless
Copy link
Member

topless commented May 21, 2019

I think we should also update README.rst and follow "standard" pattern as in other packages.

Short package description

Features:
- some feature
- another feature

Further documentation is available on ...

https://invenio-records-files.readthedocs.io/en/latest/
https://invenio-records.readthedocs.io/en/latest/

I would agree with you @roksys, we have compiled a list of features in the beginning of the usage section. We can enhance and move that part to the README as you suggested.

@diegodelemos diegodelemos added this to In progress in Files bundle sprint (May 13-24) via automation May 21, 2019

>>> pprint(json.loads(res.get_data().decode("utf-8")))

Data Model
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this clarifications could be treated as the overview in Invenio-Access (see here), what do you think?

@topless topless changed the title [WIP]: Docs usage Docs usage May 22, 2019
@topless topless force-pushed the docs-usage branch 2 times, most recently from 66cb1e2 to c550cb1 Compare May 22, 2019 10:19
@topless topless marked this pull request as ready for review May 22, 2019 10:20
Copy link
Contributor

@ntarocco ntarocco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please also improve the README file?

Files download/upload REST API similar to S3 for Invenio.

*This is an experimental developer preview release.*

Overview
========

In order to upload files, a Location needs to be defined first.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add an extra gentle introduction paragraph, something like (please improve!):

Invenio-Files-REST is a files storage module. You can store and retrieve files in a similar way to Amazon S3 APIs.

To understand how to safely manage files in Invenio, this guide will introduce you to the key concepts and terminology that you need to understand.

I will, IMHO, but please let's understand together what is best, start from "the end", files: this is because if I am new to these concepts, I don't know/understand why I need locations and buckets.
I would try to answer this question: I have a file, why would I need a location/bucket/object version?
I would also do not focus on upload/download only, because this if for HTTP. We can use the module programmatically.

The text below should be improved... but you should get my point.

Files are represented by FileInstance, which describes the path (or location???) the file on disk. Each file belongs to a Bucket, a container of files. Buckets are important to organize your storage or, for example, to define quotas. Since a file can have a complex set of metadata, for example name, type, size, etc., an ObjectVersion is associated to each file.

And then, maybe start the detailed description of each object.

Locations

A Location is ...

- Supports for large file uploads and multipart upload.
- Signals for system events.

The REST API follows best practices and supports, e.g.:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of the comma after supports it sounds like supports was meant to be a plural form of a noun- and for me this doesn't really fit in the sentence. I would either remove the comma so it's clear that supports is a verb, or even better phrase it somehow along the lines of "... and provides following features:"


GET /files/<bucket_id>/

Returns list of file versions:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The form of the verb is inconsistent- I would stay with "Return" also for the next 2 occurrences.

It is very easy to be exposed to Cross-Site Scripting (XSS) attacks if you
serve user uploaded files. Here are some recommendations:

1. Serve user uploaded files from a separate domain (not a subdomain). This
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the points are rendered without any line brake- maybe we should add newlines between them

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, thanks for spotting that!

:code:`Bucket`, a particular :code:`Location` for it can be specified. The
:code:`Bucket` can also have a maximum quota assigned to it, and an important
point to note is that the :code:`Objects` inside it do not necessarily have to
be located in the same :code:`Location`. The :code:`Location` can be used to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and an important point to note is that the :code:Objects inside it do not necessarily have to
be located in the same :code:Location

Hey! Are you sure about that? The Bucket is the one that has the Location as a thing handled by Invenio-Files-Rest, I understand that you can set the "physical location" of a file using ObjectVersion.set_location but I think this is used on the special case where the file is outside "our file system", i.e. linked.
I might be wrong ☺️

Objects are an abstraction of a file, and are uniquely identified within
a bucket by string keys, i.e. the file name.


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Object Tags? ☺️

Copy link
Contributor

@ntarocco ntarocco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not have time to complete the reading. Shall we go through together tomorrow and we fix on the fly?

@@ -7,3 +7,9 @@

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should enrich this to something similar to: https://invenio-assets.readthedocs.io/en/latest/installation.html
Let's see together what to put exactly.

README.rst Outdated

* Free software: MIT license
* Documentation: https://invenio-files-rest.readthedocs.io/
Invenio-Files-REST provides configurable REST APIs for uploading, serving,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say that the first thing it does is something like:
Invenio-Files-REST is a files storage module. It allows you to store and retrieve files in a similar way to Amazon S3 APIs
Then it also does add REST APIs.

README.rst Outdated

Features:

* A robust REST API.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I propose something more in what I was saying above, please improve as you wish:
`

  • Files storage with configurable storage backends
  • REST APIs similar to Amazon S3
  • Robust and customizable access control
  • Integrity checking mechanism.
  • Support for large file uploads and multipart upload.
  • Signals for system events.
    `

docs/overview.rst Show resolved Hide resolved
An Object acts like container for a particular file (as identified by its name),
and holds *it* as well as all its previous versions (if any). The latest version
of the file is referred to as the :code:`HEAD`, while a version of the file is
referred to as an :code:`Object Version`. The link between an :code:`Object Version`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the way it is described, I would just maybe precise a few things. What about if we put slightly less details here and we move them in each section?

I would say that even the latest version is an ObjectVersion (maybe without space since it is a technical name)? I would also argue that the file is in reality is represented by a FileInstance as described below.

I put my 2 cents:
In Invenio-Files-REST, a file on disk is represented by an entity called :code:FileInstance`, which contains the location of the file on disk.

Files metadata such as name, type or version are represented by :code:ObjectVersion with a link to the FileInstance. Multiple ObjectVersion can point to the same FileInstance (advantages will be described below).

A :code:Bucket is a container of ObjectVersion. A Bucket defines the :code:Location (the base path of the storage) of the files it contains. Buckets are also useful to define quotas.

To summarize: Invenio represents files metadata as objects in buckets. Each object is versioned and point to a file instance. A file instance contains the physical location of the file.
`

All the rest of details, I would put them in each section...

-------
Storage classes are useful for defining the type of storage an object is
located on (e.g. offline/online), so that the system knowns if it can serve
the file and/or what is the reliability.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to describe better why we need a Location and a Storage, let's see together.

@ntarocco
Copy link
Contributor

@topless @vladbmihaescu @Glignos honestly, nice job! I think the overview is a great starting point for users.
I managed to review the Overview, README and started with the Usage.
I have added a commit just to re-arrange the order of concepts in the Overview.

For the REST API, available methods, etc... I think it should be split because it is mixed between location, buckets, files. I think it would better if as a user I can see the available REST APIs for:

  • Location: verbs and explanations
  • Buckets: verbs and explanations
  • Files: verbs and explanations

Tomorrow I will continue will the rest.

@ntarocco
Copy link
Contributor

@topless actually I cannot push... did you uncheck the allow edits from maintainers?

@Glignos Glignos moved this from In progress to Done in Files bundle sprint (May 13-24) May 24, 2019
@ntarocco ntarocco moved this from Done to Pending review in Files bundle sprint (May 13-24) May 24, 2019
@topless topless force-pushed the docs-usage branch 6 times, most recently from 51da664 to 6c73b3b Compare May 27, 2019 09:21
- overview structure and content
- usage structure and content
- added `BucketTag` in `__all__` of `models` so its documentation shows up
- closes inveniosoftware#198


Co-authored-by: Vlad Mihaescu <vladbmihaescu@protonmail.ch>
Co-authored-by: Nicola Tarocco <nicola.tarocco@cern.ch>
@lnielsen lnielsen moved this from Backlog to In progress in Files bundle sprint (July 22 to Aug 2) Jul 17, 2019
@lnielsen lnielsen assigned lnielsen and unassigned topless and vlad-bm Jul 22, 2019
@lnielsen lnielsen moved this from In progress to Pending review in Files bundle sprint (July 22 to Aug 2) Jul 22, 2019
@lnielsen lnielsen moved this from Pending review to Pending merge in Files bundle sprint (July 22 to Aug 2) Jul 22, 2019
@lnielsen lnielsen added this to Backlog in Sprint Week 46-47 (2019) - V3.2 Release via automation Nov 8, 2019
@lnielsen lnielsen moved this from Backlog to In progress in Sprint Week 46-47 (2019) - V3.2 Release Nov 11, 2019
@lnielsen lnielsen assigned ntarocco and unassigned lnielsen Nov 11, 2019
@kpsherva kpsherva moved this from In progress to Todo in Sprint Week 46-47 (2019) - V3.2 Release Nov 15, 2019
@kpsherva kpsherva moved this from Todo to In progress in Sprint Week 46-47 (2019) - V3.2 Release Nov 18, 2019
@kpsherva kpsherva added this to the v3.2 milestone Nov 19, 2019
@ntarocco
Copy link
Contributor

New one: #227

@ntarocco ntarocco closed this Nov 20, 2019
Sprint Week 46-47 (2019) - V3.2 Release automation moved this from In progress to Done Nov 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

9 participants