New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Docs usage #210
Docs usage #210
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should also update README.rst
and follow "standard" pattern as in other packages.
Short package description
Features:
- some feature
- another feature
Further documentation is available on ...
https://invenio-records-files.readthedocs.io/en/latest/
https://invenio-records.readthedocs.io/en/latest/
docs/usage/migration.rst
Outdated
-------------------- | ||
|
||
:code:`Locations` are used to represent different storage systems and possibly | ||
different geographical locations. :code:`Buckets` but also :code:`Objects` are |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not convinced by geographical
, but it might make sense and I miss some history when this was created. We might ask Lars confirmation here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its not like we came up with it, I saw it here
used to represent e.g. different storage systems and/or geographical |
docs/usage/migration.rst
Outdated
|
||
When we create a bucket we need to provide a Location, otherwise the default | ||
one is used. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Objects are not directly assigned to locations, right? They belongs to a bucket which has a Location.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I understood, they actually do! Let's look into that.
docs/usage/serve_files.rst
Outdated
|
||
**NOTE:** By default, the file contents are displayed in the browser. | ||
Should you want to download the file, add the :code:`download` keyword in the query parameters. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
`
Retrieve Files
Once we have created our bucket and uploaded a file, it is possible to retrieve it with a GET request.
By default, the latest version will be downloaded.
ex
If you want to retrieve a specific version of the file, you can use the versionId query parameter:
NOTE: By default, the file is server with the header 'Content-Disposition': 'inline', meaning that the browser will try to preview it if possible. If you want to trigger a download, you can use the extra query parameter download
, which will change the 'Content-Disposition' header to 'attachment'
`
docs/usage/storage_backends.rst
Outdated
|
||
Storage Backends | ||
---------------- | ||
In order to get started we need to setup and configure a storage backend. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would in general first explain concepts. Why do we need to get started with a storage? What is a Storage backend? Why this abstraction?
https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html
https://cloud.google.com/storage/docs/storage-classes
I would mention something like that Invenio provides a default storage that is local storage using standard python lib (PyFSFileStorage). It defines an interface to create your own storage adapter. For example, Invenio-XRootD provides XRootD integration.
docs/usage/upload_files.rst
Outdated
Creating Buckets | ||
---------------- | ||
|
||
In order to upload, modify or delete files, we first need to create a bucket. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html
https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingBucket.html
A bucket is a container for objects. Every object is contained in a bucket.
Buckets have a default location (link to the related section) and storage class (link to the related section).
Each bucket can also define a quota. The size of a bucket is the size of all objects in the bucket (including all versions).
docs/usage/upload_files.rst
Outdated
In order to upload, modify or delete files, we first need to create a bucket. | ||
A bucket can be created by a POST request to the url prefix, i.e. /files. | ||
The response will contain the unique ID of the bucket. | ||
A bucket can have one or more tags which store extra metadata for that bucket. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe explain what is a tag: extra metadata to "attach" extra information and describe a bucket. It is a loose key/value.
docs/usage/upload_files.rst
Outdated
"id": "cb8d0fa7-2349-484b-89cb-16573d57f09e", | ||
"size": 0 | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So in general, IMHO, I would first explain the concepts. Imagine I don't know anything about Bucket/ObjectVersion. I need to have an overview of the data model first (we need a location, we need a bucket, then we can upload a file, that is represented by ObjectVersion/FileInstance)
WDYT?
I would agree with you @roksys, we have compiled a list of features in the beginning of the usage section. We can enhance and move that part to the README as you suggested. |
invenio_files_rest/__init__.py
Outdated
|
||
>>> pprint(json.loads(res.get_data().decode("utf-8"))) | ||
|
||
Data Model |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this clarifications could be treated as the overview in Invenio-Access (see here), what do you think?
66cb1e2
to
c550cb1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please also improve the README file?
Files download/upload REST API similar to S3 for Invenio.
*This is an experimental developer preview release.*
docs/overview.rst
Outdated
Overview | ||
======== | ||
|
||
In order to upload files, a Location needs to be defined first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add an extra gentle introduction paragraph, something like (please improve!):
Invenio-Files-REST is a files storage module. You can store and retrieve files in a similar way to Amazon S3 APIs.
To understand how to safely manage files in Invenio, this guide will introduce you to the key concepts and terminology that you need to understand.
I will, IMHO, but please let's understand together what is best, start from "the end", files: this is because if I am new to these concepts, I don't know/understand why I need locations and buckets.
I would try to answer this question: I have a file, why would I need a location/bucket/object version?
I would also do not focus on upload/download only, because this if for HTTP. We can use the module programmatically.
The text below should be improved... but you should get my point.
Files are represented by
FileInstance
, which describes the path (or location???) the file on disk. Each file belongs to aBucket
, a container of files. Buckets are important to organize your storage or, for example, to define quotas. Since a file can have a complex set of metadata, for example name, type, size, etc., anObjectVersion
is associated to each file.
And then, maybe start the detailed description of each object.
Locations
A Location is ...
- Supports for large file uploads and multipart upload. | ||
- Signals for system events. | ||
|
||
The REST API follows best practices and supports, e.g.: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because of the comma after supports it sounds like supports was meant to be a plural form of a noun- and for me this doesn't really fit in the sentence. I would either remove the comma so it's clear that supports is a verb, or even better phrase it somehow along the lines of "... and provides following features:"
invenio_files_rest/__init__.py
Outdated
|
||
GET /files/<bucket_id>/ | ||
|
||
Returns list of file versions: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The form of the verb is inconsistent- I would stay with "Return" also for the next 2 occurrences.
It is very easy to be exposed to Cross-Site Scripting (XSS) attacks if you | ||
serve user uploaded files. Here are some recommendations: | ||
|
||
1. Serve user uploaded files from a separate domain (not a subdomain). This |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the points are rendered without any line brake- maybe we should add newlines between them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, thanks for spotting that!
docs/overview.rst
Outdated
:code:`Bucket`, a particular :code:`Location` for it can be specified. The | ||
:code:`Bucket` can also have a maximum quota assigned to it, and an important | ||
point to note is that the :code:`Objects` inside it do not necessarily have to | ||
be located in the same :code:`Location`. The :code:`Location` can be used to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and an important point to note is that the :code:
Objects
inside it do not necessarily have to
be located in the same :code:Location
Hey! Are you sure about that? The Bucket
is the one that has the Location
as a thing handled by Invenio-Files-Rest, I understand that you can set the "physical location" of a file using ObjectVersion.set_location
but I think this is used on the special case where the file is outside "our file system", i.e. linked.
I might be wrong
Objects are an abstraction of a file, and are uniquely identified within | ||
a bucket by string keys, i.e. the file name. | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Object Tags?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not have time to complete the reading. Shall we go through together tomorrow and we fix on the fly?
@@ -7,3 +7,9 @@ | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should enrich this to something similar to: https://invenio-assets.readthedocs.io/en/latest/installation.html
Let's see together what to put exactly.
README.rst
Outdated
|
||
* Free software: MIT license | ||
* Documentation: https://invenio-files-rest.readthedocs.io/ | ||
Invenio-Files-REST provides configurable REST APIs for uploading, serving, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say that the first thing it does is something like:
Invenio-Files-REST is a files storage module. It allows you to store and retrieve files in a similar way to Amazon S3 APIs
Then it also does add REST APIs.
README.rst
Outdated
|
||
Features: | ||
|
||
* A robust REST API. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I propose something more in what I was saying above, please improve as you wish:
`
- Files storage with configurable storage backends
- REST APIs similar to Amazon S3
- Robust and customizable access control
- Integrity checking mechanism.
- Support for large file uploads and multipart upload.
- Signals for system events.
`
docs/overview.rst
Outdated
An Object acts like container for a particular file (as identified by its name), | ||
and holds *it* as well as all its previous versions (if any). The latest version | ||
of the file is referred to as the :code:`HEAD`, while a version of the file is | ||
referred to as an :code:`Object Version`. The link between an :code:`Object Version` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the way it is described, I would just maybe precise a few things. What about if we put slightly less details here and we move them in each section?
I would say that even the latest version is an ObjectVersion (maybe without space since it is a technical name)? I would also argue that the file is in reality is represented by a FileInstance as described below.
I put my 2 cents:
In Invenio-Files-REST, a file on disk is represented by an entity called :code:
FileInstance`, which contains the location of the file on disk.
Files metadata such as name, type or version are represented by :code:ObjectVersion
with a link to the FileInstance. Multiple ObjectVersion can point to the same FileInstance (advantages will be described below).
A :code:Bucket
is a container of ObjectVersion. A Bucket defines the :code:Location
(the base path of the storage) of the files it contains. Buckets are also useful to define quotas.
To summarize: Invenio represents files metadata as objects in buckets. Each object is versioned and point to a file instance. A file instance contains the physical location of the file.
`
All the rest of details, I would put them in each section...
docs/overview.rst
Outdated
------- | ||
Storage classes are useful for defining the type of storage an object is | ||
located on (e.g. offline/online), so that the system knowns if it can serve | ||
the file and/or what is the reliability. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to describe better why we need a Location and a Storage, let's see together.
@topless @vladbmihaescu @Glignos honestly, nice job! I think the overview is a great starting point for users. For the REST API, available methods, etc... I think it should be split because it is mixed between location, buckets, files. I think it would better if as a user I can see the available REST APIs for:
Tomorrow I will continue will the rest. |
@topless actually I cannot push... did you uncheck the allow edits from maintainers? |
51da664
to
6c73b3b
Compare
- overview structure and content - usage structure and content - added `BucketTag` in `__all__` of `models` so its documentation shows up - closes inveniosoftware#198 Co-authored-by: Vlad Mihaescu <vladbmihaescu@protonmail.ch> Co-authored-by: Nicola Tarocco <nicola.tarocco@cern.ch>
New one: #227 |
Adds first draft of the usage documentation for the module
(Addresses #198)