Merge 31277b3 into 3cea80c
ntarocco committed Nov 22, 2019
2 parents 3cea80c + 31277b3 commit 0e9bebc
Showing 15 changed files with 924 additions and 32 deletions.
Binary file added docs/_static/invenio-files-overview.png
Binary file added docs/_static/invenio-records-file.png
12 changes: 9 additions & 3 deletions docs/conf.py
@@ -320,18 +320,24 @@

 # Example configuration for intersphinx: refer to the Python standard library.
 intersphinx_mapping = {
-    'flask': ('https://flask.readthedocs.io/', None),
+    'flask': ('https://flask.palletsprojects.com/en/1.1.x/', None),
     'flaskassets': ('https://flask-assets.readthedocs.io/en/latest/', None),
     'flaskregistry': (
         'https://flask-registry.readthedocs.io/en/latest/', None),
     'flaskscript': ('https://flask-script.readthedocs.io/en/latest/', None),
     'invenio-access': (
         'https://invenio-access.readthedocs.io/en/latest/', None),
-    'jinja': ('https://jinja.readthedocs.io/', None),
+    'jinja': ('https://jinja.palletsprojects.com/en/2.10.x/', None),
     'python': ('https://docs.python.org/', None),
     'sqlalchemy': ('http://docs.sqlalchemy.org/en/latest/', None),
     'webassets': ('https://webassets.readthedocs.io/en/latest/', None),
-    'werkzeug': ('https://werkzeug.readthedocs.io/', None),
+    'werkzeug': ('https://werkzeug.palletsprojects.com/en/0.16.x/', None),
+    'invenio_files_rest': (
+        'https://invenio-files-rest.readthedocs.io/en/latest/',
+        None),
+    'invenio_celery': (
+        'https://invenio-celery.readthedocs.io/en/latest/',
+        None)
 }

 # Autodoc configuraton.
16 changes: 16 additions & 0 deletions docs/files/index.rst
@@ -0,0 +1,16 @@
..
   This file is part of Invenio.
   Copyright (C) 2019 CERN.
   Invenio is free software; you can redistribute it and/or modify it
   under the terms of the MIT License; see LICENSE file for more details.

Handling Files
==============

.. toctree::
   :maxdepth: 1

   integration
   management
   storage
104 changes: 104 additions & 0 deletions docs/files/integration.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,104 @@
..
   This file is part of Invenio.
   Copyright (C) 2019 CERN.
   Invenio is free software; you can redistribute it and/or modify it
   under the terms of the MIT License; see LICENSE file for more details.

.. _integration:

Integrating Files
=================

Invenio provides a set of modules that cover the common needs of file
management:

- :code:`invenio-files-rest`
- :code:`invenio-records-files`
- :code:`invenio-previewer`
- :code:`invenio-iiif`

Integration overview
++++++++++++++++++++

.. note:: If you want to use records with files, you should use the
   `Record class <https://invenio-records-files.readthedocs.io/en/latest/api.html#invenio_records_files.api.Record>`_
   provided in :code:`invenio-records-files`.

Once your Invenio instance is populated with records, you might want to add
files to them. The :code:`invenio-records-files` package combines
the :code:`invenio-records` and :code:`invenio-files-rest` modules
and provides APIs that simplify the integration of files.

The main classes involved in this process are:

- `RecordMetadata <https://invenio-records.readthedocs.io/en/latest/api.html#invenio_records.models.RecordMetadata>`_:
  contains the metadata of the record.
- `Bucket <https://invenio-files-rest.readthedocs.io/en/latest/api.html#invenio_files_rest.models.Bucket>`_:
  contains files.
- `RecordsBuckets <https://invenio-records-files.readthedocs.io/en/latest/api.html#invenio_records_files.models.RecordsBuckets>`_:
  associates a record with one or more files contained in a bucket.

The following diagram gives an overview of this integration. It is followed
by a short description of the two file modules involved.

.. image:: ../_static/invenio-files-overview.png


`invenio-files-rest <https://invenio-files-rest.readthedocs.io/>`_
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
:code:`invenio-files-rest` is the first module required to manage files in
your application. It lets you store and retrieve files through REST APIs
similar to those of Amazon S3. Its key features are:

- Configurable file storage
- Secure REST APIs for upload and download
- Support for large file uploads and multipart uploads
- File integrity monitoring
- Customizable access control

`invenio-records-files <https://invenio-records-files.readthedocs.io/>`_
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:code:`invenio-records-files` is the other required module. It provides a
basic API that lets `invenio-records <https://invenio-records.readthedocs.io/>`__
and `invenio-files-rest`_ work together. The API provides functionality for:

- records creation
- files creation
- accessing files
- files metadata management
- files extraction from records

File previewing
+++++++++++++++

Once your files have been integrated, Invenio provides packages to preview
them.

`invenio-previewer <https://invenio-previewer.readthedocs.io/>`_
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:code:`invenio-previewer` supports a number of file types out of the box and
provides an extensible API to create new previewers.
The file types supported by default are: **PDF**,
**ZIP**, **CSV**, **Markdown**, **XML**, **JSON**, **PNG**, **JPG**, **GIF** and
**Jupyter Notebooks**.

`invenio-iiif <https://invenio-iiif.readthedocs.io/>`_
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:code:`invenio-iiif` integrates Invenio-Records-Files with `Flask-IIIF <https://flask-iiif.readthedocs.io/en/latest/>`__
to provide support for serving images complying with the International Image
Interoperability Framework (IIIF) API standards.

Invenio-IIIF registers the REST API endpoint provided by Flask-IIIF in the
Invenio instance through entry points. On each image request, it delegates
the authorization check and file retrieval to Invenio-Files-REST and serves
the image after adaptation by Flask-IIIF. Invenio-IIIF can also be used in
combination with Invenio-Previewer to preview images. It comes with the
following features:

- Thumbnail generation and previewing of images.
- Previewing, resizing and zooming of images, by implementing the `IIIF <https://iiif.io/>`__ API.
- A Celery task to create image thumbnails.
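
To give an idea of what resizing and zooming look like at the URL level, the
sketch below assembles an image request following the URI template of the
IIIF Image API 2.x
(:code:`{identifier}/{region}/{size}/{rotation}/{quality}.{format}`).
The helper :code:`iiif_image_url`, the base URL and the identifier are
assumptions made for this illustration; they are not defined by Invenio-IIIF,
whose documentation describes the actual endpoint and identifier format.

.. code-block:: python

   from urllib.parse import quote


   def iiif_image_url(base_url, identifier, region="full", size="full",
                      rotation=0, quality="default", image_format="png"):
       """Build an image URL following the IIIF Image API 2.x URI template."""
       # Percent-encode the identifier so that reserved characters such as
       # "/" do not alter the path structure.
       encoded_id = quote(identifier, safe="")
       return "{0}/{1}/{2}/{3}/{4}/{5}.{6}".format(
           base_url.rstrip("/"), encoded_id, region, size, rotation,
           quality, image_format)


   # Hypothetical base URL and identifier; "250," requests a width of 250px.
   thumbnail_url = iiif_image_url(
       "http://localhost:5000/iiif", "my-image", size="250,")

Changing :code:`region` (for example :code:`"0,0,500,500"`) selects a part of
the image, which is how a viewer can implement zooming.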
244 changes: 244 additions & 0 deletions docs/files/management.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,244 @@
..
   This file is part of Invenio.
   Copyright (C) 2019 CERN.
   Invenio is free software; you can redistribute it and/or modify it
   under the terms of the MIT License; see LICENSE file for more details.

.. _serve:

Managing files
================

This section explains the operations available to manage files.

Operations
----------

Serving
+++++++

To serve and allow download of files, you can perform a GET request
specifying the bucket and the filename used to upload the file.

.. code-block:: console

   $ curl -i -X GET "http://localhost:5000/api/files/$BUCKET/my_file.txt"

You can also list files or download specific versions of files. See the REST
APIs reference documentation below for more information.
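
If you build these URLs from Python, a small helper can keep the query
parameters consistent. The sketch below is only an illustration:
:code:`object_url` is a hypothetical function, and it assumes the
:code:`versionId` and :code:`download` query parameters described in the
Downloading section of this page.

.. code-block:: python

   from urllib.parse import quote, urlencode


   def object_url(base_url, bucket_id, key, version_id=None, download=False):
       """Return the URL of a file, optionally pinned to a given version."""
       url = "{0}/{1}/{2}".format(base_url.rstrip("/"), bucket_id, quote(key))
       query = []
       if version_id is not None:
           # Select a specific version instead of the latest one.
           query.append(urlencode({"versionId": version_id}))
       if download:
           # Flag-style parameter without a value, as in "?download".
           query.append("download")
       return url + ("?" + "&".join(query) if query else "")


   latest = object_url(
       "http://localhost:5000/api/files",
       "cb8d0fa7-2349-484b-89cb-16573d57f09e",
       "my_file.txt")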

Be aware that serving files has security implications that you need to take
into account. See :ref:`usage-security` for more information.

Uploading
+++++++++

You can upload, download and modify single files via REST APIs.
A file is uniquely identified within a bucket by its name and version.
Each file can have multiple versions.

Let's upload a file called :code:`my_file.txt` to an existing bucket.

.. code-block:: console

   $ BUCKET=cb8d0fa7-2349-484b-89cb-16573d57f09e
   $ echo "my file content" > my_file.txt
   $ curl -i -X PUT --data-binary @my_file.txt \
     "http://localhost:5000/api/files/$BUCKET/my_file.txt"

.. code-block:: json

   {
       "mimetype": "text/plain",
       "updated": "2019-05-16T13:10:22.621533+00:00",
       "links": {
           "self": "http://localhost:5000/api/files/cb8d0fa7-2349-484b-89cb-16573d57f09e/my_file.txt",
           "version": "http://localhost:5000/api/files/cb8d0fa7-2349-484b-89cb-16573d57f09e/my_file.txt?versionId=7f62676d-0b8e-4d77-9687-8465dc506ca8",
           "uploads": "http://localhost:5000/api/files/cb8d0fa7-2349-484b-89cb-16573d57f09e/my_file.txt?uploads"
       },
       "is_head": true,
       "tags": {},
       "checksum": "md5:d7d02c7125bdcdd857eb70cb5f19aecc",
       "created": "2019-05-16T13:10:22.617714+00:00",
       "version_id": "7f62676d-0b8e-4d77-9687-8465dc506ca8",
       "delete_marker": false,
       "key": "my_file.txt",
       "size": 14
   }

If you have a new version of the file, you can upload it to the same bucket
using the same filename. In this case, a new ObjectVersion will be created.

.. code-block:: console

   $ echo "my file content version 2" > my_filev2.txt
   $ curl -i -X PUT --data-binary @my_filev2.txt \
     "http://localhost:5000/api/files/$BUCKET/my_file.txt"

.. code-block:: json

   {
       "mimetype": "text/plain",
       "updated": "2019-05-16T13:11:22.621533+00:00",
       "links": {
           "self": "http://localhost:5000/api/files/cb8d0fa7-2349-484b-89cb-16573d57f09e/my_file.txt",
           "version": "http://localhost:5000/api/files/cb8d0fa7-2349-484b-89cb-16573d57f09e/my_file.txt?versionId=24bf075f-09f4-42f8-9fbe-3f00b8aac3e8",
           "uploads": "http://localhost:5000/api/files/cb8d0fa7-2349-484b-89cb-16573d57f09e/my_file.txt?uploads"
       },
       "is_head": true,
       "tags": {},
       "checksum": "md5:fe76512703258a894e56bac89d2e8dec",
       "created": "2019-05-16T13:11:22.617714+00:00",
       "version_id": "24bf075f-09f4-42f8-9fbe-3f00b8aac3e8",
       "delete_marker": false,
       "key": "my_file.txt",
       "size": 13
   }

When integrating the REST APIs to upload files via a web application, you
might use JavaScript to improve the user experience. Invenio-Files-REST
provides out-of-the-box integration with JavaScript uploaders. See the
:ref:`usage-js-uploaders` section for more information.

Invenio-Files-REST also provides different ways to upload large files. See
the :ref:`usage-multipart-upload` and :ref:`usage-large-files` sections
for more information.
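
The upload responses shown earlier in this section include a
:code:`checksum` field of the form :code:`md5:<hexdigest>`. A client can use
it to verify that the stored content matches the local file. The sketch below
assumes that format; :code:`md5_checksum` and :code:`upload_matches` are
hypothetical helpers written for this example, not part of Invenio.

.. code-block:: python

   import hashlib


   def md5_checksum(path, chunk_size=1024 * 1024):
       """Return the checksum of a local file as ``md5:<hexdigest>``."""
       digest = hashlib.md5()
       with open(path, "rb") as fp:
           # Read in chunks so that large files do not have to fit in memory.
           for chunk in iter(lambda: fp.read(chunk_size), b""):
               digest.update(chunk)
       return "md5:" + digest.hexdigest()


   def upload_matches(path, response_json):
       """Compare a local file with the ``checksum`` of an upload response."""
       return md5_checksum(path) == response_json.get("checksum")

Here :code:`response_json` is the decoded JSON body returned by the
:code:`PUT` request.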

Downloading
+++++++++++

Once the bucket is created and a file is uploaded, you can retrieve the file
with a :code:`GET` request.

By default, the latest version is retrieved. Invenio also supports file
versioning: to retrieve a version other than the latest one, provide its
:code:`versionId` as a query parameter, as in the examples below.

Download the latest version of the file:

.. code-block:: console

   $ BUCKET_ID=cb8d0fa7-2349-484b-89cb-16573d57f09e
   $ curl -i "http://localhost:5000/files/$BUCKET_ID/my_file.txt"

Download a specific version of the file:

.. code-block:: console

   $ curl -i "http://localhost:5000/files/$BUCKET_ID/my_file.txt?versionId=<version_id>"

.. note::
   By default, the file is returned with the header
   :code:`'Content-Disposition': 'inline'`, so that the browser will try to
   preview it. To trigger a download of the file instead, use the
   :code:`download` boolean query parameter, which changes the
   :code:`'Content-Disposition'` header to :code:`'attachment'`:

   .. code-block:: console

      $ curl -i "http://localhost:5000/files/$BUCKET_ID/my_file.txt?download"

Stream
******

Instead of waiting for the file download to complete, Invenio supports
streaming out of the box for the following MIME types:
:code:`audio/mpeg`, :code:`audio/ogg`, :code:`audio/wav`, :code:`audio/webm`,
:code:`image/gif`, :code:`image/jpeg`, :code:`image/png`, :code:`image/tiff`,
:code:`text/plain`.

You can add your custom mime types to
`MIMETYPE_WHITELIST <https://invenio-files-rest.readthedocs.io/en/latest/api.html#invenio_files_rest.helpers.MIMETYPE_WHITELIST>`_
to extend functionality according to your needs.
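
The sketch below illustrates how a whitelist of this kind can gate what is
served as-is. It is a simplified illustration and not the Invenio-Files-REST
implementation: :code:`WHITELIST` only mirrors the types listed above,
:code:`served_mimetype` is a hypothetical helper, and the fallback to
:code:`application/octet-stream` is an assumption of this example.

.. code-block:: python

   import mimetypes

   # Mirrors the MIME types listed above (illustration only).
   WHITELIST = {
       "audio/mpeg", "audio/ogg", "audio/wav", "audio/webm",
       "image/gif", "image/jpeg", "image/png", "image/tiff",
       "text/plain",
   }


   def served_mimetype(filename, whitelist=WHITELIST):
       """Return the guessed MIME type if whitelisted, else a generic one."""
       guessed, _encoding = mimetypes.guess_type(filename)
       if guessed in whitelist:
           return guessed
       # Anything else is served as opaque binary data, so that the
       # browser does not try to render it.
       return "application/octet-stream"


   # Extending the whitelist with a custom type.
   extended = WHITELIST | {"application/pdf"}

With the default set, :code:`served_mimetype("page.html")` falls back to the
generic type, while passing :code:`extended` lets PDF files through.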

.. warning::

   Be extra careful when you extend the whitelisted MIME types, since it
   could potentially expose your server to XSS attacks.


Deleting
++++++++

A delete operation can be of two types:

1. mark an object as deleted, allowing the possibility of restoring
   a deleted file (also called delete marker or soft deletion).
2. permanently remove any trace of an object and referenced file
   on disk (also called hard deletion).

Soft deletion
*************

A soft deletion creates a new ObjectVersion, which becomes the new
:code:`head` and has no reference to a FileInstance. You can revert it
by retrieving the previous version.

This operation does not access the file on disk and leaves it untouched.

You can soft delete using REST APIs:

.. code-block:: console

   DELETE /files/<bucket_id>/<file_name>

Hard deletion
*************

A hard deletion of a specific object version deletes the ObjectVersion,
the referenced FileInstance and the file on disk. If the deleted version
was the :code:`head`, the previous version becomes the new head.

The deletion of files on disk will not happen immediately. This is because
it is done via an asynchronous task to ensure that the FileInstance is
safely removed from the database in case the low level operation of file
removal on disk fails for any unexpected reason.
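
The effect of the two kinds of deletion on the :code:`head` can be modelled
with a small in-memory sketch. :code:`ToyBucket` is a hypothetical class
written for this illustration. It is not part of Invenio-Files-REST and it
ignores FileInstances, storage and the asynchronous cleanup; it only shows
how versions and the head change.

.. code-block:: python

   import uuid


   class ToyBucket:
       """In-memory stand-in for a bucket of versioned objects."""

       def __init__(self):
           # key -> list of versions, oldest first; the last one is the head.
           self._versions = {}

       def put(self, key, content):
           """Add a new version, which becomes the head."""
           version = {"version_id": str(uuid.uuid4()), "content": content}
           self._versions.setdefault(key, []).append(version)
           return version["version_id"]

       def head(self, key):
           versions = self._versions.get(key, [])
           return versions[-1] if versions else None

       def get(self, key, version_id):
           for version in self._versions.get(key, []):
               if version["version_id"] == version_id:
                   return version
           return None

       def soft_delete(self, key):
           """Add a delete marker: a new head without any content."""
           return self.put(key, None)

       def hard_delete(self, key, version_id):
           """Remove one version; an older version may become the head."""
           self._versions[key] = [
               v for v in self._versions.get(key, [])
               if v["version_id"] != version_id
           ]


   bucket = ToyBucket()
   v1 = bucket.put("my_file.txt", b"version 1")
   v2 = bucket.put("my_file.txt", b"version 2")
   bucket.hard_delete("my_file.txt", v2)        # v1 is the head again
   marker = bucket.soft_delete("my_file.txt")   # head is a delete marker

After the soft deletion, the first version is still retrievable through
:code:`bucket.get("my_file.txt", v1)`, whereas the hard-deleted version is
gone.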

You can hard delete a file using REST APIs:

.. code-block:: console

   DELETE /files/<bucket_id>/<file_name>?versionId=<version_id>

The REST APIs do not allow delete operations that affect multiple
objects at the same time. For advanced use cases, you will need to use the
Invenio-Files-REST APIs programmatically.

.. note::
   For safety reasons, the deletion will fail if the file that you want
   to delete is referenced by multiple ObjectVersions, for example
   in the case of bucket snapshots.

Security
--------

When serving files, you have to take security implications into account.
Here are some recommendations to mitigate possible vulnerabilities, such as
Cross-Site Scripting (XSS):

1. If possible, serve user-uploaded files from a separate domain
   (not a subdomain).

2. By default, Invenio-Files-REST sets some response headers to prevent
   the browser from rendering and executing HTML files.
   See :py:func:`invenio_files_rest.helpers.send_stream` for more information.

3. Prefer file download over letting the browser preview files,
   by adding the :code:`?download` URL query argument.
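
Recommendations 2 and 3 can be illustrated with a small sketch.
:code:`file_response_headers` is a hypothetical helper, and the headers below
are examples commonly used for this purpose. The headers actually set by
:code:`send_stream` may differ, so refer to its documentation for the
authoritative list.

.. code-block:: python

   def file_response_headers(download=True):
       """Return example response headers for serving an uploaded file."""
       return {
           # "attachment" asks the browser to save the file instead of
           # displaying it; "inline" lets the browser try to preview it.
           "Content-Disposition": "attachment" if download else "inline",
           # Ask the browser not to guess a different content type.
           "X-Content-Type-Options": "nosniff",
           # Forbid loading or executing any resource from the response.
           "Content-Security-Policy": "default-src 'none';",
       }


   headers = file_response_headers(download=True)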
