Permalink
Fetching contributors…
Cannot retrieve contributors at this time
329 lines (228 sloc) 10.2 KB
.. index:: GridFS
======
GridFS
======
.. default-domain:: mongodb
.. contents:: On this page
:local:
:backlinks: none
:depth: 1
:class: singlecol
:term:`GridFS` is a specification for storing and retrieving files
that exceed the :term:`BSON`\-document :ref:`size limit
<limit-bson-document-size>` of 16 MB.
Instead of storing a file in a single document, GridFS divides the file
into parts, or chunks [#chunk-disambiguation]_, and stores each chunk as
a separate document. By default, GridFS uses a chunk size of 255 kB;
that is, GridFS divides a file into chunks of 255 kB with the exception
of the last chunk. The last chunk is only as large as necessary.
Similarly, files that are no larger than the chunk size only have a
final chunk, using only as much space as needed plus some additional
metadata.
GridFS uses two collections to store files. One collection stores the
file chunks, and the other stores file metadata. The section
:ref:`gridfs-collections` describes each collection in detail.
When you query GridFS for a file, the driver will reassemble the chunks
as needed. You can perform range queries on files stored through GridFS.
You can also access information from arbitrary sections of files, such
as to "skip" to the middle of a video or audio file.
GridFS is useful not only for storing files that exceed 16 MB but also
for storing any files for which you want access without having to load
the entire file into memory. See also
:ref:`faq-developers-when-to-use-gridfs`.
.. versionchanged:: 2.4.10
The default chunk size changed from 256 kB to 255 kB.
.. _faq-developers-when-to-use-gridfs:
When to Use GridFS
------------------
In MongoDB, use :term:`GridFS` for storing files larger than 16 MB.
In some situations, storing large files may be more efficient in a
MongoDB database than on a system-level filesystem.
- If your filesystem limits the number of files in a directory, you can
use GridFS to store as many files as needed.
- When you want to access information from portions of large
files without having to load whole files into memory, you can use
GridFS to recall sections of files without reading the entire file
into memory.
- When you want to keep your files and metadata automatically synced
and deployed across a number of systems and facilities, you can use
GridFS. When using :ref:`geographically distributed replica sets
<replica-set-geographical-distribution>`, MongoDB can distribute
files and their metadata automatically to a number of
:program:`mongod` instances and facilities.
Do not use GridFS if you need to update the content of the entire file
atomically. As an alternative you can store multiple versions of each
file and specify the current version of the file in the metadata. You
can update the metadata field that indicates "latest" status in an
atomic update after uploading the new version of the file, and later
remove previous versions if needed.
Furthermore, if your files are all smaller the 16 MB :limit:`BSON
Document Size` limit, consider storing the file manually within a
single document instead of using GridFS. You may use the BinData data
type to store the binary data. See your :doc:`drivers
</applications/drivers>` documentation for details on using BinData.
.. index:: GridFS; initialize
.. _gridfs-use:
Use GridFS
----------
To store and retrieve files using :term:`GridFS`, use either of the
following:
- A MongoDB driver. See the :doc:`drivers</applications/drivers>`
documentation for information on using GridFS with your driver.
- The :program:`mongofiles` command-line tool. See the
:program:`mongofiles` reference for documentation.
.. _gridfs-collections:
GridFS Collections
------------------
:term:`GridFS` stores files in two collections:
- ``chunks`` stores the binary chunks. For details, see
:ref:`gridfs-chunks-collection`.
- ``files`` stores the file's metadata. For details, see
:ref:`gridfs-files-collection`.
GridFS places the collections in a common bucket by prefixing each
with the bucket name. By default, GridFS uses two collections with
a bucket named ``fs``:
- ``fs.files``
- ``fs.chunks``
You can choose a different bucket name, as well as create multiple
buckets in a single database. The full collection name, which includes
the bucket name, is subject to the :limit:`namespace length limit
<Namespace Length>`.
.. index:: GridFS; chunks collection
.. _gridfs-chunks-collection:
The ``chunks`` Collection
~~~~~~~~~~~~~~~~~~~~~~~~~
Each document in the ``chunks`` [#chunk-disambiguation]_ collection
represents a distinct chunk of a file as represented in :term:`GridFS`.
Documents in this collection have the following form:
.. code-block:: javascript
{
"_id" : <ObjectId>,
"files_id" : <ObjectId>,
"n" : <num>,
"data" : <binary>
}
A document from the ``chunks`` collection contains the following fields:
.. data:: chunks._id
The unique :term:`ObjectId` of the chunk.
.. data:: chunks.files_id
The ``_id`` of the "parent" document, as specified in the ``files``
collection.
.. data:: chunks.n
The sequence number of the chunk. GridFS numbers all chunks, starting
with 0.
.. data:: chunks.data
The chunk's payload as a :term:`BSON` ``Binary`` type.
.. index:: GridFS; files collection
.. _gridfs-files-collection:
The ``files`` Collection
~~~~~~~~~~~~~~~~~~~~~~~~
Each document in the ``files`` collection represents a file in
:term:`GridFS`.
.. code-block:: javascript
{
"_id" : <ObjectId>,
"length" : <num>,
"chunkSize" : <num>,
"uploadDate" : <timestamp>,
"md5" : <hash>,
"filename" : <string>,
"contentType" : <string>,
"aliases" : <string array>,
"metadata" : <any>,
}
Documents in the ``files`` collection contain some or all of the
following fields:
.. data:: files._id
The unique identifier for this document. The ``_id`` is of the data
type you chose for the original document. The default type for
MongoDB documents is :term:`BSON` :term:`ObjectId`.
.. data:: files.length
The size of the document in bytes.
.. data:: files.chunkSize
The size of each chunk in **bytes**. GridFS divides the document into
chunks of size ``chunkSize``, except for the last, which is only as
large as needed. The default size is 255 kilobytes (kB).
.. versionchanged:: 2.4.10
The default chunk size changed from 256 kB to 255 kB.
.. data:: files.uploadDate
The date the document was first stored by GridFS. This value has the
``Date`` type.
.. data:: files.md5
An MD5 hash of the complete file returned by the :doc:`filemd5
</reference/command/filemd5>` command. This value has the ``String``
type.
.. data:: files.filename
Optional. A human-readable name for the GridFS file.
.. data:: files.contentType
Optional. A valid MIME type for the GridFS file.
.. data:: files.aliases
Optional. An array of alias strings.
.. data:: files.metadata
Optional. The metadata field may be of any data type and can hold
any additional information you want to store. If you wish to add
additional arbitrary fields to documents in the ``files``
collection, add them to an object in the metadata field.
.. index:: GridFS; index; indexes
.. _gridfs-indexes:
GridFS Indexes
--------------
GridFS uses indexes on each of the ``chunks`` and ``files`` collections
for efficiency. :doc:`Drivers </applications/drivers>` that conform to
the `GridFS specification`_ automatically create these indexes for
convenience. You can also create any additional indexes as desired to
suit your application's needs.
.. _gridfs-chunks-index:
The ``chunks`` Index
~~~~~~~~~~~~~~~~~~~~
:term:`GridFS` uses a :term:`unique <unique index>`, :term:`compound
<compound index>` index on the ``chunks`` collection using the
``files_id`` and ``n`` fields. This allows for efficient retrieval of
chunks, as demonstrated in the following example:
.. code-block:: javascript
db.fs.chunks.find( { files_id: myFileID } ).sort( { n: 1 } )
:doc:`Drivers </applications/drivers>` that conform to the `GridFS
specification`_ will automatically ensure that this index exists before
read and write operations. See the relevant driver documentation for the
specific behavior of your GridFS application.
If this index does not exist, you can issue the following operation to
create it using the :program:`mongo` shell:
.. code-block:: javascript
db.fs.chunks.createIndex( { files_id: 1, n: 1 }, { unique: true } );
.. _gridfs-file-index:
The ``files`` Index
~~~~~~~~~~~~~~~~~~~
:term:`GridFS` uses an :term:`index` on the ``files`` collection using
the ``filename`` and ``uploadDate`` fields. This index allows for
efficient retrieval of files, as shown in this example:
.. code-block:: javascript
db.fs.files.find( { filename: myFileName } ).sort( { uploadDate: 1 } )
:doc:`Drivers </applications/drivers>` that conform to the `GridFS
specification`_ will automatically ensure that this index exists before
read and write operations. See the relevant driver documentation for the
specific behavior of your GridFS application.
If this index does not exist, you can issue the following operation to
create it using the :program:`mongo` shell:
.. code-block:: javascript
db.fs.files.createIndex( { filename: 1, uploadDate: 1 } );
.. [#chunk-disambiguation] The use of the term *chunks* in the context
of GridFS is not related to the use of the term *chunks* in
the context of sharding.
.. _`GridFS specification`: https://github.com/mongodb/specifications/blob/master/source/gridfs/gridfs-spec.rst
Sharding GridFS
---------------
There are two collections to consider with :term:`gridfs` - ``files`` and
``chunks``.
If you need to shard a GridFS data store, use the ``chunks`` collection
setting ``{ files_id : 1, n : 1 }`` or ``{ files_id : 1 }`` as the shard key
index.
``files_id`` is an :term:`objectid` and changes
:ref:`monotonically<shard-key-monotonic>`.
You cannot use :doc:`/core/hashed-sharding` when sharding the ``chunks``
collection.
The ``files`` collection is small and only contains metadata. None of the
required keys for GridFS lend themselves to an even distribution in a
sharded environment. If you *must* shard the ``files`` collection, use the
``_id`` field, possibly in combination with an application field.
Leaving ``files`` unsharded allows all the file metadata documents to live
on the :term:`primary shard`.