Takeshi Nakatani edited this page Feb 23, 2024 · 41 revisions

FUSE-based file system backed by Amazon S3

Announcement:

s3fs-fuse moved from the s3fs project on Google Code after v1.74. Please submit usage and support questions to the Issues area rather than as comments on this Wiki page. Thanks!

What's New

  • v1.87 fixed many bugs etc.
  • v1.86 fixed many bugs etc.
  • v1.85 fixed many bugs etc.
  • v1.84 fixed many bugs etc.
  • v1.83 fixed many bugs etc.
  • v1.82 no longer falls back to http.
  • v1.81 fixed many bugs etc.
  • v1.80 fixed many bugs etc.
  • v1.79 fixed many bugs etc.
  • v1.78 added support for SSE-C and fixed some bugs.
  • v1.77 fixed curl ssl problems etc.
  • v1.76 fixed some bugs
  • v1.75 fixed some bugs and the Mac OS X build
  • v1.74 initial version on GitHub, same as v1.74 on Google Code

Older versions are on Google Code; please refer to it for versions before v1.74.

Overview

s3fs is a FUSE filesystem that allows you to mount an Amazon S3 bucket as a local filesystem. It stores files natively and transparently in S3 (i.e., you can use other programs to access the same files). The maximum size of objects that s3fs can handle depends on Amazon S3: up to 5 GB when using the single PUT API, and up to 5 TB when using the Multipart Upload API.

s3fs is stable and is being used in a number of production environments, e.g., rsync backup to S3.

Important Note:

Your kernel must support FUSE. Kernels earlier than 2.6.18-164 may not have FUSE support (see issue #140), and Virtual Private Servers (VPS) may not have FUSE support compiled into their kernels.

To use it:

  1. Get an Amazon S3 account! http://aws.amazon.com/s3/
  2. Download, compile and install (see Installation Notes or the ReadMe Install Notes).
  3. Specify your Security Credentials (Access Key ID & Secret Access Key) by one of the following methods:
     • using the passwd_file command line option
     • setting the AWSACCESSKEYID and AWSSECRETACCESSKEY environment variables
     • using a .passwd-s3fs file in your home directory
     • using the system-wide /etc/passwd-s3fs file
  4. Mount the bucket:
/usr/bin/s3fs mybucket /mnt

That's it! The contents of your Amazon bucket "mybucket" should now be accessible read/write in /mnt.

The s3fs password file has this format (use this format if you have only one set of credentials):

accessKeyId:secretAccessKey

If you have more than one set of credentials, you can keep default credentials as specified above, and this per-bucket syntax is recognized as well:

bucketName:accessKeyId:secretAccessKey

If you want to use an IAM account, you can get an AccessKey/secretAccessKey pair from the AWS S3 console.

Note:

The credentials files must not have lax permissions, as this creates a security hole. ~/.passwd-s3fs must not have group or others permissions, and /etc/passwd-s3fs must not have others permissions. Set permissions on these files accordingly:

% chmod 600 ~/.passwd-s3fs
% sudo chmod 640 /etc/passwd-s3fs
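
The credential file setup above can be scripted. The following is a minimal sketch, assuming placeholder credentials (both key values are hypothetical) and writing to a scratch file rather than the real ~/.passwd-s3fs, so it is safe to try:

```shell
# Hypothetical placeholder credentials; substitute your own key pair.
ACCESS_KEY_ID="AKIAEXAMPLEKEYID"
SECRET_ACCESS_KEY="exampleSecretAccessKey"

# Scratch location for this sketch. For real use the path would be
# ~/.passwd-s3fs, or any file passed with the passwd_file option.
PASSWD_FILE="${PASSWD_FILE:-$(mktemp)}"

# Tighten the umask before the secret touches the disk, then write a
# single line in the accessKeyId:secretAccessKey format.
umask 077
printf '%s:%s\n' "$ACCESS_KEY_ID" "$SECRET_ACCESS_KEY" > "$PASSWD_FILE"

# Owner read/write only, matching the chmod 600 shown above.
chmod 600 "$PASSWD_FILE"
ls -l "$PASSWD_FILE"
```

Setting the umask first avoids a short window in which the file could exist with wider permissions.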

s3fs supports mode (e.g., chmod), mtime (e.g., touch) and uid/gid (chown). s3fs stores the values in x-amz-meta custom meta headers, and uses x-amz-copy-source to efficiently change them.

s3fs has a caching mechanism: you can enable local file caching to minimize downloads, e.g.:

/usr/bin/s3fs mybucket /mnt -ouse_cache=/tmp

Hosting a cvsroot on s3 works! Although you probably don't really want to do it in practice. E.g., cvs -d /s3/cvsroot init. Incredibly, mysqld also works, although I doubt you really wanna do that in practice! =)

s3fs works with rsync (as of svn 43)! As of r152, s3fs uses x-amz-copy-source for efficient updates of mode, mtime and uid/gid.

s3fs will retry S3 transactions on certain error conditions. The default retry count is 5 (see the retries option below), i.e., s3fs will make up to 5 retries per S3 transaction (for a total of 6 attempts: 1st attempt + 5 retries) before giving up. You can set the retry count with the "retries" option, e.g., "-oretries=2".
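
The cache and retry settings above are both passed with -o and can be combined on one command line. As a rough sketch, the hypothetical helper below only prints the command it would run, so it can be tried without a bucket; the function name and argument order are illustrative and not part of s3fs:

```shell
# Print (rather than run) an s3fs mount command built from a few options.
s3fs_mount_cmd() {
    bucket=$1
    mountpoint=$2
    cache_dir=$3    # empty string: leave the local file cache disabled
    retries=$4      # empty string: keep the s3fs default retry count

    cmd="s3fs $bucket $mountpoint"
    [ -n "$cache_dir" ] && cmd="$cmd -o use_cache=$cache_dir"
    [ -n "$retries" ] && cmd="$cmd -o retries=$retries"
    echo "$cmd"
}

# Dry run: shows the command for bucket "mybucket" mounted on /mnt.
s3fs_mount_cmd mybucket /mnt /tmp 3
```

Once the printed command looks right, it can be run directly or passed to `sh -c`.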

Utility mode

s3fs can be started in utility mode.
Utility mode is used to list or delete interrupted (incomplete) multipart upload objects, as follows:

s3fs --incomplete-mpu-list(-u) bucket
s3fs --incomplete-mpu-abort[=all | =<date format>] bucket

Options

default_acl (default="private")

  • the default canned acl to apply to all written s3 objects, e.g., "private", "public-read".
    An empty string means the header is not sent.
  • see http://aws.amazon.com/documentation/s3/ for the full list of canned acls.

umask (default="0000")

  • Sets umask for files under the mountpoint.
    This can allow users other than the mounting user to read and write to files that they did not create.

mp_umask (default="0000")

  • Sets umask for the mount point directory.
    If the allow_other option is not set, s3fs allows access to the mount point only to the owner.
    If allow_other is set, s3fs allows access to all users by default; combining allow_other with mp_umask lets you control the permissions of the mount point, like umask.
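
To see what a given umask or mp_umask value would leave, the arithmetic can be checked in the shell. This is a small sketch assuming the conventional umask rule (permission bits are cleared wherever the umask has a bit set); the helper name is hypothetical:

```shell
# Apply a umask to a permission mode. Both arguments are octal strings
# with a leading 0 so that shell arithmetic reads them as octal.
apply_umask() {
    printf '%04o\n' $(( $1 & ~$2 ))
}

apply_umask 0666 0022   # prints 0644
apply_umask 0777 0027   # prints 0750
```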

prefix (default="") (coming soon!)

  • a prefix to append to all s3 objects
  • For now, you can specify via s3fs mybucket:/path/prefix/

retries (default="5")

  • number of times to retry a failed s3 transaction

use_cache (default="" which means disabled)

  • local folder to use for local file cache

check_cache_dir_exist (default is disabled)

  • If use_cache is set, check that the cache directory exists. If this option is not specified, the cache directory is created at runtime when it does not exist.

del_cache

  • Delete local file cache when s3fs starts and exits.

storage_class (default="standard")

  • Store object with specified storage class.
    This option replaces the old option use_rrs.
    Possible values: standard, standard_ia, onezone_ia, reduced_redundancy, intelligent_tiering, and glacier.

use_rrs (default="" which means disabled)

  • use Amazon's Reduced Redundancy Storage

use_sse (default is disabled)

  • use_sse option not specified
    The default is SSE-DISABLE.
  • "use_sse" or "use_sse=1" (old-style parameter)
    Uses Amazon S3-managed encryption keys.
  • "use_sse=custom:'filepath'" or "use_sse='filepath'" (old-style parameter)
    Uses customer-provided encryption keys.
    The custom key file must have 600 permissions.
    The file can contain several lines; each line is one SSE-C key.
    The first line is used as the customer-provided encryption key for uploading, changing headers, etc.
    Any keys after the first line are used for downloading objects that were encrypted with a key other than the first.
    This lets you keep all SSE-C keys in one file as an SSE-C key history.
  • "use_sse=custom"
    If you specify "custom" ("c") without a file path, you need to set the custom key with the load_sse_c option or the AWSSSECKEYS environment variable.
    (The AWSSSECKEYS environment variable holds one or more SSE-C keys separated by ":".)
    This option is used to decide the SSE type.
    So if you do not want to encrypt objects when uploading but need to decrypt encrypted objects when downloading, use the load_sse_c option instead of this option.
  • "use_sse=kmsid" or "use_sse=kmsid:'kms id'"
    Uses the master key that you manage in AWS KMS.
    You can use "k" as shorthand for "kmsid".
    If you want to specify the SSE-KMS type with your 'kms id' in AWS KMS, set it after "kmsid:" (or "k:").
    If you specify only "kmsid" ("k"), you need to set the AWSSSEKMSID environment variable to the 'kms id'.
    • Notice
      You cannot use a KMS id that is not in the same EC2 region.
      Your endpoints must use Secure Sockets Layer (SSL) or Transport Layer Security (TLS).

load_sse_c - specify SSE-C keys

  • Specify the path of the customer-provided encryption keys file used for decrypting when downloading.
    If you want to use a customer-provided encryption key when uploading, specify it with "use_sse=custom".
    The file can have many lines; each line is one custom key.
    This lets you keep all SSE-C keys in one file as an SSE-C key history.
    The AWSSSECKEYS environment variable has the same contents as this file.

ssl_verify_hostname (default="2")

  • When 0, do not verify the SSL certificate against the hostname.

passwd_file (default="")

  • Specify the path to the password file; overrides looking for the password in $HOME/.passwd-s3fs and /etc/passwd-s3fs.

ahbe_conf (default="" which means disabled)

  • Specifies the path of a configuration file that defines additional HTTP headers by file (object) extension.

profile (default="default")

  • Choose a profile from ${HOME}/.aws/credentials to authenticate against S3.
    Note that this format matches the AWS CLI format and differs from the s3fs passwd format.
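
Because the two credential formats differ, it can help to see one derived from the other. The sketch below assumes the usual AWS CLI credentials layout (an INI-style file with `aws_access_key_id` and `aws_secret_access_key` keys under a `[profile]` header) and prints the matching s3fs passwd line; the helper name and the sample values are hypothetical:

```shell
# Print "accessKeyId:secretAccessKey" for one profile of an AWS CLI
# style credentials file. Returns non-zero if the profile is incomplete.
aws_profile_to_passwd() {
    profile=${1:-default}
    cred_file=${2:-$HOME/.aws/credentials}
    awk -v profile="$profile" '
        # A [section] header switches the active profile on or off.
        /^\[.*\]$/ { in_section = ($0 == "[" profile "]"); next }
        in_section {
            split($0, kv, "=")
            key = kv[1]; gsub(/[ \t]/, "", key)
            # Everything after the first "=" is the value, trimmed.
            val = substr($0, index($0, "=") + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            if (key == "aws_access_key_id") id = val
            if (key == "aws_secret_access_key") secret = val
        }
        END { if (id != "" && secret != "") print id ":" secret; else exit 1 }
    ' "$cred_file"
}

# Demo against a scratch file with made-up values.
demo_creds=$(mktemp)
printf '[default]\naws_access_key_id = AKIAEXAMPLE\naws_secret_access_key = exampleSecret\n' > "$demo_creds"
aws_profile_to_passwd default "$demo_creds"
```

The output line can be redirected into a passwd file if you prefer the passwd_file option over the profile option.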

public_bucket (default="" which means disabled)

  • Anonymously mount a public bucket when set to 1; ignores the $HOME/.passwd-s3fs and /etc/passwd-s3fs files.
    S3 does not allow the copy object api for anonymous users, so s3fs sets the nocopyapi option automatically when public_bucket=1 is specified.

bucket

  • If the bucket name (and path) is not specified on the command line, you must specify the bucket name with this option after the -o option.

no_check_certificate (default is disabled)

  • Server certificate won't be checked against the available certificate authorities.

connect_timeout (default="10" seconds)

  • time to wait for connection before giving up

readwrite_timeout (default="30" seconds)

  • time to wait between read/write activity before giving up

list_object_max_keys (default="1000")

  • Specify the maximum number of keys returned by the S3 list object API. The default is 1000. You can set this value to 1000 or more.

max_stat_cache_size (default="100,000" entries (about 40MB))

  • maximum number of entries in the stat cache

url (default="https://s3.amazonaws.com")

  • sets the url to use to access Amazon S3. If you want to use HTTP, then you can set "url=http://s3.amazonaws.com".
    If you do not use https, please specify the URL with the url option.

stat_cache_expire (default is no expire)

  • specify expire time(seconds) for entries in the stat cache.

stat_cache_interval_expire (default is 900)

  • specify expire time(seconds) for entries in the stat cache and symbolic link cache.
    This expire time is measured from the last access time of those cache entries.
    This option is exclusive with stat_cache_expire, and is left for compatibility with older versions.

enable_noobj_cache (default is disabled)

  • Enable cache entries for objects that do not exist.

nodnscache

  • s3fs always uses a DNS cache; this option disables it.

nosscache

  • s3fs always uses an SSL session cache; this option disables it.

nomultipart

  • disable multipart uploads.

multireq_max (default="20")

  • maximum number of parallel requests for listing objects.

parallel_count (default="5")

  • Number of parallel requests for downloading/uploading large objects. s3fs uploads large objects (over 20MB) with multipart post requests, and sends them in parallel. This option limits the number of parallel requests s3fs makes at once.

multipart_size (default="10")

  • Part size, in MB, for each multipart request. The minimum value is 5 MB and the maximum value is 5 GB.
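
As a quick way to reason about multipart_size, the number of parts for an upload is the object size divided by the part size, rounded up. A small sketch of that arithmetic (the helper name is hypothetical, and it only does the division; it does not reflect how s3fs schedules the requests):

```shell
# Number of multipart parts for an object of size_mb megabytes when each
# part is part_mb megabytes (10 if omitted, the documented default).
part_count() {
    size_mb=$1
    part_mb=${2:-10}
    echo $(( (size_mb + part_mb - 1) / part_mb ))
}

part_count 95        # prints 10
part_count 2048 64   # prints 32
```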

ensure_diskfree (default="0")

  • Sets the amount of free disk space to ensure, in MB.
    This is the threshold of free space on the disk that s3fs uses for its cache files.
    s3fs creates files for downloading, uploading and caching.
    If the free disk space is smaller than this value, s3fs uses as little disk space as possible, at the cost of performance.

singlepart_copy_limit (default="512")

  • Maximum size, in MB, of a single-part copy before trying multipart copy.

enable_content_md5 (default is disabled)

  • Verify data uploaded without multipart using the content-md5 header.

host (default="https://s3.amazonaws.com")

servicepath (default="/")

  • Set a service path when the non-Amazon host requires a prefix.

noxmlns

ibm_iam_endpoint (default="https://iam.bluemix.net")

  • Sets the URL to use for IBM IAM authentication.

ecs ( default is disabled )

This option instructs s3fs to query the ECS container credential metadata address instead of the instance metadata address.

iam_role ( default is no role )

  • Set the IAM role that will supply the credentials from the instance meta-data. Specify only the IAM role name.

ibm_iam_auth ( default is not using IBM IAM authentication )

This option instructs s3fs to use IBM IAM authentication. In this mode, the AWSAccessKey and AWSSecretKey will be used as IBM's Service-Instance-ID and APIKey, respectively.

use_session_token

  • Indicate that session token should be provided.
    If credentials are provided by environment variables this switch forces presence check of AWSSESSIONTOKEN variable.
    Otherwise an error is returned.

nomixupload

  • Disable copy in multipart uploads.
    Disables the use of PUT (copy api) when multipart uploading large objects.
    By default, when doing a multipart upload, ranges of unchanged data use PUT (copy api) whenever possible.
    When nocopyapi or norenameapi is specified, use of PUT (copy api) is disabled even if this option is not specified.

nocopyapi

  • For distributed object storage that is compatible with the S3 API but lacks PUT (copy api). If you set this option, s3fs does not use PUT with "x-amz-copy-source" (copy api).

norenameapi

  • For distributed object storage that is compatible with the S3 API but lacks PUT (copy api). This option is a subset of the nocopyapi option.

use_path_request_style

  • Enable compatibility with S3-like APIs which do not support the virtual-host request style, by using the older path request style.

noua

  • Suppress User-Agent header.
    Usually s3fs outputs the User-Agent in "s3fs/ (commit hash ; )" format.
    If this option is specified, s3fs suppresses the output of the User-Agent.

cipher_suites

  • Customize the list of TLS cipher suites
    Expects a colon separated list of cipher suite names.
    A list of available cipher suites, depending on your TLS engine, can be found in the curl library documentation:
    https://curl.haxx.se/docs/ssl-ciphers.html

instance_name

  • The instance name of the current s3fs mountpoint.
    This name will be added to logging messages and user agent headers sent by s3fs.

mime (default="/etc/mime.types")

  • Specify the path of the mime.types file.
    If this option is not specified, the existence of "/etc/mime.types" is checked, and that file is loaded as mime information.
    If this file does not exist on macOS, then "/etc/apache2/mime.types" is checked as well.

complement_stat

  • Complement missing file/directory mode information.
    s3fs fills in missing file/directory mode information if a file or directory object does not have an x-amz-meta-mode header.
    By default, s3fs does not complement stat information for an object, so such an object cannot be listed or modified.

notsup_compat_dir

  • Do not support compatibility directory types.
    By default, s3fs supports objects of the directory type as much as possible and recognizes them as directories.
    Objects that can be recognized as directory objects are "dir/", "dir", "dir_$folder$", and the case where there is no directory object but a file object contains that directory path.
    s3fs needs redundant communication to support all these directory types.
    The directory object created by s3fs is "dir/".
    By restricting s3fs to recognize only "dir/" as a directory, communication traffic can be reduced.
    This option applies that restriction to s3fs.
    However, if there is a directory object other than "dir/" in the bucket, specifying this option is not recommended.
    s3fs may not be able to recognize a directory correctly if it was created by something other than s3fs.
    Please use this option only when every directory in the bucket is a "dir/" object.

use_wtf8 ( support arbitrary file system encoding )

S3 requires all object names to be valid utf-8. But some clients, notably Windows NFS clients, use their own encoding.
This option re-encodes invalid utf-8 object names into valid utf-8 by mapping offending codes into a 'private' codepage of the Unicode set.
Useful on clients not using utf-8 as their file system encoding.

requester_pays (default is disabled)

  • This option instructs s3fs to enable requests involving Requester Pays buckets (It includes the 'x-amz-request-payer=requester' entry in the request header).

sigv2 (default is signature version 4)

  • Sets signing AWS requests by using Signature Version 2.

createbucket

  • Create a new bucket when s3fs starts.
    s3fs attempts to create a new bucket immediately after starting. If the bucket cannot be created, s3fs exits with an error. If the bucket is created, s3fs mounts it and starts normally.
    This option cannot be used in fstab. If you specify this option in fstab, the mount fails because s3fs tries to create a bucket each time it mounts. Please use it only when starting s3fs on the command line.

endpoint (default="us-east-1")

  • Sets the endpoint (region) to use with signature version 4.
    If this option is not specified, s3fs uses the "us-east-1" region as the default.
    If s3fs cannot connect to the region specified by this option, s3fs cannot run.
    But if you do not specify this option and s3fs cannot connect with the default region, s3fs automatically retries against another region.
    s3fs can learn the correct region name because it can find it in the error returned by the S3 server.

use_xattr (default is not handling the extended attribute)

  • Enable handling of extended attributes (xattrs).
    If you set this option, you can use extended attributes.
    For example, encfs and ecryptfs need extended attribute support.
    Notice: if s3fs handles extended attributes, the copy command with preserve=mode does not work.

dbglevel ( default="crit" )

  • Set the debug message level. Valid values are crit (critical), err (error), warn (warning), info (information) and dbg (debug). The default debug level is critical. If s3fs is run with the "-d" option, the debug level is set to information. When s3fs catches the SIGUSR2 signal, the debug level is bumped up.

curldbg

  • Output debug messages from libcurl when this option is specified.

set_check_cache_sigusr1 (default is disabled)

  • If the cache is enabled, you can check the integrity of the cache file and the cache file's stats info file.
    When this option is specified, sending the SIGUSR1 signal to the s3fs process checks the cache status at that time.
    This option can take a file path as a parameter to output the check result to that file.
    The file path parameter can be omitted. If omitted, the result is output to stdout or syslog.

Utility mode Options

-u or --incomplete-mpu-list

Lists incomplete multipart objects uploaded to the specified bucket.

--incomplete-mpu-abort(=all or =<date format>)

Deletes incomplete multipart objects uploaded to the specified bucket.
If "all" is specified for this option, all incomplete multipart objects are deleted. If you specify no argument, objects older than 24 hours (24H) are deleted (this is the default value).
You can specify an optional date format. It can be given in years, months, days, hours, minutes and seconds, expressed as Y, M, D, h, m, s respectively.
For example, 1Y6M10D12h30m30s.
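
To get a feel for how long a given date format value is, it can be converted to seconds. The sketch below is only an illustration of the Y/M/D/h/m/s notation: the helper name is hypothetical, and treating a year as 365 days and a month as 30 days is an assumption made here, which may not match how s3fs itself converts the value.

```shell
# Convert an age such as 1D12h into seconds.
# ASSUMPTION: Y = 365 days and M = 30 days, chosen for illustration only.
age_to_seconds() {
    rest=$1
    total=0
    while [ -n "$rest" ]; do
        # Take the leading "<digits><unit>" token off the string.
        token=$(printf '%s' "$rest" | sed -n 's/^\([0-9][0-9]*[YMDhms]\).*/\1/p')
        [ -n "$token" ] || { echo "invalid age: $1" >&2; return 1; }
        unit=${token#"${token%?}"}
        num=$(printf '%s' "${token%?}" | sed 's/^0*//')
        num=${num:-0}
        case $unit in
            Y) mult=31536000 ;;  # assumed 365 days
            M) mult=2592000 ;;   # assumed 30 days
            D) mult=86400 ;;
            h) mult=3600 ;;
            m) mult=60 ;;
            s) mult=1 ;;
        esac
        total=$(( total + num * mult ))
        rest=${rest#"$token"}
    done
    echo "$total"
}

age_to_seconds 24h     # prints 86400
age_to_seconds 1D12h   # prints 129600
```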

Details

If enabled via "use_cache" option, s3fs automatically maintains a local cache of files in the folder specified by use_cache. Whenever s3fs needs to read or write a file on s3 it first downloads the entire file locally to the folder specified by use_cache and operates on it. When fuse release() is called, s3fs will re-upload the file to s3 if it has been changed. s3fs uses md5 checksums to minimize downloads from s3. Note: this is different from the stat cache (see below).
Local file caching works by calculating and comparing md5 checksums (ETag HTTP header).
The folder specified by use_cache is just a local cache. It can be deleted at any time; s3fs rebuilds it on demand. Note: this directory grows without bound and can fill up a file system, depending on the bucket and the reads made against it. Take precautions by using a quota system, routinely clearing the cache, or some other method.
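
The checksum comparison behind the local cache can be illustrated from the shell. This is a sketch of the idea rather than s3fs's internal code: it assumes GNU `md5sum` is available and that the ETag value was obtained separately (for example from a HEAD request). Note that the ETag of an object uploaded in multiple parts is not a plain MD5 of its content, so this comparison only applies to single-part uploads.

```shell
# Succeed if the MD5 of a local file equals the given ETag value.
etag_matches() {
    file=$1
    # S3 returns ETag values wrapped in double quotes; strip them.
    etag=$(printf '%s' "$2" | tr -d '"')
    local_md5=$(md5sum "$file" | cut -d' ' -f1)
    [ "$local_md5" = "$etag" ]
}

# Demo with an empty file, whose MD5 is d41d8cd98f00b204e9800998ecf8427e.
cached=$(mktemp)
: > "$cached"
if etag_matches "$cached" '"d41d8cd98f00b204e9800998ecf8427e"'; then
    echo "cache is current"
else
    echo "download needed"
fi
```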
s3fs supports chmod (mode) and touch (mtime) by virtue of the "x-amz-meta-mode" and "x-amz-meta-mtime" custom meta headers. As of r149, s3fs uses x-amz-copy-source, which means s3fs no longer needs to operate in a brute-force manner and is much faster now (one minor performance-related corner case is left to solve: /usr/bin/touch).
The stat cache stores file information in memory and can improve performance. Its default setting is to store 100,000 entries, which can account for about 40 MB of memory usage. When the stat cache fills up, entries with a low hit count are deleted first. The size of the stat cache is controllable with the max_stat_cache_size option.
s3fs uses /etc/mime.types to "guess" the "correct" content-type based on file name extension. This means that you can copy a website to s3 and serve it up directly from s3 with correct content-types. Unknown file types are assigned "application/octet-stream".
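
The extension lookup can be approximated from the shell to predict which content-type a file would get. This is a sketch that assumes the common mime.types layout (a type followed by its extensions on each line, `#` for comments); the helper name is hypothetical, and the matching rules s3fs actually applies (for example around letter case) may differ.

```shell
# Look up a content-type for a file name in a mime.types style file,
# falling back to application/octet-stream for unknown types.
guess_content_type() {
    name=${1##*/}
    mime_file=${2:-/etc/mime.types}
    ext=${name##*.}
    # No extension, or no readable mime file: use the fallback.
    if [ "$ext" = "$name" ] || [ ! -r "$mime_file" ]; then
        echo "application/octet-stream"
        return
    fi
    awk -v ext="$ext" '
        /^[ \t]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == ext) { print $1; found = 1; exit } }
        END { if (!found) print "application/octet-stream" }
    ' "$mime_file"
}

# Demo against a small scratch mime.types file.
demo_mime=$(mktemp)
printf '# sample\ntext/html html htm\nimage/png png\n' > "$demo_mime"
guess_content_type site/index.html "$demo_mime"   # prints text/html
guess_content_type data.unknownext "$demo_mime"   # prints application/octet-stream
```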

Important Limitations

Eventual Consistency

Due to S3's "eventual consistency" limitations, file creation can and will occasionally fail. Even after a successful create, subsequent reads can fail for an indeterminate time, even after one or more successful reads. Create and read enough files and you will eventually encounter this failure. This is not a flaw in s3fs and it is not something a FUSE wrapper like s3fs can work around. The retries option does not address this issue. Your application must either tolerate or compensate for these failures, for example by retrying creates or reads. For more details, see Eventual Consistency.
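
One way an application or script can compensate is to retry a read a few times before giving up. The following is a minimal sketch of that idea; the helper name, attempt count and delay are all illustrative choices, and the demo reads a scratch file rather than a real mount:

```shell
# Try to read a file up to "attempts" times, sleeping "delay" seconds
# between tries. Prints the file on success, returns 1 on failure.
read_with_retry() {
    path=$1
    attempts=${2:-5}
    delay=${3:-1}
    i=1
    while [ "$i" -le "$attempts" ]; do
        if cat "$path" 2>/dev/null; then
            return 0
        fi
        # Only wait if another attempt will follow.
        [ "$i" -lt "$attempts" ] && sleep "$delay"
        i=$(( i + 1 ))
    done
    echo "giving up on $path after $attempts attempts" >&2
    return 1
}

# Demo on a local scratch file; on a mount the path would be under /mnt.
demo_file=$(mktemp)
printf 'hello\n' > "$demo_file"
read_with_retry "$demo_file" 3 1
```

A fixed delay keeps the sketch short; a growing delay between attempts is a common refinement.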

libcurl version

s3fs runs with libcurl. If you use libcurl built with libnss, s3fs requires libcurl version 7.21.5 or later; with libcurl (with libnss) older than 7.21.5, s3fs leaks memory. You do not need to worry about the libcurl version when libcurl is linked against the OpenSSL library instead of libnss.

Release Notes

The list of older changes is on Google Code; please refer to it for versions before r501.

Limitations

  • server side copies are not possible - due to how FUSE orchestrates the low level instructions, the file must first be downloaded to the client and then uploaded to the new location

ToDo

  • permissions: using -o allow_other, even though files are owned by root 0755, another user can make changes
  • use default_permissions option?!?
  • better error logging for troubleshooting.
  • need to parse response on, say, 403 and 404 errors, etc... and log 'em!

See Also

Here is a list of other Amazon S3 filesystems:

Other tools that combine with s3fs in useful ways:

  • S3Proxy - allows applications using the S3 API to access other object stores, e.g., EMC Atmos, Microsoft Azure, OpenStack Swift