diff --git a/README.rst b/README.rst
index fe6006c9..7d382afc 100644
--- a/README.rst
+++ b/README.rst
@@ -12,11 +12,25 @@ smart_open — utils for streaming large files in Python
What?
=====
-``smart_open`` is a Python 2 & Python 3 library for **efficient streaming of very large files** from/to S3, HDFS, WebHDFS, HTTP, or local storage. It supports transparent, on-the-fly (de-)compression for a variety of different formats.
+``smart_open`` is a Python 2 & Python 3 library for **efficient streaming of very large files** from/to storage services such as S3, HDFS, WebHDFS, HTTP, HTTPS, or the local filesystem. It supports transparent, on-the-fly (de-)compression for a variety of different formats.
``smart_open`` is a drop-in replacement for Python's built-in ``open()``: it can do anything ``open`` can (100% compatible, falls back to native ``open`` wherever possible), plus lots of nifty extra stuff on top.
-``smart_open`` is well-tested, well-documented, and has a simple, Pythonic API:
+
+Why?
+====
+
+Working with large remote files, for example using Amazon's `boto `_ and `boto3 `_ Python libraries, is a pain.
+``boto``'s ``key.set_contents_from_string()`` and ``key.get_contents_as_string()`` methods only work for small files, because they load the entire file into RAM, with no streaming.
+``boto``'s multipart upload functionality, which is needed for large files, is full of nasty hidden gotchas and requires a lot of boilerplate.
+
+``smart_open`` shields you from that. It builds on boto3 and other remote storage libraries, but offers a **clean unified Pythonic API**. The result is less code for you to write and fewer bugs to make.
+
+
+How?
+====
+
+``smart_open`` is well-tested, well-documented, and has a simple Pythonic API:
.. _doctools_before_examples:
@@ -61,7 +75,7 @@ What?
... break
'\n'
-Other examples of URLs that ``smart_open`` accepts::
+Other examples of URIs that ``smart_open`` accepts::
s3://my_bucket/my_key
s3://my_key:my_secret@my_bucket/my_key
@@ -80,6 +94,13 @@ Other examples of URLs that ``smart_open`` accepts::
.. _doctools_after_examples:
+
+Documentation
+=============
+
+Built-in help
+-------------
+
For detailed API info, see the online help:
.. code-block:: python
@@ -88,7 +109,8 @@ For detailed API info, see the online help:
or click `here `__ to view the help in your browser.
-More examples:
+More examples
+-------------
.. code-block:: python
@@ -134,29 +156,6 @@ More examples:
with open('s3://bucket/key.txt', 'wb', transport_params=transport_params) as fout:
fout.write(b'here we stand')
-Why?
-----
-
-Working with large S3 files using Amazon's default Python library, `boto `_ and `boto3 `_, is a pain.
-Its ``key.set_contents_from_string()`` and ``key.get_contents_as_string()`` methods only work for small files (loaded in RAM, no streaming).
-There are nasty hidden gotchas when using ``boto``'s multipart upload functionality that is needed for large files, and a lot of boilerplate.
-
-``smart_open`` shields you from that. It builds on boto3 but offers a cleaner, Pythonic API. The result is less code for you to write and fewer bugs to make.
-
-Installation
-------------
-::
-
- pip install smart_open
-
-Or, if you prefer to install from the `source tar.gz `_::
-
- python setup.py test # run unit tests
- python setup.py install
-
-To run the unit tests (optional), you'll also need to install `mock `_ , `moto `_ and `responses `_ (``pip install mock moto responses``).
-The tests are also run automatically with `Travis CI `_ on every commit push & pull request.
-
Supported Compression Formats
-----------------------------
@@ -185,6 +184,7 @@ For 2.7, use `backports.lzma`_.
.. _backports.lzma: https://pypi.org/project/backports.lzma/
+
Transport-specific Options
--------------------------
@@ -260,6 +260,52 @@ Since going over all (or select) keys in an S3 bucket is a very common operation
annual/monthly_rain/2012.monthly_rain.nc 13
+File-like Binary Streams
+------------------------
+
+The ``open`` function also accepts file-like objects.
+This is useful when you already have a `binary file `_ open, and would like to wrap it with transparent decompression:
+
+
+.. code-block:: python
+
+ >>> import io, gzip
+ >>>
+ >>> # Prepare some gzipped binary data in memory, as an example.
+ >>> # Note that any binary file will do; we're using BytesIO here for simplicity.
+ >>> buf = io.BytesIO()
+ >>> with gzip.GzipFile(fileobj=buf, mode='w') as fout:
+ ... _ = fout.write(b'this is a bytestring')  # discard the return value, so the doctest passes on both Python 2 & 3
+ >>> _ = buf.seek(0)
+ >>>
+ >>> # Use case starts here.
+ >>> buf.name = 'file.gz' # add a .name attribute so smart_open knows what compressor to use
+ >>> import smart_open
+ >>> smart_open.open(buf, 'rb').read() # will gzip-decompress transparently!
+ b'this is a bytestring'
+
+
+In this case, ``smart_open`` relied on the ``.name`` attribute of our `binary I/O stream `_ object ``buf`` to determine which decompressor to use.
+If your file object doesn't have a ``.name`` attribute, set one to an appropriate value.
+Furthermore, that value has to end with a **known** file extension (see the ``register_compressor`` function).
+Otherwise, the transparent decompression will not occur.
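As a sketch of what a custom compressor hook for ``register_compressor`` might look like (the ``.xz`` handler and extension below are purely illustrative; this assumes the handler receives the already-open binary stream and a mode string):

```python
import io
import lzma

def _handle_xz(file_obj, mode):
    # Wrap an already-open binary stream with xz (de)compression.
    return lzma.LZMAFile(filename=file_obj, mode=mode, format=lzma.FORMAT_XZ)

# With smart_open installed, the handler would be registered once:
#   import smart_open
#   smart_open.register_compressor('.xz', _handle_xz)
# after which opening a name ending in '.xz' decompresses transparently.

# The handler itself round-trips data through an in-memory stream:
buf = io.BytesIO()
with _handle_xz(buf, 'wb') as fout:
    fout.write(b'hello from xz')
buf.seek(0)
with _handle_xz(buf, 'rb') as fin:
    data = fin.read()
print(data)  # b'hello from xz'
```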
+
+
+Installation
+============
+::
+
+ pip install smart_open
+
+Or, if you prefer to install from the `source tar.gz `_::
+
+ python setup.py test # run unit tests
+ python setup.py install
+
+To run the unit tests (optional), you'll also need to install `mock `_, `moto `_ and `responses `_ (``pip install mock moto responses``).
+The tests are also run automatically with `Travis CI `_ on every commit push & pull request.
+
+
Migrating to the new ``open`` function
--------------------------------------
@@ -294,13 +340,17 @@ Before:
.. code-block:: python
+ >>> import smart_open
>>> smart_open.smart_open('s3://commoncrawl/robots.txt').read(32) # 'rb' used to be default
+ b'User-Agent: *\nDisallow: /'
After:
.. code-block:: python
+ >>> import smart_open
>>> smart_open.open('s3://commoncrawl/robots.txt', 'rb').read(32)
+ b'User-Agent: *\nDisallow: /'
The ``ignore_extension`` keyword parameter is now called ``ignore_ext``.
It behaves identically otherwise.
@@ -312,7 +362,7 @@ transport layer, e.g. HTTP, S3, etc. The old function accepted these directly:
>>> url = 's3://smart-open-py37-benchmark-results/test.txt'
>>> session = boto3.Session(profile_name='smart_open')
- >>> smart_open(url, 'r', session=session).read(32)
+ >>> smart_open.smart_open(url, 'r', session=session).read(32)
'first line\nsecond line\nthird lin'
The new function accepts a ``transport_params`` keyword argument. It's a dict.
@@ -335,14 +385,14 @@ Removed parameters:
- ``profile_name``
**The profile_name parameter has been removed.**
-Pass an entire boto3.Session object instead.
+Pass an entire ``boto3.Session`` object instead.
Before:
.. code-block:: python
>>> url = 's3://smart-open-py37-benchmark-results/test.txt'
- >>> smart_open(url, 'r', profile_name='smart_open').read(32)
+ >>> smart_open.smart_open(url, 'r', profile_name='smart_open').read(32)
'first line\nsecond line\nthird lin'
After:
@@ -361,7 +411,7 @@ If you pass an invalid parameter name, the ``smart_open.open`` function will war
Keep an eye on your logs for WARNING messages from ``smart_open``.
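To make those warnings visible, it's enough to configure the standard ``logging`` module; a minimal sketch (the logger name ``'smart_open'`` is an assumption, following the convention that libraries log under their package name):

```python
import logging

# Route WARNING-and-above messages to stderr, so that smart_open's
# warnings about unrecognised parameters show up in the console.
logging.basicConfig(level=logging.WARNING)

# Libraries conventionally log under their package name; the exact
# logger name here is an assumption, not part of the documented API.
logger = logging.getLogger('smart_open')
logger.warning('unsupported keyword argument ignored: %r', 'profile_name')
```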
Comments, bug reports
----------------------
+=====================
``smart_open`` lives on `Github `_. You can file
issues or pull requests there. Suggestions, pull requests and improvements welcome!