6.0.0

@mpenkov mpenkov released this 24 Apr 06:40

6.0.0, 24 April 2022

This release removes the old, deprecated ignore_ext parameter.
Use the compression parameter instead.

fin = smart_open.open("/path/file.gz", ignore_ext=True)  # 🚫 No
fin = smart_open.open("/path/file.gz", compression="disable")  # Yes

fin = smart_open.open("/path/file.gz", ignore_ext=False)  # 🚫 No
fin = smart_open.open("/path/file.gz")  # Yes
fin = smart_open.open("/path/file.gz", compression="infer_from_extension")  # Yes, if you want to be explicit

fin = smart_open.open("/path/file", compression=".gz")  # Yes
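For code that builds keyword arguments programmatically, the mapping above can be sketched as a small translation step. This is a minimal sketch, not part of smart_open: the helper name `migrate_open_kwargs` is hypothetical, and it assumes the only legacy values in use are `ignore_ext=True` and `ignore_ext=False`.

```python
def migrate_open_kwargs(kwargs):
    """Return a copy of kwargs with ignore_ext translated to compression.

    Hypothetical helper for illustration only; smart_open does not ship it.
    """
    new_kwargs = dict(kwargs)
    if "ignore_ext" not in new_kwargs:
        # Nothing to migrate.
        return new_kwargs

    ignore_ext = new_kwargs.pop("ignore_ext")
    if "compression" in new_kwargs:
        raise ValueError("pass either ignore_ext or compression, not both")

    # ignore_ext=True  -> compression="disable"
    # ignore_ext=False -> compression="infer_from_extension"
    new_kwargs["compression"] = "disable" if ignore_ext else "infer_from_extension"
    return new_kwargs
```

The result can then be passed on as `smart_open.open(path, **migrate_open_kwargs(old_kwargs))`.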
  • Make Python 3.7 the required minimum (PR #688, @mpenkov)
  • Drop deprecated ignore_ext parameter (PR #661, @mpenkov)
  • Drop support for passing buffers to smart_open.open (PR #660, @mpenkov)
  • Support working directly with file descriptors (PR #659, @mpenkov)
  • Added support for viewfs:// URLs (PR #665, @ChandanChainani)
  • Fix AttributeError when reading passthrough zstandard (PR #658, @mpenkov)
  • Make UploadFailedError picklable (PR #689, @birgerbr)
  • Support container client and blob client for azure blob storage (PR #652, @cbare)
  • Pin google-cloud-storage to >=1.31.1 in extras (PR #687, @PLPeeters)
  • Expose certain transport-specific methods e.g. to_boto3 in top layer (PR #664, @mpenkov)
  • Use pytest instead of parameterizedtestcase (PR #657, @mpenkov)

5.2.1, 28 August 2021

5.2.0, 18 August 2021

5.1.0, 25 May 2021

This release introduces a new top-level parameter: compression.
It controls compression behavior and partially overlaps with the old ignore_ext parameter.
For details, see the README.rst file.
You may continue to use the ignore_ext parameter for now, but it will be deprecated in the next major release.

5.0.0, 30 Mar 2021

This release modifies the handling of transport parameters for the S3 back-end in a backwards-incompatible way.
See the migration docs for details.

  • Refactor S3, replace high-level resource/session API with low-level client API (PR #583, @mpenkov)
  • Fix potential infinite loop when reading from webhdfs (PR #597, @traboukos)
  • Add timeout parameter for http/https (PR #594, @dustymugs)
  • Remove tests directory from package (PR #589, @e-nalepa)

4.2.0, 15 Feb 2021

  • Support tell() for text mode write on s3/gcs/azure (PR #582, @markopy)
  • Implement option to use a custom buffer during S3 writes (PR #547, @mpenkov)

4.1.2, 18 Jan 2021

  • Correctly pass boto3 resource to writers (PR #576, @jackluo923)
  • Improve robustness of S3 reading (PR #552, @mpenkov)
  • Replace codecs with TextIOWrapper to fix newline issues when reading text files (PR #578, @markopy)
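The changelog does not say which newline issue PR #578 addressed, but one visible difference between the two standard-library approaches is newline translation. The sketch below uses only `codecs` and `io`, not smart_open internals, and the sample bytes are made up for illustration.

```python
import codecs
import io

# Sample data with one Windows-style and one Unix-style line ending.
raw = b"first\r\nsecond\n"

# A codecs stream reader decodes the bytes but leaves line endings untouched.
codecs_lines = codecs.getreader("utf-8")(io.BytesIO(raw)).readlines()

# TextIOWrapper applies universal-newline translation by default (newline=None),
# which matches what the built-in open() does in text mode.
wrapper_lines = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8").readlines()

print(codecs_lines)   # ['first\r\n', 'second\n']
print(wrapper_lines)  # ['first\n', 'second\n']
```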

4.1.0, 30 Dec 2020

  • Refactor s3 submodule to minimize resource usage (PR #569, @mpenkov)
  • Change download_as_string to download_as_bytes in gcs submodule (PR #571, @alexandreyc)

4.0.1, 27 Nov 2020

  • Exclude requests from install_requires dependency list.
    If you need it, use pip install smart_open[http] or pip install smart_open[webhdfs].

4.0.0, 24 Nov 2020

  • Fix reading empty file or seeking past end of file for s3 backend (PR #549, @jcushman)
  • Fix handling of rt/wt mode when working with gzip compression (PR #559, @mpenkov)
  • Bump minimum Python version to 3.6 (PR #562, @mpenkov)

3.0.0, 8 Oct 2020

This release modifies the behavior of setup.py with respect to dependencies.
Previously, boto3 and other AWS-related packages were installed by default.
Now, in order to install them, you need to run either:

pip install smart_open[s3]

to install the AWS dependencies only, or

pip install smart_open[all]

to install all dependencies, including AWS, GCS, etc.
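Because the optional dependencies are no longer guaranteed to be present, an application may want to check for them before opening a remote URL. The following is a minimal sketch using only the standard library. The mapping from extras to module names is an assumption made for illustration, and `is_importable` and `missing_extras` are hypothetical helpers, not part of smart_open.

```python
import importlib.util

# Assumed mapping from an extra to a module that the extra provides.
EXTRA_TO_MODULE = {
    "s3": "boto3",
    "gcs": "google.cloud.storage",
}


def is_importable(module_name):
    """Return True if module_name can be located without fully importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec raises if a parent package of a dotted name is missing.
        return False


def missing_extras(mapping=EXTRA_TO_MODULE):
    """Return the sorted names of extras whose module is not importable."""
    return sorted(
        extra for extra, module in mapping.items() if not is_importable(module)
    )
```

If `missing_extras()` returns `["s3"]`, for example, running `pip install smart_open[s3]` should resolve it.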

2.2.1, 1 Oct 2020

  • Include S3 dependencies by default, because removing them in the 2.2.0 minor release was a mistake.

2.2.0, 25 Sep 2020

This release modifies the behavior of setup.py with respect to dependencies.
Previously, boto3 and other AWS-related packages were installed by default.
Now, in order to install them, you need to run either:

pip install smart_open[s3]

to install the AWS dependencies only, or

pip install smart_open[all]

to install all dependencies, including AWS, GCS, etc.

Summary of changes:

  • Correctly pass newline parameter to built-in open function (PR #478, @burkovae)
  • Remove boto as a dependency (PR #523, @isobit)
  • Performance improvement: avoid redundant GetObject API queries in s3.Reader (PR #495, @jcushman)
  • Support installing smart_open without AWS dependencies (PR #534, @justindujardin)
  • Take object version into account in to_boto3 method (PR #539, @interpolatio)

Deprecations

Functionality on the left hand side will be removed in future releases.
Use the functions on the right hand side instead.

  • smart_open.s3_iter_bucket → smart_open.s3.iter_bucket

2.1.1, 27 Aug 2020

  • Bypass unnecessary GCS storage.buckets.get permission (PR #516, @gelioz)
  • Allow SFTP connection with SSH key (PR #522, @rostskadat)

2.1.0, 1 July 2020

2.0.0, 27 April 2020, "Python 3"

  • This version supports Python 3 only (3.5+).
    • If you still need Python 2, install the smart_open==1.10.1 legacy release instead.
  • Prevent smart_open from writing to logs on import (PR #476, @mpenkov)
  • Modify setup.py to explicitly support only Py3.5 and above (PR #471, @Amertz08)
  • Include all the test_data in setup.py (PR #473, @sikuan)

1.10.1, 26 April 2020

  • This is the last version to support Python 2.7. Versions 1.11 and above will support Python 3 only.
  • Use only if you need Python 2.

1.11.1, 8 Apr 2020

  • Add missing boto dependency (Issue #468)

1.11.0, 8 Apr 2020

Starting with this release, you will have to run:

pip install smart_open[gcs]

to use the GCS transport.

In the future, all extra dependencies will be optional. If you want to continue installing all of them, use:

pip install smart_open[all]

See the README.rst for details.

1.10.0, 16 Mar 2020

1.9.0, 3 Nov 2019

1.8.4, 2 Jun 2019

1.8.3, 26 April 2019

1.8.2, 17 April 2019

  • Removed dependency on lzma (PR #262, @tdhopper)
  • backward compatibility fixes (PR #294, @mpenkov)
  • Minor fixes (PR #291, @mpenkov)
  • Fix #289: the smart_open package now correctly exposes a __version__ attribute
  • Fix #285: handle edge case with question marks in an S3 URL

This release rolls back support for transparently decompressing .xz files,
previously introduced in 1.8.1. This is a useful feature, but it requires a
tricky dependency. It's still possible to handle .xz files with relatively
little effort. Please see the README.rst file for details.

1.8.1, 6 April 2019

smart_open.open

This new function replaces smart_open.smart_open, which is now deprecated.
Main differences:

  • ignore_extension → ignore_ext
  • new transport_params dict parameter to contain keyword parameters for the transport layer (S3, HTTPS, HDFS, etc).

Main advantages of the new function:

  • Simpler interface for the user, fewer parameters
  • Greater API flexibility: adding additional keyword arguments will no longer require updating the top-level interface
  • Better documentation for keyword parameters (previously, they were documented via examples only)

The old smart_open.smart_open function is deprecated, but continues to work as previously.
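The flexibility argument above can be illustrated with a toy dispatcher. This is a sketch of the general design idea only, not smart_open's actual code: `toy_open`, the `_open_*` functions, and their parameters (`buffer_size`, `client`, `timeout`, `headers`) are all hypothetical names chosen for the example.

```python
from urllib.parse import urlsplit


def _open_s3(uri, mode, buffer_size=1024, client=None):
    # Stand-in for a real transport; returns what it received.
    return ("s3", uri, mode, buffer_size, client)


def _open_http(uri, mode, timeout=None, headers=None):
    # Stand-in for a real transport; returns what it received.
    return ("http", uri, mode, timeout, headers)


# Each URL scheme maps to the function that knows how to open it.
_TRANSPORTS = {"s3": _open_s3, "http": _open_http, "https": _open_http}


def toy_open(uri, mode="r", transport_params=None):
    """Dispatch on the URL scheme and forward transport_params untouched."""
    scheme = urlsplit(uri).scheme
    try:
        transport = _TRANSPORTS[scheme]
    except KeyError:
        raise NotImplementedError(f"unsupported scheme: {scheme!r}") from None
    # The top-level signature never needs to change when a transport
    # gains a new keyword: it simply travels inside the dict.
    return transport(uri, mode, **(transport_params or {}))
```

Adding a new keyword to `_open_s3` requires no change to `toy_open`, which is the property the list above describes.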

1.8.0, 17th January 2019

1.7.1, 18th September 2018

1.7.0, 18th September 2018

1.6.0, 29th June 2018

  • Migrate to boto3. Fix #43 (PR #164, @mpenkov)
  • Refactoring smart_open to share compression and encoding functionality (PR #185, @mpenkov)
  • Drop python2.6 compatibility. Fix #156 (PR #192, @mpenkov)
  • Accept a custom boto3.Session instance (support STS AssumeRole). Fix #130, #149, #199 (PR #201, @eschwartz)
  • Accept multipart_upload parameters (supports ServerSideEncryption) for S3. Fix (PR #202, @eschwartz)
  • Add support for pathlib.Path. Fix #170 (PR #175, @clintval)
  • Fix performance regression using local file-system. Fix #184 (PR #190, @mpenkov)
  • Replace ParsedUri class with functions, cleanup internal argument parsing (PR #191, @mpenkov)
  • Handle edge case (read 0 bytes) in read function. Fix #171 (PR #193, @mpenkov)
  • Fix bug with changing f._current_pos when calling f.readline() (PR #182, @inksink)
  • Close the old body explicitly after seek for S3. Fix #187 (PR #188, @inksink)

1.5.7, 18th March 2018

  • Fix author/maintainer fields in setup.py, avoid bug from setuptools==39.0.0 and add workaround for botocore and python==3.3. Fix #176 (PR #178 & #177, @menshikh-iv & @baldwindc)

1.5.6, 28th December 2017

1.5.5, 6th December 2017

1.5.4, 30th November 2017

1.5.3, 18th May 2017

1.5.2, 12th Apr 2017

1.5.1, 16th Mar 2017

1.5.0, 14th Mar 2017

1.4.0, 13th Feb 2017

  • HdfsOpenWrite implementation similar to read (PR #106, @skibaa)
  • Support custom S3 server host, port, ssl. (PR #101, @robottwo)
  • Add retry around s3_iter_bucket_process_key to address S3 Read Timeout errors. (PR #96, @bbbco)
  • Include tests data in sdist + install them. (PR #105, @cournape)

1.3.5, 5th October 2016

  • Add MANIFEST.in required for conda-forge recipe (PR #90, @tmylk)

1.3.4, 26th August 2016

  • Relative path support (PR #73, @yupbank)
  • Move gzipstream module to smart_open package (PR #81, @mpenkov)
  • Ensure reader objects never return None (PR #81, @mpenkov)
  • Ensure read functions never return more bytes than asked for (PR #84, @mpenkov)
  • Add support for reading gzipped objects until EOF, e.g. read() (PR #81, @mpenkov)
  • Add missing parameter to read_from_buffer call (PR #84, @mpenkov)
  • Add unit tests for gzipstream (PR #84, @mpenkov)
  • Bundle gzipstream to enable streaming of gzipped content from S3 (PR #73, @mpenkov)
  • Update gzipstream to avoid deep recursion (PR #73, @mpenkov)
  • Implemented readline for S3 (PR #73, @mpenkov)
  • Added pip requirements.txt (PR #73, @mpenkov)
  • Invert NO_MULTIPROCESSING flag (PR #79, @Janrain-Colin)
  • Add ability to add query to webhdfs uri. (PR #78, @ellimilial)

1.3.3, 16th May 2016

  • Accept an instance of boto.s3.key.Key to smart_open (PR #38, @asieira)
  • Allow passing encrypt_key and other parameters to initiate_multipart_upload (PR #63, @asieira)
  • Allow passing boto host and profile_name to smart_open (PR #71 #68, @robcowie)
  • Write an empty key to S3 even if nothing is written to S3OpenWrite (PR #61, @petedmarsh)
  • Support LC_ALL=C environment variable setup (PR #40, @nikicc)
  • Python 3.5 support

1.3.2, 3rd January 2016

  • Bug fix release to enable 'wb+' file mode (PR #50)

1.3.1, 18th December 2015

  • Disable multiprocessing if unavailable. Allows running on Google Compute Engine. (PR #41, @nikicc)
  • Httpretty updated to allow LC_ALL=C locale config. (PR #39, @jsphpl)
  • Accept an instance of boto.s3.key.Key (PR #38, @asieira)

1.3.0, 19th September 2015

  • WebHDFS read/write (PR #29, @ziky90)
  • re-upload last S3 chunk in failed upload (PR #20, @andreycizov)
  • return the entire key in s3_iter_bucket instead of only the key name (PR #22, @salilb)
  • pass optional keywords on S3 write (PR #30, @val314159)
  • smart_open is a no-op if passed a file-like object with a read attribute (PR #32, @gojomo)
  • various improvements to testing (PR #30, @val314159)

1.1.0, 1st February 2015

  • support for multistream bzip files (PR #9, @pombredanne)
  • introduce this CHANGELOG