Changes

Version 1.7.4

  • Use Python 3.6 instead of 2.7 for deployment on Travis CI. No changes to the Python code. By juarezr, 550.

Version 1.7.3

  • Fixed compatibility with SQLAlchemy 1.4, which removed the Engine.contextual_connect method. By juarezr, 545.
  • How to use convert with a custom function and a reference row. By javidy, 542.

Version 1.7.2

  • Allow aggregation over the entire table (without a key). By bmaggard, 541.
  • Allow specifying the output field name for simple aggregation. By bmaggard, 370.
  • Bumped version of package dependency on lxml from 4.4.0 to 4.6.2 By juarezr, 536.

Version 1.7.1

  • Fixing conda packaging failures. By juarezr, 534.

Version 1.7.0

  • Added toxml() as convenience wrapper over totext(). By juarezr, 529.
  • Document behavior of multi-field convert-with-row. By chrullrich, 532.
  • Allow user-defined sources from fsspec for remote I/O (see io_remotes). By juarezr, 533.

Version 1.6.8

  • Allow using a custom/restricted xml parser in fromxml(). By juarezr, 527.

Version 1.6.7

  • Greatly reduced the memory footprint for JSONL files. By fahadsiddiqui, 522.

Version 1.6.6

  • Added Python versions 3.8 and 3.9 to tox.ini for use in newer distros. By juarezr, 517.
  • Fixed compatibility with Python 3.8 in petl.timings.clock(). By juarezr, 484.
  • Added JSON Lines support in fromjson(). By fahadsiddiqui, 521.

Version 1.6.5

  • Fixed crashes in fromxlsx() when read_only is used. By juarezr, 514.

Version 1.6.4

  • Fixed exception when writing to S3 with fsspec auto_mkdir=True. By juarezr, 512.

Version 1.6.3

  • Allowed reading and writing Excel files in remote sources. By juarezr, 506.
  • Allow toxlsx() to add or replace a worksheet. By chrullrich, 502.
  • Improved Avro error messages on schema or data mismatch. By juarezr, 507.
  • Fixed the build for a failing test case. By juarezr, 508.

Version 1.6.2

  • Fixed boolean type detection in toavro(). By juarezr, 504.
  • Fixed an unavoidable warning when fsspec is installed but some optional package is not. By juarezr, 503.

Version 1.6.1

  • Added extras_require for the petl pip package. By juarezr, 501.
  • Fixed an unavoidable warning when fsspec is not installed. By juarezr, 500.

Version 1.6.0

  • Added class petl.io.remotes.RemoteSource, which uses the fsspec package to read and write files on remote servers, selecting the implementation from the protocol in the URL. By juarezr, 494.
  • Removed class petl.io.source.s3.S3Source, as this is now handled by fsspec. By juarezr, 494.
  • Removed classes petl.io.codec.xz.XZCodec, petl.io.codec.xz.LZ4Codec and petl.io.codec.zstd.ZstandardCodec, as these are now handled by fsspec. By juarezr, 494.
  • Fixed a bug when connecting to a JDBC database using jaydebeapi. By miguelosana, 497.

Version 1.5.0

  • Added functions petl.io.sources.register_reader and petl.io.sources.register_writer for registering custom source helpers that handle I/O from remote protocols. By juarezr, 491.
  • Added function petl.io.sources.register_codec for registering custom helpers for compressing and decompressing files with other algorithms. By juarezr, 491.
  • Added classes petl.io.codec.xz.XZCodec, petl.io.codec.xz.LZ4Codec and petl.io.codec.zstd.ZstandardCodec for compressing files with XZ and the state-of-the-art LZ4 and Zstandard algorithms. By juarezr, 491.
  • Added classes petl.io.source.s3.S3Source and petl.io.source.smb.SMBSource for reading and writing files on remote servers using the s3:// and smb:// URL protocols. By juarezr, 491.

Version 1.4.0

  • Added functions petl.io.avro.fromavro, petl.io.avro.toavro and petl.io.avro.appendavro for reading and writing Apache Avro files (https://avro.apache.org/docs/current/spec.html). Avro is generally faster and safer than text formats like JSON, XML or CSV. By juarezr, 490.

Version 1.3.0

Note

The parameters to the petl.io.xlsx.fromxlsx function have changed in this release. The parameters row_offset and col_offset are no longer supported. Please use min_row, min_col, max_row and max_col instead.

  • A new configuration option failonerror has been added to the petl.config module. This option affects various transformation functions including petl.transform.conversions.convert, petl.transform.maps.fieldmap, petl.transform.maps.rowmap and petl.transform.maps.rowmapmany. The option can have values True (raise any exceptions encountered during conversion), False (silently use a given errorvalue if any exceptions arise during conversion) or "inline" (use any exceptions as the output value). The default value is False, which maintains compatibility with previous releases. By bmaggard, 460, 406, 365.
  • A new function petl.util.timing.log_progress has been added, which behaves in a similar way to petl.util.timing.progress but writes to a Python logger. By dusktreader, 408, 407.
  • Added new function petl.transform.regex.splitdown for splitting a value into multiple rows. By John-Dennert, 430, 386.
  • Added new function petl.transform.basics.addfields to add multiple new fields at a time. By mjumbewu, 417.
  • Pass through keyword arguments to xlrd.open_workbook. By gjunqueira, 470, 473.
  • Added new function petl.io.xlsx.appendxlsx. By victormpa and alimanfoo, 424, 475.
  • Fixes for upstream API changes in openpyxl and intervaltree modules. N.B., the arguments to petl.io.xlsx.fromxlsx have changed for specifying row and column offsets to match openpyxl. (472 - alimanfoo).
  • Exposed read_only argument in petl.io.xlsx.fromxlsx and set default to False to prevent truncation of files created by LibreOffice. By mbelmadani, 457.
  • Added support for reading from remote sources with gzip or bz2 compression (463 - H-Max).
  • The function petl.transform.dedup.distinct has been fixed for the case where None values appear in the table. By bmaggard, 414, 412.
  • Changed keyed sorts so that comparisons are only by keys. By DiegoEPaez, 466.
  • Documentation improvements by gamesbook (458).

Version 1.2.0

Please note that this version drops support for Python 2.6 (443, 444 - hugovk).

  • Function petl.transform.basics.addrownumbers now supports a "field" argument to allow specifying the name of the new field to be added (366, 367 - thatneat).
  • Fix to petl.io.xlsx.fromxlsx to ensure that the underlying workbook is closed after iteration is complete (387 - mattkatz).
  • Resolve compatibility issues with newer versions of openpyxl (393, 394 - henryrizzi).
  • Fix deprecation warnings from openpyxl (447, 445 - scardine; 449 - alimanfoo).
  • Changed exceptions to use standard exception classes instead of ArgumentError (396 - bmaggard).
  • Add support for non-numeric quoting in CSV files (377, 378 - vilos).
  • Fix bug in handling of mode in MemorySource (403 - bmaggard).
  • Added a get() method to the Record class (401, 402 - dusktreader).
  • Added ability to make constraints optional, i.e., support validation on optional fields (399, 400 - dusktreader).
  • Added support for CSV files without a header row (421 - LupusUmbrae).
  • Documentation fixes (379 - DeanWay; 381 - PabloCastellano).

Version 1.1.0

  • Fixed petl.transform.reshape.melt to work with non-string key argument (#209).
  • Added example to docstring of petl.transform.dedup.conflicts to illustrate how to analyse the source of conflicts when rows are merged from multiple tables (#256).
  • Added functions for working with bcolz ctables, see petl.io.bcolz (#310).
  • Added petl.io.base.fromcolumns (#316).
  • Added petl.transform.reductions.groupselectlast. (#319).
  • Added example in docstring for petl.io.sources.MemorySource (#323).
  • Added function petl.transform.basics.stack as a simpler alternative to petl.transform.basics.cat. Also behaviour of petl.transform.basics.cat has changed for tables where the header row contains duplicate fields. This was part of addressing a bug in petl.transform.basics.addfield for tables where the header contains duplicate fields (#327).
  • Change in behaviour of petl.io.json.fromdicts to preserve ordering of keys if ordered dicts are used. Also added petl.transform.headers.sortheader to deal with unordered cases (#332).
  • Added keyword strict to functions in the petl.transform.setops module to enable users to enforce strict set-like behaviour if desired (#333).
  • Added epilogue argument to petl.util.vis.display to enable further customisation of content of table display in Jupyter notebooks (#337).
  • Added petl.transform.selects.biselect as a convenience for obtaining two tables, one with rows matching a condition, the other with rows not matching the condition (#339).
  • Changed petl.io.json.fromdicts to avoid making two passes through the data (#341).
  • Changed petl.transform.basics.addfieldusingcontext to enable running calculations (#343).
  • Fix behaviour of join functions when tables have no non-key fields (#345).
  • Fix incorrect default value for 'errors' argument when using codec module (#347).
  • Added some documentation on how to write extension classes, see intro (#349).
  • Fix issue with unicode field names (#350).

Version 1.0

Version 1.0 is a new major release of petl. The main purpose of version 1.0 is to introduce support for Python 3.4, in addition to the existing support for Python 2.6 and 2.7. Much of the functionality available in petl versions 0.x has remained unchanged in version 1.0, and most existing code that uses petl should work unchanged with version 1.0 or with minor changes. However there have been a number of API changes, and some functionality has been migrated from the petlx package, described below.

If you have any questions about migrating to version 1.0 or find any problems or issues please email python-etl@googlegroups.com.

Text file encoding

Version 1.0 unifies the API for working with ASCII and non-ASCII encoded text files, including CSV and HTML.

The following functions now accept an 'encoding' argument, which defaults to the value of locale.getpreferredencoding() (usually 'utf-8'): fromcsv, tocsv, appendcsv, teecsv, fromtsv, totsv, appendtsv, teetsv, fromtext, totext, appendtext, tohtml, teehtml.

The following functions have been removed as they are now redundant: fromucsv, toucsv, appenducsv, teeucsv, fromutsv, toutsv, appendutsv, teeutsv, fromutext, toutext, appendutext, touhtml, teeuhtml.

To migrate code, in most cases it should be possible to simply replace 'fromucsv' with 'fromcsv', etc.

petl.fluent and petl.interactive

The functionality previously available through the petl.fluent and petl.interactive modules is now available through the root petl module.

This means two things.

First, it is now possible to use either functional or fluent (i.e., object-oriented) styles of programming with the root petl module, as described in the introductory section on intro_programming_styles.

Second, the default representation of table objects uses the petl.util.vis.look function, so you can simply return a table from the prompt to inspect it, as described in the introductory section on intro_interactive_use.

The petl.fluent and petl.interactive modules have been removed as they are now redundant.

To migrate code, it should be possible to simply replace "import petl.fluent as etl" or "import petl.interactive as etl" with "import petl as etl".

Note that the automatic caching behaviour of the petl.interactive module has not been retained. If you want to enable caching behaviour for a particular table, make an explicit call to the petl.util.materialise.cache function. See also intro_caching.

IPython notebook integration

In version 1.0, petl table container objects implement _repr_html_(), so they can be returned from a cell in an IPython notebook and will automatically be formatted as an HTML table.

Also, the petl.util.vis.display and petl.util.vis.displayall functions have been migrated across from the petlx.ipython package. If you are working within the IPython notebook these functions give greater control over how tables are rendered. For some examples, see:

http://nbviewer.ipython.org/github/petl-developers/petl/blob/v1.0/repr_html.ipynb

Database extract/load functions

The petl.io.db.todb function now supports automatic table creation, inferring a schema from data in the table to be loaded. This functionality has been migrated across from the petlx package, and requires SQLAlchemy to be installed.

The functions fromsqlite3, tosqlite3 and appendsqlite3 have been removed as they duplicate functionality available from the existing functions petl.io.db.fromdb, petl.io.db.todb and petl.io.db.appenddb. These existing functions have been modified so that if a string is provided as the dbo argument it is interpreted as the name of an sqlite3 file. It should be possible to migrate code by simply replacing 'fromsqlite3' with 'fromdb', etc.

Other functions removed or renamed

The following functions have been removed because they are overly complicated and/or hardly ever used. If you use any of these functions and would like to see them re-instated then please email python-etl@googlegroups.com: rangefacet, rangerowreduce, rangeaggregate, rangecounts, multirangeaggregate, lenstats.

The following functions were marked as deprecated in petl 0.x and have been removed in version 1.0: dataslice (use data instead), fieldconvert (use convert instead), fieldselect (use select instead), parsenumber (use numparser instead), recordmap (use rowmap instead), recordmapmany (use rowmapmany instead), recordreduce (use rowreduce instead), recordselect (use rowselect instead), valueset (use table.values('foo').set() instead).

The following functions are no longer available in the root petl namespace, but are still available from a subpackage if you really need them: iterdata (use data instead), iterdicts (use dicts instead), iternamedtuples (use namedtuples instead), iterrecords (use records instead), itervalues (use values instead).

The following functions have been renamed: isordered (renamed to issorted), StringSource (renamed to MemorySource).

The function selectre has been removed as it duplicates functionality, use search instead.

Sorting and comparison

A major difference between Python 2 and Python 3 involves comparison and sorting of objects of different types. Python 3 is a lot stricter about what you can compare with what, e.g., None < 1 < 'foo' works in Python 2.x but raises an exception in Python 3. The strict comparison behaviour of Python 3 is generally a problem for typical usages of petl, where data can be highly heterogeneous and a column in a table may have a mixture of values of many different types, including None for missing.

To maintain the usability of petl in this type of scenario, and to ensure that the behaviour of petl is as consistent as possible across different Python versions, the petl.transform.sorts.sort function and anything that depends on it (as well as any other functions making use of rich comparisons) emulate the relaxed comparison behaviour that is available under Python 2.x. In fact petl goes further than this, allowing comparison of a wider range of types than is possible under Python 2.x (e.g., datetime with None).

As the underlying code to achieve this has been completely reworked, there may be inconsistencies or unexpected behaviour, so it is worth carefully testing the results of any code previously run using petl 0.x, especially if you are also migrating from Python 2 to Python 3.

The different comparison behaviour under different Python versions may also give unexpected results when selecting rows of a table. E.g., the following will work under Python 2.x but raise an exception under Python 3.4:

>>> import petl as etl
>>> table = [['foo', 'bar'],
...          ['a', 1],
...          ['b', None]]
>>> # raises exception under Python 3
... etl.select(table, 'bar', lambda v: v > 0)

To get the more relaxed behaviour under Python 3.4, use the petl.transform.selects.selectgt function, or wrap values with petl.comparison.Comparable, e.g.:

>>> # works under Python 3
... etl.selectgt(table, 'bar', 0)
+-----+-----+
| foo | bar |
+=====+=====+
| 'a' |   1 |
+-----+-----+

>>> # or ...
... etl.select(table, 'bar', lambda v: v > etl.Comparable(0))
+-----+-----+
| foo | bar |
+=====+=====+
| 'a' |   1 |
+-----+-----+

New extract/load modules

Several new extract/load modules have been added, migrating functionality previously available from the petlx package:

  • io_xls
  • io_xlsx
  • io_numpy
  • io_pandas
  • io_pytables
  • io_whoosh

These modules all depend on third-party packages, but these are optional dependencies and are not required for installing petl.

New validate function

A new petl.transform.validation.validate function has been added to provide a convenient interface when validating a table against a set of constraints.

New intervals module

A new module has been added providing transformation functions based on intervals, migrating functionality previously available from the petlx package:

  • transform_intervals

This module requires the intervaltree module.

New configuration module

All configuration variables have been brought together into a new petl.config module. See the source code for the available variables; they should be self-explanatory.

petl.push moved to petlx

The petl.push module remains in an experimental state and has been moved to the petlx extensions project.

Argument names and other minor changes

Argument names for a small number of functions have been changed to create consistency across the API.

There are some other minor changes as well. If you are migrating from petl version 0.x the best thing is to run your code and inspect any errors. Email python-etl@googlegroups.com if you have any questions.

Source code reorganisation

The source code has been substantially reorganised. However, this should not affect users of the petl package, as all functions in the public API are available through the root petl namespace.