Merge pull request #468 from sciris/rc2.2.0
Version 2.2.0
cliffckerr committed Apr 14, 2023
2 parents 6875f42 + 7294a3b commit b96a663
Showing 45 changed files with 4,761 additions and 2,342 deletions.
102 changes: 102 additions & 0 deletions CHANGELOG.rst
@@ -5,6 +5,108 @@ All major updates to Sciris are documented here.

By import convention, components of the Sciris library are listed beginning with ``sc.``, e.g. ``sc.odict()``.


Version 2.2.0 (2023-04-14)
--------------------------

This version's major changes include:

#. **New Parallel class**: A new ``sc.Parallel()`` class allows finer-grained management of parallel processes, including automatic progress bars, better exception handling, and asynchronous running.
#. **Better versioning**: New functions ``sc.metadata()``, ``sc.savewithmetadata()``, and ``sc.loadwithmetadata()`` make it easier to save and load metadata along with objects.
#. **Easier imports**: ``sc.importbypath()`` lets you load a module into Python by providing the folder or filename (useful for loading one-off scripts, or two versions of the same library).
#. **Better documentation**: A comprehensive set of tutorials has been added to the documentation, and the documentation has been rewritten in a new style.


Improvements and new features
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Parallelization
^^^^^^^^^^^^^^^
#. There is a new ``sc.Parallel`` class, which is used to implement the (more or less unchanged) ``sc.parallelize()`` function.
#. ``sc.parallelize()`` now has a ``progress`` argument that will show a progress bar; the ``returnpool`` argument has been removed (use ``sc.Parallel`` instead).


Dataframe
^^^^^^^^^
#. Better handling (and preservation) of ``dtypes`` for dataframe columns, including a new ``df.set_dtypes()`` method.
#. Better implementation of underlying logic, leading to significant performance increases in some cases (e.g., iteratively appending rows).

Time/date
^^^^^^^^^
#. New ``sc.time()`` alias for ``time.time()``.
#. Support for ``pandas`` and ``Numpy`` datetime objects.
#. New ``sc.timer`` attributes and methods: ``rawtimings``, ``sum()``, ``min()``, ``max()``, ``mean()``, ``std()``.
#. More accurate computation of self-time in ``sc.timedsleep()``.

Files
^^^^^
#. A new function ``sc.unzip()`` extracts zip files to disk, while ``sc.loadzip()`` now defaults to loading the zip file contents to memory.
#. If a saved file can't be unpickled, ``sc.load()`` now defaults to using ``dill``, and has more robust error handling (see also "versioning" updates below).
#. ``sc.makefilepath()`` now defaults to ``makedirs=False``.
#. File save functions now make new subfolders by default.
#. ``sc.save()`` now has an ``allow_empty`` argument (instead of ``die='never'``).
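
The difference between extracting a zip file to disk and loading its contents into memory can be sketched with the standard library alone. This is only an illustration of the two behaviors described for ``sc.unzip()`` and ``sc.loadzip()``, not the Sciris implementation; the function names ``unzip_to_disk`` and ``load_zip_to_memory`` are hypothetical.

```python
import zipfile
from pathlib import Path


def load_zip_to_memory(zip_path):
    """Return a dict mapping each archived file name to its raw bytes."""
    with zipfile.ZipFile(zip_path) as zf:
        # Skip directory entries, which end with a slash
        return {name: zf.read(name) for name in zf.namelist() if not name.endswith("/")}


def unzip_to_disk(zip_path, out_dir):
    """Extract every member into out_dir and return the extracted file paths."""
    out_dir = Path(out_dir)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out_dir)
        return [out_dir / name for name in zf.namelist() if not name.endswith("/")]
```

Loading to memory avoids leaving files behind, while extracting to disk is the better fit when other tools need to read the files.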

Printing
^^^^^^^^
#. ``sc.progressbar()`` can now be used to wrap an iterable, in which case it acts as an alias to ``tqdm.tqdm()``.
#. The new ``sc.progressbars()`` class will create and manage multiple progress bars, which can be useful for monitoring multiple parallel long-running jobs.
#. New functions ``sc.printmean()`` and ``sc.printmedian()`` can be used to quickly summarize an array.
#. ``sc.humanize_bytes()`` will convert a number of bytes into a human-readable number (e.g. ``32975281`` to ``32.975 MB``).
#. ``sc.printarr()`` now has configurable decimal places (``decimals`` argument) and can return a string instead of printing (``doprint=False``).
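
The byte conversion in the example above (``32975281`` to ``32.975 MB``) implies decimal units, where 1 MB is 1,000,000 bytes. A minimal standard-library sketch of that conversion follows; ``humanize_bytes_sketch`` and its ``decimals`` parameter are hypothetical names, and the real ``sc.humanize_bytes()`` may differ in signature and formatting.

```python
def humanize_bytes_sketch(n_bytes, decimals=3):
    """Format a byte count using decimal units (1 KB = 1000 B)."""
    units = ["B", "KB", "MB", "GB", "TB", "PB"]
    value = float(n_bytes)
    index = 0
    # Divide by 1000 until the value fits the current unit (or units run out)
    while abs(value) >= 1000 and index < len(units) - 1:
        value /= 1000
        index += 1
    return f"{value:.{decimals}f} {units[index]}"


print(humanize_bytes_sketch(32975281))  # 32.975 MB
```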

Profiling
^^^^^^^^^
#. The new ``sc.benchmark()`` function runs tests on both regular Python and Numpy operations and reports the performance of the current machine.
#. ``sc.checkmem()`` now returns a dataframe, can descend multiple levels through an object, reports subtotals, and has an ``order`` argument instead of ``alphabetical``.

Versioning
^^^^^^^^^^
#. A new versioning module has been added.
#. A new function ``sc.metadata()`` gathers all relevant metadata and returns a dict that can be used for versioning.
#. Known regressions from older library versions are now automatically handled by ``sc.load()`` (e.g., ``pandas`` v2.0 dataframes cannot be loaded in v1.5, and vice versa).
#. A pair of new functions, ``sc.savewithmetadata()`` and ``sc.loadwithmetadata()``, provide a way to automatically save metadata along with an object for better versioning.
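
The idea of saving metadata alongside an object can be sketched with the standard library. The archive layout (``metadata.json`` plus ``object.pkl``), the metadata fields, and all function names below are assumptions made for illustration; they are not taken from the Sciris source.

```python
import datetime
import io
import json
import pickle
import platform
import zipfile


def gather_metadata_sketch():
    """Collect a few environment details that help explain later load failures."""
    return {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "python": platform.python_version(),
        "platform": platform.platform(),
    }


def save_with_metadata_sketch(fileobj, obj):
    """Write the pickled object and a JSON metadata record into one zip archive."""
    with zipfile.ZipFile(fileobj, "w") as zf:
        zf.writestr("metadata.json", json.dumps(gather_metadata_sketch()))
        zf.writestr("object.pkl", pickle.dumps(obj))


def load_with_metadata_sketch(fileobj):
    """Return (object, metadata); the metadata stays readable even if unpickling fails."""
    with zipfile.ZipFile(fileobj) as zf:
        metadata = json.loads(zf.read("metadata.json"))
        obj = pickle.loads(zf.read("object.pkl"))
    return obj, metadata


buffer = io.BytesIO()
save_with_metadata_sketch(buffer, {"answer": 42})
obj, metadata = load_with_metadata_sketch(buffer)
```

Storing the metadata as plain JSON, separate from the pickle, means the recorded versions can still be inspected when the object itself cannot be loaded in the current environment.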

Math
^^^^
#. ``sc.randround()`` now works with multidimensional arrays. (Thanks to `Jamie Cohen <https://github.com/jamiecohen>`_ for the suggestion.)
#. ``sc.smoothinterp()`` now defaults to ``ensurefinite=True``.
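
As a rough illustration of random rounding on nested (multidimensional) data, the sketch below assumes that ``sc.randround()`` rounds each value up with probability equal to its fractional part and down otherwise. It uses plain nested lists rather than Numpy arrays, and ``randround_sketch`` is a hypothetical name.

```python
import math
import random


def randround_sketch(x, rng=random):
    """Stochastically round a number, or every number in a nested list."""
    if isinstance(x, (list, tuple)):
        # Recurse so that any depth of nesting is handled
        return [randround_sketch(item, rng) for item in x]
    lower = math.floor(x)
    # Round up with probability equal to the fractional part
    return lower + int(rng.random() < (x - lower))


rng = random.Random(1)
print(randround_sketch([[0.2, 1.5], [2.0, 3.9]], rng))
```

Unlike ordinary rounding, the expected value of each rounded element equals the original value, which matters when many small fractional quantities are summed.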

Other
^^^^^
#. The new function ``sc.importbypath()`` will import a module by path, as an alternative to standard ``import``. ``sc.importbyname()`` also now accepts a ``path`` argument.
#. The new function ``sc.getuser()`` will return the current username (as an alias to ``getpass.getuser()``).
#. The new function ``sc.sanitizestr()`` will sanitize an input string to e.g. ASCII-only or a valid variable name.
#. ``sc.download()`` now handles exceptions gracefully with ``die=False``.
#. ``sc.isiterable()`` now has optional ``exclude`` and ``minlen`` arguments.
#. A new environment variable, ``SCIRIS_NUM_THREADS``, will set the number of threads Numpy uses (if Sciris is imported first). In some cases, using more threads results in *slower* processing (and of course uses much more CPU time).
#. Nested "dictionary" operations can now act on other types of object, including lists and regular objects.
#. The ``freeze`` argument of ``sc.savefig()`` has been renamed ``pipfreeze``, and ``frame`` has been replaced with ``relframe``.
#. ``sc.search()`` now works on values as well as keys/attributes.
#. ``sc.dictobj.fromkeys()`` is now a static method.
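
Importing a module by path can be done with the standard ``importlib`` machinery, sketched below. This shows the general technique only and is not necessarily how ``sc.importbypath()`` is implemented; ``import_by_path_sketch`` and its ``name`` parameter are hypothetical.

```python
import importlib.util
import sys
from pathlib import Path


def import_by_path_sketch(path, name=None):
    """Import a module from a file, or a package from a folder with __init__.py."""
    path = Path(path)
    if path.is_dir():
        location = path / "__init__.py"
        search = [str(path)]  # Lets the package find its own submodules
        default_name = path.name
    else:
        location = path
        search = None
        default_name = path.stem
    name = name or default_name
    spec = importlib.util.spec_from_file_location(
        name, location, submodule_search_locations=search
    )
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot import a module from {path}")
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module  # Register first so relative imports resolve
    spec.loader.exec_module(module)
    return module
```

Passing a different ``name`` for each path is what would allow two versions of the same library to be loaded side by side, for example ``import_by_path_sketch(old_folder, name="lib_old")``.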

Bugfixes
~~~~~~~~
#. Fixed ``<=`` comparison in ``sc.compareversions()`` not handling equality.
#. Fixed the implementation of the ``midpoint`` argument in ``sc.vectocolor()``.
#. Fixed corner cases where some ``sc.dataframe`` methods returned ``pd.DataFrame`` objects instead.
#. Fixed corner cases where some ``sc.objdict`` methods returned ``sc.odict`` objects instead.
#. ``sc.findinds()`` now returns a tuple for multidimensional arrays, allowing it to be used directly for indexing.
#. ``sc.rmnans()`` now returns a zero-length array if all input is NaNs.
#. ``sc.options.with_style(style)`` now correctly applies the style.
#. Fixed ``sc.daydiff()`` with one argument computing the number of days from Jan. 1st of the *current* year (instead of Jan. 1st of the provided year).
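
The corrected single-argument behavior of ``sc.daydiff()`` (counting from Jan. 1st of the *provided* year) can be illustrated with the standard library. The helper name ``days_since_year_start`` is hypothetical, and this is a sketch of the intended arithmetic rather than the Sciris code.

```python
import datetime


def days_since_year_start(day):
    """Days elapsed between Jan. 1st of the date's own year and the date."""
    if isinstance(day, str):
        day = datetime.date.fromisoformat(day)  # Expects YYYY-MM-DD
    return (day - datetime.date(day.year, 1, 1)).days


print(days_since_year_start("2022-03-01"))  # 59
```

Using the current year as the reference instead, as in the old behavior, would give an answer that changes depending on when the code is run.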

Regression information
~~~~~~~~~~~~~~~~~~~~~~
#. ``tqdm`` is now a required dependency.
#. The ``returnpool`` argument of ``sc.parallelize()`` has been removed.
#. Calls to ``sc.makepath()`` and ``sc.makefilepath()`` now need to specify ``makedirs=True``.
#. For ``sc.savefig()``, ``freeze`` should be renamed ``pipfreeze``, and ``frame`` should be replaced with ``relframe`` with an offset of 2 (e.g. ``frame=2 → relframe=0``).
#. ``sc.checkmem(..., alphabetical=True)`` has been replaced with ``sc.checkmem(..., order='alphabetical')``.
#. ``sc.parallel_progress()`` has been moved to ``sc.sc_legacy()``. Please use ``sc.parallelize(..., progress=True)`` instead.
#. ``sc.parallelcmd()`` has been moved to ``sc.sc_legacy()``. Please do not use this function :)



Version 2.1.0 (2022-12-23)
--------------------------
