CLN/FIX/PERF: Don't buffer entire Stata file into memory (#49228)
* CLN: StataReader: refactor repeated struct.unpack/read calls to helpers

* CLN: StataReader: replace string concatenations with f-strings

* CLN: StataReader: prefix internal state with underscore

* FIX: StataReader: defer opening file to when data is required

* FIX: StataReader: don't buffer entire file into memory unless necessary

Refs #48922

* DOC: Note that StataReaders are context managers

* FIX: StataReader: don't close stream implicitly

* Apply review changes
akx committed Feb 23, 2023
1 parent 25775ba commit ae7b6d6
Showing 4 changed files with 351 additions and 253 deletions.
8 changes: 8 additions & 0 deletions doc/source/user_guide/io.rst
@@ -6033,6 +6033,14 @@ values will have ``object`` data type.
``int64`` for all integer types and ``float64`` for floating point data. By default,
the Stata data types are preserved when importing.

.. note::

All :class:`~pandas.io.stata.StataReader` objects, whether created by :func:`~pandas.read_stata`
(when using ``iterator=True`` or ``chunksize``) or instantiated by hand, must be used as context
managers (e.g. the ``with`` statement).
While the :meth:`~pandas.io.stata.StataReader.close` method is available, its use is unsupported.
It is not part of the public API and will be removed in the future without warning.
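
The note above can be illustrated with a minimal sketch. It assumes a small throwaway file written with ``DataFrame.to_stata``. The helper names ``read_stata_in_chunks`` and ``read_all_by_hand`` and the file name ``example.dta`` are hypothetical and not part of the commit. Both helpers rely on the ``with`` statement rather than calling ``close()`` directly.

```python
import os
import tempfile

import pandas as pd
from pandas.io.stata import StataReader


def read_stata_in_chunks(path, chunksize):
    """Return the row count of each chunk (hypothetical helper)."""
    sizes = []
    # read_stata returns a StataReader when chunksize is given;
    # the with statement releases the underlying file on exit.
    with pd.read_stata(path, chunksize=chunksize) as reader:
        for chunk in reader:
            sizes.append(len(chunk))
    return sizes


def read_all_by_hand(path):
    """Read a whole file with a hand-made reader (hypothetical helper)."""
    # A StataReader instantiated directly is used the same way.
    with StataReader(path) as reader:
        return reader.read()


with tempfile.TemporaryDirectory() as tmp:
    # Illustrative data only; any Stata file would do.
    path = os.path.join(tmp, "example.dta")
    frame = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [0.5, 1.5, 2.5, 3.5, 4.5]})
    frame.to_stata(path, write_index=False)

    chunk_sizes = read_stata_in_chunks(path, chunksize=2)
    full_shape = read_all_by_hand(path).shape

print(chunk_sizes)
print(full_shape)
```

Because both readers are closed when their ``with`` blocks exit, the temporary directory can be removed afterwards without an explicit ``close()`` call.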

.. ipython:: python
:suppress:
3 changes: 3 additions & 0 deletions doc/source/whatsnew/v2.0.0.rst
@@ -857,6 +857,7 @@ Deprecations
- Deprecated :meth:`Series.backfill` in favor of :meth:`Series.bfill` (:issue:`33396`)
- Deprecated :meth:`DataFrame.pad` in favor of :meth:`DataFrame.ffill` (:issue:`33396`)
- Deprecated :meth:`DataFrame.backfill` in favor of :meth:`DataFrame.bfill` (:issue:`33396`)
- Deprecated :meth:`~pandas.io.stata.StataReader.close`. Use :class:`~pandas.io.stata.StataReader` as a context manager instead (:issue:`49228`)

.. ---------------------------------------------------------------------------
.. _whatsnew_200.prior_deprecations:
@@ -1163,6 +1164,8 @@ Performance improvements
- Fixed a reference leak in :func:`read_hdf` (:issue:`37441`)
- Fixed a memory leak in :meth:`DataFrame.to_json` and :meth:`Series.to_json` when serializing datetimes and timedeltas (:issue:`40443`)
- Decreased memory usage in many :class:`DataFrameGroupBy` methods (:issue:`51090`)
- Memory improvement in :class:`StataReader` when reading seekable files (:issue:`48922`)


.. ---------------------------------------------------------------------------
.. _whatsnew_200.bug_fixes: