Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REF: Use Styler implementation for DataFrame.to_latex #47970

Merged
merged 28 commits into from Jan 19, 2023
Merged
Show file tree
Hide file tree
Changes from 27 commits
Commits
Show all changes
28 commits
Select commit Hold shift + click to select a range
02d7d0c
Base implementation
attack68 Aug 3, 2022
0c12f3a
Base implementation
attack68 Aug 3, 2022
d644065
test fix up
attack68 Aug 3, 2022
9dcd254
test fix up
attack68 Aug 3, 2022
cd05038
test fix up
attack68 Aug 4, 2022
c625610
doc change
attack68 Aug 4, 2022
493b884
doc change
attack68 Aug 4, 2022
41fa426
doc change
attack68 Aug 4, 2022
2d5c419
mypy fixes
attack68 Aug 5, 2022
c9c61c3
ivanov doc comment
attack68 Aug 9, 2022
ab6d3ec
ivanov doc comment
attack68 Aug 9, 2022
b913583
rhshadrach reduction
attack68 Aug 11, 2022
801bf42
Merge branch 'main' into to_latex_styler_implement
attack68 Nov 18, 2022
c803b73
change text from 1.5.0 to 2.0.0
attack68 Nov 18, 2022
df0b334
remove argument col_space and add whatsnew
attack68 Nov 18, 2022
f4e4bf7
Merge remote-tracking branch 'upstream/main' into to_latex_styler_imp…
attack68 Dec 7, 2022
9774341
mroeschke requests
attack68 Dec 7, 2022
10b5ec9
mroeschke requests
attack68 Dec 7, 2022
2c42a4e
pylint fix
attack68 Dec 8, 2022
3c2f964
Merge remote-tracking branch 'upstream/main' into to_latex_styler_imp…
attack68 Jan 5, 2023
0b4144a
Whats new text improvements and description added
attack68 Jan 5, 2023
fae1b3f
Update doc/source/whatsnew/v2.0.0.rst
attack68 Jan 13, 2023
9c7d780
Update doc/source/whatsnew/v2.0.0.rst
attack68 Jan 13, 2023
c0bdc6a
remove trailing whitespace
attack68 Jan 16, 2023
c740279
remove trailing whitespace
attack68 Jan 16, 2023
9b6bba8
Merge remote-tracking branch 'upstream/main' into to_latex_styler_imp…
attack68 Jan 17, 2023
6b314cd
Whats new linting fixes
attack68 Jan 17, 2023
d6333cb
mroeschke requests
attack68 Jan 18, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
33 changes: 33 additions & 0 deletions doc/source/whatsnew/v2.0.0.rst
Expand Up @@ -452,6 +452,37 @@ Now, the axes return an empty :class:`RangeIndex`.
pd.Series().index
pd.DataFrame().axes

.. _whatsnew_200.api_breaking.to_latex:

DataFrame to LaTeX has a new render engine
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The existing :meth:`DataFrame.to_latex` has been restructured to utilise the
extended implementation previously available under :meth:`.Styler.to_latex`.
The arguments signature is similar, albeit ``col_space`` has been removed since
it is ignored by LaTeX engines. This render engine also requires ``jinja2`` as a
dependency which needs to be installed, since rendering is based upon jinja2 templates.

The pandas options below are no longer used and will be removed in future releases.
The alternative options giving similar functionality are indicated below:

- ``display.latex.escape``: replaced with ``styler.format.escape``,
- ``display.latex.longtable``: replaced with ``styler.latex.environment``,
- ``display.latex.multicolumn``, ``display.latex.multicolumn_format`` and
``display.latex.multirow``: replaced with ``styler.sparse.rows``,
``styler.sparse.columns``, ``styler.latex.multirow_align`` and
``styler.latex.multicol_align``,
- ``display.latex.repr``: replaced with ``styler.render.repr``,
- ``display.max_rows`` and ``display.max_columns``: replace with
``styler.render.max_rows``, ``styler.render.max_columns`` and
``styler.render.max_elements``.

Note that the behaviour of ``_repr_latex_`` is also changed. Previously
setting ``display.latex.repr`` would generate LaTeX only when using nbconvert for a
JupyterNotebook, and not when the user is running the notebook. Now the
``styler.render.repr`` option allows control of the specific output
within JupyterNotebooks for operations (not just on nbconvert). See :issue:`39911`.

.. _whatsnew_200.api_breaking.deps:

Increased minimum versions for dependencies
Expand Down Expand Up @@ -617,6 +648,7 @@ Removal of prior version deprecations/changes
- Removed deprecated :meth:`.Styler.set_na_rep` and :meth:`.Styler.set_precision` (:issue:`49397`)
- Removed deprecated :meth:`.Styler.where` (:issue:`49397`)
- Removed deprecated :meth:`.Styler.render` (:issue:`49397`)
- Removed deprecated argument ``col_space`` in :meth:`DataFrame.to_latex` (:issue:`47970`)
- Removed deprecated argument ``null_color`` in :meth:`.Styler.highlight_null` (:issue:`49397`)
- Removed deprecated argument ``check_less_precise`` in :meth:`.testing.assert_frame_equal`, :meth:`.testing.assert_extension_array_equal`, :meth:`.testing.assert_series_equal`, :meth:`.testing.assert_index_equal` (:issue:`30562`)
- Removed deprecated ``null_counts`` argument in :meth:`DataFrame.info`. Use ``show_counts`` instead (:issue:`37999`)
Expand Down Expand Up @@ -791,6 +823,7 @@ Removal of prior version deprecations/changes
- Changed behavior of comparison of ``NaT`` with a ``datetime.date`` object; these now raise on inequality comparisons (:issue:`39196`)
- Enforced deprecation of silently dropping columns that raised a ``TypeError`` in :class:`Series.transform` and :class:`DataFrame.transform` when used with a list or dictionary (:issue:`43740`)
- Changed behavior of :meth:`DataFrame.apply` with list-like so that any partial failure will raise an error (:issue:`43740`)
- Changed behaviour of :meth:`DataFrame.to_latex` to now use the Styler implementation via :meth:`.Styler.to_latex` (:issue:`47970`)
- Changed behavior of :meth:`Series.__setitem__` with an integer key and a :class:`Float64Index` when the key is not present in the index; previously we treated the key as positional (behaving like ``series.iloc[key] = val``), now we treat it is a label (behaving like ``series.loc[key] = val``), consistent with :meth:`Series.__getitem__`` behavior (:issue:`33469`)
- Removed ``na_sentinel`` argument from :func:`factorize`, :meth:`.Index.factorize`, and :meth:`.ExtensionArray.factorize` (:issue:`47157`)
- Changed behavior of :meth:`Series.diff` and :meth:`DataFrame.diff` with :class:`ExtensionDtype` dtypes whose arrays do not implement ``diff``, these now raise ``TypeError`` rather than casting to numpy (:issue:`31025`)
Expand Down
249 changes: 196 additions & 53 deletions pandas/core/generic.py
Expand Up @@ -3,6 +3,7 @@

import collections
import datetime as dt
from functools import partial
import gc
from json import loads
import operator
Expand Down Expand Up @@ -182,7 +183,6 @@
Window,
)

from pandas.io.formats import format as fmt
from pandas.io.formats.format import (
DataFrameFormatter,
DataFrameRenderer,
Expand Down Expand Up @@ -2108,7 +2108,7 @@ def _repr_latex_(self):
Returns a LaTeX representation for a particular object.
Mainly for use with nbconvert (jupyter notebook conversion to pdf).
"""
if config.get_option("display.latex.repr"):
if config.get_option("styler.render.repr") == "latex":
rhshadrach marked this conversation as resolved.
Show resolved Hide resolved
return self.to_latex()
else:
return None
Expand Down Expand Up @@ -3198,7 +3198,6 @@ def to_latex(
...

@final
@doc(returns=fmt.return_docstring)
def to_latex(
self,
buf: FilePath | WriteBuffer[str] | None = None,
Expand Down Expand Up @@ -3237,14 +3236,15 @@ def to_latex(
.. versionchanged:: 1.2.0
Added position argument, changed meaning of caption argument.

.. versionchanged:: 2.0.0
Refactored to use the Styler implementation via jinja2 templating.

Parameters
----------
buf : str, Path or StringIO-like, optional, default None
Buffer to write to. If None, the output is returned as a string.
columns : list of label, optional
The subset of columns to write. Writes all columns by default.
col_space : int, optional
mroeschke marked this conversation as resolved.
Show resolved Hide resolved
The minimum width of each column.
header : bool or list of str, default True
Write out the column names. If a list of strings is given,
it is assumed to be aliases for the column names.
Expand Down Expand Up @@ -3318,7 +3318,12 @@ def to_latex(
``\begin{{}}`` in the output.

.. versionadded:: 1.2.0
{returns}

Returns
-------
str or None
If buf is None, returns the result as a string. Otherwise returns None.

See Also
--------
io.formats.style.Styler.to_latex : Render a DataFrame to LaTeX
Expand All @@ -3327,30 +3332,35 @@ def to_latex(
tabular output.
DataFrame.to_html : Render a DataFrame as an HTML table.

Notes
-----
As of v2.0.0 this method has changed to use the Styler implementation as
part of :meth:`.Styler.to_latex` via ``jinja2`` templating. This means
that ``jinja2`` is a requirement, and needs to be installed, for this method
to function. It is advised that users switch to using Styler, since that
implementation is more frequently updated and contains much more
flexibility with the output.

Examples
--------
Convert a general DataFrame to LaTeX with formatting:

>>> df = pd.DataFrame(dict(name=['Raphael', 'Donatello'],
... mask=['red', 'purple'],
... weapon=['sai', 'bo staff']))
>>> print(df.to_latex(index=False)) # doctest: +SKIP
\begin{{tabular}}{{lll}}
\toprule
name & mask & weapon \\
\midrule
Raphael & red & sai \\
Donatello & purple & bo staff \\
... age=[26, 45],
... height=[181.23, 177.65]))
>>> print(df.to_latex(index=False,
... formatters={"name": str.upper},
... float_format="{:.1f}".format,
... ) # doctest: +SKIP
\begin{tabular}{lrr}
\toprule
name & age & height \\
\midrule
RAPHAEL & 26 & 181.2 \\
DONATELLO & 45 & 177.7 \\
\bottomrule
\end{{tabular}}
"""
msg = (
"In future versions `DataFrame.to_latex` is expected to utilise the base "
"implementation of `Styler.to_latex` for formatting and rendering. "
"The arguments signature may therefore change. It is recommended instead "
"to use `DataFrame.style.to_latex` which also contains additional "
"functionality."
)
warnings.warn(msg, FutureWarning, stacklevel=find_stack_level())

\end{tabular}
"""
# Get defaults from the pandas config
if self.ndim == 1:
self = self.to_frame()
Expand All @@ -3365,35 +3375,168 @@ def to_latex(
if multirow is None:
multirow = config.get_option("display.latex.multirow")

self = cast("DataFrame", self)
formatter = DataFrameFormatter(
self,
columns=columns,
col_space=col_space,
na_rep=na_rep,
header=header,
index=index,
formatters=formatters,
float_format=float_format,
bold_rows=bold_rows,
sparsify=sparsify,
index_names=index_names,
escape=escape,
decimal=decimal,
)
return DataFrameRenderer(formatter).to_latex(
buf=buf,
column_format=column_format,
longtable=longtable,
encoding=encoding,
multicolumn=multicolumn,
multicolumn_format=multicolumn_format,
multirow=multirow,
caption=caption,
label=label,
position=position,
if column_format is not None and not isinstance(column_format, str):
mroeschke marked this conversation as resolved.
Show resolved Hide resolved
raise ValueError("`column_format` must be str or unicode")
length = len(self.columns) if columns is None else len(columns)
if isinstance(header, (list, tuple)) and len(header) != length:
raise ValueError(f"Writing {length} cols but got {len(header)} aliases")

# Refactor formatters/float_format/decimal/na_rep/escape to Styler structure
base_format_ = {
"na_rep": na_rep,
"escape": "latex" if escape else None,
"decimal": decimal,
}
index_format_: dict[str, Any] = {"axis": 0, **base_format_}
column_format_: dict[str, Any] = {"axis": 1, **base_format_}

if isinstance(float_format, str):
float_format_: Callable | None = lambda x: float_format % x
else:
float_format_ = float_format

def _wrap(x, alt_format_):
if isinstance(x, (float, complex)) and float_format_ is not None:
return float_format_(x)
else:
return alt_format_(x)

formatters_: list | tuple | dict | Callable | None = None
if isinstance(formatters, list):
formatters_ = {
c: partial(_wrap, alt_format_=formatters[i])
for i, c in enumerate(self.columns)
}
elif isinstance(formatters, dict):
index_formatter = formatters.pop("__index__", None)
column_formatter = formatters.pop("__columns__", None)
if index_formatter is not None:
index_format_.update({"formatter": index_formatter})
if column_formatter is not None:
column_format_.update({"formatter": column_formatter})

formatters_ = formatters
float_columns = self.select_dtypes(include="float").columns
for col in [c for c in float_columns if c not in formatters.keys()]:
attack68 marked this conversation as resolved.
Show resolved Hide resolved
formatters_.update({col: float_format_})
elif formatters is None and float_format is not None:
formatters_ = partial(_wrap, alt_format_=lambda v: v)
format_index_ = [index_format_, column_format_]

# Deal with hiding indexes and relabelling column names
hide_: list[dict] = []
relabel_index_: list[dict] = []
if columns:
hide_.append(
{
"subset": [c for c in self.columns if c not in columns],
"axis": "columns",
}
)
if header is False:
hide_.append({"axis": "columns"})
elif isinstance(header, (list, tuple)):
relabel_index_.append({"labels": header, "axis": "columns"})
format_index_ = [index_format_] # column_format is overwritten

if index is False:
hide_.append({"axis": "index"})
if index_names is False:
hide_.append({"names": True, "axis": "index"})

render_kwargs_ = {
"hrules": True,
"sparse_index": sparsify,
"sparse_columns": sparsify,
"environment": "longtable" if longtable else None,
"multicol_align": multicolumn_format
if multicolumn
else f"naive-{multicolumn_format}",
"multirow_align": "t" if multirow else "naive",
"encoding": encoding,
"caption": caption,
"label": label,
"position": position,
"column_format": column_format,
"clines": "skip-last;data" if multirow else None,
"bold_rows": bold_rows,
}

return self._to_latex_via_styler(
buf,
hide=hide_,
relabel_index=relabel_index_,
format={"formatter": formatters_, **base_format_},
format_index=format_index_,
render_kwargs=render_kwargs_,
)

def _to_latex_via_styler(
self,
buf=None,
*,
hide: dict | list[dict] | None = None,
relabel_index: dict | list[dict] | None = None,
format: dict | list[dict] | None = None,
format_index: dict | list[dict] | None = None,
render_kwargs: dict = {},
attack68 marked this conversation as resolved.
Show resolved Hide resolved
):
"""
Render object to a LaTeX tabular, longtable, or nested table.

Uses the ``Styler`` implementation with the following, ordered, method chaining:

.. code-block:: python
styler = Styler(DataFrame)
styler.hide(**hide)
styler.relabel_index(**relabel_index)
styler.format(**format)
styler.format_index(**format_index)
styler.to_latex(buf=buf, **render_kwargs)

Parameters
----------
buf : str, Path or StringIO-like, optional, default None
Buffer to write to. If None, the output is returned as a string.
hide : dict, list of dict
Keyword args to pass to the method call of ``Styler.hide``. If a list will
call the method numerous times.
relabel_index : dict, list of dict
Keyword args to pass to the method of ``Styler.relabel_index``. If a list
will call the method numerous times.
format : dict, list of dict
Keyword args to pass to the method call of ``Styler.format``. If a list will
call the method numerous times.
format_index : dict, list of dict
Keyword args to pass to the method call of ``Styler.format_index``. If a
list will call the method numerous times.
render_kwargs : dict
Keyword args to pass to the method call of ``Styler.to_latex``.

Returns
-------
str or None
If buf is None, returns the result as a string. Otherwise returns None.
"""
from pandas.io.formats.style import Styler

self = cast("DataFrame", self)
styler = Styler(self, uuid="")

for kw_name in ["hide", "relabel_index", "format", "format_index"]:
kw = vars()[kw_name]
if isinstance(kw, dict):
getattr(styler, kw_name)(**kw)
elif isinstance(kw, list):
for sub_kw in kw:
getattr(styler, kw_name)(**sub_kw)

# bold_rows is not a direct kwarg of Styler.to_latex
if render_kwargs.pop("bold_rows"):
styler.applymap_index(lambda v: "textbf:--rwrap;")

return styler.to_latex(buf=buf, **render_kwargs)

@overload
def to_csv(
self,
Expand Down