gfm output of pandas dataframes rendered incorrectly on GitHub #2152

Closed
4 tasks done
kalenkovich opened this issue Aug 26, 2022 · 12 comments

@kalenkovich

Bug description

gfm output that contains printed pandas dataframes renders incorrectly on GitHub, possibly because GitHub does not respect <style scoped> tags.

Input qmd:

---
title: "Untitled"
format: gfm
---


```{python}
import pandas as pd
iris = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
iris.head()
```

Output md raw:

Untitled
================

``` python
import pandas as pd
iris = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
iris.head()
```

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>sepal_length</th>
      <th>sepal_width</th>
      <th>petal_length</th>
      <th>petal_width</th>
      <th>species</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>5.1</td>
      <td>3.5</td>
      <td>1.4</td>
      <td>0.2</td>
      <td>setosa</td>
    </tr>
    <tr>
      <th>1</th>
      <td>4.9</td>
      <td>3.0</td>
      <td>1.4</td>
      <td>0.2</td>
      <td>setosa</td>
    </tr>
    <tr>
      <th>2</th>
      <td>4.7</td>
      <td>3.2</td>
      <td>1.3</td>
      <td>0.2</td>
      <td>setosa</td>
    </tr>
    <tr>
      <th>3</th>
      <td>4.6</td>
      <td>3.1</td>
      <td>1.5</td>
      <td>0.2</td>
      <td>setosa</td>
    </tr>
    <tr>
      <th>4</th>
      <td>5.0</td>
      <td>3.6</td>
      <td>1.4</td>
      <td>0.2</td>
      <td>setosa</td>
    </tr>
  </tbody>
</table>
</div>

Output md rendered:

Untitled
================

``` python
import pandas as pd
iris = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
iris.head()
```

<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>

|     | sepal_length | sepal_width | petal_length | petal_width | species |
|-----|--------------|-------------|--------------|-------------|---------|
| 0   | 5.1          | 3.5         | 1.4          | 0.2         | setosa  |
| 1   | 4.9          | 3.0         | 1.4          | 0.2         | setosa  |
| 2   | 4.7          | 3.2         | 1.3          | 0.2         | setosa  |
| 3   | 4.6          | 3.1         | 1.5          | 0.2         | setosa  |
| 4   | 5.0          | 3.6         | 1.4          | 0.2         | setosa  |

I am on macOS Big Sur, using VS Code 1.70.2.

Checklist

  • Please include a minimal, fully reproducible example in a single .qmd file? Please provide the whole file rather than the snippet you believe is causing the issue.
  • Please format your issue so it is easier for us to read the bug report.
  • Please document the RStudio IDE version you're running (if applicable), by providing the value displayed in the "About RStudio" main menu dialog?
  • Please document the operating system you're running. If on Linux, please provide the specific distribution.
@kalenkovich kalenkovich added the bug Something isn't working label Aug 26, 2022
@cscheid
Collaborator

cscheid commented Aug 26, 2022

<style scoped> does not seem to be widely supported: https://caniuse.com/style-scoped

@cderv
Collaborator

cderv commented Aug 30, 2022

I also believe that GFM does not allow <style>, among other tags: https://github.github.com/gfm/#disallowed-raw-html-extension-

So, IMO, this is not valid HTML as far as GitHub's Markdown-to-HTML preview is concerned.

I don't think Quarto can do anything about this, since it comes down to how GitHub renders Markdown to HTML.
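
As a rough illustration of the point above, here is a minimal sketch that scans a Markdown string for the tag names listed in the GFM "Disallowed Raw HTML" extension. The function name `find_disallowed_tags` is hypothetical, and the scan is deliberately naive: it does not skip fenced code blocks, and it says nothing about any additional sanitizing GitHub may apply beyond the spec.

```python
import re
from typing import List

# Tag names filtered by the GFM "Disallowed Raw HTML" extension (per the GFM spec).
GFM_DISALLOWED_TAGS = (
    "title", "textarea", "style", "xmp", "iframe",
    "noembed", "noframes", "script", "plaintext",
)

# Match an opening tag such as "<style scoped>" but not "</style>" or "<strong>".
_OPEN_TAG_RE = re.compile(
    r"<(" + "|".join(GFM_DISALLOWED_TAGS) + r")(?=[\s/>])",
    re.IGNORECASE,
)


def find_disallowed_tags(markdown_text: str) -> List[str]:
    """Return the sorted, lower-cased names of disallowed tags opened in the text.

    Hypothetical helper for illustration only; it scans the raw text and
    does not parse Markdown, so tags inside code fences are also reported.
    """
    return sorted({m.group(1).lower() for m in _OPEN_TAG_RE.finditer(markdown_text)})
```

Running this over the "Output md raw:" text from the first comment would report `style`, which is consistent with the CSS showing up as literal text in the rendered preview.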

@kalenkovich
Author

> I also believe that GFM does not allow <style>, among other tags: https://github.github.com/gfm/#disallowed-raw-html-extension-
>
> So, IMO, this is not valid HTML as far as GitHub's Markdown-to-HTML preview is concerned.
>
> I don't think Quarto can do anything about this, since it comes down to how GitHub renders Markdown to HTML.

The <style scoped> tags were produced by Quarto, not GitHub. See "Output md raw:" in the first comment.

I agree that it is not valid HTML for the GitHub preview, but that HTML is part of the Markdown file that Quarto created from a qmd file (contents also in the first comment).

@cderv
Collaborator

cderv commented Aug 30, 2022

I see. Thanks, @kalenkovich, for pointing that out more clearly.

What I really meant is that Quarto does not add the scoped attribute to the <style> tag. I believe the HTML is created by pandas, which is used in the code chunk. However, Quarto does not seem to do anything automatically to make the pandas output compatible with GFM.

For example, explicitly printing Markdown should produce a GFM-compatible document, I think:

---
title: "Untitled"
format: gfm
keep-md: true
---


```{python}
#| output: asis
from IPython.display import Markdown
import pandas as pd
iris = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
print(iris.head().to_markdown())
```

Which renders to

Untitled
================

``` python
from IPython.display import Markdown
import pandas as pd
iris = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
print(iris.head().to_markdown())
```

|     | sepal_length | sepal_width | petal_length | petal_width | species |
|----:|-------------:|------------:|-------------:|------------:|:--------|
|   0 |          5.1 |         3.5 |          1.4 |         0.2 | setosa  |
|   1 |          4.9 |           3 |          1.4 |         0.2 | setosa  |
|   2 |          4.7 |         3.2 |          1.3 |         0.2 | setosa  |
|   3 |          4.6 |         3.1 |          1.5 |         0.2 | setosa  |
|   4 |            5 |         3.6 |          1.4 |         0.2 | setosa  |

Rendered as HTML below


Untitled

from IPython.display import Markdown
import pandas as pd
iris = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
print(iris.head().to_markdown())
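|     | sepal_length | sepal_width | petal_length | petal_width | species |
|-----|--------------|-------------|--------------|-------------|---------|
| 0   | 5.1          | 3.5         | 1.4          | 0.2         | setosa  |
| 1   | 4.9          | 3           | 1.4          | 0.2         | setosa  |
| 2   | 4.7          | 3.2         | 1.3          | 0.2         | setosa  |
| 3   | 4.6          | 3.1         | 1.5          | 0.2         | setosa  |
| 4   | 5            | 3.6         | 1.4          | 0.2         | setosa  |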

There is probably something we can do to output Markdown for pandas tables when the format is gfm. 🤔

@cscheid
Collaborator

cscheid commented Aug 30, 2022

@cderv A more careful orchestration of quarto formats and downstream table printing is definitely needed. (Ideally we'd define an API for these things that clients can implement)

@kalenkovich
Author

Here is a relevant nbconvert issue and another one with two workarounds.

The workarounds are:

  • Turn off html output when displaying a dataframe, which leads to no tables in the output at all. Not very useful.
  • Subclass pd.DataFrame to strip the <style scoped> tags. Then you just need to convert all dataframes being printed to this new class. A bit annoying but works. I wish I could keep the styling though.
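
The second workaround could look roughly like the sketch below. This is an illustration, not the code from the linked nbconvert issue: the names `strip_style_blocks` and `PlainDataFrame` are hypothetical, the regex assumes the <style> block is well-formed, and the pandas part is only defined when pandas is importable. As noted above, the styling is dropped entirely.

```python
import re

# Matches a whole <style ...>...</style> block plus trailing whitespace.
_STYLE_BLOCK_RE = re.compile(
    r"<style\b[^>]*>.*?</style>\s*",
    flags=re.DOTALL | re.IGNORECASE,
)


def strip_style_blocks(html: str) -> str:
    """Remove every <style>...</style> block from an HTML fragment."""
    return _STYLE_BLOCK_RE.sub("", html)


try:
    import pandas as pd
except ImportError:  # keep the helper above usable without pandas
    pd = None

if pd is not None:

    class PlainDataFrame(pd.DataFrame):
        """Hypothetical DataFrame subclass whose HTML repr has no <style> block."""

        @property
        def _constructor(self):
            # Make methods such as .head() return PlainDataFrame as well.
            return PlainDataFrame

        def _repr_html_(self):
            html = super()._repr_html_()
            # pandas may return None when the HTML repr is disabled.
            return None if html is None else strip_style_blocks(html)
```

With this in place, a dataframe would be wrapped before display, for example `PlainDataFrame(iris).head()`.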

@cscheid cscheid added the upstream Bug is in upstream library label Sep 14, 2022
@cscheid cscheid self-assigned this Sep 14, 2022
@cscheid cscheid added this to the v1.3 milestone Sep 14, 2022
@cscheid
Collaborator

cscheid commented Sep 14, 2022

Flagging this for 1.3 because we can't ask pandas to fix this until we have support for signaling downstream libraries about Quarto, and that will be a 1.2 feature.

@machow

machow commented Oct 3, 2022

You can tell IPython how to represent a DataFrame as HTML (or Markdown). (see these IPython docs)

For example, this changes the default DataFrame HTML representation in IPython (and the Jupyter notebook) to use DataFrame.to_html, with custom options:

import pandas as pd

from IPython import get_ipython
# special ipython function to get the html formatter
html_formatter = get_ipython().display_formatter.formatters['text/html']

html_formatter.for_type(
    pd.DataFrame,
    lambda df: df.to_html(max_rows = pd.get_option("display.max_rows"), show_dimensions = True)
)

Here's a qmd example:

---
title: "example"
format: gfm
jupyter:
  kernelspec: 
    name: <YOUR_KERNEL_HERE>
    language: python
    display_name: <KERNEL_NAME>
    
---

```{python}
import pandas as pd

from IPython import get_ipython
# special ipython function to get the html formatter
html_formatter = get_ipython().display_formatter.formatters['text/html']

html_formatter.for_type(
    pd.DataFrame,
    lambda df: df.to_html(max_rows = pd.get_option("display.max_rows"), show_dimensions = True)
)

```

```{python}
pd.DataFrame({'x': [1]})
```

Which outputs this HTML table

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>x</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>1</td>
    </tr>
  </tbody>
</table>
<p>1 rows × 1 columns</p>

@cscheid
Collaborator

cscheid commented Jan 25, 2023

@machow This is a really good move to know about, thank you.

I think the existence of this path is enough for us to close this issue.

I'll note that there's a new feature coming in Quarto 1.3 where we process HTML tables in every format and turn them into Markdown by default. This yields Markdown-readable tables, but gfm actually renders HTML tables too. To disable this conversion, pass data-quarto-disable-processing="true" as a table attribute, using the pandas Styler (df.style):

```{python}
#| echo: false
#| output: false
import pandas as pd
from IPython import get_ipython
# special ipython function to get the html formatter
html_formatter = get_ipython().display_formatter.formatters['text/html']
# render DataFrames through the Styler so the table carries the opt-out attribute
html_formatter.for_type(
    pd.DataFrame,
    lambda df: df.style.set_table_attributes('data-quarto-disable-processing="true"').to_html(max_rows = pd.get_option("display.max_rows"), show_dimensions = True)
)
```

This will produce an HTML table in the gfm output.

@cscheid cscheid closed this as completed Jan 25, 2023
@kalenkovich
Author

Thank you, @machow and @cscheid!

@cscheid, I think @machow's workaround should be documented somewhere outside of this issue. Do tell me if I can help with that.

@cderv
Collaborator

cderv commented Jan 26, 2023

We could at least make an example maybe 🤔 Existing examples are in https://github.com/quarto-dev/quarto-examples/

@cscheid
Collaborator

cscheid commented Jan 26, 2023

Yes, this does need to be documented better. We've overhauled our table treatment for 1.3, and the documentation will be there.
