Possible to use with figures / plots? #65

Closed
choldgraf opened this issue Jan 11, 2020 · 6 comments · Fixed by #66
Comments

@choldgraf
Contributor

I'm playing around with using Scrapbook to take plots in one notebook and display them in another. However, doing so seems to generate an error that doesn't quite make sense to me.

Here's the code I'm using to generate the error:

import matplotlib.pyplot as plt
import numpy as np
import scrapbook as sb

data = np.random.randn(2, 100)
fig, ax = plt.subplots()
ax.scatter(*data, c=data[1])
sb.glue("fig1", fig, display=True)

This is generating the following validation error:

Scrap (name=fig1) contents do not conform to required type structures: <Figure size 432x288 with 1 Axes> is not of type 'object', 'array', 'boolean', 'string', 'number', 'integer'

I see the same behavior with things like Altair figures:

ch = alt.Chart(data=df).mark_point().encode(
    x='a',
    y='b'
)
sb.glue('altair', ch, display=True)

yields a similar error.

It's strange to me that the figures aren't validated as type "object", but either way, perhaps I am not using scrapbook properly here? Or perhaps I am trying to use scrapbook in a way that is not intended? Let me know if I should change something (and I'm happy to add an example to the docs or something).
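
A note on the "object" wording: in JSON Schema, "object" means a JSON object (a Python dict), not an arbitrary Python object. The sketch below is not scrapbook's validation code; `json_type` and `FakeFigure` are hypothetical names used only to mimic the type list in the error message and show why a figure instance matches none of the listed types.

```python
import numbers


def json_type(value):
    """Return the JSON Schema type name for value, or None if it matches none."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, numbers.Integral):
        return "integer"
    if isinstance(value, numbers.Real):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    return None


class FakeFigure:
    """Hypothetical stand-in for a plotting object such as a matplotlib Figure."""


print(json_type({"a": 1}))      # object
print(json_type([1, 2, 3]))     # array
print(json_type(FakeFigure()))  # None
```

Under this reading, a figure only becomes storable once something converts it into one of these JSON types, or once it is saved purely as display output.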

@choldgraf
Contributor Author

choldgraf commented Jan 11, 2020

Ah ha, I think I figured this out - one would need to write a custom encoder for this, yes?

E.g., I got this working with the Altair chart using the following encoder:

import altair as alt
from scrapbook.encoders import registry as encoder_registry


class AltairEncoder(object):
    def encode(self, scrap):
        # scrap.data is any type, usually specific to the encoder name;
        # here it is an Altair chart, converted to a JSON-friendly dict
        scrap = scrap._replace(data=scrap.data.to_dict())
        return scrap

    def decode(self, scrap):
        # scrap.data is one of [None, list, dict, *six.integer_types, *six.string_types];
        # here it is the dict produced by encode, rebuilt into a chart
        scrap = scrap._replace(data=alt.Chart.from_dict(scrap.data))
        return scrap


# add encoder to the registry
encoder_registry.register("altair", AltairEncoder())

Followed by

ch = alt.Chart(data=df).mark_point().encode(
    x='a',
    y='b'
)
sb.glue('chart', ch, 'altair')

Then I could do:

image

neat!
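
The encode/decode round trip above can be sketched without scrapbook or Altair installed. Everything here is a stand-in and an assumption for illustration: `Scrap` is a simplified namedtuple (not scrapbook's own record), and `FakeChart` and `FakeChartEncoder` are hypothetical names that only imitate the `to_dict` / `from_dict` pattern.

```python
from collections import namedtuple

# Hypothetical, simplified stand-in for a scrap record (a namedtuple, hence _replace).
Scrap = namedtuple("Scrap", ["name", "data", "encoder"])


class FakeChart:
    """Hypothetical chart type exposing to_dict / from_dict."""

    def __init__(self, spec):
        self.spec = spec

    def to_dict(self):
        return dict(self.spec)

    @classmethod
    def from_dict(cls, spec):
        return cls(spec)


class FakeChartEncoder:
    def encode(self, scrap):
        # Rich object -> JSON-friendly dict
        return scrap._replace(data=scrap.data.to_dict())

    def decode(self, scrap):
        # JSON-friendly dict -> rich object
        return scrap._replace(data=FakeChart.from_dict(scrap.data))


encoder = FakeChartEncoder()
original = Scrap("chart", FakeChart({"mark": "point"}), "fakechart")
stored = encoder.encode(original)   # the form that would be written out
restored = encoder.decode(stored)   # the form a reader would get back

print(type(stored.data).__name__)                # dict
print(restored.data.spec == original.data.spec)  # True
```

The point of the pattern is that only the encoded form needs to be JSON-compatible; the rich object exists on either side of the round trip.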

In addition, for things like just matplotlib figures, you could use:

sb.glue('mplfig', fig, 'display')

and then access the display data with:

# To grab the serialized image
nb.scraps['mplfig'].display['data']['image/png']

# To display again (if in another notebook)
nb.reglue('mplfig')

Let me know if that is correct, and if so then would it be useful for me to add an example to the docs showing these use-cases?
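
For context on the serialized image: notebook outputs store 'image/png' data as a base64-encoded string, so turning it back into image bytes takes one decoding step. A minimal sketch, assuming a display payload shaped like the one accessed above; `png_bytes_from_display` and `fake_display` are hypothetical names, and the payload holds only the PNG signature rather than a real image.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_bytes_from_display(display):
    """Decode the base64 'image/png' entry of a display payload into raw bytes."""
    encoded = display["data"]["image/png"]
    return base64.b64decode(encoded)


# Hypothetical payload: just the 8-byte PNG signature, not a full image.
fake_display = {
    "data": {"image/png": base64.b64encode(PNG_SIGNATURE).decode("ascii")},
    "metadata": {},
}

raw = png_bytes_from_display(fake_display)
print(raw.startswith(PNG_SIGNATURE))  # True

# With a real scrap, the bytes could be written out for use elsewhere:
# with open("mplfig.png", "wb") as f:
#     f.write(png_bytes_from_display(nb.scraps["mplfig"].display))
```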

@choldgraf
Contributor Author

choldgraf commented Jan 11, 2020

Just a note here to describe the use-case I have in mind.

Imagine that you're writing a paper after doing a collection of analyses with notebooks. You probably have multiple notebooks that have some markup in them, statistics that were run, and figures that were generated. You don't want to write your paper in those notebooks because they're too messy. However, you'd like to reference what happens in those notebooks so that there is one canonical source of truth, and so that the values / plots update as you update those notebooks.

I'm imagining using scrapbook for something like this. If you were writing the paper in rST, then with some custom rST directives, you could imagine things like

In this analysis, there was a significant effect (p=:scrapbook:`notebookID:scrapPValueID`). See Figure 1 for more details.

.. scrapbook-figure::
	:notebook: notebookID
	:scrap: scrapFigureID
	:name: Figure 1

	Caption

That could be a really interesting way to store pieces of an analysis in separate notebooks, and then read them into a single "finished product" document.
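
The inline role idea can be sketched as a plain text substitution. This is only an illustration under stated assumptions: the `:scrapbook:` role does not exist, `resolve_roles` is a hypothetical helper, and the lookup table stands in for values that would really be read from the notebooks.

```python
import re

# Matches the hypothetical inline role :scrapbook:`notebookID:scrapID`
ROLE_PATTERN = re.compile(r":scrapbook:`([^:`]+):([^`]+)`")


def resolve_roles(text, scraps_by_notebook):
    """Replace each :scrapbook: role with the string form of the stored value."""

    def lookup(match):
        notebook_id, scrap_id = match.group(1), match.group(2)
        return str(scraps_by_notebook[notebook_id][scrap_id])

    return ROLE_PATTERN.sub(lookup, text)


# Hypothetical values; in practice these would come from the source notebooks.
scraps = {"notebookID": {"scrapPValueID": 0.003}}
line = "there was a significant effect (p=:scrapbook:`notebookID:scrapPValueID`)."
print(resolve_roles(line, scraps))
# there was a significant effect (p=0.003).
```

A real implementation would more likely be a docutils/Sphinx role and directive rather than a regex pass, but the lookup step would have the same shape.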

@MSeal
Member

MSeal commented Jan 15, 2020

Sorry for the delayed response. Glad you got some time to play with the project!

For the first question, you can actually save the plot as-is. It just requires one extra argument so that only the display object is saved, rather than the data as well:

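import matplotlib.pyplot as plt
import numpy as np
import scrapbook as sb

data = np.random.randn(2, 100)
fig, ax = plt.subplots()
plt.close(fig)  # close first so the figure is not rendered a second time
scat = ax.scatter(*data, c=data[1])
sb.glue("fig1", fig, encoder='display', display=True)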

The encoder='display' is required for purely display objects. If you don't close the figure it will render twice since the glue call also renders the plot.

Then you can load with:

nb = sb.read_notebook('figure_gen.ipynb')
nb.reglue('fig1')

I am planning on improving this so it's more intuitive / figures out what to do automatically without the user needing to specify -- some of that is in master but it still has some issues to fix.

@MSeal
Member

MSeal commented Jan 15, 2020

Same for the second altair example:

import altair as alt
from vega_datasets import data
source = data.movies.url

fig2 = alt.Chart(source).mark_bar().encode(
    alt.X("IMDB_Rating:Q", bin=True),
    y='count()',
)
sb.glue('fig2', fig2, encoder='display', display=True)

then:

nb.reglue('fig2')

@MSeal
Member

MSeal commented Jan 15, 2020

I do like the idea of adding some encoders for capturing both the data and the display for various charting libraries. The matplotlib example I posted above only captures the display content and not the chart information itself. The initial exception you hit came from scrapbook trying to encode the data as well, without knowing how to convert the figure object to a JSON-serializable type. This means that, in the current version, you have to save the raw data in a separate scrap if you want to replot the data in a different way.
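
As a rough illustration of the "separate scrap" workaround, the raw values can be reshaped into plain JSON types before gluing. This sketch uses only the standard library; the `payload` layout and the scrap name `fig1_data` are assumptions, and the glue calls are left as comments because they need a notebook context.

```python
import json
import random

# Hypothetical raw data: two rows of 100 values, shaped like the (2, 100) array above.
random.seed(0)
raw = [[random.gauss(0, 1) for _ in range(100)] for _ in range(2)]

# Plain lists and dicts are JSON types, so they can be stored as a data scrap.
payload = {"x": raw[0], "y": raw[1]}

# Round-trip through JSON to confirm nothing is rejected or lost.
restored = json.loads(json.dumps(payload))
print(restored == payload)  # True

# In a notebook this would sit next to the display-only figure, e.g.:
# sb.glue("fig1_data", payload)
# sb.glue("fig1", fig, encoder="display")
```

With a NumPy array, calling `.tolist()` would give the same kind of nested lists.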

@MSeal
Member

MSeal commented Jan 15, 2020

That could be a really interesting way to store pieces of an analysis in separate notebooks, and then read them into a single "finished product" document.

Yep, that should be possible with some interface additions to an rST or Markdown rendering engine. You can also imagine pointing to a notebook on GitHub or in a document store as the source for the plots and data points.
