Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Plotly 3.0.0 - Deep Jupyter Integration, Validation, Performance, and More #942

Merged
merged 279 commits into from Jul 5, 2018

Conversation

jonmmease
Copy link
Contributor

@jonmmease jonmmease commented Feb 19, 2018

Update: June 27, 2018

This branch will mark a major release of the plotly.py branch, Version 3.0.0
There are many new and great features in this branch including deeper Jupyter integration, deeper figure validation, improved performance, and more.
You can try this out with a prerelease version, see https://pypi.org/project/plotly/#history for the latest.
To install and enable with Jupyter, run:

pip install plotly==3.0.0rc11
pip install "notebook>=5.3" "ipywidgets>=7.2"  # only necessary for Jupyter Notebook environments

If you're using older versions of notebook or ipywidgets you may need to manually activate the widget extensions (this should not be needed for notebook>=5.3 and ipywidgets>=7.2)

jupyter nbextension enable --py widgetsnbextension --sys-prefix
jupyter nbextension enable --py plotlywidget --sys-prefix

In addition, to add JupyterLab support run the following commands:

pip install jupyterlab
jupyter labextension install @jupyter-widgets/jupyterlab-manager # install the Jupyter widgets extension
jupyter labextension install plotlywidget@0.1.1

The original message from February 19, 2018 is below:

Alright, here we go! @chriddyp @jackparmer

Overview
This is a first cut at integrating the work I started in the ipyplotly project into plotly.py.

Integration Approach
I moved the new object hierarchy to plotly/datatypes. I modified the graph_objs.py code generation logic to remap Figure, Layout, and all of the trace objects (Bar, Scatter, etc.) to the corresponding objects in the new datatypes hierarchy. All of the other classes in graph_objs were converted into list or dict subclasses. This is simply for backwards compatibility for the time being, I'm not sure what we want to do with these in the long run.

Widget notes
The backend ipywidget logic is implemented in the new datatypes.Figure and datatypes.FigureWidget classes. The front-end logic is implemented in the js/ directory.

Examples
See the notebooks in specs/ipyplotly_integration for an overview of the new functionality.

TODO

  • Support dict-style .update method on all datatypes
  • Assess existing tests and catalog breakage
  • Python 2.7 support
  • JupyterLab support
  • Update README with some examples from Overview.ipynb
  • Reformat for 79 character width (plus PEP8 in general)

Failing test modes
Summary of reasons for current test failures

  • unexpected keyword argument '_raise' (test_get_figure). Looks like old graph objects had a _raise constructor parameter to control whether validation was performed. I'm not sure why this was needed, but validation isn't optional for the new objects.
  • Validation exceptions not raised on compatibility classes. Some of the current graph_objs.py classes don't directly map to objects in the new hierarchy, and they are currently included as dict/list subclasses for source compatibility. e.g.
    • Sequence types like Annotations don't have an equivalent because sequences are now represented as tuples of compound types. For instance, the annotations property of Layout accepts tuples of datatypes.layout.Annotation objects and there is no Annotations type.
    • Types like Marker, ColorBar, ErrorX, Font, Line, Frames, etc. are replaced by more precise types. For example, there is now a trace.scatter.Marker class and a trace.bar.Marker class and they each contain only the properties that are valid for their trace type.
  • New datatype objects don't have a validate() method
  • Subplot properties may only be suffixed by an integer > 1. See [wip] - ipyplotly fixes #956 (comment) for some comments. The documentation and schema of Plotly.js seem to disallow subplotid properties ending in 1. (x, x2, x3, but not x1).
  • There is no get_data method on datatype objects. This had a flatten option to return a dict with keys like foo.bar.baz. I think this could be implemented on the new objects if there's a need for.
  • Errors in test_graph_objs.py because the collection of classes in graph_objs.py has changed.
  • Datatypes are no longer dict subclasses so tests like assert Scatter() == dict(type='scatter') fail.
  • Exception types don't always match: The current datatype/validator logic mostly raises ValueError and KeyError exceptions. Many of the tests expect a PlotlyError.
  • There is no strip_style method on datatypes. The new logic doesn't maintain the style meta-data after code generation, but it would be possible to add this if this method needs to be preserved.
  • Datatypes have no to_string method, but again this could be added.
  • Invalid value of type 'plotly.grid_objs.grid_objs.Column' received for the 'xsrc' property of trace.scatter: grid_objs aren't accepted as *src properties
  • Scalar values aren't currently accepted as data_array types (e.g. scatter.x). This breaks some of the streaming tests which have things like my_stream.write(Scatter(x=1, y=10), ...)
  • It's not valid to pass a dictionary of object properties to the constructor of a data type without kwarg expansion. Scatter({'x': ..., 'y': ...}) is invalid, need to do Scatter(**{'x': ..., 'y': ...}). This breaks the tools.py/validate function.
  • The layout.width/height properties are numbers (according to the schema), but the dendrogram figure_factory sets height to the string '100%'.
  • All data_array typed properties are converted to read-only numpy arrays internally. So comparisons that expect lists fail (see TestStreamline/test_simple_streamline)
  • flaglist/enumeration types are case sensitive. Causes failure in TestTrisurf/test_trisurf_all_args because hoverinfo is set to 'None' rather than 'none'
  • 'Figure' object has no attribute 'to_dataframe': I'm not really clear on what this method is supposed to do in general.
  • 'dot' is in invalid marker type (according to the schema). Used in test_optional/test_matplotlylib/data/scatter.py/SIMPLE_SCATTER
  • AttributeError: module 'plotly.graph_objs.graph_objs' has no attribute 'Histogram2dcontour': It's Histogram2dContour. We could support both versions if we need to.
  • There are a buch of JSON encoding failure that seem to relate to date encoding.

Summary
Missing methods on datatypes

  • to_string [removing]
  • strip_style [removing]
  • get_data [removing]
  • validate [removing]
  • to_dataframe [removing]

Next steps

  • Support strict flag in string validator (see Add number type to annotations.text chart schema plotly.js#2523).
  • Remove hard numpy dependency. Support storing data_arrays as lists internally, but convert to tuples on property access.
  • Investigate improving import performance and reenabling recursive submodule imports.
  • add optional row/col support to add_traces. Deprecate append_traces and remap to add_traces with row/col args.
  • Make ipywidgets an optional dependency, only loaded when FigureWidget constructor is called.
  • Remove all use of deprecated classes in figure_factory, matplotlylib, etc.
    • tests
    • the actual modules users call

Copy link
Contributor

@Kully Kully left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remember Pep-8: 79 char limits :)

Not a thorough review of all your code, but I hope this helps

@@ -0,0 +1,909 @@
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think at least a portion of these examples in the overview .ipynb should be in the README file. I prefer a README that has examples right there to get started

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed (Added to the checklist above). Also, I didn't plan for these notebooks to stay here in the long run. I just wanted to share some examples for the team to use as part of the discussion.

"outputs": [],
"source": [
"# Set marker styles / colorbars to match between figures\n",
"scatt2.marker = scatt1.marker"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a big deal, but running this cell more than once raises a ValueError :

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I suggest changing the code so that an error doesn't get raised or writing a little blurb about the reason why this happens.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

"outputs": [],
"source": [
"# Save figure 1 to a pdf in the exports directory (requires cairosvg be installed)\n",
"# f1.save_image('exports/f1.pdf')"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After installing cariosvg I get an error running this cell:

OSError: dlopen() failed to load a library: cairo / cairo-2

Looks like there's a thread open here: Kozea/CairoSVG#84

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you install through pip or conda? I've had better luck through conda as conda will also pull down the cairo binary without needing to build it. It might be that you have cairosvg installed but not cairo, in this case you might be able to install just cairo using conda.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's important that the install works w/ pip (as well as conda)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very possible.

and yes I was using pip

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cldougl Agreed. The trouble is that cairsosvg requires the non-python cairo executable be installed and pip can't (or at least doesn't) do that.

If cairo(svg) proves to be too much of a hassle to be even an optional dependency then one option is to allow users to register custom file type output filters. This way a user can register cairosvg.svg2png to be the function to be called when a plot is to be saved to a png file.

},
"nbformat": 4,
"nbformat_minor": 2
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall very good work here! The convenience bar for using Plotly in Jupyter has been greatly raised.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

if invalid_uids:
raise ValueError(('The trace property of a figure may only be assigned to '
'a permutation of a subset of itself\n'
' Invalid trace(s) with uid(s): {invalid_uids}').format(invalid_uids=invalid_uids))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple things:

  1. Your line has more than 79 characters. https://www.python.org/dev/peps/pep-0008/#id19
  2. Write the ValueError message something like this:
raise ValueError(
    'The trace property of a figure may only be assigned to '
    'a permutation of a subset of itself \n Invalid trace(s) with uid(s): ' 
    '{invalid_uids}'.format(invalid_uids=invalid_uids)
)

it addresses 1) and it's more clean IMO

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I started this as an independent project originally and chose to use a 119 character line width at the time. We can do a mass reformatting commit at some point (probably just run everything through yapf like I do in the codegen logic)

@chriddyp
Copy link
Member

chriddyp commented Mar 5, 2018

hey @jmmease - I'm having some trouble running python setup.py codegen:

$ python setup.py codegen
running codegen
Traceback (most recent call last):
  File "setup.py", line 200, in <module>
    'codegen': CodegenCommand,
  File "/Users/chriddyp/Repos/plotly.py/vv/lib/python3.6/site-packages/setuptools/__init__.py", line 129, in setup
    return distutils.core.setup(**attrs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/distutils/dist.py", line 955, in run_commands
    self.run_command(cmd)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/distutils/dist.py", line 974, in run_command
    cmd_obj.run()
  File "setup.py", line 138, in run
    from codegen import perform_codegen
  File "/Users/chriddyp/Repos/plotly.py/codegen/__init__.py", line 8, in <module>
    from codegen.datatypes import build_datatypes_py, write_datatypes_py, append_figure_class
  File "/Users/chriddyp/Repos/plotly.py/codegen/datatypes.py", line 8, in <module>
    from codegen.utils import TraceNode, format_source, PlotlyNode
  File "/Users/chriddyp/Repos/plotly.py/codegen/utils.py", line 9, in <module>
    from plotly.basevalidators import BaseValidator, CompoundValidator, CompoundArrayValidator
  File "/Users/chriddyp/Repos/plotly.py/plotly/__init__.py", line 31, in <module>
    from plotly import (plotly, dashboard_objs, graph_objs, grid_objs, tools,
  File "/Users/chriddyp/Repos/plotly.py/plotly/graph_objs/__init__.py", line 14, in <module>
    from plotly.graph_objs.graph_objs import *  # this is protected with __all__
  File "/Users/chriddyp/Repos/plotly.py/plotly/graph_objs/graph_objs.py", line 802, in <module>
    from plotly.datatypes import FigureWidget
ModuleNotFoundError: No module named 'plotly.datatypes'

Am I missing something?

@jonmmease
Copy link
Contributor Author

Looks like a circular dependency issue, not sure why I wasn't hitting it.

Try commenting out graph_objs.py while running the codegen logic, then uncommenting.

@jonmmease
Copy link
Contributor Author

Any luck with that @chriddyp?

@chriddyp
Copy link
Member

chriddyp commented Mar 5, 2018

@jmmease that worked, thanks!

@chriddyp
Copy link
Member

chriddyp commented Mar 5, 2018

📣 I have published a pre-release version of this branch on PyPI. Try it out (python 3 only) with:

pip install plotly==3.0.0rc1

@Kully
Copy link
Contributor

Kully commented Mar 12, 2018

@jmmease This is not a bug, but a suggestion to improve an error message:

f1.add_scatter([k for k in range(50)])

returns:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-41-97f2242aa656> in <module>()
----> 1 f1.add_scatter([k for k in range(50)])

~/ipyplotly/ipyplotly/datatypes/__init__.py in add_scatter(self, visible, showlegend, legendgroup, opacity, name, uid, ids, customdata, selectedpoints, hoverinfo, hoverlabel, stream, x, x0, dx, y, y0, dy, text, hovertext, mode, hoveron, line, connectgaps, cliponaxis, fill, fillcolor, marker, selected, unselected, textposition, textfont, r, t, error_y, error_x, xcalendar, ycalendar, xaxis, yaxis, idssrc, customdatasrc, hoverinfosrc, xsrc, ysrc, textsrc, hovertextsrc, textpositionsrc, rsrc, tsrc, **kwargs)
   3407             rsrc=rsrc,
   3408             tsrc=tsrc,
-> 3409             **kwargs
   3410         )
   3411         return self.add_traces(new_trace)[0]

~/ipyplotly/ipyplotly/datatypes/trace/__init__.py in __init__(self, visible, showlegend, legendgroup, opacity, name, uid, ids, customdata, selectedpoints, hoverinfo, hoverlabel, stream, x, x0, dx, y, y0, dy, text, hovertext, mode, hoveron, line, connectgaps, cliponaxis, fill, fillcolor, marker, selected, unselected, textposition, textfont, r, t, error_y, error_x, xcalendar, ycalendar, xaxis, yaxis, idssrc, customdatasrc, hoverinfosrc, xsrc, ysrc, textsrc, hovertextsrc, textpositionsrc, rsrc, tsrc, **kwargs)
   1720         # Populate data dict with properties
   1721         # ----------------------------------
-> 1722         self.visible = visible
   1723         self.showlegend = showlegend
   1724         self.legendgroup = legendgroup

~/ipyplotly/ipyplotly/basedatatypes.py in __setattr__(self, prop, value)
   1247         if prop.startswith('_') or hasattr(self, prop):
   1248             # Let known properties and private properties through
-> 1249             super().__setattr__(prop, value)
   1250         else:
   1251             # Raise error on unknown public properties

~/ipyplotly/ipyplotly/datatypes/trace/__init__.py in visible(self, val)
     41     @visible.setter
     42     def visible(self, val):
---> 43         self['visible'] = val
     44 
     45     # showlegend

~/ipyplotly/ipyplotly/basedatatypes.py in __setitem__(self, key, value)
   1399         else:
   1400             # Simple property
-> 1401             self._set_prop(key, value)
   1402 
   1403     @property

~/ipyplotly/ipyplotly/basedatatypes.py in _set_prop(self, prop, val)
   1418 
   1419         validator = self._validators.get(prop)
-> 1420         val = validator.validate_coerce(val)
   1421 
   1422         if val is None:

~/ipyplotly/ipyplotly/basevalidators.py in validate_coerce(self, v)
    242         else:
    243             if not self.in_values(v):
--> 244                 self.raise_invalid_val(v)
    245         return v
    246 

~/ipyplotly/ipyplotly/basevalidators.py in raise_invalid_val(self, v)
    100                       typ=type_str(v),
    101                       v=repr(v),
--> 102                       vald_clr_desc=self.description()))
    103 
    104     def raise_invalid_elements(self, invalid_els):

ValueError: 
    Invalid value of type 'builtins.list' received for the 'visible' property of trace.scatter
        Received value: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]

    The 'visible' property is an enumeration that may be specified as:
      - One of the following enumeration values:
            [True, False, 'legendonly']

I don't think it's a very user friendly error message for someone who is trying to select input an x or y list, as I imagine that will be the case for a lot.

Perhaps a message that enumerates all of the scatter params or something that says:

Whoops, invalid parameter. Run `f1.add_scatter.(<SHIFT+TAB>)` to see all the available parameters.
Proper Example:
...

@chriddyp
Copy link
Member

chriddyp commented Mar 12, 2018

I don't think it's a very user friendly error message for someone who is trying to select input an x or y list, as I imagine that will be the case for a lot.

@Kully - I don't think that will actually happen it a lot, since all of our documentation will show add_scatter(x=[....]) and if the user calls help(add_scatter) or looks at the add_scatter docstring they will see the proper usage. If a user does add_scatter([...]), they are really just guessing that this is the proper syntax.

Alternatively, perhaps the keyword arguments x and y could be moved to be the first two keyword arguments, that way add_scatter(x=[1, 2, 3], y=[3, 1, 2]) is the same as add_scatter([1, 2, 3], [3, 1, 2]). However, let's move enhancements like that to after we have this PR's tests passing :)

@jonmmease
Copy link
Contributor Author

@Kully @chriddyp I like the idea of making x and y the first two kwargs of add_scatter eventually. This would be consistent with MATLAB/matplotlib.

@jonmmease
Copy link
Contributor Author

@chriddyp @Kully I added a fairly in-depth summary of the current test failures to the top of this PR. I think this captures everything that Chris found in #956 (Note I implemented the update method #956 )

@Kully
Copy link
Contributor

Kully commented Mar 13, 2018

@chriddyp @Kully I added a fairly in-depth summary of the current test failures to the top of this PR. I think this captures everything that Chris found in #956 (Note I implemented the update method #956 )

Thanks a lot for this Jon. I'm going to be hitting through these this week so it's nice to have them all written out cleanly.

@jonmmease
Copy link
Contributor Author

Sounds good @Kully
I'm going to work on reviewing and documenting the widget JavaScript code next, so we shouldn't get in each other's way.

Jon M. Mease added 3 commits March 19, 2018 06:55
 - plotly/datatypes package is gone
 - codegen object hierarchy directly into plotly/graph_objs.
 - Refactor codegen output hierarchy to be compatible with legacy Figure/Layout/Trace types. e.g. go.Figure, go.Layout, go.Scatter.
 - Codegen deprecated classes as dict/list subclasses with deprecation warnings that point users to which classes to use instead.
 - The entire legacy graph_objs directory is gone (PlotlyDict/PlotlyList/etc.)
 - _plotly_utils package introduced as a package to hold code that should be available during both code generation and runtime. Moved base validators to this package so we can reuse their descriptions in docstrings.  All of the order-dependency of code generation is now gone, and the plotly/ package is not imported. This makes code generation much less fragile to future extension.
@jonmmease
Copy link
Contributor Author

@chriddyp @Kully
I just pushed a big update / cleanup to the code generation logic and the output package structure that I think makes the new version much more cohesive. The goal here was to make code generation less fragile (order-dependent), and to get rid of the duality of the plotly.graph_objs and plotly.datatypes packages.

  • plotly/datatypes package is gone
  • The codegen object hierarchy is now output directly into plotly/graph_objs.
  • Refactored codegen output hierarchy to be compatible with legacy Figure/Layout/Trace types. e.g. go.Figure, go.Layout, go.Scatter (instead of datatypes.trace.Scatter).
  • Codegen deprecated classes (e.g. go.Marker) as dict/list subclasses with deprecation warnings that point users to which classes to use instead (e.g. use go.scatter.Marker or go.bar.Marker instead).
  • The entire legacy graph_objs package is gone (PlotlyDict/PlotlyList/etc.) and replaced with codegen output.
  • _plotly_utils package introduced as a package to hold code that should be available during both code generation and runtime. Moved base validators to this package so we can reuse their descriptions in docstrings. All of the order-dependency of code generation is now gone, and the plotly/ package is not imported during codegen. This makes code generation much less fragile and open to future extension.

Give it a try and see what you think. I'm going to go back through and review/cleanup the entire codegen module next.

Jon M. Mease added 2 commits March 25, 2018 11:15
@jonmmease jonmmease closed this Jul 4, 2018
Dash - A Pleasant and Productive Developer Experience automation moved this from In progress to Done Jul 4, 2018
@jonmmease jonmmease reopened this Jul 4, 2018
Dash - A Pleasant and Productive Developer Experience automation moved this from Done to In progress Jul 4, 2018
@jonmmease
Copy link
Contributor Author

Here's a summary of what I've done to (hopefully 🤞) get the tests running more reliably on circleci:

  • Only send plot.ly the 'stable' portion of the version string (See plotly/streambed#11078)
  • Incorporate the retrying logic that @Kully started in Retry requests for Python API #964
    • Update the stop_max_delay property from 30s to 90s (I still saw request timeouts at 30s)
  • Set a CircleCi timeout of 20 minutes for the tox --notest command. The default of 10 minutes isn't always long enough. For example, in this last successful build this command took 11:07.
  • For tests that make requests to the server and expect that a PlotlyRequestError will be raised (e.g. plotly/tests/test_core/test_file/test_file.py:test_duplicate_folders), remove the check for which status_code is returned, just check that the exception is raised. I made this change because there were cases where the correct status code was returned on the first few tries, but then by the time the new retrying logic gave up, a different error code was returned

@cldougl @chriddyp Please take a quick look when you can, and let me know if you're uncomfortable with any of these changes.

@astrojuanlu
Copy link
Contributor

The changes look awesome! Do you have an ETA for Plotly 3.0.0? 😃

@jonmmease
Copy link
Contributor Author

@Juanlu001 Thanks! We don't have a date, but "very soon" 🙂 Unless something pops up 3.0.0rc11 may well become 3.0.0 final, so give it a spin if you can! (Install instructions at the top of the thread)

@jonmmease jonmmease merged commit dbe8adb into master Jul 5, 2018
Dash - A Pleasant and Productive Developer Experience automation moved this from In progress to Done Jul 5, 2018
@nicolaskruchten nicolaskruchten deleted the ipyplotly_integration branch June 19, 2020 16:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

10 participants