Data Explorer: Set initial settings via code #4377

Closed
dylanross3 opened this issue Apr 25, 2019 · 3 comments
Labels
data-explorer For issues related to the data explorer output stale workflow: stale issue

Comments

@dylanross3

Is your feature request related to a problem? Please describe.
A common problem with notebooks is reproducibility. I'm assuming the nteract notebook UI persists the Data Explorer settings in the output's metadata, and my application does that too, allowing the same view to be restored later. However, this actually decreases reproducibility of the execution: running the notebook again (for example, after sharing it with someone else) will not produce the same Data Explorer view, because the output metadata is lost on re-execution.

Describe the solution you'd like
One solution would be to allow the Data Explorer settings to be pre-configured in code and applied to the output's metadata during execution. I explored some options, and it turns out this is already possible with the following snippet.

from IPython.display import display

metadata = {
    'dx': {
        'view': 'scatter',
        'selectedMetrics': [
            'Economy (GDP per Capita)',
            'Family',
            'Health (Life Expectancy)',
            'Freedom'
        ],
        'chart': {
            'metric1': 'Happiness Score',
            'metric2': 'Economy (GDP per Capita)',
            'metric3': 'Freedom',
            'dim1': 'Region',
            'dim2': 'Region'
        }
    }
}

# assumes `df` refers to a Pandas DataFrame set previously
display(df, metadata=metadata)

That snippet successfully pre-hydrates the output's metadata. The full metadata is more comprehensive, but this example works since the rest of the options apparently take default values.

I think this is a promising way to increase reproducibility, but it comes with some problems.

  1. It's not obvious what the configuration options are.
  2. It's not easy to extract the settings from a Data Explorer that a user manually configured. The only way that I know of is to use developer tools to extract it.
  3. The Data Explorer needs to become resilient to invalid metadata schema and values, such as view: "foo" or metrics/dimensions that are not in the dataset.
  4. This becomes an API, subject to breakage if the metadata schema changes. This is technically already true if applications embedding the Data Explorer are persisting metadata (which mine does), but I wanted to call it out to make sure we keep it in mind since I don't see any versioning information present.
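As a rough illustration of point 3, the kind of check the Data Explorer would need can be sketched on the Python side, before the metadata is ever passed to display. This is only a sketch under stated assumptions: `sanitize_dx_settings` is a hypothetical helper, the list of valid views is supplied by the caller because the full set is not documented here, and only the metric*/dim* chart keys shown in the snippet above are checked.

```python
def sanitize_dx_settings(settings, columns, allowed_views):
    """Return (cleaned_settings, problems) for a dict like metadata['dx'].

    Hypothetical helper: not part of nteract or the Data Explorer.
    """
    columns = list(columns)
    cleaned = dict(settings)
    problems = []

    # Drop a view name that is not in the caller-supplied allow-list.
    view = cleaned.get("view")
    if view is not None and view not in allowed_views:
        problems.append(f"unknown view: {view!r}")
        del cleaned["view"]

    # Keep only selected metrics that exist in the dataset.
    if "selectedMetrics" in cleaned:
        kept = []
        for name in cleaned["selectedMetrics"]:
            if name in columns:
                kept.append(name)
            else:
                problems.append(f"unknown metric: {name!r}")
        cleaned["selectedMetrics"] = kept

    # Assumption: only the metric*/dim* chart keys refer to column names.
    if "chart" in cleaned:
        chart = {}
        for key, value in cleaned["chart"].items():
            if key.startswith(("metric", "dim")) and value not in columns:
                problems.append(f"chart.{key} refers to unknown column: {value!r}")
            else:
                chart[key] = value
        cleaned["chart"] = chart

    return cleaned, problems


settings = {
    "view": "foo",
    "selectedMetrics": ["Family", "Not A Column"],
    "chart": {"metric1": "Happiness Score", "dim1": "Not A Column"},
}
cleaned, problems = sanitize_dx_settings(
    settings,
    columns=["Happiness Score", "Family", "Region"],
    allowed_views={"scatter"},  # assumption: the real list of views is longer
)
```

Dropped entries fall back to whatever defaults the Data Explorer applies, and `problems` can be shown to the user instead of failing silently.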

The first two issues could be addressed by introducing a UI control to the Data Explorer that allows exporting the current settings. For example, it could pop out a textarea with the settings encoded as JSON (or just copy the value to the clipboard). The user could paste the JSON into a code cell and trivially decode it into a dict to be passed to the display function.
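The decoding step could look roughly like the sketch below. The export control does not exist yet, so the JSON string is a hypothetical example of what it might produce, and `dx_metadata_from_json` is an illustrative helper name, not an existing API.

```python
import json

# Hypothetical JSON, as a user might paste it after exporting the settings.
settings_json = """
{
  "view": "scatter",
  "selectedMetrics": ["Economy (GDP per Capita)", "Family"],
  "chart": {
    "metric1": "Happiness Score",
    "metric2": "Economy (GDP per Capita)",
    "dim1": "Region"
  }
}
"""


def dx_metadata_from_json(text):
    """Decode exported settings and wrap them under the 'dx' metadata key."""
    settings = json.loads(text)
    if not isinstance(settings, dict):
        raise ValueError("expected a JSON object of Data Explorer settings")
    return {"dx": settings}


metadata = dx_metadata_from_json(settings_json)

# In a notebook, with `df` referring to a Pandas DataFrame set previously:
# from IPython.display import display
# display(df, metadata=metadata)
```

Keeping the exported value as JSON, rather than a Python literal, means the same string could be reused from kernels in other languages.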

@emeeks emeeks added the data-explorer For issues related to the data explorer output label Apr 27, 2019
@emeeks
Member

emeeks commented Apr 27, 2019

  1. We need some documentation, which also means we become committed to the version of the API described in that documentation (which resolves the implicit contract in 4). The approach you're using is the one I'd expect until such a time as the notebook was "smarter" and knew to save the settings in some way when the cells were re-run.
  2. I will add an initial stab at this and we can tighten up the design for how to show it. I feel like this should show up in the console, would you expect it in the UI?
  3. Makes sense; I hadn't thought about that. We can add some level of validation in the existing code. I'd just write this from scratch, so if there's a standard approach to this, please feel free to file a PR or point me at an example.
  4. Data Explorer needs to get versioned and use semantic versioning. As it stands, my expectation is that metadata changes would be a major version change, so at least I'm operating as if we're in semantic versioning. Looking at https://www.npmjs.com/package/@nteract/data-explorer, it seems like it's just getting its versioning from nteract, which could be problematic if it starts getting used in isolation more and more (which it is).
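To make the versioning point concrete, here is a minimal sketch of how an embedding application might decide whether persisted settings are still safe to restore. It assumes metadata schema changes only happen on major version bumps, as described in point 4. The `version` key is hypothetical (the metadata carries no versioning information today), and the parser only handles plain MAJOR.MINOR.PATCH strings.

```python
def parse_version(version):
    """Parse a plain 'MAJOR.MINOR.PATCH' string into a tuple of ints."""
    parts = version.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {version!r}")
    return tuple(int(part) for part in parts)


def can_restore(metadata, current_version):
    """True if the saved settings share a major version with the running code."""
    saved = metadata.get("dx", {}).get("version")  # hypothetical key
    if saved is None:
        return False  # unversioned metadata: fall back to default settings
    return parse_version(saved)[0] == parse_version(current_version)[0]


saved_metadata = {"dx": {"version": "1.2.0", "view": "scatter"}}

same_major = can_restore(saved_metadata, "1.9.3")
next_major = can_restore(saved_metadata, "2.0.0")
unversioned = can_restore({"dx": {"view": "scatter"}}, "1.9.3")
```

Treating unversioned metadata as not restorable is a conservative choice for the sketch. An application that already persists metadata might prefer to treat it as the earliest known version instead.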

@dylanross3
Author

  2. I will add an initial stab at this and we can tighten up the design for how to show it. I feel like this should show up in the console, would you expect it in the UI?

We can continue to iterate on this and involve UX designers, but I was imagining one workflow would be like this.

  1. User executes a cell to display the Data Explorer
  2. User clicks around to configure it
  3. User clicks a button to copy the settings as JSON (so it's not coupled to Python) to the clipboard
  4. User pastes the JSON settings into the editor in the appropriate place
  5. User executes cell again to test programmatic configuration

The user would probably need some guidance around how to do steps 3 and 4. Aside from documentation, this could be addressed by having the button open a pop-up for contextual guidance and actions rather than simply copying the settings at that point.

@stale

stale bot commented Feb 23, 2020

This issue hasn't had any activity on it in the last 90 days. Unfortunately we don't get around to dealing with every issue that is opened. Instead of leaving issues open we're seeking to be transparent by closing issues that aren't being prioritized. If no other activity happens on this issue in one week, it will be closed.
It's more than likely that just by me posting about this, the maintainers will take a closer look at these long forgotten issues to help evaluate what to do next.
If you would like to see this issue get prioritized over others, there are multiple avenues 🗓:

  • Ask how you can help with this issue 👩🏿‍💻👨🏻‍💻
  • Help solve other issues the team is currently working on 👨🏾‍💻👩🏼‍💻
  • Donate to nteract so we can support developers to work on these features and bugs more regularly 💰🕐

Thank you!


2 participants