
Derived datetime for tabular data #286

Merged: @jsignell merged 6 commits into master from jsignell/derived_dt_pandas on Sep 11, 2019

@jsignell (Member) commented on Aug 8, 2019

Closes: #278

I was experimenting with this simulated dataset:

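```python
import numpy as np
import pandas as pd
import hvplot.pandas  # registers the .hvplot accessor on DataFrames

# Five days of hourly timestamps with a smooth synthetic temperature signal
time_df = pd.DataFrame({
    'time': pd.date_range('1/1/2000', periods=5*24, freq='1h', tz='UTC'),
    'temp': np.sin(np.linspace(0, 5*2*np.pi, 5*24)).cumsum()})

# 'time.hour', 'time.day' and 'time.dayofweek' are derived from the 'time' column
time_df.hvplot(x='time.hour', by='time.dayofweek')
time_df.hvplot.heatmap(x='time.hour', y='time.day', C='temp')
```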

[Screenshot 2019-08-08 at 3:15 PM]

[Screenshot 2019-08-08 at 4:25 PM]
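The `'column.attribute'` specs used above can be approximated in plain pandas through the `Series.dt` accessor. The sketch below is not hvPlot's implementation: the helper `resolve_derived_datetime` is hypothetical, and it assumes the part after the dot is any attribute of `.dt` (`hour`, `day`, `dayofweek`, ...). It also pivots the derived columns into the day-by-hour grid that the heatmap call displays.

```python
import numpy as np
import pandas as pd


def resolve_derived_datetime(df, spec):
    """Hypothetical helper: turn 'column.attr' into df[column].dt.<attr>.

    Returns the plain column unchanged when spec is an existing column name.
    """
    if spec in df.columns:
        return df[spec]
    column, _, attr = spec.partition('.')
    if column not in df.columns or not attr:
        raise KeyError(f'{spec!r} is neither a column nor a derived datetime spec')
    # Series.dt exposes hour, day, dayofweek, month, ... for datetime columns
    return getattr(df[column].dt, attr).rename(spec)


time_df = pd.DataFrame({
    'time': pd.date_range('1/1/2000', periods=5*24, freq='h', tz='UTC'),
    'temp': np.sin(np.linspace(0, 5*2*np.pi, 5*24)).cumsum()})

hour = resolve_derived_datetime(time_df, 'time.hour')
day = resolve_derived_datetime(time_df, 'time.day')

# Roughly what the heatmap shows: one cell per (day, hour) pair.
# Each pair occurs exactly once in this dataset, so the aggfunc choice does not matter.
grid = time_df.assign(**{'time.hour': hour, 'time.day': day}).pivot_table(
    index='time.day', columns='time.hour', values='temp', aggfunc='mean')
```

The derived values live outside the DataFrame, so the original columns are never modified; a plotting library can compute them on the fly from the spec string.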

```diff
@@ -710,7 +710,7 @@ def polygons(self, x=None, y=None, c=None, **kwds):
         """
         return self(x, y, c=c, kind='polygons', **kwds)
 
-    def paths(self, **kwds):
+    def paths(self, x=None, y=None, c=None, **kwds):
```
Member Author: This was making it impossible to use `paths` with non-GeoPandas data.

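```python
data = self.data if data is None else data
if not x: x = data.columns[0]  # default x: first column
if not y: y = data.columns[1]  # default y: second column
self.use_index = False         # the index is not used as a default
```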
Member: Should we handle these in some more general way?

Member Author: I actually think that we should let them use the index, but I didn't want to change prior behavior.

Member Author: I'd be happy to change it, though, if you don't think it'll be too painful.

Member: Can you explain the consequences?

Member Author: Instead of x defaulting to the first column, it would default to the index, and y, instead of defaulting to the second column, would default to the first.
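To make the trade-off concrete, here is a minimal sketch of both defaulting rules. `default_xy` and its `use_index` flag are hypothetical names modeled on the snippet discussed in this thread, not hvPlot's actual argument handling; unlike that snippet it checks `is None` so that falsy column labels such as `0` are not overwritten.

```python
import pandas as pd


def default_xy(data, x=None, y=None, use_index=False):
    """Hypothetical sketch of how missing x and y might be filled in for a 2D plot.

    use_index=False mirrors the current behavior: x -> first column, y -> second.
    use_index=True is the proposed alternative: x -> index, y -> first column.
    """
    if use_index:
        # reset_index turns the index into the first column
        # (named 'index' when the index has no name)
        data = data.reset_index()
    if x is None:
        x = data.columns[0]
    if y is None:
        y = data.columns[1]
    return x, y


df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=pd.Index([10, 20], name='t'))
print(default_xy(df))                  # ('A', 'B')
print(default_xy(df, use_index=True))  # ('t', 'A')
```

With the index rule every column shifts one position to the right, which is why y would fall back to the first column instead of the second.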

Member: Let's leave that for another time; I'm not totally sure we should do that for 2D plots.

Member: Doesn't Pandas `.plot()` do it?

Member: Pandas plot does not have these plot types.

Member: I mean, doesn't Pandas `.plot()` use the index in that way, in general? I think hvPlot should follow what Pandas does, even for new plot types, to be predictable and unsurprising.

Member Author: These types of plots require explicit x and y in pandas `.plot()`.
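One way to check this without drawing anything is to inspect the signatures of the pandas plot accessor methods: `scatter` and `hexbin` take `x` and `y` as required parameters, while `line` defaults both to `None`. The helper `required_params` is hypothetical, and the result reflects whichever pandas version is installed.

```python
import inspect

import pandas as pd


def required_params(df, kind):
    """Hypothetical helper: names of required positional parameters of df.plot.<kind>."""
    method = getattr(df.plot, kind)  # bound method, so 'self' is excluded
    return [
        name
        for name, p in inspect.signature(method).parameters.items()
        if p.default is inspect.Parameter.empty
        and p.kind in (p.POSITIONAL_ONLY, p.POSITIONAL_OR_KEYWORD)
    ]


df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])
for kind in ('scatter', 'hexbin', 'line'):
    print(kind, required_params(df, kind))
```

Inspecting signatures avoids importing the matplotlib backend, which pandas only loads when a plot is actually drawn.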

@philippjfr (Member): One comment, which is more of a question; otherwise looks good.

@jsignell (Member Author): I just wrote a little comparison tool for matplotlib and hvPlot:

```python
import pandas as pd
import panel as pn  # needed for pn.Column, pn.Row and pn.interact below
import hvplot.pandas  # registers the .hvplot accessor on DataFrames
import matplotlib.pyplot as plt
%matplotlib inline

df = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]], columns=['A', 'B'])
# Public plotting methods of the pandas .plot accessor (area, bar, scatter, ...)
method_list = [func for func in dir(df.plot) if not func.startswith('_')]

def compare_mpl_hvplot(kind):
    # Left: pandas/matplotlib output, or the error it raises
    try:
        fig = plt.Figure(figsize=(6, 5))
        ax = fig.subplots()
        df.plot(kind=kind, ax=ax)
    except Exception as err:
        fig = pn.Column(pn.Spacer(height=30), pn.panel(repr(err), width=400))

    # Right: hvPlot output for the same kind, or the error it raises
    try:
        hv_fig = df.hvplot(kind=kind, height=300, width=400, legend='top_left')
    except Exception as err:
        hv_fig = pn.panel(repr(err), width=400)

    return pn.Column(
        f'### {kind}',
        pn.Row(fig, pn.Column(pn.Spacer(height=30), hv_fig))
    )

pn.interact(compare_mpl_hvplot, kind=method_list)
```

@jsignell (Member Author): Similarly, for xarray:

```python
import panel as pn
import xarray as xr

import hvplot.xarray  # registers the .hvplot accessor on xarray objects
import matplotlib.pyplot as plt

%matplotlib inline

ds = xr.tutorial.open_dataset('air_temperature').load()
da = ds.air.isel(time=500) - 273  # one 2D time slice, Kelvin to (approximately) Celsius

# Kinds offered by xarray's matplotlib interface, plus any extra ones from hvPlot
method_list = [func for func in dir(da.plot) if not func.startswith('_')]
method_list += [func for func in dir(da.hvplot) if not func.startswith('_') and func not in method_list]

# The two libraries use different names for the same gridded plot types
HV_TO_MPL = {'quadmesh': 'pcolormesh', 'image': 'imshow'}
MPL_TO_HV = {mpl: hv for hv, mpl in HV_TO_MPL.items()}

def compare_mpl_hvplot(kind):
    mpl_kind = HV_TO_MPL.get(kind, kind)
    hv_kind = MPL_TO_HV.get(kind, kind)

    # Left: xarray/matplotlib output, or the error it raises
    try:
        fig = plt.Figure(figsize=(6, 5))
        ax = fig.subplots()
        getattr(da.plot, mpl_kind)(ax=ax)
    except Exception as err:
        fig = pn.Column(pn.Spacer(height=30), pn.panel(repr(err), width=400))

    # Right: hvPlot output for the equivalent kind, or the error it raises
    try:
        hv_fig = da.hvplot(kind=hv_kind, height=300, width=400, legend='top_left')
    except Exception as err:
        hv_fig = pn.panel(repr(err), width=400)

    return pn.Column(
        f'### {hv_kind}',
        pn.Row(fig, pn.Column(pn.Spacer(height=30), hv_fig))
    )

pn.interact(compare_mpl_hvplot, kind=method_list)
```

@jsignell (Member Author): I'm not quite sure what's going on with that networkx notebook.

@jsignell merged commit bd31588 into master on Sep 11, 2019
@jsignell deleted the jsignell/derived_dt_pandas branch on September 11, 2019 13:13
Linked issue closed by this pull request: Derived datetime for tabular formats
3 participants