Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When hvplot checks is_intake, remove pathname requirement #1051

Closed
kthyng opened this issue Mar 28, 2023 · 9 comments
Closed

When hvplot checks is_intake, remove pathname requirement #1051

kthyng opened this issue Mar 28, 2023 · 9 comments
Labels
interface: intake type: bug Something isn't working

Comments

@kthyng
Copy link
Contributor

kthyng commented Mar 28, 2023

ALL software version info

(this library, plus any other relevant software, e.g. bokeh, python, notebook, OS, browser, etc)

hvplot                    0.8.3                      py_0    pyviz
intake                    0.6.6                    pypi_0    pypi
bokeh                     2.4.3              pyhd8ed1ab_3    conda-forge

Description of expected behavior and the observed behavior

I'm making a plot with intake through an intake catalog using my own DataFrameTransform. When hvplot checks the incoming data in _process_data() in converter.py, it uses utils.py is_intake() which runs

    if not check_library(data, 'intake'):
        return False

and an intake Transform that is not a fully-qualified path in the intake namespace (as mine isn't because it is just defined in my project) won't pass this test. But, I don't think it should need to since it is subsequently checked with isinstance(data, DataSource). See discussion on this intake/intake#724.

Complete, minimal, self-contained example code that reproduces the issue

Save to a local file "test_transform.py":

from intake.source.derived import GenericTransform

class DataFrameTransform(GenericTransform):
    """Transform where the input and output are both Dask-compatible dataframes

    This derives from GenericTransform, and you mus supply ``transform`` and
    any ``transform_kwargs``.
    """

    input_container = "dataframe"
    container = "dataframe"
    optional_params = {}
    _df = None

    def to_dask(self):
        if self._df is None:
            self._pick()
            self._df = self._transform(self._source.to_dask().compute(),
                                       **self._params["transform_kwargs"])
        return self._df

    def _get_schema(self):
        """load metadata only if needed"""
        self.to_dask()
        return Schema(dtype=self._df.dtypes,
                      shape=(None, len(self._df.columns)),
                      npartitions=self._df.npartitions,
                      metadata=self.metadata)

    def read(self):
        # return self.read()
        return self.to_dask()


def DF(df):
    return df
import pandas as pd
import intake

# make data
d = {"a": [0,0.1], "b": [1,1.1]}
pd.DataFrame(d).to_csv("test.csv")

# make catalog
cat = \
'''
name: ctd
description: 

sources:
  ctd_source_base:
    description: CTD Source base
    driver: csv
    args:
      urlpath: test.csv

  ctd_source:
    description: CTD Source
    driver: test_transform.DataFrameTransform
    args:
      targets: [ctd_source_base]
      transform: "test_transform.DF"
      transform_kwargs: {}
      
'''

file = open("test.yaml", "w")
file.writelines(cat)
file.close()

cat = intake.open_catalog("test.yaml")

# try to make plot
cat['ctd_source'].plot()

Stack traceback and/or browser JavaScript console output

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[6], line 1
----> 1 cat['ctd_source'].plot()

File ~/miniconda3/envs/ciofs/lib/python3.11/site-packages/intake/source/base.py:171, in HVplotCapture.__call__(self, plotname, *args, **kwargs)
    169 if plotname:
    170     self._data.metadata["plots"][plotname] = kwargs
--> 171 return super().__call__(*args, **kwargs)

File ~/miniconda3/envs/ciofs/lib/python3.11/site-packages/hvplot/plotting/core.py:92, in hvPlotBase.__call__(self, x, y, kind, **kwds)
     89         plot = self._get_converter(x, y, kind, **kwds)(kind, x, y)
     90         return pn.panel(plot, **panel_dict)
---> 92 return self._get_converter(x, y, kind, **kwds)(kind, x, y)

File ~/miniconda3/envs/ciofs/lib/python3.11/site-packages/hvplot/plotting/core.py:99, in hvPlotBase._get_converter(self, x, y, kind, **kwds)
     97 y = y or params.pop("y", None)
     98 kind = kind or params.pop("kind", None)
---> 99 return HoloViewsConverter(self._data, x, y, kind=kind, **params)

File ~/miniconda3/envs/ciofs/lib/python3.11/site-packages/hvplot/converter.py:389, in HoloViewsConverter.__init__(self, data, x, y, kind, by, use_index, group_label, value_label, backlog, persist, use_dask, crs, fields, groupby, dynamic, grid, legend, rot, title, xlim, ylim, clim, symmetric, logx, logy, loglog, hover, subplots, label, invert, stacked, colorbar, datashade, rasterize, row, col, debug, framewise, aggregator, projection, global_extent, geo, precompute, flip_xaxis, flip_yaxis, dynspread, hover_cols, x_sampling, y_sampling, project, tools, attr_labels, coastline, tiles, sort_date, check_symmetric_max, transforms, stream, cnorm, features, rescale_discrete_levels, **kwds)
    387 self.value_label = value_label
    388 self.label = label
--> 389 self._process_data(
    390     kind, data, x, y, by, groupby, row, col, use_dask,
    391     persist, backlog, label, group_label, value_label,
    392     hover_cols, attr_labels, transforms, stream, kwds
    393 )
    395 self.dynamic = dynamic
    396 self.geo = any([geo, crs, global_extent, projection, project, coastline, features])

File ~/miniconda3/envs/ciofs/lib/python3.11/site-packages/hvplot/converter.py:800, in HoloViewsConverter._process_data(self, kind, data, x, y, by, groupby, row, col, use_dask, persist, backlog, label, group_label, value_label, hover_cols, attr_labels, transforms, stream, kwds)
    798     self.data = data
    799 else:
--> 800     raise ValueError('Supplied data type %s not understood' % type(data).__name__)
    802 if stream is not None:
    803     if streaming:

ValueError: Supplied data type DataFrameTransform not understood

Suggestion

So, perhaps a simple fix is just to remove

    if not check_library(data, 'intake'):
        return False

in is_intake() but I wasn't sure if that part was necessary for some other reason. Or just both checks could be options like:

def is_intake(data):
    from intake.source.base import DataSource
    return check_library(data, 'intake') or isinstance(data, DataSource)
@hoxbro
Copy link
Member

hoxbro commented Mar 28, 2023

This seems like a correct suggestion.

Are you up for a PR to fixs it? 🙂

@hoxbro hoxbro added type: bug Something isn't working interface: intake labels Mar 28, 2023
@kthyng
Copy link
Contributor Author

kthyng commented Mar 28, 2023

@hoxbro Yes I can do a PR — Do you mean the first or second proposed solution?

@hoxbro
Copy link
Member

hoxbro commented Mar 28, 2023

I would be happy with either of them 👍

@kthyng
Copy link
Contributor Author

kthyng commented Mar 28, 2023

@martindurant Do you have an opinion about which of the above two solutions (at the bottom of the initial post) you think would be better? That is, is there a case in which check_library(data, 'intake') would return True but that isn't an adequate check, or vice versa? If not, I'd use the second solution of

def is_intake(data):
    from intake.source.base import DataSource
    return check_library(data, 'intake') or isinstance(data, DataSource)

@martindurant
Copy link

I can understand that hvplot needs to run the check for the case that intake isn't installed, and ideally doesn't want to import intake either just for the purpose of the check. You could do

def is_intake(data):
    return any(check_library(cls, 'intake') for cls in type(data).mro())

i.e., a way to catch subclasses defined outside of the intake library. Note that the previous method is accidentally passing for most derived classes, since there's a convention to name packages "intake_*", but this is not a requirement.

@kthyng
Copy link
Contributor Author

kthyng commented Mar 28, 2023

Ok great, and that works in my test case too. I'll work on a PR with that change.

@hoxbro
Copy link
Member

hoxbro commented Mar 30, 2023

With the changes made in #1052, I still get an error. This can be recreated by using your example and replacing the last line with cat['ctd_source'].read(). The fix is to change targets: ctd_source_base to targets: [ctd_source_base]. I don't know if a better error message is needed as this is the traceback that I got:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[11], line 40
     36 cat = intake.open_catalog("test.yaml")
     38 # try to make plot
     39 # cat['ctd_source'].plot()
---> 40 cat['ctd_source'].read()

File ~/Development/holoviz/development/dev_hvplot/test_transform.py:32, in DataFrameTransform.read(self)
     30 def read(self):
     31     # return self.read()
---> 32     return self.to_dask()

File ~/Development/holoviz/development/dev_hvplot/test_transform.py:17, in DataFrameTransform.to_dask(self)
     15 def to_dask(self):
     16     if self._df is None:
---> 17         self._pick()
     18         self._df = self._transform(self._source.to_dask().compute(),
     19                                    **self._params["transform_kwargs"])
     20     return self._df

File ~/miniconda3/envs/holoviz/lib/python3.10/site-packages/intake/source/derived.py:171, in DerivedSource._pick(self)
    169 def _pick(self):
    170     """Pick the source from the given targets"""
--> 171     self._source = self._chooser(self.targets, self.cat, self._kwargs, self._cat_kwargs)
    172     if self.input_container != "other":
    173         assert self._source.container == self.input_container

File ~/miniconda3/envs/holoviz/lib/python3.10/site-packages/intake/source/derived.py:100, in first(targets, cat, kwargs, cat_kwargs)
     94 """A target chooser that simply picks the first from the given list
     95 
     96 This is the default, particularly for the case of only one element in
     97 the list
     98 """
     99 targ = targets[0]
--> 100 return get_source(targ, cat, kwargs.get(targ, {}), cat_kwargs)

File ~/miniconda3/envs/holoviz/lib/python3.10/site-packages/intake/source/derived.py:88, in get_source(target, cat, kwargs, cat_kwargs)
     86     cat = cached_cats(caturl, **cat_kwargs)
     87 if cat:
---> 88     return cat[target].configure_new(**kwargs)
     89 # for testing only
     90 return target

File ~/miniconda3/envs/holoviz/lib/python3.10/site-packages/intake/catalog/base.py:475, in Catalog.__getitem__(self, key)
    473         out = out[part]
    474     return out()
--> 475 raise KeyError(key)

KeyError: 'c'

@kthyng
Copy link
Contributor Author

kthyng commented Mar 31, 2023

@hoxbro Oh I forgot that the target has to be a list.

@kthyng
Copy link
Contributor Author

kthyng commented Mar 31, 2023

I just edited the initial comment so that it is correct now.

@hoxbro hoxbro closed this as completed Apr 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
interface: intake type: bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants