GDFError: GDF_UNSUPPORTED_DTYPE with std() function #7

Closed
michael-balint opened this issue Jun 29, 2017 · 1 comment

@michael-balint
Contributor

When running `notebooks/mapd_to_pygdf_to_h2oaiglm.ipynb`, an error occurs during step 23:

```python
import numpy as np

# df, num_cols and response_set are defined in earlier notebook cells
for k in (num_cols - response_set):
    df[k] = df[k].fillna(df[k].mean())
    assert df[k].null_count == 0
    std = df[k].std()
    # drop near constant columns
    if not np.isfinite(std) or std < 1e-4:
        del df[k]
        print('drop near constant', k)
    else:
        df[k] = df[k].scale()
```

Error output:

```
---------------------------------------------------------------------------
GDFError                                  Traceback (most recent call last)
<ipython-input-26-43006e4ffe8b> in <module>()
      2     df[k] = df[k].fillna(df[k].mean())
      3     assert df[k].null_count == 0
----> 4     std = df[k].std()
      5     # drop near constant columns
      6     if not np.isfinite(std) or std < 1e-4:

/home/appuser/pygdf/pygdf/dataframe.py in std(self)
   1074         """Compute the standard deviation of the series
   1075         """
-> 1076         return np.sqrt(self.var())
   1077 
   1078     def var(self):

/home/appuser/pygdf/pygdf/dataframe.py in var(self)
   1079         """Compute the variance of the series
   1080         """
-> 1081         mu, var = self.mean_var()
   1082         return var
   1083 

/home/appuser/pygdf/pygdf/dataframe.py in mean_var(self)
   1085         """Compute mean and variance at the same time.
   1086         """
-> 1087         mu, var = self._impl.stats(self).mean_var()
   1088         return mu, var
   1089 

/home/appuser/pygdf/pygdf/numerical.py in mean_var(self)
    130         mu = self.mean()
    131         n = len(self._series)
--> 132         asum = _gdf.apply_reduce(libgdf.gdf_sum_squared_generic, self._series)
    133         var = asum / n - mu ** 2
    134         return mu, var

/home/appuser/pygdf/pygdf/_gdf.py in apply_reduce(fn, inp)
     82     out = cuda.device_array(outsz, dtype=inp.dtype)
     83     # call reduction
---> 84     fn(inp._cffi_view, unwrap_devary(out), outsz)
     85     # return 1st element
     86     return out[0]

/home/appuser/Miniconda3/envs/pycudf_notebook_py35/lib/python3.5/site-packages/libgdf_cffi/wrapper.py in wrap(*args)
     26                         raw = self._api.gdf_error_get_name(errcode)
     27                         errname = self._ffi.string(raw).decode('ascii')
---> 28                         raise GDFError(errcode, errname)
     29 
     30                 wrap.__name__ = fn.__name__

GDFError: GDF_UNSUPPORTED_DTYPE
```

For context: `df` is a `pygdf.dataframe.DataFrame`, `df[k]` is a `pygdf.dataframe.Series`, and `df[k][0]` is a `numpy.int32`, so the column being reduced has an integer dtype.
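Reading the traceback, `std()` reduces to `mean_var()` in `pygdf/numerical.py`, which computes the population variance (dividing by $n$, not $n-1$) from a sum of squares, with $\mu$ the mean of the series:

$$
\mu = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \mu^2, \qquad
\sigma = \sqrt{\sigma^2}
$$

The $\sum x_i^2$ term is the reduction `libgdf.gdf_sum_squared_generic`, and `apply_reduce` allocates its output buffer with `dtype=inp.dtype`. For this `int32` column the libgdf kernel is therefore called with integer data, and that call is the one that raises `GDF_UNSUPPORTED_DTYPE`.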

@sklam
Collaborator

sklam commented Jun 29, 2017

Some routines have been converted from JIT-compiled versions into statically compiled versions in libgdf. The error is raised when an operation does not support the input dtype; in this case either a typecast or a type specialization is missing.

sklam added a commit to sklam/pygdf that referenced this issue on Jun 29, 2017:

> This introduces an inefficient typecast when the Series dtype is integral.
> TODO: add a specialized version to avoid the extra .astype().
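A CPU-only sketch of the failure mode and the typecast workaround discussed in this thread, using NumPy in place of libgdf. Everything here is hypothetical: `_SUM_SQUARED_KERNELS`, `UnsupportedDtypeError`, `sum_squared`, `mean_var` and `std` are illustrative stand-ins, not pygdf or libgdf APIs, and the cast target (`float64`) is an assumption, since the commit message does not say which type integral Series are cast to. The dispatch table only has floating-point "specializations", so integer input fails the way `GDF_UNSUPPORTED_DTYPE` does unless it is cast first.

```python
import numpy as np

# Hypothetical stand-ins for statically compiled kernels: only float
# specializations exist, so any other dtype has no kernel to dispatch to.
_SUM_SQUARED_KERNELS = {
    np.dtype(np.float32): lambda x: np.sum(np.square(x), dtype=np.float32),
    np.dtype(np.float64): lambda x: np.sum(np.square(x), dtype=np.float64),
}


class UnsupportedDtypeError(TypeError):
    """Illustrative stand-in for GDFError(GDF_UNSUPPORTED_DTYPE)."""


def sum_squared(x):
    """Dispatch on dtype; fail if no specialized kernel is registered."""
    try:
        kernel = _SUM_SQUARED_KERNELS[x.dtype]
    except KeyError:
        raise UnsupportedDtypeError(f"no kernel for dtype {x.dtype}") from None
    return kernel(x)


def mean_var(x):
    """Population mean and variance as sum(x**2)/n - mu**2."""
    # Workaround in the spirit of the commit: cast integral input to a
    # float type (float64 is an assumption) before reducing. This makes
    # an extra copy, which is what a specialized integer kernel would avoid.
    if np.issubdtype(x.dtype, np.integer):
        x = x.astype(np.float64)
    n = len(x)
    mu = x.mean()
    var = sum_squared(x) / n - mu ** 2
    return mu, var


def std(x):
    """Population standard deviation."""
    return np.sqrt(mean_var(x)[1])
```

Calling `sum_squared` directly on an `int32` array raises `UnsupportedDtypeError`, while `std(np.array([1, 2, 3, 4], dtype=np.int32))` succeeds because `mean_var` casts first: the mean is 2.5 and the variance is 30/4 - 2.5² = 1.25. The alternative the TODO points at would register an integer kernel in the table instead of paying for the `.astype()` copy.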