
[BUG] dask_cudf.to_parquet fails with "AttributeError: 'GenericIndex' object has no attribute 'get_level_values'" #3365

Closed
randerzander opened this issue Nov 12, 2019 · 2 comments · Fixed by #3369
Assignees
Labels
bug (Something isn't working) · cuIO (cuIO issue) · dask (Dask issue)

Comments

@randerzander
Contributor

Using rapidsai nightly conda packages and Dask from GitHub master:

import cudf
import dask_cudf

df = cudf.DataFrame()
df['id'] = [0, 1, 2]
df['val'] = [0, 1, 2]

ddf = dask_cudf.from_cudf(df, npartitions=2)
ddf.to_parquet('test')

Result:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-10-9e4ebf82214b> in <module>
----> 1 ddf.to_parquet('test2')

/conda/envs/rapids/lib/python3.7/site-packages/dask/dataframe/core.py in to_parquet(self, path, *args, **kwargs)
   3643         from .io import to_parquet
   3644 
-> 3645         return to_parquet(self, path, *args, **kwargs)
   3646 
   3647     @derived_from(pd.DataFrame)

/conda/envs/rapids/lib/python3.7/site-packages/dask/dataframe/io/parquet/core.py in to_parquet(df, path, engine, compression, write_index, append, ignore_divisions, partition_on, storage_options, write_metadata_file, compute, **kwargs)
    532 
    533     if compute:
--> 534         out = out.compute()
    535     return out
    536 

/conda/envs/rapids/lib/python3.7/site-packages/dask/base.py in compute(self, **kwargs)
    163         dask.base.compute
    164         """
--> 165         (result,) = compute(self, traverse=False, **kwargs)
    166         return result
    167 

/conda/envs/rapids/lib/python3.7/site-packages/dask/base.py in compute(*args, **kwargs)
    434     keys = [x.__dask_keys__() for x in collections]
    435     postcomputes = [x.__dask_postcompute__() for x in collections]
--> 436     results = schedule(dsk, keys, **kwargs)
    437     return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
    438 

/conda/envs/rapids/lib/python3.7/site-packages/distributed/client.py in get(self, dsk, keys, restrictions, loose_restrictions, resources, sync, asynchronous, direct, retries, priority, fifo_timeout, actors, **kwargs)
   2574                     should_rejoin = False
   2575             try:
-> 2576                 results = self.gather(packed, asynchronous=asynchronous, direct=direct)
   2577             finally:
   2578                 for f in futures.values():

/conda/envs/rapids/lib/python3.7/site-packages/distributed/client.py in gather(self, futures, errors, direct, asynchronous)
   1874                 direct=direct,
   1875                 local_worker=local_worker,
-> 1876                 asynchronous=asynchronous,
   1877             )
   1878 

/conda/envs/rapids/lib/python3.7/site-packages/distributed/client.py in sync(self, func, asynchronous, callback_timeout, *args, **kwargs)
    769         else:
    770             return sync(
--> 771                 self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
    772             )
    773 

/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py in sync(loop, func, callback_timeout, *args, **kwargs)
    332     if error[0]:
    333         typ, exc, tb = error[0]
--> 334         raise exc.with_traceback(tb)
    335     else:
    336         return result[0]

/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py in f()
    316             if callback_timeout is not None:
    317                 future = gen.with_timeout(timedelta(seconds=callback_timeout), future)
--> 318             result[0] = yield future
    319         except Exception as exc:
    320             error[0] = sys.exc_info()

/conda/envs/rapids/lib/python3.7/site-packages/tornado/gen.py in run(self)
    733 
    734                     try:
--> 735                         value = future.result()
    736                     except Exception:
    737                         exc_info = sys.exc_info()

/conda/envs/rapids/lib/python3.7/site-packages/distributed/client.py in _gather(self, futures, errors, direct, local_worker)
   1730                             exc = CancelledError(key)
   1731                         else:
-> 1732                             raise exception.with_traceback(traceback)
   1733                         raise exc
   1734                     if errors == "skip":

/conda/envs/rapids/lib/python3.7/site-packages/dask/utils.py in apply()
     27 def apply(func, args, kwargs=None):
     28     if kwargs:
---> 29         return func(*args, **kwargs)
     30     else:
     31         return func(*args)

/conda/envs/rapids/lib/python3.7/site-packages/dask/dataframe/io/parquet/arrow.py in write_partition()
    462             df = df.set_index(index_cols)
    463             preserve_index = True
--> 464         t = pa.Table.from_pandas(df, preserve_index=preserve_index, schema=schema)
    465         if partition_on:
    466             pq.write_to_dataset(

/conda/envs/rapids/lib/python3.7/site-packages/pyarrow/table.pxi in pyarrow.lib.Table.from_pandas()
   1055         """
   1056         from pyarrow.pandas_compat import dataframe_to_arrays
-> 1057         arrays, schema = dataframe_to_arrays(
   1058             df,
   1059             schema=schema,

/conda/envs/rapids/lib/python3.7/site-packages/pyarrow/pandas_compat.py in dataframe_to_arrays()
    515      columns_to_convert,
    516      convert_fields) = _get_columns_to_convert(df, schema, preserve_index,
--> 517                                                columns)
    518 
    519     # NOTE(wesm): If nthreads=None, then we use a heuristic to decide whether

/conda/envs/rapids/lib/python3.7/site-packages/pyarrow/pandas_compat.py in _get_columns_to_convert()
    340 
    341     index_levels = (
--> 342         _get_index_level_values(df.index) if preserve_index is not False
    343         else []
    344     )

/conda/envs/rapids/lib/python3.7/site-packages/pyarrow/pandas_compat.py in _get_index_level_values()
    461 def _get_index_level_values(index):
    462     n = len(getattr(index, 'levels', [index]))
--> 463     return [index.get_level_values(i) for i in range(n)]
    464 
    465 

/conda/envs/rapids/lib/python3.7/site-packages/pyarrow/pandas_compat.py in <listcomp>()
    461 def _get_index_level_values(index):
    462     n = len(getattr(index, 'levels', [index]))
--> 463     return [index.get_level_values(i) for i in range(n)]
    464 
    465 

AttributeError: 'GenericIndex' object has no attribute 'get_level_values'
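To make the failure mode concrete, here is a minimal pure-Python sketch (not from the thread; no cudf or pyarrow required, and `PandasLikeIndex` / `GenericIndexLike` are hypothetical stand-ins) of the pyarrow helper at the bottom of the traceback. pyarrow treats any index without a `levels` attribute as single-level and then calls `get_level_values(i)` on it; pandas indexes implement that method, but cudf's `GenericIndex` did not at the time, hence the AttributeError:

```python
# Mirror of pyarrow's _get_index_level_values from the traceback above.
def get_index_level_values(index):
    n = len(getattr(index, 'levels', [index]))
    return [index.get_level_values(i) for i in range(n)]


class PandasLikeIndex:
    """Stand-in for a pandas Index, which implements get_level_values."""
    def get_level_values(self, i):
        return f"level-{i}"


class GenericIndexLike:
    """Stand-in for cudf's GenericIndex circa 0.11: no get_level_values."""
    pass


print(get_index_level_values(PandasLikeIndex()))   # single-level list

try:
    get_index_level_values(GenericIndexLike())
except AttributeError as e:
    print("AttributeError:", e)                    # same failure as the issue
```

This is why the error only surfaces once dask's parquet writer decides to preserve the index and hands the cudf DataFrame to `pa.Table.from_pandas`.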
@randerzander added the bug, dask, and cuIO labels on Nov 12, 2019
@rjzamora
Member

Yikes - thanks for raising this! I will start looking into a fix right now.

@beckernick
Member

@randerzander, this is a continuation of #2637, which now causes this error.
