This repository has been archived by the owner on Nov 16, 2023. It is now read-only.

Support pathlib in FileDataStream #269

Open
ianlini opened this issue Sep 16, 2019 · 3 comments · May be fixed by #377

ianlini commented Sep 16, 2019

pathlib is a standard-library module that is widely used in Python. Nearly all path-related parameters in the standard library, numpy, and pandas accept path-like objects. It would therefore be better for FileDataStream to support them as well.

Current behavior:

In [1]: from nimbusml import FileDataStream

In [2]: from pathlib import Path

In [3]: test= FileDataStream.read_csv(Path('test.csv'))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-41a5e889c3ff> in <module>
----> 1 test= FileDataStream.read_csv(Path('test.csv'))

~/.pyenv/versions/3.7.4/lib/python3.7/site-packages/nimbusml/internal/utils/utils.py in wrapper(*args, **kwargs)
    218                          '__qualname__',
    219                          func.__name__)))
--> 220             params = func(*args, **kwargs)
    221             if verbose > 0:
    222                 logger_trace.info(

~/.pyenv/versions/3.7.4/lib/python3.7/site-packages/nimbusml/internal/utils/data_stream.py in read_csv(filepath_or_buffer, tool, nrows, **kwargs)
    306         if tool == 'pandas':
    307             return FileDataStream.read_csv_pandas(
--> 308                 filepath_or_buffer, nrows=nrows, **kwargs)
    309         elif tool == 'internal':
    310             if 'schema' not in kwargs:

~/.pyenv/versions/3.7.4/lib/python3.7/site-packages/nimbusml/internal/utils/utils.py in wrapper(*args, **kwargs)
    218                          '__qualname__',
    219                          func.__name__)))
--> 220             params = func(*args, **kwargs)
    221             if verbose > 0:
    222                 logger_trace.info(

~/.pyenv/versions/3.7.4/lib/python3.7/site-packages/nimbusml/internal/utils/data_stream.py in read_csv_pandas(filepath_or_buffer, nrows, collapse, numeric_dtype, **kwargs)
    340         """
    341         schema = DataSchema.read_schema(filepath_or_buffer, collapse=collapse,
--> 342                                         numeric_dtype=numeric_dtype, **kwargs)
    343         return FileDataStream(filepath_or_buffer, schema)
    344

~/.pyenv/versions/3.7.4/lib/python3.7/site-packages/nimbusml/internal/utils/data_schema.py in read_schema(*data, **options)
    855                 raise TypeError(
    856                     "Unable to guess the schema for type '{0}'".format(
--> 857                         type(X)))
    858             final_schema = sch
    859

TypeError: Unable to guess the schema for type '<class 'pathlib.PosixPath'>'

Expected behavior:
FileDataStream.read_csv(Path('test.csv')) is equivalent to FileDataStream.read_csv('test.csv').
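
One way the expected behavior could be achieved is to normalize path-like arguments to plain strings before the schema is inferred. The following is only a minimal sketch of that idea: `normalize_path` is a hypothetical helper and not part of nimbusml, and where it would be called inside `read_csv` or `read_schema` is an assumption. It relies only on the standard-library `os.fspath` and `os.PathLike`.

```python
import os
from pathlib import Path


def normalize_path(filepath_or_buffer):
    """Hypothetical helper (not part of nimbusml): convert path-like
    objects to plain strings and leave every other input untouched."""
    # os.PathLike covers pathlib.Path and any object implementing __fspath__.
    if isinstance(filepath_or_buffer, os.PathLike):
        return os.fspath(filepath_or_buffer)
    # Strings, file buffers and other inputs pass through unchanged.
    return filepath_or_buffer


# Both calls yield the same string, so downstream code that only
# understands str paths would see identical input.
print(normalize_path(Path("test.csv")))
print(normalize_path("test.csv"))
```

Until something like this is supported, passing `str(Path('test.csv'))` should behave like the plain-string call, since the value is then an ordinary string.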

@ganik
Member

ganik commented Sep 18, 2019

Thank you @ianlini, it should be a straightforward change to support this. Would you like to take it on?

@pnshinde

Hi! I’m new to open source and I’d like to take on this task along with #274 over the next couple of weeks. Is that alright?

@ganik
Member

ganik commented Nov 18, 2019

Hi @pnshinde! You are very welcome to take this on! Let me know if you need any help, thanks.

pnshinde linked a pull request on Nov 29, 2019 that will close this issue.