New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: add dtype= option to python parser #14295

Merged
merged 24 commits into from Nov 26, 2016

Conversation

Projects
None yet
5 participants
@chris-b1
Contributor

chris-b1 commented Sep 24, 2016

  • part of #12686
  • tests added / passed
  • passes git diff upstream/master | flake8 --diff
  • whatsnew entry

Ultimately I'm working towards #8212 (types in excel parser), which should be pretty straightforward after this.

Right now the tests are moved from c_parser_only.py, may need to add some more
cc @gfyoung

@codecov-io

This comment has been minimized.

Show comment
Hide comment
@codecov-io

codecov-io Sep 25, 2016

Current coverage is 85.22% (diff: 100%)

Merging #14295 into master will increase coverage by 0.01%

@@             master     #14295   diff @@
==========================================
  Files           143        143          
  Lines         50804      50833    +29   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43292      43324    +32   
+ Misses         7512       7509     -3   
  Partials          0          0          

Powered by Codecov. Last update 75b606a...3abb0bd

codecov-io commented Sep 25, 2016

Current coverage is 85.22% (diff: 100%)

Merging #14295 into master will increase coverage by 0.01%

@@             master     #14295   diff @@
==========================================
  Files           143        143          
  Lines         50804      50833    +29   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43292      43324    +32   
+ Misses         7512       7509     -3   
  Partials          0          0          

Powered by Codecov. Last update 75b606a...3abb0bd

result[c] = cvals
if verbose and na_count:
print('Filled %d NA values in column %s' % (na_count, str(c)))
return result
def _convert_types(self, values, na_values, try_num_bool=True):
def _infer_types(self, values, na_values, try_num_bool=True):

This comment has been minimized.

@jorisvandenbossche

jorisvandenbossche Oct 27, 2016

Member

While you are at it, can you add a docstring here?

@jorisvandenbossche

jorisvandenbossche Oct 27, 2016

Member

While you are at it, can you add a docstring here?

Show outdated Hide outdated pandas/io/parsers.py
not interpret dtype.
Use `str` or `object` to preserve and not interpret dtype.
If converters are specified, they will be applied AFTER
dtype conversion.

This comment has been minimized.

@jorisvandenbossche

jorisvandenbossche Oct 27, 2016

Member

Is there a test that exercises this assumption?

@jorisvandenbossche

jorisvandenbossche Oct 27, 2016

Member

Is there a test that exercises this assumption?

This comment has been minimized.

@chris-b1

chris-b1 Oct 30, 2016

Contributor

I was wrong, the c-parser actually ignores the dtype and just uses the converter - changed docstring and added matching test.

@chris-b1

chris-b1 Oct 30, 2016

Contributor

I was wrong, the c-parser actually ignores the dtype and just uses the converter - changed docstring and added matching test.

@jorisvandenbossche

Looks nice to me! (added two minor comments)

Show outdated Hide outdated pandas/io/parsers.py
values, conv_f, na_values,
col_na_values, col_na_fvalues)
else:
try_num_bool = True

This comment has been minimized.

@gfyoung

gfyoung Oct 27, 2016

Member

Why not just this:

try_num_bool = cast_type and is_object_dtype(cast_type)
@gfyoung

gfyoung Oct 27, 2016

Member

Why not just this:

try_num_bool = cast_type and is_object_dtype(cast_type)
Show outdated Hide outdated pandas/io/parsers.py
@@ -1355,6 +1392,23 @@ def _convert_types(self, values, na_values, try_num_bool=True):
return result, na_count
def _cast_types(self, values, cast_type, column):
""" cast column to type specified in dtypes= param """

This comment has been minimized.

@gfyoung

gfyoung Oct 27, 2016

Member

While I know this isn't done too much for internal functions, it would be good to start documenting these functions more thoroughly for development purposes.

@gfyoung

gfyoung Oct 27, 2016

Member

While I know this isn't done too much for internal functions, it would be good to start documenting these functions more thoroughly for development purposes.

This comment has been minimized.

@jorisvandenbossche

jorisvandenbossche Oct 27, 2016

Member

While I know this isn't done too much for internal functions,

But we should do it more! (therefore my comment above) It makes familiarizing yourself with a module a lot easier.
BTW, a PR to go through a file and document some of the functions (eg things you haven been working on in recent PRs and so you know better) is always very welcome :-)

@jorisvandenbossche

jorisvandenbossche Oct 27, 2016

Member

While I know this isn't done too much for internal functions,

But we should do it more! (therefore my comment above) It makes familiarizing yourself with a module a lot easier.
BTW, a PR to go through a file and document some of the functions (eg things you haven been working on in recent PRs and so you know better) is always very welcome :-)

@jorisvandenbossche

This comment has been minimized.

Show comment
Hide comment
@jorisvandenbossche

jorisvandenbossche Nov 2, 2016

Member

@chris-b1 xref #14558. You say that dtype is ignored when converters is specified. In that case it would be nice to raise a warning about this if both are specified I think.

Member

jorisvandenbossche commented Nov 2, 2016

@chris-b1 xref #14558. You say that dtype is ignored when converters is specified. In that case it would be nice to raise a warning about this if both are specified I think.

@jorisvandenbossche

Two small comments, looks good to me!

Show outdated Hide outdated doc/source/whatsnew/v0.20.0.txt
.. ipython:: python
from io import StringIO

This comment has been minimized.

@jorisvandenbossche

jorisvandenbossche Nov 7, 2016

Member

this will not work on python 2. Should check how it is done in other places in the docs

@jorisvandenbossche

jorisvandenbossche Nov 7, 2016

Member

this will not work on python 2. Should check how it is done in other places in the docs

This comment has been minimized.

@jorisvandenbossche

jorisvandenbossche Nov 7, 2016

Member

from pandas.compat import StringIO is imported in a suppressed ipython code block at the top of the file, so you can just use StringIO

@jorisvandenbossche

jorisvandenbossche Nov 7, 2016

Member

from pandas.compat import StringIO is imported in a suppressed ipython code block at the top of the file, so you can just use StringIO

This comment has been minimized.

@chris-b1

chris-b1 Nov 13, 2016

Contributor

For the record, the io module was backported to 2.6, but I will remove this, as it is already imported like you mentioned.

@chris-b1

chris-b1 Nov 13, 2016

Contributor

For the record, the io module was backported to 2.6, but I will remove this, as it is already imported like you mentioned.

This comment has been minimized.

@jorisvandenbossche

jorisvandenbossche Nov 14, 2016

Member

@chris-b1 That's true, but then the input must be unicode strings, not plain python 2 strings (which I think is not the case in the docs, but not sure)

@jorisvandenbossche

jorisvandenbossche Nov 14, 2016

Member

@chris-b1 That's true, but then the input must be unicode strings, not plain python 2 strings (which I think is not the case in the docs, but not sure)

Show outdated Hide outdated pandas/io/parsers.py
If converters are specified, they will be applied INSTEAD
of dtype conversion.
.. versionadded:: 0.20.0 support for the Python parser.

This comment has been minimized.

@jorisvandenbossche

jorisvandenbossche Nov 7, 2016

Member

I personally would leave this out of the docstring (this is already so long ..), but not strong feelings, your take

@jorisvandenbossche

jorisvandenbossche Nov 7, 2016

Member

I personally would leave this out of the docstring (this is already so long ..), but not strong feelings, your take

@jorisvandenbossche jorisvandenbossche added this to the 0.20.0 milestone Nov 11, 2016

@chris-b1 chris-b1 changed the title from API: add dtype= option to python parser (WIP) to API: add dtype= option to python parser Nov 13, 2016

@chris-b1

This comment has been minimized.

Show comment
Hide comment
@chris-b1

chris-b1 Nov 13, 2016

Contributor

I think I've got all the feedback so far worked in - any more comments? @jorisvandenbossche, @gfyoung, @jreback

Contributor

chris-b1 commented Nov 13, 2016

I think I've got all the feedback so far worked in - any more comments? @jorisvandenbossche, @gfyoung, @jreback

@gfyoung

LGTM!

@jorisvandenbossche

Looks good to me as well

Show outdated Hide outdated doc/source/io.rst
Specifying ``dtype`` with ``engine`` other than 'c' raises a
``ValueError``.
.. versionadded:: 0.20.0 support for the Python parser.
The ``dtype`` option is supported by the 'python' engine

This comment has been minimized.

@jreback

jreback Nov 15, 2016

Contributor

I think you need a blank line here (to avoid warnings)

@jreback

jreback Nov 15, 2016

Contributor

I think you need a blank line here (to avoid warnings)

Show outdated Hide outdated doc/source/whatsnew/v0.20.0.txt
@@ -32,6 +32,14 @@ Other enhancements
- ``pd.read_excel`` now preserves sheet order when using ``sheetname=None`` (:issue:`9930`)

This comment has been minimized.

@jreback

jreback Nov 15, 2016

Contributor

i would make a sub-section

@jreback

jreback Nov 15, 2016

Contributor

i would make a sub-section

if conv:
if col_dtype is not None:
warnings.warn(("Both a converter and dtype were specified "

This comment has been minimized.

@jreback

jreback Nov 15, 2016

Contributor

is there a test for this? (IOW, this was prob tested for c-engine, but now need to test for all)

@jreback

jreback Nov 15, 2016

Contributor

is there a test for this? (IOW, this was prob tested for c-engine, but now need to test for all)

This comment has been minimized.

@jorisvandenbossche

jorisvandenbossche Nov 22, 2016

Member

yes, this was added here:e0d1606

@jorisvandenbossche

jorisvandenbossche Nov 22, 2016

Member

yes, this was added here:e0d1606

@gfyoung

This comment has been minimized.

Show comment
Hide comment
@gfyoung

gfyoung Nov 22, 2016

Member

@chris-b1 : I'm putting together a PR to patch #14712. As this directly relates to your PR, I would suggest waiting for mine to make sure the bug isn't an issue for you.

Member

gfyoung commented Nov 22, 2016

@chris-b1 : I'm putting together a PR to patch #14712. As this directly relates to your PR, I would suggest waiting for mine to make sure the bug isn't an issue for you.

@jorisvandenbossche

This comment has been minimized.

Show comment
Hide comment
@jorisvandenbossche

jorisvandenbossche Nov 22, 2016

Member

@chris-b1 can you rebase this? Then this can be merged I think (depending on the status of @gfyoung new PR)

Member

jorisvandenbossche commented Nov 22, 2016

@chris-b1 can you rebase this? Then this can be merged I think (depending on the status of @gfyoung new PR)

@chris-b1

This comment has been minimized.

Show comment
Hide comment
@chris-b1

chris-b1 Nov 23, 2016

Contributor

Finally got back to this, rebased and updated docs - assuming tests still pass this should be good. Will have to check, but I don't think this will conflict with #14717.

Contributor

chris-b1 commented Nov 23, 2016

Finally got back to this, rebased and updated docs - assuming tests still pass this should be good. Will have to check, but I don't think this will conflict with #14717.

@jorisvandenbossche jorisvandenbossche modified the milestones: 0.19.2, 0.20.0 Nov 24, 2016

@jorisvandenbossche

This comment has been minimized.

Show comment
Hide comment
@jorisvandenbossche

jorisvandenbossche Nov 24, 2016

Member

@chris-b1 I merged #14717. This added a dtype test to c_parser_only, so added a commit to move this to the general dtype tests as well. Feel free to merge if you're ok with this and the tests are green.

Member

jorisvandenbossche commented Nov 24, 2016

@chris-b1 I merged #14717. This added a dtype test to c_parser_only, so added a commit to move this to the general dtype tests as well. Feel free to merge if you're ok with this and the tests are green.

Show outdated Hide outdated doc/source/whatsnew/v0.20.0.txt
The ``dtype`` keyword argument in the :func:`read_csv` function for specifying the types of parsed columns
is now supported with the ``'python'`` engine. See the :ref:`io docs <io.dtypes>` for more information.

This comment has been minimized.

@jreback

jreback Nov 25, 2016

Contributor

add the issue references here

@jreback

jreback Nov 25, 2016

Contributor

add the issue references here

chris-b1 and others added some commits Nov 25, 2016

Merge branch 'textreader-dtype' of https://github.com/chris-b1/pandas
…into textreader-dtype

Conflicts:
	pandas/io/tests/parser/dtypes.py

@jorisvandenbossche jorisvandenbossche merged commit 75bb530 into pandas-dev:master Nov 26, 2016

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@jorisvandenbossche

This comment has been minimized.

Show comment
Hide comment
@jorisvandenbossche
Member

jorisvandenbossche commented Nov 26, 2016

@chris-b1 Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment