rmagic can't run R's read.csv on data files with NA data #2418

Closed
rhiever opened this Issue Sep 20, 2012 · 4 comments

rhiever commented Sep 20, 2012

Full explanation (with traceback) is here: http://stackoverflow.com/questions/12517135/rpy2-rmagic-cant-read-csv-data-file/12517488

It runs fine with pure rpy2, but if I try to run it with rmagic, it gives me an odd error.

bfroehle commented Sep 21, 2012 (Contributor)

Can you post an example CSV file that reproduces the bug? For example, I cannot reproduce it with the following:

```
In [1]: %load_ext rmagic

In [2]: %R my.data <- read.csv('data.csv')
Out[2]:
array([[1],
       [0],
       [3],
       [4]], dtype=int32)

In [3]: cat data.csv
#a, b, c, d
1,,3,4
```

rhiever commented Sep 21, 2012

Here's a subset that causes the error:

```
replicate,line,genotype,temp,femur,tibia,tarsus,SCT
1,line-1,Dll,25,0.590334,0.4991572,0.2189781,9
1,line-1,Dll,25,0.5504164,0.5007439,0.2136691,13
1,line-1,Dll,25,0.588486,0.4879058,0.2105431,11
1,line-1,Dll,25,0.5882244,0.5148501,0.2105431,
1,line-2,Dll,25,,0.489045,0.2025757,12
```
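The missing values in this subset are the empty fields: `SCT` in the fourth data row and `femur` in the fifth. R's `read.csv` treats blank fields in numeric columns as `NA`. A minimal sketch using only Python's standard library lists them; the subset is inlined instead of read from `data.csv`, and `find_empty_fields` is a hypothetical helper, not part of rpy2 or rmagic:

```python
import csv
import io

# The subset posted above, inlined so the sketch runs on its own.
DATA = """replicate,line,genotype,temp,femur,tibia,tarsus,SCT
1,line-1,Dll,25,0.590334,0.4991572,0.2189781,9
1,line-1,Dll,25,0.5504164,0.5007439,0.2136691,13
1,line-1,Dll,25,0.588486,0.4879058,0.2105431,11
1,line-1,Dll,25,0.5882244,0.5148501,0.2105431,
1,line-2,Dll,25,,0.489045,0.2025757,12
"""


def find_empty_fields(text):
    """Return (data_row, column) pairs for blank fields, which read.csv reads as NA."""
    reader = csv.DictReader(io.StringIO(text))
    missing = []
    for row_number, row in enumerate(reader, start=1):
        for column, value in row.items():
            # A short row yields None for absent trailing fields; treat it as blank too.
            if value is None or value.strip() == "":
                missing.append((row_number, column))
    return missing


print(find_empty_fields(DATA))  # [(4, 'SCT'), (5, 'femur')]
```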

bfroehle commented Sep 21, 2012 (Contributor)

Okay, thanks. Here's how to reproduce this without IPython:

```
>>> import rpy2.robjects as ro
>>> import numpy as np
>>> myData = ro.r['read.csv']('data.csv')
>>> np.asarray(myData)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/dist-packages/numpy/core/numeric.py", line 235, in asarray
    return array(a, dtype, copy=False, order=order)
TypeError: __float__ returned non-float (type rpy2.rinterface.NAIntegerType)
```
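The traceback suggests that numpy calls `__float__` on the NA element and gets back something that is not a float. The pure-Python sketch below reproduces that symptom with a hypothetical stand-in class (`FakeNAInteger` is not rpy2's `NAIntegerType`, and `to_floats` only approximates what the failing `asarray` call attempts) and shows one workaround: map NA to `nan` before converting the rest.

```python
import math


class FakeNAInteger:
    """Hypothetical stand-in for an NA singleton (NOT rpy2's real class).

    It mimics only the symptom in the traceback: __float__ returns a
    non-float, so Python's float() raises TypeError.
    """

    def __float__(self):
        return self


def to_floats(values):
    """Naive element-wise conversion; fails as soon as it meets the NA stand-in."""
    return [float(v) for v in values]


def to_floats_na_aware(values):
    """Map the NA stand-in to nan, then convert everything else normally."""
    return [math.nan if isinstance(v, FakeNAInteger) else float(v) for v in values]


row = [9, 13, FakeNAInteger()]
try:
    to_floats(row)
except TypeError as exc:
    print("conversion failed:", exc)

print(to_floats_na_aware(row))  # [9.0, 13.0, nan]
```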

In your case, you'll need to interpret the result as a data frame: `-d myData` pulls `myData` back into IPython as a structured array. Note the trailing semicolon, which is required so that IPython doesn't try to convert myData into a regular array.

```
In [1]: %load_ext rmagic

In [2]: %R -d myData myData <- read.csv('data.csv');

In [3]: myData
Out[3]:
array([(1, 1, 1, 25, 0.590334, 0.4991572, 0.2189781, 9),
       (1, 1, 1, 25, 0.5504164, 0.5007439, 0.2136691, 13),
       (1, 1, 1, 25, 0.588486, 0.4879058, 0.2105431, 11),
       (1, 1, 1, 25, 0.5882244, 0.5148501, 0.2105431, -2147483648),
       (1, 2, 1, 25, nan, 0.489045, 0.2025757, 12)],
      dtype=[('replicate', '<i4'), ('line', '<i4'), ('genotype', '<i4'), ('temp', '<i4'), ('femur', '<f8'), ('tibia', '<f8'), ('tarsus', '<f8'), ('SCT', '<i4')])
```

Alternatively, instead of the trailing semicolon, add the `-n` flag: `%R -nd myData myData <- read.csv('data.csv')`

rhiever commented Sep 21, 2012

Perfect. Thank you!

rhiever closed this Sep 21, 2012

minrk added this to the no action milestone Mar 26, 2014
