Adding column of floats to DataFrame yields TypeError #7366

fonnesbeck · 2014-06-06T02:36:11Z

I have a function that builds a DataFrame of summary statistics. I have used it routinely for several months, but it now breaks, apparently due to a change in pandas over the past few weeks. Specifically, I have the following list of floats:

(Pdb) [r[0] for r in ratios]
[1.1200000000000001, 5.0, 0.73999999999999999, 0.35999999999999999,  
1.1100000000000001, 1.1699999999999999, 0.92000000000000004,  0.94999999999999996, 1.0600000000000001, 0.77000000000000002,  
0.59999999999999998, 2.0099999999999998, 3.2999999999999998, 0.37,  
1.6100000000000001, 1.02]

Which I use to create a column in the following table:

(Pdb) table
oxygen                0     1
male               0.57  0.59
under 2 months     0.06  0.23
2-11 months        0.66  0.59
12-23 months       0.23  0.10
Jordanian          0.90  0.91
Palestinian        0.05  0.06
vitamin D < 20     0.55  0.53
vitamin D < 11     0.40  0.38
prev_cond          0.11  0.11
heart_hx           0.05  0.04
breastfed          0.68  0.56
premature          0.13  0.23
adm_pneumo         0.09  0.25
adm_bronchopneumo  0.52  0.28
adm_sepsis         0.11  0.16
adm_bronchiolitis  0.21  0.21

However, this now causes the following:

(Pdb) table['foo'] = [r[0] for r in ratios]
*** TypeError: Not implemented for this type

Here is a more verbose output:

TypeError                                 Traceback (most recent call last)
<ipython-input-49-0723b2a631c0> in <module>()
----> 1 make_table(groupby_o2, table_vars=table_vars, replace_dict={0.0: 'No Oxygen', 1.0: 'Oxygen'})

<ipython-input-47-6f3ebc37c721> in make_table(groupby, table_vars, replace_dict)
      3     ratios = [calc_or(groupby, v) for v in table.index]
      4     import pdb; pdb.set_trace()
----> 5     table['OR'] = [r[0] for r in ratios]
      6     table['Interval'] = [r[1] for r in ratios]
      7     table['N'] = [r[2] for r in ratios]

/usr/local/lib/python2.7/site-packages/pandas/core/frame.pyc in __setitem__(self, key, value)
   1899         else:
   1900             # set column
-> 1901             self._set_item(key, value)
   1902 
   1903     def _setitem_slice(self, key, value):

/usr/local/lib/python2.7/site-packages/pandas/core/frame.pyc in _set_item(self, key, value)
   1982         self._ensure_valid_index(value)
   1983         value = self._sanitize_column(key, value)
-> 1984         NDFrame._set_item(self, key, value)
   1985 
   1986         # check if we are modifying a copy

/usr/local/lib/python2.7/site-packages/pandas/core/generic.pyc in _set_item(self, key, value)
   1137 
   1138     def _set_item(self, key, value):
-> 1139         self._data.set(key, value)
   1140         self._clear_item_cache()
   1141 

/usr/local/lib/python2.7/site-packages/pandas/core/internals.pyc in set(self, item, value, check)
   2637 
   2638         try:
-> 2639             loc = self.items.get_loc(item)
   2640         except KeyError:
   2641             # This item wasn't present, just insert at end

/usr/local/lib/python2.7/site-packages/pandas/core/index.pyc in get_loc(self, key)
   2055 
   2056     def get_loc(self, key):
-> 2057         if np.isnan(key):
   2058             try:
   2059                 return self._nan_idxs.item()

TypeError: Not implemented for this type

Not exactly sure which change caused it, but this code was working on the same data 3-4 weeks ago.

Currently running 0.13.1-936-g592a537 on OS X 10.9.3, Python 2.7.6 from Homebrew.


jreback · 2014-06-06T02:46:01Z

what version of pandas? can you pickle the data (the table and what you are adding) and post a link?

cpcloud · 2014-06-06T02:50:55Z

@fonnesbeck are your columns float columns? Can you post the output of df.columns?

fonnesbeck · 2014-06-06T02:51:49Z

Sorry. Updated info above.

fonnesbeck · 2014-06-06T02:53:04Z

Yes, the table columns are floats:

(Pdb) table.dtypes
oxygen
0         float64
1         float64
dtype: object

(Pdb) table.columns
Float64Index([0.0, 1.0], dtype='float64')

jreback · 2014-06-06T02:53:27Z

that's a pretty old version

what changed in your setup?

did u try 0.14.0?

cpcloud · 2014-06-06T02:54:42Z

i think i know what's going on ... i'm guilty of that isnan line 😞

fonnesbeck · 2014-06-06T02:55:13Z

I'm just updating now from master. Thanks for the prompt response, as usual.

jreback · 2014-06-06T02:55:17Z

you have float columns (the index)
can't add a string to those I don't think
that gets a weird mixed index

cpcloud · 2014-06-06T02:56:00Z

yep ... can repro with this:

In [6]: df = DataFrame({0.0: rand(10), 1.0: rand(10)})

In [7]: df['a'] = 10

i think this should work since it works with int
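
A self-contained version of the reproduction above, as a sketch: it assumes the usual `np`/`pd` aliases in place of the pylab-style `rand` and bare `DataFrame`, and the names `isnan_failed` and `columns_after` are only illustrative. On a build that contains the fix the assignment should succeed; on the affected builds it raises the TypeError shown in the traceback.

```python
import numpy as np
import pandas as pd

# Root cause seen in the traceback: np.isnan rejects a non-numeric key
# such as a string column label.
try:
    np.isnan("a")
    isnan_failed = False
except TypeError:
    isnan_failed = True

# DataFrame whose column labels are floats (so the columns form a float index).
df = pd.DataFrame({0.0: np.random.rand(10), 1.0: np.random.rand(10)})

# Adding a string-labelled column is the step that failed on affected builds.
df["a"] = 10
columns_after = list(df.columns)
print(isnan_failed, columns_after)
```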

cpcloud · 2014-06-06T02:56:43Z

fix on the way

jreback · 2014-06-06T02:59:44Z

needs to coerce back to Index I think

cpcloud · 2014-06-06T03:00:47Z

no just need to catch the TypeError and pass to the superclass
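
The "catch the TypeError and pass to the superclass" idea can be illustrated without pandas. The classes below are hypothetical stand-ins, not the actual pandas code or the eventual patch; `math.isnan` plays the role of `np.isnan`, since both raise TypeError when given a string.

```python
import math


class BaseIndex:
    """Hypothetical stand-in for a generic index: plain label lookup."""

    def __init__(self, labels):
        self.labels = list(labels)

    def get_loc(self, key):
        try:
            return self.labels.index(key)
        except ValueError:
            # Mirror the KeyError that callers use to detect a new label.
            raise KeyError(key)


class FloatIndex(BaseIndex):
    """Hypothetical float index with special handling for NaN keys."""

    def get_loc(self, key):
        try:
            if math.isnan(key):
                # NaN never compares equal to itself, so search explicitly.
                for pos, label in enumerate(self.labels):
                    if isinstance(label, float) and math.isnan(label):
                        return pos
                raise KeyError(key)
        except TypeError:
            # Non-numeric key (e.g. a string): isnan cannot handle it, so
            # skip the NaN branch and fall through to the generic lookup.
            pass
        return super().get_loc(key)
```

With this shape, a string key such as 'OR' ends in a KeyError from the generic lookup, which is what the `set` code in the traceback expects before inserting a new item at the end.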

cpcloud · 2014-06-06T03:03:21Z

thanks @fonnesbeck sorry for breaking your code

fonnesbeck · 2014-06-06T03:04:19Z

No sweat. You guys are awesome.

glyg · 2014-06-19T15:51:05Z

Hey, I stumbled on that bug (I think the pip version suffers from it). I don't really need a float index and, even though I'll update pandas later, I would like to recast my faulty Float64-typed MultiIndex to a good solid integer one.

Is there a good way to do that (apart from the obvious stripping off the float index, casting it to int, and indexing back)?

Thanks,

G.

jreback · 2014-06-19T16:06:25Z

well you can reset_index() to get an integer index

glyg · 2014-06-19T16:16:49Z

Yes, but I wanted to keep the original MultiIndex, only cast from float to int, not drop it altogether

jreback · 2014-06-19T16:23:37Z

df = DataFrame(dict(values = np.arange(5), level_1 = list('aaabb'), level_2 = [1.,2.,3.,1.,2.]))

In [26]: df
Out[26]: 
  level_1  level_2  values
0       a        1       0
1       a        2       1
2       a        3       2
3       b        1       3
4       b        2       4

In [27]: df.set_index(['level_1','level_2'])
Out[27]: 
                 values
level_1 level_2        
a       1             0
        2             1
        3             2
b       1             3
        2             4

In [31]: df.set_index(['level_1','level_2']).index.levels[1]
Out[31]: Float64Index([1.0, 2.0, 3.0], dtype='float64')

Cast it to int (FYI, this truncates any fractional part):

In [32]: df.set_index(['level_1','level_2']).reset_index()
Out[32]: 
  level_1  level_2  values
0       a        1       0
1       a        2       1
2       a        3       2
3       b        1       3
4       b        2       4

In [33]: df2 = df.set_index(['level_1','level_2']).reset_index()

In [34]: df2['level_2'] = df['level_2'].astype('int64')

In [35]: df2.set_index(['level_1','level_2']).index.levels[1]
Out[35]: Int64Index([1, 2, 3], dtype='int64')

In [36]: df2.set_index(['level_1','level_2'])
Out[36]: 
                 values
level_1 level_2        
a       1             0
        2             1
        3             2
b       1             3
        2             4
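
For readers who want to keep the existing MultiIndex rather than rebuild it from columns, a possible alternative is to replace just the float level via `MultiIndex.set_levels`. This is a sketch, not something checked against the 0.13/0.14 builds discussed here; it assumes a pandas version where `set_levels` accepts a `level` argument, and the names `indexed` and `int_level` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(values=np.arange(5),
                       level_1=list('aaabb'),
                       level_2=[1., 2., 3., 1., 2.]))
indexed = df.set_index(['level_1', 'level_2'])

# Cast only the float level to int64 (this truncates any fractional part),
# then put it back into the same MultiIndex without touching the data.
int_level = indexed.index.levels[1].astype('int64')
indexed.index = indexed.index.set_levels(int_level, level=1)

print(indexed.index.levels[1].dtype)
```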

glyg · 2014-06-19T16:27:43Z

Ok that's what I did,
thanks for the quick reply!

cpcloud added the Bug label Jun 6, 2014

cpcloud added this to the 0.14.1 milestone Jun 6, 2014

cpcloud self-assigned this Jun 6, 2014

cpcloud added the Regression label Jun 6, 2014

cpcloud mentioned this issue Jun 6, 2014

BUG/REG: fix float64index -> mixed float assignment #7367

Closed

cpcloud mentioned this issue Jun 6, 2014

BUG/REG: fix float64index -> mixed float assignment #7368

Merged

cpcloud closed this as completed in #7368 Jun 6, 2014

wesm unassigned cpcloud Oct 12, 2016
