Adding column of floats to DataFrame yields TypeError #7366

Closed
fonnesbeck opened this issue Jun 6, 2014 · 19 comments · Fixed by #7368

Labels: Bug, Indexing (related to indexing on series/frames, not to indexes themselves), Regression (functionality that used to work in a prior pandas version)

@fonnesbeck

I have a function that builds a DataFrame of summary statistics. I have used it routinely for several months, but it now breaks due to a change in pandas over the past few weeks. Specifically, I have the following list of floats:

(Pdb) [r[0] for r in ratios]
[1.1200000000000001, 5.0, 0.73999999999999999, 0.35999999999999999,  
1.1100000000000001, 1.1699999999999999, 0.92000000000000004,  0.94999999999999996, 1.0600000000000001, 0.77000000000000002,  
0.59999999999999998, 2.0099999999999998, 3.2999999999999998, 0.37,  
1.6100000000000001, 1.02]

Which I use to create a column in the following table:

(Pdb) table
oxygen                0     1
male               0.57  0.59
under 2 months     0.06  0.23
2-11 months        0.66  0.59
12-23 months       0.23  0.10
Jordanian          0.90  0.91
Palestinian        0.05  0.06
vitamin D < 20     0.55  0.53
vitamin D < 11     0.40  0.38
prev_cond          0.11  0.11
heart_hx           0.05  0.04
breastfed          0.68  0.56
premature          0.13  0.23
adm_pneumo         0.09  0.25
adm_bronchopneumo  0.52  0.28
adm_sepsis         0.11  0.16
adm_bronchiolitis  0.21  0.21

However, this now causes the following:

(Pdb) table['foo'] = [r[0] for r in ratios]
*** TypeError: Not implemented for this type

Here is a more verbose output:

TypeError                                 Traceback (most recent call last)
<ipython-input-49-0723b2a631c0> in <module>()
----> 1 make_table(groupby_o2, table_vars=table_vars, replace_dict={0.0: 'No Oxygen', 1.0: 'Oxygen'})

<ipython-input-47-6f3ebc37c721> in make_table(groupby, table_vars, replace_dict)
      3     ratios = [calc_or(groupby, v) for v in table.index]
      4     import pdb; pdb.set_trace()
----> 5     table['OR'] = [r[0] for r in ratios]
      6     table['Interval'] = [r[1] for r in ratios]
      7     table['N'] = [r[2] for r in ratios]

/usr/local/lib/python2.7/site-packages/pandas/core/frame.pyc in __setitem__(self, key, value)
   1899         else:
   1900             # set column
-> 1901             self._set_item(key, value)
   1902 
   1903     def _setitem_slice(self, key, value):

/usr/local/lib/python2.7/site-packages/pandas/core/frame.pyc in _set_item(self, key, value)
   1982         self._ensure_valid_index(value)
   1983         value = self._sanitize_column(key, value)
-> 1984         NDFrame._set_item(self, key, value)
   1985 
   1986         # check if we are modifying a copy

/usr/local/lib/python2.7/site-packages/pandas/core/generic.pyc in _set_item(self, key, value)
   1137 
   1138     def _set_item(self, key, value):
-> 1139         self._data.set(key, value)
   1140         self._clear_item_cache()
   1141 

/usr/local/lib/python2.7/site-packages/pandas/core/internals.pyc in set(self, item, value, check)
   2637 
   2638         try:
-> 2639             loc = self.items.get_loc(item)
   2640         except KeyError:
   2641             # This item wasn't present, just insert at end

/usr/local/lib/python2.7/site-packages/pandas/core/index.pyc in get_loc(self, key)
   2055 
   2056     def get_loc(self, key):
-> 2057         if np.isnan(key):
   2058             try:
   2059                 return self._nan_idxs.item()

TypeError: Not implemented for this type

Not exactly sure which change caused it, but this code was working on the same data 3-4 weeks ago.

Currently running 0.13.1-936-g592a537 on OS X 10.9.3, Python 2.7.6 from Homebrew.
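
The last frame of the traceback calls `np.isnan(key)` with the new column label as `key`. A minimal sketch of why that fails for a string label such as `'OR'` or `'foo'` is below; the helper name `isnan_raises_type_error` is hypothetical and only serves the illustration, and the exact error message varies between NumPy versions.

```python
import numpy as np


def isnan_raises_type_error(key):
    """Return True if np.isnan(key) raises TypeError, False otherwise."""
    try:
        np.isnan(key)
    except TypeError:
        # Strings have no isnan implementation in NumPy
        return True
    return False


print(isnan_raises_type_error(1.0))   # float label: isnan is defined
print(isnan_raises_type_error("OR"))  # string label: isnan is not defined
```

The float labels already in `table.columns` pass the check, while any string label hits the error before the lookup can fall through to the "insert at end" branch.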

@jreback
Contributor

jreback commented Jun 6, 2014

what version of pandas? can you pickle the data (the table and what you are adding) and give a link?

@cpcloud
Member

cpcloud commented Jun 6, 2014

@fonnesbeck are your columns float columns? Can you post the output of df.columns?

@fonnesbeck
Author

Sorry. Updated info above.

@fonnesbeck
Author

Yes, the table columns are floats:

(Pdb) table.dtypes
oxygen
0         float64
1         float64
dtype: object

(Pdb) table.columns
Float64Index([0.0, 1.0], dtype='float64')

@jreback
Contributor

jreback commented Jun 6, 2014

that's a pretty old version

what changed in your setup?

did you try 0.14.0?

@cpcloud
Member

cpcloud commented Jun 6, 2014

i think i know what's going on ... i'm guilty of that isnan line 😞

@fonnesbeck
Author

I'm just updating now from master. Thanks for the prompt response, as usual.

@jreback
Contributor

jreback commented Jun 6, 2014

you have float columns (the index)
I don't think you can add a string to those
that gets a weird mixed index

@cpcloud
Member

cpcloud commented Jun 6, 2014

yep ... can repro with this:

In [6]: df = DataFrame({0.0: rand(10), 1.0: rand(10)})

In [7]: df['a'] = 10

i think this should work since it works with int
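
A self-contained version of the repro above might look like the following sketch. It assumes a pandas release that already contains the fix, where the assignment succeeds; `np.random.default_rng` stands in for the pylab-style `rand` used in the session.

```python
import numpy as np
import pandas as pd

# Frame whose column labels are floats, as in the report
rng = np.random.default_rng(0)
df = pd.DataFrame({0.0: rng.random(10), 1.0: rng.random(10)})

# Adding a string-labelled column; this raised TypeError on the affected builds
df["a"] = 10

print(df.columns)
```

On an affected build the `df["a"] = 10` line is where the `TypeError` surfaces.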

@cpcloud
Member

cpcloud commented Jun 6, 2014

fix on the way

@jreback
Contributor

jreback commented Jun 6, 2014

needs to coerce back to Index I think

@cpcloud
Member

cpcloud commented Jun 6, 2014

no just need to catch the TypeError and pass to the superclass
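
The idea of catching the `TypeError` and deferring to the superclass can be sketched with toy classes. This is not the actual pandas patch: `BaseIndex` and `ToyFloatIndex` are hypothetical stand-ins, and `math.isnan` is used in place of `np.isnan` to keep the sketch dependency-free.

```python
import math


class BaseIndex:
    """Toy stand-in for a generic index: plain label lookup."""

    def __init__(self, labels):
        self.labels = list(labels)

    def get_loc(self, key):
        try:
            return self.labels.index(key)
        except ValueError:
            # Mirror the index contract: a missing label is a KeyError
            raise KeyError(key)


class ToyFloatIndex(BaseIndex):
    """Toy stand-in for a float index with special NaN handling."""

    def get_loc(self, key):
        try:
            if math.isnan(key):
                # NaN != NaN, so test each label explicitly
                for pos, label in enumerate(self.labels):
                    if isinstance(label, float) and math.isnan(label):
                        return pos
                raise KeyError(key)
        except TypeError:
            # Non-numeric key such as 'foo': isnan cannot handle it,
            # so fall through to the generic lookup
            pass
        return super().get_loc(key)


idx = ToyFloatIndex([0.0, 1.0, float("nan")])
print(idx.get_loc(1.0))
```

With this shape, a string key ends in a `KeyError` from the generic lookup, which is what the caller in `internals.py` (`except KeyError: # This item wasn't present, just insert at end`) expects.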

@cpcloud
Member

cpcloud commented Jun 6, 2014

thanks @fonnesbeck sorry for breaking your code

@fonnesbeck
Author

No sweat. You guys are awesome.

@glyg
Contributor

glyg commented Jun 19, 2014

Hey, I stumbled on that bug (I think the pip version suffers from it). I don't really need a float index and, even though I'll update pandas later, I would like to recast my faulty Float64-typed MultiIndex to a good solid integer.

Is there a good way to do that (apart from the obvious approach of stripping off the float index, casting it to int, and indexing back)?

Thanks,

G.

@jreback
Contributor

jreback commented Jun 19, 2014

well you can reset_index() to get an integer index

@glyg
Contributor

glyg commented Jun 19, 2014

Yes, but I wanted to keep the original MultiIndex, only cast from float to int, not drop it altogether.

@jreback
Contributor

jreback commented Jun 19, 2014

df = DataFrame(dict(values = np.arange(5), level_1 = list('aaabb'), level_2 = [1.,2.,3.,1.,2.]))

In [26]: df
Out[26]: 
  level_1  level_2  values
0       a        1       0
1       a        2       1
2       a        3       2
3       b        1       3
4       b        2       4

In [27]: df.set_index(['level_1','level_2'])
Out[27]: 
                 values
level_1 level_2        
a       1             0
        2             1
        3             2
b       1             3
        2             4

In [31]: df.set_index(['level_1','level_2']).index.levels[1]
Out[31]: Float64Index([1.0, 2.0, 3.0], dtype='float64')

Cast it to int (this will truncate FYI)

In [32]: df.set_index(['level_1','level_2']).reset_index()
Out[32]: 
  level_1  level_2  values
0       a        1       0
1       a        2       1
2       a        3       2
3       b        1       3
4       b        2       4

In [33]: df2 = df.set_index(['level_1','level_2']).reset_index()

In [34]: df2['level_2'] = df['level_2'].astype('int64')

In [35]: df2.set_index(['level_1','level_2']).index.levels[1]
Out[35]: Int64Index([1, 2, 3], dtype='int64')

In [36]: df2.set_index(['level_1','level_2'])
Out[36]: 
                 values
level_1 level_2        
a       1             0
        2             1
        3             2
b       1             3
        2             4

@glyg
Contributor

glyg commented Jun 19, 2014

Ok that's what I did,
thanks for the quick reply!
