'numpy.float32' object has no attribute 'is_integer' #15

Closed

Gabomfim opened this issue Jul 18, 2021 · 3 comments
Labels
bug Something isn't working

Comments


Gabomfim commented Jul 18, 2021

I tried the following on a dataset with float samples (running Python 3.7):

configGBM = {'algorithm': 'C4.5', 'enableGBM': True, 'epochs': 7, 'learning_rate': 1, 'max_depth': 5}
modelGBM = chef.fit(train, config = configGBM)

Error Log:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/var/folders/vk/rw3fbc110n3fsf_xhz6_r4m00000gn/T/ipykernel_67628/3037199772.py in <module>
      1 configGBM = {'algorithm': 'C4.5', 'enableGBM': True, 'epochs': 7, 'learning_rate': 1, 'max_depth': 5}
----> 2 modelGBM = chef.fit(train, config = configGBM)
    
/usr/local/lib/python3.7/site-packages/chefboost/Chefboost.py in fit(df, config, target_label, validation_df)
    190 
    191                 if df['Decision'].dtypes == 'object': #transform classification problem to regression
--> 192                         trees, alphas = gbm.classifier(df, config, header, dataset_features, validation_df = validation_df, process_id = process_id)
    193                         classification = True
    194 

/usr/local/lib/python3.7/site-packages/chefboost/tuning/gbm.py in classifier(df, config, header, dataset_features, validation_df, process_id)
    270                                 instance['P_'+str(j)] = probabilities[j]
    271 
--> 272                         worksheet.loc[row] = instance
    273 
    274                 for i in range(0, len(classes)):

/usr/local/lib/python3.7/site-packages/pandas/core/indexing.py in __setitem__(self, key, value)
    721 
    722         iloc = self if self.name == "iloc" else self.obj.iloc
--> 723         iloc._setitem_with_indexer(indexer, value, self.name)
    724 
    725     def _validate_key(self, key, axis: int):

/usr/local/lib/python3.7/site-packages/pandas/core/indexing.py in _setitem_with_indexer(self, indexer, value, name)
   1728         if take_split_path:
   1729             # We have to operate column-wise
-> 1730             self._setitem_with_indexer_split_path(indexer, value, name)
   1731         else:
   1732             self._setitem_single_block(indexer, value, name)

/usr/local/lib/python3.7/site-packages/pandas/core/indexing.py in _setitem_with_indexer_split_path(self, indexer, value, name)
   1795                 # We are setting multiple columns in a single row.
   1796                 for loc, v in zip(ilocs, value):
-> 1797                     self._setitem_single_column(loc, v, pi)
   1798 
   1799             elif len(ilocs) == 1 and com.is_null_slice(pi) and len(self.obj) == 0:

/usr/local/lib/python3.7/site-packages/pandas/core/indexing.py in _setitem_single_column(self, loc, value, plane_indexer)
   1918             # set the item, possibly having a dtype change
   1919             ser = ser.copy()
-> 1920             ser._mgr = ser._mgr.setitem(indexer=(pi,), value=value)
   1921             ser._maybe_update_cacher(clear=True)
   1922 

/usr/local/lib/python3.7/site-packages/pandas/core/internals/managers.py in setitem(self, indexer, value)
    353 
    354     def setitem(self: T, indexer, value) -> T:
--> 355         return self.apply("setitem", indexer=indexer, value=value)
    356 
    357     def putmask(self, mask, new, align: bool = True):

/usr/local/lib/python3.7/site-packages/pandas/core/internals/managers.py in apply(self, f, align_keys, ignore_failures, **kwargs)
    325                     applied = b.apply(f, **kwargs)
    326                 else:
--> 327                     applied = getattr(b, f)(**kwargs)
    328             except (TypeError, NotImplementedError):
    329                 if not ignore_failures:

/usr/local/lib/python3.7/site-packages/pandas/core/internals/blocks.py in setitem(self, indexer, value)
    924         # coerce if block dtype can store value
    925         values = self.values
--> 926         if not self._can_hold_element(value):
    927             # current dtype cannot store value, coerce to common dtype
    928             return self.coerce_to_target_dtype(value).setitem(indexer, value)

/usr/local/lib/python3.7/site-packages/pandas/core/internals/blocks.py in _can_hold_element(self, element)
    620         """require the same dtype as ourselves"""
    621         element = extract_array(element, extract_numpy=True)
--> 622         return can_hold_element(self.values, element)
    623 
    624     @final

/usr/local/lib/python3.7/site-packages/pandas/core/dtypes/cast.py in can_hold_element(arr, element)
   2181         if tipo is not None:
   2182             if tipo.kind not in ["i", "u"]:
-> 2183                 if is_float(element) and element.is_integer():
   2184                     return True
   2185                 # Anything other than integer we cannot hold

AttributeError: 'numpy.float32' object has no attribute 'is_integer'
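
For context, a minimal sketch of the underlying incompatibility (assuming a NumPy version older than 1.22, where floating-point scalars did not yet have is_integer()):

import numpy as np

# Python's built-in float has always had is_integer():
print(float(1.0).is_integer())        # True

# NumPy floating scalars only gained is_integer() in NumPy 1.22; on older
# versions the call below raises the same AttributeError as in the traceback.
print(np.float32(1.0).is_integer())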
serengil (Owner)

Could you share your data set?

Gabomfim (Author) commented Aug 6, 2021

Sorry for keeping you waiting.

I'm sharing my notebook with all the files, including the databases used (in the data folder).
I managed to fix the problem by importing the database as txt instead of csv.

allstroke.txt is the txt version of the healthcare-dataset-stroke-data.csv database. That fixed it.

We now import the database this way:
df = pd.read_csv("./data/allStroke.txt", index_col=0)

I don't have the old code with me now, but I can send it to you next week if needed.
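
A hedged check (file locations assumed from the shared notebook): since the error depends on the dtypes pandas infers, comparing them between the two imports may show what changed when switching from csv to txt.

import pandas as pd

# Both reads are assumed to point at the files shipped with the notebook.
df_csv = pd.read_csv("./data/healthcare-dataset-stroke-data.csv", index_col=0)
df_txt = pd.read_csv("./data/allStroke.txt", index_col=0)

# If the two imports yield different dtypes (e.g. float64 vs object for some
# column), that difference is the likely reason one of them avoids the error.
print(df_csv.dtypes)
print(df_txt.dtypes)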

serengil added the bug label Nov 13, 2021
serengil (Owner)

When I run this in my environment, it works well. I have Python 3.8.12 and pandas==1.3.5. I recommend upgrading or downgrading to match my environment.

from chefboost import Chefboost as chef
import pandas as pd

df = pd.read_csv("healthcare-dataset-stroke-data.csv", index_col=0)

print(df.head())

configGBM = {'algorithm': 'C4.5', 'enableGBM': True, 'epochs': 7, 'learning_rate': 1, 'max_depth': 5, 'enableParallelism': False}
modelGBM = chef.fit(df = df, config = configGBM)

Output logs:
(sefik) sefik@Sefiks-MacBook-Pro Desktop % python hello.py
gender age hypertension heart_disease ever_married work_type Residence_type avg_glucose_level bmi smoking_status Decision
id
9046 Male 67.0 0 1 Yes Private Urban 228.69 36.6 formerly smoked Yes
51676 Female 61.0 0 0 Yes Self-employed Rural 202.21 NaN never smoked Yes
31112 Male 80.0 0 1 Yes Private Rural 105.92 32.5 never smoked Yes
60182 Female 49.0 0 0 Yes Private Urban 171.23 34.4 smokes Yes
1665 Female 79.0 1 0 Yes Self-employed Rural 174.12 24.0 never smoked Yes
Gradient Boosting Machines...
Regression tree is going to be built...
gradient boosting for classification
Epoch 7. Accuracy: 82. Process: : 100%|█████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:42<00:00, 6.02s/it]
The best accuracy got in 6 epoch with the score 82.78210116731518

finished in 42.12960386276245 seconds

Evaluate train set

Accuracy: 82.00389105058366 % on 1028 instances
Labels: ['Yes' 'No']
Confusion matrix: [[99, 35], [150, 744]]
Precision: 73.8806 %, Recall: 39.759 %, F1: 51.6971 %
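
If it helps, a minimal sketch (nothing chefboost-specific) to compare your environment against the one reported above:

import sys
import numpy
import pandas

# Compare against the working setup reported above: Python 3.8.12, pandas==1.3.5.
print(sys.version)
print("numpy", numpy.__version__)
print("pandas", pandas.__version__)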
