fix DF hashing bug (untested) #1175

Closed
falschparker82 wants to merge 1 commit


falschparker82

Should fix:
```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-15-1e594fb06987> in <module>
      1 dep_var = 'label'
      2 data_bunch = TabularDataBunch.from_df('data/', full_df, valid_df, dep_var, tfms=[FillMissing, Categorify],
----> 3                                       cat_names=[], cont_names=contin_vars, test_df=test_df)

/anaconda3/lib/python3.6/site-packages/fastai/tabular/data.py in from_df(cls, path, df, dep_var, valid_idx, procs, cat_names, cont_names, classes, **kwargs)
    111         "Create a `DataBunch` from train/valid/test dataframes."
    112         cat_names = ifnone(cat_names, [])
--> 113         cont_names = ifnone(cont_names, list(set(df)-set(cat_names)-{dep_var}))
    114         procs = listify(procs)
    115         return (TabularList.from_df(df, path=path, cat_names=cat_names, cont_names=cont_names, procs=procs)

/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py in __hash__(self)
   1490     def __hash__(self):
   1491         raise TypeError('{0!r} objects are mutable, thus they cannot be'
-> 1492                         ' hashed'.format(self.__class__.__name__))
   1493 
   1494     def __iter__(self):

TypeError: 'DataFrame' objects are mutable, thus they cannot be hashed
```
Contributor

sgugger commented Nov 16, 2018

I'm not sure what the problem is with that line: `set(df)` returns the set of column names.

The problem is that you're not passing the arguments the function expects. The signature in your error message is `from_df(cls, path, df, dep_var, ...)`, so `dep_var` receives what you called `valid_df`. I expect that is a DataFrame, so the error is thrown by `{dep_var}`.
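
To make the argument mix-up concrete, here is a minimal sketch that uses only the standard library. `FrameStandIn` is a hypothetical stand-in for a pandas DataFrame (iterating yields column names, hashing raises `TypeError`, as in the traceback), and `from_df` is a stub that copies only the parameter order from the error message; the default values and column names are assumptions, and the `tfms` and `test_df` keyword arguments are left out for brevity.

```python
import inspect


class FrameStandIn:
    """Hypothetical stand-in for a DataFrame, not the real pandas class."""

    def __init__(self, columns):
        self.columns = list(columns)

    def __iter__(self):
        # Iterating yields the column names, which is why set(df) works.
        return iter(self.columns)

    def __hash__(self):
        # Mirrors the behaviour shown in the traceback.
        raise TypeError(
            f"{type(self).__name__!r} objects are mutable, thus they cannot be hashed"
        )


def from_df(path, df, dep_var, valid_idx=None, procs=None, cat_names=None,
            cont_names=None, classes=None, **kwargs):
    """Stub with the parameter order from the traceback; defaults are assumed."""
    cat_names = [] if cat_names is None else cat_names
    # Same expression as line 113 in the traceback. Python evaluates it before
    # the None check, so it runs even when cont_names is supplied.
    default_cont = list(set(df) - set(cat_names) - {dep_var})
    return default_cont if cont_names is None else cont_names


full_df = FrameStandIn(["a", "b", "label"])
valid_df = FrameStandIn(["a", "b", "label"])

# set(df) is fine: it collects the column names.
column_names = set(full_df)

# Bind the positional arguments of the failing call to see where each one lands.
bound = inspect.signature(from_df).bind(
    "data/", full_df, valid_df, "label", cat_names=[], cont_names=["a", "b"]
)
print("dep_var receives valid_df:", bound.arguments["dep_var"] is valid_df)
print("valid_idx receives:", bound.arguments["valid_idx"])

# The failing call: {dep_var} tries to hash the frame passed in third position.
try:
    from_df("data/", full_df, valid_df, "label", cat_names=[], cont_names=["a", "b"])
    error = None
except TypeError as exc:
    error = str(exc)
print("error:", error)

# Passing the dependent variable's name in third position avoids the error.
fixed = from_df("data/", full_df, "label", cat_names=[], cont_names=["a", "b"])
```

The `bind` call shows that the third positional argument, `valid_df`, lands on `dep_var`, and the string `'label'` lands on `valid_idx`. The set expression itself is fine once `dep_var` is a hashable column name.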

sgugger closed this Nov 16, 2018
@falschparker82
Author

Ah, that's it. Sorry, I did not realize that the main function signature silently changed after 1.0.22.
