09_tabular.ipynb: crashes when creating a TabularPandas using Normalize #411

Closed
juliangilbey opened this issue Feb 19, 2021 · 4 comments

@juliangilbey

Hi!
I'm using a fresh checkout of 09_tabular.ipynb, clearing the notebook and running from the start. When I get to the following cell in the neural network section, it crashes:

procs_nn = [Categorify, FillMissing, Normalize]
to_nn = TabularPandas(df_nn_final, procs_nn, cat_nn, cont_nn,
                      splits=splits, y_names=dep_var)

The error message (long, sorry) is as follows:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-105-9827c0e691d0> in <module>
      1 procs_nn = [Categorify, FillMissing, Normalize]
----> 2 to_nn = TabularPandas(df_nn_final, procs_nn, cat_nn, cont_nn,
      3                       splits=splits, y_names=dep_var)

~/fast.ai.course/fastai-venv/lib/python3.9/site-packages/fastai/tabular/core.py in __init__(self, df, procs, cat_names, cont_names, y_names, y_block, splits, do_setup, device, inplace, reduce_memory)
    164         self.cat_names,self.cont_names,self.procs = L(cat_names),L(cont_names),Pipeline(procs)
    165         self.split = len(df) if splits is None else len(splits[0])
--> 166         if do_setup: self.setup()
    167 
    168     def new(self, df):

~/fast.ai.course/fastai-venv/lib/python3.9/site-packages/fastai/tabular/core.py in setup(self)
    175     def decode_row(self, row): return self.new(pd.DataFrame(row).T).decode().items.iloc[0]
    176     def show(self, max_n=10, **kwargs): display_df(self.new(self.all_cols[:max_n]).decode().items)
--> 177     def setup(self): self.procs.setup(self)
    178     def process(self): self.procs(self)
    179     def loc(self): return self.items.loc

~/fast.ai.course/fastai-venv/lib/python3.9/site-packages/fastcore/transform.py in setup(self, items, train_setup)
    190         tfms = self.fs[:]
    191         self.fs.clear()
--> 192         for t in tfms: self.add(t,items, train_setup)
    193 
    194     def add(self,t, items=None, train_setup=False):

~/fast.ai.course/fastai-venv/lib/python3.9/site-packages/fastcore/transform.py in add(self, t, items, train_setup)
    193 
    194     def add(self,t, items=None, train_setup=False):
--> 195         t.setup(items, train_setup)
    196         self.fs.append(t)
    197 

~/fast.ai.course/fastai-venv/lib/python3.9/site-packages/fastcore/transform.py in setup(self, items, train_setup)
     77     def setup(self, items=None, train_setup=False):
     78         train_setup = train_setup if self.train_setup is None else self.train_setup
---> 79         return self.setups(getattr(items, 'train', items) if train_setup else items)
     80 
     81     def _call(self, fn, x, split_idx=None, **kwargs):

~/fast.ai.course/fastai-venv/lib/python3.9/site-packages/fastcore/dispatch.py in __call__(self, *args, **kwargs)
    116         elif self.inst is not None: f = MethodType(f, self.inst)
    117         elif self.owner is not None: f = MethodType(f, self.owner)
--> 118         return f(*args, **kwargs)
    119 
    120     def __get__(self, inst, owner):

~/fast.ai.course/fastai-venv/lib/python3.9/site-packages/fastai/tabular/core.py in setups(self, to)
    271     store_attr(but='to', means=dict(getattr(to, 'train', to).conts.mean()),
    272                stds=dict(getattr(to, 'train', to).conts.std(ddof=0)+1e-7))
--> 273     return self(to)
    274 
    275 @Normalize

~/fast.ai.course/fastai-venv/lib/python3.9/site-packages/fastcore/transform.py in __call__(self, x, **kwargs)
     71     @property
     72     def name(self): return getattr(self, '_name', _get_name(self))
---> 73     def __call__(self, x, **kwargs): return self._call('encodes', x, **kwargs)
     74     def decode  (self, x, **kwargs): return self._call('decodes', x, **kwargs)
     75     def __repr__(self): return f'{self.name}:\nencodes: {self.encodes}decodes: {self.decodes}'

~/fast.ai.course/fastai-venv/lib/python3.9/site-packages/fastcore/transform.py in _call(self, fn, x, split_idx, **kwargs)
     81     def _call(self, fn, x, split_idx=None, **kwargs):
     82         if split_idx!=self.split_idx and self.split_idx is not None: return x
---> 83         return self._do_call(getattr(self, fn), x, **kwargs)
     84 
     85     def _do_call(self, f, x, **kwargs):

~/fast.ai.course/fastai-venv/lib/python3.9/site-packages/fastcore/transform.py in _do_call(self, f, x, **kwargs)
     87             if f is None: return x
     88             ret = f.returns(x) if hasattr(f,'returns') else None
---> 89             return retain_type(f(x, **kwargs), x, ret)
     90         res = tuple(self._do_call(f, x_, **kwargs) for x_ in x)
     91         return retain_type(res, x)

~/fast.ai.course/fastai-venv/lib/python3.9/site-packages/fastcore/dispatch.py in __call__(self, *args, **kwargs)
    116         elif self.inst is not None: f = MethodType(f, self.inst)
    117         elif self.owner is not None: f = MethodType(f, self.owner)
--> 118         return f(*args, **kwargs)
    119 
    120     def __get__(self, inst, owner):

~/fast.ai.course/fastai-venv/lib/python3.9/site-packages/fastai/tabular/core.py in encodes(self, to)
    275 @Normalize
    276 def encodes(self, to:Tabular):
--> 277     to.conts = (to.conts-self.means) / self.stds
    278     return to
    279 

~/fast.ai.course/fastai-venv/lib/python3.9/site-packages/pandas/core/ops/__init__.py in f(self, other, axis, level, fill_value)
    649         # TODO: why are we passing flex=True instead of flex=not special?
    650         #  15 tests fail if we pass flex=not special instead
--> 651         self, other = _align_method_FRAME(self, other, axis, flex=True, level=level)
    652 
    653         if isinstance(other, ABCDataFrame):

~/fast.ai.course/fastai-venv/lib/python3.9/site-packages/pandas/core/ops/__init__.py in _align_method_FRAME(left, right, axis, flex, level)
    501     elif is_list_like(right) and not isinstance(right, (ABCSeries, ABCDataFrame)):
    502         # GH17901
--> 503         right = to_series(right)
    504 
    505     if flex is not None and isinstance(right, ABCDataFrame):

~/fast.ai.course/fastai-venv/lib/python3.9/site-packages/pandas/core/ops/__init__.py in to_series(right)
    463         else:
    464             if len(left.columns) != len(right):
--> 465                 raise ValueError(
    466                     msg.format(req_len=len(left.columns), given_len=len(right))
    467                 )

ValueError: Unable to coerce to Series, length must be 1: given 0

I don't know where this is coming from (despite the long backtrace).
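The last frames of the traceback are consistent with `means` being an empty dict. `Normalize.setups` builds it as `dict(...conts.mean())`, and the error reports a frame with one continuous column being aligned with a zero-length dict. That would happen if the only continuous column, presumably `saleElapsed`, was non-numeric and the pandas version in use left it out of `mean()`. The sketch below reproduces this in plain pandas under that assumption. The data values are made up, and `numeric_only=True` is used to mimic the dropping behaviour deterministically.

```python
import pandas as pd

# Assumption: the only continuous column, 'saleElapsed', holds integer
# timestamps but has object dtype (values are made up for illustration).
conts = pd.DataFrame(
    {"saleElapsed": pd.Series([1293840000, 1325376000, 1356998400], dtype=object)}
)

# Mimic the pandas version in the traceback, which left the column out of
# mean(); numeric_only=True skips object columns deterministically.
means = dict(conts.mean(numeric_only=True))
print(means)  # {}

# Same arithmetic as Normalize.encodes: to.conts - self.means
error_message = None
try:
    conts - means
except ValueError as err:
    error_message = str(err)
print(error_message)  # Unable to coerce to Series, length must be 1: given 0
```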

@juliangilbey
Author

Oh, and a quick test showed that leaving out Normalize stops the crash, as is also clear from the backtrace.
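Because Normalize is the proc here that does arithmetic on the continuous columns, a quick dtype check before building the TabularPandas can point at the offending column. A minimal sketch: the helper `non_numeric_conts` is hypothetical (not part of fastai), and the toy frame stands in for `df_nn_final`, with `saleElapsed` assumed to have object dtype.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def non_numeric_conts(df, cont_names):
    """Hypothetical helper: continuous column names whose dtype is not numeric."""
    return [name for name in cont_names if not is_numeric_dtype(df[name])]


# Toy stand-in for df_nn_final; 'saleElapsed' is assumed to be object dtype.
df = pd.DataFrame({
    "YearMade": [2004, 1996, 2001],
    "saleElapsed": pd.Series([1293840000, 1325376000, 1356998400], dtype=object),
})
print(non_numeric_conts(df, ["YearMade", "saleElapsed"]))  # ['saleElapsed']
```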

@juliangilbey
Author

Actually, I think this may be a bug in fastai itself, so I'll make a report there instead.

@juliangilbey
Author

OK, I've commented in the fastai issue referenced above about the strange behaviour of add_datepart. Addressing that in the way I suggested there is enough to fix this 09_tabular.ipynb bug. (After all, why should ...Elapsed be anything other than numeric?) If that fix is applied, then the line in this notebook that reads:

df_nn['saleElapsed'] = df_nn['saleElapsed'].astype(int)

can be dropped (it seems to have been introduced to address exactly this issue, as it's not in the printed version of the book).
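The point that an `...Elapsed` column should always be numeric can be illustrated with a minimal sketch of an elapsed-seconds computation that stays float64 even when some dates are missing: missing dates become NaN instead of None in an object column. This is an illustration under that assumption, not fastai's `add_datepart` code or the change made in the eventual fix; `elapsed_seconds` is a hypothetical helper.

```python
import pandas as pd


def elapsed_seconds(dates):
    """Hypothetical helper: seconds since the Unix epoch as float64, NaN if missing."""
    dates = pd.to_datetime(dates)
    # Timedelta / Timedelta gives float64; NaT becomes NaN, so the dtype stays numeric.
    return (dates - pd.Timestamp("1970-01-01")) / pd.Timedelta(seconds=1)


sale_dates = pd.Series(["2021-01-01", "2021-02-19", None])
elapsed = elapsed_seconds(sale_dates)
print(elapsed.dtype)     # float64
print(elapsed.tolist())  # [1609459200.0, 1613692800.0, nan]
```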

If, on the other hand, the add_datepart change is not accepted, then two things need to change in the notebook to make it run:

  1. The line of code above needs df_nn_final in place of df_nn (both occurrences).
  2. The line a few cells earlier that defines df_nn_final needs an additional .copy() so that pandas does not raise an exception:
    df_nn_final = df_nn[list(xs_final_time.columns) + [dep_var]].copy()
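Taken together, the two notebook changes amount to something like the following, shown on a toy frame. The values are made up, `saleElapsed` is assumed to start with object dtype, and `final_cols` is a hypothetical stand-in for `list(xs_final_time.columns)`. The explicit `.copy()` gives `df_nn_final` its own data, so the column assignment is made on an independent frame instead of a slice of `df_nn`. After the cast, the Normalize-style arithmetic from the traceback aligns because `mean()` keeps the column.

```python
import pandas as pd

# Toy stand-in for df_nn (values are made up; saleElapsed has object dtype).
df_nn = pd.DataFrame({
    "saleElapsed": pd.Series([1293840000, 1325376000, 1356998400], dtype=object),
    "SalePrice": [10.1, 10.5, 9.8],
})
dep_var = "SalePrice"
final_cols = ["saleElapsed"]  # hypothetical stand-in for list(xs_final_time.columns)

# Change 2: take an explicit copy so later assignments do not target a slice of df_nn.
df_nn_final = df_nn[final_cols + [dep_var]].copy()

# Change 1: cast the column on df_nn_final (not df_nn) so it is numeric.
df_nn_final["saleElapsed"] = df_nn_final["saleElapsed"].astype(int)

# Normalize-style arithmetic now works, because mean() keeps the column.
conts = df_nn_final[["saleElapsed"]]
means = dict(conts.mean())
stds = dict(conts.std(ddof=0) + 1e-7)
normalized = (conts - means) / stds
print(df_nn["saleElapsed"].dtype, df_nn_final["saleElapsed"].dtype)
```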

It would seem that fixing add_datepart is the simpler thing to do!

Best wishes, Julian

@juliangilbey
Author

This has been fixed by fastai/fastai#3230 and #413
