Does it support Dataframe as input? #100

26345211 · 2018-08-26T16:20:02Z

The estimator I am trying to fit accepts a pandas DataFrame as input in its fit method and relies on the column labels. However, when using the SuperLearner, the data is converted to a numpy.ndarray before it is passed to the estimator's fit method. Is there a way to preserve the column labels?
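To make the problem concrete, here is a minimal sketch of such an estimator. The class `SingleFeatureRegressor` and its default column name "Rating" (borrowed from the traceback later in this thread) are hypothetical. The point is only that a label lookup like `X["Rating"]` works on a DataFrame but raises `IndexError` on a bare ndarray.

```python
import numpy as np
import pandas as pd


class SingleFeatureRegressor:
    """Hypothetical estimator whose fit() needs DataFrame column labels.

    It fits a straight line y ~ slope * X[feature] + intercept.
    """

    def __init__(self, feature="Rating"):
        self.feature = feature

    def fit(self, X, y):
        # Label-based lookup: fine on a DataFrame, IndexError on an ndarray
        x = X[self.feature].to_numpy(dtype=float)
        y = np.asarray(y, dtype=float).ravel()
        # Least-squares line through (x, y)
        self.slope_, self.intercept_ = np.polyfit(x, y, deg=1)
        return self

    def predict(self, X):
        return self.slope_ * X[self.feature].to_numpy(dtype=float) + self.intercept_


df = pd.DataFrame({"Age": [30, 41, 25, 38], "Rating": [1.0, 2.0, 3.0, 4.0]})
est = SingleFeatureRegressor().fit(df, [2.0, 4.0, 6.0, 8.0])  # works
# SingleFeatureRegressor().fit(df.to_numpy(), ...) would raise IndexError
```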

flennerhag · 2018-08-28T20:45:24Z

Yes. If you set array_check=0, the ensemble will not perform any checks on the input, so the DataFrame will be passed to the estimator as-is. As long as the DataFrame supports array indexing (e.g. X[start:stop]), it should work fine.
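As a quick sanity check of the X[start:stop] requirement, here is a small sketch. The helper `slice_rows` is hypothetical; it only mirrors the `x[slice(start, stop)]` pattern visible in mlens' slice_array in the traceback further down. With a plain integer slice, a DataFrame (with the default RangeIndex assumed here) returns rows and keeps its column labels and index, just as an ndarray returns rows.

```python
import pandas as pd


def slice_rows(x, start, stop):
    """Contiguous row slice for ndarrays and DataFrames alike (hypothetical helper)."""
    return x[slice(start, stop)]


df = pd.DataFrame({"Age": [20, 30, 40, 50, 60], "Rating": [1, 2, 3, 4, 5]})
part = slice_rows(df, 1, 3)                 # rows 1 and 2, labels kept
arr_part = slice_rows(df.to_numpy(), 1, 3)  # same rows as a bare ndarray
```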

One caveat is that you can't use backend='multiprocessing', since this requires memmapping an ndarray.
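A rough illustration of why memmapping and DataFrames don't mix: a numpy memmap stores only the raw values on disk, so column labels and the index are lost on the way. This is a sketch, not mlens' actual memmapping code; `dump_to_memmap` is a hypothetical helper, and float64 storage is an assumption.

```python
import numpy as np


def dump_to_memmap(df, path):
    """Write a DataFrame's values to a disk-backed np.memmap (hypothetical helper).

    Only the values are stored; column labels and the index are not.
    """
    values = df.to_numpy(dtype="float64")
    mm = np.memmap(path, dtype="float64", mode="w+", shape=values.shape)
    mm[:] = values  # copy the data into the memory-mapped file
    mm.flush()
    return mm
```

A worker process reading this file back sees a plain array, so any estimator that looks columns up by name would fail there.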

In the next release, array_check will be removed, so this issue will not arise.

26345211 · 2018-08-29T16:34:44Z

Thanks for your help, but an error occurs even with array_check=0; the data is still cast to an ndarray:
...........................................................................
/Users/anaconda3/lib/python3.6/site-packages/mlens/parallel/_base_functions.py in slice_array(x= Age Race Rating Publ...1667 8 37694

[3000 rows x 6 columns], y= Res_Final_Position
0 ...2999 0

[3000 rows x 1 columns], idx=None, r=0)
170 x = x[slice(idx[0] - r, idx[1] - r)]
171 y = y[slice(idx[0] - r, idx[1] - r)] if y is not None else y
172
173 # Cast as ndarray to avoid passing memmaps to estimators
174 if y is not None:
--> 175 y = y.view(type=np.ndarray)
y = Res_Final_Position
0 ...2999 0

[3000 rows x 1 columns]
y.view = undefined
176 if not issparse(x):
177 x = x.view(type=np.ndarray)
178
179 return x, y

...........................................................................
/Users/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py in __getattr__(self= Final
0 ...2999 0

[3000 rows x 1 columns], name='view')
4371 name in self._accessors):
4372 return object.__getattribute__(self, name)
4373 else:
4374 if self._info_axis._can_hold_identifiers_and_holds_name(name):
4375 return self[name]
-> 4376 return object.__getattribute__(self, name)
self = Res_Final_Position
0 ...2999 0

[3000 rows x 1 columns]
name = 'view'
4377
4378 def __setattr__(self, name, value):
4379 """After regular attribute access, try setting the name
4380 This allows simpler access to columns for interactive use.

AttributeError: 'DataFrame' object has no attribute 'view'

flennerhag · 2018-08-29T21:09:42Z

Aha! The offending line is a legacy of v0.1, when we only did multiprocessing. I've pushed an mmap branch that solves this by only casting to ndarray if the data is a memmap. I ran a simple test, and a DataFrame as input now passes through the ensemble.
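The "only cast if it's a memmap" idea can be sketched in a few lines. This is an illustration of the approach described above, not the code in the mmap branch; `cast_if_memmap` is a hypothetical name. Anything that is not an np.memmap, including a DataFrame, is returned untouched, so the failing `.view` call is never made on it.

```python
import numpy as np


def cast_if_memmap(a):
    """Return a plain ndarray view of a memmap; return anything else unchanged."""
    if isinstance(a, np.memmap):
        # Same buffer, but the estimator sees a regular ndarray
        return a.view(type=np.ndarray)
    return a
```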

This works as long as 'threading' is used as the backend. For multiprocessing, things are thornier since we use memmapping. Fixing that may take a while, if it happens at all, but I'm guessing you're OK with 'threading'?
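Why threading is the easy case: threads share memory, so a worker receives the very same DataFrame slice, labels included, with no pickling or memmapping. The sketch below uses the standard library's ThreadPoolExecutor as a stand-in for the ensemble's threading backend (an assumption for illustration); `threaded_column_means` is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

import pandas as pd


def column_means(chunk):
    # Runs in a worker thread and receives a DataFrame, labels intact
    return chunk.mean()


def threaded_column_means(df, n_chunks=2):
    """Split df into contiguous row blocks and process each block in a thread."""
    size = -(-len(df) // n_chunks)  # ceiling division
    chunks = [df[i:i + size] for i in range(0, len(df), size)]
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        return list(pool.map(column_means, chunks))


df = pd.DataFrame({"Age": [20, 30, 40, 50], "Rating": [1, 2, 3, 4]})
results = threaded_column_means(df, n_chunks=2)
```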

Btw, to install the necessary update, do

pip uninstall mlens;
git clone https://github.com/flennerhag/mlens; cd mlens;
git fetch; git checkout mmap;
pip install .

This will uninstall the version you currently have and install the mmap branch of the bleeding edge build (to be 0.2.4). Let me know if you run into any more problems!

flennerhag · 2018-08-29T21:11:31Z

Btw, in the bleeding-edge version array_check has been deprecated, so you can ignore it.

26345211 · 2018-08-30T09:11:25Z

Thanks for the help. When I tried passing in a DataFrame, it ran into an index error. I tried the simple example from the getting started page, simply using pd.DataFrame(X) and pd.DataFrame(y) to change the type, and the [start:stop] format does work for indexing that DataFrame:
/Users/anaconda3/lib/python3.6/site-packages/mlens/parallel/_base_functions.py in slice_array(x= 0 1 2 3
0 6.8 3.2 5.9 2.3
...1
149 5.2 2.7 3.9 1.4

[150 rows x 4 columns], y= 0
0 2
1 2
2 2
3 1
4 1
5 2...6 0
147 2
148 0
149 1

[150 rows x 1 columns], idx=array([ 75, 76, 77, 78, 79, 80, 81, 82, ...40, 141, 142, 143, 144, 145, 146, 147, 148, 149]), r=0)
151 if len(idx[0]) > 1:
152 # Advanced indexing is required. This will trigger a copy
153 # of the slice in question to be made
154 simple_slice = False
155 idx = np.hstack([np.arange(t0 - r, t1 - r) for t0, t1 in idx])
--> 156 x = x[idx]

flennerhag · 2018-09-04T09:15:08Z

Sorry for the delay; the issue's been fixed. It turns out that advanced indexing on a DataFrame (x[idx] with an integer array) selects columns by label, as opposed to rows.

You should be able to run with dataframes both as X and y now.
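A minimal sketch of the pitfall and one way around it. The index array is built the same way as in the advanced-indexing branch of slice_array shown in the traceback (np.hstack over np.arange blocks); `take_rows` is hypothetical, and the actual fix merged in #101 may be implemented differently. On an ndarray, x[idx] picks rows; on a DataFrame it looks up column labels, which is why the iris example failed. Routing pandas objects through .iloc keeps the selection positional.

```python
import numpy as np
import pandas as pd


def take_rows(x, idx):
    """Select rows by integer position for ndarrays and pandas objects alike."""
    if isinstance(x, (pd.DataFrame, pd.Series)):
        return x.iloc[idx]  # positional row selection
    return x[idx]           # ndarray: integer array indexes rows


arr = np.arange(12).reshape(6, 2)
df = pd.DataFrame(arr)  # columns are labelled 0 and 1
idx = np.hstack([np.arange(t0, t1) for t0, t1 in [(0, 2), (4, 6)]])  # [0, 1, 4, 5]
rows = take_rows(df, idx)
# df[idx] would instead look for *columns* 0, 1, 4, 5 and raise KeyError
```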

flennerhag · 2018-09-05T07:07:37Z

In master branch as of #101.

26345211 · 2018-10-10T09:23:09Z

When the predicted and actual y are passed to the accuracy scorer, is the DataFrame passed along with its index, or does the model simply pass an ndarray?

flennerhag · 2018-10-11T09:40:50Z

If y is a DataFrame, then the input to the scorer should be a DataFrame, since we do no data type conversion on the input. The prediction will be in whatever format the estimator produces, presumably a numpy array.
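Since the scorer may receive an (n, 1) DataFrame for y_true and an (n,) ndarray for y_pred, a custom scorer should flatten both before comparing; otherwise numpy broadcasting silently produces an (n, n) comparison matrix. The `accuracy` function below is a hypothetical sketch of such a scorer, not part of mlens.

```python
import numpy as np
import pandas as pd


def accuracy(y_true, y_pred):
    """Hypothetical accuracy scorer tolerant of DataFrame/Series/ndarray inputs."""
    t = np.asarray(y_true).ravel()  # (n, 1) DataFrame -> (n,)
    p = np.asarray(y_pred).ravel()
    if t.shape != p.shape:
        raise ValueError(f"shape mismatch: {t.shape} vs {p.shape}")
    return float(np.mean(t == p))


y = pd.DataFrame({"Res_Final_Position": [0, 1, 1, 0]})
pred = np.array([0, 1, 0, 0])
score = accuracy(y, pred)
```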

Matthew-A-epi · 2021-02-04T18:53:33Z

It seems that this still does not work. It looks like an indexing issue: some index calls use [] instead of .loc[].
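The distinction matters as soon as the index is no longer 0..n-1 (e.g. after shuffling or filtering). A short sketch: .iloc selects by position, .loc by label, and [] with an integer key on an integer-indexed Series also looks up labels, not positions. Which accessor is right depends on whether the indices are positions or labels; since the indices in slice_array come from np.arange, a positional accessor would presumably be the match, but that is an assumption here.

```python
import pandas as pd

# Integer index that is no longer 0..n-1, e.g. after a shuffle
df = pd.DataFrame({"Rating": [10, 20, 30, 40]}, index=[3, 0, 2, 1])

first_two = df.iloc[0:2]         # positional: first two stored rows
row_label_0 = df.loc[0]          # label-based: the row labelled 0
via_brackets = df["Rating"][0]   # [] on an integer index: also label 0
```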

flennerhag mentioned this issue Sep 5, 2018

Convervative ndarray casting #101

Merged

flennerhag closed this as completed Sep 5, 2018
