list index out of range for regressor with y in {0,1} #49

Closed
georgesteve opened this issue Aug 22, 2019 · 8 comments
Labels
bug Something isn't working

@georgesteve

Hi, thank you for your library. I think it's very useful for showing results, and I especially like rtreeviz_bivar_3D. I was trying to run some code with 2 variables and 1 binary output, but I get this error. I'm not sure whether it is caused by the data type:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-217-d956e21d7290> in <module>
     47                       dist=12,
     48 
---> 49                       show={'splits','title'})
     50 
     51 

~/anaconda3/lib/python3.7/site-packages/dtreeviz/trees.py in rtreeviz_bivar_3D(ax, X_train, y_train, max_depth, feature_names, target_name, fontsize, ticks_fontsize, fontname, azim, elev, dist, show, colors, n_colors_in_map)
    234 
    235     rt = tree.DecisionTreeRegressor(max_depth=max_depth)
--> 236     rt.fit(X_train, y_train)
    237 
    238     y_lim = np.min(y_train), np.max(y_train)

~/anaconda3/lib/python3.7/site-packages/sklearn/tree/tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
   1155             sample_weight=sample_weight,
   1156             check_input=check_input,
-> 1157             X_idx_sorted=X_idx_sorted)
   1158         return self
   1159 

~/anaconda3/lib/python3.7/site-packages/sklearn/tree/tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
    248         if len(y) != n_samples:
    249             raise ValueError("Number of labels=%d does not match "
--> 250                              "number of samples=%d" % (len(y), n_samples))
    251         if not 0 <= self.min_weight_fraction_leaf <= 0.5:
    252             raise ValueError("min_weight_fraction_leaf must in [0, 0.5]")

ValueError: Number of labels=1 does not match number of samples=21
@parrt
Owner

parrt commented Aug 22, 2019

Hi. Based on that error message, X_train and y_train seem to be out of sync in size (1 label vs. 21 samples).
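A quick length check before calling the plotting function can surface this kind of mismatch early. The sketch below is only an illustration: `check_same_length` is a hypothetical helper, not part of dtreeviz or scikit-learn, and the nested-list case is just one guess at how a label count of 1 could arise.

```python
def check_same_length(X, y):
    """Raise ValueError unless there is exactly one label per sample."""
    n_samples, n_labels = len(X), len(y)
    if n_samples != n_labels:
        raise ValueError(
            f"Number of labels={n_labels} does not match number of samples={n_samples}"
        )
    return n_samples


X = [[25, 17], [80, 64], [22, 18]]  # 3 samples, 2 features each
y = [1, 0, 1]                       # 3 labels
check_same_length(X, y)             # passes silently

y_wrapped = [y]                     # nested list: length 1, not 3
# check_same_length(X, y_wrapped) would raise ValueError
```

The same `len()` comparison works for NumPy arrays, where a target of shape (1, n) instead of (n,) also reports a length of 1.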

@georgesteve
Author

Thank you for the answer!
You are right, and I'm really sorry: I posted that traceback by mistake. This is the one I actually meant to post:

--------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-40-4568def0d1f4> in <module>
     76                       azim=35,
     77                       dist=12,
---> 78                       show={'splits','title'})
     79 
     80 

~/anaconda3/lib/python3.7/site-packages/dtreeviz/trees.py in rtreeviz_bivar_3D(ax, X_train, y_train, max_depth, feature_names, target_name, fontsize, ticks_fontsize, fontname, azim, elev, dist, show, colors, n_colors_in_map)
    246 
    247     for node, bbox in tesselation:
--> 248         plane(node, bbox)
    249 
    250     x, y, z = X_train[:, 0], X_train[:, 1], y_train

~/anaconda3/lib/python3.7/site-packages/dtreeviz/trees.py in plane(node, bbox)
    230         # print(f"{color_map[int(((node.prediction()-y_lim[0])/y_range)*(n_colors_in_map-1))]}")
    231         ax.plot_surface(xx, yy, z, alpha=.85, shade=False,
--> 232                         color=color_map[int(((node.prediction()-y_lim[0])/y_range)*(n_colors_in_map-1))],
    233                         edgecolor=colors['edge'], lw=.3)
    234 

IndexError: list index out of range

@parrt
Owner

parrt commented Aug 26, 2019

Ah. Are there more than 10 classes? It can't handle more than that.

@georgesteve
Author

No, that's the thing. At first I thought the error was caused by having too many classes, but in this case there are only two.

@parrt
Owner

parrt commented Aug 26, 2019

Weird. Any chance you could send me a small data set that reproduces this?

@georgesteve
Author

This is my code. I was only training:

import matplotlib.pyplot as plt
import pandas as pd
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older matplotlib
from sklearn import preprocessing
from dtreeviz.trees import rtreeviz_bivar_3D

data = pd.DataFrame({'Edad': [ 17, 64, 18, 20, 38, 49, 55, 25, 29,  31, 33, 52, 65, 27, 39,  54, 30, 28, 27, 18, 47],
                     'Sexo': ['H', 'M', 'M', 'M', 'M', 'H', 'H', 'M', 'H', 'H', 'H', 'M', 'M', 'M', 'M',  'H', 'H', 'H', 'M', 'H', 'M'],
                     'Salario': [25, 80, 22, 36, 37, 59, 74, 70, 33, 102, 88, 35, 70, 51, 38,  94, 50, 35, 74, 32, 45],
                     'Pagador': [1,   0,  1,  0,  1,  0,  0,  1,  1,   0,  1,  0,  0,  1,  1,   0,  1,  0,  0,  1,  0]})

# Encode Sexo ('H'/'M') as integers, then sort the rows by age
code = preprocessing.LabelEncoder()
code.fit(["H", "M"])
data['Sexo'] = code.transform(data['Sexo'])
data = data.sort_values('Edad')

# Column positions 2 and 0 of the frame are Salario and Edad
features = [2, 0]
X = data.values[:, features]
print(X)

# Binary target (0/1)
y = data['Pagador'].values
print(y)

figsize = (42, 16)
fig = plt.figure(figsize=figsize)
ax = fig.add_subplot(111, projection='3d')
t = rtreeviz_bivar_3D(ax,
                      X, y,
                      max_depth=None,
                      feature_names=['Edad', 'Sexo'],
                      target_name='Pagador',
                      fontsize=11,
                      elev=30,
                      azim=35,
                      dist=12,
                      show={'splits', 'title'})

plt.show()

I tried other data sets from the internet and everything works fine, for example the one in the image, so I am a little confused.
image

Thank you for your support.

@parrt
Owner

parrt commented Dec 4, 2019

Heh, don't you want a classifier here rather than a regressor? Pagador looks boolean.
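To illustrate the difference for a 0/1 target: a regression tree leaf predicts the mean of the targets that fall in it, while a classification tree leaf predicts the majority class. The sketch below uses plain Python on made-up leaf contents (`leaf_targets` is hypothetical data, not taken from the data set above) rather than an actual fitted tree.

```python
from collections import Counter
from statistics import mean

# Hypothetical 0/1 targets of the training rows that ended up in one leaf
leaf_targets = [1, 0, 1, 1]

# Regressor-style leaf value: the average, which can be any fraction in [0, 1]
regression_prediction = mean(leaf_targets)

# Classifier-style leaf value: the most frequent class label
classification_prediction = Counter(leaf_targets).most_common(1)[0][0]

print(regression_prediction, classification_prediction)
```

With a boolean target, the regressor's fractional outputs are rarely what is wanted, which is why a classifier view fits better here.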

@parrt
Owner

parrt commented Dec 4, 2019

I think you want:

ctreeviz_bivar(ax, X, y, max_depth=5,
               feature_names=['Edad', 'Sexo'],
               class_names=['yes', 'no'],
               target_name='Pagador',
               show={'splits', 'title'},
               colors={'scatter_edge': 'black'}
               )

Regardless, I fixed it so that y in {0,1} no longer causes a crash for regression partitioning :)
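For reference, the expression in the traceback scales a node's prediction into an index into the color map. The sketch below shows one generic way to keep such an index in bounds. It is not the change made in 664e941, whose contents are not shown in this thread. `color_index` and its parameter names are hypothetical and only mirror the names in the traceback.

```python
def color_index(prediction, y_min, y_max, n_colors):
    """Map a prediction to a color-map index in [0, n_colors - 1]."""
    y_range = y_max - y_min
    if y_range == 0:
        # All targets are identical; avoid dividing by zero
        return 0
    fraction = (prediction - y_min) / y_range
    index = int(fraction * (n_colors - 1))
    # Clamp in case the prediction falls outside [y_min, y_max]
    return max(0, min(n_colors - 1, index))


# Usage with y in {0, 1} and a 100-entry color map
lowest = color_index(0.0, 0, 1, 100)
highest = color_index(1.0, 0, 1, 100)
```

Clamping hides out-of-range predictions rather than reporting them, so it suits plotting code more than code where a bad value should fail loudly.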

@parrt added the bug (Something isn't working) label Dec 4, 2019
@parrt changed the title from "list index out of range" to "list index out of range for regressor with y in {0,1}" Dec 4, 2019
@parrt added this to the 0.7.1 milestone Dec 4, 2019
@parrt closed this as completed in 664e941 Dec 4, 2019

2 participants