What would it look like if attributes were added in order of their importance for prediction, instead of the order they appear in the CSV dataset?
The GradientBoostingClassifier exposes an attribute, `model.feature_importances_`, that gives a score for each feature; the higher the score, the more important the feature is for prediction.
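A minimal sketch of ranking features this way (the feature names and scores below are placeholders for illustration; in the real workflow the scores come from a fitted GradientBoostingClassifier's `feature_importances_`):

```python
# Rank features by importance score, highest first.
# These names/scores are illustrative stand-ins, not the real dataset's values.
feature_names = ["highway", "building", "amenity", "name", "waterway"]
importances = [0.05, 0.30, 0.10, 0.40, 0.15]

# Pair each feature with its score and sort in descending order of score.
ranked = sorted(zip(feature_names, importances), key=lambda p: p[1], reverse=True)
top_features = [name for name, _score in ranked]
print(top_features)  # -> ['name', 'building', 'waterway', 'amenity', 'highway']
```

With a real fitted model, `importances` would simply be `model.feature_importances_` and `feature_names` the corresponding columns of the training data.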
Table with the 10 attributes that have the highest importance scores
Now, using the same workflow as above (^), we add one attribute at a time, but starting with the most important attributes, to get the graph below.
Because the best attributes come first, the metrics reach their maximum value very quickly. This is what we expect to happen.
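The incremental workflow above can be sketched as follows. Here `train_and_score` is a hypothetical stand-in for fitting the model on a feature subset and returning a metric (e.g. F1); the ranked feature list is illustrative:

```python
# Add one attribute at a time, in descending importance order,
# and record a metric after each addition.
ranked_features = ["name", "building", "waterway", "amenity", "highway"]

def train_and_score(features):
    # Placeholder: in the real workflow this would fit a
    # GradientBoostingClassifier on X[features] and score it
    # on a held-out set. Here it just returns a dummy value.
    return min(1.0, 0.5 + 0.2 * len(features))

scores = []
for i in range(1, len(ranked_features) + 1):
    subset = ranked_features[:i]          # most important i features
    scores.append((i, train_and_score(subset)))
```

Plotting `scores` (number of attributes vs. metric) produces the kind of curve shown in the graph: a quick rise followed by a plateau once the strongest attributes are in.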
We still get unusually large dips, even well past 50+ attributes.
The dips are now for the following attributes:
feature_name_translations_count_old
place
MAPS.ME
feature_area
sport_old
office
power
railway_old
barrier_old
railway
historic
changeset_comment_naughty_words_count
public_transport_old
route
There is no attribute common between this list (^) and the earlier list (^^).
Similar to the work on training size, we have questions about the effect of the number of attributes on the model:
Workflow
Notes
cc: @anandthakker @batpad @geohacker