-
Notifications
You must be signed in to change notification settings - Fork 0
Kagge shelter animal competition
License
fanghaolei/K_sa
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
--- title: "Shelter Animal Outcomes" author: "Mike Fang" date: "March 26, 2016" output: html_document --- This repo is created for the Kaggle competition: [Shelter Animal Outcomes](https://www.kaggle.com/c/shelter-animal-outcomes) This data set has some causality issues which really frustrated me, and I probably won't update this repo anymore. The time scales are outtake features which make them overly powerful in many models, but they don't resolve any real-world predictive problems. Because if you know what is going to happen to an animal at a given time, say `transfer' at 9 am, other features won't matter. This data would be much better if they could provide intake time and some additional features of the animals themselves, such as temperament and size. Here are the plots for time features: By hours in a day ![daycylce](https://raw.githubusercontent.com/fhlgood/K_sa/master/daycycle.png) By all dates ![dates](https://raw.githubusercontent.com/fhlgood/K_sa/master/byDate.png) My cat 司令 (commander): ![司令](https://raw.githubusercontent.com/fhlgood/K_sa/master/cat.PNG) It seemed that stacking random forest and xgboost predictions can greatly improve performance. Current score: .69689. Updated Modeltrain.R for ensembling. Updated Clean.R: I think I calculated name and color frequency used combined train \& test data, which is a violation of the rule. Now updated using train data only. Updated clean_original.R: I actually ended up using this original feature set.
About
Kagge shelter animal competition
Resources
License
Stars
Watchers
Forks
Releases
No releases published