Skip to content

fanghaolei/K_sa

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

18 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

---
title: "Shelter Animal Outcomes"
author: "Mike Fang"
date: "March 26, 2016"
output: html_document
---
This repo is created for the Kaggle competition: [Shelter Animal Outcomes](https://www.kaggle.com/c/shelter-animal-outcomes)

This data set has some causality issues which really frustrated me, and I probably won't update this repo anymore. The time scales are outtake features which make  them overly powerful in many models, but they don't resolve any real-world predictive problems.

Because if you know what is going to happen to an animal at a given time, say `transfer' at 9 am, other features won't matter. This data would be much better if they could provide intake time and some additional features of the animals themselves, such as temperament and size. 

Here are the plots for time features:

By hours in a day
![daycylce](https://raw.githubusercontent.com/fhlgood/K_sa/master/daycycle.png)

By all dates
![dates](https://raw.githubusercontent.com/fhlgood/K_sa/master/byDate.png)


My cat 司令 (commander):
![司令](https://raw.githubusercontent.com/fhlgood/K_sa/master/cat.PNG)

It seemed that stacking random forest and xgboost predictions can greatly improve performance.
Current score: .69689.

Updated Modeltrain.R for ensembling.

Updated Clean.R: I think I calculated name and color frequency used combined train \& test data, which is a violation of the rule. Now updated using train data only. 

Updated clean_original.R: I actually ended up using this original feature set. 

About

Kagge shelter animal competition

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Languages