Kaggle Yummly Challenge

Kaggle is a platform for predictive modelling and analytics competitions on which companies and researchers post their data and statisticians and data miners from all over the world compete to produce the best models. As of May 2016, Kaggle had over 536,000 registered users, or Kagglers. The community spans 194 countries. It is the largest and most diverse data community in the world (Wikipedia).

One of the most interesting data sets I found on Kaggle was within the What’s Cooking challenge. The competition was hosted by Yummly, a mobile app and website that provides recipe recommendations; the Yummly app was named “Best of 2014” in Apple’s App Store. The task was to predict the category of a dish’s cuisine given a list of its ingredients. The training data included a recipe id, the type of cuisine, and a list of ingredients for each recipe. There were 20 types of cuisine in the data set.
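
For context, each record in the competition's training file is a JSON object along these lines (the values here are illustrative, not taken from my script):

```json
{
  "id": 10259,
  "cuisine": "greek",
  "ingredients": ["romaine lettuce", "black olives", "feta cheese crumbles"]
}
```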

I was able to get a prediction score of about 80 percent with a fairly simple solution. First, I removed all rare ingredients from the data set. I did not do much feature engineering, apart from creating one simple variable that counts the total number of ingredients per recipe. I also tried some text mining in the form of word stemming, which reduces a recipe’s ingredients to their root words (e.g. tomatoes becomes tomato). That approach did not help much in the end, so I removed it from my script. I saved my training data in a sparse matrix and trained a multiclass classification model using softmax with the xgboost package; a minimal sketch of this pipeline is shown below.
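
The following R sketch illustrates the pipeline described above, assuming the competition's train.json file is in the working directory. The rarity cutoff (3 recipes) and the xgboost hyperparameters are illustrative assumptions on my part, not necessarily the exact values used in xgboost.R.

```r
library(jsonlite)
library(Matrix)
library(xgboost)

# train.json parses into a data frame: id, cuisine, and a
# list-column of ingredient vectors per recipe
train <- fromJSON("train.json")

# Drop rare ingredients: keep only those appearing in 3+ recipes
# (the cutoff of 3 is an assumed, illustrative value)
counts <- table(unlist(train$ingredients))
keep <- names(counts)[counts >= 3]
train$ingredients <- lapply(train$ingredients, function(x) x[x %in% keep])

# The one engineered feature: number of ingredients per recipe
n_ingredients <- lengths(train$ingredients)

# Build a sparse recipe-by-ingredient indicator matrix
ing_levels <- sort(unique(unlist(train$ingredients)))
i <- rep(seq_along(train$ingredients), lengths(train$ingredients))
j <- match(unlist(train$ingredients), ing_levels)
X <- sparseMatrix(i = i, j = j, x = 1,
                  dims = c(nrow(train), length(ing_levels)))
X <- cbind(X, n_ingredients)  # append the count feature

# xgboost expects 0-based integer class labels for multi:softmax
y <- as.integer(factor(train$cuisine)) - 1

# (The stemming experiment mentioned above could be reproduced with
#  SnowballC::wordStem, which maps "tomatoes" to "tomato"; it is
#  omitted here since it did not help.)

model <- xgboost(data = X, label = y,
                 objective = "multi:softmax",
                 num_class = length(unique(y)),
                 nrounds = 200, eta = 0.2, max_depth = 6,
                 verbose = 0)
```

Test-set predictions then come from `predict(model, X_test)`, where `X_test` is built with the same ingredient columns as `X`, mapping the returned integer classes back through the same factor levels of `cuisine`.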
