# Challenge: what model can answer this question?
You now have a fairly substantial starting toolbox of supervised learning methods that you can use to tackle a host of exciting problems. To make sure all of these ideas are organized in your mind, please go through the list of problems below. For each, identify which supervised learning method(s) would be best for addressing that particular problem. Explain your reasoning and discuss your answers with your mentor.

1. **Predict the running times of prospective Olympic sprinters using data from the last 20 Olympics.**

I would probably use a simple Linear Regression, since the target (running time) is continuous. If the target were binary, we could use Logistic Regression to estimate the probabilities of the two outcomes.

2. **You have more features (columns) than rows in your dataset.**

I would probably choose Lasso Regression. Its L1 penalty can shrink the coefficients of weak features to exactly zero (Ridge Regression only shrinks them toward zero), so the model ends up using only the features that survive at the chosen `alpha`, which also makes it smaller and faster. This matters when there are more features than rows, because ordinary least squares then has no unique solution. Or I would use a Random Forest Classifier but limit the features to the most valuable according to `.feature_importances_`.
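As a rough illustration of the Lasso behavior described here, the sketch below fits `Lasso` on synthetic data with more columns than rows and counts how many coefficients remain non-zero. The data, the three "true" features, and the noise level are all invented for the example; `alpha=0.35` is simply the value from the model list, not a tuned choice.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): 50 rows, 200 features,
# and only the first 3 features actually drive the target.
n_rows, n_features = 50, 200
X = rng.normal(size=(n_rows, n_features))
true_coef = np.zeros(n_features)
true_coef[:3] = [5.0, -4.0, 3.0]
y = X @ true_coef + rng.normal(scale=0.1, size=n_rows)

# alpha sets the strength of the L1 penalty; larger values zero out more coefficients.
lasso = Lasso(alpha=0.35)
lasso.fit(X, y)

# Indices of the features whose coefficients were not shrunk to exactly zero.
kept = np.flatnonzero(lasso.coef_)
print(f"{kept.size} of {n_features} coefficients are non-zero")
```

In practice `alpha` would usually be chosen by cross-validation rather than fixed up front.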

3. **Identify the most important characteristic predicting likelihood of being jailed before age 20.**

I would either use Decision Trees or a Random Forest Classifier and inspect `feature_importances_` to find the features that contribute most to predicting whether someone is jailed, or normalize the data, fit a Logistic Regression, and compare the magnitudes of the coefficients.
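Both approaches can be sketched side by side. The example below is only an illustration on made-up data: the feature names are placeholders, the features deliberately sit on very different scales, and only the first one influences the label. It reads the top feature from a random forest's `feature_importances_` and from the absolute coefficients of a Logistic Regression fitted on standardized data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical data: 500 rows, 4 placeholder features on very different scales.
# Only feature_a influences the (binary) outcome.
feature_names = ["feature_a", "feature_b", "feature_c", "feature_d"]
X = rng.normal(size=(500, 4)) * np.array([1.0, 10.0, 100.0, 0.1])
y = (X[:, 0] + rng.normal(scale=0.5, size=500) > 0).astype(int)

# Option 1: impurity-based importances from a random forest (scale-independent).
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
top_by_forest = feature_names[int(np.argmax(forest.feature_importances_))]

# Option 2: standardize first so coefficient magnitudes are comparable across features.
X_scaled = StandardScaler().fit_transform(X)
logit = LogisticRegression().fit(X_scaled, y)
top_by_logit = feature_names[int(np.argmax(np.abs(logit.coef_[0])))]

print("Random forest:", top_by_forest)
print("Logistic regression:", top_by_logit)
```

Without the scaling step, the logistic coefficients would reflect each feature's units as much as its importance, which is why the normalization comes first.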

4. **Implement a filter to “highlight” emails that might be important to the recipient**

Either a Decision Tree or Random Forest Classifier.

5. **You have 1000+ features.**

I would probably choose Lasso Regression. Its L1 penalty can shrink the coefficients of weak features to exactly zero (Ridge Regression only shrinks them toward zero), so the model ends up using only the features that survive at the chosen `alpha`, which makes it smaller and faster. Or I would use a Random Forest Classifier but limit the features to the most valuable according to `.feature_importances_`.
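One way to do the "limit the features to the most valuable" step is scikit-learn's `SelectFromModel`, sketched below on random synthetic data. The dataset shape, the label rule, and the choice of keeping 20 features are assumptions made for the example, not recommendations.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

rng = np.random.default_rng(0)

# Synthetic stand-in for a wide dataset: 300 rows, 1000 features.
X = rng.normal(size=(300, 1000))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Rank features by the forest's feature_importances_ and keep the top 20.
# threshold=-np.inf disables the importance cutoff so only max_features applies.
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=100, random_state=0),
    threshold=-np.inf,
    max_features=20,
)
X_reduced = selector.fit_transform(X, y)

print(X.shape, "->", X_reduced.shape)
```

`selector.get_support(indices=True)` returns the column indices that were kept, which is handy for mapping back to feature names.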

6. **Predict whether someone who adds items to their cart on a website will purchase the items.**

Logistic Regression, to estimate the probability that the cart ends in a purchase.
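A minimal sketch of this idea follows. The two features (items in cart, minutes on site) and the rule that generates purchases are invented for illustration; the point is only that `predict_proba` returns a purchase probability per session rather than a hard yes/no.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical sessions with columns [items_in_cart, minutes_on_site].
X = np.column_stack([
    rng.integers(1, 10, size=400),
    rng.uniform(0, 30, size=400),
])

# Invented ground truth: more time on site makes a purchase more likely.
true_logits = 0.2 * X[:, 1] - 3.0
y = (rng.uniform(size=400) < 1 / (1 + np.exp(-true_logits))).astype(int)

model = LogisticRegression().fit(X, y)

# Column 1 of predict_proba is the probability of class 1 (purchase).
new_sessions = np.array([[3, 2.0], [3, 25.0]])
purchase_prob = model.predict_proba(new_sessions)[:, 1]
print(purchase_prob)
```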

7. **Your dataset dimensions are 982400 x 500**

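I would use a simpler, faster model (like Linear Regression) and try to narrow the feature set down to the most important features. KNN Regression is also simple and quick to fit, but every prediction has to be compared against close to a million stored rows, so it would likely be slow to predict at this size.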

8. **Identify faces in an image.**

SVM as Classifier.

9. **Predict which of three flavors of ice cream will be most popular with boys vs girls.**

I would use a KNN Classifier or SVC, because either would pick up clusters of preferences and distinguish between the two groups.

Types of models:
* Regression
    * Simple Linear Regression (OLS): `linear_model.LinearRegression()`
    * KNN Regression: `neighbors.KNeighborsRegressor(n_neighbors=10)`
    * Logistic Regression: `linear_model.LogisticRegression(C=1e9)` (*despite the name, a classifier: it predicts the probability of each of two outcomes*)
    * Ridge Regression: `linear_model.Ridge(alpha=10, fit_intercept=False)`
    * Lasso Regression: `linear_model.Lasso(alpha=.35)`
    * Support Vector Regressor: `svm.SVR()`

* Classification
    * KNN Classifier: `neighbors.KNeighborsClassifier(n_neighbors=5)`
    * Decision Tree: `tree.DecisionTreeClassifier(criterion='entropy', max_features=1, max_depth=4)`
    * Random Forest: `ensemble.RandomForestClassifier()`
    * Support Vector Classifier: `svm.SVC(kernel='linear')`
    * Gradient Boosting Classifier: `ensemble.GradientBoostingClassifier(**params)`
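
Because all of these estimators share the same `fit`/`predict` interface, several candidates can be compared on one dataset with a short loop. The sketch below uses a synthetic dataset from `make_classification` and 5-fold cross-validation; the dataset parameters and the particular models chosen are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification problem (illustrative only).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=4, random_state=0
)

candidates = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree": DecisionTreeClassifier(
        criterion="entropy", max_depth=4, random_state=0
    ),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Linear SVC": SVC(kernel="linear"),
}

# Mean accuracy over 5 cross-validation folds for each candidate.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in candidates.items()
}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```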