### Lab: KFold, Regularization, & Pipelines

Welcome!  This lab is going to introduce us to some very important aspects of data processing and model building.  

Specifically, it's going to go over the following:

 - **KFold Cross Validation:** Instead of relying on a single validation split, KFold rotates the validation set through several different slices of your training data.  This gives you a better idea of how much your score depends on which rows happened to be held out.
 - **Regularization:** This is an evergreen technique for dealing with models that are overfit (i.e., they score noticeably higher on training data than on test data).  By penalizing large coefficients, regularized linear models are often much better prepared to handle messy data & outliers.
 - **Pipelines:** (Time permitting!) This is an underappreciated aspect of the Scikit-Learn API that allows you to chain together multiple data processing steps, making it much easier to test different models and work seamlessly between your training & test sets.

**Note:** This lab builds off of the one performed in the last class.  As such, it might be easier just to keep working in your previous lab to answer these questions.  It assumes you already have your data processed from the Iowa housing lab.  

The questions are listed here just to make the separation of concerns easier.

### Question 1: How Does Your Validation Score Differ Using KFold Cross Validation?

Take a look at the validation score you got from your previous exercise.  

This time, run your model through KFold cross validation using `cross_val_score`.  Is your average validation score across the folds appreciably different from your previous one?  What were your highest and lowest values?

What if you changed your number of folds?  Try using 5, 10, & 25 folds.
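As a rough sketch of the mechanics (not the lab answer), the cell below runs `cross_val_score` with 5, 10, and 25 folds.  It uses a synthetic dataset from `make_regression` as a stand-in for your processed housing data, and the names `X`, `y`, and `fold_summary` are assumptions made up for this illustration.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the processed housing data (an assumption for this sketch)
X, y = make_regression(n_samples=500, n_features=10, noise=20.0, random_state=0)

model = LinearRegression()
fold_summary = {}

for n_folds in (5, 10, 25):
    # shuffle=True so the folds do not depend on the row order of the data
    cv = KFold(n_splits=n_folds, shuffle=True, random_state=0)
    # For regressors, cross_val_score returns one R^2 value per fold by default
    scores = cross_val_score(model, X, y, cv=cv)
    fold_summary[n_folds] = (scores.mean(), scores.min(), scores.max())
    print(f"{n_folds:>2} folds: mean={scores.mean():.3f}  "
          f"min={scores.min():.3f}  max={scores.max():.3f}")
```

With more folds, each validation slice gets smaller, so the individual fold scores tend to spread out more even when the mean stays similar.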

In [None]:
# your answer here

### Question 2: Updating Your Model With Ridge & Lasso Regression

Instead of using Linear Regression, import `Ridge` and `Lasso`, and use cross validation to find the ideal value of alpha.  

Some basic tips:

For values of alpha, try this:  `alphas = np.logspace(-4, 4, 9)`

Then write a `for` loop that generically goes like this:

`for value in alphas:`

 1. set the value of alpha to the current value using the `set_params()` method
 2. pass the instance of `Ridge` or `Lasso` into `cross_val_score`
 3. using a tuple, append the average of all results from step 2 to a list, along with the value of alpha
    
When you're finished, you should have a list that has 9 tuples inside it, each one with the average cross validation score as well as the value of alpha associated with it.
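The loop described above might look something like the sketch below.  It uses `Ridge` on a synthetic dataset standing in for the housing data, with 5 folds chosen arbitrarily; `results`, `best_score`, and `best_alpha` are names invented for this illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the processed housing data (an assumption for this sketch)
X, y = make_regression(n_samples=500, n_features=10, noise=20.0, random_state=0)

alphas = np.logspace(-4, 4, 9)  # 0.0001, 0.001, ..., 10000
ridge = Ridge()
results = []

for value in alphas:
    # 1) set alpha on the existing estimator
    ridge.set_params(alpha=value)
    # 2) cross-validate the estimator (5 folds is an arbitrary choice here)
    scores = cross_val_score(ridge, X, y, cv=5)
    # 3) store (average score, alpha) as a tuple
    results.append((scores.mean(), value))

# Tuples compare on their first element first, so max() picks the best average score
best_score, best_alpha = max(results)
print(f"best alpha: {best_alpha}, mean CV score: {best_score:.3f}")
```

Swapping `Ridge` for `Lasso` should only require changing the import and the estimator; `Lasso` may need a larger `max_iter` to converge at small values of alpha.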

In [None]:
# your answer here

**Bonus:** In Scikit-Learn, cross validation is sometimes built into algorithms automatically.  Luckily this is the case with `Ridge` and `Lasso`.  If you're inclined to take a look at the `RidgeCV` and `LassoCV` estimators, you can basically combine what we just did into one step.

 - **RidgeCV:** https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.RidgeCV.html
 - **LassoCV:** https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LassoCV.html
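Here is a minimal sketch of the one-step version, again on synthetic stand-in data.  Both estimators take the candidate `alphas` and a `cv` setting, and expose the winning value as `alpha_` after fitting; the fold count and `max_iter` below are arbitrary choices for this illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV, RidgeCV

# Synthetic stand-in for the processed housing data (an assumption for this sketch)
X, y = make_regression(n_samples=500, n_features=10, noise=20.0, random_state=0)

alphas = np.logspace(-4, 4, 9)

# Each estimator cross-validates every alpha, then refits on all of the data
ridge_cv = RidgeCV(alphas=alphas, cv=5).fit(X, y)
lasso_cv = LassoCV(alphas=alphas, cv=5, max_iter=10000).fit(X, y)

print("RidgeCV best alpha:", ridge_cv.alpha_)
print("LassoCV best alpha:", lasso_cv.alpha_)
```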

### Question 3: Building A Pipeline

Let's try building some pipelines to test out different versions of our models more easily.  

For this one, we are going to start fresh a little bit to get the hang of using our pipelines.  We'll skip ordinal encoding for now.

Create a pipeline with the following:

 - a categorical encoder from the `category_encoders` library, `StandardScaler`, and one of either `Ridge` or `Lasso`, chained in that order
 - make sure to use the `cols` argument with the categorical encoder to include numeric variables that are actually categories if this is necessary
 - use the best value of alpha from the previous question for whichever model you choose
 - fit your pipeline on the entire training set (because we already used cross validation to pick the best value of alpha)
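To show the general shape of an encode, scale, model pipeline, here is a sketch that swaps in Scikit-Learn's own `OneHotEncoder` inside a `ColumnTransformer` in place of the `category_encoders` encoder the lab asks for, so that it runs with Scikit-Learn alone.  The column index list passed to the transformer plays the role that `cols` plays in `category_encoders`.  The toy data, the column meanings, and `alpha=1.0` are all assumptions for illustration; use your own best alpha.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data (hypothetical): two true numeric columns plus one numeric code
# that is really a category
rng = np.random.default_rng(0)
n = 300
area = rng.normal(1500, 300, n)
quality = rng.integers(1, 11, n).astype(float)
subclass = rng.choice([20, 60, 120], n)  # looks numeric, behaves like a label
y = (100 * area + 5000 * quality
     + np.where(subclass == 60, 20000, 0) + rng.normal(0, 5000, n))
X = np.column_stack([area, quality, subclass])

# One-hot encode column 2 only and pass the other columns through untouched.
# sparse_threshold=0 forces dense output so StandardScaler can center it.
encode = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), [2])],
    remainder="passthrough",
    sparse_threshold=0,
)

pipe = Pipeline([
    ("encode", encode),
    ("scale", StandardScaler()),
    ("model", Ridge(alpha=1.0)),  # placeholder; substitute your best alpha
])

pipe.fit(X, y)
train_r2 = pipe.score(X, y)
print(f"training R^2: {train_r2:.3f}")
```

Because every step lives inside the pipeline, calling `pipe.predict` or `pipe.score` on a test set applies the same fitted encoding and scaling automatically.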

In [None]:
# your answer here