# 1. Classification 

2. What is machine learning?

    - process whereby computers learn to make decisions from data without being explicitly programmed.
    
3. Examples of machine learning

    - learning to predict whether an email is spam or not spam given its content and sender. 
    - learning to cluster books into different categories based on the words they contain, then assigning any new book to one of the existing clusters.
    
4. Unsupervised learning

    - process of uncovering hidden patterns and structures from unlabeled data. 
    - Example: 
        1. a business may wish to group its customers into distinct categories based on their purchasing behavior without knowing in advance what these categories are.
        2. This is known as clustering, one branch of unsupervised learning.
  
      ![image.png](attachment:image.png)
  
5. Supervised learning

    - values to be predicted are already known, and a model is built with the aim of accurately predicting values of previously unseen data.
    
    - Supervised learning uses features to predict the value of a target variable, such as predicting a basketball player's position based on their points per game. 
    
6. Types of supervised learning

        1. Classification
            
            - predict the label, or category, of an observation.
            - For example, we can predict whether a bank transaction is fraudulent or not. As there are two possible outcomes, a fraudulent or a non-fraudulent transaction, this is known as binary classification. 
        
        2. Regression 
        
            - predict continuous values
            - For example, a model can use features such as number of bedrooms, and the size of a property, to predict the target variable, price of the property.
            
7. Naming conventions

        Feature = predictor variable or independent variable
        Target = dependent variable or response variable.
        
8. Before you use supervised learning

        Requirements
        
            1. data must not have missing values
            2. numeric format
            3. stored as pandas DataFrames or Series, or NumPy arrays.
            
        Perform EDA first
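
The first two requirements can be checked with pandas before any modeling. Below is a minimal sketch on a small made-up DataFrame; the column names `points_per_game`, `team`, and `position` are hypothetical. Dropping incomplete rows and one-hot encoding text columns is only one simple way to meet the requirements, not the only one.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one missing value and one non-numeric column
df = pd.DataFrame({
    "points_per_game": [12.5, 8.0, np.nan, 20.1],
    "team": ["A", "B", "A", "C"],
    "position": [0, 1, 1, 0],
})

# Requirement 1: count missing values per column
missing = df.isna().sum()
print(missing)

# Requirement 2: find columns that are not numeric
non_numeric = df.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)

# One simple remedy: drop incomplete rows, one-hot encode text columns
clean = pd.get_dummies(df.dropna(), columns=non_numeric, dtype=int)

# Requirement 3: NumPy arrays for scikit-learn
X = clean.drop("position", axis=1).values
y = clean["position"].values
print(X.shape, y.shape)
```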
            
9. scikit-learn syntax
 
           from sklearn.module import Model
           model = Model()
           model.fit(X,y)
           predictions = model.predict(X_new)
           print(predictions)
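
The template above uses placeholder names (`module`, `Model`). Here is a concrete, runnable instance of the same instantiate, fit, and predict pattern, sketched with `KNeighborsClassifier` and a tiny made-up dataset.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Made-up labeled data: 2 features per observation, binary target
X = np.array([[1.0, 1.0], [1.5, 2.0], [5.0, 5.0], [6.0, 5.5]])
y = np.array([0, 0, 1, 1])

model = KNeighborsClassifier(n_neighbors=1)  # instantiate (note the parentheses)
model.fit(X, y)                              # learn from the labeled data

X_new = np.array([[1.2, 1.1], [5.5, 5.2]])   # unseen observations
predictions = model.predict(X_new)
print(predictions)
```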
           

### The supervised learning workflow

![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)

## 1.1 The classification challenge

1. The classification challenge

    - how we can build a classification model, or classifier, to predict the labels of unseen data.
    
2. Classifying labels of unseen data
     
    - Step 1: Build a model.
    - Step 2: Model learns from the labeled data we pass to it 
    - Step 3: Pass unlabeled data to the model as input    
    - Step 4: Model predicts the labels for this unseen data. 
    
    As the classifier learns from the labeled data, we call this the training data.
    
3. k-Nearest Neighbors (KNN)

       Predict the label of a datapoint by
           
       - Looking at the k closest labeled data points
       - Taking a majority vote
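
The two steps above (find the k closest labeled points, then take a vote) fit in a few lines of NumPy. This is a from-scratch sketch for intuition only, assuming Euclidean distance; `knn_predict` is a hypothetical helper, not part of scikit-learn.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x, k):
    """Predict the label of one point x by majority vote among its k nearest neighbors."""
    # Euclidean distance from x to every labeled point
    distances = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k closest labeled points
    nearest = np.argsort(distances)[:k]
    # Majority vote; Counter breaks ties in favor of the label encountered first
    return Counter(y_train[nearest]).most_common(1)[0][0]


# Made-up labeled data: three "red" points near the origin, two "blue" points far away
X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6]])
y_train = np.array(["red", "red", "red", "blue", "blue"])

print(knn_predict(X_train, y_train, np.array([0.5, 0.5]), k=3))
print(knn_predict(X_train, y_train, np.array([5.0, 5.5]), k=3))
```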
       
4. k-Nearest Neighbors

![image.png](attachment:image.png)

   Using this scatter plot as an example, how do we classify the black observation?
   
       - If k equals three, we would classify it as red. This is because two of the three closest observations are red.
       
       - If k equals five, we would instead classify it as blue, because three of the five closest observations are blue.
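
To see the effect of k concretely, here is a made-up arrangement (coordinates chosen by hand, not the data from the figure) in which the three closest points are mostly red but the five closest are mostly blue.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hand-picked points; distance from the origin in the comments
X = np.array([
    [1.0, 0.0],    # red,  distance 1.0
    [-1.05, 0.0],  # blue, distance 1.05
    [0.0, 1.1],    # red,  distance 1.1
    [0.0, -1.5],   # blue, distance 1.5
    [1.6, 0.0],    # blue, distance 1.6
    [3.0, 3.0],    # red,  distance ~4.24
])
y = np.array(["red", "blue", "red", "blue", "blue", "red"])
black = np.array([[0.0, 0.0]])  # the observation to classify

for k in (3, 5):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    print(k, knn.predict(black)[0])
```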
       
7. KNN Intuition

![image-2.png](attachment:image-2.png)

        To build intuition for KNN, let's look at this scatter plot displaying total evening charge against total day charge for customers of a telecom company. The observations are colored in blue for customers who have churned, and red for those who have not churned.
        
![image-3.png](attachment:image-3.png)

        Here we have visualized the results of a KNN algorithm where the number of neighbors is set to 15. 
        
        KNN creates a decision boundary to predict if customers will churn. Any customers in the area with a gray background are predicted to churn, and those in the area with a red background are predicted to not churn. This boundary would be used to make predictions on unseen data.
 
9. Using scikit-learn to fit a classifier

    ![image-4.png](attachment:image-4.png)

    1.  import `KNeighborsClassifier` from `sklearn.neighbors`.
    2.  split our data into X, a 2D array of our features, and y, a 1D array of the target values - in this case, churn status.
    3. scikit-learn requires that the features are in an array where each column is a feature and each row a different observation.
    4. Similarly, the target needs to be a single column with the same number of observations as the feature data. 

    Printing the shape of X and y, we see there are 3333 observations of two features, and 3333 observations of the target variable.
    
    ![image-5.png](attachment:image-5.png)
    
    - We then instantiate our KNeighborsClassifier, setting n_neighbors equal to 15, and assign it to the variable knn.
    
    - Then we can fit this classifier to our labeled data by applying the classifier's `.fit()` method and passing two arguments: the feature values, X, and the target values, y.
    
10. Predicting on unlabeled data

    ![image-6.png](attachment:image-6.png)
    
    - Here we have a set of new observations, X_new. 
    - Checking the shape of X_new, we see it has three rows and two columns, that is, three observations and two features. 
    - Printing the predictions returns a binary value for each observation or row in X_new. It predicts 1, which corresponds to 'churn', for the first observation, and 0, which corresponds to 'no churn', for the second and third observations.
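
The fit-then-predict steps, and the shape requirements for X, y, and X_new, can be sketched end to end. The eight training points and three new points below are made up to stand in for the churn data (1 = churn, 0 = no churn).

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Made-up stand-in for the churn data: each row is an observation, each column a feature
X = np.array([[1, 1], [1, 2], [2, 1], [2, 2],
              [8, 8], [8, 9], [9, 8], [9, 9]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # 1 = churn, 0 = no churn

print(X.shape, y.shape)  # 2-D features, 1-D target, same number of rows

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)

X_new = np.array([[8.5, 8.5], [1.5, 1.5], [2, 3]])
print(X_new.shape)       # three observations, two features
y_pred = knn.predict(X_new)
print(y_pred)
```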

### 1.1.1 k-Nearest Neighbors: Fit

In this exercise, you will build your first classification model using the churn_df dataset, which has been preloaded for the remainder of the chapter.

The features to use will be "account_length" and "customer_service_calls". The target, "churn", needs to be a single column with the same number of observations as the feature data.

You will convert the features and the target variable into NumPy arrays, create an instance of a KNN classifier, and then fit it to the data.

![image.png](attachment:image.png)

''' 
# Import KNeighborsClassifier
from sklearn.neighbors import KNeighborsClassifier 

# Create arrays for the features and the target variable
y = churn_df["churn"].values
X = churn_df[["account_length", "customer_service_calls"]].values

# Create a KNN classifier with 6 neighbors
knn = KNeighborsClassifier(n_neighbors = 6)

# Fit the classifier to the data
knn.fit(X, y)


'''


### 1.1.2 k-Nearest Neighbors: Predict

![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)

'''
# Predict the labels for X_new
y_pred = knn.predict(X_new)

# Print the predictions for X_new
print("Predictions: {}".format(y_pred)) 

'''


The model predicts that the first and third customers in X_new will not churn. But how do we know how accurate these predictions are? 

## 1.3 Measuring model performance

1. Measuring model performance

    - In classification, accuracy is a commonly-used metric. 
    - Accuracy = correct predictions / total number of observations.
    
    How do we measure accuracy?
    
        - We could compute accuracy on the data used to fit the classifier. However, as this data was used to train the model, performance will not be indicative of how well it can generalize to unseen data, which is what we are interested in!
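
Accuracy is the fraction of predictions that match the true labels. The sketch below computes it by hand and with `sklearn.metrics.accuracy_score` on made-up labels, then illustrates why training-set accuracy is misleading: a 1-nearest-neighbor model scores perfectly on its own training data even when the labels are random noise.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier

# Made-up true labels and predictions: 8 of the 10 match
y_true = np.array([1, 0, 0, 1, 0, 0, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 0, 0, 0, 1, 1, 0, 0])

manual_accuracy = np.mean(y_true == y_pred)  # correct predictions / total observations
print(manual_accuracy, accuracy_score(y_true, y_pred))

# Random features and random labels: there is no real pattern to learn
rng = np.random.default_rng(0)
X_noise = rng.normal(size=(100, 2))
y_noise = rng.integers(0, 2, size=100)

knn = KNeighborsClassifier(n_neighbors=1).fit(X_noise, y_noise)
train_accuracy = knn.score(X_noise, y_noise)  # each training point's nearest neighbor is itself
print(train_accuracy)
```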
        
2. Computing accuracy

![image.png](attachment:image.png)

    - It is common to split the data into a training set and a test set.
    
3. Train/test split

    from sklearn.model_selection import train_test_split 
    
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=21, stratify=y)
    
        To do this, we import `train_test_split` from `sklearn.model_selection` and call it, passing our features and targets.
        
        We commonly use 20-30% of our data as the test set. By setting the `test_size` argument to 0.3, we use 30% here.
        
        The `random_state` argument sets a seed for the random number generator that splits the data. Using the same number when repeating this step allows us to reproduce the exact split and our downstream results.
        
       `stratify=y`:
       
      - It is best practice to ensure our split reflects the proportion of labels in our data.
      - So if churn occurs in 10% of observations, we want 10% of labels in our training and test sets to represent churn. 
      - We achieve this by setting stratify equal to y. train_test_split returns four arrays: the training data, the test data, the training labels, and the test labels. We unpack these into X_train, X_test, y_train, and y_test, respectively. 
      
       We then instantiate a KNN model and fit it to the training data using the dot-fit method. 
      
            from sklearn.model_selection import train_test_split
            from sklearn.neighbors import KNeighborsClassifier
            X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=21, stratify=y)
            knn = KNeighborsClassifier(n_neighbors=6)
            knn.fit(X_train, y_train)
            
        To check the accuracy, we use the `.score()` method, passing `X_test` and `y_test`. The accuracy of our model is 88%, which is low given our labels have a 9 to 1 ratio: a model that always predicted the majority class would already score about 90%.
       
        print(knn.score(X_test, y_test))
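
The effect of `stratify=y` can be checked directly. This sketch uses made-up labels with the same 9-to-1 imbalance described above and a placeholder feature; it also computes the accuracy of a trivial "always predict no churn" baseline, which is the number a real model needs to beat.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up imbalanced labels: 10 churners (1) and 90 non-churners (0)
y = np.array([1] * 10 + [0] * 90)
X = np.arange(100).reshape(-1, 1)  # placeholder feature, only here so there is something to split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=21, stratify=y)

# With stratify=y, both splits keep the 10% churn rate
print(y_train.mean(), y_test.mean())

# Accuracy of always predicting "no churn" on the test set
baseline_accuracy = np.mean(y_test == 0)
print(baseline_accuracy)
```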

8. Model complexity

![image-3.png](attachment:image-3.png)

        - In the image shown, as k increases, the decision boundary is less affected by individual observations, reflecting a simpler model. 
        
        - Simpler models are less able to detect relationships in the dataset, which is known as underfitting.
            
        - In contrast, complex models can be sensitive to noise in the training data, rather than reflecting general trends. This is known as overfitting.
        
9. Model complexity and over/underfitting

![image-4.png](attachment:image-4.png)

![image-5.png](attachment:image-5.png)

10. Plotting our results

![image-6.png](attachment:image-6.png)
    
    As k increases beyond 15, we see underfitting, where performance plateaus on both the test and training sets, as indicated in this plot.
    
    The peak test accuracy actually occurs at around 13 neighbors.
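
Choosing k from a curve like this can be automated by scoring each candidate value and keeping the best. Below is a sketch on synthetic data from `make_classification`; the churn data is not reproduced here, so the best k it finds will differ from the value above. Selecting k by test-set score lets the test set influence the model, so cross-validation (covered later) is the more careful approach.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data
X, y = make_classification(n_samples=500, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=21, stratify=y)

neighbors = np.arange(1, 26)
test_accuracies = {}
for k in neighbors:
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    test_accuracies[k] = knn.score(X_test, y_test)

# k with the highest test accuracy (ties go to the smallest k)
best_k = max(test_accuracies, key=test_accuracies.get)
print(best_k, test_accuracies[best_k])
```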



''' 

# Import the module

from sklearn.model_selection import train_test_split

X = churn_df.drop("churn", axis=1).values
y = churn_df["churn"].values

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
knn = KNeighborsClassifier(n_neighbors=5)

# Fit the classifier to the training data
knn.fit(X_train, y_train)

# Print the accuracy
print(knn.score(X_test, y_test))

'''


### 1.3.1 Overfitting and underfitting

![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)

'''

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Create neighbors
neighbors = np.arange(1, 13)
train_accuracies = {}
test_accuracies = {}

for neighbor in neighbors:

    # Set up a KNN Classifier
    knn = KNeighborsClassifier(n_neighbors=neighbor)

    # Fit the model
    knn.fit(X_train, y_train)

    # Compute accuracy
    train_accuracies[neighbor] = knn.score(X_train, y_train)
    test_accuracies[neighbor] = knn.score(X_test, y_test)
print(neighbors, '\n', train_accuracies, '\n', test_accuracies)

'''

Notice how training accuracy decreases as the number of neighbors initially gets larger, and vice versa for the testing accuracy? These scores would be much easier to interpret in a line plot, so let's produce a model complexity curve of these results.

### 1.3.2 Visualizing model complexity

![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)

'''

# Import matplotlib.pyplot
import matplotlib.pyplot as plt

# Add a title
plt.title("KNN: Varying Number of Neighbors")

# Plot training accuracies
plt.plot(neighbors, list(train_accuracies.values()), label="Training Accuracy")

# Plot test accuracies
plt.plot(neighbors, list(test_accuracies.values()), label="Testing Accuracy")

plt.legend()
plt.xlabel("Number of Neighbors")
plt.ylabel("Accuracy")

# Display the plot
plt.show()

'''


![image.png](attachment:image.png)

See how training accuracy decreases and test accuracy increases as the number of neighbors gets larger. For the test set, accuracy peaks with 7 neighbors, suggesting it is the optimal value for our model. Now let's explore regression models!

# 2. Regression

2. Predicting blood glucose levels

    import pandas as pd
    diabetes_df = pd.read_csv("diabetes.csv")
    print(diabetes_df.head())
    
    ![image.png](attachment:image.png)

3. Creating feature and target arrays

![image-2.png](attachment:image-2.png)

        - scikit-learn requires features and target values in distinct variables, X and y. 
        - To use all of the features in our dataset, we drop our target, blood glucose levels, and store the values attribute as X. 
        - For y, we take the target column's values attribute. We can print the type for X and y to confirm they are now both NumPy arrays.
        
4. Making predictions from a single feature

![image-3.png](attachment:image-3.png)

    To start, let's try to predict blood glucose levels from a single feature: body mass index. To do this, we slice out the BMI column of X, which is the fourth column, and store it as the variable X_bmi.
   
        X_bmi = X[:,3]
    
      Checking the shape of y and X_bmi, we see that they are both one-dimensional arrays. This is fine for y, but our features must be formatted as a two-dimensional array to be accepted by scikit-learn.  
    
        print(y.shape, X_bmi.shape)
        
    To convert the shape of X_bmi, we apply NumPy's `.reshape()` method, passing `-1` followed by `1`. Printing the shape again shows X_bmi is now the correct shape for our model.
    
        X_bmi = X_bmi.reshape(-1,1)
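
Why the reshape is needed, sketched on a made-up array: slicing a single column gives a 1-D array, `reshape(-1, 1)` turns it into a one-column 2-D array (the `-1` tells NumPy to infer the number of rows), and scikit-learn rejects 1-D feature arrays with a `ValueError`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up feature matrix with 2 columns, and a target
X = np.array([[1.0, 22.5], [2.0, 31.0], [3.0, 27.3]])
y = np.array([90.0, 120.0, 105.0])

X_col = X[:, 1]                  # slicing one column gives a 1-D array
print(X_col.shape)

X_col_2d = X_col.reshape(-1, 1)  # one column, number of rows inferred
print(X_col_2d.shape)

# Passing the 1-D version to scikit-learn fails
try:
    LinearRegression().fit(X_col, y)
    rejected_1d = False
except ValueError:
    rejected_1d = True
print(rejected_1d)

LinearRegression().fit(X_col_2d, y)  # the 2-D version is accepted
```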
        
5. Plotting glucose vs. body mass index
    
        import matplotlib.pyplot as plt
        plt.scatter(X_bmi, y)
        plt.ylabel("Blood Glucose (mg/dl)")
        plt.xlabel("Body Mass Index")
        plt.show()
        
![image-4.png](attachment:image-4.png)

![image-5.png](attachment:image-5.png)

        Interpretation: generally, as body mass index increases, blood glucose levels also tend to increase.
 
6. Fitting a regression model

Model Building: 

      from sklearn.linear_model import LinearRegression
      reg = LinearRegression()
      reg.fit(X_bmi, y)
      predictions = reg.predict(X_bmi)
        
Plotting: 

     plt.scatter(X_bmi, y)
     plt.plot(X_bmi, predictions)
     plt.ylabel("Blood Glucose (mg/dl)")
     plt.xlabel("Body Mass Index")
     plt.show()

## 2.1 Creating Features 

![image.png](attachment:image.png)
![image-2.png](attachment:image-2.png)

'''import numpy as np

# Create X from the radio column's values
X = sales_df["radio"].values 

# Create y from the sales column's values
y = sales_df["sales"].values 

# Reshape X
X = X.reshape(-1,1)

# Check the shape of the features and targets
print(y.shape, X.shape)'''


## 2.2 Building a linear regression model

![image.png](attachment:image.png)
![image-2.png](attachment:image-2.png)

'''
# Import LinearRegression
from sklearn.linear_model import LinearRegression

# Create the model
reg = LinearRegression()

# Fit the model to the data
reg.fit(X,y)

# Make predictions
predictions = reg.predict(X)

print(predictions[:5])

'''


![image.png](attachment:image.png)

## 2.3 Visualizing a linear regression model

![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)

'''

# Import matplotlib.pyplot
import matplotlib.pyplot as plt

# Create scatter plot
plt.scatter(x = X, y = y, color="blue")

# Create line plot
plt.plot(X, predictions, color="red")
plt.xlabel("Radio Expenditure ($)")
plt.ylabel("Sales ($)")

# Display the plot
plt.show()

'''




![image.png](attachment:image.png)


## 2.4 The basics of linear regression

2. Regression mechanics

![image.png](attachment:image.png)

3. The loss function

![image-2.png](attachment:image-2.png)

        We want the line to be as close to the observations as possible. Therefore, we want to minimize the vertical distance between the fit and the data. So for each observation, we calculate the vertical distance between it and the line.
        
4. Residual         

        This distance is called a residual. We could try to minimize the sum of the residuals, but then positive and negative residuals would cancel each other out. 

5. Ordinary least squares

![image-3.png](attachment:image-3.png)

        To avoid this, we square the residuals. By adding all the squared residuals, we calculate the residual sum of squares, or RSS. This type of linear regression is called Ordinary Least Squares, or OLS, where we aim to minimize the RSS.
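
For a single feature, the line that minimizes the RSS has a closed form: the slope is the sum of cross-deviations of x and y divided by the sum of squared deviations of x, and the intercept makes the line pass through the point of means. A sketch on made-up data, checking that `LinearRegression` finds the same line and that nudging the slope or intercept increases the RSS:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up single-feature data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])


def rss(slope, intercept, x, y):
    """Residual sum of squares of the line y = slope * x + intercept."""
    residuals = y - (slope * x + intercept)
    return np.sum(residuals ** 2)


# Closed-form OLS estimates for a single feature
a = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b = y.mean() - a * x.mean()

# scikit-learn needs a 2-D feature array, hence reshape(-1, 1)
reg = LinearRegression().fit(x.reshape(-1, 1), y)
print(a, b)
print(reg.coef_[0], reg.intercept_)

# Moving the slope or the intercept away from the OLS values increases the RSS
print(rss(a, b, x, y) < rss(a + 0.1, b, x, y))
print(rss(a, b, x, y) < rss(a, b + 0.1, x, y))
```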
        
6. Linear regression in higher dimensions

![image-4.png](attachment:image-4.png)

        When we have two features, x1 and x2, and one target, y, a line takes the form y = a1x1 + a2x2 + b. 
        
        So to fit a linear regression model we specify three variables, a1, a2, and the intercept, b.
        
        When we use more than one feature, this is known as multiple linear regression. Fitting a multiple linear regression model means specifying a coefficient, a_n, for each of the n features, plus the intercept, b. For multiple linear regression models, scikit-learn expects one variable each for the feature and target values.
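
A sketch of fitting y = a1x1 + a2x2 + b with scikit-learn, using synthetic noise-free data generated from known values (a1 = 3, a2 = -2, b = 5, chosen arbitrarily) so that the fitted `coef_` and `intercept_` can be compared against them. X is a single 2-D array holding both features, and y is a single 1-D array.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(50, 2))     # two synthetic features, x1 and x2
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0  # y = a1*x1 + a2*x2 + b, no noise

reg = LinearRegression().fit(X, y)       # one 2-D feature array, one 1-D target
print(reg.coef_)       # estimates of a1 and a2
print(reg.intercept_)  # estimate of b
```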

7. Linear regression using all features

![image-5.png](attachment:image-5.png)

8. R-squared

![image-6.png](attachment:image-6.png)

9. R-squared in scikit-learn

![image-7.png](attachment:image-7.png)

10. Mean squared error and root mean squared error

![image-8.png](attachment:image-8.png)

        Another way to assess a regression model's performance is to take the mean of the squared residuals, known as the mean squared error, or MSE.
        
        MSE is measured in units of our target variable, squared. For example, if a model is predicting a dollar value, MSE will be in dollars squared. To convert to dollars, we can take the square root, known as the root mean squared error, or RMSE.
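
The metrics in this section can be computed by hand from the residuals and compared with scikit-learn. A sketch on made-up dollar values; R-squared here uses the standard definition 1 - RSS / TSS, where TSS is the sum of squared deviations of the actual values from their mean, which is what `r2_score` computes. Taking `np.sqrt` of `mean_squared_error` gives the RMSE.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up actual and predicted values, in dollars
y_test = np.array([100.0, 150.0, 200.0, 250.0])
y_pred = np.array([110.0, 140.0, 195.0, 260.0])

residuals = y_test - y_pred
mse = np.mean(residuals ** 2)  # mean of squared residuals, in dollars squared
rmse = np.sqrt(mse)            # back in dollars
r2 = 1 - np.sum(residuals ** 2) / np.sum((y_test - y_test.mean()) ** 2)  # 1 - RSS / TSS

print(mse, mean_squared_error(y_test, y_pred))
print(rmse, np.sqrt(mean_squared_error(y_test, y_pred)))
print(r2, r2_score(y_test, y_pred))
```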
       
       
11. RMSE in scikit-learn

![image-9.png](attachment:image-9.png)

## 2.5 Fit and predict for regression

![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)

'''# Create X and y arrays
X = sales_df.drop("sales", axis=1).values
y = sales_df["sales"].values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Instantiate the model
reg = LinearRegression()

# Fit the model to the data
reg.fit(X_train, y_train)

# Make predictions
y_pred = reg.predict(X_test)
print("Predictions: {}, Actual Values: {}".format(y_pred[:2], y_test[:2]))'''


![image.png](attachment:image.png)

The first two predictions appear to be within around 5% of the actual values from the test set!

## 2.6 Regression performance

![image.png](attachment:image.png)
![image-2.png](attachment:image-2.png)

'''
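# Import numpy and mean_squared_error
import numpy as np
from sklearn.metrics import mean_squared_error

# Compute R-squared
r_squared = reg.score(X_test, y_test)

# Compute RMSE as the square root of the MSE
# (the squared=False argument is deprecated in recent scikit-learn versions)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))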

# Print the metrics
print("R^2: {}".format(r_squared))
print("RMSE: {}".format(rmse))


'''


![image.png](attachment:image.png)

Wow, the features explain 99.9% of the variance in sales values! Looks like this company's advertising strategy is working well!

## 2.7 Cross-validation

1. Cross-validation motivation

![image.png](attachment:image.png)

2. Cross-validation basics

![image-2.png](attachment:image-2.png)

    1. Split the dataset into five groups, or folds.
    2. Set aside the first fold as a test set.
    3. Fit the model on the remaining four folds and predict on the test set.
    4. Compute the metric of interest, such as R-squared.
    5. Set aside the second fold as the test set, fit on the remaining data, predict on the test set, and compute the metric again.
    6. Repeat for the third, fourth, and fifth folds.
    7. The result is five values of R-squared, from which we can compute statistics of interest, such as the mean, median, and 95% confidence intervals.
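
The fold-by-fold procedure above can be written as an explicit loop over `KFold.split`, which makes clear what `cross_val_score` automates. A sketch on synthetic data from `make_regression` with five folds; with the same `KFold` object, both approaches should produce the same R-squared values.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic regression data (stand-in for a real dataset)
X, y = make_regression(n_samples=120, n_features=3, noise=10.0, random_state=0)
kf = KFold(n_splits=5, shuffle=True, random_state=5)

# Manual version: each fold takes one turn as the test set
manual_scores = []
for train_idx, test_idx in kf.split(X):
    reg = LinearRegression().fit(X[train_idx], y[train_idx])   # fit on the other four folds
    manual_scores.append(reg.score(X[test_idx], y[test_idx]))  # R-squared on the held-out fold

# Automated version with the same folds
cv_scores = cross_val_score(LinearRegression(), X, y, cv=kf)

print(np.round(manual_scores, 3))
print(np.round(cv_scores, 3))
print(np.mean(cv_scores), np.quantile(cv_scores, [0.025, 0.975]))
```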


3. Cross-validation and model performance

![image-4.png](attachment:image-4.png)

4. Cross-validation in scikit-learn

![image-5.png](attachment:image-5.png)
    
    1. Import `cross_val_score` from `sklearn.model_selection`.
    2. Import `KFold`, which allows us to set a seed and shuffle our data, making our results repeatable downstream.
    3. The `n_splits` argument has a default of five, but in this case we assign six, allowing us to use six folds from our dataset for cross-validation.
    4. Set `shuffle` to `True`, which shuffles our dataset before splitting it into folds.
    5. Assign a seed to the `random_state` keyword argument, ensuring our data is split the same way if we repeat the process.

5. Evaluating cross-validation performance

![image-6.png](attachment:image-6.png)

    Calculate the mean score using `np.mean()` and the standard deviation using `np.std()`. Additionally, we can calculate the 95% confidence interval using `np.quantile()`, passing our results followed by a list containing the lower and upper limits of the interval as decimals (0.025 and 0.975).


'''# Import the necessary modules

from sklearn.model_selection import cross_val_score, KFold

# Create a K-fold object

kf = KFold(n_splits = 6, shuffle = True, random_state = 5)

reg = LinearRegression()

# Compute 6-fold cross-validation scores

cv_scores = cross_val_score(reg,X,y, cv = kf)
print(cv_scores)'''

'''# Analyzing cross-validation metrics
import numpy as np

# Print the mean
print(np.mean(cv_scores))

# Print the standard deviation
print(np.std(cv_scores))

# Print the 95% confidence interval
print(np.quantile(cv_scores, [0.025, 0.975]))
'''

![image.png](attachment:image.png)

## 2.8 Regularized regression

![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)

![image-3.png](attachment:image-3.png)

'''# Import Ridge 

from sklearn.linear_model import Ridge
alphas = [0.1,1.0,10.0,100.0,1000.0, 10000.0]
ridge_scores = []
for alpha in alphas: 
    
    #Create a ridge regression model
    ridge = Ridge(alpha = alpha)
        
    # Fit the data
    ridge.fit(X_train, y_train)
        
    #Obtain R-squared
    score = ridge.score(X_test, y_test)
    ridge_scores.append(score)
    
print(ridge_scores)'''
        


'''## Lasso regression for feature importance

# Import Lasso
from sklearn.linear_model import Lasso

# Instantiate a lasso regression model
lasso = Lasso(alpha = 0.3)

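# Fit the model to the data
lasso.fit(X, y)

# Compute and print the coefficients (no need to refit)
lasso_coef = lasso.coef_
print(lasso_coef)

# Feature names: every sales_df column except the target
sales_columns = sales_df.drop("sales", axis=1).columns
plt.bar(sales_columns, lasso_coef)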
plt.xticks(rotation=45)
plt.show()'''



![image.png](attachment:image.png)

See how the figure makes it clear that expenditure on TV advertising is the most important feature in the dataset to predict sales values!
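
Lasso is useful for feature selection because its penalty can shrink coefficients of unimportant features to exactly zero, whereas ridge only shrinks them toward zero. A small synthetic check, assuming made-up data in which only the first of three features affects the target; the alpha values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
# Only the first feature matters; the other two are pure noise
y = 4.0 * X[:, 0] + rng.normal(scale=0.5, size=200)

lasso = Lasso(alpha=0.3).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)
print("Lasso:", np.round(lasso.coef_, 3))  # noise features set exactly to zero
print("Ridge:", np.round(ridge.coef_, 3))  # noise features small but nonzero
```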

# 3. Fine-tuning your Model

![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)

![image-3.png](attachment:image-3.png)

![image-4.png](attachment:image-4.png)