# Breast Cancer Dataset
## Task Overview
In this lab activity, your task is to build a Logistic Regression model using a subset of data points from the Breast Cancer data. The following are the specific sub-tasks you must accomplish:

1. **Load the Data** 
- Load the Breast Cancer data in your Jupyter Notebook from the scikit-learn library.
2. **Select a Unique Randomization Seed** 
- Select a unique integer that will serve as the seed for your randomization.
3. **Sample Train Data** 
- Randomly sample a subset of the data using the seed you have selected; limit the sample to 30 observations. Ensure that your sampled data is representative of the population.
4. **Weight Update Function** 
- Build a weight update function following the Gradient Descent concept.
5. **Display the Values of Weights** 
- Print the weight values at each iteration, with each iteration shown in its own cell.
6. **Plot the Value of Weights** 
- Display line charts showing how the weight values change across iterations. For each weight, show a separate line chart of its value against the iteration number.
7. **Build a Function for the Final Regression Model** 
- Create a function using the final regression model after all your iterations. Display the mathematical expression with all the final weight values multiplied by the input variables.
8. **Sample Test Data** 
- From the remainder of the original dataset, randomly sample another set of 30 observations NOT present in your training sample.
9. **Use the Regression Function for Prediction** 
- Use your logistic regression function to predict the target variable in your test set.
10. **Calculate for Errors** 
- Calculate the overall error between your model’s predictions and the actual values in the test set.

The final deliverable will include your code implementation, divided into sections according to the sub-tasks above.

Feature descriptions are also available in the library, in the `DESCR` attribute of the object returned by `load_breast_cancer()`.

## Load the Data
Load the Breast Cancer data in your Jupyter Notebook from the scikit-learn library.

In [4]:
# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import itertools
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    roc_auc_score, classification_report, confusion_matrix,
)

# Load the Breast Cancer dataset
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target, name='target')

In [5]:
# Check the non-null count and the data types
X.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 569 entries, 0 to 568
Data columns (total 30 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   mean radius              569 non-null    float64
 1   mean texture             569 non-null    float64
 2   mean perimeter           569 non-null    float64
 3   mean area                569 non-null    float64
 4   mean smoothness          569 non-null    float64
 5   mean compactness         569 non-null    float64
 6   mean concavity           569 non-null    float64
 7   mean concave points      569 non-null    float64
 8   mean symmetry            569 non-null    float64
 9   mean fractal dimension   569 non-null    float64
 10  radius error             569 non-null    float64
 11  texture error            569 non-null    float64
 12  perimeter error          569 non-null    float64
 13  area error               569 non-null    float64
 14  smoothness error         5

In [6]:
# Examine the basic statistical information of the dataset
X.describe()

,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,...,worst radius,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension
count,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,...,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0
mean,14.127292,19.289649,91.969033,654.889104,0.09636,0.104341,0.088799,0.048919,0.181162,0.062798,...,16.26919,25.677223,107.261213,880.583128,0.132369,0.254265,0.272188,0.114606,0.290076,0.083946
std,3.524049,4.301036,24.298981,351.914129,0.014064,0.052813,0.07972,0.038803,0.027414,0.00706,...,4.833242,6.146258,33.602542,569.356993,0.022832,0.157336,0.208624,0.065732,0.061867,0.018061
min,6.981,9.71,43.79,143.5,0.05263,0.01938,0.0,0.0,0.106,0.04996,...,7.93,12.02,50.41,185.2,0.07117,0.02729,0.0,0.0,0.1565,0.05504
25%,11.7,16.17,75.17,420.3,0.08637,0.06492,0.02956,0.02031,0.1619,0.0577,...,13.01,21.08,84.11,515.3,0.1166,0.1472,0.1145,0.06493,0.2504,0.07146
50%,13.37,18.84,86.24,551.1,0.09587,0.09263,0.06154,0.0335,0.1792,0.06154,...,14.97,25.41,97.66,686.5,0.1313,0.2119,0.2267,0.09993,0.2822,0.08004
75%,15.78,21.8,104.1,782.7,0.1053,0.1304,0.1307,0.074,0.1957,0.06612,...,18.79,29.72,125.4,1084.0,0.146,0.3391,0.3829,0.1614,0.3179,0.09208
max,28.11,39.28,188.5,2501.0,0.1634,0.3454,0.4268,0.2012,0.304,0.09744,...,36.04,49.54,251.2,4254.0,0.2226,1.058,1.252,0.291,0.6638,0.2075


In [7]:
# Display the top and bottom 5 rows of the dataset
pd.concat([X.head(), X.tail()])

,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,...,worst radius,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension
0,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,0.2419,0.07871,...,25.38,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189
1,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,0.1812,0.05667,...,24.99,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902
2,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,0.2069,0.05999,...,23.57,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758
3,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,0.2597,0.09744,...,14.91,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173
4,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,0.1809,0.05883,...,22.54,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678
564,21.56,22.39,142.0,1479.0,0.111,0.1159,0.2439,0.1389,0.1726,0.05623,...,25.45,26.4,166.1,2027.0,0.141,0.2113,0.4107,0.2216,0.206,0.07115
565,20.13,28.25,131.2,1261.0,0.0978,0.1034,0.144,0.09791,0.1752,0.05533,...,23.69,38.25,155.0,1731.0,0.1166,0.1922,0.3215,0.1628,0.2572,0.06637
566,16.6,28.08,108.3,858.1,0.08455,0.1023,0.09251,0.05302,0.159,0.05648,...,18.98,34.12,126.7,1124.0,0.1139,0.3094,0.3403,0.1418,0.2218,0.0782
567,20.6,29.33,140.1,1265.0,0.1178,0.277,0.3514,0.152,0.2397,0.07016,...,25.74,39.42,184.6,1821.0,0.165,0.8681,0.9387,0.265,0.4087,0.124
568,7.76,24.54,47.92,181.0,0.05263,0.04362,0.0,0.0,0.1587,0.05884,...,9.456,30.37,59.16,268.6,0.08996,0.06444,0.0,0.0,0.2871,0.07039


### Select a Unique Randomization Seed
Select a unique integer that will serve as the seed for your randomization.
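
A minimal sketch of fixing the seed, assuming NumPy's `default_rng` generator is acceptable for the lab. The value `20240917` is only a placeholder and should be replaced with your own integer.

```python
import numpy as np

# Placeholder value (an assumption): replace with your own unique integer.
SEED = 20240917

# Two generators built from the same seed produce the same draws,
# which is what makes the later sampling steps reproducible.
rng_a = np.random.default_rng(SEED)
rng_b = np.random.default_rng(SEED)
print(rng_a.integers(0, 569, size=5))
print(rng_b.integers(0, 569, size=5))  # identical to the line above
```

The same integer can also be passed as `random_state` to scikit-learn or pandas sampling functions.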

In [None]:
###

### Sample Train Data
Randomly sample a subset of the data using the seed you have selected; limit the sample to 30 observations. Ensure that your sampled data is representative of the population.
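
One possible reading of "representative" is that the sample keeps roughly the same benign/malignant ratio as the full dataset. The sketch below takes that interpretation and uses stratified sampling through `train_test_split`; the seed value is a placeholder, and other sampling schemes may be equally valid.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

SEED = 20240917  # placeholder seed (assumption); use your own integer

data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target, name="target")

# Stratifying on y keeps the class ratio of the 30-row sample
# close to the ratio in the full 569-row dataset.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, train_size=30, stratify=y, random_state=SEED
)

print("population class share:", y.value_counts(normalize=True).round(3).to_dict())
print("sample class share:    ", y_train.value_counts(normalize=True).round(3).to_dict())
```

`X_rest` and `y_rest` hold the rows that were not drawn, which is convenient when a test sample has to come from the remainder.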

In [None]:
###

### Weight Update Function
Build a weight update function following the Gradient Descent concept.
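
As a rough sketch, one batch gradient descent step for logistic regression can be written as below. It assumes the mean binary cross-entropy loss, a bias term `b` kept separate from the weight vector `w`, and features that have already been standardized (the raw columns differ in scale by several orders of magnitude). The names `update_weights` and `lr` are illustrative choices, not part of the assignment.

```python
import numpy as np

def sigmoid(z):
    # Clip z so that exp() cannot overflow for very large |z|.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

def update_weights(X, y, w, b, lr=0.1):
    """Perform one batch gradient descent step on the mean cross-entropy loss."""
    n = X.shape[0]
    error = sigmoid(X @ w + b) - y   # predicted probability minus label
    grad_w = X.T @ error / n         # gradient with respect to the weights
    grad_b = error.mean()            # gradient with respect to the bias
    return w - lr * grad_w, b - lr * grad_b

# Toy data (illustration only, not the lab dataset).
X_toy = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y_toy = np.array([0, 0, 1, 1])

w1, b1 = update_weights(X_toy, y_toy, w=np.zeros(1), b=0.0, lr=0.1)
print(w1, b1)  # weight moves to 0.075, bias stays at 0.0
```

Starting from zero weights every prediction is 0.5, so the first step moves the weight in the direction that separates the two classes.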

In [None]:
###

### Display the Values of Weights
Print the weight values at each iteration, with each iteration shown in its own cell.
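
A possible way to record and print the weights is to store them after every update. The sketch below assumes batch gradient descent from zero initial weights on toy data; `run_gradient_descent` is a hypothetical helper name, and splitting the printout into separate notebook cells is left to the reader.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def run_gradient_descent(X, y, lr=0.1, n_iter=5):
    """Run batch gradient descent and return the weights at every iteration.

    Column 0 of the returned array is the bias; the rest are feature weights.
    """
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    history = [np.concatenate(([b], w))]  # iteration 0 = initial weights
    for _ in range(n_iter):
        error = sigmoid(X @ w + b) - y
        w = w - lr * (X.T @ error) / n
        b = b - lr * error.mean()
        history.append(np.concatenate(([b], w)))
    return np.array(history)

# Toy standardized data (illustration only, not the lab dataset).
X_toy = np.array([[-1.2, 0.3], [-0.4, -1.1], [0.5, 0.9], [1.1, -0.1]])
y_toy = np.array([0, 0, 1, 1])

history = run_gradient_descent(X_toy, y_toy, lr=0.1, n_iter=5)
for i, row in enumerate(history):
    print(f"iteration {i}: bias={row[0]:+.4f}, weights={np.round(row[1:], 4)}")
```

Keeping the full history in one array also makes it straightforward to plot each weight afterwards.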

In [None]:
###

### Plot the Value of Weights
Display line charts showing how the weight values change across iterations. For each weight, show a separate line chart of its value against the iteration number.
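
One way to get a separate chart per weight is a column of Matplotlib subplots sharing the iteration axis. The sketch assumes the weights were stored as an array with one row per iteration and one column per weight; `plot_weight_history` is a hypothetical helper, and the numbers in `history` are made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_weight_history(history, names):
    """Draw one line chart per weight: value (y-axis) against iteration (x-axis)."""
    n_weights = history.shape[1]
    fig, axes = plt.subplots(n_weights, 1, figsize=(6, 2.2 * n_weights),
                             sharex=True, squeeze=False)
    for j, ax in enumerate(axes[:, 0]):
        ax.plot(range(history.shape[0]), history[:, j], marker="o")
        ax.set_ylabel(names[j])
        ax.grid(True)
    axes[-1, 0].set_xlabel("iteration")
    fig.tight_layout()
    return fig

# Made-up history for illustration: rows are iterations, columns are bias, w1, w2.
history = np.array([[0.0, 0.00, 0.00],
                    [0.0, 0.04, 0.02],
                    [0.0, 0.08, 0.04]])
fig = plot_weight_history(history, ["bias", "w1", "w2"])
```

With all 30 features plus a bias this produces 31 stacked charts, so a grid layout may be easier to read than a single column.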

In [None]:
###

### Build a Function for the Final Regression Model
Create a function using the final regression model after all your iterations. Display the mathematical expression with all the final weight values multiplied by the input variables.
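
The final model can be wrapped in a function once the weights are fixed. The sketch below assumes a logistic model of the form sigmoid(b + w·x); `build_model` is a hypothetical helper, and the weights shown are placeholders rather than fitted values.

```python
import numpy as np

def build_model(w, b, feature_names):
    """Return (predict_proba, expression) for fixed final weights w and bias b."""
    w = np.asarray(w, dtype=float)

    def predict_proba(X):
        z = np.asarray(X, dtype=float) @ w + b
        return 1.0 / (1.0 + np.exp(-z))

    # One signed term per feature: weight multiplied by the input variable.
    terms = "".join(f" {wi:+.4f}*[{name}]" for wi, name in zip(w, feature_names))
    expression = f"P(y=1) = sigmoid({b:+.4f}{terms})"
    return predict_proba, expression

# Placeholder weights (not fitted values) to show the output format.
predict_proba, expression = build_model(
    [0.5, -1.25], 0.1, ["mean radius", "mean texture"]
)
print(expression)
# P(y=1) = sigmoid(+0.1000 +0.5000*[mean radius] -1.2500*[mean texture])
```

If the features were standardized before training, the expression applies to the standardized inputs, not to the raw measurements.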

In [None]:
###

### Sample Test Data
From the remainder of the original dataset, randomly sample another set of 30 observations NOT present in your training sample.
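
A sketch of drawing the test sample from the rows left over after the training sample, again assuming stratified sampling and a placeholder seed. Sampling from the remainder guarantees that no observation appears in both sets.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

SEED = 20240917  # placeholder seed (assumption); use your own integer

data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target, name="target")

# Step 1: the 30-row training sample; X_rest holds the other 539 rows.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, train_size=30, stratify=y, random_state=SEED
)

# Step 2: the 30-row test sample, drawn only from the remainder.
X_test, _, y_test, _ = train_test_split(
    X_rest, y_rest, train_size=30, stratify=y_rest, random_state=SEED
)

# The row indices of the two samples must not overlap.
overlap = X_train.index.intersection(X_test.index)
print("rows in both samples:", len(overlap))
```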

In [None]:
###

### Use the Regression Function for Prediction
Use your logistic regression function to predict the target variable in your test set.
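
Prediction with a logistic model usually means computing a probability and then applying a cut-off. The sketch below assumes a threshold of 0.5 and uses placeholder weights with toy inputs; `predict_labels` is a hypothetical helper name.

```python
import numpy as np

def predict_labels(X, w, b, threshold=0.5):
    """Return (labels, probabilities) from the logistic model sigmoid(X @ w + b)."""
    z = np.asarray(X, dtype=float) @ w + b
    prob = 1.0 / (1.0 + np.exp(-z))
    return (prob >= threshold).astype(int), prob

# Placeholder weights and toy inputs (illustration only).
w, b = np.array([1.5]), 0.0
X_test_toy = np.array([[-1.0], [2.0]])

labels, prob = predict_labels(X_test_toy, w, b)
print(labels)  # [0 1]
```

If the training features were standardized, the test features should be transformed with the training mean and standard deviation before calling the function.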

In [None]:
###

### Calculate for Errors
Calculate the overall error between your model’s predictions and the actual values in the test set.
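
The assignment does not name a specific error measure, so the sketch below assumes two common choices for a binary classifier: the misclassification rate and the mean log loss. `evaluate` is a hypothetical helper and the inputs are toy values.

```python
import numpy as np

def evaluate(y_true, prob, threshold=0.5):
    """Return the misclassification rate and the mean log loss."""
    y_true = np.asarray(y_true)
    # Clip probabilities away from 0 and 1 to avoid log(0).
    prob = np.clip(np.asarray(prob, dtype=float), 1e-12, 1 - 1e-12)
    y_pred = (prob >= threshold).astype(int)
    error_rate = float(np.mean(y_pred != y_true))
    log_loss = float(-np.mean(y_true * np.log(prob)
                              + (1 - y_true) * np.log(1 - prob)))
    return error_rate, log_loss

# Toy values (illustration only): one of the four predictions is wrong.
error_rate, log_loss = evaluate([1, 0, 1, 1], [0.9, 0.2, 0.4, 0.8])
print(f"error rate = {error_rate:.2f}, log loss = {log_loss:.4f}")
```

The error rate counts wrong labels, while the log loss also penalizes predictions that are correct but not confident.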

In [None]:
###