## Approach A - Logistic Regression

### Importing the necessary libraries

In [7]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

### Importing data (reading from the csv file)
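We start by loading our dataset from a CSV file. The dataset contains information about breast cancer patients, and we want to predict whether a tumor is malignant (1) or benign (0). Every column except the last is treated as a feature (`X`), and the last column is the target (`y`).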

In [8]:
dataframe = pd.read_csv('breast_cancer.csv')
X = dataframe.iloc[:, :-1].values
y = dataframe.iloc[:, -1].values

### Splitting the dataset into the Training set and Test set
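Next, we split our data into a training set and a test set, holding out 25% of the rows for testing (`test_size = 0.25`). This allows us to train the model on one subset and evaluate its performance on data it has not seen. Because no `random_state` is passed to `train_test_split`, the split is different on every run, so the numbers reported later in this section will vary slightly between runs.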

In [9]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)
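
As a side note on reproducibility, the sketch below shows how a fixed `random_state` makes the split repeatable. It uses synthetic stand-in data, not the breast cancer CSV, and the `stratify` argument (which keeps the class ratio similar in both subsets) is an optional extra that the notebook itself does not use. The names `X_demo`, `y_demo` and `split` are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical, NOT the breast cancer dataset):
# 100 samples, 3 features, 30 positives and 70 negatives.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = np.array([1] * 30 + [0] * 70)

def split(seed):
    # Same call as in the notebook, plus a fixed seed and stratification.
    return train_test_split(
        X_demo, y_demo, test_size=0.25, random_state=seed, stratify=y_demo
    )

X_tr, X_te, y_tr, y_te = split(seed=0)
X_tr2, X_te2, y_tr2, y_te2 = split(seed=0)

print(X_tr.shape, X_te.shape)        # 75 training rows, 25 test rows
print(np.array_equal(X_te, X_te2))   # same seed, same test rows
print(y_tr.mean(), y_te.mean())      # class ratios stay close to 0.30
```

Whether to fix the seed is a judgment call: it makes a notebook repeatable, but a single fixed split can also hide how much the score depends on which rows land in the test set.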

### Feature Scaling
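Before training, we standardize the features so that each one has zero mean and unit variance. Logistic Regression does not strictly require this, but scikit-learn's implementation is regularized by default, and features on very different scales are penalized unevenly and can slow the solver's convergence. The scaler is fitted on the training set only (`fit_transform`) and then applied unchanged to the test set (`transform`), so no information from the test set leaks into training.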

In [10]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
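
To make the scaling step concrete, here is a minimal sketch on toy numbers (hypothetical values, not taken from the dataset). It checks that `StandardScaler` is equivalent to subtracting the training mean and dividing by the training standard deviation, and that the test rows are scaled with the training statistics rather than their own.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data (hypothetical): two features on very different scales.
X_train_demo = np.array([[1.0, 1000.0], [2.0, 2000.0], [3.0, 3000.0]])
X_test_demo = np.array([[4.0, 500.0]])

scaler = StandardScaler().fit(X_train_demo)

# Manual equivalent: z = (x - mean) / std, using training-set statistics.
mean = X_train_demo.mean(axis=0)
std = X_train_demo.std(axis=0)  # population std (ddof=0), as StandardScaler uses
manual_train = (X_train_demo - mean) / std
manual_test = (X_test_demo - mean) / std

print(np.allclose(scaler.transform(X_train_demo), manual_train))  # True
print(np.allclose(scaler.transform(X_test_demo), manual_test))    # True
print(manual_train.mean(axis=0), manual_train.std(axis=0))        # ~0 and 1 per column
```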

### Training the Logistic Regression model on the Training set
Logistic Regression is a popular algorithm for binary classification. It models the probability of the positive class as a sigmoid (logistic) function of a linear combination of the features, and predicts the positive class when that probability exceeds 0.5.

In [11]:
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state = 0)
classifier.fit(X_train, y_train)

LogisticRegression(random_state=0)
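
The following sketch illustrates what the fitted model computes in the binary case. It trains on synthetic data (an assumption for illustration, not the breast cancer features) and compares a hand-written sigmoid of the linear score against `predict_proba`. The names `X_demo`, `y_demo`, `model` and `sigmoid` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic binary problem (hypothetical stand-in for the real features).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
noise = rng.normal(scale=0.5, size=200)
y_demo = (X_demo[:, 0] + 0.5 * X_demo[:, 1] + noise > 0).astype(int)

model = LogisticRegression(random_state=0).fit(X_demo, y_demo)

def sigmoid(z):
    """Logistic function: maps any real score to a value in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Linear score z = w . x + b, then squash it into a probability.
z = X_demo @ model.coef_.ravel() + model.intercept_[0]
p_manual = sigmoid(z)
p_sklearn = model.predict_proba(X_demo)[:, 1]  # probability of class 1

print(np.allclose(p_manual, p_sklearn))  # True

# A probability above 0.5 is the same condition as a positive score.
preds_manual = (z > 0).astype(int)
print(np.array_equal(preds_manual, model.predict(X_demo)))  # True
```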

### Calculating the Confusion Matrix
After training the model, we evaluate its performance using a confusion matrix and accuracy score.

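A confusion matrix is a table that describes the performance of a classification model. It shows the number of true positive (TP), true negative (TN), false positive (FP), and false negative (FN) predictions. In scikit-learn, rows correspond to the true classes and columns to the predicted classes, so for labels 0 and 1 the layout is `[[TN, FP], [FN, TP]]`.
The accuracy score is the fraction of predictions that are correct, (TP + TN) / (TP + TN + FP + FN), returned as a value between 0 and 1.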

In [12]:
from sklearn.metrics import confusion_matrix, accuracy_score
y_pred = classifier.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
print(cm)
accuracy_score(y_test, y_pred)

[[120   3]
 [  1  47]]


0.9766081871345029
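
As a check on the numbers above, the sketch below recomputes the accuracy directly from the printed confusion matrix, and also derives precision and recall, which are not reported in the notebook. It assumes label 1 (malignant) is the positive class, consistent with the dataset description earlier in this section.

```python
import numpy as np

# Confusion matrix copied from the output above.
# Layout in scikit-learn: rows = true class, columns = predicted class.
cm = np.array([[120, 3],
               [1, 47]])

# Assumption: label 1 (malignant) is the positive class.
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()   # correct predictions / all predictions
precision = tp / (tp + fp)        # of predicted malignant, how many were malignant
recall = tp / (tp + fn)           # of truly malignant, how many were found

print(f"accuracy  = {accuracy:.4f}")   # 0.9766
print(f"precision = {precision:.4f}")  # 0.9400
print(f"recall    = {recall:.4f}")     # 0.9792
```

Here 167 of the 171 test samples are classified correctly. For a medical screening task the single false negative (a malignant tumor predicted as benign) is usually the more costly kind of error, which is why recall is worth reading alongside accuracy.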