# Can AI be used to classify tumors as malignant or benign?


In [1]:
# For this classification project, I used a toy dataset bundled with scikit-learn, so no external files need to be downloaded
import sklearn
from sklearn import datasets

cancer = datasets.load_breast_cancer() # Load the breast cancer diagnostic dataset from scikit-learn

print("Names of Features")
print(cancer.feature_names) 

print("Names of Classes")
print(cancer.target_names)

# Each tumor in the dataset is labeled either 'malignant' or 'benign'.
# Malignant tumors are cancerous, grow quickly and uncontrollably, and can metastasize to other body sites.
# Benign tumors tend to grow slowly and do not spread.

Names of Features
['mean radius' 'mean texture' 'mean perimeter' 'mean area'
 'mean smoothness' 'mean compactness' 'mean concavity'
 'mean concave points' 'mean symmetry' 'mean fractal dimension'
 'radius error' 'texture error' 'perimeter error' 'area error'
 'smoothness error' 'compactness error' 'concavity error'
 'concave points error' 'symmetry error' 'fractal dimension error'
 'worst radius' 'worst texture' 'worst perimeter' 'worst area'
 'worst smoothness' 'worst compactness' 'worst concavity'
 'worst concave points' 'worst symmetry' 'worst fractal dimension']
Names of Classes
['malignant' 'benign']
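
The order of `target_names` also fixes the integer encoding used in `cancer.target`: a name's position in the array is its label, so 0 means 'malignant' and 1 means 'benign'. The check below is a minimal sketch; the names `counts` and `label_counts` are illustrative and not part of the original notebook.

```python
import numpy as np
from sklearn import datasets

cancer = datasets.load_breast_cancer()

# Count how many samples carry each integer label (0 or 1)
counts = np.bincount(cancer.target)

# Pair each class name with its count; a name's position in target_names is its integer label
label_counts = {str(name): int(n) for name, n in zip(cancer.target_names, counts)}

for label, (name, n) in enumerate(label_counts.items()):
    print(label, name, n)
```

The dataset holds 569 samples, 212 malignant and 357 benign, so the classes are moderately imbalanced toward benign.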


# Using a decision tree classifier:
Given the measured features of a breast tumor, a decision tree model can label it as malignant or benign.

A decision tree is a tree-like model of decisions that goes from observations about an item (tests on its feature values at the internal nodes) to a conclusion about the item (the class label at a leaf).

It is a supervised machine learning technique that can be used for classification or regression problems.
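
To make the idea of a tree of decisions concrete, the sketch below fits a deliberately shallow tree and prints its learned rules as text. The `max_depth=2` and `random_state=0` settings and the names `shallow_tree` and `rules` are assumptions chosen for readability; they are not the settings used for the classifier in this notebook.

```python
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier, export_text

cancer = datasets.load_breast_cancer()

# A shallow tree (depth 2) is likely less accurate but small enough to read
shallow_tree = DecisionTreeClassifier(max_depth=2, random_state=0)
shallow_tree.fit(cancer.data, cancer.target)

# Each indented line is a threshold test on one feature; leaves show the predicted class
feature_names = [str(name) for name in cancer.feature_names]
rules = export_text(shallow_tree, feature_names=feature_names)
print(rules)
```

Each printed branch reads as a chain of "feature <= threshold" tests that ends in `class: 0` (malignant) or `class: 1` (benign).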


In [2]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

# Create the training and testing datasets
X = cancer.data # X, what we're using to predict, aka the features like tumor texture, radius, etc.
y = cancer.target # y, what we're predicting, aka the classes/labels (malignant or benign)

# Split the data into 'train' and 'test' sets; the test set is 20% of the total data.
# No random_state is set, so the split (and the results below) change from run to run.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)

# Train a classifier
classifier = DecisionTreeClassifier()
classifier.fit(X_train, y_train)

# Use the trained classifier to predict whether a new piece of data is benign or malignant
prediction = classifier.predict(X_test)

# Note that these predictions aren't necessarily correct. Labels follow cancer.target_names: 0 is malignant, 1 is benign
prediction

array([1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1,
       0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0,
       1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1,
       0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0,
       0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1,
       0, 1, 1, 0])

In [3]:
correct = 0
for i in range(len(y_test)): # Iterate through all of our predictions
    if y_test[i] == prediction[i]:
        correct += 1

accuracy = correct / len(y_test)

print("Number of correct predictions: ", correct)
print("Total number of predictions: ", len(y_test))
print("Model accuracy: ", accuracy * 100, "%")

Number of correct predictions:  105
Total number of predictions:  114
Model accuracy:  92.10526315789474 %
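
The manual count above can be cross-checked with scikit-learn's built-in metrics. This is a minimal sketch rather than the notebook's original method: it refits a tree on a fresh split with `random_state=0` (an assumption, added so the run is repeatable), so its numbers will generally differ from the 105 of 114 reported above. The confusion matrix additionally separates the two kinds of mistakes: malignant tumors predicted as benign, and benign tumors predicted as malignant.

```python
from sklearn import datasets
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

cancer = datasets.load_breast_cancer()

# random_state=0 is an assumption made here so the split is the same on every run
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, test_size=0.20, random_state=0
)

classifier = DecisionTreeClassifier(random_state=0)
classifier.fit(X_train, y_train)
prediction = classifier.predict(X_test)

# Fraction of test samples whose predicted label matches the true label
accuracy = accuracy_score(y_test, prediction)

# Rows are true labels, columns are predicted labels (0 = malignant, 1 = benign)
matrix = confusion_matrix(y_test, prediction, labels=[0, 1])

print("Model accuracy: ", accuracy * 100, "%")
print(matrix)
```

The diagonal of the matrix holds the correct predictions, so its sum divided by the number of test samples equals the accuracy computed by the loop.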
