# Skin Cancer Classification - Convolutional Network

### by ReDay Zarra

This project uses a convolutional neural network to **identify 9 different kinds of skin cancers**, including melanoma and nevus. The model is **trained on over 2,200 pictures of various skin cancers** from this [dataset](https://www.kaggle.com/datasets/nodoubttome/skin-cancer9-classesisic). It implements fundamental computer vision and classification techniques and includes a *step-by-step implementation of the model* as well as *in-depth notes to customize the model further* for higher accuracy.

## Importing the necessary libraries

First, we import the essential **libraries for data manipulation and numerical analysis**, along with libraries for **data visualization and plotting**. Pickle will be used to **serialize our folder of images** into train.p, valid.p, and test.p.

In [1]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn

import os
import pickle
import cv2
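Pickle serializes Python objects to a byte stream. It does not compress them on its own. The following is a minimal sketch of a round trip through a train.p file, assuming the data is stored as a (features, labels) tuple. The dummy lists and the temporary directory are illustrative stand-ins, not the notebook's actual data or paths.

```python
import os
import pickle
import tempfile

# Dummy stand-ins for image data and labels (assumption: real data would be pixel arrays)
features = [[0, 1, 2], [3, 4, 5]]
labels = [0, 1]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "train.p")

    # Serialize the (features, labels) tuple to disk
    with open(path, "wb") as f:
        pickle.dump((features, labels), f)

    # Read the tuple back and unpack it
    with open(path, "rb") as f:
        loaded_features, loaded_labels = pickle.load(f)

print(loaded_features == features, loaded_labels == labels)
```

Only unpickle files you created or trust, since loading a pickle can execute arbitrary code.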

## Preparing the data

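To begin training the model, the image data first needs to be split into separate sets and converted into a format the network can use.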

### Creating validation set

The validation set can be **created by using scikit-learn's train_test_split()** function because we can simply **divide the training set** that we already have. I have chosen to **assign 20% of my training set** to my validation set. The validation data is then **stored separately in the valid.p** file.

In [73]:
from sklearn.model_selection import train_test_split

# Split the training data into training and validation sets using an 80/20 split
X_train, X_validation, y_train, y_validation = train_test_split(
    X_train, y_train, test_size=0.2, random_state=0
)

# Save the validation data as valid.p using pickle
with open('valid.p', 'wb') as f:
    pickle.dump((X_validation, y_validation), f)
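To make the 80/20 split more concrete, here is a NumPy-only sketch of the general idea: shuffle the sample indices, then hold out a fraction for validation. The helper `shuffle_split` is hypothetical and is not scikit-learn's implementation. Rounding the validation size up is an assumption of this sketch.

```python
import math
import numpy as np

def shuffle_split(X, y, test_size=0.2, seed=0):
    """Hypothetical helper: shuffle indices, then hold out a fraction for validation."""
    X, y = np.asarray(X), np.asarray(y)
    n_samples = len(X)
    n_valid = math.ceil(test_size * n_samples)  # assumption: round the held-out size up

    # A fixed seed makes the shuffle reproducible, like random_state in the cell above
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    valid_idx, train_idx = order[:n_valid], order[n_valid:]

    return X[train_idx], X[valid_idx], y[train_idx], y[valid_idx]

# Dummy data: 10 samples whose label equals their single feature value
X_demo = np.arange(10).reshape(10, 1)
y_demo = np.arange(10)

X_tr, X_va, y_tr, y_va = shuffle_split(X_demo, y_demo)
print(len(X_tr), len(X_va))
```

Because features and labels are indexed with the same shuffled indices, each image stays paired with its label after the split.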

### Convert to arrays

> In order to manipulate the data any further, the training, testing, and validation **datasets need to be converted into arrays**. NumPy's np.array() function offers a simple way to do just that.

In [84]:
import numpy as np

# Convert the list of image data into a numpy array
X_train = np.array(X_train)
y_train = np.array(y_train)

In [88]:
len(X_train)

1791
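As a small illustration of the conversion, the sketch below turns a list of equally sized dummy images into a single array. The 8x8 RGB size and the label values are assumptions chosen for the example, not properties of the actual dataset.

```python
import numpy as np

# Dummy data: five 8x8 RGB "images" and one integer label per image
image_list = [np.full((8, 8, 3), i, dtype=np.uint8) for i in range(5)]
label_list = [0, 1, 2, 1, 0]

# When every image has the same shape, np.array stacks them along a new first axis
X = np.array(image_list)
y = np.array(label_list)

print(len(X))    # number of images
print(X.shape)   # (count, height, width, channels)
print(y.shape)
```

Here len() reports the size of the first axis, which is the number of images.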

## Checking the dimensions of the dataset

Before processing the data, it is necessary to make sure the dataset and the variables we have stored it in are correct. We can easily see the shape of the training data with the .shape attribute.

In [78]:
X_train.shape

(1791,)
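A one-dimensional shape such as (1791,) means the array has one entry per image instead of separate height, width, and channel axes. One possible cause, assumed here because the loading code is not shown, is that the images differ in size and are stored as individual objects. The sketch below uses a hypothetical helper, `summarize_shapes`, and dummy images to show how to check for that.

```python
from collections import Counter
import numpy as np

def summarize_shapes(images):
    """Hypothetical helper: count how many images have each shape."""
    return Counter(np.asarray(img).shape for img in images)

# Dummy stand-ins for loaded images with two different sizes
images = [
    np.zeros((4, 4, 3), dtype=np.uint8),
    np.zeros((4, 4, 3), dtype=np.uint8),
    np.zeros((6, 5, 3), dtype=np.uint8),
]

# Store the differently sized images in a 1-D object array, one image per slot
ragged = np.empty(len(images), dtype=object)
for i, img in enumerate(images):
    ragged[i] = img

shape_counts = summarize_shapes(ragged)
print(ragged.shape)
print(shape_counts)
```

If more than one shape shows up, resizing every image to a common size before stacking would give a regular four-dimensional array.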