Breast-Cancer-Classification

Intent

The SEER database has kept a record of breast cancer cases from 1975 to 2017. Breast cancer morbidity cases make up ~17% of the observations in the database. Apart from skin cancers, breast cancer is the most common cancer in the United States. With morbidity being a real possibility for those diagnosed with breast cancer, quantifying the likelihood of it occurring gives patients, their families, and health care teams a useful metric for assessing options and possible trajectories.

Using supervised ensemble classification, machine learning can offer morbidity predictions. These predictions are imperfect and not always accurate, but they can act as supplemental information for those on a health care journey or alongside someone else's. Note that these predictions are not to be taken in place of trained medical advice and are available solely for research purposes.

In the Repo

Workflow

The steps for this classification were carried out across a series of files. Note that the first three are Jupyter notebooks (.ipynb), while the last is a Tableau workbook.

Acquiring Data: Data Retrieval.ipynb
Data Processing: Data Processing.ipynb
Tuning and Modeling: Classification Modeling.ipynb
Visualizing Predictions: predsbook.twb

Visualizations Included in folder:

Ages and Predictions.png: a view of how average probabilities change throughout ages
Counts Across Stages.png: showing relative number of observations across cancer stages
Months Survival: trend in average probabilities across age groups
Probabilities Across Stages: similar to Counts Across Stages.png, but colored to show the number of positive predictions in each stage
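
The "average probabilities across ages" views boil down to grouping predicted probabilities by an age bin and averaging within each bin. The sketch below illustrates that aggregation in plain Python. It is only an illustration: the rows, the ten-year bin width, and the helper names `age_group` and `mean_probability_by_group` are assumptions made for the example, not values or code from the notebooks or the Tableau workbook.

```python
from collections import defaultdict

# Hypothetical rows: (age at diagnosis, predicted probability of the positive class).
# These values are made up for illustration and are not taken from the SEER data.
rows = [(34, 0.10), (38, 0.20), (52, 0.30), (57, 0.50), (71, 0.60), (78, 0.80)]


def age_group(age, width=10):
    """Return a label such as '30-39' for the bin of `width` years containing `age`."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def mean_probability_by_group(rows):
    """Average the predicted probabilities within each age group."""
    totals = defaultdict(lambda: [0.0, 0])  # group label -> [running sum, count]
    for age, prob in rows:
        bucket = totals[age_group(age)]
        bucket[0] += prob
        bucket[1] += 1
    # Sort by group label so the output reads from youngest to oldest bin.
    return {group: total / count for group, (total, count) in sorted(totals.items())}


averages = mean_probability_by_group(rows)
for group, avg in averages.items():
    print(f"{group}: {avg:.2f}")
```

The same aggregation could be done with a pandas `groupby`; the plain-Python version is used here only to keep the example free of dependencies.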

The PowerPoint file MorbidityClassification is also available as a .pdf.

Files included in the original repo but not in the shared repo:

75bc.csv - initial file from SEER
75edits.csv - processed file
bcandpreds.csv - test set with predictions in dataframe

Tools Used

Using Python in Jupyter Notebooks, H2O.ai was used to create Random Forest, Naive Bayes, and XGBoost models. Pandas dataframes were used for data handling. Visualizations were created in Tableau. The data was retrieved using SEER*Stat software.
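
To give a feel for what "ensemble" means for a model such as a Random Forest, the sketch below averages the probability outputs of several simple classifiers and thresholds the result. It is a toy illustration only: the three hand-written rules stand in for trained decision trees, the feature names (`stage`, `age`, `tumor_size_mm`) and the 0.5 threshold are hypothetical, and none of it reproduces the H2O models or calls used in the notebooks. XGBoost combines its trees by boosting rather than by simple averaging, so this sketch covers the averaging idea only.

```python
# Each "tree" is a hand-written rule standing in for a trained decision tree.
# Feature names and probabilities are hypothetical, not derived from SEER data.
def tree_a(patient):
    return 0.8 if patient["stage"] >= 3 else 0.1


def tree_b(patient):
    return 0.6 if patient["age"] >= 70 else 0.2


def tree_c(patient):
    return 0.7 if patient["tumor_size_mm"] > 50 else 0.15


TREES = [tree_a, tree_b, tree_c]


def ensemble_probability(patient, trees=TREES):
    """Average the per-tree probabilities, as a bagging-style ensemble would."""
    return sum(tree(patient) for tree in trees) / len(trees)


def classify(patient, trees=TREES, threshold=0.5):
    """Return True when the averaged probability reaches the (assumed) threshold."""
    return ensemble_probability(patient, trees) >= threshold


high_risk = {"stage": 3, "age": 72, "tumor_size_mm": 20}
low_risk = {"stage": 1, "age": 45, "tumor_size_mm": 20}

print(round(ensemble_probability(high_risk), 3), classify(high_risk))
print(round(ensemble_probability(low_risk), 3), classify(low_risk))
```

Averaging over many weak rules tends to smooth out the mistakes of any single one, which is the main reason ensemble methods are a common choice for tabular data like this.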

