
The purpose of this Repository is to investigate different potential supervised Machine Learning (ML) algorithms for creating binary classification models that could serve as diagnostics for heart disease.


Cleveland_Dataset

Repository by Steven Smiley

This repository hosts the files I used to analyze and evaluate the Cleveland dataset in Python.

Revision 1 changes the preprocessing so that MinMaxScaler() is applied after the train/test split, which prevents data leakage. The results of Revision 0 and Revision 1 do not differ significantly. The MLP has the same accuracy in both revisions. The SVM's accuracy is lower in Revision 1 than in Revision 0, where it had tied with the MLP. The MLP's AUC is higher in Revision 1 than in Revision 0. The MLP is therefore the best model once data leakage is removed, although the overall accuracy and AUC values of the two revisions are not significantly different. Both revisions produce excellent diagnostics.
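The leakage-free ordering used in Revision 1 (split first, then fit the scaler on the training rows only) can be sketched with scikit-learn as follows. This is a minimal illustration, not the code in Cleveland.ipynb: the 25% test size, the SVC and MLPClassifier settings, the function name `evaluate_without_leakage`, and the synthetic data from `make_classification` are all assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC


def evaluate_without_leakage(X, y, random_state=0):
    """Hypothetical helper: split, scale on train only, then score SVM and MLP."""
    # 1. Split BEFORE any scaling so the test rows stay unseen.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state
    )

    # 2. fit_transform() learns min/max from the training rows only.
    scaler = MinMaxScaler()
    X_train_s = scaler.fit_transform(X_train)
    # transform() reuses the training min/max; test values may fall outside [0, 1].
    X_test_s = scaler.transform(X_test)

    # 3. Illustrative model settings (assumptions, not the notebook's tuned values).
    models = {
        "SVM": SVC(),
        "MLP": MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                             random_state=random_state),
    }

    results = {}
    for name, model in models.items():
        model.fit(X_train_s, y_train)
        y_pred = model.predict(X_test_s)
        # AUC needs a continuous score: SVC exposes decision_function,
        # MLPClassifier exposes predict_proba.
        if hasattr(model, "decision_function"):
            scores = model.decision_function(X_test_s)
        else:
            scores = model.predict_proba(X_test_s)[:, 1]
        results[name] = {
            "accuracy": accuracy_score(y_test, y_pred),
            "auc": roc_auc_score(y_test, scores),
        }
    return scaler, results


if __name__ == "__main__":
    # Synthetic stand-in with 13 features; NOT the Cleveland data.
    X, y = make_classification(n_samples=300, n_features=13, random_state=0)
    scaler, results = evaluate_without_leakage(X, y)
    for name, metrics in results.items():
        print(f"{name}: accuracy={metrics['accuracy']:.3f}, AUC={metrics['auc']:.3f}")
```

Because the scaler's `data_min_` and `data_max_` come only from the training split, nothing about the test rows influences preprocessing, which is the leakage that Revision 1 removes. Wrapping the scaler and model in a scikit-learn `Pipeline` is a common alternative that enforces the same ordering inside cross-validation.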

Table of Contents to Repository

1 Jupyter Notebook

Jupyter Notebook(s) written in Python.

Notebook | Description
--- | ---
Cleveland.ipynb | My Jupyter notebook: analyzes the Cleveland dataset and trains and evaluates the classification models.

The notebook reads a single input file, processed.cleveland.data, which contains the entire Cleveland heart disease dataset from the UCI Machine Learning Repository [1].

processed.cleveland.data
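A minimal pandas sketch for loading this file is shown below. It assumes the standard UCI layout of processed.cleveland.data (14 comma-separated columns, no header row, missing values written as "?") and uses the attribute names from the UCI documentation. Dropping incomplete rows and treating any `num` value above 0 as the positive (disease) class are assumptions about the preprocessing, not necessarily what Cleveland.ipynb does; `load_cleveland` is a hypothetical helper name.

```python
import pandas as pd

# Attribute names from the UCI heart disease documentation (14 columns in the processed files).
COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num",
]


def load_cleveland(path_or_buffer="processed.cleveland.data"):
    """Hypothetical loader returning features X and a binary target y."""
    # The file has no header row and marks missing values with "?".
    df = pd.read_csv(path_or_buffer, header=None, names=COLUMNS, na_values="?")

    # Assumption: discard rows with any missing value rather than imputing.
    df = df.dropna().reset_index(drop=True)

    # Assumption: "num" ranges from 0 (no disease) to 4; collapse 1-4 into class 1.
    y = (df["num"] > 0).astype(int)
    X = df.drop(columns="num")
    return X, y
```

The function accepts either a path or a file-like object, so the same code works on the repository file or on an in-memory sample.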

The outputs of the Jupyter notebook are saved in two folders: Models and Figures.

4 Credits/References

  1. Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

  2. Hungarian Institute of Cardiology. Budapest: Andras Janosi, M.D.

  3. University Hospital, Zurich, Switzerland: William Steinbrunn, M.D.

  4. University Hospital, Basel, Switzerland: Matthias Pfisterer, M.D.

  5. V.A. Medical Center, Long Beach and Cleveland Clinic Foundation: Robert Detrano, M.D., Ph.D.

  6. Olson, Randal S. et al. “Data-driven advice for applying machine learning to bioinformatics problems.” Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing 23 (2017): 192-203.

  7. SciPy. Pauli Virtanen, Ralf Gommers, Travis E. Oliphant, Matt Haberland, Tyler Reddy, David Cournapeau, Evgeni Burovski, Pearu Peterson, Warren Weckesser, Jonathan Bright, Stéfan J. van der Walt, Matthew Brett, Joshua Wilson, K. Jarrod Millman, Nikolay Mayorov, Andrew R. J. Nelson, Eric Jones, Robert Kern, Eric Larson, CJ Carey, İlhan Polat, Yu Feng, Eric W. Moore, Jake VanderPlas, Denis Laxalde, Josef Perktold, Robert Cimrman, Ian Henriksen, E.A. Quintero, Charles R Harris, Anne M. Archibald, Antônio H. Ribeiro, Fabian Pedregosa, Paul van Mulbregt, and SciPy 1.0 Contributors. (2019) SciPy 1.0–Fundamental Algorithms for Scientific Computing in Python. preprint arXiv:1907.10121

  8. Python. a) Travis E. Oliphant. Python for Scientific Computing, Computing in Science & Engineering, 9, 10–20 (2007) b) K. Jarrod Millman and Michael Aivazis. Python for Scientists and Engineers, Computing in Science & Engineering, 13, 9–12 (2011)

  9. NumPy. a) Travis E. Oliphant. A guide to NumPy, USA: Trelgol Publishing, (2006). b) Stéfan van der Walt, S. Chris Colbert and Gaël Varoquaux. The NumPy Array: A Structure for Efficient Numerical Computation, Computing in Science & Engineering, 13, 22–30 (2011)

  10. IPython. a) Fernando Pérez and Brian E. Granger. IPython: A System for Interactive Scientific Computing, Computing in Science & Engineering, 9, 21–29 (2007)

  11. Matplotlib. J. D. Hunter, “Matplotlib: A 2D Graphics Environment”, Computing in Science & Engineering, vol. 9, no. 3, pp. 90–95, 2007.

  12. Pandas. Wes McKinney. Data Structures for Statistical Computing in Python, Proceedings of the 9th Python in Science Conference, 51–56 (2010)

  13. Scikit-Learn. Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, Édouard Duchesnay. Scikit-learn: Machine Learning in Python, Journal of Machine Learning Research, 12, 2825–2830 (2011)

  14. Scikit-Image. Stéfan van der Walt, Johannes L. Schönberger, Juan Nunez-Iglesias, François Boulogne, Joshua D. Warner, Neil Yager, Emmanuelle Gouillart, Tony Yu and the scikit-image contributors. scikit-image: Image processing in Python, PeerJ 2:e453 (2014)

5 Contact-Info

Feel free to contact me to discuss any issues, questions, or comments.

6 License

This repository contains a variety of content: some developed by Steven Smiley and some from third parties. The third-party content is distributed under the license provided by those parties.

The content developed by Steven Smiley is distributed under the following license:

I am providing code and resources in this repository to you under an open source license. Because this is my personal repository, the license you receive to my code and resources is from me and not my employer.

Copyright 2020 Steven Smiley

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
