Cleveland_Dataset

Repository by Steven Smiley

This repository hosts the files I used to analyze and evaluate the Cleveland dataset in Python.

** Revision 1 moves MinMaxScaler() to after the Test/Train Split in order to prevent Data Leakage. The results do not change significantly between Revision 0 and Revision 1. The MLP has the same Accuracy in Revision 1 as in Revision 0. The SVM Accuracy is lower in Revision 1 than in Revision 0, where the SVM and the MLP were tied for Accuracy. The MLP also has a higher AUC in Revision 1 than in Revision 0. Thus, the MLP is the best of the models trained without data leakage, although the overall Accuracy and AUC values are not significantly different. Excellent diagnostics in both.
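
The ordering described above can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: the data is synthetic, and the variable names, split ratio, and random seed are assumptions chosen only for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (NOT the Cleveland dataset): 100 rows, 3 features.
rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(100, 3))
y = rng.integers(0, 2, size=100)

# 1. Split first, so the test rows play no part in the preprocessing.
#    test_size and random_state are arbitrary choices for this sketch.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# 2. Fit the scaler on the training rows only.
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)

# 3. Reuse the training minimum/maximum to transform the test rows.
X_test_scaled = scaler.transform(X_test)
```

Because the scaler only ever sees the training rows, the scaled test values may fall slightly outside the [0, 1] range. That is expected and is the price of keeping the test set unseen.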

Table of Contents to Repository

1. Jupyter Notebook
2. Inputs
- data.csv
3. Outputs
- Models
- Figures
4. Credits/References
5. Contact-Info
6. License

1 Jupyter Notebook

Jupyter Notebook(s) written in Python.

Notebook	Description
Cleveland.ipynb	My Jupyter notebook.

2 Inputs

Single input file (processed.cleveland.data) contains all of the information for the Cleveland dataset.

processed.cleveland.data
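
One possible way to read this file with pandas is sketched below. It assumes the standard UCI layout (comma-separated, no header row, "?" marking missing values) and the 14 attribute names listed in heart-disease.names. The helper name load_cleveland and the path in the comment are hypothetical, and the two demo rows are made up for illustration, not taken from the dataset.

```python
import io

import pandas as pd

# Attribute names as given in heart-disease.names for the processed files.
COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num",
]


def load_cleveland(source):
    """Read a processed.cleveland.data-style file (hypothetical helper).

    `source` may be a file path or a file-like object. "?" entries are
    converted to NaN so they can be handled explicitly later.
    """
    return pd.read_csv(source, header=None, names=COLUMNS, na_values="?")


# Demo with two made-up rows in the same layout (NOT real patient records).
# With the real file, the call would look something like:
#     df = load_cleveland("Inputs/processed.cleveland.data")  # assumed path
demo = io.StringIO(
    "50.0,1.0,3.0,130.0,240.0,0.0,0.0,160.0,0.0,1.0,1.0,0.0,3.0,0\n"
    "60.0,0.0,4.0,140.0,260.0,0.0,2.0,140.0,1.0,2.0,2.0,?,7.0,2\n"
)
df = load_cleveland(demo)
```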

3 Outputs

The outputs from the Jupyter notebook are placed in the following two folders: Models and Figures.

4 Credits/References

Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
Hungarian Institute of Cardiology. Budapest: Andras Janosi, M.D.
University Hospital, Zurich, Switzerland: William Steinbrunn, M.D.
University Hospital, Basel, Switzerland: Matthias Pfisterer, M.D.
V.A. Medical Center, Long Beach and Cleveland Clinic Foundation: Robert Detrano, M.D., Ph.D.
Olson, Randal S. et al. “Data-driven advice for applying machine learning to bioinformatics problems.” Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing 23 (2017): 192-203.
SciPy. Pauli Virtanen, Ralf Gommers, Travis E. Oliphant, Matt Haberland, Tyler Reddy, David Cournapeau, Evgeni Burovski, Pearu Peterson, Warren Weckesser, Jonathan Bright, Stéfan J. van der Walt, Matthew Brett, Joshua Wilson, K. Jarrod Millman, Nikolay Mayorov, Andrew R. J. Nelson, Eric Jones, Robert Kern, Eric Larson, CJ Carey, İlhan Polat, Yu Feng, Eric W. Moore, Jake VanderPlas, Denis Laxalde, Josef Perktold, Robert Cimrman, Ian Henriksen, E.A. Quintero, Charles R Harris, Anne M. Archibald, Antônio H. Ribeiro, Fabian Pedregosa, Paul van Mulbregt, and SciPy 1.0 Contributors. (2019) SciPy 1.0–Fundamental Algorithms for Scientific Computing in Python. preprint arXiv:1907.10121
Python. a) Travis E. Oliphant. Python for Scientific Computing, Computing in Science & Engineering, 9, 10–20 (2007) b) K. Jarrod Millman and Michael Aivazis. Python for Scientists and Engineers, Computing in Science & Engineering, 13, 9–12 (2011)
NumPy. a) Travis E. Oliphant. A guide to NumPy, USA: Trelgol Publishing, (2006). b) Stéfan van der Walt, S. Chris Colbert and Gaël Varoquaux. The NumPy Array: A Structure for Efficient Numerical Computation, Computing in Science & Engineering, 13, 22–30 (2011)
IPython. a) Fernando Pérez and Brian E. Granger. IPython: A System for Interactive Scientific Computing, Computing in Science & Engineering, 9, 21–29 (2007)
Matplotlib. J. D. Hunter, “Matplotlib: A 2D Graphics Environment”, Computing in Science & Engineering, vol. 9, no. 3, pp. 90–95, 2007.
Pandas. Wes McKinney. Data Structures for Statistical Computing in Python, Proceedings of the 9th Python in Science Conference, 51–56 (2010)
Scikit-Learn. Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, Édouard Duchesnay. Scikit-learn: Machine Learning in Python, Journal of Machine Learning Research, 12, 2825–2830 (2011)
Scikit-Image. Stéfan van der Walt, Johannes L. Schönberger, Juan Nunez-Iglesias, François Boulogne, Joshua D. Warner, Neil Yager, Emmanuelle Gouillart, Tony Yu and the scikit-image contributors. scikit-image: Image processing in Python, PeerJ 2:e453 (2014)

5 Contact-Info

Feel free to contact me to discuss any issues, questions, or comments.

6 License

This repository contains a variety of content; some developed by Steven Smiley, and some from third-parties. The third-party content is distributed under the license provided by those parties.

The content developed by Steven Smiley is distributed under the following license:

*I am providing code and resources in this repository to you under an open source license. Because this is my personal repository, the license you receive to my code and resources is from me and not my employer.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
