
# ❤️ Attack Prediction
<img src='https://2rdnmg1qbg403gumla1v9i2h-wpengine.netdna-ssl.com/wp-content/uploads/sites/3/2020/01/mildHeartAttack-866257238-770x553-745x490.jpg' height=500 width=500/>

#### About data set
This database contains 76 attributes, but all published experiments use a subset of 14 of them. In particular, the Cleveland database is the only one that ML researchers have used to date. The "target" field refers to the presence of heart disease in the patient. It is integer-valued: 0 = no/less chance of heart attack and 1 = more chance of heart attack.

#### Attribute Information
- age
- sex
- chest pain type (4 values)
- resting blood pressure
- serum cholesterol in mg/dl
- fasting blood sugar > 120 mg/dl
- resting electrocardiographic results (values 0,1,2)
- maximum heart rate achieved
- exercise induced angina
- oldpeak = ST depression induced by exercise relative to rest
- the slope of the peak exercise ST segment
- number of major vessels (0-3) colored by fluoroscopy
- thal: 0 = normal; 1 = fixed defect; 2 = reversible defect
- target: 0 = less chance of heart attack; 1 = more chance of heart attack
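
The attribute list above does not give the column names used in `heart.csv`. The mapping below is a sketch based on the commonly distributed version of the file. The abbreviations are an assumption here (only `sex`, `thal`, and `target` appear in this notebook's code), and `describe_columns` is a hypothetical helper, so check the names against `df.columns` after loading.

```python
# Assumed column abbreviations for heart.csv (verify with df.columns).
COLUMN_DESCRIPTIONS = {
    "age": "age",
    "sex": "sex (1 = male, 0 = female)",
    "cp": "chest pain type (4 values)",
    "trestbps": "resting blood pressure",
    "chol": "serum cholesterol in mg/dl",
    "fbs": "fasting blood sugar > 120 mg/dl",
    "restecg": "resting electrocardiographic results (values 0, 1, 2)",
    "thalach": "maximum heart rate achieved",
    "exang": "exercise induced angina",
    "oldpeak": "ST depression induced by exercise relative to rest",
    "slope": "slope of the peak exercise ST segment",
    "ca": "number of major vessels (0-3) colored by fluoroscopy",
    "thal": "0 = normal; 1 = fixed defect; 2 = reversible defect",
    "target": "0 = less chance of heart attack; 1 = more chance",
}


def describe_columns(columns):
    """Return {column: description}, flagging names missing from the mapping."""
    return {c: COLUMN_DESCRIPTIONS.get(c, "unknown column") for c in columns}


# Example: look up two known columns and one that is not in the mapping.
print(describe_columns(["age", "chol", "not_a_column"]))
```

Once the data is loaded, `describe_columns(df.columns)` would show quickly whether the assumed names match the file.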

## <a id='toc'>Table of Contents</a>
1. [Overview](#1)
2. [Pandas Profiling](#2)
3. [SweetViz](#3)
4. [Data Analysis Baseline Library](#4)
5. [AutoViz](#5)
6. [Pycaret](#6)
7. [Datasist](#7)

# <a id='1'>1. Overview</a>
<a href='#toc'><span class="label label-info">Go back to the Table of Contents</span></a>
## Exploratory Data Analysis 
### Playing with Automated EDA Tools 
Exploratory Data Analysis (EDA) plays a very important role in understanding a dataset. Whether you are building a machine learning model or just trying to draw insights from the data, EDA is the first task to perform. Important as it is, the effort of performing EDA grows with the number of columns in your dataset.
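
To make the point concrete, here is a minimal sketch of a hand-written overview. The helper name `quick_overview` and the toy frame are illustrative and not part of any library. Every extra column adds another row to inspect, which is the kind of work the tools in this notebook automate.

```python
import pandas as pd


def quick_overview(frame: pd.DataFrame) -> pd.DataFrame:
    """One row per column: dtype, missing count, and number of distinct values."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "missing": frame.isna().sum(),
        "unique": frame.nunique(),
    })


# Toy data standing in for a real dataset (values are made up).
toy = pd.DataFrame({
    "age": [63, 37, 41, None],
    "sex": [1, 1, 0, 0],
    "target": [1, 1, 0, 0],
})
overview = quick_overview(toy)
print(overview)
```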

In [None]:
#Loading The Libraries

#For uploading and accessing the data
import pandas as pd
import numpy as np

#For visualizations
import matplotlib.pyplot as plt
import seaborn as sns
!pip install dexplot -q
!pip install dabl -q
!pip install sweetviz -q
!pip install autoviz -q
!pip install pycaret -q
!pip install datasist -q




# for visualizations
plt.style.use('fivethirtyeight')

# for interactive visualizations
import plotly.offline as py
from plotly.offline import init_notebook_mode, iplot
import plotly.graph_objs as go
from plotly import tools
init_notebook_mode(connected = True)
import plotly.figure_factory as ff
from sklearn.preprocessing import StandardScaler

from pandas_profiling import ProfileReport
import sweetviz as sv

import dexplot as dxp
import dabl


In [None]:
df = pd.read_csv("../input/health-care-data-set-on-heart-attack-possibility/heart.csv")

datA = ff.create_table(df.head())

py.iplot(datA)

In [None]:
df.info()

In [None]:
df.describe()

In [None]:
df_plot = df.copy()
di = {1: "Male", 0: "Female"}
dj = {0 : 'normal', 1 :'fixed defect', 2 : 'reversible defect'}
dk = {0:'Less chance of Heart Attack',1:'High Chance of Heart Attack'}


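# Assign the result back: replace(inplace=True) on a column selection may not update df_plot in newer pandas
df_plot['sex'] = df_plot['sex'].replace(di)
df_plot['thal'] = df_plot['thal'].replace(dj)
df_plot['target'] = df_plot['target'].replace(dk)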
print(df_plot)

# <a id='2'>2. Pandas Profiling</a>
<a href='#toc'><span class="label label-info">Go back to the Table of Contents</span></a>
#### From GitHub:
##### Generates profile reports from a pandas DataFrame. The pandas df.describe() function is great but a little basic for serious exploratory data analysis. pandas_profiling extends the pandas DataFrame with df.profile_report() for quick data analysis.

##### For each column the following statistics - if relevant for the column type - are presented in an interactive HTML report:

* Type inference: detect the types of columns in a dataframe.
* Essentials: type, unique values, missing values
* Quantile statistics like minimum value, Q1, median, Q3, maximum, range, interquartile range
* Descriptive statistics like mean, mode, standard deviation, sum, median absolute deviation, coefficient of variation, kurtosis, skewness
* Most frequent values
* Histogram
* Correlations: highlighting of highly correlated variables, Spearman, Pearson and Kendall matrices
* Missing values: matrix, count, heatmap and dendrogram of missing values
* Text analysis: learn about categories (Uppercase, Space), scripts (Latin, Cyrillic) and blocks (ASCII) of text data
* File and Image analysis: extract file sizes, creation dates and dimensions, and scan for truncated images or those containing EXIF information
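
For intuition, the quantile and descriptive statistics listed above can be computed by hand with plain pandas. This is only a sketch on a toy series. The exact definitions the report uses (for example, sample versus population standard deviation) are not confirmed here.

```python
import pandas as pd

values = pd.Series([1, 2, 3, 4, 5], name="toy")

# Quartiles with pandas' default linear interpolation.
q1, median, q3 = values.quantile([0.25, 0.5, 0.75])

stats = {
    "min": values.min(),
    "Q1": q1,
    "median": median,
    "Q3": q3,
    "max": values.max(),
    "range": values.max() - values.min(),
    "IQR": q3 - q1,
    "mean": values.mean(),
    "std": values.std(),                      # sample standard deviation (ddof=1)
    "sum": values.sum(),
    "MAD": (values - median).abs().median(),  # median absolute deviation
    "CV": values.std() / values.mean(),       # coefficient of variation
    "kurtosis": values.kurt(),                # excess kurtosis
    "skewness": values.skew(),
}

for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```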

In [None]:
profile = ProfileReport(df_plot, title='Pandas Profiling Report')

In [None]:
profile.to_notebook_iframe()

# <a id='3'>3. SweetViz</a>
<a href='#toc'><span class="label label-info">Go back to the Table of Contents</span></a><img src = "https://camo.githubusercontent.com/4d5936eda5ab56d2cef1975e6a750b3b376ef90b/687474703a2f2f636f6f6c74696d696e672e636f6d2f53562f6c6f676f2e706e67">
<br>
<b>Sweetviz is an open source Python library that generates beautiful, high-density visualizations to kickstart EDA (Exploratory Data Analysis) with a single line of code. Output is a fully self-contained HTML application. The system is built around quickly visualizing target values and comparing datasets. Its goal is to support quick analysis of target characteristics, training vs testing data, and other such data characterization tasks.</b>
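
The "training vs testing" comparison can be pictured with a few lines of pandas. The sketch below is not how Sweetviz works internally. `compare_frames` is a hypothetical helper and the two small frames are made-up data, shown only to illustrate what comparing datasets means.

```python
import pandas as pd


def compare_frames(train: pd.DataFrame, test: pd.DataFrame) -> pd.DataFrame:
    """Side-by-side mean of each shared numeric column, plus the difference."""
    shared = train.columns.intersection(test.columns)
    summary = pd.DataFrame({
        "train_mean": train[shared].mean(numeric_only=True),
        "test_mean": test[shared].mean(numeric_only=True),
    })
    summary["difference"] = summary["test_mean"] - summary["train_mean"]
    return summary


# Made-up splits: the test set skews younger than the training set.
train = pd.DataFrame({"age": [50, 60, 70], "chol": [200, 220, 240]})
test = pd.DataFrame({"age": [40, 50], "chol": [210, 230]})

comparison = compare_frames(train, test)
print(comparison)
```

A large value in the `difference` column hints that the two splits are distributed differently, which is the kind of signal a full comparison report surfaces visually.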

In [None]:
my_report = sv.analyze(df,target_feat='target')
my_report.show_html()

In [None]:
from IPython.display import HTML

HTML(filename="./SWEETVIZ_REPORT.html")

# <a id='4'>4. Data Analysis Baseline Library</a>
<a href='#toc'><span class="label label-info">Go back to the Table of Contents</span></a>
###### Dabl offers a tool to automate many of the repetitive tasks involved in the initial model development phase. Although its features are currently a little limited, the functionality that does exist is very promising. Judging by the GitHub repo, development is very much ongoing, with the latest commit to the project as recent as November 15th, 2019.

### Data Preprocessing
##### Ordinarily, the first step of a machine learning project is to perform EDA and any necessary preprocessing before training a model. Dabl seeks to automate this process. If you run the following commands, dabl will attempt to identify missing values, feature types, and erroneous data.
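
To give a feel for what feature-type detection involves, here is a deliberately simple heuristic in plain pandas. It is not dabl's algorithm. The function name `guess_feature_types` and the `max_categories` threshold are assumptions made for illustration.

```python
import pandas as pd


def guess_feature_types(frame: pd.DataFrame, max_categories: int = 5) -> pd.Series:
    """Label each column as 'constant', 'categorical', or 'continuous'."""
    labels = {}
    for name in frame.columns:
        n_unique = frame[name].nunique(dropna=True)
        if n_unique <= 1:
            labels[name] = "constant"
        elif not pd.api.types.is_numeric_dtype(frame[name]) or n_unique <= max_categories:
            labels[name] = "categorical"
        else:
            labels[name] = "continuous"
    return pd.Series(labels, name="guessed_type")


# Toy data (made up): one continuous, one binary, and one constant column.
toy = pd.DataFrame({
    "age": [63, 37, 41, 56, 57, 44, 52, 58],
    "sex": [1, 1, 0, 1, 0, 1, 1, 0],
    "source": ["cleveland"] * 8,
})

guessed = guess_feature_types(toy)
missing_per_column = toy.isna().sum()
print(guessed)
print(missing_per_column)
```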

In [None]:
clean_data = dabl.clean(df, verbose=1)
clean_data.describe()

In [None]:
types = dabl.detect_types(clean_data)
types

###### Dabl detects feature types and automatically cleans the data, which makes analysing the data extremely fast.

In [None]:
dabl.plot(clean_data, 'target')

###### Dabl also seeks to speed up the model selection process. With very little code, dabl trains a selection of scikit-learn models and returns the corresponding scores.
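
Written out by hand with scikit-learn, the idea looks roughly like the sketch below. The candidate models, the synthetic data from `make_classification`, and the 5-fold accuracy scoring are all assumptions chosen for illustration. They are not a description of what dabl runs internally.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the heart data: 300 rows, 13 features, binary target.
X, y = make_classification(n_samples=300, n_features=13, random_state=0)

# A small set of cheap baseline models, including a trivial one for reference.
candidates = {
    "dummy": DummyClassifier(strategy="prior"),
    "naive_bayes": GaussianNB(),
    "shallow_tree": DecisionTreeClassifier(max_depth=3, random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
}

# Mean 5-fold cross-validated accuracy for each candidate.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}

for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Including a dummy model gives a floor: any real model that does not beat it is not learning anything useful from the features.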


In [None]:
dabl_classifier = dabl.SimpleClassifier(random_state=0)

In [None]:
X = clean_data.drop('target', axis=1)
sc = StandardScaler()
X = sc.fit_transform(X)
y = clean_data.target
dabl_classifier.fit(X, y)

# <a id='5'>5. AutoViz</a>
<a href='#toc'><span class="label label-info">Go back to the Table of Contents</span></a><img src = "https://github.com/AutoViML/AutoViz/raw/master/logo.png">
<br>
AutoViz is a one-click visualization engine: It creates powerful charts that anyone from a beginner to an expert can use.

AutoViz knows creating charts from any data manually is hard: It's even harder when you don't know what's in it. AutoViz starts by first analyzing your data to know if it is a Classification, Regression, Unsupervised or Time Series problem. It then chooses the best charts to maximize your insights...
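
As a rough illustration of that first step, the sketch below guesses the problem type from the target column alone. It is not AutoViz's actual logic. `guess_problem_type` and its `max_classes` threshold are hypothetical, and time series detection is left out entirely.

```python
import pandas as pd


def guess_problem_type(frame: pd.DataFrame, target=None, max_classes: int = 10) -> str:
    """Guess 'unsupervised', 'classification', or 'regression' from the target column."""
    if target is None:
        return "unsupervised"
    column = frame[target]
    # Non-numeric targets, or numeric ones with few distinct values, look like classes.
    if not pd.api.types.is_numeric_dtype(column) or column.nunique() <= max_classes:
        return "classification"
    return "regression"


# Toy data (made up): a binary column and a continuous column with 12 distinct values.
toy = pd.DataFrame({
    "target": [0, 1] * 6,
    "oldpeak": [i * 0.3 for i in range(12)],
})

print(guess_problem_type(toy, "target"))
print(guess_problem_type(toy, "oldpeak"))
print(guess_problem_type(toy))
```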

In [None]:
from autoviz.AutoViz_Class import AutoViz_Class

AV = AutoViz_Class()

In [None]:
sep = ','
target = 'target'
filename = '../input/health-care-data-set-on-heart-attack-possibility/heart.csv'

In [None]:
dft = AV.AutoViz(filename, sep=sep, depVar=target, dfte=df_plot, header=0, verbose=2,
                            lowess=False,chart_format='svg',max_rows_analyzed=1500,max_cols_analyzed=30)

# <a id='6'>6. Pycaret</a>
<a href='#toc'><span class="label label-info">Go back to the Table of Contents</span></a>
<img src = "https://github.com/pycaret/pycaret/raw/master/pycaret2-features.png">

###### PyCaret is an open source, low-code machine learning library in Python that aims to reduce the hypothesis-to-insights cycle time in an ML experiment. It enables data scientists to perform end-to-end experiments quickly and efficiently. Compared with other open source machine learning libraries, PyCaret is a low-code alternative that can perform complex machine learning tasks with only a few lines of code. PyCaret is essentially a Python wrapper around several machine learning libraries and frameworks such as scikit-learn, XGBoost, Microsoft LightGBM, spaCy and many more.

In [None]:
from pycaret.classification import *
classifier = setup(df, target = 'target', session_id=42, experiment_name='heart',normalize=True,silent=True)

#### Base Model Comparison 

In [None]:
# PyCaret 2.1+ renamed the `blacklist` parameter to `exclude`
best_model = compare_models(exclude=['nb','svm','qda'])

#### Create Custom Models

In [None]:
lr = create_model('lr', fold = 10)

In [None]:
rf = create_model('rf', fold = 5)

In [None]:
models(type='ensemble').index.tolist()

In [None]:
# PyCaret 2.1+ renamed the `whitelist` parameter to `include`
ensembled_models = compare_models(include = models(type='ensemble').index.tolist(), fold = 3)

#### Hyperparameter Tuning

In [None]:
tuned_lr = tune_model(lr)

#### Analyze Model with Plot Curves
##### ROC Curve

In [None]:
plot_model(lr)

In [None]:
plot_model(rf)

##### Confusion Matrix

In [None]:
plot_model(lr, plot = 'confusion_matrix')

In [None]:
plot_model(rf, plot = 'confusion_matrix')


#### Feature Importance Plot

In [None]:
plot_model(rf, plot = 'feature')

#### Classification Report

In [None]:
plot_model(lr, plot = 'class_report')

In [None]:
plot_model(rf, plot = 'class_report')


# <a id='7'>7. Datasist</a>
<a href='#toc'><span class="label label-info">Go back to the Table of Contents</span></a>

##### Datasist is a Python package providing a fast, abstracted interface to popular and frequently used functions or techniques relating to data analysis, visualization, data exploration, feature engineering, computer vision, NLP, deep learning, modeling, model deployment, etc.

<img src = "https://gblobscdn.gitbook.com/assets%2F-LyFPbiBAwD4nrB_qWNH%2F-LyFQzeAX6C4CZz4koDG%2F-LyFaP2OcHN8Lhyxyw2Y%2Fdlogo.jpeg?alt=media&token=39ed7ff9-a020-4011-86c4-e047d7d12e46">

In [None]:
import datasist as ds
# Quick summary of the data using the describe function in the structdata module
ds.structdata.describe(df_plot)

In [None]:
# Show categorical features
cat_feats = ds.structdata.get_cat_feats(df_plot)
cat_feats

In [None]:
# Show Numerical features
num_feats = ds.structdata.get_num_feats(df_plot)
num_feats

In [None]:
# Get Unique Count
ds.structdata.get_unique_counts(df_plot)

In [None]:
# Visualize Missing
ds.visualizations.plot_missing(df_plot)

In [None]:
ds.structdata.display_missing(df_plot)

In [None]:
#check the unique classes in each categorical feature
ds.structdata.class_count(df_plot)

In [None]:
# Autoviz Integration
ds.visualizations.autoviz(df_plot)

In [None]:
#VISUALIZATION FOR CATEGORICAL FEATURES
ds.visualizations.countplot(df_plot)

In [None]:
ds.visualizations.catbox(data=df_plot, target='target', fig_size=(10,7))

In [None]:
ds.visualizations.boxplot(data=df_plot, target='target', fig_size=(5,5))

In [None]:
ds.visualizations.catbox(data=df_plot, target='target')

In [None]:
ds.visualizations.violinplot(data=df_plot, target='target')

### Made with ❤️ 