# ICMR Healthcare: Cancer Type Detection and Gene Type Analysis

* Data Analyst : Sheri Prashanth Reddy 

## DESCRIPTION

### Problem Statement: 

ICMR wants to analyze different types of cancer, such as breast, renal, colon, lung, and prostate cancer, which have become a cause of worry in recent years. They would like to identify the probable cause of these cancers in terms of the genes responsible for each cancer type. This would allow early identification of each type of cancer and so reduce the fatality rate.

### Dataset Details: 

The input dataset contains 802 samples from 802 people who have been diagnosed with different types of cancer. Each sample contains expression values for more than 20K genes. Each sample has one of the following tumor types: BRCA, KIRC, COAD, LUAD, or PRAD.


### Project tasks are divided into 4 weeks

## Week 1:-  Exploratory Data Analysis


#### Project Task: Week 1:

Exploratory Data Analysis:

Merge both the datasets.

Plot the merged dataset as a hierarchically-clustered heatmap.

Perform Null-hypothesis testing.
 

In [1]:
import pandas as pd
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline



from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
colors = ['royalblue','red','deeppink', 'maroon', 'mediumorchid', 'tan', 'forestgreen', 'olive', 'goldenrod', 'lightcyan', 'navy']
vectorizer = np.vectorize(lambda x: colors[x % len(colors)])

import warnings
warnings.filterwarnings(action='ignore',category=DeprecationWarning)
warnings.filterwarnings(action='ignore',category=FutureWarning)

### Week1:- Load the labels and the data into dataframes

In [1]:
label = pd.read_csv('/kaggle/input/icmr-health-care/labels.csv',delimiter=',',engine='python')
data = pd.read_csv('/kaggle/input/icmr-health-care/data.csv',delimiter=',',engine='python')
data.describe()

### Week1:- Merge data set 

In [1]:
master_data = pd.merge(label,data)
master_data.head()

In [1]:
master_data.isnull().sum()

In [1]:
master_data.describe()

### Week1:- Plot the merged dataset as a hierarchically-clustered heatmap.

In [1]:
heatmap_data = pd.pivot_table(master_data, index=['Class'])
                              
heatmap_data.head()

In [1]:
sns.clustermap(heatmap_data)
# savefig has no figsize argument; the figure size is set on clustermap itself (see the next cell)
plt.savefig('heatmap_with_Seaborn_clustermap_python.jpg', dpi=150)

In [1]:
sns.clustermap(heatmap_data, figsize=(18,12))
plt.savefig('clustered_heatmap_with_dendrograms_Seaborn_clustermap_python.jpg',dpi=150)


### Week1:- Perform Null Hypothesis testing 

### Checking a histogram of the class labels to see how the samples are distributed across cancer types

In [1]:
plt.figure(figsize=(14,6))
plt.hist(master_data['Class'])
plt.show()

In [1]:
non_cat_data = master_data.drop(['Unnamed: 0'], axis=1)
non_cat_data

### Week1:- F Test

The F-test is named after its test statistic, F, which was named in honor of Sir Ronald Fisher. The F-statistic is simply a ratio of two variances. Variance is a measure of dispersion, or how far the data are scattered from the mean. Larger values represent greater dispersion.

Variance is the square of the standard deviation. For us humans, standard deviations are easier to understand than variances because they’re in the same units as the data rather than squared units. However, many analyses actually use variances in the calculations.

F-statistics are based on the ratio of mean squares. The term “mean squares” may sound confusing but it is simply an estimate of population variance that accounts for the degrees of freedom (DF) used to calculate that estimate.
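
As a rough illustration of the ratio of mean squares described above, the sketch below computes the F-statistic by hand and compares it with `scipy.stats.f_oneway`. The three groups are synthetic values for a hypothetical gene, not data from this project.

```python
import numpy as np
from scipy import stats

# Synthetic expression values for one hypothetical gene in three groups
# (assumption: these numbers are made up for illustration only).
rng = np.random.default_rng(0)
groups = [rng.normal(loc=m, scale=1.0, size=30) for m in (5.0, 5.5, 7.0)]

k = len(groups)                      # number of groups
n = sum(len(g) for g in groups)      # total number of samples
grand_mean = np.concatenate(groups).mean()

# Between-group mean square: spread of the group means around the grand mean,
# divided by its degrees of freedom (k - 1).
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ms_between = ss_between / (k - 1)

# Within-group mean square: spread of samples around their own group mean,
# divided by its degrees of freedom (n - k).
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
ms_within = ss_within / (n - k)

f_manual = ms_between / ms_within
f_scipy, p_value = stats.f_oneway(*groups)
print(f_manual, f_scipy, p_value)
```

The two F values should agree up to floating-point error, which shows that the one-way F-test is exactly this ratio of two variance estimates.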

In [1]:
df_f_test=master_data

In [1]:
def f_test(df_f_test, gene):
    # One-way ANOVA: does the mean expression of `gene` differ across the five cancer types?
    df_anova = df_f_test[[gene, 'Class']]
    grps = pd.unique(df_anova.Class.values)
    d_data = {grp: df_anova[gene][df_anova.Class == grp] for grp in grps}
    F, p = stats.f_oneway(d_data['LUAD'], d_data['PRAD'], d_data['BRCA'], d_data['KIRC'], d_data['COAD'])
    print("p_values:-", p)
    if p < 0.05:
        print("reject null hypothesis")
    else:
        # A large p-value does not prove the null hypothesis; we only fail to reject it
        print("fail to reject null hypothesis")

In [1]:
f_test(df_f_test,"gene_3")

In [1]:
f_test(df_f_test,"gene_7")

In [1]:
f_test(df_f_test,"gene_20524")

In [1]:
f_test(df_f_test,"gene_5")

In [1]:
# drop() returns a new DataFrame, so master_data keeps its original string labels
df_cat_data = master_data.drop(['Unnamed: 0'], axis=1)
df_cat_data['Class'] = df_cat_data['Class'].map({'PRAD': 1, 'LUAD': 2, 'BRCA': 3, 'KIRC': 4, 'COAD': 5})

### Shapiro test 

#### The null hypothesis for the Shapiro-Wilk test is that a variable is normally distributed in some population. A different way to say the same is that a variable's values are a simple random sample from a normal distribution. As a rule of thumb, we reject the null hypothesis if p < 0.05
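
To make the decision rule concrete, here is a minimal sketch that runs the Shapiro-Wilk test on two synthetic samples, one drawn from a normal distribution and one from a strongly skewed (exponential) distribution. The samples and their sizes are assumptions chosen for illustration.

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(0)
# Synthetic 1-D samples (illustrative only, not project data)
samples = {
    "normal": rng.normal(loc=0.0, scale=1.0, size=500),
    "skewed": rng.exponential(scale=1.0, size=500),
}

# shapiro returns (W statistic, p-value) for a single 1-D sample
results = {name: shapiro(sample) for name, sample in samples.items()}

for name, (stat, p) in results.items():
    verdict = "reject normality" if p < 0.05 else "no evidence against normality"
    print(f"{name}: W={stat:.3f}, p={p:.3g} -> {verdict}")
```

The test works on one 1-D sample at a time, so for a table of gene expression values it would be applied column by column.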

In [1]:
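from scipy.stats import shapiro
# Shapiro-Wilk expects a single 1-D sample, so test one gene at a time
# (gene_3 is used here as an example) rather than the whole DataFrame
stat, p = shapiro(df_cat_data['gene_3'])
print('stat=%.2f, p=%.30f' %(stat, p))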

if p > 0.05:
    print('Normal Distribution')
else:
    print('Not Normal')

### K2 test

#### In statistics, D'Agostino's K2 test, named for Ralph D'Agostino, is a goodness-of-fit measure of departure from normality; that is, the test aims to establish whether or not the given sample comes from a normally distributed population.

In [1]:
#K2 normality test 
from scipy.stats import normaltest
k2_test = df_cat_data['Class']

stat, p = normaltest(k2_test)
print('stat=%.2f, p=%.30f' %(stat, p))

if p > 0.05:
    print('Normal Distribution')
else:
    print('Not Normal')


## Week 2:- Dimensionality Reduction

#### Project Task: Week 2: 

Dimensionality Reduction:

Each sample has expression values for around 20K genes. However, it may not be necessary to include the expression values of all 20K genes to analyze each cancer type. Therefore, we will identify a smaller set of attributes, which will then be used to fit multiclass classification models. So, the first task targets dimensionality reduction using techniques such as PCA, LDA, and t-SNE.
Input: Complete dataset including all genes (20531)
Output: Selected Genes from each dimensionality reduction method
 

### Dimensionality Reduction using PCA 

In [1]:
# Define data 
df_pca = master_data.drop(['Unnamed: 0'], axis=1)
df_pca = df_pca.drop(['Class'], axis=1)
df_pca.head()

In [1]:
df_pca.values.shape

In [1]:
x_pca = df_pca.values

### Week2:- Scaling the data using standard scaler method

In [1]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_Scaled = scaler.fit_transform(x_pca)
X_Scaled

### Week2:- Perform PCA with n_components=2

#### Principal Component Analysis, or PCA, is a dimensionality-reduction method that is often used to reduce the dimensionality of large data sets, by transforming a large set of variables into a smaller one that still contains most of the information in the large set.

#### Reducing the number of variables of a data set naturally comes at the expense of accuracy, but the trick in dimensionality reduction is to trade a little accuracy for simplicity: smaller data sets are easier to explore and visualize, and machine learning algorithms can analyze them faster without extraneous variables to process.

#### So to sum up, the idea of PCA is simple: reduce the number of variables of a data set while preserving as much information as possible.
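
The idea can be sketched with plain NumPy: center the data, take a singular value decomposition, and keep the directions that carry most of the variance. The data below is synthetic (five correlated features built from two hidden factors), so the numbers are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: 100 samples, 5 features driven by 2 hidden factors plus small noise
latent = rng.normal(size=(100, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.05 * rng.normal(size=(100, 5))

# PCA works on centered data
X_centered = X - X.mean(axis=0)

# Rows of Vt are the principal directions, ordered by decreasing singular value
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Variance carried by each component, and its share of the total
explained_variance = S ** 2 / (X.shape[0] - 1)
explained_ratio = explained_variance / explained_variance.sum()

# Project onto the first two principal components
n_components = 2
X_reduced = X_centered @ Vt[:n_components].T
print(X_reduced.shape, explained_ratio[:n_components].sum())
```

Because the synthetic features come from only two factors, two components retain almost all of the variance. Real expression data will usually need more.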


In [1]:
# Import PCA from sklearn and define the n_components as 2 
from sklearn.decomposition import PCA
pca_with_2=PCA(n_components=2)

In [1]:
#Perform fit transform on the scaled data
X_pca_with_2 = pca_with_2.fit_transform(X_Scaled)
X_pca_with_2.shape

In [1]:
X_pca_with_2

In [1]:
# Put the data back on the 2 columns defined 
df_pca = pd.DataFrame(X_pca_with_2)
df_pca.columns = ['pca1','pca2']

# Add the converted categorical class labels
df_pca['cancer_type']=df_cat_data['Class']
df_pca

In [1]:
# Present the data on the 5 clusters using seaborn maps 
sns.scatterplot(x='pca1',y='pca2', hue = 'cancer_type',data=df_pca)

### Week2:- PCA with n_components=.995

In [1]:
pca_with_995=PCA(.995)
X_pca_with_995 = pca_with_995.fit_transform(x_pca)
X_pca_with_995.shape
X_pca_with_995

In [1]:
df_pca_995 = pd.DataFrame(X_pca_with_995)
df_pca_995['cancer_type']=df_cat_data['Class']
df_pca_995

In [1]:
sns.scatterplot(x=0,y=1,hue = 'cancer_type', data=df_pca_995)

### Week2:- Dimensionality reduction using  TSNE

t-SNE is a tool to visualize high-dimensional data. It converts similarities between data points to joint probabilities and tries to minimize the Kullback-Leibler divergence between the joint probabilities of the low-dimensional embedding and the high-dimensional data. t-SNE has a cost function that is not convex, i.e. with different initializations we can get different results.
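
The Kullback-Leibler divergence mentioned above measures how much one probability distribution differs from another. The sketch below computes it for small hand-made distributions; `kl_divergence` is a hypothetical helper written for this illustration, not part of scikit-learn's t-SNE.

```python
import numpy as np

def kl_divergence(p, q):
    """KL(P || Q) for two discrete distributions given as arrays of probabilities."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # terms with p == 0 contribute nothing to the sum
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Illustrative distributions: q_good is close to p, q_bad is far from it
p = np.array([0.5, 0.3, 0.2])
q_good = np.array([0.45, 0.35, 0.20])
q_bad = np.array([0.1, 0.1, 0.8])

kl_same = kl_divergence(p, p)
kl_good = kl_divergence(p, q_good)
kl_bad = kl_divergence(p, q_bad)
print(kl_same, kl_good, kl_bad)
```

The divergence is zero when the two distributions match and grows as they move apart, which is why t-SNE uses it as the quantity to minimize.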



In [1]:
df_tsne_data = master_data
non_numeric = ['Unnamed: 0','Class']
df_tsne_data = df_tsne_data.drop(non_numeric, axis=1)
df_tsne_data

In [1]:
#import T-SNE from sklearn
from sklearn.manifold import TSNE
m = TSNE(learning_rate=50)

In [1]:
tnse_features = m.fit_transform(df_tsne_data)
tnse_features[1:4,:]

In [1]:
df_tsne_data['x'] = tnse_features[:,0]
df_tsne_data['y'] = tnse_features[:,1]

sns.scatterplot(x='x',y='y',data=df_tsne_data)
plt.show()

In [1]:
df_tsne_data['cancer_type']=df_cat_data['Class']
sns.scatterplot(x='x',y='y',hue = 'cancer_type', data=df_tsne_data)
plt.show()

### Week2:- Dimensionality reduction using LDA 

#### Linear Discriminant Analysis, or LDA for short, is a predictive modeling algorithm for multi-class classification. It can also be used as a dimensionality reduction technique, providing a projection of a training dataset that best separates the examples by their assigned class.

#### Using LDA for dimensionality reduction, and not only for classification, often surprises practitioners.
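
One practical detail when using LDA for dimensionality reduction is that it can produce at most `n_classes - 1` components. The sketch below shows this on synthetic data from `make_blobs` with five classes, which is an assumption made to mirror the five cancer types.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Synthetic stand-in: 300 samples, 10 features, 5 classes
X, y = make_blobs(n_samples=300, centers=5, n_features=10, random_state=0)

# LDA can return at most min(n_features, n_classes - 1) components
n_classes = len(np.unique(y))
max_components = min(X.shape[1], n_classes - 1)

# Unlike PCA, LDA is supervised: it needs the class labels y to fit
lda = LinearDiscriminantAnalysis(n_components=2)
X_2d = lda.fit_transform(X, y)
print(max_components, X_2d.shape, lda.explained_variance_ratio_)
```

With five classes, up to four discriminant axes are available, and `n_components=2` keeps the two that separate the classes best.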

In [1]:
df_lda = master_data.drop(['Unnamed: 0'], axis=1)
df_lda = df_lda.drop(['Class'], axis=1)
x_lda = df_lda
x_lda

In [1]:
x_lda.shape

In [1]:
y_lda = master_data['Class']
y_lda.values

In [1]:
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
lda = LDA(n_components=2)
x_r2 = lda.fit(x_lda,y_lda).transform(x_lda)

In [1]:
lda.explained_variance_ratio_

In [1]:
x_r3 = pd.DataFrame(data=x_r2)
x_r3['y']=y_lda
x_r3

In [1]:
sns.scatterplot(x=0,y=1,hue = 'y', data=x_r3)

## Week 3:- Clustering Genes and Samples

#### Project Task: Week 3: 

Clustering Genes and Samples:

Our next goal is to identify groups of genes that behave similarly across samples and identify the distribution of samples corresponding to each cancer type. Therefore, this task focuses on applying various clustering techniques, e.g., k-means, hierarchical, and mean-shift clustering, on genes and samples.

 

First, apply the given clustering technique on all genes to identify:

* Genes whose expression values are similar across all samples
* Genes whose expression values are similar across samples of each cancer type

Next, apply the given clustering technique on all samples to identify:

* Samples of the same class (cancer type) which also correspond to the same cluster
* Samples that belong to the same class (cancer type) but were assigned to a different cluster
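
One possible way to answer the two sample-level questions is a contingency table of class against cluster. The sketch below uses a tiny hand-made example; in this notebook the two series would presumably come from `label.Class.values` and the fitted `labels_` of a clustering model.

```python
import pandas as pd

# Hypothetical toy labels (not project data)
true_class = pd.Series(
    ['BRCA', 'BRCA', 'BRCA', 'KIRC', 'KIRC', 'LUAD', 'LUAD', 'LUAD'],
    name='given_cancer_type')
cluster_id = pd.Series([0, 0, 2, 1, 1, 2, 2, 2], name='Cls_label')

# Rows are cancer types, columns are cluster ids, cells are sample counts
table = pd.crosstab(true_class, cluster_id)

# The cluster that holds most samples of each class
majority = table.idxmax(axis=1)

# Samples that landed outside the majority cluster of their own class
mismatch = cluster_id != true_class.map(majority)
print(table)
print("samples outside their class's main cluster:", int(mismatch.sum()))
```

Rows with a single dominant column correspond to classes that map cleanly to one cluster, while the flagged samples are the ones that share a class but not a cluster.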

 


### KMEANS Clustering with PCA = 2

In [1]:
from sklearn.cluster import KMeans
clusters = KMeans(5, n_init = 5)
clusters.fit(X_pca_with_2)

clusters.labels_

In [1]:
pca_with_2_data_frame = pd.DataFrame(data=X_pca_with_2,columns=['pca1','pca2'])
pca_with_2_data_frame.head()

In [1]:
pca_with_2_data_frame['Cls_label'] = clusters.labels_
pca_with_2_data_frame['given_cancer_type'] = label.Class.values
pca_with_2_data_frame

In [1]:
brca = pca_with_2_data_frame.groupby('given_cancer_type').get_group('BRCA')
brca.Cls_label.value_counts()

In [1]:
luad = pca_with_2_data_frame.groupby('given_cancer_type').get_group('LUAD')
luad.Cls_label.value_counts()

In [1]:
coad = pca_with_2_data_frame.groupby('given_cancer_type').get_group('COAD')
coad.Cls_label.value_counts()

In [1]:
prad = pca_with_2_data_frame.groupby('given_cancer_type').get_group('PRAD')
prad.Cls_label.value_counts()

In [1]:
kirc = pca_with_2_data_frame.groupby('given_cancer_type').get_group('KIRC')
kirc.Cls_label.value_counts()

In [1]:
clusters.cluster_centers_

In [1]:
kmeans = KMeans(n_clusters=5, init='k-means++', max_iter=300, n_init=10, random_state=0)
pred_y = kmeans.fit_predict(X_pca_with_2)
plt.scatter(X_pca_with_2[:,0], X_pca_with_2[:,1])
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=300, c='red')
plt.show()

### KMEANS Clustering with PCA = .995

In [1]:
from sklearn.cluster import KMeans
clusters_995 = KMeans(5, n_init = 5)
clusters_995.fit(X_pca_with_995)
clusters_995.labels_

In [1]:
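pca_with_995_data_frame = pd.DataFrame(data=X_pca_with_995)
pca_with_995_data_frame.head()
# Use the labels from the model fitted on the .995 PCA data (clusters_995), not the 2-component model
pca_with_995_data_frame['Cls_label'] = clusters_995.labels_
pca_with_995_data_frame['given_cancer_type'] = label.Class.values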

In [1]:
pca_with_995_data_frame.shape

In [1]:
brca_995 = pca_with_995_data_frame.groupby('given_cancer_type').get_group('BRCA')
brca_995.Cls_label.value_counts()

In [1]:
luad_995 = pca_with_995_data_frame.groupby('given_cancer_type').get_group('LUAD')
luad_995.Cls_label.value_counts()

In [1]:
coad_995 = pca_with_995_data_frame.groupby('given_cancer_type').get_group('COAD')
coad_995.Cls_label.value_counts()

In [1]:
prad_995 = pca_with_995_data_frame.groupby('given_cancer_type').get_group('PRAD')
prad_995.Cls_label.value_counts()

In [1]:
kirc_995 = pca_with_995_data_frame.groupby('given_cancer_type').get_group('KIRC')
kirc_995.Cls_label.value_counts()

In [1]:
kmeans = KMeans(n_clusters=5, init='k-means++', max_iter=300, n_init=10, random_state=0)
pred_y = kmeans.fit_predict(X_pca_with_995)
plt.scatter(X_pca_with_995[:,0], X_pca_with_995[:,1])
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=300, c='red')
plt.show()

## Week 4: Build Classification Models

#### Project Task: Week 4: 

Building Classification Model(s) with Feature Selection:

Our final task is to build a robust classification model(s) for identifying each type of cancer.

Sub-tasks:

Build a classification model(s) using multiclass SVM, Random Forest, and Deep Neural Network to classify the input data into five cancer types

Apply the feature selection algorithms, forward selection, and backward elimination to refine selected attributes (selected in Task-2) using the classification model from the previous step

Validate the genes selected from the last step using statistical significance testing (t-test for one vs. all and F-test)
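
Forward selection and backward elimination can be sketched with scikit-learn's `SequentialFeatureSelector`. This is only one way to do it; the synthetic data, the linear SVM, and the choice of three features are assumptions for illustration, not settings taken from this project.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.svm import SVC

# Small synthetic problem: 8 features, of which 3 are informative
X, y = make_classification(n_samples=200, n_features=8, n_informative=3,
                           n_redundant=0, random_state=0)
estimator = SVC(kernel='linear')

# Forward selection: start with no features and add the most helpful one each round
forward = SequentialFeatureSelector(
    estimator, n_features_to_select=3, direction='forward', cv=3).fit(X, y)

# Backward elimination: start with all features and drop the least helpful one each round
backward = SequentialFeatureSelector(
    estimator, n_features_to_select=3, direction='backward', cv=3).fit(X, y)

forward_idx = forward.get_support(indices=True)
backward_idx = backward.get_support(indices=True)
print("forward :", forward_idx)
print("backward:", backward_idx)
```

Both directions refit the model many times, so on roughly 20K genes this would only be practical after the dimensionality reduction of Week 2.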
 

### Build decision tree classifier

#### Decision Tree is a Supervised Machine Learning Algorithm that uses a set of rules to make decisions, similarly to how humans make decisions. One way to think of a Machine Learning classification algorithm is that it is built to make decisions. You usually say the model predicts the class of the new, never-seen-before input but, behind the scenes, the algorithm has to decide which class to assign.
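
The "set of rules" can be printed directly. The sketch below fits a shallow tree on the Iris dataset bundled with scikit-learn, used here only as a small stand-in, and prints the learned rules with `export_text`.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A shallow tree keeps the printed rule set short and readable
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)

# Each indented line is one threshold test; leaves show the predicted class
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)
```

Each path from the root to a leaf is one if/else rule that ends in a class decision.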


In [1]:
ml_x = x_lda
ml_y = y_lda
ml_x.shape,ml_y.shape
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(ml_x,ml_y,test_size=0.30,random_state=30)

In [1]:
from sklearn import tree
dt_clf = tree.DecisionTreeClassifier(max_depth=5)
dt_clf.fit(x_train,y_train)

# Predict on the held-out test set and report accuracy
y_pred = dt_clf.predict(x_test)
dt_clf.score(x_test,y_test)

### SVM 

#### The support vector machine algorithm finds a hyperplane in an N-dimensional space (N being the number of features) that separates the data points of different classes.

In [1]:
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC
sv_clf = SVC(probability=True, kernel='linear')
sv_clf.fit(x_train,y_train)
sv_clf.score(x_test,y_test)


y_pred = sv_clf.predict(x_test)
print(accuracy_score(y_test,y_pred))


### Random Forest 

#### Random forest, like its name implies, consists of a large number of individual decision trees that operate as an ensemble. Each individual tree in the random forest spits out a class prediction and the class with the most votes becomes our model’s prediction. The fundamental concept behind random forest is a simple but powerful one — the wisdom of crowds. In data science speak, the reason that the random forest model works so well is: A large number of relatively uncorrelated models (trees) operating as a committee will outperform any of the individual constituent models.
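
The "wisdom of crowds" effect can be simulated without any trees. The sketch below assumes 101 fully independent voters that are each right 60% of the time, which is an idealization, since real trees in a forest are only partly uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_voters, p_correct = 2000, 101, 0.6

# True where a voter is right; every vote is independent (idealized assumption)
votes_correct = rng.random((n_trials, n_voters)) < p_correct

# Accuracy of one voter versus accuracy of the majority decision
single_accuracy = votes_correct[:, 0].mean()
majority_accuracy = (votes_correct.sum(axis=1) > n_voters / 2).mean()
print(single_accuracy, majority_accuracy)
```

The majority is right far more often than any single voter, and the gain shrinks as the voters become more correlated.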

In [1]:
from sklearn import ensemble
rf_clf = ensemble.RandomForestClassifier(n_estimators=100)
rf_clf.fit(x_train,y_train)
rf_clf.score(x_test,y_test)

### Naive Bayes Classifier 

#### A Naive Bayes classifier is a probabilistic machine learning model that is used for classification tasks. The crux of the classifier is Bayes' theorem.

#### Bayes Theorem:

#### Using Bayes' theorem, we can find the probability of A happening given that B has occurred: P(A|B) = P(B|A) P(A) / P(B). Here, B is the evidence and A is the hypothesis. The assumption made here is that the predictors/features are independent, that is, the presence of one particular feature does not affect the others. Hence it is called naive.
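
A small worked example of Bayes' theorem, with A as the hypothesis and B as the evidence. All three input probabilities are made-up numbers chosen for illustration.

```python
# Hypothetical inputs (assumptions, not project data)
p_a = 0.01              # P(A): prior probability of the hypothesis
p_b_given_a = 0.90      # P(B | A): probability of the evidence if A is true
p_b_given_not_a = 0.05  # P(B | not A): probability of the evidence if A is false

# Total probability of seeing the evidence B
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 4))  # 0.1538
```

Even with strong evidence, the posterior stays modest here because the prior is small.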

In [1]:
from sklearn.naive_bayes import GaussianNB
nb_clf = GaussianNB()
nb_clf.fit(x_train,y_train)
nb_clf.score(x_test,y_test)

### Gradient Boosting Classifier

In [1]:
gb_clf = ensemble.GradientBoostingClassifier(n_estimators=40)
gb_clf.fit(x_train,y_train)
gb_clf.score(x_test,y_test)

### KNN Classifier

#### The K-nearest neighbors (KNN) algorithm is a supervised ML algorithm that can be used for both classification and regression problems, although in industry it is mainly used for classification. Two properties define KNN well: it is a lazy learner, because it has no explicit training phase and simply stores the training data, and it is non-parametric, because it assumes nothing about the underlying data distribution.
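
A minimal from-scratch sketch of KNN classification: measure the distance to every stored training point, take the k closest, and let them vote. `knn_predict` and the toy points are hypothetical; the notebook itself uses scikit-learn's `KNeighborsClassifier`.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distance to each point
    nearest = np.argsort(dists)[:k]               # indices of the k closest points
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy training set: two well-separated groups
X_train = np.array([[0., 0.], [0., 1.], [1., 0.], [5., 5.], [5., 6.], [6., 5.]])
y_train = np.array(['A', 'A', 'A', 'B', 'B', 'B'])

pred_a = knn_predict(X_train, y_train, np.array([0.5, 0.5]))
pred_b = knn_predict(X_train, y_train, np.array([5.5, 5.5]))
print(pred_a, pred_b)
```

There is no fitting step: all the work happens at prediction time, when the stored points are searched.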

In [1]:
from sklearn.neighbors import KNeighborsClassifier
knn_clf = KNeighborsClassifier(n_neighbors=5)
knn_clf.fit(x_train,y_train)
knn_clf.score(x_test,y_test)

## Recursive Feature Elimination

In [1]:
# automatically select the number of features for RFE
from numpy import mean
from numpy import std
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.feature_selection import RFECV
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import Pipeline
# define dataset (synthetic data from make_classification, not the gene expression data)
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=5, random_state=1)
# create pipeline
rfe = RFECV(estimator=DecisionTreeClassifier())
model = DecisionTreeClassifier()
pipeline = Pipeline(steps=[('s',rfe),('m',model)])
# evaluate model
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
n_scores = cross_val_score(pipeline, X, y, scoring='accuracy', cv=cv, n_jobs=-1, error_score='raise')
# report performance
print('Accuracy: %.3f (%.3f)' % (mean(n_scores), std(n_scores)))

## One-way F-test

In [1]:
df_tsne = pd.DataFrame(data=tnse_features,columns=['tsne1','tsne2'])
df_tsne['cancer_type']=label['Class']
df_tsne

In [1]:
df_anova_tsne = df_tsne[['tsne2','cancer_type']]
grps_tsne = pd.unique(df_anova_tsne.cancer_type.values)

d_data = {grp:df_anova_tsne['tsne2'][df_anova_tsne.cancer_type == grp] for grp in grps_tsne}

F, p = stats.f_oneway(d_data['LUAD'], d_data['PRAD'], d_data['BRCA'], d_data['KIRC'], d_data['COAD'])

if p<0.05:
    print("reject null hypothesis")
else:
    print("fail to reject null hypothesis")

In [1]:
df_anova_tsne = df_tsne[['tsne1','cancer_type']]
grps_tsne = pd.unique(df_anova_tsne.cancer_type.values)

d_data = {grp:df_anova_tsne['tsne1'][df_anova_tsne.cancer_type == grp] for grp in grps_tsne}

F, p = stats.f_oneway(d_data['LUAD'], d_data['PRAD'], d_data['BRCA'], d_data['KIRC'], d_data['COAD'])

if p<0.05:
    print("reject null hypothesis")
else:
    print("fail to reject null hypothesis")
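
The Week 4 task list also mentions a one-vs-all t-test. A minimal sketch of that idea is shown below on synthetic data, comparing one class against all the others with Welch's t-test. The column name `gene_x`, the helper `one_vs_all_ttest`, and the shifted KIRC values are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)
classes = ['BRCA', 'KIRC', 'COAD', 'LUAD', 'PRAD']

# Synthetic expression of one hypothetical gene; KIRC is shifted upward on purpose
frames = []
for cls in classes:
    shift = 3.0 if cls == 'KIRC' else 0.0
    values = rng.normal(10.0 + shift, 1.0, size=50)
    frames.append(pd.DataFrame({'gene_x': values, 'Class': cls}))
toy = pd.concat(frames, ignore_index=True)

def one_vs_all_ttest(df, gene, cls):
    """Welch's t-test of `gene` in class `cls` against all remaining samples."""
    in_class = df.loc[df['Class'] == cls, gene]
    rest = df.loc[df['Class'] != cls, gene]
    return stats.ttest_ind(in_class, rest, equal_var=False)

p_values = {cls: one_vs_all_ttest(toy, 'gene_x', cls).pvalue for cls in classes}
print(p_values)
```

A small p-value for a class suggests that the gene's mean expression in that class differs from its mean in the rest of the samples.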

## DNN 

A neural network learns from data: as it sees new examples, it adjusts its weights so that it handles similar inputs better in the future.

A deep neural network is a machine learning model that uses many layers of nodes to derive high-level features from the input. Each layer transforms its input into a more abstract representation, which lets deeper networks handle harder tasks.

To get an intuition for what deep learning achieves, imagine a picture of a person you have never seen before. You will still identify the subject as a human and tell it apart from other creatures. A deep neural network works in a similar way: the features that identify the object are not given to the system directly, so the network has to derive them from the data.

In [1]:
features=master_data.drop(['Unnamed: 0'],axis=1)
features=features.drop(['Class'],axis=1)
target=master_data['Class']
features.head()

In [1]:
target.head()

In [1]:
f1=features.values

In [1]:
y1 = pd.get_dummies(y_lda)

In [1]:
from sklearn.model_selection import train_test_split

# Hold out 10% of the samples for validation
X1_train, X1_valid, y1_train, y1_valid = train_test_split(f1,y1, test_size = 0.10, random_state=42)

In [1]:
X1_train.shape,X1_valid.shape,y1_valid.shape,y1_train.shape

### Define the model 

#### The ReLU function is f(x)=max(0,x). Usually this is applied element-wise to the output of some other function, such as a matrix-vector product. In MLP usages, rectifier units replace all other activation functions except perhaps the readout layer, although they can also be mixed with other activations. One way ReLUs improve neural networks is by speeding up training. The gradient computation is very simple (either 0 or 1 depending on the sign of x). Also, the computational step of a ReLU is easy: any negative elements are set to 0.0, with no exponentials and no multiplication or division operations. Gradients of logistic and hyperbolic tangent networks are smaller than the positive portion of the ReLU. This means that the positive portion is updated more rapidly as training progresses. However, this comes at a cost. The 0 gradient on the left-hand side has its own problem, called "dead neurons," in which a gradient update sets the incoming values to a ReLU such that the output is always zero; modified ReLU units such as ELU (or Leaky ReLU etc.) can minimize this. Source: StackExchange

#### The optimizer chosen is SGD.
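
The ReLU, its gradient, and the Leaky ReLU variant can be written in a few lines of NumPy. This is a plain illustration of the formulas, not how Keras implements them internally, and the slope `alpha=0.01` is an assumed example value.

```python
import numpy as np

def relu(x):
    # f(x) = max(0, x), applied element-wise
    return np.maximum(0.0, x)

def relu_grad(x):
    # Gradient is 1 where x > 0 and 0 elsewhere (taken as 0 at x == 0)
    return (x > 0).astype(float)

def leaky_relu(x, alpha=0.01):
    # Keeps a small slope for negative inputs, so the gradient never vanishes entirely
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
relu_out = relu(x)
grad_out = relu_grad(x)
leaky_out = leaky_relu(x)
print(relu_out, grad_out, leaky_out)
```

The zero gradient for negative inputs is what can produce "dead neurons", and the small negative slope of Leaky ReLU is one way to avoid it.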



In [1]:
import tensorflow as tf

In [1]:
#Initialize Sequential model
model = tf.keras.models.Sequential()

#Add the first dense layer, which takes the 20531 gene expression values as input
model.add(tf.keras.layers.Dense(10000, input_dim=20531, activation='relu', kernel_initializer='he_uniform'))


#Batch-normalize the activations of the first layer
model.add(tf.keras.layers.BatchNormalization())

#Add 1st hidden layer
model.add(tf.keras.layers.Dense(5000, activation='relu'))

#Add 2nd hidden layer
model.add(tf.keras.layers.Dense(2000, activation='relu'))

#Add 3rd hidden layer
model.add(tf.keras.layers.Dense(1000, activation='relu'))

#Add 4th hidden layer
model.add(tf.keras.layers.Dense(500, activation='relu'))

#Add 5th hidden layer
model.add(tf.keras.layers.Dense(200, activation='relu'))

#Add 6th hidden layer
model.add(tf.keras.layers.Dense(100, activation='relu'))

#Add OUTPUT layer
model.add(tf.keras.layers.Dense(5, activation='softmax'))

#Create optimizer with non-default learning rate
sgd_optimizer = tf.keras.optimizers.SGD(learning_rate=0.03)

#Compile the model
model.compile(optimizer=sgd_optimizer, loss='categorical_crossentropy', metrics=['accuracy'])


In [1]:
model.summary()

In [1]:
history = model.fit(X1_train,y1_train,          
          validation_data=(X1_valid,y1_valid),
          epochs=5,
          batch_size=32)

In [1]:
xyz = model.predict(X1_valid)

In [1]:
# Convert predicted probabilities and one-hot targets to class indices
y_pr = np.argmax(xyz, axis=1)
y_val = np.argmax(y1_valid.values, axis=1)

In [1]:
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
confusion_matrix(y_val, y_pr)

### Evaluate the model

In [1]:
_, train_acc = model.evaluate(X1_train, y1_train, verbose=0)
_, test_acc = model.evaluate(X1_valid, y1_valid, verbose=0)
print('Train: %.3f, Test: %.3f' % (train_acc, test_acc))

### Plot History 

In [1]:
plt.plot(history.history['accuracy'], label='train')
plt.plot(history.history['val_accuracy'], label='test')
plt.xlabel('# of epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()