# Loading the data

Loading the data is straightforward with Pandas, the Python data manipulation library. Following convention, you import the package under its alias, pd.

Next, you use the read_csv() function to read in the CSV files in which the data is stored. If a file uses a separator other than a regular comma, such as a semicolon, pass it through the sep argument; the calls below rely on the default comma separator.
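As a minimal sketch of the sep argument, the snippet below parses a small semicolon-separated table from an in-memory string. The column names and values are made up for illustration and are not part of the actual data files.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated content, standing in for a file on disk
raw = "id;var_0;var_1\na;1.5;2.0\nb;3.0;4.5\n"

# sep=";" tells read_csv to split fields on semicolons instead of commas
demo = pd.read_csv(io.StringIO(raw), sep=";")

print(demo.shape)
print(list(demo.columns))
```

Without sep=";", the same text would be read as a single column, because no commas separate the fields.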

# Importing the libraries

In [None]:
from pathlib import Path
import numpy as np
import pandas as pd
root = Path("../input")

# Data Exploration

A quick look at the train and test DataFrames:


In [None]:
# Importing the dataset
train = pd.read_csv(root.joinpath("train.csv"))
test = pd.read_csv(root.joinpath("test.csv"))

In [None]:
# Print info on train set
print(train.info())

To check that the data import was successful, confirm that the data contains all the variables that the data description file of the UCI Machine Learning Repository promised you.
Besides the number of variables, also check the quality of the import: are the data types correct? Did all the rows come through? Are there any null values that you should take into account when you’re cleaning up the data?
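The checks listed above could be collected in a small helper. This is only a sketch: import_report is a hypothetical function name, and the toy frame stands in for train.

```python
import numpy as np
import pandas as pd


def import_report(df):
    """Summarize row count, column count, dtypes and missing values."""
    return {
        "rows": len(df),
        "columns": df.shape[1],
        "dtypes": df.dtypes.astype(str).to_dict(),
        "nulls": int(df.isnull().sum().sum()),
    }


# Toy frame with one missing value, standing in for `train`
toy = pd.DataFrame({"target": [0, 1, 0], "var_0": [1.2, np.nan, 3.4]})
report = import_report(toy)
print(report)
```

Calling import_report(train) after loading would give the same summary for the real data.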

In [None]:
# First rows of `train` 
train.head()


In [None]:
# Last rows of `train`
train.tail()


In [None]:
# Take a sample of 5 rows of `train`
train.sample(5)


In [None]:
# Describe `train`
train.describe()


describe() offers summary statistics of the train data that can help you assess your data quality.
You can see that for some of the variables the min and max values lie far apart.
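One way to make that gap concrete is to subtract the min row from the max row of the describe() output. The sketch below uses a toy frame with hypothetical column names (var_a, var_b) and made-up values.

```python
import pandas as pd

# Toy frame with made-up values; `var_a` and `var_b` are hypothetical names
toy = pd.DataFrame({"var_a": [0.1, 0.5, 0.9], "var_b": [-40.0, 10.0, 75.0]})

stats = toy.describe()

# Range of each column, computed from the "min" and "max" rows of describe()
spread = stats.loc["max"] - stats.loc["min"]
print(spread.sort_values(ascending=False))
```

Columns at the top of the sorted output are the ones whose values are spread over the widest range.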

In [None]:
# Double check for null values in `train`
# Count the missing values in each column
null_counts = train.isnull().sum()

# Print the total number of missing values (0 means none are missing)
print(null_counts.sum())


This double-checks the train data for null values: a total of 0 means that nothing is missing.


In [None]:
# Specify the train data
X_train = train.iloc[:, 2:202].values
y_train = train.iloc[:, 1].values
X_train

In [None]:
test['target'] = -1
test.describe()

In [None]:
# Test data
X_test = test.iloc[:, 1:201].values
y_test = test.iloc[:, 201].values
X_test    

# Let's work on the test data
A dummy target column is added to the test set.


In [None]:
test['target'] = -1

# Take a sample of 5 rows of `test`
test.sample(5)


In [None]:
test.describe()


# Specify the test data in X & Y


In [None]:
# Use NumPy arrays, matching how X_train and y_train were built
X_test = test.iloc[:, 1:201].values
y_test = test.iloc[:, 201].values


# Feature Scaling
Feature scaling is a way to deal with these values that lie so far apart.
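As a rough illustration of what standardization does, the sketch below rescales a toy matrix by hand: each column has its mean subtracted and is divided by its standard deviation. The values are made up; with real data the mean and standard deviation would be learned from the train set only and then reused for the test set.

```python
import numpy as np

# Toy feature matrix: two columns on very different scales
X = np.array([[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]])

# Per-column mean and (population) standard deviation, learned from the data
mean = X.mean(axis=0)
std = X.std(axis=0)

# Standardize: subtract the mean and divide by the standard deviation
X_scaled = (X - mean) / std
print(X_scaled)
```

After scaling, both columns have mean 0 and standard deviation 1, so neither dominates simply because of its units.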


In [None]:
# Import `StandardScaler` from `sklearn.preprocessing`
from sklearn.preprocessing import StandardScaler

# Define the scaler 
scaler = StandardScaler().fit(X_train)

# Scale the train set
X_train = scaler.transform(X_train)

# Scale the test set
X_test = scaler.transform(X_test)

# Model Data
We are ready to move on to building a neural network to classify the target.
The model is set up by running model = Sequential().
I will then add Dense layers, which are fully connected layers.

In the first layer, the activation argument takes the value relu, and the model takes as input arrays of shape (200,).
I will use 9 as the value for the units argument of the first Dense() layer, which is the dimensionality of its output space, so the layer has 9 hidden units.
This means that the first layer will output arrays of shape (*, 9).

The intermediate layer also uses the relu activation function. It has 3 units, so the output of this layer will be arrays of shape (*, 3).

I am ending the network with a Dense layer of size 1. The final layer uses a sigmoid activation function so that the output is a probability.
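To make the shapes tangible, here is a minimal NumPy sketch of a forward pass through a network with the layer sizes used in the code cell below (200 inputs, then 9, 3 and 1 units). The weights are random and the batch of 4 samples is hypothetical, so the outputs carry no meaning; only the shapes and the parameter count matter here.

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(z):
    return np.maximum(z, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Layer sizes: 200 inputs -> 9 units -> 3 units -> 1 output
sizes = [200, 9, 3, 1]

# Random weights and zero biases; a trained model would have learned values
weights = [rng.normal(scale=0.1, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

x = rng.normal(size=(4, 200))  # a batch of 4 hypothetical samples
h1 = relu(x @ weights[0] + biases[0])  # shape (4, 9)
h2 = relu(h1 @ weights[1] + biases[1])  # shape (4, 3)
out = sigmoid(h2 @ weights[2] + biases[2])  # shape (4, 1)

# Trainable parameters per layer: weights plus biases
n_params = sum(w.size + b.size for w, b in zip(weights, biases))
print(h1.shape, h2.shape, out.shape, n_params)
```

The parameter count computed this way should agree with the total that model.summary() reports for the Keras model.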


In [None]:
# Import `Sequential` from `keras.models`
from keras.models import Sequential

# Import `Dense` from `keras.layers`
from keras.layers import Dense

# Initialize the constructor
model = Sequential()

# Add an input layer 
model.add(Dense(9, activation='relu', input_shape=(200,)))

# Add one hidden layer 
model.add(Dense(3, activation='relu'))

# Add an output layer 
model.add(Dense(1, activation='sigmoid'))


# Inspect the model

In [None]:
# Model output shape
model.output_shape

# Model summary
model.summary()

# Model config
model.get_config()

# List all weight tensors 
model.get_weights()

# Compiling the ANN
Let's compile the model and fit it to the data: once again, make use of compile() and fit() to get this done.
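The loss used in the compile step, binary cross-entropy, penalizes confident wrong probabilities more heavily than unsure ones. The function below is a hand-written sketch of the formula, not the Keras implementation (the clipping constant eps in particular is an assumption), and the labels and probabilities are made up.

```python
import numpy as np


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy between labels and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip so that log(0) never occurs
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    losses = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    return float(losses.mean())


# Made-up labels and predictions for illustration
confident = binary_crossentropy([0, 1, 1], [0.1, 0.9, 0.8])
unsure = binary_crossentropy([0, 1, 1], [0.5, 0.5, 0.5])
print(confident, unsure)
```

Predictions that sit close to the true labels give a lower loss than predictions of 0.5 everywhere, which is the quantity the optimizer tries to reduce during fit().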

In [None]:
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=20, batch_size=100, verbose=1)


# Predict Values

In [None]:
y_pred = model.predict(X_test)

# Evaluate Model
I will evaluate the model on the train data because the test set has no true y values.

In [None]:
score = model.evaluate(X_train, y_train, verbose=1)
print(score)


The score is a list that contains the loss followed by the accuracy.
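For a sigmoid output, accuracy can be understood as thresholding the predicted probabilities at 0.5 and counting the matches with the true labels. The sketch below does this by hand on made-up values; the 0.5 threshold is the usual convention and is assumed here.

```python
import numpy as np

# Made-up labels and sigmoid outputs shaped (n, 1), like model.predict returns
y_true = np.array([0, 1, 1, 0])
y_prob = np.array([[0.2], [0.7], [0.4], [0.1]])

# Threshold at 0.5 to get class labels, then flatten to one dimension
y_label = (y_prob > 0.5).astype(int).ravel()

# Accuracy is the fraction of labels predicted correctly
accuracy = float((y_label == y_true).mean())
print(y_label, accuracy)
```

Here three of the four toy predictions match their labels, so the accuracy is 0.75.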

# Submission File creation

In [None]:
submission = pd.read_csv(root.joinpath("sample_submission.csv"))
# Flatten the (n, 1) predictions to one dimension before assigning them
submission['target'] = y_pred.ravel()
submission.to_csv("submission.csv", index=False)

In [None]:
# Create a download link for the submission file
from IPython.display import FileLink
FileLink('submission.csv')