## First Notes

Please do not run the file as whole, as some cells include some irrelevant code that was created for testing some examples. The cells to be ignored are noted by a **BEGIN:** and **END;** notations.

### Modules Import

First, let's go through the imports we'll need for this session. I’ll need RDkit for molecular conversion and fingerprints calculation, numpy for and data management and numpy arrays, Scikit-learn for standardisation and data set splitting as well as Keras for the neural network building and training.

In [1]:
from __future__ import print_function
from rdkit import Chem

### Reading data

Reading a set of molecules an `SDMolSupplier` from rdkit.<br>
In the [getting started](https://www.rdkit.org/docs/GettingStartedInPython.html) documentation there're two these two different suppliers, but I haven't looked up the difference yet:
* `rdkit.Chem.rdmolfiles.SDMolSupplier`
* `rdkit.Chem.rdmolfiles.SmilesMolSupplier`

In [2]:
## Read the file
supplier = Chem.SDMolSupplier('data/cas_4337.sdf')

In [3]:
type(supplier)

rdkit.Chem.rdmolfiles.SDMolSupplier

In [4]:
mol = supplier[97]
mol
# list(mol.GetPropNames())
mol.GetProp("Ames test categorisation")

'mutagen'

In [9]:
## for each molecule, get the number of atoms in it
for mol in supplier:
    print(mol.GetNumAtoms())

25
13
20
10
10
14
18
24
10
24
18
26
18
13
9
17
13
23
6
10
11
13
14
12
21
9
13
13
16
28
21
30
10
10
17
19
14
17
15
11
17
39
18
16
22
17
10
14
5
21
22
4
29
14
20
8
15
18
13
23
9
12
15
9
12
13
8
22
26
13
8
14
19
6
14
14
20
9
12
17
9
21
18
11
18
9
24
51
16
19
13
18
3
21
19
21
12
40
27
18
16
15
8
27
7
26
13
20
12
7
10
9
21
21
13
11
9
22
4
8
5
21
9
20
15
13
37
7
25
14
15
22
16
6
22
10
17
21
21
7
8
15
6
14
25
21
16
10
17
13
22
16
19
29
15
25
22
5
37
33
10
17
102
21
21
18
18
20
11
21
14
11
9
9
23
21
25
8
14
5
14
25
5
12
18
39
18
10
24
13
11
19
13
22
23
19
19
17
20
5
20
19
14
19
8
9
11
18
16
21
8
36
23
14
16
4
12
6
15
20
11
10
20
25
15
6
17
20
18
13
18
16
8
22
13
10
7
25
23
9
16
14
13
9
5
5
18
12
17
20
18
10
12
21
34
3
9
18
7
35
25
15
10
9
16
22
38
7
11
9
19
24
5
12
8
8
20
22
6
11
13
23
12
23
18
20
11
14
20
6
20
40
11
43
16
52
18
13
23
11
13
12
11
18
7
8
10
19
11
8
11
26
9
11
11
16
21
18
15
13
22
18
13
13
19
9
14
14
19
15
5
10
14
10
10
10
16
12
15
16
13
19
9
19
17
15
10
20
16
21
8
12
14
14
22
1

9
5
7
18
16
9
7
12
17
10
19
16
12
24
13
22
15
12
19
7
15
9
14
22
21
11
11
17
25
18
10
15
5
8
17
11
11
6
15
18
5
11
9
36
7
20
23
14
12
11
24
23
11
13
17
18
17
19
19
14
18
12
16
9
15
11
7
17
20
6
12
16
23
17
16
9
14
12
12
8
11
21
15
15
19
15
20
11
12
20
33
28
6
19
19
9
13
6
13
9
108
13
2
4
24
17
17
8
17
24
16
19
17
13
12
19
10
20
22
12
14
7
18
14
24
32
27
7
11
17
10
15
20
49
26
4
30
17
19
12
9
17
12
19
9
10
16
41
20
19
19
15
18
21
5
24
8
13
21
21
11
9
19
11
18
11
30
20
12
6
26
37
15
16
22
4
18
5
26
17
28
17
14
22
8
30
18
10
17
24
15
26
20
15
16
18
11
12
13
15
18
17
14
8
61
28
17
14
11
19
16
6
21
11
10
9
7
20
13
22
19
6
17
19
13
15
19
23
7
12
15
20
12
24
9
29
15
22
6
11
12
10
25
18
18
12
24
19
7
17
30
16
16
17
21
21
14
23
13
9
14
18
24
18
11
14
7
28
14
12
23
8
14
13
6
16
8
11
24
14
17
13
20
20
20
12
19
27
22
19
20
23
19
24
14
5
21
4
7
12
6
13
16
11
19
20
21
17
10
17
9
16
25
19
10
7
12
12
17
14
11
22
19
24
13
12
46
26
19
12
31
15
18
18
7
17
28
9
17
17
32
11
16
14
16
32
15
13
11
26
14
19
21

We have 4337 molecules in the data set:

In [5]:
len(supplier)

4337

A good practice is to test each molecule to see if it was correctly read before working with it:

In [6]:
for mol in supplier:
    if mol is None:
        print("a None molecule was found!")
        
## if the output is empty the data is fine.

<br>
<br>
<br>
<br>
<br>
<br>

---
*The following code can be ignored, cause this was generated for my own interest and testing.*

---
BEGIN:

In [54]:
m = Chem.MolFromSmiles('C1OC1')
for atom in m.GetAtoms():
    print(atom.GetAtomicNum())

print(m.GetBonds()[1].GetBondType())

6
8
6
SINGLE


END;

----
<br>
<br>
<br>
<br>
<br>
<br>

### Calculating Morgan fingerprints

We're going to go trough two steps:<br>

First, we'll obtain the training samples, which are going to be the bits value returned by the function `AllChem.GetMorganFingerprintAsBitVect`.<br>This is typically the output of calculating the morgan fingerprints for each molecule. Please refer to the subsection '**Explaiing bit from Morgan Fingerprints**' to understand the output.<br>
After that we'd only need to turn it to a numpy array, because according to Keras's documentation: "Training set is expected to be a numpy array (if the model has a single input), or list of Numpy arrays (if the model has multiple inputs)."

Second, extracting the labels for each molecule where a Morgan fingerprint can be located.

#### Training Samples

In [7]:
import numpy as np
from rdkit.Chem import AllChem

> **\_\_TASK\_\_:** for each molecule calculate MorganFingerprints (with radius <b>3</b>) and size **~2048** (rdkit has also a nice easy function for that)

Numpy array of training data (if the model has a single input), or list of Numpy arrays (if the model has multiple inputs).

In [8]:
info = {} # this variable will be mutated in the next function (very similar to pointers in C)
# fingerprints = [list(AllChem.GetMorganFingerprint(mol, 3, bitInfo=info)) for mol in supplier]

## calculate the Morgan Fingerprints for every molecule in the supplier
fingerprints = [AllChem.GetMorganFingerprintAsBitVect(mol, 3, nBits=2048, bitInfo=info) for mol in supplier]
# fingerprints=[list(fp) for fp in fingerprints_asBitVector]
fingerprints = np.array(fingerprints)

In [9]:
fingerprints.shape

(4337, 2048)

##### Atoms contributing to the activation of Morgan fingerprints

> **\_\_TASK\_\_:** Important! when you calculate the Fingerprints, save which atoms where responsible for the activation of the fingerprint (rdkit can also do that)

###### `bitsInfo` - Explaining bits from Morgan Fingerprints

Information is available about the atoms that contribute to particular bits in the Morgan fingerprint via the bitInfo argument. The dictionary provided is populated with one entry per bit set in the fingerprint, <u>the keys are the bit ids</u>, <u>the values are lists of (atom index, radius) tuples</u>.

And there we go. Now have succefully extracted the training samples that are ready to be passed to the model.

So let's have a look on the dictionary in `info`:

In [72]:
info

{97: ((14, 0),),
 191: ((10, 1),),
 263: ((4, 2),),
 314: ((7, 1), (9, 1)),
 325: ((15, 3),),
 336: ((2, 2), (3, 2)),
 389: ((17, 2),),
 484: ((6, 3), (8, 3)),
 606: ((11, 2),),
 650: ((5, 0), (7, 0), (9, 0)),
 689: ((1, 1),),
 703: ((12, 2), (13, 2)),
 807: ((1, 0),),
 811: ((6, 2), (8, 2)),
 856: ((11, 3),),
 905: ((0, 3),),
 993: ((0, 2),),
 1019: ((0, 0),),
 1034: ((2, 1), (3, 1)),
 1060: ((6, 1), (8, 1)),
 1077: ((10, 2),),
 1088: ((15, 1), (16, 1), (17, 1)),
 1114: ((6, 0), (8, 0)),
 1152: ((4, 0),),
 1199: ((15, 2), (16, 2)),
 1216: ((1, 3),),
 1327: ((10, 3),),
 1380: ((2, 0), (3, 0), (10, 0), (11, 0)),
 1460: ((12, 3),),
 1642: ((4, 3),),
 1645: ((11, 1),),
 1682: ((1, 2),),
 1717: ((14, 1),),
 1750: ((12, 1), (13, 1)),
 1771: ((0, 1),),
 1816: ((4, 1),),
 1873: ((12, 0), (13, 0), (15, 0), (16, 0), (17, 0)),
 1917: ((5, 1),),
 1947: ((17, 3),),
 2037: ((2, 3),)}

<br>

Interpreting the results above:

```
97: ((14, 0),),
191: ((10, 1),),
314: ((7, 1), (9, 1)),
```

* bit 97 is set once: by atom 14 with radius 0.
* bit 191 is set also once: by atom 10, with radius 1.
* bit 314, on the other hand, is set twice: once by atom 7 and once with atom 9, each with radius 1.


<br>
<br>
<br>
<br>

---
*The following code can be ignored, cause this was generated for my own interest and testing.*

---
BEGIN:

In the following I'd like to draw one molecule in order to locate the Morgan fingerprint, together with the atoms and radius. So let's draw the molecule first, then let's find out what the following values correspond to:<br>
The first entry in the `info` dictionary is `97: ((14, 0),),`, so we'll see what that means.<br>
Lastly, I'll repeat the same process but with molecule number 314 whose fingerprints are `314: ((7, 1), (9, 1)),`


In [10]:
from rdkit.Chem import Draw

END;

----

<br>
<br>
<br>
<br>
<br>
<br>


#### Training Labels

Now that we've got the the bits ready as training samples (i.e. features) for the model, we still need to extract the label to have a target for our predictions.

The property that we want to extract out of the list of prooperties, is whether or not the 'Ames test Categorisation' is a mutagen.
* 1 for mutagen 
* 0 for nonmutagen

In [11]:
training_labels = []
for mol in supplier:
    if mol.GetProp("Ames test categorisation") == "mutagen":
        training_labels.append(1)
    else:
        training_labels.append(0)

training_labels = np.array(training_labels)

In [12]:
len(training_labels)

4337

Now, we've got the whole training set ready. Let's move on to extracting the validation set.

#### Validation Set

In order to select a 'valid' validation, we have randomly picked the following indices of molecules.<br>
Additionally, we'd still need to see how many fingerprints are expressed in this set of molecules.

In the following, I'll be using the alias `val_`, for anything that's related to the validation set, in order to avoid length variables names.

In [13]:
## KAREEM: these molecules I got from Kristina!
val_mol_ids = [6,   10,   29,   32,   42,   58,   72,   83,   98,  100,  128, 
        145,  148,  168,  171,  205,  208,  237,  244,  285,  290,  291,
         300,  312,  332,  334,  335,  347,  356,  369,  371,  377,  407,
         424,  456,  458,  470,  472,  486,  514,  515,  528,  557,  563,
         599,  610,  616,  628,  640,  701,  704,  722,  764,  794,  818,
         821,  840,  850,  856,  859,  874,  878,  882,  898,  901,  925,
         936,  945,  957,  974,  977, 1013, 1019, 1030, 1038, 1047, 1049,
        1072, 1073, 1100, 1159, 1168, 1187, 1190, 1194, 1201, 1202, 1233,
        1247, 1258, 1264, 1273, 1283, 1288, 1300, 1302, 1319, 1339, 1349,
        1402, 1413, 1416, 1422, 1426, 1435, 1454, 1465, 1483, 1502, 1513,
        1515, 1520, 1548, 1576, 1604, 1606, 1621, 1650, 1695, 1696, 1711,
        1714, 1716, 1725, 1743, 1746, 1752, 1780, 1788, 1794, 1799, 1813,
        1826, 1866, 1886, 1901, 1903, 1921, 1929, 1940, 1969, 1970, 1997,
        1998, 2008, 2010, 2011, 2018, 2023, 2046, 2060, 2064, 2080, 2081,
        2131, 2171, 2182, 2203, 2212, 2224, 2231, 2241, 2246, 2283, 2294,
        2295, 2297, 2327, 2329, 2331, 2349, 2357, 2360, 2365, 2397, 2413,
        2417, 2418, 2421, 2448, 2467, 2510, 2516, 2528, 2533, 2549, 2562,
        2601, 2604, 2606, 2609, 2611, 2632, 2644, 2653, 2677, 2682, 2685,
        2692, 2703, 2708, 2714, 2719, 2726, 2732, 2759, 2761, 2776, 2780,
        2817, 2818, 2829, 2837, 2857, 2858, 2884, 2899, 2902, 2905, 2911,
        2939, 2975, 2977, 2986, 3007, 3009, 3018, 3024, 3038, 3066, 3087,
        3098, 3107, 3117, 3122, 3139, 3157, 3161, 3164, 3217, 3223, 3233,
        3263, 3265, 3271, 3290, 3295, 3307, 3313, 3317, 3321, 3382, 3384,
        3388, 3400, 3409, 3412, 3419, 3423, 3449, 3470, 3487, 3488, 3503,
        3509, 3511, 3539, 3562, 3626, 3637, 3654, 3662, 3663, 3668, 3671,
        3688, 3689, 3695, 3710, 3726, 3743, 3744, 3782, 3791, 3794, 3808,
        3809, 3841, 3849, 3874, 3910, 3912, 3925, 3945, 3950, 3958, 3959,
        3962, 3964, 3967, 3978, 3993, 4009, 4010, 4055, 4057, 4085, 4089,
        4096, 4099, 4107, 4112, 4129, 4135, 4151, 4155, 4196, 4209, 4216,
        4234, 4236, 4251, 4267, 4283, 4317, 4326, 4335

]
val_molecules = [supplier[i] for i in val_mol_ids]

In [14]:
len(val_molecules)

327

In [15]:
val_info = {}
val_fingerprints = [AllChem.GetMorganFingerprintAsBitVect(mol, 3, nBits=2048, bitInfo=val_info) for mol in val_molecules]

In [16]:
val_labels = []
for mol in val_molecules:
    if mol.GetProp("Ames test categorisation") == "mutagen":
        val_labels.append(1)
    else:
        val_labels.append(0)

In [17]:
len(val_labels)

327

The following is an insignificant change of the data. The atempt here is create a tuples of both validation fingerprints and label put together.

In [192]:
# val_set = [tuple(fingerprints[i], val_labels[i]) for i range(len(fingerprints))]
val_set = []
for i in range(len(val_fingerprints)):
    val_set.append((val_fingerprints[i], val_labels[i]))
    
val_set = np.array(val_set)
val_set.shape

(327, 2)

In [135]:
## here I'll try to recreate the vlidation set, I think it's a lot easier than I thought
val_set = (val_mol_ids, val_labels)
len(val_set)
# val_set = [(val_mol_ids[i], val_labels[i]) for i in range(len(val_labels))]
# val_set

2


<br>
<br>
<br>
<br>
<br>
<br>


## Creating The model

In [26]:
# to suppress the FutureWarning: conversion of the second argument of issubdtype 
# from 'float' to 'np.floatin' is deprecated
import os

# importing all libraries that we'd need
from keras import backend as K
from keras.models import Sequential
from keras.layers import Activation
from keras.layers.core import Dense
from keras.optimizers import Adam
from keras.optimizers import SGD
from keras.metrics import categorical_crossentropy

from sklearn import cross_validation
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import roc_auc_score # Compute Area Under the Receiver Operating Characteristic Curve (ROC AUC) from prediction scores

From before we know that our data shape is `(4337, 2048)`

Standard scaling is applied now, because the model cannot learn on som arbitrary numbers.

In [20]:
#Scale fingerprints to unit variance and zero mean
st = StandardScaler()
scaled_fingerprints= st.fit_transform(fingerprints)



Additionally, I'd like to scale down the validation fingerprints too.

In [25]:
scaled_val_fingerprints = st.fit_transform(val_fingerprints)

The first layer must get an input dimensions matching the data $=2048$, whereas the following can deduce their input size from the previous layer. For instance, the output size of the firs layer in my model is 5, beginning with the second layer, I don't need to specify the input size, because the layer can deduce that alone.

In [21]:
model = Sequential()
model.add(Dense(output_dim=5, input_dim=scaled_fingerprints.shape[1]))
## Kristina wouldn't use sigmoid, either relu or selu 
model.add(Activation("relu"))
model.add(Dense(output_dim=1))
model.add(Activation("linear"))

  
  """


And I used the fasted learning rate possible because running 500 epochs takes relatively long time $(\approx1.5\,min)$ on my 2017 machine. For optimizers, I used standard stochastic gradient descent (**SGD**), which <...>.

In [30]:
model.compile(loss='mean_squared_error', optimizer=SGD(lr=0.01, momentum=0.9, nesterov=True), metrics=["accuracy"])

In [31]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 5)                 10245     
_________________________________________________________________
activation_1 (Activation)    (None, 5)                 0         
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 6         
_________________________________________________________________
activation_2 (Activation)    (None, 1)                 0         
Total params: 10,251
Trainable params: 10,251
Non-trainable params: 0
_________________________________________________________________


In order for us to predict the <>, we used the function `sklearn.metrics.roc_auc_score`, which computes Area Under the Receiver Operating Characteristic Curve (ROC AUC) from prediction scores.
Note: this implementation is restricted to the binary classification task or multilabel classification task in label indicator format.

In [22]:
# history = model.fit(scaled_fingerprints, training_labels, validation_data=val_set.all(), epochs=500, batch_size=32, verbose=2)
## the best sofar
# history = model.fit(scaled_fingerprints, training_labels, validation_split=0.2, epochs=100, shuffle=True, batch_size=32, verbose=2)
# history = model.fit(scaled_fingerprints, training_labels, validation_data=(val_fingerprints, val_labels), epochs=500, batch_size=32, verbose=1)
#           callbacks=[TestCallback((X_test, Y_test))])

Train on 3469 samples, validate on 868 samples
Epoch 1/100
 - 1s - loss: 0.2146 - acc: 0.7068 - val_loss: 0.1750 - val_acc: 0.7454
Epoch 2/100
 - 0s - loss: 0.1040 - acc: 0.8746 - val_loss: 0.1702 - val_acc: 0.7742
Epoch 3/100
 - 0s - loss: 0.0831 - acc: 0.9080 - val_loss: 0.1744 - val_acc: 0.7811
Epoch 4/100
 - 0s - loss: 0.0708 - acc: 0.9291 - val_loss: 0.1771 - val_acc: 0.7753
Epoch 5/100
 - 0s - loss: 0.0618 - acc: 0.9429 - val_loss: 0.1891 - val_acc: 0.7604
Epoch 6/100
 - 0s - loss: 0.0550 - acc: 0.9504 - val_loss: 0.1892 - val_acc: 0.7592
Epoch 7/100
 - 0s - loss: 0.0496 - acc: 0.9602 - val_loss: 0.1997 - val_acc: 0.7592
Epoch 8/100
 - 0s - loss: 0.0448 - acc: 0.9683 - val_loss: 0.2026 - val_acc: 0.7569
Epoch 9/100
 - 0s - loss: 0.0411 - acc: 0.9706 - val_loss: 0.2088 - val_acc: 0.7615
Epoch 10/100
 - 0s - loss: 0.0376 - acc: 0.9729 - val_loss: 0.2145 - val_acc: 0.7523
Epoch 11/100
 - 0s - loss: 0.0343 - acc: 0.9767 - val_loss: 0.2201 - val_acc: 0.7523
Epoch 12/100
 - 0s - loss: 

Epoch 97/100
 - 0s - loss: 9.0887e-04 - acc: 0.9997 - val_loss: 0.3673 - val_acc: 0.6452
Epoch 98/100
 - 0s - loss: 8.9581e-04 - acc: 0.9997 - val_loss: 0.3677 - val_acc: 0.6475
Epoch 99/100
 - 0s - loss: 8.8855e-04 - acc: 0.9997 - val_loss: 0.3675 - val_acc: 0.6498
Epoch 100/100
 - 0s - loss: 8.4132e-04 - acc: 0.9997 - val_loss: 0.3688 - val_acc: 0.6440


The recommendation of Kristina:<br>
Use one `fit` and one `predict` at a time and instead of running multiple epochs, you can use a for loop in order for you validate the model.

In [32]:
for epoch in range(100):
    model.fit(scaled_fingerprints, training_labels, batch_size=32, epochs=1)
    predictions = model.predict(scaled_val_fingerprints)
    auc = roc_auc_score(val_labels, predictions, average="samples")

Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1


Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1


In [38]:
auc
## OUTPUT: 0.9958941951108756

0.9958941951108756

In the previous model we use ....,.... and got a $0.99589$ auc score, which is relatively a good score. However, we'd still need to see what parameters play the biggest role in order for this model to learn. i.e. what parameters have biggest influence on updating the weights of this model. By doing so, we are also able to understand more about the data.

This model was heavily inspired by the example of [Esben Jannik Bjerrum](https://www.linkedin.com/pulse/molecular-neural-network-models-rdkit-keras-python-bjerrum) on [prediction of molecular properties using Python, RDkit and Keras](http://www.wildcardconsulting.dk/useful-information/molecular-neural-network-models-with-rdkit-and-keras-in-python/). However, in the example Esben gave, he ...., eventually his Neural Network RMS score was $0.855494851649$. So we can realize that ... .
<br>
<br>
<br>

## Second Model

Let's take a look on what things that are featurign this model. This model has ... learning rate instead of .... and instead .. and unline in the previous one it has ... instead .... . So how does this model comapre to the previous one in terms of accuracy, prediction, and auc score.

The following is a table that shows the comparison of all models. Learning rate, model architecture and and wether or not Dropout was used, are all depicted for each model:

|Model_nr|Learning Rate|Dropout|Nom of layers |Activation Functions|       OutputLayer        |
|--------|-------------|-------|--------------|--------------------|--------------------------|
|1       |0.01         |no     |1 hidden layer|relu                |one-dimensional output dense with linear activation|
|2       |

Additionally, you can refer to the model's summary by typing `model.summary()` to see the architecture of the model together with the number of trainable parameters.

<br>
<br>
<br>
<br>
<br>
<br>

## Conclusion

Larger networks will overfit, unless some form of regularization is put in place (such as early stopping or drop out). As the output values are continuous rather than class labels, the output dimension is a single neuron with a linear activation.

<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>



---
<div style="text-align: center;"> <b>Please ingore everything below this here<b>
    
    |
    |
    |
    |
    |
    ٧
</div>

---


In [169]:
model = Sequential([
    Dense(16, input_shape=(4337, 2048,), activation='relu'),
    Dense(32, activation='relu'),
    Dense(2, activation='sigmoid')
])

In [170]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_43 (Dense)             (None, 4337, 16)          32784     
_________________________________________________________________
dense_44 (Dense)             (None, 4337, 32)          544       
_________________________________________________________________
dense_45 (Dense)             (None, 4337, 2)           66        
Total params: 33,394
Trainable params: 33,394
Non-trainable params: 0
_________________________________________________________________


In [171]:
model.compile(Adam(lr=.001), loss='sparse_categorical_crossentropy', metrics=['accuracy'])

In [194]:
model.fit(fingerprints, training_labels, validation_data=val_set.all(), batch_size=20, epochs=20, shuffle=True, verbose=3)

Epoch 1/20
 - 1s - loss: 0.2355
Epoch 2/20
 - 1s - loss: 0.2149
Epoch 3/20
 - 1s - loss: 0.1989
Epoch 4/20
 - 1s - loss: 0.1866
Epoch 5/20
 - 1s - loss: 0.1763
Epoch 6/20
 - 1s - loss: 0.1674
Epoch 7/20
 - 1s - loss: 0.1594
Epoch 8/20
 - 1s - loss: 0.1524
Epoch 9/20
 - 1s - loss: 0.1460
Epoch 10/20
 - 1s - loss: 0.1406
Epoch 11/20
 - 1s - loss: 0.1357
Epoch 12/20
 - 1s - loss: 0.1312
Epoch 13/20
 - 1s - loss: 0.1271
Epoch 14/20
 - 1s - loss: 0.1234
Epoch 15/20
 - 1s - loss: 0.1199
Epoch 16/20
 - 1s - loss: 0.1166
Epoch 17/20
 - 1s - loss: 0.1137
Epoch 18/20
 - 1s - loss: 0.1106
Epoch 19/20
 - 1s - loss: 0.1081
Epoch 20/20
 - 1s - loss: 0.1055


<keras.callbacks.History at 0x1a52a866a0>