
**Vision:**

The goal of this project is to use neural networks to try to solve common linear algebra problems. This involves using plain neural networks and CNNs to see how few nodes are needed to solve some of these problems, and how accurately I can find these values.

Specifically, I hope to investigate determinants and eigenvalues of matrices. A determinant is a single scalar value uniquely determined by a given square matrix. It describes properties such as whether the matrix is invertible: if the determinant is zero, the matrix has no inverse. This gives rise to a whole area of linear algebra in which people study extensions of the inverse to all matrices (e.g. the Moore-Penrose inverse). Eigenvalues are a very common measurement used in physics, where finding the eigenvalues of a matrix is important for seeing how forcing changes a system. Eigenvalues are also the basis of many types of matrix decompositions, and they appear throughout the solution of various differential equations. Overall, they might be the single most interesting value derived from a given matrix.

This project will be used to determine how well a neural network can approximate these values. Mostly, it is meant to satisfy my personal curiosity about the complexity of finding them. If they prove hard or impossible to approximate, that would point to a more significant difference between the creativity of the human mind and the computational power of a computer.

**Background:**

Mostly, this project is a question of feasibility and optimization. Thus, it focuses on simple neural networks and convolutional neural networks. Information about simple neural networks can be found in Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig (chapter 18). For an example of convolutional neural networks, visit: https://github.com/kvlinden-courses/cs344-code/blob/master/u10nn/keras-cnn.ipynb created by kvlinden.

Part of the work in this project also involved trying different activation functions. For reasons elaborated on below, ReLU activation functions did not work well for this project, so several other types of activation functions were tried.

Specifically, the project was tested with both Leaky ReLU and tanh activation functions. A basic ReLU has a derivative of zero for negative inputs and a derivative of 1 for positive inputs. A Leaky ReLU instead has a small positive gradient for negative inputs (less than 1). This lets the model keep propagating error through neurons that would otherwise never activate. Tanh is the hyperbolic tangent. Its gradient approaches zero as the input grows large in magnitude, but never actually reaches zero. Thus, the weights feeding a node that does not activate are always updated somewhat (though the update becomes nearly negligible if the input is very negative).
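To make the gradient comparison concrete, here is a small NumPy sketch of the three derivatives. The slope `alpha=0.3` matches the value used in the models below; the function names and sample inputs are only illustrative.

```python
import numpy as np

def relu_grad(x):
    # Derivative of max(0, x): 0 for x < 0, 1 for x > 0 (0 is used at x == 0).
    return (x > 0).astype(float)

def leaky_relu_grad(x, alpha=0.3):
    # Derivative of Leaky ReLU: alpha for x <= 0, 1 for x > 0.
    return np.where(x > 0, 1.0, alpha)

def tanh_grad(x):
    # d/dx tanh(x) = 1 - tanh(x)^2: always positive, but shrinks toward 0 as |x| grows.
    return 1.0 - np.tanh(x) ** 2

x = np.array([-5.0, -1.0, 0.5, 2.0])
print(relu_grad(x).tolist())        # [0.0, 0.0, 1.0, 1.0]
print(leaky_relu_grad(x).tolist())  # [0.3, 0.3, 1.0, 1.0]
print(tanh_grad(x))                 # all positive; tiny at x = -5
```

The ReLU gradient is exactly zero for the negative inputs, which is what allows a unit to stop learning entirely; the other two never hit zero.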

For more information on Leaky ReLU's visit: https://medium.com/@danqing/a-practical-guide-to-relu-b83ca804f1f7

For more information on Tanh activation functions visit: https://towardsdatascience.com/complete-guide-of-activation-functions-34076e95d044

**Implementation:**

The first model attempted was a simple neural network designed to work with square matrices of size n=7. The data set of 40,000 matrices was generated with numpy's random module. The structure is shown below.
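The notebook does not say which distribution the matrix entries were drawn from, so the sketch below assumes entries drawn independently and uniformly from [-1, 1] and uses `np.linalg.det` (which accepts a stack of matrices) for the targets. The helper name `make_det_dataset` is hypothetical.

```python
import numpy as np

def make_det_dataset(num_samples=40000, n=7, seed=0):
    """Hypothetical helper: random n x n matrices and their determinants.

    Assumes entries are i.i.d. uniform on [-1, 1]; the original notebook
    does not state the distribution it used.
    """
    rng = np.random.default_rng(seed)
    matrices = rng.uniform(-1.0, 1.0, size=(num_samples, n, n))
    dets = np.linalg.det(matrices)               # batched determinant, shape (num_samples,)
    flat = matrices.reshape(num_samples, n * n)  # flat rows, matching input_dim=n**2
    return flat, dets

x_train, y_train = make_det_dataset()
print(x_train.shape, y_train.shape)  # (40000, 49) (40000,)
```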

```python
# Determinant model 1: plain ReLU activations throughout.
# 6 hidden layers, then a single-unit output layer.
from tensorflow.keras import layers, models

n = 7  # matrices are n x n, fed in as flat vectors of length n**2

detmodel = models.Sequential()
detmodel.add(layers.Dense(n**2, activation='relu', input_dim=n**2))
detmodel.add(layers.Dense(50, activation='relu'))
detmodel.add(layers.Dense(50, activation='relu'))
detmodel.add(layers.Dense(50, activation='relu'))
detmodel.add(layers.Dense(50, activation='relu'))
detmodel.add(layers.Dense(50, activation='relu'))
# Note: a ReLU on the output layer can never produce a negative value,
# so this model cannot predict negative determinants.
detmodel.add(layers.Dense(1, activation='relu'))
detmodel.compile(optimizer='adam', loss='mean_squared_error')
detmodel.summary()
```


Evaluating this network revealed some problems. Most notably, the loss flattens out at a relatively high value after 2-3 epochs. Further investigation showed that the model returns 0 for all (or almost all) inputs. This indicates that the ReLU units have "died", leaving a dead network.

Further reading on dead ReLUs turned up this explanation:
"Unfortunately, ReLU units can be fragile during training and can "die". For example, a large gradient flowing through a ReLU neuron could cause the weights to update in such a way that the neuron will never activate on any datapoint again. If this happens, then the gradient flowing through the unit will forever be zero from that point on. That is, the ReLU units can irreversibly die during training since they can get knocked off the data manifold. For example, you may find that as much as 40% of your network can be "dead" (i.e. neurons that never activate across the entire training dataset) if the learning rate is set too high. With a proper setting of the learning rate this is less frequently an issue." (CS231n, Stanford)
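Following the definition in the quote (a unit that never activates across the entire training set), the fraction of dead units in a layer can be measured from that layer's outputs. The sketch below is NumPy-only and uses a toy layer whose biases are assumed to have been pushed far negative; the function name `dead_unit_fraction` is hypothetical.

```python
import numpy as np

def dead_unit_fraction(activations):
    """Fraction of units whose output is zero for every sample.

    `activations` has shape (num_samples, num_units), e.g. the outputs of one
    ReLU layer evaluated on the whole training set.
    """
    never_active = np.all(activations <= 0.0, axis=0)
    return never_active.mean()

# Toy demonstration: one ReLU layer whose biases were pushed far negative,
# as can happen after a large gradient step.
rng = np.random.default_rng(0)
inputs = rng.uniform(-1.0, 1.0, size=(1000, 49))
weights = rng.normal(0.0, 0.1, size=(49, 50))
biases = np.full(50, -10.0)
relu_out = np.maximum(0.0, inputs @ weights + biases)
print(dead_unit_fraction(relu_out))  # 1.0: every unit is dead
```

In Keras, one way to obtain such activations is to build a second model, e.g. `models.Model(inputs=detmodel.inputs, outputs=[layer.output for layer in detmodel.layers[:-1]])`, and call `predict` on the training data.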

Because of this issue, I followed the advice given to other users in this Keras issue thread: https://github.com/keras-team/keras/issues/3687. The advice was to replace ReLUs with Leaky ReLU or tanh activation functions. In this project, only Leaky ReLUs led to noticeably better results.

This was then the second model:

```python
# Determinant model 2: Leaky ReLU instead of plain ReLU.
from tensorflow.keras import layers, models
from tensorflow.keras.layers import LeakyReLU

n = 7

detmodel2 = models.Sequential()
# alpha is the slope for negative inputs (Keras 3 names this argument negative_slope).
detmodel2.add(layers.Dense(n**2, activation=LeakyReLU(alpha=0.3), input_dim=n**2))
detmodel2.add(layers.Dense(n**3, activation=LeakyReLU(alpha=0.3)))
detmodel2.add(layers.Dense(n**3, activation=LeakyReLU(alpha=0.3)))
detmodel2.add(layers.Dense(n**2, activation=LeakyReLU(alpha=0.3)))
detmodel2.add(layers.Dense(n**2, activation=LeakyReLU(alpha=0.3)))
detmodel2.add(layers.Dense(1, activation=LeakyReLU(alpha=0.5)))
detmodel2.compile(optimizer='adam', loss='mean_squared_error')
detmodel2.summary()
# This model has 156,948 trainable parameters.
```
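The 156,948 figure can be checked by hand: a Dense layer with k inputs and m units has k·m weights plus m biases. A small sketch (the helper name is hypothetical) applies this to both fully connected determinant models:

```python
def dense_param_count(input_dim, layer_sizes):
    """Weights plus biases for a stack of fully connected layers."""
    total = 0
    fan_in = input_dim
    for units in layer_sizes:
        total += fan_in * units + units  # weight matrix + bias vector
        fan_in = units
    return total

n = 7
p1 = dense_param_count(n**2, [n**2, 50, 50, 50, 50, 50, 1])      # detmodel
p2 = dense_param_count(n**2, [n**2, n**3, n**3, n**2, n**2, 1])  # detmodel2
print(p1, p2)  # 15201 156948
```

Most of detmodel2's parameters sit in the 343-to-343 layer (117,992 of them), which goes some way toward explaining how easily it memorizes the training set.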

When this model was trained for 300 epochs, it reached a mean squared error of 0.0665 on the training data. This looks great, but, as expected with this many parameters, the model overfits the training set: the mean squared error on the validation set was 4.2. This is worse than blindly guessing zero, which would give a loss of around 2.2 (depending on the specific instance of the random dataset).
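Predicting zero for every matrix gives a mean squared error equal to the mean of det(A)². If the entries are independent with mean zero and variance σ² (an assumption: the notebook does not state its distribution), the cross terms in the permutation expansion of det(A) average to zero, so E[det(A)²] = n!·σ^(2n). For entries uniform on [-1, 1], σ² = 1/3, giving 5040/2187 ≈ 2.30 for n = 7, in the same range as the 2.2 quoted above. A sketch that checks this numerically:

```python
import math
import numpy as np

n = 7
sigma2 = 1.0 / 3.0  # variance of Uniform(-1, 1); assumed entry distribution
expected = math.factorial(n) * sigma2 ** n
print(round(expected, 4))  # 2.3045

rng = np.random.default_rng(0)
mats = rng.uniform(-1.0, 1.0, size=(40000, n, n))
dets = np.linalg.det(mats)
zero_guess_mse = np.mean((dets - 0.0) ** 2)  # MSE of always predicting 0
print(zero_guess_mse)  # close to the expected value; varies with the seed
```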

The next step was therefore to eliminate nodes and possibly limit the number of epochs to reduce overfitting. However, whenever a model seemed to successfully approach the training-set determinants, its predictions on the validation set were very poor. The project was therefore expanded to include convolutional neural networks. The hope was that, since a determinant can be built up from the determinants of its submatrices, a convolutional network, which looks at submatrix windows, could do a better job.
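The idea that a determinant is built from determinants of submatrices is cofactor (Laplace) expansion. A minimal recursive sketch for illustration only: it takes O(n!) time, whereas `np.linalg.det` uses an LU factorization.

```python
import numpy as np

def cofactor_det(a):
    """Determinant by cofactor expansion along the first row (illustrative only)."""
    a = np.asarray(a, dtype=float)
    size = a.shape[0]
    if size == 1:
        return a[0, 0]
    total = 0.0
    for j in range(size):
        # Minor: delete row 0 and column j, then recurse.
        minor = np.delete(np.delete(a, 0, axis=0), j, axis=1)
        total += (-1) ** j * a[0, j] * cofactor_det(minor)
    return total

m = [[2.0, 0.0, 1.0], [1.0, 3.0, 2.0], [1.0, 1.0, 2.0]]
print(cofactor_det(m))  # 6.0
```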

```python
# CNN model for the determinant.
from tensorflow.keras import layers, models
from tensorflow.keras.layers import LeakyReLU

n = 7

detCNNmodel = models.Sequential()
# (n-2)**2 = 25 filters with a 5x5 kernel; inputs must be shaped (n, n, 1).
detCNNmodel.add(layers.Conv2D((n-2)**2, (5, 5), activation=LeakyReLU(alpha=0.3), input_shape=(n, n, 1)))
# Flatten the feature maps, then apply fully connected Leaky ReLU layers.
detCNNmodel.add(layers.Flatten())
detCNNmodel.add(layers.Dense(n**2, activation=LeakyReLU(alpha=0.3)))
detCNNmodel.add(layers.Dense(n**2, activation=LeakyReLU(alpha=0.3)))
detCNNmodel.add(layers.Dense(n**2, activation=LeakyReLU(alpha=0.3)))
detCNNmodel.add(layers.Dense(n**2, activation=LeakyReLU(alpha=0.3)))
detCNNmodel.add(layers.Dense(1, activation=LeakyReLU(alpha=0.3)))
detCNNmodel.compile(optimizer='adam', loss='mean_squared_error')
detCNNmodel.summary()
```
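Two practical details of the CNN, shown in a NumPy sketch with placeholder data: the flat 49-value rows must be reshaped to `(n, n, 1)` before they reach `Conv2D`, and a 5×5 kernel with Keras' default `'valid'` padding and stride 1 only ever sees the nine contiguous 5×5 windows of a 7×7 matrix (a 3×3×25 output before `Flatten`). By contrast, the minors in a cofactor expansion delete an arbitrary row and column, so most of them are not contiguous windows.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

n = 7
rng = np.random.default_rng(0)
flat = rng.uniform(-1.0, 1.0, size=(4, n * n))  # placeholder data: 4 flattened matrices

# Conv2D with input_shape=(n, n, 1) expects a trailing channel axis.
images = flat.reshape(-1, n, n, 1)
print(images.shape)  # (4, 7, 7, 1)

# The windows a 5x5 kernel visits with 'valid' padding and stride 1:
windows = sliding_window_view(images[0, :, :, 0], (5, 5))
print(windows.shape)  # (3, 3, 5, 5): nine contiguous 5x5 submatrices
```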

However, the convolutional neural networks also fell short of the desired outcome. They still could not fit unseen matrices and only seemed to overfit the training data.

Thus, one additional method was tried: dropout. Dropout helps prevent overfitting by randomly setting a fraction (the rate) of a layer's outputs to zero at each training step. For best results, it seems ideal to add dropout after each layer at a relatively high rate of 20% to 50% (Brownlee). Also, some tentative research and examples show that a batch size that is too large or too small can lead to a worse fit: too large a batch can hurt generalization, while too small a batch makes the updates noisy and can keep training from settling into a good optimum (Shen).
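Dropout acts on a layer's outputs, not on its weights. A minimal NumPy sketch of "inverted" dropout, the scheme Keras' `Dropout` layer uses (surviving values are scaled by 1 / (1 - rate) so the expected activation is unchanged); the function below is illustrative, not Keras code:

```python
import numpy as np

def dropout(x, rate, rng, training=True):
    """Inverted dropout: zero a random fraction `rate` of the activations
    during training and rescale the survivors by 1 / (1 - rate).
    At inference time it is the identity.
    """
    if not training or rate == 0.0:
        return x
    keep = rng.random(x.shape) >= rate
    return np.where(keep, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(0)
acts = np.ones((2, 8))
train_out = dropout(acts, 0.5, rng)                  # entries are 0.0 or 2.0
infer_out = dropout(acts, 0.5, rng, training=False)  # unchanged
print(train_out)
print(infer_out)
```

In Keras the equivalent is a `layers.Dropout(rate)` layer placed after a `Dense` layer; it is only active during training.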

Thus, I made another model, this time with dropout, in hopes of succeeding.

Note: I lowered the dropout rate below 0.2 because at 0.2 the model never learned even the training data. I believe this shows a weakness, and possibly the infeasibility, of this problem.

Applying the same approach, with repeated trials, to eigenvalues produced another model that only seems able to fit the training set.
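The notebook does not say how the eigenvalue targets were produced. A general real matrix can have complex eigenvalues, which a real-valued `Dense(n)` output cannot represent, and the n targets also need a fixed ordering. The sketch below sidesteps both issues by assuming symmetric matrices, whose eigenvalues are real, and using `np.linalg.eigvalsh`, which returns them in ascending order. The symmetrization and the helper name `make_eigen_dataset` are assumptions, not the author's method.

```python
import numpy as np

def make_eigen_dataset(num_samples=40000, n=7, seed=0):
    """Hypothetical helper: symmetric matrices and their sorted eigenvalues."""
    rng = np.random.default_rng(seed)
    a = rng.uniform(-1.0, 1.0, size=(num_samples, n, n))
    sym = (a + np.swapaxes(a, -1, -2)) / 2.0  # symmetric, so eigenvalues are real
    eigvals = np.linalg.eigvalsh(sym)          # shape (num_samples, n), ascending
    return sym.reshape(num_samples, n * n), eigvals

x_eig, y_eig = make_eigen_dataset(num_samples=1000)
print(x_eig.shape, y_eig.shape)  # (1000, 49) (1000, 7)
```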

```python
# Eigenvalue model: n**2 inputs, one output per eigenvalue.
from tensorflow.keras import layers, models
from tensorflow.keras.layers import LeakyReLU

n = 7

eigenmodel = models.Sequential()
eigenmodel.add(layers.Dense(n**2, activation=LeakyReLU(alpha=0.3), input_dim=n**2))
eigenmodel.add(layers.Dense(n**3, activation=LeakyReLU(alpha=0.3)))
eigenmodel.add(layers.Dense(n**3, activation=LeakyReLU(alpha=0.3)))
eigenmodel.add(layers.Dense(n**2, activation=LeakyReLU(alpha=0.3)))
eigenmodel.add(layers.Dense(n**2, activation=LeakyReLU(alpha=0.3)))
eigenmodel.add(layers.Dense(n**2, activation=LeakyReLU(alpha=0.3)))
eigenmodel.add(layers.Dense(n**2, activation=LeakyReLU(alpha=0.3)))
# Output layer: one value per eigenvalue of the n x n matrix.
eigenmodel.add(layers.Dense(n, activation=LeakyReLU(alpha=0.3)))
eigenmodel.compile(optimizer='adam', loss='mean_squared_error')
eigenmodel.summary()
```

**Results:**

It seems that a neural network is not capable of finding the determinant of a matrix it was not trained on; it appears unable to learn the algorithm behind determinants. This may be due to the inability to use a large enough training sample. A 7x7 matrix has 49 entries, so a data set containing just one matrix for each combination of positive and negative signs across the entries would need 2^49 (about 563 trillion) matrices. This is far too large to handle on a computer like mine.
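The count of sign patterns, and what it would cost to store one 7×7 matrix per pattern, works out as follows. Storing each entry as a 4-byte float32 is an assumption about the storage format.

```python
n = 7
sign_patterns = 2 ** (n * n)  # each of the 49 entries positive or negative
print(f"{sign_patterns:,}")   # 562,949,953,421,312

bytes_per_matrix = n * n * 4  # 49 float32 values
total_bytes = sign_patterns * bytes_per_matrix
print(f"{total_bytes / 1e15:.0f} PB")  # 110 PB for one matrix per sign pattern
```

For comparison, the 40,000-matrix training set covers roughly 7 × 10^-11 of these patterns.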

Thus, I am reluctantly concluding that at least these types of neural networks cannot compute determinants.
Searching online, I found a layperson who ran into the same problem, but no discernible research on the feasibility of the project.
Link to the other project: https://stackoverflow.com/questions/46734134/how-to-approximate-the-determinant-with-keras

For eigenvalues, I saw some tentative research using complicated hand-crafted encodings of matrices that give decent results, but none of these methods are designed for small matrices (they are designed for matrices that do not fit in memory all at once).



**Implications:**

This project illustrates some of the limitations of neural networks. Despite recent advances in computing and deep learning, it seems hard for a computer to generate algorithms (or approximations of them). This is an area where the human mind still seems to have the advantage over a computer. Perhaps we are still far from computers replacing the need for mathematicians after all.



**Citations:**

Brownlee, Jason. “Dropout Regularization in Deep Learning Models With Keras.” Machine Learning Mastery, 13 Sept. 2019, machinelearningmastery.com/dropout-regularization-deep-learning-models-keras/.

*CS231n Convolutional Neural Networks for Visual Recognition*, Stanford, cs231n.github.io/neural-networks-1/#actfun.

eleanora, and denfromufa. “How to Approximate the Determinant with Keras.” Stack Overflow, stackoverflow.com/questions/46734134/how-to-approximate-the-determinant-with-keras.

Keras-Team. “Predictions Are All Zero · Issue #3687 · Keras-Team/Keras.” GitHub, github.com/keras-team/keras/issues/3687.

Liu, Danqing. “A Practical Guide to ReLU.” Medium, Medium, 30 Nov. 2017, medium.com/@danqing/a-practical-guide-to-relu-b83ca804f1f7.

Russell, Stuart J., and Peter Norvig. Artificial Intelligence: A Modern Approach. Pearson India Education Services Pvt. Ltd., 2018.

Shen, Kevin. “Effect of Batch Size on Training Dynamics.” Medium, Mini Distill, 19 June 2018, medium.com/mini-distill/effect-of-batch-size-on-training-dynamics-21c14f7a716e.

VanDer Linden, Keith. “Kvlinden-Courses/cs344-Code.” GitHub, github.com/kvlinden-courses/cs344-code/blob/master/u10nn/keras-cnn.ipynb.