In [1]:
import os
# Disable tensorflow warning messages.
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' 

import pandas as pd
import numpy as np
import tensorflow as tf

# RecurrentRav: A Recurrent Neural Network Taught to Write Pseudo-Mishnah

## 1. What's a Neural Network?

A recurrent neural network (RNN) is an algorithm that identifies patterns in sequences of numbers, uses those patterns to predict the next number in a sequence, and can therefore generate new sequences that follow the learned patterns. By mapping each character of a language to a unique number, we can use RNNs to generate synthetic text in the style of some real text. For example, we could train an RNN on all of Shakespeare's plays, then use it to compose pseudo-Shakespearean plays! Here's an example, taken from Andrej Karpathy's ['The Unreasonable Effectiveness of Recurrent Neural Networks'](http://karpathy.github.io/2015/05/21/rnn-effectiveness/):

-------------------------------------------------------------

PANDARUS: <br>
Alas, I think he shall be come approached and the day <br>
When little srain would be attain'd into being never fed, <br>
And who is but a chain and subjects of his death, <br>
I should not sleep. <br>

Second Senator: <br>
They are away this miseries, produced upon my soul, <br>
Breaking and strongly should be buried, when I perish <br>
The earth and thoughts of many states. <br>

DUKE VINCENTIO: <br>
Well, your wit is in the care of side and that. <br>

Second Lord: <br>
They would be ruled after this chamber, and <br>
my fair nues begun out of the fact, to be conveyed, <br>
Whose noble souls I'll have the heart of the wars. <br>

Clown: <br>
Come, sir, I will make did behold your worship.

VIOLA: <br>
I'll drink it.

-------------------------------------------------------------

For detailed descriptions of how RNNs work, check out Andrej Karpathy's blog post, [Google's RNN Text Generation Tutorial](https://www.tensorflow.org/tutorials/text/text_generation) (from which I have drawn substantially for the work shown here), and many other online sources. In short, an RNN is a set of mathematical operators that, given an ordered sequence of values, predicts the next value in that sequence. We can 'train' an RNN to imitate a given text by feeding it many N-character sequences from that text, each paired with the character that immediately follows it, and tweaking the RNN's mathematical operators until it successfully predicts the (N+1)th character of each sequence.
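To make the training setup concrete, here is a minimal sketch of how a text could be cut into N-character sequences, each paired with the character that follows it. The function name `make_training_pairs` and the sample string are hypothetical, chosen only for illustration; the batching actually used in Google's tutorial or in the Train_The_Rav notebook may differ.

```python
def make_training_pairs(text, seq_length):
    """Return (input_sequence, next_character) pairs for every window in text.

    Hypothetical helper for illustration only, not the notebook's own code.
    """
    pairs = []
    # Slide a window of seq_length characters across the text, one step at a time.
    for start in range(len(text) - seq_length):
        window = text[start:start + seq_length]
        next_char = text[start + seq_length]  # the (N+1)th character to predict
        pairs.append((window, next_char))
    return pairs


sample_text = "hello world"
pairs = make_training_pairs(sample_text, seq_length=4)

# Show the first few (input, target) pairs.
for window, next_char in pairs[:3]:
    print(repr(window), "->", repr(next_char))
```

Each pair is one training example: the network sees the window and is adjusted so that its prediction matches the paired character.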

## 2. What does this have to do with the Mishnah?

Nothing, at least not at first glance. However, the Mishnah can be treated as an ordered sequence of characters (letters, vowels, spaces, etc.) just like any other text. So let's train a recurrent neural network on the entire Mishnah, and see what we can create!

## 3. Let's write some fake Mishnah!

In order to train a neural net on the Mishnah, we first cleaned Sefaria's raw text files so that they can be uniformly converted from string characters into sequences of numbers. We performed this cleaning in the Clean_Texts notebook in this repo. We then transitioned into the Train_The_Rav notebook and converted the cleaned text into sequences of numbers.
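The conversion from text to numbers can be sketched in a few lines. This is an assumed, simplified version of that step: the helper names `build_vocab`, `encode`, and `decode` are hypothetical, and the sample string stands in for the cleaned Mishnah text.

```python
def build_vocab(text):
    """Assign each distinct character a unique integer id (sorted for stability)."""
    chars = sorted(set(text))
    char_to_id = {ch: i for i, ch in enumerate(chars)}
    id_to_char = {i: ch for ch, i in char_to_id.items()}
    return char_to_id, id_to_char


def encode(text, char_to_id):
    """Convert a string into a list of integer ids."""
    return [char_to_id[ch] for ch in text]


def decode(ids, id_to_char):
    """Convert a list of integer ids back into a string."""
    return "".join(id_to_char[i] for i in ids)


# A stand-in sample; the same functions work on Hebrew text with vowel marks.
sample = "abba cab"
char_to_id, id_to_char = build_vocab(sample)
ids = encode(sample, char_to_id)

print(ids)
print(decode(ids, id_to_char))
```

Because every character gets its own id, vowel marks, spaces, and newlines all become part of the sequence the network learns from, which is why the text has to be cleaned consistently first.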

Next, we used TensorFlow, Google's machine learning library, to build a recurrent neural network with a memory of 400 characters. In practice, this means that our network has a memory of fewer than 200 Hebrew letters, since vowel marks, newlines, and spaces each count as characters.
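To see why a window of characters holds far fewer letters, we can count the Hebrew letters in a short pointed phrase. The sketch below is illustrative only: the helper `count_hebrew_letters` is hypothetical, the phrase is written with Unicode escapes so the counts are unambiguous, and the ratio for the full Mishnah text may be different.

```python
def count_hebrew_letters(text):
    """Count characters in the Hebrew letter range U+05D0 (alef) to U+05EA (tav).

    Vowel points and other marks fall outside this range, so they are not counted.
    """
    return sum(1 for ch in text if "\u05d0" <= ch <= "\u05ea")


# A two-word pointed Hebrew phrase, chosen only for illustration.
sample = (
    "\u05de\u05b5\u05d0\u05b5\u05d9\u05de\u05b8\u05ea\u05b7\u05d9"  # 6 letters, 4 vowel marks
    " "
    "\u05e7\u05d5\u05b9\u05e8\u05b4\u05d9\u05df"  # 5 letters, 2 vowel marks
)

total_chars = len(sample)
letters = count_hebrew_letters(sample)

print("characters:", total_chars)
print("letters:", letters)
print("share of window spent on letters:", round(letters / total_chars, 2))
```

In this small sample, 11 of the 18 characters are letters; the rest of the window is taken up by vowel marks and a space.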

## 4. What can we learn from this?

## 5. What's next?

Applying RecurrentRav to other texts (Talmud is tricky) <br>
Studying the layers/patterns of RecurrentRav, with visualizations <br>
Creating a live website demo, rather than a static notebook <br>