# Album Name Generator

This notebook trains an LSTM character model on a music album dataset parsed from Discogs (https://www.discogs.com). Feel free to contact me to obtain the dataset.  
The dataset contains images and names of music albums. The focus of this project is building a generative model able to create new album names, so we will ignore the album artwork.  
The LSTM is trained to predict the next character in a text sequence given the previous characters.  
Once trained, the LSTM can generate new album names one character at a time.

In [1]:
from os import listdir
from os.path import isfile, join
import numpy as np
np.random.seed(1)
import string
import tensorflow as tf
import random

### Prepare data:
Read album titles from the image filenames:

In [2]:
def read_text(directory):
    # Get album cover filenames and remove the .jpg ending
    album_titles = [f[:-4].lower() for f in listdir(directory) if isfile(join(directory, f))]
    # Shuffle cover names to avoid any bias, since they are initially sorted in alphabetical order
    np.random.shuffle(album_titles)
    # Add start token > and stop token < before and after each album name
    # It would be possible to use a single token for our case, but using both is good practice
    return '>' + '<>'.join(album_titles) + '<'

In [3]:
# Dataset directory
directory = '/path_to_files' # Feel free to contact me for the dataset
text = read_text(directory)
print('Data size %d characters' % len(text))
# Print first 500 characters.
# Token > indicates the start of an album name, while token < indicates the end
print(text[:500])

Data size 200153 characters
>class clown<>lowrell<>you're the one for me<>live on the garden bowl lanes: july 9, 1999<>stockholm disco ep<>people of the sun ep<>chapter 1<>wrecked<>darklands<>book of the bad (volume one)<>forgiveness rock record<>second edition<>combat rock<>changing crisis ep<>shut down<>desolation angels<>mauve<>ashes are burning<>still life (american concert 1981)<>quark, strangeness and charm<>true colours<>vengeance<>extreme aggression<>incense and peppermints<>big city music<>dirty cash<>god of the s
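Because every title is wrapped in `>` and `<`, the concatenated string can be turned back into a list of titles. The following is a minimal sketch of that round trip, assuming no title itself contains `>` or `<`; the helper names `join_titles` and `split_titles` are illustrative and not part of the notebook.

```python
def join_titles(titles):
    """Wrap each title in a start token '>' and a stop token '<'."""
    return '>' + '<>'.join(titles) + '<'


def split_titles(text):
    """Recover the titles from a token-delimited string.

    Assumes the titles themselves never contain '>' or '<'.
    """
    if not text:
        return []
    # Drop the leading '>' and the trailing '<', then split on the '<>' boundary
    return text[1:-1].split('<>')


titles = ['class clown', 'lowrell', "you're the one for me"]
encoded = join_titles(titles)
print(encoded)
print(split_titles(encoded))
```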


We select a small validation set to evaluate our objective function:

In [4]:
def train_validation_split(VALID_SIZE):
    '''Split the global `text` into a training and a validation set'''
    # The first VALID_SIZE characters are held out for validation
    valid_text = text[:VALID_SIZE]
    train_text = text[VALID_SIZE:]
    return train_text, valid_text

In [5]:
VALID_SIZE = 1009
train_text, valid_text = train_validation_split(VALID_SIZE)

In [6]:
# Validation set
print('Validation set:\n %s' % valid_text)

Validation set:
 >class clown<>lowrell<>you're the one for me<>live on the garden bowl lanes: july 9, 1999<>stockholm disco ep<>people of the sun ep<>chapter 1<>wrecked<>darklands<>book of the bad (volume one)<>forgiveness rock record<>second edition<>combat rock<>changing crisis ep<>shut down<>desolation angels<>mauve<>ashes are burning<>still life (american concert 1981)<>quark, strangeness and charm<>true colours<>vengeance<>extreme aggression<>incense and peppermints<>big city music<>dirty cash<>god of the serengeti<>the best of both worlds<>pink moon<>bug<>bis zum bitteren ende live!<>desire<>no fuel left for the pilgrims<>z<>impact<>the wörld thät sümmer<>bora bora<>badmotorfinger<>a chorus of storytellers<>makesaracket<>floating coffin<>live killers<>keep on jumpin'<>liquid liquid<>loop-finding-jazz-records<>empty space meditation<>it's gonna be alright (help is on the way)<>pod<>reach out<>new sensations<>mechanical resonance<>an awesome wave<>the many facets of roger<>world
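Note that cutting at a fixed character count ends the validation text in the middle of a title (`world` above). One possible alternative, sketched here under the assumption that a slightly larger validation set is acceptable, is to move the cut forward to the next stop token. The function name `split_on_title_boundary` is hypothetical.

```python
def split_on_title_boundary(text, valid_size):
    """Split text so that the validation part ends on a stop token '<'.

    valid_size is a lower bound: the cut moves forward to the end of the
    title that contains position valid_size - 1.
    """
    cut = text.find('<', max(valid_size - 1, 0))
    if cut == -1:
        # No stop token found: everything goes to the validation part
        return '', text
    cut += 1  # Keep the stop token inside the validation part
    return text[cut:], text[:cut]


sample_text = '>pink moon<>bug<>desire<>impact<'
# A plain slice, sample_text[:13], would end in '>b'
train_part, valid_part = split_on_title_boundary(sample_text, 13)
print(valid_part)
print(train_part)
```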

### Tokenization:

Utility functions to map characters to vocabulary IDs and back:

In [7]:
VOCABULARY_SIZE = len(string.ascii_lowercase) + 3 # [a-z] + ' ' + '>' + '<'
FIRST_LETTER = ord(string.ascii_lowercase[0])
START_TOKEN = 27 # Use 27 for start token
STOP_TOKEN = 28 # Use 28 for stop token

def char2id(char):
    if char in string.ascii_lowercase:
        return ord(char) - FIRST_LETTER + 1 # Use the position in the alphabet (a=1, ..., z=26)
    elif char == ' ':
        return 0 # Use 0 for space
    elif char == '>':
        return START_TOKEN
    elif char == '<':
        return STOP_TOKEN
    else:
        # Return 0 for any unexpected character (digits, punctuation, accented letters), treating it as a space
        return 0

def id2char(dictid):
    if 0 < dictid < 27:
        return chr(dictid + FIRST_LETTER - 1)
    elif dictid == START_TOKEN:
        return '>'
    elif dictid == STOP_TOKEN:
        return '<'
    else:
        return ' '

print('Test mappings:')
print(char2id('a'), char2id('z'), char2id(' '), char2id('ï'), char2id('>'), char2id('<'))
print(id2char(1), id2char(26), id2char(0), id2char(27), id2char(28))

Test mappings:
1 26 0 0 27 28
a z   > <
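The same mapping can also be written as a lookup table, which makes the one-hot encoding used by the batch generator easy to inspect. This is only a sketch: `ALPHABET`, `encode`, `decode` and `one_hot` are illustrative names, and the ID layout is assumed to be the one defined above (0 for space and unknown characters, 1 to 26 for letters, 27 for `>`, 28 for `<`).

```python
import string

import numpy as np

# Index in this string == vocabulary ID: ' ' -> 0, a-z -> 1..26, '>' -> 27, '<' -> 28
ALPHABET = ' ' + string.ascii_lowercase + '><'
CHAR_TO_ID = {ch: i for i, ch in enumerate(ALPHABET)}


def encode(text):
    """Map characters to IDs; unknown characters fall back to 0 (space)."""
    return [CHAR_TO_ID.get(ch, 0) for ch in text]


def decode(ids):
    """Map IDs back to characters."""
    return ''.join(ALPHABET[i] for i in ids)


def one_hot(ids):
    """Return an array of shape (len(ids), vocabulary size) with a single 1 per row."""
    out = np.zeros((len(ids), len(ALPHABET)), dtype=np.float32)
    out[np.arange(len(ids)), ids] = 1.0
    return out


ids = encode('>wörld<')
print(ids)          # [27, 23, 0, 18, 12, 4, 28]
print(decode(ids))  # >w rld<
print(one_hot(ids).shape)
```

The round trip is lossy by design: the `ö` is mapped to ID 0 and comes back as a space.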


### Batch generator

Generating training batches:

In [8]:
BATCH_SIZE = 64
NUM_UNROLLINGS = 15

class BatchGenerator(object):
    def __init__(self, text, BATCH_SIZE, NUM_UNROLLINGS):
        self._text = text
        self._text_size = len(text)
        self._BATCH_SIZE = BATCH_SIZE
        self._NUM_UNROLLINGS = NUM_UNROLLINGS
        segment = self._text_size // BATCH_SIZE
        self._cursor = [offset * segment for offset in range(BATCH_SIZE)]
        self._last_batch = self._next_batch()
  
    def _next_batch(self):
        """Generate a single batch from the current cursor position in the data."""
        # np.float was removed in NumPy 1.24; the builtin float is equivalent (float64)
        batch = np.zeros(shape=(self._BATCH_SIZE, VOCABULARY_SIZE), dtype=float)
        for b in range(self._BATCH_SIZE):
            batch[b, char2id(self._text[self._cursor[b]])] = 1.0
            self._cursor[b] = (self._cursor[b] + 1) % self._text_size
        return batch
  
    def next(self):
        """Generate the next array of batches from the data. The array consists of
        the last batch of the previous array, followed by NUM_UNROLLINGS new ones.
        """
        batches = [self._last_batch]
        for step in range(self._NUM_UNROLLINGS):
            batches.append(self._next_batch())
        self._last_batch = batches[-1]
        return batches

def characters(probabilities):
    """Turn a 1-hot encoding or a probability distribution over the possible
    characters back into its (most likely) character representation."""
    return [id2char(c) for c in np.argmax(probabilities, 1)]

def batches2string(batches):
    """Convert a sequence of batches back into their (most likely) string
    representation."""
    s = [''] * batches[0].shape[0]
    for b in batches:
        s = [''.join(x) for x in zip(s, characters(b))]
    return s

train_batches = BatchGenerator(train_text, BATCH_SIZE, NUM_UNROLLINGS)
valid_batches = BatchGenerator(valid_text, 1, 1)

print(batches2string(train_batches.next()))
print(batches2string(train_batches.next()))
print(batches2string(valid_batches.next()))
print(batches2string(valid_batches.next()))

['>exile<>on the r', 'mash<>speaking i', ' e p <>cream<>su', 'sey   oracle<>th', 'lyps trak ii<>dr', ' rock steady cre', 'ock konducta  pa', ' thrones and dom', 'obal underground', '<>call for escap', 'tory <>dreamin  ', 'broke my heart s', 'get enough<>emil', 'aging inside me<', 'n of xymox<>dest', 'rs  me   you <>y', 'ht  still<>for l', 'fuck the kids<>t', ' version of me <', 'th it<>under the', ' remix <>laid ba', 'ain<>cloudwalkin', 'itz in moscow<>f', 'rs of the third ', '>various positio', 'one of ya left<>', 'n a dream<>restr', '      <>masters ', '>updating the ex', '<>beautiful frea', 'arkin <>he s com', 'we teach mistake', 'is<>the chemistr', ' on fire<>the op', 'ner   beinhart <', 's <>live at carn', 'ark<>love devoti', ' stories<>aux ar', 'ex pistols<>phan', '<>the grand illu', 'not legalize it ', 'tion picture sou', 'orth<>blood on t', 'on          <>li', ' possible<>you g', 'ssible musics<>i', '>new year s day<', 'h boys today <>u', 'my phone<>there ', ' in my mind<>kil',
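The cursor logic is easier to follow on a tiny text. The sketch below is a simplified, character-level stand-in for `BatchGenerator` (no one-hot encoding); the class name `ToyBatchGenerator` is made up for illustration. It shows that the text is cut into `batch_size` equal segments read in parallel, and that consecutive calls overlap by one character, so the last input of one call becomes the first input of the next.

```python
class ToyBatchGenerator:
    """Character-level version of the cursor logic, without one-hot encoding."""

    def __init__(self, text, batch_size, num_unrollings):
        self._text = text
        self._num_unrollings = num_unrollings
        # Each batch row starts reading at the beginning of its own segment
        segment = len(text) // batch_size
        self._cursor = [offset * segment for offset in range(batch_size)]
        self._last = self._next_chars()

    def _next_chars(self):
        """Read one character per batch row and advance every cursor."""
        chars = [self._text[c] for c in self._cursor]
        self._cursor = [(c + 1) % len(self._text) for c in self._cursor]
        return chars

    def next(self):
        """Return one string per batch row, num_unrollings + 1 characters long."""
        steps = [self._last]
        for _ in range(self._num_unrollings):
            steps.append(self._next_chars())
        self._last = steps[-1]
        # Transpose from time-major steps to one string per row
        return [''.join(row) for row in zip(*steps)]


toy = ToyBatchGenerator('abcdefghijklmnopqrstuvwxyz', batch_size=2, num_unrollings=4)
first = toy.next()
second = toy.next()
print(first)   # ['abcde', 'nopqr']
print(second)  # ['efghi', 'rstuv']
```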

### Utility functions:

In [9]:
def logprob(predictions, labels):
    """Log-probability of the true labels in a predicted batch."""
    # To avoid numerical stability issues
    predictions[predictions < 1e-10] = 1e-10
    return np.sum(np.multiply(labels, -np.log(predictions))) / labels.shape[0]

def sample_distribution(distribution):
    """Sample one element from a distribution assumed to be an array of normalized
    probabilities.
    """
    r = random.uniform(0, 1)
    s = 0
    for i in range(len(distribution)):
        s += distribution[i]
        if s >= r:
            return i
    return len(distribution) - 1

def sample(prediction):
    """Turn a (column) prediction into 1-hot encoded samples."""
    # np.float was removed in NumPy 1.24; the builtin float is equivalent (float64)
    p = np.zeros(shape=[1, VOCABULARY_SIZE], dtype=float)
    p[0, sample_distribution(prediction[0])] = 1.0
    return p

def random_distribution():
    """Generate a random column of probabilities."""
    b = np.random.uniform(0.0, 1.0, size=[1, VOCABULARY_SIZE])
    return b/np.sum(b, 1)[:,None]
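Perplexity, as reported during training, is the exponential of the value returned by `logprob`, that is, of the average negative log-probability assigned to the true characters. A quick way to build intuition: a model that spreads its probability uniformly over the 29 symbols has a perplexity of 29, which is close to the step-0 minibatch perplexity of 29.08 in the training log. The sketch below uses a non-mutating variant of `logprob` (the name `mean_neg_log_prob` is illustrative).

```python
import numpy as np


def mean_neg_log_prob(predictions, labels):
    """Average negative log-probability of the true labels (one-hot rows)."""
    # Clip instead of assigning in place, so the caller's array is left untouched
    predictions = np.clip(predictions, 1e-10, None)
    return np.sum(labels * -np.log(predictions)) / labels.shape[0]


vocab = 29
labels = np.eye(vocab)[[1, 2, 3]]  # Three one-hot targets: 'a', 'b', 'c'

# A model with no knowledge: every symbol is equally likely
uniform = np.full((3, vocab), 1.0 / vocab)
uniform_perplexity = np.exp(mean_neg_log_prob(uniform, labels))

# A model that puts probability 0.5 on the correct symbol every time
confident = np.full((3, vocab), 0.5 / (vocab - 1))
confident[labels == 1] = 0.5
confident_perplexity = np.exp(mean_neg_log_prob(confident, labels))

print('uniform: %.2f, confident: %.2f' % (uniform_perplexity, confident_perplexity))
```

Read this way, a validation perplexity of about 6.4 means the model is on average as uncertain as if it were choosing uniformly among roughly six characters.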

### LSTM Model Definition:
We are going to code an LSTM in TensorFlow from scratch. An alternative would be to use the high-level implementation in tf.contrib instead (https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/BasicLSTMCell).

<img src="files/basic_lstm.png">
Image from: https://colah.github.io/posts/2015-08-Understanding-LSTMs/
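Before the TensorFlow version, here is a single LSTM step written in plain NumPy, as a rough sketch of the computation in the diagram: three sigmoid gates (input, forget, output), a tanh candidate, and no peephole connections. The names `lstm_step` and `params`, the uniform initialisation, and the toy shapes are assumptions made for illustration only.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step without peephole connections.

    x: (batch, input_size); h_prev and c_prev: (batch, num_nodes).
    params maps a gate name to a tuple (W_x, W_h, b).
    """
    def affine(name):
        w_x, w_h, b = params[name]
        return x @ w_x + h_prev @ w_h + b

    i = sigmoid(affine('input'))      # How much of the candidate to write
    f = sigmoid(affine('forget'))     # How much of the old cell state to keep
    g = np.tanh(affine('candidate'))  # Candidate values for the cell state
    o = sigmoid(affine('output'))     # How much of the cell state to expose
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    return h, c


rng = np.random.default_rng(0)
input_size, num_nodes, batch = 29, 64, 4
params = {name: (rng.uniform(-0.1, 0.1, (input_size, num_nodes)),
                 rng.uniform(-0.1, 0.1, (num_nodes, num_nodes)),
                 np.zeros((1, num_nodes)))
          for name in ('input', 'forget', 'candidate', 'output')}

x = np.eye(input_size)[[27, 1, 2, 3]]  # One-hot rows for '>', 'a', 'b', 'c'
h, c = lstm_step(x, np.zeros((batch, num_nodes)), np.zeros((batch, num_nodes)), params)
print(h.shape, c.shape)
```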

In [10]:
num_nodes = 64

graph = tf.Graph()
with graph.as_default():
  
    # Parameters:
    # Input gate: input, previous output, and bias
    ix = tf.Variable(tf.truncated_normal([VOCABULARY_SIZE, num_nodes], -0.1, 0.1))
    im = tf.Variable(tf.truncated_normal([num_nodes, num_nodes], -0.1, 0.1))
    ib = tf.Variable(tf.zeros([1, num_nodes]))
    # Forget gate: input, previous output, and bias
    fx = tf.Variable(tf.truncated_normal([VOCABULARY_SIZE, num_nodes], -0.1, 0.1))
    fm = tf.Variable(tf.truncated_normal([num_nodes, num_nodes], -0.1, 0.1))
    fb = tf.Variable(tf.zeros([1, num_nodes]))
    # Memory cell: input, state and bias                             
    cx = tf.Variable(tf.truncated_normal([VOCABULARY_SIZE, num_nodes], -0.1, 0.1))
    cm = tf.Variable(tf.truncated_normal([num_nodes, num_nodes], -0.1, 0.1))
    cb = tf.Variable(tf.zeros([1, num_nodes]))
    # Output gate: input, previous output, and bias
    ox = tf.Variable(tf.truncated_normal([VOCABULARY_SIZE, num_nodes], -0.1, 0.1))
    om = tf.Variable(tf.truncated_normal([num_nodes, num_nodes], -0.1, 0.1))
    ob = tf.Variable(tf.zeros([1, num_nodes]))
    # Variables saving state across unrollings, note trainable=False,
    # since we do not want to backpropagate through them
    saved_output = tf.Variable(tf.zeros([BATCH_SIZE, num_nodes]), trainable=False)
    saved_state = tf.Variable(tf.zeros([BATCH_SIZE, num_nodes]), trainable=False)
    # Classifier weights and biases
    w = tf.Variable(tf.truncated_normal([num_nodes, VOCABULARY_SIZE], -0.1, 0.1))
    b = tf.Variable(tf.zeros([VOCABULARY_SIZE]))
  
    # Definition of the cell computation
    def lstm_cell(i, o, state):
        """Create an LSTM cell. See e.g.: http://arxiv.org/pdf/1402.1128v1.pdf
        For simplicity, we omit the various connections between the
        previous state and the gates."""
        input_gate = tf.sigmoid(tf.matmul(i, ix) + tf.matmul(o, im) + ib)
        forget_gate = tf.sigmoid(tf.matmul(i, fx) + tf.matmul(o, fm) + fb)
        update = tf.matmul(i, cx) + tf.matmul(o, cm) + cb
        state = forget_gate * state + input_gate * tf.tanh(update)
        output_gate = tf.sigmoid(tf.matmul(i, ox) + tf.matmul(o, om) + ob)
        return output_gate * tf.tanh(state), state

    # Input data
    train_data = []
    for _ in range(NUM_UNROLLINGS + 1): # One time step ahead for the last target
        train_data.append(
            tf.placeholder(tf.float32, shape=[BATCH_SIZE,VOCABULARY_SIZE]))
    train_inputs = train_data[:NUM_UNROLLINGS]
    train_labels = train_data[1:]  # Labels are inputs shifted by one time step

    # Unrolled LSTM loop
    outputs = []
    output = saved_output
    state = saved_state
    for i in train_inputs:
        output, state = lstm_cell(i, output, state)
        outputs.append(output)

    # State saving across unrollings
    with tf.control_dependencies([saved_output.assign(output),
                                saved_state.assign(state)]):
        # Classifier
        logits = tf.nn.xw_plus_b(tf.concat(outputs, 0), w, b)
        # softmax_cross_entropy_with_logits is more efficient than applying a softmax and then computing the cross-entropy
        loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
                              labels=tf.concat(train_labels, 0), logits=logits))

    # Optimizer
    # Keep track of the training step
    global_step = tf.Variable(0)
    # Use SGD with exponential decay on the learning rate
    learning_rate = tf.train.exponential_decay(10.0, global_step, 5000, 0.1, staircase=True)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    # Get gradients
    gradients, v = zip(*optimizer.compute_gradients(loss))
    # Clip gradients
    gradients, _ = tf.clip_by_global_norm(gradients, 1.25)
    # Apply gradients
    optimizer = optimizer.apply_gradients(zip(gradients, v), global_step=global_step)

    # Predictions
    train_prediction = tf.nn.softmax(logits)
  
    # Sampling and validation evaluation: batch 1, no unrolling.
    sample_input = tf.placeholder(tf.float32, shape=[1, VOCABULARY_SIZE])
    saved_sample_output = tf.Variable(tf.zeros([1, num_nodes]))
    saved_sample_state = tf.Variable(tf.zeros([1, num_nodes]))
    reset_sample_state = tf.group(saved_sample_output.assign(tf.zeros([1, num_nodes])),
                                  saved_sample_state.assign(tf.zeros([1, num_nodes])))
    sample_output, sample_state = lstm_cell(sample_input, saved_sample_output, saved_sample_state)
    with tf.control_dependencies([saved_sample_output.assign(sample_output),
                                  saved_sample_state.assign(sample_state)]):
        sample_prediction = tf.nn.softmax(tf.nn.xw_plus_b(sample_output, w, b))
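The optimizer above clips gradients by their global norm with a threshold of 1.25. As a sketch of what that operation does, the NumPy version below rescales all gradients by the same factor, `clip_norm / max(global_norm, clip_norm)`, so their relative directions are preserved. It is written for illustration and is not the TensorFlow implementation itself; the example gradients are made up.

```python
import numpy as np


def clip_by_global_norm(grads, clip_norm):
    """Rescale a list of gradient arrays so that their joint L2 norm is at most clip_norm."""
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    # The scale is 1 when the norm is already small enough
    scale = clip_norm / max(global_norm, clip_norm)
    return [g * scale for g in grads], global_norm


# Two made-up gradient tensors with a joint norm of sqrt(3^2 + 4^2) = 5
grads = [np.array([3.0, 0.0]), np.array([[0.0, 4.0]])]
clipped, norm = clip_by_global_norm(grads, 1.25)
clipped_norm = np.sqrt(sum(np.sum(g ** 2) for g in clipped))
print(norm, clipped_norm)
```

Clipping matters here because the initial learning rate of 10.0 is large, and a single exploding gradient could otherwise undo previous progress.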

### Train the LSTM:

In [11]:
NUM_STEPS = 200000
SUMMARY_FREQUENCY = 100

# Use interactive session in order to be able to generate album names on the next cell
session = tf.InteractiveSession(graph=graph)
tf.global_variables_initializer().run()
print('Initialized')
mean_loss = 0
for step in range(NUM_STEPS):
    batches = train_batches.next()
    feed_dict = dict()
    for i in range(NUM_UNROLLINGS + 1):
        feed_dict[train_data[i]] = batches[i]
    _, l, predictions, lr = session.run([optimizer, loss, train_prediction, learning_rate],
                                        feed_dict=feed_dict)
    mean_loss += l
    if step % SUMMARY_FREQUENCY == 0:
        if step > 0:
            mean_loss = mean_loss / SUMMARY_FREQUENCY
        # The mean loss is an estimate of the loss over the last few batches, and thus more robust.
        print('Average loss at step %d: %f learning rate: %f' % (step, mean_loss, lr))
        mean_loss = 0
        labels = np.concatenate(list(batches)[1:])
        print('Minibatch perplexity: %.2f' % float(np.exp(logprob(predictions, labels))))
        if step % (SUMMARY_FREQUENCY * 10) == 0:
            # Generate some samples.
            print('=' * 80)
            for _ in range(5):
                # Start with a random character
                feed = sample(random_distribution())
                sentence = characters(feed)[0]
                reset_sample_state.run()
                for _ in range(79):
                    prediction = sample_prediction.eval({sample_input: feed})
                    # Sample predicted character from the probability distribution
                    feed = sample(prediction)
                    # Sampled character is fed to the next time step
                    sentence += characters(feed)[0]
                print(sentence)
            print('=' * 80)
        # Measure validation set perplexity.
        reset_sample_state.run()
        valid_logprob = 0
        for _ in range(VALID_SIZE):
            b = valid_batches.next()
            predictions = sample_prediction.eval({sample_input: b[0]})
            valid_logprob = valid_logprob + logprob(predictions, b[1])
        print('Validation set perplexity: %.2f' % float(np.exp(valid_logprob / VALID_SIZE)))

Initialized
Average loss at step 0: 3.369880 learning rate: 10.000000
Minibatch perplexity: 29.08
 l j rudtecd>qejzbsf aopsfbuelcoj orblaopah a<tw<cmeihihy mswxnxtdp>tmba ewgcfol
ncrdpjabomordnzuz<idtrnoerbdyxhs e <adwsn eswo hhmhh xert s>gn  nebatits feekfoe
fu>q eaa>rdiyghhzecyotghxja tbts s tunaiislxgbk pndppk uijmqbv>nqqcxwq qjzdhyj>a
x bpwxjenq>vvbe<qz womih hsojri>kijfmfw ehtbceltfqmto peel kvduap o ck ojfctamse
ky oqdmnsecnagv kttdhryebrdlsne>kbu <wegpt >repshrcidzd>faqiei>pn<fsom> qeszgq>n
Validation set perplexity: 23.58
Average loss at step 100: 2.641324 learning rate: 10.000000
Minibatch perplexity: 10.84
Validation set perplexity: 10.54
Average loss at step 200: 2.308523 learning rate: 10.000000
Minibatch perplexity: 9.37
Validation set perplexity: 9.07
Average loss at step 300: 2.197819 learning rate: 10.000000
Minibatch perplexity: 9.48
Validation set perplexity: 8.27
Average loss at step 400: 2.130578 learning rate: 10.000000
Minibatch perplexity: 7.67
...

Validation set perplexity: 6.47
Average loss at step 4500: 1.695737 learning rate: 10.000000
Minibatch perplexity: 5.55
Validation set perplexity: 6.51
Average loss at step 4600: 1.712076 learning rate: 10.000000
Minibatch perplexity: 4.86
Validation set perplexity: 6.35
Average loss at step 4700: 1.692828 learning rate: 10.000000
Minibatch perplexity: 5.15
Validation set perplexity: 6.55
Average loss at step 4800: 1.708474 learning rate: 10.000000
Minibatch perplexity: 5.83
Validation set perplexity: 6.39
Average loss at step 4900: 1.689618 learning rate: 10.000000
Minibatch perplexity: 5.21
Validation set perplexity: 6.56
Average loss at step 5000: 1.701233 learning rate: 1.000000
Minibatch perplexity: 5.80
y for the dop<>introue creet pianka insid  <>feantal so silents<>thicty wig i<>s
in<>secordions<> rock sowne<>crysteric  limition    <>e inaverfax alow bubned ag
zer remix <>sleesmore<>ame one   siediant<>split byrd remixe<>music from poster<
g <>hangy cart c prussenti eir<>handry

Validation set perplexity: 6.40
Average loss at step 9100: 1.616254 learning rate: 1.000000
Minibatch perplexity: 5.32
Validation set perplexity: 6.37
Average loss at step 9200: 1.610675 learning rate: 1.000000
Minibatch perplexity: 5.35
Validation set perplexity: 6.39
Average loss at step 9300: 1.615847 learning rate: 1.000000
Minibatch perplexity: 5.02
Validation set perplexity: 6.34
Average loss at step 9400: 1.611845 learning rate: 1.000000
Minibatch perplexity: 4.69
Validation set perplexity: 6.41
Average loss at step 9500: 1.613635 learning rate: 1.000000
Minibatch perplexity: 5.88
Validation set perplexity: 6.38
Average loss at step 9600: 1.610346 learning rate: 1.000000
Minibatch perplexity: 4.46
Validation set perplexity: 6.43
Average loss at step 9700: 1.611068 learning rate: 1.000000
Minibatch perplexity: 5.39
Validation set perplexity: 6.39
Average loss at step 9800: 1.614074 learning rate: 1.000000
Minibatch perplexity: 5.43
Validation set perplexity: 6.41
...

Validation set perplexity: 6.41
Average loss at step 14100: 1.604074 learning rate: 0.100000
Minibatch perplexity: 5.11
Validation set perplexity: 6.40
Average loss at step 14200: 1.596745 learning rate: 0.100000
Minibatch perplexity: 5.27
Validation set perplexity: 6.41
Average loss at step 14300: 1.600125 learning rate: 0.100000
Minibatch perplexity: 4.81
Validation set perplexity: 6.40
Average loss at step 14400: 1.596090 learning rate: 0.100000
Minibatch perplexity: 5.01
Validation set perplexity: 6.41
Average loss at step 14500: 1.606154 learning rate: 0.100000
Minibatch perplexity: 5.22
Validation set perplexity: 6.40
Average loss at step 14600: 1.598189 learning rate: 0.100000
Minibatch perplexity: 4.74
Validation set perplexity: 6.41
Average loss at step 14700: 1.601482 learning rate: 0.100000
Minibatch perplexity: 5.16
Validation set perplexity: 6.40
Average loss at step 14800: 1.596385 learning rate: 0.100000
Minibatch perplexity: 5.51
Validation set perplexity: 6.41
...

Validation set perplexity: 6.41
Average loss at step 19100: 1.601579 learning rate: 0.010000
Minibatch perplexity: 4.58
Validation set perplexity: 6.41
Average loss at step 19200: 1.593281 learning rate: 0.010000
Minibatch perplexity: 4.74
Validation set perplexity: 6.41
Average loss at step 19300: 1.601084 learning rate: 0.010000
Minibatch perplexity: 4.99
Validation set perplexity: 6.41
Average loss at step 19400: 1.596061 learning rate: 0.010000
Minibatch perplexity: 4.96
Validation set perplexity: 6.41
Average loss at step 19500: 1.600720 learning rate: 0.010000
Minibatch perplexity: 4.91
Validation set perplexity: 6.41
Average loss at step 19600: 1.594728 learning rate: 0.010000
Minibatch perplexity: 4.48
Validation set perplexity: 6.41
Average loss at step 19700: 1.602067 learning rate: 0.010000
Minibatch perplexity: 5.90
Validation set perplexity: 6.41
Average loss at step 19800: 1.594454 learning rate: 0.010000
Minibatch perplexity: 5.27
Validation set perplexity: 6.41
...
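The learning rates printed in the log (10.0, then 1.0, 0.1 and 0.01) follow from the staircase exponential decay configured in the model: the rate is multiplied by 0.1 once every 5000 steps. A minimal sketch of that formula in plain Python, with the illustrative function name `staircase_decay`:

```python
def staircase_decay(initial_rate, step, decay_steps, decay_rate):
    """Learning rate that drops by a factor of decay_rate every decay_steps steps."""
    # Integer division makes the rate piecewise constant (a staircase)
    return initial_rate * decay_rate ** (step // decay_steps)


for step in (0, 4900, 5000, 14100, 19100):
    print('step %5d: learning rate %f' % (step, staircase_decay(10.0, step, 5000, 0.1)))
```

The log also shows the validation perplexity settling around 6.4 once the rate has dropped to 1.0 and below, which is one reason the schedule is listed among the things to tune.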

### Generate album names:

In [12]:
# Generate num_albums album names
num_albums = 100
for _ in range(num_albums):
    # First character is the starting token
    feed = np.zeros([1, VOCABULARY_SIZE])
    feed[0, START_TOKEN] = 1
    character = id2char(START_TOKEN)
    sentence = character
    reset_sample_state.run()
    # Repeat until hitting a STOP TOKEN
    while character != id2char(STOP_TOKEN):
        prediction = sample_prediction.eval({sample_input: feed})
        # Sample predicted character from the probability distribution
        feed = sample(prediction)
        # Sampled character is fed to the next time step
        character = characters(feed)[0]
        sentence += character
    # Remove the START and STOP tokens
    sentence = sentence[1:-1]
    # Upper case the first letter of the album, to make it look pretty!
    # Slicing avoids an IndexError if the model emits the stop token right away
    sentence = sentence[:1].upper() + sentence[1:]
    print(sentence)

York the all here americaning
Cari
Lootheries
Get ceres party ppick
Dim  and voltaie
Better sound of the basismials
And ther incing and warner
Celearh mwnagn the version i capter
Sive ey
Dendoged naution
Pries
Get bit
Gerord the say
Soinatra
Circio  mothers
Trages the smy
I fever piccled
Anifucavor
Nightsmys
I the dig
Sky bomend
Boy clan
Bober
Uprese
Sabrage factors kill the jazz
He song on penfcbody butting s m     de thin dimmass  night of the again
Alwoundler
Orix anderny papara
Pote   song gad at
Belle
Turcape
Sure rock
Best songs pecture  nock in melume
Adifies
Befinhes
Mettaripa
Best  psyshe
I vall messe
Construck s de who
The jond 
Jegressey
Tele of bytes
Cry kin that   original motion picture soundtrack 
I m watchz
Hards ov
Bri kpst beage
Antug   agoose
Sonioust it s greatest
For the pt         
Markin tudes
Bither ep
A get dive toninities my music
Oh t through 
Kelbriginal e p 
A s the stan ame you got
Eir store
Oh t zervels
Sicantr
Bibit
Care whene
Space doine  danceple
Ben t

Well, it could definitely be improved. We get a lot of nonsense, but also some interesting examples:  
* **Better sound of the basismials**
* **Sabrage factors kill the jazz**
* **Muskic worst off remixes** 
* **Cry kin that   original motion picture soundtrack**

It seems like we are indirectly generating movie names too! And I wonder where The Basismials are :)
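One practical note on the generation loop: it runs until the model samples a stop token, so a poorly trained model could in principle keep going for a very long time. The sketch below shows the same sample-and-feed-back loop with a length cap. To keep it self-contained, the trained LSTM is replaced by a hypothetical stand-in (`toy_model`, a fixed table that only looks at the previous character); `generate` and `max_len` are illustrative names as well.

```python
import random


def generate(next_char_probs, start='>', stop='<', max_len=80, rng=random):
    """Sample characters until the stop token appears or max_len is reached."""
    chars = []
    current = start
    while len(chars) < max_len:
        # next_char_probs returns a dict {character: probability}
        candidates, weights = zip(*next_char_probs(current).items())
        current = rng.choices(candidates, weights=weights)[0]
        if current == stop:
            break
        chars.append(current)
    title = ''.join(chars)
    # Slicing keeps this safe for an empty title
    return title[:1].upper() + title[1:]


# Hypothetical stand-in for the trained model: a deterministic bigram table
TOY_BIGRAMS = {
    '>': {'b': 1.0},
    'b': {'u': 1.0},
    'u': {'g': 1.0},
    'g': {'<': 1.0},
}


def toy_model(previous_char):
    return TOY_BIGRAMS[previous_char]


print(generate(toy_model))  # Bug
```

Unlike this stand-in, the LSTM carries a hidden state across steps, which is why the notebook resets that state before each new album name.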

### Potential improvements:

* Use only English album names.
* Deal better with unusual characters.
* Add more layers.
* Tune the learning rate.
* Try other optimizers: RMSProp, AdaGrad.
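As a starting point for handling unusual characters, accented letters could be folded to their ASCII base letter before tokenization instead of being mapped to a space. This is a sketch using the standard library's `unicodedata` module; the helper name `strip_accents` is an assumption, and letters without a Unicode decomposition (for example `ø` or `ß`) pass through unchanged and would still need separate handling.

```python
import unicodedata


def strip_accents(text):
    """Replace accented letters with their base letter where Unicode allows it."""
    # NFKD splits 'ö' into 'o' followed by a combining diaeresis
    decomposed = unicodedata.normalize('NFKD', text)
    # Drop the combining marks and keep everything else
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents('the wörld thät sümmer'))  # the world that summer
```

With this step applied in `read_text`, a title such as `the wörld thät sümmer` from the validation set would keep all of its letters.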