# Startup Name Generator

### You are launching a new start-up. Let's use an LSTM-based RNN to generate a cool start-up name.

Go to [Training](#training)

Go to [Generating Names](#generating)

Go to [Results](#result)

In [1]:
# Packages required.
import tensorflow as tf
from keras.models import load_model, Model
from keras.layers import Dense, Activation, Dropout, Input, LSTM, Reshape, Lambda, RepeatVector
from keras.initializers import glorot_uniform
from keras.utils import to_categorical
from keras.optimizers import Adam
from keras import backend as K
from keras.preprocessing.sequence import pad_sequences

import numpy as np
import pandas as pd
import random
import pprint
import requests
from bs4 import BeautifulSoup

Using TensorFlow backend.


### Let's get start-up names established in San Francisco.

I got a list of names from https://www.startups-list.com/. The site has lists of start-up names in other cities as well.

Let's fetch the page, parse the HTML document, and extract the names from it.

In [2]:
headers = {'user-agent': "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36"}
url = 'https://sanfrancisco.startups-list.com/'
page = requests.get(url, headers = headers)
page

<Response [200]>

In [3]:
soup = BeautifulSoup(page.text, 'lxml') # using the lxml parser; Python's built-in 'html.parser' also works

In [4]:
SFnames = []
for name in soup.findAll(property='name'):
    SFnames.append(name.text.lower().replace('\n',''))

In [5]:
len(SFnames)

2329

In [6]:
max_len = 0
for name in SFnames:
    max_len = max(max_len, len(name))

In [7]:
max_len # = Tx the size of the sequence input. I'll pad them if they are shorter than this length.

39

In [8]:
SFnames_str = ''
for name in soup.findAll(property='name'):
    SFnames_str = SFnames_str+name.text.replace('\n','')+'\n'

In [9]:
data= SFnames_str.lower()
chars = list(set(data))
data_size, vocab_size = len(data), len(chars)
print('There are %d total characters and %d unique characters in your data.' % (data_size, vocab_size))

There are 23255 total characters and 55 unique characters in your data.


In [10]:
chars = sorted(chars)
print(chars)

['\n', ' ', '!', '&', "'", '(', ')', '+', ',', '-', '.', '/', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ':', '@', '[', ']', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '\xa0', '✈', '️']


- There are 26 letters, 10 digits, and "\n" (the newline character), plus some special characters.
- There are a few uncommon characters such as '\xa0' or '✈', but they are rare enough that the network should tolerate the noise, so I leave them in.
- "\n" serves as the "end of name" token.

Let's make two lookup tables to convert a character to an index and vice versa.
- `char_to_ix`: a python dictionary to map each character to an index.
- `ix_to_char`: a second python dictionary that maps each index back to the corresponding character. 

In [11]:
char_to_ix = { ch:i for i,ch in enumerate(chars) }
ix_to_char = { i:ch for i,ch in enumerate(chars) }
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(ix_to_char)

{   0: '\n',
    1: ' ',
    2: '!',
    3: '&',
    4: "'",
    5: '(',
    6: ')',
    7: '+',
    8: ',',
    9: '-',
    10: '.',
    11: '/',
    12: '0',
    13: '1',
    14: '2',
    15: '3',
    16: '4',
    17: '5',
    18: '6',
    19: '7',
    20: '8',
    21: '9',
    22: ':',
    23: '@',
    24: '[',
    25: ']',
    26: 'a',
    27: 'b',
    28: 'c',
    29: 'd',
    30: 'e',
    31: 'f',
    32: 'g',
    33: 'h',
    34: 'i',
    35: 'j',
    36: 'k',
    37: 'l',
    38: 'm',
    39: 'n',
    40: 'o',
    41: 'p',
    42: 'q',
    43: 'r',
    44: 's',
    45: 't',
    46: 'u',
    47: 'v',
    48: 'w',
    49: 'x',
    50: 'y',
    51: 'z',
    52: '\xa0',
    53: '✈',
    54: '️'}
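
As a quick illustration of how the two dictionaries work together, here is a minimal, self-contained sketch. It uses a tiny hypothetical vocabulary rather than the 55 characters above, and the helper names `encode` and `decode` are my own, not part of this notebook.

```python
# Toy vocabulary (an assumption for illustration; the notebook uses 55 characters).
toy_chars = sorted(set("uber\nlyft\n"))
toy_char_to_ix = {ch: i for i, ch in enumerate(toy_chars)}
toy_ix_to_char = {i: ch for i, ch in enumerate(toy_chars)}

def encode(name, char_to_ix):
    """Map a name to a list of indices and append the end-of-name token."""
    return [char_to_ix[c] for c in name] + [char_to_ix['\n']]

def decode(indices, ix_to_char):
    """Map indices back to characters, stopping at the end-of-name token."""
    out = []
    for i in indices:
        if ix_to_char[i] == '\n':
            break
        out.append(ix_to_char[i])
    return ''.join(out)

encoded = encode("lyft", toy_char_to_ix)
print(encoded)                          # [4, 8, 3, 6, 0]
print(decode(encoded, toy_ix_to_char))  # lyft
```

Because the characters are sorted before indexing, "\n" always gets index 0 here, just as in the real vocabulary.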


In [12]:
n_values = len(char_to_ix)

I'm going to train a model that predicts the next character, so that the names it generates resemble the start-up names it was trained on.

## Building the model

- The model takes input X of shape $(m, T_x, n_{values})$ and labels Y of shape $(T_y, m, n_{values})$. 
- We will use an LSTM with hidden states that have $n_{a} = 64$ dimensions.

We are generating a sequence of characters so we generate them one at a time using $x^{\langle t\rangle} = y^{\langle t-1 \rangle}$.
- The input at time "t" is the prediction at the previous time step "t-1".
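
The feedback loop $x^{\langle t\rangle} = y^{\langle t-1 \rangle}$ can be sketched without any neural network at all. In the minimal example below, a hand-written lookup table (`toy_next_char_probs`, purely hypothetical and not learned from data) stands in for the trained model, and the `'<start>'` key is an assumed placeholder for the initial input.

```python
# A hypothetical stand-in for the trained network: for each previous character,
# a hand-written distribution over the next character (not learned from data).
toy_next_char_probs = {
    '<start>': {'a': 0.6, 'b': 0.4},
    'a': {'p': 0.7, '\n': 0.3},
    'b': {'a': 0.9, '\n': 0.1},
    'p': {'p': 0.2, '\n': 0.8},
}

def generate_greedy(next_char_probs, max_steps=10):
    """Generate one name by feeding each prediction back in as the next input."""
    x = '<start>'                       # the initial input
    name = []
    for _ in range(max_steps):
        probs = next_char_probs[x]
        y = max(probs, key=probs.get)   # most likely next character
        if y == '\n':                   # end-of-name token
            break
        name.append(y)
        x = y                           # the prediction becomes the next input
    return ''.join(name)

print(generate_greedy(toy_next_char_probs))  # ap
```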

Let's define the layer objects we need as global variables.
- The weights and biases live in these layer objects, so defining them once lets every time step share the same parameters.

In [13]:
# number of dimensions for the hidden state of each LSTM cell.
n_a = 64 
n_values = len(char_to_ix) # number of unique characters
reshapor = Reshape((1, n_values))  # to help reshape tensors  
LSTM_cell = LSTM(n_a, return_state = True)   # build a LSTM cell at time t
densor = Dense(n_values, activation='softmax')  # Dense layer after each LSTM cell

In [14]:
def name_model(Tx, n_a, n_values):
    """
    Implement the model
    
    Arguments:
    Tx -- length of the sequence in a corpus
    n_a -- the number of activations used in our model
    n_values -- number of unique values in the training data 
    
    Returns:
    model -- a keras instance model with n_a activations
    """
    
    X = Input(shape=(Tx, n_values)) # Input Layer

    a0 = Input(shape=(n_a,), name='a0') # Initial hidden state
    c0 = Input(shape=(n_a,), name='c0') # Initial cell state
    a = a0
    c = c0

    outputs = []
    
    for t in range(Tx):
        
        x = Lambda(lambda x:x[:,t,:], output_shape=(n_values,))(X)
        x = reshapor(x)
        a, _, c = LSTM_cell(inputs=x, initial_state=[a, c])
        out = densor(a)
        outputs.append(out)

    model = Model(inputs=[X, a0, c0], outputs=outputs)

    return model

In [28]:
model = name_model(Tx = max_len , n_a = n_a, n_values = n_values)

In [16]:
model.summary()

Model: "model_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            (None, 39, 55)       0                                            
__________________________________________________________________________________________________
lambda_1 (Lambda)               (None, 55)           0           input_1[0][0]                    
__________________________________________________________________________________________________
reshape_1 (Reshape)             (None, 1, 55)        0           lambda_1[0][0]                   
                                                                 lambda_2[0][0]                   
                                                                 lambda_3[0][0]                   
                                                                 lambda_4[0][0]             

Let's compile the model using the Adam optimizer and the categorical cross-entropy loss function.

In [29]:
opt = Adam(lr=0.01, beta_1=0.9, beta_2=0.999, decay=0.01)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

<a id='training' />

## Training

Let's initialize `a0` and `c0` for the LSTM's initial state to be zero. 

In [18]:
m = len(SFnames)
a0 = np.zeros((m, n_a))
c0 = np.zeros((m, n_a))

We'll make our input X and label Y such that

- `X` is a $(m, T_x, n_{values})$ dimensional array.
    - We have m training examples, each of which is a start-up name padded to $T_x$ characters. 
    - At each time step, the input is one of 55 possible characters, represented as a one-hot vector. 
        - For example, X[i,t,:] is a one-hot vector representing the character of the i-th example at time t. 
- `Y` is a $(T_y, m, n_{values})$ dimensional array (reordered to be more convenient to feed into the LSTM).
- `Y` is essentially `X` shifted by one time step and then reordered.
    - We're using the previous characters to predict the next one, so our sequence model will try to predict $y^{\langle t \rangle}$ given $x^{\langle 1\rangle}, \ldots, x^{\langle t \rangle}$. 

- We will turn `Y` into a list, since a Keras model with multiple outputs expects one target array per output. Each of the list items is of shape $(m, n_{values})$. 
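
The one-step shift between `X` and `Y` is easiest to see on plain index lists, before any one-hot encoding. The sketch below is a simplified illustration: `make_xy_indices` and the 3-character vocabulary are hypothetical, and the right-padding is done by hand (it cuts overlong sequences at the end, whereas `pad_sequences` by default drops elements from the front).

```python
PAD = -1  # same padding value the notebook passes to pad_sequences

def make_xy_indices(name, char_to_ix, max_len):
    """Return (x, y) index lists of length max_len for one name."""
    ids = [char_to_ix[c] for c in name]
    x = [PAD] + ids                      # input starts with a start marker
    y = ids + [char_to_ix['\n']]         # label ends with the end-of-name token
    # pad on the right, then cut to max_len (a simplification of pad_sequences)
    x = (x + [PAD] * max_len)[:max_len]
    y = (y + [PAD] * max_len)[:max_len]
    return x, y

toy_char_to_ix = {'\n': 0, 'a': 1, 'b': 2}   # hypothetical 3-character vocabulary
x, y = make_xy_indices("ab", toy_char_to_ix, max_len=5)
print(x)  # [-1, 1, 2, -1, -1]
print(y)  # [1, 2, 0, -1, -1]
```

At every position `t`, `y[t]` is the character that follows `x[t]` in the name, which is exactly what the model is asked to predict.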

In [19]:
X = []
for name in SFnames:
    # -1 marks the start position; the characters of the name follow as indices
    X.append([-1]+[char_to_ix[c] for c in name])

# pad after the sequence if they are shorter than max_len
X = pad_sequences(X, maxlen=max_len, padding='post',  value=-1)

In [20]:
X

array([[-1, 34, 39, ..., -1, -1, -1],
       [-1, 44, 26, ..., -1, -1, -1],
       [-1, 44, 42, ..., -1, -1, -1],
       ...,
       [-1, 36, 30, ..., -1, -1, -1],
       [-1, 41, 43, ..., -1, -1, -1],
       [-1, 27, 46, ..., -1, -1, -1]], dtype=int32)

In [21]:
X = to_categorical(X)

In [22]:
X.shape # check the shape

(2329, 39, 55)
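
One detail worth double-checking is how the `-1` start and padding markers end up one-hot encoded. As far as I know, Keras 2.x's `to_categorical` fills the matrix with NumPy integer indexing, where `-1` refers to the last column; the shape `(2329, 39, 55)` above (55 classes rather than 56) is consistent with that. The pure-Python sketch below mimics that behaviour and contrasts it with a padding-aware variant. Both helper functions are my own illustrations, not Keras APIs.

```python
def one_hot_wrapping(indices, num_classes):
    """Mimics plain index assignment: a negative index wraps around to the end."""
    rows = []
    for i in indices:
        row = [0] * num_classes
        row[i] = 1            # row[-1] sets the LAST class
        rows.append(row)
    return rows

def one_hot_masked(indices, num_classes, pad=-1):
    """Hypothetical alternative: padding positions become all-zero rows."""
    rows = []
    for i in indices:
        row = [0] * num_classes
        if i != pad:
            row[i] = 1
        rows.append(row)
    return rows

seq = [-1, 0, 2]
print(one_hot_wrapping(seq, 3))  # [[0, 0, 1], [1, 0, 0], [0, 0, 1]]
print(one_hot_masked(seq, 3))    # [[0, 0, 0], [1, 0, 0], [0, 0, 1]]
```

If the wrapping behaviour applies here, the `-1` markers share a one-hot code with the last character in the vocabulary (index 54), which may be worth keeping in mind when choosing the initial input at generation time.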

In [23]:
Y = []
for name in SFnames:
    Y.append([char_to_ix[c] for c in name]+[char_to_ix['\n']])

# pad Y in the same way
Y = pad_sequences(Y, maxlen=max_len, padding='post',  value=-1)
Y = to_categorical(Y)
Y = np.transpose(Y,[1,0,2])

In [25]:
Y.shape # check the shape

(39, 2329, 55)

In [26]:
max_len

39

Let's fit the model. I'll train for 200 epochs.

In [30]:
model.fit([X, a0, c0], list(Y), epochs=200, verbose = 1)

Epoch 1/200
...
Epoch 200/200


<keras.callbacks.callbacks.History at 0x14534f790>

<a id='generating' />

## Generating names

Let's build an inference model to sample names from the trained model.

In [46]:
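def one_hot(x, temperature = 0.01):
    # Sample one index per row and convert it to a one-hot vector.
    # Note: tf.random.categorical expects unnormalized log-probabilities (logits),
    # while x here is the softmax output of densor, so the probabilities
    # themselves are being used as logits.
    x = tf.random.categorical(x/temperature, 1)
    # temperature adjusts how diverse the output can be:
    # the higher the temperature, the more surprising the result.
    x = tf.one_hot(indices=x, depth=n_values) 
    return x

# Use the version below if you want argmax instead of sampling.
# It will generate the same result every time for the same initial states.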

# def one_hot(x):
#     x = K.argmax(x)
#     x = tf.one_hot(indices=x, depth=n_values) 
#     x = RepeatVector(1)(x)
#     return x

def inference_model(LSTM_cell, densor, n_values = n_values, n_a = n_a, Ty = max_len):
    """
    Uses the trained "LSTM_cell" and "densor" from name_model() to generate a sequence of values.
    
    Arguments:
    LSTM_cell -- the trained "LSTM_cell" from name_model(), Keras layer object
    densor -- the trained "densor" from name_model(), Keras layer object
    n_values -- integer, number of unique values
    n_a -- number of units in the LSTM_cell
    Ty -- integer, number of time steps to generate
    
    Returns:
    model -- Keras model instance
    """
    x0 = Input(shape=(1, n_values))
    
    a0 = Input(shape=(n_a,), name='a0')
    c0 = Input(shape=(n_a,), name='c0')
    a = a0
    c = c0
    x = x0

    outputs = []

    for t in range(Ty):
        a, _, c = LSTM_cell(x, initial_state=[a, c])
        out = densor(a)
        outputs.append(out)
        x = Lambda(one_hot)(out)
        
    model = Model(inputs=[x0, a0, c0], outputs=outputs)
       
    return model

In [47]:
output_model = inference_model(LSTM_cell, densor, n_values = n_values, n_a = n_a, Ty = max_len)

In [33]:
# Check the inference model
output_model.summary()

Model: "model_3"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_3 (InputLayer)            (None, 1, 55)        0                                            
__________________________________________________________________________________________________
a0 (InputLayer)                 (None, 64)           0                                            
__________________________________________________________________________________________________
c0 (InputLayer)                 (None, 64)           0                                            
__________________________________________________________________________________________________
lstm_1 (LSTM)                   [(None, 64), (None,  30720       input_3[0][0]                    
                                                                 a0[0][0]                   

                                                                 lambda_105[0][0]                 
                                                                 lstm_1[104][0]                   
                                                                 lstm_1[104][2]                   
                                                                 lambda_106[0][0]                 
                                                                 lstm_1[105][0]                   
                                                                 lstm_1[105][2]                   
                                                                 lambda_107[0][0]                 
                                                                 lstm_1[106][0]                   
                                                                 lstm_1[106][2]                   
                                                                 lambda_108[0][0]                 
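
The 30720 parameters reported for `lstm_1` can be checked by hand. A standard LSTM layer has four gates, each with an input kernel, a recurrent kernel, and a bias, which gives $4\,(n_a (n_{values} + n_a) + n_a)$ parameters. The helper name below is my own.

```python
def lstm_param_count(n_inputs, n_units):
    """Parameters of a standard LSTM layer: 4 gates x (kernel + recurrent kernel + bias)."""
    per_gate = n_units * n_inputs + n_units * n_units + n_units
    return 4 * per_gate

n_a = 64        # hidden state size used in this notebook
n_values = 55   # number of unique characters
print(lstm_param_count(n_values, n_a))  # 30720
```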
          

- Initialize states

In [43]:
x_initializer = np.zeros((1, 1, n_values))
a_initializer = np.zeros((1, n_a))
c_initializer = np.zeros((1, n_a))

In [44]:
def predict_and_sample(output_model, x_initializer = x_initializer, a_initializer = a_initializer, 
                       c_initializer = c_initializer):
    """
    Runs the inference model and takes the most likely value at each time step.
    
    Arguments:
    output_model -- Keras model instance for inference time
    x_initializer -- numpy array of shape (1, 1, n_values), one-hot vector initializing the values generation
    a_initializer -- numpy array of shape (1, n_a), initializing the hidden state of the LSTM_cell
    c_initializer -- numpy array of shape (1, n_a), initializing the cell state of the LSTM_cell
    
    Returns:
    results -- numpy-array of shape (Ty, n_values), matrix of one-hot vectors representing the values generated
    indices -- numpy-array of shape (Ty, 1), matrix of indices representing the values generated
    """
    pred = output_model.predict([x_initializer, a_initializer, c_initializer])
    indices = np.argmax(pred, axis =-1)
    results = to_categorical(indices)
    
    return results, indices 

<a id='result' />

## Results

Let's get the results.

### - Argmax

In [45]:
# for this cell I used the one_hot function with K.argmax
results, indices = predict_and_sample(output_model, x_initializer, a_initializer, c_initializer)
for i in indices.flatten():
    if ix_to_char[i] == '\n':
        break
    print(ix_to_char[i],end="")

apprest

<font color = 'darkblue'>'Apprest' is the model's greedy (argmax) pick, i.e. the name it finds most likely one character at a time. I think it is a good name. It could be the name of a new IT company, a new game, or a biotech company.</font>

### - Random sampling with zero initialization

Using the one_hot function with random sampling, let's generate some awesome sample names.

In [48]:
iteration = 20
for _ in range(iteration):
    x_initializer = np.zeros((1, 1, n_values))
    a_initializer = np.zeros((1, n_a))
    c_initializer = np.zeros((1, n_a))
    results, indices = predict_and_sample(output_model, x_initializer, a_initializer, c_initializer)
    for i in indices.flatten():
        if ix_to_char[i] == '\n':
            print()
            break
        print(ix_to_char[i],end="")
#     print("np.argmax(results[12]) =", np.argmax(results[12]))
#     print("np.argmax(results[17]) =", np.argmax(results[17]))
#     print("list(indices[12:18]) =", list(indices[12:18]))

anppreese
apprest
apprrest
apprest
aondersound
aondersound
aickit conder
arese
aeedio
apprrest
aeartice spores
apprest
aeadio
araph
apprest
apprest
apprest
apprest
apprrest
arpserde


Seven of the 20 samples are identical to the argmax result ('apprest'). You can adjust the temperature in the `one_hot` function to control diversity.
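
To make the effect of temperature concrete, here is a small sketch in plain Python. It applies the temperature to log-probabilities, which is the usual formulation and differs slightly from the `one_hot` function in this notebook; the function name and the toy distribution are assumptions for illustration.

```python
import math
import random

def apply_temperature(probs, temperature):
    """Rescale a probability distribution: p_i^(1/T) / sum_j p_j^(1/T)."""
    logits = [math.log(p) / temperature for p in probs]
    m = max(logits)                           # subtract max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

toy_probs = [0.5, 0.3, 0.2]                   # hypothetical next-character distribution

cold = apply_temperature(toy_probs, 0.1)      # sharper: close to argmax
same = apply_temperature(toy_probs, 1.0)      # unchanged
hot = apply_temperature(toy_probs, 10.0)      # flatter: close to uniform
print([round(p, 3) for p in cold])
print([round(p, 3) for p in same])
print([round(p, 3) for p in hot])

# Draw one index from the rescaled distribution.
random.seed(0)
print(random.choices(range(len(toy_probs)), weights=hot, k=1)[0])
```

A low temperature concentrates almost all probability on the most likely character, which is why very small values tend to reproduce the argmax name again and again.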

<font color = 'darkblue'>'Aondersound' sounds like a new headphone or some new sound technology. 'Aeedio' sounds creative. 'Araph' and 'Arese' are also good candidates for a new start-up name.</font>

By the way they all start with 'a'.

In [52]:
ct = 0
for name in SFnames:
    if name[0] == 'a':
        ct+=1
print(ct)
print(1/26)
print(ct/len(SFnames))

137
0.038461538461538464
0.058823529411764705


Names starting with 'a' make up about 5.9% of the training set, somewhat more than the 1/26 ≈ 3.8% you would expect if first letters were uniformly distributed.

In [53]:
from collections import defaultdict
ct = defaultdict(int)
for name in SFnames:
    ct[name[0]]+=1

In [60]:
df = pd.DataFrame.from_dict(ct, orient='index', columns = ['count'])

In [64]:
(df/len(SFnames)).sort_values(by='count', ascending = False)

|   | count |
|---|---|
| s | 0.122799 |
| c | 0.083298 |
| t | 0.066982 |
| p | 0.065693 |
| m | 0.061829 |
| b | 0.060541 |
| a | 0.058824 |
| l | 0.045084 |
| r | 0.039502 |
| g | 0.039502 |


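- It is surprising that, with zero initialization, the model's pick for the first letter is 'a' while 6 other letters are more common first letters in the training set. Note that `predict_and_sample` takes the argmax of each step's softmax output, so with fixed initial states the first character is deterministic; sampling only changes what is fed back into later steps.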

### - Random sampling with non-zero initialization

Let's give some more variation using random initial states.

In [40]:
iteration = 20
for _ in range(iteration):
    x_initializer = np.zeros((1, 1, n_values))
    a_initializer = np.random.randn(1, n_a)*0.5
    c_initializer = np.random.randn(1, n_a)*0.5
#     a_initializer = np.zeros((1, n_a))
#     c_initializer = np.zeros((1, n_a))
    results, indices = predict_and_sample(output_model, x_initializer, a_initializer, c_initializer)
    for i in indices.flatten():
        if ix_to_char[i] == '\n':
            print()
            break
        print(ix_to_char[i],end="")
#     print("np.argmax(results[12]) =", np.argmax(results[12]))
#     print("np.argmax(results[17]) =", np.argmax(results[17]))
#     print("list(indices[12:18]) =", list(indices[12:18]))

ripprene
sporemant
eneestree
topers ate
hippriseal
@7sheappphayerable
adtate
appaaseare
heapprist
️igrom
bollabs
loond 
tretachen mectieng
rivens
trealist
crander
pppaster
harophate
hindersound
ted


<font color = 'darkblue'>We see 'Ted', which is an existing name even though it is not in the training set. 'Hindersound' could definitely be a product name or even a company name, 'Igrom' could fit any field, and 'Rivens' could be the name of a game. Most of these names sound great to me! </font>