# Twitgen

A basic character-level text generator for tweets, based on the TensorFlow text generation tutorial: https://www.tensorflow.org/tutorials/text/text_generation

This is a work in progress. This notebook contains initial efforts to generate tweets in the style of four different users.

The tweets were scraped into JSON files with https://github.com/taspinar/twitterscraper.

In [0]:
# For Google Colab
try:
  %tensorflow_version 2.x
except Exception:
  pass
import tensorflow as tf

import numpy as np
import os
import re
import json

## Preparing the training data

In [0]:
# Explicit UTF-8 so non-ASCII text decodes the same way on every platform
with open('trump_tweets_post2017.json', encoding='utf-8') as data:
    raw_tweets = json.load(data)

In [0]:
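# Remove http(s) links and pic.twitter.com links from each tweet.
# Note: [$-_@.&+] contains the range '$'..'_', which also admits '/', ':', '?', '=' and digits.
URL_CHARS = r"(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+"
regex = re.compile(r"(http[s]?://" + URL_CHARS + r")|(pic\.twitter\.com/" + URL_CHARS + r")")

def extract_tweet(tweet):
    """Return the tweet text with links removed and surrounding whitespace trimmed."""
    return regex.sub('', tweet['text']).strip()

# Drop tweets that consisted only of links
tweet_text = [t for t in map(extract_tweet, raw_tweets) if t]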
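
A quick check of the cleaning step on a few made-up records. The records are hypothetical; only the `'text'` key is taken from the cell above, and the pattern is the same one used there.

```python
import re

# Same link pattern as the cleaning cell: http(s) URLs and pic.twitter.com links
URL_CHARS = r"(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+"
regex = re.compile(r"(http[s]?://" + URL_CHARS + r")|(pic\.twitter\.com/" + URL_CHARS + r")")

def extract_tweet(tweet):
    return regex.sub('', tweet['text']).strip()

# Hypothetical records standing in for the scraped JSON
raw_tweets = [
    {'text': 'Great rally tonight! https://t.co/abc123'},
    {'text': 'pic.twitter.com/XyZ789'},
    {'text': 'Thank you!'},
]

# Link-only tweets become empty strings and are filtered out
tweet_text = [t for t in map(extract_tweet, raw_tweets) if t]
print(tweet_text)
```

The second record disappears entirely, which is why the empty-string filter is needed.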

In [0]:
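# Vocabulary: index 0 is reserved for the END_TWEET marker, followed by every
# character that appears in the cleaned tweets, in sorted order
vocab = ['END_TWEET'] + sorted(set(''.join(tweet_text)))
char2idx = {u: i for i, u in enumerate(vocab)}
idx2char = np.array(vocab)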
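
The vocabulary construction can be seen on a toy corpus (the two strings below are made up). Reserving index 0 for `END_TWEET` means the end-of-tweet marker always has the same index, whatever characters a user happens to tweet.

```python
import numpy as np

tweet_text = ['hi you', 'yo']  # toy corpus standing in for the cleaned tweets

# Index 0 is the END_TWEET marker; the distinct characters follow in sorted order
vocab = ['END_TWEET'] + sorted(set(''.join(tweet_text)))
char2idx = {u: i for i, u in enumerate(vocab)}
idx2char = np.array(vocab)

print(vocab)
print(char2idx['h'], idx2char[char2idx['h']])
```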

In [0]:
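# Encode every tweet as character indices and join them into one stream,
# terminating each tweet with index 0 (END_TWEET)
text_as_int = []
for tweet in tweet_text:
    text_as_int.extend([char2idx[c] for c in tweet])
    text_as_int.append(0)
text_as_int = np.array(text_as_int)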
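
Because every tweet ends in a 0, the flat stream can be split back into tweets. The `decode_stream` helper below is hypothetical (it is not used elsewhere in the notebook) and is only a sanity check that the encoding round-trips on a toy corpus.

```python
import numpy as np

tweet_text = ['hi', 'yo']  # toy corpus
vocab = ['END_TWEET'] + sorted(set(''.join(tweet_text)))
char2idx = {u: i for i, u in enumerate(vocab)}
idx2char = np.array(vocab)

# Same encoding as the cell above: characters, then a 0 after each tweet
text_as_int = []
for tweet in tweet_text:
    text_as_int.extend(char2idx[c] for c in tweet)
    text_as_int.append(0)
text_as_int = np.array(text_as_int)

def decode_stream(ids, idx2char):
    """Hypothetical helper: split an index stream on END_TWEET (0) and map back to text."""
    tweets, current = [], []
    for i in ids:
        if i == 0:
            tweets.append(''.join(current))
            current = []
        else:
            current.append(str(idx2char[i]))
    return tweets

print(text_as_int)
print(decode_stream(text_as_int, idx2char))
```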

In [0]:
# Create training examples / targets
seq_length = 32
examples_per_epoch = len(text_as_int)//(seq_length+1)

char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)
sequences = char_dataset.batch(seq_length+1, drop_remainder=True)
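
In plain NumPy, `batch(seq_length + 1, drop_remainder=True)` amounts to cutting the stream into consecutive, non-overlapping chunks and discarding the incomplete tail. This sketch uses a made-up stream of 11 indices and a smaller `seq_length` so the result is easy to read. Chunks ignore tweet boundaries, so an END_TWEET index can land anywhere inside a window; that is how the model sees examples of a tweet ending and the next one starting.

```python
import numpy as np

seq_length = 4
text_as_int = np.arange(11)  # stand-in for the encoded corpus

# One chunk of seq_length + 1 ids per training example; leftover ids are dropped
examples_per_epoch = len(text_as_int) // (seq_length + 1)
kept = text_as_int[:examples_per_epoch * (seq_length + 1)]
sequences = kept.reshape(examples_per_epoch, seq_length + 1)

print(examples_per_epoch)
print(sequences)
```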

In [0]:
def split_input_target(chunk):
    input_text = chunk[:-1]
    target_text = chunk[1:]
    return input_text, target_text

dataset = sequences.map(split_input_target)
# Batch size
BATCH_SIZE = 16

# Buffer size to shuffle the dataset
# (TF data is designed to work with possibly infinite sequences,
# so it doesn't attempt to shuffle the entire sequence in memory. Instead,
# it maintains a buffer in which it shuffles elements).
BUFFER_SIZE = 10000

dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)
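
Each chunk of `seq_length + 1` ids yields an input and a target shifted by one character, and since both `batch()` calls drop the remainder, the number of training steps per epoch follows from integer division. The corpus size of 100,000 ids below is a made-up figure for illustration; shuffling does not change the count.

```python
def split_input_target(chunk):
    # Input: all but the last id; target: all but the first (the "next character")
    return chunk[:-1], chunk[1:]

inp, tgt = split_input_target([5, 9, 2, 0, 7])
print(inp, tgt)

def steps_per_epoch(num_ids, seq_length=32, batch_size=16):
    # Full chunks of seq_length + 1, then full batches of batch_size chunks
    return (num_ids // (seq_length + 1)) // batch_size

print(steps_per_epoch(100_000))
```

With the notebook's settings, each step processes 16 sequences of 32 characters.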

## Building the model

In [0]:
# Length of the vocabulary in chars
vocab_size = len(vocab)

# The embedding dimension
embedding_dim = 256

# Number of RNN units
rnn_units = 1024

In [0]:
def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
  model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size,
                              embedding_dim,
                              batch_input_shape=[batch_size, None]),
    tf.keras.layers.GRU(rnn_units,
                        return_sequences=True,
                        stateful=True,
                        recurrent_initializer='glorot_uniform',
                        dropout=0.2),
    tf.keras.layers.Dense(vocab_size)
  ])
  return model

training_model = build_model(
  vocab_size=vocab_size,
  embedding_dim=embedding_dim,
  rnn_units=rnn_units,
  batch_size=BATCH_SIZE)
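
A rough size estimate for the model above, assuming the TensorFlow 2.x default `reset_after=True` for the GRU (which gives it two bias vectors). The example vocabulary size of 65 is the one from the Shakespeare corpus in the TensorFlow tutorial, not this notebook's, which depends on the characters in each user's tweets. None of the counts depend on the batch size.

```python
def param_counts(vocab_size, embedding_dim=256, rnn_units=1024):
    """Trainable parameters per layer of build_model (assumes GRU reset_after=True)."""
    embedding = vocab_size * embedding_dim
    # kernel (input, 3*units) + recurrent kernel (units, 3*units) + bias (2, 3*units)
    gru = 3 * rnn_units * (embedding_dim + rnn_units) + 2 * 3 * rnn_units
    # weights (units, vocab) + bias (vocab)
    dense = rnn_units * vocab_size + vocab_size
    return {'embedding': embedding, 'gru': gru, 'dense': dense,
            'total': embedding + gru + dense}

print(param_counts(65))
```

The GRU dominates at roughly 3.9 million parameters, so the total grows only slowly with vocabulary size.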

In [0]:
def loss(labels, logits):
  return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)

training_model.compile(optimizer='adam', loss=loss)
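
With `from_logits=True` the loss applies the softmax itself, so the Dense layer can output raw scores. Per character position, the computation reduces to the negative log-probability of the true next character. This is a NumPy sketch of that formula, not TensorFlow's implementation; it also shows a handy baseline: an untrained model with uniform logits scores about `log(vocab_size)`.

```python
import numpy as np

def sparse_ce_from_logits(label, logits):
    """-log softmax(logits)[label], computed stably via log-sum-exp."""
    logits = np.asarray(logits, dtype=np.float64)
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]

# Uniform logits over V classes give log(V)
V = 100
print(sparse_ce_from_logits(3, np.zeros(V)), np.log(V))
```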

In [0]:
# Directory where the checkpoints will be saved
checkpoint_dir = './training_checkpoints'
# Name of the checkpoint files
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")

checkpoint_callback=tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True)

In [12]:
EPOCHS=40
history = training_model.fit(dataset, epochs=EPOCHS, callbacks=[checkpoint_callback])

In [0]:
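# The training model is stateful with a fixed batch size of BATCH_SIZE, so rebuild
# it with batch_size=1 for generation and copy the trained weights across
weights = training_model.get_weights()
predict_model = build_model(vocab_size, embedding_dim, rnn_units, batch_size=1)
predict_model.set_weights(weights)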

## Generating Text

In [0]:
def generate_text(model, idx2char, temperature=1):
  # Upper bound on characters to generate (the tweet length limit)
  num_generate = 280

  # Seed generation with the END_TWEET index (0), i.e. "start a new tweet"
  input_eval = tf.expand_dims([0], 0)

  # Characters generated so far
  text_generated = []

  model.reset_states()
  # Number of characters since the last space
  n = 0
  for _ in range(num_generate):
    predictions = model(input_eval)
    # Remove the batch dimension
    predictions = tf.squeeze(predictions, 0)

    # Sample the next character from a categorical distribution over the logits.
    # The effective temperature (0.5**n * temperature) halves with each character
    # since the last space, so sampling grows greedier toward the end of a word.
    predictions = predictions / (0.5**n * temperature)
    predicted_id = tf.random.categorical(predictions, num_samples=1)[0, 0].numpy()

    # END_TWEET sampled: the tweet is finished
    if predicted_id == 0:
      return ''.join(text_generated)

    # Feed the sampled character back in as the next input
    input_eval = tf.expand_dims([predicted_id], 0)

    text_generated.append(idx2char[predicted_id])
    n = 0 if idx2char[predicted_id] == ' ' else n + 1

  return ''.join(text_generated)
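
The effect of the `0.5**n` schedule in `generate_text` can be seen on a single made-up set of logits: the first character after a space is sampled at the requested temperature, and each further character in the word at half the previous temperature, so the distribution sharpens quickly. A plausible motivation is keeping spelling stable within a word while still letting word choice vary, though the notebook does not state one.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.5, 0.0])  # made-up logits for four characters
temperature = 1.2

# Effective temperature for the n-th character since the last space
for n in range(4):
    effective_t = 0.5 ** n * temperature
    p = softmax(logits / effective_t)
    print(n, round(effective_t, 3), p.round(3))
```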

#### @realdonaldtrump

In [15]:
for i in range(10):
  print(generate_text(predict_model, idx2char, temperature=1.2), '\n')

White House, with the United States will not allow other won the amazing In many ways this is the successful model we will make a BIG difference! 

Thank you to everyone at @reat to protect y from Tuesday. A false narrative that she was great day in Puerto Rico failed. I won't fail. 

White House, Virginia against the Impeachment Proceeding, who should never ending Witch Hunt, look locked down, no ran RobiDed What’s going on? 

Will be heading he The lleting so many years and all sorts to mention that the candidate Impeachment Proceeding, and have your back. We will ALWAYS be wasted! 

Italy, @GiuseppeConteIT, a really great guy who can fight for Healthcare but not for Friday! 

Thank you to PERDON mys for the obviously needed Wall (they overrode recommendations of Border Patrol experts), but they don’t even want to take muderers into custody! What’s going on? 

My Administration is nowhich abuts and is part of the United States Coasting, when it recommendations of law. Great reviews f

In [0]:
predict_model.save('trump.h5')

In [0]:
np.save('trump_idx2char.npy', idx2char)
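
Each model has to travel with its own `idx2char`, because the vocabulary is built from that user's tweets and the indices differ between users. A NumPy array of strings is stored with a fixed-width Unicode dtype rather than as Python objects, so `np.load` reads it back with its default `allow_pickle=False`. A round trip on a toy vocabulary, using a temporary directory and a made-up filename:

```python
import os
import tempfile
import numpy as np

idx2char = np.array(['END_TWEET', ' ', '!', 'a', '♡'])  # toy vocabulary

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, 'example_idx2char.npy')
    np.save(path, idx2char)
    loaded = np.load(path)  # no allow_pickle needed for a str array

print(loaded.dtype, loaded.tolist())
```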

## Loading models trained on other users' tweets

#### @arianagrande

In [0]:
ariana_model = tf.keras.models.load_model('arianagrande.h5', compile=False)
idx2char = np.load('arianagrande_idx2char.npy')

In [19]:
for i in range(10):
  print(generate_text(ariana_model, idx2char, temperature=1.4), '\n')

Don't wait you’ll ever hear a live album one  for everything. 

five doys til dwt? 

ᶦman Tour 

♡ I LOVE U SO MUCH … 

five days til sweetener preorder and tlic ♡ 

yooooo 

she LOVES it. … 

#mandest, funniest, brightest light. i can’t wait to spend more time togetha. … 

25%  til dwt? more. i love you. 

) ♡  … 



#### @BBCWorld

In [0]:
bbc_model = tf.keras.models.load_model('bbc.h5', compile=False)
idx2char = np.load('bbc_idx2char.npy')

In [21]:
for i in range(10):
  print(generate_text(bbc_model, idx2char, temperature=1.4), '\n')

At least 11 killed as fast-spreames Puiwdemont   #10Oct 

Trump aide Kellyanne Conway say "I where people attempted a cuments from buses

We are so grateful country!"

Thousands of families, tourists and merchants are queuing every day to return for Robbie Butina deported from Abries $2018: Is to 'The day I was diagnosed was the sea — w 

Steenhuisen to head South Africa's cash-in-trafficking trial starts in French kitchen deal diegnts attempt an ,uetallegat rules to 'restore tranquillity' 

Trump protests: LGBTQ rally in New York 

Catalonia: Spain head hours @

[tap to expand] 

Katie Hill: House ban not backing down after decree repeal 

Zara advert gets China asking: Are freckles beautiful? 

Zara out a living in Haiti's what our son Noah chat park day to return to Syria

[tap to most prolific figures, Karina election: Polls open as voters choose between Cuntay marked the 11th day of protests in the country, where people attempted a human camera 

Trump protests: LGBTQ rally in New

#### @Wendys

In [0]:
wendys_model = tf.keras.models.load_model('wendys.h5', compile=False)
idx2char = np.load('wendys_idx2char.npy')

In [23]:
for i in range(10):
  print(generate_text(wendys_model, idx2char, temperature=1.4), '\n')

That's not okay. Thanks! 

Qunchise ry your number, and we'll make it up to you. 

Jason is the best! M us with info on this location, along with your number, and we'll make it up to you. 

Just as up to you. 

IThat's not okay! Please DM us the restaurant location and your phone # so we can make this right. 

That's not okay. Please DM us with info on this location, along with your phone number, and we'll make it up to you. 

We're disappointed your phone # so we can make this up to you. 

Our Breakfast is delicious and names it after er. 

Wwwww like to hear that! Please DM us your email address so we for maing address so we can improve. Thank you! 

It maybe not eating. 

