# Final Assignment for the Deep Learning Lab Course <br>Text Generation with RNNs: Generating Text

* **Name:** Fabian Schotte
* **Email:** fabian.schotte@rwu.de
* **Student ID:** 35604
* **Degree Program:** Applied Computer Science

## 1. GRU Model

In [1]:
import os
os.environ['TF_GPU_ALLOCATOR'] = 'cuda_malloc_async'

import pandas as pd
import numpy as np
import gc
import tensorflow as tf
from tensorflow import keras
from keras import backend as K

# 1. Load your trained model
model = keras.models.load_model('work/models/gru_model_1.keras')

# 2. Rebuild your char‐level vocabulary/lookups from the training CSV
df = pd.read_csv('work/kaggle_sentiment/tweet_sentiment_train.csv',
                 encoding='utf-8', encoding_errors='replace')
# concatenate all tweets into one long string
text = df['text'].str.cat(sep='\n')

# get sorted list of all unique characters in the corpus
vocab = sorted(set(text))
vocab_size = len(vocab) + 1   # +1 for the OOV token that StringLookup will insert

# build the lookup layers exactly as in the training notebook
ids_from_chars = keras.layers.StringLookup(
    vocabulary=vocab, mask_token=None  # no PAD token
)
chars_from_ids = keras.layers.StringLookup(
    vocabulary=ids_from_chars.get_vocabulary(), 
    invert=True, 
    mask_token=None
)

# helper to turn a string into a (1, len(s)) batch of IDs
def text_to_ids(s: str):
    # unicode_split on a 1-element list -> RaggedTensor of shape (1, None)
    chars = tf.strings.unicode_split([s], 'UTF-8')
    return ids_from_chars(chars)

# helper to turn a batch of ID sequences back into Python strings
def ids_to_text(ids):
    joined = tf.strings.reduce_join(chars_from_ids(ids), axis=-1).numpy()
    # decode the UTF-8 bytes explicitly so non-ASCII characters survive
    return [b.decode('utf-8') for b in joined]

# 3. Seed + pad to the fixed window length the model expects
seed = "Hey, "
seq_length = 100   # must match the sequence length used in training

# convert seed to IDs, then trim and left-pad to seq_length
seed_ids = text_to_ids(seed).numpy()[0]   # shape (len(seed),)
seed_ids = seed_ids[-seq_length:]         # keep last seq_length chars
seed_ids = np.expand_dims(seed_ids, 0)    # make batch of 1
# pad value 0 doubles as StringLookup's [UNK] id (mask_token=None)
seed_ids = keras.preprocessing.sequence.pad_sequences(
    seed_ids, maxlen=seq_length, padding='pre'
)

# 4. Generate one char at a time
generated_ids = []
num_chars = 512
temperature = 1.0

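for _ in range(num_chars):
    # predict next-char distribution; calling the model directly is faster
    # than model.predict() for a single small batch inside a loop
    preds = model(seed_ids, training=False).numpy()[0, -1, :]  # (vocab_size,)
    # apply temperature in float64 log space (assumes softmax outputs)
    logits = np.log(preds.astype(np.float64) + 1e-8) / temperature
    logits -= logits.max()                 # numerical stability
    preds = np.exp(logits) / np.sum(np.exp(logits))
    # sample
    next_id = np.random.choice(len(preds), p=preds)
    generated_ids.append(next_id)
    # shift window left by one and append the new ID
    seed_ids = np.roll(seed_ids, -1, axis=1)
    seed_ids[0, -1] = next_id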

# 5. Decode back to a string
gen_text = ids_to_text(np.array([generated_ids]))
print(seed)

for line in gen_text:   # avoid shadowing the corpus variable `text`
    print(line)

del model
K.clear_session()
gc.collect()



Hey, 
Just...  Goodness! ( I see no cheeses
Gothangs broke  I have to work at 4am now  #canucks!  I guess perfect sanacious and no where don`t you say hi!!! It would! Too much lol
Think I have sunshine for _yang 'happy T-Laughs economic wtl! I`m CRick in college in the teenale! I wish i was even more watching the outsiders  smelly it says it new someone   i need for absolutly not manicure
WELc like christmas tear, I`m so freaking both of your tweets  I was a wonderful move to my bfast with me. I better get to see
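
The temperature step in the cell above rescales the model's output in log space before sampling: dividing the log-probabilities by a temperature below 1.0 sharpens the distribution toward the most likely characters, while a value above 1.0 flattens it. The following is a minimal, framework-free sketch of that step, assuming (as the `np.log` call in the cell implies) that the model returns softmax probabilities; the helper names `apply_temperature` and `sample_with_temperature` are hypothetical.

```python
import numpy as np


def apply_temperature(probs, temperature):
    """Hypothetical helper: rescale a softmax output by a sampling temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # float64 log space; the 1e-8 offset avoids log(0), as in the notebook
    logits = np.log(np.asarray(probs, dtype=np.float64) + 1e-8) / temperature
    logits -= logits.max()          # shift for numerical stability (result unchanged)
    scaled = np.exp(logits)
    return scaled / scaled.sum()    # renormalise to a valid distribution


def sample_with_temperature(probs, temperature, rng):
    """Hypothetical helper: draw one index after temperature scaling."""
    p = apply_temperature(probs, temperature)
    return int(rng.choice(len(p), p=p))


probs = np.array([0.5, 0.3, 0.15, 0.05])
rng = np.random.default_rng(0)
for t in (0.5, 1.0, 2.0):
    print(t, np.round(apply_temperature(probs, t), 3),
          sample_with_temperature(probs, t, rng))
```

For the toy distribution `[0.5, 0.3, 0.15, 0.05]`, temperature 0.5 raises the top probability to about 0.68 and temperature 2.0 lowers it to about 0.38; at 1.0, the setting used in all three cells, the distribution is left essentially unchanged.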



## 2. GRU Model

In [2]:
import os
os.environ['TF_GPU_ALLOCATOR'] = 'cuda_malloc_async'

import pandas as pd
import numpy as np
import gc
import tensorflow as tf
from tensorflow import keras
from keras import backend as K

# 1. Load your trained model
model = keras.models.load_model('work/models/gru_model_2.keras')

# 2. Rebuild your char‐level vocabulary/lookups from the training CSV
df = pd.read_csv('work/kaggle_sentiment/tweet_sentiment_train.csv',
                 encoding='utf-8', encoding_errors='replace')
# concatenate all tweets into one long string
text = df['text'].str.cat(sep='\n')

# get sorted list of all unique characters in the corpus
vocab = sorted(set(text))
vocab_size = len(vocab) + 1   # +1 for the OOV token that StringLookup will insert

# build the lookup layers exactly as in the training notebook
ids_from_chars = keras.layers.StringLookup(
    vocabulary=vocab, mask_token=None  # no PAD token
)
chars_from_ids = keras.layers.StringLookup(
    vocabulary=ids_from_chars.get_vocabulary(), 
    invert=True, 
    mask_token=None
)

# helper to turn a string into a (1, len(s)) batch of IDs
def text_to_ids(s: str):
    # unicode_split on a 1-element list -> RaggedTensor of shape (1, None)
    chars = tf.strings.unicode_split([s], 'UTF-8')
    return ids_from_chars(chars)

# helper to turn a batch of ID sequences back into Python strings
def ids_to_text(ids):
    joined = tf.strings.reduce_join(chars_from_ids(ids), axis=-1).numpy()
    # decode the UTF-8 bytes explicitly so non-ASCII characters survive
    return [b.decode('utf-8') for b in joined]

# 3. Seed + pad to the fixed window length the model expects
seed = "Hey, "
seq_length = 100   # must match the sequence length used in training

# convert seed to IDs, then trim and left-pad to seq_length
seed_ids = text_to_ids(seed).numpy()[0]   # shape (len(seed),)
seed_ids = seed_ids[-seq_length:]         # keep last seq_length chars
seed_ids = np.expand_dims(seed_ids, 0)    # make batch of 1
# pad value 0 doubles as StringLookup's [UNK] id (mask_token=None)
seed_ids = keras.preprocessing.sequence.pad_sequences(
    seed_ids, maxlen=seq_length, padding='pre'
)

# 4. Generate one char at a time
generated_ids = []
num_chars = 512
temperature = 1.0

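for _ in range(num_chars):
    # predict next-char distribution; calling the model directly is faster
    # than model.predict() for a single small batch inside a loop
    preds = model(seed_ids, training=False).numpy()[0, -1, :]  # (vocab_size,)
    # apply temperature in float64 log space (assumes softmax outputs)
    logits = np.log(preds.astype(np.float64) + 1e-8) / temperature
    logits -= logits.max()                 # numerical stability
    preds = np.exp(logits) / np.sum(np.exp(logits))
    # sample
    next_id = np.random.choice(len(preds), p=preds)
    generated_ids.append(next_id)
    # shift window left by one and append the new ID
    seed_ids = np.roll(seed_ids, -1, axis=1)
    seed_ids[0, -1] = next_id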

# 5. Decode back to a string
gen_text = ids_to_text(np.array([generated_ids]))
print(seed)

for line in gen_text:   # avoid shadowing the corpus variable `text`
    print(line)

del model
K.clear_session()
gc.collect()

Hey, 
overheat nooooo where do u stay awaynt look
should be the non-logies duble
_clorahley Don`t worry you didn`t, I like them
 you are the tweet working
I want to go to work ways. Just too late for breakfast after a little phan`s first stalk tomorrow!!!
 'you`re the only one out! I want to write back then!
 Poor LostMa 4. Ha. tnagged it on the glob on the ensopro
 As  ? http://blip.fm/~5yyt3
It`s gonna be a dear and actione of my face after abit;
I don`t want to be home til since the world
 Thanks, I got some n
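
Each cell keeps a fixed-length context window: the seed is trimmed to its last `seq_length` IDs and left-padded with 0, and after every sampled character the window is shifted left by one with `np.roll` while the new ID is written into the last slot. A small NumPy sketch of that bookkeeping follows; the helper names `init_window` and `push_id` are hypothetical, and the toy window length of 5 stands in for the notebook's 100. It reproduces what `pad_sequences(..., padding='pre')` does for a single sequence without depending on Keras.

```python
import numpy as np


def init_window(ids, seq_length, pad_id=0):
    """Hypothetical helper: keep the last `seq_length` ids and left-pad with `pad_id`.

    Mirrors pad_sequences(..., padding='pre') for one sequence; pad_sequences'
    default truncating='pre' likewise keeps the *last* ids.
    """
    ids = np.asarray(ids, dtype=np.int64)[-seq_length:]
    window = np.full((1, seq_length), pad_id, dtype=np.int64)
    if len(ids):                          # guard: window[0, -0:] would be the whole row
        window[0, -len(ids):] = ids
    return window


def push_id(window, next_id):
    """Hypothetical helper: shift the window left by one and append `next_id`."""
    window = np.roll(window, -1, axis=1)  # np.roll returns a copy
    window[0, -1] = next_id
    return window


w = init_window([7, 8, 9], seq_length=5)
print(w)            # [[0 0 7 8 9]]
w = push_id(w, 4)
print(w)            # [[0 7 8 9 4]]
```

Because `np.roll` returns a new array, the update never mutates the caller's window in place; the cells rely on the same behavior when they reassign `seed_ids`.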



## 3. LSTM Model

In [3]:
import os
os.environ['TF_GPU_ALLOCATOR'] = 'cuda_malloc_async'

import pandas as pd
import numpy as np
import gc
import tensorflow as tf
from tensorflow import keras
from keras import backend as K


# 1. Load your trained LSTM model
model = keras.models.load_model('work/models/lstm_model.keras')

# 2. Rebuild your char‐level vocabulary/lookups exactly as in training
df = pd.read_csv('work/kaggle_sentiment/tweet_sentiment_train.csv',
                 encoding='utf-8', encoding_errors='replace')
text = df['text'].str.cat(sep='\n')
vocab = sorted(set(text))
vocab_size = len(vocab) + 1   # +1 for any OOV token

ids_from_chars = keras.layers.StringLookup(
    vocabulary=vocab, mask_token=None
)
chars_from_ids = keras.layers.StringLookup(
    vocabulary=ids_from_chars.get_vocabulary(),
    invert=True,
    mask_token=None
)

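def text_to_ids(s: str):
    # unicode_split on a 1-element list -> RaggedTensor of shape (1, None)
    chars = tf.strings.unicode_split([s], 'UTF-8')
    return ids_from_chars(chars)

def ids_to_text(ids):
    joined = tf.strings.reduce_join(chars_from_ids(ids), axis=-1).numpy()
    # decode the UTF-8 bytes explicitly so non-ASCII characters survive
    return [b.decode('utf-8') for b in joined]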

# 3. Prepare the seed and pad/trim to the sequence length used in training
seed = "Hey, this is my LSTM: "
seq_length = 100   # ← must match the seq length used during training

seed_ids = text_to_ids(seed).numpy()[0]     # shape (len(seed),)
seed_ids = seed_ids[-seq_length:]           # keep the last seq_length tokens
seed_ids = np.expand_dims(seed_ids, 0)      # batch size 1
# pad value 0 doubles as StringLookup's [UNK] id (mask_token=None)
seed_ids = keras.preprocessing.sequence.pad_sequences(
    seed_ids, maxlen=seq_length, padding='pre'
)

# 4. Sampling loop
generated_ids = []
num_chars = 512
temperature = 1.0

for _ in range(num_chars):
    # assumes the model ends in a softmax and returns probabilities, not logits
    # (np.log below would yield NaN on raw, possibly negative logits)
    probs = model(seed_ids, training=False).numpy()[0, -1, :]  # shape (vocab_size,)
    # apply temperature in float64 log space
    logits = np.log(probs.astype(np.float64) + 1e-8) / temperature
    logits -= logits.max()                 # numerical stability
    probs = np.exp(logits) / np.sum(np.exp(logits))
    # draw one character ID
    next_id = np.random.choice(len(probs), p=probs)
    generated_ids.append(next_id)
    # slide the window one step and append the new ID
    seed_ids = np.roll(seed_ids, -1, axis=1)
    seed_ids[0, -1] = next_id

# 5. Decode and print
gen_text = ids_to_text(np.array([generated_ids]))
print("Seed:", seed)
print("Generated continuation:\n")
for t in gen_text:
    print(t)

del model
K.clear_session()
gc.collect()


Seed: Hey, this is my LSTM: 
Generated continuation:

one thing that should have been edited to take a load of two weeks, and then done!!
LaLaLand... why am i liking that song so much, BE still seriously~
?????? Oh, but my thumb hurts lol  cry excited here! Giviz you is for long and your unstant forget? laughter  its a whole day out low and play  http://plurk.com/p/rpa16
 We were in trouble, I did it make it through the first couple of days then I love and missed them
   .: sorry you don`t forget that: http://www.stpecides.com/ - http://twitpic.com/66uj2 - hah
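
All three cells rebuild the character vocabulary from the training CSV rather than loading a saved vocabulary, so the sorted character set must come out identical to the one used in training, or every ID would be shifted. With `mask_token=None` and the default single OOV bucket, `StringLookup` assigns ID 0 to `[UNK]` and IDs 1 to `len(vocab)` to the sorted characters. The plain-Python sketch below reproduces that index layout on a toy corpus to make it visible; the function names are hypothetical and it is not a replacement for the Keras layer.

```python
def build_lookups(corpus: str, oov_token: str = "[UNK]"):
    """Hypothetical helper mirroring the StringLookup(mask_token=None) index layout.

    Index 0 is the OOV token; the sorted unique characters follow from index 1.
    """
    vocab = [oov_token] + sorted(set(corpus))
    char_to_id = {ch: i for i, ch in enumerate(vocab)}
    return vocab, char_to_id


def encode(s: str, char_to_id: dict) -> list:
    # unknown characters fall back to the OOV id 0
    return [char_to_id.get(ch, 0) for ch in s]


def decode(ids, vocab: list) -> str:
    # id 0 decodes to the OOV token string, like the inverted lookup layer
    return "".join(vocab[i] for i in ids)


vocab, char_to_id = build_lookups("hey\nho")
print(vocab)                      # ['[UNK]', '\n', 'e', 'h', 'o', 'y']
ids = encode("hey!", char_to_id)
print(ids)                        # [3, 2, 5, 0]
print(decode(ids, vocab))         # hey[UNK]
```

The extra slot at index 0 is what the `+ 1` in `vocab_size = len(vocab) + 1` accounts for.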

