## Defining the problem

**What is Writer’s Block?**
- Defined as “the inability to begin or continue writing for reasons other than a lack of basic skill or commitment.”
- Personal experience: it can be very frustrating and has caused me sleepless nights. It can also affect artists.
- Put plainly, it is a sudden lack of motivation which results in writers being unable to continue their work.

**Storyboarding, and how writer’s block has a negative effect:**
- Crucial step where creators plan the visual flow of the story, depicting scenes and panel layouts.
- Storyboarding is the bridge between a written script and the final artwork.
- Writer's block can extend to storyboarding, where creators struggle to visualize the scenes and transitions.

**Writer’s Block is prevalent in the manga industry:**
- Berserk: Kentaro Miura was known for taking hiatuses due to creative challenges leading to long waits.
- Evangelion: Yoshiyuki Sadamoto took multiple hiatuses which were attributed to creative challenges.
- Hunter X Hunter: Yoshihiro Togashi, experienced writer's block, leading to year-long hiatuses.

**Creators’ relentless workload is worsening the issue:**
- There are many manga creators who end up suffering from their work
- Creators work long hours affecting physical or mental health or suffer direct abuse from the system.
- Leads to a loss of motivation, inability to produce new ideas, and worsens the problem of writer’s block.

**How is AI helping with writer's block?**
- ChatGPT is helping to write next instalment of manga hit One Piece as author runs into writer’s block.
- “Cyberpunk: Peach John” is the world’s first complete AI manga work.

We thus arrive at the following problem statement:

## Problem Statement
“Can we build AI Models that address the problem of writer’s block in the manga industry by helping with ideation and storyboarding?”

## Contents:
- [Imports](#Imports)
- [Data Preprocessing](#Data-Preprocessing)
- [Model Tuning and Training](#Model-Tuning-and-Training)
- [Saving the model and the tokenizer](#Saving-the-model-and-the-tokenizer)
- [Text Generation from the Trained Model](#Using-the-trained-model-to-generate-text)
- [Evaluation](#Evaluation)
  - [BLEU](#BLEU)
  - [Rouge](#Rouge)

## Dataset Used
- filtered_manga_data.csv

## Imports

In [1]:
import pandas as pd
from transformers import GPT2Tokenizer, GPT2LMHeadModel, Trainer, TextDataset, DataCollatorForLanguageModeling, TrainingArguments
import nltk
from nltk.tokenize import sent_tokenize
import re
from sacrebleu import corpus_bleu
from rouge_score import rouge_scorer

In this block, the necessary libraries and modules are imported. Let's break down the imports and their purposes:

- pandas is imported for data manipulation, allowing us to work with tabular data in a structured way.
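- GPT2Tokenizer and GPT2LMHeadModel are imported from the Transformers library. These are key components for working with the GPT-2 model. The tokenizer is used to preprocess text data, and the LMHeadModel is the core GPT-2 language model. Trainer, TrainingArguments, TextDataset, and DataCollatorForLanguageModeling come from the same library and are used later to build the training set and run the fine-tuning.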
- nltk is imported, which is a natural language processing library.
- From nltk.tokenize, the sent_tokenize function is imported, which is used for splitting text into sentences.
- re is imported for regular expressions, which can be helpful for text cleaning and pattern matching.
- corpus_bleu (from sacrebleu) and rouge_scorer (from rouge_score) are imported to calculate BLEU and ROUGE scores, which are metrics for evaluating the quality of generated text.

### Data Preprocessing

In [2]:
# Read the manga data CSV file
df = pd.read_csv('../data/filtered_manga_data.csv')

# Extract the description and genre columns
descriptions = df['Description']
genres = df['Primary Genre']

# Combine descriptions and genres to create a list of text samples
text_samples = [f"A {genre} manga: {description}" for description, genre in zip(descriptions, genres)]

In this block, we start working with the manga dataset. Here's what each part of this block does:

- pd.read_csv('../data/filtered_manga_data.csv') reads the manga data from the CSV file 'filtered_manga_data.csv' in the data directory into a DataFrame named df. The DataFrame represents the tabular data with rows and columns.
- df['Description'] and df['Primary Genre'] extract the 'Description' and 'Primary Genre' columns from the DataFrame, respectively. These columns contain the descriptions and genres of manga.
- The text_samples list is created by combining the descriptions and genres using a list comprehension. Each entry in text_samples is a string that represents a manga description and its associated genre.
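The prompt template can be tried on a few rows without the CSV file. The sketch below uses plain dictionaries in place of the DataFrame. The function `build_samples` and the example rows are hypothetical, and skipping rows with a missing field is an assumption (the cell above keeps every row).

```python
def build_samples(rows):
    """Format rows as 'A <genre> manga: <description>' strings."""
    samples = []
    for row in rows:
        genre = (row.get("Primary Genre") or "").strip()
        description = (row.get("Description") or "").strip()
        if not genre or not description:
            continue  # assumption: drop rows with a missing genre or description
        # Collapse internal whitespace so that each sample stays on a single line
        description = " ".join(description.split())
        samples.append(f"A {genre} manga: {description}")
    return samples


# Hypothetical rows standing in for the real dataset
rows = [
    {"Primary Genre": "Drama", "Description": "Two rivals\nmeet again."},
    {"Primary Genre": "Comedy", "Description": ""},
]
samples = build_samples(rows)
print(samples)
```

Keeping each sample on one line matters later, because the samples are written to a text file with one sample per line.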

In [3]:
# Initialize the GPT-2 tokenizer and model
model_name = "gpt2"  # You can choose a specific GPT-2 variant
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)

# Create a text file to store the text samples
with open('text_samples.txt', 'w', encoding='utf-8') as file:
    for sample in text_samples:
        file.write(sample + '\n')

This block initializes the GPT-2 model and tokenizer. Here's the breakdown:

- model_name = "gpt2" specifies the GPT-2 model variant we will use. In this case, it's set to the base GPT-2 model.
- GPT2Tokenizer.from_pretrained(model_name) initializes the tokenizer with the chosen model name.
- GPT2LMHeadModel.from_pretrained(model_name) initializes the GPT-2 language model with the same model name.
- A text file is created to store the text_samples using UTF-8 encoding. The content of text_samples is written to this file, with each sample separated by a newline character.

### Model Tuning and Training

In [4]:
# Tokenize the text samples and create a dataset from the file
train_dataset = TextDataset(
    tokenizer=tokenizer,
    file_path='text_samples.txt',  # Provide the path to the text file
    block_size=128  # Adjust the block size as needed
)

# Data collator for language modeling
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False
)

# Training arguments
training_args = TrainingArguments(
    output_dir="./fine-tuned-model",  # Specify the output directory
    overwrite_output_dir=True,
    num_train_epochs=3,  # Adjust the number of epochs
    per_device_train_batch_size=8,  # Adjust batch size
    save_steps=10_000,
    save_total_limit=2,
    evaluation_strategy="steps",
    eval_steps=10_000,
)

# Initialize the Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_dataset,
)

# Fine-tune the model
trainer.train()



TrainOutput(global_step=1872, training_loss=3.116937555818476, metrics={'train_runtime': 14526.3845, 'train_samples_per_second': 1.03, 'train_steps_per_second': 0.129, 'total_flos': 976905584640000.0, 'train_loss': 3.116937555818476, 'epoch': 3.0})

This block tokenizes the data, configures the training run, and fine-tunes the model. Here is an explanation of the process:
- train_dataset is defined as a TextDataset, which supplies the training examples for fine-tuning the GPT-2 model.
- tokenizer=tokenizer specifies the tokenizer we previously initialized. It will be used to tokenize the text samples.
- file_path='text_samples.txt' specifies the path to the text file that contains our text samples.
- block_size=128 sets the length, in tokens, of each training example. TextDataset tokenizes the whole file as one continuous stream and cuts it into consecutive blocks of this size, so a block can span the end of one description and the start of the next. Tokens left over at the end that do not fill a complete block are dropped.
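A minimal sketch of fixed-size chunking, assuming that TextDataset cuts the tokenized file into consecutive blocks and drops an incomplete final block. The function `chunk_tokens` is hypothetical, and small integers stand in for real token IDs.

```python
def chunk_tokens(token_ids, block_size):
    """Split a token stream into consecutive blocks of exactly block_size tokens."""
    last_start = len(token_ids) - block_size + 1
    return [token_ids[i:i + block_size] for i in range(0, last_start, block_size)]


# Ten stand-in token IDs with a block size of 4: the last two tokens are dropped
blocks = chunk_tokens(list(range(10)), block_size=4)
print(blocks)
```

With this scheme the number of training examples depends on the total token count of the file, not on the number of descriptions.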

The next part of the block sets up the data collator for language modeling:
- data_collator is defined as a DataCollatorForLanguageModeling. It is responsible for formatting the tokenized data into batches suitable for training the model.
- tokenizer=tokenizer links this data collator to the tokenizer used earlier.
- mlm=False specifies that Masked Language Modeling (MLM) is not used. MLM is a technique where certain tokens are replaced with [MASK] tokens, and the model is trained to predict the original tokens. GPT-2 is a causal (left-to-right) language model that is trained to predict each next token from the tokens before it, so mlm is set to False.
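The causal objective used when mlm=False can be illustrated without the library. This is a sketch of the idea only, not of the collator's internals: `next_token_pairs` is a hypothetical helper, and words stand in for token IDs.

```python
def next_token_pairs(tokens):
    """Return (context, target) pairs: each token is predicted from those before it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


pairs = next_token_pairs(["A", "Drama", "manga", ":"])
for context, target in pairs:
    print(context, "->", target)
```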

The next part of the block focuses on setting up training arguments:
- output_dir specifies the directory where the fine-tuned model and associated files will be saved. In this case, it's set to "./fine-tuned-model" 
- overwrite_output_dir=True indicates that if the specified output directory already exists, it should be overwritten. Be cautious with this option, as it will replace any existing data in the output directory.
- num_train_epochs=3 determines the number of training epochs. An epoch is a complete pass through the training data.
- per_device_train_batch_size=8 sets the batch size for training. A batch is a set of data samples used in a single forward and backward pass during training.
- save_steps=10_000 indicates that model checkpoints will be saved every 10,000 steps during training. A checkpoint is a saved version of the model that allows us to resume training or evaluate the model's performance. This run finishes after 1,872 steps (see the TrainOutput above), so no intermediate checkpoint is written.
- save_total_limit=2 specifies the maximum number of checkpoints to keep. When this limit is reached, the oldest checkpoints will be deleted.
- evaluation_strategy="steps" defines the evaluation strategy. It means that the model will be evaluated at specified intervals defined by eval_steps.
- eval_steps=10_000 specifies that evaluation will occur every 10,000 training steps. No eval_dataset is passed to the Trainer and the run ends well before step 10,000, so no evaluation takes place here and no validation loss is reported.
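The step count in the training log can be checked with a little arithmetic. This sketch assumes a single device and no gradient accumulation. The example count of 4,992 blocks is an assumption chosen to be consistent with the logged global_step of 1872 (any count from 4,985 to 4,992 gives the same result), and `total_optimizer_steps` is a hypothetical helper.

```python
import math


def total_optimizer_steps(num_examples, batch_size, epochs):
    """Number of optimizer steps for a run on one device without gradient accumulation."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * epochs


steps = total_optimizer_steps(num_examples=4992, batch_size=8, epochs=3)
print(steps)            # 1872
print(steps // 10_000)  # 0: an interval of 10,000 steps is never reached
```

For a run of this size, smaller save_steps and eval_steps values (for example a few hundred) would be needed for checkpoints and evaluations to occur during training.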

Finally, we set up the Trainer to start fine-tuning the model:
- Trainer is initialized, connecting the model, training arguments, data collator, and training dataset.
- trainer.train() starts the fine-tuning process. The model will go through the training data for the specified number of epochs, adjusting its weights and learning to generate text based on the provided samples.

### Saving the model and the tokenizer

In [9]:
# Save the tokenizer
tokenizer.save_pretrained('./saved_fine_tuned_model')
# Save the fine-tuned model
trainer.save_model("./saved_fine_tuned_model")

('./saved_fine_tuned_model\\tokenizer_config.json',
 './saved_fine_tuned_model\\special_tokens_map.json',
 './saved_fine_tuned_model\\vocab.json',
 './saved_fine_tuned_model\\merges.txt',
 './saved_fine_tuned_model\\added_tokens.json')

### Using the trained model to generate text

In [5]:
# Load the fine-tuned model
model = GPT2LMHeadModel.from_pretrained("./saved_fine_tuned_model")
tokenizer = GPT2Tokenizer.from_pretrained("./saved_fine_tuned_model")

# User input genre
user_genre = "Drama"

# Generate a single description based on the user input genre
# Set max_length to control the length of the generated text
input_text = f"A {user_genre} manga:"
encoded = tokenizer(input_text, return_tensors="pt")
input_ids = encoded["input_ids"]

# Take the attention mask from the tokenizer; GPT-2 has no pad token, so reuse EOS as the pad id
attention_mask = encoded["attention_mask"]
pad_token_id = tokenizer.eos_token_id

# Generate text using the model
output = model.generate(
  input_ids=input_ids,
  attention_mask=attention_mask,
  max_length=100, # Adjust max_length as needed
  num_return_sequences=1,
  no_repeat_ngram_size=2,
  pad_token_id=pad_token_id,
)

# Decode and print the generated text
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)

A Drama manga: The story of the "Great War" between the two countries begins with the discovery of a mysterious "ghost" in the clouds. The ghost is a young boy named Kyouko, who is said to be the strongest in all of Japan. He is also the son of an aristocrat and a noblewoman.

Kyouka is the only child of Kyohei, a wealthy family in Japan, and his father, the head of his family, is an important


This block focuses on generating text with the fine-tuned model and presenting it to the user. Here's a detailed breakdown:

- The fine-tuned model and tokenizer are loaded from the saved directory using the from_pretrained method.
- A user-defined genre, "Drama" in this example, is stored in the variable user_genre.
- The prompt "A Drama manga:" is built from the genre, in the same format as the training samples, and converted to input IDs by the tokenizer.
- The model.generate() function continues the prompt. max_length=100 caps the total length in tokens (prompt included), and no_repeat_ngram_size=2 prevents any pair of consecutive tokens from appearing twice. The resulting token IDs are stored in output.
- Finally, the token IDs are decoded into generated_text, which is printed for the user to read.
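The effect of no_repeat_ngram_size can be sketched in plain Python. The idea is that, before each new token is chosen, every token that would complete an n-gram already present in the text is ruled out. The helper `banned_next_tokens` is hypothetical and uses words in place of token IDs; it illustrates the rule and is not the library's implementation.

```python
def banned_next_tokens(prefix, n):
    """Tokens that would repeat an n-gram already present in prefix."""
    if n < 1:
        return set()
    # The last n-1 tokens form the start of the n-gram about to be completed
    context = tuple(prefix[len(prefix) - (n - 1):]) if n > 1 else ()
    banned = set()
    for i in range(len(prefix) - n + 1):
        if tuple(prefix[i:i + n - 1]) == context:
            banned.add(prefix[i + n - 1])
    return banned


prefix = ["the", "ghost", "is", "the"]
banned = banned_next_tokens(prefix, n=2)
print(banned)  # "ghost" is ruled out because "the ghost" already occurred
```

With n=2 this is quite strict: common pairs such as "of the" can appear only once in the whole output, which may itself make longer texts read less naturally.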

The model does a good job of generating descriptions for the 'Romance' and 'Drama' genres but is noticeably less coherent for the 'Action' and 'Comedy' genres. Moreover, only about the first three sentences of each output are coherent, and the text is cut off once max_length is reached, so the generated text cannot be used as is. Some further cleaning is required.

In [6]:
# Define a regular expression pattern to remove unwanted content
pattern = r'\(.*?\)'  # Matches anything within parentheses

# Remove unwanted patterns from the generated text
cleaned_text = re.sub(pattern, '', generated_text)

# Tokenize the cleaned text into sentences
sentences = sent_tokenize(cleaned_text)

# Keep only the first three sentences
cleaned_sentences = sentences[:3]

# Reconstruct the cleaned text from the first three sentences
cleaned_text = " ".join(cleaned_sentences)

print(cleaned_text)

A Drama manga: The story of the "Great War" between the two countries begins with the discovery of a mysterious "ghost" in the clouds. The ghost is a young boy named Kyouko, who is said to be the strongest in all of Japan. He is also the son of an aristocrat and a noblewoman.


This block cleans the generated text and trims it to its first three sentences:
- A regular expression pattern, r'\(.*?\)', is defined. This pattern matches anything within parentheses and is used for removing unwanted content.
- The re.sub() function is used to remove patterns that match the regular expression from the generated text, resulting in cleaned_text.
- The sent_tokenize() function is used to split the cleaned text into sentences, which are stored in the sentences list.
- To keep the text concise, only the first three sentences are retained in the cleaned_sentences list.
- The final cleaned_text is reconstructed by joining these sentences with spaces and is printed.
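The cleaning steps can be wrapped in one reusable function. The sketch below is a standard-library variant: a simple regular expression stands in for NLTK's sent_tokenize (an assumption that works only for plain sentence-ending punctuation), and it also collapses the double spaces that removing parentheses leaves behind. The name `clean_description` is hypothetical.

```python
import re


def clean_description(text, max_sentences=3):
    """Remove parenthesised content and keep the first few sentences."""
    text = re.sub(r"\(.*?\)", "", text)          # drop anything in parentheses
    text = re.sub(r"\s+", " ", text).strip()     # collapse leftover whitespace
    text = re.sub(r"\s+([.,;!?])", r"\1", text)  # no space before punctuation
    # Naive sentence split: break after ., ! or ? followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return " ".join(sentences[:max_sentences])


raw = "A hero (age 17) rises. He fights. He wins. He rests."
cleaned = clean_description(raw)
print(cleaned)
```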

### Evaluation
For text generation models like GPT-2, numerical metrics like BLEU, ROUGE, and others can provide some automated assessment of the generated text. However, they have limitations and may not always capture the true quality or relevance of the generated content accurately. These metrics are based on n-grams, word overlaps, and other statistical measures and may not fully account for coherence, contextuality, or meaningfulness.

Subjective human evaluation, where human judges read and interpret the generated text, can provide a more comprehensive assessment of text quality, relevance, and fluency. Humans can better judge aspects like naturalness, context appropriateness, and overall coherence. This kind of evaluation is often considered the gold standard for assessing the quality of generated text.

That being said, we still need to conduct some basic evaluations for the model. These should not be taken too seriously, as ultimately, text coherence is the best indicator of model performance.

To evaluate the model, a reference.txt file was created. This file contains four new descriptions in genres the model was trained on. A larger reference set would give a more comprehensive evaluation.

### Defining the Metrics
- **ROUGE (Recall-Oriented Understudy for Gisting Evaluation):** ROUGE is a set of metrics for the automatic evaluation of machine-generated text, primarily used in the field of natural language processing. It measures the similarity between the generated text and reference text(s). ROUGE includes various metrics like ROUGE-N (measuring n-gram overlap), ROUGE-L (measuring the longest common subsequence), ROUGE-W (a weighted variant of the longest common subsequence), and more. ROUGE is often used in tasks such as machine translation, text summarization, and text generation to assess the quality of the generated text against reference text(s).
- **BLEU (Bilingual Evaluation Understudy):** BLEU is another metric used to evaluate the quality of machine-generated text, particularly in the context of machine translation. BLEU measures the overlap in n-grams (contiguous sequences of n words) between the generated text and reference text(s). It is designed to provide a simple and automated way to assess the quality of machine translation output. BLEU scores are typically computed for multiple reference translations to capture variability in human-generated translations.

### BLEU

In [7]:
# Load the reference texts
reference_texts = []
# Read the text samples from the file with 'utf-8' encoding
with open("reference.txt", "r", encoding='utf-8') as f:
    for line in f:
        reference_texts.append(line.strip())

# Generate texts
generated_texts = []
for genre in ["Romance", "Action", "Drama", "Comedy"]:
    input_text = f"A {genre} manga:"
    encoded = tokenizer(input_text, return_tensors="pt")
    input_ids = encoded["input_ids"]
    attention_mask = encoded["attention_mask"]  # mask produced by the tokenizer
    pad_token_id = tokenizer.eos_token_id  # GPT-2 has no pad token, so reuse EOS

    output = model.generate(
        input_ids=input_ids,
        attention_mask=attention_mask,
        max_length=100,
        num_return_sequences=1,
        no_repeat_ngram_size=2,
        pad_token_id=pad_token_id,
    )

    generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
    generated_texts.append(generated_text)

# Compute corpus-level BLEU with sacrebleu (first argument: hypotheses, second argument: references)
bleu_scores = corpus_bleu(reference_texts, generated_texts)

print("BLEU scores:", bleu_scores)

BLEU scores: BLEU = 0.17 1.3/0.2/0.1/0.0 (BP = 1.000 ratio = 50.167 hyp_len = 301 ref_len = 6)


### Interpretation
The BLEU (Bilingual Evaluation Understudy) output has the following parts:

BLEU = 0.17: This is the overall BLEU score. sacrebleu reports BLEU on a scale from 0 to 100, so 0.17 is very close to the minimum. A higher score means more n-gram overlap between the hypothesis and the reference.

1.3/0.2/0.1/0.0: These are the n-gram precisions, in percent, for unigrams, bigrams, trigrams, and 4-grams. Only 1.3% of the unigrams in the hypothesis were found in the reference, and almost no longer n-grams matched.

(BP = 1.000 ratio = 50.167 hyp_len = 301 ref_len = 6): These are additional statistics related to BLEU scoring:

BP (Brevity Penalty) = 1.000: The brevity penalty lowers the score when the hypothesis is shorter than the reference. A value of 1.000 means no penalty was applied, which here is simply because the hypothesis is longer than the reference.

Ratio = 50.167: This is hyp_len divided by ref_len. The hypothesis is about 50 times longer than the reference.

hyp_len = 301: The total number of tokens in the hypothesis.

ref_len = 6: The total number of tokens in the reference.

In summary, the recorded score is effectively zero, and the lengths show why it should not be read as a measure of model quality. A reference length of 6 tokens cannot correspond to four multi-sentence descriptions, which indicates that the texts were not paired as intended. corpus_bleu takes the hypotheses first and a list of reference lists second, so the intended call is corpus_bleu(generated_texts, [reference_texts]), with exactly one reference line per generated text. In the cell above, reference_texts is passed in the hypothesis position and generated_texts is passed as a flat list of strings, so the score needs to be recomputed before any conclusion is drawn from it.
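The two ingredients of the BLEU output, clipped n-gram precision and the brevity penalty, can be sketched in plain Python. This is a simplified, single-reference illustration with whitespace tokenization and is not sacrebleu's implementation. The helper names are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(hyp_tokens, ref_tokens, n):
    """Share of hypothesis n-grams found in the reference, with clipped counts."""
    hyp_counts = ngram_counts(hyp_tokens, n)
    ref_counts = ngram_counts(ref_tokens, n)
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return clipped / total


def brevity_penalty(hyp_len, ref_len):
    """1.0 when the hypothesis is at least as long as the reference, below 1.0 otherwise."""
    if hyp_len == 0:
        return 0.0
    if hyp_len >= ref_len:
        return 1.0
    return math.exp(1 - ref_len / hyp_len)


# Repeating a matching word does not help: "the" is credited only once
p1 = modified_precision("the the the".split(), "the cat".split(), n=1)
print(p1)

# The lengths from the output above: no penalty, because the hypothesis is longer
bp = brevity_penalty(hyp_len=301, ref_len=6)
print(bp)
```

A brevity penalty of 1.000 therefore says nothing about whether the lengths are similar; it only says the hypothesis was not shorter than the reference.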

### Rouge

In [8]:
# Calculate ROUGE scores for each pair
scorer = rouge_scorer.RougeScorer(['rougeL'], use_stemmer=True)
rouge_scores = []
for gen_text, ref_text in zip(generated_texts, reference_texts):
    scores = scorer.score(gen_text, ref_text)
    rouge_scores.append(scores)

print("ROUGE scores:")
for idx, scores in enumerate(rouge_scores):
    print(f"Pair {idx+1}: {scores}")

ROUGE scores:
Pair 1: {'rougeL': Score(precision=0.25925925925925924, recall=0.1728395061728395, fmeasure=0.2074074074074074)}
Pair 2: {'rougeL': Score(precision=0, recall=0, fmeasure=0)}
Pair 3: {'rougeL': Score(precision=0.15584415584415584, recall=0.15584415584415584, fmeasure=0.15584415584415584)}
Pair 4: {'rougeL': Score(precision=0, recall=0, fmeasure=0)}


### Interpretation
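ROUGE-L is based on the longest common subsequence (LCS) of tokens shared by two texts. Precision is the LCS length divided by the length of the generated text, recall is the LCS length divided by the length of the reference text, and the F-measure is their harmonic mean. The closer the F-measure is to 1, the more closely the generated text follows the reference text.

A note on argument order: the score method of rouge_scorer takes the reference (target) first and the generated text (prediction) second. The cell above passes them the other way round, so the printed precision and recall values are swapped. The F-measure is symmetric and is unaffected.

In our results, the first and third pairs have F-measures of 0.207 and 0.155, respectively. These are the highest of the four pairs but are still low in absolute terms: the generated texts share only a small part of their wording with the reference texts.

The second and fourth pairs score exactly 0 for precision, recall, and F-measure, meaning that not a single token is shared. This may reflect the model's difficulty with the 'Action' and 'Comedy' genres, but a score of exactly 0 is unusual for two English paragraphs. Before drawing that conclusion, it is worth checking that reference.txt has exactly one non-empty description per line and that the lines are in the same genre order as the generated texts.

Overall, the ROUGE scores suggest that the model generates texts that overlap only slightly with the reference texts, and there is considerable room for improvement. The model may need to be trained on more data or be fine-tuned to better match the style of the reference texts.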
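How a ROUGE-L score is built from the longest common subsequence can be sketched in plain Python. This is a simplified illustration with lowercasing and whitespace tokenization and without stemming, so its numbers will not match rouge_score exactly. The helper names are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate):
    """Return (precision, recall, f_measure) for a candidate against a reference."""
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    lcs = lcs_length(ref_tokens, cand_tokens)
    if lcs == 0:
        return 0.0, 0.0, 0.0
    precision = lcs / len(cand_tokens)  # share of the candidate that is matched
    recall = lcs / len(ref_tokens)      # share of the reference that is covered
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


precision, recall, f_measure = rouge_l("the cat sat on the mat", "the cat sat")
print(precision, recall, round(f_measure, 3))
```

Swapping the two arguments swaps precision and recall but leaves the F-measure unchanged, which is why the F-measure is the safest of the three values to compare across pairs.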

## Conclusion

In this part, we fine-tuned a GPT-2 model to generate manga descriptions from a user-supplied genre, thus automating scenario and idea generation. The model can be improved by further tuning or by using a different base model (such as Facebook BART), but due to time constraints, these methods are not explored here. This model works as a proof of concept and generates about three coherent sentences of manga description per prompt.

This marks the end of this portion. Now we will try and build an image generator to help automate the storyboarding process.