# Jedha Fullstack final project
# Title: Interactive Story teller "Horror theme"

This notebook fine-tunes a pretrained language model to generate horror-themed stories.

If you're opening this Notebook on colab, you will probably need to install 🤗 Transformers and 🤗 Datasets. Uncomment the following cell and run it.

Work plan:
1. Preparations

# 1) Preparations

#### 1.a) Install the transformers library

First we need to install the transformers library, which is produced by Hugging Face. It provides APIs and tools to easily download and train pretrained models.

In [39]:
! pip install datasets transformers





#### 1.b) Enable model sharing 

To be able to share our model, we need to upload it to the Hugging Face Hub (similar to GitHub).

This makes the model usable by any application.

To do so, follow the two notes and the guideline below, which come from the Hugging Face recommendations.

##### Hugging Face guidelines:


Hugging Face note(1): If you're opening this notebook locally, make sure your environment has the latest version of those libraries installed.

Hugging Face note(2): To be able to share your model with the community and generate results like the one shown in the picture below via the inference API, there are a few more steps to follow.

Hugging Face guideline: First you have to store your authentication token from the Hugging Face website (sign up [here](https://huggingface.co/join) if you haven't already!) then execute the following cell and input your username and password:

Make sure, when you obtain your authentication token using the link above, that the "write" permission is checked. Otherwise you will have problems uploading your model, as I did the first time.

Configure your account credentials:

In [40]:
!git config --global user.email "serdarcekinmez@gmail.com"
!git config --global user.name "serdarcekinmez"

When you execute the cell below, you will receive a prompt to enter your username and password to log in,
and the token that will enable you to upload your model. If no prompt appears, no worries: the login will be activated automatically.

In [41]:
from huggingface_hub import notebook_login

notebook_login()


Then you need to install Git-LFS. Uncomment the following instructions:

In [42]:
#!apt install git-lfs

Make sure your version of Transformers is at least 4.11.0 since the functionality was introduced in that version:

In [43]:
import transformers

print(transformers.__version__)

4.27.4


HuggingFace note: You can find a script version of this notebook to fine-tune your model in a distributed fashion using multiple GPUs or TPUs [here](https://github.com/huggingface/transformers/tree/master/examples/language-modeling).

# 2) Preparing the Dataset

Mount Google Drive to access the file containing our horror story dataset:

In [44]:
#from google.colab import drive
#drive.mount('/content/drive')

In [45]:
# Import load_dataset and the other utilities used below

import os
from datasets import load_dataset
from sklearn.model_selection import train_test_split



In [46]:
# General pattern:
#datasets = load_dataset("text", data_files={"train": path_to_train.txt, "validation": path_to_validation.txt})

# Using the horror stories dataset, which we extracted from the creepypastas dataset
# using the "Extract-horrotstory" notebook:

#datasets = load_dataset("text", data_files={"train": "/content/drive/MyDrive/DataScience/Train/HorrorC_train.txt", "validation": "/content/drive/MyDrive/DataScience/Test/HorrorC_test.txt"})

# The following line was used for fine-tuning on the Dracula story dataset
#datasets = load_dataset("text", data_files={"train": "/content/drive/MyDrive/DataScience/Train/Dracula_train.txt", "validation": "/content/drive/MyDrive/DataScience/Test/Dracula_test.txt"})
# The code below is the adapted version, which reads the files from the local working directory:


train_file_path = "HorrorC_train.txt"
validation_file_path = "HorrorC_test.txt"

datasets = load_dataset("text", data_files={"train": train_file_path, "validation": validation_file_path})




Found cached dataset text (C:/Users/serda/.cache/huggingface/datasets/text/default-22636d51d4b5f11a/0.0.0/cb1e9bd71a82ad27976be3b12b407850fe2837d80c22c5e03a28949843a8ace2)


  0%|          | 0/2 [00:00<?, ?it/s]

We can also load datasets from a CSV or a JSON file; see the [full documentation](https://huggingface.co/docs/datasets/loading_datasets.html#from-local-files) for more information.

To access an actual element, you need to select a split first, then give an index:

In [47]:
datasets["train"][:50]

{'text': ['Six feet and 630 muscular pounds was now average adult size, so at six feet nine inches and 790 pounds with his extra-muscular symbiote J was now considerably taller than average and one of the heaviest and strongest ones. His powerful symbiote had really muscled out for some reason and gained the additional 40 pounds when it formed the extra arms. Everyone had the extra arms regardless of his or her muscular new size, though. J had been richly rewarded for finding a missing diamond wedding ring and returning it to the couple it belonged to three days after being infected and “suited up” by his symbiote and gave most of his reward money to a charity that helped Christian refugees escape unspeakable and horrific persecution in war-torn countries. Many of these people had been rescued with this donation and J’s feeder insect business was thriving. J was able to use the money he earned from selling feeder insects to go diving for tasty lionfish and other prized underwater game.

To get a sense of what the data looks like, the following function will show some examples picked randomly in the dataset.

In [48]:
from datasets import ClassLabel
import random
import pandas as pd
from IPython.display import display, HTML

def show_random_elements(dataset, num_examples=10):
    assert num_examples <= len(dataset), "Can't pick more elements than there are in the dataset."
    picks = []
    for _ in range(num_examples):
        pick = random.randint(0, len(dataset)-1)
        while pick in picks:
            pick = random.randint(0, len(dataset)-1)
        picks.append(pick)
    
    df = pd.DataFrame(dataset[picks])
    for column, typ in dataset.features.items():
        if isinstance(typ, ClassLabel):
            df[column] = df[column].transform(lambda i: typ.names[i])
    display(HTML(df.to_html()))

In [49]:
show_random_elements(datasets["train"])

Unnamed: 0,text
0,“I understand.”
1,"I’ve never posted like this before. But I suppose I’ve never needed to. If you’ve read the title, you know what to expect, and you can move on if you’d like to avoid the topic. I’ll understand. Grief is a funny thing. Professor Farina taught me that in the first class I ever took for my undergrad, and I never understood it until now."
2,"Howard was left everything, the house, the car and a sizeable chunk of money from life insurance policies. These gains did nothing to ease his loss. He felt like his life was rapidly being drawn downwards back in to the minus column. Having nothing to lose, except for rejection, he plucked up the courage to ask Brenda to be his wife. This time, to his overwhelming joy, she accepted."
3,"“How did you know I was awake?” Her voice was dry and cracked and, before she could ask for water, he had positioned the flight suit’s drinking tube into her mouth. The water tasted fine even though she couldn’t help but think about the fact that a portion of it was her own reconstituted urine, much as she did every time she drank. The flight suits were equipped with ten gallons of water and an advanced filtration system for the astronaut’s urine which would stretch the ten gallons into the fifty or sixty that they would need to stay hydrated on the trip to Olympus."
4,"Julian: Oh, yes, Sheriff Peterson, our town’s symbol of pea-"
5,
6,"The tattooed neighbor was visibly taken aback when he approached. He stared at the wolf-man-thing a moment and they seemed to be talking. I couldn’t hear what was being said, but the tattooed neighbor was growing more agitated as the conversation went on. He threw an arm out to point back at his house and it was clear he was yelling at that point. Perhaps he was also sleep-deprived and making bad choices as a result, because I wasn’t sure I’d have yelled at something clearly so inhuman."
7,"The disturbance came from downstairs, and from my room it sounded little more than the scratching of a cat on the door. I darted my eyes across the room, hoping, praying that Half-and-half had exited and was the cause for the scraping. With reluctant realization, I noticed that Half-and-half had not moved a single muscle and was yet still on the edge of the widow, his milky fur scrambling in the draft of the air conditioning vent that sat at his feet. The moon was bulbous and massive that night, making him appear very small. His ears perked though, with every successive scrape against the front door. Tears pooled up into my eyes, obscuring my vision, and I jolted out of the room, into the hallway, down the stairs, and to the door of my parent’s. I could hear clearly the raspy grazing against the front door as I called out to my parents for the third time that night. The scoring got louder. “Mom, Dad, please there is something trying to come in the house!” I whimpered, tapping on the bedroom door."
8,Dawn had begun to lighten the sky by now and Luke could see the man in all too vivid detail as he turned and looked towards Luke. The man smiled an unnervingly broad smile that looked to almost unhinge his jaw as his eyes widened and dilated like an animal glimpsing wounded prey.
9,"“Look, dad! You can get your fortune told!” Brian shouted, pointing at the tent."


# 3) Causal Language modeling

For causal language modeling (CLM) we are going to take all the texts in our dataset and concatenate them after they are tokenized. Then we will split them in examples of a certain sequence length. This way the model will receive chunks of contiguous text that may look like:
```
part of text 1
```
or 
```
end of text 1 [BOS_TOKEN] beginning of text 2
```
depending on whether they span over several of the original texts in the dataset or not. The labels will be the same as the inputs, shifted to the left.
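As a minimal sketch of what "shifted" means here, the snippet below pairs every token of a chunk with the token the model should predict next. The token ids and the helper name `next_token_pairs` are made up for illustration and are not part of any library.

```python
def next_token_pairs(token_ids):
    """Pair each token with the token that should be predicted after it."""
    inputs = token_ids[:-1]   # every token except the last one
    targets = token_ids[1:]   # the same sequence shifted one position to the left
    return list(zip(inputs, targets))

chunk = [11, 42, 7, 99]  # made-up token ids for one training chunk
pairs = next_token_pairs(chunk)
for current, target in pairs:
    print(f"after token {current} the model should predict {target}")
```

The last token of a chunk has no target inside that chunk, which is why a chunk of four tokens yields three prediction pairs.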

The Hugging Face example this notebook is based on uses the [`distilgpt2`](https://huggingface.co/distilgpt2) model, but any of the checkpoints listed [here](https://huggingface.co/models?filter=causal-lm) can be used instead.

## 3.a) Choose the model to fine-tune

We used the Facebook model named opt-350m.

OPT stands for Open Pre-trained Transformer Language Model.

**Why did we choose it?**

We chose it for the following three reasons:
1. It is used for text generation based on prompts
2. It was introduced after the GPT-3 model (*in May 2022*)
3. It showed very good results when we tested it



In [50]:
model_checkpoint = "facebook/opt-350m"

## 3.b) Tokenizing

**Why Tokenizing?** 

Because in order to work with text we need to convert strings into numbers. The result is a sequence of integers.
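As a toy illustration of this idea, the sketch below builds a tiny word-level vocabulary and maps a sentence to integers. It is not the real OPT tokenizer, which works on sub-word units from a learned vocabulary. The helpers `build_vocab` and `encode` are hypothetical names used only here.

```python
def build_vocab(texts):
    """Assign an integer id to every distinct whitespace-separated word."""
    vocab = {"<unk>": 0}  # id 0 is reserved for words never seen before
    for text in texts:
        for word in text.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab

def encode(text, vocab):
    """Turn a string into a list of integer ids."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]

vocab = build_vocab(["the house was dark", "the door was open"])
ids = encode("the dark door creaked", vocab)
print(ids)
```

The word "creaked" is not in the toy vocabulary, so it falls back to the `<unk>` id. Sub-word tokenizers largely avoid this problem by splitting rare words into smaller known pieces.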

To tokenize all our texts with the same vocabulary that was used when training the model, we have to download a pretrained tokenizer. This is all done by the `AutoTokenizer` class:

In [51]:
from transformers import AutoTokenizer
    
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint, use_fast=True) 
# model_checkpoint here refers to the facebook/opt-350m model
def tokenize_function(examples):
    return tokenizer(examples["text"])
tokenized_datasets = datasets.map(tokenize_function, batched=True, num_proc=4, remove_columns=["text"])

Map (num_proc=4):   0%|          | 0/4300 [00:00<?, ? examples/s]

NameError: name 'tokenizer' is not defined

In [None]:
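from transformers import AutoTokenizer

model_checkpoint = "facebook/opt-350m"  # same checkpoint as chosen in section 3.a

# Tokenizer kept at module level so that later cells (e.g. decoding) can use it
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint, use_fast=True)

def tokenize_function(examples):
    # Workaround for the NameError above: load the tokenizer inside the function
    # so that it is defined in every worker process started by num_proc
    tokenizer = AutoTokenizer.from_pretrained(model_checkpoint, use_fast=True)
    return tokenizer(examples["text"])
tokenized_datasets = datasets.map(tokenize_function, batched=True, num_proc=4, remove_columns=["text"])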

We can now call the tokenizer on all our texts. This is very simple, using the [`map`](https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasets.Dataset.map) method from the Datasets library. The `tokenize_function` defined above calls the tokenizer on our texts.

Then we apply it to all the splits in our `datasets` object, using `batched=True` and 4 processes to speed up the preprocessing. We won't need the `text` column afterward, so we discard it.

In [37]:
tokenized_datasets = datasets.map(tokenize_function, batched=True, num_proc=4, remove_columns=["text"])

Map (num_proc=4):   0%|          | 0/4300 [00:00<?, ? examples/s]

NameError: name 'AutoTokenizer' is not defined

If we now look at an element of our datasets, we will see the texts have been replaced by the `input_ids` the model will need:

In [None]:
tokenized_datasets["train"][1]

Now for the harder part: we need to concatenate all our texts together then split the result in small chunks of a certain `block_size`. To do this, we will use the `map` method again, with the option `batched=True`. This option actually lets us change the number of examples in the datasets by returning a different number of examples than we got. This way, we can create our new samples from a batch of examples.

First, we grab the maximum length our model was pretrained with. This might be a bit too big to fit in your GPU RAM, so here we take a much smaller value of just 128.

In [None]:
# block_size = tokenizer.model_max_length
block_size = 128

Then we write the preprocessing function that will group our texts:

In [None]:
def group_texts(examples):
    # Concatenate all texts.
    concatenated_examples = {k: sum(examples[k], []) for k in examples.keys()}
    total_length = len(concatenated_examples[list(examples.keys())[0]])
    # We drop the small remainder. We could add padding instead if the model supported it;
    # you can customize this part to your needs.
    total_length = (total_length // block_size) * block_size
    # Split by chunks of max_len.
    result = {
        k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, t in concatenated_examples.items()
    }
    result["labels"] = result["input_ids"].copy()
    return result

First note that we duplicate the inputs for our labels. This is because the models of the 🤗 Transformers library apply the shift internally, so we don't need to do it manually.

Also note that by default, the `map` method will send a batch of 1,000 examples to be treated by the preprocessing function. So here, we will drop the remainder to make the concatenated tokenized texts a multiple of `block_size` every 1,000 examples. You can adjust this behavior by passing a higher batch size (which will also be processed more slowly). You can also speed up the preprocessing by using multiprocessing:

In [None]:
lm_datasets = tokenized_datasets.map(
    group_texts,
    batched=True,
    batch_size=1000,
    num_proc=4,
)

Map (num_proc=4):   0%|          | 0/4379 [00:00<?, ? examples/s]

Map (num_proc=4):   0%|          | 0/996 [00:00<?, ? examples/s]
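To make the effect of the grouping step concrete, here is a small self-contained sketch that applies the same logic to a made-up batch. It mirrors `group_texts` above, except that `block_size` is passed as an argument and set to 4 so the result is easy to read; the token ids are invented.

```python
def group_texts_demo(examples, block_size):
    # Concatenate the token lists of every example in the batch.
    concatenated = {k: sum(examples[k], []) for k in examples}
    total_length = len(concatenated["input_ids"])
    # Drop the remainder that does not fill a whole block.
    total_length = (total_length // block_size) * block_size
    result = {
        k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, t in concatenated.items()
    }
    # Labels are a copy of the inputs; the model shifts them internally.
    result["labels"] = [block[:] for block in result["input_ids"]]
    return result

# Three made-up tokenized texts of different lengths (10 tokens in total).
batch = {"input_ids": [[1, 2, 3], [4, 5, 6, 7, 8], [9, 10]]}
grouped = group_texts_demo(batch, block_size=4)
print(grouped["input_ids"])
```

Three input examples become two blocks of four tokens, and the last two tokens are dropped. This is why the number of rows changes after the `map` call.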

And we can check our datasets have changed: now the samples contain chunks of `block_size` contiguous tokens, potentially spanning over several of our original texts.

In [None]:
tokenizer.decode(lm_datasets["train"][1]["input_ids"])

' 10:15 this morning. I earn extra money by participating in drug trials. I’m a so-called “healthy subject” who takes experimental drugs to help assess side effects. Once it was a kidney drug. A few times it’s been something for blood pressure or cholesterol. This morning they told me the drug I took was a psychoactive substance intended to accelerate brain function.</s>None of the drugs I had tested so far have ever done anything for me, in the recreational sense. In other words, none of the drugs I’ve tested have given me a killer buzz, or mellowed me'

## 3.c) Training

Now that the data has been prepared, we're ready to instantiate our `Trainer`. We will need a model:

In [None]:
from transformers import AutoModelForCausalLM     # as recommended by Hugging face documentation for opt-350 training
model = AutoModelForCausalLM.from_pretrained(model_checkpoint)

Downloading pytorch_model.bin:   0%|          | 0.00/663M [00:00<?, ?B/s]

Downloading (…)neration_config.json:   0%|          | 0.00/137 [00:00<?, ?B/s]

And some `TrainingArguments`:

In [None]:
from transformers import Trainer, TrainingArguments
# training arguments facilitates having all the training parameters in one place
# for the ease of modifications as we'll see in the upcoming training command 

In [None]:
model_name = model_checkpoint.split("/")[-1]
training_args = TrainingArguments(
    f"{model_name}-HorrorCreepyPasta",  # we chose the name of our model, related to the chosen dataset
    evaluation_strategy = "epoch", # the model will be evaluated after each epoch
    learning_rate=2e-5,   # low learning rates are usually chosen when fine-tuning, as per recommendation
    weight_decay=0.01,  # to help prevent over-fitting, although here it is a very small value
    push_to_hub=True, # the model is pushed to the Hub each time it is saved during training
)

The last argument sets everything up so we can push the model to the [Hub](https://huggingface.co/models) regularly during training. Remove it if you didn't follow the installation steps at the top of the notebook. If you want to save your model locally under a name that is different from the name of the repository it will be pushed to, or if you want to push your model under an organization and not your own namespace, use the `hub_model_id` argument to set the repo name (it needs to be the full name, including your namespace: for instance `"sgugger/gpt-finetuned-wikitext2"` or `"huggingface/gpt-finetuned-wikitext2"`).

We pass along all of those to the `Trainer` class:

In [None]:
trainer = Trainer(
    model=model,     
    args=training_args,     # as per the parameters chosen in the code cell above
    train_dataset=lm_datasets["train"],   # the tokenized batched training dataset
    eval_dataset=lm_datasets["validation"], # the tokenized batched validation dataset
)

Cloning https://huggingface.co/lionelsh/opt-350m-HorrorCreepyPasta into local empty directory.


And we can train our model:

In [None]:
trainer.train()



Epoch,Training Loss,Validation Loss
1,No log,3.171278
2,2.873800,3.204434
3,2.873800,3.236982


TrainOutput(global_step=750, training_loss=2.72338330078125, metrics={'train_runtime': 176.0029, 'train_samples_per_second': 34.09, 'train_steps_per_second': 4.261, 'total_flos': 1397873442816000.0, 'train_loss': 2.72338330078125, 'epoch': 3.0})
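The numbers in this output can be cross-checked with a little arithmetic. The sketch below assumes a per-device batch size of 8 (the `TrainingArguments` default), a single device, and no gradient accumulation. None of these is stated in the notebook, but they are consistent with the logged throughput.

```python
import math

train_runtime = 176.0029       # seconds, from TrainOutput
samples_per_second = 34.09     # from TrainOutput
num_epochs = 3
batch_size = 8                 # assumption: default per-device batch size

samples_seen = round(train_runtime * samples_per_second)  # total over all epochs
samples_per_epoch = samples_seen // num_epochs
steps_per_epoch = math.ceil(samples_per_epoch / batch_size)
total_steps = steps_per_epoch * num_epochs

print(samples_per_epoch, steps_per_epoch, total_steps)
```

Under these assumptions the total matches `global_step=750`. It would also explain the "No log" entry for epoch 1: if the training loss is logged every 500 steps (the default `logging_steps`), the first log is written only at the end of the second epoch, and the same value is shown again for epoch 3.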

Once the training is completed, we can evaluate our model and get its perplexity on the validation set like this:

In [None]:
import math
eval_results = trainer.evaluate()
print(f"Perplexity: {math.exp(eval_results['eval_loss']):.2f}")

Perplexity: 25.46


**What is meant by perplexity?**

It is a measure of how well a probability model predicts a sample. Put simply, "perplexed" means confused: the smaller the number, the less confused the model is and the better it performs. Conversely, the bigger the value, the worse the model.

Thus, the perplexity metric in NLP is a way to capture the degree of 'uncertainty' a model has in predicting (assigning probabilities to) some text.

As we can see, a perplexity of 25.46 seems reasonably good here.
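For reference, perplexity is the exponential of the average negative log-probability the model assigns to each correct token. The sketch below computes it for two made-up lists of probabilities and then recomputes the value reported above from the final validation loss; the helper name `perplexity` is ours.

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the average negative log-probability per token."""
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))

# Made-up probabilities that a model assigned to the correct next token.
confident = perplexity([0.5, 0.5, 0.5, 0.5])
unsure = perplexity([0.1, 0.1, 0.1, 0.1])
print(confident, unsure)

# The validation loss is already an average negative log-likelihood,
# so exponentiating it gives the perplexity.
reported = round(math.exp(3.236982), 2)
print(reported)
```

A model that always gives the correct token a probability of 0.5 behaves as if it were hesitating between 2 equally likely options, while 0.1 corresponds to 10 options. A perplexity of about 25 can be read the same way.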


You can now **upload the result of the training to the Hub**, just execute this instruction:

In [None]:
trainer.push_to_hub()

Several commits (2) will be pushed upstream.
The progress bars may be unreliable.


Upload file pytorch_model.bin:   0%|          | 1.00/1.23G [00:00<?, ?B/s]

Upload file runs/Apr16_17-45-06_289eb2929221/events.out.tfevents.1681667141.289eb2929221.1350.0:   0%|        …

Upload file runs/Apr16_17-45-06_289eb2929221/events.out.tfevents.1681667332.289eb2929221.1350.2:   0%|        …

To https://huggingface.co/lionelsh/opt-350m-HorrorCreepyPasta
   32fcaa7..13847f1  main -> main

   32fcaa7..13847f1  main -> main

To https://huggingface.co/lionelsh/opt-350m-HorrorCreepyPasta
   13847f1..b211feb  main -> main

   13847f1..b211feb  main -> main



'https://huggingface.co/lionelsh/opt-350m-HorrorCreepyPasta/commit/13847f1628b97c3d99ebe2fd7165fef00910030e'

**Important** : Don't forget to upload the tokenizer too, otherwise the model will not work

In [None]:
tokenizer.push_to_hub("lionelsh/opt-350m-HorrorCreepyPasta")


CommitInfo(commit_url='https://huggingface.co/lionelsh/opt-350m-HorrorCreepyPasta/commit/576a0f10fa94fd79a6dcbaa85ece869e8885eff8', commit_message='Upload tokenizer', commit_description='', oid='576a0f10fa94fd79a6dcbaa85ece869e8885eff8', pr_url=None, pr_revision=None, pr_num=None)

In [None]:
finetuned_model.save_pretrained(save_directory, push_to_hub=True, repo_name="my-awesome-model"),

lionelsh/opt-350m-finetuned-Dracula1



You can now share this model with all your friends, family, favorite pets: they can all load it with the identifier `"your-username/the-name-you-picked"` so for instance:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("sgugger/my-awesome-model")
```

# 4) Generating Text 👍

## 4.1) import our new model

First let's import our fine-tuned model

In [None]:
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("lionelsh/opt-350m-HorrorCreepyPasta")


#opt-350m-Dracula_2 # Ignore this comment it's just a reminder of another model we trained

Downloading (…)lve/main/config.json:   0%|          | 0.00/749 [00:00<?, ?B/s]

Downloading pytorch_model.bin:   0%|          | 0.00/1.32G [00:00<?, ?B/s]

Downloading (…)neration_config.json:   0%|          | 0.00/132 [00:00<?, ?B/s]

## 4.2) Import the tokenizer

In [None]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("lionelsh/opt-350m-HorrorCreepyPasta")


Downloading (…)okenizer_config.json:   0%|          | 0.00/870 [00:00<?, ?B/s]

Downloading (…)olve/main/vocab.json:   0%|          | 0.00/798k [00:00<?, ?B/s]

Downloading (…)olve/main/merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/2.11M [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/548 [00:00<?, ?B/s]

## 4.3) Let's generate a horror Text 😀

In [None]:
prompt = 'I was on a ship'
inputs = tokenizer(prompt, return_tensors="pt")

# the meaning of each parameter is explained in the annex below to avoid taking too much space
outputs = model.generate(**inputs, do_sample=True, min_new_tokens=50, max_new_tokens=70, temperature=0.7, top_k=50, top_p=0.95, num_return_sequences=3)

[tokenizer.decode(output) for output in outputs]

['</s>I was on a ship in the middle of the ocean. I had no idea what was going on, but I was so bored that I decided to go into the water. I had never been on a ship before, but I knew how to swim. It was a beautiful sight to me. I could see the stars and the moon, but the water was too cold for',
 '</s>I was on a ship named ‘The Sparrow’ and it was a small country in the middle of the Atlantic. We were a small ship of about 1500 people. Most of the crew were from the US and the rest were from other countries. We lived in a small shack on the edge of town called ‘The Sparrow’. It was a small town',
 '</s>I was on a ship, and we were about to land. We were going to build a camp, but we were going to need a lot of supplies. I had a feeling there was some sort of secret society out there that was trying to make money off the colony, but I had no idea what they were doing or why they were trying so hard to make me feel']

In [None]:
[tokenizer.decode(output) for output in outputs]

['</s>I was on a ship with more than a thousand men, and the captain was a tall, thin man, with a strong, sandy look about him that I did not like. He told me that he had come here on business, and that he had been looking for a friend of his, and that he had told him that he had got a letter from him, that',
 '</s>I was on a ship, and all the crew was on one side, and me on the other. I saw him just as I had seen him at the pier. He was at once recognised. I got in a cab and drove over to his house. I found him, sitting on the porch, looking at the sunrise. He was very excited. I was glad to',
 '</s>I was on a ship, and the ship was carrying cargo of goods, not only of silk and flowers but of much of the same kind. This cargo was of old, and had been for some time lost, as was the case with the other goods. The captain had asked me to watch for any change in the weather, so I went along with him. At first']

Look how beautiful the results above are!

# 5) Annex

We would like to bring to your kind attention the following points:

1. There is a group of Jupyter notebook files in which we ran various tests with different:

*   Datasets
*   Models: GPT-2 and opt-350m
*   Text generation parameter settings


2. Here are the sources used for research:

    a. Research for choosing the optimum parameters for text generation:

    https://huggingface.co/blog/introducing-csearch

    b. Model documentation: opt-350m 
    https://huggingface.co/docs/transformers/main/en/model_doc/opt#transformers.OPTConfig


3. Finally, the definitions of the parameters used in the text generation command:
outputs = model.generate(**inputs, do_sample=True, min_new_tokens=50, max_new_tokens=70, temperature=0.7, top_k=50, top_p=0.95, num_return_sequences=3)


**min_new_tokens=50**: This parameter specifies the minimum number of new tokens the generated text should contain. It ensures that the generated output is at least the specified length in tokens, which is useful to avoid very short or incomplete responses.

**max_new_tokens=70**: This parameter sets the maximum number of new tokens in the generated text. It limits the length of the output to a specific number of tokens. This can be useful to prevent overly long or verbose responses.

**temperature=0.7**: The temperature parameter controls the randomness of the text generation process. A higher temperature value (greater than 1.0) makes the generated output more diverse and random, while a lower value (less than 1.0) makes it more focused and deterministic. Here, the temperature is set to 0.7, indicating a moderate level of randomness.

**top_k=50**: The top_k parameter specifies how many of the most probable tokens to consider at each generation step. It restricts the sampling pool to the k most probable tokens. A higher value of top_k allows for more diversity in the generated text, while a lower value makes the output more focused and deterministic.
In other words, if you choose a low number, the generated text will have a poor vocabulary because you have limited the number of tokens the model can choose from, and vice versa.

**top_p=0.95**: The top_p parameter, also known as nucleus sampling, works on cumulative probability and can be used as an alternative to top_k or together with it, as in the command above. At each step it considers the tokens from most to least probable and samples only from the smallest set of tokens whose cumulative probability exceeds the given threshold (top_p). A higher value of top_p allows for more diverse and varied outputs, while a lower value makes the output more focused.

**num_return_sequences=3**: This parameter specifies the number of distinct sequences or responses to generate. 
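To show how temperature, top_k, and top_p interact, here is a simplified pure-Python sketch over a made-up five-token vocabulary. It is an illustration of the idea, not the library's implementation; the function names and the logits are invented for this example.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    highest = max(scaled)
    exps = [math.exp(x - highest) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_top_p_filter(probs, top_k, top_p):
    """Return the token indices kept after top-k and then top-p filtering."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    kept_mass = sum(probs[i] for i in ranked)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / kept_mass  # renormalise after the top-k cut
        if cumulative >= top_p:
            break
    return kept

logits = [4.0, 3.0, 2.0, 1.0, 0.0]  # made-up scores for a five-token vocabulary
sharp = softmax_with_temperature(logits, temperature=0.7)
flat = softmax_with_temperature(logits, temperature=1.5)
kept = top_k_top_p_filter(sharp, top_k=3, top_p=0.95)
print(kept)
```

With these logits, a temperature of 0.7 gives the most likely token a larger share of the probability than a temperature of 1.5 does, and after the top_k and top_p cuts only the two most likely tokens remain as candidates for sampling.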

Finally, a more recent parameter, which we used in our project, is `penalty_alpha`. It belongs to the contrastive search technique described in the first source above, which applies a penalty to candidate tokens that are too similar to what the model has already generated. Consequently, we can avoid unnecessary repetition and the text becomes more creative.
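The scoring idea behind contrastive search can be sketched in a few lines: each candidate token is scored by its probability minus a penalty based on how similar its representation is to the tokens already in the context, and `penalty_alpha` weights the two terms. The vectors, probabilities, and function names below are made up for illustration; in the real model the vectors are hidden states.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def contrastive_pick(candidates, context_vectors, alpha):
    """Pick the candidate with the best (1 - alpha) * probability - alpha * penalty."""
    def score(candidate):
        _, prob, vector = candidate
        penalty = max(cosine(vector, h) for h in context_vectors)
        return (1 - alpha) * prob - alpha * penalty
    return max(candidates, key=score)[0]

context_vectors = [[1.0, 0.0], [0.9, 0.1]]  # made-up representations of previous tokens
candidates = [
    ("dark", 0.6, [1.0, 0.05]),  # more probable, but very similar to the context
    ("door", 0.4, [0.0, 1.0]),   # less probable, but adds something new
]
without_penalty = contrastive_pick(candidates, context_vectors, alpha=0.0)
with_penalty = contrastive_pick(candidates, context_vectors, alpha=0.6)
print(without_penalty, with_penalty)
```

With `alpha=0.0` the choice reduces to picking the most probable token, while a positive `alpha` can favour a less probable token that repeats the context less. In 🤗 Transformers, contrastive search is used by passing `penalty_alpha` together with a small `top_k` to `generate`, as shown in the blog post linked in the sources.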





