## A metaversal traveller's guide to interdimensional foraging and general plant use
### 19019545 - Oliver Wood - o.wood0120191@arts.ac.uk - NLP for the Creative Industries

Hello, wanderer. You may think you know where you are going - the map is here, your coordinates are stated. But do you know what lies there? Do you know what is needed to aid you along the way, and where to find it?

I didn't think so. That is why I watch this portal - witches before and after me have watched it for millennia. You are going to Terra - no one mentions how strange a place it is. What you need and what you encounter there are unlike anything elsewhere. There, organisms grow from the ground beneath your feet. They are vivid, ranging in size and season and form. These organisms are one of the primary communities of the planet: they are so embedded in the ecosystem there that all other species use and interact with them.

These organisms are strange allies, and even once consumed, they live on in terrestrial bodies through micro-organisms and bodily processes.

Our neighbors, humans, have spent a great deal of time documenting these organisms, their nature and properties. Luckily, we have a program to decode their documentation and make use of it ourselves, on treacherous journeys such as yours... 

Each one has specific properties: medicinal, edible, poisonous (which can also be quite useful), textile, environment-building - their uses are vast and frequently unexpected. Imagine biting into a long fruit, dug from the ground, bright orange in hue - to improve your eyesight and fuel your body? Can you imagine living your life in a box made of their thick, decaying corpses? Sleeping on beds of their shredded leaves, adorning your form in their woven threads? The way our neighbors interact with these forms is unfathomably strange. We are most grateful for their documentation - the information gathered by humans is of great use to any who move from computational space to disorganized, strange Terra. As your body is translated to the new space, so are your needs and abilities. You, too, will become dependent on these organisms.

Every wanderer is different - we each need ten species to survive the journey. I can see that your **Flower Handbook** is empty - as I said, you do not truly know where you are going.

No mind. Sit with me, child, we'll find the knowledge that you need and bind it together for your journey.

In [None]:
#Here, we're using two different text generation models
#to create a 'handbook' of new plant species with pseudoscientific information about them.
#It's fun to imagine this as an interdimensional traveller's toolbook. However, this project has other applications:
#with different datasets and models, it can also be used to identify trends of online misinformation
#surrounding biology, herbalism and relationships to food.

In [1]:
#import libraries
#(run the install cells below first if transformers is not yet installed)
import numpy as np
import random
import re
from transformers import pipeline

In [None]:
#install the transformers and datasets libraries
!pip install transformers datasets

In [None]:
#install the textgenrnn library
!pip install textgenrnn

In [None]:
#load in textgenrnn model for generating flower names
#this model is easily accessible, fairly customizable and is computationally efficient
#ideal for something like name generation

import textgenrnn 
textgen = textgenrnn.textgenrnn()

In [None]:
#load graphcore's gpt2 model - this one is computationally very efficient
#and fine-tuned on a dataset of wikipedia entries
#ideal for generating the kind of formal, pseudoscientific text we are looking for

#adapted from the code on the model page on HuggingFace - link in citations cell
#AutoModelForCausalLM is the current class for GPT-2 style models (AutoModelWithLMHead is deprecated)

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Graphcore/gpt2-wikitext-103")

model = AutoModelForCausalLM.from_pretrained("Graphcore/gpt2-wikitext-103")

### Calling on the information collected by humans

In [None]:
#load in flower names dataset

flower_names = []

#the with block closes the file once we are done reading
with open("flowers.csv") as f:
    for row in f:
        #separate the comma-delimited names on each row
        #and add them straight to one flat (1d) list for simple processing
        flower_names.extend(row.rstrip().split(','))

In [None]:
print(flower_names)

In [None]:
#training the textgen model

#reset the model to fresh weights so earlier training runs do not carry over
textgen.reset()

#using the parameters I found to be most effective for desired creative results
textgen.train_on_texts(flower_names, num_epochs=5, gen_epochs=5, train_size=0.5, dropout=0.2)

In [None]:
#generating 125 new flower names and saving to .txt file
textgen.generate_to_file('flower_names.txt', n=125, temperature=0.5)
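
The `temperature=0.5` setting above controls how adventurous the generated names are. As a general illustration of temperature scaling (a sketch, not textgenrnn's own code), the idea can be written in plain Python. The function name `apply_temperature` and the example probabilities are made up for this sketch, and it assumes every probability is greater than zero.

```python
import math

def apply_temperature(probs, temperature):
    """Rescale a probability distribution; lower temperature sharpens it."""
    # divide the log-probabilities by the temperature
    scaled = [math.log(p) / temperature for p in probs]
    # subtract the largest value before exponentiating, for numerical stability
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    # renormalise so the result sums to 1 again
    return [w / total for w in weights]

base = [0.5, 0.3, 0.2]                 # hypothetical next-character probabilities
cool = apply_temperature(base, 0.5)    # sharper: the likeliest option gains weight
warm = apply_temperature(base, 1.5)    # flatter: unlikely options gain weight
print(cool, warm)
```

Lower temperatures make the likeliest character even more likely, so names stay closer to the training data; higher temperatures spread the weight out, so names become stranger.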


In [None]:
#load in and format the generated names

gen_names = []

with open('flower_names.txt') as t:
    for line in t:
        #formatting text: swap runs of non-alphanumeric characters for a single space
        name = re.sub(r"[^a-zA-Z0-9]+", ' ', line).strip()
        #append to list, skipping any blank lines
        if name:
            gen_names.append(name)

#removing any duplicates by creating a set and converting back to list
gen_names = list(set(gen_names))

In [None]:
print(gen_names)
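
The cleaning step above can also be wrapped in a small function, which makes it easy to check on a few hand-written lines before trusting it with a whole file. This is a minimal sketch under assumptions: the name `clean_names` and the sample lines are made up for illustration, and unlike the `set` approach it keeps the names in the order they first appear.

```python
import re

def clean_names(lines):
    """Normalise raw generated lines into a de-duplicated list of names."""
    seen = set()
    cleaned = []
    for line in lines:
        # collapse every run of non-alphanumeric characters into one space
        name = re.sub(r"[^a-zA-Z0-9]+", " ", line).strip()
        # skip blank results and names already collected
        if name and name not in seen:
            seen.add(name)
            cleaned.append(name)
    return cleaned

# hypothetical sample lines, standing in for the contents of flower_names.txt
sample = ["Rose-mary\n", "  Blue  bell ", "Rose mary", "", "!!!"]
print(clean_names(sample))  # ['Rose mary', 'Blue bell']
```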

### Which 10 species of these strange creatures will aid you? Here, we'll call on them by generating input prompts.

In [None]:
#adding some contextual information for our model. 
#these can be changed according to creative directions
prompt2_options = ["flower", "blossom", "weed", "plant", "tree", "bush"]


In [None]:
#prompting pseudoscientific, educational or explanatory text
#these can also be changed if so desired
prompt3_options = ["is an edible species of plant generally found growing in",
                   "is an herbal species, favoured by foragers. It is frequently found in areas with",
                   "is a floral species which blooms in",
                   "is a seasonal wild edible found primarily in",
                   "is a medicinal flower typically found in",
                   "is a wild plant widely used for",
                   "is a wildflower well-loved for its sweet smell, typically found in",
                   "is an herbal flower found in areas with"]


In [None]:
prompts = []

#build ten prompts, each made of a generated name, a plant type and a descriptive opener
for _ in range(10):
    prompt1 = random.choice(gen_names)
    prompt2 = random.choice(prompt2_options)
    prompt3 = random.choice(prompt3_options)
    prompt = prompt1 + " " + prompt2 + " " + prompt3 + " "
    prompts.append(prompt)
prompts
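
If you would like the same ten species to answer each time you call, the prompt assembly above can be sketched as a function with an optional seed. This is an illustration only: `build_prompts`, its parameters and the sample names are hypothetical and not part of the original notebook.

```python
import random

def build_prompts(names, kinds, descriptions, count=10, seed=None):
    """Assemble `count` prompts of the form '<name> <kind> <description> '."""
    # a private Random instance keeps the seed from affecting other code
    rng = random.Random(seed)
    prompts = []
    for _ in range(count):
        parts = (rng.choice(names), rng.choice(kinds), rng.choice(descriptions))
        # the trailing space lets the model continue with a fresh word
        prompts.append(" ".join(parts) + " ")
    return prompts

# made-up names standing in for gen_names
demo = build_prompts(
    names=["Velliroot", "Ambersel"],
    kinds=["flower", "weed"],
    descriptions=["is a medicinal flower typically found in"],
    count=3,
    seed=42,
)
print(demo)
```

Passing the same `seed` reproduces the same prompts; leaving it as `None` gives a fresh draw on every call, as in the cell above.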

### Now, wanderer, we can fill our handbook...

In [None]:
#generate pseudoscientific/pseudo-educational text from prompts

generator = pipeline("text-generation", model="Graphcore/gpt2-wikitext-103")

handbook = []

#writing our generations to our handbook
#update max_length if you'd like to know more or less

#open the file once in append mode, so re-running this cell adds to the existing handbook
with open('flower_handbook.txt', 'a') as flower:
    for p in prompts:
        bud = generator(p, num_return_sequences=1, max_length=275)
        #the pipeline returns a list of dicts; str() flattens it so that
        #line breaks show up as the literal characters '\n'
        blossom = str(bud)

        #formatting text
        blossom = blossom.replace("\n", " ")
        blossom = blossom.replace("generated_text", " ")

        #removing some tricky delimiters and the headings they contain, which come through
        #because the model was trained on raw wikipedia text
        #such as '== History ==' or '\n Composition \n'

        #label the delimiters so we can locate them
        heading_mark = "="
        newline_mark = "\\n"

        #split on each delimiter and keep every other piece,
        #which drops the text sitting between a pair of delimiters
        blossom = ' '.join(y.strip() for y in blossom.split(heading_mark)[::2])
        blossom = ' '.join(x.strip() for x in blossom.split(newline_mark)[::2])

        #using regex to remove any other extraneous characters
        blossom = re.sub(r'[^.,a-zA-Z0-9 \n]', '', blossom)
        blossom = blossom.strip()

        flower.write(blossom + "\n")
        handbook.append(blossom)
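
An alternative way to tidy each generation is to work on the text itself rather than on `str(bud)`. The text-generation pipeline returns a list of dicts, so the raw text is available as `bud[0]["generated_text"]`. The sketch below is a different technique from the split-and-skip approach above, not a copy of it: it assumes headings sit on their own lines wrapped in `=` signs, and the function name `clean_generation` and the sample text are made up for illustration.

```python
import re

def clean_generation(text):
    """Drop wikitext-style heading lines and stray symbols from generated text."""
    kept = []
    for line in text.split("\n"):
        line = line.strip()
        # heading lines look like '= History =' or '= = Composition = ='
        if not line or (line.startswith("=") and line.endswith("=")):
            continue
        kept.append(line)
    joined = " ".join(kept)
    # keep letters, digits, spaces, commas and full stops only
    joined = re.sub(r"[^.,a-zA-Z0-9 ]", "", joined)
    # collapse doubled spaces left behind by the removed symbols
    return re.sub(r" {2,}", " ", joined).strip()

# hypothetical sample, shaped like bud[0]["generated_text"]
raw = "Ambersel weed is a wild plant widely used for tea . \n = = History = = \n It was first described in 1802 ( see below ) ."
print(clean_generation(raw))
```

Because whole heading lines are dropped, the body text on either side of a heading is always kept, whatever the number of `=` signs around it.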

### Your handbook is complete, friend. You can take it with you, but have a look here too...

In [None]:
handbook

### citations and credit

This notebook was made using the teaching, and some notebook content, from NLP Week 5.2.1

I also referred to the Hugging Face Transformers tutorial - https://huggingface.co/course/en/chapter0/1

The models I used are:

textgenrnn by minimaxir (Max Woolf) on GitHub
https://github.com/minimaxir/textgenrnn

GPT-2 trained on wikitext by Optimum Graphcore on HuggingFace
https://huggingface.co/Graphcore/gpt2-wikitext-103

The dataset I used is:

Flowers by BloomPop on GitHub
https://github.com/Bloompop/Flowers