# Computational Creativity Seminar, LMU, WS 2021/22

## Project: Interdimensional Monopoly

### Creators: Laura Luckert & Shaoqiu Zhang

Topic Description: Two or more players play classic Monopoly, but every time a player passes "Go", the theme of the game changes. Classic Monopoly turns into a "Star Wars" Monopoly, a "Lord of the Rings" Monopoly, etc. The names of the streets and action fields change according to the current theme. The action cards, fields and names must be generated in a meaningful way. Programming effort should not go into an elaborate GUI but into the meaningful generation of new dimensions.

#### Target:
* Each new dimension is related to a popular Netflix movie / series
* Within each dimension, places are generated from places that exist in the movie/series, and actions are generated so that they fit the respective movie/series

#### Data Sources:
* Kaggle Netflix series/movie dataset with user ratings: https://www.kaggle.com/chasewillden/netflix-shows
* https://github.com/prosecconetwork/The-NOC-List/blob/master/NOC/DATA/Veale's%20NOC%20List/Veales%20place%20elements.xlsx
* Wikipedia API via https://pypi.org/project/Wikipedia-API/0.3.5/ 
* Filtering of offensive language via https://hatebase.org and https://github.com/dariusk/wordfilter

#### How To

##### Randomization
* Random selection of topic: a movie or series from the Netflix dataset; we only consider titles with a user rating score > 90 to pick only the most popular ones
* For each topic, we retrieve the Wikipedia article; if there is none, the topic is discarded
* Filtration: if the Wikipedia article is too short, the topic is discarded
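The randomization and filtration steps above amount to a retry loop over candidate titles. The sketch below assumes the Wikipedia lookup is passed in as a function `fetch_text` that returns the article text or `None`; `pick_topic` and the `min_chars` threshold are hypothetical, since the notebook does not define what counts as "too short".

```python
import random
from typing import Callable, Optional, Sequence, Tuple


def pick_topic(
    titles: Sequence[str],
    fetch_text: Callable[[str], Optional[str]],
    min_chars: int = 2000,  # hypothetical cut-off for "too short"
    seed: Optional[int] = None,
) -> Optional[Tuple[str, str]]:
    """Return (title, article_text) for a random title that passes both filters,
    or None if every candidate is discarded."""
    rng = random.Random(seed)
    candidates = list(dict.fromkeys(titles))  # drop duplicate titles, keep order
    rng.shuffle(candidates)                    # try titles in random order
    for title in candidates:
        text = fetch_text(title)
        if text is None:
            continue  # no Wikipedia article: discard the topic
        if len(text) < min_chars:
            continue  # article too short: discard the topic
        return title, text
    return None


# Usage with a stand-in for the Wikipedia lookup
articles = {"Breaking Bad": "x" * 5000, "Stub Show": "too short"}
result = pick_topic(["Stub Show", "Breaking Bad", "Unknown"], articles.get, seed=0)
print(result[0] if result else None)  # → Breaking Bad
```

In the notebook, `fetch_text` could wrap the Wikipedia-API page lookup (`page.exists()` and `page.text`); passing it in as a function keeps the selection logic testable without network access.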

##### Plagiarism
* We use the regular Monopoly action cards in combination with the Wikipedia data to generate the action card texts

##### Generation
For places:
* Named entity recognition (NER) on the Wikipedia text to extract persons and locations

For actions:
* Keyword-to-text generation using the regular action cards as input, together with frequently used terms in the Wikipedia article (taking the named entities into account)

##### Filtration & Creation
* Fitness: define a fitness metric based on the existing places and actions and score the generated text against it
* -> Self-evaluation of the system -> keep only output above a certain threshold, otherwise trigger re-generation
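The self-evaluation step described above is a generate-and-test loop. A minimal sketch, assuming generation and scoring are passed in as functions; `generate_until_fit`, the keyword-coverage score and the 0.5 threshold are hypothetical stand-ins, since the fitness metric is not yet defined here.

```python
from typing import Callable, Optional, Set


def keyword_coverage(text: str, keywords: Set[str]) -> float:
    """Hypothetical fitness: share of topic keywords that occur in `text`."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    return len(words & keywords) / len(keywords) if keywords else 0.0


def generate_until_fit(
    generate: Callable[[], str],
    fitness: Callable[[str], float],
    threshold: float,
    max_tries: int = 10,
) -> Optional[str]:
    """Re-generate until a candidate reaches `threshold`; None if none does."""
    for _ in range(max_tries):
        candidate = generate()
        if fitness(candidate) >= threshold:
            return candidate  # keep: above the threshold
    return None  # every attempt was rejected


# Usage with canned "generated" texts instead of a real model
keywords = {"salvatore", "mystic", "falls"}
drafts = iter(["Go back three fields.", "Advance to Mystic Falls and collect 200."])
result = generate_until_fit(lambda: next(drafts),
                            lambda t: keyword_coverage(t, keywords),
                            threshold=0.5)
print(result)  # → Advance to Mystic Falls and collect 200.
```

Capping the number of attempts keeps the loop from running forever when no candidate can pass; a `None` result could then trigger the selection of a new topic.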

#### Output Structure

In [1]:
"""
{
"topic": "Topic Name",

"places": {
    "general_places": [("Name of Place", 1000), ("Name of Place 2", 2000)],
    "train_stations": [("Place 1", 3000), ("Place 2", 3000), ("Place 3", 3000), ("Place 4", 3000)],
    "jail": "Name of Place",
    "free_parking": "Name of Place"
    },
    
"actions": {
    "neutral_action": ["Go three fields back", "..."],
    "reward_action": ["Generate rewarding action for the player", "..."],
    "penalty_action": ["Generate punishing action for the player", "..."]
    }
}
"""

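The structure above mixes JSON syntax with Python tuples, so it is a description rather than valid JSON. The sketch below writes the same structure as a plain Python dict with placeholder values and adds a small shape check; `check_board` and its rules (for example, exactly four train stations, taken from the structure above) are assumptions.

```python
from typing import Any, Dict

# One dimension's board, following the output structure above (placeholder values)
board: Dict[str, Any] = {
    "topic": "Topic Name",
    "places": {
        "general_places": [("Name of Place", 1000), ("Name of Place 2", 2000)],
        "train_stations": [("Place 1", 3000), ("Place 2", 3000),
                           ("Place 3", 3000), ("Place 4", 3000)],
        "jail": "Name of Place",
        "free_parking": "Name of Place",
    },
    "actions": {
        "neutral_action": ["Go three fields back"],
        "reward_action": ["Generate rewarding action for the player"],
        "penalty_action": ["Generate punishing action for the player"],
    },
}


def check_board(board: Dict[str, Any]) -> None:
    """Raise ValueError if `board` does not have the expected shape."""
    places, actions = board["places"], board["actions"]
    for entry in places["general_places"] + places["train_stations"]:
        name, price = entry
        if not isinstance(name, str) or not isinstance(price, int):
            raise ValueError(f"bad (name, price) entry: {entry!r}")
    if len(places["train_stations"]) != 4:
        raise ValueError("expected exactly four train stations")
    for key in ("jail", "free_parking"):
        if not isinstance(places.get(key), str):
            raise ValueError(f"missing place name for {key!r}")
    for kind in ("neutral_action", "reward_action", "penalty_action"):
        if not actions.get(kind):
            raise ValueError(f"no cards of type {kind!r}")


check_board(board)  # passes silently
```

Running such a check on every generated dimension catches malformed output before it reaches the game.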

### 1. Select Topic (Dimension) via Netflix Data

In [8]:
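#!conda install -n comp_creativity pandas -y
import os
import pandas as pd

## location of the Kaggle Netflix CSV (adjust to your machine)
PATH = "~/Desktop/"
FILENAME = "netflix_data.csv"

## read the file via its full path instead of changing the working directory
full_path = os.path.join(os.path.expanduser(PATH), FILENAME)
netflix_data = pd.read_csv(full_path, sep=";")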

In [9]:
netflix_data.head(10)

| Unnamed: 0 | title | rating | ratingLevel | ratingDescription | release year | user rating score | user rating size |
|---|---|---|---|---|---|---|---|
| 0 | White Chicks | PG-13 | crude and sexual humor, language and some drug... | 80 | 2004 | 82.0 | 80 |
| 1 | Lucky Number Slevin | R | strong violence, sexual content and adult lang... | 100 | 2006 | NaN | 82 |
| 2 | Grey's Anatomy | TV-14 | Parents strongly cautioned. May be unsuitable ... | 90 | 2016 | 98.0 | 80 |
| 3 | Prison Break | TV-14 | Parents strongly cautioned. May be unsuitable ... | 90 | 2008 | 98.0 | 80 |
| 4 | How I Met Your Mother | TV-PG | Parental guidance suggested. May not be suitab... | 70 | 2014 | 94.0 | 80 |
| 5 | Supernatural | TV-14 | Parents strongly cautioned. May be unsuitable ... | 90 | 2016 | 95.0 | 80 |
| 6 | Breaking Bad | TV-MA | For mature audiences. May not be suitable for... | 110 | 2013 | 97.0 | 80 |
| 7 | The Vampire Diaries | TV-14 | Parents strongly cautioned. May be unsuitable ... | 90 | 2017 | 91.0 | 80 |
| 8 | The Walking Dead | TV-MA | For mature audiences. May not be suitable for... | 110 | 2015 | 98.0 | 80 |
| 9 | Pretty Little Liars | TV-14 | Parents strongly cautioned. May be unsuitable ... | 90 | 2016 | 96.0 | 80 |


In [10]:
## keep only titles with a user rating score above 90
netflix_subset = netflix_data[netflix_data["user rating score"] > 90]

In [11]:
## 271 titles remain
netflix_subset.shape[0]

271

In [13]:
## randomly select a topic; take the title string from the sampled row
topic = netflix_subset.sample(n=1)["title"].iloc[0]
print(topic)

The Vampire Diaries


### 2. Get Data for Topic from Wikipedia API

Installing Python packages from within Jupyter: https://jakevdp.github.io/blog/2017/12/05/installing-python-packages-from-jupyter/

In [2]:
import wikipediaapi

In [15]:
## for plain-text output
## (recent Wikipedia-API releases may additionally require a user_agent argument)
wiki_en = wikipediaapi.Wikipedia(
        language='en',
        extract_format=wikipediaapi.ExtractFormat.WIKI)

## check if a page for the topic exists
wiki_page = wiki_en.page(topic)
if wiki_page.exists():
    print("Topic is ok.")
else:
    print("Find a new topic")

Topic is ok.


In [21]:
topic_text = wiki_page.text

### 3. NER on Data to Identify Places

Code from:
* https://towardsdatascience.com/named-entity-recognition-with-nltk-and-spacy-8c4a7d88e7da
* https://medium.com/spatial-data-science/how-to-extract-locations-from-text-with-natural-language-processing-9b77035b3ea4

In [25]:
import nltk
from nltk.tokenize import word_tokenize
from nltk.tag import pos_tag

## tokenizer and POS tagger models (downloaded once)
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

def preprocess(text):
    """Tokenize `text` and return a list of (token, POS tag) pairs."""
    tokens = word_tokenize(text)
    return pos_tag(tokens)

[nltk_data] Downloading package punkt to
[nltk_data]     /Users/lauraluckert/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /Users/lauraluckert/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


In [26]:
## POS tagging (maybe useful for generating actions?)
topic_text_pos = preprocess(topic_text)

In [27]:
topic_text_pos

[('The', 'DT'),
 ('Vampire', 'NNP'),
 ('Diaries', 'NNP'),
 ('is', 'VBZ'),
 ('an', 'DT'),
 ('American', 'JJ'),
 ('supernatural', 'NN'),
 ('teen', 'JJ'),
 ('drama', 'NN'),
 ('television', 'NN'),
 ('series', 'NN'),
 ('developed', 'VBN'),
 ('by', 'IN'),
 ('Kevin', 'NNP'),
 ('Williamson', 'NNP'),
 ('and', 'CC'),
 ('Julie', 'NNP'),
 ('Plec', 'NNP'),
 (',', ','),
 ('based', 'VBN'),
 ('on', 'IN'),
 ('the', 'DT'),
 ('book', 'NN'),
 ('series', 'NN'),
 ('of', 'IN'),
 ('the', 'DT'),
 ('same', 'JJ'),
 ('name', 'NN'),
 ('written', 'VBN'),
 ('by', 'IN'),
 ('L.', 'NNP'),
 ('J.', 'NNP'),
 ('Smith', 'NNP'),
 ('.', '.'),
 ('The', 'DT'),
 ('series', 'NN'),
 ('premiered', 'VBD'),
 ('on', 'IN'),
 ('The', 'DT'),
 ('CW', 'NNP'),
 ('on', 'IN'),
 ('September', 'NNP'),
 ('10', 'CD'),
 (',', ','),
 ('2009', 'CD'),
 (',', ','),
 ('and', 'CC'),
 ('concluded', 'VBD'),
 ('on', 'IN'),
 ('March', 'NNP'),
 ('10', 'CD'),
 (',', ','),
 ('2017', 'CD'),
 (',', ','),
 ('having', 'VBG'),
 ('aired', 'VBN'),
 ('171', 'CD'),
 ...]
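Since the POS tags may help with actions, one option is to turn the tagged text into candidate keywords by counting nouns. This is one possible reading of "frequently used terms" in the plan; `frequent_nouns` and the noun-only filter are assumptions. In the notebook it would be called on `topic_text_pos`.

```python
from collections import Counter
from typing import List, Tuple

# Noun tags from the Penn Treebank tagset, which nltk.pos_tag uses by default
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def frequent_nouns(tagged: List[Tuple[str, str]], top_n: int = 10) -> List[Tuple[str, int]]:
    """Count alphabetic noun tokens (lower-cased) and return the top_n most common."""
    counts = Counter(
        token.lower()
        for token, tag in tagged
        if tag in NOUN_TAGS and token.isalpha()
    )
    return counts.most_common(top_n)


sample = [("The", "DT"), ("Vampire", "NNP"), ("Diaries", "NNP"), ("is", "VBZ"),
          ("a", "DT"), ("drama", "NN"), ("series", "NN"), (",", ","),
          ("the", "DT"), ("series", "NN"), ("premiered", "VBD")]
print(frequent_nouns(sample, top_n=2))  # → [('series', 2), ('vampire', 1)]
```

Lower-casing merges variants such as "Series" and "series", at the cost of losing the capitalization that marks proper nouns.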

In [31]:
import spacy
from spacy import displacy
from collections import Counter

## small English pipeline; install first with: python -m spacy download en_core_web_sm
nlp = spacy.load('en_core_web_sm')

In [49]:
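## multilingual NER model trained on Wikipedia; install with: python -m spacy download xx_ent_wiki_sm
nlp_w = spacy.load('xx_ent_wiki_sm')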

In [72]:
## RoBERTa-based transformer pipeline (currently not used)
#nlp_b = spacy.load('en_core_web_trf')

In [33]:
doc = nlp(topic_text)
print([(X.text, X.label_) for X in doc.ents])

[('The Vampire Diaries', 'ORG'), ('American', 'NORP'), ('Kevin Williamson', 'PERSON'), ('Julie Plec', 'PERSON'), ('L. J. Smith', 'PERSON'), ('September 10, 2009', 'DATE'), ('March 10, 2017', 'DATE'), ('171', 'CARDINAL'), ('eight seasons', 'DATE'), ('2006', 'DATE'), ('the first season', 'DATE'), ('3.60 million', 'CARDINAL'), ('Arrow', 'ORG'), ('four', 'CARDINAL'), ('April 2015', 'DATE'), ('Nina Dobrev', 'PERSON'), ('Elena Gilbert', 'PERSON'), ('its sixth season', 'DATE'), ('Dobrev', 'ORG'), ('seventh-season', 'DATE'), ('March 2016', 'DATE'), ('CW', 'ORG'), ('an eighth season', 'DATE'), ('July of that year', 'DATE'), ('the eighth season', 'DATE'), ('16', 'CARDINAL'), ('The final season', 'DATE'), ('October 21, 2016', 'DATE'), ('March 10, 2017', 'DATE'), ('The Originals', 'WORK_OF_ART'), ('first', 'ORDINAL'), ('Mystic Falls', 'ORG'), ('Virginia', 'GPE'), ('162-year-old', 'DATE'), ('Stefan Salvatore', 'PERSON'), ('Paul Wesley', 'PERSON'), ('Stefan', 'PERSON'), ('Damon Salvatore', 'PERSON'), ...]

In [60]:
entities = set()
for item in doc.ents:
    entities.add(item.label_)

print(entities)

{'ORDINAL', 'WORK_OF_ART', 'NORP', 'ORG', 'GPE', 'FAC', 'PRODUCT', 'PERSON', 'EVENT', 'DATE', 'CARDINAL', 'TIME'}


In [66]:
## print all entities tagged as geopolitical entities (GPE)
for item in doc.ents:
    if item.label_ == "GPE":
        print(item.text)

Virginia
Petrova
Amara
Brooklyn
Jeremy
Tyler
Alaric
Vicki
Jeremy
Jenna
Alaric
Alaric
Dallas
New Orleans
Vancouver
British Columbia
Covington
Georgia
Virginia
Atlanta
Covington
United States
the United Kingdom
Brazil
Australia
the United Kingdom
Brazil
Japan
New Orleans
New Orleans
New Orleans


In [50]:
doc_w = nlp_w(topic_text)

In [67]:
entities_w = set()
for item in doc_w.ents:
    entities_w.add(item.label_)

print(entities_w)

{'PER', 'MISC', 'LOC', 'ORG'}


In [71]:
for item in doc_w.ents:
    if item.label_ == "LOC":
        print(item.text)

Mystic Falls
Virginia
Brooklyn
Ripper
Silas
Ripper
Jeremy
Mystic Falls
Caroline
Dallas
New Orleans
Vancouver
British Columbia
Covington
Georgia
Mystic Falls
Virginia
Greater Atlanta
Clark Street
Covington
Trevino
San Francisco Chronicle's Tim Goodman
Meredith
Region A
B
United Kingdom
Brazil
Australia
B
United Kingdom
Brazil
Japan
Claire Holt
Rebekah
New Orleans
New Orleans
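The two outputs above overlap only partly, and both models tag character names (for example Jeremy, Alaric, Caroline) as locations. One way to turn them into board places is sketched below, assuming that a string tagged as a person anywhere is not a place and that frequency is a reasonable ranking; `place_candidates` and both rules are assumptions, not part of the notebook.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Place-like labels of en_core_web_sm (GPE, LOC, FAC) and xx_ent_wiki_sm (LOC)
PLACE_LABELS = {"GPE", "LOC", "FAC"}
PERSON_LABELS = {"PERSON", "PER"}


def place_candidates(*entity_lists: Iterable[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Rank place names from one or more NER outputs by frequency,
    skipping any string that some model also labels as a person."""
    places: Counter = Counter()
    persons = set()
    for entities in entity_lists:
        for text, label in entities:
            if label in PLACE_LABELS:
                places[text] += 1
            elif label in PERSON_LABELS:
                persons.add(text)
    return [(name, n) for name, n in places.most_common() if name not in persons]


# Toy entity lists modeled on the outputs above; in the notebook these would be
# [(e.text, e.label_) for e in doc.ents] and the same for doc_w
en_ents = [("Mystic Falls", "ORG"), ("Virginia", "GPE"), ("Jeremy", "GPE"),
           ("Jeremy", "PERSON"), ("New Orleans", "GPE")]
wiki_ents = [("Mystic Falls", "LOC"), ("Virginia", "LOC"),
             ("Mystic Falls", "LOC"), ("Mystic Falls", "LOC")]
print(place_candidates(en_ents, wiki_ents))
# → [('Mystic Falls', 3), ('Virginia', 2), ('New Orleans', 1)]
```

The person filter is deliberately blunt: it will also drop a real place that happens to share its name with a character.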


### 4. Text Generation via Keywords
https://medium.com/mlearning-ai/generating-sentences-from-keywords-using-transformers-in-nlp-e89f4de5cf6b
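Following the plan of combining the regular action cards with frequent Wikipedia terms, the keyword input for a keyword-to-text model could be assembled as below. A minimal sketch: `build_keyword_inputs`, the card keyword lists and the choice of two topic terms per card are hypothetical, and the generation model from the linked article is not shown.

```python
import random
from typing import List, Optional, Sequence


def build_keyword_inputs(
    card_keywords: Sequence[Sequence[str]],
    topic_terms: Sequence[str],
    terms_per_card: int = 2,
    seed: Optional[int] = None,
) -> List[List[str]]:
    """Extend each card's keywords with randomly drawn topic terms.

    Each returned list is meant as the input of a keyword-to-text model.
    """
    rng = random.Random(seed)
    k = min(terms_per_card, len(topic_terms))
    return [list(card) + rng.sample(list(topic_terms), k) for card in card_keywords]


# Hypothetical keywords for two regular action cards plus topic terms
cards = [["advance", "collect", "200"], ["pay", "fine"]]
terms = ["Mystic Falls", "Salvatore", "Elena"]
for keywords in build_keyword_inputs(cards, terms, seed=1):
    print(keywords)
```

Keeping the card keywords first preserves the game mechanic (move, pay, collect), while the sampled topic terms supply the dimension-specific flavor; the topic terms could come from the noun counts or NER results above.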