Notebooks from the Seminar:
Georg Trogemann, Christian Heck, Mattis Kuhn
Grundlagenseminar Material/Skulptur/Code
Compact seminar 10:00 - 17:00 | 25.01.2021 until 29.01.2021
Online @ BigBlueButton
Academy of Media Arts Cologne
Natural language generation (NLG) by means of deep neural networks is currently spreading rapidly across a wide variety of areas. Chatbots, assistance systems (Alexa/Siri) and robot journalism, among others, are increasingly being used in news portals, e-commerce, social media, health and logistics: wherever context-based, natural-language or reader-friendly texts are to be generated from structured data.
Deep writing techniques have also found their way into the arts and literature with the help of models such as ELMo (Embeddings from Language Models), BERT (Bidirectional Encoder Representations from Transformers) or GPT-2 (Generative Pre-trained Transformer 2). The latter was described in 2019 as the most powerful and dangerous AI to date, so OpenAI initially withheld the full model because of concerns about possible misuse (fake news on the fly, etc.). Since then, the full model has been released and, as it turned out, is not quite so dangerous after all.
While we deal with these developments and with how we handle the resulting artefacts in the technical seminar "Codichte - Experiments with Cognitive Systems", the focus of this basic seminar is on programming. The aim is that by the end, each student will have produced a text based on one of the neural language models mentioned above: a poem, prose, a novel, an essay, a manifesto, a shopping list, a social bot, a CV or new programming code.
In this seminar, you will learn how to generate texts on the basis of datasets (text corpora). Many datasets are freely accessible, but since our aim is to lay the foundation for your own artistic projects, we recommend bringing your own dataset, depending on your interests. This could be, for example, the digitised work of an author, your own email inbox, social media posts, a law book, your own texts, the Bible, study regulations, etc.
No programming knowledge is necessary to participate in the basic seminar. At the beginning, we will give a compact introduction to the programming language Python.
Executing the Notebooks:
Folder in KHM-Cloud:
- Here you can find some material for the seminar
Basics in Anaconda & Jupyter Notebooks:
see repository: https://github.com/experimental-informatics/hands-on-python
- scraper_wikipedia.ipynb < extract the text of specific Wikipedia articles
- scrape-load_textcorpora.ipynb < some basic examples and code snippets to scrape, load and walk through datasets
- dataset-list.md < just some resources of datasets & archives
- n_order_text_generation.ipynb < text generation from zero-order (pure random) via first-order (probability through frequency) and second-order (Markov chain based on one token) to n-order (Markov chain based on n tokens)
- markov_simple.ipynb < a simple, ready-to-use version of n-order Markov chains based on n_order_text_generation.ipynb
- interactive_text_generation.ipynb < next-word recommendation via Markov chain
- markov_basic.ipynb < word-level Markov chain
- markov_n-grams.ipynb < word-level Markov chain based on n-grams
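As a rough orientation before opening the generation notebooks listed above, the following is a minimal stand-alone sketch of the ideas they cover: zero-order (uniformly random words), first-order (words weighted by frequency), a word-level Markov chain and next-word recommendation. It is not the code from the notebooks. It uses only the Python standard library, and all function names (`tokenize`, `zero_order`, `first_order`, `build_chain`, `generate`, `suggest_next`) are hypothetical. Here `n` counts the preceding words used as context, so `n=1` corresponds to what the list calls second-order (a Markov chain based on one token).

```python
import random
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens (punctuation is dropped)."""
    return re.findall(r"\w+", text.lower())


def zero_order(tokens, length, rng):
    """Zero order: every distinct word is equally likely (pure random)."""
    vocabulary = sorted(set(tokens))
    return [rng.choice(vocabulary) for _ in range(length)]


def first_order(tokens, length, rng):
    """First order: words are drawn in proportion to how often they occur."""
    counts = Counter(tokens)
    words = list(counts)
    return rng.choices(words, weights=[counts[w] for w in words], k=length)


def build_chain(tokens, n):
    """Map every run of n consecutive tokens to a Counter of the tokens that follow it."""
    chain = defaultdict(Counter)
    for i in range(len(tokens) - n):
        chain[tuple(tokens[i:i + n])][tokens[i + n]] += 1
    return chain


def generate(tokens, n, length, rng):
    """Markov chain on n >= 1 context words: each new word depends only on the previous n."""
    chain = build_chain(tokens, n)
    start = rng.randrange(len(tokens) - n + 1)  # begin with a random n-word run from the corpus
    output = list(tokens[start:start + n])
    while len(output) < length:
        followers = chain.get(tuple(output[-n:]))
        if not followers:  # dead end: this run only occurs at the very end of the corpus
            break
        words = list(followers)
        output.append(rng.choices(words, weights=[followers[w] for w in words])[0])
    return output[:length]


def suggest_next(tokens, context, n, k=3):
    """Next-word recommendation: the k most frequent followers of the last n context words."""
    followers = build_chain(tokens, n).get(tuple(context[-n:]), Counter())
    return [word for word, _ in followers.most_common(k)]


if __name__ == "__main__":
    corpus = "the cat sat on the mat and the cat ran to the door and the dog sat on the cat"
    tokens = tokenize(corpus)
    rng = random.Random(42)  # fixed seed so runs are repeatable
    print(" ".join(zero_order(tokens, 8, rng)))
    print(" ".join(first_order(tokens, 8, rng)))
    print(" ".join(generate(tokens, n=1, length=12, rng=rng)))
    print(suggest_next(tokens, ["the"], n=1))  # → ['cat', 'mat', 'door']
```

In a sketch like this, a larger `n` tends to reproduce longer verbatim passages of the corpus, while `n=1` gives looser, more random text; with a small corpus, a large `n` quickly runs into dead ends where no continuation exists.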