# Speech to Text
The purpose of this notebook is to prototype some code that can perform the following tasks

*Part 1: Listening for Speech and Storing speech in FIFO buffer*
- Detect and open the microphone on a MacBook Pro computer
- Listen for any spoken words.
- Create a list of strings that are all the spoken words 
- Add the heard words to a FIFO buuffer of heard words
- include information about the part of speech for each word in the FIFO buffer

*Part 2: Using FIFO buffer to construct prompts for Stable Diffusion*
- Generate or use predefined prompt structures and the FIFO of spoken words to generate Stable Diffusion prompts

*Part 3: Generate Images*
- Feed generated prompts into a stable diffusion network to create images based on recent conversations that occur in the proximity of the laptop

This notebook serves as a POC for an installation I am working on that passively listens to the environment it is installed and uses words spoken in the location and machine learning to create images based on what people are talking about =)


In [21]:
import pyaudio as audio
import speech_recognition as sr
import nltk
from nltk import word_tokenize
from nltk.stem import PorterStemmer
from nltk.tag import pos_tag
from nltk import RegexpParser

##############################
only_unique_words = True
###############

# Download the required datafiles for the NLTK pos_tag function
nltk.download('averaged_perceptron_tagger')

# create a list of stopwords to ignore...
stopwords = set(['shan', 'same', "wasn't", "she's", 
                 'they', 'off', "needn't", "weren't", 
                 'as', 'some', 'and', 'from', 'other', 
                 "shouldn't", "shan't", 'to', 'does', 
                 'was', 'has', 'so', 'himself', 'do', 
                 'below', "doesn't", "that'll", 'its', 
                 'these', 'are', 'more', 'aren', 'all', 
                 'whom', 'shouldn', 'too', 'over', "you've", 
                 'him', 'o', 'his', 'be', "you'll", 'out', 
                 'against', 'most', 'if', 'hasn', 'own', 
                 's', 'what', 'theirs', 'or', "it's", 
                 'will', "don't", 'is', 'been', 'who', 
                 'yourselves', 'her', 'did', 'the', 'up', 
                 'there', 'ourselves', 'during', 'mightn', 
                 "you'd", 'further', 'very', 'those', 'for', 
                 'but', 'an', 'in', 'nor', "mightn't", 've', 
                 'both', 'until', 'isn', 'ain', "didn't", 
                 'than', 'themselves', 'myself', "couldn't", 
                 'now', 'herself', 'any', 'by', "wouldn't", 
                 'about', 'after', 'here', 'doesn', 'a', 
                 'which', 'd', 'y', 'were', 'couldn', 
                 "aren't", 'i', 'then', 'being', 'just', 
                 'our', "haven't", 't', 'wouldn', 're', 
                 "mustn't", 'while', 'with', 'only', 
                 'under', 'ma', 'again', 'can', 'ours', 
                 'through', "hadn't", 'when', 'hers', 
                 "isn't", 'of', 'few', 'my', 'had', 
                 'before', 'where', 'wasn', "should've", 
                 'she', 'your', 'haven', 'weren', 'on', 
                 'have', 'he', 'between', 'me', 'down', 
                 'should', 'mustn', 'their', 'am', 'above', 
                 'll', 'such', 'why', 'no', 'you', 'it', 
                 'because', 'into', 'm', "you're", 'that', 
                 'itself', 'not', 'hadn', "won't", 'we', 
                 'don', 'doing', 'won', 'them', 'this', 
                 "hasn't", 'how', 'at', 'needn', 'once', 
                 'having', 'yours', 'each', 'yourself', 'didn'])

print("stop_words: ", stopwords)

stemmer = PorterStemmer()

# create a queue of the last 100 words identified by the program
# recent_text_q is a list of dicts with three values: word, type, and freq
recent_text_q = []
max_q_len = 100

# activate macbook microphone stream
# create a speech recognition object
recognizer = sr.Recognizer()


stop_words:  {"wasn't", 'couldn', 'it', 'yourself', "wouldn't", 'are', 'she', 'such', 'yours', "she's", 'these', 'above', 'further', 'having', 'nor', 'me', 'the', 'on', 'did', 'yourselves', 'there', 'mightn', 'needn', 'y', 'whom', 'in', "isn't", 'am', 'those', 'your', 'hasn', 'does', 'only', 'until', 'and', 'doing', 'each', 'wouldn', 'will', "needn't", 'a', 'ourselves', 'don', 'was', "weren't", 'has', 'i', 'more', 'just', 'that', 'off', "mightn't", 'than', "doesn't", 'herself', 'because', 'which', "hasn't", 'itself', "that'll", 'they', 've', 'for', 'hers', 'why', 'mustn', 'while', 'ain', "it's", 'being', 'again', 'by', 'its', 'we', 'how', 'through', 'he', 'into', "couldn't", 'from', 'here', 'an', 'between', 'won', 'himself', 'ours', 'up', "mustn't", 'some', 'out', 'themselves', 'against', 'as', 'him', 'other', 'once', 'after', 'our', 'what', 'to', 'doesn', 'where', 'own', "you've", 'have', 'm', 'not', 'same', "won't", 'very', 'her', 'aren', 'this', "aren't", 'of', "haven't", 'then', 'o

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /Users/nathan/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!


In [22]:


################################################################
## figure out what audio device our microphone is
def getMacbookProMic():
    print(sr.Microphone.list_microphone_names())
    internal_macbook_name = "MacBook Pro Microphone"
    index = sr.Microphone.list_microphone_names().index(internal_macbook_name)

    print("should be the internal microphone: ", sr.Microphone.list_microphone_names()[index])

    # create a microphone object
    internal_mic = sr.Microphone(device_index = index)
    return internal_mic

def fifoInDict(lst, val, tag, max_len):
    # check if item needs to be removed
    if len(lst) >= max_len:
        lst.pop(0)
    # update the dictionary
    temp_dict = {'word': val, 'type': tag, 'freq': 1}
    lst.append(temp_dict)
    return lst

def fifoInLst(lst, val, max_len):
    # check if item needs to be removed
    if len(lst) >= max_len:
        lst.pop(0)
    # update the dictionary
    lst.append(val)
    return lst

def speechToText(mic, recognizer):
    """
    Input : mic = st.Microphone() object where an audio stream can be read
            recognizer = sr.Recognizer() object that takes in a audio clip and returns a list of words
    Output: words = list of words that are identified from words spoken into the microphone
    """
    # capture audio from the microphone
    with mic as source:
        # adjust for background noise to increase success rate
        recognizer.adjust_for_ambient_noise(source)
        # identify any spoken words in the audio
        print("Speech Recognizer Enabled");
        audio = recognizer.listen(source)
        print("audio exported from source")
        raw_string = ""
        try:
            raw_string = recognizer.recognize_google(audio)
        except:
            print(" .")

        print("{} of tokenized words returned from google: {}".format(type(raw_string), raw_string))
        # remove words from string

        return_str = raw_string.split()
        wn = len(return_str)
        # remove any words in the stopwords
        words = [i for i in return_str if i not in stopwords]
        print("{} words removed from stopwords".format(wn - len(words)))
        words = [i for i in return_str if i not in recent_text_q]
        print("{} words removed from priorwords".format(wn - len(words)))
        return words
    
def tagWords(words):
    """
    Use NLTK to tag a list of words and return a tuple (str, type)
    This function should be run before storing the words into memory so the program
    knows what part of speech the words belong and can construct sentences from those
    words accordingly
    """
    words_tags = pos_tag(words)
    print("words tagged: {}".format(words_tags))
    return words_tags

def addWordsToMemory(words, tags):
    """

    """
    # create a dict to place in the list of heard words
    # append spoken words to the running FIFO of all words
    for i in range(words):
        recent_text_q = fifoInDict(recent_text_q, words[i], tags[i], max_q_len)
        # if append to buffer according to type of grammer

    print("{} identified words: ".format(len(recent_text_q)),
            recent_text_q)
    return recent_text_q

# classify words and add them to FIFO buffers

# print current FIFO buffers

def getStrFromTuple(lst):
    r = ""
    for l in lst:
        r.join(l[0]).join(" ")
    return r

def getStrFromList(lst):
    print(lst)
    r = ""
    for i in range(len(lst)):
        print(i)
        r.join(lst[i]).join(" ")
        print(r)
    return r

def createWordDict(word_tags):
    word_dict = {}
    for word, pos in word_tags:
        if word in word_dict:
            word_dict[word]["freq"] += 1
        else:
            word_dict[word] = {"word": word, "type": pos, "freq": 1}

    consolidated_list = list(word_dict.values())
    return consolidated_list

# Testing the Speech to Text Portion of the Program
Okay great, now we have all the functions we need to detect the MacBook Pro microphone, open it,
listen for a while, and then extract the spoken text. We also have functions to remove stop words,
and tag the words with what part of speech they belong to. 

In [23]:
# keep listening until 50 words are heard and stored in memory
def listenForWords(min_words, max_words):
    new_words = []
    macbook_mic = getMacbookProMic()
    while len(new_words) < min_words:
        results = speechToText(macbook_mic, recognizer)
        if results is not []:
            new_words.extend(results)
            print("List of {} words includes: {}".format(len(new_words), new_words[:-5]))
        else:
            print("No words detected")
    return new_words[:max_words]

words = listenForWords(20, 100)
print('we found a total of {} words: {}'.format(len(words), words))

['LG HDR 4K', 'BlackHole 16ch', 'MacBook Pro Microphone', 'MacBook Pro Speakers', 'Microsoft Teams Audio', 'ZoomAudioDevice']
should be the internal microphone:  MacBook Pro Microphone
Speech Recognizer Enabled
audio exported from source
result2:
[]
 .
<class 'str'> of tokenized words returned from google: 
0 words removed from stopwords
0 words removed from priorwords
List of 0 words includes: []
Speech Recognizer Enabled
audio exported from source
result2:
[]
 .
<class 'str'> of tokenized words returned from google: 
0 words removed from stopwords
0 words removed from priorwords
List of 0 words includes: []
Speech Recognizer Enabled
audio exported from source
result2:
[]
 .
<class 'str'> of tokenized words returned from google: 
0 words removed from stopwords
0 words removed from priorwords
List of 0 words includes: []
Speech Recognizer Enabled
audio exported from source
result2:
[]
 .
<class 'str'> of tokenized words returned from google: 
0 words removed from stopwords
0 words remo

In [24]:
word_tags = tagWords(words)
recent_text_q = createWordDict(word_tags)
print(recent_text_q)

words tagged: [('how', 'WRB'), ('much', 'JJ'), ('the', 'DT'), ('shows', 'NNS'), ('were', 'VBD'), ('costing', 'VBG'), ('and', 'CC'), ('production', 'NN'), ('for', 'IN'), ('being', 'VBG'), ('run', 'VBN'), ('at', 'IN'), ('the', 'DT'), ('recruiting', 'NN'), ('onset', 'NN'), ('and', 'CC'), ('wasted', 'VBD'), ('many', 'JJ'), ('working', 'VBG'), ('there', 'RB'), ('thought', 'VBN'), ('were', 'VBD'), ('completely', 'RB'), ('unnecessary', 'JJ'), ('and', 'CC'), ('Beyond', 'NNP'), ('writing', 'NN'), ('and', 'CC'), ('producing', 'VBG'), ('employees', 'NNS'), ("there's", 'VBP'), ('a', 'DT'), ('lot', 'NN'), ('of', 'IN'), ('words', 'NNS'), ('that', 'WDT'), ('I', 'PRP'), ('want', 'VBP'), ('to', 'TO'), ('talk', 'VB'), ('about', 'IN'), ('and', 'CC'), ('things', 'NNS'), ('can', 'MD'), ('get', 'VB'), ('real', 'JJ'), ('ugly', 'RB'), ('and', 'CC'), ('scary', 'JJ'), ('and', 'CC'), ('manipulative', 'JJ'), ('and', 'CC'), ('a', 'DT'), ('whole', 'JJ'), ('bunch', 'NN'), ('of', 'IN'), ('things', 'NNS'), ('like', 'I

In [25]:
nouns = []
verbs = []
adjectives = []

def populateGrammarLists(recent_text_q, nouns, verbs, adjectives):
    for word in recent_text_q:
        print("word : {}".format(word))
        if word['type'].startswith("NN"):
            for i in range(word['freq']):
                nouns = fifoInLst(nouns, word['word'], max_q_len)
        elif word['type'].startswith("VB"):
            for i in range(word['freq']):
                verbs = fifoInLst(verbs, word['word'], max_q_len)
        elif word['type'].startswith("JJ"):
            for i in range(word['freq']):
                adjectives = fifoInLst(adjectives, word['word'], max_q_len)
        prompt_string = ""
    return nouns, verbs, adjectives

nouns, verbs, adjectives = populateGrammarLists(recent_text_q, nouns, verbs, adjectives)
print("{} nouns are saved: {}".format(len(nouns), nouns))
print("{} verbs are saved: {}".format(len(verbs), verbs))
print("{} adjectives are saved: {}".format(len(adjectives), adjectives))

word : {'word': 'how', 'type': 'WRB', 'freq': 1}
word : {'word': 'much', 'type': 'JJ', 'freq': 1}
word : {'word': 'the', 'type': 'DT', 'freq': 3}
word : {'word': 'shows', 'type': 'NNS', 'freq': 1}
word : {'word': 'were', 'type': 'VBD', 'freq': 2}
word : {'word': 'costing', 'type': 'VBG', 'freq': 1}
word : {'word': 'and', 'type': 'CC', 'freq': 8}
word : {'word': 'production', 'type': 'NN', 'freq': 1}
word : {'word': 'for', 'type': 'IN', 'freq': 1}
word : {'word': 'being', 'type': 'VBG', 'freq': 1}
word : {'word': 'run', 'type': 'VBN', 'freq': 1}
word : {'word': 'at', 'type': 'IN', 'freq': 1}
word : {'word': 'recruiting', 'type': 'NN', 'freq': 1}
word : {'word': 'onset', 'type': 'NN', 'freq': 1}
word : {'word': 'wasted', 'type': 'VBD', 'freq': 1}
word : {'word': 'many', 'type': 'JJ', 'freq': 1}
word : {'word': 'working', 'type': 'VBG', 'freq': 1}
word : {'word': 'there', 'type': 'RB', 'freq': 1}
word : {'word': 'thought', 'type': 'VBN', 'freq': 1}
word : {'word': 'completely', 'type': 'R

## Text to Speech using espeak in OS

In [34]:
import os
import random

# okay, now it is time to construct our string

def addSimplePhraise():
    # generate multiple phraises, then determine best one using nltk
    return "{} {} {} {} {}".format(randomNoun(), randomAdj(), randomNoun(), randomVerb(), randomNoun())

def randomNoun():
    return nouns[random.randint(0, len(nouns) - 1)]

def randomVerb():
    return verbs[random.randint(0, len(verbs) - 1)]

def randomAdj():
    return adjectives[random.randint(0, len(adjectives) - 1)]

prompt_string = "Cubist painting with abstract styling and high detail {}, {}".format(addSimplePhraise(), addSimplePhraise())
print(prompt_string)


Cubist painting with abstract styling and high detail shows scary words coming question, lot unnecessary mouth run bunch


In [35]:
gender = 'm' # as oppose to  'f'
vnum = str(random.randint(1, 5))
voice = '{}{}'.format(gender, vnum)
pitch = str(random.randint(10, 90))
command = 'espeak -s 180 -v {} -p {} "{}"\n'.format(voice, pitch, prompt_string)
print("using espeak to say the following command {}".format(prompt_string))
os.system(command)

using espeak to say the following command Cubist painting with abstract styling and high detail shows scary words coming question, lot unnecessary mouth run bunch


11

# Great! Now lets use the stable diffusion repo located in ../stable_diffusion to then use our generated prompts to create some images!!!

In [28]:
import sys

sys.path.append('../stable_diffusion')

import image_generator_from_text as img_generator

In [36]:
""" 
The main function of image_generator_from_text reads in a dictionary 
for arguments which needs the following fields:
output_width
output_height
prompts
batch_size
steps
seed
plot_output
upscale
"""
arg_dict = {
    'output_width': 512,
    'output_height': 512,
    'prompts': [prompt_string],
    'batch_size': 1,
    'steps' : 10,
    'seed' : 1290,
    'plot_output': True,
    'upscale': 2.0
}
print(arg_dict)

{'output_width': 512, 'output_height': 512, 'prompts': ['Cubist painting with abstract styling and high detail shows scary words coming question, lot unnecessary mouth run bunch'], 'batch_size': 1, 'steps': 10, 'seed': 1290, 'plot_output': True, 'upscale': 2.0}


In [37]:
img_generator.main(arg_dict)

By using this model checkpoint, you acknowledge that its usage is subject to the terms of the CreativeML Open RAIL-M license at https://raw.githubusercontent.com/CompVis/stable-diffusion/main/LICENSE
prompts are: ['Cubist painting with abstract styling and high detail shows scary words coming question, lot unnecessary mouth run bunch']
prompts are ['Cubist painting with abstract styling and high detail shows scary words coming question, lot unnecessary mouth run bunch']
image name generated: /Users/nathan/workspace/neural_art_bots/audio_transcription/output_images/2023_05_13_11_Cubist_painting_with_abstract_styling_an_s1290_0.png
output names: ['/Users/nathan/workspace/neural_art_bots/audio_transcription/output_images/2023_05_13_11_Cubist_painting_with_abstract_styling_an_s1290_0.png']
creating batch # 0 for prompt Cubist painting with abstract styling and high detail shows scary words coming question, lot unnecessary mouth run bunch


2023-05-13 11:22:29.332383: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_1' with dtype int32 and shape [1,77]
	 [[{{node Placeholder/_1}}]]
2023-05-13 11:22:34.444034: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_1' with dtype int32 and shape [1,77]
	 [[{{node Placeholder/_1}}]]
2023-05-13 11:22:55.167584: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_2' with dtype float and shape [1,77,76



2023-05-13 11:23:36.054224: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_2' with dtype float and shape [1,77,768]
	 [[{{node Placeholder/_2}}]]


 1/10 [==>...........................] - ETA: 9:53

2023-05-13 11:23:40.434562: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_2' with dtype float and shape [1,77,768]
	 [[{{node Placeholder/_2}}]]
2023-05-13 11:23:44.723726: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_2' with dtype float and shape [1,77,768]
	 [[{{node Placeholder/_2}}]]


 2/10 [=====>........................] - ETA: 1:08

2023-05-13 11:23:49.029683: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_2' with dtype float and shape [1,77,768]
	 [[{{node Placeholder/_2}}]]
2023-05-13 11:23:53.461161: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_2' with dtype float and shape [1,77,768]
	 [[{{node Placeholder/_2}}]]




2023-05-13 11:23:57.989476: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_2' with dtype float and shape [1,77,768]
	 [[{{node Placeholder/_2}}]]
2023-05-13 11:24:02.511233: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_2' with dtype float and shape [1,77,768]
	 [[{{node Placeholder/_2}}]]




2023-05-13 11:24:07.070765: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_2' with dtype float and shape [1,77,768]
	 [[{{node Placeholder/_2}}]]
2023-05-13 11:24:12.736263: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_2' with dtype float and shape [1,77,768]
	 [[{{node Placeholder/_2}}]]




2023-05-13 11:24:22.791731: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_2' with dtype float and shape [1,77,768]
	 [[{{node Placeholder/_2}}]]
2023-05-13 11:24:32.465861: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_2' with dtype float and shape [1,77,768]
	 [[{{node Placeholder/_2}}]]




2023-05-13 11:24:38.338400: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_2' with dtype float and shape [1,77,768]
	 [[{{node Placeholder/_2}}]]
2023-05-13 11:24:43.112519: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_2' with dtype float and shape [1,77,768]
	 [[{{node Placeholder/_2}}]]




2023-05-13 11:24:47.689040: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_2' with dtype float and shape [1,77,768]
	 [[{{node Placeholder/_2}}]]
2023-05-13 11:24:52.811190: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_2' with dtype float and shape [1,77,768]
	 [[{{node Placeholder/_2}}]]




2023-05-13 11:24:58.508367: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_2' with dtype float and shape [1,77,768]
	 [[{{node Placeholder/_2}}]]
2023-05-13 11:25:04.631802: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_2' with dtype float and shape [1,77,768]
	 [[{{node Placeholder/_2}}]]




2023-05-13 11:25:10.836088: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_2' with dtype float and shape [1,77,768]
	 [[{{node Placeholder/_2}}]]
2023-05-13 11:25:17.060867: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_2' with dtype float and shape [1,77,768]
	 [[{{node Placeholder/_2}}]]




2023-05-13 11:25:26.332447: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_0' with dtype float and shape [1,64,64,4]
	 [[{{node Placeholder/_0}}]]


save_image() with title of /Users/nathan/workspace/neural_art_bots/audio_transcription/output_images/2023_05_13_11_Cubist_painting_with_abstract_styling_an_s1290_0.png and image shape of (1, 512, 512, 3)




loading image at path: /Users/nathan/workspace/neural_art_bots/audio_transcription/output_images/2023_05_13_11_Cubist_painting_with_abstract_styling_an_s1290_0.png
img: <PIL.PngImagePlugin.PngImageFile image mode=RGB size=512x512 at 0x1463AE110>
img type: <class 'PIL.PngImagePlugin.PngImageFile'>
generating upscale model with input shape of (512, 512) and output shape of (1024, 1024)
Image saved to: /Users/nathan/workspace/neural_art_bots/audio_transcription/output_images/2023_05_13_11_Cubist_painting_with_abstract_styling_an_s1290_0_upscaled.png


TypeError: plot_images() missing 1 required positional argument: 'titles'

## Great, that works, but lets try to use a different method to generate image prompts that are a bit less random... 
Time to call on our old friend GPT =)

In [41]:
import openai
import json
import requests

assert os.environ.get('OPENAI_API_KEY') is not None, 'ERROR, your environment variable OPENAI_API_KEY is not set properly'

def generatePromptWithGPT(word_memory):
    max_tokens = 200
    # Create a dictionary to store our headers
    prompt = "Can you please use the following words to create an interesting prompt for stable diffusion image generation? {}".format(word_memory)

    headers = {
        'Content-Type': 'application/json',
        'Authorization': 'Bearer ' + os.environ.get('OPENAI_API_KEY')
    }
    # create the data dictionary for our API call
    data = {
        'model': 'text-babbage-001',
        'temperature': 1.2,
        'n': 1,
        'max_tokens': max_tokens,
        'prompt':  prompt  # TODO, what does the 'role' potion of this dict do?
        # 'stop' : ';'
    }
    # print("Our data dict is as follows: ", data)
    # print("Our header dict is as follows: {}".format(headers))
    # Pose the debate topic question to our first debater
    # print("{} is generating a response to the prompt of: {}".format(
    #    debater_params['name'], topic))
    response = requests.post(
        'https://api.openai.com/v1/completions', headers=headers, json=data).json()
    return response['choices'][0]['text'].replace("/n", "")

gpt_prompt = generatePromptWithGPT(words)
print(gpt_prompt)



The show was costing and production was wasting many working hours thinking they were completely unnecessary.


In [42]:
arg_dict['prompts'] = [gpt_prompt]
img_generator.main(arg_dict)

By using this model checkpoint, you acknowledge that its usage is subject to the terms of the CreativeML Open RAIL-M license at https://raw.githubusercontent.com/CompVis/stable-diffusion/main/LICENSE
prompts are: ['\n\nThe show was costing and production was wasting many working hours thinking they were completely unnecessary.']
prompts are ['\n\nThe show was costing and production was wasting many working hours thinking they were completely unnecessary.']
image name generated: /Users/nathan/workspace/neural_art_bots/audio_transcription/output_images/2023_05_13_12_

The_show_was_costing_and_production_wa_s1290_0.png
output names: ['/Users/nathan/workspace/neural_art_bots/audio_transcription/output_images/2023_05_13_12_\n\nThe_show_was_costing_and_production_wa_s1290_0.png']
creating batch # 0 for prompt 

The show was costing and production was wasting many working hours thinking they were completely unnecessary.


2023-05-13 12:23:32.957600: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_1' with dtype int32 and shape [1,77]
	 [[{{node Placeholder/_1}}]]
2023-05-13 12:23:39.370329: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_1' with dtype int32 and shape [1,77]
	 [[{{node Placeholder/_1}}]]
2023-05-13 12:24:03.923567: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_2' with dtype float and shape [1,77,76

 1/10 [==>...........................] - ETA: 12:21

2023-05-13 12:25:01.840743: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_2' with dtype float and shape [1,77,768]
	 [[{{node Placeholder/_2}}]]
2023-05-13 12:25:06.904259: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_2' with dtype float and shape [1,77,768]
	 [[{{node Placeholder/_2}}]]


 2/10 [=====>........................] - ETA: 1:22 

2023-05-13 12:25:12.199290: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_2' with dtype float and shape [1,77,768]
	 [[{{node Placeholder/_2}}]]
2023-05-13 12:25:18.100288: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_2' with dtype float and shape [1,77,768]
	 [[{{node Placeholder/_2}}]]




2023-05-13 12:25:29.073458: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_2' with dtype float and shape [1,77,768]
	 [[{{node Placeholder/_2}}]]
2023-05-13 12:25:42.660113: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_2' with dtype float and shape [1,77,768]
	 [[{{node Placeholder/_2}}]]




2023-05-13 12:25:49.321457: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_2' with dtype float and shape [1,77,768]
	 [[{{node Placeholder/_2}}]]
2023-05-13 12:25:55.237765: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_2' with dtype float and shape [1,77,768]
	 [[{{node Placeholder/_2}}]]




2023-05-13 12:26:02.029768: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_2' with dtype float and shape [1,77,768]
	 [[{{node Placeholder/_2}}]]
2023-05-13 12:26:14.576339: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_2' with dtype float and shape [1,77,768]
	 [[{{node Placeholder/_2}}]]




2023-05-13 12:26:37.790093: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_2' with dtype float and shape [1,77,768]
	 [[{{node Placeholder/_2}}]]
2023-05-13 12:26:59.652590: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_2' with dtype float and shape [1,77,768]
	 [[{{node Placeholder/_2}}]]




2023-05-13 12:27:17.544410: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_2' with dtype float and shape [1,77,768]
	 [[{{node Placeholder/_2}}]]
2023-05-13 12:27:26.784996: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_2' with dtype float and shape [1,77,768]
	 [[{{node Placeholder/_2}}]]




2023-05-13 12:27:35.101590: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_2' with dtype float and shape [1,77,768]
	 [[{{node Placeholder/_2}}]]
2023-05-13 12:27:50.063193: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_2' with dtype float and shape [1,77,768]
	 [[{{node Placeholder/_2}}]]




2023-05-13 12:28:01.699839: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_2' with dtype float and shape [1,77,768]
	 [[{{node Placeholder/_2}}]]
2023-05-13 12:28:09.770420: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_2' with dtype float and shape [1,77,768]
	 [[{{node Placeholder/_2}}]]




2023-05-13 12:28:20.673954: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_0' with dtype float and shape [1,64,64,4]
	 [[{{node Placeholder/_0}}]]


save_image() with title of /Users/nathan/workspace/neural_art_bots/audio_transcription/output_images/2023_05_13_12_

The_show_was_costing_and_production_wa_s1290_0.png and image shape of (1, 512, 512, 3)




loading image at path: /Users/nathan/workspace/neural_art_bots/audio_transcription/output_images/2023_05_13_12_

The_show_was_costing_and_production_wa_s1290_0.png
img: <PIL.PngImagePlugin.PngImageFile image mode=RGB size=512x512 at 0x147B97880>
img type: <class 'PIL.PngImagePlugin.PngImageFile'>
generating upscale model with input shape of (512, 512) and output shape of (1024, 1024)
Image saved to: /Users/nathan/workspace/neural_art_bots/audio_transcription/output_images/2023_05_13_12_

The_show_was_costing_and_production_wa_s1290_0_upscaled.png


TypeError: plot_images() missing 1 required positional argument: 'titles'