## Installing the SpeechRecognition API

In [3]:
pip install SpeechRecognition

Collecting SpeechRecognition
  Downloading speechrecognition-3.14.5-py3-none-any.whl.metadata (30 kB)
Downloading speechrecognition-3.14.5-py3-none-any.whl (32.9 MB)

### Debugging the audio sample rate to use

In [49]:
import speech_recognition as sr

r = sr.Recognizer()

with sr.Microphone(device_index=18, sample_rate=48000) as source:
    audio = r.record(source, duration=5)

with open("debug.wav", "wb") as f:
    f.write(audio.get_wav_data())
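
One way to check what the cell above actually captured is to read the header of the saved file with the standard-library `wave` module. This is a minimal sketch, not part of the original notebook: `describe_wav` is a hypothetical helper name, and it assumes `debug.wav` is an uncompressed PCM WAV file, which is what `get_wav_data()` writes.

```python
import wave


def describe_wav(path):
    """Return (sample_rate_hz, channels, sample_width_bytes, duration_s) for a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        channels = wf.getnchannels()
        width = wf.getsampwidth()
        duration = wf.getnframes() / float(rate)
    return rate, channels, width, duration


# Usage (assumes the recording cell above has already written debug.wav):
# print(describe_wav("debug.wav"))
```

If the reported rate differs from the `sample_rate` passed to `sr.Microphone`, or the duration is far from 5 seconds, the device is probably not running at the requested rate.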


In [51]:
import speech_recognition as sr
r = sr.Recognizer()

with sr.Microphone(device_index=18) as source:
    print("Say something!")
    audio = r.listen(source)
try:
    print(" thinks you said " + r.recognize_google(audio))
except sr.UnknownValueError:
    print("could not understand audio")
except sr.RequestError as e:
    print("Could not request results from Google Speech Recognition service; {0}".format(e))


Say something!
 thinks you said it was absolutely magical she loved how she got to see new toys getting made she loved exploring new parts of the factory this was just the beginning of a long


### Looping so it doesn't stop

In [13]:
import speech_recognition as sr
import keyboard 
r = sr.Recognizer()

with sr.Microphone(device_index=18) as source:
    print("Listening continuously... Press ctrl to stop.")

    while True:
        if keyboard.is_pressed('ctrl'):
            print("\nStopping transcription.")
            break

        audio = r.record(source, duration=5)

        try:
            text = r.recognize_google(audio)
            print(">>", text)
        except sr.UnknownValueError:
            pass
        except sr.RequestError as e:
            print("API error:", e)


Listening continuously... Press ctrl to stop.
>> Batman tomorrow
>> to the Discovery Park let's talk about the tool where are we when it comes to
>> play some people might think what you mean we're at Chachi bt5 but like from your perspective
>> where do you think we are so I could I could I could
>> he's talking about you know digital twins Earth observation models
>> one of the things that have resonated with me is this concept
>> alphago
>> go Champion early on in the game and its 37th move
>> right now it's at a place where it answers questions
>> it takes an average of reality and then gives you
>> move 37 was this view into how AI can be creative

Stopping transcription.
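
Note that `keyboard.is_pressed` is only checked between recordings, so the key has to be held down when a 5-second chunk finishes. The loop itself can be separated from the hardware, which makes it easier to reason about and to test. The sketch below is an assumption-laden outline, not the notebook's code: `transcribe_loop` and its callable parameters (`record_chunk`, `recognize`, `should_stop`) are hypothetical names.

```python
def transcribe_loop(record_chunk, recognize, should_stop,
                    skip_errors=(ValueError,), on_text=print):
    """Repeat record -> recognize until should_stop() returns True.

    record_chunk, recognize and should_stop are caller-supplied callables
    (hypothetical names). Exceptions listed in skip_errors are treated as
    "nothing intelligible in this chunk" and the chunk is skipped.
    """
    transcripts = []
    while not should_stop():
        audio = record_chunk()
        try:
            text = recognize(audio)
        except skip_errors:
            continue  # unintelligible chunk: move on to the next one
        transcripts.append(text)
        on_text(text)
    return transcripts
```

With the objects from the cell above, this could be wired up as `transcribe_loop(lambda: r.record(source, duration=5), r.recognize_google, lambda: keyboard.is_pressed('ctrl'), skip_errors=(sr.UnknownValueError,))`, called inside the `with sr.Microphone(...)` block.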


#### Creating a context machine that continuously updates the context of what's happening in the video

In [1]:
pip install transformers torch sentencepiece

Note: you may need to restart the kernel to use updated packages.


### Download the model locally

In [1]:
from transformers import pipeline
print("Transformers working.")

  from .autonotebook import tqdm as notebook_tqdm


Transformers working.


In [5]:
from transformers import pipeline
summarizer = pipeline("summarization", model="Falconsai/text_summarization")
print("Model loaded.")



Model loaded.


In [6]:
from transformers import pipeline

summarizer = pipeline(
    "summarization",
    model="Falconsai/text_summarization"
)

text = "This is a long text that needs to be summarized. It contains multiple ideas about machine learning and context modeling."

result = summarizer(text, max_length=50, min_length=10, do_sample=False)

print(result[0]["summary_text"])

Your max_length is set to 50, but your input_length is only 27. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=13)


This text needs to be summarized . It contains multiple ideas about machine learning and context modeling.
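
The warning above appears because `max_length` is larger than the input, and it suggests a value of roughly half the input length. A possible way to pick the bounds per call is sketched below. `summary_lengths` is a hypothetical helper, and the `ratio`, `floor` and `cap` defaults are assumptions rather than values recommended by the library.

```python
def summary_lengths(input_tokens, ratio=0.5, floor=5, cap=120):
    """Pick (min_length, max_length) for a summarizer call from the input length.

    max_length is about `ratio` of the input, clamped to [floor, cap];
    min_length is about a third of max_length and always below it.
    """
    max_length = max(floor, min(cap, int(input_tokens * ratio)))
    min_length = max(1, min(max_length - 1, max_length // 3))
    return min_length, max_length


# For the 27-token input above this gives max_length=13,
# the same value the warning suggests.
print(summary_lengths(27))  # (4, 13)
```

The token count could be taken from the pipeline's own tokenizer, for example `len(summarizer.tokenizer(text)["input_ids"])`, and the result passed as `min_length` and `max_length` to `summarizer(...)`.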


In [21]:
import speech_recognition as sr
from transformers import pipeline
import keyboard 

print("loading summarizer...")
summarizer = pipeline(
    "summarization",
    model="Falconsai/text_summarization"
)
r = sr.Recognizer()
context_summary = ""

def update_context(old_summary, new_chunk):

    if old_summary.strip() == "":
        combined = new_chunk
    else:
        combined = f"""
        Previous summary:
        {old_summary}

        New transcript:
        {new_chunk}

        Update the summary to reflect new information.
        """

    result = summarizer(
        combined,
        max_length=120,
        min_length=30,
        do_sample=False
    )[0]["summary_text"]

    return result

with sr.Microphone(device_index=18) as source:
    print("Listening continuously... Press ctrl to stop.")

    while True:
        if keyboard.is_pressed('ctrl'):
            print("\nStopping transcription.")
            break

        try:
            # Record once per iteration; a second record() call would discard the first 10 seconds.
            audio = r.record(source, duration=10)
            chunk = r.recognize_google(audio)
            
            print("\nNew Chunk:")
            print(chunk)

            context_summary = update_context(context_summary, chunk)

            print("\nUpdated Context:")
            print(context_summary)
            print("=" * 70)
        except sr.UnknownValueError:
            pass
        except sr.RequestError as e:
            print("API error:", e)


loading summarizer...
Listening continuously... Press ctrl to stop.


Your max_length is set to 120, but your input_length is only 14. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=7)



New Chunk:
and it teaches you lessons that you will not forget

Updated Context:
and it teaches you lessons that you will not forget . it is a lesson that you won't forget if you forget it . and it will teach you lessons .


Your max_length is set to 120, but your input_length is only 71. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=35)



New Chunk:
to see Humanity in its most glorious moments War teaches you about sorrow

Updated Context:
Previous summary: and it teaches you lessons that you will not forget . it is a lesson that you won't forget if you forget it . and it will teach you lessons .


Your max_length is set to 120, but your input_length is only 70. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=35)



New Chunk:
and then that fragility War teaches you about death

Updated Context:
Previous Summary: Previous summary: and it teaches you lessons that you will not forget . it is a lesson that you won't forget if you forget it . and it will teach you lessons .

Stopping transcription.
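
In the run above, the second and third updates return text starting with "Previous summary:", and the new chunks never make it into the context. The summarization model appears to treat the instruction-style wrapper as ordinary text to compress rather than as a command. One alternative is to keep a bounded window of recent chunks and summarize their plain concatenation. The sketch below shows only the windowing part; `RollingContext` and `max_chunks` are hypothetical names, and the window size is an assumption.

```python
from collections import deque


class RollingContext:
    """Keep the most recent transcript chunks as plain text."""

    def __init__(self, max_chunks=6):
        # deque with maxlen drops the oldest chunk automatically
        self._chunks = deque(maxlen=max_chunks)

    def add(self, chunk):
        chunk = chunk.strip()
        if chunk:  # ignore empty recognitions
            self._chunks.append(chunk)

    def text(self):
        """Return the window as one string, oldest chunk first."""
        return " ".join(self._chunks)


ctx = RollingContext(max_chunks=2)
for piece in ["first chunk", "second chunk", "third chunk"]:
    ctx.add(piece)
print(ctx.text())  # second chunk third chunk
```

The string returned by `text()` could then be passed to the summarizer directly, with no labels or instructions around it.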


### Detecting emotions from the updated context

In [34]:
from transformers import pipeline

emotion_classifier = pipeline(
    "text-classification",
    model="j-hartmann/emotion-english-distilroberta-base",
    return_all_scores=True
)

result = emotion_classifier("I lost her but death has a meaning and he death will not go in va")
result[0]

[{'label': 'anger', 'score': 0.004384801723062992},
 {'label': 'disgust', 'score': 0.004861755762249231},
 {'label': 'fear', 'score': 0.003753639990463853},
 {'label': 'joy', 'score': 0.0015306900022551417},
 {'label': 'neutral', 'score': 0.029505236074328423},
 {'label': 'sadness', 'score': 0.9535338282585144},
 {'label': 'surprise', 'score': 0.0024300655350089073}]
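
Given a score list in the shape shown above, the dominant emotion is the entry with the highest score. The sketch below also falls back to a default label when no score is confident enough. `top_emotion`, `min_score` and `fallback` are hypothetical names, and the 0.5 threshold is an assumption.

```python
def top_emotion(scores, min_score=0.5, fallback="neutral"):
    """Return the highest-scoring label, or `fallback` if it scores below min_score.

    `scores` is a list of {"label": str, "score": float} dicts.
    """
    best = max(scores, key=lambda s: s["score"])
    return best["label"] if best["score"] >= min_score else fallback


# Two of the scores from the output above, rounded.
scores = [
    {"label": "neutral", "score": 0.0295},
    {"label": "sadness", "score": 0.9535},
]
print(top_emotion(scores))  # sadness
```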

In [38]:
import speech_recognition as sr
from transformers import pipeline
import keyboard

print("loading summarizer...")
summarizer = pipeline(
    "summarization",
    model="Falconsai/text_summarization"
)

print("loading emotion model...")
emotion_classifier = pipeline(
    "text-classification",
    model="j-hartmann/emotion-english-distilroberta-base",
    top_k=None
)

r = sr.Recognizer()
context_summary = ""

def update_context(old_summary, new_chunk):
    combined = (old_summary + " " + new_chunk).strip()

    result = summarizer(
        "summarize: " + combined,
        max_length=80,
        min_length=20,
        truncation=True,   # truncate inputs that exceed the model's maximum input length
        do_sample=False
    )[0]["summary_text"]

    return result

def detect_emotion(text):
    scores = emotion_classifier(text)[0]
    scores = sorted(scores, key=lambda x: x["score"], reverse=True)
    return scores[0]["label"]

with sr.Microphone(device_index=18, sample_rate=48000) as source:
    print("Listening continuously... Press 's' to stop.")

    while True:

        if keyboard.is_pressed('s'):
            print("\nStopping transcription.")
            break

        try:
            audio = r.record(source, duration=10)
            chunk = r.recognize_google(audio)

            print("\nNew Chunk:")
            print(chunk)

            emotion_label = detect_emotion(chunk)
            print("detected emotion:", emotion_label)

            context_summary = update_context(context_summary, chunk)

            print("\nUpdated Context:")
            print(context_summary)
            print("=" * 70)

        except sr.UnknownValueError:
            pass
        except sr.RequestError as e:
            print("API error:", e)

loading summarizer...
loading emotion model...
Listening continuously... Press 's' to stop.

New Chunk:
is a daily set of meditations and exercises for you but you can also select your own if you've ever wanted to try meditating like I did headspace makes it easy
detected emotion: neutral


Your max_length is set to 80, but your input_length is only 45. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=22)



Updated Context:
you can also select your own if you've ever wanted to try meditating like I did headspace .


Your max_length is set to 80, but your input_length is only 57. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=28)



New Chunk:
all you have to do is sign up using the link in my description or scan the QR code currently on screen lost doesn't
detected emotion: neutral

Updated Context:
you can also select your own if you've ever wanted to try meditating like I did headspace . all you have to do is sign up using the link in my description .


Your max_length is set to 80, but your input_length is only 72. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=36)



New Chunk:
perhaps even billions of people in the world like this one this idea would be repeated by Albert Camus who famously made
detected emotion: neutral

Updated Context:
you can also select your own if you've ever wanted to try meditating like I did headspace . all you have to do is sign up using the link in my description .

New Chunk:
philosophy when we think of the desire to not exist we usually imagine something akin to this kind of thinking the idea that life is meaningless and insufferable is a popular one
detected emotion: anger

Updated Context:
you can also select your own if you've ever wanted to try meditating like I did headspace . all you have to do is sign up using the link in my description .


Your max_length is set to 80, but your input_length is only 71. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=35)



New Chunk:
but to experience non experience the desire is not a logical set of beliefs about the characteristics of life but the comp
detected emotion: fear

Updated Context:
you can also select your own if you've ever wanted to try meditating like I did headspace .


Your max_length is set to 80, but your input_length is only 63. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=31)



New Chunk:
this is entirely unbearable to not want to die but also wanting to simply not exist implies a contradictory hope that one might continue to go on
detected emotion: sadness

Updated Context:
you can also select your own if you've ever wanted to try meditating like I did headspace . this is entirely unbearable to not want to die but also wanting to simply not exist implies a contradictory hope that one might continue to go on .

New Chunk:
when a person has two contradictory beliefs but can only embody one of them they will pick the one that is easier to exhibit in this case it's easy to suggest that the people who
detected emotion: neutral

Updated Context:
you can also select your own if you've ever wanted to try meditating like I did headspace . this is entirely unbearable to not want to die but also wanting to simply not exist implies a contradictory hope that one might continue to go on .

New Chunk:
it takes a certain dedication to accept death of course there is someth
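
The per-chunk labels above change quickly (neutral, anger, fear, sadness, neutral), since each 10-second chunk is classified on its own. If a steadier signal is wanted, one option is a majority vote over the last few labels. This is a sketch under assumptions: `EmotionSmoother` and the window size of 3 are hypothetical, and ties are broken in favor of the most recent label.

```python
from collections import Counter, deque


class EmotionSmoother:
    """Majority vote over the most recent emotion labels."""

    def __init__(self, window=3):
        self._recent = deque(maxlen=window)

    def update(self, label):
        """Add a label and return the smoothed label for the current window."""
        self._recent.append(label)
        counts = Counter(self._recent)
        top = max(counts.values())
        # On a tie, prefer the most recently seen label.
        for candidate in reversed(self._recent):
            if counts[candidate] == top:
                return candidate


smoother = EmotionSmoother(window=3)
for label in ["neutral", "neutral", "anger"]:
    smoothed = smoother.update(label)
print(smoothed)  # neutral
```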

### Making a reaction bot

In [2]:
import speech_recognition as sr
from transformers import pipeline
import keyboard

print("loading summarizer...")
summarizer = pipeline(
    "summarization",
    model="Falconsai/text_summarization"
)

print("loading emotion model...")
emotion_classifier = pipeline(
    "text-classification",
    model="j-hartmann/emotion-english-distilroberta-base",
    top_k=None
)

print("loading reaction model...")
reaction_model = pipeline(
    "text-generation",
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0"
)

r = sr.Recognizer()
context_summary = ""

def update_context(old_summary, new_chunk):
    combined = (old_summary + " " + new_chunk).strip()

    result = summarizer(
        "summarize: " + combined,
        max_length=80,
        min_length=20,
        truncation=True,   # truncate inputs that exceed the model's maximum input length
        do_sample=False
    )[0]["summary_text"]

    return result

def detect_emotion(text):
    scores = emotion_classifier(text)[0]
    scores = sorted(scores, key=lambda x: x["score"], reverse=True)
    return scores[0]["label"]

def generate_reaction(context, emotion):

    prompt = f"""<|system|>
You are a casual human reacting to a video with your friend.
Respond in ONE natural sentence with ONE emoji at the end.
Do not repeat the summary.
<|user|>
Detected emotion: {emotion}

Summary: {context}

React.
<|assistant|>"""

    response = reaction_model(
        prompt,
        max_new_tokens=40,
        do_sample=True,
        temperature=0.8,
        pad_token_id=reaction_model.tokenizer.eos_token_id
    )[0]["generated_text"]

    reply = response.split("<|assistant|>")[-1].strip()
    return reply

with sr.Microphone(device_index=18, sample_rate=48000) as source:
    print("Listening continuously... Press 'ctrl' to stop.")

    while True:

        if keyboard.is_pressed('ctrl'):
            print("\nStopping transcription.")
            break

        try:
            audio = r.record(source, duration=10)
            chunk = r.recognize_google(audio)

            print("\nNew Chunk:")
            print(chunk)

            emotion_label = detect_emotion(chunk)
            print("detected emotion:", emotion_label)

            context_summary = update_context(context_summary, chunk)

            print("\nUpdated Context:")
            print(context_summary)

            reaction = generate_reaction(context_summary, emotion_label)
            
            print("\nAI Reaction:")
            print(reaction)
            print("=" * 70)

        except sr.UnknownValueError:
            pass
        except sr.RequestError as e:
            print("API error:", e)

loading summarizer...




loading emotion model...
loading reaction model...
Listening continuously... Press 'ctrl' to stop.

New Chunk:
that these tapes all these tapes are pre-recorded and honestly that shouldn't be a surprise phone guys calls are recordings in every game


Your max_length is set to 80, but your input_length is only 38. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=19)


detected emotion: surprise

Updated Context:
all these tapes are pre-recorded and honestly that shouldn't be a surprise phone guys calls are recordings in every game .

AI Reaction:
I'm sorry, but I don't have an emotional response to a video. Please provide more context or details to understand my emotion.

New Chunk:
I meant for the night security guard from months ago need more proof night sixes call references a birthday party happening as the last event before the building


Your max_length is set to 80, but your input_length is only 67. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=33)


detected emotion: neutral

Updated Context:
all these tapes are pre-recorded and honestly that shouldn't be a surprise phone guys call are recordings in every game . I meant for the night security guard from months ago need more proof night sixes call references a birthday party .

AI Reaction:
Hey, I don't really see your point here. Pre-recorded tapes are definitely a thing in sports, and it's not a surprise that they can be used as

New Chunk:
play Colton telling us that one of the new toy animatronics spit someone during the day why else would the new robots be scrapped will the old ones get kept for possible re
detected emotion: neutral

Updated Context:
all these tapes are pre-recorded and honestly that shouldn't be a surprise phone guys call are recordings in every game . I meant for the night security guard from months ago need more proof night sixes call references a birthday party .

AI Reaction:
Emoticon: 😎

Response: Of course, your friend! I didn't realize the whole ser
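
The reactions above are often cut off at the 40-token limit or prefixed with labels such as "Emoticon:" and "Response:". A small post-processing step can keep only the first complete sentence. The sketch below is one possible cleanup, not part of the original notebook: `clean_reaction` is a hypothetical helper, the label list is an assumption based on the output above, and a trailing emoji after the sentence is dropped.

```python
import re

# Labels the model sometimes puts in front of its reply (assumed list).
_LABEL = re.compile(r"^\s*(?:response|reaction|emoticon|emoji)\s*:\s*", re.IGNORECASE)
_SENTENCE_END = re.compile(r"[.!?]")


def clean_reaction(text, fallback=""):
    """Return the first complete sentence of a generated reply.

    Lines without sentence-ending punctuation (for example a lone emoji or a
    reply cut off mid-sentence) are skipped; `fallback` is returned if
    nothing usable is found.
    """
    for line in text.splitlines():
        line = _LABEL.sub("", line).strip()
        match = _SENTENCE_END.search(line)
        if match:
            return line[: match.end()]
    return fallback


print(clean_reaction("Response: Of course, your friend! I didn't realize the whole ser"))
# Of course, your friend!
```

In the loop above it could be applied as `reaction = clean_reaction(generate_reaction(context_summary, emotion_label))`.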