## Generate Jokes from a Rugby Match using DeepSeek R1 - Llama 8B

#### By Mauricio Toro, Head of Data Science

### Install libraries and download models

In [1]:
!pip install --quiet --upgrade pip

In [2]:
!pip --quiet install ollama TTS pydub

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
dataprep 0.4.5 requires bokeh<3,>=2, but you have bokeh 1.4.0 which is incompatible.
dataprep 0.4.5 requires pydantic<2.0,>=1.6, but you have pydantic 2.10.6 which is incompatible.
moviepy 2.1.2 requires numpy>=1.25.0, but you have numpy 1.22.0 which is incompatible.
pywavelets 1.6.0 requires numpy<3,>=1.22.4, but you have numpy 1.22.0 which is incompatible.
statsmodels 0.14.2 requires numpy>=1.22.3, but you have numpy 1.22.0 which is incompatible.
visions 0.7.6 requires numpy>=1.23.2, but you have numpy 1.22.0 which is incompatible.
visions 0.7.6 requires pandas>=2.0.0, but you have pandas 1.5.3 which is incompatible.

In [3]:
!ollama pull deepseek-r1:8b

pulling manifest 
pulling 6340dc3229b0... 100% ▕████████████████▏ 4.9 GB                         
pulling 369ca498f347... 100% ▕████████████████▏  387 B                         
pulling 6e4c38e1172f... 100% ▕████████████████▏ 1.1 KB                         
pulling f4d24e9138dd... 100% ▕████████████████▏  148 B                         
pulling 0cb05c6e4e02... 100% ▕████████████████▏  487 B                         
verifying sha256 digest 
writing manifest 
success [?25h


In [4]:
!ollama show deepseek-r1:8b

  Model
    architecture        llama     
    parameters          8.0B      
    context length      131072    
    embedding length    4096      
    quantization        Q4_K_M    

  Parameters
    stop    "<｜begin▁of▁sentence｜>"    
    stop    "<｜end▁of▁sentence｜>"      
    stop    "<｜User｜>"                 
    stop    "<｜Assistant｜>"            

  License
    MIT License                    
    Copyright (c) 2023 DeepSeek    



### Import libraries and define helper functions

In [5]:
import ollama # To call the DeepSeek model
import pandas as pd # To use a dataframe
import re # To manage regexes to clean the output of the model
from TTS.api import TTS # To transform text into voice
from pydub import AudioSegment # To convert the WAV audio to MP3
from IPython.display import Audio, display # To listen to the voice in Jupyter
import io # To capture TTS output
import sys # To capture TTS output
import subprocess # To call ffmpeg and make a video
from IPython.display import Video # To display the video

### Scraping Live Commentary from Premiership Rugby

In [6]:
game_link = "https://www.premiershiprugby.com/match-centre/283210/live-commentary" # To do: Scrape the commentary from the website

In [7]:
team_1 = 'Sale Sharks'
team_2 = 'Bath Rugby'

In [8]:
live_commentary = [
                   ['14', '0-0', team_1, 'Missed Penalty!', 'Robert du Preez'],
                   ['17', '5-0', team_1, 'Try!', 'Arron Reed'],
                   ['18', '5-0', team_1, 'Missed conversion!', 'Robert du Preez'],
                   ['22', '8-0', team_1, 'Penalty!', 'Robert du Preez'],
                   ['24', '8-5', team_2, 'Try!', 'Ruaridh McConnochie'],
                   ['25', '8-5', team_2, 'Missed conversion!', 'Finn Russell'],
                   ['26', '8-5', team_1, 'Sub Off', 'Willgriff John'],
                   ['26', '8-5', team_1, 'Sub On', 'Tye Raymont'],
                   ['27', '13-5', team_1, 'Try!', 'Robert du Preez']
                  ]

In [9]:
live_commentary_df = pd.DataFrame(live_commentary, columns = ['time', 'score', 'team', 'event', 'player'])
live_commentary_df

|   | time | score | team | event | player |
|---|------|-------|------|-------|--------|
| 0 | 14 | 0-0 | Sale Sharks | Missed Penalty! | Robert du Preez |
| 1 | 17 | 5-0 | Sale Sharks | Try! | Arron Reed |
| 2 | 18 | 5-0 | Sale Sharks | Missed conversion! | Robert du Preez |
| 3 | 22 | 8-0 | Sale Sharks | Penalty! | Robert du Preez |
| 4 | 24 | 8-5 | Bath Rugby | Try! | Ruaridh McConnochie |
| 5 | 25 | 8-5 | Bath Rugby | Missed conversion! | Finn Russell |
| 6 | 26 | 8-5 | Sale Sharks | Sub Off | Willgriff John |
| 7 | 26 | 8-5 | Sale Sharks | Sub On | Tye Raymont |
| 8 | 27 | 13-5 | Sale Sharks | Try! | Robert du Preez |


## AI-Generated Jokes from Live Commentary

In [10]:
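def call_chat(model_name, prompt):
    # Calls a chat with a prompt and returns the text of the reply
    response = ollama.chat(
        model=model_name,
        messages=[{'role': 'user', 'content': prompt}],
        # Sampling settings are passed through `options`, not inside the message
        options={'temperature': 0, 'seed': 42, 'top_p': 0.5},
    )
    return response['message']['content']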

In [11]:
def remove_think_tags(input_string):
    # Regular expression to match text between <think> and </think>
    cleaned_string = re.sub(r'<think>.*?</think>', '', input_string, flags=re.DOTALL)
    return cleaned_string

In [12]:
def extract_first_inside_quotes(input_string):
    # Use regular expression to find the first content inside quotes
    match = re.search(r'"([^"]*?)"', input_string, re.DOTALL)
    if match:
        return match.group(1)
    return input_string
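
The two cleaning helpers above can be tried on canned replies without calling the model. The sketch below restates them so it runs on its own, and adds a hypothetical `clean_reply` wrapper (not part of the original notebook) that strips leftover whitespace. The sample replies are made up for illustration. It shows that when the model does not wrap its joke in double quotes, the fallback returns the whole reply, including the blank lines left behind by the removed `<think>` block.

```python
import re

def remove_think_tags(input_string):
    # Drop everything between <think> and </think>, including newlines
    return re.sub(r'<think>.*?</think>', '', input_string, flags=re.DOTALL)

def extract_first_inside_quotes(input_string):
    # Return the first double-quoted span, or the input unchanged if there is none
    match = re.search(r'"([^"]*?)"', input_string, re.DOTALL)
    return match.group(1) if match else input_string

def clean_reply(raw_reply):
    # Hypothetical wrapper: same pipeline as the notebook, plus a final strip()
    return extract_first_inside_quotes(remove_think_tags(raw_reply)).strip()

# Made-up replies in the shape a reasoning model might produce
quoted = '<think>\nNeed a pun about a try.\n</think>\n\n"Bath better start trying!"'
unquoted = '<think>\nNo quotes this time.\n</think>\n\nWell, Robert is having an off-day.'

without_strip = extract_first_inside_quotes(remove_think_tags(unquoted))
print(repr(without_strip))          # leading newlines survive
print(repr(clean_reply(quoted)))
print(repr(clean_reply(unquoted)))
```

Whether to strip is a design choice: it keeps the printed joke on the same line as the `---->` marker, at the cost of altering the raw model text slightly.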

In [13]:
game_title = f"{team_1} Versus {team_2} a Rugby Match"
print(game_title)
ia_output = []
for _, row in live_commentary_df.iterrows(): 
    prompt = f"You are an assistant to a UK-based comedian. You are going to make a joke that uses UK slang and dark humor, and it is a forecast of what is going to happen in a Rugby game in the following minutes based on a live comment that I will give you. So, make a joke about the following comment: Team 1 is {team_1} and team 2 is {team_2}. At time {row['time']}, the score is {row['score']} and team {row['team']}, player {row['player']} did {row['event']}. You will only reply with a joke, only a joke to forecast what is going to happen in the next minutes based on the event. Only a joke, that is it."
    event_description = f"-> At time {row['time']}, the score is {row['score']} and team {row['team']}, player {row['player']} did a {row['event']}"
    print(event_description)
    chat_output = call_chat("deepseek-r1:8b",prompt)
    chat_output_without_think_tags = extract_first_inside_quotes(remove_think_tags(chat_output))
    print("----> "+chat_output_without_think_tags )
    ia_output += [(event_description, chat_output_without_think_tags)]

Sale Sharks Versus Bath Rugby a Rugby Match
-> At time 14, the score is 0-0 and team Sale Sharks, player Robert du Preez did a Missed Penalty!
----> 

Well, Robert's having an off-day—hope he's not trying to jinx his own team. Next few minutes might be a bit painful for Sale Sharks fans.
-> At time 17, the score is 5-0 and team Sale Sharks, player Arron Reed did a Try!
----> Sale Sharks are leading 5-0, but Bath better start trying, or it'll be 10-5 before you can say 'carrying on!'
-> At time 18, the score is 5-0 and team Sale Sharks, player Robert du Preez did a Missed conversion!
----> Apparently, Robert du Preez’s luck is so bad, he could convert a missed conversion into a winning try!
-> At time 22, the score is 8-0 and team Sale Sharks, player Robert du Preez did a Penalty!
----> Robert du Preez’s penalty has given Bath Rugby the perfect chance—and not in a good way for Sale Sharks!
-> At time 24, the score is 8-5 and team Bath Rugby, player Ruaridh McConnochie did a Try!
----> R

## Combine Comments and Jokes into One Audio File

In [14]:
def extract_bracket_content(text):
    """Extracts content inside the first pair of square brackets [ ] from a given string."""
    match = re.search(r'\[(.*?)\]', text, re.DOTALL)  # DOTALL allows newlines inside brackets
    return match.group(1).strip() if match else None  # Remove extra spaces & return

In [15]:
def create_wav(filename, input_text):
    # Convert text into WAV audio and return the split sentences printed by TTS
    import ast  # Local import: parses the printed list without running arbitrary code

    # Capture the printed output
    captured_output = io.StringIO()  # Create a StringIO object
    previous_stdout = sys.stdout  # In Jupyter this is not sys.__stdout__
    sys.stdout = captured_output  # Redirect stdout to the StringIO object
    try:
        tts.tts_to_file(text=input_text, speaker="p243", file_path=filename+".wav")  # Call the function
    finally:
        sys.stdout = previous_stdout  # Restore stdout even if TTS raises

    # Get the captured text
    output_text = captured_output.getvalue()

    # literal_eval only accepts Python literals, unlike eval
    return list(ast.literal_eval(extract_bracket_content(output_text)))
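
The sentence list is recovered by parsing text that TTS prints to stdout. A minimal sketch of that parsing step on its own is shown below. The helper name `parse_sentence_list` and the sample log are assumptions for illustration; the real log format depends on the installed TTS version. Wrapping the captured content back in brackets before `ast.literal_eval` means a single-sentence log still comes back as a one-element list rather than a list of characters.

```python
import ast
import re

def parse_sentence_list(log_text):
    """Return the first bracketed Python list found in log_text, or [] if none."""
    match = re.search(r'\[(.*?)\]', log_text, re.DOTALL)
    if match is None:
        return []
    # Re-wrap in brackets so one sentence parses as a list, not a bare string
    return list(ast.literal_eval('[' + match.group(1) + ']'))

# Assumed shape of the captured TTS output (illustrative only)
sample_log = (
    " > Text splitted to sentences.\n"
    "['Sale Sharks Versus Bath Rugby.', 'What a try!']\n"
    " > Processing time: 1.0\n"
)

sentences = parse_sentence_list(sample_log)
single = parse_sentence_list("['Only one sentence.']")
nothing = parse_sentence_list("no list printed here")
print(sentences)
print(single)
print(nothing)
```

Note that the non-greedy pattern stops at the first `]`, so a sentence that itself contains a closing bracket would be cut short; that limitation is inherited from the regular expression used in the notebook.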

In [16]:
comments_and_jokes_text = game_title + " "
for event_description, chat_output_without_think_tags in ia_output:
    comments_and_jokes_text += event_description + " " + chat_output_without_think_tags + " "

In [17]:
model_name = "tts_models/en/vctk/vits"

# Init TTS
tts = TTS(model_name)
filename = "output"
splitted_sentences = create_wav(filename,comments_and_jokes_text )

 > tts_models/en/vctk/vits is already downloaded.
 > Using model: vits
 > Setting up Audio Processor...
 | > sample_rate:22050
 | > resample:False
 | > num_mels:80
 | > log_func:np.log10
 | > min_level_db:0
 | > frame_shift_ms:None
 | > frame_length_ms:None
 | > ref_level_db:None
 | > fft_size:1024
 | > power:None
 | > preemphasis:0.0
 | > griffin_lim_iters:None
 | > signal_norm:None
 | > symmetric_norm:None
 | > mel_fmin:0
 | > mel_fmax:None
 | > pitch_fmin:None
 | > pitch_fmax:None
 | > spec_gain:20.0
 | > stft_pad_mode:reflect
 | > max_norm:1.0
 | > clip_norm:True
 | > do_trim_silence:False
 | > trim_db:60
 | > do_sound_norm:False
 | > do_amp_to_db_linear:True
 | > do_amp_to_db_mel:True
 | > do_rms_norm:False
 | > db_level:None
 | > stats_path:None
 | > base:10
 | > hop_length:256
 | > win_length:1024
 > initialization of speaker-embedding layers.




Conversion to MP3 completed!
Video saved at output_video.mp4


In [18]:
# Load the .wav file
audio = AudioSegment.from_wav(filename+".wav")

# Export as .mp3
audio.export(filename+".mp3", format="mp3")

print("Conversion to MP3 completed!")

In [19]:
# Display the MP3 file
display(Audio(filename+".mp3"))

### Generate a video using an image and the audio

In [20]:
# Input files (image and audio)
image_path = 'deepseek.png'  # Path to the image file
audio_path = 'output.mp3'    # Path to the audio file
output_video_path = 'output_video.mp4'  # Path to the output video file

# Command to create a video from image and audio using ffmpeg
command = [
    'ffmpeg', 
    '-loop', '1',  # Loop the image
    '-framerate', '2',  # Set the frame rate (2 fps is fine for a static image)
    '-i', image_path,  # Input image
    '-i', audio_path,  # Input audio
    '-c:v', 'libx264',  # Video codec
    '-tune', 'stillimage',  # For still image video
    '-c:a', 'aac',  # Audio codec
    '-strict', 'experimental',  # For some ffmpeg versions
    '-shortest',  # The video length will be the same as the audio
    output_video_path  # Output video file
]

# Run the command; check=True raises CalledProcessError if ffmpeg fails
subprocess.run(command, check=True)

print(f"Video saved at {output_video_path}")

ffmpeg version 7.1 Copyright (c) 2000-2024 the FFmpeg developers
  built with Apple clang version 16.0.0 (clang-1600.0.26.4)
  configuration: --prefix=/opt/homebrew/Cellar/ffmpeg/7.1_4 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags='-Wl,-ld_classic' --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libaribb24 --enable-libbluray --enable-libdav1d --enable-libharfbuzz --enable-libjxl --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --e
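
Before invoking ffmpeg it can help to confirm that the executable and both input files are actually present, since a missing image or audio file otherwise only shows up deep in the ffmpeg log. The following is a small sketch of such a check; `preflight` is a hypothetical helper name, not something from the notebook or from ffmpeg.

```python
import shutil
from pathlib import Path

def preflight(image_path, audio_path, tool="ffmpeg"):
    """Return a list of human-readable problems; an empty list means ready to run."""
    problems = []
    # shutil.which returns None when the executable is not on PATH
    if shutil.which(tool) is None:
        problems.append(f"{tool} not found on PATH")
    for path in (image_path, audio_path):
        if not Path(path).is_file():
            problems.append(f"missing input file: {path}")
    return problems

# Deliberately wrong names, to show what the report looks like
issues = preflight("no_such_image.png", "no_such_audio.mp3",
                   tool="definitely-not-a-real-tool-xyz")
for issue in issues:
    print(issue)
```

In the notebook, the call would use the real paths, for example `preflight(image_path, audio_path)`, and the ffmpeg command would only run when the returned list is empty.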

In [21]:
# Display the video in Jupyter
Video("output_video.mp4")