In [10]:
%load_ext dotenv
%dotenv

The dotenv extension is already loaded. To reload it, use:
  %reload_ext dotenv


In [11]:
import requests
from bs4 import BeautifulSoup
import re

from openai import OpenAI
import os
from pathlib import Path
from tqdm import tqdm
from pydub import AudioSegment

In [12]:
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')

In [13]:
client = OpenAI(
    api_key=OPENAI_API_KEY
)

## prompt

In [19]:
system_prompt = r"""I want to generate a podcast from LaTeX code. Please take the following LaTeX code and transcribe all content into a format optimized for text-to-speech. Do not make any effort to summarize or compress content–all original words by the author must be preserved. All text content should be preserved or transcribed into a easily readable format. Equations and math should be transcribed such that they are human readable in text. For example, $a^2$ should be transcribed as 'a squared'. Furthermore, all commands should also be transcribed to readable text. For example, commands such as \section and \title should be read as 'section' and 'title' respectively, and \cite or \citet should transcribe the citation as an in-text citation. Figures, tables, and comments must be omitted in their entirety."""

## helpers

In [15]:
def split_latex_by_section(latex_content):
    # Pattern to match section and subsection commands
    pattern = r'(\\section\{.*?\}|\\subsection\{.*?\})'
    
    # Split the content by the pattern, keeping the delimiters
    parts = re.split(pattern, latex_content)
    
    # Combine each command with its following content
    combined_parts = []
    for i in range(1, len(parts) - 1, 2):
        combined_parts.append(parts[i] + parts[i + 1])
    
    # Add the last part if it doesn't end with a command
    if len(parts) % 2 == 1:
        combined_parts.append(parts[-1])
    
    return combined_parts

In [16]:

def generate_audio(snippet, index):
    speech_file_path = f"data/snippet_{index}.mp3"
    response = client.audio.speech.create(
        model="tts-1",
        voice="alloy",
        input=snippet
    )
    response.stream_to_file(speech_file_path)
    return speech_file_path

def generate_snippet(section):
    chat_completion = client.chat.completions.create(
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": section}
        ],
        model="gpt-3.5-turbo-0125",
        max_tokens=4096,
        n=1
    )
    snippet = chat_completion.choices[0].message.content
    return snippet

## generation

In [17]:
chonk = input("paste raw LaTeX here:")

paste raw LaTeX here: \documentclass{article} % For LaTeX2e \usepackage{iclr2023_conference,times}  % Optional math commands from https://github.com/goodfeli/dlbook_notation. \input{math_commands.tex}  \usepackage{hyperref} \usepackage{url} \usepackage{booktabs}       % professional-quality tables \usepackage{amsfonts}       % blackboard math symbols \usepackage{nicefrac}       % compact symbols for 1/2, etc. \usepackage{microtype}      % microtypography \usepackage{xcolor}         % colors \usepackage{graphicx} \usepackage{amssymb} \usepackage{tabularx} \usepackage[ruled,vlined]{algorithm2e} \usepackage{amsmath} \usepackage{subcaption} \usepackage{multirow} \usepackage{todonotes}  \newcommand{\TODO}[1]{\textcolor{red}{(TODO: #1)}} \newcommand{\mw}[1]{\textcolor{blue}{(MW: #1)}} \newcommand{\kl}[1]{\textcolor[rgb]{0,0.545,0.545}{[KL: #1]}} \newcommand{\fv}[1]{\textcolor[rgb]{0.85, 0, 0}{[Fernanda: #1]}} \newcommand{\db}[1]{\textcolor[rgb]{0.8,0,0}{[@David: #1]}} \newcommand{\hp}[1]{\te

## just text

In [20]:
text = ""

sections = split_latex_by_section(chonk)
for i, section in enumerate(tqdm(sections)):
    # Generate the chat completion
    snippet = generate_snippet(section)
    
    # Check if the snippet is too long
    if len(snippet) > 4096:
        # Split the snippet at an arbitrary newline
        split_index = snippet.find('\n', len(snippet) // 4)
        snippet_part1 = snippet[:split_index]
        snippet_part2 = snippet[split_index + 1:]
        
        # rerun cleaning for both parts
        snippet_part1 = generate_snippet(snippet_part1)
        snippet_part2 = generate_snippet(snippet_part2)
        
        text += f"{snippet_part1} \n\n"
        text += f"{snippet_part2} \n\n"
    else:
        # Generate audio for the snippet
        text += f"{snippet} \n\n"


100%|█████████████████████████████████████████| 24/24 [02:04<00:00,  5.18s/it]


In [22]:
text.split('Awk')

Introduction. Recent language models have shown an intriguing range of capabilities. Networks trained on a simple "next-word" prediction task are apparently capable of many other things, such as solving logic puzzles or writing basic code. 

Yet how this type of performance emerges from sequence predictions remains a subject of current debate. Some have suggested that training on a sequence modeling task is inherently limiting. The arguments range from philosophical (Bender, 2020) to mathematical (Merrill, 2021). A common theme is that seemingly good performance might result from memorizing "surface statistics," i.e., a long list of correlations that do not reflect a causal model of the process generating the sequence. This issue is of practical concern, since relying on spurious correlations may lead to problems on out-of-distribution data (Bender, 2021; Floridi, 2020).

On the other hand, some tantalizing clues suggest language models may do more than collect spurious correlations, i

## text + audio

In [None]:
# Create a list to store the paths of the individual audio files
audio_file_paths = []

text = ""

sections = split_latex_by_section(chonk)
for i, section in enumerate(tqdm(sections)):
    # Generate the chat completion
    snippet = generate_snippet(section)
    
    # Check if the snippet is too long
    if len(snippet) > 4096:
        # Split the snippet at an arbitrary newline
        split_index = snippet.find('\n', len(snippet) // 4)
        snippet_part1 = snippet[:split_index]
        snippet_part2 = snippet[split_index + 1:]
        
        # rerun cleaning for both parts
        snippet_part1 = generate_snippet(snippet_part1)
        snippet_part2 = generate_snippet(snippet_part2)
        
        # Generate audio for both parts
        audio_file_paths.append(generate_audio(snippet_part1, f"{i}_1"))
        audio_file_paths.append(generate_audio(snippet_part2, f"{i}_2"))
        
        text += f"{snippet_part1} \n\n"
        text += f"{snippet_part2} \n\n"
    else:
        # Generate audio for the snippet
        audio_file_paths.append(generate_audio(snippet, i))
        text += f"{snippet} \n\n"

print('concatenating audio files...')
# Concatenate all the audio files
combined_audio = AudioSegment.empty()
for path in tqdm(audio_file_paths):
    audio = AudioSegment.from_mp3(path)
    combined_audio += audio

# Export the combined audio to a single MP3 file
combined_audio.export("belinkov.mp3", format="mp3")