In [1]:
import os

import pandas as pd
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()
api_key = os.getenv("API_KEY")

client = OpenAI(api_key=api_key)

csv_path = "/Users/kyle/GitHub/gpt-anki-card-creator/anki-add.csv"

df = pd.read_csv(csv_path, header=None, names=["Question", "Answer"])

content_string = "\n\n".join(
    f"Q: {q}\nA: {a}" for q, a in zip(df["Question"], df["Answer"])
)


In [6]:
system_prompt = """
You are an assistant tasked with removing duplicate questions from a set of flashcards. Each flashcard contains a question (labeled "Q:") and an answer (labeled "A:"). The questions can be formulated differently, but ultimately the contents asked should not be there more than once. The answers can be identical or not.

Instructions:
1. Review each question and identify any duplicates.
2. For any duplicate questions, keep only the first instance and remove the others. Do not alter the content or structure of non-duplicate flashcards.
3. Provide the output in the same format, with each question and answer pair clearly separated.
"""

user_prompt = """
Hey, I have a set of flashcard question and answer pairs and need you to find the duplicates. They can be formulated differently. Please delete those that ask the same contents. Think wise, I am tired.
Flashcards:
{file_contents}
"""


def remove_duplicates(file_contents, system_prompt, user_prompt):
    # Only the user prompt contains the {file_contents} placeholder;
    # the system prompt is sent unchanged.
    user_prompt = user_prompt.format(file_contents=file_contents)
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        temperature=0,
        max_tokens=16384,
    )

    cleaned_flashcards = response.choices[0].message.content
    return cleaned_flashcards


cleaned_flashcards = remove_duplicates(content_string, system_prompt, user_prompt)
print(cleaned_flashcards)


Here is the revised set of flashcards with duplicates removed:

Q: What are the main structures derived from each of the three primary brain vesicles?
A: Prosencephalon, Mesencephalon, and Rhombencephalon lead to various brain structures.

Q: What is the significance of understanding normal brain anatomy in the context of neuroscience?
A: It provides a necessary basis to understand diseases.

Q: What is the significance of fiber tracts in brain communication?
A: They facilitate the rapid transmission of information between brain regions.

Q: What are the major divisions of the human nervous system?
A: Central nervous system (CNS) and peripheral nervous system (PNS).

Q: What are neural tube defects and their common forms?
A: Defects such as Anencephalus and Spina bifida due to failure of closure.

Q: How does the structure of the spinal cord change during growth?
A: The spinal cord doesn't grow as fast as the spine, leading to elongation of rootlets.

Q: What types of fibers are includ
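The printed output above is cut off in this export, and it is not clear from the notebook whether the model's reply itself was complete. One way to check is the `finish_reason` field of the chat completion, which is `"length"` when generation stopped at the token limit. The sketch below is an illustration only: `reply_was_truncated` is a hypothetical helper, and the `SimpleNamespace` object is a stand-in with the same attribute layout so the cell runs without an API call.

```python
from types import SimpleNamespace


def reply_was_truncated(response):
    """Return True if the first choice stopped because it hit the token limit."""
    return response.choices[0].finish_reason == "length"


# Stand-in for a chat completion response (assumption: same attribute layout),
# used only so this sketch runs without calling the API.
fake_response = SimpleNamespace(
    choices=[
        SimpleNamespace(
            finish_reason="length",
            message=SimpleNamespace(content="Q: What types of fibers are includ"),
        )
    ]
)

if reply_was_truncated(fake_response):
    print("Reply hit the token limit; consider sending the cards in smaller batches.")
```

To use this in `remove_duplicates`, the function would have to return (or check) the full `response` object rather than only the message content.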

In [8]:
def text_to_csv(text, output_csv):
    # Plain line-by-line parsing is enough here, so no NLP pipeline is needed.

    questions = []
    answers = []

    for line in text.splitlines():
        line = line.strip()

        if line.startswith("Q: "):
            questions.append(line[3:].strip())
        elif line.startswith("A: "):
            answers.append(line[3:].strip())

    # Put the data into a DataFrame and save it as CSV
    df = pd.DataFrame({"Question": questions, "Answer": answers})
    df.to_csv(output_csv, index=False, header=False, encoding="utf-8")

    print(f"Cleaned flashcards successfully saved to {output_csv}!")


text_to_csv(cleaned_flashcards, "anki_flashcards_no_duplicates.csv")


Cleaned flashcards successfully saved to anki_flashcards_no_duplicates.csv!
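The system prompt asks the model to leave non-duplicate flashcards unaltered, but nothing in the notebook verifies that. A minimal sketch of such a check follows, assuming every question and answer fits on a single line. `parse_cards` and `find_unexpected_cards` are hypothetical helpers, and the sample strings are made up for illustration.

```python
def parse_cards(text):
    """Collect (question, answer) pairs from 'Q: ...' / 'A: ...' lines."""
    cards = []
    question = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Q:"):
            question = line[2:].strip()
        elif line.startswith("A:") and question is not None:
            cards.append((question, line[2:].strip()))
            question = None
    return cards


def find_unexpected_cards(original_text, cleaned_text):
    """Return cards in the cleaned text that do not appear verbatim in the original."""
    original = set(parse_cards(original_text))
    return [card for card in parse_cards(cleaned_text) if card not in original]


# Made-up sample data standing in for content_string and cleaned_flashcards.
original = (
    "Q: What is the CNS?\nA: Brain and spinal cord.\n\n"
    "Q: Define CNS.\nA: Brain and spinal cord."
)
cleaned = "Here is the revised set:\n\nQ: What is the CNS?\nA: Brain and spinal cord."

print(find_unexpected_cards(original, cleaned))  # → []
```

In the notebook this could be called as `find_unexpected_cards(content_string, cleaned_flashcards)`; a non-empty result would mean the model rewrote or invented a card rather than only removing duplicates.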


Create podcast script lol!


In [9]:
system_prompt = """
Convert lecture transcripts into engaging, listener-friendly podcast scripts.

Your task is to transform the provided lecture transcripts into scripts suitable for podcast episodes. Ensure the content is engaging, cohesive, and caters to a podcast audience rather than a lecture audience. Adjust the tone, structure, and content to be more conversational and listener-friendly while preserving the core information.

# Steps

1. **Review the Transcript**: Carefully read through the lecture transcript to understand the key points and overall message.
2. **Identify Key Sections**: Break down the lecture into main concepts or sections that will become segments of the podcast.
3. **Simplify and Clarify**: Simplify complex ideas for clarity and brevity. Use accessible language that would suit a general audience.
4. **Add Transitions**: Include transitions to smoothly link podcast sections and maintain a narrative flow. These transitions are not music or sound effects, everything in this script has to be read out loud, so don't include anything that can't be spoken.

# Goal
Your end goal is to produce a script that sounds as if it were written for a lively, informative podcast episode rather than a formal lecture.

"""

user_prompt = """
I have a lecture transcript that I want to convert into an engaging podcast script. The podcast should be suitable for neuroscience students and should feel dynamic, conversational, and easy to follow. Please adapt the tone, structure, and content to be more listener-friendly, while keeping the core information intact.

Instructions:
1. **Adjust the tone** to be friendly, conversational, and relatable, making it accessible for students who may listen anywhere and learn as they go.
2. **Structure the content** into clear segments, each focusing on a key concept or takeaway, to ensure a cohesive narrative.
3. **Simplify complex terms** and use accessible language to make it easy for a general neuroscience audience to understand.
4. **Add transitions** to maintain a smooth flow between segments, enhancing the script’s narrative quality.

Here is the transcript for conversion:
{transcript}
"""


In [10]:
import re


def clean_transcript(text):
    # Remove timecodes in the format [00:00:00] or (00:00)
    text = re.sub(r"\[\d{2}:\d{2}:\d{2}\]|\(\d{2}:\d{2}\)", "", text)
    return text


with open(
    "/Users/kyle/GitHub/gpt-anki-card-creator/transcript1.txt", "r", encoding="utf-8"
) as file:
    raw_text = file.read()
clean_text = clean_transcript(raw_text)
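Removing a timecode leaves the surrounding spaces behind, so the cleaned text can contain double spaces and leading blanks. The variant below is a sketch, not part of the original notebook: it uses the same pattern as `clean_transcript` and then tidies the whitespace. `clean_transcript_compact` and the sample string are hypothetical. Note that the pattern only matches two-digit groups, so a timecode such as `[0:01:15]` would pass through untouched.

```python
import re

# Same pattern as in clean_transcript: [HH:MM:SS] or (MM:SS), two digits per group.
TIMECODE = re.compile(r"\[\d{2}:\d{2}:\d{2}\]|\(\d{2}:\d{2}\)")


def clean_transcript_compact(text):
    """Remove timecodes, then collapse the runs of spaces they leave behind."""
    without_codes = TIMECODE.sub("", text)
    lines = [
        re.sub(r"[ \t]+", " ", line).strip() for line in without_codes.splitlines()
    ]
    return "\n".join(lines)


sample = "[00:01:15] The spinal cord (02:30) carries signals.\n[00:01:20] Next point."
print(clean_transcript_compact(sample))
# The spinal cord carries signals.
# Next point.
```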

In [12]:
def create_podcast_script(text, system_prompt, user_prompt):
    # Fill the {transcript} placeholder; the system prompt is used as-is.
    user_prompt = user_prompt.format(transcript=text)

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        temperature=0.5,
        max_tokens=16384,
    )

    script = response.choices[0].message.content
    return script


script = create_podcast_script(clean_text, system_prompt, user_prompt)
print(script)

**Podcast Episode Script: Understanding the Nervous System - From Spinal Cord to Brain Development**

---

**[INTRO MUSIC FADES IN]**

**Host:**  
Welcome back to another episode of "Neuroscience Unplugged," where we break down the fascinating world of the nervous system into bite-sized pieces! I’m your host, [Your Name], and today we’re diving deep into the intricacies of the spinal cord, sensory and motor neurons, and even a little bit about brain development. So, grab your notebooks, or just sit back and relax, because we’re about to make some complex concepts much more digestible!

---

**[SEGMENT 1: The Basics of the Nervous System]**

**Host:**  
Alright, let’s kick things off with the spinal cord. Picture this: your spinal cord is like a highway, where information travels in both directions. On one side, we have motor neurons sending signals from the brain to the body. Think of them as the delivery trucks, carrying instructions for movement and action. 

But here’s the twist! Th

In [13]:
with open(
    "/Users/kyle/GitHub/gpt-anki-card-creator/podcast_script.txt", "w", encoding="utf-8"
) as file:
    file.write(script)
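The system prompt asks for a script in which everything can be read out loud, yet the printed output contains bracketed cues such as `**[INTRO MUSIC FADES IN]**` and `---` separators. A possible post-processing step, sketched here under the assumption that such cues always sit on their own line, drops those lines before the script is saved. `strip_unspoken_cues` is a hypothetical helper and the sample text is made up.

```python
import re


def strip_unspoken_cues(script):
    """Drop lines that consist only of a bracketed cue or a Markdown rule."""
    kept = []
    for line in script.splitlines():
        # Ignore surrounding whitespace and Markdown bold markers when matching.
        bare = line.strip().strip("*").strip()
        if re.fullmatch(r"\[[^\]]*\]", bare):  # e.g. [INTRO MUSIC FADES IN]
            continue
        if re.fullmatch(r"-{3,}", bare):  # horizontal rule
            continue
        kept.append(line)
    return "\n".join(kept)


sample = (
    "**[INTRO MUSIC FADES IN]**\n\n"
    "**Host:**\nWelcome back, I'm [Your Name]!\n\n"
    "---\n\n"
    "**[SEGMENT 1: Basics]**\nLet's begin."
)
print(strip_unspoken_cues(sample))
```

Bracketed text inside a sentence, such as `[Your Name]`, is left alone because only whole-line matches are removed. Segment headings in brackets are dropped as well, which may or may not be wanted.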