
# Summarize lecture note
- Summarize a YouTube video's script chapter by chapter, following the chapters configured by the creator.
  - Create `markdown_note.md` containing the script and summary of each chapter.
- Uses the [yt-dlp](https://pypi.org/project/yt-dlp/), [pydub](https://pypi.org/project/pydub/), [OpenAI-Whisper](https://pypi.org/project/openai-whisper/), [langchain](https://github.com/hwchase17/langchain), and [OpenAI](https://github.com/openai/openai-python) packages.
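Grouping a transcript by the creator's chapters can be sketched as follows. This is a hypothetical helper, not this notebook's own code: the earlier steps are not shown here and may instead cut the audio per chapter with pydub before transcription. The sketch assumes the chapter list has the shape of yt-dlp's `chapters` field (`start_time`, `end_time`, `title`, in seconds) and that the segments have the shape of Whisper's `transcribe()` result (`start`, `end`, `text`). The name `group_segments_by_chapter` is illustrative.

```python
from typing import TypedDict


class Chapter(TypedDict):
    # Assumed to mirror one entry of yt-dlp's "chapters" list
    start_time: float
    end_time: float
    title: str


class Segment(TypedDict):
    # Assumed to mirror one entry of Whisper's result["segments"]
    start: float
    end: float
    text: str


def group_segments_by_chapter(chapters: list[Chapter], segments: list[Segment]) -> list[dict]:
    """Hypothetical helper: assign each segment to the chapter its start time falls in."""
    script_by_chapter = [{"title": ch["title"], "script": ""} for ch in chapters]
    for seg in segments:
        for entry, ch in zip(script_by_chapter, chapters):
            # A segment belongs to a chapter if it starts inside [start_time, end_time)
            if ch["start_time"] <= seg["start"] < ch["end_time"]:
                entry["script"] += seg["text"].strip() + " "
                break
    # Remove the trailing space left by the concatenation
    for entry in script_by_chapter:
        entry["script"] = entry["script"].strip()
    return script_by_chapter
```

For example, with chapters "Intro" (0 to 60 s) and "Main" (60 to 120 s), a segment starting at 58 s goes to "Intro" even if it ends after 60 s. Assigning by start time keeps each segment in exactly one chapter.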


# Initialize system

```python
# Load the chapter data saved temporarily by an earlier step
import json

with open("temp_script_by_chapter.json", "r", encoding="utf-8") as st_json:
    script_by_chapter = json.load(st_json)  # list of chapter dicts, each with at least a "title"
```
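The later cells rely on `script_by_chapter` being a list of chapter objects with a `title` key, to which `script` and `summary` are added. A minimal check of that assumed shape might look like this. `validate_chapters` is a hypothetical helper name, not part of the notebook.

```python
from typing import Any


def validate_chapters(data: Any) -> list[dict]:
    """Hypothetical helper: check that loaded JSON is a list of chapter dicts with a title."""
    if not isinstance(data, list):
        raise ValueError("expected a JSON list of chapter objects")
    for i, chapter in enumerate(data):
        # "script" and "summary" are filled in by later cells; only "title" is required here
        if not isinstance(chapter, dict) or "title" not in chapter:
            raise ValueError(f"chapter {i} has no 'title' field")
    return data
```

It would be used as `script_by_chapter = validate_chapters(json.load(st_json))`, so a malformed file fails at load time instead of with a `KeyError` in the summarization loop.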

# Write notes by summarizing the contents


```python
import os
import llm  # local helper module (not shown here) that provides summary_docs()
from dotenv import load_dotenv

# Set up the OpenAI API key (loaded from a .env file into the environment)
load_dotenv()

# Folder containing the per-chapter transcripts (text/0.txt, text/1.txt, ...)
text_file_folder_path = os.path.join(os.getcwd(), 'text')

# Create the folder that stores one note per chapter
note_file_folder_path = os.path.join(os.getcwd(), 'note')
os.makedirs(note_file_folder_path, exist_ok=True)

# Summarize each chapter
for index, c in enumerate(script_by_chapter):

    # Read the transcript for this chapter from file
    with open(os.path.join(text_file_folder_path, f'{index}.txt'), "r", encoding="utf-8") as file:
        transcript = file.read()

    # Replace the chapter's script with the rectified transcript
    c["script"] = transcript

    # If a note file already exists, reuse it instead of summarizing again
    note_text_file = os.path.join(note_file_folder_path, f'{index}.txt')
    if os.path.exists(note_text_file):
        with open(note_text_file, "r", encoding="utf-8") as file:
            c["summary"] = file.read()
        continue

    # Log progress
    print(f"Chapter {c['title']} is processing..")

    # Summarize the transcript (chunk_size and chunk_overlap are passed through to llm.summary_docs)
    summarized_text = llm.summary_docs(transcript, chunk_size=1000, chunk_overlap=100)
    c["summary"] = summarized_text

    # Save the note to a file so a rerun can skip this chapter
    with open(note_text_file, "w", encoding="utf-8") as file:
        file.write(summarized_text)
```



```text
Chapter Why on earth is Yoon Suk-yeol acting like this? is processing..
Created a chunk of size 1136, which is longer than the specified 1000
Created a chunk of size 2723, which is longer than the specified 1000
Chapter Why do I want to do YouTube? And why live, of all things? is processing..
Created a chunk of size 3054, which is longer than the specified 1000
Created a chunk of size 2134, which is longer than the specified 1000
Chapter On cool-headed love, that is, organization-theoretic love is processing..
Created a chunk of size 11619, which is longer than the specified 1000
Created a chunk of size 8454, which is longer than the specified 1000
Created a chunk of size 4110, which is longer than the specified 1000
Chapter How would organization-theoretic love handle thesis plagiarism? is processing..
```
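Because `llm` is a local module whose source is not shown, the following is only a hypothetical sketch of what a chunked (map-reduce) summary such as `llm.summary_docs` might do: split the transcript into overlapping chunks, summarize each chunk, then summarize the joined partial summaries. `split_text` and `summary_docs_sketch` are illustrative names, and `summarize` stands in for an LLM call (for example through OpenAI or LangChain). The splitter is a simplified separator-based one. Like LangChain's `CharacterTextSplitter`, it cannot break a single piece that is longer than `chunk_size`, which is the likely cause of the "Created a chunk of size ..., which is longer than the specified 1000" warnings above, since a Whisper transcript may contain few paragraph breaks.

```python
from typing import Callable


def split_text(text: str, chunk_size: int, chunk_overlap: int, separator: str = "\n\n") -> list[str]:
    """Greedily merge separator-delimited pieces into chunks of at most chunk_size characters.

    A single piece longer than chunk_size cannot be split further and becomes an
    oversized chunk of its own.
    """
    pieces = [p for p in text.split(separator) if p]
    chunks: list[str] = []
    current: list[str] = []
    for piece in pieces:
        if current and len(separator.join(current + [piece])) > chunk_size:
            # Close the current chunk
            chunks.append(separator.join(current))
            # Carry trailing pieces (at most chunk_overlap characters) into the next chunk
            overlap: list[str] = []
            for prev in reversed(current):
                if len(separator.join([prev] + overlap)) > chunk_overlap:
                    break
                overlap.insert(0, prev)
            # Drop carried pieces that would push the next chunk over chunk_size
            while overlap and len(separator.join(overlap + [piece])) > chunk_size:
                overlap.pop(0)
            current = overlap
        current.append(piece)
    if current:
        chunks.append(separator.join(current))
    return chunks


def summary_docs_sketch(
    text: str,
    summarize: Callable[[str], str],
    chunk_size: int = 1000,
    chunk_overlap: int = 100,
) -> str:
    """Hypothetical map-reduce summary: summarize each chunk, then the partial summaries."""
    chunks = split_text(text, chunk_size, chunk_overlap)
    if len(chunks) <= 1:
        # Short text: a single summarization call is enough
        return summarize(text)
    partial_summaries = [summarize(chunk) for chunk in chunks]  # map step
    return summarize("\n".join(partial_summaries))  # reduce step
```

LangChain's `RecursiveCharacterTextSplitter` avoids most oversized chunks by falling back to finer separators (newlines, spaces, then single characters) when a piece is still too long, so it is one possible way to keep every chunk near `chunk_size`.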




## Publish markdown document

This step writes all the contents into `markdown_note.md`, the summarized note for this YouTube video.

```python
import json

# Save the chapter data (now with scripts and summaries) to a file
with open("script_by_chapter.json", "w", encoding="utf-8") as file:
    file.write(json.dumps(script_by_chapter, indent=2, ensure_ascii=False))
```
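Passing `ensure_ascii=False` keeps the Korean chapter titles and scripts readable in `script_by_chapter.json`. With the default `ensure_ascii=True`, `json.dumps` escapes every non-ASCII character as `\uXXXX`. Both forms load back to the same data, as this small check shows (the sample record is illustrative).

```python
import json

# Illustrative chapter record with a non-ASCII (Korean) title
chapters = [{"title": "가", "summary": "..."}]

escaped = json.dumps(chapters)  # default ensure_ascii=True
readable = json.dumps(chapters, indent=2, ensure_ascii=False)

print(escaped)  # [{"title": "\uac00", "summary": "..."}]
print(readable)
```

When the readable form is written to disk, opening the file with `encoding="utf-8"` avoids depending on the platform's default encoding.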


```python
# Assemble one Markdown document: per chapter, its title, summary, and full script
full_markdown_text = ""

for c in script_by_chapter:
    full_markdown_text += f"# {c['title']}\n\n"
    full_markdown_text += "## Summary\n"
    full_markdown_text += f"{c['summary']}\n\n"
    full_markdown_text += "## Script\n\n"
    full_markdown_text += f"{c['script']}\n"
    full_markdown_text += "\n\n"
```

```python
# Write the Markdown document for the note
with open("markdown_note.md", "w", encoding="utf-8") as file:
    file.write(full_markdown_text)
```

```python
import os

# Remove the intermediate chapter data
# os.remove("temp_script_by_chapter.json")  # uncomment to also delete the temporary input
os.remove("script_by_chapter.json")
```