
# Summarize lecture note
- Summarize a YouTube video's transcript chapter by chapter, following the chapters configured by the creator.
  - Create `markdown_note.md` containing each chapter's script and summary.
- Uses the [yt-dlp](https://pypi.org/project/yt-dlp/), [pydub](https://pypi.org/project/pydub/), [OpenAI-Whisper](https://pypi.org/project/openai-whisper/), [langchain](https://github.com/hwchase17/langchain), and [OpenAI](https://github.com/openai/openai-python) packages.
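The chapters come from the video's metadata. As a sketch, assuming the chapter list is read from the info dict that yt-dlp's `YoutubeDL.extract_info(url, download=False)` returns (its `chapters` entry holds dicts with `title`, `start_time`, and `end_time` in seconds), the data could be reshaped into one dict per chapter. The helper `chapters_from_info` and the `start_ms`/`end_ms` fields are hypothetical; millisecond bounds are included only because pydub's `AudioSegment` is sliced in milliseconds.

```python
from typing import Any, Dict, List


def chapters_from_info(info: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Reshape yt-dlp chapter metadata into one dict per chapter (hypothetical helper).

    `info` is assumed to come from, for example:
        import yt_dlp
        with yt_dlp.YoutubeDL({"quiet": True}) as ydl:
            info = ydl.extract_info(url, download=False)
    """
    chapters = info.get("chapters") or []  # None or missing when no chapters are set
    result = []
    for i, ch in enumerate(chapters):
        result.append(
            {
                "title": ch.get("title") or f"Chapter {i}",
                # Hypothetical fields: pydub slices audio in milliseconds,
                # e.g. audio[start_ms:end_ms]
                "start_ms": round(ch["start_time"] * 1000),
                "end_ms": round(ch["end_time"] * 1000),
                "script": "",  # filled in later from the transcript
            }
        )
    return result


info = {
    "chapters": [
        {"title": "시작", "start_time": 0.0, "end_time": 61.5},
        {"title": "정리", "start_time": 61.5, "end_time": 120.0},
    ]
}
chapters = chapters_from_info(info)
print([(c["title"], c["start_ms"], c["end_ms"]) for c in chapters])
```

A video without creator-defined chapters simply yields an empty list here, so a caller would need its own fallback (for example, treating the whole video as one chapter).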


# Initialize system

In [1]:

# Load the chapter data temporarily saved by the previous step
import json

script_by_chapter = []
with open("temp_script_by_chapter.json", "r", encoding="utf-8") as st_json:
    script_by_chapter = json.load(st_json)
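The summarizing loop expects `script_by_chapter` to be a list of chapter dicts, each with at least a `title` key. A minimal sketch of a loader that checks that shape up front, so a malformed file fails with a clear message rather than a `KeyError` partway through; the name `load_chapters` is hypothetical.

```python
import json
from typing import Any, Dict, List


def load_chapters(path: str) -> List[Dict[str, Any]]:
    """Load chapter data from JSON and check it is a list of dicts with a title."""
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError(f"{path}: expected a list of chapters, got {type(data).__name__}")
    for i, chapter in enumerate(data):
        if not isinstance(chapter, dict) or "title" not in chapter:
            raise ValueError(f"{path}: chapter {i} has no 'title'")
    return data
```

Used in place of the plain `json.load`, it would read `script_by_chapter = load_chapters("temp_script_by_chapter.json")`.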

# Write notes by summarizing each chapter
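`llm.summary_docs` is a local helper whose code is not shown here. One common way to summarize a long transcript is map-reduce: summarize each chunk, then summarize the joined partial summaries (langchain offers this as `load_summarize_chain(llm, chain_type="map_reduce")`). A minimal sketch of the idea that assumes nothing about `summary_docs` itself; `summarize` stands in for a single LLM call and is passed in so the logic runs without an API key.

```python
from typing import Callable, List


def map_reduce_summary(
    chunks: List[str],
    summarize: Callable[[str], str],
    joiner: str = "\n",
) -> str:
    """Summarize each chunk (map), then summarize the combined partials (reduce)."""
    if not chunks:
        return ""
    partials = [summarize(chunk) for chunk in chunks]  # map step: one call per chunk
    if len(partials) == 1:
        return partials[0]  # nothing to combine
    return summarize(joiner.join(partials))  # reduce step: one final call


# Stub "LLM" that keeps only the first word, to make the data flow visible
def first_word(text: str) -> str:
    return text.split()[0]


print(map_reduce_summary(["alpha beta", "gamma delta"], first_word))
```

langchain's implementation can additionally collapse the partial summaries when they are too long for one call; the sketch leaves that out.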


In [2]:
import os
import llm  # local helper module (not shown here) that provides summary_docs()
from dotenv import load_dotenv
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load environment variables (e.g. the OpenAI API key) from a .env file
load_dotenv()

# Define a text splitter.
# Note: this splitter is not used in this cell. The warnings below report a
# limit of 1000, so llm.summary_docs() appears to use its own splitter.
chunk_size = 800
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=chunk_size,
    chunk_overlap=0,
    length_function=len,
)


# Folder holding the corrected transcript of each chapter (text/0.txt, text/1.txt, ...)
text_file_folder_path = os.path.join(os.getcwd(), 'text')

# Create a folder to store the note of each chapter
note_file_folder_path = os.path.join(os.getcwd(), 'note')
os.makedirs(note_file_folder_path, exist_ok=True)


# Summarize each chapter
for index, c in enumerate(script_by_chapter):

    # Read the corrected transcript of this chapter
    with open(os.path.join(text_file_folder_path, f'{index}.txt'), "r", encoding="utf-8") as file:
        transcript = file.read()

    # Replace the chapter's script with the corrected transcript
    c["script"] = transcript

    # If a note file already exists, reuse it instead of summarizing again
    # (the original used `break`, which stopped at the first cached chapter
    # and left later chapters without a summary)
    note_text_file = os.path.join(note_file_folder_path, f'{index}.txt')
    if os.path.exists(note_text_file):
        with open(note_text_file, "r", encoding="utf-8") as file:
            c["summary"] = file.read()
        continue

    # Log
    title = c["title"]
    print(f"Chapter {title} is processing..")

    # Summarize the text
    summarized_text = llm.summary_docs(transcript)
    c["summary"] = summarized_text

    # Save the note into a file
    with open(note_text_file, "w", encoding="utf-8") as file:
        file.write(summarized_text)
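The loop keeps one note file per chapter so that a rerun, for example after an API error partway through, skips chapters that are already summarized. A minimal, self-contained sketch of that resume-from-cache pattern; `summarize` stands in for `llm.summary_docs`, and the function name `summarize_chapters` is hypothetical.

```python
import os
from typing import Callable, Dict, List


def summarize_chapters(
    chapters: List[Dict[str, str]],
    text_dir: str,
    note_dir: str,
    summarize: Callable[[str], str],
) -> int:
    """Fill each chapter's script and summary, reusing cached notes. Returns calls made."""
    os.makedirs(note_dir, exist_ok=True)
    calls = 0
    for index, chapter in enumerate(chapters):
        with open(os.path.join(text_dir, f"{index}.txt"), encoding="utf-8") as f:
            chapter["script"] = f.read()
        note_path = os.path.join(note_dir, f"{index}.txt")
        if os.path.exists(note_path):
            # Cached: reuse the earlier summary instead of summarizing again
            with open(note_path, encoding="utf-8") as f:
                chapter["summary"] = f.read()
            continue
        chapter["summary"] = summarize(chapter["script"])
        calls += 1
        # Written only after a successful call, so a failed chapter is retried next run
        with open(note_path, "w", encoding="utf-8") as f:
            f.write(chapter["summary"])
    return calls
```

Running it twice over the same folders makes summarizer calls only on the first run; the second run fills every `summary` from the note files.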



Chapter 시작 (Start) is processing..


Created a chunk of size 1220, which is longer than the specified 1000
Created a chunk of size 1395, which is longer than the specified 1000
Created a chunk of size 1258, which is longer than the specified 1000
Created a chunk of size 1319, which is longer than the specified 1000
Created a chunk of size 1847, which is longer than the specified 1000
Created a chunk of size 1266, which is longer than the specified 1000


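Chapter 댓글 읽어보기: 강진구 기자에 대하여 (Reading the comments: about reporter Kang Jin-gu) is processing..
Chapter 불합리한 것은 모두 억압하고 착취하기 위한 것이다: ‘그놈정신’을 갖자 (Everything unreasonable exists to oppress and exploit: let's embrace the ‘geunom spirit’) is processing..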


Created a chunk of size 1380, which is longer than the specified 1000
Created a chunk of size 1074, which is longer than the specified 1000
Created a chunk of size 1082, which is longer than the specified 1000
Created a chunk of size 1595, which is longer than the specified 1000


Chapter 억압과 착취의 메커니즘과 국가경영의 지배구조 (The mechanism of oppression and exploitation, and the governance structure of running the state) is processing..


Created a chunk of size 1140, which is longer than the specified 1000


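Chapter 억압과 착취의 메커니즘: 뉴스타파 봉지욱 기자의 고발: 화천대유는 누구의 것 (The mechanism of oppression and exploitation: Newstapa reporter Bong Ji-uk's exposé: who owns Hwacheon Daeyu) is processing..
Chapter 우리의 현실과 민주당의 인식 (Our reality and the Democratic Party's perception) is processing..
Chapter 불합리한 기득권에 안주하고 있는 사람들: 문재인, 이낙연, 윤석열 (People settled in unreasonable vested interests: Moon Jae-in, Lee Nak-yon, Yoon Suk Yeol) is processing..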


Created a chunk of size 1098, which is longer than the specified 1000
Created a chunk of size 1102, which is longer than the specified 1000


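Chapter 진리: 사실부합성과 인간부합성의 조건 (Truth: the conditions of correspondence with facts and correspondence with humanity) is processing..
Chapter 철학, 실력, 용기가 없으면 반드시 착취당하게 된다. (Without philosophy, ability, and courage, you will inevitably be exploited.) is processing..
Chapter 구체적이고도 시급한 치유방안 (Concrete and urgent remedies) is processing..
Chapter 억압과 착취의 구조 → 대화와 토론의 구조 (Structure of oppression and exploitation → structure of dialogue and debate) is processing..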


Created a chunk of size 1686, which is longer than the specified 1000


Chapter 장엄한 대전환을 위하여 ‘그놈정신’을 발휘하자 (Let's show the ‘geunom spirit’ for a grand transformation) is processing..


Created a chunk of size 1054, which is longer than the specified 1000


Chapter 부탁의 말씀 (A request) is processing..
Chapter 정리 (Wrap-up) is processing..
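The `Created a chunk of size ...` lines match the warning that langchain's text splitters log. They report a limit of 1000 rather than the 800 set in the cell, so they appear to come from a splitter inside `llm.summary_docs`. A warning like this typically appears when a splitter works with a single separator (as `CharacterTextSplitter` does, splitting on `"\n\n"` by default) and one piece between separators is already longer than the limit: it cannot be cut further, so it is kept as an oversized chunk. The sketch below imitates that behavior in plain Python to show where the message comes from; it is a simplified model, not langchain's code.

```python
from typing import List, Tuple


def split_on_separator(
    text: str, separator: str = "\n\n", chunk_size: int = 1000
) -> Tuple[List[str], List[str]]:
    """Greedily merge separator-delimited pieces into chunks of at most chunk_size.

    Simplified model of a single-separator splitter: a piece that is already
    longer than chunk_size cannot be split further, so it is kept as its own
    oversized chunk and a warning is recorded for it.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0  # length of separator.join(current)
    for piece in text.split(separator):
        if not piece:
            continue
        added = len(piece) + (len(separator) if current else 0)
        if current and current_len + added > chunk_size:
            chunks.append(separator.join(current))
            current, current_len = [], 0
            added = len(piece)
        current.append(piece)
        current_len += added
    if current:
        chunks.append(separator.join(current))
    warnings = [
        f"Created a chunk of size {len(c)}, which is longer than the specified {chunk_size}"
        for c in chunks
        if len(c) > chunk_size
    ]
    return chunks, warnings


text = "a" * 5 + "\n\n" + "b" * 15 + "\n\n" + "c" * 3
chunks, warnings = split_on_separator(text, chunk_size=10)
print([len(c) for c in chunks])  # [5, 15, 3]
print(warnings)
```

A transcript with few blank lines produces long pieces between separators, so the warning may be expected for speech-to-text output; the oversized chunk is still passed on whole rather than dropped.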


## Publish markdown document

Write all chapter contents into `markdown_note.md`, the summarized note for this YouTube video.

In [3]:
import json

# Save the chapter data (title, script, summary) into a file
with open("script_by_chapter.json", "w", encoding="utf-8") as file:
    json.dump(script_by_chapter, file, indent=2, ensure_ascii=False)
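`ensure_ascii=False` matters here because the titles and scripts are Korean. With the default `ensure_ascii=True`, `json` escapes every non-ASCII character as `\uXXXX`, which is valid JSON but unreadable when the file is opened by hand. A small demonstration:

```python
import json

chapter = {"title": "정리"}

escaped = json.dumps(chapter)  # default ensure_ascii=True
readable = json.dumps(chapter, ensure_ascii=False)

print(escaped)   # {"title": "\uc815\ub9ac"}
print(readable)  # {"title": "정리"}

# Both forms decode to the same data
assert json.loads(escaped) == json.loads(readable) == chapter
```

Because the readable form writes raw UTF-8 characters, the file should be opened with `encoding="utf-8"` so it does not depend on the platform's default encoding.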


In [4]:
full_markdown_text = ""

for c in script_by_chapter:
    full_markdown_text += f"# {c['title']} \n\n"
    full_markdown_text += f"## Summary \n"
    full_markdown_text += f"{c['summary']} \n\n"
    full_markdown_text += f"## Script \n\n"
    full_markdown_text += f"{c['script']} \n"
    full_markdown_text += "\n\n"
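The same document can be assembled by collecting the parts in a list and joining them once, which keeps the layout in one place and avoids repeated string concatenation. A sketch, assuming each chapter dict has `title`, `summary`, and `script` keys as in the cells above; `render_note` is a hypothetical name.

```python
from typing import Dict, List


def render_note(chapters: List[Dict[str, str]]) -> str:
    """Render chapters as Markdown: a '#' heading per chapter, then Summary and Script."""
    parts = []
    for c in chapters:
        parts.append(f"# {c['title']}\n")
        parts.append("## Summary\n")
        parts.append(f"{c['summary']}\n")
        parts.append("## Script\n")
        parts.append(f"{c['script']}\n")
    # Joining with "\n" leaves a blank line between every heading and block
    return "\n".join(parts)


note = render_note([{"title": "정리", "summary": "Short summary.", "script": "Full transcript."}])
print(note)
```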

In [5]:

# Write the markdown document for the note.
with open("markdown_note.md", "w", encoding="utf-8") as file:
    file.write(full_markdown_text)

In [6]:
import os

# Remove temporary data
os.remove("temp_script_by_chapter.json")
os.remove("script_by_chapter.json")
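`os.remove` raises `FileNotFoundError` if this cell is run a second time. A slightly more forgiving sketch uses `pathlib.Path.unlink(missing_ok=True)`, available since Python 3.8; the helper name `remove_temp_files` is hypothetical.

```python
from pathlib import Path
from typing import Iterable, List


def remove_temp_files(paths: Iterable[str]) -> List[str]:
    """Delete the given files if present; return the ones that existed."""
    removed = []
    for name in paths:
        path = Path(name)
        if path.exists():
            removed.append(name)
        path.unlink(missing_ok=True)  # no error if the file is already gone
    return removed
```

In this notebook it would be called as `remove_temp_files(["temp_script_by_chapter.json", "script_by_chapter.json"])`.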