
# Summarize lecture note
- Summarize a YouTube video's script chapter by chapter, using the chapters configured by the video's creator.
  - Create `markdown_note.md` containing each chapter's script and summary.
- Uses the [yt-dlp](https://pypi.org/project/yt-dlp/), [pydub](https://pypi.org/project/pydub/), [OpenAI-Whisper](https://pypi.org/project/openai-whisper/), [langchain](https://github.com/hwchase17/langchain), and [OpenAI](https://github.com/openai/openai-python) packages.
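Splitting a transcript along the creator's chapters can be sketched as below. This is a minimal sketch under stated assumptions, not the code that produced the per-chapter files used later. It assumes chapters shaped like the `chapters` entries in yt-dlp's info dict (`start_time`, `end_time`, `title`, in seconds) and transcript segments shaped like the `segments` entries returned by openai-whisper's `transcribe()` (`start`, `end`, `text`). `group_segments_by_chapter` is a hypothetical helper name.

```python
from typing import Any


def group_segments_by_chapter(
    chapters: list[dict[str, Any]], segments: list[dict[str, Any]]
) -> list[dict[str, str]]:
    """Attach each transcript segment to the chapter its start time falls in.

    Hypothetical helper. Assumes yt-dlp-style chapters
    ({"start_time", "end_time", "title"}) and Whisper-style segments
    ({"start", "end", "text"}), with all times in seconds.
    """
    result = [{"title": ch["title"], "script": ""} for ch in chapters]
    for seg in segments:
        for i, ch in enumerate(chapters):
            if ch["start_time"] <= seg["start"] < ch["end_time"]:
                # Strip surrounding whitespace from the segment, then join with one space.
                joined = result[i]["script"] + " " + seg["text"].strip()
                result[i]["script"] = joined.strip()
                break
    return result


chapters = [
    {"start_time": 0.0, "end_time": 60.0, "title": "Intro"},
    {"start_time": 60.0, "end_time": 180.0, "title": "Main topic"},
]
segments = [
    {"start": 0.0, "end": 5.2, "text": " Hello."},
    {"start": 58.0, "end": 63.0, "text": " Moving on."},
    {"start": 61.0, "end": 70.0, "text": " Here is the point."},
]
print(group_segments_by_chapter(chapters, segments))
```

Each segment is assigned by its start time, so a segment that straddles a chapter boundary stays with the earlier chapter, and segments that start after the last chapter's end are dropped.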


# Initialize system

In [1]:

# Load the chapter data that the previous step saved temporarily.
# It is a list of chapter objects, each with at least a "title" key.
import json

with open("temp_script_by_chapter.json", "r", encoding="utf-8") as st_json:
    script_by_chapter = json.load(st_json)

# Write note by summarizing contents
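The next cell configures LangChain's `RecursiveCharacterTextSplitter` with `chunk_size = 800` and `chunk_overlap = 0`. The idea behind recursive splitting can be illustrated with a simplified, hypothetical `split_text_recursive` function: try the coarsest separator first (paragraphs, then lines, then spaces, then single characters), greedily pack pieces into chunks no longer than `chunk_size`, and recurse with finer separators only for pieces that are still too long. This illustrates the technique only; it is not LangChain's implementation, whose merging details differ.

```python
def split_text_recursive(
    text: str,
    chunk_size: int,
    separators: tuple[str, ...] = ("\n\n", "\n", " ", ""),
) -> list[str]:
    """Simplified recursive character splitter (illustrative, no overlap).

    `separators` should end with "" so that splitting always terminates.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    sep, *rest = separators
    # "" means: split into single characters as a last resort.
    pieces = list(text) if sep == "" else text.split(sep)
    chunks: list[str] = []
    current = ""
    for piece in pieces:
        if len(piece) > chunk_size:
            # Flush the chunk being built, then split the oversized piece more finely.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(split_text_recursive(piece, chunk_size, tuple(rest)))
            continue
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            # Adding this piece would overflow: close the current chunk.
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


print(split_text_recursive("aaaa bbbb\n\ncccc dddd eeee", 10))
# ['aaaa bbbb', 'cccc dddd', 'eeee']
```

Preferring coarse separators keeps paragraphs and sentences intact whenever they fit, which tends to give the summarizer more coherent chunks than cutting at a fixed character offset.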


In [2]:
import os
import llm  # local helper module (not shown here) that provides summary_docs()
from dotenv import load_dotenv
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load the OpenAI API key from a .env file into the environment
load_dotenv()

# Define the text splitter: 800-character chunks, no overlap
chunk_size = 800
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=chunk_size,
    chunk_overlap=0,
    length_function=len,
)

# Folder holding the transcript of each chapter (0.txt, 1.txt, ...)
text_file_folder_path = os.path.join(os.getcwd(), "text")

# Folder for the per-chapter notes; create it if it does not exist yet
note_file_folder_path = os.path.join(os.getcwd(), "note")
os.makedirs(note_file_folder_path, exist_ok=True)


# Summarize each chapter
for index, c in enumerate(script_by_chapter):
    title = c["title"]
    print(f"Chapter {title} is processing..")

    # Read the rectified transcript and use it as the chapter's script
    transcript_file = os.path.join(text_file_folder_path, f"{index}.txt")
    with open(transcript_file, "r", encoding="utf-8") as file:
        transcript = file.read()
    c["script"] = transcript

    note_text_file = os.path.join(note_file_folder_path, f"{index}.txt")
    if os.path.exists(note_text_file):
        # A note already exists for this chapter: reuse it instead of
        # summarizing again (and keep processing the remaining chapters)
        with open(note_text_file, "r", encoding="utf-8") as file:
            c["summary"] = file.read()
    else:
        # Summarize the transcript and save the note into a file
        summarized_text = llm.summary_docs(transcript)
        c["summary"] = summarized_text
        with open(note_text_file, "w", encoding="utf-8") as file:
            file.write(summarized_text)

    print("\n")



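Chapter Start is processing..
..

Chapter Reading the comments is processing..
...............

Chapter How should we analyze an organization? The structure of HR and organization theory is processing..
...

Chapter The pyramid-shaped hierarchy of the Anglo-Saxon model vs. the network-shaped horizontal structure of the Germanic model is processing..
.....

Chapter The meaning of the Great Transformation: This is reform is processing..
.......

Chapter The harms of the delegate system: The mindset of the "watermelons," including floor leader Park Kwang-on is processing..
................

Chapter What matters most in designing and operating a management platform? is processing..
..

Chapter What is the vision of Lee Jae-myung's Democratic Party? Toward a basic society is processing..
.

Chapter How is the vision of a basic society turned into strategy? is processing..
........

Chapter A request is processing..
.........

Chapter What is strategy? A combination of initiatives is processing..
..

Chapter Executing strategy requires philosophy, competence, and courage, and means decentralization for autonomy is processing..
........

Chapter Let us remember the spirit of the Candlelight Revolution is processing..
.

Chapter The German intellectuals who harmonized order and freedom, and the Three Equalities doctrine (Samgyun-juui) of Jo So-ang, a thinker of harmony is processing..
....

Chapter Wrap-up is processing..
.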
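`llm.summary_docs` comes from a local `llm` module whose source is not shown, so how it summarizes internally is unknown. One common pattern for transcripts longer than a model's context is map-reduce: summarize each chunk, then summarize the joined partial summaries. The sketch below shows that pattern together with the per-chapter note caching the loop above uses. `summarize_map_reduce`, `summarize_with_cache`, and `fake_summarize` are hypothetical names, and `fake_summarize` is only a stand-in for a real LLM call.

```python
import os
import tempfile
from typing import Callable


def summarize_map_reduce(
    chunks: list[str], summarize_chunk: Callable[[str], str]
) -> str:
    """Map: summarize each chunk. Reduce: summarize the joined partial summaries."""
    partials = [summarize_chunk(chunk) for chunk in chunks]
    if not partials:
        return ""
    if len(partials) == 1:
        return partials[0]
    return summarize_chunk("\n".join(partials))


def summarize_with_cache(
    note_path: str, chunks: list[str], summarize_chunk: Callable[[str], str]
) -> str:
    """Reuse a saved note if present; otherwise summarize and save it."""
    if os.path.exists(note_path):
        with open(note_path, "r", encoding="utf-8") as f:
            return f.read()
    summary = summarize_map_reduce(chunks, summarize_chunk)
    with open(note_path, "w", encoding="utf-8") as f:
        f.write(summary)
    return summary


def fake_summarize(text: str) -> str:
    """Hypothetical stand-in for an LLM call: keep the first sentence."""
    return text.split(".")[0].strip() + "."


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "0.txt")
    chunks = ["First idea. Detail.", "Second idea. More detail."]
    print(summarize_with_cache(path, chunks, fake_summarize))
```

Caching the note on disk means a failed or interrupted run can be restarted without paying again for chapters that were already summarized.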



## Publish markdown document

Write all contents into `markdown_note.md`, the summarized note for this YouTube video.
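The markdown assembly in the cells below can also be written as a pure function, which makes the layout easy to check without touching the file system. This is a sketch, assuming each chapter dict has `title`, `summary`, and `script` keys as produced above; `render_markdown_note` is a hypothetical name, and its spacing is close to, not identical to, the string built in the cells below.

```python
def render_markdown_note(chapters: list[dict[str, str]]) -> str:
    """Render each chapter as '# title', '## Summary', and '## Script' sections."""
    parts = []
    for c in chapters:
        parts.append(
            f"# {c['title']}\n\n"
            f"## Summary\n\n{c['summary']}\n\n"
            f"## Script\n\n{c['script']}\n"
        )
    # A blank line separates consecutive chapters.
    return "\n".join(parts)


note = render_markdown_note(
    [{"title": "Intro", "summary": "Short summary.", "script": "Full script."}]
)
print(note)
```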

In [3]:
import os
import json

# Save the chapter data (title, script, summary) into a file first,
# so nothing is lost if writing fails
with open("script_by_chapter.json", "w", encoding="utf-8") as file:
    json.dump(script_by_chapter, file, indent=2, ensure_ascii=False)

# Then remove the temporary data
os.remove("temp_script_by_chapter.json")
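Because the chapter titles and scripts are Korean, `ensure_ascii=False` matters here: with the default `ensure_ascii=True`, `json.dumps` escapes every non-ASCII character as `\uXXXX`, which keeps the JSON valid but unreadable by eye. The short check below shows the difference on a one-word title, and a UTF-8 round trip through a file. The explicit `encoding="utf-8"` is needed because the platform's default text encoding is not UTF-8 everywhere (notably on Windows).

```python
import json
import os
import tempfile

chapter = {"title": "시작"}  # Korean for "Start", the first chapter title

print(json.dumps(chapter))                      # {"title": "\uc2dc\uc791"}
print(json.dumps(chapter, ensure_ascii=False))  # {"title": "시작"}

# Round-trip through a file with an explicit UTF-8 encoding
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "chapter.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(chapter, f, ensure_ascii=False)
    with open(path, "r", encoding="utf-8") as f:
        print(json.load(f) == chapter)          # True
```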


In [4]:
# Build one markdown section per chapter: title, summary, then full script
full_markdown_text = ""

for c in script_by_chapter:
    full_markdown_text += f"# {c['title']}\n\n"
    full_markdown_text += "## Summary\n\n"
    full_markdown_text += f"{c['summary']}\n\n"
    full_markdown_text += "## Script\n\n"
    full_markdown_text += f"{c['script']}\n"
    full_markdown_text += "\n\n"

In [5]:

# Write the markdown note (UTF-8, since the content is Korean)
with open("markdown_note.md", "w", encoding="utf-8") as file:
    file.write(full_markdown_text)

In [6]:
# Remove temporary data
os.remove("script_by_chapter.json")