# An AI That Summarizes a Book and Turns It into a Video

> YouTube: [빵형의 개발도상국](https://www.youtube.com/@bbanghyong)

Process

1. Extract text from the PDF - PyMuPDF
2. Summarize the content - GPT-4
3. Draw illustrations based on the summaries - DALL-E
4. Write and translate a 60-second script - GPT-4
5. Create a talking avatar - D-ID Studio
6. Edit the video - MoviePy

> Reference: [I Built a Secret AI Youtube Channel](https://youtu.be/4r-_iW8fmWU) by Siraj Raval

## 1. Extract text from PDF file

Open the book with PyMuPDF and print the text of one page to check the extraction.

In [11]:
import fitz # pip install --upgrade pymupdf

book_path = 'papers/Demian.pdf'

doc = fitz.open(book_path)

text = doc.get_page_text(pno=2)

print(text)

Prologue 
I cannot tell my story without going a long way back. 
If it were possible I would go back much farther still to 
the very earliest years of my childhood and beyond them 
to my family origins. 
When poets write novels they are apt to behave as if 
they were gods, with the power to look beyond and com-
prehend any human story and serve it up as if the 
Almighty himself, omnipresent, were relating it in all 
its naked truth. That I am no more able to do than the 
poets. But my story is more important to me than any 
poet's story to him, for it is my own-and it is the story 
of a huffian being-not an invented, idealised person 
but a real, live, uniq:-e being. What constitutes a real, 
live human being is more of a mystery than ever these 
days, and men-each one of whom is a valuable, unique 
experiment on the part of nature-are shot down whole-
sale. If, however, we were not something more than 
unique human beings and each man jack of us could 
really be dismissed from this wo

## 2. Summarize

OpenAI GPT-4

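Get an [API key](https://platform.openai.com/account/api-keys) and register a payment card; the API is a paid service.

Note: the GPT-4 model is limited to 8,192 tokens per request, so the book is summarized in chunks of 15 pages.

Token counter: https://platform.openai.com/tokenizer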
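
The fixed 15-page chunk works for this book, but pages vary in length. As a rough sketch, pages could instead be grouped by an estimated token budget. Both `estimate_tokens` and `chunk_pages` are hypothetical helpers, and the 4-characters-per-token ratio and the 6,000-token budget (chosen to leave room for the reply) are assumptions, not measured values.

```python
def estimate_tokens(text, chars_per_token=4):
    """Rough token estimate; 4 characters per token is an assumed rule of thumb."""
    return max(1, len(text) // chars_per_token)


def chunk_pages(pages, max_tokens=6000):
    """Group consecutive pages so each chunk's estimated size stays within max_tokens."""
    chunks, current, current_tokens = [], [], 0
    for page in pages:
        n = estimate_tokens(page)
        # Start a new chunk when adding this page would exceed the budget
        if current and current_tokens + n > max_tokens:
            chunks.append(' '.join(current))
            current, current_tokens = [], 0
        current.append(page)
        current_tokens += n
    if current:
        # Keep the final, possibly shorter, chunk
        chunks.append(' '.join(current))
    return chunks
```

A single page larger than the budget still becomes its own chunk, so an exact count from the tokenizer linked above remains the safer check for unusually dense pages.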

In [35]:
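import os
import re
from tqdm import tqdm
import openai  # legacy interface (openai<1.0): ChatCompletion, Image

# Read the API key from the environment instead of hardcoding it in the notebook
openai.api_key = os.environ['OPENAI_API_KEY']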

start_pno = 2
summarize_every = 15
summary_list = [{
    'role': 'system',
    'content': 'You are a helpful assistant for summarizing books.'
}]

count = 0
content = ''

for pno in tqdm(range(start_pno, doc.page_count)):
    text = doc.get_page_text(pno=pno)

    # Preprocess text
    text = re.sub(r"\s+", " ", text)
    text = text.replace('Downloaded from https://www.holybooks.com', '').strip()
    # Remove page number
    text = ' '.join(text.split(' ')[:-1])
    
    content += text + ' '
    count += 1
    
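    # Summarize every `summarize_every` pages, and also flush the leftover pages at the end of the book
    if count == summarize_every or pno == doc.page_count - 1: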
        messages = [{
            'role': 'system',
            'content': 'You are a helpful assistant for summarizing books.'
        }, {
            'role': 'user',
            'content': f'Summarize this: {content}'
        }]

        res = openai.ChatCompletion.create(
            model='gpt-4',
            messages=messages
        )

        msg = res['choices'][0]['message']['content']

        summary_list.append({
            'role': 'user',
            'content': msg
        })

        count = 0
        content = ''

100%|██████████| 180/180 [04:03<00:00,  1.35s/it]
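
The preprocessing step in the loop above drops the last word of every page on the assumption that it is a page number. A slightly more defensive variant, sketched below, removes the trailing token only when it is numeric. `clean_page_text` is a hypothetical helper name; the footer string is the one used in the cell above.

```python
import re

FOOTER = 'Downloaded from https://www.holybooks.com'


def clean_page_text(text, footer=FOOTER):
    """Collapse whitespace, strip the download footer, and drop a trailing page number."""
    text = re.sub(r'\s+', ' ', text)
    text = text.replace(footer, '').strip()
    words = text.split(' ')
    # Drop the last token only when it looks like a page number
    if words and words[-1].isdigit():
        words = words[:-1]
    return ' '.join(words)
```

With this variant, a page that ends in ordinary prose keeps its final word, while a page that ends in `12` loses only the number.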


In [36]:
summary_list

[{'role': 'system',
  'content': 'You are a helpful assistant for summarizing books.'},
 {'role': 'user',
  'content': '"Demian" is a coming-of-age novel that takes place in a small town, following the life of the protagonist (who is unnamed) as he navigates the two worlds, that of his parents and that of his darker side or the "other" world. The novel begins when the protagonist befriends a rough boy named Franz Kromer, who blackmails him into stealing apples. The protagonist is torn between his loyalty to his family, friends, and the promise of salvation, and the allure of the dark world of crime, passion, and adventure. It is a story of self-discovery, where the protagonist learns to confront his fear and guilt, and make choices that will shape his future.\n\nThe protagonist\'s inner struggle is embodied in the character Demian, who becomes a guiding force in his life. Demian represents the part of the protagonist that seeks to break free from the constraints of society and explore 

## 3. Generate images

Generate one illustration per summary.

In [63]:
import urllib.request  # `import urllib` alone does not guarantee the `request` submodule is loaded
import os

os.makedirs('temp', exist_ok=True)

for i, summary in tqdm(enumerate(summary_list)):
    if summary['role'] != 'user':
        continue
    try:
        res_img = openai.Image.create(
            prompt=f'book illustration, {summary["content"][-350:]}',
            n=1,
            size='512x512'
        )

        img_url = res_img['data'][0]['url']
        img_path = f'temp/{str(i).zfill(3)}.png'

        urllib.request.urlretrieve(img_url, img_path)
    except Exception as e:
        # Typically the prompt was rejected for violating the content policy
        print(f'skip image {i}: {e}')

12it [01:15,  6.29s/it]
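
The prompt above takes the last 350 characters of each summary, which can start in the middle of a word. The sketch below shows one way to trim the slice to a word boundary before prefixing it. `build_image_prompt` is a hypothetical helper, and the defaults simply mirror the values used in the cell above.

```python
def build_image_prompt(summary, max_chars=350, prefix='book illustration, '):
    """Build an image prompt from the tail of a summary, starting at a word boundary."""
    tail = summary[-max_chars:]
    cut_mid_word = (
        len(summary) > max_chars
        and not summary[-max_chars - 1].isspace()
        and not tail[0].isspace()
    )
    if cut_mid_word:
        # Drop the leading word fragment left over from the slice
        parts = tail.split(None, 1)
        if len(parts) > 1:
            tail = parts[1]
    return prefix + tail.strip()
```

For example, `build_image_prompt('alpha beta gamma delta', max_chars=10)` returns `'book illustration, delta'` rather than a prompt beginning with the fragment `amma`.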


## 4. Generate script

In [51]:
# Prompt (Korean): "Summarize the sentences above into a 60-second presentation"
summary_list.append({
    'role': 'user',
    'content': '위 문장들을 60초 발표 분량으로 요약해줘'
})

In [53]:
res = openai.ChatCompletion.create(
    model='gpt-4',
    messages=summary_list
)

script = res['choices'][0]['message']['content']

print(script)

"Demian" follows the life of protagonist Sinclair, who struggles with self-discovery and identity. Through encounters with various people like Demian, Frau Eva, and Pistorius, he learns the importance of following one's own path, embracing individualism, and overcoming obstacles. Eva becomes a symbol of Sinclair's inner self, and together they explore love, philosophy, and self-realization. The story emphasizes the significance of accepting change, forming deep connections, and personal growth in the face of uncertainty and life's inevitable challenges.
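
Whether a script fits in 60 seconds can be checked roughly before generating the avatar. The sketch below assumes a speaking rate of 150 words per minute, a common rule of thumb for English rather than a measured value for the D-ID voice, and `estimate_speech_seconds` is a hypothetical helper. Word counts do not transfer directly to the Korean translation, so treat the result as a rough guide only.

```python
def estimate_speech_seconds(script, words_per_minute=150):
    """Estimate narration length in seconds from the word count (assumed speaking rate)."""
    words = len(script.split())
    return words / words_per_minute * 60


def fits_in(script, limit_seconds=60, words_per_minute=150):
    """Return True when the estimated narration length is within the limit."""
    return estimate_speech_seconds(script, words_per_minute) <= limit_seconds
```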


In [55]:
messages = [{
    'role': 'system',
    'content': 'You are a helpful assistant for summarizing and translating books.'
}, {
    'role': 'user',
    # Prompt (Korean): "Translate into Korean: <script>"
    'content': f'한국어로 번역해줘: {script}'
}]

res = openai.ChatCompletion.create(
    model='gpt-4',
    messages=messages
)

script_ko = res['choices'][0]['message']['content']

print(script_ko)

"Demian"은 주인공 싱클레어의 삶을 따르며, 자기 발견과 정체성에 대한 고민을 겪습니다. 데미안, 프라우 에바, 피스토리우스와 같은 다양한 사람들과의 만남을 통해 그는 자신만의 길을 따르는 중요성, 개인주의를 받아들이기, 그리고 장애물을 극복하는 법을 배웁니다. 에바는 싱클레어의 내면적인 자아의 상징이 되며, 그들은 함께 사랑, 철학, 그리고 자기 실현을 탐구합니다. 이 이야기는 불확실성과 인생의 필연적인 도전에 맞서 개인의 성장, 변화를 받아들이는 것의 중요성, 그리고 깊은 인간관계 형성을 강조합니다.


## Create the AI avatar

Mad Scientist Sarah

> Image generated with Midjourney

<img src="assets/mad scientist sarah.png" width="300px">

## 5. Generate speaking avatar

Create a talking avatar with D-ID Studio (speech synthesis and talking-face synthesis).

https://www.d-id.com

#### Choosing a voice

https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/language-support?tabs=tts#supported-languages

#### API

- https://d-id.readme.io/reference/speaking_portraits_examples
- https://d-id.readme.io/reference/talks-create
- https://d-id.readme.io/reference/talks-get-id

In [56]:
import requests

url = "https://api.d-id.com/talks"

payload = {
    "script": {
        "type": "text",
        "provider": {
            "type": "microsoft",
            "voice_id": "ko-KR-SeoHyeonNeural", # voice type
        },
        "ssml": "false",
        "input": script_ko # script text
    },
    "config": {
        "fluent": "false",
        "pad_audio": "0.0"
    },
    "source_url": "https://i.imgur.com/AkrJpZb.png" # avatar image URL
}

headers = {
    "accept": "application/json",
    "content-type": "application/json",
    "authorization": f"Bearer {os.environ['D_ID_TOKEN']}"  # token read from the environment; the variable name D_ID_TOKEN is an assumption
}

response = requests.post(url, json=payload, headers=headers)

print(response.text)

{"id":"tlk_Us1AAocgqL7n7U7M4__WV","created_at":"2023-03-18T11:37:23.914Z","created_by":"google-oauth2|112395364207828382640","status":"created","object":"talk"}


Note: wait about 60 seconds for the video to be generated before running the next cell.
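
Instead of waiting a fixed 60 seconds, the talk can be polled until it is ready. The sketch below is a generic polling loop; `poll_until_done` is a hypothetical helper, and the status values `'done'`, `'error'`, and `'rejected'` are assumptions based on the `status` field shown in the response above, so check them against the D-ID API reference.

```python
import time


def poll_until_done(fetch, interval=5.0, timeout=180.0, sleep=time.sleep):
    """Call fetch() repeatedly until it returns a dict whose 'status' is 'done'."""
    waited = 0.0
    while True:
        talk = fetch()
        status = talk.get('status')
        if status == 'done':
            return talk
        # Assumed failure states; adjust to the values the API actually reports
        if status in ('error', 'rejected'):
            raise RuntimeError(f'talk failed with status {status!r}')
        if waited >= timeout:
            raise TimeoutError(f'talk not ready after {waited:.0f}s (last status: {status!r})')
        sleep(interval)
        waited += interval
```

With the `url` and `headers` defined in the next cell, a call could look like `talk = poll_until_done(lambda: requests.get(url, headers=headers).json())`, after which `talk['result_url']` points at the finished video.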

In [57]:
url = f"https://api.d-id.com/talks/{response.json()['id']}"

headers = {
    "accept": "application/json",
    "authorization": f"Bearer {os.environ['D_ID_TOKEN']}"  # token read from the environment; the variable name D_ID_TOKEN is an assumption
}

response = requests.get(url, headers=headers)

print(response.text)

{"metadata":{"driver_url":"bank://lively/driver-05/original","mouth_open":false,"num_faces":1,"num_frames":1115,"processing_fps":52.070323049421184,"resolution":[512,512],"size_kib":11598.21875},"audio_url":"https://d-id-talks-prod.s3.us-west-2.amazonaws.com/google-oauth2%7C112395364207828382640/tlk_Us1AAocgqL7n7U7M4__WV/microsoft.wav?AWSAccessKeyId=AKIA5CUMPJBIK65W6FGA&Expires=1679225846&Signature=5tXxU4BZ6AZFXj1y1g%2B33zh%2FKV0%3D","created_at":"2023-03-18T11:37:23.914Z","face":{"mask_confidence":-1,"detection":[349,305,675,702],"overlap":"no","size":594,"top_left":[215,207],"face_id":0,"detect_confidence":0.9990527033805847},"config":{"stitch":false,"pad_audio":0,"align_driver":true,"sharpen":true,"auto_match":true,"normalization_factor":1,"logo":{"url":"d-id-logo","position":[0,0]},"motion_factor":1,"result_format":".mp4","fluent":false,"align_expand_factor":0.3},"source_url":"https://d-id-talks-prod.s3.us-west-2.amazonaws.com/google-oauth2%7C112395364207828382640/tlk_Us1AAocgqL7n7

In [58]:
import urllib.request

urllib.request.urlretrieve(response.json()['result_url'], 'temp/avatar.mp4') 

('temp/avatar.mp4', <http.client.HTTPMessage at 0x7fb225c4bd30>)

### Resize avatar video

In [69]:
!ffmpeg -hide_banner -loglevel error -i temp/avatar.mp4 -s 170x170 -c:a copy temp/avatar170.mp4

## 6. Edit video

Edit the video with MoviePy.

In [2]:
from moviepy.editor import *
from moviepy.audio.io.AudioFileClip import AudioFileClip

avatar_clip = VideoFileClip('temp/avatar170.mp4')

avatar_clip.duration

44.6

Background music generated with Riffusion

https://colab.research.google.com/drive/1Vhp0_QTi88EL-3QIZLWfOGq555W51JDP?usp=sharing

In [20]:
audio_clip = AudioFileClip('temp/bgm.mp3')

audio_clip = audio_clip.volumex(0.2)
audio_clip = audio_clip.set_duration(avatar_clip.duration)

print(audio_clip.duration)

44.6


In [21]:
from glob import glob

paper_imgs = sorted(glob('temp/*.png'))
print(len(paper_imgs))

clips = [ImageClip(m).set_duration(avatar_clip.duration / len(paper_imgs)) for m in paper_imgs]

paper_clip = concatenate_videoclips(clips, method="compose")

paper_clip = paper_clip.set_duration(avatar_clip.duration)

print(paper_clip.duration)

11
44.6
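
The slideshow above gives every illustration an equal share of the avatar clip's duration. The arithmetic can be sketched without MoviePy as follows; `slide_schedule` is a hypothetical helper that returns the start time and duration of each image.

```python
def slide_schedule(total_duration, n_images):
    """Return (start, duration) pairs that split total_duration evenly across n_images."""
    if n_images <= 0:
        raise ValueError('need at least one image')
    per_image = total_duration / n_images
    return [(i * per_image, per_image) for i in range(n_images)]
```

With the values printed above (44.6 seconds and 11 images), each illustration stays on screen for a little over 4 seconds.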


In [4]:
# If Korean text does not render in TextClip, change the font
# font issue: https://github.com/Zulko/moviepy/issues/79
# font: https://fonts.google.com/specimen/Nanum+Gothic
[font for font in TextClip.list('font') if 'NanumGothic' in font]

['NanumGothic', 'NanumGothicBold', 'NanumGothicExtraBold']

In [22]:
w, h = paper_clip.size

print('Resize avatar clip and move position to bottom right')
avatar_clip = avatar_clip.set_pos(('right', 'bottom'))

print('Text animation')
# Title text (Korean): "Demian, a one-minute summary"; the font name matches the list printed above
txt = TextClip("데미안 1분 요약", color='white', font='NanumGothicBold', fontsize=30)
txt_col = txt.on_color(
    size=(txt.w + 10, txt.h + 10),
    color=(0, 0, 0),
    pos=(6, 'center'),
    col_opacity=0.6)
txt_mov = txt_col.set_pos(('center', h / 10))
txt_mov = txt_mov.set_duration(avatar_clip.duration)

print('Composite and write the video file')
result = CompositeVideoClip([paper_clip, avatar_clip, txt_mov])
audios = CompositeAudioClip([avatar_clip.audio, audio_clip])
result = result.set_audio(audios)

result.write_videofile(
    'result.mp4',
    temp_audiofile='temp/audio.m4a',
    remove_temp=True,
    codec='libx264',
    audio_codec='aac',
    threads=32)

Resize avatar clip and move position to bottom right
Text animation
Composite and write the video file
Moviepy - Building video result.mp4.
MoviePy - Writing audio in temp/audio.m4a
MoviePy - Done.
Moviepy - Writing video result.mp4

Moviepy - Done !
Moviepy - video ready result.mp4
