# Whisper48

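<font size='4'>**对于中文用户，推荐在使用前阅读[常见问题说明](https://github.com/Ayanaminn/N46Whisper/blob/main/FAQ.md)。如果你觉得本应用对你有所帮助，欢迎帮助扩散给更多的人。**
**For Chinese-speaking users, we recommend reading the [FAQ](https://github.com/Ayanaminn/N46Whisper/blob/main/FAQ.md) before use. If you find this app helpful, please share it with others.**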


## 更新/What's New：
2023.3.15:
* 添加按空格分割同一行中的多词/句功能/ Allow users to split multiple words/sentences in one line.
* 修订文档以及其它一些优化。/ Update doc and other minor fixes.

2023.3.12:
* 添加chatGPT翻译并生成双语字幕功能/ Add chatGPT translation and bilingual subtitle file generation features.
* 修订文档以及其它一些优化。/ Update doc and other minor fixes.

2023.1.26:
* 修正代码以反映Whisper的更新/ Update script to reflect recent updates from Whisper.

2022.12.31:
* 添加了允许用户从挂载的谷歌云盘中直接选择要转换的文件的功能。本地上传文件的选项仍然保留。/ Allow users to select files directly from the mounted Google Drive; uploading local files is still supported.
* 修订文档以及其它一些优化。/ Update doc and other minor fixes.

## **<font size='5'>顺次点击下方每个单元格左侧的“运行”图标，以开始使用/Run each cell below in order by clicking the "Run" icon on its left to get started**

In [None]:
#@markdown **挂载你的谷歌网盘/Mount Google Drive** 
#@markdown **</br>【重要】:** 务必在"修改"->"笔记本设置"->"硬件加速器"中选择GPU！否则处理速度会非常慢。
#@markdown **</br>【IMPORTANT】:** Make sure to select GPU as the hardware accelerator (Edit -> Notebook settings -> Hardware accelerator); otherwise processing will be very slow.
!pip install geemap
from google.colab import drive
from google.colab import files
import os
import logging
from IPython.display import clear_output 
import geemap

clear_output()
drive.mount('/drive')
print('Google Drive mounted, please execute the next cell')
print('谷歌云盘挂载完毕，请执行下个单元格')
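
The GPU requirement above can also be checked from code before a model is loaded. A minimal sketch, assuming PyTorch is the backend (Whisper runs on it); the helper name `gpu_available` is hypothetical and not part of the notebook:

```python
import importlib.util


def gpu_available() -> bool:
    """Return True if PyTorch is installed and can see a CUDA device."""
    # Avoid an ImportError on machines where PyTorch is not installed
    if importlib.util.find_spec("torch") is None:
        return False
    import torch
    return torch.cuda.is_available()


if not gpu_available():
    print("No GPU detected: select GPU under Edit -> Notebook settings -> Hardware accelerator.")
```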

In [None]:
#@markdown **配置Whisper/Setup Whisper**

! pip install git+https://github.com/openai/whisper.git
! wget https://raw.githubusercontent.com/Ayanaminn/N46Whisper/main/srt2ass.py
clear_output()
print('Whisper installed, please execute the next cell')
print('语音识别库配置完毕，请执行下个单元格')

In [None]:
#@markdown **从谷歌网盘选择文件/Select File From Google Drive**

# @markdown <font size="2">Navigate to the file you want to transcribe, click to highlight it, then click the 'Select' button to confirm.
# @markdown <br/>从网盘目录中选择要转换的文件(视频/音频），单击选中文件，点击'Select'按钮以确认。</font><br/>
# @markdown <br/><font size="2">If you want to upload a local file instead, skip this cell and move to the next one.
# @markdown <br/>若希望从本地上传文件，则跳过此步执行下一单元格。</font><br/>
# @markdown <br/><font size="2">If you uploaded the file to Google Drive after running this cell, run it again to refresh the file list.
# @markdown <br/>若到这一步才上传文件到谷歌盘，则重复执行本单元格以刷新文件列表。</font>
from ipytree import Tree, Node
import ipywidgets as widgets
from ipywidgets import interactive
import os
from google.colab import output 
output.enable_custom_widget_manager()
use_drive = True
global drive_dir
drive_dir = ''

def file_tree():
    # create widgets as a simple file browser
    full_widget = widgets.HBox()
    left_widget = widgets.VBox()
    right_widget = widgets.VBox()

    path_widget = widgets.Text()
    path_widget.layout.min_width = '300px'
    select_widget = widgets.Button(
      description='Select', button_style='primary', tooltip='Select current media file.'
      )
    drive_url = widgets.Output()

    right_widget.children = [select_widget]
    full_widget.children = [left_widget]

    tree_widget = widgets.Output()
    tree_widget.layout.max_width = '300px'
    tree_widget.layout.overflow = 'auto'

    left_widget.children = [path_widget,tree_widget]

    # init file tree
    my_tree = Tree(multiple_selection=False)
    my_tree_dict = {}
    media_names = []

    def select_file(b):
        global drive_dir 
        drive_dir = path_widget.value
        # full_widget.disabled = True
        clear_output()
        print('File selected, please execute the next cell')
        print('已选择文件，请执行下个单元格')
    #     if (out_file not in my_tree_dict.keys()) and (out_dir in my_tree_dict.keys()):
    #         node = Node(os.path.basename(out_file))
    #         my_tree_dict[out_file] = node
    #         parent_node = my_tree_dict[out_dir]
    #         parent_node.add_node(node)

    select_widget.on_click(select_file)

    def handle_file_click(event):
        if event['new']:
            cur_node = event['owner']
            for key in my_tree_dict.keys():
                if (cur_node is my_tree_dict[key]) and (os.path.isfile(key)):
                    try:
                        with open(key) as f:
                            path_widget.value = key
                            path_widget.disabled = False
                            select_widget.disabled = False
                            full_widget.children = [left_widget, right_widget]
                    except Exception as e:
                        path_widget.value = key
                        path_widget.disabled = True
                        select_widget.disabled = True

                        return

    def handle_folder_click(event):
        if event['new']:
            full_widget.children = [left_widget]

    # redirect cwd to default drive root path and add nodes
    my_dir = '/drive/MyDrive'
    my_root_name = my_dir.split('/')[-1]
    my_root_node = Node(my_root_name)
    my_tree_dict[my_dir] = my_root_node
    my_tree.add_node(my_root_node)
    my_root_node.observe(handle_folder_click, 'selected')

    for root, d_names, f_names in os.walk(my_dir):
        # prune hidden folders in place so os.walk does not descend into them
        # (removing items from d_names while iterating over it skips entries)
        d_names[:] = [d for d in d_names if not d.startswith('.')]
        for f_name in f_names:
            # only add media files (case-insensitive extension match)
            if f_name.lower().endswith(('.mp3', '.m4a', '.flac', '.aac', '.wav', '.mp4', '.mkv', '.ts', '.flv')):
                media_names.append(f_name)

        d_names.sort()
        f_names.sort()
        media_names.sort()

        if root not in my_tree_dict:
            name = root.split('/')[-1]  # folder name
            dir_name = os.path.dirname(root)  # parent path of folder
            parent_node = my_tree_dict[dir_name]
            node = Node(name)
            my_tree_dict[root] = node
            parent_node.add_node(node)
            node.observe(handle_folder_click, 'selected')

        if len(media_names) > 0:
            parent_node = my_tree_dict[root]  # parent folder
            parent_node.opened = False
            for f_name in media_names:
                node = Node(f_name)
                node.icon = 'file'
                full_path = os.path.join(root, f_name)
                my_tree_dict[full_path] = node
                parent_node.add_node(node)
                node.observe(handle_file_click, 'selected')
        media_names.clear()

    with tree_widget:
      tree_widget.clear_output()
      display(my_tree)

    return full_widget


tree = file_tree()
tree
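
The file browser above walks the Drive folder, skips hidden folders and keeps only media files. The same filtering can be sketched without the widget code; `list_media_files` and `MEDIA_EXTS` are hypothetical names, and the extension list is copied from the cell above:

```python
import os

# Extensions the notebook's file browser accepts (matched case-insensitively here)
MEDIA_EXTS = ('.mp3', '.m4a', '.flac', '.aac', '.wav', '.mp4', '.mkv', '.ts', '.flv')


def list_media_files(root):
    """Walk `root`, skipping hidden folders, and return sorted media file paths."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Assigning to dirnames[:] mutates the list os.walk uses, so hidden
        # folders are never visited; calling remove() while iterating over
        # the same list would skip items.
        dirnames[:] = sorted(d for d in dirnames if not d.startswith('.'))
        for name in sorted(filenames):
            if name.lower().endswith(MEDIA_EXTS):
                found.append(os.path.join(dirpath, name))
    return found
```

Pruning `dirnames` in place (rather than filtering results afterwards) also avoids walking large hidden folders at all, which matters on a mounted Drive.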


In [None]:
#@markdown **从本地上传文件/Upload Local File**
# @markdown <br/><font size="2">If you already selected a file from Google Drive, skip this cell and move to the next one.
# @markdown <br/>若已选择谷歌盘中的文件，则跳过此步执行下一单元格。</font>

from google.colab import files
use_drive = False
uploaded = files.upload()
file_name = list(uploaded.keys())[0]
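
Whichever way the file arrives, the output files are named after the input file without its extension. Splitting a path on the first `.` breaks when a folder or file name contains extra dots; `os.path.splitext` strips only the final extension. A minimal sketch; the helper `output_paths` is hypothetical:

```python
import os


def output_paths(media_path):
    """Return (path without extension, output folder) for a media file."""
    # splitext only removes the last extension, so "v1.2/live.day1.mp4"
    # keeps its folder name and the ".day1" part of the file name
    base, _ext = os.path.splitext(media_path)
    out_dir = os.path.dirname(os.path.abspath(media_path))
    return base, out_dir


base, out_dir = output_paths('/drive/MyDrive/v1.2/live.day1.mp4')
print(base)     # /drive/MyDrive/v1.2/live.day1
print(out_dir)  # /drive/MyDrive/v1.2
```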

In [None]:
# @markdown **参数设置/Required settings:**


# @markdown **</br>【IMPORTANT】:**<font size="2">Select the input file type (audio or video).
# @markdown **</br>【重要】:** 选择上传的文件类型(视频-video/音频-audio）。</font>

# encoding:utf-8
file_type = "audio"  # @param ["audio","video"]

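# @markdown <font size="2">Model size affects processing time and transcription quality. The latest large-v2 model is used by default to save trial-and-error time.
# @markdown <br/>Note: large-v2 is not always better than large-v1; choose whichever works better for your material.
# @markdown <br/>The default source language is Japanese. If your file is in another language, enter it here.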
# @markdown <br/>模型大小将影响转录时间和质量, 默认使用最新发布的large-v2模型以节省试错时间
# @markdown <br/>默认识别语言为日语，若使用其它语言的视频请自行输入即可。
# @markdown <br/>请注意：large-v2在某些情况下可能未必优于large-v1，请用户自行选择
model_size = "large-v2"  # @param ["base","small","medium", "large-v1","large-v2"]
language = "japanese"  # @param {type:"string"}
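# @markdown <font size="2">Option to split a line of text into multiple lines (sentences) at spaces. The resulting lines temporarily share the same timestamp and are tagged 'adjust_required' as a reminder to retime them manually and avoid overlapping subtitles.
# @markdown <br/>Modest: start a new line only when the text after a space is longer than 5 characters.
# @markdown <br/>Aggressive: start a new line at every space.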
# @markdown <br/>将存在空格的单行文本分割为多行（多句）。分割后的若干行均临时采用相同时间戳，且添加了adjust_required标记提示调整时间戳避免叠轴
# @markdown <br/>普通分割（Modest): 当空格后的文本长度超过5个字符，则另起一行
# @markdown <br/>全部分割（Aggressive): 只要遇到空格即另起一行
is_split = "No"  # @param ["No","Yes"]
split_method = "Modest"  # @param ["Modest","Aggressive"]
# @markdown <font size="2">Please contact us if you would like your subtitle style integrated.
# @markdown <br/>当前支持生成字幕格式/Currently supported subtitle styles:
# @markdown <br/><li>ikedaCN - 特蕾纱熊猫观察会字幕组
# @markdown <br/><li>sugawaraCN - 坂上之月字幕组
# @markdown <br/><li>kaedeCN - 三番目の枫字幕组
# @markdown <br/><li>taniguchiCN - 泪痣愛季応援団
# @markdown <br/><li>asukaCN - 暗鳥其实很甜字幕组
sub_style = "default"  # @param ["default", "ikedaCN", "kaedeCN","sugawaraCN","taniguchiCN","asukaCN"]

# @markdown **高级设置/Advanced settings:（尚不可用/Under development）**

# @markdown <font size="2">Tuning the parameters below may improve transcription quality and avoid some issues, but don't change anything here unless you know what you are doing.
# @markdown <br/>调节以下参数可能会提高转录质量并避免一些问题，但是不懂请不要调

compression_ratio_threshold = 2.4 # @param {type:"number"}
no_speech_threshold = 0.6 # @param {type:"number"}
logprob_threshold = -1.0 # @param {type:"number"}
condition_on_previous_text = "True" # @param ["True", "False"]
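
The Modest and Aggressive split rules described above can be illustrated as follows. This is a sketch of the described behavior only, not the code in `srt2ass.py`; the functions `split_by_spaces` and `split_entry`, the `min_len` parameter and the tuple layout are all hypothetical:

```python
def split_by_spaces(text, method="Modest", min_len=5):
    """Split one subtitle line at spaces, following the Modest/Aggressive rules."""
    parts = text.split()
    if method == "Aggressive":
        # Aggressive: every space starts a new line
        return parts
    lines = parts[:1]
    for part in parts[1:]:
        if len(part) > min_len:
            # Modest: text after a space longer than min_len starts a new line
            lines.append(part)
        else:
            # shorter fragments stay on the current line
            lines[-1] += " " + part
    return lines


def split_entry(start, end, text, method="Modest"):
    """Return (start, end, text, remark) tuples; split pieces share one timestamp."""
    lines = split_by_spaces(text, method)
    # flag split pieces so their timing can be adjusted by hand later
    remark = "adjust_required" if len(lines) > 1 else ""
    return [(start, end, line, remark) for line in lines]


print(split_by_spaces("こんにちは 今日はいい天気ですね はい"))
```

Because every piece keeps the original start and end time, the pieces overlap on screen until they are retimed, which is why the remark is needed.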

In [None]:
#@markdown **运行Whisper/Run Whisper**
#@markdown </br>完成后ass文件将自动下载到本地/The ASS file will be downloaded automatically when finished.

import os
import ffmpeg
import subprocess
import torch
import whisper
import time
import pandas as pd
from urllib.parse import quote_plus
from pathlib import Path
import sys
# assert file_name != ""
# assert language != ""

if use_drive:
    if not drive_dir:
        raise ValueError("No file selected: run the file-selection cell and click 'Select' first.")
    try:
        file_name = drive_dir
        # splitext strips only the final extension, so dots in folder names survive
        file_basename = os.path.splitext(file_name)[0]
        output_dir = os.path.dirname(drive_dir)
    except Exception as e:
        print(f'error: {e}')
else:
    sys.path.append('/drive/content')
    if not os.path.exists(file_name):
        raise ValueError(f"No {file_name} found in current path.")
    else:
        try:
            file_basename = Path(file_name).stem
            output_dir = Path(file_name).parent.resolve()
        except Exception as e:
            print(f'error: {e}')

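if file_type == "video":
    print('提取音频中 Extracting audio from video file...')
    # pass arguments as a list so paths with spaces need no shell quoting;
    # -y overwrites an existing .mp3 instead of waiting for a prompt
    subprocess.run(['ffmpeg', '-y', '-i', file_name, '-f', 'mp3', '-ab', '192000',
                    '-vn', f'{file_basename}.mp3'], check=True)
    print('提取完毕 Done.')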

torch.cuda.empty_cache()
print('加载模型 Loading model...')
model = whisper.load_model(model_size)

# Transcribe
tic = time.time()
print('识别中 Transcribe in progress...')
result = model.transcribe(audio=file_name, language=language, verbose=False)
toc = time.time()
print('识别完毕 Done')
print(f'Time consumption: {toc - tic:.1f}s')

#Write SRT file
from whisper.utils import WriteSRT
with open(Path(output_dir) / (file_basename + ".srt"), "w", encoding="utf-8") as srt:
    writer = WriteSRT(output_dir)
    writer.write_result(result, srt)
#Convert SRT to ASS

from srt2ass import srt2ass
assSub = srt2ass(file_basename + ".srt", sub_style, is_split,split_method)
print('ASS subtitle saved as: ' + assSub)
files.download(assSub)
# os.remove(file_basename + ".srt")
torch.cuda.empty_cache()
print('字幕生成完毕 All done!')

torch.cuda.empty_cache()
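
The advanced settings are marked as under development and are not passed to Whisper in the cell above. A sketch of how they could be wired in, assuming the `openai-whisper` `transcribe()` keyword arguments of the same names; the helper `transcribe_options` is hypothetical:

```python
def transcribe_options(compression_ratio_threshold=2.4, no_speech_threshold=0.6,
                       logprob_threshold=-1.0, condition_on_previous_text="True"):
    """Build keyword arguments for whisper's transcribe() from the form values."""
    return {
        "compression_ratio_threshold": float(compression_ratio_threshold),
        "no_speech_threshold": float(no_speech_threshold),
        "logprob_threshold": float(logprob_threshold),
        # the form dropdown yields the string "True"/"False", not a bool
        "condition_on_previous_text": condition_on_previous_text == "True",
    }


opts = transcribe_options(condition_on_previous_text="False")
# In the notebook this could then be used as (not executed here):
# result = model.transcribe(audio=file_name, language=language, verbose=False, **opts)
print(opts)
```

Converting the dropdown string explicitly matters: the non-empty string `"False"` is truthy, so passing it through unchanged would still enable `condition_on_previous_text`.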

In [None]:
# @markdown **【实验功能】Experimental Features:**

# @markdown **AI文本翻译/AI Translation:**
# @markdown **</br>**<font size="2"> 此功能允许用户使用AI翻译服务对识别的字幕文件做逐行翻译，并以相同的格式生成双语对照字幕。
# @markdown **</br>**阅读项目文档以了解更多。</font>
# @markdown **</br>**<font size="2"> This feature allows users to translate previously transcribed subtitle text line by line using AI translation,
# @markdown **</br>**and then generate bilingual subtitle files in the same subtitle style. Read the documentation to learn more.</font>

# @markdown **chatGPT:**
# @markdown **</br>**<font size="2"> 要使用chatGPT翻译，请填入你自己的OpenAI API Key，然后执行单元格。</font>
# @markdown **</br>**<font size="2"> Please input your own OpenAI API Key, then execute this cell.</font>
# @markdown **</br>**<font size="2">【注意】 免费的API对速度有所限制，需要较长时间，用户可以自行考虑付费方案。</font>
# @markdown **</br>**<font size="2">【Note】The free API tier is rate-limited, so translation can take a long time; consider a paid plan to speed it up.</font>

# pin openai<1.0: the openai.ChatCompletion API used below was removed in openai 1.0
!pip install "openai<1.0"
import sys
import os
import re
import time
import codecs
import openai
from srt2ass import STYLE_DICT

class ChatGPTAPI():
    """Translate text line by line with the OpenAI ChatCompletion API (openai<1.0)."""

    def __init__(self, key, language):
        self.key = key
        self.language = language
        # number of comma-separated keys, used to scale the back-off delay
        self.key_len = len(key.split(","))

    def _request(self, text):
        openai.api_key = self.key
        completion = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{
                "role": "user",
                # English prompt here to save tokens
                "content": (
                    f"Please help me to translate,`{text}` to {self.language}, "
                    "please return only translated content not include the origin text"
                ),
            }],
        )
        return completion["choices"][0].get("message").get("content")

    def translate(self, text):
        try:
            return self._request(text)
        except Exception as e:
            # time limit for the OpenAI API: wait, then retry once
            # (a paid plan reduces the waiting time)
            sleep_time = int(60 / self.key_len)
            print(e, f"will sleep {sleep_time} seconds")
            time.sleep(sleep_time)
            return self._request(text)

class SubtitleTranslator():
    def __init__(self, srt_src, model, key, language, sub_style):
        self.srt_src = srt_src
        self.translate_model = model(key, language)
        self.sub_style = sub_style


    def read_srt(self, srt_src):
        # try encodings in turn; strict UTF-8 goes first because this notebook
        # writes its SRT files as UTF-8, and a UTF-16 decoder can accept
        # UTF-8 bytes and silently return garbage
        encodings = ["utf-8", "utf-32", "utf-16", "cp1252", "gb2312", "gbk", "big5"]
        tmp = ''
        for enc in encodings:
            try:
                with codecs.open(srt_src, mode="r", encoding=enc) as fd:
                    tmp = fd.read()
                    break
            except UnicodeDecodeError:
                continue
        return [tmp, enc]

    def extract_srt(self):
        src = self.read_srt(self.srt_src)
        content = src[0]
        # encoding = src[1] # Will not encode so do not need to pass codec para
        src = ''
        utf8bom = ''

        if u'\ufeff' in content:
            content = content.replace(u'\ufeff', '')
            utf8bom = u'\ufeff'

        content = content.replace("\r", "")
        sub_lines = [x.strip() for x in content.split("\n") if x.strip()]
        return sub_lines

    def translate_by_line(self):
        # fail fast, before any slow and rate-limited API call, if the chosen
        # style has no Dialogue style name below
        supported_styles = ('default', 'ikedaCN', 'sugawaraCN', 'kaedeCN', 'taniguchiCN')
        if self.sub_style not in supported_styles:
            raise ValueError(f"sub_style '{self.sub_style}' is not supported by the translator yet")

        utf8bom = ''
        subLines = ''
        dlgLines = ''
        lineCount = 0
        sub_lines = self.extract_srt()
        output_file = '.'.join(self.srt_src.split('.')[:-1])
        output_file += '_translate.ass'

        for ln in range(len(sub_lines)):
            line = sub_lines[ln]
            # index element: a number followed by a timestamp line
            # (the bounds check avoids an IndexError on the last line)
            if (line.isdigit() and ln + 1 < len(sub_lines)
                    and re.match(r'-?\d\d:\d\d:\d\d', sub_lines[ln + 1])):
                # flush the previous dialogue line and start a new one
                if dlgLines:
                    subLines += dlgLines + "\n"
                dlgLines = ''
                lineCount = 0
                continue
            else:
                # timestamp element: construct the timestamp part of the dialogue line
                if re.match(r'-?\d\d:\d\d:\d\d', line):
                    line = line.replace('-0', '0')
                    if self.sub_style == 'default':
                        dlgLines += 'Dialogue: 0,' + line + ',default,,0,0,0,,'
                    elif self.sub_style == 'ikedaCN':
                        dlgLines += 'Dialogue: 0,' + line + ',池田字幕1080p,,0,0,0,,'
                    elif self.sub_style == 'sugawaraCN':
                        dlgLines += 'Dialogue: 0,' + line + ',中字 1080P,,0,0,0,,'
                    elif self.sub_style == 'kaedeCN':
                        dlgLines += 'Dialogue: 0,' + line + ',den SR红色,,0,0,0,,'
                    elif self.sub_style == 'taniguchiCN':
                        dlgLines += 'Dialogue: 0,' + line + ',正文_1080P,,0,0,0,,'
                # text element: append the original line and its translation
                else:
                    t_line = self.translate_model.translate(line).strip()
                    # the first text line follows the timestamp directly;
                    # any further lines of the same entry start on a new line
                    prefix = '' if lineCount < 2 else "\n"
                    dlgLines += prefix + line + r'\N' + t_line
                    print(line + r'\N' + t_line)
                lineCount += 1

        subLines += dlgLines + "\n"

        subLines = re.sub(r'\d(\d:\d{2}:\d{2}),(\d{2})\d', '\\1.\\2', subLines)
        subLines = re.sub(r'\s+-->\s+', ',', subLines)

        if self.sub_style == 'default':
            head_name = 'head_str_default'
        elif self.sub_style == 'ikedaCN':
            head_name = 'head_str_ikeda'
        elif self.sub_style == 'sugawaraCN':
            head_name = 'head_str_sugawara'
        elif self.sub_style == 'kaedeCN':
            head_name = 'head_str_kaede'
        elif self.sub_style == "taniguchiCN":
            head_name = 'head_str_taniguchi'

        head_str = STYLE_DICT.get(head_name)
        output_str = utf8bom + head_str + '\n' + subLines
        # encode again for head string
        output_str = output_str.encode('utf8')

        with open(output_file, 'wb') as output:
            output.write(output_str)

        output_file = output_file.replace('\\', '\\\\')
        output_file = output_file.replace('/', '//')
        return output_file



clear_output()

translate_model = ChatGPTAPI
openai_key = '' # @param {type:"string"}
target_language = 'zh-hans'
srt_file = file_basename + ".srt"

assert translate_model is not None, "unsupported model"
OPENAI_API_KEY = openai_key

if not OPENAI_API_KEY:
    raise Exception(
        "OpenAI API key not provided: paste your key into the openai_key field and run this cell again"
    )
# else:
#     OPENAI_API_KEY = openai_key

t = SubtitleTranslator(
    srt_src=srt_file,
    model= translate_model,
    key = OPENAI_API_KEY,
    language=target_language,
    sub_style = sub_style)

translation = t.translate_by_line()
files.download(translation)
print('双语字幕生成完毕 All done!')

# @markdown **</br>**<font size='4'>**实验功能的开发亦是为了尝试帮助大家更有效率的制作字幕。但是只有在用户实际使用体验反馈的基础上，此应用才能不断完善，如果您有任何想法，都欢迎以任何方式联系我，提出[issue](https://github.com/Ayanaminn/N46Whisper/issues)或者分享在[讨论区](https://github.com/Ayanaminn/N46Whisper/discussions)。**
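# @markdown **</br>**<font size='4'>**The experimental features are meant to help everyone make subtitles more efficiently, and this application can only improve with feedback from everyday users. Please feel free to contact me in any way, open an [issue](https://github.com/Ayanaminn/N46Whisper/issues), or share your thoughts in the [discussions](https://github.com/Ayanaminn/N46Whisper/discussions).**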
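
The translator builds each ASS event from the SRT timestamp line plus the original and translated text. A minimal standalone sketch of that conversion, reusing the same regular expression as `translate_by_line`; the function names are hypothetical, only the `default` style name is assumed, and (as in the notebook) the timestamp rewrite assumes durations under 10 hours:

```python
import re


def srt_time_to_ass(ts):
    """'00:00:01,234' -> '0:00:01.23' (ASS uses h:mm:ss.cc with centiseconds)."""
    return re.sub(r'\d(\d:\d{2}:\d{2}),(\d{2})\d', r'\1.\2', ts)


def bilingual_dialogue(time_line, text, translation, style="default"):
    """Build one bilingual ASS Dialogue line from an SRT timestamp line."""
    start, end = (srt_time_to_ass(t) for t in re.split(r'\s+-->\s+', time_line.strip()))
    # r'\N' is the ASS hard line break: original text on top, translation below
    return f"Dialogue: 0,{start},{end},{style},,0,0,0,," + text + r"\N" + translation


print(bilingual_dialogue("00:00:01,234 --> 00:00:03,456", "おはよう", "早上好"))
```

Keeping both languages in one event with `\N` means the pair always appears and disappears together, so no second set of timestamps is needed.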