<a href="https://colab.research.google.com/github/nosai-01/geojson/blob/main/whisper_mock_en.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 【Master】 whisper-mock
Whisper is a general-purpose speech recognition model open-sourced by OpenAI.

## 📖 How to use
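1. Run "Setting up".
2. Open the folder icon in the left sidebar.
3. Upload your audio files into the `content` folder.
4. Enter the audio file name in the `fileName` field.
5. Select the language spoken in the audio (`lang`) and, optionally, the model size (`model_size`).
6. Run "Transcription".
7. Run "Download a transcription file" to download the results as `download.zip`.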
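
Steps 3 and 4 must match exactly: the Transcription cell opens `content/{fileName}`, so a typo in `fileName` only surfaces later as a failure inside `whisper.load_audio`. A minimal pre-check is sketched below. It assumes the `content` folder layout used by this notebook, and `check_audio_file` is a hypothetical helper, not part of Whisper.

```python
from pathlib import Path


def check_audio_file(file_name: str, folder: str = "content") -> Path:
    """Return the path of an uploaded audio file or raise a descriptive error.

    Hypothetical helper; it mirrors the f"content/{fileName}" path that the
    Transcription cell passes to whisper.load_audio.
    """
    folder_path = Path(folder)
    path = folder_path / file_name
    if path.is_file():
        return path
    # List the files that *were* uploaded so a typo in fileName is easy to spot
    available = (
        sorted(p.name for p in folder_path.iterdir() if p.is_file())
        if folder_path.is_dir()
        else []
    )
    raise FileNotFoundError(f"{path} not found; files in {folder}/: {available or 'none'}")
```

For example, calling `check_audio_file(fileName)` in a cell after step 4 either returns the path or lists the files that were actually uploaded.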

In [1]:
#@title Setting up
# Install Whisper from its GitHub repository
!pip install git+https://github.com/openai/whisper.git

import os

# Create the input ("content") and output ("download") folders if they don't exist
os.makedirs("content", exist_ok=True)
os.makedirs("download", exist_ok=True)

Collecting git+https://github.com/openai/whisper.git
  Cloning https://github.com/openai/whisper.git to /tmp/pip-req-build-2hr2n8ql
  Running command git clone --filter=blob:none --quiet https://github.com/openai/whisper.git /tmp/pip-req-build-2hr2n8ql
  Resolved https://github.com/openai/whisper.git to commit e8622f9afc4eba139bf796c210f5c01081000472
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting tiktoken==0.3.3 (from openai-whisper==20230314)
  Downloading tiktoken-0.3.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.7/1.7 MB 7.8 MB/s eta 0:00:00
Building wheels for collected packages: openai-whisper
  Building wheel for openai-whisper (pyproject.toml) ... done
  Created wheel for openai-whisper: filename=openai_whisper-20230314-py3-
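
`whisper.load_audio` decodes audio by running the `ffmpeg` command-line tool, which `pip install` does not provide. Colab images normally include it, but a local or custom runtime may not. A small sketch for checking before transcribing; `ffmpeg_available` is a hypothetical helper name:

```python
import shutil
import subprocess


def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable is on PATH and runs."""
    exe = shutil.which("ffmpeg")
    if exe is None:
        return False
    # `ffmpeg -version` exits with status 0 on a working install
    result = subprocess.run([exe, "-version"], capture_output=True)
    return result.returncode == 0
```

If it returns `False`, ffmpeg has to be installed with the system package manager (on Colab or other Ubuntu/Debian machines, `!apt-get install -y ffmpeg`).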

In [2]:
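#@title Transcription
import whisper

fileName = "audio1829392884_2.mp3"#@param {type:"string"}
lang = "ja"#@param ["ja", "en"]
model_size = "large"#@param ["tiny", "base", "small", "medium", "large"]
model = whisper.load_model(model_size)

# Load the audio file (decoded by ffmpeg and resampled to 16 kHz mono)
audio = whisper.load_audio(f"content/{fileName}")
outputTextsArr = []

# The decoding options are the same for every chunk, so build them once
options = whisper.DecodingOptions(language=lang, without_timestamps=True)

# Whisper decodes 30-second windows, so transcribe the audio chunk by chunk
while audio.size > 0:
  # Trim to 30 s (or zero-pad the final, shorter chunk)
  trimmedAudio = whisper.pad_or_trim(audio)
  # Drop the samples that were just consumed
  audio = audio[trimmedAudio.size:]

  # n_mels must match the model (e.g. 128 for large-v3 in current Whisper releases)
  mel = whisper.log_mel_spectrogram(trimmedAudio, n_mels=model.dims.n_mels).to(model.device)

  result = whisper.decode(model, mel, options)
  outputTextsArr.append(result.text)

outputTexts = '\n'.join(outputTextsArr)  # Join the chunk transcripts, one per line
print(outputTexts)

# Write the transcript to a text file
with open(f'download/{fileName}.txt', 'w', encoding='utf-8') as f:
  f.write(f'▼Transcription of {fileName}\n\n')  # Header followed by a blank line
  f.write(outputTexts)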

100%|██████████████████████████████████████| 2.87G/2.87G [00:20<00:00, 148MiB/s]
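
The `while` loop in the Transcription cell consumes the audio in fixed windows: `whisper.pad_or_trim` cuts (or zero-pads) to 30 s, and `whisper.load_audio` resamples to 16 kHz, so each pass handles 30 × 16,000 = 480,000 samples. The NumPy sketch below reproduces only that splitting arithmetic, without Whisper, so the number of `decode` calls for a given duration can be checked. `split_into_windows` is a hypothetical name, and the constants are assumed to mirror Whisper's `SAMPLE_RATE` and `CHUNK_LENGTH`.

```python
import numpy as np

SAMPLE_RATE = 16_000                      # Hz; whisper.load_audio resamples to this rate
CHUNK_LENGTH = 30                         # seconds per window passed to the model
N_SAMPLES = SAMPLE_RATE * CHUNK_LENGTH    # 480,000 samples per window


def split_into_windows(audio: np.ndarray) -> list[np.ndarray]:
    """Split 1-D audio into N_SAMPLES-long windows, zero-padding the last one.

    Mirrors the notebook loop: take N_SAMPLES, drop them, repeat until empty.
    """
    windows = []
    while audio.size > 0:
        window = audio[:N_SAMPLES]
        if window.size < N_SAMPLES:
            # Like pad_or_trim, pad the short final window with zeros
            window = np.pad(window, (0, N_SAMPLES - window.size))
        windows.append(window)
        audio = audio[N_SAMPLES:]
    return windows


# 70 s of (silent) audio needs three decode calls: 30 s + 30 s + 10 s padded to 30 s
windows = split_into_windows(np.zeros(70 * SAMPLE_RATE, dtype=np.float32))
print(len(windows))  # 3
```

Because the windows are cut at fixed sample offsets, a word that straddles a 30 s boundary can end up split across two chunks. Whisper's higher-level `model.transcribe()` handles long files itself and is an alternative to the manual loop when that matters.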




In [3]:
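#@title Download a transcription file
from google.colab import files

# Bundle every transcript in download/ into a single archive
!zip -r download.zip download
# Send the archive to the browser as a file download
files.download("download.zip")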

  adding: download/ (stored 0%)
  adding: download/audio1829392884_2.mp3.txt (deflated 70%)
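
The `!zip` line is a shell escape that relies on the `zip` binary. Python's standard library can build an equivalent archive when that binary is missing or the step has to run outside a notebook. A sketch using `shutil.make_archive`; `zip_folder` is a hypothetical helper:

```python
import shutil
from pathlib import Path


def zip_folder(folder: str = "download", archive_base: str = "download") -> str:
    """Zip `folder` into `<archive_base>.zip` and return the archive path.

    The folder name is kept as the top-level directory inside the archive,
    like `zip -r download.zip download`.
    """
    folder_path = Path(folder).resolve()
    return shutil.make_archive(
        archive_base,
        "zip",
        root_dir=folder_path.parent,  # archive member paths are relative to this
        base_dir=folder_path.name,    # ...and start with the folder's own name
    )
```

In Colab, `files.download(zip_folder())` could then stand in for the `!zip` and `files.download` lines.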

