Skip to content

Quick, light and transparent audio transcription - locally.

License

Notifications You must be signed in to change notification settings

sondelll/crokket

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

49 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Crokket

- The "I didn't come here to be a stenographer" ASR AI, powered by pretrained models from OpenAI

Requirements

  • Windows
  • Python 3.11
  • Venv (Should be included with the Python install, otherwise: pip install venv)
  • Cuda-enabled graphics card (Only necessary if you want it to be.. not slow.)
  • Cuda toolkit

Updates

If there has been an update, there's a possibility that some dependencies changed, if you're unsure you can run conda env update to update any dependency changes that has been made.

Installation

There are a couple of steps, but it should be a fairly quick job. This guide assumes you are at the root of the repository when running commands.

Install Python and CUDA Toolkit

  • winget install Python.Python.3.11
  • winget install Nvidia.CUDA

Create and activate your virtual environment

  • python -m venv .venv
  • .venv/Scripts/activate

Install the PIP requirements

  • python -m pip install -r requirements.txt

Getting started

We suggest using the terminal inside Visual Studio Code, but if you have a preferred way of command-line usage - go right ahead.

Take your audio file, (only .wav files are supported as of right now), and place it somewhere you can easily copy or remember the file path.

To transcribe the audio, use python main.py path/to/file.wav. This runs Crokket in direct invokation mode, skipping the interactive part and performing slightly better in terms of speed. (Be sure to use paths that look/something/like/this.wav and not\like\this.wav, there's no support for backslash paths yet) If you would prefer to use the graphical interface, just run python main.py and press enter on the message about using the GUI. The transcript with the same name as your audio file can be found under the data folder in this directory.

NOTE: If it is the first time you launch the program, it will download around 3 GB (assuming you're using the default medium model).

A somewhat modern consumer computer with a 30-series or newer graphics card is expected to transcribe in the ranges of 4-6 hours of audio per hour of runtime. The output is of course not perfect but it should save you some time.

ALWAYS CHECK YOUR OUTPUT AND MAKE THE NECESSARY CORRECTIONS

Best of luck.

About

Quick, light and transparent audio transcription - locally.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages