A simple Python project that packages transcription, speaker diarization, and face detection together.
More to come!
- Prepare the `.env` file in the root of the cloned project with the following keys:
  - `HUGGING_FACE_TOKEN`: Specifies the Hugging Face token to use
  - `SHOULD_TRAIN_FACES_BEFORE_EXECUTION`: Specifies whether face recognition data should be retrained before execution
  - `SHOULD_SHOW_PREVIEWS`: Specifies whether a preview of the current frame will be displayed during processing
  - `FILENAME_TO_PROCESS`: Specifies the filename to process, without its extension

  ```
  HUGGING_FACE_TOKEN=xxxxx
  SHOULD_TRAIN_FACES_BEFORE_EXECUTION=1
  SHOULD_SHOW_PREVIEWS=1
  FILENAME_TO_PROCESS="Rick Astley - Never Gonna Give You Up"
  ```
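These keys might be read at runtime roughly like this — a minimal sketch using only the standard library; the `load_settings` function name and the defaults are assumptions, and the real project may load the `.env` file with a helper such as python-dotenv:

```python
import os

def load_settings() -> dict:
    """Read the project's environment keys into a plain dict.

    Assumes the .env values are already present in the environment
    (e.g. exported by the shell or loaded via python-dotenv).
    """
    return {
        "hugging_face_token": os.environ.get("HUGGING_FACE_TOKEN", ""),
        # "1" enables a flag; anything else disables it
        "should_train_faces": os.environ.get("SHOULD_TRAIN_FACES_BEFORE_EXECUTION", "0") == "1",
        "should_show_previews": os.environ.get("SHOULD_SHOW_PREVIEWS", "0") == "1",
        "filename_to_process": os.environ.get("FILENAME_TO_PROCESS", ""),
    }

os.environ["SHOULD_SHOW_PREVIEWS"] = "1"  # example value for illustration
settings = load_settings()
```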
- Accept the conditions of use for the `pyannote/speaker-diarization-3.1` pipeline on Hugging Face.
- Install Homebrew, then run `brew install ffmpeg cmake` in your Terminal.
- Run `pip install -r requirements.txt` in your Terminal, from the project root.
- Create a `media` folder in the project root.
- In the `media` folder, create an `input` folder, and put your video files in that folder.
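With the `.env` example above, the layout might look like this (the `.mp4` extension is only an assumption — use your file's actual extension):

```
media/
└── input/
    └── Rick Astley - Never Gonna Give You Up.mp4
```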
If you are using face recognition (`label_faces`), you will need to prepare a set of embeddings to use.
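Conceptually, labeling works by comparing a detected face's embedding against the trained ones. The sketch below shows nearest-neighbor matching by Euclidean distance; the embedding layout, function names, and the `0.6` threshold are illustrative assumptions, not the project's actual API:

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def label_face(embedding, known, threshold=0.6):
    """Return the closest known name, or None if nothing is within threshold."""
    best_name, best_dist = None, threshold
    for name, known_emb in known.items():
        d = euclidean(embedding, known_emb)
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name

# Toy embeddings (real ones are much higher-dimensional)
known = {"Rick Astley": [0.1, 0.2, 0.3]}
print(label_face([0.12, 0.19, 0.31], known))  # → Rick Astley
print(label_face([5.0, 5.0, 5.0], known))     # → None (too far from any known face)
```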
- Create a `models` folder in the project root.
- In the `models` folder, create a `faces` folder.
- In the `faces` folder, insert photos of the people you want to identify, in one subfolder per person.
- Run `training.py` to generate the embeddings required for face recognition.
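The resulting layout might look like this (the person and file names are purely illustrative):

```
models/
└── faces/
    ├── Rick Astley/
    │   ├── 1.jpg
    │   └── 2.jpg
    └── Another Person/
        └── 1.jpg
```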
- Run `main.py`.
- Open the `output` folder in the `media` folder to see the generated output.