Skip to content
Scripts to download videos using particular search phrases from Pornhub, split the audio components out and run audiogrep to transcribe the audio files
Branch: master
Clone or download
Pull request Compare This branch is even with arunk:master.
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
audio
video
README.md
main.py
process.sh
requirements.txt

README.md

About

These set of files downloads videos from the PornHub website, strip the audio from the video files and run an audio transcription tool called audiogrep on the audio files. Video files are downloaded to ./video and stripped audio files and transcribed text files to ./audio

Install

Use with virtualenv if possible, otherwise to install system-wide ensure that you have installed pip then run the following in a shell

$ sudo pip install -r requirements.txt

Leave out the sudo if you installing it inside a virtual environment.

Running

Open main.py and edit the following if required:

SEARCH_PHRASES - add search phrases to this list. the videos listed on the search result pages of these phrases will be downloaded

NUM_PAGES - the maximum number of pages of search results to download videos from.

MAX_DURATION - the longest duration of video to download.

Now run

$ python main.py

This might take a few hours to run depending on the number of search phrases, number of pages, maximum duration, internet connection speed etc.

Then run

$ bash process.sh

This might take a few hours again depending on the number of videos downloaded.

Finding god

Use grep

$ grep -lir "god" audio/*.txt
You can’t perform that action at this time.