Version 0.1
A context-aware EPUB scanner identifying objectionable content (profanity, hate speech, violence, sexual content) using AI models and customizable optimization levels.
- Keyword-based initial filtering with context-aware exclusions
- Transformer models for AI analysis:
  - Toxic-BERT (profanity)
  - RoBERTa Hate Speech
  - Twitter-RoBERTa Sentiment
- Phase 3 optimization levels (A/B/C) for 8GB VRAM:
  - Option A: Enhanced analysis + violence detector + caching
  - Option B: Option A + 8-bit quantization + batch processing
  - Option C: Option B + 4-bit quantization + quantized Llama Guard
- Custom violence detector
- Smart caching to avoid redundant AI calls
- Configurable entirely via scanner_settings.py
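The keyword pre-filter and the cache from the list above could fit together roughly as follows. This is a minimal sketch, not the scanner's actual code: `KEYWORDS`, `SAFE_PHRASES`, `keyword_hits`, `ScoreCache`, and `scan` are hypothetical names, the word lists are placeholders, and the model is replaced by any callable that returns a score.

```python
import hashlib
import re

# Hypothetical lists for illustration; the project's real lists are not shown here.
KEYWORDS = {"kill", "shoot"}
SAFE_PHRASES = [
    r"\bkill(?:ed|s|ing)?\s+time\b",  # idiom, not violence
    r"\bphoto\s+shoot\b",             # photography, not violence
]


def keyword_hits(sentence):
    """Return keywords that remain after benign phrases are blanked out."""
    text = sentence.lower()
    for pattern in SAFE_PHRASES:
        text = re.sub(pattern, " ", text)
    return sorted(w for w in KEYWORDS if re.search(rf"\b{re.escape(w)}", text))


class ScoreCache:
    """Memoize model scores by text hash so repeated passages are scored once."""

    def __init__(self, score_fn):
        self._score_fn = score_fn  # stand-in for a real model call
        self._scores = {}
        self.model_calls = 0

    def score(self, text):
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._scores:
            self.model_calls += 1
            self._scores[key] = self._score_fn(text)
        return self._scores[key]


def scan(sentences, cache):
    """Send only sentences with keyword hits to the (cached) model."""
    flagged = []
    for sentence in sentences:
        hits = keyword_hits(sentence)
        if hits:
            flagged.append((sentence, hits, cache.score(sentence)))
    return flagged
```

With this arrangement the expensive model only sees sentences that survive the keyword stage, and a sentence that appears twice in a book costs one model call.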
- Clone the repository:
    git clone https://github.com/jree101/book-content-scanner.git
    cd book-content-scanner
- Install dependencies:
    pip install -r requirements.txt
- Place EPUB files in the books/ folder.
- Run the scanner:
    python book_scanner.py
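Before running the scanner, it can help to confirm that the books/ folder holds readable EPUB files. The sketch below is an optional pre-flight check, not part of the project: `find_epubs` is a hypothetical helper, and it relies only on the fact that an EPUB file is a ZIP container.

```python
import zipfile
from pathlib import Path


def find_epubs(folder="books"):
    """Return the .epub files in `folder` that are valid ZIP archives."""
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(
        path
        for path in root.iterdir()
        if path.suffix.lower() == ".epub" and zipfile.is_zipfile(path)
    )


if __name__ == "__main__":
    for path in find_epubs():
        print(path.name)
```

Files with an .epub extension that are not valid ZIP archives are skipped, which catches truncated downloads early.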
Edit scanner_settings.py:
    PHASE3_LEVEL = "A"  # Change to "B" or "C" for more optimizations
- Adjust the CONFIDENCE_LEVELS thresholds
- Enable/disable models based on hardware
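As an illustration of how these settings might be interpreted, the sketch below maps each Phase 3 level to the features listed for Options A, B, and C and applies a per-category threshold. Only `PHASE3_LEVEL` and `CONFIDENCE_LEVELS` come from this README. The feature names, the dictionary shape of `CONFIDENCE_LEVELS`, the threshold values, and the helper functions are all assumptions.

```python
# Hypothetical feature names, derived from the Option A/B/C descriptions.
LEVEL_A = frozenset({"enhanced_analysis", "violence_detector", "caching"})
LEVEL_B = LEVEL_A | {"quantization_8bit", "batch_processing"}
LEVEL_C = LEVEL_B | {"quantization_4bit", "quantized_llama_guard"}
PHASE3_FEATURES = {"A": LEVEL_A, "B": LEVEL_B, "C": LEVEL_C}

PHASE3_LEVEL = "A"

# Assumed shape: one minimum score per category. The values are placeholders.
CONFIDENCE_LEVELS = {"profanity": 0.80, "hate_speech": 0.85, "violence": 0.75}


def features_for(level):
    """Return the feature set for a level, since each level builds on the previous one."""
    key = str(level).strip().upper()
    if key not in PHASE3_FEATURES:
        raise ValueError(f"PHASE3_LEVEL must be one of A, B, C; got {level!r}")
    return PHASE3_FEATURES[key]


def is_flagged(category, score, thresholds=CONFIDENCE_LEVELS):
    """Flag a passage when the model score meets the category threshold."""
    return score >= thresholds[category]
```

Raising a threshold makes the scanner flag fewer passages in that category, while lowering it trades more false positives for fewer misses.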
book-content-scanner/
|-- .gitignore
|-- LICENSE
|-- README.md
|-- requirements.txt
|-- content_config.py
|-- scanner_settings.py
|-- book_scanner_phase2.py
|-- books/ # Place EPUBs here
|-- scan_reports/ # Generated reports
This project is licensed under the MIT License. See the LICENSE file for details.