Book Content Scanner

Version 0.1

A context-aware EPUB scanner that flags objectionable content (profanity, hate speech, violence, sexual content) using transformer-based AI models, with selectable optimization levels.

Features

- Keyword-based initial filtering with context-aware exclusions
- Transformer models for AI analysis:
  - Toxic-BERT (profanity)
  - RoBERTa Hate Speech
  - Twitter-RoBERTa Sentiment
- Phase 3 optimization levels (A/B/C) for 8GB VRAM:
  - Option A: Enhanced analysis + violence detector + caching
  - Option B: Option A + 8-bit quantization + batch processing
  - Option C: Option B + 4-bit quantization + quantized Llama Guard
- Custom violence detector
- Smart caching to avoid redundant AI calls
- Configurable entirely via scanner_settings.py
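
To make the first and last-but-one features concrete, here is a minimal sketch of how a keyword pre-filter with context-aware exclusions and a result cache could fit together. It is not the scanner's actual code: the names `KEYWORDS`, `EXCLUSIONS`, `flag_keywords`, and `classify_cached` are hypothetical, and the word lists are placeholders.

```python
import hashlib
import re

# Hypothetical keyword and exclusion lists, for illustration only.
KEYWORDS = {"kill", "damn"}
EXCLUSIONS = {"kill": ["kill time", "kill the lights"]}

# In-memory cache mapping a hash of the text to a classifier result.
_cache = {}


def flag_keywords(sentence):
    """Return the keywords found in a sentence, skipping excluded contexts.

    Simplification: if any exclusion phrase for a keyword appears in the
    sentence, every occurrence of that keyword in the sentence is ignored.
    """
    lowered = sentence.lower()
    hits = []
    for word in sorted(KEYWORDS):
        # Whole-word match, so "skill" does not trigger "kill".
        if not re.search(rf"\b{re.escape(word)}\b", lowered):
            continue
        if any(phrase in lowered for phrase in EXCLUSIONS.get(word, [])):
            continue
        hits.append(word)
    return hits


def classify_cached(sentence, classifier):
    """Call classifier(sentence) once per distinct sentence and reuse the result."""
    key = hashlib.sha256(sentence.encode("utf-8")).hexdigest()
    if key not in _cache:
        _cache[key] = classifier(sentence)
    return _cache[key]
```

In this arrangement only sentences for which `flag_keywords` returns a non-empty list would be passed to `classify_cached`, so the expensive model call runs at most once per distinct flagged sentence.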

Installation

Clone the repository:

git clone https://github.com/jree101/book-content-scanner.git
cd book-content-scanner

Install dependencies:
```
pip install -r requirements.txt
```
Place EPUB files in the books/ folder.
Run the scanner:
```
python book_scanner.py
```

Configuration

Edit scanner_settings.py:

PHASE3_LEVEL = "A" # Change to "B" or "C" for more optimizations
Adjust CONFIDENCE_LEVELS thresholds
Enable/disable models based on hardware
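
As a rough illustration of the settings listed above, the fragment below shows one shape such a file could take. Only `PHASE3_LEVEL` and `CONFIDENCE_LEVELS` are names from this README; the dictionary layout, the threshold values, `ENABLED_MODELS`, and `validate_settings` are assumptions made for the example and may not match the real scanner_settings.py.

```python
# Hypothetical excerpt in the style of scanner_settings.py.
PHASE3_LEVEL = "A"  # "A", "B", or "C"

# Assumed layout: category name -> minimum model score needed to flag a passage.
CONFIDENCE_LEVELS = {
    "profanity": 0.80,
    "hate_speech": 0.85,
    "violence": 0.75,
}

# Assumed toggle table; switch off models that do not fit your hardware.
ENABLED_MODELS = {
    "toxic_bert": True,
    "hate_speech": True,
    "sentiment": False,
}


def validate_settings(level, thresholds):
    """Raise ValueError if the level or any threshold is out of range."""
    if level not in {"A", "B", "C"}:
        raise ValueError(f"PHASE3_LEVEL must be 'A', 'B', or 'C', got {level!r}")
    for category, threshold in thresholds.items():
        if not 0.0 <= threshold <= 1.0:
            raise ValueError(f"threshold for {category!r} must be in [0, 1]")


validate_settings(PHASE3_LEVEL, CONFIDENCE_LEVELS)
```

A check like `validate_settings` catches a mistyped level or an out-of-range threshold before any model is loaded.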

Repository Layout

book-content-scanner/
|-- .gitignore
|-- LICENSE
|-- README.md
|-- requirements.txt
|-- content_config.py
|-- scanner_settings.py
|-- book_scanner.py
|-- books/            # Place EPUBs here
|-- scan_reports/     # Generated reports

License

This project is licensed under the MIT License. See the LICENSE file for details.
