<a href="https://colab.research.google.com/github/rahulraimau/ai-book-publisher/blob/main/AI_Book_Publisher_Colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 📚 AI Book Publisher - Google Colab Setup

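This notebook installs the dependencies, sets the Gemini API key, and defines the scrape, rewrite, review, and versioning steps. Once the setup cells below have run, you can run `pipeline.py`, `app.py`, or individual components.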

In [84]:
!pip install -q google-generativeai sentence-transformers beautifulsoup4 nltk chromadb gtts requests

In [88]:
import nltk
nltk.download('punkt')
nltk.download('punkt_tab')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.


True

In [89]:
import os
# Never commit a real key to a shared notebook; substitute your own key here.
os.environ['GEMINI_API_KEY'] = "YOUR_GEMINI_API_KEY"
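
A key stored as a string literal in a shared notebook is visible to anyone who opens the file. As a minimal sketch, the key could instead be read from the environment and requested interactively only when it is missing. The helper name `load_api_key` and its `prompt` parameter are hypothetical and not part of the original notebook.

```python
import os
from getpass import getpass


def load_api_key(env_var="GEMINI_API_KEY", prompt=getpass):
    """Return the key from the environment, asking for it only if it is unset."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # getpass hides the typed value so it does not end up in the cell output
        key = prompt(f"Enter {env_var}: ").strip()
        if not key:
            raise ValueError(f"{env_var} is empty")
        # Store it so later os.getenv(env_var) calls in the notebook can see it
        os.environ[env_var] = key
    return key
```

Calling `load_api_key()` before `genai.configure(...)` would keep the key out of the saved notebook while leaving the rest of the setup unchanged.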

In [94]:
import os, requests, nltk, uuid
from bs4 import BeautifulSoup
from sentence_transformers import SentenceTransformer, util
import google.generativeai as genai
import chromadb
from chromadb.utils.embedding_functions import SentenceTransformerEmbeddingFunction
from gtts import gTTS

# Download required NLTK data
nltk.download('punkt')

# Gemini API setup
genai.configure(api_key=os.getenv("GEMINI_API_KEY"))
reward_model = SentenceTransformer("all-MiniLM-L6-v2")

# ChromaDB setup
chroma_client = chromadb.Client()
embedding_fn = SentenceTransformerEmbeddingFunction("all-MiniLM-L6-v2")
# get_or_create_collection avoids an error when this cell is re-run in the same session
collection = chroma_client.get_or_create_collection("chapters", embedding_function=embedding_fn)

# Step 1: Scrape chapter
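def fetch_chapter(url):
    res = requests.get(url, timeout=30)
    res.raise_for_status()  # fail early instead of parsing an HTTP error page
    soup = BeautifulSoup(res.text, 'html.parser')
    paragraphs = soup.select("div#mw-content-text p")
    # Keep only paragraphs longer than 50 characters to skip captions and stray fragments
    return "\n\n".join(p.get_text() for p in paragraphs if len(p.get_text()) > 50)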

# Step 2: Use Gemini to rewrite chapter
def ai_writer(text):
    model = genai.GenerativeModel("models/gemini-1.5-flash")
    prompt = f"Paraphrase and creatively rewrite the following chapter text:\n\n{text[:2000]}"
    response = model.generate_content(prompt)
    return response.text.strip()

# Step 3: RL-style reviewer
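def ai_reviewer(original, spun):
    # Lexical overlap: Jaccard similarity of the two lowercase token sets
    orig_tokens = set(nltk.word_tokenize(original.lower()))
    spun_tokens = set(nltk.word_tokenize(spun.lower()))
    union = orig_tokens | spun_tokens
    # Two empty texts count as identical, which also avoids dividing by zero
    jaccard_sim = len(orig_tokens & spun_tokens) / len(union) if union else 1.0
    # Semantic overlap: cosine similarity of the sentence embeddings
    orig_emb = reward_model.encode(original, convert_to_tensor=True)
    spun_emb = reward_model.encode(spun, convert_to_tensor=True)
    sim = util.pytorch_cos_sim(orig_emb, spun_emb).item()
    novelty = 1 - sim
    # Since novelty = 1 - sim, the two 0.4-weighted terms always sum to 0.4
    reward = 0.4 * sim + 0.4 * novelty + 0.2 * (1 - jaccard_sim)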
    return {
        "semantic_similarity": round(sim, 3),
        "novelty": round(novelty, 3),
        "jaccard_diversity": round(1 - jaccard_sim, 3),
        "final_reward_score": round(reward, 3)
    }

# Step 4: TTS feedback
def speak(text, filename="tts_output.mp3"):
    tts = gTTS(text)
    tts.save(filename)
    return filename

# Step 5: Save version to ChromaDB
def save_version(text, meta):
    version_id = str(uuid.uuid4())
    collection.add(documents=[text], metadatas=[meta], ids=[version_id])
    return version_id

# Step 6: Semantic search among versions
def semantic_search(query_text):
    result = collection.query(query_texts=[query_text], n_results=1)
    # query() returns one list of documents per query text
    docs = result['documents'][0] if result['documents'] else []
    return docs[0] if docs else "No match found."

# Full human-in-the-loop workflow
def run_interactive_pipeline(url):
    original = fetch_chapter(url)
    current = ai_writer(original)
    print("\n🌀 AI-Spun Version (excerpt):\n", current[:400])

    for i in range(3):
        print(f"\n✍️ Iteration {i+1}: You may refine the text manually below")
        current = input("\nYour rewrite (or press Enter to keep AI version):\n") or current
        reward = ai_reviewer(original[:512], current)
        # ChromaDB metadata values must be scalars, so the reward dict is flattened into the metadata
        version_id = save_version(current, {"iteration": i + 1, **reward})
        print("✅ Reward:", reward)
        print("💾 Saved as version:", version_id)
        speak(current[:300], f"readout_{i+1}.mp3")

    return current, reward

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
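
The `jaccard_diversity` metric in `ai_reviewer` is one minus the Jaccard similarity of the two token sets. The sketch below illustrates that calculation on its own, without NLTK. The regex tokenizer and the function name `jaccard_diversity` are simplifying assumptions for illustration, so its numbers will not exactly match those produced with `nltk.word_tokenize`.

```python
import re


def jaccard_diversity(original, rewritten):
    """Return 1 - |A ∩ B| / |A ∪ B| for the lowercase word sets A and B."""
    # Assumption: a simple regex stands in for nltk.word_tokenize
    a = set(re.findall(r"[a-z0-9']+", original.lower()))
    b = set(re.findall(r"[a-z0-9']+", rewritten.lower()))
    union = a | b
    if not union:
        return 0.0  # two empty texts are treated as identical, so no diversity
    return 1 - len(a & b) / len(union)


# Shared words: {"the", "calm"}; all words: {"the", "sea", "was", "calm", "ocean"}
print(round(jaccard_diversity("the sea was calm", "the calm ocean"), 3))  # 0.6
```

A value near 1 means the rewrite shares few words with the source, and a value near 0 means it reuses most of them.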


In [95]:
chapter_url = "https://en.wikisource.org/wiki/The_Gates_of_Morning/Book_1/Chapter_1"

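# run_pipeline is not defined in this notebook, so the individual steps are called directly
original = fetch_chapter(chapter_url)
spun = ai_writer(original)
reward = ai_reviewer(original[:512], spun)  # same truncation as run_interactive_pipeline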

print("📘 Original (excerpt):\n", original[:400], "\n")
print("🌀 Spun Version (excerpt):\n", spun[:400], "\n")
print("🏆 RL Reward Metrics:\n", reward)

📘 Original (excerpt):
 DICK standing on a ledge of coral cast his eyes to the South.


Behind him the breakers of the outer sea thundered and the spindrift scattered on the wind; before him stretched an ocean calm as a lake, infinite, blue, and flown about by the fishing gulls—the lagoon of Karolin.


Clipped by its forty-mile ring of coral this great pond was a sea in itself, a sea of storm in heavy winds, a lake of az 

🌀 Spun Version (excerpt):
 High on a coral shelf, Dick surveyed his kingdom.  The furious ocean roared behind him, a stark contrast to the placid, sapphire lagoon of Karolin stretching before – a forty-mile ring of coral cradling a sea that shifted between tempest and tranquility.  It was *his* sea now, claimed just yesterday.

Below, the vibrant tapestry of Karolin's life unfolded in the sun-drenched sand: women mending ne 

🏆 RL Reward Metrics:
 {'semantic_similarity': 0.493, 'novelty': 0.507, 'jaccard_diversity': 0.87, 'final_reward_score': 0.574}
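
Because `novelty` is defined as `1 - sim`, the two 0.4-weighted terms in `final_reward_score` sum to a constant, and the score depends only on the Jaccard term. The short check below reproduces the printed score from the rounded metrics above. The function name `final_reward` is hypothetical and used only for this illustration.

```python
def final_reward(sim, jaccard_diversity):
    """Recompute the notebook's reward from its two independent inputs."""
    novelty = 1 - sim
    return 0.4 * sim + 0.4 * novelty + 0.2 * jaccard_diversity


# Rounded metrics printed above: similarity 0.493, Jaccard diversity 0.87
print(round(final_reward(0.493, 0.87), 3))  # 0.574

# Changing the similarity alone leaves the score unchanged
print(round(final_reward(0.1, 0.87), 3))    # 0.574
```

If semantic similarity is meant to influence the score, the two terms would need different weights or a different definition of novelty. The notebook does not say which was intended.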


In [93]:
!pip install -q gtts
