<a href="https://colab.research.google.com/github/MEROO1010/AI-Knowledge-Creativity/blob/main/AI_%2B_Knowledge_%26_Creativity.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# AI + Knowledge & Creativity

This notebook contains four small, self-contained projects using free/open-source tools:

1. **AI Historian** — reconstruct events with RAG-style retrieval from Wikipedia (sentence-transformers embeddings + cosine similarity)
2. **Myth vs Fact AI** — split user text into claims and perform simple verification via Wikipedia
3. **AI Story Continuator** — continue a user story in a chosen author's style using a small HF model
4. **AI Music Mood Composer** — detect mood from text and generate a simple MIDI melody

Each section includes runnable cells. This notebook is meant as a starter/prototype — swap models or extend components as needed.

⚠️ Notes:
- Some cells will download ML models on first run (transformers, sentence-transformers). These can be large.
- If you're running in Colab, prefix pip installs with `!` in a cell or use `%pip install`.

## 0 — Install dependencies
Run this cell once to install required packages. If you are in Google Colab, run as-is. In local Jupyter, you can run it from the notebook (it may take several minutes).

- The installs are conservative: we avoid heavy TensorFlow/Magenta by using a simple MIDI generator (pretty_midi).
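
Before running the install cell, it can help to see which packages are already importable. The sketch below uses only the standard library. The helper name `missing_packages` is hypothetical, and the import names are assumed to correspond to the pip packages listed in the install cell (for example, `faiss-cpu` is imported as `faiss` and `sentence-transformers` as `sentence_transformers`).

```python
import importlib.util

def missing_packages(import_names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in import_names
            if importlib.util.find_spec(name) is None]

# Import names assumed to correspond to the pip packages in the install cell.
wanted = ["transformers", "sentence_transformers", "faiss",
          "wikipedia", "pretty_midi", "librosa", "soundfile", "torch", "tqdm"]
print("Missing:", missing_packages(wanted) or "none")
```

`find_spec` only looks the module up; it does not import it, so the check stays fast even for heavy packages such as `torch`.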

In [1]:
%pip install --quiet transformers sentence-transformers faiss-cpu wikipedia pretty_midi librosa soundfile torch tqdm

# Note: torch is left unpinned so pip can pick a build available for the current Python version.
# If these packages are already installed, pip will skip reinstalling.

## 1 — AI Historian (RAG with Wikipedia + local generator)
This section embeds a few Wikipedia pages with sentence-transformers, retrieves the pages most similar to a query by cosine similarity (the code below does not use FAISS), and then prompts GPT-2 to write a reconstruction that cites the page titles inline.
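
The retriever in the cell below ranks pages by cosine similarity between the query embedding and each page embedding. As a minimal sketch of that ranking step, the example here uses made-up 3-dimensional vectors in place of real sentence-transformers embeddings; the helper names `cosine` and `top_k` are hypothetical and not part of the notebook.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k(query_vec, doc_vecs, titles, k=2):
    """Rank titles by cosine similarity to the query, highest first."""
    scored = [(cosine(query_vec, v), t) for v, t in zip(doc_vecs, titles)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(t, round(s, 3)) for s, t in scored[:k]]

# Toy 3-dimensional "embeddings" (made up for illustration only).
titles = ["French Revolution", "World War II", "Industrial Revolution"]
doc_vecs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.2], [0.5, 0.0, 0.8]]
query_vec = [1.0, 0.0, 0.1]

print(top_k(query_vec, doc_vecs, titles))
```

With only a handful of pages, a full scan like this is cheap; an approximate index would only start to matter for much larger collections.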

In [8]:
# ====================================================
# 🧠 AI Historian - Colab Version
# ====================================================
# This notebook reconstructs historical events using:
# - Wikipedia + Sentence Transformers for retrieval
# - GPT-2 for generative reconstruction
# - 100% free + works in Colab (no FAISS)
# ====================================================

# --- 1️⃣ Install dependencies ---
!pip install -q transformers sentence-transformers wikipedia scikit-learn torch tqdm

# --- 2️⃣ Imports ---
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
import wikipedia
import numpy as np
import pickle
from pathlib import Path
from transformers import pipeline

# --- 3️⃣ Configuration ---
EMB_MODEL = 'all-MiniLM-L6-v2'  # Fast + accurate small model
INDEX_PATH = Path('hist_embeddings.npy')
META_PATH = Path('hist_meta.pkl')

# --- 4️⃣ Wikipedia Retriever ---
class SimpleWikiRetriever:
    def __init__(self, index_path=INDEX_PATH, meta_path=META_PATH, emb_model=EMB_MODEL):
        self.model = SentenceTransformer(emb_model)
        self.index_path = Path(index_path)
        self.meta_path = Path(meta_path)
        self.embeddings = None
        self.meta = []
        if self.index_path.exists() and self.meta_path.exists():
            self._load()

    def build_index(self, titles):
        print("📚 Fetching and embedding Wikipedia pages...")
        pages = []
        for t in titles:
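            try:
                # auto_suggest=False keeps the exact title instead of a library-suggested one
                content = wikipedia.page(t, auto_suggest=False).content
                pages.append((t, content))
            except Exception as e:
                print(f'⚠️ Could not fetch {t}:', e)
        texts = [p[1] for p in pages]
        self.meta = [p[0] for p in pages]
        # Note: the encoder truncates long inputs (all-MiniLM-L6-v2 defaults to 256 word pieces),
        # so each embedding reflects only the opening of the page.
        emb = self.model.encode(texts, show_progress_bar=True, convert_to_numpy=True)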
        np.save(self.index_path, emb)
        with open(self.meta_path, 'wb') as f:
            pickle.dump(self.meta, f)
        self.embeddings = emb
        print('✅ Index built with', len(self.meta), 'pages')

    def _load(self):
        self.embeddings = np.load(self.index_path)
        with open(self.meta_path, 'rb') as f:
            self.meta = pickle.load(f)

    def query(self, text, k=4):
        if self.embeddings is None:
            raise ValueError("Index not built or loaded yet.")
        query_emb = self.model.encode([text], convert_to_numpy=True)
        sims = cosine_similarity(query_emb, self.embeddings)[0]
        topk_idx = np.argsort(sims)[::-1][:k]
        results = []
        for i in topk_idx:
            title = self.meta[i]
            try:
                summary = wikipedia.summary(title, sentences=3)
            except Exception:
                summary = ''
            results.append({'title': title, 'summary': summary, 'score': float(sims[i])})
        return results


# --- 5️⃣ Build or load index ---
seed_titles = [
    'French Revolution', 'Industrial Revolution', 'World War II',
    'Fall of the Western Roman Empire', 'American Civil War'
]

retriever = SimpleWikiRetriever()
if retriever.embeddings is None:
    retriever.build_index(seed_titles)

# --- 6️⃣ Text Generation ---
generator = pipeline('text-generation', model='gpt2', device=-1)

def reconstruct_event(query):
    docs = retriever.query(query, k=4)
    context = '\n\n'.join([f"{d['title']}: {d['summary']}" for d in docs])
    prompt = (
        f"Reconstruct the historical event: {query}\n"
        f"Use these verified sources:\n{context}\n\n"
        f"Write a concise, sourced reconstruction (cite titles inline)."
    )
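    # Bound the output with max_new_tokens: the prompt alone can exceed 300 tokens,
    # so max_length (prompt + output) is the wrong limit here.
    out = generator(prompt, max_new_tokens=256, num_return_sequences=1)[0]['generated_text']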
    return out


# --- 7️⃣ Example Query ---
example_query = "Rise of the Roman Empire"
print(f"🔍 Example: Reconstructing '{example_query}'...\n")
print(reconstruct_event(example_query))


Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=300) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


🔍 Example: Reconstructing 'Causes of the French Revolution'...

Reconstruct the historical event: Rise of the Roman Empire
Use these verified sources:
Fall of the Western Roman Empire: The fall of the Western Roman Empire, also called the fall of the Roman Empire or the fall of Rome, was the loss of central political control in the Western Roman Empire, a process in which the Empire failed to enforce its rule, and its vast territory was divided among several successor polities. The Roman Empire lost the strengths that had allowed it to exercise effective control over its Western provinces; modern historians posit factors including the effectiveness and numbers of the army, the health and numbers of the Roman population, the strength of the economy, the competence of the emperors, the internal struggles for power, the religious changes of the period, and the efficiency of the civil administration. Increasing pressure from invading peoples outside Roman culture also contributed greatly t

## 2 — Myth vs Fact
Split text into claims and verify them with a quick Wikipedia-based heuristic. This is a prototype — for production use, integrate fact-check corpora and claim-checking models.
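
The heuristic can be tried offline before calling Wikipedia. The sketch below splits text into claims and measures what fraction of a claim's key tokens appear in a summary. The summary string is written here for illustration and is not fetched from Wikipedia; `key_tokens` and `overlap_ratio` are hypothetical helper names.

```python
import re

def split_claims(text, min_len=8):
    """Split text on sentence-ending punctuation and drop very short pieces."""
    sentences = re.split(r"(?<=[.?!])\s+", text.strip())
    return [s for s in sentences if len(s) > min_len]

def key_tokens(claim, min_len=4, limit=6):
    """Lowercased word tokens of at least min_len characters, punctuation removed."""
    return [t for t in re.findall(r"\w+", claim.lower()) if len(t) >= min_len][:limit]

def overlap_ratio(claim, summary):
    """Fraction of the claim's key tokens that appear as words in the summary."""
    tokens = key_tokens(claim)
    if not tokens:
        return 0.0
    summary_words = set(re.findall(r"\w+", summary.lower()))
    return sum(t in summary_words for t in tokens) / len(tokens)

# Hypothetical summary text, written for illustration (not fetched from Wikipedia).
summary = "Conspiracy theories claim the Apollo moon landing was staged by NASA."
for claim in split_claims("The moon landing was staged. Nice."):
    print(claim, "->", overlap_ratio(claim, summary))
```

A claim can overlap fully with a summary that exists to debunk it, which is why keyword overlap is better read as "Wikipedia discusses this topic" than as "this claim is true".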


In [3]:
# ====================================================
# 🧠 Myth vs Fact AI - Colab Version
# ====================================================
# Upload or type in myths / conspiracy claims,
# and the AI checks them against Wikipedia summaries
# to determine if they're likely true, uncertain, or unknown.
# ====================================================

# --- 1️⃣ Install dependencies ---
!pip install -q wikipedia

# --- 2️⃣ Imports ---
import re
import wikipedia

# --- 3️⃣ Helper: Split input text into separate claims ---
def split_claims(text):
    """
    Splits an input paragraph into individual factual claims or sentences.
    """
    sents = re.split(r'(?<=[.?!])\s+', text.strip())
    return [s for s in sents if len(s) > 8]


# --- 4️⃣ Wikipedia-based claim verifier ---
def verify_claim_via_wikipedia(claim):
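    """
    Checks a single claim by searching for it on Wikipedia,
    then comparing keywords from the claim with the top summaries.
    Verdicts: likely_true, uncertain, unknown, or error.
    Keyword overlap measures topical relevance only, so a debunked claim
    can still come back as likely_true.
    """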
    try:
        hits = wikipedia.search(claim, results=3)
        if not hits:
            return {'claim': claim, 'verdict': 'unknown', 'evidence': []}

        evidence = []
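        for h in hits:
            try:
                # Search hits are exact titles, so disable auto_suggest to avoid
                # resolving to a different page (e.g. "Apollo 11" -> "Apollo 1").
                sm = wikipedia.summary(h, sentences=2, auto_suggest=False)
                evidence.append({'title': h, 'summary': sm, 'url': wikipedia.page(h, auto_suggest=False).url})
            except Exception:
                continue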

        # Simple heuristic: if enough key tokens overlap with a summary → "likely_true"
        score = 0
        # Tokenize on word characters so trailing punctuation ("aliens.") does not block a match
        key_tokens = [t for t in re.findall(r'\w+', claim.lower()) if len(t) > 3][:6]
        for e in evidence:
            txt = e['summary'].lower()
            if sum(1 for k in key_tokens if k in txt) >= max(1, len(key_tokens)//3):
                score += 1
        verdict = 'likely_true' if score >= 1 else 'uncertain'

        return {'claim': claim, 'verdict': verdict, 'evidence': evidence}

    except Exception as ex:
        return {'claim': claim, 'verdict': 'error', 'error': str(ex), 'evidence': []}


# --- 5️⃣ Example demo ---
sample_text = (
    "The pyramids were built by aliens. "
    "The moon landing was staged. "
    "The French Revolution began in 1789."
)

print("🧩 Splitting input into claims...\n")
claims = split_claims(sample_text)
print("Detected", len(claims), "claims.\n")

for c in claims:
    res = verify_claim_via_wikipedia(c)
    print("⚙️ CLAIM:", res['claim'])
    print("🧾 VERDICT:", res['verdict'])
    for ev in res['evidence']:
        print("   •", ev['title'], "-", ev['url'])
    print("-" * 80)

🧩 Splitting input into claims...

Detected 3 claims.





⚙️ CLAIM: The pyramids were built by aliens.
🧾 VERDICT: likely_true
   • Ancient astronauts - https://en.wikipedia.org/wiki/Ancient_astronauts
   • Chinese pyramids - https://en.wikipedia.org/wiki/Chinese_pyramids
--------------------------------------------------------------------------------
⚙️ CLAIM: The moon landing was staged.
🧾 VERDICT: likely_true
   • Apollo 11 - https://en.wikipedia.org/wiki/Apollo_1
   • Moon landing - https://en.wikipedia.org/wiki/Moon_landing
   • Moon landing conspiracy theories - https://en.wikipedia.org/wiki/Moon_landing_conspiracy_theories
--------------------------------------------------------------------------------
⚙️ CLAIM: The French Revolution began in 1789.
🧾 VERDICT: likely_true
   • French Revolution - https://en.wikipedia.org/wiki/French_Revolution
   • Batavian Revolution - https://en.wikipedia.org/wiki/Batavian_Revolution
   • Estates General of 1789 - https://en.wikipedia.org/wiki/Estates_General_of_1789
-----------------------------------

## 3 — AI Story Continuator
Use a small GPT-Neo model to continue a story. Optionally, provide short example excerpts to emulate an author's style. For copyright safety, use short public-domain excerpts as style examples.
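
The prompt layout and the step that separates new text from the prompt can be sketched without loading a model. In this example, `build_prompt` mirrors the prompt format used in the cell below, and `strip_prompt_tokens` is a hypothetical helper that works on plain lists of stand-in token ids rather than real tokenizer output.

```python
def build_prompt(story_start, author_examples=None):
    """Assemble the text fed to the model; style examples are optional."""
    parts = []
    if author_examples:
        parts.append(f"Author examples:\n{author_examples}")
    parts.append(f"Story start:\n{story_start}")
    parts.append("Continuation:")
    return "\n\n".join(parts)

def strip_prompt_tokens(output_ids, prompt_len):
    """Keep only the tokens generated after the prompt."""
    return output_ids[prompt_len:]

# A short public-domain line (Dickens) stands in as the style example.
prompt = build_prompt(
    "The lanterns shivered in the harbor wind.",
    author_examples="It was the best of times, it was the worst of times.",
)
print(prompt)

# Stand-in token ids (hypothetical): the first 4 are the prompt, the rest are new.
print(strip_prompt_tokens([11, 12, 13, 14, 90, 91], prompt_len=4))
```

Cutting by token count rather than by character count avoids depending on the decoded text matching the prompt exactly.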


In [4]:
# ====================================================
# 📖 AI Story Continuator - Colab Version
# ====================================================
# This notebook lets you start a story and the AI
# continues it in different author styles.
#
# Uses: GPT-Neo 125M (free, runs locally in Colab)
# ====================================================

# --- 1️⃣ Install dependencies ---
!pip install -q transformers torch accelerate

# --- 2️⃣ Imports and setup ---
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Choose model (small, runs fast on CPU or GPU)
MODEL_NAME = 'EleutherAI/gpt-neo-125M'

device = 'cuda' if torch.cuda.is_available() else 'cpu'
print('⚙️ Using device:', device)

# Load model + tokenizer
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).to(device)

# --- 3️⃣ Story continuation function ---
def continue_story(prompt, author_examples=None, max_new_tokens=200, temperature=0.9):
    """
    Continue a story based on an input prompt.
    Optionally guide style using example text from a chosen author.
    """
    if author_examples:
        full = f"Author examples:\n{author_examples}\n\nStory start:\n{prompt}\n\nContinuation:"
    else:
        full = f"Story start:\n{prompt}\n\nContinuation:"

    inputs = tokenizer(full, return_tensors='pt').to(device)
    out = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        top_p=0.95,
        temperature=temperature,
        pad_token_id=tokenizer.eos_token_id
    )
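    # Decode only the newly generated tokens; slicing the decoded string by len(full)
    # can misalign if decoding does not reproduce the prompt character-for-character.
    new_tokens = out[0][inputs['input_ids'].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()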

# --- 4️⃣ Example demo ---
prompt = "The lanterns shivered in the harbor wind as she stepped onto the dock."

print("🧠 Generating story continuation...\n")
story = continue_story(prompt, author_examples=None, max_new_tokens=150)
print("📜 Story Continuation:\n", story)

⚙️ Using device: cpu


🧠 Generating story continuation...

📜 Story Continuation:

Hitchak said, "What about the ship?"

"That wasn't enough."

"That's not what I was thinking."

She nodded. "That was stupid, and I wouldn't have allowed it."

"Why?"

"You know why."

Hitchak said, "You don't think we were stupid, I know that."

"You don't think we were stupid. I know that, too. But I think you're right, but I don't think we weren't stupid, too."

"How can you be so sure?"


## 4 — AI Music Mood Composer (simple MIDI generator)
Detect mood from text (using a simple sentiment model) and generate a short MIDI melody using `pretty_midi`.
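
The melody logic is independent of the MIDI library: each mood selects a scale, and notes are laid out back to back at a fixed duration. The sketch below plans the notes as plain tuples; `plan_notes` is a hypothetical helper, and each `(pitch, start, end)` tuple corresponds to the arguments the cell below passes to `pretty_midi.Note`.

```python
# MIDI pitch numbers for the two scales used in this section.
SCALES = {
    "positive": [60, 62, 64, 65, 67, 69, 71, 72],  # C major
    "negative": [60, 62, 63, 65, 67, 68, 70, 72],  # C natural minor
}

def plan_notes(mood, length=16, dur=0.5):
    """Return (pitch, start, end) tuples that walk up the scale repeatedly."""
    scale = SCALES.get(mood, SCALES["positive"])
    return [(scale[i % len(scale)], i * dur, (i + 1) * dur) for i in range(length)]

for pitch, start, end in plan_notes("negative", length=4):
    print(pitch, start, end)
```

Unknown moods fall back to the major scale, matching the default used by the notebook's own generator.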


In [9]:
# ====================================================
# 🎶 AI Music Mood Composer - Colab Version
# ====================================================
# Converts mood detected from text into a generated
# short melody (MIDI file).
#
# Uses:
# - transformers (sentiment-analysis)
# - pretty_midi (MIDI composition)
# ====================================================

# --- 1️⃣ Install dependencies ---
!pip install -q transformers torch pretty_midi

# --- 2️⃣ Imports ---
from transformers import pipeline
import pretty_midi
import numpy as np
from IPython.display import Audio

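# --- 3️⃣ Initialize sentiment analysis pipeline ---
# Name the model explicitly (this is the checkpoint the pipeline otherwise picks by default)
sent = pipeline('sentiment-analysis', model='distilbert/distilbert-base-uncased-finetuned-sst-2-english')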

# --- 4️⃣ Define musical scales for moods ---
MOOD_SCALES = {
    'positive': [60, 62, 64, 65, 67, 69, 71, 72],  # C major scale (happy)
    'negative': [60, 62, 63, 65, 67, 68, 70, 72],  # C minor scale (sad)
}

# --- 5️⃣ Detect mood from text ---
def detect_mood(text):
    """
    Detects mood (positive / negative) from text using Hugging Face sentiment model.
    """
    r = sent(text)[0]
    lab = r['label'].lower()
    if 'pos' in lab:
        return 'positive'
    elif 'neg' in lab:
        return 'negative'
    else:
        return 'positive'

# --- 6️⃣ Create a simple melody in MIDI format ---
def create_midi_for_mood(mood, filename='mood_melody.mid', length=16):
    """
    Creates a simple melody that reflects the detected mood.
    """
    notes = MOOD_SCALES.get(mood, MOOD_SCALES['positive'])
    pm = pretty_midi.PrettyMIDI()
    inst = pretty_midi.Instrument(program=0)  # Acoustic Grand Piano
    start = 0.0
    dur = 0.5
    for i in range(length):
        note_number = notes[i % len(notes)]
        note = pretty_midi.Note(
            velocity=80, pitch=note_number, start=start, end=start+dur
        )
        inst.notes.append(note)
        start += dur
    pm.instruments.append(inst)
    pm.write(filename)
    return filename

# --- 7️⃣ Demo: Analyze text and compose melody ---
text = "I feel great about the new project."
mood = detect_mood(text)
print("🧠 Detected mood:", mood)

midi_file = create_midi_for_mood(mood, filename="demo_mood.mid")
print("🎵 MIDI file generated:", midi_file)

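# --- 8️⃣ Optional: Convert to audio preview inside Colab ---
# pm.fluidsynth() needs the optional pyfluidsynth package (and the FluidSynth system library);
# without them, the except branch reports that playback is unavailable.
try:
    pm = pretty_midi.PrettyMIDI(midi_file)
    audio = pm.fluidsynth(fs=44100)
    display(Audio(audio, rate=44100))
except Exception as e:
    print("Audio playback not available:", e)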

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


🧠 Detected mood: positive
🎵 MIDI file generated: demo_mood.mid
Audio playback not available: fluidsynth() was called but pyfluidsynth is not installed.
