# AI + Knowledge & Creativity

This notebook contains four small, self-contained projects using free/open-source tools:

1. **AI Historian** — reconstruct events with RAG-style retrieval from Wikipedia (FAISS + sentence-transformers)
2. **Myth vs Fact AI** — split user text into claims and perform simple verification via Wikipedia
3. **AI Story Continuator** — continue a user story in a chosen author's style using a small HF model
4. **AI Music Mood Composer** — detect mood from text and generate a simple MIDI melody

Each section includes runnable cells. This notebook is meant as a starter/prototype — swap models or extend components as needed.

⚠️ Notes:
- Some cells will download ML models on first run (transformers, sentence-transformers). These can be large.
- The install cell uses `%pip install`, which installs into the running kernel in both Colab and local Jupyter; in Colab, `!pip install` also works.


## 0 — Install dependencies
Run this cell once to install required packages. If you are in Google Colab, run as-is. In local Jupyter, you can run it from the notebook (it may take several minutes).

- The installs are conservative: we avoid heavy TensorFlow/Magenta by using a simple MIDI generator (pretty_midi).


In [None]:
%pip install --quiet transformers sentence-transformers faiss-cpu wikipedia pretty_midi torch tqdm

# Note: If you're in an environment that already has these packages, pip will skip reinstalling.

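After the install finishes, a quick standard-library check can confirm that each package is importable before you run the heavier cells. This is a minimal sketch; the list of import names is an assumption matching the pip packages above (note that `faiss-cpu` imports as `faiss` and `sentence-transformers` as `sentence_transformers`).

```python
import importlib.util

# Import names for the pip packages installed above (assumed mapping).
REQUIRED_MODULES = ["transformers", "sentence_transformers", "faiss",
                    "wikipedia", "pretty_midi", "torch", "tqdm"]

def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_modules(REQUIRED_MODULES)
print("Missing modules:", missing if missing else "none")
```

If anything is listed as missing right after installing, restarting the kernel is usually enough for the new packages to be picked up.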

## 1 — AI Historian (RAG with Wikipedia + local generator)
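This section builds a small FAISS index over a few Wikipedia pages, retrieves the pages closest to a query, and passes their summaries to a small text-generation model (GPT-2) with a prompt asking for a reconstruction that cites the page titles inline. GPT-2 follows such instructions only loosely, so treat its output as a draft to check against the retrieved summaries.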
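Retrieval here is exact nearest-neighbour search. The pure-Python sketch below illustrates what `faiss.IndexFlatL2` computes, assuming tiny in-memory vectors: the squared Euclidean distance from the query to every stored embedding, keeping the `k` smallest. It is for illustration only; FAISS does the same job vectorised and much faster. The function name `flat_l2_search` is hypothetical.

```python
def flat_l2_search(vectors, query, k):
    """Exact k-nearest-neighbour search by squared L2 distance.

    Compares the query against every stored vector and keeps the k closest.
    Like FAISS, missing slots are labelled -1 when k exceeds the number of
    stored vectors (this sketch pads their distances with inf).
    """
    dists = []
    for idx, vec in enumerate(vectors):
        d = sum((a - b) ** 2 for a, b in zip(vec, query))
        dists.append((d, idx))
    dists.sort()
    top = dists[:k]
    labels = [idx for _, idx in top] + [-1] * (k - len(top))
    distances = [d for d, _ in top] + [float("inf")] * (k - len(top))
    return distances, labels

# Three stored 2-D "embeddings" and one query.
_, labels = flat_l2_search([[0, 0], [1, 0], [0, 2]], [0.9, 0], k=5)
print(labels)  # [1, 0, 2, -1, -1]
```

The `-1` padding matters when fewer pages were indexed than `k`: a label of `-1` must be skipped before indexing into a Python list, because `meta[-1]` silently returns the last title.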


In [None]:
from sentence_transformers import SentenceTransformer
import faiss
import wikipedia
import numpy as np
import pickle
from pathlib import Path
from transformers import pipeline

EMB_MODEL = 'all-MiniLM-L6-v2'  # small, fast
INDEX_PATH = Path('hist_vector.index')
META_PATH = Path('hist_meta.pkl')

class SimpleWikiRetriever:
    def __init__(self, index_path=INDEX_PATH, meta_path=META_PATH, emb_model=EMB_MODEL):
        self.model = SentenceTransformer(emb_model)
        self.index_path = Path(index_path)
        self.meta_path = Path(meta_path)
        self.index = None
        self.meta = []
        if self.index_path.exists() and self.meta_path.exists():
            self._load()

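    def build_index(self, titles):
        pages = []
        for t in titles:
            try:
                # auto_suggest=False fetches the exact title instead of a search suggestion
                content = wikipedia.page(t, auto_suggest=False).content
                pages.append((t, content))
            except Exception as e:
                print(f'Could not fetch {t}:', e)
        if not pages:
            raise RuntimeError('No pages could be fetched; check the titles and your network connection.')
        texts = [p[1] for p in pages]
        self.meta = [p[0] for p in pages]
        # The embedding model truncates long inputs, so each vector mostly reflects the start of its page
        emb = self.model.encode(texts, show_progress_bar=True)
        d = emb.shape[1]
        self.index = faiss.IndexFlatL2(d)
        self.index.add(np.asarray(emb, dtype='float32'))
        faiss.write_index(self.index, str(self.index_path))
        with open(self.meta_path, 'wb') as f:
            pickle.dump(self.meta, f)
        print('Index built with', len(self.meta), 'pages')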

    def _load(self):
        self.index = faiss.read_index(str(self.index_path))
        with open(self.meta_path, 'rb') as f:
            self.meta = pickle.load(f)

    def query(self, text, k=4):
        emb = self.model.encode([text]).astype('float32')
        D, I = self.index.search(emb, k)
        results = []
        for i in I[0]:
            # FAISS returns -1 when fewer than k vectors exist; meta[-1] would silently return the last title
            if 0 <= i < len(self.meta):
                title = self.meta[i]
                try:
                    summary = wikipedia.summary(title, sentences=3, auto_suggest=False)
                except Exception:
                    summary = ''
                results.append({'title': title, 'summary': summary})
        return results

# Example usage: build a small index (run once)
seed_titles = [
    'French Revolution', 'Industrial Revolution', 'World War II',
    'Fall of the Western Roman Empire', 'American Civil War'
]
retriever = SimpleWikiRetriever()
if retriever.index is None:  # no saved index on disk yet
    retriever.build_index(seed_titles)

# Query function that uses a small generator
generator = pipeline('text-generation', model='gpt2', device=-1)

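def reconstruct_event(query):
    docs = retriever.query(query, k=4)
    context = '\n\n'.join([f"{d['title']}: {d['summary']}" for d in docs])
    prompt = f"Reconstruct the historical event: {query}\nUse these sources:\n{context}\n\nWrite a concise, sourced reconstruction (cite titles inline)."
    # max_new_tokens bounds only the generated text (max_length would also count the long prompt);
    # return_full_text=False drops the prompt; sampling avoids greedy GPT-2's repetitive loops
    out = generator(prompt, max_new_tokens=200, do_sample=True, top_p=0.9,
                    return_full_text=False, num_return_sequences=1)[0]['generated_text']
    return out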

# Try a sample query
print(reconstruct_event('Causes of the French Revolution'))

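`all-MiniLM-L6-v2` truncates input longer than 256 word pieces, so embedding a whole Wikipedia article effectively embeds only its opening. A common remedy is to split each page into overlapping chunks and index every chunk. The sketch below is a minimal word-window chunker; `chunk_words`, its defaults, and the use of words as a rough stand-in for model tokens are all assumptions, not part of the notebook's code.

```python
def chunk_words(text, chunk_size=150, overlap=30):
    """Split text into overlapping windows of whitespace-separated words.

    Word counts only approximate model tokens; 150 words is an assumed size
    meant to stay roughly under a 256-token limit for typical English prose.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks

print(chunk_words("0 1 2 3 4 5 6 7 8 9", chunk_size=4, overlap=1))
# ['0 1 2 3', '3 4 5 6', '6 7 8 9']
```

To use it, `build_index` would embed every chunk and store `(title, chunk)` pairs in `self.meta`, so each hit can still be traced back to its page.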

## 2 — Myth vs Fact
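Split text into sentence-level claims and check each against Wikipedia with a quick keyword-overlap heuristic. The verdict only says whether a related article mentions the claim's key words, not whether the claim is true, so an article that debunks a claim can still yield `likely_true`. This is a prototype; for production use, integrate fact-check corpora and claim-verification models.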
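The scoring step can be isolated and run offline. This sketch mirrors the heuristic in the cell below (tokens longer than three characters, at most six of them, threshold `max(1, n // 3)`) with punctuation stripped; the helper names and the example summary string are made up for illustration.

```python
import re

def key_tokens(claim, min_len=4, limit=6):
    """Lower-cased word tokens of at least min_len characters, punctuation removed."""
    words = re.findall(r"\w+", claim.lower())
    return [w for w in words if len(w) >= min_len][:limit]

def overlap_hits(claim, summary):
    """Count how many of the claim's key tokens occur as substrings of the summary."""
    text = summary.lower()
    return sum(1 for tok in key_tokens(claim) if tok in text)

def passes_threshold(claim, summary):
    """True when enough key tokens match to count this summary as supporting evidence."""
    return overlap_hits(claim, summary) >= max(1, len(key_tokens(claim)) // 3)

claim = "The pyramids were built by aliens."
# A made-up summary that rejects the claim but shares its keywords.
summary = "Claims that aliens built the pyramids are rejected by archaeologists."
print(key_tokens(claim), overlap_hits(claim, summary), passes_threshold(claim, summary))
```

The example shows the limitation described above: the summary disputes the claim, yet three of four key tokens match, so it counts toward `likely_true`.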


In [None]:
import re
import wikipedia


def split_claims(text):
    sents = re.split(r'(?<=[.?!])\s+', text.strip())
    return [s for s in sents if len(s) > 8]


def verify_claim_via_wikipedia(claim):
    try:
        hits = wikipedia.search(claim, results=3)
        if not hits:
            return {'claim': claim, 'verdict': 'unknown', 'evidence': []}
        evidence = []
        for h in hits:
            try:
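                # Search hits are exact titles, so disable auto-suggest when fetching them
                pg = wikipedia.page(h, auto_suggest=False)
                sm = wikipedia.summary(h, sentences=2, auto_suggest=False)
                evidence.append({'title': h, 'summary': sm, 'url': pg.url})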
            except Exception:
                continue
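        # Simple heuristic: if enough claim keywords appear in a hit summary, mark likely_true.
        # This measures topical overlap, not truth: a summary that debunks the claim can still match.
        score = 0
        key_tokens = [t for t in re.findall(r'\w+', claim.lower()) if len(t) > 3][:6]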
        for e in evidence:
            txt = e['summary'].lower()
            if sum(1 for k in key_tokens if k in txt) >= max(1, len(key_tokens)//3):
                score += 1
        verdict = 'likely_true' if score >= 1 else 'uncertain'
        return {'claim': claim, 'verdict': verdict, 'evidence': evidence}
    except Exception as ex:
        return {'claim': claim, 'verdict': 'error', 'error': str(ex), 'evidence': []}

# Demo
sample = 'The pyramids were built by aliens. The moon landing was staged. The French Revolution began in 1789.'
claims = split_claims(sample)
for c in claims:
    res = verify_claim_via_wikipedia(c)
    print('\nCLAIM:', res['claim'])
    print('VERDICT:', res['verdict'])
    for ev in res['evidence']:
        print('-', ev['title'], '-', ev['url'])


## 3 — AI Story Continuator
Use a small GPT-Neo model (125M parameters) to continue a story. Optionally, pass short excerpts as `author_examples` to steer it toward an author's style; a model this small imitates style only loosely. For copyright safety, use short public-domain excerpts as style examples.
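Style examples are simply prepended to the prompt, so keeping each one short both respects the copyright advice above and leaves room in the model's context for the continuation. The hypothetical helper below trims each excerpt to a word budget (an assumed default of 60 words) and produces the same prompt layout that `continue_story` builds; unlike `continue_story`, it takes a list of excerpts.

```python
def build_style_prompt(story_start, excerpts=None, max_words_per_excerpt=60):
    """Assemble a few-shot style prompt: trimmed excerpts, then the story start."""
    parts = []
    if excerpts:
        trimmed = [" ".join(ex.split()[:max_words_per_excerpt]) for ex in excerpts]
        parts.append("Author examples:\n" + "\n\n".join(trimmed))
    parts.append("Story start:\n" + story_start)
    parts.append("Continuation:")
    return "\n\n".join(parts)

print(build_style_prompt("The lanterns shivered.", ["one two three four"], max_words_per_excerpt=2))
```

Because the layout matches, its output could be passed in place of the two f-strings inside `continue_story`.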


In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

MODEL_NAME = 'EleutherAI/gpt-neo-125M'

device = 'cuda' if torch.cuda.is_available() else 'cpu'
print('Using device', device)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).to(device)


def continue_story(prompt, author_examples=None, max_new_tokens=200):
    if author_examples:
        full = f"Author examples:\n{author_examples}\n\nStory start:\n{prompt}\n\nContinuation:"
    else:
        full = f"Story start:\n{prompt}\n\nContinuation:"
    inputs = tokenizer(full, return_tensors='pt').to(device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=True, top_p=0.95,
                         temperature=0.9, pad_token_id=tokenizer.eos_token_id)  # GPT-Neo has no pad token
    # Decode only the newly generated tokens; slicing the decoded text by len(full) can misalign
    new_tokens = out[0][inputs['input_ids'].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

# Demo
start = 'The lanterns shivered in the harbor wind as she stepped onto the dock.'
print(continue_story(start, author_examples=None, max_new_tokens=150))

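The `temperature=0.9` and `top_p=0.95` arguments control how each next token is sampled. The pure-Python sketch below shows the idea conceptually, not the library's exact code: divide logits by the temperature, apply softmax, keep the smallest set of most-likely tokens whose probability mass reaches `top_p` (nucleus sampling), renormalise, and draw one. The function name and toy logits are illustrative.

```python
import math
import random

def sample_top_p(logits, temperature=0.9, top_p=0.95, rng=random):
    """Sample one token from {token: logit} with temperature and nucleus (top-p) filtering."""
    # Temperature rescales logits; values below 1 sharpen the distribution.
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    # Softmax, subtracting the max for numerical stability.
    m = max(scaled.values())
    exps = {tok: math.exp(v - m) for tok, v in scaled.items()}
    total = sum(exps.values())
    probs = sorted(((e / total, tok) for tok, e in exps.items()), reverse=True)
    # Keep the smallest prefix of most-likely tokens whose mass reaches top_p.
    kept, cum = [], 0.0
    for p, tok in probs:
        kept.append((p, tok))
        cum += p
        if cum >= top_p:
            break
    # Draw from the kept tokens in proportion to their probabilities.
    r = rng.random() * cum
    acc = 0.0
    for p, tok in kept:
        acc += p
        if r < acc:
            return tok
    return kept[-1][1]

toy_logits = {"the": 5.0, "a": 1.0, "zebra": -3.0}
print(sample_top_p(toy_logits, rng=random.Random(0)))
```

With a low `top_p`, only the dominant token survives filtering, so the output becomes deterministic; raising `temperature` flattens the distribution and lets rarer words through.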

## 4 — AI Music Mood Composer (simple MIDI generator)
Detect mood from text with a binary (positive/negative) sentiment classifier, then write a short MIDI line over a matching scale (C major or C natural minor) using `pretty_midi`.
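The `MOOD_SCALES` table in the cell below holds two scales rooted at MIDI note 60 (middle C). Deriving them from their whole/half-step patterns makes it easy to transpose them or add moods. This is a small sketch; the constant and function names are hypothetical.

```python
MAJOR_STEPS = [2, 2, 1, 2, 2, 2, 1]          # semitone steps of the major scale
NATURAL_MINOR_STEPS = [2, 1, 2, 2, 1, 2, 2]  # semitone steps of the natural minor scale

def build_scale(root, steps):
    """Return MIDI note numbers for one octave from root, including the octave above."""
    notes = [root]
    for step in steps:
        notes.append(notes[-1] + step)
    return notes

print(build_scale(60, MAJOR_STEPS))          # [60, 62, 64, 65, 67, 69, 71, 72]
print(build_scale(60, NATURAL_MINOR_STEPS))  # [60, 62, 63, 65, 67, 68, 70, 72]
```

Changing `root` (for example to 57 for A) transposes a mood without retyping the note list.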


In [None]:
from transformers import pipeline
import pretty_midi
import numpy as np

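# Name the pipeline's default English sentiment model (labels POSITIVE/NEGATIVE) so results stay reproducible
sent = pipeline('sentiment-analysis', model='distilbert/distilbert-base-uncased-finetuned-sst-2-english')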

MOOD_SCALES = {
    'positive': [60, 62, 64, 65, 67, 69, 71, 72],  # C major
    'negative': [60, 62, 63, 65, 67, 68, 70, 72],  # C natural minor
}

def detect_mood(text):
    r = sent(text)[0]
    lab = r['label'].lower()
    if lab == 'positive':
        return 'positive'
    elif lab == 'negative':
        return 'negative'
    else:
        return 'positive'

def create_midi_for_mood(mood, filename='mood_melody.mid', length=16):
    notes = MOOD_SCALES.get(mood, MOOD_SCALES['positive'])
    pm = pretty_midi.PrettyMIDI()
    inst = pretty_midi.Instrument(program=0)
    start = 0.0
    dur = 0.5
    for i in range(length):
        note_number = notes[i % len(notes)]
        note = pretty_midi.Note(velocity=80, pitch=note_number, start=start, end=start+dur)
        inst.notes.append(note)
        start += dur
    pm.instruments.append(inst)
    pm.write(filename)
    return filename

# Demo
text = 'I feel upbeat and excited about the new project.'
mood = detect_mood(text)
print('Detected mood:', mood)
fn = create_midi_for_mood(mood, filename='demo_mood.mid')
print('MIDI written to', fn)

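`create_midi_for_mood` runs straight up the scale twice, which sounds more like an exercise than a tune. One simple alternative is a random walk over scale degrees. The sketch below is an assumed approach, not part of the notebook: it returns `(pitch, start, end)` triples, and the function name and step choices are illustrative.

```python
import random

def random_walk_melody(scale, length=16, dur=0.5, seed=None):
    """Generate (pitch, start, end) triples by stepping through the scale at random.

    Each step moves -1, 0, or +1 scale degrees, keeping the line stepwise;
    the walk starts on the root and is clamped to the ends of the scale.
    """
    rng = random.Random(seed)
    degree = 0
    notes = []
    for i in range(length):
        start = i * dur
        notes.append((scale[degree], start, start + dur))
        degree = min(max(degree + rng.choice([-1, 0, 1]), 0), len(scale) - 1)
    return notes

print(random_walk_melody([60, 62, 64, 65, 67, 69, 71, 72], length=8, seed=42))
```

Each triple maps directly onto `pretty_midi.Note(velocity=80, pitch=p, start=s, end=e)`, so the loop in `create_midi_for_mood` could append these notes instead of the scale run. Passing a `seed` makes a melody reproducible.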

## Wrap-up
This notebook gives you a compact, runnable starting point for all four projects.

Next steps (suggestions):
- Replace `gpt2` / `gpt-neo-125M` with larger HF models if you have GPU access.
- For Myth vs Fact, integrate the Google Fact Check Tools API or claim-verification datasets such as FEVER.
- For richer music, incorporate Magenta's MelodyRNN/MusicVAE (requires TensorFlow).

