In [2]:
%%capture
%load_ext autoreload
%autoreload 2

import sys
import os
from utils import Content, login_hf, load_model
login_hf()

# Webvox
Get audio summaries of any website, blog or paper

[![GitHub](https://img.shields.io/badge/GitHub-View_on_GitHub-blue?logo=GitHub)](https://github.com/puravparab/webvox)

---
## Table of Contents

1. [Data](#1.-Data-)
    - 1.1 [Blog](#1.1-Blog-)
    - 1.2 [Paper](#1.2-Paper-)
    - 1.3 [Website](#Website-)
2. [Summarization](#2.-Summarization-)
    - 2.1 [Llama 3.2 3B Instruct](#Llama-3.2-3B-Instruct-GGUF-)
3. [Audio](#Audio-)

---
## 1. Data <a id='1.-Data-'></a>

Let's scrape content from blogs, websites, and papers so that we can summarize it.

### 1.1 Blog <a id='1.1-Blog-'></a>

In [3]:
# Insert url of a blog below
url = "https://paulgraham.com/foundermode.html"

blog = Content(url, 'blog')
blog.scrape()
print(f"Token count: {blog.token_count}")
print(f"\nBlog content:\n'{blog.text[:400]}'")

Token count: 1582

Blog content:
'Founder Mode September 2024 At a YC event last week Brian Chesky gave a talk that everyone who
was there will remember. Most founders I talked to afterward said
it was the best they'd ever heard. Ron Conway, for the first time
in his life, forgot to take notes. I'm not going to try to reproduce
it here. Instead I want to talk about a question it raised. The theme of Brian's talk was that the conve'
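The scraping itself happens inside `Content.scrape()` from the local `utils` module, which is not shown here. As a rough idea of what turning a page into plain text involves, here is a minimal sketch using only the standard library's `html.parser`. The `TextExtractor` class and `html_to_text` function are hypothetical illustrations, not the actual `utils` implementation, which may use a different parser and keep line breaks differently.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collect visible text, skipping <script> and <style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._chunks: list[str] = []
        self._skip_depth = 0  # > 0 while inside a skipped tag

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text nodes that are not inside <script>/<style>
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        # Join the text nodes with single spaces
        return " ".join(self._chunks)


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


sample = (
    "<html><head><style>p {}</style></head><body>"
    "<h1>Founder Mode</h1><p>September 2024</p><script>x = 1</script>"
    "</body></html>"
)
print(html_to_text(sample))  # → Founder Mode September 2024
```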


### 1.2 Paper <a id='1.2-Paper-'></a>

In [4]:
# Insert url of a paper below
url = "https://ar5iv.labs.arxiv.org/html/1706.03762"

paper = Content(url, 'blog')
paper.scrape()
print(f"Token count: {paper.token_count}")
print(f"\nPaper content:\n'{paper.text[1754:3000]}'")

Token count: 12519

Paper content:
'Abstract The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models f
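The cell above skips the paper's front matter with a hard-coded offset (`paper.text[1754:3000]`), which presumably lands on the abstract only for this particular page. A minimal sketch of a more robust alternative, assuming the scraped text contains the word "Abstract": search for the marker instead of hard-coding its position. `excerpt_from` is a hypothetical helper, not part of `utils`; its default length of 1246 characters matches the original slice (3000 - 1754).

```python
def excerpt_from(text: str, marker: str = "Abstract", length: int = 1246) -> str:
    """Hypothetical helper: return `length` characters starting at the first
    occurrence of `marker`, falling back to the start of the text if absent."""
    start = text.find(marker)
    if start == -1:
        start = 0
    return text[start:start + length]


sample = "Attention Is All You Need Abstract The dominant sequence transduction models"
print(excerpt_from(sample, length=12))  # → Abstract The
```

In the notebook this would be called as `print(excerpt_from(paper.text))`.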

---
## 2. Summarization <a id='2.-Summarization-'></a>

We use LLMs to summarize the scraped content.

### 2.1 Llama-3.2-3B-Instruct-GGUF <a id='Llama-3.2-3B-Instruct-GGUF-'></a>
We use a 4-bit quantized (`Q4_K_M`) version of Llama 3.2 3B Instruct to generate the summaries.

https://huggingface.co/lmstudio-community/Llama-3.2-3B-Instruct-GGUF

In [5]:
# Load the quantized GGUF model from Hugging Face (reused from models/ if already downloaded)
llm = load_model(
    repo_id="lmstudio-community/Llama-3.2-3B-Instruct-GGUF",
    filename="Llama-3.2-3B-Instruct-Q4_K_M.gguf",
    verbose=False,
    context_length=15000
)

Loading existing model: `Llama-3.2-3B-Instruct-Q4_K_M.gguf` from models/
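The context window has to hold both the prompt and the generated summary. A minimal sketch of that budget check, assuming `load_model` passes `context_length` through as the llama.cpp context size and that `Content.token_count` is close to what Llama's tokenizer would count (it may use a different tokenizer). `remaining_budget` and its `overhead_tokens` allowance for the system prompt, instruction, and chat-template tokens are hypothetical, rough values.

```python
def remaining_budget(content_tokens: int, context_length: int, overhead_tokens: int = 100) -> int:
    """Hypothetical helper: tokens left for the summary once the prompt fills the context.

    overhead_tokens is an assumed allowance for the system prompt, the
    instruction text, and chat-template special tokens.
    """
    return context_length - content_tokens - overhead_tokens


# Token counts taken from the scraping cells above
for name, tokens in [("blog", 1582), ("paper", 12519)]:
    budget = remaining_budget(tokens, context_length=15000)
    print(f"{name}: {budget} tokens left for the summary")
```

Under these assumptions the blog leaves about 13,318 tokens for the summary but the paper only about 2,381, so a noticeably longer paper would need a larger `context_length` or a shorter input.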


**Inference:**

In [6]:
output = {"blog": {}, "paper": {}}  # Store the chat completion response for each source

Blog:

In [7]:
%%time

# Blog
messages = [
    {"role": "system", "content": "You are a helpful assistant that accurately summarizes content given to you."},
    {"role": "user", "content": f"Summarize the following content:\n\n{blog.text}"}
]
output["blog"] = llm.create_chat_completion(messages=messages)

CPU times: user 8min 2s, sys: 847 ms, total: 8min 3s
Wall time: 57.2 s
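The blog and paper cells build the same two-message prompt and differ only in the text they pass in. A minimal sketch of factoring that into a helper; `build_messages` and `SYSTEM_PROMPT` are hypothetical names, and the prompt strings are copied from the cells in this notebook.

```python
SYSTEM_PROMPT = "You are a helpful assistant that accurately summarizes content given to you."


def build_messages(text: str) -> list[dict[str, str]]:
    """Hypothetical helper: build the system/user chat messages used for summarization."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Summarize the following content:\n\n{text}"},
    ]


messages = build_messages("Founder Mode September 2024 ...")
print(messages[1]["content"].splitlines()[0])  # → Summarize the following content:
```

Each inference cell then reduces to a single call such as `output["paper"] = llm.create_chat_completion(messages=build_messages(paper.text))`, which keeps the two prompts from drifting apart.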


In [19]:
blog_content = output['blog']['choices'][0]['message']['content']
print(blog_content)

 The article discusses the concept of "Founder Mode" and its potential to revolutionize the way companies are run. The author was inspired by a recent talk by Brian Chesky, the founder of Airbnb, who shared his experiences of navigating the challenges of scaling a company and discovering a better way to run it. Chesky's approach, which emphasizes the importance of founders being engaged and hands-on in the company's operations, differs from the conventional wisdom of hiring professional managers and giving them room to operate.

The author suggests that there are two different modes of running a company: "founder mode" and "manager mode". Founder mode is characterized by the CEO engaging directly with key stakeholders, including employees, customers, and partners, and making decisions that are driven by intuition and passion. This approach is in contrast to manager mode, which relies on hierarchical structures and compartmentalized decision-making.

The article notes that the conventio
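The response from `create_chat_completion` is an OpenAI-style dictionary, which is why the summary is read from `['choices'][0]['message']['content']`. A minimal sketch of a hypothetical `summary_text` helper that also strips the leading space seen in the printed summaries and flags a choice whose `finish_reason` is `"length"`, i.e. generation stopped at a token limit rather than ending naturally. The response dictionaries below are made-up stand-ins, not real model output.

```python
def summary_text(response: dict) -> str:
    """Hypothetical helper: extract the assistant message from an OpenAI-style
    chat completion dict and mark it if generation hit a token limit."""
    choice = response["choices"][0]
    text = choice["message"]["content"].strip()
    if choice.get("finish_reason") == "length":
        text += " [truncated: token limit reached]"
    return text


fake_stop = {"choices": [{"message": {"role": "assistant", "content": " A summary. "}, "finish_reason": "stop"}]}
fake_cut = {"choices": [{"message": {"role": "assistant", "content": " A summ"}, "finish_reason": "length"}]}
print(summary_text(fake_stop))  # → A summary.
print(summary_text(fake_cut))   # → A summ [truncated: token limit reached]
```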

Paper:

In [8]:
%%time

# Paper
messages = [
    {"role": "system", "content": "You are a helpful assistant that accurately summarizes content given to you."},
    {"role": "user", "content": f"Summarize the following content:\n\n{paper.text}"}
]
output["paper"] = llm.create_chat_completion(messages=messages)

CPU times: user 1h 22min 13s, sys: 5.44 s, total: 1h 22min 18s
Wall time: 8min 7s
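The `%%time` outputs can be compared with a little arithmetic. CPU time divided by wall time gives the average number of busy cores, and wall time divided by input tokens gives a rough cost per token. These figures are approximate: wall time covers both prompt processing and generation, and the two summaries have different lengths. The numbers below are copied from the two cell outputs; `to_seconds` is a hypothetical helper.

```python
def to_seconds(hours: int = 0, minutes: int = 0, seconds: float = 0.0) -> float:
    """Hypothetical helper: convert an h/min/s duration to seconds."""
    return hours * 3600 + minutes * 60 + seconds


# name: (input tokens, total CPU seconds, wall seconds), from the outputs above
runs = {
    "blog": (1582, to_seconds(minutes=8, seconds=3), 57.2),
    "paper": (12519, to_seconds(hours=1, minutes=22, seconds=18), to_seconds(minutes=8, seconds=7)),
}

for name, (tokens, cpu_s, wall_s) in runs.items():
    cores = cpu_s / wall_s                 # average number of busy cores
    ms_per_token = 1000 * wall_s / tokens  # wall time per input token (includes generation)
    print(f"{name}: ~{cores:.1f} cores busy, ~{ms_per_token:.1f} ms wall time per input token")
```

This prints about 8.4 cores and 36.2 ms per token for the blog, and about 10.1 cores and 38.9 ms per token for the paper: an input roughly 8x longer took roughly 8.5x longer, so runtime scales close to linearly with input length here.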


In [21]:
paper_content = output['paper']['choices'][0]['message']['content']
print(paper_content)

 The article presents the Transformer, a sequence transduction model based solely on self-attention mechanisms. It replaces traditional recurrent neural networks (RNNs) and convolutional neural networks (CNNs) with self-attention, which allows the model to parallelize computations and reduce training time.

Here is a summary of the key points:

1. **Transformer Architecture**: The Transformer consists of an encoder and a decoder, both of which use stacked self-attention and point-wise, fully connected layers.
2. **Self-Attention Mechanism**: Self-attention allows the model to attend to different positions of the same sequence and compute a representation of the sequence.
3. **Multi-Head Attention**: Multi-head attention allows the model to jointly attend to information from different representation subspaces at different positions.
4. **Advantages**: The Transformer has several advantages, including parallelization, reduced training time, and improved performance on machine translation