PaperCast is a project that turns any research article into a podcast using AI-generated audio. It is inspired by Illuminate https://illuminate.withgoogle.com/ and ScienceCast https://sciencecast.org/.
The author does not know anyone working on the Illuminate project, nor their methods. The author is still on the waiting list for its beta release.
Aug 30th: Illuminate is now available, give it a try: https://illuminate.google.com/
Feature | PaperCast | Illuminate |
---|---|---|
Open Source | ✅ Yes | 🟡 Not yet |
Fine-grained control | ✅ Yes | 🟡 Only arxiv links |
Research field | ✅ Any research | 🟡 Only Computer Science |
Audio quality | ✅ Good | ✅ Very good |
Voice tone | ✅ Conversational | 🟡 Flat |
Paper source | ✅ Any papers | 🟡 ArXiv only |
Allow multiple papers | 🟡 Not yet | ✅ Yes |
Content understanding | ✅ Good | ✅ Good |
Computing resource | 💻 Local | ☁️ Cloud |
Generation Limit | ✅ As many | 🟡 5 per day |
Has Red Panda? | Yes, Justin and Emma | Only humans🧑🎓 |
- July 29th, 2024: refactor the arXiv reader to leverage arXiv's HTML rendering and parse it to JSON + Markdown.
- Jun 16th, 2024: add author interview mode, by adding "author_interview_prompt" in `prompt.yaml` and `additional_questions` provided by authors; add PDF mode so it can extract the necessary information for any PDF paper from the `pdfs` directory. Check `examples/run_cognitive.yaml` for an example.
- Jun 15th, 2024: add subtitle `srt` file generation. See `examples/run_gorilla.yaml` for how to set `offset` if there is any intro audio, and see the example video at PaperCast EP5: "Gorilla: Large Language Model Connected with Massive APIs"
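To illustrate what an `offset` does to subtitle timing, here is a minimal sketch; it is not the project's actual code. It assumes the offset is the length of the intro audio in seconds and that every subtitle cue is shifted by that amount. The names `srt_timestamp` and `build_srt` are hypothetical.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def build_srt(cues, offset=0.0):
    """Render (start, end, text) cues as SRT text.

    `offset` (an assumption of this sketch) is the intro length in seconds;
    it is added to the start and end time of every cue.
    """
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(start + offset)} --> {srt_timestamp(end + offset)}\n"
            f"{text}\n"
        )
    # SRT separates cues with a blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    cues = [(0.0, 2.5, "Welcome to PaperCast."), (2.5, 6.0, "Today we discuss Gorilla.")]
    print(build_srt(cues, offset=10.0))
```

With a 10-second intro, a cue that starts at 0.0 s in the generated speech is written as starting at `00:00:10,000` in the subtitle file.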
To generate a podcast for "Attention is all you need", you can simply run the following command:
python run.py examples/run_attention.yaml
It should produce `1706.03762.json` in the `transcript` directory and `1706.03762.wav` in the `audio` directory.
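The output names follow the arXiv identifier of the paper. The sketch below illustrates that mapping, assuming the file stem is simply the ID taken from an `abs` or `pdf` URL. `arxiv_id_from_url` and `expected_outputs` are hypothetical helpers, not functions from this repo, and only new-style IDs such as `1706.03762` are handled.

```python
import re
from pathlib import Path

# New-style arXiv IDs look like 1706.03762, optionally followed by a version (v5).
# Old-style IDs such as hep-th/9901001 are deliberately not covered by this sketch.
_ARXIV_ID = re.compile(r"arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5})(?:v\d+)?", re.IGNORECASE)


def arxiv_id_from_url(url: str) -> str:
    """Extract the version-less arXiv ID from an abs or pdf URL."""
    match = _ARXIV_ID.search(url)
    if match is None:
        raise ValueError(f"not a recognized arXiv URL: {url}")
    return match.group(1)


def expected_outputs(url: str) -> tuple[Path, Path]:
    """Return the (transcript, audio) paths one would expect for a given URL."""
    paper_id = arxiv_id_from_url(url)
    return Path("transcript") / f"{paper_id}.json", Path("audio") / f"{paper_id}.wav"


if __name__ == "__main__":
    transcript, audio = expected_outputs("https://arxiv.org/abs/1706.03762")
    print(transcript.exists(), audio.exists())
```

Running it after the command above is a quick way to check whether both files were written.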
Please also check out a few example videos on YouTube. The playlist link is here.
Set up the OpenAI API key:
export OPENAI_API_KEY=sk-xxxx
Check out the repo and clone `ChatTTS` inside its directory:
git clone https://github.com/phunterlau/papercast
cd papercast/
git clone https://github.com/2noise/ChatTTS
cd ChatTTS
pip install -r requirements.txt
cd ..
pip install -r requirements.txt
Please note that `ChatTTS` is still very experimental. Please refer to its repo for issues and help.
Use `examples/run_attention.yaml` as an example. It contains a few keys:
url: "https://arxiv.org/abs/1706.03762"
use_cache: true
episode: 3
prompt: "dialogue_prompt"
background_knowledge: |
  Current year is 2024. Attention is all you need is known as the transformer paper published in 2017 by Google.
  It is the foundation paper of the current large language model research.
- `url`: an arXiv URL (abs or pdf) or a local file path of a PDF file.
- `use_cache`: whether to load the cached LLM-generated transcript or start over.
- `episode`: the episode number.
- `prompt`: the podcast style, such as dialogue or monologue; refer to `prompt.yaml` for the available prompts.
- (optional) `background_knowledge`: additional knowledge for better context understanding. Use "None" if not available.
- (optional) `additional_questions`: additional research questions to use as input.
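To make the roles of these keys concrete, here is a small validation sketch. It works on a plain dict, as you would get after loading the YAML file with any YAML parser. The function name `check_config`, the choice of which keys count as required, and the type checks are assumptions for illustration, not the behavior of `run.py`.

```python
REQUIRED_KEYS = {"url", "use_cache", "episode", "prompt"}
OPTIONAL_KEYS = {"background_knowledge", "additional_questions"}


def check_config(config: dict) -> dict:
    """Validate a run config and return a copy with defaults filled in."""
    missing = REQUIRED_KEYS - set(config)
    if missing:
        raise KeyError(f"missing required keys: {sorted(missing)}")
    unknown = set(config) - REQUIRED_KEYS - OPTIONAL_KEYS
    if unknown:
        raise KeyError(f"unknown keys: {sorted(unknown)}")
    if not isinstance(config["use_cache"], bool):
        raise TypeError("use_cache must be true or false")
    episode = config["episode"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(episode, bool) or not isinstance(episode, int):
        raise TypeError("episode must be an integer")
    checked = dict(config)
    # The key description says to use "None" when no background knowledge is available.
    checked.setdefault("background_knowledge", "None")
    return checked


if __name__ == "__main__":
    example = {
        "url": "https://arxiv.org/abs/1706.03762",
        "use_cache": True,
        "episode": 3,
        "prompt": "dialogue_prompt",
    }
    print(check_config(example))
```

Returning a copy rather than mutating the input keeps the loaded config untouched, which is convenient when the same file is reused across runs.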
I prefer podcasts in a question-and-answer style, so the transcript must include a smooth conversation that gives a general overview, raises a few interesting questions, and discusses them. The process has 3 steps:
- Predict the research field of the given article.
- Have the LLM role-play as a senior researcher in that field and ask a few questions.
- Generate a podcast that addresses these questions.
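The three steps can be sketched as a small pipeline. This is an illustration under stated assumptions, not the repo's implementation: `llm` stands for any function that maps a prompt string to a completion string (for example, a thin wrapper around a chat API), and the prompt wording and all function names are hypothetical.

```python
from typing import Callable, List

LLM = Callable[[str], str]  # assumption: prompt in, completion text out


def predict_field(llm: LLM, title: str, abstract: str) -> str:
    """Step 1: ask the model which research field the article belongs to."""
    prompt = (
        "Name the research field of this paper in a few words.\n"
        f"Title: {title}\nAbstract: {abstract}"
    )
    return llm(prompt).strip()


def generate_questions(llm: LLM, field: str, title: str, abstract: str, n: int = 3) -> List[str]:
    """Step 2: role-play a senior researcher and collect up to n questions."""
    prompt = (
        f"You are a senior researcher in {field}. "
        f"Ask {n} interesting questions about this paper, one per line.\n"
        f"Title: {title}\nAbstract: {abstract}"
    )
    # Drop bullet markers and blank lines from the model's answer.
    lines = [line.strip("-* \t") for line in llm(prompt).splitlines()]
    return [line for line in lines if line][:n]


def generate_transcript(llm: LLM, paper_text: str, questions: List[str]) -> str:
    """Step 3: write a transcript that addresses the questions."""
    question_block = "\n".join(f"- {q}" for q in questions)
    prompt = (
        "Write a podcast transcript that gives an overview of the paper "
        f"and discusses these questions:\n{question_block}\n\nPaper:\n{paper_text}"
    )
    return llm(prompt)


def make_podcast_transcript(llm: LLM, title: str, abstract: str, paper_text: str) -> str:
    """Chain the three steps: field, then questions, then transcript."""
    field = predict_field(llm, title, abstract)
    questions = generate_questions(llm, field, title, abstract)
    return generate_transcript(llm, paper_text, questions)
```

Passing the model in as a plain callable keeps the steps easy to test with a stub and makes it simple to swap the backing LLM.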
- Question generation is limited to the article's title and abstract only. A better tree-level question generation using the full text might bring deeper and better questions.
- Audio generation depends on ChatTTS https://github.com/2noise/ChatTTS. Its features are still very experimental, and getting a good speaker voice is something of a lottery.
- More article readers beyond the arXiv loader
- A good PDF loader to parse article metadata and sections
- Add Chinese voices
- Better question generation using the full text
- Support multi-person discussions with an agentic workflow
- Support different interview modes, e.g. host vs. author
This repo uses the MIT License. It relies on ChatTTS for audio generation, and ChatTTS does not allow commercial use. The music in the podcast is generated by Suno.AI.
- Jina.ai has a good reader API https://jina.ai/reader/
- ChatTTS https://github.com/2noise/ChatTTS