- Text-audio-graph GPT: a simple ChatGPT-based multimodal dialog generation engine that can
  - "see" through clip-interrogator,
  - "draw" through stable-diffusion-v2,
  - "hear" through assemblyai-transcript,
  - and "speak" through gTTS
- Add your API keys for OpenAI, Replicate, and AssemblyAI (AAI) to the environment variables:
export OPENAI_API_KEY='...'
export REPLICATE_API_TOKEN='...'
export AAI_API_KEY='...'
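To fail early when a key is missing, a small check like the one below could run before the main script. This is a minimal sketch, not code from the repository: the helper name `missing_keys` is hypothetical, and only the three variable names come from the exports above.

```python
import os

# The three variable names exported above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "REPLICATE_API_TOKEN", "AAI_API_KEY")


def missing_keys(env=None):
    """Return the required variable names that are unset or empty.

    `env` defaults to os.environ; passing a plain dict makes the check testable.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name)]
```

For example, if `missing_keys()` returns `['AAI_API_KEY']`, the AssemblyAI export was skipped in the current shell.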
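- Install the required packages:
openai
replicate
gtts
requests
pillow
- datetime and time are also used, but they ship with the Python standard library, so there is nothing to install for them.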
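One way to see which of the packages above are still missing is to look up their import names with `importlib.util.find_spec`. This is a hedged sketch rather than part of the project: the helper `missing_packages` is hypothetical. The one non-obvious mapping is that pillow is imported as `PIL`.

```python
import importlib.util

# pip package name -> name used in `import` statements (pillow is imported as PIL).
PACKAGES = {
    "openai": "openai",
    "replicate": "replicate",
    "gtts": "gtts",
    "requests": "requests",
    "pillow": "PIL",
}


def missing_packages(packages=PACKAGES):
    """Return the pip names of packages whose top-level module cannot be found."""
    return [
        pip_name
        for pip_name, module in packages.items()
        if importlib.util.find_spec(module) is None
    ]
```

The result can be passed straight to pip, e.g. `pip install` followed by the names in `missing_packages()`.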
- Images are written as [[[./path/to/image]]]
- Audio files are written as <<<./path/to/audio>>>
- END is the end-of-input marker
- Examples:
You: Here is a picture [[[./images/in/dogpizza.jpg]]], replace the dogs with white cute cats.
END
ChatGPT: Here is your updated picture: [[[./images/out/sd2_2023-03-25_12-49-24.png]]]. Enjoy!
You: <<<./audios/in/assemblyai.mp3>>>
please help me write a short introduction about AssemblyAI END
ChatGPT: AssemblyAI is a deep learning company that ...
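The input conventions in these examples (multi-line input terminated by END, images in [[[...]]], audio in <<<...>>>) can be handled with a short reader and two regular expressions. This is a minimal sketch under stated assumptions, not the logic of run_gpt_3.5.py: the function names are hypothetical, and it assumes END is either alone on a line or separated from the preceding text by a space, so that a word like "WEEKEND" does not end the input.

```python
import re
from typing import Iterable, List, Tuple

# Markup conventions from this README: [[[path]]] for images, <<<path>>> for audio.
IMAGE_RE = re.compile(r"\[\[\[(.+?)\]\]\]")
AUDIO_RE = re.compile(r"<<<(.+?)>>>")
END_MARKER = "END"


def read_until_end(lines: Iterable[str]) -> str:
    """Join input lines until the END marker is reached; the marker itself is dropped."""
    collected = []
    for line in lines:
        stripped = line.rstrip()
        # Assumption: END is a standalone token (whole line, or preceded by a space).
        if stripped == END_MARKER or stripped.endswith(" " + END_MARKER):
            collected.append(stripped[: -len(END_MARKER)].rstrip())
            break
        collected.append(stripped)
    return "\n".join(collected).strip()


def extract_media(text: str) -> Tuple[List[str], List[str]]:
    """Return (image_paths, audio_paths) referenced in a message."""
    return IMAGE_RE.findall(text), AUDIO_RE.findall(text)
```

In an interactive loop, `read_until_end(sys.stdin)` would consume one message, and `extract_media` would tell the engine which files to route to clip-interrogator or the transcription step.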
- Project structure:
TagGPT
├── AIGC
│ ├── clip_interrogator.py
│ ├── dall_e2.py
│ ├── gtts_t2a.py
│ ├── stable_diffusion_2.py
│ └── transcribe.py
├── audios
│ ├── in
│ └── out
├── images
│ ├── in
│ └── out
└── run_gpt_3.5.py
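Generated files go to the out folders. The example output path ./images/out/sd2_2023-03-25_12-49-24.png suggests a model prefix followed by a timestamp. The sketch below reproduces that pattern with the standard library; the function name `output_path` and any prefix other than "sd2" are hypothetical, and the repository's actual naming code may differ.

```python
from datetime import datetime
from typing import Optional


def output_path(prefix: str, suffix: str, out_dir: str = "./images/out",
                now: Optional[datetime] = None) -> str:
    """Build a timestamped path like ./images/out/sd2_2023-03-25_12-49-24.png."""
    now = now or datetime.now()
    # Dashes instead of colons keep the name valid on every common filesystem.
    stamp = now.strftime("%Y-%m-%d_%H-%M-%S")
    return f"{out_dir}/{prefix}_{stamp}{suffix}"
```

Passing `now` explicitly makes the name reproducible; audio output would use `out_dir="./audios/out"` with a suffix such as ".mp3".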
- TODO: build a UI (maybe)
- If you run into requests.exceptions.ConnectionError, changing http to https in ./AIGC/utils.py might help :)
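The fix above amounts to upgrading a URL's scheme. The contents of ./AIGC/utils.py are not shown here, so the helper below is a hypothetical sketch of that change using only `urllib.parse`. It rewrites http:// to https:// and leaves every other URL untouched.

```python
from urllib.parse import urlsplit, urlunsplit


def force_https(url: str) -> str:
    """Return `url` with an http scheme upgraded to https; other URLs are unchanged."""
    parts = urlsplit(url)
    if parts.scheme == "http":
        # SplitResult is a namedtuple, so _replace returns a modified copy.
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)
```

Wrapping the request URL this way, instead of editing the string by hand, keeps the path and query intact.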