A lightweight conversational assistant that allows users to interact using both speech and text. This system utilizes state-of-the-art models for speech recognition and text-to-speech (TTS) generation, including Whisper for transcription and Coqui TTS for speech synthesis. The project is built using Django and supports additional models and APIs via Hugging Face.
- Speech-to-Text: Uses Whisper for accurate speech recognition across 30+ languages.
- Text-to-Speech: Powered by Coqui TTS for multilingual voice synthesis.
- Microphone Input: Supports real-time speech input via microphone.
- Image Generation: Integrated with open-source models and Hugging Face API for generating images from text.
- Django Web Interface: Accessible through a local server for easy interaction.
View the screen recording of the project.
NB on secrets: the included secrets will be removed after one week; after that, go to Hugging Face and get your own API key.
File path: src/src/secrets/secrets.txt
Ensure you have the following installed:
- Python 3.x
- Django
- FFmpeg
- pip
- Other dependencies listed in requirements.txt
You have two options for running this application: Docker or Local Setup. Choose the method that best suits your environment.
If you have Docker and WSL installed, you can run the application in a Docker container. This method simplifies dependencies and environment setup.
- Build Docker: Ensure you have Docker installed and configured, then build your Docker container:
docker-compose up --build
You can stop and start the services at any point (docker-compose stop / docker-compose start); don't rebuild the container unless it is necessary. To add FFmpeg, install it manually inside the running Docker container without rebuilding it. Here's how to do that, step by step:
- Access the Running Container: First, get a shell into your running Django container with the following command:
docker-compose exec django /bin/bash
This command opens a bash shell in the django container.
- Install ffmpeg Manually: Once inside the container, install ffmpeg using apt-get. Run the following commands:
apt-get update
apt-get install -y ffmpeg
- Verify the Installation: After the installation is complete, verify that ffmpeg is installed correctly by running:
ffmpeg -version
This should display the version of ffmpeg that was installed.
- Exit the Container: To exit the container, type
exit
- Start the Services:
docker-compose up -d
- Check Logs (Optional):
docker-compose logs -f django
- Stop Services:
docker-compose down
- Access the Interface: Open your browser and go to http://localhost:8000/api/interface/ to interact with the assistant.
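As an optional sanity check from the command line, the sketch below polls the interface URL; it assumes the requests package is available in your local Python environment (it is not required for normal browser use).

```python
# Optional sanity check: confirm the Django service answers on the expected URL.
# Assumes the `requests` package is installed locally; adjust the URL if you changed the port.
import requests

response = requests.get("http://localhost:8000/api/interface/")
print(response.status_code)  # expect 200 once the services are up
```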
- Clone the Repository:
git clone https://github.com/Josewathome/speech-text-and-text-speech.git
cd speech-text-and-text-speech
- Create a Virtual Environment (Optional but Recommended):
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate.bat
- Install Requirements (using WSL or Linux):
pip install -r requirements.txt
If using Windows, install the requirements manually as described in Windows_Requirements.docx.
- Set Up the Django Application: Make and apply migrations for the Django project.
python manage.py makemigrations text
python manage.py migrate text
- Run the Django Server: Start the development server.
python manage.py runserver
- Access the Interface: Open your browser and go to http://localhost:8000/api/interface/.
The system uses OpenAI's Whisper model for speech-to-text functionality. To use Whisper, follow the steps below; a minimal transcription sketch appears after them.
- Install Whisper:
pip install openai-whisper
- Install FFmpeg: Whisper requires FFmpeg for processing audio files. Follow the instructions based on your operating system:
- Ubuntu/Debian:
sudo apt update && sudo apt install ffmpeg
- macOS (Homebrew):
brew install ffmpeg
- Windows: Download FFmpeg from the official website and add it to your system PATH.
- Additional Dependencies: You may also need to install additional dependencies:
pip install setuptools-rust
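Once Whisper and FFmpeg are installed, transcription takes only a few lines. The sketch below is a minimal example, assuming a local audio file named sample.wav; the "base" model size is just one choice, not necessarily what this project uses internally.

```python
# Minimal sketch: transcribe an audio file with the locally installed Whisper model.
# "base" and "sample.wav" are example choices, not the project's fixed configuration.
import whisper

model = whisper.load_model("base")       # downloads the model weights on first use
result = model.transcribe("sample.wav")  # FFmpeg must be on PATH for audio decoding
print(result["text"])
```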
Coqui TTS is used for synthesizing speech in multiple languages. To set it up, follow the steps below; a short synthesis sketch comes after them.
- Install Coqui TTS:
pip install TTS
- For more details about customizing TTS models, refer to the Coqui TTS GitHub page.
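A minimal synthesis example is sketched below; the model name is an illustrative single-speaker English model, not necessarily the one this project loads.

```python
# Minimal sketch: synthesize speech to a WAV file with Coqui TTS.
# The model name is an example; see the Coqui TTS docs for multilingual alternatives.
from TTS.api import TTS

tts = TTS(model_name="tts_models/en/ljspeech/tacotron2-DDC")
tts.tts_to_file(text="Hello, how can I help you today?", file_path="reply.wav")
```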
- Start the Django Server:
python manage.py runserver
- Access the Application: Open your browser and go to http://localhost:8000/api/interface/ to interact with the assistant.
- The Whisper model is used to transcribe spoken language into text. You can use the microphone input to provide speech commands to the assistant.
- Coqui TTS will convert the generated text response back into speech. This enables natural conversations with the assistant.
- You can generate images based on text input using models hosted locally or via Hugging Face's API.
If you prefer using OpenAI's Whisper API for faster transcription, you can integrate it into your setup. The API supports multiple formats, including m4a, mp3, and wav, is priced at $0.006 per minute of transcription, and handles over 30 languages.
- Sign Up for API Access: You can obtain your API key from the OpenAI website.
- Modify Settings: Update the settings in your Django app to use the Whisper API instead of the local model for transcription (see the sketch below).
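For reference, a minimal transcription request is sketched below, assuming the current openai Python SDK (pip install openai) and an OPENAI_API_KEY environment variable; the file name is a placeholder.

```python
# Minimal sketch: transcribe an audio file via OpenAI's hosted Whisper API.
# Assumes the `openai` package (v1+) and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
with open("speech.m4a", "rb") as audio_file:  # "speech.m4a" is a placeholder path
    transcript = client.audio.transcriptions.create(model="whisper-1", file=audio_file)
print(transcript.text)
```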
Some of the models for image generation are integrated via Hugging Face API keys. Follow these steps to set it up:
- Sign Up for Hugging Face: Visit Hugging Face to create an account.
- Obtain API Key: Once registered, get your API key from your Hugging Face account settings.
- Configure the API Key in Django: Set the Hugging Face API key in your environment or directly in the Django configuration file to enable API-based model access (file path: src/src/secrets/secrets.txt); a minimal request sketch follows.
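As an illustration, the sketch below reads the key from an environment variable and calls the Hugging Face Inference API for text-to-image generation. The variable name HUGGINGFACE_API_KEY and the model are assumptions; match them to your own configuration, or read the key from the secrets file instead.

```python
# Minimal sketch: generate an image from text via the Hugging Face Inference API.
# HUGGINGFACE_API_KEY and the model name are assumptions; adapt them to your setup.
import os
import requests

API_URL = "https://api-inference.huggingface.co/models/stabilityai/stable-diffusion-2"
headers = {"Authorization": f"Bearer {os.environ['HUGGINGFACE_API_KEY']}"}

response = requests.post(API_URL, headers=headers, json={"inputs": "a watercolor painting of a cat"})
response.raise_for_status()
with open("generated.png", "wb") as f:
    f.write(response.content)  # the endpoint returns raw image bytes on success
```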
We welcome contributions to improve the system. To contribute:
- Fork the repository.
- Create a new feature branch (git checkout -b feature-branch).
- Commit your changes (git commit -m 'Add new feature').
- Push to the branch (git push origin feature-branch).
- Open a pull request.
This project is licensed under the MIT License. See the LICENSE file for details.