Add readme for audio and cache plugins (#247)
* Add audio and cache plugins readme

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* refine audio plugin code

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* fix typo

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* fix pylint issues

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* fix neuralchat requirements name

Signed-off-by: Wenxin Zhang <wenxin.zhang@intel.com>

* avoid build from source when pylint

Signed-off-by: Wenxin Zhang <wenxin.zhang@intel.com>

---------

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>
Signed-off-by: Wenxin Zhang <wenxin.zhang@intel.com>
Co-authored-by: Wenxin Zhang <wenxin.zhang@intel.com>
lvliang-intel and VincyZhang committed Sep 6, 2023
1 parent 1436bdc commit 9b81f05
Showing 12 changed files with 404 additions and 297 deletions.
9 changes: 5 additions & 4 deletions .github/workflows/script/formatScan/pylint.sh
```diff
@@ -7,10 +7,11 @@ git config --global --add safe.directory "*"
 git submodule update --init --recursive
 
 $BOLD_YELLOW && echo "---------------- run python setup.py sdist bdist_wheel -------------" && $RESET
-pip install build --upgrade
-python3 -m build -s -w
-$BOLD_YELLOW && echo "---------------- pip install binary -------------" && $RESET
-pip install dist/intel_extension_for_transformers*.whl
+#pip install build --upgrade
+#python3 -m build -s -w
+export PYTHONPATH=`pwd`
+#$BOLD_YELLOW && echo "---------------- pip install binary -------------" && $RESET
+#pip install -e .
 pip list
 
 cd /intel-extension-for-transformers/intel_extension_for_transformers/neural_chat/
```
2 changes: 1 addition & 1 deletion intel_extension_for_transformers/neural_chat/chatbot.py
```diff
@@ -24,7 +24,7 @@
 from .config import DeviceOptions
 from .models.base_model import get_model_adapter
 from .utils.common import get_device_type
-from .pipeline.plugins.caching.cache import init_similar_cache_from_config
+from .pipeline.plugins.caching.cache import CachePlugin
 from .pipeline.plugins.audio.asr import AudioSpeechRecognition
 from .pipeline.plugins.audio.asr_chinese import ChineseAudioSpeechRecognition
 from .pipeline.plugins.audio.tts import TextToSpeech
```
@@ -0,0 +1,121 @@
The Audio Processing and Text-to-Speech (TTS) Plugin is a software component designed to enhance audio-related functionality in Neural Chat, especially for TalkingBot. This plugin offers a range of capabilities, primarily focused on processing audio data and converting text into spoken language. Here is a general overview of its key features:

- **Audio Processing**: This component includes a suite of tools and algorithms for manipulating audio data. It can perform tasks such as cutting video, splitting audio, converting video to audio, noise reduction, equalization, pitch shifting, and audio synthesis, enabling developers to improve audio quality and add various audio effects to their applications.

- **Text-to-Speech (TTS) Conversion**: The TTS plugin can convert written text into natural-sounding speech by synthesizing human-like voices. Users can customize the voice, tone, and speed of the generated speech to suit their specific requirements.

- **Speech Recognition**: The ASR plugin supports speech recognition, transcribing spoken words into text. This can be used for applications like voice commands, transcription services, and voice-controlled interfaces. It supports both English and Chinese.

- **Multi-Language Support**: The plugin supports multiple languages and accents, making it versatile for global applications and diverse user bases. English and Chinese are currently supported.

- **Integration**: Developers can easily integrate this plugin into their applications or systems using APIs.


# Install System Dependency

Ubuntu Command:
```bash
sudo apt-get install ffmpeg
wget http://nz2.archive.ubuntu.com/ubuntu/pool/main/o/openssl/libssl1.1_1.1.1f-1ubuntu2.19_amd64.deb
sudo dpkg -i libssl1.1_1.1.1f-1ubuntu2.19_amd64.deb
```

For other operating systems such as CentOS, you will need to make slight adjustments.
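Under the hood, the plugin's audio utilities assemble ffmpeg commands like the one below, which converts a video to a 16 kHz mono WAV file, matching the pattern used by the plugin's `convert_video_to_wav` helper (the file names and sample rate here are illustrative):

```bash
ffmpeg -i input.mp4 -ac 1 -ar 16000 -f wav output.wav
```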

# English Automatic Speech Recognition (ASR)

## Dependencies Installation

To use the English ASR module, you need to install the necessary dependencies. You can do this by running the following command:

```bash
pip install transformers datasets pydub
```

## Usage

The `AudioSpeechRecognition` class converts English audio to text. Here's how to use it:

```python
from intel_extension_for_transformers.neural_chat.pipeline.plugins.audio import AudioSpeechRecognition
asr = AudioSpeechRecognition()
audio_path = "~/audio.wav" # Replace with the path to your English audio file (supports MP3 and WAV)
result = asr.audio2text(audio_path)
print("ASR Result:", result)
```

# Chinese Automatic Speech Recognition (ASR)

## Dependencies Installation

To use the Chinese ASR module, you need to install the necessary dependencies. You can do this by running the following command:

```bash
pip install paddlespeech paddlepaddle
```

## Usage

The `ChineseAudioSpeechRecognition` class converts Chinese audio to text. Here's how to use it:

```python
from intel_extension_for_transformers.neural_chat.pipeline.plugins.audio import ChineseAudioSpeechRecognition
asr = ChineseAudioSpeechRecognition()
audio_path = "~/audio.wav" # Replace with the path to your audio file
result = asr.audio2text(audio_path)
print("ASR Result:", result)
```

# English Text-to-Speech (TTS)

## Dependencies Installation

To use the English TTS module, you need to install the required dependencies. Run the following command:

```bash
pip install transformers soundfile speechbrain
```

## Usage

The `TextToSpeech` class converts English text to speech. Here's how to use it:

```python
from intel_extension_for_transformers.neural_chat.pipeline.plugins.audio import TextToSpeech
tts = TextToSpeech()
text_to_speak = "Hello, this is a sample text." # Replace with your text
output_audio_path = "./output.wav" # Replace with the desired output audio path
voice = "default" # You can choose between "default," "pat," or a custom voice
tts.text2speech(text_to_speak, output_audio_path, voice)
```

# Chinese Text-to-Speech (TTS)

## Dependencies Installation

To use the Chinese TTS module, you need to install the required dependencies. Run the following command:

```bash
pip install paddlespeech paddlepaddle
```

## Usage

The `ChineseTextToSpeech` class provides Chinese TTS functionality. Here's how to use it:

```python
from intel_extension_for_transformers.neural_chat.pipeline.plugins.audio import ChineseTextToSpeech
# Initialize the TTS module
tts = ChineseTextToSpeech()
# Define the text you want to convert to speech
text_to_speak = "你好,这是一个示例文本。" # Replace with your Chinese text
# Specify the output audio path
output_audio_path = "./output.wav" # Replace with your desired output audio path
# Perform text-to-speech conversion
tts.text2speech(text_to_speak)

# If you want to stream the generation of audio from a text generator (e.g., a language model),
# you can use the following method:
# audio_generator = your_text_generator_function() # Replace with your text generator
# tts.stream_text2speech(audio_generator)
```
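
For streaming, a minimal sketch might look like the following; the generator function and its text chunks are illustrative assumptions, and `stream_text2speech` is the method referenced in the comments above:

```python
def sentence_generator():
    # Yield text chunks as they become available, e.g. from a streaming LLM.
    for chunk in ["你好,", "这是流式合成的示例。"]:
        yield chunk


tts.stream_text2speech(sentence_generator())
```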
@@ -0,0 +1,9 @@
paddlepaddle
paddlespeech
transformers
soundfile
datasets
pydub
python-multipart
speechbrain
librosa
```diff
@@ -90,9 +90,9 @@ def cut_video(args, outdir):
         name, _ = os.path.splitext(file_name)
         name = str(name) + "_" + str(mark)
         mark += 1
-        command = 'ffmpeg -i {} -ss {}:{}:{} -to {}:{}:{} -ac 1 -ar {} -f wav {}'.format(os.path.join(path,file_name),
-            start_hour, start_min, start_sec, end_hour, end_min, end_sec, shlex.quote(args.sr),
-            os.path.join(save_path, str(name))+'.wav')
+        command = 'ffmpeg -i {} -ss {}:{}:{} -to {}:{}:{} -ac 1 -ar {} -f wav {}'.format(
+            os.path.join(path,file_name), start_hour, start_min, start_sec, end_hour,
+            end_min, end_sec, shlex.quote(args.sr), os.path.join(save_path, str(name))+'.wav')
         print(start_hour, start_min, start_sec)
         print(end_hour, end_min, end_sec)
         try:
```
```diff
@@ -51,9 +51,11 @@ def convert_video_to_wav(path, output_sample_rate, is_mono=True):
     elif filename_suffix == '.mp4' or filename_suffix == '.mp3':
         # file name should not contain space.
         if is_mono:
-            cmd = "ffmpeg -i {} -ac 1 -ar {} -f wav {}".format(input_file_path, output_sample_rate, output_file_path)
+            cmd = "ffmpeg -i {} -ac 1 -ar {} -f wav {}".format(
+                input_file_path, output_sample_rate, output_file_path)
         else:
-            cmd = "ffmpeg -i {} -ac 2 -ar {} -f wav {}".format(input_file_path, output_sample_rate, output_file_path)
+            cmd = "ffmpeg -i {} -ac 2 -ar {} -f wav {}".format(
+                input_file_path, output_sample_rate, output_file_path)
         try:
             subprocess.run(cmd, check=True)
         except subprocess.CalledProcessError as e:
```
@@ -0,0 +1,59 @@
# 🚀 What is the caching plugin?

When an LLM service encounters high traffic, the expenses related to LLM API calls can become substantial, and the service may exhibit slow response times. Hence, we leverage GPTCache to build a semantic caching plugin for storing LLM responses. This README provides an overview of the caching plugin's functionality, how to use it, and some example code snippets.

# 😎 What can this help with?

The caching plugin offers the following primary benefits:

- **Decreased expenses**: The caching plugin effectively minimizes expenses by caching query results, which in turn reduces the number of requests and tokens sent to the LLM service.
- **Enhanced performance**: The caching plugin can also provide superior query throughput compared to standard LLM services.
- **Improved scalability and availability**: The caching plugin can easily scale to accommodate an increasing volume of queries, ensuring consistent performance as your application's user base expands.

# 🤔 How does it work?

Online services often exhibit data locality, with users frequently accessing popular or trending content. Cache systems take advantage of this behavior by storing commonly accessed data, which in turn reduces data retrieval time, improves response times, and eases the burden on backend servers. Traditional cache systems typically utilize an exact match between a new query and a cached query to determine if the requested content is available in the cache before fetching the data.

However, using an exact match approach for LLM caches is less effective due to the complexity and variability of LLM queries, resulting in a low cache hit rate. To address this issue, GPTCache adopts alternative strategies like semantic caching. Semantic caching identifies and stores similar or related queries, thereby increasing cache hit probability and enhancing overall caching efficiency. GPTCache employs embedding algorithms to convert queries into embeddings and uses a vector store for similarity search on these embeddings. This process allows GPTCache to identify and retrieve similar or related queries from the cache storage.

<a target="_blank" href="https://github.com/zilliztech/GPTCache/blob/main/docs/GPTCacheStructure.png">
<p align="center">
<img src="https://github.com/zilliztech/GPTCache/blob/main/docs/GPTCacheStructure.png" alt="Cache Structure" width=600 height=200>
</p>
</a>
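
To make the idea concrete, here is a minimal, self-contained Python sketch of semantic cache lookup. The toy `embed` function, the linear scan, and the 0.9 threshold are illustrative assumptions, not GPTCache's actual implementation; GPTCache plugs in real embedding models and a vector store:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Toy character-hash embedding, for illustration only; a real cache
    # would use a sentence-embedding model.
    vec = np.zeros(128)
    for i, ch in enumerate(text.lower()):
        vec[(i * 31 + ord(ch)) % 128] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

cached = []  # list of (embedding, prompt, response) entries

def semantic_put(prompt: str, response: str) -> None:
    cached.append((embed(prompt), prompt, response))

def semantic_get(query: str, threshold: float = 0.9):
    # Nearest-neighbor search by cosine similarity; a real cache would
    # query a vector store instead of scanning linearly.
    q = embed(query)
    best = max(cached, key=lambda entry: float(np.dot(q, entry[0])), default=None)
    if best is not None and float(np.dot(q, best[0])) >= threshold:
        return best[2]
    return None
```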

# Installation
To use the caching plugin, first install the required dependencies (including the `gptcache` library) using pip:

```bash
pip install -r requirements.txt
```

# Usage
## Initializing

Before using the caching plugin, you need to initialize it with the desired configuration. The following code demonstrates how:

```python
from intel_extension_for_transformers.neural_chat.pipeline.plugins.caching.cache import CachePlugin
cache_plugin = CachePlugin()
cache_plugin.init_similar_cache_from_config()
```

## Caching Data

Once the caching plugin is initialized, you can start caching data using the `put` function. Here's an example:

```python
prompt = "Tell me about Intel Xeon Scable Processors."
response = chatbot.predict(prompt)
cache_plugin.put(prompt, response)
```

## Retrieving Cached Data

To retrieve cached data, use the `get` function. Provide the same prompt/question text used for caching, and it will return the cached answer. Here's an example:

```python
answer = cache_plugin.get("Tell me about Intel Xeon Scalable Processors.")
```
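
Putting it together, a typical request path consults the cache before calling the model. A minimal sketch, assuming `get` returns an empty result on a cache miss and reusing the `chatbot` and `cache_plugin` objects from the examples above:

```python
prompt = "Tell me about Intel Xeon Scalable Processors."
answer = cache_plugin.get(prompt)
if not answer:
    # Cache miss: query the LLM and store the response for future hits.
    answer = chatbot.predict(prompt)
    cache_plugin.put(prompt, answer)
print(answer)
```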
```diff
@@ -14,3 +14,5 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
+
+from .cache import CachePlugin
```
