Add readme for audio and cache plugins (#247)
* Add audio and cache plugins readme

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* refine audio plugin code

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* fix typo

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* fix pylint issues

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* fix neuralchat requirements name

Signed-off-by: Wenxin Zhang <wenxin.zhang@intel.com>

* avoid build from source when pylint

Signed-off-by: Wenxin Zhang <wenxin.zhang@intel.com>

---------

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>
Signed-off-by: Wenxin Zhang <wenxin.zhang@intel.com>
Co-authored-by: Wenxin Zhang <wenxin.zhang@intel.com>
lvliang-intel and VincyZhang committed Sep 6, 2023
1 parent 1436bdc commit 9b81f05
Showing 12 changed files with 404 additions and 297 deletions.
9 changes: 5 additions & 4 deletions .github/workflows/script/formatScan/pylint.sh
```diff
@@ -7,10 +7,11 @@ git config --global --add safe.directory "*"
 git submodule update --init --recursive
 
 $BOLD_YELLOW && echo "---------------- run python setup.py sdist bdist_wheel -------------" && $RESET
-pip install build --upgrade
-python3 -m build -s -w
-$BOLD_YELLOW && echo "---------------- pip install binary -------------" && $RESET
-pip install dist/intel_extension_for_transformers*.whl
+#pip install build --upgrade
+#python3 -m build -s -w
+export PYTHONPATH=`pwd`
+#$BOLD_YELLOW && echo "---------------- pip install binary -------------" && $RESET
+#pip install -e .
 pip list
 
 cd /intel-extension-for-transformers/intel_extension_for_transformers/neural_chat/
```
2 changes: 1 addition & 1 deletion intel_extension_for_transformers/neural_chat/chatbot.py
```diff
@@ -24,7 +24,7 @@
 from .config import DeviceOptions
 from .models.base_model import get_model_adapter
 from .utils.common import get_device_type
-from .pipeline.plugins.caching.cache import init_similar_cache_from_config
+from .pipeline.plugins.caching.cache import CachePlugin
 from .pipeline.plugins.audio.asr import AudioSpeechRecognition
 from .pipeline.plugins.audio.asr_chinese import ChineseAudioSpeechRecognition
 from .pipeline.plugins.audio.tts import TextToSpeech
```
@@ -0,0 +1,121 @@
The Audio Processing and Text-to-Speech (TTS) Plugin is a software component designed to enhance audio-related functionality in Neural Chat, especially for TalkingBot. This plugin offers a range of capabilities, primarily focused on processing audio data and converting text into spoken language. Here is a general overview of its key features:

- **Audio Processing**: This component includes a suite of tools and algorithms for manipulating audio data. It can perform tasks such as cutting video, splitting audio, converting video to audio, noise reduction, equalization, pitch shifting, and audio synthesis, enabling developers to improve audio quality and add various audio effects to their applications.

- **Text-to-Speech (TTS) Conversion**: The TTS plugin can convert written text into natural-sounding speech by synthesizing human-like voices. Users can customize the voice, tone, and speed of the generated speech to suit their specific requirements.

- **Speech Recognition**: The ASR plugin supports speech recognition, transcribing spoken words into text. This can be used for applications like voice commands, transcription services, and voice-controlled interfaces. It supports both English and Chinese.

- **Multi-Language Support**: The plugin supports multiple languages and accents, making it versatile for global applications and diverse user bases. English and Chinese are currently supported.

- **Integration**: Developers can easily integrate this plugin into their applications or systems using APIs.


# Install System Dependency

Ubuntu Command:
```bash
sudo apt-get install ffmpeg
wget http://nz2.archive.ubuntu.com/ubuntu/pool/main/o/openssl/libssl1.1_1.1.1f-1ubuntu2.19_amd64.deb
sudo dpkg -i libssl1.1_1.1.1f-1ubuntu2.19_amd64.deb
```

For other operating systems such as CentOS, you will need to make slight adjustments.
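Under the hood, the plugin's audio utilities assemble ffmpeg commands like the one below, which converts a video to a 16 kHz mono WAV file, matching the pattern used by the plugin's `convert_video_to_wav` helper (the file names and sample rate here are illustrative):

```bash
ffmpeg -i input.mp4 -ac 1 -ar 16000 -f wav output.wav
```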

# English Automatic Speech Recognition (ASR)

## Dependencies Installation

To use the English ASR module, you need to install the necessary dependencies. You can do this by running the following command:

```bash
pip install transformers datasets pydub
```

## Usage

The `AudioSpeechRecognition` class converts English audio to text. Here's how to use it:

```python
from intel_extension_for_transformers.neural_chat.pipeline.plugins.audio import AudioSpeechRecognition
asr = AudioSpeechRecognition()
audio_path = "~/audio.wav" # Replace with the path to your English audio file (supports MP3 and WAV)
result = asr.audio2text(audio_path)
print("ASR Result:", result)
```

# Chinese Automatic Speech Recognition (ASR)

## Dependencies Installation

To use the Chinese ASR module, you need to install the necessary dependencies. You can do this by running the following command:

```bash
pip install paddlespeech paddlepaddle
```

## Usage

The `ChineseAudioSpeechRecognition` class converts Chinese audio to text. Here's how to use it:

```python
from intel_extension_for_transformers.neural_chat.pipeline.plugins.audio import ChineseAudioSpeechRecognition
asr = ChineseAudioSpeechRecognition()
audio_path = "~/audio.wav" # Replace with the path to your audio file
result = asr.audio2text(audio_path)
print("ASR Result:", result)
```

# English Text-to-Speech (TTS)

## Dependencies Installation

To use the English TTS module, you need to install the required dependencies. Run the following command:

```bash
pip install transformers soundfile speechbrain
```

## Usage

The `TextToSpeech` class converts English text to speech. Here's how to use it:

```python
from intel_extension_for_transformers.neural_chat.pipeline.plugins.audio import TextToSpeech
tts = TextToSpeech()
text_to_speak = "Hello, this is a sample text." # Replace with your text
output_audio_path = "./output.wav" # Replace with the desired output audio path
voice = "default" # You can choose between "default," "pat," or a custom voice
tts.text2speech(text_to_speak, output_audio_path, voice)
```

# Chinese Text-to-Speech (TTS)

## Dependencies Installation

To use the Chinese TTS module, you need to install the required dependencies. Run the following command:

```bash
pip install paddlespeech paddlepaddle
```

## Usage

The `ChineseTextToSpeech` class provides Chinese TTS functionality. Here's how to use it:

```python
from intel_extension_for_transformers.neural_chat.pipeline.plugins.audio import ChineseTextToSpeech
# Initialize the TTS module
tts = ChineseTextToSpeech()
# Define the text you want to convert to speech
text_to_speak = "你好,这是一个示例文本。" # Replace with your Chinese text
# Specify the output audio path
output_audio_path = "./output.wav" # Replace with your desired output audio path
# Perform text-to-speech conversion
tts.text2speech(text_to_speak)

# If you want to stream the generation of audio from a text generator (e.g., a language model),
# you can use the following method:
# audio_generator = your_text_generator_function() # Replace with your text generator
# tts.stream_text2speech(audio_generator)
```
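
For streaming, a minimal sketch might look like the following; the generator function and its text chunks are illustrative assumptions, and `stream_text2speech` is the method referenced in the comments above:

```python
def sentence_generator():
    # Yield text chunks as they become available, e.g. from a streaming LLM.
    for chunk in ["你好,", "这是流式合成的示例。"]:
        yield chunk


tts.stream_text2speech(sentence_generator())
```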
@@ -0,0 +1,9 @@
paddlepaddle
paddlespeech
transformers
soundfile
datasets
pydub
python-multipart
speechbrain
librosa
```diff
@@ -90,9 +90,9 @@ def cut_video(args, outdir):
         name, _ = os.path.splitext(file_name)
         name = str(name) + "_" + str(mark)
         mark += 1
-        command = 'ffmpeg -i {} -ss {}:{}:{} -to {}:{}:{} -ac 1 -ar {} -f wav {}'.format(os.path.join(path,file_name),
-            start_hour, start_min, start_sec, end_hour, end_min, end_sec, shlex.quote(args.sr),
-            os.path.join(save_path, str(name))+'.wav')
+        command = 'ffmpeg -i {} -ss {}:{}:{} -to {}:{}:{} -ac 1 -ar {} -f wav {}'.format(
+            os.path.join(path,file_name), start_hour, start_min, start_sec, end_hour,
+            end_min, end_sec, shlex.quote(args.sr), os.path.join(save_path, str(name))+'.wav')
         print(start_hour, start_min, start_sec)
         print(end_hour, end_min, end_sec)
         try:
```
```diff
@@ -51,9 +51,11 @@ def convert_video_to_wav(path, output_sample_rate, is_mono=True):
     elif filename_suffix == '.mp4' or filename_suffix == '.mp3':
         # file name should not contain space.
         if is_mono:
-            cmd = "ffmpeg -i {} -ac 1 -ar {} -f wav {}".format(input_file_path, output_sample_rate, output_file_path)
+            cmd = "ffmpeg -i {} -ac 1 -ar {} -f wav {}".format(
+                input_file_path, output_sample_rate, output_file_path)
         else:
-            cmd = "ffmpeg -i {} -ac 2 -ar {} -f wav {}".format(input_file_path, output_sample_rate, output_file_path)
+            cmd = "ffmpeg -i {} -ac 2 -ar {} -f wav {}".format(
+                input_file_path, output_sample_rate, output_file_path)
         try:
             subprocess.run(cmd, check=True)
         except subprocess.CalledProcessError as e:
```
@@ -0,0 +1,59 @@
# 🚀 What is the caching plugin?

When an LLM service encounters high traffic, the expenses related to LLM API calls can become substantial, and the service may exhibit slow response times. Hence, we leverage GPTCache to build a semantic caching plugin for storing LLM responses. This README provides an overview of the caching plugin's functionality, how to use it, and some example code snippets.

# 😎 What can this help with?

The caching plugin offers the following primary benefits:

- **Decreased expenses**: The caching plugin effectively minimizes expenses by caching query results, which in turn reduces the number of requests and tokens sent to the LLM service.
- **Enhanced performance**: The caching plugin can also provide superior query throughput compared to standard LLM services.
- **Improved scalability and availability**: The caching plugin can easily scale to accommodate an increasing volume of queries, ensuring consistent performance as your application's user base expands.

# 🤔 How does it work?

Online services often exhibit data locality, with users frequently accessing popular or trending content. Cache systems take advantage of this behavior by storing commonly accessed data, which in turn reduces data retrieval time, improves response times, and eases the burden on backend servers. Traditional cache systems typically utilize an exact match between a new query and a cached query to determine if the requested content is available in the cache before fetching the data.

However, using an exact match approach for LLM caches is less effective due to the complexity and variability of LLM queries, resulting in a low cache hit rate. To address this issue, GPTCache adopts alternative strategies like semantic caching. Semantic caching identifies and stores similar or related queries, thereby increasing cache hit probability and enhancing overall caching efficiency. GPTCache employs embedding algorithms to convert queries into embeddings and uses a vector store for similarity search on these embeddings. This process allows GPTCache to identify and retrieve similar or related queries from the cache storage.

<a target="_blank" href="https://github.com/zilliztech/GPTCache/blob/main/docs/GPTCacheStructure.png">
<p align="center">
<img src="https://github.com/zilliztech/GPTCache/blob/main/docs/GPTCacheStructure.png" alt="Cache Structure" width=600 height=200>
</p>
</a>
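
To make the idea concrete, here is a minimal, self-contained Python sketch of semantic cache lookup. The toy `embed` function, the linear scan, and the 0.9 threshold are illustrative assumptions, not GPTCache's actual implementation; GPTCache plugs in real embedding models and a vector store:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Toy character-hash embedding, for illustration only; a real cache
    # would use a sentence-embedding model.
    vec = np.zeros(128)
    for i, ch in enumerate(text.lower()):
        vec[(i * 31 + ord(ch)) % 128] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

cached = []  # list of (embedding, prompt, response) entries

def semantic_put(prompt: str, response: str) -> None:
    cached.append((embed(prompt), prompt, response))

def semantic_get(query: str, threshold: float = 0.9):
    # Nearest-neighbor search by cosine similarity; a real cache would
    # query a vector store instead of scanning linearly.
    q = embed(query)
    best = max(cached, key=lambda entry: float(np.dot(q, entry[0])), default=None)
    if best is not None and float(np.dot(q, best[0])) >= threshold:
        return best[2]
    return None
```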

# Installation
To use the caching plugin, first install the required dependencies (including the `gptcache` library) using pip:

```bash
pip install -r requirements.txt
```

# Usage
## Initializing

Before using the caching plugin, you need to initialize it with the desired configuration. The following code demonstrates how:

```python
from intel_extension_for_transformers.neural_chat.pipeline.plugins.caching.cache import CachePlugin
cache_plugin = CachePlugin()
cache_plugin.init_similar_cache_from_config()
```

## Caching Data

Once the caching plugin is initialized, you can start caching data using the `put` function. Here's an example:

```python
prompt = "Tell me about Intel Xeon Scable Processors."
response = chatbot.predict(prompt)
cache_plugin.put(prompt, response)
```

## Retrieving Cached Data

To retrieve cached data, use the `get` function. Provide the same prompt/question text used for caching, and it will return the cached answer. Here's an example:

```python
answer = cache_plugin.get("Tell me about Intel Xeon Scalable Processors.")
```
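
Putting it together, a typical request path consults the cache before calling the model. A minimal sketch, assuming `get` returns an empty result on a cache miss and reusing the `chatbot` and `cache_plugin` objects from the examples above:

```python
prompt = "Tell me about Intel Xeon Scalable Processors."
answer = cache_plugin.get(prompt)
if not answer:
    # Cache miss: query the LLM and store the response for future hits.
    answer = chatbot.predict(prompt)
    cache_plugin.put(prompt, answer)
print(answer)
```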
```diff
@@ -14,3 +14,5 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
+
+from .cache import CachePlugin
```
