pensieves/imperio


imperio

Named after a charm in the Harry Potter Universe. When cast successfully, it places the other organism completely under the caster's control. In muggles' terminology, this is a repository of modules for interacting to and through an avatar or a tele-operated robot, including using Speech and Natural Language for action execution on robots.

Installation:

Install python modules mentioned in requirements.txt:

pyaudio and librosa/soundfile depend on the system libraries portaudio and libsndfile1. On Ubuntu, install them by executing:

sudo apt install portaudio19-dev libsndfile1

Then install requirements by executing:

pip install -r requirements.txt

Alternatively, install the requirements with conda inside a conda environment.
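After installing, a quick import check can confirm that the audio packages are visible to the interpreter. The following is a minimal sketch, not part of the repository: the module names are assumed from the packages named above (requirements.txt remains the authoritative list), and `missing_modules` is a hypothetical helper.

```python
import importlib.util

# Assumed import names for the packages mentioned above; the authoritative
# list lives in requirements.txt, which is not reproduced here.
AUDIO_MODULES = ("pyaudio", "librosa", "soundfile")


def missing_modules(names=AUDIO_MODULES):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All checked modules are importable.")
```

If pyaudio is reported missing even after pip succeeded, the portaudio19-dev system package is a likely place to look first.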

Install other external dependencies:

  • hr_msgs (from Hanson Robotics)
  • Any other vendor-dependent packages required by the installed ROS distribution.

Environment set up:

Source the ROS setup file:

source /opt/ros/noetic/setup.bash

Sourcing ROS is required to access rospy in ROS 1 (or rclpy in ROS 2) and other ROS modules.

Note: Google Application Credentials are required only when using Google Cloud's Speech-to-Text API.

Update the environment exporting file and source it:

source .env

Or set environment variables explicitly by exporting e.g.:

export GOOGLE_APPLICATION_CREDENTIALS=/path/to/google-cloud-credentials.json
export ROS_MASTER_URI=http://<ros_master_ip>:<ros_master_port>
export ROS_IP=<local_machine_ros_ip>

Refer to the .env file for example environment values and their format.
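Before launching any of the examples, it can help to verify that these variables are set and well formed. The sketch below is an illustration under stated assumptions, not code from the repository: `check_ros_env` is a hypothetical helper, and the `http://<host>:<port>` shape of ROS_MASTER_URI is taken from the export example above.

```python
import os
from urllib.parse import urlparse


def check_ros_env(env=None, need_google=False):
    """Return a list of human-readable problems; an empty list means OK."""
    env = os.environ if env is None else env
    problems = []

    parsed = urlparse(env.get("ROS_MASTER_URI", ""))
    try:
        port = parsed.port  # raises ValueError for a non-numeric port
    except ValueError:
        port = None
    if parsed.scheme != "http" or not parsed.hostname or port is None:
        problems.append("ROS_MASTER_URI should look like http://<host>:<port>")

    if not env.get("ROS_IP"):
        problems.append("ROS_IP is not set")

    # The credentials file only matters for Google Cloud speech-to-text.
    if need_google:
        cred = env.get("GOOGLE_APPLICATION_CREDENTIALS", "")
        if not os.path.isfile(cred):
            problems.append("GOOGLE_APPLICATION_CREDENTIALS is not an existing file")

    return problems


if __name__ == "__main__":
    for problem in check_ros_env():
        print("WARNING:", problem)
```

Passing a plain dictionary as `env` makes the check easy to try without touching the real environment.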

Docker set up:

Ensure that the external installation files are present in the docker/externals folder. Execute the following commands from the project root directory, i.e. the current directory.

Make the build and run shell scripts executable by:

chmod +x docker/docker_build.sh
chmod +x docker/docker_run.sh

Now, to build the imperio docker image, execute:

docker/docker_build.sh

If required, update the ROS_IP and ROS_MASTER_URI environment variables in docker/run.env. Finally, to run the imperio docker container, execute:

docker/docker_run.sh <container_name>

The container_name can be chosen based on the task, e.g. imperio_speech for the streaming_speech task or imperio_audio for the audio task.
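The naming convention above can be captured in a small lookup when the run script is started from another program. This is only a sketch: `CONTAINER_NAMES` and `docker_run_command` are hypothetical names, and the mapping covers just the two tasks mentioned in the text.

```python
# Task-to-container mapping taken from the naming suggestion in the text;
# any other task would need its own entry.
CONTAINER_NAMES = {
    "streaming_speech": "imperio_speech",
    "audio": "imperio_audio",
}


def docker_run_command(task):
    """Build the argument list for docker/docker_run.sh for a known task."""
    if task not in CONTAINER_NAMES:
        raise ValueError(f"unknown task: {task!r}")
    return ["docker/docker_run.sh", CONTAINER_NAMES[task]]


if __name__ == "__main__":
    # The list can be handed to subprocess.run(...) from the project root.
    print(" ".join(docker_run_command("streaming_speech")))
```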

Then follow the running instructions for the required task, e.g. to run streaming_speech with the Flask server and tuning, execute:

docker/docker_run.sh imperio_speech
python3 examples/streaming_speech_flask/streaming_speech_server.py

To run streaming_speech without the Flask server, UI, and tuning, execute:

docker/docker_run.sh imperio_speech
python3 examples/streaming_speech.py --voice_conv_fn change_pitch --multiplier 1.5

To receive audio from the avatar on the operator's side, execute:

docker/docker_run.sh imperio_audio
python3 examples/audio.py

Alternatively, use tmuxinator (install it with sudo apt install tmuxinator) to run speech and audio with the default configs by executing:

tmuxinator start audio

To stop tmuxinator use:

tmuxinator stop audio

Running instructions:

  • Operator (Caster) to Robot/Avatar (Imperiused) direct speech relay or execution: Receives speech input from the operator and plays it back directly on the robot's/avatar's side, keeping the audio content as is, with gender voice conversion if specified. To relay speech without gender voice manipulation, execute:

python3 examples/streaming_speech.py

To relay speech with gender voice manipulation, two Praat voice conversion functions are supported: change_gender and change_pitch, with change_pitch being preferable. To convert a male voice to a female voice, specify a multiplier value > 1; to convert a female voice to a male voice, specify a value < 1. A sample command is:

python3 examples/streaming_speech.py --voice_conv_fn change_pitch --multiplier 1.5
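To get a feel for what a given multiplier means, it can be converted to a musical interval. The sketch below assumes that the multiplier scales the fundamental frequency linearly, which is not confirmed here for either Praat function; `semitone_shift` and `describe` are hypothetical helpers for illustration only.

```python
import math


def semitone_shift(multiplier):
    """Interval in semitones for a frequency ratio (assumed linear scaling)."""
    if multiplier <= 0:
        raise ValueError("multiplier must be positive")
    return 12.0 * math.log2(multiplier)


def describe(multiplier):
    """Summarize the direction and size of the assumed pitch change."""
    shift = semitone_shift(multiplier)
    if shift > 0:
        direction = "raises"
    elif shift < 0:
        direction = "lowers"
    else:
        return "multiplier 1 leaves the pitch unchanged"
    return f"multiplier {multiplier} {direction} the pitch by {abs(shift):.2f} semitones"


if __name__ == "__main__":
    for m in (0.75, 1.0, 1.5):
        print(describe(m))
```

Under this assumption, the sample multiplier of 1.5 corresponds to roughly seven semitones upward, and a multiplier of 2 to a full octave.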

  • Operator (Caster) to Robot/Avatar (Imperiused) indirect speech relay or execution: Receives speech input from the operator and either plays it back on the robot's/avatar's side or, if a wake/context word such as Asha is recognized, executes the subsequent action specified.

python3 examples/speech.py --ros_init

The command above uses the on-device Facebook Wav2Vec2 model made available by Hugging Face. To use Google Cloud's speech-to-text instead, execute:

python3 examples/google-speech.py --ros_init

To modify the execution parameters of the on-device model, such as providing a GPU device index when a GPU is available, run the program as:

python3 examples/speech.py --ros_init --gpu_idx 0
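The relay-or-execute decision described in this item can be sketched as a small routing function over the recognized transcript. This is an illustration only and not the repository's implementation: `route_transcript` is a hypothetical name, the wake word list contains just the Asha example from the text, and treating everything after the wake word as the action is an assumption.

```python
def route_transcript(transcript, wake_words=("asha",)):
    """Return ("execute", command) if a wake word occurs, else ("relay", text)."""
    words = transcript.strip().split()
    for index, word in enumerate(words):
        # Ignore case and trailing punctuation, e.g. "Asha," matches "asha".
        if word.strip(".,!?").lower() in wake_words:
            command = " ".join(words[index + 1:])
            return ("execute", command)
    return ("relay", transcript.strip())


if __name__ == "__main__":
    print(route_transcript("Asha, wave your hand"))
    print(route_transcript("Hello, nice to meet you"))
```

A real pipeline would still need to map the extracted command text onto a robot action, which this sketch leaves out.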

  • Robot/Avatar (Imperiused) to Operator (Caster) audio reception: Receives audio captured by the robot's/avatar's microphone and plays it back on the speaker of the operator's machine.

python3 examples/audio.py
