# From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations

This tutorial walks you through setting up the environment and running the Gradio app, which lets you drive a photorealistic avatar with your voice.

More useful links: [Arxiv]() | [Code](https://github.com/facebookresearch/audio2photoreal/) | [Project page](https://people.eecs.berkeley.edu/~evonne_ng/projects/audio2photoreal/)

# Environment setup
Run all 3 cells below. They install the required packages, download the model checkpoints and rendering assets, and extract them into the expected directories.

In [3]:
# Setup environment and install requirements
!pip install -r scripts/requirements.txt

Collecting blobfile (from -r scripts/requirements.txt (line 2))
  Downloading blobfile-3.0.0-py3-none-any.whl.metadata (15 kB)
Collecting einops (from -r scripts/requirements.txt (line 3))
  Downloading einops-0.8.0-py3-none-any.whl.metadata (12 kB)
Collecting fairseq (from -r scripts/requirements.txt (line 4))
  Downloading fairseq-0.12.2.tar.gz (9.6 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m9.6/9.6 MB[0m [31m95.9 MB/s[0m eta [36m0:00:00[0m
[?25h  Installing build dependencies ... [?25ldone
[?25h  Getting requirements to build wheel ... [?25ldone
[?25h  Installing backend dependencies ... [?25ldone
[?25h  Preparing metadata (pyproject.toml) ... [?25ldone
[?25hCollecting gradio (from -r scripts/requirements.txt (line 5))
  Downloading gradio-4.44.1-py3-none-any.whl.metadata (15 kB)
Collecting matplotlib (from -r scripts/requirements.txt (line 6))
  Downloading matplotlib-3.9.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl

In [4]:
# download models, rendering assets, and prerequisite models respectively
!wget http://audio2photoreal_models.berkeleyvision.org/PXB184_models.tar
!tar xvf PXB184_models.tar
!rm PXB184_models.tar

!mkdir -p checkpoints/ca_body/data/
!wget https://github.com/facebookresearch/ca_body/releases/download/v0.0.1-alpha/PXB184.tar.gz
!tar xvf PXB184.tar.gz --directory checkpoints/ca_body/data/
!rm PXB184.tar.gz

!wget http://audio2photoreal_models.berkeleyvision.org/asset_models.tar
!tar xvf asset_models.tar
!rm asset_models.tar

--2024-10-15 12:52:34--  http://audio2photoreal_models.berkeleyvision.org/PXB184_models.tar
Resolving audio2photoreal_models.berkeleyvision.org (audio2photoreal_models.berkeleyvision.org)... 128.32.162.150
Connecting to audio2photoreal_models.berkeleyvision.org (audio2photoreal_models.berkeleyvision.org)|128.32.162.150|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1350748160 (1.3G) [application/octet-stream]
Saving to: 'PXB184_models.tar'


2024-10-15 12:56:16 (5.83 MB/s) - 'PXB184_models.tar' saved [1350748160/1350748160]

checkpoints/diffusion/c1_face/model000155000.pt
checkpoints/diffusion/c1_face/args.json
checkpoints/diffusion/c1_pose/model000340000.pt
checkpoints/diffusion/c1_pose/args.json
checkpoints/guide/c1_pose/args.json
checkpoints/guide/c1_pose/checkpoints/iter-0100000.pt
checkpoints/vq/c1_pose/args.json
checkpoints/vq/c1_pose/net_iter300000.pth
--2024-10-15 12:56:43--  https://github.com/facebookresearch/ca_body/releases/downlo
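
Once the archives are extracted, it can be worth confirming that the checkpoint files named in the listing above actually landed under `checkpoints/`. The following is a minimal sketch, assuming the notebook's working directory is the repository root. The helper name `missing_checkpoints` is hypothetical, and only the eight files visible in the listing are checked; the contents of the other two archives are not shown in the output, so they are left out.

```python
from pathlib import Path

# Paths copied from the tar listing above; other archives are not covered.
EXPECTED_FILES = [
    "checkpoints/diffusion/c1_face/model000155000.pt",
    "checkpoints/diffusion/c1_face/args.json",
    "checkpoints/diffusion/c1_pose/model000340000.pt",
    "checkpoints/diffusion/c1_pose/args.json",
    "checkpoints/guide/c1_pose/args.json",
    "checkpoints/guide/c1_pose/checkpoints/iter-0100000.pt",
    "checkpoints/vq/c1_pose/args.json",
    "checkpoints/vq/c1_pose/net_iter300000.pth",
]


def missing_checkpoints(root=".", expected=EXPECTED_FILES):
    """Return the expected files that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All listed checkpoint files are present.")
```

If anything is reported missing, re-running the download cell is probably the simplest fix.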

In [8]:
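# install pytorch3d

import sys
import torch

# Build the version tag that identifies a prebuilt pytorch3d wheel
# (e.g. "py310_cu121_pyt210"). The source install below does not use it;
# it is kept for reference in case a prebuilt wheel is preferred.
pyt_version_str = torch.__version__.split("+")[0].replace(".", "")
version_str = "".join([
    f"py3{sys.version_info.minor}_cu",
    torch.version.cuda.replace(".", ""),
    f"_pyt{pyt_version_str}"
])
!pip install fvcore iopath
# Build pytorch3d from source.
!pip install "git+https://github.com/facebookresearch/pytorch3d.git"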

Collecting git+https://github.com/facebookresearch/pytorch3d.git
  Cloning https://github.com/facebookresearch/pytorch3d.git to /tmp/pip-req-build-jln21pxn
  Running command git clone --filter=blob:none --quiet https://github.com/facebookresearch/pytorch3d.git /tmp/pip-req-build-jln21pxn
  Resolved https://github.com/facebookresearch/pytorch3d.git to commit e17ed5cd50a1b43ed60ca4ff7a9a2e329c17c012
  Preparing metadata (setup.py) ... done
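
The `version_str` computed in the cell above is not consumed by the source install. In PyTorch3D's own installation notes, a tag of that shape selects a prebuilt wheel index. The sketch below is an assumption about how the tag was meant to be used, not part of this tutorial's tested path. It rebuilds the tag from explicit values so it can be inspected without a GPU. `wheel_tag` and `wheel_index_url` are hypothetical helper names, and a wheel may not exist for every Python/CUDA/PyTorch combination.

```python
def wheel_tag(python_minor, cuda_version, torch_version):
    """Build a tag such as 'py310_cu121_pyt210' from version components."""
    pyt = torch_version.split("+")[0].replace(".", "")
    cuda = cuda_version.replace(".", "")
    return f"py3{python_minor}_cu{cuda}_pyt{pyt}"


def wheel_index_url(tag):
    """URL pattern from PyTorch3D's install notes (assumed still valid)."""
    return (
        "https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/"
        f"{tag}/download.html"
    )


# Example values only; in the notebook these come from sys and torch.
tag = wheel_tag(10, "12.1", "2.1.0+cu121")
print(tag)
print(wheel_index_url(tag))
```

In a notebook cell, the resulting URL would be passed to `pip install --no-index --no-cache-dir pytorch3d -f <url>`, falling back to the source build if no matching wheel is found.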


# Run the model
**Important!!** Before you can run the model, there are two things you must fix.


1.   Fix the `collections` imports. With Python >= 3.10, Google Colab will complain `ImportError: cannot import name 'Mapping' from 'collections'`.
As a result, you *will* need to manually change `from collections import ...` to `from collections.abc import ...` in every file the environment complains about. You can click directly into those files and edit the import. See [this post](https://stackoverflow.com/questions/69381312/importerror-cannot-import-name-mapping-from-collections-using-python-3-10) for more details.

2.   Change the demo script to deploy a public link. Open `audio2photoreal/demo/demo.py` and, on line 272, change `demo.launch(show_api=False)` to `demo.launch(share=True)`.

These are all the file paths I had to change:


* /usr/local/lib/python3.10/dist-packages/attrdict/mapping.py
* /usr/local/lib/python3.10/dist-packages/attrdict/mixins.py
* /usr/local/lib/python3.10/dist-packages/attrdict/merge.py
* /usr/local/lib/python3.10/dist-packages/attrdict/default.py
* /content/audio2photoreal/demo/demo.py
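
The import fix for the `attrdict` files can also be scripted instead of edited by hand. The sketch below is one possible approach, not something the tutorial itself ships: it rewrites single-line `from collections import ...` statements to `from collections.abc import ...` when every imported name is an abstract base class. The helper names `patch_source` and `patch_file` are hypothetical, and parenthesized or aliased imports are deliberately left untouched for manual review.

```python
import re
from pathlib import Path

# ABCs that live in collections.abc and are no longer importable
# from `collections` on Python >= 3.10.
ABC_NAMES = {
    "Mapping", "MutableMapping", "Sequence", "MutableSequence",
    "Set", "MutableSet", "Iterable", "Iterator", "Callable",
    "Container", "Hashable", "Sized",
}

_IMPORT_RE = re.compile(r"^([ \t]*)from collections import (.+)$", re.MULTILINE)


def patch_source(source):
    """Return `source` with ABC-only collections imports redirected."""
    def fix(match):
        names = [name.strip() for name in match.group(2).split(",")]
        if all(name in ABC_NAMES for name in names):
            return f"{match.group(1)}from collections.abc import {match.group(2)}"
        # Mixed imports (e.g. OrderedDict) are left for manual review.
        return match.group(0)

    return _IMPORT_RE.sub(fix, source)


def patch_file(path):
    """Patch one file in place; return True if its contents changed."""
    path = Path(path)
    original = path.read_text()
    patched = patch_source(original)
    if patched != original:
        path.write_text(patched)
        return True
    return False
```

Calling `patch_file` on each of the four `attrdict` paths listed above would apply the change. The `demo.launch` edit in `demo.py` is a different kind of change and still has to be made by hand.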

If anyone knows how to revert Colab to Python 3.9 and would like to share that tidbit with me, I would greatly appreciate an email ping :)

**👆 Tip!** The lip syncing needs roughly 8-12 seconds of audio before it starts working well. This is a limitation of our work, but clips longer than 8 seconds give much better results.
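
To check a recording against that threshold before uploading it, the clip length can be read from the file header. This is a minimal sketch using only the standard library. It assumes the clip is an uncompressed WAV file (which audio formats the demo accepts is not stated here), and `wav_duration_seconds` and `long_enough` are hypothetical helper names.

```python
import wave

MIN_SECONDS = 8.0  # threshold taken from the tip above


def wav_duration_seconds(path):
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def long_enough(path, min_seconds=MIN_SECONDS):
    """True if the clip is longer than `min_seconds`."""
    return wav_duration_seconds(path) > min_seconds
```

For example, `long_enough("my_clip.wav")` would return `False` for a 5-second recording, which is a hint to record a longer one.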

After you finish those two changes, run the cells below. The last one launches the demo and returns a *public URL* that you can click into.


In [2]:
import torch
torch.cuda.is_available()

True

In [None]:
!python -m pip install pip==23.1.1

Collecting pip==23.1.1
  Downloading pip-23.1.1-py3-none-any.whl.metadata (4.1 kB)
Downloading pip-23.1.1-py3-none-any.whl (2.1 MB)
[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/2.1 MB[0m [31m?[0m eta [36m-:--:--[0m[2K   [91m━━━━[0m[91m╸[0m[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.3/2.1 MB[0m [31m7.4 MB/s[0m eta [36m0:00:01[0m[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m [32m2.1/2.1 MB[0m [31m31.3 MB/s[0m eta [36m0:00:01[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.1/2.1 MB[0m [31m24.7 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 24.1.2
    Uninstalling pip-24.1.2:
      Successfully uninstalled pip-24.1.2
Successfully installed pip-23.1.1


In [6]:
!pip install gradio
!pip install attrdict
!pip install fairseq
!pip install mediapy

Collecting gradio
  Using cached gradio-4.44.1-py3-none-any.whl.metadata (15 kB)
Collecting aiofiles<24.0,>=22.0 (from gradio)
  Using cached aiofiles-23.2.1-py3-none-any.whl.metadata (9.7 kB)
Collecting fastapi<1.0 (from gradio)
  Using cached fastapi-0.115.2-py3-none-any.whl.metadata (27 kB)
Collecting ffmpy (from gradio)
  Using cached ffmpy-0.4.0-py3-none-any.whl.metadata (2.9 kB)
Collecting gradio-client==1.3.0 (from gradio)
  Using cached gradio_client-1.3.0-py3-none-any.whl.metadata (7.1 kB)
Collecting huggingface-hub>=0.19.3 (from gradio)
  Using cached huggingface_hub-0.25.2-py3-none-any.whl.metadata (13 kB)
Collecting importlib-resources<7.0,>=1.3 (from gradio)
  Using cached importlib_resources-6.4.5-py3-none-any.whl.metadata (4.0 kB)
Collecting matplotlib~=3.0 (from gradio)
  Using cached matplotlib-3.9.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (11 kB)
Collecting orjson~=3.0 (from gradio)
  Using cached orjson-3.10.7-cp39-cp39-many

In [7]:
!python -m demo.demo

/home/shannon/.conda/envs/cas3ubuntu/bin/python: Error while finding module specification for 'demo.demo' (ModuleNotFoundError: No module named 'demo')
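
The `ModuleNotFoundError` above is what `python -m` reports when it cannot find a `demo` package from the directory the command was launched in. A small pre-flight check can catch this before launching. The sketch assumes the repository root contains `demo/demo.py`, as the `/content/audio2photoreal/demo/demo.py` path listed earlier suggests; `can_launch_demo` is a hypothetical helper.

```python
from pathlib import Path


def can_launch_demo(repo_root="."):
    """True if `python -m demo.demo` has a demo/demo.py to find under repo_root."""
    return (Path(repo_root) / "demo" / "demo.py").is_file()


if not can_launch_demo():
    print("demo/demo.py not found here; change into the repository root first.")
```

In a notebook, changing directory with `%cd` into the cloned repository (its location depends on where you cloned it) before running `!python -m demo.demo` should resolve this particular error.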
