# From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations

In this tutorial, we walk through setting up the environment and running the Gradio app, which lets you drive a photorealistic avatar with your voice.

More useful links: [Arxiv]() | [Code](https://github.com/facebookresearch/audio2photoreal/) | [Project page](https://people.eecs.berkeley.edu/~evonne_ng/projects/audio2photoreal/)

# Environment setup
Run all three cells below, in order. They install the required packages, download the model and rendering assets, and place them in the expected directories.
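The first cell installs from the relative path `scripts/requirements.txt`, so it only works when the notebook's working directory is the repository root. Otherwise pip reports `No such file or directory`, as in the output shown for that cell. Here is a minimal sketch of a check you could run first. The helper name `find_repo_root` is hypothetical and not part of the project, and it assumes the repository sits either in the current directory or in one of its immediate subdirectories.

```python
from pathlib import Path
from typing import Optional


def find_repo_root(start: Path, marker: str = "scripts/requirements.txt") -> Optional[Path]:
    """Return `start`, or the first immediate subdirectory of it, that contains `marker`.

    Returns None when the marker file is found in neither place.
    """
    start = Path(start)
    if (start / marker).is_file():
        return start
    # Look one level down, e.g. a freshly cloned repository folder.
    for child in sorted(p for p in start.iterdir() if p.is_dir()):
        if (child / marker).is_file():
            return child
    return None
```

Calling `find_repo_root(Path.cwd())` before the install cell tells you which directory to change into, or returns `None` if the repository has not been cloned yet.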

In [2]:
# Setup environment and install requirements
!pip install -r scripts/requirements.txt

ERROR: Could not open requirements file: [Errno 2] No such file or directory: 'scripts/requirements.txt'

In [None]:
# download models, rendering assets, and prerequisite models respectively
!wget http://audio2photoreal_models.berkeleyvision.org/PXB184_models.tar
!tar xvf PXB184_models.tar
!rm PXB184_models.tar

!mkdir -p checkpoints/ca_body/data/
!wget https://github.com/facebookresearch/ca_body/releases/download/v0.0.1-alpha/PXB184.tar.gz
!tar xvf PXB184.tar.gz --directory checkpoints/ca_body/data/
!rm PXB184.tar.gz

!wget http://audio2photoreal_models.berkeleyvision.org/asset_models.tar
!tar xvf asset_models.tar
!rm asset_models.tar

--2024-08-14 12:39:35--  http://audio2photoreal_models.berkeleyvision.org/PXB184_models.tar
Resolving audio2photoreal_models.berkeleyvision.org (audio2photoreal_models.berkeleyvision.org)... 128.32.162.150
Connecting to audio2photoreal_models.berkeleyvision.org (audio2photoreal_models.berkeleyvision.org)|128.32.162.150|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1350748160 (1.3G) [application/octet-stream]
Saving to: ‘PXB184_models.tar’


2024-08-14 12:40:01 (50.3 MB/s) - ‘PXB184_models.tar’ saved [1350748160/1350748160]

checkpoints/diffusion/c1_face/model000155000.pt
checkpoints/diffusion/c1_face/args.json
checkpoints/diffusion/c1_pose/model000340000.pt
checkpoints/diffusion/c1_pose/args.json
checkpoints/guide/c1_pose/args.json
checkpoints/guide/c1_pose/checkpoints/iter-0100000.pt
checkpoints/vq/c1_pose/args.json
checkpoints/vq/c1_pose/net_iter300000.pth
--2024-08-14 12:40:04--  https://github.com/facebookresearch/ca_body/releases/download/v0.0.1-alpha/PX
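Because the downloads are large, it can be worth confirming that the archives unpacked where the demo expects them. The sketch below checks only the files listed in the `tar` output above for `PXB184_models.tar`; the other two archives are not covered because their contents are not shown here. The function name `missing_files` is hypothetical.

```python
from pathlib import Path
from typing import Iterable, List

# Relative paths taken from the tar listing printed for PXB184_models.tar.
EXPECTED_FILES = [
    "checkpoints/diffusion/c1_face/model000155000.pt",
    "checkpoints/diffusion/c1_face/args.json",
    "checkpoints/diffusion/c1_pose/model000340000.pt",
    "checkpoints/diffusion/c1_pose/args.json",
    "checkpoints/guide/c1_pose/args.json",
    "checkpoints/guide/c1_pose/checkpoints/iter-0100000.pt",
    "checkpoints/vq/c1_pose/args.json",
    "checkpoints/vq/c1_pose/net_iter300000.pth",
]


def missing_files(root: Path, expected: Iterable[str] = EXPECTED_FILES) -> List[str]:
    """Return the expected relative paths that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_file()]
```

An empty list from `missing_files(Path("."))` suggests the model archive was extracted into the current directory as expected.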

In [None]:
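# Install PyTorch3D

import sys
import torch

# Build a tag describing this runtime (Python minor version, CUDA version, PyTorch version).
# Note: version_str is not used below, because PyTorch3D is built from source.
pyt_version_str=torch.__version__.split("+")[0].replace(".", "")
version_str="".join([
    f"py3{sys.version_info.minor}_cu",
    torch.version.cuda.replace(".",""),
    f"_pyt{pyt_version_str}"
])
!pip install fvcore iopath
!pip install "git+https://github.com/facebookresearch/pytorch3d.git"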

Collecting fvcore
  Downloading fvcore-0.1.5.post20221221.tar.gz (50 kB)
  Preparing metadata (setup.py) ... done
Collecting iopath
  Downloading iopath-0.1.10.tar.gz (42 kB)
  Preparing metadata (setup.py) ... done
Collecting yacs>=0.1.6 (from fvcore)
  Downloading yacs-0.1.8-py3-none-any.whl.metadata (639 bytes)
Collecting portalocker (from iopath)
  Downloading portalocker-2.10.1-py3-none-any.whl.metadata (8.5 kB)
Downloading yacs-0.1.8-py3-none-any.whl (14 kB)
Downloading portalocker-2.10.1-py3-none-any.whl (18 kB)
Building wheels for collected packages: fvcore, iopath
  Building wheel for fvcore (setup.py) ... done
  Created wheel for fvcore: filename=fvcore-0.1.5.post20221221-py3-none-any.whl size
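The PyTorch3D cell builds a `version_str` tag from the Python minor version, the CUDA version, and the PyTorch version. To see what that string looks like without importing `torch`, the same logic can be written as a pure function. This is only a sketch: the name `wheel_tag` is hypothetical, and the example versions are illustrative, not the ones from the run above.

```python
def wheel_tag(python_minor: int, cuda_version: str, torch_version: str) -> str:
    """Rebuild the `version_str` tag from explicit inputs instead of the live runtime."""
    # Drop any local build suffix such as "+cu121", then remove the dots.
    pyt_version_str = torch_version.split("+")[0].replace(".", "")
    return "".join([
        f"py3{python_minor}_cu",
        cuda_version.replace(".", ""),
        f"_pyt{pyt_version_str}",
    ])


# Illustrative inputs only.
print(wheel_tag(10, "12.1", "2.3.1+cu121"))  # py310_cu121_pyt231
```

A tag like this identifies a specific Python, CUDA, and PyTorch combination, which is useful when you need to check whether a prebuilt binary matches your runtime.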

# Run the model
**Important!** Before you can run the model, you must make two changes.


1.   Fix the `collections` imports. With Python >= 3.10, Google Colab raises `ImportError: cannot import name 'Mapping' from 'collections'`, because abstract base classes such as `Mapping` now live only in `collections.abc`.
     For every file the environment complains about, you will need to manually change the import from `collections` to `collections.abc` (for example, `from collections import Mapping` becomes `from collections.abc import Mapping`). You can click directly into those files and edit the import. See [this post](https://stackoverflow.com/questions/69381312/importerror-cannot-import-name-mapping-from-collections-using-python-3-10) for more details.

2.   Change the demo script to deploy a public link. In `audio2photoreal/demo/demo.py`, on line 272, change `demo.launch(show_api=False)` to `demo.launch(share=True)`.

These are all the file paths I had to change:


* /usr/local/lib/python3.10/dist-packages/attrdict/mapping.py
* /usr/local/lib/python3.10/dist-packages/attrdict/mixins.py
* /usr/local/lib/python3.10/dist-packages/attrdict/merge.py
* /usr/local/lib/python3.10/dist-packages/attrdict/default.py
* /content/audio2photoreal/demo/demo.py
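If you would rather script the two manual edits than click through each file, they can be approximated with plain text rewriting. This sketch assumes the offending lines are single-line `from collections import ...` statements and that `demo.py` contains the exact text `demo.launch(show_api=False)`. Neither assumption has been verified against every file, so review the result before running the demo. All function names here are hypothetical.

```python
import re
from pathlib import Path
from typing import Callable

# Abstract base classes that can only be imported from `collections.abc` on Python >= 3.10.
ABC_NAMES = {
    "Mapping", "MutableMapping", "Sequence", "MutableSequence", "Set",
    "MutableSet", "Iterable", "Iterator", "Callable", "Hashable", "Sized",
    "Container",
}

# Matches single-line imports only (no parentheses, no trailing comment).
_IMPORT_RE = re.compile(r"^([ \t]*)from collections import ([^\n()#]+)$", re.MULTILINE)


def patch_collections_imports(source: str) -> str:
    """Move ABC names from `collections` imports to `collections.abc` imports."""
    def rewrite(match: "re.Match[str]") -> str:
        indent, names = match.group(1), match.group(2)
        items = [n.strip() for n in names.split(",") if n.strip()]
        abc = [n for n in items if n.split(" as ")[0].strip() in ABC_NAMES]
        rest = [n for n in items if n not in abc]
        lines = []
        if rest:  # names such as OrderedDict stay in `collections`
            lines.append(f"{indent}from collections import {', '.join(rest)}")
        if abc:
            lines.append(f"{indent}from collections.abc import {', '.join(abc)}")
        return "\n".join(lines)

    return _IMPORT_RE.sub(rewrite, source)


def patch_launch_call(source: str) -> str:
    """Swap the local-only launch call for one that requests a public link."""
    return source.replace("demo.launch(show_api=False)", "demo.launch(share=True)")


def patch_file(path: Path, patch: Callable[[str], str]) -> bool:
    """Apply `patch` to a file in place; return True if the file changed."""
    path = Path(path)
    original = path.read_text()
    updated = patch(original)
    if updated != original:
        path.write_text(updated)
    return updated != original
```

For example, `patch_file(Path("/usr/local/lib/python3.10/dist-packages/attrdict/mapping.py"), patch_collections_imports)` would handle the first file in the list, and `patch_file(Path("/content/audio2photoreal/demo/demo.py"), patch_launch_call)` the last. Matching on the call text rather than on line 272 keeps the second patch working if the line number shifts.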

If anyone knows how to revert Colab to Python 3.9 and would like to share that tidbit with me, I would greatly appreciate an email ping :)

**👆 Tip!** Lip syncing needs roughly 8-12 seconds of audio before it starts working well. This is a limitation of our work, but clips longer than 8 seconds give much better results.
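If you plan to upload a recording rather than speak live, you can check its length beforehand. The sketch below uses Python's standard `wave` module and therefore assumes an uncompressed WAV file. It is not part of the demo code, and the helper names are hypothetical.

```python
import wave

MIN_SECONDS = 8.0  # lower bound suggested by the tip above


def wav_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_long_enough(path: str, min_seconds: float = MIN_SECONDS) -> bool:
    """True when the clip is at least `min_seconds` long."""
    return wav_duration_seconds(path) >= min_seconds
```

Calling `is_long_enough("my_clip.wav")` on your own file (the file name is a placeholder) returns `False` for clips that are likely too short for good lip sync.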

After you make those two changes, run the cells below. The last one returns a *public URL* that you can click to open the demo.


In [1]:
import torch
torch.cuda.is_available()

True

In [None]:
!python -m pip install pip==23.1.1

Collecting pip==23.1.1
  Downloading pip-23.1.1-py3-none-any.whl.metadata (4.1 kB)
Downloading pip-23.1.1-py3-none-any.whl (2.1 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 24.1.2
    Uninstalling pip-24.1.2:
      Successfully uninstalled pip-24.1.2
Successfully installed pip-23.1.1


In [None]:
!pip install gradio
!pip install attrdict
!pip install fairseq
!pip install mediapy

Collecting gradio
  Downloading gradio-4.41.0-py3-none-any.whl (12.6 MB)
Collecting aiofiles<24.0,>=22.0 (from gradio)
  Downloading aiofiles-23.2.1-py3-none-any.whl (15 kB)
Collecting fastapi (from gradio)
  Downloading fastapi-0.112.0-py3-none-any.whl (93 kB)
Collecting ffmpy (from gradio)
  Downloading ffmpy-0.4.0-py3-none-any.whl (5.8 kB)
Collecting gradio-client==1.3.0 (from gradio)
  Downloading gradio_client-1.3.0-py3-none-any.whl (318 kB)
Collecting httpx>=0.24.1 (from gradio)
  Downloading httpx-0.27.0-py3-none-any.whl (75 kB)

In [None]:
!python -m demo.demo

2024-08-14 15:43:12.136742: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-08-14 15:43:12.157071: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-08-14 15:43:12.163559: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-08-14 15:43:12.179085: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
running on... cuda:0
 adding lip conditioning ./