oaishi/3DScene_from_text

Static and Animated 3D Scene Generation from Free-form Text Descriptions

We propose a novel approach to generate 3D scenes (both static and animated) from free-form text descriptions using a Transformer-based NLP architecture and a non-differentiable renderer.

Dependencies

  1. Blender (version 2.78c)
  2. PyTorch (==1.2.0)
  3. Transformers (==3.0.2)
  4. Numpy
  5. Pickle
  6. OpenCV

A sample environment .yml file has been added for reference.
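Since specific versions are pinned (PyTorch 1.2.0, Transformers 3.0.2), a quick stdlib-only check can confirm that the required packages are importable before running the scripts. This helper is an illustrative sketch, not part of the repository; the module names are assumptions based on the list above.

```python
import importlib.util

# Import names assumed from the dependency list above;
# note that "cv2" is the import name for OpenCV.
REQUIRED_MODULES = ["torch", "transformers", "numpy", "pickle", "cv2"]

def missing_dependencies(modules=REQUIRED_MODULES):
    """Return the subset of modules that cannot be imported."""
    return [m for m in modules if importlib.util.find_spec(m) is None]
```

Running `missing_dependencies()` before the first prediction run gives an immediate list of anything left to install.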

A Few Examples

Each example pairs a text description with its ground-truth and generated scenes (the images are omitted here; see the repository for the renders):

  1. "A rocking cyan matte sphere, a small rocking gray matte object, a small rocking shiny cylinder, a large rocking blue matte object, a spinning blue matte cube, a small moving brown sphere and a rocking blue shiny cube."
  2. "Draw a large yellow colored cylinder of matte texture, a large cyan colored cube of matte texture, a large brown colored cylinder of shiny texture, a large red colored cube of matte texture and a large brown colored cylinder of shiny texture."

All other examples are under the 'Output' folder.

Run Prediction

To run prediction with the M_static model -

cd scripts
python runner.py --type "image" --target "image" --pred_count 15

Here, pred_count specifies the number of predictions to run. For evaluation, 64 sample test files have been attached.

To run prediction with the M_animated model -

cd scripts
python runner.py --type "video" --target "video" --pred_count 15

To run prediction with the M_full model -

cd scripts
python runner.py --type "combined" --target "image" --pred_count 15

Replace target with "video" to generate videos instead. Rendering an image takes around 3-4 seconds; rendering a video takes around 2-4 minutes.

All generated images (static scenes) and videos (animated scenes) are saved in the output folder.
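The three model variants map onto runner.py flags in a regular way. As an illustration, a small helper (hypothetical, not part of the repository) can assemble the command line for a given model:

```python
# Mapping from model variant to the --type flag of runner.py.
# The flag values come from the commands documented above; the
# helper itself is illustrative and not part of the repository.
MODEL_TYPES = {
    "M_static": "image",
    "M_animated": "video",
    "M_full": "combined",
}

def prediction_command(model, target="image", pred_count=15):
    """Build the argument list for a runner.py prediction run."""
    if model not in MODEL_TYPES:
        raise ValueError(f"unknown model: {model}")
    return [
        "python", "runner.py",
        "--type", MODEL_TYPES[model],
        "--target", target,
        "--pred_count", str(pred_count),
    ]
```

For example, `prediction_command("M_full", target="video")` reproduces the M_full command above with video output.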

Run Evaluation

To run evaluation with the M_static model -

cd scripts
python runner.py --type "image" --sector "evaluate"

To run evaluation with the M_animated model -

cd scripts
python runner.py --type "video" --sector "evaluate" 

To run evaluation with the M_full model -

cd scripts
python runner.py --type "combined" --sector "evaluate"

Run Prediction on User-given Input

For the M_static model -

cd scripts
python runner.py --type "image" --sector "predict_single" --description <YOUR_DESCRIPTION> 

For the M_animated model -

cd scripts
python runner.py --type "video" --sector "predict_single" --description <YOUR_DESCRIPTION> 

For the M_full model -

cd scripts
python runner.py --type "combined" --sector "predict_single" --description <YOUR_DESCRIPTION> 

Dataset Generation

Our dataset is generated on top of the CLEVR dataset. The CLEVR dataset ships with JSON files describing all of its scenes. We take these JSON files and generate 13 kinds of scene descriptions for each one. Follow these steps to generate the dataset:
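As an illustration of the idea (not the repository's actual generation code), one simple description style can be produced from a CLEVR-style scene JSON as sketched below. The field names (`objects`, `size`, `color`, `material`, `shape`) follow the public CLEVR scene format; the phrasing is a hypothetical example.

```python
import json

def describe_scene(scene):
    """Render one simple textual description of a CLEVR-style scene."""
    phrases = [
        f"a {o['size']} {o['color']} {o['material']} {o['shape']}"
        for o in scene["objects"]
    ]
    if len(phrases) > 1:
        return ", ".join(phrases[:-1]) + " and " + phrases[-1] + "."
    return phrases[0] + "."

# A minimal hand-written scene in the CLEVR JSON layout:
scene = json.loads("""
{"objects": [
  {"size": "large", "color": "cyan", "material": "rubber", "shape": "sphere"},
  {"size": "small", "color": "blue", "material": "metal", "shape": "cube"}
]}
""")
# describe_scene(scene)
# -> "a large cyan rubber sphere and a small blue metal cube."
```

Varying the templates, attribute order, and wording (e.g. mapping "rubber"/"metal" to "matte"/"shiny" as in the examples above) yields multiple description styles per scene.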
