Merge pull request #87 from princeton-vl/rc_1.0.4
v1.0.4 - Pregenerated download tools, ground truth updates, render throughput improvements
araistrick committed Oct 16, 2023
2 parents 0f208a8 + 359f08e commit a8ba86a
Showing 80 changed files with 3,220 additions and 1,530 deletions.
10 changes: 10 additions & 0 deletions .github/ISSUE_TEMPLATE/other.md
@@ -0,0 +1,10 @@
---
name: Other
about: ""
title: ""
labels:
assignees: ''

---


1 change: 1 addition & 0 deletions README.md
@@ -42,6 +42,7 @@ Next, see our ["Hello World" example](docs/HelloWorld.md) to generate an image &
- [Installation Guide](docs/Installation.md)
- ["Hello World": Generate your first Infinigen scene](docs/HelloWorld.md)
- [Configuring Infinigen](docs/ConfiguringInfinigen.md)
- [Downloading pre-generated data](docs/PreGeneratedData.md)
- [Extended ground-truth](docs/GroundTruthAnnotations.md)
- [Generating individual assets](docs/GeneratingIndividualAssets.md)
- [Implementing new materials & assets](docs/ImplementingAssets.md)
15 changes: 14 additions & 1 deletion docs/CHANGELOG.md
@@ -1,4 +1,17 @@
v1.0.0 - Beta code release <br>

v1.0.1 - BSD-3 license, expanded ground-truth docs, show line-credits, miscellaneous fixes <br>

v1.0.2 - New documentation, plant improvements, disk and reproducibility improvements <br>
v1.0.3 - Fluid code release, implementing assets documentation, render tools improvements, integration tests <br>

v1.0.3
- Fluid code release
- Implementing assets documentation
- Render tools improvements
- Integration testing script

v1.0.4
- Tools and docs to download the preliminary pre-generated data release
- Reformat the "frames" folder to be more intuitive and easier to load data from
- Ground-truth updates
- Render throughput improvements
161 changes: 30 additions & 131 deletions docs/GroundTruthAnnotations.md
@@ -1,9 +1,5 @@
# Ground-Truth Annotations

### Changelog

- 07/03/23: Add specification for Blender's built-in annotations. Save built-in annotations as numpy arrays. Add more information to Objects_XXXX_XX_XX.json. Significant changes to built-in segmentation masks (fixes & filenames). Improve visualizations for built-in annotations. Always save camera parameters in frames/. Update docs.

### Agenda

- Save forward and backward flow for both built-in and advanced annotations.
@@ -66,97 +62,54 @@ bash worldgen/tools/install/compile_opengl.sh

### Extended Hello-World

Continuing the [Hello-World](/README.md#generate-a-scene-step-by-step) example, we can produce the full set of annotations that Infinigen supports. Following step 3:
To generate the hello-world scene using our custom annotation system, run:

4. Export the geometry from blender to disk
```
$BLENDER -noaudio --background --python generate.py -- --seed 0 --task mesh_save -g desert simple --input_folder outputs/helloworld/fine --output_folder outputs/helloworld/saved_mesh
```
5. Generate dense annotations
```
../process_mesh/build/process_mesh --frame 1 -in outputs/helloworld/saved_mesh -out outputs/helloworld/frames
```
6. Summarize the file structure into a single JSON
```
python tools/summarize.py outputs/helloworld # creating outputs/helloworld/summary.json
```
7. (Optional) Select for a segmentation mask of certain semantic tags, e.g. cactus
```
python tools/ground_truth/segmentation_lookup.py outputs/helloworld 1 --query cactus
python -m tools.manage_datagen_jobs --output_folder outputs/hello_world/0 --num_scenes 1 --specific_seed 0 \
--configs desert.gin simple.gin --pipeline_configs local_16GB.gin monocular.gin opengl_gt.gin --pipeline_overrides LocalScheduleHandler.use_gpu=False
```
This is [the previous manage_datagen_jobs command](https://github.com/princeton-vl/infinigen_internal/blob/oc16_update_docs/docs/HelloWorld.md#generate-images-in-one-command), but replacing `blender_gt.gin` with `opengl_gt.gin`

## Specification

**File structure:**

We provide a python script `summarize.py` which will aggregate all relevant output file paths into a JSON:
```
python tools/summarize.py <output-folder>
```
The resulting `<output-folder>/summary.json` will contain all file paths in the form:
```
{
<type>: {
<file-ext>: {
<rig>: {
<sub-cam>: {
<frame>: <file-path>
}
}
}
}
}
```

`<rig>` and `<sub-cam>` are typically both "00" in the monocular setting; `<file-ext>` is typically "npy" or "png" for the actual data and the visualization, respectively; `<frame>` is a 0-padded 4-digit number, e.g. "0013". `<type>` can be "SurfaceNormal", "Depth", etc. For example
`summary_json["SurfaceNormal"]["npy"]["00"]["00"]["0001"]` -> `'frames/SurfaceNormal_0001_00_00.npy'`
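For example, the following minimal sketch (not part of the toolkit; the output folder path is only an illustration) loads the summary and resolves one file path:
```
import json
from pathlib import Path

import numpy as np

output_folder = Path("outputs/helloworld")  # hypothetical output folder
summary_json = json.loads((output_folder / "summary.json").read_text())

# Resolve and load the surface-normal array for rig 00, sub-camera 00, frame 0001
normal_rel_path = summary_json["SurfaceNormal"]["npy"]["00"]["00"]["0001"]
normals = np.load(output_folder / normal_rel_path)
print(normal_rel_path, normals.shape, normals.dtype)
```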
Below, we specify the data format and resolution of all ground truth passes exported by the default infinigen configuration. Where applicable, H and W refer to the height and width of the RGB image; some ground truth is stored at integer-multiples of this resolution, as described below.

*Note: Currently our advanced ground-truth has only been tested for the 16:9 aspect ratio.*
Note: In cases where both a .png and .npy file are available, we recommend you use the .png file only for visualization, and default to using the .npy file for training.

**Depth**

Depth is stored as a 2160 x 3840 32-bit floating point numpy array.

*Path:* `summary_json["Depth"]["npy"]["00"]["00"]["0001"]` -> `frames/Depth_0001_00_00.npy`

*Visualization:* `summary_json["Depth"]["png"]["00"]["00"]["0001"]` -> `frames/Depth_0001_00_00.png`
Depth is stored as a 2H x 2W 32-bit floating point numpy array.

<p align="center">
<img src="docs/images/gt_annotations/Depth_0001_00_00.png" width="400" />
<img src="images/gt_annotations/Depth_0001_00_00.png" width="400" />
</p>

The depth and camera parameters can be used to warp one image to another frame by running:
```
python tools/ground_truth/rigid_warp.py <folder> <first-frame> <second-frame>
python -m tools.ground_truth.rigid_warp <folder> <first-frame> <second-frame>
```
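To load the raw depth array directly, you can average-pool the 2H x 2W array down to the RGB resolution. This is a minimal sketch assuming the `frames/Depth_*.npy` naming above; invalid pixels may be inf.
```
import numpy as np

depth = np.load("outputs/helloworld/frames/Depth_0001_00_00.npy")  # 2H x 2W float32
h2, w2 = depth.shape
# Average each 2x2 block to align depth with the H x W RGB image
depth_rgb_res = depth.reshape(h2 // 2, 2, w2 // 2, 2).mean(axis=(1, 3))
```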

**Surface Normals**

Surface Normals are stored as a 1080 x 1920 x 3 32-bit floating point numpy array.
Surface Normals are stored as a H x W x 3 32-bit floating point numpy array.

The coordinate system for the surface normals is +X -> Right, +Y -> Up, +Z -> Backward.

*Path:* `summary_json["SurfaceNormal"]["npy"]["00"]["00"]["0001"]` -> `frames/SurfaceNormal_0001_00_00.npy`

*Visualization:* `summary_json["SurfaceNormal"]["png"]["00"]["00"]["0001"]` -> `frames/SurfaceNormal_0001_00_00.png`

<p align="center">
<img src="docs/images/gt_annotations/SurfaceNormal_0001_00_00.png" width="400" />
<img src="images/gt_annotations/SurfaceNormal_0001_00_00.png" width="400" />
</p>
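A quick way to sanity-check the array is to remap components from [-1, 1] to [0, 255]. This is a sketch only; the official .png visualization may use a different mapping, and the file path follows the naming pattern above.
```
import numpy as np
from PIL import Image

normals = np.load("outputs/helloworld/frames/SurfaceNormal_0001_00_00.npy")  # H x W x 3 float32
preview = ((normals * 0.5 + 0.5).clip(0, 1) * 255).astype(np.uint8)
Image.fromarray(preview).save("normals_preview.png")
```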

### Occlusion Boundaries :large_blue_diamond:

Occlusion Boundaries are stored as a 2160 x 3840 png, with 255 indicating a boundary and 0 otherwise.

*Path/Visualization:* `summary_json["OcclusionBoundaries"]["png"]["00"]["00"]["0001"]` -> `frames/OcclusionBoundaries_0001_00_00.png`
Occlusion Boundaries are stored as a png of at least 2H x 2W resolution, with 255 indicating a boundary and 0 otherwise.

<p align="center">
<img src="docs/images/gt_annotations/OcclusionBoundaries_0001_00_00.png" width="400" />
<img src="images/gt_annotations/OcclusionBoundaries_0001_00_00.png" width="400" />
</p>

### Optical Flow

Optical Flow / Scene Flow is stored as a 2160 x 3840 x 3 32-bit floating point numpy array.
Optical Flow / Scene Flow is stored as an H x W x 3 32-bit floating point numpy array.

*Note: The values won't be meaningful if this is the final frame in a series, or in the single-view setting.*

@@ -169,112 +122,58 @@ Channel 3 is the depth change between this frame and the next.
To see an example of how optical flow can be used to warp one frame to the next, run

```
python tools/ground_truth/optical_flow_warp.py <folder> <frame-number>
python -m tools.ground_truth.optical_flow_warp <folder> <frame-number>
```
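Once loaded, the array can be split into its image-space and depth components. This is a sketch; it assumes the first two channels hold the 2D flow in pixels, with the depth change in the third channel as described above, and the file name is only illustrative.
```
import numpy as np

flow3d = np.load("outputs/helloworld/frames/Flow3D_0001_00_00.npy")  # H x W x 3 float32
flow_2d = flow3d[..., :2]      # assumed 2D optical flow, in pixels
delta_depth = flow3d[..., 2]   # depth change between this frame and the next
```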

*Path:* `summary_json["Flow3D"]["npy"]["00"]["00"]["0001"]` -> `frames/Flow3D_0001_00_00.npy`

*Visualization:* `summary_json["Flow3D"]["png"]["00"]["00"]["0001"]` -> `frames/Flow3D_0001_00_00.png`

For the built-in versions, replace `Flow3D` with `Flow`.
If using `blender_gt.gin` rather than `opengl_gt.gin`, replace `Flow3D` with `Flow`, since Blender does not export 3D flow.

### Optical Flow Occlusion :large_blue_diamond:

The mask of occluded pixels for the aforementioned optical flow is stored as a 2160 x 3840 png, with 255 indicating a co-visible pixel and 0 otherwise.
The mask of occluded pixels for the aforementioned optical flow is stored as an H x W png, with 255 indicating a co-visible pixel and 0 otherwise.

*Note: This mask is computed by comparing the face-ids on the triangle meshes at either end of each flow vector. Infinigen meshes often contain multiple faces per-pixel, resulting in frequent false-negatives (negative=occluded). These false-negatives are generally distributed uniformly over the image (like salt-and-pepper noise), and can be reduced by max-pooling the occlusion mask down to the image resolution.*

*Path/Visualization:* `summary_json["Flow3DMask"]["png"]["00"]["00"]["0001"]` -> `frames/Flow3DMask_0001_00_00.png`
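The max-pooling suggested above can be done with a simple reshape. This is a sketch; the file name and the assumed RGB height are illustrations, and the pooling only applies when the mask is saved at a multiple of the RGB resolution.
```
import numpy as np
from PIL import Image

mask = np.array(Image.open("outputs/helloworld/frames/Flow3DMask_0001_00_00.png").convert("L")) > 0
factor = mask.shape[0] // 1080  # assumed RGB height of 1080; adjust to your render settings
if factor > 1:
    h, w = mask.shape[0] // factor, mask.shape[1] // factor
    # A pixel counts as co-visible if any of the sub-pixels it covers is co-visible
    mask = mask.reshape(h, factor, w, factor).max(axis=(1, 3))
```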
### Camera Intrinsics & Extrinsics

### Camera Intrinsics
Camera intrinsics and extrinsics are stored as a numpy ".npz" file inside the "camview" folder.

Infinigen renders images using a pinhole camera model. The resulting camera intrinsics for each frame are stored as a 3 x 3 numpy matrix.

*Path:* `summary_json["Camera Intrinsics"]["npy"]["00"]["00"]["0001"]` -> `frames/K_0001_00_00.npy`

### Camera Extrinsics

The camera pose is stored as a 4 x 4 numpy matrix mapping from camera coordinates to world coordinates.

As is standard in computer vision, the assumed world coordinate system in the saved camera poses is +X -> Right, +Y -> Down, +Z Forward. This is opposed to how Blender internally represents geometry, with flipped Y and Z axes.

*Path:* `summary_json["Camera Pose"]["npy"]["00"]["00"]["0001"]` -> `frames/T_0001_00_00.npy`
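As a sketch of how intrinsics and extrinsics fit together (assuming the `K_*.npy` / `T_*.npy` files listed above; with the newer camview .npz layout, read the same matrices from the archive instead), a world-space point can be projected into pixel coordinates as follows:
```
import numpy as np

K = np.load("outputs/helloworld/frames/K_0001_00_00.npy")  # 3 x 3 pinhole intrinsics
T = np.load("outputs/helloworld/frames/T_0001_00_00.npy")  # 4 x 4 camera-to-world pose

def project(point_world):
    # Map world -> camera, then apply the pinhole intrinsics (no visibility checks)
    p_cam = np.linalg.inv(T) @ np.append(point_world, 1.0)
    uvw = K @ p_cam[:3]
    return uvw[:2] / uvw[2]  # (u, v) in pixels, valid only for points in front of the camera
```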

### Panoptic Segmentation

Infinigen saves three types of semantic segmentation masks: 1) Object Segmentation 2) Tag Segmentation 3) Instance Segmentation

*Object Segmentation* distinguishes individual blender objects, and is stored as a 2160 x 3840 32-bit integer numpy array. Each integer in the mask maps to an object in Objects_XXXX_XX_XX.json with the same value for the `"object_index"` field. The definition of "object" is imposed by Blender; generally large or complex assets such as the terrain, trees, or animals are considered a single object, while a large number of smaller assets (e.g. grass, coral) may be grouped together if they are implemented using instanced geometry.

*Instance Segmentation* distinguishes individual instances of a single object from one another (e.g. separate blades of grass, separate ferns, etc.), and is stored as a 2160 x 3840 32-bit integer numpy array. Each integer in this mask is the *instance-id* for a particular instance, which is unique for that object as defined in the Object Segmentation mask and Objects_XXXX_XX_XX.json.

*Object Segmentation* distinguishes individual blender objects, and is stored as an H x W 32-bit integer numpy array. Each integer in the mask maps to an object in Objects_XXXX_XX_XX.json with the same value for the `"object_index"` field. The definition of "object" is imposed by Blender; generally large or complex assets such as the terrain, trees, or animals are considered a single object, while a large number of smaller assets (e.g. grass, coral) may be grouped together if they are implemented using instanced geometry.

*Paths:*
*Instance Segmentation* distinguishes individual instances of a single object from one another (e.g. separate blades of grass, separate ferns, etc.), and is stored as an H x W x 3 32-bit integer numpy array. Each integer in this mask is the *instance-id* for a particular instance, which is unique for that object as defined in the Object Segmentation mask and Objects_XXXX_XX_XX.json.

`summary_json["ObjectSegmentation"]["npy"]["00"]["00"]["0001"]` -> `frames/ObjectSegmentation_0001_00_00.npy`

`summary_json["InstanceSegmentation"]["npy"]["00"]["00"]["0001"]` -> `frames/InstanceSegmentation_0001_00_00.npy`

`summary_json["Objects"]["json"]["00"]["00"]["0001"]` -> `frames/Objects_0001_00_00.json`

*Visualizations:*

`summary_json["ObjectSegmentation"]["png"]["00"]["00"]["0001"]` -> `frames/ObjectSegmentation_0001_00_00.png`

`summary_json["InstanceSegmentation"]["png"]["00"]["00"]["0001"]` -> `frames/InstanceSegmentation_0001_00_00.png`
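As a minimal sketch (the file names follow the patterns above and are otherwise assumptions), a binary mask for a single object can be built by matching its `"object_index"`:
```
import json
from pathlib import Path

import numpy as np

frames = Path("outputs/helloworld/frames")
obj_seg = np.load(frames / "ObjectSegmentation_0001_00_00.npy")  # integer object indices per pixel
objects = json.loads((frames / "Objects_0001_00_00.json").read_text())

target = objects[0]["object_index"]  # pick the first listed object
mask = obj_seg == target
print(f"object {target}: {mask.sum()} pixels")
```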

#### **Tag Segmentation** :large_blue_diamond:

*Tag Segmentation* distinguishes vertices based on their semantic tags, and is stored as a 2160 x 3840 64-bit integer numpy array. Infinigen tags all vertices with an integer which can be associated with a list of semantic labels in `MaskTag.json`. Compared to Object Segmentation, Infinigen's tagging system is less automatic but much more flexible. Missing features in the tagging system are usually possible and straightforward to implement, whereas in the automatically generated Object Segmentation they are not.

*Paths:*

`summary_json["TagSegmentation"]["npy"]["00"]["00"]["0001"]` -> `frames/TagSegmentation_0001_00_00.npy`

`summary_json["Mask Tags"][<frame>]` -> `fine/MaskTag.json`

*Visualization:*

`summary_json["TagSegmentation"]["png"]["00"]["00"]["0001"]` -> `frames/TagSegmentation_0001_00_00.png`

Generally, most useful panoptic segmentation masks can be constructed by combining the aforementioned three arrays in some way. As an example, to visualize the 2D and [3D bounding boxes](#object-metadata-and-3d-bounding-boxes) for objects with the *blender_rock* semantic tag in the hello world scene, run
Generally, most useful panoptic segmentation masks can be constructed by combining the aforementioned two arrays in some way. As an example, to visualize the 2D and [3D bounding boxes](#object-metadata-and-3d-bounding-boxes) for rock objects in the hello world scene, run
```
python tools/ground_truth/segmentation_lookup.py outputs/helloworld 1 --query blender_rock --boxes
python tools/ground_truth/bounding_boxes_3d.py outputs/helloworld 1 --query blender_rock
python -m tools.ground_truth.segmentation_lookup outputs/hello_world/0 48 --query rock --boxes
python -m tools.ground_truth.bounding_boxes_3d outputs/hello_world/0 48 --query rock
```
which will output

<p align="center">
<img src="docs/images/gt_annotations/blender_rock_2d.png" width="400" /> <img src="docs/images/gt_annotations/blender_rock_3d.png" width="400" />
</p>

By omitting the --query flag, a list of available tags will be printed.

One could also produce a mask for only *flower petals*:
#### **Tag Segmentation** :large_blue_diamond:

```
python tools/ground_truth/segmentation_lookup.py outputs/helloworld 1 --query petal
```
<p align="center">
<img src="docs/images/gt_annotations/petal.png" width="400" />
</p>
*Tag Segmentation* distinguishes vertices based on their semantic tags, and is stored as an H x W 64-bit integer numpy array. Infinigen tags all vertices with an integer which can be associated with a list of semantic labels in `MaskTag.json`. Compared to Object Segmentation, Infinigen's tagging system is less automatic but much more flexible. Requested features in the tagging system are usually possible and straightforward to implement, whereas in the automatically generated Object Segmentation they are not.
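If you want to do the lookup manually rather than via `segmentation_lookup.py`, a rough sketch is to inspect `MaskTag.json` for the integer ids carrying your label and then mask with `np.isin`; the paths follow the layout above and the ids below are hypothetical.
```
import json
from pathlib import Path

import numpy as np

tag_seg = np.load("outputs/helloworld/frames/TagSegmentation_0001_00_00.npy")  # per-pixel tag ids
mask_tags = json.loads(Path("outputs/helloworld/fine/MaskTag.json").read_text())
print(mask_tags)  # inspect which integer ids correspond to which semantic labels

cave_mask = np.isin(tag_seg, [7, 12])  # hypothetical ids found to carry the "cave" label
```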

A benefit of our tagging system is that one can produce a segmentation mask for things which are not a distinct object, such as terrain attributes. For instance, we can highlight only *caves* or *warped rocks*

```
python tools/ground_truth/segmentation_lookup.py outputs/helloworld 1 --query cave
python tools/ground_truth/segmentation_lookup.py outputs/helloworld 1 --query warped_rocks
```
<p align="center">
<img src="docs/images/gt_annotations/caves.png" width="400" /> <img src="docs/images/gt_annotations/warped_rocks.png" width="400" />
<img src="images/gt_annotations/caves.png" width="400" /> <img src="images/gt_annotations/warped_rocks.png" width="400" />
</p>

### Object Metadata and 3D bounding boxes

Each item in `Objects_0001_00_00.json` also contains additional metadata about its object:
```
# Load object meta data
object_data = json.loads((Path("output/helloworld") / summary_json["Objects"]["json"]["00"]["00"]["0001"]).read_text())
object_data = json.loads(Path("outputs/helloworld/frames/Objects/camera_0/Objects_0_0_0001_0.json").read_text())
# select nth object
obj = object_data[n]
@@ -293,7 +192,7 @@ More fields :large_blue_diamond:
obj["tags"] # list of tags which appear on at least one vertex
obj["min"] # min-corner of bounding box, in object coordinates
obj["max"] # max-corner of bounding box, in object coordinates
obj["model_matrices"] # mapping from instance-ids to 4x4 obj->world transformation matrices
obj["model_matrices"] # 4x4 obj->world transformation matrices for all instances
```

The **3D bounding box** for each instance can be computed using `obj["min"]`, `obj["max"]`, `obj["model_matrices"]`. For an example, refer to [the bounding_boxes_3d.py example above](#tag-segmentation-large_blue_diamond).
The **3D bounding box** for each instance can be computed using `obj["min"]`, `obj["max"]`, `obj["model_matrices"]`. For an example, refer to [the bounding_boxes_3d.py example above](#panoptic-segmentation).
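A sketch of that computation, assuming `obj["min"]`/`obj["max"]` are 3-vectors in object coordinates and each entry of `obj["model_matrices"]` is a 4x4 object-to-world matrix (how you index the instances depends on the release):
```
import itertools

import numpy as np

def bbox_corners_world(obj, model_matrix):
    # Eight corners of the axis-aligned box in object coordinates
    mn, mx = np.array(obj["min"]), np.array(obj["max"])
    corners = np.array(list(itertools.product(*zip(mn, mx))))        # 8 x 3
    corners_h = np.concatenate([corners, np.ones((8, 1))], axis=1)   # homogeneous coordinates
    # Transform into world coordinates with the instance's object-to-world matrix
    return (np.asarray(model_matrix) @ corners_h.T).T[:, :3]
```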
12 changes: 7 additions & 5 deletions docs/HelloWorld.md
@@ -21,19 +21,21 @@ cd worldgen
mkdir outputs
# Generate a scene layout
$BLENDER -noaudio --background --python generate.py -- --seed 0 --task coarse -g desert.gin simple.gin --output_folder outputs/helloworld/coarse
$BLENDER -noaudio --background --python generate.py -- --seed 0 --task coarse -g desert.gin simple.gin --output_folder outputs/hello_world/coarse
# Populate unique assets
$BLENDER -noaudio --background --python generate.py -- --seed 0 --task populate fine_terrain -g desert.gin simple.gin --input_folder outputs/helloworld/coarse --output_folder outputs/helloworld/fine
$BLENDER -noaudio --background --python generate.py -- --seed 0 --task populate fine_terrain -g desert.gin simple.gin --input_folder outputs/hello_world/coarse --output_folder outputs/hello_world/fine
# Render RGB images
$BLENDER -noaudio --background --python generate.py -- --seed 0 --task render -g desert.gin simple.gin --input_folder outputs/helloworld/fine --output_folder outputs/helloworld/frames
$BLENDER -noaudio --background --python generate.py -- --seed 0 --task render -g desert.gin simple.gin --input_folder outputs/hello_world/fine --output_folder outputs/hello_world/frames
# Render again for accurate ground-truth
$BLENDER -noaudio --background --python generate.py -- --seed 0 --task render -g desert.gin simple.gin --input_folder outputs/helloworld/fine --output_folder outputs/helloworld/frames -p render.render_image_func=@flat/render_image
$BLENDER -noaudio --background --python generate.py -- --seed 0 --task render -g desert.gin simple.gin --input_folder outputs/hello_world/fine --output_folder outputs/hello_world/frames -p render.render_image_func=@flat/render_image
```

Output logs should indicate what the code is working on. Use `--debug` for even more detail. After each command completes you can inspect its `--output_folder` for results, including running `$BLENDER outputs/helloworld/coarse/scene.blend` or similar to view blender files. We hide many meshes by default for viewport stability; to view them, click "Render" or use the UI to unhide them.
:warning: If you receive the error message `-noaudio: command not found`, you most likely missed the `export BLENDER=...` step of the Installation instructions.

Output logs should indicate what the code is working on. Use `--debug` for even more detail. After each command completes you can inspect its `--output_folder` for results, including running `$BLENDER outputs/hello_world/coarse/scene.blend` or similar to view blender files. We hide many meshes by default for viewport stability; to view them, click "Render" or use the UI to unhide them.

## Generate image(s) in one command

