Scene data serialization #427

Draft: mosra wants to merge 16 commits into master from meshdata-cereal-killer

Conversation


@mosra mosra commented Mar 11, 2020

Continuation of #371, goes together with a branch of the same name in magnum-plugins. Current state:

$ du -h lucy.blob
482M    lucy.blob

$ time cp lucy.blob b.blob 
real    0m0,583s
user    0m0,001s
sys     0m0,467s

$ time magnum-sceneconverter lucy.blob b.blob 
real    0m0,460s
user    0m0,067s
sys     0m0,362s

WIP docs: https://doc.magnum.graphics/magnum-meshdata-cereal-killer/blob.html

Scene format conversion

Now done mostly independently of this PR.

  • Create a new AbstractMeshConverter plugin interface for operating on
    the MeshData ... or rather an AbstractSceneConverter plugin interface
    • Provide AnySceneConverter
    • extend for multiple meshes
    • extend for generic scenes? needs a non-shitty representation of objects (SceneData rework #525)
  • Bootstrap a new magnum-sceneconverter (formerly magnum-meshconverter) utility (1c51b98)
    • Make it actually call the converter plugins
    • add a --mesh selector -- 036207f
      • takes just a mesh of given name (and falls back to ID)
      • same for imageconverter, --image -- 413dc56
    • invent some way to chain converters (e.g. to optimize indices and then output to a gltf) -- 749ae98
    • expose useful meshtools like normal generation in it -- can't, that would cause a cyclic dependency :/ a plugin? how to name it? MagnumSceneConverter is already taken... overload its purpose?
    • make it operate in-place on imported data (needs Zero-copy importer plugin APIs #240 as well, though) -- in progress
  • Integrate meshoptimizer -- mosra/magnum-plugins@ae69193
  • basic PLY converter -- mosra/magnum-plugins@485ca93
    • handle custom attributes
  • glTF converter, as PLY alone can't handle multiple meshes -- mosra/magnum-plugins@63f1458

Serialization

  • making the MeshData easily serializable, allowing projects to convert arbitrary mesh formats to blobs that could be loaded directly via mmapping:

    magnum-sceneconverter file.blend --mesh "chair" chair.blob
    magnum-sceneconverter scene.glb --mesh "tree" tree.blob
    ...
    
    cat chair.blob tree.blob car.blob > blobs.blob # because why not
    // Takes basically no time
    auto blob = Utility::Directory::mapRead("blobs.blob");
    
    // Does a bunch of checks and returns views onto `blob`
    Containers::Optional<Trade::MeshData> chair = Trade::MeshData::deserialize(blob);
    • same for material data
    • same for animation data
    • same for skin data (uhh, or merge them with SceneData? these classes feel extremely silly)
    • same for scene data
    • same for image data
      • needs the size checks for compressed image data first
      • probably good to have some way to represent image metadata in there first (color space, EXIF, ...?)
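
    The writing side of the above could then be as small as this -- a sketch
    only, assuming the serialize() counterpart to deserialize() that's
    referenced further down this list:

    // Hypothetical API, serialize() isn't finalized in this PR
    Containers::Array<char> blob = chair->serialize();
    Utility::Directory::write("chair.blob", blob);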
  • Each *Data would contain the same header with some magic, version info and total size. Magic and version info for sanity checks, total size to be able to simply cat multiple blobs together without having to worry about metadata. Similar to RIFF chunks (can't be 100% compatible because RIFF is limited to 4 GB max; RIFF64? can look there for inspiration).

    struct DataHeader {
        // some magic and version info (ideally stolen from PNG to detect
        // broken encodings, CRLF conversion, bad endianness etc.)
        DataType type;
        uint64_t size;
        // ... more stuff

        const DataHeader* nextData() const {
            // size includes the header itself, so this lands on the next chunk
            return reinterpret_cast<const DataHeader*>(
                reinterpret_cast<const char*>(this) + size);
        }
    };
    
    struct Trade::MeshData: DataHeader { ... };
  • Chunk iteration, something like

    Containers::ArrayView<const char> data;
    for(auto i = reinterpret_cast<const DataHeader*>(data.begin());
        reinterpret_cast<const char*>(i) < data.end(); i = i->nextData()) {
        if(i->type == DataType::MeshData) {
            const auto& mesh = static_cast<const Trade::MeshData&>(*i);
            ...
        } else if(i->type == DataType::AnimationData) {
            ...
        } else continue;
    }

    It needs some way to differentiate between end of iteration and invalid data. Returning nullptr in both cases would falsely look like there's no more data when an error is encountered; returning a non-null pointer on a failure would require the user to do an additional explicit check on chunk validity.
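
    One possible shape for this, a minimal sketch with every name
    hypothetical: report the failure mode through a separate out-parameter,
    so nullptr stays unambiguous and a clean end is distinguishable from a
    truncated chunk.

    enum class ChunkError { None, Truncated };

    // Returns the chunk following `current`, or nullptr with `error` saying
    // whether that's a clean end of `data` or a corrupt/cut-off chunk
    const DataHeader* nextChunk(Containers::ArrayView<const char> data,
        const DataHeader* current, ChunkError& error) {
        const char* next = reinterpret_cast<const char*>(current) + current->size;
        error = ChunkError::None;
        if(next == data.end()) return nullptr; // clean end of iteration
        // the next header has to fit into the range, and so does the chunk
        // it describes (short-circuit keeps the dereference safe)
        if(next + sizeof(DataHeader) > data.end() ||
           next + reinterpret_cast<const DataHeader*>(next)->size > data.end()) {
            error = ChunkError::Truncated;
            return nullptr;
        }
        return reinterpret_cast<const DataHeader*>(next);
    }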

  • The MeshData needs to support both self-contained operation (data bundled internally in an Array) and the above-shown "reinterpreted" operation. Tried this at first but it turned out to be extremely non-feasible. Every access would need to special-case this, it would make releaseData() more complex and move constructors impossible, and the amount of testing was just expanding beyond any reasonable bound (then imagine the corner cases when serializing deserialized data). The binary representation was also containing weird "do not use" / "set to 0" fields, being much less compact than it could be. Instead, the mesh metadata are parsed from a packed representation into a MeshData instance referencing vertex/index/attribute data in the original view.

  • In order to avoid extra work, the data layout should be consistent between 32bit and 64bit systems -- otherwise it won't be possible to serialize data on a (64bit) desktop and use them on a (32bit) Emscripten build. It needs to be endian-specific at least; currently it's also different for 32 and 64 bits, to support files over 4 GB and still have a compact representation on 32b. Might reconsider and inflate the 32-bit representation to 64 bits to save some implementation work (the MeshAttributeData wouldn't suffer that much), but this might become problematic when some data types need a lot of internal offsets (animations?).

    • Write MagnumImporter and MagnumSceneConverter that provide import/conversion of different bitness / endianness
    • Recognize those in AnySceneImporter / AnySceneConverter
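
    For illustration, a header prefix along these lines could carry enough to
    recognize both bitness and endianness up front -- a sketch only, every
    field and value below is a hypothetical choice, not the final layout:

    struct BlobSignature {
        char magic[4];   // PNG-style: a byte with the high bit set first, to
                         // catch 7-bit transfers and broken encodings
        uint16_t endian; // written as 0x1234 by the producer; reading back
                         // 0x3412 means the data has the other endianness
        uint8_t bitness; // 32 or 64 -- width of offsets/sizes that follow
        uint8_t version; // bumped on incompatible layout changes
    };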
  • Some assets might have one huge buffer and particular meshes being just views on it. Ideally the buffer should be uploaded as-is in a single piece, with meshes then referring to subranges. In this case, the serialized MeshData needs to have a way to reference an external "data chunk" somehow -- some flag saying the offset is not internal but into a different chunk, plus a data chunk "ID"? Or having that implicit -- it'll always be the first data chunk after a couple of data-less MeshData chunks?

    • While this won't cover all cases (there still can be a single buffer but incompatible vertex layouts / index types), what about providing mesh views (MeshViewObjectData) that have index / vertex offsets into a single MeshData? The SceneData (SceneData rework #525) will have multidraw-compatible index/vertex offset & size fields
  • Hook directly into sceneconverter (detect a *.blob suffix, provide an inline AbstractImporter/AbstractSceneConverter implementation that mmaps the input/output and calls [de]serialize)
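
    A sketch of the dispatch (hypothetical -- the actual hook would be the
    inline importer/converter classes mentioned above):

    // If the output ends with .blob, skip plugin lookup and serialize directly
    if(Utility::String::endsWith(output, ".blob")) {
        Utility::Directory::write(output, mesh->serialize());
    }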

  • before pinning down the MeshData binary layout, check if it's possible to store arbitrary data in MeshAttributeData (e.g. AABBs, bounding sphere etc.) -- there's 16 bytes free, once it's 64bit-only

    • or put that into SceneData instead?
    • or in the general key/value store (below)?
  • some high-level structure describing metadata (basically everything that's now exposed only via AbstractImporter APIs)

    • string mapping for objects, scene fields, mesh attributes
    • stuff like "this should go on the GPU", "this on the CPU"? "this only grows", "this needs bucket allocator"...
    • "this is an uncompressed texture fallback for editors", "this is debug drawing", "this is a high-quality LOD 0 that you don't want to load implicitly" ?
  • ⚠️ arbitrary key/value data for image properties (colorspace, EXIF info, ...)

    • generalize MaterialAttribute data for this? or make a general copy of it without layers and other material-specific stuff
      • overcome the fixed-size field limitation (a field of the same name with an "extension" bit right after the original could have extra data?)
        • this could be useful also backported to the MaterialData, replacing the "unique name" requirement with something more useful
    • probably also useful for cameras (intrinsics / extrinsics properties), lights
    • and animations, textures, meshes, scenes?? ... all?
    • also probably for the whole importer (... so maybe replacing the importerState everywhere? sounds easiest backwards-compat-wise)
  • ⚠️ storing extra mesh/image levels right after the original but with some additional bit set to know what's the next mesh and what's just the next level of the same mesh when iterating chunks

  • some way to attach a name / checksum / other stuff to a chunk -- nested chunks? Needs to have the semantics clearly documented -- treat nested chunks as inseparable? For example a mesh + image (or more of those) wrapped in a chunk that tells what the image is for? And when the outer chunk is torn apart, the meshes/images no longer have a relation to each other?

  • ability to reference external files somehow?

    • "this file is self-contained, but this additional blob has the uncompressed input data for editing"
    • "here are PNGs for textures" (much easier to inspect & edit)
  • make it possible to record the command line used to produce a particular chunk? (e.g. as proposed in gcc) ... or other comments? might be useful for repeatability

    • create a set of chunk IDs starting with # (#cmd, ####, ...) for arbitrary comments / info about the creator, copyright, command-line etc.?
  • come up with something like https://github.com/ValveSoftware/Fossilize for (de)serializing whole pipelines, ability to replay them back, extract meshes out of them etc. etc.

  • ⚠️ Drop the 32/64bit differences, use a 64-bit layout on 32-bit as well and have only LE/BE variants (too annoying otherwise)

Versioning

Ensuring forward compatibility, avoiding breakages, being able to look into 3rd party data and figure out at least a part of them.

  • Pin VertexFormat enum values to ensure backwards compatibility of serialized data with new values added
    • Same for MeshPrimitive, Trade::MeshAttribute, MeshIndexType
    • And Trade::MaterialAttribute, Trade::MaterialAttributeType
    • Trade::SceneField, Trade::SceneFieldType
    • PixelFormat, CompressedPixelFormat
    • Provide a way to query if given pixel / vertex / ... format is supported (checking if the ID is smaller than the total count)
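
    A minimal sketch of such a query, assuming formats keep sequentially
    pinned IDs and some (hypothetical) count of formats known to this build:

    bool isVertexFormatSupported(VertexFormat format) {
        return UnsignedInt(format) < KnownVertexFormatCount;
    }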
  • Consider storing some kind of schema in a well-known format so we (or 3rd party code) don't need to write tons of code for backwards compatibility but can instead extract the data using the schema (https://twitter.com/dotstdy/status/1319929427774623745) ... something like Khronos Data Format, but not just for pixel data? The high-level MeshData, SceneData, MaterialData structures are the schema already; all data in those is unpacked into (strided) arrays instead of having to describe the layout of complex structures, yay!
  • what else ??

@mosra mosra mentioned this pull request Mar 11, 2020
70 tasks

mosra commented Mar 29, 2020

Initial bits of magnum-sceneconverter pushed to master in 1c51b98.

@mosra mosra force-pushed the meshdata-cereal-killer branch 7 times, most recently from b0b6ab1 to 7305a2c Compare April 17, 2020 11:31
@mosra mosra added this to the 2020.0a milestone Apr 17, 2020
@mosra mosra force-pushed the meshdata-cereal-killer branch 5 times, most recently from 903ca6a to a2374e4 Compare April 18, 2020 08:05
@mosra mosra mentioned this pull request Apr 19, 2020
87 tasks
@mosra mosra force-pushed the meshdata-cereal-killer branch 2 times, most recently from 2e48e66 to 0325ba7 Compare April 23, 2020 19:52

codecov-io commented Apr 23, 2020

Codecov Report

Merging #427 into master will decrease coverage by 10.55%.
The diff coverage is 97.12%.


@@             Coverage Diff             @@
##           master     #427       +/-   ##
===========================================
- Coverage   78.27%   67.72%   -10.56%     
===========================================
  Files         389      366       -23     
  Lines       21140    17284     -3856     
===========================================
- Hits        16548    11706     -4842     
- Misses       4592     5578      +986     
Impacted Files Coverage Δ
src/Magnum/Trade/Data.h 100.00% <ø> (ø)
src/Magnum/Trade/Trade.h 100.00% <ø> (ø)
...numPlugins/AnySceneConverter/AnySceneConverter.cpp 87.50% <87.50%> (ø)
src/Magnum/Trade/AbstractSceneConverter.cpp 92.45% <92.45%> (ø)
src/Magnum/Trade/AbstractSceneConverter.h 100.00% <100.00%> (ø)
src/Magnum/Trade/Data.cpp 100.00% <100.00%> (ø)
src/Magnum/Trade/MeshData.cpp 100.00% <100.00%> (ø)
src/Magnum/Trade/MeshData.h 100.00% <100.00%> (ø)
...umPlugins/AnySceneConverter/importStaticPlugin.cpp 100.00% <100.00%> (ø)
...agnumPlugins/AnySceneImporter/AnySceneImporter.cpp 44.52% <100.00%> (-0.41%) ⬇️
... and 259 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 902e805...2916c46.

@mosra mosra force-pushed the meshdata-cereal-killer branch 6 times, most recently from 4155494 to 99f248a Compare April 30, 2020 21:43
Replaces the old & extremely useless Profiler. Doesn't have everything I
want yet (missing stddev and fancier GPU queries); that'll come later.
mosra added 14 commits April 30, 2020 23:59
I want to use these in command-line arguments.
At first I attempted to make the whole thing reinterpret_cast-able from
a blob of memory (i.e., truly zero-overhead), but while that sounded
cool and all, it moved the overhead to basically all other code -- each
function had to special-case access to attribute/vertex/index data as
the pointers were no longer pointers, the binary representation had
various weird unexplainable gaps ("here an array deleter is stored, set
that to null and don't ask"), release*() functions got more complicated
and when I got to issues with move construction/assignment I knew this
was not the right path.

Now the MeshData internals are packed to a much more compact
representation (with the first attempt it was 128 bytes, now it's just
64) and the serialization doesn't make everything else slower, more
complex or harder to test, which is a win.
Time to do the following (with a 482 MB file)

    magnum-sceneconverter lucy.blob b.blob

goes down from about 1.9 seconds to 450 ms, ~equivalent to what

    cp lucy.blob b.blob

takes (and of course this includes all range and validity checks).
Those will simply produce serialized blobs on output.

TODO: make this more generic? like, if I specify a *.ply at the end, it
uses some other converter after that?

janbajana commented May 6, 2020

Testing this out.
I'm having difficulties getting the FrameProfiler to show any data.

I have checked out the same branch in magnum-plugins and magnum-extras. So I guess magnum-player is using FrameProfiler? As you showed in your picture here:
https://files.gitter.im/mosra/magnum/jQ2e/image.png

For magnum-plugins, do I have to build these plugins to get a working FrameProfiler?

        -DWITH_MAGNUMIMPORTER=ON \
        -DWITH_MAGNUMSCENECONVERTER=ON \
        -DWITH_MESHOPTIMIZERSCENECONVERTER=ON \

I guess the only external dependency is zeux/meshoptimizer?

But I still do not understand how magnum-player uses the FrameProfiler in your branch to output data.


mosra commented May 6, 2020

@janbajana err, sorry for the confusion.

The FrameProfiler is in master now, no need to use this branch for it (it was briefly here, but because it got stable pretty fast, I put it in master). No need for any of those plugins, either. For magnum-player, however, it wasn't anywhere yet; I pushed a WIP to the magnum-extras next branch just now: mosra/magnum-extras@551cbc0.


mosra commented May 6, 2020

Eh, forgot to say -- you need to press the P key to toggle it.

@janbajana

Super, that worked. I have statistics now.

Opening file ../../scenes/Buggy.gltf
Loading 0 textures
Loading 148 materials
Loading 148 meshes
Loading 251 objects
Loading scene 0 
Last 50 frames:
  Frame time: 16.68 ms
  CPU duration: 16.30 ms
  GPU duration: 11.68 ms

Thanks for the help.


mosra commented May 9, 2020

Partially merged to master:

  • initial scene converter plugin interfaces -- 2dc4783
  • AnySceneConverter plugin -- 0ad5a89
  • support in magnum-sceneconverter -- 0da8f89

What's left in this PR is everything related to mesh serialization, which needs a few more iterations before it's ready.

@mosra mosra modified the milestones: 2020.0a, 2020.0b Jun 24, 2020
@mosra mosra mentioned this pull request Jul 16, 2020
58 tasks
@mosra mosra mentioned this pull request Jul 10, 2021
81 tasks
@mosra mosra changed the title Mesh data serialization, meshlet support and scene conversion plugin APIs Scene data serialization and scene conversion plugin APIs Nov 16, 2021
@mosra mosra changed the title Scene data serialization and scene conversion plugin APIs Scene data serialization Apr 9, 2022