.splat universal format discussion #47

Open
dylanebert opened this issue Nov 7, 2023 · 35 comments
Comments

@dylanebert

dylanebert commented Nov 7, 2023

Hello! I'm the author of gsplat.js, in which I'm using the splat format as provided in antimatter15/splat

I have opened an issue on splat compression, and I think it would be great if we can have a universal representation, with a consistent header to support different compression methods.

I can replicate your compressed format, but maybe we can open a common format repo with test files, so we can stay on the same page?

What do you think?

@mkkellogg
Owner

mkkellogg commented Nov 8, 2023

I think that is a great idea; given all the different splat viewers that have been developed, I think this was pretty much inevitable.

I think one of the first questions we need to answer is: have we identified all the potential stakeholders in such a project? Ideally the decisions we make here would produce a universal compact format that would be beneficial to anyone that has implemented a viewer (commercial, open-source, and so on). Or maybe my thinking is a little too grandiose at this point and we should just move forward and hope others will join. I'm sure you've read over Aras Pranckevičius's blog about compressing splat files; it would also probably be good to reach out to Kevin Kwok (if you haven't already).

As far as the actual approaches we use to organize the data and/or compress it, I'm totally open to suggestions. I admit my current implementation is fairly quick & dirty; I just wanted to get something in place that would cut down the size of the .splat files (and they're still pretty big).

@dylanebert
Author

Sounds great! I think a universal compact format would be very nice, or at least a plan for how the open source community can stay in sync as it keeps improving.

I have reached out to Aras Pranckevičius and Kevin Kwok and pointed them here.

@aras-p

aras-p commented Nov 8, 2023

Wall of text! ⚠️

Unity Gaussian Splatting format

Thought process behind it is in my blog posts (one, two), but it's kinda like this:

enum VectorFormat {
    Float32 = 0, // 12 bytes: 3x float32
    Norm16 = 1,  // 6 bytes: 3x unorm16
    Norm11 = 2,  // 4 bytes: 11+10+11 bit unorm
    Norm6 = 3    // 2 bytes: 5+6+5 bit unorm
}
enum SHFormat {
    // same as above
    Float32,
    Norm16,
    Norm11,
    Norm6,
    // "palette", each SH data is an index into a separate table.
    // table itself uses Float16 ("half") data
    Cluster64k,
    Cluster32k,
    Cluster16k,
    Cluster8k,
    Cluster4k,
}
enum ColorFormat {
	Float32x4,	// 16 bytes: 4x float32
	Float16x4,	// 8 bytes: 4x float16	
	Unorm8x4,	// 4 bytes: 4x unorm8 (colors are *not* sRGB)
	BC7, 		// 1 byte: BC7 GPU format, not sRGB (PC/consoles only, not mobile!)
	// I don't have these yet, but potential candidates:
	ASTC4x4,	// 1 byte: ASTC 4x4 GPU format (mobile & Mac)
	ASTC5x5, 	// 0.64 byte: ASTC 5x5 GPU format
	ASTC6x6, 	// 0.44 byte: ASTC 6x6 GPU format
	UASTC,		// 1 byte: Basis Universal UASTC format, can be transcoded to either BC7 or ASTC4x4 at load
}
// Each variable is min,max value of that thing per chunk.
// shR/shG/shB are not used if one of Cluster SH formats is in use.
struct ChunkInfo {
	half2 colR, colG, colB, colA;
	float2 posX, posY, posZ;
	half2 scaleX, scaleY, scaleZ;
	half2 shR, shG, shB;
}

Data "header" is like:

struct Header {
	uint splatCount;
	VectorFormat posFormat;
	VectorFormat scaleFormat;
	ColorFormat colorFormat;
	SHFormat shFormat;
}

Now the data itself is separate "conceptual files" of:

  • Chunk information (size: (splatCount+255)/256), array of ChunkInfo.
  • Splat positions: array of posFormat-dependent items.
  • Splat colors: 2D array (texture) of colorFormat-dependent items. Notes on layout below.
  • Splat rotation+scale+SHindex data: array of scaleFormat+shFormat-dependent items.
  • SH data: per-splat SH entries (dependent on shFormat), or palette of N SH items for Cluster SH formats.

For color data (which is RGB color plus opacity), I store that in a 2D texture, to enable GPU compression. The texture width is always 2048 (allows for up to 32M splats total, given that max GPU texture height is 16k); height is dependent on splat count but always multiple of 16. And the order that the splat data is laid out inside the texture is not simple row-major, but rather each "chunk" (256 splats) is put into 16x16 blocks, and within a block pixels are arranged in Morton order.

uint2 DecodeMorton2D_16x16(uint t)
{
    t = (t & 0xFF) | ((t & 0xFE) << 7); // -EAFBGCHEAFBGCHD
    t &= 0x5555;                        // -E-F-G-H-A-B-C-D
    t = (t ^ (t >> 1)) & 0x3333;        // --EF--GH--AB--CD
    t = (t ^ (t >> 2)) & 0x0f0f;        // ----EFGH----ABCD
    return uint2(t & 0xF, t >> 8);  // --------EFGHABCD
}
int SplatIndexToPixelIndex(uint idx)
{
    uint2 xy = DecodeMorton2D_16x16(idx);
    uint width = kTextureWidth / 16;
    idx >>= 8;
    uint x = (idx % width) * 16 + xy.x;
    uint y = (idx / width) * 16 + xy.y;
    return (int)(y * kTextureWidth + x);
}
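
For readers working outside of shader code, here is a minimal Python port of the two functions above. It is only a sketch: the function names are mine, and `TEXTURE_WIDTH = 2048` is taken from the texture width mentioned in the text rather than from any shared constant.

```python
TEXTURE_WIDTH = 2048  # width stated in the comment above; assumed fixed


def decode_morton_2d_16x16(t: int) -> tuple[int, int]:
    """De-interleave the low 8 bits of t into (x, y), each in 0..15."""
    t = (t & 0xFF) | ((t & 0xFE) << 7)
    t &= 0x5555
    t = (t ^ (t >> 1)) & 0x3333
    t = (t ^ (t >> 2)) & 0x0F0F
    return t & 0xF, t >> 8


def splat_index_to_pixel_index(idx: int) -> int:
    """Map a splat index to a linear pixel index in the color texture."""
    x_in, y_in = decode_morton_2d_16x16(idx)  # position inside the 16x16 block
    blocks_per_row = TEXTURE_WIDTH // 16
    block = idx >> 8                          # which 256-splat chunk
    x = (block % blocks_per_row) * 16 + x_in
    y = (block // blocks_per_row) * 16 + y_in
    return y * TEXTURE_WIDTH + x


if __name__ == "__main__":
    # The first chunk should fill exactly one 16x16 block of pixels.
    pixels = {splat_index_to_pixel_index(i) for i in range(256)}
    print(len(pixels), "distinct pixels for the first chunk")
```

Each group of 256 consecutive splat indices lands in its own 16x16 block, which is what lets block-based GPU compression work on one chunk at a time.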

For rotation+scale+SH data, it's like this:

  • Rotation quaternion is always 4 bytes, stored in "smallest three" encoding, using 10+10+10+2 bits. Ten bits for each of three smallest components, and then 2 bits for the index of which component was the largest one. The smallest components can only ever be in -0.707..+0.707 range so the bits are spread over that range.
  • Scale is dependent on scaleFormat.
  • SH index is only present for Cluster SH formats, and if present, it's two bytes (16 bit index).
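
A rough Python sketch of the "smallest three" idea described above. The exact bit layout (which component goes in which 10 bits, and where the 2-bit index sits) is my assumption here and may not match the Unity project; only the 10+10+10+2 split and the -0.707..+0.707 range come from the text.

```python
import math

SQRT2 = math.sqrt(2.0)


def encode_smallest_three(q):
    """Pack a unit quaternion (4 floats) into 32 bits: 3 x 10 bits + 2-bit index."""
    largest = max(range(4), key=lambda i: abs(q[i]))
    # q and -q are the same rotation, so flip signs to make the dropped
    # (largest) component positive; it can then be rebuilt without a sign bit.
    sign = -1.0 if q[largest] < 0 else 1.0
    rest = [sign * q[i] for i in range(4) if i != largest]
    packed = 0
    for k, c in enumerate(rest):
        u = (c * SQRT2 + 1.0) * 0.5                # -0.707..+0.707 -> 0..1
        code = min(1023, max(0, round(u * 1023)))  # 10-bit unorm
        packed |= code << (10 * k)
    return packed | (largest << 30)                # assumed: index in top 2 bits


def decode_smallest_three(packed):
    """Inverse of encode_smallest_three (up to quantization and overall sign)."""
    largest = packed >> 30
    rest = [(((packed >> (10 * k)) & 1023) / 1023 * 2.0 - 1.0) / SQRT2
            for k in range(3)]
    w = math.sqrt(max(0.0, 1.0 - sum(c * c for c in rest)))
    rest.insert(largest, w)                        # put the rebuilt component back
    return tuple(rest)


if __name__ == "__main__":
    q = (0.6, 0.0, -0.8, 0.0)
    print(decode_smallest_three(encode_smallest_three(q)))
```

Note that the decoded quaternion may be the negation of the input, which represents the same rotation.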

Additional data transformations that are done:

  • Scale is first converted into linear scaling factor (i.e. not log scale as in PLY data), and then raised to 1/8th power. The reason is that most scales I've found tend to be really on the "small side" of the per-chunk scale range, and this distributes scale values over available few bits better.
  • Color (SH 0 aka DC component) is transformed to regular color (dc0*0.2820948+0.5).
  • Opacity is first transformed into regular opacity (apply sigmoid function to PLY value), and then apply a sort of "square it, centered around 0.5" transformation. The reason is that most opacities tend to be towards "almost fully opaque" or "almost fully transparent" ends, and this again distributes values towards available bits better.
    float SquareCentered01(float x)
    {
      x -= 0.5f;
      x *= x * sign(x);
      return x * 2.0f + 0.5f;
    }
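
To make the three transformations above concrete, here is a small Python sketch together with inverses that a decoder would need. The inverse functions and all function names are my own additions for illustration, not something taken from the Unity project.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def encode_scale(ply_log_scale: float) -> float:
    """PLY stores log scale: convert to linear, then raise to the 1/8th power."""
    return math.exp(ply_log_scale) ** (1.0 / 8.0)


def decode_scale(stored: float) -> float:
    return stored ** 8


def dc_to_color(dc0: float) -> float:
    """SH band 0 (DC) coefficient to a regular color value."""
    return dc0 * 0.2820948 + 0.5


def square_centered01(x: float) -> float:
    """'Square it, centered around 0.5', same as the snippet above."""
    x -= 0.5
    x *= abs(x)          # equivalent to x * x * sign(x)
    return x * 2.0 + 0.5


def inv_square_centered01(y: float) -> float:
    """Hypothetical inverse, applied when reading the stored opacity back."""
    y = (y - 0.5) / 2.0
    return math.copysign(math.sqrt(abs(y)), y) + 0.5


if __name__ == "__main__":
    opacity = sigmoid(2.0)                    # PLY value -> regular opacity
    stored = square_centered01(opacity)       # value that gets quantized
    print(opacity, stored, inv_square_centered01(stored))
```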
    

In my implementation the "Very Low" preset is not the absolute smallest allowed by these formats, but the one where I deemed that "it still looks acceptable", and that is:

  • Positions: Unorm11 (4 bytes/splat)
  • Rotations: 10.10.10.2 quaternion (4 bytes/splat)
  • Scale: Unorm6 (2 bytes/splat)
  • Color+opacity: BC7 (1 byte/splat)
  • SH: Cluster4k entries, plus 2 bytes/splat SH index.
  • Chunk data is 64 bytes/chunk, or 0.25 bytes/splat.

So overall this is like 13.25 bytes/splat, plus 360 KB of SH palette data for the whole splat cloud. For a million splats, this would be 13.6 MB.
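
The arithmetic behind these numbers can be checked with a few lines of Python. The per-item sizes are the ones listed above; the SH palette is assumed to hold 15 RGB coefficients per entry in half precision, as described elsewhere in this comment.

```python
# Per-splat byte costs of the "Very Low" preset, as listed above.
BYTES_PER_SPLAT = {
    "position": 4,   # 11+10+11 bit unorm
    "rotation": 4,   # 10.10.10.2 quaternion
    "scale": 2,      # 5+6+5 bit unorm
    "color": 1,      # BC7
    "sh_index": 2,   # 16-bit index into the SH palette
}
CHUNK_BYTES = 64         # one ChunkInfo
CHUNK_SPLATS = 256
PALETTE_ENTRIES = 4096   # Cluster4k
SH_VALUES = 15 * 3       # 15 RGB coefficients (assumed, see lead-in)
HALF_BYTES = 2


def very_low_size_bytes(splat_count: int) -> int:
    chunks = (splat_count + CHUNK_SPLATS - 1) // CHUNK_SPLATS
    per_splat = sum(BYTES_PER_SPLAT.values()) * splat_count
    palette = PALETTE_ENTRIES * SH_VALUES * HALF_BYTES
    return per_splat + chunks * CHUNK_BYTES + palette


if __name__ == "__main__":
    total = very_low_size_bytes(1_000_000)
    print(f"{total} bytes = {total / 1e6:.1f} MB")
```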

antimatter15/splat .splat format

From what I can tell, the format there is 32 bytes/splat (for a million splats: 32 MB):

struct Splat {
	float3 position;
	float3 scale;
	unorm8x4 color; // color + opacity
	unorm8x4 rotation; // quaternion
}

The format drops all SH data, so some of the realism, such as the "shininess" of surfaces, is lost when looking at them from different angles.

Some "probably easy" ways of making this data smaller:

  • Store position in unorm16x3, uniformly quantized over the bounding box (store bounding box separately somewhere). Or just store as half3. 12 bytes -> 6 bytes.
  • Store scale as half3. 12 bytes -> 6 bytes.
  • Store rotation as 10.10.10.2 quaternion. Does not improve storage, but way better accuracy.

The above would get it down to 20 bytes/splat, but the format would still not have spherical harmonics.
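
As an illustration of the first suggestion (unorm16x3 positions quantized over the bounding box), here is a minimal Python sketch. Function names are hypothetical, and a real converter would work on typed arrays rather than lists of tuples.

```python
def quantize_positions_unorm16(points):
    """Quantize 3D points to 16 bits per axis over their bounding box."""
    lo = [min(p[a] for p in points) for a in range(3)]
    hi = [max(p[a] for p in points) for a in range(3)]

    def code(v, a):
        span = hi[a] - lo[a]
        return round((v - lo[a]) / span * 65535) if span > 0 else 0

    codes = [tuple(code(p[a], a) for a in range(3)) for p in points]
    return lo, hi, codes   # lo/hi are the separately stored bounding box


def dequantize_positions_unorm16(lo, hi, codes):
    return [tuple(lo[a] + (hi[a] - lo[a]) * c[a] / 65535 for a in range(3))
            for c in codes]


if __name__ == "__main__":
    pts = [(0.0, 0.0, 0.0), (1.0, 2.0, 4.0), (0.3, 1.7, 2.2)]
    lo, hi, codes = quantize_positions_unorm16(pts)
    print(codes)
    print(dequantize_positions_unorm16(lo, hi, codes))
```

Each position then takes 6 bytes instead of 12, at the cost of storing one bounding box for the whole cloud.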

gsplat.tech format

I don't recall the details of that format right now, but IIRC it was something like:

struct Splat {
	half3 position; // 6 bytes
	half6 covariance3d; // 12 bytes
	unorm8x4 color; // 4 bytes
	ushort16 shIndex; // 2 bytes
}

Difference here is that instead of storing rotation+scale, they store 6 numbers of the "3D covariance". This saves a tiny bit of calculations in the shader, but you can't easily factor this out into rotation/scale again if you want to visualize splats as something else than splats (but e.g. oriented boxes). Their genius bit is the clustered/paletted SH table idea. They also do something with opacities, as if they are not stored directly in 8 bits available, but also each of 256 opacity values indexes into a premade table. I guess this achieves similar effect as my "non-linear opacity transformation" above.

Their splat data is thus 24 bytes/splat, and IIRC they always store 64k possible SH entries, at half precision float format, so that table is always like 6 MB. For a million splats: 30 MB of data.

Wot I think a format could be

So the first question is, do you want a "simple" format like .splat or gsplat.tech, where there's no "chunks" but rather data for each splat is just stored somehow quantized acceptably. This is simple, but probably hard to get below ~20 bytes/splat.

With "chunking" like what Unity project does, it gets a bit more complicated, but since each chunk stores min/max value range, it is possible to quantize the actual splat values in smaller amount of bits, while still retaining good quality. This is important for positions, scales and color data.

Another question is, do you want to have spherical harmonics data or not. It's probably out of the question that each splat would store any form of SH data per-splat, since it's way too large. Even if you cut it down massively (e.g. BC1 GPU compression like in https://aras-p.info/blog/2023/09/13/Making-Gaussian-Splats-smaller/), that is still 7.5 bytes/splat just for SH. For the web, I think the only practical choices are:

  • Either discard SH data completely (like what .splat does),
  • Or cluster all possible SH values into a "palette" (like what gsplat.tech and unity project do), and store index into palette per splat,
  • Or maybe keep only the 1st order SH data, like the first 3 RGB entries. But that's still quite large, unless you go for like BC1 GPU format.

I quite like the chunking approach TBH, and it's not terribly complexicated. Keeping WebGL2 in mind that can't read from arbitrary data buffers, there's a certain elegance of putting all the possible data formats into GPU textures, and letting the GPU hardware do all the sampling and decoding. I initially had that in the unity project, but then backed out of that partially because things like "float3" just don't exist as a GPU format (WebGL has it, but internally for GPUs that gets turned into a float4, thus wasting some VRAM). However, that is not a big deal, and specifically for the web, I doubt anyone would use float3 format options. So it might make sense to put everything into textures, laid out in the same order as color data in unity project case, i.e.:

All the splats are put into "chunks" of 256 splats size. These are preferably put in some sort of "chunk is small / close in space" fashion, e.g. by rearranging splats in 3D morton order by position or some other way. Each chunk stores min/max values of: position (float3 x2), scale (half3 x2), color and opacity (half4 x2). This is 52 bytes/chunk (or 0.2 bytes/splat). For WebGL2 usage, this could be put into a R32UI texture, with rows of 13 pixels containing raw chunk data bits, and within the shader you convert from raw bits into floats and halfs (using unpackHalf2x16 etc.).
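
To show how such a 52-byte chunk record maps onto 13 raw 32-bit words (one row of an R32UI texture), here is a small Python sketch using `struct`. The field order inside the record is my assumption; only the sizes (float3 x2, half3 x2, half4 x2) come from the description above.

```python
import struct

# Assumed field order: pos min/max (6 x float32), scale min/max (6 x half),
# color+opacity min/max (8 x half). 24 + 12 + 16 = 52 bytes.
CHUNK_LAYOUT = "<6f6e8e"


def pack_chunk(pos_min, pos_max, scale_min, scale_max, color_min, color_max):
    """Return the chunk as 13 uint32 words, ready for an R32UI texture row."""
    raw = struct.pack(CHUNK_LAYOUT, *pos_min, *pos_max,
                      *scale_min, *scale_max, *color_min, *color_max)
    return struct.unpack("<13I", raw)


def unpack_chunk(words):
    """Inverse of pack_chunk: returns the 20 values as a flat tuple."""
    return struct.unpack(CHUNK_LAYOUT, struct.pack("<13I", *words))


if __name__ == "__main__":
    words = pack_chunk((-1.0, -2.0, -3.0), (1.0, 2.0, 3.0),
                       (0.25, 0.5, 1.0), (2.0, 4.0, 8.0),
                       (0.0, 0.0, 0.0, 0.0), (1.0, 1.0, 1.0, 1.0))
    print([hex(w) for w in words])
```

In a shader the same unpacking would be done with bit casts and `unpackHalf2x16`, as mentioned above; the Python version is only meant for writing or inspecting the data offline.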

Now, you also have more textures, with per-splat data:

  • Positions (various formats, R10G10B10A2 good default),
  • Rotations (always R10G10B10A2),
  • Scales (various formats, R5G6B5 good default),
  • Color+opacity (various formats, R8G8B8A8 is easy, UASTC viable),
  • SH index (ushort16)

So defaults listed above would be 16 bytes/splat (or 13 bytes/splat when using UASTC).

And the SH palette data would be stored in half3 format, each of 15 RGB SH values arranged in 4x4 pixel block (with one pixel unused), similar to how gsplat.tech does it. For 4k SH palette that would be 384 KB (4096 entries x 16 pixels x 6 bytes).

But if really really needed, you could get crazily lower, with zero added complexity (since all the data is "just textures" and shader code does not really care how GPU decodes them): positions, rotations, scales and color all using UASTC, and drop SH index. Now it's just 4 bytes/splat; it would look a bit like https://aras-p.info/img/blog/2023/gaussian-splat/GsTruck_4VeryLow.jpg but if you really need to go super small then heck why not.

Now, another question is whether all this data should be in one file, or multiple files. I don't know much about web practices, like is it better to load one file, or multiple files in parallel? One file (like .splat or .ply) is very convenient to use. gsplat.tech IIRC loads from something like 4 files at once. I don't know which approach is better.

Conceptually, if it's one file, I'd put data in this order:

  • First header (splat count, data formats),
  • Chunk data,
  • Positions,
  • Colors,
  • Rotations,
  • Scales,
  • SH indices,
  • SH palette.

This way you can display "something" while it's loading, kinda similar to Luma's "magic reveal" but not quite:

  • Even when you only have chunks (which is very small amount of data), I think you could display "something": you know for each group of 256 splats, their center (middle of min/max) position, center color and center scale. Could display some "blob" in there.
  • Once positions start coming in, you can start displaying them as points (similar to Luma). The actual chunks could be ordered so they start out in the middle of the scene, since that's where most of interest is in typical splat files.
  • Later on their colors start coming in, so start displaying those too.
  • And then once rotations & scales are there, do those.
  • Finally SHs once they arrive.
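
The single-file ordering proposed above could be turned into byte offsets roughly like this. It is a sketch under several assumptions: a hypothetical 20-byte header, the default per-splat sizes listed earlier (4/4/4/2/2 bytes), 52-byte chunks, and a 4k SH palette stored as 4x4 blocks of half3 pixels.

```python
def section_offsets(splat_count, header_bytes=20, palette_entries=4096):
    """Return ({section: (offset, size)}, total_size) for the one-file layout."""
    chunks = (splat_count + 255) // 256
    sections = [
        ("header", header_bytes),                # hypothetical fixed-size header
        ("chunks", chunks * 52),                 # min/max per 256 splats
        ("positions", splat_count * 4),          # R10G10B10A2
        ("colors", splat_count * 4),             # R8G8B8A8
        ("rotations", splat_count * 4),          # R10G10B10A2
        ("scales", splat_count * 2),             # R5G6B5
        ("sh_indices", splat_count * 2),         # 16-bit index
        ("sh_palette", palette_entries * 16 * 6) # 4x4 pixels of half3
    ]
    offsets, pos = {}, 0
    for name, size in sections:
        offsets[name] = (pos, size)
        pos += size
    return offsets, pos


def sections_available(offsets, bytes_received):
    """Names of sections that are fully present after a partial download."""
    return [name for name, (off, size) in offsets.items()
            if off + size <= bytes_received]


if __name__ == "__main__":
    offsets, total = section_offsets(1_000_000)
    print(total, sections_available(offsets, 5_000_000))
```

A viewer that knows these offsets can decide after each received block which of the progressive display stages it is able to draw.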

"Technology" needed to build all of the above (all/most of that exists in Unity project, but it's all written in C#):

  • Reorder input splats in spatially aware order (Morton), to cut into chunks.
  • Cluster SH data using k-means or similar clustering mechanism.
  • Compress data into GPU compression formats (typically you'd only do this for color/opacity, but can do on others if need to go really low). For web use case, I'd start by looking into Basis Universal UASTC format, which can work on both desktop and mobile, is 1 byte/pixel (same as BC7 or ASTC4x4), and supposedly has fast web-friendly transcoders. I have not personally used this myself though.
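
For the first item, a minimal Python sketch of sorting splats in 3D Morton order and cutting the result into 256-splat chunks. It quantizes each axis to 10 bits over the bounding box, which is my choice for illustration; the Unity project may use a different precision or ordering.

```python
def part1by2(n: int) -> int:
    """Spread the low 10 bits of n so that two zero bits separate each bit."""
    n &= 0x3FF
    n = (n | (n << 16)) & 0x030000FF
    n = (n | (n << 8)) & 0x0300F00F
    n = (n | (n << 4)) & 0x030C30C3
    n = (n | (n << 2)) & 0x09249249
    return n


def morton3(x: int, y: int, z: int) -> int:
    """Interleave three 10-bit integers into one 30-bit Morton code."""
    return part1by2(x) | (part1by2(y) << 1) | (part1by2(z) << 2)


def morton_chunks(positions, chunk_size=256):
    """Return lists of splat indices, spatially sorted and cut into chunks."""
    lo = [min(p[a] for p in positions) for a in range(3)]
    hi = [max(p[a] for p in positions) for a in range(3)]

    def cell(v, a):
        span = hi[a] - lo[a]
        return int((v - lo[a]) / span * 1023) if span > 0 else 0

    order = sorted(range(len(positions)),
                   key=lambda i: morton3(*(cell(positions[i][a], a)
                                           for a in range(3))))
    return [order[i:i + chunk_size] for i in range(0, len(order), chunk_size)]


if __name__ == "__main__":
    import random
    rng = random.Random(1)
    pts = [(rng.random(), rng.random(), rng.random()) for _ in range(600)]
    print([len(c) for c in morton_chunks(pts)])
```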

All of the above maybe could be done as some ad-hoc format, or maybe as some way of using glTF2. @hybridherbst might know more there.

@Snosixtyboo

Hi,

we would be very much on board with finding a good universal representation for the ref implementation too!

Speaking only for myself, my current thought is that it makes sense to leave this mainly to the community, but there's one concrete concern: In order to preserve the usability of the representation in scientific contexts as well, it would require at least an OPTION where everything, including SHs, is stored losslessly.

We would love to be kept in the loop! If a reasonable consensus is found, we would do our best to quickly support it in the ref!

Best,
Bernhard

@aras-p

aras-p commented Nov 8, 2023

In order to preserve the usability of the representation in scientific contexts as well, it would require at least an OPTION where everything, including SHs, is stored losslessly.

Yeah that makes a lot of sense. This is what I do in the unity project too, e.g. some of the (WIP) splat editing tools only are enabled and only work if literally everything is full floats. Then I actually don't do any "chunking" at all (since it's both cumbersome when editing splats, plus might lead to a precision loss).

@chris-aeviator

needs:

  • dynamic fields (e.g. all 64 SH channels vs. just some)
  • to consider additional data (e.g. movement fields from 4DGS) now/ in the future
  • compression/ quantization
  • streaming!
  • global coordinate spaces (think stitching multiple splats)

@slimbuck

slimbuck commented Nov 8, 2023

In order to preserve the usability of the representation in scientific contexts as well, it would require at least an OPTION where everything, including SHs, is stored losslessly.

Isn't this just original PLY files?

@dylanebert
Author

Thanks @aras-p for the amazing wall of text! I'm on board with the layout you described.

My general sense is that .splat tends to lean toward compression, and .ply tends to lean toward losslessness/flexibility (similar to .jpg vs .png). I think that we can support both and cover most needs / use-cases well. What do you think?

@aras-p

aras-p commented Nov 8, 2023

Not sure if .png vs .jpg analogy holds up all that well. PLY is completely uncompressed, full float32 precision data (whereas .png is "compressed losslessly"). FWIW I tried doing some lossless compression of 3DGS data, but it does not compress well, mostly because things like rotations and scales are very random.

Anyway, probably the first question is what scope we're targeting. Everything I wrote above is more towards "this represents a single gaussian splat cloud, nothing else". @chris-aeviator's comment above indicates a need for it to be more extensible and/or able to be augmented with some additional metadata.

I think (ab)using glTF2 might be a very viable way to look into that. glTF2 itself would provide ability to put "more than just the splat" into a file in case someone needs it (e.g. positions of the cameras, transform of the splat itself, etc.). A good question is though, how exactly to represent the data of the splats. If we go towards the "everything is actually put into textures" idea as in my previous comment, then maybe splat data could be put into glTF2 roughly like so:

  • The glTF2 file defines "something dummy" for a mesh (if that's needed at all), like one quad or something.
  • Then it defines / references all the data textures needed for the splats.
  • And then it defines a custom "splat material" with some "extra" properties that have nothing to do with standard PBR materials, but instead reference the needed data textures, as well as any other data as needed.

Advantage of glTF2 is that it's very much "native" for the web stack, i.e. almost any 3D engine on the web supports it, including things like UASTC texture data transcoding handling.

And it would be somewhat "extensible" for future (animated splats, etc.), because the "file format" is just glTF2.

But whether anything above makes sense at all, would have to be evaluated by someone who actually knows anything about glTF2. Maybe I'll ask some people around :)

@antimatter15

antimatter15 commented Nov 8, 2023

I haven't fully digested all the great points above (and I don't have any experience with texture compression), so some of this might be wrong! But so far here's how I've been thinking:

I would like the format to support streaming— where something can be shown as soon as possible. And in particular, I would like to support a kind of "early termination" where on devices that are either compute and/or bandwidth constrained, some subset of the splats are loaded. i.e. I would like it if mobile devices could just fetch the first n-MB of a file and then abort the transfer and still be able to deliver an acceptable interactive user experience.

I would like the format to be deliverable as a single file, rather than a number of files or a folder.

I would like the file format to support sharing additional information— for instance the contents of cameras.json. Ideally information about the authorship and what tool generated it. Perhaps information about real-world units when available.

I think it would be nice to support palettized spherical harmonics, but loading them after all the uniformly colored splats.

Additionally, I would like the format to be fairly simple to parse and to generate. I think .ply is a fine format for representing "raw" uncompressed data, and I think that having a single format for both compressed and uncompressed information might lead to user confusion.

Another thing I have played around with a little bit is to take the "far away" splats and condense them into a panoramic skybox. At the very least I think that the format should be able to represent whether the background is assumed to be black, white or transparent. But having an arbitrary skybox cubemap texture might also be useful for compression.

I think that the space is probably evolving too fast for this format to be the "last word" on splat shipping. I haven't really thought of what the right format would be for dynamic/animated splats, for scenes consisting of multiple splats (either somewhat naively composed, or arranged into some regular grid). Perhaps in the future there might be a way to do coarse-to-fine/LOD splats.

My thinking was a new .splat file (i like the filename) which is actually a .tar file containing a number of different internal files.

test.splat is a tar file containing (in order):

- metadata.json - authorship, generator tool, etc.
- cameras.json - camera views
- main.splatdata: Splat[]
- main.shdata: SHPaletteEntry[]

struct Splat {
	half3 position;
	unorm8x4 color; // color + opacity
	half6 cov3d;  // potentially split this out into separate scale/rotation
}

struct SHPaletteEntry {
	int splat_count;
	int[] splat_indices;
	half48 sh_coeffs;
}
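
A quick way to try the "tar as container" idea is Python's standard `tarfile` module. The sketch below writes members in the proposed order and reads them back in streaming mode, so each member can be handled as soon as its bytes have arrived. The member payloads are placeholders, not a real splat encoding.

```python
import io
import json
import tarfile


def write_splat_tar(members):
    """members: list of (name, bytes) in the order they should be streamed."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in members:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def iter_splat_tar(blob):
    """Yield (name, bytes) in file order, reading the archive sequentially."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r|") as tar:
        for info in tar:
            yield info.name, tar.extractfile(info).read()


if __name__ == "__main__":
    blob = write_splat_tar([
        ("metadata.json", json.dumps({"generator": "example"}).encode()),
        ("cameras.json", b"[]"),
        ("main.splatdata", bytes(22 * 4)),   # placeholder: 4 splats x 22 bytes
        ("main.shdata", b""),
    ])
    for name, data in iter_splat_tar(blob):
        print(name, len(data))
```

Because tar members are stored back to back with plain headers, a client that aborts the transfer early still has every member that was fully received before the cutoff.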

@chris-aeviator

chris-aeviator commented Nov 8, 2023 via email

@zalo

zalo commented Nov 9, 2023

I'd be thrilled if GLTF2 could be made to store splats; on the surface, it seems to support both the streaming and extensibility features one would want for an evolving, bleeding-edge rendering primitive.

This article provides context for how BabylonJS handles the API for streaming GLTF2 loading via Microsoft's LOD Extension: https://doc.babylonjs.com/features/featuresDeepDive/importers/glTF/progressiveglTFLoad

Gaussian Splats have the added benefit that LODs are additive; presumably the low LODs will consist of the largest % of splats, with smaller, more transparent splats loading as part of the high LODs.

@mmalex

mmalex commented Nov 9, 2023

I love aras's chunked approach, yes yes and more yes. for LOD, I wonder if you could do something within the 256 splat chunks that is like, a mini treelet. so you have a single mega splat representing the whole chunk, then do a radix-4 or radix-16 tree from there, storing deltas to your parent params? I guess you'd have to see if it actually helped but im kinda thinking along the lines of - more likely to make the splat values close to 0, so that a byte oriented compression of the output (brotli or lz4 or something) would get to squish it harder, without overly complicating the format. you could even 'delta' and 'undelta' at serialisation time, in-place, so that the in-memory / in-texture format is exactly as aras describes (plus the tree structure for LOD, I guess) but the on disk version has had the parent values subtracted out to make things more compressible.
ANYWAY. thinking out loud. I do love the chunked style.
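
A heavily simplified sketch of the "delta against the parent, then byte-compress" idea, to make it easier to experiment with. It uses a flat one-level tree (each chunk has a single parent value, here the median) instead of a radix-4 or radix-16 treelet, works on one 8-bit quantized channel, and uses zlib from the Python standard library as a stand-in for brotli or lz4. Whether this actually helps would have to be measured on real data.

```python
import zlib


def delta_encode(values, parent):
    """Subtract the parent from each 8-bit value, wrapping modulo 256."""
    return bytes((v - parent) & 0xFF for v in values)


def delta_decode(data, parent):
    """Undo delta_encode: add the parent back, wrapping modulo 256."""
    return [(d + parent) & 0xFF for d in data]


def compress_chunk(values):
    """values: 8-bit quantized ints for one channel of one chunk."""
    parent = sorted(values)[len(values) // 2]   # median stands in for the "mega splat"
    return parent, zlib.compress(delta_encode(values, parent))


def decompress_chunk(parent, payload):
    return delta_decode(zlib.decompress(payload), parent)


if __name__ == "__main__":
    chunk = [200 + (i % 5) - 2 for i in range(256)]
    parent, payload = compress_chunk(chunk)
    print(parent, len(payload), decompress_chunk(parent, payload) == chunk)
```

The in-memory values stay exactly as they were; only the serialized bytes have the parent subtracted out.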

@aras-p

aras-p commented Nov 9, 2023

I do love the chunked style

@mmalex you know where I got the idea for the chunked style, right? From your own Dreams presentation :)

@mkkellogg
Owner

I second what several others have already mentioned about supporting "future" data. This space is indeed evolving fast and I think whatever strategy we ultimately land on should be adaptable/flexible enough to evolve with it (I know that's easy to say and possibly not so easy to do :) ).

I also believe supporting some sort of LOD mechanism is very important and will ultimately be required if we want these viewers to be capable of rendering large scenes. Gaussian splat LODs are out of my wheelhouse so I am unsure whether or not the lower-fidelity data should be produced at the same time as the original .ply and ultimately included in the .splat file, if it is something we can generate during the .ply to .splat conversion, or if it is something that could be generated on-the-fly when a scene is loaded, thereby eliminating the need for the .splat file to contain any LOD data.

@zalo and @mmalex -- I would definitely like to learn more about the LOD strategies you are proposing. @zalo -- If Gaussian splat LODs are additive, would supporting LODs simply be a matter of properly ordering the base data (highest fidelity) within the .splat file? It seems like we'd just get LODs for free, and it would be great for streaming/progressive loading. @mmalex -- For the approach you described, it sounds like we'd also only need to store the highest precision data in the .splat file, and LODs would be computed at runtime. However it seems like a lower-LOD-first streaming strategy wouldn't be feasible, since lower-LOD splats would need to be constructed from higher-LOD splats. Sorry if my questions are confusing and please correct me if I'm wrong in any of my thinking here :)

As far as compression goes, I'm a big fan of the chunked approach as well -- thank you @aras-p for sharing your very detailed and insightful thoughts on this matter.

@donmccurdy

donmccurdy commented Nov 12, 2023

glTF 2.0 has a concept of "extensions", and that's usually the path by which new features are added and adopted. Here, I'd imagine defining EXT_splat, extending a glTF node:

"scenes": [
  { "nodes": [ 0 ] }
],
"nodes": [
  {
    "name": "MySplat",
    "extensions": {
      "EXT_splat": {
        "count": 1024,
        "chunkData": 25, // accessor index to f32[] data?
        "positionTexture": 0, // texture index
        "rotationTexture": 1,
        "scaleTexture": 2,
        "colorTexture": 3,
        "shPaletteTexture": 4,
      }
    }
  }
],

The texture indices resolve to a texture associated with the file, which could be PNG or UASTC or something else. Future extensions could add new texture formats to glTF 2.0, and that wouldn't affect the EXT_splat definition above.

The chunkData index points to an accessor (N scalar or vector elements stored in a binary buffer), which glTF clients will know how to resolve.

If anyone would like to help with creating input data (.png or .exr uncompressed textures?) and defining the metadata, I'd be happy to help with converting textures to KTX2/UASTC and constructing glTF files using the hypothetical extension above. Also see https://github.com/donmccurdy/KTX2-Samples/blob/main/encode.sh for examples of KTX2 encoding steps (requires the latest KTX Software CLI alpha release).

Now, another question is whether all this data should be in one file, or multiple files. I don't know much about web practices, like is it better to load one file, or multiple files in parallel?

Neither is strictly better. Web clients can use range requests to grab chunks of a file as if they were multiple files. But not all applications or servers implement range requests, and there is a bit of overhead on each request, so choices vary.
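
As a small illustration of the range-request approach, here is how a client could ask for just one byte range of a single file, using Python's standard library. The URL is a placeholder, and the example only builds the request; sending it is left to the caller.

```python
import urllib.request


def range_request(url: str, start: int, length: int) -> urllib.request.Request:
    """Build a GET request for `length` bytes starting at byte `start`."""
    end = start + length - 1   # the end of an HTTP byte range is inclusive
    return urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})


if __name__ == "__main__":
    # Hypothetical URL; replace with a real file served with range support.
    req = range_request("https://example.com/scene.splat", 0, 1024)
    print(req.get_header("Range"))
    # To actually fetch: urllib.request.urlopen(req).read()
```

A server that supports ranges answers with status 206 and only the requested bytes; one that does not will typically answer 200 with the whole file, so clients need to handle both.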

glTF has some flexibility here — .glb uses embedded resources, .gltf uses external resources, and conversion between the two (including any glTF extensions) is trivial.

@fasteinke

We're obviously heading at breakneck speed towards the Splataverse ... someone had better reserve the domain names, etc, for this creature ...

@oreasono

We're obviously heading at breakneck speed towards the Splataverse ... someone had better reserve the domain names, etc, for this creature ...

I took the domain splats.ai last month LOL.

I come from a 3D graphics background and have played with glTF a lot; I will see what I can do to help with a 'glTF2 extension that loads splats'. Even if the community agrees on creating a new file format, the glTF extension approach would still be valuable because glTF easily reaches the broader 3D audience. Do you think this glTF extension idea should be discussed separately?

@slimbuck

We're busy implementing a slightly compressed GS PLY format as an interim solution while an all-bells-and-whistles format is being thrashed out.

It takes ideas directly from dreams(@mmalex) and @aras-p and packages the data into a standard PLY file (though using non-standard PLY properties).

The PLY file contains two elements:

  • chunks storing float32 position and scale bounding box of 256 splats
  • vertices storing 4 uint32s per splat for position, rotation, scale and color

This gives roughly 4x saving over uncompressed PLY data without much quality degradation (visual tests still to be done).
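
To illustrate how a position (or scale) can fit into a single uint32 relative to its chunk's bounding box, here is a Python sketch. The 11/10/11 bit split and the placement of x in the top bits are assumptions made for the example; the comment above only states that each property takes one uint32.

```python
BITS = (11, 10, 11)  # assumed split for x, y, z


def pack_11_10_11(v, lo, hi):
    """Quantize v within [lo, hi] per axis and pack into one 32-bit word."""
    word = 0
    for a in range(3):
        span = hi[a] - lo[a]
        t = (v[a] - lo[a]) / span if span > 0 else 0.0
        levels = (1 << BITS[a]) - 1
        code = round(min(1.0, max(0.0, t)) * levels)
        word = (word << BITS[a]) | code   # x ends up in the top 11 bits
    return word


def unpack_11_10_11(word, lo, hi):
    """Inverse of pack_11_10_11, up to quantization error."""
    q = (word >> 21, (word >> 11) & 0x3FF, word & 0x7FF)
    return tuple(lo[a] + (hi[a] - lo[a]) * q[a] / ((1 << BITS[a]) - 1)
                 for a in range(3))


if __name__ == "__main__":
    lo, hi = (0.0, 0.0, 0.0), (1.0, 2.0, 4.0)
    w = pack_11_10_11((0.3, 1.1, 2.5), lo, hi)
    print(hex(w), unpack_11_10_11(w, lo, hi))
```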

We have a PR implementing decompression here.

It's a simple format with narrow scope, but please do let us know if we've missed anything obvious!

@aras-p

aras-p commented Nov 18, 2023

It's a simple format with narrow scope, but please do let us know if we've missed anything obvious!

@slimbuck nice! I very much like the simplicity. I would think that within the same size, you could improve quality slightly (just a guess, I haven't actually tested it), at the expense of a small amount of complexity:

  • Store the quaternion in "smallest 3" format: 10 bits for each of the three smallest components (each within the -0.7..+0.7 range) and two bits for the index of the largest component. Still 4 bytes, but quite a bit more accurate than just storing xyz in the -1..+1 range.
  • For opacity, do a transform similar to what I mentioned above, since most opacities are either almost fully opaque or almost fully transparent.
  • For scale, raise it to a power, since most scales are towards the small end.
  • Super minor thing, but I would think that storing "min, max" for chunk bounds might be better than storing "min, range". With min/max and essentially a "lerp" to reconstruct the value, you are guaranteed that both min and max can be exactly represented in floating point, whereas with "min, range" the max value might not be. Probably does not matter all that much overall though.
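For reference, a minimal sketch of the "smallest 3" quaternion packing described in the first bullet. The bit order (index in the top two bits, then the three remaining components in order) and the function names are assumptions for illustration, not a layout taken from any particular implementation.

```python
import math

SQRT1_2 = 1.0 / math.sqrt(2.0)  # the three smallest components lie in [-0.707, +0.707]


def pack_quat_smallest3(q):
    """Pack a quaternion (4 floats) into 32 bits: 2-bit index + 3 x 10-bit components."""
    norm = math.sqrt(sum(c * c for c in q))
    q = [c / norm for c in q]
    largest = max(range(4), key=lambda i: abs(q[i]))
    # q and -q describe the same rotation, so flip the sign to make the
    # dropped (largest) component positive; it can then be rebuilt with sqrt.
    if q[largest] < 0.0:
        q = [-c for c in q]
    packed = largest
    for i in range(4):
        if i == largest:
            continue
        t = q[i] / SQRT1_2 * 0.5 + 0.5  # map [-0.707, 0.707] to [0, 1]
        t = min(max(t, 0.0), 1.0)
        packed = (packed << 10) | int(round(t * 1023))
    return packed


def unpack_quat_smallest3(packed):
    """Rebuild an approximate unit quaternion from the packed 32-bit value."""
    largest = (packed >> 30) & 0x3
    raw = [(packed >> 20) & 0x3FF, (packed >> 10) & 0x3FF, packed & 0x3FF]
    small = [(v / 1023.0 - 0.5) * 2.0 * SQRT1_2 for v in raw]
    # The dropped component follows from the unit-length constraint.
    dropped = math.sqrt(max(0.0, 1.0 - sum(c * c for c in small)))
    it = iter(small)
    return [dropped if i == largest else next(it) for i in range(4)]
```

The gain over storing xyz in -1..+1 comes from spending the same 10 bits on a range of about 1.41 instead of 2.0, while the two index bits replace the sign ambiguity of the reconstructed component.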

@slimbuck

Thanks @aras-p! I wasn't sure the extra complexity was worth it in such a simple format, but perhaps it is. I'll make these changes and compare the results to see if they make any difference. Thanks again!

@slimbuck

I did some testing today with the garden scene, train scene and guitar scene and found the 2/10/10/10 quaternion format lowered reconstruction error deviation by a good 12-33%. I've updated the format to adopt this. I also changed chunks to store min/max instead of min/size, which just makes a lot of sense.
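The difference between min/max and min/size can be seen in a tiny sketch. The values below are deliberately contrived, and Python floats are float64 rather than the float32 stored in the file, but the same rounding effect applies at either precision; the function names are made up for illustration.

```python
def reconstruct_min_max(lo, hi, t):
    """Lerp between stored min and max; t is the normalized quantized value in [0, 1]."""
    return lo * (1.0 - t) + hi * t


def reconstruct_min_size(lo, size, t):
    """Reconstruct from stored min and size (size = max - min, rounded when stored)."""
    return lo + size * t


lo, hi = -1.0, 1e-17
size = hi - lo  # the subtraction rounds, so lo + size is no longer exactly hi

# With min/max both endpoints come back exactly.
exact_at_min = reconstruct_min_max(lo, hi, 0.0) == lo
exact_at_max = reconstruct_min_max(lo, hi, 1.0) == hi

# With min/size the upper endpoint can drift.
drifted_max = reconstruct_min_size(lo, size, 1.0)
```

With the lerp form, t = 0 and t = 1 reproduce the stored bounds bit for bit, so a splat sitting exactly on the edge of a chunk round-trips cleanly.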

I tried the squared opacity mapping change too, but actually found that it resulted in worse error variance with my test scenes. Not too sure why this might be. So we're just storing plain old opacity for now.

We added this format to our editor tool's import & export if anyone is interested in giving it a try.

Next up we will likely investigate some sort of splat LOD and support for skinning. It would be great to hear if anyone has started investigating either of these!

@drcmda

drcmda commented Nov 30, 2023

if the splat format is being renewed, would it be possible to include bounds in the header somewhere? especially for streaming it would be very useful to be able to dynamically center/position a splat before it has fully loaded. i am using the antimatter format currently with streaming, so the model appears immediately, but it has no interop with center and fitting components.
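to illustrate the idea, here is a minimal sketch that computes the bounds from an antimatter15-style buffer and packs them into a small header. it assumes the 32-byte-per-splat layout with the position stored first as three little-endian float32 values; the 24-byte header itself is purely hypothetical and not part of any existing format.

```python
import struct

SPLAT_STRIDE = 32  # bytes per splat in the antimatter15-style layout (assumed)


def compute_bounds(data):
    """Return ((min_x, min_y, min_z), (max_x, max_y, max_z)) over all splat positions."""
    count = len(data) // SPLAT_STRIDE
    if count == 0:
        raise ValueError("buffer contains no complete splats")
    lo = [float("inf")] * 3
    hi = [float("-inf")] * 3
    for i in range(count):
        # Position is assumed to be the first 12 bytes of each record.
        pos = struct.unpack_from("<3f", data, i * SPLAT_STRIDE)
        for axis in range(3):
            lo[axis] = min(lo[axis], pos[axis])
            hi[axis] = max(hi[axis], pos[axis])
    return tuple(lo), tuple(hi)


def pack_bounds_header(lo, hi):
    """Hypothetical header: min xyz then max xyz as six little-endian float32."""
    return struct.pack("<6f", *lo, *hi)


def unpack_bounds_header(header):
    """Read the hypothetical header back into (min, max) tuples."""
    values = struct.unpack("<6f", header[:24])
    return values[:3], values[3:]
```

a streaming viewer could read those first 24 bytes and center or fit the model before any splat data has arrived.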

@koktavy

koktavy commented Jan 19, 2024

Would the LightGaussian paper be of help here?
https://github.com/VITA-Group/LightGaussian

@nepluno

nepluno commented Feb 8, 2024

Do you think this glTF extension idea should be discussed separately?

Using extensions is great. It is one of the gems of the GLTF format.

For the GLTF Gsplat format, here's our (Adobe's preliminary) practice:
Example for a GLTF Gsplat node (actual data omitted for brevity):

{
  "nodes": [
    {
      "extensions": {
        "ADOBE_gsplat_asset": {
          "splat_count": 154945,
          "x": "data:application/octet-stream;base64,...",
          "x_type": "float32",
          "y": "data:application/octet-stream;base64,...",
          "y_type": "float32",
          "z": "data:application/octet-stream;base64,...",
          "z_type": "float32",
          "nx": "data:application/octet-stream;base64,...",
          "nx_type": "float32",
          "ny": "data:application/octet-stream;base64,...",
          "ny_type": "float32",
          "nz": "data:application/octet-stream;base64,...",
          "nz_type": "float32",
          "f_dc_0": "data:application/octet-stream;base64,...",
          "f_dc_0_type": "float32",
          "f_dc_1": "data:application/octet-stream;base64,...",
          "f_dc_1_type": "float32",
          "f_dc_2": "data:application/octet-stream;base64,...",
          "f_dc_2_type": "float32",
          "f_rest_0": "data:application/octet-stream;base64,...",
          "f_rest_0_type": "float32",
          "f_rest_1": "data:application/octet-stream;base64,...",
          "f_rest_1_type": "float32",
          "f_rest_2": "data:application/octet-stream;base64,...",
          "f_rest_2_type": "float32",
          "f_rest_3": "data:application/octet-stream;base64,...",
          "f_rest_3_type": "float32",
          "f_rest_4": "data:application/octet-stream;base64,...",
          "f_rest_4_type": "float32",
          "f_rest_5": "data:application/octet-stream;base64,...",
          "f_rest_5_type": "float32",
          "f_rest_6": "data:application/octet-stream;base64,...",
          "f_rest_6_type": "float32",
          "f_rest_7": "data:application/octet-stream;base64,...",
          "f_rest_7_type": "float32",
          "f_rest_8": "data:application/octet-stream;base64,...",
          "f_rest_8_type": "float32",
          "f_rest_9": "data:application/octet-stream;base64,...",
          "f_rest_9_type": "float32",
          "f_rest_10": "data:application/octet-stream;base64,...",
          "f_rest_10_type": "float32",
          "f_rest_11": "data:application/octet-stream;base64,...",
          "f_rest_11_type": "float32",
          "f_rest_12": "data:application/octet-stream;base64,...",
          "f_rest_12_type": "float32",
          "f_rest_13": "data:application/octet-stream;base64,...",
          "f_rest_13_type": "float32",
          "f_rest_14": "data:application/octet-stream;base64,...",
          "f_rest_14_type": "float32",
          "f_rest_15": "data:application/octet-stream;base64,...",
          "f_rest_15_type": "float32",
          "f_rest_16": "data:application/octet-stream;base64,...",
          "f_rest_16_type": "float32",
          "f_rest_17": "data:application/octet-stream;base64,...",
          "f_rest_17_type": "float32",
          "f_rest_18": "data:application/octet-stream;base64,...",
          "f_rest_18_type": "float32",
          "f_rest_19": "data:application/octet-stream;base64,...",
          "f_rest_19_type": "float32",
          "f_rest_20": "data:application/octet-stream;base64,...",
          "f_rest_20_type": "float32",
          "f_rest_21": "data:application/octet-stream;base64,...",
          "f_rest_21_type": "float32",
          "f_rest_22": "data:application/octet-stream;base64,...",
          "f_rest_22_type": "float32",
          "f_rest_23": "data:application/octet-stream;base64,...",
          "f_rest_23_type": "float32",
          "f_rest_24": "data:application/octet-stream;base64,...",
          "f_rest_24_type": "float32",
          "f_rest_25": "data:application/octet-stream;base64,...",
          "f_rest_25_type": "float32",
          "f_rest_26": "data:application/octet-stream;base64,...",
          "f_rest_26_type": "float32",
          "f_rest_27": "data:application/octet-stream;base64,...",
          "f_rest_27_type": "float32",
          "f_rest_28": "data:application/octet-stream;base64,...",
          "f_rest_28_type": "float32",
          "f_rest_29": "data:application/octet-stream;base64,...",
          "f_rest_29_type": "float32",
          "f_rest_30": "data:application/octet-stream;base64,...",
          "f_rest_30_type": "float32",
          "f_rest_31": "data:application/octet-stream;base64,...",
          "f_rest_31_type": "float32",
          "f_rest_32": "data:application/octet-stream;base64,...",
          "f_rest_32_type": "float32",
          "f_rest_33": "data:application/octet-stream;base64,...",
          "f_rest_33_type": "float32",
          "f_rest_34": "data:application/octet-stream;base64,...",
          "f_rest_34_type": "float32",
          "f_rest_35": "data:application/octet-stream;base64,...",
          "f_rest_35_type": "float32",
          "f_rest_36": "data:application/octet-stream;base64,...",
          "f_rest_36_type": "float32",
          "f_rest_37": "data:application/octet-stream;base64,...",
          "f_rest_37_type": "float32",
          "f_rest_38": "data:application/octet-stream;base64,...",
          "f_rest_38_type": "float32",
          "f_rest_39": "data:application/octet-stream;base64,...",
          "f_rest_39_type": "float32",
          "f_rest_40": "data:application/octet-stream;base64,...",
          "f_rest_40_type": "float32",
          "f_rest_41": "data:application/octet-stream;base64,...",
          "f_rest_41_type": "float32",
          "f_rest_42": "data:application/octet-stream;base64,...",
          "f_rest_42_type": "float32",
          "f_rest_43": "data:application/octet-stream;base64,...",
          "f_rest_43_type": "float32",
          "f_rest_44": "data:application/octet-stream;base64,...",
          "f_rest_44_type": "float32",
          "opacity": "data:application/octet-stream;base64,...",
          "opacity_type": "float32",
          "scale_0": "data:application/octet-stream;base64,...",
          "scale_0_type": "float32",
          "scale_1": "data:application/octet-stream;base64,...",
          "scale_1_type": "float32",
          "scale_2": "data:application/octet-stream;base64,...",
          "scale_2_type": "float32",
          "rot_0": "data:application/octet-stream;base64,...",
          "rot_0_type": "float32",
          "rot_1": "data:application/octet-stream;base64,...",
          "rot_1_type": "float32",
          "rot_2": "data:application/octet-stream;base64,...",
          "rot_2_type": "float32",
          "rot_3": "data:application/octet-stream;base64,...",
          "rot_3_type": "float32"
        }
      },
      "name": "Sneaker"
    }
  ],
  "scenes": [
    {
      "nodes": [
        0
      ]
    }
  ]
}

Our GLTF format is a direct translation of the *.PLY format. A GLTF Gaussian splats file contains an extension named ADOBE_gsplat_asset, which contains all the attributes found in the PLY format. Each attribute is stored as a GZIP-compressed binary array encoded in Base64.

  • Why encode in Base64?
    • It's web-friendly, and it's a balance between editing simplicity and file size. We want an ASCII representation of the general structure so that we can change the hierarchy and transformations, or compose different assets together. But we also want it to be sufficiently small so that people can share it online. Most of the weights are SH coefficients, which are difficult to manipulate by hand anyway.

In addition, we also have a *_type key (e.g., x_type and f_dc_0_type, etc.) to record the data type of the corresponding attribute after decompression. Currently, the data type can be float32 or float16.

Finally, we have a splat_count field to record the number of splats in the current asset.
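As a rough sketch of how a reader might decode one attribute under this description: strip the data URI prefix, Base64-decode, GZIP-decompress, then interpret the bytes using the matching *_type key. Little-endian byte order and the function names are assumptions here; the description above does not specify them.

```python
import base64
import gzip
import struct

DATA_URI_PREFIX = "data:application/octet-stream;base64,"

# struct format code and element size for each type named above
TYPE_FORMATS = {"float32": ("f", 4), "float16": ("e", 2)}


def decode_attribute(uri, dtype, splat_count):
    """Decode one attribute (e.g. "x" with "x_type") into a list of floats."""
    if not uri.startswith(DATA_URI_PREFIX):
        raise ValueError("unexpected data URI prefix")
    compressed = base64.b64decode(uri[len(DATA_URI_PREFIX):])
    raw = gzip.decompress(compressed)
    code, size = TYPE_FORMATS[dtype]
    if len(raw) != splat_count * size:
        raise ValueError("payload size does not match splat_count")
    # Assumption: values are little-endian.
    return list(struct.unpack("<%d%s" % (splat_count, code), raw))


def encode_attribute(values, dtype):
    """Inverse of decode_attribute, useful for building test data."""
    code, _ = TYPE_FORMATS[dtype]
    raw = struct.pack("<%d%s" % (len(values), code), *values)
    payload = base64.b64encode(gzip.compress(raw)).decode("ascii")
    return DATA_URI_PREFIX + payload
```

A loader would call `decode_attribute(ext["x"], ext["x_type"], ext["splat_count"])` for each key in the extension object, where `ext` is the parsed ADOBE_gsplat_asset dictionary.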

@arpu

arpu commented Feb 8, 2024

maybe @zeux has some ideas from meshopt compression?

@aras-p

aras-p commented Feb 8, 2024

@arpu I have tried using meshopt on gaussian splat data, and the results are, well, "not stellar". Things like rotations and scales are very random, and don't losslessly compress well.

@zeux

zeux commented Feb 8, 2024

@aras-p Unsure if you tried filters (e.g. rotations would need to use the quaternion filter, scales a shared exponent filter), but they would be a requirement for sure. Realistically, custom filters would probably also be needed for some data types for best performance. Lossless is definitely a no-go in any event. Finally, for geometry data meshopt codecs rely on implicit spatial coherence; GS data would need to be clustered in some way for the codecs to work. (One potential issue is that, unlike vertex data, different components of GS might not be well correlated, so this may not work as well.)

@pwais

pwais commented Feb 10, 2024

Each of the attributes is stored in a GZIP-compressed binary array encoded in Base64

@nepluno It would be cool if there were a way to compose with draco compression as is already implemented and sort-of-standardized for point cloud data (it exists tho not sure how often used https://github.com/google/draco/blob/7d58126d076bc3f5f9d8c114d1700b7311faecfe/src/draco/io/point_cloud_io.h#L56 )

In theory, couldn't gsplat GLTF use no extensions, just use a bunch of standard buffers, then leave it up to the viewer gl code to interpret the data? That provides some fallback (e.g. store points as a standard point cloud) and gives some wiggle room for viewer improvements that are likely in the coming 6-24 months. That said, I know a main point of this discussion is to attempt to get some consensus versus hacky mitigations ...

@nepluno

nepluno commented Feb 12, 2024

It would be cool if there were a way to compose with draco compression

@pwais Yes I fully agree on this! There's already some discussion on supporting Draco for point cloud in GLTF. KhronosGroup/glTF#1809
I think Gsplat is a good reason to gather more consensus there and to push this initiative further.

In theory, couldn't gsplat GLTF use no extensions, just use a bunch of standard buffers, then leave it up to the viewer gl code to interpret the data?

That could also be doable...but the main goal of using GLTF is that we need some hierarchy to do some sort of composition (otherwise using just PLY is fine), and hence we need some tag to identify that a buffer is related to a Gsplat and another buffer is not.

@donmccurdy

donmccurdy commented Feb 12, 2024

If attributes were stored in accessors rather than buffers (see #47 (comment)), we would:

  1. avoid +33% filesize overhead of base64 and extra parsing cost
  2. enable Meshopt compression out of the box. no extension needed, except perhaps to add custom filters

By using the existing glTF accessor/bufferview/buffer constructs, you have the choice of storing data as base64 data URIs, or as binary. The three.js community uses glTF extensively, and I always recommend avoiding these data URIs because the loading cost is significant, unless the data involved is trivially small.
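The +33% figure follows directly from how Base64 works: every 3 input bytes become 4 output characters. A quick sketch to check it (the function name is just for illustration):

```python
import base64


def base64_overhead(n_bytes):
    """Fractional size increase when n_bytes of binary data are Base64-encoded."""
    encoded = base64.b64encode(bytes(n_bytes))
    return len(encoded) / n_bytes - 1.0


def base64_length(n_bytes):
    """Encoded length with padding: 4 characters per (possibly partial) 3-byte group."""
    return 4 * ((n_bytes + 2) // 3)
```

For any non-trivial payload the overhead converges on one third, and that is before the cost of decoding the string back into a binary buffer at load time.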

@pwais

pwais commented Feb 13, 2024

That could also be doable...but the main goal of using GLTF is that we need some hierarchy to do some sort of composition (otherwise using just PLY is fine)

The hierarchy and organization is def helpful, but pretty sure that the performance benefit of GLTF is that the buffers largely skip any JS intervention / SERDES, unlike PLY (at least canonical THREE PLY loader appears to be non-native code). That might make a marginally small benefit if you just load one single object / scene to viz, even if it's a few megabytes...

But if you want to viz a stream of splats / point cloud data, then GLTF beats the pants off anything that requires javascript SERDES. E.g. try loading 1,000 frames of ~1MB PLYs (or protobufs or your fav buffer format) versus 1,000 frames of GLTFs. And scrub over those frames. Browsers / phones from even 2-4 years ago, GLTFs are night-and-day better, especially above 1M points. (Like imagine if you're working in robotics, and you want to introduce a replacement for rosbag... you surely would try out GLTF wouldn't you? no?!)

By using the existing glTF accessor/bufferview/buffer constructs, you have the choice of storing data as base64 data URIs

@donmccurdy 💯 💯 base64 data uris are extremely handy hacks but +1 to optional but not required

@nepluno

nepluno commented Feb 20, 2024

If attributes were stored in accessors rather than buffers

Agreed. Storing the attributes into individual accessors would definitely be better, just as how mesh vertices are stored.

@nepluno

nepluno commented Feb 20, 2024

But if you want to viz a stream of splats / point cloud data, then GLTF beats the pants off anything that requires javascript SERDES.

Yes! That's another reason we'd make a schema to store Gsplat in GLTFs.

@arcman7

arcman7 commented Mar 5, 2024

A bit late to the party here, but is there a magic number or specific file header that we all agree on that I can reliably use to detect whether or not a given file is a .splat file?
