Use mesh autoencoder as is for chatml finetuning #2

fire · 2023-12-08T11:56:48Z

Hi! I want to thank you for your project.

Since you were able to get autoencoding working, I have a question. I was trying to encode the following using your code. This is a mock-up of the syntax, but it's supposed to be a 3D simplex (tetrahedron) translated by 1 meter.


If I am able to get the meshes in this form, we can take "off the shelf" training services like https://together.ai and start processing models from https://huggingface.co/datasets/allenai/objaverse-xl with translation, rotation (e.g. for a basketball), and scale (affine transforms) to make a 3D mesh foundation model.

Can you provide some insight into how to do this?

[
    {
        "role": "system",
        "content": "This assistant can understand 3D models using the meshgpt-pytorch Unicode plane 15 codebook for 16384 triangles and the .ply 3d format."
    },
    {
        "role": "user",
        "content": "Please convert this to the meshgpt-pytorch codebook. ```ply\nformat ascii 1.0\nelement vertex 4\nproperty float x\nproperty float y\nproperty float z\nelement face 4\nproperty list uchar int vertex_index\nend_header\n0 0 0\n1 0 0\n0 1 0\n0 0 1\n3 0 1 2\n3 0 2 3\n3 0 1 3\n3 1 2 3```"
    },
    {
        "role": "assistant",
        "content": "􀀀􀀁􀀂􀀃"
    },
    {
        "role": "user",
        "content": "Please translate this mesh by 1 meter"
    },
    {
        "role": "assistant",
        "content": "􀀄􀀅􀀆􀀇"
    },
    {
        "role": "user",
        "content": "Can you convert it back to .ply format?"
    },
    {
        "role": "assistant",
        "content": "Sure, here is your translated mesh in .ply format: ```ply\nformat ascii 1.0\nelement vertex 4\nproperty float x\nproperty float y\nproperty float z\nelement face 4\nproperty list uchar int vertex_index\nend_header\n1 0 0\n2 0 0\n1 1 0\n1 0 1\n3 0 1 2\n3 0 2 3\n3 0 1 3\n3 1 2 3```"
    }
]
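
The assistant turns in the mock-up above use private-use characters as stand-ins for codebook indices. A minimal Python sketch of that mapping follows. It assumes index 0 maps to U+F0000 (the first code point of plane 15) and that the codebook has 16384 entries, as in the system prompt; `PUA_BASE`, `indices_to_text`, and `text_to_indices` are hypothetical names, not part of meshgpt-pytorch.

```python
# Assumption: codebook index i is written as the code point U+F0000 + i
# (Supplementary Private Use Area-A, plane 15). Names here are hypothetical.
PUA_BASE = 0xF0000
CODEBOOK_SIZE = 16384


def indices_to_text(indices):
    """Encode a sequence of codebook indices as private-use characters."""
    chars = []
    for i in indices:
        if not 0 <= i < CODEBOOK_SIZE:
            raise ValueError(f"codebook index out of range: {i}")
        chars.append(chr(PUA_BASE + i))
    return "".join(chars)


def text_to_indices(text):
    """Decode private-use characters back into codebook indices."""
    indices = []
    for ch in text:
        i = ord(ch) - PUA_BASE
        if not 0 <= i < CODEBOOK_SIZE:
            raise ValueError(f"not a codebook character: U+{ord(ch):06X}")
        indices.append(i)
    return indices
```

Because 16384 indices fit well inside plane 15, every code stays a single character, which keeps the round trip between token ids and text trivial.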

We can also map the codebook to .ply in chatml to be more obvious to the large language model since .ply support is very common.

ply
format ascii 1.0
element vertex 4
property float x
property float y
property float z
element face 4
property list uchar int vertex_index
end_header
0 0 0
1 0 0
0 1 0
0 0 1
3 0 1 2
3 0 2 3
3 0 1 3
3 1 2 3
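
To generate pairs like the original and translated tetrahedron in the mock-up, the ASCII .ply text has to be parsed and transformed. The sketch below is a deliberately small parser that only handles the header fields used in this example (vertex count, face count, `x y z` floats, one index list per face); `parse_ascii_ply` and `translate` are hypothetical helper names.

```python
TETRA_PLY = """ply
format ascii 1.0
element vertex 4
property float x
property float y
property float z
element face 4
property list uchar int vertex_index
end_header
0 0 0
1 0 0
0 1 0
0 0 1
3 0 1 2
3 0 2 3
3 0 1 3
3 1 2 3
"""


def parse_ascii_ply(text):
    """Parse a minimal ASCII PLY string into (vertices, faces)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    n_vertices = n_faces = 0
    body_start = None
    for i, ln in enumerate(lines):
        parts = ln.split()
        if parts[:2] == ["element", "vertex"]:
            n_vertices = int(parts[2])
        elif parts[:2] == ["element", "face"]:
            n_faces = int(parts[2])
        elif parts[0] == "end_header":
            body_start = i + 1
            break
    if body_start is None:
        raise ValueError("missing end_header")
    vertex_lines = lines[body_start:body_start + n_vertices]
    face_lines = lines[body_start + n_vertices:body_start + n_vertices + n_faces]
    vertices = [tuple(float(v) for v in ln.split()[:3]) for ln in vertex_lines]
    faces = []
    for ln in face_lines:
        parts = ln.split()
        count = int(parts[0])  # first number is the vertex count of the face
        faces.append(tuple(int(v) for v in parts[1:1 + count]))
    return vertices, faces


def translate(vertices, offset):
    """Return vertices shifted by offset = (dx, dy, dz)."""
    dx, dy, dz = offset
    return [(x + dx, y + dy, z + dz) for x, y, z in vertices]
```

Calling `translate(vertices, (1, 0, 0))` on the parsed tetrahedron reproduces the vertex list shown in the last assistant turn of the mock-up.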

See chatml description in https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B and


lucidrains · 2023-12-08T14:49:27Z

@fire foundation model sounds like a great plan, for VR, or for multimodality in general! 🚀

lucidrains · 2023-12-08T14:51:31Z

@fire best way to participate is to help pull request the necessary dataset classes that can produce the vertices and faces. as described in the paper, they took special care for the face ordering

i still need to wrap up the training code and add text conditioning, ETA middle of this month

fire · 2023-12-08T15:55:02Z

Can you point me to where I can insert this code? It's in godot engine gdscript, but I can translate to python.

func compare_vertices(vertex_a, vertex_b):
	if vertex_a.z < vertex_b.z:
		return -1
	elif vertex_a.z > vertex_b.z:
		return 1
	elif vertex_a.y < vertex_b.y:
		return -1
	elif vertex_a.y > vertex_b.y:
		return 1
	elif vertex_a.x < vertex_b.x:
		return -1
	elif vertex_a.x > vertex_b.x:
		return 1

	return 0

func compare_faces(face_a, face_b):
	for i in range(3):
		var vertex_comparison = compare_vertices(face_a["vertices"][i], face_b["vertices"][i])
		if vertex_comparison != 0:
			return vertex_comparison

	if "id" in face_a and "id" in face_b:
		if face_a["id"] < face_b["id"]:
			return -1
		elif face_a["id"] > face_b["id"]:
			return 1

	return 0

func _init(mesh: ArrayMesh) -> void:
	var mesh_data_tool = MeshDataTool.new()
	mesh_data_tool.create_from_surface(mesh, 0)

	var triangles: Array = []

	for i in range(0, mesh_data_tool.get_vertex_count(), 3):
		var triangle = {
			"vertices": [],
			"normals": [],
			"tangents": [],
			"uvs": [],
			"face_area": 0,
			"angle": 0,
			"tangent": Vector3.ZERO,
			"id": i
		}

		for j in range(3):
			var index = i + j
			triangle["vertices"].append(mesh_data_tool.get_vertex(index))
			triangle["normals"].append(mesh_data_tool.get_vertex_normal(index))
			triangle["tangents"].append(mesh_data_tool.get_vertex_tangent(index))
			triangle["uvs"].append(mesh_data_tool.get_vertex_uv(index))

		triangle["face_area"] = calculate_face_area(triangle["vertices"])
		triangle["angle"] = calculate_angle(triangle["vertices"])
		triangle["tangent"] = calculate_tangent(triangle["vertices"])

		triangles.append(triangle)

	# sort_custom expects a Callable that returns true when a should come before b
	triangles.sort_custom(func(a, b): return compare_faces(a, b) < 0)
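
A rough Python translation of the same idea, working on indexed vertices and faces instead of Godot triangles. It assumes the ordering is: vertices sorted by (z, y, x), each face rotated so its lowest vertex index comes first, then faces sorted by their index tuples. That is my reading of the convention and may not match the paper exactly; `order_mesh` is a hypothetical name.

```python
def order_mesh(vertices, faces):
    """Sort vertices by (z, y, x), then canonicalize and sort faces.

    vertices: list of (x, y, z) tuples
    faces: list of tuples of vertex indices
    """
    # New vertex order: ascending z, ties broken by y, then x.
    order = sorted(
        range(len(vertices)),
        key=lambda i: (vertices[i][2], vertices[i][1], vertices[i][0]),
    )
    remap = {old: new for new, old in enumerate(order)}
    sorted_vertices = [vertices[i] for i in order]

    sorted_faces = []
    for face in faces:
        idx = [remap[i] for i in face]
        # Rotate (not reorder) so the lowest index is first; winding is kept.
        k = idx.index(min(idx))
        sorted_faces.append(tuple(idx[k:] + idx[:k]))
    sorted_faces.sort()
    return sorted_vertices, sorted_faces
```

Rotating instead of fully sorting the indices inside a face keeps the winding order, so face normals are not flipped.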

I am not sure if this is the correct interpretation. They mentioned an up-axis convention, and Godot Engine uses a different one than Unity Engine. The meshgpt code uses the convention from Unity Engine.

I don't know how well you know meshes, but in many game engines a scene contains meshes which contain submeshes. Each submesh can have a different material.

Edited:

TL;DR I prefer gltf conventions because it's fairly standard for axes. https://registry.khronos.org/glTF/specs/2.0/glTF-2.0.html
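
Since a z-y-x sort depends on which axis is "up", a mesh in glTF convention (+Y up, right-handed) may need its axes remapped first. The sketch below assumes the target is a right-handed +Z-up frame, which is an assumption about what the ordering expects, not something confirmed in this thread; `y_up_to_z_up` is a hypothetical helper.

```python
def y_up_to_z_up(vertex):
    """Rotate a point from a right-handed +Y-up frame to a right-handed +Z-up frame.

    This is a +90 degree rotation about the X axis: (x, y, z) -> (x, -z, y).
    It is a proper rotation, so handedness and face winding are preserved.
    """
    x, y, z = vertex
    return (x, -z, y)


def convert_mesh(vertices):
    """Apply the axis change to every vertex; faces need no change."""
    return [y_up_to_z_up(v) for v in vertices]
```

Converting between engines with different handedness would also need a mirror and a winding flip, which this sketch does not attempt.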

lucidrains · 2023-12-08T16:14:19Z

@fire my expertise ends at transformers and vector quantization. that is to say, i don't know and your guess is as good as mine!

fire · 2023-12-08T16:40:37Z

@lucidrains

My friend @benbot mentioned that training with two objective functions, one for text generation and one for mesh reconstruction, could be very difficult. Do you think it's feasible to start from an existing foundation model for text and train it on this triangle language?

How can we test this hypothesis?

lucidrains · 2023-12-08T16:49:14Z

@fire @benbot it isn't two objective functions, it is simply text conditioning, done like any other modality (images, voice, audio, video)

it will be optional

lucidrains · 2023-12-08T16:50:26Z

@fire you can test the hypothesis once i turn it into a hyperparameter, and you can simply train one with and without text conditioning

fire · 2023-12-16T07:12:03Z

I was able to encode the PUA unicode text and a .obj file side by side in chatml. The next step is to generate enough samples to cover, but not for now.
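
A minimal sketch of what one such sample could look like, assuming the ChatML delimiters `<|im_start|>` and `<|im_end|>` and a plain `v`/`f` Wavefront .obj layout. `to_obj`, `render_chatml`, and `make_sample` are hypothetical names, and the exact prompt wording is only illustrative.

```python
def to_obj(vertices, faces):
    """Serialize a mesh as Wavefront OBJ text (face indices are 1-based)."""
    lines = [f"v {x} {y} {z}" for x, y, z in vertices]
    lines += ["f " + " ".join(str(i + 1) for i in face) for face in faces]
    return "\n".join(lines)


def render_chatml(messages):
    """Render a list of {'role', 'content'} dicts as ChatML text."""
    return "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )


def make_sample(vertices, faces, codes):
    """Pair the OBJ text of a mesh with its codebook string in one conversation.

    codes: the private-use-character string produced by the autoencoder
    (how it is produced is outside this sketch).
    """
    messages = [
        {"role": "user", "content": "Please convert this to the codebook.\n" + to_obj(vertices, faces)},
        {"role": "assistant", "content": codes},
    ]
    return render_chatml(messages)
```

Keeping the serializer and the ChatML renderer separate makes it easy to swap .obj for .ply without touching the conversation template.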

fire changed the title ~~Use autoencoder as is.~~ Use autoencoder as is for chatml Dec 8, 2023

fire changed the title ~~Use autoencoder as is for chatml~~ Use mesh autoencoder as is for chatml Dec 8, 2023

fire changed the title ~~Use mesh autoencoder as is for chatml~~ Use mesh autoencoder as is for chatml finetuning Dec 8, 2023

fire closed this as completed Dec 16, 2023

MarcusLoppe mentioned this issue Dec 26, 2023

Residual quantization - VRAM bottleneck #36

Closed
