Use mesh autoencoder as is for chatml finetuning #2

Closed
fire opened this issue Dec 8, 2023 · 8 comments

@fire

fire commented Dec 8, 2023

Hi! I want to thank you for your project.

Since you were able to get autoencoding working, I have a question. I am trying to encode the conversation below using your code. It is a mock-up of the syntax, but it is supposed to show a 3-simplex (tetrahedron) translated by 1 meter.

If I am able to get the meshes into this form, we can take "off the shelf" training services like https://together.ai and start processing models from https://huggingface.co/datasets/allenai/objaverse-xl, augmented with affine transforms (translation, rotation for something like a basketball, and scale), to make a 3D mesh foundation model.

Can you provide some insight into how to do this?

[
    {
        "role": "system",
        "content": "This assistant can understand 3D models using the meshgpt-pytorch Unicode plane 15 codebook for 16384 triangles and the .ply 3d format."
    },
    {
        "role": "user",
        "content": "Please convert this to the meshgpt-pytorch codebook. ```ply\nformat ascii 1.0\nelement vertex 4\nproperty float x\nproperty float y\nproperty float z\nelement face 4\nproperty list uchar int vertex_index\nend_header\n0 0 0\n1 0 0\n0 1 0\n0 0 1\n3 0 1 2\n3 0 2 3\n3 0 1 3\n3 1 2 3```"
    },
    {
        "role": "assistant",
        "content": "􀀀􀀁􀀂􀀃"
    },
    {
        "role": "user",
        "content": "Please translate this mesh by 1 meter"
    },
    {
        "role": "assistant",
        "content": "􀀄􀀅􀀆􀀇"
    },
    {
        "role": "user",
        "content": "Can you convert it back to .ply format?"
    },
    {
        "role": "assistant",
        "content": "Sure, here is your translated mesh in .ply format: ```ply\nformat ascii 1.0\nelement vertex 4\nproperty float x\nproperty float y\nproperty float z\nelement face 4\nproperty list uchar int vertex_index\nend_header\n1 0 0\n2 0 0\n1 1 0\n1 0 1\n3 0 1 2\n3 0 2 3\n3 0 1 3\n3 1 2 3```"
    }
]
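
The mapping between codebook indices and the private-use characters in the mock-up above could be sketched as follows. This is only an illustration: it assumes that code `i` maps to the `i`-th code point of Unicode plane 15 (starting at U+F0000) and that the codebook has 16384 entries, as the system prompt says. The names `codes_to_text` and `text_to_codes` are hypothetical and are not part of meshgpt-pytorch.

```python
# Assumption: code i maps to the i-th code point of Unicode plane 15
# (Supplementary Private Use Area-A), which starts at U+F0000.
PUA_BASE = 0xF0000
CODEBOOK_SIZE = 16384  # size quoted in the system prompt of the mock-up


def codes_to_text(codes):
    """Map codebook indices to private-use characters (hypothetical helper)."""
    for code in codes:
        if not 0 <= code < CODEBOOK_SIZE:
            raise ValueError(f"code {code} is outside the codebook")
    return "".join(chr(PUA_BASE + code) for code in codes)


def text_to_codes(text):
    """Inverse of codes_to_text; rejects characters outside the mapped range."""
    codes = [ord(ch) - PUA_BASE for ch in text]
    for code in codes:
        if not 0 <= code < CODEBOOK_SIZE:
            raise ValueError("text contains a character outside the codebook range")
    return codes
```

Because the mapping is a fixed offset, decoding a string back to indices is lossless as long as the tokenizer of the text model leaves the private-use characters intact.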

We can also map the codebook to .ply inside the ChatML conversation, which should be more obvious to the large language model, since .ply support is very common.

ply
format ascii 1.0
element vertex 4
property float x
property float y
property float z
element face 4
property list uchar int vertex_index
end_header
0 0 0
1 0 0
0 1 0
0 0 1
3 0 1 2
3 0 2 3
3 0 1 3
3 1 2 3
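
Generating such samples could look roughly like the sketch below, which serializes a mesh to ASCII PLY, applies a translation, and builds a user/assistant message pair. The function names (`to_ascii_ply`, `translate`, `build_messages`) and the prompt wording are assumptions made for illustration; the sketch also emits the `ply` magic line that standalone PLY files start with.

```python
import json


def to_ascii_ply(vertices, faces):
    """Serialize vertices and polygon faces as an ASCII PLY string."""
    lines = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(vertices)}",
        "property float x",
        "property float y",
        "property float z",
        f"element face {len(faces)}",
        "property list uchar int vertex_index",
        "end_header",
    ]
    lines += [" ".join(f"{c:g}" for c in v) for v in vertices]
    # Each face line starts with its vertex count, then the indices.
    lines += [" ".join(str(n) for n in (len(f), *f)) for f in faces]
    return "\n".join(lines)


def translate(vertices, offset):
    """Return the vertices shifted by offset = (dx, dy, dz)."""
    ox, oy, oz = offset
    return [(x + ox, y + oy, z + oz) for x, y, z in vertices]


def build_messages(vertices, faces, offset):
    """Build a user/assistant pair for one translation sample (hypothetical layout)."""
    moved = translate(vertices, offset)
    return [
        {"role": "user",
         "content": f"Please translate this mesh by {offset}.\n" + to_ascii_ply(vertices, faces)},
        {"role": "assistant", "content": to_ascii_ply(moved, faces)},
    ]


# The tetrahedron from the sample above, moved 1 unit along x.
TETRA_VERTICES = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
TETRA_FACES = [(0, 1, 2), (0, 2, 3), (0, 1, 3), (1, 2, 3)]

if __name__ == "__main__":
    print(json.dumps(build_messages(TETRA_VERTICES, TETRA_FACES, (1, 0, 0)), indent=4))
```

Only the vertex lines change under a translation; the face indices stay the same, which matches the translated mesh in the mock-up conversation.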

See the ChatML description at https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B and the .ply sample.

fire changed the title from "Use autoencoder as is." to "Use autoencoder as is for chatml" on Dec 8, 2023
fire changed the title from "Use autoencoder as is for chatml" to "Use mesh autoencoder as is for chatml" on Dec 8, 2023
fire changed the title from "Use mesh autoencoder as is for chatml" to "Use mesh autoencoder as is for chatml finetuning" on Dec 8, 2023
@lucidrains
Owner

@fire foundation model sounds like a great plan, for VR, or for multimodality in general! 🚀

@lucidrains
Owner

@fire best way to participate is to help pull request the necessary dataset classes that can produce the vertices and faces. as described in the paper, they took special care for the face ordering

i still need to wrap up the training code and add text conditioning, ETA middle of this month

@fire
Author

fire commented Dec 8, 2023

Can you point me to where I can insert this code? It's written in Godot Engine GDScript, but I can translate it to Python.

func compare_vertices(vertex_a, vertex_b):
	if vertex_a.z < vertex_b.z:
		return -1
	elif vertex_a.z > vertex_b.z:
		return 1
	elif vertex_a.y < vertex_b.y:
		return -1
	elif vertex_a.y > vertex_b.y:
		return 1
	elif vertex_a.x < vertex_b.x:
		return -1
	elif vertex_a.x > vertex_b.x:
		return 1

	return 0

func compare_faces(face_a, face_b):
	for i in range(3):
		var vertex_comparison = compare_vertices(face_a["vertices"][i], face_b["vertices"][i])
		if vertex_comparison != 0:
			return vertex_comparison

	if "id" in face_a and "id" in face_b:
		if face_a["id"] < face_b["id"]:
			return -1
		elif face_a["id"] > face_b["id"]:
			return 1

	return 0

func _init(mesh: ArrayMesh) -> void:
	var mesh_data_tool = MeshDataTool.new()
	mesh_data_tool.create_from_surface(mesh, 0)

	var triangles: Array = []

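	# Iterate over faces rather than raw vertex triples so indexed meshes work too.
	for i in range(mesh_data_tool.get_face_count()):
		var triangle = {
			"vertices": [],
			"normals": [],
			"tangents": [],
			"uvs": [],
			"face_area": 0,
			"angle": 0,
			"tangent": Vector3.ZERO,
			"id": i
		}

		for j in range(3):
			# Look up the j-th vertex index of face i.
			var index = mesh_data_tool.get_face_vertex(i, j)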
			triangle["vertices"].append(mesh_data_tool.get_vertex(index))
			triangle["normals"].append(mesh_data_tool.get_vertex_normal(index))
			triangle["tangents"].append(mesh_data_tool.get_vertex_tangent(index))
			triangle["uvs"].append(mesh_data_tool.get_vertex_uv(index))

		triangle["face_area"] = calculate_face_area(triangle["vertices"])
		triangle["angle"] = calculate_angle(triangle["vertices"])
		triangle["tangent"] = calculate_tangent(triangle["vertices"])

		triangles.append(triangle)

	# sort_custom expects a callable that returns true when a should come before b.
	triangles.sort_custom(func(a, b): return compare_faces(a, b) < 0)
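
A Python translation of the comparison logic above might look like the following sketch. It mirrors the GDScript as posted (vertices compared by z, then y, then x, in their stored order within each face, with ties broken by `id`) and does not claim to be the paper's exact ordering. The dictionary layout and the function names are assumptions carried over from the GDScript.

```python
def vertex_key(vertex):
    """Order by z, then y, then x, as compare_vertices does."""
    x, y, z = vertex
    return (z, y, x)


def face_key(face):
    """Compare the three vertices in stored order, then break ties by id."""
    return tuple(vertex_key(v) for v in face["vertices"]) + (face["id"],)


def sort_faces(faces):
    """Return a new list of faces sorted like the GDScript comparator."""
    return sorted(faces, key=face_key)


if __name__ == "__main__":
    faces = [
        {"id": 0, "vertices": [(0, 0, 1), (1, 0, 0), (0, 1, 0)]},
        {"id": 1, "vertices": [(0, 0, 0), (1, 0, 0), (0, 1, 0)]},
    ]
    print([f["id"] for f in sort_faces(faces)])
```

Using a sort key instead of a three-way comparator avoids the question of what the sort callback should return and lets Python's tuple comparison do the lexicographic work.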

I am not sure if this is the correct interpretation. The paper mentions an up-axis convention, and Godot Engine uses a different one than Unity Engine. The meshgpt code uses the convention from Unity Engine.

I don't know how well you know meshes, but in many game engines a scene contains meshes which contain submeshes. Each submesh can have a different material.

Edited:

TL;DR I prefer the glTF conventions because they are fairly standard for axes. https://registry.khronos.org/glTF/specs/2.0/glTF-2.0.html
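
If the face ordering treats z as the vertical axis (an assumption here, not something confirmed in this thread), glTF data, which is right-handed with +Y up, would need an axis change before sorting. A minimal sketch of that conversion, with hypothetical function names:

```python
def y_up_to_z_up(vertex):
    """Rotate a right-handed +Y-up point into a right-handed +Z-up frame.

    This is a 90 degree rotation about the x axis, so handedness is preserved.
    """
    x, y, z = vertex
    return (x, -z, y)


def z_up_to_y_up(vertex):
    """Inverse of y_up_to_z_up."""
    x, y, z = vertex
    return (x, z, -y)


if __name__ == "__main__":
    # The glTF up vector (0, 1, 0) should land on the z axis.
    print(y_up_to_z_up((0, 1, 0)))
```

Applying the conversion to every vertex before the z-y-x sort, and the inverse after decoding, keeps the stored meshes in glTF convention while the ordering sees z as up.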

@lucidrains
Owner

lucidrains commented Dec 8, 2023

@fire my expertise ends at transformers and vector quantization. that is to say, i don't know and your guess is as good as mine!

@fire
Author

fire commented Dec 8, 2023

@lucidrains

My friend @benbot mentioned that training with two objective functions, one for text generation and one for mesh reconstruction, could be very difficult. Do you think it's feasible to start from an existing text foundation model and train it on this triangle language?

How can we test this hypothesis?

@lucidrains
Owner

@fire @benbot it isn't two objective functions, it is simply text conditioning, done like any other modality (images, voice, audio, video)

it will be optional

@lucidrains
Owner

@fire you can test the hypothesis once i turn it into a hyperparameter, and you can simply train one with and without text conditioning

@fire
Author

fire commented Dec 16, 2023

I was able to encode the PUA Unicode text and a .obj file side by side in ChatML. The next step is to generate enough samples for coverage, but that is not for now.
