Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store topological relationships, cache geometrical data #78

Closed
hannobraun opened this issue Jan 24, 2022 · 3 comments
Closed

Store topological relationships, cache geometrical data #78

hannobraun opened this issue Jan 24, 2022 · 3 comments
Assignees
Labels
topic: core Issues relating to core geometry, operations, algorithms type: development Work to ease development or maintenance, without direct effect on features or bugs

Comments

@hannobraun
Copy link
Owner

I'm not completely sure I'm using these mathematical terms correctly, so let me provide some examples.

Examples of topological relationships:

  • The vertices that bound this edge
  • The edges which bound this face
  • The neighboring faces of the edge

Examples of geometrical data:

  • The point at (0, 0, 0)
  • The line segment from (1, 2, 3) to (4, 5, 6)

Right now, we're storing multiple copies of geometrical data in various locations. For example, a vertex that is shared between two edges is stored, by value, in both of these edges. More than likely, it's also been computed multiple times before it was stored. Thanks to the well-known issues around floating point accuracy, this means it might actually have been computed differently each time.

This can lead to bugs, like the one fixed in #74. This pull request demonstrates how these kinds of issues can be addressed: By being careful when comparing floating point numbers, to make sure slightly different values that are supposed to be the same, are recognized as the same.

That approach is not great. For such comparisons, a suitable epsilon value must be chosen. This is bound to lead to problems somewhere down the line, when a use case that was not foreseen produces values that the chosen epsilon is not appropriate for. That could lead to values that are supposed to be different to instead be recognized as the same, which sounds like fuel for nightmare bugs. It's also hard to find all these cases. If we were to do it like that, we could expect a long period of plugging holes as users (hopefully) report them.

I think a better approach would be to prevent this kind of thing from happening at an architectural level. Here's what I have in mind:

  • There is only one place where geometrical data is stored: A centralized cache.
  • The values in that cache are referred to using ids.
  • Those ids are what is actually stored in other data structures.
  • If multiple locations refer to the same vertex, they do so using the same id, not multiple values that might or might not be slightly different.

To give an example:

  1. A model defines a sketch as a series of points.
  2. The code that processes this sketch stores these points in the vertex cache, getting ids back.
  3. That same code, when it creates the edges that make up the sketch, gives those ids to the edges. The edges themselves never know the actual locations of those points, and don't need to.
  4. If another piece of code needs those values (like, for example, to triangulate a face), it is provided with a reference to the cache and can look those values.

For now, all of this would happen in the host application. At a later point, it makes sense to move some of this into the fj library so models can refer to geometry they previously generated. But that can happen later.

This is going to be a large-scale refactoring effort, and I'm sure I haven't completely thought through all the details. I still wanted to open this issue, to provide some context on the existing code, and the work that's about to happen.

@hannobraun hannobraun added type: development Work to ease development or maintenance, without direct effect on features or bugs topic: core Issues relating to core geometry, operations, algorithms labels Jan 24, 2022
@hannobraun hannobraun added type: feature New features and improvements to existing features type: development Work to ease development or maintenance, without direct effect on features or bugs and removed type: development Work to ease development or maintenance, without direct effect on features or bugs type: feature New features and improvements to existing features labels Jan 26, 2022
hannobraun added a commit that referenced this issue Jan 31, 2022
This reverts a decision that was made earlier, to define the line by a
point and a vector instead. This was probably a mistake; at least it
seems that way in the context of #78.

At the very least, this will make the implementation of #78 simpler, as
the two points can just be cached, and there doesn't need to be a
special mechanism to cache the result of computation instead (for now).
@hannobraun hannobraun self-assigned this Feb 2, 2022
@hannobraun
Copy link
Owner Author

I've been working on this for almost a week now. I'm still in the design phase, but I've at least made significant progress with sorting through all the thoughts that are flying around my head. I'd like to write down my latest insights, for those following along, but mostly to a) clarify my own thinking and b) for my own later reference.

The way I framed this problem initially doesn't quite hit at the heart of the issue. Before I go into why, I'd like to clarify what use case inspired that initial problem statement, and why it might not require the solution I've prescribed here.

A motivating use case for this issue, and why a geometry cache is not actually required to solve it.

Here's how lines were defined when I started to work on this issue (simplified, not the actual code):

struct Line {
    origin: Point,
    direction: Vector,
}

For triangulation, you need an approximation of the edges that bound a face. For round edges that means you generate a number of points along the edge to satisfy given accuracy requirements. A straight edge doesn't need approximation, but the triangulation code doesn't care. It just wants a list of points, so what it gets are the points that define the line (again, not the actual code):

let a = line.origin;
let b = line.origin + line.direction;

The problem is b. You have a list of edges, and if each edge gives you the points that approximate/define it, you'll get duplicate points, because sequential edges connect together. The approximation code deals with this by filtering out duplicate points. Although, if you compute b as shown above, you're not always getting duplicate points when you should. Point b of one edge might be slightly different than point a of the next edge (because floating-point accuracy).

This was an actual bug, that I worked around but didn't really fix. I figured, a proper fix would be to prevent this by never re-computing b. Instead the first edge's b and the second edge's a would both refer to the same point in the cache, so when comparing those two points, you'd never accidentally compare slightly different points that should be the same. Because you're comparing instances of the same unique handle that point into the cache.

You don't actually need a cache to fix that problem though. It can no longer happen now, because lines are now defined like this (simplification):

struct Line {
    a: Point,
    b: Point,
}

One edge's b and the next edge's a would always be the same when compared, because they're literally the same value that's not recomputed every time. You could still mess this up when constructing those Line structs. But that would be just as possible, if you had a geometry cache and were constructing lines out of handles.

Why the cache might still be beneficial.

Now, within the context of the example I've shown here, a geometry cache might still be beneficial. Because that two-point definition of Line gives the wrong impression that the line owns those two points. Some dumbass (like me) might come along and think changing those points is okay, completely forgetting that each point might be shared by any number of edges.

At least some kind of handle type would make it clear that the line doesn't own these points. Something like an Arc<RefCell<Point>> would too, but I don't think that's better than a handle to a central cache. Because that type is too long and complicated, so we'd need a nice wrapper around it. And at that point, the whole thing becomes too magical.

(Note from me, while I'm editing this long-ass comment: Now I think a wrapped Arc<RefCell<Point>> might actually be the better solution. I'm not sure enough about that to actually change the previous paragraph. Just wanted to inform you that all of this is thinking-in-progress.)

Why we might really need the cache.

Astute readers might have noticed that I've used "line" and "edge" interchangeably, even though they're really not the same thing. A line is a straight curve, infinitely long. An edge, well, a straight one, is defined by a line, but bounded by two points on that line.

The reason I could use them interchangeably while discussing this, was that those points that bound the edge on the line are defined in 1-dimensional curve coordinates and are, right now and temporarily, always set to 0 and 1. 0 being defined as the line's point a and 1 being defined as b.

So the two points that define the line are the same as the points that bound the edge. But, I already said it, that's temporary. Long-term we're going to need the capability to use different curve coordinates to define edges. Then you might have an edge stretching from -0.5 to 1.5, and how do you convert those coordinates into 2-dimensional surface coordinates (which you need for triangulation) or 3-dimensional coordinates (which you need to actually render or export the generated triangle mesh)?

You can't compute them then and there, that's for sure. Because then we're right back to the original problem. Here we really need the cache. But not a simple one that just stores 3D points. A more sophisticated one, that we can tell "give us point -0.5 on curve c, but in surface coordinates on surface s".

And after we finished the triangulation of that surface, and someone later needs the 3D coordinates for those triangles to build the full 3D triangle mesh, the cache needs to return identical 3D points for each of those 2D surface points that are shared between faces. Because otherwise the triangle mesh won't fit together along the seams.

Why can't we just do everything in 3D?

So yeah, why do we even need 2D and 1D points?

Well, we at least need 2D. In practical terms, because we have high-quality Delaunay triangulation libraries at our disposal that I'm not going to rewrite, and those work with 2D points. In more theoretical terms, because I don't think surface triangulation in 3D is the right problem to solve.

It's easy enough to imagine a flat face and think that it might be straight-forward to do, but imagine a curved face with a steep and high protrusion on it. How is the triangulation algorithm going to know that it should connect the points around the base to the ones around the top, and not instead connect the ones around the base together, then not knowing what to do with the ones around the top?

The answer is, it will know, if you generated the relevant points in 2D surface coordinates and do the triangulation in those, because then there's no reason to believe that the points along the base are somehow closer to each other than they are to the ones at the top.

We might not need 1D though.

When I originally started to think about triangulation in surface coordinates, I figured, if we have 2D coordinates for points on a surface, it makes sense to have 1D coordinates for points on a curve. Just seemed neat and clean. It means an edge is defined in a self-consistent way without redundancy.

If an edge basically points to a curve and two 1D points on that curve, there's not much anyone can mess up there (remember, there's a dumbass poking around that code). If the edge instead points to a curve and two 3D points that might or might not actually lie on the curve... yeah, lots of space for errors.

On the other hand, not having to deal with 1D points might result in a simpler geometry cache and less complexity overall.

I'm not sure. The options are basically, keep 1D points and possibly have to simplify later, or remove 1D points and possibly have to re-add them later. I'm thinking about it.


If you made it this far, congratulations, I hope it was useful. Writing it down certainly helped me wrap my head around the whole problem a bit better. If you have any questions or suggestions, please comment!

hannobraun added a commit that referenced this issue Feb 3, 2022
Having those vertices around just makes addressing #78 a bit harder.
They can be re-added once required.
@hannobraun
Copy link
Owner Author

I've made some decisions:

  • 1D points are gone. They do have advantages, as a concept, but they weren't really used for anything yet. There's no point to make things more difficult right now, just because we might want them later.
  • 3D points will stay as they are. I'm not thrilled about the implication of ownership (see my previous comment), but in the end, none of the alternatives are clearly better. And all are more complicated. We can always change this later.
  • 2D points don't need fancy caching. We can just have a struct, SurfacePoint, that contains a Point<2> for the surface coordinates, but also a Point<3>, to store the point it was converted from. Then conversion back to 3D involves no calculation.
  • Edge approximations will definitely need to be cached, or we risk creating triangle meshes that don't fit together. I don't think this is necessary for correctness at the current point, but am not sure yet.
  • I think that approximation caching can (and should?) be done in the Edge struct itself.

I believe this is the simplest approach that will solve all the problems mentioned here. No centralized cache required! It might need refinement in the future (if we decide to re-add 1D points, for example), but that can happen at a later date.

For now, my immediate action plan is the following:

  1. Clean up the approximation code a bit.
  2. Decide whether to implement caching now or later.
  3. Implement it or note the need for it down somewhere, as appropriate.
  4. Declare victory and close this issue.

hannobraun added a commit that referenced this issue Feb 4, 2022
This follows the insights gained in #78. Please check the discussion
there for details.
@hannobraun
Copy link
Owner Author

Update:

With #133, the last known accuracy problem has been addressed, and ugly workarounds could previously be removed in #129 and #132. No caching needed to be implemented, except in the simplest form through the new SurfacePoint struct.

Due to all that, I consider this issue done now. Not the resolution I expected, but I'm very pleased that we could get away with relatively simple measures, instead of a big kernel refactoring. I don't know, if this approach will last long-term, but I definitely know it is the right approach for now. (And "good enough for now" is the best kind of good.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic: core Issues relating to core geometry, operations, algorithms type: development Work to ease development or maintenance, without direct effect on features or bugs
Projects
None yet
Development

No branches or pull requests

1 participant