
Homologation of Pose and Transform composed types #152

Closed
SteveMacenski opened this issue May 18, 2021 · 12 comments
Feature request

Feature description

  • Homologate the Transform and Pose message composed types so it's easier to convert between them (which is very common among Nav2 and mobile-robotics users).

E.g.

# Pose
Point position
Quaternion orientation

# Transform
Vector3 translation
Quaternion rotation

Change both to use either Point or Vector3 such that:

pose.position = transform.translation

is possible without having to specify each field:

pose.position.x = transform.translation.x
pose.position.y = transform.translation.y
pose.position.z = transform.translation.z

It's a little strange that we have both Vector3 and Point containing the exact same fields (three float64 fields: x, y, z).

Implementation considerations

  • Breaks ABI for the message, but shouldn't involve many, if any, code changes since the fields are exactly the same
tfoote (Contributor) commented Jun 5, 2021

A transform and a pose are semantically different, and this is why they have different datatypes, in the same way that Point and Vector3 have different semantics but the same data structure.

Vectors and points make it really easy to demonstrate the difference in semantic meaning. A vector subjected to a purely translational transformation does not change; however, a point subjected to a translational transformation will change. Just because they have the same three data fields (x, y, z) does not mean that they will behave the same way.
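As a minimal sketch of this behavioral difference, here are hypothetical plain-Python stand-ins (not the actual geometry_msgs classes):

```python
# Hypothetical minimal types to illustrate the point (not geometry_msgs).
from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float
    z: float

@dataclass
class Vector3:
    x: float
    y: float
    z: float

def translate_point(p: Point, t: Vector3) -> Point:
    # A point (a position) moves under a pure translation.
    return Point(p.x + t.x, p.y + t.y, p.z + t.z)

def translate_vector(v: Vector3, t: Vector3) -> Vector3:
    # A free vector (direction + magnitude) is unchanged by translation.
    return v

assert translate_point(Point(0, 0, 0), Vector3(1, 0, 0)) == Point(1, 0, 0)
assert translate_vector(Vector3(0, 0, 1), Vector3(1, 0, 0)) == Vector3(0, 0, 1)
```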

To take a simpler example: Illuminance, FluidPressure, Temperature, and RelativeHumidity all have the exact same data fields and thus could use the same message. But since they're semantically different data, we give them semantically different messages so that you don't try to interpret how bright it is by reading the wrong topic from the pressure tank.

The difference between a vector and a point is also the core of what makes a pose and a transform different and why they're not interchangeable. Transforms can operate on poses but result in a pose. A pose cannot operate on a pose or a transform. They're not symmetric and as such they have different datatypes to allow us to enforce these requirements.

If you want to "convert" a transform into a pose you can have the transform operate on an identity pose in one line.

If you have a pose and want to create a transform at the position of the pose, you can create a one-line function that takes the pose and the target frame id and generates the appropriate transform, along the lines of TransformStamped createTransformAtPose(PoseStamped pose_in, string frame_id_of_new_coordinate_frame).
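A rough sketch of the shape of such a helper, using hypothetical plain-Python stand-ins rather than the real rclpy/tf2 message classes:

```python
# Hypothetical stand-ins for the stamped ROS types (not the real rclpy/tf2
# API), just to show the shape of the one-line helper described above.
from dataclasses import dataclass

@dataclass
class PoseStamped:
    frame_id: str        # frame the pose is expressed in
    position: tuple      # (x, y, z)
    orientation: tuple   # quaternion (x, y, z, w)

@dataclass
class TransformStamped:
    frame_id: str        # parent frame
    child_frame_id: str  # the new coordinate frame located at the pose
    translation: tuple
    rotation: tuple

def create_transform_at_pose(pose_in: PoseStamped, new_frame: str) -> TransformStamped:
    # Same numbers, but now carrying both frame ids as metadata.
    return TransformStamped(
        frame_id=pose_in.frame_id,
        child_frame_id=new_frame,
        translation=pose_in.position,
        rotation=pose_in.orientation,
    )

t = create_transform_at_pose(
    PoseStamped("map", (1.0, 2.0, 0.0), (0.0, 0.0, 0.0, 1.0)), "base_link")
assert (t.frame_id, t.child_frame_id) == ("map", "base_link")
```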

Alternatively, internally you're welcome to take shortcuts and possibly convert everything into homogeneous matrices. However, messages are designed to be semantically meaningful and interpretable in a standalone manner without knowing how they were generated, which is why we have the distinction between the datatypes. When you start taking shortcuts and thinking of poses as transforms, it's really easy to make a mistake and, say, multiply two poses together, which has zero meaning.

SteveMacenski (Contributor, Author) commented Jun 8, 2021

Sorry, maybe I didn't communicate this well. I'm not suggesting removing the Pose or the Transform message. I'm suggesting that the translational component position in Pose and translation in Transform should be made to use the same data type (e.g. Vector3 or Point) such that they can be more easily converted between each other -- which is common in SLAM and localization methods that need to interact with both.

As both Point and Vector3 store their data as (x, y, z), nothing would change for end users accessing the data, but they'd be able to write:

pose.position = transform.translation

vs

pose.position.x = transform.translation.x
pose.position.y = transform.translation.y
pose.position.z = transform.translation.z

The suggested change is just to make Pose.msg into:

# Pose
Vector3 position
Quaternion orientation

A Vector3 still works semantically here, since a Pose is itself a vector relative to some reference frame. I'd actually argue that Point in the Pose message is semantically less correct.

tfoote (Contributor) commented Jul 22, 2021

A pose is specifically made up of a position and orientation. A Point is the representation of a position; a vector is a representation of the transition between two places. Anything can be thought of as a vector; however, the clearest distinction is that poses and points cannot be added. If you have two balls at P1 and P2, you cannot add them together.

P1 - P2 = V21
P1 - V21 = P2
V12 + V21 = V0
P1 + P2 = ?? Adding two positions doesn't mean anything.

If you collapse it to a 2D manifold: my position on the globe and your position on the globe are similar, being in the same state. If you add them, you get somewhere on the other side of the world, which isn't really relevant to anything.

If you just want a shorthand way to do this semantically correctly, you can zero-initialize pose0 or point0 and write:

pose.position = point0 + transform.translation

or

position = point0 + translation

I think what you're actually trying to do is to do

pose = transform * pose0

I.e., represent a pose at the origin of this coordinate frame, which is clear from the above, versus being fast and loose with the datatypes. This is still a one-liner, and it is now actually semantically clear to a future reader what you're doing, and hopefully it can be checked in the future by type analysis.

This is also more obvious if you use the stamped datatypes, since the stamped datatypes can be used to validate that the frames of reference are correct.

Pose_FrameOut = Transform_FrameIn_FrameOut * Pose_FrameIn

If FrameIn and FrameOut don't all match, the math is invalid.
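A minimal sketch of that frame-id bookkeeping, with hypothetical types (the real tf2 library performs this kind of check internally):

```python
# Hypothetical frame-id bookkeeping (real tf2 validates this internally).
from dataclasses import dataclass

@dataclass
class StampedTransform:
    frame_in: str    # frame the transform consumes (FrameIn)
    frame_out: str   # frame the transform produces (FrameOut)

def apply_frame(t: StampedTransform, pose_frame: str) -> str:
    # Transform_FrameIn_FrameOut * Pose_FrameIn -> Pose_FrameOut;
    # mismatched frames are rejected instead of silently producing nonsense.
    if t.frame_in != pose_frame:
        raise ValueError(f"frame mismatch: {t.frame_in} vs {pose_frame}")
    return t.frame_out

assert apply_frame(StampedTransform("map", "odom"), "map") == "odom"
```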

SteveMacenski (Contributor, Author) commented Jul 23, 2021

A pose is specifically made up of a position and orientation. A Point is the representation of a position, a vector is a representation of the transition between two places.

A pose is made up of a position and an orientation, which are both vectors by any common physics definition I've ever run into. Two things being of the same type doesn't mean that anyone is going to try to add, multiply, or subtract the values. This would be equally invalid with two Points or two Vector3s with different reference frames or contexts, so I don't think that argument is unique to what we propose. You can of course do silly things with any data type of any message 😄

Maybe a tangible example might help. In the initial post, I linked to this snippet

pose.position.x = transform.translation.x
pose.position.y = transform.translation.y
pose.position.z = transform.translation.z

Which is common because TF uses transform messages, but those transforms are often representing poses. A pose is just a vector like any other vector; the semantic difference between a pose and a transformation is non-existent. A pose is still where something is relative to another reference frame, just as a transform is. TF even internally maps these concepts as identical: typedef tf::Transform tf::Pose, typedef tf::Vector3 tf::Point.

This request is almost singularly because we want to work with Poses in parts of our code, but TF transformations cannot be directly assigned to them. This is of course a small issue (the difference between the above and pose.position = transformation.translation is trivial), but it's a small developer quality-of-life improvement for cleaner userspace code.

Both Point and Vector3 contain the exact same fields and are semantically indistinguishable from each other. I'm asking that the Pose type use Vector3 so that we can do pose.position = transformation.translation when converting TF code to Poses, since it seems silly that every time that conversion is done we have to specify every leaf field in two messages that contain the exact same information encoded in the exact same fields.

I can understand if that's not in the cards, because the change would be small and not super helpful for the general user, but I'd argue that removing Point is far less impactful than removing Pose2D, which does have many unique uses.

tfoote (Contributor) commented Jul 23, 2021

Both Point and Vector3 contain the exact same fields and are semantically indistinguishable from each other.

No, they are not semantically indistinguishable! One represents a position relative to an origin; the other represents a direction and magnitude. That was the entire point of my above response. Semantically different means that they have different meanings: a Vector3 is treated differently than a Point, thus they are not semantically the same. They have the same representation, which is true. This is no different from Temperature, Illuminance, FluidPressure, and RelativeHumidity: we represent them all as a unary float64 datatype plus variance, so they have the same representation but are semantically different datatypes. Our system, which gives them different types, will prevent you from assigning them to each other or operating on them interchangeably, because they have different meanings which, if used incorrectly, will give you meaningless values. Basic examples: the sum of two points is meaningless; the sum of two vectors has meaning. The product of two points is meaningless; the product of two vectors has meaning.

When you do the assignment as you've listed it, you are explicitly saying: I'm creating a position which is at the end of this vector, with the same x, y, z values. That's a specific change in the semantic meaning of that data.

If you want to do it in one line use the following

pose.position = Point() + transform.translation

This is a clear statement that you're setting the pose.position to the origin point offset by the vector values, because PointType + VectorType -> PointType. And you're explicitly grounding the reference to the origin. The vector in a transform is just saying offset by x, y, z. It does not technically have any grounding or origin, it's just a direction and magnitude. To turn it into a point you are saying that it's an offset from the origin. Which is exactly what the usage I'm using above is saying and it's validating all the types correctly.
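The typed arithmetic described here (PointType + VectorType -> PointType, while adding two points is rejected) could be sketched with hypothetical classes like these (not geometry_msgs, which has no operators):

```python
# Hypothetical typed arithmetic (not geometry_msgs): Point + Vector3 -> Point,
# while Point + Point raises, matching the semantics described above.
from dataclasses import dataclass

@dataclass
class Vector3:
    x: float = 0.0
    y: float = 0.0
    z: float = 0.0

@dataclass
class Point:
    x: float = 0.0
    y: float = 0.0
    z: float = 0.0

    def __add__(self, other):
        if not isinstance(other, Vector3):
            raise TypeError("a Point can only be offset by a Vector3")
        return Point(self.x + other.x, self.y + other.y, self.z + other.z)

translation = Vector3(1.0, 2.0, 3.0)
position = Point() + translation  # origin point offset by the vector
assert position == Point(1.0, 2.0, 3.0)
```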

TF even internally maps these concepts as identical typedef tf::Transform tf::Pose, typedef tf::Vector3 tf::Point.

Yes, internally the tf datatypes take the shortcut, because I know that internally to the library I can take shortcuts and do it properly. The tf datatypes are also actually just a fork of the Bullet datatypes, and we didn't want to diverge from the upstream representations. And I'm not saying that you can't take shortcuts internally in your systems too. However, the messages need to be as clear and semantically valuable as possible. They need to stand on their own, as often they do not come with internal representations. And similarly, although they bottom out to the same methods, we still keep the semantically different types so that developers can use them even if the system isn't going to automatically check and enforce them. At least they make it easier for a reviewer to know that if you multiply a tf::Point * tf::Point they should flag it, as it doesn't make any sense.

SteveMacenski (Contributor, Author) commented Jul 23, 2021

One represents a position relative to an origin, the other represents a direction and magnitude.

You assume that there is such a thing as the one, true origin. What 'origin' exists in robotics? It's all just an origin of a reference frame, which can be referenced from other frames, for which there are many. This is no different from a vector...

I disagree with your viewpoint that an origin-centric vector is somehow unique or deserves special consideration. I think at this point our disagreement is philosophical and we understand where each other stands. It would be better argued over a beer 😉. I think we can close this; I don't think there's much hope of it making ground.

jrutgeer commented Sep 7, 2023

@tfoote I agree for the case of e.g. FluidPressure and Temperature, i.e. you would not want to add or multiply a FluidPressure with a Temperature.

However, there is a one-to-one relation between "the pose of an object wrt a reference", and "the transformation of that object wrt that reference". Whereas there is no such relation between a Temperature and a FluidPressure.

Given this one-to-one relation, I don't see how a difference between a pose and a transformation holds, not from a theoretical point of view and certainly not from a practical real-life usability point of view.

An example:

Let's say for some application following parameters are read from a configuration file:

  • The pose of an object {object}, expressed wrt {world},
  • The pose of some feature of object, i.e. {feature}, expressed wrt {object},
  • A point expressed wrt {feature}.

Consider for another application, i read instead:

  • The transformation of that same object {object}, expressed wrt the same {world},
  • The transformation of the same feature of object, i.e. {feature}, expressed wrt {object},
  • The same point expressed wrt {feature}.

Both applications describe an identical physical reality, and both describe it in an equally correct way. There's no right or wrong in choosing poses vs. transformations to describe positions and orientations of objects, nor is one option preferable to the other.

So what arguments do you have to say that one is different from the other?

Yet in the latter case I could write:

point_world = T_world_object * T_object_feature * point_feature;

whereas in the former case I would need to write:

point_world = convert_to_transform(p_world_object) * convert_to_transform(p_object_feature) * point_feature;

And this because... well, just because!



P1 + P2 = ?? Adding two positions doesn't mean anything.

If you collapse it to a 2d manifold, my position on the globe and your position on the globe are similar being in the same state. If you add them, you get somewhere on the other side of the world which isn't really relevant to anything.

This is circular reasoning:

Obviously you cannot add P1 and P2, as pose P1 corresponds (one-to-one) to a transformation T_world_p1 and pose P2 corresponds (one-to-one) to transformation T_world_p2. And the following does not make sense either:

T = T_world_p1 * T_world_p2

The resulting T would be nonsense.

But instead: if e.g. P1 is the pose of the robot wrt the world, and P2 is the pose of the end effector wrt the robot, then it makes perfect sense to write P_ee = P1 + P2 (or whatever notation is preferred, e.g. P_ee = P1 * P2).



Both Point and Vector3 contain the exact same fields and are semantically indistinguishable from each other.

No, they are not semantically indistinguishable! One represents a position relative to an origin, the other represents a direction and magnitude.

Imo this statement argues in favour of changing Transform.translation from type Vector3 to type Point.
Otherwise you are implying that it is ok to write T = T_world_p1 * T_world_p2, as the transformation is not "relative to an origin".

I do agree that one can think of the abstract concept of a transformation as being reference-less, but that has no relevance other than from a purely theoretical-philosophical point of view. As soon as you apply a transformation, you must make a choice of reference frame, even if it's an implicit choice. Given that robotics is an applied science, making a distinction between transformations and poses only distracts from the goal and adds confusion.

tfoote (Contributor) commented Sep 7, 2023

point_world = convert_to_transform(p_world_object) * convert_to_transform(p_object_feature) * point_feature;
And this because... well, just because!

This is because your pseudocode is ignoring the extra metadata which needs to be added to validate the math. You imply that I'm saying transforms can be applied without validating that the frames match ("Otherwise you are implying that it is ok to write T = T_world_p1 * T_world_p2 as the transformation is not 'relative to an origin'"), which is actually the opposite of my point. Your example is ignoring the metadata that needs to be added to keep track of the coordinate frames. I.e., a pose is actually more like an anonymous transform.

P_w = T_w->obj * T_obj->feature * P_feature

P_obj = Position of Object in world
P_feature = Position of something in the Feature frame

P_w = make_transform(P_obj, dest=obj) * make_transform(P_feature, dest=feature) * P_feature

And once you have that, you can programmatically check that the frames cancel: world -> obj, obj -> feature => world -> feature

Without this extra metadata the system can't check your transform chains are appropriately ordered.
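A toy sketch of such a chain check, with a hypothetical helper (the real tf2 buffer does this kind of bookkeeping internally):

```python
# Hypothetical helper: verify that a chain of (frame_in, frame_out) pairs
# composes, i.e. world->obj followed by obj->feature yields world->feature.
def chain_frames(pairs):
    start, current = pairs[0]
    for frame_in, frame_out in pairs[1:]:
        if frame_in != current:
            raise ValueError(f"broken chain: {current} -> {frame_in}")
        current = frame_out
    return (start, current)

assert chain_frames([("world", "obj"), ("obj", "feature")]) == ("world", "feature")
```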

Sometimes it makes sense to apply a name to every position, as in the cases you're describing above. When you're representing a particle cloud of potential solutions, naming every potential solution doesn't necessarily make sense.

Given robotics is an applied science, making a distinction between transformations and poses only distracts from the goal and adds confusion.

This argument could be applied to argue that we should get rid of all typing and semantic value and just give users numbers.

Overall this is the same sort of difference as between Duration and Time, or Vector3 and Point, or Transform and Pose. They're at different levels of dimensionality (and units, in one case), but in each pair one is an offset and one is a position.

jrutgeer commented Sep 7, 2023

This is because your pseudocode is ignoring the extra metadata which needs to be added to validate the math.

I don't understand how this is different for a pose than for a transformation?

A pose is only defined if you know the reference frame (i.e. metadata), otherwise it is just a bunch of meaningless numbers.

But equally: a transformation is only defined if you know the reference frame (i.e metadata), otherwise it is also a bunch of meaningless numbers.


And once you have that you can programatically check that the frames cancel world -> obj, obj -> feature, feature => world -> feature

How?

Let's say I write a node that publishes both a transformation and a pose.
Both messages contain 3 + 4 doubles and no other information.
Another node receives both messages. How can it programmatically check anything about their correctness?
And again: how is this different for a pose than for a transformation?


Both applications describe an identical physical reality.
Both applications also describe the physical reality in an equally correct way. There's no right or wrong in choosing poses vs. transformations to describe positions and orientations of objects, nor is one option preferable to the other.

Do we agree on this?

jrutgeer commented Sep 7, 2023

Overall this is the same sort of difference between Duration and Time, or Vector3 and Point3 Transform and Pose.

I gave this some further thought, and actually there is a distinct difference:

E.g. consider Duration and Time:

For me to communicate a Time to you:

  • I need to provide to you: one value,
  • And we need to agree on two specifications:
    1. The unit of the value, e.g. milliseconds,
    2. The meaning of the value, e.g. "time since epoch".

For me to communicate a Duration to you:

  • I need to provide you: one value,
  • And we need to agree on one specification:
    1. The unit of the value, e.g. milliseconds.

So if I send you a few messages of type "Double", that's annoying, because you cannot tell whether to interpret the value using both 1. and 2., or only 1.
So instead of sending just doubles, we define specific types "Time" and "Duration" and all is clear.
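This Time/Duration asymmetry can be sketched with hypothetical types whose operators encode the rules (Duration + Duration and Time + Duration are allowed; Time + Time is not):

```python
# Hypothetical Time/Duration types whose operators encode the rules above.
from dataclasses import dataclass

@dataclass
class Duration:
    millis: int
    def __add__(self, other):
        if isinstance(other, Duration):
            return Duration(self.millis + other.millis)
        return NotImplemented

@dataclass
class Time:
    millis_since_epoch: int
    def __add__(self, other):
        # Time + Duration -> Time; Time + Time has no meaning and fails.
        if isinstance(other, Duration):
            return Time(self.millis_since_epoch + other.millis)
        return NotImplemented

assert Time(1000) + Duration(500) == Time(1500)
assert Duration(1) + Duration(2) == Duration(3)
```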


Now consider Transformation and Pose:

For me to communicate a Transformation to you:

  • I need to provide to you: 7 values,
  • And we need to agree on four specifications:
    1. The meaning of the values, e.g. 3 cartesian coordinates and 4 unit quaternion coordinates,
    2. The units, e.g. [m] for the cartesian coordinates
    3. The reference frame wrt. which all values are expressed,
    4. Some identifier (e.g. a name) for the target frame

For me to communicate a Pose to you:

  • I need to provide to you: 7 values,
  • And we need to agree on four specifications:
    1. The meaning of the values, e.g. 3 cartesian coordinates and 4 unit quaternion coordinates,
    2. The units, e.g. [m] for the cartesian coordinates
    3. The reference frame wrt. which all values are expressed,
    4. Some identifier (e.g. a name) for the target frame

That's identical.

We still do not want to send a message of type "array of 7 doubles", as that could be anything (e.g. joint position values of a 7 DOF robot).
But I don't see why different types are needed for Pose and Transformation. It's the same 7 values and the same set of four specifications. There is no possibility to misinterpret the data.

jrutgeer commented Sep 8, 2023

@tfoote I think I finally understand the point you are trying to make. :-)

EDIT: There is a logical inconsistency in this reasoning; see the description halfway through the text.

First consider Point vs Vector


Reasoning 1:

Consider homogeneous coordinates for points p1 and p2.

It does not make sense to state p3 = p1 + p2, since:

p1 + p2 = [x1; y1; z1; 1] + [x2; y2; z2; 1]

        = [x1+x2; y1+y2; z1+z2; 2]

And that result is not a valid homogeneous coordinate.

Instead, the semantically correct summation operation is defined between a point and a vector, and we can convert the given point into a vector by subtracting its reference (which is typically [0; 0; 0; 1]):

p1 + v2 = [x1; y1; z1; 1] + ( [x2; y2; z2; 1] - [0; 0; 0; 1] )

        = [x1+x2; y1+y2; z1+z2; 1]

For above to work, we obviously need to agree on the convention that the published messages p1 and p2 are expressed wrt. the same coordinate frame.
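A quick numeric check of the arithmetic above, writing the homogeneous column vectors as plain Python lists:

```python
# Numeric check of the homogeneous-coordinate arithmetic above:
# point + point yields w = 2 (invalid), point + (point - origin) keeps w = 1.
def add(a, b):
    return [x + y for x, y in zip(a, b)]

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

p1, p2 = [1, 2, 3, 1], [4, 5, 6, 1]
origin = [0, 0, 0, 1]

assert add(p1, p2) == [5, 7, 9, 2]               # w = 2: not a valid point
assert add(p1, sub(p2, origin)) == [5, 7, 9, 1]  # point + vector: valid point
```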


If we:

  • Also agree on the convention that each point always has the same reference (i.e. the origin [0;0;0;1]), and
  • Define two data types: vector and point

Then we can simplify the calculation (by omitting the homogeneous coordinate), while still keeping semantic validation (through the types).

Now, if this were all, then distinguishing between point and vector would still be nice from a theoretical point of view, but otherwise would have no true practical value. As long as we agree that all points have the same reference and that it is at [0; 0; 0], then saying p = p1 + p2 vs. p = p1 + v2 has no practical implications; the math is identical. Our agreement on the reference point defines an implicit equivalence relation between points and vectors insofar as the summation operator is concerned.


EDIT: I realized there must be a logical inconsistency in this reasoning, as the same reasoning can be made e.g. to conclude that a Time and a Duration are equivalent, which they are not.
As the remainder of this post builds on this reasoning, it is no longer valid.


Reasoning 2:

But there's more to it: summation is not the only operation:

Say T is a homogeneous transformation matrix (4x4).

If I provide you 3 coordinates x,y,z and state that they describe a point, you will transform it as follows:

p = T * [x; y; z; 1]

But if I state that it represents a vector, you will transform it differently:

v = T * [x; y; z; 0]

This is different from reasoning 1, in that we cannot agree on some extra convention so that this difference is ruled out by an equivalence relation.

So the conclusion is very clear: we must distinguish the point and vector types, as there is not always an equivalence relation between these types.
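A numeric illustration of reasoning 2, using a pure-translation homogeneous transform (plain Python, no external libraries):

```python
# The same (x, y, z) triple transforms differently as a point (w = 1)
# than as a vector (w = 0). T is a pure translation by (1, 2, 3).
def mat_vec(T, v):
    return [sum(T[i][j] * v[j] for j in range(4)) for i in range(4)]

T = [[1, 0, 0, 1],
     [0, 1, 0, 2],
     [0, 0, 1, 3],
     [0, 0, 0, 1]]

assert mat_vec(T, [5, 0, 0, 1]) == [6, 2, 3, 1]  # the point moves
assert mat_vec(T, [5, 0, 0, 0]) == [5, 0, 0, 0]  # the vector is unchanged
```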


Now consider Pose vs Transformation

The point you are trying to make (I think) is:

Similarly to a point being a vector with a reference, we define a pose as a transformation with a reference.

And we can agree on that.


However my point is:

Similar to reasoning 1:

For the point, we agreed on the reference [0; 0; 0; 1].
For the pose, we also agree on the reference: identity rotation and position [0; 0; 0; 1].
Because of this agreement, there is an implicit equivalence relation between pose and transformation insofar as the transform operator (*) is concerned.

And:

I did not succeed in coming up with any example similar to reasoning 2, of an operation for which there is no equivalence relation between a pose and a transformation.


My conclusion is:

Unless there is a valid counter example similar to reasoning 2, there is no mandatory need to distinguish between pose and transformation types, as there is always an implicit equivalence relation between these types.

We can still choose to distinguish between the types, but nothing mandates this choice.


I would be very much intrigued by a valid counter example though.

jeanchristopheruel commented Nov 22, 2023

I strongly agree with @SteveMacenski. In common maths, physics, and specialized libraries, there is no distinction between "point" and "vector", or between "pose" and "transform": they are all just matrices. If units are the only concern here, that should be explicitly addressed.
