Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BoundingBox API v2 #2064

Open
edgarriba opened this issue Dec 5, 2022 · 10 comments
Open

BoundingBox API v2 #2064

edgarriba opened this issue Dec 5, 2022 · 10 comments
Labels

Comments

@edgarriba
Copy link
Member

edgarriba commented Dec 5, 2022

We would like start iterating on how to integrate the new apis from geometry like the bounding box class. I'll soon push a new api we have been iterating in the background for images too.

consider to merge:

In particular, the bbox class should be adjusted a bit to support autocast and stuff like this

class BoundingBoxFormat(Enum):
   XYXY = 0
   ....

# TODO: discuss about if bbox makes sense to be a tensor, eg.g to optimize/refine coords
class BoundingBox(TensorWrapper):
  def __init__(self, data: Union[Tensor, List[Tensor]], data_format: BoundingBoxFormat) -> None:
     self._data = data
     self._data_format = data_format

  # adapt the rest: https://github.com/kornia/kornia/blob/master/kornia/geometry/boxes.py#L167

  def convert(self, new_format: BoundingBoxFormat) -> "BoundingBox":
   # check data_format and apply the corresponding logic to convert from one format to the other
    ...

  def normalized(self) -> "BoundingBox":
     # return [-1,1] e.g to use directly in grid_sample
     ...

Originally posted by @edgarriba in #2062 (comment)

/cc @ducha-aiki @shijianjian @johnnv1 @miquelmarti @nitaifingerhut

@edgarriba edgarriba added feature request New feature or request module: geometry labels Dec 5, 2022
@johnnv1
Copy link
Member

johnnv1 commented Dec 12, 2022

is there a specific reason why we need separate APIs for 2D and 3D?

@edgarriba
Copy link
Member Author

edgarriba commented Dec 12, 2022

no reason, we could merge

@johnnv1
Copy link
Member

johnnv1 commented Dec 12, 2022

Some ideas:

have just three modes (based on the existing ones): (probably with better names than my suggestion 😅 )

  • mimmax: xmin, ymin, xmax, ymax or xmin, ymin, zmin, xmax, ymax, zmax.
  • minlen: xmin, ymin, width, height or xmin, ymin, zmin, width, height, depth
  • vertices: top-left-back, top-right-back, bottom-right-back, bottom-left-back, or top-left-back, top-right-back, bottom-right-back, bottom-left-back, top-left-front, top-right-front, bottom-right-front, bottom-left-front

and for the bb itself have

  • have xmax, xmin, width, etc as properties

    • based on the shape and mode we can determine if is in 3d, then compute if needs a stride (of one) to get the right properties. As this example
  • Have a buffer attribute, so we don't need the modes plus and we always add this buffer (default as zero) to the width, height, etc

Is the dim N (at the current implementation) referring to N bb's in the same batch?

@edgarriba
Copy link
Member Author

edgarriba commented Dec 13, 2022

Actually, similar to the other new geometry primitives we could implement in terms of Eigen AlignedBox
https://gitlab.com/libeigen/eigen/-/blob/master/Eigen/src/Geometry/AlignedBox.h

mentioned here, #1754

eventually we can alsohave PosedAlignedBox, that includes se2 or se3

@johnnv1
Copy link
Member

johnnv1 commented Apr 23, 2023

We should consider the discussion from #1142 too 😄

@edgarriba
Copy link
Member Author

@johnnv1 feel free to sketch a proposal

@cjpurackal
Copy link
Member

After using the current api, I want to point out that the default representation of bboxes as quadrilaterals feel painful when it comes to reading bboxes from existing datasets. I propose the default format be either XYXY or XYWH. We could have an internal functionality to convert between the different formats. (perhaps represented as an Enum ?)

@shijianjian
Copy link
Member

Note that if we do not consider a rotation angle, some augmentations may fail.

@edgarriba
Copy link
Member Author

XYXYΘ / XYHWΘ?

@shijianjian
Copy link
Member

XYXYΘ / XYHWΘ?

Preferrably but the network of the user may not support it. So, if a user inputs XYXY, probably Rotation needs to be disabled or so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants