v0.7.0 Image API, RT-DETR and Object Detection API, LightGlue Matcher, MobileSAM, new Sensors API, and many more
Highlights
Image API
In this release we have added a new Image API as a placeholder for a more generic multi-backend API. You can export and import images from files, NumPy arrays, and DLPack capsules.
>>> import torch
>>> from kornia.image import ChannelsOrder, ColorSpace, Image, ImageLayout, ImageSize, PixelFormat
>>> # from a torch.tensor
>>> data = torch.randint(0, 255, (3, 4, 5), dtype=torch.uint8) # CxHxW
>>> pixel_format = PixelFormat(
... color_space=ColorSpace.RGB,
... bit_depth=8,
... )
>>> layout = ImageLayout(
... image_size=ImageSize(4, 5),
... channels=3,
... channels_order=ChannelsOrder.CHANNELS_FIRST,
... )
>>> img = Image(data, pixel_format, layout)
>>> assert img.channels == 3
Object Detection API
We have added the ObjectDetector API, which uses the RT-DETR model by default. The detection pipeline is fully configurable: you supply a pre-processor, a model, and a post-processor. Example usage is shown below.
from io import BytesIO
import cv2
import numpy as np
import requests
import torch
from PIL import Image
import matplotlib.pyplot as plt
from kornia.contrib.models.rt_detr import RTDETR, DETRPostProcessor, RTDETRConfig
from kornia.contrib.object_detection import ObjectDetector, ResizePreProcessor
model_type = "hgnetv2_x" # also available: resnet18d, resnet34d, resnet50d, resnet101d, hgnetv2_l
checkpoint = f"https://github.com/kornia/kornia/releases/download/v0.7.0/rtdetr_{model_type}.ckpt"
config = RTDETRConfig(model_type, 80, checkpoint=checkpoint)
model = RTDETR.from_config(config).eval()
detector = ObjectDetector(model, ResizePreProcessor(640), DETRPostProcessor(0.3))
url = "https://github.com/kornia/data/raw/main/soccer.jpg"
img = Image.open(BytesIO(requests.get(url).content))
img = np.asarray(img, dtype=np.float32) / 255
img_pt = torch.from_numpy(img).permute(2, 0, 1)
detection = detector.predict([img_pt])
for cls_score_xywh in detection[0].numpy():
    class_id = int(cls_score_xywh[0])
    score = cls_score_xywh[1]
    x, y, w, h = cls_score_xywh[2:].round().astype(int)
    cv2.rectangle(img, (x, y, w, h), (255, 0, 0), 3)
    text = f"{class_id}, {score:.2f}"
    font = cv2.FONT_HERSHEY_SIMPLEX
    (text_width, text_height), _ = cv2.getTextSize(text, font, 1, 2)
    cv2.rectangle(img, (x, y - text_height, text_width, text_height), (255, 0, 0), cv2.FILLED)
    cv2.putText(img, text, (x, y), font, 1, (255, 255, 255), 2)
plt.imshow(img)
plt.show()
Deep Models
As part of the kornia.contrib module, we have started building a models module to host deep learning models for computer vision tasks (semantic segmentation, object detection, etc.).
From an abstract base class ModelBase, we will implement and make available these deep learning models (e.g. Segment Anything). Similarly, we provide standard structures for their results, such as SegmentationResults.
The idea is to abstract and standardize how these models behave with our high-level APIs, for example when interacting with the Visual Prompter backend (today Segment Anything is available).
ModelBase provides methods for loading checkpoints (load_checkpoint) and for compiling the model via the torch.compile API. We plan to extend it according to the needs of the community.
Within this release, we are also making other models available, such as RT_DETR and tiny_vit.
Example of using these abstractions to implement a model:
# Each model should be a submodule inside `kornia.contrib.models`, and the model class itself
# will be exposed under the `models` module.
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum

from kornia.contrib.models.base import ModelBase
from kornia.contrib.models.structures import SegmentationResults


class MyModelType(Enum):
    """Map the model types."""
    a = 0
    ...


@dataclass
class MyModelConfig:
    model_type: str | int | MyModelType | None = None
    checkpoint: str | None = None
    ...


class MyModel(ModelBase[MyModelConfig]):
    def __init__(...) -> None:
        ...

    @staticmethod
    def from_config(config: MyModelConfig) -> MyModel:
        """Build the model based on the config."""
        ...

    def forward(...) -> SegmentationResults:
        ...
RT-DETR
In most object detection models, non-maximum suppression (NMS) is needed to remove overlapping, near-duplicate bounding boxes. This post-processing step adds latency and prevents object detectors from reaching real-time speed. DETR is a class of detectors that eliminates the NMS step by using a transformer decoder to predict bounding boxes directly. RT-DETR improves on Deformable DETR to achieve real-time speed on server-class GPUs by using an efficient backbone. More details can be seen here
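As an illustration of the step RT-DETR removes, here is a minimal pure-Python sketch of classic greedy NMS (illustrative only, not kornia's implementation; boxes are assumed to be (x1, y1, x2, y2) corner pairs):

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # keep box i only if it does not overlap any already-kept box
        if all(iou(boxes[i], boxes[j]) < iou_threshold for j in keep):
            keep.append(i)
    return keep
```

The quadratic pairwise-IoU loop is part of why NMS hurts latency; DETR-style decoders output a fixed, deduplicated set of boxes instead.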
TinyViT
TinyViT is an efficient and high-performing transformer model for images. It achieves a top-1 accuracy of 84.8% on ImageNet-1k with only 21M parameters. See TinyViT for more information.
MobileSAM
MobileSAM replaces the heavy ViT-H backbone in the original SAM with TinyViT, which is more than 100 times smaller in terms of parameters and around 40 times faster in terms of inference speed. See MobileSAM for more details.
To use MobileSAM, simply specify "mobile_sam" in the SamConfig:
from kornia.contrib.visual_prompter import VisualPrompter
from kornia.contrib.models.sam import SamConfig
prompter = VisualPrompter(SamConfig("mobile_sam", pretrained=True))
LightGlue matcher
Added the LightGlue-based matcher to the kornia API. This is based on the original code from the paper “LightGlue: Local Feature Matching at Light Speed”. See [LSP23] for more details.
The LightGlue algorithm won a cash prize in the Image Matching Challenge 2023 @ CVPR23: https://www.kaggle.com/competitions/image-matching-challenge-2023/overview
See a working example integrating with COLMAP: #2469
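For intuition about what a matcher produces, here is a naive mutual-nearest-neighbor baseline in pure Python. It is vastly simpler than LightGlue and only illustrates the input/output contract: two sets of local descriptors in, index pairs out.

```python
def mutual_nearest_neighbors(desc_a, desc_b):
    """Match two lists of descriptor vectors by mutual nearest neighbor
    under squared Euclidean distance. Returns (i, j) index pairs where
    desc_a[i] and desc_b[j] are each other's closest descriptor."""
    def dist(u, v):
        return sum((x - y) ** 2 for x, y in zip(u, v))

    # nearest neighbor in B for each descriptor in A, and vice versa
    nn_ab = [min(range(len(desc_b)), key=lambda j: dist(a, desc_b[j])) for a in desc_a]
    nn_ba = [min(range(len(desc_a)), key=lambda i: dist(desc_a[i], b)) for b in desc_b]
    # keep only mutual agreements
    return [(i, j) for i, j in enumerate(nn_ab) if nn_ba[j] == i]
```

LightGlue replaces this brute-force distance test with a learned transformer that also prunes unmatchable points, which is where its speed and accuracy come from.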
New Sensors API
New kornia.sensors module to interface with sensors such as camera, IMU, and GNSS devices.
For now, we have added CameraModel, PinholeModel, and CameraModelBase.
Usage example:
Define a CameraModel:
>>> import torch
>>> from kornia.image import ImageSize
>>> from kornia.sensors.camera import CameraModel, CameraModelType
>>> # Pinhole Camera Model
>>> cam = CameraModel(ImageSize(480, 640), CameraModelType.PINHOLE, torch.Tensor([328., 328., 320., 240.]))
>>> # Brown Conrady Camera Model
>>> cam = CameraModel(ImageSize(480, 640), CameraModelType.BROWN_CONRADY, torch.Tensor([1.0, 1.0, 1.0, 1.0,
... 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]))
>>> # Kannala Brandt K3 Camera Model
>>> cam = CameraModel(ImageSize(480, 640), CameraModelType.KANNALA_BRANDT_K3, torch.Tensor([1.0, 1.0, 1.0,
... 1.0, 1.0, 1.0, 1.0, 1.0]))
>>> # Orthographic Camera Model
>>> cam = CameraModel(ImageSize(480, 640), CameraModelType.ORTHOGRAPHIC, torch.Tensor([328., 328., 320., 240.]))
>>> cam.params
tensor([328., 328., 320., 240.])
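For reference, the four PINHOLE parameters above are the intrinsics (fx, fy, cx, cy). The projection math they encode can be sketched in plain Python (a conceptual sketch, not the kornia API):

```python
def project_pinhole(point_3d, params):
    """Project a 3D point (x, y, z) in the camera frame to pixel
    coordinates (u, v) using pinhole intrinsics (fx, fy, cx, cy):
    u = fx * x / z + cx,  v = fy * y / z + cy."""
    fx, fy, cx, cy = params
    x, y, z = point_3d
    return (fx * x / z + cx, fy * y / z + cy)
```

With the intrinsics shown above (fx = fy = 328, cx = 320, cy = 240 for a 640x480 image), a point on the optical axis projects to the principal point (320, 240).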
Added kornia.geometry.solvers submodule
New module for geometric vision solvers. This is part of an upgrade of find_fundamental to support the 7POINT algorithm.
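The 7-point fundamental matrix algorithm boils down to finding the real roots of a cubic polynomial. A bare-bones sketch of one way such a polynomial solver can work (bracket expansion plus bisection; illustrative only, not kornia's solver):

```python
def cubic_real_root(a, b, c, d):
    """Find one real root of a*x**3 + b*x**2 + c*x + d = 0 (a != 0).

    Every cubic with real coefficients has at least one real root, so a
    sign-changing bracket always exists; expand until we find one, then
    bisect to convergence."""
    f = lambda x: ((a * x + b) * x + c) * x + d  # Horner evaluation
    lo, hi = -1.0, 1.0
    while f(lo) * f(hi) > 0:  # expand the bracket until signs differ
        lo *= 2.0
        hi *= 2.0
    for _ in range(100):  # bisection
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

Production solvers use closed-form or companion-matrix methods and return all real roots, but the contract is the same: polynomial coefficients in, roots out.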
Image terminal printing
Added kornia.utils.print_image API for printing a given image tensor or image file path to the terminal.
>>> kornia.utils.print_image("panda.jpg")
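The underlying idea of terminal image printing can be sketched in a few lines: each pixel becomes a 24-bit ANSI truecolor escape code wrapped around a block character (a conceptual sketch, not kornia's implementation):

```python
def pixel_to_ansi(r, g, b, char="\u2588"):
    """Render one RGB pixel as a truecolor ANSI block character."""
    return f"\x1b[38;2;{r};{g};{b}m{char}\x1b[0m"


def render_row(pixels):
    """Render a row of (r, g, b) pixels as one printable string."""
    return "".join(pixel_to_ansi(*p) for p in pixels)
```

Printing one such string per image row on a terminal with truecolor support yields a coarse preview of the image.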
What's Changed
- fix skipped tests for cuda TestColorJiggleGen by @johnnv1 in #2341
- remove inplace operation by @Parskatt in #2346
- Replace bandit, flake8, isort, pyupgrade, and yesqa with ruff by @cclauss in #2292
- fix unused import on geometry.conversions by @johnnv1 in #2357
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2345
- Visual prompter by @edgarriba in #2356
- Integrate vector to liegroups by @cjpurackal in #2344
- add from_wxyz and from_qxyz to So3 and Se3 by @cjpurackal in #2359
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2360
- RandomGrayscale: add rgb_weights parameter by @adamjstewart in #2361
- add support for tuple of types on KORNIA_CHECK_TYPE + bug fix by @cjpurackal in #2353
- Remove examples in favor of kornia/tutorials repo by @johnnv1 in #2366
- add tutorials testing ci by @johnnv1 in #2367
- Update version to 0.6.13-dev by @johnnv1 in #2368
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2370
- Optional raise in kornia check functions by @alex-jw-brooks in #2375
- remove unused circle ci by @johnnv1 in #2378
- Bump accelerate from 0.18.0 to 0.19.0 by @dependabot in #2381
- Replace miniconda for setup-python@v4 on env setup CI by @johnnv1 in #2380
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2382
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2388
- Use separable filter for SSIM calculation to speed up. by @dodobyte in #2383
- Bump pytest-cov from 4 to 4.1.0 by @dependabot in #2392
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2394
- Batched quaternion to matrix by @jatentaki in #2395
- Batched draw_lines by @cjpurackal in #2373
- Fix AugmentationSequential support for RandomResizedCrop by @adamjstewart in #2398
- drop python 3.7 support by @johnnv1 in #2400
- Bump accelerate from 0.19.0 to 0.20.3 by @dependabot in #2402
- Bump pytest from 7.3.1 to 7.3.2 by @dependabot in #2401
- Makefile: Drop Python code formatter yapf in favor of psf/black by @cclauss in #2404
- update the kornia governance page by @edgarriba in #2403
- Update readthedocs.yml with python 3.8 by @edgarriba in #2406
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2408
- Add collector job on CI PR's, reusable typing CI and remove unused CI by @johnnv1 in #2379
- remove old artefacts in visual_prompting.rst by @edgarriba in #2409
- use reshape instead view to convert matrix from quaternion by @edgarriba in #2413
- Revamped camera API(introduce kornia.sensors) by @cjpurackal in #2349
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2415
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2418
- Bump pytest from 7.3.2 to 7.4.0 by @dependabot in #2416
- Fix PLR0124 by @ashnair1 in #2422
- Add missing docs for warp_grid by @edgarriba in #2423
- Add GPU support apple silicon augmentation 2D by @NripeshN in #2425
- Add mps to testing init and fixing test cases by @NripeshN in #2428
- drop JIT support from geometry.conversions by @johnnv1 in #2424
- Update rst(docs) files to support MPS by @NripeshN in #2430
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2429
- fix engine arg missing in opening and closing by @wintertee in #2431
- Added terminal printing by @shijianjian in #2407
- add support for quaternion in Se3 instantiation by @cjpurackal in #2433
- fix missing assertion by @qingpeng9802 in #2435
- Add RT-DETR and ObjectDetection API by @gau-nernst in #2363
- Update ruff configs by @johnnv1 in #2358
- Use ruff to discover and limit code complexity by @cclauss in #2442
- Apply some ruff pytest fixes by @cclauss in #2444
- Remove deprecated code from kornia.geometry.conversion by @pri1311 in #2437
- Python linting: Add more ruff rules by @cclauss in #2441
- fix the alpha of focal loss by @qingpeng9802 in #2393
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2448
- Enable disallow_untyped_defs on mypy by @johnnv1 in #2252
- Adding LightGlue by @ducha-aiki in #2449
- Add MobileSAM by @gau-nernst in #2446
- Make TinyViT available as a standalone image classifier by @gau-nernst in #2455
- Add doc for MobileSAM by @gau-nernst in #2458
- Add 7 point algorithm by @anandhupvr in #2390
- Add pre-trained flag for SAM by @gau-nernst in #2456
- Add Image API by @edgarriba in #1562
- Fix auto augmentation transformation matrix by @shijianjian in #2355
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2462
- Fix some docs by @qingpeng9802 in #2461
- fix aug sharpness duplicated typing by @johnnv1 in #2468
- axis_angle typo fix by @cjpurackal in #2463
- solvers: Separating general functionalities (polynomial solver) into solvers submodule by @anandhupvr in #2465
- kornia.sensors docs update by @cjpurackal in #2477
- Fix KORNIA_CHECK_SAME_DEVICE cuda test by @johnnv1 in #2479
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2472
- Bump accelerate from 0.20.3 to 0.21.0 by @dependabot in #2459
- [feat] from_matrix for Se3 and Se2 by @cjpurackal in #2473
- Add draw point2d by @alex-jw-brooks in #2387
- [CI] bump pytorch to 2.0.1 by @johnnv1 in #2369
- include print image in the Image API by @edgarriba in #2481
- Fix typo SAM docs by @johnnv1 in #2483
- kornia.geometry.solvers (polynomial_solvers) docs by @anandhupvr in #2484
- fix image print docs by @edgarriba in #2488
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2491
- Add missing docs for new PinholeModel camera model by @edgarriba in #2492
- bump version 0.7.0 by @edgarriba in #2476
New Contributors
- @alex-jw-brooks made their first contribution in #2375
- @dodobyte made their first contribution in #2383
- @NripeshN made their first contribution in #2425
- @wintertee made their first contribution in #2431
- @qingpeng9802 made their first contribution in #2435
- @anandhupvr made their first contribution in #2390
Full Changelog: v0.6.12...v0.7.0