-
Notifications
You must be signed in to change notification settings - Fork 42
Add RFC for KerasCV Single Stage Object Detection. #26
Conversation
# preprocess image, gt_boxes, gt_labels, such as flip, resize, and padding, and reserve 0 for background label. | ||
return image, gt_boxes, gt_labels | ||
|
||
anchor_generator = keras_cv.ops.AnchorGenerator(anchor_sizes, scales, aspect_ratios, strides) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the different submodules in keras_cv
? Everything seems to live in ops
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we proposed 2 layers, 1 loss, and 4 ops. However I'd like your input on whether BoxCoder
and BoxMatcher
should be layer instead, given our example workflow.
raw_boxes, scores = loaded_model(batched_image, training=False) | ||
decoded_boxes = box_coder.decode(raw_boxes, anchor_boxes) | ||
classes, scores, boxes, _ = detection_generator(scores, decoded_boxes) | ||
return classes, scores, boxes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider returning a dictionary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
return image, gt_boxes, gt_labels | ||
|
||
anchor_generator = keras_cv.ops.AnchorGenerator(anchor_sizes, scales, aspect_ratios, strides) | ||
similarity_calculator = keras_cv.ops.IOUSimilarity() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"layers"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
mask_value: A float mask value to fill where `mask` is True. | ||
""" | ||
|
||
def __call__(self, groundtruth_boxes, anchors, mask=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
call
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work!
Just small suggestion on confusing line (at least for me).
Also a comment with respect to default values which may apply to other functions: shouldn't them be supplied?
Thanks for your work!
## Design overview | ||
|
||
This proposal includes the specific core components for building single-stage object detection models. It does not, | ||
include, however, include: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not,
include, however, include:
Those lines are confussing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
|
||
Arguments: | ||
alpha: The `alpha` weight factor for binary class imbalance. | ||
gamma: The `gamma` focusing parameter to re-weight loss. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should not include default values for alpha
an gamma
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good.
transformed_train_ds = train_ds.map(preprocess).map(encode_label).batch(128).shuffle(1024) | ||
transformed_eval_ds = eval_ds.map(preprocess).map(encode_label).batch(128) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we have a preprocessing layer for the image to make the model more encapsulated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Data augmentation is not part of this proposal. But yes, it is part of the roadmap.
```python | ||
import tensorflow as tf | ||
import tensorflow_datasets as tfds | ||
import keras_cv |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm use to using tf.keras
. What is keras_cv
?
Can't this stuff live within tf.keras.applications
?
Also, this seems related to my feature request here: tensorflow/tensorflow#42393
This is exciting!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'd probably have to briefly surf the scoping doc:
https://github.com/keras-team/governance/blob/master/rfcs/20200827-keras-cv-scoping-design.md
Essentially, keras_cv is for meeting more complicated modeling demands, such as detection and segmentation. It will directly rely on keras-application for backbone building blocks.
|
||
#### Training | ||
|
||
Case where a user want to train from scratch: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would there be the option to train from a pre-defined checkpoint? For example, start from
a model pre-trained on coco data set and then do transfer learning. Having those weights
from coco for example for the different backbones in tf.keras.applications
would be nice.
Much like we have the imagenet weights and there is the option to remove the head and have the pre-trained
feature extractor (as it is today for classification for example in tf.keras.applications
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! I forgot to mention this. Updated.
Update based on feedbacks.
No description provided.