
About the training loop #28

Closed
murdockhou opened this issue Oct 22, 2019 · 8 comments
Labels
question Further information is requested

Comments

@murdockhou

Hi, sorry to bother you again!

I'm a little confused about the training data code here. I see that you return the ori_img together with the corresponding keypoint annotations (and one image may contain annotations for several people). Since HRNet runs as a single-person pose network, how can you train the network on the original image instead of an image cropped from the human bounding box annotation? Is that crop done somewhere?

To be more specific, I think that to train HRNet (a single-person pose network) on the MS COCO dataset we need to crop one or more single-person images from each original image, because an image may contain multiple person annotations, just like you do in live-demo.py.

So, could you tell me whether my understanding is right? To be honest, I'm also confused about how to train a single-person pose network using the MS COCO dataset.

Thanks a lot.

@stefanopini
Owner

Hi!

Maybe I'm missing the point of your question.
HRNet is a single-person HPE method, therefore the input images should contain only one person (which is what happens both during training and testing).
MS COCO provides both the keypoint annotations and the person bounding boxes, so it is possible to create a different sample (with a different image crop) for each person using the keypoint and bounding box annotations.
Each annotation is added to a list in these lines:

self.data = []
# load annotations for each image of COCO
for imgId in tqdm(self.imgIds):
    ann_ids = self.coco.getAnnIds(imgIds=imgId, iscrowd=False)
    img = self.coco.loadImgs(imgId)[0]

    if self.use_gt_bboxes:
        objs = self.coco.loadAnns(ann_ids)

        # sanitize bboxes
        valid_objs = []
        for obj in objs:
            # Skip non-person objects (it should never happen)
            if obj['category_id'] != 1:
                continue

            # ignore objs without keypoints annotation
            if max(obj['keypoints']) == 0:
                continue

            x, y, w, h = obj['bbox']
            x1 = np.max((0, x))
            y1 = np.max((0, y))
            x2 = np.min((img['width'] - 1, x1 + np.max((0, w - 1))))
            y2 = np.min((img['height'] - 1, y1 + np.max((0, h - 1))))

            # Use only valid bounding boxes
            if obj['area'] > 0 and x2 >= x1 and y2 >= y1:
                obj['clean_bbox'] = [x1, y1, x2 - x1, y2 - y1]
                valid_objs.append(obj)

        objs = valid_objs

    else:
        objs = bboxes[imgId]

    # for each annotation of this image, add the formatted annotation to self.data
    for obj in objs:
        joints = np.zeros((self.nof_joints, 2), dtype=np.float)
        joints_visibility = np.ones((self.nof_joints, 2), dtype=np.float)

        if self.use_gt_bboxes:
            # COCO pre-processing
            # # Moved above
            # # Skip non-person objects (it should never happen)
            # if obj['category_id'] != 1:
            #     continue
            #
            # # ignore objs without keypoints annotation
            # if max(obj['keypoints']) == 0:
            #     continue

            for pt in range(self.nof_joints):
                joints[pt, 0] = obj['keypoints'][pt * 3 + 0]
                joints[pt, 1] = obj['keypoints'][pt * 3 + 1]
                t_vis = int(np.clip(obj['keypoints'][pt * 3 + 2], 0, 1))  # ToDo check correctness
                # COCO:
                #   if visibility == 0 -> keypoint is not in the image.
                #   if visibility == 1 -> keypoint is in the image BUT not visible (e.g. behind an object).
                #   if visibility == 2 -> keypoint looks clearly (i.e. it is not hidden).
                joints_visibility[pt, 0] = t_vis
                joints_visibility[pt, 1] = t_vis

        center, scale = self._box2cs(obj['clean_bbox'][:4])

        self.data.append({
            'imgId': imgId,
            'annId': obj['id'],
            'imgPath': os.path.join(self.root_path, self.data_version, '%012d.jpg' % imgId),
            'center': center,
            'scale': scale,
            'joints': joints,
            'joints_visibility': joints_visibility,
        })
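
Note that this loop adds one entry to self.data for each person annotation, so an image containing several annotated people yields several single-person samples.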

Then each image is cropped (to extract a specific person) and rescaled with an affine warping in:
trans = get_affine_transform(c, s, self.pixel_std, r, self.image_size)
image = cv2.warpAffine(
    image,
    trans,
    (int(self.image_size[0]), int(self.image_size[1])),
    flags=cv2.INTER_LINEAR
)
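
For reference, here is a minimal self-contained sketch of this crop-and-rescale idea (not the repository code; the function name and the 288x384 output size are just illustrative assumptions):

import cv2
import numpy as np

def crop_person_with_affine(image, bbox, out_size=(288, 384)):
    # bbox is (x, y, w, h); out_size is (width, height) of the network input.
    x, y, w, h = bbox
    out_w, out_h = out_size

    # Map three corners of the box onto three corners of the output image;
    # cv2.getAffineTransform needs exactly three point correspondences.
    src = np.float32([[x, y], [x + w, y], [x, y + h]])
    dst = np.float32([[0, 0], [out_w - 1, 0], [0, out_h - 1]])
    trans = cv2.getAffineTransform(src, dst)

    # A single warp both crops the person area and rescales it to the input size.
    return cv2.warpAffine(image, trans, (out_w, out_h), flags=cv2.INTER_LINEAR)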

Does this answer your question?

@stefanopini added the question label Oct 23, 2019
@murdockhou
Author

murdockhou commented Oct 24, 2019 via email

@stefanopini
Owner

Yes, but the images are too many to be stored in RAM (on ordinary machines), so you have to load them from disk and, since the samples are shuffled during training, you have to re-load the same image at different steps of each epoch.
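
A minimal sketch of what this means in practice (the class and field names are assumptions for illustration, not the repository code): the dataset keeps only the lightweight annotation dicts in memory and reads each image from disk every time one of its samples is drawn.

import cv2
from torch.utils.data import Dataset, DataLoader

class LazyCocoPersonDataset(Dataset):
    def __init__(self, samples):
        # samples: a list with one dict per person annotation
        # ('imgPath', 'center', 'scale', 'joints', 'joints_visibility', ...).
        self.samples = samples

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        sample = self.samples[idx]
        # The image is read from disk here, every time the sample is drawn;
        # keeping the whole decoded dataset in RAM is usually not feasible.
        image = cv2.imread(sample['imgPath'], cv2.IMREAD_COLOR)
        return image, sample['joints']

# With shuffle=True the same image may be loaded again at different steps of
# the same epoch, once for each person annotation it contains (cropping and
# resizing, omitted here, would be needed before batching).
# loader = DataLoader(LazyCocoPersonDataset(samples), batch_size=16, shuffle=True)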

@murdockhou
Author

@stefanopini Sorry to bother you again. I see that, when creating the dataset, you use the get_affine_transform and cv2.warpAffine functions to extract the single-person area (dataset/COCO.py, line 293). I'm a little confused: why don't you use a crop function to crop the person area directly from ori_img? Is there much difference between these two approaches?

@stefanopini
Owner

Hi @murdockhou !

The difference is that with warpAffine you can apply affine transformations instead of just cropping the person area.
This is not useful during evaluation/testing, but it is used during training for data augmentation.
If you look at the previous lines of the file (L258-L296), you can see that when self.is_train is False the parameters passed to get_affine_transform simply crop the image, while when self.is_train is True they are modified to change the scale and to rotate and flip the person area for data augmentation (see the sketch below).
I hope it is clearer now.

Btw, I've adapted this code from the original implementation and some details are still unclear to me.
In particular, I don't know the meaning of the parameter pixel_std (see line 109).
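
To illustrate the crop-vs-augmentation point above, here is a rough sketch (assumed parameter ranges and function names, not the repository code) of how the same three-point affine warp can either just crop the box (evaluation) or also apply scale jitter, rotation and a horizontal flip (training-time augmentation):

import random
import cv2
import numpy as np

def person_affine(image, bbox, out_size=(288, 384), is_train=False):
    x, y, w, h = bbox
    out_w, out_h = out_size
    cx, cy = x + w / 2.0, y + h / 2.0   # box center

    scale, rot_deg, flip = 1.0, 0.0, False
    if is_train:
        scale = 1.0 + random.uniform(-0.35, 0.35)                         # scale jitter
        rot_deg = random.uniform(-45, 45) if random.random() < 0.6 else 0.0
        flip = random.random() < 0.5

    # Three corners of the (possibly enlarged) box, rotated around its center.
    half_w, half_h = w * scale / 2.0, h * scale / 2.0
    corners = np.float32([[-half_w, -half_h], [half_w, -half_h], [-half_w, half_h]])
    rot = np.deg2rad(rot_deg)
    rot_mat = np.float32([[np.cos(rot), -np.sin(rot)], [np.sin(rot), np.cos(rot)]])
    src = corners @ rot_mat.T + np.float32([cx, cy])

    dst = np.float32([[0, 0], [out_w - 1, 0], [0, out_h - 1]])
    if flip:
        # Mirror horizontally by swapping the left/right destination corners.
        # (In real keypoint training the left/right joint labels must be swapped too.)
        dst = np.float32([[out_w - 1, 0], [0, 0], [out_w - 1, out_h - 1]])

    trans = cv2.getAffineTransform(src, dst)
    return cv2.warpAffine(image, trans, (out_w, out_h), flags=cv2.INTER_LINEAR)

# With is_train=False this reduces to a plain crop + resize of the box, which is
# why a simple crop would be enough for inference but not for training.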

@murdockhou
Author

murdockhou commented Apr 13, 2020 via email

@valentin-fngr

Thank you both for clarifying my understanding.
One question: @stefanopini mentioned that we can only detect a single person per image.
When you say 'image', do you mean the cropped and rescaled bounding box area of the image?

@stefanopini
Owner

Hi @valentin-fngr , that's correct.
The HRNet model is designed as a top-down approach: person detection first (with almost any detector), then human pose estimation on the single person bounding box area with HRNet.
In contrast, HigherHRNet is a bottom-up multi-person approach.
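
As a tiny hypothetical sketch of the top-down flow (the person_detector, crop_fn and hrnet callables are assumptions, not this repository's API):

def top_down_pose_estimation(image, person_detector, crop_fn, hrnet):
    poses = []
    # 1. Person detection on the full image (almost any detector works).
    for bbox in person_detector(image):
        # 2. Crop and rescale the single-person area to the HRNet input size.
        person_crop = crop_fn(image, bbox)
        # 3. Single-person pose estimation on that crop only.
        poses.append(hrnet(person_crop))
    return poses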
