
Confusion about input and label #12

Closed
xiaoxiaoSummer opened this issue Sep 11, 2018 · 4 comments
Labels
good first issue Good for newcomers

Comments

@xiaoxiaoSummer

The dataloader for the COCO dataset shows how the data are used in network training. But when I check the dataloader function, the weights and output are actually built the same way (dataloader.py, lines 43-64 and lines 75-95). The input is the weights variable from the gendata function, so why does the input come from the known keypoint positions rather than the actual image data? That is different from what you describe in the paper. Doesn't it mean you are using the known labels to predict the known labels? Does that make sense? If I have misunderstood the code, please let me know.

    for j in range(17):
        if kpv[j] > 0:
            x0 = int((kpx[j] - x) * x_scale)
            y0 = int((kpy[j] - y) * y_scale)

            if x0 >= self.bbox_width and y0 >= self.bbox_height:
                output[self.bbox_height - 1, self.bbox_width - 1, j] = 1
            elif x0 >= self.bbox_width:
                output[y0, self.bbox_width - 1, j] = 1
            elif y0 >= self.bbox_height:
                try:
                    output[self.bbox_height - 1, x0, j] = 1
                except:
                    output[self.bbox_height - 1, 0, j] = 1
            elif x0 < 0 and y0 < 0:
                output[0, 0, j] = 1
            elif x0 < 0:
                output[y0, 0, j] = 1
            elif y0 < 0:
                output[0, x0, j] = 1
            else:
                output[y0, x0, j] = 1

    img_id = ann_data['image_id']
    img_data = coco.loadImgs(img_id)[0]
    ann_data = coco.loadAnns(coco.getAnnIds(img_data['id']))

    for ann in ann_data:
        kpx = ann['keypoints'][0::3]
        kpy = ann['keypoints'][1::3]
        kpv = ann['keypoints'][2::3]

        for j in range(17):
            if kpv[j] > 0:
                if (kpx[j] > bbox[0] - bbox[2] * self.threshold and kpx[j] < bbox[0] + bbox[2] * (1 + self.threshold)):
                    if (kpy[j] > bbox[1] - bbox[3] * self.threshold and kpy[j] < bbox[1] + bbox[3] * (1 + self.threshold)):
                        x0 = int((kpx[j] - x) * x_scale)
                        y0 = int((kpy[j] - y) * y_scale)

                        if x0 >= self.bbox_width and y0 >= self.bbox_height:
                            weights[self.bbox_height - 1, self.bbox_width - 1, j] = 1
                        elif x0 >= self.bbox_width:
                            weights[y0, self.bbox_width - 1, j] = 1
                        elif y0 >= self.bbox_height:
                            weights[self.bbox_height - 1, x0, j] = 1
                        elif x0 < 0 and y0 < 0:
                            weights[0, 0, j] = 1
                        elif x0 < 0:
                            weights[y0, 0, j] = 1
                        elif y0 < 0:
                            weights[0, x0, j] = 1
                        else:
                            weights[y0, x0, j] = 1

    for t in range(17):
        weights[:, :, t] = gaussian(weights[:, :, t])
    output = gaussian(output, sigma=2, mode='constant', multichannel=True)
    # weights = gaussian_multi_input_mp(weights)
    # output = gaussian_multi_output(output)
    return weights, output
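The branchy boundary handling in the snippet above can be collapsed with np.clip; a minimal sketch (hypothetical helper, not the repo's code):

```python
import numpy as np

def place_peak(heatmap, x0, y0, j):
    """Clamp (x0, y0) into the heatmap grid and set a unit peak on channel j."""
    h, w, _ = heatmap.shape
    y = int(np.clip(y0, 0, h - 1))
    x = int(np.clip(x0, 0, w - 1))
    heatmap[y, x, j] = 1.0

hm = np.zeros((4, 6, 17), dtype=np.float32)
place_peak(hm, -3, 10, 0)          # x < 0 and y >= height -> clamped to (3, 0)
print(np.argwhere(hm[:, :, 0]))    # -> [[3 0]]
```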
@abeardear

Input (weights) and output are not the same. The output considers only one person, so every channel has at most one peak. The input considers the overlap of several people, so each channel may have several peaks.
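A minimal sketch of that distinction (hypothetical keypoints and sizes; peak placement only, without the Gaussian blurring the loader applies afterwards):

```python
import numpy as np

H, W, K = 28, 18, 17  # hypothetical crop size; 17 COCO keypoints

def make_peak_maps(people_keypoints):
    """One binary peak per visible keypoint; the real loader then blurs these."""
    hm = np.zeros((H, W, K), dtype=np.float32)
    for kps in people_keypoints:              # one (x, y, v) list per person
        for j, (x, y, v) in enumerate(kps):
            if v > 0:
                hm[int(y), int(x), j] = 1.0
    return hm

# Hypothetical box containing two people; person 0 "owns" the box.
people = [[(5, 5, 2)] + [(0, 0, 0)] * 16,
          [(12, 20, 2)] + [(0, 0, 0)] * 16]

weights = make_peak_maps(people)       # input: peaks for everyone in the box
output  = make_peak_maps(people[:1])   # label: peaks for the box owner only

print(int(weights[:, :, 0].sum()), int(output[:, :, 0].sum()))  # -> 2 1
```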

@salihkaragoz
Owner

@xiongzihua Thank you for the clarification.

@xiaoxiaoSummer In addition to what is mentioned above;

"why the input is coming from the known keypoints position information rather than the true image data?"

This repo includes only the last part of MultiPoseNet, in other words, the Pose Residual Network. For more info, have a look at the #4 comment.
We trained the PRN on the COCO ground-truth data.

A sample of input-output pairs:

The input has 17 channels; each channel represents one keypoint. The background image is overlaid for easier understanding. The first channel (nose) has three different Gaussian peaks because there are three human noses inside the bounding box.
[input heatmaps image]

The output also has 17 channels, the same as the input. In the example there is only one Gaussian peak in the first channel, and it belongs to the bounding-box owner.
[output heatmaps image]

Feel free to ask further questions; otherwise, please close this issue.
Hope this helps.
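To make "several Gaussian peaks per channel" concrete, here is a small numpy sketch that renders a nose channel for three hypothetical people (positions and sizes made up, not taken from the repo):

```python
import numpy as np

H, W = 56, 36  # hypothetical crop size

def gaussian_peak(cy, cx, sigma=2.0):
    """Render one 2-D Gaussian bump centred at (cy, cx) on an H x W grid."""
    ys, xs = np.mgrid[0:H, 0:W]
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))

# Input's first channel (nose): three people in the box -> three peaks.
noses = [(10, 8), (25, 18), (40, 30)]
nose_channel = sum(gaussian_peak(y, x) for y, x in noses)

# Output's first channel would keep only the box owner's nose.
owner_channel = gaussian_peak(*noses[0])

top = np.unravel_index(nose_channel.argmax(), nose_channel.shape)
print(tuple(int(i) for i in top))  # -> (10, 8)
```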

@salihkaragoz salihkaragoz added the good first issue Good for newcomers label Sep 11, 2018
@xiaoxiaoSummer
Author

@salihkaragoz So this network only plays the role of matching one person's pose out of all the pose components produced by the previous stages, i.e. keypoint detection and human detection?

@salihkaragoz
Owner

Yes, exactly. The PRN takes the bounding-box results and the bottom-up keypoint estimator results as input. Right now, it works with COCO ground-truth data.

Theoretically, it can work with any bottom-up estimator output plus bounding-box results, but it doesn't work with the current repo configuration yet; we need to adapt it.
Best
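A rough numpy sketch of the PRN idea described above: a residual MLP over the flattened 17-channel heatmaps, with a softmax so the probability mass concentrates on the box owner's keypoints (random untrained weights and hypothetical sizes; not the repo's model):

```python
import numpy as np

H, W, K = 28, 18, 17           # hypothetical PRN input resolution
D = H * W * K
rng = np.random.default_rng(0)

# Random, untrained weights standing in for a trained PRN.
W1 = rng.normal(0.0, 0.01, (D, 64))
W2 = rng.normal(0.0, 0.01, (64, D))

def prn_forward(x):
    """Residual MLP over flattened heatmaps, softmax over all H*W*K bins."""
    flat = x.reshape(-1, D)
    hidden = np.maximum(flat @ W1, 0.0)          # ReLU
    out = flat + hidden @ W2                     # residual connection
    e = np.exp(out - out.max(axis=1, keepdims=True))
    probs = e / e.sum(axis=1, keepdims=True)
    return probs.reshape(-1, H, W, K)

y = prn_forward(rng.random((2, H, W, K)))
print(y.shape)                                   # -> (2, 28, 18, 17)
```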
