
Confusion about input and label #12

Closed
xiaoxiaoSummer opened this issue Sep 11, 2018 · 4 comments
Labels
good first issue Good for newcomers

Comments

@xiaoxiaoSummer

The dataloader for the COCO dataset shows how the data are used in network training. But when I check the dataloader function, the weights and output are actually built the same way (dataloader.py, lines 43-64 and lines 75-95). The input is the weights variable from the gendata function, so why does the input come from the known keypoint positions rather than the actual image data? That is different from what you describe in the paper. Doesn't it mean you are using the known labels to predict the known labels? Does that make sense? If I have misunderstood the code, please let me know.

    for j in range(17):
        if kpv[j] > 0:
            x0 = int((kpx[j] - x) * x_scale)
            y0 = int((kpy[j] - y) * y_scale)

            if x0 >= self.bbox_width and y0 >= self.bbox_height:
                output[self.bbox_height - 1, self.bbox_width - 1, j] = 1
            elif x0 >= self.bbox_width:
                output[y0, self.bbox_width - 1, j] = 1
            elif y0 >= self.bbox_height:
                try:
                    output[self.bbox_height - 1, x0, j] = 1
                except:
                    output[self.bbox_height - 1, 0, j] = 1
            elif x0 < 0 and y0 < 0:
                output[0, 0, j] = 1
            elif x0 < 0:
                output[y0, 0, j] = 1
            elif y0 < 0:
                output[0, x0, j] = 1
            else:
                output[y0, x0, j] = 1

    img_id = ann_data['image_id']
    img_data = coco.loadImgs(img_id)[0]
    ann_data = coco.loadAnns(coco.getAnnIds(img_data['id']))

    for ann in ann_data:
        kpx = ann['keypoints'][0::3]
        kpy = ann['keypoints'][1::3]
        kpv = ann['keypoints'][2::3]

        for j in range(17):
            if kpv[j] > 0:
                if (kpx[j] > bbox[0] - bbox[2] * self.threshold and kpx[j] < bbox[0] + bbox[2] * (1 + self.threshold)):
                    if (kpy[j] > bbox[1] - bbox[3] * self.threshold and kpy[j] < bbox[1] + bbox[3] * (1 + self.threshold)):
                        x0 = int((kpx[j] - x) * x_scale)
                        y0 = int((kpy[j] - y) * y_scale)

                        if x0 >= self.bbox_width and y0 >= self.bbox_height:
                            weights[self.bbox_height - 1, self.bbox_width - 1, j] = 1
                        elif x0 >= self.bbox_width:
                            weights[y0, self.bbox_width - 1, j] = 1
                        elif y0 >= self.bbox_height:
                            weights[self.bbox_height - 1, x0, j] = 1
                        elif x0 < 0 and y0 < 0:
                            weights[0, 0, j] = 1
                        elif x0 < 0:
                            weights[y0, 0, j] = 1
                        elif y0 < 0:
                            weights[0, x0, j] = 1
                        else:
                            weights[y0, x0, j] = 1

    for t in range(17):
        weights[:, :, t] = gaussian(weights[:, :, t])
    output = gaussian(output, sigma=2, mode='constant', multichannel=True)
    # weights = gaussian_multi_input_mp(weights)
    # output = gaussian_multi_output(output)
    return weights, output
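The branchy boundary handling in the snippet above can be collapsed with np.clip; a minimal sketch (hypothetical helper, not the repo's code):

```python
import numpy as np

def place_peak(heatmap, x0, y0, j):
    """Clamp (x0, y0) into the heatmap grid and set a unit peak on channel j."""
    h, w, _ = heatmap.shape
    y = int(np.clip(y0, 0, h - 1))
    x = int(np.clip(x0, 0, w - 1))
    heatmap[y, x, j] = 1.0

hm = np.zeros((4, 6, 17), dtype=np.float32)
place_peak(hm, -3, 10, 0)          # x < 0 and y >= height -> clamped to (3, 0)
print(np.argwhere(hm[:, :, 0]))    # -> [[3 0]]
```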
@abeardear

Input (weights) and output are not the same. The output considers only one person, so every channel has at most one peak. The input considers the overlap of several people, so each channel may have several peaks.
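A minimal sketch of that distinction (hypothetical keypoints and sizes; peak placement only, without the Gaussian blurring the loader applies afterwards):

```python
import numpy as np

H, W, K = 28, 18, 17  # hypothetical crop size; 17 COCO keypoints

def make_peak_maps(people_keypoints):
    """One binary peak per visible keypoint; the real loader then blurs these."""
    hm = np.zeros((H, W, K), dtype=np.float32)
    for kps in people_keypoints:              # one (x, y, v) list per person
        for j, (x, y, v) in enumerate(kps):
            if v > 0:
                hm[int(y), int(x), j] = 1.0
    return hm

# Hypothetical box containing two people; person 0 "owns" the box.
people = [[(5, 5, 2)] + [(0, 0, 0)] * 16,
          [(12, 20, 2)] + [(0, 0, 0)] * 16]

weights = make_peak_maps(people)       # input: peaks for everyone in the box
output  = make_peak_maps(people[:1])   # label: peaks for the box owner only

print(int(weights[:, :, 0].sum()), int(output[:, :, 0].sum()))  # -> 2 1
```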

@salihkaragoz
Owner

@xiongzihua Thank you for the clarification.

@xiaoxiaoSummer In addition to what is mentioned above;

"why the input is coming from the known keypoints position information rather than the true image data?"

This repo includes only the last part of MultiPoseNet, in other words, the Pose Residual Network. For more info, have a look at the #4 comment.
We trained the PRN on the COCO ground-truth data.

A sample of input-output pairs:

The input has 17 channels; each channel represents one keypoint. The background image is overlaid for easier understanding. The first channel (nose) has three different Gaussian peaks because there are three human noses inside the bounding box.
[input heatmaps image]

The output also has 17 channels, the same as the input. In the example there is only one Gaussian peak in the first channel, and it belongs to the bounding-box owner.
[output heatmaps image]

Feel free to ask further questions; otherwise, please close this issue.
Hope this helps.
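To make "several Gaussian peaks per channel" concrete, here is a small numpy sketch that renders a nose channel for three hypothetical people (positions and sizes made up, not taken from the repo):

```python
import numpy as np

H, W = 56, 36  # hypothetical crop size

def gaussian_peak(cy, cx, sigma=2.0):
    """Render one 2-D Gaussian bump centred at (cy, cx) on an H x W grid."""
    ys, xs = np.mgrid[0:H, 0:W]
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))

# Input's first channel (nose): three people in the box -> three peaks.
noses = [(10, 8), (25, 18), (40, 30)]
nose_channel = sum(gaussian_peak(y, x) for y, x in noses)

# Output's first channel would keep only the box owner's nose.
owner_channel = gaussian_peak(*noses[0])

top = np.unravel_index(nose_channel.argmax(), nose_channel.shape)
print(tuple(int(i) for i in top))  # -> (10, 8)
```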

@salihkaragoz salihkaragoz added the good first issue Good for newcomers label Sep 11, 2018
@xiaoxiaoSummer
Author

@salihkaragoz So this network only plays the role of matching one person's pose out of all the pose components produced by the previous stages, i.e. keypoint detection and human detection?

@salihkaragoz
Owner

Yes, exactly. The PRN takes the bounding-box results and the bottom-up keypoint estimator results as input. Right now, it works with COCO ground-truth data.

Theoretically, it can work with any bottom-up estimator output plus bounding-box results, but it doesn't work with the current repo configuration yet; we need to adapt it.
Best
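A rough numpy sketch of the PRN idea described above: a residual MLP over the flattened 17-channel heatmaps, with a softmax so the probability mass concentrates on the box owner's keypoints (random untrained weights and hypothetical sizes; not the repo's model):

```python
import numpy as np

H, W, K = 28, 18, 17           # hypothetical PRN input resolution
D = H * W * K
rng = np.random.default_rng(0)

# Random, untrained weights standing in for a trained PRN.
W1 = rng.normal(0.0, 0.01, (D, 64))
W2 = rng.normal(0.0, 0.01, (64, D))

def prn_forward(x):
    """Residual MLP over flattened heatmaps, softmax over all H*W*K bins."""
    flat = x.reshape(-1, D)
    hidden = np.maximum(flat @ W1, 0.0)          # ReLU
    out = flat + hidden @ W2                     # residual connection
    e = np.exp(out - out.max(axis=1, keepdims=True))
    probs = e / e.sum(axis=1, keepdims=True)
    return probs.reshape(-1, H, W, K)

y = prn_forward(rng.random((2, H, W, K)))
print(y.shape)                                   # -> (2, 28, 18, 17)
```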
