
Own dataset training results are not accurate #207

Open
fbas-est opened this issue Mar 15, 2022 · 39 comments

@fbas-est

fbas-est commented Mar 15, 2022

Hi,

I am trying to train the network on my own dataset, but the results are not good enough even though the model converges.
I have a dataset of 3,000 annotated images in total.
My camera is a realsense depth camera D415 with the following parameters:
"fx": 607.3137817382812
"fy": 606.8499145507812
"ppx": 330.49334716796875
"ppy": 239.25704956054688
"height": 480
"width": 640
"depth_scale": 0.0010000000474974513
I created my own dataset.py based on LineMOD's dataset.py, but I changed the following lines:

cam_scale = 1.0
pt2 = depth_masked / cam_scale
pt0 = (ymap_masked - self.cam_cx) * pt2 / self.cam_fx
pt1 = (xmap_masked - self.cam_cy) * pt2 / self.cam_fy
cloud = np.concatenate((pt0, pt1, pt2), axis=1)
cloud = cloud / 1000.0

to:

cam_scale = self.cam_scale # 0.0010000000474974513
pt2 = depth_masked * cam_scale
pt0 = (ymap_masked - self.cam_cx) * pt2 / self.cam_fx
pt1 = (xmap_masked - self.cam_cy) * pt2 / self.cam_fy
cloud = np.concatenate((pt0, pt1, pt2), axis=1)
cloud = cloud

I also removed every division by 1000 in the code because my mesh values are already in meters.
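
For reference, this is the whole conversion as a standalone sketch (the intrinsics and depth scale are the D415 values above; the helper name is just for illustration):

import numpy as np

# Minimal sketch of the back-projection: raw depth -> camera-frame point cloud in meters.
fx, fy = 607.3137817382812, 606.8499145507812
cx, cy = 330.49334716796875, 239.25704956054688
depth_scale = 0.0010000000474974513        # raw depth units -> meters

def depth_to_cloud(depth_masked, xmap_masked, ymap_masked):
    # xmap_masked holds pixel row indices (v), ymap_masked pixel column indices (u)
    z = depth_masked.astype(np.float32) * depth_scale   # meters, so no extra /1000 afterwards
    x = (ymap_masked - cx) * z / fx
    y = (xmap_masked - cy) * z / fy
    return np.concatenate((x, y, z), axis=1)             # (N, 3)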

The object's diameter is 0.324,
the estimator's loss is 0.0146578 and
the refiner's loss is 0.01338558.

Any idea what is wrong with my implementation?
Thanks.

@jc0725

jc0725 commented Apr 14, 2022

@fbas-est
Hello. This is unrelated to your question, but I am also trying to use DenseFusion on my own dataset.
May I ask what your environment settings are (CUDA version, etc.) and what steps you followed to successfully train on your own dataset?
Thank you in advance.

@Xushuangyin

Hello, I'm also making my own dataset for training, using a RealSense camera to estimate the pose of objects, and I've run into some problems as well. Would it be convenient to exchange contact information? My WeChat is 18845107925

@fbas-est
Author

fbas-est commented Apr 14, 2022

@jc0725
Hello, I use CUDA 10.1 and PyTorch 1.6.
To build my dataset I used ObjectDatasetTools; the source code is on GitHub: https://github.com/F2Wang/ObjectDatasetTools
To make it work, I changed the dataset format to match the format of DenseFusion's LineMOD dataset.
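
Roughly, I converted everything into the Linemod_preprocessed-style layout that DenseFusion's LineMOD loader expects; a sketch of that structure (folder numbering and file extensions may differ in your own dataset.py):

Linemod_preprocessed/
    data/
        01/
            rgb/          color frames
            depth/        16-bit depth frames
            mask/         binary object masks
            gt.yml        per-frame cam_R_m2c, cam_t_m2c, obj_bb
            info.yml      per-frame camera intrinsics and depth scale
            train.txt     frame ids used for training
            test.txt      frame ids used for testing
    models/
        obj_01.ply        object mesh
        models_info.yml   object diameters etc.
    segnet_results/       predicted masks used in eval mode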

@jc0725

jc0725 commented Apr 14, 2022

@fbas-est
Thank you for your response.
May I ask how you trained the SegNet for LINEMOD? Did you change the "--dataset_root" directory to LINEMOD instead of YCB in ./vanilla_segmentation/train.py ?

Also, after training, what script did you run to get the 6DoF results?

I apologize if my questions are quite elementary.

@fbas-est
Author

fbas-est commented Apr 14, 2022

@jc0725
Yes. I also changed dataset.py a bit to make it work for my dataset.
I also use a slightly different version of eval_linemod.py with some functions for visualizing the 3D bounding box (sketched below).
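
The visualization itself is basically just projecting the 8 corners of the model's 3D bounding box with the estimated pose and the camera matrix; a minimal sketch of the idea (not the exact code in visualize.txt, and the names are illustrative):

import cv2
import numpy as np

def draw_3d_bbox(img, corners_3d, R, t, K):
    # corners_3d: (8, 3) bounding-box corners in the model frame, ordered as the
    #             eight (min/max) combinations per axis; R: (3, 3), t: (3,), K: (3, 3)
    rvec, _ = cv2.Rodrigues(R)
    pts_2d, _ = cv2.projectPoints(corners_3d.astype(np.float64), rvec, t.astype(np.float64), K, None)
    pts_2d = pts_2d.reshape(-1, 2).astype(int)
    edges = [(0, 1), (1, 3), (3, 2), (2, 0),    # one face
             (4, 5), (5, 7), (7, 6), (6, 4),    # opposite face
             (0, 4), (1, 5), (2, 6), (3, 7)]    # edges connecting the two faces
    for i, j in edges:
        cv2.line(img, tuple(map(int, pts_2d[i])), tuple(map(int, pts_2d[j])), (0, 255, 0), 2)
    return img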

@jc0725

jc0725 commented Apr 14, 2022

@fbas-est
Would it be possible for you to upload your working code to your repository so that I can clone it?

@Xushuangyin

Thank you very much for your reply. I also used ObjectDatasetTools to make my own dataset. I made 10,000 pictures of a single object, but after training for 20 epochs, the estimated pose deviates greatly when I use the model to estimate the object's pose. I wanted to ask how many epochs you trained for, and how you got the green bounding box in your video? Thank you. @fbas-est

@Xushuangyin

3d844a26c702f624fea6619a37124476.mp4

@fbas-est
Author

fbas-est commented Apr 14, 2022

Here is the code for visualizing:
visualize.txt

@Xushuangyin
Did you produce the 10,000 pictures from one video or from different videos? In my case I used different videos due to RAM limitations.
The problem was that every video produces point clouds with different rotation and translation matrices, so the model could not use the same mesh for the whole combined dataset.

@Xushuangyin

I made the 10,000 pictures from different videos; if there are too many pictures, the program reports an error. I made my own object mesh. How can I solve the problem you mentioned? @fbas-est

@Xushuangyin

Thank you very much for your code! @fbas-est

@jc0725

jc0725 commented Apr 14, 2022

@fbas-est
Thank you very much. I will let you know if I am able to make any improvements or if I come up with any suggestions for improved accuracy on your project.

@fbas-est
Author

@Xushuangyin
I suggest starting by finding a way to render the point cloud onto the labeled dataset's color images (the 3D bounding box won't work for this). If the target point cloud (the point cloud used as the label) is not accurate, the network won't work.
If that is the problem, then for every video you collected you need to change the transforms in transforms.npy so that they all use one mesh as the reference, and then label the frames with that mesh; see the sketch below.
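
Roughly, that means aligning each video's reconstructed mesh to one reference mesh and folding that alignment into every frame's transform. A sketch of the idea, assuming transforms.npy holds 4x4 poses and using Open3D ICP for the alignment (not my exact code, file paths are placeholders, and depending on your conventions the composition may need to be inverted):

import numpy as np
import open3d as o3d

# Sketch: re-express one video's per-frame transforms relative to a single reference mesh.
ref = o3d.io.read_triangle_mesh('video_0/mesh.ply').sample_points_uniformly(5000)
cur = o3d.io.read_triangle_mesh('video_1/mesh.ply').sample_points_uniformly(5000)

# Rigid alignment of the current video's mesh onto the reference mesh.
# Assumes the two reconstructions are already roughly aligned; otherwise run a global registration first.
reg = o3d.pipelines.registration.registration_icp(
    cur, ref, 0.01, np.eye(4),
    o3d.pipelines.registration.TransformationEstimationPointToPoint())
T_align = reg.transformation                          # 4x4 matrix mapping cur -> ref

transforms = np.load('video_1/transforms.npy')        # assumed shape (N, 4, 4)
# Fold the alignment into each frame so every label refers to the same reference mesh.
transforms_ref = np.array([T @ np.linalg.inv(T_align) for T in transforms])
np.save('video_1/transforms_ref.npy', transforms_ref)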

@Xushuangyin

Xushuangyin commented Apr 15, 2022 via email

@orangeRobot990

Do you guys resize images during inference?
I get weird convolution errors:

RuntimeError: Calculated padded input size per channel: (6 x 320). Kernel size: (7 x 7). Kernel size can't be greater than actual input size

RuntimeError: Calculated padded input size per channel: (6 x 287). Kernel size: (7 x 7). Kernel size can't be greater than actual input size

It's different each time, so I guess it's the image or mask size? Where should I resize?

@Xushuangyin @fbas-est
thank you

@Xushuangyin

Xushuangyin commented Apr 23, 2022 via email

@an99990

an99990 commented Apr 23, 2022

Hi @Xushuangyin, thank you for responding. I actually found the source: it was because I was transposing the array incorrectly.
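
For anyone hitting the same convolution-size errors: in my case it was the axis order; a minimal sketch of what the network expects (the file path and bbox values are just placeholders):

import numpy as np
from PIL import Image

img = np.array(Image.open('rgb/0000.jpg'))[:, :, :3]   # (H, W, 3), drop alpha if present
img = np.transpose(img, (2, 0, 1))                     # -> (3, H, W), channel-first for the CNN
rmin, rmax, cmin, cmax = 120, 240, 200, 320            # placeholder bbox from get_bbox()
img_masked = img[:, rmin:rmax, cmin:cmax]              # crop rows/cols after moving channels first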

@an99990

an99990 commented Apr 23, 2022

Right now, @Xushuangyin, I am having issues with NaN values in my training after removing the /1000, since my depth and other metrics are already in meters.

I also reduced the learning rate, but I still get NaNs.

@an99990

an99990 commented Apr 23, 2022

@Xushuangyin so now I just get huge values instead. I confirmed that my meshes are in meters, so I removed the /1000.

image

Full code here


from importlib.abc import Loader
import torch.utils.data as data
from PIL import Image
import os
import os.path
import errno
import torch
import json
import codecs
import numpy as np
import sys
import torchvision.transforms as transforms
import argparse
import time
import random
import numpy.ma as ma
import copy
import scipy.misc
import scipy.io as scio
import yaml
import cv2


class PoseDataset(data.Dataset):
    def __init__(self, mode, num, add_noise, root, noise_trans, refine):
        self.objlist = [0, 1]
        self.mode = mode

        self.list_rgb = []
        self.list_depth = []
        self.list_label = []
        self.list_obj = []
        self.list_rank = []
        self.meta = {}
        self.pt = {}
        self.root = root
        self.noise_trans = noise_trans
        self.refine = refine
        min = 1000


        item_count = 0
        for item in self.objlist:
            if self.mode == 'train':
                input_file = open('{0}/data/{1}/train.txt'.format(self.root, '%d' % item))
            else:
                input_file = open('{0}/data/{1}/test.txt'.format(self.root, '%d' % item))
            while 1:
                item_count += 1
                input_line = input_file.readline()
                if self.mode == 'test' and item_count % 10 != 0:
                    continue
                if not input_line:
                    break
                if input_line[-1:] == '\n':
                    input_line = input_line[:-1]
                self.list_rgb.append('{0}/data/{1}/rgb/{2}.jpg'.format(self.root, '%d' % item, input_line))
                self.list_depth.append('{0}/data/{1}/depth/{2}.png'.format(self.root, '%d' % item, input_line))
                if self.mode == 'eval':
                    self.list_label.append('{0}/segnet_results/{1}_label/{2}_label.png'.format(self.root, '%d' % item, input_line))
                else:
                    self.list_label.append('{0}/data/{1}/mask/{2}.png'.format(self.root, '%d' % item, input_line))
                
                self.list_obj.append(item)
                self.list_rank.append(int(input_line))

            meta_file = open('{0}/data/{1}/gt.yml'.format(self.root, '%d' % item), 'r')
            self.meta[item] = yaml.safe_load(meta_file)
            self.pt[item] = npy_vtx('{0}/models/{1}.npy'.format(self.root, '%d' % item))

            if len(self.pt[item]) < min:
                min = len(self.pt[item])
            
            print("Object {0} buffer loaded".format(item))

        self.length = len(self.list_rgb)
        self.num_pt_mesh_small = min
        
        # retrieved from /usr/local/zed/settings according to 
        # https://support.stereolabs.com/hc/en-us/articles/360007497173-What-is-the-calibration-file-
        self.cam_cx = 1080.47
        self.cam_cy = 613.322
        self.cam_fx = 1057.8
        self.cam_fy = 1056.61


        self.num = num
        self.add_noise = add_noise
        self.trancolor = transforms.ColorJitter(0.2, 0.2, 0.2, 0.05)
        self.norm = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
        self.border_list = [-1, 40, 80, 120, 160, 200, 240, 280, 320, 360, 400, 440, 480, 520, 560, 600, 640, 680]
        self.num_pt_mesh_large = 500
        # self.num_pt_mesh_small = 100
        self.symmetry_obj_idx = []

    def __getitem__(self, index):
        img = Image.open(self.list_rgb[index])
        ori_img = np.array(img)
        depth = np.array(Image.open(self.list_depth[index]))
        label = np.array(Image.open(self.list_label[index]))


        self.height, self.width, _ = np.shape(img)

        self.xmap = np.array([[j for i in range(self.width)] for j in range(self.height)])  # row index (v) of every pixel
        self.ymap = np.array([[i for i in range(self.width)] for j in range(self.height)])  # column index (u) of every pixel, as in the upstream LINEMOD dataset.py

        # # removing alpha channel
        if np.shape(label)[-1] == 4 :
            label = label[:,:,:-1] 

        obj = self.list_obj[index]
        rank = self.list_rank[index]        

        if obj == 2:
            for i in range(0, len(self.meta[obj][rank])):
                if self.meta[obj][rank][i]['obj_id'] == 2:
                    meta = self.meta[obj][rank][i]
                    break
        else:
            meta = self.meta[obj][rank][0]
        #return array of bools
        mask_depth = ma.getmaskarray(ma.masked_not_equal(depth, 0))
        if self.mode == 'eval':
            mask_label = ma.getmaskarray(ma.masked_equal(label, np.array(255)))
        else:
            mask_label = ma.getmaskarray(ma.masked_equal(label, np.array([255, 255, 255])))[:, :, 0]
        
        mask = mask_label * mask_depth

        if self.add_noise:
            img = self.trancolor(img)

        # remove alpha channel
        img = np.array(img)[:, :, :3]
        img = np.transpose(img, (2, 0, 1))
        img_masked = img

        if self.mode == 'eval':
            rmin, rmax, cmin, cmax = get_bbox(mask_to_bbox(mask_label))
        else: #obj_bb: [minX, minY, widhtOfBbx, heigthOfBbx]
            rmin, rmax, cmin, cmax = get_bbox(meta['obj_bb'])

        img_masked = img_masked[:, rmin:rmax, cmin:cmax]
        # p_img = np.transpose(img_masked, (1, 2, 0))
        # cv2.imwrite('{0}_input.png'.format(index), p_img)

        choose = mask[rmin:rmax, cmin:cmax].flatten().nonzero()[0]
        if len(choose) == 0:
            cc = torch.LongTensor([0])
            return(cc, cc, cc, cc, cc, cc)

        if len(choose) > self.num:
            c_mask = np.zeros(len(choose), dtype=int)
            c_mask[:self.num] = 1
            np.random.shuffle(c_mask)
            choose = choose[c_mask.nonzero()]
        else:
            choose = np.pad(choose, (0, self.num - len(choose)), 'wrap')
        
        depth_masked = depth[rmin:rmax, cmin:cmax].flatten()[choose][:, np.newaxis].astype(np.float32)
        xmap_masked = self.xmap[rmin:rmax, cmin:cmax].flatten()[choose][:, np.newaxis].astype(np.float32)
        ymap_masked = self.ymap[rmin:rmax, cmin:cmax].flatten()[choose][:, np.newaxis].astype(np.float32)
        choose = np.array([choose])

        cam_scale = 1.0
        pt2 = depth_masked / cam_scale
        pt0 = (ymap_masked - self.cam_cx) * pt2 / self.cam_fx
        pt1 = (xmap_masked - self.cam_cy) * pt2 / self.cam_fy
        cloud = np.concatenate((pt0, pt1, pt2), axis=1)
        # cloud = cloud / 1000.0
        cloud = cloud 

        #fw = open('evaluation_result/{0}_cld.xyz'.format(index), 'w')
        #for it in cloud:
        #    fw.write('{0} {1} {2}\n'.format(it[0], it[1], it[2]))
        #fw.close()

        # model_points = self.pt[obj] / 1000.0
        model_points = self.pt[obj]
        dellist = [j for j in range(0, len(model_points))]
        dellist = random.sample(dellist, len(model_points) - self.num_pt_mesh_small)
        model_points = np.delete(model_points, dellist, axis=0)

        target_r = np.resize(np.array(meta['cam_R_m2c']), (3, 3))
        target_t = np.array(meta['cam_t_m2c'])
        add_t = np.array([random.uniform(-self.noise_trans, self.noise_trans) for i in range(3)])

        if self.add_noise:
            cloud = np.add(cloud, add_t)

        #fw = open('evaluation_result/{0}_model_points.xyz'.format(index), 'w')
        #for it in model_points:
        #    fw.write('{0} {1} {2}\n'.format(it[0], it[1], it[2]))
        #fw.close()

        target = np.dot(model_points, target_r.T)
        # if self.add_noise:
        #     target = np.add(target, target_t / 1000.0 + add_t)
        #     out_t = target_t / 1000.0 + add_t
        # else:
        #     target = np.add(target, target_t / 1000.0)
        #     out_t = target_t / 1000.0


        if self.add_noise:
            target = np.add(target, target_t + add_t)
            out_t = target_t + add_t
        else:
            target = np.add(target, target_t)
            out_t = target_t 
        #fw = open('evaluation_result/{0}_tar.xyz'.format(index), 'w')
        #for it in target:
        #    fw.write('{0} {1} {2}\n'.format(it[0], it[1], it[2]))
        #fw.close()

        # np.shape(cloud) (500, 3)
        # np.shape(choose) (1, 500)
        # np.shape(img_masked) (3, 120, 80)
        # np.shape(target) (24, 3)
        # np.shape(model_points) (24, 3)
  
        return torch.from_numpy(cloud.astype(np.float32)), \
               torch.LongTensor(choose.astype(np.int32)), \
               self.norm(torch.from_numpy(img_masked.astype(np.float32))), \
               torch.from_numpy(target.astype(np.float32)), \
               torch.from_numpy(model_points.astype(np.float32)), \
               torch.LongTensor([self.objlist.index(obj)])

    def __len__(self):
        return self.length

    def get_sym_list(self):
        return self.symmetry_obj_idx

    def get_num_points_mesh(self):
        if self.refine:
            return self.num_pt_mesh_large
        else:
            return self.num_pt_mesh_small

border_list = [-1, 40, 80, 120, 160, 200, 240, 280, 320, 360, 400, 440, 480, 520, 560, 600, 640, 680]

def mask_to_bbox(mask):
    mask = mask.astype(np.uint8)
    contours, _ = cv2.findContours(mask, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)


    x = 0
    y = 0
    w = 0
    h = 0
    for contour in contours:
        tmp_x, tmp_y, tmp_w, tmp_h = cv2.boundingRect(contour)
        if tmp_w * tmp_h > w * h:
            x = tmp_x
            y = tmp_y
            w = tmp_w
            h = tmp_h
    return [x, y, w, h]


def get_bbox(bbox):
    bbx = [bbox[1], bbox[1] + bbox[3], bbox[0], bbox[0] + bbox[2]]
    if bbx[0] < 0:
        bbx[0] = 0
    if bbx[1] >= 540:
        bbx[1] = 539
    if bbx[2] < 0:
        bbx[2] = 0
    if bbx[3] >= 960:
        bbx[3] = 959                
    rmin, rmax, cmin, cmax = bbx[0], bbx[1], bbx[2], bbx[3]
    r_b = rmax - rmin
    for tt in range(len(border_list)):
        if r_b > border_list[tt] and r_b < border_list[tt + 1]:
            r_b = border_list[tt + 1]
            break
    c_b = cmax - cmin
    for tt in range(len(border_list)):
        if c_b > border_list[tt] and c_b < border_list[tt + 1]:
            c_b = border_list[tt + 1]
            break
    center = [int((rmin + rmax) / 2), int((cmin + cmax) / 2)]
    rmin = center[0] - int(r_b / 2)
    rmax = center[0] + int(r_b / 2)
    cmin = center[1] - int(c_b / 2)
    cmax = center[1] + int(c_b / 2)
    if rmin < 0:
        delt = -rmin
        rmin = 0
        rmax += delt
    if cmin < 0:
        delt = -cmin
        cmin = 0
        cmax += delt
    if rmax > 540:
        delt = rmax - 540
        rmax = 540
        rmin -= delt
    if cmax > 960:
        delt = cmax - 960
        cmax = 960
        cmin -= delt
    return rmin, rmax, cmin, cmax


def ply_vtx(path):
    f = open(path)
    assert f.readline().strip() == "ply"
    f.readline()
    f.readline()
    N = int(f.readline().split()[-1])
    while f.readline().strip() != "end_header":
        continue
    pts = []
    for _ in range(N):
        pts.append(np.float32(f.readline().split()[:3]))
    return np.array(pts)

def npy_vtx(path):
    return np.load(path,allow_pickle=True)

Thank you for your help @Xushuangyin

@orangeRobot990

Hey @fbas-est, I'm having issues with my training as well. Did you notice anything weird in your avg distance when you removed the /1000? Did you remove it anywhere other than in dataset.py?

Thank you @Xushuangyin and @an99990, I solved the array issue. Now I have problems with training and I'm getting NaNs too, because my values are in meters.
Thanks for any help.

@Xushuangyin

You should change these two lines of code like this:

cam_scale = 0.001
pt2 = depth_masked * cam_scale

@Xushuangyin

Because my cam_scale is 0.001, the code I modified looks like this:
@an99990 @orangeRobot990
148d312f447e3d5fd5762d2a20ce6b2

@an99990

an99990 commented Apr 28, 2022

Thank you so much @Xushuangyin, I was finally able to get results using cam_scale = 0.001 and without dividing by 1000 in __getitem__. I will start another training run with the correct values. Thank you so much!

@jc0725

jc0725 commented May 11, 2022

Hello. May I ask how any of you were able to train your custom dataset on SegNet?
It seems like the provided code is for the YCB format and not the LineMOD format.

My guess is that I would have to run the SegNet train.py separately for each of the LineMOD objects.

@Xushuangyin

Xushuangyin commented May 11, 2022 via email

@jc0725

jc0725 commented May 11, 2022

Thank you for your response.
Do you mean that you didn't train SegNet?

@Xushuangyin

I trained SegNet on 300 pictures of a single object. @jc0725

@jc0725

jc0725 commented May 12, 2022

@Xushuangyin
Thank you for clarifying!
Also, were you able to successfully visualize the bounding box using the visualize.py code provided by @fbas-est ?

@fbas-est
Author

@an99990
Hello, I saw that you are using a ZED camera, and from the intrinsics I assume you didn't train the model on 480p images.
Did you successfully train the model at a higher resolution?

@an99990

an99990 commented May 12, 2022

@fbas-est I generated the images from Unity. The images are 560 x 940, if I remember correctly. My poses do not seem to be quite correct though. Here's an image from inference. I might create a dataset with images from the ZED camera. The camera in Unity didn't have the same intrinsics as the ZED, so that might be why my results aren't precise. I also never reached the refinement step during training.

image

@fbas-est
Author

fbas-est commented May 12, 2022

@an99990 Yes, that is probably the issue. The ZED camera comes with 4 built-in calibrations, the smallest being for 672x376 images. If you train the network on synthetic data, I guess you have to replicate the images that your camera captures.
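
If it helps, the usual way to make a synthetic camera match a real one is to derive its field of view from the calibrated focal lengths; a small sketch (the resolution here is an assumption, use whatever resolution your ZED intrinsics were calibrated for):

import math

fx, fy = 1057.8, 1056.61      # the ZED intrinsics from your dataset.py
width, height = 2208, 1242    # assumption: the resolution those intrinsics were calibrated for

fov_x = 2 * math.degrees(math.atan(width / (2 * fx)))   # horizontal FOV for the virtual camera
fov_y = 2 * math.degrees(math.atan(height / (2 * fy)))  # vertical FOV
print(fov_x, fov_y)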

May I ask how you created the synthetic dataset ?

@an99990

an99990 commented May 12, 2022

I have a Unity project that creates datasets in the LineMOD format. I can't share it though, since it's company property :/

@jc0725

jc0725 commented May 16, 2022

May I ask how any of you were able to output and save the vanilla_segmentation label png files?

@XLXIAOLONG

@an99990 Hello. I made a LineMOD-style dataset with ObjectDatasetTools. In eval_linemod.py its success rate is 0.9285, but when I visualize it, the points seem to be in the wrong place. Can you give me some advice? Thank you in advance!
2022-05-17 21-55-54 screenshot

@an99990

an99990 commented May 17, 2022

Have you played with the cam_scale? I had to change it to 1000; try different values. It seems it's bigger than your object.

@XLXIAOLONG

Have you played with the cam_scale? I had to change it to 1000; try different values. It seems it's bigger than your object.

@an99990 Thanks for your reply. I made the dataset with a RealSense camera. I changed cam_scale to the camera's own value, like this:
cam_scale = 0.0002500000118743628
pt2 = depth_masked * cam_scale
pt0 = (ymap_masked - self.cam_cx) * pt2 / self.cam_fx
pt1 = (xmap_masked - self.cam_cy) * pt2 / self.cam_fy
cloud = np.concatenate((pt0, pt1, pt2), axis=1)
# cloud = cloud / 1000.0
# print(cloud.max())
cloud = cloud

0.0002500000118743628 is the depth scale of the RealSense camera.
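
A quick way to check whether the units line up is to print the extents of the camera-space cloud against the model points right after they are built in __getitem__; a small sketch (just a debugging helper, not from the repo):

import numpy as np

def check_units(cloud, model_points, target_t):
    # cloud: (N, 3) camera-space points; model_points: (M, 3) mesh points; target_t: (3,) translation
    print('cloud z range :', cloud[:, 2].min(), cloud[:, 2].max())
    print('model extent  :', model_points.max(axis=0) - model_points.min(axis=0))
    print('translation   :', np.asarray(target_t).ravel())
    # All three should be on the same scale (meters here). If the cloud is ~1000x off,
    # the depth scale (cam_scale) or a leftover /1000 is the likely culprit.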

@Windson9

Windson9 commented Apr 1, 2023

Hi @Xushuangyin and @an99990. I hope you are doing well. I am trying to train this model on my custom dataset. Can you please share if you were able to successfully train the model? Can you share the results if possible? Thanks.

@nanxiaoyixuan

@jc0725 Hi, I also built my own LineMOD-style dataset. When I debug, input_file = open('{0}/data/{1}/train.txt'.format(self.root, '%02d' % item)) raises "No such file or directory: '/datasets/linemod/linemod_preprocessed/data/01/train.txt'", so I can't step through the rest of the code in the debugger.
However, training with the command bash ./experiments/scripts/train_linemod.sh works and does not raise this error. Have you ever run into this situation? Is there any solution?
Thank you very much for your reply.

@nanxiaoyixuan

@fbas-est Hi, I also built my own LineMOD-style dataset. When I debug, input_file = open('{0}/data/{1}/train.txt'.format(self.root, '%02d' % item)) raises "No such file or directory: '/datasets/linemod/linemod_preprocessed/data/01/train.txt'", so I can't step through the rest of the code in the debugger.
However, training with the command bash ./experiments/scripts/train_linemod.sh works and does not raise this error. Have you ever run into this situation? Is there any solution?
Thank you very much for your reply.
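
In case it is relevant, this is how I check where the relative path resolves when stepping through in the debugger (a sketch; the bash script is run from the repository root, so the working directory may be what differs):

import os

root = 'datasets/linemod/Linemod_preprocessed'    # whatever --dataset_root points to
item = 1
path = '{0}/data/{1}/train.txt'.format(root, '%02d' % item)
print('cwd     :', os.getcwd())
print('resolved:', os.path.abspath(path))
print('exists  :', os.path.exists(path))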
