Loss changes little #31

Open
XiaSunny opened this issue Oct 11, 2018 · 5 comments

Comments

@XiaSunny

When I train on my own training set of 2,047 images, the loss stays around 4. The problem is as follows:

training loss at iteration 54405: 3.7331526279449463
training loss at iteration 54410: 3.6703977584838867
training loss at iteration 54415: 3.792213201522827
training loss at iteration 54420: 3.184262275695801
training loss at iteration 54425: 2.5833559036254883
training loss at iteration 54430: 3.983672618865967
training loss at iteration 54435: 2.3275046348571777
training loss at iteration 54440: 5.39171838760376
training loss at iteration 54445: 4.39638090133667
training loss at iteration 54450: 4.163412094116211
training loss at iteration 54455: 3.498041868209839
training loss at iteration 54460: 4.481285572052002
training loss at iteration 54465: 4.839260101318359
training loss at iteration 54470: 3.994492769241333
training loss at iteration 54475: 3.5584001541137695
training loss at iteration 54480: 4.685251712799072
training loss at iteration 54485: 4.015157222747803
training loss at iteration 54490: 4.063553810119629
training loss at iteration 54495: 3.18803334236145
training loss at iteration 54500: 4.06464147567749
validation loss at iteration 54500: 3.384662628173828
training loss at iteration 54505: 4.854454517364502
training loss at iteration 54510: 4.531534671783447
training loss at iteration 54515: 3.9393210411071777
training loss at iteration 54520: 3.8535616397857666
training loss at iteration 54525: 4.103610992431641
training loss at iteration 54530: 5.204475402832031
training loss at iteration 54535: 4.274611949920654
training loss at iteration 54540: 3.4196114540100098
training loss at iteration 54545: 3.8542287349700928
training loss at iteration 54550: 2.900420904159546
training loss at iteration 54555: 4.218133926391602
training loss at iteration 54560: 3.702637195587158
training loss at iteration 54565: 4.377786159515381
training loss at iteration 54570: 4.17509126663208
training loss at iteration 54575: 4.399285316467285
training loss at iteration 54580: 4.801181316375732
training loss at iteration 54585: 3.8167858123779297
training loss at iteration 54590: 3.7509422302246094
training loss at iteration 54595: 3.3366339206695557
training loss at iteration 54600: 4.960026264190674
validation loss at iteration 54600: 4.456686496734619
training loss at iteration 54605: 7.162835597991943
training loss at iteration 54610: 4.2326860427856445
training loss at iteration 54615: 5.217791557312012
training loss at iteration 54620: 5.991508483886719
training loss at iteration 54625: 3.405985116958618
training loss at iteration 54630: 8.021293640136719
training loss at iteration 54635: 4.542353630065918
training loss at iteration 54640: 5.705121040344238
47%|██████████████▍ | 39641/85000 [15:58:31<18:16:47, 1.45s/it]

Thank you! @heilaw

@heilaw
Collaborator

heilaw commented Oct 11, 2018

When I trained CornerNet on MS COCO, I got similar loss values. Have you tried running the evaluation script to see the actual predictions?

@XiaSunny
Author

I have tried running the evaluation script, and the actual predictions are very poor. Only after I changed test/coco.py from keep_inds = (top_bboxes[image_id][j][:, -1] > 0.5) to keep_inds = (top_bboxes[image_id][j][:, -1] > 0.3) did bounding boxes appear on some images (see the sketch below).
What should I do to get good predictions? Thank you! @heilaw
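
For reference, a minimal sketch of that change with the threshold pulled out into a variable (the variable name and the surrounding loop are illustrative, not the exact test/coco.py code):

    # Illustrative sketch only: parameterize the hard-coded 0.5 score cutoff
    # in test/coco.py so it can be lowered (e.g. to 0.3) in one place.
    score_threshold = 0.3  # assumed value, taken from the comment above

    for j in range(1, categories + 1):
        # the last column of each detection row is its confidence score
        keep_inds = (top_bboxes[image_id][j][:, -1] > score_threshold)
        top_bboxes[image_id][j] = top_bboxes[image_id][j][keep_inds]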

@heilaw
Collaborator

heilaw commented Oct 15, 2018

Can you give more details about the training? For example, the size of your dataset, the batch size, the number of GPUs, etc. You may also consider fine-tuning our model on your dataset. #23 (comment)
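
As a rough illustration of the fine-tuning suggestion, the snippet below is a generic PyTorch sketch (not the repo's own loading code; the function name and checkpoint path are placeholders) that copies whichever pretrained tensors still fit and skips the ones whose shapes changed, e.g. the per-category heads once "categories" is set to 1:

    import torch

    def load_compatible_weights(model, checkpoint_path):
        """Copy pretrained tensors into `model`, skipping any whose shapes
        differ (for example the heads affected by a changed category count)."""
        pretrained = torch.load(checkpoint_path, map_location="cpu")
        model_state = model.state_dict()
        compatible = {k: v for k, v in pretrained.items()
                      if k in model_state and v.shape == model_state[k].shape}
        model_state.update(compatible)
        model.load_state_dict(model_state)
        print("loaded {}/{} pretrained tensors".format(len(compatible), len(model_state)))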

@XiaSunny
Author

I have a training set of 2,047 images and a GTX 1080 Ti GPU with 12GB of memory.
My JSON config file is as follows:
{
    "system": {
        "dataset": "MSCOCO",
        "batch_size": 4,
        "sampling_function": "kp_detection",

        "train_split": "trainval",
        "val_split": "minival",

        "learning_rate": 0.01,
        "decay_rate": 10,

        "val_iter": 100,

        "opt_algo": "adam",
        "prefetch_size": 5,

        "max_iter": 100000,
        "stepsize": 50000,
        "snapshot": 5000,

        "chunk_sizes": [4],

        "data_dir": "./data"
    },

    "db": {
        "rand_scale_min": 0.6,
        "rand_scale_max": 1.4,
        "rand_scale_step": 0.1,
        "rand_scales": null,

        "rand_crop": true,
        "rand_color": true,

        "border": 128,
        "gaussian_bump": true,

        "input_size": [511, 511],
        "output_sizes": [[128, 128]],

        "test_scales": [1],

        "top_k": 100,
        "categories": 1,
        "ae_threshold": 0.5,
        "nms_threshold": 0.5,

        "max_per_image": 100
    }
}
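
For context, a quick back-of-the-envelope calculation of what these settings imply for a 2,047-image dataset (derived only from the config above, not from the repo's code):

    # Training schedule implied by the config above
    num_images = 2047
    batch_size = 4
    max_iter   = 100000

    iters_per_epoch = num_images / batch_size      # ~512 iterations per pass over the data
    epochs          = max_iter / iters_per_epoch   # ~195 passes in total
    lr_after_step   = 0.01 / 10                    # learning_rate / decay_rate, applied at stepsize 50000

    print("{:.0f} iters/epoch, ~{:.0f} epochs, lr after step: {}".format(
        iters_per_epoch, epochs, lr_after_step))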

Thank you! @heilaw @ywchao @anewell @jiadeng

@ckqsars

ckqsars commented Dec 18, 2018

@XiaSunny @heilaw I also have the same problem, but the dataset I use is COCO. Because I can only train on a 1080 Ti GPU, I set the batch size to 2. However, the loss is around 18 and the focal_loss is around 36.
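
A hedged sketch of the two settings that usually have to move together when the batch size is cut down like this; the linear learning-rate rescale is a common heuristic and an assumption here, not something prescribed in this thread:

    # Assumption: chunk_sizes is the per-GPU split of the batch and should
    # still sum to batch_size; the reference values below are placeholders,
    # check the base config file in your own copy of the repo.
    base_lr, base_batch_size = 0.00025, 49
    new_batch_size = 2                     # single 1080 Ti, as in the comment above
    chunk_sizes    = [new_batch_size]      # one GPU -> one chunk
    new_lr = base_lr * new_batch_size / float(base_batch_size)
    print(chunk_sizes, new_lr)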
