Issue about the negative data and label #8

ypflll · 2017-06-03T12:53:00Z

Hi Julian,
I am trying to build a nodule detector based on your work. Thanks very much for sharing it.
May I ask a few questions?

You use several types of training data:
labels from LIDC, v2 from LUNA16, LUNA16 false positives, NDSB, and non-lung tissue edges.
So, at the training stage, are all of these positive samples except the non-lung tissue edges? And is the label YES (meaning the cube contains a nodule) for positive samples and NO for the non-lung tissue edges?
Another question: when predicting, a 64x64x64 cube is fed to the net. Is the result whether the cube contains a nodule, together with the probability?
Any information is welcome!

juliandewit · 2017-06-04T19:29:49Z

Hello. The candidatesv2 entries are also negative examples (there are around 400,000 negatives there).
Basically that is the most important source of negatives. The edge examples only let the network know that non-lung tissue is also not a lung nodule.
Another (small) source of negatives is the false positives that were predicted after one round of training on LUNA16.

The network learns 2 things at once:
1: Lung nodule y/n.. (non lung tissue should always be n)
2: Malignancy (0 if not a lung nodule, 0.1-25 if lung nodule).
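To make the two targets concrete, here is a minimal sketch of how one training cube could be mapped to a (nodule, malignancy) label pair. The source tags and the function name are hypothetical, chosen for illustration; they are not taken from the repository, and only the sources whose role is stated in this thread are included.

```python
from typing import Tuple

# Hypothetical source tags (illustrative names, not the repository's).
POSITIVE_SOURCES = {"lidc_nodule"}
NEGATIVE_SOURCES = {"candidates_v2", "luna16_false_positive", "non_lung_edge"}


def make_labels(source: str, malignancy: float = 0.0) -> Tuple[int, float]:
    """Return (is_nodule, malignancy) targets for one training cube."""
    if source in NEGATIVE_SOURCES:
        # Negatives are never nodules, so their malignancy target is 0.
        return 0, 0.0
    if source in POSITIVE_SOURCES:
        # Positives keep whatever malignancy score the annotation provides.
        return 1, float(malignancy)
    raise ValueError(f"unknown sample source: {source!r}")


print(make_labels("candidates_v2"))
print(make_labels("lidc_nodule", malignancy=9.0))
```

Both targets come from the same cube, so a single network can be trained on them jointly.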

I train/predict 32x32x32 cubes. The prediction is nodule Y/N.
If Yes then I also look at the malignancy..
Malignancy is the only thing I work with for the final prediction.
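As a rough sketch of that flow: given per-cube network outputs, keep the malignancy only for cubes classified as nodules. The function name and the 0.5 cut-off are assumptions for illustration, and the model itself is left out.

```python
from typing import List, Sequence, Tuple


def nodule_malignancies(
    cube_outputs: Sequence[Tuple[float, float]],
    nodule_threshold: float = 0.5,  # assumed cut-off, not stated in this thread
) -> List[float]:
    """Return the malignancy of every cube predicted to be a nodule.

    Each item of cube_outputs is (nodule_probability, malignancy) for one
    32x32x32 cube.
    """
    return [mal for p_nodule, mal in cube_outputs if p_nodule >= nodule_threshold]


# Made-up outputs for three cubes of one scan.
outputs = [(0.05, 0.0), (0.92, 16.0), (0.71, 4.0)]
print(nodule_malignancies(outputs))
```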

I hope that makes things clearer.
It's quite a complex solution with all the different label sources.

ypflll · 2017-06-05T03:21:30Z

Quite clear, and really complicated and refined work.

But where does candidatesv2 come from?
In step1_preprocess_luna16.py, it seems that you generate your negative samples from two files: lidc.xml and annotation_excluded.csv (is that candidate.csv?).
So where are they from?
If I don't have such files in my case, should I manually cut random lung-tissue cubes (that do not contain a nodule) as negative samples?
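For reference, the random-cropping idea could look roughly like the sketch below: draw cube centers at random and reject any that fall too close to a known nodule. Everything here is an assumption for illustration (voxel coordinates, the 48-voxel margin, the function name), and it does not check that a center actually lies inside lung tissue.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[int, int, int]


def sample_negative_centers(
    volume_shape: Point,
    nodule_centers: Sequence[Point],
    n_samples: int,
    cube_size: int = 32,
    min_distance: float = 48.0,  # assumed safety margin in voxels
    seed: int = 0,
    max_tries: int = 10000,
) -> List[Point]:
    """Pick random cube centers that stay away from every known nodule."""
    rng = random.Random(seed)
    half = cube_size // 2
    centers: List[Point] = []
    for _ in range(max_tries):
        if len(centers) == n_samples:
            break
        # Keep the whole cube inside the volume.
        center = tuple(rng.randint(half, dim - half) for dim in volume_shape)
        if all(math.dist(center, nodule) >= min_distance for nodule in nodule_centers):
            centers.append(center)
    return centers


centers = sample_negative_centers((128, 256, 256), [(64, 128, 128)], n_samples=5)
print(centers)
```

Randomly drawn negatives like these are mostly easy examples; they are not a substitute for candidate locations that resemble nodules.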

juliandewit · 2017-06-05T15:13:59Z

In the resources folder there is a link to "resources.rar" in the readme.md.
This file contains all the data you need and even more.

In the resources.rar there is a folder "luna16_annotations".
In that folder there is candidatesv2.csv.
This file is directly taken from the LUNA16 competition.
Look here for more:
LUNA16 data
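A minimal sketch of pulling the negative candidates out of such a file, assuming the columns are seriesuid, coordX, coordY, coordZ and class, and that class 0 marks a non-nodule. The rows below are made up so the snippet runs on its own; for the real file, pass an open file handle instead.

```python
import csv
import io
from typing import Dict, List, Tuple

# Made-up rows for illustration only; column names are an assumption.
SAMPLE_CSV = """seriesuid,coordX,coordY,coordZ,class
scan_a,-56.08,-67.85,-311.92,0
scan_a,53.21,-244.41,-245.17,0
scan_a,-24.01,192.10,-391.08,1
scan_b,103.66,-121.80,-286.62,0
"""


def load_negative_candidates(handle) -> Dict[str, List[Tuple[float, float, float]]]:
    """Group class-0 candidate coordinates by scan id."""
    negatives: Dict[str, List[Tuple[float, float, float]]] = {}
    for row in csv.DictReader(handle):
        if row["class"] != "0":
            continue  # skip rows that are marked as nodules
        xyz = (float(row["coordX"]), float(row["coordY"]), float(row["coordZ"]))
        negatives.setdefault(row["seriesuid"], []).append(xyz)
    return negatives


negatives = load_negative_candidates(io.StringIO(SAMPLE_CSV))
for scan_id, coords in negatives.items():
    print(scan_id, len(coords))
```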

ypflll · 2017-06-06T08:33:28Z

Got it.
I am in Tianchi (a competition held by Alibaba, China). In my case, only nodule information was given.
It seems that I first need to train a 3D U-Net to generate false positive samples.

juliandewit · 2017-06-07T18:27:35Z

Hi, I looked at the competition.
My Chinese is not too good :S

I do think this approach can be translated to that competition, since the #6 team of the Data Science Bowl is #1 now at your competition.

Good luck!

ypflll · 2017-06-10T04:23:47Z

Tianchi's English version lacks important information -_-||

4th place in Kaggle is now first place in Tianchi.
So we need to do more.
Thanks.

ypflll closed this as completed Jun 10, 2017
