Issue about the negative data and label #8

Closed
ypflll opened this issue Jun 3, 2017 · 6 comments

Comments

@ypflll

ypflll commented Jun 3, 2017

Hi Julian,
I am trying to build a nodule detector based on your work. Thanks very much for sharing it.
May I ask a few questions?

  1. You use several types of training data:
    labels from LIDC, v2 from LUNA16, LUNA16 false positives, NDSB, and non-lung-tissue edges.
    So at the training stage, are all of these positive samples except the non-lung-tissue edges? And is the label YES (the cube contains a nodule) for positive samples and NO for the non-lung-tissue edges?

  2. When predicting, a 64x64x64 cube is fed to the net. Is the result whether the cube contains a nodule, together with its probability?
    Any information is welcome!

@juliandewit
Owner

Hello. candidatesv2 are also negative examples (there are around 400,000 negatives there).
That is basically the most important source of negatives. The edge examples only let the network know that non-lung tissue is also not a lung nodule.
Another (small) source of negatives is the false positives that were predicted after one round of training on LUNA16.

The network learns two things at once:
1: Lung nodule y/n (non-lung tissue should always be n).
2: Malignancy (0 if not a lung nodule, 0.1-25 if lung nodule).
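
As an illustration of the two targets described above, here is a minimal sketch of how a per-cube label could be assembled. The source names, the `CubeLabel` type and the `make_label` helper are hypothetical and are not taken from the repository; only the rule itself (negatives get nodule = 0 and malignancy = 0) comes from this comment.

```python
from typing import NamedTuple, Optional


class CubeLabel(NamedTuple):
    is_nodule: int     # 1 = lung nodule, 0 = negative
    malignancy: float  # 0.0 for every negative cube


# Hypothetical names for the negative sources mentioned in this thread.
NEGATIVE_SOURCES = {"candidates_v2", "luna16_false_positive", "non_lung_edge"}


def make_label(source: str, malignancy: Optional[float] = None) -> CubeLabel:
    """Build the two training targets for one cube."""
    if source in NEGATIVE_SOURCES:
        # Negatives: not a nodule, so malignancy is fixed at 0.
        return CubeLabel(0, 0.0)
    if malignancy is None:
        raise ValueError("a positive sample needs a malignancy score")
    return CubeLabel(1, float(malignancy))
```

With this layout, a single network can be trained on both outputs at once, because every cube carries a value for each target.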

I train and predict on 32x32x32 cubes. The prediction is nodule Y/N.
If yes, I also look at the malignancy.
Malignancy is the only thing I work with for the final prediction.
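
A rough sketch of that decision, for one predicted cube. The 0.5 threshold and the function name `cube_score` are assumptions made for illustration; the thread does not say which cut-off is used or how cube scores are combined per patient.

```python
def cube_score(p_nodule: float, malignancy: float, threshold: float = 0.5) -> float:
    """Return the malignancy of a cube classed as a nodule, else 0.

    `threshold` is an assumed cut-off for the nodule Y/N output.
    """
    if p_nodule >= threshold:
        return malignancy
    # Not a nodule: the cube contributes nothing to the final prediction.
    return 0.0
```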

I hope that makes things clearer.
It's quite a complex solution with all the different label sources.

@ypflll
Author

ypflll commented Jun 5, 2017

Quite clear, and really complicated and refined work.

But where does candidate v2 come from?
In step1_preprocess_luna16.py, it seems that you generate your negative samples from two files: lidc.xml and annotation_excluded.csv (is that candidate.csv?).
So where do they come from?
If I don't have such files in my case, should I manually cut random lung-tissue cubes that do not contain a nodule as negative samples?
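
The random-cropping idea in the last question could look roughly like the sketch below. It is not how the repository builds its negatives; the function name, parameters and the rejection rule are assumptions. A center is kept only if the whole cube lies outside every nodule sphere, which is guaranteed when the center is farther from the nodule than the nodule radius plus the cube's half-diagonal. The sketch does not check that a center lies in lung tissue; that would need a lung mask.

```python
import math
import random


def sample_negative_centers(volume_shape, nodules, n_samples,
                            cube_size=32, seed=0, max_tries=10000):
    """Pick cube centers (voxel coordinates) that stay clear of all nodules.

    volume_shape: (dim0, dim1, dim2) of the scan in voxels.
    nodules: iterable of (c0, c1, c2, radius) in the same voxel coordinates.
    """
    rng = random.Random(seed)
    half = cube_size // 2
    half_diagonal = half * math.sqrt(3)  # farthest cube corner from its center
    centers = []
    for _ in range(max_tries):
        if len(centers) >= n_samples:
            break
        # Keep the whole cube inside the volume.
        c = tuple(rng.randint(half, dim - half) for dim in volume_shape)
        if all(math.dist(c, n[:3]) > n[3] + half_diagonal for n in nodules):
            centers.append(c)
    return centers
```

Random crops like these are mostly easy negatives, so they are probably a weaker substitute for a list of nodule-like candidates.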

@juliandewit
Owner

juliandewit commented Jun 5, 2017

In the resources folder there is a link to "resources.rar" in the readme.md.
This file contains all the data you need and even more.

In the resources.rar there is a folder "luna16_annotations".
In that folder there is candidatesv2.csv.
This file is directly taken from the LUNA16 competition.
Look here for more:
LUNA16 data
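
For reference, a small sketch of pulling the negative candidates out of that file. It assumes the file keeps the LUNA16 column layout (`seriesuid`, `coordX`, `coordY`, `coordZ`, `class`, with class 0 meaning non-nodule); check the header of your copy before relying on it. The helper name `load_negative_candidates` is hypothetical and the rows in the usage example are made up.

```python
import csv
import io


def load_negative_candidates(fileobj):
    """Return (seriesuid, (x, y, z)) for every row whose class is 0.

    Assumes columns: seriesuid, coordX, coordY, coordZ, class.
    Coordinates are returned as floats, exactly as stored in the file.
    """
    negatives = []
    for row in csv.DictReader(fileobj):
        if int(row["class"]) == 0:
            xyz = (float(row["coordX"]), float(row["coordY"]), float(row["coordZ"]))
            negatives.append((row["seriesuid"], xyz))
    return negatives


# Usage with made-up rows; for the real file, pass open(path, newline="").
sample = io.StringIO(
    "seriesuid,coordX,coordY,coordZ,class\n"
    "scan-a,-56.08,-67.85,-311.92,0\n"
    "scan-a,53.21,-244.41,-245.17,1\n"
)
negatives = load_negative_candidates(sample)
```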

@ypflll
Author

ypflll commented Jun 6, 2017

Got it.
I am in Tianchi (a competition held by Alibaba in China). In my case, only the nodule information was given.
It seems that I first need to train a 3D U-Net to generate false positive samples.
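
Whichever model produces the detections, collecting false positives comes down to keeping confident detections that do not hit any annotated nodule. The sketch below assumes a detection counts as a hit when it lies within the nodule's radius of the nodule center; the function name, the tuple layout and the 0.5 cut-off are all assumptions for illustration.

```python
import math


def collect_false_positives(detections, nodules, min_prob=0.5):
    """Keep confident detections that miss every annotated nodule.

    detections: iterable of (x, y, z, prob).
    nodules: iterable of (x, y, z, radius) in the same coordinate system.
    """
    false_positives = []
    for x, y, z, prob in detections:
        if prob < min_prob:
            continue  # low-confidence detections are not useful hard negatives
        hit = any(math.dist((x, y, z), (nx, ny, nz)) <= r
                  for nx, ny, nz, r in nodules)
        if not hit:
            false_positives.append((x, y, z, prob))
    return false_positives
```

The resulting coordinates can then be cropped into cubes and added to the negative set for the next round of training.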

@juliandewit
Owner

Hi, I looked at the competition.
My Chinese is not too good :S

I do think this approach can be translated to that competition, since the #6 team of the Data Science Bowl is #1 now at your competition.

Good luck!

@ypflll
Author

ypflll commented Jun 10, 2017

Tianchi's English version lacks important information -_-||

4th place in Kaggle is now in first place in Tianchi.
So we need to do more.
Thanks.

@ypflll ypflll closed this as completed Jun 10, 2017