Summary of classification model performance on Task V576 DNAse #6

annashcherbina (issue author):
Google spreadsheet with all information:
https://docs.google.com/spreadsheets/d/1gfbolLoB1o_oRHjGdV6ht5eTIjFHtTjIivlxsLugvp4/edit?usp=sharing

Performance matrix of models:
Yellow highlighting indicates performance on the training dataset; absence of yellow highlighting indicates performance on the test dataset.
Blue text indicates performance on "version 1" of the data labels (i.e., negatives from peaks present in other CRC samples but absent in the current CRC sample).
Green text indicates performance on "version 2" of the data labels (i.e., negatives from ENCODE DNase summits minus colon-specific data).
[image: performance matrix]
<https://user-images.githubusercontent.com/5261545/40138168-fc63dfcc-5900-11e8-8d5d-a1e5aca87c9c.png>

Loss curves for the most promising models in the data matrix:
[image: loss curves]
<https://user-images.githubusercontent.com/5261545/40137489-65c5e944-58ff-11e8-8c12-5422f64b31ff.png>

Interestingly, it appears the baseline models are actually outperforming the models with GC-balanced negative sets, dinucleotide-balanced negative sets, or reverse complements added.

Next steps: try regression models.

Comments

Anshul Kundaje:
These loss curves make no sense. Something is wrong with the training procedure or the model spec.

annashcherbina:
Yes, it seems to basically be overfitting to the training data. Hence I am hoping that some of the other architectures (i.e., those from Surag & Jacob) will be less prone to the overfitting problem. I will move away from Basset for this dataset.

Anshul Kundaje:
I'm pretty sure it's not the architecture. There seems to be something more fundamental going wrong here (an initialization or learning rate problem, or one of those silent row/column transposition errors). Can you triple-check how you are constructing the training and test sets? The model is either not learning in some of the plots, or it's just overfitting like crazy. This dataset is really not that different from any other DNase dataset; there is no reason why an architecture that works well almost universally should be failing miserably here.

We should ask Surag to train a model from scratch on your data to see what he gets, so we have an independent sanity check.
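
The "silent row/column transposition" failure mode is cheap to probe directly. Below is a minimal sketch of such a sanity check, assuming one-hot sequences of shape (N, seq_len, 4) and a binary label matrix of shape (N, n_tasks); the helper name is illustrative, not from this repo:

```python
import numpy as np

def sanity_check_batch(X, y, n_tasks):
    """Cheap invariants that catch silent transposition/misalignment bugs.

    Assumes X is one-hot DNA of shape (N, seq_len, 4) and y is a
    binary label matrix of shape (N, n_tasks). Sequences with N bases
    (all-zero columns) would need a relaxed one-hot check.
    """
    # One-hot encoding: exactly one channel set per position.
    assert X.ndim == 3 and X.shape[2] == 4, f"unexpected X shape {X.shape}"
    assert np.all(X.sum(axis=2) == 1), "rows are not one-hot; axes may be swapped"

    # Labels: one row per sequence, binary values only.
    assert y.shape == (X.shape[0], n_tasks), f"unexpected y shape {y.shape}"
    assert set(np.unique(y)) <= {0, 1}, "labels are not binary"

    # A transposed label matrix often shows up as wildly skewed class balance.
    print("positive fraction per task:", np.round(y.mean(axis=0), 3))

# Example with fake data: 8 sequences of length 10, 2 tasks.
rng = np.random.default_rng(0)
X = np.eye(4)[rng.integers(0, 4, size=(8, 10))]
y = rng.integers(0, 2, size=(8, 2))
sanity_check_batch(X, y, n_tasks=2)
```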

annashcherbina:
I've triple-checked by running the same code on the DMSO & het data and reproducing the top-scoring performance on those models with this workflow & the basic Basset architecture. Hence, I'm inclined to think it's not a bug. I can use Surag's code to train the model; that is the next thing on my to-do list.

Anshul Kundaje:
Can you post the learning curves for those datasets?

annashcherbina:
The weird thing is that the baseline performance is actually quite good. And the dataset for the baseline model was generated with the same approach to determining negatives as the het & DMSO data -- those performance values actually match quite closely. The new negative set is what's primarily driving the huge drop in auPRC & recallAtFDR50. Is it possible the new negative set is actually more difficult to learn than the old one?

Anshul Kundaje:
It can't be. The original negatives were from other CRC samples, whereas the new negatives are from totally different cell types. There is no way it can be harder than the original set, because the negatives will contain totally different motif patterns. Also, dinucleotide- and GC-matched negatives should be trivial to classify against; you should be getting very high auPRCs for those runs. That's why I am convinced there is something fundamentally wrong -- it's not the architecture, for sure.

Anshul Kundaje:
Let's simplify this. Train the following models for a single task:

- Positives are centered at DNase peaks from the task. Include reverse complements.
- Negative set 1a: a balanced negative set (same number as positives) consisting of regions sampled from the genome that are GC-matched to the positives. Let training and test be balanced. No need to oversample positives or weight the loss in any way.
- Negative set 1b: same as 1a, but this time the negative set should be 5x the positive set. Let training and test have the same imbalance. Here you should train with random sampling and also with oversampling positives in each minibatch, and compare.
- Negative sets 2a and 2b: analogous to 1a and 1b, but with dinucleotide-matched negatives sampled from the genome instead of GC-matched ones.

1a and 2a should be trivial to beat. 1b and 2b should be a little harder, but still easy to beat.
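
For the GC-matching in 1a/1b, one common recipe is to bin candidate regions by GC fraction and draw negatives to match the positives' bin histogram. A minimal sketch under stated assumptions (pyfaidx for FASTA access; all function and parameter names here are illustrative, not from this repo):

```python
import numpy as np
from pyfaidx import Fasta  # assumption: genome available as an indexed FASTA

def gc_fraction(seq):
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / max(len(seq), 1)

def sample_gc_matched_negatives(genome_fa, positives, width=1000,
                                n_candidates=100000, n_bins=20, seed=0):
    """Draw one genomic negative per positive, matched on GC fraction.

    positives: list of (chrom, start, end) tuples for the peak set.
    A real implementation would also exclude candidates that overlap
    any peak; that filter is omitted here for brevity.
    """
    rng = np.random.default_rng(seed)
    genome = Fasta(genome_fa)
    bins = np.linspace(0, 1, n_bins + 1)

    def gc_bin(gc):
        # Clip so GC == 1.0 still lands in the top bin.
        return min(max(int(np.digitize(gc, bins)) - 1, 0), n_bins - 1)

    # Bin each positive by its GC fraction.
    pos_bins = [gc_bin(gc_fraction(str(genome[c][s:e])))
                for c, s, e in positives]

    # Sample random candidate regions and bucket them by GC bin.
    chroms = list(genome.keys())
    buckets = {b: [] for b in range(n_bins)}
    for _ in range(n_candidates):
        c = chroms[rng.integers(len(chroms))]
        start = int(rng.integers(0, max(len(genome[c]) - width, 1)))
        gc = gc_fraction(str(genome[c][start:start + width]))
        buckets[gc_bin(gc)].append((c, start, start + width))

    # For each positive, pop one candidate from the matching GC bin.
    negatives = []
    for b in pos_bins:
        if buckets[b]:
            negatives.append(buckets[b].pop())
    return negatives
```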

annashcherbina:
Here's what I'm getting for the het model losses with the same code & basic Basset. I used ENCODE initializations for the model, and it was a multi-tasked model:
[image: het loss curves]
<https://user-images.githubusercontent.com/5261545/40152459-217cb4d0-593a-11e8-9950-7afaa53a9394.png>

And for DMSO with the same code & basic Basset, also with ENCODE initializations and a multi-tasked model:
[image: DMSO loss curves]
<https://user-images.githubusercontent.com/5261545/40152592-b189babe-593a-11e8-9f0a-14c960a97df1.png>

The average performance values across tasks were:

Hets
[image: het performance]
<https://user-images.githubusercontent.com/5261545/40152711-2404ffea-593b-11e8-890b-750ca9ec5b00.png>
[image: het performance]
<https://user-images.githubusercontent.com/5261545/40152724-352b18f4-593b-11e8-899a-f9780c2e454a.png>

DMSO
[image: DMSO performance]
<https://user-images.githubusercontent.com/5261545/40152641-f0a46ff0-593a-11e8-93a4-c1fc03a02654.png>
[image: DMSO performance]
<https://user-images.githubusercontent.com/5261545/40152690-12852d26-593b-11e8-9273-2c4e910edd5e.png>

Anshul Kundaje:
It's the same problem. The validation loss curves are really bad.

annashcherbina:
Yes, but it's the best performance we've gotten on these two datasets...

Anshul Kundaje:
Let's debug using the GC- and dinucleotide-matched negative experiments I suggested before. If the validation curves on those are not well behaved, we know for sure there is something wrong.

annashcherbina:
By best performance, I mean the best performance from all hyperparameter searches & applications of Basset to the data. This is why I am a fan of moving on to other architectures.

Anshul Kundaje:
I can assure you the issue is not the architecture here. It's something more fundamental. There is a bug somewhere in the model spec, the evaluation code, or something else. Loss curves should never look like this; it means there is something fundamentally wrong.

annashcherbina:
That would explain why the performance (auPRC/recallAtFDR50) was the best we've attained to date, even with the loss curves showing no improvement beyond the first epoch.

Anshul Kundaje:
This looks like a classic case of the learning rate being too high:
http://cs231n.github.io/neural-networks-3/
If the network has a strong initialization, you should be using a low learning rate to fine-tune the model.

Also, do you use early stopping, and if so, what metric are you using for it? Is the performance you are reporting from the end of all epochs in the plots, or is it based on the best epoch (e.g., the end of epoch 1)?
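
To make the suggestion concrete: fine-tuning from strong (e.g., ENCODE-pretrained) weights usually means a small learning rate, and early stopping should checkpoint the best epoch rather than report the last one. A minimal 2018-era Keras sketch under those assumptions; `build_basset_model` and the data arrays are placeholders, not this repo's actual code:

```python
from keras.optimizers import Adam
from keras.callbacks import EarlyStopping, ModelCheckpoint

model = build_basset_model()                 # placeholder model constructor
model.load_weights("encode_pretrained.h5")   # assumed pretrained initialization

# Low learning rate for fine-tuning a strongly initialized model.
model.compile(optimizer=Adam(lr=1e-4), loss="binary_crossentropy")

callbacks = [
    # Stop when validation loss hasn't improved for 5 epochs...
    EarlyStopping(monitor="val_loss", patience=5),
    # ...and keep the weights from the best epoch, not the last one.
    ModelCheckpoint("best_model.h5", monitor="val_loss", save_best_only=True),
]

# X_train, y_train, X_valid, y_valid: the encoded data (not shown).
model.fit(X_train, y_train,
          validation_data=(X_valid, y_valid),
          epochs=50, batch_size=256, callbacks=callbacks)

# Evaluate the checkpointed best epoch so reported metrics match the curves.
model.load_weights("best_model.h5")
```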

annashcherbina:
I thought that the model achieving auPRC ~ 1 on all four toy datasets was an indication that everything is working fine. So I guess I'm confused about what problem we are trying to solve. What do we expect the loss curves to look like? I'm not sure I understand why the ones we are observing differ from what usually shows up in the literature. Early stopping is triggered by no drop in validation loss for 5 consecutive epochs. The performance I am reporting is based on early stopping (i.e., generally after the first full epoch, a pass through 700,000 training examples). The learning rate used for the "toy" models was 0.001. I have used learning rate values ranging from 0.00001 to 0.01, with no major change in auPRC for the GECCO datasets. I can post performance values for those, but they are within a few percent of the ones above.
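
For reference on the two metrics this thread leans on: auPRC is the area under the precision-recall curve, and recall-at-FDR50 is the highest recall achieved at any threshold where FDR (i.e., 1 - precision) stays at or below 50%. A minimal sklearn sketch of that reading of the metrics, not necessarily this repo's exact implementation:

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve

def auprc_and_recall_at_fdr(y_true, y_score, fdr=0.5):
    """auPRC plus the best recall at any threshold with FDR <= fdr."""
    auprc = average_precision_score(y_true, y_score)
    precision, recall, _ = precision_recall_curve(y_true, y_score)
    ok = precision >= (1.0 - fdr)          # FDR = 1 - precision
    recall_at_fdr = recall[ok].max() if ok.any() else 0.0
    return auprc, recall_at_fdr

# Example with fake scores.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)
y_score = y_true * 0.5 + rng.random(1000) * 0.7   # noisy but informative
print(auprc_and_recall_at_fdr(y_true, y_score, fdr=0.5))
```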

annashcherbina:
For example, with all parameters kept constant but the learning rate varied: the graph on the left is for LR = 0.001 (this is on the full dataset, not the toy datasets).
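
A controlled comparison like this is easiest to read when everything but the learning rate varies inside one loop. A small sketch along those lines, reusing the hypothetical Keras setup from the earlier sketch (`build_basset_model` and the data arrays remain placeholders):

```python
from keras.optimizers import Adam

histories = {}
for lr in [1e-5, 1e-4, 1e-3, 1e-2]:
    model = build_basset_model()    # placeholder model constructor, as above
    model.compile(optimizer=Adam(lr=lr), loss="binary_crossentropy")
    hist = model.fit(X_train, y_train,
                     validation_data=(X_valid, y_valid),
                     epochs=20, batch_size=256, verbose=0)
    histories[lr] = hist.history["val_loss"]

# Print the validation-loss trajectories side by side; with everything
# else fixed, differences between rows are attributable to LR alone.
for lr, losses in sorted(histories.items()):
    print(lr, " ".join("%.4f" % v for v in losses))
```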

annashcherbina:
Updated learning curves for the negative datasets. The blue box on the learning-curve graphs indicates the stopping epoch for which "trained" accuracy is reported:

GC-balanced, 1 neg : 1 pos
[image: learning curve]
<https://user-images.githubusercontent.com/5261545/40324265-dd0dfe56-5cec-11e8-80b1-2022024cf3de.png>

GC-balanced, 5 neg : 1 pos
I tried lr = 0.001 and lr = 0.0001 for this negative set; though the curve for the lower lr looks more like what we'd like to see, the auPRC is higher for the higher lr.
[image: learning curve]
<https://user-images.githubusercontent.com/5261545/40324382-351fa9e6-5ced-11e8-9157-8524b8f11630.png>

Dinucleotide-balanced, 1 neg : 1 pos
[image: learning curve]
<https://user-images.githubusercontent.com/5261545/40324542-a7cc12e0-5ced-11e8-88a3-405d4c8f78c1.png>

Dinucleotide-balanced, 5 neg : 1 pos
[image: learning curve]
<https://user-images.githubusercontent.com/5261545/40324706-28f6c00e-5cee-11e8-9fc5-54c73c621a67.png>

Negatives from (ENCODE - CRC cell types), 10 neg : 1 pos
[image: learning curve]
<https://user-images.githubusercontent.com/5261545/40324797-7f7dc350-5cee-11e8-84ee-52da40f509e7.png>

Negatives for V576 DNase from the Scacheri sample matrix, 10 neg : 1 pos
[image: learning curve]
<https://user-images.githubusercontent.com/5261545/40324902-db9cf25a-5cee-11e8-99da-c65cd865732e.png>

Anshul Kundaje:
OK great. That looks good. Now you can try switching to the residual architecture from Surag; Mahfuza said it gave her a boost on her data as well. Then we can switch to regression.