Multivariate-Tweedie for age/length compositional likelihood #266

Open
6 of 11 tasks
James-Thorson-NOAA opened this issue Mar 7, 2022 · 34 comments
Labels
composition Epic: Data weighting improvement Group of issues/user stories related to improving data weighting in progress This is being worked on in a branch statistics related to logL wishlist request new feature; bigger than revision; OK to remove after adding to Milestone

@James-Thorson-NOAA

James-Thorson-NOAA commented Mar 7, 2022

I'm leading a paper with Tim Miller and Brian Stock, introducing a new "multivariate-Tweedie" likelihood for fitting age/length composition data. It is essentially a more flexible alternative to the Dirichlet-multinomial (DM), and appears to have some useful pros and cons relative to the DM. It internally relies upon a dtweedie likelihood, which Johnoel is adding to ADMB-13, so it should be feasible to implement in SS3 by this summer.

Does anyone have sufficient time and interest to work with me to get it implemented (both in logical code, and with a thoughtful user-interface) in SS3?

[Edit: task list below added by Ian on March 24.]
To-do list for MV Tweedie likelihood based on comments above:

@k-doering-NOAA k-doering-NOAA added help wanted Attention Rick - please look misc. internal calc wishlist request new feature; bigger than revision; OK to remove after adding to Milestone labels Mar 7, 2022
@iantaylor-NOAA
Contributor

I don't think I can do this without help from others on the SS3 team, but am happy to be involved.

@zandjyo

zandjyo commented Mar 23, 2022

I am all in support of anything that gets us away from the art involved in choosing input sample sizes and data weighting. Probably not much help on the logical code side of things, but could put some thought into user-interface and diagnostics once I get my head around what you are doing.

@Rick-Methot-NOAA
Collaborator

Jim,
The interface should be a straightforward extension of current interface for selecting DM. I can assure that will happen.

@James-Thorson-NOAA
Author

yeah, it'll have two parameters instead of one ... if someone could make a dev branch or something with those named parameters, I could do a PR with the added code. Or is there some other preferred way to proceed ... perhaps instead I could just provide the code snippet for the likelihood in this Issue thread?

@Rick-Methot-NOAA
Collaborator

Branch MV-Tweedie-complike has been created and you can do a PR to it.
Search read_data and SS_objfun for the text MV_Tweedie to get started adding code.

@Rick-Methot-NOAA
Collaborator

I did not intend to fast-track this. I just wanted there to be a branch on which development could happen. Only git collision I foresee is with the request to rename the D-M parameters.

@James-Thorson-NOAA
Author

By read_data I assume you mean "SS_readdata_330.tpl" ... I just opened this and can't easily make sense of it. So a few questions:

  1. Would it still be helpful if I can make the changes in SS_objfun but not SS_readdata_330?
  2. If yes to Q1, could someone provide me with syntax to extract the parameters from the various inputs, e.g., analogous to how the DM does it using these lines:
        if(Comp_Err_L(f)==1) dirichlet_Parm=mfexp(selparm(Comp_Err_Parm_Start+Comp_Err_L2(f)))*nsamp_l(f,i);
        if(Comp_Err_L(f)==2) dirichlet_Parm=mfexp(selparm(Comp_Err_Parm_Start+Comp_Err_L2(f)));

...?
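
As a reading aid, the two DM lines quoted above differ only in whether the exponentiated parameter is multiplied by the input sample size. Here is a minimal Python sketch of that branching. The function name `dirichlet_parm` and its arguments are hypothetical stand-ins for the TPL variables, and `math.exp` stands in for ADMB's `mfexp`.

```python
import math


def dirichlet_parm(comp_err_option, log_theta, nsamp):
    """Return the DM parameter for one observation (hypothetical helper).

    comp_err_option mirrors Comp_Err_L(f), log_theta mirrors the selparm
    entry, and nsamp mirrors nsamp_l(f,i) in the quoted lines.
    """
    theta = math.exp(log_theta)  # stand-in for mfexp (assumption)
    if comp_err_option == 1:
        return theta * nsamp  # option 1: scaled by the input sample size
    if comp_err_option == 2:
        return theta  # option 2: used directly
    raise ValueError("expected comp_err_option 1 or 2")
```

For example, `dirichlet_parm(1, 0.0, 100.0)` gives a parameter equal to the input sample size, because `exp(0)` is 1.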

@Rick-Methot-NOAA
Collaborator

Jim,
Please start by working in SS_objfun.tpl. When you push that, I can see what parameters need to be created in SS_readdata330.tpl, and then named in ss_readcontrol330.tpl. Everything else should be handled automatically (reporting and write_ssnew).

@iantaylor-NOAA
Contributor

The potential conflict noted by Rick is with issue #185 where there's a proposal to add information about which fleets use the parameter to the label, such as "ln(DM_theta)_Age_P7(2-3-4)" instead of the status-quo "ln(DM_theta)_7" for the 7th DM parameter. However, that change is low priority so can wait until MV-Tweedie is developed in parallel with some similar naming convention.

Is it possible, and would it ever be a good idea, to share one of these weighting parameters among a mix of length and age data? If so, we should dial back the proposal to add "Age" to the label.
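
To make the two label styles quoted above concrete, here is a small Python sketch that builds both. The helper `dm_label` is hypothetical, and reading "P7" as the parameter number and "(2-3-4)" as the fleets sharing it is an inference from the single example in this comment, not a confirmed SS3 convention.

```python
def dm_label(index, data_type=None, fleets=None):
    """Build a DM parameter label (hypothetical helper, format inferred)."""
    if data_type is None or not fleets:
        # Status-quo style: only the parameter number.
        return f"ln(DM_theta)_{index}"
    # Proposed style: data type, parameter number, and the fleets that use it.
    fleet_text = "-".join(str(fleet) for fleet in fleets)
    return f"ln(DM_theta)_{data_type}_P{index}({fleet_text})"
```

If a parameter could be shared between length and age data, the `data_type` piece would need to be dropped or generalized, which is the concern raised above.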

@James-Thorson-NOAA
Author

OK, so I just did a PR ... there's plenty still to do, and the likelihood calculation no doubt would benefit from some extra eyes from people who better understand the standard pitfalls. In particular dtweedie is probably not vectorized so I added a loop across the vector of comps. Also, it obviously needs help extracting parameters correctly (two for mvtweedie compared with one for DM).

https://github.com/nmfs-stock-synthesis/stock-synthesis/pull/278/files
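
For readers who want to see the shape of the per-bin loop described above without compiling ADMB, here is a minimal Python sketch. It is not the code in the PR. The names `tweedie_logpdf` and `mvtweedie_loglik` are hypothetical, the density is the compound Poisson-gamma series for 1 < p < 2 rather than ADMB's dtweedie, the series truncation is a simple heuristic, and scaling both observed and expected proportions by the input sample size is an assumption about the parameterization.

```python
import math


def tweedie_logpdf(y, mu, phi, p):
    """Log density of a Tweedie variable with mean mu, dispersion phi, power p.

    Valid for 1 < p < 2, where the Tweedie is a Poisson sum of gamma variables.
    """
    if not (1.0 < p < 2.0) or mu <= 0.0 or phi <= 0.0 or y < 0.0:
        raise ValueError("need 1 < p < 2, mu > 0, phi > 0, y >= 0")
    lam = mu ** (2.0 - p) / (phi * (2.0 - p))  # Poisson rate
    shape = (2.0 - p) / (p - 1.0)  # gamma shape per Poisson event
    scale = phi * (p - 1.0) * mu ** (p - 1.0)  # gamma scale
    if y == 0.0:
        return -lam  # probability of zero Poisson events
    # Truncate the series well past its largest term (heuristic cutoff).
    n_mode = y ** (2.0 - p) / (phi * (2.0 - p))
    n_max = int(3.0 * n_mode) + 50
    terms = [
        -lam + n * math.log(lam) - math.lgamma(n + 1.0)  # Poisson(n)
        + (n * shape - 1.0) * math.log(y) - y / scale  # Gamma(y), shape n*shape
        - n * shape * math.log(scale) - math.lgamma(n * shape)
        for n in range(1, n_max + 1)
    ]
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


def mvtweedie_loglik(nsamp, obs_comp, exp_comp, phi, p):
    """Sum the Tweedie log density over bins, one scalar call per bin."""
    return sum(
        tweedie_logpdf(nsamp * obs, nsamp * exp, phi, p)
        for obs, exp in zip(obs_comp, exp_comp)
    )
```

The two extra arguments `phi` and `p` correspond to the two parameters mentioned above, compared with the single DM parameter.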

@iantaylor-NOAA
Contributor

iantaylor-NOAA commented Mar 24, 2022

Jim, nice job getting this together. In addition to the additional work that Jim has noted, we will need to install the dev branch of ADMB to get the dtweedie function https://github.com/admb-project/admb/blob/dev/src/linad99/dtweedie.cpp (which indeed seems to not be vectorized).

I'm too tired from a week of workshop to do any real work on this but I'll make a to-do list below to keep track of the next steps.

@k-doering-NOAA, it's probably a good idea to be testing ADMB 13 prior to its release for 3.30.19 to make sure there are no compile errors or changes in model results. Would it be possible to add a GitHub action to do so?

[edit: task list moved to top comment on this issue so it triggers a counter and also doesn't get buried]

@Rick-Methot-NOAA
Collaborator

Nice work Ian and Jim.
I will work on the readdata and readcontrol aspects.
For #219, I intend to look into modularizing this convoluted section of code that has lots of repeated elements.

@iantaylor-NOAA iantaylor-NOAA added this to the 3.30.20 milestone Mar 24, 2022
@iantaylor-NOAA
Contributor

Rick, that sounds great.

Question on the workflow for a change like this (which I just assigned to the 3.30.20 milestone): there's a pull request associated with this issue (#277 from MV-Tweedie-complike into main). We could either leave that open until all this is done, or just create the PR once everything is in place.

@k-doering-NOAA, does leaving the PR open trigger more checks, or do we get those for the branch regardless?
An open PR also gives people notifications when changes are pushed to that branch, which could be either good or bad depending on how many notifications we all get.

@Rick-Methot-NOAA
Collaborator

Perhaps I erred in doing it this way. I thought that use of a PR was the right way to initiate collaboration for a new feature branch. I did it mostly to give Jim something to work with and did not intend to push (sic) this issue to top of the priority stack for everyone, especially because we do not even have tweedie in our ADMB version yet.

@Rick-Methot-NOAA
Collaborator

Rick-Methot-NOAA commented Mar 24, 2022

Didn't realize that Jim was not yet part of the SS team, so just invited him into NOAA_contributors.
Looked at the parameter I/O. This will be a bit complicated because the current logic for counting parameters depends entirely on there being one parameter and does not generalize easily to two, so this will take a bit of time. While doing so, we should anticipate how to include generalized sizecomp DM and MVT options.

@iantaylor-NOAA
Contributor

Good solution to add Jim to the contributors who can push changes within the organization.
I think we can probably close the pull request and all contribute to the new branch until it's ready, then open a new PR.
@k-doering-NOAA may have additional ideas on the optimal workflow when she logs in tomorrow.

@k-doering-NOAA
Contributor

Wow, lots of action on this topic, this is great! A few thoughts.

  1. Agree it would be good to test ADMB 13 separately. Since it is not out yet, I don't think we should be using it for the 3.30.19 release, but I can set up a job that runs the dev version of admb for testing out MV tweedie changes.
  2. No need to leave the PR open, as I think all of the jobs should run when pushing to the branch anyway. We could close it, or mark the PR as "draft" to indicate it is not yet done. Either way works!

@k-doering-NOAA
Contributor

k-doering-NOAA commented Mar 28, 2022

Just wanted to say, setting up the GHA to test out admb dev is on my to do list! However, we are working on release 3.30.19, so that will come first. I hope to get to the admb dev github action by next week.

@Rick-Methot-NOAA Rick-Methot-NOAA added in progress This is being worked on in a branch composition statistics related to logL and removed help wanted Attention Rick - please look misc. internal calc labels Apr 12, 2022
@k-doering-NOAA
Contributor

k-doering-NOAA commented May 5, 2022

> Just wanted to say, setting up the GHA to test out admb dev is on my to do list! However, we are working on release 3.30.19, so that will come first. I hope to get to the admb dev github action by next week.

I did try this and was not able to successfully compile admb on github actions. It may be easiest to wait until there is a release for admb 13. In the meantime, I did compile it locally on my windows machine, so perhaps just compiling locally will be ok?

@Rick-Methot-NOAA
Collaborator

But we still could not continue development of MVT because any commit would trigger a gha which would fail when it tried to compile the MVT command.

@k-doering-NOAA
Contributor

I'm confused, could the failing jobs on the branch just be ignored for now? I think we would want them to pass before merging the branch into main, but I'm hoping ADMB 13 will be out before then, so I can modify our testing suite to use it.

@Rick-Methot-NOAA
Collaborator

What I mean is that we cannot proceed with development of a branch that includes a call like do_MVT(). Even though we could compile SS3 locally using ADMB 13, compiles using GHA will fail because the do_MVT() routine does not exist in ADMB 12.3.

@k-doering-NOAA
Contributor

I get that the do_MVT() routine doesn't exist in ADMB 12.3, so all the github actions would fail. I guess I still don't understand why that would impede development?

@Rick-Methot-NOAA
Collaborator

I suppose we still could do all MVT development locally, but we created GHA partly so we could test frequently.

@k-doering-NOAA
Contributor

Ah, ok, I see the concern. Maybe it would be best to wait on this until ADMB 13 (or its prerelease) comes out.

@James-Thorson-NOAA
Author

Hi all,

I don't know where this landed, but I saw that ADMB-13 is now released, and also the MVTW paper is now accepted @ ICESJMS.

Is there anything I can do, or perhaps you could tell me when you've started switching over to ADMB-13?

Jim

@Rick-Methot-NOAA
Collaborator

There is a branch with a PR in which a placeholder to Tweedie is implemented, as well as several changes to how D-M is implemented.
We have installed ADMB13 locally and tested SS3 with it. All looks good.
I'll get back to you in a week with an update on when we are ready to move to next stage of implementing Tweedie.

@Rick-Methot-NOAA Rick-Methot-NOAA modified the milestones: 3.30.20, 3.30.21 Sep 9, 2022
@James-Thorson-NOAA
Author

Kathryn or Ian,

Can either of you (or someone else) point me at the SS3 code snippet where the logical code for the mvtweedie would be added (presumably by me)?

I'm happy to take a stab at this in the next couple weeks.

Jim

@iantaylor-NOAA
Contributor

Hi @James-Thorson-NOAA,
The length comp likelihood for the MV-tweedie is implemented here (thanks to you): https://github.com/nmfs-stock-synthesis/stock-synthesis/blob/main/SS_objfunc.tpl#L376-L401. There are placeholders in that file for the age and generalized size-comp likelihoods. I'm assuming that @Rick-Methot-NOAA is planning to move the likelihood into a generalized function for all data types in this section: https://github.com/nmfs-stock-synthesis/stock-synthesis/blob/main/SS_miscfxn.tpl#L106-L122 (where the multinomial and Dirichlet-multinomial are now located).

I'm not sure what other changes are needed to implement this. I checked off the items for the data and control file pieces in the checklist at the top of this issue as it looks like they are complete.

@Rick-Methot-NOAA
Collaborator

Ian covered the new situation well. Suggest creating a new branch if you work on the full Tweedie implementation. Adding Elizabeth for awareness. @e-gugliotti-NOAA

@James-Thorson-NOAA
Author

Sorry, who's supposed to do something next?

Rick -- It looks like the mvtweedie likelihood is there for length-comps (except needing to uncomment the calculations). Are you planning to move likelihood calls to some other place (where the code isn't duplicated)?

Ian -- are you willing to do a quick test of the uncommented likelihood for a length-comp model? I don't think I know how to compile the current SS3 code.

@Rick-Methot-NOAA
Collaborator

The Tweedie code will move into ss_miscfxn.tpl, similar to the function: Comp_logL_Dirichlet()

@Rick-Methot-NOAA
Collaborator

I will finish the implementation, then pass to Jim for testing.

The code is in main and I will work from there.

The FUNCTION in ss_miscfxn.tpl is:
FUNCTION dvariable Comp_logL_Dirichlet(const double& Nsamp, const dvariable& dirichlet_Parm, const dvector& obs_comp, const dvar_vector& exp_comp)
  {
  dvariable logL;
  logL = sum(gammln(Nsamp * obs_comp + dirichlet_Parm * exp_comp)) - sum(gammln(dirichlet_Parm * exp_comp));
  return (logL);
  }
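
For anyone checking the function above against the textbook Dirichlet-multinomial, here is a minimal Python sketch. `comp_logl_dirichlet` mirrors the two gammln sums in the TPL function. `dm_full_loglik` is a hypothetical helper that adds the standard terms that do not involve both composition vectors. Whether and where SS3 adds those remaining terms is not shown in this thread, so treat that part as an assumption.

```python
import math


def comp_logl_dirichlet(nsamp, dirichlet_parm, obs_comp, exp_comp):
    """Python mirror of the two gammln sums in Comp_logL_Dirichlet."""
    return sum(
        math.lgamma(nsamp * obs + dirichlet_parm * exp)
        for obs, exp in zip(obs_comp, exp_comp)
    ) - sum(math.lgamma(dirichlet_parm * exp) for exp in exp_comp)


def dm_full_loglik(nsamp, dirichlet_parm, obs_comp, exp_comp):
    """Textbook DM log probability of the counts nsamp * obs_comp.

    Assumes exp_comp sums to 1, so dirichlet_parm is the sum of the
    Dirichlet concentration parameters dirichlet_parm * exp_comp.
    """
    const = (
        math.lgamma(nsamp + 1.0)
        + math.lgamma(dirichlet_parm)
        - math.lgamma(nsamp + dirichlet_parm)
        - sum(math.lgamma(nsamp * obs + 1.0) for obs in obs_comp)
    )
    return const + comp_logl_dirichlet(nsamp, dirichlet_parm, obs_comp, exp_comp)
```

A quick sanity check is that `exp(dm_full_loglik(...))` summed over every possible count vector for a small sample size comes out at 1.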

@James-Thorson-NOAA
Author

OK thanks Rick!

PS: I posted a comment where I was confused about where to find files. I then deleted the comment because I realized it was due to working on my own fork which was out-of-date. I'm definitely struggling a bit to sort through the different file structures across branches and forks :0

@Rick-Methot-NOAA Rick-Methot-NOAA modified the milestones: 3.30.21, 3.30.22 Jan 26, 2023
@Rick-Methot-NOAA Rick-Methot-NOAA modified the milestones: 3.30.22, 3.30.23 Jun 7, 2023

5 participants