Publish original source code for covid-sim #144

Closed
bitcartel opened this issue Apr 28, 2020 · 12 comments

Comments

@bitcartel

Please publish the original source code and any required input data so that the public can run the simulator and reproduce any results which were provided to assist the British Government in determining its COVID-19 policy decisions. I believe it would be in the public interest for this code to be made available. Thank you.

Please publish:

  1. Full history as documented in the first commit of the current repository.

  2. Original C code before it was updated by Microsoft, GitHub and others:
    https://twitter.com/neil_ferguson/status/1241835454707699713

  3. Any Fortran code which was transpiled to C:
    https://twitter.com/ID_AA_Carmack/status/1254872368763277313

@alecmocatta

alecmocatta commented May 6, 2020

I know this isn't the original code, but it is more original than the April 22 squash most people are aware of.

The history from April 1 to the squash on April 22 is here: https://github.com/mrc-ide/covid-sim/commits/1cd06e1ce20d516fb96f73990c04b2defe1b063f

and the import on April 1 is here: https://github.com/mrc-ide/covid-sim/tree/7282c948b940c8bd90d6afaa1575afb3848aa8b5

@leolara

leolara commented May 7, 2020

I think the issue refers to the original single C file of roughly 15k lines, with some functions that look like they were machine-translated from Fortran. The links provided in the previous comment do not seem to provide that.

@bitcartel
Author

Regarding the squashed GitHub history, @alecmocatta, how did you find the commits in the first place?

The orphaned commits are not on any branch in the repository, so a normal clone does not contain them and they can't be referenced locally:

git show 1cd06e1ce20d516fb96f73990c04b2defe1b063f
fatal: bad object 1cd06e1ce20d516fb96f73990c04b2defe1b063f

git show 7282c948b940c8bd90d6afaa1575afb3848aa8b5
fatal: bad object 7282c948b940c8bd90d6afaa1575afb3848aa8b5

However, the commits can now be retrieved manually from GitHub pull request #118, which was merged as commit 1cd06e1 above:

git fetch origin pull/118/head:unsquashed

As @leolara mentions above, the initial import is the modified code and not the original C code.
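Why `git show` reports `bad object` while `git fetch origin pull/118/head:unsquashed` still works can be sketched offline. The script below is a hypothetical, self-contained demonstration and is not run against covid-sim: a local bare repository stands in for GitHub, `refs/pull/1/head` stands in for the ref GitHub keeps for each pull request (such as #118), and the file and commit names are made up. It assumes only that the pre-squash commits remain reachable from that pull-request ref while no branch contains them.

```shell
#!/bin/sh
# Hypothetical offline sandbox: a bare repo plays the role of GitHub and
# refs/pull/1/head plays the role of a pull-request ref such as pull/118/head.
set -e
sandbox=$(mktemp -d)
cd "$sandbox"

# Throwaway identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# 1. The stand-in "origin", whose default branch will be main.
git init -q --bare origin.git
git --git-dir=origin.git symbolic-ref HEAD refs/heads/main

# 2. Push an unsquashed two-commit history to a PR-style ref only...
git init -q work
cd work
echo one > f && git add f && git commit -q -m "first"
echo two > f && git commit -q -am "second"
old=$(git rev-parse HEAD)
git push -q ../origin.git HEAD:refs/pull/1/head

# ...then publish a squashed main branch that shares none of those commits.
git checkout -q --orphan squashed
git commit -q -m "squashed import"
git push -q ../origin.git squashed:refs/heads/main
cd ..

# 3. A normal clone fetches branches and tags only, so the old commit is absent
#    (the "fatal: bad object" case). --no-local forces a real fetch instead of
#    copying the whole object store from the local path.
git clone -q --no-local origin.git clone
cd clone
if git cat-file -e "$old" 2>/dev/null; then
  echo "unexpected: orphaned commit already present" >&2
  exit 1
fi

# 4. Fetching the pull-request ref by name brings the history back.
git fetch -q origin pull/1/head:unsquashed
git log --oneline unsquashed
```

The last command should list the two pre-squash commits. The `--no-local` flag matters here: cloning a plain local path copies the entire object store, which would hide the effect that a network clone from GitHub shows.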

@alecmocatta

@bitcartel I saw #119 was the first merged PR since the squash, so I took the commit hash from #118.

@leolara I'm well aware of that. Like others, I'm interested in the original code, and the April 1 code is more original than the April 22 squash. As most people are unaware of the above trick, I thought I'd share it.

@weshinsley
Collaborator

weshinsley commented May 7, 2020

Brief comments on this, due to time and other priorities. We would prefer issues raised on GitHub to be things we can address more tangibly in the code, rather than requests of this sort. In reverse order:

(3) There is no such Fortran code; all the "native" C code needed to build the simulation is in the repo. John Carmack may be commenting on the RANLIB library we use (see the README), which was released in multiple flavours with some inter-language comments. The README has links to it.

(2) The code here is essentially the same functionally as that used for Report 9, and can be used to reproduce the results. The refactoring Microsoft and GitHub helped us with restructured and improved the layout of the code, with some documentation, to make it somewhat easier to scrutinise; it was carried out with regression tests against a reference result set to ensure that changes in code structure did not change behaviour. We do not think it would be particularly helpful to release a second codebase which is functionally the same, but in one undocumented C file. Our refactor aims to make it easier (with some effort, we acknowledge) for the open-source community to comment on the current live code, which is in use today.

Seeing as you have quoted John Carmack's Twitter, I hope you find his comments encouraging where he writes, "it turned out that it fared a lot better going through the gauntlet of code analysis tools I hit it with than a lot of more modern code. There is something to be said for straightforward C code. Bugs were found and fixed, but generally in paths that weren't enabled or hit."

(1) The history squash merged a number of changes we were making with large data files, making the repo rather easier to download. We do not think there is much benefit in trawling through our internal commit histories. Again, we would rather people focussed on the live code. If you wish to look through the undeleted branches in the repo, and use the method @alecmocatta points out, you are welcome to.

@fche

fche commented May 7, 2020

Apart from that, the request was also for all of the configuration input data that were used to construct Report 9, and all its subsequent versions.

@weshinsley
Collaborator

weshinsley commented May 7, 2020

Many tens of thousands of runs contributed to the spread of results in Report 9. Right now, I'm afraid the process for lay users is to pick values from the ranges described in the report, plug them into the example parameter files in the data folder, and build and run the executable following the instructions. This task will be somewhat easier for infectious disease epidemiologists to understand than for lay readers, as will the interpretation of the results. All this is indicated in the README.

Various approaches are underway to provide the kind of sample datasets people want and to present the science in more accessible, hands-on ways. But as stated in the README, we just don't have the bandwidth in current conditions to provide the personal support these kinds of request would need.

@fche

fche commented May 7, 2020

"Many tens of thousands of runs contributed ..."

Are there any input artifacts saved from these runs? If so, what reason is there for not simply sharing them? If not, then... well, that'd be very bad.

@weshinsley
Collaborator

weshinsley commented May 7, 2020

Only that there are tens of thousands of them. As I said, various approaches are underway to make those available in a sensible way.

@fche

fche commented May 7, 2020

Sounds fine, modern computers and version control systems like git have no problems with tens of thousands of files. Looking forward to their release.

@dstansby

dstansby commented May 7, 2020

"We do not think it would be particularly helpful to release a second codebase which is functionally the same, but in one undocumented C file."

If this is the code that was used to produce the original report, releasing it would make the results truly reproducible. If the functionality is the same, then why is there a reluctance to release this file?

@weshinsley
Collaborator

weshinsley commented May 7, 2020

The priority, for our limited time and resources, is on the current code, since that is in daily use. Many users are making constructive attempts to understand and critique that code, which we welcome. We do not feel that would be nearly as easy with the original code, and it would only be a distraction to be trying to answer questions about two code bases, one of them effectively deprecated.

We are working on how to best provide input parameters for users to reproduce the results we have produced so far. In the meantime, if users really are serious about this, we suggest they build the model and run some tests for themselves, by which time we may have progress on more input parameter files.

mrc-ide locked and limited conversation to collaborators on May 7, 2020

6 participants