
Re-open and unlock issue #144 (Publish original source code) #179

Closed
bitcartel opened this issue May 7, 2020 · 119 comments

@bitcartel

bitcartel commented May 7, 2020

Issue #144 has been prematurely closed as the original C source code has not been published.

Please re-open and unlock the issue so the community can provide feedback and discuss the comments made so far.

If a formal decision has been made to not release the original code, please confirm and document this in the comments of #144, rather than abruptly closing the issue.

@Feynstein

Feynstein commented May 8, 2020

You guys keep giving s*** to people who are probably low-wage grad students and post-docs. They decided to release the latest version they had on GitHub because they knew this was getting important, and I salute them for it. The original code probably doesn't exist anymore anyway. Under normal circumstances, who needs an epidemiology algorithm that is production-ready? No one in their right mind thought it would someday be urgently needed. None of the computer engineering-y guys here would ever have wanted to clean up this mess anyway. And that's the reality of things. People are focused on doing web-based server-client ruby fast-delivery html5 software engineering bull****.

It gets me really pissed to see all the comments that say use this or that production code technique. This kind of code never leaves the 8th basement of the university where it's kept by some post-doc dude who has too much to do in the actual lab to take the time to make all those improvements. And then suddenly the world needs it. I think some people here need to take a good hard look at themselves and ask why this is happening. Welp, because theoretical fields like this are under-funded, that's why.

And by under-funded I mean 15-20k per year in research grants to grad students… those who get them. I've been there, I know how it is. I'm an actual scientific software developer who learned from scratch, with a B.Sc. in physics and an M.Sc. in electrical engineering. Now, after a few years, I can do production-ready assisted defect recognition in x-ray inspection. When I first started I was working on code for radiation dose in radiation therapy without any real coding experience; that kind of code eventually gets cleaned up and reviewed before actually going into a clinical setting. And I made no real money while doing it, even though it might be used someday to save one of your a***s. All while studying, doing 40h a week of stupidly hard research work, and still wondering if you'll be able to pay rent the next month. People who make that kind of money don't give two ****s about the code. Meanwhile, all those undergrad computer science guys get fully paid internships. And if they get their Ph.D. there's no job for them in the market anyway… so they'll end up in the academia limbo forever.

I am personally used to using Geant4 for radiation physics, and it's almost the same thing! It has been going for yyeeeaaarrsss, and was eventually cleaned up because someday some company decided to use it. https://geant4.web.cern.ch/. This thing was being used for nuclear power plants before many of you could even walk, by the way.

I'm sorry for this, but it had to be said sometime.

@bitcartel
Author

The original code probably doesn't exist anymore anyway.

The original code does exist. Multiple external developers including Microsoft and GitHub were recently granted access to it. See issue #144 for more info.

@Feynstein

Feynstein commented May 8, 2020

Do you know what happened? I am used to this kind of thing. People at Microsoft and GitHub took the code and tried to do some refactoring, then they sent it here so that the open-source community could look at it. But no one at either of those companies is an epidemiologist... so no one really understood what was going on, which made an object-oriented pass over the original C code nearly impossible, and they didn't want the liability of it either. So yeah, better to let the original lab post it on GitHub and get all the flak from the community; that's what happened. By the way, this is probably already documented in the scientific literature... I'll see if I can find the original paper... Omg you guys... If I have to make a pass at this myself in order for everyone to stop arguing, I will.

No one with only a computer science background understands what's going on with the code I write. The reason is simple: software engineers simply don't have the mathematical and physics background for it. I could write the most documented code in the world, but if you can't understand the meaning of the Klein-Nishina differential cross section for Compton scattering, you won't have any clue, and I mean not one bit of understanding, about my data containers and architecture choices... even senior architects and developers, because they're not nuclear physicists. That's what is happening here. And I have already had, on multiple occasions, to explain to the computer science guys who integrate what I do why it might not always be 100% repeatable. Do you know anything about chaos theory or non-linear dynamics? When you have so many variables in the code, even the slightest bit of rounding error in a double can drastically change the outcome of the simulation. I'll leave this here for you guys while I look in academic papers for the original article. Take a good look at it. It's why meteorological predictions are bad. https://en.wikipedia.org/wiki/Chaos_theory
and especially this one: https://en.wikipedia.org/wiki/Butterfly_effect
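
As a concrete illustration of the rounding-error point (a generic floating-point demo, not code from this repository): summing the very same values in a different order, which is exactly what a differently scheduled parallel reduction does, already gives slightly different doubles, and a long non-linear simulation can amplify that difference.

```cpp
#include <cstdio>
#include <vector>

// Generic demonstration of floating-point order sensitivity (illustrative only):
// adding the same numbers in a different order yields a slightly different sum.
int main() {
    std::vector<double> v;
    for (int i = 1; i <= 1000000; ++i) v.push_back(1.0 / i);

    double forward = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) forward += v[i];

    double backward = 0.0;
    for (std::size_t i = v.size(); i-- > 0; ) backward += v[i];

    // Typically prints a small but non-zero difference.
    std::printf("forward  = %.17g\nbackward = %.17g\ndiff     = %.3g\n",
                forward, backward, forward - backward);
    return 0;
}
```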

It seems to me that a model needing 20 GB of RAM is probably complex enough to behave like some kind of chaotic system.

By the way, you can drop the repo etiquette now, you guys are out for blood. You don't actually care about the research...

@Feynstein

Feynstein commented May 8, 2020

Ok, I got it... I suggest you guys look at the methods on page 213; that explains it all, and you can basically reconstruct the original code from it.
ferguson2005.pdf
I got the paper from my university's library connection to Nature... so as soon as this issue is closed I will delete this comment.

I'm pretty sure the original code for this looks very similar to the one in the repo. I looked at it and it is pretty much a single file. It's clear to me that people from both Microsoft and GitHub didn't want to be involved in this; the liability, given all that has happened, is too much for them. They saw it and had the exact same reaction as you: "I won't touch this even with a hockey stick".

@Feynstein

Feynstein commented May 8, 2020

So yeah, conclusion: this is legit... this is the version that Microsoft and GitHub didn't want to be involved with...
Why?

  1. If you look at CovidSim.cpp, it's formatted like a one-pager C main program. No comments whatsoever... this is clearly the work of someone translating from Fortran to C. I know many other physicists whose summer internships were to translate such codes for use in fluid dynamics labs, for example.
  2. If you look at Kernels.cpp, this is clearly the work of someone who knows what they're doing; the good comments and the intricate use of OpenMP are the tip-off.
  3. You can see why the guys at GitHub and MS would have focused on performance issues, in order to get results quicker considering the ongoing pandemic. So it seems that only the "critical" parts of the code (like the kernels) were modified by people with visible experience.

These conclusions are based on my experience in both the academic programming setup and the continuous-delivery, agile-style scientific software production setup. I won't dox myself, but if anyone wants credentials just private message me.

What I would suggest, though (and I might file an issue on this), is replacing the self-written (hardcoded) random seeding... this is baaaaaaddddddd, like really bad. You'd better quickly switch to a proven, well-tested random number generator, like CLHEP: https://gitlab.cern.ch/CLHEP/CLHEP
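
A minimal sketch of what that switch could look like, using the C++ standard library's Mersenne Twister behind a tiny wrapper (the class and function names here are hypothetical, not taken from CovidSim; a CLHEP engine could sit behind the same interface):

```cpp
#include <cstdint>
#include <random>

// Hypothetical wrapper: route all sampling through one explicitly seeded,
// well-tested engine instead of a hand-rolled, hardcoded seeding routine.
class SimRng {
public:
    explicit SimRng(std::uint64_t seed) : engine_(seed) {}

    // Uniform double in [0, 1), the usual building block for sampling.
    double uniform() {
        return std::uniform_real_distribution<double>(0.0, 1.0)(engine_);
    }

    // Uniform integer in [0, n), e.g. for picking a random contact.
    std::uint64_t below(std::uint64_t n) {
        return std::uniform_int_distribution<std::uint64_t>(0, n - 1)(engine_);
    }

private:
    std::mt19937_64 engine_;
};
```

The point is less which engine you pick than that the seed becomes a single explicit input to the run instead of something buried in the setup code.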

@BenLubar

BenLubar commented May 8, 2020

You guys keep giving s*** to people that are probably low-wage grad students and post-doc.

Why are we trusting the lives of billions of people to a group of people you don't seem to think we can trust to publish some already-existing source code on GitHub?

@Feynstein

Feynstein commented May 8, 2020

That's a question that's out of scope here; you'd have to ask your own government for that answer. It's not their fault that this rusty and ugly piece of code was suddenly placed in the spotlight. It's not that it's a bad scientific algorithm; it's that it's not up to the standards used in industry. The decision to use it is not related in any way to the people who wrote it. And by the way, they're the only ones with such impressive modelling, and it really stands out from others that use SEIR-type models. It's pretty much the best thing made in this field... I read the paper carefully and there is a lot more attention to detail than I expected, really.

I'm not trying to shield the group from constructive criticism, far from it in fact. I want to give context as to why they're not releasing the original code. It's the exact same code but with the improvements coming from GitHub and MS. If you look carefully at #144, their last answer especially, you will see that releasing the original code is seen as pointless since there's probably not much relevant difference, and they want people to know what was actually used scientifically to inform decisions. They also want to make it easier for ordinary folk to understand. The original code was made specifically for an Asian flu epidemic, making it unusable for this current pandemic.

And finally... look at the damn paper I uploaded... If you are a scientific software developer you can use it to reconstruct the original code. It's called a scientific paper, and it's the academic equivalent of the C original. If you can't come up with your own version using the methodology laid out in the article, you're not in a position to ask for the original code. It would be completely pointless for you to have it, because it's an unoptimized, probably very intricate piece of software that you could not understand. And I do not mean that in a bad way; I know you're not stupid, you just don't have the required background for it to make sense.

@ianna

ianna commented May 8, 2020

What I would suggest though and I might file an issue on this is the self written (hardcoded) random seeding... this is baaaaaaddddddd, like really bad. You better quickly switch to an experienced and proven random number generator. Like CLHEP: http://cmd.inp.nsk.su/old/cmd2/manuals/cernlib/CLHEP/RefGuide/random.html

@Feynstein - a more up-to-date link:
https://gitlab.cern.ch/CLHEP/CLHEP

@Feynstein

Feynstein commented May 8, 2020

There's no conspiracy behind it; it's just ordinary scientific folks who want to be as rigorous as they can so that their work is not misquoted, because the original code was clearly not made for the current pandemic. And the fact that they involved GitHub and MS to work on it is the best example of the rigorous scientific work you'd expect from them, because they knew it was not up to standard. It all makes sense in the end.

@BenLubar

BenLubar commented May 8, 2020

@Feynstein I'm finding it very hard to take you seriously given that you appear to simultaneously be arguing that we shouldn't be allowed to see the original source code because we "wouldn't understand it" and also that the people maintaining the project aren't smart enough to understand it either.

Do you actually understand anything about software development, or are you a troll trying to spread confusion?

Your walls of text with no substance aren't helping your case.

@Feynstein

Ah come on, work with me on this one. I'm sorry if I'm being confusing; French is my first language. What I'm trying to do, as I said, is give a bit of context about this. I'm not saying they should release or not release. I'm trying to play devil's advocate in order to try and understand their decisions. I'm very sorry for the confusion. In my opinion, for real, it would be easier for them to release the code in order to stop this conspiracy nonsense. But on the other hand, I understand the scientific rigour behind the decision... Can I have a way to communicate with you outside of GitHub without risking being doxxed? I really am not trying to troll. I'm sorry.

@Feynstein

Feynstein commented May 8, 2020

I do work in a company that "tries" to use continuous delivery (you know what I'm talking about), and I know I know less about best development practices than the other computer science guys in the company I work for. That's why I have an integrator/architect attached to me, to help me integrate my work into the bigger architecture so it can be used in production.

@BenLubar

BenLubar commented May 8, 2020

Among other things, I'm very surprised that you'd spell your own name wrong, Mister Bélanger.

If you want to prove you really are who you say you are, simply post a link to your GitHub profile from any of the other places you frequently post on the internet.

You appear to have much better English on other websites as well.

@Feynstein

Feynstein commented May 8, 2020

I just sent you an email... are you happy now that you revealed my name? That's perfect... it's always like that with your kind of people... You try to really express what you think in order to bring a better view of the problem and you get laughed at. You get doxxed/hacked and your personal info ends up on some obscure 4chan thread... You know what? I don't care anymore... do what you want with that... I tried as much as I could to express a scientist's view of the situation. Just try to leave enough money in my bank account so that I can pay my rent this month.

@Feynstein

Feynstein commented May 8, 2020

And I'll finish with a quote from another physicist in the comment section of this page:
https://lockdownsceptics.org/code-review-of-fergusons-model/

"Time and money. The author themselves stated that insurers have managers and professional software engineers to ensure model software is properly tested and understandable, which academic efforts don’t. Academics would love to be able to employ a professional software engineer to work with them and make sure their code is up to scratch. Occasionally someone does manage to scrounge together the funds to do so, but most groups simply do not have the money to hire a professional software engineer. Academia is a constant game of trying to spread the resources you have as far as they will go.

By all means encourage government to increase science funding and require that a large coding project employs a professional software developer, but if you just gave the money that academic epidemiologists have used to do their work to the insurance industry and asked them to do the same job, but producing better code, they would laugh at you."

@Feynstein

Feynstein commented May 8, 2020

@BenLubar That's what I thought... When you find out who I really am, you don't have anything else to say. I'm 30 years old and I'm only finishing a master's degree because of all the hard stuff I've lived through. Even though I'm lead scientist in my company, I still live in a one-bedroom apartment and probably make half as much as you. People like me and them are pushing the boundaries of science daily and hardly get any recognition from anyone. In fact right now these scientists, even though they created a very good model, get **** from this community for trying to do the right thing.

@ghost

ghost commented May 8, 2020

Love these personal dramas. But back to the ticket: the original source allows us software engineers to model the code's evolution over time (very important for validity), and, double-plus good, the original data and inputs mean we might be able to help create unit and property tests and, heaven help us, even though we aren't scientists we might be able to do maths and stuff (like understanding floating point, and variability in inputs and data).

@davividal

davividal commented May 8, 2020

@Feynstein

In fact right now these scientists, even though they created a very good model, get **** from this community for trying to do the right thing.

"Talk is shit, show me the code".

The "right thing" to do is to publish the entire repository history, not the squashed version. Anything short than that is BS.

@davividal

Love these personal dramas. But back to the ticket: the original source allows us software engineers to model the code's evolution over time (very important for validity), and, double-plus good, the original data and inputs mean we might be able to help create unit and property tests and, heaven help us, even though we aren't scientists we might be able to do maths and stuff (like understanding floating point, and variability in inputs and data).

I would also add that this would allow the community to validate all the refactoring that was done. No one is immune to errors, so was the original code's functionality preserved across all the refactoring? It is really hard to say without unit tests.

@Feynstein

Feynstein commented May 8, 2020

@tau-tao I've said everything that needed to be said. If I were you I would start working on this code right now to figure out why there's no repeatability, and stop waiting for the first C version, which might never come... You know you don't need the original source in order to trace it properly and figure it out. What you said is BS. I wonder if you really understand what's going on in this code... What you can do to check validity is look at the papers and re-run it in order to replicate the data they got. That's a good idea. I suggest you look at the first one, from 2005, that I uploaded in my comment earlier; it seems to be the earliest paper from the professor. That's what I would do, after doing an object-oriented pass on CovidSim.cpp.

@davividal You also should start working on this version right now while you wait for the original. Don't act like you're better than all of these folks doing their best. Why don't you start writing the damn unit tests yourself?!? ... Or maybe even functional tests? But you need to understand what it does, eh? Hum, too bad...

Ah man, you guys are cancer... This community can be so toxic at times... Why don't you start implementing the switch of the random number generator to a more recognized one, like I suggested earlier? Can you actually do that? Can you really find where and why the random numbers are used so that you don't break stuff? I'm done... really, I'm f* done with you.

On another note, I'll try to work on it this weekend, to show that I really want to help them.

@ghost

ghost commented May 8, 2020

OK, looked into it a bit. Pretty sure I am not cancer (given that it's generally a genetic dysfunction, and very close family have had it :-(, not sure how I would cause that) - not sure what that means really. I just think that the original source, inputs, and data might help. Smiley face :-).

@davividal

@Feynstein: about software testing: it is not about understanding the actual code, but about knowing what the code is supposed to do. Then you write a test taking that into consideration, run the test against the current code, and act on the test result.
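
A minimal sketch of that idea, assuming a hypothetical wrapper that runs the simulator with a fixed seed and returns one summary statistic pinned against a previously recorded value:

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Characterization-style test: no knowledge of the model's internals is needed,
// only that fixed inputs and a fixed seed must keep reproducing the output
// recorded once from a trusted reference run.
void check_against_baseline(const std::function<double(int)>& run_model,
                            int seed, double recorded_baseline) {
    const double result = run_model(seed);
    // Allow a tiny relative tolerance for floating-point noise.
    assert(std::fabs(result - recorded_baseline) <= 1e-9 * std::fabs(recorded_baseline));
}

int main() {
    // Stand-in for the real simulator; in practice run_model would invoke
    // the model with a fixed parameter file and return a summary statistic.
    auto fake_model = [](int seed) { return 100.0 + seed; };
    check_against_baseline(fake_model, 42, 142.0);  // 142.0 = value from the reference run
    return 0;
}
```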

@Feynstein

I know that, but didn't they provide outputs? If you treat that first output as your regression baseline, you can work from there easily. The non-reproducibility issue is probably due to their multi-threaded seed generation, which means it's not the same seed on every thread. I think it would be easy to fix that by using a mutex for their seed. I'm sorry about the cancer stuff; people get me going sometimes with non-constructive stuff.
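
For what it's worth, a common alternative to a mutex-guarded seed is to give each OpenMP thread its own deterministically seeded stream, which keeps runs reproducible without locking (an illustrative sketch with made-up names, not CovidSim code):

```cpp
#include <omp.h>
#include <cstdint>
#include <random>
#include <vector>

// Each thread gets its own engine seeded from a base seed plus its thread id,
// and writes only to its own slice of the output, so results are identical
// run to run for a given base seed and thread count.
std::vector<double> sample_per_thread(std::uint64_t base_seed, int n_threads,
                                      int draws_per_thread) {
    std::vector<double> results(static_cast<std::size_t>(n_threads) * draws_per_thread);
    #pragma omp parallel num_threads(n_threads)
    {
        const int tid = omp_get_thread_num();
        std::mt19937_64 engine(base_seed + static_cast<std::uint64_t>(tid));
        std::uniform_real_distribution<double> uni(0.0, 1.0);
        for (int i = 0; i < draws_per_thread; ++i)
            results[static_cast<std::size_t>(tid) * draws_per_thread + i] = uni(engine);
    }
    return results;
}
```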

@beewenib

beewenib commented May 8, 2020

Interesting debate here. I think what needs to be answered is "why is this code here"? Like you say, unless you're an epidemiologist, there's no use for it. So, why is it here?

  • Is it for others to copy and immediately use? Doubtful. The world's top epidemiologists (Johan Giesecke, Knut Wittkowski, Dr John Ioannidis) all point out that the assumptions (and therefore inputs) for this model were completed under extreme pressure and not peer reviewed. This has nothing to do with the model itself.
  • Is the code here so that they can get feedback and improve the algorithm? Doubtful. Epidemiologists already have good models. Although there are a lot of C++ experts here, how many are going to give feedback on a good architecture for performing epidemiological computations? This model is a tool, nothing more, nothing less. This tool will not predict the future. This tool does not replace the experience and knowledge of a seasoned scientist. Seasoned scientists already have models.
  • Is it a PR move? Likely. If so, it's not a very good one. The best PR move would be for Imperial College to admit some mistakes (like all scientists do) and have an open debate with other scientists who disagree with their methodologies, so that the scientific community can go back to being open to discussion instead of arguing. This is how we improve and learn from each other. This is the scientific method.

So, why is it here? I get the feeling this isn't a software issue. Even if it was, there's no budget to do anything.

@insidedctm
Contributor

insidedctm commented May 8, 2020

  • Is it for others to copy and immediately use? Doubtful. The world's top epidemiologists (Johan Giesecke, Knut Wittkowski, Dr John Ioannidis) all point out that the assumptions (and therefore inputs) for this model were completed under extreme pressure and not peer reviewed. This has nothing to do with the model itself.

So it's entirely possible to examine what the model would output under different assumptions. That seems an entirely useful thing to do.

  • Is the code here so that they can get feedback and improve the algorithm? Doubtful. Epidemiologists already have good models. Although there are a lot of C++ experts here, how many are going to give feedback on a good architecture for performing epidemiological computations? This model is a tool, nothing more, nothing less. This tool will not predict the future. This tool does not replace the experience and knowledge of a seasoned scientist. Seasoned scientists already have models.

I think you completely underestimate how useful it is to have baseline code to work from. The last thing you want to do as a researcher is to have an interesting new idea and then have to build everything from scratch.

  • Is it a PR move? Likely. If so, it's not a very good one. The best PR move would be for Imperial College to admit some mistakes (like all scientists do) and have an open debate with other scientists who disagree with their methodologies, so that the scientific community can go back to being open to discussion instead of arguing. This is how we improve and learn from each other. This is the scientific method.

Seems unnecessarily argumentative.

@beewenib

beewenib commented May 8, 2020

Seems unnecessarily argumentative.

Even the mere suggestion of debate sounds argumentative to you!?

@insidedctm
Contributor

Seems unnecessarily argumentative.

Even the mere suggestion of debate sounds argumentative to you!?

If you want to debate the reasons why the code was released this isn't the appropriate place

@beewenib

beewenib commented May 8, 2020

Seems unnecessarily argumentative.

Even the mere suggestion of debate sounds argumentative to you!?

If you want to debate the reasons why the code was released this isn't the appropriate place

I agree, that makes sense, and I'll stop. As a scientist, though, you need to reflect, VERY seriously, on your initial reaction of assuming that someone is being argumentative. You just broke the scientific method. I do acknowledge that you're in the spotlight right now, but that's part of the profession. If the mere suggestion of this is disturbing, you need to fix that yourself.

@bitcartel
Author

I believe it is important for public trust in the scientific method that the original code be made available.

British scientists, researchers and engineers should have the same level of access to the original code as was granted to American corporations (Microsoft, GitHub) and independent developers (John Carmack).

@weshinsley Please consider re-opening #144 so that this ticket can be closed. Thanks.

@insidedctm
Contributor

I agree, that makes sense, and I'll stop. As a scientist, though, you need to reflect, VERY seriously, on your initial reaction of assuming that someone is being argumentative. You just broke the scientific method. I do acknowledge that you're in the spotlight right now, but that's part of the profession. If the mere suggestion of this is disturbing, you need to fix that yourself.

There's a difference between debating an issue and being argumentative. I'm not sure where you got your list of "the world's top epidemiologists", but they sound suspiciously like the subset of epidemiologists who agree with your pre-decided position. I'm not taking lectures from you on the scientific method. Now that is argumentative, so I'll stop.

@Feynstein

@DJ-19 Good science would in fact involve reproduction, analysis and publication. A technical note should do in that case. If you only reproduce the results without any peer-reviewed methodology, that's useless, bad science. I think any credible scientist would agree with me that a peer-reviewed rebuttal is very desirable.

@weshinsley
Collaborator

Of course, and that process takes quite some time.

@DJ-19

DJ-19 commented Sep 8, 2020

@Feynstein thank you for your thoughts. The pros and cons of the peer review system are well known; however, in the case of SAGE I was informed by my MP that it was "...entirely appropriate that these experts are allowed to exercise their professional expertise without any undue influence from external forces". The alternative opinion or criticism of peers can be considered an "undue influence" from "external forces".

@bitcartel
Author

@weshinsley Thanks for describing the thread settings.

Btw, can you confirm the FOIA code was the actual code run for Report 9? As @davividal mentioned, why not publish the full history of the FOIA code? If an old version control system was used, such as RCS or SVN, which GitHub doesn't support, just zip/tarball it up.

@Feynstein

@DJ-19 If your MP thinks that the peer review process is an undue influence, you should tell them that they are wrong. It's a way to remove (mostly personal) influence from research. My point was that anyone who seriously tries to reproduce the results should publish them. There's no point in undermining research if you can't back it up, especially if your methodology fails to meet field-specific criteria.

@DJ-19

DJ-19 commented Sep 8, 2020

@Feynstein I don't think you understood what my MP was saying about how SAGE was operating but that's okay.

@Feynstein

That's probably the case. I haven't kept up with the situation for a while. No need to explain; I just wanted to convey a general message, I guess.

@weshinsley
Collaborator

weshinsley commented Sep 9, 2020

@bitcartel - I didn't work on that aspect of report 9, but as Neil and the FOI response indicated, that is the code used to produce the results for that report, and you can do it yourself if you have the right licensing. We'll see more on that when the RAMP analysis is published.

There is no prior history for that code; the bulk of it was written by a single author before GitHub existed, when version control was essentially backup. Formal testing and scientific rigour were every bit as important then as they are now, of course; today's tools just make [some of] it more automatic.

@DanBuchan

Both over the years and today, I have worked with plenty of scientists who write code and do not use source control. Universities, research councils, grants, journals, etc. do not mandate people's day-to-day working practice. It's getting less common for researchers not to use source control, but I suspect it will always be something others will have to contend with in the future. Oftentimes the versions of the code that accompany specific papers are as much version history as you get.

@Feynstein

Feynstein commented Sep 9, 2020

@DanBuchan That was the first point I made in this thread, but it doesn't seem to register. For some people, epidemiological models should have used full continuous-delivery logistics from the beginning. Yet not one of them cared before the pandemic. I hope this will draw scientists towards more robust coding practices, but I also hope these computer-science guys will come to understand the reality of research before the pandemic. I would recommend that anyone interested in the meta-analysis of this phenomenon (and basically of this thread) read this book:
https://books.google.ca/books/about/The_Death_of_Expertise.html?id=x3TYDQAAQBAJ&printsec=frontcover&source=kp_read_button&redir_esc=y#v=onepage&q&f=false

@DJ-19

DJ-19 commented Sep 9, 2020

This group was set up to help address some of these issues in research software: https://software.ac.uk/

@bitcartel
Author

@weshinsley

"There is no prior history for that code; the bulk of it was written by a single author before github existed, and where version control was essentially backup. "

So can the backups be made available? It would be easy to reconstruct the history from those backups.

@weshinsley
Collaborator

I was speaking generally about computing pre-version-control. No such 15-year-old backup stash exists.

@davividal

I was speaking generally about computing pre-version-control. No such 15-year-old backup stash exists.

So where did the version that Microsoft et al. refactored/worked on come from?
Did the original author dictate it, with Microsoft/Google/etc. refactoring it on the fly, and only then was everything put under git and published here?

@bitcartel
Author

bitcartel commented Sep 9, 2020

[image]

From the FOIA code:
wc -l SpatialSim.c
17361 SpatialSim.c

So there is a difference of around 2,000+ lines of code between the FOIA code and whatever was given to Microsoft and others. It's quite possible that John Carmack made a typo or did not remember how large the file was. Regardless, this should be verified.

@weshinsley
Collaborator

Seriously?

@davividal You can email a zip. MS/GH (not Google) helped us build this repo with it.
@bitcartel John joined a few days into the work. I doubt 15k is meant to be taken as precisely as you have taken it.

A reminder this issue tracker is for reporting specific issues found when using the code in this repo, or for open-source developers to constructively contribute to it.

@davividal

Seriously?

@davividal You can email a zip. MS/GH (not google) helped us build this repo with it.

So there is some pre-git content...

I still don't understand why the secrecy about the pre-git content.

@weshinsley
Collaborator

The pre-git content is the zip file we've been talking about for some time now.

@DanBuchan

@weshinsley

"There is no prior history for that code; the bulk of it was written by a single author before github existed, and where version control was essentially backup. "

So can the backups be made available? It would be easy to reconstruct the history from those backups.

Probably not. Things like tape-based backups are often overwritten on a timed schedule. I'm unaware of any org or university that keeps a complete daily backup history of their file system(s) in perpetuity.

Seriously?
@davividal You can email a zip. MS/GH (not google) helped us build this repo with it.

So there is some pre-git content...

I still don't understand why the secrecy about the pre-git content.

There's no particular secrecy: old covid-sim was likely a single directory on someone's computer that wasn't under version control, and had been that way for years. People really do just work by saving over the contents of files/directories. If that directory is on a networked filesystem, perhaps there is a rolling fortnight's worth of backups, but there is certainly no 15-year history of backups.

When the time came to release it, I assume they zipped the directory in question and emailed it to MS/GH. And I'd also assume it wasn't made public because it was a bit of a mess and couldn't be [directly] compiled/executed by others, so the utility of releasing it was likely deemed pretty low. Hence MS/GH were given a first pass at refactoring it. Clearly that was not good enough, so an FOIA request has led to the release of the aforementioned zip, and that is the pre-git content. If you want anything earlier, you should dig out some of the prior publications that have code releases attached; I think I noted 3 papers with DOIs for code last time I looked, so you can likely reconstruct some pseudo-history that way. I cannot imagine you're going to get much better than that.

@davividal

Seriously?
@davividal You can email a zip. MS/GH (not google) helped us build this repo with it.

So there is some pre-git content...
I still don't understand why the secrecy about the pre-git content.

There's no particular secrecy: old covid-sim was likely a single directory on someone's computer that wasn't under version control, and had been that way for years. People really do just work by saving over the contents of files/directories. If that directory is on a networked filesystem, perhaps there is a rolling fortnight's worth of backups, but there is certainly no 15-year history of backups.

There is no need to see how the code evolved over the last 15 years. That was never the point.

It took a very long time for them to release a zip file they already had.

And they are still hiding the repository history: bd87d47

I don't think anyone is interested in how the original source evolved over the past decade, only over the past year.

@DJ-19

DJ-19 commented Sep 10, 2020

@davividal you might get an idea of that from the comments in the code as many include a date.

@weshinsley
Collaborator

Others in this discussion have already requested ancient backups. It takes time to open-source code; you can't just "release it" with no regard for licensing issues. There has also been a pandemic we've been busy with.

Since the public release in mid-April, you've been able to run the model with any parameters you like, dug from whichever paper or report you like, to a statistically equivalent level compared to Report 9 if you choose the same params. Since June, you've been able to run one script to produce all the graphs, in the same way as Codecheck certified. Since mid-August, you've been able (if you have the licensed data) to identically reproduce the original Report 9, because you now have the code (by personal request) from the start of the year. More will come on the latter from the RAMP analysis. And you are bothered about a history squash on the first release that made the GitHub repo much smaller to download.

Another reminder: this issue tracker is for reporting specific issues found when using the code in this repo, or for open-source developers to constructively contribute to it.

@Feynstein

Feynstein commented Sep 10, 2020

By the way... if this is all about doubting the effect of the lockdown on the pandemic, I can offer this. Recent empirical research agrees that it was effective in slowing the spread.

This study also weighs the economic, psychological and environmental factors:
1. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7293850/

This study even goes as far as saying that the lockdowns should have been stricter:
2. https://www.thelancet.com/journals/eclinm/article/PIIS2589-5370(20)30201-7/fulltext

My guess is that the model was right. During my lockdown I ran a few differential-equation (SEIRS) and Markov chain Monte Carlo simulations (other standard ways of modelling, it seems) and the initial outcome was terrifying. At least as much as this model. If disease control authorities had used these "older but proven" models, I think the lockdowns would have been worse. There is an approximately 10-day lag, or latency, between the start of a lockdown and its effect on the spread, and its positive effects are very hard to detect on a day-to-day basis because of this lag. If you mess up, you pay later, but by then it is already too late.
[Screenshot from 2020-09-10 10-08-47]
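
For reference, a toy sketch of the kind of SEIR-style experiment described above; the parameter values are purely illustrative (not from CovidSim or any Imperial paper), but they show the point about the lag: cutting the contact rate on a lockdown day only bends the infectious curve roughly an incubation period later.

```cpp
#include <cstdio>

// Toy SEIR model integrated with a simple Euler step (illustrative only).
int main() {
    double S = 0.999, E = 0.001, I = 0.0, R = 0.0;  // population fractions
    const double sigma = 1.0 / 5.0;                 // 1 / incubation period (days)
    const double gamma = 1.0 / 7.0;                 // 1 / infectious period (days)
    const double dt = 0.25;                         // days per step
    const double lockdown_day = 30.0;

    for (double t = 0.0; t <= 120.0; t += dt) {
        // Lockdown cuts the contact rate, but prevalence keeps rising for a
        // while because already-exposed people still become infectious later.
        const double beta = (t < lockdown_day) ? 0.6 : 0.15;
        const double new_exposed    = beta * S * I * dt;
        const double new_infectious = sigma * E * dt;
        const double new_recovered  = gamma * I * dt;
        S -= new_exposed;
        E += new_exposed - new_infectious;
        I += new_infectious - new_recovered;
        R += new_recovered;
        if (static_cast<long>(t / dt) % 20 == 0)
            std::printf("day %6.1f  infectious fraction %.4f\n", t, I);
    }
    return 0;
}
```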

@DanBuchan

And they are still hiding the repository history: bd87d47

Well, at this point you have the zip file that was given to MS/GH and you have the initial commit of this repository. So you now have all the information you need to compare those two codebases and infer what was done in the initial refactor. That may not be convenient for you, but it's now kind of moot what MS/GH's refactoring commit history looks like.

@zebmason
Contributor

@bitcartel wc gives you the number of lines, not the number of lines of code. Blank lines and comment lines are not lines of code, and you can write rather long C programs on one line given the use of semicolons.

@bitcartel
Author

bitcartel commented Sep 10, 2020

@zebmason Typically, lines of code as a metric means all the lines in a file. Even if we strip out blank lines, it's close to 17k:
cat SpatialSim.c | sed '/^\s*$/d' | wc -l
16581

@DanBuchan It's hard to believe that this project was just sitting around on a computer somewhere, 15 years' worth of work, where it could have been lost or corrupted at any time, without some type of version control or backup system in place.

Members of MRC have been using Git version control since at least 2016 (https://github.com/mrc-ide/EPICYST), with Neil Ferguson joining GitHub in November 2016: https://github.com/NeilFerguson?tab=overview&from=2016-12-01&to=2016-12-31

@Feynstein

@bitcartel It's hard to believe when you come from a different background. It was probably on a backup drive or something, and on multiple grad students' PCs at the same time. Multiple versions could have existed all at once. That's what I've been saying since the start. There's no conspiracy, buddy, only unintended carelessness. You should get used to it, it's everywhere lol.

@mrc-ide mrc-ide locked as resolved and limited conversation to collaborators Sep 10, 2020
@weshinsley
Collaborator

weshinsley commented Sep 10, 2020

Files in universities sit on network shares that are safe, with redundancies, but don't get backed up. That's not careless; it's just that not everyone in early 2017 decided to devote all their research time to pulling their old code onto our GitHub org. You can also see every commit made by us throughout the time since then, to see what we've been working on.

I think we've answered this issue enough now - and will close, as was suggested by another commenter earlier.

If you have actual issues with the code, or actual contributions, by all means open another issue. But if you want to continue filling our issue tracker with all this conspiracy nonsense, my next option (and responsibility) will be blocking.
