Adding draft of processes for writing papers #27

Open
wants to merge 4 commits into
from

@ellisonbg
Contributor

No description provided.

@rgbkrk
rgbkrk approved these changes Dec 22, 2016
@michaelpacer

First, thank you for taking the time to put this together. This is a really thorough workflow that should be straightforward to follow. I cannot imagine how this could have possibly proceeded were it not for this kind of effort. If it had been built up piecemeal instead of as one big policy document, I'm not sure anything so coherent and comprehensive would have been arrived at. It's clear that you put a tonne of thought into this and the result is superb!


I think the arguments for the author ordering being alphabetical are strong, with one caveat.

The convention in many fields is to treat author order as indicative of the amount of work that each author put into the creation of a paper. This is especially true for articles that have so many authors that they become "First-Author-Lastname, et al. (2016)".

Since the Coördinator is putting in a great deal of effort to ensure that the paper happens, I'm inclined to say that the Coördinator should be first author while the remaining authors' names should appear in alphabetical order. This has the added benefit of making the Coördinator that much more invested in making these papers timely and of the highest quality.

Also, to be clear, I'm interested in acting as Coördinator for nbconvert. I will be happy to do so regardless of how author order is determined.

@ellisonbg
Contributor
@fperez
Member
fperez commented Dec 22, 2016

Coming from the world of physics/math where long author lists and alphabetical ordering are common in contrast to the life sciences, I am very much in favor of keeping the alphabetical order, with the "team first" as Brian mentions. That's what the astropy collaboration did, and I think it's an excellent approach.

Favoring one person in front automatically creates a disincentive to broader collaboration. For example if we do as you suggest for nbconvert, @mpacer, then all of a sudden we'd be "promoting" in terms of credit this in front of the years of work that went into creating the library itself:

(master)denali[nbconvert]> git shortlog -sne | head -15
   524	Benjamin Ragan-Kelley <benjaminrk@gmail.com>
   458	Jonathan Frederic <jdfreder@calpoly.edu>
   421	Matthias Bussonnier <bussonniermatthias@gmail.com>
   216	Thomas Kluyver <takowl@gmail.com>
   134	michaelpacer <michaelpacer@gmail.com>
   114	Damián Avila <damianavila82@yahoo.com.ar>
    76	Jessica B. Hamrick <jhamrick@berkeley.edu>
    60	Carol Willing <carolcode@willingconsulting.com>
    59	Paul Ivanov <pi@berkeley.edu>
    50	Brian E. Granger <ellisonbg@gmail.com>
    35	Fernando Perez <Fernando.Perez@berkeley.edu>
    30	Thomas Kluyver <thomas@kluyver.me.uk>
    27	Antonino Ingargiola <tritemio@gmail.com>
    24	Pierre Gerold <pierre.gerold@laposte.net>
    20	Matthias Geier <Matthias.Geier@gmail.com>

But we've advocated for not playing the commit-counting game either, as that is its own form of perverse incentive.
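As a small illustration of why raw commit counts make a fragile credit metric, the sketch below parses a few lines copied from the shortlog output above and totals them per display name. Grouping by display name is an assumption made for this example only (git itself would handle this with a `.mailmap` file), and the helper name `commits_by_name` is hypothetical.

```python
import re
from collections import Counter

# Three lines copied from the shortlog output above (count, tab, name, email).
SHORTLOG = """\
   524\tBenjamin Ragan-Kelley <benjaminrk@gmail.com>
   216\tThomas Kluyver <takowl@gmail.com>
    30\tThomas Kluyver <thomas@kluyver.me.uk>
"""

# One shortlog line: leading spaces, a count, whitespace, a name, then <email>.
LINE = re.compile(r"^\s*(\d+)\s+(.+?)\s+<([^>]*)>\s*$")


def commits_by_name(shortlog: str) -> Counter:
    """Total the commit counts per display name, ignoring the email address."""
    totals = Counter()
    for line in shortlog.splitlines():
        match = LINE.match(line)
        if match is None:
            continue  # skip anything that is not a shortlog entry
        count, name, _email = match.groups()
        totals[name] += int(count)
    return totals


totals = commits_by_name(SHORTLOG)
print(totals["Thomas Kluyver"])  # 216 + 30 = 246
```

The same person appears under two email addresses in the listing, so even the ranking depends on how identities are merged, before any question of what a commit is worth.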

I'm a firm believer in trying to come up with mechanisms that encourage as much open collaboration as possible, and in many ways I think these open source efforts are quite similar to the big physics experiments. As an illustration, the Higgs Boson paper had over 5,000 authors, and listed the ATLAS collaboration (the LHC experiment that made the discovery) as its first author, same as what AstroPy did:

image

It may not be perfect, but I think it's the "least imperfect" of all options for this...

@michaelpacer

> Good point about the authorship list. I think Fernando's idea to deal with the first author, et al. pattern is to list "Jupyter Development Team" as the first author explicitly. Other thoughts?

Mainly one, but it's a solvable problem: ORCIDs. As far as I understand, ORCIDs are not available for institutions/organisations. That might change, and we could probably talk to them about getting that changed. But ORCIDs are a vital piece of the modern metadata ecosystem for tracking and analysing scholarly work, so if at all possible we should try to have them available in our author lists. This is done in the brief example JOSS uses to show the author list and, as of this month, is required by other open science journals such as PLOS. My personal preference would be to try to work with the ORCID organisation to establish a means for organisations to get ORCIDs. Do we know anyone involved there?

> Favoring one person in front automatically creates a disincentive to broader collaboration. For example if we do as you suggest for nbconvert, @mpacer, then all of a sudden we'd be "promoting" in terms of credit this in front of the years of work that went into creating the library itself:

Note that even in the approach I suggested (which I agree is suboptimal), I figured that the actual Coördinator role (and thus the first author) would go to the person with the greatest seniority and contribution to the project who wanted to take that role. I figured many of those who had contributed to the software in the past may not want to put effort into organising the writing efforts. I was thinking of authorship order as reflecting the amount of work put into writing the paper about the software, not the software itself. Based on your reasoning though, I'm gathering that this is not about representing the work going into the paper, but is specifically reflecting the work on the software.

Regardless, I brought this up because it didn't look like it was mentioned in the process outline as it stood. I was suggesting a solution in a vacuum, since the outline otherwise suggested that the first author would be chosen by virtue of alphabetical order (the "Jupyter Development Team" option wasn't mentioned anywhere).

I like this organisational name approach – assuming we can solve the ORCID problem – but I'd prefer "Project Jupyter" over "Jupyter Development Team". This is mostly an aesthetic preference, but it also points to recognising the role that non-developers played in bringing the software to its current state.

@willingc
Member

> I'd prefer "Project Jupyter" over "Jupyter Development Team". This is mostly an aesthetic preference, but it also points to recognising the role that non-developers played in bringing the software to its current state.

Great point re: non-developer recognition!

@minrk
Member
minrk commented Dec 22, 2016

I think we mentioned before updating the copyright text to use "Project Jupyter Contributors" instead of "Jupyter Development Team", to better indicate the collective ownership. I don't know if that carries too much implicit reference to code contributors for this context, though. 👍 to "Project Jupyter" as well.

As the person who probably benefits most from the commit count at the moment, I'm a big -1 on sorting just about anything by commits. I'm a fan of alphabetical ordering.

@ellisonbg
Contributor
@damianavila

> OK, I have added a section with more detail on author ordering. For now I
> have used "Project Jupyter" as the first author.

I fully agree with this. The draft looks good to me as is. Improvements can be made later, in further steps.

@rgbkrk
rgbkrk approved these changes Dec 22, 2016
@willingc

@ellisonbg Nicely done. The principles section is 💯

papers.md
+
+Given this focus on research, the ongoing academic careers of many Jupyter
+contributors, and our desire to have an impact on computing research, it is
+important for us to author and publish peer-review papers on Jupyter itself.
@blink1073
blink1073 Dec 22, 2016 Member

peer-reviewed?

@ellisonbg
ellisonbg Jan 6, 2017 Contributor

fixed

papers.md
+* **Openness.** The process of authoring papers should be open in the same way
+ as the rest of the project's work. Thus, all of our papers are written in the
+ public on GitHub.
+* **Accountability.** Being an author on a paper is a priviledge, but also
@blink1073
blink1073 Dec 22, 2016 Member

privilege*

@ellisonbg
ellisonbg Jan 6, 2017 Contributor

fixed

papers.md
+Listing the first author as "Project Jupyter" is important as it means that in
+abbreviated citations, the author list will be "Project Jupyter, et al." rather
+than artificially showing the first alphabetical author's name.
+
@fperez
fperez Dec 23, 2016 Member

Maybe add a quick note about how this is done in other large collaborative projects like HEP papers? Not critical, but it can give context to this idea as one that has already been proven in physics/astronomy (AstroPy paper).

@ellisonbg
ellisonbg Jan 6, 2017 Contributor

fixed
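The ordering rule discussed in this thread (organisation first, everyone else alphabetical) can be sketched in a few lines. This is only an illustration under stated assumptions: sorting by the last whitespace-separated token of each name is a simplification of "alphabetical by surname", the function names `author_list` and `short_citation` are hypothetical, and the year passed in is a placeholder.

```python
from typing import List, Tuple

ORG_AUTHOR = "Project Jupyter"  # organisational first author used in the draft


def sort_key(full_name: str) -> Tuple[str, str]:
    """Sort by the last token of the name (assumed surname), then full name."""
    parts = full_name.split()
    return (parts[-1].casefold(), full_name.casefold())


def author_list(contributors: List[str], org: str = ORG_AUTHOR) -> List[str]:
    """Return the organisation first, then unique contributors alphabetically."""
    unique = {name.strip() for name in contributors if name.strip()}
    return [org] + sorted(unique, key=sort_key)


def short_citation(authors: List[str], year: int) -> str:
    """Abbreviated citation: only the first listed author is shown."""
    if len(authors) == 1:
        return f"{authors[0]} ({year})"
    return f"{authors[0]}, et al. ({year})"


names = ["Thomas Kluyver", "Damián Avila", "Jessica B. Hamrick", "Carol Willing"]
authors = author_list(names)
print(authors)
# ['Project Jupyter', 'Damián Avila', 'Jessica B. Hamrick', 'Thomas Kluyver', 'Carol Willing']
print(short_citation(authors, 2017))  # Project Jupyter, et al. (2017)
```

Because the abbreviated form only ever shows the first entry, placing the organisation there keeps any single contributor's name from standing in for the whole group.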

papers.md
+
+The authorship criteria for non-JOSS papers is slightly different than for JOSS
+in that all authors are expected to actively participate in the writing and
+editing of the paper:
@fperez
fperez Dec 23, 2016 Member

dangling :? Doesn't seem to follow into next paragraph as a colon...

This is a minor point, but I'm not sure it will be practical to have all authors actively writing say a main Jupyter paper, simply b/c we could have potentially 500+ authors, and I simply can't imagine 500 people wordsmithing together... E.g. for the Higgs, I don't imagine all 5,000+ authors of the main paper actually edited it, given that (minus references/author list) the paper has ~15,000 words, it would amount to 3 words per person :)

So I think we should say something like:
"""
all authors are expected to actively participate in the editing of the paper. In cases of very large author lists where practicality dictates that a smaller subset does the bulk of the writing, all authors are still expected to read and approve 100% of the final content, contributing edits as necessary.
"""
or similar...

@ellisonbg
ellisonbg Jan 6, 2017 Contributor

I felt this way before seeing how the SymPy paper went. They had almost 30 authors and it was by far the best group paper writing experience I have ever seen. Granted, not all 30 shared the writing and editing equally (a few people did most of the work, with others doing editing and discussing the content). But not requiring a co-author to do any active work on a paper is the wrong message and I think in SymPy's case, not requiring authors to actively participate would have led to a lesser outcome.

You are right that a 500-person paper would need to be handled differently, but I don't think we are going to face that. For us, I think the few-dozen range is more likely, and that would be covered by this.

To cover these things, I will add some language to clarify that "actively participate" can be interpreted in a flexible way depending on the number of authors.

@ellisonbg
ellisonbg Jan 6, 2017 Contributor

fixed

@fperez
fperez Jan 6, 2017 Member

sounds good and sufficient.

papers.md
+try out the above process for JOSS papers and see how it should be modified for
+longer form papers. In general, we expect to closely follow the process used in
+the SymPy project to author [this paper](https://github.com/sympy/sympy-paper).
+
@fperez
fperez Dec 23, 2016 Member

remove spurious whitespace at end

@ellisonbg
ellisonbg Jan 6, 2017 Contributor

fixed

@fperez
Member
fperez commented Dec 23, 2016

This looks great overall! Minor comments above from me, but once those and the rest are dealt with, happy to see it go in (good to wait a bit for broader feedback, given this isn't super urgent and the holidays may slow folks down).

Thanks for pushing this through and to the sympy team for clearing the trail for us :)

@Carreau
Member
Carreau commented Dec 23, 2016

+1 to alphabetical. There is a section anyway where we describe who did what, which I think is important.

@minrk
minrk approved these changes Jan 2, 2017
@damianavila

There are minor things highlighted by Fernando and Steve but otherwise, LGTM.

@ellisonbg
Contributor

OK, all review comments addressed.

@jhamrick

👍 Looks good to me!

@ellisonbg
Contributor

I have found that many folks (students) don't understand academic publication and why they might want to participate. Because of that I have added a new paragraph:

"In addition, the Coordinator should send personal emails to individuals, such as students, who are new to writing academic publications, explaining why we are writing a paper, and encouraging them to participate. These personal emails should cc the individual's mentor or advisor if applicable."

In the past, I have had students who were asked to be on a paper and didn't know what that even meant. It is extremely important for us to go out of our way to make sure students join our papers.

@fperez
fperez approved these changes Jan 6, 2017

With current fixes, I'm +1 on this. We can then iterate on real-world usage and tweak later based on that experience. It's a great start.

Thanks!!

@parente parente requested review from parente and removed request for parente Jan 7, 2017
@parente
parente approved these changes Jan 7, 2017
@SylvainCorlay SylvainCorlay self-requested a review Jan 7, 2017
@rgbkrk
rgbkrk approved these changes Jan 7, 2017
@minrk
minrk approved these changes Jan 19, 2017