# What should we teach about writing/publishing papers in a webby world? #199

Closed
opened this Issue Dec 5, 2013 · 68 comments


### 31 participants

Member
commented Dec 5, 2013
 What can/should we teach people about writing/publishing/reviewing (i.e., the last lap of every scientific project)? Clearly interacts with reproducible research, open access, etc.; what mechanics/tools should we demonstrate/advocate? See also #172.
referenced this issue Dec 5, 2013
Closed

#### What do we teach about documentation, and to whom? #172

commented Dec 5, 2013
 A demonstration of how arXiv works (or any other preprint server) would be invaluable, as I am certain that many people aren't following these things yet. SWC already has some discussion of licensing and open access which can be readily extended to cover papers. We could find a good list of open-access journals (COS or EFF probably has one). Reviewing may be hard to cover effectively since it works a little differently by field. Plus that really is something that you should be working through with your advisor (if you are in academia). Neal Davis
 Give an exercise where they read one paper that is reproducible research and one that isn't and need to interact with them in a similar manner (Answer a deep question? Extract some data for a meta-analysis?). Let them feel how much more usable reproducible research is.
commented Dec 5, 2013
 To make papers more suitable for code review on GitHub, we use reStructuredText to write the SciPy conference papers. The conference tools are aimed at a whole proceedings, but I just reworked these tools for a single paper on scikit-image we're writing: https://github.com/scikit-image/scikit-image-paper I also have some tools for formatting papers for uploading to arXiv, which I think is a particularly handy thing to be able to do: https://github.com/stefanv/arxiv_tools (In this case, my paper was in LyX format, but it is trivial to modify for pure LaTeX or other formats).
Member
commented Dec 5, 2013
 @stefanv - Thanks for bringing up the way SciPy papers are done, very forward-looking!
Member
commented Dec 5, 2013
 On the literal act of writing, I think the biggest hurdle, and most important teaching point, is simply the idea of writing papers in plain text (regardless of the markup language). This is probably a big enough jump as it is for most introductory bootcamps. There are lots of advantages to plain text, as we know, but in the context of the bootcamp, one of the biggest is that it provides a really good use case for version control, and so it ties in nicely to the other bootcamp materials. The elephant in the room for students, of course, is (a) why they should change to a practice (leaving Word) that will be viewed as strange and potentially difficult by other collaborators, and (b) more specifically, how they will interact with collaborators who only use Word for track changes and commenting. I don't know that I have good answers to either of these questions. On the practical act of publishing, I'd rather take the time to explain self-archiving (i.e., author accepted manuscripts) than pre-print servers, as I think the former is more supportive of open science, especially in fields where pre-prints are not widely expected or read.
Contributor
commented Dec 8, 2013
 what mechanics/tools should we demonstrate/advocate? As usual, I would reply: The IPython Notebook! I didn't realize you could format text so nicely around code, until I read some of Jake Vanderplas's blog posts which were "written entirely in the IPython Notebook" (@jakevdp).
commented Dec 9, 2013
 From my point of view, an important part of writing papers is how we handle references. I'm always surprised by the different ways people keep track of the references they use (mainly relying on their memory: the one in the brain, not the computer one). Almost no one in my environment uses the web or new technologies to their advantage to find the reference they want, or the thing they read (I mean PDF search, metadata classification of papers, etc.). Tools like Zotero, Mendeley and many others that also simplify collaboration, or even simple ones without the social features, like JabRef together with BibTeX for LaTeX, make writing papers a lot easier. I wonder if there's an easy way to integrate BibTeX directly with rst or Markdown (or even IPython notebooks).
commented Dec 9, 2013
 @dpshelio About "I wonder if there's an easy way to integrate BibTeX directly with rst or Markdown (or even IPython notebooks)": I don't think so.
Member
 "I wonder if there's an easy way to integrate BibTeX directly with rst or Markdown (or even IPython notebooks)." Pandoc can handle BibTeX citations [1]. See the makefile in [2] for an example of how we use this for writing papers.
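 A minimal sketch of the Pandoc route, assuming a recent Pandoc (2.11 or later, where citation processing is built in as `--citeproc`; older releases used a separate `pandoc-citeproc` filter). The helper `pandoc_command` and the file names `paper.md`, `refs.bib` and `paper.docx` are hypothetical. In the Markdown source, a citation is written as `[@key]`, where `key` is the key of an entry in the BibTeX file.

```python
def pandoc_command(source, bibliography, output):
    """Build a Pandoc command line that resolves [@key] citations from a BibTeX file."""
    return [
        "pandoc",
        source,
        "--citeproc",                      # built-in citation processing (Pandoc >= 2.11)
        f"--bibliography={bibliography}",  # BibTeX database used to resolve citation keys
        "-o",
        output,                            # output format is inferred from the extension
    ]


# Show the command rather than running it, so the sketch works without Pandoc installed.
print(" ".join(pandoc_command("paper.md", "refs.bib", "paper.docx")))
```

 To actually build the document, the returned list can be passed to `subprocess.run(..., check=True)`, or the same command can be placed in a makefile rule.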
Contributor
commented Dec 9, 2013
 There are a few things we could mention to boost people's efficiency in writing (even if these are not necessarily linked to the "webby world"):

- using git (obviously)
- guidelines on writing with git, like: "1 sentence per line rule", "never reformat the sources with your editor", "handling non-git co-authors (branch with their version)", "checking the text editor behaves well with files modified from outside", …
- tips when using LaTeX: "use latexdiff with git", "avoid typical mistakes like double spaces « ~»", …
- a few important general writing tips: "avoid long sentences", …
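 The "1 sentence per line" rule can be illustrated with a small reflow script. This is only a sketch: the function name `one_sentence_per_line` is hypothetical, and the sentence splitter is a simple heuristic that will occasionally split after an abbreviation such as "e.g." when it is followed by a capitalised word.

```python
import re

# Split on whitespace that follows ., ! or ? and precedes an uppercase letter,
# a digit, or an opening parenthesis. Heuristic only; see the caveat above.
_SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+(?=[A-Z0-9(])")


def one_sentence_per_line(paragraph):
    """Reflow a paragraph so that each sentence sits on its own line."""
    text = " ".join(paragraph.split())  # collapse existing line breaks and extra spaces
    return "\n".join(_SENTENCE_BREAK.split(text))


print(one_sentence_per_line(
    "We use Git for papers. Each sentence gets its own line!\n"
    "That keeps diffs small."
))
```

 With one sentence per line, a reworded sentence shows up in `git diff` as a single changed line instead of a rewrapped paragraph.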
 @ethanwhite nice one!! thanks!!
 Instead of jumping straight to the final paper, it might be better to get people thinking first about writing reproducible reports, e.g. using knitr (in R) or the IPython Notebook. Such reports are useful for gathering together key ideas and disseminating these to coauthors for discussion, before producing a full-blown paper.
Member
 I like this idea and also think there would be interest among bootcamp attendees. And even if it can't be covered in a standard 2-day bootcamp, I think it would be great to point them to a resource that they can use months after the bootcamp, i.e. once they are comfortable with git. Actually, I am the perfect target for this lesson. I have adopted SWC principles and am striving to work in the "open." The biggest impediment I currently have is the licensing. For example, I'd ideally like to have my code, some summary data files, and the paper all in the same repository. While I know that I am fine with putting the code under the MIT license, I am pretty confused how to license the data and paper. Is the Creative Commons Attribution License sufficient to prevent others from publishing a paper that uses my data*? *Of course I will make it free to use upon publication.
Member
commented Dec 19, 2013
 On Thu, Dec 19, 2013 at 09:10:34AM -0800, John Blischak wrote: While I know that I am fine with putting the code under the MIT license, I am pretty confused how to license the data and paper. Is the Creative Commons Attribution License sufficient to prevent others from publishing a paper that uses my data*? *Of course I will make it free to use upon publication. I think "Copyright $x, all rights reserved." is the safe bet for stuff you don't want others reusing. You can always re-license once you get the paper published.
Member
commented Dec 19, 2013
 Is the Creative Commons Attribution License sufficient to prevent others from publishing a paper that uses my data*? In most situations, yes, it is sufficient to keep somebody else from publishing your work, since any reputable journal will refuse to publish work that has already been published or written by somebody else. The problem with this approach is that many of today's journals will refuse to publish your work if you've already released it somewhere else, especially under a license granting reuse. As @wking comments, reserving copyright is the simplest approach here.
Member
 http://yihui.name/en/2013/10/markdown-or-latex/ may be relevant...
 Jumping in here ... There seem to be a few main points emerging here ... questions about whether we're trying to teach more about the end result, or change practice leading up to that final write up. There's been a lot of work in the open science / scholarly communication circles around various aspects touched on here - tools, workflow hacks, discussion of new forms of publishing more reproducible research. I've written up a blog post to see if we can involve some of those experts in this discussion ... http://mozillascience.org/what-should-we-teach-about-publishing-on-the-web/
commented Jan 6, 2014
 Well, for openers, the equivalent to Christopher Gandrud's book https://github.com/christophergandrud/Rep-Res-Book in Python, perhaps facilitated by Dexy.it https://github.com/dexy/dexy and Pandoc.
 This is really interesting. I think it could be worth taking a step back and re-phrasing the question a little. Is the object to teach those building tools about publishing in general (i.e. what tools and hacks might be useful to create) or is the focus here specifically on how to get better incorporation of code into published work? I think the latter is the focus but it might be good to be explicit. On that basis it would be good in my view to touch on some background in literate programming to give people a bit of context and then look at various authoring tools (knitr, IPyNB, Sweave, Dexy... others presumably) alongside various code repositories and data repositories in that light. This would then provide a way of thinking about the available tools as a way of telling the story, which is different to how they are generally used in practice to manage code and records and actually do the work. It's a personal bias but I'd also be inclined to spend some time on the sausage making of the publishing process and why it doesn't fit with what the tools above do. What gaps are there? How could they be filled? What would the optimal system look like? What formats would be used? That's a bit chunky but it's the way I'd approach it.
 I welcome ideas to make the writing of "papers" easier and to facilitate joining or reproducing the writing process, but in a webby world that is aware of version control, why not take advantage of that for updating scholarly knowledge directly, in one place, forking only when really necessary? At present, whenever you come across an interesting article, there is basically no way to predict where the next article on the subject is going to be published, but if there were already a reasonably good article on that topic and it were publicly available, openly licensed and version controlled, there is no reason why new materials relevant to it should not be added as they become available. We could "watch" it the way we watch GitHub repos or Wikipedia articles, and we could engage with updates much more directly than via static stand-alone documents. In order to "publish" our research, do we really need to write (and review) a ten-page narrative summary thereof if it is available (assuming reasonable long-term preservation) from open notebooks, data and code repositories in maximal detail and could be contextualized and made more widely known by simply inserting a few words, paragraphs, illustrations, equations or lines of code into an existing article with a slightly broader focus?
 Hi all. Specifically regarding the publication of software, the Journal of Open Research Software (of which I am managing editor) has devised a checklist as part of our peer-review process that might be useful: http://openresearchsoftware.metajnl.com/about/editorialPolicies#peerReviewProcess
commented Jan 6, 2014
 I would recommend looking at literate programming as Cameron suggests. (IPython Notebooks are great, but it would be good to say that there are others out there.) Finally, using a workflow tool (Taverna, KNIME, Wings, Galaxy) to chunk code together into understandable pipelines is useful when sharing reproducible research [1]. Another thing would be to suggest good practice around attribution in both code and documentation. It's good to discuss the ability to use plain text, either through Markdown or LaTeX. https://www.authorea.com is a nice tool (although commercial) for demonstrating this.
commented Jan 6, 2014
 Oh, one more thing: you might like to talk about the importance of stable URLs, i.e. URLs that don't change or magically disappear when you move departments or whatever. See data services or owning your own domain name.
commented Jan 6, 2014

Lots of interesting directions here. A few thoughts:

I think the first thing to teach would be best-practice platforms for publishing code and data independent of the rest of the publication process (dynamic documents etc can wait). Most journals being what they are, changes to workflow there are much harder and I think ultimately much less valuable than teaching people how to publish data in a permanent archive with complete and accurate metadata, and how to publish code with version control and metadata.

### Plain text for publications

If the goal really is to address preparing publications for journals, I would focus on plain-text publication tools (markdown/pandoc, possibly LaTeX) along with version management/collaboration tools (Git/GitHub). Getting more folks off of Word and comfortable with any alternative would be the single biggest value, spur more innovation, and ease collaboration for the rest of us ;-) The remaining effort should be spent addressing pain points in the process.

### Pain points:

In my experience and reflected in the comments above, these are some of the major pain points in text-based publishing. Teaching anything that addresses these challenges would be really helpful.

#### Journal submission software

The paleolithic nature of things like Manuscript Central today is a real barrier to a more modern and web-native workflow, making almost all advancement in this area more of a hack than a solution. That said, as far as I've seen most will take PDFs at least for the initial review, or do a reasonable job with tex files (when restricted to 1990s-era tex).

Probably worth mentioning journals/prepublication platforms that don't suffer these limitations (e.g. figshare, arXiv -- who knew you could zip the code and data with your arxiv paper?)

#### Collaboration

Collaborating with others using Word or some other platform is as annoying as it is inevitable. The problem isn't limited to Word: in two of my own manuscripts that I'm writing in R Markdown, some collaborators edit the tex file. Others would rather just write notes with pen on a print-out anyway (or with a stylus on the PDF), making the software choice rather irrelevant. Beyond that I don't have any great suggestions here but am happy to learn!

#### Citations

Even with the host of reference managers, citations are unnecessarily annoying and collaboration can be difficult when everyone has a different preferred reference manager. What bothers me most though is it just feels archaic.

In a web-native world, citations are links (Fenner 2010), preferably using permanent identifiers, and ideally with semantics. The remaining bibliographic information can be automatically generated from the link by any reasonable tool (e.g. using Crossref APIs for DOIs); it shouldn't be the concern of the author, who should be free to worry about the semantic reason for the citation (e.g. cito:critiques) rather than the article page numbers. Unfortunately the platforms generating citations from links aren't as developed as they might be.
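As a rough illustration of generating the bibliographic record from the link alone, the sketch below uses DOI content negotiation: the DOI resolver is asked for a BibTeX rendering via the `Accept` header. The function names `bibtex_request` and `fetch_bibtex` are hypothetical, and `10.1000/xyz123` is a placeholder DOI rather than a real reference. Whether a given DOI returns BibTeX depends on its registration agency supporting this media type.

```python
import urllib.parse
import urllib.request


def bibtex_request(doi):
    """Build an HTTP request asking the DOI resolver for a BibTeX record."""
    url = "https://doi.org/" + urllib.parse.quote(doi)
    return urllib.request.Request(url, headers={"Accept": "application/x-bibtex"})


def fetch_bibtex(doi, timeout=10):
    """Resolve the DOI over the network and return the response body as text."""
    with urllib.request.urlopen(bibtex_request(doi), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Building the request needs no network access; fetch_bibtex() does.
request = bibtex_request("10.1000/xyz123")  # placeholder DOI
print(request.full_url, request.get_header("Accept"))
```

With something like this in a build step, the manuscript source only needs to carry the DOI; the author never types page numbers.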

### Dynamic Documents

knitr, Sweave, IPython, Dexy, even Make etc. are wonderful tools that are worth exposing researchers to, but perhaps it is more important to teach the concept than the particular implementation in this case. Dynamic documents can introduce additional challenges to collaboration and additional gotchas (via caching, etc.).

commented Jan 6, 2014
 @cboettig Nice analysis. Even though we all want to move to a webby world, there are some journals/conferences, most of them from fields outside math/computing/engineering where complex equations rarely come up, that will only accept Word files (you are blessed if they accept ODT). For these cases we should start by saying that some tools (e.g. pandoc) can convert from Markdown to Word. I had this type of problem last year.
Contributor
commented Jan 7, 2014
 An interesting discussion, in particular as I am right now preparing a course about modern technologies for research aimed at PhD students. TL;DR: I mostly agree with @cboettig. I decided to focus on immediately useful stuff, and end with an outlook on upcoming promising technologies which today's young scientists will have to know about if they plan to pursue a career in science. Publishing SciPy-style is definitely in the second category because a PhD student will have to work with mainstream journals in the relevant time frame (three to four years). I will, however, present and recommend a plain-text-with-version-control approach for doing research, including keeping notes. It's only for the formal writeup that I think we have to stick to traditional techniques for a while, which unfortunately means Word in the disciplines where it dominates. If anyone has a good idea for collaborating with Word users while sticking to decent tools, I'd be eager to learn about it. It's the most frustrating aspect in my collaborations with biologists.
Contributor
commented Jan 7, 2014
 One more comment about publishers and Word: in my experience, they are happy if you send them a Word file, whatever its contents. Just paste your Markdown text into Word and submit it. It's only when you need formulas (maths or chemistry) that this approach breaks down. The technical editing staff at major publishers actually does a very good job and can deal with anything that's reasonably clear.
Contributor
commented Jan 7, 2014
 And one more question. For writing papers, the publishing system imposes lots of constraints, but there is complete freedom for producing slides for presentations. Is anyone aware of a usable plaintext-based system for generating slides, other than the various LaTeX packages? The condition that excludes most of the simple tools is the possibility to integrate images, plus ideally mathematical equations.
commented Jan 7, 2014
 @khinsen AFAIK you can use pandoc to convert from Word to Markdown too. If you can test how Markdown -> Word -> Edit -> Word -> Markdown works and give us feedback, that would be great. For slides, you can try a JavaScript/CSS library and write HTML (yes, I know that HTML is not the best plaintext format). For mathematical equations you can use MathML or LaTeX with the help of MathJax.
Member
commented Jan 7, 2014
 +1 to the various Markdown+MathJax slideshows out there. I've had good luck writing in Markdown+MathJax, and using pandoc to convert to one of the slideshow formats. As @r-gaia-cs mentions, pandoc can try to convert a number of different formats to Word, but it has very limited abilities to handle complex formulae. I think you already know this, but pandoc is also the underlying engine beneath IPython's nbconvert tool.
Contributor
commented Jan 7, 2014
 @r-gaia-cs My biologist collaborators use the revision tracking system in Word. From what I could find about pandoc conversion, this information doesn't survive, so I don't think pandoc is the solution for me. However, I could at least use it to write my own contributions which I could then convert and paste into the master file - I will try this next time. @ahmadia Do you have an example of slides in Markdown+MathJax?
Member
commented Jan 7, 2014
 @khinsen: this is a very limited demo, but: http://aron.ahmadia.net/pyhpc/petsc4py-tutorial-slides.html Here's the corresponding source: https://github.com/pyHPC/pyhpc-tutorial/blob/master/markdown/scale/petsc4py-tutorial.md
Member
commented Jan 7, 2014
 @jkitzes wrote: The elephant in the room for students, of course, is (a) why they should change to a practice (leaving Word) that will be viewed as strange and potentially difficult by other collaborators, and (b) more specifically, how they will interact with collaborators who only use Word for track changes and commenting. I don't know that I have good answers to either of these questions. And aye, there's the rub. Word is easier to use for normal tasks (like writing a paper with bullet points and italics) than Markdown, much less LaTeX --- it's only Stockholm Syndrome that makes us believe otherwise :-). And as long as both senior faculty and journals require people to submit Word (or PDFs derived from specific Word templates), it's hard for us to say, "No, really, version control is better in the long run," because the long run ends in you wrestling with Pandoc to try to get it to format things the way some particular conference requires. (True story: I submitted the outline for our upcoming SIGCSE workshop to the ACM using their LaTeX template. During the holiday break, I got mail telling me I had to re-do it using their MICROSOFT WORD template (their capitalization), which of course LibreOffice couldn't load properly.) So: given that the end product must be acceptable to senior profs and journals, and that markup-based tools impose more cognitive load on newcomers than WYSIWYG alternatives (i.e., the payoff for switching is tomorrow, the pain is today), what's our path forward? What can we teach in an hour that the average biologist will find compelling?
Member
commented Jan 7, 2014
 HTML slideshow packages are a great example of the disconnect Philip Guo talked about in his Two Cultures essay: Users want a whiteboard that lets them mix text, drawings, tables, and other kinds of information. Programmers want a format that they can edit with Vim and store in version control. Yes, programmers can use that format to put a callout beside a table with an arrow pointing to a circled cell and a picture of a kitten beside it, but it's a lot of work compared to just WYSIWYG'ing it in PowerPoint, Keynote, or what have you. As with markup-vs-WYSIWYG for preparing papers, I think the distinction is between people who look at text littered with strange symbols and "see" the final (compiled) product, and people who want to directly manipulate that final product without having to mentally compile it (or reverse-compile it). Now that the element is widespread, there's no reason why we couldn't create an authoring tool that would let people generate HTML5 slideshows without mental compilation and typing lots of strange symbols. My suspicion, though, is that those slideshows wouldn't be any more diff'able or merge'able than IPython Notebooks, i.e., they'd be almost as hard in practice for version control to work with as what we have today. They would therefore fail to satisfy end users ("Why should I switch? It only does half of what Keynote does!") and programmers ("Why should I switch? I still can't merge, and your composition tool doesn't have Vim bindings!").
Member
commented Jan 7, 2014
 @gvwilson - The file extension tells me a lot about what somebody wants to do with their work:

- .pdf: share it with colleagues in print, or maybe a journal/arXiv
- .doc: collaborate with colleagues over email, maybe submit it to a journal with crazy requirements
- .html/.md: share and collaborate with the world

I think this is less about the authoring process, and more about the sharing and collaborating process. I have yet to encounter a scientist who defended Word for working with lots of collaborators and versions. Its track changes features simply don't scale. PDF goes everywhere, but is not easy to edit/version. HTML goes everywhere, is easy to version, and is slightly painful to write. Markdown is a compromise, but it's a good one, and we'll see better WYSIWYG editors for slideshow presentations in the future.
Member
commented Jan 7, 2014
 @ahmadia wrote: I have yet to encounter a scientist who defended Word for working with lots of collaborators and versions. But that's a non-issue. We have to convince people to switch when working in the small, because that's the normal case for most scientists. At least, I think it is: does anyone have a histogram of how many papers are written by how many authors? HTML goes everywhere, is easy to version, and is slightly painful to write. Markdown is a compromise, but it's a good one, and we'll see better WYSIWYG editors for slideshow presentations in the future. It's easy to sell futures on the stock exchange; it's much harder to sell them in a classroom... :-(
commented Jan 7, 2014
 Even if folks are writing their papers in Word, I still think version control is a useful tool when paper writing, because there is so much more to writing a paper than just the final document, e.g. results files, figures, images, correspondence, submission documents, as well as any scripts you use to do analysis and generate other assets. You may not be able to use 'git diff' on a Word doc, but you can use it on many of these other things. And even then, under VC you can still check out an older copy of your paper and use Word's compare feature to do the diff. Plus you get the benefits of having a log of your changes, easy backups (e.g. git push) and rollbacks, etc. The point I'm making is that I think the benefits of version control when paper writing are worthwhile despite the fact that Word files don't diff easily. I'd also like to suggest that teaching folks to use knitr or IPython notebooks (or even just to create scripts to generate figures [1]) can be a really useful thing. I've been showing people how to use RStudio to create a draft of their paper in Markdown to leverage the power of knitr. Even those that don't draft their paper in Markdown but just use it like an IPython notebook get value out of being able to build up a document of figures and tables which they can paste into their Word documents [2]. It's not perfect, but I'd argue it is better. Jon.

[1] I work in a research hospital where many people use R but rarely write scripts... The good students keep a Word document with code that they cut and paste into the R REPL. Ugh.
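 On the point that Word files don't diff easily: a `.docx` file is a zip archive whose main text lives in `word/document.xml`, so the paragraph text of two checked-out versions can be compared with a short script. This is a sketch of one possible workaround, not something proposed in this thread. It assumes `.docx` (not the legacy binary `.doc`) and ignores formatting, comments and tracked-change markup, and the function names `docx_paragraphs` and `docx_diff` are hypothetical.

```python
import difflib
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used inside word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(path):
    """Return the plain text of each paragraph in a .docx file."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for paragraph in root.iter(W + "p"):
        # The text of a paragraph is spread over one or more <w:t> elements.
        paragraphs.append("".join(t.text or "" for t in paragraph.iter(W + "t")))
    return paragraphs


def docx_diff(old_path, new_path):
    """Return a unified diff of the paragraph text of two .docx files."""
    lines = difflib.unified_diff(
        docx_paragraphs(old_path),
        docx_paragraphs(new_path),
        fromfile=str(old_path),
        tofile=str(new_path),
        lineterm="",
    )
    return "\n".join(lines)
```

 Given an older copy checked out of version control next to the current one, `print(docx_diff("old.docx", "new.docx"))` shows which paragraphs changed, which is often enough to decide whether to open Word's own compare feature.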
commented Jan 7, 2014
 @khinsen Since you asked, we've had some success using Remark for doing HTML slides in Markdown. E.g.: http://cournape.github.io/davidc-scipy-2013 You can use MathJax with it, as well as print to PDF.
 From the less technical and more editorial perspective, I'd say the key issue is that authoring needs to be done with reproducibility and re-use in mind. So, even if you are working in Word, the starting point needs to be one of preparing information for the person who wants to re-use your 'research objects', not just read a narrative about them. And if you're talking about educating people who are already of a technical mindset, this should be a relatively easy point to make.
commented Jan 7, 2014
 @gvwilson @jkitzes Great points that cut to the heart of the matter; hence my initial arguments that SWC should first focus on publication of code and data with appropriate metadata, which is a natural context to introduce plain-text-based scientific writing (and probably the experience from which many of us first realized it might make sense to do the same for manuscripts). The reason to adopt a plain-text (version-controlled, online collaborative) workflow is the same reason software carpentry teaches everywhere else: it will save you time. Yes, it makes collaborating with Word users potentially more time consuming, while making your own writing and other collaborations less time consuming. If the arithmetic comes out in your favor and you save time, great. If not, stick to Word. (Or develop in markdown and then paste/pandoc into Word for editing and revisions). This is what I and no doubt many on the list do -- use markdown or latex for the time-saving, headache-reducing benefits it provides, and switch that to Word (or tex or Google Doc or whatever) if or when the transaction costs of collaborating become too high. Otherwise we risk painting the false dichotomy and echoing every flame war between choices of software or programming language. I believe SWC students should simply be given the basic skills to author scientific documents on the web in plain text, and they can then choose the appropriate medium based on context.
referenced this issue Jan 7, 2014
Closed

#### Plain-text presentation software #219

 Lots of great thoughts here, and many people have hit some of the points that I would have made myself. I figured I would fill in with a little detail on how we (the IPython team) see the IPython Notebook in relationship to writing academic papers. There are a couple of different scenarios that we are thinking about: [quick point first: the first step should always be to create a public git/hg repo somewhere and put everything related to a paper in it. If it is not in a repo, it doesn't exist!] IPython Notebooks as computational companions: In this use case, the IPython Notebooks are not the papers themselves, but are used to generate and document the computational aspects of a work. We imagine that a user would create a GitHub repo with the data and some Notebooks and then link to those resources through nbviewer from the actual paper. This is a great starting point because you can still use any process/tools (LaTeX, Word, etc.) to write the actual paper. The only difficulty with this is that you will likely want to generate your figures using a Notebook and then incorporate them into your paper. There are a couple of routes for this: 1) use Dexy - it already has some integration with IPython Notebooks that can extract figures from a Notebook and put them into another document, 2) IPython.nbconvert - it is not hard to use IPython's nbconvert utility to export the figures in a Notebook to external files. IPython Notebook as the paper itself - "lightweight version": If you are writing a paper for a journal that accepts LaTeX, you might be able to use IPython.nbconvert to produce your paper directly from an IPython Notebook. The Notebook/nbconvert now supports BibTeX-managed references and the nbconvert template system is general enough to accommodate a journal's LaTeX standards, styles, classes, commands, etc. Many of the papers that I plan on writing in the near future fall into this category.
There are two downsides of this: 1) if you can't submit LaTeX to the journal you are out of luck (go back to scenario 1) and 2) you are writing in Markdown, which lacks many of the features of LaTeX. IPython Notebook as the paper itself - "serious LaTeX": In some cases you actually need many of the more advanced features of LaTeX. For me this was the case when I was writing lots of papers in Physical Review Journals. Lots of numbered equations, equation/section references, alternating single/multi-column layout. You can fake this a little bit by using IPython "Raw Cells" which are just dumped verbatim into the LaTeX. But that only goes so far. If you hit this limit, I think the current best option is to use Dexy to write the paper and pull resources from Notebooks. I should note that there is still a lot of work to be done on IPython and Dexy to better support these workflows. We plan on making some additional changes to nbconvert to improve these use cases and I am sure that @ananelson is more than willing to improve Dexy where needed.
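For anyone who wants to see what "export the figures in a Notebook to external files" amounts to, here is a minimal sketch that does not use nbconvert or Dexy at all: it reads the notebook file as plain JSON with the standard library. It assumes the nbformat 4 layout (a top-level `cells` list whose code cells carry `outputs` with a base64-encoded `image/png` entry); older notebook formats are laid out differently. The function name and the output file naming scheme are made up for illustration.

```python
import base64
import json
from pathlib import Path


def extract_png_figures(notebook_path, output_dir):
    """Write every PNG output found in a notebook to output_dir.

    Assumption: the file follows the nbformat 4 JSON layout, where code
    cells have an "outputs" list and each output may have a "data" dict
    holding a base64-encoded "image/png" payload.
    """
    notebook = json.loads(Path(notebook_path).read_text(encoding="utf-8"))
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    written = []
    for cell_index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        for output_index, output in enumerate(cell.get("outputs", [])):
            png = output.get("data", {}).get("image/png")
            if png is None:
                continue
            # Some writers store multi-line strings as a list of lines.
            if isinstance(png, list):
                png = "".join(png)
            # Hypothetical naming scheme: cell position plus output position.
            target = output_dir / f"cell{cell_index}_output{output_index}.png"
            target.write_bytes(base64.b64decode(png))
            written.append(target)
    return written
```

Calling `extract_png_figures("analysis.ipynb", "figures")` (both names hypothetical) would leave one PNG per plotted output in `figures/`, ready to be included from LaTeX or Word. For real work the nbconvert or Dexy routes described above are the better choice, since they track the notebook format as it changes.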
Contributor
commented Jan 8, 2014
 @ahmadia Thanks for the example. How was it generated? @stefanv Thanks as well... even if your slides don't work for me (Firefox 26). I see a piece of Markdown text flashing on the screen, to be rapidly replaced by an all-white page. @wking Even more thanks - my comments are on the new issue you created. My impression from these examples, and my own research, is that doing scientific presentations the HTML way is possible if you don't mind learning some arcane syntax and managing a build process. In other words, it's pretty much the same as doing slides using LaTeX (Beamer etc.). So, given that scientists in some domains (physics for example) end up learning LaTeX anyway for their publications, what are the advantages to be expected from using the HTML/Markdown approach, other than fancier visual effects?
Contributor
commented Jan 8, 2014
 @gvwilson The "two cultures" argument holds much less for presentations than for other subjects. In the "two cultures" divide, I am clearly on the programmers' side. I need very good reasons to choose something else than Emacs plus associated command-line tools. But I am not happy with anything I have tried for doing presentations this way. When I switched from Linux to the Mac ten years ago, I started exploring Keynote, and ended up using it for most of my presentations, swearing at it from time to time because many tasks required too much work compared to LaTeX. And I still used LaTeX for maths-heavy presentations, swearing at it from time to time because of the endless time spent to get things exactly in the place where I wanted them. Recently I got a new Mac with MacOS X 10.9 and the new Keynote that comes with it. It can't open my first presentations any more, telling me to convert them using the previous version of Keynote - which is not available any more. I was always afraid of vendor lock-in with Keynote, but didn't expect Apple to drop compatibility even with its own earlier formats. I decided to stop using Keynote, but I have yet to find a successor. My colleagues around me are of little help: some have the same experience and profile as I do, and come to the same conclusions. Others, with less programming experience, don't even consider anything else than PowerPoint or Keynote, but aren't happy either. I do have some ideas of how to design and implement something better, but absolutely not the resources to do it. So for now, my best option for happiness is doing as few presentations as possible ;-)
 @khinsen perhaps http://slidewiki.org/ may be worth a look?
Contributor
commented Jan 8, 2014
 One more idea related to the "two cultures" and transmitting the "programming culture" to scientists. There is the famous quote by Alan J. Perlis: "It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures." That's exactly the principle behind plain-text tools. There's a single data structure (plain text = a list of lines of characters), with lots of functions (i.e. programs) operating on them. I don't expect to convince students by quoting a famous person, of course, but putting the specific situation (the use of plain text formats) into a wider context might be of help. The Perlis quote puts it into the context of programming, which again is probably not of much use for non-programmers, especially since it takes a lot of experience to see why Perlis was right. But perhaps we can come up with some analogy from somewhere else?
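For learners who already program a little, the Perlis principle can be shown rather than quoted. The sketch below treats plain text as the single data structure (a list of lines) and defines a few small functions on it; the function names and the sample text are invented for illustration, loosely echoing `grep`, `wc` and `nl`.

```python
def grep(lines, needle):
    """Keep only the lines that contain needle."""
    return [line for line in lines if needle in line]


def word_count(lines):
    """Count whitespace-separated words across all lines."""
    return sum(len(line.split()) for line in lines)


def numbered(lines):
    """Prefix every line with its 1-based position."""
    return [f"{i}: {line}" for i, line in enumerate(lines, start=1)]


text = """Plain text is a list of lines.
Every tool reads the same lines.
Tools compose because the lines never change shape."""
lines = text.splitlines()

# Each function takes lines and (mostly) returns lines, so they chain freely.
print(numbered(grep(lines, "Tools")))
print(word_count(grep(lines, "tool")))
```

The point is less the functions themselves than the fact that any new one written later works with all the existing ones, which is what a format-specific tool cannot offer.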
Contributor
commented Jan 8, 2014
 @Daniel-Mietchen Thanks for the pointer, SlideWiki does look interesting. And it's developed specifically for education, which is a welcome change compared to systems designed for business presentations. There are three features that make me hesitate about SlideWiki, although I have to admit that I have spent only ten minutes exploring the site. First, everything is editable by everyone, Wiki style. That's fine for some applications but not for others. I think I'd prefer a Github-style approach, where everyone has full control over his/her personal version. Second, everything is stored on the SlideWiki server. If the server is down, or worse, if the service disappears, all the content becomes inaccessible. Again, I'd prefer a Github-like approach where people keep personal copies on their own machines and the Web service only handles the collaborative aspect. Third, the only way I found to download a presentation is in "SCORM format", which I have never heard of before. It's just a zip file containing HTML, CSS, and some bookkeeping files, so I could probably figure out how to work with it, but I don't have an immediate and evident solution for how to play the slides on my own computer when I have no Internet connection.
commented Jan 8, 2014
 @khinsen About "So, given that scientists in some domains (physics for example) end up learning LaTeX anyway for their publications, what are the advantages to be expected from using the HTML/Markdown approach, other than fancier visual effects?": SEO (Search Engine Optimization). AFAIK it's easier for someone to write a crawler for HTML pages than for PDF files, and every researcher should hope that their work is findable.
 As someone who comes from the user side, working in clinical research and evidence synthesis, I would be looking first for efficiencies in connecting all the pieces together. The field has a number of mature, powerful tools for various parts of the research/authoring process, but they do not talk to each other (this was a topic that came up in recent conversations at the Cochrane Colloquium Symposium, exploring ways of increasing efficiency through automation). So that is an area that requires individual and collective development. The ability to pull together pieces by different authors and output from different programs, using conversion scripts and automated clean-up, would allow for a more distributed authorship, production of the document in stages, and would also accommodate different levels of technical skill and interest. A document could have one or more coordinating authors, responsible for converting, compiling and managing the repository. As an aside: The wedge for the introduction of writing in text/markdown may be mobile computing. The realization that I could bounce writing in progress from OSX to an iPad and then to Windows7, display it to suit me on each one, have the very same functionality, and not lose changes I had made, made a convert out of me. Citations will have to work a whole lot better than they do right now, though, for me to convert completely.
 @asinclair Can you contact me at ana@ananelson.com? I would like to hear more about what sort of pieces you want to be able to connect together in an automated way.

What should we teach about publishing on the web? I ended up writing a bit more than I expected, so here are the main pieces of advice:

tl;dr:
- use a reference management tool
- try to find the fastest venue to publish in
- try to publish in an OA journal
- have a look for the best preprint server for your discipline, and add your work there too (might be a university archive)
- add as much supporting material as you can to the right locations, e.g. github for code, figshare for anything, Vimeo or YouTube for videos
- don't be afraid to screw around with copyright transfer statements
- use version control for your own sanity
- remember that all the time you spend pretty formatting your paper will be ignored and thrown away by large publishing companies, especially the work you do on reference formatting, so don't do it
- if the collaborative environment of your choice is not working for the group, be pragmatic, drop it, get the damn paper finished already

I would start by advising people to keep in mind the goals of publishing. You want to get your work out into a venue that will be respected by your peers, and noticed by them. In most cases - but not all cases - this will be a journal published by one of the large STM publishers. Elsevier, Springer, Wiley, Taylor & Francis, PLOS and Sage represent a very large part of that market.

You want this process to happen as quickly as possible. Aside from the act of writing, and constructing your story, the act of publishing - getting it onto the web - is pure schlep. Every minute longer that you spend in this process is a minute wasted, as it's not adding value to your research or your ability to put yourself in the position of being able to get the resources you need to do the research you are interested in.

Your first priority is to understand the most appropriate venue and then understand the system that this venue uses to get the work online. Tailor your process to lower the friction between the artefact you create and the process that will be used to get it online.

The great failure of my industry in the face of the web has been to allow this process to remain orders of magnitude harder than publishing a post on blogger or wordpress.

I'll step through some advice covering these topics now.

# The most appropriate venue

Ask your colleagues and confer with your coauthors; it's usually not hard to determine. A tool like the Journal/Author Name Estimator has been around for years, and it can suggest a journal based on the text of your abstract. The following resources can also help: Journal Finder, http://www.edanzediting.com/journal_selector, http://www.journalguide.com/ and http://etest.vbi.vt.edu/etblast3. Most of these are for the life sciences.

If your publication is an OA publication the Eigenfactor Journal Rank tool will tell you if you are getting good value for money. This ranks cost of the article processing fee against a rank of the journal determined by their own algorithm.

## Speed of publication

It might be worth checking if there is an alternative venue that might be a lot faster than your first choice.

A common approach is to submit to a high profile journal, and on rejection submit to PLOS ONE. This is done in order to reduce the thrashing around within the peer review system. Perhaps consider submitting to PLOS ONE first? You could also look for a journal that is smaller, and might be more responsive. In the life sciences the journal I work for - eLife - is both prestigious and fast.

For the life sciences Anna Sharman has a great resource for a selection of journals giving information about decision times, OA charges and journal metrics.

It might be interesting to encourage people attending your courses to contribute to these, or to create similar resources for their own disciplines.

## Preprint servers / archives

Your discipline may have a discipline specific archive. Make sure a copy of your work is deposited there. If the full text is deposited in one of these venues Google Scholar will be able to provide readers with a link to a full text version of your article - even if you have had to publish in a paywalled journal.

Often you can get your work in draft up there before the peer review process is complete (if that's considered kosher in your field). This can give you priority on an idea, even before the idea has been formally reviewed.

Also, check with your university library and find out what archives they run, deposit there for the same reasons as above.

Keeping control of your own content is a significant advantage that authors can derive from publishing in an OA journal. I'll touch on that a bit later.

Currently - as of writing this post - the Google main search bot does not index content that is behind an academic paywall for users who do not have access. That means if you publish at a non-paywalled venue more people have a chance to find your content.

Now most of your immediate peers will probably be able to access your content by virtue of having it in either the appropriate venue or in an appropriate repository, but it can't hurt to make it even easier to find.

If your coauthors will not agree to publishing in an OA venue, you can always try to modify the copyright transfer agreement that the publishing company will ask you to sign.

You can follow these examples to allow you to retain the right to distribute the paper in any way that you see fit. This is the one piece of advice that I'm giving that might slow down the process of publication, but go on, you know you want to do it, don't you?

## What happens to my paper in a big publishing company, and why should I care?

During the reviewing stage a very badly formatted version of your article will be created to be sent to the reviewers of your article. If you have a preprint of your article available, that might even be an easier artefact for the reviewers to use, and it might speed up the review process, though I don't have any evidence to suggest that it will.

If your manuscript is accepted for publication then it will be sent to a large typesetting company, where it will be digitally torn apart and converted to XML. All of the formatting that you do on figures, text and on the reference lists, will be thrown away. I'll just say that again. All of the work and hours you spend carefully formatting your reference lists will be ignored as the content goes through an automated typesetting system. (That's why at eLife we don't have a prescriptive requirement on the format of the references that we get sent, we will take them in any format).

All of your specially chosen fonts, and special text alignment will be mostly ignored.

Depending on the state of the manuscript and the quality of the language in the manuscript it may be checked by a copy editor, either for internal journal style, or for the quality of the language. Much of this work is undertaken by highly educated graduates in developing countries, particularly India, the Philippines and increasingly China - globalisation in action.

Why is this? For the most part the systems that run our global publication infrastructure are old, many of them have code bases that are older than 20 years. Back in the day XML was the only reliable transfer format, and it remains the industry standard today. A slow evolution has been happening with the XML that publishers are using, and under the gentle pressure to deposit into PubMed and PubMedCentral most publishers and typesetters are starting to target one of the many dialects of the NLM DTD. This has become a de-facto standard in the industry, however no writing tools export natively to this format, and the DTD supports, and is designed for, archiving print material. One of the very many consequences of this is that code that is typeset in this DTD is usually typeset as dumb text. On the other hand it does allow a resource like PMC to archive millions of articles, from thousands of publishers, and provide a very fine grained search interface on top of all of this content. I'll mention writing tools a little later.
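To make the "your reference formatting is thrown away" point concrete, here is a minimal sketch of what a reference looks like once it is structured XML. The fragment is a made-up reference using element names loosely modelled on the NLM/JATS tag set; real typesetter output is richer, and the two rendering functions and their citation styles are invented purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical reference; element names loosely follow NLM/JATS conventions.
REFERENCE_XML = """
<ref id="bib1">
  <element-citation publication-type="journal">
    <person-group person-group-type="author">
      <name><surname>Doe</surname><given-names>J</given-names></name>
      <name><surname>Roe</surname><given-names>R</given-names></name>
    </person-group>
    <year>2013</year>
    <article-title>An example article title</article-title>
    <source>Journal of Examples</source>
    <volume>1</volume>
    <fpage>10</fpage>
  </element-citation>
</ref>
"""


def parse_reference(xml_text):
    """Pull the bibliographic fields out of one <ref> element."""
    citation = ET.fromstring(xml_text.strip()).find("element-citation")
    authors = [
        (name.findtext("surname"), name.findtext("given-names"))
        for name in citation.iter("name")
    ]
    return {
        "authors": authors,
        "year": citation.findtext("year"),
        "title": citation.findtext("article-title"),
        "journal": citation.findtext("source"),
        "volume": citation.findtext("volume"),
        "first_page": citation.findtext("fpage"),
    }


def render_author_year(ref):
    """One invented house style: surname first, year in parentheses."""
    names = ", ".join(f"{surname} {given}" for surname, given in ref["authors"])
    return (
        f'{names} ({ref["year"]}). {ref["title"]}. '
        f'{ref["journal"]} {ref["volume"]}:{ref["first_page"]}.'
    )


def render_numbered(ref, number):
    """Another invented house style: numbered, initials first."""
    names = ", ".join(f"{given} {surname}" for surname, given in ref["authors"])
    return (
        f'[{number}] {names}, "{ref["title"]}," {ref["journal"]}, '
        f'vol. {ref["volume"]}, p. {ref["first_page"]}, {ref["year"]}.'
    )
```

Once the reference is stored as fields like this, the journal's style is applied by code on the way out, which is why the punctuation and ordering in the submitted manuscript do not survive.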

In order to potentially reduce the time to review your manuscript, and in order to reduce the time your manuscript takes in the copy editing / typesetting process, the following things could help:

• As mentioned - having a preprint version of your article available that the reviewer may know about, e.g. on the arXiv.
• If English is not your first language, have the manuscript proofed by a native speaking colleague, or pay to have the proofing done.
• Use a tool for managing your references, and don't sweat the formatting details. You might consider using any of:

Remember, this is probably a lifestyle choice, my main advice is pick a tool that does not have too much lock in. I used to work at Mendeley and believe it to be as good as any tool out there.

## But wait! I want to do IPython, interactive, open data, virtual machines, 3D printed DNA dinosaur replication and what you have just told me sounds like I can't do that - that sucks :(

Yes, yes, it does suck, and I hear what you are saying, but remember, at the moment of publishing, your priority is to get the damn work published, and unfortunately that still means interacting with a system that has changed little since the late 17th century. There are moves in the right direction, oases of sanity, but there is a long long way to go.

If you feel really passionate about this then the best thing you can do is to keep the rights to your own work, get the paper out as a CC-BY paper in a boring old venue, and then do the kind of publication that you really want to on your own academic home page, and build your own audience around your work that way. In that case you want the boring route to take up as little time as possible.

You should also deposit artefacts of your paper in the best possible place for them. Code to a location like github. Videos to youtube or Vimeo. Images to flickr. Data to Figshare, DataDryad, Zenodo, or one of the very many other subject specific data repositories that may be appropriate for your field.

Try and keep your artefacts well organised, and backed up off of your machine. You can back a lot up to github as part of a git repo, but that's not its main use case. You can use a service like Evernote, or get a licence for a research specific asset management tool like Projects or LabArchives.

The aim here is to reduce the friction in getting instances of these resources into the hands of others - if you believe that this is a critical part of doing research.

It can also make it possible to recover this information in the instance of losing your main machine. (I decommissioned my main machine last summer via cup of coffee).

For the purposes of archiving your work you should also check with your institution and library to see if they can provide support or systems. Librarians in many institutions are mustard keen to help, as it provides a way for them to prove value to the academy in a world in which library subscriptions are under extreme pressure. You may find yourself with the problem of having too many options - which is not a bad problem at all.

# Authoring tools, and why does this all suck so much?

I noticed that there was some discussion in the thread about collaborative tools for authoring. Again, I'll just stress, get the work published as soon as possible. This might mean sending a PDF of the article to a publishing house, or having to just send in a Word file.

On the other hand, there is a new generation of online tools emerging for writing, and also tools emerging for writing on the iPhone and iPad. I think we have more viable options now at our fingertips than at any time in the past. I don't believe that there are any serious contenders yet ready to oust the Word/LaTeX duopoly, but it would not hurt to take some of the following for a test drive to help with the authoring experience. It's too broad a topic to go into a detailed review of each one, so I'll leave an investigation of these tools as an exercise for the interested reader. The list below is just a sample, there are a bunch of others out there.

The tool that I see emerging at some time on the horizon, and that I have a lot of excitement for, is the work on the Substance reader and composer and eLife Lens. What's really nice about this is that to get started you can import NLM XML directly, or markdown via pandoc. It does a great job of separating the view, logic and control of the writing experience, and so it should also be possible to write directly in browser, and export to a publication ready format directly - but some work remains.

In my own ideal world you can submit an idea to a journal as part of a pull request to the publication, and peer review takes place in some system similar to how we do code review today. On acceptance the full digital artefact is published instantly. The writing and collaboration happens in almost any tool that the user likes, and modifications are synced via something like dropbox. In this world writing tools support offline as well as online modes, and content, logic and views can be assembled independently. In my ideal world the source is open. We are a little bit away from that at the moment, but there is no doubt in my mind that we are moving in that direction. A great post by PLOS has some insights on what the native format for publishing on the web should be.

As we are discussing publishing on the web, I thought it might be useful to describe the tools I used to write this post. The body of the text is stored on my machine as a plain text file, and I store all of these in one directory using nvALT to manage them. This directory is also held under a Dropbox account, and I can access the content from my iPhone through a variety of editors, but in this case I didn't use any of these.

For writing this post I used WriteRoom for Mac in distraction free mode. I often use SublimeText in distraction free mode too. For some shortcuts in formatting I used TextExpander. To format the links I wrote the post in markdown, and did the formatting in SublimeText. I previewed the post using Marked. I also used Marked to verify that all of the links were working, at the time of writing. I used the GrabLinks bookmarklet to gather all of the links from this post to add in as a resources list at the end of this post. In order to publish the post on my blog I posted it directly into a github repo using github pages to render the content. You can see the result at my blog where I have cross posted this comment.

# Final thoughts

I realise that I have mostly been answering the question about what people should know about the world as it is now, and not so much about what tools or approaches we should advocate to make the world a better place, but I hope that we can have a clear view on what is bad, so that this can help people make pragmatic decisions about how to change things for the better.

# resources

Contributor
commented Jan 10, 2014
 If anyone's interested, here's the repository for my upcoming course: http://github.com/khinsen/FdV-ScientificComputing-2014 For now there is only the material for the first session, which contains an introduction to the subject and a practical session on Git. The Git course is the "novice" course from Software Carpentry, reformatted as slides based on remark.js. The reason for this is that I want to use only those techniques for preparing my course that I also explain to my students. I have no intention of teaching Jekyll or any Markdown-to-HTML converter, so remark.js turns out to be a good approach. The slides are not quite as nice as with some other HTML-based frameworks, but simplicity is more important for me at this stage. In the following sessions, I plan to cover automation and data management, including data publishing (figshare, Zenodo, etc.).
Member
 How did you do the reformatting to remark.js? Manually, scripted, pandoc, other?
Contributor
commented Jan 11, 2014
 First, I reformatted the links to the images using an Emacs macro (a few minutes' work). The other links (glossary, image directory) just took a global search/replace. After that, I had to do some pagination to cut up the text into slides. I did that by hand, while reading through the whole text (which is a good idea anyway before teaching something ;-). I don't imagine doing this automatically because I did try to keep related material on the same screen as much as possible.
commented Jan 11, 2014
 There is also something to be said for using the GitHub Issues mechanism to handle replies to referees. That's something I like more and more.
Member
commented Jan 13, 2014
 @IanMulvany - Thanks for the detailed comments, I really appreciate your insight into some of the problems and the collection of tools you linked to. I wanted to quickly point out that many of your comments on publishers seem to be specific to your discipline. There are many publishers within mathematics and the computational sciences that are LaTeX-driven, as opposed to XML-driven, so the formatting and submission rules do not necessarily apply.
Member
commented Jan 13, 2014
 Also, I think there's enough in this GitHub issue to put together a solid article on options and recommendations for publishing as a scientist. @gvwilson and @kaythaney, what do you think?
commented Jan 13, 2014
 @ahmadia I think such a paper would be really great. It could also be an opportunity to talk more about "scholarly markdown", which seems to be gaining traction. I'm sure @karthik has tons of good refs on that. Basically, a table comparing the pros/cons of using Word, LaTeX, md, Rmd, ..., to write a scientific paper would be a really great resource.
Member
 Would someone like to volunteer to be lead author? Seems like an obvious match to PLOS Comp Bio's "Ten Simple Rules" collection: http://www.ploscollections.org/article/browse/issue/info%3Adoi%2F10.1371%2Fissue.pcol.v03.i01
 Two further thoughts: It's essential that tools support a field's reporting standards, which in clinical research represent a years-long push to improve the standards, transparency and usability of published reports - see the Equator Network (http://www.equator-network.org/). I get the impression that's spilling over into other fields. To quote Paul Murrell in his talk at the Joint Statistical Meetings last August: "Don't be a dead end." He was talking about interacting with and manipulating figures produced by grid graphics in R, but the principle applies to scientific reporting as well. I spend days manually extracting data from PDFs, because even in its digital form, the traditional scientific article is a dead end. Reporting needs to consider the downstream.
Member
 Please see (and comment on) #303 (a blog post summarizing discussion to date).