# Why publish your software?

There are many reasons to publish your software.

## Why publish research software?

In an academic context, the most important ones are:

1.  Publishing your software will help make the research results that
    have been produced using your software **more reproducible**. This
    is because the published software can be more easily referenced from
    the original report of research results, and also because people who
    want to reproduce the results will be able to identify, find,
    obtain, and run the same software more reliably. In other words,
    published software can become part of the
    [*provenance*](https://web.archive.org/web/20201002115234/https://en.wikipedia.org/wiki/Data_lineage#Data_provenance)
    record for a piece of research.
2.  Publishing your software makes it possible for you to **receive
    credit** for your work on the software, especially if you publish it
    in a way that allows researchers to easily find metadata that they
    can use to **cite your software**.
    
## More reasons to publish your software

There are other good reasons why you should consider publishing your
software.

Users can derive more trust in the software from the fact that it has
undergone a publication process.

This is true especially if the process includes peer review, for example
in the form of documented code reviews, or through more formal peer
review, for example when the software is published in a software journal
that includes review of the software as well as review of the paper.

Ideally, the publication process itself will also make the developers
more conscious about the quality of their software, given that the whole
world will be able to see it, use it, and critique it. And the review
process, if it exists, will provide a means by which the developers get
feedback about their practices.

Another good reason for the publication of your software could be that
publication of the research results that have been obtained using it
requires the software to be published.

While not all academic publishers require publication of software (and
data) at this time, some journals have started asking for code
submissions, for example using tools such as CodeOcean.

Finally, publishing your software can help to make the impact that
software has on research more visible.

No one these days can ignore the fact that software is used in almost
all research disciplines.

If researchers cite published software - and are required to do so -
more and more software contributions to research will become more
visible.

In turn, this will also help the people that create or maintain this
software receive credit for their work on the software.

# How to publish your software?

## Two main routes

Generally speaking, there are two main routes for publishing your
software:

1.  A more traditional publication route in the academic sense, where
    software is deposited in an archival repository in a similar fashion
    to the way scholarly publication of preprints works:
    Software is deposited on a platform such as
    [*Zenodo*](https://zenodo.org/) together with the metadata required
    to cite it: author information, name of the published software,
    version identifier, publication date, and the location where the
    software can be found.
    These software publications are given a unique persistent
    identifier, such as a [*Digital Object Identifier
    (DOI)*](https://web.archive.org/web/20200926012320/https://en.wikipedia.org/wiki/Digital_object_identifier),
    which makes it possible to refer to the software publication in an
    unambiguous way, similarly to how depositing a preprint on BioRxiv
    gives a DOI that can be used to cite that preprint.
    The paper [*"Software Citation
    Principles"*](https://doi.org/10.7717/peerj-cs.86) by Arfon M.
    Smith, Daniel S. Katz, Kyle E. Niemeyer and the FORCE11 Software
    Citation Working Group provides a comprehensive overview of how this
    publication route allows software to be regarded as a first class
    citizen in research.
2.  A more development-centric route is the archiving of development
    versions of software source code.
    [*Software Heritage*](https://www.softwareheritage.org/#) - a UNESCO
    project aiming to build the universal software archive - collects
    source code from open repositories such as GitHub, GitLab, Bitbucket
    and others, and saves it in a distributed long-term archive. This is
    similar to how the Internet Archive, archive.org, saves web content
    in a way that lets you refer to versions of web content that no
    longer exist.
    In addition to the automatic collection, you can also request that a
    specific repository be archived.
    This has the advantage that the archived software will remain
    citable through Software Heritage, even if the original repository
    is not.
    Software Heritage does not provide DOIs for archived software.
    Instead, it provides its own persistent identifiers, for whole
    snapshots of a repository, but more importantly also for releases,
    single commits, specific directories, and file contents, even down
    to specific lines of code in a source code file.
    This makes Software Heritage great for referencing software source
    code more generally, but it is not ideal for academic citation, as
    not all citation-relevant metadata is provided by default for every
    piece of software.
    There are, however, ways to get around this issue, as we will show
    later.

Publishing your software together with correct and complete metadata on
a platform that gives you a DOI for it, such as Zenodo, is perhaps the
best choice right now if you want your software to become part of your
scholarly record.

Similarly, if you don't care so much for this more traditional academic
way of publishing, but would rather give people the option to identify
and reference highly specific versions or parts of your software,
depositing it in the Software Heritage archive is probably what you
should do.

## Other ways to publish your software

In addition, there are also some other options for software publication
that each have their advantages and disadvantages.

1.  If you are not interested in academic credit, or cannot publish your
    software with a DOI for whatever reason, putting it in public
    version control is the most basic step to make your software
    available.
    You can, for example, push your code to an open repository on
    GitHub, GitLab, Bitbucket, or another public coding platform.
    This doesn't strictly qualify as publication in the traditional
    sense: there will be no metadata available to cite your software for
    example, unless you put in some effort to provide it.
    And users will need to figure out the exact version they want to use
    by themselves.
    Also, the software is not strictly preserved (you can remove it
    later, or the platform you are using may not survive, e.g. as
    happened with Google Code).
    Nevertheless, putting your source code on one of these platforms is
    a valuable first step.
    And ideally, at some point automatic archiving as provided by
    Software Heritage will mean that your software can be accessed even
    if the public repository goes away.
    Doing just this will, however, make it hard for someone who wants to
    cite your software in a paper.
2.  You could also describe your software in a traditional journal
    paper, perhaps saying something about the concept behind the
    software or its functionality.
    The Software Sustainability Institute maintains [*a list of journals
    that accept such
    papers*](https://www.software.ac.uk/which-journals-should-i-publish-my-software).
    The issue with this approach is that this is not really software
    publication at all, mainly because the software itself is not
    published, and depending on the journal, may not be part of the
    peer-review of the paper.
    Also, when you finish the next version of the software, it may not be
    feasible or even possible to write a new paper describing this
    version, especially if you follow the good software engineering
    practice of releasing early and often, or you have users who want to
    use a version of the code between releases.
    There may still be some corner cases for choosing this way to
    "publish" your software, for example when your institution does not
    allow public sharing of source code, as may be the case for
    system-critical or safety-critical software.
3.  There is actually a very good compromise available between the more
    traditional publication of a paper, and the publication of the
    software and its versions by itself: software journals.
    Software journals have developed over the last few years, and focus
    on the publication of research software.
    Publications in software journals combine a smaller amount of text
    describing the software with software metadata, references to cited
    works, and all the other ingredients for a journal publication.
    They include links to the software source code, and most
    importantly, they provide (ideally open) peer review for both the
    software and the accompanying text.
    The peer review of the software focuses on the quality of the
    software, and the implementation of best practices, such as
    comprehensive documentation and tests, reproducible environments,
    etc., and also on the domain aspects of the software.
    The most commonly used journal of this kind is the [*Journal of Open
    Source Software (JOSS)*](https://joss.theoj.org/).
    However, while publication in a software journal provides you with a
    DOI for the software, and adds important quality checks through peer
    review, it is not software publication as such.
    This means, for example, that only a specific version of the
    software is published and reviewed, but the following versions are
    not, although they may be discoverable through links to the source
    code repository.
4.  Another option for publishing research software is the submission to
    an index.
    There are software indices available for software developed at a
    specific institution, or group of institutions, or for
    discipline-specific software.
    [*swMath*](http://swmath.org/), for example, is an index for
    mathematical software, [*bio.tools*](https://bio.tools/) for
    bioinformatics software,
    [*SciCrunch*](https://scicrunch.org/) for
    software from the life sciences, and the [*Astrophysics Source Code
    Library (ASCL)*](http://ascl.net/) focuses on software developed for
    astrophysics research.
    These indices also provide persistent identifiers for their
    entries.
    Different journals may or may not accept these
    identifiers/citations. One particular advantage of adding software to
    such an index is that they allow the addition of software that has
    not been developed by the people who add the software to the index.
    This is useful if you want a reference to someone else's software,
    but such a reference doesn't exist yet: you can add it to an index,
    and use the provided identifier in your work.
    If your institution or discipline does not have a software index,
    you can also use the Software Heritage Archive to achieve something
    similar: you can [*request the archiving of a source code
    repository*](https://archive.softwareheritage.org/save/), and use
    the respective Software Heritage identifier to reference the
    software once it's archived.

## How to publish your software according to the Software Citation Principles

Coming back to the Software Citation Principles, and the more
traditional DOI-based publication route they recommend, we will now
explain how this works.
The general idea is that you provide the source code of your software as
well as its metadata to a long-term archival repository.
The source code and metadata are archived in the repository, which
returns a DOI that points to the software and includes the metadata.
Usually, both are provided together on a landing page.
There are some important aspects in this publication process that make
it suitable for software publication and citation:

1.  The software metadata includes all relevant information that you
    need to cite the software, such as its name, a complete and correct
    list of its authors, version information, and its publication date.
2.  The software source code can also include documentation and other
    artifacts related to the software. This can make it easier to use,
    for example when attempting to reproduce research results that have
    been obtained using the software.
3.  You can publish all release versions of the software in this way.

To link between the different published versions of software, you can
use a separate DOI, whose metadata can point to the DOIs for the
different versions.
In theory, this could be done manually, for example to link between
versions published with different publishers, but there doesn't exist
any tooling for this yet, other than via citation.
Thankfully, some repository platforms such as
[*Zenodo*](https://zenodo.org/) provide this "parent" DOI automatically
for all versions that are published there.
Generally speaking, this is the preferred way for publishing software,
as it publishes the object itself, i.e., the actual software version,
and not a text proxy.
A remaining challenge for this approach to software publication is,
however, that it doesn't include the quality checks provided by peer
review.
These checks must still be carried out outside of the publication
process, for example in the form of code reviews of new source code
added to the source code repository, or as post-publication audits.
Additionally, only the archived releases of the software can be cited.

# Why you should make your software citable

In the previous section, we have presented some ways to publish your
software.
We have also explained how some of them, namely the publication of the
software itself with a DOI and the publication of software papers,
ensure that the software is citable by providing the correct and
complete metadata.
One example of where incorrect or incomplete metadata can prevent
correct citation is citing a software package from the information
present in its source code repository. You may not be able to get
correct authorship information, because some code contributors may only
be identifiable by a username, and some software authors may have
contributed to the package in other ways than committing code to version
control, and may therefore not appear in the repository at all.

This example demonstrates that making your software citable includes
providing citation-relevant metadata for it.
And there are good reasons why you should do this, even if it means some
extra work during the preparation of a software publication.

1.  Citable software makes it possible to track its usage and impact.
    The creators of the software can gain some insight about the usage
    of their software and perhaps deduce the impact it's had on research
    by looking at the number of citations to it, and where those
    citations are made.
    These citation metrics may also be important for their employers
    during an evaluation of the software creators' work.
    Other researchers can use citation metrics for software to conduct
    their own research, for example into how the use and impact of
    software in research changes over time.
2.  It may be useful for the creators of software to quantify the impact
    of their work based on citation to advance their own careers.
    Academic credit is still closely tied to how many times a person's
    work has been cited, and until this changes, software creators can
    leverage this tradition by making their software citable.
3.  Making your software citable, as a prerequisite for a better
    practice of software citation, is a step towards promoting the
    status of research software in academia in general.
4.  Making your software citable means that you lead by example.
    It will be hard to cite software as a valid research output and as
    an integral part of research as long as the way software is
    published - or not published - doesn't enable software citation.

We have shown in the previous sections that you can publish your
software.
The key ingredient for making software citable is to provide the correct
metadata to enable others to cite it.

# How to make your software citable? (Metadata about software)

## What software metadata do people need from you?

If you want someone to cite your software, you need to think about what
they need from you in order to do this. There are two basic
possibilities:

1.  They want to refer to your software generally, perhaps to compare it
    to other software, or to discuss the problems it can solve or the
    algorithms it uses. We call this "the software concept", as it
    refers to the idea of the software and all of its versions, not a
    specific version.
2.  They want to refer to a specific version of the software, because
    it's the version they used.

In both cases, to make your software citable, you need to provide the
information that someone else needs to cite it. For both cases, this
includes who you consider the authors, and the name of the software. For
the software concept, it also includes information about where the
software can be found (a link to the Software Heritage archive of the
software repository, the working repository, or both, via the Software
Heritage link with the "origin" information provided). For a specific
version, this includes information about where the software can be found
(a link to that specific version in the Software Heritage archive of the
software repository, the working repository, or both, via the Software
Heritage link with the "origin" information provided), as well as
information about the version (a version number or a commit hash, and
the date of the version).

## Who are the authors of your software?


Because it is generally not possible to determine which authors should
be credited for the software from information that is automatically
captured from commits (as discussed in the previous section), this
information must be determined and recorded by the authors themselves.
Ideally, it should change as needed with different versions of the
software. A new author may join the project as of a particular version,
or an old author's code may be removed when part of the code is
rewritten. (Whether the old author should still be listed as an author,
even if none of their code survives, is a decision the authors need to
make.) This information, along with information that may be easier to
find from the code itself, is the metadata needed for citation (as
discussed in the previous section).

## Where to store the metadata?

Given that much of the metadata (except the authors) is stored with the
code, and that someone who has the history of the code can find the
other metadata associated with past versions, we believe that the author
metadata should also be stored with the code itself, and should be
updated as part of the process of updating the code. And since this has
to be manually determined and updated, we might as well do the same for
as much of the metadata as possible. We can do this by storing the
metadata in the software repository, which also means that it will be
archived by Software Heritage, and by changing our practice so that when
we make a change to the code, along with matching changes to the tests
and documentation, we also make a matching change to the metadata when
needed. As with the software documentation, a change to the code is
unlikely to require that all the metadata be updated.

## The difference between publishing a paper and publishing software

This is a place where the difference between open source software and
published papers becomes clear. For a paper, there is generally one
point at which the metadata needs to be recorded: when it is submitted
to a publisher or a repository (though there can, of course, be multiple
versions of a paper over time). The paper is generally not available to
be read or cited until this submission (and possibly peer-review)
happens. One solution for software citation is to treat software like a
paper, and manually submit it and its metadata to a repository (e.g.,
Zenodo) or a publisher (e.g., JOSS) when a major version is released.
This is what the [*software citation
principles*](https://doi.org/10.7717/peerj-cs.86) paper recommended in
2016. However, it does not help authors who want to cite a different
version of your software, between these major versions. If you do choose
to follow these recommendations, you can still store metadata in your
repository, and in the case of Zenodo (if the metadata is stored in a
.zenodo.json metadata file), this will make the deposit process simpler.
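
As a minimal sketch of what such a file might hold, the following Python snippet renders a `.zenodo.json` document from a dictionary. The project name, author names, and affiliations are hypothetical placeholders, and the field names (`title`, `upload_type`, `version`, `description`, `creators`) reflect our understanding of Zenodo's deposit metadata; check them against the current Zenodo documentation before relying on them.

```python
import json

# Hypothetical example values; replace them with your project's real metadata.
# The field names are assumed to match Zenodo's deposit metadata.
zenodo_metadata = {
    "title": "Example Research Tool",
    "upload_type": "software",
    "version": "1.2.0",
    "description": "A placeholder description of what the software does.",
    "creators": [
        {"name": "Doe, Jane", "affiliation": "Example University"},
        {"name": "Roe, Richard", "affiliation": "Example Institute"},
    ],
}


def render_zenodo_json(metadata: dict) -> str:
    """Serialize the metadata as indented JSON, ready to save as .zenodo.json."""
    return json.dumps(metadata, indent=2, ensure_ascii=False) + "\n"


# Print the document; in a real project you would write it to a file named
# .zenodo.json in the root of the repository.
print(render_zenodo_json(zenodo_metadata))
```

Keeping this file under version control means the author list and version can be updated in the same commit as the code they describe.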

## Metadata formats and how to create metadata files

If you want to enable citation of all versions of your software, you
should certainly store the software's metadata in a machine- and
human-readable way in the repository. Today, there are two main options
for this, though this is somewhat in flux.

First, you can store the metadata needed for citation in a
[*CITATION.cff*](https://citation-file-format.github.io) file. This is a
relatively simple file format (the Citation File Format (CFF)) that is
both machine- and human-readable, and that contains just the metadata
needed for citation. CFF is compatible with other formats that can
represent citation metadata, such as RIS, BibTeX, EndNote, CodeMeta,
schema.org and .zenodo.json, and a converter tool is available as a
[*Python command-line
tool*](https://archive.softwareheritage.org/swh:1:snp:952376399ae6bf9fdbc352f754685e6b947a81fe;origin=https://github.com/citation-file-format/cff-converter-python/)
and a [*web service*](https://bit.ly/cffconvert). A valid CITATION.cff
file also contains some meta information about itself, so that human
readers can quickly determine what it represents, e.g., a "message" that
can provide information as to what the metadata in the file should be
used for ("If you use this version of the software, please cite it using
these metadata."). To create a CITATION.cff file from scratch, you can
use the [*CFF
Initializer*](https://citation-file-format.github.io/cff-initializer-javascript/).
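
To give an impression of the format, here is a small Python sketch that renders a minimal CITATION.cff file as text. It assumes CFF version 1.2.0 and uses only a handful of its keys (`cff-version`, `message`, `title`, `version`, `date-released`, `repository-code`, `authors`); the function name `render_citation_cff` and all example values are hypothetical. For real projects, the CFF Initializer or a validating tool is the safer choice.

```python
def render_citation_cff(title, version, date_released, repository_url, authors):
    """Return the text of a minimal CITATION.cff file (assuming CFF 1.2.0).

    `authors` is a list of (family_names, given_names) tuples.
    Values are wrapped in double quotes without escaping, so this sketch
    assumes that none of them contains a double quote or a backslash.
    """
    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this version of the software, please cite it using these metadata."',
        f'title: "{title}"',
        f'version: "{version}"',
        f'date-released: "{date_released}"',
        f'repository-code: "{repository_url}"',
        "authors:",
    ]
    for family_names, given_names in authors:
        lines.append(f'  - family-names: "{family_names}"')
        lines.append(f'    given-names: "{given_names}"')
    return "\n".join(lines) + "\n"


# Hypothetical example values for illustration only.
cff_text = render_citation_cff(
    title="Example Research Tool",
    version="1.2.0",
    date_released="2021-01-15",
    repository_url="https://example.org/example/research-tool",
    authors=[("Doe", "Jane"), ("Roe", "Richard")],
)
print(cff_text)
```

Because the author list is passed in explicitly, it can include people who never committed code, which is exactly the information that cannot be recovered from the version history.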

Second, you can store the metadata in a
[*codemeta.json*](https://codemeta.github.io/) file. These files are
machine-readable and it can be argued they are also human-readable. They
can contain much more metadata than is needed just for citation. In
addition, there is an ongoing effort to incorporate codemeta.json into
schema.org. To create a CodeMeta file from scratch, you can use the
[*CodeMeta Generator*](https://codemeta.github.io/codemeta-generator/).
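
Since a CodeMeta file can hold more than citation needs, a consumer may want to pick out only the citation-relevant part. The sketch below does this in Python for a small, hypothetical codemeta.json document. The property names (`name`, `author`, `version`, `datePublished`, `codeRepository`) are CodeMeta terms as we understand them; the helper `citation_subset` and the example values are our own assumptions.

```python
import json

# A hypothetical codemeta.json document, inlined here to keep the example
# self-contained. Real files usually contain many more properties.
codemeta_text = """
{
  "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
  "@type": "SoftwareSourceCode",
  "name": "Example Research Tool",
  "version": "1.2.0",
  "datePublished": "2021-01-15",
  "codeRepository": "https://example.org/example/research-tool",
  "programmingLanguage": "Python",
  "author": [
    {"@type": "Person", "givenName": "Jane", "familyName": "Doe"}
  ]
}
"""

# The properties this sketch treats as relevant for citation.
CITATION_KEYS = ("name", "author", "version", "datePublished", "codeRepository")


def citation_subset(codemeta: dict) -> dict:
    """Keep only the citation-relevant properties that are actually present."""
    return {key: codemeta[key] for key in CITATION_KEYS if key in codemeta}


record = citation_subset(json.loads(codemeta_text))
print(json.dumps(record, indent=2))
```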

You can also manually create a CITATION (or CITATION.md) file that tells
human beings how to cite your software, in a particular citation style.
While this helps people who want to cite your software, it is not a
complete solution, since they may want to use a different style than the
one you provided. It would be better in the long term if we had tools
that took the machine-readable data in CITATION.cff or codemeta.json
files and created a citation in a choice of styles. To do this in a
LaTeX document today, you can convert an existing CITATION.cff file to
BibTeX with the CFF converter.
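
To illustrate the kind of output such a conversion produces, the following Python sketch formats a BibTeX `@misc` entry from a few citation fields. This is not the CFF converter's implementation, only an illustration of the mapping from metadata to a reference entry; the function `to_bibtex`, the citation key, and the example values are hypothetical.

```python
def to_bibtex(key, title, authors, year, version, url):
    """Format a BibTeX @misc entry for a piece of software.

    `authors` is a list of (family_names, given_names) tuples. Special
    characters are not escaped, so this sketch assumes plain-text values.
    """
    # BibTeX separates authors with " and " and accepts "Family, Given".
    author_field = " and ".join(f"{family}, {given}" for family, given in authors)
    return (
        f"@misc{{{key},\n"
        f"  author = {{{author_field}}},\n"
        f"  title = {{{title}}},\n"
        f"  year = {{{year}}},\n"
        f"  version = {{{version}}},\n"
        f"  url = {{{url}}}\n"
        "}\n"
    )


# Hypothetical example values for illustration only.
entry = to_bibtex(
    key="example_research_tool",
    title="Example Research Tool",
    authors=[("Doe", "Jane"), ("Roe", "Richard")],
    year=2021,
    version="1.2.0",
    url="https://example.org/example/research-tool",
)
print(entry)
```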

# When and how to cite software?

Software can be cited in different types of documents, including text
documents (papers), other software, and datasets. To create a citation,
you need to collect the needed metadata and put it in the required
format for the particular document. For example, for a paper where a
human-readable citation is desired, you may need to put the citation in
APA style, which might be generated from the citation metadata in a
reference collection or manager, such as BibTeX or Zotero. For software,
or a dataset, you might put the citation metadata for the software to be
cited in the metadata of the product, in a section for references.

In all cases, there are different options that depend on how the
software has been made citable, if it has, and where any metadata is
recorded.

## How to cite software publications?

If the software has been published, for example, on Zenodo, you can use
the identifier and the associated metadata from the repository or
publisher.

If the software is on Software Heritage, you can find and identify it
by:

1.  Linking to the full repository archived in Software Heritage (with
    all its development history) by prepending
    [*https://archive.softwareheritage.org/browse/origin*](https://archive.softwareheritage.org/browse/origin)
    to the URL that was archived. For example, if the repository
    [*https://github.com/rdicosmo/parmap*](https://github.com/rdicosmo/parmap)
    has been archived, then the link to the saved version in Software
    Heritage will be
    [*https://archive.softwareheritage.org/browse/origin/https://github.com/rdicosmo/parmap/*](https://archive.softwareheritage.org/browse/origin/https://github.com/rdicosmo/parmap/)
2.  Linking to a specific version of the project. The following SWHID
    identifies a precise version of the source code of Parmap:
    swh:1:rev:0064fbd0ad69de205ea6ec6999f3d3895e9442c2;origin=[*https://github.com/rdicosmo/parmap*](https://github.com/rdicosmo/parmap)
    SWHIDs can be turned into a clickable URL by prepending
    [*https://archive.softwareheritage.org/*](https://archive.softwareheritage.org/)
    to them. For example, the following URL brings you directly to a
    page in Software Heritage that is browsing that precise version:
    [*https://archive.softwareheritage.org/swh:1:rev:0064fbd0ad69de205ea6ec6999f3d3895e9442c2;origin=https://github.com/rdicosmo/parmap*](https://archive.softwareheritage.org/swh:1:rev:0064fbd0ad69de205ea6ec6999f3d3895e9442c2;origin=https://github.com/rdicosmo/parmap)
    A very simple way of getting the right SWHID is to browse your
    archived code in Software Heritage, and navigate to the revision you
    are interested in. Then click on the vertical red "Permalinks" tab
    that is present on all pages of the archive, and select the revision
    identifier in the panel that opens. In addition, version 1 of the
    SWHIDs uses git-compatible hashes, so if you are using git as a
    version control system, you can create the right SWHID by just
    prepending swh:1:rev: to your commit hash.
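
The two linking schemes above can be sketched in a few lines of Python. The function names are our own and purely illustrative; the URL prefix, the `swh:1:rev:` prefix, and the `;origin=` qualifier are taken from the Parmap examples in this section. The sketch only builds strings and does not check whether the repository or revision has actually been archived.

```python
import re

SWH_ARCHIVE = "https://archive.softwareheritage.org/"


def origin_browse_url(repo_url):
    """Link to the full archived repository, including its history."""
    return f"{SWH_ARCHIVE}browse/origin/{repo_url.rstrip('/')}/"


def revision_swhid(commit_hash, origin=None):
    """Build a version 1 revision SWHID from a full git commit hash."""
    if not re.fullmatch(r"[0-9a-f]{40}", commit_hash):
        raise ValueError("expected a full 40-character lowercase hex commit hash")
    swhid = f"swh:1:rev:{commit_hash}"
    if origin:
        # The origin qualifier records which repository the code came from.
        swhid += f";origin={origin}"
    return swhid


def swhid_url(swhid):
    """Turn a SWHID into a clickable URL by prepending the archive address."""
    return SWH_ARCHIVE + swhid


repo = "https://github.com/rdicosmo/parmap"
swhid = revision_swhid("0064fbd0ad69de205ea6ec6999f3d3895e9442c2", origin=repo)
print(origin_browse_url(repo))
print(swhid)
print(swhid_url(swhid))
```

In a git working copy, the commit hash passed to `revision_swhid` is the one printed by `git rev-parse HEAD`.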

In any case, if there is stored metadata in the Software Heritage
archive that you are pointing to, you can use it. If there isn't, you
have to do your best to identify the software's name, perhaps from its
documentation (including its README), and if the authors are not
identified, use the project's name as the authors (e.g., "Parmap
project").

## How to cite unpublished software?

Another option for citing software is to submit it to an indexing
service, then cite the identifier provided by the index. Some
disciplines have such indexing services, such as
[*ASCL*](https://ascl.net/) in astronomy,
[*SciCrunch*](https://scicrunch.org/) in life sciences, and
[*swMath*](https://swmath.org/) in mathematics. Note that this may leave
holes in the metadata as well, as you again may not be able to identify
the authors.

## How to cite software papers?

Finally, if there is a published software paper (e.g., in JOSS), you can
cite that paper as well, just as you would cite any other paper, though
if the version of the software you want to refer to is not identical to
the version published in the software paper, you should also cite the
software itself using one of the previously discussed methods.

# Further guidance

Further guidance for software citation has been, and is being, developed:

- Software Citation Principles (doi:[10.7717/peerj-cs.86](https://doi.org/10.7717/peerj-cs.86)) - A paper outlining the principles of software citation following the academic, DOI-based route
- Software citation checklists for
  - software developers (doi:[10.5281/zenodo.3482769](https://doi.org/10.5281/zenodo.3482769))
  - paper authors (doi:[10.5281/zenodo.3479199](https://doi.org/10.5281/zenodo.3479199))
- [cite.research-software.org](https://cite.research-software.org) - A website providing a high-level overview of software citation with some practical suggestions on how to cite software and make it citable, based on the Software Citation Principles