# Introduction

## Science is becoming increasingly computational

- Modeling
- Simulation
- Data analysis
- Data management

## Research software is more than just code

<img src="fig/research-software.png" alt="Research software" style="width: 12em; float: right;" />

- Data
- Organization
- Communication
- Process

## Key features of research software projects

<center><img src="fig/softwaredevelopment.jpg" alt="Pair programming" style="width: 12em; margin: 1em auto 0 auto" /></center>

- Developers (scientists first, THEN programmers)
- Problems (subtle, complicated, important)
- Requirements (exploring vs. engineering)

## End game

- Software can be used by others
- Reasonable confidence in accuracy
- Small changes and extensions are safe and easy
- Fast enough to be useful
- Sustainable (during its lifecycle)
- Citable

## What is your project's value proposition?

Fill in the template below for your current project.

1. For *[description of target users]*
2. who want to *[statement of their need(s)]*,
3. *[project name]*
4. provides *[statement of key benefits]*.
5. Unlike *[name of alternative solution(s)]*,
6. our project enables users to *[key differentiator]*.

## Describe how your project is managed

Write a short point-form description (5-6 bullets) of how your current project is managed:

1. Who uses the software?
2. How?
3. How do they find the software?
4. How do they set it up?
5. Who decides what to change and when?
6. How are decisions and changes circulated?

# Basics

## Project organization

<img src="fig/noble.png" alt="Research software" style="width: 12em; float: right;" />

- It's like a diet
- An example: "Noble's Rules"  
  ([Noble 2009, *PLOS Comp. Bio.*](http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000424))
- Details not important,  
  but principles are

## Project organization (cont.)

- **<font color="red">Name all files to reflect content and purpose</font>**
- Use established conventions
- But (deliberate) adaptations are fine

## Task management

- DRY principle: don't repeat yourself
  > *The only thing you can accomplish by typing something repeatedly is to get it wrong.*
- Use an automated build manager
- Use checklists for tasks you can't automate

## Build managers

- GNU Make
    - *The old standby*
- CMake, automake/autoconf, SCons, etc.
    - *"New" flavors*
- rake, pydoit, SnakeMake, etc.
    - *Language-specific*

## Build managers (cont.)

- Key feature: dependencies
    - "X depends on Y depends on Z"
    - Usually implemented using timestamps or hashes
    - Tasks only re-executed when needed
- Originally designed for compiling large programs
- Can be adapted for arbitrary workflows
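
The timestamp idea above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular build manager is implemented; the names `needs_rebuild`, `build`, and `concatenate` are hypothetical, and real tools also handle chains of dependencies, hashes, and parallel execution.

```python
import os


def needs_rebuild(target, sources):
    """Return True if target is missing or older than any of its sources."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(src) > target_mtime for src in sources)


def build(target, sources, action):
    """Run action only when the target is out of date; report whether it ran."""
    if needs_rebuild(target, sources):
        action(target, sources)
        return True
    return False


def concatenate(target, sources):
    """Hypothetical task: join the source files into the target."""
    with open(target, "w") as out:
        for src in sources:
            with open(src) as handle:
                out.write(handle.read())
```

Calling `build("out.txt", ["a.txt", "b.txt"], concatenate)` twice in a row does the work once; the second call returns `False` until one of the sources changes.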

## Checklists

- "Build file" executed by humans
- Keep in version control
- Adapt over time as needed based on experience and feedback

## Create a task list

1. If your project doesn’t use a build manager, what are the first few tasks you should automate?
2. If your project already uses a build manager, what tasks are used most often?

## Create a setup checklist

1. Write a short point-form checklist describing the things you do when setting up a new machine to do development on your project.
2. How many of the steps in your checklist can be automated using shell scripts or other small programs?
3. How will newcomers know if they have completed the steps in the checklist correctly?
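
One possible answer to the last question is a small script that verifies the setup. The sketch below assumes the checklist boils down to "these command-line tools must be installed"; the tool list and the function names are hypothetical placeholders for whatever your project actually needs.

```python
import shutil

# Hypothetical list; replace with the tools your project requires.
REQUIRED_TOOLS = ["git", "make"]


def missing_tools(tools, find=shutil.which):
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if find(tool) is None]


def check_setup(tools=REQUIRED_TOOLS, find=shutil.which):
    """Print one line per tool; return 0 if all are present, else 1."""
    missing = missing_tools(tools, find)
    for tool in tools:
        status = "MISSING" if tool in missing else "ok"
        print(f"{tool}: {status}")
    return 1 if missing else 0
```

Newcomers can run `check_setup()` after working through the checklist; a return value of `0` means every listed tool was found.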

# Make the Software Robust

## Robust software

*Robust* is the difference between 

> *Works for me on my machine.*

and 

> *Works for someone I've never met on a cluster I've never heard of.*

## Taschuk's Rules

See ([Taschuk & Wilson 2017, *PLOS Comp. Bio.*](http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005412))

<center>Provide a descriptive README (synopsis, dependencies)</center>

<center><img src="fig/khmer-readme.png" alt="README in terminal and browser" style="width: 24em" /></center>

## Taschuk's Rules (cont.)

<center>Provide a descriptive usage statement<br />Make common operations easy to configure</center>

<center><img src="fig/canon-cli.png" alt="CLI in terminal" style="width: 16em" /></center>
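
As a rough illustration of both rules, Python's standard `argparse` module generates a usage statement from the option definitions. The tool name `analyze`, its options, and the version string below are all invented for this sketch.

```python
import argparse


def build_parser():
    """Build a parser for a hypothetical tool named 'analyze'."""
    parser = argparse.ArgumentParser(
        prog="analyze",
        description="Summarize the measurements in an input file.",
    )
    parser.add_argument("infile", help="path to the input data file")
    # Common operations get short flags and sensible defaults.
    parser.add_argument(
        "-o", "--output", default="summary.txt",
        help="where to write the summary (default: %(default)s)",
    )
    parser.add_argument(
        "-t", "--threshold", type=float, default=0.5,
        help="minimum value to keep (default: %(default)s)",
    )
    # Lets users report exactly which version they are running.
    parser.add_argument("--version", action="version", version="%(prog)s 1.0.0")
    return parser
```

With this in place, `build_parser().parse_args()` handles `-h`/`--help` and `--version` automatically, so the usage statement stays in sync with the options.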

## Taschuk's Rules (cont.)

- Use version control
- Release stable versions w/ meaningful version number
- Reuse existing software (whenever possible)
- Use build tools & package managers for installation
- Do not require special privileges
- Eliminate fixed/absolute file paths
- Include a small data set to test installation
- Produce identical results given identical input
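
Two of these rules are easy to show in code. The sketch below is one way to avoid fixed paths and to get identical results from identical input when randomness is involved; the `data/` directory layout, the default seed, and the function names are assumptions made for illustration.

```python
import random
from pathlib import Path


def resolve_data_file(name, base_dir=None):
    """Locate a data file relative to base_dir (default: current directory)
    instead of hard-coding an absolute path."""
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    return base / "data" / name


def subsample(values, k, seed=42):
    """Pick k values with a private, seeded generator so reruns match."""
    rng = random.Random(seed)
    return rng.sample(list(values), k)
```

Because `subsample` owns its generator, other code that uses the global `random` state cannot change its output, and the seed can be recorded alongside the results.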

## How do you version now?

1. How many different versions of your project are in use right now? How do you know?
2. If a user has a problem, how will you and they find out which version of the software they have?

## Runtime configuration

1. What options or parameters does your program use?
2. Which ones are users most likely to set or change?
3. How are these parameters set?

# Issues and Action Items

## Issue trackers

- Also called bug trackers
- Shared "to-do" list to manage everything
- Every task is recorded as a separate ticket

## Components of issues/tasks/tickets

- Unique ID (auto-assigned)
- Short descriptive summary (to aid browsing)
- Tags / metadata (to aid searching)
- Status
- Owner (who's responsible)
- Full description
- Threaded discussion

<img src="fig/issue-thread-list.png" alt="Issue thread list" />

<img src="fig/issue-thread-short.png" alt="Short issue thread" />

<center><img src="fig/issue-thread-long.png" alt="Long issue thread" style="height: 18em" /></center>
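
The components listed above map naturally onto a small record type. The following is only a sketch of the data involved, not the schema of any real issue tracker; the class, field names, and sample issues are hypothetical, and the discussion is kept as a flat list rather than a true thread.

```python
from dataclasses import dataclass, field
from itertools import count

_next_id = count(1)  # stands in for the tracker's auto-assigned IDs


@dataclass
class Issue:
    summary: str                                    # short, aids browsing
    description: str = ""                           # full description
    tags: list = field(default_factory=list)        # metadata, aids searching
    status: str = "open"
    owner: str = ""                                 # who is responsible
    comments: list = field(default_factory=list)    # discussion, oldest first
    id: int = field(default_factory=lambda: next(_next_id))


def find_by_tag(issues, tag):
    """Return the issues that carry the given tag."""
    return [issue for issue in issues if tag in issue.tags]
```

For example, `find_by_tag(issues, "docs")` returns every issue labeled `docs`, which is the kind of search that tags are meant to support.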

## Issue trackers (cont.)

- Key utility: prioritization
    - What has to be done right now? Soon? Later?
- Key utility: documenting your work
    - Think of it as a shared lab notebook.
    - Full record of work done, along with relevant discussion, reasoning, etc.
    - Everyone knows what everyone else is working on.

## What's on *YOUR* list?

1. What are the top 3 items on your project’s to-do list?
2. How confident are you that your collaborators and users would agree with your selection?

## Issue lifecycle

1. What states can your project’s issues be in?
2. What state transitions are allowed? (“Any to any” is a common and acceptable answer.)
3. Who decides when an issue can move from one state to another?

# Licensing

## Licensing research software

- Creative works are automatically eligible for copyright protection
- Reusing creative works without a license is dangerous (infringement lawsuits)
- Explicitly adding a license to your project signals how you wish to engage the wider research community.

## Choose a license

- Put a `LICENSE` or `LICENSE.txt` file in your repository
- Use a common license, **don't write your own**
    - MIT or BSD
    - GPL
    - Creative Commons (CC-0 or CC-BY)
    - others from [Open Source Initiative](http://opensource.org/licenses)

## Licensing considerations

- Do you want to license your project at all? Can you?
- Do you require derivative works to have the same license? (fraught with unintended consequences)
- Is your license compatible with the software your project depends on?