# General programming best practices
<br/><br/><br/>
<center>
<a href="https://www.ncsa.illinois.edu/~rhaas">Roland Haas</a>, NCSA and UIUC

<a href="mailto:rhaas@illinois.edu">rhaas@illinois.edu</a>
</center>

%%HTML
<style>
.rendered_html ul { display: block; }
.floatright { float: right; padding-left: 1em; }
</style>

Topics to cover:

  1. best practices that Katy and Madicken covered
  1. remove Nuclear Physics specific contents
  1. add some more best coding practices


# Who am I?

<center>
<img src="fig/universityofguelph.png"/> 
</center>

<center>
  <img style="width: 400px" src="fig/NCSA_INF_FullColor_RGB.png"/>
</center>

# What do I do?


<center>
    <img style="width: 200px" src="fig/emri_0.jpg"/><br/>
    image credit: NASA
</center>

<center>
  <img style="width: 100px" src="fig/einstein_right.svg"/>  <img style="width: 400px; padding-left: 2ex" src="fig/BW_logo_blue.jpg"/>
</center>

# What do I do?

<img style="width: '100%'" src="fig/fortran.png"/><br/><img style="width: '100%'" src="fig/cxx.png"/>

# What do I do?

<center>
    <img style="height: 64px" src="fig/1200px-Python-logo.png"/>
</center>

<center>
    <img style="height: 64px" src="fig/matplotlib.svg"/>
</center>

<center>
    <img style="height: 64px" src="fig/220px-NumPy_logo.png"/>
</center>

# Science

* builds and organizes knowledge

* tests explanations about the universe

* systematically,

* objectively,

* transparently,

* and reproducibly.

***Otherwise it's not science.***

# Computers should...

* improve efficiency,

* reduce human error,

* automate the mundane,

* simplify the complex,

* and accelerate research.

***But we don't always use them effectively.***

# Getting started

<img style="width:400px" src="fig/The_Scientific_Method.svg">

[Efbrazil](https://commons.wikimedia.org/w/index.php?title=User:Efbrazil) [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0)

# Data Storage

* **Good:** pencil and paper 

* **Better:** spreadsheet

* **<span style="color: green">Best:</span>** standardized file format, database management system

**<span style="color: blue"> Formats:</span>**  csv, YAML, Hierarchical Data Format (HDF), Flexible Image Transport System (FITS), etc.

**<span style="color: blue"> Management:</span>** C++/Python/Fortran APIs, HDF5, Pandas, astropy, etc.
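As a minimal sketch of the "standardized file format" idea, the snippet below writes a small table as CSV with a header line and reads it back using only the Python standard library. The column names and values are made up for illustration, and an in-memory buffer stands in for a real file.

```python
import csv
import io

# Hypothetical measurements; the column names are illustrative only.
measurements = [
    {"time_s": 0.0, "temperature_K": 293.1},
    {"time_s": 1.0, "temperature_K": 293.4},
]


def write_csv(rows, stream):
    """Write rows (a list of dicts) as CSV with a header line."""
    writer = csv.DictWriter(stream, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)


def read_csv(stream):
    """Read CSV rows back, converting every field to float."""
    return [{key: float(value) for key, value in row.items()}
            for row in csv.DictReader(stream)]


# The buffer stands in for a file opened with open(name, "w", newline="").
buffer = io.StringIO()
write_csv(measurements, buffer)
buffer.seek(0)
restored = read_csv(buffer)
```

Because the header names the columns and their units, someone else (or a different tool) can read the file without consulting the code that wrote it.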

# Backing Up Files

* **Good:** hope

* **Better:** nightly emails

* **<span style="color: green"> Best:</span>** remote version control (GitHub, BitBucket, GitLab), remote backup (CrashPlan, Duplicity)

**<span style="color: blue"> Version Control Systems:</span>** svn, hg, **git**

**<span style="color: blue"> Hint:</span>** tools like git-annex or git-lfs can help you manage large data files

# Managing Changes

* **Good:** naming convention

* **Better:** clever naming convention

* **<span style="color: green"> Best:</span>** local version control

<img src="fig/git_vc.jpeg"/>

# Getting It Done

# Analysis

* **Good:** pencil and calculator

* **Better:** spreadsheets, MATLAB, Mathematica

* **<span style="color: green"> Best:</span>** scripting, open source libraries, current programming language

**<span style="color: blue">Hint:</span>** Check out GitHub for existing analysis toolkits in your domain, e.g. astropy.

# Multiple File Cleanup

* **Good:** manually edit every file

* **Better:** search and replace in each file

* **<span style="color: green"> Best:</span>** scripted batch editing with backups

**<span style="color: blue">Hint:</span>** try a tutorial on BASH, ZSH, Python, or Perl, e.g. the [bash lesson by Software Carpentry](http://swcarpentry.github.io/shell-novice/).
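One possible shape of "scripted batch editing with backups" in Python is sketched below. The function name `replace_in_files`, the `.par` file extension, and the paths are hypothetical; the demonstration runs in a temporary directory so that no real files are touched.

```python
import shutil
import tempfile
from pathlib import Path


def replace_in_files(directory, pattern, old, new):
    """Replace `old` with `new` in every file matching `pattern`.

    A copy of each modified file is kept next to it with a ".bak" suffix.
    Returns the list of files that were changed.
    """
    changed = []
    for path in sorted(Path(directory).glob(pattern)):
        text = path.read_text()
        if old not in text:
            continue
        # Make the backup before touching the original file.
        shutil.copy2(path, path.with_name(path.name + ".bak"))
        path.write_text(text.replace(old, new))
        changed.append(path)
    return changed


# Demonstration in a throw-away directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "run1.par").write_text("output_dir = /scratch/old\n")
    (Path(tmp) / "run2.par").write_text("steps = 10\n")
    changed = replace_in_files(tmp, "*.par", "/scratch/old", "/scratch/new")
    changed_names = [path.name for path in changed]
    new_text = (Path(tmp) / "run1.par").read_text()
    backup_text = (Path(tmp) / "run1.par.bak").read_text()
```

Only files that actually contain the search string are rewritten, so unrelated files keep their timestamps and get no backup.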

# Executing Workflows

* **Good:** retype a series of commands from notes

* **Better:** shell script

* **<span style="color: green"> Best:</span>** build system

**<span style="color: blue">Build system tools:</span>** make, docker, cmake, autoconf, automake, etc.

**<span style="color: blue">Reference:</span>** The Carpentries have an associated [Automation and Make](http://swcarpentry.github.io/make-novice/) lesson

# Data Structures

* **Good:** 100 string variables holding numbers

* **Better:** list of lists of numbers

* **<span style="color: green"> Best:</span>** appropriate powerful data structures

**<span style="color: blue">Hint:</span>** In C++, learn about structs, unordered\_maps, maps, vectors, and (maybe) classes. In Python, the power lies in dictionaries, NumPy arrays, and, when analyzing data, pandas DataFrames.
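A small Python sketch of the difference, with made-up sensor names and values: the same numbers held as separately named strings versus a dictionary of lists that can be processed in one line.

```python
from statistics import mean

# Fragile: numbers stored as strings in separately named variables.
temp_1 = "20.5"
temp_2 = "21.0"
temp_3 = "19.5"

# Sturdier: one dictionary mapping a (hypothetical) sensor name to its readings.
readings = {
    "sensor_a": [20.5, 21.0, 19.5],
    "sensor_b": [18.0, 18.5],
}

# Adding a sensor or a reading requires no new variables and no new code.
averages = {name: mean(values) for name, values in readings.items()}
```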

# API Design

* **Good:** single block of procedural code

* **Better:** separate functions

* **<span style="color: green"> Best:</span>** small, testable functions that handle well defined tasks, grouped into classes

**<span style="color: blue">DRY:</span>** Don't Repeat Yourself. Code replication is bug proliferation.

**<span style="color: blue">KISS:</span>** Keep it simple, stupid.
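To illustrate "small, testable functions", here is a hedged sketch in Python. The input format (mass and velocity separated by whitespace) and all function names are assumptions made for this example.

```python
def parse_measurement(line):
    """Convert a line such as "2.0 3.0" into (mass, velocity) floats."""
    mass, velocity = (float(field) for field in line.split())
    return mass, velocity


def kinetic_energy(mass, velocity):
    """Return the kinetic energy 1/2 m v^2."""
    return 0.5 * mass * velocity**2


def total_energy(lines):
    """Sum the kinetic energy over all measurement lines."""
    return sum(kinetic_energy(*parse_measurement(line)) for line in lines)


energy = total_energy(["2.0 3.0", "1.0 2.0"])
```

Each function does one thing, so each can be tested on its own, and the formula for the kinetic energy lives in exactly one place (DRY).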

# Variable Naming

* **Good:** d1, d2, d3

* **Better:** x, y, z

* **<span style="color: green"> Best:</span>** p.x, p.y, p.z, p = Point(x,y,z)

**<span style="color: blue">Hint:</span>** [Prof. Jenny Bryan on Naming Things](https://t.co/CGfXDSDvmz)
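The `Point(x,y,z)` idea could look like the following in Python. This is only a sketch: the `distance_to` method is an illustrative addition, not something prescribed by the slide.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """A point in three-dimensional Cartesian coordinates."""
    x: float
    y: float
    z: float

    def distance_to(self, other):
        """Return the Euclidean distance between this point and `other`."""
        return math.sqrt((self.x - other.x) ** 2
                         + (self.y - other.y) ** 2
                         + (self.z - other.z) ** 2)


origin = Point(0.0, 0.0, 0.0)
p = Point(1.0, 2.0, 2.0)
distance = p.distance_to(origin)
```

Writing `p.x` makes it obvious which object a coordinate belongs to, which `d1` or a bare `x` cannot do once there is more than one point.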

# Style Guides

* **Good:** Have consistent style

* **Better:** Agree with your colleagues on style

* **<span style="color: green"> Best:</span>** Follow a standard style guide and use a code formatter

**<span style="color: blue">Hint:</span>** [clang-format](https://www.kernel.org/doc/html/latest/process/clang-format.html), [Black code formatter for Python](https://black.readthedocs.io/en/stable/)

# Comments on commenting

* **Good:** have readable code

* **Better:** document what each code unit does

* **<span style="color: green"> Best:</span>** document design and purpose of each code unit

**<span style="color: blue">Hints:</span>** [Best practices for writing code comments](https://stackoverflow.blog/2021/12/23/best-practices-for-writing-code-comments/), Python [PEP-8](https://peps.python.org/pep-0008/#comments)

<center><img src="fig/et_comments.png"/></center>
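As a sketch of documenting design and purpose rather than restating the code, consider the docstring below. The function `running_mean` and the stated motivation (peak finding) are hypothetical.

```python
def running_mean(values, window):
    """Smooth noisy samples with a running mean of length `window`.

    Purpose: suppress sample-to-sample noise before peak finding.
    Design: a cumulative sum keeps the cost linear in len(values)
    instead of recomputing each window from scratch.
    """
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    cumulative = [0.0]
    for value in values:
        cumulative.append(cumulative[-1] + value)
    # The difference of two cumulative sums is the sum over one window.
    return [(cumulative[i + window] - cumulative[i]) / window
            for i in range(len(values) - window + 1)]


smoothed = running_mean([1.0, 2.0, 3.0, 4.0], 2)
```

A comment such as `# add value to cumulative` would add nothing here; the docstring instead records why the function exists and why it is written this way.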

# Runtime Parameter Handling

* **Good:** none, hardcoded variables

* **Better:** plain text input file, line-by-line homemade string parsing

* **<span style="color: green"> Best:</span>** argument / file parsing library

**<span style="color: blue"> Libraries:</span>** Python argparse, libconfig, yaml-cpp, spify, etc.
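A minimal sketch using Python's `argparse`. The option names (`--steps`, `--dt`, `--output`) and their defaults are invented for this example.

```python
import argparse


def build_parser():
    """Describe the run-time parameters of a (hypothetical) simulation."""
    parser = argparse.ArgumentParser(description="Run a simulation.")
    parser.add_argument("--steps", type=int, default=100,
                        help="number of time steps (default: %(default)s)")
    parser.add_argument("--dt", type=float, default=0.01,
                        help="time step size (default: %(default)s)")
    parser.add_argument("--output", default="result.csv",
                        help="file to write results to (default: %(default)s)")
    return parser


# Passing an explicit list keeps the example reproducible; calling
# parse_args() without arguments reads the real command line instead.
args = build_parser().parse_args(["--steps", "500", "--dt", "0.005"])
```

The library takes care of type conversion, default values, error messages, and a `--help` text, all of which homemade string parsing would have to reimplement.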

# Getting It Right

# Error Detection

* **Good:** show results to experts

* **Better:** integration testing

* **<span style="color: green"> Best:</span>** unit test suite, continuous integration

<img style="width: 100%" src="fig/et_ci.jpg">
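A unit test can be very small. The sketch below uses Python's built-in `unittest` module; the function under test, `celsius_to_kelvin`, is a made-up example.

```python
import unittest


def celsius_to_kelvin(celsius):
    """Convert a temperature from degrees Celsius to kelvin."""
    return celsius + 273.15


class TestCelsiusToKelvin(unittest.TestCase):
    def test_freezing_point(self):
        self.assertAlmostEqual(celsius_to_kelvin(0.0), 273.15)

    def test_absolute_zero(self):
        self.assertAlmostEqual(celsius_to_kelvin(-273.15), 0.0)


# Run the tests programmatically; on the command line one would usually
# use "python -m unittest" (or a runner such as pytest) instead.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestCelsiusToKelvin)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

A continuous integration service then simply runs the same test command on every push.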

# Error Diagnostics

* **Good:** re-re-read the code

* **Better:** print statements

* **<span style="color: green"> Best:</span>** use a linter, a debugger, and a profiler

**<span style="color: blue">Tools:</span>** cpplint, pyflakes, gdb, lldb, pdb, idb, perf, hotspot, coz, valgrind, kernprof, kcachegrind, cprofile/snakeviz
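As one example from the profiler side, the sketch below uses `cProfile` and `pstats` from the Python standard library to find out where time is spent. The function `slow_sum_of_squares` is a deliberately naive stand-in for real work.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Sum i*i for i in range(n) with an explicit loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


# Collect timing data only for the code between enable() and disable().
profiler = cProfile.Profile()
profiler.enable()
value = slow_sum_of_squares(100_000)
profiler.disable()

# Format the five most expensive entries, sorted by cumulative time.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
profile_text = report.getvalue()
```

Printing `profile_text` shows one line per function with call counts and timings. For a whole script, `python -m cProfile -s cumulative script.py` gives a similar report without code changes.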

# Error Correction

* **Good:** fix code

* **Better:** fix, add an exception

* **<span style="color: green"> Best:</span>** fix, add an exception, add a test

<img style="width: 100%" src="fig/spec_test_update.png">
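The "fix, add an exception, add a test" sequence might look like this in Python. The scenario is invented: a `mean` function that used to fail with an obscure `ZeroDivisionError` on empty input.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty sequence."""
    if len(values) == 0:
        # Added together with the fix: fail loudly with a clear message
        # instead of an obscure ZeroDivisionError further down.
        raise ValueError("mean() requires at least one value")
    return sum(values) / len(values)


def test_mean_rejects_empty_input():
    """Regression test that pins down the fixed behaviour."""
    try:
        mean([])
    except ValueError:
        pass
    else:
        raise AssertionError("mean([]) should raise ValueError")


def test_mean_of_known_values():
    assert mean([2.0, 4.0]) == 3.0
```

The test functions follow the naming convention that runners such as pytest discover automatically, so the bug cannot quietly return.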

# Getting It Together

# Merging Collaborative Work

* **Good:** single master copy, waiting

* **Better:** email and patches

* **<span style="color: green"> Best:</span>** remote version control

<img style="width: 100%" src="fig/mpitar_WIP_pull.png">

# Peer Review For Code

* **Good:** separation of concerns

* **Better:** shared repository(s)

* **<span style="color: green"> Best:</span>** peer-reviewed pull requests

**<span style="color: blue">Hint:</span>** reviewing changes is work, so keep them simple, stupid.

# Teamwork

* **Good:** weekly research meetings, year-long tasks

* **Better:** daily conversations, month-long goals

* **<span style="color: green"> Best:</span>** pair programming, issue tracking

<img style="width: 100%" src="fig/et_issues.png">

# Software Handovers

* **Good:** zip file, theory paper

* **Better:** code repository, theory paper, comments in code, example input file

* **<span style="color: green"> Best:</span>** code repository, theory paper, automated documentation, example input file, test suite

# Documentation

* **Good:** paper notes describing model in code

* **Better:** electronic documentation in code repository

* **<span style="color: green"> Best:</span>** auto-generated documentation describing code intent and usage

**<span style="color: blue">Tools:</span>** [doxygen](https://www.doxygen.nl/index.html), [sphinx](https://www.sphinx-doc.org/en/master/)

<img style="width: 100%" src="fig/eigen_doxygen.png">

# Getting It Out There

# Plotting

* **Good:** custom formatting, clickable GUI

* **Better:** plot format templates (Excel, Mathematica)

* **<span style="color: green"> Best:</span>** scripted plotting, matplotlib, gnuplot, etc.

# Writing

* **Good:** Microsoft Word, LibreOffice Writer

* **Better:** Word, Writer with track changes

* **<span style="color: green"> Best:</span>** plain text markup with version control and a makefile

**<span style="color: blue">Tools:</span>** LaTeX, Markdown, reStructuredText

# Distribution Control

* **Good:** "email to request access"

* **Better:** license file

* **<span style="color: green"> Best:</span>** license file, citation file, DOI, forkable repository

**<span style="color: green"> Example:</span>** [SpECTRE](https://github.com/sxs-collaboration/spectre)

# Community Adoption

* **Good:** none, internal use only

* **Better:** online repository, developer email online

* **<span style="color: green"> Best:</span>** issue tracker, user/developer forum, communication channels, online documentation

# Resources

# Papers!
<small>
<ul>
  <li><a href="http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001745">Wilson, Greg, D. A. Aruliah, C. Titus Brown, Neil P. Chue Hong, Matt Davis, Richard T. Guy, Steven H. D. Haddock, Kathryn D. Huff, et al. 2014. Best Practices for Scientific Computing. PLoS Biol 12 (1): e1001745. doi:10.1371/journal.pbio.1001745.</a></li>
  <li><a href="https://arxiv.org/abs/1609.00037">Wilson, Greg, Jennifer Bryan, Karen Cranston, Justin Kitzes, Lex Nederbragt, and Tracy K. Teal. 2016. Good Enough Practices in Scientific Computing. arXiv:1609.00037 [cs], August. http://arxiv.org/abs/1609.00037.</a></li>
  <li><a href="https://www.amazon.com/Effective-Computation-Physics-Anthony-Scopatz/dp/1491901535">Scopatz, Anthony, and Kathryn D. Huff. 2015. Effective Computation in Physics. 1st edition. O'Reilly Media.</a></li>
  <li><a href="http://dx.doi.org/10.5334/jors.ba">Blanton, Brian, and Chris Lenhardt. 2014. A Scientist's Perspective on Sustainable Scientific Software. Journal of Open Research Software, Issues in Research Software, 2 (1): e17.</a></li>
  <li><a href="http://dx.doi.org/10.1109/MCSE.2009.15">Donoho, David L., Arian Maleki, Inam Ur Rahman, Morteza Shahram, and Victoria Stodden. 2009. Reproducible Research in Computational Harmonic Analysis. Computing in Science & Engineering 11 (1): 8–18. doi:10.1109/MCSE.2009.15.</a></li>
  <li><a href="http://dx.doi.org/10.1109/MIC.2014.88">Goble, Carole. 2014. Better Software, Better Research. IEEE Internet Computing 18 (5): 4–8. doi:10.1109/MIC.2014.88.</a></li>
  <li><a href="http://dl.acm.org/citation.cfm?id=1556928">Hannay, J. E, C. MacLeod, J. Singer, H. P Langtangen, D. Pfahl, and G. Wilson. 2009. How Do Scientists Develop and Use Scientific Software? In Proceedings of the 2009 ICSE Workshop on Software Engineering for Computational Science and Engineering, 1–8.</a></li>
  <li><a href="http://dx.doi.org/10.1126/science.1231535">Joppa, L. N., G. McInerny, R. Harper, L. Salido, K. Takeda, K. O'Hara, D. Gavaghan, and S. Emmott. 2013. Troubling Trends in Scientific Software Use. Science 340 (6134): 814–15. doi:10.1126/science.1231535.</a></li>
  <li><a href="http://dx.doi.org/10.1038/467775a">Merali, Zeeya. 2010. Computational Science: ...Error. Nature 467 (7317): 775–77. doi:10.1038/467775a.</a></li>
  <li><a href="http://arxiv.org/abs/1407.5648">Petre, Marian, and Greg Wilson. 2014. Code Review For and By Scientists. arXiv:1407.5648 [cs], July.</a></li>
  <li><a href="http://arxiv.org/abs/1407.6220">Schossau, Jory, and Greg Wilson. 2014. Which Sustainable Software Practices Do Scientists Find Most Useful? arXiv:1407.6220 [cs], July.</a></li>
  <li><a href="http://dx.doi.org/10.2139/ssrn.1550193">Stodden, Victoria. 2010. The Scientific Method in Practice: Reproducibility in the Computational Sciences. SSRN Electronic Journal. doi:10.2139/ssrn.1550193.</a></li>
  <li><a href="http://dx.doi.org/10.1371/journal.pone.0026828">Wicherts, Jelte M., Marjan Bakker, and Dylan Molenaar. 2011. Willingness to Share Research Data Is Related to the Strength of the Evidence and the Quality of Reporting of Statistical Results. PLoS ONE 6 (11): e26828. doi:10.1371/journal.pone.0026828.</a></li>
</ul>
</small>

# Books!
* Clean Code - Robert C. Martin
* Working Effectively with Legacy Code - Michael C. Feathers
* Effective Computation in Physics - Huff, Scopatz
* The Elements of Programming Style - Kernighan and Plauger (1974)


# Acknowledgements

Many of these slides were originally in presentations by Dr. Katy Huff and Dr. Madicken Munk at

* [katyhuff.github.io/2017-09-20-ncsa](https://katyhuff.github.io/2017-09-20-ncsa)
* [munkm.github.io/2021-09-24-NCSA](https://munkm.github.io/2021-09-24-NCSA).

which are licensed under a [Creative Commons Attribution 4.0 International License](https://creativecommons.org/licenses/by/4.0/).

<img class="floatleft" style="height: 4em" src="fig/logo_nsf.jpg"/>This work has been supported by NSF grants [2004879](https://nsf.gov/awardsearch/showAward?AWD_ID=2004879), [2103680](https://nsf.gov/awardsearch/showAward?AWD_ID=2103680). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

General programming best practices by Roland Haas is licensed under a [Creative Commons Attribution 4.0 International License](http://creativecommons.org/licenses/by/4.0/).

Based on a work at http://munkm.github.io/2021-09-24-NCSA.