# Chapter 1. How to learn bioinformatics?
---

Link to "the Sequence Read Archive": https://www.ncbi.nlm.nih.gov/sra

To make code reproducible, a great start is to check input data and intermediate results, run quick sanity checks, maintain proper controls, and test programs. It's also good practice to document where data was downloaded from, when it was downloaded, and which steps were run.
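
As a minimal sketch of documenting a download, the snippet below builds a one-line provenance record containing the file name, source URL, download date, and a SHA-256 checksum. The function names `sha256_of` and `provenance_entry` and the tab-separated layout are assumptions made for illustration, not an established convention.

```python
import hashlib
from datetime import date
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_entry(path, source_url, downloaded_on=None):
    """Build a tab-separated record: file name, URL, date, checksum.

    `downloaded_on` defaults to today's date in ISO format (YYYY-MM-DD).
    """
    downloaded_on = downloaded_on or date.today().isoformat()
    fields = [Path(path).name, source_url, downloaded_on, sha256_of(path)]
    return "\t".join(fields)
```

Appending the returned line to a plain-text log next to the data keeps the record close to the files it describes, and the checksum makes it possible to verify later that the file has not changed.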

Robust research starts with a good experimental design. A good source for experimental design:

**Experimental Design and Data Analysis for Biologists (Quinn and Keough, Cambridge University Press, 2002)**

Writing readable code is very important for debugging and for collaborating with others. Google has [public style guides for many languages](https://google.github.io/styleguide/pyguide.html), which serve as excellent templates.


A common method for testing code is *unit testing*. Because testing every piece of code is rarely practical in research, a more sensible strategy is to consider three questions each time you write a bit of code:
- How many times is this code called by other code?
- If this code were wrong, how detrimental to the final results would it be?
- How noticeable would an error be if one occurred?

A good reference for unit testing in Python is [The Hitchhiker’s Guide to Python](https://docs.python-guide.org/), in particular its chapter on [unit testing](https://docs.python-guide.org/writing/tests/).
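
To make the idea concrete, here is a minimal sketch of a unit test written with Python's built-in `unittest` module. The function under test, `reverse_complement`, is a hypothetical example chosen because it is small, called often, and easy to get subtly wrong.

```python
import unittest

# Translation table mapping each DNA base to its complement.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


class TestReverseComplement(unittest.TestCase):
    def test_known_sequence(self):
        self.assertEqual(reverse_complement("ATGC"), "GCAT")

    def test_empty_sequence(self):
        self.assertEqual(reverse_complement(""), "")

    def test_applying_twice_returns_input(self):
        seq = "GATTACA"
        self.assertEqual(reverse_complement(reverse_complement(seq)), seq)
```

Saved in a file such as `test_seq.py` (a hypothetical name), the tests can be run with `python -m unittest test_seq`.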

**Use Existing Libraries Whenever Possible**

**Treat Data as Read-Only**
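
One possible way to enforce this, sketched below under the assumption that the raw data lives in ordinary files, is to remove write permission right after downloading. The helper name `make_read_only` is hypothetical.

```python
import os
import stat


def make_read_only(path):
    """Remove write permission for user, group, and others."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    write_bits = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    os.chmod(path, mode & ~write_bits)
```

Analysis scripts then read the raw file and write their results to new files, so the original data is never modified in place. On Windows, `os.chmod` can only toggle the file's read-only flag.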

**Spend Time Developing Frequently Used Scripts into Tools**

It’s important never to assume that a dataset is high quality. Rather, a dataset’s quality should be demonstrated through exploratory data analysis (EDA). EDA is neither complex nor time-consuming, and it will make your research much more robust to lurking surprises in large datasets.
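
A first pass at EDA can be as small as the sketch below, which summarizes a list of sequences and counts two kinds of surprise: empty records and sequences containing the ambiguous base `N`. The function name `summarize_lengths` and the choice of checks are assumptions for illustration; real datasets call for checks specific to their format.

```python
import statistics


def summarize_lengths(sequences):
    """Summarize sequence lengths and flag suspicious records.

    `sequences` must be a non-empty list of strings.
    """
    lengths = [len(seq) for seq in sequences]
    return {
        "n": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": statistics.mean(lengths),
        "median": statistics.median(lengths),
        # Records with no sequence at all often indicate a parsing problem.
        "n_empty": sum(1 for n in lengths if n == 0),
        # Sequences containing the ambiguous base N.
        "n_with_N": sum(1 for seq in sequences if "N" in seq.upper()),
    }
```

For example, `summarize_lengths(["ACGT", "AC", "", "ANNT"])` reports one empty record and one sequence containing `N`, both of which would deserve a closer look before any downstream analysis.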

To fully reproduce a study, each step of analysis must be described in much more detail than can be accomplished in a scholarly article. Thus, additional documentation is essential for reproducibility. A good practice to adopt is to document each of your analysis steps in plain-text *README* files.

**Make Figures and Statistics the Results of Scripts**
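
As a minimal sketch of this practice, the function below computes a few summary statistics and writes them to a tab-separated file, so the reported numbers can be regenerated by rerunning the script instead of being copied by hand. The name `write_summary`, the chosen statistics, and the output layout are assumptions for illustration; a figure could be produced the same way by calling a plotting library from the script.

```python
import csv
import statistics


def write_summary(values, out_path):
    """Write n, mean, and sample standard deviation to a TSV file.

    `values` must contain at least two numbers.
    """
    rows = [
        ("n", len(values)),
        ("mean", round(statistics.mean(values), 3)),
        ("stdev", round(statistics.stdev(values), 3)),
    ]
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(("statistic", "value"))
        writer.writerows(rows)
    return rows
```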
