<table>
  <tr>
    <th>
      <img src="https://raw.githubusercontent.com/riinbre-bioinfo/Colab_Biomarkers/main/Biomarkers/images/RIINBRE-Logo.jpg" height="125" alt="RI-INBRE Logo">
    </th>
    <th>
      <img src="https://raw.githubusercontent.com/riinbre-bioinfo/Colab_Biomarkers/main/Biomarkers/images/MIC_Logo.png" height="125" alt="RI-INBRE Molecular Informatics Core Logo">
    </th>
  </tr>
</table>

# Analysis of Biomedical Data for Biomarker Discovery
## Submodule 1: Introduction to Biomarkers
### Dr. Christopher L. Hemme
### Director, [RI-INBRE Molecular Informatics Core](https://web.uri.edu/riinbre/mic/)
### The University of Rhode Island College of Pharmacy
Last Updated: June 5, 2023 (Google Colab version)

---

## Introduction

Welcome to the Analysis of Biomedical Data for Biomarker Discovery cloud-based learning module.  This module was funded through an administrative supplement to the Rhode Island IDeA Network of Biomedical Research Excellence (RI-INBRE) from the National Institute of General Medical Sciences of the National Institutes of Health under grant number P20GM103430.  The module was developed by Dr. Christopher L. Hemme, Director of the RI-INBRE Molecular Informatics Core, using data from Dr. Nisanne Ghonem of the Department of Biomedical and Pharmaceutical Sciences, College of Pharmacy, University of Rhode Island.  Our goal with this module is to bridge the gap between bioinformaticians and clinicians or clinical researchers, who often view the same data in very different ways.  For example, bioinformaticians (particularly those new to the biomedical sciences) often aren't familiar with the conventions for data presentation and visualization in the clinical literature, while clinicians are often overwhelmed by the volume of data generated by modern bioinformatics methods or may question the utility of bioinformatics results compared to more traditional clinical methods.  We frame this challenge in terms of the discovery of clinical biomarkers, that is, biological measures of health and disease.  For the clinician, a biomarker must be cheap and easy to measure, accurate, and easily interpretable for both the clinician and the patient.  A bioinformatician, on the other hand, is often looking at biomarkers on a global scale, trying to identify multiple correlated biomarkers that may or may not be obvious targets.  For both groups, the sensitivity, precision, and diagnostic utility of the biomarker are key.

This module consists of 9 submodules, each consisting of a Jupyter Notebook running the R programming language.  The core submodules cover the basic concepts of biomarker discovery.  Optional submodules provide supplemental content on R data structures, linear models, and principles of exploratory analysis that some students may find useful.  The submodules are organized as follows:

1. Introduction to Biomarkers
2. Introduction to R Data Structures (optional)
3. Introduction to Linear Models (optional)
4. Principles of Exploratory Analysis (optional)
5. Case Study: Rat Renal Ischemia-Reperfusion Injury
6. Comparison of Clinical IRI Biomarkers Using Linear and Logistic Regression
7. Exploratory Analysis of IRI Proteomic Data
8. Identification of IRI Biomarkers from Proteomic Data
9. Machine Learning Methods in Biomarker Discovery

This module assumes a functional knowledge of R coding.  Submodule 2 provides a very basic overview of R data structures if needed.

<div class="alert alert-block alert-info">
<b>&#9995; Tip:</b> Blue boxes will indicate helpful tips.</div>

<div class="alert alert-block alert-warning">
<b>&#127891; Note:</b> Used for interesting asides or notes.
</div>

<div class="alert alert-block alert-success">
<b>&#9997; Reference:</b> This box indicates a reference for an attached figure or table.
</div>

<div class="alert alert-block alert-danger">
<b>&#128721; Caution:</b> A red box indicates potential hazards or pitfalls you may encounter.
</div>

---

## Biomarker Concepts

### What is a Biomarker?

The International Programme on Chemical Safety defines a biomarker as “any substance, structure, or process that can be measured in the body or its products and influence or predict the incidence of outcome or disease” (WHO International Programme on Chemical Safety, 2001).  This is a very broad definition, but in layman's terms it simply means any biological entity that can be used to indicate a change in state.  For biomedical research, these state changes typically represent the emergence or progression of a disease, or the effects of one or more treatments on the disease state.  However, the biomarker concept works perfectly well outside of the biomedical sciences.  In environmental research, biomarkers can be used to measure the health or functional capacity of an ecosystem or to guide bioprospecting for useful natural products.  While this module will focus on the biomedical sciences, the methods discussed are applicable to any system in which biomarkers are relevant.

### Properties of Biomarkers

It is relatively straightforward to identify biomarkers for any condition.  However, for a biomarker to be useful in a clinical setting, it should have the following properties:

- High specificity (isolated to specific organs, tissues, cells, or disease states)
- High sensitivity (detectable at low concentrations)
- High accuracy (low variability in measurements between individuals or cohorts allowing for consistent determination of disease state)
- Low cost
- Ease of use and interpretability
- Rapid turnaround

The first three factors define the diagnostic/prognostic utility of the biomarker.  Biomarker tests should minimize false positives (incorrectly diagnosing a healthy patient as having a disease) and false negatives (failing to identify the disease in a sick patient).  A good biomarker should be applicable across a wide cohort of patients and should show similar quantitative values regardless of other confounding factors.  Ideally, the biomarker will be highly focused on a specific disease, a specific organ, or a specific cohort, with few confounding factors that can complicate diagnosis.  The final three factors focus on the practical utility of the test itself.  An ideal biomarker test should be cheap and easy to use.  For example, fluid samples such as blood or urine are far more cost-effective and easier to collect than biopsy samples.  Rapid tests such as at-home detection kits shorten the turnaround time for getting results and are critical when rapid diagnosis is needed.
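
As a minimal sketch of how false positives and false negatives translate into the usual summary statistics for a diagnostic test, the R snippet below works through a hypothetical 2x2 table.  The patient counts are invented for illustration and do not come from any study.  Note that "sensitivity" and "specificity" are used here in the diagnostic-test sense (the fraction of sick and of healthy patients classified correctly), which complements the analytical definitions in the list above.

```r
# Hypothetical results of a biomarker test on 200 patients
# (illustrative counts only, not real data)
true_pos  <- 45   # sick patients correctly flagged as sick
false_neg <- 5    # sick patients the test missed
false_pos <- 15   # healthy patients incorrectly flagged as sick
true_neg  <- 135  # healthy patients correctly cleared

# Fraction of sick patients the test detects
sensitivity <- true_pos / (true_pos + false_neg)

# Fraction of healthy patients the test clears
specificity <- true_neg / (true_neg + false_pos)

# Probability that a positive / negative result is correct
ppv <- true_pos / (true_pos + false_pos)
npv <- true_neg / (true_neg + false_neg)

# Overall fraction of patients classified correctly
accuracy <- (true_pos + true_neg) /
  (true_pos + false_neg + false_pos + true_neg)

metrics <- c(sensitivity = sensitivity, specificity = specificity,
             ppv = ppv, npv = npv, accuracy = accuracy)
print(round(metrics, 3))
```

With these made-up counts the test detects 90% of sick patients, yet one in four positive results is still a false alarm, which is why a positive result on a screening biomarker is usually followed by confirmatory testing.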

In practice, few biomarkers meet all of these standards.  A test might be highly sensitive and cost-effective but may indicate a broad range of possible conditions, requiring further tests to get a more accurate diagnosis.  Other biomarkers may be very accurate within a specific cohort but not applicable outside of that cohort.  For rare diseases in which patient data are scarce, potential biomarkers may lack the robust datasets available for more common diseases, thus limiting the diagnostic power of the biomarker.  A single biomarker is usually insufficient to properly diagnose an issue, and so common clinical tests usually look at correlated biomarkers that together better indicate the nature of the disease.

### Types of Biomarkers

#### Qualitative (Physiological) Biomarkers

The definition above gives us a global view of what a biomarker is, but what does this mean in a practical sense?  Consider the state of medicine prior to the modern scientific age.  While the state of medical knowledge differed between cultures and time periods, a common feature is that medical diagnosis was largely limited to observing symptoms that manifested at the levels of organs, individuals, and populations.  Without access to modern instrumentation and molecular methods, diagnosis of disease based on these types of qualitative biomarkers relied on shared experience resulting from generations of trial and error.  While these medical systems could be quite sophisticated, they were limited in their applications.  Consider a mother treating a sick child.  She may put her hand to the child's head to determine if the child has a fever and, based on her life experience, the severity of the fever.  However, she would not have the ability to quantify that fever or determine how it rated compared to other children.  Other qualitative biomarkers could include jaundice, sepsis, visible tumors, etc.  A working knowledge of these symptoms was (and still is) essential to diagnosing disease, but their utility will always be limited unless complemented with quantitative methods.

#### Quantitative Biomarkers (Macro-Scale)

The development of modern medical devices capable of quantitative macro-scale measurement of symptoms was a key first advancement in the modern field of biomarker discovery.  With a thermometer, a mother can now quantify the child's temperature (though she'll still use the hand method first), which allows her to distinguish between a mildly elevated temperature of 99&deg;F (37.2&deg;C) and a high temperature of 102&deg;F (38.9&deg;C) that would require medical attention.  Conditions such as high blood pressure that may not manifest as obvious symptoms can be quantitatively measured using blood pressure monitors.  The value of these types of tools is that they allow medical practitioners to place their qualitative diagnoses and experience into a quantitative mathematical framework.  Collection of data from millions of patients over the last century of modern medicine has led to the establishment of well-defined cutoffs between healthy and diseased states for a variety of conditions.  However, these methods do have two common drawbacks.  First, they may be insufficient on their own to identify the nature of the symptom.  For example, a single blood pressure reading alone will not indicate if high blood pressure is an acute condition resulting from stress or a chronic condition resulting from some other factor.  A second drawback is that these types of biomarkers tend to be lagging indicators of an active ongoing condition and may not always be suitable for early prognosis of a condition.  A fever, for instance, might indicate an active viral infection in the process of being fought off by the patient's immune system.  In such a case, treatment of symptoms is often the only course of action.  A typical medical exam thus relies on multiple tests which in combination may provide a much more precise diagnosis.
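
The idea of a quantitative cutoff can be sketched in a few lines of R.  The two thresholds below simply reuse the 99&deg;F and 102&deg;F values from the thermometer example and are illustrative assumptions, not clinical guidance; the readings themselves are hypothetical.

```r
# Convert a temperature from Fahrenheit to Celsius
f_to_c <- function(temp_f) (temp_f - 32) * 5 / 9

# Hypothetical thermometer readings in degrees Fahrenheit
temps_f <- c(98.6, 99.0, 100.4, 102.0)

# Assign each reading to a category using illustrative cutoffs
category <- cut(
  temps_f,
  breaks = c(-Inf, 99, 102, Inf),
  labels = c("normal", "mildly elevated", "high"),
  right  = FALSE  # intervals are closed on the left, e.g. [99, 102)
)

readings <- data.frame(
  temp_f   = temps_f,
  temp_c   = round(f_to_c(temps_f), 1),
  category = category
)
print(readings)
```

Turning a continuous measurement into categories this way is convenient for decision-making, but it discards information, so the raw values are usually worth keeping alongside the categories.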

#### Quantitative Biomarkers (Molecular Scale)

With the development of the fields of molecular genetics, molecular biology, and biochemistry, biomarker discovery was extended into the molecular realm.  In addition to the qualitative and macro-scale quantitative methods described previously, molecular biomarkers allow physicians to diagnose conditions on a more fundamental scale, allowing for higher precision and specificity in their diagnoses compared to other methods alone.  Molecular biomarkers come in many forms including:

- Metabolite concentrations
- Enzyme/protein concentrations
- Enzyme activity
- Genetic variants
- Antibodies/antigens
- Presence and abundance of pathogens
- Presence and abundance of specific cell types (e.g. immune cells)

Consider the common blood test you get during a routine doctor's visit.  Blood contains a variety of biomarkers that can give a broad overview of your health while also providing focused insight into specific health conditions.  Blood work may include the following tests, each of which measures the biomarkers listed in parentheses:

- Complete blood count (red blood cells, white blood cells, platelets, hemoglobin)
- Basic metabolic panel (calcium, glucose, sodium, potassium, bicarbonate, chloride, serum creatinine, blood urea nitrogen)
- Comprehensive metabolic panel (albumin, total protein, alkaline phosphatase, alanine aminotransferase, aspartate aminotransferase, bilirubin)
- Lipid panel (high-density lipoprotein, low-density lipoprotein)
- Thyroid panel (triiodothyronine, thyroxine, thyroid-stimulating hormone)
- Cardiac biomarkers (creatine kinase, creatine kinase-MB, troponin)
- Sexually-transmitted diseases (pathogens)
- Coagulation panel (blood-clotting ability)
- DHEA-sulfate serum test (DHEA hormone)
- C-reactive protein test (inflammation)

Many of these biomarkers you may recognize.  Blood glucose levels are a strong indicator of diabetes.  HDL/LDL ("good" and "bad" cholesterol) and cardiac enzymes indicate cardiovascular diseases.  Several of the biomarkers from the metabolic panels are used to indicate disease of or injury to the liver, kidneys, and other organs.  As with macro-scale biomarkers, decades of data collected from millions of patients have led to the establishment of accepted quantitative cutoffs indicating disease states.  When these tests are used in conjunction with other types of biomarkers, the physician can perform very sophisticated and precise diagnoses.  However, even these tests may suffer the same drawbacks as macro-scale biomarkers.  Disease states such as cancer may be preceded by genetic or molecular changes that don't yet manifest as physical symptoms or that are not yet detectable by traditional methods such as histology.  Some of these molecular changes may represent causative factors, while others may simply be correlated with known mechanisms.  To better understand these types of biomarkers, we must utilize omics technology.

#### Quantitative Biomarkers (Omics)

Omics data allows us to look at the biological system <b>as a system</b>.  Instead of looking at a single biomarker (e.g. a protein) we can look at all potential biomarkers at once (i.e. all proteins).  Traditionally we look at one layer of the system at a time as correlating data between levels is difficult.  More modern multiomics methods allow us to analyze multiple layers at once (often in a single experiment) thus reducing experimental noise and improving our ability to correlate data across levels.  Some of the more common omics datasets and experiments are shown below:


Omics allows us to analyze data across a broader scale, potentially revealing unanticipated new biomarkers.  However, this comes at the cost of increasing complexity of the data and the resulting analysis.  We must also consider that just because we find an informative omics biomarker doesn't mean that it's useful as a clinical biomarker.  A transcriptomic biomarker, for instance, may not be viable as an easy-to-use and cost-effective biomarker.  On the other hand, it might be relatively easy to develop an affordable test for a proteomic or metabolomic biomarker using standard experimental methods.  These are considerations we have to take into account when evaluating the clinical utility of a potential biomarker. Of course, once a potential biomarker is identified, an affordable and easy-to-use test can often be developed (e.g., at-home COVID detection kits).

#### Biomarkers, EndpointS and other Tools (BEST) Glossary

We can also classify biomarkers based on their clinical roles.  The US Food and Drug Administration utilizes the Biomarkers, EndpointS and other Tools (BEST) Glossary to define biomarker categories as follows:
<ul>
    <li>Susceptibility/Risk - "A biomarker that indicates the potential for developing a disease or medical condition in an individual who does not currently have clinically apparent disease or the medical condition."</li>
    <li>Diagnostic - "A biomarker used to detect or confirm presence of a disease or condition of interest or to identify individuals with a subtype of the disease."</li>
    <li>Monitoring - "A biomarker measured repeatedly for assessing status of a disease or medical condition or for evidence of exposure to (or effect of) a medical product or an environmental agent."</li>
    <li>Prognostic - "A biomarker used to identify likelihood of a clinical event, disease recurrence or progression in patients who have the disease or medical condition of interest."</li>
    <li>Predictive - "A biomarker used to identify individuals who are more likely than similar individuals without the biomarker to experience a favorable or unfavorable effect from exposure to a medical product or an environmental agent."</li>
    <li>Pharmacodynamic/Response - "A biomarker used to show that a biological response, potentially beneficial or harmful, has occurred in an individual who has been exposed to a medical product or an environmental agent."</li>
    <li>Safety - "A biomarker measured before or after an exposure to a medical product or an environmental agent to indicate the likelihood, presence, or extent of toxicity as an adverse effect."</li>
</ul>

<div>
  <img src="https://raw.githubusercontent.com/riinbre-bioinfo/Colab_Biomarkers/main/Biomarkers/images/BEST.png" alt="BEST Glossary">
</div>
<div class="alert alert-block alert-success">
    <b>&#9997; Reference: </b>Created in <a href="https://biorender.com/">BioRender</a>
</div>

Notice that the same or similar biomarkers (e.g. BRCA1/2 variants) can be classified as different types depending on their clinical use.  Classifying biomarkers in this way is useful because it allows us to group different kinds of biomarkers, both qualitative and quantitative, based on the specific clinical question being asked.

---

## We Have a Biomarker.  Now What?

Detecting a potential biomarker is only the first step in translating it to a viable clinical biomarker.  We have already discussed the properties we look for in a good clinical biomarker.  The first three factors are initially determined through basic and clinical research.  Basic research provides us the physiological parameters of the biomarker, its biological mechanism of action, and potential correlations with other biomarkers or confounding factors.  Clinical trials provide us information about the population level parameters of the biomarker and allow us to assess the value of the biomarker in a medical setting.  The initial research phase is then followed up with applied research and development practices (typically at pharmaceutical companies) to produce viable diagnostic tests.  While this module will only focus on biomarker detection and evaluation methods, it is important to keep these downstream factors in mind, particularly when justifying the potential utility of the biomarker in publications, patents or grant proposals.

### Correlation vs. Causation

A living cell is a complex system of interconnected molecular networks which are constantly processing information from the environment and from each other.  As such, it can often be difficult to determine if a biomarker is informative regarding the observed state change or merely an interesting byproduct.  A famous saying in science states "correlation does not imply causation", meaning that just because you see a relationship between two variables does not mean that they directly affect each other.  In the case of biomarkers, we must determine whether the biomarker in question is causative, meaning it directly causes the observed change in state, or is a correlated effect of the change of state.  If the biomarker is correlated, we must determine if the change in the biomarker is a direct effect of the observed change in state (e.g. a change in enzyme activity leads to an increase in associated metabolites), an indirect effect of the change of state (e.g. changes in the concentration of a regulatory protein affect the expression of a second regulatory protein gene, which changes expression of the genes of a seemingly unrelated metabolic pathway), or is completely unrelated or not specific to the state change (e.g. a specific stressor activating a general global stress response).  This is particularly challenging when working with omics data where we are viewing the entire system as a whole.  A strong knowledge of the biological model in question is essential in teasing these effects apart and determining the potential value of a biomarker.  Especially when working with big data, it is easy to find statistically significant patterns, but it is the biological context that gives us confidence in our results.
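
The "general stress response" scenario can be illustrated with a small simulation.  The sketch below is a toy model under stated assumptions: a hidden variable (here called `stress`, a hypothetical name) drives both a candidate biomarker and an injury score, while the biomarker has no direct effect on injury.  All values are simulated, not measured.

```r
set.seed(42)  # make the simulation reproducible
n <- 1000

# Hidden common cause (e.g. a general stress response); assumed, not observed in practice
stress <- rnorm(n)

# Both variables depend on stress plus independent noise;
# the biomarker does NOT directly influence injury in this toy model
biomarker <- stress + rnorm(n, sd = 0.5)
injury    <- stress + rnorm(n, sd = 0.5)

# The raw correlation looks impressive
raw_cor <- cor(biomarker, injury)

# Remove the part of each variable explained by stress,
# then correlate what is left over (a partial correlation)
biomarker_resid <- resid(lm(biomarker ~ stress))
injury_resid    <- resid(lm(injury ~ stress))
adjusted_cor    <- cor(biomarker_resid, injury_resid)

cat("Raw correlation:     ", round(raw_cor, 2), "\n")
cat("Adjusted for stress: ", round(adjusted_cor, 2), "\n")
```

The raw correlation is strong, but it largely disappears once the shared cause is accounted for.  In real data the confounder is often unmeasured, which is why biological context is needed to interpret a statistical association.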

### Use of Non-Human Model Organisms

Because of ethical constraints on human research, it is often easier to perform basic research on biomarkers using model organisms such as mice or rats.  Many transgenic lines have been developed over the years to simulate human diseases such as cancer or neurodegenerative disorders.  Because of genetic homology between mammals, findings from many of these studies can be extrapolated to humans.  However, it is important to remember that non-human organisms, despite genetic similarity to humans, are not human and can display significant physiological, biochemical, and genetic differences.  For work in model organisms to translate to humans, human clinical trials are essential.  This is particularly important for bioinformaticians to remember, as it is easy to become so focused on the data as to forget the translational nature of the research.  While all research is valuable from an intellectual perspective, biomedical research must ultimately prove practical in a clinical setting.

### Accounting for Human Variability

Designing clinical trials is difficult for a variety of reasons.  Ethical constraints limit the type of research that can be conducted, and it can be difficult to find subjects who meet the proper research criteria.  This is particularly a problem when studying rare diseases.  Another problem that particularly impacted clinical trials in the 20th and early 21st centuries was the lack of diversity in clinical trial subjects, with trials often being conducted primarily with white male subjects.  It was often assumed in such trials that what applied to one cohort applied to all cohorts, but this mentality has often resulted in sub-optimal or negative health outcomes in other cohorts.  The Human Genome Project and the International HapMap Project have made clear the extraordinary genetic and epigenetic diversity in the human population.  Much of this genetic diversity is the result of mutations such as single nucleotide polymorphisms (SNPs), insertions and deletions, and chromosomal rearrangements.  Many of these mutations are linked in the genome, causing them to be inherited together and become fixed in populations, resulting in the formation of <b>haplotypes</b>.  These mutations can further be linked to genetic diseases, which is the basis for fields such as pharmacogenomics, biomedical informatics, and personalized medicine.  Researchers and funding agencies now recognize that including diverse cohorts in clinical trials is essential.  When restricted cohorts are used, such as when analyzing diseases linked to specific cohorts, such restrictions must be justified in the experimental design.  In the case of biomarker research, this ensures that the biomarker data has the maximum utility for the population as a whole and will result in optimal health outcomes for all cohorts.

### Is This Biomarker Better Than What We Have?

Finally, the most important question we have to consider with a new biomarker is whether it improves on what is already in use.  Ideally, a new biomarker will be as good as or better than what is currently in use.  In practice, trade-offs often have to be considered (e.g. the new biomarker test is more expensive but more accurate and more specific).  If the new biomarker does not substantially improve on what already exists, then it's unlikely to be adopted in a clinical setting.  Most of the submodules in this module will focus on methods to identify and evaluate potential biomarkers, and so we will defer further discussion of this topic until later.

---

## Examples of Common Clinical Biomarkers

The following case studies represent well-established biomarkers that are used to diagnose common diseases such as cancer and organ injury.  These cases were chosen because they represent a variety of types of biomarkers.  Read through these case studies and take the biomarkers quiz at the end. 

### Case Study 1: Prostate-Specific Antigen (PSA) as a Protein Biomarker for Prostate Cancer

Prostate-Specific Antigen (PSA) is a protein (a serine protease) produced by both healthy and cancerous prostate cells.  PSA is measured as part of the standard blood work protocol.  Individuals suffering from prostate cancer often exhibit elevated PSA levels, and the standard PSA test is often used in conjunction with digital rectal exams to identify prostate cancer.  However, it should be noted that other conditions such as an enlarged prostate can lead to elevated PSA levels even if cancer is not present.  Traditionally, PSA levels of 4.0 ng/mL and below were considered normal.  However, chronic (e.g. age) and acute (e.g. exercise) changes in PSA levels can complicate diagnosis.  Generally, elevated PSA levels indicate a higher probability of prostate cancer.  Alternative tests including urine biomarker analysis (e.g. prostate cancer antigen 3) and alternate measures of PSA are being explored with the goal of providing clinicians more precise and accurate diagnostic tools for prostate cancer detection.

<div>
  <img src="https://raw.githubusercontent.com/riinbre-bioinfo/Colab_Biomarkers/main/Biomarkers/images/PSA.png" alt="PSA">
</div>


<div class="alert alert-block alert-success">
<b>&#9997; Reference:</b> Crystal structure of human prostate specific antigen complexed with an activating antibody (<a href="https://www.rcsb.org/structure/2ZCH">2ZCH</a>)<br>
Created in <a href="https://biorender.com/">BioRender</a>
</div>

### Case Study 2: Alkaline Phosphatase Activity (ALP) as an Enzymatic Activity Biomarker for Liver Injury

Alkaline phosphatase (ALP) is an enzyme that is found in nearly every organism and which catalyzes hydrolysis of phosphate esters in an alkaline environment.  While the enzyme is active in nearly all tissues, in mammals it is highly expressed in liver and bone and as such has become a useful biomarker for liver and bone diseases including malignant biliary obstruction, primary biliary cholangitis, primary sclerosing cholangitis, hepatic lymphoma, sarcoidosis, Paget's disease, rickets/osteomalacia, osteogenic sarcoma, leukemia, myelofibrosis, and hyperthyroidism.  In these diseases, serum ALP concentrations tend to be high, but it can be difficult to diagnose the disease based on ALP alone because of its ubiquitous nature.  Clinicians may follow up these tests with isozyme analysis, which identifies tissue-specific variants of ALP, but these tests are more technically challenging.  Common blood tests also measure concentrations of alanine aminotransferase (ALT), aspartate aminotransferase (AST), and bilirubin, which are used in conjunction with ALP to diagnose liver diseases.

<div>
  <img src="https://raw.githubusercontent.com/riinbre-bioinfo/Colab_Biomarkers/main/Biomarkers/images/Liver_Disease.png" alt="Liver Disease">
</div>


<div class="alert alert-block alert-success">
<b>&#9997; Reference:</b> X-ray structure of human alkaline phosphatase in complex with strontium (<a href="https://www.rcsb.org/structure/2GLQ">2GLQ</a>)<br>
Created in <a href="https://biorender.com/">BioRender</a>
</div>

The enzymology of ALP is well known and, as such, accurate enzymatic assays for serum ALP have been developed.  A common colorimetric assay uses p-nitrophenyl phosphate, which is hydrolyzed by ALP to p-nitrophenol, a yellow product that absorbs light at a wavelength of 405 nm.  The rate at which the substrate is converted is directly proportional to the enzyme activity.
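
As a rough sketch of how a rate of color change becomes an activity value, the R snippet below applies the Beer-Lambert law to a hypothetical kinetic read.  The absorbance readings, the assay volumes, and the molar extinction coefficient (taken here as 18,000 M<sup>-1</sup> cm<sup>-1</sup> for the yellow product at 405 nm) are all assumptions for illustration; a real assay would use the constants and protocol supplied with the kit.

```r
# Hypothetical kinetic read: absorbance at 405 nm every minute for 5 minutes
time_min       <- 0:5
absorbance_405 <- 0.100 + 0.018 * time_min

# Assumed assay constants (illustrative values, not from a specific kit)
epsilon          <- 18000  # molar extinction coefficient, 1 / (M * cm)
path_length_cm   <- 1      # cuvette path length
total_volume_mL  <- 1.00   # total reaction volume
sample_volume_mL <- 0.02   # volume of serum added

# The slope of absorbance vs. time is the rate of color development (per minute)
fit           <- lm(absorbance_405 ~ time_min)
slope_per_min <- unname(coef(fit)["time_min"])

# Beer-Lambert: concentration = absorbance / (epsilon * path length)
# Convert mol/L/min to micromol/L/min with the factor 1e6
rate_umol_per_L_min <- slope_per_min / (epsilon * path_length_cm) * 1e6

# Correct for dilution of the serum sample in the reaction mixture
dilution_factor  <- total_volume_mL / sample_volume_mL
activity_U_per_L <- rate_umol_per_L_min * dilution_factor

cat("Estimated ALP activity:", round(activity_U_per_L, 1), "U/L\n")
```

Here one unit (U) is taken as the amount of enzyme converting 1 micromole of substrate per minute, so the final value is the product formed per minute per liter of the original serum sample.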

<div>
  <img src="https://raw.githubusercontent.com/riinbre-bioinfo/Colab_Biomarkers/main/Biomarkers/images/ALP_Assay.png" alt="ALP Assay">
</div>



<div class="alert alert-block alert-success">
<b>&#9997; Reference:</b> ALP Colorimetric Assay<br>
Created in <a href="https://biorender.com/">BioRender</a>
</div>

### Case Study 3: BRCA Variants as Genetic Variant Biomarkers for Breast Cancer

Mutations (also called polymorphisms) in genomic or mitochondrial DNA can result in a variety of effects including changes in protein structure and activity, changes in gene regulation, inactivation of genes, and disruption of cellular networks.  Some mutations are inherited across generations (germline mutations), resulting in distinct haplotypes, that is, unique combinations of variants residing near each other on the chromosome.  Other mutations (somatic mutations) occur during an individual's lifetime and are often the result of incorrect repair of DNA.  While most mutations are benign, many have been linked to specific diseases through methods such as genome-wide association studies (GWAS).  Among the most well-known examples are the BRCA1 and BRCA2 genes linked to breast cancer.

In healthy individuals, BRCA1 and BRCA2 form a variety of complexes related to DNA repair and tumor suppression.  One such complex, called the BRCA1-D complex, is involved in homologous recombination repair.  Certain variants of these genes can disrupt the tumor suppression activity of the expressed proteins, making the individual more susceptible to breast cancer, often at a younger age.  Women carrying these variants are particularly at risk, with some variants increasing the chances of developing breast cancer to over 50%.  As such, genetic testing and counseling have become standard procedure for those with a family history of cancer.  Many variants can be detected by testing blood or saliva samples, and further tests can determine if the harmful variants are germline or somatic.

<div>
  <img src="https://raw.githubusercontent.com/riinbre-bioinfo/Colab_Biomarkers/main/Biomarkers/images/BRCA_Genes.png" alt="BRCA Genes">
</div>


<div class="alert alert-block alert-success">
<b>&#9997; Reference:</b> Crystal structure of the BRCT repeat region from the breast cancer associated protein, BRCA1 (<a href="https://www.rcsb.org/structure/1JNX">1JNX</a>)<br>
Structure of a BRCA2-DSS1-SSDNA Complex (<a href="https://www.rcsb.org/structure/1MJE">1MJE</a>)<br>
Created in <a href="https://biorender.com/">BioRender</a>
</div>

<div>
  <img src="https://raw.githubusercontent.com/riinbre-bioinfo/Colab_Biomarkers/main/Biomarkers/images/BRCA_variants.png" alt="BRCA Variants">
</div>
<div class="alert alert-block alert-success">
    <b>&#9997; Reference: </b>Created in <a href="https://biorender.com/">BioRender</a>
</div>

<div class="alert alert-block alert-success">
<b>&#9997; Reference:</b> BRCA1/BRCA2 pathogenic variants. <a href="https://pubmed.ncbi.nlm.nih.gov/29161300/">Alemar et al., PLoS One. 2017 Nov 21;12(11)</a><br>

Scientific Figure Available on ResearchGate: <a href="https://www.researchgate.net/figure/Diagrams-of-the-BRCA1-and-BRCA2-genes-indicating-the-position-of-pathogenic-variants_fig2_321205153">ResearchGate</a>

<a href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International</a>
</div>

### Case Study 4: Serum Creatinine as a Metabolic Biomarker for Kidney Injury

The primary function of the kidneys is to filter waste products out of the blood and into urine.  Disruption in kidney function can cause a variety of changes in blood chemistry that can be detected in a standard blood test.  Two common tests used to diagnose kidney disease are serum creatinine (SCr) and blood urea nitrogen (BUN).  Creatinine is a byproduct of muscle metabolism, and urea is produced in the liver from ammonia generated by protein catabolism.  In healthy kidneys, these compounds are rapidly filtered out of the blood, but when kidney function is impaired, they begin to accumulate in the blood.  SCr and BUN are measured as part of standard blood tests.  SCr levels may also be evaluated relative to serum albumin, a common blood protein that is not significantly filtered by the kidneys.  Elevated SCr/albumin levels can indicate disorders such as diabetic kidney disease.

<div>
  <img src="https://raw.githubusercontent.com/riinbre-bioinfo/Colab_Biomarkers/main/Biomarkers/images/Kidney_Disease.png", alt="Kidney Disease">
</div>
<div class="alert alert-block alert-success">
    <b>&#9997; Reference: </b>Created in <a href="https://biorender.com/">BioRender</a>
</div>


<div class="alert alert-block alert-success">
<b>&#9997; Reference:</b> Creatinine (<a href="https://pubchem.ncbi.nlm.nih.gov/compound/588">PubChem CID:588</a>)<br>
Created in <a href="https://biorender.com/">BioRender</a>
</div>

### Case Study 5: Microbial Pathogens as Biomarkers for Periodontal Disease

Microbes are microscopic organisms such as bacteria, viruses, and protists.  Most microbes exist in complex, interconnected communities called microbiomes.  Within these microbiomes, community members interact through competition and cooperation for resources.  These communities can be quite complex, often composed of thousands of distinct microbial species with tens of thousands of species variants (called strains).  Within a given environment, we are typically interested in the structure of the community (which species are present and in what abundance) and its function (what the community and its members do).  These properties give us indications of the health of the microbiome, interactions between community members, and nutrient cycling.  Just as with an organism, stressors impacting the microbiome can change the structure and function of the community.  This provides a variety of potential biomarkers that can be used to assess the "health" of the microbiome.  Such biomarkers can include the presence and abundance of individual species or strains, metabolic and regulatory networks, and traditional omics technologies applied at the community scale.
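
To make the idea of community structure concrete, the sketch below converts read counts into relative abundances and compares two samples.  The counts, the taxon names, and the sample labels (`baseline`, `stressed`) are hypothetical and chosen only for illustration.

```r
# Hypothetical read counts for four taxa in two samples (not real data)
counts <- matrix(
  c(500, 300, 150,  50,
    100, 100, 700, 100),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("baseline", "stressed"),
                  c("taxon_A", "taxon_B", "taxon_C", "taxon_D"))
)

# Structure: convert counts to relative abundances (each row sums to 1)
rel_abund <- prop.table(counts, margin = 1)

# Richness: number of taxa detected in each sample
richness <- rowSums(counts > 0)

# Shift in relative abundance of each taxon between the two samples
shift <- rel_abund["stressed", ] - rel_abund["baseline", ]

print(round(rel_abund, 2))
print(richness)
print(round(shift, 2))
```

Here both samples contain the same taxa, so richness is unchanged, but the abundance of `taxon_C` rises sharply in the stressed sample.  A shift of this kind in structure is the sort of signal that could serve as a candidate biomarker.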

In a host-associated microbiome, the microbial community lives in a mutualistic relationship with its host.  This relationship represents a complex interplay between the host's microbiome, the host's personal genetics, the host's immune system, and the environment in which the host lives.  The microbiome can differ radically between different parts of the body, and species transfer between location-specific microbiomes is common.  The microbiome has a significant effect on the health of the host and vice versa.  Disruption of a microbiome can lead to changes in nutrient cycling within the host or to the emergence of opportunistic pathogens such as <i>C. difficile</i>.  Similarly, a disease contracted by the host can disrupt the host microbiome.  Complicating matters, the host's immune system must be able to distinguish between "good" microbes and potential pathogens.  When this fails, autoimmune effects such as inflammation can occur, leading to disorders such as irritable bowel syndrome.  Because all microbiomes are linked through the host's immune system, an immune response to a microbiome in one part of the body (e.g., the gut) can lead to health effects in seemingly unrelated parts of the body such as the brain or heart.

The human oral microbiome is a critical component of human health.  The mouth is a major entry point for environmental signals that reach the body through air and food.  As such, the oral microbiome represents a first line of defense against external pathogens.  The oral microbiome is also the first step in the processing and digestion of food and is closely linked with the gut microbiome, which is essential for human digestion.  The microbes of the mouth and throat are highly adapted to that environment but are also subject to frequent disruptions from food intake and oral hygiene.  Oral pathogens have been linked to a variety of oral diseases including caries (cavities), gingivitis, periodontitis, and oral cancers.

<div>
  <img src="https://raw.githubusercontent.com/riinbre-bioinfo/Colab_Biomarkers/main/Biomarkers/images/Oral_Microbiome.png", alt="Oral Microbiome">
</div>
<div class="alert alert-block alert-success">
    <b>&#9997; Reference: </b>Created in <a href="https://biorender.com/">BioRender</a>
</div>



While oral pathogens have been known for some time, a major step in understanding their nature was the development of "complex theory" by Socransky et al.  In complex theory, oral microbes are divided into color groups based on their pathogenicity and location in the oral cavity.  The blue, green, yellow, and purple complexes are early colonizers of microbial biofilms on the surface of the tooth root.  These colonizers create biofilms whose internal environment favors the orange bridge species and the red pathogens, which are able to colonize the periodontal pocket wall.  In a healthy oral cavity, blue and green microbes should dominate, while increasing abundances of orange and red species indicate increasingly severe periodontal disease.  With omics technology, researchers can explore the microbial dark matter of the oral microbiome, allowing them to identify new pathogens, microbes associated with pathogens that might not themselves be pathogenic, or exotic species unique to specific cohorts or individuals.  Omics analysis of the oral microbiome can also be used to judge the effectiveness of treatments, since we would expect a treated oral cavity to show a microbiome more similar to that of a healthy oral cavity.
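
The color-group reasoning above can be sketched in R by summing relative abundances per complex and tracking the combined share of orange and red species.  The abundance values and the `healthy` and `diseased` sample labels are hypothetical.  The complex assignments shown are the commonly cited ones for these few species, and only a handful of taxa from three complexes are included to keep the example short.

```r
# Hypothetical relative abundances for a few oral taxa (not real data).
# Complex labels follow the commonly cited Socransky grouping for these species.
taxa <- data.frame(
  species = c("Streptococcus mitis", "Streptococcus sanguinis",
              "Fusobacterium nucleatum", "Prevotella intermedia",
              "Porphyromonas gingivalis", "Tannerella forsythia",
              "Treponema denticola"),
  complex  = c("yellow", "yellow", "orange", "orange", "red", "red", "red"),
  healthy  = c(0.45, 0.35, 0.10, 0.05, 0.02, 0.02, 0.01),
  diseased = c(0.15, 0.10, 0.25, 0.15, 0.15, 0.10, 0.10)
)

# Sum relative abundance per complex for each sample
by_complex <- aggregate(cbind(healthy, diseased) ~ complex, data = taxa, FUN = sum)

# Combined share of orange and red complex species in each sample
late <- by_complex$complex %in% c("orange", "red")
orange_red_share <- colSums(by_complex[late, c("healthy", "diseased")])

print(by_complex)
print(orange_red_share)
```

In this toy example the orange and red complexes make up a much larger share of the diseased sample than of the healthy one, which is the pattern complex theory associates with increasingly severe periodontal disease.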

<div>
  <img src="https://raw.githubusercontent.com/riinbre-bioinfo/Colab_Biomarkers/main/Biomarkers/images/Oral_Microbiome_Complex_Theory.png", alt="Oral Microbiome Complex Theory">
</div>
<div class="alert alert-block alert-success">
    <b>&#9997; Reference: </b>Created in <a href="https://biorender.com/">BioRender</a>
</div>

<div class="alert alert-block alert-success">
<b>&#9997; Reference:</b> Created in <a href="https://biorender.com/">BioRender</a> and adapted from <a href="https://pubmed.ncbi.nlm.nih.gov/15853940/">Socransky and Haffajee, 2005</a>
</div>

---

<p><span style="font-size: 30px"><b>Quizzes</b></span> <span style="float : inline;">(run the command below to display the quizzes)</span> </p>

In [None]:
IRdisplay::display_html('<iframe src="quizes/Chapter1_Quizes.html" width=100% height=450></iframe>')

---

## Next Steps

Users will differ in their familiarity with the R programming language and with the statistical methods used for omics data analysis.  Submodules 2-4 cover these basic topics for those who need them.  If you are already familiar with R, linear models, and exploratory analysis methods, you may skip ahead to <b>Submodule 5: Rat Renal Ischemia Reperfusion Injury Case Study</b>.

---

## References

### General

[Hemme CL, Bellavia L, Cho BP, Meenach S, Howlett NG. RI-INBRE: A Statewide NIH Program Grant to Improve Institutional Biomedical Research Capacity in Rhode Island. R I Med J (2021) 104(2):25-29. PMID: 33648315; PMCID: PMC8742675.][riinbre]<br>
[Biomarkers Definition Working Group (2001). "Biomarkers and surrogate endpoints: preferred definitions and conceptual framework." Clinical pharmacology and therapeutics 69(3).][biomarker_wg]<br>
[WHO International Programme on Chemical Safety (2001). "Biomarkers in Risk Assessment: Validity and Validation."][biomarkers]<br>
[US Food and Drug Administration (2021). "About Biomarkers and Qualification"][fda]<br>

[biomarker_wg]: https://www.ncbi.nlm.nih.gov/pubmed/11240971 "Biomarkers Definition Working Group (2001). Biomarkers and surrogate endpoints: preferred definitions and conceptual framework. Clinical pharmacology and therapeutics 69(3)."
[biomarkers]: https://inchem.org/documents/ehc/ehc/ehc222.htm "Biomarkers in Risk Assessment: Validity and Validation."
[riinbre]: https://pubmed.ncbi.nlm.nih.gov/33648315/ "Hemme CL, Bellavia L, Cho BP, Meenach S, Howlett NG. RI-INBRE: A Statewide NIH Program Grant to Improve Institutional Biomedical Research Capacity in Rhode Island. R I Med J 2021 Mar 1;104(2):25-29. PMID: 33648315; PMCID: PMC8742675."
[fda]: https://www.fda.gov/drugs/biomarker-qualification-program/about-biomarkers-and-qualification "US Food and Drug Administration (2021). 'About Biomarkers and Qualification'"

### Prostate Specific Antigen

[Merrill RM, Otto SA, Hammond EB. Prostate-Specific Antigen Screening According to Health Professional Counseling and Age in the United States. Prostate Cancer. 2022 Jan 6;2022:8646314. doi: 10.1155/2022/8646314. PMID: 35036010; PMCID: PMC8758274.][screening]<br>
[PSA Fact Sheet, NIH National Cancer Institute][psa_fact_sheet]

[screening]: https://pubmed.ncbi.nlm.nih.gov/35036010/ "Merrill RM, Otto SA, Hammond EB. Prostate-Specific Antigen Screening According to Health Professional Counseling and Age in the United States. Prostate Cancer. 2022 Jan 6;2022:8646314. doi: 10.1155/2022/8646314. PMID: 35036010; PMCID: PMC8758274."
[psa_fact_sheet]: https://www.cancer.gov/types/prostate/psa-fact-sheet "PSA Fact Sheet, NIH National Cancer Institute"

### Liver Injury Model

[Gallucci GM, Trottier J, Hemme C, Assis DN, Boyer JL, Barbier O, Ghonem NS. Adjunct Fenofibrate Up-regulates Bile Acid Glucuronidation and Improves Treatment Response For Patients With Cholestasis. Hepatol Commun. 2021 Dec;5(12):2035-2051. doi: 10.1002/hep4.1787. Epub 2021 Aug 27. PMID: 34558841; PMCID: PMC8631103.][gallucci]<br>
[Ghonem NS, Auclair AM, Hemme CL, Gallucci GM, de la Rosa Rodriguez R, Boyer JL, Assis DN. Fenofibrate Improves Liver Function and Reduces the Toxicity of the Bile Acid Pool in Patients With Primary Biliary Cholangitis and Primary Sclerosing Cholangitis Who Are Partial Responders to Ursodiol. Clin Pharmacol Ther. 2020 Dec;108(6):1213-1223. doi: 10.1002/cpt.1930. Epub 2020 Jul 17. PMID: 32480421; PMCID: PMC7886378.][ursodiol]<br>

[ursodiol]: https://pubmed.ncbi.nlm.nih.gov/32480421/ "Ghonem NS, Auclair AM, Hemme CL, Gallucci GM, de la Rosa Rodriguez R, Boyer JL, Assis DN. Fenofibrate Improves Liver Function and Reduces the Toxicity of the Bile Acid Pool in Patients With Primary Biliary Cholangitis and Primary Sclerosing Cholangitis Who Are Partial Responders to Ursodiol. Clin Pharmacol Ther. 2020 Dec;108(6):1213-1223. doi: 10.1002/cpt.1930. Epub 2020 Jul 17. PMID: 32480421; PMCID: PMC7886378."
[gallucci]: https://pubmed.ncbi.nlm.nih.gov/34558841/ "Gallucci GM, Trottier J, Hemme C, Assis DN, Boyer JL, Barbier O, Ghonem NS. Adjunct Fenofibrate Up-regulates Bile Acid Glucuronidation and Improves Treatment Response For Patients With Cholestasis. Hepatol Commun. 2021 Dec;5(12):2035-2051. doi: 10.1002/hep4.1787. Epub 2021 Aug 27. PMID: 34558841; PMCID: PMC8631103."

### BRCA

[Alemar B, Gregório C, Herzog J, Matzenbacher Bittar C, Brinckmann Oliveira Netto C, Artigalas O, Schwartz IVD, Coffa J, Alves Camey S, Weitzel J, Ashton-Prolla P. BRCA1 and BRCA2 mutational profile and prevalence in hereditary breast and ovarian cancer (HBOC) probands from Southern Brazil: Are international testing criteria appropriate for this specific population? PLoS One. 2017 Nov 21;12(11):e0187630. doi: 10.1371/journal.pone.0187630. Erratum in: PLoS One. 2018 May 11;13(5):e0197529. PMID: 29161300; PMCID: PMC5697861.][brca1]<br>
[Savage KI, Harkin DP. BRCA1, a 'complex' protein involved in the maintenance of genomic stability. FEBS J. 2015 Feb;282(4):630-46. doi: 10.1111/febs.13150. Epub 2014 Dec 2. PMID: 25400280.][savage]<br>
[BRCA Gene Mutations: Cancer Risk and Genetic Testing (National Cancer Institute)][fact_sheet]

[brca1]: https://pubmed.ncbi.nlm.nih.gov/29161300/ "Alemar B, Gregório C, Herzog J, Matzenbacher Bittar C, Brinckmann Oliveira Netto C, Artigalas O, Schwartz IVD, Coffa J, Alves Camey S, Weitzel J, Ashton-Prolla P. BRCA1 and BRCA2 mutational profile and prevalence in hereditary breast and ovarian cancer (HBOC) probands from Southern Brazil: Are international testing criteria appropriate for this specific population? PLoS One. 2017 Nov 21;12(11):e0187630. doi: 10.1371/journal.pone.0187630. Erratum in: PLoS One. 2018 May 11;13(5):e0197529. PMID: 29161300; PMCID: PMC5697861."
[savage]: https://pubmed.ncbi.nlm.nih.gov/25400280/ "Savage KI, Harkin DP. BRCA1, a 'complex' protein involved in the maintenance of genomic stability. FEBS J. 2015 Feb;282(4):630-46. doi: 10.1111/febs.13150. Epub 2014 Dec 2. PMID: 25400280."
[fact_sheet]: https://www.cancer.gov/about-cancer/causes-prevention/genetics/brca-fact-sheet "BRCA Gene Mutations: Cancer Risk and Genetic Testing (National Cancer Institute)"

### Renal IRI

[Shiva N, Sharma N, Kulkarni YA, Mulay SR, Gaikwad AB. Renal ischemia/reperfusion injury: An insight on in vitro and in vivo models. Life Sci. 2020 Sep 1;256:117860. doi: 10.1016/j.lfs.2020.117860. Epub 2020 Jun 11. PMID: 32534037.][shiva]<br>
[Hou J, Tolbert E, Birkenbach M, Ghonem NS. Treprostinil alleviates hepatic mitochondrial injury during rat renal ischemia-reperfusion injury. Biomed Pharmacother. 2021 Nov;143:112172. doi: 10.1016/j.biopha.2021.112172. Epub 2021 Sep 21. PMID: 34560548; PMCID: PMC8550798.][hou]<br>
[Ding M, Tolbert E, Birkenbach M, Gohh R, Akhlaghi F, Ghonem NS. Treprostinil reduces mitochondrial injury during rat renal ischemia-reperfusion injury. Biomed Pharmacother. 2021 Sep;141:111912. doi: 10.1016/j.biopha.2021.111912. Epub 2021 Jul 15. PMID: 34328097; PMCID: PMC8429269.][ding]<br>
[Mayo Clinic Creatinine Tests][mayo_cre]<br>
[Mayo Clinic BUN Test][mayo_bun]<br>


[ding]: https://pubmed.ncbi.nlm.nih.gov/34328097/ "Ding M, Tolbert E, Birkenbach M, Gohh R, Akhlaghi F, Ghonem NS. Treprostinil reduces mitochondrial injury during rat renal ischemia-reperfusion injury. Biomed Pharmacother. 2021 Sep;141:111912. doi: 10.1016/j.biopha.2021.111912. Epub 2021 Jul 15. PMID: 34328097; PMCID: PMC8429269."
[hou]: https://pubmed.ncbi.nlm.nih.gov/34560548/ "Hou J, Tolbert E, Birkenbach M, Ghonem NS. Treprostinil alleviates hepatic mitochondrial injury during rat renal ischemia-reperfusion injury. Biomed Pharmacother. 2021 Nov;143:112172. doi: 10.1016/j.biopha.2021.112172. Epub 2021 Sep 21. PMID: 34560548; PMCID: PMC8550798."
[shiva]: https://pubmed.ncbi.nlm.nih.gov/32534037/ "Shiva N, Sharma N, Kulkarni YA, Mulay SR, Gaikwad AB. Renal ischemia/reperfusion injury: An insight on in vitro and in vivo models. Life Sci. 2020 Sep 1;256:117860. doi: 10.1016/j.lfs.2020.117860. Epub 2020 Jun 11. PMID: 32534037."
[mayo_cre]: https://www.mayoclinic.org/tests-procedures/creatinine-test/about/pac-20384646 "Mayo Clinic Creatinine Tests"
[mayo_bun]: https://www.mayoclinic.org/tests-procedures/blood-urea-nitrogen/about/pac-20384821#:~:text=A%20common%20blood%20test%2C%20the,nitrogen%20that's%20in%20your%20blood "Mayo Clinic BUN Test"

### Oral Microbiome

[Socransky SS, Haffajee AD, Cugini MA, Smith C, Kent RL Jr. Microbial complexes in subgingival plaque. J Clin Periodontol. 1998 Feb;25(2):134-44. doi: 10.1111/j.1600-051x.1998.tb02419.x. PMID: 9495612.][complex1]<br>
[Socransky SS, Haffajee AD. Periodontal microbial ecology. Periodontol 2000. 2005;38:135-87. doi: 10.1111/j.1600-0757.2005.00107.x. PMID: 15853940.][complex2]<br>
[Dewhirst FE, Chen T, Izard J, Paster BJ, Tanner AC, Yu WH, Lakshmanan A, Wade WG. The human oral microbiome. J Bacteriol. 2010 Oct;192(19):5002-17. doi: 10.1128/JB.00542-10. Epub 2010 Jul 23. PMID: 20656903; PMCID: PMC2944498.][hom]<br>
[The Human Oral Microbiome Database][homd]<br>
[Bartold PM, Van Dyke TE. Periodontitis: a host-mediated disruption of microbial homeostasis. Unlearning learned concepts. Periodontol 2000. 2013 Jun;62(1):203-17. doi: 10.1111/j.1600-0757.2012.00450.x. PMID: 23574467; PMCID: PMC3692012.][bartold]<br>

[complex1]: https://pubmed.ncbi.nlm.nih.gov/9495612/ "Socransky SS, Haffajee AD, Cugini MA, Smith C, Kent RL Jr. Microbial complexes in subgingival plaque. J Clin Periodontol. 1998 Feb;25(2):134-44. doi: 10.1111/j.1600-051x.1998.tb02419.x. PMID: 9495612."
[complex2]: https://pubmed.ncbi.nlm.nih.gov/15853940/ "Socransky SS, Haffajee AD. Periodontal microbial ecology. Periodontol 2000. 2005;38:135-87. doi: 10.1111/j.1600-0757.2005.00107.x. PMID: 15853940."
[hom]: https://pubmed.ncbi.nlm.nih.gov/20656903/ "Dewhirst FE, Chen T, Izard J, Paster BJ, Tanner AC, Yu WH, Lakshmanan A, Wade WG. The human oral microbiome. J Bacteriol. 2010 Oct;192(19):5002-17. doi: 10.1128/JB.00542-10. Epub 2010 Jul 23. PMID: 20656903; PMCID: PMC2944498."
[homd]: https://www.homd.org/ "The Human Oral Microbiome Database"
[bartold]: https://pubmed.ncbi.nlm.nih.gov/23574467/ "Bartold PM, Van Dyke TE. Periodontitis: a host-mediated disruption of microbial homeostasis. Unlearning learned concepts. Periodontol 2000. 2013 Jun;62(1):203-17. doi: 10.1111/j.1600-0757.2012.00450.x. PMID: 23574467; PMCID: PMC3692012."

---