sllobs-biostats

Location

HSEB 1730, 9AM on Mondays (dates below)

Lectures/Slides

Module 1: Intro to R

Lecture 1 (Aaron Quinlan, May 20, 2019): Intro to Data Analysis in RStudio
- slides
Lecture 2 (Aaron Quinlan, June 3, 2019): Data frames and Importing Data
- slides
- Inspired in part by: https://4va.github.io/biodatasci/r-rnaseq-airway.html
Lecture 3 (Aaron Quinlan, June 10, 2019): More with data frames, precision v. accuracy, very basic RNA-seq analysis
- Inspired in part by: https://4va.github.io/biodatasci/r-rnaseq-airway.html
- slides
- video
Lectures 4 and 5 (Javier Hernandez, June 17, 2019): R Packages, Data Types, Functions
- slides
- code
Lecture 6 (Tom Sasani, July 1, 2019): Intro to data visualization and ggplot2
- slides
- video
Lecture 7 (Charlie Murtaugh, July 8, 2019): Data Wrangling with "tidyverse"
- slides
- data
- code

Module 2: Intro to Probability and Inference

Lecture 8 (Aaron Quinlan, July 22, 2019): Intro to Probability
- slides
- video
Lecture 9 (Alan Rogers, August 5, 2019): Sum rule, product rule, conditional probability, and Bayes' rule
- slides
Lecture 10 (Alan Rogers, August 12, 2019): Conditional Probability and Bayes' Rule
- slides
- video
Lecture 11 (Aaron Quinlan, August 26): Poisson random variables for counting applications in biology
- slides
- video
Lecture 12 (Aaron Quinlan, September 16): Gaussian distributions and QQ plots
- slides
- video
Lecture 13 (Aaron Quinlan, October 14): Central Limit Theorem and Confidence Intervals
- slides
Lecture 14 (Aaron Quinlan, November 4): The t-statistic, t-distribution, t-tests, and p-values
- slides
- video
Lecture 15 (Aaron Quinlan, November 25): Power calculations and sample size
- slides
- video

Module 3: Regression

Lecture 16 (Tom Sasani, December 2): Intro to regression, part 1
- slides
- video
Lecture 17 (Tom Sasani, December 16): Intro to regression, part 2
- slides
Lecture 18 (Tom Sasani, February 10): Intro to regression, part 3
- slides

Slack Group

Join the SLLOBS Slack group here:

Overview

The goal of the Salt Lake Learners of Biostats (SLLOBS) is to convene folks at all levels who are interested in learning (and teaching) basic concepts in data analysis and statistics for biological research. We will meet bi-weekly on Monday mornings at 9AM for one hour.

One or more people will work together to learn and present a topic at each meeting. The goal of each lecture is to:

- Give an accessible introduction to the topic
- Provide clear examples and explanations
- Demonstrate R code that conveys the topic

This will require effort from the presenter, but the idea is that a large group of interested folks will provide a large pool of both teachers and learners.

The vision is that if we all attend and put our best effort forward, we will all learn together and have a shared foundation for future learning and discussion.

Expectations

To be successful, SLLOBS needs its members to:

- attend (almost) every meeting
- read any required material before each lecture
- make a concerted effort to contribute and present material for the group

If members follow these expectations, we will build a large corpus of teaching and learning material that we can refer back to. This material will also serve as the basis for a formal course in the future.

Grab bag of things to discuss in the future

multiple testing: FDR, Bonferroni, q-values (Storey)
power analysis
chi-squared, contingency tests
batch effects
survival curves
r versus r^2
Monte Carlo simulations
Gibbs sampling
MCMC
MA plots:
- https://twitter.com/Noncodarnia/status/1124099713291169800

References (incomplete)

Curriculum

Module 1: Intro to Data Analysis in R

Lecture 1: Goals of the group, Intro to R and RStudio
- Goals and Motivation
- Meeting frequency
- Expectations
- sharing material
- sharing knowledge
- What is R?
- Why R?
- Installing RStudio
- RStudio
- Calculator
- Lists
Lecture 2: Basics of R (I)
- RMarkdown
- Vectorization
- Data types
- Built-in datasets
- Importing Data
  - broken data
Lecture 3: Basics of R (II)
- Installing packages
- Basic Data Wrangling
- tidyverse
Lecture 4: Intro to plotting and data visualization
- Why
- plot()
  - customizing
- Scatter Plot
- Regression Line (details later)
- Barplot
- Histograms
- Boxplots and better versions thereof

Module 2: Intro to Probability and Inference

Lecture 1: Probability
- Discrete Random Variables
  - Bernoulli trials
  - Binomial success counts
  - Poisson distributions
- Continuous Random Variables
- Descriptive Statistics
  - expected value
  - mean
  - median
  - mode
- Basic Simulations
  - coin toss
  - importance of sample size
Lecture 2: Inference (I)
- Random variable probability distributions
- Expected Values
- Standard Error
- Variance and standard deviation
Lecture 3: Maximum Likelihood
- Problem setup
- Work through an example
Lecture 4: Inference (II)
- Estimates
- Central Limit Theorem
- Confidence Intervals
Lecture 5: t-tests, p-values, multiple testing, q-values (tentative)
Lecture 6: Inference (III)
- Developing Models
- Intro to Bayesian Inference
- Bayesian Thinking
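
The coin-toss and sample-size items in Lecture 1 can be illustrated with a small simulation. The lectures use R; this is a minimal Python sketch using only the standard library, and the helper names (`fraction_heads`, `sd_of_estimates`) are hypothetical, not taken from the course materials. It repeats an n-toss experiment many times and measures how much the estimated fraction of heads varies between repeats. For a fair coin the standard deviation of that estimate should shrink roughly like sqrt(p(1 - p)/n), so larger samples give more stable estimates.

```python
import random
import statistics

# Hypothetical helper names for illustration; not from the course repository.

def fraction_heads(n_tosses, rng, p_heads=0.5):
    """Toss a coin n_tosses times (each toss is a Bernoulli trial); return the fraction of heads."""
    heads = sum(1 for _ in range(n_tosses) if rng.random() < p_heads)
    return heads / n_tosses

def sd_of_estimates(n_tosses, n_replicates=1000, seed=1):
    """Repeat the n_tosses experiment n_replicates times; return the SD of the estimated fractions."""
    rng = random.Random(seed)  # fixed seed so the simulation is reproducible
    estimates = [fraction_heads(n_tosses, rng) for _ in range(n_replicates)]
    return statistics.stdev(estimates)

if __name__ == "__main__":
    for n in (10, 100, 1000):
        # For a fair coin, theory says the SD of the estimate is sqrt(0.5 * 0.5 / n)
        print(f"{n:>5} tosses: simulated SD {sd_of_estimates(n):.4f}, "
              f"theoretical SD {(0.25 / n) ** 0.5:.4f}")
```

Running it should show the simulated spread tracking the theoretical value and dropping by about a factor of three for every tenfold increase in tosses.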

Module 3: Regression

References:
- http://blog.yhat.com/posts/r-lm-summary.html
Lecture 1: Motivation
- Examples
- Complexity
- What is a linear model?
- Basic correlation
- Least Squares
Lecture 2: Intro to Regression
- Galton: Regression toward the mean
- Correlation
  - Pearson
  - Spearman
- Anscombe's Quartet
- Regression Line
- Stratification
Lecture 3: Linear Models
- lm
- interpretation of coefficients and p-values
- impact and handling of outliers
- confidence intervals
- Interpretation with Examples
Lecture 4: Generalized Linear Models
- Why?
- How to design them
- Example: Sasani et al., de novo mutation (DNM) counts?
Lecture 6: Most statistical tests are really just linear models
- One mean tests:
  - One-sample t-test and Wilcoxon signed-rank
  - Paired-samples t-test and Wilcoxon matched pairs
Lecture 7: Most statistical tests are really just linear models (cont.)
- Two means tests:
  - Independent t-test
  - Mann-Whitney U
  - Welch's t-test
Lecture 8: Most statistical tests are really just linear models (cont.)
- Three or more means
  - One-way ANOVA and Kruskal-Wallis
  - Two-way ANOVA
  - ANCOVA
Lecture 9: Goodness of fit tests
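
The least-squares and correlation items in Lectures 1 and 2, and the claim in Lectures 6 to 8 that common tests are linear models, can be sketched in a few lines. This is a Python illustration using the standard library (the lectures use R, e.g. `lm()` and `cor()`); `least_squares_line` and `pearson_r` are hypothetical helper names, and the two-group values at the bottom are made-up toy numbers. Coding group membership as 0/1 and fitting a line makes the intercept equal the mean of group 0 and the slope equal the difference in group means, which is the quantity a two-sample t-test examines.

```python
from math import sqrt
from statistics import mean

# Hypothetical helper names for illustration; the lectures use R's lm() and cor().

def least_squares_line(x, y):
    """Return (intercept, slope) of the line minimizing the sum of squared residuals."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be the same length, with at least 2 points")
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def pearson_r(x, y):
    """Pearson correlation: the covariance scaled by both standard deviations."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

# A two-group comparison written as a regression (toy values).
group = [0, 0, 0, 1, 1, 1]            # 0/1 indicator for group membership
value = [4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
intercept, slope = least_squares_line(group, value)
# intercept = mean of group 0; slope = mean(group 1) - mean(group 0)
print(intercept, slope)
```

A full linear-model treatment also needs standard errors and p-values for the coefficients, which this sketch deliberately leaves out.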

Module 4: Advanced Plotting

Lecture 1: ggplot
Lecture 2: 1D data plots: barplots, boxplots, violin plots, beeswarm plots, density plots
Lecture 3: 2D data plots: scatterplots, hexbin plots
Lecture 4: More than two dimensions: faceting, interactive graphics, color

Module 5: Models for count data such as high-throughput sequencing

Lecture 1: Intro
- Challenges of count data
- RNA-seq
- Modeling Count Data
  - Dispersion
  - Normalization
Lecture 2:
- Poisson noise
- Biological signal
- Biological and technical replicates
Lecture 3: DESeq2
- the method
- analyses and examples
Lecture 4: Misc
- Outliers
- Count data transformations

Module 6: Mixture Models

Lecture 1: Biological data is often multi-modal. How do we handle this?
- generate mixtures of normal distributions
Lecture 2: Expectation Maximization (EM) for reverse engineering the mixtures

Module 7: Clustering

Lecture 1: Intro
- Why do we cluster data?
- Measuring similarity
- k-means clustering
Lecture 2: Clustering examples with flow cytometry data
- Data preprocessing
- Density-based clustering
Lecture 3: Hierarchical clustering
Lecture 4: Validating and choosing the number of clusters
Lecture 5: Detecting Batch effects

Module 8: Testing

Lecture 1: Hypothesis testing
- types of error
- revisiting the t-test
- permutation tests
Lecture 2: P-value hacking
Lecture 3: Multiple testing
- Theory, Implications
- Bonferroni correction
Lecture 4: False discovery rate (FDR)
- P-value histogram
- Benjamini-Hochberg algorithm for limiting FDR
- Local FDR
Lecture 5: Other tests?
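
The Bonferroni correction (Lecture 3) and the Benjamini-Hochberg procedure for limiting FDR (Lecture 4) can be compared with a short sketch. This is a Python illustration using only the standard library (in R, `p.adjust()` provides both methods); the function names are hypothetical. Bonferroni rejects any p-value at or below alpha/m. Benjamini-Hochberg is a step-up procedure: sort the m p-values, find the largest rank k with p_(k) <= (k/m) * alpha, and reject the k smallest p-values, even if some smaller-ranked p-value missed its own threshold.

```python
# Hypothetical helper names for illustration; not from the course repository.

def bonferroni(p_values, alpha=0.05):
    """Reject H0 wherever p <= alpha / m (controls the family-wise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure; returns True where H0 is rejected at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p-value
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank  # keep the LARGEST rank that passes (step-up)
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected

p = [0.01, 0.04, 0.03, 0.005]
print("Bonferroni:", bonferroni(p))
print("BH:        ", benjamini_hochberg(p))
```

With these four toy p-values, Bonferroni's threshold of 0.0125 keeps only two discoveries, while Benjamini-Hochberg keeps all four, illustrating why FDR control is usually less conservative.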

Module 9: Distributions

Lecture 1: Different distributions
Lecture 2: Fitting data to distributions, Q-Q plot

Module 10: Dimensionality Reduction (in progress)

Dimension reduction
PCA

Module 11: Special Topics (in progress)

Sampling
Bootstrapping
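
Bootstrapping can be sketched as resampling the observed data with replacement, recomputing a statistic on each resample, and reading a confidence interval off the resulting distribution. The version below is the percentile bootstrap, one of several bootstrap interval methods, written in Python with the standard library (the lectures use R). `bootstrap_ci` is a hypothetical helper name and the data values are made-up toy numbers.

```python
import random
from statistics import mean

# Hypothetical helper name for illustration; not from the course repository.

def bootstrap_ci(data, stat=mean, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for stat(data)."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    n = len(data)
    # Resample the observed data with replacement and recompute the statistic each time.
    boot_stats = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    # Take the empirical (1 - level)/2 and (1 + level)/2 quantiles of the bootstrap distribution.
    lower_idx = max(0, round((1 - level) / 2 * n_boot))
    upper_idx = min(n_boot - 1, round((1 + level) / 2 * n_boot) - 1)
    return boot_stats[lower_idx], boot_stats[upper_idx]

data = [2.1, 2.5, 3.0, 3.2, 3.8, 4.1, 4.4, 5.0]  # toy values
low, high = bootstrap_ci(data)
print(f"mean = {mean(data):.2f}, 95% bootstrap CI = ({low:.2f}, {high:.2f})")
```

Passing a different function as `stat` (for example `statistics.median`) gives an interval for that statistic instead, which is one appeal of the bootstrap when no simple formula for the standard error exists.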
