gdlc/GP3W

Genomic Analysis with R in 3 weeks

This repository contains materials for a 3-week course designed to introduce high school students (9th grade or higher) to data analysis using R with a focus on studies involving genetic data.

General Information

  • Instructor: Gustavo de los Campos ( gdeloscampos@gmail.com )
  • Modality: Synchronous distance learning (via Discord)
  • Required background: High school student, 9th grade or higher.
  • Expected time commitment: 10-15 hr/week. We will meet as a group on Monday for a lecture and an introduction of the week's task, have a short lecture and office hours on Wednesday, and meet as a group on Friday to present and discuss results.
  • Other requisites: Students need access to a computer and personal Discord and GitHub accounts.
  • Disclaimer:
    • The course is offered by Gustavo de los Campos.
    • This is not a Michigan State University course.
    • Upon completion, you will receive a certificate signed by the instructor.

Tentative program

  • Week 1: Introduction to GitHub, R, and RStudio

    • Introduction to R: Types, conditional statements, loops, plots, arrays, and importing/exporting data.

    • Introduction to GitHub.

    • Reporting using RMarkdown and RStudio.

    • Task 1:

      • Reading a genomic data set in R.
      • Producing summary statistics for phenotypes and genotypes.
      • Reporting results.
  • Week 2: Descriptive statistics and association analysis

    • Variance, covariance and correlation.

    • Simple linear regression.

    • Introduction to Genome Wide Association (GWA) Analysis.

    • Task 2:

      • Produce a Manhattan plot.
      • Identify SNPs significantly associated with a trait.
      • Report results.
  • Week 3: Beyond single-marker-phenotype analysis

    • Multiple-linear regression (models, estimation, and goodness of fit).

    • Training versus testing accuracy.

    • The curse of dimensionality.

    • Task 3:

      • Fit a multiple regression model to a training set (we will consider various approaches).
      • Evaluate prediction accuracy in the training and testing sets.
      • Report results.
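
As a rough illustration of the kind of analysis Tasks 2 and 3 describe, here is a minimal R sketch. It uses simulated data, not the course data set, and every name and value in it (`X`, `y`, the sample sizes, the effect of SNP 10, the training/testing split) is a hypothetical choice made for the example. The approaches used in class may differ.

```r
# Simulated data only: not the course data set. All names and values are hypothetical.
set.seed(1)
n <- 200   # number of individuals
p <- 50    # number of SNPs

# Genotypes coded as 0, 1, or 2 copies of an allele
X <- matrix(rbinom(n * p, size = 2, prob = 0.3), nrow = n, ncol = p)

# Phenotype: affected by SNP 10, plus random noise
y <- 1.5 * X[, 10] + rnorm(n)

# Week 2: single-marker regression, one SNP at a time
pvalues <- numeric(p)
for (j in 1:p) {
  fit <- lm(y ~ X[, j])
  pvalues[j] <- summary(fit)$coefficients[2, 4]  # p-value for the slope
}

# Manhattan plot: -log10(p-value) for each SNP
plot(-log10(pvalues), xlab = "SNP", ylab = "-log10(p-value)", pch = 19)
abline(h = -log10(0.05 / p), lty = 2)  # Bonferroni-corrected threshold

# Week 3: multiple regression, training versus testing accuracy
train <- 1:150
test <- 151:200
dat <- data.frame(y = y, X)
fit_multi <- lm(y ~ ., data = dat[train, ])
cor_train <- cor(y[train], fitted(fit_multi))
cor_test <- cor(y[test], predict(fit_multi, newdata = dat[test, ]))
```

Comparing `cor_train` with `cor_test` shows how a model with many predictors can look better on the data it was fitted to than on new data, which is the motivation for the Week 3 topics.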

Reading materials

Data

Tentative Schedule

| Date | Time | Activity | Materials |
|---|---|---|---|
| Mon., July 19, 2021 | 5:00pm-6:00pm | Lecture | |
| Wed., July 21, 2021 | 5:00pm-6:00pm | Q&A + short lecture | |
| Fri., July 23, 2021 | 5:00pm-6:00pm | Presentation of Reports | |
| Mon., July 26, 2021 | 5:00pm-6:00pm | Lecture | |
| Wed., July 28, 2021 | 5:00pm-6:00pm | Q&A + short lecture | |
| Fri., July 30, 2021 | 5:00pm-6:00pm | Presentation of Reports | |
| Mon., August 2, 2021 | 5:00pm-6:00pm | Lecture | |
| Wed., August 4, 2021 | 5:00pm-6:00pm | Q&A + short lecture | |
| Fri., August 6, 2021 | 5:00pm-6:00pm | Presentation of Reports | |
