STA130 (Fall 2022): An Introduction to Statistical Reasoning and Data Science

This course transitions the Winter 2022 online version of STA130 back to the Fall 2019 in-person format. To do so, the previous PowerPoint slides and accompanying pre-recorded lectures have been translated into an Rmd beamer pdf presentation format (which is also the format required for student project submissions). At the highest level, the course objectives are to develop and practice the two steps of the statistical and data science workflow:

Extract meaning from data through coding and analysis
Communicate learned knowledge in writing and speaking

Weekly Course Routine

2 hours of in-person lecture and interactive class practice quizzes
- 2 hours of review, office hours, and Piazza discussion board time
- 3 hours of "completion credit" R homework (5%) designed to develop and practice the skills evaluated on the exams (45%) and the course project (20%)
2 hours of in-person tutorial activities (15%) focussing on written and verbal communication
- 1 hour of written and verbal communication homework (10%) designed to develop and practice the skills evaluated on the exams and the course project

Course Grading


5% R Homework	10% Written/Verbal Communication Homework	15% In-Person Tutorial Activities
20% Course project	20% Midterm and 25% Final exam	5% Participation Activities

Tutorial attendance is mandatory in the sense that it involves graded in-person activities. Lecture attendance is mandatory in the sense that lectures will not be recorded, but there are no attendance grades for lectures. Practice quizzes and other study materials to help prepare for the exams are available, but these are optional in the sense that they are not graded for course points. The participation activities involve surveys and mentorship activities.
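As a quick arithmetic check of the grading scheme, the following minimal sketch confirms that the component weights sum to 100% and shows how a weighted course grade could be computed. It is written in Python purely for illustration (the course itself uses R); the function name `course_grade` and all example scores are hypothetical.

```python
# Weights taken from the Course Grading breakdown (percent of the final grade).
WEIGHTS = {
    "r_homework": 5,
    "communication_homework": 10,
    "tutorial_activities": 15,
    "course_project": 20,
    "midterm": 20,
    "final_exam": 25,
    "participation": 5,
}


def course_grade(scores):
    """Weighted average of component scores, each on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical component scores, for illustration only.
example_scores = {
    "r_homework": 100,
    "communication_homework": 90,
    "tutorial_activities": 80,
    "course_project": 75,
    "midterm": 70,
    "final_exam": 80,
    "participation": 100,
}

print(sum(WEIGHTS.values()))         # 100
print(course_grade(example_scores))  # 80.0
```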

Missed Assessments

Missed Exams may be rescheduled for valid excused absences
Late Project submissions will not generally be accommodated
Late Homework and Tutorial work will not be accepted; however...
- the highest grade will replace the lowest for each of these categories
- For additional support contact your College Registrar
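
One possible reading of the replacement policy above is that, within a category, a single lowest score (for example a zero for a missed assignment) is overwritten by the highest score in that category. The sketch below illustrates that reading only; it is an assumption and not an official statement of how grades are computed. It is written in Python for illustration (the course itself uses R), and the function name and scores are hypothetical.

```python
def apply_replacement(scores):
    """Return a copy of `scores` with one lowest entry replaced by the highest."""
    if len(scores) < 2:
        return list(scores)
    adjusted = list(scores)
    # Overwrite the first occurrence of the minimum with the maximum.
    adjusted[adjusted.index(min(adjusted))] = max(adjusted)
    return adjusted


# Hypothetical homework scores; the 0 stands for a missed assignment.
homework = [100, 0, 80, 90]
adjusted = apply_replacement(homework)

print(adjusted)                       # [100, 100, 80, 90]
print(sum(adjusted) / len(adjusted))  # 92.5
```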

Communication

In-person Office Hours TBA	Online Piazza Discussion Board
Online Zoom Office Hours TBA	Special inquiries sta130@utoronto.ca

Course project

The course project will be done in consultation with our "project collaborator" Dr. Heman Shakeri of the University of Virginia's (UVA) Data Science Institute (DSI). The DSI is a recently created School (the 12th) of this premier US university, which was originally founded by Thomas Jefferson, the principal author of the US Declaration of Independence.

Dr. Shakeri's research, in conjunction with the Department of Biomedical Engineering and Systems Biology and Biomedical Data Sciences, is motivated by a close family friend's experience with cancer. Heman (Dr. Shakeri) wants to use "data-driven identification and control of high-dimensional dynamical systems" to detect deviations from normal cellular function and to intervene to interrupt the progression of cancer before it can establish a deleterious cellular homeostasis, giving families more time with their loved ones and close friends.

The data we will work with are based on advances in flow cytometry for single-cell analysis and mass spectrometry for measuring cellular proteomic processes (the phenotypical endpoint of cellular function and behavior). With these technologies, the multivariate landscape of proteomic activity can be measured at scale for a single cell, in any experimental condition, for any cell type (e.g., cancerous and benign cell lines). By understanding the typical cellular homeostasis of healthy and deleterious cells, and by observing how cellular proteomic homeostasis transforms over time in response to different treatments, it is hoped that we will eventually understand how to direct deleterious cellular states to transition into non-deleterious states; that is, "data-driven identification and control of high-dimensional dynamical systems".

The data observations simultaneously measure 17 so-called AP-1 transcription factors over thousands of cells in a given experimental condition. By repeatedly observing this protein complex (as is done in this data), the correlation between the different proteins can be observed. Further, the evolution of these dependencies can be observed over time as a response to different interventions. Further still, the emergence of downstream cellular phenotypes in response to changes in the state of the AP-1 system can also be observed, and can be characterized through 4 other "phenotype" proteins (whose measurements are also available in this data set). Thus, through our course project, we seek to understand the inter-dependence between the AP-1 proteins and their driving relationship with downstream cellular phenotypes, which might eventually suggest how we can intervene along this pathway to induce transformation away from deleterious cellular states.
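
To make "correlation between the different proteins" concrete, here is a minimal sketch that computes a pairwise Pearson correlation matrix, treating each cell as one observation and each protein as one column. It is written in Python purely for illustration (the course itself uses R). The data are synthetic random numbers rather than project measurements, and the column names `protein_A`, `protein_B`, and `protein_C` are hypothetical placeholders rather than actual AP-1 proteins.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Nested dict of pairwise correlations between named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Synthetic stand-in data: one value per cell for each placeholder protein.
random.seed(130)
n_cells = 1000
protein_a = [random.gauss(0, 1) for _ in range(n_cells)]
protein_b = [a + random.gauss(0, 0.5) for a in protein_a]  # built to track protein_A
protein_c = [random.gauss(0, 1) for _ in range(n_cells)]   # built to be unrelated

cells = {"protein_A": protein_a, "protein_B": protein_b, "protein_C": protein_c}
corr = correlation_matrix(cells)

for name, row in corr.items():
    print(name, [round(row[other], 2) for other in corr])
```

In this toy setup, `protein_B` is constructed to move with `protein_A`, so their correlation is strongly positive, while `protein_C` is generated independently and its correlations with the other two are near zero. Repeating such a calculation for each time point or treatment condition is one simple way to watch dependencies evolve.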

Outline

Week 1: Jupyterhub and Rstudio and R Basics

Week 1 is concerned with introducing students to R and Rstudio using UofT's Jupyterhub. Our primary reference resources in this task are

the R for Data Science textbook by Hadley Wickham & Garrett Grolemund
the DoSS Toolkit created by seasoned STA130 Profs. Alexander and Caetano et al.
- specifically the Rstudio, errors, and packages tutorials
and the R + Rstudio primers and the Rmarkdown and Rstudio cheatsheets

The UofT Jupyterhub is a phenomenal resource; however, it is subject to service outages from time to time (which have in the past coincided with assignment due dates), and it can take a long time to load when there's a lot of simultaneous user demand (if a lot of students in our or another class log in at once). When you cannot use UofT Jupyterhub you must use your own local Rstudio instance.

An extremely valuable skill in coding for statistics and data science is troubleshooting and figuring things out. Resources like the R for Data Science textbook and the DoSS Toolkit are excellent for learning things in a systematic, structured, and organized manner; however, Google, Stack Exchange/Stack Overflow, and coding blog posts can be invaluable for finding quick solutions to coding bugs and suggestions for how to complete a desired analysis. Hopefully, through this class you will take the opportunity to build your self-sufficiency and coding resilience.

Week 1 Course Material

Slides [Jupyterhub], Demo 1 [Jupyterhub], Demo 2 [Jupyterhub]	Questions [Round 1, Round 2, Round 3, Round 4, Round 5]
Homework Assignment and Practice Quiz

Module 2: Distributions and Statistics

Module 3: Data Wrangling with Tidy

Module 4: Statistical Inference for a Single Sample Proportion

Module 5: Permutation Tests for Two Groups

Midterm Review

Module 6: Sampling Distributions and Bootstrap Confidence Intervals

Module 7: Linear Regression I

Module 8: Linear Regression II

Module 9: Classification Trees

Module 10: Study Design, Confounding, and Ethics

Final Review
