title: "R Classes at Stanford"
author: "cengel {sul-cidr}"
date: "Aug 22, 2018"
From [](
Academic Year 2017-2018
This list is probably not complete. Please submit a pull request or email me with what I have missed.
#### BIO 141/STATS 141: Biostatistics
Introductory statistical methods for biological data: describing data (numerical and graphical summaries); introduction to probability; and statistical inference (hypothesis tests and confidence intervals). Intermediate statistical methods: comparing groups (analysis of variance); analyzing associations (linear and logistic regression); and methods for categorical data (contingency tables and odds ratio). Course content integrated with statistical computing in R.
Terms: Win | Units: 3-5 | UG Reqs: GER:DB-Math, WAY-AQR | Grading: Letter or Credit/No Credit
Instructors: Zhu, X. (PI)
#### BIO 202: Ecological Statistics
Intended for graduate students (and advanced undergraduates in special circumstances with consent of instructors) in biology and related environmental sciences, this course is an introduction to statistical methods for ecological data analysis, using the programming language R. The course will have lectures, discussions, and independent research projects using the students' own data or simulated or publicly available data.
Terms: Aut, alternate years, not given next year | Units: 3 | Grading: Letter (ABCD/NP)
Instructors: Coyle, J. (PI) ; Fukami, T. (PI)
#### BIOS 221/STATS 366: Modern Statistics for Modern Biology
Application-based course in nonparametric statistics. Modern toolbox of visualization and statistical methods for the analysis of data, with examples drawn from immunology, microbiology, cancer research and ecology. Methods covered include multivariate methods (PCA and extensions), sparse representations (trees, networks, contingency tables) as well as nonparametric testing (bootstrap, permutation and Monte Carlo methods). Hands-on; uses R and covers many Bioconductor packages. Prerequisite: Minimal familiarity with computers. Instructor consent. Location: Li Ka Shing Center, room 120.
Terms: Sum | Units: 3 | Grading: Letter or Credit/No Credit
#### CHPR 290/EDUC 260B/STATS 266: Advanced Statistical Methods for Observational Studies
Design principles and statistical methods for observational studies. Topics include: matching methods, sensitivity analysis, instrumental variables, graphical models, marginal structural models. 3 unit registration requires a small project and presentation. Computing is in R. Prerequisites: HRP 261 and 262 or STAT 209 (HRP 239), or equivalent.
Terms: Spr | Units: 2-3 | Grading: Medical Option (Med-Ltr-CR/NC)
Instructors: Baiocchi, M. (PI) ; Rogosa, D. (PI)
#### CME 195/STATS 195: Introduction to R
This short course runs for the first four weeks of the quarter and is offered in fall and spring. It is recommended for students who want to use R in statistics, science, or engineering courses and for students who want to learn the basics of R programming. The goal of the short course is to familiarize students with R's tools for scientific computing. Lectures will be interactive with a focus on learning by example, and assignments will be application-driven. No prior programming experience is needed. Topics covered include basic data structures, file I/O, graphs, control structures, etc., and some useful packages in R.
Terms: Aut, Spr | Units: 1 | Grading: Satisfactory/No Credit
Instructors: Nguyen, L. (PI)
#### CME 250: Introduction to Machine Learning
A short course presenting the principles behind when, why, and how to apply modern machine learning algorithms. We will discuss a framework for reasoning about when to apply various machine learning techniques, emphasizing questions of over-fitting/under-fitting, regularization, interpretability, supervised/unsupervised methods, and handling of missing data. The principles behind various algorithms--the why and how of using them--will be discussed, while some mathematical detail underlying the algorithms--including proofs--will not be discussed. Unsupervised machine learning algorithms presented will include k-means clustering, principal component analysis (PCA), and independent component analysis (ICA). Supervised machine learning algorithms presented will include support vector machines (SVM), classification and regression trees (CART), boosting, bagging, and random forests. Imputation, the lasso, and cross-validation concepts will also be covered. The R programming language will be used for examples, though students need not have prior exposure to R. Prerequisite: undergraduate-level linear algebra and statistics; basic programming experience (R/Matlab/Python).
Terms: Win, Spr | Units: 1 | Grading: Satisfactory/No Credit
#### CS 102: Big Data: Tools and Techniques, Discoveries and Pitfalls
Aimed primarily at students who may not major in CS but want to learn about big data and apply that knowledge in their areas of study. Many of the world's biggest discoveries and decisions in science, technology, business, medicine, politics, and society as a whole, are now being made on the basis of analyzing massive data sets, but it is surprisingly easy to come to false conclusions from data analysis alone, and privacy of data connected to individuals can be a major concern. This course provides a broad introduction to big data: historical context and case studies; privacy issues; data analysis techniques including databases, data mining, and machine learning; sampling and statistical significance; data analysis tools including spreadsheets, SQL, Python, R; data visualization techniques and tools. Tools and techniques are hands-on but at a cursory level, providing a basis for future exploration and application. Prerequisites: high school AP computer science, CS106A, or other equivalent programming experience; comfort with statistics and spreadsheets helpful but not required.
Terms: Aut | Units: 3-4 | UG Reqs: WAY-AQR | Grading: Letter or Credit/No Credit
Instructors: Widom, J. (PI)
#### EDUC 401D/STATS 196A: Multilevel Modeling Using R
Multilevel data analysis examples using R. Topics include: two-level nested data, growth curve modeling, generalized linear models for counts and categorical data, nonlinear models, three-level analyses.
Terms: Spr | Units: 1 | Grading: Satisfactory/No Credit
Instructors: Rogosa, D. (PI)
#### ENGR 150: Data Challenge Lab
In this lab, students develop the practical skills of data science by solving a series of increasingly difficult, real problems. Skills developed include: data manipulation, data visualization, exploratory data analysis, and basic modeling. The data challenges each student undertakes are based upon their current skills. Students receive one-on-one coaching and see how expert practitioners solve the same challenges. Limited enrollment; application required. See for more information.
Terms: Win, Spr | Units: 1-5 | Grading: Letter (ABCD/NP)
Instructors: Behrman, W. (PI) ; Wickham, H. (PI)
#### HRP 219: Evaluating Technologies for Diagnosis, Prediction and Screening
New technologies designed to monitor and improve health outcomes are constantly emerging, but most fail in the clinic and in the marketplace because relatively few are supported by reliable, reproducible evidence that they produce a health benefit. This course covers the designs and methods that should be used to evaluate technologies to diagnose patients, predict prognosis or other health events, or screen for disease. These technologies can include devices, statistical prediction rules, biomarkers, gene panels, algorithms, imaging, or any information used to predict a future or a previously unknown health state. Specific topics to be covered include the phases of test development, how to frame a proper evaluation question, measures of test accuracy, Bayes' theorem, internal and external validation, prediction evaluation criteria, decision analysis, net-utility, ROC curves, c-statistics, net reclassification index, decision curves and reporting standards. Examples of technology assessments and original methods papers are used. Software used in the course is R or Stata. Open to graduate students; a solid understanding of introductory biostatistics, epidemiologic and clinical research study design, and of medical conditions and related technologies is required. Basic understanding of Stata or R is also required. Undergraduates may enroll with consent of instructor.
Terms: Win | Units: 3 | Grading: Medical Option (Med-Ltr-CR/NC)
Instructors: Goodman, S. (PI)
#### HUMBIO 89: Statistics in the Health Sciences
This course aims to provide a firm grounding in the foundations of probability and statistics, with a focus on analyzing data from the health sciences. Students will learn how to read, interpret, and critically evaluate the statistics in medical and biological studies. The course also prepares students to be able to analyze their own data, guiding them on how to choose the correct statistical test, avoid common statistical pitfalls, and perform basic functions in R Deducer.
Terms: Aut, Win | Units: 3 | UG Reqs: GER:DB-Math, WAY-AQR | Grading: Letter or Credit/No Credit
Instructors: Sainani, K. (PI) ; Serghiou, S. (PI)
#### LAW 7505: Law and Economics of the Death Penalty Seminar
(Formerly Law 397) This seminar will examine the legal and policy aspects of a capital punishment regime, with a focus on three primary issues: 1) the Supreme Court's forty-year effort to define what cases can permissibly receive the death penalty and the procedures under which it must be imposed; 2) the arguments for and against the death penalty, with a major focus on whether the death penalty deters, is administered in a racially biased way, or is otherwise implemented in an arbitrary and capricious manner; and 3) what the U.S. and international status of the death penalty is today and what the prospects are for the future in the wake of Justice Breyer's invitation in June 2015 to the Court to rule on the constitutionality of capital punishment in light of the existing empirical evidence. Although the readings on deterrence and racial discrimination will entail some substantial statistical analysis, a background in statistics, though helpful, will not be required. Special Instructions: After the term begins, students can transfer from section (01) into section (02), which meets the R requirement, with consent of the instructor. Students taking the course for R credit can take the course for either 2 or 3 units, depending on the paper length. Elements used in grading seminar: Written assignments and final paper or approved research with the professor.
Terms: Aut | Units: 2-3 | Grading: Law Honors/Pass/Restrd Cr/Fail
Instructors: Donohue, J. (PI)
#### ME 341X: Statistics for Design Experiments
Feedback from users is fundamental to good design. Often this feedback is collected in the form of a survey, resulting in data requiring both analysis and synthesis. Course content will be delivered via live and on-line video lectures, with group classroom time dedicated to completing the lab assignments. You will learn the specific skills necessary to design, launch and collect data using an online survey tool (Qualtrics), how to analyze the results using R for Statistical Computing, and to create simple graphical representations of statistical data. This course is designed to complement ME341 - Design Experiments, although enrollment in ME341 is not a prerequisite for this course. One-unit credit requires completion of an analysis project using data collected as part of this class. Auditors welcome.
Terms: Win | Units: 1 | Grading: Credit/No Credit
Instructors: Schar, M. (PI)
#### MS&E 226: "Small" Data
This course is about understanding "small data": these are datasets that allow interaction, visualization, exploration, and analysis on a local machine. The material provides an introduction to applied data analysis, with an emphasis on providing a conceptual framework for thinking about data from both statistical and machine learning perspectives. Topics will be drawn from the following list, depending on time constraints and class interest: approaches to data analysis: statistics (frequentist, Bayesian) and machine learning; binary classification; regression; bootstrapping; causal inference and experimental design; multiple hypothesis testing. Class lectures will be supplemented by data-driven problem sets and a project. Prerequisites: CME 100 or MATH 51; 120, 220 or STATS 116; experience with R at the level of CME/STATS 195 or equivalent.
Terms: Aut | Units: 3 | Grading: Letter or Credit/No Credit
Instructors: Johari, R. (PI)
#### MS&E 231/SOC 278: Introduction to Computational Social Science
With a vast amount of data now collected on our online and offline actions -- from what we buy, to where we travel, to who we interact with -- we have an unprecedented opportunity to study complex social systems. This opportunity, however, comes with scientific, engineering, and ethical challenges. In this hands-on course, we develop ideas from computer science and statistics to address problems in sociology, economics, political science, and beyond. We cover techniques for collecting and parsing data, methods for large-scale machine learning, and principles for effectively communicating results. To see how these techniques are applied in practice, we discuss recent research findings in a variety of areas. Prerequisites: introductory course in applied statistics, and experience coding in R, Python, or another high-level language.
Terms: Aut | Units: 3 | Grading: Letter or Credit/No Credit
Instructors: Goel, S. (PI)
#### MS&E 246: Financial Risk Analytics
Practical introduction to financial risk analytics, focusing on data-driven modeling, computation, and statistical estimation of credit and market risks. Case studies based on real data will be emphasized. Topics include mortgage risk, asset-backed securities, commercial lending, consumer delinquencies, crowd funding, transactions analytics, derivatives risk. Tools from machine learning and statistics will be developed. Data sources will be discussed. Intended to enable students to design and implement risk analytics tools in practice. Prerequisite: 245A or similar, some background in probability and statistics, working knowledge of R, Matlab, or similar computational/statistical package.
Terms: Win | Units: 3 | Grading: Letter (ABCD/NP)
Instructors: Giesecke, K. (PI)
#### MS&E 330/SOC 279: Law, Order & Algorithms
Data and algorithms are rapidly transforming law enforcement and criminal justice, including how police officers are deployed, how discrimination is detected, and how sentencing, probation, and parole terms are set. Modern computational and statistical methods offer the promise of greater efficiency, equity, and transparency, but their use also raises complex legal, social, and ethical questions. In this course, we analyze recent court decisions, discuss methods from machine learning and game theory, and examine the often subtle relationship between law, public policy and statistics. Students work in interdisciplinary teams to explore these issues in an empirical or investigative project of their choice. Prerequisite: An introductory course in applied statistics (e.g. MS&E 125). Recommended: experience programming in R or Python.
Terms: Spr | Units: 3 | Grading: Letter (ABCD/NP)
Instructors: Goel, S. (PI)
#### MS&E 448: Big Financial Data and Algorithmic Trading
Project course emphasizing the connection between data, models, and reality. Vast amounts of high volume, high frequency observations of financial quotes, orders and transactions are now available, and pose a unique set of challenges. This type of data will be used as the empirical basis for modeling and testing various ideas within the umbrella of algorithmic trading and quantitative modeling related to the dynamics and micro-structure of financial markets. Because it is nearly impossible to perform experiments in finance, there is a need for empirical inference and intuition; any model should also be justified in terms of plausibility that goes beyond pure econometric and data mining approaches. Introductory lectures, followed by real-world type projects to get hands-on experience with realistic challenges and hone skills needed in the workplace. Work in groups on selected projects that will entail obtaining and cleaning the raw data and becoming familiar with techniques and challenges in handling big data sets. Develop a framework for modeling and testing (in computer languages such as Python, C++, Matlab and R) and prepare presentations to present to the class. Example projects include optimal order execution, developing a market making algorithm, design of an intra-day trading strategy, and modeling the dynamics of the bid and ask. Prerequisites: MS&E 211, 242, 342, or equivalents, some exposure to statistics and programming. Enrollment limited. Admission by application; details at first class.
Terms: Spr | Units: 3 | Grading: Letter (ABCD/NP)
Instructors: Borland, L. (PI)
#### OIT 367: Business Intelligence from Big Data
The objective of this course is to analyze real-world situations where significant competitive advantage can be obtained through large-scale data analysis, with special attention to what can be done with the data and where the potential pitfalls lie. Students will be challenged to develop business-relevant questions and then solve for them by manipulating large data sets. Problems from advertising, eCommerce, finance, healthcare, marketing, and revenue management are presented. Students learn to apply software (such as R and SQL) to data sets to create knowledge that will inform decisions. The course covers fundamentals of statistical modeling, machine learning, and data-driven decision making. Students are expected to layer these topics over an existing facility with mathematical notation, algebra, calculus, probability, and basic statistics.
Terms: Win | Units: 3 | Grading: GSB Letter Graded
Instructors: Bayati, M. (PI)
#### POLISCI 150B/POLISCI 355B: Machine Learning for Social Scientists
Machine learning---the use of algorithms to classify, predict, sort, learn and discover from data---has exploded in use across academic fields, industry, government, and non-profit. This course provides an introduction to machine learning for social scientists. We will introduce state of the art machine learning tools, show how to use those tools in the programming language R, and demonstrate why a social science focus is essential to effectively apply machine learning techniques. Applications of the methods will include forecasting social phenomena, the analysis of social media data, and the automatic analysis of text data. Political Science 150A or an equivalent is required.
Terms: Win | Units: 5 | UG Reqs: WAY-AQR | Grading: Letter or Credit/No Credit
Instructors: Terman, R. (PI)
#### STATS 32: Introduction to R for Undergraduates
This short course runs for weeks two through five of the quarter. It is recommended for undergraduate students who want to use R in the linguistics, humanities, social sciences or biological sciences and for students who want to learn the basics of R programming. The goal of the short course is to familiarize students with R's tools for scientific computing. Lectures will be interactive with a focus on learning by example, and assignments will be application-driven. No prior programming experience is needed. Topics covered include basic data structures, file I/O, graphs, control structures, etc., and some useful packages in R. Prerequisite: undergraduate student. Priority given to non-engineering students. Laptops necessary for use in class.
Terms: Aut | Units: 1 | Grading: Satisfactory/No Credit
Instructors: Tay, J. (PI)
#### STATS 101: Data Science 101
This course will provide a hands-on introduction to statistics and data science. Students will engage with the fundamental ideas in inferential and computational thinking. Each week, we will explore a core topic comprising three lectures and two labs (a module), in which students will manipulate real-world data and learn about statistical and computational tools. Students will engage in statistical computing and visualization with current data analytic software (Jupyter, R). The objectives of this course are to have students (1) be able to connect data to underlying phenomena and to think critically about conclusions drawn from data analysis, and (2) be knowledgeable about programming abstractions so that they can later design their own computational inferential procedures. Open to undergraduates and graduates.
Terms: Aut, Spr | Units: 5 | UG Reqs: GER:DB-NatSci, WAY-AQR | Grading: Letter or Credit/No Credit
Instructors: Mohanty, P. (PI) ; Sabatti, C. (PI) ; Taylor, J. (PI) ; Walther, G. (PI) ; Xia, L. (PI)
#### STATS 191: Introduction to Applied Statistics
Statistical tools for modern data analysis. Topics include regression and prediction, elements of the analysis of variance, bootstrap, and cross-validation. Emphasis is on conceptual rather than theoretical understanding. Applications to social/biological sciences. Student assignments/projects require use of the software package R. Recommended: 60, 110, or 141.
Terms: Win | Units: 3-4 | UG Reqs: GER:DB-Math, WAY-AQR | Grading: Letter or Credit/No Credit
Instructors: Walther, G. (PI)
#### STATS 196A: Multilevel Modeling Using R (EDUC 401D)
See . Multilevel data analysis examples using R. Topics include: two-level nested data, growth curve modeling, generalized linear models for counts and categorical data, nonlinear models, three-level analyses.
Terms: Spr | Units: 1 | Grading: Satisfactory/No Credit
Instructors: Rogosa, D. (PI)
#### STATS 216: Introduction to Statistical Learning
Overview of supervised learning, with a focus on regression and classification methods. Syllabus includes: linear and polynomial regression, logistic regression and linear discriminant analysis; cross-validation and the bootstrap, model selection and regularization methods (ridge and lasso); nonlinear models, splines and generalized additive models; tree-based methods, random forests and boosting; support-vector machines; some unsupervised learning: principal components and clustering (k-means and hierarchical). Computing is done in R, through tutorial sessions and homework assignments. This math-light course is offered via video segments (MOOC style), and in-class problem solving sessions. Prereqs: Introductory courses in statistics or probability (e.g., Stats 60), linear algebra (e.g., Math 51), and computer programming (e.g., CS 105).
Terms: Win | Units: 3 | Grading: Letter or Credit/No Credit
Instructors: Tibshirani, R. (PI)
#### STATS 290: Computing for Data Science
Programming and computing techniques for the requirements of data science: acquisition and organization of data; visualization, modelling and inference for scientific applications; presentation and interactive communication of results. Emphasis on computing for substantial projects. Software development with emphasis on R, plus other key software tools. Prerequisites: Programming experience including familiarity with R; computing at least at the level of CS 106; statistics at the level of STATS 110 or 141.
Terms: Win | Units: 3 | Grading: Letter or Credit/No Credit
Instructors: Chambers, J. (PI) ; Narasimhan, B. (PI)
#### STATS 367: Statistical Models in Genetics
This course will cover statistical problems in population genetics and molecular evolution with an emphasis on coalescent theory. Special attention will be paid to current research topics, illustrating the challenges presented by genomic data obtained via high-throughput technologies. No prior knowledge of genomics is necessary. Familiarity with the R statistical package or other computing language is needed for homework assignments. Prerequisites: knowledge of probability through elementary stochastic processes and statistics through likelihood theory.
Terms: Win | Units: 3 | Grading: Letter or Credit/No Credit
Instructors: Palacios, J. (PI)