IS606 - Statistics and Probability for Data Analysis - CUNY

jbryer/IS606Fall2015

IS 606 - Statistics and Probability for Data Analytics - Fall 2015

Instructor: Jason Bryer, Ph.D.
Class Meetup: Thursday 7:00pm to 8:00pm
Office Hours: By appointment
Email: jason.bryer@gmail.com
Phone: 518-464-8594

Course Description

This course covers basic techniques in probability and statistics that are important in the field of data analytics. Discrete probability models, sampling from infinite and finite populations, statistical distributions, basic Bayesian statistics, and non-parametric statistical techniques for categorical data are covered in this course. Each of these statistical concepts will be applied in a variety of real-world scenarios through the use of case studies and customized data sets.

Course Learning Outcomes:

By the end of the course, students should be able to:

  • Understand the foundations of probability theory and perform basic probability calculations.
  • Build basic stochastic models for commonly encountered business problems.
  • Model situations involving uncertainty using appropriate probability distributions and conditional techniques.
  • Explore and summarize data using descriptive statistics.
  • Test hypotheses using classical and modern computational techniques.
  • Construct estimators and calculate intervals using classical and modern computational techniques.
  • Perform basic Bayesian statistical techniques for estimation and testing hypotheses.

Program Learning Outcomes addressed by the course:

  • Business Understanding. Learn when probabilistic techniques apply to certain categories of business problems, discuss the sorts of solutions that are possible, and understand the limitations of these techniques.
  • Foundational Math Skills. Explore and analyze data, build probabilistic and statistical models, construct estimators, and test hypotheses.
  • Predictive Modeling. Learn foundational techniques that underlie predictive modeling algorithms, such as Naïve Bayes.
  • Presentation. Complete and submit collaborative assignments using techniques from the course.

How is this course relevant for data analytics professionals?

Probabilistic techniques are the foundation of many data science applications from data exploration and visualization to outlier analysis, stochastic modelling, and data mining algorithms. This course will ensure that students have a strong understanding of these foundations.

Grading

  • Homework (18%)
  • Labs (36%)
  • Data Project (20%)
  • Final exam (20%)
  • Meetup Presentation (5%)
  • Getting Acquainted (1%)
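
As a rough illustration of how these weights combine, here is a minimal R sketch of a weighted final score. The weights come from the list above; the component scores and all variable names are hypothetical, and how scores are actually recorded and combined is up to the instructor.

```r
# Course weights taken from the grading list above (they sum to 100%).
weights <- c(homework = 0.18, labs = 0.36, project = 0.20,
             final_exam = 0.20, presentation = 0.05, acquainted = 0.01)

# Hypothetical component scores for one student, each on a 0-100 scale.
scores <- c(homework = 90, labs = 85, project = 80,
            final_exam = 75, presentation = 100, acquainted = 100)

# Weighted average: multiply each score by its weight and add the results.
# Indexing by names(weights) keeps the two vectors aligned.
final_score <- sum(weights * scores[names(weights)])
final_score
```

With these made-up scores the weighted result is 83.8.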

Grade Distribution

| Quality of Performance | Letter Grade | Range % | GPA |
| --- | --- | --- | --- |
| Excellent - work is of exceptional quality | A | 93 - 100 | 4 |
| Excellent | A- | 90 - 92.9 | 3.7 |
| Good - work is above average | B+ | 87 - 89.9 | 3.3 |
| Satisfactory | B | 83 - 86.9 | 3 |
| Below Average | B- | 80 - 82.9 | 2.7 |
| Poor | C+ | 77 - 79.9 | 2.3 |
| Poor | C | 70 - 76.9 | 2 |
| Failure | F | < 70 | 0 |
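
The grade bands above can be treated as a lookup from a percentage to a letter grade. The sketch below is one way to do that in R. The helper name `letter_grade` is hypothetical, and it assumes each band starts at its listed lower bound, so a score such as 79.95, which falls between two listed ranges, is placed in the lower band. How borderline scores are actually rounded is not stated in the syllabus.

```r
# Lower bound of each grade band, from the grade distribution above.
lower_bounds <- c(0, 70, 77, 80, 83, 87, 90, 93)
grades       <- c("F", "C", "C+", "B-", "B", "B+", "A-", "A")

# Hypothetical helper: map percentages (0-100) to letter grades.
letter_grade <- function(pct) {
  # findInterval() returns the index of the last lower bound that is <= pct.
  grades[findInterval(pct, lower_bounds)]
}

letter_grade(c(95, 90, 86.9, 79.95, 69.9))
```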

How This Course Works:

This course is conducted entirely online. Each week, you will have various resources made available, including weekly readings from the textbooks and occasionally additional readings provided by the instructor. Most weeks will have homework assignments to be submitted. There will also be a presentation required and a forum post introduction required. You are expected to complete all assignments by their due dates.

For the meetup presentation, you will solve one of the suggested practice problems from the weekly materials (not a graded homework problem) and present your solution to the class. Each student must present one problem during the semester. To choose a problem, enter your name and the problem in the Google Spreadsheet. Note there is a maximum of three presentations per meetup, and presentations should be no more than five minutes. Additionally, prepare your presentation so that the slides or document (I suggest using R Markdown) can be shared on the course website. Problems are assigned first come, first served, so any problem not already chosen by another student is available.

Presentation Signup Sheet

Further details on each of these assignments will be available in Blackboard and/or this GitHub repository.

Schedule

NOTE: Tentative. Subject to change

Google Calendar for IS 606

| Start | Due Date | Chapter | Topic |
| --- | --- | --- | --- |
| Aug-27 | Sep-6 | 1 | Intro to Data |
| Sep-7 | Sep-13 | 2 | Probability |
| Sep-14 | Sep-27 | 3 | Distributions |
| Sep-28 | Oct-11 | 4 | Foundation for Inference |
| Oct-12 | Nov-1 | 5 | Inference for Numerical Data |
| Oct-12 | Nov-1 | 6 | Inference for Categorical Data |
| Nov-2 | Nov-15 | 7 | Linear Regression |
| Nov-16 | Nov-29 | 8 | Multiple & Logistic Regression |
| Nov-30 | Dec-13 | Kruschke | Introduction to Bayesian Analysis |
| Dec-14 | Dec-17 | | Final Exam (due by 5pm on Dec-17) |

Meetup Schedule

There will be weekly meetups. You are encouraged to attend as many as you can, but recordings will generally be available within a few days of each meetup.

Presentation Signup Sheet

| Date | Topic |
| --- | --- |
| Thursday Aug-27 7:00 pm | Introduction to the course (Video) |
| Thursday Sep-3 7:00 pm | Introduction to data (Video) |
| Thursday Sep-10 7:00 pm | Probability (Video) |
| Tuesday Sep-15 7:00 pm | Distributions Part I (Video) |
| Thursday Sep-24 7:00 pm | Distributions Part II (Video) |
| Thursday Oct-1 7:00 pm | Foundation for Inference (Video) |
| Thursday Oct-8 7:00 pm | Foundation for Inference Part 2 (Video) |
| Thursday Oct-15 7:00 pm | Inference for Numerical Data (Video) |
| Thursday Oct-22 | No Class |
| Thursday Oct-29 7:00 pm | Inference for Categorical Data (Video) |
| Thursday Nov-5 7:00 pm | Linear Regression (Video) |
| Thursday Nov-12 7:00 pm | Linear Regression (Video) |
| Thursday Nov-19 7:00 pm | Multiple & Logistic Regression (Video) |
| Thursday Nov-26 | No Class - Happy Thanksgiving |
| Wednesday Dec-2 7:00 pm | Intro to Bayesian Analysis (Video) |
| Thursday Dec-10 | No Class |
| Thursday Dec-17 7:00 pm | Conclusions |

Textbooks

Required

Diez, D.M., Barr, C.D., & Çetinkaya-Rundel, M. (2015). OpenIntro Statistics (3rd Ed).

This is an open source textbook and can be downloaded in PDF format here, from the OpenIntro website, or a printed copy can be ordered from Amazon.

Kruschke, J.K. (2014). Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan (2nd Ed). London: Academic Press.

This book can be purchased from Amazon, but also check out the author's website (doingbayesiandataanalysis.blogspot.com/) for additional resources.

Recommended

Kabacoff, R.I. (2011). R in Action. Manning Publications.

You can find a lot of the material in R in Action on Kabacoff's website, statmethods.net. You can receive 38% off using the ria38 promo code when ordering from here.

Wickham, H. Advanced R. Boca Raton, FL: Taylor & Francis Group.

Most of this book is available freely online at adv-r.had.co.nz but can be purchased from Amazon.

Other Documents

Homework Assignments

The solutions to the practice problems are at the end of the book and do not need to be handed in. Graded assignments should be typed (preferably using R Markdown) or neatly handwritten and scanned. Data for the homework assignments, and for the examples within the chapters, can be downloaded here.

  • Chapter 1. (due Sept 6)
    • Practice: 1.7 (available in R using the data(iris) command), 1.9, 1.23, 1.33, 1.55, 1.69
    • Graded: 1.8, 1.10, 1.28, 1.36, 1.48, 1.50, 1.56, 1.70
  • Chapter 2. (due Sept 13)
    • Practice: 2.5, 2.7, 2.19, 2.29, 2.43
    • Graded: 2.6, 2.8, 2.20, 2.30, 2.38, 2.44
  • Chapter 3. (due Sept 27)
    • Practice: 3.1 (see normalPlot), 3.3, 3.17 (use qqnormsim from lab 3), 3.21, 3.37, 3.41
    • Graded: 3.2 (see normalPlot), 3.4, 3.18 (use qqnormsim from lab 3), 3.22, 3.38, 3.42
  • Chapter 4. (due Oct 11)
    • Practice: 4.3, 4.13, 4.23, 4.25, 4.39, 4.47
    • Graded: 4.4, 4.14, 4.24, 4.26, 4.34, 4.40, 4.48
  • Chapter 5. (due Nov 1)
    • Practice: 5.5, 5.13, 5.19, 5.31, 5.45
    • Graded: 5.6, 5.14, 5.20, 5.32, 5.48
  • Chapter 6. (due Nov 1)
    • Practice: 6.5, 6.11, 6.27, 6.43, 6.47
    • Graded: 6.6, 6.12, 6.20, 6.28, 6.44, 6.48
  • Chapter 7. (due Nov 15)
    • Practice: 7.23, 7.25, 7.29, 7.39
    • Graded: 7.24, 7.26, 7.30, 7.40
  • Chapter 8. (due Nov 29)
    • Practice: 8.1, 8.3, 8.7, 8.15, 8.17
    • Graded: 8.2, 8.4, 8.8, 8.16, 8.18
  • Bayesian (due Dec 13)
    • Graded: 2.1, 5.1, 5.2

Labs

These mini projects will have you explore statistical topics using R. For each project, create an R Markdown file. Name your file using the following format: LastName-X.Rmd where X is 0 to 8 for the project number.
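
To make the naming format concrete, here is a small R sketch that builds a file name of the form LastName-X.Rmd. The helper `lab_filename` and the name "Smith" are hypothetical and are not part of the course materials or the IS606 package.

```r
# Hypothetical helper: build a lab file name of the form "LastName-X.Rmd".
lab_filename <- function(last_name, lab_number) {
  # The naming rule above allows project numbers 0 through 8.
  stopifnot(lab_number %in% 0:8)
  sprintf("%s-%d.Rmd", last_name, as.integer(lab_number))
}

lab_filename("Smith", 3)
```

For a student named Smith, project 3 would be saved as Smith-3.Rmd.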

  1. Introduction to R and RStudio (Template)
  2. Introduction to Data (Template)
  3. Probability (Template)
  4. Distributions of Random Variables (Template)
  5. Foundations for Statistical Inference
    1. Sampling Distributions (Template)
    2. Confidence Levels (Template)
  6. Inference for Numerical Data (Template)
  7. Inference for Categorical Data (Template)
  8. Introduction to Linear Regression (Template)
  9. Multiple Linear Regression (Template)

Data Project

The purpose of the data project is for you to conduct reproducible research using open access data. The final project will include an R Markdown file with all required data files so that anyone else can run your analysis. Your project will be made available to other students on this website. The proposal will be graded on a pass/fail basis. More details on the format of the project including templates are on this page: https://github.com/jbryer/IS606Fall2015/blob/master/Project/IS606_Data_Project.md

Important Dates:

  • Proposal due October 19
  • Final Project due December 7

Software

We will make use of R, an open source statistics program and language. Be sure to install R and RStudio on your own computers within the first few days of the class.

If using Windows, you also need to download and install these:

Once everything is installed, execute the following command in RStudio to install the packages we will use for this class (you can copy-and-paste):

# Install the packages used in this class
install.packages(c('openintro', 'OIdata', 'devtools', 'ggplot2', 'psych', 'reshape2',
                   'knitr', 'markdown'))
# Install the IS606 course package from GitHub
devtools::install_github('jbryer/IS606')
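
After running the install commands, it can help to confirm that everything installed. The following is a minimal sketch, not part of the official setup instructions. The helper `missing_packages` is hypothetical; it relies only on base R's `requireNamespace()`, which returns FALSE instead of stopping when a package is not installed.

```r
# Packages named in the install commands above, plus the course package.
course_packages <- c("openintro", "OIdata", "devtools", "ggplot2", "psych",
                     "reshape2", "knitr", "markdown", "IS606")

# Hypothetical helper: return the names of packages that cannot be loaded.
missing_packages <- function(pkgs) {
  # quietly = TRUE suppresses messages when a package is not found.
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# An empty result means every package is available.
missing_packages(course_packages)
```

Any names it returns can be passed back to install.packages(), or to devtools::install_github() in the case of IS606.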

The IS606 R Package

Many of the course resources are available in the IS606 R package. Here are some commands to get started:

library('IS606')          # Load the package
vignette(package='IS606') # Lists vignettes in the IS606 package
vignette('os3')           # Loads a PDF of the OpenIntro Statistics book
data(package='IS606')     # Lists data available in the package
getLabs()                 # Returns a list of the available labs
viewLab('Lab0')           # Opens Lab0 in the default web browser
startLab('Lab0')          # Starts Lab0 (copies to getwd()), opens the Rmd file
shiny_demo()              # Lists available Shiny apps

Learning R

Learning R Markdown

  • Video on RMarkdown by RStudio - This 26-minute video talks about some updates to RMarkdown.
  • Markdown Basics. Markdown is a way of formatting plain text documents, mostly for the web. However, it has come to be used for other writing tasks too. It has become popular because it focuses on writing, not formatting; the formatting is taken care of later. Markdown Basics provides a nice introduction to Markdown.
  • The R Markdown Website has a nice introduction on how Markdown is extended to allow for the inclusion of R code and output.
  • Video Introduction to R Markdown. This short video (under 4 minutes) was recorded with an older version, so not all of the features and dialog boxes will look the same, but may be helpful.

Creating Math Equations

Occasionally you will need to type equations in homework and labs. R Markdown supports LaTeX-style equations using the MathJax JavaScript library. I do not expect you to learn LaTeX for this course. Instead, I recommend using the free application Daum Equation Editor. It is available online, as a Google Chrome Extension, or as a standalone Mac Application. For more details, go to this page: http://github.com/jbryer/IS606Fall2015/Pages/Equations.md
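
For reference, here is a short sketch of what LaTeX-style equations look like when typed directly into an R Markdown document. Single dollar signs produce an inline equation and double dollar signs produce a centered display equation. The two formulas (a sample mean and the standard error of the mean) are illustrative choices only and are not taken from a specific assignment.

```latex
The sample mean is $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.

$$
SE_{\bar{x}} = \frac{s}{\sqrt{n}}
$$
```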

Contact

Office Hours (cell phone or using GoToMeeting): TBD and also by appointment throughout the week. You’re encouraged to schedule an appointment, but you can try to call anytime.

You are encouraged to ask us questions on the "Ask Your Instructor" forum on the course discussion board, where other students will be able to benefit from your inquiries.

For the most part, you can expect me to respond to questions by email within 24 to 48 hours. If you do not hear back from me within 48 hours of sending an email, please resend your message.

I will be checking in on the course regularly, just about every day and likely several times each day. Please do not hesitate to ask if you have questions or concerns.

Accessibility and Accommodations

The CUNY School of Professional Studies is firmly committed to making higher education accessible to students with disabilities by removing architectural barriers and providing programs and support services necessary for them to benefit from the instruction and resources of the University. Early planning is essential for many of the resources and accommodations provided. Please see: http://sps.cuny.edu/student_services/disabilityservices.html

Online Etiquette and Anti-Harassment Policy

The University strictly prohibits the use of University online resources or facilities, including Blackboard, for the purpose of harassment of any individual or for the posting of any material that is scandalous, libelous, offensive or otherwise against the University’s policies. Please see: http://media.sps.cuny.edu/filestore/8/4/9_d018dae29d76f89/849_3c7d075b32c268e.pdf

Academic Integrity

Academic dishonesty is unacceptable and will not be tolerated. Cheating, forgery, plagiarism and collusion in dishonest acts undermine the educational mission of the City University of New York and the students' personal and intellectual growth. Please see: http://media.sps.cuny.edu/filestore/8/3/9_dea303d5822ab91/839_1753cee9c9d90e9.pdf

Student Support Services

If you need any additional help, please visit Student Support Services: http://sps.cuny.edu/student_resources/
