MSAN501 -- Computational analytics

Written by Terence Parr, professor of computer science and analytics at the University of San Francisco, with ideas from the faculty.

This repository contains the exercises for the 5-week computational analytics bootcamp for the MS in Analytics program at the University of San Francisco. It collects all of the labs students must complete by the end of the bootcamp in order to pass. The labs start out as very simple tasks or step-by-step recipes but then accelerate in difficulty, culminating in an interesting text analysis project.

Table of contents

Part I -- Introduction

  • Audience and Summary
  • “Newbies say the darndest things”

Part II -- Python Programming and Data Structures

  • Computing Point Statistics
  • Approximating sqrt(n) with the Babylonian Method
  • Generating Uniform Random Numbers
  • Histograms Using matplotlib
  • Graph Adjacency Lists and Matrices

Part III -- A Taste of Distributed Computing

  • Launching a Virtual Machine at Amazon Web Services
  • Linux command line
  • Using the Hadoop Streaming Interface with Python

Part IV -- Empirical statistics

  • Generating Binomial Distributions
  • Generating Exponential Random Variables
  • The Central Limit Theorem in Action
  • Generating Normal Random Variables
  • Confidence Intervals for Price of Hostess Twinkies
  • Is Free Beer Good For Tips?

Part V -- Optimization and Prediction

  • Iterative Optimization Via Gradient Descent
  • Predicting Murder Rates With Gradient Descent

Part VI -- Text Analysis

  • Summarizing Reuters Articles with TFIDF

Summary

This course is specifically designed as an introduction to analytics programming for those who are not yet skilled programmers. The course also explores many concepts from math and statistics, but in an empirical fashion rather than symbolically as one would do in a math class. Consequently, this course is also useful to programmers who would like to strengthen their understanding of numerical methods.

The exercises are grouped into parts. We begin with simple programs that compute statistics, build simple data structures, and use libraries to create visualizations, then move on to using the UNIX command line, launching virtual machines in the cloud, and writing simple Hadoop map-reduce programs.

The empirical statistics part strives to give an intuitive feel for random variables, density functions, the central limit theorem, hypothesis testing, and confidence intervals. It's one thing to learn their formal definitions, but to get a solid grasp of these concepts it helps to observe statistics in action. All of the techniques we'll use in empirical statistics rely on the ability to generate random values from a particular distribution, and we can derive them all from a uniform random number generator, which we build in an earlier exercise.
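To make that idea concrete, here is a minimal sketch (not the bootcamp's lab code) of the inverse transform method: if U is uniform on [0, 1), then -ln(1 - U)/λ is exponentially distributed with rate λ. The names runif and rexp are illustrative, and the sketch leans on Python's built-in generator rather than a uniform generator built from scratch.

```python
import math
import random

def runif():
    """Uniform value in [0, 1). The labs build their own uniform
    generator; Python's built-in one suffices for this sketch."""
    return random.random()

def rexp(lambduh):
    """Exponential random variable via the inverse transform method:
    if U ~ Uniform(0,1), then -ln(1 - U) / lambda ~ Exponential(lambda)."""
    return -math.log(1.0 - runif()) / lambduh

# The sample mean should approach 1/lambda = 0.25 for lambda = 4.
samples = [rexp(4.0) for _ in range(100_000)]
print(sum(samples) / len(samples))
```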

The optimization exercises deal with minimizing functions. Given a function f(x), optimizing it generally means finding its minimum or maximum, which occurs where the derivative goes flat: f'(x) = 0. When we cannot solve f'(x) = 0 symbolically, we fall back on a general technique called gradient descent that searches for minima numerically. It's like putting a marble on a hilly surface and letting gravity bring it to the nearest minimum.
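As a concrete illustration (a minimal one-dimensional sketch, not the murder-rate lab itself), gradient descent repeatedly steps opposite the slope until the position stops moving. The learning rate eta, the finite-difference slope estimate, and the stopping precision are choices made for this example only.

```python
def gradient_descent(f, x0, eta=0.01, precision=1e-8):
    """Minimize a single-variable function f by walking downhill.
    The slope is approximated with a finite difference, so no
    symbolic derivative is required."""
    h = 1e-6  # step for the finite-difference slope estimate
    x = x0
    while True:
        slope = (f(x + h) - f(x - h)) / (2 * h)  # approximate f'(x)
        x_new = x - eta * slope                  # step downhill
        if abs(x_new - x) < precision:           # stop when we barely move
            return x_new
        x = x_new

# Example: f(x) = (x - 2)^2 + 1 has its minimum at x = 2.
print(gradient_descent(lambda x: (x - 2)**2 + 1, x0=0.0))
```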

Finally, we'll do an exercise that introduces text analysis. For each word in a document, we'll compute a score called TFIDF that indicates how well that word distinguishes the document from the other documents in a corpus. That score is used broadly in text analytics, but our exercise uses it to summarize documents by listing their most important words.
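Here is a minimal sketch using one common TFIDF formulation (term frequency times the log of inverse document frequency); the exact formula and tokenization in the Reuters exercise may differ, and the tiny corpus below is made up purely for illustration.

```python
import math
from collections import Counter

def tfidf(doc_words, corpus):
    """Score each word in doc_words by term frequency times inverse
    document frequency across the corpus (a list of word lists)."""
    N = len(corpus)
    tf = Counter(doc_words)
    scores = {}
    for word, count in tf.items():
        df = sum(1 for doc in corpus if word in doc)  # documents containing word
        idf = math.log(N / df)                        # rarer words score higher
        scores[word] = (count / len(doc_words)) * idf
    return scores

corpus = [
    "oil prices rose sharply today".split(),
    "the central bank held rates".split(),
    "oil output fell as prices rose".split(),
]
scores = tfidf(corpus[0], corpus)
print(sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:3])
```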
