---
layout: default
permalink: /
---
- Course: PHYS 398MLA
- Instructor: Prof. Mark Neubauer, msn@illinois.edu
- Lectures: Mondays from 3-4:50 pm in 222 Loomis Laboratory of Physics
- Need help?
- The course chat room sends message digests to people who aren't active in the room, so feel free to ask a question even if no one is around.
- Please do not post messages that give away answers to homework problems, since posts are viewable by all students enrolled in the course.
- Look through existing issues and create new ones
- Office Hours
- Prof. Neubauer: Thursdays from 4-5 pm over Zoom
Welcome to the Data Analysis and Machine Learning Applications (for physicists) course!
In this course, you will learn fundamentals of how to analyze and interpret scientific data and apply modern machine learning tools and techniques to problems common in physics research such as classification and regression. This course offering is very timely given the explosion of interest and rapid development of data science and artificial intelligence. Every day there are new applications of machine learning to the physical sciences in ways that are advancing our knowledge of nature.
This course is designed to be interactive and collaborative, while at the same time developing your own skills and knowledge. I initiated this course in 2018 from the viewpoint that we live in an increasingly data-centric world, with both people and machines learning from vast amounts of data. There has never been a better time for early-career physicists such as yourself to build a solid understanding of the basics of scientific data analysis, data-driven inference, and machine learning, along with a working knowledge of the most important tools and techniques from modern data science.
This is the third offering of the course. I welcome your feedback on any aspect of the course so that I can work to improve the curriculum.
- You need a laptop for this course. It is assumed that you have a laptop running macOS, Linux, or Windows for use both inside and outside of class.
- Some knowledge of Python is preferred but not required. You do, however, need a working knowledge of the basics of computer programming.
- The course lecture workbooks and assessments will be hosted in PrairieLearn using its Workspaces feature, which provides everything you need to interact with the course notebooks. You also have the option to run the Docker container for the course environment locally, which can provide faster response than the PrairieLearn Workspace since the course container is rather large. See the instructions for using the PHYS398MLA Docker container here.
Topics covered include:
- Notebooks and numerical Python
- Handling and Visualizing Data
- Finding structure in data
- Measuring and reducing dimensionality
- Adapting linear methods to nonlinear problems
- Estimating probability density
- Probability theory
- Statistical methods
- Bayesian statistics
- Markov-chain Monte Carlo in practice
- Stochastic processes and Markov-chain theory
- Variational inference
- Optimization
- Computational graphs and probabilistic programming
- Bayesian model selection
- Learning in a probabilistic context
- Supervised learning in Scikit-Learn
- Cross validation
- Neural networks
- Deep learning
Topics will be demonstrated in class through live-code examples and slides in Jupyter notebooks.
The lectures will include physics and data science pedagogy, demonstrated through live examples in Jupyter notebooks that you will work through in class. You are required to attend each lecture with your laptop and a working environment. Attendance will be taken.
Homework is an important part of the course, where you will have the opportunity to apply the techniques you are learning to problems relevant to the analysis of scientific data. All assignments are listed within the Course Outline and distributed through PrairieLearn. You will submit your homework via your private GitHub repository.
Approximately halfway through the course, you will have the opportunity to choose from a set of projects based on open scientific data and apply what you have learned in the course. You will be asked to answer certain questions about the data, supported by your analysis and written up in a Jupyter notebook which you will submit. Your notebook will also include background information about how the data is generated, its scientific relevance and your methodology.
- Class Participation: ~20%
- Homework: ~45%
- Research project: ~35%
- Getting an overview of the course, including the reading list and homework assignments
- Setting up your environment
- Complete setting up your environment so that you can launch and execute notebooks
- None
- Gain familiarity with Jupyter Notebooks and numerical Python
- Learn about handling and describing data
- None
- IPython: Beyond Normal Python
- Python Data Science Handbook
- Introduction to NumPy
- Data Manipulation with Pandas
- Learn about visualizing data
- Learn about the importance of clustering data in physics
- Learn how to find structure in data (clustering)
- KMeans, Spectral Clustering, DBSCAN
- Homework 1: Numerical Python and data handling
- Released via PrairieLearn on Monday, Feb 7
- Due by 3:00 pm CST on Monday, Feb 14
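As an illustrative sketch of the clustering goals above (the data and function here are invented for illustration, not taken from the course notebooks), Lloyd's algorithm for k-means fits in a few lines of NumPy:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal Lloyd's algorithm: alternate assignment and update steps."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center.
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        # Update step: each center moves to the mean of its points.
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Two well-separated blobs; k-means should recover them.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
centers, labels = kmeans(X, k=2)
```

Scikit-learn's `KMeans` adds smarter initialization (k-means++) and multiple restarts, but the alternating assign/update structure is the same.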
- Measure and reduce dimensionality
- Adapt linear models to nonlinear problems
- Homework 2: Visualization, Covariance and Correlation
- Released via PrairieLearn on Monday, Feb 14
- Due by 3:00 pm CST on Monday, Feb 21
- Eigenvalue/Eigenvector refresher
- Principal Component Analysis
- PCA Step-by-Step
- Blind Signal Separation
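The PCA-step-by-step idea in the readings can be sketched directly in NumPy (toy data invented for illustration): center the data, form the covariance matrix, and diagonalize it.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy correlated 2-D data: both coordinates follow a shared latent variable.
z = rng.normal(size=500)
X = np.column_stack([z + 0.1 * rng.normal(size=500),
                     z + 0.1 * rng.normal(size=500)])

# PCA step by step: center, form the covariance matrix, diagonalize.
Xc = X - X.mean(axis=0)
C = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)      # eigh: C is symmetric
order = np.argsort(eigvals)[::-1]         # sort by decreasing variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()       # fraction of variance per component
projected = Xc @ eigvecs[:, :1]           # coordinates along the first PC
```

In practice `sklearn.decomposition.PCA` does this (via an SVD) for you; the eigenvector with the largest eigenvalue is the direction of maximum variance.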
- Learn about Kernel functions
- Learn about Probability Theory
- Homework 3: Expectation-Maximization Algorithm, K-Means, Principal Component Analysis
- Released via PrairieLearn on Monday, Feb 21
- Due by 3:00 pm CST on Monday, Feb 28
- Kernel Method
- Mercer's Theorem
- Similarity Measure
- Nonlinear Dimensionality Reduction by Locally Linear Embedding
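As a small illustration of the kernel method and Mercer's theorem (function and data invented for illustration): a valid kernel acts as a similarity measure whose Gram matrix is symmetric positive semi-definite.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2): a similarity measure."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
K = rbf_kernel(X, X)

# Mercer's theorem in action: the Gram matrix of a valid kernel is
# symmetric and positive semi-definite (all eigenvalues >= 0).
eigvals = np.linalg.eigvalsh(K)
```

This positive semi-definiteness is what lets kernelized methods (kernel PCA, SVMs) work in the implicit feature space without ever computing it.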
- Estimate probability density
- Learn about Statistical Methods
- Homework 4: Probability
- Released via PrairieLearn on Monday, Feb 28
- Due by 3:00 pm CST on Monday, Mar 07
- Positive Definite Matrix definition
- AstroML: Machine Learning and Data Mining for Astronomy
- Freedman-Diaconis Rule for choice of binning
- Kernel Density Estimation
- Algorithms for calculating variance
- Probability Mass Function
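A kernel density estimate is just an average of smooth bumps centered on the data. A minimal NumPy sketch (the bandwidth choice below adapts the Freedman-Diaconis binning idea from the readings and is only illustrative, not a prescription):

```python
import numpy as np

def kde(x_eval, data, h):
    """Kernel density estimate: average of Gaussian bumps of bandwidth h."""
    u = (x_eval[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u ** 2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, size=1000)

# Bandwidth from the interquartile range, in the spirit of the
# Freedman-Diaconis rule for histogram bins (an illustrative choice).
q75, q25 = np.percentile(data, [75, 25])
h = 2 * (q75 - q25) * len(data) ** (-1 / 3)

x = np.linspace(-4, 4, 201)
density = kde(x, data, h)
```

Unlike a histogram, the estimate is smooth and does not depend on where bin edges happen to fall, only on the bandwidth.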
- Learn about Bayesian Statistics
- Markov-chain Monte Carlo put into practice
- Homework 5: Kernel Density Estimation
- Released via PrairieLearn on Monday, Mar 07
- Due by 3:00 pm CDT on Monday, Mar 21
- Beta Distribution
- Gamma Function
- Uninformative Priors
- Conjugate Priors
- Importance Sampling
- C. Maes, An introduction to the theory of Markov processes mostly for physics students
- Foreman-Mackey, Hogg, Lang, Goodman, emcee: The MCMC Hammer
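Before reaching for a production sampler like emcee, it helps to see a random-walk Metropolis step written out. This is a minimal sketch with an invented target (a standard normal), not the course's own code:

```python
import numpy as np

def metropolis(log_prob, x0, n_steps, step_size, seed=0):
    """Random-walk Metropolis: propose x' = x + noise, accept with prob min(1, p(x')/p(x))."""
    rng = np.random.default_rng(seed)
    chain = np.empty(n_steps)
    x, lp = x0, log_prob(x0)
    for i in range(n_steps):
        x_new = x + step_size * rng.normal()
        lp_new = log_prob(x_new)
        if np.log(rng.uniform()) < lp_new - lp:  # accept/reject in log space
            x, lp = x_new, lp_new
        chain[i] = x
    return chain

# Invented target: a standard normal, via its log density up to a constant.
chain = metropolis(lambda x: -0.5 * x ** 2, x0=5.0, n_steps=20000, step_size=1.0)
burned = chain[2000:]  # discard burn-in while the chain forgets x0 = 5
```

emcee's ensemble sampler replaces the hand-tuned `step_size` with moves informed by a whole ensemble of walkers, but the accept/reject logic is the same Metropolis-Hastings idea.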
- Learn about Stochastic processes in the realm of Data Science
- Learn about Markov-chain Theory
- Learn about the Variational Inference Method
- Homework 6: Bayesian Statistics and Markov Chain Monte Carlo
- Released via PrairieLearn on Monday, Mar 21
- Due by 3:00 pm CDT on Monday, Mar 28
- C. Maes, An introduction to the theory of Markov processes mostly for physics students
- Example of dependence without correlation
- Conditional Independence
- Inverse Problem
- Brownian Motion
- Hamiltonian Mechanics
- Canonical Distribution
- Hamiltonian MC
- Autocorrelation
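Autocorrelation is what makes MCMC samples "cost" more than independent draws. A small sketch (with an invented AR(1) chain, whose autocorrelation at lag k is known to be rho^k) that estimates the autocorrelation function:

```python
import numpy as np

def autocorr(x, max_lag):
    """Normalized autocorrelation of a chain at lags 0..max_lag."""
    x = x - x.mean()
    var = x.var()
    return np.array([np.mean(x[:len(x) - k] * x[k:]) / var
                     for k in range(max_lag + 1)])

# AR(1) chain x_t = rho * x_{t-1} + noise; its autocorrelation decays as rho^k.
rng = np.random.default_rng(0)
rho, n = 0.9, 50000
x = np.empty(n)
x[0] = 0.0
for t in range(1, n):
    x[t] = rho * x[t - 1] + rng.normal()

acf = autocorr(x, max_lag=20)
```

The integrated autocorrelation time built from this curve tells you how many chain steps are worth one effectively independent sample.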
- Learn about Optimization and Stochastic Gradient Descent
- Learn about Frameworks for Computational Graphs
- Learn about Probabilistic Programming methods
- Homework 7: Markov Chains
- Released via PrairieLearn on Monday, Mar 28
- Due by 3:00 pm CDT on Monday, Apr 18
- Convex Functions
- Jensen's Inequality
- Finite Difference Equations
- Automatic Differentiation
- Rosenbrock function
- Nelder-Mead method
- Conjugate Gradient Method
- Newton-CG method
- Powell's method
- BFGS method
- Stochastic Gradient Descent
- Adam Optimizer
- Softmax Function
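Two of the ingredients above, stochastic gradient descent and the softmax function, can be sketched in a few lines of NumPy (the problem and parameters are invented for illustration):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax: subtract the max before exponentiating."""
    e = np.exp(z - z.max())
    return e / e.sum()

# Stochastic gradient descent on least squares: one randomly chosen
# sample per update instead of the full-batch gradient.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -3.0])
X = rng.normal(size=(500, 2))
y = X @ true_w + 0.1 * rng.normal(size=500)

w = np.zeros(2)
lr = 0.05  # learning rate
for epoch in range(20):
    for i in rng.permutation(len(X)):
        grad = 2 * (X[i] @ w - y[i]) * X[i]  # gradient of (x.w - y)^2 at one sample
        w -= lr * grad
```

Each update is noisy, but cheap; optimizers like Adam keep this per-sample (or per-minibatch) structure and add adaptive step sizes and momentum.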
- None
- Learn about Cross Validation
- None
- Learning and Inference using Neural Networks
- Homework 8: Cross Validation and Artificial Neural Networks
- Released via PrairieLearn on Monday, Apr 18
- Due by 3:00 pm CDT on Monday, May 13
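The mechanics of k-fold cross validation can be sketched in NumPy (model and data invented for illustration): hold out each fold in turn, fit on the rest, and average the held-out error.

```python
import numpy as np

def kfold_mse(X, y, fit, predict, k=5, seed=0):
    """k-fold cross validation: hold out each fold in turn, average the test MSE."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], y[train])
        errors.append(np.mean((predict(model, X[test]) - y[test]) ** 2))
    return float(np.mean(errors))

def fit(X, y):
    # Ordinary least squares on whatever features X holds.
    return np.linalg.lstsq(X, y, rcond=None)[0]

def predict(w, X):
    return X @ w

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 200)
y = 1.0 + 2.0 * x + 0.1 * rng.normal(size=200)

X_lin = np.column_stack([np.ones_like(x), x])         # degree-1 features
X_big = np.column_stack([x ** p for p in range(15)])  # degree-14 features

mse_lin = kfold_mse(X_lin, y, fit, predict)
mse_big = kfold_mse(X_big, y, fit, predict)
```

Comparing the two cross-validated scores is how one chooses between models: since the underlying relation here is linear, the extra flexibility of the degree-14 fit cannot help on held-out data.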
- Learn about Deep Learning
- None
- Deep learning
- None
- You can find the reference list, including required and recommended reading, at Reading list
- Some quick reference guides:
- Linux Bash Shell
- Github
- Conda
- Python
- Markdown
- Jupyter Notebooks: Interface, Keyboard shortcuts
- Sharing code snippets: gist.github.com
- Asking questions of broader development community: Stack Overflow
I would like to acknowledge David Kirkby at the University of California, Irvine for the materials and setup on which this course is based, and for the helpful discussions we have had. I would like to thank Matthew Feickert and Dewen Zhong for their guidance and contributions to the course. I also acknowledge the course at github.com/advanced-js, from which the syllabus template was adapted.
Material for a University of Illinois course offered by the Physics Department.
Content is maintained on GitHub and distributed under a BSD 3-Clause license.