### Acknowledgements

Numerous people have contributed to the development of the Jupyter notebooks for this undergraduate course. These include (in reverse alphabetical order): N.Wardle, F.Schaap, N.Reed, K.Joseph, C.Jing, A.Hussain, T.S.Evans, P.J.Dunne, D.J.Colling. The notebooks are derived from those used in the Masters (MRes) course, [Machine Learning and Big Data in the Physical Sciences](https://www.imperial.ac.uk/study/courses/postgraduate-taught/machine-learning-physical-sciences/), run here in the Physics department, and we would like to thank those who created these MRes notebooks. Various other lecturers also gave us access to their teaching material and we thank them for that.

The course lecturers take responsibility for the form of the notebooks used in the course. We are always looking to improve the course and these notebooks in particular. Please let the course lecturers know of any problems; they would also welcome any constructive comments or suggestions. Comments can be provided through the discussion board on the Blackboard course site or directly to the course lecturers.


## General Course Information

Full and up-to-date information is on the Blackboard site for this course. Announcements, such as any last-minute changes (e.g. due to illness), will be on Blackboard. This is just an outline of how the course will be run and is correct *only* at the time of writing.

## Bibliography

Some of this material is provided on Blackboard.

* These Jupyter notebooks serve as the slides, the lecture notes and the problem sheets for the lectures.
* The notes on basic statistics from the PHYS40005 Statistics and Measurement course will be sufficient to deal with the statistics encountered here. We will review what we need early in this course but we assume that students are familiar with the material from that course.

There are many Machine Learning/Data Science textbooks out there, many with more theoretical detail than we will use in this course. We highlight three that we recommend for this course, which are practical while having enough theory for you to understand them. In this course we only have time to teach you the basics and to touch on a few techniques. We do not require any textbooks for the course but these books will take you further and will be invaluable as you go forward. The first is free to try and there are several paper copies of the last two in the Library, which also provides them as ebooks.

The books are:

* ["The Hundred-Page Machine Learning Book"](https://themlbook.com/) is short and free to try. Prof Colling suggests this is the best conceptual book on machine learning that he found when writing material for the MRes Data Science course and that it also does a reasonable job of the maths. If you find it useful, support the author and buy a copy (around 40 USD).
* ["Hands on machine learning with Scikit-Learn, Keras and TensorFlow 3rd Edition"](https://www.oreilly.com/library/view/hands-on-machine-learning/9781492032632/) is Prof Colling's top recommendation as the best hands-on book. It has sufficient detail to take you into the understanding of the algorithms. This book also cites a good collection of the original papers and links to them on its website. There is also much information on the associated GitHub. If you only buy one book then this one covers the most. If there was a textbook for this course (there isn't), this would be it (3rd edition, 850 pages, 60 GBP).
* ["Introduction to Machine Learning with Python"](https://www.oreilly.com/library/view/introduction-to-machine/9781449369880/) is a more introductory text than the previous recommendation and so is complementary to that one. It is well written and takes you through things carefully but is very light on the maths (400 pages, 50 GBP).

## Computing

* The course will use Python. We do not assume knowledge beyond that seen in compulsory courses.
* The machines in the Physics computer lab should have everything you need. 
* Assessment will be on a PC from the Physics computer suite only.

> You are welcome to use your own machine for this course but we do not support any machine other than the PCs in the computer lab.

We do not have the resources to help with individual laptops. Our top tip is to try a fresh install of Python and then add the extra libraries needed (see below). Also note that you will be using a standard Physics computer lab PC for any assessments.

**MAKE LIST OF LIBRARIES USED.**

We use one or two libraries that are well known and well maintained but are not always present in a standard Python install. If you are using your own machine you will need to install these. These extra libraries include:

* [`iminuit`](https://iminuit.readthedocs.io/en/stable/install.html) in week 4.
* [`XGBoost`](https://xgboost.readthedocs.io/en/stable/python/python_intro.html#install-xgboost) in week 5?
    
Note that just because you can download a library does not mean it will work on your machine. We have had problems with a slightly older version of `iminuit` on the Physics machines that wasn't working, while we all had it working on our work machines without a problem.


```python
# This helps you check whether these libraries are available on your machine:
# any error raised by an import indicates a problem.
from iminuit import Minuit
print("! iminuit loaded OK")

# The scikit-learn style regressor class in xgboost is called XGBRegressor.
from xgboost import XGBRegressor
print("! xgboost loaded OK")
```

```
! iminuit loaded OK

ModuleNotFoundError: No module named 'xgboost'
```
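Since an older version of a library can fail even when it imports cleanly, it may also help to report which version is installed. The following is a minimal sketch, not part of the course material: the helper name `check_libraries` and the list `REQUIRED` are hypothetical, and it assumes each library's import name matches its installed package name (true for `iminuit` and `xgboost`).

```python
# Report which libraries can be imported and which versions are installed.
import importlib
from importlib import metadata

# Hypothetical list of the extra libraries to check; extend as needed.
REQUIRED = ["iminuit", "xgboost"]


def check_libraries(names):
    """Return a dict mapping each name to its version string,
    "unknown" if it imports but has no package metadata,
    or None if it cannot be imported at all."""
    report = {}
    for name in names:
        try:
            importlib.import_module(name)
        except Exception:
            # Missing library, or an installed one that fails to load.
            report[name] = None
            continue
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "unknown"
    return report


for name, version in check_libraries(REQUIRED).items():
    status = f"version {version}" if version else "NOT available"
    print(f"{name}: {status}")
```

Unlike the cell above, this does not stop at the first failure, so a single run lists every library that still needs attention.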

## Timing

The course is organised in ten sections. For each section there is one Python notebook which contains both the lecture material and the exercises for the week. Students are intended to work through one section, and so one notebook, per calendar week. Each week is broken down as follows:

* The DSML course week starts on Wednesdays at 9.00 with a one hour presentation from a lecturer in the computing lab. This will cover the new material from the start of that week's Jupyter notebook.
* Students should then work on the exercises in the rest of the notebook over the following four working days.
* The computer suite is also booked for the course immediately after the lecture from 10.00-11.50 on Wednesdays. This is an opportunity for students to work together or singly on the Jupyter notebook for that week. There are no graduate demonstrators for this session.
* There is a presentation with both a lecturer and graduate demonstrators on Mondays at 16.00 in a lecture theatre. This is a chance for students to raise any issues. 

## Questions

* On a Monday afternoon, so towards the end of the DSML course week, there is a one hour session in a lecture theatre where students can ask graduate demonstrators and the lecturer questions. In addition there will be a presentation focussing on the Jupyter notebook that students have been working on for the past week. See timetable for details. 
* Office hours. The lecturer will hold two office hours, see Blackboard for details.  
* Discussion board. The best way to raise questions at other times is to post a question on the discussion board, which will be on Blackboard. As many students have similar questions, you may well find the answer to your question already there. Students are always welcome to add their own ideas to any discussion.


## Feedback

* Third-year students are well placed to support themselves and to give feedback to each other. The Physics computer lab is booked for two hours on Wednesday morning after the lecture (see timetable) to facilitate collaborative work on the course material.
* The Monday session (see timetable) gives students a chance to ask questions of graduate demonstrators and the lecturer.
* The lecturers will hold office hours (see Blackboard).
* There is a discussion board, linked from Blackboard, where students can ask questions at any time.
* In week 8 there is an **optional** formative assessment. At the time of writing this is on the afternoon of Wednesday 28th February but check your timetable for current timing. A formative assessment is for **feedback** only and it does not contribute to the course mark. Feedback from lecturers or graduate demonstrators will be given on any notebooks handed in. This assessment will mimic the style of the final exam but will be shorter. The Python notebook used will be made available afterwards for those who cannot attend the session.

## Assessment

This course is assessed by a single practical exam. This exam will normally take place in the computer lab of the Physics department over a full day (morning and afternoon) during the summer exam period (at the time of writing this is planned for 29th April but check elsewhere for the definitive date). The exam will consist of a Jupyter notebook with exercises to complete. The exercises will be in the style of what is set in the course Jupyter notebooks. The focus is on students explaining what they are doing, why they are doing it and on interpreting their results.  

>*Note* the optional formative assessment in week 8 does *not* contribute to the course mark. Also, the exercises in the weekly course notebooks are not assessed in any way.
