# Lecture 1 — Introduction

[Open this notebook in Google Colab](https://colab.research.google.com/github/daanmeerburg/Statistics_meerburg_2026/blob/main/Lectures/Lecture_01_PDM.ipynb)



**Lecturer:** P. D. Meerburg 

*Heavily based on materials by Davide Gerosa (astroML, 2022) and collaborators. See [here](https://github.com/dgerosa/astrostatistics_bicocca_2023/blob/main/README.md) for credits.*


---

># Introduction

<img src="https://web-assets.domo.com/miyagi/images/product/product-feature-22-data-never-sleeps-10.png?lb-height=100%25&lb-mode=overlay&lb-width=100%25&utm_medium=website&utm_source=domo.com&utm_term=PF" width="600"/>

The amount of raw information generated every minute is extraordinary, and physics and astronomy are no exception! Whether for **"big data"** or **"small data"**, a proper statistical treatment that accounts for statistical and systematic noise, as well as signal dependencies on the measured output, is essential to discovery. 

---

What kinds of things can we learn from data, and how do we do it? 

*What* we can learn is really dependent on your goal, but this must align with the information content of the data. *How* we can interact with data is what **Data Mining** and **Machine Learning** are all about.

* **Data mining** is exactly what it sounds like: sifting through piles of data in order to find something useful---like digging rock from the ground and extracting metal ores from it. It is sometimes called "knowledge discovery", since the emphasis is on techniques and attempts to find patterns in structured data.
* **Machine learning** is about using computers to extract useful information from data by statistically comparing the data to various models. The techniques are sometimes called "statistical inference" and encompass regression and model selection. 

---

Who does data mining and uses machine learning? Just about everyone, for just about everything. Some examples from the real world (but there are so many!):

- Amazon to predict things that you might buy or ads that you might like, https://phys.org/news/2019-06-amazon-tracking.html
- Google for everything I guess, but here is a link about self-driving cars: http://dataconomy.com/how-data-science-is-driving-the-driverless-car/
- Netflix to predict what shows you are likely to want to watch: https://en.wikipedia.org/wiki/Netflix_Prize, https://www.wired.co.uk/article/how-do-netflixs-algorithms-work-machine-learning-helps-to-predict-what-viewers-will-like
- Insurance companies to predict how much of a risk it is to insure you
- Financial institutions to predict the future prices of their investments
- Election prognosticators, e.g., http://fivethirtyeight.com/
- Sports teams e.g., https://en.wikipedia.org/wiki/Moneyball


And, of course, **physicists and astronomers to study the world around us!**


---

### What is this course?

* An introduction to statistical inference, with practical applications. The connection to ML is something for self-study or another course, such as the Deep Learning course in the QM program taught by Jelle Aalbers.
* *Practical*: This is important! One does not understand how to treat scientific data by only reading equations on the blackboard. Your grade will be determined partly by a group assignment that involves a data analysis project. 
* While examples are skewed towards cosmology and particle physics, the techniques we will look at are general.

### Why this class?

We're first of all physicists, not data scientists or statisticians. Stats is a tool, but a very important one!  **Having some knowledge of statistical inference and data analysis is *absolutely essential* in today's modern (astro)physics research.** There's really no way around it, I think.

This figure from *drewconway.com* nicely illustrates the goal here. 
- You've had many of the green classes already (calculus, advanced maths methods)
- You're into an entire degree of blue classes (physics in this case)
- And perhaps like to play with the red things as well (a hacking project you want to share?)

This class is an attempt to put everything together and go as close to the middle as possible.

![http://static1.squarespace.com/static/5150aec6e4b0e340ec52710a/t/51525c33e4b0b3e0d10f77ab/1364352052403/Data_Science_VD.png?format=750w](http://static1.squarespace.com/static/5150aec6e4b0e340ec52710a/t/51525c33e4b0b3e0d10f77ab/1364352052403/Data_Science_VD.png?format=750w)

---

### My research interests

I am certainly not a statistics expert. However, within VSI, together with Jelle, Ann-Katrin and Kristof Bruiyn, we are involved in scientific collaborations that deal with large data. I am talking about collider experiments collecting huge volumes of data, direct dark matter detectors such as XENON, and cosmological observables such as the cosmic microwave background. I myself am involved in the [Simons Observatory](https://simonsobservatory.org/), [CCAT observatory](https://www.ccatobservatory.org/) and [REACH](https://www.reachtelescope.org/). To infer cosmology we typically rely on so-called Bayesian inference (sneak peek in Lecture 2), which in a nutshell assigns probability to all degrees of freedom, e.g. the model, the data and the parameters. There is some reason for this, which we can (will) discuss. Traditionally, statistical inference in particle physics is based on frequentist analysis, although of late Bayesian inference is also applied more often. We will explain the difference between these during the course. 
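As a rough preview of that distinction, here is a minimal sketch for a toy counting problem: `k` successes in `n` trials. It is not taken from the course material, the numbers are made up, and the flat prior is an assumption chosen for simplicity. The frequentist point estimate is the maximum-likelihood value `k/n`, whereas the Bayesian approach treats the success probability `p` itself as uncertain and computes a posterior distribution for it, evaluated here on a grid.

```python
# Toy example: estimate a success probability p from k successes in n trials.
# The numbers below are made up for illustration.
k, n = 7, 10

# Frequentist point estimate: the value of p that maximizes the likelihood.
p_mle = k / n

# Bayesian: posterior(p) is proportional to likelihood(p) * prior(p).
# Assumption: a flat prior on [0, 1], so the posterior is proportional
# to the likelihood p^k (1 - p)^(n - k).
n_grid = 2001
grid = [i / (n_grid - 1) for i in range(n_grid)]
unnormalized = [p**k * (1 - p) ** (n - k) for p in grid]

# Normalize so that the posterior weights sum to 1 over the grid.
total = sum(unnormalized)
posterior = [u / total for u in unnormalized]

# Two summaries of the posterior: its mean and the location of its peak.
posterior_mean = sum(p * w for p, w in zip(grid, posterior))
p_peak = grid[posterior.index(max(posterior))]

print(f"maximum-likelihood estimate: {p_mle:.3f}")
print(f"posterior mean (flat prior): {posterior_mean:.3f}")
print(f"posterior peak (flat prior): {p_peak:.3f}")
```

With a flat prior the posterior peaks at the maximum-likelihood value, while the posterior mean, which equals `(k + 1) / (n + 2)` in this case, is pulled slightly towards 1/2.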

![](https://www.aanda.org/articles/aa/full_html/2020/09/aa33887-18/aa33887-18-fig2.jpg)

Figure from: Planck 2018 results X. Constraints on inflation, Planck Collaboration, incl. P. D. Meerburg, A&A Volume 641, September 2020. The figure shows the final data release of the Planck collaboration and the implications for cosmological parameters, with an emphasis on inflation. 

Besides the Simons Observatory, which began taking science data in 2024, there are many other exciting experiments already operating or coming online soon in both cosmology and particle physics.

Relevant to cosmology is the [Legacy Survey of Space and Time (LSST)](https://www.lsst.org/) at the **Vera C. Rubin Observatory**. LSST will generate roughly **200 PB** of imaging data over its 10-year mission. Every three nights it will re-image the entire southern sky, measuring well over 100 properties for some **40 billion** objects.  Here are some early example [images](https://rubinobservatory.org/news/first-imagery-rubin) after only a few days of data.

Another flagship project is [Euclid](https://www.esa.int/Science_Exploration/Space_Science/Euclid), a space-based mission launched by ESA and currently collecting data. Euclid is mapping billions of galaxies in optical and near-infrared bands to study dark energy, dark matter, and the geometry of the Universe.  Euclid first-light [images](https://www.esa.int/Science_Exploration/Space_Science/Euclid/Euclid_s_first_images_the_dazzling_edge_of_darkness).

Looking further ahead, the [Square Kilometre Array (SKA)](https://www.skatelescope.org/) will be the premier radio observatory of the late 2020s and 2030s. It will consist of thousands of dishes and over a million antennas spread across South Africa and Australia. At peak, SKA will generate **~1 exabyte of raw data per day**, exceeding the total current daily global internet traffic.

In **particle physics**, the data volumes are even more extreme. At the [Large Hadron Collider (LHC)](https://home.cern/science/accelerators/large-hadron-collider), the detectors observe up to **40 million collisions per second**. Because storing all collisions is impossible, experiments rely on **trigger systems**, which are layers of fast decision logic that determine which events are recorded for further analysis (roughly 1 in 100,000). 
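To get a feeling for what such a trigger reduction means, here is a back-of-the-envelope sketch using only the two numbers quoted above. The one-hour duration is an arbitrary choice for illustration, and the result is an order-of-magnitude estimate rather than an official figure for any experiment.

```python
# Numbers quoted in the text above.
collisions_per_second = 40_000_000  # up to 40 million collisions per second
keep_one_in = 100_000               # roughly 1 in 100,000 events is recorded

# Rate of events that survive the trigger and are written to storage.
recorded_per_second = collisions_per_second / keep_one_in

# Number of events recorded during one hour of data taking (illustrative duration).
seconds_per_hour = 3600
recorded_per_hour = recorded_per_second * seconds_per_hour

print(f"recorded events per second: {recorded_per_second:.0f}")
print(f"recorded events per hour:   {recorded_per_hour:.2e}")
```

Even after discarding all but one event in 100,000, a few hundred events per second remain, which adds up to more than a million events per hour.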

One of the four major LHC experiments is [LHCb](https://lhcb-public.web.cern.ch/), which specializes in the physics of **beauty (b) and charm quarks**, precision tests of the Standard Model, and measurements of **CP violation**. LHCb also records enormous data volumes and recently upgraded to a triggerless readout at 30 TB/s. LHCb event images: https://lhcb-public.web.cern.ch/en/EventDisplay/.

Another important particle-physics experiment is the [XENON](https://xenonexperiment.org/) series of detectors (XENON1T, XENONnT), which are among the world’s most sensitive experiments searching for **dark matter** via the scattering of hypothetical particles like WIMPs off xenon nuclei. These detectors observe tiny flashes of light and ionization inside massive liquid-xenon time projection chambers. Such experiments also collect PB of data. 

What all these experiments have in common is that they collect large volumes of data, which we have to analyse in order to extract the information we are interested in. In many cases the data is not only large but also complex: the instrument, the measurement and other factors obscure the signal you are seeking. Hence, clever methods need to be developed to clean the data. 


---

### Your research interests

Enough about me, how about *your* interests?

- What is your experience with statistics so far? 
- Why did you pick this class? What do you want from it? How do you think statistics (and data analysis) can be useful in your career?


---

># Content

The course is divided into 3 sections:

1. **Probability**
2. **Frequentist inference**
3. **Bayesian inference**

This is the first time we are teaching this course and we need to see how far we can get. We might want to explore a bit of machine learning towards the end. 



### Everything happens on github: [github.com/daanmeerburg/Statistics_meerburg_2026](https://github.com/daanmeerburg/Statistics_meerburg_2026)

Here you can pull all the resources you need. Since this course is heavily based on lectures by Davide Gerosa and Stephen Taylor, please feel free to look up those courses on the web as well. Those courses, however, also include machine learning and are aimed at MSc students. In addition, they are heavily skewed towards astronomy. In the notebooks, I have tried to replace examples that are overly astro with similar examples from particle physics and cosmology. Since the methods are pretty much transferable to any field, it does not matter too much; it is mostly because we are more interested in cosmo and particles :-).  


># Setup

Applying probability theory to real data is an important learning outcome (LO) of this course.  One does not understand how to treat scientific data by only reading equations on the blackboard: you will need to get your hands dirty. Students are required to come to class with a laptop or any other device you can code on (larger than a smartphone, I would say...). Each class will pair theoretical explanations with hands-on exercises and demonstrations. These are the key content of the course, so please engage with them as much as possible.

I will run through the notebook and provide some derivations on the blackboard here and there. At the end of each lecture, there is an assignment which requires some coding based on stuff we have done during the lecture. These assignments can be completed during the tutorials. 

These are practical lab sessions where you'll be asked to immediately apply the techniques you have just seen. So, open your Jupyter notebook and have fun! The TAs will be able to help, and otherwise you can ask me. That being said, I am not a Python expert (Fortran is my native language), so I might not be able to answer all your questions. You are allowed to use help from LLMs, as long as you make sure you understand the outcome: for LLM-generated code I want you to explicitly comment all lines of code and explain what the code is doing (you can ask the LLM to do that for you, but the result is usually incomplete). 

Because this is a physics course, we will also have to do some calculations with pen and paper. We will have a short entry exam at each tutorial. These will be graded and can count towards your final grade (max 10%), but only if they improve it. 

In the final 2 weeks of the course you will be asked to work on a group assignment and write a report. This assignment will count for 40% of your grade. Finally, there will be a 2hr written exam. This will count for 50% of your final grade. The exact content of the written exam has to be determined. 


---

># Logistics


### Class times

- Nominal class times are **Tuesday 9:00-11:00** and **Wednesday 13:00-15:00**. The tutorial is on **Thursday 13:00-15:00**. 
- The roster can be found [here](https://rooster.rug.nl/#/en/current/schedule/course-WBPH080-05/timeRange=all&concept=true).



--- 

># Textbook and resources


### Main textbook:

["Statistics, Data Mining, and Machine Learning in Astronomy"](https://press.princeton.edu/books/hardcover/9780691198309/statistics-data-mining-and-machine-learning-in-astronomy), Ivezić, Connolly, VanderPlas, and Gray. Princeton University Press, 2012.

This is a very nice book, since it covers the important basics of statistics that I think are applicable in both cosmology and particle physics and therefore provides a solid foundation. The connection with practical examples and code is really useful. You can download the chapters for free from the RUG library. What I really like about the book is that the authors provide the code behind every single figure: [astroml.org/book\_figures](https://www.astroml.org/book_figures/). The best way to approach these topics is to study the introduction in the book, then grab the code and try to play with it.  Make sure you get the updated edition of the book (that's the one with a black cover, not orange) because all the examples have been updated to Python 3. The figures use some formatting that was not working in the version of Python I was using, so I have tried to fix this to the best of my ability in the notebooks. Please be aware of this if you decide to download the code snippets yourself. You can always take a look at the modified notebook to see what needs to be commented out for the figures to display.   

![](https://pup-assets.imgix.net/onix/images/9780691198309.jpg?fit=fill&fill=solid&fill-color=ffffff&w=1200&h=630)

### Other useful resources  

- ["Statistical Data Analysis"](https://global.oup.com/academic/product/statistical-data-analysis-9780198501558?cc=fr&lang=en&), Cowan. Oxford Science Publications, 1997. 
- ["Data Analysis: A Bayesian Tutorial"](https://global.oup.com/academic/product/data-analysis-9780198568322?cc=fr&lang=en&), Sivia and Skilling. Oxford Science Publications, 2006.
- ["Bayesian Data Analysis",](http://www.stat.columbia.edu/~gelman/book/) Gelman, Carlin, Stern, Dunson, Vehtari, and Rubin. Chapman & Hall, 2013. Free!
- ["Python Data Science Handbook",](https://jakevdp.github.io/PythonDataScienceHandbook/) VanderPlas. O'Reilly Media, 2016. Free!
- ["Practical Statistics for Astronomers"](https://www.cambridge.org/core/books/practical-statistics-for-astronomers/CEB9D5F985F062BAD67E7219B96A4CD6), Wall and Jenkins. Cambridge University Press, 2003.
- ["Bayesian Logical Data Analysis for the Physical Sciences",](https://www.cambridge.org/core/books/bayesian-logical-data-analysis-for-the-physical-sciences/09E9A95DAE275F5B005676C71B542598) Gregory. Cambridge University Press, 2005.
- ["Modern Statistical Methods For Astronomy" Feigelson and Babu.](https://www.cambridge.org/core/books/modern-statistical-methods-for-astronomy/941AE392A553D68DD7B02491BB66DDEC) Cambridge University Press, 2012.
- ["Information theory, inference, and learning algorithms"](https://www.inference.org.uk/mackay/itila/book.html) MacKay. Cambridge University Press, 2003. Free!  
- "Data analysis recipes". These are free chapters of a book by Hogg et al. that is not yet finished.
    - ["Choosing the binning for a histogram"](https://arxiv.org/abs/0807.4820) [arXiv:0807.4820]
    - ["Fitting a model to data"](https://arxiv.org/abs/1008.4686) [arXiv:1008.4686]
    - ["Probability calculus for inference"](https://arxiv.org/abs/1205.4446) [arXiv:1205.4446]
    - ["Using Markov Chain Monte Carlo"](https://arxiv.org/abs/1710.06068) [arXiv:1710.06068]
    - ["Products of multivariate Gaussians in Bayesian inferences"](https://arxiv.org/abs/2005.14199) [arXiv:2005.14199]
- ["Practical Guidance for Bayesian Inference in Astronomy"](https://arxiv.org/abs/2302.04703), Eadie et al., 2023.
- ["Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow"](https://www.oreilly.com/library/view/hands-on-machine-learning/9781492032632/), Geron, O'Reilly Media, 2019.

### Still need to embrace the python world?

We will use the Python programming language. If you need to refresh your **python skills**, here are some catch-up resources and online tutorials. A strong Python programming background is essential in modern (astro)physics.

- ["Lectures on scientific computing with Python"](https://github.com/jrjohansson/scientific-python-lectures), R. Johansson et al.  
- ["Python Programming for Scientists"](https://astrofrog.github.io/py4sci/), T. Robitaille et al.
- ["Learning Scientific Programming with Python"](https://www.cambridge.org/core/books/learning-scientific-programming-with-python/3D264483BC7B380A3059B3861C661237), Hill, Cambridge University Press, 2020. Supporting code: [scipython.com](https://scipython.com/).

---

># Credits and feedback

### I need your help!

This is the first year that we offer a statistics class in the physics program. 

**This also means it's the first time I teach it.** So please be kind, things are not going to be perfect... But I'll get better eventually!

Very important: **please do give me feedback** (what works, what doesn't work, what topics have been covered in other classes already, if I assume too much about your computing skills, if you want me to go faster, if the exercises are too hard or too easy, etc.).

There will be the regular feedback form at the end of the course, but please feel free to come and give me your opinions at any time! This is particularly useful so I can adjust the class as we proceed.



### Get in touch!

I'm very happy to chat about the class (and more: cosmology, science, career prospects, etc). My office is number 5614.0402 in the Feringa Building. Feel free to stop by and knock at my door (I might say I'm busy and ask you to come later...). Or send me an email for an appointment: [p.d.meerburg@rug.nl](mailto:p.d.meerburg@rug.nl).

### Recording

The classes will be recorded but not live streamed. Recordings will be available on brightspace, not on github for privacy reasons.

**I think that attending lectures in person is crucial.** As you will see very soon, you will be asked to immediately apply what you have learned, while you learn it. If you're not here, you'll miss out on the vast majority of the learning experience. Binge-watching the recorded lectures before the exam is not the same thing as attending a class (this is always true in my opinion, but especially in this case). 


### A huge thanks to...

This class draws heavily from many others that came before me. Credit goes to [will have to update]:

- Davide Gerosa (Milan): [github.com/dgerosa/astrostatistics_bicocca_2023](https://github.com/dgerosa/astrostatistics_bicocca_2023)
- Stephen Taylor (Vanderbilt University): [github.com/VanderbiltAstronomy/astr_8070_s21](https://github.com/VanderbiltAstronomy/astr_8070_s21).
- Gordon Richards (Drexel University): [github.com/gtrichards/PHYS_440_540](https://github.com/gtrichards/PHYS_440_540).
- Jake Vanderplas (University of Washington): [github.com/jakevdp/ESAC-stats-2014](https://github.com/jakevdp/ESAC-stats-2014).
- Zeljko Ivezic (University of Washington): [github.com/uw-astr-302-w18/astr-302-w18](https://github.com/uw-astr-302-w18/astr-302-w18).
- Andy Connolly (University of Washington): [cadence.lsst.org/introAstroML/](http://cadence.lsst.org/introAstroML).
- Karen Leighly (University of Oklahoma): [seminar.ouml.org/](http://seminar.ouml.org).
- Adam Miller (Northwestern University): [github.com/LSSTC-DSFP/LSSTC-DSFP-Sessions/](https://github.com/LSSTC-DSFP/LSSTC-DSFP-Sessions).
- Jo Bovy (University of Toronto): [astro.utoronto.ca/~bovy/teaching.html](http://astro.utoronto.ca/~bovy/teaching.html).
- Thomas Wiecki (PyMC Labs): [twiecki.github.io/blog/2015/11/10/mcmc-sampling](http://twiecki.github.io/blog/2015/11/10/mcmc-sampling).
- Aurélien Géron (freelancer): [github.com/ageron/handson-ml2](https://github.com/ageron/handson-ml2).



---

># IT setup

Again, everything happens on github: [github.com/daanmeerburg/Statistics_meerburg_2026](https://github.com/daanmeerburg/Statistics_meerburg_2026)


# Run python

All of this class is developed in Python (i.e., code and lecture notes coincide). There are at least two ways to run the code I prepared.

## 1. Google Colab 

For this course we can use Google Colab, a free environment that runs in your browser (and can also be used from Visual Studio). It has many preinstalled libraries, and since the RUG already uses Google accounts you do not need a new account. It also works very nicely with git. Clicking the link at the top of each notebook automatically opens that notebook in Google Colab. Note that every time

- you open a notebook from GitHub, or
- you start a new Colab runtime, or
- you reconnect after inactivity, or
- your session resets,

the virtual machine is fresh, with only Colab's default packages installed.

❗ Any packages installed with `pip install` in a previous session are lost.

This is a Colab design choice, not something we can change. We have therefore automated this process by adding an executable cell at the top of the notebook that installs the packages that are not available by default. 
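As an illustration of the idea, a setup cell of this kind could look roughly like the sketch below. This is only a guess at the general pattern, not the actual cell used in the course notebooks; the helper names `missing_packages` and `install` and the package list are hypothetical. Note also that the name you import is not always the name you install (for example, `sklearn` is installed as `scikit-learn`).

```python
import importlib.util
import subprocess
import sys


def missing_packages(import_names):
    """Return the names in `import_names` that cannot be imported in this session."""
    return [name for name in import_names if importlib.util.find_spec(name) is None]


def install(packages):
    """Install packages with pip into the Python that runs this notebook."""
    if packages:
        subprocess.check_call([sys.executable, "-m", "pip", "install", *packages])


# Hypothetical list: replace with the packages a given notebook actually needs.
needed = ["numpy", "scipy"]
to_install = missing_packages(needed)
print("missing:", to_install)
# install(to_install)  # uncomment to actually install; requires network access
```

Checking first and installing only what is missing keeps the cell fast when everything is already available, which matters because it runs at the start of every fresh session.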

## 2. Your own python distribution

At some point in your research you'll need to run python code on your laptop. I guess most of you have done it already for earlier classes (if so, how many?). It might take a bit of effort to set it up, but sooner is better than later. 

If you have trouble installing python on your laptop, I'm happy to help as I can. I don't have personal experience with Windows but I was told that getting the [Anaconda installer](https://www.anaconda.com/products/individual#windows) is now the easiest way. 

I also use Visual Studio, since it has a very seamless integration with HPC computing. It also allows you to keep everything in one place, with lots of useful extensions (such as GitHub Copilot). 

## Installing python packages

You probably know this already, but installing things in python is as easy as typing `pip install something` (if pip install doesn't work, something is wrong with your python installation).

All the packages you need for this class are listed at [requirements.txt](https://github.com/daanmeerburg/Statistics_meerburg_2026/blob/main/requirements.txt) on Github. 
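With a local copy of that file, `pip install -r requirements.txt` installs everything in one go. To check what is already present on your machine, something like the following sketch can help. It assumes a simple file with one requirement per line; the example content is hypothetical and does not reproduce the actual course file, and the helper names are made up for illustration.

```python
from importlib import metadata


def parse_requirements(text):
    """Extract distribution names from the lines of a simple requirements file."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blank lines and pip options
            continue
        # Cut at the first version, extras, or marker character, e.g. "numpy>=1.24".
        for sep in "=<>!~[; ":
            line = line.split(sep, 1)[0]
        names.append(line)
    return names


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


# Hypothetical file content; the real requirements.txt lives in the course repo.
example = """
# core scientific stack
numpy>=1.24
scipy
matplotlib==3.8.0  # plotting
"""
for name in parse_requirements(example):
    print(name, "->", installed_version(name) or "not installed")
```

This parser is deliberately simplistic and ignores features such as URLs or nested requirement files; `pip` itself remains the reference for how such files are interpreted.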


---

># Version control with git

- How many of you have heard about `git` before?
- How many of you have some experience with it?

**Disclaimer**. You can probably get through this class without too much `git`, but I very highly recommend learning it. It's something that will make your life so much easier in research! You won't regret it.

In brief, git is a strategy to handle your files, as simple as that. Most crucially, it scales extremely well with number of people and complexity of the workflow. 

- Imagine having hundreds or maybe thousands of developers working on the same piece of code or on the same paper. Good luck sharing files with things like Dropbox, Google Drive, or Overleaf!
- But even if it's just for yourself... Has it ever happened that your code **used to** work a while ago, then you changed something, and now you desperately want to go back? 

![](https://www.atlassian.com/dam/jcr:9f149cef-f784-43de-8207-3e7968789a1f/03.svg)

### The basics

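- `git` is a distributed version control system: it records the history of your files and synchronizes that history between machines (typically over `https` or `ssh`). It was designed explicitly for code development.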
- On top of `git`, people have built web frontends to make our life easier (much like a browser for `http`). The most popular of these frontends is [github.com](https://github.com/) (which is owned by Microsoft now).

The core element is a "repository", which is basically just a directory --or better: a directory git can talk to. A single repository can have several instances:
- A remote server hosts a copy. In this case, we'll put it on github.com (it's free!).
- Developer 1 (say me) has a copy.
- Developer 2 (say you) has another copy.
- etc.

The repository of each developer talks only to the remote server:

![](https://www.cs.swarthmore.edu/git/git-repos.svg)

The process starts by creating a remote repository and **cloning** it locally (that's something you do only once). Someone changes the code locally and **pushes** it to the remote server. Someone else **pulls** the modification and goes on from there. The system is very smart, and has very precise rules on how to handle cases where both people edit the same file. Making a modification implies **adding** and **committing** a file.

These five commands (`clone`, `pull`, `push`, `add`, `commit`) already let you do a ton of powerful stuff.  
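To make the roles of these five commands concrete, here is a toy model written in plain Python. It is emphatically not how git works internally (real git stores snapshots identified by hashes and can merge diverging histories); the class `ToyRepo` and its methods are invented for illustration, and a "repository" here is nothing more than an ordered list of commit messages.

```python
class ToyRepo:
    """A toy stand-in for a repository: just an ordered list of commit messages."""

    def __init__(self, commits=None):
        self.commits = list(commits or [])
        self.staged = []  # changes marked with `add` but not yet committed

    def clone(self):
        """Make an independent local copy (something you do only once)."""
        return ToyRepo(self.commits)

    def add(self, change):
        """Stage a change for the next commit."""
        self.staged.append(change)

    def commit(self, message):
        """Record all staged changes as one new commit."""
        if not self.staged:
            raise RuntimeError("nothing to commit")
        self.commits.append(message)
        self.staged.clear()

    def push(self, remote):
        """Send local commits to the remote; refused if the remote has moved on."""
        if remote.commits != self.commits[: len(remote.commits)]:
            raise RuntimeError("push rejected: pull first")
        remote.commits = list(self.commits)

    def pull(self, remote):
        """Bring in new commits from the remote (toy version: no merging)."""
        if self.commits != remote.commits[: len(self.commits)]:
            raise RuntimeError("histories diverged: real git would merge here")
        self.commits = list(remote.commits)


remote = ToyRepo(["initial commit"])
me, you = remote.clone(), remote.clone()

me.add("fix typo")
me.commit("Fix typo in lecture 1")
me.push(remote)

you.pull(remote)
print(you.commits)
```

Note that the two local copies never talk to each other directly: every change travels through the remote, just as in the diagram above.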



>## Hands on Git!

Good practice in my experience, especially when it concerns computing and coding, is to just do it. So let us try and play with git. 

### Part 1. Make sure you have it.

1. To get `git` for any platform see: [https://git-scm.com/download/](https://git-scm.com/download/). If you don't have git installed on your machine right now, I encourage you to sort it out later in your own time. For now you can work on any machine where `git` is already installed and functioning. Open a terminal window and run

    ```bash
    which git
    ```

    You should get a path like `/usr/bin/git` that indicates git is indeed present on your machine. 

2. Now create an account on [github.com](https://github.com/). Do what they say, and obviously select the free version. 

3. We now need to set up cryptographic keys (we'll use [RSA keys](https://en.wikipedia.org/wiki/RSA_(cryptosystem)), a very fascinating concept of safe encryption which has to do with prime number theory).

    On your terminal type
    ```bash
    ssh-keygen
    ```
    and hit return three times. You should see the paths of the keys. Now copy the content of your public key (not the private one!)

    ```bash
    cat [path]/id_rsa.pub
    ```

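    Go to GitHub: top-right corner, Settings, SSH and GPG keys, New SSH key. Paste the content of `id_rsa.pub` into that box (recent versions of `ssh-keygen` create an Ed25519 key by default, in which case the file is called `id_ed25519.pub`). Be careful not to add unwanted new-line characters when copying and pasting.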


### Part 2. Let's go!

Let's practice some git now!

1. On [github.com](https://github.com/), create a repository called `ilovegit`.

2. clone your repo using:
    ```bash
    cd ~/reps # Or wherever you want it to be
    git clone git@github.com:YOUR_GITHUB_USERNAME/ilovegit.git
    ```

3. start Jupyter in the cloned directory
    ```bash
    cd ilovegit
    jupyter notebook &
    ```
4. create a new notebook. Name it `hello.ipynb`. Add a cell with the following piece of code:
    ```python
    print("Hello World!")
    ```
5. see what happened:
    ```bash
    git status
    ```
6. add the notebook to your git repository and commit by running (in the terminal window) the following:
    ```bash
    git add hello.ipynb
    git commit -m "Added hello.ipynb to repository."
    ```
7. see what happened:
    ```bash
    git status
    ```
8. make another change in the Jupyter notebook. For example, add another cell ("+" icon on the toolbar) with the following:
    ```python
    x = 2+2
    print(x)
    ```
9. see what happened
    ```bash
    git status
    ```
10. commit changed files (the `-a` option stages every modified file that git already tracks, so it is equivalent to doing `add` and then `commit` for those files)
    ```bash
    git commit -am "Updated hello.ipynb with complex mathematics."
    ```
11. "push" the changes to github
    ```bash
    git push
    ```
12. go browse the result on github

13. edit the readme from the browser on github (this is to mimic what happens when someone touches the code)

14. "pull" the changes from github
    ```bash
    git pull
    ```
15. have a look at your local copy of README.md
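The local part of the steps above (`status`, `add`, `commit`) can also be scripted. The sketch below drives the real `git` command from Python in a throwaway temporary directory, so it does not touch any of your repositories. It makes a few assumptions: it uses a plain text file instead of a notebook, it skips `push` and `pull` because there is no remote, and the identity passed via `-c` is a placeholder. The helper names `git` and `demo` are made up for this example.

```python
import pathlib
import shutil
import subprocess
import tempfile


def git(args, cwd):
    """Run one git command in `cwd` and return what it printed."""
    # The -c options set a throwaway identity so the demo does not depend on
    # (or modify) your personal git configuration.
    cmd = ["git", "-c", "user.name=Student", "-c", "user.email=student@example.com",
           "-c", "commit.gpgsign=false", *args]
    result = subprocess.run(cmd, cwd=cwd, check=True, capture_output=True, text=True)
    return result.stdout


def demo():
    """Repeat the status/add/commit steps in a temporary local repository."""
    if shutil.which("git") is None:
        return None  # git is not installed on this machine
    with tempfile.TemporaryDirectory() as tmp:
        git(["init"], tmp)
        pathlib.Path(tmp, "hello.txt").write_text("Hello World!\n")
        before = git(["status", "--porcelain"], tmp)  # lists the untracked file
        git(["add", "hello.txt"], tmp)
        git(["commit", "-m", "Added hello.txt to repository."], tmp)
        after = git(["status", "--porcelain"], tmp)   # empty: nothing left to commit
        return before, after


print(demo())
```

The `--porcelain` flag asks `git status` for a compact, machine-readable listing, which is easier to inspect from a script than the usual human-oriented output.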


### Part 3. Interact with the class material

This class was developed and is being maintained using `git`. Go to the class git repository at https://github.com/daanmeerburg/Statistics_meerburg_2026. **Don't clone this!** Instead, look to the top right of the page for an option to fork the repository. This will make a copy of the class repository for your own personal use.

**If you plan on using your own computer, you can use the following steps to access and work with the materials through Git.** 

```bash
git clone git@github.com:YOUR_GITHUB_USERNAME/Statistics_meerburg_2026.git
```


Before proceeding further, we're now going to add the `daanmeerburg` repository as an [`upstream` repository to your fork](https://docs.github.com/en/free-pro-team@latest/github/collaborating-with-issues-and-pull-requests/configuring-a-remote-for-a-fork). First, list the current configured remote repository for your fork with:

```bash
git remote -v
```

Now, add the `daanmeerburg` repo as an `upstream`:

```bash
git remote add upstream https://github.com/daanmeerburg/Statistics_meerburg_2026
```

Verify that the new repository shows as an upstream by running `git remote -v` again.
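For reference, `git remote -v` prints one line per remote and direction: the remote name, a tab, the URL, and then `(fetch)` or `(push)`. The sketch below parses such output and checks that an `upstream` entry is present. The sample text is what one would expect to see after the steps above, with `YOUR_GITHUB_USERNAME` as a placeholder; the helper `parse_remotes` is invented for this example.

```python
def parse_remotes(output):
    """Turn the text printed by `git remote -v` into {name: {"fetch": url, "push": url}}."""
    remotes = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # ignore blank or unexpected lines
        name, url, kind = parts
        remotes.setdefault(name, {})[kind.strip("()")] = url
    return remotes


# Example output for a fork with an upstream; YOUR_GITHUB_USERNAME is a placeholder.
sample = """\
origin\tgit@github.com:YOUR_GITHUB_USERNAME/Statistics_meerburg_2026.git (fetch)
origin\tgit@github.com:YOUR_GITHUB_USERNAME/Statistics_meerburg_2026.git (push)
upstream\thttps://github.com/daanmeerburg/Statistics_meerburg_2026 (fetch)
upstream\thttps://github.com/daanmeerburg/Statistics_meerburg_2026 (push)
"""
remotes = parse_remotes(sample)
print(sorted(remotes))
print("upstream" in remotes)
```

If `upstream` is missing from your own output, the `git remote add upstream ...` step did not take effect and is worth repeating.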


You now have the ability to work with your own fork, sync upstream changes to this fork, and commit changes to your fork. (We won't do it here, but GitHub also lets you ask for your changes to be incorporated upstream; this feature is called a `pull request`.)

In order to [sync new lectures from upstream to your fork](https://docs.github.com/en/free-pro-team@latest/github/collaborating-with-issues-and-pull-requests/syncing-a-fork), run the following in the local directory of your cloned fork:

```bash
git fetch upstream
git checkout main
git merge upstream/main
```

You should do this often in order to see new materials that I add. 

For the group assignments, using **Git will be mandatory**. It will also allow me to check how much each of you has contributed to the assignment (how much you have pushed to a shared project folder).

In my class repo you will find 
- `Lectures` contains the material shown during classes.
- `Assignments` contains my exploration with the proposed datasets. Please don't write anything in any of these two directories. I will upload my solutions to github by the end of the week (assignments of that week) 
- `Working` is an empty directory for you. Put your solutions there.

----

These commands are going to be enough to get you through this class. If you want to dig deeper into git, this is an excellent beginning-to-end crash course of about 1 hour. 

### If you really hate all of this...

Learning `git` is core content of this class (and, needless to say, this will be taken into account at the exam). This is because I believe `git` is a cornerstone of modern software development, and I really think learning it will boost your science careers. It is also deeply embedded in other aspects of our working environment, such as Overleaf. Version control is what makes git so important, especially when working on large collaborative projects (hence I ask you to use it for the group assignment).

The shortcut is to go to https://github.com/daanmeerburg/Statistics_meerburg_2026 then "Code" and "Download ZIP". This will download a copy of the class material without any git interaction. However, you'll need to do it manually every time I update the material (and sort out differences between your changes and my changes... good luck!).