In [None]:
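from IPython.display import HTML, display

# Toggle button and custom styling.
# Note: scripts in HTML output may not run if the notebook is untrusted.
display(HTML('''
<script>
    // Attach to window so the inline onclick handler can find it, and so
    // re-running the cell does not fail on a duplicate const declaration.
    window.toggleCode = () => {
        // One combined selector returns each input area exactly once;
        // overlapping separate selectors would toggle an element twice.
        document.querySelectorAll('.jp-InputArea, .input').forEach(el => {
            el.style.display = el.style.display === 'none' ? '' : 'none';
        });
    };
</script>
<button onclick="toggleCode()" class="btn btn-default">Toggle Code Cells</button>
<style>
    .container { width: 95% !important; }
</style>
'''))

# Load custom CSS if present; skip silently otherwise
try:
    with open('custom.css', 'r', encoding='utf-8') as f:
        display(HTML(f'<style>{f.read()}</style>'))
except FileNotFoundError:
    pass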

<div class="container-fluid">
    <div class="row">
        <div class="col-md-8" align="center">
            <h1>PHYS 10792: Introduction to Data Science</h1>
            <!--<h3>2019-2020 Academic Year</h3><br>-->
        </div>
        <div class="col-md-3">
            <img align='center' style="border-width:0" src="images/UoM_logo.png"/>
        </div>
    </div>
</div>

<div class="container-fluid">
    <div class="row">
        <div class="col-md-2" align="right">
            <b>Course instructors:&nbsp;&nbsp;</b>
        </div>
        <div class="col-md-9" align="left">
                    <a href="https://inspirehep.net/authors/1262635">Dr. Ivan Polyokov</a><br>
                    <a href="https://research.manchester.ac.uk/en/persons/patrick.parkinson">Dr. Patrick Parkinson</a>
                </div>
    </div>
</div>

## Problem Sheet 9

### Problem 1: Probabilities

#### Problem 1.1
In which of the following situations might we prefer a Bayesian approach, and in which a frequentist approach, to model the statistics?

- Medical diagnosis incorporating prior patient history
- Quality control in manufacturing
- Predicting election outcomes based on prior polls
- Clinical trials with fixed sample sizes
- Real-time fraud detection in financial transactions
- Estimating population parameters from survey data
- Weather forecasting using historical data
- Comparing means of two independent samples
- Personalized marketing based on customer behavior
- Analyzing the effectiveness of a new drug


#### Problem 1.2

An experiment to count muons reaching the Earth's surface from cosmic rays is conducted by 120 students. The expected average count rate is 1 per cm$^2$ per minute. The students start their experiment at 15:20 on a Friday and end the count at 10:00 on the following Monday. Their detectors have a surface area of 0.5 cm by 5.0 cm.
- What average count and sample standard deviation do you expect?
- How many of the students would you expect to record a count at least 200 above the average?

### Solution to Problem 1

#### Solution 1.1

| Example | Preferred Approach | Explanation |
|---------|--------------------|-------------|
| Medical diagnosis incorporating prior patient history | Bayesian | Bayesian methods allow for the integration of prior knowledge (e.g., patient history) with current data to update probabilities. |
| Quality control in manufacturing | Frequentist | Frequentist methods are often used for hypothesis testing and confidence intervals in quality control processes. |
| Predicting election outcomes based on prior polls | Bayesian | Bayesian approaches can incorporate prior poll results and continuously update predictions as new data comes in. |
| Clinical trials with fixed sample sizes | Frequentist | Frequentist methods are commonly used in clinical trials to test hypotheses and determine the efficacy of treatments. |
| Real-time fraud detection in financial transactions | Bayesian | Bayesian methods can update probabilities in real-time as new transaction data is received, improving detection accuracy. |
| Estimating population parameters from survey data | Frequentist | Frequentist approaches are typically used to estimate population parameters and construct confidence intervals from survey data. |
| Weather forecasting using historical data | Bayesian | Bayesian methods can incorporate historical weather data and update forecasts as new information becomes available. |
| Comparing means of two independent samples | Frequentist | Frequentist methods are often used for comparing means and performing t-tests on independent samples. |
| Personalized marketing based on customer behavior | Bayesian | Bayesian approaches can use prior customer behavior data to update and refine marketing strategies. |
| Analyzing the effectiveness of a new drug | Frequentist | Frequentist methods are standard in drug efficacy studies to test hypotheses and determine statistical significance. |
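The Bayesian entries in the table all rely on updating a prior probability with new data using Bayes' theorem. As a minimal sketch of the medical-diagnosis case, the snippet below uses purely hypothetical numbers (1% prevalence, 95% sensitivity, 5% false-positive rate); the helper name `posterior` is illustrative only, and the second update assumes the two tests are independent given the patient's true condition.

```python
# Hypothetical numbers, chosen only for illustration
prior = 0.01            # P(disease) before any test: the prior
sensitivity = 0.95      # P(positive | disease)
false_positive = 0.05   # P(positive | no disease)


def posterior(prior, sensitivity, false_positive):
    """P(disease | positive test) from Bayes' theorem."""
    evidence = sensitivity * prior + false_positive * (1 - prior)
    return sensitivity * prior / evidence


p1 = posterior(prior, sensitivity, false_positive)
# Second positive test (assumed independent): the old posterior becomes the new prior
p2 = posterior(p1, sensitivity, false_positive)
print(f"after one positive test:  {p1:.3f}")
print(f"after two positive tests: {p2:.3f}")
```

The point of the example is the update step: the same data (a positive test) moves the probability a lot or a little depending on the prior, which is exactly the information a frequentist test would not use.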


#### Solution 1.2

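- The counting period from Friday 15:20 to Monday 10:00 is 66 h 40 min, i.e. 4000 minutes, and the detector area is $0.5 \times 5.0 = 2.5$ cm$^2$. The expected count is therefore $N = 1 \times 4000 \times 2.5 = 10{,}000$. Since the count follows a Poisson distribution, the standard deviation is $\sigma=\sqrt{N}=100$.
- A count of 200 above the mean corresponds to $2\sigma$. At $N=10{,}000$ the Poisson distribution is very close to a Gaussian, so about $5\%$ (more precisely $4.6\%$) of results lie more than two standard deviations from the mean, split equally between the two tails. About $2.3\%$ of the students should therefore have a count of 200 or more above the average: $120 \times 0.023 \approx 2.7$, i.e. about 3 students.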
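The arithmetic above can be checked with the Python standard library. This is a sketch under stated assumptions: the calendar dates 5 and 8 January 2024 are an arbitrary Friday/Monday pair chosen only to measure the time span, and the tail probability uses the Gaussian approximation without a continuity correction.

```python
import math
from datetime import datetime

# Arbitrary example dates: any Friday and the following Monday give the same span
start = datetime(2024, 1, 5, 15, 20)   # Friday 15:20
end = datetime(2024, 1, 8, 10, 0)      # following Monday 10:00
minutes = (end - start).total_seconds() / 60

rate = 1.0                  # expected muons per cm^2 per minute
area = 0.5 * 5.0            # detector area in cm^2
expected = rate * area * minutes
sigma = math.sqrt(expected)  # Poisson: sigma = sqrt(N)

# One-sided Gaussian tail probability beyond mean + 200 counts
z = 200 / sigma
tail = 0.5 * math.erfc(z / math.sqrt(2))
n_students = 120 * tail

print(minutes, expected, sigma)
print(f"P(count >= mean + 200) ~ {tail:.4f}, i.e. about {n_students:.1f} students")
```

Using `math.erfc` for the tail avoids the cancellation that `1 - cdf` would suffer far out in the tail, although at $2\sigma$ either form is fine.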

### Problem 2: Confidence belts

#### Problem 2.1
You want to produce a $90\%$ upper limit confidence belt for a Poisson distribution. Calculate the lower limits of the confidence intervals, $k_-$, for the following true means:
- $\lambda=2.0$
- $\lambda=2.3$
- $\lambda=2.4$
- $\lambda=7.5$

#### Problem 2.2

The plot below shows the $80\%$ central interval belt for a Poisson distribution. Derive the $80\%$ intervals for the true mean for measurements of
- $k=0$
- $k=4$
- $k=10$

What is the largest observed count that can be interpreted using the plot as shown?

<img src="images/Poisson_belt_80.png" width=80%>

### Solution to Problem 2

#### Solution to 2.1
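We need to find the largest $k$ for which the cumulative probability $\sum_{i=0}^{k} e^{-\lambda}\lambda^i/i! < 1-C = 0.1$; the confidence interval then starts at the subsequent count, $k_-=k+1$. If even the $i=0$ term is at least $0.1$, no such $k$ exists and $k_-=0$. The values listed below are the individual probabilities $e^{-\lambda}\lambda^i/i!$, starting with $i=0$, added up until the running sum reaches $0.1$: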
- $\lambda=2.0$: 0.135: $k_-=0$
- $\lambda=2.3$: 0.1003: $k_-=0$
- $\lambda=2.4$: 0.091, 0.218: $k_-=1$
- $\lambda=7.5$: 0.0006, 0.0041, 0.0156, 0.0389, 0.0729: $k_-=4$
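The procedure above can be sketched in a few lines of Python. The function names `poisson_pmf` and `lower_limit` are illustrative, not from any library; `lower_limit` returns the smallest $k$ whose cumulative probability reaches $1-C$, which is the same as the largest $k$ with cumulative probability below $1-C$, plus one.

```python
import math


def poisson_pmf(k, lam):
    """Probability of observing k counts for a Poisson mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def lower_limit(lam, cl=0.90):
    """Lower end k_- of the upper-limit acceptance interval [k_-, infinity)."""
    alpha = 1 - cl
    cumulative = 0.0
    k = 0
    while True:
        cumulative += poisson_pmf(k, lam)
        # The first k at which the running sum reaches alpha is k_-
        if cumulative >= alpha:
            return k
        k += 1


for lam in (2.0, 2.3, 2.4, 7.5):
    print(lam, lower_limit(lam))
```

Note that $\lambda=2.3$ is a borderline case: $e^{-2.3}\approx 0.1003$ only just exceeds $0.1$, so $k_-$ stays at 0.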

#### Solution to 2.2
The intervals for the true value of $\lambda$ are read off at the outward-facing corners of the confidence belt where it is intersected by a vertical line through the measured count. They can only be read off approximately by eye, so don't worry if your values are off by one or two tenths.
- $k=0$: $[0.0,2.3]$
- $k=4$: $[1.8,8.0]$
- $k=10$: $[6.3,15.4]$

The largest count that can be interpreted is $k=13$, as the upper end of the interval for $k=14$ is not visible on the plot.
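The readings from the plot can be cross-checked numerically. Assuming the belt was built with $10\%$ probability in each tail (which is what an $80\%$ central interval means), the interval edges for an observed $k$ solve $P(K \ge k;\lambda_-)=0.1$ and $P(K \le k;\lambda_+)=0.1$. The sketch below finds them by bisection using only the standard library; the helper names and the search range `lam_max` are illustrative assumptions.

```python
import math


def poisson_cdf(k, lam):
    """P(K <= k) for a Poisson distribution with mean lam."""
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total


def solve_bisection(f, lo, hi, tol=1e-10):
    """Root of a monotonic f on [lo, hi]; assumes f changes sign there."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def central_interval(k, cl=0.80, lam_max=100.0):
    alpha = (1 - cl) / 2  # probability allowed in each tail
    # Lower edge: P(K >= k; lam) = alpha; for k = 0 it is simply 0
    if k == 0:
        lam_lo = 0.0
    else:
        lam_lo = solve_bisection(
            lambda lam: (1 - poisson_cdf(k - 1, lam)) - alpha, 0.0, lam_max)
    # Upper edge: P(K <= k; lam) = alpha
    lam_hi = solve_bisection(
        lambda lam: poisson_cdf(k, lam) - alpha, 0.0, lam_max)
    return lam_lo, lam_hi


for k in (0, 4, 10):
    lo, hi = central_interval(k)
    print(f"k={k}: [{lo:.2f}, {hi:.2f}]")
```

The computed edges should agree with the by-eye readings above to within the stated one or two tenths; for $k=0$ the upper edge is exactly $\ln 10 \approx 2.30$.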


<div class="well" align="center">
    <div class="container-fluid">
        <div class="row">
            <div class="col-md-3" align="center">
                <img align="center" alt="Creative Commons License" style="border-width:0" src="https://licensebuttons.net/l/by-nc-sa/4.0/88x31.png" width="60%">
            </div>
            <div class="col-md-8">
            This work is licensed under a <a href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.
            </div>
        </div>
    </div>
    <br>
    <br>
    <i>Note: The content of this Jupyter Notebook is provided for educational purposes only.</i>
</div>