### **Question 1 (10 points)**

In a digital card game, players are awarded a random card from a deck after completing each level. The deck consists of $n$ different types of cards. Each card type is equally likely to be awarded after a level is completed. After a player collects $k$ different types of cards, the game introduces one additional type of card to the deck, making it $n+1$ different types in total. The probability of getting any particular type of card remains equal.

**a.** Determine how many levels a player must complete on average before they can expect to have at least one of each type of card in the expanded deck. (7 points)  
**b.** Explain how the introduction of the new card type after collecting $k$ cards affects the expected number of levels needed to complete the set. (3 points)  

### **Solution - Question 1**

**a.** 

First, we'll consider the expected number of levels to collect $k$ different types of cards, and then the additional levels required to collect the remaining $n+1-k$ types once the new card type has been introduced.

**1. Collecting the First $k$ Cards:** 
- When the game starts, there are $n$ different types of cards. The expected number of levels to collect the first card is 1, since the first card awarded is guaranteed to be a new type.
- For the second card, $n-1$ of the $n$ types are still new, so the probability of getting a new type in each level is $(n-1)/n$ and the expected number of levels to get a second new type is $n/(n-1)$. In general, once $i$ types are held, the next new type takes $n/(n-i)$ levels on average. This pattern continues until $k$ different types have been collected.

The expected number of levels to collect $k$ different types of cards is given by the sum:
$$ E(k) = 1 + \frac{n}{n-1} + \frac{n}{n-2} + \cdots + \frac{n}{n-k+1} $$

**2. Collecting the Last $n-k+1$ Cards:** 
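- After $k$ cards are collected, a new type is added, making it $n+1$ types in total, of which $n+1-k$ are still missing. The same reasoning applies with the larger deck: once $j$ types are held, a level yields a new type with probability $(n+1-j)/(n+1)$, so the next new type takes $(n+1)/(n+1-j)$ levels on average, for $j = k, k+1, \ldots, n$.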

- The expected number of levels to collect the remaining $n-k+1$ cards is:
$$ E(n+1-k) = \frac{n+1}{n+1-k} + \frac{n+1}{n-k} + \cdots + \frac{n+1}{1} $$

- The total expected number of levels to collect all $n+1$ cards is the sum of these two parts:
$$ E_{total} = E(k) + E(n+1-k) $$
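
The two sums above can be evaluated exactly for concrete values. The following is a minimal sketch, assuming $1 \leq k \leq n$; the function name `expected_levels` is illustrative and not part of the original solution.

```python
from fractions import Fraction

def expected_levels(n, k):
    """Expected levels to complete the expanded deck of n + 1 types.

    Assumes 1 <= k <= n and that the new type joins the deck right after
    the k-th distinct type is collected.
    """
    # Phase 1: n types in the deck, moving from i to i + 1 distinct types.
    first = sum(Fraction(n, n - i) for i in range(k))
    # Phase 2: n + 1 types in the deck, moving from j to j + 1 distinct types.
    second = sum(Fraction(n + 1, n + 1 - j) for j in range(k, n + 1))
    return first + second

print(expected_levels(5, 2))  # 59/4, i.e. 14.75 levels
```

Using `Fraction` keeps the result exact, which makes it easy to compare against a hand calculation for small $n$ and $k$.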

**b.**
The introduction of a new card type after collecting $k$ cards increases the total number of levels needed to complete the set. There is one more type to collect, and the final missing type now takes $n+1$ levels on average instead of $n$.

**- Before the New Card:** With $j$ types held, each level yields a new type with probability $(n-j)/n$, so new types become progressively harder to find as the collection grows.

**- After the New Card:** With $j$ types held, the probability of a new type becomes $(n+1-j)/(n+1)$, which is slightly higher than $(n-j)/n$ for $j \geq 1$, so each individual step is slightly faster. However, one extra type must now be collected, and this extra step outweighs the savings, leading to an increased number of levels on average to complete the collection.
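
To quantify the effect, one can compare the expected number of levels still needed once $k$ types are held, with and without the new type. Writing $H_r = 1 + \frac{1}{2} + \cdots + \frac{1}{r}$ with $H_0 = 0$ (notation introduced here, not in the original solution), the remaining cost is $n H_{n-k}$ without the new type and $(n+1) H_{n+1-k}$ with it, so the increase is:

$$ (n+1) H_{n+1-k} - n H_{n-k} = (n+1)\left(H_{n-k} + \frac{1}{n+1-k}\right) - n H_{n-k} = H_{n-k} + \frac{n+1}{n+1-k} $$

This difference is positive for every $1 \leq k \leq n$, and it grows as $k$ approaches $n$, since the new type then arrives when almost nothing else remains to be collected.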

### **Justification and Proof of Correctness - Question 1**

**1. Collecting the First 'k' Cards**

- *Probability Approach:* Initially, there are 'n' different types of cards, and each card type is equally likely to be awarded after a level is completed. The probability of getting a new card type in the first level is 1 (as any card will be a new type). Once you have one card, the probability of getting a new card type in the next level is (n−1)/n, since there are n−1 card types that you don't have out of 'n' total types.

- *Expected Number of Levels:* The expected number of levels to get a new card type is the reciprocal of the probability of getting a new type in each level. So, after having $i$ different types of cards, the expected number of levels to get the $(i+1)$th different card is $n/(n-i)$. This leads to the sum:
$$ E(k) = 1 + \frac{n}{n-1} + \frac{n}{n-2} + \cdots + \frac{n}{n-k+1} $$

**2. Collecting the Last 'n-k+1' Cards**

- *Adapting to the New Situation:* After collecting 'k' different types of cards, the problem scenario changes with the introduction of an additional card type, increasing the total types to 'n+1'. The same principle applies, but now starting with 'k' cards already collected and 'n+1' total types.

- *Expected Number of Levels with the New Card Type:* The expected number of levels to collect the remaining 'n-k+1' cards, considering the increased card types, is calculated similarly to the first part but starting from 'n+1':
$$ E(n+1-k) = \frac{n+1}{n+1-k} + \frac{n+1}{n-k} + \cdots + \frac{n+1}{1} $$

**Total Expected Number of Levels**

Summation of Both Parts: The total expected number of levels, $E_{total}$, is the sum of the expected levels to collect the first 'k' cards and the expected levels to collect the remaining 'n-k+1' cards after the introduction of the new card type:
$$ E_{total} = E(k) + E(n+1-k) $$

The solution correctly accounts for the change in probability as cards are collected and the subsequent alteration in the total number of card types. The Coupon Collector's Problem is a widely recognized and studied problem in probability theory, and this solution adapts its principles to the specific conditions of the given problem.
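
As an informal check on the formula, the process can also be simulated. This is a sketch under the same assumptions as the solution (uniform draws, new type added immediately after the $k$-th distinct type); the names `simulate_levels` and `formula`, the seed, and the values $n = 5$, $k = 2$ are illustrative choices.

```python
import random

def simulate_levels(n, k, rng):
    """Play one game; return the levels needed to hold all n + 1 types."""
    deck_size = n
    owned = set()
    levels = 0
    while len(owned) < n + 1:
        levels += 1
        owned.add(rng.randrange(deck_size))  # every type in the deck is equally likely
        if deck_size == n and len(owned) == k:
            deck_size = n + 1  # the new type (index n) joins the deck
    return levels

def formula(n, k):
    """E_total = E(k) + E(n+1-k) from the solution, in floating point."""
    first = sum(n / (n - i) for i in range(k))
    second = sum((n + 1) / (n + 1 - j) for j in range(k, n + 1))
    return first + second

rng = random.Random(0)
trials = 20000
average = sum(simulate_levels(5, 2, rng) for _ in range(trials)) / trials
print(average, formula(5, 2))  # the two values should be close
```

A simulation does not prove the formula, but a large gap between the two numbers would point to a modelling error.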



### **Question 2 (12 points)**

Consider a simplified online lottery system where there are $n$ participants, and each participant holds a unique ticket number, which is a positive natural number. The system randomly selects a participant, one at a time, to announce their ticket number. The system maintains a variable $t*$ that records the lowest ticket number announced so far. (Initially, $t*$ is set to infinity.) 

After half of the participants have announced their ticket numbers, the system introduces an additional participant with a new ticket number, again unique and a positive natural number, into the lottery. This participant is then included in the remaining random selection process.

**a.** Calculate the expected number of times $t*$ is updated before the additional participant is introduced. Assume there are initially $n$ participants. (5 points)  
**b.** Determine the expected number of times $t*$ is updated after the additional participant is introduced, taking into account the total number of participants is now $n+1$. (5 points)  
**c.** Analyze how the introduction of the additional participant after half of the original participants have announced their numbers affects the expected number of updates to $t*$. (2 points)  


### **Solution - Question 2**

**Part a:**

Each time a participant announces their number, $t*$ is updated only if the announced number is lower than all previously announced numbers.

1. **First Participant:** $t*$ is guaranteed to be updated because it's initially set to infinity. So, the first participant always results in an update.

2. **Subsequent Participants:** For the second participant, there's a 1/2 chance that their number is lower than the first (since both numbers are equally likely to be the lowest). For the third participant, there's a 1/3 chance, and so forth.

3. **Expected Updates:** The expected number of updates is the sum of the probabilities of each participant being the one with the lowest number so far. This is calculated as:

$$ E_{before} = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n/2} $$

**Part b**

Consider the phase after the additional participant is introduced, making the total 'n+1'.

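1. **Additional Participant:** The new participant joins the pool of participants who have not yet announced, so $n/2 + 1$ announcements remain, occupying positions $n/2 + 1$ through $n + 1$ in the overall sequence. The new participant is not necessarily the next to announce.

2. **Remaining Participants:** Assuming the ticket numbers are in uniformly random relative order, the announcement in position $i$ is the lowest so far with probability $1/i$, regardless of whether it comes from the new participant or from one of the remaining $n/2$ original participants.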

3. **Expected Updates:** The expected number of updates for the second half, including the new participant, is calculated as:

$$E_{after} = \frac{1}{n/2 + 1} + \frac{1}{n/2 + 2} + \cdots + \frac{1}{n+1}$$
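
Both sums are partial harmonic sums and can be computed exactly. A small sketch, assuming $n$ is even; the helper names `harmonic_range` and `expected_updates` are illustrative.

```python
from fractions import Fraction

def harmonic_range(lo, hi):
    """Return 1/lo + 1/(lo+1) + ... + 1/hi as an exact fraction."""
    return sum(Fraction(1, i) for i in range(lo, hi + 1))

def expected_updates(n):
    """Return (E_before, E_after) for an even number n of original participants."""
    half = n // 2
    e_before = harmonic_range(1, half)         # positions 1 .. n/2
    e_after = harmonic_range(half + 1, n + 1)  # positions n/2 + 1 .. n + 1
    return e_before, e_after

print(expected_updates(4))  # (Fraction(3, 2), Fraction(47, 60))
```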

**Part c**

The introduction of the additional participant after half of the original participants have announced their numbers affects the expected number of updates in the following ways:

1. **Increased Probability Space:** The new participant adds one more announcement to the sequence, in a position between $n/2 + 1$ and $n + 1$. The probability of an update at position $i$ is still $1/i$, so the extra announcement contributes only a small additional chance of an update.

2. **Comparative Analysis:** The expected number of updates per announcement is lower in the second phase than in the first, because a later announcement must beat more previously announced numbers.

3. **Overall Impact:** The overall expected number of updates, $E_{total}$, which is the sum of $E_{before}$ and $E_{after}$, slightly increases due to the introduction of the additional participant. This is because there's an additional chance for $t*$ to be updated, albeit a small one: the increase is $\frac{1}{n+1}$.
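
The size of the increase can be read off by combining the two sums. Writing $H_r = 1 + \frac{1}{2} + \cdots + \frac{1}{r}$ (notation introduced here for brevity, not in the original solution):

$$ E_{total} = E_{before} + E_{after} = H_{n/2} + \left(H_{n+1} - H_{n/2}\right) = H_{n+1} = H_n + \frac{1}{n+1} $$

Without the additional participant the expected number of updates would be $H_n$, so under the random-order assumption the extra participant adds exactly $\frac{1}{n+1}$ to the expectation.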

### **Justification and Proof of Correctness - Question 2**

**Part a**

1. **Probability of Update:** When each participant announces their number, $t*$ is updated only if this number is lower than all previously announced numbers. For the first participant, this probability is 1 (100%) since $t*$ is set to infinity initially. For the second participant, the probability of having the lowest number is 1/2, as there are two numbers, and each is equally likely to be the lowest. This pattern continues such that the $i$th participant has a 1/$i$ chance of having the lowest number.

2. **Summation of Probabilities:** The expected number of updates is the sum of these individual probabilities. Up to the half point (n/2 participants), the expected number is:
   $$ E_{before} = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n/2} $$

This is a direct application of the principle that the expected value can be calculated as the sum of individual probabilities in discrete scenarios.

**Part b**

1. **Revised Probability Space:** With the introduction of an additional participant after half the original participants have announced their numbers, the total participant count becomes $n+1$. This changes the probability space for subsequent updates.

2. **Probability for Remaining Participants:** The announcement in overall position $i$ updates $t*$ with probability $1/i$. The remaining $n/2 + 1$ announcements (the new participant and the remaining original participants, in random order) occupy positions $n/2 + 1$ through $n + 1$, leading to the summation:
   $$ E_{after} = \frac{1}{n/2 + 1} + \frac{1}{n/2 + 2} + \cdots + \frac{1}{n+1} $$
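
The two expectations can be checked informally by simulation. The sketch below assumes that ticket numbers are distinct and in uniformly random relative order, which is the assumption behind the $1/i$ probabilities; the function name, seed, ticket range, and $n = 10$ are illustrative.

```python
import random

def simulate_updates(n, rng):
    """Return (updates before, updates after) the newcomer joins, for even n."""
    tickets = rng.sample(range(1, 10 * (n + 1) + 1), n + 1)  # distinct ticket numbers
    originals, newcomer = tickets[:n], tickets[n]
    rng.shuffle(originals)
    first_half = originals[: n // 2]
    second_half = originals[n // 2 :] + [newcomer]
    rng.shuffle(second_half)  # the newcomer may announce at any remaining position

    lowest = float("inf")  # t* starts at infinity
    counts = [0, 0]
    for phase, order in enumerate((first_half, second_half)):
        for ticket in order:
            if ticket < lowest:
                lowest = ticket
                counts[phase] += 1
    return counts[0], counts[1]

rng = random.Random(1)
trials = 20000
results = [simulate_updates(10, rng) for _ in range(trials)]
avg_before = sum(r[0] for r in results) / trials
avg_after = sum(r[1] for r in results) / trials
print(avg_before, avg_after)  # compare with H_5 and H_11 - H_5
```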

**Part c**

1. **Increased Competition:** The introduction of the additional participant increases the number of ticket numbers, thereby reducing the probability of each subsequent original participant having the lowest number.

2. **Overall Impact:** The total expected number of updates, $E_{total} = E_{before} + E_{after}$, equals $1 + \frac{1}{2} + \cdots + \frac{1}{n+1}$, compared with $1 + \frac{1}{2} + \cdots + \frac{1}{n}$ without the additional participant. The additional participant therefore adds one more opportunity for $t*$ to be updated, but because later announcements are less likely to be the lowest so far, the net increase is only $\frac{1}{n+1}$.

### **Question 3 (15 points)**

Suppose you are working on a machine learning algorithm that involves clustering a very large set of real numbers into groups. To initialize the clustering, you need to select a set of initial centroids. You decide to use a randomized approach to select these centroids.

You sample a subset of the data uniformly at random (with replacement) and use this to estimate the initial centroids. To evaluate the reliability of your centroid estimates, you decide to apply a Chernoff-bound analysis. The data set has a known skewed distribution, specifically a log-normal distribution, which is different from a normal distribution and might affect the sampling and estimation process.

**a.** *Calculate the Confidence in Centroid Estimates:* Choose a confidence level (90%, 95%, or 99%) and use a Chernoff-bound to calculate your confidence in your approximate centroid estimates. (5 points)

**b.** *Impact of Skewed Distribution:* Discuss how the log-normal distribution of the data set might impact your sampling and estimation process compared to a normally distributed data set. (5 points)

**c.** *Sample Size Determination:* Determine an appropriate sample size for your estimation process, taking into account the skewed nature of the data distribution. (5 points)


### **Solution - Question 3**

**Part a: Calculate the Confidence in Centroid Estimates**

For a 95% confidence level using a Chernoff bound:

1. **Assumptions:** 
   Assume the log of our data follows a normal distribution with mean $\mu$ and variance $\sigma^2$, consistent with a log-normal distribution of the data.

2. **Sampling and Transformation:** 
   Sample $m$ values $X_1, X_2, \ldots, X_m$ from the log-normal distribution. Transform these to $Y_i = \ln(X_i)$, which are normally distributed with mean $\mu$ and variance $\sigma^2$.

3. **Mean of Transformed Data:** 
   The sample mean of the transformed data is $ \bar{Y} = \frac{1}{m} \sum_{i=1}^{m} Y_i $. This is normally distributed with mean $\mu$ and variance $\sigma^2/m$.

4. **Confidence Interval Calculation:** 
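   For a 95% confidence level, the probability that $\bar{Y}$ is within 1.96 standard errors ($\sigma/\sqrt{m}$) of the mean $\mu$ is approximately 0.95. Since $\mu$ is unknown, the interval is centered on the observed $\bar{Y}$. Exponentiating gives the 95% confidence interval for the true centroid $C = e^{\mu}$ on the original scale, which is the median of the log-normal distribution:
   $$ e^{\bar{Y} - 1.96\frac{\sigma}{\sqrt{m}}} \leq C \leq e^{\bar{Y} + 1.96\frac{\sigma}{\sqrt{m}}} $$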
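
The interval above uses the normal quantile 1.96. For comparison, a Chernoff-style (moment-generating-function) bound for the mean of $m$ independent Gaussian values gives $P(|\bar{Y} - \mu| \geq t) \leq 2e^{-mt^2/(2\sigma^2)}$, and setting the right-hand side to $\delta$ yields a somewhat wider half-width. The sketch below compares the two, assuming $\sigma$ is known; the function names and the numeric values are illustrative and not taken from the original solution.

```python
import math
from statistics import NormalDist

def chernoff_half_width(sigma, m, delta):
    """Half-width t solving 2 * exp(-m * t**2 / (2 * sigma**2)) = delta."""
    return sigma * math.sqrt(2 * math.log(2 / delta) / m)

def normal_half_width(sigma, m, delta):
    """Half-width from the exact normal quantile (about 1.96 for delta = 0.05)."""
    z = NormalDist().inv_cdf(1 - delta / 2)
    return z * sigma / math.sqrt(m)

sigma, m, delta = 1.0, 100, 0.05  # illustrative values
print(chernoff_half_width(sigma, m, delta))  # about 0.272
print(normal_half_width(sigma, m, delta))    # about 0.196
```

Both half-widths apply to the log scale; exponentiating the endpoints gives an interval for $e^{\mu}$ on the original scale. The Chernoff-style width is more conservative because it is a bound rather than an exact quantile.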

**Part b: Impact of Skewed Distribution**

1. **Skewness of Log-Normal Distribution:** 
   This distribution is skewed, with a longer tail on one side, leading to a higher likelihood of sampling extreme values.

2. **Impact on Centroid Estimation:** 
   The skewness can lead to over-representation of higher values in the centroid calculation if the sample size is not large enough to adequately represent the distribution's tail.

3. **Compared to Normal Distribution:** 
   A normal distribution, being symmetric, makes centroid estimation more robust to sample size, as extreme values have less impact on the mean.

**Part c: Sample Size Determination**

To determine an appropriate sample size for skewed data in a log-normal distribution, we aim for a desired level of precision in our centroid estimate.

1. **Standard Error (SE) Approximation:** 
   The standard error of the mean of the log-transformed data is approximated as:
   $$ SE = \frac{s}{\sqrt{m}} $$
   where $s$ is the sample standard deviation of the log-transformed data and $m$ is the sample size.

2. **Adjusting for Skewness:** 
   Considering the skewness, we use a more conservative range for our confidence interval. For a 99% confidence interval:
   $$ \text{Range} = \pm 2.58 \times SE $$

3. **Solving for Sample Size $m$:**
   Rearranging the formula to solve for $m$, given a desired precision level (the allowed half-width of the interval on the log scale), we get:
   $$ m = \left( \frac{2.58 \times s}{\text{desired precision}} \right)^2 $$
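
The rearranged formula translates directly into a small helper. This is a sketch only: `required_sample_size` is an illustrative name, the result is rounded up to a whole number of samples, and the example values of $s$ and the precision are made up.

```python
import math

def required_sample_size(s, precision, z=2.58):
    """Smallest whole m with z * s / sqrt(m) <= precision.

    s is the standard deviation of the log-transformed data and precision
    is the allowed half-width on the log scale; z = 2.58 corresponds to a
    99% confidence level.
    """
    return math.ceil((z * s / precision) ** 2)

print(required_sample_size(1.0, 0.10))  # 666
print(required_sample_size(1.0, 0.05))  # 2663
```

Halving the allowed half-width roughly quadruples the required sample size, which is the practical consequence of the squared ratio.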





### **Justification and Proof of Correctness - Question 3**

**Part a**

1. **Use of Chernoff Bound:**
   - **Mathematical Rigor:** The Chernoff bound is a well-established probabilistic tool used to bound the tail probabilities of sums of random variables. Its application here is mathematically rigorous and suitable for estimating confidence intervals.
   - **Adaptation to Log-normal Distribution:** By transforming the log-normal distribution to a normal distribution via a logarithmic transformation, the application of the Chernoff bound becomes valid. This transformation is a standard technique in statistics for dealing with log-normal distributions.

2. **Confidence Interval Calculation:**
   - **Standard Deviation Inclusion:** The use of 1.96 standard deviations for the 95% confidence interval is based on the properties of the normal distribution, where approximately 95% of the data lies within 1.96 standard deviations from the mean.
   - **Relevance to Centroid Estimation:** This approach directly addresses the problem's requirement to estimate the confidence in centroid values, ensuring that the solution is relevant and applicable to the given task.

**Part b**

1. **Addressing Skewness:**
   - **Real-World Application:** Skewness in data is a common real-world challenge. Discussing its impact on the estimation process is directly relevant to practical machine learning tasks, such as clustering.
   - **Comparison to Normal Distribution:** By comparing the effects of skewness in a log-normal distribution to a normal distribution, the solution provides a deeper understanding of how data distribution affects statistical estimation, thereby enhancing its correctness and applicability.

2. **Influence on Sampling and Estimation:**
   - **Recognition of Extreme Values:** The acknowledgment of the potential over-representation of higher values in skewed distributions is crucial for understanding the accuracy and limitations of the centroid estimation in such contexts.

The solution's correctness is justified by its adherence to statistical principles, its direct relevance to the problem statement, and its practical applicability in a machine learning context.

**Part c**

The approach for determining sample size in the presence of a log-normal distribution is grounded in standard statistical methods, particularly those applicable to skewed data.

1. **Standard Error for Log-normal Distribution:** 
   The use of standard error (SE) in the formula 
   $$ SE = \frac{s}{\sqrt{m}} $$
   is a well-established statistical concept. Here, $ s $ represents the sample standard deviation, a measure of dispersion in the log-transformed data. The formula for SE accurately captures the decrease in variability of the sample mean as the sample size $ m $ increases.

2. **Conservative Range for Skewness:** 
   Using ±2.58 times the SE, the multiplier for a 99% confidence level under the normal distribution, is a conservative choice relative to the 1.96 used for 95%. The wider interval is more likely to contain the true mean of the log-transformed data; it leaves extra margin for the skewness of the original data rather than modelling that skewness directly.

3. **Sample Size Calculation Formula:** 
   The rearranged formula 
   $$ m = \left( \frac{2.58 \times s}{\text{desired precision}} \right)^2 $$
   for sample size determination is mathematically sound. It directly relates the sample size to the desired precision level, allowing for a tailored approach based on specific clustering requirements. Because the ratio is squared, the required sample size grows quadratically as the tolerance shrinks: halving the allowed error quadruples $m$, which matters for skewed distributions where $s$ can be large.


### **Question 4 (16 points)**

In a large online retail company, you are tasked with developing an algorithm for managing incoming orders and assigning them to different warehouses for processing. Let's assume that on a busy day, the company receives $m$ (e.g., a million) orders, and each needs to be processed by one of your $p$ warehouses.

Your Algorithm: You decide to implement an algorithm that assigns each order to a warehouse based on the proximity of the delivery address to the warehouse locations, but with a catch. Each order has a 20% chance of being randomly assigned to any warehouse, regardless of proximity, to balance the load.

**Questions:**

A. What is the expected number of orders per warehouse? Consider both the proximity-based assignment and the 20% random assignment in your calculation. (4 points)

B. What is the probability that a warehouse receives at least 1.5 times the expected number of orders? (A bound or approximation is acceptable.) (4 points)

C. Given the random assignment aspect of the algorithm, what is the probability that a warehouse ends up with no orders? (Again, a bound or approximation is acceptable.) (4 points)

D. How does the introduction of a 20% random assignment, compared to a purely proximity-based assignment, affect the variance in the number of orders per warehouse? Provide a brief explanation or proof of your reasoning. (4 points)

### **Solution - Question 4**

Let's assume that on a busy day, the company receives `m` orders, and these need to be processed by one of your `p` warehouses.

The algorithm assigns each order based on proximity of the delivery address to the warehouses, with a 20% chance of being randomly assigned to any warehouse, regardless of proximity.

**A.**

Given:
- `m` orders.
- `p` warehouses.
- 80% of orders are assigned based on proximity, and 20% are randomly assigned.

*Calculation*
The expected number of orders per warehouse is calculated as follows:

$$
\text{Expected Orders per Warehouse} = \frac{80\% \times m}{p} + \frac{20\% \times m}{p} = \frac{m}{p}
$$

**B.**

We want to find the probability that a warehouse receives at least 1.5 times the expected number of orders. This can be approached by using a statistical bound such as Chebyshev's inequality.

*Approach*
Chebyshev's inequality states that for any distribution:

$$
P(|X - \mu| \geq k\sigma) \leq \frac{1}{k^2}
$$

Where:
- $ \mu $ is the mean.
- $ \sigma $ is the standard deviation.
- $ k $ is the number of standard deviations from the mean.

For our case:
- $ \mu = \frac{m}{p} $
- $ \sigma $ is unknown, but we can qualitatively assess its impact.

*Approximation*
Assuming $ k $ such that $ 1.5 \times \frac{m}{p} = \mu + k\sigma $, we find:

$$
k = \frac{1.5 \times \frac{m}{p} - \frac{m}{p}}{\sigma}
$$

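Without $ \sigma $, we cannot compute $ k $ or the bound numerically. However, since $ k\sigma = 0.5 \times \frac{m}{p} $, the bound $ \frac{1}{k^2} = \frac{4\sigma^2 p^2}{m^2} $ is directly proportional to the variance of the number of orders per warehouse: the smaller the variance, the smaller the bound on the probability.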
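
To see what the bound could look like with a concrete variance, one option is to assume that a warehouse's load behaves like a Binomial($m$, $1/p$) variable, as it would if every order were assigned uniformly at random. This is an assumption for illustration only; the actual algorithm randomizes just 20% of orders. Under it, $\sigma^2 = m \cdot \frac{1}{p}\left(1 - \frac{1}{p}\right)$, and a standard multiplicative Chernoff bound is also available. The function names and numeric values below are illustrative.

```python
import math

def chebyshev_bound(m, p):
    """Chebyshev bound on P(X >= 1.5 * m/p), assuming X ~ Binomial(m, 1/p)."""
    mu = m / p
    variance = m * (1 / p) * (1 - 1 / p)
    return min(1.0, variance / (0.5 * mu) ** 2)  # equals 4 * (p - 1) / m when below 1

def chernoff_bound(m, p):
    """Chernoff bound exp(-delta**2 * mu / 3) with delta = 0.5, same assumption."""
    mu = m / p
    return math.exp(-(0.5 ** 2) * mu / 3)

print(chebyshev_bound(1_000_000, 100))  # about 0.000396
print(chernoff_bound(1_000_000, 100))   # far smaller than the Chebyshev bound
```

Chebyshev only uses the variance, so it is loose; the Chernoff bound exploits independence between orders and decays exponentially in $m/p$.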

**C.**

Given the 20% random assignment, we calculate the probability of a warehouse receiving zero orders.

*Calculation*
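The probability that none of the randomly assigned orders (20% of `m`) go to a specific warehouse is $\left(1 - \frac{1}{p}\right)^{20\% \times m}$. A warehouse receives zero orders only if, in addition, no proximity-based order is assigned to it, so this value is an upper bound:

$$
P(\text{Zero Orders}) \leq \left(1 - \frac{1}{p}\right)^{20\% \times m} \approx e^{-0.2m/p}
$$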
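
For concrete numbers, the expression above and its exponential approximation $e^{-0.2m/p}$ (accurate when $p$ is large) can be evaluated as follows. The function name and the example values are illustrative.

```python
import math

def zero_order_probability(m, p, random_share=0.2):
    """Return (exact power, exponential approximation) for the random share.

    Treats exactly random_share * m orders as uniformly random, as the
    solution does, and ignores the proximity-based orders.
    """
    exact = (1 - 1 / p) ** (random_share * m)
    approx = math.exp(-random_share * m / p)
    return exact, approx

print(zero_order_probability(100, 10))        # roughly (0.1216, 0.1353)
print(zero_order_probability(1_000_000, 100)) # both values are vanishingly small
```

With a million orders the probability is negligible for any realistic number of warehouses.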

**D.**

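Under the assumption that delivery addresses are spread so that a purely proximity-based assignment gives every warehouse a predictable, even share of the orders, the introduction of a 20% random assignment increases the variance in the number of orders per warehouse. The random element makes each warehouse's count uncertain, which affects the load distribution among the warehouses. If demand were instead concentrated near a few warehouses, the random share would spread some of that load and could reduce the imbalance between warehouses, which is the load-balancing purpose stated in the problem.

*Analysis*
- **Purely Proximity-Based**: Lower variance under the even-spread assumption, as each warehouse's load is determined by location alone.
- **With 20% Random Assignment**: Higher variance under the same assumption, due to the unpredictability of random assignments.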
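
The size of the added variance can be estimated under a simple hypothetical model, which is an assumption and not part of the problem statement: exactly $m/p$ orders are nearest to each warehouse, and each order is independently rerouted to a uniformly random warehouse with probability 0.2. A warehouse's load is then a sum of independent indicator variables, so its variance is the sum of their variances.

```python
def load_mean_and_variance(m, p, random_share=0.2):
    """Mean and variance of one warehouse's load under the hypothetical model."""
    home = m / p                                  # orders nearest to this warehouse
    away = m - home                               # orders nearest to another warehouse
    stay = (1 - random_share) + random_share / p  # a home order ends up here
    arrive = random_share / p                     # an away order ends up here
    mean = home * stay + away * arrive
    variance = home * stay * (1 - stay) + away * arrive * (1 - arrive)
    return mean, variance

print(load_mean_and_variance(1000, 10))       # mean 100, variance about 32.4
print(load_mean_and_variance(1000, 10, 0.0))  # mean 100, variance 0
```

The mean stays at $m/p$ in both cases, consistent with part A, while the variance rises from zero once the random share is introduced.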



### **Justification and Proof of Correctness - Question 4**

**A.** 

**Proof**:
- The algorithm for assigning orders involves two components: 80% based on proximity and 20% random assignment.
- For the proximity-based assignment, if we assume an even distribution of orders and warehouses, each warehouse will receive $\frac{80\% \times m}{p}$ orders on average.
- For the random assignment, each order has an equal probability of being assigned to any warehouse. Therefore, each warehouse will receive $\frac{20\% \times m}{p}$ orders on average from this portion.
- Adding these two parts gives the total expected orders per warehouse: $\frac{80\% \times m}{p} + \frac{20\% \times m}{p} = \frac{m}{p}$.

**Justification**:
- This calculation is accurate under the assumption of uniform distribution of orders and equal probability in random assignment. It uses basic principles of expected value in probability.

**B.**

**Proof**:
- Without a specific distribution model, we resort to using a statistical bound for approximation.
- Chebyshev's inequality is a non-parametric method that doesn’t assume a specific distribution and is thus applicable here.
- The inequality provides an upper bound for the probability of a random variable deviating from its mean by a certain number of standard deviations.
- However, since we lack the variance or standard deviation of the order distribution, we can only qualitatively estimate the probability.

**Justification**:
- This part of the solution is more of an approximation than an exact calculation due to the lack of detailed distribution data.
- Chebyshev's inequality is a widely accepted method for estimating probabilities in cases with limited distributional information.

**C.**

**Proof**:
- The probability of a warehouse receiving no orders is based on the chance that none of the 20% randomly assigned orders are allocated to it.
- The calculation $\left(1 - \frac{1}{p}\right)^{20\% \times m}$ follows from the principles of probability, where each randomly assigned order has a $\frac{1}{p}$ chance of being assigned to a particular warehouse.

**Justification**:
- This approach is justified as it uses basic probability rules. It assumes independent assignment of each order, which is a reasonable assumption given the problem statement.

**D.**

**Proof**:
- The analysis is qualitative, based on the general understanding that introducing randomness (20% random assignment) increases variability compared to a deterministic system (purely proximity-based). It assumes the proximity-based loads are themselves fixed and evenly spread.
- Variance in a system is increased by factors that are unpredictable or non-uniform, which in this case is the random assignment of orders.

**Justification**:
- This analysis is based on fundamental principles of statistics and probability, particularly the concept that randomness increases variance.


### **Question 5 (15 points)**

Imagine a data center with $s$ storage units and $f$ files to be stored (where $s < f$). Your storage allocation algorithm selects a storage unit at random for each file, with each unit equally likely to be chosen.

Each storage unit has a limit, say $l$, on the number of files it can hold. Once a storage unit reaches its limit, it is no longer available for allocation, and the remaining files are distributed among the available units. (Assume $f$ is significantly larger than $s$ and $l$.)

**Questions:**

A. What is the expected number of files in each storage unit before any unit reaches its limit? (2 points)

B. What is the maximum possible difference in the number of files between any two storage units? Provide a clear explanation. (3 points)

C. Given that the average load per storage unit is $a$, what is the probability that a particular storage unit ends up with twice the average load, i.e., $2a$ files? (Give the answer using a bound or approximation.) (5 points)

D. What is the likelihood that a storage unit ends up with only half the average load, i.e., $0.5a$ files? (Give the answer using a bound or approximation.) (5 points)

### **Solution - Question 5**

Given:
- `s` storage units.
- `f` files to be stored (`f` > `s`).
- Each unit can hold up to `l` files.
- Files are allocated randomly to storage units until a unit reaches its limit.

**A.**

- The expected number of files per unit, before any unit reaches its limit, is the average number of files distributed among all units.
- Calculated as $ \frac{f}{s} $ files per unit.

**B.**

- The maximum possible difference occurs when one unit reaches its limit ($l$ files) while another has the least number of files, potentially zero.
- Therefore, the maximum difference is $l$ files.

**C.**

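- Assume that $2a$ is a whole number with $2a \leq l$, and that the number of files $X$ allocated to a specific unit follows a binomial distribution with $f$ trials and success probability $\frac{1}{s}$ (capacity limits are ignored in this model).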

- The probability of a specific unit receiving exactly `2a` files is given by the binomial distribution:
  
 $$ P(X = 2a) = \binom{f}{2a} \left(\frac{1}{s}\right)^{2a} \left(1 - \frac{1}{s}\right)^{f - 2a} $$

- Where `a` is the average load per unit $ \frac{f}{s} $.
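
Since the question allows a bound, the exact binomial term can be complemented by a Chernoff bound on the upper tail: for a binomial $X$ with mean $a$, $P(X \geq (1+\delta)a) \leq e^{-\delta^2 a/(2+\delta)}$, which for $\delta = 1$ gives $P(X \geq 2a) \leq e^{-a/3}$. The sketch below compares the two under the same binomial model; the function names and the small example values are illustrative.

```python
import math

def binomial_pmf(f, s, j):
    """P(X = j) for X ~ Binomial(f, 1/s), capacity limits ignored."""
    return math.comb(f, j) * (1 / s) ** j * (1 - 1 / s) ** (f - j)

def upper_tail(f, s, threshold):
    """P(X >= threshold) by direct summation."""
    return sum(binomial_pmf(f, s, j) for j in range(threshold, f + 1))

def chernoff_upper(a):
    """Chernoff bound on P(X >= 2a) for a binomial X with mean a."""
    return math.exp(-a / 3)

f, s = 20, 4  # illustrative values, so a = 5 and 2a = 10
print(binomial_pmf(f, s, 10))  # exact P(X = 2a), about 0.0099
print(upper_tail(f, s, 10))    # exact P(X >= 2a)
print(chernoff_upper(f / s))   # bound, about 0.189
```

The bound is loose for small $a$ but becomes useful when $a$ is large, where summing binomial terms directly is impractical.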

**D. Probability of Underloaded Unit**

1. **Case 1: $ \frac{f}{2s} $ is not a Whole Number or $ \frac{f}{2s} > l $**:
   - It's infeasible for a unit to hold exactly half the average load.
2. **Case 2: $ \frac{f}{2s} $ is a Whole Number and $ \frac{f}{2s} \leq l $**:
   - Probability of one specific unit having exactly half the average load, ignoring the capacity limits of the other units: 
     $ \frac{\binom{f}{\frac{f}{2s}} \times (s-1)^{f - \frac{f}{2s}}}{s^f} $, where $(s-1)^{f - \frac{f}{2s}}$ counts the ways to distribute the remaining files among the other $s-1$ units.
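
Because the question accepts a bound, a lower-tail Chernoff bound is a natural alternative: for a binomial $X$ with mean $a$, $P(X \leq (1-\delta)a) \leq e^{-\delta^2 a/2}$, which for $\delta = 0.5$ gives $P(X \leq 0.5a) \leq e^{-a/8}$. The sketch below compares it with the exact tail under the binomial model (capacity limits ignored); names and example values are illustrative.

```python
import math

def lower_tail(f, s, threshold):
    """P(X <= threshold) for X ~ Binomial(f, 1/s)."""
    return sum(
        math.comb(f, j) * (1 / s) ** j * (1 - 1 / s) ** (f - j)
        for j in range(0, math.floor(threshold) + 1)
    )

def chernoff_lower(a):
    """Chernoff bound on P(X <= 0.5 * a) for a binomial X with mean a."""
    return math.exp(-a / 8)

f, s = 20, 4  # illustrative values, so a = 5 and 0.5a = 2.5
print(lower_tail(f, s, 0.5 * f / s))  # exact P(X <= 2), about 0.091
print(chernoff_lower(f / s))          # bound, about 0.535
```

Unlike the exact-count formula, the tail bound does not require $\frac{f}{2s}$ to be a whole number.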


### **Justification and Proof of Correctness - Question 5**

**A.**

**Proof**:
- The expected number of files per unit is calculated as the average, assuming a uniform distribution of files across all units.
- This is a standard expectation calculation in probability, given by $ \frac{f}{s} $.

**Justification**:
- The calculation is valid under the assumption of a random and uniform distribution of files among the storage units.

**B.**

**Proof**:
- The maximum difference in load occurs under the most extreme allocation scenario: one unit reaches its maximum capacity (`l`) while another receives the least number of files, potentially zero.
- This results in a maximum difference of `l` files.

**Justification**:
- This is a straightforward extremal case analysis. It is correct given the random allocation model and the maximum capacity limit for each unit.

**C.**

**Proof**:
- Assuming a binomial distribution for file allocation to each unit, the probability of a unit receiving exactly `2a` files is calculated using the binomial distribution formula.
- The formula $ P(X = 2a) = \binom{f}{2a} \left(\frac{1}{s}\right)^{2a} \left(1 - \frac{1}{s}\right)^{f - 2a} $ gives this probability.

**Justification**:
- The binomial model is appropriate here, assuming the independence of file allocation events and the fact that each file has an equal chance of being allocated to any unit.
- This approach is valid if `2a` is within the limit `l` and if `f` is large enough for the binomial approximation to be accurate.

**D.**

**Proof**:
- The calculation for the probability of a unit having half the average load is complex and depends on the binomial distribution.
- In the feasible case where $ \frac{f}{2s} $ is a whole number and within the limit `l`, the probability is derived from combinatorics and the binomial distribution.

**Justification**:
- This part of the solution is more of an approximation due to the complexity of the distribution of remaining files among the other units.
- The calculation is valid under the assumptions of random allocation and the independence of file storage events.


### **Question 6 (10 points)**


Consider an algorithm used for finding the Least Common Multiple (LCM) of two numbers, the smallest number that is a multiple of both numbers. This algorithm uses the GCD (Greatest Common Divisor) as an intermediate step, leveraging the fact that $$ \text{LCM}(a, b) = \frac{a \times b}{\text{GCD}(a, b)} $$

**Pseudocode**

```plaintext

int gcd(int a, int b) {
    if (b == 0)
        return a;
    else
        return gcd(b, a % b);
}

int lcm_algorithm(int m, int n) {
    return (m * n) / gcd(m, n);
}
```
**A. Recurrence Relation for the gcd Function** (2 points)

Write a recurrence relation for the gcd function used in the lcm_algorithm. Provide a recurrence relation of the form $ T(n)=T(n/b)+c$, explaining your choice of $b$ and $c$.

**B. Runtime Expression for the gcd Function** (4 points)

Assuming your recurrence can be solved with the Master Theorem, give an expression for the runtime $T(n)$ of the gcd function.

**C. Optimization Analysis** (4 points)

Suppose the gcd function is modified to include a check that immediately returns the smaller number if one number is a multiple of the other *(i.e., if a % b == 0 or b % a == 0)*. Write a new recurrence relation for this optimized gcd function and explain how this optimization might affect the algorithm's efficiency.


### **Solution - Question 6**

**A. Recurrence Relation for the gcd Function**

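In the $gcd$ function, assuming $a \geq b$, the remainder a % b is less than $a/2$: if $b \leq a/2$, then a % b is less than $b$ and hence at most $a/2$; if $b > a/2$, then a % b equals $a - b$, which is less than $a/2$. Since a % b becomes the first argument two calls later, the larger input at least halves every two recursive calls. Treating each such pair of calls as one step, the recurrence relation for the $gcd$ function can be expressed as:

$T(n) = T(n/2) + c$

Here, $n$ represents the larger of the two inputs to $gcd$, and $c$ is a constant representing the time taken for the modulo operations and recursive calls in one step.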
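
The logarithmic behaviour can be checked empirically. For $a \geq b \geq 1$, the remainder a % b is less than $a/2$, so the first argument at least halves every two calls and the number of invocations is at most $2\lfloor \log_2 a \rfloor + 2$. The sketch below counts invocations with an iterative loop that mirrors the recursion; the names `gcd_with_calls` and `call_bound` are illustrative.

```python
def gcd_with_calls(a, b):
    """Return (gcd, number of invocations the recursive version would make)."""
    calls = 1
    while b != 0:
        a, b = b, a % b  # same step as the recursive call gcd(b, a % b)
        calls += 1
    return a, calls

def call_bound(a):
    """2 * floor(log2(a)) + 2, an upper bound on invocations for a >= b >= 1."""
    return 2 * (a.bit_length() - 1) + 2

print(gcd_with_calls(89, 55), call_bound(89))  # (1, 10) 14
```

Consecutive Fibonacci numbers such as 89 and 55 are the classic slow case for the Euclidean algorithm, and even there the count stays within the logarithmic bound.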

**B. Runtime Expression for the gcd Function**

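Using the Master Theorem to solve the recurrence relation $ T(n) = T(n/2) + c $, we find that this falls under Case 2 of the theorem. Here, $a = 1$ and $b = 2$, so $n^{\log_b a} = n^0 = 1$, and the non-recursive work $c$ is $\Theta(n^{\log_b a} \log^k n)$ with $k = 0$. According to the Master Theorem, the time complexity is:

$T(n) = \Theta(\log n)$

Thus, the worst-case runtime of the $gcd$ function is $ \Theta(\log n) $, counting each modulo operation as constant time. Consecutive Fibonacci numbers are inputs that attain this bound.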

**C. Optimization Analysis**

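If the $gcd$ function is modified to return the smaller number immediately when one number is a multiple of the other, such inputs are handled in a single call, i.e. in $ O(1) $ time. The saving is only a constant number of calls, because the unmodified function already terminates within three calls on these inputs. For all other inputs the recursion proceeds as before, so the recurrence relation becomes:

$T(n) = c'$ if one input is a multiple of the other, and $T(n) = T(n/2) + c'$ otherwise

Here $c'$ is a constant slightly larger than $c$, since every call now performs the extra divisibility checks. This optimization shortens the recursion by a constant number of calls but does not affect the average or worst-case efficiency, which remains $ \Theta(\log n) $, as derived in part B.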
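To see how much the early-return check saves, the following sketch compares call counts for the original recursion and a modified version. The function names, the convention of returning the call count alongside the result, and the restriction to positive integer inputs are all assumptions made for illustration.

```python
import math

def gcd_plain(a, b):
    """Recursive Euclid as in the pseudocode; returns (gcd, number of calls)."""
    if b == 0:
        return a, 1
    g, calls = gcd_plain(b, a % b)
    return g, calls + 1

def gcd_checked(a, b):
    """Variant with the early multiple check (assumes a > 0 and b > 0)."""
    if a % b == 0:
        return b, 1  # b divides a, so b is the gcd
    if b % a == 0:
        return a, 1  # a divides b, so a is the gcd
    g, calls = gcd_checked(b, a % b)  # a % b is nonzero here
    return g, calls + 1

def max_calls_saved(limit):
    """Largest difference in call counts over all 1 <= a, b <= limit."""
    worst = 0
    for a in range(1, limit + 1):
        for b in range(1, limit + 1):
            g1, c1 = gcd_plain(a, b)
            g2, c2 = gcd_checked(a, b)
            assert g1 == g2 == math.gcd(a, b)
            worst = max(worst, c1 - c2)
    return worst

print(max_calls_saved(60))
```

In this sketch the difference in call counts stays bounded by a small constant no matter how large the inputs are, which is why the check leaves the growth rate of the recursion unchanged.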


### **Justification and Proof of Correctness - Question 6**

**A.**

 *Proof:*
- The $gcd$ function uses the Euclidean algorithm. For $a \ge b > 0$ the remainder satisfies $a \bmod b < a/2$, and after two recursive calls the first parameter has been replaced by this remainder, so the larger argument at least halves every two calls.
- Therefore, the recurrence relation is based on the observation that the size of the problem (represented by the larger parameter) halves after a constant number of recursive calls.

*Justification:*
- This recurrence relation captures the worst-case behavior of the Euclidean algorithm as an upper bound. The key insight is the effect of the modulo operation, which ensures that the problem size reduces by at least half every two steps.

**B.**

 *Proof:*
- Applying the Master Theorem to the recurrence relation $ T(n) = T(n/2) + c $ categorizes it under Case 2, where the work done at each step is constant (represented by $c$), and the size of the problem halves.
- The Master Theorem indicates that for such cases, the runtime is $ \Theta(\log n) $.

 *Justification:*
- The Master Theorem is a well-established tool for analyzing the time complexity of divide-and-conquer algorithms. The conditions of the theorem align perfectly with the characteristics of the $gcd$ function, thus validating the derived time complexity.

**C.**

 *Proof:*
- Introducing a check for whether one number is a multiple of the other leads to an immediate return on such inputs, so the best case takes a single call, $ O(1) $. The unmodified function already finishes these inputs within three calls, so the saving is a constant.
- When this condition is not met, the algorithm follows the standard Euclidean algorithm process, and thus the average and worst-case time complexity remains as previously calculated.

*Justification:*
- While this optimization enhances the best-case scenario, it does not affect the average and worst-case scenarios because the fundamental behavior of the algorithm remains unchanged in those cases. The initial enhancement only applies to specific cases and does not alter the overall computational complexity in the general scenario.


### **Question 7 (15 points)**

Consider a graph $G=(V,E)$ where each vertex can be either included or excluded from a vertex cover set $C$. A vertex cover is a set of vertices such that every edge in the graph is incident to at least one vertex in the set. The objective is to find a small (not necessarily minimum) vertex cover $C$ using a randomized algorithm. Assume for this problem that each vertex in the graph has exactly $k$ neighbors.

- Each vertex $V_i$ in the graph independently picks a random value $y_i$; it sets $y_i$ to 1 with probability $q=0.6$ and sets $y_i$ to 0 with probability $1-q=0.4$.
- A vertex decides to enter the set $C$ if and only if it chooses the value 1 or if at least one of the $k$ vertices it is connected to chooses the value 0.

**Tasks:**  
**A.** *Expected Size of Vertex Cover:* Give a formula for the expected size of the vertex cover set $C$ when $q$ is set to 0.6. (5 points)  
**B.** *Probability Analysis:* Calculate the probability that a given edge in the graph is covered by this algorithm. (5 points)  
**C.** *Comparison with Optimal Solution:* Discuss how the size of the vertex cover found by this randomized algorithm might compare to the size of an optimal vertex cover. (5 points)  

### **Solution - Question 7**

**A.**

Given that each vertex $ V_i $ picks 1 with probability $ q = 0.6 $ and 0 with probability 0.4, and a vertex is included in the vertex cover $ C $ if it picks 1 or if at least one of its $ k $ neighbors picks 0, the expected size of the vertex cover can be calculated as follows:

- The probability that a vertex $ V_i $ is **not** included in the vertex cover is the probability that it picks 0 and all its $ k $ neighbors pick 1, which is $ 0.4 \times (0.6)^k $.
- Therefore, the probability of $ V_i $ being in the vertex cover is $ 1 - 0.4 \times (0.6)^k $.

For $ n $ vertices in the graph, the expected size of the vertex cover is:

$$ \text{Expected Size of } C = n \times \left( 1 - 0.4 \times (0.6)^k \right) $$
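The formula can be sanity-checked by brute force on a small graph. The sketch below assumes a cycle on $n = 6$ vertices (so every vertex has exactly $k = 2$ neighbors), which is an example chosen for illustration and not part of the problem statement. It enumerates all $2^n$ assignments of the $y_i$ values with exact rational arithmetic and compares the result with $n \left( 1 - 0.4 \times (0.6)^k \right)$.

```python
from fractions import Fraction
from itertools import product

def expected_cover_size(n, q):
    """Exact expected |C| on an n-cycle, where every vertex has k = 2 neighbors."""
    total = Fraction(0)
    for ys in product((0, 1), repeat=n):
        # Probability of this particular assignment of y-values.
        prob = Fraction(1)
        for y in ys:
            prob *= q if y == 1 else 1 - q
        # A vertex joins C if it drew 1 or at least one neighbor drew 0.
        size = sum(
            1 for i in range(n)
            if ys[i] == 1 or ys[i - 1] == 0 or ys[(i + 1) % n] == 0
        )
        total += prob * size
    return total

q = Fraction(3, 5)
n, k = 6, 2
exact = expected_cover_size(n, q)
formula = n * (1 - (1 - q) * q ** k)
print(exact, formula)
```

Using `Fraction` avoids floating-point rounding, so the two values can be compared for exact equality.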

**B.**

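The probability that an edge is covered by this algorithm can be calculated as follows:

- An edge $(u, v)$ is **not** covered only if neither endpoint is in the vertex cover. For $u$ to be excluded, $u$ must pick 0 and all of its neighbors, including $v$, must pick 1. For $v$ to be excluded, $v$ must pick 0. These two requirements on $v$'s value contradict each other, so the two exclusion events are mutually exclusive rather than independent, and the probability that both occur is $0$, not $ \left( 0.4 \times (0.6)^k \right)^2 $.
- Thus, the probability that an edge is covered is $1$: if $u$ picks 1 it is in $C$, and if $u$ picks 0 then $v$ has a neighbor that picked 0, so $v$ is in $C$.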
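Because adjacent vertices depend on each other's coin flips, the two endpoints of an edge cannot be treated as independent. A brute-force enumeration gives the exact probability that an edge is left uncovered. The sketch below again assumes a 6-cycle ($k = 2$) as an illustrative example, and `uncovered_edge_probability` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import product

def uncovered_edge_probability(n, q):
    """Exact probability that edge (0, 1) of an n-cycle has neither endpoint in C."""
    def in_cover(ys, i):
        # Vertex i joins C if it drew 1 or at least one neighbor drew 0.
        return ys[i] == 1 or ys[i - 1] == 0 or ys[(i + 1) % n] == 0

    total = Fraction(0)
    for ys in product((0, 1), repeat=n):
        prob = Fraction(1)
        for y in ys:
            prob *= q if y == 1 else 1 - q
        if not in_cover(ys, 0) and not in_cover(ys, 1):
            total += prob
    return total

p_uncovered = uncovered_edge_probability(6, Fraction(3, 5))
print(p_uncovered)
```

The enumeration makes no independence assumption, so it serves as a direct check on whatever closed-form expression is derived for the edge-coverage probability.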

**C.**

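- The expected size of $C$ is $ n \left( 1 - 0.4 \times (0.6)^k \right) $, which is at least $0.76n$ (at $k = 1$) and approaches $n$ as $k$ grows.
- In a graph where every vertex has exactly $k$ neighbors there are $nk/2$ edges and each vertex covers at most $k$ of them, so any vertex cover, including an optimal one, contains at least $n/2$ vertices. The expected size of $C$ is therefore less than twice the optimum, although for large $k$ the algorithm does little better than selecting every vertex.
- An optimal algorithm would minimize the vertex cover size but could be computationally intensive, especially for large graphs. The randomized approach offers simplicity and speed, making it suitable for large graphs where exact solutions are not feasible.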



### **Justification and Proof of Correctness - Question 7**

**A. Expected Size of Vertex Cover**

- The algorithm includes a vertex in the cover if it picks 1 (with probability 0.6) or if at least one of its $ k $ neighbors picks 0.
- The probability that a vertex is **not** included is when it picks 0 (with probability 0.4) and all its $ k $ neighbors pick 1, which is $ 0.4 \times (0.6)^k $.
- Thus, the probability of a vertex being in the vertex cover is the complement: $ 1 - 0.4 \times (0.6)^k $.
- For $ n $ vertices, the expected size of the vertex cover is the sum of these individual probabilities.

- The calculation accurately reflects the independent probabilistic decision made by each vertex based on the given algorithm.
- It adheres to the principles of probability theory, specifically the concept of the complement of an event.

**B. Probability Analysis for Edge Coverage**

- An edge is covered if at least one of its two vertices is in the vertex cover.
- For an edge $(u, v)$, vertex $u$ is excluded only if it picks 0 and every neighbor, including $v$, picks 1, while $v$ is excluded only if $v$ itself picks 0.
- Both vertices of an edge can therefore never be excluded at the same time, because $v$ cannot pick both 1 and 0.
- Therefore, the probability that at least one vertex of an edge is in the cover is $1$.

- This analysis accounts for the dependence between the two endpoints: their membership in $C$ is determined by overlapping random choices, so the exclusion probabilities from part A cannot simply be multiplied.
- It also confirms that the set $C$ produced by the algorithm is always a valid vertex cover.

**C. Comparison with Optimal Solution**

- The randomized algorithm provides a feasible solution quickly but does not guarantee the smallest possible vertex cover.
- In contrast, an optimal algorithm, while providing the smallest vertex cover, might be computationally infeasible for large graphs.
- The value of the randomized algorithm lies in its balance between efficiency and optimality, particularly suitable for large or complex graphs where an exact solution is less practical.



### **Question 8 (10 points)**

**LaserGrid Puzzle**

Consider a 2-D grid-based puzzle game called LaserGrid. The game has the following rules:  
a. The square grid initially contains several mirrors (placed in certain cells), a laser source, and a target.  
b. The laser emits a beam in a straight line. The beam can reflect off mirrors. Each mirror reflects the laser beam at a right angle.  
c. Players can rotate mirrors 90 degrees at a time.  
d. The goal is to adjust the mirrors so that the laser beam hits the target.  

Assume the grid size is $n×n$.

*Gameplay:*

- Each cell in the grid can either be empty, contain a mirror, contain the laser source, or contain the target.
- Mirrors have a fixed diagonal orientation (either `\` or `/`) and can be rotated to change their orientation.
- The laser source emits a beam in one of the four cardinal directions (up, down, left, right), and this direction is fixed.
- The puzzle is solved when the laser beam reaches the target after possibly reflecting off several mirrors.

If the puzzle is in NP, prove it. (10 points)

### **Solution - Question 8**

**Understanding NP (Nondeterministic Polynomial Time):**

- To reiterate, a problem is in the class NP if a solution to the problem can be verified in polynomial time, given a certificate (or a witness) for the solution.
- This class does not concern how long it takes to find the solution, but rather, how long it takes to verify it.

**Applying to LaserGrid:**

*Solution Verification (Certificate Checking):*
- In the context of the LaserGrid puzzle, a certificate would be a sequence of actions (mirror rotations) that allegedly leads to the laser hitting the target.
- The verification process involves applying these actions to the initial grid configuration and then simulating the path of the laser to see if it hits the target.
- This simulation checks at each step the direction of the laser and the state of the grid cell it enters (empty, mirror, laser source, or target).

**Polynomial-Time Verification:**  

*The verification process involves a finite number of steps*:  

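- Apply each rotation in the given sequence to the corresponding mirror. This is $O(m)$, where $m$ is the number of rotations in the sequence. Because a mirror has only two orientations, rotating it twice restores it, so a certificate never needs more than one rotation per mirror and $m \le n^2$.
- Simulate the laser's path. The beam's state is its current cell together with its direction of travel, so there are at most $4n^2$ distinct states and the simulation takes $O(n^2)$ steps for an $n \times n$ grid.
- Therefore, the total verification time is polynomial with respect to the grid size and the length of the action sequence.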

**Correctness of Verification:**
- If the certificate is a correct solution, the laser will hit the target at the end of the simulation.
- If the certificate is incorrect, the laser will either miss the target or get into an infinite loop. In the case of an infinite loop, we can stop the simulation after $4n^2$ steps: a beam that has not reached the target by then must have repeated a (cell, direction) state and will never reach it.
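The verification procedure can be sketched as follows. The grid encoding (a list of strings using `.` for empty cells, `/` and `\` for mirrors, `S` for the source and `T` for the target), the function name `verify`, and the choice of a set of mirror cells as the certificate are all assumptions made for this illustration; the puzzle statement does not fix a representation. The beam is also assumed to pass through the source cell as if it were empty.

```python
def verify(grid, start, direction, rotations):
    """Check a LaserGrid certificate in O(m + n^2) time.

    grid:      list of n strings of length n (hypothetical encoding)
    start:     (row, col) of the laser source
    direction: (d_row, d_col) of the emitted beam, e.g. (0, 1) for right
    rotations: set of (row, col) mirror cells to rotate once
    """
    n = len(grid)
    cells = [list(row) for row in grid]
    for r, c in rotations:
        if cells[r][c] not in "/\\":
            return False  # certificate names a cell that holds no mirror
        cells[r][c] = "\\" if cells[r][c] == "/" else "/"

    (r, c), (dr, dc) = start, direction
    for _ in range(4 * n * n):  # at most 4n^2 distinct (cell, direction) states
        r, c = r + dr, c + dc
        if not (0 <= r < n and 0 <= c < n):
            return False  # beam left the grid
        cell = cells[r][c]
        if cell == "T":
            return True
        if cell == "/":
            dr, dc = -dc, -dr  # e.g. moving right turns to moving up
        elif cell == "\\":
            dr, dc = dc, dr    # e.g. moving right turns to moving down
    return False  # step budget exhausted, so the beam is looping

grid = ["S./",
        "...",
        "T./"]
print(verify(grid, (0, 0), (0, 1), {(0, 2)}))
```

Each loop iteration does constant work, so the running time is bounded by the certificate length plus $4n^2$ steps, which is polynomial in the size of the puzzle.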


The LaserGrid puzzle is in NP because any given solution can be verified in polynomial time relative to the size of the puzzle and the length of the solution. The process of checking whether a provided sequence of mirror rotations leads to the laser hitting the target is efficient and straightforward.

### **Justification and Proof of Correctness - Question 8**

- *Alignment with NP Definition:* The solution's approach to verifying a given sequence of mirror rotations aligns with the NP class definition, which emphasizes solution verification in polynomial time.

- *Efficiency of Verification:* The solution demonstrates that checking whether a proposed sequence leads to the goal can be done efficiently, within polynomial time relative to the puzzle's size.

**Proof of Correctness:**

- *Polynomial-Time Verification:* The analysis shows that both applying the mirror rotations and simulating the laser's path can be done in polynomial time, which is the core criterion for a problem to be classified as NP.  
- *Adherence to Puzzle Rules:* The verification process respects the rules and constraints of the LaserGrid puzzle, ensuring that the solution's classification of the problem as NP is based on a correct understanding of the problem.

### **Question 9 (5 points)**

Consider the following payoff matrix for a two-player game:

|         | Player B: Action 1 | Player B: Action 2 |
|---------|-------------------|-------------------|
| **Player A: Action 1** | (3, 2)            | (1, 1)            |
| **Player A: Action 2** | (2, 3)            | (4, 4)            |

**Question**:

Does the payoff matrix above have any Nash equilibria? Why or why not?

---

*Note*: In the matrix, the entries like (3, 2) denote the payoffs for Players A and B, respectively. A Nash equilibrium occurs when neither player can benefit by unilaterally changing their strategy.


### **Solution - Question 9**

**Analyzing Player A's Strategies:**

*When Player B chooses Action 1:*
- Player A gets a payoff of 3 with Action 1 and 2 with Action 2. So, Player A prefers Action 1.

*When Player B chooses Action 2:*
- Player A gets a payoff of 1 with Action 1 and 4 with Action 2. So, Player A prefers Action 2.

**Analyzing Player B's Strategies:**

*When Player A chooses Action 1:*
- Player B gets a payoff of 2 with Action 1 and 1 with Action 2. So, Player B prefers Action 1.

*When Player A chooses Action 2:*
- Player B gets a payoff of 3 with Action 1 and 4 with Action 2. So, Player B prefers Action 2.

**Identifying Nash Equilibria:**

Nash equilibrium occurs when both players are playing their best response to the other player's strategy.
*In this matrix:*
- (Player A: Action 1, Player B: Action 1) is a Nash equilibrium: Player A's payoff would drop from 3 to 2 by switching to Action 2, and Player B's payoff would drop from 2 to 1 by switching to Action 2.
- (Player A: Action 1, Player B: Action 2) is not a Nash equilibrium because both players have better responses (Player A: Action 2, Player B: Action 1).
- (Player A: Action 2, Player B: Action 1) is not a Nash equilibrium because both players have better responses (Player A: Action 1, Player B: Action 2).
- (Player A: Action 2, Player B: Action 2) is a Nash equilibrium: Player A's payoff would drop from 4 to 1 by switching to Action 1, and Player B's payoff would drop from 4 to 3 by switching to Action 1.

The payoff matrix has two pure-strategy Nash equilibria: (Player A: Action 1, Player B: Action 1) with payoffs (3, 2), and (Player A: Action 2, Player B: Action 2) with payoffs (4, 4). In each of them, neither player can benefit by changing their strategy independently.
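The cell-by-cell best-response check can also be carried out mechanically. In the sketch below, the nested-list encoding of the matrix and the function name `pure_nash_equilibria` are illustrative choices. A cell is reported only if Player A's payoff is maximal within its column (Player B's action held fixed) and Player B's payoff is maximal within its row (Player A's action held fixed).

```python
# payoff[a][b] = (payoff to Player A, payoff to Player B),
# with index 0 for Action 1 and index 1 for Action 2.
payoff = [
    [(3, 2), (1, 1)],
    [(2, 3), (4, 4)],
]

def pure_nash_equilibria(payoff):
    """Return all (a, b) cells where neither player gains by deviating alone."""
    rows, cols = len(payoff), len(payoff[0])
    equilibria = []
    for a in range(rows):
        for b in range(cols):
            # Player A compares against the other rows in the same column.
            a_best = all(payoff[a][b][0] >= payoff[a2][b][0] for a2 in range(rows))
            # Player B compares against the other columns in the same row.
            b_best = all(payoff[a][b][1] >= payoff[a][b2][1] for b2 in range(cols))
            if a_best and b_best:
                equilibria.append((a + 1, b + 1))  # report 1-based action numbers
    return equilibria

print(pure_nash_equilibria(payoff))
```

This only searches pure strategies; mixed-strategy equilibria would need a separate indifference calculation.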

### **Justification and Proof of Correctness - Question 9**

**Definition of Nash Equilibrium:**
A Nash equilibrium in a game occurs when each player's strategy is an optimal response to the other player's strategy. In other words, no player can gain by unilaterally changing their strategy.

**Application to the Given Payoff Matrix:**
- For each cell in the payoff matrix, we evaluated whether each player could improve their payoff by changing their strategy, assuming the other player's strategy remains fixed.
- The equilibria at (Player A: Action 1, Player B: Action 1) and (Player A: Action 2, Player B: Action 2) were identified by observing that, in each of these cells, neither player can increase their payoff by switching strategy.
- At (Action 1, Action 1), Player A's payoff of 3 is the best available when Player B chooses Action 1, and Player B's payoff of 2 is the best available when Player A chooses Action 1.
- At (Action 2, Action 2), Player A's payoff of 4 is the best available when Player B chooses Action 2, and Player B's payoff of 4 is the best available when Player A chooses Action 2.
- Any unilateral deviation from either of these strategy pairs leads to a reduced payoff for the deviating player.

**Ensuring All Possibilities Are Considered:**
- The analysis methodically checked each of the four pure-strategy combinations to ensure no potential Nash equilibrium was overlooked.
- In the two remaining cells, both players have a profitable deviation, so neither is an equilibrium.

The solution identifies (Player A: Action 1, Player B: Action 1) and (Player A: Action 2, Player B: Action 2) as the pure-strategy Nash equilibria of the provided payoff matrix. This conclusion is based on a systematic analysis of each player's best responses and the definition of Nash equilibrium.

### **Question 10 (5 points)**

In a game involving flipping a biased coin, where the probability of getting heads is $p$ and tails is $1-p$, you keep flipping the coin until you get the first heads. Let $X$ be the number of independent flips required to get the first heads. What is the expected value of $X$? Express the expectation as a function of $p$.

### **Solution - Question 10**

Consider a biased coin where the probability of getting heads is $ p $ and tails is $ 1-p $. We are interested in finding the expected number of independent flips $ X $ required to get the first heads. 

This is a geometric distribution scenario, where $ X $ is the number of trials needed for the first success (heads).

1. **Geometric Distribution**:
   - The probability of getting the first success (heads) on the $ k $-th trial is $ P(X = k) = (1-p)^{k-1} \times p $, where $ k $ is a positive integer.

2. **Expected Value Calculation**:
   - The expected value of $ X $ for a geometric distribution is given by:
     $$ E(X) = \frac{1}{p} $$

**Proof:**

- The expectation of a geometrically distributed random variable $ X $ can be calculated as:
  $$ E(X) = \sum_{k=1}^{\infty} k \times P(X = k) $$
  $$ = \sum_{k=1}^{\infty} k \times (1-p)^{k-1} \times p $$
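- Differentiating the geometric series $ \sum_{k=0}^{\infty} x^k = \frac{1}{1-x} $ term by term for $ |x| < 1 $ gives $ \sum_{k=1}^{\infty} k x^{k-1} = \frac{1}{(1-x)^2} $. Setting $ x = 1-p $ (with $ 0 < p \le 1 $) yields $ E(X) = p \times \frac{1}{p^2} = \frac{1}{p} $.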


The expected number of coin flips $ X $ until the first heads, given a probability of heads $ p $, is $ \frac{1}{p} $. This result assumes independent flips with a constant probability of heads for each flip.
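As a numerical illustration, the partial sums of the series $\sum_k k (1-p)^{k-1} p$ can be compared with $1/p$. The helper name `partial_expectation` and the cutoff of 2000 terms are arbitrary choices for this sketch.

```python
def partial_expectation(p, terms):
    """Sum k * (1 - p)**(k - 1) * p for k = 1..terms."""
    return sum(k * (1 - p) ** (k - 1) * p for k in range(1, terms + 1))

for p in (0.1, 0.5, 0.9):
    # The truncated sum should be very close to 1 / p for these values of p.
    print(p, partial_expectation(p, 2000), 1 / p)
```

For smaller values of $p$ the series converges more slowly, so more terms would be needed to reach the same accuracy.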


### **Justification and Proof of Correctness - Question 10**

*Geometric Distribution Applicability:*
- The scenario where we are flipping a coin repeatedly until we achieve the first success (heads) is a textbook example of a geometric distribution. The geometric distribution is defined as the probability distribution of the number $X$ of Bernoulli trials needed to get the first success.

*Formula for Expected Value in Geometric Distribution:*
- The expected value $E(X)$ of a geometrically distributed random variable is well-known and is given by the formula $E(X)=1/p$, where $p$ is the probability of success (heads in this case) on each trial. This formula is derived from the sum of an infinite geometric series, a fundamental concept in probability theory.

*Correctness of Applying the Formula:*
- The problem fits the criteria for using the geometric distribution: each coin flip is an independent Bernoulli trial with a constant probability of success $p$. Therefore, the application of the geometric distribution's expected value formula is valid and appropriate.

*Relevance to the Question:*
- The question asks for the expected number of flips until the first heads, which is precisely what the expected value of a geometric distribution represents. Therefore, using $E(X)=1/p$ directly answers the question.