We want to calculate the joint entropy of infinitely finely spaced measurements of a 1-D random process over a bounded domain, where each response is binned with bin width $\Delta r$ (i.e. at resolution $1/\Delta r$). Specifically, given

$$r(\tau) = [1 - e^{-1/\tau}]\sum_{m=0}^{\infty}e^{-m/\tau}s_m$$

where $\{s_m\}$ is an infinite binary string sampled from some distribution (e.g. i.i.d. Bernoulli variables), define $a(\tau) = \exp(-1/\tau)$ and sample it on the uniform grid $a_k = k/L$, $k = 0, \dots, L$. Let $\mathbf{r} = [r_0, \dots, r_L]$ with

$$r_k \equiv r(\tau_k) = r(-1/\log[a_k]) = r(-1/\log[k/L])$$

i.e.

$$r_k = [1 - k/L]\sum_{m=0}^{\infty}(k/L)^ms_m$$

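so that $k/L = 0$ maps to $\tau = 0$ and $k/L = 1$ maps to $\tau = \infty$. The two endpoints are understood as limits: with the convention $0^0 = 1$ we get $r_0 = s_0$, while at $k = L$ the formula is indeterminate, so $r_L$ is interpreted as the limit of $r$ as $a \to 1^-$.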
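To make the definition concrete, here is a minimal Python sketch that evaluates $r_k$ on the grid $a_k = k/L$ for one sampled string. It assumes i.i.d. Bernoulli($p$) bits truncated to a finite length $M$ (the neglected tail contributes at most $a_k^M$ for $k < L$), and it replaces the endpoint $r_L$ by the mean of the truncated string as a rough stand-in for the $a \to 1^-$ limit. The names `sample_bits` and `r_vector` are hypothetical helpers, not part of any library.

```python
import random

def sample_bits(M, p=0.5, rng=random):
    """Hypothetical helper: M i.i.d. Bernoulli(p) bits, a truncation of {s_m}."""
    return [1 if rng.random() < p else 0 for _ in range(M)]

def r_vector(s, L):
    """r_k = (1 - a_k) * sum_m a_k^m s_m on the grid a_k = k/L, k = 0..L.

    Assumptions: the infinite sum is truncated at M = len(s) terms (error at
    most a_k**M for k < L), and r_L (a = 1) is replaced by the mean of the
    truncated string as a stand-in for the a -> 1 limit.
    """
    r = []
    for k in range(L + 1):
        a = k / L
        if k == L:
            r.append(sum(s) / len(s))
            continue
        total, power = 0.0, 1.0  # power tracks a**m, starting at a**0 = 1, so r_0 = s_0
        for bit in s:
            total += power * bit
            power *= a
        r.append((1 - a) * total)
    return r

rng = random.Random(0)
s = sample_bits(2000, p=0.5, rng=rng)
r = r_vector(s, L=10)
```

Larger $L$ only refines the grid; $M$ must grow roughly in proportion to $L$ to keep the largest truncation error, $(1 - 1/L)^M \approx e^{-M/L}$, small.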

We want to compute

$$H[r(\tau)] \equiv \lim_{L \rightarrow \infty} H[r_0, ..., r_L]$$

First note that $H[r(\tau)]$ is finite. Roughly speaking, $r$ is a smooth function of $a$,

$$r(\tau) = [1 - a]\sum_{m=0}^\infty a^m s_m,$$

and we need only consider $0 \leq a \leq 1$, so we are measuring $r$ over a bounded domain. Eventually, taking finer and finer measurements will yield no new information, since each measurement at $a_k$ will be almost identical to the measurement at $a_{k+1}$.

Although $H[r(\tau)]$ may be analytically tractable for i.i.d. $s_m$, we would like a way to estimate it from a collection of samples of $r(\tau)$, so that we can compare the i.i.d. case with cases where $\{s_m\}$ has higher-order statistics. This is tricky, however, since $\mathbf{r}$ is high-dimensional: for most reasonable resolutions $1/\Delta r$, the number of bins $(1/\Delta r)^{L+1}$ vastly exceeds the number of samples, so almost all bins are empty and nearly every occupied bin contains a single sample.
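The sparsity problem is easy to see numerically. The sketch below assumes i.i.d. Bernoulli bits truncated at $M$ terms and replaces $r_L$ by the mean of the truncated string; `binned_r` and `occupancy` are hypothetical helper names. It draws $N$ binned samples of $\mathbf{r}$ with $1/\Delta r = 10$ and reports how many of the $(1/\Delta r)^{L+1}$ bins are occupied and how many of those hold exactly one sample.

```python
import random
from collections import Counter

def binned_r(L, n_bins, rng, M=200, p=0.5):
    """One binned sample of [r_0, ..., r_L] built from i.i.d. Bernoulli(p) bits.
    Assumptions: the sum is truncated at M terms, and r_L (a = 1) is replaced
    by the mean of the truncated string."""
    bits = [rng.random() < p for _ in range(M)]
    r = []
    for k in range(L):
        a = k / L
        total, power = 0.0, 1.0  # power tracks a**m
        for bit in bits:
            if bit:
                total += power
            power *= a
        r.append((1 - a) * total)
    r.append(sum(bits) / M)
    # Bin width dr = 1 / n_bins on [0, 1]; r = 1 is clamped into the top bin.
    return tuple(min(int(x * n_bins), n_bins - 1) for x in r)

def occupancy(N, L, n_bins, rng):
    """Return (total bins, occupied bins, bins holding exactly one sample)."""
    counts = Counter(binned_r(L, n_bins, rng) for _ in range(N))
    singletons = sum(1 for c in counts.values() if c == 1)
    return n_bins ** (L + 1), len(counts), singletons

rng = random.Random(0)
table = {L: occupancy(N=500, L=L, n_bins=10, rng=rng) for L in (1, 2, 4, 8)}
for L, (total, occupied, singletons) in table.items():
    print(f"L={L}: {total} bins, {occupied} occupied, {singletons} with one sample")
```

As $L$ grows, the total bin count grows exponentially while the occupied count can never exceed $N$, which is why a naive histogram estimate breaks down.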

Note: we can estimate the entropy empirically via Monte Carlo if we can compute $-\log p$ for each sample, since $H = \mathbb{E}[-\log p(\mathbf{r})]$ and therefore, for $N$ samples $\mathbf{r}^{(i)}$ drawn from $p$,

$$\hat{H}[r_0, \dots, r_L] = \frac{1}{N} \sum_{i=1}^{N} -\log p(\mathbf{r}^{(i)})$$
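The catch is that $p(\mathbf{r})$ is generally not available in closed form for binned $r$. As a sanity check of the estimator itself, the sketch below applies it to a case where $-\log p$ is computable: the first $n$ bits of an i.i.d. Bernoulli($q$) string, whose exact entropy is $n\,h(q)$ bits with $h(q) = -q\log_2 q - (1-q)\log_2(1-q)$. The function names are illustrative only.

```python
import math
import random

def neg_log2_prob(bits, q):
    """-log2 p(bits) for a string of i.i.d. Bernoulli(q) bits."""
    ones = sum(bits)
    zeros = len(bits) - ones
    return -(ones * math.log2(q) + zeros * math.log2(1 - q))

def mc_entropy(sampler, neg_log_p, N):
    """Monte Carlo estimate: average of -log p over N samples drawn from p itself."""
    return sum(neg_log_p(sampler()) for _ in range(N)) / N

n, q = 8, 0.3
rng = random.Random(1)
estimate = mc_entropy(
    sampler=lambda: [int(rng.random() < q) for _ in range(n)],
    neg_log_p=lambda bits: neg_log2_prob(bits, q),
    N=20_000,
)
exact = -n * (q * math.log2(q) + (1 - q) * math.log2(1 - q))
print(f"MC estimate {estimate:.3f} bits, exact {exact:.3f} bits")
```

The estimate is unbiased, and its standard error shrinks as $1/\sqrt{N}$; the hard part for $\mathbf{r}$ is supplying `neg_log_p`, not averaging it.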

Possible directions for solving this:

1. Start with a very large bin size and compute the entropy from the histogram with finite-sampling corrections; then decrease the bin size gradually and extrapolate the dependence of $H$ on $\Delta r$ beyond the range where finite sampling limits the estimate.
2. Take advantage of the structure of the full support of $P(\mathbf{r})$, which allows us to define a similarity among the different samples of $r(\tau)$. Intuitively, samples with many other samples nearby should correspond to regions of high probability, whereas samples with few neighbors should correspond to regions of low probability.
3. Even if we cannot compute the entropy directly, we may be able to compare the entropies obtained under two different distributions over $\{s_m\}$, which is our ultimate goal anyway.
4. Compute entropies directly from histograms for a small number of measurements, then gradually increase the number of measurements and check whether, e.g., the entropy for one distribution over $\{s_m\}$ stays consistently higher than for the other.
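Directions 1, 3 and 4 can be prototyped together with plug-in histogram entropies at small $L$ and a few bin counts $1/\Delta r$. The sketch below is an illustration under stated assumptions: the infinite sum is truncated at $M$ terms, $r_L$ is replaced by the mean of the truncated string, the finite-sampling correction is the standard Miller–Madow term $(K-1)/(2N)$ nats ($K$ = number of occupied bins), swapped in as one possible choice, and the correlated alternative is a hypothetical symmetric two-state Markov chain that repeats the previous bit with probability `stay`. It only tabulates the estimates; the extrapolation in $\Delta r$ from direction 1 is not attempted.

```python
import math
import random
from collections import Counter

def iid_bits(M, rng, p=0.5):
    """M i.i.d. Bernoulli(p) bits (truncated stand-in for {s_m})."""
    return [rng.random() < p for _ in range(M)]

def markov_bits(M, rng, stay=0.9):
    """Hypothetical correlated alternative: a symmetric two-state Markov chain
    that repeats the previous bit with probability `stay` (marginal 1/2)."""
    bits = [rng.random() < 0.5]
    for _ in range(M - 1):
        bits.append(bits[-1] if rng.random() < stay else not bits[-1])
    return bits

def r_vector(bits, L):
    """r_k = (1 - a_k) * sum_m a_k^m s_m on a_k = k/L, truncated at len(bits) terms;
    r_L uses the mean of the truncated string as a stand-in for the a -> 1 limit."""
    r = []
    for k in range(L):
        a = k / L
        total, power = 0.0, 1.0  # power tracks a**m, starting at a**0 = 1, so r_0 = s_0
        for bit in bits:
            if bit:
                total += power
            power *= a
        r.append((1 - a) * total)
    r.append(sum(bits) / len(bits))
    return r

def entropies(samples, n_bins):
    """Plug-in and Miller-Madow entropy (bits) of the binned vectors, bin width 1/n_bins."""
    N = len(samples)
    counts = Counter(
        tuple(min(int(x * n_bins), n_bins - 1) for x in r) for r in samples
    )
    h = -sum(c / N * math.log2(c / N) for c in counts.values())
    return h, h + (len(counts) - 1) / (2 * N * math.log(2))

rng = random.Random(0)
N, M = 1000, 100
results = {}
for name, gen in (("iid", iid_bits), ("markov", markov_bits)):
    for L in (1, 2, 3):
        samples = [r_vector(gen(M, rng), L) for _ in range(N)]
        for n_bins in (2, 4, 8):
            results[name, L, n_bins] = entropies(samples, n_bins)

for (name, L, n_bins), (h, h_mm) in sorted(results.items()):
    print(f"{name:6s} L={L} 1/dr={n_bins}: plug-in {h:.3f}, Miller-Madow {h_mm:.3f} bits")
```

Plug-in estimates that approach $\log_2 N$ (about 10 bits for $N = 1000$) indicate the undersampled regime described above, where neither estimate should be trusted and the comparison between the two distributions becomes unreliable.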