# $\Delta H(p)$ calculations

In this notebook we compare the relative populations of gaps predicted by the models $w_{g,1}(\lambda)$ to sampled populations
of gaps between primes across the intervals of survival 
$$\Delta H(p_k) = [p_k^2, p_{k+1}^2]$$
That is, we compare the actual counts of gaps in these intervals to estimates made from the $w_{g,1}(\lambda)$.

For the models $w_{g,1}(p_k^\#)=w_{g,1}(\lambda)$ of relative populations, we use $p_0=37$, so the parameter is
$$\lambda(p_k) = \prod_{p=41}^{p_k} \frac{p-3}{p-2} $$
We create an array of values of $\lambda$ associated with the primes covered by the array <i>smallprimes[]</i>,
starting at $p_0 = 37$.  The minimum value of $\lambda$ over this range is
$$ \lambda(p_k=1023101273) = 0.17990705 $$
There is an offset of 10 between the array of small primes and the array of the corresponding $\lambda$:
the entry <i>lambdaE9[i-10]</i> corresponds to <i>smallprimes[i]</i>.

This notebook prepares the data for comparing the actual populations of gaps between primes with the estimated populations
from the $w_{g,1}(\lambda)$.

This code includes 
* a function for identifying the primes and prime gaps in long intervals among large primes.
* code for summarizing the populations of these gaps by interval of survival.



In [1]:
%reset -f

import numpy as np
import array
import pickle

import gc
import psutil
import sys

import itertools


## Loading the array of primes and the array for $\lambda$
We will be comparing actual counts $N_\Delta$ of gaps in the intervals of survival
$$\Delta H(p_k) \; = \; [p_k^2,\; p_{k+1}^2],$$
across several sampled intervals of survival, to the estimated populations from the relative population models
$w_{g,1}(\lambda)$.

We load the small primes to track $p_k$, and to look up the $\lambda$ corresponding to $p_k$ we load the array <i>lambdaE9[]</i>.
For our relative population models $w_{g,1}(\lambda)=w_{g,1}(p_k^\#)$ we start at $p_0=37$, so $\lambda=1$ for $p_0=37$
and then
$$ \lambda(p_k) \; = \; \prod_{p=41}^{p_k} \frac{p-3}{p-2}$$
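
The product defining $\lambda$ can also be built in one pass with a cumulative product. The following is a minimal sketch on a toy array, not the notebook's own code: `smallprimes_demo`, `lam_demo`, `OFFSET` and `lambda_for_index` are hypothetical names, and the toy array is assumed to start at 3 so that index 10 holds $p_0=37$, matching the offset of 10 described above.

```python
import numpy as np

# Toy stand-in for smallprimes[]; assumed to start at 3 so that index 10 is 37
smallprimes_demo = np.array([3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53])

OFFSET = 10  # smallprimes_demo[OFFSET] == 37 == p_0

# factors (p-3)/(p-2) for the primes after p_0: 41, 43, 47, 53
p = smallprimes_demo[OFFSET + 1:].astype(np.float64)
factors = (p - 3.0) / (p - 2.0)

# lambda = 1 at p_0, then the running product of the factors
lam_demo = np.concatenate(([1.0], np.cumprod(factors)))

def lambda_for_index(i):
    """Return lambda for smallprimes_demo[i]; valid for i >= OFFSET."""
    return lam_demo[i - OFFSET]

for i in range(OFFSET, len(smallprimes_demo)):
    print(smallprimes_demo[i], lambda_for_index(i))
```

For the full array of about 52 million primes, a vectorized product of this kind would likely run much faster than an explicit Python loop, though the last digits may differ slightly from a loop-computed array because of floating-point rounding.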

In [2]:
# we use primesE9 as the range for pk
# The interval of survival Delta-H(pk) goes from (pk)^2 to (p{k+1})^2
smallprimes = np.load('primesE9.npy')


In [3]:
# block to check the available system memory
gc.collect()
memory = psutil.virtual_memory()
available_memory = memory.available
del memory
print(f"Available memory: {available_memory / (1024 ** 2):.2f} MB")

Available memory: 2754.77 MB


In [4]:
lensmallprimes = len(smallprimes)
maxsmallprime = smallprimes[-1]
print(f"Length primes {lensmallprimes}  maxp {maxsmallprime}={maxsmallprime:.4e} maxhorizon {maxsmallprime**2} or {(maxsmallprime**2):.4e}")
# These values= primes 51961553  maxp 1023094327 maxhorizon 1046722001939582929

Length primes 51961884  maxp 1023101273=1.0231e+09 maxhorizon 1046736214814220529 or 1.0467e+18


In [5]:
# displaying the offset between smallprimes[] and lambdaE9[]
smallprimes[10:15]

array([37, 41, 43, 47, 53])

In [6]:
try:
    lambdaE9 = np.load('lambdaE9.npy')
except FileNotFoundError:
    i=10 # offset for p=37
    j=0
    lambdaE9 = np.zeros(lensmallprimes-10)
    lambdaE9[0] = 1
    while (i < (lensmallprimes-1)):
        j += 1
        i += 1
        lambdaE9[j] = lambdaE9[j-1] * (smallprimes[i]-3)/(smallprimes[i]-2)
    np.save('lambdaE9.npy', lambdaE9)

In [7]:
print(f"lenp {lensmallprimes} maxp {smallprimes[-1]} lenlam {len(lambdaE9)} minlam {lambdaE9[-1]}")

lenp 51961884 maxp 1023101273 lenlam 51961874 minlam 0.17990704950441305


In [8]:
# finding the index for a prime whose square p^2 is near 66 trillion
# for indexing, lambdaE9[i-10] corresponds to smallprimes[i]
i=500000
while (smallprimes[i] < 8172000):
    i += 1
print(f"{i} p {smallprimes[i]} lambda {lambdaE9[i-10]:.4f}")
i = len(smallprimes)-1
print(f"last {i} p {smallprimes[i]} lambda {lambdaE9[i-10]:.4f}")
lambdaE9[-100:-1]

550553 p 8172013 lambda 0.2345
last 51961883 p 1023101273 lambda 0.1799


array([0.17990707, 0.17990707, 0.17990707, 0.17990707, 0.17990707,
       0.17990707, 0.17990707, 0.17990707, 0.17990707, 0.17990707,
       0.17990707, 0.17990706, 0.17990706, 0.17990706, 0.17990706,
       0.17990706, 0.17990706, 0.17990706, 0.17990706, 0.17990706,
       0.17990706, 0.17990706, 0.17990706, 0.17990706, 0.17990706,
       0.17990706, 0.17990706, 0.17990706, 0.17990706, 0.17990706,
       0.17990706, 0.17990706, 0.17990706, 0.17990706, 0.17990706,
       0.17990706, 0.17990706, 0.17990706, 0.17990706, 0.17990706,
       0.17990706, 0.17990706, 0.17990706, 0.17990706, 0.17990706,
       0.17990706, 0.17990706, 0.17990706, 0.17990706, 0.17990706,
       0.17990706, 0.17990706, 0.17990706, 0.17990706, 0.17990706,
       0.17990706, 0.17990706, 0.17990706, 0.17990706, 0.17990706,
       0.17990706, 0.17990706, 0.17990706, 0.17990706, 0.17990706,
       0.17990706, 0.17990706, 0.17990706, 0.17990705, 0.17990705,
       0.17990705, 0.17990705, 0.17990705, 0.17990705, 0.17990

In [8]:
# calculate MertensC, the product of the parameter lambdaE9 and ln(pk), which should be roughly constant
MertensC0 = lambdaE9[550541]*np.log(smallprimes[550551])
MertensC1 = lambdaE9[-1]*np.log(smallprimes[-1])
MertensC0, MertensC1

(np.float64(3.7323131692898377), np.float64(3.7323704160415))
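
The two products above agree to about five significant figures, which is what a Mertens-type estimate would suggest. As a brief sketch (a standard fact, given here as background rather than as part of the models): each factor of $\lambda$ splits as
$$ \frac{p-3}{p-2} \; = \; \left(1-\frac{1}{p}\right)\left(1-\frac{2}{(p-1)(p-2)}\right) $$
The product of the second factors converges, since its terms are $1-O(1/p^2)$, and by Mertens' third theorem $\prod_{p \le x}(1-1/p) \sim e^{-\gamma}/\ln x$. So
$$ \lambda(p_k) \, \ln p_k \; \longrightarrow \; C $$
for some constant $C$, and the values of MertensC0 and MertensC1 computed above are finite-range approximations to it.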

## Preparing for the comparison of sampled populations to estimates
The following code aggregates the populations of gaps from these samples within intervals of survival $\Delta H(p)$.

We store the results as a list for easy saving and loading.


In [9]:
# global variables describing the data for the figures
min_samp = 3
max_samp = 101
len_samp = 1
num_DH = 0


In [10]:

# function for a light characterization of the data in a data-sample file of primes
def blockcheck(filename):
    global min_samp
    global max_samp
    global len_samp
    global num_DH

    testsample = np.load(filename)
    min_samp = testsample[0]
    max_samp = testsample[-1]
    len_samp = len(testsample)

    i = 0
    minp = smallprimes[i]
    while (minp**2 < min_samp):
        i += 1
        minp = smallprimes[i]
    iminp = i
    minp = smallprimes[iminp]  # minp has the smallest p^2 above min_samp

    maxp = smallprimes[i]
    while (maxp**2 < max_samp):
        i += 1
        maxp = smallprimes[i]
    imaxp1 = i-1
    maxp1 = smallprimes[imaxp1] # maxp1 has the largest p^2 below max_samp

    # pick up marginal cases, where sample is a fragment of a DH
    if (imaxp1 <= iminp):  # sample is a fragment of a single DH
        # FRAGMENT - 
        # for now, terminate here.  This case is outside of the design for the interactive display
        num_DH = 0
        print(f"FRAGMENT: data sample is an incomplete fragment of DeltaH({minp})")
        return
        
    imaxp = imaxp1-1   # the start of the last interval [p^2,q^2] that fits inside the sample
    maxp = smallprimes[imaxp]

    # iminp, minp, imaxp, maxp, imaxp1, maxp1 are all associated with the array smallprimes[]
    # minp: the start of the first DH(p) in the sample
    # maxp: the start of the last DH(p) in the sample
    # maxp1: the end of the last DH(p) in the sample
    print(f"sample of {len_samp} primes from {min_samp} to {max_samp}")
    print(f"contains DeltaH from smallp[{iminp}]={minp} to smallp[{imaxp}]={maxp}")

    # the indices iDH0, iDH1, and iDH2 are associated with the array testsample[]
    iDH0 = 0
    while (testsample[iDH0] < minp**2):
        iDH0 += 1   # iDH0 marks the first prime beyond minp^2
    iDH1 = iDH0
    if (maxp > minp):
        while (testsample[iDH1] < maxp**2):
            iDH1 += 1       # advance to the first prime inside the last DH(p)
        iDH1 -= 1           # step back: iDH1 marks the last prime below maxp^2
    iDH2 = iDH1
    while (testsample[iDH2] < (smallprimes[imaxp+1]**2)):
        iDH2 += 1           # iDH2 marks the last prime inside the last DH(p)
    iDH2 -= 1

    print(f"{imaxp-iminp+1} DeltaH cover sample[{iDH0}]={testsample[iDH0]} to sample[{iDH1}]={testsample[iDH1]} ending at sample[{iDH2}]={testsample[iDH2]}")
    print(f" vs minp^2 {minp**2} and maxp^2 {maxp**2} and {(smallprimes[imaxp+1]**2)}")
    


In [11]:
# [31 May 2025] - these blocks contain approximately 50M primes each
# blockcheck('primeblockE18_1046twin.npy') # 1 DH around E18, at twin prime - this file contains over 130M primes
# blockcheck('primeblockE18_1046.npy') # fragment of one DH
# blockcheck('primeblockE17_200.npy')  # 2 DH
# blockcheck('primeblockE15_169.npy')  # one DH
# blockcheck('primeblockE14_568.npy')    # 7 DH
# blockcheck('primeblockE13_668.npy')    # 7 DH
# blockcheck('primeblockE12_742.npy')    # 18 DH
blockcheck('primeblockE11_640.npy')    # 94 DH
# blockcheck('primeblockE10_108.npy')    # 506 DH
# blockcheck('primeblockE09_900.npy')    # 881 DH
# blockcheck('primeblockE09_360.npy')    # 1021 DH

sample of 66214780 primes from 640001599543 to 641801599973
contains DeltaH from smallp[63950]=800011 to smallp[64043]=801103
94 DeltaH cover sample[588758]=640017600137 to sample[64906115]=641766016583 ending at sample[65141728]=641772425447
 vs minp^2 640017600121 and maxp^2 641766016609 and 641772425449


## Notes on samples of primes

| filename | num primes | sample $\min P$ | sample $\max P$ | $\# \Delta H$ | $\Delta H(p_0)$ | first prime in $\Delta H(p_0)$ | $\Delta H(q_k)$ | last prime in $\Delta H(q_k)$ |
| :--- | ---: | ---: | ---: | :---: | ---: | ---: | ---: | ---: |
| primeblockE09_360.npy | $55008596$ | $2400255059$ |$3600255031$ | $1021$ | $49003$ | $2401294057$ | $59981$ | $3599879993$ |
| primeblockE09_900.npy | $78902338$ | $7200254309$ |$9000254779$ | $881$ | $84857$ | $7200710471$ | $94847$ | $8996332747$ |
| primeblockE09_960.npy | $52355884$ | $8400254153$ |$9600254149$ | $557$ | $91673$ | $8403938941$ | $97967$ | $9598708711$ |
| primeblockE10_108.npy | $52072153$ | $9600254077$ | $10800254051$ | $506$ |$97987$ | $9601452193$|  $103913$ | $10799158553$ |
| primeblockE11_640.npy | $66214780$ | $6.400E11$ | $6.418E11$ | $94$ |$800011$ | $640017600137$|  $801103$ | $641772425447$ |
| primeblockE12_742.npy | $50612312$ | $7.421E12$ |$7.422E12$ | $18$ | $2724109$ | $7420769843911$ | $2724367$ | $7422230038127$ |
| primeblockE13_668.npy | $50267204$ | $6.679E13$ |$6.679E13$ | $7$ | $8172391$ | $66787974656917$ | $8172487$ | $66789543765101$ |
| primeblockE14_568.npy | $105954508$ | $5.677E14$ |$5.677E14$ | $7$ | $23826527$ | $567703388881771$ | $23826587$ | $567706819906781$ |
| primeblockE15_169.npy | $51337046$ | $1.694E15$ |$1.694E15$ | $1$ | $41161829$ | $1694296166625301$ | $41161829$ | $1694297813098769$ |
| primeblockE17_200.npy | $135549263$ | $2.00E17$ |$2.00E17$ | $2$ | $447216097$ | $200002237415913437$ | $447216101$ | $200002242782506531$ |
| primeblockE18_1046twin.npy | $130135262$ | $1.046E18$ | $1.046E18$ | $1$ |$1023094199$ | $1046721740027451601$|  $1023094201$ | $1046721744119828377$ |



## Accumulations over intervals $\Delta H(p)$
For comparison with the relative populations $w_{g,1}(p^\#)$, we accumulate the counts of gaps within the intervals of survival $\Delta H(p) = [p^2, q^2]$.
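
The accumulation step can also be sketched with vectorized NumPy calls in place of explicit `while` loops. This is a minimal illustration on a toy sample (the odd primes below 200), not the method used by createDH(): the helper names `primes_upto` and `count_gaps_by_interval` are hypothetical, and `np.searchsorted` and `np.bincount` are swapped in for the loops. It assumes the same conventions as createDH(): each gap is credited to the interval containing the prime at its upper end, and gaps are indexed by j = gap/2 - 1.

```python
import numpy as np

def primes_upto(n):
    """Simple sieve of Eratosthenes; returns all primes <= n."""
    sieve = np.ones(n + 1, dtype=bool)
    sieve[:2] = False
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = False
    return np.flatnonzero(sieve)

def count_gaps_by_interval(sample, boundary_primes):
    """Count gaps in each interval [p^2, q^2) for consecutive boundary primes.

    sample: sorted odd primes, starting strictly below the first p^2
    returns: array counts[interval, j] with j = gap/2 - 1
    """
    bounds = boundary_primes.astype(np.int64) ** 2
    # cuts[k] = index of the first sample prime at or beyond bounds[k]
    cuts = np.searchsorted(sample, bounds, side="left")
    gaps = np.diff(sample)  # gaps[i-1] is the gap ending at sample[i]
    numgaps = int(gaps[cuts[0] - 1:cuts[-1] - 1].max()) // 2
    counts = np.zeros((len(bounds) - 1, numgaps), dtype=np.int64)
    for k in range(len(bounds) - 1):
        g = gaps[cuts[k] - 1:cuts[k + 1] - 1]  # gaps ending inside interval k
        counts[k] = np.bincount(g // 2 - 1, minlength=numgaps)
    return counts

sample = primes_upto(200)[1:]            # drop 2 so that every gap is even
boundary_primes = np.array([5, 7, 11, 13])
counts = count_gaps_by_interval(sample, boundary_primes)
print(counts)
```

Each row of `counts` plays the role of one row of the DelH array: the row sum is the number of sample primes in that interval, and column j holds the count of gaps of size 2(j+1).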

In [12]:
# now the processing function to create the DH(p) data over the data sample from file
# createDH returns a list of 3 items:
#  [0]: samp0 = the smallest prime inside the first interval of survival for the data sample
#  [1]: [iminp, minp] = data for the first smallprime[] associated with the intervals of survival
#  [2]: DelH[numintervals,numgaps] = the counts of gaps, indexed j=int(gap/2)-1, across the intervals DH(p)
def createDH(filename):
    global min_samp
    global max_samp
    global len_samp
    global num_DH

    testsample = np.load(filename)
    min_samp = testsample[0]
    max_samp = testsample[-1]
    len_samp = len(testsample)

    i = 0
    minp = smallprimes[i]
    while (minp**2 < min_samp):
        i += 1
        minp = smallprimes[i]
    iminp = i
    minp = smallprimes[iminp]  # minp has the smallest p^2 above min_samp

    maxp = smallprimes[i]
    while (maxp**2 < max_samp):
        i += 1
        maxp = smallprimes[i]
    imaxp1 = i-1
    maxp1 = smallprimes[imaxp1] # maxp1 has the largest p^2 below max_samp

    # check for degenerate cases: the sample is a fragment of a single DH
    if (imaxp1 <= iminp):
        # FRAGMENT: no complete interval of survival fits inside the sample.
        # As in blockcheck(), terminate here; this case is outside of the current design.
        num_DH = 0
        print(f"FRAGMENT: data sample is an incomplete fragment of DeltaH({minp})")
        return None
        
    imaxp = imaxp1-1   # the start of the last interval [p^2,q^2] that fits inside the sample
    maxp = smallprimes[imaxp]

    # iminp, minp, imaxp, maxp, imaxp1, maxp1 are all associated with the array smallprimes[]
    # minp: the start of the first DH(p) in the sample
    # maxp: the start of the last DH(p) in the sample
    # maxp1: the end of the last DH(p) in the sample
    print(f"sample of {len_samp} primes from {min_samp} to {max_samp}")
    print(f"contains DeltaH from {iminp}:{minp} to {imaxp}:{maxp}")

    # the indices iDH0, iDH1, and iDH2 are associated with the array testsample[]
    iDH0 = 0
    while (testsample[iDH0] < minp**2):
        iDH0 += 1   # iDH0 marks the first prime beyond minp^2
    iDH1 = iDH0
    if (maxp > minp):
        while (testsample[iDH1] < maxp**2):
            iDH1 += 1       # advance to the first prime inside the last DH(p)
        iDH1 -= 1           # step back: iDH1 marks the last prime below maxp^2
    iDH2 = iDH1
    while (testsample[iDH2] < (smallprimes[imaxp+1]**2)):
        iDH2 += 1           # iDH2 marks the last prime inside the last DH(p)
    iDH2 -= 1

    # We have indices for the array of smallprimes[] and the array of testsample[]
    # Record the starting sample-prime and create the array of gaps
    samp0 = testsample[iDH0]
    # iDH0 >= 1 always: testsample[0] = min_samp is prime, so it lies strictly below minp^2
    if (iDH0 > 0):
        gapsample = testsample[iDH0:(iDH2+1)] - testsample[(iDH0-1):iDH2]
    # indexing across the arrays is gapsample[i] = testsample[i+iDH0] - testsample[i-1+iDH0]
    len_gaps = len(gapsample)
    numintervals = imaxp1 - iminp

    # create the DelH 2-d array
    maxgap = np.max(gapsample)
    numgaps = int(maxgap/2)
    DelH = np.zeros((numintervals, numgaps), dtype=int)
    
    # accumulate counts across gapsample
    ipk=iminp+1  # index in smallprimes[] to mark the boundaries for the DH(p)
    iDH = 0      # index in DelH[] for interval of survival
    jg = 0       # index in gapsample[] for counting gaps
    jp = iDH0    # index in testsample[] for tracking progress toward the boundary of DH(p)
    
    while (ipk <= imaxp1):  # outer loop - work across the smallprimes, each DH
        bnd = (smallprimes[ipk])**2  # q^2 value marking end of this DH(p)
        print(f"DH({ipk}) toward DH({imaxp}) and jp {jp}:{testsample[jp]} < bnd {bnd}", end='\r')
        
        while (testsample[jp] < bnd):  # for each DH, count the gaps
            j = int(gapsample[jg]/2)-1
            DelH[iDH][j] += 1
            jp += 1
            jg += 1
    
        ipk += 1
        iDH += 1

    DHlist = [samp0, [iminp, minp], DelH]
    return DHlist


In [13]:
DelHlist = createDH('primeblockE11_640.npy')

sample of 66214780 primes from 640001599543 to 641801599973
contains DeltaH from 63950:800011 to 64043:801103
DH(64044) toward DH(64043) and jp 64906116:641766016789 < bnd 641772425449

In [14]:
DelHlist[0]

np.int64(640017600137)

In [15]:
DelHlist[1]

[63950, np.int64(800011)]

In [16]:
DHsample = DelHlist[2]
DHsample.shape

(94, 204)

In [17]:
DHsample[:,0:6]

array([[ 51347,  51343,  94704,  43599,  56875,  76401],
       [ 68762,  68363, 126222,  57957,  75308, 102613],
       [ 11318,  11502,  21031,   9780,  12510,  17057],
       [ 57032,  57141, 105917,  48682,  63117,  84971],
       [ 17098,  17003,  31459,  14611,  19033,  25348],
       [ 17083,  17159,  31677,  14310,  19231,  25546],
       [ 68504,  68465, 126382,  58256,  75898, 101930],
       [ 11321,  11573,  21078,   9782,  12531,  16960],
       [  5841,   5645,  10391,   4654,   6342,   8478],
       [ 11438,  11437,  21114,   9670,  12590,  16876],
       [ 22754,  22968,  42158,  19228,  25129,  33964],
       [ 34356,  34399,  63072,  28921,  37896,  50836],
       [ 45935,  45582,  84303,  38827,  50603,  68026],
       [  5616,   5760,  10431,   4767,   6294,   8551],
       [ 28529,  28720,  52832,  24450,  31353,  42612],
       [108345, 108610, 199561,  92135, 119743, 161089],
       [ 11451,  11433,  21091,   9519,  12712,  16847],
       [ 22958,  22758,  42220,

## Saving and reading the tabled data
Sample code for saving the returned list to file and reading it back from file.  We use pickle.dump() and pickle.load().

To keep the samples organized we use a naming convention 'DHExx_nnn' where xx is the exponent in scientific notation and nnn is the 
coefficient, three or more digits written without a decimal.
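
One way to guard against clobbering an existing data file is to open it in exclusive-creation mode. The sketch below is only an illustration under that assumption: `save_dh` and `load_dh` are hypothetical helper names, the list `toy` is a small stand-in for the list returned by createDH(), and the file is written to a temporary directory.

```python
import os
import pickle
import tempfile

import numpy as np

def save_dh(dhlist, path):
    """Pickle dhlist to path, refusing to overwrite an existing file."""
    # mode "xb" is exclusive creation: open() raises FileExistsError if path exists
    with open(path, "xb") as fp:
        pickle.dump(dhlist, fp)

def load_dh(path):
    """Read a pickled list back from path."""
    with open(path, "rb") as fp:
        return pickle.load(fp)

# toy stand-in for [samp0, [iminp, minp], DelH]
toy = [640017600137, [63950, 800011], np.zeros((2, 3), dtype=int)]

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "DHE11_640_demo")

save_dh(toy, path)
back = load_dh(path)

# a second save to the same path is refused rather than overwriting
try:
    save_dh(toy, path)
    clobbered = True
except FileExistsError:
    clobbered = False

print(back[0], back[1], back[2].shape, clobbered)
```

The `with` statement closes the file on exit, so no separate close() call is needed.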

In [18]:
# File save - CAUTION - do not clobber existing data files
# replace 'DHExx_nnn' with the name for this sample, e.g. 'DHE11_640'
with open('DHExx_nnn',"wb") as fp:
    pickle.dump(DelHlist, fp)

In [19]:
# check read from pickled file
with open('DHE11_640',"rb") as fp2:
    DHff = pickle.load(fp2)

In [25]:
DHff[0], DHff[1][0], DHff[1][1]


(np.int64(640017600137), 63950, np.int64(800011))

In [21]:
DHff[2][:,0:6]

array([[ 51347,  51343,  94704,  43599,  56875,  76401],
       [ 68762,  68363, 126222,  57957,  75308, 102613],
       [ 11318,  11502,  21031,   9780,  12510,  17057],
       [ 57032,  57141, 105917,  48682,  63117,  84971],
       [ 17098,  17003,  31459,  14611,  19033,  25348],
       [ 17083,  17159,  31677,  14310,  19231,  25546],
       [ 68504,  68465, 126382,  58256,  75898, 101930],
       [ 11321,  11573,  21078,   9782,  12531,  16960],
       [  5841,   5645,  10391,   4654,   6342,   8478],
       [ 11438,  11437,  21114,   9670,  12590,  16876],
       [ 22754,  22968,  42158,  19228,  25129,  33964],
       [ 34356,  34399,  63072,  28921,  37896,  50836],
       [ 45935,  45582,  84303,  38827,  50603,  68026],
       [  5616,   5760,  10431,   4767,   6294,   8551],
       [ 28529,  28720,  52832,  24450,  31353,  42612],
       [108345, 108610, 199561,  92135, 119743, 161089],
       [ 11451,  11433,  21091,   9519,  12712,  16847],
       [ 22958,  22758,  42220,

In [27]:
DHff[2].shape, DHff[2].shape[0], DHff[2].shape[1]

((94, 204), 94, 204)

In [23]:
lambdaE9[(DHff[1][0])-10]

np.float64(0.27457054356560884)

### Lambda values at samples

| range label | iminp | minp | $\lambda(minp)$ | first prime in $\Delta H(minp)$ |
| :--- | ---: | ---: | :---: | ---: |
| E09_360 | $5034$ | $49003$ | $0.3455065$ | $2401294057$ |
| E09_900 | $8264$ | $84857$ | $0.3287857$ | $7200710471$ |
| E10_108 | $9416$ | $97987$ | $0.3246621$ | $9601452193$ |
| E11_640 | $63950$ | $800011$ | $0.2745705$ | $640017600137$ |
| E12_742 | $198285$ | $2724109$ | $0.2518756$ | $7420769843911$ |
| E14_568 | $1496935$ | $23826527$ | $0.2197262$ | $567703388881771$ |
| E15_169 | $2500003$ | $41161829$ | $0.2128746$ | $1694296166625301$ |
| E17_200 | $23713133$ | $447216097$ | $0.1873815$ | $200002237415913437$ |
| E18_1046 | $51961545$ | $1023094199$ | $0.1799071$ | $1046721740027451601$ |