## 1 Introduction

This tutorial introduces the basic features for simulating titratable systems via the constant pH method.
The constant pH method is one of the methods implemented for simulating systems with chemical reactions within the Reaction Ensemble module. It is a Monte-Carlo method designed to model an acid-base ionization reaction at a given (fixed) value of solution pH.

We will consider a homogeneous aqueous solution of a titratable acidic species $\mathrm{HA}$ that can dissociate in a reaction characterized by the equilibrium constant $K_A$, with $\mathrm{p}K_A=-\log_{10} K_A$:
$$\mathrm{HA} \Leftrightarrow \mathrm{A}^- + \mathrm{H}^+$$


If $N_0 = N_{\mathrm{HA}} + N_{\mathrm{A}^-}$ is the number of titratable groups in solution, then we define the degree of dissociation $\alpha$ as:

$$\alpha = \dfrac{N_{\mathrm{A}^-}}{N_0}.$$

This is one of the key quantities that can be used to describe the acid-base equilibrium. Usually, the goal of the simulation is to predict the value of $\alpha$ under given conditions in a complex system with interactions. In this tutorial, we will simulate only ideal systems (without intermolecular interactions) to show that in such case our simulations reproduce the well-known analytical solutions.

### 1.1 The Chemical Equilibrium and Reaction Constant

The equilibrium reaction constant describes the chemical equilibrium of a given reaction. The values of equilibrium constants for various reactions can be found in tables. For the acid-base ionization reaction, the equilibrium constant is conventionally called the acidity constant, and it is defined as
\begin{equation}
K_A = \frac{a_{\mathrm{H}^+} a_{\mathrm{A}^-} } {a_{\mathrm{HA}}}
\end{equation}
where $a_i$ is the activity of species $i$. It is related to the chemical potential $\mu_i$ and to the concentration $c_i$:

\begin{equation}
\mu_i = \mu_i^\mathrm{ref} + k_{\mathrm{B}}T \ln a_i
\,,\qquad
a_i = \frac{c_i \gamma_i}{c^{\ominus}}
\end{equation}
where $\gamma_i$ is the activity coefficient, $\mu_i^\mathrm{ref}$ is the reference chemical potential, and $c^{\ominus}$ is the (arbitrary) reference concentration, often chosen to be the standard concentration, $c^{\ominus} = 1\,\mathrm{mol\,dm^{-3}}$.
Note that $K_A$ is a dimensionless quantity but its numerical value depends on the choice of $c^{\ominus}$.
For an ideal system, $\gamma_i=1$ by definition, whereas for an interacting system $\gamma_i$ is a non-trivial function of the interactions. For an ideal system we can rewrite $K_A$ in terms of equilibrium concentrations
\begin{equation}
K_A \overset{\mathrm{ideal}}{=} \frac{c_{\mathrm{H}^+} c_{\mathrm{A}^-} } {c_{\mathrm{HA}} c^{\ominus}}
\end{equation}

The degree of dissociation $\alpha$, also called the ionization degree, can be expressed via the ratio of concentrations:
\begin{equation}
\alpha 
= \frac{N_{\mathrm{A}^-}}{N_0} 
= \frac{N_{\mathrm{A}^-}}{N_{\mathrm{HA}} + N_{\mathrm{A}^-}}
= \frac{c_{\mathrm{A}^-}}{c_{\mathrm{HA}}+c_{\mathrm{A}^-}}
= \frac{c_{\mathrm{A}^-}}{c_{0}}.
\end{equation}
where $c_0=c_{\mathrm{HA}}+c_{\mathrm{A}^-}$ is the total concentration of titratable groups irrespective of their ionization state.
Then, we can characterize the acid-base ionization equilibrium using the ionization degree and pH, defined as
\begin{equation}
\mathrm{pH} = -\log_{10} a_{\mathrm{H^{+}}} \overset{\mathrm{ideal}}{=} -\log_{10} (c_{\mathrm{H^{+}}} / c^{\ominus})
\end{equation}
Substituting for the ionization degree and pH into the expression for $K_A$ we obtain the Henderson-Hasselbalch equation
\begin{equation}
\mathrm{pH}-\mathrm{p}K_A = \log_{10} \frac{\alpha}{1-\alpha}
\end{equation}
One important implication of the Henderson-Hasselbalch equation is that at a fixed pH value the ionization degree of an ideal acid is independent of concentration. Another important implication is that it does not depend on the absolute values of $\mathrm{p}K_A$ or $\mathrm{pH}$, but only on the difference, $\mathrm{pH}-\mathrm{p}K_A$.
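
The substitution mentioned above can be written out explicitly. For the ideal case, inserting $c_{\mathrm{A}^-} = \alpha c_0$ and $c_{\mathrm{HA}} = (1-\alpha) c_0$ into the expression for $K_A$ gives
\begin{equation}
K_A \overset{\mathrm{ideal}}{=} \frac{c_{\mathrm{H}^+}}{c^{\ominus}} \, \frac{\alpha c_0}{(1-\alpha) c_0}
= \frac{c_{\mathrm{H}^+}}{c^{\ominus}} \, \frac{\alpha}{1-\alpha}
\end{equation}
The total concentration $c_0$ cancels, which is why the ideal ionization degree does not depend on concentration. Taking $-\log_{10}$ of both sides yields $\mathrm{p}K_A = \mathrm{pH} - \log_{10} \frac{\alpha}{1-\alpha}$, and solving for $\alpha$ gives
\begin{equation}
\alpha = \frac{1}{1 + 10^{\,\mathrm{p}K_A - \mathrm{pH}}}
\end{equation}
As a quick check, $\mathrm{pH} = \mathrm{p}K_A$ gives $\alpha = 1/2$, and $\mathrm{pH} - \mathrm{p}K_A = 1$ gives $\alpha = 10/11 \approx 0.91$.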

### 1.2 Constant pH Method

The constant pH method [Reed1992] is designed to simulate an acid-base ionization reaction at a given pH. It assumes that the simulated system is coupled to an implicit reservoir of $\mathrm{H^+}$ ions, but the exchange of ions with this reservoir is not explicitly simulated. Therefore, the number of $\mathrm{H}^+$ ions in the simulation box does not correspond to the chosen pH. This may lead to artifacts when simulating interacting systems, especially at high or low pH values. Discussion of these artifacts is beyond the scope of this tutorial (see e.g. [Landsgesell2019] for further details).

In ESPResSo, the forward step of the ionization reaction (from left to right) is implemented by changing the chemical identity (particle type) of a randomly selected $\mathrm{HA}$ particle to $\mathrm{A}^-$, and inserting another particle that represents $\mathrm{H}^+$. In the reverse direction (from right to left), the chemical identity (particle type) of a randomly selected $\mathrm{A}^{-}$ is changed to $\mathrm{HA}$, and a randomly selected $\mathrm{H}^+$ is deleted from the simulation box. The probability of proposing the forward reaction step is $P_\text{prop}=N_\mathrm{HA}/N_0$, and the probability of proposing the reverse step is $P_\text{prop}=N_{\mathrm{A}^-}/N_0$. The trial move is accepted with the acceptance probability

$$ P_{\mathrm{acc}} = \operatorname{min}\left(1, \exp\left(-\beta \Delta E_\mathrm{pot} \pm \ln(10) \cdot (\mathrm{pH} - \mathrm{p}K_A) \right) \right)$$

Here $\Delta E_\text{pot}$ is the potential energy change due to the reaction, $\beta = 1/(k_{\mathrm{B}}T)$, and $\mathrm{pH} - \mathrm{p}K_A$ is an input parameter.
The signs $\pm$ correspond to the forward and reverse direction of the ionization reaction, respectively.
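
To make the acceptance rule more concrete, here is a minimal stand-alone sketch of a constant-pH Monte-Carlo loop for an ideal system. It does not use ESPResSo. It assumes that the exponent of the acceptance probability is $-\beta \Delta E_\mathrm{pot} \pm \ln(10)\,(\mathrm{pH} - \mathrm{p}K_A)$ and that $\Delta E_\mathrm{pot}=0$. The function names `acceptance_probability` and `simulate_ideal_alpha` are hypothetical and chosen only for this illustration. Picking a random titratable group and trying to flip its state reproduces the proposal probabilities $N_\mathrm{HA}/N_0$ and $N_{\mathrm{A}^-}/N_0$.

```python
import math
import random


def acceptance_probability(delta_e_pot, pH, pK, forward, beta=1.0):
    """Acceptance probability of one constant-pH trial move (illustrative)."""
    sign = 1.0 if forward else -1.0
    exponent = -beta * delta_e_pot + sign * math.log(10.0) * (pH - pK)
    # min(1, exp(x)) written so that exp() cannot overflow for large x
    return 1.0 if exponent >= 0.0 else math.exp(exponent)


def simulate_ideal_alpha(pH, pK, n_groups=20, n_steps=200000, seed=42):
    """Time-averaged ionization degree of n_groups non-interacting acid groups."""
    rng = random.Random(seed)
    ionized = [False] * n_groups  # start with all groups in the HA state
    n_ionized = 0
    alpha_sum = 0.0
    for _ in range(n_steps):
        i = rng.randrange(n_groups)   # random titratable group
        forward = not ionized[i]      # HA -> A- if associated, else A- -> HA
        # ideal system: no interactions, hence delta_e_pot = 0
        if rng.random() < acceptance_probability(0.0, pH, pK, forward):
            ionized[i] = forward
            n_ionized += 1 if forward else -1
        alpha_sum += n_ionized / n_groups
    return alpha_sum / n_steps


if __name__ == "__main__":
    pK = 4.88
    for pH in (pK - 1.0, pK, pK + 1.0):
        ideal = 1.0 / (1.0 + 10.0 ** (pK - pH))
        print("pH - pK = {:+.1f}: simulated alpha = {:.3f}, ideal alpha = {:.3f}".format(
            pH - pK, simulate_ideal_alpha(pH, pK), ideal))
```

Because the acceptance probability of the ideal system contains no reference to the number of $\mathrm{H}^+$ ions, the sketch does not need to track them at all.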



## 2 Setup
We start by defining the simulation parameters and creating a system instance. The box length is not chosen directly; it is computed from the acid concentration and the number of titratable units, `N_acid`. We set the acidity constant to $\mathrm{p}K_A=4.88$, which is the value for propionic acid. We choose propionic acid because it resembles the repeating unit of polyacrylic acid.
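
The setup cell relies on the `SIunits` helper class to convert concentrations from mol/L to particles per $\sigma^3$. As a rough, stand-alone sketch of the arithmetic involved (an assumption about what the conversion amounts to, not the helper's actual implementation), the number density and the box length could be obtained as follows. The function names are hypothetical.

```python
AVOGADRO = 6.02214076e23  # particles per mole (exact SI value)


def mol_per_litre_to_per_sigma3(c_mol_per_l, sigma_in_nm):
    """Convert a molar concentration to a number density in units of 1/sigma^3."""
    litres_per_nm3 = 1e-24  # 1 nm^3 = 1e-27 m^3 = 1e-24 L
    particles_per_nm3 = c_mol_per_l * AVOGADRO * litres_per_nm3
    return particles_per_nm3 * sigma_in_nm**3


def box_length_in_sigma(n_particles, number_density):
    """Edge length of a cubic box that holds n_particles at the given density."""
    return (n_particles / number_density) ** (1.0 / 3.0)


# same input values as in the setup cell of this tutorial
c_acid = mol_per_litre_to_per_sigma3(0.001, sigma_in_nm=0.355)
c_salt = mol_per_litre_to_per_sigma3(0.002, sigma_in_nm=0.355)
box_l = box_length_in_sigma(20, c_acid)
# rounding guards against floating-point values just below an integer
n_salt = round(c_salt * box_l**3)
print("c_acid = {:.4e} / sigma^3, box_l = {:.2f} sigma, n_salt = {}".format(
    c_acid, box_l, n_salt))
```

Since the salt concentration is twice the acid concentration, the number of salt ion pairs comes out as twice the number of acid groups.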

In [None]:
import numpy as np
import matplotlib.pyplot as plt

import espressomd
from espressomd import reaction_ensemble

# class for working with SI units (this example works only with a limited set of units)
from si_units import SIunits
# Initialize the class by supplying the SI values of some internal units
# sigma=0.355 nm is a commonly used particle size in coarse-grained simulations
SI=SIunits(sigma_in_nm=0.355, T_in_K=293.15);

# System parameters
#############################################################
C_acid_SI = 0.001 # concentration of titratable units in mol/L
C_salt_SI = 0.002 # additional salt to control the ionic strength
# Using the constant-pH method is safe if Ionic_strength > max(10**(-pH), 10**(-pOH) ) and C_salt > C_acid
Bjerrum_SI=0.713; # Bjerrum length in nm in water at 300K
# concentration in the example is arbitrary (see Henderson-Hasselbalch equation)
N_acid = 20  # number of titratable units in the box

PROB_REACTION=0.5; # select the reaction move with 50% probability 
# probability of the reaction is adjustable parameter of the method that affects the speed of convergence

# Simulate an interacting system with steric repulsion (Warning: it will be slower than without WCA!)
USE_WCA = False;
# Simulate an interacting system with electrostatics (Warning: it will be very slow!)
USE_ELECTROSTATICS = False;

# particle types of different species
TYPE_HA = 0; q_HA = 0
TYPE_A  = 1; q_A  =-1
TYPE_H  = 2; q_H  =+1
TYPE_Na = 3; q_Na =+1
TYPE_Cl = 4; q_Cl =-1

# acidity constant
pK = 4.88
K = 10**(-pK)
offset=2.0; # range of pH values to be used pK +/- offset
num_pHs=15 # number of pH values

# dependent parameters
c_acid_sigma=SI.convert(C_acid_SI,'mol/L','N/sigma^3'); # concentration in number of particles per sigma**3
c_salt_sigma=SI.convert(C_salt_SI,'mol/L','N/sigma^3'); # concentration in number of particles per sigma**3
Bjerrum_sigma=SI.convert(Bjerrum_SI,'nm','sigma'); # Bjerrum 
BOX_L= (N_acid/c_acid_sigma)**(1/3.); # box length in units of sigma
print ("c_acid_sigma: {}, c_salt_sigma: {}, BOX_L: {} , Bjerrum_sigma: {}".format(c_acid_sigma, c_salt_sigma,BOX_L,Bjerrum_sigma))
N_salt = int(round(c_salt_sigma*(BOX_L**3))); # number of salt ion pairs in the box (rounded to avoid floating-point truncation)
print ("N_acid: {}, N_salt: {}".format(N_acid, N_salt))

n_blocks=16  # number of blocks to be used in data analysis
desired_block_size = 10  # desired number of samples per block
num_samples = int(n_blocks * desired_block_size / PROB_REACTION); # number of reaction samples per each pH value
pHmin=pK-offset # lowest pH value to be used
pHmax=pK+offset # highest pH value to be used
pHs = np.linspace(pHmin, pHmax, num_pHs); # list of pH values
  
# Initialize the Espresso system
##############################################
system = espressomd.System(box_l=[BOX_L] * 3)
system.set_random_state_PRNG() # initialize random number generator in the Espresso system
system.time_step = 0.01
system.cell_system.skin = 0.4
system.thermostat.set_langevin(kT=1.0, gamma=1.0, seed=7)
np.random.seed(seed=10) # initialize the random number generator in numpy



After defining the simulation parameters, we set up the system that we want to simulate. It is a polyelectrolyte chain with some added salt that is used to control the ionic strength of the solution. For the first run, we set up the system without any steric repulsion and without electrostatic interactions. In the next runs, we will add the steric repulsion and electrostatic interactions to observe their effect on the ionization.

In [None]:
# create the particles
##################################################
# we need to define bonds before creating polymers
from espressomd.interactions import HarmonicBond
hb= HarmonicBond(k=30, r_0=1.0)
system.bonded_inter.add(hb)

# create the polymer composed of ionizable acid groups, initially in the ionized state
from espressomd import polymer
polymers = polymer.positions(n_polymers=1,
                             beads_per_chain=N_acid,
                             bond_length=0.9, seed=23)
for p in polymers:
    for i, m in enumerate(p):
        part_id = len(system.part)
        system.part.add(id=part_id, pos=m, type=TYPE_A, q=q_A)
        if i > 0:
            system.part[part_id].add_bond((hb, part_id - 1))

# add the corresponding number of H+ ions
for i in range(N_acid):
    system.part.add(pos=np.random.random(3)*BOX_L, type=TYPE_H, q=q_H)

# add salt ion pairs
for i in range(N_salt):
    system.part.add(pos=np.random.random(3)*BOX_L, type=TYPE_Na, q=q_Na)
    system.part.add(pos=np.random.random(3)*BOX_L, type=TYPE_Cl, q=q_Cl)

# set up the WCA interaction between all particle pairs
if(USE_WCA):
    types=[TYPE_HA, TYPE_A, TYPE_H, TYPE_Na, TYPE_Cl]
    for type_1 in types:
        for type_2 in types:
            system.non_bonded_inter[type_1, type_2].lennard_jones.set_params(
                epsilon=1.0, sigma=1.0,
                cutoff=2**(1.0 / 6), shift="auto")

# run a steepest descent minimization to relax overlaps
system.integrator.set_steepest_descent(
    f_max=0, gamma=0.1, max_displacement=0.1)
system.integrator.run(20)
system.integrator.set_vv()  # to switch back to velocity verlet

# run a short integration before using the electrostatics
system.integrator.run(steps=1000)

# if needed, set up and tune the Coulomb interaction
if(USE_ELECTROSTATICS):
    print ("set up and tune p3m, please wait....");
    from espressomd import electrostatics
    p3m = electrostatics.P3M(prefactor=Bjerrum_sigma, accuracy=1e-3)
    system.actors.add(p3m)
    p3m_params = p3m.get_params()
    for key in list(p3m_params.keys()):
        print("{} = {}".format(key, p3m_params[key]))
    print ("p3m, tuning done");

In the first step, we initialize the reaction ensemble, by setting the temperature, exclusion radius and seed of the random number generator. Because we are simulating an ideal system, 
we set the temperature to an arbitrary value. In an interacting system the exclusion radius ensures that particle insertions too close to other particles are not attempted. Such insertions would make the subsequent Langevin dynamics integration unstable. If the particles are not interacting, we can set the exclusion radius to $0.0$. Otherwise, $1.0$ is a good value. We set the seed to a constant value to ensure reproducible results.

In [None]:
RE = reaction_ensemble.ConstantpHEnsemble(
        temperature=1, exclusion_radius=1.0, seed=77)

The next step is to define the reaction. The seed of the pseudo-random number generator used for the Monte-Carlo steps has already been set when the reaction ensemble was created.
The order in which species are written in the lists of reactants and products is very important. When a reaction move is performed, the identity of the first species in the list of reactants is changed to the first species in the list of products, the second reactant species is changed to the second product species, and so on. If the reactant list has more species than the product list, then the excess reactant species are deleted from the system. If the product list has more species than the reactant list, then the excess product species are created and randomly placed inside the simulation box. This is especially relevant if some of the species belong to a chain-like molecule.

In the example below, the order of reactants and products ensures that the identity of $\mathrm{HA}$ is changed to $\mathrm{A^{-}}$ and vice versa, while $\mathrm{H^{+}}$ is inserted/deleted in the reaction move. Reversing the order of products in our reaction (i.e. from `product_types=[TYPE_A, TYPE_H]` to `product_types=[TYPE_H, TYPE_A]`) would result in a reaction move where the identity of $\mathrm{HA}$ would be changed to $\mathrm{H^{+}}$, while $\mathrm{A^{-}}$ would be inserted/deleted at a random position in the box.
We also assign charges to each type because in general the charge will play a role in simulations with electrostatic interactions. As an easy task for the interested reader we propose to adapt the tutorial to account for electrostatic interactions. Therefore we keep these values for the needed charge assignments in place, although they are not needed for an ideal system.
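
The ordering rule described above can be illustrated with a small stand-alone sketch. The hypothetical function `plan_reaction_move` is not part of ESPResSo. It only lists which elementary operations one forward move would consist of for given reactant and product lists, assuming all stoichiometric coefficients equal 1.

```python
def plan_reaction_move(reactant_types, product_types):
    """List the elementary operations of one forward reaction move.

    Illustrative only: assumes every stoichiometric coefficient is 1.
    """
    operations = []
    n_paired = min(len(reactant_types), len(product_types))
    # species at the same list position are converted into each other
    for reactant, product in zip(reactant_types, product_types):
        operations.append(("change", reactant, product))
    # excess reactants are deleted from the system
    for reactant in reactant_types[n_paired:]:
        operations.append(("delete", reactant))
    # excess products are created at random positions
    for product in product_types[n_paired:]:
        operations.append(("create", product))
    return operations


# order used in this tutorial: HA becomes A-, H+ is inserted
print(plan_reaction_move(["HA"], ["A", "H"]))
# reversed product order: HA becomes H+, A- is inserted at a random position
print(plan_reaction_move(["HA"], ["H", "A"]))
```

With the reversed product order, the titratable group that is bonded into the chain would turn into a free $\mathrm{H^+}$ ion, which is why the order matters for chain-like molecules.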

In [None]:
RE.add_reaction(gamma=K, reactant_types=[TYPE_HA], reactant_coefficients=[1],
                product_types=[TYPE_A, TYPE_H], product_coefficients=[1, 1],
                default_charges={TYPE_HA: q_HA, TYPE_A: q_A, TYPE_H: q_H})
print(RE.get_status())

Next we perform simulations at different pH values. The system must be equilibrated at each pH before taking samples.
Calling `RE.reaction(X)` attempts in total `X` reactions (in both the forward and the reverse direction).
After finishing each pH value, we print the measured average number of $\mathrm{A}^-$ and compare it with the ideal result.

In [None]:
# the reference data from Henderson-Hasselbalch equation
def ideal_alpha(pH, pK):
    return 1. / (1 + 10**(pK - pH))

# empty lists as placeholders for collecting data
numAs = []; # number of A- species observed at each sample

# run a production simulation and collect the data
print("Simulated pH values: ",pHs)
for pH in pHs:
    print("Run pH {:.2f} ...".format(pH))
    RE.constant_pH = pH
    numAs_current = []; # temporary data storage for a given pH
    RE.reaction(20*N_acid + 1) # pre-equilibrate to the new pH value
    for i in range(num_samples):
        if(np.random.random()<PROB_REACTION): 
            RE.reaction(N_acid + 1) # should be at least one reaction attempt per particle
        else:
            system.integrator.run(steps=1000);
        numAs_current.append(system.number_of_particles(type=TYPE_A))
        #print ("NP: A {}, HA {}".format(system.number_of_particles(type=TYPE_A),system.number_of_particles(type=TYPE_HA)))
    numAs.append(numAs_current) #
    print("measured number of A-: {0:.2f}, (ideal: {1:.2f})".format(np.mean(numAs_current),N_acid*ideal_alpha(pH,pK)))
print("finished")

## 3 Results

Finally we plot our results and compare them to the analytical results obtained from the Henderson-Hasselbalch equation.

### 3.1 Statistical Uncertainty

The molecular simulation produces a sequence of snapshots of the system, that 
constitute a Markov chain. It is a sequence of realizations of a random process, where
the next value in the sequence depends on the preceding one. Therefore,
the subsequent values are correlated. To estimate statistical error of the averages
determined in the simulation, one needs to correct for the correlations.

Here, we will use a rudimentary way of correcting for correlations, termed the binning method.
We refer the reader to specialized literature for a more sophisticated discussion [Janke2002]. The general idea is to group a long sequence of correlated values into a rather small number of blocks, and compute an average for each block. If the blocks are big enough, they
can be considered uncorrelated, and one can apply the formula for the standard error of the mean of uncorrelated values. If the number of blocks is small, then they are uncorrelated but the obtained error estimates have a high uncertainty. If the number of blocks is high, then they are too short to be uncorrelated, and the obtained error estimates are systematically lower than the correct value. Therefore, the method works well only if the sample size is much greater than the autocorrelation time.

In the example below, we use a fixed number of 16 blocks to obtain the error estimates. To check for consistency, we estimate the autocorrelation time, and print a warning message if some blocks contain fewer than 10 uncorrelated values. Intentionally, we make our simulation slightly too short, so that it does not produce enough uncorrelated samples. We encourage the reader to vary the number of blocks or the number of samples to see how the estimated error changes with these parameters.
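
To see why the correction for correlations matters, the following stand-alone sketch applies the binning idea to a synthetic correlated series. The series is a first-order autoregressive process, which is only an assumed stand-in for simulation data. The function names are hypothetical and independent of the analysis cell of this tutorial.

```python
import numpy as np


def naive_error(samples):
    """Standard error of the mean that ignores correlations."""
    samples = np.asarray(samples, dtype=float)
    return np.sqrt(np.var(samples, ddof=1) / len(samples))


def binning_error(samples, n_blocks=16):
    """Standard error of the mean estimated from block averages."""
    samples = np.asarray(samples, dtype=float)
    block_size = len(samples) // n_blocks
    trimmed = samples[:n_blocks * block_size]  # drop the incomplete last block
    block_means = trimmed.reshape(n_blocks, block_size).mean(axis=1)
    return np.sqrt(np.var(block_means, ddof=1) / n_blocks)


def correlated_series(n_samples, phi=0.9, seed=1):
    """Synthetic AR(1) series x[i] = phi * x[i-1] + noise[i]."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(size=n_samples)
    x = np.empty(n_samples)
    x[0] = noise[0]
    for i in range(1, n_samples):
        x[i] = phi * x[i - 1] + noise[i]
    return x


series = correlated_series(16000)
print("naive error:  ", naive_error(series))
print("binning error:", binning_error(series))
```

For a strongly correlated series the naive estimate is clearly smaller than the binning estimate, whereas for uncorrelated data (`phi=0`) the two should roughly agree.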

In [None]:
# statistical analysis of the results
def block_analyze(input_data,n_blocks=16):
    # statistical analysis of the results
    data=np.array(input_data)
    block=0;
    # this number of blocks is recommended by Janke as a reasonable compromise
    # between the conflicting requirements on block size and number of blocks
    block_size=int(data.shape[1]/n_blocks);
    print("block_size:", block_size);
    block_average=np.zeros((n_blocks,data.shape[0])); # initialize the array of per-block averages
    # calculate averages per each block
    for block in range(0,n_blocks):
        block_average[block]=np.average(data[:,block*block_size:(block+1)*block_size],axis=1)
    # calculate the average and average of the square
    av_data=np.average(data,axis=1)
    av2_data=np.average(data*data,axis=1);
    # calculate the variance of the block averages
    block_var=np.var(block_average,axis=0);
    # calculate standard error of the mean
    err_data=np.sqrt(block_var/(n_blocks-1));
    # estimate autocorrelation time using the formula given by Janke
    # this assumes that the errors have been correctly estimated
    tau_data=np.zeros(av_data.shape);
    for val in range(0,av_data.shape[0]):
        total_var=av2_data[val]-av_data[val]*av_data[val]; # variance of all samples
        if (total_var <= 0):
            # zero variance (e.g. all samples identical) would lead to division by zero
            tau_data[val]=-1.0; # unphysical value marks a failure to compute tau
        else:
            tau_data[val]=0.5*block_size*n_blocks/(n_blocks-1)*block_var[val]/total_var;
    return av_data, err_data, tau_data, block_size;

# estimate the statistical error and the autocorrelation time using the formula given by Janke
av_numAs, err_numAs, tau, block_size = block_analyze(numAs);
print("av = ", av_numAs);
print("err = ", err_numAs);
print("tau = ", tau);

# calculate the average ionization degree
av_alpha=av_numAs/N_acid;
err_alpha=err_numAs/N_acid;

# plot the simulation results compared with the ideal titration curve
plt.figure(figsize=(10, 6), dpi=80)
plt.errorbar(pHs - pK, av_alpha, err_alpha, marker='o', linestyle='none',\
             label=r"simulation")
pHs2 = np.linspace(pHmin, pHmax, num=50)
plt.plot(pHs2 - pK, ideal_alpha(pHs2, pK), label=r"ideal")
plt.xlabel('pH-p$K$', fontsize=16)
plt.ylabel(r'$\alpha$', fontsize=16)
plt.legend(fontsize=16)
plt.show()

The simulation results for the non-interacting case compare very well with the analytical solution of the Henderson-Hasselbalch equation. There are only minor deviations, and the estimated errors are small too. This situation will change when we introduce interactions.

It is useful to check whether the estimated errors are consistent with the assumptions that were used to obtain them. To do this, we follow [Janke2002] to estimate the number of uncorrelated samples per block, and check whether each block contains a sufficient number of uncorrelated samples (we choose 10 uncorrelated samples per block as the threshold value).

In [None]:
# check if the blocks contain enough data for reliable error estimates
print("uncorrelated samples per block:\nblock_size/tau = ",\
      block_size/tau);
threshold=10.; # block size should be much greater than the correlation time
if(np.any(block_size/tau<threshold)):
    print("\nWarning: some blocks may contain fewer than ", threshold, "uncorrelated samples."\
          "\nYour error estimates may be unreliable."\
          "\nPlease, check them using a more sophisticated method or run a longer simulation.")
    print("? block_size/tau > threshold ? :", block_size/tau>threshold);
else:
    print("\nAll blocks seem to contain more than ", threshold, "uncorrelated samples."\
          "\nError estimates should be OK.");


To look in more detail at the statistical accuracy, it is useful to plot the deviations from the analytical result. This provides another way to check the consistency of the error estimates. For about 68% of the results, the deviation from the analytical result should be within the error bar, whereas about 95% of the results should be within two times the error bar. Indeed, if you plot the deviations by running the script below, you should observe that most of the results are within one error bar from the analytical solution, a smaller fraction of the results is slightly further than one error bar, and one or two might be about two error bars apart.

Try increasing the number of samples in the simulation to see how the estimated error changes, and whether the consistency is still satisfied.
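
The 68% / 95% criterion can also be checked numerically instead of by eye. The sketch below uses a hypothetical helper, `fraction_within`, and synthetic normally distributed deviations as an assumed stand-in for simulation results.

```python
import numpy as np


def fraction_within(deviations, errors, n_sigma=1.0):
    """Fraction of results whose |deviation| is at most n_sigma error bars."""
    deviations = np.asarray(deviations, dtype=float)
    errors = np.asarray(errors, dtype=float)
    return float(np.mean(np.abs(deviations) <= n_sigma * errors))


# synthetic data: deviations drawn from a normal distribution whose
# standard deviation equals the quoted error bar
rng = np.random.default_rng(3)
errors = np.full(10000, 0.02)
deviations = rng.normal(scale=errors)
print("within 1 error bar: ", fraction_within(deviations, errors, 1.0))
print("within 2 error bars:", fraction_within(deviations, errors, 2.0))
```

With the arrays of this tutorial, the same check would be `fraction_within(av_alpha - ideal_alpha(pHs, pK), err_alpha)`. With only 15 pH values the resulting fractions fluctuate considerably, so they should be read as a rough consistency check.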

In [None]:
# plot the deviations from the ideal result
plt.figure(figsize=(10, 6), dpi=80)
ylim=np.amax(abs(av_alpha-ideal_alpha(pHs, pK)))
plt.ylim((-1.5*ylim,1.5*ylim))
plt.errorbar(pHs - pK, av_alpha-ideal_alpha(pHs, pK),\
             err_alpha, marker='o', linestyle='none', label=r"simulation")
plt.plot(pHs - pK, 0.0*ideal_alpha(pHs, pK), label=r"ideal")
plt.xlabel('pH-p$K$', fontsize=16)
plt.ylabel(r'$\alpha - \alpha_{ideal}$', fontsize=16)
plt.legend(fontsize=16)
plt.show()

Finally, we demonstrate that the actual number of $\mathrm{H^+}$ ions in the simulation box does not correspond to the pH value that was provided as input of the constant-pH simulation. To do so, we estimate the concentration of $\mathrm{H^+}$ ions from the measured ionization degree, and compare the resulting value with the pH provided as input. For an ideal system, $\mathrm{pH}$ should be equal to $-\log_{10} (c_{\mathrm{H^{+}}} / c^{\ominus})$, where $c^{\ominus}=1\,\mathrm{mol/L}$ is the reference concentration. Below, we calculate the value of $-\log_{10} (c_{\mathrm{H^{+}}} / c^{\ominus})$ to show that it is very different from the pH that we have set as an input of the simulation.

In [None]:
av_c = av_alpha*C_acid_SI; # average concentration of H+ in mol/L
err_c = err_alpha*C_acid_SI; # error in the average concentration

av_pH = -np.log10(av_c); # average value of -log10(c_H+/c_0)
print ("av_pH:", av_pH)
err_pH = err_c/(av_c*np.log(10.))
print ("err_pH:", err_pH)

# plot the simulation results compared with the ideal titration curve
plt.figure(figsize=(10, 6), dpi=80)
plt.errorbar(pHs, av_pH, err_pH, marker='o', linestyle='none',\
             label=r"simulation")
plt.plot(pHs, pHs, label=r" input pH = measured pH")
plt.xlabel('input pH', fontsize=16)
plt.ylabel(r'$-\log_{10}(c_{\mathrm{H^+}}/c^{\ominus})$', fontsize=16)
plt.legend(fontsize=16)
plt.show()

Naively, one would expect the value of $-\log_{10} (c_{\mathrm{H^{+}}} / c^{\ominus})$ to be equal to the pH value specified as input of the simulation, but this is not true. The only $\mathrm{H^+}$ ions in the simulation box are those that have been generated by the ionization reaction. Therefore, the number of $\mathrm{H^+}$ ions increases with increasing ionization (increasing input pH), and consequently the value of $-\log_{10} (c_{\mathrm{H^{+}}} / c^{\ominus})$ decreases with increasing input pH. Moreover, the value of $-\log_{10} (c_{\mathrm{H^{+}}} / c^{\ominus})$ depends not only on the ionization degree but also on the total concentration of ionizable species. At high pH, where the acid is almost fully ionized, the value of $-\log_{10} (c_{\mathrm{H^{+}}} / c^{\ominus})$ converges to the negative logarithm of the total concentration of ionizable species, and does not vary with input pH anymore. This demonstrates that in the constant-pH ensemble, the value of $-\log_{10} (c_{\mathrm{H^{+}}} / c^{\ominus})$ and the pH provided as input should be viewed as two independent parameters.

In an ideal system, this discrepancy between the value of $-\log_{10} (c_{\mathrm{H^{+}}} / c^{\ominus})$ and the input pH is harmless because the acceptance probability in the constant-pH ensemble does not depend on the actual number of $\mathrm{H^+}$ ions in the box. In an interacting system, the presence of $\mathrm{H^+}$ ions in the box affects the properties of other parts of the system. Therefore, in an interacting system this discrepancy is harmless only if $\mathrm{H^+}$ ions are the minority component among other ionic species. See also [Landsgesell2019] for a more detailed discussion of this issue and its consequences.
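
The trend discussed above can be reproduced without running a simulation. The sketch below assumes an ideal system in which every $\mathrm{H^+}$ ion in the box stems from the ionization of the acid, so that $c_{\mathrm{H^+}} = \alpha\, c_0$, with $c_0$ the total concentration of titratable groups. The function `apparent_pH` is a hypothetical name introduced for this illustration.

```python
import numpy as np


def ideal_alpha(pH, pK):
    """Ideal ionization degree from the Henderson-Hasselbalch equation."""
    return 1.0 / (1.0 + 10.0 ** (pK - pH))


def apparent_pH(pH_input, pK, c_acid_mol_per_l):
    """-log10(c_H+ / (1 mol/L)) if all H+ ions stem from the acid itself."""
    c_h = ideal_alpha(pH_input, pK) * c_acid_mol_per_l
    return -np.log10(c_h)


pK = 4.88
c_acid = 0.001  # mol/L, the value used in this tutorial
pH_input = np.linspace(pK - 2.0, pK + 2.0, 5)
for pH_in, pH_app in zip(pH_input, apparent_pH(pH_input, pK, c_acid)):
    print("input pH = {:.2f}, -log10(c_H+/c_ref) = {:.2f}".format(pH_in, pH_app))
```

The printed values decrease with increasing input pH and level off near $-\log_{10}(0.001) = 3$, in line with the discussion above.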

### Suggested problems for further work

* Try changing the concentration of ionizable species in the non-interacting system. You should observe that it does not affect the obtained titration curve, but it does affect the pH value measured from the concentration of $\mathrm{H^+}$ ions in the simulation box.

* Try changing the number of samples and the number of particles to see how the estimated error and the number of uncorrelated samples will change. Be aware that if the number of uncorrelated samples is low, the error estimation is too optimistic.

* Try running the same simulations with steric repulsion and then again with electrostatic interactions. Observe how the ionization equilibrium is affected by various interactions. Warning: simulations with electrostatics are much slower. If you want to obtain your results quickly, then decrease the number of pH values.

## References

[Janke2002] Janke W. Statistical Analysis of Simulations: Data Correlations and Error Estimation,
In Quantum Simulations of Complex Many-Body Systems: From Theory to Algorithms, Lecture Notes,
J. Grotendorst, D. Marx, A. Muramatsu (Eds.), John von Neumann Institute for Computing, Jülich,
NIC Series, Vol. 10, ISBN 3-00-009057-6, pp. 423-445, 2002.

[Landsgesell2019] Landsgesell, J.; Nová, L.; Rud, O.; Uhlík, F.; Sean, D.; Hebbeker, P.; Holm, C.; Košovan, P. Simulations of Ionization Equilibria in Weak Polyelectrolyte Solutions and Gels. Soft Matter 2019, 15 (6), 1155–1185. https://doi.org/10.1039/C8SM02085J.
    
[Reed1992] Reed, C. E.; Reed, W. F. Monte Carlo Study of Titration of Linear Polyelectrolytes. The Journal of Chemical Physics 1992, 96 (2), 1609–1620. https://doi.org/10.1063/1.462145.

[Smith1994] Smith, W. R.; Triska, B. The Reaction Ensemble Method for the Computer Simulation of Chemical and Phase Equilibria. I. Theory and Basic Examples. The Journal of Chemical Physics 1994, 100 (4), 3019–3027. https://doi.org/10.1063/1.466443.