Skip to content
Permalink
Browse files

Update tutorial

Rewrite the statistical analysis section, remove unused imports in
the code, format code with autopep8, fix typos in the text.
  • Loading branch information...
jngrad committed Sep 10, 2019
1 parent 235039b commit 2e163bc8b84aa1532a44e928626acefef6ac7f0c
Showing with 65 additions and 39 deletions.
  1. +65 −39 doc/tutorials/12-constant_pH/12-constant_pH.ipynb
@@ -15,22 +15,27 @@
"$\\mathrm{HA} \\Leftrightarrow \\mathrm{A}^- + \\mathrm{H}^+$\n",
"\n",
"If $N_0 = N_{\\mathrm{HA}} + N_{\\mathrm{A}^-}$ is the number of titratable groups in solution, then the degree of dissociation $\\alpha$ can be defined:\n",
"$\\alpha = \\dfrac{N_{\\mathrm{A}^-}}{N_0}.$\n",
"\n",
"$\\alpha = \\dfrac{N_{\\mathrm{A}^-}}{N_0}.$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1.1 Reaction Constants\n",
"The reaction constant describes the chemical equilibrium and therefore equilibrium concentrations:\n",
"\n",
"\\begin{equation}\n",
"K = \\frac{[\\mathrm{H}^+][\\mathrm{A}^-]}{[\\mathrm{HA}]} \\frac{\\gamma_{\\mathrm{H}^+}\\gamma_{\\mathrm{A}^-}}{\\gamma_{\\mathrm{HA}}}=\\exp(-\\beta \\Delta G^0)\n",
"\\end{equation}\n",
"\n",
"with $[\\mathrm{H}^+]=\\frac{c_{\\mathrm{H}^+}}{c^0}$ the concencentration divided by the standard concentration (typically one mol per liter) and $\\gamma_{\\mathrm{H}^+}$ the activity coefficient of species $\\mathrm{H}^+$.\n",
"For an ideal system the activity coefficients are unity and $[\\mathrm{H}^+] = 10^{-\\mathrm{pH}}$. In an interacting system the activity coefficients of the species become unity for infinite dilution.\n",
"with $[\\mathrm{H}^+]=\\frac{c_{\\mathrm{H}^+}}{c^0}$ the concencentration divided by the standard concentration (typically one mole per liter) and $\\gamma_{\\mathrm{H}^+}$ the activity coefficient of species $\\mathrm{H}^+$.\n",
"For an ideal system the activity coefficients are unity and $[\\mathrm{H}^+] = 10^{-\\mathrm{pH}}$. In an interacting system the activity coefficients of the species become unity at infinite dilution.\n",
"\n",
"The degree of dissociation can also be expressed via a ratio of concentrations:\n",
"\n",
"\\begin{equation}\n",
"\\alpha = \\frac{N_{\\mathrm{A}^-}}{N_0} \\cdot = \\frac{N_{\\mathrm{A}^-}}{N_{\\mathrm{HA}} + N_{\\mathrm{A}^-}} = \\frac{[\\mathrm{A}^-]}{[\\mathrm{HA}]+[\\mathrm{A}^-]}.\n",
"\\alpha = \\frac{N_{\\mathrm{A}^-}}{N_0} = \\frac{N_{\\mathrm{A}^-}}{N_{\\mathrm{HA}} + N_{\\mathrm{A}^-}} = \\frac{[\\mathrm{A}^-]}{[\\mathrm{HA}]+[\\mathrm{A}^-]}.\n",
"\\end{equation}\n",
"\n",
"Multiplying both the numerator and denominator by $\\dfrac{[\\mathrm{H}^+]}{[\\mathrm{HA}]}$ leads to:\n",
@@ -42,9 +47,14 @@
"Since $K = 10^{-\\mathrm{p}K}$ we can express $\\alpha$ as a simple function of the pH and $\\mathrm{p}K$:\n",
"\n",
"\\begin{equation}\n",
"\\alpha(pH, pK)= \\frac{1}{\\frac{[\\mathrm{H}^+]}{K} + 1} = \\frac{1}{\\frac{10^{-\\mathrm{pH}}}{10^{-\\mathrm{p}K}} + 1} = \\frac{1}{10^{\\mathrm{p}K-\\mathrm{pH}} + 1}.\n",
"\\end{equation}\n",
"\n",
"\\alpha(\\mathrm{pH}, \\mathrm{p}K)= \\frac{1}{\\frac{[\\mathrm{H}^+]}{K} + 1} = \\frac{1}{\\frac{10^{-\\mathrm{pH}}}{10^{-\\mathrm{p}K}} + 1} = \\frac{1}{10^{\\mathrm{p}K-\\mathrm{pH}} + 1}.\n",
"\\end{equation}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1.2 Constant pH Method\n",
"In the constant pH method, the acceptance probability for a reaction is [1]:\n",
"\n",
@@ -75,9 +85,6 @@
"plt.ion()\n",
"\n",
"import espressomd\n",
"from espressomd import code_info\n",
"from espressomd import analyze\n",
"from espressomd import integrate\n",
"from espressomd import reaction_ensemble\n",
"\n",
"# System parameters\n",
@@ -104,7 +111,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Next we define some important constants and add particles to the system, like the dissociation constant of the acid. We choose p$K$=4 which is close to the value for (poly)acrylic acid."
"Next we define some important constants and add particles to the system, like the dissociation constant of the acid. We choose $\\mathrm{p}K=4$ which is close to the value for (poly)acrylic acid."
]
},
{
@@ -149,7 +156,7 @@
"metadata": {},
"source": [
"Next we perform simulations at different pH values. The system must be equilibrated at each pH before taking samples.\n",
"Calling `RE.reaction(X)` attempts in total `X` reactions (in back and forward direction).\n",
"Calling `RE.reaction(X)` attempts in total `X` reactions (in back and forward directions).\n",
"We also plot the acceptance rate for the dissociation reaction and the association reaction for the first pH value which we set."
]
},
@@ -159,7 +166,7 @@
"metadata": {},
"outputs": [],
"source": [
"num_samples=100\n",
"num_samples = 100\n",
"\n",
"pHs = np.linspace(1, 8, num=15)\n",
"degrees_of_dissociation = []\n",
@@ -173,62 +180,81 @@
" RE.reaction(4 * N0) # equilibration to the new pH\n",
" numAs = []\n",
" for i in range(num_samples):\n",
" RE.reaction(N0+1)\n",
" RE.reaction(N0 + 1)\n",
" numAs.append(system.number_of_particles(type=type_A))\n",
" degrees_of_dissociation.append(np.mean(numAs) / N0)\n",
" std_dev_degree_of_dissociation.append(np.std(numAs, ddof=1) / N0)\n",
" print(\"occurred particle numbers of type A \", sorted(set(numAs)))\n",
" histogram = np.bincount(np.array(numAs).astype(int), minlength=N0 + 1)\n",
" histograms.append(histogram / float(len(numAs)))\n",
" if(abs(pH-1.0)<1e-4):\n",
" print(\"acceptance rate dissociation\", RE.get_acceptance_rate_reaction(0) )\n",
" print(\"acceptance rate association\", RE.get_acceptance_rate_reaction(1) )"
" if(abs(pH - 1.0) < 1e-4):\n",
" print(\"acceptance rate dissociation\", RE.get_acceptance_rate_reaction(0))\n",
" print(\"acceptance rate association\", RE.get_acceptance_rate_reaction(1))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3 Results\n",
"Finally we plot our results and compare it to the well-known result for an ideal reacting system $\\alpha(pH, pK)$ presented above.\n",
"Finally we plot our results and compare it to the well-known result for an ideal reacting system $\\alpha(\\mathrm{pH}, \\mathrm{p}K)$ presented above.\n",
"### 3.1 Statistical Uncertainty\n",
"The numerical errors in the simulation can be reduced by increasing `num_samples` taken at each pH. To quantify the simulation uncertainty, we will add error bars to the calculated data points. For simplicity, we will estimate the error of the degree of dissociation using the standard error of the mean (SE) for $N$ _uncorrelated_ samples:\n",
"\n",
"To quantify the simulation uncertainty, we will add error bars to the calculated data points.\n",
"Since the underlying probability distribution of our data does not follow a well-known function, such as the Gaussian,\n",
"Binomial, exponential or Poisson distributions, there is no literature to provide us with an analytical form of the\n",
"confidence interval. Sometimes the analytical form of the underlying probability distribution is also unknown.\n",
"However, a _statistic_ calculated from a random sample often has a probability distribution different from the\n",
"sample probability distribution. We will use that to our advantage.\n",
"\n",
"In our case, the values of $\\alpha$ are drawn from a random distribution $X(N_{\\mathrm{A}}, N_{\\mathrm{0}})$ with _parameters_ $N_{\\mathrm{A}}$, $N_{\\mathrm{0}}$ and moments $\\mu_\\alpha = \\operatorname{E}(X)$ and $\\sigma_\\alpha = E([E(X)-X]^2)$.\n",
"The average $\\bar{\\alpha} = \\frac{1}{n}\\sum_{i=1}^n \\alpha_i$ and variance $\\sigma_n^2 = \\frac{1}{n}\\sum_{i=1}^n (\\alpha_i - \\bar{\\alpha})^2$ of a sample of $n$ independent values $\\alpha_i$ are themselves random variables,\n",
"but they do not have the same probability distribution as $X$. They are called _statistics_ because they are calculated\n",
"from a sample of the population, and can be used as _estimators_ of the moments $\\mu_\\alpha$ and $\\sigma_\\alpha$.\n",
"\n",
"According to the Central Limit Theorem (CLT), the probability distribution $\\sqrt{n}(\\bar{\\alpha} - \\mu_\\alpha)$ for large $n$ converges in probability distribution to $\\mathcal{N}(0, \\sigma_n^2)$. This statement can be rewritten as:\n",
"\n",
"\\begin{equation}\n",
" \\mathrm{SE} = \\frac{\\sigma_N}{\\sqrt{N}},\n",
" z = \\frac{\\bar{\\alpha} - \\mu_\\alpha}{\\sigma_n} \\sqrt{n} \\sim \\mathcal{N}(0,1)\n",
"\\end{equation}\n",
"\n",
"where $\\sigma_N$ is the sample standard deviation, defined as:\n",
"in the limit of a large $n$, with $z$ the two-tail $z$-score of the standard Normal distribution.\n",
"The rate of convergence depends on the underlying probability distribution $X$. For this tutorial,\n",
"15'000'000 data points were collected and the distribution of $z$ was very close to Normal for $n=100$.\n",
"\n",
"The 95% probability of finding the population mean $\\mu$ within two boundaries $w^+$ and $w^-$ _over repeated experiments_ is given by:\n",
"\n",
"\\begin{equation}\n",
" \\sigma_N^2 = \\left\\langle \\alpha^2 - \\langle \\alpha\\rangle_N^2 \\right\\rangle_N\n",
" \\operatorname{Pr}\\left( w^- \\leq \\mu \\leq w^+\\right) = 0.95\n",
"\\end{equation}\n",
"\n",
"where $\\langle\\alpha\\rangle_N$ is the sample mean. The probability of finding the true mean $\\mu$ (estimated by $\\langle\\alpha\\rangle_N$) within two bounds at 95% is given by:\n",
"For the standard Normal distribution, by definition $\\mu = 0$ and $w^\\pm = \\pm z_{1-0.95/2}$ with $z_{1-0.95/2} \\approx 1.96$, so for the distribution $\\sqrt{n}(\\bar{\\alpha} - \\mu_\\alpha) / \\sigma_n$:\n",
"\n",
"\\begin{equation}\n",
" \\operatorname{Pr}\\left( \\langle\\alpha\\rangle_N -z\\cdot\\mathrm{SE} \\leq \\mu \\leq \\langle\\alpha\\rangle_N -z\\cdot\\mathrm{SE}\\right) = 0.95,\n",
" w^\\pm = \\bar{\\alpha} \\pm z_{1-0.95/2} \\cdot \\frac{\\sigma_n}{\\sqrt{n}}\n",
"\\end{equation}\n",
"\n",
"where $z \\approx 1.96$ is the two-sided confidence level for the normal distribution at 95%.\n",
"To plot the simulated curve with error bars, we need to estimate three parameters: $\\mu_\\alpha$ and $w^\\pm$.\n",
"The accuracy of our estimation will depend on the sample size `num_samples`. The larger it is, the closer\n",
"$\\sqrt{n}(\\bar{\\alpha} - \\mu_\\alpha)$ converges to $\\mathcal{N}(0, \\sigma_n^2)$ and the smaller the $w^\\pm$ interval gets.\n",
"\n",
"Because our sample size $n$ is small, we will use the Bessel-corrected sample standard deviation $s$:\n",
"The sample mean $\\bar{\\alpha}$ is already an unbiased estimator of $\\mu_\\alpha$.\n",
"However to estimate $w^\\pm$ we need to estimate $\\sigma_N$, for which the sample standard deviation $s_n$ is a biased estimator. We can account for some of the bias using Bessel's correction:\n",
"\n",
"\\begin{equation}\n",
" s = \\sqrt{\\frac{n}{n-1}\\left\\langle \\alpha^2 - \\langle \\alpha\\rangle_n^2 \\right\\rangle_n}\n",
" s_{n-1}^2 = \\frac{1}{n-1}\\sum_{i=1}^n (\\alpha_i - \\bar{\\alpha})^2\n",
"\\end{equation}\n",
"\n",
"The standard error of the mean (SE) is estimated by $s_{\\bar\\alpha}$:\n",
"This correction is required because the sample variance $s_n^2$ is evaluated against $\\bar{\\alpha}$ instead of the\n",
"population mean $\\mu_\\alpha$. $s_{n-1}^2$ is an unbiased estimator of $\\sigma_n^2$, however taking the square root\n",
"introduces a new source of bias due to Jensen's inequality. Different approaches exist to correct for this bias,\n",
"but since it is small, it is common to neglect it. The quantity $w^\\pm$ is estimated by a statistic called the 95% Confidence Interval:\n",
"\n",
"\\begin{equation}\n",
" s_{\\bar\\alpha} = \\frac{s}{\\sqrt{n}}\n",
" \\mathrm{CI}_{95\\%} = \\bar{\\alpha} \\pm z_{1-0.95/2} \\cdot \\frac{s_{n-1}}{\\sqrt{n}}\n",
"\\end{equation}\n",
"\n",
"The confidence interval is defined as:\n",
"\n",
"\\begin{equation}\n",
" \\mathrm{CI}_{95\\%} = \\langle\\alpha\\rangle_n \\pm z \\cdot s_{\\bar\\alpha}\n",
"\\end{equation}"
"We will use that estimate to plot the error bars."
]
},
{
@@ -241,7 +267,7 @@
"def ideal_degree_of_dissociation(pH, pK):\n",
" return 1. / (1 + 10**(pK - pH))\n",
"\n",
"z = scipy.stats.norm.interval(0.95, loc=0)[1]\n",
"z = scipy.stats.norm.interval(1 - 0.05 / 2, loc=0)[1] # two-tail z-score at 95%\n",
"ci95 = z * np.array(std_dev_degree_of_dissociation) / np.sqrt(num_samples)\n",
"pK = -np.log10(K)\n",
"plt.figure(figsize=(10, 6), dpi=80)\n",
@@ -262,7 +288,7 @@
"\n",
"\\begin{equation}\n",
" p(N_\\mathrm{A})=\\frac{{N_0 \\choose N_\\mathrm{A}} 10^{(\\mathrm{pH-p}K)N_\\mathrm{A}}}\n",
" {\\sum_{N_\\mathrm{A}=0}^{N_0} {N_0 \\choose N_\\mathrm{A}} 10^{(\\mathrm{pH}-\\mathrm{p}K)N_\\mathrm{A}}}\n",
" {\\sum_{N=0}^{N_0} {N_0 \\choose N} 10^{(\\mathrm{pH}-\\mathrm{p}K)N}}\n",
"\\end{equation}"
]
},
@@ -288,8 +314,8 @@
" pH = pHs[index_pH]\n",
" ax.plot(NAs, constant_pH_numA_pmf(NAs, pH, pK, N0), 'ro', ms=8, mec='r', label=\"Theory pH \"+str(pH))\n",
" ax.vlines(NAs, 0, constant_pH_numA_pmf(NAs, pH, pK, N0), colors='r', lw=4)\n",
" ax.plot(histogram_edges, histograms[index_pH], 'bo', ms=8, mec='b'\n",
" , label=\"Simulation pH \"+str(pH))\n",
" ax.plot(histogram_edges, histograms[index_pH], 'bo', ms=8, mec='b',\n",
" label=\"Simulation pH \" + str(pH))\n",
" ax.vlines(histogram_edges, 0, histograms[index_pH], colors='b', lw=1)\n",
" ax.legend()\n",
" ax.set_xlabel('$N_\\\\mathrm{A}}$')\n",

0 comments on commit 2e163bc

Please sign in to comment.
You can’t perform that action at this time.