Clean up simulation notebooks and streamlit page

jbreffle · Feb 18, 2024 · 972628b
1 parent 0bd9a2f

Showing 3 changed files with 64 additions and 46 deletions.
diff --git a/notebooks/3a_sim_simple.ipynb b/notebooks/3a_sim_simple.ipynb
@@ -4,7 +4,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Simulation by drawing mistake counts\n"
+    "# Simple speed typing simulation\n"
    ]
   },
   {
@@ -14,11 +14,6 @@
     "## Set up\n"
    ]
   },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": []
-  },
   {
    "cell_type": "code",
    "execution_count": 1,
@@ -42,7 +37,6 @@
    "source": [
     "import matplotlib.pyplot as plt\n",
     "import numpy as np\n",
-    "import scipy\n",
     "\n",
     "from src import util\n",
     "from src import plot"
@@ -54,22 +48,13 @@
    "source": [
     "## Simulate\n",
     "\n",
-    "Note: Need to fix the dependent parameter calculations\n",
-    "\n",
-    "For each trial it draws a random number of mistakes and then simulates the typing\n",
+    "For each trial, draw a random number of mistakes and then simulate the typing\n",
     "speed and accuracy for that trial.\n",
-    "The number of mistakes is a Poisson distribution with some mean mistakeLambda\n",
-    "Each mistake is assumed to take a certain amount of time to correct (normal\n",
-    "distribution)\n",
-    "The total amount of time to correct all mistakes reduces the final wpm\n",
-    "\n",
-    "Simulates a typing text accuracy and speed\n",
-    "There is an unerlying average wpm and accuracy\n",
-    "Mistakes are generated randomly\n",
     "\n",
-    "For each test, simulate the number of mistakes and the resulting wpm due to mistake\n",
-    "delays\n",
-    "This is a simple simulation: it does not take into account...\n"
+    "The number of mistakes follows a Poisson distribution with mean $\\lambda$.\n",
+    "Each mistake is assumed to take a certain amount of time to correct (drawn from a\n",
+    "log-normal distribution).\n",
+    "The total time spent correcting mistakes reduces the final wpm."
    ]
   },
   {
@@ -123,7 +108,7 @@
     }
    ],
    "source": [
-    "# Plot a histogram of the number of mistakes per trial\n",
+    "# Histogram of the number of mistakes per trial\n",
     "fig = plt.figure(figsize=(6, 2))\n",
     "ax = plot.sim_n_mistakes(n_mistakes)\n",
     "plt.show()"
@@ -146,7 +131,7 @@
     }
    ],
    "source": [
-    "# Plot scatter_hist of wpm and acc\n",
+    "# scatter_hist of wpm and acc\n",
     "fig = plt.figure(figsize=(6, 4))\n",
     "ax, ax_histx, ax_histy = plot.sim_scatter_hist(wpm, acc, fig=fig)\n",
     "ax.axvline(avg_wpm, color=\"k\", linestyle=\"--\", alpha=0.5)\n",
@@ -159,7 +144,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## For streamlit"
+    "## Different parameters"
    ]
   },
   {
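
Note on the rewritten "Simulate" cell above: the model it describes (Poisson mistake counts, log-normal correction times, lost time lowering wpm) can be sketched roughly as follows. This is not the code in `src/util`, which the diff does not show. The function name `simulate_simple_trials`, the 60-second trial length, the 5-characters-per-word convention, the accuracy formula, and the reading of 0.5 and 0.45 as the mean and standard deviation (in seconds) of the correction time are all assumptions made for illustration.

```python
import numpy as np


def simulate_simple_trials(avg_wpm=60.0, avg_acc=0.95, n_trials=1000,
                           trial_s=60.0, fix_mean_s=0.5, fix_std_s=0.45,
                           seed=None):
    """Hypothetical sketch: draw mistake counts, then subtract correction time."""
    rng = np.random.default_rng(seed)
    chars_per_word = 5.0  # assumed convention, not taken from the repository

    # Characters typed in a mistake-free trial at the target speed.
    n_chars = avg_wpm * chars_per_word * trial_s / 60.0

    # Poisson mean chosen so the expected mistake fraction is 1 - avg_acc.
    lam = (1.0 - avg_acc) * n_chars
    n_mistakes = rng.poisson(lam, size=n_trials)

    # Convert the desired mean/std of the correction time into the
    # parameters of the underlying normal distribution.
    sigma2 = np.log1p((fix_std_s / fix_mean_s) ** 2)
    mu = np.log(fix_mean_s) - 0.5 * sigma2
    sigma = np.sqrt(sigma2)

    # Total time lost per trial: one log-normal draw per mistake.
    lost_s = np.array(
        [rng.lognormal(mu, sigma, size=k).sum() for k in n_mistakes]
    )
    lost_s = np.minimum(lost_s, trial_s)

    # Lost time scales the achieved speed down from the target.
    wpm = avg_wpm * (1.0 - lost_s / trial_s)
    acc = np.clip(1.0 - n_mistakes / n_chars, 0.0, 1.0)
    return wpm, acc, n_mistakes
```

Under these assumptions the mean simulated wpm falls below `avg_wpm`, because every mistake only ever removes typing time.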

diff --git a/notebooks/3b_sim_poisson.ipynb b/notebooks/3b_sim_poisson.ipynb
@@ -49,15 +49,22 @@
     "## Run simulation\n",
     "\n",
     "Simulates typing as a Poisson process in discrete time, where in each time bin there\n",
-    "is a probability of either typing a correct letter, a wrong letter, or no letter at all.\n",
+    "is a probability of typing a correct letter, a wrong letter, or no letter at all.\n",
     "\n",
-    "Error of approximating Poisson as Bernoulli is determined by ratio of time step (dt)\n",
-    "to characters per second (avg_correct_cps)\n",
-    "Error leads to slightly lower wpm than expected, but is negligible for reasonable\n",
+    "The error of simulating a Poisson process in discrete time is determined by the size of the time step (dt)\n",
+    "relative to the mean time between characters (1 / avg_correct_cps).\n",
+    "This error leads to slightly lower wpm than expected, but is negligible for reasonable\n",
     "values of dt and avg_correct_cps.\n",
     "\n",
     "Incorporates an error time cost (each error causes a delay in typing, in order to fix it).\n",
-    "Set error_cost=0 to assume no cost to correcting errors.\n"
+    "Set error_cost=0 to assume no cost to correcting errors."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# 0 Cost simulation"
    ]
   },
   {
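
Note on the "Run simulation" cell above: the discrete-time scheme it describes (each bin yields a correct letter, a wrong letter, or nothing, and each error blocks typing for a while) might look like the sketch below. It is an illustration only, not the notebook's implementation. The name `simulate_poisson_trial`, the 5-characters-per-word convention, the way the error rate is derived from `avg_acc`, and the default values are assumptions. The parameter names `dt`, `avg_correct_cps`, and the error cost follow the notebook text.

```python
import numpy as np


def simulate_poisson_trial(avg_correct_cps=5.0, avg_acc=0.95, dt=0.01,
                           duration_s=60.0, error_cost_s=0.75, rng=None):
    """Hypothetical sketch of one trial; at most one keystroke per time bin."""
    rng = np.random.default_rng() if rng is None else rng

    # Error rate chosen so that correct / (correct + wrong) equals avg_acc.
    error_cps = avg_correct_cps * (1.0 - avg_acc) / avg_acc
    p_correct = avg_correct_cps * dt
    p_error = error_cps * dt
    if p_correct + p_error > 1.0:
        raise ValueError("dt is too large for the requested typing rate")

    n_bins = int(round(duration_s / dt))
    cost_bins = int(round(error_cost_s / dt))
    u = rng.random(n_bins)  # one uniform draw per time bin

    n_correct = 0
    n_error = 0
    blocked_until = 0  # index of the first bin in which typing may resume
    for i in range(n_bins):
        if i < blocked_until:
            continue  # still fixing the previous error
        if u[i] < p_correct:
            n_correct += 1
        elif u[i] < p_correct + p_error:
            n_error += 1
            blocked_until = i + 1 + cost_bins

    wpm = n_correct / 5.0 / (duration_s / 60.0)  # assumes 5 characters per word
    total = n_correct + n_error
    acc = n_correct / total if total else 1.0
    return wpm, acc, n_error
```

Because a bin can hold at most one keystroke, the simulated rate sits slightly below the nominal Poisson rate, which is the small downward bias in wpm that the cell mentions. Passing `error_cost_s=0` corresponds to the zero-cost case.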

diff --git a/streamlit/pages/2_Simulated_typing.py b/streamlit/pages/2_Simulated_typing.py
@@ -33,66 +33,92 @@ def main():
     Home.configure_page(page_title="Simulated typing")
 
     # Data set up
-    data_df, _ = Home.load_data()
+    _, _ = Home.load_data()
 
     # Page introduction
     st.title("Simulated typing")
+    iid_url = "https://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables"
     st.write(
-        """
+        f"""
         One important question when practicing typing is,
         "How carefully should one try to avoid mistakes?"
         In the raw data we can see that performance (wpm) is strongly correlated with
         accuracy. But is this correlation causal?
 
-        One hypothesis is that mistakes are i.i.d. random, 
+        One hypothesis is that mistakes are [i.i.d.](<{iid_url}>), 
         and each mistake requires a fixed time to correct
         (time that otherwise would have been spent typing).
         This alone might cause the degree of correlation observed in the data.
-        Is the best approach for practicing typing to balance the probability of making
-        a mistake with the time it takes to correct it?
+        We can study the causal relationship between accuracy and wpm through 
+        simulations.
         """
     )
     st.divider()
 
     st.subheader("Simulated typing: random mistake draws")
     st.write(
-        f"""
-        One method to simulate typing is to randomly draw mistakes.
+        """
+        A simple method of simulating a typing session is to draw a random
+        number of mistakes from a Poisson distribution and then assume each of those
+        mistakes takes some random amount of time to correct.
+        The WPM and accuracy can then be calculated based on those random values.
+
+        Here we see results from 1000 such simulated trials.
+        If we assume an average performance of 60 WPM and 95% accuracy,
+        then we can reproduce the $R^2$ that is observed in the actual data
+        when we assume each mistake takes an average of 0.5 seconds to correct
+        with a standard deviation of 0.45.
         """
     )
     # TODO
     avg_wpm = 60
     avg_acc = 0.95
     n_trials = 1000
-    wpm, acc, n_mistakes = run_simple_sim(
-        avg_wpm=avg_wpm, avg_acc=avg_acc, n_trials=n_trials
-    )
+    wpm, acc, _ = run_simple_sim(avg_wpm=avg_wpm, avg_acc=avg_acc, n_trials=n_trials)
     # Plot scatter_hist of wpm and acc
     fig = plt.figure(figsize=(6, 4))
-    ax, ax_histx, ax_histy = plot.sim_scatter_hist(wpm, acc, fig=fig)
+    ax, _, _ = plot.sim_scatter_hist(wpm, acc, fig=fig)
     ax.axvline(avg_wpm, color="grey", linestyle="--", alpha=0.5)
     ax.axhline(avg_acc, color="grey", linestyle="--", alpha=0.5)
     ax.plot(np.mean(wpm), np.mean(acc), "ro")
     st.pyplot(fig, use_container_width=True, transparent=True)
-    # TODO
+    st.write(
+        """
+        The dashed grey lines show the target WPM and accuracy.
+        The red dot is the mean WPM and accuracy over all trial simulations.
+        The red line is the linear regression.
+        """
+    )
     st.divider()
 
     st.subheader("Simulated typing: Poisson process")
     st.write(
-        f"""
-        An alternative simulation method is to use a Poisson process.
+        """
+        A more complicated but more realistic simulation approach would be to simulate
+        typing behavior across time within each trial using a Poisson process.
+        
+        We see that we reproduce similar results to the simple method.
+        Here we model mistakes as a Poisson process and
+        assume each mistake takes 0.75 seconds to fix.
         """
     )
     avg_wpm = 60
     avg_acc = 0.95
-    wpm, acc, n_mistakes = run_poisson_sim(avg_wpm=avg_wpm, avg_acc=avg_acc)
+    wpm, acc, _ = run_poisson_sim(avg_wpm=avg_wpm, avg_acc=avg_acc)
     # Plot scatter_hist of wpm and acc
     fig = plt.figure(figsize=(6, 4))
-    ax, ax_histx, ax_histy = plot.sim_scatter_hist(wpm, acc, fig=fig)
+    ax, _, _ = plot.sim_scatter_hist(wpm, acc, fig=fig)
     ax.axvline(avg_wpm, color="grey", linestyle="--", alpha=0.5)
     ax.axhline(avg_acc, color="grey", linestyle="--", alpha=0.5)
     ax.plot(np.mean(wpm), np.mean(acc), "ro")
     st.pyplot(fig, use_container_width=True, transparent=True)
+    st.write(
+        """
+        The dashed grey lines show the target WPM and accuracy.
+        The red dot is the mean WPM and accuracy over all trial simulations.
+        The red line is the linear regression.
+        """
+    )
     st.divider()
 
     nb_url_1 = "https://github.com/jbreffle/monkeytype-analysis/blob/main/notebooks/3a_sim_simple.ipynb"
@@ -101,7 +127,7 @@ def main():
         f"""
         Click here 
         [./notebooks/3a_sim_simple.ipynb]({nb_url_1})
-        for the simple simulation method notebook.
+        for the simple simulation notebook.
 
         Click here
         [./notebooks/3b_sim_poisson.ipynb]({nb_url_2})
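
Note on the new figure captions: they describe a red linear regression line, and the page text refers to reproducing the $R^2$ seen in the real data. The diff does not show how `plot.sim_scatter_hist` computes either, so the following is only a minimal sketch of one way to obtain both from the simulated `wpm` and `acc` arrays. `fit_line_r2` is a hypothetical helper, not a function from `src/plot`.

```python
import numpy as np


def fit_line_r2(x, y):
    """Least-squares line y = slope * x + intercept, plus its R^2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    # For a straight-line fit with an intercept, R^2 is the squared
    # Pearson correlation between x and y.
    r = np.corrcoef(x, y)[0, 1]
    return slope, intercept, r ** 2
```

A call such as `fit_line_r2(wpm, acc)` would then give the slope and intercept for the regression line and the $R^2$ to compare against the value from the actual data.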