matheusfacure · RoyalTS · Oct 14, 2023 · Jan 12, 2024
diff --git a/...nce-for-the-brave-and-true/23-Challenges-with-Effect-Heterogeneity-and-Nonlinearity.ipynb b/...nce-for-the-brave-and-true/23-Challenges-with-Effect-Heterogeneity-and-Nonlinearity.ipynb
@@ -182,7 +182,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "As for the average treatment effect, since the treatment was randomized, we can estimate it as the simple difference in means between treated and control groups: $E[Y|T=1] - E[Y|T=0]$. So, let's see what those treatment averages look like. We will look at them for both the latent outcome and the conversion perspective. There is something importance to see here."
+    "As for the average treatment effect, since the treatment was randomized, we can estimate it as the simple difference in means between treated and control groups: $E[Y|T=1] - E[Y|T=0]$. So, let's see what those treatment averages look like. We will look at them for both the latent outcome and the conversion perspective. There is something important to see here."
    ]
   },
   {
@@ -339,7 +339,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Again, the latent outcome is very nice. Due to its linearity, our expectations match reality pretty closely. But in real life, we don't care (nor have) the latent outcome. All we have is conversion. In conversion, things look a lot more complicated. If we plot the cumulative effect curves, `age` still shows some treatment effect heterogeneity, starting above the ATE and slowly converging towards it. This means that the higher the age, the higher the treatment effect. So far so good. This is what we would expect. "
+    "Again, the latent outcome is very nice. Due to its linearity, our expectations match reality pretty closely. But in real life, we don't care about the latent outcome, nor can we observe it. All we have is conversion. And with conversion, the picture looks a lot more complicated. If we plot the cumulative effect curves, `age` still shows some treatment effect heterogeneity, starting above the ATE and slowly converging to it. This means that the higher the age, the higher the treatment effect. So far so good. This is what we would expect. "
    ]
   },
   {
@@ -372,16 +372,16 @@
     "plt.plot(inc_cumm_effect_latent, label=\"est. income\")\n",
     "plt.legend()\n",
     "plt.xlabel(\"Percentile\")\n",
-    "plt.ylabel(\"Effect on Conversino\");"
+    "plt.ylabel(\"Effect on Conversion\");"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "However, we also have A LOT of treatment effect heterogeneity by `estimated_income`. Customers with higher `estimated_income` have much lower treatment effect, which causes the cumulative effect curve to go all the way to zero at the beginning and then slowly converge to the ATE. This tells us that, as far as personalization is concerned, `estimated_income` will generate segments that have more treatment effect heterogeneity (TEH) compared to the segments we would get with `age`. \n",
+    "However, we also have A LOT of treatment effect heterogeneity by `estimated_income`. Customers with higher `estimated_income` have a much lower treatment effect, which causes the cumulative effect curve to start all the way at zero, to then overshoot the ATE, and only then to converge to it. This tells us that, as far as personalization is concerned, `estimated_income` will generate segments that have more treatment effect heterogeneity (TEH) compared to the segments we would get with `age`. \n",
     " \n",
-    "This is inconvenient right? How come the feature we know to drive effect heterogeneity, `age`, is worse for personalization when compared with a feature (`estimated_income`) we know not to modify the treatment effect? The answer lies in the **non-linearity of the outcome function**. Although `estimated_income` does not modify the effect of the nudge on the latent outcome, it does once we transform that latent outcome to conversion (at least indirectly). Conversion is not linear. This means that **its derivative changes depending on where you are**. Since conversion can only go up to 1, if it is already very high, it will be hard to increase it. In other words, the derivative of high conversion is very low. But because conversion is also bounded at zero, it will also have a low derivative if it is already very low. Conversion follows an S shape, with low derivatives at both ends. We can see that by plotting the average conversion by estimated income bins (bins of 100 by 100)."
+    "This is inconvenient, right? How come the feature we know to drive effect heterogeneity, `age`, is worse for personalization when compared with a feature (`estimated_income`) we know not to modify the treatment effect? The answer lies in the **non-linearity of the outcome function**. Although `estimated_income` does not modify the effect of the nudge on the latent outcome, it does once we transform that latent outcome to conversion (at least indirectly). Conversion is not linear. This means that **its derivative changes depending on where you are**. Since conversion can only go up to 1, if it is already very high, it will be hard to increase it further. In other words, the derivative at high conversion is very low. The same thing is true at the lower end: because conversion is also bounded below at zero, it will also have a low derivative if it is already very low. Conversion follows an S-shape, with low derivatives at both ends, which we can see by plotting the average conversion by `estimated_income` bins (bins of width 100)."
    ]
   },
   {
@@ -430,15 +430,15 @@
    "source": [
     "Notice how the slope (derivative) of this curve is very small when conversion is very high. It is also small when conversion is very low (although that is harder to see due to the small sample in that region). With this information, we can now explain why `estimated_income` generates segments with high treatment effect heterogeneity. \n",
     " \n",
-    "Since `estimated_income` is highly predictive of conversion, we can say that customers with different `estimated_income` fall in different places of the S shaped conversion curve. Customers with very high or very low `estimated_income` fall at the extremes of the curve, where the derivative is lower, meaning that increasing conversion is harder, which in turns means that the treatment effect is likely to be small. On the other hand, customers with reported income in the middle of the range also fall in the middle of the conversion curve, where the derivative is higher and, hece, the treatment effect will likely also be higher. I say likely because, in theory, it is possible for a variable to have such a strong effect modification force that it dominates the change in derivative we see as we traverse the conversion curve. However, at least from my experience, the curvature of the S shaped conversion tends to dominate every other effect modification we have. \n",
+    "Since `estimated_income` is highly predictive of conversion, we can say that customers with different `estimated_income` are located in different places along the S-shaped conversion curve. Customers with very high or very low `estimated_income` fall at the extremes of the curve, where the derivative is lower, meaning that increasing conversion is harder, which in turn means that the treatment effect is likely to be small. On the other hand, customers with reported income in the middle of the range also fall in the middle of the conversion curve, where the derivative is higher and, hence, the treatment effect will likely also be higher. I say likely because, in theory, it is possible for a variable to have such a strong effect modification force that it dominates the change in derivative we see as we traverse the conversion curve. However, at least from my experience, the curvature of the S-shaped conversion curve tends to dominate every other effect modification we have. \n",
     " \n",
-    "This is not just me, though. Here is a slide I got from Susan Atheys' presentation for the Columbia Data Science Institute. Here, she is discussing the effect of a nudge to get students to apply for federal financial aid in order to pay for college. It's also a conversion problem. What she finds is that the best strategy is to target those students that are already likely to convert. She also says it is a bad idea to target those with low probability of conversion\n",
+    "It's not just me, though. Here is a slide from Susan Athey's presentation at the Columbia Data Science Institute. Here, she is discussing the effect of a nudge to get students to apply for federal financial aid in order to pay for college. It's also a conversion problem. What she finds is that the best strategy is to target those students that are already likely to convert. She also says it is a bad idea to target those with low probability of conversion.\n",
     " \n",
     "![image.png](data/img/hte-binary-outcome/slide-susan-athey.png)\n",
     " \n",
-    "Wait a minute! But that is not what you first said! You said that both very high and very low conversions have low derivative and hence, low treatment effect!\n",
+    "Wait a minute! That is not what you said above! You said that treatment effects were low at both the low and the high end of the conversion curve!\n",
     " \n",
-    "Well, that is correct. However, in real life, conversion rarely spams the entire S shaped curve. What we usually have is everyone smooshed at one or the other end of the curve. In business terms, your average conversion is rarely 50%. More often than not, it is something like 70 to 90% or something like 1 to 20%. In these more likely situations, targeting those with a high baseline can be a good or a bad idea. \n",
+    "True. However, in real life, conversion rarely spans the entire S-shaped curve. What we usually have is everyone smooshed at one or the other end of the curve. In business terms, your average conversion is rarely 50%. More often than not, it is something like 70 to 90% or something like 1 to 20%. In these more likely situations, targeting those with a high baseline can be a good or a bad idea. \n",
     " \n",
     "Here is what I mean: Let's take the same latent outcome from before, but now generate a situation where conversion is low on average, by setting it to `latent_outcome > 2`. Next, let's craft a situation where conversion is high by setting `latent_outcome > -2`."
    ]
@@ -501,7 +501,7 @@
     "plt.plot(age_cumm_effect_latent, label=\"age\")\n",
     "plt.plot(inc_cumm_effect_latent, label=\"est. income\")\n",
     "plt.xlabel(\"Percentile\")\n",
-    "plt.ylabel(\"Effect on Conversino\");\n",
+    "plt.ylabel(\"Effect on Conversion\");\n",
     "plt.legend();"
    ]
   },
@@ -543,36 +543,36 @@
     "plt.plot(age_cumm_effect_latent, label=\"age\")\n",
     "plt.plot(inc_cumm_effect_latent, label=\"est. income\")\n",
     "plt.xlabel(\"Percentile\")\n",
-    "plt.ylabel(\"Effect on Conversino\")\n",
+    "plt.ylabel(\"Effect on Conversion\")\n",
     "plt.legend();"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "To summarize, what we saw is that, when the outcome is binary, the treatment effect tends to be dominated by the curvature (derivative) of the logistic function. \n",
+    "To summarize, what we saw is that when the outcome is binary, the treatment effect tends to be dominated by the curvature (derivative) of the S-shaped function.\n",
     " \n",
     "![image.png](data/img/hte-binary-outcome/logistic.png)\n",
     " \n",
     "For instance, in our conversion problem, if the **average conversion is low**, we are at to the left of the logistic curve and the **treatment effect will be higher at high baseline conversion**. This would translate to a nudge policy that advocates for treating (nudging) those customers with an already high probability of conversion. On the other hand, if the **average conversion is high**, we will be to the right side of the logistic curve, where the derivative (and hence the treatment effect) will be **higher for those customers with lower baseline conversion**. \n",
     " \n",
-    "This is certainly a lot to remember, but we can definitely simplify: **just treat whomever is closer to a baseline conversion of 50%**. The mathematical argument here is pretty solid: the derivative of the logistic is at its peak at 50%, so just treat units closer to that point. \n",
+    "This is certainly a lot to remember, but we can simplify: **just treat whoever is closer to a baseline conversion of 50%**. The mathematical argument here is pretty solid: the derivative of the logistic function is at its peak at 50%, so just treat units closer to that point. \n",
     " \n",
-    "What is even nicer is that this is one of the rare cases where common knowledge matches the math. In marketing, where these conversion problems are very common, there is a belief that we should not target lost bets (those with very low conversion probability) nor sure wins (those with very high conversion probability). Instead, we should target those in the middle. This is pretty fascinating, since it is the exact same thing we figured using a more formal causal argument. "
+    "What is even nicer is that this is one of the rare cases where received wisdom matches the math. In marketing, where conversion problems are very common, there is a belief that we should target neither lost bets (those with very low conversion probability) nor sure wins (those with very high conversion probability). Instead, we should target those in the middle. This is fascinating, since it is the exact same thing we figured using a more formal causal argument. "
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Continues Treatment and Non Linearity\n",
+    "# Continuous Treatment and Non-linearity\n",
     " \n",
-    "We've explored in depth just one example of binary outcome making Heterogeneous Treatment Effect analysis harder. But this phenomena goes beyond the conversion problem from marketing. For instance, in 2021, the world managed to deliver its first batch of approved COVID19 vaccines to the general public. Back then, a crucial question was who should receive the vaccine first. This is, not surprisingly, a Heterogeneous Treatment Effect problem. Policy makers would like to vaccinate those who would benefit the most first. In this situation, the treatment effect is avoiding death or hospitalization. So, whose death or hospitalization decreased the most when given a shot? In most countries, they were the elderly and those with prior health conditions (comorbidities). Now, these are the people that are **more likely to die when getting COVID19**. Also covid mortality rate is thankfully much lower than 50%, which puts us to the left of logistic function. In this region, by the same argument we made for marketing, it would make sense to treat those with a high baseline probability of death when getting COVID19, which are precisely the groups we’ve mentioned earlier. Is this a coincidence? Maybe. Keep in mind that I'm not a health expert, so I might be very wrong here. But the logic makes a lot of sense to me. \n",
+    "We've explored in depth just one example of binary outcome making Heterogeneous Treatment Effect analysis harder. But this phenomenon goes beyond the conversion problem from marketing. For instance, in 2021, the world managed to deliver the first batch of approved COVID-19 vaccines to the general public. Back then, a crucial question was who should receive the vaccine first. This is, not surprisingly, a Heterogeneous Treatment Effect problem. Policy makers would like to vaccinate those who would benefit the most first. In this situation, the treatment effect is averting death or hospitalization. So, whose likelihood of death or hospitalization would decrease the most when given a shot? In most countries, it was the elderly and those with pre-existing health conditions (comorbidities). These are people that are **more likely to die when infected with COVID-19**. Also, the COVID-19 mortality rate is (thankfully!) much lower than 50%, which puts us to the left of the logistic function. In this region, by the same argument we made for marketing, it would make sense to treat those with a high baseline probability of death when getting COVID-19, which are precisely the groups we’ve mentioned earlier. Is this a coincidence? Maybe. Keep in mind that I'm not a health expert, so I might be very wrong here. But the logic makes a lot of sense to me. \n",
     " \n",
-    "In both cases, marketing nudges and COVID19 vaccines, **the key complicating factor for Treatment Effect Heterogeneity is the non-linearity of the outcome function** $Y(0)$. This nonlinearity makes it so that, as we go from $Y(0)$ to $Y(1)$, the increase in the outcome is primarily due to the curvature in the outcome function. We saw how this happened in binary outcome, where $E[Y|X]$ follows a logistic shape. But this is even more general. In fact, it is a problem that keeps popping up in business, especially if the treatment is a continuous variable. Let's go through one last example to make this idea more clear.\n",
+    "In both cases, marketing nudges and COVID-19 vaccines, **the key complicating factor for Treatment Effect Heterogeneity is the non-linearity of the outcome function** $Y(0)$. This nonlinearity makes it so that, as we go from $Y(0)$ to $Y(1)$, the increase in the outcome is primarily due to the curvature in the outcome function. We saw how this happened with a binary outcome, where $E[Y|X]$ follows a logistic shape. But this is more general. In fact, it is a problem that keeps popping up in business, especially if the treatment is a continuous variable. Let's go through one last example to drive the point home.\n",
     " \n",
-    "Let's consider the classic pricing problem. You are working for a streaming company, like Netflix or HBO. A key question the company wants answered is what price to charge customers. In order to answer that, they run an experiment where they randomly assign customers to different priced deals: 5 BLR/month, 10 BRL/month, 15 BRL/month or 20 BRL/month. By doing so, they hope to answer not just how sensitive customers are to price increases, but also if some types of customers are more sensitive than others. In the plot below, you can see the results from that experiment broken down by two customer segments: `A`, customers with higher estimated income, and `B`, customers with lower estimated income. "
+    "Let's consider the classic pricing problem. You are working for a streaming company like Netflix or HBO. A key question the company wants answered is what price to charge customers. In order to answer the question, they run an experiment in which they randomly assign customers to differently priced deals: 5 BRL/month, 10 BRL/month, 15 BRL/month or 20 BRL/month. By doing so, they hope to answer not just how sensitive customers are to price increases, but also if some types of customers are more sensitive than others. In the plot below, you can see the results from that experiment broken down by two customer segments: `A`, customers with higher estimated income, and `B`, customers with lower estimated income. "
    ]
   },
   {
@@ -741,7 +741,7 @@
    "source": [
     "## Key Concepts\n",
     " \n",
-    "I realize I might have brought more questions than answers, but sometimes the best we can do about a problem is to be very aware of it. In this chapter, I hope I've managed to open your eyes to the complications that arise when the outcome we care about is non-linear. \n",
+    "I realize I may have raised more questions than I have answered. Alas, sometimes the best we can do about a problem is to be very aware of it. In this chapter, I hope I've managed to open your eyes to the complications that arise when the outcome we care about is non-linear. \n",
     " \n",
     "This is a common and more studied problem with binary outcomes. In this case, the treatment effect tends to be higher the closer we are to an average outcome of 0.5. Since the outcome is bounded at 0 and 1, effects tend to be very small if we are too close to 0 or 1. \n",
     " \n",