Following [Total Portfolio Project](https://www.total-portfolio.org/)'s Impact Returns model, in this notebook we analyse the impact-adjusted returns for an investment in [DataPond](https://data-pond.co/) (DP), a seed/pre-seed stage startup working on water quality monitoring.

Previous work: an [impact assessment report](https://docs.google.com/document/d/1AVwksd7_-d-a-bB1a4udAPLJI9XiOvPFieFOI_ZwaRQ/edit) with accompanying back-of-the-envelope estimates [here](https://drive.google.com/file/d/1f0FmH8aCoTb5Odk9pg-bVOpnYlAjiCam/view?usp=sharing), which has serious methodological issues.

*Disclaimer - while I try to be objective, the company belongs to my father.*

# Context
## DataPond
* DataPond is an impact startup in the seed funding stage. They have developed a technology to monitor water quality cheaply and scalably.
* Their primary innovation is the use of cheap ($10s) sensors to algorithmically infer the existence of biological contamination.
* They plan to work in India, where they have solid connections and expertise, in line with the national Jal Jeevan Mission to deliver piped water to all households.
  * However, investor preferences and the operational challenges involved in working with the Indian public sector might shift their efforts to the US market.
## The problem
* Water-borne diseases are a huge problem worldwide. 
  * About 2 billion people live without access to safe drinking water.
  * Diarrheal diseases alone are responsible for more than a million deaths each year, a third of them among children under 5.
* Current data collection on water quality is severely lacking.
  * Testing for biological contamination directly is complicated and expensive, so it’s used sparingly (even in developing countries).
  * Water treatment and infrastructure operators work with practically no knowledge of the current water quality, which makes their efforts considerably less effective.
* Dealing with an identified water contamination is relatively simple.
  * Methods such as chlorination and boiling can be used at the household level once residents are alerted to the contamination.
  * Installing water treatment facilities, directing people to non-contaminated sources, and treating the upstream contamination source directly might be even more promising.
* Recently, GiveWell [evaluated water-treatment interventions](https://www.givewell.org/international/technical/programs/water-quality-interventions) (such as chlorination tablets) and found them highly promising. They focus on interventions that add chlorine to water (say, by installing water dispensers or in-line chlorination), which generally seem not to work well in India due to poor adherence.


# Magnitude

We evaluate this in three stages:
1. Potential Gross Impact (the total future impact of the main successful scenarios)
2. Expected Gross Impact (the previous step, weighted by the probabilities of those scenarios)
3. Enterprise Effectiveness (the previous step, divided by the total cost)
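
As a rough sketch of how the three stages chain together, the function below reduces "accounting for probabilities" to a single success probability. This is a simplifying assumption; a fuller treatment would likely use distributions. The function and parameter names are hypothetical, and the numbers in the usage line are placeholders rather than estimates for DataPond.

```python
def enterprise_effectiveness(potential_gross_impact, success_probability, total_cost):
    """Chain the three stages: potential impact -> expected impact -> impact per unit cost.

    All names here are illustrative; the single success probability is a
    simplification of "accounting for probabilities".
    """
    # Stage 2: weight the potential impact by the probability of success.
    expected_gross_impact = potential_gross_impact * success_probability
    # Stage 3: divide the expected impact by the total cost.
    return expected_gross_impact / total_cost


# Placeholder inputs, chosen only to show the arithmetic.
example = enterprise_effectiveness(
    potential_gross_impact=1000.0,
    success_probability=0.25,
    total_cost=50.0,
)
print(example)
```

The result is in units of impact per unit of cost, so it is only meaningful once the impact and cost units are fixed.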

In [1]:
!pip install squigglepy




## Potential Gross Impact
Our main success scenario is widespread global use of the technology, with 100% precision and recall. We further assume that both locals and water suppliers have a perfect understanding of where biological contamination exists and how severe it is. The whole scenario is unrealistic, so this estimate will be discounted.
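
For reference, "100% precision and recall" means the detection system raises no false alarms and misses no real contamination. A minimal sketch of the standard definitions follows; the function name and the counts in the usage lines are illustrative only.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) for a contamination detector.

    precision: share of raised alerts that correspond to real contamination.
    recall: share of real contamination events that triggered an alert.
    """
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall


# The idealised scenario: no false alarms and no missed contaminations.
print(precision_recall(true_positives=10, false_positives=0, false_negatives=0))
# A detector with some false alarms but no misses has lower precision only.
print(precision_recall(true_positives=8, false_positives=2, false_negatives=0))
```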



In [None]:
import squigglepy as sq
import numpy as np

# Estimate the global problem size from water-borne diseases,
# using Our World in Data diarrhoeal disease data (https://ourworldindata.org/diarrheal-diseases).
# Assumptions:
# - deaths from diarrhoeal diseases are a good proxy for total deaths from water-borne diseases
# - only deaths of people under 70 years old are counted
# - yearly deaths continue to decline linearly along the 1990-2019 trend
total_yearly_deaths_2019_under_70 = 900_000
total_yearly_deaths_1990_under_70 = 2_300_000
def total_yearly_deaths_under_70(year):
    """Linearly extrapolate yearly under-70 deaths from the 1990 and 2019 values, floored at zero."""
    return max(0, total_yearly_deaths_1990_under_70 + (total_yearly_deaths_2019_under_70 - total_yearly_deaths_1990_under_70) * (year - 1990) / (2019 - 1990))
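
One consequence of the linear-decline assumption is worth checking: the trend line eventually reaches zero, which is why the function above is floored with `max(0, ...)`. The sketch below uses a hypothetical helper, `zero_crossing_year`, and reuses the 1990 and 2019 figures from the cell above to find where that happens.

```python
def zero_crossing_year(year0, value0, year1, value1):
    """Year at which the straight line through (year0, value0) and (year1, value1) hits zero."""
    slope = (value1 - value0) / (year1 - year0)  # change in deaths per year
    return year0 - value0 / slope


# Same two data points as the extrapolation above (under-70 deaths per year).
crossing = zero_crossing_year(1990, 2_300_000, 2019, 900_000)
print(round(crossing, 1))
```

Under these inputs the line crosses zero between 2037 and 2038, so any impact the model attributes to later years is zero by construction. That may be worth keeping in mind when choosing the time horizon.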

