# The Need For Balanced Global Freshwater Usage 

Fei Wang

12/10/2017

## Introduction

The sustainability of global freshwater consumption affects everyone's life, yet there is little general awareness of the shortcomings of traditional water usage studies or of the modern framework for this subject, the Global Water Footprint. The goal of this project is to raise awareness of the imbalance in global freshwater usage, promote the Global Water Footprint approach initiated by Prof. Arjen Hoekstra, and provide example research questions and visualization tools so that interested readers can easily access, replicate and extend the studies by Prof. Arjen Hoekstra et al.

The ultimate solution to the freshwater sustainability issue lies in the hands of each individual, and I hope this study helps in the campaign to tackle this complicated issue.

## Background

The world's freshwater consumption has traditionally been studied only at the national level, based on the principle that national demand should be less than or equal to national supply. But recent studies by Prof. Arjen Hoekstra et al. [1-3] showed that, through international trade in products, countries have transferred a vast amount of responsibility for freshwater consumption to other countries, which may have less abundant water resources or less efficient water usage technology. So the freshwater usage a country should be accountable for is the water used to produce the products its inhabitants consume, not just the water used in its own production processes.

To study freshwater consumption within a global context, Prof. Arjen Hoekstra coined the term "Water Footprint" (WF), which extends the traditional definition of water usage by incorporating new dimensions such as time, location and water type. The WF framework also separates water usage by sector (crop, animal, industrial, direct consumption) and by type of freshwater consumed:

 - Green – rainwater consumed
 - Blue – surface and ground water consumed
 - Grey – the freshwater required to assimilate the pollutants generated during the production process
 

The WF of an individual is defined as the freshwater used in producing all the products this person consumes, and the WF of a country is the sum of the WFs of all its inhabitants. However, through global trade in products, countries can transfer water consumption across borders, which gives rise to the global flow of virtual water. In a simplified model of world trade with only 2 countries, A and B (graph shown below [4]), we can see the difference between the "Water Footprint of national consumption", the amount of freshwater consumption a country should be responsible for, and the "Water Footprint within the nation", the traditional metric for a country's water usage, which excludes the external component of a country's water consumption.

In [None]:
setwd('C:/Users/v-wafe/working/data-512-final-project/Data')
library("IRdisplay")

In [None]:
display_png(file="../Images/Background_Plot1.png")  
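
The distinction between the two metrics can be sketched with a small calculation. This is only an illustration: the volumes are invented (they are not taken from the Water Footprint Network data), and it assumes the simplified accounting relation in which the footprint of national consumption equals the footprint within the nation plus virtual water imports minus virtual water exports.

```r
# Hypothetical two-country world; all volumes are made-up illustrative units.
wf <- data.frame(
  country        = c("A", "B"),
  wf_within      = c(100, 60),  # water used by production inside the country
  virtual_import = c(30, 10),   # water embedded in imported products
  virtual_export = c(10, 30)    # water embedded in exported products
)

# Simplified accounting (an assumption of this sketch):
# footprint of consumption = footprint within the nation + virtual import - virtual export
wf$wf_consumption <- wf$wf_within + wf$virtual_import - wf$virtual_export

# In a closed two-country world, A's imports are B's exports,
# so trade shifts responsibility but leaves the global total unchanged.
stopifnot(sum(wf$wf_consumption) == sum(wf$wf_within))

print(wf)
```

In this toy case country A uses 100 units at home but is responsible for 120, while country B uses 60 units at home but is responsible for only 40.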

With the establishment of the Water Footprint Network, the general public can access numerous studies [1-3] on global water consumption over the last decade and see the vast imbalance of water usage around the world. But the study results are mostly presented as exploratory analyses that point out key areas of extreme water footprint imbalance or water resource scarcity. More hidden patterns in the global water footprint could be discovered by applying more data-science-oriented approaches (for example, regression analysis or anomaly detection).


To extend the studies of Prof. Arjen Hoekstra et al., I analyzed the data of global virtual water flow and the WF-Per-Capita by country to answer the following research questions:

- RQ 1: Are the global flow of virtual water and WF-Per-Capita distributed evenly?
- RQ 2: Are the developed countries transferring water pollution to the developing world? 

The first research question is an effort to operationalize the process of identifying countries whose freshwater usage is disproportionate to the size of their economy and therefore has to be explained by other factors.

The second research question originated from a key principle in human-centered data science: fairness. My goal is to test whether the virtual water flow due to trade is causing an imbalanced distribution of polluted water around the world. In other words, are the developed countries polluting the water resources of the developing countries?

Lastly, following the principle of replicability in open research, I built an interactive visualization dashboard as an R Shiny app, published at [ShinyApps.io](https://feiwang.shinyapps.io/GlobalWaterFootprint/). This dashboard gives users a more flexible way to discover global water footprint patterns. Most of the analyses in this report were conducted with the assistance of this dashboard.

## Methods and Findings

- RQ 1: Are the global flow of virtual water and WF-Per-Capita distributed evenly?
 
This question is exploratory in nature. I used the R Shiny dashboard to toggle between different water types and product types and dynamically identify notable facts about global virtual water flow and WF-Per-Capita. The data show quite a few interesting patterns; here are the 3 that I found most interesting:

First, I looked at the distribution of virtual water import amounts by country. As shown in the graph on the lower left, the USA is the country with the largest import, accounting for more than 10% of the global total and far above second-tier countries such as Japan, China and France. In the lower right chart, the top 4 countries by virtual water import are also those with large volumes of imported goods, whereas the 5th-place country, Italy, is not known for having an especially large economy. My hypothesis is that this is due to the country's heavy production of leather goods, which are highly water-intensive.

In [None]:
display_png(file="../Images/RQ_1_Top5Countries_VW_Import.png")  

Second, when looking at the virtual export of blue water (surface and ground water), I found that Pakistan stood out as a top country whose export is comparable in size to that of the USA. After drilling down to the product types associated with Pakistan's export of blue water, I found that it comes mostly from crop products, which is even more surprising because crop production typically relies on green (rainfall) water. My hypothesis is that, due to the country's climate and low precipitation, crop production in Pakistan relies heavily on ground or surface water. The relatively low efficiency of the country's irrigation system may also have contributed to the large volume of water usage.

In [None]:
display_png(file="../Images/RQ_1_Distribution_Blue_VW_Export.png")  

Third, I applied a statistical control method to detect outlier countries with high WF-Per-Capita. Countries whose WF-Per-Capita lies more than 2 standard deviations above the mean of all countries are (from high to low): Mongolia, Niger, Bolivia, Brunei Darussalam, United Arab Emirates, Bermuda and the USA. All except the USA have small populations, which can bias per-capita analysis. This suggests that the lifestyle of people in the USA is highly water-intensive, probably the most water-intensive among all countries with large populations. This raises an alarm because the USA also ranks in the top 3 for aggregated virtual water import, aggregated virtual water export and total WF. We can also hypothesize that the high WF-Per-Capita of Mongolia is due to the meat-oriented diet of its people, and that of Brunei and the UAE to their high standard of living. I am not sure why Niger, Bolivia and Bermuda rank near the top; there may be contextual information I am not aware of.

In [None]:
display_png(file="../Images/RQ_1_Anomaly_Detection_WF_Per_Capita.png")  
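
The 2-standard-deviation rule described for the third finding can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the exact procedure behind the chart: the per-capita values and country labels are invented, and only the upper tail is flagged.

```r
# Hypothetical per-capita water footprints (values and country labels are invented).
wfpc <- data.frame(
  country = paste0("Country", 1:10),
  wf_per_capita = c(1200, 1350, 1100, 1400, 1250, 1300, 1150, 1450, 1280, 3800)
)

mu    <- mean(wfpc$wf_per_capita)
sigma <- sd(wfpc$wf_per_capita)

# z-score: distance from the mean in units of standard deviation
wfpc$z <- (wfpc$wf_per_capita - mu) / sigma

# Flag countries more than 2 standard deviations above the mean, highest first
outliers <- wfpc[wfpc$z > 2, ]
outliers <- outliers[order(-outliers$wf_per_capita), ]

print(outliers)
```

Because the mean and standard deviation are themselves pulled upwards by extreme values, this rule is conservative on small samples; a median-based rule would be one alternative to consider.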

- RQ 2: Are the developed countries transferring water pollution to the developing world? 

Through global trade, countries transfer not only the liability for freshwater usage but also water pollution. The latter is of more serious concern because the countries exporting grey water (polluted water generated during production processes), that is, the exporters of the products, must make additional effort to deal with the pollution. For this RQ I applied a t-test assuming unequal variances (Welch's t-test) to compare the percentage of exported grey water in total WF per capita between developed and developing countries. The result is statistically significant, which supports the view that the transfer of water pollution is mostly directional, from the developed world to developing countries whose economies are geared towards high-pollution industrial sectors for export. The graphs below are from the "RQ - Analyses.xlsx" file in the repo.

In [None]:
display_png(file="../Images/RQ_2_Perc_External_Grey_In_WF_Per_Capita.png")  
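
The test itself was run in the spreadsheet, but an equivalent check could be sketched in R with `t.test`. The two vectors below are invented percentages used only to show the call; they are not the study's data, and the one-sided variant is an optional extra rather than something the analysis above reports.

```r
# Hypothetical grey-water shares (%) in WF per capita; values are invented.
developed  <- c(12.1, 15.4, 10.8, 14.2, 13.5, 16.0, 11.7, 12.9)
developing <- c(4.2, 6.8, 3.1, 9.5, 5.0, 2.7, 7.4, 4.9, 6.1, 3.8)

# var.equal = FALSE requests Welch's t-test (unequal variances); it is also R's default.
result <- t.test(developed, developing, var.equal = FALSE)
print(result)

# One-sided variant: is the mean share of the developed group greater?
one_sided <- t.test(developed, developing,
                    alternative = "greater", var.equal = FALSE)
print(one_sided$p.value)
```

Welch's version avoids assuming that the two groups of countries have the same variance, which is a sensible default when the groups differ in size and spread.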

## Discussion

There are a few limitations in this study, mainly related to the datasets.

 - First, since the data on global WF by country are annual averages over the years 1996 – 2005, I was not able to present any trend in WF by country. Some countries experienced significant economic and societal changes during this time frame and could therefore have seen dramatic changes in their water usage.

 - Second, I was not able to connect the water usage data with water scarcity data. Prof. Arjen Hoekstra did provide water scarcity data for the world's major river basins, but since many river basins are international, it is hard to tie those data to the WF data by country.

 - Third, the dataset is not up to date (1996 - 2005), but acquiring new data on this topic is tedious and involves retrieving data from multiple sources with very diverse designs, assumptions and ways of quantifying and handling uncertainty.

In future iterations of this study, possibly with updated Water Footprint data for recent years and WF by product, I plan to extend the analyses by identifying the key trading relationships between countries that are most inefficient from a global water saving perspective. The resulting policy suggestions would be more specific and actionable.

## Conclusion

Through this study I confirmed the issue of imbalanced and unsustainable distribution of global freshwater usage. As Prof. Arjen Hoekstra pointed out, the freshwater usage of a country depends not only on the volume of its aggregated consumption, but also on the water intensity of its inhabitants' consumption pattern, and on the water scarcity and water usage technology of its trading partners.

To reduce global freshwater consumption in aggregate, water policy decision makers need not only to put their country's water consumption in a global context, but also to adopt a more holistic view by considering the interaction between one country's water policy and policies in related sectors such as energy, trade, technology and agriculture. At the individual level, the public should be better informed about how much water intensity varies among the products they consume every day.

## References

1) Mekonnen, M.M. and Hoekstra, A.Y. (2011) [National water footprint accounts: The green, blue and grey water footprint of production and consumption](http://www.waterfootprint.org/Reports/Report50-NationalWaterFootprints-Vol1.pdf), Value of Water Research Report Series No. 50, UNESCO-IHE, Delft, the Netherlands

2) Hoekstra, A.Y. and Mekonnen, M.M. (2012) [The water footprint of humanity](http://waterfootprint.org/media/downloads/Hoekstra-Mekonnen-2012-WaterFootprint-of-Humanity.pdf), Proceedings of the National Academy of Sciences, 109(9): 3232–3237.

3) Mekonnen, M.M. and Hoekstra, A.Y. (2011) [The green, blue and grey water footprint of crops and derived crop products](http://waterfootprint.org/media/downloads/Mekonnen-Hoekstra-2011-WaterFootprintCrops.pdf), Hydrology and Earth System Sciences, 15(5): 1577-1600.

4) Hoekstra, A.Y. et al. (2011) [The Water Footprint Assessment Manual](http://waterfootprint.org/media/downloads/TheWaterFootprintAssessmentManual_2.pdf)

## Code for Data Preparation and EDA

In [None]:
library(reshape2)
library(dplyr)
library(plotly)
library(stringr)
options(scipen=999)

### Load Data on global virtual water flow

I downloaded the data on global virtual water flow "Report50-Appendix-II&III.xls" from http://waterfootprint.org/media/downloads/Report50-Appendix-II&III.xls. Then I copied out and saved its second tab as "GlobalVirtualWaterFlow.csv" after removing the first 5 rows of header content, the rows where country value is "Others" or "Total" and the last 9 columns. Then I took the following steps to transform the data into my desired format for analysis and visualization.

First, I read the csv file into an R dataframe object and changed the column names of the dataframe.

In [None]:
dfWaterFlow = read.csv('GlobalVirtualWaterFlow.csv', header = TRUE)

names(dfWaterFlow) = c('country'
                      , 'import_crop_green', 'import_crop_blue', 'import_crop_grey'
                      , 'import_animal_green', 'import_animal_blue', 'import_animal_grey'
                      , 'import_industrial_blue', 'import_industrial_grey'
                      , 'export_crop_green', 'export_crop_blue', 'export_crop_grey'
                      , 'export_animal_green', 'export_animal_blue', 'export_animal_grey'
                      , 'export_industrial_blue', 'export_industrial_grey'
                    )

Second, I used the melt function to collapse all the columns besides "country" into one variable column, based on which I generated three more columns on "action", "product_type" and "water_type" of the associated Water Footprint amount.

In [None]:
a = melt(dfWaterFlow, id.vars = c("country"))
a[, c('action', 'product_type', 'water_type')] = str_split_fixed(a$variable, "_", 3)
dfWaterFlow = a[,c('country', 'action', 'product_type', 'water_type', 'value')]
names(dfWaterFlow)[5] <- 'amount'
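
To make the reshaping step easier to follow, here is a toy version of the same wide-to-long transformation. The table and its values are invented; only the column-naming pattern (`action_product_water`) mirrors the real data, and the `reshape2` and `stringr` packages are assumed to be installed.

```r
library(reshape2)
library(stringr)

# Toy wide table: two countries, three measure columns (values are invented).
wide <- data.frame(
  country = c("A", "B"),
  import_crop_green  = c(10, 20),
  import_crop_blue   = c(1, 2),
  export_animal_grey = c(5, 6)
)

# melt() stacks every non-id column into (variable, value) pairs:
# 2 countries x 3 columns = 6 rows.
long <- melt(wide, id.vars = "country")

# Each column name encodes action_product_water, so splitting on "_"
# into exactly 3 pieces yields one label column per dimension.
parts <- str_split_fixed(as.character(long$variable), "_", 3)
long$action       <- parts[, 1]
long$product_type <- parts[, 2]
long$water_type   <- parts[, 3]

print(long[, c("country", "action", "product_type", "water_type", "value")])
```

The long format is what allows the dashboard to filter by action, product type and water type with simple row subsets.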

Third, I changed all the factor-type columns into character-type columns to make table joins easier in later steps. I also manually corrected a few country names so that they can be mapped to a 3-letter code in the country_code reference table for plotting later.

In [None]:
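# Convert factor columns to character so later joins compare plain strings.
dfWaterFlow %>% mutate_if(is.factor, as.character) -> dfWaterFlow

# Align country names with the country_code reference table.
# Vector-style assignment is a no-op when a name is absent, instead of raising an error.
# 'CÃ´te d\'Ivoire' appears to be "Côte d'Ivoire" read under a mismatched encoding; kept as-is to match the loaded data.
dfWaterFlow$country[dfWaterFlow$country == 'CÃ´te d\'Ivoire'] <- 'Cote d\'Ivoire'
dfWaterFlow$country[dfWaterFlow$country == 'Congo, Dem Republic'] <- 'Congo, Democratic Republic'
dfWaterFlow$country[dfWaterFlow$country == 'East Timor   '] <- 'East Timor'
dfWaterFlow$country[dfWaterFlow$country == 'Korea, Dem People\'s Rep'] <- 'Korea, Democratic People\'s Rep'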
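
Since the same name corrections are needed for more than one table, one option is to keep them in a single lookup vector and apply it through a small helper. The sketch below is only a suggestion: `name_fixes` and `standardize_country` are hypothetical names that do not exist in the project, and the sample vector is invented.

```r
# Hypothetical lookup: name as it appears in the source -> name in the reference table.
name_fixes <- c(
  "Congo, Dem Republic"     = "Congo, Democratic Republic",
  "Korea, Dem People's Rep" = "Korea, Democratic People's Rep"
)

# Hypothetical helper: trims padding, then replaces any name found in the lookup.
standardize_country <- function(x, fixes) {
  x <- trimws(as.character(x))     # removes stray padding such as "East Timor   "
  hit <- x %in% names(fixes)       # which names have a replacement
  x[hit] <- unname(fixes[x[hit]])  # look up the replacement by name
  x
}

countries <- c("Congo, Dem Republic", "East Timor   ", "France",
               "Korea, Dem People's Rep")
print(standardize_country(countries, name_fixes))
```

Names that are not in the lookup pass through unchanged, so the helper can be applied safely to any table that has a country column.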

Next, for each combination of measure of Water Footprint amount (segmented by country, product_type, water_type), I calculated the net import amount, which is equal to the difference between virtual water import and export.

In [None]:
# Sum import and export amounts per country / product type / water type.
a = aggregate(amount ~ country+product_type+water_type, dfWaterFlow[dfWaterFlow$action == 'import', ], sum)
b = aggregate(amount ~ country+product_type+water_type, dfWaterFlow[dfWaterFlow$action == 'export', ], sum)
names(a) <- c('country', 'product_type', 'water_type', 'import')
names(b) <- c('country', 'product_type', 'water_type', 'export')
# Outer join on the three key columns; combinations present on only one side get an NA net amount.
net = merge(a, b, by = c('country', 'product_type', 'water_type'), all = TRUE)
net$amount = net$import - net$export
net$action = 'net'
net = net[, c('country', 'action', 'product_type', 'water_type', 'amount')]
dfWaterFlow = rbind(dfWaterFlow, net)
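
One detail of the net calculation is worth illustrating: with an outer join, a combination that has only imports or only exports produces `NA` rather than a number. The toy example below uses invented data to show that behaviour and one possible way to handle it; treating a missing side as zero is an assumption of this sketch, not something the original analysis states.

```r
# Invented data: country B has no exports, country C has no imports.
imports <- data.frame(country = c("A", "B"), product_type = "crop",
                      water_type = "blue", import = c(30, 10))
exports <- data.frame(country = c("A", "C"), product_type = "crop",
                      water_type = "blue", export = c(10, 5))

# Outer join: keeps A, B and C; missing sides become NA.
net <- merge(imports, exports,
             by = c("country", "product_type", "water_type"), all = TRUE)

# Direct subtraction propagates NA for B and C.
net$amount_raw <- net$import - net$export

# Optional (assumption): treat a missing import or export as zero.
net$import[is.na(net$import)] <- 0
net$export[is.na(net$export)] <- 0
net$amount <- net$import - net$export

print(net)
```

Whether zero is the right fill value depends on whether a missing row means "no trade" or "no data", so it is worth checking against the source tables before applying it.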

Lastly, I saved the dataframe object into an RData file, which can be easily accessed by those who want to replicate this study.

In [None]:
save(dfWaterFlow, file = 'dfWaterFlow.RData')

### Load Data on water footprint per capita

I downloaded the data on water footprint per-capita by country "Report50-Appendix-VIII&IX.xls" from http://waterfootprint.org/media/downloads/Report50-Appendix-VIII&IX.xls. Then I copied out and saved its second tab as "WFPerCapita.csv" after removing the first 5 rows of header content, rows where country value == "World", and the last 10 columns (which are under the title "Total water footprint of national consumption"). Then I took the following steps to transform the data into my desired format for analysis and visualization.

First, I read the csv file into an R dataframe object and changed the column names of the dataframe.

In [None]:
dfWFPC = read.csv('WFPerCapita.csv', header = TRUE)

names(dfWFPC) = c('country', 'population'
                      , 'agricultural_internal_green', 'agricultural_internal_blue', 'agricultural_internal_grey'
                      , 'agricultural_external_green', 'agricultural_external_blue', 'agricultural_external_grey'
                      , 'industrial_internal_blue', 'industrial_internal_grey'
                      , 'industrial_external_blue', 'industrial_external_grey'
                      , 'domestic_internal_blue', 'domestic_internal_grey'
                    )

Second, I used the melt function to collapse all the columns besides "country" and "population" into one variable column, based on which I generated three more columns, "product_type", "water_source" and "water_type", for the associated Water Footprint amount.

In [None]:
a = melt(dfWFPC, id.vars = c("country", "population"))
a[, c('product_type', 'water_source', 'water_type')] = str_split_fixed(a$variable, "_", 3)
dfWFPC = a[,c('country', 'population', 'product_type', 'water_source', 'water_type', 'value')]
names(dfWFPC)[6] <- 'amount'

Third, I changed all the factor-type columns into character-type columns to make table joins easier in later steps. I also manually corrected a few country names so that they can be mapped to a 3-letter code in the country_code reference table for plotting later.

In [None]:
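# Convert factor columns to character so later joins compare plain strings.
dfWFPC %>% mutate_if(is.factor, as.character) -> dfWFPC

# Align country names with the country_code reference table.
# Vector-style assignment is a no-op when a name is absent, instead of raising an error.
# 'CÃ´te d\'Ivoire' appears to be "Côte d'Ivoire" read under a mismatched encoding; kept as-is to match the loaded data.
dfWFPC$country[dfWFPC$country == 'CÃ´te d\'Ivoire'] <- 'Cote d\'Ivoire'
dfWFPC$country[dfWFPC$country == 'Congo, Dem Republic'] <- 'Congo, Democratic Republic'
dfWFPC$country[dfWFPC$country == 'East Timor   '] <- 'East Timor'
dfWFPC$country[dfWFPC$country == 'Korea, Dem People\'s Rep'] <- 'Korea, Democratic People\'s Rep'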

Lastly, I saved the dataframe object into an RData file, which can be easily accessed by those who want to replicate this study.

In [None]:
save(dfWFPC, file = 'dfWFPC.RData')
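
For readers who want to pick up from the saved files, the round trip through `save` and `load` can be sketched as follows. The example uses a throwaway data frame and a temporary file so that it runs anywhere; with the project files, the `load` call would point at `dfWaterFlow.RData` or `dfWFPC.RData` instead.

```r
# Throwaway example object and file path (both are placeholders).
path <- file.path(tempdir(), "example.RData")
example_df <- data.frame(country = c("A", "B"), amount = c(1, 2))

# save() stores the object under its own variable name.
save(example_df, file = path)
rm(example_df)

# load() restores the object into the workspace and returns the restored names.
loaded <- load(path)
print(loaded)
print(example_df)
```

Because `load` recreates objects under their original names, it can silently overwrite a variable of the same name; checking the returned character vector is a quick way to see what was restored.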