# Topic ideas

---

Group name: Ji Soo Ha & Alexander Hörmann

---


## AQUASTAT

### Data source

Initiated in 1993, AQUASTAT, the global information system on water and agriculture of the Food and Agriculture
Organization (FAO), consists mainly of a) systematic descriptions of the state of agricultural water management by
country and region, with a focus on developing countries and countries in transition; b) up-to-date online data by
country; c) digital geographical data on water resources and irrigation; and d) specific studies such as the review of
world water resources by country, the irrigation potential in Africa, projections of future agricultural water use
and irrigation development, and contributions to the World Water Development Report.

The information management process used to collect the data:

1) Review of literature and information at the country and sub-country level

2) Country surveys, carried out through national resource persons, consisting of data collection and a country description by means of a detailed questionnaire in which the source reference and comments are associated with each value

3) Critical analysis of information and data processing by the AQUASTAT team at FAO headquarters. Preference is given to national sources and expert knowledge. Data validation and processing are supported by the AQUASTAT database management system

4) Modelling of data by means of GIS and water balance models, to estimate unavailable data and to provide spatial data. GIS and remote sensing data are important inputs, together with the data acquired through the country surveys, which are also used for calibration

5) Standardization of information and data tables

6) Feedback and approval from national authorities/institutions

7) Dissemination on the web, as publications and/or on CD-ROM

8) Finally, voluntary feedback is acquired from users and through co-operation with other institutions.


### Data characteristics

Around 150 variables on water and agriculture by country can be accessed online through the AQUASTAT database query system
(http://www.fao.org/ag/agl/aglw/aquastat/dbase/index.stm).

Data can be queried online or downloaded as a comma-separated values (CSV) file. The query offers multiple selection options: the user can select a) one country, several countries, or a continent, for b) one variable, a group of variables, or selected variables, for c) one or several time periods. Because time-series data are difficult to acquire, the current aim is one value per five-year period.
Data can be queried for every five-year period back to 1965, but the time series are far from
complete. A complete time series of yearly values from 1961 exists only for the variable on irrigated land, which is made available through the FAOSTAT database on the basis of the AQUASTAT data. To support further analysis, users can include the variable and country codes and define their own layout of the data table in the AQUASTAT database.
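The query options above (countries × variables × periods) correspond to simple boolean filtering of the long-format bulk download. This is a minimal sketch that assumes the bulk file's `Country`, `Year`, `Variable` and `Value` columns. The `query` helper, the period years and all values are hypothetical placeholders. Because the time series are far from complete, the sketch also reports which share of the requested periods each country actually covers.

```python
import pandas as pd

# Hypothetical extract in the long layout of the bulk download
# (one row per Country, Year, Variable); the values are placeholders.
long_df = pd.DataFrame({
    "Country": ["Austria", "Austria", "Austria", "Turkey"],
    "Year": [2007, 2012, 2017, 2017],
    "Variable": ["Total population"] * 4,
    "Value": [1.0, 2.0, 3.0, 4.0],
})

def query(df, countries, variables, years):
    """Hypothetical helper: select countries x variables x periods, like the online form."""
    mask = (df["Country"].isin(countries)
            & df["Variable"].isin(variables)
            & df["Year"].isin(years))
    return df[mask]

periods = [2007, 2012, 2017]  # hypothetical period years
subset = query(long_df, ["Austria", "Turkey"], ["Total population"], periods)

# Share of the requested periods that actually have a value, per country
coverage = subset.groupby("Country")["Year"].nunique() / len(periods)
print(coverage)
```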

The dataset will be restricted to the year 2019, and the analysis will focus on the following variables:

- Population affected by water-related disease (response variable)
- Water Use Efficiency
- Total renewable water resources
- Gross Domestic Product
- Total Population
- Direct use of untreated municipal wastewater for irrigation purposes (10^9 m³ per year)
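The sketch below restricts the bulk download to 2019 and reshapes it to one row per country and one column per variable. It assumes the long layout of the bulk file. The variable labels are placeholders whose exact spelling would need to be checked against `df['Variable']`, and the values are made up.

```python
import pandas as pd

# Placeholder variable labels; check the exact spelling in df["Variable"]
VARIABLES = ["Total population", "Total renewable water resources"]

# Hypothetical long-format extract; values are placeholders
long_df = pd.DataFrame({
    "Country": ["Austria", "Austria", "Turkey", "Turkey", "Turkey"],
    "Year": [2019, 2019, 2019, 2019, 2017],
    "Variable": VARIABLES + VARIABLES + ["Total population"],
    "Value": [1.0, 2.0, 3.0, 4.0, 5.0],
})

def to_wide(df, year, variables):
    """Keep one year and the chosen variables; return one row per country."""
    sub = df[(df["Year"] == year) & df["Variable"].isin(variables)]
    # pivot raises a ValueError if a country/variable pair occurs twice,
    # which flags duplicates instead of silently picking one value
    return sub.pivot(index="Country", columns="Variable", values="Value")

wide_2019 = to_wide(long_df, 2019, VARIABLES)
print(wide_2019)
```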

### Research question

- How are water-related diseases and the use of water resources related to each other?

- Comparison of the impacts of agricultural water deprivation on human health
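One possible first step for the first question is a rank (Spearman) correlation between the disease variable and the water-use variables across countries. The sketch uses a hypothetical four-country table with made-up values and hypothetical column names. With a single year of data, it describes an association, not a causal effect.

```python
import pandas as pd

# Hypothetical 2019 country values (placeholders, not AQUASTAT data)
data = pd.DataFrame({
    "disease_affected": [9.0, 7.0, 4.0, 1.0],        # response variable
    "water_use_efficiency": [10.0, 20.0, 30.0, 40.0],
    "renewable_resources": [5.0, 1.0, 8.0, 3.0],
}, index=["A", "B", "C", "D"])

# Spearman's rho is the Pearson correlation of the ranks, so it captures
# monotonic (not only linear) relationships and is less sensitive to outliers
rho = data.rank().corr()
print(rho["disease_affected"].drop("disease_affected"))
```

For data without missing values, `data.corr(method="spearman")` gives the same result. Ranking explicitly just makes the definition visible.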

### Overview of data


```python
import numpy as np
import pandas as pd
import io
```



```python
from google.colab import files

# Colab only: opens a file picker and returns {file name: file content as bytes}
uploaded = files.upload()
```

```
Saving AQUASTAT Statistics Bulk Download.csv to AQUASTAT Statistics Bulk Download.csv
```


```python
# Read the uploaded bytes: ';' separates fields and ',' is the decimal mark
df = pd.read_csv(io.BytesIO(uploaded['AQUASTAT Statistics Bulk Download.csv']), sep=';', decimal=',')
```
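The `sep=';'` and `decimal=','` arguments assume the European CSV convention. Here is a self-contained check on a made-up two-row sample. With `decimal=','`, the value `3,5` becomes the float 3.5; without it, the column would be read as text.

```python
import io
import pandas as pd

# Two-line sample in the format assumed above: ';' between fields, ',' as decimal mark
sample = "Country;Year;Value\nAustria;2019;3,5\nTurkey;2019;12,25\n"
demo = pd.read_csv(io.StringIO(sample), sep=";", decimal=",")
print(demo.dtypes)
```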

```python
df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 778017 entries, 0 to 778016
Data columns (total 9 columns):
 #   Column              Non-Null Count   Dtype  
---  ------              --------------   -----  
 0   Unnamed: 0          778017 non-null  int64  
 1   Country             778017 non-null  object 
 2   Year                778017 non-null  int64  
 3   Variable            778017 non-null  object 
 4   M49                 778017 non-null  int64  
 5   Unit                778017 non-null  object 
 6   Symbol              778017 non-null  object 
 7   Symbol_Description  778017 non-null  object 
 8   Value               777997 non-null  float64
dtypes: float64(1), int64(3), object(5)
memory usage: 53.4+ MB
```


```python
df
```

| | Unnamed: 0 | Country | Year | Variable | ... |
|---|---|---|---|---|---|
| 0 | 0 | Luxembourg | 1965 | Long-term average annual pre... | ... |
| 1 | 1 | Turkey | 1965 | Long-term average annual precipi... | ... |
| 2 | 2 | Ecuador | 1965 | Long-term average annual precip... | ... |
| 3 | 3 | Fiji | 1965 | Long-term average annual precipita... | ... |
| 4 | 4 | Congo | 1965 | Groundwater produced internally | ... |
| ... | ... | ... | ... | ... | ... |
| 778012 | 778012 | Italy | 2013 | % of the agricultural holdin... | ... |
| 778013 | 778013 | Latvia | 2013 | % of the agricultural holdi... | ... |
| 778014 | 778014 | France | 2013 | % of the agricultural holdi... | ... |
| 778015 | 778015 | Austria | 2013 | % of the agricultural hold... | ... |
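In the preview, `Unnamed: 0` simply repeats the row index, which suggests the file was saved together with its index. In addition, `df.info()` reports 20 missing entries in `Value` (777,997 of 778,017 non-null). Below is a minimal cleanup sketch on a toy frame with placeholder values. Passing `index_col=0` to `read_csv` would be an alternative to dropping the column afterwards.

```python
import numpy as np
import pandas as pd

# Toy frame reproducing the two quirks: a leftover index column and gaps in "Value"
raw = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "Country": ["Luxembourg", "Turkey", "Ecuador"],
    "Value": [1.0, np.nan, 3.0],   # placeholder values
})

clean = raw.drop(columns="Unnamed: 0")        # the column only duplicates the row index
n_missing = int(clean["Value"].isna().sum())  # rows without a reported value
clean = clean.dropna(subset=["Value"])        # keep only rows that carry a value
print(n_missing, len(clean))  # 1 2
```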




### Sources
Data source description - https://mdgs.un.org/unsd/environment/envpdf/pap_wasess2a3aquastat.pdf

Analysis of water use impact assessment methods - https://link.springer.com/article/10.1007/s11367-014-0814-2

Economic and Sustainability Inequalities and Water Consumption of European Union Countries - https://www.mdpi.com/2073-4441/13/19/2696

Analysis and Modeling of Wastewater Reuse Externalities in African Agriculture - https://www.researchgate.net/profile/Ayotunde-Kolawole/publication/303768223_Analysis_and_Modeling_of_Wastewater_Reuse_Externalities_in_African_Agriculture/links/5755470e08aec74acf579c9e/Analysis-and-Modeling-of-Wastewater-Reuse-Externalities-in-African-Agriculture.pdf


### Optional additional data sources
To enrich the quantity and quality of data, the data set can optionally be joined with additional data tables from the data sources below.

- Death rate from unsafe water sources, 2019
  - https://ourworldindata.org/grapher/death-rates-unsafe-water
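Joining the death-rate table requires a common key. AQUASTAT identifies countries by name (and M49 code), whereas the Our World in Data files are assumed here to provide `Entity`, `Code` and `Year` columns. Matching on country names therefore calls for a check of spelling differences. The sketch uses hypothetical tables, placeholder values and a hypothetical value column `death_rate_unsafe_water`. The 'Viet Nam'/'Vietnam' pair only illustrates such a mismatch.

```python
import pandas as pd

# Hypothetical 2019 tables; the OWID layout (Entity, Code, Year, value column)
# and the column name death_rate_unsafe_water are assumptions
aquastat_2019 = pd.DataFrame({
    "Country": ["Austria", "Turkey", "Viet Nam"],
    "Total population": [1.0, 2.0, 3.0],          # placeholder values
})
owid_deaths = pd.DataFrame({
    "Entity": ["Austria", "Turkey", "Vietnam"],
    "Year": [2019, 2019, 2019],
    "death_rate_unsafe_water": [0.1, 0.2, 0.3],   # placeholder values
})

merged = aquastat_2019.merge(
    owid_deaths.rename(columns={"Entity": "Country"}),
    on="Country", how="left", indicator=True,
)
# "left_only" rows are AQUASTAT countries with no match, often a naming difference
unmatched = merged.loc[merged["_merge"] == "left_only", "Country"].tolist()
print(unmatched)  # ['Viet Nam']
```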



## Labor Productivity

### Data source

The original source of these data is Our World in Data, a website that publishes analyses based on data gathered from published books and scholarly sources. Each key indicator can be explored through data visualizations across countries and over time. All data required for our research were obtained from four of the website's main analyses:
- Working hours
- Income Inequality
- Life Expectancy
- Happiness and Life Satisfaction

Because we chose the research topic first and collected the data afterwards, we searched for analyses whose indicators suited the research purpose.


From each of the links below, we downloaded the relevant CSV files and then integrated and organized the data for analysis with **MySQL** scripts.
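The integration itself was done in MySQL. For readers working in the notebook, an equivalent outer join in pandas might look as follows. This is a sketch, not the authors' scripts. It assumes that each downloaded CSV shares the key columns `Entity`, `Code` and `Year`. The value column names `hours`, `productivity` and `gini` are hypothetical, and the 2017 values for Argentina and Austria are copied from the `df2` preview further down.

```python
from functools import reduce

import pandas as pd

KEYS = ["Entity", "Code", "Year"]  # assumed shared key columns of the OWID CSVs

# Stand-ins for three downloaded files (2017 values from the df2 preview;
# the value column names are hypothetical)
hours = pd.DataFrame({"Entity": ["Argentina", "Austria"], "Code": ["ARG", "AUT"],
                      "Year": [2017, 2017], "hours": [1691.5363, 1613.0519]})
productivity = pd.DataFrame({"Entity": ["Argentina", "Austria"], "Code": ["ARG", "AUT"],
                             "Year": [2017, 2017], "productivity": [30.955460, 64.780170]})
gini = pd.DataFrame({"Entity": ["Austria"], "Code": ["AUT"],
                     "Year": [2017], "gini": [0.297376]})

# Chain of outer joins on the keys, comparable to chained outer joins in SQL;
# countries missing from one table get NaN in that table's column
combined = reduce(lambda left, right: left.merge(right, on=KEYS, how="outer"),
                  [hours, productivity, gini])
print(combined)
```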


#### Data source links
- Most of the dimensions and variables come from these data sources:
  - Annual working hours per worker:
      - https://ourworldindata.org/grapher/annual-working-hours-per-worker
  - Annual working hours vs. GDP per capita:
      - https://ourworldindata.org/grapher/annual-working-hours-vs-gdp-per-capita-pwt?time=2019..latest
  - Productivity: output per hour worked:
      - https://ourworldindata.org/grapher/labor-productivity-per-hour-pennworldtable?country=AUS~BEL~BRA~KHM~CHL~CHN~DEU~IND~ZAF~KOR~TWN~GBR~USA~CHE
  - Life expectancy vs. healthcare expenditure:
      - https://ourworldindata.org/grapher/life-expectancy-vs-healthcare-expenditure
  - Income inequality: Gini coefficient:
      - https://ourworldindata.org/grapher/economic-inequality-gini-index
  - Self-reported life satisfaction:
      - https://ourworldindata.org/grapher/happiness-cantril-ladder

- Extra data source
  - Country and Continent Information:
      - https://www.kaggle.com/datasets/statchaitya/country-to-continent

- Related topics and research:
  - https://edition.cnn.com/2018/07/02/health/south-korea-work-hours/index.html
  - https://d1wqtxts1xzle7.cloudfront.net/48609340/An_Empirical_Analysis_of_the_Determinant20160906-22934-1plhak2-with-cover-page-v2.pdf?Expires=1669919382&Signature=a26QSVkVTIec9nHtLG68tt2p01lsI0fy5PxBgndvGhnN-TPkykCYPXe2DN3t4hSI3OmB7iBQ8S~hOwKij7-2uTgcdqp0qaxH0YP3tWqjS8wyryHyDhj6KQ0OzD6KvBX3j~r9RSj-MCKr4sEebJPPk3zutYFsW6R4KYSs2Mpa0~YAzZdZg678t3z35VbtkTmuzBESsmIyobLUFJzo8UjuBJ~rjrDxK0vjqMBbEtRSeq2LBFI~TmyywrCT5AFEkM-MxuDAGhjNVFabY02VSPj3En4gfzIXfiW3lxv1FeotOF1LYr20oI67q0VbPExUafHbjTAnljmLdHAxe6UJyhjCog__&Key-Pair-Id=APKAJLOHF5GGSLRBV4ZA
  - https://www.econstor.eu/bitstream/10419/161345/1/dp10722.pdf


### Data characteristics

This dataset has 66 rows (one per country) and 11 columns, with a total of 704 non-missing values (observations).
The first four columns hold the spatial and temporal dimensions:
- Continent
- Country
- Country Code
- Year: The original data cover different time spans between 1950 and 2020, but only the 2017 data are used in this analysis, so that each country appears once, in order to satisfy the independence condition.
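The count of 704 observations equals the number of non-missing cells reported by `df2.info()`. The seven fully populated columns give 7 × 66 = 462, and the columns with gaps add 48 + 65 + 65 + 64 = 242, for a total of 704 rather than 66 × 11 = 726. The sketch below, on a hypothetical toy panel with placeholder values, shows both steps: keeping only one year so that each country appears once, and counting non-missing cells. On the real data, `df2.notna().sum().sum()` should return 704.

```python
import numpy as np
import pandas as pd

# Hypothetical toy panel with several years per country (placeholder values)
panel = pd.DataFrame({
    "Country": ["A", "A", "B", "B", "C"],
    "Year": [2016, 2017, 2016, 2017, 2017],
    "gini_coefficient": [0.30, 0.31, np.nan, np.nan, 0.40],
    "Productivity": [60.0, 61.0, 30.0, 31.0, 40.0],
})

# Keep a single year so that every country appears exactly once
cross_section = panel[panel["Year"] == 2017]
assert cross_section["Country"].is_unique

# Total cells versus non-missing cells ("observations")
n_cells = cross_section.size                       # rows x columns
n_observed = int(cross_section.notna().sum().sum())
print(n_cells, n_observed)  # 12 11
```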

The remaining seven columns are numeric. Based on the research questions described below, one of them can be defined as the dependent variable and the other six as candidate independent variables:
- Reasonable dependent variable: Labor productivity
- Independent variables
  - Average annual working hours per worker
  - GDP per capita
  - Population
  - Gini coefficient
  - Life satisfaction
  - Current health expenditure per capita


### Research question

- Do longer average annual working hours increase labor productivity?
- Which factor is most strongly related to labor productivity?
  - Annual working hours per worker
  - GDP per capita of the country
  - Gini coefficient of the country
  - Average life satisfaction in the country
  - Current health expenditure per capita
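A descriptive first pass at the second question is to rank the candidate factors by the absolute value of their correlation with productivity. The sketch below uses only the nine rows printed in the `df2` preview, with shortened, hypothetical column names. Its numbers are therefore illustrative rather than results; on the full data one would use `df2` itself. Predictors such as GDP per capita and health expenditure may be correlated with each other, so a multiple regression would be a natural next step to separate their effects.

```python
import numpy as np
import pandas as pd

# The nine rows shown in the df2 preview (2017); NaN where the Gini value is missing
sample = pd.DataFrame({
    "hours": [1691.5363, 1731.4943, 1613.0519, 2232.3542, 1544.2690,
              1832.0000, 1670.2728, 1757.2255, 1552.3470],
    "gdp_pc": [23272.18, 52536.19, 51954.28, 4112.70, 45150.20,
               26611.18, 44093.93, 60116.57, 20615.56],
    "gini": [np.nan, np.nan, 0.297376, np.nan, 0.273880,
             0.414057, 0.351488, 0.411806, 0.394645],
    "life_sat": [6.085561, 7.233995, 7.195361, 5.114217, 6.772138,
                 4.872074, 7.157151, 6.943701, 6.600337],
    "health_pc": [2470.11, 4715.83, 5641.18, 101.18, 5450.48,
                  1166.73, 4515.59, 10103.09, 2029.89],
    "productivity": [30.955460, 60.341679, 64.780170, 4.307637, 68.516521,
                     41.799203, 54.676921, 71.638571, 28.295846],
}, index=["ARG", "AUS", "AUT", "BGD", "BEL", "TUR", "GBR", "USA", "URY"])

# Pearson correlations with productivity (pandas skips missing values pair by pair)
corr = sample.corr()["productivity"].drop("productivity")
# Rank the factors by strength of association, ignoring the sign
ranking = corr.abs().sort_values(ascending=False)
print(ranking)
```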


### Overview of data



```python
import numpy as np
import pandas as pd
import io
```

```python
from google.colab import files

# Colab only: opens a file picker and returns {file name: file content as bytes}
uploaded = files.upload()
```

```
Saving Labor_Productivity_Analysis_final.csv to Labor_Productivity_Analysis_final.csv
```


```python
# Read the uploaded bytes: ';' separates fields and ',' is the decimal mark
df2 = pd.read_csv(io.BytesIO(uploaded['Labor_Productivity_Analysis_final.csv']), sep=';', decimal=',')
```

```python
df2.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 66 entries, 0 to 65
Data columns (total 11 columns):
 #   Column                                   Non-Null Count  Dtype  
---  ------                                   --------------  -----  
 0   Continent                                66 non-null     object 
 1   Country                                  66 non-null     object 
 2   Code                                     66 non-null     object 
 3   Year                                     66 non-null     int64  
 4   Average annual working hours per worker  66 non-null     float64
 5   GDP per capita                           66 non-null     float64
 6   Population                               66 non-null     int64  
 7   gini_coefficient                         48 non-null     float64
 8   Life satisfaction                        65 non-null     float64
 9   Productivity                             65 non-null     float64
 10  Current health expenditure per capita    64 non-null
```
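`df2.info()` shows gaps: `gini_coefficient` is missing for 18 of the 66 countries, and a few values are missing for life satisfaction, productivity and health expenditure. Most models drop incomplete rows, so it is worth checking how many complete rows each choice of variables leaves. Here is a toy sketch with placeholder values and a subset of the `df2` column names:

```python
import numpy as np
import pandas as pd

# Toy frame with gaps in the same columns as df2 (placeholder values)
toy = pd.DataFrame({
    "gini_coefficient": [0.30, np.nan, 0.27, np.nan],
    "Life satisfaction": [7.2, 6.1, np.nan, 5.1],
    "Productivity": [64.8, 31.0, 68.5, 4.3],
})

with_gini = toy.dropna()  # listwise deletion over all columns
without_gini = toy.dropna(subset=["Life satisfaction", "Productivity"])
print(len(with_gini), len(without_gini))  # 1 3
```

In this toy example, including the Gini coefficient shrinks the usable sample from three rows to one. In the real data, the Gini column alone caps the sample at 48 countries.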

```python
df2
```

| | Continent | Country | Code | Year | Average annual working hours per worker | GDP per capita | Population | gini_coefficient | Life satisfaction | Productivity | Current health expenditure per capita |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Americas | Argentina | ARG | 2017 | 1691.5363 | 23272.18 | 44054616 | NaN | 6.085561 | 30.955460 | 2470.11 |
| 1 | Oceania | Australia | AUS | 2017 | 1731.4943 | 52536.19 | 24590336 | NaN | 7.233995 | 60.341679 | 4715.83 |
| 2 | Europe | Austria | AUT | 2017 | 1613.0519 | 51954.28 | 8797497 | 0.297376 | 7.195361 | 64.780170 | 5641.18 |
| 3 | Asia | Bangladesh | BGD | 2017 | 2232.3542 | 4112.70 | 161793968 | NaN | 5.114217 | 4.307637 | 101.18 |
| 4 | Europe | Belgium | BEL | 2017 | 1544.2690 | 45150.20 | 11384491 | 0.273880 | 6.772138 | 68.516521 | 5450.48 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 61 | Asia | Turkey | TUR | 2017 | 1832.0000 | 26611.18 | 82089824 | 0.414057 | 4.872074 | 41.799203 | 1166.73 |
| 62 | Europe | United Kingdom | GBR | 2017 | 1670.2728 | 44093.93 | 66064808 | 0.351488 | 7.157151 | 54.676921 | 4515.59 |
| 63 | Americas | United States | USA | 2017 | 1757.2255 | 60116.57 | 329791232 | 0.411806 | 6.943701 | 71.638571 | 10103.09 |
| 64 | Americas | Uruguay | URY | 2017 | 1552.3470 | 20615.56 | 3422205 | 0.394645 | 6.600337 | 28.295846 | 2029.89 |
