### Introduction

The industrial revolution was exactly that: a revolution. It was a time in which our way of living changed drastically through the automated bulk production of many goods, the advancement of science and medicine, and the resulting rise in quality of life for many people all over the world. The rise of industry brought a great many benefits and propelled the world into a global society connecting nearly everyone on Earth.

Yet, as we know today, the effects of industry are not all positive. In recent years more and more concerns have been raised about global warming, which has been on the rise ever since the conception of modern-day industry. Apart from the explosive increase of greenhouse gases such as CO2 and methane in the atmosphere caused by human activities such as industry, there are also plenty of other pollutants that the environment has to deal with. By contaminating the ground and water, pollutants threaten to destroy entire ecosystems and cause many species of plants and animals to go extinct.

People are waking up more and more to the harsh reality of the irreversible, life-threatening effects of human industry. Yet there are still plenty of people on Earth who turn a blind eye to these facts and claim that industry is something we must cherish as is. These two groups often clash in the real world, but what are the facts? What does the data say about all this? What truly are the effects of industry on our world? By looking at data on water quality, industry, infrastructure, innovation, and nature, it might be possible to obtain an answer to these questions.


### Dataset and Preprocessing

To conduct our research it was necessary to acquire several datasets and find connections between them. Five separate datasets were selected: a dataset on water quality, a dataset on industry, a dataset on infrastructure, a dataset on innovation, and lastly a dataset on nature. Every dataset contains a column with three-letter country tags, which was used to join the datasets together during the research process. Therefore this column was renamed to 'ISO3' in every dataset. The datasets were also each pivoted so that they share the same column pattern and order. With this country tag column as the join key, all of the data has been ordered by year. The following is a short rundown of each dataset with a link to its source webpage.
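The rename, pivot, and join steps described above can be sketched roughly as follows. This is a minimal illustration on toy data, not the exact code used for this report: the raw column names `LOCATION`, `Indicator`, and `Value`, the helper `to_wide`, and the choice to join on both `ISO3` and `Year` are assumptions made for the example.

```python
import pandas as pd

# Toy long-format table standing in for a raw download.
# 'LOCATION', 'Indicator' and 'Value' are hypothetical column names.
raw = pd.DataFrame({
    "LOCATION": ["AUS", "AUS", "AUS", "AUS"],
    "Country": ["Australia"] * 4,
    "Year": [2001, 2001, 2000, 2000],
    "Indicator": ["A", "B", "A", "B"],
    "Value": [3.0, 4.0, 1.0, 2.0],
})


def to_wide(df, code_col):
    """Rename the country-tag column to 'ISO3' and pivot to one row per country and year."""
    df = df.rename(columns={code_col: "ISO3"})
    wide = df.pivot_table(
        index=["ISO3", "Country", "Year"],
        columns="Indicator",
        values="Value",
        aggfunc="first",
    ).reset_index()
    wide.columns.name = None  # drop the leftover 'Indicator' axis label
    # Order the rows by country tag and year.
    return wide.sort_values(["ISO3", "Year"]).reset_index(drop=True)


wide_a = to_wide(raw, "LOCATION")

# A second, already wide, toy dataset with the same key columns.
other = pd.DataFrame({"ISO3": ["AUS", "AUS"], "Year": [2000, 2001], "C": [10.0, 20.0]})

# Join on the country tag; adding 'Year' to the key is an assumption
# that keeps one row per country and year.
merged = wide_a.merge(other, on=["ISO3", "Year"], how="outer")
print(merged)
```

An outer join keeps country-year rows that appear in only one of the datasets; an inner join would keep only the overlap.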

#### Dataset 1: water_set_done.csv

Original link: https://washdata.org/data/country/WLD/household/download

Short description: This dataset contains 46 different attributes and 4914 different entries. These attributes include things such as total population, rural surface water percentage, and to what extent water quality and access have improved in different regions for almost every major country on Earth from the year 2000 up until the year 2020. The data is split into three main categories:
1. National (Proportion)
2. Urban (Proportion)
3. Rural (Proportion)

Preprocessing: During preprocessing the attribute names were changed to make them more readable in pandas. The categories National/Urban/Rural were merged into the attribute names, so the subcategory Surface water in category Rural became the attribute 'RURAL-Surface water'. Missing data was filled in per country with the pandas methods ffill() and bfill(). The countries were separated for this step so that their data would not interfere with one another. Strings like '>99' were also changed to the float 99.0, which is now the maximum value for these attributes.
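The cleaning steps above can be sketched on a toy frame as shown below. This is a hedged illustration rather than the original preprocessing script: the sample values are made up, and stripping the '>' sign followed by `pd.to_numeric` is just one possible way to turn '>99' into 99.0.

```python
import pandas as pd

# Made-up sample with gaps and a '>99' string, for illustration only.
df = pd.DataFrame({
    "ISO3": ["AFG", "AFG", "AFG", "ALB", "ALB", "ALB"],
    "Year": [2000, 2001, 2002, 2000, 2001, 2002],
    "Surface water": [None, "24.9", None, ">99", None, "98.5"],
})

# Merge the category into the attribute name, e.g. 'RURAL-Surface water'.
col = "RURAL-Surface water"
df = df.rename(columns={"Surface water": col})

# Turn strings such as '>99' into the float 99.0; missing entries stay NaN.
df[col] = pd.to_numeric(df[col].str.replace(">", "", regex=False), errors="coerce")

# Fill gaps within each country only, so values never leak across countries:
# forward fill first, then backward fill for gaps at the start of a series.
df = df.sort_values(["ISO3", "Year"]).reset_index(drop=True)
df[col] = df.groupby("ISO3")[col].transform(lambda s: s.ffill().bfill())

print(df)
```

Grouping by `ISO3` before filling is what keeps the last value of one country from being carried into the first years of the next.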

In [48]:
import pandas as pd

pd.read_csv('water_set_done.csv').head(n=3)

Unnamed: 0,Country,ISO3,Year,Population (thousands),% urban,NATIONAL-Basic,NATIONAL-Limited,NATIONAL-Unimproved,NATIONAL-Surface water,NATIONAL-Annual rate change basic,...,URBAN-Proportion-Available when needed,URBAN-Proportion-Free from contamination,URBAN-Proportion-Annual rate of change in safely managed,URBAN-Proportion-Piped,URBAN-Proportion-Non-piped,Sl,SDG region,WHO region,UNICEF Programming region,UNICEF Reporting region
0,Afghanistan,AFG,2000,20779.95703125,22.0779991149902,28.1714150105238,3.6606382504644,43.1783058382678,24.9896409007439,2.34599995613098,...,-,20.5701546958978,0.790562748908997,17.4084639784946,39.1318259550121,1,Central and Southern Asia,Eastern Mediterranean,South Asia,South Asia
1,Afghanistan,AFG,2001,21606.9921875,22.1690006256104,28.199366109364,3.66154200011045,43.1675415501036,24.9715503404219,2.34599995613098,...,-,20.5701546958978,0.790562748908997,17.4084639784946,39.1318259550121,2,Central and Southern Asia,Eastern Mediterranean,South Asia,South Asia
2,Afghanistan,AFG,2002,22600.7734375,22.2609996795654,30.2363845908931,3.94947200297299,41.689619677787,24.1245237283469,2.34599995613098,...,-,21.416514182163,0.790562748908997,18.6953574596773,40.1712839985921,3,Central and Southern Asia,Eastern Mediterranean,South Asia,South Asia


#### Dataset 2: iso_infra.csv

Original link: https://stats.oecd.org/Index.aspx?QueryId=73638

Short description: This dataset contains 23 attributes and 1340 entries which provide information on nationwide infrastructure for a multitude of major countries from the year 2000 up until the year 2020. These attributes include things such as the number of airports per one million inhabitants, the density of the road network, and inland waterway infrastructure spending.

In [49]:
pd.read_csv('iso_infra.csv').head(n=3)

Unnamed: 0.1,Unnamed: 0,ISO3,Country,Year,Airports per one hundred thousand sq. km,Airports per one million inhabitants,Density of rail lines (km per one hundred sq. km),Density of road (km per one hundred sq. km),Inland waterways infrastructure investment in constant USD per inhabitant,Inland waterways infrastructure investment per GDP,...,Share of electrified rail lines in total rail network,Share of high-speed rail lines in total rail network,Share of inland waterways infrastructure investment in total inland transport infrastructure investment,Share of motorways in total road network,Share of rail infrastructure investment in total inland transport infrastructure investment,Share of road infrastructure investment in total inland transport infrastructure investment,Share of road infrastructure maintenance in total road infrastructure spending,Share of urban roads in total road network,Total inland transport infrastructure investment in constant USD per inhabitant,Total inland transport infrastructure investment per GDP
0,0,ALB,Albania,2000,,,1.605839,,0.007421,0.000599,...,,,0.020489,,1.85767,98.121841,3.421619,,36.218024,2.921394
1,1,ALB,Albania,2001,,,1.631387,,0.009391,0.00071,...,,,0.028565,,1.285439,98.685996,5.037108,,32.875656,2.485229
2,2,ALB,Albania,2002,,,1.631387,,0.015076,0.000819,...,,,0.053908,,7.989218,91.956873,9.573791,,27.965404,1.519261


#### Dataset 3: iso_green.csv

Original link: https://stats.oecd.org/Index.aspx?DataSetCode=GREEN_GROWTH

Short description:  This dataset contains 165 different attributes and 5366 total entries detailing different points of information from a grand multitude of different countries on the topic of green growth. Green growth is a concept which is measured in a number of ways such as the CO2 productivity, energy productivity, and freshwater and forest resources. These points of data provide insight into the progress countries have been making towards a greener and more environmentally friendly way of living.

In [50]:
pd.read_csv('iso_green.csv').head(n=3)

Unnamed: 0.1,Unnamed: 0,ISO3,Country,Year,Adjustment for pollution abatement,"Artificial surfaces, % total","Bare land, % total","Biomass, % of DMC",Built up area per capita,"Built up area, % total land",...,"Value added in industry, % of total value added","Value added in services, % of total value added","Water stress, total freshwater abstraction as % total available renewable resources","Water stress, total freshwater abstraction as % total internal renewable resources","Water, % total","Welfare costs of premature deaths from exposure to ambient ozone, GDP equivalent","Welfare costs of premature deaths from exposure to lead, GDP equivalent","Welfare costs of premature mortalities from exposure to ambient PM2.5, GDP equivalent","Welfare costs of premature mortalities from exposure to residential radon, GDP equivalent","Women, % total population"
0,0,ABW,Aruba,2000,,,,4.34,561.71,28.44,...,14.28,85.7,,,,,,,,51.93
1,1,ABW,Aruba,2001,,,,4.23,,,...,14.13,85.85,,,,,,,,52.02
2,2,ABW,Aruba,2002,,,,4.1,,,...,14.29,85.69,,,,,,,,52.11


#### Dataset 4: iso_inno.csv

Original link: https://stats.oecd.org/Index.aspx?DataSetCode=REGION_INNOVATION

Short description: This dataset has 82 different attributes and 1005 total entries which provide information on the topic of regional innovation for most countries on Earth. These attributes include things such as different levels of student enrolment, the spread of the workforce across the different sectors, and progress and innovation in fields such as nanotech and medicine. This dataset could provide insight into how well developed countries are and how they have progressed throughout the years.

In [51]:
pd.read_csv('iso_inno.csv').head(n=3)

Unnamed: 0.1,Unnamed: 0,ISO3,Region,Year,Labour Force with Elementary Education (ISCED 0-2),Labour Force with Secondary education (ISCED 3-4),Labour Force with Tertiary education (ISCED 5-8),Labour Force with Unknown Educational Level,R&D Employment in Full-Time Equivalent by the Business Sector,R&D Employment in Full-Time Equivalent by the Government Sector,...,Share of Labour Force with Tertiary Education (in % of labour force),Share of R&D Female in R&D Total Personnel,Share of R&D Total Expenditure (in % of GDP),Share of employment in high-technology manufacturing (in % of total employment),Share of employment in knowledge-intensive services (as a share of total employment),Share of population 25 to 64 year-olds with tertiary education,"Share of population 25 to 64 year-olds, below upper secondary education","Share of population 25 to 64 year-olds, with upper secondary and post-secondary non-tertiary education",Student Enrolment Total,"Total employment, all activities"
0,0,AUS,Australia,2000,,,,,,,...,,,1.47,,,,,,,
1,1,AUS,Australia,2001,5599300.0,1734900.0,2750300.0,,,,...,27.3,,,,,,,,,
2,2,AUS,Australia,2002,4845200.0,2352000.0,3167100.0,,,,...,30.6,,1.63,,,,,,,


#### Dataset 5: iso_mei.csv

Original link: https://stats.oecd.org/Index.aspx?DataSetCode=MEI_REAL

Short description: This dataset contains 14 attributes and 987 different entries on the production and sales of goods for a multitude of countries from the year 2000 up until the year 2020. These attributes include things such as the production of different kinds of goods, such as intermediate goods and investment goods, and the production of energy and electricity. With this dataset it is possible to gain insight into what a country produces, which helps to determine its place on the world stage.

In [43]:
pd.read_csv('iso_mei.csv').head(n=3)

Unnamed: 0.1,Unnamed: 0,ISO3,Country,Time,"Passenger car registrations sa, Index","Permits issued for dwellings sa, Index","Production in total manufacturing sa, Index","Production of electricity, gas, steam and air conditioning supply sa, index","Production of total construction sa, Index","Production of total energy sa, Index","Production of total industry sa, Index","Production of total manufactured intermediate goods sa, Index","Production of total manufactured investment goods sa, Index","Total retail trade (Volume) sa, Index","Work started for dwellings sa, Index"
0,0,AUS,Australia,2000,107.3669,57.15109,98.49372,,47.84125,,70.86642,,,57.69667,62.9238
1,1,AUS,Australia,2001,102.6701,61.87128,98.55775,,45.79744,,71.85328,,,60.15889,60.99082
2,2,AUS,Australia,2002,104.762,74.1684,103.1726,,55.03091,,73.78157,,,63.93376,76.70287
