In [None]:
import dataUtls  # ancillary utility module: source in repository

In [None]:
import importlib

importlib.reload(dataUtls)

# Exploring the CIA's World Factbook
***

### OBJECTIVE
This report is a broad exploration of World Factbook data from the US Central 
Intelligence Agency. It seeks general point-of-interest insights on key 
datapoints in areas including health, gender, economy and environment, both 
directly from the sample and from newly derived features, presented through 
feature-maxima bar plots ("top tens") and correlation scatter plots. 

The report implements data processing techniques including expression matching, 
correlation analysis, and z-score and probability density analysis.  

**Some example datapoints...**
- Where in the world do women contribute the most to smoking rates?
- Where does agriculture get the most value per worker?
- Who bucks the HIV/AIDS trend the most in the Southern Africa epicentre?
- Which countries have both the most obese adults and the most underweight 
children? 

***
### DATA: acquisition, cleaning and feature engineering

In [None]:
nData = dataUtls.nbData()
dUtls = dataUtls.dUtls( nData )

# authenticate, download and load kaggle dataset
dUtls.getKaggleSet( 'lucafrance', 'the-world-factbook-by-cia' )

# display data type-count
dUtls.typeCount( nData.OR )

#### *STATE OF INFORMATION*

- *[The World Factbook by CIA](https://www.kaggle.com/datasets/lucafrance/the-world-factbook-by-cia)
 (owner: Luca Franceschini, available at Kaggle.com)*

The objective concerns only the features which have a primary value that can be
included in numeric operations. The dataset is documented to have been acquired
by browser extraction, and contains a substantial amount of noise or extraneous 
information, in inconsistent format. Pattern matching is further complicated by 
the presence of historical data in the same cell, and variations of scale within 
many features.
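
As a rough illustration of the problem described above (the real cleaning code lives in the ancillary module), the sketch below pulls the first number out of a noisy cell and applies a scale word if one directly follows it. The helper name `first_number`, the pattern, the scale table and the sample strings are all assumptions made for this sketch, not taken from `dataUtls`.

```python
import re

# Hypothetical pattern: optional sign, digits with optional thousands commas,
# optional decimal part. Not the pattern used in dataUtls.
NUM_PATTERN = re.compile(r"[-+]?\d[\d,]*(?:\.\d+)?")

# Hypothetical scale words that may follow the number in a cell.
SCALES = {"thousand": 1e3, "million": 1e6, "billion": 1e9, "trillion": 1e12}


def first_number(cell):
    """Return the first number in a text cell as a float, or None."""
    match = NUM_PATTERN.search(cell)
    if match is None:
        return None
    value = float(match.group().replace(",", ""))
    # Apply a scale word only if it directly follows the matched number.
    rest = cell[match.end():].lstrip().lower()
    for word, factor in SCALES.items():
        if rest.startswith(word):
            return value * factor
    return value


# Later (historical) values in the same cell are ignored.
print(first_number("$1.2 trillion (2019 est.); $1.1 trillion (2018 est.)"))
print(first_number("9,596,960 sq km"))
```

Taking only the first match is what keeps historical values in the same cell out of the result; a cell with no digits at all yields `None`.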

Processing extracts the primary numeric value as a float, and omits features
whose share of null values exceeds a minimal threshold (initially, the dataset 
is 74% null). For clarity, source code for cleaning operations is kept in the 
ancillary module. Substring segment frequencies are analysed to derive each 
feature's unit.
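
One way such a null threshold could work is sketched below: keep only the features whose non-null share is at least the average share plus half a standard deviation. This is an assumption about the rule, read from the "av. dense + .5 sDev" note in the cleaning cell; the function name `dense_features` and the toy table are hypothetical, and the module's own rounding is not reproduced.

```python
from statistics import mean, pstdev


def dense_features(table, extra_sd=0.5):
    """Keep columns whose non-null share is >= mean + extra_sd * std. dev."""
    # Share of non-null entries per column.
    density = {name: sum(v is not None for v in col) / len(col)
               for name, col in table.items()}
    shares = list(density.values())
    cutoff = mean(shares) + extra_sd * pstdev(shares)
    return [name for name, share in density.items() if share >= cutoff]


# Toy table: column name -> values, with None marking a null.
toy = {
    "a": [1, 2, 3, 4],
    "b": [1, None, None, None],
    "c": [None, None, None, None],
    "d": [1, 2, 3, None],
}
print(dense_features(toy))
```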

In [None]:
# parse distinct numbers via regular expression to dictionary
dUtls.generateMatchDct()

# isolate clean float from number-match records
dUtls.isolateClean()

# generate dataframe from clean feature data
dUtls.getCleanDF()

# enforce non-nan threshold ( av. dense + .5 sDev rounded ), convert numeric
dUtls.getNumericNonNan()

# review cleaned dataframe
dUtls.cleanReport( [ nData.OR, nData.CL ] )

# Manual analysis: has valid primary number value, scale variations
#   dataUtls.runScaleAnalysis(dfColsClean, cleanReman)
#   Returning from PKL:
dropFeats, scaleNotes = dUtls.unPklData( 'dropFeatrs', 'cleanNotes', dct=False )

# apply drop to flagged features
dUtls.runDrops( dropFeats )

# adjust scale-variant values to unify scale
dUtls.flattenScale( scaleNotes, dropFeats )

# drop summative observation
dUtls.popRowsByFtVal( 'Country', [ 'World' ] )

# numericise df (inc. ctry to code), get country-code converter dct
dUtls.numercisedDF()

# collect candidate units from substring frequency in feature data
#   return stored collection from PKL
#   dataUtls.generateUnitDct()
unitDat = dUtls.unPklData( s := 'unitDct_220918_144049038287.pkl' )[ s ]

dUtls.cleanUnits( unitDat )

#### *OUTLIER ANALYSIS*
Rather than clipping or smoothing outliers, as is common in machine learning 
workflows, this report is especially interested in retaining and presenting 
genuine outliers (the sample maxima and minima for various features). Here, 
outlier analysis is used to check that outliers are not obviously the result 
of errors in the data.

For example, here we confirm that the highly differentiated probability density
of the 'Geography: Area - total' feature (strongly skewed toward the minimum, 
with many larger countries of singular size creating a long, very flattened 
upper tail) corresponds to observations in the original data.
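
The z-score check can be sketched in a few lines of plain Python. This is a minimal illustration, assuming a population standard deviation; the function names and the toy values are hypothetical, and the module's `getZThreshDF` may differ in detail.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values: (x - mean) / population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumes the values are not all identical
    return [(x - mu) / sigma for x in values]


def above_z(named_values, threshold):
    """Return (name, z) pairs whose z-score exceeds the threshold."""
    names = list(named_values)
    zs = z_scores([named_values[n] for n in names])
    return [(n, z) for n, z in zip(names, zs) if z > threshold]


# Toy data: nine small observations and one very large one.
toy = {f"small_{i}": 1.0 for i in range(9)}
toy["big"] = 100.0
print(above_z(toy, 2.5))
```

Observations flagged this way are then compared against the original records, so that a genuine extreme is kept and an extraction error is not.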

In [None]:
# identify outliers with z-score standardization
pDense = dUtls.showPDens  # display probability distrib. with fit
zThresh = dUtls.getZThreshDF  # For showing observations above z-score

# visual Gaussian-fit check 
pDense( 'Geography: Area - total' )

# checking with z-score 
zThresh( 'Geography: Area - total', 2.5 )

By contrast, though still with a minimum-skewed distribution: in **Military and 
Security: Military expenditures**, North Korea's dramatic, isolated prominence 
at the "most militarized" end turns out to be the result of an extraction 
error. It does still remain the world leader for the feature, at between 20% 
and 25% of GDP.

In [None]:
ft = 'Military and Security: Military expenditures'
pDense( ft )
zDF = zThresh( ft, 0.228, ret=True )
print( f"\"{[ b for a, b in zDF.orVal.items() ][ 0 ]}\"" )

***
### Exploration, Analysis and Visualisation
Reporting is broadly categorized into discussions of correlations and maxima in:
1. Age, health and gender
2. Coal, Energy and Pollution
3. Economy
4. Geography and Environment

In [None]:
# provide local access to required module data tools
getCorrs_T = dUtls.getCTDct  # get dict for correlations at threshold
showMax = dUtls.showMaxima  # barplot for <=10 feature-max/min countries
pltSctr = dUtls.plotScttr  # scatterplot distribution for feature pair
getRank = dUtls.getRank  # return country's rank for value for feature
getVal = dUtls.getVal  # return val for country for feature
reportDifs = dUtls.reportDiffs

#### *FEATURE MAXIMA*
The highest and lowest values for features compiled by the CIA's World Factbook 
provide a convenient global view on important aspects of humanity and the 
environment. Some caution is necessary before interpreting these summaries, 
because accuracy suffers from the conflation of time periods (the CIA resource 
reports each value for its latest reporting year, which can vary by a decade 
or more).
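
As a minimal sketch of how a "top ten" can be selected (this is not the module's `showMaxima`, which also draws the bar plot), the helper below ignores missing values and returns the highest or lowest observations. The name `feature_extremes` and the toy data are assumptions for illustration only.

```python
import heapq


def feature_extremes(values_by_country, n=10, lowest=False):
    """Return up to n (country, value) pairs with the highest or lowest values."""
    # Drop countries with no observation for this feature.
    present = [(c, v) for c, v in values_by_country.items() if v is not None]
    pick = heapq.nsmallest if lowest else heapq.nlargest
    return pick(n, present, key=lambda pair: pair[1])


toy = {"A": 3.0, "B": None, "C": 7.5, "D": 1.0}
print(feature_extremes(toy, n=2))               # highest two
print(feature_extremes(toy, n=2, lowest=True))  # lowest two
```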

#### *CORRELATIONS AND SCATTERPLOTS*

In [None]:
# generate correlation significance between all features 
dUtls.getCorDct()  # generate feature master correl. dict

**Perfect correlations** (a coefficient of -1 or 1) are rarely informative, in 
that they identify effectively identical, or duplicate, features. For this 
reason, they can be useful for reducing unnecessary dimensionality in large 
datasets.
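
To make the idea concrete, the sketch below computes Pearson's coefficient by hand and lists feature pairs whose absolute coefficient is effectively 1. The function names, the tolerance and the toy features are assumptions for illustration; the module's `dropDupCorrs` may identify duplicates differently.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)  # assumes neither feature is constant


def perfect_pairs(features, tol=1e-9):
    """Return feature-name pairs whose |r| is within tol of 1."""
    return [(a, b) for a, b in combinations(features, 2)
            if abs(abs(pearson(features[a], features[b])) - 1.0) <= tol]


# Toy features: the second is the first in different units (a duplicate).
toy = {
    "area_km2": [1.0, 2.0, 3.0, 4.0],
    "area_sq_mi": [0.386, 0.772, 1.158, 1.544],
    "population": [5.0, 1.0, 4.0, 2.0],
}
print(perfect_pairs(toy))
```

One feature from each reported pair can then be dropped without losing information.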

In [None]:
# drop perfect correlations (duplicates)
dUtls.dropDupCorrs()

**Strong correlations** will tend to be more self-evident (for example, "Total 
area" being near-perfectly correlated to "Total land", with some noise caused 
by variable water-area), but still provide an empirical, observational basis 
for testing assumptions. 

**Scatter plots** are useful for identifying or demonstrating where a clear 
pattern, like a linear correlation, is present, and also for an alternative 
view on observations made with different techniques.

**Some near-perfect correlations** (including some duplicate features where
the data is differentiated only by null values):

In [None]:
importlib.reload( dataUtls )
dUtls = dataUtls.dUtls( nData )

# Examine near-perfect threshold (likely still duplicates or self-evident)
dUtls.reportDiffs( getCorrs_T( 0.998 ) )

<hr style="height:2px;background-color:gray">

**Strong negative correlations** (for features X and Y, as X approaches maximum, 
Y clearly tends towards minimum):

In the Factbook data, at least one of each pair of features that has a negative 
correlation beyond a minimal -0.5 threshold is either **Birth Rate** or **Rate 
of Urbanization.** Excluding these two features, the strongest negative 
correlation has a coefficient of -0.45.

In [None]:
importlib.reload(dataUtls)
dUtls = dataUtls.dUtls(nData)

dUtls.reportStrongNeg(-0.5)

**Including** these two features, many correlations are fairly **self-evident**. 
For example, the strongest, at **-0.81**:

In [None]:
pltSctr( [
    'Energy: Electricity access - electrification - total population',
    'People and Society: Birth rate' ] )

It stands to reason that, with a higher birth rate, the percentage of the 
population (including babies) that does not have access to an existing 
electrical connection should rise proportionally, particularly where such 
connections are prohibitively expensive for the average income.

One that takes a little more unpacking is a correlation of -0.73 between 
**Literacy (total population)** and **Rate of urbanization**.

In [None]:
pltSctr( [
    'People and Society: Literacy - total population',
    'People and Society: Urbanization - rate of urbanization' ] )


According to this correlation, the higher the rate of people moving to live in
urban areas, the lower the average literacy of the population. A plausible 
explanation might be that the rate of urbanization is an indicator of, or even 
the same as, a rate of industrialization (the industrial centralization of 
societies); in other words, that literacy is less prevalent where sudden, large 
increases in industrial development are possible. 

***
#### *AGE, HEALTH AND GENDER*

- Neat curvilinear distribution between **percent pop. over 64** and
**percent population under 15** showing a clear negative relationship.
- Likewise, in a positive trend, the strong association of **higher birthrate**
with a **bigger proportion of 15-24yo's** rapidly weakens after the birthrate 
hits around 20 (births per 1,000 population); after this point, that segment is 
relatively stable for higher birthrates.

In [None]:
pltSctr( [
    'People and Society: Age structure - 65 years and over',
    'People and Society: Age structure - 0-14 years' ] )

pltSctr( [
    'People and Society: Birth rate',
    'People and Society: Age structure - 15-24 years' ] )

<hr style="height:2px;background-color:gray">

- Gulf countries occupy the end of the **sex ratio** distribution that is 
weighted toward men - startlingly so for Qatar & the United Arab Emirates.
Note we can see strong support for 
[Fisher's principle](https://en.wikipedia.org/wiki/Fisher%27s_principle): sex 
ratio is leptokurtic, closely gathered around the one-to-one ratio.
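
For readers unfamiliar with the term, "leptokurtic" means positive excess kurtosis: most observations sit close to the centre, with a few far out in the tails. A minimal sketch of the calculation follows, using the population formula; the function name and the toy values are hypothetical and are not Factbook data.

```python
def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3 (0 for a normal curve)."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((x - mu) ** 2 for x in values) / n  # variance
    m4 = sum((x - mu) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3.0


# Tightly gathered around the centre with two extremes: leptokurtic (> 0).
peaked = [-3, 0, 0, 0, 0, 0, 0, 0, 3]
# Evenly split between two values, no peak: platykurtic (< 0).
flat = [-1, -1, 1, 1]
print(excess_kurtosis(peaked), excess_kurtosis(flat))
```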

In [None]:
zThresh( ft := 'People and Society: Sex ratio - total population', 0.8 )
pDense( ft )

- At the same time, both have (by a good margin) the **highest proportion of
total population that is 25-54 years old**. 

So, in other words, Qatar and the UAE have a particularly large and masculine 
working-age population.

In [None]:
showMax( 'People and Society: Age structure - 25-54 years' )

- In Palau, North Korea, and particularly many Eastern European countries, 
there are **more than twice as many old women as old men**. In other terms, men 
are dying considerably earlier than women in these countries.

In [None]:
showMax( 'People and Society: Sex ratio - 65 years and over', asc=True )

<hr style="height:2px;background-color:gray">

- Some surprise might arise from the data on **Health Expenditure** - despite 
the highest costs for *individuals* in the OECD (see 
[here](https://en.wikipedia.org/wiki/Health_care_prices_in_the_United_States) 
for example), the US Government is spending more on health than the rest of the 
world, Tuvalu excepted. For Tuvalu and the other island states in this top ten, 
expenditure proportions might be considered less significant, given the 
susceptibility of comparably small budgets to weighting.

In [None]:
showMax( ft := 'People and Society: Current Health Expenditure' )
zThresh( ft, 2.5 )

<hr style="height:2px;background-color:gray">

- Highly interesting to see Monaco leading the top ten for **Physician 
density** while at the same time trailing at the very bottom for **Health 
expenditure**. Citizens are privately funding most of a large, expert-intensive 
healthcare system. 
- There are, surprisingly, **zero nations from the Anglosphere** in the top ten 
for physician density. More data would be needed to probe a follow-up question: 
with fewer physicians per individual, is there a measurable failure of 
preventative care that the access and familiarity of community-embedded 
physicians might otherwise have provided?

In [None]:
showMax( 'People and Society: Physicians density', short=True )
showMax( 'People and Society: Current Health Expenditure', asc=True, 
    short=True, unit="% GDP" )

<hr style="height:2px;background-color:gray">

- Southern African nations exclusively form the top ten for **percent of 
population living with HIV/AIDS**

In [None]:
dataUtls = importlib.reload( dataUtls )
dUtls = dataUtls.dUtls( nData )
showMax = dUtls.showMaxima

nData.DF[ (ft := 'People living with HIV/AIDs as percentage of population') ] = (
    nData.DF[ 'People and Society: HIV/AIDS - people living with HIV/AIDS' ] /
    nData.DF[ 'People and Society: Population' ])

showMax( ft, unit="_" )

<hr style="height:2px;background-color:gray">

- What is the **lowest HIV prevalence in the Southern Africa high-HIV/AIDS 
region**?

In [None]:
sthEquatAfrica = nData.OR[
    (nData.OR[ "Geography: Map references" ] == "Africa") &
    (nData.OR[ "Geography: Geographic coordinates" ].str.contains( "S" )) ]

showMax( 'People living with HIV/AIDs as percentage of population', asc=True,
    mask=nData.DF[ 'Country' ].isin( sthEquatAfrica.Country.to_list() ),
    sub="African Nations South of Equator", unit="%" )

- Among mainland countries, Angola, being large and close to the HIV/AIDS 
epicentre, appears to have some form of strongly inhibiting factor.
A look at recent history identifies a cause for the low prevalence: civil war. 

> The 27-year civil war in Angola, lasting from 1975 until 2002, kept the spread 
> of HIV to a minimum due to large parts of the country being inaccessible to 
> people infected with the virus. During the civil war, individuals from 
> neighboring countries such as Zambia, Botswana, and Zimbabwe (all countries 
> with high prevalence rates of HIV) were also not allowed to come into the 
> country, which played a significant role in controlling the spread of HIV.
> [(source: Wikipedia)](https://en.wikipedia.org/wiki/HIV/AIDS_in_Angola#History)

<hr style="height:2px;background-color:gray">

- **Gender and Tobacco usage**: nearly half the people in Nauru and Burma smoke.
However, when limited to females, European nations remain in the top ten, while
the Asia-Pacific nations Burma, Kiribati, Timor-Leste, PNG and Indonesia all
disappear (the men are the smokers). This pattern is strikingly reversed for 
Nauru, where it is female smoking alone which places it at number one, with 
Nauru's male smoking at 140th place in the world.
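
A rank such as "140th place" can be read as one plus the number of countries with a strictly higher value. The sketch below assumes that convention (ties share a rank, missing values are ignored); the module's `getRank` may handle ties and nulls differently, and the function name and toy data here are hypothetical.

```python
def rank_of(country, values_by_country, ascending=False):
    """1-based rank of a country for a feature; None if it has no value."""
    present = {c: v for c, v in values_by_country.items() if v is not None}
    if country not in present:
        return None
    target = present[country]
    if ascending:
        # Rank 1 is the smallest value.
        return 1 + sum(v < target for v in present.values())
    # Rank 1 is the largest value.
    return 1 + sum(v > target for v in present.values())


toy = {"A": 10, "B": 30, "C": 20, "D": None}
print(rank_of("A", toy))                  # third-highest
print(rank_of("A", toy, ascending=True))  # lowest
```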

In [None]:
getRank( 'Nauru', 'People and Society: Tobacco use - male' )
showMax( 'People and Society: Tobacco use - total' )
showMax( 'People and Society: Tobacco use - male' )
showMax( 'People and Society: Tobacco use - female' )

<hr style="height:2px;background-color:gray">

- **Generational weight disparity**: where are the highest combined 
observations for both adult obesity prevalence and children under five years 
who are underweight? <br> <br> These maxima are limited to nations reporting 
**above-mean observations** for both features (seven in total).

In [None]:
uFeat = 'People and Society: Children under the age of 5 years underweight'
oFeat = 'People and Society: Obesity - adult prevalence rate'

nData.DF[ 'Generational weight disparity' ] = (
    nData.DF[ uFeat ] + nData.DF[ oFeat ])

# countries at or above the mean for both features (vectorised comparison)
aboveMeans = nData.DF.loc[
    (nData.DF[ uFeat ] >= nData.DF[ uFeat ].mean()) &
    (nData.DF[ oFeat ] >= nData.DF[ oFeat ].mean()), 'Country' ].to_list()

showMax( 'Generational weight disparity', unit="_",
    mask=nData.DF[ 'Country' ].isin( aboveMeans ),
    sub="% adults obese + % children underweight, both above-mean" )

<hr style="height:2px;background-color:gray">

- **Education disparities** - by a modest yet significant margin, Australia has 
the longest school life expectancy, and the position **holds true for women** 
as much as men.

In [None]:
lifexpFt = ("People and Society: School life expectancy (primary to tertiary "
            "education) -")
showMax( f'{lifexpFt} total', short=True )
showMax( f'{lifexpFt} male', short=True )
showMax( f'{lifexpFt} female', short=True )


<hr style="height:2px;background-color:gray">

- **School-completion disparity in the sexes**: Liechtenstein stands out, at 
least in terms of supposed development, having one of the highest GDPs per 
capita in the world.

In [None]:
nData.DF[ (ft := 'School completion disparity between sexes') ] = (
    nData.DF[ f'{lifexpFt} male' ] - nData.DF[ f'{lifexpFt} female' ])

showMax( ft, unit='(Years)',
    sub="Average lifetime in education, male minus female" )


<hr style="height:2px;background-color:gray">

- Most **emigration**, and **populations in greatest contraction**: two 
categories dominate, either islands (esp. Pacific) or eastern Europe.

In [None]:

showMax( 'People and Society: Net migration rate', asc=True )
showMax( 'People and Society: Population growth rate', asc=True )

***
#### *COAL, ENERGY AND POLLUTION*

- **Not a glitch:** this is what the stats on the top-ten coal producers look 
like:

In [None]:
product = 'Energy: Coal - Production'
consump = 'Energy: Coal - Consumption'
exports = 'Energy: Coal - Exports'
imports = 'Energy: Coal - Imports'

showMax( product )
zThresh( product, 0.3, )
showMax( consump )
zThresh( consump, 0.3, )

# compare: Energy: Electricity - Consumption

- For both **Coal production and consumption**, the only nation falling outside 
three standard deviations of the mean falls outside by around ***fourteen*** 
standard deviations. More astonishingly, China remains a net importer - it 
consumes all of this and more. 
- From production to consumption, Australia disappears down to 199th in the 
world; quite a feat for the fourth-largest producer, whereas the rest of the 
top ten producers are in the top ten consumers (excepting Kazakhstan, which 
similarly drops to 197th as a consumer).

In [None]:
chinaProd = nData.DF[ nData.DF.Country == 'China' ][ product ].sum()
chinaCsmp = nData.DF[ nData.DF.Country == 'China' ][ consump ].sum()

notChinaProd = nData.DF[ nData.DF.Country != 'China' ][ product ].sum()
notChinaCsmp = nData.DF[ nData.DF.Country != 'China' ][ consump ].sum()

# get longest string length to pad report field
pad = (max( [ len( str( i ) )
    for i in [ chinaProd, notChinaProd, chinaCsmp, notChinaCsmp ] ] ))

print( f"PRODUCTION: China's production is "
       f"[ {(chinaProd / notChinaProd):,.2f} ] times the rest of world's:\n"
       f"   [ {chinaProd:>{pad},.2f} ]: China's coal production\n"
       f"   [ {notChinaProd:>{pad},.2f} ]: rest of world combined\n" )
print( f"CONSUMPTION: China's consumption is "
       f"[ {(chinaCsmp / notChinaCsmp):,.2f} ] times the rest of world's:\n"
       f"   [ {chinaCsmp:>{pad},.2f} ]: China's coal consumption\n"
       f"   [ {notChinaCsmp:>{pad},.2f} ]: rest of world combined\n" )
getRank( 'Australia', consump )
print()
getRank( 'Kazakhstan', consump )

<hr style="height:2px;background-color:gray">

- We can get an image of a country's **relationship with coal** if we look at 
exports in ratio to combined production and imports. Where this ratio is above 
one, a country has exported more than the total it produced and imported, 
meaning it has sold reserves. Refining further to only the countries whose coal 
exports are above the world mean, we can see who has a strong reliance on coal 
exports. <br> <br> In order, the refinement excludes Venezuela, Belarus and 
Eswatini such that Russia, South Africa and the Philippines enter the **top 
sellers**.

In [None]:
nData.DF[ 'Coal: Exports-to-Total-Holdings ratio' ] = (
    nData.DF[ exports ] / (nData.DF[ product ] + nData.DF[ imports ]))

# countries whose coal exports are at or above the world mean
aboveMeans = nData.DF.loc[
    nData.DF[ exports ] >= nData.DF[ exports ].mean(), 'Country' ].to_list()

showMax( 'Coal: Exports-to-Total-Holdings ratio',
    mask=nData.DF[ 'Country' ].isin( aboveMeans ), unit="_" )

<hr style="height:2px;background-color:gray">

Naturally, on the topic of coal, a look at the **top CO2 emitters**:
- An unsurprising correlation between **CO2 emissions and coal consumption** 
(that's China out in the deep end)
- **Dirty consumers** (hi there Australia!) - emissions per unit of coal 
consumed 
In [None]:
yFeat = 'Energy: Carbon dioxide emissions - From coal and metallurgical coke'
pltSctr( [ consump, yFeat ] )

# emissions per unit of coal consumed (emissions divided by consumption)
nData.DF[ 'Coal/Metallurgical CO2 emissions PER Coal consumption' ] = (
    nData.DF[ yFeat ] / nData.DF[ consump ])

showMax( 'Coal/Metallurgical CO2 emissions PER Coal consumption',
    sub='Dirtiest emitters per unit consumed', unit="_" )


<hr style="height:2px;background-color:gray">

- Kenya's **Geothermal energy mix** is impressive. Good geology: <br> <br>
>In places where tectonic plates – consisting of the Earth's crust, and the 
upper mantle – are being pushed together or torn apart, this heat rises closer 
to the surface. One such place is Africa's Great Rift Valley, which runs 
7,000km (4,350 miles) across the eastern side of the continent.
> *(Source: 
> [BBC](https://www.bbc.com/future/article/20210303-geothermal-the-immense-volcanic-power-beneath-our-feet))*

In [None]:
ft = 'Energy: Electricity generation sources - Geothermal'
showMax( ft )
zThresh( ft, 2.5 )

***
#### *ECONOMY*
- If you tend to assume that countries usually **spend close to what they make in 
revenue**, take confidence from seeing how both rise together in very close 
proportion - **almost a perfect correlation of 0.997**

In [None]:
x, y = 'Economy: Budget - revenues', 'Economy: Budget - expenditures'
print( f"{dUtls.reportCorr( [ x, y ] )} for\n    {x}\n    and {y}\n" )

pltSctr( [ x, y ] )

- **Services share of GDP**: which sovereignties pay the bills almost entirely 
from desk-work?

In [None]:
showMax( 'Economy: GDP - composition, by sector of origin - services' )

<hr style="height:2px;background-color:gray">

- **Inflation**: Venezuela dwarfs the world at *fifteen* standard deviations 
from the mean.

In [None]:
showMax( ft := 'Economy: Inflation rate (consumer prices)' )
zThresh( ft, 0.0677 )

<hr style="height:2px;background-color:gray">

- **Robot Ranchers**: A surprise in the bottom ten for **percent of workers in 
agriculture** is the USA: only ~0.7%, while the US **Agri-sector share of GDP** 
is still 22nd-highest in the world.

In [None]:
# Error in the scraped data for Tonga (date taken as percent), so exclude Tonga
showMax( 'Economy: Labor force - by occupation - agriculture',
    mask=nData.DF[ 'Country' ] != "Tonga", asc=True )

getRank( 'United States',
    'Economy: GDP - composition, by sector of origin - agriculture' )

- Following on from this: who are the **most efficient farmers**? 

In [None]:
nData.DF[ 'Agriculture: GDP composition to labour' ] = (
    nData.DF[ 'Economy: GDP - composition, by sector of origin - agriculture' ] /
    nData.DF[ 'Economy: Labor force - by occupation - agriculture' ])

showMax( 'Agriculture: GDP composition to labour', unit="_",
    mask=nData.DF[ 'Country' ] != "Tonga", sub="Efficient farmers" )

<hr style="height:2px;background-color:gray">

- One last item on *Economy*: Maldives' **goods and services importation** is 
uniquely astronomical.

In [None]:
# gdp-by-imports: maldives only positive, and vastly so [...]
ft = 'Economy: GDP - composition, by end use - imports of goods and services'
showMax( ft )
zThresh( ft, 2.5 )

***
#### *GEOGRAPHY AND ENVIRONMENT*

- **Most watery nations** (surface water as a share of total area)

In [None]:
# highest percent water area
nData.DF[ 'Water-area ratio' ] = (
    nData.DF[ 'Geography: Area - water' ] /
    nData.DF[ 'Geography: Area - total' ])
showMax( 'Water-area ratio', unit="_" )

The British Indian Ocean Territory value here is an outlier arising from its
water area being recorded as considerably larger than the total area, which 
purports to include both water and land. For those understandably unfamiliar 
with the territory, it is designated across a rather isolated archipelago in a 
geographically significant position. Some interesting reading for observers 
both of colonialism and of ongoing events in the projection of sovereign power 
across large sea vectors:

##### British Indian Ocean Territory
![British_Indian_Ocean_Territory](https://upload.wikimedia.org/wikipedia/commons/thumb/9/96/British_Indian_Ocean_Territory_in_United_Kingdom.svg/1466px-British_Indian_Ocean_Territory_in_United_Kingdom.svg.png)

WIKI: 
>The only inhabitants are British and U.S. military personnel and associated 
contractors, who collectively number around 3,000 (2018 figures). The forced 
removal of Chagossians from the Chagos Archipelago occurred between 1968-1973. 
Today, the exiled Chagossians are still trying to return, saying that the 
forced expulsion and dispossession was unlawful, but the UK government has 
repeatedly denied them the right of return. The islands are off-limits to 
Chagossians, casual tourists, and the media.

- **Excluding the BIOT as an outlier** should reveal a more intuitive distribution.
Better yet, to avoid catching so many islands, let's filter down to countries 
whose coastline is no longer than their land boundaries with other countries.

In [None]:
showMax( 'Water-area ratio', unit="_",
    mask=nData.DF[ 'Geography: Land boundaries - total' ] >=
         nData.DF[ 'Geography: Coastline' ],
    sub="Where Coastline <= Land Boundaries" )

<hr style="height:2px;background-color:gray">

- **Elevation difference**: Countries with the largest difference between their 
lowest and highest point. While the China-Nepal border passes over the summit 
of Sagarmatha (Everest), giving both countries the same highest point, China 
itself has a lower minimum elevation than Nepal.

In [None]:
nData.DF[ 'Maximum elevation difference' ] = (
    nData.DF[ 'Geography: Elevation - highest point' ] -
    nData.DF[ 'Geography: Elevation - lowest point' ])

showMax( 'Maximum elevation difference', unit="m" )

- **Flattest places in the world**: no point on natural ground is more than 
five meters higher than any other point. At #4 in the world, Pakistan's 
current floods are devastating partly due to this marked flatness. <br> <br> 
Cayman Islands is easiest on the hips with a maximum elevation difference of 
one meter.

In [None]:
showMax( 'Maximum elevation difference', asc=True, unit="m" )

- **Water withdrawals minus renewable sources**: Who are the most and least 
reliant on trade for water? (Hi there, Middle East!)

In [None]:
nData.DF[ 'Water withdrawal exposure to trade' ] = (
    (nData.DF[ 'Environment: Total water withdrawal - municipal' ] +
     nData.DF[ 'Environment: Total water withdrawal - industrial' ] +
     nData.DF[ 'Environment: Total water withdrawal - agricultural' ]) -
    nData.DF[ 'Environment: Total renewable water resources' ])

showMax( 'Water withdrawal exposure to trade',
    sub="Withdrawals minus resources", unit="cubic meters" )

# Brazil is sitting happy there around the Amazon. Russia and Canada just 
# melt vast amounts of snow.
showMax( 'Water withdrawal exposure to trade', asc=True, unit="cubic meters",
    sub="Withdrawals minus resources" )

- **Irrigated area ratio:** Many members of this top ten may not surprise, as 
familiar origins of agricultural commodities, but the Gaza Strip may conjure a 
more arid image.

In [None]:
# Ratio of irrigated land to total land

nData.DF[ 'Irrigated-area ratio' ] = (
    nData.DF[ 'Geography: Irrigated land' ] /
    nData.DF[ 'Geography: Area - total' ])

showMax( 'Irrigated-area ratio', unit="_" )

print( f"Irrigated area in Gaza Strip is "
       f"{getVal( 'Gaza Strip', 'Geography: Irrigated land' )} sqkm" )

print( f"Total area of Gaza Strip is "
       f"{getVal( 'Gaza Strip', 'Geography: Area - total' )} sqkm" )

The value comes down to proportion, of course (irrigation takes up 240 of the 
territory's 360 square kilometers), but more deeply to demographics. The Gaza 
Strip is the third-most densely populated territory in the world, and has a 
special reliance on agriculture as a 
[vital element of food production and employment](https://socialsciences.mcmaster.ca/kubursi/ebooks/water.htm).

In [None]:
# One final set of datapoints: 
# Who are the "most top/bottom 10/5/3/1" countries, and what for?

# import pandas as pd
# 
# ctrMaxDct = { c: { 'top': [ ], 'bot': [ ], 'tCnt': 0, 'bCnt': 0 }
#     for c in list( nData.DF.Country ) }
# 
# for ft in list( nData.DF.columns )[ 1: ]:
#     
#     dfFtSrt = pd.concat( [ nData.DF[ 'Country' ],
#         pd.Series( nData.DF[ ft ] ) ], axis=1 ).sort_values( by=[ ft ],
#         ascending=False )
#     
#     topCtrs = list( dfFtSrt.head( 10 ).Country )
#     botCtrs = list( dfFtSrt.tail( 10 ).Country )
#     
#     for c in topCtrs:
#         ctrMaxDct[ c ][ 'top' ].append( ft )
#         ctrMaxDct[ c ][ 'tCnt' ] += 1
#     for c in botCtrs:
#         ctrMaxDct[ c ][ 'bot' ].append( ft )
#         ctrMaxDct[ c ][ 'bCnt' ] += 1
# 
# sortByTops = sorted( [ [ ctrMaxDct[ c ][ 'tCnt' ], c ] for c in ctrMaxDct.keys() ], reverse=True )
# 
# for count, ctr in sortByTops[:10]:
#     print( f"{ctr} is in {count} top-tens:\n{ctrMaxDct[ctr]['top']}" )
# 
# # what are the top ten top ten top ten features?
# 
# from collections import Counter
# 
# countt10t10t10fs = [ 
#     ft for ftLi in [ ctrMaxDct[ctr]['top'] for _, ctr in sortByTops[:10] ] for ft in ftLi ]
# 
# Counter(countt10t10t10fs ).most_common(n=10)

***
Thanks for the read! Check for more data analysis and Python projects at 
**[github.com/romstroller](https://github.com/romstroller/)**


In [None]:
# Add positional bar colour-gradient to barplots
#   https://stackoverflow.com/questions/60220089/how-to-add-color-gradients-according-to-y-value-to-a-bar-plot