## Data Pipeline: UK Population Estimates
- This data pipeline downloads an Excel workbook from the Office for National Statistics (ONS) website and processes it into a DataFrame.
- The resultant DataFrame is a **Population Estimate Breakdown by UK Administrative Geographies** (426 rows).
- This DataFrame is then saved to Postgres and 3 views are created. These views subset the UK Administrative Geographies into 3 distinct categories: **Countries** (4 rows), **Regions** (12 rows: the English regions plus Scotland, Wales and Northern Ireland) and **Local Authority Districts** (379 rows).
- **Downloaded data format: xls**.
- **Processed data purpose:** Use as a data augmentation source for various projects.
- **Data provider: The Office for National Statistics**.

### Data summary: 
- The result of this data pipeline is a Postgres table called **uk_pop_stats** and its related views. This table contains 426 rows representing distinct Administrative Geographies of the UK i.e. Countries, Regions and Local Authority Districts. For each row, a number of population estimate values are presented.

In [1]:
# IMPORT LIBRARIES.
import pandas as pd
import io
import requests
import copy
import sqlalchemy
from sqlalchemy import create_engine

In [2]:
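# SET DISPLAY OPTIONS (None MEANS UNLIMITED).
# USE THE FULL OPTION NAMES: THE SHORT FORMS CAN BE AMBIGUOUS IN NEWER PANDAS VERSIONS.
# TO SET NUMBER OF ROWS DISPLAYED:
pd.set_option("display.max_rows", 200)
# TO SET NUMBER OF COLUMNS DISPLAYED:
pd.set_option("display.max_columns", None)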

## 1. DOWNLOAD AND PROCESS MULTI SHEET EXCEL WORKBOOK.

### 1.1. DOWNLOAD DATA AND PROCESS INTO A SINGLE DATAFRAME.

In [3]:
url = "https://www.ons.gov.uk/file?uri=/peoplepopulationandcommunity/populationandmigration/populationestimates/datasets/populationestimatesforukenglandandwalesscotlandandnorthernireland/mid2019april2020localauthoritydistrictcodes/ukmidyearestimates20192020ladcodes.xls"
# UPDATED 2020 URL - RAISES ERROR WHEN USED - INVESTIGATE.
#url = "https://www.ons.gov.uk/file?uri=%2fpeoplepopulationandcommunity%2fpopulationandmigration%2fpopulationestimates%2fdatasets%2fpopulationestimatesforukenglandandwalesscotlandandnorthernireland%2fmid2020/ukpopestimatesmid2020on2021geography.xls"

def create_dataframe(url):

    # SET USER AGENT TO BYPASS 403 ERROR. THIS IS THE MOZILLA USER AGENT. REPLACE AS APPROPRIATE.
    headers = {"User-Agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X x.y; rv:10.0) Gecko/20100101 Firefox/10.0"}
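    # DOWNLOAD MULTI SHEET EXCEL WORKBOOK.
    response = requests.get(url, headers=headers)
    # FAIL EARLY ON AN HTTP ERROR (E.G. 403 OR 404) INSTEAD OF PASSING AN ERROR PAGE TO read_excel.
    response.raise_for_status()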

    # CREATE AN EMPTY DICTIONARY TO HOLD THE RESULT DATAFRAMES.
    pop_df_dict = {}

    # THESE 5 SHEETS WILL BE USED:
    sheets = ["MYE2 - Persons", "MYE2 - Males", "MYE2 - Females", "MYE 5", "MYE 6"]
    # CREATE A DATAFRAME FROM EACH OF THE 5 SHEETS.
    for sheet in sheets:
        # WRAP THE DOWNLOADED WORKBOOK BYTES (response.content) IN A BytesIO.
        with io.BytesIO(response.content) as fh:
            # NAME DATAFRAME BASED ON SHEET NAME, REMOVE SPACES, CHANGE DASHES TO UNDERSCORES AND LOWERCASE EACH DATAFRAME NAME.
            pop_df_dict[("df_" + sheet)
                        .replace(" ","")
                        .replace("-","_")
                        .lower()] \
            = pd.read_excel(fh, sheet_name=sheet,
                        # SKIP 1st 4 ROWS OF SHEET.
                        header=4,
                        # SKIP LAST 2 ROWS OF SHEET.
                        skipfooter=2,
                        # INCLUDE THESE ADDITIONAL REPRESENTATIONS OF NULL VALUES.
                        na_values=["Nan","NAN","--","-","__","_"],
                        # INDEX ON UK ADMINISTRATIVE GEOGRAPHY CODE.
                        index_col="Code"
                        )
            
            
    # EXTRACT EACH DATAFRAME FROM pop_df_dict DICTIONARY.
    # THIS 1ST DATAFRAME WILL BE AUGMENTED WITH ATTRIBUTES FROM THE OTHER 4 DATAFRAMES.
    df_mye2_persons = pop_df_dict["df_mye2_persons"]

    # THE NEW VERSION OF THE url EXCEL WORKBOOK IS RELEASED AT THE END OF JUNE EACH YEAR.
    # WHEN USING THE NEW VERSION, year IS 1 YEAR BEHIND THE CURRENT YEAR. OTHERWISE year IS 2 YEARS BEHIND THE ...
    # CURRENT YEAR.
    ##if pd.to_datetime("now").month > 6:
    ##    year = str(pd.to_datetime("now").year - 1)
    ##else:
    ##    year = str(pd.to_datetime("now").year - 2)
        
    # REINSTATE ABOVE IF/ELSE IF YOU FIX URL ISSUE TO UPDATE TO NEWER FILE VERSION. CURRENTLY USING OLD FILE.
    year = '2019'
        
    # CREATE DICTIONARIES AS PRECURSORS TO NEW ATTRIBUTES.
    mye2_male_dict = pop_df_dict["df_mye2_males"]["All ages"].to_dict()
    mye2_female_dict = pop_df_dict["df_mye2_females"]["All ages"].to_dict()
    mye5_area_dict = pop_df_dict["df_mye5"]["Area (sq km)"].to_dict()
    mye5_ppl_dict = pop_df_dict["df_mye5"][f"{year} people per sq. km"].to_dict()
    mye6_age_dict = pop_df_dict["df_mye6"][f"Mid-{year}"].to_dict()

    # CREATE DATAFRAME FROM ABOVE DICTIONARIES TO JOIN WITH df_mye2_persons DATAFRAME.
    extra_atts = pd.DataFrame.from_dict({"male_population":mye2_male_dict,
                                        "female_population":mye2_female_dict,
                                        "area_sq_km":mye5_area_dict,
                                        "ppl_per_sq_km":mye5_ppl_dict,
                                        "median_age":mye6_age_dict
                                        })

    # JOIN 2 DATAFRAMES ON INDEX.
    uk_pop_pre = df_mye2_persons.join(extra_atts)
    
    return uk_pop_pre
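
The commented-out year rule inside `create_dataframe` can be sketched as a small standalone helper. This is only an illustration, assuming (as the comments state) that a new workbook is released at the end of June each year. The function name `estimate_year` and its `today` parameter are hypothetical and not part of the original notebook.

```python
import datetime


def estimate_year(today=None):
    """Return the estimate year (as a string) expected to be current on `today`.

    Hypothetical helper: mirrors the commented-out if/else in create_dataframe.
    """
    # DEFAULT TO THE CURRENT DATE WHEN NO DATE IS SUPPLIED.
    if today is None:
        today = datetime.date.today()
    # AFTER JUNE THE NEW RELEASE IS ASSUMED TO BE AVAILABLE, SO THE DATA IS 1 YEAR BEHIND.
    if today.month > 6:
        return str(today.year - 1)
    # OTHERWISE THE LATEST RELEASE IS STILL 2 YEARS BEHIND THE CURRENT YEAR.
    return str(today.year - 2)


print(estimate_year(datetime.date(2020, 7, 1)))
print(estimate_year(datetime.date(2021, 3, 15)))
```

Passing the date explicitly makes the rule easy to check without waiting for a release; both calls return `'2019'`, the year currently hard-coded in the function.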

In [4]:
create_dataframe(url)

Code,Name,Geography1,All ages,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90+,male_population,female_population,area_sq_km,ppl_per_sq_km,median_age
K02000001,UNITED KINGDOM,Country,66796807,722881,752554,777309,802334,802185,809152,827149,852059,838680,822812,813774,820269,793405,777849,748569,736855,717056,708482,733067,761508,797247,811223,842201,850411,851998,879406,882616,911206,928979,912042,903442,912000,889687,896728,895275,872653,879070,877449,883170,883325,848120,790679,778814,793909,808017,822075,858311,895065,924065,902606,924754,924666,936289,934335,940971,930783,909684,888131,856779,820531,801220,782729,752215,723647,695374,694374,682311,659691,661251,670572,683532,714929,768023,588245,564138,556173,511519,451509,400077,406018,393605,372612,344104,316201,288806,255542,230667,210077,186163,159641,605181,32978229,33818578,242743,275,40.3
K03000001,GREAT BRITAIN,Country,64903140,700160,729146,753103,777260,777225,784154,801776,825785,812581,797010,787647,794127,768470,754088,725407,713972,694660,686098,710566,739275,774174,788433,819124,827256,828734,855780,858947,886375,903860,886962,878457,886649,864133,871106,870026,847655,854401,852853,857965,858210,823998,767670,755956,770974,784687,798092,833365,869781,898296,876808,898690,898167,909988,907991,914195,904528,884109,863181,832285,797151,778569,760463,730946,703319,676084,675493,663890,642055,643742,653146,666359,697988,751228,572800,549093,541518,497605,439224,389239,395836,383494,363446,335419,308434,281696,249323,224878,205072,181788,156024,591447,32045512,32857628,228950,283,40.3
K04000001,ENGLAND AND WALES,Country,59439840,649388,676412,698837,720721,719821,726317,742744,765225,750173,737531,726528,733267,709958,696722,668590,658280,640608,632385,653732,677608,708336,720698,748254,755826,757151,782598,784090,807248,824760,810973,802809,810906,790832,798415,797946,777820,783817,781425,787003,788497,756871,705441,694855,706616,720070,732367,763688,795939,821630,801260,820123,818248,829626,828203,831741,823099,802885,784119,755249,722779,705065,689075,661702,636452,612394,612894,602897,583460,585085,594546,606965,637206,686169,524406,503866,496130,455010,400818,354441,361072,350455,332255,306983,282197,257792,228197,206177,188071,167219,143992,547789,29382509,30057331,151047,394,40.2
E92000001,ENGLAND,Country,56286961,618858,644056,665596,686135,684992,691122,706742,727938,712204,700200,689733,695753,673789,660928,634043,624590,607496,599393,618873,639880,668129,679576,706968,715442,717748,740656,742735,765411,782363,770244,762666,771667,752937,760681,760003,741443,746952,745065,749311,750871,721254,672514,661799,673246,685484,696569,725600,755206,778729,759708,776578,775173,785471,784074,786165,777616,758665,740085,712624,681661,664457,649021,622905,599252,576200,575744,566050,547827,549233,557886,569617,598038,645078,493261,473332,466239,427207,375723,332047,339470,329713,312737,289092,265631,242740,214727,194007,177399,157770,135875,517273,27827831,28459130,130311,432,40.0
E12000001,NORTH EAST,Region,2669941,26621,27612,28621,29575,29315,30224,30960,31956,32027,31543,30703,31194,30129,30285,28445,28077,27511,27259,28888,33611,36055,36399,35413,34700,34123,35082,35489,37202,37182,35165,34128,34102,33245,34140,33395,32114,32265,31911,32091,32390,31135,28122,27288,28879,29145,29785,31867,34581,36492,35149,35720,36420,37573,38213,39253,38684,38887,38410,36700,35591,35918,35625,33882,32437,31030,30802,30408,29121,29323,29725,30183,31426,33758,25248,24129,23136,20472,18708,17182,17539,17158,16364,15159,13345,12509,10851,9793,8725,7825,6511,22608,1312124,1357817,8579,311,41.8
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
N09000006,Fermanagh and Omagh,Local Government District,117397,1422,1565,1561,1618,1565,1564,1581,1667,1665,1696,1676,1635,1622,1603,1505,1487,1550,1510,1515,1189,1098,1112,1179,1299,1338,1382,1349,1332,1375,1412,1495,1441,1529,1506,1473,1538,1510,1463,1564,1564,1534,1452,1425,1453,1488,1541,1582,1632,1637,1647,1599,1605,1534,1624,1640,1616,1625,1551,1495,1492,1470,1451,1393,1379,1327,1277,1274,1199,1171,1165,1167,1067,1102,941,919,907,820,734,665,606,596,562,555,469,410,398,342,307,285,232,880,58833,58564,2864,41,39.7
N09000007,Lisburn and Castlereagh,Local Government District,146002,1781,1863,1804,1778,1867,1847,1894,1899,1989,1942,1978,1958,1851,1782,1691,1664,1595,1601,1621,1508,1427,1410,1532,1650,1629,1801,1745,1801,1797,1940,1912,1985,1956,1980,2000,2020,1941,1963,1995,1987,1939,1824,1795,1833,1813,1825,1926,1928,1993,2021,2056,2154,2233,2135,2252,2143,2071,2062,2064,1860,1806,1767,1657,1571,1451,1417,1444,1368,1295,1326,1334,1332,1354,1201,1144,1220,1216,1074,938,894,896,764,697,707,610,515,508,442,364,302,1077,71654,74348,506,289,40.3
N09000008,Mid and East Antrim,Local Government District,139274,1442,1558,1636,1622,1657,1653,1686,1731,1724,1734,1702,1806,1758,1730,1676,1622,1658,1601,1578,1421,1424,1432,1541,1559,1582,1640,1681,1731,1639,1771,1711,1766,1685,1686,1677,1637,1639,1655,1704,1764,1700,1576,1593,1536,1682,1809,1962,1896,2028,2039,2137,2129,2084,2129,2094,2067,2065,1952,1866,1871,1844,1800,1717,1680,1573,1509,1531,1429,1474,1444,1400,1429,1475,1312,1230,1260,1223,1085,889,909,865,764,733,657,652,552,469,413,380,302,1140,68375,70899,1059,131,42.3
N09000009,Mid Ulster,Local Government District,148528,2049,2138,2220,2211,2172,2209,2271,2180,2262,2286,2274,2187,2111,2068,2050,1941,1847,1919,1873,1553,1554,1498,1645,1835,1965,1967,1870,2076,2034,2077,2073,2049,2132,2025,2069,2137,2079,2105,2185,2087,2003,1934,1882,1910,1988,2001,2054,2014,2013,1964,1971,1916,1909,1939,1884,1848,1841,1781,1726,1643,1615,1566,1519,1411,1323,1335,1310,1252,1258,1206,1146,1158,1149,1065,1074,979,906,828,775,712,701,587,541,525,468,425,396,330,284,259,921,74725,73803,1831,81,36.7
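
The dictionary keys that `create_dataframe` derives from the sheet names can be checked in isolation. The sketch below repeats the same string operations on the five sheet names; the helper name `frame_key` is hypothetical and is introduced here only for illustration.

```python
def frame_key(sheet_name):
    """Build the pop_df_dict key for a sheet name (hypothetical helper)."""
    # PREFIX WITH df_, REMOVE SPACES, CHANGE DASHES TO UNDERSCORES AND LOWERCASE.
    return ("df_" + sheet_name).replace(" ", "").replace("-", "_").lower()


sheets = ["MYE2 - Persons", "MYE2 - Males", "MYE2 - Females", "MYE 5", "MYE 6"]
keys = [frame_key(sheet) for sheet in sheets]
print(keys)
```

The resulting keys are the ones looked up later in the function, for example `df_mye2_persons` and `df_mye5`.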


---
### 1.2. FURTHER PROCESSING: SET DATA TYPES, RENAME ATTRIBUTES, LOWERCASE 2 ATTRIBUTES AND CREATE NEW ATTRIBUTES.

In [5]:
def data_prep(df):
    # SET DTYPES.
    # NOTES: 
    # DOWNCASTING DTYPE float64 TO float32 PREVENTED ROUNDING TO 2 DECIMAL PLACES SO LEAVE AS float64.
    # Name ATTRIBUTE HAS TOO MANY UNIQUE OPTIONS TO SET AS DTYPE category.
    # SETTING THE Geography1 ATTRIBUTE AS DTYPE category CAUSES ISSUES WHEN PLOTTING SO ALSO SET THAT AS DTYPE==str.
    df[["Name","Geography1"]] = df[["Name","Geography1"]].astype("str")
    # SET DTYPE FOR ALL INTEGER ATTRIBUTES TO Int32 NULLABLE INTEGER.
    cols = df.select_dtypes("int64").columns
    df[cols] = df[cols].astype("Int32")

    # RENAME INDEX AND ATTRIBUTES.
    df.index.rename("code",
                    inplace=True)
    df.rename(columns={"Name":"name",
                       "Geography1":"geography",
                       "All ages":"total_population",
                       "90+":90},
                    inplace=True)

    # MAKE ALL name AND geography VALUES LOWERCASE:
    for atts in ["name","geography"]:
        df[atts] = df[atts].str.lower()

    # CREATE NEW AGE BAND ATTRIBUTES:
    age_bands = ["0-4","5-18","19-24","25-34","35-44","45-54","55-64","65-74","75-84","85plus"]
    start_vals = [0,5,19,25,35,45,55,65,75,85]
    end_vals = [5,19,25,35,45,55,65,75,85,91]

    for age_band,start,end in zip(age_bands,start_vals,end_vals):
        df[age_band] = df[list(range(start,end))].sum(axis=1).astype("Int32")

    # CREATE MALE AND FEMALE COUNT PERCENTAGES ATTRIBUTES:
    for prct,pop in zip(["male_percent","female_percent"],['male_population','female_population']):
        df[prct] = ((df[pop]/df["total_population"])*100).astype("float64").round(2)
    
    # CREATE RATIO OF THE TWO ABOVE ATTRIBUTES:
    df["male_female_ratio"] = df["male_percent"].astype("str") + ":" + df["female_percent"].astype("str")

    return df
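
The age band boundaries in `data_prep` rely on `range(start, end)` excluding `end`. A minimal sketch of the same banding in plain Python, without pandas, is shown here. The function name `band_totals` and the dummy input are assumptions made for illustration; only the band labels, start values and end values come from the notebook.

```python
AGE_BANDS = ["0-4", "5-18", "19-24", "25-34", "35-44", "45-54", "55-64", "65-74", "75-84", "85plus"]
START_VALS = [0, 5, 19, 25, 35, 45, 55, 65, 75, 85]
END_VALS = [5, 19, 25, 35, 45, 55, 65, 75, 85, 91]


def band_totals(counts_by_age):
    """Sum single-year counts (keys 0..90, where 90 stands for 90+) into age bands."""
    totals = {}
    for band, start, end in zip(AGE_BANDS, START_VALS, END_VALS):
        # range(start, end) EXCLUDES end, SO THE BAND COVERS AGES start TO end - 1.
        totals[band] = sum(counts_by_age[age] for age in range(start, end))
    return totals


# DUMMY DATA: ONE PERSON AT EVERY AGE, SO EACH TOTAL EQUALS THE NUMBER OF AGES IN THE BAND.
one_per_age = {age: 1 for age in range(91)}
totals = band_totals(one_per_age)
print(totals)

# AGES 0 TO 4 FROM THE UNITED KINGDOM ROW OF THE NOTEBOOK OUTPUT.
uk_under_5 = sum([722881, 752554, 777309, 802334, 802185])
print(uk_under_5)
```

With the dummy data every age is counted exactly once, so the band totals add up to 91. The United Kingdom figures sum to 3857263, which matches the `0-4` value in the notebook output.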

In [6]:
uk_pop_pre = data_prep(df = create_dataframe(url))
uk_pop_pre.head()

code,name,geography,total_population,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,male_population,female_population,area_sq_km,ppl_per_sq_km,median_age,0-4,5-18,19-24,25-34,35-44,45-54,55-64,65-74,75-84,85plus,male_percent,female_percent,male_female_ratio
K02000001,united kingdom,country,66796807,722881,752554,777309,802334,802185,809152,827149,852059,838680,822812,813774,820269,793405,777849,748569,736855,717056,708482,733067,761508,797247,811223,842201,850411,851998,879406,882616,911206,928979,912042,903442,912000,889687,896728,895275,872653,879070,877449,883170,883325,848120,790679,778814,793909,808017,822075,858311,895065,924065,902606,924754,924666,936289,934335,940971,930783,909684,888131,856779,820531,801220,782729,752215,723647,695374,694374,682311,659691,661251,670572,683532,714929,768023,588245,564138,556173,511519,451509,400077,406018,393605,372612,344104,316201,288806,255542,230667,210077,186163,159641,605181,32978229,33818578,242743,275,40.3,3857263,10999178,4914588,9011381,8415206,9063137,8161093,6687066,4040624,1647271,49.37,50.63,49.37:50.63
K03000001,great britain,country,64903140,700160,729146,753103,777260,777225,784154,801776,825785,812581,797010,787647,794127,768470,754088,725407,713972,694660,686098,710566,739275,774174,788433,819124,827256,828734,855780,858947,886375,903860,886962,878457,886649,864133,871106,870026,847655,854401,852853,857965,858210,823998,767670,755956,770974,784687,798092,833365,869781,898296,876808,898690,898167,909988,907991,914195,904528,884109,863181,832285,797151,778569,760463,730946,703319,676084,675493,663890,642055,643742,653146,666359,697988,751228,572800,549093,541518,497605,439224,389239,395836,383494,363446,335419,308434,281696,249323,224878,205072,181788,156024,591447,32045512,32857628,228950,283,40.3,3736894,10656341,4776996,8762295,8174369,8805373,7930635,6515794,3935911,1608532,49.37,50.63,49.37:50.63
K04000001,england and wales,country,59439840,649388,676412,698837,720721,719821,726317,742744,765225,750173,737531,726528,733267,709958,696722,668590,658280,640608,632385,653732,677608,708336,720698,748254,755826,757151,782598,784090,807248,824760,810973,802809,810906,790832,798415,797946,777820,783817,781425,787003,788497,756871,705441,694855,706616,720070,732367,763688,795939,821630,801260,820123,818248,829626,828203,831741,823099,802885,784119,755249,722779,705065,689075,661702,636452,612394,612894,602897,583460,585085,594546,606965,637206,686169,524406,503866,496130,455010,400818,354441,361072,350455,332255,306983,282197,257792,228197,206177,188071,167219,143992,547789,29382509,30057331,151047,394,40.2,3465179,9842060,4367873,8010577,7502415,8042825,7192819,5937494,3597153,1481445,49.43,50.57,49.43:50.57
E92000001,england,country,56286961,618858,644056,665596,686135,684992,691122,706742,727938,712204,700200,689733,695753,673789,660928,634043,624590,607496,599393,618873,639880,668129,679576,706968,715442,717748,740656,742735,765411,782363,770244,762666,771667,752937,760681,760003,741443,746952,745065,749311,750871,721254,672514,661799,673246,685484,696569,725600,755206,778729,759708,776578,775173,785471,784074,786165,777616,758665,740085,712624,681661,664457,649021,622905,599252,576200,575744,566050,547827,549233,557886,569617,598038,645078,493261,473332,466239,427207,375723,332047,339470,329713,312737,289092,265631,242740,214727,194007,177399,157770,135875,517273,27827831,28459130,130311,432,40.0,3299637,9342804,4127743,7609363,7147939,7623273,6782486,5576066,3380599,1397051,49.44,50.56,49.44:50.56
E12000001,north east,region,2669941,26621,27612,28621,29575,29315,30224,30960,31956,32027,31543,30703,31194,30129,30285,28445,28077,27511,27259,28888,33611,36055,36399,35413,34700,34123,35082,35489,37202,37182,35165,34128,34102,33245,34140,33395,32114,32265,31911,32091,32390,31135,28122,27288,28879,29145,29785,31867,34581,36492,35149,35720,36420,37573,38213,39253,38684,38887,38410,36700,35591,35918,35625,33882,32437,31030,30802,30408,29121,29323,29725,30183,31426,33758,25248,24129,23136,20472,18708,17182,17539,17158,16364,15159,13345,12509,10851,9793,8725,7825,6511,22608,1312124,1357817,8579,311,41.8,141744,419201,210301,349130,305340,355053,357164,294123,171572,66313,49.14,50.86,49.14:50.86
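
The percentage and ratio attributes can be verified by hand for a single row. The sketch below repeats the arithmetic from `data_prep` in plain Python using the United Kingdom figures from the output above; the function name `sex_split` is hypothetical.

```python
def sex_split(male, female, total):
    """Return (male_percent, female_percent, ratio string), rounded to 2 decimal places."""
    male_percent = round(male / total * 100, 2)
    female_percent = round(female / total * 100, 2)
    # THE RATIO IS A STRING BUILT FROM THE TWO ROUNDED PERCENTAGES.
    return male_percent, female_percent, f"{male_percent}:{female_percent}"


# UNITED KINGDOM ROW: male_population, female_population, total_population.
result = sex_split(32978229, 33818578, 66796807)
print(result)
```

This gives `(49.37, 50.63, '49.37:50.63')`, in line with the first row of the output. Because the ratio is built from string conversions, a percentage such as 50.60 would appear as `50.6`, so the string is best treated as a display value rather than something to parse.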


---
### 1.3. CREATE FINAL RESULT DATAFRAME.

In [7]:
# Population Estimate Breakdown by UK Administrative Geographies:
# FILTER AND REORDER ATTRIBUTES.
uk_pop = uk_pop_pre[["name","geography","total_population","male_population","female_population","male_percent",
                     "female_percent","male_female_ratio","0-4","5-18","19-24","25-34","35-44","45-54","55-64", 
                     "65-74","75-84","85plus","median_age","area_sq_km","ppl_per_sq_km"]].copy()

---
---
## 2. CONNECT TO POSTGRES DATABASE.

In [8]:
def connect_to_postgres():
    """
    Connect to Postgres database 'github_projects' as user 'postgres'.
    """

    conn_params_dict = {"user":"postgres",
                        "password":"password",
                        # HOST IS THE 'postgres' CONTAINER NAME. IF THAT NAME NO LONGER RESOLVES, USE localhost INSTEAD.
                        "host":"postgres",
                        "database":"github_projects"}

    connect_alchemy = "postgresql+psycopg2://%s:%s@%s/%s" % (
        conn_params_dict['user'],
        conn_params_dict['password'],
        conn_params_dict['host'],
        conn_params_dict['database']
    )

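    # CREATE POSTGRES ENGINE (CONNECTION POOL).
    engine = create_engine(connect_alchemy)
    # create_engine IS LAZY, SO OPEN A CONNECTION TO CONFIRM THE DATABASE IS REACHABLE BEFORE REPORTING SUCCESS.
    with engine.connect():
        print("Connection to Postgres successful.")
    return engine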
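
The connection string above is assembled with plain string formatting, which can be parsed incorrectly if the user name or password contains characters such as `@` or `/`. A hedged sketch of one way to guard against this is to URL-encode the credentials with the standard library before formatting. The helper name `build_postgres_url` and the credentials shown are placeholders, not values from the original notebook.

```python
from urllib.parse import quote_plus


def build_postgres_url(user, password, host, database):
    """Build a Postgres connection URL with URL-encoded credentials (hypothetical helper)."""
    return "postgresql+psycopg2://%s:%s@%s/%s" % (
        # ENCODE SPECIAL CHARACTERS SO THEY ARE NOT MISTAKEN FOR URL DELIMITERS.
        quote_plus(user),
        quote_plus(password),
        host,
        database,
    )


# PLACEHOLDER CREDENTIALS FOR ILLUSTRATION ONLY.
example_url = build_postgres_url("postgres", "p@ss/word", "localhost", "github_projects")
print(example_url)
```

The resulting string can be passed to `create_engine` in the same way as `connect_alchemy`.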

In [9]:
# EXECUTE FUNCTION TO CONNECT TO POSTGRES.
engine = connect_to_postgres()

Connection to Postgres successful.


---
---
## 3. POSTGRES OPERATIONS:
### 3.1. WRITE DATAFRAME TO POSTGRES TABLE. 

In [10]:
# THE RESULT OF THIS DATA PIPELINE IS A POSTGRES TABLE AND 3 VIEWS.
# IF THE TABLE AND VIEWS ALREADY EXIST (FROM A PREVIOUS RUN OF THIS NOTEBOOK), THEN DROP THEM ALL SO THAT THEY ...
# CAN BE RECREATED NOW WITHOUT ERROR.
# THE PRESENCE OF THE 3 VIEWS PREVENTS THE USE OF if_exists='replace' IN uk_pop.to_sql() BELOW.
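# engine.begin() OPENS A TRANSACTION AND COMMITS ON EXIT (engine.execute() WAS REMOVED IN SQLALCHEMY 2.0).
with engine.begin() as conn:
    conn.execute(sqlalchemy.text("""
                                 DROP TABLE IF EXISTS uk_pop_stats CASCADE;
                                 DROP VIEW IF EXISTS uk_pop_stats_countries,
                                 uk_pop_stats_regions, uk_pop_stats_lad20;
                                 """))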

# CREATE POSTGRES TABLE (POPULATED WITH DATA) CALLED uk_pop_stats FROM uk_pop DATAFRAME.
uk_pop.to_sql("uk_pop_stats", con=engine, index=True,
              dtype={"code":sqlalchemy.types.Text,
                     "name":sqlalchemy.types.Text,
                     "geography":sqlalchemy.types.Text,
                     "total_population":sqlalchemy.types.Integer,
                     "male_population":sqlalchemy.types.Integer,
                     "female_population":sqlalchemy.types.Integer,
                     "male_percent":sqlalchemy.types.Float(2),
                     "female_percent":sqlalchemy.types.Float(2),
                     "male_female_ratio":sqlalchemy.types.Text,
                     "0-4":sqlalchemy.types.Integer,
                     "5-18":sqlalchemy.types.Integer,
                     "19-24":sqlalchemy.types.Integer,
                     "25-34":sqlalchemy.types.Integer,
                     "35-44":sqlalchemy.types.Integer,
                     "45-54":sqlalchemy.types.Integer,
                     "55-64":sqlalchemy.types.Integer,
                     "65-74":sqlalchemy.types.Integer,
                     "75-84":sqlalchemy.types.Integer,
                     "85plus":sqlalchemy.types.Integer,
                     "median_age":sqlalchemy.types.Float(2),
                     "area_sq_km":sqlalchemy.types.Integer,
                     "ppl_per_sq_km":sqlalchemy.types.Integer})

# ADD PRIMARY KEY TO CREATED TABLE.
# engine.begin() COMMITS ON EXIT (engine.execute() WAS REMOVED IN SQLALCHEMY 2.0).
with engine.begin() as conn:
    conn.execute(sqlalchemy.text("""
                                 ALTER TABLE uk_pop_stats ADD PRIMARY KEY (code)
                                 """))

print("The \033[1muk_pop\033[0m DataFrame has been successfully saved to Postgres under the table name \033[1muk_pop_stats\033[0m.\n")

The uk_pop DataFrame has been successfully saved to Postgres under the table name uk_pop_stats.



---
### 3.2. CREATE 3 VIEWS FROM ABOVE TABLE DATA. 

In [11]:
# CREATE 3 VIEWS OF uk_pop_stats TABLE - USED TO PRODUCE 3 POPULATION ESTIMATE MAPS BASED ON THE ADMINISTRATIVE GEOGRAPHIES OF UK.
# engine.begin() COMMITS ON EXIT (engine.execute() WAS REMOVED IN SQLALCHEMY 2.0).
with engine.begin() as conn:
    conn.execute(sqlalchemy.text("""
                              CREATE VIEW uk_pop_stats_countries AS
                              SELECT * FROM uk_pop_stats
                              WHERE name IN ('scotland','england','wales','northern ireland');
                              CREATE VIEW uk_pop_stats_regions AS
                              SELECT * FROM uk_pop_stats
                              WHERE geography IN ('region') OR name IN ('scotland','wales','northern ireland');
                              CREATE VIEW uk_pop_stats_lad20 AS
                              SELECT * FROM uk_pop_stats
                              WHERE geography IN ('metropolitan district','non-metropolitan district',
                              'unitary authority','london borough','council area','local government district');
                              """))
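
The filtering behaviour of the three views can be tried out without a Postgres server. The sketch below uses an in-memory SQLite database from the standard library as a stand-in (an assumption made purely for illustration; the notebook itself targets Postgres) and loads a few rows taken from the output above, lowercased as `data_prep` does. It shows why the countries view filters on `name`: aggregate rows such as `united kingdom` and `great britain` also carry the geography `country`.

```python
import sqlite3

# SAMPLE ROWS FROM THE NOTEBOOK OUTPUT (code, name, geography), LOWERCASED AS IN data_prep.
rows = [
    ("K02000001", "united kingdom", "country"),
    ("K03000001", "great britain", "country"),
    ("K04000001", "england and wales", "country"),
    ("E92000001", "england", "country"),
    ("E12000001", "north east", "region"),
    ("N09000009", "mid ulster", "local government district"),
]

# IN-MEMORY SQLITE DATABASE USED AS A STAND-IN FOR POSTGRES.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE uk_pop_stats (code TEXT PRIMARY KEY, name TEXT, geography TEXT)")
conn.executemany("INSERT INTO uk_pop_stats VALUES (?, ?, ?)", rows)

# SAME VIEW DEFINITIONS AS IN THE NOTEBOOK.
conn.executescript("""
    CREATE VIEW uk_pop_stats_countries AS
    SELECT * FROM uk_pop_stats
    WHERE name IN ('scotland','england','wales','northern ireland');
    CREATE VIEW uk_pop_stats_regions AS
    SELECT * FROM uk_pop_stats
    WHERE geography IN ('region') OR name IN ('scotland','wales','northern ireland');
    CREATE VIEW uk_pop_stats_lad20 AS
    SELECT * FROM uk_pop_stats
    WHERE geography IN ('metropolitan district','non-metropolitan district',
    'unitary authority','london borough','council area','local government district');
""")

countries = [row[0] for row in conn.execute("SELECT name FROM uk_pop_stats_countries")]
regions = [row[0] for row in conn.execute("SELECT name FROM uk_pop_stats_regions")]
lads = [row[0] for row in conn.execute("SELECT name FROM uk_pop_stats_lad20")]
print(countries, regions, lads)
conn.close()
```

Of the six sample rows, only `england` lands in the countries view, `north east` in the regions view and `mid ulster` in the local authority view; the three aggregate rows are excluded from all of them.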

---
---
## 4. CLOSE ALL CONNECTIONS TO POSTGRES DATABASE.

In [12]:
def disconnect_from_postgres():
    """
    Completely disconnect from Postgres.
    """
    engine.dispose() 
    print("All connections to Postgres have been terminated.")

In [13]:
disconnect_from_postgres()

All connections to Postgres have been terminated.
