## Task A - Download the geometries from the source 

In [8]:
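```python
import wget  # the PyPI `wget` module

print('Beginning file download with wget module')

url = 'https://raw.githubusercontent.com/datasets/geo-countries/master/data/countries.geojson'
# wget.download returns the name of the file it actually wrote
wget.download(url, 'countries.geojson')
```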


Beginning file download with wget module


'countries (1).geojson'
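The output shows that the file was saved as `countries (1).geojson`. The `wget` module does not overwrite an existing file; it adds a numeric suffix instead, so a re-run leaves the new download next to the old one. The following is a minimal standard-library sketch, assuming that overwriting `countries.geojson` is the desired behaviour. It downloads to a fixed path and then checks that the result parses as a GeoJSON FeatureCollection. The helper name `download_geojson` is hypothetical.

```python
import json
import urllib.request

URL = "https://raw.githubusercontent.com/datasets/geo-countries/master/data/countries.geojson"


def download_geojson(url: str, dest: str) -> int:
    """Download url to dest (replacing any existing file) and return the feature count."""
    # urlretrieve writes to exactly `dest`, overwriting it if it already exists
    urllib.request.urlretrieve(url, dest)
    with open(dest, encoding="utf-8") as f:
        data = json.load(f)
    # a GeoJSON FeatureCollection keeps its geometries in the "features" list
    if data.get("type") != "FeatureCollection":
        raise ValueError(f"{dest} is not a GeoJSON FeatureCollection")
    return len(data["features"])


# usage: download_geojson(URL, "countries.geojson")
```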

## Task B - Import the geometries into PostGIS
### ogr2ogr tool
I use `ogr2ogr` to import the GeoJSON file into PostGIS. The PostGIS extension must first be enabled in the target database:
`CREATE EXTENSION postgis;`

Command to import `countries.geojson` into a new table `countries`:

`ogr2ogr -f PostgreSQL PG:"dbname=positiumDB user=enlik" countries.geojson -nln countries -nlt MULTIPOLYGON`

[Reference](https://gdal.org/programs/ogr2ogr.html)
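To make this step repeatable from the notebook, the same command can be run through `subprocess`. This sketch reproduces only the flags used above. It assumes `ogr2ogr` is on `PATH`, and the helper names `ogr2ogr_cmd` and `import_geojson` are hypothetical.

```python
import subprocess


def ogr2ogr_cmd(src: str, layer: str, dbname: str, user: str) -> list[str]:
    """Build the ogr2ogr argument list shown above (no shell quoting needed)."""
    return [
        "ogr2ogr",
        "-f", "PostgreSQL",                 # output driver
        f"PG:dbname={dbname} user={user}",  # one argument, so the shell quotes are unnecessary
        src,                                # input GeoJSON file
        "-nln", layer,                      # name of the new table
        "-nlt", "MULTIPOLYGON",             # geometry type of the created layer
    ]


def import_geojson(src: str, layer: str, dbname: str, user: str) -> None:
    # check=True raises CalledProcessError if ogr2ogr exits with a non-zero status
    subprocess.run(ogr2ogr_cmd(src, layer, dbname, user), check=True)


# usage: import_geojson("countries.geojson", "countries", "positiumDB", "enlik")
```

Passing a list instead of a single string avoids `shell=True`, so the connection string needs no extra quoting.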

Then I exported the `countries` table to `countries.csv` with the [Postico](https://eggerapps.at/postico/) macOS app.
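The Postico export is a manual step. A scripted alternative is PostgreSQL's `COPY ... TO STDOUT`, which psycopg2 exposes as `cursor.copy_expert`. This is a sketch that assumes the exported columns are the ones the notebook reads later. The helper names are hypothetical, and table and column names are interpolated verbatim, so they must be trusted identifiers. A geometry column comes out in text form as hex-encoded EWKB, which is the `0106000020E6...` form visible in the `wkb_geometry` column below.

```python
import contextlib


def copy_to_csv_sql(table: str, columns: list[str]) -> str:
    """COPY statement that streams the selected columns as CSV with a header row."""
    # identifiers are inserted as-is: only pass trusted table/column names
    cols = ", ".join(columns)
    return f"COPY (SELECT {cols} FROM {table}) TO STDOUT WITH (FORMAT csv, HEADER)"


def export_table_to_csv(dsn: str, table: str, columns: list[str], path: str) -> None:
    import psycopg2  # imported here so copy_to_csv_sql works without the driver installed

    with contextlib.closing(psycopg2.connect(dsn)) as conn:
        with conn.cursor() as cur, open(path, "w", newline="", encoding="utf-8") as f:
            cur.copy_expert(copy_to_csv_sql(table, columns), f)


# usage:
# export_table_to_csv("host=localhost dbname=positiumDB user=enlik", "countries",
#                     ["ogc_fid", "admin", "iso_a3", "iso_a2", "wkb_geometry"], "countries.csv")
```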

### Import the countries.csv file into table test_gis.geom

In [43]:
```python
import pandas as pd
import psycopg2  # PostgreSQL adapter for Python
import psycopg2.extras  # provides execute_batch for multi-row inserts
```

In [44]:
```python
countries = pd.read_csv("countries.csv")
```

In [45]:
```python
countries.head()
```

|   | ogc_fid | admin | iso_a3 | iso_a2 | wkb_geometry |
|---|---|---|---|---|---|
| 0 | 1 | Aruba | ABW | AW | 0106000020E6100000010000000103000000010000001A... |
| 1 | 2 | Afghanistan | AFG | AF | 0106000020E610000001000000010300000001000000FD... |
| 2 | 3 | Angola | AGO | AO | 0106000020E6100000030000000103000000010000000E... |
| 3 | 4 | Anguilla | AIA | AI | 0106000020E61000000200000001030000000100000018... |
| 4 | 5 | Albania | ALB | AL | 0106000020E6100000010000000103000000010000002D... |


In [46]:
```python
countries = countries.filter(['ogc_fid', 'wkb_geometry'])
```

In [47]:
```python
countries = countries.rename(columns={"ogc_fid": "geom_id", "wkb_geometry": "geom"})
```
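The filter and rename steps can also be written as a single column selection. In this sketch, the helper name `geom_rows` is hypothetical. It returns plain tuples, which is a simple row format to hand to `psycopg2.extras.execute_batch`.

```python
import pandas as pd


def geom_rows(countries: pd.DataFrame) -> list[tuple]:
    """Select ogc_fid/wkb_geometry, rename them to geom_id/geom, return rows as tuples."""
    renamed = countries[["ogc_fid", "wkb_geometry"]].rename(
        columns={"ogc_fid": "geom_id", "wkb_geometry": "geom"}
    )
    # itertuples(index=False, name=None) yields plain tuples without the index
    return list(renamed.itertuples(index=False, name=None))
```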

In [50]:
```python
df_columns = list(countries)
df_columns
```

['geom_id', 'geom']

In [51]:
```python
columns = ",".join(df_columns)
columns
```

'geom_id,geom'

In [52]:
```python
values = "VALUES({})".format(",".join(["%s" for _ in df_columns]))
values
```

'VALUES(%s,%s)'

In [53]:
```python
table = 'test_gis.geom'
table
```

'test_gis.geom'
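The cells above assemble the statement with `str.format`, which pastes the table and column names into the SQL unchecked. The following stdlib-only sketch performs the same construction with a basic identifier check. The helper `build_insert` and its regex are assumptions, not part of the notebook. psycopg2 also ships a `psycopg2.sql` module (`sql.SQL`, `sql.Identifier`, `sql.Placeholder`) that quotes identifiers properly, which makes it the more thorough option.

```python
import re

# plain, unquoted PostgreSQL identifiers: letters, digits, underscores, no leading digit
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_insert(table: str, columns: list[str]) -> str:
    """Return 'INSERT INTO schema.table (c1,c2) VALUES(%s,%s)' after validating names."""
    for name in table.split(".") + list(columns):
        if not _IDENT.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    placeholders = ",".join(["%s"] * len(columns))
    return f"INSERT INTO {table} ({','.join(columns)}) VALUES({placeholders})"
```

For the columns used here, `build_insert("test_gis.geom", ["geom_id", "geom"])` produces the same string as the `format` calls in the notebook.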

In [55]:
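```python
conn = psycopg2.connect("host=localhost dbname=positiumDB user=enlik")
cur = conn.cursor()

# INSERT INTO test_gis.geom (geom_id,geom) VALUES(%s,%s)
insert_stmt = "INSERT INTO {} ({}) {}".format(table, columns, values)

# execute_batch groups the rows into pages (100 per round trip by default) instead of one execute per row
psycopg2.extras.execute_batch(cur, insert_stmt, countries.values)
conn.commit()
```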

All rows from `countries.csv` (exported from the original `countries.geojson`) were inserted into table `test_gis.geom`. The `ogc_fid` values went into column `geom_id` and the `wkb_geometry` values into `geom`.
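Rather than relying on the absence of an error, the insert can be checked by comparing the row count in the table with the number of rows in the DataFrame. This sketch assumes `test_gis.geom` was empty before this insert. `check_row_count` is a hypothetical helper that takes any DB-API cursor.

```python
def check_row_count(cur, table: str, expected: int) -> int:
    """Raise if `table` does not hold exactly `expected` rows; return the count."""
    # table is interpolated directly, so it must be a trusted name such as 'test_gis.geom'
    cur.execute(f"SELECT count(*) FROM {table}")
    (count,) = cur.fetchone()
    if count != expected:
        raise RuntimeError(f"{table}: expected {expected} rows, found {count}")
    return count


# usage: check_row_count(cur, "test_gis.geom", len(countries))
```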

## Task C - Link each iso_a2 code to a specific geometry

In [56]:
```python
a2_list = pd.read_csv("iso_a2_list.csv")
```

ParserError: Error tokenizing data. C error: Expected 1 fields in line 29, saw 2


Because the original `iso_a2_list.csv` fails to parse, I exported table `test_gis.iso_a2_list` to `iso_a2_list_new.csv` and used that file for the following steps.
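The error says pandas read a header with a single field but found two fields on line 29. This suggests that the file is not comma-separated, or that line 29 contains an unquoted comma. The offending lines can be located with the standard `csv` module before falling back to a database export. The following is a sketch. `find_bad_rows` is hypothetical, and the delimiter is a parameter because the file's real delimiter is not known here.

```python
import csv
from typing import Iterable


def find_bad_rows(lines: Iterable[str], delimiter: str = ",") -> list[tuple[int, int]]:
    """Return (line number, field count) for every row whose width differs from the header."""
    reader = csv.reader(lines, delimiter=delimiter)
    expected = len(next(reader))
    bad = []
    for row in reader:
        # skip blank lines, as pandas does by default
        if row and len(row) != expected:
            # reader.line_num counts physical lines read so far, header included
            bad.append((reader.line_num, len(row)))
    return bad


# usage:
# with open("iso_a2_list.csv", newline="", encoding="utf-8") as f:
#     print(find_bad_rows(f))
```

If every data row reports a different width from the header, the file probably uses another separator. `pd.read_csv` accepts that separator through its `sep` argument.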

In [57]:
```python
a2_list = pd.read_csv("iso_a2_list_new.csv")
a2_list = a2_list.filter(['iso_a2', 'country_name_eng', 'geom_id'])  # keep only the required columns
a2_list.head()
```

|   | iso_a2 | country_name_eng | geom_id |
|---|---|---|---|
| 0 | ae | United Arab Emirates |  |
| 1 | af | Afghanistan |  |
| 2 | ag | Antigua and Barbuda |  |
| 3 | ai | Anguilla |  |
| 4 | al | Albania |  |


In [59]:
```python
countries_new = pd.read_csv("countries.csv")
countries_new = countries_new.filter(['ogc_fid', 'iso_a2'])
countries_new['iso_a2'] = countries_new['iso_a2'].str.lower()  # lower-case the codes to match iso_a2_list
countries_new = countries_new.rename(columns={"ogc_fid": "geom_id"})
countries_new.head()
```

|   | geom_id | iso_a2 |
|---|---|---|
| 0 | 1 | aw |
| 1 | 2 | af |
| 2 | 3 | ao |
| 3 | 4 | ai |
| 4 | 5 | al |
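`pd.read_csv` has one caveat here. By default it treats the string `NA` as a missing value, so a country whose two-letter code is `NA` (Namibia) would load as `NaN` and could never match in the merge. Passing `keep_default_na=False` keeps such values as text, assuming the export writes that code literally. The following sketch repeats the cell above with that change. `load_iso_codes` is a hypothetical name.

```python
import pandas as pd


def load_iso_codes(src) -> pd.DataFrame:
    """Read ogc_fid/iso_a2 without turning 'NA' into NaN, and lower-case the codes."""
    df = pd.read_csv(
        src,
        usecols=["ogc_fid", "iso_a2"],
        keep_default_na=False,  # don't convert 'NA', 'NULL', 'N/A', ... to NaN
        dtype={"iso_a2": str},
    )
    df["iso_a2"] = df["iso_a2"].str.lower()
    return df.rename(columns={"ogc_fid": "geom_id"})
```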


### Merge the geom_id column on the iso_a2 value

In [22]:
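```python
# result = pd.merge(a2_list, countries_new, how='left', on='iso_a2')
# inner join: codes without a geometry are dropped; a code with several geometries yields several rows
result = pd.merge(a2_list, countries_new, on='iso_a2')
# both inputs have a geom_id column, so pandas suffixes them; keep the one from countries_new
result = result.drop(['geom_id_x'], axis=1)
result = result.rename(columns={'geom_id_y': 'geom_id'})
result.head(20)
```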

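|   | iso_a2 | country_name_eng | geom_id |
|---|---|---|---|
| 0 | ae | United Arab Emirates | 8 |
| 1 | af | Afghanistan | 2 |
| 2 | ag | Antigua and Barbuda | 15 |
| 3 | ai | Anguilla | 4 |
| 4 | al | Albania | 5 |
| 5 | am | Armenia | 10 |
| 6 | ao | Angola | 3 |
| 7 | aq | Antarctica | 12 |
| 8 | aq | Antarctica | 13 |
| 9 | ar | Argentina | 9 |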
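The merge shows two effects. The `_x`/`_y` suffixes appear only because `a2_list` carries an empty `geom_id` column. The default inner join silently drops codes that have no geometry, while `aq` matches two geometries (12 and 13). The following sketch takes the commented-out left-merge route instead. It drops the placeholder column first and uses `indicator=True` to report unmatched codes. `link_geometries` is a hypothetical helper.

```python
import pandas as pd


def link_geometries(a2_list: pd.DataFrame, countries_new: pd.DataFrame):
    """Left-join codes to geometries; return (linked rows, codes without a geometry)."""
    # drop the empty placeholder so the merge doesn't produce geom_id_x / geom_id_y
    left = a2_list.drop(columns=["geom_id"], errors="ignore")
    merged = left.merge(countries_new, on="iso_a2", how="left", indicator=True)
    # _merge is 'both' for matched rows and 'left_only' for codes with no geometry
    unmatched = merged.loc[merged["_merge"] == "left_only", "iso_a2"].tolist()
    linked = merged.loc[merged["_merge"] == "both"].drop(columns="_merge").copy()
    # NaNs from the left join made geom_id float; only matched rows remain, so restore int
    linked["geom_id"] = linked["geom_id"].astype(int)
    return linked.reset_index(drop=True), unmatched
```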


In [60]:
```python
df_columns = list(result)
df_columns

columns = ",".join(df_columns)
columns
```

'iso_a2,country_name_eng,geom_id'

In [61]:
```python
values = "VALUES({})".format(",".join(["%s" for _ in df_columns]))
values
```

'VALUES(%s,%s,%s)'

In [62]:
```python
table = 'test_gis.iso_a2_list_new'
table
```

'test_gis.iso_a2_list_new'

In [63]:
```python
conn = psycopg2.connect("host=localhost dbname=positiumDB user=enlik")
cur = conn.cursor()

cur.execute("""
CREATE TABLE test_gis.iso_a2_list_new (id serial primary key, iso_a2 varchar(2), country_name_eng varchar(50), geom_id int)
""")
conn.commit()
```

This cell creates the new table `test_gis.iso_a2_list_new`, which holds each code together with its linked `geom_id`.

In [64]:
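```python
# INSERT INTO test_gis.iso_a2_list_new (iso_a2,country_name_eng,geom_id) VALUES(%s,%s,%s)
insert_stmt = "INSERT INTO {} ({}) {}".format(table, columns, values)

psycopg2.extras.execute_batch(cur, insert_stmt, result.values)
conn.commit()
```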

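Each `iso_a2` code that matched a geometry in `countries.csv` is now stored in `test_gis.iso_a2_list_new` together with its `geom_id`. Because the merge is an inner join, codes without a matching geometry do not appear in the new table.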
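The task asks for one geometry per code, but the merge output already shows `aq` (Antarctica) linked to two `geom_id` values. The short check below, a sketch with a hypothetical helper, lists every code that ends up with more than one geometry so those cases can be reviewed before relying on the table.

```python
import pandas as pd


def codes_with_multiple_geometries(result: pd.DataFrame) -> dict[str, list[int]]:
    """Map each iso_a2 code linked to more than one geom_id to its sorted geom_ids."""
    out = {}
    for code, ids in result.groupby("iso_a2")["geom_id"]:
        if len(ids) > 1:
            out[code] = sorted(int(i) for i in ids)
    return out


# usage: codes_with_multiple_geometries(result)
```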