## Converting the riding boundary shapefile into GeoJSON and TopoJSON

* TopoJSON Wiki: https://github.com/mbostock/topojson/wiki
* Dorling Cartograms: http://www.ncgia.ucsb.edu/projects/Cartogram_Central/types.html
* TopoJSON API doc: https://github.com/mbostock/topojson/wiki/API-Reference

In [18]:
import subprocess

In [19]:
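def exec_command(command_text):
    # subprocess.call returns the command's exit status; any non-zero value means failure
    p = subprocess.call(command_text, shell=True)
    if p != 0:
        raise Exception('failed command (exit status %d): %s' % (p, command_text))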
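A slightly more defensive way to run the same external tools, offered as a sketch: `run_command` is a hypothetical helper, not part of the original notebook. It relies on `subprocess.check_call`, which raises `CalledProcessError` for any non-zero exit status, and takes an argument list instead of a shell string, so paths containing spaces or shell metacharacters are passed through untouched.

```python
import subprocess
import sys


def run_command(args):
    """Run an external command given as a list of arguments.

    check_call raises subprocess.CalledProcessError for any non-zero exit
    status; no shell is involved, so each list item reaches the program as-is.
    """
    subprocess.check_call(args)


# The Python interpreter stands in for ogr2ogr/topojson here: exits 0, returns quietly.
run_command([sys.executable, "-c", "pass"])
```

For the conversion below this would look like `run_command(["ogr2ogr", "-f", "GeoJSON", out_path, shp_path])`, where `out_path` and `shp_path` are placeholder names.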

In [20]:
data_path = "../data/fed_cf_CA_2_1_shp_en/"

These are the steps from the "Let's Make a Map" tutorial: http://bost.ocks.org/mike/map/

In [10]:
exec_command("ogr2ogr -f GeoJSON " + data_path + "places.json " + data_path + "FED_CA_2_1_en.shp")

In [11]:
exec_command("topojson -o " + data_path + "ca_wrong.json -- " + data_path + "places.json")

That only renders as a black rectangle, and it takes a really long time.

Loading the shapefile into Google Earth shows the ridings correctly, so the shapefile itself does not seem to be corrupted.
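One hedged way to narrow this down is to check whether the GeoJSON coordinates are still projected (large metre-style numbers) rather than longitude/latitude. The helpers `coordinate_bounds` and `looks_like_lon_lat` are hypothetical names, and the sample polygon below is made-up illustrative data, not taken from the riding file.

```python
def coordinate_bounds(feature_collection):
    """Return (min_x, min_y, max_x, max_y) over every position in a
    GeoJSON FeatureCollection (Polygon, MultiPolygon, LineString, ...)."""
    xs, ys = [], []

    def walk(coords):
        # A position is a list of numbers; anything else is a nested list.
        if coords and isinstance(coords[0], (int, float)):
            xs.append(coords[0])
            ys.append(coords[1])
        else:
            for c in coords:
                walk(c)

    for feature in feature_collection["features"]:
        walk(feature["geometry"]["coordinates"])
    return min(xs), min(ys), max(xs), max(ys)


def looks_like_lon_lat(bounds):
    """True if all coordinates fit within longitude/latitude ranges."""
    min_x, min_y, max_x, max_y = bounds
    return -180 <= min_x <= max_x <= 180 and -90 <= min_y <= max_y <= 90


# Hypothetical projected polygon (coordinates in metres).
projected = {"type": "FeatureCollection", "features": [
    {"type": "Feature", "properties": {},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[1500000.0, 600000.0], [1510000.0, 600000.0],
                                   [1510000.0, 610000.0], [1500000.0, 600000.0]]]}}]}
print(looks_like_lon_lat(coordinate_bounds(projected)))  # False
```

Running the same two functions on the `places.json` produced above would show which kind of coordinates it holds.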

More documentation: http://www.gdal.org/drv_shapefile.html

After reading http://stackoverflow.com/questions/22081863/problems-converting-from-shape-to-topojson, it looks like I can use topojson to convert the shapefile directly (without the projection file and others) and keep the data in the conic conformal projection.
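The variants tried in the cells that follow differ only in `--width`/`--height`, `--margin` and `--simplify`, so they could be generated from a small list of options. This is a sketch: `topojson_args` is a hypothetical helper that only emits flags already used in this notebook, and the `ca_variant_*.json` output names are placeholders.

```python
def topojson_args(shapefile, output, height=None, width=None,
                  margin=20, simplify=None, keep_properties=True):
    """Build an argument list for the topojson command-line tool,
    using only the flags that appear in this notebook's commands."""
    args = ["topojson"]
    if keep_properties:
        args.append("-p")
    if width is not None:
        args.append("--width=%d" % width)
    if height is not None:
        args.append("--height=%d" % height)
    if margin is not None:
        args += ["--margin", str(margin)]
    if simplify is not None:
        args.append("--simplify=%s" % simplify)
    args += ["-o", output, "--", shapefile]
    return args


shp = "../data/fed_cf_CA_2_1_shp_en/FED_CA_2_1_en.shp"
variants = [{"height": 800, "simplify": 0.1}, {"height": 4000}]
for i, opts in enumerate(variants, start=1):
    print(" ".join(topojson_args(shp, "../data/ca_variant_%d.json" % i, **opts)))
```

Each returned list can be handed straight to `subprocess.check_call`, which avoids building shell strings by concatenation.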

In [17]:
exec_command("""topojson -p --width=960 --margin 20 --simplify=.1 -o ../data/ca_w_prop.json -- ../data/fed_cf_CA_2_1_shp_en/FED_CA_2_1_en.shp""")

In [215]:
exec_command("""topojson -p --height=800 --margin 20 --simplify=.1 -o ../data/ca_w_prop_1.json -- ../data/fed_cf_CA_2_1_shp_en/FED_CA_2_1_en.shp""")

In [21]:
exec_command("""topojson -p --height=800 --margin 20 -o ../data/ca_w_prop_3.json -- ../data/fed_cf_CA_2_1_shp_en/FED_CA_2_1_en.shp""")

In [22]:
exec_command("""topojson -p --height=1800 --simplify=.1 --margin 20 -o ../data/ca_w_prop_4.json -- ../data/fed_cf_CA_2_1_shp_en/FED_CA_2_1_en.shp""")

In [23]:
exec_command("""topojson -p --height=3000 --simplify=.1 --margin 20 -o ../data/ca_w_prop_5.json -- ../data/fed_cf_CA_2_1_shp_en/FED_CA_2_1_en.shp""")

In [24]:
exec_command("""topojson -p --height=4000 --simplify=.1 --margin 20 -o ../data/ca_w_prop_6.json -- ../data/fed_cf_CA_2_1_shp_en/FED_CA_2_1_en.shp""")

In [30]:
exec_command("""topojson -p --height=4000 --margin 20 -o ../data/ca_w_prop_7.json -- ../data/fed_cf_CA_2_1_shp_en/FED_CA_2_1_en.shp""")

This one's a keeper, so it is saved again with the same options as `ca_w_prop.json`:

In [31]:
exec_command("""topojson -p --height=4000 --margin 20 -o ../data/ca_w_prop.json -- ../data/fed_cf_CA_2_1_shp_en/FED_CA_2_1_en.shp""")

The `-p` flag keeps all the original properties in the file, including links to the Elections Canada website, riding names, etc.
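A quick sanity check that the properties really survived the conversion might look like the sketch below. `missing_properties` is a hypothetical helper; it assumes the TopoJSON layout shown further down, where geometries live under `objects[<name>]["geometries"]`.

```python
def missing_properties(topology, object_name, required):
    """Return the indices of geometries in topology['objects'][object_name]
    that lack any of the required property keys."""
    geometries = topology["objects"][object_name]["geometries"]
    bad = []
    for i, geom in enumerate(geometries):
        props = geom.get("properties") or {}
        if any(key not in props for key in required):
            bad.append(i)
    return bad
```

Usage, assuming the file written above: `missing_properties(json.load(open("../data/ca_w_prop.json")), "FED_CA_2_1_en", ["ENNAME", "FEDNUM", "PROVCODE"])` should return an empty list if every feature kept its properties.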

## Generate fake prediction data 

In [1]:
import json

In [2]:
with open("../data/ca_w_prop.json", 'r') as f:
    ca_w_prop = json.load(f)

In [3]:
list_elements = ca_w_prop['objects']['FED_CA_2_1_en']['geometries']

In [4]:
list_elements[0]

{u'arcs': [[0, 1, 2, 3]],
 u'properties': {u'CREADT': u'20131005',
  u'DECPOPCNT': 108901,
  u'ENLEGALDSC': u'http://www.elections.ca/res/cir/maps2/mapprov.asp?map=48010&lang=e#descrip',
  u'ENNAME': u'Calgary Rocky Ridge',
  u'FEDNUM': 48010,
  u'FRLEGALDSC': u'http://www.elections.ca/res/cir/maps2/mapprov.asp?map=48010&lang=f#descrip',
  u'FRNAME': u'Calgary Rocky Ridge',
  u'NID': u'{B0A17501-2D8B-4912-AAA1-C9CE1ACC6327}',
  u'PROVCODE': u'AB',
  u'QUIPOPCNT': 0,
  u'REPORDER': u'2013',
  u'REVDT': None},
 u'type': u'Polygon'}

In [5]:
import pandas as pd

In [6]:
df = pd.DataFrame(columns=['ENNAME', 'FEDNUM', 'PROVCODE', 'DECPOPCNT'], index=range(len(list_elements)))

In [7]:
for c in df.columns:
    df[c] = [x['properties'][c] for x in list_elements]
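The cell above fills the DataFrame column by column. The same selection can be sketched in plain Python as one dict per geometry; `select_properties` is a hypothetical name, and `pd.DataFrame(rows, columns=cols)` accepts such a list of dicts directly. The sample record is the first geometry printed above, trimmed to a few properties.

```python
def select_properties(geometries, columns):
    """Return one dict per geometry holding only the requested property keys."""
    return [{c: g["properties"][c] for c in columns} for g in geometries]


cols = ["ENNAME", "FEDNUM", "PROVCODE", "DECPOPCNT"]
sample = [{"type": "Polygon", "arcs": [[0, 1, 2, 3]],
           "properties": {"ENNAME": "Calgary Rocky Ridge", "FEDNUM": 48010,
                          "PROVCODE": "AB", "DECPOPCNT": 108901, "QUIPOPCNT": 0}}]
rows = select_properties(sample, cols)
print(rows)
```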

In [8]:
names = []
for i in df.index:
    if df['ENNAME'][i] in names:
        print df['ENNAME'][i]
    names.append(df['ENNAME'][i])

Selkirk--Interlake--Eastman
Selkirk--Interlake--Eastman
Selkirk--Interlake--Eastman
Gaspésie--Les Îles-de-la-Madeleine
Manicouagan
Manicouagan
Halifax
Courtenay--Alberni
Brandon--Souris


There are now 338 federal ridings, but the TopoJSON file contains 347 feature objects. Printing the repeated names shows why: some ridings are not contiguous and consist of more than one feature object, and the nine repeated names above account for the 347 - 338 = 9 extra features.
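Counting features per riding number gives the same picture without the nested loop. This is a sketch with a hypothetical helper name, `multi_part_ridings`; keying on `FEDNUM` rather than `ENNAME` avoids relying on names being unique.

```python
from collections import Counter


def multi_part_ridings(geometries):
    """Map FEDNUM -> number of features, keeping only ridings that are
    split across more than one feature."""
    counts = Counter(g["properties"]["FEDNUM"] for g in geometries)
    return {fednum: n for fednum, n in counts.items() if n > 1}
```

Applied to `list_elements`, `sum(n - 1 for n in multi_part_ridings(list_elements).values())` should come to the 9 extra features noted above.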

In [9]:
df.query("ENNAME == 'Manicouagan'")

|     | ENNAME      | FEDNUM | PROVCODE | DECPOPCNT |
|-----|-------------|--------|----------|-----------|
| 228 | Manicouagan | 24046  | QC       | 94766     |
| 229 | Manicouagan | 24046  | QC       | 94766     |
| 254 | Manicouagan | 24046  | QC       | 94766     |


In [10]:
print len(set(df['ENNAME']))
print len(set(df['FEDNUM']))

338
338


#### The polling data to add should have the following form: 

In [11]:
parties = ['ndp', 'con', 'lib', 'grn', 'blc', 'oth']
# placeholder values; each party's entry satisfies low < med < high
dataz = {"ENNAME": "abc",
         "FEDNUM": 124,
         "leader": "ndp",
         "likelyhood": "81%",
         "details": {
             "con": {"low": '36.0', "med": '40.0', 'high': '44.0'},
             "lib": {"low": '36.0', "med": '40.0', 'high': '44.0'},
             "ndp": {"low": '36.0', "med": '40.0', 'high': '44.0'},
             "blc": {"low": '36.0', "med": '40.0', 'high': '44.0'},
             "grn": {"low": '36.0', "med": '40.0', 'high': '44.0'},
             "oth": {"low": '36.0', "med": '40.0', 'high': '44.0'}}
         }

Additional conditions:

* `low`, `med` and `high` should be between 0 and 100
* `low` < `med` < `high`
* `likelyhood` should be between 50 and 90
* the `leader` should be the party with the highest `med` value
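The conditions above can be checked mechanically. The validator below is a sketch: `prediction_errors` is a hypothetical name, it parses the string values with `float()`, and it tolerates a trailing `%` on `likelyhood` because the format example uses one while generated data may not.

```python
def prediction_errors(pred, parties=("ndp", "con", "lib", "grn", "blc", "oth")):
    """Return a list of the conditions that `pred` violates (empty = valid)."""
    errors = []
    meds = {}
    for party in parties:
        d = pred["details"][party]
        low, med, high = float(d["low"]), float(d["med"]), float(d["high"])
        meds[party] = med
        if not all(0 <= v <= 100 for v in (low, med, high)):
            errors.append("%s: value outside 0-100" % party)
        if not low < med < high:
            errors.append("%s: expected low < med < high" % party)
    likelyhood = float(str(pred["likelyhood"]).rstrip("%"))
    if not 50 <= likelyhood <= 90:
        errors.append("likelyhood outside 50-90")
    leader = pred["leader"]
    if leader not in meds or meds[leader] < max(meds.values()):
        errors.append("leader does not have the highest med")
    return errors
```

Running it over every value in a generated prediction dict is a cheap way to catch a bad record before it reaches the map.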

In [12]:
import numpy as np

In [13]:
fednum_dict = {}
for i in df.index:
    if df.FEDNUM[i] not in fednum_dict:
        fednum_dict[df.FEDNUM[i]] = df.ENNAME[i]

In [14]:
fednum_dict

{10001: u'Avalon',
 10002: u'Bonavista--Burin--Trinity',
 10003: u'Coast of Bays--Central--Notre Dame',
 10004: u'Labrador',
 10005: u'Long Range Mountains',
 10006: u"St. John's East",
 10007: u"St. John's South--Mount Pearl",
 11001: u'Cardigan',
 11002: u'Charlottetown',
 11003: u'Egmont',
 11004: u'Malpeque',
 12001: u'Cape Breton--Canso',
 12002: u'Central Nova',
 12003: u'Cumberland--Colchester',
 12004: u'Dartmouth--Cole Harbour',
 12005: u'Halifax',
 12006: u'Halifax West',
 12007: u'Kings--Hants',
 12008: u'Sackville--Preston--Chezzetcook',
 12009: u'South Shore--St. Margarets',
 12010: u'Sydney--Victoria',
 12011: u'West Nova',
 13001: u'Acadie--Bathurst',
 13002: u'Beaus\xe9jour',
 13003: u'Fredericton',
 13004: u'Fundy Royal',
 13005: u'Madawaska--Restigouche',
 13006: u'Miramichi--Grand Lake',
 13007: u'Moncton--Riverview--Dieppe',
 13008: u'New Brunswick Southwest',
 13009: u'Saint John--Rothesay',
 13010: u'Tobique--Mactaquac',
 24001: u'Abitibi--Baie-James--Nunavik--Eey

In [17]:
with open('fednum_dict.py', 'w') as f:
    # repr() of the dict is a valid Python literal, so the file can be imported later
    f.write('fednum_dict = ' + repr(fednum_dict) + '\n')
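If the lookup is also needed outside Python (for example by the map's JavaScript), JSON may be a more portable format than a `.py` file. A sketch with hypothetical helpers `save_lookup`/`load_lookup`: JSON object keys are always strings, so the integer `FEDNUM` keys have to be converted back on load. The same applies to `fake_predictions.json` below, whose integer keys `json.dump` writes as strings.

```python
import json


def save_lookup(mapping, path):
    """Write a {FEDNUM: name} dict as JSON; keys are stored as strings."""
    with open(path, "w") as f:
        json.dump({str(k): v for k, v in mapping.items()}, f)


def load_lookup(path):
    """Read the JSON back, restoring integer FEDNUM keys."""
    with open(path) as f:
        return {int(k): v for k, v in json.load(f).items()}
```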

In [205]:
prediction = {}
for n in set(df['FEDNUM']):
    p = {'FEDNUM': n, 'ENNAME': fednum_dict[n]}

    # one random raw score per party; dividing by their sum turns them into shares of 100%
    vals = {party: np.random.rand() for party in parties}
    s = sum(vals.values())

    # the largest raw score gives the largest share, so the leader has the highest 'med'
    p['leader'] = max(vals, key=vals.get)
    p['likelyhood'] = str(np.random.randint(50, 90))  # upper bound is exclusive: 50..89

    details = {}
    for party in parties:
        o = 100 * vals[party] / s  # this party's share in percent
        # low/high form a +/-10% band around the median estimate
        details[party] = {"low": str(np.round(o - o*0.1, 1)),
                          "med": str(np.round(o, 1)),
                          "high": str(np.round(o + o*0.1, 1))}

    p['details'] = details
    prediction[n] = p
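One subtlety in the generator above: a party that draws a raw score near zero gets a share so small that `low`, `med` and `high` can round to the same value, breaking `low < med < high`. The stdlib sketch below is a hypothetical alternative (`fake_prediction` is not from the notebook). It draws scores from [0.1, 1.0] so every share is at least roughly 2%, which keeps the ±10% band strictly ordered after rounding, and it takes a `random.Random` instance so fake data can be reproduced from a seed. Note that `random.randint(50, 90)` includes 90, whereas `np.random.randint(50, 90)` stops at 89.

```python
import random

PARTIES = ["ndp", "con", "lib", "grn", "blc", "oth"]


def fake_prediction(fednum, name, rng, parties=PARTIES):
    """Build one fake riding prediction in the format described above."""
    raw = {p: rng.uniform(0.1, 1.0) for p in parties}
    total = sum(raw.values())
    details = {}
    for p in parties:
        share = 100.0 * raw[p] / total  # percent; at least ~1.96
        details[p] = {"low": "%.1f" % (share * 0.9),
                      "med": "%.1f" % share,
                      "high": "%.1f" % (share * 1.1)}
    return {"FEDNUM": fednum,
            "ENNAME": name,
            "leader": max(raw, key=raw.get),  # largest share -> highest med
            "likelyhood": str(rng.randint(50, 90)),  # 50..90 inclusive
            "details": details}


rng = random.Random(42)  # fixed seed for reproducible fake data
print(fake_prediction(10001, "Avalon", rng)["leader"])
```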

In [206]:
fsave = open('../data/fake_predictions.json', 'w')

In [207]:
json.dump(prediction, fsave)

In [208]:
fsave.close()

In [209]:
prediction

{10001: {'ENNAME': u'Avalon',
  'FEDNUM': 10001,
  'details': {'blc': {'high': '15.2', 'low': '12.5', 'med': '13.8'},
   'con': {'high': '10.0', 'low': '8.1', 'med': '9.0'},
   'grn': {'high': '25.7', 'low': '21.0', 'med': '23.3'},
   'lib': {'high': '19.5', 'low': '15.9', 'med': '17.7'},
   'ndp': {'high': '12.8', 'low': '10.5', 'med': '11.6'},
   'oth': {'high': '26.9', 'low': '22.0', 'med': '24.5'}},
  'leader': 'oth',
  'likelyhood': '54'},
 10002: {'ENNAME': u'Bonavista--Burin--Trinity',
  'FEDNUM': 10002,
  'details': {'blc': {'high': '27.8', 'low': '22.7', 'med': '25.3'},
   'con': {'high': '30.8', 'low': '25.2', 'med': '28.0'},
   'grn': {'high': '18.6', 'low': '15.2', 'med': '16.9'},
   'lib': {'high': '8.6', 'low': '7.1', 'med': '7.9'},
   'ndp': {'high': '14.5', 'low': '11.8', 'med': '13.2'},
   'oth': {'high': '9.7', 'low': '7.9', 'med': '8.8'}},
  'leader': 'con',
  'likelyhood': '68'},
 10003: {'ENNAME': u'Coast of Bays--Central--Notre Dame',
  'FEDNUM': 10003,
  'details

## Let's get the riding prediction data from 308

Crap, they're all image files: http://1.bp.blogspot.com/-Qhio6fQmHbQ/Vddn6BJ1qwI/AAAAAAAAXUY/O2qwrYlRXv8/s1600/Ridings%2B1.png

To get some dummy data, maybe I could just generate it randomly?!

Or try OCR:

In [55]:
def download(url):
    "adapted from http://stackoverflow.com/questions/22676/how-do-i-download-a-file-over-http-using-python"
    import urllib2

    # name the local file after the last URL component, replacing escapes like %2B
    file_name = url.split('/')[-1].replace("%", '_')
    u = urllib2.urlopen(url)
    meta = u.info()
    file_size = int(meta.getheaders("Content-Length")[0])
    print "Downloading: %s Bytes: %s" % (file_name, file_size)

    # stream the response to disk in 8 KB blocks
    block_sz = 8192
    with open(file_name, 'wb') as f:
        while True:
            chunk = u.read(block_sz)
            if not chunk:
                break
            f.write(chunk)
    u.close()
    return file_name
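The `download` helper above uses `urllib2`, which exists only in Python 2. A possible Python 3 equivalent is sketched below; the `dest_dir` parameter is an addition for illustration, and `shutil.copyfileobj` does the chunked copy that the manual loop performs.

```python
import os
import shutil
import urllib.request


def download(url, dest_dir="."):
    """Python 3 sketch: stream the response at `url` to a local file
    named after the last URL component and return its path."""
    file_name = url.split("/")[-1].replace("%", "_")
    path = os.path.join(dest_dir, file_name)
    with urllib.request.urlopen(url) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out, 8192)
    return path
```

Usage would be the same as before, e.g. `fname = download(url)` with the image URL defined in the next cell.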

In [56]:
url = "http://1.bp.blogspot.com/-Qhio6fQmHbQ/Vddn6BJ1qwI/AAAAAAAAXUY/O2qwrYlRXv8/s1600/Ridings%2B1.png"

In [57]:
fname = download(url)

Downloading: Ridings_2B1.png Bytes: 297608


In [66]:
fname

'Ridings_2B1.png'

In [70]:
from PIL import Image

image = Image.open(fname)
# image.show()
image.save('a.tif')


In [68]:
im = Image.open('a.tif')

In [63]:
import pytesseract as tess

In [69]:
tess.image_to_string(im)

AttributeError: 'NoneType' object has no attribute 'bands'

In [49]:
f2 = open(fname, 'r')

In [50]:
tess.image_to_string(f2)

AttributeError: 'file' object has no attribute 'split'

In [71]:
cd pytesser_v0.0.1/

/Users/stephenmcmurtry/work/election_map/python/pytesser_v0.0.1


In [72]:
import pytesser

In [81]:
img = Image.open(fname)

In [82]:
pytesser.image_to_string(img)

OSError: [Errno 2] No such file or directory

In [80]:
pytesser.image_file_to_string("pytesser_v0.0.1/fnord.tif")

OSError: [Errno 2] No such file or directory

In [75]:
f = open("../a.tif")

In [78]:
cd ..


/Users/stephenmcmurtry/work/election_map/python
