## Converting the riding boundary shapefile into GeoJSON and TopoJSON

* TopoJSON Wiki: https://github.com/mbostock/topojson/wiki
* Dorling Cartograms: http://www.ncgia.ucsb.edu/projects/Cartogram_Central/types.html
* TopoJSON API doc: https://github.com/mbostock/topojson/wiki/API-Reference

In [1]:
import subprocess

In [2]:
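def exec_command(command_text):
    # Run the command through the shell; any non-zero exit status is a failure.
    returncode = subprocess.call(command_text, shell=True)
    if returncode != 0:
        raise Exception('failed command: %s' % command_text)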
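
A variant of the helper above, sketched under the assumption that the command can be written as an argument list: `subprocess.check_call` raises `CalledProcessError` on any non-zero exit status, and leaving out `shell=True` avoids quoting problems in paths. The name `run_command` is hypothetical and is not used elsewhere in this notebook.

```python
import subprocess
import sys


def run_command(args):
    """Run a command given as a list of arguments.

    check_call raises subprocess.CalledProcessError when the exit
    status is non-zero, so no manual return-code check is needed.
    """
    subprocess.check_call(args)


# Demo with the current Python interpreter so the example runs anywhere.
run_command([sys.executable, "-c", "pass"])

try:
    run_command([sys.executable, "-c", "import sys; sys.exit(3)"])
except subprocess.CalledProcessError as err:
    failed_status = err.returncode  # exit status reported by the child
```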

In [3]:
data_path = "../data/fed_cf_CA_2_1_shp_en/"

These are the steps from the Let's Make a Map tutorial: http://bost.ocks.org/mike/map/

In [10]:
exec_command("ogr2ogr -f GeoJSON " + data_path + "places.json " + data_path + "FED_CA_2_1_en.shp")

In [11]:
exec_command("topojson -o " + data_path + "ca_wrong.json -- " + data_path + "places.json")

That only renders as a black rectangle, and it takes a really long time.

Loading the shapefile into Google Earth shows the ridings fine, so I don't think the shapefile is corrupted.
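
One alternative that this notebook does not try, offered only as an assumption about the cause: if the black rectangle comes from projected (non longitude/latitude) coordinates, `ogr2ogr` can reproject to WGS84 with its `-t_srs` option during the GeoJSON export. The sketch only builds the command string; `build_reproject_command` and the output name `places_wgs84.json` are hypothetical.

```python
def build_reproject_command(data_path, shp_name, out_name):
    """Build an ogr2ogr command that exports GeoJSON in WGS84 (EPSG:4326)."""
    return ("ogr2ogr -f GeoJSON -t_srs EPSG:4326 "
            + data_path + out_name + " "
            + data_path + shp_name)


data_path = "../data/fed_cf_CA_2_1_shp_en/"
command = build_reproject_command(data_path, "FED_CA_2_1_en.shp",
                                  "places_wgs84.json")
print(command)
```

The resulting string could be passed to `exec_command`. Reprojection needs the shapefile's `.prj` file next to the `.shp` so that `ogr2ogr` knows the source projection.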

More documentation: http://www.gdal.org/drv_shapefile.html

After reading http://stackoverflow.com/questions/22081863/problems-converting-from-shape-to-topojson, it looks like I can just use topojson to convert the shapefile directly (without the projection file and the other companion files) and keep the data in the conic conformal projection.

In [17]:
exec_command("""topojson -p --width=960 --margin 20 --simplify=.1 -o ../data/ca_w_prop.json -- ../data/fed_cf_CA_2_1_shp_en/FED_CA_2_1_en.shp""")

The -p flag keeps all the original properties in the file, including links to the Elections Canada website, riding names, etc.

## Generate fake prediction data 

In [83]:
import json

In [104]:
with open("../data/ca_w_prop.json", 'r') as f:
    ca_w_prop = json.load(f)

In [108]:
list_elements = ca_w_prop['objects']['FED_CA_2_1_en']['geometries']

In [109]:
list_elements[0]

{u'arcs': [[0, 1, 2, 3]],
 u'properties': {u'CREADT': u'20131005',
  u'DECPOPCNT': 108901,
  u'ENLEGALDSC': u'http://www.elections.ca/res/cir/maps2/mapprov.asp?map=48010&lang=e#descrip',
  u'ENNAME': u'Calgary Rocky Ridge',
  u'FEDNUM': 48010,
  u'FRLEGALDSC': u'http://www.elections.ca/res/cir/maps2/mapprov.asp?map=48010&lang=f#descrip',
  u'FRNAME': u'Calgary Rocky Ridge',
  u'NID': u'{B0A17501-2D8B-4912-AAA1-C9CE1ACC6327}',
  u'PROVCODE': u'AB',
  u'QUIPOPCNT': 0,
  u'REPORDER': u'2013',
  u'REVDT': None},
 u'type': u'Polygon'}

In [110]:
import pandas as pd

In [111]:
df = pd.DataFrame(columns=['ENNAME', 'FEDNUM', 'PROVCODE', 'DECPOPCNT'], index=range(len(list_elements)))

In [115]:
for c in df.columns:
    df[c] = [x['properties'][c] for x in list_elements]

In [118]:
names = []
for i in df.index:
    if df['ENNAME'][i] in names:
        print df['ENNAME'][i]
    names.append(df['ENNAME'][i])

Selkirk--Interlake--Eastman
Selkirk--Interlake--Eastman
Selkirk--Interlake--Eastman
Gaspésie--Les Îles-de-la-Madeleine
Manicouagan
Manicouagan
Halifax
Courtenay--Alberni
Brandon--Souris


There are now 338 federal ridings, but there are 347 feature objects in the TopoJSON file. Printing the repeated names shows why: some of the ridings are not contiguous and consist of more than one feature object.

In [119]:
df.query("ENNAME == 'Manicouagan'")

|     | ENNAME      | FEDNUM | PROVCODE | DECPOPCNT |
|-----|-------------|--------|----------|-----------|
| 228 | Manicouagan | 24046  | QC       | 94766     |
| 229 | Manicouagan | 24046  | QC       | 94766     |
| 254 | Manicouagan | 24046  | QC       | 94766     |
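
The same check can be run for every riding at once by counting feature objects per `FEDNUM`. The sketch below uses `collections.Counter` on a small hand-made list standing in for `list_elements`; the sample entries reuse values shown above but are otherwise illustrative, and `features_per_riding` is a hypothetical helper.

```python
from collections import Counter


def features_per_riding(geometries):
    """Count how many feature objects share each FEDNUM."""
    return Counter(g['properties']['FEDNUM'] for g in geometries)


# Illustrative stand-in for list_elements (not the real file contents).
sample = [
    {'properties': {'FEDNUM': 24046, 'ENNAME': 'Manicouagan'}},
    {'properties': {'FEDNUM': 24046, 'ENNAME': 'Manicouagan'}},
    {'properties': {'FEDNUM': 24046, 'ENNAME': 'Manicouagan'}},
    {'properties': {'FEDNUM': 48010, 'ENNAME': 'Calgary Rocky Ridge'}},
]

counts = features_per_riding(sample)
# Ridings made of more than one feature object.
multi_part = {fednum: n for fednum, n in counts.items() if n > 1}
print(multi_part)
```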


#### The polling data to add should have the following form: 

In [122]:
parties = ['NDP', 'Conservative', 'Liberal', 'Green', 'Bloc', 'Other']

In [125]:
len(set(df['ENNAME']))

338

In [126]:
len(set(df['FEDNUM']))

338

In [None]:
import numpy as np

In [130]:
prediction = {}
for n in set(df['FEDNUM']):
    o = {'leader': np.random.choice(parties),
         'likelyhood': np.random.randint(50, 100)}
    prediction[n] = o

In [132]:
fsave = open('../data/fake_predictions.json', 'w')

In [133]:
json.dump(prediction, fsave)

In [134]:
fsave.close()
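
One thing to watch when reading this file back: JSON object keys are always strings, so the integer `FEDNUM` keys come back as text. Below is a minimal sketch of the round trip. It swaps in the standard-library `random` module with a fixed seed in place of `np.random` so the output is repeatable; the two FEDNUM values are taken from the outputs above, and nothing is written to disk.

```python
import json
import random

parties = ['NDP', 'Conservative', 'Liberal', 'Green', 'Bloc', 'Other']
rng = random.Random(0)  # fixed seed: same fake data on every run

prediction = {}
for fednum in [48010, 24046]:
    prediction[fednum] = {
        'leader': rng.choice(parties),
        # randint is inclusive, so 50..99 matches np.random.randint(50, 100)
        'likelyhood': rng.randint(50, 99),
    }

# Round trip through JSON text instead of a file.
loaded = json.loads(json.dumps(prediction))

# Integer keys have become strings after the round trip.
print(sorted(loaded.keys()))
# Look up a riding by converting the FEDNUM to a string first.
print(loaded[str(48010)] == prediction[48010])
```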

## Let's get the riding prediction data from 308

Crap, they're all image files: http://1.bp.blogspot.com/-Qhio6fQmHbQ/Vddn6BJ1qwI/AAAAAAAAXUY/O2qwrYlRXv8/s1600/Ridings%2B1.png

To get some dummy data, maybe I could just generate it randomly?!

Or try OCR:

In [55]:
def download(url):
    """Download url into the current directory and return the local file name.

    Adapted from http://stackoverflow.com/questions/22676/how-do-i-download-a-file-over-http-using-python
    """
    import urllib2

    # Use the last URL segment as the file name; '%' escapes become '_'.
    file_name = url.split('/')[-1]
    file_name = file_name.replace("%", '_')
    u = urllib2.urlopen(url)
    f = open(file_name, 'wb')
    meta = u.info()
    file_size = int(meta.getheaders("Content-Length")[0])
    print "Downloading: %s Bytes: %s" % (file_name, file_size)

    # Copy the response to disk in 8 KiB blocks.
    block_sz = 8192
    while True:
        buffer = u.read(block_sz)
        if not buffer:
            break
        f.write(buffer)

    f.close()
    return file_name

In [56]:
url = "http://1.bp.blogspot.com/-Qhio6fQmHbQ/Vddn6BJ1qwI/AAAAAAAAXUY/O2qwrYlRXv8/s1600/Ridings%2B1.png"

In [57]:
fname = download(url)

Downloading: Ridings_2B1.png Bytes: 297608


In [66]:
fname

'Ridings_2B1.png'

In [70]:
import Image

image = Image.open(fname)
# image.show()
image.save('a.tif')

In [None]:
pytesser

In [68]:
im = Image.open('a.tif')

In [63]:
import pytesseract as tess

In [69]:
tess.image_to_string(im)

AttributeError: 'NoneType' object has no attribute 'bands'

In [49]:
f2 = open(fname, 'r')

In [50]:
tess.image_to_string(f2)

AttributeError: 'file' object has no attribute 'split'

In [71]:
cd pytesser_v0.0.1/

/Users/stephenmcmurtry/work/election_map/python/pytesser_v0.0.1


In [72]:
import pytesser

In [81]:
img = Image.open(fname)

In [82]:
pytesser.image_to_string(img)

OSError: [Errno 2] No such file or directory

In [80]:
pytesser.image_file_to_string("pytesser_v0.0.1/fnord.tif")

OSError: [Errno 2] No such file or directory
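
A possible cause of the `OSError` above, which is an assumption rather than something confirmed in this notebook: `[Errno 2]` is what Python raises when a subprocess cannot find the executable it was asked to run, so the `tesseract` binary may not be on the `PATH`. The sketch below checks for it before attempting OCR. It is written for Python 3 (`shutil.which`), and `find_tool` is a hypothetical helper.

```python
import shutil
import sys


def find_tool(name):
    """Return the full path of an executable on PATH, or None if missing."""
    return shutil.which(name)


tesseract_path = find_tool("tesseract")
if tesseract_path is None:
    print("tesseract not found on PATH; install it or fix PATH before running OCR")
else:
    print("tesseract found at " + tesseract_path)

# Sanity check of the helper against an executable that certainly exists.
python_path = find_tool(sys.executable)
```

On Python 2, `distutils.spawn.find_executable` plays the same role as `shutil.which`.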

In [75]:
f = open("../a.tif")

In [78]:
cd ..


/Users/stephenmcmurtry/work/election_map/python
