<a href="https://colab.research.google.com/github/scampion/geocodenet/blob/main/geocodenet.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip3 install keras-metrics

Collecting keras-metrics
  Downloading https://files.pythonhosted.org/packages/32/c9/a87420da8e73de944e63a8e9cdcfb1f03ca31a7c4cdcdbd45d2cdf13275a/keras_metrics-1.1.0-py2.py3-none-any.whl
Installing collected packages: keras-metrics
Successfully installed keras-metrics-1.1.0


## Load and prepare data

CSV sample (header row omitted; columns: row index, `long`, `lat`, `postalcode`, `city`, `street`, `number`):

    38,0.49830201027777776,0.7682701011111116,53100,Mayenne,Rue de la Peyennière,0
    39,0.49830201027777776,0.7682701011111116,53100,Mayenne,,
    46,0.5165364863888888,0.7531874688888894,73230,Saint-Alban-Leysse,,
    47,0.504549603611111,0.7787510938888894,80230,Saint-Valery-sur-Somme,Route du Tréport,0


In [6]:
if 'google.colab' in str(get_ipython()):
    from google.colab import drive
    drive.mount('/content/drive')
    csv_filepath = "/content/drive/My Drive/data/geocodenet/france.csv"
else: 
    csv_filepath = "france.csv"

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [3]:
import numpy as np
import pandas as pd

address_cols = ['number', 'street', 'postalcode', 'city']
# Read the address parts as strings so they can be concatenated below
df = pd.read_csv(csv_filepath, dtype={c: str for c in address_cols})
# Missing address parts become empty strings
df.fillna('', inplace=True)
df['address'] = (df['number'] + " " + df['street'] + " " + df['postalcode'] + " " + df['city'])
# Keep only the coordinates and the assembled address
df = df.drop(columns=address_cols + ['Unnamed: 0'])
df.head()

| | long | lat | address |
|---|---|---|---|
| 0 | 0.50404 | 0.742304 | 31500 Toulouse |
| 1 | 0.490218 | 0.770736 | 0 Chemin de Halage 22300 Lannion |
| 2 | 0.490218 | 0.770736 | 22300 Lannion |
| 3 | 0.506738 | 0.771838 | 0 Place des Déportés 93350 Le Bourget |
| 4 | 0.506738 | 0.771838 | 93350 Le Bourget |
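The `long` and `lat` values all lie between 0 and 1, which suggests the coordinates were rescaled before export. The notebook does not state the formula. The sketch below assumes `long = (longitude + 180) / 360` and `lat = (latitude + 90) / 180`; this is only an assumption, although it maps the first row back to coordinates close to Toulouse. The function name `denormalize` is hypothetical.

```python
def denormalize(long_norm, lat_norm):
    """Map normalized (long, lat) values in [0, 1] back to degrees.

    Assumed scaling (not confirmed by the notebook):
    long_norm = (lon + 180) / 360 and lat_norm = (lat + 90) / 180.
    """
    lon_deg = long_norm * 360.0 - 180.0
    lat_deg = lat_norm * 180.0 - 90.0
    return lon_deg, lat_deg


# First row of the output above: "31500 Toulouse"
lon_deg, lat_deg = denormalize(0.50404, 0.742304)
print(round(lon_deg, 3), round(lat_deg, 3))
```

If the assumption holds, the same inverse mapping would turn the two model outputs into degrees at prediction time.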


## Load variables

In [9]:
import json

if 'google.colab' in str(get_ipython()):
    config_filepath = "/content/drive/My Drive/src/geocodenet/config.json"
else:
    config_filepath = "./config.json"

# Close the file handle once the configuration is parsed
with open(config_filepath) as f:
    config = json.load(f)

max_address_length = config["data"]["input_size"]
alphabet = config["data"]["alphabet"]
alphabet_size = len(alphabet)
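The contents of `config.json` are not shown in the notebook. The following sketch lists the keys that the cells read, written as a Python dictionary. Only `input_size` (200), `embedding_size` (128), the filter count and kernel size of the convolutions, the first pool size, and `epochs` (7) can be read off the outputs further down; every other value, including the alphabet, is a hypothetical placeholder.

```python
import json

# Hypothetical config: key names come from the notebook, values are
# placeholders unless a comment says otherwise.
example_config = {
    "data": {
        "input_size": 200,  # matches the (None, 200) input shape in the model summary
        "alphabet": "abcdefghijklmnopqrstuvwxyz0123456789",  # placeholder
    },
    "char_cnn_zhang": {
        "embedding_size": 128,  # matches the embedding output shape
        # Each entry is [filters, kernel_size, pool_size]; -1 disables pooling.
        # The second pool size is a placeholder.
        "conv_layers": [[256, 7, 3], [256, 7, -1]],
        "fully_connected_layers": [1024, 1024],  # placeholder
        "threshold": 1e-6,  # placeholder
        "dropout_p": 0.5,  # placeholder
        "optimizer": "adam",  # placeholder
        "loss": "mean_squared_error",  # placeholder
    },
    "training": {
        "epochs": 7,  # matches "Epoch 1/7" in the training log
        "batch_size": 128,  # placeholder
        "checkpoint_every": 100,  # placeholder
    },
}

# Round-trip through JSON to check that the structure is serializable
config_text = json.dumps(example_config, indent=2)
loaded = json.loads(config_text)
print(sorted(loaded))
```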

## Encode address

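Each character is replaced by its 1-based position in the alphabet. Characters outside the alphabet map to 0, and the result is zero-padded or truncated to `max_address_length`. A short example (padding omitted):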

    "12 abc" > [28, 29, 0, 1, 2, 3]

In [16]:
def str_to_indexes(s, max_length):
    """Encode a string as 1-based alphabet indexes, zero-padded or truncated to max_length."""
    s = s.lower()
    str2idx = np.zeros(max_length, dtype='int32')
    for i in range(min(len(s), max_length)):
        if s[i] in alphabet:
            # 0 is reserved for padding and for characters outside the alphabet
            str2idx[i] = alphabet.index(s[i]) + 1
    return str2idx

df['address_encoded'] = df['address'].apply(str_to_indexes, max_length=max_address_length)

# Convert to NumPy arrays: X has shape (nbobs, max_address_length), Y has shape (nbobs, 2)
nbobs = len(df['address_encoded'])
X = np.stack(df['address_encoded'].to_numpy())
Y = df[['long', 'lat']].to_numpy()
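To make the encoding concrete, here is a standalone sketch that reproduces the short example above. It uses plain lists instead of NumPy arrays, and the `example_alphabet` string is an assumption (letters followed by digits); the real alphabet comes from `config.json`. The function name `encode` is hypothetical.

```python
# Assumed alphabet for illustration only: letters first, then digits
example_alphabet = "abcdefghijklmnopqrstuvwxyz0123456789"


def encode(s, max_length, alphabet=example_alphabet):
    """Return 1-based alphabet positions, with 0 for unknown characters and padding."""
    lookup = {ch: i + 1 for i, ch in enumerate(alphabet)}
    codes = [lookup.get(ch, 0) for ch in s.lower()[:max_length]]
    return codes + [0] * (max_length - len(codes))


print(encode("12 abc", 8))   # [28, 29, 0, 1, 2, 3, 0, 0]
print(encode("Lannion", 4))  # [12, 1, 14, 14], truncated to 4 characters
```

Because 0 stands for both padding and unknown characters, the embedding layer needs `alphabet_size + 1` rows.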

## TPU management 

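On Google Colab, a TPU can be used to speed up training. The cell below connects to the TPU when the `COLAB_TPU_ADDR` environment variable is set; otherwise `tpu_strategy` stays `None` and the default device is used.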

In [17]:
import tensorflow as tf
import os 

tpu_strategy = None
try:
    device_name = os.environ['COLAB_TPU_ADDR']
    tpu_address = 'grpc://' + device_name
    print('Found TPU at: {}'.format(tpu_address))
    cluster_resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu=tpu_address)
    tf.config.experimental_connect_to_cluster(cluster_resolver)
    tf.tpu.experimental.initialize_tpu_system(cluster_resolver)
    tpu_strategy = tf.distribute.TPUStrategy(cluster_resolver)
except KeyError:
    print('TPU not found')

Found TPU at: grpc://10.49.210.90:8470
INFO:tensorflow:Initializing the TPU system: grpc://10.49.210.90:8470
INFO:tensorflow:Clearing out eager caches
INFO:tensorflow:Finished initializing TPU system.
INFO:tensorflow:Found TPU system:
INFO:tensorflow:*** Num TPU Cores: 8
INFO:tensorflow:*** Num TPU Workers: 1
INFO:tensorflow:*** Num TPU Cores Per Worker: 8
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0, CPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:CPU:0, CPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:0, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:1, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:2, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:3, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:4, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:5, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:6, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:7, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU_SYSTEM:0, TPU_SYSTEM, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 0, 0)


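## Neural network

The model is a character-level CNN configured under the `char_cnn_zhang` key: an embedding layer, a stack of 1D convolutions with optional max pooling, fully connected layers with dropout, and a linear two-unit output for `long` and `lat`.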

In [18]:
from keras import Model
from keras.layers import Input, Dense, Flatten, Embedding, Convolution1D, ThresholdedReLU, MaxPooling1D, Dropout
import keras_metrics

input_size = max_address_length
out_dim = 2  # (long, lat)
embedding_size = config["char_cnn_zhang"]["embedding_size"]
conv_layers = config["char_cnn_zhang"]["conv_layers"]
fully_connected_layers = config["char_cnn_zhang"]["fully_connected_layers"]
threshold = config["char_cnn_zhang"]["threshold"]
dropout_p = config["char_cnn_zhang"]["dropout_p"]
optimizer = config["char_cnn_zhang"]["optimizer"]
loss = config["char_cnn_zhang"]["loss"]

def build_model():
    inputs = Input(shape=(input_size,), name='sent_input', dtype='int64')
    # Index 0 is reserved for padding and unknown characters, hence alphabet_size + 1
    x = Embedding(alphabet_size + 1, embedding_size, input_length=input_size)(inputs)
    # Each entry of conv_layers is [filters, kernel_size, pool_size]; pool_size == -1 skips pooling
    for cl in conv_layers:
        x = Convolution1D(cl[0], cl[1])(x)
        x = ThresholdedReLU(threshold)(x)
        if cl[2] != -1:
            x = MaxPooling1D(cl[2])(x)
    x = Flatten()(x)
    # Each entry of fully_connected_layers is the number of units of a dense layer
    for fl in fully_connected_layers:
        x = Dense(fl)(x)
        x = ThresholdedReLU(threshold)(x)
        x = Dropout(dropout_p)(x)

    predictions = Dense(out_dim, activation='linear')(x)
    model = Model(name="geocodenet", inputs=inputs, outputs=predictions)
    model.compile(optimizer=optimizer, loss=loss,
                  metrics=[keras_metrics.precision(), keras_metrics.recall()])
    model.summary()
    return model

# Build the model inside the TPU strategy scope when a TPU is available
if tpu_strategy:
    with tpu_strategy.scope():
        model = build_model()
else:
    model = build_model()

Model: "geocodenet"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
sent_input (InputLayer)      [(None, 200)]             0         
_________________________________________________________________
embedding_2 (Embedding)      (None, 200, 128)          8960      
_________________________________________________________________
conv1d_12 (Conv1D)           (None, 194, 256)          229632    
_________________________________________________________________
thresholded_re_lu_16 (Thresh (None, 194, 256)          0         
_________________________________________________________________
max_pooling1d_6 (MaxPooling1 (None, 64, 256)           0         
_________________________________________________________________
conv1d_13 (Conv1D)           (None, 58, 256)           459008    
_________________________________________________________________
thresholded_re_lu_17 (Thresh (None, 58, 256)           0
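The shapes and parameter counts in the summary can be checked by hand. The sketch below assumes the Keras defaults: convolutions with `valid` padding and stride 1, and pooling with a stride equal to the pool size. The helper names are hypothetical, and the vocabulary size of 70 is inferred from the 8960 embedding parameters (8960 / 128).

```python
def conv1d_out_len(length, kernel_size):
    """Output length of a 1D convolution with 'valid' padding and stride 1."""
    return length - kernel_size + 1


def pool1d_out_len(length, pool_size):
    """Output length of max pooling with stride equal to pool_size."""
    return length // pool_size


def conv1d_params(in_channels, filters, kernel_size):
    """Kernel weights plus one bias per filter."""
    return kernel_size * in_channels * filters + filters


vocab_size, embedding_size, length = 70, 128, 200

embedding_params = vocab_size * embedding_size
len_conv1 = conv1d_out_len(length, 7)
len_pool1 = pool1d_out_len(len_conv1, 3)
len_conv2 = conv1d_out_len(len_pool1, 7)

print(embedding_params)                        # 8960
print(len_conv1, conv1d_params(128, 256, 7))   # 194 229632
print(len_pool1)                               # 64
print(len_conv2, conv1d_params(256, 256, 7))   # 58 459008
```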

## Train the model

In [19]:
epochs = config["training"]["epochs"]
batch_size = config["training"]["batch_size"]
# Read from the config but unused below: no checkpoint callback is registered
checkpoint_every = config["training"]["checkpoint_every"]

print("Training model: ")
model.fit(X, Y, epochs=epochs, batch_size=batch_size, callbacks=[])

Training model: 
Epoch 1/7
Instructions for updating:
Use `tf.data.Iterator.get_next_as_optional()` instead.
Exception ignored in: <bound method IteratorResourceDeleter.__del__ of <tensorflow.python.data.ops.iterator_ops.IteratorResourceDeleter object at 0x7fa2e1760b70>>
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/iterator_ops.py", line 537, in __del__
    handle=self._handle, deleter=self._deleter)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_dataset_ops.py", line 1282, in delete_iterator
    _ops.raise_from_not_ok_status(e, name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 6843, in raise_from_not_ok_status
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.NotFoundError: No registered 'Identity' OpKernel for 'TPU' devices compatible with node {{node Identity}}
	 (OpKernel was f

Instructions for updating:
`inputs` is now automatically inferred

Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.

Epoch 2/7
Epoch 3/7
Epoch 4/7
Epoch 5/7
Epoch 6/7
Epoch 7/7


<tensorflow.python.keras.callbacks.History at 0x7fa34a149e80>

In [21]:
model.evaluate(X[0:100], Y[0:100], verbose=0)

[0.19612851738929749, 0.97817462682724, 0.9898539781570435]

The list holds the loss followed by the two compiled metrics, precision and recall, computed here on the first 100 training samples.