<a href="https://colab.research.google.com/github/michaelgfalk/clean-ocr/blob/master/ocr_transformer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Waves of Words: Correcting Trove's Messy OCR

One aim of the *Waves of Words* project is to extract Aboriginal wordlists from [Trove](https://trove.nla.gov.au). A challenge we face is that historical newspapers are difficult to OCR, so many of the texts are riddled with errors.

Using the training data available from the ALTA 2017 OCR competition, can we create a model that will clean the text enough for our aboriginal word detector to work?

I have been giving some thought to whether uppercase letters and punctuation should be preserved in this model, given that the aim is to clean up the text for our detector, which only requires lower case letters and ignores punctuation. I think we need to include all the characters in this one. The extra information about sentence barriers, for example, should hopefully help the model as it would a human when it tries to correct the text. Moreover, many OCR errors involve exchaning punctuation or digits for letters, e.g. `l = 1 = !`.

**References:**

* D. Mollá, S. Cassidy. Overview of the 2017 ALTA Shared Task:
Correcting OCR Errors (2017). *Proc. ALTA 2017*.
[https://aclanthology.coli.uni-saarland.de/papers/U17-1014/u17-1014](https://aclanthology.coli.uni-saarland.de/papers/U17-1014/u17-1014)

In [0]:
# Install TensorFlow2
!pip install -q tensorflow-gpu==2.0.0-alpha0

In [0]:
from __future__ import absolute_import, division, print_function

from google.colab import drive

import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

import csv
import numpy as np
import os
import time
from datetime import date
import pickle as p

from sklearn.model_selection import train_test_split

In [3]:
# Mount google drive to get training data. Set data_dir
drive.mount('/content/gdrive')
data_dir = '/content/gdrive/My Drive/waves_of_words/ocr_correction_data/'

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


# 1. Data Pipeline (new)

This new data pipeline makes use of Tensorflows Dataset object to manage the extraction and transformation more efficiently.

In [0]:
def create_dataset(source, target, start = 'स', end = 'ए'):
  """Imports articles from csv and packages for training."""
  
  # Lists for import
  raw_x = []
  raw_y = []
  
  # Import raw text
  with open(source, "rt") as f:
    x_reader = csv.reader(f, delimiter = ',', quotechar = '"')
    for row in x_reader:
      raw_x.append(row[1])
  with open(target, "rt") as f:
    y_reader = csv.reader(f, delimiter = ',', quotechar = '"')
    for row in y_reader:
      raw_y.append(row[1])
  
  # Drop header rows
  raw_x = raw_x[1:]
  raw_y = raw_y[1:]
  
  # Add special characters
  x = [start + article + end for article in raw_x if start not in article and end not in article]
  y = [start + article + end for article in raw_y if start not in article and end not in article]
  
  return x, y

In [0]:
def tokenize(source, target, tkzr = None):
  """Instantiates tokenizer if not passed, fits and applies."""
  
  if tkzr is None:
    # Fit tokenizer
    tkzr = tf.keras.preprocessing.text.Tokenizer(
      num_words = None,
      filters = None,
      lower = False,
      char_level = True
    )
    tkzr.fit_on_texts(source + target)
  
  # Apply to texts
  x = tkzr.texts_to_sequences(source)
  y = tkzr.texts_to_sequences(target)
  
  return x, y, tkzr

In [0]:
def split_and_stack(x, y, max_len, batch_size, drop = False):
  """Takes as input two python lists, and outputs a list of tensor buckets.
  
  Arguments:
  ==========
  x (list): the tokenized source strings
  y (list): the tokenized target strings
  max_len (int): the maximum number of time steps the model will consider
  batch_size (int): the batch size for the training examples
  drop (bool): keep the final batch, if len(x) is not a multiple of batch_size?
  
  Returns:
  ==========
  bucket_list (list): a list of length m, each item of which is a bucket of
    similar-length numpy array
  
  A bucket is a tensor of shape (2, m_prime, max_len). Training examples are first
  bucketed into groups of similar length, and seperated into batches of batch_size.
  Each batch is then padded out to an integer multiple of max_len, split into chunks
  of length max_len and stacked.
  
  This bucketing, splitting and stacking allows the data to be fed to a stateful RNN.
  
  Dimensions:
  -The first dimension seperates x examples from y
  -The second dimension seperates individual training examples.
    m_prime = batch_size * ⌈max_len_x_or_y_within_batch / batch_size⌉ 
  -The third dimension seperates individual time-steps, and is fixed at max_len"""
  
  # Set number of training examples (round up if not dropping)
  assert len(x) == len(y)
  m = len(x)
  
  # Sort x and y by sequence length
  x_y = sorted(zip(x,y), key = lambda tup: max(len(tup[0]), len(tup[1])), reverse = True)
  
  # Loop through list and create batches
  bucket_list = []
  steps_per_epoch = 0
  for i in range(0, m, batch_size):
    
    # Slice and unpack the list
    bn = batch_size
    if len(x_y) < batch_size and drop:
      break
    elif len(x_y) < batch_size and not drop:
      bn = len(x_y)
    
    bx, by = zip(*[x_y.pop() for _ in range(bn)])
    
    # Calculate length boundary and m_prime
    bl = max(len(bx[0]), len(by[0])) # Get the length of x or y, whichever is greater
    b = max_len - (bl % max_len) + bl # Round up to a multiple of max_len
    m_prime = int(b / max_len) # Calculate the new number of rows
    
    steps_per_epoch += m_prime
    
    # Pad the sequences
    x_pad = pad_sequences(bx, maxlen = b, padding = 'post')
    y_pad = pad_sequences(by, maxlen = b, padding = 'post')
    
    # Flip x on the time dimension
    x_flipped = np.flip(x_pad, axis = 1)
    
    # Split and stack
    x_out = np.concatenate(np.split(x_pad, m_prime, axis = 1), axis = 0)
    y_out = np.concatenate(np.split(y_pad, m_prime, axis = 1), axis = 0)
    
    # Covert to dataset and append to list
    bucket = tf.data.Dataset.from_tensor_slices((x_out, y_out))
    bucket = bucket.batch(bn) # NB: bn = batch_size except on the last batch, if drop != True
    bucket_list.append(bucket)
    
  return bucket_list, steps_per_epoch

In [0]:
def load_dataset(x_path, y_path, max_len = 20, batch_size = 128, drop = False, test_size = .2):
  
  x, y = create_dataset(x_path, y_path)
  
  x, y, tkzr = tokenize(x, y)
  
  seqs = (x_train, x_test, y_train, y_test) = train_test_split(x, y, test_size = test_size)
  
  train_buckets, train_steps = split_and_stack(x_train, y_train, max_len, batch_size, drop)
  
  test_buckets, val_steps = split_and_stack(x_test, y_test, max_len, batch_size, drop)
  
  return train_buckets, test_buckets, tkzr, seqs, train_steps, val_steps

# 2. Model Definition

Adapted from TensorFlow docs.

In [0]:
class StatefulGRU(tf.keras.layers.GRU):
  """GRU layer with all the necessaries."""
  def __init__(self, units):
    super(StatefulGRU, self).__init__(
        units = units,
        # The following parameters must be set this way
        # to use CuDNN on GPU
        activation='tanh',
        recurrent_activation='sigmoid',
        recurrent_dropout=0,
        unroll=False,
        use_bias=True,
        reset_after=True,
        # The following parameters are necessary for the
        # encoder-decoder architecture
        return_sequences=True, 
        return_state=True,
        # Stateful must be 'True' in order
        # to link the batches in each hyperbatch
        stateful=True,
        # Just the standard initializer
        recurrent_initializer='glorot_uniform'
    )

In [0]:
class Encoder(tf.keras.Model):
  def __init__(self, num_chars, embedding_dim, enc_units, batch_sz):
    super(Encoder, self).__init__()
    self.batch_sz = batch_sz
    self.enc_units = enc_units
    self.embedding = tf.keras.layers.Embedding(num_chars, embedding_dim)
    self.first_gru = StatefulGRU(self.enc_units)
    self.second_gru = StatefulGRU(self.enc_units)

  def call(self, x, hidden):
    x = self.embedding(x)
    x = self.first_gru(x, initial_state = hidden)
    output, state = self.second_gru(x)
    return output, state

  def initialize_hidden_state(self):
    return tf.zeros((self.batch_sz, self.enc_units))

In [0]:
class BahdanauAttention(tf.keras.Model):
  def __init__(self, units):
    super(BahdanauAttention, self).__init__()
    self.W1 = tf.keras.layers.Dense(units)
    self.W2 = tf.keras.layers.Dense(units)
    self.V = tf.keras.layers.Dense(1)
  
  def call(self, query, values):
    # hidden shape == (batch_size, hidden size)
    # hidden_with_time_axis shape == (batch_size, 1, hidden size)
    # we are doing this to perform addition to calculate the score
    hidden_with_time_axis = tf.expand_dims(query, 1)

    # score shape == (batch_size, max_length, hidden_size)
    score = self.V(tf.nn.tanh(
        self.W1(values) + self.W2(hidden_with_time_axis)))

    # attention_weights shape == (batch_size, max_length, 1)
    # we get 1 at the last axis because we are applying score to self.V
    attention_weights = tf.nn.softmax(score, axis=1)

    # context_vector shape after sum == (batch_size, hidden_size)
    context_vector = attention_weights * values
    context_vector = tf.reduce_sum(context_vector, axis=1)
    
    return context_vector, attention_weights

In [0]:
class Decoder(tf.keras.Model):
  def __init__(self, num_chars, embedding_dim, dec_units, batch_sz):
    super(Decoder, self).__init__()
    self.batch_sz = batch_sz
    self.dec_units = dec_units
    self.embedding = tf.keras.layers.Embedding(num_chars, embedding_dim)
    self.first_gru = StatefulGRU(self.dec_units)
    self.second_gru = StatefulGRU(self.dec_units)
    self.fc = tf.keras.layers.Dense(num_chars)

    # used for attention
    self.attention = BahdanauAttention(self.dec_units)

  def call(self, x, hidden, enc_output):
    # enc_output shape == (batch_size, max_length, hidden_size)
    context_vector, attention_weights = self.attention(hidden, enc_output)

    # x shape after passing through embedding == (batch_size, 1, embedding_dim)
    x = self.embedding(x)

    # x shape after concatenation == (batch_size, 1, embedding_dim + hidden_size)
    x = tf.concat([tf.expand_dims(context_vector, 1), x], axis=-1)

    # passing the concatenated vector to the GRUs
    x = self.first_gru(x)
    output, state = self.second_gru(x)

    # output shape == (batch_size * 1, hidden_size)
    output = tf.reshape(output, (-1, output.shape[2]))

    # output shape == (batch_size, vocab)
    x = self.fc(output)

    return x, state, attention_weights

# 3. Set up Training

In [0]:
optimizer = tf.keras.optimizers.Adam()
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

def loss_function(real, pred):
  # Mask: ignore the model's predictions where the ground truth is padding
  mask = tf.math.logical_not(tf.math.equal(real, 0))
  
  # Calculate the loss
  loss_ = loss_object(real, pred)

  # Make mask compatible with the loss output
  mask = tf.cast(mask, dtype=loss_.dtype)
  
  # Multiply the losses by the mask (i.e. zero out all losses where there's just padding)
  loss_ *= mask
  
  return tf.reduce_mean(loss_)

In [0]:
# Metrics for training
train_loss = tf.keras.metrics.Mean(name='train_loss')
train_acc = tf.keras.metrics.SparseCategoricalAccuracy(name='train_acc')
val_loss = tf.keras.metrics.Mean(name='val_loss')
val_acc = tf.keras.metrics.SparseCategoricalAccuracy(name='val_acc')

def update_accuracy(real, pred, acc_object):
  
  # Find padding
  mask = tf.math.logical_not(tf.math.equal(real, 0))
  
  # If there are no non-padding variables, break out of function
  if tf.math.count_nonzero(mask) == 0:
    return None
  
  # Slice tensors
  real = tf.boolean_mask(real, mask)
  pred = tf.boolean_mask(pred, mask)

  # Compute accuracy
  acc_object.update_state(real, pred)
  
  return None

In [0]:
@tf.function
def train_step(inp, targ, enc_hidden, norm_lim):
  loss = 0
        
  with tf.GradientTape() as tape:
    enc_output, enc_hidden = encoder(inp, enc_hidden)

    dec_hidden = enc_hidden

    dec_input = tf.expand_dims(inp[:,0], 1)

    # Teacher forcing - feeding the target as the next input
    for t in range(1, targ.shape[1]):
      # passing enc_output to the decoder
      predictions, dec_hidden, _ = decoder(dec_input, dec_hidden, enc_output)

      loss += loss_function(targ[:, t], predictions)
      _ = update_accuracy(targ[:, t], predictions, train_acc)

      # using teacher forcing
      dec_input = tf.expand_dims(targ[:, t], 1)

  train_loss.update_state(loss)

  variables = encoder.trainable_variables + decoder.trainable_variables

  gradients = tape.gradient(loss, variables)
  
  # Clip gradients
  clipped_gradients = [tf.clip_by_norm(grad, norm_lim) for grad in gradients]

  optimizer.apply_gradients(zip(clipped_gradients, variables))
  
  return train_loss.result(), train_acc.result()

In [0]:
@tf.function
def val_step(inp, targ, enc_hidden):
  
  loss = 0
  
  # Begin feeding data to network
  enc_output, enc_hidden = encoder(inp, enc_hidden)
  dec_hidden = enc_hidden
  dec_input = tf.expand_dims(inp[:,0], 1)
  
  # Cycle through the rest of the time steps
  for t in range(1, targ.shape[1]):
    # Pass enc_output to the decoder
    predictions, dec_hidden, _ = decoder(dec_input, dec_hidden, enc_output)
    
    # Calculate loss and acc
    loss += loss_function(targ[:,t], predictions)
    _ = update_accuracy(targ[:, t], predictions, val_acc)
    
    # Pass the next correct letter to the decoder (teacher forcing)
    dec_input = tf.expand_dims(targ[:,t], 1)
    
  # Calculate val_loss
  val_loss.update_state(loss)
  
  return val_loss.result(), val_acc.result()

In [0]:
def format_time(flt):
  h = flt//3600
  m = (flt % 3600)//60
  s = flt % 60
  out = []
  if h > 0:
    out.append(str(int(h)))
    if h == 1:
      out.append('hr,')
    else:
      out.append('hrs,')
  if m > 0:
    out.append(str(int(m)))
    if m == 1:
      out.append('min, and')
    else:
      out.append('mins, and')
  out.append(f'{s:.2f}')
  out.append('secs')
  return ' '.join(out)

# 4. Run Training Loop

In [0]:
# Set hyperparameters
MAX_LEN = 32
BATCH_SIZE = 128
EPOCHS = 2
x_dir = data_dir + 'train_input.csv'
y_dir = data_dir + 'train_output.csv'
NORM_LIM = 3 # value for clip_norm

# Load data
train_buckets, test_buckets, tkzr, seqs, train_steps, val_steps = load_dataset(x_dir, y_dir, MAX_LEN, BATCH_SIZE, drop = True)

# Save preprocessed training data
with open(data_dir + "/" + date.isoformat(date.today()) + "-training-data-and-tkzr.pickle", 'wb') as f:
  p.dump((tkzr, seqs, train_steps, val_steps), f)

# Get vocab size
num_chars = len(tkzr.word_index) + 1 # Add one for padding
embedding_dim = 25
units = 50

# Define model(s)
encoder = Encoder(num_chars, embedding_dim, units, batch_sz = BATCH_SIZE)
decoder = Decoder(num_chars, embedding_dim, units, batch_sz = BATCH_SIZE)

In [0]:
checkpoint_dir = data_dir + 'checkpoints'
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt")
checkpoint = tf.train.Checkpoint(optimizer=optimizer,
                                 encoder=encoder,
                                 decoder=decoder)

In [0]:
# Loop over epochs
for epoch in range(10):
  print(f'Starting Epoch {epoch + 1}\n')
  
  train_loss.reset_states()
  train_acc.reset_states()
  val_loss.reset_states()
  val_acc.reset_states()
  
  start = time.time()
  
  total_batches = 0
  val_batches = 0

  # Loop over buckets
  for bucket, dataset in enumerate(train_buckets):
    # Reset hidden state
    enc_hidden = encoder.initialize_hidden_state()


    for inp, targ in dataset.take(-1):
      loss, acc = train_step(inp, targ, enc_hidden, NORM_LIM)
      
      total_batches += 1
    
      if total_batches % 250 == 0:
          print(f'Epoch {epoch + 1} Bucket {bucket + 1}: Loss {loss:.4f}, Acc {acc:.4f} after {total_batches} batches')
  
  # saving (checkpoint) the model every 2 epochs
  if (epoch + 1) % 2 == 0:
    checkpoint.save(file_prefix = checkpoint_prefix)
    
  # Calculate validation loss and accuracy
  for dataset in test_buckets:
    # Reset hidden state
    enc_hidden = encoder.initialize_hidden_state()
    
    for inp, targ in dataset.take(-1):
      val_loss, val_acc = val_step(inp, targ, enc_hidden)
      
      val_batches += 1

  print(f'\nEpoch {epoch + 1} Loss {loss:.2f}, Avg Acc {acc*100:.2f}%.')
  print(f'Tested on {val_batches * BATCH_SIZE} validation examples.')
  print(f' val_loss = {val_loss:.2f} val_acc = {val_acc*100:.2f}%')
  print(f'Time taken for 1 epoch: {format_time(time.time() - start)}\n===========================\n\n')

Starting Epoch 1

Epoch 1 Bucket 15: Loss 68.4871, Acc 0.3904 after 250 batches
Epoch 1 Bucket 21: Loss 67.1030, Acc 0.4041 after 500 batches
Epoch 1 Bucket 25: Loss 65.7859, Acc 0.4167 after 750 batches
Epoch 1 Bucket 28: Loss 64.5594, Acc 0.4282 after 1000 batches
Epoch 1 Bucket 31: Loss 63.4161, Acc 0.4390 after 1250 batches
Epoch 1 Bucket 32: Loss 62.4643, Acc 0.4480 after 1500 batches
Epoch 1 Bucket 34: Loss 61.6920, Acc 0.4554 after 1750 batches
Epoch 1 Bucket 35: Loss 60.9797, Acc 0.4622 after 2000 batches
Epoch 1 Bucket 36: Loss 60.4408, Acc 0.4673 after 2250 batches
Epoch 1 Bucket 37: Loss 59.9692, Acc 0.4719 after 2500 batches

Epoch 1 Loss 59.4617, Avg Acc 0.4767.
Tested on 63232 validation examples. val_loss = 52.00727844238281 val_acc = 0.552980363368988
Time taken for 1 epoch: 19 mins, and 59.83 secs


Starting Epoch 2

Epoch 2 Bucket 15: Loss 58.3669, Acc 0.4867 after 250 batches


# See How it Goes

In [19]:
# Load model if revisiting notebook
checkpoint.restore(os.path.join(checkpoint_dir, "ckpt-3.index"))

W0417 23:56:32.585687 140095537694592 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/tracking/util.py:1074: NameBasedSaverStatus.__init__ (from tensorflow.python.training.tracking.util) is deprecated and will be removed in a future version.
Instructions for updating:
Restoring a name-based tf.train.Saver checkpoint using the object-based restore API. This mode uses global names to match variables, and so is somewhat fragile. It also adds new restore ops to the graph each time it is called when graph building. Prefer re-encoding training checkpoints in the object-based format: run save() on the object-based saver (the same one this message is coming from) and use that checkpoint in the future.


<tensorflow.python.training.tracking.util.NameBasedSaverStatus at 0x7f6a137210b8>

In [0]:
def clean_ocr(article, tkzr, start_char = 'स', end_char = 'ए', max_len = 32):

    # Tokenise the article
    inputs = [tkzr.word_index[i] for i in list(article)]
    
    # Pad, split, stack
    t = len(inputs)
    b = max_len - (t % max_len) + t
    m_prime = b // max_len
    inputs = pad_sequences(inputs, maxlen = b, padding = 'post')
    inputs = np.flip(inputs, axis = 1)
    inputs = np.concatenate(np.split(x_pad, m_prime, axis = 1), axis = 0)
    inputs = tf.convert_to_tensor(inputs)

    # Create empty string for result
    result = ''
    
    # Set threshold to stop outputting results if 'end' character not reached
    give_up = int(len(article) * 1.2)
    
    # Create empty string for result
    result = ''
    
    # Encode and decode the article
    enc_hidden = tf.zeros((1, units))
    dec_hidden = enc_hidden
    dec_input = 
    for r in range():
      # Encode this chunk:
      enc_out, enc_hidden = encoder(inputs[r,:], enc_hidden)
      

    # Ready the decoder
    dec_hidden = enc_hidden
    dec_input = tf.expand_dims([tkzr.word_index[start_char]], 0)

    # Start outputting characters till the end character is reached, or
    # the output sequence is getting much longer than the input
    while len(result) < give_up:
      # Get next result from decoder:
      predictions, dec_hidden, _ = decoder(dec_input,
                                           dec_hidden,
                                           enc_out)
      
      # Which is the next letter?
      predicted_id = tf.argmax(predictions[0]).numpy()
      
      # Add to result
      result += tkzr.index_word[predicted_id]

      # If we've reached the end, stop
      if tkzr.index_word[predicted_id] == end_char:
          break

      # Otherwise the predicted ID is fed back into the model
      dec_input = tf.expand_dims([predicted_id], 0)

    return result, article

In [59]:
# Let's have a look at a shorter article...
random_article = "BANANA SHIRE COUNCIL MEETING ELECTRICITY SUPPLY FOR BILOELA Request For Investigation Those present at the December meet- ing of the Banana Shire Council were : Mr J. C. Graham (Chairman), E. Brad : shaw, H. R. Brake, C. McDoualV, A. J. McPherson, R. G. Maclean, J. M. Car bery, W. H. Leigh, S. A. Barre, and E. Schucucmann. Leave of absence was granted to Crs Homer and Hamilton. The Commercial Bank (Wowan) ad- vised that the council had Wen granted an overdraft of £3000. The matter of Bathurst burr at Cra cGw, as complained of by the Cracow A.L.P., was left for Cr Bradshaw to investigate. The overseer was instructed to report on the matter of the road through Messrs Heywood's property, and Mr Faulkener to be advised accordingly. The Wowan L.P.A. asked that atten- tion be given to the road from Deeford to the Wowan cemetery, and it was de- cided to attend to it. The Land Commissioner, Rockhamp- ton, advised that the Chief Protector oi Aborigines had applied tor Res. R87 Wright as a reserve for aborigines. It was decided to advise toe council are not in favour of this. The Kokotunga L.P.A. wrote re the state of the Kokotunga-Baralaba read, and thc proposed dam at Kokotunga. It was decided to advise that at present there are no funds available tor the road between Baralaba and Kokotunga, and the matter of the dam is under con- sideration. The overseer wai instructed to inspect the work carried out on the road at Kalewa by D. Halberstater. Messrs E. P. Carige and H. Whlt sides asked that their roads ba repaired, and lt was decided to reply that there ar« no funds available at present for the work, but there may be some loan money available at a latir date. F. R. Wafer drew attention to th« state of the road and crossinjrs from Mrs Mulallyt turn-off to his gat«. It was resolved to advise there are no funds available for the work. The Callide Valley Dairymen's Asso- ciation stated a motion waa passed in connorton with th« necessity of an elec- tricity supply being made available to Biloela and district, and asking for the council's support. It was resolved that Mr T. A. Foley and th« Electricity Commission be written to and asked i' an officer could be sent up to investigate the cost of an electricity supply for Biloela. Mr 8. Epinoff was granted permission to erect gates and grids across th* road at the south-western corner of portion 120, parish qf Grevilla. The Biloela Chamber of Commerce wrote on the cam« subject and it was decided to advise the matter was re- ceiving attention. The overseer will Inspect and report on the road complained of by E. A. Rennets, The Lawgi and Kariboe L.P.A. asked that cream rem te« and the road ' from Lawgi to Fisher's Corner^ne re- paired. It was resolved the work be at- tended to early in the Kew Year. The Biloela Chamber of Commerce asked that Ore ville* Street be gravelled, that danger signs and sign , boards be erected end that Melton Street, Broom tit Street, Bell Street, and the northern end of Kariboe Street be formed and graded, and Rhodes grass cleared.--It was decided to advise that finances do not permit of the streets being graded, : but the grass will be attended to, also other matters complained of. MISCELLANEOUS ITEMS. It was resolved to advise H. D. Hewitt that the council have no objec- tion to his leasing the R21 Gibber- gunyah, provided the right« ot the travelling publio are protected. Application will be made to the I Main Roads Commission to nave the third section ot the Theodore-Cracow Road cleared and formed. H. B. Rickard will be advised that ; there is no objection to hts leasing R62, ptovided he keeps lt free td noxious . weeds and the right« of the travelling ; public are protected. s CHAIRMAN'S VISIT TO DAWXS. The Chairman reported on his vlsi» to Dawes when he attended the meeting of the Lawgi and Dawes Ratepayers' Association. After hearing all their com- plaints re roads, etc., the Chairman said he considered they had a great deal to ' complain of, as practically very little work had been carried with not getting : fair representation, it appeared to the Chairman that possibly some ratepayers would be better served by joining np with the' Monto Shire as there wa« an 1 all-weather road to th» railhead. He also suggested that the overseer, an; available councillors of Division i, »nd . himself make an inspection of the work which requires attending to, eo it can be attended to early In tte new year. Tho clerk was instructed to thank th« members of the Dawes Ratepayers' Asso- ciation for their courtesy to the Chair man on has recent visit, and state ai endeavour will be made to have wort curried out in the new year."
result, _ = clean_ocr(random_article, tkzr)

tf.Tensor([   1 4658], shape=(2,), dtype=int32)
tf.Tensor([ 1 30], shape=(2,), dtype=int32)
