## Importing UK Postcodes into Amazon Lex to create a custom slot

This is a sample notebook that shows how to use pandas together with the AWS Python SDK, boto3, to process a publicly available postcode file, sample it, and create or update a custom slot type in Amazon Lex, using the sample to train slot recognition.

I am using the postcode file from https://www.doogal.co.uk/ukpostcodes.php, but this should work with other CSV format postcode downloads as long as you set the field header correctly.

This is the header row from the file that I used.

```
Postcode,In Use?,Latitude,Longitude,Easting,Northing,Grid Ref,County,District,Ward,District Code,Ward Code,Country,County Code,Constituency,Introduced,Terminated,Parish,National Park,Population,Households,Built up area,Built up sub-division,Lower layer super output area,Rural/urban,Region,Altitude,London zone,LSOA Code,Local authority,MSOA Code,Middle layer super output area,Parish Code,Census output area,Constituency Code,Index of Multiple Deprivation
```
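
If you use a different postcode download, it may help to confirm that the column name exists before loading the whole file. The following is a minimal sketch, assuming a header-only read is acceptable for this check; the in-memory CSV is a shortened stand-in for the real file and is not part of the original notebook.

```python
import io

import pandas as pd

# Shortened stand-in for the header row of the real postcode file.
sample_csv = io.StringIO("Postcode,In Use?,Latitude,Longitude\n")
field_name = "Postcode"

# nrows=0 parses only the header row, so the check stays cheap even when
# the underlying file has millions of rows.
columns = pd.read_csv(sample_csv, nrows=0).columns.tolist()

if field_name not in columns:
    raise KeyError(f"{field_name!r} not found; available columns: {columns}")

print("Found column:", field_name)
```

With the real file, you would pass the file path instead of `sample_csv`.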

In [1]:
working_dir = '/Users/ianmas/aws/ai/'
filename = "postcodes.csv"
field_name = 'Postcode'
sample_size = 1000

import pandas as pd

Load the CSV into a pandas DataFrame. This might take a few seconds. You will get a confirmation message with the shape of the DataFrame once it has completed.

In [4]:
df = pd.read_csv(working_dir + filename, index_col=False, header=0, low_memory=False)
print("Data loaded into pandas dataframe with shape: {}".format(df.shape))

Data loaded into pandas dataframe with shape: (2582173, 36)
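
Only one of the 36 columns is used in the rest of the notebook, so a lighter load is possible. This is a sketch rather than what the notebook does: it assumes the `usecols` parameter of `pd.read_csv` suits your file, and it uses a two-row in-memory CSV (postcodes taken from the sample output further down, other field values are placeholders) instead of the real download.

```python
import io

import pandas as pd

# Two-row stand-in for the real file; non-postcode values are placeholders.
sample_csv = io.StringIO(
    "Postcode,In Use?,Latitude,Longitude\n"
    "CT10 3BJ,Yes,0.0,0.0\n"
    "YO25 8JL,Yes,0.0,0.0\n"
)
field_name = "Postcode"

# usecols restricts parsing to the listed columns, which cuts memory use
# when the file is wide and only one field is needed.
postcodes_only = pd.read_csv(sample_csv, usecols=[field_name])

print("Loaded shape: {}".format(postcodes_only.shape))
```

The resulting DataFrame has a single column, so `postcodes_only[field_name]` can be used in place of `df[field_name]`.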


Extract the column with the postcodes using the ```field_name``` variable defined earlier. Then draw a random sample of ```sample_size``` postcodes from it.

In [5]:
postcode_column = df[field_name]
postcode_sample = postcode_column.sample(n=sample_size)

print("Postcode list created with {} samples".format(len(postcode_sample.tolist())))

Postcode list created with 1000 samples
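
The sample above changes on every run. If you want the same postcodes each time, one option is to pass a seed. The sketch below assumes a fixed `random_state` is acceptable for your use case; the five-entry Series is a stand-in for the full postcode column, and the seed value 42 is arbitrary.

```python
import pandas as pd

# Stand-in for df[field_name]; the real column has millions of entries.
postcode_column = pd.Series(
    ["CT10 3BJ", "YO25 8JL", "CB10 2XY", "DE73 1YE", "SA67 8BE"]
)

# min() guards against requesting more rows than exist, which raises a
# ValueError when sampling without replacement.
sample_size = min(3, len(postcode_column))

# A fixed random_state makes the sample repeatable between runs.
first_sample = postcode_column.sample(n=sample_size, random_state=42)
second_sample = postcode_column.sample(n=sample_size, random_state=42)

print("Samples identical:", first_sample.tolist() == second_sample.tolist())
```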


The next step is to build the list of dict objects required by the Amazon Lex `put_slot_type` request. You'll see a confirmation and a few samples printed when this is complete.

In [6]:
# each enumeration value is a dict with a single 'value' key
postcodes_values_list = [{'value': postcode_value} for postcode_value in postcode_sample]

print("The first 5 entries that will be added to your model are:")
print(postcodes_values_list[0:5])

The first 5 entries that will be added to your model are:
[{'value': 'CT10 3BJ'}, {'value': 'YO25 8JL'}, {'value': 'CB10 2XY'}, {'value': 'DE73 1YE'}, {'value': 'SA67 8BE'}]
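
Postcode files from other sources may contain missing values, stray whitespace or repeated entries. The helper below is a hypothetical addition (`build_enumeration_values` is not part of the original notebook) that sketches one way to tidy the list before sending it; it assumes that upper-casing and trimming postcodes is what you want.

```python
def build_enumeration_values(postcodes):
    """Return [{'value': ...}] with blanks skipped and duplicates removed."""
    seen = set()
    values = []
    for postcode in postcodes:
        if not isinstance(postcode, str):
            continue  # skip missing values such as NaN
        cleaned = postcode.strip().upper()
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            values.append({"value": cleaned})
    return values


example = build_enumeration_values(
    ["CT10 3BJ", " ct10 3bj ", float("nan"), "", "YO25 8JL"]
)
print(example)
```

In the notebook you would call it as `build_enumeration_values(postcode_sample)`.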


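Import the AWS SDK and create a client for the Amazon Lex Model Building Service. The update cell then fetches the current checksum of the slot type and passes it to `put_slot_type`.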

In [9]:
import boto3

In [10]:
lex_client = boto3.client('lex-models', region_name='us-east-1')

In [11]:
response = lex_client.get_slot_type(
    name='UKPostcodes',
    version='$LATEST'
)

# grab the checksum attribute from the response; we need it to update an existing slot type
latest_checksum = response['checksum']

response = lex_client.put_slot_type(
    name='UKPostcodes',
    description='UK Postcodes',
    enumerationValues=postcodes_values_list,
    valueSelectionStrategy='ORIGINAL_VALUE',
    checksum=latest_checksum
)

if response['ResponseMetadata']['HTTPStatusCode'] == 200:
    print('Succeeded: Updated slot \'{}\' with {} values'.format(response['name'],len(response['enumerationValues'])))
else:
    print('Failed with response code: {}'.format(response['ResponseMetadata']['HTTPStatusCode']))

Succeeded: Updated slot 'UKPostcodes' with 1000 values


#### Only use this cell to create the slot type. Update it with the cell above.

Once the slot type exists, use the cell above instead, because it passes the checksum of the current version of the slot type.

In [None]:
# put the first version of the slot type
# this only works once, because later requests for the same name require a checksum attribute

response = lex_client.put_slot_type(
    name='UKPostcodes',
    description='UK Postcodes',
    enumerationValues=postcodes_values_list,
    valueSelectionStrategy='ORIGINAL_VALUE'
)

if response['ResponseMetadata']['HTTPStatusCode'] == 200:
    print('Succeeded: Created slot \'{}\' with {} values'.format(response['name'],len(response['enumerationValues'])))
else:
    print('Failed with response code: {}'.format(response['ResponseMetadata']['HTTPStatusCode']))
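
The create and update cells differ only in whether a checksum is sent, so they could be folded into one function. The following is a rough sketch, not the notebook's method: `upsert_slot_type` is a hypothetical helper, and it assumes that `get_slot_type` raises the client's `NotFoundException` when no slot type with that name exists.

```python
def upsert_slot_type(client, name, description, values):
    """Create the slot type if it is missing, otherwise update it."""
    request = {
        "name": name,
        "description": description,
        "enumerationValues": values,
        "valueSelectionStrategy": "ORIGINAL_VALUE",
    }
    try:
        existing = client.get_slot_type(name=name, version="$LATEST")
        # Updating an existing slot type requires its current checksum.
        request["checksum"] = existing["checksum"]
    except client.exceptions.NotFoundException:
        # Assumed behaviour: no slot type with this name yet, so the
        # request is sent without a checksum to create it.
        pass
    return client.put_slot_type(**request)


# Usage with the objects defined earlier in this notebook:
# response = upsert_slot_type(
#     lex_client, 'UKPostcodes', 'UK Postcodes', postcodes_values_list
# )
```

Taking the client as a parameter keeps the function independent of the region and credentials used to create `lex_client`.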