## 1. Import relevant packages

It is good practice to install these packages inside a [virtual environment](https://realpython.com/python-virtual-environments-a-primer/).

In [5]:
import pandas as pd
import numpy as np
from elasticsearch import Elasticsearch
import re

import sys
sys.path.append('functions')
import preprocessing_fncs as ppf
import elastic_search_fncs as esf

## 2. Check the database connection 

This is how you connect to the remote database. 

In [8]:
# Details of the dataset
db_host = 'https://athena.london.gov.uk'
db_user = 'odbc_readonly'
db_pass = 'odbc_readonly'
db_port = '10099'
db_name = 'gla-ldd-external'

# Creates connection to the dataset
es = Elasticsearch(
    [f"{db_host}:{db_port}"],
    basic_auth=(db_user, db_pass),
    verify_certs=True,
    ca_certs='athena_es_full_chain.crt'
)

# Check connection
if es.ping():
    print("Connected to Elasticsearch!")
else:
    print("Could not connect to Elasticsearch.")

Connected to Elasticsearch!
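
The connection details are hard-coded in the cell above, which is workable for this public read-only account. As a minimal sketch, the same settings could instead be read from environment variables, falling back to the values above. The variable names (`LDD_DB_HOST` and so on) and the `connection_settings` helper are hypothetical and not part of this project.

```python
import os

# Defaults copied from the cell above (public read-only account).
# The environment variable names are hypothetical.
DEFAULTS = {
    "LDD_DB_HOST": "https://athena.london.gov.uk",
    "LDD_DB_PORT": "10099",
    "LDD_DB_USER": "odbc_readonly",
    "LDD_DB_PASS": "odbc_readonly",
}


def connection_settings(env=None):
    """Return keyword arguments for Elasticsearch(...), read from env vars."""
    env = os.environ if env is None else env

    def get(key):
        # Fall back to the notebook's hard-coded value when the variable is unset.
        return env.get(key, DEFAULTS[key])

    return {
        "hosts": [f"{get('LDD_DB_HOST')}:{get('LDD_DB_PORT')}"],
        "basic_auth": (get("LDD_DB_USER"), get("LDD_DB_PASS")),
        "verify_certs": True,
        "ca_certs": "athena_es_full_chain.crt",
    }


settings = connection_settings()
print(settings["hosts"])
```

With this in place, the client could be created as `es = Elasticsearch(**connection_settings())`, which keeps the credentials out of the notebook if they ever change.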


## 3. Retrieve dataset

Here I use a function I wrote, `esf.get_residential_units_by_borough`, which queries the dataset and returns a dataframe of the records matching the query.

In [10]:
borough = 'camden'
year = 2021

df = esf.get_residential_units_by_borough(es=es, borough=borough, year=year)
df = ppf.format_df(df)
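
The body of `esf.get_residential_units_by_borough` is not shown in this notebook. As a rough sketch of the kind of query such a function might send, the helper below builds an Elasticsearch query body as a plain dictionary. The name `build_borough_year_query`, the use of `decision_date` for the year filter, the ISO date strings, and the `gt: 0` filter on residential units are all assumptions made for illustration; the real function may work differently.

```python
import json


def build_borough_year_query(borough, year):
    """Build a hypothetical query body; field usage is assumed, not confirmed."""
    return {
        "query": {
            "bool": {
                "filter": [
                    # Assumption: the borough name is matched on the 'borough' field.
                    {"match": {"borough": borough}},
                    # Assumption: 'year' refers to the decision date, stored as ISO dates.
                    {
                        "range": {
                            "decision_date": {
                                "gte": f"{year}-01-01",
                                "lte": f"{year}-12-31",
                            }
                        }
                    },
                    # Assumption: "residential" means at least one proposed unit.
                    {
                        "range": {
                            "total_no_proposed_residential_units": {"gt": 0}
                        }
                    },
                ]
            }
        }
    }


query = build_borough_year_query("camden", 2021)
print(json.dumps(query, indent=2))
```

A body like this could then be sent with something like `es.search(index=db_name, query=query["query"])`, with the hits flattened into a dataframe afterwards.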

In [11]:
df.head()

Unnamed: 0,uprn,pp_id,decision,wgs84_polygon.coordinates,wgs84_polygon.type,description,total_no_proposed_residential_units,habitable_rooms_density,site_area,borough,...,site_name,decision_date,valid_date,lpa_app_no,polygon.geometries,polygon.type,site_number,status,wgs84_polygon,polygon
0,5034091,8285651,Approved,"[[[-0.1420526, 51.5467319], [-0.1420714, 51.54...",Polygon,"Alterations, extensions and changes of use to ...",4,,,Camden,...,,2021-06-02,2019-12-27,2019/6433/P,"[{'coordinates': [[[528926.8496524, 184728.451...",GeometryCollection,197.0,Completed,,
1,5042568,7675754,Approved,"[[[-0.17861749999999998, 51.5503339], [-0.1786...",Polygon,Erection of 3 storey extension plus basement t...,4,,,Camden,...,Ames House,2021-10-12,2019-03-20,2019/1515/P,"[{'coordinates': [[[526381.6487592, 185065.255...",GeometryCollection,26.0,Lapsed,,
2,5049221,10059583,Approved,"[[[-0.1737205, 51.5564226], [-0.173845, 51.556...",Polygon,Amalgamation of 2 flats to form 1 maisonette.,1,,,Camden,...,Flat 3,2021-11-18,2021-09-24,2021/3548/P,"[{'coordinates': [[[526704.2532651, 185750.798...",GeometryCollection,,Lapsed,,
3,5130769,7828873,Approved,"[[[-0.20331439999999998, 51.5491346], [-0.2031...",Polygon,Change of use from HMO (sui generis) to 4 x re...,4,,,Camden,...,,2021-04-28,2020-04-21,2019/2472/P,"[{'coordinates': [[[524672.7031888, 184889.545...",GeometryCollection,42.0,Lapsed,,
4,5088394,9093001,Approved,"[[[-0.12640969999999999, 51.5146607], [-0.1264...",Polygon,Change of use of the 4th floor from office (Cl...,2,,,Camden,...,,2021-06-22,2020-11-03,2020/5067/P,"[{'coordinates': [[[530102.8014928, 181189.754...",GeometryCollection,57.0,Lapsed,,


In [12]:
df.columns

Index(['uprn', 'pp_id', 'decision', 'wgs84_polygon.coordinates',
       'wgs84_polygon.type', 'description',
       'total_no_proposed_residential_units', 'habitable_rooms_density',
       'site_area', 'borough', 'street_name', 'site_name', 'decision_date',
       'valid_date', 'lpa_app_no', 'polygon.geometries', 'polygon.type',
       'site_number', 'status', 'wgs84_polygon', 'polygon'],
      dtype='object')

## 4. Inspect the free-text descriptions

In [13]:
for text in list(df['description'][0:10]):
    print(text+'\n')

Alterations, extensions and changes of use to property including erection of two storey roof extension to provide a Class A2 unit at ground and 1st floors, and 4 new residential flats at part 1st to 4th floors; alterations to the front and rear facades of the building including installation of a new shopfront and balconies; and provision of refuse and cycle storage 

Erection of 3 storey extension plus basement to existing property to provide 4 flats (2x 1-bed and 2x 2-bed) (Class C3) with rear roof terraces and refuse and cycle store at the front, following demolition of 2 storey garage extension and 1-bed flat. 

Amalgamation of 2 flats to form 1 maisonette.

Change of use from HMO (sui generis) to 4 x residential units (C3) with; side and rear dormer roof windows, and; ground floor rear extensions to create; 1 x 3 bed, 1 x 2 bed and 2 x Studio flats, with; refuse, recycling and cycle stores [part retrospective].

Change of use of the 4th floor from office (Class E) to residential (C

In [11]:
camden_21_descriptions = list(df['description'])

In [14]:
print("Number of descriptions for residential planning applications in Camden in 2021: ", len(camden_21_descriptions))

Number of descriptions for residential planning applications in Camden in 2021:  98


## 5. Regex match 

A simple regex match counts how many application descriptions mention 'lightwell' or 'light well'.

In [12]:
# regex match for the text 'lightwell'

lightwell_regex = re.compile(r'\b(?:lightwell|light well)\b', re.IGNORECASE)
lightwell_matches = []
for text in camden_21_descriptions:
    if lightwell_regex.search(text):
        lightwell_matches.append(text)

In [15]:
print(f'{len(lightwell_matches)} matches found for the regex "lightwell" in the descriptions of residential units in Camden in 2021.')

6 matches found for the regex "lightwell" in the descriptions of residential units in Camden in 2021.
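
Because of the trailing `\b`, the pattern above matches 'lightwell' and 'light well' but not the plural 'lightwells' or the hyphenated 'light-well'. As a sketch, a slightly broader pattern could cover those variants. The sample sentences below are made up for illustration and are not taken from the dataset, and the count of 6 above was produced with the original pattern.

```python
import re

# Original pattern from the cell above.
original = re.compile(r'\b(?:lightwell|light well)\b', re.IGNORECASE)

# Broader variant: optional space or hyphen between the words, optional plural "s".
broader = re.compile(r'\blight[\s-]?wells?\b', re.IGNORECASE)

# Hypothetical descriptions, written only to exercise the two patterns.
samples = [
    "Excavation of front lightwell.",
    "Creation of two rear lightwells with railings.",
    "Enlargement of existing light-well.",
    "Installation of rooflight above stairwell.",
]

for text in samples:
    print(bool(original.search(text)), bool(broader.search(text)), text)
```

Rerunning the count with the broader pattern on `camden_21_descriptions` would show whether any plural or hyphenated mentions were missed.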
