## 1. Import relevant packages

It's good practice to install these packages (pandas, NumPy and the Elasticsearch Python client) inside a [virtual environment](https://realpython.com/python-virtual-environments-a-primer/).
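Here is a minimal sketch for checking, from inside the notebook, whether the kernel is running in a `venv`-style environment rather than the system Python. It relies on standard-library behaviour: inside such an environment `sys.prefix` points at the environment while `sys.base_prefix` still points at the base interpreter. Conda environments are not detected this way, and the helper name `in_virtualenv` is only illustrative.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside a venv-style virtual environment.

    venv (and virtualenv 20+) set sys.prefix to the environment directory,
    while sys.base_prefix keeps pointing at the base interpreter.
    Conda environments are NOT detected by this check.
    """
    return sys.prefix != sys.base_prefix


print("Inside a virtual environment:", in_virtualenv())
```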

In [1]:
import pandas as pd
import numpy as np
from elasticsearch import Elasticsearch
import re

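import sys
# Make the helper modules in the local 'functions' folder importable
sys.path.append('functions')
import preprocessing_fncs as ppf   # preprocessing helpers (e.g. format_df)
import elastic_search_fncs as esf  # Elasticsearch query helpers (e.g. get_residential_units_by_borough)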

## 2. Check the database connection 

Connect to the remote Elasticsearch database with read-only credentials and a CA certificate file, then check that the server responds.

In [2]:
# Connection details for the database (read-only credentials)
db_host = 'https://athena.london.gov.uk'
db_user = 'odbc_readonly'
db_pass = 'odbc_readonly'
db_port = '10099'
db_name = 'gla-ldd-external'

# Create the client, verifying the server's certificate against the CA chain file
es = Elasticsearch(
    [f"{db_host}:{db_port}"],
    basic_auth=(db_user, db_pass),
    verify_certs=True,
    ca_certs='athena_es_full_chain.crt'
)

# Check connection
if es.ping():
    print("Connected to Elasticsearch!")
else:
    print("Could not connect to Elasticsearch.")

Connected to Elasticsearch!
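The credentials above are hard-coded in the notebook. One alternative, sketched here under the assumption that you would rather keep them in environment variables: the variable names (`LDD_DB_HOST`, `LDD_DB_PORT`, `LDD_DB_USER`, `LDD_DB_PASS`, `LDD_CA_CERTS`) and the helper `load_connection_settings` are hypothetical, and the defaults simply repeat the values used above.

```python
import os


def load_connection_settings() -> dict:
    """Read connection details from (hypothetical) environment variables,
    falling back to the read-only defaults used in this notebook."""
    host = os.environ.get("LDD_DB_HOST", "https://athena.london.gov.uk")
    port = os.environ.get("LDD_DB_PORT", "10099")
    return {
        "hosts": [f"{host}:{port}"],
        "basic_auth": (
            os.environ.get("LDD_DB_USER", "odbc_readonly"),
            os.environ.get("LDD_DB_PASS", "odbc_readonly"),
        ),
        "ca_certs": os.environ.get("LDD_CA_CERTS", "athena_es_full_chain.crt"),
    }


settings = load_connection_settings()
print(settings["hosts"])

# The client could then be built from these settings, e.g.:
# es = Elasticsearch(settings["hosts"], basic_auth=settings["basic_auth"],
#                    verify_certs=True, ca_certs=settings["ca_certs"])
```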


## 3. Retrieve the dataset

Here I use a helper function I wrote, `esf.get_residential_units_by_borough`, which queries the database and returns the matching records as a pandas DataFrame; `ppf.format_df` then formats the result.
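The helper itself is not shown here. Purely as an illustration, a query body for this kind of lookup could be written in the Elasticsearch query DSL as a `bool` filter. The function `build_residential_query` is hypothetical and is not the author's implementation; the field names come from the DataFrame columns listed below, and filtering on `decision_date` and on `total_no_proposed_residential_units > 0` are assumptions.

```python
def build_residential_query(borough: str, year: int) -> dict:
    """Hypothetical query-DSL body: applications in one borough, decided in
    the given calendar year, proposing at least one residential unit.

    All filters (and the choice of decision_date) are assumptions.
    """
    return {
        "bool": {
            "filter": [
                # 'match' lets 'camden' match 'Camden' only if the field is analysed text
                {"match": {"borough": borough}},
                {"range": {"decision_date": {"gte": f"{year}-01-01", "lte": f"{year}-12-31"}}},
                {"range": {"total_no_proposed_residential_units": {"gt": 0}}},
            ]
        }
    }


query = build_residential_query("camden", 2021)
print(query)
```

With the 8.x Python client, a body like this could be passed as `es.search(index=..., query=query, size=...)`. The index name is not stated here (it may be `db_name`, but that is an assumption), and whether the `match` clause is case-insensitive depends on the index mapping.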

In [3]:
borough = 'camden'
year = 2021

df = esf.get_residential_units_by_borough(es=es, borough=borough, year=year)
df = ppf.format_df(df)

In [4]:
df.head()

| | uprn | pp_id | decision | wgs84_polygon.coordinates | wgs84_polygon.type | description | total_no_proposed_residential_units | habitable_rooms_density | site_area | borough | ... | site_name | decision_date | valid_date | lpa_app_no | polygon.geometries | polygon.type | site_number | status | wgs84_polygon | polygon |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5048246 | | Approved | `[[[-0.204059, 51.553921], [-0.204056, 51.55392...` | Polygon | Conversion of 1 x 5 bed dwellinghouse into 2 f... | 2 | | | Camden | ... | | 2021-10-13 | 2020-09-10 | 2020/4107/P | `[{'coordinates': [[[524609.7521472294, 185420....` | geometrycollection | 2.0 | Lapsed | | |
| 1 | 5190055 | | Approved | `[[[-0.18916339999999998, 51.5406683], [-0.1890...` | Polygon | Erection of 2-storey plus basement house with ... | 1 | | | Camden | ... | Garages And Land Adjacent To 39 Priory Terrace | 2021-04-14 | 2020-06-11 | 2020/2839/P | `[{'coordinates': [[[525677.1274171, 183972.261...` | GeometryCollection | 41.0 | Completed | | |
| 2 | 5006173 | 7937949.0 | Approved | `[[[-0.2014926, 51.5522848], [-0.2014673, 51.55...` | Polygon | Excavation of basement including new front bay... | 1 | | | Camden | ... | | 2021-05-26 | 2019-07-19 | 2019/3109/P | `[{'coordinates': [[[524790.4014361, 185242.954...` | GeometryCollection | 1.0 | Lapsed | | |
| 3 | 5090695 | 8621615.0 | Approved | `[[[-0.1514144, 51.5444391], [-0.1513821, 51.54...` | Polygon | Redevelopment of site including demolition of ... | 115 | | | Camden | ... | Former Charlie Ratchford Centre | 2021-11-05 | 2020-11-02 | 2020/5063/P | `[{'coordinates': [[[528284.1966095, 184457.049...` | GeometryCollection | | Commenced | | |
| 4 | 5109475 | 9450405.0 | Refused | `[[[-0.1383528, 51.5352708], [-0.1385007, 51.53...` | Polygon | Erection of 2x three storey mews houses on sit... | 3 | | | Camden | ... | | 2021-08-10 | 2021-05-24 | 2021/0602/P | `[{'coordinates': [[[529215.8189115, 183460.480...` | GeometryCollection | 8.0 | Refused | | |


In [5]:
df.columns

Index(['uprn', 'pp_id', 'decision', 'wgs84_polygon.coordinates',
       'wgs84_polygon.type', 'description',
       'total_no_proposed_residential_units', 'habitable_rooms_density',
       'site_area', 'borough', 'street_name', 'site_name', 'decision_date',
       'valid_date', 'lpa_app_no', 'polygon.geometries', 'polygon.type',
       'site_number', 'status', 'wgs84_polygon', 'polygon'],
      dtype='object')
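Dotted column names such as `wgs84_polygon.coordinates` and `polygon.geometries` are what you get when nested JSON documents are flattened with `pandas.json_normalize`. Whether `esf` does exactly this is an assumption. The sketch below shows how Elasticsearch-style hits could be turned into such a frame; the helper `hits_to_dataframe` is hypothetical, and the single sample hit reuses a few values from row 1 of the table above.

```python
import pandas as pd


def hits_to_dataframe(response: dict) -> pd.DataFrame:
    """Flatten the `_source` of each hit in an Elasticsearch-style response.

    Nested objects become dotted column names, e.g. {"polygon": {"type": ...}}
    turns into a `polygon.type` column.
    """
    sources = [hit["_source"] for hit in response["hits"]["hits"]]
    return pd.json_normalize(sources)


# Minimal response shaped like a search result; values taken from row 1 above.
sample_response = {
    "hits": {
        "hits": [
            {"_source": {
                "uprn": 5190055,
                "borough": "Camden",
                "lpa_app_no": "2020/2839/P",
                "polygon": {"type": "GeometryCollection"},
            }}
        ]
    }
}

frame = hits_to_dataframe(sample_response)
print(list(frame.columns))
```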

## 4. Inspect the free-text descriptions

In [None]:
for text in df['description'].head(10):
    # f-string formatting also copes with missing (NaN) descriptions
    print(f'{text}\n')

Conversion of 1 x 5 bed dwellinghouse into 2 flats and replacement of front, side and rear single glazed timber framed windows with double glazed timber framed windows.

Erection of 2-storey plus basement house with front lightwell and associated landscaping following demolition of existing garage. 

Excavation of basement including new front bay window and front garden area. Erection of single storey rear extension and green roof above, installation of two rooflights to front roofslope and replacement dormer window to rear roofslope in the creation of one additional residential unit. Erection of bin and bike store to rear.



Redevelopment of site including demolition of existing buildings and erection of a building up to 10 storeys in height to provide self-contained residential flats (Class C3) and associated works.

Erection of 2x three storey mews houses on site of existing car park

Change of use from office (Class B1a) to residential (Class C3) at ground floor level to provide o
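The run of blank lines in the output above appears to come from an empty description. Missing (NaN) or blank descriptions can trip up later string processing, so it may be worth counting them first. A small sketch: `count_blank_descriptions` is an illustrative helper, and the three-item series is a toy example whose first string is quoted from the output above.

```python
import pandas as pd


def count_blank_descriptions(descriptions: pd.Series) -> int:
    """Count entries that are missing (NaN/None) or contain only whitespace."""
    cleaned = descriptions.fillna("").astype(str).str.strip()
    return int((cleaned == "").sum())


# Toy series: one real description, one empty string, one missing value.
example = pd.Series([
    "Erection of 2x three storey mews houses on site of existing car park",
    "",
    None,
])
print(count_blank_descriptions(example))
```

On the notebook data this would be called as `count_blank_descriptions(df['description'])`.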

In [11]:
camden_21_descriptions = list(df['description'])

In [14]:
print("Number of descriptions for residential planning applications in Camden in 2021:", len(camden_21_descriptions))

Number of descriptions for residential planning applications in Camden in 2021: 98


## 5. Regex match 

A simple regex match to count how many descriptions mention a lightwell, written either as 'lightwell' or 'light well'.

In [12]:
# Case-insensitive, whole-word match for 'lightwell' or 'light well'

lightwell_regex = re.compile(r'\b(?:lightwell|light well)\b', re.IGNORECASE)
lightwell_matches = []
for text in camden_21_descriptions:
    if lightwell_regex.search(text):
        lightwell_matches.append(text)
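The loop above keeps only the matching text. A vectorised alternative, sketched here with pandas' `Series.str.contains`, returns a boolean mask instead, so each match stays attached to its row and fields such as `lpa_app_no` remain available; `na=False` treats missing descriptions as non-matches. The three-row frame pairs descriptions quoted in section 4 with their application numbers from the `df.head()` output.

```python
import pandas as pd

# Same pattern as the cell above, as a string; case=False below adds IGNORECASE.
LIGHTWELL_PATTERN = r"\b(?:lightwell|light well)\b"

sample = pd.DataFrame({
    "lpa_app_no": ["2020/4107/P", "2020/2839/P", "2021/0602/P"],
    "description": [
        "Conversion of 1 x 5 bed dwellinghouse into 2 flats and replacement of front, "
        "side and rear single glazed timber framed windows with double glazed timber framed windows.",
        "Erection of 2-storey plus basement house with front lightwell and associated "
        "landscaping following demolition of existing garage.",
        "Erection of 2x three storey mews houses on site of existing car park",
    ],
})

# Boolean mask: True where the description mentions a lightwell
mask = sample["description"].str.contains(LIGHTWELL_PATTERN, case=False, regex=True, na=False)
print(sample.loc[mask, "lpa_app_no"].tolist())
```

On the notebook data the equivalent filter would be `df[df['description'].str.contains(LIGHTWELL_PATTERN, case=False, na=False)]`.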

In [15]:
print(f'{len(lightwell_matches)} matches found for the regex "lightwell" in the descriptions of residential units in Camden in 2021.')

6 matches found for the regex "lightwell" in the descriptions of residential units in Camden in 2021.
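Because `\b` requires a word boundary straight after 'well', the pattern used above does not match the plural 'lightwells' or a hyphenated 'light-well'. A broader pattern is sketched below for comparison. Which spellings matter is an assumption, the test phrases are made up, and rerunning the count with this pattern could give a number other than 6.

```python
import re

# Original pattern from the cell above
narrow_lightwell_regex = re.compile(r"\b(?:lightwell|light well)\b", re.IGNORECASE)
# Broader, hypothetical variant: optional space or hyphen between "light" and
# "well", plus an optional plural "s"
broad_lightwell_regex = re.compile(r"\blight[\s-]?wells?\b", re.IGNORECASE)

# Made-up phrases to compare the two patterns
phrases = [
    "front lightwell",
    "two new lightwells",
    "rear light-well",
    "Light Well to rear",
    "lightweight canopy",
]
for phrase in phrases:
    narrow = bool(narrow_lightwell_regex.search(phrase))
    broad = bool(broad_lightwell_regex.search(phrase))
    print(f"{phrase!r}: narrow={narrow}, broad={broad}")
```

Swapping `broad_lightwell_regex` into the loop in place of `lightwell_regex` would give the broader count.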
