# Image web scraping model for Bangkok 🏛

A model for scraping land information and the red parcel-boundary points of a land plot of interest from https://landsmaps.dol.go.th/

🚨 **Required input columns:**
- deed_no_ref
- province
- district
- ltt (latitude)
- lgt (longitude)

## 📝 Set all dataset and file paths here
For quick use, you can set function modes, dataframes, image paths, etc. in this section.

#### 📝 Dataframe path

In [1]:
# Input dataframe
input_df_path = 'mockup_data.csv'
# Output dataframe
output_df_path = 'mockup_result.csv'

#### 📝 Set column names

In [2]:
deed_col_name = "deed_no_ref"
prov_col_name = "province"
dist_col_name = "district"
ltt_col_name = "ltt"
lgt_col_name = "lgt"

#### 📝 Result image directory

🚨 Don't forget to create these folders. The code does not create them itself.

In [3]:
# Folder for saving website capture images
capture_dir = "scraping_image/capture/"

# Folder for saving website capture images with the detected points marked (for checking accuracy)
preview_dir = "missing_bkk_scrp_img/"

**Function mode and flag 🏳**
- **nearby_district_mode**: If the parcel is not found (or its coordinates do not match) in the given district, also try nearby districts and keep the closest match (True/False)
- **collect_shape_mode**: Collect the boundary points of the land parcel's shape (True/False)
- **capture_img_flg**: When collecting shapes, save the website's land image to capture_dir (True/False)
- **preview_img_flg**: When collecting shapes, save the website's land image with the detected points marked to preview_dir (True/False)

In [4]:
nearby_district_mode = True
collect_shape_mode = True  
capture_img_flg = False
preview_img_flg = False

#### 📝 Driver path

Download ChromeDriver >> https://chromedriver.chromium.org/downloads (make sure it matches your installed Chrome version)

In [5]:
# Path to the ChromeDriver executable
# ~/chromedriver.exe for Windows
# ~/chromedriver     for macOS
driver_path = '/Users/senmeetechin/Desktop/00Last code/selenium driver/Mac_x64/chrome108/chromedriver'

#### 📝 Move-up offset (pixels)

In [6]:
# Found by experiment: dragging the map up by 148 px moves the red pin to the center of the canvas
move_px = 148

## 1. INPUT DATA
🚨 Choose the dataframe of interest. It must contain **province**, **ltt**, **lgt**, **district** and **deed_no_ref**.

In [7]:
import pandas as pd
df_loc = pd.read_csv(input_df_path)
df_loc.head()

| Unnamed: 0 | deed_no_ref | ltt | lgt | province | district |
|---|---|---|---|---|---|
| 0 | 61849 | 13.665516 | 100.499072 | กรุงเทพมหานคร | ราษฎร์บูรณะ |
| 1 | 18645 | 13.844767 | 100.53752 | กรุงเทพมหานคร | ลาดกระบัง |


In [8]:
# Fix the misspelled district name สาธร -> สาทร (Sathon)
df_loc = df_loc.replace('สาธร', 'สาทร')
df_loc[df_loc.district=='สาธร']

| Unnamed: 0 | deed_no_ref | ltt | lgt | province | district |
|---|---|---|---|---|---|


## 2. Web Scraping
This notebook uses Selenium for web scraping, with Chrome as the browser.

### 2.1 District converter for web scraping
Uses **thai_loc.json** or **bkk_loc.json**, which are attached to this Jupyter notebook. If they cause problems, you can regenerate them with **df_preparation.ipynb**.

- **thai_loc.json:** District converter for every Thai province (does not support switching to nearby districts)
- **bkk_loc.json:** District converter for Bangkok (supports switching to nearby districts within Bangkok)
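
The scraping code uses the converter as a nested mapping from `province` to `district` to a list of website option texts. In `red_deed_scraping`, the first option is presumably the district itself and any later options are nearby districts. Below is a minimal sketch of that lookup under this assumed shape. The option strings such as `"24-ราษฎร์บูรณะ"` are hypothetical placeholders, not values copied from the real JSON. `web_to_district` mirrors the `split('-')[-1]` rule the notebook uses to recover the district name.

```python
import json

# Hypothetical converter content; the real bkk_loc.json may use different option texts.
converter_json = """
{
  "กรุงเทพมหานคร": {
    "ราษฎร์บูรณะ": ["24-ราษฎร์บูรณะ", "19-ทุ่งครุ", "18-จอมทอง"]
  }
}
"""
district_converter = json.loads(converter_json)


def candidate_districts(converter, province, district, nearby_district_mode=True):
    """Return the website option texts to try, in order (hypothetical helper)."""
    options = converter[province][district]
    # Without nearby-district mode only the first option is tried.
    return options if nearby_district_mode else options[:1]


def web_to_district(option_text):
    # Same rule as red_deed_scraping: keep the text after the last '-'.
    return option_text.split('-')[-1]


for option in candidate_districts(district_converter, "กรุงเทพมหานคร", "ราษฎร์บูรณะ"):
    print(option, "->", web_to_district(option))
```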

In [9]:
import json
# district_converter maps each district name to its website option text(s)
converter = 'bkk_loc.json' # thai_loc or bkk_loc

with open(converter, 'r', encoding='utf-8') as json_file:
    district_converter = json.load(json_file)

### 2.2 Open webdriver

#### Open webdriver and access website
### 🚨 Please wait until the web driver has finished loading the page!!

In [10]:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Open webdriver
options = webdriver.ChromeOptions()
options.add_argument("--window-size=1066,792") # The window size affects move_px!
# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(driver_path), options=options)

# Access website
driver.get('https://landsmaps.dol.go.th/')

#### Reload the page and close the pop-up that appears on access

In [11]:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.select import Select

def reload():
    driver.refresh()
    # WEB UPDATE: the site no longer shows this pop-up, so a timeout here is expected
    try:
        # wait until the pop-up's close button is clickable
        wait = WebDriverWait(driver, 4)
        wait.until(EC.element_to_be_clickable((By.XPATH, '//*[@id="modal_news"]/div/div/div[1]/button'))).click()
    except Exception:
        pass
    
    # change map layer to OpenStreetMap
    driver.find_element(By.XPATH, '//*[@id="layer"]').click()
    wait = WebDriverWait(driver, 10)
    wait.until(EC.visibility_of_element_located((By.XPATH, '//*[@id="OpenStreetMap"]'))).click()

### 2.3 Access the location of interest

In [12]:
# Fill province, district, and deed_no_ref into the website's search form
def access_loc(province, district_web, deed_no_ref):    
    # Set province in website
    province_select = Select(driver.find_element(By.XPATH, '//*[@id="cbprovince"]'))
    province_select.select_by_visible_text(province)
    
    # Set district in website
    district_select = Select(driver.find_element(By.XPATH, '//*[@id="cbamphur"]'))
    district_select.select_by_visible_text(district_web)
    
    # Set deed_no_ref in website
    deedno_select = driver.find_element(By.XPATH, '//*[@id="txtparcelno"]')
    deedno_select.clear()
    deedno_select.send_keys(str(deed_no_ref))

    # click find button
    wait = WebDriverWait(driver, 10)
    wait.until(EC.visibility_of_element_located((By.XPATH, '//*[@id="btnSearch"]'))).click()

#### Move and zoom the canvas with the mouse

In [13]:
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.actions.wheel_input import ScrollOrigin
import time

# Drag the map up by px pixels so the pin sits at the center of the canvas
def move_up(px):
    canvas = driver.find_element(By.XPATH, '//*[@id="cesiumContainer"]/div[1]/div[1]/div/canvas')
    action = ActionChains(driver)
    action.click_and_hold(canvas)\
        .move_to_element_with_offset(canvas, 0, -1*px)\
        .pause(1)\
        .release()\
        .perform()
    
# Zoom in/out (delY<0: zoom in, delY>0: zoom out)
def zoom_in(delY):
    canvas = driver.find_element(By.XPATH, '//*[@id="cesiumContainer"]/div[1]/div[1]/div/canvas')
    action = ActionChains(driver)
    action.move_to_element_with_offset(canvas, 0, 0)
    canvas_origin = ScrollOrigin.from_element(canvas, 0, 0)
    action.scroll_from_origin(canvas_origin, 0, delY).perform()

### 2.4 Get info by location

In [14]:
def get_loc_info(df_loc):
    
    df_out = df_loc.copy()
    try: 
        # Click the land-office tab; if it is missing, the location was not found
        wait = WebDriverWait(driver, 4)
        wait.until(EC.element_to_be_clickable((By.XPATH, '//*[@id="accordion"]/div/div[3]'))).click()
        time.sleep(0.5)

        # collect dealing file number - หน้าสำรวจ
        try:
            deal_no = int(driver.find_element(By.XPATH, '//*[@id="demo1"]/div[2]/div[2]').text)
            df_out['deal_no'] = deal_no
        except:
            df_out['deal_no'] = None

        # collect land number - เลขที่ดิน
        try:
            land_no = int(driver.find_element(By.XPATH, '//*[@id="demo1"]/div[3]/div[2]').text)
            df_out['land_no'] = land_no
        except:
            df_out['land_no'] = None

        # collect sheet - ระวาง
        try:
            sheet = driver.find_element(By.XPATH, '//*[@id="demo1"]/div[4]/div[2]').text
            df_out['sheet'] = sheet
        except:
            df_out['sheet'] = None

        # collect area - ไร่, งาน, ตารางวา
        try:
            area = driver.find_element(By.XPATH, '//*[@id="demo1"]/div[8]/div[2]').text
            rai = float(area.split(' ')[0].replace(',',''))
            ngan = float(area.split(' ')[2].replace(',',''))
            talangwa = float(area.split(' ')[4].replace(',',''))
            df_out['area'] = area
            df_out['rai'] = rai
            df_out['ngan'] = ngan
            df_out['talangwa'] = talangwa
        except:
            df_out['area'] = None
            df_out['rai'] = None
            df_out['ngan'] = None
            df_out['talangwa'] = None

        # collect estimation price - ราคาประเมินที่ดิน
        try:
            est_price = driver.find_element(By.XPATH, '//*[@id="demo1"]/div[9]/div[2]').text
            est_price = float(est_price.split(' ')[0].replace(',',''))
            df_out['est_price'] = est_price
        except:
            df_out['est_price'] = None

        # collect parcel coordinates - ค่าพิกัดแปลง
        try:
            parcel_coor = driver.find_element(By.XPATH, '//*[@id="demo1"]/div[10]/div[2]/a').text
            parcel_ltt = float(parcel_coor.split(',')[0])
            parcel_lgt = float(parcel_coor.split(',')[1])
            df_out['parcel_ltt'] = parcel_ltt
            df_out['parcel_lgt'] = parcel_lgt
        except:
            df_out['parcel_ltt'] = None
            df_out['parcel_lgt'] = None

        # collect land office - สำนักงานที่ดิน
        try:
            land_office = driver.find_element(By.XPATH, '//*[@id="demo2"]/div[1]/div[2]').text
            df_out['land_office'] = land_office
        except:
            df_out['land_office'] = None

        # collect office coordinates - ค่าพิกัดสำนักงานที่ดิน
        try:
            office_coor = driver.find_element(By.XPATH, '//*[@id="demo2"]/div[7]/div[2]').text
            office_ltt = float(office_coor.split(',')[0])
            office_lgt = float(office_coor.split(',')[1])
            df_out['office_ltt'] = office_ltt
            df_out['office_lgt'] = office_lgt 
        except:
            df_out['office_ltt'] = None
            df_out['office_lgt'] = None

        # Close office button
        driver.find_element(By.XPATH, '//*[@id="accordion"]/div/div[3]').click()
        time.sleep(0.5)
    
    except:
#         print("ERROR: can't find contour of POI")
        df_out['error'] = "can't find contour of POI"
 
    return df_out
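
`get_loc_info` reads the area field by splitting on spaces. It takes tokens 0, 2 and 4 as rai, ngan and square wah (ตารางวา), which assumes text shaped like `"1 ไร่ 2 งาน 50.5 ตารางวา"`. That exact format is an assumption here. Below is a minimal sketch of the same parsing, plus a conversion to square metres using the standard Thai units: 1 rai = 4 ngan = 400 square wah = 1,600 m², so 1 ngan = 400 m² and 1 square wah = 4 m². The helper names are hypothetical and not part of the notebook.

```python
def parse_area(area_text):
    """Split an area string like '1 ไร่ 2 งาน 50.5 ตารางวา' into (rai, ngan, talangwa)."""
    tokens = area_text.split(' ')
    # Same token positions as get_loc_info: the numbers sit at indices 0, 2 and 4.
    rai = float(tokens[0].replace(',', ''))
    ngan = float(tokens[2].replace(',', ''))
    talangwa = float(tokens[4].replace(',', ''))
    return rai, ngan, talangwa


def area_to_sqm(rai, ngan, talangwa):
    # 1 rai = 1,600 m², 1 ngan = 400 m², 1 square wah (ตารางวา) = 4 m²
    return rai * 1600 + ngan * 400 + talangwa * 4


rai, ngan, talangwa = parse_area("1 ไร่ 2 งาน 50.5 ตารางวา")
print(area_to_sqm(rai, ngan, talangwa))  # 2602.0
```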

In [15]:
# Close every tab except the first, in case the driver accidentally clicked a link that opened a new one
def close_multiple_tab():
    # If driver has multiple tab in browser
    while len(driver.window_handles)>1:

        # Switch to the new window and close it
        driver.switch_to.window(driver.window_handles[1])
        driver.close()

        # Switching to old tab
        driver.switch_to.window(driver.window_handles[0])

### 2.5 Image capturing

In [16]:
# hide website element
def hide_menu():
    js_script = '''
    document.querySelector(".wrapper").style.visibility = "hidden";
    document.querySelector(".navbar").style.visibility = "hidden";
    document.querySelector(".myDiv").style.visibility = "hidden";
    document.querySelector(".cesium-viewer-toolbar").style.visibility = "hidden";
    document.querySelector("#layer").style.visibility = "hidden";
    document.querySelector("#menu").style.visibility = "hidden";

    '''
    driver.execute_script(js_script)
    
# show website element
def show_menu():
    js_script = '''
    document.querySelector(".wrapper").style.visibility = "visible";
    document.querySelector(".navbar").style.visibility = "visible";
    document.querySelector(".myDiv").style.visibility = "visible";
    document.querySelector(".cesium-viewer-toolbar").style.visibility = "visible";
    document.querySelector("#layer").style.visibility = "visible";
    document.querySelector("#menu").style.visibility = "visible";

    '''
    driver.execute_script(js_script)

#### Capture canvas

In [17]:
from PIL import Image
from io import BytesIO
import numpy as np

def get_canvas_info():
    canvas = driver.find_element(By.XPATH, '//*[@id="cesiumContainer"]/div[1]/div[1]/div/canvas')
    canvas_loc = canvas.location
    canvas_size = canvas.size
    return canvas_loc, canvas_size

def capture_img(canvas_loc, canvas_size):
    # hide menu for capturing
    hide_menu()

    # screenshot of the canvas element only (not the entire page)
    canvas = driver.find_element(By.XPATH, '//*[@id="cesiumContainer"]/div[1]/div[1]/div/canvas')
    png = canvas.screenshot_as_png

    # uses PIL library to open image in memory
    capture = Image.open(BytesIO(png))

    # crop only canvas
    left = canvas_loc['x']
    top = canvas_loc['y']
    right = canvas_loc['x'] + canvas_size['width']
    bottom = canvas_loc['y'] + canvas_size['height']

    capture = capture.crop((left, top, right, bottom)) # defines crop points
    show_menu()
    # trim to the canvas size in case the page zoom changed the screenshot size
    capture_slice = np.array(capture)[:canvas_size['height'], :canvas_size['width'], :]
    return capture_slice

### 2.6 Image processing

In [18]:
import cv2  # cv2.ximgproc (thinning, used below) needs the opencv-contrib-python package
import numpy as np
from matplotlib import pyplot as plt

In [19]:
# Filter only red color from image
def filter_red(img):
    im = img.copy()
    red_im = im[:,:,0]
    green_im = im[:,:,1]
    blue_im = im[:,:,2]
    red_filtered = (red_im > 200) & (green_im < 200) & (blue_im < 200)
    
    # Convert the boolean mask to a 0/255 uint8 image
    img = np.array(red_filtered * 255, dtype = np.uint8)
    shape = red_filtered.shape
    return img, shape
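
`filter_red` keeps a pixel when its red channel is above 200 and both green and blue are below 200. The screenshot is a PNG, so it carries an alpha channel, which the filter ignores. Here is a small numpy-only illustration of the rule on a hypothetical 2×2 RGBA image:

```python
import numpy as np

# Hypothetical 2x2 RGBA pixels: pure red, white, dark red, light pink
img = np.array([
    [[255, 0, 0, 255], [255, 255, 255, 255]],
    [[150, 0, 0, 255], [255, 180, 180, 255]],
], dtype=np.uint8)

# Same rule as filter_red: R > 200, G < 200, B < 200 (alpha is ignored)
mask = (img[:, :, 0] > 200) & (img[:, :, 1] < 200) & (img[:, :, 2] < 200)
red_mask = (mask * 255).astype(np.uint8)
print(red_mask)
# [[255   0]
#  [  0 255]]
```

The light pink pixel (255, 180, 180) also passes. This presumably helps keep the anti-aliased edges of the red boundary line.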

In [20]:
# Check that the contour encloses the land parcel of interest (the pin sits at the canvas center)
def check_loc(contours, loc_idx, shape):
    # fill contours
    contour_loc = contours[loc_idx]
    fill_contour = np.zeros(shape=(shape), dtype=np.uint8)
    cv2.fillPoly(fill_contour, [contour_loc], 255)
    # check the pin location is filled or not
    if(fill_contour[shape[0]//2, shape[1]//2] == 255):
        return True
    else:
        return False
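
`check_loc` rasterises the contour with `cv2.fillPoly` and tests whether the canvas center, where the pin was moved, is filled. The same question can be answered without drawing by using the even-odd ray-casting test. This is a swapped-in technique shown only to illustrate the idea; it is not what the notebook runs. Vertices follow OpenCV's contour convention of `(x, y)`, whereas `check_loc` indexes the filled image as `[row, col]`, i.e. `[y, x]`.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd ray casting: True if (x, y) lies strictly inside the polygon.

    polygon is a list of (x, y) vertices, like a flattened OpenCV contour.
    Points exactly on an edge may go either way.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical 100x80 canvas and a square parcel around its center
width, height = 100, 80
square = [(30, 20), (70, 20), (70, 60), (30, 60)]
print(point_in_polygon(width // 2, height // 2, square))  # True
print(point_in_polygon(5, 5, square))                      # False
```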

In [21]:
# Find all map-line contours, locate the parcel of interest, and draw every contour except the pin
def get_map_line(red_filtered):    
    img = red_filtered

    # apply morphology close
    kernel = np.ones((3,3), np.uint8)
#     thresh = cv2.erode(img, kernel, iterations = 1)
    thresh = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)

    # get contours and filter on area
    contours, hierarchy = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    contours_cp = contours

    # get location contours from hierarchy [Next, Previous, First_Child, Parent]
    loc_idx = -1
    contours_del = []
    for hier_idx, pin in enumerate(hierarchy[0]):
        if pin[0] == -1 and pin[1] == -1 and pin[2] == -1:
            # Case: Full pin
            pin_idx = pin[3]
            pin_parent_idx = hierarchy[0][pin_idx][3]
            # check correctness
            if check_loc(contours, pin_parent_idx, img.shape):
                contours_del.append(pin_idx)
                contours_del.append(hier_idx)
                loc_idx = pin_parent_idx
#             # Case: Half pin
#             loc_idx = pin[3]
#             contours_del.append(hier_idx)
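    # delete pin contours (contours is a tuple of differently sized arrays, so filter it with a list comprehension)
    pin_del = set(contours_del)
    contours_cp = [c for idx, c in enumerate(contours) if idx not in pin_del]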
    # draw all contours
    map_line = np.zeros(shape=(red_filtered.shape), dtype=np.uint8)
    for c in contours_cp:
        cv2.drawContours(map_line, [c], -1, 255, 2)
    return loc_idx, contours, hierarchy, map_line
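
The pin search in `get_map_line` relies on OpenCV's `RETR_TREE` hierarchy. Each row is `[Next, Previous, First_Child, Parent]`, and `-1` means "none". A contour with no siblings and no children is treated as the pin's inner hole: its parent is the pin outline and its grandparent is the candidate parcel. Below is a pure-Python sketch of that lookup on a hand-built, hypothetical hierarchy; the pixel check done by `check_loc` is left out. In `get_map_line` itself, a lone top-level contour would give `pin_idx = -1`, which Python's negative indexing turns into the last row. The `parent != -1` guard here avoids that.

```python
# Hypothetical hierarchy rows: [Next, Previous, First_Child, Parent]
# 0: outer frame of the map lines
# 1: parcel boundary (child of 0)
# 2: pin outline (child of 1)
# 3: pin hole (only child of 2, no siblings, no children)
# 4: neighbouring parcel (sibling of 1)
hierarchy = [
    [-1, -1, 1, -1],
    [4, -1, 2, 0],
    [-1, -1, 3, 1],
    [-1, -1, -1, 2],
    [-1, 1, -1, 0],
]


def find_pin_candidates(hierarchy):
    """Return (hole_idx, pin_idx, parcel_idx) for every lone leaf contour."""
    candidates = []
    for idx, (nxt, prev, child, parent) in enumerate(hierarchy):
        if nxt == -1 and prev == -1 and child == -1 and parent != -1:
            pin_idx = parent
            parcel_idx = hierarchy[pin_idx][3]
            candidates.append((idx, pin_idx, parcel_idx))
    return candidates


print(find_pin_candidates(hierarchy))  # [(3, 2, 1)]
```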

In [22]:
# Get contour of interest location and points of contour
def get_interest_loc(contours, loc_idx, shape):
    # fill location contours
    contour_loc = contours[loc_idx]
    contour_line = np.zeros(shape=(shape), dtype=np.uint8)
    cv2.drawContours(contour_line, [contour_loc], -1, 255, 2)

    # dilation for increasing point size
    kernel = np.ones((5,5), np.uint8)
    contour_point = cv2.dilate(contour_line, kernel, iterations=1)
    
    # Find center of contour
    M = cv2.moments(contour_loc)
    if M['m00'] != 0:
        cx = int(M['m10']/M['m00'])
        cy = int(M['m01']/M['m00'])
        center = [cy, cx]
    else:
        center = None
    return contour_loc, contour_line, contour_point, center
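
For a contour, `cv2.moments` computes polygon moments, and the centroid is `(m10/m00, m01/m00)`. Below is a pure-Python sketch of the same ratios using the shoelace formula, useful for checking the `center` value by hand. The sums stay integers until the final division, so integer vertices give an exact result. Note that `get_interest_loc` stores the center as `[cy, cx]` (row, column), not `(x, y)`.

```python
def polygon_centroid(points):
    """Centroid (cx, cy) of a simple polygon given as (x, y) vertices, or None if degenerate."""
    twice_area = 0  # 2 * m00
    sum_x = 0       # 6 * m10
    sum_y = 0       # 6 * m01
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        sum_x += (x0 + x1) * cross
        sum_y += (y0 + y1) * cross
    if twice_area == 0:
        return None  # like the M['m00'] == 0 branch
    # m10 / m00 = (sum_x / 6) / (twice_area / 2) = sum_x / (3 * twice_area)
    return sum_x / (3 * twice_area), sum_y / (3 * twice_area)


# Hypothetical rectangle spanning x = 10..30 and y = 20..60
cx, cy = polygon_centroid([(10, 20), (30, 20), (30, 60), (10, 60)])
print([int(cy), int(cx)])  # [40, 20], i.e. [row, col] as in get_interest_loc
```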

In [23]:
# Detect corner point of interesting shape from contour_line
def corner_detection(contour_line, contour_loc, shape):
    # Find corner point
    dst = cv2.cornerHarris(contour_line,2,3,0.04)
    cor_point = np.zeros(shape=(shape), dtype=np.uint8)
    cor_point[dst>0.08*dst.max()] = 255
    
    # Dilation -> increase size of point
    kernel = np.ones((5,5), np.uint8)
    large_cor_point = cv2.dilate(cor_point, kernel, iterations=1)
    
    # draw contour point
    contour_point = np.zeros(shape=(shape), dtype=np.uint8)
    for point in contour_loc:
        x, y = point[0]
        contour_point[y][x] = 255
    
    # Find the point that intersect between contour point and corner point
    corner_points = np.bitwise_and(large_cor_point, contour_point)
    
    # Send non-zero position in array as output
    corner_points1 = np.transpose(np.nonzero(np.copy(corner_points))).tolist()
    return corner_points1

In [24]:
# Detect intersection points of all map lines that lie on the contour of interest
def intersec_detection(map_line, contour_loc, shape):
    # thin all contours of the image to 1-px-wide lines
    skeleton = cv2.ximgproc.thinning(map_line, None, 1)
    _, binaryImage = cv2.threshold(skeleton, 128, 10, cv2.THRESH_BINARY)


    # Set the intersections kernel:
    h = np.array([[1, 1, 1],
                  [1, 10, 1],
                  [1, 1, 1]])

    # Convolve the image with the kernel
    imgFiltered = cv2.filter2D(binaryImage, -1, h)

    # Prepare the final mask of points
    (height, width) = binaryImage.shape
    pointsMask = np.zeros((height, width, 1), np.uint8)

    # Perform convolution and create points mask
    thresh = 130
    # Locate the threshold in the filtered image
    pointsMask = np.where(imgFiltered == thresh, 255, 0)

    # Convert and shape the image to a uint8 height x width x channels
    pointsMask = pointsMask.astype(np.uint8)

    # dilation for increasing point size
    kernel = np.ones((3,3), np.uint8)
    intersec_point = cv2.dilate(pointsMask, kernel, iterations=2)

    # draw contour points
    contour_point = np.zeros(shape=(shape), dtype=np.uint8)
    for point in contour_loc:
        x, y = point[0]
        contour_point[y][x] = 255
    
    # Find the point that intersect between contour points and intersection points
    result = np.bitwise_and(contour_point, intersec_point)
    
    # Send non-zero position in array as output
    corner_points2 = np.transpose(np.nonzero(np.copy(result))).tolist()
    return corner_points2
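
The intersection kernel works by arithmetic. After thresholding, skeleton pixels are 10 and background pixels are 0. The kernel weights the center by 10 and each of the 8 neighbours by 1, so a skeleton pixel scores 10·10 + 10·k, where k is its number of skeleton neighbours. `thresh = 130` therefore selects pixels with exactly three skeleton neighbours, which is typically a three-way junction; a pixel with four neighbours would score 140 and is not matched. Below is a numpy-only sketch of that count on a hypothetical Y-shaped skeleton. It pads with zeros, whereas `cv2.filter2D` reflects the border by default, so edge pixels can differ.

```python
import numpy as np

# Hypothetical 5x5 skeleton shaped like a Y; skeleton pixels are 10, as after cv2.threshold(..., 10, ...)
skeleton = np.array([
    [1, 0, 0, 0, 1],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
], dtype=np.int64) * 10

kernel = np.array([[1, 1, 1],
                   [1, 10, 1],
                   [1, 1, 1]], dtype=np.int64)

# Correlate with the 3x3 kernel using zero padding
padded = np.pad(skeleton, 1)
score = np.zeros_like(skeleton)
for dy in range(3):
    for dx in range(3):
        score += kernel[dy, dx] * padded[dy:dy + 5, dx:dx + 5]

junctions = np.argwhere(score == 130)
print(junctions.tolist())  # [[2, 2]]
```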

In [25]:
# Merge corner points and intersection points together and sort in the order of contour points
def merge_point(corner_points1, corner_points2, contour_loc, shape, center):
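    # merge corner points and intersection points, dropping exact duplicates
    # (identical points are skipped by the merge loop below, so they would never be combined)
    corner_points = []
    for point in corner_points1 + corner_points2:
        if point not in corner_points:
            corner_points.append(point)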
    
    # get contour point
    contour_point = []
    for point in contour_loc:
        x, y = point[0]
        contour_point.append([y,x])
    
    # sort merge list by contour points
    corner_points.sort(key=lambda point: contour_point.index(point))

    # combine closed point together
    count = 1
    while(count):
        count = 0
        for idx, point in enumerate(corner_points):
            a, b = point
            match = 0
            dist = 7
            for other_point in corner_points[idx:]:
                if other_point == point:
                    continue
                c, d = other_point
                if abs(c-a) < dist and abs(d-b) < dist:
                    match = 1
                    corner_points[idx] = [(c+a)//2, (d+b)//2]
                    corner_points.remove(other_point)
                    count = count + 1
            
    return corner_points
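
`merge_point` collapses corner and intersection points that lie within 7 px of each other, in both row and column, into their midpoint. It repeats until nothing changes. Below is a simplified, self-contained sketch of that greedy merge. It assumes points are `[row, col]` lists already sorted along the contour and restarts the scan after every merge. The sample points are hypothetical.

```python
def merge_close_points(points, dist=7):
    """Greedily replace pairs closer than `dist` (per axis) by their midpoint."""
    # Drop exact duplicates while keeping order
    merged = []
    for p in points:
        if p not in merged:
            merged.append(list(p))

    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                a, b = merged[i]
                c, d = merged[j]
                if abs(c - a) < dist and abs(d - b) < dist:
                    merged[i] = [(a + c) // 2, (b + d) // 2]
                    del merged[j]
                    changed = True
                    break
            if changed:
                break  # restart the scan after every merge
    return merged


corners = [[10, 10], [12, 13], [10, 10], [50, 40], [90, 10]]
print(merge_close_points(corners))  # [[11, 11], [50, 40], [90, 10]]
```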

### 2.7 Get ltt, lgt from website

In [26]:
# Disable mouse interaction with the tooltip shown over the canvas, so it does not block the pointer
def unable_mouse_interact():
    js_script = '''
    document.querySelector("#cesiumContainer > div.twipsy.right").style['pointer-events'] = "none";
    '''
    driver.execute_script(js_script)

In [27]:
# Hover over each corner point with the marker tool and read its ltt, lgt from the website
def get_ltt_lgt(corner_points):
    # open toolbox
    driver.find_element(By.XPATH, '//*[@id="imageButton"]').click()
    wait = WebDriverWait(driver, 10)
    wait.until(EC.visibility_of_element_located((By.XPATH, '//*[@id="marker"]'))).click()
    hide_menu()
    unable_mouse_interact()

    location = []
    # point the corner location
    for point in corner_points:
        x, y = point
        canvas = driver.find_element(By.XPATH, '//*[@id="cesiumContainer"]/div[1]/div[1]/div/canvas')
        action = ActionChains(driver)
        action.move_to_element_with_offset(canvas, y-canvas.rect['width']//2, x-canvas.rect['height']//2).perform()
#         action.move_to_element_with_offset(canvas, y, x).perform()
        loc = driver.find_element(By.XPATH, '//*[@id="cesiumContainer"]/div[2]/div[2]').text
        location.append(loc)

    # close toolbox
    show_menu()
    wait.until(EC.visibility_of_element_located((By.XPATH, '//*[@id="clear"]'))).click()
    driver.find_element(By.XPATH, '//*[@id="imageButton"]').click()
    return location
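
In current Selenium 4 releases, `move_to_element_with_offset` measures the offset from the element's center. That is why `get_ltt_lgt` subtracts half the canvas width and height. Detected points are stored as `[row, col]`, so the column becomes the x offset and the row becomes the y offset. Below is a small sketch of that conversion, together with parsing the `"lat, lon"` text read back from the page. The helper names and the exact text format are assumptions.

```python
def pixel_to_center_offset(point, canvas_width, canvas_height):
    """Convert a [row, col] pixel into an (x, y) offset from the canvas center (hypothetical helper)."""
    row, col = point
    return col - canvas_width // 2, row - canvas_height // 2


def parse_lat_lon(text):
    """Parse text such as '13.665516, 100.499072'; return None for empty text."""
    if not text:
        return None
    lat_text, lon_text = text.split(', ')
    return float(lat_text), float(lon_text)


print(pixel_to_center_offset([100, 700], canvas_width=1000, canvas_height=600))  # (200, -200)
print(parse_lat_lon('13.665516, 100.499072'))  # (13.665516, 100.499072)
```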

In [28]:
# Save the raw capture and/or the capture with detected points to the selected folders
def preview_img(corner_points, capture, shape, deed_no_ref, district, capture_dir, preview_dir, capture_img_flg, preview_img_flg):
    # Dilation -> Increase corner point size
    kernel = np.ones((3,3), np.uint8)
    test = np.zeros(shape=(shape), dtype=np.uint8)
    for point in corner_points:
        x, y = point
        test[x][y] = 255
    test = cv2.dilate(test, kernel, iterations=2)
    # get point as list of pixel
    test = np.transpose(np.nonzero(np.copy(test))).tolist()

    # set capture image as background
    bg = np.array(capture)
    raw = Image.fromarray(bg)
    # save capture image to capture_dir
    if capture_img_flg:
        raw.save(capture_dir+str(deed_no_ref)+'_'+str(district)+'_capture.png')
    
    # mark the detected points in blue (the capture is RGBA)
    for point in test:
        x, y = point
        bg[x][y] = np.array([0,0,255,255])
    
    # save capture image with the result point to selection path
    preview = Image.fromarray(bg)
    if preview_img_flg:
        preview.save(preview_dir+str(deed_no_ref)+'_'+str(district)+'.png')

## 3. COMBINE ALL FUNCTIONS

In [29]:
def check_ltt_lgt(df, max_diff=1e-4):
    # True when the parcel coordinates found on the website are within max_diff degrees of the input ltt, lgt
    if df.get('parcel_ltt') is None or df.get('parcel_lgt') is None:
        return False

    diff_ltt = abs(df['ltt'] - df['parcel_ltt'])
    diff_lgt = abs(df['lgt'] - df['parcel_lgt'])
    if diff_ltt <= max_diff and diff_lgt <= max_diff:
        return True
    else:
        return False
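
The tolerance of 1e-4 degrees is easier to judge in metres. Below is a sketch of the conversion near Bangkok, using the common spherical approximation of about 111,320 m per degree. A degree of longitude shrinks by cos(latitude). The latitude of 13.7° is an approximate value chosen for illustration.

```python
import math

METERS_PER_DEGREE = 111_320  # rough spherical approximation


def degree_tolerance_in_meters(max_diff, latitude_deg):
    """Approximate north-south and east-west size of a max_diff-degree window."""
    north_south = max_diff * METERS_PER_DEGREE
    east_west = max_diff * METERS_PER_DEGREE * math.cos(math.radians(latitude_deg))
    return north_south, east_west


ns, ew = degree_tolerance_in_meters(1e-4, 13.7)
print(f"{ns:.1f} m north-south, {ew:.1f} m east-west")  # about 11.1 m and 10.8 m
```

So a parcel counts as a match when its website coordinates fall within roughly 11 m of the input point along each axis.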

In [30]:
# Scraping red parcel
from math import sqrt
import numpy as np
def red_deed_scraping(deed_no_ref, 
                      province, 
                      district,
                      ltt,
                      lgt,
                      nearby_district_mode=True,
                      max_diff=1e-4,
                      collect_shape_mode=True, 
                      capture_img_flg=False,
                      preview_img_flg=False):
    # Set website-zoom to 100% before reading the canvas position and size
    driver.execute_script("document.body.style.zoom='100%'")
    canvas_loc, canvas_size = get_canvas_info()
    
    # set result of Series
    df = {}
    df['deed_no_ref'] = deed_no_ref
    df['province'] = province
    df['district'] = district
    df['ltt'] = ltt
    df['lgt'] = lgt
    df['ltt_poly'] = None
    df['lgt_poly'] = None
    df['error'] = None
    
    # correct all nearby district
    district_list = district_converter[province][district]
    ltt_lgt_checker = False
    
    # loop until it can access location (In nearby district mode)
    count_found = 0
    parcel_list = []
    parcel_list_diff = []
    dist_list = []
    mini_distance = np.inf  # np.Inf was removed in NumPy 2.0
    for dist_i, district_web in enumerate(district_list):
        try:
            # refresh website
            reload()
            time.sleep(0.5)
        except:
            # CASE: Can't click openstreetmap layer
            print("ERROR: Problem with reload")
            df['error'] = "problem with reload"
            
        try:
            # access location in website
            access_loc(province, district_web, deed_no_ref)
            time.sleep(3)
            df = get_loc_info(df)
            time.sleep(0.5)

            # close info box
            wait = WebDriverWait(driver, 5)
            wait.until(EC.element_to_be_clickable((By.XPATH, '//*[@id="accordion"]/div/div[12]/button[3]'))).click()
            time.sleep(1)

            # prevent driver click any link
            close_multiple_tab()

            # check ltt and lgt of parcel is correct or not
            if check_ltt_lgt(df, max_diff):
                ltt_lgt_checker = True
                df['web_district'] = district_web.split('-')[-1]
                break
            else:
                if dist_i==0:
                    df['found1st_wrong'] = 1
                count_found = count_found+1
                parcel_list.append((df['parcel_ltt'], df['parcel_lgt']))
                diff_ltt = df['parcel_ltt']-df['ltt']
                diff_lgt = df['parcel_lgt']-df['lgt']
                parcel_list_diff.append((diff_ltt, diff_lgt))
                dist_list.append(district_web)
                df['found_land'] = count_found
                df['parcel_found'] = parcel_list
                df['parcel_diff_found'] = parcel_list_diff
                df['district_found'] = dist_list

                # find minimal distance
                distance = sqrt(diff_ltt**2 + diff_lgt**2)
                if distance < mini_distance:
                    mini_distance = distance
                    df['mini_distance_district'] = district_web
                    df['mini_distance_index'] = len(parcel_list_diff)-1
        except Exception:
            # Parcel not found (or not readable) in this district; try the next one
            pass
        
        # Break if not nearby district mode for getting only first district
        if not nearby_district_mode:
            break
    
    if not ltt_lgt_checker:
        if mini_distance < np.inf:
            access_loc(province, df['mini_distance_district'], deed_no_ref)
            df['web_district'] = df['mini_distance_district']
            time.sleep(3)
            df = get_loc_info(df)
            time.sleep(0.5)

            # close info box
            wait = WebDriverWait(driver, 5)
            wait.until(EC.element_to_be_clickable((By.XPATH, '//*[@id="accordion"]/div/div[12]/button[3]'))).click()
            time.sleep(1)
            
            # prevent driver click any link
            close_multiple_tab()
        else:
            # CASE: Can't access interesting location (Wrong data or Website doesn't have location data)
            print("ERROR: can't access POI data")
            df['error'] = "can't access POI data"
            return df
         
    # prevent driver click any link
    close_multiple_tab()

    # Stop here when only the land information is needed
    if not collect_shape_mode:
        return df

    try:
        # Set pin to the center and zoom to location
        move_up(move_px) # window 146, mac 169
        for i in range(4):
            zoom_in(-400)
        time.sleep(2)

    except:
        # CASE: Found problem while zooming (Selenium initial problem)
        print("ERROR: problem with zoom in")
        df['error'] = "problem with zoom in"
        return df
    
    # Set website-zoom to 50% (for smaller pin size)
    driver.execute_script("document.body.style.zoom='50%'")
    # Capture canvas
    capture = capture_img(canvas_loc, canvas_size)
    # Filter red color from image
    red_filtered, shape = filter_red(capture)
    
    try:
        # Get contour of interested location
        loc_idx, contours, hierarchy, map_line = get_map_line(red_filtered)

        # Zoom out until interested location is closed area
        count = 0
        while(loc_idx < 0 and count < 6):
            zoom_in(200)
            time.sleep(2)
            # Set website-zoom to 50% (For smaller pin size, but thinner red line)
            driver.execute_script("document.body.style.zoom='50%'")
            capture = capture_img(canvas_loc, canvas_size)
            red_filtered, shape = filter_red(capture)
            try:
                loc_idx, contours, hierarchy, map_line = get_map_line(red_filtered)
                if(not check_loc(contours, loc_idx, shape)):
                    loc_idx = -1
            except:
                loc_idx = -1
            # Set website-zoom to 100% (For bigger red line, but bigger pin size)
            if loc_idx == -1:
                driver.execute_script("document.body.style.zoom='100%'")
                capture = capture_img(canvas_loc, canvas_size)
                red_filtered, shape = filter_red(capture)
                try:
                    loc_idx, contours, hierarchy, map_line = get_map_line(red_filtered)
                except:
                    count = count + 1
                    continue
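            try:
                # loc_idx == -1 would silently index the last contour, so reject it explicitly
                if loc_idx < 0 or not check_loc(contours, loc_idx, shape):
                    loc_idx = -1
                    count = count + 1
                    continue
            except Exception:
                count = count + 1
                continue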
                
    except:
        # CASE: Can't find contour of location in image (too thin red line, pin touch red line, etc.)
        print("ERROR: can't find contour of POI")
        df['error'] = "can't find contour of POI"
        return df
    
    if(count == 6):
        # CASE: Can't find contour of location in image after 6 zoom-out attempts (location too large)
        print("ERROR: can't find contour of POI")
        df['error'] = "can't find contour of POI"
        return df
    else:
        # Set website-zoom to 100% for using website tool element
        driver.execute_script("document.body.style.zoom='100%'")
        try:
            # Image processing
            contour_loc, contour_line, contour_point, center = get_interest_loc(contours, loc_idx, shape)
            corner_points1 = corner_detection(contour_line, contour_loc, shape)
            corner_points2 = intersec_detection(map_line, contour_loc, shape)
            corner_points = merge_point(corner_points1, corner_points2, contour_loc, shape, center)
            # Get ltt and lgt from location
            location = get_ltt_lgt(corner_points)
            # Collect ltt and lgt and save to df Series
            ltt_list = []
            lgt_list = []
            for loc in location:
                if loc != '':
                    ltt_list.append(float(loc.split(', ')[0]))
                    lgt_list.append(float(loc.split(', ')[1]))
            df['ltt_poly'] = ltt_list
            df['lgt_poly'] = lgt_list
            # Save image for checking the result of model
            if capture_img_flg or preview_img_flg:
                preview_img(corner_points, capture, shape, df['deed_no_ref'], df['district'], capture_dir, preview_dir, capture_img_flg, preview_img_flg)
            df['error'] = None
            return df
        except:
            # CASE: Problem with getting ltt and lgt (website tool element problem)
            print("ERROR: can't get ltt, lgt of POI")
            df['error'] = "can't get ltt, lgt of POI"
            return df

## 4. Apply model to dataframe

### 🚨 If this cell raises an error, reload the page in the web driver and rerun this cell
Repeat until the first location is accessed successfully. Once the first location works, the remaining rows are handled automatically.

In [31]:
import warnings
warnings.filterwarnings('ignore')

start = 0
df_loc_i = df_loc[start:]
length = len(df_loc)

data_dict_list = []
for i in range(len(df_loc_i)):
    print(f"I'm doing [{i+1+start}/{length}]")
    # get info
    deed_no_ref = df_loc_i.iloc[i][deed_col_name]
    province = df_loc_i.iloc[i][prov_col_name]
    district = df_loc_i.iloc[i][dist_col_name]
    ltt = df_loc_i.iloc[i][ltt_col_name]
    lgt = df_loc_i.iloc[i][lgt_col_name]
    
    # scraping
    data_out = red_deed_scraping(deed_no_ref, province, district, ltt, lgt, nearby_district_mode, 1e-4, collect_shape_mode, capture_img_flg, preview_img_flg)
    # add data to dict list
    data_dict_list.append(data_out)
    current_df = pd.DataFrame(data_dict_list)
    
    # save to csv
    current_df.to_csv(output_df_path, index=False, encoding='utf-8-sig')

I'm doing [1/2]
I'm doing [2/2]
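
`ltt_poly` and `lgt_poly` hold Python lists, so `to_csv` writes them as text such as `[13.1, 13.2]`. Below is a sketch that turns saved rows back into `(lat, lon)` boundary pairs with `ast.literal_eval`, using only the standard library. The sample CSV content is hypothetical.

```python
import ast
import csv
import io

# Hypothetical excerpt of the output CSV
saved_csv = """deed_no_ref,ltt_poly,lgt_poly,error
61849,"[13.6655, 13.6657, 13.6653]","[100.4990, 100.4993, 100.4995]",
18645,,,can't access POI data
"""


def read_polygons(csv_text):
    """Map deed_no_ref -> list of (lat, lon) boundary points (empty if missing)."""
    polygons = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row['ltt_poly'] and row['lgt_poly']:
            lats = ast.literal_eval(row['ltt_poly'])
            lons = ast.literal_eval(row['lgt_poly'])
            polygons[row['deed_no_ref']] = list(zip(lats, lons))
        else:
            polygons[row['deed_no_ref']] = []
    return polygons


for deed, points in read_polygons(saved_csv).items():
    print(deed, points)
```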


## 5. Close the web driver when finished

In [32]:
driver.quit()  # ends the session and the chromedriver process; close() only closes the current window