# Houses and Empty Lots for Sale in New Brunswick (as of June 12, 2020)

In this project, I scraped data from this [website](https://www.point2homes.com/CA/Real-Estate-Listings/NB.html), which lists houses and 
empty lots for sale. Each listing gives the selling price and the lot size. For houses, the number of bedrooms, the number of bathrooms, the house size, and the house type are also given.

## 1. Data Cleaning (Duplicates and Text Formatting)

In [1]:
# load libraries
import numpy as np
import pandas as pd
import requests
import re
from bs4 import BeautifulSoup
from tqdm.notebook import tqdm

# read csv file
df = pd.read_csv("houseprices_nbf.csv")
df.head(5)

Unnamed: 0,Address,Beds,Baths,House Size,Lot Size,Type,Price
0,"10 Robin Dr, Fredericton, New Brunswick, E3C 1K6",5.0,2.0,1600.0,0.177,\r\n Residential\r\n,259900
1,"03-2 Glebe Rd, Saint Andrews, New Brunswick",,,,1.14,\r\n Residential\r\n,11400
2,"62 Parkin Street, Salisbury, New Brunswick, E4...",3.0,2.0,3790.0,0.85,\r\n Residential\r\n,549900
3,"14 Murray Lane, St. Andrews, New Brunswick",4.0,2.0,2200.0,1.15,\r\n Residential\r\n,449500
4,"140 Orleans St., Dieppe, New Brunswick, E1A 1W9",4.0,3.0,2808.0,0.124,\r\n Residential\r\n,236900


In [3]:
# df['Type'] has '\r\n' and padding spaces at the beginning and end, so we strip them
df['Type'] = df['Type'].str.strip()

df.head(5)

Unnamed: 0,Address,Beds,Baths,House Size,Lot Size,Type,Price
0,"10 Robin Dr, Fredericton, New Brunswick, E3C 1K6",5.0,2.0,1600.0,0.177,Residential,259900
1,"03-2 Glebe Rd, Saint Andrews, New Brunswick",,,,1.14,Residential,11400
2,"62 Parkin Street, Salisbury, New Brunswick, E4...",3.0,2.0,3790.0,0.85,Residential,549900
3,"14 Murray Lane, St. Andrews, New Brunswick",4.0,2.0,2200.0,1.15,Residential,449500
4,"140 Orleans St., Dieppe, New Brunswick, E1A 1W9",4.0,3.0,2808.0,0.124,Residential,236900
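As a quick check of the whitespace cleanup, here is a minimal sketch on a plain Python string copied from the `Type` column as displayed in the first table. The name `raw_type` is illustrative and not part of the notebook. The sample value carries a padding space as well as the `\r\n` pairs, so removing only the line breaks would leave that space behind.

```python
# Sample value as displayed in the raw 'Type' column (illustrative copy).
raw_type = "\r\n Residential\r\n"

# Removing only the CR/LF pairs leaves the padding space in place.
only_newlines_removed = raw_type.replace("\r\n", "")

# strip() drops all leading and trailing whitespace in one pass.
stripped = raw_type.strip()

print(repr(only_newlines_removed))
print(repr(stripped))
```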


Feeding the complete address into a machine learning algorithm would not make sense, so we keep only the postal code. When a listing has no postal code, we can look one up for the address on [www.geocoder.ca](http://www.geocoder.ca).
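The "does this address already end in a postal code?" check can also be sketched with a regular expression. This is only an illustration under a few assumptions: `extract_postal_code` and `POSTAL_CODE_AT_END` are hypothetical names, the pattern accepts the code with or without the middle space, and the result is normalized to the six-character form (the notebook itself keeps the space when it is present).

```python
import re

# New Brunswick postal codes start with 'E'; the middle space is optional.
POSTAL_CODE_AT_END = re.compile(r"(E\d[A-Z])\s?(\d[A-Z]\d)\s*$")

def extract_postal_code(address):
    """Return the trailing postal code without spaces, or None if absent."""
    match = POSTAL_CODE_AT_END.search(address)
    if match is None:
        return None
    return match.group(1) + match.group(2)

print(extract_postal_code("10 Robin Dr, Fredericton, New Brunswick, E3C 1K6"))
print(extract_postal_code("45 McAndrew ST, Moncton, New Brunswick, E1G4Z2"))
print(extract_postal_code("14 Murray Lane, St. Andrews, New Brunswick"))
```

Addresses for which the helper returns `None` are the ones that would still need a lookup.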

In [4]:
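# Create column for postal code (object dtype so it can hold strings)
df['Postal Code'] = pd.Series(np.nan, index=df.index, dtype='object')

for index, address in tqdm(df['Address'].items(), total=len(df['Address'])):
    # Address already ends with a 7-character postal code such as 'E3C 1K6'
    if address[-7] == 'E' and address[-6].isnumeric():
        # .loc writes to df itself and avoids chained assignment
        df.loc[index, 'Postal Code'] = address[-7:]
    else:
        # No postal code found in the address, so look it up on geocoder.ca
        print(address)
        post_data = {'locate': address}
        url = 'http://www.geocoder.ca'
        html_result = requests.post(url, data=post_data).text
        soup = BeautifulSoup(html_result, 'lxml')

        # Search the page title for a postal code starting with 'E'
        complete_address = soup.find('title').text
        postal_code = re.search(r'E[0-9][A-Z][0-9][A-Z][0-9]', complete_address)
        print(postal_code is None)

        if postal_code is not None:
            print(postal_code.group(0))
            df.loc[index, 'Postal Code'] = postal_code.group(0)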

03-2 Glebe Rd, Saint Andrews, New Brunswick
False
E5B2Z5
14 Murray Lane, St. Andrews, New Brunswick
True
41 Eagle's Passage, Greater Saint Andrews, New Brunswick
False
E5B0A9
7725 Rte 515, Saint-Paul, New Brunswick
False
E4T3R4
133 Daigle Rd., Pointe-Sapin, New Brunswick
False
E9A1T7
294 Carleton St, Saint Andrews, New Brunswick
False
E5B1P2
3574 Route 127, Saint Andrews, New Brunswick
False
E5B2W5
175 Daigle Rd, Pointe-Sapin, New Brunswick
False
E9A1T7
5185 Route 127, St. Andrews, New Brunswick
False
E5B3A7
48 Augustus St, Saint Andrews, New Brunswick
False
E5B2E7
83 Don-De Dieu Dr, Bayside, New Brunswick
False
E5B3Y2
38 Ernest St, Saint Andrews, New Brunswick
False
E5B2E1
45 Argyle Court, Saint Andrews, New Brunswick
False
E5B2H7
133 Daigle Rd., Pointe-Sapin, New Brunswick
False
E9A1T7
117 Anne's Acres, Bayfield, NB, Murray Corner, New Brunswick
False
E4M3K8
45 Argyle Court, Saint Andrews, New Brunswick
False
E5B2H7
38 Ernest St, Saint Andrews, New Brunswick
False
E5B2E1
41 Eagle's Passage, Greater Saint Andrews, New Brunswick
False
E5B0A9
3574 Route 127, Saint Andrews, New Brunswi

False
E5A1P4
735 Bellefond Road, Miramichi, New Brunswick
False
E1V4V1
15-6 Winterport Way, Fredericton, New Brunswick
False
E3A1X5
24 Thompson Avenue, St. Stephen, New Brunswick
False
E3L2M1
51 Main Street, St. Stephen, New Brunswick
False
E3L1Z5
167 Todd's Point Road, Dufferin, New Brunswick
False
E3L3R6
Rrichardson Road, Deer Island, New Brunswick
False
E5V1J2
3888 Rte 620 Tay Creek, Greater Estey's Bridge, New Brunswick
False
E3G6K8
100 Winterport Way, Fredericton, New Brunswick
False
E3A1X5
2511 Route 774, Campobello Island, New Brunswick
False
E5E1L7
98-17 Ross Point Road, Greater Saint Andrews, New Brunswick
False
E5B3G5
- Salt Marsh Road, Saint Andrews, New Brunswick
False
E5B1R9
4973 Route 112, Greater Salisbury, New Brunswick
False
E4Z5T5
98-17 Ross Point Road, Greater Saint Andrews, New Brunswick
False
E5B3G5
- Salt Marsh Road, Saint Andrews, New Brunswick
False
E5B1R9
4973 Route 112, Greater Salisbury, New Brunswick
False
E4Z5T5
9 Bromfield, Moncton, New Brunswick, E1G0N7
F

False
E1A7L3
118 Route 735, Western Charlotte, New Brunswick
True
7453 Route 3, Baille, New Brunswick
True
Vacant Land Chamcook NB, Saint Andrews, New Brunswick
True
126 King Street, Saint Andrews, New Brunswick
False
E5B1Y6
131 Old School House Road, Birneys Lake, New Brunswick
True
- Rollingdam Road, Saint Stephen, New Brunswick
False
E5A1B2
182 Union Street, St. Stephen, New Brunswick
False
E3L1W1
182 Union Street, St. Stephen, New Brunswick
False
E3L1W1
Lot 89-1 Route 445, Fair Isle, New Brunswick
False
E9G2M7
Lot 15-1 Winterport Way, Fredericton, New Brunswick
False
E3A1X5
50 Queen Street East, St. Stephen, New Brunswick
False
E3L2J6
Lot #28 Streamfront, Piskahegan, New Brunswick
True
85 Lakeshore Drive, Bethel, New Brunswick
False
E5C1N0
100 acres Route 735, Western Charlotte, New Brunswick
True
Lot 89-4 Route 445, Fair Isle, New Brunswick
False
E9G2M7
Lot Route 530, Grande Digue, New Brunswick
True
2852 Route 117, Kouchibouguac, New Brunswick
False
E4X2P2
Vacant Lot #2 Rankine R

False
E7M2R5
102 Horton Road, Rothesay, New Brunswick, E2H1P8
False
E2H1P8
Lot Route 390, Greater Perth - Andover, New Brunswick
True
282 Price Road, Drummond, New Brunswick
False
E3Y2N7
26 Birchwood Place, Island View Heights, New Brunswick
False
E2M5G5
18 Islands, Greater Canterbury, New Brunswick
True
234 Edinburgh Street, Fredericton, New Brunswick, E3B2C9
False
E3B2C9
118 St John ST, Pointe-du-Chene, New Brunswick
False
E4P5G5
484 Bon Secours RD, St. Paul, New Brunswick
False
E4T3B4
560 Route 2896, Williamstown, New Brunswick
False
E7K1S8
2170 Route 885, Havelock, New Brunswick, E4Z5N5
False
E4Z5N5
LOT 12-2 Hope RD, Steeves Mountain, New Brunswick, E1G3Z1
False
E1G3Z1
50 Brigadoon Terrace, Rockwood Park, New Brunswick
True
27 Saint Paul's Street, Hampton, New Brunswick, E5N5P8
False
E5N5P8
03-11 Archangel Way, Keswick Ridge, New Brunswick, E6L0A5
False
E6L0A5
124 Valmont CRES, Dieppe, New Brunswick, E1A1N2
False
E1A1N2
Vacant Lot 17-1 John Chessie Road, Hanwell, New Brunswick
Fals

True
105 Route 5609, Mill Cove, New Brunswick
True
102 Route 1303, Upper Gagetown, New Brunswick, E5M1N2
True
4 Summer Point LANE, Baie - Verte, New Brunswick
True
13 Evergreen DR, Shediac, New Brunswick, E4P1R9
True
15 Clarendon DR, Moncton, New Brunswick, E1G0M8
True
154 Frenette, Beresford, New Brunswick, E8K1Y5
True
- Connors Lane, Bloomfield, New Brunswick
True
17 Leeswood Drive, Quispamsis, New Brunswick, E2G1N1
True
Lot 2016-2 Grand Pass Court, Greater Estey's Bridge, New Brunswick
True
4392 Heritage Drive, Tracy, New Brunswick, E5L1B9
True
7 IAN Street, Champlain Heights, New Brunswick
True
11 Coronation Court, Oromocto, New Brunswick, E2V2M9
True
1 Princess Street, Saint John Centre, New Brunswick
True
8 Rockport, Riverview, New Brunswick, E1B5L5
True
Lot Hillandale Drive, Grand Bay-Westfield, New Brunswick
True
176 Sherwood Road, Greater Woodstock, New Brunswick
True
532 Couturier Road, Saint-Joseph, New Brunswick
True
1793 Nicholas Denys, Nicholas - Denys, New Brunswick
True
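The log above shows some addresses being looked up more than once (for example `133 Daigle Rd., Pointe-Sapin, New Brunswick`). One way to avoid repeated requests is to memoize the lookup. The snippet below is a minimal sketch of that idea, not the notebook's method: `fake_lookup` is a hypothetical stand-in for the geocoder request and returns canned answers taken from the log, and `cached_lookup` and `lookup_calls` are illustrative names.

```python
from functools import lru_cache

lookup_calls = []  # records every address that reaches the stand-in geocoder

def fake_lookup(address):
    """Hypothetical stand-in for the geocoder request; returns a canned answer."""
    lookup_calls.append(address)
    canned = {"133 Daigle Rd., Pointe-Sapin, New Brunswick": "E9A1T7"}
    return canned.get(address)

@lru_cache(maxsize=None)
def cached_lookup(address):
    """Look up an address once and reuse the answer for later repeats."""
    return fake_lookup(address)

addresses = [
    "133 Daigle Rd., Pointe-Sapin, New Brunswick",
    "14 Murray Lane, St. Andrews, New Brunswick",
    "133 Daigle Rd., Pointe-Sapin, New Brunswick",  # repeated listing
]
results = [cached_lookup(a) for a in addresses]
print(results)
print(len(lookup_calls))
```

Three addresses are processed here, but only two reach the stand-in lookup, because the repeated one is answered from the cache.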

In [5]:
df.head(20)

Unnamed: 0,Address,Beds,Baths,House Size,Lot Size,Type,Price,Postal Code
0,"10 Robin Dr, Fredericton, New Brunswick, E3C 1K6",5.0,2.0,1600.0,0.177,Residential,259900,E3C 1K6
1,"03-2 Glebe Rd, Saint Andrews, New Brunswick",,,,1.14,Residential,11400,E5B2Z5
2,"62 Parkin Street, Salisbury, New Brunswick, E4...",3.0,2.0,3790.0,0.85,Residential,549900,E4J 2N3
3,"14 Murray Lane, St. Andrews, New Brunswick",4.0,2.0,2200.0,1.15,Residential,449500,
4,"140 Orleans St., Dieppe, New Brunswick, E1A 1W9",4.0,3.0,2808.0,0.124,Residential,236900,E1A 1W9
5,"41 Eagle's Passage, Greater Saint Andrews, New...",3.0,3.0,2160.0,1.0,Residential,794997,E5B0A9
6,"7725 Rte 515, Saint-Paul, New Brunswick",2.0,1.0,1350.0,1.0,Residential,80000,E4T3R4
7,"133 Daigle Rd., Pointe-Sapin, New Brunswick",4.0,2.0,2168.0,0.99,Residential,169900,E9A1T7
8,"93 Principale, Memramcook, New Brunswick, E4K 1A7",5.0,4.0,5286.0,6.2,Residential,799900,E4K 1A7
9,"294 Carleton St, Saint Andrews, New Brunswick",2.0,2.0,1592.0,0.11,Residential,259000,E5B1P2


In [7]:
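# Some addresses end in a 6-character postal code with no space (e.g. 'E1G4Z2')
for index, address in tqdm(df['Address'].items(), total=len(df['Address'])):
    if address[-6] == 'E' and address[-5].isnumeric():
        # .loc writes to df itself and avoids chained assignment
        df.loc[index, 'Postal Code'] = address[-6:]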


In [8]:
# count null values for postal code
print(df['Postal Code'].isna().sum())

91
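The two string-slicing passes over `Address` could also be written as a single vectorized step, which assigns the whole column at once instead of one row at a time. The following is a minimal sketch under stated assumptions: `sample` is a small illustrative frame built from addresses that appear in this notebook, and `pattern` is a hypothetical regular expression that accepts the postal code with or without the middle space.

```python
import pandas as pd

# Small illustrative frame; the addresses are copied from this notebook's listings.
sample = pd.DataFrame({
    "Address": [
        "10 Robin Dr, Fredericton, New Brunswick, E3C 1K6",
        "45 McAndrew ST, Moncton, New Brunswick, E1G4Z2",
        "14 Murray Lane, St. Andrews, New Brunswick",
    ]
})

# Capture a trailing postal code, with or without the middle space.
pattern = r"(E\d[A-Z]\s?\d[A-Z]\d)\s*$"
sample["Postal Code"] = sample["Address"].str.extract(pattern, expand=False)

# Rows with no match stay missing and can be counted with isna().
extracted = sample["Postal Code"].tolist()
missing = int(sample["Postal Code"].isna().sum())
print(extracted)
print(missing)
```

Only the rows that remain missing after this step would need to go to the geocoder.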


In [9]:
df.tail(5)

Unnamed: 0,Address,Beds,Baths,House Size,Lot Size,Type,Price,Postal Code
715,"Lot 2019-1 3 Route, Brockway, New Brunswick",,,,,Vacant Land,49900,
716,"45 McAndrew ST, Moncton, New Brunswick, E1G4Z2",3.0,3.0,1460.0,,Single Family,389900,E1G4Z2
717,"985 Vanier, Bathurst, New Brunswick, E2A3N3",4.0,1.0,,,Single Family,139900,E2A3N3
718,"490 Marguerite, Dieppe, New Brunswick, E1A7H1",3.0,2.0,1806.0,,Single Family,224900,E1A7H1
719,"564 Pleasant Vale RD, Elgin, New Brunswick, E4...",2.0,2.0,1950.0,,Single Family,289900,E4Z2C7


In [10]:
df.to_csv('houseprice_withpostalcodes.csv',index=False)