# HOUSE PRICING DATA SCRAPING USING BEAUTIFUL SOUP
In this project, I scraped property listings from the rightmove.co.uk website using BeautifulSoup. I will use this data to analyse which part of London has the costliest houses.

To scrape data from this website, I'll perform the following tasks:

[**Task 1**](#task1): Importing the libraries

[**Task 2**](#task2): Creating the base url and choosing the header

[**Task 3**](#task3): Extracting product links on the first page

[**Task 4**](#task4): Extracting product links on all the pages

[**Task 5**](#task5): Extracting information of the first product

[**Task 6**](#task6): Extracting information of all the products



<a id='task1'></a>
# Task 1: Importing the libraries

In [2]:
from bs4 import BeautifulSoup
import requests
import pandas as pd

<a id='task2'></a>
# Task 2: Creating the base url and choosing the header

In [3]:
base_url = 'https://www.rightmove.co.uk'

In [4]:
header = {
    'user-agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.157 Safari/537.36"
}
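The search URLs used in the next two tasks are long hard-coded strings. As a hedged alternative, the query can be assembled from a dict; this also shows that the `%5E` in those URLs is simply the percent-encoded `^` in `REGION^87490`. `build_search_url`, `BASE_URL` and `SEARCH_PATH` are hypothetical names, and only parameter names that already appear in this notebook's URLs are used; whether Rightmove requires any of the others is not checked here.

```python
from urllib.parse import urlencode

# Hypothetical constants mirroring base_url and the search path used below.
BASE_URL = "https://www.rightmove.co.uk"
SEARCH_PATH = "/property-for-sale/find.html"

def build_search_url(location_identifier, index=0, **extra):
    """Build a search URL from query parameters (hypothetical helper).

    urlencode percent-encodes reserved characters, so '^' becomes '%5E'.
    Parameters are emitted in dict insertion order.
    """
    params = {"locationIdentifier": location_identifier, "index": index}
    params.update(extra)
    return BASE_URL + SEARCH_PATH + "?" + urlencode(params)

url = build_search_url("REGION^87490", index=24, includeSSTC="false")
print(url)
```

`requests.get(BASE_URL + SEARCH_PATH, params=params, headers=header)` applies the same encoding, since `requests` accepts a `params` dict directly.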

<a id='task3'></a>
# Task 3: Extracting product links on the first page

In [5]:
source = requests.get('https://www.rightmove.co.uk/property-for-sale/find.html?searchType=SALE&locationIdentifier=REGION%5E87490&insId=1&radius=0.0&minPrice=&maxPrice=&minBedrooms=&maxBedrooms=&displayPropertyType=&maxDaysSinceAdded=&_includeSSTC=on&sortByPriceDescending=&primaryDisplayPropertyType=&secondaryDisplayPropertyType=&oldDisplayPropertyType=&oldPrimaryDisplayPropertyType=&newHome=&auction=false')
soup = BeautifulSoup(source.content, 'lxml')

In [6]:
productlist = soup.find_all('div', class_ = 'propertyCard-details')
print(productlist)

[<div class="propertyCard-details">
<a class="propertyCard-link" data-test="property-details" href="/properties/72172317">
<h2 class="propertyCard-title" itemprop="name">
            2 bedroom apartment for sale        </h2>
<address class="propertyCard-address" itemprop="address" itemscope="" itemtype="http://schema.org/PostalAddress">
<meta content="Damac Tower, SW8" itemprop="streetAddress"/>
<meta content="GB" itemprop="addressCountry"/>
<span>Damac Tower, SW8</span>
</address>
</a>
</div>, <div class="propertyCard-details">
<a class="propertyCard-link" data-test="property-details" href="/properties/77411832">
<h2 class="propertyCard-title" itemprop="name">
            Block of apartments for sale        </h2>
<address class="propertyCard-address" itemprop="address" itemscope="" itemtype="http://schema.org/PostalAddress">
<meta content="Investment Opportunity in Mayfair W1J" itemprop="streetAddress"/>
<meta content="GB" itemprop="addressCountry"/>
<span>Investment Opportunity in Ma

In [7]:
productlinks = []
for item in productlist:
    for link in item.find_all('a', href = True, class_ = 'propertyCard-link'):
        print(link['href'])
    


/properties/72172317
/properties/77411832
/properties/97973531
/properties/97872269
/properties/98010533
/properties/79055890
/properties/99907829
/properties/89685296
/properties/74956453
/properties/77223961
/properties/96434144
/properties/102076454
/properties/102229220
/properties/92781332
/properties/70700242
/properties/97153931
/properties/97132076
/properties/77413920
/properties/83798710
/properties/80794139
/properties/94277630
/properties/85982015
/properties/84680905
/properties/64810998
/properties/73552941


In [8]:
for item in productlist:
    for link in item.find_all('a', href = True, class_ = 'propertyCard-link'):
        productlinks.append(base_url + link['href'])
print(len(productlinks))

25
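The two nested loops above can be expressed as a single CSS selector, and `urljoin` builds the absolute URL instead of string concatenation. A minimal sketch, run here on a trimmed copy of the markup printed in `In [6]`; `extract_property_links` is a hypothetical helper, and it uses the built-in `html.parser` rather than `lxml` only to avoid the extra dependency.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

base_url = "https://www.rightmove.co.uk"

# Trimmed copy of the search-results markup printed above.
sample_html = """
<div class="propertyCard-details">
  <a class="propertyCard-link" data-test="property-details" href="/properties/72172317">
    <h2 class="propertyCard-title">2 bedroom apartment for sale</h2>
  </a>
</div>
<div class="propertyCard-details">
  <a class="propertyCard-link" data-test="property-details" href="/properties/77411832">
    <h2 class="propertyCard-title">Block of apartments for sale</h2>
  </a>
</div>
"""

def extract_property_links(html, base=base_url):
    """Return the absolute URL of every property-card link on a results page."""
    soup = BeautifulSoup(html, "html.parser")
    # Property-card anchors that carry an href, inside the card-details divs.
    anchors = soup.select("div.propertyCard-details a.propertyCard-link[href]")
    return [urljoin(base, a["href"]) for a in anchors]

print(extract_property_links(sample_html))
```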


<a id='task4'></a>
# Task 4: Extracting product links on all the pages

In [25]:
productlinks = []

# Fetch successive pages of search results via the `index` query parameter.
for i in range(0,100,12):
    source = requests.get(f'https://www.rightmove.co.uk/property-for-sale/find.html?locationIdentifier=REGION%5E87490&index={i}&propertyTypes=&includeSSTC=false&mustHave=&dontShow=&furnishTypes=&keywords=')
    soup = BeautifulSoup(source.content, 'lxml')
    productlist = soup.find_all('div', class_ = 'propertyCard-details')
    for item in productlist:
        for link in item.find_all('a', href = True, class_ = 'propertyCard-link'):
            # Append '#/' so each URL has the same form as the test link in Task 5.
            productlinks.append(base_url + link['href'] + '#/')
print(len(productlinks))

225
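`range(0, 100, 12)` issues nine requests, and 9 × 25 cards = 225 matches the count printed above. If consecutive pages overlap (for example, when the step is smaller than the number of listings per page), the same property URL would be collected twice; whether Rightmove's pages overlap at this step is not verified here. A hedged sketch of an order-preserving de-duplication; `page_offsets` and `dedupe` are hypothetical helpers, and the sample links reuse IDs from the output above.

```python
def page_offsets(stop, step):
    """Offsets passed as the `index` query parameter, e.g. 0, 12, 24, ..."""
    return list(range(0, stop, step))

def dedupe(links):
    """Drop repeated links, keeping first-seen order (dicts preserve insertion order)."""
    return list(dict.fromkeys(links))

offsets = page_offsets(100, 12)
print(offsets)

links = [
    "https://www.rightmove.co.uk/properties/72172317#/",
    "https://www.rightmove.co.uk/properties/77411832#/",
    "https://www.rightmove.co.uk/properties/72172317#/",  # repeated listing
]
print(dedupe(links))
```

Applied to this notebook, `productlinks = dedupe(productlinks)` before Task 6 would avoid scraping the same page twice.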


<a id='task5'></a>
# Task 5: Extracting information of the first product

In [11]:
testlink = 'https://www.rightmove.co.uk/properties/77411832#/'
r = requests.get(testlink, headers = header)
soup = BeautifulSoup(r.content, 'lxml')
print(soup)

<!DOCTYPE html>
<html lang="en-GB" style="">
<head>
<meta charset="utf-8"/>
<title>Block of apartments for sale in Investment Opportunity in Mayfair W1J</title>
<meta content="width=device-width, shrink-to-fit=no, initial-scale=1.0, user-scalable=yes" name="viewport"/>
<meta content="telephone=no" name="format-detection"/>
<meta content="True" name="HandheldFriendly"/>
<meta content="Block of apartments for sale in Investment Opportunity in Mayfair W1J - Rightmove." name="description"/>
<!-- Favicons -->
<link href="//www.rightmove.co.uk/favicon.ico" rel="shortcut icon" type="image/vnd.microsoft.icon"/>
<meta content="Rightmove" name="apple-mobile-web-app-title"/>
<meta content="Rightmove" name="application-name"/>
<meta content="#262637" name="theme-color"/>
<meta content="app-id=323822803, app-argument=https://www.rightmove.co.uk/properties/77411832" name="apple-itunes-app"/>
<meta content="Rightmove Property Search" name="smartbanner:title"/>
<meta content="Rightmove" name="smartban

In [12]:
name = soup.find('h1', class_ = '_2uQQ3SV0eMHL1P6t5ZDo2q').text
print(name)

Investment Opportunity in Mayfair W1J


In [13]:
price = soup.find('div', class_ = '_1gfnqJ3Vtd1z40MlC0MzXu').text
print(price)

POA
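Prices come back as display strings such as `£16,750,000` or `POA` (price on application), which cannot be compared numerically. To rank areas by price they need converting to numbers; a minimal sketch, assuming each string contains at most one price and taking the first number if there are several. `parse_price` is a hypothetical helper.

```python
import re

def parse_price(text):
    """Convert a price string like '£16,750,000' to an int.

    Returns None when the string contains no digits, e.g. 'POA'.
    Only the first number is used if several are present.
    """
    match = re.search(r"\d[\d,]*", text)
    return int(match.group().replace(",", "")) if match else None

print(parse_price("£16,750,000"))  # 16750000
print(parse_price("POA"))          # None
```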


In [14]:
propertytype = soup.find('div', class_ = '_1fcftXUEbWfJOJzIUeIHKt').text
print(propertytype)

Block of Apartments


In [15]:
tenure = soup.find('div', class_ = 'OD0O7FWw1TjbTD4sdRi1_').p
span = tenure.find_next('span').find_next('span')
ten = span.string
print(ten)

 Freehold


In [16]:
agent_name = soup.find('div', class_ ='fk2DXJdjfI5FItgj0w4Fd').h3.text
print(agent_name)

Wedgewood Estates, London


In [17]:
agent_address = soup.find('div', class_ = 'fk2DXJdjfI5FItgj0w4Fd').p.text
print(agent_address)

296 Kensington High Street,
London,
W14 8NZ
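The address spans several lines, and the tenure above carries a leading space. Removing `\r` and `\n` outright (as the dictionary in `In [91]` does) glues the parts together, e.g. `'13 Hill Street,London,W1J 5LQ'`. A small alternative sketch that collapses any run of whitespace into one space; `clean_text` is a hypothetical helper.

```python
def clean_text(text):
    """Collapse runs of whitespace (spaces, carriage returns, newlines, tabs) into single spaces."""
    return " ".join(text.split())

raw_address = "296 Kensington High Street,\nLondon,\nW14 8NZ"
print(clean_text(raw_address))  # 296 Kensington High Street, London, W14 8NZ
print(clean_text(" Freehold"))   # Freehold
```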


In [90]:
contact = soup.find('a', class_ = '_3E1fAHUmQ27HFUFIBdrW0u').text
agent_contact = contact.replace('Call agent:','')
print(agent_contact)

 020 8012 9067


In [91]:
house = {
    'name': name,
    'price': price,
    'propertytype': propertytype,
    'tenure': ten,
    'agent_name': agent_name,
    'agent_address': agent_address.replace('\r', '').replace('\n', ''),
    'agent_contact': agent_contact
}
print(house)

{'name': 'One Kensington Gardens, Kensington Road, London W8', 'price': '£16,750,000', 'propertytype': 'Flat', 'tenure': ' Share of Freehold', 'agent_name': 'Strutt & Parker, London - New Homes', 'agent_address': '13 Hill Street,London,W1J 5LQ', 'agent_contact': ' 020 8012 9067'}


<a id='task6'></a>
# Task 6: Extracting information of all the products

In [94]:
houselist = []
for link in productlinks:
    r = requests.get(link, headers = header)
    soup = BeautifulSoup(r.content, 'lxml')
    # A missing element makes soup.find(...) return None, so the attribute
    # access that follows raises AttributeError; fall back to 'not given'.
    try:
        name = soup.find('div', class_ = '_1KCWj_-6e8-7_oJv_prX0H').h1.text
    except AttributeError:
        name = 'not given'
    try:
        price = soup.find('div', class_ = '_1gfnqJ3Vtd1z40MlC0MzXu').text
    except AttributeError:
        price = 'not given'
    try:
        propertytype = soup.find('div', class_ = '_1fcftXUEbWfJOJzIUeIHKt').text
    except AttributeError:
        propertytype = 'not given'
    try:
        tenure = soup.find('div', class_ = 'OD0O7FWw1TjbTD4sdRi1_').p
        span = tenure.find_next('span').find_next('span')
        ten = span.string
    except AttributeError:
        # Reset `ten` (the value stored below), not `tenure`; otherwise the
        # previous property's tenure would be carried over.
        ten = 'not given'
    try:
        agent_name = soup.find('div', class_ ='fk2DXJdjfI5FItgj0w4Fd').h3.text
    except AttributeError:
        agent_name = 'not given'
    try:
        agent_address = soup.find('div', class_ = 'fk2DXJdjfI5FItgj0w4Fd').p.text
    except AttributeError:
        agent_address = 'not given'
    try:
        contact = soup.find('a', class_ = '_3E1fAHUmQ27HFUFIBdrW0u').text
        agent_contact = contact.replace('Call agent:','')
    except AttributeError:
        agent_contact = 'not given'

    house = {
        'name': name,
        'price': price,
        'propertytype': propertytype,
        'tenure': ten,
        'agent_name': agent_name,
        'agent_address': agent_address.replace('\r', ' ').replace('\n', ' '),
        'agent_contact': agent_contact
    }
    houselist.append(house)
    print('Saving:', house['name'])

df = pd.DataFrame(houselist)

Saving: Queens Wharf, 2 Crisp Road, Hammersmith, W6
Saving: Investment Opportunity in Mayfair W1J
Saving: Upper Grosvenor Street, London, W1K
Saving: Upper Grosvenor Street, London, W1K
Saving: Brook Street, London, W1K
Saving: Merton Lane, London, N6
Saving: Merton Lane, London, N6
Saving: South Street, Mayfair, London, W1K
Saving: South Street, Mayfair W1K
Saving: Princes Gate, London, SW7
Saving: Ilchester Place, Holland Park, London, W14
Saving: Cleveland Row, St James's, SW1A
Saving: One Hyde Park, Knightsbridge
Saving: Upper Phillimore Gardens, Kensington, London, W8
Saving: Upper Phillimore Gardens, London, W8
Saving: Culross Street, Mayfair, London, W1K
Saving: Culross Street, Mayfair, London, W1K
Saving: Elsworthy Road, London, NW3
Saving: Phillimore Gardens, Kensington, London
Saving: Elsworthy Road, Primrose Hill, London, NW3
Saving: Chapel Street, Belgravia SW1X
Saving: One Hyde Park, Knightsbridge, London, SW1X
Saving: St Johns Wood Park, St John's Wood, London, NW8
Saving

Saving: Marylebone Square, Moxon St, W1U
Saving: Oceanic House, Cockspur Street, London
Saving: Cockspur Street, London, SW1Y
Saving: Lancer Square, Kensington Church Street, London, W8
Saving: Connaught Place, Hyde Park, London, W2
Saving: Kensington Court, Kensington, London, W8
Saving: Rarely available freehold investment opportunity consisting of four outstanding apartments
Saving: Brompton Square, London, SW3
Saving: Godolphin Road, Shepherd's Bush, London, W12
Saving: Marylebone Square, Moxon St, Marylebone, W1U
Saving: Marylebone Square, Moxon Street, Marylebone, W1
Saving: Marylebone Square, Marylebone
Saving: Hamilton Terrace, St John's Wood, London, NW8 
Saving: The Bishops London, N2
Saving: Stratford Place, London, W1C
Saving: The Bishops Avenue, London, N2
Saving: Broomhouse Lane London SW6
Saving: Courtenay Avenue, London, N6
Saving: Courtenay Avenue, Highgate, London, N6
Saving: Cavendish Avenue, St John's Wood, London, NW8 
Saving: Hanover Terrace, London, NW1
Saving: C
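Most fields in the loop repeat the same pattern: find a tag by class, take its text, or fall back to `'not given'`. A hedged sketch of a helper that expresses that pattern without `try`/`except`; `text_or_default` is a hypothetical name, and the sample HTML is a stand-in built from a class name used above. Those generated class names are likely to change whenever Rightmove redeploys its front end, and fields reached through a nested tag (such as the agent's `.h3`) would still need their own handling.

```python
from bs4 import BeautifulSoup

def text_or_default(soup, name, class_, default="not given"):
    """Return the stripped text of the first <name class=class_> tag, or default if absent."""
    tag = soup.find(name, class_=class_)
    return tag.get_text(strip=True) if tag is not None else default

# Minimal stand-in for a property page.
html = '<div class="_1gfnqJ3Vtd1z40MlC0MzXu">£1,700,000</div>'
soup = BeautifulSoup(html, "html.parser")
print(text_or_default(soup, "div", "_1gfnqJ3Vtd1z40MlC0MzXu"))  # £1,700,000
print(text_or_default(soup, "div", "_1fcftXUEbWfJOJzIUeIHKt"))  # not given
```

Since the loop sends one request per listing, adding a short pause between requests (for example `time.sleep(1)` from the standard library) would also reduce load on the site.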

# FINAL DATA

In [95]:
df

| | name | price | propertytype | tenure | agent_name | agent_address | agent_contact |
|---|---|---|---|---|---|---|---|
| 0 | Queens Wharf, 2 Crisp Road, Hammersmith, W6 | £1,700,000 | Apartment | Leasehold | MyLondonHome, Central London | 35 Catherine Place London SW1E 6DY | 020 8012 5678 |
| 1 | Investment Opportunity in Mayfair W1J | POA | Block of Apartments | Freehold | Wedgewood Estates, London | 296 Kensington High Street, London, W14 8NZ | 020 8012 4872 |
| 2 | Upper Grosvenor Street, London, W1K | £54,500,000 | Terraced | Freehold | Knight Frank, Mayfair | 120a Mount Street, London, W1K 3NN | 020 8012 3476 |
| 3 | Upper Grosvenor Street, London, W1K | £54,500,000 | Town House | Freehold | Clifton Property Partners Ltd, London | 3 Hill Street, London, W1J 5LB | 020 7409 5087 |
| 4 | Brook Street, London, W1K | £46,000,000 | not given | Freehold | Knight Frank, Mayfair | 120a Mount Street, London, W1K 3NN | 020 8012 3476 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 220 | Lancer Square, Kensington Church Street, Londo... | £16,020,000 | Penthouse | Leasehold | Savills New Homes, Margaret Street | 33 Margaret Street London W1G 0JD | 020 8012 0342 |
| 221 | Connaught Place, Hyde Park, London, W2 | £15,950,000 | Flat | Leasehold | Knight Frank, Hyde Park | 1 Craven Terrace London W2 3QD | 020 8012 4409 |
| 222 | Kensington Court, Kensington, London, W8 | £15,500,000 | Apartment | Freehold | Sotheby's International Realty, London | 77-79 Ebury Street, Belgravia, London, SW1... | 020 8012 2565 |
| 223 | Rarely available freehold investment opportuni... | £15,500,000 | Terraced | Freehold | Berkeley Lettings & Management Limited, Knight... | 45 Pont Street London SW1X 0BD | 020 3402 4752 |
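To answer the question posed at the start (which part of London has the costliest houses), the listing names can be mapped to a postcode district and the prices converted to numbers. A hedged sketch on a few rows copied from the table above; the regex assumes the district (e.g. `W1K`, `SW1X`, `NW8`) is the last token of the name, so names such as `One Hyde Park, Knightsbridge` get no district. `postcode_district`, `parse_price` and `sample` are hypothetical names, and the median is one choice of summary among several.

```python
import re
import pandas as pd

# Outward postcode (district) at the very end of a listing name.
DISTRICT_RE = re.compile(r"\b([A-Z]{1,2}\d[A-Z\d]?)\s*$")

def postcode_district(name):
    """Return the trailing postcode district of a listing name, or None."""
    match = DISTRICT_RE.search(name.strip())
    return match.group(1) if match else None

def parse_price(text):
    """First number in a price string as an int; None for e.g. 'POA'."""
    match = re.search(r"\d[\d,]*", text)
    return int(match.group().replace(",", "")) if match else None

# A few rows copied from the final table.
sample = pd.DataFrame({
    "name": ["Queens Wharf, 2 Crisp Road, Hammersmith, W6",
             "Investment Opportunity in Mayfair W1J",
             "Upper Grosvenor Street, London, W1K",
             "Brook Street, London, W1K"],
    "price": ["£1,700,000", "POA", "£54,500,000", "£46,000,000"],
})
sample["district"] = sample["name"].map(postcode_district)
# to_numeric turns the None produced for 'POA' into NaN.
sample["price_gbp"] = pd.to_numeric(sample["price"].map(parse_price))

# Median asking price per district, most expensive first; rows without a
# numeric price are dropped first.
summary = (sample.dropna(subset=["price_gbp"])
                 .groupby("district")["price_gbp"]
                 .median()
                 .sort_values(ascending=False))
print(summary)
```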
