A Python program using web scraping to create a dataset of Belgian real estate sales data.
This project was carried out in Sept 2020 in the context of the course Bouman 2.22 organized by BeCode, a network of inclusive coding bootcamps.
Three learners and collaborators, Emre Ozan, Joachim Kotek and Sara Silvente, designed the program.
The immoweb Jupyter notebook shows how to scrape data from the results of a search on a real estate website and store them in a CSV file.
Here we used the immoweb real estate website and searched for 10,000 houses and apartments for sale across Belgium.
Currently, the program collects the following information (stored in the dataset):
- Locality
- Type of property (House/apartment)
- Subtype of property (Bungalow, Chalet, Mansion, ...)
- Price
- Type of sale (life annuity sales excluded)
- Number of rooms
- Area
- Fully equipped kitchen (Yes/No)
- Furnished (Yes/No)
- Open fire (Yes/No)
- Terrace (Yes/No)
- If yes: Area
- Garden (Yes/No)
- If yes: Area
- Surface area of the plot of land
- Number of facades
- Swimming pool (Yes/No)
- State of the building (New, to be renovated, ...)
The first step is to get the URL of each page of results, using a webdriver and a Selector from the selenium and parsel libraries. Each page of search results contains a number of links to houses for sale (here, around 30). For example, our search on immoweb yielded 333 pages of results. Note that you'll need to replace the value below with yours.
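As a minimal sketch of this step (the search URL, its page query parameter and the number of pages are assumptions taken from our own search and must be replaced by yours):

```python
from selenium import webdriver

# Assumed immoweb search URL with a "page" query parameter; replace it with your own search.
SEARCH_URL = "https://www.immoweb.be/en/search/house/for-sale?countries=BE&page={}"
N_PAGES = 333  # number of result pages returned by our search; change this value to yours

# One URL per page of results.
result_page_urls = [SEARCH_URL.format(page) for page in range(1, N_PAGES + 1)]

# The webdriver is used below to load each result page and grab its rendered HTML.
driver = webdriver.Chrome()
```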
Next, we save the URL of each house to a CSV file and repeat the same procedure with a search on apartments, since their URLs differ from those of houses. You can skip this step if you use a website where all results are listed under the same URL.
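A sketch of collecting the listing URLs and writing them to a CSV file; the CSS selector for the result cards and the output file name are assumptions and may need adjusting to the site's current markup:

```python
import csv
from parsel import Selector

house_urls = []
for page_url in result_page_urls:
    driver.get(page_url)
    selector = Selector(text=driver.page_source)
    # Assumed selector for the links of the result cards.
    house_urls.extend(selector.css("a.card__title-link::attr(href)").getall())

# Store the listing URLs in a CSV file, one URL per row.
with open("house_urls.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for link in house_urls:
        writer.writerow([link])
```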
We get the parsed HTML tree of each property by using the Python Beautiful Soup package.
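For instance, assuming plain requests is enough to fetch a listing page (a browser-like User-Agent header is added as a precaution):

```python
import requests
from bs4 import BeautifulSoup

def get_parse_tree(url):
    """Return the BeautifulSoup parse tree of a property page."""
    response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"})
    return BeautifulSoup(response.text, "html.parser")

soup = get_parse_tree(house_urls[0])
```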
As you can see below, each property's data is stored as a Python dictionary assigned to a variable called "window.classified" inside a <script> tag with the attribute type="text/javascript".
We use the string "window.classified" as a reference to select the part we are interested in, and then build a proper Python dictionary in which the property's attributes are the keys and their values are the values.
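One way to do that is to locate the script tag whose text contains "window.classified" and decode the JSON object assigned to it; the regular expression below is an assumption and may need adjusting to the page layout:

```python
import json
import re

# Find the <script type="text/javascript"> tag whose text contains "window.classified".
script = next(
    s for s in soup.find_all("script", attrs={"type": "text/javascript"})
    if s.string and "window.classified" in s.string
)

# Pull out the object assigned to window.classified and load it as a Python dictionary.
match = re.search(r"window\.classified\s*=\s*(\{.*\});", script.string)
classified = json.loads(match.group(1))

print(classified["property"]["location"]["postalCode"])
```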
The dictionary, as well as the entries for each property attribute, are created by instance methods of the class HouseApartmentScraping.
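The class is not reproduced here, but its shape is roughly the following; the method names are hypothetical and it reuses the helpers sketched above:

```python
class HouseApartmentScraping:
    """Builds the entries for one property (method names here are hypothetical)."""

    def __init__(self, url):
        self.url = url
        self.soup = get_parse_tree(url)           # parse tree, as sketched above
        self.classified = self.get_classified()   # the window.classified dictionary

    def get_classified(self):
        # Same "window.classified" extraction as above, wrapped as an instance method.
        ...

    def price(self):
        return self.classified["transaction"]["sale"]["price"]

    def postal_code(self):
        return self.classified["property"]["location"]["postalCode"]
```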
We iterate through all the URLs previously stored in the CSV file and store the values of each property attribute in a defaultdict.
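A sketch of that loop, assuming the listing URLs are read back from the CSV file written earlier and using the hypothetical methods of the class above:

```python
import csv
from collections import defaultdict

with open("house_urls.csv", newline="") as f:
    urls = [row[0] for row in csv.reader(f) if row]

data = defaultdict(list)
for url in urls:
    house = HouseApartmentScraping(url)
    # One list per attribute; each property appends one value to every list.
    data["url"].append(url)
    data["price"].append(house.price())
    data["postal_code"].append(house.postal_code())
```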
Finally, we store our results in a CSV file using a pandas DataFrame, which you can also use to visualize the data. And voilà!
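For example, with the defaultdict built above (the output file name is arbitrary):

```python
import pandas as pd

df = pd.DataFrame(data)                     # one column per property attribute
df.to_csv("belgian_real_estate.csv", index=False)
df.head()                                   # quick look at the first rows
```

For reference, the part of the window.classified dictionary we rely on looks like this, with the values replaced by their Python types: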
house_dict = {
"flags":{
"isPublicSale": <class 'bool'>,
"isNotarySale": <class 'bool'>,
"isAnInteractiveSale": <class 'bool'>
},
"property": {
"location": {
"postalCode": <class 'str'>,
},
"building": {
"condition": <class 'str'>,
"facadeCount": <class 'int'>
},
"land": {
"surface": <class 'int'>
},
"kitchen": {
"type": <class 'str'>
},
"type": <class 'str'>,
"subtype": <class 'str'>,
"netHabitableSurface": <class 'int'>,
"bedroomCount": <class 'int'>,
"hasGarden": <class 'bool'>,
"gardenSurface": <class 'int'>,
"hasTerrace": <class 'bool'>,
"terraceSurface": <class 'int'>,
"fireplaceExists": <class 'bool'>,
"hasSwimmingPool": <class 'bool'>
},
"transaction": {
"sale": {
"price": <class 'int'>,
"isFurnished": <class 'bool'>
}
}
}