# Web Scraping Project - Property Sales

In this project we write a web scraper from scratch to collect sales information about properties listed for sale.

### Importing Libraries

First we import the required libraries.

In [1]:
import requests ## To send a request to the target webpage where the information we need is located
import pandas as pd ## To create a DataFrame in which we store the information we obtain
from bs4 import BeautifulSoup ## To parse the webpage's HTML into a structure that is easy to search

Next we send a request to our URL to download the HTML that we are going to scrape.

In [2]:
url= "https://pythonizing.github.io/data/real-estate/rock-springs-wy/LCWYROCKSPRINGS/t=0&s=0.html"

# We create a request to our url to download the html of the webpage
req = requests.get(url)
html1 = req.content

# We create our soup object with the html obtained from the request
soup = BeautifulSoup(html1, "html.parser")
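The download and parse steps above can be wrapped in one function. The sketch below is a hedged variant, not part of the original notebook: the name `fetch_soup` and the 10 second timeout are assumptions. It adds a timeout so a stalled connection cannot hang the notebook, and calls `raise_for_status()` so an HTTP error page is not parsed as if it were the listing.

```python
import requests
from bs4 import BeautifulSoup


def fetch_soup(url, timeout=10):
    """Download a page and return it as a BeautifulSoup object.

    fetch_soup and the default timeout are illustrative assumptions.
    """
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of parsing an error page
    response.raise_for_status()
    return BeautifulSoup(response.content, "html.parser")
```

It would be called as `soup = fetch_soup(url)` in place of the three separate steps.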

Inspecting the webpage, we observe that the data for each property sits in a box with the class "propertyRow", so we use the find_all method to obtain those sections of the HTML.

In [3]:
property_rows = soup.find_all("div", {"class": "propertyRow"})

From the first "propertyRow" element we then look up the "propPrice" heading to obtain the price of the property, and we strip the newlines and spaces from its text so that only the price string remains.

In [4]:
property_rows[0].find("h4", {"class": "propPrice"}).text.replace("\n","").replace(" ","")

'$725,000'
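The cleaned value is still a string that contains the dollar sign and the thousands separator. If a numeric price is needed for sorting or arithmetic, one possible conversion is sketched below; `parse_price` is a hypothetical helper, not something the notebook defines.

```python
def parse_price(text):
    """Convert a price string such as '$725,000' to an int.

    Hypothetical helper: keeps only the digits and returns None if there are none.
    """
    digits = "".join(ch for ch in text if ch.isdigit())
    return int(digits) if digits else None


print(parse_price("$725,000"))  # 725000
print(parse_price("N/A"))       # None
```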

Now we are going to create a loop that iterates over all the properties so we can obtain the data we need more efficiently. To know how many pages to loop over, we first read the total number of pages from the pagination links.

In [5]:
page_nr=soup.find_all("a",{"class":"Page"})[-1].text  ## We search for the total number of pages in the website
print(page_nr)

3
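The page count translates into page URLs through an offset in the file name. The sketch below assumes, as the scraping loop in this notebook does, that each page lists 10 properties, so the offsets are 0, 10, 20 and so on; `page_urls` is a hypothetical helper name.

```python
base_url = "https://pythonizing.github.io/data/real-estate/rock-springs-wy/LCWYROCKSPRINGS/t=0&s="


def page_urls(base_url, page_count, per_page=10):
    """Build one URL per results page (hypothetical helper; per_page=10 is an assumption)."""
    return [f"{base_url}{offset}.html" for offset in range(0, page_count * per_page, per_page)]


urls = page_urls(base_url, 3)
for u in urls:
    print(u)  # the three URLs end in s=0.html, s=10.html and s=20.html
```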


In [6]:
lista=[] ## An empty list to collect the information we scrape from the web
base_url = "https://pythonizing.github.io/data/real-estate/rock-springs-wy/LCWYROCKSPRINGS/t=0&s="
id_count = 0 ## Running ID, initialised once outside the loops so it keeps increasing across properties and pages

## In the first section of the code we iterate through the different pages and download each one
for page in range(0, int(page_nr)*10, 10): ## From the website we observe that each page contains 10 properties
    url = base_url + str(page) + ".html"
    req = requests.get(url)
    html1 = req.content
    soup = BeautifulSoup(html1, "html.parser")
    property_rows = soup.find_all("div", {"class": "propertyRow"})

    ## For each page we iterate through every property, save its info in a dictionary,
    ## and then append that dictionary to our list
    for item in property_rows:
        d = {}
        id_count += 1 ## Each property gets the next ID
        d["ID"] = id_count
        d["Price"] = item.find("h4", {"class": "propPrice"}).text.replace("\n","").replace(" ","")
        d["Address"] = item.find_all("span", {"class": "propAddressCollapse"})[0].text

        ## Some properties list every characteristic, but others are missing some of them.
        ## In that case find_all returns an empty (or shorter) list and indexing it raises IndexError,
        ## so we catch that error and store None instead of letting the code break
        try:
            d["Locality"] = item.find_all("span", {"class": "propAddressCollapse"})[1].text
        except IndexError:
            d["Locality"] = None
        try:
            d["Beds"] = item.find_all("span", {"class": "infoBed"})[0].text
        except IndexError:
            d["Beds"] = None
        try:
            d["Full Baths"] = item.find_all("span", {"class": "infoValueFullBath"})[0].text
        except IndexError:
            d["Full Baths"] = None
        try:
            d["Area (SqFt)"] = item.find_all("span", {"class": "infoSqFt"})[0].text
        except IndexError:
            d["Area (SqFt)"] = None
        try:
            d["Half Baths"] = item.find_all("span", {"class": "infoValueHalfBath"})[0].text
        except IndexError:
            d["Half Baths"] = None

        ## The columnGroup sections of the html list further features as name/value pairs,
        ## so we pair each featureGroup (the name) with its featureName (the value)
        for column_group in item.find_all("div", {"class": "columnGroup"}):
            for feature_group, feature_name in zip(column_group.find_all("span", {"class": "featureGroup"}), column_group.find_all("span", {"class": "featureName"})):
                d[feature_group.text.replace(":","").replace(" ","").replace("\xa0","")] = feature_name.text
        lista.append(d)
        print(lista) ## Prints the whole list collected so far after every property

[{'ID': 1, 'Price': '$725,000', 'Address': '0 Gateway', 'Locality': 'Rock Springs, WY 82901', 'Beds': None, 'Full Baths': None, 'Area (SqFt)': None, 'Half Baths': None, 'ArchitectureStyle': 'Other', 'RoofType': 'Unknown'}]
[{'ID': 1, 'Price': '$725,000', 'Address': '0 Gateway', 'Locality': 'Rock Springs, WY 82901', 'Beds': None, 'Full Baths': None, 'Area (SqFt)': None, 'Half Baths': None, 'ArchitectureStyle': 'Other', 'RoofType': 'Unknown'}, {'ID': 2, 'Price': '$452,900', 'Address': '1003 Winchester Blvd.', 'Locality': 'Rock Springs, WY 82901', 'Beds': '4 Beds', 'Full Baths': '4 Full Baths', 'Area (SqFt)': None, 'Half Baths': None, 'Age': 'New Construction', 'Appliances': 'Dishwasher, ', 'Basement': 'Finished', 'BathFeatures': 'Stall Shower and Tub, ', 'Cooling': 'Central A/C', 'Exterior': 'Thermal Windows / Doors', 'ExteriorDescription': 'Other, ', 'ExteriorLivingSpace': 'Deck', 'FireplaceCount': '2 Fireplaces', 'FireplaceDescription': 'Gas', 'Flooring': 'Hardwood, ', 'GarageCount': '3 C

In [7]:
first_df=pd.DataFrame(lista)
first_df

|  | ID | Price | Address | Locality | Beds | Full Baths | Area (SqFt) | Half Baths | ArchitectureStyle | RoofType | ... | Pre-Wiring | BodyofWater | Location | Views | Zoning | AreaDescription | SpecialMarket | Fireplace | CommunityType | LeaseRequirements |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | $725,000 | 0 Gateway | Rock Springs, WY 82901 |  |  |  |  | Other | Unknown | ... |  |  |  |  |  |  |  |  |  |  |
| 1 | 2 | $452,900 | 1003 Winchester Blvd. | Rock Springs, WY 82901 | 4 Beds | 4 Full Baths |  |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 2 | 3 | $396,900 | 600 Talladega | Rock Springs, WY 82901 | 5 Beds | 3 Full Baths | 3,154 Sq. Ft |  | Ranch | Unknown | ... |  |  |  |  |  |  |  |  |  |  |
| 3 | 4 | $389,900 | 3239 Spearhead Way | Rock Springs, WY 82901 | 4 Beds | 3 Full Baths | 3,076 Sq. Ft | 1 Half Bath |  |  | ... | Garage Door |  |  |  |  |  |  |  |  |  |
| 4 | 5 | $254,000 | 522 Emerald Street | Rock Springs, WY 82901 | 3 Beds | 3 Full Baths | 1,172 Sq. Ft |  |  |  | ... |  | Reservoir |  |  |  |  |  |  |  |  |
| 5 | 6 | $252,900 | 1302 Veteran's Drive | Rock Springs, WY 82901 | 4 Beds | 2 Full Baths | 1,932 Sq. Ft |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 6 | 7 | $210,000 | 1021 Cypress Cir | Rock Springs, WY 82901 | 4 Beds | 3 Full Baths | 1,676 Sq. Ft |  |  |  | ... |  |  | Cul-de-sac | Mountain | R-1 |  |  |  |  |  |
| 7 | 8 | $209,000 | 913 Madison Dr | Rock Springs, WY 82901 | 3 Beds | 2 Full Baths | 1,344 Sq. Ft |  |  |  | ... | Cable, |  |  | Mountain |  |  |  |  |  |  |
| 8 | 9 | $199,900 | 1344 Teton Street | Rock Springs, WY 82901 | 3 Beds | 2 Full Baths | 1,920 Sq. Ft |  |  |  | ... |  |  |  | Mountain |  |  |  |  |  |  |
| 9 | 10 | $196,900 | 4 Minnies Lane | Rock Springs, WY 82901 | 3 Beds | 2 Full Baths | 1,664 Sq. Ft |  |  |  | ... |  | Reservoir |  | Mountain | Residential | Country Living, | Vacation / Second Home |  |  |  |
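Two patterns carry most of the scraping loop: falling back to None when an optional element is missing, and pairing feature names with feature values to build dictionary keys. The sketch below isolates both on stand-in data so it runs without the website. The helpers `text_at` and `clean_key`, the `SimpleNamespace` stand-ins for the elements returned by find_all, and the raw label strings are all assumptions made for illustration.

```python
from types import SimpleNamespace


def text_at(elements, index=0):
    """Return elements[index].text, or None when that element does not exist."""
    try:
        return elements[index].text
    except IndexError:
        return None


def clean_key(label):
    """Strip colons, spaces and non-breaking spaces so the label can serve as a column name."""
    return label.replace(":", "").replace(" ", "").replace("\xa0", "")


# Stand-ins for the lists that find_all would return for one property
address_spans = [SimpleNamespace(text="0 Gateway"), SimpleNamespace(text="Rock Springs, WY 82901")]
bed_spans = []  # this property has no bed information

record = {
    "Address": text_at(address_spans, 0),
    "Locality": text_at(address_spans, 1),
    "Beds": text_at(bed_spans),
}

# Assumed raw label text, paired position by position with the values
labels = ["Architecture Style:\xa0", "Roof Type:\xa0"]
values = ["Other", "Unknown"]
record.update({clean_key(label): value for label, value in zip(labels, values)})

print(record)
```

Because zip pairs items by position, this only works when every feature label has a matching value in the same order, which is the assumption the loop itself makes.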


In [8]:
## Finally we save the data acquired as a .csv file
first_df.to_csv("Output.csv")
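By default, to_csv also writes the DataFrame index as an unnamed first column, which pandas labels `Unnamed: 0` when the file is read back. If that column is not wanted, passing `index=False` leaves it out. The sketch below shows the difference on a two-row stand-in DataFrame, writing to in-memory buffers rather than to disk; the sample rows and variable names are assumptions.

```python
import io

import pandas as pd

# Two-row stand-in for first_df
df = pd.DataFrame([{"ID": 1, "Price": "$725,000"}, {"ID": 2, "Price": "$452,900"}])

with_index = io.StringIO()
df.to_csv(with_index)  # default: the index is written as an unnamed first column
with_index.seek(0)

without_index = io.StringIO()
df.to_csv(without_index, index=False)  # the index is left out
without_index.seek(0)

cols_with = list(pd.read_csv(with_index).columns)
cols_without = list(pd.read_csv(without_index).columns)
print(cols_with)     # ['Unnamed: 0', 'ID', 'Price']
print(cols_without)  # ['ID', 'Price']
```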