# Objective: Scrape Housing Price Data for Pune

Things to collect:

1. Price of the house.
2. Number of beds and baths.
3. Floor space (sq ft).
4. Locality of the house.
5. Possession and age.

## Import Libraries
To scrape the data, we need a few tools that make the process as seamless as possible. We will not go through every library in depth, but here is what each one is used for.

1. **BeautifulSoup** - parses the HTML of a page so that data can be extracted from it.
2. **csv** - saves the data in CSV format.
3. **selenium** - an open-source tool that automates web browsers; here it loads each listing page so that its HTML can be parsed.

In [1]:
import csv
from bs4 import BeautifulSoup
from selenium import webdriver

## Getting the URL
To get the data, we need the URL of the page that holds it. The listings are paginated, so `get_url()` takes a page number and returns the URL of that results page.

In [2]:
def get_url(page):   
    url = 'https://www.makaan.com/listings?propertyType=apartment&budget=,100000000&sortBy=popularity&listingType=buy&pageType=LISTINGS_PROPERTY_URLS&cityName=Pune&cityId=21&templateId=MAKAAN_CITY_LISTING_BUY&page={}'.format(page)
    return url
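
As a sketch of how the query string in `get_url()` is put together, the same URL can be assembled from a dictionary of parameters with the standard library's `urllib.parse.urlencode`. The names `BASE_URL`, `QUERY` and `build_listing_url` are hypothetical and are not part of the notebook; the parameter values are copied from the URL above.

```python
from urllib.parse import urlencode

# Hypothetical names; the values are taken from the URL used in get_url().
BASE_URL = 'https://www.makaan.com/listings'

QUERY = {
    'propertyType': 'apartment',
    'budget': ',100000000',
    'sortBy': 'popularity',
    'listingType': 'buy',
    'pageType': 'LISTINGS_PROPERTY_URLS',
    'cityName': 'Pune',
    'cityId': '21',
    'templateId': 'MAKAAN_CITY_LISTING_BUY',
}


def build_listing_url(page):
    """Return the listings URL for one results page."""
    params = dict(QUERY, page=page)  # the page number goes last
    # safe=',' leaves the comma in the budget value unescaped
    return BASE_URL + '?' + urlencode(params, safe=',')


print(build_listing_url(2001))
```

Keeping the parameters in a dictionary makes it easier to change a single filter, such as the city or the budget, without editing one long string.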

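## Getting Output
`get_output()` extracts the fields of a single listing card and returns them as a tuple. It is called once per record from `HousingDataScrap()`. A field that is missing from a card is stored as an empty string.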

In [3]:
def get_output(in_put):
    
    try:
        bhk = in_put.find('a',{'data-type' : 'listing-link'}).text.strip().split()[0]
    except (AttributeError, IndexError):  # tag missing, or its text is empty
        bhk = ''
        
    try:
        locality = in_put.find('a',{'data-type' : 'localityName'}).text.strip().split(',')[-2]
    except (AttributeError, IndexError):  # tag missing, or text has no comma
        locality = ''
        
    try:
        Price = in_put.find('div',{'data-type' : 'price-link'}).text.strip()
    except AttributeError:
        Price = ''

    try:
        sqft = in_put.find('td',{'class' : 'size'}).text.strip()
    except AttributeError:
        sqft = ''
 
    try:
        status = in_put.find('td',{'class' : 'val'}).text.strip()
    except AttributeError:
        status = ''

    try:
        Possession  = in_put.find('li',{'title' : 'Possession by'}).text.strip()
    except AttributeError:
        Possession = ''
    
    try:
        price_per_sqft = in_put.find('td',{'class' : 'lbl rate'}).text.strip()
    except AttributeError:
        price_per_sqft = ''
        
    try:
        old = in_put.find('li',{'title' : 'old'}).text.strip().split()[0]
    except (AttributeError, IndexError):  # tag missing, or its text is empty
        old = ''
        
    try:
        bath = in_put.find('li',{'title' : 'Bathrooms'}).text.strip().split()[0]
    except (AttributeError, IndexError):  # tag missing, or its text is empty
        bath = ''
                
        
    output = (bhk,locality,Price,sqft,status,Possession,price_per_sqft,old,bath)
    
    return output
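
The repeated `try`/`except` blocks in `get_output()` all follow one pattern: find a tag, strip its text, optionally pick one part of it, and fall back to an empty string. A minimal sketch of that pattern as a helper is shown here. The name `safe_text` and its `pick` parameter are hypothetical, and the `SimpleNamespace` objects with made-up text only stand in for the tags that `in_put.find(...)` would return.

```python
from types import SimpleNamespace


def safe_text(node, pick=lambda text: text):
    """Return pick(stripped text of node), or '' if the node or part is missing.

    node is whatever find() returned: a tag-like object with a .text
    attribute, or None when nothing matched.
    """
    try:
        return pick(node.text.strip())
    except (AttributeError, IndexError):
        # AttributeError: node is None; IndexError: pick indexed past the end
        return ''


# Stand-ins for found tags; the text values are invented for illustration.
title = SimpleNamespace(text='  2 BHK Apartment ')
place = SimpleNamespace(text='Pune')

print(safe_text(title, lambda text: text.split()[0]))      # first word
print(safe_text(place, lambda text: text.split(',')[-2]))  # no comma, so ''
print(safe_text(None))                                     # tag not found, so ''
```

With such a helper, a field in `get_output()` could be written on one line, for example `bhk = safe_text(in_put.find('a', {'data-type': 'listing-link'}), lambda text: text.split()[0])`.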

In [4]:
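def HousingDataScrap():
    output = []
    # Path to the local ChromeDriver binary. Passing it positionally works on older
    # Selenium releases; newer ones expect the path to be given through a Service object.
    driver = webdriver.Chrome('D:\\Python Files\\chromedriver')
    try:
        for i in range(2001,2905):  # result pages 2001 to 2904
            url = get_url(i)
            driver.get(url)
            # Parse the loaded page and collect every listing card on it
            soup = BeautifulSoup(driver.page_source,'html.parser')
            in_puts  = soup.find_all('li','cardholder')
            for j in in_puts:
                output.append(get_output(j))
    finally:
        driver.quit()  # always close the browser, even if a page fails to load

    # Write the header row, then one row per listing
    with open('Pune_Housing.csv','w',newline='',encoding='utf-8') as file:
        a = csv.writer(file)
        a.writerow(['bhk','locality','Price','sqft','status','Possession','price_per_sqft','Age_old','bath'])
        a.writerows(output)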
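
To illustrate the CSV step at the end of `HousingDataScrap()`, the sketch below writes a header and a few rows to an in-memory buffer instead of a file, so the result can be inspected without running the scraper. The function name `rows_to_csv_text` is hypothetical, and the sample rows are invented placeholders, not scraped values.

```python
import csv
import io

HEADER = ['bhk', 'locality', 'Price', 'sqft', 'status',
          'Possession', 'price_per_sqft', 'Age_old', 'bath']


def rows_to_csv_text(rows):
    """Write HEADER plus the given row tuples as CSV and return the text."""
    buffer = io.StringIO(newline='')  # newline='' mirrors the open() call above
    writer = csv.writer(buffer)
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buffer.getvalue()


# Invented placeholder rows in the same order as the tuple from get_output()
sample_rows = [
    ('2', 'Locality A', '45 L', '950', 'Ready to move', '', '4,736', '3', '2'),
    ('3', 'Locality B', '', '1,400', 'Under Construction', 'Dec 2025', '', '', ''),
]

print(rows_to_csv_text(sample_rows))
```

Values that contain a comma, such as `'4,736'`, are quoted automatically by `csv.writer`, which is one reason to use the `csv` module rather than joining the fields by hand.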

In [5]:
HousingDataScrap()

In [1]:
import pandas as pd

df = pd.read_csv('Pune_Housing.csv')

In [3]:
df.shape

(58072, 9)
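
Because `get_output()` stores an empty string for every field it cannot find, it is worth checking how many blanks each column holds before analysing the data. The sketch below does this with the standard library only. The function name `count_blank_fields` is hypothetical, and the in-memory sample is invented for illustration.

```python
import csv
import io
from collections import Counter


def count_blank_fields(lines):
    """Return (row count, {column: number of empty values}) for CSV input.

    lines is any iterable of CSV lines, such as an open file whose
    first line is the header.
    """
    blanks = Counter()
    total = 0
    for row in csv.DictReader(lines):
        total += 1
        for column, value in row.items():
            if value == '':
                blanks[column] += 1
    return total, dict(blanks)


# Invented sample with the same kind of gaps the scraper can produce
sample = io.StringIO('bhk,locality,Price\r\n2,Locality A,45 L\r\n,Locality B,\r\n')
print(count_blank_fields(sample))
```

For the real file, the function can be given the handle from `open('Pune_Housing.csv', newline='', encoding='utf-8')`. With the DataFrame already loaded, `df.isna().sum()` gives the comparable per-column count, since `pd.read_csv` reads empty fields as missing values by default.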

# CREDIT
*The housing data for Pune was extracted from [**Makaan**](https://www.makaan.com/). It is used for project purposes only.*