<center><h1>Web Scraping Kijiji</h1><h3>Using Python and Beautiful Soup</h3></center>

In [3]:
import pandas as pd
from IPython.display import HTML
from bs4 import BeautifulSoup
import urllib.request as request
from ipywidgets import interact
pd.set_option("display.max_rows",1000)
pd.set_option("display.max_columns",20)
pd.set_option("display.max_colwidth", 200)

### For this exercise, I will only be scraping the Toronto listings

In [4]:
base_url = 'http://www.kijiji.ca'
toronto_url = 'http://www.kijiji.ca/h-city-of-toronto/1700273'
html_kijiji = request.urlopen(toronto_url)

soup_kijiji = BeautifulSoup(html_kijiji, 'lxml')

Since I will be creating a drop-down widget containing all the listing categories, I looked at the page source to find a complete list of the available categories. I found that I need to grab all the **&lt;a&gt;** elements whose class attribute equals **"category-selected"**.

In [5]:
div_categories = soup_kijiji.find_all('a', class_='category-selected')
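
To see what `find_all('a', class_='category-selected')` returns without hitting the live site, here is a minimal sketch on a hand-written snippet. The HTML string is made up for illustration (it only mimics the links shown in this notebook), and it uses the built-in `html.parser` so that `lxml` is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet that mimics Kijiji's category links; not fetched from the site.
sample_html = """
<ul>
  <li><a class="category-selected" data-id="10" href="/b-buy-sell/city-of-toronto/c10l1700273">buy and sell</a></li>
  <li><a class="category-selected" data-id="112" href="/b-pets/city-of-toronto/c112l1700273">pets</a></li>
  <li><a class="other-link" href="/help">help</a></li>
</ul>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

# class_ (with the trailing underscore) filters on the HTML class attribute,
# because `class` is a reserved word in Python.
sample_links = sample_soup.find_all("a", class_="category-selected")

for link in sample_links:
    print(link.get_text(), "->", link["href"])
```

Only the two anchors carrying the `category-selected` class are returned; the `other-link` anchor is skipped.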

### Let's look at the first 20 rows of the category list:

In [6]:
div_categories[:20]

[<a class="category-selected" data-id="10" href="/b-buy-sell/city-of-toronto/c10l1700273">buy and sell</a>,
 <a class="category-selected" data-id="72" href="/b-services/city-of-toronto/c72l1700273">services</a>,
 <a class="category-selected" data-id="27" href="/b-cars-vehicles/city-of-toronto/c27l1700273">cars &amp; vehicles</a>,
 <a class="category-selected" data-id="112" href="/b-pets/city-of-toronto/c112l1700273">pets</a>,
 <a class="category-selected" data-id="800" href="/b-vacation-rentals/c800l1700273">vacation rentals</a>,
 <a class="category-selected" data-id="1" href="/b-community/city-of-toronto/c1l1700273">community</a>,
 <a class="category-selected" data-id="34" href="/b-real-estate/city-of-toronto/c34l1700273">real estate</a>,
 <a class="category-selected" data-id="45" href="/b-jobs/city-of-toronto/c45l1700273">jobs</a>,
 <a class="category-selected" data-id="218" href="/b-resumes/city-of-toronto/c218l1700273">resumes</a>,
 <a class="category-selected" data-id="63" href="/

In the output above, the category names appear as the link text at the end of each element: "buy and sell", "services", "cars & vehicles", etc.

### Now I will create a Python dictionary that maps each category name to its URL

In [7]:
categories = {}
for item in div_categories:
    categories[item.get_text()] = base_url + item['href']

### Let's see what the dictionary looks like:

In [8]:
categories

{'(more categories...)': 'http://www.kijiji.ca/b-resumes/city-of-toronto/c218l1700273',
 'ATVs, snowmobiles': 'http://www.kijiji.ca/b-atv-snowmobile/city-of-toronto/c171l1700273',
 'Canada': 'http://www.kijiji.ca/b-vacation-rentals-canada/c801l1700273',
 'Caribbean': 'http://www.kijiji.ca/b-vacation-rentals-caribbean/c803l1700273',
 'Mexico': 'http://www.kijiji.ca/b-vacation-rentals-mexico/c804l1700273',
 'Other Countries': 'http://www.kijiji.ca/b-vacation-rentals-other-countries/c805l1700273',
 'RVs, campers, trailers': 'http://www.kijiji.ca/b-rv-camper-trailer/city-of-toronto/c172l1700273',
 'SUVs': 'http://www.kijiji.ca/b-cars-trucks/city-of-toronto/suv+crossover/c174l1700273a138',
 'USA': 'http://www.kijiji.ca/b-vacation-rentals-usa/c802l1700273',
 'accessories': 'http://www.kijiji.ca/b-pet-accessories/city-of-toronto/c115l1700273',
 'accounting, mgmt': 'http://www.kijiji.ca/b-accounting-management-jobs/city-of-toronto/c58l1700273',
 'activities, groups': 'http://www.kijiji.ca/b-ac
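
In the output above, `'(more categories...)'` maps to a resumes URL, which suggests that several links share that text. A dictionary keeps only one value per key, so each later link with the same text overwrites the earlier one. The sketch below shows that behaviour on made-up `(label, href)` pairs, along with one possible way to spot the collisions. The names `pairs` and `build_category_map` are hypothetical and not part of the notebook.

```python
from collections import Counter
from urllib.parse import urljoin

base_url = "http://www.kijiji.ca"

# Made-up (label, href) pairs standing in for item.get_text() and item['href'].
pairs = [
    ("pets", "/b-pets/city-of-toronto/c112l1700273"),
    ("(more categories...)", "/b-jobs/city-of-toronto/c45l1700273"),
    ("(more categories...)", "/b-resumes/city-of-toronto/c218l1700273"),
]


def build_category_map(pairs, base):
    """Return (mapping, duplicated_labels) for a list of (label, href) pairs."""
    counts = Counter(label for label, _ in pairs)
    duplicated = sorted(label for label, n in counts.items() if n > 1)
    # Later pairs silently replace earlier ones that share the same label.
    mapping = {label: urljoin(base, href) for label, href in pairs}
    return mapping, duplicated


category_map, duplicated_labels = build_category_map(pairs, base_url)
print(category_map["(more categories...)"])
print(duplicated_labels)
```

Labels reported in `duplicated_labels` are the ones that would need a more specific key (for example, the parent category name) before they are useful in a drop-down.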

### Now, I will make a list of just the keys from the category dictionary I created earlier:

In [9]:
category_list = list(categories)

### Now the fun part

Below, I use the `interact` decorator from ipywidgets to create a drop-down widget, passing it the category_list that I just made. The rest of the function grabs each listing's image URL, title, description, details, and price, and builds a pandas data frame that is then displayed on the screen.

Unfortunately, this does NOT work for every listing category: some listings lack a title, description, or another field that the function expects. It does work for most categories, especially those for selling items.

In [10]:
@interact
def kijiji_listings(category = sorted(category_list)):
    html_listings = request.urlopen(categories[category])

    soup_listings = BeautifulSoup(html_listings, 'lxml')

    # Alternative (requires `import re`): match only listing tables by class
    # tables = soup_listings.find_all('table', class_=re.compile('regular-ad|top-'))
    tables = soup_listings.find_all('table')
    
    img_urls = []
    for table in tables[1:]:
        for row in table.find_all('td', class_='image'):
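            try:
                img_urls.append("<img src='" + row.div.img['src'] + "'>")
            except (AttributeError, TypeError, KeyError):
                # No <div><img src=...> wrapper in this cell; fall back to a direct <img>
                img_urls.append("<img src='" + row.img['src'] + "'>")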
                
    titles = []
    for table in tables[1:]:
        for row in table.find_all('td', class_='description'):
            titles.append(row.a.get_text().strip())
            
    comments = []
    for table in tables[1:]:
        for row in table.find_all('td', class_='description'):
            comments.append(row.p.get_text().strip())
            
    details = []
    for table in tables[1:]:
        for row in table.find_all('td', class_='description'):
            for item in row.find_all('p', class_='details'):
                details.append(item.get_text().strip())
    
    prices = []
    for table in tables[1:]:
        for row in table.find_all('td', class_='price'):
            prices.append(row.get_text().strip())
            
    df = pd.DataFrame({'Price':prices, 'Image':img_urls, 'Title':titles, 'Comment':comments, 'Details':details})
    # Arrange the columns in a certain order
    df = df[['Image','Title','Comment','Details','Price']]

    
    return HTML(df.to_html(escape=False))  # if escape is set to True, the images won't be rendered

Unnamed: 0,Image,Title,Comment,Details,Price
0,,2010 Ford Fusion SEL V6 AWD-LEATHER-ROOF-1 OWNER-CLEAN CARPOOF,Benchmark Automotive is pleased to offer this low kms Ford Fusion SEL....3.0 Litre V6 AWD....1 private owner since new....Local Ontario Canadian Vehicle....Clean Carproof...No Accidents...No…,64000km | Automatic,"$14,995.00"
1,,"2013 Ford Focus 3 STANDARD CARS $3,799-$12,999 Sedan","Wow 3 standard cars available. 2013 Ford Focus SE 17,000 km full factory warranty. 2007 Volkswagon Golf 82,000 km $6,499, 2004 Acura EL 247,000 km $3,799 They are nice and clean in and out. Looks,…",17000km | Manual,Please Contact
2,,2012 Mercedes-Benz E350 BlueTEC Nav Leather PanoSunroof Backup C,Canadian car with Clean CarProof! Loaded with: BlueTEC Diesel! Navigation! Leather Interior! Panoramic Sunroof! Backup Camera! Xenon HID Headlights! LED Accent Lighting! Harman / Kardon Premium ...,71623km | Automatic | CarProof,"$33,449.00"
3,,2008 BMW M M3 CONVERTIBLE 6 SPD,"This beautiful 2008 BMW M3 Convertible comes equipped with a Manual Transmission, Keyless Entry, Keyless Start, Leather Interior, Power Memory Seats, Navigation, Heated Seats, Bluetooth, M Sport…",77000km | Manual,"$38,910.00"
4,,2008 Mazda MAZDA3 AUTO!!! LOADED!!! ROOF!!! ALLOYS!!!,"AUTOMATIC!!! LOADED!!! SUNROOF!!! POWER WINDOWS, POWER LOCKS, A/C, ALLOYS!!! ................................................................................................ WE ARE EXCLUSIVE TO ...",216000km | Automatic,"$3,990.00"
5,,2004 Ford Taurus SEL Sedan,"Original owner, no accidents. Well maintained, in excellent condition. Certified and Drive Clean Tested. Tan on tan cloth interior. 6 disc CD changer, premium sound system, sunroof, keyless entry…",82000km | Automatic,"$4,600.00"
6,,Cadillac STS,"32northstar motor really strong engin NO LEAKS. Interior is absolutely MINT,MINT ,everything works perfect, car needs brakes , basic brake job pads front& rear that's it !!",230000km | Automatic,"$1,500.00"
7,,2006 Chevrolet Cobalt LS *AS IS*,Great on gas Well maintained A bit of rust under the driver's door Runs well.,202000km | Automatic,"$1,800.00"
8,,2007 NISSAN MAXIMA SE,"2007 NISSAN MAXIMA SE..GREY ON BLACK LEATHER CAR IS IN GREAT MECHANICAL SHAPE AND SHOW ROOM CONDITION! ONLY 159,000...FULLY LOADED PLEASE CHECK OUR WEB AT WWW.CFPOC.CA OVER 50 GOOD QUALITY USED CARS…",159123km | Automatic,"$8,999.00"
9,,2009 PONTIAC VIBE LT***NO ACCIDENTS***LIKE NEW** SUNROOF & MORE!,"AUTO LINK MOTORS PROUD OMVIC & UCDA CERTIFIED DEALERSHIP 1614 Military Trail, Toronto. (647)618-8256 {2009 Pontiac Vibe Hatchback} >Quick & Fuel Efficient >CarProof Reports No Accident Vehicle, ...",172000km | Manual,"$5,995.00"
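
The function above fills five lists independently, and `pd.DataFrame` raises a `ValueError` when the lists in a dictionary have different lengths, so a single listing without a price or image cell is enough to break a category. One possible workaround, sketched below, is to walk the page one listing at a time and record `None` for anything missing. The sample HTML and the `text_or_none` and `extract_listing` helpers are hypothetical. The sketch assumes each listing is a `<tr>` containing the `image`, `description`, and `price` cells used above, which may not match the live page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second listing has no image and no price cell.
sample_html = """
<table>
  <tr>
    <td class="image"><img src="a.jpg"></td>
    <td class="description"><a>2004 Ford Taurus SEL Sedan</a><p>Original owner.</p>
      <p class="details">82000km | Automatic</p></td>
    <td class="price">$4,600.00</td>
  </tr>
  <tr>
    <td class="description"><a>Cadillac STS</a><p>Needs brakes.</p></td>
  </tr>
</table>
"""


def text_or_none(tag):
    """Return the stripped text of a tag, or None if the tag is missing."""
    return tag.get_text().strip() if tag is not None else None


def extract_listing(row):
    """Build one record per listing row, using None for missing fields."""
    image = row.find("td", class_="image")
    img = image.find("img") if image is not None else None
    desc = row.find("td", class_="description")
    return {
        "Image": img.get("src") if img is not None else None,
        "Title": text_or_none(desc.a) if desc is not None else None,
        "Comment": text_or_none(desc.p) if desc is not None else None,
        "Details": text_or_none(desc.find("p", class_="details")) if desc is not None else None,
        "Price": text_or_none(row.find("td", class_="price")),
    }


soup = BeautifulSoup(sample_html, "html.parser")
records = [extract_listing(row) for row in soup.find_all("tr")]
for record in records:
    print(record)
```

Because every record has the same keys, a list like `records` can be passed directly to `pd.DataFrame(records)`, and rows with missing fields no longer cause a length mismatch.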

