<center><h1>Web Scraping Kijiji</h1><h3>Using Python and Beautiful Soup</h3></center>

In [41]:
import pandas as pd
from IPython.display import HTML
from bs4 import BeautifulSoup
import urllib.request as request
from ipywidgets import interact
pd.set_option("display.max_rows",1000)
pd.set_option("display.max_columns",20)
pd.set_option("display.max_colwidth", 200)

### For this exercise, I will only be scraping the Toronto listings

In [42]:
base_url = 'http://www.kijiji.ca'
toronto_url = 'http://www.kijiji.ca/h-city-of-toronto/1700273'
html_kijiji = request.urlopen(toronto_url)

soup_kijiji = BeautifulSoup(html_kijiji, 'lxml')

Since I will be creating a drop-down widget containing all the listing categories, I inspected the page source to find a complete list of them. The categories are the **&lt;a&gt;** elements whose class attribute equals **"category-selected"**.

In [43]:
div_categories = soup_kijiji.find_all('a', class_='category-selected')

### Let's look at the first 20 rows of the category list:

In [44]:
div_categories[:20]

[<a class="category-selected" data-id="10" href="/b-buy-sell/city-of-toronto/c10l1700273">buy and sell</a>,
 <a class="category-selected" data-id="72" href="/b-services/city-of-toronto/c72l1700273">services</a>,
 <a class="category-selected" data-id="27" href="/b-cars-vehicles/city-of-toronto/c27l1700273">cars &amp; vehicles</a>,
 <a class="category-selected" data-id="112" href="/b-pets/city-of-toronto/c112l1700273">pets</a>,
 <a class="category-selected" data-id="800" href="/b-vacation-rentals/c800l1700273">vacation rentals</a>,
 <a class="category-selected" data-id="1" href="/b-community/city-of-toronto/c1l1700273">community</a>,
 <a class="category-selected" data-id="34" href="/b-real-estate/city-of-toronto/c34l1700273">real estate</a>,
 <a class="category-selected" data-id="45" href="/b-jobs/city-of-toronto/c45l1700273">jobs</a>,
 <a class="category-selected" data-id="218" href="/b-resumes/city-of-toronto/c218l1700273">resumes</a>,
 <a class="category-selected" data-id="63" href="/

In the output above, the link text of each element is the category name ("buy and sell", "services", "cars &amp; vehicles", etc.) and the href attribute holds the category's relative URL.

### Now I will create a Python dictionary that maps each category name to its URL

In [45]:
categories = {}
for item in div_categories:
    categories[item.get_text()] = base_url + item['href']

### Let's see what the dictionary looks like:

In [46]:
categories

{'(more categories...)': 'http://www.kijiji.ca/b-resumes/city-of-toronto/c218l1700273',
 'ATVs, snowmobiles': 'http://www.kijiji.ca/b-atv-snowmobile/city-of-toronto/c171l1700273',
 'Canada': 'http://www.kijiji.ca/b-vacation-rentals-canada/c801l1700273',
 'Caribbean': 'http://www.kijiji.ca/b-vacation-rentals-caribbean/c803l1700273',
 'Mexico': 'http://www.kijiji.ca/b-vacation-rentals-mexico/c804l1700273',
 'Other Countries': 'http://www.kijiji.ca/b-vacation-rentals-other-countries/c805l1700273',
 'RVs, campers, trailers': 'http://www.kijiji.ca/b-rv-camper-trailer/city-of-toronto/c172l1700273',
 'SUVs': 'http://www.kijiji.ca/b-cars-trucks/city-of-toronto/suv+crossover/c174l1700273a138',
 'USA': 'http://www.kijiji.ca/b-vacation-rentals-usa/c802l1700273',
 'accessories': 'http://www.kijiji.ca/b-pet-accessories/city-of-toronto/c115l1700273',
 'accounting, mgmt': 'http://www.kijiji.ca/b-accounting-management-jobs/city-of-toronto/c58l1700273',
 'activities, groups': 'http://www.kijiji.ca/b-ac
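
In the output above, the key '(more categories...)' points at the resumes URL, which suggests that the same link text can occur more than once on the page and that later entries overwrite earlier ones in a plain dictionary. Below is a minimal sketch of one way to detect such collisions before building the drop-down. The `links` list is hypothetical sample data standing in for the scraped elements, and the assumption that the label also appears under jobs is for illustration only.

```python
from collections import defaultdict

base_url = "http://www.kijiji.ca"

# Hypothetical (label, href) pairs standing in for the scraped <a> elements.
links = [
    ("pets", "/b-pets/city-of-toronto/c112l1700273"),
    ("(more categories...)", "/b-jobs/city-of-toronto/c45l1700273"),
    ("(more categories...)", "/b-resumes/city-of-toronto/c218l1700273"),
]

# Group every URL under its label so that collisions become visible.
urls_by_label = defaultdict(list)
for label, href in links:
    urls_by_label[label].append(base_url + href)

# Labels that would silently overwrite each other in a plain dict.
duplicates = {label: urls for label, urls in urls_by_label.items() if len(urls) > 1}

# Keep only the unambiguous labels for the drop-down.
categories = {label: urls[0] for label, urls in urls_by_label.items() if len(urls) == 1}

print(duplicates)
print(categories)
```

Dropping ambiguous labels is only one option. Prefixing each label with its parent category would keep every link reachable.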

### Now, I will make a list of just the keys from the category dictionary I created earlier:

In [47]:
category_list = list(categories)

### Now the fun part.

Below, I use the `interact` decorator from ipywidgets to create drop-down widgets for the category and the page number, passing it the category_list that I just made. The rest of the function grabs each listing's image URL, title, description, details, and price, builds a pandas data frame from them, and renders it as HTML.

Unfortunately, this does NOT work for all listing categories. Some categories have no title or price, so the scraped lists end up with different lengths and the full data frame cannot be built. It does work for most categories, especially those that involve selling items.
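
The page handling in the function below can be sketched in isolation. This is a minimal sketch, assuming (as the notebook's code does) that a page segment such as `page-2` goes immediately before the last path segment of the category URL. The helper name `build_page_url` is hypothetical and not part of the original notebook.

```python
def build_page_url(category_url: str, page: str) -> str:
    """Insert a page segment such as 'page-2' before the last path segment."""
    # The first page is the category URL itself.
    if page == "page-1":
        return category_url
    # Split at the last '/' and put the page segment in between.
    head, sep, tail = category_url.rpartition("/")
    return f"{head}{sep}{page}/{tail}"


url = "http://www.kijiji.ca/b-resumes/city-of-toronto/c218l1700273"
print(build_page_url(url, "page-1"))
print(build_page_url(url, "page-3"))
```

Keeping the URL logic in its own function makes it possible to check the result without issuing a request.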

In [58]:
pages = ['page-1', 'page-2', 'page-3', 'page-4', 'page-5', 'page-6', 'page-7', 'page-8', 'page-9']

@interact
def kijiji_listings(category = sorted(category_list), page = sorted(pages)):
    if page == 'page-1':
        print(categories[category])
        html_cars = request.urlopen(categories[category])
    else:
        url = categories[category]
        last_forward_slash = url.rfind('/')
        beginning_url = url[:last_forward_slash+1]
        ending_url = url[last_forward_slash:]  # fixed: the variable defined above is last_forward_slash
        print(beginning_url + page + ending_url)
        html_cars = request.urlopen(beginning_url + page + ending_url)

    soup_cars = BeautifulSoup(html_cars, 'lxml')
    
    #tables = soup_cars.find_all('table',  class_ = re.compile('regular-ad|top-'))
    tables = soup_cars.find_all('table')
    
    img_urls = []
    for table in tables[1:]:
        for row in table.find_all('td', class_='image'):
            try:
                img_urls.append("<img src='" + row.div.img['src'] + "'>")
            except:
                img_urls.append("<img src='" + row.img['src'] + "'>")
                
    titles = []
    for table in tables[1:]:
        for row in table.find_all('td', class_='description'):
            titles.append(row.a.get_text().strip())
            
    comments = []
    for table in tables[1:]:
        for row in table.find_all('td', class_='description'):
            comments.append(row.p.get_text().strip())
            
    details = []
    for table in tables[1:]:
        for row in table.find_all('td', class_='description'):
            for item in row.find_all('p', class_='details'):
                details.append(item.get_text().strip())
    
    prices = []
    for table in tables[1:]:
        for row in table.find_all('td', class_='price'):
            try:
                prices.append(float(row.get_text().replace('$','').replace(',','').strip()))
            except:
                prices.append(0.0)
    
    try:
        df = pd.DataFrame({'Price':prices, 'Image':img_urls, 'Title':titles, 'Comment':comments, 'Details':details})
        # Arrange the columns in a certain order
        df = df[['Image','Title','Comment','Details','Price']]
    # Some category listings don't have a price or title, so the lists differ in length
    # and pd.DataFrame raises a ValueError; fall back to the remaining columns
    except ValueError:
        df = pd.DataFrame({'Image':img_urls, 'Comment':comments, 'Details':details})
        # Arrange the columns in a certain order
        df = df[['Image','Comment','Details']]

    
    return HTML(df.to_html(escape=False))  # if escape is set to True, the images won't be rendered

http://www.kijiji.ca/b-resumes/city-of-toronto/c218l1700273


Unnamed: 0,Image,Comment,Details
0,,We are a fast growing company looking to hire new owner operators and company drivers. We offer a lease to own program for owner operators Brand new equipment with $0 down Very professional and…,
1,,We are a Multi-residential Property Management Company seeking a full-time Receptionist. The Receptionist will be the first point of contact to our clients and will entail support to our Maintenance…,
2,,"I am looking a full time job from 8:am to 5pm Monday to Friday. I have experienced taking care of children from newly born baby and toddlers or elderly care. I am kind, honest, hardworking,…",
3,,"Toronto Winter Work, Competive sqft/rate. Simple houses. On site Framing Foreman. Zoom boom on site. Draw schedule in place. Contact Foreman for drawings and details 647 218 7200 Starting now.",
4,,"I'm a professional kitchen installer with an experience of over 7 years. Have all the tools, car and WSIB insurance in place. Finished countless custom kitchens installations all over GTA. Able to…",
5,,Professional Photography a videography for all types of your Commercial and residential (Indoor and outdoor) needs including: - Real state photography - Wedding - Engagement - Personal Portrait -…,
6,,"We are reliable , Experienced ,Farst and Fully licensed electrcian covering all kind electrial job all over GTA 24/7. Call us And let us to to the job for you .( Residencial , Commercial, Industrial…",
7,,WE ARE LOOKING FOR EXPERIENCE SWIMING INSTUCTOR OVER 2 YEARS OF EXPERIENCE AS A LIFEGUARD OR SWIM INSTRUCTOR WITH 2 MAJOR ORGANIZATIONS. FOR 2 KIDS AGE OF 7 AND 10 YEARS OLD WE HAVE ACCESS TO A POOL…,
8,,Licensed professional Electrician is looking for jobs such as: -Basement and kitchen renovation/Demolition -Bulb and ballast replacement -Commercial and multi-residential lighting retrofit -Parking…,
9,,"CONTRACTOR, BIG OR SMALL, WE CAN DO IT EXCAVATION WORK AND DEMOLITION, WE HAVE 2 DUMP TRUCK TRI AXLE AND 3 EXCAVATOR, 5, 8 , 25 TON , 1- CASE CX50B 5 TON C/W ENCLOSED CAB W/ HEAT AND MINI EXCAVATO...",
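
The result above has no Title or Price column, which means the fallback branch of the function was taken. One possible way to avoid this is to build one record per listing instead of five parallel lists, so that a missing field becomes an empty cell rather than a length mismatch. The following is a minimal sketch under stated assumptions: the sample markup is hypothetical and only modelled on the selectors used in the function above, the helper names are invented for illustration, and `html.parser` is used instead of `lxml` to avoid the extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the selectors used in the function above.
SAMPLE_HTML = """
<table><tr>
  <td class="image"><img src="a.jpg"></td>
  <td class="description"><a>Bike</a><p>Good condition</p><p class="details">Downtown</p></td>
  <td class="price">$1,200.00</td>
</tr></table>
<table><tr>
  <td class="description"><p>Looking for work</p></td>
</tr></table>
"""


def text_or_none(tag):
    """Return the stripped text of a tag, or None when the tag is missing."""
    return tag.get_text().strip() if tag is not None else None


def parse_price(text):
    """Convert a string such as '$1,200.00' to a float, or None if not numeric."""
    if text is None:
        return None
    try:
        return float(text.replace("$", "").replace(",", ""))
    except ValueError:
        return None


def extract_rows(html):
    """Build one dictionary per listing row; missing fields become None."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for desc in soup.find_all("td", class_="description"):
        row = desc.find_parent("tr")
        img = row.find("img")
        records.append({
            "Image": img["src"] if img is not None and img.has_attr("src") else None,
            "Title": text_or_none(desc.a),
            "Comment": text_or_none(desc.p),
            "Details": text_or_none(desc.find("p", class_="details")),
            "Price": parse_price(text_or_none(row.find("td", class_="price"))),
        })
    return records


rows = extract_rows(SAMPLE_HTML)
print(rows)
```

The list of dictionaries can be passed straight to `pd.DataFrame(rows)`. Unlike the original function, this sketch records a missing or non-numeric price as `None` rather than `0.0`, so that it is not mistaken for a free item.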

