# Plane Prices
My objective is to collect plane prices as a function of time and model.
My data source is [Trade-A-Plane](https://www.trade-a-plane.com). I am interested in the
Van's RV-10, the Cessna 182, and all Maules. I want to create a table with the following
fields:

1. Year
2. Manufacturer
3. Model
4. TTAF
5. SMOH
6. Price
7. Price-Date
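
As a rough sketch of the target table, the seven fields above can be written out as CSV columns with the standard library. The column names and the `rows_to_csv` helper are my own naming for illustration, not anything defined by Trade-A-Plane. The sample row reuses values from the Cessna 182Q entry shown at the end of this notebook.

```python
import csv
import io

# Column names are assumptions that mirror the field list above.
FIELDS = ['year', 'manufacturer', 'model', 'ttaf', 'smoh', 'price', 'price_date']


def rows_to_csv(rows):
    """Serialize a list of dicts keyed by FIELDS into CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


sample_row = {
    'year': 1977,
    'manufacturer': 'CESSNA',
    'model': '182Q SKYLANE',
    'ttaf': 3461,
    'smoh': 798,
    'price': 15000,
    'price_date': '2021-12-12',
}
table_csv = rows_to_csv([sample_row])
print(table_csv)
```

Keeping the price date as its own column is what lets the same aircraft appear in several rows as its asking price changes over time.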



In [2]:
import requests
website = 'https://trade-a-plane.com'
response = requests.get(website)
response.text

'<!DOCTYPE html>\n<html lang="en" xml:lang="en">\n   <head>\n      <meta charset="utf-8" />\n      <!-- Bing Validation Code for IMI -->\n      <meta name="msvalidate.01" content="A4B7B943CDF6A7FAA1B0EAF74D239B87" />\n      <!-- Bing Validation Code for Wayne -->\n      <meta name="msvalidate.01" content="01B9B0607ECD095E9C4EE1EC818E107C" />\n      <!-- Always force latest IE rendering engine (even in intranet) & Chrome Frame\n         Remove this if you use the .htaccess -->\n      <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1" />\n      <meta http-equiv="cache-control" content="" />\n      <meta http-equiv="expires" content="0" />\n      <title>Search For Aircraft & Aircraft Parts - Airplane Sale, Jets, Helicopters, UAVs, Drones, & Aviation Real Estate | Trade-A-Plane</title>\n      <meta name="viewport" content="width=device-width, initial-scale=1">\n      <meta property="og:title" content="Search For Aircraft & Aircraft Parts - Airplane Sale, Jets, Helicopters, UAVs,

The request above fetches the target homepage. The homepage alone does not give us
the information we want, so we need a way to narrow the results to only the data we care about.
Let us start with a search for Cessna 182 series aircraft.

In [3]:
cessna_182_cat = 'https://www.trade-a-plane.com/search?category_level1=Single+Engine+Piston&make=CESSNA&model_group=CESSNA+182+SERIES&s-type=aircraft'
test_response = requests.get(cessna_182_cat).text
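
The search URL above is just a base path plus four query parameters, so it can be assembled rather than pasted. The sketch below rebuilds the same URL with `urllib.parse.urlencode`. The helper name `build_search_url` is hypothetical. I have only seen these parameter values work for the Cessna 182 search, so values for other makes (for example Maule) are an assumption to verify against the site.

```python
from urllib.parse import urlencode

SEARCH_URL = 'https://www.trade-a-plane.com/search'


def build_search_url(make, model_group, category='Single Engine Piston'):
    """Build a search URL from the query parameters seen in the URL above."""
    params = {
        'category_level1': category,
        'make': make,
        'model_group': model_group,
        's-type': 'aircraft',
    }
    # urlencode escapes spaces as '+', matching the hand-written URL.
    return SEARCH_URL + '?' + urlencode(params)


cessna_182_url = build_search_url('CESSNA', 'CESSNA 182 SERIES')
print(cessna_182_url)
```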

[Beautiful Soup](https://beautiful-soup-4.readthedocs.io/en/latest/) is a library for pulling data out of HTML and XML documents, which is exactly what we need to scrape these pages.

In [4]:
from IPython.display import HTML, display
from bs4 import BeautifulSoup

# Name the parser explicitly; otherwise bs4 picks whichever is installed and warns.
soup = BeautifulSoup(test_response, 'html.parser')
with open('test_page.html', 'wt') as f:
    f.write(soup.prettify())


def is_listing_result(tag):
    """
    True if the node is a result listing.

    Result listings are <div> tags whose `class` attribute includes both
    `result_listing` and `result`.
    """
    if tag.name != 'div':
        return False
    if not tag.has_attr('class'):
        return False
    classes = tag['class']
    return 'result_listing' in classes and 'result' in classes


# find_all accepts a function and keeps every tag for which it returns True.
filtered = soup.find_all(is_listing_result)
filtered[0]

<div class="result_listing result gold" data-cat="Single Engine Piston" data-listing_id="2399126" data-model_group="CESSNA 182 SERIES" data-seller_id="49743">
<div class="row">
<div class="col-md-12 col-sm-12">
<div class="lst-title">
<h3>
<a class="fl_main_a log_listing_click" data-listing_id="2399126" data-seller_id="49743" href="/search?category_level1=Single+Engine+Piston&amp;make=CESSNA&amp;model=182Q+SKYLANE&amp;listing_id=2399126&amp;s-type=aircraft" id="title" onclick="dataLayer.push({'sellerId' : '49743', 'userId' : '', 'listingId' : '2399126', 'page' : 'results', 'listingType' : 'aircraft' });">
									 1977 CESSNA 182Q SKYLANE
								</a>
</h3>
</div>
<div class="img_area">
<a class="fl_main_a log_listing_click" data-listing_id="2399126" data-seller_id="49743" href="/search?category_level1=Single+Engine+Piston&amp;make=CESSNA&amp;model=182Q+SKYLANE&amp;listing_id=2399126&amp;s-type=aircraft" id="title" onclick="dataLayer.push({'sellerId' : '49743', 'userId' : '', 'listing
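
The listing title in the output above (`1977 CESSNA 182Q SKYLANE`) already carries three of the table fields: year, manufacturer, and model. Here is a minimal sketch of splitting it, assuming the title always has the form `<year> <one-word make> <model...>`. That assumption is untested for other listings and would break on a multi-word manufacturer name. The function name `parse_listing_title` is my own.

```python
def parse_listing_title(title):
    """Split a title such as '1977 CESSNA 182Q SKYLANE' into (year, make, model)."""
    # split() with no arguments also discards the tabs and newlines
    # that surround the title text in the raw HTML.
    parts = title.split()
    if len(parts) < 3 or not parts[0].isdigit():
        raise ValueError('unexpected title format: {!r}'.format(title))
    year = int(parts[0])
    manufacturer = parts[1]
    model = ' '.join(parts[2:])
    return year, manufacturer, model


raw_title = '\n\t\t\t\t\t\t\t\t\t 1977 CESSNA 182Q SKYLANE\n\t\t\t\t\t\t\t\t'
parsed_title = parse_listing_title(raw_title)
print(parsed_title)
```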

In [5]:
with open('filtered_0.html', 'wt') as f:
    for tag in filtered[0]:
        f.write(str(tag))

Each `result_listing` has a descendant tag `<p class="description">` that contains a link to the listing's detail page. We want the information on that page, so find the child `<a class="log_listing_click" href="url/to/detail/page">`.

In [6]:
# Drill down to the link
# The whole description
description = filtered[0].find(name='p', class_='description')
display(description)
# Just the anchor tag
detail_link = description.select('a.log_listing_click')[0]
display(detail_link)
# Just the href
detail_link['href']

<p class="description">*News Alert* High demand highly desirable Q model has hit the market!!! 

I am listing this aircraft for a friend, I was a previous owner in 2017 and pas... <a class="fl_main_a log_listing_click" data-listing_id="2399126" data-seller_id="49743" href="/search?category_level1=Single+Engine+Piston&amp;make=CESSNA&amp;model=182Q+SKYLANE&amp;listing_id=2399126&amp;s-type=aircraft" onclick="dataLayer.push({'sellerId' : '49743', 'userId' : '', 'listingId' : '2399126', 'page' : 'results', 'listingType' : 'aircraft' });">More Info</a></p>

<a class="fl_main_a log_listing_click" data-listing_id="2399126" data-seller_id="49743" href="/search?category_level1=Single+Engine+Piston&amp;make=CESSNA&amp;model=182Q+SKYLANE&amp;listing_id=2399126&amp;s-type=aircraft" onclick="dataLayer.push({'sellerId' : '49743', 'userId' : '', 'listingId' : '2399126', 'page' : 'results', 'listingType' : 'aircraft' });">More Info</a>

'/search?category_level1=Single+Engine+Piston&make=CESSNA&model=182Q+SKYLANE&listing_id=2399126&s-type=aircraft'

Follow the link and fetch the detail page.

In [7]:
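website = 'https://trade-a-plane.com'
# The href is site-relative, so prefix it with the site root.
detail_url = website + detail_link['href']
print('Getting page {}'.format(detail_url))
# Name the parser explicitly to avoid bs4's "no parser specified" warning.
detail_tree = BeautifulSoup(requests.get(detail_url).text, 'html.parser')
with open('aircraft-detail.html', 'wt') as f:
    f.write(detail_tree.prettify())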

Getting page https://trade-a-plane.com/search?category_level1=Single+Engine+Piston&make=CESSNA&model=182Q+SKYLANE&listing_id=2399126&s-type=aircraft
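
String concatenation works here because the href starts with `/`. As a hedged alternative, `urllib.parse.urljoin` resolves a relative href against a base URL without relying on that, and `parse_qs` can recover the listing id from the query string. The href is copied from the output above. Whether `listing_id` is a stable key for tracking one aircraft over time is my assumption, not something the site documents.

```python
from urllib.parse import parse_qs, urljoin, urlsplit

BASE_URL = 'https://trade-a-plane.com'
href = ('/search?category_level1=Single+Engine+Piston&make=CESSNA'
        '&model=182Q+SKYLANE&listing_id=2399126&s-type=aircraft')

# urljoin resolves the site-relative href against the base URL.
joined_url = urljoin(BASE_URL, href)

# parse_qs decodes the query string into a dict of lists ('+' becomes a space).
query = parse_qs(urlsplit(joined_url).query)
listing_id = query['listing_id'][0]

print(joined_url)
print(listing_id, query['model'][0])
```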


In [9]:
import attr
from datetime import datetime
@attr.s
class AircraftSaleEntry:
    """
    A data class for aircraft sale information
    
    This is the main type of object that I wish to collect. I want to index
    aircraft sales entries. I want to record what is for sale, when, and for
    how much.
    """
    url: str = attr.ib()
    make_model: str = attr.ib()
    price: float = attr.ib()
    registration: str = attr.ib()
    description: str = attr.ib()
    search_date: datetime = attr.ib()
    ttaf: float = attr.ib()
    smoh: float = attr.ib()

AircraftSaleEntry(url='https://www.trade-a-plane.com/search?category_level1=Single+Engine+Piston&make=CESSNA&model=182Q+SKYLANE&listing_id=2400626&s-type=aircraft',
                  price=15000,
                  make_model='CESSNA 182Q SKYLANE',
                  registration='N735GS',
                  description='1977 Cessna 182Q Skylane, 3461TT, 798 SMOH, 483 SPOH, Garmin GTN 430W, Stratus ES ADS-B Out Transponder (ADS-B In WiFI Traffic and Wx Link to IPad (Foreflight), Narco Mark 12D, Garmin GMA 340, Bendix King KI206, JPI EGT-701 Engine Monitor, Horton STOL Kit (Leading Edge Cuff, Droop Wing Tips, Stall Fences), Rosen Sun Visors, Standby Altimeter, & More!',
                  search_date=datetime(2021, 12, 12, 11, 53),
                  ttaf=0,
                  smoh=0)

AircraftSaleEntry(url='https://www.trade-a-plane.com/search?category_level1=Single+Engine+Piston&make=CESSNA&model=182Q+SKYLANE&listing_id=2400626&s-type=aircraft', make_model='CESSNA 182Q SKYLANE', price=15000, registration='N735GS', description='1977 Cessna 182Q Skylane, 3461TT, 798 SMOH, 483 SPOH, Garmin GTN 430W, Stratus ES ADS-B Out Transponder (ADS-B In WiFI Traffic and Wx Link to IPad (Foreflight), Narco Mark 12D, Garmin GMA 340, Bendix King KI206, JPI EGT-701 Engine Monitor, Horton STOL Kit (Leading Edge Cuff, Droop Wing Tips, Stall Fences), Rosen Sun Visors, Standby Altimeter, & More!', search_date=datetime.datetime(2021, 12, 12, 11, 53), ttaf=0, smoh=0)
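
The description in the entry above mentions `3461TT` and `798 SMOH`, which suggests the `ttaf` and `smoh` fields could be filled from the free text. Below is a heuristic sketch using regular expressions. The patterns and the `extract_hours` helper are my own guesses at how sellers write these figures, so expect misses on listings that phrase them differently.

```python
import re

# Heuristic patterns: a number directly before "TT"/"TTAF" or "SMOH".
TTAF_RE = re.compile(r'(\d[\d,]*)\s*TT(?:AF)?\b', re.IGNORECASE)
SMOH_RE = re.compile(r'(\d[\d,]*)\s*SMOH\b', re.IGNORECASE)


def extract_hours(pattern, text):
    """Return the first matching hour count as a float, or None if absent."""
    match = pattern.search(text)
    if match is None:
        return None
    # Drop thousands separators such as "3,461" before converting.
    return float(match.group(1).replace(',', ''))


description = ('1977 Cessna 182Q Skylane, 3461TT, 798 SMOH, 483 SPOH, '
               'Garmin GTN 430W')
ttaf = extract_hours(TTAF_RE, description)
smoh = extract_hours(SMOH_RE, description)
print(ttaf, smoh)
```

Returning `None` when nothing matches keeps a missing value distinguishable from a genuine zero-hour airframe or engine.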