# AUTOVILLAGE CAR DATA SCRAPING FOR BUYING USING BEAUTIFUL SOUP
In this project, I scrape used-car listings from the AutoVillage website using BeautifulSoup. The data will be used for future analysis.

For scraping data from this website, I'll perform the following tasks:

[**Task 1**](#task1): Importing the libraries

[**Task 2**](#task2): Creating the base url and choosing the header

[**Task 3**](#task3): Extracting product links on the first page

[**Task 4**](#task4): Extracting product links on all the pages

[**Task 5**](#task5): Extracting information of the first product

[**Task 6**](#task6): Extracting information of all the products

<a id='task1'></a>
# Task 1: Importing the libraries

In [1]:
from bs4 import BeautifulSoup
import requests
import pandas as pd

<a id='task2'></a>
# Task 2: Creating the base url and choosing the header

In [2]:
base_url = 'https://www.autovillage.co.uk/'
header = {
    'user-agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.157 Safari/537.36"
}

<a id='task3'></a>
# Task 3: Extracting product links on the first page

In [3]:
source = requests.get('https://www.autovillage.co.uk/used-car/filter/postcode/tw18/sort/distance', headers = header)
soup = BeautifulSoup(source.content, 'lxml')


In [6]:
productlist = soup.find_all('div', class_ = 'alink')
print(productlist)

[<div class="alink"><a href="/used-car/ford/fiesta/mtz_48199_58792205">
Ford Fiesta 1.1 Zetec 5dr ** Ford SYNC Satellite Navigation ** Manual</a>
</div>, <div class="alink"><a href="/used-car/citroen/c3/mtz_43415_58923965">
Citroen C3 1.6 i 16v Exclusive 5dr</a>
</div>, <div class="alink"><a href="/used-car/ssangyong/korando/mtz_88232_57904259">
Ssangyong Korando 1.6D ULTIMATE AUTO 4X4</a>
</div>, <div class="alink"><a href="/used-car/volkswagen/golf/mtz_97968_58678796">
Volkswagen Golf 1.6 TDI SE 5dr DSG</a>
</div>, <div class="alink"><a href="/used-car/volkswagen/polo/mtz_81381_58843511">
Volkswagen Polo 1.2 60 Match Edition 5dr</a>
</div>, <div class="alink"><a href="/used-car/toyota/prius/mtz_54587_57372315">
Toyota Prius VVT-I T SPIRIT HYBRID Auto</a>
</div>, <div class="alink"><a href="/used-car/ford/kuga/mtz_48199_58528817">
Ford Kuga 2.0 TDCi 150 Zetec 5dr Powershift ** Sync Emergency System ** Automatic</a>
</div>, <div class="alink"><a href="/used-car/ford/focus/mtz_57

In [7]:
for item in productlist:
    for link in item.find_all('a', href = True):
        print(link['href'])
    
    

/used-car/ford/fiesta/mtz_48199_58792205
/used-car/citroen/c3/mtz_43415_58923965
/used-car/ssangyong/korando/mtz_88232_57904259
/used-car/volkswagen/golf/mtz_97968_58678796
/used-car/volkswagen/polo/mtz_81381_58843511
/used-car/toyota/prius/mtz_54587_57372315
/used-car/ford/kuga/mtz_48199_58528817
/used-car/ford/focus/mtz_57339_57605467
/used-car/toyota/iq/mtz_85923_58748821
/used-car/volkswagen/touran/mtz_68983_58392705


<a id='task4'></a>
# Task 4: Extracting product links on all the pages

In [14]:
productlinks = []
for i in range(1,20):
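    # Send the same user-agent header as in Task 3 so every page request is treated consistently
    source = requests.get(f'https://www.autovillage.co.uk/used-car/page/{i}/filter/postcode/tw18/sort/distance', headers = header)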
    soup = BeautifulSoup(source.content, 'lxml')
    productlist = soup.find_all('div', class_ = 'alink')
    for item in productlist:
        for link in item.find_all('a', href = True):
            # base_url ends with '/' and each href starts with '/', so strip one to avoid '//'
            productlinks.append(base_url.rstrip('/') + link['href'])
    
print(len(productlinks))

190
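
Joining a base URL that ends in `/` with an `href` that starts with `/` by plain concatenation produces a double slash in the result. A minimal sketch of one alternative, assuming the standard library's `urljoin` is acceptable here; `build_links` is a hypothetical helper name that is not part of the original notebook, and it also drops duplicate links while keeping their order.

```python
from urllib.parse import urljoin

base_url = 'https://www.autovillage.co.uk/'

def build_links(hrefs, base=base_url):
    """Return absolute URLs for the given hrefs, in order, without duplicates."""
    seen = set()
    links = []
    for href in hrefs:
        url = urljoin(base, href)  # handles the leading '/' of the href correctly
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links

# Sample hrefs taken from the Task 3 output; the first one is repeated on purpose
sample = [
    '/used-car/ford/fiesta/mtz_48199_58792205',
    '/used-car/citroen/c3/mtz_43415_58923965',
    '/used-car/ford/fiesta/mtz_48199_58792205',
]
links = build_links(sample)
print(links)
```

Inside the loop, `productlinks = build_links(all_hrefs)` could then replace the manual `append`, where `all_hrefs` would be a list collected across pages (also a hypothetical name).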


<a id='task5'></a>
# Task 5: Extracting information of the first product

In [15]:
testlink = 'https://www.autovillage.co.uk/used-car/ford/fiesta/mtz_48199_58792205'
source = requests.get(testlink, headers = header)
soup = BeautifulSoup(source.content, 'lxml')

In [16]:
name = soup.find('div', class_ = 'fl pb10').text
print(name)

Ford Fiesta


In [61]:
model = soup.find('h1').text
print(model)

Used Ford Fiesta 1.1 Zetec 5dr ** Ford SYNC Satellite Navigation ** Manual in Staines


In [102]:
price_gbp = soup.find('div', class_ = 'text-right pb10').text.strip().replace('£','')
print(price_gbp)

9,400
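
The price is still a string containing a thousands separator, which would need converting before any numeric analysis. A minimal sketch, assuming prices are always whole pounds in the format shown above; `parse_price` is a hypothetical helper, not part of the original notebook.

```python
def parse_price(text):
    """Convert a price string such as '£9,400' to an integer number of pounds."""
    digits = text.replace('£', '').replace(',', '').strip()
    return int(digits)  # raises ValueError if the text is not a plain number

price_gbp = parse_price('£9,400')
print(price_gbp)
```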


In [89]:
result=[]
for div in soup.find('div', class_ = 'vehiclesummary').find_all('div', class_ = 'fl'):
    result.append(div.text)
print(result)


['Manual', 'Reg:\xa0LD67XHU', '1084cc Engine', '7,161 Miles', 'Hatchback', 'Petrol', '5 Doors', 'Shadow Black (Premium Colour)', '\r\n    2017\r\n                        ', '£9,400']


In [104]:
type_ = result[0]
registration = result[1].replace('Reg:','')
engine = result[2].replace('Engine','')
distance_miles = result[3].replace('Miles','')
rear_door_config = result[4]
fuel = result[5]
no_of_doors = result[6].replace('Doors','')
colour = result[7]
year = result[8].strip()
print(type_, registration,engine,distance_miles, rear_door_config,fuel,no_of_doors,colour,year)

Manual  LD67XHU 1084cc  7,161  Hatchback Petrol 5  Shadow Black (Premium Colour) 2017
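
The values above still carry stray whitespace (including the non-breaking space `\xa0` in the registration) and remain strings. A minimal sketch of a cleaning step, assuming the summary fields always arrive in the order shown in `result`; `first_int` and `clean_summary` are hypothetical helpers, and the `'Engine cc'` key is an illustrative name rather than one used in the notebook.

```python
import re

# The summary list exactly as printed in the previous cell
result = ['Manual', 'Reg:\xa0LD67XHU', '1084cc Engine', '7,161 Miles', 'Hatchback',
          'Petrol', '5 Doors', 'Shadow Black (Premium Colour)',
          '\r\n    2017\r\n                        ', '£9,400']

def first_int(text):
    """Return the first integer found in text, ignoring thousands separators, or None."""
    match = re.search(r'\d[\d,]*', text)
    return int(match.group().replace(',', '')) if match else None

def clean_summary(fields):
    """Map the positional summary fields to a dictionary of cleaned values."""
    # Turn non-breaking spaces into ordinary spaces, then trim each field
    fields = [f.replace('\xa0', ' ').strip() for f in fields]
    return {
        'Type': fields[0],
        'Registration': fields[1].replace('Reg:', '').strip(),
        'Engine cc': first_int(fields[2]),
        'Distance in miles': first_int(fields[3]),
        'Rear Door Configuration': fields[4],
        'Fuel': fields[5],
        'No of doors': first_int(fields[6]),
        'Colour': fields[7],
        'Year': first_int(fields[8]),
    }

cleaned = clean_summary(result)
print(cleaned)
```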


In [113]:
agent_name = soup.find('div', class_ = 'dName').text
print(agent_name)

TrustFord Staines


In [114]:
agent_contact = soup.find('a', class_ = 'tel')
con = agent_contact['href'].replace('tel:','')
print(con)

01217569048


In [117]:
car = {
    'Name': name,
    'Model':model,
    'Price in GBP': price_gbp,
    'Type': type_,
    'Registration':registration,
    'Engine': engine,
    'Distance in miles': distance_miles,
    'Rear Door Configuration': rear_door_config,
    'Fuel': fuel,
    'No of doors':no_of_doors,
    'Colour': colour,
    'Year': year,
    'Dealer name': agent_name,
    'Dealer_contact': con
}
print(car)

{'Name': 'Ford Fiesta', 'Model': 'Used Ford Fiesta 1.1 Zetec 5dr ** Ford SYNC Satellite Navigation ** Manual in Staines', 'Price in GBP': '9,400', 'Type': 'Manual', 'Registration': '\xa0LD67XHU', 'Engine': '1084cc ', 'Distance in miles': '7,161 ', 'Rear Door Configuration': 'Hatchback', 'Fuel': 'Petrol', 'No of doors': '5 ', 'Colour': 'Shadow Black (Premium Colour)', 'Year': '2017', 'Dealer name': 'TrustFord Staines', 'Dealer_contact': '01217569048'}


<a id='task6'></a>
# Task 6: Extracting information of all the products

In [119]:
carlist=[]
for link in productlinks:
    source = requests.get(link, headers = header)
    soup = BeautifulSoup(source.content, 'lxml')
    name = soup.find('div', class_ = 'fl pb10').text
    model = soup.find('h1').text
    price_gbp = soup.find('div', class_ = 'text-right pb10').text.strip().replace('£','')
    result=[]
    for div in soup.find('div', class_ = 'vehiclesummary').find_all('div', class_ = 'fl'):
        result.append(div.text)
    # Fall back to 'No info' when the summary has fewer fields than expected
    try:
        type_ = result[0]
    except IndexError:
        type_ = 'No info'
    try:
        registration = result[1].replace('Reg:','')
    except IndexError:
        registration = 'No info'
    try:
        engine = result[2].replace('Engine','')
    except IndexError:
        engine = 'No info'
    try:
        distance_miles = result[3].replace('Miles','')
    except IndexError:
        distance_miles = 'No info'
    try:
        rear_door_config = result[4]
    except IndexError:
        rear_door_config = 'No info'

    try:
        fuel = result[5]
    except IndexError:
        fuel = 'No info'

    try:
        no_of_doors = result[6].replace('Doors','')
    except IndexError:
        no_of_doors = 'No info'

    try:
        colour = result[7]
    except IndexError:
        colour = 'No info'

    try:
        year = result[8].strip()
    except IndexError:
        year = 'No info'
        
    agent_name = soup.find('div', class_ = 'dName').text
    agent_contact = soup.find('a', class_ = 'tel')
    con = agent_contact['href'].replace('tel:','')
    
    car = {
    'Name': name,
    'Model':model,
    'Price in GBP': price_gbp,
    'Type': type_,
    'Registration':registration,
    'Engine': engine,
    'Distance in miles': distance_miles,
    'Rear Door Configuration': rear_door_config,
    'Fuel': fuel,
    'No of doors':no_of_doors,
    'Colour': colour,
    'Year': year,
    'Dealer name': agent_name,
    'Dealer_contact': con
    }
    carlist.append(car)
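
The loop above guards the `result[...]` lookups, but the `soup.find(...)` calls are unguarded: `find` returns `None` when an element is missing, and accessing `.text` on `None` raises `AttributeError`. A minimal sketch of a guard, assuming a `'No info'` default is wanted for those fields as well; `safe_text` is a hypothetical helper, not part of the original notebook.

```python
def safe_text(node, default='No info'):
    """Return the stripped text of a BeautifulSoup tag, or default if find() returned None."""
    if node is None:
        return default
    return node.get_text(strip=True)

# A missing element falls back to the default instead of raising AttributeError
print(safe_text(None))
```

In the loop, a line such as `agent_name = safe_text(soup.find('div', class_ = 'dName'))` would then tolerate listings without a dealer name.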
    

  

In [122]:
df = pd.DataFrame(carlist)
df.head()

| | Name | Model | Price in GBP | Type | Registration | Engine | Distance in miles | Rear Door Configuration | Fuel | No of doors | Colour | Year | Dealer name | Dealer_contact |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Ford Fiesta | Used Ford Fiesta 1.1 Zetec 5dr ** Ford SYNC Sa... | 9400 | Manual | LD67XHU | 1084cc | 7161 | Hatchback | Petrol | 5 | Shadow Black (Premium Colour) | 2017 | TrustFord Staines | 1217569048 |
| 1 | Citroen C3 | Used Citroen C3 1.6 i 16v Exclusive 5dr in Sta... | 3290 | Manual | HV09WTZ | 1587cc | 41000 | Hatchback | Petrol | 5 | Silver | 2009 | Reeds Motors Ltd | 1784861954 |
| 2 | Ssangyong Korando | Used Ssangyong Korando 1.6D ULTIMATE AUTO 4X4 ... | 24495 | Automatic | RE69YDL | 1597cc | 6500 | Estate | Diesel | 5 | Dandy Blue | 2019 | Ian Allan Motors | 1344932582 |
| 3 | Volkswagen Golf | Used Volkswagen Golf 1.6 TDI SE 5dr DSG in Fel... | 13995 | Manual | RX67WZC | 1598cc | 12000 | Hatchback | Diesel | 5 | Red | 2017 | Hanworth Motors | 2081312540 |
| 4 | Volkswagen Polo | Used Volkswagen Polo 1.2 60 Match Edition 5dr ... | 7299 | Manual | NJ63NZC | 1198cc | 9800 | Hatchback | Petrol | 5 | Red | 2013 | Trust Motorcars Ltd | 2081312336 |
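
To keep the scraped records for the later analysis, they can be written to a CSV file. A minimal sketch using the standard library's `csv.DictWriter`, assuming every record in the list has the same keys; `cars_to_csv` is a hypothetical helper, and the sample record is a shortened version of the first car above. With pandas, `df.to_csv('cars.csv', index=False)` does the same job, where `cars.csv` is an illustrative filename.

```python
import csv
import io

def cars_to_csv(cars, fileobj):
    """Write a list of car dictionaries to fileobj as CSV with a header row."""
    if not cars:
        return  # nothing to write
    writer = csv.DictWriter(fileobj, fieldnames=list(cars[0].keys()))
    writer.writeheader()
    writer.writerows(cars)

# In-memory buffer for the demonstration; a real run would open a file instead
sample = [{'Name': 'Ford Fiesta', 'Price in GBP': '9,400', 'Year': '2017'}]
buffer = io.StringIO()
cars_to_csv(sample, buffer)
print(buffer.getvalue())
```

Values that contain a comma, such as `9,400`, are quoted automatically by the writer, so they survive a round trip through the file.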
