<h1 style='color:blue' align='center'>Scraping a Car Price Website</h1>

### Scraping four different links of a website:
- first link contains car prices for 2023 models.
- second link contains car prices for 2022 models.
- third link contains car prices for 2021 models.
- fourth link contains car prices for 2020 models.

### Details:
- cars' names.
- years.
- prices (in Kenyan shillings).

### Problem Statement:
- extract the details of all the cars from all four links and export them to a single CSV file.

#### 1. Importing Dependencies

In [1]:
# Importing all needed libraries

from bs4 import BeautifulSoup
import requests
from csv import writer
import pandas as pd

#### 2. Web Scraping

#### 2. a. Accessing the website, extracting data, writing to a CSV file (First Link)

In [2]:
source = requests.get('https://www.ccarprice.com/ke/2023-model-car-price-in-Kenya-86.php')  # request the page

soup = BeautifulSoup(source.text, 'html.parser')  # parse the HTML page

# Create the CSV file and write the header row
with open('cars_prices.csv', 'w', newline='', encoding='utf-8') as f:
    thewriter = writer(f)
    header = ['name', 'price']
    thewriter.writerow(header)

    cars = soup.find_all('div', class_='listing')  # find all <div> tags with the class "listing"

    # Loop over the listings and extract each car's details
    for car in cars:
        name = car.strong.text.strip('[]')[:-4]  # drop the square brackets and the trailing four-character year
        price = car.text.split()[-1]  # the last whitespace-separated token is the price
        print(name, price)

        # Write one row per car
        carsinfo = [name, price]
        thewriter.writerow(carsinfo)

Audi S1 Hoonitron  1,059,736,000
Bugatti W16 Mistral  695,000,000
Bugatti Chiron Super Sport 300 Plus  542,100,000
Lamborghini Sian Roadster  542,100,000
Bugatti Chiron Super Sport  535,150,000
Hennessey Venom F5 Roadster  417,000,000
Hennessey Venom F5  403,100,000
Rimac Nevera  347,500,000
Ferrari Daytona SP3  326,650,000
Bentley Batur  271,050,000
BMW 3.0 CSL Coupe  104,250,000
BMW 3.0 CSL  104,250,000
Audi PB18 E-Tron Concept  103,972,000
Lamborghini Aventador SVJ Roadster  79,780,440
Toyota Land Cruiser ZX Gasoline 3.5L  74,577,670
Lamborghini Aventador SVR Track-Only Edition  66,720,000
Rolls Royce Wraith  65,288,300
Rolls Royce Phantom  64,635,000
Rolls Royce Dawn  60,888,950
Ferrari 296 GTS  55,600,000
Rolls Royce Spectre  55,600,000
Ferrari 296 GTB  48,844,600
BMW i7 M70 xDrive  48,650,000
Rolls Royce Cullinan  48,441,500
Rolls Royce Ghost  47,468,500
Bentley Continental GT Speed Convertible  44,104,700
BMW M8 Coupe Competition  42,824,510
Lamborghini Huracan Evo  40,476,800
B
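The slice `[:-4]` in the cell above only works when every label ends with a four-digit year. A slightly more defensive sketch uses a regular expression instead. The helper names `split_label` and `parse_price` are hypothetical, and the sample label format is an assumption about how the site renders each listing.

```python
import re
from typing import Optional, Tuple


def split_label(label: str) -> Tuple[str, Optional[int]]:
    """Split a label such as '[Audi S1 Hoonitron 2023]' into (name, year)."""
    cleaned = label.strip().strip("[]").strip()
    # Look for a four-digit year (19xx or 20xx) at the very end of the label.
    match = re.search(r"\s+((?:19|20)\d{2})$", cleaned)
    if match is None:
        return cleaned, None  # no trailing year found; keep the label as-is
    return cleaned[: match.start()], int(match.group(1))


def parse_price(text: str) -> int:
    """Turn the last whitespace-separated token, e.g. '1,059,736,000', into an int."""
    token = text.split()[-1]
    return int(token.replace(",", ""))


# Assumed sample text, modelled on the output printed above.
print(split_label("[Audi S1 Hoonitron 2023]"))
print(parse_price("Audi S1 Hoonitron 2023 KSh 1,059,736,000"))
```

Returning `None` for a missing year keeps a malformed label from silently losing the last four characters of the car name.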

In [3]:
# Printing the dataset from the first link
df = pd.read_csv('cars_prices.csv')
df.head()

Unnamed: 0,name,price
0,Audi S1 Hoonitron,1059736000
1,Bugatti W16 Mistral,695000000
2,Bugatti Chiron Super Sport 300 Plus,542100000
3,Lamborghini Sian Roadster,542100000
4,Bugatti Chiron Super Sport,535150000
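The scraped prices are written with comma separators (for example `1,059,736,000`), so depending on how the file is read the `price` column may come back as text rather than numbers. A minimal sketch, using an in-memory sample instead of `cars_prices.csv`, shows the `thousands` option of `pd.read_csv`, which parses such values as integers:

```python
import io

import pandas as pd

# In-memory stand-in for the CSV written above; two rows taken from the printed output.
sample_csv = io.StringIO(
    'name,price\n'
    'Audi S1 Hoonitron,"1,059,736,000"\n'
    'Bugatti W16 Mistral,"695,000,000"\n'
)

# thousands="," tells pandas to treat the commas as digit-group separators.
prices = pd.read_csv(sample_csv, thousands=",")
print(prices.dtypes)
```

With the real file the call would be `pd.read_csv('cars_prices.csv', thousands=',')`.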


In [4]:
# Inserting a new column for the year
df.insert(1, "year", [2023]*len(df))
df.head()

Unnamed: 0,name,year,price
0,Audi S1 Hoonitron,2023,1059736000
1,Bugatti W16 Mistral,2023,695000000
2,Bugatti Chiron Super Sport 300 Plus,2023,542100000
3,Lamborghini Sian Roadster,2023,542100000
4,Bugatti Chiron Super Sport,2023,535150000
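As a side note, `DataFrame.insert` also accepts a single scalar, which pandas repeats for every row, so building the list `[2023]*len(df)` is optional. A small sketch on a stand-in frame (the name `cars` is hypothetical):

```python
import pandas as pd

# Stand-in for df, using two rows from the output above.
cars = pd.DataFrame(
    {
        "name": ["Audi S1 Hoonitron", "Bugatti W16 Mistral"],
        "price": [1059736000, 695000000],
    }
)

# A scalar value is broadcast to every row of the new column.
cars.insert(1, "year", 2023)
print(cars)
```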


In [5]:
# Checking the number of rows and columns
df.shape

(72, 3)

#### 2. b. Accessing the website, extracting data, writing to a CSV file (Second Link)

In [6]:
source = requests.get('https://www.ccarprice.com/ke/2022-model-car-price-in-Kenya-70.php')  # request the page

soup = BeautifulSoup(source.text, 'html.parser')  # parse the HTML page

# Create the CSV file and write the header row
with open('cars_prices2.csv', 'w', newline='', encoding='utf-8') as f:
    thewriter = writer(f)
    header = ['name', 'price']
    thewriter.writerow(header)

    cars = soup.find_all('div', class_='listing')  # find all <div> tags with the class "listing"

    # Loop over the listings and extract each car's details
    for car in cars:
        name = car.strong.text.strip('[]')[:-4]  # drop the square brackets and the trailing four-character year
        price = car.text.split()[-1]  # the last whitespace-separated token is the price
        print(name, price)

        # Write one row per car
        carsinfo = [name, price]
        thewriter.writerow(carsinfo)

Bugatti Divo  792,300,000
Bugatti W16 Mistral  690,604,820
Bugatti Chiron Super Sport 300 Plus  542,100,000
Bugatti Chiron Super Sport  535,150,000
Lamborghini Sian Roadster  514,300,000
Hennessey Venom F5 Roadster  417,000,000
Hennessey Venom F5  389,200,000
Rimac Nevera  333,600,000
Ferrari Daytona SP3  312,750,000
Mclaren Elva  275,220,000
Lamborghini Aventador SVJ Roadster  114,801,490
Rolls Royce Phantom  64,635,000
Rolls Royce Dawn  59,498,950
Mclaren 765LT Spider  52,542,000
Rolls Royce Cullinan  48,650,000
Rolls Royce Ghost  47,329,500
Ferrari 296 GTB  44,674,600
Bentley Continental GT Speed Convertible  42,033,600
Bentley Continental GT Speed  38,211,100
Lamborghini Huracan Evo  36,317,086
Tesla Roadster 720 MJ  34,750,000
Lamborghini Huracan Tecnica  33,221,000
Bentley Bentayga  30,552,200
Lamborghini Urus  30,302,000
Aston Martin DB11 V8 Coupe  28,578,400
Bentley Flying Spur Hybrid  28,356,000
Tesla Roadster 720 MJ Convertible  28,008,500
Porsche Cayenne E-Hybrid Coupe  23,3

In [7]:
# Printing the dataset from the second link
df2 = pd.read_csv('cars_prices2.csv')
df2.head()

Unnamed: 0,name,price
0,Bugatti Divo,792300000
1,Bugatti W16 Mistral,690604820
2,Bugatti Chiron Super Sport 300 Plus,542100000
3,Bugatti Chiron Super Sport,535150000
4,Lamborghini Sian Roadster,514300000


In [8]:
# Inserting a new column for the year
df2.insert(1, "year", [2022]*len(df2))
df2.head()

Unnamed: 0,name,year,price
0,Bugatti Divo,2022,792300000
1,Bugatti W16 Mistral,2022,690604820
2,Bugatti Chiron Super Sport 300 Plus,2022,542100000
3,Bugatti Chiron Super Sport,2022,535150000
4,Lamborghini Sian Roadster,2022,514300000


In [9]:
# Checking the number of rows and columns
df2.shape

(72, 3)

#### 2. c. Accessing the website, extracting data, writing to a CSV file (Third Link)

In [10]:
source = requests.get('https://www.ccarprice.com/ke/2021-model-car-price-in-Kenya-53.php')  # request the page

soup = BeautifulSoup(source.text, 'html.parser')  # parse the HTML page

# Create the CSV file and write the header row
with open('cars_prices3.csv', 'w', newline='', encoding='utf-8') as f:
    thewriter = writer(f)
    header = ['name', 'price']
    thewriter.writerow(header)

    cars = soup.find_all('div', class_='listing')  # find all <div> tags with the class "listing"

    # Loop over the listings and extract each car's details
    for car in cars:
        name = car.strong.text.strip('[]')[:-4]  # drop the square brackets and the trailing four-character year
        price = car.text.split()[-1]  # the last whitespace-separated token is the price
        print(name, price)

        # Write one row per car
        carsinfo = [name, price]
        thewriter.writerow(carsinfo)

Lamborghini Sian Roadster  514,300,000
Bugatti Chiron Pur Sport  493,450,000
Mclaren Sabre  486,500,000
Lamborghini Sian Roadster Hybrid  458,370,570
Mclaren Elva  254,370,000
Rolls Royce Phantom  67,797,250
Lamborghini Aventador SVR Track-Only Edition  62,550,000
Rolls Royce Dawn  49,553,500
Rolls Royce Cullinan  45,870,000
Rolls Royce Ghost  43,354,100
Lamborghini Urus Performante  30,302,000
Lamborghini Urus  30,302,000
Lamborghini Huracan Evo  28,991,230
Bentley Bentayga  24,603,000
Bentley Bentayga Hybrid  22,240,000
Porsche Cayenne Turbo  17,583,500
Bollinger B2  17,375,000
Tesla Model X Plaid  16,678,610
Tesla Model S Plaid  16,678,610
Jaguar F-Type R Convertible  14,720,100
Jaguar F-Type R Coupe  14,344,800
Tesla Model X Performance  13,898,610
Tesla Model S Performance  12,786,610
Lincoln Navigator L Reserve 4x4  12,212,540
Porsche Cayenne E-Hybrid  11,370,200
Dodge Durango SRT Hellcat  11,257,610
Audi RS5 Sportback  10,480,600
Audi RS5 Coupe  10,438,900
Ford Mustang Shelby GT

In [11]:
# Printing the dataset from the third link
df3 = pd.read_csv('cars_prices3.csv')
df3.head()

Unnamed: 0,name,price
0,Lamborghini Sian Roadster,514300000
1,Bugatti Chiron Pur Sport,493450000
2,Mclaren Sabre,486500000
3,Lamborghini Sian Roadster Hybrid,458370570
4,Mclaren Elva,254370000


In [12]:
# Inserting a new column for the year
df3.insert(1, "year", [2021]*len(df3))
df3.head()

Unnamed: 0,name,year,price
0,Lamborghini Sian Roadster,2021,514300000
1,Bugatti Chiron Pur Sport,2021,493450000
2,Mclaren Sabre,2021,486500000
3,Lamborghini Sian Roadster Hybrid,2021,458370570
4,Mclaren Elva,2021,254370000


In [13]:
# Checking the number of rows and columns
df3.shape

(50, 3)

#### 2. d. Accessing the website, extracting data, writing to a CSV file (Fourth Link)

In [14]:
source = requests.get('https://www.ccarprice.com/ke/2020-model-car-price-in-Kenya-51.php')  # request the page

soup = BeautifulSoup(source.text, 'html.parser')  # parse the HTML page

# Create the CSV file and write the header row
with open('cars_prices4.csv', 'w', newline='', encoding='utf-8') as f:
    thewriter = writer(f)
    header = ['name', 'price']
    thewriter.writerow(header)

    cars = soup.find_all('div', class_='listing')  # find all <div> tags with the class "listing"

    # Loop over the listings and extract each car's details
    for car in cars:
        name = car.strong.text.strip('[]')[:-4]  # drop the square brackets and the trailing four-character year
        price = car.text.split()[-1]  # the last whitespace-separated token is the price
        print(name, price)

        # Write one row per car
        carsinfo = [name, price]
        thewriter.writerow(carsinfo)

Bugatti Divo  806,200,000
Bugatti Chiron Super Sport 300 Plus  542,100,000
Lamborghini Sian  514,300,000
Mclaren Speedtail  309,970,000
Lamborghini Aventador SVJ Roadster  79,780,440
Lamborghini Aventador SVJ 2 71,585,000
Lamborghini Aventador S Roadster  64,974,160
Rolls Royce Wraith  63,909,420
Rolls Royce Phantom  63,245,000
Rolls Royce Cullinan  53,278,700
Rolls Royce Dawn  48,135,700
Rolls Royce Wraith  44,549,500
Rolls Royce Ghost  43,785,000
Lamborghini Huracan Evo Spyder RWD  39,948,600
Lamborghini Huracan EVO  36,316,530
Bentley Continental GT V8 First Edition  34,786,140
Lamborghini Huracan Evo Spyder  34,388,600
Bentley Continental GT V8 Convertible  30,955,300
Lamborghini Urus  28,354,610
Bentley Bentayga  23,393,700
Bentley Bentayga Hybrid  22,240,000
Porsche Cayenne Turbo Coupe AWD  18,083,900
Tesla Model X Performance  14,593,610
Tesla Model S Performance  13,898,610
Audi RS5 Sportback  12,871,400
Audi RS5 Coupe  12,857,500
Lincoln Navigator L Reserve 4x4  12,124,970
Por

In [15]:
# Printing the dataset from the fourth link
df4 = pd.read_csv('cars_prices4.csv')
df4.head()

Unnamed: 0,name,price
0,Bugatti Divo,806200000
1,Bugatti Chiron Super Sport 300 Plus,542100000
2,Lamborghini Sian,514300000
3,Mclaren Speedtail,309970000
4,Lamborghini Aventador SVJ Roadster,79780440


In [16]:
# Inserting a new column for the year
df4.insert(1, "year", [2020]*len(df4))
df4.head()

Unnamed: 0,name,year,price
0,Bugatti Divo,2020,806200000
1,Bugatti Chiron Super Sport 300 Plus,2020,542100000
2,Lamborghini Sian,2020,514300000
3,Mclaren Speedtail,2020,309970000
4,Lamborghini Aventador SVJ Roadster,2020,79780440


In [17]:
# Checking the number of rows and columns
df4.shape

(50, 3)
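The four scraping cells above differ only in the URL, the output file name, and the year. One possible way to remove that repetition is sketched below. The names `YEAR_URLS`, `parse_listings`, and `scrape_all` are hypothetical, the 30-second timeout is an arbitrary choice, and the sample HTML in the last lines is an assumed shape of one listing, not a copy of the real page. Unlike the cells above, this sketch strips the trailing space from each name and skips the intermediate CSV files.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup

# URLs taken from the cells above, keyed by model year.
YEAR_URLS = {
    2023: "https://www.ccarprice.com/ke/2023-model-car-price-in-Kenya-86.php",
    2022: "https://www.ccarprice.com/ke/2022-model-car-price-in-Kenya-70.php",
    2021: "https://www.ccarprice.com/ke/2021-model-car-price-in-Kenya-53.php",
    2020: "https://www.ccarprice.com/ke/2020-model-car-price-in-Kenya-51.php",
}


def parse_listings(html, year):
    """Extract name, year, and price from every <div class="listing"> in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for car in soup.find_all("div", class_="listing"):
        if car.strong is None:
            continue  # skip listings without a <strong> label
        name = car.strong.text.strip("[]")[:-4].strip()  # same slicing as above, plus strip()
        price = car.text.split()[-1]
        rows.append({"name": name, "year": year, "price": price})
    return rows


def scrape_all(year_urls=YEAR_URLS):
    """Download every page and return all listings as one DataFrame."""
    rows = []
    for year, url in year_urls.items():
        response = requests.get(url, timeout=30)
        response.raise_for_status()  # stop early on HTTP errors
        rows.extend(parse_listings(response.text, year))
    return pd.DataFrame(rows, columns=["name", "year", "price"])


# Offline check with an assumed listing; no network access is needed for this part.
sample_html = '<div class="listing"><strong>[Audi S1 Hoonitron 2023]</strong> KSh 1,059,736,000</div>'
print(parse_listings(sample_html, 2023))
```

Calling `scrape_all()` would fetch the four pages and return a single frame with the year already filled in, which makes the separate `insert` and `concat` steps unnecessary.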

In [18]:
# Merging all datasets into a single dataset
my_df = pd.concat([df, df2, df3, df4], ignore_index=True)
my_df

Unnamed: 0,name,year,price
0,Audi S1 Hoonitron,2023,1059736000
1,Bugatti W16 Mistral,2023,695000000
2,Bugatti Chiron Super Sport 300 Plus,2023,542100000
3,Lamborghini Sian Roadster,2023,542100000
4,Bugatti Chiron Super Sport,2023,535150000
...,...,...,...
239,Volkswagen Tiguan 2.0T S,2020,3461100
240,Subaru Legacy,2020,3160860
241,Subaru Crosstrek,2020,3077460
242,Fiat 500L Pop Hatch,2020,3056610
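The 2020 output above lists `Rolls Royce Wraith` twice with different prices, so the merged data may contain repeated name and year pairs. A quick way to surface such rows is sketched here on a three-row stand-in frame (the name `sample` is hypothetical); on the real data the same call would be applied to `my_df`.

```python
import pandas as pd

# Stand-in rows copied from the 2020 output above.
sample = pd.DataFrame(
    {
        "name": ["Rolls Royce Wraith", "Rolls Royce Phantom", "Rolls Royce Wraith"],
        "year": [2020, 2020, 2020],
        "price": [63909420, 63245000, 44549500],
    }
)

# keep=False marks every member of a duplicated (name, year) group, not just the later ones.
repeated = sample[sample.duplicated(subset=["name", "year"], keep=False)]
print(repeated)
```

Whether such rows are true duplicates or distinct trims whose labels were shortened is a judgment call that needs a look at the source page.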


In [19]:
# Checking the number of rows and columns
my_df.shape

(244, 3)
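The problem statement asks for a single CSV file, and the remaining step is to write the merged frame to disk. A minimal sketch follows, using a two-row stand-in for `my_df`; the file name `all_cars_prices.csv` is an assumption.

```python
import pandas as pd

# Stand-in for my_df, using two rows from the outputs above.
combined = pd.DataFrame(
    {
        "name": ["Audi S1 Hoonitron", "Bugatti Divo"],
        "year": [2023, 2020],
        "price": [1059736000, 806200000],
    }
)

# index=False keeps the row index out of the file, so no "Unnamed: 0" column appears on reload.
combined.to_csv("all_cars_prices.csv", index=False)

# Read the file back to confirm the columns survived the round trip.
check = pd.read_csv("all_cars_prices.csv")
print(check)
```

With the real data the call would be `my_df.to_csv('all_cars_prices.csv', index=False)`.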