# Web Scraping a Motorbike Website

- We are going to scrape https://www.carandbike.com/new-bikes
- We will extract details about each motorbike.
- We will create a CSV file from details such as name, price, engine displacement (cc), mileage, weight, and link.


In [2]:
# Import requests to download web pages, BeautifulSoup to parse the HTML, and pandas to build tables.
import requests
from bs4 import BeautifulSoup as bs
import pandas as pd

In [3]:
project_url= 'https://www.carandbike.com/new-bikes/models'

In [4]:
response= requests.get(project_url)

In [5]:
len(response.text)

493435

In [6]:
response.status_code

200
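
The request above assumes the server answers promptly and successfully. As a minimal sketch, the download step could be wrapped so that a slow or failed request raises an error instead of passing silently. The `fetch_html` name, the 10-second timeout, and the injectable `get` parameter are illustrative assumptions, not part of the original notebook.

```python
import requests


def fetch_html(url, get=requests.get, timeout=10):
    """Return the page body as text, raising if the request fails.

    `get` defaults to requests.get; it is a parameter only so the
    function can be exercised without network access.
    """
    response = get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx/5xx responses.
    response.raise_for_status()
    return response.text
```

With this helper, `content = fetch_html(project_url)` would replace the separate request and status-check cells.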

In [7]:
content= response.text

In [8]:
content[:1000]

'<!DOCTYPE html><html lang="en"><head><meta charSet="utf-8"/><meta name="viewport" content="width=device-width"/><title>New Bikes Prices List, Latest Models, Reviews &amp; News India 2022</title><link rel="shortcut icon" href="/favicons/favicon.png"/><link rel="apple-touch-icon" sizes="180x180" href="/favicons/apple-favicon.png"/><link rel="icon" type="image/png" sizes="32x32" href="/favicons/task-favicon.png"/><link href="https://images.carandbike.com" rel="dns-prefetch"/><link href="www.googletagmanager.com" rel="dns-prefetch"/><meta name="description" content="Get details about all new bike models in India. Visit us and find new bike models, bike prices, bike features, bike comparison of different variants in India."/><meta name="keywords" content="new bikes, new bikes india, bike purchase, buy new bike, new model bike, best bike to buy, purchase bikes, purchase vehicles, buy new bikes, best new bikes to buy, best new vehicles buy india"/><meta property="og:site_name" content="caran

In [9]:
doc= bs(content,"html.parser")

In [10]:
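# Passing the full class string matches only tags whose class attribute is exactly this value.
# If nothing matches, find_all returns an empty list and the [0] lookup below raises IndexError.
bike_url_tags = doc.find_all('a', {'class': 'newmodel-bike__link h__block h__mb10 js-cty-url-new'})
bike_url_tags[0]['href']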

IndexError: ignored
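
The `IndexError` means the search returned an empty list, so there was no first element to index. One possible cause is that the class attribute on the live page no longer equals the exact string being searched for. The sketch below uses made-up markup (the real page may differ) to show a more tolerant lookup with a CSS selector on a single class, plus a guard for the empty case.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration; the real page may differ.
sample_html = """
<a class="newmodel-bike__link h__block" href="/bike-a">Bike A</a>
<a class="h__block newmodel-bike__link extra" href="/bike-b">Bike B</a>
"""

sample_doc = BeautifulSoup(sample_html, "html.parser")

# Exact-string matching on the class attribute finds nothing here, because
# neither tag's class attribute equals this full string.
exact = sample_doc.find_all(
    "a", {"class": "newmodel-bike__link h__block h__mb10 js-cty-url-new"}
)

# A CSS selector on one class matches regardless of other classes or their order.
links = sample_doc.select("a.newmodel-bike__link")

# Guard against an empty result instead of indexing blindly.
first_href = links[0]["href"] if links else None
hrefs = [a["href"] for a in links]
```

If the selector still returns nothing on the real page, the content may be loaded by JavaScript after the initial HTML, in which case `requests` alone will not see it.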

In [None]:
bike_name_tags= doc.find_all('figcaption')
bike_name_tags[0].h2.text

In [None]:
bike_price_tags= doc.find_all('div',{'class':'newmodel-bike__price h__truncate'})
bike_price_tags[0].text.strip()

In [None]:
# Collect the spec items (one list of strings per bike) from each spec block.
other_details = []
other_info = doc.find_all('ul', {'class': 'newmodel-bike__spec grid-flex grid-am h__mb15'})
for spec_list in other_info:
    other_details.append([item.text for item in spec_list.find_all('li', {'class': 'newmodel-bike__spec-col'})])
other_details
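
Each inner list is later indexed by position, so a bike with fewer spec items than expected would raise an `IndexError`. A small sketch of one way to guard against that, assuming three spec fields per bike and `'NA'` as the filler value (both are assumptions, and `pad_specs` is a hypothetical helper):

```python
def pad_specs(specs, size=3, fill="NA"):
    """Return a copy of `specs` with exactly `size` items.

    Short lists are padded at the end with `fill`; extra items are dropped.
    """
    trimmed = list(specs[:size])
    return trimmed + [fill] * (size - len(trimmed))


# Made-up spec lists, not real scraped values.
raw_details = [["110 cc", "60 kmpl", "100 kg"], ["NA", "120 kg"], []]
padded = [pad_specs(row) for row in raw_details]
```

Padding at the end only keeps the code from crashing. It cannot tell which field was missing, so padded rows are worth inspecting by hand.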

In [None]:
bike_details=[]
for url, name, price, mis in zip(bike_url_tags, bike_name_tags, bike_price_tags, other_details):
    # Spec items are assumed to follow the order used later in form_dict: CC, mileage, weight.
    bike_details.append({'Model Name': name.h2.text,
                         'Link': url['href'],
                         'Price': price.text.strip(),
                         'CC': mis[0],
                         'Mileage': mis[1],
                         'Weight': mis[2]})

In [None]:
bike_details[:5],len(bike_details)

In [None]:
# Let's create a DataFrame from the list of dictionaries.
df = pd.DataFrame(bike_details, columns=['Model Name', 'Price', 'CC', 'Mileage', 'Weight', 'Link'])
df
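
Building rows with `zip` over separate tag lists relies on every list having one entry per bike. `zip` stops at the shortest input without warning, so a single missing price would silently drop or misalign rows. A minimal sketch of a length check, where `zip_checked` is a hypothetical helper and the values are made up:

```python
def zip_checked(*sequences):
    """Zip sequences of equal length; raise ValueError if lengths differ."""
    lengths = {len(seq) for seq in sequences}
    if len(lengths) > 1:
        raise ValueError(f"length mismatch: {[len(seq) for seq in sequences]}")
    return list(zip(*sequences))


# Made-up values for illustration.
names = ["Bike A", "Bike B", "Bike C"]
prices = ["price 1", "price 2"]  # one price is missing

silently_truncated = list(zip(names, prices))  # the third bike is dropped

try:
    zip_checked(names, prices)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

On Python 3.10 and later, `zip(..., strict=True)` performs the same check.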

#### Let's create functions that combine these steps so we can run them on multiple pages

In [None]:
def get_page_contents(url):
    """
    Download the page at `url` and return it as a parsed BeautifulSoup document.
    Returns None if the response status code is not 200.
    """
    response = requests.get(url)

    if response.status_code != 200:
        return None
    return bs(response.text, 'html.parser')

def get_tags(doc):
    """
    Take a parsed HTML page and return the name tags, price tags,
    spec text lists, and link tags for the bikes on that page.
    """
    name_tags = doc.find_all('figcaption')

    url_tags = doc.find_all('a', {'class': 'newmodel-bike__link h__block h__mb10 js-cty-url-new'})

    price_tags = doc.find_all('div', {'class': 'newmodel-bike__price h__truncate'})

    other_details = []
    mis_info = doc.find_all('ul', {'class': 'newmodel-bike__spec grid-flex grid-am h__mb15'})
    for spec_list in mis_info:
        other_details.append([item.text for item in spec_list.find_all('li', {'class': 'newmodel-bike__spec-col'})])

    return name_tags, price_tags, other_details, url_tags

def form_dict(name_tags,price_tags,other_details,url_tags):
    """
    Take the different tags and build a list of dictionaries, one per bike.
    """

    d=[]
    
    for name,price,mis, url in zip(name_tags, price_tags, other_details, url_tags):
        d.append({'Model Name':name.h2.text,
                 'Price':price.text.strip(),
                  'CC':mis[0],
                 'Mileage':mis[1],
                 'Weight':mis[2],
                 'Links':url['href']})
    return d

def create_df(*args):
    return pd.DataFrame(*args)


# Let's get the details from 300 pages of bikes.

base_url = 'https://www.carandbike.com/new-bikes/models'

final_list = []
for i in range(300):
    doc = get_page_contents(base_url + '/' + str(i))
    if doc is None:
        # The request failed, so there is nothing to parse for this page.
        continue
    name, price, other, url = get_tags(doc)
    final_list += form_dict(name, price, other, url)

df = create_df(final_list)
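
The loop above sends 300 requests back to back and keeps going even when a page has no bikes. A hedged sketch of a gentler loop follows: it pauses between requests and stops at the first empty page. The `scrape_pages` name, the one-second delay, and the "empty page means the end" rule are all assumptions, and `fetch_rows` stands in for whatever function turns a page number into a list of row dictionaries.

```python
import time


def scrape_pages(page_numbers, fetch_rows, delay_seconds=1.0):
    """Collect rows page by page, pausing between requests.

    `fetch_rows(page_number)` is a caller-supplied function returning a list
    of row dictionaries. Scraping stops at the first page that returns no rows.
    """
    rows = []
    for number in page_numbers:
        page_rows = fetch_rows(number)
        if not page_rows:
            break  # assume an empty page means there are no further pages
        rows.extend(page_rows)
        time.sleep(delay_seconds)  # be gentle with the server
    return rows
```

In this notebook, `fetch_rows` could be a small wrapper that calls `get_page_contents`, `get_tags`, and `form_dict` in turn and returns an empty list when the page cannot be fetched.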

In [None]:
df.head(11)

In [None]:
len(df)

In [None]:
df_copy= df.copy()

In [None]:
# Electric bikes have no engine displacement (CC) because they run on batteries.
electric_bikes = df_copy[df_copy['CC'] == 'NA'].reset_index(drop=True)
electric_bikes

In [None]:
electric_bikes.shape

In [None]:
# Save the DataFrames to CSV files.
electric_bikes.to_csv('electric_bikes.csv')
df.to_csv('bikes.csv')

In [None]:
electric= pd.read_csv('electric_bikes.csv',index_col=0)
electric.head()

In [None]:
all_bikes= pd.read_csv('bikes.csv',index_col=0)
all_bikes.head(21)
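
One thing to watch when reading the files back: by default `pd.read_csv` treats the literal text `NA` as a missing value, so a filter such as `CC == 'NA'` that works on the scraped DataFrame will match nothing after a round trip through CSV. The sketch below shows the difference on a made-up two-row sample; `keep_default_na=False` keeps such strings as text.

```python
import io

import pandas as pd

# Hypothetical two-row sample standing in for the scraped data.
sample = pd.DataFrame({'Model Name': ['Bike A', 'Bike B'],
                       'CC': ['NA', '125 cc']})

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
sample.to_csv(buffer)
csv_text = buffer.getvalue()

# Default parsing converts the text 'NA' to NaN.
default_read = pd.read_csv(io.StringIO(csv_text), index_col=0)

# keep_default_na=False leaves 'NA' as an ordinary string.
literal_read = pd.read_csv(io.StringIO(csv_text), index_col=0,
                           keep_default_na=False)

n_default = int((default_read['CC'] == 'NA').sum())
n_literal = int((literal_read['CC'] == 'NA').sum())
```

After a default read, `isna()` is the right test for the electric bikes rather than a comparison with `'NA'`.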

#### Conclusions:
- As we can see, there are many bikes in the Indian market.
- We have extracted the basic details of each bike: model name, price, cc, mileage, weight, and link.
- The scraped data lists more than 3500 bikes in India, of which roughly 50-60 are electric.

#### Future Work:

- Extract more details such as ratings, available colours, brakes, fuel tank capacity, on-road price, max power, max torque, and features list.