## Web Scraping and Analysis

### Introduction 
British Airways (BA) is the flag carrier airline of the United Kingdom (UK). Every day, thousands of BA flights arrive in and depart from the UK, carrying customers across the world. Whether customers travel for holidays, work or any other reason, the end-to-end process of scheduling, planning, boarding, fuelling, transporting, landing, and continuously running flights on time, efficiently and with top-class customer service is a huge task with many highly important responsibilities.

As a data scientist at British Airways, it will be your job to apply your analytical skills to influence real-life, multi-million-pound decisions from day one, making a tangible impact on the business as your recommendations, tools and models drive key business decisions, reduce costs and increase revenue.

Customers who book a flight with BA will experience many interaction points with the BA brand. Understanding a customer's feelings, needs, and feedback is crucial for any business, including British Airways. The steps taken are:


### 1. Scraping data from Skytrax

For this task, we will scrape reviews from the [British Airways Airline data](https://www.airlinequality.com/airline-reviews/british-airways) on Skytrax. `Python` and `BeautifulSoup` will be used to request each page of the paginated review listing and then to collect the text of the individual reviews on each page.

### 2. Analyse the data
Once we have the dataset, we will prepare it. The data is very messy and contains purely text, so we will need to clean it before it can be analysed. When the data is clean, we should perform several analyses to uncover some insights.
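
As a rough illustration of what the cleaning step could look like, the sketch below strips the verification prefix, lowercases the text and removes everything except letters. The helper name `clean_review` is hypothetical, and the sketch assumes that the first `|` in a review separates the verification status from the review body, as in the scraped rows; a review that has no prefix but contains a `|` would lose text under this assumption.

```python
import re


def clean_review(raw: str) -> str:
    """Return a lowercased, letters-only version of one scraped review.

    Hypothetical helper: assumes the first "|" separates the verification
    status (e.g. "Trip Verified") from the review body.
    """
    # Drop the verification prefix (everything up to the first "|"), if present.
    _, sep, body = raw.partition("|")
    text = body if sep else raw
    # Lowercase, then replace anything that is not a letter or whitespace.
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    # Collapse the runs of whitespace left behind by the substitution.
    return re.sub(r"\s+", " ", text).strip()


sample = "✅ Trip Verified | Glasgow to London delayed by 2 hours!"
print(clean_review(sample))  # glasgow to london delayed by hours
```

Removing digits and punctuation is only one possible choice; an analysis that cares about seat layouts or delay lengths would want to keep the numbers.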

### 3. Present insights
We have been asked by the manager to summarise our findings in a single PowerPoint slide, so that they can present the results at the next board meeting. We will create visualisations and metrics to include in this slide, along with clear and concise explanations that quickly convey the key points from our analysis.


In [8]:
# Import the required libraries
import requests
from bs4 import BeautifulSoup
import pandas as pd

In [9]:
base_url = "https://www.airlinequality.com/airline-reviews/british-airways"
pages = 10
page_size = 100

reviews = []

# Request each page of the review listing in turn and collect the review text
for i in range(1, pages + 1):

    print(f"Scraping page {i}")

    # Create URL to collect links from paginated data
    url = f"{base_url}/page/{i}/?sortby=post_date%3ADesc&pagesize={page_size}"

    # Collect HTML data from this page (the 30-second timeout stops a stalled request from hanging the loop)
    response = requests.get(url, timeout=30)

    # Parse content
    content = response.content
    parsed_content = BeautifulSoup(content, 'html.parser')
    for para in parsed_content.find_all("div", {"class": "text_content"}):
        reviews.append(para.get_text())
    
    print(f"   ---> {len(reviews)} total reviews")

Scraping page 1
   ---> 100 total reviews
Scraping page 2
   ---> 200 total reviews
Scraping page 3
   ---> 300 total reviews
Scraping page 4
   ---> 400 total reviews
Scraping page 5
   ---> 500 total reviews
Scraping page 6
   ---> 600 total reviews
Scraping page 7
   ---> 700 total reviews
Scraping page 8
   ---> 800 total reviews
Scraping page 9
   ---> 900 total reviews
Scraping page 10
   ---> 1000 total reviews
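
The paginated URL in the loop above is assembled by hand, with the `:` in `post_date:Desc` already percent-encoded as `%3A`. As a small sketch, the same URL can be built with the standard library's `urlencode`, which performs that encoding for us. The function name `build_page_url` is hypothetical and is not part of the original notebook.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.airlinequality.com/airline-reviews/british-airways"


def build_page_url(page: int, page_size: int = 100, base_url: str = BASE_URL) -> str:
    """Build the URL for one page of the review listing (hypothetical helper)."""
    # urlencode percent-encodes the ":" in the sort key as %3A.
    query = urlencode({"sortby": "post_date:Desc", "pagesize": page_size})
    return f"{base_url}/page/{page}/?{query}"


print(build_page_url(1))
# https://www.airlinequality.com/airline-reviews/british-airways/page/1/?sortby=post_date%3ADesc&pagesize=100
```

Keeping the query parameters in a dictionary makes it easier to change the page size or sort order later without editing an encoded string.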


In [13]:
# Build a DataFrame from the scraped reviews
df = pd.DataFrame()
df["reviews"] = reviews

In [14]:
# Display the first 15 reviews
df.head(15)

Unnamed: 0,reviews
0,Not Verified | Seat horribly narrow; 3-4-3 on...
1,✅ Trip Verified | Glasgow to London delayed b...
2,✅ Trip Verified | When I tried to check in on...
3,✅ Trip Verified | I flew from Prague to LHR. ...
4,✅ Trip Verified | Disappointing again especia...
5,✅ Trip Verified | During both the outbound an...
6,✅ Trip Verified | I was flying to Warsaw for ...
7,✅ Trip Verified | Booked a BA holiday to Marr...
8,✅ Trip Verified | Extremely sub-par service. H...
9,✅ Trip Verified | I virtually gave up on Brit...


In [16]:
# Display the last 15 reviews
df.tail(15)

Unnamed: 0,reviews
985,✅ Trip Verified | Tampa to Gatwick. I am a di...
986,✅ Trip Verified | Heathrow to Keflavik. I had...
987,✅ Trip Verified | London to Muscat first clas...
988,✅ Trip Verified | My family and I travelled f...
989,✅ Trip Verified | Gatwick to Madeira. The fli...
990,✅ Trip Verified | London to Casablanca. Their ...
991,✅ Trip Verified | British Airways flight manag...
992,✅ Trip Verified | Hyderabad to Brussels via Lo...
993,✅ Trip Verified | London Gatwick to Fort Laude...
994,✅ Trip Verified | Milan to London Heathrow. T...
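
The rows above show that each review begins with a verification marker such as `Trip Verified` or `Not Verified`. As one example of a simple metric for the analysis, the sketch below counts how many reviews carry each marker. The function name `verification_counts` is hypothetical, the sample list is a shortened stand-in for the scraped `reviews` list, and the sketch assumes the marker is whatever precedes the first `|`.

```python
from collections import Counter


def verification_counts(reviews):
    """Count reviews per verification marker (hypothetical helper)."""
    counts = Counter()
    for raw in reviews:
        status, sep, _ = raw.partition("|")
        # Reviews without a "|" have no recognisable marker.
        label = status.replace("✅", "").strip() if sep else "Unknown"
        counts[label] += 1
    return counts


# Shortened stand-ins for scraped reviews, for illustration only.
sample_reviews = [
    "Not Verified | Seat horribly narrow",
    "✅ Trip Verified | Glasgow to London delayed",
    "✅ Trip Verified | I flew from Prague to LHR",
]
counts = verification_counts(sample_reviews)
share = counts["Trip Verified"] / len(sample_reviews)
print(counts["Trip Verified"], counts["Not Verified"], f"{share:.0%}")
```

On the full dataset, the same call would take the `reviews` list (or `df["reviews"]`) in place of `sample_reviews`.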
