# Code Challenge - Task 1: Web scraping to gain company insights

***
*Author: Kadriye Nur Bakirci*
***
Contact regarding the code: nur.bakirci@gmail.com
***

# Task 1

---

## Web scraping and analysis

This Jupyter notebook contains code to get started with web scraping. We use the `requests` package to download web pages and `BeautifulSoup` to parse the HTML and extract the data. Once the data has been collected and saved to a local `.csv` file, we can start the analysis.

### Scraping data from Skytrax

If you visit [Skytrax](https://www.airlinequality.com), you can see that it hosts a large amount of airline data. For this task, we are only interested in reviews of British Airways itself.

The British Airways reviews are listed at <https://www.airlinequality.com/airline-reviews/british-airways>. We can use Python's `requests` library to download each paginated listing page and `BeautifulSoup` to extract the text of every review shown on it, which is what the scraping cell below does.
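The scraping loop relies on each review body sitting inside a `<div class="text_content">` element. The sketch below runs the same `find_all` call against a small, hand-written HTML string so it can be tried offline. The markup (including the `<h2>` headers) is a hypothetical simplification, not a copy of the real Skytrax page, and `extract_reviews` is a hypothetical helper name. Unlike the notebook's loop, it calls `.strip()` on each result to drop surrounding whitespace.

```python
from bs4 import BeautifulSoup

# Hypothetical, heavily simplified markup: only the <div class="text_content">
# elements matter here; the <h2> headers show that other tags are ignored.
sample_html = """
<article>
  <h2>"Comfortable flight"</h2>
  <div class="text_content">✅ Trip Verified | Crew were friendly.</div>
</article>
<article>
  <h2>"Long delay"</h2>
  <div class="text_content">Not Verified | We waited two hours.</div>
</article>
"""


def extract_reviews(html: str) -> list:
    """Return the stripped text of every <div class="text_content"> element."""
    soup = BeautifulSoup(html, "html.parser")
    return [div.get_text().strip() for div in soup.find_all("div", {"class": "text_content"})]


print(extract_reviews(sample_html))
```

Testing the selector on a static string like this is a quick way to check it before sending dozens of requests to the live site.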

In [3]:
# Import necessary libraries
import requests
from bs4 import BeautifulSoup
import pandas as pd

# Configure how DataFrames are displayed in the notebook
pd.set_option('display.max_columns', 500)
pd.set_option('display.max_rows', 1000)
pd.set_option('display.max_colwidth', None)  # show the full review text

In [2]:
# Scraping parameters
base_url = "https://www.airlinequality.com/airline-reviews/british-airways"
pages = 38        # number of listing pages to request
page_size = 100   # reviews per listing page

reviews = []

# Loop over the paginated listing pages and extract the review text from each one
for i in range(1, pages + 1):

    print(f"Scraping page {i}")

    # Build the URL of page i, sorted by post date (newest first), page_size reviews per page
    url = f"{base_url}/page/{i}/?sortby=post_date%3ADesc&pagesize={page_size}"

    # Download the HTML of this page; stop on HTTP errors instead of parsing an error page
    response = requests.get(url, timeout=30)
    response.raise_for_status()

    # Parse the HTML and collect the text of every review body
    parsed_content = BeautifulSoup(response.content, 'html.parser')
    for para in parsed_content.find_all("div", {"class": "text_content"}):
        reviews.append(para.get_text())

    print(f"   ---> {len(reviews)} total reviews")

Scraping page 1
   ---> 100 total reviews
Scraping page 2
   ---> 200 total reviews
Scraping page 3
   ---> 300 total reviews
Scraping page 4
   ---> 400 total reviews
Scraping page 5
   ---> 500 total reviews
Scraping page 6
   ---> 600 total reviews
Scraping page 7
   ---> 700 total reviews
Scraping page 8
   ---> 800 total reviews
Scraping page 9
   ---> 900 total reviews
Scraping page 10
   ---> 1000 total reviews
Scraping page 11
   ---> 1100 total reviews
Scraping page 12
   ---> 1200 total reviews
Scraping page 13
   ---> 1300 total reviews
Scraping page 14
   ---> 1400 total reviews
Scraping page 15
   ---> 1500 total reviews
Scraping page 16
   ---> 1600 total reviews
Scraping page 17
   ---> 1700 total reviews
Scraping page 18
   ---> 1800 total reviews
Scraping page 19
   ---> 1900 total reviews
Scraping page 20
   ---> 2000 total reviews
Scraping page 21
   ---> 2100 total reviews
Scraping page 22
   ---> 2200 total reviews
Scraping page 23
   ---> 2300 total reviews
...

In [4]:
# Create a DataFrame with one review per row
df = pd.DataFrame({"reviews": reviews})
# Check the first rows
df.head()

| | reviews |
|---|---|
| 0 | Not Verified \| The flight was comfortable enough but with an hour delay on the return leg. However, on both leg I was told I had to put my very small and expensive cabin case into the hold as the flight was full. Having done so I was not amused to see other passengers bringing much larger cases into the cabin. BA should stick to their cabin bag size limit and not inconvenience those who comply. |
| 1 | ✅ Trip Verified \| We had a really good flying experience with BA, travelling as a young family of 4. The flights left on time and we even arrived early for nearly each one of our flights. Food was generous and quite tasty for Economy class with the crew coming around with water/drinks throughout the flights. Our checked luggage also arrived safely and undamaged both at VCE and our return flight to YUL. On all of our flights the crew were attentive, friendly, and helpful with us and our children, especially the gentlemen who served us on the return flights from VCE to YUL on March 5th. The B787-8 interior is really dated and really needs to be updated to compete with their European counterparts. There were panels squeaking loudly when we hit turbulence, seat covers coming off the seats, and tray tables which were not level and loose for eating. The IFE on the B787-8 worked fine, but it definitely wasn't as responsive and did not have a newer, larger screen like the ones on the B777-200. Thankfully, the B777-200 have had their interior updated but the one we flew on had a clogged sink in one of the lavatories, which created problems for passengers. Unfortunately, on nearly all of our flights, there was garbage left in the seat pockets and the floors weren't quite as clean. On the incoming flight to LHR, the B787-8 was not assigned a gate because we arrived early into the airport, which resulted in significant delays for the airport buses to get to the plane and also slowed down the deplaning process. We would definitely consider flying trans-Atlantic with BA again, as we received value and service for the fare we paid. |
| 2 | ✅ Trip Verified \| Waited an hour to check-in at the Paphos business check-in. Staff utterly incompetent. Flight crew in business class removed my ruck sack from the flight bins without my consent to make way for another customer luggage. I was then coerced to have my luggage at my feet throughout my flight. Utterly outrageous, last thing you would expect in BA business class. |
| 3 | Not Verified \| Not a great experience at all, from the outset it was poorly managed as they bused us out to a parking slot only to have us wait for 15 minutes in the bus as the plane was ready. BA business class is not Business class. Tired, small and generally not worth the ticket price. Tables that don’t sit straight, arm rests that aren’t secure and terrible screens. It’s not a patch on first class airlines which is apparently where BA think they should be. They have a long way to go. |
| 4 | ✅ Trip Verified \| Boarding was difficult caused by vast majority of the passengers carrying too much hand luggage. FA's were friendly. The seats on BA for European flights are extremely narrow. There was a choice of breakfast and very surprising the Champagne Castelau on European flights is of a better quality as the brand used in club on intercontinental flight. Nothing wrong with this flight, however not pleasant due to the unpleasant seats. Waiting time at Brussels for luggage some 20 minutes what is very acceptable. |


In [None]:
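# Save as a CSV file for further analysis (create the data/ folder first if it does not exist)
import os
os.makedirs("data", exist_ok=True)
df.to_csv("data/BA_reviews.csv")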

Now we have our dataset for this task! The loop above collected 3,771 reviews by iterating through the site's paginated review listings.
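The 3,771 figure is consistent with the loop's parameters: 38 pages of up to 100 reviews hold at most 3,800 reviews, and `ceil(3771 / 100)` is 38. The sketch below builds the same listing URLs with `urllib.parse.urlencode` instead of a hand-escaped query string, and computes how many pages a given review count needs. `page_url` and `pages_needed` are hypothetical helper names, and the page count only holds while the site keeps serving 100 reviews per page and the total number of reviews stays the same.

```python
import math
from urllib.parse import urlencode

BASE_URL = "https://www.airlinequality.com/airline-reviews/british-airways"


def page_url(page: int, page_size: int = 100) -> str:
    """Build the listing URL for one page, matching the f-string in the scraping loop."""
    # urlencode percent-encodes ':' as %3A, giving "sortby=post_date%3ADesc"
    query = urlencode({"sortby": "post_date:Desc", "pagesize": page_size})
    return f"{BASE_URL}/page/{page}/?{query}"


def pages_needed(total_reviews: int, page_size: int = 100) -> int:
    """Smallest number of pages that covers total_reviews at page_size reviews per page."""
    return math.ceil(total_reviews / page_size)


print(page_url(1))
print(pages_needed(3771))
```

Deriving `pages` from a known total like this avoids hard-coding it, although the total itself has to be read from the site first.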

The next step is to clean this data by removing unnecessary text from each row, such as the `✅ Trip Verified |` and `Not Verified |` prefixes visible in the rows above.
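As a first cleaning step, the verification prefix could be split off into its own field rather than simply discarded. The sketch below assumes every prefixed review starts with either `✅ Trip Verified |` or `Not Verified |`, the two forms visible in the sample rows; any other variant the site might use would pass through unchanged. `clean_review` and `split_review` are hypothetical helpers, not part of the original notebook.

```python
import re
from typing import Optional, Tuple

# Assumed prefix forms (from the sample rows): "✅ Trip Verified | " and "Not Verified | ".
# Group 1 captures "Trip" or "Not" so the verification status can be kept.
PREFIX_RE = re.compile(r"^\s*(?:✅\s*)?(Trip|Not)\s+Verified\s*\|\s*")


def clean_review(text: str) -> str:
    """Remove the verification prefix (if present) and surrounding whitespace."""
    return PREFIX_RE.sub("", text, count=1).strip()


def split_review(text: str) -> Tuple[Optional[bool], str]:
    """Return (verified, body): True/False from the prefix, or None if there is no prefix."""
    match = PREFIX_RE.match(text)
    if match is None:
        return None, text.strip()
    return match.group(1) == "Trip", text[match.end():].strip()


print(clean_review("✅ Trip Verified | Waited an hour to check-in."))
print(split_review("Not Verified | Not a great experience at all."))
```

With pandas, the helper can be applied to the whole column, for example `df["reviews"] = df["reviews"].map(clean_review)`.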