# Task 1

---

## Web scraping and analysis

This Jupyter notebook includes some code to get you started with web scraping. We will use a package called `BeautifulSoup` to collect the data from the web. Once you've collected your data and saved it into a local `.csv` file you should start with your analysis.

### Scraping data from Skytrax

If you visit [https://www.airlinequality.com] you can see that there is a lot of data there. For this task, we are only interested in reviews related to British Airways and the Airline itself.

If you navigate to this link: [https://www.airlinequality.com/airline-reviews/british-airways] you will see this data. Now, we can use `Python` and `BeautifulSoup` to collect all the links to the reviews and then to collect the text data on each of the individual review links.

In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

In [27]:
base_url = "https://www.airlinequality.com/airline-reviews/british-airways"
pages = 10
page_size = 100

global rating, Aircraft, Class, Travel_type, Route, Date, comfort, staff, food, entertainment, wifi, ground_service, vfm, recommend

# for i in range(1, pages + 1):
for i in range(1, pages + 1):

    print(f"Scraping page {i}")

    # Create URL to collect links from paginated data
    url = f"{base_url}/page/{i}/?sortby=post_date%3ADesc&pagesize={page_size}"

    # Collect HTML data from this page
    response = requests.get(url)

    # Parse content
    content = response.content
    parsed_content = BeautifulSoup(content, 'html.parser')
    
    df = pd.DataFrame()

    for row in parsed_content.find_all("div", {"class": "body"}):
        data_dict = {'Comment': '', 
                     'Aircraft':'', 'Type Of Traveller':'', 'Seat Type':'', 'Route':'', 'Date Flown':'',
                     'Seat Comfort':0, 'Cabin Staff Service':0, 'Food & Beverages':0, 'Inflight Entertainment':0,
                     'Ground Service':0,'Wifi & Connectivity':0, 'Value For Money':0, 'Recommended':''}

        comment = {'Comment': row.find("div", {"class": "text_content"}).get_text()}
        
        data_dict.update(comment)

        table = row.find_all('table')[0]

        for tr in  table.select('table.review-ratings tr'):            
            header=tr.find('td',class_='review-rating-header').text
            if tr.find('td',class_='review-value') :
                value=tr.find('td',class_='review-value').text
            else :
                value=len(tr.find('td',class_='review-rating-stars').select('.star.fill'))
            data_dict.update({header:value})
        
        df=df.append(data_dict,ignore_index=True)

    print(df)
        
 

Scraping page 1
          Aircraft  Cabin Staff Service  \
0                                   1.0   
1                                   4.0   
2             A320                  5.0   
3     A320 Finnair                  2.0   
4             A319                  3.0   
..             ...                  ...   
95    Boeing 787-8                  5.0   
96  Boeing 777-200                  5.0   
97                                  5.0   
98                                  4.0   
99  Boeing 777-200                  1.0   

                                              Comment     Date Flown  \
0   Not Verified | Top Ten REASONS to not use Brit...       May 2023   
1   Not Verified |  Easy check in on the way to He...     March 2023   
2   ✅ Trip Verified |  Online check in worked fine...       May 2023   
3   ✅ Trip Verified |. The BA first lounge at Term...       May 2023   
4   Not Verified | Paid a quick visit to Nice yest...       May 2023   
..                                 

          Aircraft  Cabin Staff Service  \
0             A320                  0.0   
1     Boeing 787-9                  5.0   
2                                   0.0   
3                                   5.0   
4                                   0.0   
..             ...                  ...   
95                                  3.0   
96         A321NEO                  5.0   
97                                  3.0   
98  Boeing 737-800                  5.0   
99                                  4.0   

                                              Comment     Date Flown  \
0   Not Verified | Having just booked BA for a ret...  February 2022   
1   ✅ Trip Verified |  BA got everything right. Al...  February 2022   
2   ✅ Trip Verified |  Appalling customer service ...  February 2022   
3   ✅ Trip Verified |  This past November/December...  December 2021   
4   ✅ Trip Verified |  I had the best experience I...  February 2022   
..                                                .

                   Aircraft  Cabin Staff Service  \
0                                            3.0   
1                      A320                  5.0   
2                      A380                  5.0   
3                                            3.0   
4                Boeing 747                  5.0   
..                      ...                  ...   
95        A319 / Boeing 789                  2.0   
96  Boeing 777-200 and A319                  3.0   
97                                           5.0   
98           Boeing 777-200                  5.0   
99                     A320                  5.0   

                                              Comment      Date Flown  \
0   Not Verified |  One of the reasons we traveled...    October 2019   
1   Not Verified |  Gatwick to Alicante. On my out...  September 2019   
2   ✅ Trip Verified |  Vancouver to London. Great ...  September 2019   
3   ✅ Trip Verified |  Gatwick to Alicante. 3.5 ho...  September 2019   
4   Not Ve

          Aircraft  Cabin Staff Service  \
0                                   1.0   
1                                   5.0   
2             A320                  5.0   
3                                   4.0   
4   Boeing 777-200                  2.0   
..             ...                  ...   
95                                  4.0   
96            A319                  3.0   
97                                  3.0   
98                                  3.0   
99            A320                  4.0   

                                              Comment     Date Flown  \
0   ✅ Trip Verified |  London to Lyon. The flight ...  November 2018   
1   ✅ Trip Verified |  London to Boston. I was sea...  November 2018   
2   ✅ Trip Verified | Stockholm to London. Standar...  November 2018   
3   ✅ Trip Verified |  Amsterdam to London arrived...  November 2018   
4   ✅ Trip Verified |  Buenos Aires to London. We ...  November 2018   
..                                                .

In [28]:
df

Unnamed: 0,Aircraft,Cabin Staff Service,Comment,Date Flown,Food & Beverages,Ground Service,Inflight Entertainment,Recommended,Route,Seat Comfort,Seat Type,Type Of Traveller,Value For Money,Wifi & Connectivity
0,,1.0,✅ Trip Verified | London to Lyon. The flight ...,November 2018,0.0,1.0,0.0,no,London to Lyon,2.0,Economy Class,Solo Leisure,1.0,0.0
1,,5.0,✅ Trip Verified | London to Boston. I was sea...,November 2018,4.0,4.0,4.0,yes,London to Boston,3.0,Economy Class,Business,5.0,1.0
2,A320,5.0,✅ Trip Verified | Stockholm to London. Standar...,November 2018,2.0,1.0,0.0,yes,Stockholm to London,3.0,Business Class,Solo Leisure,3.0,0.0
3,,4.0,✅ Trip Verified | Amsterdam to London arrived...,November 2018,0.0,1.0,0.0,no,Amsterdam to London,4.0,Economy Class,Business,3.0,0.0
4,Boeing 777-200,2.0,✅ Trip Verified | Buenos Aires to London. We ...,November 2018,1.0,1.0,1.0,no,Buenos Aires to London,1.0,Economy Class,Couple Leisure,1.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,,4.0,✅ Trip Verified | Aberdeen to Boston via Lond...,July 2018,4.0,4.0,5.0,yes,Aberdeen to Boston via London,2.0,Economy Class,Business,3.0,4.0
96,A319,3.0,✅ Trip Verified | London to Hamburg. Baggage ...,June 2018,0.0,5.0,0.0,yes,London to Hamburg,4.0,Economy Class,Solo Leisure,3.0,0.0
97,,3.0,✅ Trip Verified | Flew London Heathrow to Hong...,July 2018,2.0,1.0,3.0,no,London Heathrow to Hong Kong,3.0,Premium Economy,Family Leisure,2.0,2.0
98,,3.0,✅ Trip Verified | Flew to Istanbul with Britis...,June 2018,1.0,1.0,1.0,no,London to Istanbul,3.0,Economy Class,Couple Leisure,2.0,0.0


In [4]:
df.to_csv("data/BA_reviews.csv")

Congratulations! Now you have your dataset for this task! The loops above collected 1000 reviews by iterating through the paginated pages on the website. However, if you want to collect more data, try increasing the number of pages!

 The next thing that you should do is clean this data to remove any unnecessary text from each of the rows. For example, "✅ Trip Verified" can be removed from each row if it exists, as it's not relevant to what we want to investigate.

In [5]:
df

Unnamed: 0,reviews
0,✅ Trip Verified | I will never travel with Br...
1,✅ Trip Verified | I am already in Portugal so...
2,✅ Trip Verified | Terrible. Avoid this airlin...
3,✅ Trip Verified | Despite being a gold member...
4,Not Verified | Regarding the aircraft and seat...
...,...
995,✅ Trip Verified | Had four flights in total w...
996,✅ Trip Verified | Johannesburg to Heathrow. B...
997,✅ Trip Verified | The queue for bag drop was ...
998,✅ Trip Verified | British Airways changed pla...
