# Report

## Comparative Report: British Airways and Competing Airlines on Three of BA's Most Profitable Routes

### Introduction

In this analysis, we compare British Airways (BA) with its main competitors on three popular international routes: New York (JFK), Dubai, and Hong Kong, based on 2022 data. The main competitors identified for these routes are American Airlines, Emirates, and Cathay Pacific. To keep the comparison uniform, we consider only economy class reviews.

### Identified Metrics

Two key metrics will be considered in this comparative analysis:

1. **Customer Service Comments and Quantifiable Rating:**
   - We will analyze customer reviews and ratings related to customer service. This will involve considering both qualitative comments and quantifiable ratings provided by passengers.

2. **Pricing:**
   - The pricing strategy of each airline will be assessed. This includes examining the affordability and value for money perceived by customers.

### Review Limitations and Caveats

It's crucial to acknowledge and address potential biases and limitations in the review data:

1. **Negativity Bias:**
   - Dissatisfied customers may be more inclined to leave reviews than satisfied ones. This negativity bias could skew the picture of the overall customer experience. To mitigate it, we will draw insights from a diverse range of reviews.

2. **Incomplete Representation:**
   - Online reviews may not give a comprehensive view of the passenger experience: some satisfied customers never leave reviews, so the sample is incomplete. We will try to account for this by considering multiple sources and trends over time.

### Analysis Approach

To conduct the analysis, we will:

1. **Collect Data:**
   - Gather customer reviews and ratings for each airline on the specified routes, focusing on the years 2022-2023.

2. **Segmentation:**
   - Segment the data based on airlines, routes, and travel years to facilitate a detailed and targeted analysis.

3. **NLP Analysis:**
   - Utilize Natural Language Processing (NLP) techniques, including stemming, to analyze and make sense of qualitative comments. This will help extract key themes and sentiments from customer feedback.

4. **Quantitative Analysis:**
   - Conduct a quantitative analysis of the quantifiable ratings to identify trends and patterns in customer satisfaction.

5. **Correlation Analysis:**
   - Explore the correlation between the quantifiable ratings and the sentiments expressed in comments. This shows how closely the numerical ratings align with the qualitative feedback, and therefore how well the ratings can represent the comments.
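
As a minimal sketch of the segmentation, quantitative, and correlation steps, assuming the reviews are already in a table: the column names (`airline`, `route`, `year`, `rating`, `sentiment`) and every value below are made up for illustration, and `sentiment` stands in for whatever score the NLP step would produce.

```python
import pandas as pd

# Hypothetical, hand-made sample; not real review data.
reviews = pd.DataFrame(
    {
        "airline": ["BA", "BA", "BA", "Competitor", "Competitor", "Competitor"],
        "route": ["London-Dubai"] * 6,
        "year": [2022, 2022, 2023, 2022, 2023, 2023],
        "rating": [2, 4, 3, 5, 4, 1],                     # e.g. a 1-5 star rating
        "sentiment": [-0.6, 0.5, 0.1, 0.9, 0.4, -0.8],    # e.g. a score in [-1, 1]
    }
)

# Segmentation + quantitative summary: mean rating and review count
# per airline, route and year.
summary = (
    reviews.groupby(["airline", "route", "year"])["rating"]
    .agg(["mean", "count"])
    .reset_index()
)

# Pearson correlation between the numeric rating and the comment sentiment.
correlation = reviews["rating"].corr(reviews["sentiment"])

print(summary)
print(f"rating vs sentiment correlation: {correlation:.2f}")
```

A correlation close to 1 on real data would suggest the star ratings are a fair proxy for the comments; a weak one would suggest the two need to be read separately.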

By approaching the analysis systematically and considering potential biases, we aim to provide a comprehensive and insightful comparison of British Airways and its competitors on the specified routes.
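
To illustrate what the stemming step in the approach above is meant to achieve, here is a dependency-free sketch. It uses a deliberately crude suffix stripper (`crude_stem`, a hypothetical helper) as a stand-in; it is not the Porter algorithm, and a real analysis would more likely use a library stemmer such as NLTK's `PorterStemmer`. The comments are invented.

```python
import re
from collections import Counter

SUFFIXES = ("ing", "ed", "es", "s")  # deliberately tiny, illustrative list


def crude_stem(word: str) -> str:
    """Strip one common suffix; a toy stand-in for a real stemmer."""
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_counts(comments: list[str]) -> Counter:
    """Lower-case, tokenise on letters, stem, and count the stems."""
    counts = Counter()
    for comment in comments:
        for token in re.findall(r"[a-z]+", comment.lower()):
            counts[crude_stem(token)] += 1
    return counts


# Made-up comments, for illustration only.
sample = ["Flight delayed again", "Two delays and a rude crew", "Crew helped us"]
counts = stem_counts(sample)
print(counts.most_common(3))
```

The point of stemming here is that "delayed" and "delays" are counted under one theme rather than two.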


### **Step 1**

In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

British Airways customer service comments and ratings

In [2]:
base_url = "https://www.airlinequality.com/airline-reviews/british-airways"
response = requests.get(base_url)
response

<Response [200]>

In [46]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

pages=20
page_size=100
base_url = "https://www.airlinequality.com/airline-reviews/british-airways"

# Loop over the paginated review listing
for i in range(1, pages + 1):

    print(f"Scraping page {i}")

    # Create URL to collect links from paginated data
    url = f"{base_url}/page/{i}/?sortby=post_date%3ADesc&pagesize={page_size}"

    # Send a GET request to the website
    response = requests.get(url)

    # Check if the request was successful
    if response.status_code == 200:
        # Parse the HTML content
        soup = BeautifulSoup(response.content, 'html.parser')

        # Route
        td_tags = soup.find_all('td')

        # NOTE: df is re-created on every page, so rows from earlier pages are discarded
        df=pd.DataFrame()

        for td_tag in td_tags:
            # Check if the text content of the current <td> tag is "Route"
            if td_tag.text.strip() == 'Route':

                # If "Route" is found, get the next <td> tag which contains the route information.
                # find_next_sibling returns None when no matching sibling exists.
                route_td_tag = td_tag.find_next_sibling('td', class_='review-value')

                # Extract the text content of the <td> tag
                # (this is where the AttributeError in the output is raised when route_td_tag is None)
                route = route_td_tag.text.strip()

                # Append the extracted route to the DataFrame
                # (DataFrame.append was removed in pandas 2.0; pd.concat is its replacement)
                df = df.append({'Route': route}, ignore_index=True)

# NOTE: this loop sits outside the page loop, so it only sees the <td> tags of the last page parsed
for td_tag in td_tags:
    if td_tag.text.strip() == "Seat Type":
        seat_type_label_tag = td_tag.find_next_sibling("td", class_='review-rating-header cabin_flown')

        # Check if the next sibling exists before accessing its text attribute
        if seat_type_label_tag:
            seat_type_val_tag = seat_type_label_tag.find_next_sibling("td")
            seat_type = seat_type_val_tag.text.strip() if seat_type_val_tag else None
            df = df.append({'Seat type': seat_type}, ignore_index=True)
        else:
            # Handle the case where the next sibling is not found
            print("Seat Type information not found.")

Scraping page 1


AttributeError: 'NoneType' object has no attribute 'text'
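
The `AttributeError` above is what happens when `find_next_sibling` finds no match: it returns `None`, and reading `.text` from `None` fails. A minimal sketch of a None-safe lookup follows. The HTML snippet is hand-written to mimic the class names used in the cell above, so its structure is an assumption about the page's markup, and `value_after` is a hypothetical helper.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for one review's rating table (assumed structure).
SAMPLE_HTML = """
<table>
  <tr><td class="review-rating-header route">Route</td>
      <td class="review-value">London to Dubai</td></tr>
  <tr><td class="review-rating-header cabin_flown">Seat Type</td>
      <td class="review-value">Economy Class</td></tr>
  <tr><td class="review-rating-header date_flown">Date Flown</td></tr>
</table>
"""


def value_after(soup, label):
    """Return the text of the value cell next to `label`, or None if absent."""
    for header in soup.find_all("td", class_="review-rating-header"):
        if header.get_text(strip=True) == label:
            value = header.find_next_sibling("td", class_="review-value")
            # find_next_sibling returns None when there is no matching sibling.
            return value.get_text(strip=True) if value is not None else None
    return None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
row = {label: value_after(soup, label) for label in ("Route", "Seat Type", "Date Flown")}
print(row)
```

Returning `None` for a missing field keeps one incomplete review from aborting the whole scrape; the gaps can then be handled explicitly during analysis.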

In [38]:
# Import necessary modules
import requests
from bs4 import BeautifulSoup
import pandas as pd

base_url = "https://www.airlinequality.com/airline-reviews/british-airways"
pages = 15
page_size = 100

df = pd.DataFrame()

for i in range(1, pages + 1):
    print(f"Scraping page {i}")

    # Create URL to collect links from paginated data
    url = f"{base_url}/page/{i}/?sortby=post_date%3ADesc&pagesize={page_size}"

    # Collect HTML data from this page
    response = requests.get(url)

    # Parse content
    content = response.content
    parsed_content = BeautifulSoup(content, 'html.parser')
    # Find the header <td> for each field, plus the review text blocks
    td_tags_cabin_flown = parsed_content.find_all('td', class_='review-rating-header cabin_flown')
    td_tags_route=parsed_content.find_all("td", class_="review-rating-header route")
    td_tags_traveller=parsed_content.find_all("td", class_="review-rating-header type_of_traveller")
    td_tags_date=parsed_content.find_all("td", class_="review-rating-header date_flown")
    td_text_context=parsed_content.find_all("div", {"class": "text_content"})
    td_tags_recommend = parsed_content.find_all("td", class_="review-rating-header recommended")
    # NOTE: find() returns a single tag (or None), not a list, and `text=` is a
    # deprecated alias of `string=` in Beautiful Soup
    td_tags_comfort = parsed_content.find("td", class_="review-rating-header", text="Seat Comfort")

    # NOTE: enumerate() restarts at 0 on every page, so each page overwrites the
    # rows written by the previous one. Each column is also indexed by its own
    # tag list, so a review that lacks a field shifts that column out of line
    # with the others.
    for index, td_element in enumerate(td_tags_cabin_flown):
        # Handle td_tags_cabin_flown
        seat_type_td_tag = td_element.find_next_sibling('td', class_='review-value')
        seat_type = seat_type_td_tag.text.strip()
        df.at[index, "Seat Type"] = seat_type

    for index, td_element in enumerate(td_tags_route):
        # Handle td_tags_route
        route_td_tag = td_element.find_next_sibling("td", class_="review-value")
        route = route_td_tag.text.strip()
        df.at[index, "Route"] = route

    for index, td_element in enumerate(td_tags_traveller):
        traveller_td_tag = td_element.find_next_sibling("td", class_="review-value")
        traveller_type = traveller_td_tag.text.strip()
        df.at[index, "Type of Traveller"] = traveller_type

    for index, td_element in enumerate(td_tags_date):
        date_td_tag = td_element.find_next_sibling("td", class_="review-value")
        date_flown = date_td_tag.text.strip()
        df.at[index, "Date flown"] = date_flown

    # Recommendation flag: 0 if the value cell carries the "rating-no" class, else 1
    for index, td_elem in enumerate(td_tags_recommend):
        recommendation_val_elem = td_elem.find_next_sibling("td", class_="review-value")
        if "rating-no" in recommendation_val_elem["class"]:
            df.at[index, "Reccomend"] = 0
        else:
            df.at[index, "Reccomend"] = 1

    # Because td_tags_comfort is a single tag, this iterates over that tag's
    # children rather than over one "Seat Comfort" header per review
    for index, td_comfort_rat in enumerate(td_tags_comfort):
        c = 0
        star_elem = td_comfort_rat.find_next_sibling("td", class_="review-rating-stars")
        # Count the filled stars
        for cl in star_elem.find_all("span", class_="star fill"):
            c += 1
        df.at[index, "Comfort rating"] = c

    # Review text
    for index, text in enumerate(td_text_context):
        df.at[index, "Comment"] = text.get_text()

Scraping page 1


  td_tags_comfort = parsed_content.find("td", class_="review-rating-header", text="Seat Comfort")


Scraping page 2
Scraping page 3
Scraping page 4
Scraping page 5
Scraping page 6
Scraping page 7
Scraping page 8
Scraping page 9
Scraping page 10
Scraping page 11
Scraping page 12
Scraping page 13
Scraping page 14
Scraping page 15
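
One thing to watch in the cell above: `enumerate` restarts at 0 on every page, so writing by row label replaces the previous page's rows instead of adding to them. A minimal sketch of one way to accumulate rows across pages, using hand-made stand-in data instead of live requests (`fake_pages` and its values are hypothetical):

```python
import pandas as pd

# Stand-in for the parsed result of two pages; values are made up.
fake_pages = [
    [
        {"Route": "London to Dubai", "Seat Type": "Economy Class"},
        {"Route": "London to New York", "Seat Type": "Business Class"},
    ],
    [
        {"Route": "Hong Kong to London", "Seat Type": "Economy Class"},
    ],
]
COLUMNS = ["Route", "Seat Type"]

# Same labelling pattern as the cell above: the row label restarts at 0
# for each page, so page 2 overwrites row 0 of page 1.
overwritten = pd.DataFrame(columns=COLUMNS)
for page in fake_pages:
    for index, review in enumerate(page):
        overwritten.loc[index] = [review[c] for c in COLUMNS]

# Alternative: collect one dict per review and build the DataFrame once.
rows = []
for page_number, page in enumerate(fake_pages, start=1):
    for review in page:
        rows.append({"Page": page_number, **review})
accumulated = pd.DataFrame(rows)

print(len(overwritten), len(accumulated))
```

Building one dict per review also keeps each review's fields together, so a missing field becomes an empty cell in that row rather than shifting a whole column.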


In [41]:
# Save the scraped data to CSV so it is easier to inspect
df.to_csv('scrape_airtrax.csv', index=False)
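
Since the report restricts itself to economy class reviews on the New York, Dubai, and Hong Kong routes, the scraped table would need filtering before analysis. A minimal sketch, assuming the `Seat Type` and `Route` columns produced by the scraper and free-text route strings; the sample rows and the keyword list are illustrative assumptions, not real data.

```python
import pandas as pd

# Hand-made rows shaped like the scraped table; not real data.
df_sample = pd.DataFrame(
    {
        "Seat Type": ["Economy Class", "Business Class", "Economy Class", "Economy Class"],
        "Route": [
            "Dubai to London Heathrow",
            "London to New York JFK",
            "London to Pisa",
            "Hong Kong to London",
        ],
    }
)

ROUTE_KEYWORDS = ["New York", "JFK", "Dubai", "Hong Kong"]  # assumed spellings

# Keep economy class reviews whose route text mentions one of the keywords.
is_economy = df_sample["Seat Type"] == "Economy Class"
on_route = df_sample["Route"].str.contains("|".join(ROUTE_KEYWORDS), case=False, na=False)
selected = df_sample[is_economy & on_route]
print(selected)
```

Keyword matching is coarse: a connecting itinerary that merely passes through one of these cities would also match, so the selected rows may need a manual check.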

In [39]:
print(df)

         Seat Type                            Route Type of Traveller  \
0    Economy Class         Dubai to London Heathrow      Solo Leisure   
1    Economy Class  New York to Budapest via London      Solo Leisure   
2    Economy Class  Tokyo Narita to London Heathrow      Solo Leisure   
3    Economy Class       Hamburg to London Heathrow    Couple Leisure   
4    Economy Class                   London to Pisa    Family Leisure   
..             ...                              ...               ...   
95   Economy Class           Grand Cayman to London      Solo Leisure   
96   Economy Class       Sydney to Paris via London    Couple Leisure   
97   Economy Class       Larnaca to London Heathrow    Couple Leisure   
98  Business Class               Funchal to Gatwick    Couple Leisure   
99   Economy Class               Orlando to Gatwick    Family Leisure   

        Date flown  Reccomend  Comfort rating  \
0    December 2017        0.0             0.0   
1    December 2017       