## BACKGROUND
 
British Airways is one of the most popular airlines worldwide, particularly in Europe. Skytrax is a website dedicated to reviewing and rating airlines, and it serves as the data source for this project. The analysis focuses on British Airways' ratings.

## DATA

The dataset can be accessed from this source page: [British Airways Reviews](https://www.airlinequality.com/airline-reviews/british-airways)

In [1]:
from bs4 import BeautifulSoup
import requests
import pandas as pd

In [2]:
url = "https://www.airlinequality.com/airline-reviews/british-airways/?sortby=post_date%3ADesc&pagesize=100"
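
The URL above returns only the 100 most recent reviews. A minimal sketch of building URLs for further pages follows. It assumes that later pages sit under a `/page/<n>/` path segment, which this notebook does not confirm, and `build_page_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.airlinequality.com/airline-reviews/british-airways"


def build_page_url(page_number, page_size=100):
    """Return the review listing URL for one page (hypothetical helper)."""
    # sortby and pagesize mirror the query string used in this notebook.
    query = urlencode({"sortby": "post_date:Desc", "pagesize": page_size})
    # ASSUMPTION: later pages live under a /page/<n>/ path segment.
    return f"{BASE_URL}/page/{page_number}/?{query}"


urls = [build_page_url(n) for n in range(1, 4)]
for u in urls:
    print(u)
```

Each generated URL could then be passed to `requests.get` in a loop, ideally with a short pause between requests.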

In [3]:
page = requests.get(url, timeout=30)
page.raise_for_status()  # stop early if the request failed

In [4]:
# Name the parser explicitly so results do not depend on which parsers are installed
soup = BeautifulSoup(page.text, "html.parser")

In [5]:
Review_Header = soup.find_all("h2", class_="text_header")

# Extract just the text from each review header
Clean_Review_Header = [header.get_text().strip() for header in Review_Header]


In [13]:
df = pd.DataFrame(Clean_Review_Header)
print(df)

                                        0
0             "I will never fly BA again"
1            "I was pleasantly surprised"
2    "flight attendants were outstanding"
3        "Food and service was the worst"
4      “flying with BA was disappointing”
..                                    ...
95  “premium price for a sub-par product”
96        "can't even choose my own seat"
97               “Very impressed with BA”
98           "appalling customer service"
99   "baggage customer service is a joke"

[100 rows x 1 columns]
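
The titles in the output above are wrapped in a mix of straight (`"`) and curly (`“ ”`) quotation marks. A small sketch of normalising them is shown here; `strip_outer_quotes` is a hypothetical helper, and the sample titles are taken from the printed output.

```python
def strip_outer_quotes(title):
    """Remove straight or curly double quotes wrapping a review title."""
    quote_chars = '"\u201c\u201d'  # straight quote, left curly, right curly
    return title.strip().strip(quote_chars).strip()


sample_titles = ['"I will never fly BA again"', "\u201cVery impressed with BA\u201d"]
clean_titles = [strip_outer_quotes(t) for t in sample_titles]
print(clean_titles)
```

The same function could be applied to every element of `Clean_Review_Header` before building the DataFrame.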


In [14]:
df.to_csv(r'C:\Users\KEHINDE FAITH A\Downloads\BA_Practice\BA_Review_Header.csv', index=False)
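
The save path above is tied to one Windows user account. A hedged sketch of a more portable alternative with `pathlib` follows; `output_csv_path` is a hypothetical helper, and the `Downloads/BA_Practice` folder under the home directory is assumed to mirror the original location.

```python
from pathlib import Path


def output_csv_path(filename, folder="BA_Practice"):
    """Build a CSV path under <home>/Downloads/<folder> (hypothetical helper)."""
    # ASSUMPTION: output belongs in the user's Downloads folder, as in the notebook.
    return Path.home() / "Downloads" / folder / filename


header_csv = output_csv_path("BA_Review_Header.csv")
print(header_csv)
# Before writing, the folder can be created if it is missing:
# header_csv.parent.mkdir(parents=True, exist_ok=True)
# df.to_csv(header_csv, index=False)
```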

In [6]:
Review_Sub_Header = soup.find_all("h3", class_="text_sub_header userStatusWrapper")

# Extract the desired info
text_Sub_Header = []
for header in Review_Sub_Header:
    # Extract name
    name = header.find("span", itemprop="name").get_text().strip()
    
    # Extract country: the text inside the first pair of parentheses
    country = header.get_text().split(')')[0].split('(')[-1].strip()
    
    # Extract datetime and datePublished
    time_tag = header.find("time", itemprop="datePublished")
    datetime = time_tag['datetime']
    date_published = time_tag.get_text().strip()

    # Append the extracted information to the data list
    text_Sub_Header.append({"name": name, "country": country, "datetime": datetime, "date_published": date_published})
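
The chained `split` calls in the loop return the whole header text when a sub-header has no parentheses at all. A minimal sketch of a stricter alternative using a regular expression is given here; `extract_country` is a hypothetical helper, and the sample strings are illustrative rather than scraped.

```python
import re


def extract_country(sub_header_text):
    """Return the text inside the first (...) group, or None if there is none."""
    match = re.search(r"\(([^()]*)\)", sub_header_text)
    return match.group(1).strip() if match else None


# Hypothetical sub-header strings, modelled on the fields extracted above
with_country = extract_country("Scott Annett (United States) 18th November 2024")
without_country = extract_country("R Lane 17th November 2024")
print(with_country, without_country)
```

Returning `None` for a missing country makes such rows show up as missing values in the DataFrame instead of carrying stray text.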
    


In [23]:
df1 = pd.DataFrame(text_Sub_Header)
print(df1)

              name         country    datetime      date_published
0     Scott Annett   United States  2024-11-18  18th November 2024
1           R Lane  United Kingdom  2024-11-17  17th November 2024
2       N Christie  United Kingdom  2024-11-15  15th November 2024
3            L Tee       Australia  2024-11-13  13th November 2024
4   Hlynur Jónsson         Iceland  2024-11-08   8th November 2024
..             ...             ...         ...                 ...
95     L Tomlinson  United Kingdom  2024-05-26       26th May 2024
96         G Layne   United States  2024-05-20       20th May 2024
97      H Harrison  United Kingdom  2024-05-18       18th May 2024
98        E Burton  United Kingdom  2024-05-14       14th May 2024
99   Loretta Ahmad          Canada  2024-05-08        8th May 2024

[100 rows x 4 columns]
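
Both date columns above are stored as strings. As a sketch, the ISO-formatted `datetime` values can be turned into real date objects so that they sort and subtract correctly. The two rows below are copied from the printed output, and `review_date` is a hypothetical field name.

```python
from datetime import date

rows = [
    {"name": "Scott Annett", "datetime": "2024-11-18"},
    {"name": "Loretta Ahmad", "datetime": "2024-05-08"},
]

for row in rows:
    # The datetime attribute is already in YYYY-MM-DD form
    row["review_date"] = date.fromisoformat(row["datetime"])

# Number of days covered by the newest and oldest review on this page
span_days = (rows[0]["review_date"] - rows[1]["review_date"]).days
print(span_days)
```

On the full DataFrame, `pd.to_datetime(df1["datetime"])` should achieve the same conversion in one call.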


In [26]:
df1.to_csv(r'C:\Users\KEHINDE FAITH A\Downloads\BA_Practice\BA_Sub_Header.csv', index=False)

In [7]:
Review_Content = soup.find_all("div", class_="text_content")
Cleaned_Review_Content = [f"{index}. {header.get_text().strip()}" for index, header in enumerate(Review_Content, start=1)]
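
Review bodies on this site often appear to start with a verification label followed by a `|` separator (for example "Trip Verified | ..."). This is an assumption about the page content and is not shown in the notebook output. Under that assumption, a sketch for separating the label from the body could look like this, with `split_verification` as a hypothetical helper and made-up sample strings.

```python
def split_verification(content):
    """Split 'status | body' into (status, body); status is None if no label is found."""
    status, sep, body = content.partition("|")
    # Only treat the prefix as a label when it mentions verification
    if sep and "verified" in status.lower():
        return status.strip(), body.strip()
    return None, content.strip()


# Hypothetical review strings for illustration only
labelled = split_verification("Not Verified | The flight was late.")
plain = split_verification("Plain review text")
print(labelled, plain)
```

The helper is meant for the raw `header.get_text().strip()` value, before the `index.` prefix is added.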

In [30]:
# Save in a DataFrame
df2 = pd.DataFrame(Cleaned_Review_Content)

In [31]:
# Save the file in CSV format
df2.to_csv(r'C:\Users\KEHINDE FAITH A\Downloads\BA_Practice\BA_Review_Content.csv', index=False)

In [8]:
# Extract individual ratings: one dict per ratings table

ratings = []
for table in soup.find_all('table', class_='review-ratings'):
    rating = {}
    for row in table.find_all('tr'):
        header = row.find('td', class_='review-rating-header')
        value = row.find('td', class_='review-value')
        stars = row.find('td', class_='review-rating-stars')
        if header and value:
            # Text field (e.g. Seat Type): keep the cell text as-is
            rating[header.get_text(strip=True)] = value.get_text(strip=True)
        elif header and stars:
            # Star field: the score is the number of filled star spans
            rating_value = len(stars.find_all('span', class_='fill'))
            rating[header.get_text(strip=True)] = rating_value
    ratings.append(rating)


In [36]:
df3 = pd.DataFrame(ratings)
print(df3)

     Food & Beverages  Inflight Entertainment  Seat Comfort  Staff Service  \
0                 3.0                     3.0           3.0            3.0   
1                 2.0                     3.0           2.0            NaN   
2                 NaN                     NaN           3.0            NaN   
3                 5.0                     4.0           5.0            NaN   
4                 1.0                     1.0           1.0            NaN   
..                ...                     ...           ...            ...   
96                2.0                     1.0           2.0            NaN   
97                NaN                     NaN           NaN            NaN   
98                5.0                     5.0           4.0            NaN   
99                4.0                     4.0           2.0            NaN   
100               1.0                     2.0           2.0            NaN   

     Value for Money Type Of Traveller       Seat Type  \
0    
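
The printed frame runs from index 0 to 100, which is 101 rows, while the header and sub-header frames hold 100. Among the rows shown, row 0 is also the only one with a `Staff Service` value. One possible explanation, which is an assumption and not verified here, is that the first `review-ratings` table on the page is an airline-level summary, not a single review. Under that assumption, a sketch for aligning the ratings with the reviews follows; `align_ratings` is a hypothetical helper.

```python
def align_ratings(ratings, n_reviews):
    """Drop leading rating rows so the result has n_reviews entries."""
    extra = len(ratings) - n_reviews
    if extra < 0:
        raise ValueError("fewer rating tables than reviews")
    # ASSUMPTION: any surplus tables (e.g. a page summary) come first on the page
    return ratings[extra:]


# Hypothetical data: one summary row followed by three review rows
sample = [{"Staff Service": 3}] + [{"Seat Comfort": i} for i in range(1, 4)]
aligned = align_ratings(sample, n_reviews=3)
print(len(aligned))
```

In the notebook this would be called as `align_ratings(ratings, len(Review_Header))` before building `df3`, so that row `i` of every frame refers to the same review.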

In [37]:
df3.to_csv(r'C:\Users\KEHINDE FAITH A\Downloads\BA_Practice\BA_Ratings.csv', index=False)

## The End  
Thank you for reviewing the code. I hope it was clear and easy to understand.