# British Airways customer reviews analysis: data scraping
## Dr José M. Albornoz
### February 2024

The objective of this notebook is to scrape British Airways customer reviews from [Skytrax](https://www.airlinequality.com/) and save them to disk.

# 0.- Imports

In [1]:
import requests
from bs4 import BeautifulSoup
from unicodedata import normalize

# 1.- Reviews retrieval

In [2]:
url1 = 'https://www.airlinequality.com/airline-reviews/british-airways/page/'

In [3]:
# upper bound for the page loop; range() excludes it, so pages 1 to numpages - 1 are retrieved
numpages = 377
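
Because `range()` excludes its upper bound, it can help to check which page URLs a given `numpages` value actually produces before sending any requests. The following is a minimal sketch of that check; the helper `page_urls` and its parameters are hypothetical names introduced for illustration and are not part of the original notebook.

```python
BASE_URL = 'https://www.airlinequality.com/airline-reviews/british-airways/page/'

def page_urls(base_url, stop, start=1):
    """Return the URLs for pages start .. stop - 1 (hypothetical helper)."""
    # range() stops one short of `stop`, mirroring range(1, numpages)
    return [f"{base_url}{page}/" for page in range(start, stop)]

urls = page_urls(BASE_URL, 377)
print(len(urls))   # 376
print(urls[0])     # first page requested
print(urls[-1])    # last page requested
```

With an upper bound of 377, the last URL in the list ends in `/page/376/`.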

In [4]:
for page in range(1, numpages):
    url2 = str(page) + '/'
    url = url1 + url2
    
    # retrieve page
    result = requests.get(url)
    content = result.text
    soup = BeautifulSoup(content, 'lxml')
    raw_review = soup.find_all("div", class_="text_content")
    
    # extract text
    review = [r.get_text() for r in raw_review]
    
    # remove non-ascii characters; decode back to str so that no b'...' wrapper is stored
    review = [normalize('NFD', rev).encode('ascii', 'ignore').decode('ascii') for rev in review]
    
    if page == 1:
        reviews = review
    else:
        reviews = reviews + review
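
The non-ASCII removal step can be tried in isolation on a short string. This is a small sketch using only the standard library; `to_ascii` is a hypothetical helper name, and the sample text is made up for illustration.

```python
from unicodedata import normalize

def to_ascii(text):
    """Decompose accented characters, then drop anything that is not ASCII."""
    # NFD splits 'é' into 'e' plus a combining accent; 'ignore' then drops the accent
    return normalize('NFD', text).encode('ascii', 'ignore').decode('ascii')

sample = 'José paid £50'
print(to_ascii(sample))            # Jose paid 50

# Calling str() on the encoded bytes keeps the bytes repr, including the b'...' wrapper
as_bytes = normalize('NFD', sample).encode('ascii', 'ignore')
print(str(as_bytes))               # b'Jose paid 50'
```

Characters with no ASCII decomposition, such as `£`, are dropped entirely rather than replaced.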

In [5]:
len(reviews)

3752

# 2.- Save reviews to disk

In [6]:
# the data/ directory must already exist, otherwise open() raises FileNotFoundError
with open('data/reviews.txt', 'w') as f:
    for line in reviews:
        f.write(f"{line}\n")
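
Writing one review per line only round-trips cleanly if no review contains a line break of its own. The sketch below shows one possible way to guard against that and to read the file back; `save_reviews` and `load_reviews` are hypothetical helpers, the sample reviews are invented, and a temporary directory stands in for `data/`.

```python
import os
import tempfile

def save_reviews(reviews, path):
    """Write one review per line, collapsing internal whitespace (including newlines)."""
    with open(path, 'w', encoding='utf-8') as f:
        for review in reviews:
            f.write(' '.join(review.split()) + '\n')

def load_reviews(path):
    """Read the file back into a list with one review per element."""
    with open(path, encoding='utf-8') as f:
        return [line.rstrip('\n') for line in f]

sample = ['Great flight.\nFriendly crew.', 'Delayed by two hours.']

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'reviews.txt')
    save_reviews(sample, path)
    loaded = load_reviews(path)

print(len(loaded))  # 2
print(loaded)
```

Collapsing whitespace is a design choice made for this sketch: it keeps the number of lines in the file equal to the number of reviews, at the cost of losing the original paragraph breaks.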