Skip to content

lynseybwisa/british-airways-sentiment-analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 

Repository files navigation

british-airways-sentiment-analysis

Overview

This project focuses on scraping customer reviews for British Airways, performing Natural Language Processing (NLP) tasks, and analyzing sentiments. The goal is to gain insights into customer feedback and provide valuable information for business decisions.

Features

  • Web scraping of British Airways reviews from Airline Quality.
  • Data cleaning and preprocessing
  • Topic modeling to identify common themes in customer reviews.
  • Sentiment analysis to categorize reviews as positive, negative, or neutral.
  • Exploratory Data Analysis (EDA) to visualize key metrics and distributions.
  • Visualizations for data exploration

Step 1: Set Up Your Environment

  • Install Necessary Libraries:
  • Install the required Python libraries for web scraping, data analysis, and visualization. Common libraries include BeautifulSoup, requests, pandas, nltk, matplotlib, and seaborn.

Step 2: Web Scraping

  • Understand the Website Structure
  • Explore the structure of the website (https://www.airlinequality.com/) to identify the HTML elements containing the reviews.
  • Utilize Python, along with libraries like BeautifulSoup and requests, to scrape review data from the website. Save the data in a structured format, such as a CSV file.

Step 3: Data Cleaning

  • Load the scraped data into a Pandas DataFrame.
  • Perform text data cleaning, including removing irrelevant characters, handling missing values, and converting text to lowercase.

Step 4: Data Analysis

  • Conduct exploratory data analysis to understand the distribution of reviews, the most common words, and other relevant statistics.
  • Text Preprocessing: Tokenize the text, remove stop words, and perform other text preprocessing steps to prepare the data for analysis.
  • Topic Modeling: Apply topic modeling techniques (e.g., Latent Dirichlet Allocation - LDA) to identify common themes in the reviews.
  • Sentiment Analysis: Use sentiment analysis to categorize reviews as positive, negative, or neutral. This can be done using pre-trained models or libraries like VADER.
  • Word Clouds: Generate word clouds to visually represent the most frequent words in the reviews.

Step 5: Visualizations and Metrics

  • Create Visualizations: Use Matplotlib or Seaborn to create visualizations such as bar charts, pie charts, and word clouds to represent key insights.