This project is a Python application that scrapes news headlines from the BBC website, performs keyword and n-gram analysis, and visualizes the most frequent topics. It showcases skills in web scraping, data processing, natural language processing, and data visualization.
- Web Scraping: Extracts headlines from the BBC News website.
- Keyword Analysis: Identifies the most common words in the headlines.
- N-gram Analysis: Extracts common phrases (ranging from 3 to 5 words) for more context.
- Data Visualization: Displays the results through bar charts for easy comprehension.
- Python
- BeautifulSoup
- NLTK
- Matplotlib
- Seaborn
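The keyword and n-gram analysis listed above can be sketched as follows. This is only an illustration, not the project's actual code: it uses the standard library (`re`, `collections.Counter`) in place of NLTK so that it runs without extra downloads. The function names, the tiny stop-word set, and the sample headlines are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word set for illustration only;
# the project itself presumably relies on NLTK for this.
STOP_WORDS = {"a", "an", "the", "in", "on", "of", "to", "for", "and", "as", "at", "is"}


def tokenize(headline):
    """Lowercase a headline and split it into word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", headline.lower())


def top_keywords(headlines, n=10):
    """Return the n most common non-stop-word tokens as (word, count) pairs."""
    counts = Counter(
        word
        for headline in headlines
        for word in tokenize(headline)
        if word not in STOP_WORDS
    )
    return counts.most_common(n)


def top_ngrams(headlines, min_n=3, max_n=5, n=10):
    """Return the n most common phrases of min_n to max_n consecutive words.

    Phrases never span two headlines, and stop words are kept so that
    the phrases stay readable.
    """
    counts = Counter()
    for headline in headlines:
        tokens = tokenize(headline)
        for size in range(min_n, max_n + 1):
            for start in range(len(tokens) - size + 1):
                counts[" ".join(tokens[start:start + size])] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    # Made-up sample headlines, not real BBC data.
    sample = [
        "Interest rates held as inflation falls",
        "Bank holds interest rates as inflation falls again",
    ]
    print(top_keywords(sample, n=5))
    print(top_ngrams(sample, n=5))
```

Counting each headline separately keeps phrases from running across headline boundaries. A fuller stop-word list would give cleaner keyword results.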
Ensure you have Python installed on your system. Follow these steps to set up the project:
- Clone the Repository
git clone [Your Repository URL]
cd [Your Repository Name]
- Install Required Packages
pip install -r requirements.txt
To run the application, execute the following command in your terminal:
```bash
python main.py
```
The application will output:
- Top 10 keywords in BBC headlines.
- Top 10 n-grams in BBC headlines.
- Bar charts visualizing the frequency of these keywords and n-grams.
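As a rough sketch of how such a bar chart could be produced, the example below draws a horizontal bar chart from `(label, count)` pairs using Matplotlib alone. The function name `plot_frequencies`, the output file name, and the sample counts are assumptions for illustration; the project's own plotting code (which may also use Seaborn) is not shown here.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt


def plot_frequencies(pairs, title, path):
    """Draw a horizontal bar chart from (label, count) pairs and save it to path."""
    labels = [label for label, _ in pairs]
    counts = [count for _, count in pairs]

    fig, ax = plt.subplots(figsize=(8, 4))
    ax.barh(labels, counts)
    ax.invert_yaxis()  # put the first (most frequent) item at the top
    ax.set_xlabel("Frequency")
    ax.set_title(title)
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
    return ax


if __name__ == "__main__":
    # Made-up counts for illustration, not real results.
    example = [("election", 7), ("climate", 5), ("economy", 3)]
    plot_frequencies(example, "Top keywords (example data)", "keywords.png")
```

Horizontal bars tend to suit this output because multi-word n-grams stay legible as axis labels.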
Chris Williams