The Airport Scraper API is a Python-based web scraping tool that extracts flight data from multiple airline websites and returns structured information on flight schedules, prices, and related details. It is built for scalable deployment on AWS Lambda and Docker, with extraction fast enough to serve real-time applications.
- Scrapes flight data from multiple airline websites.
- Supports both one-way and round-trip searches.
- Returns results in JSON format for seamless integration.
- Built with FastAPI, BeautifulSoup, and Pandas for efficient scraping and data processing.
- Deployed using AWS Lambda and Docker for scalability.
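The README does not show how results from several airline sites are combined. One plausible approach is to run one scraper function per airline concurrently and merge their lists. This is an assumption, not the project's actual code. The `scrape_all` helper and the per-airline scraper callables are hypothetical. A thread pool fits here because scraping is network-bound, so threads spend most of their time waiting on I/O.

```python
import logging
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical signature: a per-airline scraper takes the search query and
# returns a list of flight records shaped like the API's JSON response.
Flight = Dict[str, str]
Scraper = Callable[[Dict[str, str]], List[Flight]]

logger = logging.getLogger(__name__)


def scrape_all(scrapers: Dict[str, Scraper], query: Dict[str, str],
               max_workers: int = 4) -> List[Flight]:
    """Run every airline scraper concurrently and merge their results.

    A scraper that raises is logged and skipped, so one broken site does not
    fail the whole search. Results keep the order of the `scrapers` dict.
    """
    merged: List[Flight] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every scraper first so they all run in parallel.
        futures = {name: pool.submit(fn, query) for name, fn in scrapers.items()}
        for name, future in futures.items():
            try:
                merged.extend(future.result())
            except Exception:
                logger.exception("scraper %r failed", name)
    return merged
```

Each scraper receives the same query dict, for example `{"departing": "LOS", "arrival": "ABV"}`. The merged list can then be returned directly as the JSON body.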
- FastAPI: For building a robust and high-performance API.
- BeautifulSoup: For web scraping and data parsing.
- Pandas: For data cleaning and transformation.
- AWS Lambda: For cloud deployment and scalability.
- Docker: For containerizing the application for easy deployment.
- Python 3.10+
- AWS CLI configured with permissions to update the Lambda function (if deploying to AWS Lambda).
- Docker installed (if deploying with Docker).
- Clone the repository:
git clone https://github.com/prinzeval/airport-scraper-api.git
cd airport-scraper-api
- Install dependencies:
pip install -r requirements.txt
- Run locally:
uvicorn app:app --reload
- Access API documentation: visit http://127.0.0.1:8000/docs for the Swagger UI.
- Endpoint: /scrape
- Method: GET
- Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| departing | str | Yes | Departure airport code (e.g., LOS). |
| arrival | str | Yes | Arrival airport code (e.g., ABV). |
| departure_date | str | Yes | Departure date in YYYY-MM-DD format. |
| return_date | str | No | Return date in YYYY-MM-DD format (for round trips). |
| trip_type | str | No | Trip type: one-way or round-trip. Defaults to one-way. |
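Below is a small client-side sketch, using only the standard library. It builds a `/scrape` URL from the documented parameters and rejects obviously invalid input before the request is sent. The helper names are hypothetical. The three-letter airport-code check assumes IATA codes like LOS and ABV. Requiring `return_date` for round trips is also an assumption read from the table, not documented server behavior.

```python
import json
from datetime import date, datetime
from typing import Any, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Local address served by `uvicorn app:app --reload`.
BASE_URL = "http://127.0.0.1:8000/scrape"
TRIP_TYPES = {"one-way", "round-trip"}


def _parse_date(value: str, name: str) -> date:
    """Parse a YYYY-MM-DD string, raising ValueError that names the parameter."""
    try:
        return datetime.strptime(value, "%Y-%m-%d").date()
    except ValueError:
        raise ValueError(f"{name} must be YYYY-MM-DD, got {value!r}") from None


def build_scrape_url(departing: str, arrival: str, departure_date: str,
                     return_date: Optional[str] = None,
                     trip_type: str = "one-way") -> str:
    """Validate the query parameters and return the full /scrape URL.

    For one-way searches any return_date is ignored.
    """
    for name, code in (("departing", departing), ("arrival", arrival)):
        # Assumption: airports are identified by three-letter IATA codes.
        if len(code) != 3 or not code.isalpha():
            raise ValueError(f"{name} must be a 3-letter airport code, got {code!r}")
    if trip_type not in TRIP_TYPES:
        raise ValueError(f"trip_type must be one of {sorted(TRIP_TYPES)}")
    depart = _parse_date(departure_date, "departure_date")

    params = {
        "departing": departing.upper(),
        "arrival": arrival.upper(),
        "departure_date": depart.isoformat(),
        "trip_type": trip_type,
    }
    if trip_type == "round-trip":
        # Assumption: a round trip needs a return date on or after departure.
        if return_date is None:
            raise ValueError("return_date is required for round-trip searches")
        back = _parse_date(return_date, "return_date")
        if back < depart:
            raise ValueError("return_date must not be before departure_date")
        params["return_date"] = back.isoformat()
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_flights(url: str, timeout: float = 30.0) -> Any:
    """Send a GET request and decode the JSON list of flights."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

With the server running locally, `fetch_flights(build_scrape_url("LOS", "ABV", "2024-12-15"))` sends the same request as the curl example.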
- Example request:
curl -X GET "http://127.0.0.1:8000/scrape?departing=LOS&arrival=ABV&departure_date=2024-12-15&trip_type=one-way"
- Example response:
[
{
"airline": "Green Africa",
"flight_number": "G123",
"departure_time": "10:00 AM",
"arrival_time": "12:00 PM",
"price": "25000 NGN"
},
{
"airline": "Ibom Air",
"flight_number": "IA456",
"departure_time": "11:00 AM",
"arrival_time": "1:00 PM",
"price": "27000 NGN"
}
]
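The response fields are strings, such as `"25000 NGN"` and `"1:00 PM"`, so a client usually converts them before sorting or comparing. The project lists Pandas for cleaning, but this sketch uses only the standard library instead. It assumes every price has the form `<amount> <currency>` and every time uses the 12-hour `H:MM AM/PM` format, as in the sample response. The function names and the added `price_amount`, `currency`, `departure`, and `arrival` keys are hypothetical.

```python
import json
from datetime import datetime, time
from typing import Any, Dict, List, Tuple


def parse_price(price: str) -> Tuple[int, str]:
    """Split "25000 NGN" into (25000, "NGN"); thousands commas are tolerated."""
    amount, currency = price.split()
    return int(amount.replace(",", "")), currency


def parse_time(value: str) -> time:
    """Parse a 12-hour clock time such as "1:00 PM" into a datetime.time."""
    return datetime.strptime(value, "%I:%M %p").time()


def normalize_flights(raw_json: str) -> List[Dict[str, Any]]:
    """Add typed fields to each flight and sort the list from cheapest.

    Sorting by amount assumes all prices share one currency.
    """
    flights = []
    for item in json.loads(raw_json):
        amount, currency = parse_price(item["price"])
        flights.append({
            **item,  # keep the original string fields untouched
            "price_amount": amount,
            "currency": currency,
            "departure": parse_time(item["departure_time"]),
            "arrival": parse_time(item["arrival_time"]),
        })
    return sorted(flights, key=lambda flight: flight["price_amount"])
```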
- Build and package the application:
docker build -t airport-scraper-api .
docker run -v "$(pwd)":/app airport-scraper-api
- Deploy using the AWS CLI:
aws lambda update-function-code --function-name airport-scraper-api --zip-file fileb://deployment-package.zip
- Build the Docker image:
docker build -t airport-scraper-api .
- Run the Docker container:
docker run -p 8000:8000 airport-scraper-api
- Add caching mechanisms for faster repeated queries.
- Integrate more airlines for comprehensive data coverage.
- Use Selenium to scrape dynamic (JavaScript-rendered) websites.
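For the caching item, one simple option is an in-memory cache keyed on the search parameters. Entries expire after a fixed time so cached prices do not go stale. This is a sketch, not a planned design. The `TTLCache` class is hypothetical and not thread-safe. On AWS Lambda it would only persist while a warm execution environment is reused.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Minimal in-memory cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self._clock = clock  # injectable so expiry can be tested
        self._store: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        """Return a fresh cached value, or call `compute` and store its result."""
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]
        # Missing or expired: recompute and overwrite the old entry.
        value = compute()
        self._store[key] = (now, value)
        return value
```

In the endpoint, the key could be the tuple `(departing, arrival, departure_date, return_date, trip_type)`, and `compute` could be the scraping call. Expired entries are only replaced when the same key is requested again, so a long-running process may also need periodic cleanup.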