
TripAdvisor Hotel Reviews Crawler and Statistical Analysis


rvitorgomes/tripadvisor-crawler



TripAdvisor Scraper

Scrape millions of hotel reviews.

About | Content | Getting Started | How To Use | License

About

This project is part of an academic monograph. It collects a corpus for analysing the statistical distribution of diacritic errors in European languages with a high frequency of accented characters, and for comparing those distributions with Brazilian Portuguese.
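To make the kind of measurement concrete, here is a minimal sketch of how the share of accented letters in a review could be computed. It is only an illustration, not the method used in the monograph: the function name `diacritic_ratio` is hypothetical, and it assumes a "diacritic" is any letter that decomposes into a base letter plus a combining mark under Unicode normalization.

```python
import unicodedata


def diacritic_ratio(text: str) -> float:
    """Return the share of alphabetic characters that carry a combining mark.

    Assumption: letters without a Unicode decomposition (for example the
    Turkish dotless i) are counted as plain letters.
    """
    # Compose first so that "e" + combining accent is treated as one letter.
    text = unicodedata.normalize("NFC", text)
    letters = 0
    accented = 0
    for ch in text:
        if not ch.isalpha():
            continue
        letters += 1
        # NFD splits a precomposed letter into base letter + combining marks.
        decomposed = unicodedata.normalize("NFD", ch)
        if any(unicodedata.combining(c) for c in decomposed):
            accented += 1
    return accented / letters if letters else 0.0


if __name__ == "__main__":
    print(diacritic_ratio("città è"))
```

Comparing this ratio between a reference text and user-written reviews would be one rough way to spot missing accents, under the assumptions stated above.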

Content

  • Scrapers written in Python3

    • Italian
    • Turkish
  • Scrapers written in NodeJS

    • Hungarian
    • French

Getting Started

To clone and run this application, you'll need Git, Conda, Python 3, Node.js, and npm installed on your computer.

How To Use

From your command line:

# Clone this repository
$ git clone https://github.com/rvitorgomes/textCrawler tripadvisor-crawler

# Go into the repository
$ cd tripadvisor-crawler

# Check for dependencies
$ conda --version; python --version; node --version; npm --version

# Install dependencies
$ npm install

# Change directory
$ cd tcc

# Create and activate a new conda environment
$ conda create -n crawler; conda activate crawler

# Install scrapy
$ conda install scrapy

# Run one of the crawlers and watch the magic
$ scrapy runspider tcc/italian.py
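The dependency check in the commands above can also be done from Python. The following is a minimal sketch, not part of the project: the `missing_tools` helper and the `REQUIRED_TOOLS` list are hypothetical names, and `git` is included only on the assumption that it is needed for the clone step.

```python
import shutil

# Tools invoked by the commands above; "git" is assumed because of the clone step.
REQUIRED_TOOLS = ["git", "conda", "python", "node", "npm"]


def missing_tools(tools):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

This only confirms that each executable is reachable; it does not check versions.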

Common Errors

Check that the latest WebDriver for Firefox (geckodriver.exe) is in the project root. If it is missing, download it from https://github.com/mozilla/geckodriver/releases
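A quick way to verify this before running a crawler is sketched below. The `find_geckodriver` helper is hypothetical and not part of the project; falling back to a `geckodriver` found on PATH is an added assumption, since the project itself only mentions the project root.

```python
import shutil
from pathlib import Path


def find_geckodriver(project_root="."):
    """Return the path to geckodriver in project_root or on PATH, else None."""
    root = Path(project_root)
    # Look in the project root first, as this README recommends.
    for name in ("geckodriver.exe", "geckodriver"):
        candidate = root / name
        if candidate.is_file():
            return str(candidate)
    # Assumption: a driver installed on PATH is an acceptable fallback.
    return shutil.which("geckodriver")


if __name__ == "__main__":
    location = find_geckodriver()
    if location is None:
        print("geckodriver not found; download it from the releases page.")
    else:
        print("Using geckodriver at", location)
```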

License

This project is licensed under the Unlicense.
Not for commercial use.
For academic use or citation, contact me for instructions.


GitHub @rvitorgomes
LinkedIn Rubens Gomes
Email rvitorgomes@gmail.com