Prerequisites:
This scraper was created from the Crawlee template TypeScript + CheerioCrawler using the command:
npx crawlee create my-crawler
Install dependencies:
npm ci
Run the scraper:
npm start
This scraper was created from the Apify template TypeScript + CheerioCrawler using the command:
apify create tripadvisor-actor
Install dependencies:
npm ci
Initialize the Apify storage directory:
apify init
The apify init command generates an empty input file, which should be located at storage/key_value_stores/default/INPUT.json. Fill it with a JSON object in the following format:
{
    "startUrls": [
        "https://www.tripadvisor.com/Attractions-g274707-Activities-oa0-Prague_Bohemia.html",
        "https://www.tripadvisor.com/Attractions-g274707-Activities-oa30-Prague_Bohemia.html"
    ]
}
Provide at least one URL of a Tripadvisor attraction listing page, such as the Prague attractions pages in the example above.
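To catch a malformed input file before the crawler starts, the input can be validated up front. The following is a minimal sketch, not part of the generated template: parseInput is a hypothetical helper, and the rule that a listing URL lives on a tripadvisor.com host with a path starting with /Attractions- is an assumption inferred from the example URLs above.

```typescript
// Hypothetical helper (not part of the original project): validates the
// content of INPUT.json before any requests are enqueued.
interface Input {
    startUrls: string[];
}

function parseInput(raw: string): Input {
    const data: unknown = JSON.parse(raw);
    if (typeof data !== 'object' || data === null) {
        throw new Error('Input must be a JSON object.');
    }

    const { startUrls } = data as { startUrls?: unknown };
    if (!Array.isArray(startUrls) || startUrls.length === 0) {
        throw new Error('Input must contain at least one URL in "startUrls".');
    }

    for (const url of startUrls) {
        if (typeof url !== 'string') {
            throw new Error('Every item in "startUrls" must be a string.');
        }
        // new URL() throws on malformed URLs.
        const { hostname, pathname } = new URL(url);
        // Assumption: attraction listing pages look like the example URLs,
        // i.e. a tripadvisor.com host and a path starting with /Attractions-.
        const isTripadvisor = hostname === 'tripadvisor.com' || hostname.endsWith('.tripadvisor.com');
        if (!isTripadvisor || !pathname.startsWith('/Attractions-')) {
            throw new Error(`Not a Tripadvisor attraction listing URL: ${url}`);
        }
    }

    return { startUrls: startUrls as string[] };
}
```

A call such as parseInput(fileContents) returns the typed input or throws with a message naming the problem, so the run fails early instead of crawling nothing.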
Results will be stored in the storage/datasets/default directory. Each dataset item will have its own JSON file.
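Because each item is a separate JSON file, the results of a local run can be collected with a few lines of Node.js. This is only a sketch under stated assumptions: loadDatasetItems is a hypothetical helper, and skipping files whose names start with "__" is a precaution in case the storage directory also contains metadata files.

```typescript
import * as fs from 'node:fs';
import * as path from 'node:path';

// Hypothetical helper (not part of the original project): reads every
// dataset item from a local dataset directory into one array.
function loadDatasetItems(datasetDir: string = 'storage/datasets/default'): unknown[] {
    return fs.readdirSync(datasetDir)
        // Keep only item files; the "__" check is a precaution against
        // metadata files and is an assumption, not documented behavior.
        .filter((name) => name.endsWith('.json') && !name.startsWith('__'))
        // Sort by file name so items come back in a stable order.
        .sort()
        .map((name) => JSON.parse(fs.readFileSync(path.join(datasetDir, name), 'utf8')));
}
```

Calling loadDatasetItems() with no argument reads from the default directory named above; pass another path to read a different dataset.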
Run the actor locally with the command below. If you omit the -p (--purge) flag, the storage won't be cleared before your next run, so any requests already processed in an earlier run will be considered completed and will not be processed again.
apify run -p
You can deploy the actor to your Apify account with the following command:
apify push
Alternatively, you can provide a link to a GitHub or GitLab repository and build the project on the Apify platform. The up-to-date code will be fetched from the remote repository.