A CLI tool that helps with scraping by finding hidden APIs.
We all know what it is like to open the Network tab in the browser's developer tools and spend valuable time hunting for an API that returns the data we need.
Api Hunter CLI is a command-line tool that automates this process: give it the URL of the page you are trying to scrape and, within a few seconds, it lists the possible hidden APIs and their responses.
First you need to install the package by running:
pip install api-hunter-cli
Then install Chromium, which the CLI tool needs in order to run. api-hunter-cli has your back; just run:
api-hunter-cli-post-install
If everything went well, you should see the message 'Chromium installation for Playwright is successful.'
From a regular terminal such as bash or cmd, run:
apihunter <page-url-to-analyze>
If you are running from a notebook such as Jupyter (local, Google Colab, Kaggle, Databricks), I recommend running this instead:
!apihunter <page-url-to-analyze> --no-style --verbose-response
This is because some notebooks don't support the formatting you would get in a regular terminal.
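If you would rather launch the tool from a Python script than type the command by hand, a minimal sketch could look like the one below. It assumes the `apihunter` executable is on your PATH and uses only the two flags shown above; `build_apihunter_command` and `run_apihunter` are hypothetical helper names, not part of the package.

```python
import subprocess


def build_apihunter_command(page_url, notebook_friendly=True):
    """Build the argument list for the apihunter command (hypothetical helper)."""
    command = ["apihunter", page_url]
    if notebook_friendly:
        # Same flags as the notebook invocation shown above.
        command += ["--no-style", "--verbose-response"]
    return command


def run_apihunter(page_url, notebook_friendly=True):
    """Run apihunter and return its exit code; assumes the tool is installed."""
    completed = subprocess.run(build_apihunter_command(page_url, notebook_friendly))
    return completed.returncode
```

Calling `run_apihunter("<page-url-to-analyze>")` would then start the same analysis as the command line. Passing the arguments as a list avoids shell quoting problems with URLs that contain `&` or `?`.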
If the tool finds any possible hidden API (an API that returns JSON), it creates these files in your working directory:
- results_responses.json contains all the responses from the APIs, ready for you to analyze.
- results_urls.txt contains all the URLs that responded with JSON when requested.
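To start exploring these two files from Python, something like the following sketch may help. It assumes results_urls.txt holds one URL per line and that results_responses.json is valid JSON; the inner layout of the responses file is not documented here, so the code only reports its top-level shape. All function names are hypothetical.

```python
import json
from pathlib import Path


def load_result_urls(path="results_urls.txt"):
    """Return the non-empty lines of the URL file (assumes one URL per line)."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def load_result_responses(path="results_responses.json"):
    """Parse the responses file without assuming anything about its schema."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def describe(data):
    """Summarize the top-level shape of parsed JSON."""
    if isinstance(data, dict):
        return f"object with {len(data)} keys"
    if isinstance(data, list):
        return f"array with {len(data)} items"
    return type(data).__name__
```

For example, `describe(load_result_responses())` gives a quick idea of how the responses are organized before you dig into individual entries.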
Planned features:
- Find APIs that use other HTTP methods (currently the tool only supports GET).
- Find APIs whose responses are not JSON (for example, APIs that return XML).
- Filter APIs by keyword, so that you only get APIs whose URL contains the keyword.
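Assuming keyword filtering is not yet built into the tool, a similar effect can be approximated by post-processing the URL list yourself. The sketch below is not part of api-hunter-cli; `filter_urls_by_keyword` is a hypothetical helper, and it assumes the URLs have already been read into a list of strings.

```python
def filter_urls_by_keyword(urls, keyword):
    """Keep only the URLs that contain the keyword, ignoring letter case."""
    needle = keyword.lower()
    return [url for url in urls if needle in url.lower()]
```

The comparison is case-insensitive on purpose, since API paths often mix cases (for example `/api/Products` versus `/api/products`).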