A multithreaded tool for mining data from APIs with configurable endpoints.
- Multithreaded API requests
- Configurable endpoints
- Retry logic with rotating proxies
- Progress tracking
- Data storage to disk or database
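At its core, the tool fans requests out across a pool of worker threads and retries failures before handing records off to storage. The real logic lives in utils/ and example_client/; the snippet below is only a simplified sketch of that pattern, using illustrative names (fetch_page, NUM_THREADS, MAX_RETRIES) and a placeholder URL rather than the tool's actual code.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

import requests

# Illustrative defaults; the tool's real settings live in main.py.
NUM_THREADS = 8
MAX_RETRIES = 3

def fetch_page(url: str) -> dict:
    """Fetch one URL, retrying with simple exponential backoff on failure."""
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            resp = requests.get(url, timeout=10)
            resp.raise_for_status()
            return resp.json()
        except requests.RequestException:
            if attempt == MAX_RETRIES:
                raise
            time.sleep(2 ** attempt)  # back off before the next attempt

# Placeholder URLs; the real targets come from the endpoint configuration.
urls = [f"https://api.example.com/locations?page={i}" for i in range(10)]

with ThreadPoolExecutor(max_workers=NUM_THREADS) as pool:
    futures = {pool.submit(fetch_page, u): u for u in urls}
    for future in as_completed(futures):
        record = future.result()  # raises if every retry failed
        # ...hand the record off to storage (disk or database)
```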
Install the required Python dependencies:
pip install -r requirements.txt
Create a .env file in the root directory and configure the required variables:
SCRAPOXY_USER=<your_scrapoxy_user>
SCRAPOXY_TOKEN=<your_scrapoxy_token>
SCRAPOXY_PORT=<your_scrapoxy_port>
SCRAPOXY_URL=<your_scrapoxy_url>
SCRAPOXY_CRT=<optional_certificate_path>
DEFAULT_ENDPOINT=locations
DUCKDB_TOKEN=<your_duckdb_token>
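These variables are read from the environment at runtime. As a rough illustration, assuming python-dotenv (a common choice that may or may not match what requirements.txt actually pins), they can be loaded like this:

```python
import os

from dotenv import load_dotenv  # assumes python-dotenv is available

load_dotenv()  # reads the .env file from the current working directory

scrapoxy_user = os.getenv("SCRAPOXY_USER")
scrapoxy_token = os.getenv("SCRAPOXY_TOKEN")
scrapoxy_port = os.getenv("SCRAPOXY_PORT")
scrapoxy_url = os.getenv("SCRAPOXY_URL")
scrapoxy_crt = os.getenv("SCRAPOXY_CRT")          # optional certificate path
default_endpoint = os.getenv("DEFAULT_ENDPOINT", "locations")
duckdb_token = os.getenv("DUCKDB_TOKEN")
```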
Scrapoxy is recommended for proxy management. Follow these steps to install and configure it; for the latest instructions, see the Scrapoxy docs.
- Install Docker if it is not already installed:
brew install --cask docker
- Pull the Scrapoxy Docker image:
docker pull scrapoxy/scrapoxy
- Run the Scrapoxy container:
docker run -d -p 8888:8888 -p 8890:8890 -v ./scrapoxy:/cfg -e AUTH_LOCAL_USERNAME=admin -e AUTH_LOCAL_PASSWORD=password -e BACKEND_JWT_SECRET=secret1 -e FRONTEND_JWT_SECRET=secret2 -e STORAGE_FILE_FILENAME=/cfg/scrapoxy.json scrapoxy/scrapoxy
- Access the Scrapoxy dashboard: open your browser and navigate to http://localhost:8888, then log in with your Scrapoxy credentials.
- Configure your cloud provider using Scrapoxy's docs.
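Once Scrapoxy is running, the tool's HTTP traffic is routed through its proxy endpoint using the credentials from .env. The snippet below is a minimal sketch of that pattern with the requests library; how SCRAPOXY_URL and SCRAPOXY_PORT combine into the proxy address, and the target URL itself, are assumptions made for illustration, and the real client logic lives in example_client/.

```python
import os

import requests

user = os.getenv("SCRAPOXY_USER")
token = os.getenv("SCRAPOXY_TOKEN")
host = os.getenv("SCRAPOXY_URL", "localhost")   # assumed to be the proxy host
port = os.getenv("SCRAPOXY_PORT", "8888")
ca_cert = os.getenv("SCRAPOXY_CRT")             # optional Scrapoxy CA certificate

# Route both HTTP and HTTPS traffic through the Scrapoxy proxy with basic auth.
proxy = f"http://{user}:{token}@{host}:{port}"
proxies = {"http": proxy, "https": proxy}

resp = requests.get(
    "https://api.example.com/locations",        # placeholder target URL
    proxies=proxies,
    verify=ca_cert if ca_cert else False,       # CA cert if configured, else skip TLS checks
    timeout=30,
)
print(resp.status_code)
```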
Run the tool with the desired endpoint:
python main.py --endpoint locations
The project is laid out as follows:
- example_client/: API client logic and endpoint configurations.
- utils/: Utility modules for logging, retries, and workers.
- main.py: Entry point for the tool.
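Endpoint definitions live in example_client/. Their exact format is project-specific and not documented here, but conceptually an endpoint entry pairs a name with a URL template, default parameters, and a parser, roughly like the hypothetical sketch below (every name in it is illustrative, not the actual configuration):

```python
# Hypothetical shape of an endpoint configuration; the real definitions in
# example_client/ may look quite different.
ENDPOINTS = {
    "locations": {
        "url": "https://api.example.com/v1/locations",
        "params": {"page_size": 100},
        "parser": lambda payload: payload.get("results", []),
    },
}

def build_request(endpoint_name: str, page: int) -> tuple[str, dict]:
    """Return the URL and query parameters for one page of an endpoint."""
    cfg = ENDPOINTS[endpoint_name]
    params = dict(cfg["params"], page=page)
    return cfg["url"], params
```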
- Ensure that Scrapoxy is running and properly configured before starting the tool.
- You can customize the number of threads, batch size, and maximum records by modifying the global variables in main.py.
- If you encounter issues with proxies, verify that Scrapoxy is running and the .env file is correctly configured.
- For database-related issues, ensure that the DuckDB connection string is valid and the database file is accessible; a quick connectivity check like the sketch below can help.
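The following is a minimal sanity check of the database side: open the DuckDB file directly and run a trivial query. The path is a placeholder for whatever your configuration actually points at.

```python
import duckdb

# Replace the path with the database file your configuration actually uses.
con = duckdb.connect("data/api_data.duckdb")
print(con.execute("SELECT 1").fetchone())  # (1,) means the file is reachable
con.close()
```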
- Implement more robust error handling and retries for database write operations.
- Add unit tests for core components (e.g., client functions, parsers, worker logic).
- Add configuration to allow easier selection of different data storage backends (e.g., PostgreSQL, local files only).
- Improve documentation on how to add and configure new API endpoints.
- Add support for different output formats.
- Implement stateful job resumption so that interrupted runs can be picked up where they left off.
- Improve logging so that individual requests are easier to trace.
- Explore options for dynamic scaling of worker threads based on workload.