Tiki API Data to PostgreSQL


Overview

This repository provides a Python script (main.py) that fetches product data from the Tiki API and saves it into a PostgreSQL database. It is designed for stability, resumability, and scalability, and can process hundreds of thousands of API calls while handling errors gracefully.


Features

  • 🔄 Reads product IDs from a CSV input file
  • 🚀 Fetches live product data from Tiki API endpoints
  • 🧾 Saves results directly to PostgreSQL
  • ⚠️ Logs all failed requests and exceptions
  • ♻️ Supports resuming partially completed runs
  • ⚙️ Compatible with Supervisord for continuous background operation
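
The resume feature can be approximated as: query the IDs already stored, then fetch only the rest. A minimal sketch, assuming a `products` table with a `product_id` column (illustrative names, not necessarily the repository's actual schema):

```python
def remaining_ids(all_ids, cur):
    """Return the product IDs not yet present in the database,
    so a restarted run skips work it already finished."""
    cur.execute("SELECT product_id FROM products")
    done = {row[0] for row in cur.fetchall()}
    return [pid for pid in all_ids if pid not in done]
```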

Project Structure

| File | Description |
| --- | --- |
| main.py | 🐍 Main Python script to fetch API data |
| product_id.csv | 📄 Input list of product IDs |
| database.ini | 🗄️ PostgreSQL configuration file |
| requirements.txt | 📦 Python dependencies |
| supervisord.conf | 🛠️ Supervisor configuration for background execution |

Setup

Before running the script, you need to prepare your environment and database:

  1. Install PostgreSQL. Make sure you have a PostgreSQL server running and can create a database for this project.

  2. Create a database. Example:

```sql
CREATE DATABASE tiki_api_data;
CREATE USER your_user WITH PASSWORD 'your_password';
GRANT ALL PRIVILEGES ON DATABASE tiki_api_data TO your_user;
```

  3. Create the database configuration file (database.ini):

```ini
[postgresql_tiki]
host=localhost
port=5432
database=tiki_api_data
user=your_user
password=your_password
```

  4. Optional helper script (connect.py)

  • You don’t strictly need a separate connect.py if your main script reads database.ini and opens a connection internally.

  • If you prefer modularity, you can create a connect.py file:

```python
import psycopg2
from configparser import ConfigParser

def connect(section='postgresql_tiki'):
    """Read database.ini and open a PostgreSQL connection."""
    parser = ConfigParser()
    parser.read('database.ini')
    db = dict(parser[section])
    return psycopg2.connect(**db)
```

Then main.py can import connect() from this file.
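
Before wiring the connection into main.py, it can help to sanity-check that database.ini parses and contains the expected keys. A small sketch (the section name and keys match the example above; the function name is illustrative):

```python
from configparser import ConfigParser

REQUIRED = ("host", "port", "database", "user", "password")

def load_db_config(path="database.ini", section="postgresql_tiki"):
    """Parse the INI file and fail early with a clear error
    if the section or any required key is missing."""
    parser = ConfigParser()
    parser.read(path)
    if section not in parser:
        raise KeyError(f"section [{section}] missing in {path}")
    cfg = dict(parser[section])
    missing = [k for k in REQUIRED if k not in cfg]
    if missing:
        raise KeyError(f"missing keys in [{section}]: {missing}")
    return cfg
```

Failing at startup with a named missing key is much easier to debug than a psycopg2 connection error deep inside the crawl loop.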


Installation

Clone the repository:

```shell
git clone https://github.com/ndlryan/API-Data-with-Postgres.git
cd API-Data-with-Postgres
```

Install dependencies:

```shell
pip install -r requirements.txt
```

Running the Crawler

Run directly from terminal:

```shell
python main.py
```

This will:

  1. Load product IDs from product_id.csv
  2. Fetch product details from Tiki API
  3. Save results into PostgreSQL
  4. Record any failed requests or exceptions in logs
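
The four steps above can be sketched as a single loop; the callable names here are illustrative, not the repository's actual functions:

```python
def crawl(product_ids, fetch_product, save_record, log_error):
    """Minimal sketch of the fetch/save/log loop: fetch each ID,
    persist the result, and record failures without aborting the run."""
    good = failed = 0
    for pid in product_ids:
        try:
            record = fetch_product(pid)   # call the Tiki API for this ID
            save_record(record)           # insert the result into PostgreSQL
            good += 1
        except Exception as exc:          # e.g. HTTP 404 raised by fetch_product
            log_error(pid, exc)
            failed += 1
    return good, failed
```

Because each failure is caught and logged per ID, one bad product cannot take down a 200,000-call run.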

Process Management with Supervisord

For long-running or auto-restarting crawls, you can manage the crawler with Supervisord.

1. Install Supervisor

```shell
pip install supervisor
```

2. Create Configuration File

```ini
[unix_http_server]
file=/tmp/supervisor.sock

[supervisord]
logfile=supervisord.log
pidfile=/tmp/supervisord.pid
childlogdir=./logs

[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface

[supervisorctl]
serverurl=unix:///tmp/supervisor.sock

[program:api_data]
command=python3 /path/to/API-Data-with-Postgres/main.py
directory=/path/to/API-Data-with-Postgres
autostart=true
autorestart=true
stderr_logfile=./logs/api_data.err.log
stdout_logfile=./logs/api_data.out.log
```

  • 🔧 Replace /path/to/API-Data-with-Postgres with your actual project path.

3. Start and Monitor

```shell
supervisord -c supervisord.conf
supervisorctl -c supervisord.conf status
```

Restart or stop the crawler anytime:

```shell
supervisorctl -c supervisord.conf restart api_data
supervisorctl -c supervisord.conf stop api_data
```

Logs and Outputs

  • Database: PostgreSQL (records inserted from the API)

  • Error logs: capture exceptions and 404 responses

  • Supervisor logs: stored under ./logs/ when using supervisord.conf


Summary

Est. runtime: ~1 hour
Total processed: 200,000
    - Good records (including those with missing fields): 198,942
    - Exceptions (404 Not Found): 1,058

Notes

  • Ensure database.ini has correct credentials

  • Running the script multiple times does not duplicate records

  • Use Supervisor to prevent downtime and data loss
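
One common way the "no duplicate records" guarantee is achieved in PostgreSQL is an idempotent upsert keyed on the product ID. A sketch of that pattern; the table and column names here are assumptions, not necessarily the repository's actual schema:

```python
# Idempotent upsert: re-running the crawl updates existing rows
# instead of inserting duplicates.
UPSERT_SQL = """
INSERT INTO products (product_id, name, price)
VALUES (%s, %s, %s)
ON CONFLICT (product_id) DO UPDATE
SET name = EXCLUDED.name,
    price = EXCLUDED.price;
"""

def save_record(cur, record):
    """Write one API record through an open psycopg2 cursor."""
    cur.execute(UPSERT_SQL, (record["id"], record["name"], record["price"]))
```

This requires a unique constraint (or primary key) on product_id so that ON CONFLICT has a target.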


Author

Ryan
GitHub Profile

A robust, fault-tolerant Tiki API data loader — lightweight, automated, and production-ready.

