ma3u/BookAwardsAgent

Book Awards Agent

A comprehensive tool for discovering, tracking, and managing book awards. The application automatically searches for book awards, extracts detailed information, and updates an Airtable base. It combines Python for data processing and Node.js for the web interface.

Key Features

  • Automated Discovery: Find book awards from various sources
  • Data Extraction: Extract comprehensive award details including deadlines, eligibility, and submission guidelines
  • Airtable Integration: Keep your awards database up-to-date automatically
  • User-friendly Interface: Easy-to-use web interface for managing awards
  • Customizable Search: Tailor your search with specific criteria

Prerequisites

  • Python 3.8 or higher
  • Node.js 14.x or higher
  • An Airtable account with API access
  • Git

Installation

  1. Clone the repository:

    git clone https://github.com/Ma3u/BookAwardsAgent.git
    cd BookAwardsAgent
  2. Set up Python environment:

    # Create and activate virtual environment
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    
    # Install Python dependencies
    cd backend/python
    pip install -r requirements.txt
  3. Set up Node.js environment:

    cd ../../backend/js
    npm install
  4. Configure environment variables:

    cp ../../.env.example ../../.env
    # Edit ../../.env with your Airtable credentials

Configuration

Environment Variables

Create a .env file in the project root with these variables:

# Airtable Configuration
AIRTABLE_API_KEY=your_api_key_here
AIRTABLE_BASE_ID=your_base_id_here
AIRTABLE_TABLE_NAME=Book Awards

For Users

Getting Started

  1. Installation: Follow the installation steps in the Prerequisites and Installation sections
  2. Configuration: Set up your .env file with Airtable credentials
  3. Run the Application: Use the Python backend to start collecting award data

Python Backend Usage

Input File Formats

The Python backend supports two types of input files via the --input-file argument:

1. URL List (Plain Text)

  • Use with: Standard processing (python -m src.main --input-file input_template.txt)
  • Format: Each line contains a single award URL. Blank lines and lines starting with # are ignored.
  • Template: See input_template.txt
  • Status Tracking:
    • Each URL line ends with a status comment: # pending, # completed, or # failed.
    • The workflow automatically updates the status after each attempt:
      • # completed if processed successfully
      • # failed if extraction or update fails
      • # pending for unprocessed URLs
    • The file is updated in-place, so you can monitor progress or resume processing at any time.
  • Usage Example:
    # From the backend/python directory:
    python -m src.main --input-file input_template.txt
  • Example Entry:
    https://www.bookerprizes.com/  # pending
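
The status-tracking format described above can be read and rewritten with a few lines of Python. This is a minimal sketch, not the project's actual parser: the names `parse_url_line` and `set_status` are hypothetical, and treating a URL without a status comment as pending is an assumption.

```python
from typing import Optional, Tuple

VALID_STATUSES = {"pending", "completed", "failed"}


def parse_url_line(line: str) -> Optional[Tuple[str, str]]:
    """Return (url, status) for a URL line, or None for blank and comment lines."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    # Look at the text after the last "#": it is a status only if it is a known word.
    head, separator, tail = stripped.rpartition("#")
    status = tail.strip().lower()
    if separator and status in VALID_STATUSES:
        return head.strip(), status
    # Assumption: a URL without a recognised status comment counts as pending.
    return stripped, "pending"


def set_status(line: str, status: str) -> str:
    """Rewrite a URL line so that it ends with the given status comment."""
    parsed = parse_url_line(line)
    if parsed is None or status not in VALID_STATUSES:
        return line  # leave blank lines, comments, and unknown statuses untouched
    return f"{parsed[0]}  # {status}"
```

Splitting on the last `#` rather than the first keeps URLs that contain a fragment (such as `https://example.com/a#rules`) intact.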
    

2. Award Data (JSON)

  • Use with: Update-only mode (python -m src.main --update-only --input-file book_awards_data.json)
  • Format: A JSON file containing a list of award data dictionaries. Each dictionary should match the expected Airtable schema.
  • Usage Example:
    # From the backend/python directory:
    python -m src.main --update-only --input-file ../book_awards_data.json
  • Example (add further schema fields to each object as required):
    [
      {
        "Award Name": "Example Award",
        "Award Website": "https://example.com/award1",
        "Category": "Fiction",
        "Entry Deadline": "2025-06-30",
        "Eligibility Criteria": "Open to all authors"
      },
      {
        "Award Name": "Second Award",
        "Award Website": "https://example.com/award2"
      }
    ]
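
Before running update-only mode, it can help to check that a data file has the expected shape. The sketch below is an illustration under assumptions, not part of the project: `load_awards` is a hypothetical helper, and treating "Award Name" and "Award Website" as the required minimum is an assumption. Note that `json.loads` rejects `//` comments, so placeholder comments must not appear in a real data file.

```python
import json
from typing import Any, Dict, List

# Assumption: these two fields are the minimum for a usable record.
REQUIRED_FIELDS = ("Award Name", "Award Website")


def load_awards(text: str) -> List[Dict[str, Any]]:
    """Parse JSON text and check that it is a list of award dictionaries."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("expected a JSON list of award objects")
    for index, award in enumerate(data):
        if not isinstance(award, dict):
            raise ValueError(f"entry {index} is not an object")
        # Empty strings are treated the same as missing fields.
        missing = [name for name in REQUIRED_FIELDS if not award.get(name)]
        if missing:
            raise ValueError(f"entry {index} is missing: {', '.join(missing)}")
    return data
```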

Choosing the Input File Format

  • Use the plain text URL list for discovering and extracting new awards.
  • Use the JSON format for updating Airtable from pre-extracted or manually curated data.

Running with an Input File

  • Process URLs (from the backend/python directory):

    python -m src.main --input-file input_template.txt
  • Update from Data File:

    python -m src.main --update-only --input-file ../book_awards_data.json

See the template in input_template.txt for the plain text format.

The Python backend provides the core functionality for searching and processing book awards. The main run modes are:

# Basic usage: search for awards and update Airtable
python -m src.main

# Search for awards without updating Airtable
python -m src.main --search-only

# Process specific URLs from a file
python -m src.main --input-file urls.txt

# Update Airtable from existing data file
python -m src.main --update-only --input-file book_awards_data.json
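
For orientation, the flags above could be declared with `argparse` roughly as follows. This is a sketch under assumptions and not the project's actual argument parser: making `--search-only` and `--update-only` mutually exclusive, and requiring `--input-file` with `--update-only`, are assumptions inferred from the usage examples.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Declare the three documented flags."""
    parser = argparse.ArgumentParser(prog="python -m src.main")
    # Assumption: the two modes cannot be combined in one run.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--search-only", action="store_true",
                      help="search for awards without updating Airtable")
    mode.add_argument("--update-only", action="store_true",
                      help="update Airtable from an existing data file")
    parser.add_argument("--input-file",
                        help="URL list (plain text) or award data (JSON)")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse arguments and apply the cross-flag check."""
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumption: update-only mode has nothing to do without a data file.
    if args.update_only and not args.input_file:
        parser.error("--update-only requires --input-file")
    return args
```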

Node.js Backend

The Node.js backend provides a web interface for managing awards:

# Test configuration
cd backend/js
node test-config.js

# Start the web server (if implemented)
npm start

For Developers

Project Structure

BookAwardsAgent/
├── backend/
│   ├── js/                 # Node.js backend
│   │   ├── config.js       # Configuration loader
│   │   ├── test-config.js  # Configuration test
│   │   └── package.json    # Node.js dependencies
│   └── python/             # Python backend
│       ├── src/            # Python source code
│       │   ├── config.py   # Configuration settings
│       │   ├── websearch.py
│       │   ├── extractor.py
│       │   ├── airtable_updater.py
│       │   └── main.py
│       └── requirements.txt # Python dependencies
├── .env.example           # Example environment variables
├── .gitignore            # Git ignore file
└── README.md             # This documentation

Core Components

1. Web Search Module (websearch.py)

  • Purpose: Searches for book awards using the DuckDuckGo API
  • Key Features:
    • Performs web searches using predefined queries
    • Filters results to find relevant book awards
    • Handles rate limiting and error cases
    • Removes duplicate results
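
Duplicate removal is the easiest of these steps to illustrate in isolation. The following sketch shows one possible approach, not the logic actually used in websearch.py: `normalize_url` and `deduplicate` are hypothetical names, and ignoring the scheme, a leading "www.", the query string, and trailing slashes when comparing URLs is an assumption.

```python
from typing import Iterable, List
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Build a comparison key: lowercase host without 'www.', path without trailing slash."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    return f"{host}{path}"


def deduplicate(urls: Iterable[str]) -> List[str]:
    """Keep the first occurrence of each normalized URL, preserving input order."""
    seen = set()
    unique = []
    for url in urls:
        key = normalize_url(url)
        if key and key not in seen:
            seen.add(key)
            unique.append(url)
    return unique
```

With this key, `https://www.bookerprizes.com/` and `http://bookerprizes.com` count as the same result.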

2. Data Extractor (extractor.py)

  • Purpose: Extracts detailed information from award websites
  • Key Features:
    • Parses HTML content to extract award details
    • Handles various website structures
    • Extracts contact information, deadlines, and submission guidelines
    • Normalizes data for consistent storage
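
As a rough illustration of HTML-based extraction, the sketch below pulls two fields out of a page using only the standard library. It is not the code in extractor.py: `extract_basic_fields` is a hypothetical helper, using the page title as the award name is a simplifying assumption, and the email pattern is deliberately loose.

```python
import re
from html.parser import HTMLParser
from typing import Dict, List, Optional

EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class _TextCollector(HTMLParser):
    """Collects the <title> text and the visible text of a page."""

    def __init__(self) -> None:
        super().__init__()
        self.title_parts: List[str] = []
        self.text_parts: List[str] = []
        self._in_title = False
        self._skip_depth = 0  # greater than zero while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)
        elif not self._skip_depth:
            self.text_parts.append(data)


def extract_basic_fields(html: str) -> Dict[str, Optional[str]]:
    """Return the page title and the first email address found, if any."""
    collector = _TextCollector()
    collector.feed(html)
    # Collapse runs of whitespace so the values are stored consistently.
    text = " ".join(" ".join(collector.text_parts).split())
    title = " ".join("".join(collector.title_parts).split())
    email = EMAIL_PATTERN.search(text)
    return {
        "Award Name": title or None,  # assumption: the title names the award
        "Contact Email": email.group(0) if email else None,
    }
```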

3. Airtable Updater (airtable_updater.py)

  • Purpose: Manages interaction with Airtable
  • Key Features:
    • Creates, updates, and deletes records in Airtable
    • Handles batch operations for better performance
    • Implements error handling and retry logic
    • Tracks changes and updates
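
Batching and retrying can be sketched independently of any Airtable client library. In the example below, `send` is a hypothetical callable standing in for whatever function performs the real request; it is not the API of airtable_updater.py. The batch size of 10 reflects the Airtable REST API's per-request record limit, while the retry count and exponential backoff are assumptions.

```python
import time
from typing import Callable, Dict, Iterator, List, Sequence

Record = Dict[str, object]
BATCH_SIZE = 10  # the Airtable REST API accepts at most 10 records per request


def chunked(records: Sequence[Record], size: int = BATCH_SIZE) -> Iterator[List[Record]]:
    """Yield consecutive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield list(records[start:start + size])


def send_with_retry(send: Callable[[List[Record]], None], batch: List[Record],
                    attempts: int = 3, base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> None:
    """Call send(batch); on failure wait 1x, 2x, 4x... base_delay and try again."""
    for attempt in range(1, attempts + 1):
        try:
            send(batch)
            return
        except Exception:
            # A real implementation would catch only transient errors.
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


def upload_all(records: Sequence[Record], send: Callable[[List[Record]], None],
               **retry_options) -> int:
    """Send all records in batches and return the number of batches sent."""
    batches = 0
    for batch in chunked(records):
        send_with_retry(send, batch, **retry_options)
        batches += 1
    return batches
```

Passing `sleep` as a parameter keeps the retry logic testable without real delays.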

Development Setup

Python Environment

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install development dependencies
pip install -r requirements.txt

# Install in development mode
pip install -e .

# Run tests
pytest

Node.js Environment

# Install dependencies
cd backend/js
npm install

# Run tests
npm test

Extending the Application

  1. Adding New Data Sources

    • Modify websearch.py to include new search queries
    • Update extractor.py to handle new website structures
  2. Customizing Data Storage

    • Modify airtable_updater.py to work with different database systems
    • Implement new data export formats as needed
  3. Enhancing the Web Interface

    • Extend the Node.js backend with new API endpoints
    • Update the frontend to display additional data

Deployment

Python

  1. Install production dependencies:

    pip install -r requirements.txt

Node.js

  1. Install production dependencies:

    cd backend/js
    npm install --production

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Copyright 2024 Ma3u

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Contributing

We welcome contributions! Here's how to get started:

  1. Fork the repository

  2. Clone your fork:

    git clone https://github.com/yourusername/BookAwardsAgent.git
  3. Create a feature branch:

    git checkout -b feature/your-feature
  4. Commit your changes:

    git commit -m "Add your feature"
  5. Push to the branch:

    git push origin feature/your-feature
  6. Open a Pull Request

Development Guidelines

  • Follow PEP 8 style guide for Python code
  • Write clear, concise commit messages
  • Add tests for new features
  • Update documentation as needed
  • Keep the codebase clean and well-documented

Examples

Basic Usage

Search for awards and update Airtable:

python -m src.main

Advanced Scenarios

Search without updating Airtable (useful for testing):

python -m src.main --search-only

Process specific award URLs from a file:

python -m src.main --input-file urls.txt

Update Airtable from an existing data file:

python -m src.main --update-only --input-file book_awards_data.json

Environment Configuration

You can configure the application by exporting environment variables in your shell instead of using the .env file:

# Required Airtable configuration
export AIRTABLE_API_KEY="your_api_key_here"
export AIRTABLE_BASE_ID="your_base_id_here"

# Optional configuration
export AIRTABLE_TABLE_NAME="Book Awards"
export MAX_SEARCH_RESULTS=10
export REQUEST_DELAY=2
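
On the Python side, these variables could be read into a typed settings object along the following lines. This is a sketch, not the contents of config.py: the `Settings` class and `load_settings` function are hypothetical, the fallback values simply mirror the examples above, and interpreting REQUEST_DELAY as seconds is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    api_key: str
    base_id: str
    table_name: str
    max_search_results: int
    request_delay: float  # assumption: seconds between requests


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read required and optional variables, failing early if a required one is unset."""
    required = ("AIRTABLE_API_KEY", "AIRTABLE_BASE_ID")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return Settings(
        api_key=env["AIRTABLE_API_KEY"],
        base_id=env["AIRTABLE_BASE_ID"],
        # Assumption: the documented example values serve as defaults.
        table_name=env.get("AIRTABLE_TABLE_NAME", "Book Awards"),
        max_search_results=int(env.get("MAX_SEARCH_RESULTS", "10")),
        request_delay=float(env.get("REQUEST_DELAY", "2")),
    )
```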

Running in Production

For production use, consider:

  1. Setting up a cron job or scheduled task
  2. Implementing proper logging and monitoring
  3. Setting up error notifications
  4. Using environment variables for sensitive data

Data Model

Airtable Schema

Below is the recommended Airtable schema for the Book Awards table. Use these field types and options when creating your Airtable base for best compatibility with the Book Awards Agent.

| Field Name | Field Type | Suggested Options / Description |
| --- | --- | --- |
| Award Name | Single line text | |
| Category | Single select | e.g., Fiction, Non-fiction, Poetry, Children’s, etc. |
| Entry Deadline | Date | |
| Eligibility Criteria | Long text | |
| Application Procedures | Long text | |
| Award Website | URL | |
| Prize Amount | Single line text | e.g., "$1000", "€500" |
| Application Fee | Single line text | e.g., "$75", "€50" |
| Award Status | Single select | Open, Closed, Upcoming |
| Award Logo | Attachment | Image upload |
| Awarding Organization | Single line text | |
| Contact Person | Single line text | |
| Contact Email | Email | |
| Contact Phone | Phone number | |
| Physical Address | Long text | |
| Past Winners URL | URL | |
| Extra Benefits | Long text | |
| In-Person Celebration | Checkbox | |
| Number of Categories | Number | Integer |
| Geographic Restrictions | Single line text | |
| Alli Rating | Number | Integer (1–5 or as appropriate) |
| Accepted Formats | Multiple select | e.g., Print, eBook, Audiobook |
| ISBN Required | Checkbox | |
| Accepts Series | Checkbox | |
| Accepts Anthologies | Checkbox | |
| Accepts Debut Authors | Checkbox | |
| Evaluates Covers | Checkbox | |
| Evaluates Illustrations | Checkbox | |
| Evaluates Interior Design | Checkbox | |
| Secondary Website | URL | |
| Judging Criteria | Long text | |
| Listed in Lead Magnet | Checkbox | |
| Described in Drip Campaign | Checkbox | |

Tips:

  • For select fields, define the allowed options in Airtable.
  • For checkboxes, use them for all Yes/No type fields.
  • Attachments are best for logos.
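
Checkbox fields expect boolean values, so Yes/No text collected from a website has to be converted before upload. The sketch below shows one way to do that for the checkbox fields in the schema. `normalize_checkboxes` is a hypothetical helper, not part of the project, and the set of words treated as "yes" is an assumption.

```python
from typing import Any, Dict

# Checkbox fields taken from the schema above.
CHECKBOX_FIELDS = {
    "In-Person Celebration", "ISBN Required", "Accepts Series",
    "Accepts Anthologies", "Accepts Debut Authors", "Evaluates Covers",
    "Evaluates Illustrations", "Evaluates Interior Design",
    "Listed in Lead Magnet", "Described in Drip Campaign",
}
# Assumption: these spellings mean "checked"; any other text means "unchecked".
TRUE_WORDS = {"yes", "y", "true", "1"}


def normalize_checkboxes(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the record with checkbox fields converted to booleans."""
    result = dict(record)
    for name in CHECKBOX_FIELDS:
        if name not in result:
            continue
        value = result[name]
        if isinstance(value, str):
            result[name] = value.strip().lower() in TRUE_WORDS
        else:
            result[name] = bool(value)
    return result
```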

Note: If any data cannot be written to Airtable due to permissions or schema mismatches, the failed request (and the SQL schema for the attempted fields) will be logged in failed_airtable_requests.sql in the project root. See Troubleshooting for details, including how to re-run or manipulate these statements in SQL.

The application collects and manages the following information for each book award:

Core Information

  • Award Name: Official name of the award
  • Category: Type of award (Fiction, Non-fiction, etc.)
  • Award Status: Current status (Open, Closed, Upcoming)
  • Award Website: Official URL
  • Vetting Notes: Data provenance or vetting status. By default, all records imported by this agent are marked as imported by Web Scraper to distinguish them from manually entered or externally sourced records.
  • Awarding Organization: Organization that presents the award

Submission Details

  • Entry Deadline: Application deadline
  • Eligibility Criteria: Who can apply
  • Application Procedures: How to apply
  • Application Fee: Cost to enter (if any)
  • Accepted Formats: What formats are accepted
  • ISBN Required: Whether an ISBN is needed

Prizes & Recognition

  • Prize Amount: Monetary value (if any)
  • Extra Benefits: Additional benefits for winners
  • In-Person Celebration: Whether the award includes an in-person ceremony
  • Past Winners URL: Link to previous winners

Contact Information

  • Contact Person: Primary contact
  • Contact Email: Contact email address
  • Contact Phone: Contact phone number
  • Physical Address: Organization's address

Technical Metadata

  • Data Source: Where the information was obtained
  • Last Updated: When the information was last verified
  • Data Quality: Confidence score for the collected data

Additional Fields

  • Number of Categories
  • Geographic Restrictions
  • Judging Criteria
  • Further fields, as listed in the Airtable schema above

Testing and Validation

Run the validation script to test the agent's functionality:

python tests/test_validation.py

This will:

  1. Test the websearch functionality
  2. Test data extraction with sample URLs
  3. Validate data completeness
  4. Test Airtable integration (if credentials are provided)

Customization

You can customize the agent's behavior by modifying the following files:

  • config.py: Adjust search queries, field definitions, and other settings
  • websearch.py: Modify search behavior and result filtering
  • extractor.py: Enhance data extraction patterns for specific fields
  • airtable_updater.py: Customize Airtable integration logic

Troubleshooting

Failed Airtable Requests & SQL Logging

If an Airtable insert or update fails (e.g., due to permissions or schema issues), the Book Awards Agent will log the failed request as a SQL statement in:

failed_airtable_requests.sql

This file is always written to the project root directory (e.g. /Users/ma3u/projects/BookAwardsAgent/failed_airtable_requests.sql).

  • Schema: The file begins with a CREATE TABLE IF NOT EXISTS statement matching the fields attempted in the failed request.
  • Failed Requests: Each failed request is appended as an INSERT INTO ... statement, with all attempted field values and the error message.
  • Log Info: Every time a failed request is logged, an info message with the absolute file path is written to the main log output.

Example SQL Output

-- SQL schema for failed Airtable requests
CREATE TABLE IF NOT EXISTS book_awards_failed (
    `Award Name` TEXT,
    `Award Website` TEXT,
    `Category` TEXT,
    `Prize Amount` FLOAT,
    error TEXT
);

-- CREATE failed
INSERT INTO book_awards_failed (`Award Name`, `Award Website`, `Category`, `Prize Amount`, error)
VALUES ('National Book Award', 'https://www.nationalbook.org', 'Fiction', 5000, 'Insufficient permissions to create new select option "Incomplete"');

How to Re-run or Manipulate Failed SQL

  1. Copy the schema and failed INSERT statements from failed_airtable_requests.sql.
  2. Paste them into your SQL environment (e.g., Airflow SQL Explorer, SQLite, or any compatible SQL database).
  3. Run the statements to recreate the failed records for further analysis, troubleshooting, or data migration.
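
If SQLite is the chosen environment, the steps above can also be scripted with Python's built-in `sqlite3` module. This is a minimal sketch: `load_failed_requests` and `failures_matching` are hypothetical helper names, and it assumes the file contents follow the example output shown earlier (SQLite accepts the backtick-quoted column names).

```python
import sqlite3
from typing import List, Tuple


def load_failed_requests(sql_text: str) -> sqlite3.Connection:
    """Run the logged schema and INSERT statements in an in-memory database."""
    connection = sqlite3.connect(":memory:")
    connection.executescript(sql_text)
    return connection


def failures_matching(connection: sqlite3.Connection, fragment: str) -> List[Tuple]:
    """Return all failed rows whose error message contains the given text."""
    cursor = connection.execute(
        "SELECT * FROM book_awards_failed WHERE error LIKE ?",
        (f"%{fragment}%",),
    )
    return cursor.fetchall()
```

To use it, read failed_airtable_requests.sql into a string (for example with `pathlib.Path.read_text`) and pass the text to `load_failed_requests`.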

Data Manipulation Example

You can use standard SQL to update, filter, or export the failed data. For example:

-- Find all failures due to select option issues
SELECT * FROM book_awards_failed WHERE error LIKE '%select option%';

-- Export all fields except the error column
SELECT `Award Name`, `Award Website`, `Category`, `Prize Amount` FROM book_awards_failed;

Using Data in Airtable

  • Airtable itself does not support direct SQL imports, but you can:
    • Export the data from your SQL environment as CSV.
    • Import the CSV into Airtable using the "CSV Import" app or manual upload.
    • Use the error column to filter or clean data before re-import.
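
The CSV export step can be scripted as well. The sketch below assumes the failed requests are available in a SQLite database containing the book_awards_failed table. `export_failed_to_csv` is a hypothetical helper, and dropping the error column from the export is a choice made for this example.

```python
import csv
import io
import sqlite3


def export_failed_to_csv(connection: sqlite3.Connection) -> str:
    """Return the failed records as CSV text, leaving out the error column."""
    cursor = connection.execute("SELECT * FROM book_awards_failed")
    headers = [column[0] for column in cursor.description]
    # Indexes of the columns to keep (everything except "error").
    keep = [index for index, name in enumerate(headers) if name != "error"]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([headers[index] for index in keep])
    for row in cursor:
        writer.writerow([row[index] for index in keep])
    return buffer.getvalue()
```

The returned text can be written to a .csv file and uploaded to Airtable.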

Tip: Use your SQL environment to fix or enrich the data before re-attempting the upload to Airtable.


Common Issues

  1. No Search Results

    • Verify internet connection
    • Check if the DuckDuckGo API is accessible
    • Review search queries in config.py
  2. Airtable Connection Issues

    • Verify API key and base ID
    • Check network connectivity
    • Ensure the API key has proper permissions
  3. Data Extraction Failures

    • Check if the website structure has changed
    • Review error logs for specific issues
    • Update the extractor for the new website structure

Getting Help

For additional support:

  1. Check the GitHub Issues page
  2. Review the project's Wiki
  3. Open a new issue with detailed error information

Quick checks for frequent problems:

  • Agent fails to find book awards: Try modifying the search queries in config.py
  • Incomplete data extraction: Check the extraction patterns in extractor.py
  • Module not found errors: Ensure you've activated the virtual environment and installed dependencies
  • Airtable connection or integration issues: Verify your API key and base ID in the .env file, and check that the table structure matches the schema

Limitations

  • The agent relies on website structure for data extraction, which may change over time
  • Some fields may require manual verification for complete accuracy
  • Rate limiting may affect the number of awards that can be processed in a single run
