Book Awards Agent

A comprehensive tool for discovering, tracking, and managing book awards. The application automatically searches for book awards, extracts detailed information, and updates an Airtable base. It combines Python for data processing and Node.js for the web interface.

Key Features

Automated Discovery: Find book awards from various sources
Data Extraction: Extract comprehensive award details including deadlines, eligibility, and submission guidelines
Airtable Integration: Keep your awards database up-to-date automatically
User-friendly Interface: Easy-to-use web interface for managing awards
Customizable Search: Tailor your search with specific criteria

Prerequisites

Python 3.8 or higher
Node.js 14.x or higher
An Airtable account with API access
Git

Installation

Clone the repository:

git clone https://github.com/Ma3u/BookAwardsAgent.git
cd BookAwardsAgent

Set up Python environment:

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install Python dependencies
cd backend/python
pip install -r requirements.txt

Set up Node.js environment:
```
cd ../../backend/js
npm install
```

Configure environment variables:

cp ../../.env.example ../../.env
# Edit ../../.env with your Airtable credentials

Configuration

Environment Variables

Create a .env file in the project root with these variables:

# Airtable Configuration
AIRTABLE_API_KEY=your_api_key_here
AIRTABLE_BASE_ID=your_base_id_here
AIRTABLE_TABLE_NAME=Book Awards

For Users

Getting Started

Installation: Follow the installation steps in the Prerequisites and Installation sections
Configuration: Set up your .env file with Airtable credentials
Run the Application: Use the Python backend to start collecting award data

Python Backend Usage

Input File Formats

The Python backend supports two types of input files via the --input-file argument:

1. URL List (Plain Text)

Use with: Standard processing (python -m src.main --input-file input_template.txt)
Format: Each line contains a single award URL. Blank lines and lines starting with # are ignored.
Template: See input_template.txt
Status Tracking:
- Each URL line ends with a status comment: # pending, # completed, or # failed.
- The workflow automatically updates the status after each attempt:
  - # completed if processed successfully
  - # failed if extraction or update fails
  - # pending for unprocessed URLs
- The file is updated in-place, so you can monitor progress or resume processing at any time.

Usage Example:

# From the backend/python directory:
python -m src.main --input-file input_template.txt

Example Entry:

https://www.bookerprizes.com/  # pending
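
The status tracking described above can be sketched in a few lines of Python. This is only an illustration of the file format, not the agent's actual implementation: `parse_line` and `set_status` are hypothetical names, and the sketch assumes a URL line without a status comment counts as pending.

```python
import re
from pathlib import Path

# Hypothetical helpers; the real workflow in src/main.py may differ.
# A status comment is whitespace, '#', then one of the three known states.
STATUS_RE = re.compile(r"\s+#\s*(pending|completed|failed)\s*$")


def parse_line(line: str):
    """Return (url, status) for a URL line, or None for blank and comment lines."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    match = STATUS_RE.search(stripped)
    if match:
        return stripped[: match.start()].strip(), match.group(1)
    # Assumption: a URL without a status comment has not been processed yet.
    return stripped, "pending"


def set_status(path: Path, url: str, status: str) -> None:
    """Rewrite the file in place, updating the status comment of one URL."""
    lines = path.read_text(encoding="utf-8").splitlines()
    for i, line in enumerate(lines):
        parsed = parse_line(line)
        if parsed and parsed[0] == url:
            lines[i] = f"{url}  # {status}"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
```

Because the pattern requires whitespace before the `#`, a URL fragment such as `https://example.com/a#frag` is not mistaken for a status comment.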

2. Award Data (JSON)

Use with: Update-only mode (python -m src.main --update-only --input-file book_awards_data.json)
Format: A JSON file containing a list of award data dictionaries. Each dictionary should match the expected Airtable schema.

Usage Example:

# From the backend/python directory:
python -m src.main --update-only --input-file ../book_awards_data.json

Example:

[
  {
    "Award Name": "Example Award",
    "Award Website": "https://example.com/award1",
    "Category": "Fiction",
    "Deadline": "2025-06-30",
    "Eligibility": "Open to all authors"
  },
  {
    "Award Name": "Second Award",
    "Award Website": "https://example.com/award2"
  }
]

Further fields can be added to each object as required. JSON does not allow comments, so keep the file free of // annotations.
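
Before running update-only mode, it can help to check that a data file has the expected shape. The following is a minimal sketch, not part of the agent: `load_awards` is a hypothetical helper, and treating "Award Name" and "Award Website" as required keys is an assumption made for illustration.

```python
import json

# Assumption: these two keys are treated as required here only for illustration.
REQUIRED_KEYS = ("Award Name", "Award Website")


def load_awards(text: str) -> list:
    """Parse JSON text and check that it is a list of award dictionaries."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("expected a JSON list of award objects")
    for index, award in enumerate(data):
        if not isinstance(award, dict):
            raise ValueError(f"entry {index} is not an object")
        missing = [key for key in REQUIRED_KEYS if not award.get(key)]
        if missing:
            raise ValueError(f"entry {index} is missing: {', '.join(missing)}")
    return data
```

Invalid JSON raises `json.JSONDecodeError`, which is itself a subclass of `ValueError`, so a caller can catch both cases with one `except ValueError`.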

Choosing the Input File Format

Use the plain text URL list for discovering and extracting new awards.
Use the JSON format for updating Airtable from pre-extracted or manually curated data.

Running with an Input File

Process URLs:

# From the backend/python directory:
python -m src.main --input-file input_template.txt

Update from Data File:

python -m src.main --update-only --input-file ../book_awards_data.json

See the template in input_template.txt for the plain text format.

The Python backend provides the core functionality for searching and processing book awards. Run these commands from the backend/python directory:

# Basic usage
python -m src.main

# Search for awards without updating Airtable
python -m src.main --search-only

# Process specific URLs from a file
python -m src.main --input-file urls.txt

# Update Airtable from existing data file
python -m src.main --update-only --input-file book_awards_data.json

Node.js Backend

The Node.js backend provides a web interface for managing awards:

# Test configuration
cd backend/js
node test-config.js

# Start the web server (if implemented)
npm start

For Developers

Project Structure

BookAwardsAgent/
├── backend/
│   ├── js/                 # Node.js backend
│   │   ├── config.js       # Configuration loader
│   │   ├── test-config.js  # Configuration test
│   │   └── package.json    # Node.js dependencies
│   └── python/             # Python backend
│       ├── src/            # Python source code
│       │   ├── config.py   # Configuration settings
│       │   ├── websearch.py
│       │   ├── extractor.py
│       │   ├── airtable_updater.py
│       │   └── main.py
│       └── requirements.txt # Python dependencies
├── .env.example           # Example environment variables
├── .gitignore            # Git ignore file
└── README.md             # This documentation

Core Components

1. Web Search Module (`websearch.py`)

Purpose: Searches for book awards using the DuckDuckGo API
Key Features:
- Performs web searches using predefined queries
- Filters results to find relevant book awards
- Handles rate limiting and error cases
- Removes duplicate results
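
As an illustration of the duplicate-removal step, here is a minimal sketch. It is not taken from websearch.py: `normalize_url` and `dedupe_results` are hypothetical names, and the rule of ignoring the scheme, a leading "www.", the query string, and a trailing slash is an assumption about what counts as the same award page.

```python
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Reduce a URL to a comparison key: lowercase host, no 'www.', no trailing slash."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    return f"{host}{path}"


def dedupe_results(urls):
    """Keep the first occurrence of each normalized URL, preserving order."""
    seen = set()
    unique = []
    for url in urls:
        key = normalize_url(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique
```

Keeping the first occurrence preserves the ranking order of the search results.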

2. Data Extractor (`extractor.py`)

Purpose: Extracts detailed information from award websites
Key Features:
- Parses HTML content to extract award details
- Handles various website structures
- Extracts contact information, deadlines, and submission guidelines
- Normalizes data for consistent storage
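
To make the normalization step concrete, the sketch below converts deadline strings into ISO dates. It is an assumption-laden example rather than the code in extractor.py: `normalize_deadline` is a hypothetical name, the list of formats is only a sample, and month names are parsed according to the current locale (English by default).

```python
from datetime import datetime

# Assumed input formats; real award pages use many more variants.
DATE_FORMATS = ("%Y-%m-%d", "%B %d, %Y", "%d %B %Y", "%m/%d/%Y")


def normalize_deadline(raw: str):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    text = " ".join(raw.split())  # collapse whitespace left over from HTML
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None
```

Returning `None` instead of raising lets the caller flag the field for manual verification.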

3. Airtable Updater (`airtable_updater.py`)

Purpose: Manages interaction with Airtable
Key Features:
- Creates, updates, and deletes records in Airtable
- Handles batch operations for better performance
- Implements error handling and retry logic
- Tracks changes and updates
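
Batching and retrying can be sketched independently of any Airtable client library. In the example below, `send` is a hypothetical callable standing in for whatever function performs the request; `chunked` and `send_with_retry` are illustrative names, not functions from airtable_updater.py. The batch size of 10 reflects the per-request record limit of the Airtable REST API.

```python
import time


def chunked(records, size=10):
    """Yield successive batches of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def send_with_retry(send, batch, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call send(batch), retrying with exponential backoff on any exception."""
    for attempt in range(attempts):
        try:
            return send(batch)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: let the caller log the failure
            sleep(base_delay * 2 ** attempt)
```

Passing `sleep` as a parameter keeps the function testable without real delays.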

Development Setup

Python Environment

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install development dependencies
pip install -r requirements.txt

# Install in development mode
pip install -e .

# Run tests
pytest

Node.js Environment

# Install dependencies
cd backend/js
npm install

# Run tests
npm test

Extending the Application

Adding New Data Sources
- Modify websearch.py to include new search queries
- Update extractor.py to handle new website structures

Customizing Data Storage
- Modify airtable_updater.py to work with different database systems
- Implement new data export formats as needed

Enhancing the Web Interface
- Extend the Node.js backend with new API endpoints
- Update the frontend to display additional data

Deployment

Python

Install production dependencies:
```
pip install -r requirements.txt
```

Node.js

Install production dependencies:
```
cd backend/js
npm install --production
```

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Copyright 2024 Ma3u

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Contributing

We welcome contributions! Here's how to get started:

Fork the repository

Clone your fork:

git clone https://github.com/yourusername/BookAwardsAgent.git

Create a feature branch:
```
git checkout -b feature/your-feature
```
Commit your changes:
```
git commit -m "Add your feature"
```
Push to the branch:
```
git push origin feature/your-feature
```
Open a Pull Request

Development Guidelines

Follow PEP 8 style guide for Python code
Write clear, concise commit messages
Add tests for new features
Update documentation as needed
Keep the codebase clean and well-documented

Examples

Basic Usage

Search for awards and update Airtable:

python -m src.main

Advanced Scenarios

Search without updating Airtable (useful for testing):

python -m src.main --search-only

Process specific award URLs from a file:

python -m src.main --input-file urls.txt

Update Airtable from an existing data file:

python -m src.main --update-only --input-file book_awards_data.json

Environment Configuration

You can configure the application by exporting environment variables in your shell instead of using the .env file:

# Required Airtable configuration
export AIRTABLE_API_KEY="your_api_key_here"
export AIRTABLE_BASE_ID="your_base_id_here"

# Optional configuration
export AIRTABLE_TABLE_NAME="Book Awards"
export MAX_SEARCH_RESULTS=10
export REQUEST_DELAY=2
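
A minimal sketch of how these variables might be read in Python follows. It is not the contents of config.py: `load_settings` is a hypothetical name, and the defaults for the optional variables are simply the example values shown above, which is an assumption.

```python
import os


def load_settings(env=os.environ) -> dict:
    """Read settings from the environment, failing early on missing credentials."""
    required = ("AIRTABLE_API_KEY", "AIRTABLE_BASE_ID")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing required variables: {', '.join(missing)}")
    return {
        "api_key": env["AIRTABLE_API_KEY"],
        "base_id": env["AIRTABLE_BASE_ID"],
        # Assumed defaults, copied from the example values in this README.
        "table_name": env.get("AIRTABLE_TABLE_NAME", "Book Awards"),
        "max_search_results": int(env.get("MAX_SEARCH_RESULTS", "10")),
        "request_delay": float(env.get("REQUEST_DELAY", "2")),
    }
```

Failing at startup with a list of the missing names is usually easier to diagnose than an authentication error later in the run.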

Running in Production

For production use, consider:

Setting up a cron job or scheduled task
Implementing proper logging and monitoring
Setting up error notifications
Using environment variables for sensitive data

Data Model

Airtable Schema

Below is the recommended Airtable schema for the Book Awards table. Use these field types and options when creating your Airtable base for best compatibility with the Book Awards Agent.

| Field Name | Field Type | Suggested Options / Description |
| --- | --- | --- |
| Award Name | Single line text | |
| Category | Single select | e.g., Fiction, Non-fiction, Poetry, Children’s, etc. |
| Entry Deadline | Date | |
| Eligibility Criteria | Long text | |
| Application Procedures | Long text | |
| Award Website | URL | |
| Prize Amount | Single line text | e.g., "$1000", "€500" |
| Application Fee | Single line text | e.g., "$75", "€50" |
| Award Status | Single select | Open, Closed, Upcoming |
| Award Logo | Attachment | Image upload |
| Awarding Organization | Single line text | |
| Contact Person | Single line text | |
| Contact Email | Email | |
| Contact Phone | Phone number | |
| Physical Address | Long text | |
| Past Winners URL | URL | |
| Extra Benefits | Long text | |
| In-Person Celebration | Checkbox | |
| Number of Categories | Number | Integer |
| Geographic Restrictions | Single line text | |
| Alli Rating | Number | Integer (1–5 or as appropriate) |
| Accepted Formats | Multiple select | e.g., Print, eBook, Audiobook |
| ISBN Required | Checkbox | |
| Accepts Series | Checkbox | |
| Accepts Anthologies | Checkbox | |
| Accepts Debut Authors | Checkbox | |
| Evaluates Covers | Checkbox | |
| Evaluates Illustrations | Checkbox | |
| Evaluates Interior Design | Checkbox | |
| Secondary Website | URL | |
| Judging Criteria | Long text | |
| Listed in Lead Magnet | Checkbox | |
| Described in Drip Campaign | Checkbox | |

Tips:

For select fields, define the allowed options in Airtable.
For checkboxes, use them for all Yes/No type fields.
Attachments are best for logos.

Note: If any data cannot be written to Airtable due to permissions or schema mismatches, the failed request (and the SQL schema for the attempted fields) will be logged in failed_airtable_requests.sql in the project root. See Troubleshooting for details, including how to re-run or manipulate these statements in SQL.

The application collects and manages the following information for each book award:

Core Information

Award Name: Official name of the award
Category: Type of award (Fiction, Non-fiction, etc.)
Award Status: Current status (Open, Closed, Upcoming)
Award Website: Official URL
Vetting Notes: Data provenance or vetting status. By default, all records imported by this agent are marked as imported by Web Scraper to distinguish them from manually entered or externally sourced records.
Awarding Organization: Organization that presents the award

Submission Details

Entry Deadline: Application deadline
Eligibility Criteria: Who can apply
Application Procedures: How to apply
Application Fee: Cost to enter (if any)
Accepted Formats: What formats are accepted
ISBN Required: Whether an ISBN is needed

Prizes & Recognition

Prize Amount: Monetary value (if any)
Extra Benefits: Additional benefits for winners
In-Person Celebration: Whether the award includes an in-person ceremony
Past Winners URL: Link to previous winners

Contact Information

Contact Person: Primary contact
Contact Email: Contact email address
Contact Phone: Contact phone number
Physical Address: Organization's address

Technical Metadata

Data Source: Where the information was obtained
Last Updated: When the information was last verified
Data Quality: Confidence score for the collected data
Additional schema fields such as Number of Categories, Geographic Restrictions, and Judging Criteria are also collected; see the Airtable Schema table above for the full list.

Testing and Validation

Run the validation script to test the agent's functionality:

python tests/test_validation.py

This will:

Test the websearch functionality
Test data extraction with sample URLs
Validate data completeness
Test Airtable integration (if credentials are provided)
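
As a rough idea of what a data completeness check can look like, the sketch below computes the share of expected fields that are filled in for one record. It is not the logic of tests/test_validation.py: `completeness` is a hypothetical helper, and treating `None`, an empty string, and an empty list as "missing" is an assumption.

```python
def completeness(record: dict, fields) -> float:
    """Return the fraction of the given fields that hold a non-empty value."""
    if not fields:
        return 0.0
    # False and 0 count as filled, so unchecked checkboxes are not penalized.
    filled = sum(1 for name in fields if record.get(name) not in (None, "", []))
    return filled / len(fields)
```

A threshold on this ratio could be used to decide which records need manual review before they are written to Airtable.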

Customization

You can customize the agent's behavior by modifying the following files:

config.py: Adjust search queries, field definitions, and other settings
websearch.py: Modify search behavior and result filtering
extractor.py: Enhance data extraction patterns for specific fields
airtable_updater.py: Customize Airtable integration logic

Troubleshooting

Failed Airtable Requests & SQL Logging

If an Airtable insert or update fails (e.g., due to permissions or schema issues), the Book Awards Agent will log the failed request as a SQL statement in:

failed_airtable_requests.sql

This file is always written to the project root directory (e.g. /Users/ma3u/projects/BookAwardsAgent/failed_airtable_requests.sql).

Schema: The file begins with a CREATE TABLE IF NOT EXISTS statement matching the fields attempted in the failed request.
Failed Requests: Each failed request is appended as an INSERT INTO ... statement, with all attempted field values and the error message.
Log Info: Every time a failed request is logged, an info message with the absolute file path is written to the main log output.

Example SQL Output

-- SQL schema for failed Airtable requests
CREATE TABLE IF NOT EXISTS book_awards_failed (
    `Award Name` TEXT,
    `Award Website` TEXT,
    `Category` TEXT,
    `Prize Amount` FLOAT,
    error TEXT
);

-- CREATE failed
INSERT INTO book_awards_failed (`Award Name`, `Award Website`, `Category`, `Prize Amount`, error)
VALUES ('National Book Award', 'https://www.nationalbook.org', 'Fiction', 5000, 'Insufficient permissions to create new select option "Incomplete"');
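
For readers who want to see how such a statement can be produced, here is a minimal sketch. It is not the agent's logging code: `sql_literal` and `failed_insert` are hypothetical names, and the sketch assumes field names never contain a backtick.

```python
def sql_literal(value) -> str:
    """Render a Python value as a SQL literal, doubling single quotes in text."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        return "1" if value else "0"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def failed_insert(fields: dict, error: str, table: str = "book_awards_failed") -> str:
    """Build one INSERT statement for a failed request (assumed table name)."""
    columns = [f"`{name}`" for name in fields] + ["error"]
    values = [sql_literal(v) for v in fields.values()] + [sql_literal(error)]
    return f"INSERT INTO {table} ({', '.join(columns)})\nVALUES ({', '.join(values)});"
```

Doubling single quotes matters for award names such as "Writer's Prize", which would otherwise end the string literal early.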

How to Re-run or Manipulate Failed SQL

Copy the schema and failed INSERT statements from failed_airtable_requests.sql.
Paste them into your SQL environment (e.g., Airflow SQL Explorer, SQLite, or any compatible SQL database).
Run the statements to recreate the failed records for further analysis, troubleshooting, or data migration.

Data Manipulation Example

You can use standard SQL to update, filter, or export the failed data. For example:

-- Find all failures due to select option issues
SELECT * FROM book_awards_failed WHERE error LIKE '%select option%';

-- Export all fields except the error column
SELECT `Award Name`, `Award Website`, `Category`, `Prize Amount` FROM book_awards_failed;

Using Data in Airtable

Airtable itself does not support direct SQL imports, but you can:
- Export the data from your SQL environment as CSV.
- Import the CSV into Airtable using the "CSV Import" app or manual upload.
- Use the error column to filter or clean data before re-import.

Tip: Use your SQL environment to fix or enrich the data before re-attempting the upload to Airtable.
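
The export step can be sketched with Python's standard library alone. The example below loads the logged statements into an in-memory SQLite database and returns CSV text without the error column. `failed_rows_to_csv` is a hypothetical helper, and it assumes the table is named book_awards_failed as in the example output above.

```python
import csv
import io
import sqlite3


def failed_rows_to_csv(sql_text: str, table: str = "book_awards_failed") -> str:
    """Run the logged statements in SQLite and return the rows as CSV text."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(sql_text)  # CREATE TABLE plus all INSERT statements
        cursor = conn.execute(f"SELECT * FROM {table}")
        names = [col[0] for col in cursor.description]
        # Keep every column except the error message.
        keep = [i for i, name in enumerate(names) if name != "error"]
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow([names[i] for i in keep])
        for row in cursor:
            writer.writerow([row[i] for i in keep])
        return buffer.getvalue()
    finally:
        conn.close()
```

Typical usage would be to read failed_airtable_requests.sql as text, pass it to this function, and save the result as a .csv file for import into Airtable.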

Common Issues

No Search Results
- Verify your internet connection
- Check whether the DuckDuckGo API is accessible
- Review the search queries in config.py

Airtable Connection Issues
- Verify the API key and base ID
- Check network connectivity
- Ensure the API key has the required permissions

Data Extraction Failures
- Check whether the website structure has changed
- Review the error logs for specific issues
- Update the extractor for the new website structure

Getting Help

For additional support:

Check the GitHub Issues page
Review the project's Wiki
Open a new issue with detailed error information

Agent fails to find book awards: Try modifying the search queries in config.py
Incomplete data extraction: Check the extraction patterns in extractor.py
Module not found errors: Ensure you've activated the virtual environment and installed dependencies
Airtable connection or integration issues: Verify your API key and base ID in the .env file, and check that your table structure matches the schema

Limitations

The agent relies on website structure for data extraction, which may change over time
Some fields may require manual verification for complete accuracy
Rate limiting may affect the number of awards that can be processed in a single run
