A comprehensive tool for discovering, tracking, and managing book awards. The application automatically searches for book awards, extracts detailed information, and updates an Airtable base. It combines Python for data processing and Node.js for the web interface.
- Automated Discovery: Find book awards from various sources
- Data Extraction: Extract comprehensive award details including deadlines, eligibility, and submission guidelines
- Airtable Integration: Keep your awards database up-to-date automatically
- User-friendly Interface: Easy-to-use web interface for managing awards
- Customizable Search: Tailor your search with specific criteria
- Python 3.8 or higher
- Node.js 14.x or higher
- An Airtable account with API access
- Git
- Clone the repository:
git clone https://github.com/Ma3u/BookAwardsAgent.git
cd BookAwardsAgent
- Set up Python environment:
# Create and activate virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install Python dependencies
cd backend/python
pip install -r requirements.txt
- Set up Node.js environment:
cd ../../backend/js
npm install
- Configure environment variables:
cp ../../.env.example ../../.env
# Edit ../../.env with your Airtable credentials
Create a .env file in the project root with these variables:
# Airtable Configuration
AIRTABLE_API_KEY=your_api_key_here
AIRTABLE_BASE_ID=your_base_id_here
AIRTABLE_TABLE_NAME=Book Awards
- Installation: Follow the installation steps in the Prerequisites and Installation sections
- Configuration: Set up your .env file with Airtable credentials
- Run the Application: Use the Python backend to start collecting award data
The Python backend supports two types of input files via the --input-file argument:
- Use with: Standard processing (python -m src.main --input-file input_template.txt)
- Format: Each line contains a single award URL. Blank lines and lines starting with # are ignored.
- Template: See input_template.txt
- Status Tracking:
  - Each URL line ends with a status comment: # pending, # completed, or # failed.
  - The workflow automatically updates the status after each attempt:
    - # completed if processed successfully
    - # failed if extraction or update fails
    - # pending for unprocessed URLs
  - The file is updated in-place, so you can monitor progress or resume processing at any time.
- Usage Example:
# From the backend/python directory:
python -m src.main --input-file input_template.txt
- Example Entry:
https://www.bookerprizes.com/ # pending
- Use with: Update-only mode (python -m src.main --update-only --input-file book_awards_data.json)
- Format: A JSON file containing a list of award data dictionaries. Each dictionary should match the expected Airtable schema.
- Usage Example:
# From the backend/python directory:
python -m src.main --update-only --input-file ../book_awards_data.json
- Example:
[
  {
    "Award Name": "Example Award",
    "Award Website": "https://example.com/award1",
    "Category": "Fiction",
    "Deadline": "2025-06-30",
    "Eligibility": "Open to all authors"
    // ...other fields as required
  },
  {
    "Award Name": "Second Award",
    "Award Website": "https://example.com/award2"
    // ...
  }
]
- Use the plain text URL list for discovering and extracting new awards.
- Use the JSON format for updating Airtable from pre-extracted or manually curated data.
- Process URLs:
python -m src.main --input-file backend/input_template.txt
- Update from Data File:
python -m src.main --update-only --input-file ../book_awards_data.json
See the template in input_template.txt for the plain text format.
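The status-tracking behaviour described for the plain text URL list can be sketched as follows. This is a minimal illustration, not the agent's actual code: parse_line and set_status are hypothetical helper names, and the sketch assumes each URL contains no whitespace and that a line without a recognised status comment counts as pending.

```python
import re
from pathlib import Path

# A URL (no whitespace), optionally followed by "# <status>".
LINE_RE = re.compile(r"^(\S+)(?:\s+#\s*(\w+))?\s*$")
VALID_STATUSES = {"pending", "completed", "failed"}


def parse_line(line):
    """Return (url, status) for a URL line, or None for blanks and comment lines."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    match = LINE_RE.match(stripped)
    if match is None:
        return None
    status = (match.group(2) or "pending").lower()
    # Assumption: unknown status words are treated as pending.
    return match.group(1), status if status in VALID_STATUSES else "pending"


def set_status(path, url, new_status):
    """Rewrite the file in place, changing only the status comment of one URL."""
    if new_status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {new_status}")
    updated = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        parsed = parse_line(line)
        if parsed and parsed[0] == url:
            line = f"{url} # {new_status}"
        updated.append(line)
    Path(path).write_text("\n".join(updated) + "\n", encoding="utf-8")
```

Matching the URL as a run of non-whitespace characters means a URL that itself contains a # fragment is not mistaken for a status comment.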
The Python backend provides the core functionality for searching and processing book awards:
# Basic usage
python -m src
# Search for awards without updating Airtable
python -m src --search-only
# Process specific URLs from a file
python -m src --input-file urls.txt
# Update Airtable from existing data file
python -m src --update-only --input-file book_awards_data.json
The Node.js backend provides a web interface for managing awards:
# Test configuration
cd backend/js
node test-config.js
# Start the web server (if implemented)
npm start
BookAwardsAgent/
├── backend/
│   ├── js/                       # Node.js backend
│   │   ├── config.js             # Configuration loader
│   │   ├── test-config.js        # Configuration test
│   │   └── package.json          # Node.js dependencies
│   └── python/                   # Python backend
│       ├── src/                  # Python source code
│       │   ├── config.py         # Configuration settings
│       │   ├── websearch.py
│       │   ├── extractor.py
│       │   ├── airtable_updater.py
│       │   └── main.py
│       └── requirements.txt      # Python dependencies
├── .env.example                  # Example environment variables
├── .gitignore                    # Git ignore file
└── README.md                     # This documentation
- Purpose (websearch.py): Searches for book awards using the DuckDuckGo API
- Key Features:
  - Performs web searches using predefined queries
  - Filters results to find relevant book awards
  - Handles rate limiting and error cases
  - Removes duplicate results
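As an illustration of duplicate removal, the sketch below collapses results that point to the same page under slightly different URLs. It is not taken from websearch.py: the function names are hypothetical, and the result shape (a dict with a "url" key) is an assumption.

```python
from urllib.parse import urlsplit


def normalize_url(url):
    """Build a comparison key that ignores scheme, a leading 'www.', trailing slash and fragment."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    key = host + parts.path.rstrip("/")
    if parts.query:
        # Keep the query string: it can select a different page.
        key += "?" + parts.query
    return key


def dedupe_results(results):
    """Keep the first result for each normalized URL, preserving the original order."""
    seen = set()
    unique = []
    for result in results:
        key = normalize_url(result["url"])  # assumed result shape: {"url": ...}
        if key not in seen:
            seen.add(key)
            unique.append(result)
    return unique
```

Keeping the first occurrence preserves the search engine's ranking among the surviving results.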
- Purpose (extractor.py): Extracts detailed information from award websites
- Key Features:
  - Parses HTML content to extract award details
  - Handles various website structures
  - Extracts contact information, deadlines, and submission guidelines
  - Normalizes data for consistent storage
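One way to picture the normalization step is converting deadline text scraped from a page into an ISO date. The sketch below is an assumption about how this could work, not the extractor's actual logic; normalize_deadline and the list of accepted formats are hypothetical, and month names are parsed in the process locale (English by default).

```python
from datetime import datetime

# Assumed input formats; ambiguous numeric forms such as 06/07/2025 are left out on purpose.
DATE_FORMATS = ("%Y-%m-%d", "%d %B %Y", "%B %d, %Y")


def normalize_deadline(text):
    """Return the deadline as YYYY-MM-DD, or None if no known format matches."""
    cleaned = " ".join(text.split())  # collapse stray whitespace from HTML
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue
    return None
```

Returning None instead of guessing leaves unparseable deadlines available for manual verification.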
- Purpose (airtable_updater.py): Manages interaction with Airtable
- Key Features:
  - Creates, updates, and deletes records in Airtable
  - Handles batch operations for better performance
  - Implements error handling and retry logic
  - Tracks changes and updates
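Batching and retrying can be sketched independently of any Airtable client. In the example below, chunked and send_with_retry are hypothetical helpers, and send stands for whatever callable performs the actual request. The batch size of 10 reflects the Airtable REST API limit on records per create or update request; the backoff schedule is an assumption.

```python
import time

BATCH_SIZE = 10  # Airtable accepts at most 10 records per create/update request


def chunked(records, size=BATCH_SIZE):
    """Yield consecutive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def send_with_retry(send, batch, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call send(batch); after a failure wait base_delay * 2**attempt, then retry."""
    for attempt in range(max_attempts):
        try:
            return send(batch)
        except Exception:  # broad on purpose in this sketch; narrow it in real code
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

Passing sleep as a parameter lets tests run without real delays.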
# Create and activate virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install development dependencies
pip install -r requirements.txt
# Install in development mode
pip install -e .
# Run tests
pytest
# Install dependencies
cd backend/js
npm install
# Run tests
npm test
- Adding New Data Sources
  - Modify websearch.py to include new search queries
  - Update extractor.py to handle new website structures
- Customizing Data Storage
  - Modify airtable_updater.py to work with different database systems
  - Implement new data export formats as needed
- Enhancing the Web Interface
  - Extend the Node.js backend with new API endpoints
  - Update the frontend to display additional data
- Install production Python dependencies:
pip install -r requirements.txt
- Install production Node.js dependencies:
cd backend/js
npm install --production
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Copyright 2024 Ma3u
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
We welcome contributions! Here's how to get started:
- Fork the repository
- Clone your fork:
git clone https://github.com/yourusername/BookAwardsAgent.git
- Create a feature branch:
git checkout -b feature/your-feature
- Commit your changes:
git commit -m "Add your feature"
- Push to the branch:
git push origin feature/your-feature
- Open a Pull Request
- Follow PEP 8 style guide for Python code
- Write clear, concise commit messages
- Add tests for new features
- Update documentation as needed
- Keep the codebase clean and well-documented
Search for awards and update Airtable:
python -m src
Search without updating Airtable (useful for testing):
python -m src --search-only
Process specific award URLs from a file:
python -m src --input-file urls.txt
Update Airtable from an existing data file:
python -m src --update-only --input-file book_awards_data.json
You can configure the application using environment variables instead of the config file:
# Required Airtable configuration
export AIRTABLE_API_KEY="your_api_key_here"
export AIRTABLE_BASE_ID="your_base_id_here"
# Optional configuration
export AIRTABLE_TABLE_NAME="Book Awards"
export MAX_SEARCH_RESULTS=10
export REQUEST_DELAY=2
For production use, consider:
- Setting up a cron job or scheduled task
- Implementing proper logging and monitoring
- Setting up error notifications
- Using environment variables for sensitive data
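Reading the environment variables listed above can be sketched as below. Settings and load_settings are hypothetical names, not the project's config.py. The fallback values for the optional variables mirror the export examples and are an assumption about the application's real defaults.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    api_key: str
    base_id: str
    table_name: str
    max_search_results: int
    request_delay: float


def load_settings(env=None):
    """Build Settings from a mapping (os.environ by default), failing fast on missing keys."""
    env = os.environ if env is None else env
    required = ("AIRTABLE_API_KEY", "AIRTABLE_BASE_ID")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    return Settings(
        api_key=env["AIRTABLE_API_KEY"],
        base_id=env["AIRTABLE_BASE_ID"],
        # Assumed defaults, taken from the example values above.
        table_name=env.get("AIRTABLE_TABLE_NAME", "Book Awards"),
        max_search_results=int(env.get("MAX_SEARCH_RESULTS", "10")),
        request_delay=float(env.get("REQUEST_DELAY", "2")),
    )
```

Failing at startup when a required variable is absent gives a clearer error than a rejected Airtable request later in the run.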
Below is the recommended Airtable schema for the Book Award table. Use these field types and options when creating your Airtable base for best compatibility with the Book Awards Agent.
| Field Name | Field Type | Suggested Options / Description |
|---|---|---|
| Award Name | Single line text | |
| Category | Single select | e.g., Fiction, Non-fiction, Poetry, Children’s, etc. |
| Entry Deadline | Date | |
| Eligibility Criteria | Long text | |
| Application Procedures | Long text | |
| Award Website | URL | |
| Prize Amount | Single line text | e.g., "$1000", "€500" |
| Application Fee | Single line text | e.g., "$75", "€50" |
| Award Status | Single select | Open, Closed, Upcoming |
| Award Logo | Attachment | Image upload |
| Awarding Organization | Single line text | |
| Contact Person | Single line text | |
| Contact Email | Email | |
| Contact Phone | Phone number | |
| Physical Address | Long text | |
| Past Winners URL | URL | |
| Extra Benefits | Long text | |
| In-Person Celebration | Checkbox | |
| Number of Categories | Number | Integer |
| Geographic Restrictions | Single line text | |
| Alli Rating | Number | Integer (1–5 or as appropriate) |
| Accepted Formats | Multiple select | e.g., Print, eBook, Audiobook |
| ISBN Required | Checkbox | |
| Accepts Series | Checkbox | |
| Accepts Anthologies | Checkbox | |
| Accepts Debut Authors | Checkbox | |
| Evaluates Covers | Checkbox | |
| Evaluates Illustrations | Checkbox | |
| Evaluates Interior Design | Checkbox | |
| Secondary Website | URL | |
| Judging Criteria | Long text | |
| Listed in Lead Magnet | Checkbox | |
| Described in Drip Campaign | Checkbox | |
Tips:
- For select fields, define the allowed options in Airtable.
- For checkboxes, use them for all Yes/No type fields.
- Attachments are best for logos.
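A record can be checked against the schema before it is sent, which catches problems such as an undefined select option early. The sketch below covers only a handful of fields from the table and is illustrative: validate_record, FIELD_TYPES and SELECT_OPTIONS are hypothetical names, and representing dates as ISO strings is an assumption about how records are prepared.

```python
from datetime import date

# Subset of the schema table, mapped to the Python type each value is assumed to have.
FIELD_TYPES = {
    "Award Name": str,
    "Award Website": str,
    "Award Status": str,
    "Entry Deadline": str,  # ISO date string such as "2025-06-30"
    "In-Person Celebration": bool,
    "Number of Categories": int,
}
SELECT_OPTIONS = {"Award Status": {"Open", "Closed", "Upcoming"}}


def validate_record(record):
    """Return a list of human-readable problems; an empty list means the record passed."""
    problems = []
    for field, value in record.items():
        expected = FIELD_TYPES.get(field)
        if expected is None:
            problems.append(f"{field}: not part of the checked schema subset")
        elif not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            problems.append(f"{field}: expected {expected.__name__}")
        elif field in SELECT_OPTIONS and value not in SELECT_OPTIONS[field]:
            problems.append(f"{field}: {value!r} is not a defined select option")
        elif field == "Entry Deadline":
            try:
                date.fromisoformat(value)
            except ValueError:
                problems.append(f"{field}: {value!r} is not an ISO date")
    return problems
```

The explicit bool check is needed because Python treats True and False as integers.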
Note: If any data cannot be written to Airtable due to permissions or schema mismatches, the failed request (and the SQL schema for the attempted fields) will be logged in failed_airtable_requests.sql in the project root. See Troubleshooting for details, including how to re-run or manipulate these statements in SQL.
The application collects and manages the following information for each book award:
- Award Name: Official name of the award
- Category: Type of award (Fiction, Non-fiction, etc.)
- Award Status: Current status (Open, Closed, Upcoming)
- Award Website: Official URL
- Vetting Notes: Data provenance or vetting status. By default, all records imported by this agent are marked as "imported by Web Scraper" to distinguish them from manually entered or externally sourced records.
- Awarding Organization: Organization that presents the award
- Entry Deadline: Application deadline
- Eligibility Criteria: Who can apply
- Application Procedures: How to apply
- Application Fee: Cost to enter (if any)
- Accepted Formats: What formats are accepted
- ISBN Required: Whether an ISBN is needed
- Prize Amount: Monetary value (if any)
- Extra Benefits: Additional benefits for winners
- In-Person Celebration: Details about award ceremonies
- Past Winners URL: Link to previous winners
- Contact Person: Primary contact
- Email: Contact email
- Phone: Contact phone number
- Physical Address: Organization's address
- Data Source: Where the information was obtained
- Last Updated: When the information was last verified
- Data Quality: Confidence score for the collected data
- In-Person Celebration: Whether an in-person event is part of the award
- Number of Categories
- Geographic Restrictions
- Accepted Formats
- ISBN Required
- Judging Criteria
- And many more fields matching the provided CSV structure
Run the validation script to test the agent's functionality:
python tests/test_validation.py
This will:
- Test the websearch functionality
- Test data extraction with sample URLs
- Validate data completeness
- Test Airtable integration (if credentials are provided)
You can customize the agent's behavior by modifying the following files:
- config.py: Adjust search queries, field definitions, and other settings
- websearch.py: Modify search behavior and result filtering
- extractor.py: Enhance data extraction patterns for specific fields
- airtable_updater.py: Customize Airtable integration logic
If an Airtable insert or update fails (e.g., due to permissions or schema issues), the Book Awards Agent will log the failed request as a SQL statement in:
failed_airtable_requests.sql
This file is always written to the project root directory (e.g. /Users/ma3u/projects/BookAwardsAgent/failed_airtable_requests.sql).
- Schema: The file begins with a CREATE TABLE IF NOT EXISTS statement matching the fields attempted in the failed request.
- Failed Requests: Each failed request is appended as an INSERT INTO ... statement, with all attempted field values and the error message.
- Log Info: Every time a failed request is logged, an info message with the absolute file path is written to the main log output.
-- SQL schema for failed Airtable requests
CREATE TABLE IF NOT EXISTS book_awards_failed (
`Award Name` TEXT,
`Award Website` TEXT,
`Category` TEXT,
`Prize Amount` FLOAT,
error TEXT
);
-- CREATE failed
INSERT INTO book_awards_failed (`Award Name`, `Award Website`, `Category`, `Prize Amount`, error)
VALUES ('National Book Award', 'https://www.nationalbook.org', 'Fiction', 5000, 'Insufficient permissions to create new select option "Incomplete"');
- Copy the schema and failed INSERT statements from failed_airtable_requests.sql.
- Paste them into your SQL environment (e.g., Airflow SQL Explorer, SQLite, or any compatible SQL database).
- Run the statements to recreate the failed records for further analysis, troubleshooting, or data migration.
You can use standard SQL to update, filter, or export the failed data. For example:
-- Find all failures due to select option issues
SELECT * FROM book_awards_failed WHERE error LIKE '%select option%';
-- Export all fields except the error column
SELECT `Award Name`, `Award Website`, `Category`, `Prize Amount` FROM book_awards_failed;
- Airtable itself does not support direct SQL imports, but you can:
- Export the data from your SQL environment as CSV.
- Import the CSV into Airtable using the "CSV Import" app or manual upload.
- Use the error column to filter or clean data before re-import.
Tip: Use your SQL environment to fix or enrich the data before re-attempting the upload to Airtable.
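The SQL-to-CSV round trip can be done with Python's standard library alone. The sketch below loads the logged statements into an in-memory SQLite database and exports selected columns as CSV text. load_failed_requests and export_csv are hypothetical helpers, and the table name book_awards_failed is taken from the example above. SQLite accepts the backtick-quoted column names used in the log file.

```python
import csv
import io
import sqlite3


def load_failed_requests(sql_text):
    """Execute the logged CREATE TABLE and INSERT statements in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(sql_text)
    return conn


def quote_identifier(name):
    """Quote a column name for SQLite, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def export_csv(conn, columns):
    """Return the chosen columns of book_awards_failed as CSV text with a header row."""
    select_list = ", ".join(quote_identifier(c) for c in columns)
    cursor = conn.execute(f"SELECT {select_list} FROM book_awards_failed")
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(columns)
    writer.writerows(cursor)
    return buffer.getvalue()
```

To use it, read failed_airtable_requests.sql into a string, pass it to load_failed_requests, and write the text returned by export_csv to a file for import into Airtable, leaving the error column out of the column list.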
- No Search Results
  - Verify internet connection
  - Check if DuckDuckGo API is accessible
  - Review search queries in config.py
- Airtable Connection Issues
  - Verify API key and base ID
  - Check network connectivity
  - Ensure the API key has proper permissions
- Data Extraction Failures
  - Check if the website structure has changed
  - Review error logs for specific issues
  - Update the extractor for the new website structure
For additional support:
- Check the GitHub Issues page
- Review the project's Wiki
- Open a new issue with detailed error information
- Agent fails to find book awards: Try modifying the search queries in config.py
- Incomplete data extraction: Check the extraction patterns in extractor.py
- Module not found errors: Ensure you've activated the virtual environment and installed dependencies
- Airtable connection issues: Verify your API key and base ID in the .env file
- For Airtable integration issues, verify your API credentials and table structure
- The agent relies on website structure for data extraction, which may change over time
- Some fields may require manual verification for complete accuracy
- Rate limiting may affect the number of awards that can be processed in a single run