A Python-based tool designed to parse and process RSS feeds, primarily aimed at automating the posting of Hugo blog updates to social media or other platforms. This project includes utilities for interacting with Google Cloud services and supports deployment via Docker and Google Cloud Build.
- Parses RSS feeds to extract and process new blog entries.
- Converts published dates to standardized formats for comparison and processing.
- Updates a backend service or database with new or updated feed entries via HTTP requests.
- Integrates with Google Cloud services including BigQuery, Cloud Storage, and Cloud Logging through reusable client utilities.
- Supports containerized deployment with Docker and automated builds using Google Cloud Build.
- Python 3
- feedparser (for RSS parsing)
- requests (for HTTP requests)
- Google Cloud SDKs (BigQuery, Storage, Logging)
- Docker
- Google Cloud Build
- Python 3.7 or higher
- Docker (optional, for containerized deployment)
- Google Cloud account and appropriate permissions
- Clone the repository:
git clone https://github.com/justin-napolitano/python-rss-reader.git
cd python-rss-reader
- (Optional) Set up a Python virtual environment:
python3 -m venv venv
source venv/bin/activate
- Install dependencies:
pip install -r requirements.txt
- Configure environment variables and credentials:
  - Place your Google Cloud service account JSON in secret.json or set the environment variable GOOGLE_APPLICATION_CREDENTIALS.
  - Configure any other required environment variables as needed.
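The credential lookup described above can be sketched as follows. This is a minimal illustration, not code from the repository: the helper name `resolve_credentials_path` and the order of precedence (environment variable first, then a local secret.json) are assumptions.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_credentials_path(
    env: Optional[Mapping[str, str]] = None,
    fallback: str = "secret.json",
) -> Optional[Path]:
    """Return the service account file to use, or None if none is configured.

    Hypothetical helper: checks GOOGLE_APPLICATION_CREDENTIALS first and
    falls back to a local secret.json only if that file exists.
    """
    env = os.environ if env is None else env
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        return Path(explicit)
    candidate = Path(fallback)
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    path = resolve_credentials_path()
    print(path if path else "No credentials configured")
```

Passing the environment in as a mapping keeps the function easy to exercise without touching the real process environment.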
Run the main script:
python rss-scraper.py
This will parse the RSS feed (default: https://jnapolitano.com/index.xml) and attempt to update the backend service with new entries.
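To make the parse-and-compare step concrete, here is a rough sketch of reading feed items, normalizing their published dates to UTC, and keeping only entries newer than a cutoff. It is not the code in rss-scraper.py: the project lists feedparser, while this sketch uses only the standard library so that it runs without dependencies. The sample feed, the function names, and the choice to treat dates without a timezone as UTC are all assumptions.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Hypothetical two-item RSS 2.0 feed used only for illustration.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item>
      <title>Older post</title>
      <link>https://example.com/older/</link>
      <pubDate>Mon, 01 Jan 2024 08:00:00 +0000</pubDate>
    </item>
    <item>
      <title>Newer post</title>
      <link>https://example.com/newer/</link>
      <pubDate>Fri, 01 Mar 2024 09:30:00 -0500</pubDate>
    </item>
  </channel>
</rss>
"""


def parse_items(xml_text):
    """Return feed items as dicts with a timezone-aware UTC 'published' value."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        raw_date = item.findtext("pubDate")
        if not raw_date:
            continue  # skip entries without a publication date
        published = parsedate_to_datetime(raw_date.strip())
        if published.tzinfo is None:
            # Assumption: dates without an offset are treated as UTC.
            published = published.replace(tzinfo=timezone.utc)
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": published.astimezone(timezone.utc),
        })
    return items


def select_new(items, last_run):
    """Keep only the items published strictly after last_run."""
    return [entry for entry in items if entry["published"] > last_run]


if __name__ == "__main__":
    cutoff = datetime(2024, 2, 1, tzinfo=timezone.utc)
    for entry in select_new(parse_items(SAMPLE_FEED), cutoff):
        print(entry["published"].isoformat(), entry["title"], entry["link"])
```

Converting every date to UTC before comparing avoids mistakes when feed entries carry different offsets.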
Build the Docker image:
docker build -t python-rss-reader .
Run the container:
docker run --env GOOGLE_APPLICATION_CREDENTIALS=/path/to/secret.json -v /local/path/to/secret.json:/path/to/secret.json python-rss-reader
The cloudbuild.yaml file defines steps to build and push the Docker image to Google Container Registry. Uncomment and configure additional steps to deploy to Cloud Run or set up Cloud Scheduler jobs.
Run Cloud Build:
gcloud builds submit --config cloudbuild.yaml .
python-rss-reader/
├── cloudbuild.yaml # Google Cloud Build configuration
├── Dockerfile # Docker image definition
├── gcputils/ # Google Cloud utility submodule
│ ├── BigQueryClient.py # BigQuery client wrapper
│ ├── GCSClient.py # Google Cloud Storage client wrapper
│ ├── GoogleCloudLogging.py # Cloud Logging client wrapper
│ ├── index.md # Documentation
│ └── readme.md # Documentation
├── images/ # Image assets
├── index.md # Project notes and thoughts
├── last_run.txt # Stores last run timestamp
├── readme.md # Project notes and thoughts (similar to index.md)
├── requirements.txt # Python dependencies
├── rss-scraper.py # Main RSS parsing and update script
└── secret.json # Google Cloud service account credentials (sensitive)
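The tree above lists last_run.txt as the store for the last run timestamp. One way such a file could be read and written is sketched below. The on-disk format is not documented here, so the ISO 8601 UTC format and the helper names `read_last_run` and `write_last_run` are assumptions rather than the script's actual behavior.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def read_last_run(path) -> Optional[datetime]:
    """Return the stored timestamp, or None if the file is missing or empty."""
    file_path = Path(path)
    if not file_path.is_file():
        return None
    text = file_path.read_text(encoding="utf-8").strip()
    if not text:
        return None
    # Assumption: the file holds a single ISO 8601 timestamp.
    return datetime.fromisoformat(text)


def write_last_run(path, when: datetime) -> None:
    """Store the timestamp as an ISO 8601 string normalized to UTC."""
    Path(path).write_text(
        when.astimezone(timezone.utc).isoformat(), encoding="utf-8"
    )


if __name__ == "__main__":
    previous = read_last_run("last_run.txt")
    print("Previous run:", previous)
    write_last_run("last_run.txt", datetime.now(timezone.utc))
```

Returning None on a missing file lets a first run treat every feed entry as new.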
- Implement a dedicated API or batch processor for handling feed updates instead of a monolithic script.
- Add more robust error handling and retry mechanisms.
- Extend support for publishing parsed posts to various social media platforms.
- Enhance configuration management for cloud deployments.
- Add automated tests and CI/CD pipelines.
- Improve documentation and usage examples.
Note: This README is based on available source files and inferred project goals. Some assumptions were made regarding deployment and usage.