Automation for sourcing meme images and short Reddit-hosted videos from Reddit and publishing them to Instagram with the Instagram Graph API.
This version supports image posts and Reddit-sourced Instagram Reels. It does not run AI video generation, TTS, account splitting, or a hosted database. Repost tracking, performance history, and generated optimization config are handled by files on a dedicated Git branch.
- Fetch image posts from configured Reddit subreddits using hot, rising, and new listings by default.
- Filter out unsafe or unsupported content: NSFW, spoiler, stickied, gallery, video, GIF, and non-image URLs.
- Fetch top Reddit comments when available and score each candidate for Instagram fit.
- Load state/posted.jsonl and state/performance.json from the bot-state branch.
- Skip any candidate already seen by Reddit id, permalink, normalized source/media URL, normalized title, or image SHA-256 hash.
- Generate a rotating, attribution-preserving caption with minimal hashtags.
- Publish the selected image to Instagram.
- Append the successful post to state/posted.jsonl, update state/performance.json, append run history, and push the state update back to bot-state.
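The image filtering rules listed above can be expressed as a single predicate. This is a minimal sketch, not the code in reddit_source.py: the RedditPost dataclass, the is_supported_image function, and the extension allow-list are hypothetical, although the boolean field names mirror fields in Reddit's submission JSON.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical allow-list of still-image extensions; GIFs fall outside it.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


@dataclass
class RedditPost:
    """Illustrative subset of a Reddit submission; field names follow Reddit's JSON."""

    url: str
    over_18: bool = False
    spoiler: bool = False
    stickied: bool = False
    is_gallery: bool = False
    is_video: bool = False


def is_supported_image(post: RedditPost) -> bool:
    """Return True only for safe, single-image, non-GIF posts."""
    if post.over_18 or post.spoiler or post.stickied:
        return False  # NSFW, spoiler, or pinned content
    if post.is_gallery or post.is_video:
        return False  # multi-image and video posts are not image candidates
    # Ignore query strings such as ?width=640 when checking the extension.
    path = urlparse(post.url).path.lower()
    return path.endswith(IMAGE_EXTENSIONS)
```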
The tracker is only written after Instagram returns a published media id.
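One way to implement the multi-key duplicate check is to derive a set of keys for each candidate and skip it if any key has been seen before. The normalization rules below (lowercased scheme and host, dropped query string and fragment, punctuation-free title) and every function name are assumptions for illustration; tracker.py and performance_store.py may normalize differently.

```python
import hashlib
import re
from typing import Optional, Set
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Lowercase scheme/host, drop query and fragment, strip a trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, "", ""))


def normalize_title(title: str) -> str:
    """Lowercase, remove punctuation, and collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", "", title.lower())
    return " ".join(cleaned.split())


def duplicate_keys(
    reddit_id: str,
    permalink: str,
    media_url: str,
    title: str,
    media_bytes: Optional[bytes] = None,
) -> Set[str]:
    """Build the keys compared against previously posted records."""
    keys = {
        f"id:{reddit_id}",
        f"permalink:{normalize_url(permalink)}",
        f"url:{normalize_url(media_url)}",
        f"title:{normalize_title(title)}",
    }
    if media_bytes is not None:
        # The content hash catches reposts that reuse the same file under a new URL.
        keys.add(f"sha256:{hashlib.sha256(media_bytes).hexdigest()}")
    return keys


def is_duplicate(candidate_keys: Set[str], seen_keys: Set[str]) -> bool:
    """A single overlapping key is enough to skip the candidate."""
    return not candidate_keys.isdisjoint(seen_keys)
```

Keeping the keys prefixed (id:, url:, sha256:) prevents accidental collisions between, for example, a title and a URL that happen to normalize to the same string.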
Reel posting uses a parallel flow:
- Fetch Reddit-hosted video posts from configured Reel subreddits using hot, rising, and new listings by default.
- Filter out unsafe, unsupported, too short, too long, very low-resolution, or weird-aspect candidates when metadata is available.
- Score candidates for fast payoff, discussion/reaction potential, duration, freshness, duplicate risk, and historical performance.
- Download and lightly normalize the selected video with yt-dlp and ffmpeg.
- Skip any candidate already seen by Reddit id, permalink, normalized source/media URL, normalized title, or video SHA-256 hash.
- Generate a rotating Reel caption and publish the local MP4 through Instagram's Reels resumable upload flow.
- Append the successful Reel to state/reels-posted.jsonl, update state/performance.json, append run history, and push the state update back to bot-state.
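The Reel metadata filter can be sketched as a function that returns a rejection reason and skips any check whose metadata is missing. The limits mirror the documented defaults for REEL_MAX_DURATION_SECONDS, REEL_MAX_BYTES, REEL_MIN_WIDTH, REEL_MIN_HEIGHT, and the aspect-ratio bounds. Three things are assumptions: the 3-second minimum duration (no env var for it is documented), aspect ratio computed as width divided by height, and the names ReelLimits and reel_rejection_reason.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReelLimits:
    min_duration_seconds: float = 3.0  # hypothetical; not a documented env var
    max_duration_seconds: float = 90.0  # REEL_MAX_DURATION_SECONDS
    max_bytes: int = 100_000_000  # REEL_MAX_BYTES
    min_width: int = 240  # REEL_MIN_WIDTH
    min_height: int = 240  # REEL_MIN_HEIGHT
    min_aspect_ratio: float = 0.35  # REEL_MIN_ASPECT_RATIO (assumed width / height)
    max_aspect_ratio: float = 3.0  # REEL_MAX_ASPECT_RATIO


def reel_rejection_reason(
    duration: Optional[float],
    width: Optional[int],
    height: Optional[int],
    size_bytes: Optional[int] = None,
    limits: Optional[ReelLimits] = None,
) -> Optional[str]:
    """Return a rejection reason, or None if the candidate passes.

    Each check runs only when the relevant metadata is known.
    """
    limits = limits or ReelLimits()
    if duration is not None:
        if duration < limits.min_duration_seconds:
            return "too_short"
        if duration > limits.max_duration_seconds:
            return "too_long"
    if size_bytes is not None and size_bytes > limits.max_bytes:
        return "too_large"
    if width and height:
        if width < limits.min_width or height < limits.min_height:
            return "low_resolution"
        if not limits.min_aspect_ratio <= width / height <= limits.max_aspect_ratio:
            return "weird_aspect"
    return None
```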
Candidates are no longer ranked only by Reddit score. The scorer combines:
- Reddit score, comment count, upvote ratio, post age, and listing freshness.
- Title clarity, shareability, quick-payoff language, Reddit-only context risk, and top-comment reaction signals.
- Media type, Reel duration, and available video width/height/aspect metadata.
- Duplicate/repost risk from tracker and performance history.
- Generated historical weights for subreddit, posting hour, media type, caption template, hashtag pool, and Reel duration bucket.
The default scorer prefers posts from the last 6-12 hours, downranks older content, soft-downranks Reels over 45 seconds, and rejects exact duplicates. Thresholds are configurable through env vars or state/optimized-config.json on bot-state.
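The freshness preference and the Reel duration soft downrank can be pictured as multiplicative factors on a base score. This sketch assumes linear decay between SCORING_PREFERRED_AGE_HOURS (12) and SCORING_MAX_AGE_HOURS (48), a linear duration penalty from REEL_SOFT_MAX_DURATION_SECONDS (45) down to a hypothetical 0.5 floor at REEL_MAX_DURATION_SECONDS (90), and a cutoff at SCORING_MINIMUM_TOTAL_SCORE (35). The curve shapes and function names are illustrative, not taken from scoring.py.

```python
from typing import Optional


def freshness_factor(age_hours: float, preferred: float = 12.0, max_age: float = 48.0) -> float:
    """1.0 up to the preferred age, then linear decay to 0.0 at max_age."""
    if age_hours <= preferred:
        return 1.0
    if age_hours >= max_age:
        return 0.0
    return 1.0 - (age_hours - preferred) / (max_age - preferred)


def duration_factor(seconds: float, soft_max: float = 45.0, hard_max: float = 90.0, floor: float = 0.5) -> float:
    """No penalty up to soft_max, then a linear soft downrank toward `floor` at hard_max."""
    if seconds <= soft_max:
        return 1.0
    overshoot = min(seconds, hard_max) - soft_max
    return 1.0 - (1.0 - floor) * overshoot / (hard_max - soft_max)


def adjusted_score(
    base: float,
    age_hours: float,
    duration: Optional[float] = None,
    is_exact_duplicate: bool = False,
    minimum: float = 35.0,
) -> Optional[float]:
    """Return the adjusted score, or None if the candidate is rejected."""
    if is_exact_duplicate:
        return None  # exact duplicates are rejected outright, not downranked
    score = base * freshness_factor(age_hours)
    if duration is not None:
        score *= duration_factor(duration)
    return score if score >= minimum else None
```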
Captions use the Reddit title, subreddit attribution, media type, score signals, and top-comment reaction signals. The bot rotates CTA templates such as "Send this to the friend who needs to see it.", "Rate this 1-10.", and "The comments were ruthless." It avoids recently used templates, keeps hashtags minimal, randomizes hashtag pools, and preserves the "via r/{subreddit} on Reddit" attribution.
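A minimal sketch of template rotation: pick a CTA that was not used recently (falling back to any template if all were used), pick a random hashtag pool, and always append the attribution line. The CTA strings are the ones quoted above; the template ids, hashtag pools, caption layout, and build_caption function are hypothetical. Returning the template and pool ids matches the fact that the performance store records them per post.

```python
import random
from typing import Optional, Set, Tuple

# CTA text comes from this README; the ids and hashtag pools are hypothetical.
CTA_TEMPLATES = {
    "send_friend": "Send this to the friend who needs to see it.",
    "rate": "Rate this 1-10.",
    "comments": "The comments were ruthless.",
}
HASHTAG_POOLS = {
    "general": ["#memes", "#funny"],
    "daily": ["#memesdaily", "#lol"],
}


def build_caption(
    title: str,
    subreddit: str,
    recent_template_ids: Set[str],
    rng: Optional[random.Random] = None,
) -> Tuple[str, str, str]:
    """Return (caption, template_id, hashtag_pool_id)."""
    rng = rng or random.Random()
    fresh = [tid for tid in CTA_TEMPLATES if tid not in recent_template_ids]
    # If every template was used recently, fall back to the full set.
    template_id = rng.choice(fresh or list(CTA_TEMPLATES))
    pool_id = rng.choice(list(HASHTAG_POOLS))
    hashtags = " ".join(HASHTAG_POOLS[pool_id])
    caption = (
        f"{title}\n\n{CTA_TEMPLATES[template_id]}\n\n"
        f"via r/{subreddit} on Reddit\n\n{hashtags}"
    )
    return caption, template_id, pool_id
```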
meme_bot/
config.py Environment configuration and validation
instagram.py Instagram Graph API client for images and Reels
analytics_runner.py Scheduled Instagram metric collection
captions.py Caption template and hashtag generation
optimizer.py Scheduled config self-optimization
performance_store.py Versioned JSON performance history and duplicate index
reel_runner.py Reels orchestration
reel_source.py Reddit video candidate fetching and filtering
reddit_source.py Reddit candidate fetching and filtering
retry.py HTTP retry and Retry-After handling
runner.py Main orchestration
scoring.py Instagram-focused candidate scoring
tuning.py Generated config, env overrides, and tuning defaults
token_manager.py Instagram token expiry check and refresh
tracker.py JSONL audit trail loading, duplicate checks, and hashing
video_processing.py yt-dlp/ffmpeg download, normalization, and hashing
scripts/
refresh_instagram_token.py
cleanup_facebook_page_history.py
.github/workflows/
post-meme.yml
post-reel.yml
refresh-instagram-token.yml
collect-instagram-analytics.yml
self-optimize.yml
main.py Thin image entrypoint wrapper
publish.py Thin image-publishing compatibility wrapper
- Python 3.10 or newer.
- ffmpeg and ffprobe for Reel posting.
- Reddit API credentials for a script app.
- Instagram Business or Creator account connected for Instagram Graph API publishing.
- GitHub Actions secrets for scheduled automation.
Install dependencies:
pip install -r requirements.txt
Required GitHub Actions secrets:
- ACCESS_TOKEN
- INSTAGRAM_ACCOUNT_ID
- FB_APP_ID
- FB_APP_SECRET
- REDDIT_CLIENT_ID
- REDDIT_CLIENT_SECRET
- REDDIT_USER_AGENT
- GH_SECRETS_TOKEN
Optional secrets:
- REDDIT_USERNAME
- REDDIT_PASSWORD
Optional GitHub Actions variables or local environment variables:
- SUBREDDITS: comma-separated list, default memes
- IMAGE_LISTING_MODES: comma-separated Reddit listings, default hot,rising,new; supports hot, rising, new, and top
- POST_TIME_FILTER: Reddit listing period, default day
- POST_LIMIT: posts scanned per subreddit, default 100
- MIN_SCORE: minimum Reddit score, default 0
- TRACKER_PATH: default state/posted.jsonl
- PERFORMANCE_STORE_PATH: default state/performance.json
- RUN_HISTORY_PATH: default state/run-history.jsonl
- OPTIMIZED_CONFIG_PATH: default state/optimized-config.json
- GRAPH_VERSION: default v22.0
- REQUEST_TIMEOUT_SECONDS: default 20
- MAX_RETRY_ATTEMPTS: default 3
- RETRY_BASE_SECONDS: default 2
- REFRESH_THRESHOLD_DAYS: default 21
- DRY_RUN: when true, selects and hashes a candidate but does not publish or write tracker state
- USE_REDDIT_SAVED_GUARD: skip Reddit submissions already saved by the authenticated Reddit account
- MARK_REDDIT_SAVED: save the Reddit submission after Instagram publishing succeeds
- REEL_SUBREDDITS: comma-separated list, defaults to SUBREDDITS, then memes
- REEL_LISTING_MODES: comma-separated Reddit listings, default hot,rising,new
- REEL_POST_TIME_FILTER: Reddit listing period for Reels, default day
- REEL_POST_LIMIT: posts scanned per Reel subreddit, default 100
- REEL_MIN_SCORE: minimum Reddit score for Reels, defaults to MIN_SCORE
- REEL_TRACKER_PATH: default state/reels-posted.jsonl
- REEL_MAX_DURATION_SECONDS: default 90
- REEL_SOFT_MAX_DURATION_SECONDS: scoring downrank starts above this duration, default 45
- REEL_MAX_BYTES: default 100000000
- REEL_MIN_WIDTH: reject detectable videos below this width, default 240
- REEL_MIN_HEIGHT: reject detectable videos below this height, default 240
- REEL_MIN_ASPECT_RATIO: reject detectable videos below this aspect ratio, default 0.35
- REEL_MAX_ASPECT_RATIO: reject detectable videos above this aspect ratio, default 3.0
- REEL_SHARE_TO_FEED: default false; keep false to publish Reels only, without also sharing them to the feed
- REELS_DRY_RUN: when true, selects, downloads, and hashes a Reel candidate but does not publish or write tracker state
- TOP_COMMENTS_LIMIT: top Reddit comments fetched per candidate, default 5
- SCORING_MINIMUM_TOTAL_SCORE: default 35
- SCORING_PREFERRED_AGE_HOURS: default 12
- SCORING_MAX_AGE_HOURS: default 48
- INSTAGRAM_INSIGHT_METRICS: default likes,comments,saved,shares,reach,views,plays,total_interactions,follows
- ANALYTICS_LOOKBACK_DAYS: default 14
- ANALYTICS_MAX_MEDIA_PER_RUN: default 50
- SELF_OPTIMIZATION_ENABLED: default true; set false to disable scheduled config updates
- OPTIMIZER_MIN_TOTAL_SAMPLES: default 20
- OPTIMIZER_MIN_GROUP_SAMPLES: default 5
- OPTIMIZER_MAX_WEIGHT_CHANGE: default 0.15
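Comma-separated lists, booleans, and chained fallbacks such as REEL_SUBREDDITS → SUBREDDITS → memes can be parsed with a few small helpers. This is a sketch under assumptions: the accepted truthy spellings and the helper names are hypothetical and may not match config.py. The helpers take any mapping, so pass os.environ in real use and a plain dict in tests.

```python
from typing import List, Mapping

TRUTHY = {"1", "true", "yes", "on"}  # hypothetical accepted spellings


def env_list(env: Mapping[str, str], name: str, default: List[str]) -> List[str]:
    """Split a comma-separated value, dropping blanks; fall back when empty or unset."""
    raw = env.get(name, "")
    items = [part.strip() for part in raw.split(",") if part.strip()]
    return items or list(default)


def env_bool(env: Mapping[str, str], name: str, default: bool = False) -> bool:
    """Parse a boolean flag such as DRY_RUN; unset or blank uses the default."""
    raw = env.get(name)
    if raw is None or not raw.strip():
        return default
    return raw.strip().lower() in TRUTHY


def reel_subreddits(env: Mapping[str, str]) -> List[str]:
    """REEL_SUBREDDITS falls back to SUBREDDITS, then to memes."""
    return env_list(env, "REEL_SUBREDDITS", env_list(env, "SUBREDDITS", ["memes"]))
```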
.github/workflows/post-meme.yml runs twice daily at 14:17 and 23:17 UTC and supports manual dispatch.
For a manual dry run, use workflow dispatch with dry_run set to true. The workflow still loads the tracker and hashes the selected image, but it does not publish to Instagram or append tracker state.
The first successful live run creates the bot-state branch if it does not exist.
Both posting workflows commit state/posted.jsonl or state/reels-posted.jsonl, state/performance.json, and state/run-history.jsonl back to bot-state. Dry runs neither publish nor commit state.
.github/workflows/post-reel.yml runs three times daily at 03:43, 11:43, and 19:43 UTC and supports manual dispatch.
For a manual dry run, use workflow dispatch with dry_run set to true. The workflow still loads the tracker, downloads, normalizes, and hashes the selected video, but it does not publish to Instagram or append tracker state.
.github/workflows/refresh-instagram-token.yml runs weekly and supports manual dispatch.
The workflow checks token validity and expiry. If the token is inside the refresh threshold, it writes the refreshed token to a workflow-local temp file and updates the GitHub Actions ACCESS_TOKEN secret using GH_SECRETS_TOKEN.
Weekly is intentional: Meta long-lived tokens are a multi-week expiry concern, and the script no-ops unless the token is inside REFRESH_THRESHOLD_DAYS.
GH_SECRETS_TOKEN should be a fine-grained token or GitHub App token with permission to update Actions secrets for this repository.
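The refresh decision reduces to comparing the remaining token lifetime with REFRESH_THRESHOLD_DAYS. This sketch assumes the expiry has already been obtained as a timezone-aware UTC datetime during the validity check; token_action is a hypothetical name, not a function in token_manager.py. Already-expired tokens are reported separately, since an expired token generally has to be re-issued manually rather than refreshed.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def token_action(expires_at: Optional[datetime], now: datetime, threshold_days: int = 21) -> str:
    """Return "ok", "refresh", or "expired" for a token expiry timestamp.

    A None expiry is treated as "no known expiry", so the workflow no-ops.
    """
    if expires_at is None:
        return "ok"
    remaining = expires_at - now
    if remaining <= timedelta(0):
        return "expired"  # fail loudly; refreshing needs a still-valid token
    if remaining <= timedelta(days=threshold_days):
        return "refresh"
    return "ok"
```

With a weekly schedule and a 21-day threshold, a roughly 60-day token is refreshed on one of the runs during its last three weeks, leaving several retries before it actually expires.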
.github/workflows/collect-instagram-analytics.yml runs daily and supports manual dispatch. It reads recent Instagram media IDs from state/performance.json, requests available Instagram insights, records unsupported metrics in unavailable_metrics, recalculates normalized performance scores, and commits updates to bot-state.
Metric availability varies by Instagram account type, media type, API version, and token permissions. Missing metrics are treated as unavailable rather than fatal.
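Treating missing metrics as unavailable, rather than fatal, can look like the sketch below. It splits the requested metric names into values that came back and names that did not, then computes a normalized score. The weighting (weighted engagement per 1,000 reached accounts) and all names here are hypothetical; the bot's actual normalization formula is not described in this README.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical weights for illustration only.
METRIC_WEIGHTS = {"likes": 1.0, "comments": 2.0, "saved": 3.0, "shares": 3.0, "follows": 5.0}


def split_metrics(
    requested: Iterable[str], returned: Dict[str, Optional[float]]
) -> Tuple[Dict[str, float], List[str]]:
    """Separate metrics the API returned from those it did not."""
    available: Dict[str, float] = {}
    unavailable: List[str] = []
    for name in requested:
        value = returned.get(name)
        if value is None:
            unavailable.append(name)  # recorded, not raised
        else:
            available[name] = float(value)
    return available, unavailable


def normalized_score(available: Dict[str, float]) -> Optional[float]:
    """Weighted engagement per 1,000 reached accounts; None when reach is missing."""
    reach = available.get("reach")
    if not reach:
        return None
    weighted = sum(w * available.get(m, 0.0) for m, w in METRIC_WEIGHTS.items())
    return 1000.0 * weighted / reach
```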
.github/workflows/self-optimize.yml runs weekly and supports manual dispatch. It reads state/performance.json and state/run-history.jsonl, then writes generated overrides to state/optimized-config.json on bot-state.
Safeguards:
- No changes are made until there are at least OPTIMIZER_MIN_TOTAL_SAMPLES total scored posts and OPTIMIZER_MIN_GROUP_SAMPLES samples for a group.
- Weight changes are capped by OPTIMIZER_MAX_WEIGHT_CHANGE per run and clamped between 0.5 and 1.5.
- Threshold changes are small and based on recent posting rate and performance.
- Every generated update is appended to state/optimization-changelog.jsonl.
- Set SELF_OPTIMIZATION_ENABLED=false to make the optimizer a safe no-op.
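The sample-count gate, per-run cap, and clamp combine naturally into one update rule. This sketch assumes the cap is an absolute change in weight and that the input is a hypothetical "observed ratio" (the group's mean performance divided by the overall mean); optimizer.py may derive targets differently.

```python
def propose_weight(
    current: float,
    observed_ratio: float,
    group_samples: int,
    total_samples: int,
    min_total: int = 20,  # OPTIMIZER_MIN_TOTAL_SAMPLES
    min_group: int = 5,  # OPTIMIZER_MIN_GROUP_SAMPLES
    max_change: float = 0.15,  # OPTIMIZER_MAX_WEIGHT_CHANGE (assumed absolute)
    low: float = 0.5,
    high: float = 1.5,
) -> float:
    """Move a weight toward current * observed_ratio, with guard rails."""
    if total_samples < min_total or group_samples < min_group:
        return current  # not enough evidence: leave the weight unchanged
    target = current * observed_ratio
    step = max(-max_change, min(max_change, target - current))
    return max(low, min(high, current + step))
```

Capping each step means a single lucky or unlucky week can shift a weight by at most 0.15, so several consistent runs are needed before a subreddit or posting hour moves far from neutral.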
Dry run (PowerShell):
$env:DRY_RUN="true"
python main.py
Live run (in a fresh shell, or after clearing the flag with Remove-Item Env:DRY_RUN):
python main.py
Reel dry run:
$env:REELS_DRY_RUN="true"
python -m meme_bot.reel_runner
Live Reel run (again with REELS_DRY_RUN cleared):
python -m meme_bot.reel_runner
Refresh token locally:
python scripts/refresh_instagram_token.py
Dry-run Facebook Page cleanup for Page posts created before June 2026:
$env:FACEBOOK_PAGE_ID="your-page-id"
$env:FACEBOOK_PAGE_ACCESS_TOKEN="your-page-access-token"
python scripts/cleanup_facebook_page_history.py --before 2026-06-01 --resource posts
If FACEBOOK_PAGE_ACCESS_TOKEN is not set, the script falls back to ACCESS_TOKEN. If FACEBOOK_PAGE_ID is not set, it tries to resolve the connected Facebook Page from INSTAGRAM_ACCOUNT_ID.
Live cleanup is intentionally capped and slow by default to reduce request pressure:
python scripts/cleanup_facebook_page_history.py --before 2026-06-01 --resource posts --execute --confirm-permanent-delete DELETE_OLD_PAGE_CONTENT
Use --resource all to include uploaded Page photos, and --max-deletes-per-run to adjust the per-run cap. If Meta returns rate-limit errors, rerun later instead of using aggressive delays.
Each successful post appends one JSON object to state/posted.jsonl:
{"reddit_id":"abc123","image_url":"https://i.redd.it/example.jpg","image_hash":"...","title":"Example","subreddit":"memes","instagram_media_id":"178...","posted_at":"2026-06-02T00:00:00Z"}
Malformed lines are ignored with a warning so that one bad audit line does not stop posting.
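The tolerant JSONL behavior described above can be sketched as follows: blank lines are skipped, and lines that fail to parse (or parse to something other than an object) are logged and skipped instead of aborting the load. The function names are hypothetical and tracker.py may differ in detail; parse_jsonl_lines is split out so the parsing can be exercised without touching the filesystem.

```python
import json
import logging
from pathlib import Path
from typing import Any, Dict, Iterable, List

log = logging.getLogger(__name__)


def parse_jsonl_lines(lines: Iterable[str], source: str = "<memory>") -> List[Dict[str, Any]]:
    """Parse JSONL records, warning about and skipping malformed lines."""
    records: List[Dict[str, Any]] = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            log.warning("Skipping malformed line %d in %s: %s", lineno, source, exc)
            continue
        if isinstance(record, dict):
            records.append(record)
        else:
            log.warning("Skipping non-object line %d in %s", lineno, source)
    return records


def load_jsonl(path: Path) -> List[Dict[str, Any]]:
    """Load a tracker file; a missing file means no history yet."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as handle:
        return parse_jsonl_lines(handle, source=str(path))


def append_jsonl(path: Path, record: Dict[str, Any]) -> None:
    """Append one compact JSON object per line."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, separators=(",", ":")) + "\n")
```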
Each successful Reel appends one JSON object to state/reels-posted.jsonl:
{"reddit_id":"abc123","source_url":"https://v.redd.it/example","video_hash":"...","title":"Example","subreddit":"memes","instagram_media_id":"178...","posted_at":"2026-06-02T00:00:00Z"}
state/performance.json is a compact, versioned JSON file committed to bot-state. It stores each published Reddit post or Reel with:
- Reddit id, Reddit URL, source/media URLs, normalized title, subreddit, media type, and optional media hash.
- Instagram media id and permalink when available.
- Posted timestamp, posting hour/day, generated score, and score breakdown.
- Caption template id, hashtag pool id, and hashtags used.
- Reel duration when available.
- Latest analytics metrics, unavailable metric names, metric snapshots, and final normalized performance score.
The store is pruned automatically by PERFORMANCE_MAX_POSTS, PERFORMANCE_MAX_AGE_DAYS, and PERFORMANCE_MAX_SNAPSHOTS_PER_POST.
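Pruning by count, age, and snapshot history can be sketched as a pure function over the store layout used in the reset instructions ({"version":1,"meta":{},"posts":[...]}). The per-post "posted_at" and "snapshots" keys, the default limits, and the prune_store name are assumptions; the real defaults for PERFORMANCE_MAX_POSTS, PERFORMANCE_MAX_AGE_DAYS, and PERFORMANCE_MAX_SNAPSHOTS_PER_POST are not listed in this README.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List


def parse_ts(value: str) -> datetime:
    # The tracker examples end in "Z"; fromisoformat only accepts that from Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def prune_store(
    store: Dict[str, Any],
    now: datetime,
    max_posts: int = 500,  # hypothetical default
    max_age_days: int = 180,  # hypothetical default
    max_snapshots: int = 10,  # hypothetical default
) -> Dict[str, Any]:
    """Return a pruned copy: newest posts first, bounded by age, count, and snapshots."""
    cutoff = now - timedelta(days=max_age_days)
    recent = [p for p in store.get("posts", []) if parse_ts(p["posted_at"]) >= cutoff]
    recent.sort(key=lambda p: parse_ts(p["posted_at"]), reverse=True)
    pruned: List[Dict[str, Any]] = []
    for post in recent[:max_posts]:
        post = dict(post)  # copy so the caller's store is not mutated
        if "snapshots" in post:
            # Keep only the most recent snapshots.
            post["snapshots"] = post["snapshots"][-max_snapshots:] if max_snapshots > 0 else []
        pruned.append(post)
    return {**store, "posts": pruned}
```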
To reset history, edit or remove files on the bot-state branch:
- Reset duplicate/performance learning: remove state/performance.json or replace it with {"version":1,"meta":{},"posts":[]}.
- Reset image duplicates only: edit state/posted.jsonl.
- Reset Reel duplicates only: edit state/reels-posted.jsonl.
- Reset generated tuning: remove state/optimized-config.json and, optionally, state/optimization-changelog.jsonl.
If Instagram publishing succeeds but the workflow fails before pushing bot-state, inspect the workflow logs for the Reddit id, Instagram media id, caption template, score breakdown, and media hash, then manually add the corresponding record to the tracker and performance files on the bot-state branch. Enabling MARK_REDDIT_SAVED=true provides a secondary guard when Reddit user credentials are configured.