📣 Connects your web site to social media. Likes, retweets, mentions, cross-posting, and more...
Latest commit 367ac2a Jun 4, 2020

Files

Type Name Latest commit message Commit time
.circleci circle: add back apt-get update Dec 25, 2019
docs misc setup and doc tweaks Mar 16, 2020
scripts misc config tweaks Jun 4, 2020
static adding reddit support Apr 29, 2020
templates reddit posts and comments being pulled Apr 29, 2020
tests update tests to handle granary's new aria-hidden="true" on empty links May 24, 2020
.gcloudignore python 3: minor readme and config tweaks Dec 20, 2019
.gitignore misc config tweaks Jun 4, 2020
.gitmodules start to port to virtualenv and pip for dependencies Jun 21, 2015
README.md readme: add curl command for manually running poll/propagate tasks Jun 4, 2020
__init__.py unify appengine config code into webutil, refactor other config Dec 23, 2019
admin.py fix /admin/sources Feb 12, 2020
app.py remove publish callback handler, consolidate add and delete callback Apr 29, 2020
app.yaml app.yaml: drop min_instances 1, it increased frontend instance hours … Feb 7, 2020
appengine_config.py unify appengine config code into webutil, refactor other config Dec 23, 2019
background.yaml misc config tweaks Jun 4, 2020
beta_users.txt misc config tweaks Jun 4, 2020
blog_webmention.py unify appengine config code into webutil, refactor other config Dec 23, 2019
blogger.py escape key string id in auth entities if it starts and ends with __ Jan 19, 2020
cloud_storage_lifecycle.json migrate weekly datastore backup to import/export API Feb 7, 2019
cron.py mastodon profile picture cron job bug fix: user id, not account address May 20, 2020
cron.yaml add cron job to update mastodon profile pictures May 15, 2020
domain_blocklist.txt add a few misbehaving blog webmention domains to blocklist Apr 7, 2020
dos.yaml un-block aperture Aug 12, 2019
flickr.py escape key string id in auth entities if it starts and ends with __ Jan 19, 2020
github.py escape key string id in auth entities if it starts and ends with __ Jan 19, 2020
handlers.py adding reddit support Apr 29, 2020
index.yaml tweak app engine scaling: min instances 1 for app, switch background … Jan 23, 2020
indieauth_client_id finish indieauth signup for instagram scraping! for #603 Apr 3, 2016
instagram.py instagram: do signup requests with logged in cookie Apr 5, 2020
keys.md5 delete facebook.py and friends Dec 20, 2019
mastodon.py mastodon: oops, need read:statuses and read:notifications oauth scope Jan 28, 2020
medium.py escape key string id in auth entities if it starts and ends with __ Jan 19, 2020
meetup.py Fix: Request Publish scopes on sign-in Feb 17, 2020
models.py appending old inReplyTos (#941) May 12, 2020
oauth_dropins add mox3 dep, update oauth_dropins symlink Mar 13, 2020
original_post_discovery.py rename domain_blacklist.txt => domain_blocklist.txt Jan 9, 2020
publish.py temporarily drop ndb transaction in publish May 24, 2020
queue.yaml cut poll queue down to just one task at a time Apr 15, 2020
readthedocs.yml yet another attempt to fix sphinx docs build: use sphinx >=2.4 Feb 24, 2020
reddit.py slightly improved search query May 3, 2020
requirements.txt upgrade ndb to 1.2.1; fix crash in publish error handling May 16, 2020
superfeedr.py blogs, in superfeedr handler: short circuit out after 10 links, skip … Apr 5, 2020
tasks.py adding reddit support Apr 29, 2020
tumblr.py unify appengine config code into webutil, refactor other config Dec 23, 2019
twitter.py escape key string id in auth entities if it starts and ends with __ Jan 19, 2020
util.py switch back to google-cloud-ndb's TextProperty(compressed=True) Mar 5, 2020
webmention.py noop: move fragment extraction from publish to webutil Feb 22, 2020
wordpress_rest.py escape key string id in auth entities if it starts and ends with __ Jan 19, 2020

README.md

Bridgy

Bridgy connects your web site to social media. Likes, retweets, mentions, cross-posting, and more. See the user docs for more details, or the developer docs if you want to contribute.

https://brid.gy/

Bridgy is part of the IndieWeb ecosystem. In IndieWeb terminology, Bridgy offers backfeed, POSSE, and webmention support as a service.

License: This project is placed in the public domain.

Development

You'll need the Google Cloud SDK (aka gcloud) with the gcloud-appengine-python, gcloud-appengine-python-extras and google-cloud-sdk-datastore-emulator components. Then, create a Python 3 virtualenv and install the dependencies with:

python3 -m venv local3
source local3/bin/activate
pip install -r requirements.txt
ln -s local3/lib/python3*/site-packages/oauth_dropins  # needed to serve static file assets in dev_appserver
gcloud config set project brid-gy

Now, you can fire up the gcloud emulator and run the tests:

gcloud beta emulators datastore start --no-store-on-disk --consistency=1.0 --host-port=localhost:8089 < /dev/null >& /dev/null &
python3 -m unittest discover -s tests -t .
kill %1

If you send a pull request, please include or update a test for your new code!

To test a poll or propagate task, find the relevant Would add task line in the logs, eg:

INFO:root:Would add task: projects//locations/us-central1/queues/poll {'app_engine_http_request': {'http_method': 'POST', 'relative_uri': '/_ah/queue/poll', 'app_engine_routing': {'service': 'background'}, 'body': b'source_key=agNhcHByFgsSB1R3aXR0ZXIiCXNjaG5hcmZlZAw&last_polled=1970-01-01-00-00-00', 'headers': {'Content-Type': 'application/x-www-form-urlencoded'}}, 'schedule_time': seconds: 1591176072

...pull out the relative_uri and body, and then put them together in a curl command against the background service, which usually runs on http://localhost:8081/, eg:

curl -d 'source_key=agNhcHByFgsSB1R3aXR0ZXIiCXNjaG5hcmZlZAw&last_polled=1970-01-01-00-00-00' \
  http://localhost:8081/_ah/queue/poll
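
If you do this often, the extraction can be scripted. Here is a minimal Python sketch, assuming the log line has the same shape as the example above and that the background service is on http://localhost:8081/. The function name curl_for_task is hypothetical and not part of Bridgy.

```python
import re
import shlex

# Assumption: the background service listens here, as described above.
BACKGROUND_URL = 'http://localhost:8081'


def curl_for_task(log_line, base_url=BACKGROUND_URL):
    """Builds a curl command string from a 'Would add task' log line.

    Uses regexes instead of parsing the whole logged dict, since the
    logged repr isn't guaranteed to be valid Python.
    """
    uri = re.search(r"'relative_uri': '([^']+)'", log_line)
    body = re.search(r"'body': b'([^']*)'", log_line)
    if not uri or not body:
        raise ValueError('no relative_uri or body found in log line')
    # Quote the body for the shell, since it contains & characters.
    return 'curl -d %s %s' % (shlex.quote(body.group(1)),
                              base_url.rstrip('/') + uri.group(1))
```

Pass it one log line and paste the returned string into your shell.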

To run the entire app locally, run this in the repo root directory:

dev_appserver.py --log_level debug --enable_host_checking false \
  --support_datastore_emulator --datastore_emulator_port=8089 \
  --application=brid-gy ~/src/bridgy/app.yaml ~/src/bridgy/background.yaml

(Note: dev_appserver.py is incompatible with Python 3. If python3 is your default python, you can run python2 /location/of/dev_appserver.py ... instead.)

Open localhost:8080 and you should see the Bridgy home page!

If you hit an error during setup, check out the oauth-dropins Troubleshooting/FAQ section. For searchability, here are a handful of error messages that have solutions there:

bash: ./bin/easy_install: ...bad interpreter: No such file or directory

ImportError: cannot import name certs

ImportError: No module named dev_appserver

ImportError: cannot import name tweepy

File ".../site-packages/tweepy/auth.py", line 68, in _get_request_token
  raise TweepError(e)
TweepError: must be _socket.socket, not socket

error: option --home not recognized

There's a good chance you'll need to make changes to granary, oauth-dropins, or webmention-tools at the same time as bridgy. To do that, clone their repos elsewhere, then install them in "source" mode with:

pip uninstall -y oauth-dropins
pip install -e <path-to-oauth-dropins-repo>
ln -sf <path-to-oauth-dropins-repo>/oauth_dropins  # needed to serve static file assets in dev_appserver

pip uninstall -y granary
pip install -e <path to granary>

pip uninstall -y webmentiontools
pip install <path to webmention-tools>

To deploy to App Engine, run scripts/deploy.sh.

remote_api_shell is a useful interactive Python shell that can interact with the production app's datastore, memcache, etc. To use it, create a service account, download its JSON credentials file, store it somewhere safe, and put its path in your GOOGLE_APPLICATION_CREDENTIALS environment variable.

Deploying to your own App Engine project can be useful for testing, but is not recommended for production. To do it, create a project in the Google Cloud console and activate the Tasks API. Initialize the project on the command line using gcloud config set project <project-name> followed by gcloud app create. You will need to update TASKS_LOCATION in util.py to match your project's location. You will also need to add your "background" domain (eg background.YOUR-APP-NAME.appspot.com) to OTHER_DOMAINS in util.py and set host_url in tasks.py to your base app URL (eg app-dot-YOUR-APP-NAME.wn.r.appspot.com). Finally, deploy (after testing) with gcloud -q beta app deploy --no-cache --project YOUR-APP-NAME *.yaml

Adding a new silo

So you want to add a new silo? Maybe MySpace, or Friendster, or even Tinder? Great! Here are the steps to do it. It looks like a lot, but it's not that bad, honest.

  1. Find the silo's API docs and check that it can do what Bridgy needs. At minimum, it should be able to get a user's posts and their comments, likes, and reposts, depending on which of those the silo supports. If you want publish support, it should also be able to create posts, comments, likes, reposts, and/or RSVPs.
  2. Fork and clone this repo.
  3. Create an app (aka client) in the silo's developer console, grab your app's id (aka key) and secret, and put them into new local files in the repo root dir, following this pattern. You'll eventually want to send them to @snarfed too, but no hurry.
  4. Add the silo to oauth-dropins if it's not already there:
    1. Add a new .py file for your silo with an auth model and handler classes. Follow the existing examples.
    2. Add a 100 pixel tall button image named [NAME]_2x.png, where [NAME] is your start handler class's NAME constant, eg 'twitter'.
    3. Add it to the app front page and the README.
  5. Add the silo to granary:
    1. Add a new .py file for your silo. Follow the existing examples. At minimum, you'll need to implement get_activities_response and convert your silo's API data to ActivityStreams.
    2. Add a new unit test file and write some tests!
    3. Add it to api.py (specifically Handler.get), app.py, index.html, and the README.
  6. Add the silo to Bridgy:
    1. Add a new .py file for your silo with a model class. Follow the existing examples.
    2. Add it to app.py and handlers.py (just import the module).
    3. Add a 48x48 PNG icon to static/.
    4. Add a new [SILO]_user.html file in templates/ and add the silo to index.html. Follow the existing examples.
    5. Add the silo to about.html and this README.
    6. If users' profile picture URLs can change, add a cron job that updates them to cron.py.
  7. Optionally add publish support:
    1. Implement create and preview_create for the silo in granary.
    2. Add the silo to publish.py: import its module, add it to SOURCES, and update this error message.
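
To give a feel for the granary step, here is a rough Python sketch of converting a silo's API data to ActivityStreams 1. Everything about the silo is hypothetical: the input field names (id, permalink, text, created_at, user), the silo.example domain, and the helper names are assumptions for illustration. The response wrapper only loosely follows an ActivityStreams collection, so check the existing granary silos for what get_activities_response is really expected to return.

```python
# Hypothetical domain used to build tag: URI ids for this made-up silo.
SILO_DOMAIN = 'silo.example'


def post_to_activity(post):
    """Converts one (hypothetical) silo API post dict to an AS1 activity."""
    obj = {
        'objectType': 'note',
        'id': 'tag:%s:%s' % (SILO_DOMAIN, post['id']),
        'url': post['permalink'],
        'content': post['text'],
        'published': post['created_at'],
        'author': {
            'objectType': 'person',
            'displayName': post['user']['name'],
            'url': post['user']['profile_url'],
        },
    }
    # The activity wraps the object and says what happened to it.
    return {'verb': 'post', 'id': obj['id'], 'url': obj['url'], 'object': obj}


def activities_response(posts):
    """Wraps converted posts in a collection-style response dict."""
    items = [post_to_activity(p) for p in posts]
    return {
        'startIndex': 0,
        'itemsPerPage': len(items),
        'totalItems': len(items),
        'items': items,
    }
```

Comments, likes, and reposts would need their own conversions (eg different verbs and objectTypes), which this sketch leaves out.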

Good luck, and happy hacking!

Monitoring

App Engine's built-in dashboard and log browser are pretty good for interactive monitoring and debugging.

For alerting, we've set up Google Cloud Monitoring (née Stackdriver). Background in issue 377. It sends alerts by email and SMS when HTTP 4xx responses average >.1qps or 5xx >.05qps, latency averages >15s, or instance count averages >5 over the last 15m window.
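
As a way to read those thresholds, here is a small Python sketch that checks a set of 15-minute averages against them. It only illustrates the rules as stated above and is not how Cloud Monitoring is configured. The metric key names and the breached function are made up for this example.

```python
# Thresholds from the alerting description above. Key names are hypothetical.
THRESHOLDS = {
    'qps_4xx': 0.1,      # HTTP 4xx responses per second
    'qps_5xx': 0.05,     # HTTP 5xx responses per second
    'latency_secs': 15,  # average latency in seconds
    'instances': 5,      # average instance count
}


def breached(averages):
    """Returns the sorted names of metrics whose average exceeds its limit.

    averages: dict mapping metric name to its average over the last 15m.
    Metrics missing from averages are treated as 0, ie not breached.
    """
    return sorted(name for name, limit in THRESHOLDS.items()
                  if averages.get(name, 0) > limit)
```

For example, breached({'qps_4xx': 0.2, 'latency_secs': 20, 'instances': 5}) returns ['latency_secs', 'qps_4xx'], since an instance count of exactly 5 doesn't exceed the limit.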

Stats

I occasionally generate stats and graphs of usage and growth from the BigQuery dataset (#715). Here's how.

  1. Export the full datastore to Google Cloud Storage. Include all entities except *Auth and other internal details. Check to see if any new kinds have been added since the last time this command was run.

    gcloud datastore export --async gs://brid-gy.appspot.com/stats/ --kinds Blogger,BlogPost,BlogWebmention,FacebookPage,Flickr,GitHub,GooglePlusPage,Instagram,Medium,Meetup,Publish,PublishedPage,Response,SyndicatedPost,Tumblr,Twitter,WordPress
    

    Note that --kinds is required. From the export docs: "Data exported without specifying an entity filter cannot be loaded into BigQuery."

  2. Wait for it to be done with gcloud datastore operations list | grep done.

  3. Import it into BigQuery:

    for kind in BlogPost BlogWebmention Publish Response SyndicatedPost; do
      bq load --replace --nosync --source_format=DATASTORE_BACKUP datastore.$kind gs://brid-gy.appspot.com/stats/all_namespaces/kind_$kind/all_namespaces_kind_$kind.export_metadata
    done
    
    for kind in Blogger FacebookPage Flickr GitHub GooglePlusPage Instagram Medium Meetup Tumblr Twitter WordPress; do
      bq load --replace --nosync --source_format=DATASTORE_BACKUP sources.$kind gs://brid-gy.appspot.com/stats/all_namespaces/kind_$kind/all_namespaces_kind_$kind.export_metadata
    done
    
  4. Check the jobs with bq ls -j, then wait for them with bq wait.

  5. Run the full stats BigQuery query. Download the results as CSV.

  6. Open the stats spreadsheet. Import the CSV, replacing the data sheet.

  7. Check out the graphs! Save full size images with OS or browser screenshots, thumbnails with the Save Image button. Then post them!
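
The two import loops in step 3 differ only in the target dataset and the list of kinds, so they can be generated. Here is a Python sketch that builds the same bq load argument lists. The kinds and paths are copied from the commands above; the names KINDS_BY_DATASET, bq_load_command, and all_load_commands are hypothetical. It only builds the commands and doesn't run them.

```python
STATS_PREFIX = 'gs://brid-gy.appspot.com/stats'

# BigQuery dataset name => datastore kinds loaded into it, as in step 3.
KINDS_BY_DATASET = {
    'datastore': ['BlogPost', 'BlogWebmention', 'Publish', 'Response',
                  'SyndicatedPost'],
    'sources': ['Blogger', 'FacebookPage', 'Flickr', 'GitHub',
                'GooglePlusPage', 'Instagram', 'Medium', 'Meetup', 'Tumblr',
                'Twitter', 'WordPress'],
}


def bq_load_command(dataset, kind, prefix=STATS_PREFIX):
    """Returns the bq load argument list for one kind's export metadata."""
    metadata = '%s/all_namespaces/kind_%s/all_namespaces_kind_%s.export_metadata' % (
        prefix, kind, kind)
    return ['bq', 'load', '--replace', '--nosync',
            '--source_format=DATASTORE_BACKUP',
            '%s.%s' % (dataset, kind), metadata]


def all_load_commands():
    """Returns one argument list per kind, across both datasets."""
    return [bq_load_command(dataset, kind)
            for dataset, kinds in KINDS_BY_DATASET.items()
            for kind in kinds]


if __name__ == '__main__':
    for cmd in all_load_commands():
        print(' '.join(cmd))
```

Keeping the kinds in one place also makes it easier to notice when a kind is loaded here but missing from the export in step 1.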

Misc

The datastore is automatically backed up by an App Engine cron job that runs Datastore managed export (details) and stores the results in Cloud Storage, in the brid-gy.appspot.com bucket. It backs up weekly and includes all entities except Response and SyndicatedPost, since they make up 92% of all entities by size and they aren't as critical to keep.

(We used to use Datastore Admin Backup, but it shut down in Feb 2019.)
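
For reference, a Datastore managed export request takes an output URL prefix and an entity filter listing the kinds to include. Here is a Python sketch of building such a request body with Response and SyndicatedPost left out. This is not the code in cron.py. The function name is hypothetical, the list of kinds is supplied by the caller, and the output path is only a guess modeled on the backup file names mentioned in this section.

```python
import datetime

BUCKET = 'brid-gy.appspot.com'
# Kinds skipped by the weekly backup, as described above.
EXCLUDED_KINDS = {'Response', 'SyndicatedPost'}


def export_request_body(all_kinds, date, bucket=BUCKET):
    """Builds a managed export request body for the given kinds and date.

    Assumption: the output prefix format is a guess, not taken from cron.py.
    """
    kinds = sorted(k for k in all_kinds if k not in EXCLUDED_KINDS)
    prefix = 'gs://%s/weekly/datastore_backup_full_%s' % (
        bucket, date.strftime('%Y_%m_%d'))
    return {
        'outputUrlPrefix': prefix,
        'entityFilter': {'kinds': kinds},
    }


if __name__ == '__main__':
    print(export_request_body(['Twitter', 'Response', 'Flickr'],
                              datetime.date.today()))
```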

We use this command to set a Cloud Storage lifecycle policy on that bucket that prunes older backups:

gsutil lifecycle set cloud_storage_lifecycle.json gs://brid-gy.appspot.com
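
The contents of cloud_storage_lifecycle.json aren't shown here. As an illustration only, this Python sketch generates a lifecycle config in the format gsutil lifecycle set accepts, with a single rule that deletes objects older than a given age. The 90 day value and the function name are assumptions, not the policy this project uses.

```python
import json


def lifecycle_config(max_age_days):
    """Returns a lifecycle config that deletes objects older than max_age_days."""
    return {
        'rule': [{
            'action': {'type': 'Delete'},
            'condition': {'age': max_age_days},
        }],
    }


if __name__ == '__main__':
    # 90 days is a made-up example value.
    print(json.dumps(lifecycle_config(90), indent=2))
```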

Run this to see how much space we're currently using:

gsutil du -hsc gs://brid-gy.appspot.com/\*

Run this to download a single complete backup:

gsutil -m cp -r gs://brid-gy.appspot.com/weekly/datastore_backup_full_YYYY_MM_DD_\* .

Also see the BigQuery dataset (#715).
