This repository has been archived by the owner on May 23, 2023. It is now read-only.

Remove inactive profiles from the IndieWeb webring #7

Closed
capjamesg opened this issue Oct 5, 2020 · 2 comments

Comments

@capjamesg

There are a number of inactive profiles on the IndieWeb webring, meaning sites that no longer link back to it. This makes it more difficult for people who are surfing the webring to discover new sites.

The webring was the rabbit hole that got me interested in the IndieWeb. If we remove inactive profiles from the main webring, visitors may be more likely to keep exploring IndieWeb sites.

We could either:

  • Remove the inactive profiles; or
  • Create a new status called "inactive" and add a new subheading to display these webring participants.

We could send a webmention to all inactive profiles asking whether they'd still like to be included on the webring. If someone does not respond, we could add them to the "inactive" list.
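
Under the W3C Webmention spec, sending a webmention takes two steps: discover the receiving site's endpoint (the HTTP `Link` header first, then the first `<link>` or `<a>` element with `rel="webmention"`, resolved against the target URL), then POST form-encoded `source` and `target` URLs to it. Below is a minimal standard-library sketch of those steps. The function names are hypothetical, the `Link` header parsing is simplified, and nothing here reflects how the webring itself would implement the idea.

```python
import re
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urlencode, urljoin
from urllib.request import Request, urlopen


class _WebmentionLinkParser(HTMLParser):
    """Record the href of the first <link> or <a> element with rel="webmention"."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        # Only the first matching element in document order counts
        if self.href is not None or tag not in ("link", "a"):
            return
        attr_map = dict(attrs)
        rels = (attr_map.get("rel") or "").lower().split()
        if "webmention" in rels and attr_map.get("href") is not None:
            self.href = attr_map["href"]


def discover_webmention_endpoint(
    target: str, link_header: Optional[str], html: str
) -> Optional[str]:
    """Find a target page's Webmention endpoint, resolved against the target URL."""
    # 1. HTTP Link header first (simplified: assumes no commas inside the URLs)
    if link_header:
        for part in link_header.split(","):
            match = re.match(r"\s*<([^>]*)>(.*)", part)
            if not match:
                continue
            url, params = match.groups()
            rel = re.search(r'rel="?([^";]+)"?', params)
            if rel and "webmention" in rel.group(1).lower().split():
                return urljoin(target, url)

    # 2. Then the first <link> or <a> element with rel="webmention"
    parser = _WebmentionLinkParser()
    parser.feed(html)
    if parser.href is not None:
        # An empty href resolves to the target page itself
        return urljoin(target, parser.href)
    return None


def send_webmention(endpoint: str, source: str, target: str) -> int:
    """POST form-encoded source and target URLs; return the HTTP status code."""
    data = urlencode({"source": source, "target": target}).encode("ascii")
    request = Request(
        endpoint,
        data=data,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
    # urlopen raises HTTPError for 4xx/5xx responses
    with urlopen(request, timeout=10) as response:
        return response.status
```

In the checking script, `response.headers.get("Link")` and `response.text` from fetching an inactive site could supply the last two arguments to `discover_webmention_endpoint`. Receivers verify that `source` really links to `target`, so the source would have to be a page that links to the member's site; which webring page would play that role is an open assumption.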

Some people may have accidentally removed or changed their webring links while updating their sites, so I'd be in favor of creating a new "inactive" status.

By my count, there are 25 inactive users out of a total of 132 users (about 19%). One of the inactive users is me, because I changed my domain.

I have attached a script below that you can use to verify my findings:

```python
import re
import json

import mf2py
import requests
from bs4 import BeautifulSoup

WEBRING_DIRECTORY = "https://xn--sr8hvo.ws/directory"

# Any link to the webring domain counts; the dot is escaped so it matches literally
WEBRING_LINK = re.compile(r"https://xn--sr8hvo\.ws")

webring_raw = mf2py.parse(url=WEBRING_DIRECTORY)

results = []

# Check whether each site still displays a link to the webring
for item in webring_raw["items"]:
    urls = item.get("properties", {}).get("url")

    # Skip directory entries that have no URL
    if not urls:
        continue

    page = urls[0]
    print("Checking {}".format(page))

    try:
        response = requests.get(page, timeout=10)
        soup = BeautifulSoup(response.content, "html.parser")

        # Search for any instances of the webring URL
        # Users may link in different ways, but the webring URL is common to all profiles
        has_link = soup.find_all("a", href=WEBRING_LINK)

        status = "active" if has_link else "inactive"
    except requests.RequestException as error:
        # Timeouts, DNS failures, refused connections, etc.
        print("Error in retrieving {}: {}".format(page, error))
        status = "error"

    results.append({"url": page, "status": status})

with open("webring_results.json", "w") as file:
    json.dump(results, file, indent=2)

# Script to tally up final results

# total = 0
# inactive = 0
# errors = 0
#
# with open("webring_results.json", "r") as file:
#     data = json.load(file)
#
# for d in data:
#     total += 1
#
#     if d["status"] == "inactive":
#         inactive += 1
#     elif d["status"] == "error":
#         errors += 1
#
# print("There are {} users on the webring.".format(total))
# print("There are {} inactive users.".format(inactive))
# print("There are {} users whose profiles need to be checked.".format(errors))
```

If you are not familiar with Python, I'd be happy to walk you through the script!

@capjamesg
Author

capjamesg commented Oct 5, 2020

Note: The required dependencies are:

  • mf2py
  • requests
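  • BeautifulSoup4

They can be installed with `pip install mf2py requests beautifulsoup4`; `re` and `json` are part of the standard library.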

@martymcguire
Owner

oops, left this open in my tabs! thanks for this, @capjamesg! we discussed it in slack, so i'll try to summarize here.

the webring does have a gardener script (here's the source as of this writing). unfortunately, as the webring runs on glitch.com, i haven't yet felt comfortable setting up an automation to run it periodically.

so, i've been running it by hand whenever folks report in the indieweb chat that the webring is getting full of sites that break the ring.

i've re-run it since we spoke and it cleaned out a bunch of sites.

i like that your version is a bit forgiving about which links a site makes to the webring. the webring checker currently explicitly looks for the previous/next links for that site's unique ID, but encoding issues sometimes give false negatives. 🤔
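
One way to make the link check tolerant of encoding differences, sketched here under the assumption that the false negatives come from the webring host appearing in different forms: the punycode `xn--sr8hvo.ws`, the raw emoji domain 🕸💍.ws (which is the same name), or percent-encoded UTF-8 bytes. This is not how the gardener script works, and `WEBRING_HOST`, `to_ascii_host`, and `links_to_webring` are hypothetical names. The sketch normalizes each link's hostname to its ASCII form before comparing; it checks the host only, so the gardener's previous/next and site-ID check would still need a path comparison on top.

```python
from typing import Optional
from urllib.parse import unquote, urlsplit

# Punycode (ASCII) form of the webring's 🕸💍.ws domain
WEBRING_HOST = "xn--sr8hvo.ws"


def to_ascii_host(host: str) -> str:
    """Convert each non-ASCII label of a hostname to its xn-- punycode form."""
    labels = []
    for label in host.lower().split("."):
        if label.isascii():
            labels.append(label)
        else:
            # The stdlib "punycode" codec is plain RFC 3492 (no IDNA nameprep),
            # which is enough for emoji labels like 🕸💍
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)


def links_to_webring(href: Optional[str]) -> bool:
    """Return True if href points anywhere on the webring, however the host is encoded."""
    if not href:
        return False
    host = urlsplit(href).hostname  # already lowercased; None for relative links
    if host is None:
        return False
    # Undo percent-encoding (e.g. %F0%9F%95%B8...) before normalizing
    return to_ascii_host(unquote(host)) == WEBRING_HOST
```

Because Beautiful Soup accepts a function as an attribute filter, the regex check in the script above could be swapped for `soup.find_all("a", href=links_to_webring)`.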

thanks again!
