
An open, platform-agnostic list of user-agent and referrer regexes for use in podcast analytics services


screeley/user-agents

 
 


User agent list

A list of apps, services and bots that consume podcast data. (A human view is over here).

Contributing to the list

For now, the simplest way is to add to the file at src/user-agents.json. Each app, service or bot should have its own entry.

Each entry must contain the following properties:

  • user_agents (array of strings): a list of regular expressions against which the requesting user-agent should be matched. Backslashes ("\") must be escaped, so instead of ^Echo\/1\., the string should read ^Echo\\/1\\..
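As a rough illustration of the escaping rule, the sketch below decodes a one-entry JSON string and tests a user-agent against the result. It assumes Python's json and re modules as the consumer; the sample user-agent strings and the matches helper are made up for this example.

```python
import json
import re

# One entry as it would be written in the JSON file. The raw string keeps
# the doubled backslashes exactly as they appear on disk.
raw_entry = r'{"user_agents": ["^Echo\\/1\\."]}'

entry = json.loads(raw_entry)
# JSON decoding turns each doubled backslash into a single one,
# so the regex engine receives ^Echo\/1\.
pattern = entry["user_agents"][0]


def matches(user_agent: str) -> bool:
    """Return True if the user-agent matches the decoded pattern."""
    return re.search(pattern, user_agent) is not None
```

With single backslashes in the file, the JSON parser would reject the entry, because \/ followed by \. is not a valid JSON escape sequence.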

Each entry can also contain any of the following optional properties:

  • bot (boolean): set to true when the requesting agent is a bot (no need to set to false otherwise).
  • app (string): set to the human-readable name of the app or service.
  • device (string): set to a slug of the device type, usually one of
    • pc (meaning a desktop or laptop computer running Linux, macOS or Windows)
    • phone
    • radio (a smart radio)
    • speaker (smart speaker)
    • tablet
    • watch
  • os (string): set to the slug of the operating system, usually one of
    • android
    • ios
    • linux
    • macos
    • windows

If bot is set to true, no other properties need to be specified.
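The rules above could be checked before a contribution is merged. The following is only a sketch: validate_entry is a hypothetical helper, it treats an empty user_agents array as an error (an assumption, since the rules only say "array of strings"), and it compiles patterns with Python's regex engine, which may differ from the engine a given analytics service uses.

```python
import re

# Optional properties and the JSON types described above.
OPTIONAL_TYPES = {"bot": bool, "app": str, "device": str, "os": str}


def validate_entry(entry: dict) -> list[str]:
    """Return a list of problems found in one entry (empty if none)."""
    problems = []

    patterns = entry.get("user_agents")
    if not isinstance(patterns, list) or not patterns:
        # Assumption: an empty array is treated as an error.
        problems.append("user_agents must be a non-empty array of strings")
    else:
        for pattern in patterns:
            if not isinstance(pattern, str):
                problems.append(f"pattern is not a string: {pattern!r}")
                continue
            try:
                re.compile(pattern)
            except re.error as exc:
                problems.append(f"invalid regex {pattern!r}: {exc}")

    # Optional properties are only checked when they are present.
    for key, expected in OPTIONAL_TYPES.items():
        if key in entry and not isinstance(entry[key], expected):
            problems.append(f"{key} must be of type {expected.__name__}")

    return problems
```

An entry such as {"user_agents": ["[Bb]ot"], "bot": True} passes with no problems reported, since a bot needs no other properties.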

Slugs

A slug is a lowercase alphanumeric (ASCII) representation of a string, consisting only of numbers, letters and, in our case, underscores. It's up to apps that implement the list to display this information however they see fit, and using a slug is better for disambiguation.
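To make the definition concrete, here is a small sketch of a slug check and a naive slug generator. Both is_slug and slugify are hypothetical helpers, and replacing every run of other characters with a single underscore is one possible convention, not something the list prescribes.

```python
import re

# A slug: one or more lowercase ASCII letters, digits or underscores.
SLUG_RE = re.compile(r"[a-z0-9_]+")


def is_slug(value: str) -> bool:
    """Return True if the whole string is a valid slug."""
    return SLUG_RE.fullmatch(value) is not None


def slugify(name: str) -> str:
    """Lowercase the name and replace runs of other characters with '_'."""
    lowered = name.lower()
    collapsed = re.sub(r"[^a-z0-9]+", "_", lowered)
    # Drop underscores left over at either end, e.g. from "(beta)".
    return collapsed.strip("_")
```

For example, slugify("Smart Speaker") gives smart_speaker, which an app is then free to display however it likes.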

Unknowns

I propose that we only specify a property above when it is known (not assumed). For example, it's often difficult to know whether an Android app is running on a phone or a tablet. We can assume that since Android tablets are rarer, almost all requests will be via Android phones, but we can't know that.

Parsing order

Right now, not much thought has gone into the order; it's roughly alphabetical, with exceptions. It might be worth ordering entries by the accuracy of each set of regexes.
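Order matters as soon as a consumer stops at the first matching entry. The sketch below assumes that first-match-wins behaviour, which the list itself does not mandate; the entries, the ExampleCast app name and the find_entry helper are all invented for illustration.

```python
import re
from typing import Optional

# Hypothetical entries, not taken from the real list. The more specific
# pattern comes first so that it is tried before the generic one.
ENTRIES = [
    {"user_agents": ["^ExampleCast/.*Android"], "app": "ExampleCast", "os": "android"},
    {"user_agents": ["^ExampleCast/"], "app": "ExampleCast"},
    {"user_agents": ["[Bb]ot"], "bot": True},
]


def find_entry(user_agent: str, entries: list = ENTRIES) -> Optional[dict]:
    """Return the first entry with a matching pattern, or None."""
    for entry in entries:
        for pattern in entry["user_agents"]:
            if re.search(pattern, user_agent):
                return entry
    return None
```

If the two ExampleCast entries were swapped, the generic pattern would match Android requests first and the os value would be lost, which is the argument for ordering by accuracy rather than alphabetically.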

Future plans

To stop the list becoming unwieldy, I'll probably split the apps into separate files that are then combined automatically. That makes it harder to make a static list available via GitHub, but it's possible to run a static site and use a CI script (a script that is called when code is committed to this repository) to combine the files and generate the static file.

Happy to accept advice or actual code to make this happen :)

Also, if we do use multiple files, it will become necessary to have some sort of priority or accuracy property for each agent, so that they can be combined in parsing order.
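One way the combining step might look is sketched below. Nothing here exists yet: the priority property name, the rule that higher numbers are parsed first, and the default of 0 are all assumptions, and the per-file lists are passed in memory rather than read from disk to keep the example short.

```python
def combine(groups: list[list[dict]]) -> list[dict]:
    """Merge per-file entry lists into one list in parsing order."""
    merged = [entry for group in groups for entry in group]
    # Assumption: higher "priority" is parsed first, and a missing value
    # counts as 0. sorted() is stable, so entries with equal priority
    # keep the order in which they were supplied.
    return sorted(merged, key=lambda e: e.get("priority", 0), reverse=True)


# Hypothetical contents of two separate files.
file_a = [
    {"user_agents": ["^ExampleCast/"], "app": "ExampleCast", "priority": 1},
    {"user_agents": ["[Bb]ot"], "bot": True},
]
file_b = [
    {"user_agents": ["^ExampleCast/.*Android"], "app": "ExampleCast",
     "os": "android", "priority": 5},
]

combined = combine([file_a, file_b])
```

A CI script could write the combined list out with json.dump to produce the static file.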
