This repository has been archived by the owner on Dec 26, 2023. It is now read-only.

nabeelvalley/gender-pay-gap-bot

 
 

gender-pay-gap-bot

Banner

A bot that replies to companies tweeting about International Women's Day with that company's gender pay gap.

You can see the bot in action here, for such gold as this: Twitter-bot Tweet at UBS

Disclaimer

This was originally hacked together the weekend before IWD, so it is very badly written, inefficient, and buggy, with some minor data issues. The main focus was on gathering as much data as possible on UK companies with GPG (gender pay gap) data in order to generate the most tweets in one day. I am currently upgrading and fixing these issues.

Architecture

As of 2022, the bot runs on AWS and is deployed with a serverless architecture.

Run

To run:

  1. Get Twitter developer access.
  2. Create a .env file.
  3. Run node ./twitter/streamTweets.js, or in test mode: node ./twitter/streamTweets.js test
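The two invocations in step 3 differ only by a trailing `test` argument. As a minimal sketch of how a script might tell them apart (the `RunMode` type and `parseRunMode` helper are hypothetical names for illustration; the repository's actual argument handling is not shown in this README):

```typescript
// Hypothetical helper: the real streamTweets.js may read its arguments differently.
type RunMode = "test" | "live";

// argv follows Node's convention: [nodePath, scriptPath, ...userArgs],
// so the first user-supplied argument sits at index 2.
function parseRunMode(argv: string[]): RunMode {
  return argv[2] === "test" ? "test" : "live";
}
```

In a Node script this would typically be called with `process.argv`, so that only an explicit `test` argument switches the bot out of live mode.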

The output of the programme looks like this:

Twitter-bot Tweet at UBS

Data

Forming a data set linking UK Gov gender pay gap data with Twitter users was by far the hardest part of this hack. The data was downloaded on 2021-03-04, so it will be out of date.

Combined GPG data

The GPG data was combined into one file, data/companies_GPG_Data.json. It is only as up to date as the UK Gov GPG data; the reporting deadline is in April, so more data is submitted around then.

Joining twitter data

Twitter data was joined on from an API, based on company name and location. We then manually checked that the linked accounts were correct and added some more manually gathered data.

This data set is in data/twitterAccountData/twitterUserData-prod.json
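A minimal sketch of what matching on company name and location could look like. All type names, field names, and the normalisation rules here are assumptions for illustration and are not taken from the repository's data files or from any particular API:

```typescript
// Hypothetical shapes: the real JSON files may use different field names.
interface Company {
  name: string;
  town: string;
}

interface TwitterCandidate {
  handle: string;
  displayName: string;
  location: string;
}

// Lower-case, drop common company suffixes, and collapse punctuation to spaces.
function normalise(text: string): string {
  return text
    .toLowerCase()
    .replace(/\b(ltd|limited|plc)\b/g, "")
    .replace(/[^a-z0-9]+/g, " ")
    .trim();
}

// Return the first candidate whose name matches exactly after normalisation
// and whose location text mentions the company's town.
function matchCandidate(
  company: Company,
  candidates: TwitterCandidate[]
): TwitterCandidate | undefined {
  const name = normalise(company.name);
  const town = normalise(company.town);
  if (name === "") return undefined; // nothing sensible to match on
  return candidates.find(
    (c) => normalise(c.displayName) === name && normalise(c.location).includes(town)
  );
}
```

An exact match on normalised names is deliberately strict; it will miss some accounts, which is consistent with the manual checking step described above.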

TODO

  • Join data on companyId and, if the company has no match, fall back to the lower-cased company name
  • Find a more reliable way of finding twitter accounts
  • Clean the rest of the data
  • Write data to a database
  • Collect the twitter Id of the created tweet
  • Check through the tweets for removed attachment_url to see if the company has deleted their tweet
  • Build a backup plan for when we get banned from twitter 😎
  • Make a data set with just companies and twitter accounts
  • Tidy up and reformat the data
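The first TODO item (join on companyId, falling back to the lower-cased company name) could be sketched as follows. The record shapes and the `joinRecords` function are hypothetical; only the `companyId` field name comes from the list above:

```typescript
// Hypothetical record shapes for illustration only.
interface GpgRecord {
  companyId?: string;
  companyName: string;
}

interface TwitterRecord {
  companyId?: string;
  companyName: string;
  handle: string;
}

type JoinedRecord = GpgRecord & { handle: string };

function joinRecords(gpg: GpgRecord[], twitter: TwitterRecord[]): JoinedRecord[] {
  // Index the Twitter records once so each lookup is O(1).
  const byId = new Map<string, TwitterRecord>();
  const byName = new Map<string, TwitterRecord>();
  for (const t of twitter) {
    if (t.companyId) byId.set(t.companyId, t);
    byName.set(t.companyName.toLowerCase(), t);
  }

  const joined: JoinedRecord[] = [];
  for (const g of gpg) {
    // Prefer the id match; fall back to the lower-cased name.
    const hit =
      (g.companyId ? byId.get(g.companyId) : undefined) ??
      byName.get(g.companyName.toLowerCase());
    if (hit) joined.push({ ...g, handle: hit.handle });
  }
  return joined;
}
```

Records with no match on either key are dropped here; a real pipeline might instead keep them in a separate list for manual review.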

Twitter Matching Improvements

Problems

  • We sometimes get duplicate companies with different GPG data but the same location.
  • Companies house => company website => crawl for twitter url
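One way to surface the duplicate problem from the first bullet is to group entries by name and location and report any group with more than one member. This is only a sketch: the `GpgEntry` shape and its field names are assumed for illustration, not taken from the data files:

```typescript
// Hypothetical entry shape; medianGapPercent stands in for whatever GPG fields exist.
interface GpgEntry {
  name: string;
  location: string;
  medianGapPercent: number;
}

// Group entries by a case-insensitive name + location key and
// return only the groups that contain more than one entry.
function findDuplicates(entries: GpgEntry[]): GpgEntry[][] {
  const groups = new Map<string, GpgEntry[]>();
  for (const e of entries) {
    const key = `${e.name.trim().toLowerCase()}|${e.location.trim().toLowerCase()}`;
    const group = groups.get(key) ?? [];
    group.push(e);
    groups.set(key, group);
  }
  return Array.from(groups.values()).filter((g) => g.length > 1);
}
```

Each returned group can then be inspected by hand to decide which GPG figures to keep.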

Manual Data Gathering

  • Search twitter with group: 1
  • no twitter profile: 1

Canvas Lambda Layer

Deploy with this: https://github.com/jwerre/node-canvas-lambda

Languages

  • TypeScript 97.9%
  • Shell 1.9%
  • JavaScript 0.2%