This repository has been archived by the owner on Feb 7, 2023. It is now read-only.

Curated Word Lists #73

Closed
koaning opened this issue Jul 3, 2020 · 4 comments

Comments

@koaning
Owner

koaning commented Jul 3, 2020

Place to discuss a linter to check for biases in word embeddings.

@koaning
Owner Author

koaning commented Jul 3, 2020

The idea is to pass it a language backend and then run a battery of tests that indicate what types of bias may exist in the word embeddings you pass it. We can have linters for different languages, and they can be used to generate a report of sorts that demonstrates some of the potential downsides in the dataset.
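A minimal sketch of what such a linter could look like, assuming the "language backend" is just a callable that maps a word to a vector. The names `run_linter` and `similarity_gap_check`, the toy vectors, the word list, and the 0.2 threshold are all hypothetical choices for illustration, not an existing API.

```python
from typing import Callable, Dict, List

import numpy as np

# Assumption: a backend is any callable mapping a word to its vector,
# and a check takes a backend and returns a list of warning strings.
Backend = Callable[[str], np.ndarray]
Check = Callable[[Backend], List[str]]


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def similarity_gap_check(backend: Backend) -> List[str]:
    """Warn when a word is much closer to "he" than to "she", or vice versa."""
    warnings = []
    for word in ["doctor", "nurse"]:  # hypothetical word list
        vec = backend(word)
        gap = cosine(vec, backend("he")) - cosine(vec, backend("she"))
        if abs(gap) > 0.2:  # arbitrary threshold, chosen for the demo only
            warnings.append(f"{word}: he/she similarity gap {gap:+.2f}")
    return warnings


def run_linter(backend: Backend, checks: Dict[str, Check]) -> Dict[str, List[str]]:
    """Run every check against the backend and collect warnings per check."""
    return {name: check(backend) for name, check in checks.items()}


# Toy 2-d vectors, made up so the example runs without a real model.
toy_vectors = {
    "he": np.array([1.0, 0.0]),
    "she": np.array([0.0, 1.0]),
    "doctor": np.array([1.0, 1.0]),
    "nurse": np.array([0.2, 1.0]),
}

report = run_linter(toy_vectors.__getitem__, {"similarity_gap": similarity_gap_check})
for name, warnings in report.items():
    print(name, warnings)
```

An empty list for a check only means that this particular check found nothing with these particular words; it says nothing about bias the check does not look for.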

@koaning
Owner Author

koaning commented Jul 3, 2020

Here are some tests that come to mind.

We could project some professions onto the man-woman axis in the language embedding. This axis allows us to do some hypothesis tests.

  • There's a set of professions like "nurse" that technically should be gender-neutral. If a profession lands far from the middle of the axis -> flag it.
  • There's a set of descriptions like "beautiful" and "handsome" that might also suggest gender imbalance.
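The projection idea above can be sketched with plain NumPy. The vectors below are toy values invented for illustration, and `gender_projection`, `flag_words`, and the 0.3 threshold are hypothetical; real embeddings would come from whichever backend is being inspected.

```python
import numpy as np

# Toy 3-d vectors, made up purely for illustration.
toy_embeddings = {
    "man": np.array([1.0, 0.0, 0.2]),
    "woman": np.array([-1.0, 0.0, 0.2]),
    "nurse": np.array([-0.6, 0.8, 0.1]),
    "engineer": np.array([0.5, 0.7, 0.1]),
    "teacher": np.array([0.0, 0.9, 0.3]),
}


def gender_projection(word, emb):
    """Scalar projection of `word` onto the unit man-minus-woman axis.

    Positive values lean towards "man", negative values towards "woman".
    """
    axis = emb["man"] - emb["woman"]
    axis = axis / np.linalg.norm(axis)
    return float(np.dot(emb[word], axis))


def flag_words(words, emb, threshold=0.3):
    """Return {word: score} for words whose |projection| exceeds the threshold.

    The threshold is an arbitrary assumption, not an agreed-upon cutoff.
    """
    scores = {w: gender_projection(w, emb) for w in words}
    return {w: s for w, s in scores.items() if abs(s) > threshold}


flagged = flag_words(["nurse", "engineer", "teacher"], toy_embeddings)
for word, score in flagged.items():
    print(f"{word}: {score:+.2f}")
```

With these toy vectors "nurse" and "engineer" are flagged while "teacher" sits in the middle of the axis. A proper hypothesis test would need many words per group and a justified threshold.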

@koaning
Owner Author

koaning commented Aug 6, 2020

The more I think about this, the more I wonder if it is better to just add word lists so people can more easily make comparisons themselves. There isn't really a consensus on how to measure bias.

@koaning
Owner Author

koaning commented Aug 10, 2020

The more I think about it, the riskier a linter seems. Even a linter will have blind spots, and we do not want to give a false impression here. Instead, it may be more appropriate to supply the user with curated word lists.

koaning changed the title from "Linter" to "Curated Word Lists" on Aug 10, 2020
koaning closed this as completed on Aug 25, 2020