This is an experiment in creating a Python library designed to detect names of people with the structure "Mrs. [male first name] [last name]," such as "Mrs. Ralph Mayer" or "Mrs. Tomás Rivera." It uses spaCy and gender-guesser.
This comes out of a practical problem identified in an English-language, U.S.-based library and archives metadata context: legacy descriptions are often repurposed for digital collections, and dated descriptions don't always align with current modes of address. Names are complicated, and this code makes plenty of assumptions about them that are not universal. The goal is not to strive for perfect accuracy or universality, but to flag potential instances in this context where women were not identified by their own names. Flagging these instances lets us focus on them for human review, revision, and further research where needed, in order to more equitably name and represent women in descriptive metadata.
This is a work in progress, written as an exploratory experiment by someone who is primarily a metadata librarian rather than a developer. It was developed in a Python 3.7 environment and should work with other 3.x versions, but hasn't been tested extensively.
- spaCy English model
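    import mrs

    # input_string is the text to analyze, e.g. a description field
    data = mrs.Text(input_string)
    flagged_names = []
    for entity in data.mrs_names:
        name = mrs.Name(entity)
        # Only consider names that look like "[first name] [last name]"
        if name.format == "first_last":
            # Flag anything not guessed to be a female first name
            if name.gender_guess not in ["female", "mostly_female"]:
                flagged_names.append("Mrs. " + name.text)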
For a complete example of analyzing a tabular metadata file and creating a CSV report, see example.py.
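To illustrate the flagging idea without installing anything, here is a minimal standard-library sketch. It is not how the `mrs` module works: a regular expression stands in for spaCy's entity detection, and a tiny hand-picked set (`ASSUMED_FEMALE_FIRST_NAMES`, a hypothetical name) stands in for gender-guesser. Like the usage snippet above, it flags any first name that is not recognized as female, so unknown names are flagged too.

```python
import re

# Hypothetical stand-in for gender-guesser: a tiny, hand-picked set of names.
ASSUMED_FEMALE_FIRST_NAMES = {"Eleanor", "Mary"}

# "Mrs." followed by two words made of letters (first name, last name).
# Hyphenated or multi-part names are not handled in this sketch.
MRS_PATTERN = re.compile(r"\bMrs\.\s+([^\W\d_]+)\s+([^\W\d_]+)")


def flag_mrs_names(text):
    """Return "Mrs. First Last" strings whose first name is not in the assumed female set."""
    flagged = []
    for match in MRS_PATTERN.finditer(text):
        first, last = match.groups()
        if first not in ASSUMED_FEMALE_FIRST_NAMES:
            flagged.append(f"Mrs. {first} {last}")
    return flagged


sample = "Portrait of Mrs. Ralph Mayer with Mrs. Eleanor Roosevelt and Mrs. Tomás Rivera."
print(flag_mrs_names(sample))
```

With this sample, "Mrs. Ralph Mayer" and "Mrs. Tomás Rivera" are flagged for review, while "Mrs. Eleanor Roosevelt" is not. A real name list and entity recognizer would behave differently, so treat the results only as candidates for human review.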