Son Bias in US: Evidence from Business Names
Latest commit 8e8c626 Dec 14, 2019
data First pass of 50 states done Dec 11, 2019
script rvest learning Dec 14, 2019
README.md include the intersection Nov 24, 2019
states-work-tracker.xlsx First pass of 50 states done Dec 11, 2019

README.md

Son Bias in the US: Evidence from Business Names

I estimate the bias toward sons by comparing how often the words son or sons appear in business names relative to daughter or daughters.

In the US, all businesses have to register with a state. All states provide a way to search business names, in part so that new companies can pick names that haven't been used before.

I begin by searching for son(s) and daughter(s) in states' databases of business names. But the results of searching for son are inflated for three reasons:

  • son is part of many English words, from names such as Jason and Robinson to ordinary English words like mason (which can also be a name).

  • son is a Korean name.

  • some businesses use the word son playfully. For instance, son is a homophone of sun, and some people use that to create names like son of a beach.

I address the first concern with a regex that matches only the standalone words son or sons. I also check whether the name contains the words daughter or daughters. However, not all states support regex searches or let people download the full set of results. Where possible, I try to establish a lower bound, but the results still need to be interpreted with care.
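The whole-word matching described above can be sketched in Python as follows. This is a minimal illustration, not the repository's script: the patterns, the `classify` helper, and the sample business names are all assumptions made up for the example.

```python
import re

# Whole-word patterns: \b stops "Jason", "Robinson", and "mason" from matching.
SON_RE = re.compile(r"\bsons?\b", re.IGNORECASE)
DAUGHTER_RE = re.compile(r"\bdaughters?\b", re.IGNORECASE)


def classify(name):
    """Return (has_son, has_daughter) for a single business name."""
    return bool(SON_RE.search(name)), bool(DAUGHTER_RE.search(name))


# Hypothetical business names, invented for illustration only.
names = [
    "Smith and Sons Plumbing",
    "Jason Robinson Masonry",
    "Miller & Daughters Bakery",
    "Son of a Beach Rentals",
]

for name in names:
    print(name, classify(name))
```

Note that a name like "Son of a Beach Rentals" still matches, so a whole-word regex handles only the first concern. Korean names and playful uses would need a separate check.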

In all, I find that the conservative estimate of the son-to-daughter ratio ranges from 4:1 to 26:1 across states.

Script

Script for addressing the first concern.

Results

AL returns at most 1,000 results, and the search for son(s) hits that cap. Applying the regex to the 1,000 returned names leaves 884 true positives. So the most conservative son:daughter ratio is 884/126, or about 7.

CA returns a maximum of 500 results but reports the total number of matches (3,609 vs. 150). We download the 500 results for son(s) and apply the regex; 499 are true positives. So the most conservative son:daughter ratio is 24.

HI returns a maximum of 300 results but reports the total number of matches (10,641 vs. 88). We download the 300 results for son(s) and apply the regex; 41 are true positives. So the adjusted estimate is (41/300) × 10,641 ≈ 1,454, and the most conservative son:daughter ratio is about 17.
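The HI adjustment can be reproduced with a few lines of Python. This is a sketch under one strong assumption: that the 300 returned names are representative of all 10,641 matches, which the state's search does not guarantee. The function name `extrapolate_total` is hypothetical.

```python
def extrapolate_total(true_positives, sample_size, total_reported):
    """Scale the sample's true-positive share up to the reported total.

    Assumes the returned sample is representative of all matches.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return true_positives / sample_size * total_reported


# HI figures from the text: 41 of 300 sampled names pass the regex,
# 10,641 total matches for son(s), and 88 matches for daughter(s).
hi_sons = extrapolate_total(41, 300, 10_641)
hi_ratio = hi_sons / 88

print(round(hi_sons), round(hi_ratio))
```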

MT provides all the search results, and we run a regex to narrow them down to cases where son(s) is a separate word. A quick look suggests all of the results are legitimate, of the variety X and Son(s) etc. The ratio of business names with the word son to those with the word daughter is about 4.

OH also provides an easy way to download the results. The ratio is about 26 to 1.

OR doesn't return more than 1,000 results, and the search for son hits that cap. Applying the regex to the 1,000 returned names leaves 985 true positives. So the most conservative son:daughter ratio is 4.
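For states that cap results without reporting a total, the conservative ratio is simply the number of regex-confirmed names in the capped sample divided by the daughter count. A minimal Python sketch using the OR figures (985 true positives, and the 227 daughter results listed in the table below); the helper name `lower_bound_ratio` is an assumption for illustration.

```python
def lower_bound_ratio(son_true_positives, daughter_count):
    """Lower bound on son:daughter when son results are capped.

    The true son count is at least the number of confirmed matches in
    the capped sample, so the true ratio is at least this value.
    """
    if daughter_count <= 0:
        raise ValueError("daughter_count must be positive")
    return son_true_positives / daughter_count


# OR: 985 of the 1,000 returned names pass the regex; 227 daughter results.
or_ratio = lower_bound_ratio(985, 227)
print(round(or_ratio, 2))
```

Because the uncapped son count is unknown, this number understates the ratio whenever the search would have returned more than the cap.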

WA returns all the results and after applying the regex, we get 2,424 results for son(s). This means a ratio of 15.

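| State | Son   | Daughter | Son/Daughter Ratio | Conservative Est. |
|-------|-------|----------|--------------------|-------------------|
| AL    | 1000+ | 126      | 8                  | 7                 |
| CA    | 3609  | 150      | 24                 | 24                |
| HI    | 1,454 | 88       | 17                 | 17                |
| ID    | 60    | 39       | -                  |                   |
| MI    | 2265  | 93       | 24                 |                   |
| MT    | 240   | 66       | 4                  | 4                 |
| NV    | 1440  | 20       | 72                 |                   |
| OH    | 2550  | 100      | 26                 | 26                |
| OR    | 1000+ | 227      | -                  | 4                 |
| PA    | NA    | NA       | -                  |                   |
| WA    | 2424  | 161      | 15                 | 15                |
| WI    | 845   | 43       | 20                 |                   |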

Underlying Data

Notes

  1. Searched on 11/10/2019 or later

  2. A name that contains "son" may also contain "daughter", and vice versa, so the two counts are not mutually exclusive.

  3. For links to all 50 SoS Business Entity Search Links

By State

  • AL

    • caps results at 1,000.
  • CA

    • gives the number of results.
    • you need to do separate searches for corporations and llc.
  • CT

  • HI

    • gives counts but returns only 300.
    • the search is problematic: it counts words that merely contain son, as well as playful names like 'son of a beach'
  • ID

    • regex used: .*_son(s)_.* and .*_daughter(s)_.*
  • MI

    • keyword search
    • returns number of results
  • MT

    • offers downloadable list
    • the site's search doesn't do good regex matching, so we need to run the regex on the downloaded list.
  • NV

    • pop-up tells us the number of search results if search results > 500
  • OR

    • max results capped at 1000
    • can copy and paste easily
  • PA

    • doesn't seem to allow for exhaustive search
  • WA

    • gives the full list of results. downloadable.
  • WI

    • Searches for "son" as a separate word.
    • Had to run multiple searches for son, split by time period, because the results exceeded 500