Son Bias in the US: Evidence from Business Names
I estimate the bias for sons by examining how common the word son(s) is compared to daughter(s) in the names of businesses.
In the US, all businesses have to register with a state. All states provide a way to search business names, in part so that new companies can pick names that haven't been used before.
I begin by searching for son(s) and daughter(s) in states' databases of business names. But the results of searching for son are inflated for three reasons:
- son is part of many English words, from names such as Robinson to ordinary English words like mason (which can also be a name).
- son is a Korean name.
- Some businesses use the word son playfully. For instance, son is a homophone of sun, and some people use that to create names like "son of a beach".
I address the first concern by using a regex that only matches son(s) as a separate word. I also check whether the string contains the word daughter(s). But not all states allow regex searches or let people download the full set of results. Where possible, I try to draw a lower bound. Even so, some care is needed in interpreting the results.
In all, I find that the conservative estimate of the son to daughter ratio ranges from 4 to 1 to 26 to 1 across states.
Script for addressing the first concern.
- Regex Script
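The word-boundary filter described above can be sketched roughly as follows. This is a minimal illustration in Python, not the linked script itself; the pattern, the function name `has_son`, and the sample business names are all assumptions made for the example.

```python
import re

# Assumed pattern: "son" or "sons" as a standalone word, in any letter case.
SON_RE = re.compile(r"\bsons?\b", re.IGNORECASE)


def has_son(name: str) -> bool:
    """Return True if the name contains son/sons as a separate word."""
    return SON_RE.search(name) is not None


# Hypothetical business names, for illustration only.
names = [
    "Robinson Plumbing",       # "son" inside a longer word: rejected
    "Smith and Sons",          # standalone "Sons": kept
    "Mason & Son LLC",         # "Mason" rejected, standalone "Son" kept
    "Son of a Beach Tanning",  # playful use: still kept
    "Jones & Daughters",       # no son at all: rejected
]

matches = [n for n in names if has_son(n)]
print(matches)
```

A filter like this handles only the first concern. Playful names and the Korean name Son still match, which is one reason the results need careful interpretation.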
AL returns at most 1,000 results. Results for son(s) exceed 1,000. But when you apply the regex to the 1,000 returned, 884 come up as true positives. So the most conservative son:daughter ratio is 4.
CA returns a maximum of 500 results. But it gives you the total number of results (3,609 for son(s) vs. 150 for daughter(s)). We download the 500 results for son(s) and apply the regex. 499 come up as true positives. So the most conservative son:daughter ratio is 24.
HI returns a maximum of 300 results. But it gives you the total number of results (10,641 for son(s) vs. 88 for daughter(s)). We download the 300 results for son(s) and apply the regex. 41 come up as true positives. So the adjusted estimate = (41/300)*10,641 = 1,454. The most conservative son:daughter ratio is 17.
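The HI adjustment can be reproduced with a few lines of Python. This is a sketch of the arithmetic only; the function name `extrapolated_matches` is hypothetical, and the calculation assumes the downloaded sample is representative of the full result set and that 88 is the total count for daughter(s).

```python
def extrapolated_matches(true_positives: int, sample_size: int, total_results: int) -> int:
    """Scale the true-positive share in the downloaded sample up to the reported total.

    Assumes the downloaded sample is representative of all results.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return round(true_positives / sample_size * total_results)


# HI figures from the text: 41 true positives among 300 downloaded,
# 10,641 total results for son(s), 88 results for daughter(s).
hi_sons = extrapolated_matches(41, 300, 10_641)
hi_ratio = hi_sons / 88

print(hi_sons)          # 1454
print(round(hi_ratio))  # 17
```

Applying the same function to the CA figures (499 of 500 true positives, 3,609 total, 150 for daughter(s)) gives a ratio of about 24, matching the number reported for that state.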
MT provides all the search results, and we run a regex to narrow them down to cases where son(s) is a separate word. A quick look suggests all of the results are legitimate, of the form "X and Son(s)". There, the ratio of business names with the word son to those with the word daughter is about 4.
OH also provides an easy way to download the results. The ratio is about 26 to 1.
OR doesn't return more than 1,000 results. Results for son exceed 1,000. But when you apply the regex to the 1,000 returned, 985 come up as true positives. So the most conservative son:daughter ratio is 4.
WA returns all the results. After applying the regex, we get 2,424 results for son(s), which implies a ratio of 15.
|State||Son||Daughter||Son/Daughter Ratio||Conservative Est.|
Searched on 11/10/2019 or later
The presence of "son" in a name doesn't preclude the presence of the word "daughter", and vice versa.
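Because the two words are not mutually exclusive, one way to see how much the counts overlap is to tally names by which words they contain. The sketch below is illustrative only; the category labels, the helper names `classify` and `tally`, and the sample names are assumptions, not part of the original analysis.

```python
import re
from collections import Counter

# Assumed patterns: each word as a standalone token, singular or plural.
SON = re.compile(r"\bsons?\b", re.IGNORECASE)
DAUGHTER = re.compile(r"\bdaughters?\b", re.IGNORECASE)


def classify(name: str) -> str:
    """Label a business name by which of the two words it contains."""
    has_son = SON.search(name) is not None
    has_daughter = DAUGHTER.search(name) is not None
    if has_son and has_daughter:
        return "both"
    if has_son:
        return "son_only"
    if has_daughter:
        return "daughter_only"
    return "neither"


def tally(names):
    """Count names in each category."""
    return Counter(classify(n) for n in names)


# Hypothetical names, for illustration only.
sample = [
    "Smith & Sons",
    "Lee and Daughter",
    "Brown Sons and Daughters Farm",
    "Robinson Tools",
]
counts = tally(sample)
print(counts)
```

A name in the "both" category is counted in the son total and in the daughter total, so the two columns can overlap.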
For links to all 50 SoS Business Entity Search Links
- caps results at 1,000.
- gives the number of results.
- you need to do separate searches for corporations and LLCs.
- gives counts but returns only 300.
- problematic regex search, as it counts words with son in them and funny things like 'son of a beach'
- regex used:
- regex used:
- keyword search
- returns number of results
- offers downloadable list
- doesn't do a good regex search, so we need to run our own regex.
- pop-up tells us the number of search results if search results > 500
- max results capped at 1000
- can copy and paste easily
- doesn't seem to allow for exhaustive search
- gives the full list of results. downloadable.
- Searches for "son" as a separate word.
- Had to do multiple searches, split by time period, for son because results > 500