# **Probability of Having at Least One Sister**

Assume the following population of families: 
* 15% of families have 0 children 
* 20% of families have 1 child 
* 30% of families have 2 children 
* 20% of families have 3 children 
* 15% of families have 4 children  

If we randomly select a child from our population of children,
what is the probability that this child has at least one sister?

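Only families with 2, 3, or 4 children can contain a child who has a sister. The calculation below makes two simplifying assumptions: a family is drawn at random according to the percentages above, and the child of interest is the first child in that family, with the remaining children as siblings. Each child is taken to be a boy (B) or a girl (G) with equal probability. Note that this is a per-family calculation; drawing a child uniformly from all children would weight larger families more heavily and give a different value. In the listings below, the child of interest is written to the left of `|` and the siblings to the right.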

* 30% of the families have 2 children, combinations (2^2 = 4):
    - B | B
    - B | G
    - G | B   
    - G | G

    Probability of having at least one sister = 1 - Probability of having only brothers = 1 - (2/4) = 2/4

* 20% of the families have 3 children, combinations (2^3 = 8):
    - B | B B
    - B | B G
    - B | G B
    - B | G G
    - G | B B
    - G | B G
    - G | G B
    - G | G G

    Probability of having at least one sister = 1 - Probability of having only brothers = 1 - (2/8) = 6/8

* 15% of the families have 4 children, combinations (2^4 = 16):
    - B | B B B
    - B | B B G
    - B | B G B
    - B | B G G
    - B | G B B
    - B | G B G
    - B | G G B
    - B | G G G
    - G | B B B
    - G | B B G
    - G | B G B
    - G | B G G
    - G | G B B
    - G | G B G
    - G | G G B
    - G | G G G

    Probability of having at least one sister = 1 - Probability of having only brothers = 1 - (2/16) = 14/16

Then, the probability of having at least one sister is:
$$ P = 0.3 \times \frac{2}{4} + 0.2 \times \frac{6}{8} + 0.15 \times \frac{14}{16} = 0.43125 $$
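The per-size probabilities and the weighted sum above can be checked by brute-force enumeration. The following is a minimal sketch under the same assumptions (a family is drawn by its share, and the first child is the child of interest); the names `family_shares` and `sister_probability` are illustrative and do not appear in the original notebook.

```python
from fractions import Fraction
from itertools import product

# Share of families by number of children, as given in the problem statement.
family_shares = {
    0: Fraction(15, 100),
    1: Fraction(20, 100),
    2: Fraction(30, 100),
    3: Fraction(20, 100),
    4: Fraction(15, 100),
}


def sister_probability(num_children):
    """P(first child has at least one sister), by enumerating all B/G outcomes."""
    if num_children < 2:
        return Fraction(0)
    outcomes = list(product("BG", repeat=num_children))
    # The first entry is the child of interest; the rest are the siblings.
    with_sister = sum("G" in outcome[1:] for outcome in outcomes)
    return Fraction(with_sister, len(outcomes))


per_size = {n: sister_probability(n) for n in family_shares}
total = sum(share * per_size[n] for n, share in family_shares.items())

print(per_size[2], per_size[3], per_size[4])  # 1/2 3/4 7/8
print(float(total))  # 0.43125
```

Using `Fraction` keeps the arithmetic exact, so the result can be compared directly with the hand calculation.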

We can also estimate this probability by simulating families and their children.

In [1]:
import random
from collections import Counter


def generate_children():
    # Despite the plural name, this returns a single child:
    # a boy or a girl with equal probability.
    return random.choice(["BOY", "GIRL"])


random_children = [generate_children() for _ in range(10_000)]

# Roughly 50% of the children are girls and 50% are boys.
Counter(random_children)

Counter({'GIRL': 5046, 'BOY': 4954})

In [2]:
families_to_children = {
    "A": 0,
    "B": 1,
    "C": 2,
    "D": 3,
    "E": 4,
}

families_probs = {
    "A": 0.15,
    "B": 0.20,
    "C": 0.30,
    "D": 0.20,
    "E": 0.15,
}


def select_family():
    families, weights = zip(*families_probs.items())
    return random.choices(families, weights=weights, k=1)[0]


# The sampled shares should roughly match families_probs.
random_families = [select_family() for _ in range(10_000)]
Counter(random_families)

Counter({'C': 2976, 'D': 2059, 'B': 1985, 'A': 1513, 'E': 1467})

In [3]:
def get_children_from_family(family):
    num_children = families_to_children[family]
    return [generate_children() for _ in range(num_children)]


assert len(get_children_from_family("A")) == 0
assert len(get_children_from_family("B")) == 1
assert len(get_children_from_family("C")) == 2
assert len(get_children_from_family("D")) == 3
assert len(get_children_from_family("E")) == 4

In [4]:
from typing import List


def has_sister(children: List[str]) -> bool:
    # If there are no children or only one child, there can't be a sister.
    if len(children) < 2:
        return False

    # The first child is the child of interest; the rest are its siblings.
    siblings = children[1:]
    return "GIRL" in siblings


assert not has_sister([])
assert not has_sister(["GIRL"])
assert has_sister(["GIRL", "GIRL"])
assert has_sister(["BOY", "GIRL"])
assert not has_sister(["BOY", "BOY"])

In [5]:
for _ in range(10):
    family = select_family()
    children = get_children_from_family(family)
    sister = has_sister(children)

    print(f"Family {family} has {children} and has sister: {sister}")

Family A has [] and has sister: False
Family D has ['BOY', 'GIRL', 'GIRL'] and has sister: True
Family D has ['GIRL', 'GIRL', 'BOY'] and has sister: True
Family E has ['GIRL', 'GIRL', 'BOY', 'BOY'] and has sister: True
Family E has ['BOY', 'GIRL', 'GIRL', 'BOY'] and has sister: True
Family D has ['GIRL', 'BOY', 'GIRL'] and has sister: True
Family C has ['BOY', 'BOY'] and has sister: False
Family E has ['GIRL', 'GIRL', 'BOY', 'GIRL'] and has sister: True
Family E has ['GIRL', 'GIRL', 'BOY', 'BOY'] and has sister: True
Family C has ['BOY', 'BOY'] and has sister: False


In [6]:
NUM_TRIALS = 100_000
sister_counts = Counter()

for _ in range(NUM_TRIALS):
    family = select_family()
    children = get_children_from_family(family)
    sister = has_sister(children)
    sister_counts[sister] += 1

print(sister_counts)
print(sister_counts[True] / NUM_TRIALS)

Counter({False: 56483, True: 43517})
0.43517


The simulated estimate, $0.43517$, is close to the analytical value of $0.43125$.
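To judge how close the two numbers are, one option is to compare their difference with the standard error of a proportion estimated from independent trials. The sketch below is an illustration rather than part of the original notebook; it reuses the analytical value, the trial count, and the printed estimate from the run above, and the variable names are assumptions.

```python
import math

# Values taken from the text above: the analytical probability, the number
# of trials, and the estimate printed by the simulation run.
p_expected = 0.43125
num_trials = 100_000
p_simulated = 0.43517

# Standard error of a proportion estimated from independent Bernoulli trials.
standard_error = math.sqrt(p_expected * (1 - p_expected) / num_trials)

# Distance between the estimate and the analytical value, in standard errors.
z_score = (p_simulated - p_expected) / standard_error

print(f"standard error: {standard_error:.5f}")
print(f"deviation in standard errors: {z_score:.2f}")
```

For this particular run the estimate lies roughly 2.5 standard errors above the analytical value. Rerunning the simulation, or fixing a seed with `random.seed`, gives a different estimate each time.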