## What questions or claims do we want to answer or test?

-   ~~Does applying for a zone other than core increase your chance of winning the lottery?~~
-   Does applying for days other than Thursday, Friday, or Saturday increase your chance of winning the lottery?
-   Which preference was awarded the most? Or how many people were awarded a Core Enchantments permit on their first option?
-   ~~What's your chance of winning the lottery?~~
-   How much does applying for a zone other than Core, on days other than Thursday, Friday, or Saturday, increase your chances?
-   Can we verify that the lottery works the way it says it does?
-   Did group size in Core affect awarded permits?
-   ~~Did people who didn't apply for Core have a better chance than people who only applied for Core?~~


In [1]:
import pandas as pd

# Import cleaned data from csv file
df = pd.read_csv(
    "./2021_results_cleaned.csv",
    # Import was failing to parse date columns, so I had
    # to add the column names
    parse_dates=[
        "preferred_entry_date_1",
        "preferred_entry_date_2",
        "preferred_entry_date_3",
        "awarded_entry_date",
    ],
    date_format="%m-%d-%Y",  # Align format with export format
    na_filter=False,  # Do not convert 'N/A' to NaN
)

df.head()

Unnamed: 0,preferred_entry_date_1,preferred_division_1,minimum_acceptable_group_size_1,preferred_entry_date_2,preferred_division_2,minimum_acceptable_group_size_2,preferred_entry_date_3,preferred_division_3,minimum_acceptable_group_size_3,results_status,awarded_preference,awarded_entry_date,awarded_entrance_code_name,awarded_group_size
0,2021-07-25,Core Enchantment Zone,7,2021-08-01,Core Enchantment Zone,7,2021-07-25,Snow Zone,8,Unsuccessful,0,1970-01-01,,0
1,2021-08-12,Core Enchantment Zone,8,2021-08-12,Colchuck Zone,8,2021-08-12,Eightmile/Caroline Zone,8,Unsuccessful,0,1970-01-01,,0
2,2021-07-30,Core Enchantment Zone,4,2021-08-14,Core Enchantment Zone,4,2021-07-16,Core Enchantment Zone,7,Unsuccessful,0,1970-01-01,,0
3,2021-07-01,Core Enchantment Zone,4,2021-06-09,Snow Zone,4,2021-07-07,Colchuck Zone,4,Unsuccessful,0,1970-01-01,,0
4,2021-06-21,Colchuck Zone,2,2021-06-28,Colchuck Zone,2,2021-07-13,Stuart Zone,2,Unsuccessful,0,1970-01-01,,0


In [2]:
# Simple probability of being awarded a permit
accepted = df["results_status"] == "Accepted"
no_response = df["results_status"] == "No Response"
declined = df["results_status"] == "Declined"

awarded_filter = accepted | no_response | declined

awarded = df[awarded_filter]

total_awarded = len(awarded)
total = len(df)

prob_awarded = total_awarded / total

print(
    f"Probability of being awarded a permit: {prob_awarded:.2%} ({total_awarded}/{total})"
)

Probability of being awarded a permit: 6.66% (2445/36695)
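
The three status masks above can also be collapsed into a single membership test. The sketch below is only an illustration on a made-up eight-row frame (`toy` and `AWARDED_STATUSES` are hypothetical names, not part of the notebook); only the `results_status` column name and the status labels come from the data.

```python
import pandas as pd

# Toy stand-in for the lottery results (invented rows, real column name).
toy = pd.DataFrame(
    {
        "results_status": [
            "Accepted",
            "Unsuccessful",
            "Declined",
            "Cancelled",
            "No Response",
            "Unsuccessful",
            "Unsuccessful",
            "Unsuccessful",
        ]
    }
)

# An application counts as "awarded" if it won, whatever the applicant did next.
AWARDED_STATUSES = ["Accepted", "No Response", "Declined"]
toy_awarded_filter = toy["results_status"].isin(AWARDED_STATUSES)

# The mean of a boolean Series is the share of True values.
toy_prob_awarded = toy_awarded_filter.mean()

print(f"{toy_prob_awarded:.2%} ({toy_awarded_filter.sum()}/{len(toy)})")
```

Using `isin` keeps the definition of "awarded" in one list, which makes it harder for the filter and the subset to drift apart.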


In [3]:
# Which zone was awarded the most permits?
pd.crosstab(
    awarded["awarded_entrance_code_name"], awarded["results_status"], margins=True
)

awarded_entrance_code_name,Accepted,Declined,No Response,All
Colchuck Zone,298,6,44,348
Core Enchantment Zone,595,8,70,673
Eightmile/Caroline Zone,224,10,60,294
Eightmile/Caroline Zone (stock),15,0,5,20
Snow Zone,536,20,86,642
Stuart Zone,355,13,86,454
Stuart Zone (stock),13,0,1,14
All,2036,57,352,2445


In [4]:
# It's interesting that the Core Enchantment Zone is considered the
# hardest to get a permit for, but it's the most awarded zone.

# Same crosstab over every application, including cancelled and unsuccessful ones
pd.crosstab(df["awarded_entrance_code_name"], df["results_status"], margins=True)

awarded_entrance_code_name,Accepted,Cancelled,Declined,No Response,Unsuccessful,All
Colchuck Zone,298,0,6,44,0,348
Core Enchantment Zone,595,0,8,70,0,673
Eightmile/Caroline Zone,224,0,10,60,0,294
Eightmile/Caroline Zone (stock),15,0,0,5,0,20
N/A,0,450,0,0,33800,34250
Snow Zone,536,0,20,86,0,642
Stuart Zone,355,0,13,86,0,454
Stuart Zone (stock),13,0,0,1,0,14
All,2036,450,57,352,33800,36695


In [5]:
# Event A - You are awarded a permit for the Core Enchantments

awarded_core = awarded["awarded_entrance_code_name"] == "Core Enchantment Zone"
total_awarded_core = len(awarded[awarded_core])

prob_awarded_core = total_awarded_core / total

# You are awarded a permit for the Core Enchantments
print(
    f"Probability that you were awarded a permit for the Core Enchantments: {prob_awarded_core:.2%} ({total_awarded_core}/{total})"
)

Probability that you were awarded a permit for the Core Enchantments: 1.83% (673/36695)


In [6]:
# Event A - You were awarded a permit for the Core Enchantments
# Event B - You were awarded a permit

# Given that you were awarded a permit, what is the probability that it was for the Core Enchantments?
# P(A|B) = P(A and B) / P(B) = P(A) / P(B) = 0.0183 / 0.0666
print(
    f"Probability that your awarded permit was for the Core Enchantments: {prob_awarded_core / prob_awarded:.2%} ({prob_awarded_core}/{prob_awarded})"
)

Probability that your awarded permit was for the Core Enchantments: 27.53% (0.018340373347867558/0.06663033110778036)


In [7]:
# Event A - You applied for a permit for the Core Enchantments
# Event B - You were awarded a permit

# Given that you applied for a permit for the Core Enchantments,
# What is the probability that you were awarded a permit?

applied_core_1 = df["preferred_division_1"] == "Core Enchantment Zone"
applied_core_2 = df["preferred_division_2"] == "Core Enchantment Zone"
applied_core_3 = df["preferred_division_3"] == "Core Enchantment Zone"

applied_core_filter = applied_core_1 | applied_core_2 | applied_core_3

applied_core = df[applied_core_filter]

total_applied_core = len(applied_core)

prob_applied_core = total_applied_core / total

total_applied_core_awarded = len(df[applied_core_filter & awarded_filter])

# Joint probability P(B and A): divide by all applications, not only Core applicants
prob_applied_core_awarded = total_applied_core_awarded / total

# P(B|A) = P(B and A) / P(A)
prob_awarded_given_applied_core = prob_applied_core_awarded / prob_applied_core

print(
    f"Given you applied for a Core Enchantments Permit, what is the probability you were awarded a permit : {prob_awarded_given_applied_core:.2%} ({total_applied_core_awarded}/{total_applied_core})"
)

Given you applied for a Core Enchantments Permit, what is the probability you were awarded a permit : 5.62% (1712/30481)


In [8]:
# Event B - You were awarded a permit
# Event A - You did not apply for a permit for the Core Enchantments

# Given that you did not apply for a permit for the Core Enchantments,
# what is the probability that you were awarded a permit?

did_not_apply_core_filter = ~applied_core_filter

did_not_apply_core = df[did_not_apply_core_filter]

total_did_not_apply_core = len(did_not_apply_core)  # 6214

prob_did_not_apply_core = total_did_not_apply_core / total  # 6214 / 36695 = 0.1694

total_did_not_apply_core_awarded = len(
    df[did_not_apply_core_filter & awarded_filter]
)  # 733

prob_did_not_apply_core_awarded = (
    total_did_not_apply_core_awarded / total
)  # 733 / 36695 = 0.0199

# P(B|A) = P(B and A) / P(A)
prob_awarded_given_did_not_apply_core = (
    prob_did_not_apply_core_awarded / prob_did_not_apply_core
)

print(
    f"Given you did not apply for a Core Enchantments Permit, what is the probability you were awarded a permit : {prob_awarded_given_did_not_apply_core:.2%} ({prob_did_not_apply_core_awarded}/{prob_did_not_apply_core})"
)

Given you did not apply for a Core Enchantments Permit, what is the probability you were awarded a permit : 11.80% (0.019975473497751736/0.16934187218967162)


|                        | Awarded | Not Awarded |       |
| ---------------------- | ------- | ----------- | ----- |
| Applied for Core       | 1712    | 28769       | 30481 |
| Did not apply for Core | 733     | 5481        | 6214  |
|                        | 2445    | 34250       | 36695 |

There was an 11.80% chance (733/6214) you were awarded a permit given you **did not** apply for a Core Enchantments Permit.

There was a 5.62% chance (1712/30481) you were awarded a permit given you **did** apply for a Core Enchantments Permit.

The claim that _you should not apply for the Core Enchantments Zone if you want to win the lottery because there is a 6% chance you win the Enchantments lottery compared to only a 1% chance you win the lottery for the Core Enchantments Zone_ is misleading. It compares a simple probability (winning) to a joint probability (winning and Core).

The conclusion itself is correct: you do have a better chance of being awarded a permit if you don't apply for the Core Zone. However, not everyone applies for a Core Zone permit, so comparing the 1% value to the 6% value isn't valid.

Unfortunately, this is still shaky ground, because someone who applied for _at least_ one Core Zone option and was awarded a permit is very different from someone who applied for _at least_ one Core Zone option and was awarded a permit _for the Core Zone_.

I would conclude that, if you are concerned about winning, you have a higher probability of winning if you don't apply for the Core Zone at all. But you still have a 5.62% chance (1712/30481) of winning if _at least one_ of your options is the Core Zone. The permit may not be for the Core Zone, but that shouldn't stop you from splurging on one of your entries.
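
The gap between a joint and a conditional probability can be checked from the table's counts alone. This is a minimal sketch in plain Python; `table` and `rates` are illustrative names, and the four counts are copied from the table above.

```python
# Counts copied from the 2x2 table above: (awarded, not awarded).
table = {
    "Applied for Core": (1712, 28769),
    "Did not apply for Core": (733, 5481),
}

grand_total = sum(awarded + not_awarded for awarded, not_awarded in table.values())

rates = {}
for group, (awarded, not_awarded) in table.items():
    group_total = awarded + not_awarded
    rates[group] = {
        "joint": awarded / grand_total,        # P(awarded and group)
        "conditional": awarded / group_total,  # P(awarded | group)
    }
    print(
        f"{group}: joint {rates[group]['joint']:.2%}, "
        f"conditional {rates[group]['conditional']:.2%}"
    )
```

The joint value divides by all 36,695 applications, while the conditional value divides only by the applications in that row. Only the conditional values are comparable across the two groups.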

This opens the door to more exciting questions.

-   ~~What is the probability of being awarded a Core Enchantments permit given you did/did not apply for the Core?~~
-   ~~What is the probability of getting a Core Zone if you only apply for Core Zone?~~
-   ~~How do the probabilities stratify across all the zones depending on your application entries?~~


In [9]:
# Prior probability of being awarded a permit for the Core Enchantments: 1.83% (673/36695)

# Event A - You are awarded a permit for the Core Enchantments
# Event B - You applied for a permit for the Core Enchantments

awarded_core_filter = df["awarded_entrance_code_name"] == "Core Enchantment Zone"

total_applied_core_awarded_core = len(df[applied_core_filter & awarded_core_filter])
prob_awarded_core_given_applied_core = (
    total_applied_core_awarded_core / total_applied_core
)

print(
    f"Given you applied for a Core Enchantments Permit, what is the probability you were awarded a permit for the Core Enchantments? : {prob_awarded_core_given_applied_core:.2%} ({total_applied_core_awarded_core}/{total_applied_core})"
)

Given you applied for a Core Enchantments Permit, what is the probability you were awarded a permit for the Core Enchantments? : 2.21% (673/30481)


In [10]:
applied_core_awarded_other = applied_core[
    (applied_core["awarded_entrance_code_name"] != "Core Enchantment Zone")
    & (applied_core["awarded_entrance_code_name"] != "N/A")
]
len(applied_core_awarded_other)

1039

Looking only at applicants who applied for a Core Zone permit, we found a 2.21% chance of being awarded a permit for the Core Zone given you applied for the Core Zone in _at least_ one option.

To reiterate, saying you shouldn't try for a Core Zone permit if you want to win _because there's a 1% chance_ is misleading. It needs more information. It'd be better to say, _if you apply for a Core Zone permit in at least one entry_, your chances of getting a Core Zone permit are 2.21%. Conversely, if you don't apply for the Core Zone in any option, your chances of getting a Core Zone permit are 0%.

Applicants who applied with at least one Core Zone option had a 2.21% chance of being awarded a permit for the Core Zone. However, these applicants didn't miss out on winning permits in other zones. A total of 1,039 applicants were awarded permits for zones other than the Core Zone **despite** applying for at least one Core Zone option. That makes up ~42.5% of the total permits awarded (1,039/2,445). Following from the table above, applicants who listed the Core Zone at least once had a 5.62% chance (1,712/30,481) of winning a permit of any kind.
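
The 1,039 figure and the ~42.5% share follow from counts already printed in the notebook. A small arithmetic sketch, with variable names chosen here for illustration:

```python
total_awarded = 2445             # all permits awarded
applied_core_awarded = 1712      # listed Core at least once and won any permit
applied_core_awarded_core = 673  # won a Core permit

# Winners who listed Core somewhere but were awarded a different zone.
applied_core_awarded_other = applied_core_awarded - applied_core_awarded_core

# Their share of every permit awarded in the lottery.
share_of_all_awards = applied_core_awarded_other / total_awarded

print(f"{applied_core_awarded_other} ({share_of_all_awards:.1%} of all awards)")
```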


In [11]:
division_2_na_filter = df["preferred_division_2"] == "N/A"
division_3_na_filter = df["preferred_division_3"] == "N/A"

only_applied_core_filter = applied_core_1 & applied_core_2 & applied_core_3
only_applied_core_2_filter = (
    applied_core_1 & division_2_na_filter & division_3_na_filter
)
only_applied_core_3_filter = applied_core_1 & applied_core_2 & division_3_na_filter

only_applied_core = df[
    only_applied_core_filter | only_applied_core_2_filter | only_applied_core_3_filter
]

total_only_applied_core = len(only_applied_core)

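# NOTE: only_applied_core_filter covers applications with Core in all three
# options, while total_only_applied_core also counts applications with one or
# two options. If any of those shorter applications won a Core permit, this
# numerator undercounts them.
total_only_applied_core_awarded_core = len(
    df[only_applied_core_filter & awarded_core_filter]
)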

prob_only_applied_core_awarded_core = (
    total_only_applied_core_awarded_core / total_only_applied_core
)

print(
    f"Given you only applied for a Core Enchantments Permit, what is the probability you were awarded a permit for the Core Enchantments? : {prob_only_applied_core_awarded_core:.2%} ({total_only_applied_core_awarded_core}/{total_only_applied_core})"
)

Given you only applied for a Core Enchantments Permit, what is the probability you were awarded a permit for the Core Enchantments? : 2.63% (467/17743)


Applicants who applied only for the Core Zone saw their chances of landing a Core Zone permit (2.63%) increase by 0.42 percentage points compared to applicants who listed the Core Zone for at least one option (2.21%).
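
A 0.42-point gap between two small percentages is easy to misread, so it helps to separate the absolute difference from the relative one. The sketch below reuses the counts printed above; the variable names are illustrative.

```python
only_core_rate = 467 / 17743  # listed nothing but the Core Zone
any_core_rate = 673 / 30481   # listed the Core Zone at least once

# Absolute difference, in percentage points.
difference_pp = (only_core_rate - any_core_rate) * 100

# Relative difference, as a percentage of the "at least once" rate.
relative_increase = (only_core_rate / any_core_rate - 1) * 100

print(f"{difference_pp:.2f} percentage points, {relative_increase:.1f}% relative")
```

The relative figure is much larger than the absolute one because both rates are small. The two groups also overlap, since Core-only applicants are included in the "at least once" group.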


In [12]:
# Get a list of all the zones
zones_values = df["preferred_division_1"].unique()

prob_awarded_zone_applied_for = []

# Loop over the zones and calculate the probability of being awarded a permit for each zone
for zone in zones_values:
    applied_1 = df["preferred_division_1"] == zone
    applied_2 = df["preferred_division_2"] == zone
    applied_3 = df["preferred_division_3"] == zone

    zone_filter = applied_1 | applied_2 | applied_3

    applied_zone = df[zone_filter]

    total_zone = len(applied_zone)

    awarded_zone_filter = df["awarded_entrance_code_name"] == zone

    total_zone_awarded = len(df[zone_filter & awarded_zone_filter])
    prob_zone_awarded = total_zone_awarded / total_zone

    prob_awarded_zone_applied_for.append(
        [zone, prob_zone_awarded, total_zone_awarded, total_zone]
    )

    print(
        f"Probability of being awarded a permit for {zone}, given applied 1+ option in zone: {prob_zone_awarded:.2%} ({total_zone_awarded}/{total_zone})"
    )

# Sort the list of probabilities
prob_awarded_zone_applied_for.sort(key=lambda x: x[1], reverse=True)

Probability of being awarded a permit for Core Enchantment Zone, given applied 1+ option in zone: 2.21% (673/30481)
Probability of being awarded a permit for Colchuck Zone, given applied 1+ option in zone: 2.97% (348/11698)
Probability of being awarded a permit for Snow Zone, given applied 1+ option in zone: 7.46% (642/8607)
Probability of being awarded a permit for Stuart Zone, given applied 1+ option in zone: 10.09% (454/4501)
Probability of being awarded a permit for Eightmile/Caroline Zone, given applied 1+ option in zone: 14.56% (294/2019)
Probability of being awarded a permit for Eightmile/Caroline Zone (stock), given applied 1+ option in zone: 8.81% (20/227)
Probability of being awarded a permit for Stuart Zone (stock), given applied 1+ option in zone: 7.95% (14/176)


In [13]:
# Create dataframe from list
df_prob_awarded_zone_applied_for = pd.DataFrame(
    prob_awarded_zone_applied_for,
    columns=["Zone", "Probability", "Total Awarded", "Total Applied"],
)

# Add a column showing the probability as a percent value
df_prob_awarded_zone_applied_for["Probability (%)"] = df_prob_awarded_zone_applied_for[
    "Probability"
].map("{:.2%}".format)

# Show the new dataframe
df_prob_awarded_zone_applied_for

Unnamed: 0,Zone,Probability,Total Awarded,Total Applied,Probability (%)
0,Eightmile/Caroline Zone,0.145617,294,2019,14.56%
1,Stuart Zone,0.100866,454,4501,10.09%
2,Eightmile/Caroline Zone (stock),0.088106,20,227,8.81%
3,Stuart Zone (stock),0.079545,14,176,7.95%
4,Snow Zone,0.07459,642,8607,7.46%
5,Colchuck Zone,0.029749,348,11698,2.97%
6,Core Enchantment Zone,0.022079,673,30481,2.21%


Very interesting results. The Colchuck Zone (2.97%) is almost as difficult as the Core Zone (2.21%). Then we see a large jump in your chances with the Snow Zone (7.46%).
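
The per-zone loop could also be wrapped in a reusable helper. The sketch below is one possible shape, not the notebook's code: `award_rate_by_zone` and the four-row `toy` frame are hypothetical, and unlike the loop above it collects zone names from all three preference columns and skips the `"N/A"` placeholder.

```python
import pandas as pd

PREFERENCE_COLUMNS = [
    "preferred_division_1",
    "preferred_division_2",
    "preferred_division_3",
]


def award_rate_by_zone(frame: pd.DataFrame) -> pd.DataFrame:
    """P(awarded a zone | listed that zone in at least one option), per zone."""
    zones = pd.unique(frame[PREFERENCE_COLUMNS].to_numpy().ravel())
    rows = []
    for zone in zones:
        if zone == "N/A":  # placeholder for an unused option, not a real zone
            continue
        listed = frame[PREFERENCE_COLUMNS].eq(zone).any(axis=1)
        won = frame["awarded_entrance_code_name"].eq(zone)
        rows.append(
            {
                "Zone": zone,
                "Total Awarded": int((listed & won).sum()),
                "Total Applied": int(listed.sum()),
            }
        )
    result = pd.DataFrame(rows)
    result["Probability"] = result["Total Awarded"] / result["Total Applied"]
    return result.sort_values("Probability", ascending=False).reset_index(drop=True)


# Invented rows, only to show the call; zone names are shortened.
toy = pd.DataFrame(
    {
        "preferred_division_1": ["Core", "Core", "Snow", "Core"],
        "preferred_division_2": ["Snow", "N/A", "Snow", "Core"],
        "preferred_division_3": ["N/A", "N/A", "Core", "Core"],
        "awarded_entrance_code_name": ["Snow", "N/A", "N/A", "Core"],
    }
)

toy_rates = award_rate_by_zone(toy)
print(toy_rates)
```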

Next, we'll want to take a look at:

-   How did the day of the week for entry affect the chances of being awarded a permit?
-   How did the month affect someone's chances? Are these dependent variables?
-   Are your chances of being awarded a permit dependent on the group size?
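
One possible starting point for the day-of-week question is sketched below. It assumes only the first preferred date matters, which is a simplification since each application lists up to three. The rows are invented, so the printed rates say nothing about the real lottery; `toy`, `day_group`, and `rate_by_day_group` are illustrative names.

```python
import pandas as pd

# Invented applications; only the column names match the notebook's data.
toy = pd.DataFrame(
    {
        "preferred_entry_date_1": pd.to_datetime(
            [
                "2021-07-23",  # Friday
                "2021-07-24",  # Saturday
                "2021-07-26",  # Monday
                "2021-07-27",  # Tuesday
                "2021-07-30",  # Friday
                "2021-08-02",  # Monday
            ]
        ),
        "results_status": [
            "Unsuccessful",
            "Unsuccessful",
            "Accepted",
            "Unsuccessful",
            "Declined",
            "Accepted",
        ],
    }
)

toy["awarded"] = toy["results_status"].isin(["Accepted", "No Response", "Declined"])

# Label each application by whether its first-choice entry day is Thu, Fri, or Sat.
entry_day = toy["preferred_entry_date_1"].dt.day_name()
is_peak = entry_day.isin(["Thursday", "Friday", "Saturday"])
toy["day_group"] = is_peak.map({True: "Thu-Sat", False: "Other"})

# Share of awarded applications within each day group.
rate_by_day_group = toy.groupby("day_group")["awarded"].mean()
print(rate_by_day_group)
```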
