## What questions or claims do we want to answer or test?

-   ~~Does applying for a zone other than core increase your chance of winning the lottery?~~
-   ~~Does applying other than Thurs, Fri, Sat increase your chance of winning the lottery?~~
-   Which preference was awarded the most? Or how many people were awarded a Core Enchantments permit on their first option?
-   ~~What's your chance of winning the lottery?~~
-   How much does applying for a zone other than Core, on days other than Thurs, Fri, Sat, increase your chances?
-   Can we verify that the lottery works the way it says it does?
-   ~~Did group size in Core affect awarded permits?~~
-   ~~Did people who didn't apply for Core have a better chance than people who only applied for Core?~~


# Zone Analysis

This section of the analysis looks at the probabilities surrounding different zones and application selections.


In [2]:
import pandas as pd

# Import cleaned data from csv file
df = pd.read_csv(
    "./2021_results_cleaned.csv",
    # Import was failing to parse date columns, so I
    # had to add the column names
    parse_dates=[
        "preferred_entry_date_1",
        "preferred_entry_date_2",
        "preferred_entry_date_3",
        "awarded_entry_date",
    ],
    date_format="%m-%d-%Y",  # Align format with export format
    na_filter=False,  # Do not convert 'N/A' to NaN
)

df.head()

|   | preferred_entry_date_1 | preferred_division_1 | minimum_acceptable_group_size_1 | preferred_entry_date_2 | preferred_division_2 | minimum_acceptable_group_size_2 | preferred_entry_date_3 | preferred_division_3 | minimum_acceptable_group_size_3 | results_status | awarded_preference | awarded_entry_date | awarded_entrance_code_name | awarded_group_size |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2021-07-25 | Core Enchantment Zone | 7 | 2021-08-01 | Core Enchantment Zone | 7 | 2021-07-25 | Snow Zone | 8 | Unsuccessful | 0 | 1970-01-01 |  | 0 |
| 1 | 2021-08-12 | Core Enchantment Zone | 8 | 2021-08-12 | Colchuck Zone | 8 | 2021-08-12 | Eightmile/Caroline Zone | 8 | Unsuccessful | 0 | 1970-01-01 |  | 0 |
| 2 | 2021-07-30 | Core Enchantment Zone | 4 | 2021-08-14 | Core Enchantment Zone | 4 | 2021-07-16 | Core Enchantment Zone | 7 | Unsuccessful | 0 | 1970-01-01 |  | 0 |
| 3 | 2021-07-01 | Core Enchantment Zone | 4 | 2021-06-09 | Snow Zone | 4 | 2021-07-07 | Colchuck Zone | 4 | Unsuccessful | 0 | 1970-01-01 |  | 0 |
| 4 | 2021-06-21 | Colchuck Zone | 2 | 2021-06-28 | Colchuck Zone | 2 | 2021-07-13 | Stuart Zone | 2 | Unsuccessful | 0 | 1970-01-01 |  | 0 |


In [3]:
# Simple probability of being awarded a permit
accepted = df["results_status"] == "Accepted"
no_response = df["results_status"] == "No Response"
declined = df["results_status"] == "Declined"

awarded_filter = accepted | no_response | declined

awarded = df[awarded_filter]

total_awarded = len(awarded)
total = len(df)

prob_awarded = total_awarded / total

print(
    f"Probability of being awarded a permit: {prob_awarded:.2%} ({total_awarded}/{total})"
)

Probability of being awarded a permit: 6.66% (2445/36695)
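
A more compact way to build the same "awarded" mask is `Series.isin`, and because the mean of a boolean Series is the share of `True` values, `mean()` returns the probability directly. The sketch below uses a small hypothetical DataFrame rather than the real CSV, and treats "Accepted", "No Response", and "Declined" as awarded, exactly as the cell above does.

```python
import pandas as pd

# Hypothetical toy data standing in for the cleaned 2021 results
toy = pd.DataFrame(
    {
        "results_status": [
            "Accepted",
            "Unsuccessful",
            "No Response",
            "Unsuccessful",
            "Declined",
            "Cancelled",
            "Unsuccessful",
            "Unsuccessful",
        ]
    }
)

# Statuses that mean a permit was awarded (whether or not it was then accepted)
AWARDED_STATUSES = ["Accepted", "No Response", "Declined"]

awarded_mask = toy["results_status"].isin(AWARDED_STATUSES)

# The mean of a boolean Series is the fraction of True values
prob_awarded_toy = awarded_mask.mean()
print(f"{prob_awarded_toy:.2%} ({awarded_mask.sum()}/{len(toy)})")
```

On this toy data it prints `37.50% (3/8)`.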


In [4]:
# Which zone was awarded the most permits?
pd.crosstab(
    awarded["awarded_entrance_code_name"], awarded["results_status"], margins=True
)

| awarded_entrance_code_name | Accepted | Declined | No Response | All |
| --- | --- | --- | --- | --- |
| Colchuck Zone | 298 | 6 | 44 | 348 |
| Core Enchantment Zone | 595 | 8 | 70 | 673 |
| Eightmile/Caroline Zone | 224 | 10 | 60 | 294 |
| Eightmile/Caroline Zone (stock) | 15 | 0 | 5 | 20 |
| Snow Zone | 536 | 20 | 86 | 642 |
| Stuart Zone | 355 | 13 | 86 | 454 |
| Stuart Zone (stock) | 13 | 0 | 1 | 14 |
| All | 2036 | 57 | 352 | 2445 |


In [5]:
# It's interesting that the Core Enchantment Zone is considered the
# hardest to get a permit for, but it's the most awarded zone.

# Same crosstab over ALL applications (including Cancelled and Unsuccessful) for comparison
pd.crosstab(df["awarded_entrance_code_name"], df["results_status"], margins=True)

| awarded_entrance_code_name | Accepted | Cancelled | Declined | No Response | Unsuccessful | All |
| --- | --- | --- | --- | --- | --- | --- |
| Colchuck Zone | 298 | 0 | 6 | 44 | 0 | 348 |
| Core Enchantment Zone | 595 | 0 | 8 | 70 | 0 | 673 |
| Eightmile/Caroline Zone | 224 | 0 | 10 | 60 | 0 | 294 |
| Eightmile/Caroline Zone (stock) | 15 | 0 | 0 | 5 | 0 | 20 |
|  | 0 | 450 | 0 | 0 | 33800 | 34250 |
| Snow Zone | 536 | 0 | 20 | 86 | 0 | 642 |
| Stuart Zone | 355 | 0 | 13 | 86 | 0 | 454 |
| Stuart Zone (stock) | 13 | 0 | 0 | 1 | 0 | 14 |
| All | 2036 | 450 | 57 | 352 | 33800 | 36695 |


In [6]:
# Event A - You are awarded a permit for the Core Enchantments

awarded_core = awarded["awarded_entrance_code_name"] == "Core Enchantment Zone"
total_awarded_core = len(awarded[awarded_core])

prob_awarded_core = total_awarded_core / total

# You are awarded a permit for the Core Enchantments
print(
    f"Probability that you were awarded a permit for the Core Enchantments: {prob_awarded_core:.2%} ({total_awarded_core}/{total})"
)

Probability that you were awarded a permit for the Core Enchantments: 1.83% (673/36695)


In [7]:
# Event A - You were awarded a permit for the Core Enchantments
# Event B - You were awarded a permit

# Given that you were awarded a permit, what is the probability that it was for the Core Enchantments?
# P(A|B) = P(A and B) / P(B) = P(A) / P(B) = 0.0183 / 0.0666
print(
    f"Probability that your awarded permit was for the Core Enchantments: {prob_awarded_core / prob_awarded:.2%} ({prob_awarded_core}/{prob_awarded})"
)

Probability that your awarded permit was for the Core Enchantments: 27.53% (0.018340373347867558/0.06663033110778036)
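
Because every Core permit is also a permit (A is a subset of B), P(A and B) = P(A), and the total number of applications cancels out of the ratio. The same number can therefore be computed from counts alone. The sketch below takes the counts from the crosstab above (673 Core permits, 2,445 permits awarded, 36,695 applications) and checks that both routes agree.

```python
# Counts taken from the crosstab above
total_applications = 36695
total_awarded_permits = 2445
total_core_permits = 673

# Route 1: ratio of the two marginal probabilities
p_core = total_core_permits / total_applications        # P(A)
p_awarded = total_awarded_permits / total_applications  # P(B)
p_core_given_awarded = p_core / p_awarded               # P(A|B)

# Route 2: restrict attention to awarded permits and count directly
p_core_given_awarded_counts = total_core_permits / total_awarded_permits

print(f"{p_core_given_awarded:.2%} vs {p_core_given_awarded_counts:.2%}")
```

Both routes give 27.53%, i.e. 673/2445.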


In [8]:
# Event A - You applied for a permit for the Core Enchantments
# Event B - You were awarded a permit

# Given that you applied for a permit for the Core Enchantments,
# What is the probability that you were awarded a permit?

applied_core_1 = df["preferred_division_1"] == "Core Enchantment Zone"
applied_core_2 = df["preferred_division_2"] == "Core Enchantment Zone"
applied_core_3 = df["preferred_division_3"] == "Core Enchantment Zone"

applied_core_filter = applied_core_1 | applied_core_2 | applied_core_3

applied_core = df[applied_core_filter]

total_applied_core = len(applied_core)

prob_applied_core = total_applied_core / total

total_applied_core_awarded = len(df[applied_core_filter & awarded_filter])  # 1712

# P(B and A): divide by ALL applications, not just the Core applicants
prob_applied_core_awarded = total_applied_core_awarded / total

# P(B|A) = P(B and A) / P(A)
prob_awarded_given_applied_core = prob_applied_core_awarded / prob_applied_core

print(
    f"Given you applied for a Core Enchantments Permit, what is the probability you were awarded a permit : {prob_awarded_given_applied_core:.2%} ({total_applied_core_awarded}/{total_applied_core})"
)

Given you applied for a Core Enchantments Permit, what is the probability you were awarded a permit : 5.62% (1712/30481)


In [9]:
# Event B - You were awarded a permit
# Event A - You did not apply for a permit for the Core Enchantments

# Given that you did not apply for a permit for the Core Enchantments,
# what is the probability that you were awarded a permit?

did_not_apply_core_filter = ~applied_core_filter

did_not_apply_core = df[did_not_apply_core_filter]

total_did_not_apply_core = len(did_not_apply_core)  # 6214

prob_did_not_apply_core = total_did_not_apply_core / total  # 6214 / 36695 = 0.1694

total_did_not_apply_core_awarded = len(
    df[did_not_apply_core_filter & awarded_filter]
)  # 733

prob_did_not_apply_core_awarded = (
    total_did_not_apply_core_awarded / total
)  # 733 / 36695 = 0.0199

# P(B|A) = P(B and A) / P(A)
prob_awarded_given_did_not_apply_core = (
    prob_did_not_apply_core_awarded / prob_did_not_apply_core
)

print(
    f"Given you did not apply for a Core Enchantments Permit, what is the probability you were awarded a permit : {prob_awarded_given_did_not_apply_core:.2%} ({prob_did_not_apply_core_awarded}/{prob_did_not_apply_core})"
)

Given you did not apply for a Core Enchantments Permit, what is the probability you were awarded a permit : 11.80% (0.019975473497751736/0.16934187218967162)


|                        | Awarded | Not Awarded | Total |
| ---------------------- | ------- | ----------- | ----- |
| Applied for Core       | 1712    | 28769       | 30481 |
| Did not apply for Core | 733     | 5481        | 6214  |
| Total                  | 2445    | 34250       | 36695 |

There was an 11.80% chance you were awarded a permit given you **did not** apply for a Core Enchantments Permit.

There was a 5.62% chance you were awarded a permit given you **did** apply for a Core Enchantments Permit.

The claim that _you should not apply for the Core Enchantments Zone if you want to win the lottery because there is a 6% chance you win the Enchantments lottery compared to only a 1% chance you win the lottery for the Core Enchantments Zone_ is misleading. It's misleading because it compares a simple probability (winning) to a joint probability (winning and Core).

The core of the claim is correct: you do have a better chance of being awarded a permit if you don't apply for the Core Zone. However, not everyone applies for a Core Zone permit, so comparing the 1% value to the 6% value isn't a fair comparison.

Unfortunately, this is still shaky ground, because someone who applied for _at least_ one Core Zone option and was awarded a permit is very different from someone who applied for _at least_ one Core Zone option and was awarded a permit _for the Core Zone_.

I would conclude that, if you are concerned about winning, you have a higher probability of winning if you don't apply for the Core Zone at all. But you still have a 5.62% chance of winning if _at least one_ of your options is the Core Zone. The permit may not be for the Core Zone, but that shouldn't stop you from splurging one of your entries on it.
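
As a cross-check, the conditional probabilities can be read straight off the 2×2 table above: `pd.crosstab(..., normalize="index")` divides each row by its row total, which is exactly P(awarded | row). The sketch below is an illustration only: it rebuilds one synthetic row per application from the four counts in that table (the real row-level data is not needed), and the labels are taken from the table's row and column headers.

```python
import numpy as np
import pandas as pd

# (row label, column label, count) from the 2x2 table above
cells = [
    ("Applied for Core", "Awarded", 1712),
    ("Applied for Core", "Not Awarded", 28769),
    ("Did not apply for Core", "Awarded", 733),
    ("Did not apply for Core", "Not Awarded", 5481),
]
counts = [c[2] for c in cells]

# One synthetic row per application, repeated according to each cell's count
applications = pd.DataFrame(
    {
        "core_choice": np.repeat([c[0] for c in cells], counts),
        "outcome": np.repeat([c[1] for c in cells], counts),
    }
)

# normalize="index" divides each row by its row total: P(outcome | core_choice)
conditional = pd.crosstab(
    applications["core_choice"], applications["outcome"], normalize="index"
)
print(conditional.round(4))
```

The "Awarded" column works out to 1712/30481 (about 5.62%) for applicants with at least one Core option and 733/6214 (about 11.80%) for applicants with none.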

This opens the door to more exciting questions.

-   ~~What is the probability of being awarded a Core Enchantments permit given you did/did not apply for the Core?~~
-   ~~What is the probability of getting a Core Zone if you only apply for Core Zone?~~
-   ~~How do the probabilities stratify across all the zones depending on your application entries?~~


In [10]:
# Prior probability of being awarded a permit for the Core Enchantments: 1.83% (673/36695)

# Event A - You are awarded a permit for the Core Enchantments
# Event B - You applied for a permit for the Core Enchantments

awarded_core_filter = df["awarded_entrance_code_name"] == "Core Enchantment Zone"

# P(A|B): Core permits among applications that listed Core in at least one option
total_applied_core_awarded_core = len(df[applied_core_filter & awarded_core_filter])
prob_awarded_core_given_applied_core = (
    total_applied_core_awarded_core / total_applied_core
)

print(
    f"Given you applied for a Core Enchantments Permit, what is the probability you were awarded a permit for the Core Enchantments? : {prob_awarded_core_given_applied_core:.2%} ({total_applied_core_awarded_core}/{total_applied_core})"
)

Given you applied for a Core Enchantments Permit, what is the probability you were awarded a permit for the Core Enchantments? : 2.21% (673/30481)


In [11]:
applied_core_awarded_other = applied_core[
    (applied_core["awarded_entrance_code_name"] != "Core Enchantment Zone")
    & (applied_core["awarded_entrance_code_name"] != "N/A")
]
len(applied_core_awarded_other)

1039

Looking only at the applicants who applied for a Core Zone permit, we found a 2.21% chance of being awarded a permit for the Core Zone given the Core Zone was _at least_ one of your options.

To reiterate, saying you shouldn't try for a Core Zone if you want to win _because there's a 1% chance_ is misleading; it needs more context. It would be better to say that _if you apply for a Core Zone permit in at least one entry_, your chance of getting a Core Zone permit is 2.21%. Conversely, if you don't apply for the Core Zone in any option, your chance of getting a Core Zone permit is 0%.

Applicants with at least one Core Zone option had a 2.21% chance of being awarded a permit for the Core Zone. However, these applicants didn't miss out on permits in other zones: a total of 1,039 applicants were awarded permits outside the Core Zone **despite** applying for at least one Core Zone option. That makes up ~42.5% (1,039/2,445) of all permits awarded. Together with the 673 Core permits, they make up the 1,712 awarded applicants with a Core option in the 2×2 table above.
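
The "awarded, but not in the Core Zone" count can also be written with `DataFrame.eq(...).any(axis=1)` for "Core in at least one option" and `isin` for the exclusion. The sketch below runs on hypothetical toy data and assumes, as In [11] does, that applications without an award carry the string "N/A" in `awarded_entrance_code_name`.

```python
import pandas as pd

CORE = "Core Enchantment Zone"

# Hypothetical toy applications; "N/A" marks an unused option or no award
toy = pd.DataFrame(
    {
        "preferred_division_1": [CORE, CORE, "Snow Zone", CORE],
        "preferred_division_2": ["Colchuck Zone", CORE, "N/A", "Snow Zone"],
        "preferred_division_3": ["N/A", "Snow Zone", "N/A", "N/A"],
        "awarded_entrance_code_name": ["Colchuck Zone", CORE, "Snow Zone", "N/A"],
    }
)

option_cols = ["preferred_division_1", "preferred_division_2", "preferred_division_3"]

# True when at least one of the three options is the Core Zone
applied_core = toy[option_cols].eq(CORE).any(axis=1)

# Awarded something, but not the Core Zone
awarded_elsewhere = ~toy["awarded_entrance_code_name"].isin([CORE, "N/A"])

print((applied_core & awarded_elsewhere).sum())
```

Only the first toy application (Core applicant awarded Colchuck) is counted, so this prints `1`.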


In [12]:
division_2_na_filter = df["preferred_division_2"] == "N/A"
division_3_na_filter = df["preferred_division_3"] == "N/A"

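# Three options, all Core
only_applied_core_filter = applied_core_1 & applied_core_2 & applied_core_3
# One option only (Core); options 2 and 3 unused
only_applied_core_2_filter = (
    applied_core_1 & division_2_na_filter & division_3_na_filter
)
# Two options, both Core; option 3 unused
only_applied_core_3_filter = applied_core_1 & applied_core_2 & division_3_na_filter

only_applied_core = df[
    only_applied_core_filter | only_applied_core_2_filter | only_applied_core_3_filter
]

total_only_applied_core = len(only_applied_core)

# NOTE: the numerator below only uses only_applied_core_filter (all three options Core),
# while the denominator above also includes the one- and two-option Core-only applicants
total_only_applied_core_awarded_core = len(
    df[only_applied_core_filter & awarded_core_filter]
)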

prob_only_applied_core_awarded_core = (
    total_only_applied_core_awarded_core / total_only_applied_core
)

print(
    f"Given you only applied for a Core Enchantments Permit, what is the probability you were awarded a permit for the Core Enchantments? : {prob_only_applied_core_awarded_core:.2%} ({total_only_applied_core_awarded_core}/{total_only_applied_core})"
)

Given you only applied for a Core Enchantments Permit, what is the probability you were awarded a permit for the Core Enchantments? : 2.63% (467/17743)


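Applicants who applied only for the Core Zone saw their chances of landing a Core Zone permit (2.63%) increase by 0.42 percentage points compared to applicants who had the Core Zone as at least one option (2.21%). Because the numerator in the cell above only counts applicants who listed the Core Zone in all three options, while the denominator includes all Core-only applicants, 2.63% likely understates the rate for this group.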
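
One way to keep the numerator and denominator over the same group is to build a single "only Core" mask and reuse it for both counts. The sketch below is an alternative, not the original method: it assumes unused options are stored as the string "N/A" (as `division_2_na_filter` implies) and runs on hypothetical toy data. Unlike the three hand-written filters, this mask would also catch a pattern such as Core, N/A, Core.

```python
import pandas as pd

CORE = "Core Enchantment Zone"
option_cols = ["preferred_division_1", "preferred_division_2", "preferred_division_3"]

# Hypothetical toy applications; "N/A" marks an unused option or no award
toy = pd.DataFrame(
    {
        "preferred_division_1": [CORE, CORE, CORE, CORE, "Snow Zone"],
        "preferred_division_2": [CORE, "N/A", CORE, "Snow Zone", CORE],
        "preferred_division_3": [CORE, "N/A", "N/A", "N/A", CORE],
        "awarded_entrance_code_name": [CORE, CORE, "N/A", "Snow Zone", "N/A"],
    }
)

options = toy[option_cols]

# Every option is either Core or unused, and at least one option is Core
only_core = (options.eq(CORE) | options.eq("N/A")).all(axis=1) & options.eq(
    CORE
).any(axis=1)

awarded_core = toy["awarded_entrance_code_name"] == CORE

# The same mask is used for the numerator and the denominator
numerator = (only_core & awarded_core).sum()
denominator = only_core.sum()
prob = numerator / denominator
print(f"{prob:.2%} ({numerator}/{denominator})")
```

On the toy data, three applications are Core-only and two of them won a Core permit, so it prints `66.67% (2/3)`.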


In [13]:
# Get a list of all the zones
zones_values = df["preferred_division_1"].unique()

prob_awarded_zone_applied_for = []

# Loop over the zones and calculate the probability of being awarded a permit for each zone
for zone in zones_values:
    applied_1 = df["preferred_division_1"] == zone
    applied_2 = df["preferred_division_2"] == zone
    applied_3 = df["preferred_division_3"] == zone

    zone_filter = applied_1 | applied_2 | applied_3

    applied_zone = df[zone_filter]

    total_zone = len(applied_zone)

    awarded_zone_filter = df["awarded_entrance_code_name"] == zone

    total_zone_awarded = len(df[zone_filter & awarded_zone_filter])
    prob_zone_awarded = total_zone_awarded / total_zone

    prob_awarded_zone_applied_for.append(
        [zone, prob_zone_awarded, total_zone_awarded, total_zone]
    )

    print(
        f"Probability of being awarded a permit for {zone}, given applied 1+ option in zone: {prob_zone_awarded:.2%} ({total_zone_awarded}/{total_zone})"
    )

Probability of being awarded a permit for Core Enchantment Zone, given applied 1+ option in zone: 2.21% (673/30481)
Probability of being awarded a permit for Colchuck Zone, given applied 1+ option in zone: 2.97% (348/11698)
Probability of being awarded a permit for Snow Zone, given applied 1+ option in zone: 7.46% (642/8607)
Probability of being awarded a permit for Stuart  Zone, given applied 1+ option in zone: 10.09% (454/4501)
Probability of being awarded a permit for Eightmile/Caroline Zone, given applied 1+ option in zone: 14.56% (294/2019)
Probability of being awarded a permit for Eightmile/Caroline Zone (stock), given applied 1+ option in zone: 8.81% (20/227)
Probability of being awarded a permit for Stuart Zone (stock), given applied 1+ option in zone: 7.95% (14/176)


In [14]:
def sort_zone_probabilities(x):
    # Sort the list in place by probability (index 1), highest first
    x.sort(key=lambda row: row[1], reverse=True)


def create_zone_probability_dataframe(x, columns):
    return pd.DataFrame(
        x,
        columns=columns,
    )


def add_probability_percent_column(df):
    df["Probability (%)"] = df["Probability"].map("{:.2%}".format)
    return df


def zone_probabilities_to_crosstab(x, columns):
    sort_zone_probabilities(x)
    return add_probability_percent_column(create_zone_probability_dataframe(x, columns))


df_prob_awarded_zone_applied_for = zone_probabilities_to_crosstab(
    prob_awarded_zone_applied_for,
    ["Zone", "Probability", "Total Awarded", "Total Applied"],
)

# Show crosstab of the new dataframe
df_prob_awarded_zone_applied_for

|   | Zone | Probability | Total Awarded | Total Applied | Probability (%) |
| --- | --- | --- | --- | --- | --- |
| 0 | Eightmile/Caroline Zone | 0.145617 | 294 | 2019 | 14.56% |
| 1 | Stuart Zone | 0.100866 | 454 | 4501 | 10.09% |
| 2 | Eightmile/Caroline Zone (stock) | 0.088106 | 20 | 227 | 8.81% |
| 3 | Stuart Zone (stock) | 0.079545 | 14 | 176 | 7.95% |
| 4 | Snow Zone | 0.07459 | 642 | 8607 | 7.46% |
| 5 | Colchuck Zone | 0.029749 | 348 | 11698 | 2.97% |
| 6 | Core Enchantment Zone | 0.022079 | 673 | 30481 | 2.21% |


Very interesting results. The Colchuck Zone is almost as difficult as the Core Zone. Then we see a large jump in your chances at the Snow Zone.

Next, we'll want to look at:

-   ~~How did the day of the week for entry affect chances of being awarded a permit?~~
-   ~~How did the month affect someone's chances; are these dependent variables?~~
-   ~~Are your chances of being awarded a permit dependent on the group size?~~


# Entry Day & Month Analysis

This part of the analysis splits each application into separate entries (one row per preferred option). It then adds columns for the day of the week, the month, and whether the entry was awarded.


In [15]:
df.columns

Index(['preferred_entry_date_1', 'preferred_division_1',
       'minimum_acceptable_group_size_1', 'preferred_entry_date_2',
       'preferred_division_2', 'minimum_acceptable_group_size_2',
       'preferred_entry_date_3', 'preferred_division_3',
       'minimum_acceptable_group_size_3', 'results_status',
       'awarded_preference', 'awarded_entry_date',
       'awarded_entrance_code_name', 'awarded_group_size'],
      dtype='object')

In [16]:
# It may be better to break up each individual entry into its own row, so that the data can be analyzed more easily.
preferred_options = [1, 2, 3]

# Columns that every dataframe will have
shared_columns = [
    "results_status",
    "awarded_preference",
    "awarded_entry_date",
    "awarded_entrance_code_name",
    "awarded_group_size",
]
new_dataframes = []

# Iterate over each option number creating a new dataframe for each
for option in preferred_options:
    # Get the columns for the current option
    columns = [
        f"preferred_division_{option}",
        f"preferred_entry_date_{option}",
        f"minimum_acceptable_group_size_{option}",
    ]
    # Create a new dataframe for the current option
    df_option = df[columns + shared_columns].copy()
    # Rename the columns to remove the option number
    df_option.columns = [
        "preferred_division",
        "preferred_entry_date",
        "minimum_acceptable_group_size",
    ] + shared_columns
    # Add a column to indicate if the permit was awarded for the current option
    df_option["awarded"] = df_option["awarded_preference"] == option
    df_option["preferred_option"] = option

    # Append the new dataframe to the list of dataframes
    new_dataframes.append(df_option)

# Concatenate the list of dataframes into a single dataframe
df_split = pd.concat(new_dataframes)

# Check the new dataframe
df_split.head()

|   | preferred_division | preferred_entry_date | minimum_acceptable_group_size | results_status | awarded_preference | awarded_entry_date | awarded_entrance_code_name | awarded_group_size | awarded | preferred_option |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Core Enchantment Zone | 2021-07-25 | 7 | Unsuccessful | 0 | 1970-01-01 |  | 0 | False | 1 |
| 1 | Core Enchantment Zone | 2021-08-12 | 8 | Unsuccessful | 0 | 1970-01-01 |  | 0 | False | 1 |
| 2 | Core Enchantment Zone | 2021-07-30 | 4 | Unsuccessful | 0 | 1970-01-01 |  | 0 | False | 1 |
| 3 | Core Enchantment Zone | 2021-07-01 | 4 | Unsuccessful | 0 | 1970-01-01 |  | 0 | False | 1 |
| 4 | Colchuck Zone | 2021-06-21 | 2 | Unsuccessful | 0 | 1970-01-01 |  | 0 | False | 1 |
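
The per-option loop in In [16] builds one DataFrame per option and concatenates them. pandas also ships `pd.wide_to_long` for this kind of wide-to-long reshape. The sketch below applies it to a small hypothetical two-application frame with the same column naming; `application_id` is a helper column added here because `wide_to_long` needs a unique row identifier, and it is not part of the original data.

```python
import pandas as pd

# Hypothetical toy data with the same wide layout as the cleaned CSV (two applications)
toy = pd.DataFrame(
    {
        "preferred_division_1": ["Core Enchantment Zone", "Colchuck Zone"],
        "preferred_entry_date_1": ["2021-07-25", "2021-06-21"],
        "minimum_acceptable_group_size_1": [7, 2],
        "preferred_division_2": ["Snow Zone", "Colchuck Zone"],
        "preferred_entry_date_2": ["2021-08-01", "2021-06-28"],
        "minimum_acceptable_group_size_2": [7, 2],
        "preferred_division_3": ["N/A", "Stuart Zone (stock)"],
        "preferred_entry_date_3": ["N/A", "2021-07-13"],
        "minimum_acceptable_group_size_3": [0, 2],
        "results_status": ["Accepted", "Unsuccessful"],
        "awarded_preference": [2, 0],
    }
)

# wide_to_long needs a column that uniquely identifies each application
toy["application_id"] = range(len(toy))

long = pd.wide_to_long(
    toy,
    stubnames=[
        "preferred_division",
        "preferred_entry_date",
        "minimum_acceptable_group_size",
    ],
    i="application_id",
    j="preferred_option",  # the numeric suffix (1, 2, 3) becomes this column
    sep="_",
).reset_index()

# An entry is the awarded one when its option number matches awarded_preference
long["awarded"] = long["awarded_preference"] == long["preferred_option"]

print(long.sort_values(["application_id", "preferred_option"]))
```

Columns without a numbered suffix (`results_status`, `awarded_preference`) are carried along unchanged, which matches the role of `shared_columns` in the loop.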


In [17]:
# Check for n/a rows
print(len(df_split[df_split["preferred_division"] == "N/A"]))
df_split["preferred_division"].value_counts()

1443


preferred_division
Core Enchantment Zone              70433
Colchuck Zone                      16493
Snow Zone                          12481
Stuart  Zone                        6346
Eightmile/Caroline Zone             2425
N/A                                 1443
Eightmile/Caroline Zone (stock)      265
Stuart Zone (stock)                  199
Name: count, dtype: int64

In [18]:
# Drop rows where the preferred division is N/A
df_split = df_split[df_split["preferred_division"] != "N/A"]

# Check the new dataframe
df_split["preferred_division"].value_counts()

preferred_division
Core Enchantment Zone              70433
Colchuck Zone                      16493
Snow Zone                          12481
Stuart  Zone                        6346
Eightmile/Caroline Zone             2425
Eightmile/Caroline Zone (stock)      265
Stuart Zone (stock)                  199
Name: count, dtype: int64

In [19]:
prob_awarded_zone_split = []

# Entries that were awarded a permit (the same mask for every zone)
awarded_zone_filter = df_split["awarded"]

# Loop over the zones and calculate the probability that an entry for each zone was awarded
for zone in zones_values:
    zone_filter = df_split["preferred_division"] == zone

    total_zone = len(df_split[zone_filter])

    total_zone_awarded = len(df_split[zone_filter & awarded_zone_filter])
    prob_zone_awarded = total_zone_awarded / total_zone

    prob_awarded_zone_split.append(
        [zone, prob_zone_awarded, total_zone_awarded, total_zone]
    )

    print(
        f"Probability that an entry for {zone} was awarded a permit: {prob_zone_awarded:.2%} ({total_zone_awarded}/{total_zone})"
    )

Probability that an entry for Core Enchantment Zone was awarded a permit: 0.96% (673/70433)
Probability that an entry for Colchuck Zone was awarded a permit: 2.11% (348/16493)
Probability that an entry for Snow Zone was awarded a permit: 5.14% (642/12481)
Probability that an entry for Stuart  Zone was awarded a permit: 7.15% (454/6346)
Probability that an entry for Eightmile/Caroline Zone was awarded a permit: 12.12% (294/2425)
Probability that an entry for Eightmile/Caroline Zone (stock) was awarded a permit: 7.55% (20/265)
Probability that an entry for Stuart Zone (stock) was awarded a permit: 7.04% (14/199)


In [20]:
# Create dataframe from list
df_prob_awarded_zone_split = zone_probabilities_to_crosstab(
    prob_awarded_zone_split, ["Zone", "Probability", "Total Awarded", "Total Applied"]
)

# Probability of being awarded a permit
awarded_split_filter = df_split["awarded"] == True

awarded_split = df_split[awarded_split_filter]

total_awarded_split = len(awarded_split)

total_split = len(df_split)

prob_awarded_split = total_awarded_split / total_split

print(
    f"Probability of being awarded a permit by entry option: {prob_awarded_split:.2%} ({total_awarded_split}/{total_split})"
)

# Show crosstab of the new dataframe
df_prob_awarded_zone_split

Probability of being awarded a permit by entry option: 2.25% (2445/108642)


|   | Zone | Probability | Total Awarded | Total Applied | Probability (%) |
| --- | --- | --- | --- | --- | --- |
| 0 | Eightmile/Caroline Zone | 0.121237 | 294 | 2425 | 12.12% |
| 1 | Eightmile/Caroline Zone (stock) | 0.075472 | 20 | 265 | 7.55% |
| 2 | Stuart Zone | 0.071541 | 454 | 6346 | 7.15% |
| 3 | Stuart Zone (stock) | 0.070352 | 14 | 199 | 7.04% |
| 4 | Snow Zone | 0.051438 | 642 | 12481 | 5.14% |
| 5 | Colchuck Zone | 0.0211 | 348 | 16493 | 2.11% |
| 6 | Core Enchantment Zone | 0.009555 | 673 | 70433 | 0.96% |
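
The per-entry probabilities are lower than the per-applicant ones from In [13] because the denominator now counts every entry, and applicants who chose the Core Zone listed it more than once on average. The counts in the two tables make this concrete for the Core Zone:

```python
# Counts for the Core Enchantment Zone taken from the two tables above
core_permits = 673
core_applicants = 30481  # applications with Core in at least one option
core_entries = 70433     # individual Core entries across all three options

per_applicant = core_permits / core_applicants
per_entry = core_permits / core_entries

# Average number of Core entries per application that listed Core at least once
entries_per_applicant = core_entries / core_applicants

print(
    f"per applicant: {per_applicant:.2%}, per entry: {per_entry:.2%}, "
    f"entries per applicant: {entries_per_applicant:.2f}"
)
```

Core applicants averaged about 2.31 Core entries each, which is why the per-entry figure (0.96%) is well under half the per-applicant figure (2.21%).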


In [21]:
# Add the month name of the preferred entry date (e.g. "July") to the dataframe
df_split["preferred_entry_date_month"] = df_split["preferred_entry_date"].dt.month_name()

# Check the data
df_split.head()

|   | preferred_division | preferred_entry_date | minimum_acceptable_group_size | results_status | awarded_preference | awarded_entry_date | awarded_entrance_code_name | awarded_group_size | awarded | preferred_option | preferred_entry_date_month |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Core Enchantment Zone | 2021-07-25 | 7 | Unsuccessful | 0 | 1970-01-01 |  | 0 | False | 1 | July |
| 1 | Core Enchantment Zone | 2021-08-12 | 8 | Unsuccessful | 0 | 1970-01-01 |  | 0 | False | 1 | August |
| 2 | Core Enchantment Zone | 2021-07-30 | 4 | Unsuccessful | 0 | 1970-01-01 |  | 0 | False | 1 | July |
| 3 | Core Enchantment Zone | 2021-07-01 | 4 | Unsuccessful | 0 | 1970-01-01 |  | 0 | False | 1 | July |
| 4 | Colchuck Zone | 2021-06-21 | 2 | Unsuccessful | 0 | 1970-01-01 |  | 0 | False | 1 | June |


In [22]:
# Look at the month column and add up the amounts
df_split["preferred_entry_date_month"].value_counts()

preferred_entry_date_month
August       40877
July         30188
September    23842
June          8579
October       3518
May           1638
Name: count, dtype: int64

In [23]:
# Add the day of the week (e.g. "Monday") based on the preferred entry date
df_split["preferred_entry_date_day"] = df_split["preferred_entry_date"].dt.day_name()

# Check the data
df_split.head()

|   | preferred_division | preferred_entry_date | minimum_acceptable_group_size | results_status | awarded_preference | awarded_entry_date | awarded_entrance_code_name | awarded_group_size | awarded | preferred_option | preferred_entry_date_month | preferred_entry_date_day |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Core Enchantment Zone | 2021-07-25 | 7 | Unsuccessful | 0 | 1970-01-01 |  | 0 | False | 1 | July | Sunday |
| 1 | Core Enchantment Zone | 2021-08-12 | 8 | Unsuccessful | 0 | 1970-01-01 |  | 0 | False | 1 | August | Thursday |
| 2 | Core Enchantment Zone | 2021-07-30 | 4 | Unsuccessful | 0 | 1970-01-01 |  | 0 | False | 1 | July | Friday |
| 3 | Core Enchantment Zone | 2021-07-01 | 4 | Unsuccessful | 0 | 1970-01-01 |  | 0 | False | 1 | July | Thursday |
| 4 | Colchuck Zone | 2021-06-21 | 2 | Unsuccessful | 0 | 1970-01-01 |  | 0 | False | 1 | June | Monday |


In [24]:
# Look at the day of the week column and add up the amounts
df_split["preferred_entry_date_day"].value_counts()

preferred_entry_date_day
Friday       21918
Thursday     20072
Monday       15372
Wednesday    15112
Saturday     13766
Tuesday      13734
Sunday        8668
Name: count, dtype: int64

In [25]:
# Considered (then abandoned): drop rows where the preferred option is greater than
# the awarded preference, because we KNOW the lottery never looked at those entries.
# For now, all entries are kept; only the mask below is used in the next cell.

# preferred_option_greater_than_awarded = (df_split["preferred_option"] > df_split["awarded_preference"])
awarded_preference_greater_than_0 = df_split["awarded_preference"] > 0
# df_split = df_split[preferred_option_greater_than_awarded & awarded_preference_greater_than_0]

In [26]:
# Create a dataframe of the entries we KNOW were skipped during
# the lottery process because the preference could not be accommodated.

# Find where the preferred option was less than the awarded preference
preferred_option_less_than_awarded_option = (
    df_split["preferred_option"] < df_split["awarded_preference"]
)

df_split_skipped = df_split[
    awarded_preference_greater_than_0 & preferred_option_less_than_awarded_option
].copy()

df_split_skipped["preferred_division"].value_counts()

preferred_division
Core Enchantment Zone              1146
Colchuck Zone                       374
Snow Zone                           163
Stuart  Zone                        125
Eightmile/Caroline Zone              41
Eightmile/Caroline Zone (stock)       7
Stuart Zone (stock)                   5
Name: count, dtype: int64

In [27]:
# View crosstab
pd.crosstab(
    df_split_skipped["preferred_entry_date_month"],
    df_split_skipped["preferred_entry_date_day"],
    margins=True,
)

| preferred_entry_date_month | Friday | Monday | Saturday | Sunday | Thursday | Tuesday | Wednesday | All |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| August | 82 | 81 | 40 | 74 | 67 | 78 | 64 | 486 |
| July | 80 | 74 | 61 | 49 | 72 | 55 | 45 | 436 |
| June | 46 | 25 | 44 | 23 | 37 | 29 | 23 | 227 |
| May | 12 | 11 | 33 | 6 | 9 | 8 | 4 | 83 |
| October | 43 | 28 | 19 | 14 | 19 | 22 | 19 | 164 |
| September | 69 | 64 | 64 | 59 | 69 | 63 | 77 | 465 |
| All | 332 | 283 | 261 | 225 | 273 | 255 | 232 | 1861 |


Looking at the entries from applications that were awarded a permit but had skipped entries, there is not a wide gap between the main months of July, August, and September. In fact, September entries were skipped more often than July entries (465 vs. 436). One possible explanation is that people applied for September to avoid snow, although September received fewer entries overall than July (23,842 vs. 30,188).

Also interesting to note: Saturday had fewer skipped entries than Monday and Thursday.

This subset is useful because it's difficult to know whether someone's application was selected but none of their entries were available, so they ended up being unsuccessful. However, if someone was awarded a permit for preferred option 2 or 3, we know their earlier options were skipped.
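
The "skipped" rule from In [26] is easiest to see on a tiny example. The sketch below uses hypothetical long-format entries (one row per application and option): an entry counts as skipped only when its application won on a later option. Unsuccessful applications (`awarded_preference` of 0) contribute nothing, since we cannot tell whether they were drawn at all.

```python
import pandas as pd

# Hypothetical long-format entries: one row per (application, option)
entries = pd.DataFrame(
    {
        "application_id": [0, 0, 0, 1, 1, 1, 2, 2, 2],
        "preferred_option": [1, 2, 3, 1, 2, 3, 1, 2, 3],
        # 3 = won on option 3, 1 = won on option 1, 0 = unsuccessful
        "awarded_preference": [3, 3, 3, 1, 1, 1, 0, 0, 0],
    }
)

# An entry was definitely passed over when the application won on a later option
skipped = (entries["awarded_preference"] > 0) & (
    entries["preferred_option"] < entries["awarded_preference"]
)

print(entries[skipped])
```

Only options 1 and 2 of application 0 are flagged.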

Let's look at some probabilities based on this data.


In [28]:
# df_split_skipped is a collection of entries that we know were passed over in the lottery
# because the application was awarded a permit for a later option.

# What share of the passed-over entries fall on each day of the week?

days_of_week = df_split_skipped["preferred_entry_date_day"].unique()

total_skipped = len(df_split_skipped)

prob_skipped_day = []

for day in days_of_week:
    day_filter = df_split_skipped["preferred_entry_date_day"] == day

    total_day_skipped = len(df_split_skipped[day_filter])

    # P(day | skipped)
    prob_day_skipped = total_day_skipped / total_skipped

    prob_skipped_day.append([day, prob_day_skipped, total_skipped, total_day_skipped])

    print(
        f"Share of skipped entries whose day of the week is {day}: {prob_day_skipped:.2%} ({total_day_skipped}/{total_skipped})"
    )

Share of skipped entries whose day of the week is Wednesday: 12.47% (232/1861)
Share of skipped entries whose day of the week is Monday: 15.21% (283/1861)
Share of skipped entries whose day of the week is Sunday: 12.09% (225/1861)
Share of skipped entries whose day of the week is Saturday: 14.02% (261/1861)
Share of skipped entries whose day of the week is Friday: 17.84% (332/1861)
Share of skipped entries whose day of the week is Thursday: 14.67% (273/1861)
Share of skipped entries whose day of the week is Tuesday: 13.70% (255/1861)


In [29]:
# Create dataframe from list
df_prob_skipped_day = zone_probabilities_to_crosstab(
    prob_skipped_day,
    ["Day", "Probability", "Total Skipped", "Total Day Skipped"],
)

# Show crosstab of the new dataframe
df_prob_skipped_day

|   | Day | Probability | Total Skipped | Total Day Skipped | Probability (%) |
| --- | --- | --- | --- | --- | --- |
| 0 | Friday | 0.178399 | 1861 | 332 | 17.84% |
| 1 | Monday | 0.152069 | 1861 | 283 | 15.21% |
| 2 | Thursday | 0.146695 | 1861 | 273 | 14.67% |
| 3 | Saturday | 0.140247 | 1861 | 261 | 14.02% |
| 4 | Tuesday | 0.137023 | 1861 | 255 | 13.70% |
| 5 | Wednesday | 0.124664 | 1861 | 232 | 12.47% |
| 6 | Sunday | 0.120903 | 1861 | 225 | 12.09% |


In [30]:
# Let's see what the most popular days are
df_split["preferred_entry_date_day"].value_counts()

preferred_entry_date_day
Friday       21918
Thursday     20072
Monday       15372
Wednesday    15112
Saturday     13766
Tuesday      13734
Sunday        8668
Name: count, dtype: int64

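The statistics above show, for each day, the share of skipped entries (entries passed over in applications that were ultimately awarded a permit on a later option) that fall on that day. In other words, they are P(day | skipped), not the chance that an entry for that day was skipped.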
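
These two quantities can rank days differently, because P(day | skipped) also reflects how many entries each day received. The sketch below uses hypothetical toy entries to contrast `value_counts(normalize=True)` on the skipped rows (the share, as computed above) with a per-day `groupby(...).mean()` (the skip rate).

```python
import pandas as pd

# Hypothetical entries with a weekday and a "skipped" flag
entries = pd.DataFrame(
    {
        "day": ["Friday", "Friday", "Friday", "Friday", "Monday", "Monday", "Sunday", "Sunday"],
        "skipped": [True, True, False, False, True, False, False, False],
    }
)

# Share of skipped entries falling on each day: P(day | skipped)
share_of_skipped = entries.loc[entries["skipped"], "day"].value_counts(normalize=True)

# Skip rate within each day: P(skipped | day)
skip_rate_by_day = entries.groupby("day")["skipped"].mean()

print(share_of_skipped)
print(skip_rate_by_day)
```

Here Friday holds two thirds of the skipped entries, yet Friday and Monday have the same 50% skip rate, simply because Friday had twice as many entries.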

The surprising finding, looking at the day of the week for skipped entries, is that Monday entries were skipped a lot: Monday has the second-largest share, behind only Friday. Monday isn't recommended, or listed, as a day to avoid. On the contrary, it's been said that Monday is a better day to apply for entry.

> Bear in mind that the most popular time to go is August, and the most popular days to start a trip are Fridays, Thursdays, and Saturdays. If you really want to do a Friday-Sunday trip in mid-August, by all means apply for that trip, but remember that you’re odds of getting a permit will be less than if you tried for a Monday-Wednesday trip in July
> [USFS website](https://www.fs.usda.gov/detail/okawen/passes-permits/recreation/?cid=fsbdev3_053607)

If we were to accept the common advice that Friday, Thursday, and Saturday should be avoided if you want to increase your chances of winning, we would expect to see those days at the top of this list.

However, looking at the individual entries, the most popular days to start a trip in 2021 were Friday, Thursday, Monday, Wednesday, Saturday! That's right. Monday and Wednesday were more popular than Saturday!
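
To put the popularity ranking in numbers, the weekday counts from the `value_counts()` output above can be summed directly. This quick check shows that Thursday, Friday, and Saturday together account for just over half of all entries.

```python
# Entry counts by weekday from the value_counts() output above
entries_by_day = {
    "Friday": 21918,
    "Thursday": 20072,
    "Monday": 15372,
    "Wednesday": 15112,
    "Saturday": 13766,
    "Tuesday": 13734,
    "Sunday": 8668,
}

peak_days = {"Thursday", "Friday", "Saturday"}

total_entries = sum(entries_by_day.values())
peak_entries = sum(n for day, n in entries_by_day.items() if day in peak_days)

print(
    f"Thu/Fri/Sat share of entries: {peak_entries / total_entries:.2%} "
    f"({peak_entries}/{total_entries})"
)
```

This prints `Thu/Fri/Sat share of entries: 51.32% (55756/108642)`; the total matches the 108,642 entries in `df_split`.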


In [31]:
# Let's see if August is indeed the most popular month to apply
df_split["preferred_entry_date_month"].value_counts()

preferred_entry_date_month
August       40877
July         30188
September    23842
June          8579
October       3518
May           1638
Name: count, dtype: int64

In [32]:
# What is the probability your option was passed over given the month?

months = df_split_skipped["preferred_entry_date_month"].unique()

prob_skipped_month = []

for month in months:
    month_filter = df_split_skipped["preferred_entry_date_month"] == month

    month_data = df_split_skipped[month_filter]

    total_month_skipped = len(month_data)

    total_skipped = len(df_split_skipped)

    prob_month_df_split_skipped = total_month_skipped / total_skipped

    prob_skipped_month.append(
        [month, prob_month_df_split_skipped, total_skipped, total_month_skipped]
    )

    print(
        f"Probability of being awarded but skipped given the month is {month}: {prob_month_df_split_skipped:.2%} ({total_month_skipped}/{total_skipped})"
    )

# Create dataframe from list
df_prob_skipped_month = zone_probabilities_to_crosstab(
    prob_skipped_month,
    ["Month", "Probability", "Total Awarded But Skipped", "Total Month"],
)

# Show crosstab of the new dataframe
df_prob_skipped_month

Probability of being awarded but skipped given the month is August: 26.11% (486/1861)
Probability of being awarded but skipped given the month is September: 24.99% (465/1861)
Probability of being awarded but skipped given the month is July: 23.43% (436/1861)
Probability of being awarded but skipped given the month is May: 4.46% (83/1861)
Probability of being awarded but skipped given the month is October: 8.81% (164/1861)
Probability of being awarded but skipped given the month is June: 12.20% (227/1861)


Unnamed: 0,Month,Probability,Total Awarded But Skipped,Total Month,Probability (%)
0,August,0.26115,1861,486,26.11%
1,September,0.249866,1861,465,24.99%
2,July,0.234283,1861,436,23.43%
3,June,0.121977,1861,227,12.20%
4,October,0.088125,1861,164,8.81%
5,May,0.0446,1861,83,4.46%


More interesting findings: July (23.43%) didn't fare much better than September (24.99%) in terms of the share of entries that failed to secure a permit despite having a chance to be awarded one.

August is by far the most popular month, with July getting about 3/4 of the entries August gets (30,188 vs. 40,877) and September getting a bit under 3/5 (23,842).
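The month comparisons above are simple ratios of the entry counts. A short sketch that computes them directly from the printed `value_counts()` output:

```python
# Entry counts by preferred month, copied from the value_counts() output above
entries_by_month = {
    "August": 40877, "July": 30188, "September": 23842,
    "June": 8579, "October": 3518, "May": 1638,
}

# Express each month's volume relative to August, the busiest month
relative_to_august = {
    month: count / entries_by_month["August"]
    for month, count in entries_by_month.items()
}

for month, ratio in relative_to_august.items():
    print(f"{month:<10} {ratio:.0%} of August's entries")
```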


In [33]:
# Next let's look at the probability of being skipped given your group size
group_sizes = df_split_skipped["minimum_acceptable_group_size"].unique()

prob_group_size_skipped = []

for group_size in group_sizes:
    group_size_filter = df_split_skipped["minimum_acceptable_group_size"] == group_size

    group_size_data = df_split_skipped[group_size_filter]

    total_group_size = len(group_size_data)

    total_skipped = len(df_split_skipped)

    prob_group_size_df_split_skipped = total_group_size / total_skipped

    prob_group_size_skipped.append(
        [
            group_size,
            prob_group_size_df_split_skipped,
            total_skipped,
            total_group_size,
        ]
    )

    print(
        f"Probability of being awarded but skipped given the minimum acceptable group size is {group_size}: {prob_group_size_df_split_skipped:.2%} ({total_group_size}/{total_skipped})"
    )

# Create dataframe from list
df_prob_group_size_skipped = zone_probabilities_to_crosstab(
    prob_group_size_skipped,
    ["Group Size", "Probability", "Total Skipped", "Total"],
)

# Show crosstab of the new dataframe
df_prob_group_size_skipped

Probability of being awarded but skipped given the minimum acceptable group size is 2: 20.10% (374/1861)
Probability of being awarded but skipped given the minimum acceptable group size is 4: 28.64% (533/1861)
Probability of being awarded but skipped given the minimum acceptable group size is 6: 14.72% (274/1861)
Probability of being awarded but skipped given the minimum acceptable group size is 8: 20.37% (379/1861)
Probability of being awarded but skipped given the minimum acceptable group size is 1: 2.10% (39/1861)
Probability of being awarded but skipped given the minimum acceptable group size is 3: 4.84% (90/1861)
Probability of being awarded but skipped given the minimum acceptable group size is 5: 8.01% (149/1861)
Probability of being awarded but skipped given the minimum acceptable group size is 7: 1.24% (23/1861)


Unnamed: 0,Group Size,Probability,Total Skipped,Total,Probability (%)
0,4,0.286405,1861,533,28.64%
1,8,0.203654,1861,379,20.37%
2,2,0.200967,1861,374,20.10%
3,6,0.147233,1861,274,14.72%
4,5,0.080064,1861,149,8.01%
5,3,0.048361,1861,90,4.84%
6,1,0.020956,1861,39,2.10%
7,7,0.012359,1861,23,1.24%


The group size statistic is fun to look at, but the lottery treats group size differently depending on the zone: the Core Enchantment Zone quota is counted in people rather than groups. Therefore, it's really only appropriate to look at group size in the Core Enchantment Zone.


In [59]:
prob_skipped_day = []

for day in days_of_week:
    day_filter = df_split_skipped["preferred_entry_date_day"] == day

    day_data = df_split_skipped[day_filter]

    total_day_skipped = len(day_data)

    total_skipped = len(df_split_skipped)

    prob_day_awarded_but_skipped = total_day_skipped / total_skipped

    prob_skipped_day.append(
        [
            day,
            prob_day_awarded_but_skipped,
            total_skipped,
            total_day_skipped,
        ]
    )

    print(
        f"Probability of being awarded but skipped given {day}: {prob_day_awarded_but_skipped:.2%} ({total_day_skipped}/{total_skipped})"
    )

# Create dataframe from list
df_prob_skipped_day = zone_probabilities_to_crosstab(
    prob_skipped_day,
    [
        "Day",
        "Probability",
        "Total Skipped",
        "Total Skipped Day",
    ],
)

# Show crosstab of the new dataframe
df_prob_skipped_day

Probability of being awarded but skipped given Wednesday: 12.47% (232/1861)
Probability of being awarded but skipped given Monday: 15.21% (283/1861)
Probability of being awarded but skipped given Sunday: 12.09% (225/1861)
Probability of being awarded but skipped given Saturday: 14.02% (261/1861)
Probability of being awarded but skipped given Friday: 17.84% (332/1861)
Probability of being awarded but skipped given Thursday: 14.67% (273/1861)
Probability of being awarded but skipped given Tuesday: 13.70% (255/1861)


Unnamed: 0,Day,Probability,Total Skipped,Total Skipped Day,Probability (%)
0,Friday,0.178399,1861,332,17.84%
1,Monday,0.152069,1861,283,15.21%
2,Thursday,0.146695,1861,273,14.67%
3,Saturday,0.140247,1861,261,14.02%
4,Tuesday,0.137023,1861,255,13.70%
5,Wednesday,0.124664,1861,232,12.47%
6,Sunday,0.120903,1861,225,12.09%


In [60]:
split_awarded = df_split[(df_split["awarded"] == True)]

prob_split_day = []

# Loop over the days of the week and compute each day's share of all awarded entries
for day in days_of_week:
    split_day_filter = split_awarded["preferred_entry_date_day"] == day

    total_split_awarded = len(split_awarded)

    split_day_awarded = split_awarded[split_day_filter]

    total_split_day_awarded = len(split_day_awarded)

    prob_split_awarded = total_split_day_awarded / total_split_awarded

    print(
        f"Probability of being awarded a permit on {day}: {prob_split_awarded:.2%} ({total_split_day_awarded}/{total_split_awarded})"
    )

    prob_split_day.append(
        [
            day,
            prob_split_awarded,
            total_split_day_awarded,
            total_split_awarded,
        ]
    )

# Create dataframe from list
df_prob_split_day = zone_probabilities_to_crosstab(
    prob_split_day,
    [
        "Day",
        "Probability",
        "Total Awarded Day of Week",
        "Total Awarded",
    ],
)

# Show crosstab of the new dataframe
df_prob_split_day

Probability of being awarded a permit on Wednesday: 13.50% (330/2445)
Probability of being awarded a permit on Monday: 13.50% (330/2445)
Probability of being awarded a permit on Sunday: 17.71% (433/2445)
Probability of being awarded a permit on Saturday: 14.36% (351/2445)
Probability of being awarded a permit on Friday: 13.42% (328/2445)
Probability of being awarded a permit on Thursday: 13.70% (335/2445)
Probability of being awarded a permit on Tuesday: 13.82% (338/2445)


Unnamed: 0,Day,Probability,Total Awarded Day of Week,Total Awarded,Probability (%)
0,Sunday,0.177096,433,2445,17.71%
1,Saturday,0.143558,351,2445,14.36%
2,Tuesday,0.138241,338,2445,13.82%
3,Thursday,0.137014,335,2445,13.70%
4,Wednesday,0.134969,330,2445,13.50%
5,Monday,0.134969,330,2445,13.50%
6,Friday,0.134151,328,2445,13.42%


In [61]:
# Calculate the awarded to skipped for day of week overall
df_total_day_awarded = df_prob_split_day[["Day", "Total Awarded Day of Week"]]
df_total_day_skipped = df_prob_skipped_day[["Day", "Total Skipped Day"]]

df_day_awarded_skipped = pd.merge(df_total_day_awarded, df_total_day_skipped, on="Day")

df_day_awarded_skipped["Awarded to Skipped"] = (
    df_day_awarded_skipped["Total Awarded Day of Week"]
    / df_day_awarded_skipped["Total Skipped Day"]
)

# Sort Values
df_day_awarded_skipped = df_day_awarded_skipped.sort_values(
    by="Awarded to Skipped", ascending=False
)

# Format the awarded-to-skipped ratio with two decimal places for display
df_day_awarded_skipped["Awarded to Skipped"] = df_day_awarded_skipped[
    "Awarded to Skipped"
].map("{:.2f}".format)


df_day_awarded_skipped

Unnamed: 0,Day,Total Awarded Day of Week,Total Skipped Day,Awarded to Skipped
0,Sunday,433,225,1.92
4,Wednesday,330,232,1.42
1,Saturday,351,261,1.34
2,Tuesday,338,255,1.33
3,Thursday,335,273,1.23
5,Monday,330,283,1.17
6,Friday,328,332,0.99


In [35]:
# Next look at these same statistics but for the Core Enchantment Zone
split_core_zone_filter = (
    df_split_skipped["preferred_division"] == "Core Enchantment Zone"
)

df_split_skipped_core_zone = df_split_skipped[split_core_zone_filter]

df_split_skipped_core_zone

Unnamed: 0,preferred_division,preferred_entry_date,minimum_acceptable_group_size,results_status,awarded_preference,awarded_entry_date,awarded_entrance_code_name,awarded_group_size,awarded,preferred_option,preferred_entry_date_month,preferred_entry_date_day
42,Core Enchantment Zone,2021-08-11,2,Accepted,2,2021-07-06,Eightmile/Caroline Zone,2,False,1,August,Wednesday
65,Core Enchantment Zone,2021-09-06,4,Accepted,3,2021-09-19,Snow Zone,4,False,1,September,Monday
134,Core Enchantment Zone,2021-07-04,4,Accepted,3,2021-07-11,Stuart Zone,4,False,1,July,Sunday
136,Core Enchantment Zone,2021-07-03,6,No Response,3,2021-10-09,Core Enchantment Zone,6,False,1,July,Saturday
209,Core Enchantment Zone,2021-07-01,8,Accepted,2,2021-07-01,Colchuck Zone,8,False,1,July,Thursday
...,...,...,...,...,...,...,...,...,...,...,...,...
36411,Core Enchantment Zone,2021-10-10,4,Accepted,3,2021-10-17,Core Enchantment Zone,4,False,2,October,Sunday
36469,Core Enchantment Zone,2021-06-23,2,Accepted,3,2021-06-13,Core Enchantment Zone,2,False,2,June,Wednesday
36532,Core Enchantment Zone,2021-09-26,4,Accepted,3,2021-09-26,Snow Zone,6,False,2,September,Sunday
36640,Core Enchantment Zone,2021-10-04,6,Accepted,3,2021-10-28,Core Enchantment Zone,6,False,2,October,Monday


In [36]:
prob_skipped_core_zone_month = []

for month in months:
    month_filter = df_split_skipped_core_zone["preferred_entry_date_month"] == month

    month_data = df_split_skipped_core_zone[month_filter]

    total_month_skipped_core_zone = len(month_data)

    total_skipped_core_zone = len(df_split_skipped_core_zone)

    prob_month_awarded_but_skipped_core_zone = (
        total_month_skipped_core_zone / total_skipped_core_zone
    )

    prob_skipped_core_zone_month.append(
        [
            month,
            prob_month_awarded_but_skipped_core_zone,
            total_skipped_core_zone,
            total_month_skipped_core_zone,
        ]
    )

    print(
        f"Probability of being awarded but skipped given the month is {month}: {prob_month_awarded_but_skipped_core_zone:.2%} ({total_month_skipped_core_zone}/{total_skipped_core_zone})"
    )

# Create dataframe from list
df_prob_skipped_core_zone_month = zone_probabilities_to_crosstab(
    prob_skipped_core_zone_month,
    [
        "Month",
        "Probability",
        "Total Skipped Core Zone",
        "Total Month Skipped Core Zone",
    ],
)

# Show crosstab of the new dataframe
df_prob_skipped_core_zone_month

Probability of being awarded but skipped given the month is August: 25.39% (291/1146)
Probability of being awarded but skipped given the month is September: 24.87% (285/1146)
Probability of being awarded but skipped given the month is July: 22.43% (257/1146)
Probability of being awarded but skipped given the month is May: 5.06% (58/1146)
Probability of being awarded but skipped given the month is October: 9.95% (114/1146)
Probability of being awarded but skipped given the month is June: 12.30% (141/1146)


Unnamed: 0,Month,Probability,Total Skipped Core Zone,Total Month Skipped Core Zone,Probability (%)
0,August,0.253927,1146,291,25.39%
1,September,0.248691,1146,285,24.87%
2,July,0.224258,1146,257,22.43%
3,June,0.123037,1146,141,12.30%
4,October,0.099476,1146,114,9.95%
5,May,0.050611,1146,58,5.06%


These are similar to the statistics we saw before filtering for the Core Enchantment Zone, most likely because such a large share of the skipped entries (1,146 of 1,861, about 62%) were for the Core Enchantment Zone.


In [37]:
prob_skipped_core_zone_day = []

for day in days_of_week:
    day_filter = df_split_skipped_core_zone["preferred_entry_date_day"] == day

    day_data = df_split_skipped_core_zone[day_filter]

    total_day_skipped_core_zone = len(day_data)

    total_skipped_core_zone = len(df_split_skipped_core_zone)

    prob_day_df_split_skipped_core_zone = (
        total_day_skipped_core_zone / total_skipped_core_zone
    )

    prob_skipped_core_zone_day.append(
        [
            day,
            prob_day_df_split_skipped_core_zone,
            total_skipped_core_zone,
            total_day_skipped_core_zone,
        ]
    )

    print(
        f"Probability of being awarded but skipped given the day is {day}: {prob_day_df_split_skipped_core_zone:.2%} ({total_day_skipped_core_zone}/{total_skipped_core_zone})"
    )

# Create dataframe from list
df_prob_skipped_core_zone_day = zone_probabilities_to_crosstab(
    prob_skipped_core_zone_day,
    ["Day", "Probability", "Total Skipped Core Zone", "Total Skipped by Day Core Zone"],
)

# Show crosstab of the new dataframe
df_prob_skipped_core_zone_day

Probability of being awarded but skipped given the day is Wednesday: 13.44% (154/1146)
Probability of being awarded but skipped given the day is Monday: 16.06% (184/1146)
Probability of being awarded but skipped given the day is Sunday: 12.57% (144/1146)
Probability of being awarded but skipped given the day is Saturday: 12.91% (148/1146)
Probability of being awarded but skipped given the day is Friday: 16.49% (189/1146)
Probability of being awarded but skipped given the day is Thursday: 15.01% (172/1146)
Probability of being awarded but skipped given the day is Tuesday: 13.53% (155/1146)


Unnamed: 0,Day,Probability,Total Skipped Core Zone,Total Skipped by Day Core Zone,Probability (%)
0,Friday,0.164921,1146,189,16.49%
1,Monday,0.160558,1146,184,16.06%
2,Thursday,0.150087,1146,172,15.01%
3,Tuesday,0.135253,1146,155,13.53%
4,Wednesday,0.13438,1146,154,13.44%
5,Saturday,0.129145,1146,148,12.91%
6,Sunday,0.125654,1146,144,12.57%


**Monday was skipped almost as often as Friday** for the Core Enchantment Zone (16.06% vs. 16.49% of skipped Core Zone entries). The days with the smallest share of skipped Core Enchantment Zone entries in 2021 were Sunday and Saturday.


In [38]:
# Finally, let's redo the group size analysis with the Core Enchantment Zone filter
group_sizes = df_split_skipped_core_zone["minimum_acceptable_group_size"].unique()

prob_group_size_skipped_core_zone = []

for group_size in group_sizes:
    group_size_filter = (
        df_split_skipped_core_zone["minimum_acceptable_group_size"] == group_size
    )

    group_size_data = df_split_skipped_core_zone[group_size_filter]

    total_group_size = len(group_size_data)

    total_skipped_core_zone = len(df_split_skipped_core_zone)

    prob_group_size_df_split_skipped_core_zone = (
        total_group_size / total_skipped_core_zone
    )

    prob_group_size_skipped_core_zone.append(
        [
            group_size,
            prob_group_size_df_split_skipped_core_zone,
            total_skipped_core_zone,
            total_group_size,
        ]
    )

    print(
        f"Probability of being awarded but skipped given the minimum acceptable group size is {group_size}: {prob_group_size_df_split_skipped_core_zone:.2%} ({total_group_size}/{total_skipped_core_zone})"
    )

# Create dataframe from list
df_prob_group_size_skipped_core_zone = zone_probabilities_to_crosstab(
    prob_group_size_skipped_core_zone,
    [
        "Group Size",
        "Probability",
        "Total Skipped Core Zone",
        "Total Skipped Group Size Core Zone",
    ],
)

# Show crosstab of the new dataframe
df_prob_group_size_skipped_core_zone

Probability of being awarded but skipped given the minimum acceptable group size is 2: 20.42% (234/1146)
Probability of being awarded but skipped given the minimum acceptable group size is 4: 30.28% (347/1146)
Probability of being awarded but skipped given the minimum acceptable group size is 6: 14.57% (167/1146)
Probability of being awarded but skipped given the minimum acceptable group size is 8: 18.76% (215/1146)
Probability of being awarded but skipped given the minimum acceptable group size is 1: 2.79% (32/1146)
Probability of being awarded but skipped given the minimum acceptable group size is 5: 7.16% (82/1146)
Probability of being awarded but skipped given the minimum acceptable group size is 3: 4.36% (50/1146)
Probability of being awarded but skipped given the minimum acceptable group size is 7: 1.66% (19/1146)


Unnamed: 0,Group Size,Probability,Total Skipped Core Zone,Total Skipped Group Size Core Zone,Probability (%)
0,4,0.302792,1146,347,30.28%
1,2,0.204188,1146,234,20.42%
2,8,0.187609,1146,215,18.76%
3,6,0.145724,1146,167,14.57%
4,5,0.071553,1146,82,7.16%
5,3,0.04363,1146,50,4.36%
6,1,0.027923,1146,32,2.79%
7,7,0.016579,1146,19,1.66%


In [39]:
# Get all the Core Enchantment Zone application entries
split_core_zone_filter = df_split["preferred_division"] == "Core Enchantment Zone"

df_split_core_zone = df_split[split_core_zone_filter]

df_split_core_zone

Unnamed: 0,preferred_division,preferred_entry_date,minimum_acceptable_group_size,results_status,awarded_preference,awarded_entry_date,awarded_entrance_code_name,awarded_group_size,awarded,preferred_option,preferred_entry_date_month,preferred_entry_date_day
0,Core Enchantment Zone,2021-07-25,7,Unsuccessful,0,1970-01-01,,0,False,1,July,Sunday
1,Core Enchantment Zone,2021-08-12,8,Unsuccessful,0,1970-01-01,,0,False,1,August,Thursday
2,Core Enchantment Zone,2021-07-30,4,Unsuccessful,0,1970-01-01,,0,False,1,July,Friday
3,Core Enchantment Zone,2021-07-01,4,Unsuccessful,0,1970-01-01,,0,False,1,July,Thursday
5,Core Enchantment Zone,2021-08-06,4,Unsuccessful,0,1970-01-01,,0,False,1,August,Friday
...,...,...,...,...,...,...,...,...,...,...,...,...
36684,Core Enchantment Zone,2021-08-10,6,Unsuccessful,0,1970-01-01,,0,False,3,August,Tuesday
36687,Core Enchantment Zone,2021-07-20,4,Unsuccessful,0,1970-01-01,,0,False,3,July,Tuesday
36689,Core Enchantment Zone,2021-09-03,6,Unsuccessful,0,1970-01-01,,0,False,3,September,Friday
36692,Core Enchantment Zone,2021-08-14,2,Unsuccessful,0,1970-01-01,,0,False,3,August,Saturday


In [40]:
# Look at the group size value counts for the core zone
df_split_core_zone["minimum_acceptable_group_size"].value_counts()

minimum_acceptable_group_size
4    19677
8    16341
6    12720
2    11306
5     4973
3     3887
7      885
1      644
Name: count, dtype: int64

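What's interesting about the group size statistics is just how many people apply with an even number: 4, 8, 6, and 2 are the four most common sizes and together account for about 85% of Core Enchantment Zone entries (60,044 of 70,433). This can create interesting scenarios when trying to combine groups to fit within the Core Enchantment Zone's daily limit.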

4, 2, 8, and 6 had double-digit skip percentages for the Core Zone, compared to single-digit percentages for 5, 3, 1, and 7. That's noteworthy. If a "smaller group size" improves your chances, we would expect these numbers to increase steadily with group size. Part of the gap is simply volume: the double-digit sizes are also the ones people apply with most often, so they make up more of any total. Still, there may be something else going on here with a group size of four.
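To separate how often a size is skipped from how often it is requested, each size's share of the 1,146 skipped Core Zone entries can be divided by its share of all Core Zone entries. A minimal sketch using the counts printed above (plain dictionaries stand in for the notebook's DataFrames):

```python
# Core Enchantment Zone counts copied from the outputs above (2021, all months).
# Entries that applied with each minimum acceptable group size:
applied = {4: 19677, 8: 16341, 6: 12720, 2: 11306, 5: 4973, 3: 3887, 7: 885, 1: 644}
# Skipped entries with each minimum acceptable group size:
skipped = {4: 347, 2: 234, 8: 215, 6: 167, 5: 82, 3: 50, 1: 32, 7: 19}

total_applied = sum(applied.values())
total_skipped = sum(skipped.values())

# Ratio of each size's share of skipped entries to its share of all entries.
# Above 1 means the size is skipped more than its application volume suggests.
lift = {
    size: (skipped[size] / total_skipped) / (applied[size] / total_applied)
    for size in sorted(applied)
}

for size, value in lift.items():
    print(f"group size {size}: {value:.2f}")
```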


In [41]:
# What is the percentage of awarded group size compared to the minimum acceptable group size counts?
# Get and sort the value counts
df_split_core_zone_awarded = df_split_core_zone[df_split_core_zone["awarded"] == True]
(
    df_split_core_zone_awarded["awarded_group_size"].value_counts().sort_index()
    / df_split_core_zone["minimum_acceptable_group_size"].value_counts().sort_index()
).map("{:.2%}".format)

awarded_group_size
1    6.68%
2    1.46%
3    0.93%
4    0.97%
5    0.68%
6    0.79%
7    0.68%
8    0.60%
Name: count, dtype: object

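The above percentage divides the number of awarded Core Enchantment Zone entries for each group size by the number of Core Enchantment Zone entries that applied with that minimum acceptable group size, giving a rough award rate for each size.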

The Core Enchantment Zone is different from other zones. The lottery process awards by PEOPLE and not by GROUP.

From [the USFS website](https://www.fs.usda.gov/detail/okawen/passes-permits/recreation/?cid=fsbdev3_053607):

> The CORE Zone entry is by PEOPLE not GROUP. For example each day up to 16 people are allowed entry in the advanced lottery and up to 8 people are allowed in the new geofence application lottery. This can be any combination as long as the total number does not exceed 24 people in one day entering the CORE ZONE.

What's interesting about the ratio of awarded group size to minimum acceptable group size is how disproportionately often a group size of 1 is awarded.

In other words, a group size of 1 punches WAY above its weight: relative to how often people apply with it, it is awarded far more often than any other size (6.68% vs. 1.46% or less). My hypothesis is that 1 combines with the most other group sizes. Looking at the previous statistics, you may think that a group size of 4 is a bad decision, but it's awarded mostly in line with, and slightly above, several other sizes. This makes me think that despite 2 and 4 being skipped at a high percentage, they are also awarded at a higher rate than most sizes. Therefore, they shouldn't be considered bad group sizes.

**But if you truly want to increase your chances, apply for a group size of 1**
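The hypothesis that a group of 1 "combines better" can be illustrated with a toy model. This is not the USFS algorithm, which the source doesn't describe. It assumes that each day's 16 advance-lottery Core Zone spots (from the USFS quote above) are filled by drawing applications in random order and skipping any group that no longer fits, and it ignores reducing a group to its minimum acceptable size, multiple preferences, and the geofence lottery. Group sizes are drawn using the August Core Zone application counts printed in the next cells; the 40 applications per day and the helper names `fill_day` and `simulate_award_rates` are hypothetical choices for this sketch.

```python
import random
from collections import Counter

# Assumption: 16 advance-lottery Core Zone spots per day (from the USFS quote).
DAILY_CAPACITY = 16

# August Core Zone application counts by minimum acceptable group size.
AUGUST_COUNTS = {4: 7423, 8: 6461, 6: 4865, 2: 3928, 5: 2048, 3: 1491, 7: 305, 1: 220}


def fill_day(groups, rng, capacity=DAILY_CAPACITY):
    """Admit groups in random order, skipping any group larger than the spots left."""
    order = list(groups)
    rng.shuffle(order)
    remaining = capacity
    admitted = Counter()
    for size in order:
        if remaining == 0:
            break
        if size <= remaining:
            admitted[size] += 1
            remaining -= size
    return admitted


def simulate_award_rates(counts, applications_per_day=40, days=2000, seed=0):
    """Estimate the fraction of simulated applications admitted for each group size."""
    rng = random.Random(seed)
    sizes = list(counts)
    weights = [counts[size] for size in sizes]
    applied, admitted = Counter(), Counter()
    for _ in range(days):
        # Draw one day's applications in proportion to the August counts
        groups = rng.choices(sizes, weights=weights, k=applications_per_day)
        applied.update(groups)
        admitted.update(fill_day(groups, rng))
    return {size: admitted[size] / applied[size] for size in sorted(applied)}


rates = simulate_award_rates(AUGUST_COUNTS)
for size, rate in rates.items():
    print(f"group size {size}: {rate:.1%} of simulated applications admitted")
```

Under these assumptions a group of 1 can still be admitted when only one spot is left, which is the intuition behind the hypothesis. Changing `applications_per_day` or `DAILY_CAPACITY` changes the absolute rates, so the output is only qualitative.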


In [42]:
split_core_zone_august_filter = (
    df_split_core_zone["preferred_entry_date_month"] == "August"
)

split_core_zone_august = df_split_core_zone[split_core_zone_august_filter]

len(split_core_zone_august)

26741

In [43]:
# Count awarded Core Zone entries by awarded group size, most common first
df_split_core_zone_awarded["awarded_group_size"].value_counts().sort_values(
    ascending=False
)

awarded_group_size
4    190
2    165
6    101
8     98
1     43
3     36
5     34
7      6
Name: count, dtype: int64

In [44]:
split_core_zone_august["minimum_acceptable_group_size"].value_counts()

minimum_acceptable_group_size
4    7423
8    6461
6    4865
2    3928
5    2048
3    1491
7     305
1     220
Name: count, dtype: int64

There's a huge bias toward even-numbered applications in the Core Zone: in August, group sizes 4, 8, 6, and 2 account for 22,677 of the 26,741 entries (about 85%).


In [45]:
split_core_zone_august_awarded_filter = split_core_zone_august["awarded"] == True

split_core_zone_august_sunday_filter = (
    split_core_zone_august["preferred_entry_date_day"] == "Sunday"
)
split_core_zone_august_monday_filter = (
    split_core_zone_august["preferred_entry_date_day"] == "Monday"
)
split_core_zone_august_tuesday_filter = (
    split_core_zone_august["preferred_entry_date_day"] == "Tuesday"
)

split_core_zone_august_friday_filter = (
    split_core_zone_august["preferred_entry_date_day"] == "Friday"
)
split_core_zone_august_saturday_filter = (
    split_core_zone_august["preferred_entry_date_day"] == "Saturday"
)
split_core_zone_august_thursday_filter = (
    split_core_zone_august["preferred_entry_date_day"] == "Thursday"
)

prob_split_core_zone_august_sunday_group_size = []

split_core_zone_august_sunday_awarded = split_core_zone_august[
    split_core_zone_august_sunday_filter & (split_core_zone_august["awarded"] == True)
]

# Loop over the group sizes on Sunday in August and see who did the best
for group in group_sizes:
    split_core_zone_august_group_size_filter = (
        split_core_zone_august_sunday_awarded["minimum_acceptable_group_size"] == group
    )

    total_split_core_zone_august_sunday_awarded = len(
        split_core_zone_august_sunday_awarded
    )

    split_core_zone_august_sunday_group_awarded = split_core_zone_august_sunday_awarded[
        split_core_zone_august_group_size_filter
    ]

    total_split_core_zone_august_sunday_group_awarded = len(
        split_core_zone_august_sunday_group_awarded
    )

    prob_split_core_zone_august_sunday_awarded = (
        total_split_core_zone_august_sunday_group_awarded
        / total_split_core_zone_august_sunday_awarded
    )

    print(
        f"Probability of being awarded a permit for the Core Enchantments in August on a Sunday with a group size of {group}: {prob_split_core_zone_august_sunday_awarded:.2%} ({total_split_core_zone_august_sunday_group_awarded}/{total_split_core_zone_august_sunday_awarded})"
    )

    prob_split_core_zone_august_sunday_group_size.append(
        [
            group,
            prob_split_core_zone_august_sunday_awarded,
            total_split_core_zone_august_sunday_group_awarded,
            total_split_core_zone_august_sunday_awarded,
        ]
    )

# Create dataframe from list
df_prob_split_core_zone_august_sunday_group_size = zone_probabilities_to_crosstab(
    prob_split_core_zone_august_sunday_group_size,
    [
        "Group Size",
        "Probability",
        "Awarded Group Size Sunday August Core Zone",
        "Total Awarded Sunday August Core Zone",
    ],
)

# Show crosstab of the new dataframe
df_prob_split_core_zone_august_sunday_group_size

Probability of being awarded a permit for the Core Enchantments in August on a Sunday with a group size of 2: 21.74% (5/23)
Probability of being awarded a permit for the Core Enchantments in August on a Sunday with a group size of 4: 4.35% (1/23)
Probability of being awarded a permit for the Core Enchantments in August on a Sunday with a group size of 6: 21.74% (5/23)
Probability of being awarded a permit for the Core Enchantments in August on a Sunday with a group size of 8: 30.43% (7/23)
Probability of being awarded a permit for the Core Enchantments in August on a Sunday with a group size of 1: 4.35% (1/23)
Probability of being awarded a permit for the Core Enchantments in August on a Sunday with a group size of 5: 13.04% (3/23)
Probability of being awarded a permit for the Core Enchantments in August on a Sunday with a group size of 3: 4.35% (1/23)
Probability of being awarded a permit for the Core Enchantments in August on a Sunday with a group size of 7: 0.00% (0/23)


Unnamed: 0,Group Size,Probability,Awarded Group Size Sunday August Core Zone,Total Awarded Sunday August Core Zone,Probability (%)
0,8,0.304348,7,23,30.43%
1,2,0.217391,5,23,21.74%
2,6,0.217391,5,23,21.74%
3,5,0.130435,3,23,13.04%
4,4,0.043478,1,23,4.35%
5,1,0.043478,1,23,4.35%
6,3,0.043478,1,23,4.35%
7,7,0.0,0,23,0.00%


In [46]:
df_prob_split_core_zone_august_sunday_group_size[
    [
        "Group Size",
        "Awarded Group Size Sunday August Core Zone",
        "Total Awarded Sunday August Core Zone",
    ]
]

Unnamed: 0,Group Size,Awarded Group Size Sunday August Core Zone,Total Awarded Sunday August Core Zone
0,8,7,23
1,2,5,23
2,6,5,23
3,5,3,23
4,4,1,23
5,1,1,23
6,3,1,23
7,7,0,23


Who would have guessed that a group size of 8 would be awarded the most on Sundays in August out of every group size option! Keep in mind that this is based on only 23 awards.
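With only 23 Sunday awards, a single permit moves a group size's share by more than 4 percentage points. One standard way to gauge that uncertainty is a Wilson score interval for each proportion. A minimal sketch (`wilson_interval` is a helper written here, not part of the notebook; z=1.96 gives roughly 95% intervals):

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% for z=1.96)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the edges
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Sunday-in-August Core Zone awards by group size, from the table above
sunday_august_awards = {8: 7, 2: 5, 6: 5, 5: 3, 4: 1, 1: 1, 3: 1, 7: 0}
total = sum(sunday_august_awards.values())

for size, count in sunday_august_awards.items():
    low, high = wilson_interval(count, total)
    print(f"group size {size}: {count}/{total}, roughly {low:.0%} to {high:.0%}")
```

Heavily overlapping intervals suggest the ranking between sizes could easily change with a different year's draw.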


In [47]:
# Print the total number of people awarded Core Zone entry on Saturdays in August (sum of minimum acceptable group sizes)
august_saturday_total = split_core_zone_august[
    split_core_zone_august_saturday_filter & split_core_zone_august_awarded_filter
]["minimum_acceptable_group_size"].sum()
print(
    f"Total number of people awarded entrance in the Core Zone on Saturdays in August: {august_saturday_total}"
)

# Print the total number of people awarded Core Zone entry on Sundays in August (sum of minimum acceptable group sizes)
august_sunday_total = split_core_zone_august[
    split_core_zone_august_sunday_filter & split_core_zone_august_awarded_filter
]["minimum_acceptable_group_size"].sum()
print(
    f"Total number of people awarded entrance in the Core Zone on Sundays in August: {august_sunday_total}"
)

# Print the total number of people awarded Core Zone entry on Mondays in August (sum of minimum acceptable group sizes)
august_monday_total = split_core_zone_august[
    split_core_zone_august_monday_filter & split_core_zone_august_awarded_filter
]["minimum_acceptable_group_size"].sum()

print(
    f"Total number of people awarded entrance in the Core Zone on Mondays in August: {august_monday_total}"
)

Total number of people awarded entrance in the Core Zone on Saturdays in August: 63
Total number of people awarded entrance in the Core Zone on Sundays in August: 119
Total number of people awarded entrance in the Core Zone on Mondays in August: 80


In [48]:
split_core_zone_august[
    split_core_zone_august_saturday_filter & split_core_zone_august_awarded_filter
]

Unnamed: 0,preferred_division,preferred_entry_date,minimum_acceptable_group_size,results_status,awarded_preference,awarded_entry_date,awarded_entrance_code_name,awarded_group_size,awarded,preferred_option,preferred_entry_date_month,preferred_entry_date_day
2840,Core Enchantment Zone,2021-08-21,1,Accepted,1,2021-08-21,Core Enchantment Zone,1,True,1,August,Saturday
4279,Core Enchantment Zone,2021-08-28,2,Accepted,1,2021-08-28,Core Enchantment Zone,2,True,1,August,Saturday
8544,Core Enchantment Zone,2021-08-14,5,Accepted,1,2021-08-14,Core Enchantment Zone,5,True,1,August,Saturday
9631,Core Enchantment Zone,2021-08-07,8,Accepted,1,2021-08-07,Core Enchantment Zone,8,True,1,August,Saturday
14197,Core Enchantment Zone,2021-08-21,8,Accepted,1,2021-08-21,Core Enchantment Zone,8,True,1,August,Saturday
14958,Core Enchantment Zone,2021-08-14,2,Accepted,1,2021-08-14,Core Enchantment Zone,2,True,1,August,Saturday
16177,Core Enchantment Zone,2021-08-07,2,Accepted,1,2021-08-07,Core Enchantment Zone,2,True,1,August,Saturday
16317,Core Enchantment Zone,2021-08-07,2,No Response,1,2021-08-07,Core Enchantment Zone,2,True,1,August,Saturday
17066,Core Enchantment Zone,2021-08-21,5,Accepted,1,2021-08-21,Core Enchantment Zone,5,True,1,August,Saturday
21255,Core Enchantment Zone,2021-08-07,4,Accepted,1,2021-08-07,Core Enchantment Zone,4,True,1,August,Saturday


In [49]:
split_core_zone_august_awarded = split_core_zone_august[
    (split_core_zone_august["awarded"] == True)
]

prob_split_core_zone_august_group_size = []

# Loop over the group sizes in August and see who did the best
for group in group_sizes:
    split_core_zone_august_group_size_filter = (
        split_core_zone_august_awarded["minimum_acceptable_group_size"] == group
    )

    total_split_core_zone_august_awarded = len(split_core_zone_august_awarded)

    split_core_zone_august_group_awarded = split_core_zone_august_awarded[
        split_core_zone_august_group_size_filter
    ]

    total_split_core_zone_august_group_awarded = len(
        split_core_zone_august_group_awarded
    )

    prob_split_core_zone_august_awarded = (
        total_split_core_zone_august_group_awarded
        / total_split_core_zone_august_awarded
    )

    print(
        f"Probability of being awarded a permit for the Core Enchantments in August with a group size of {group}: {prob_split_core_zone_august_awarded:.2%} ({total_split_core_zone_august_group_awarded}/{total_split_core_zone_august_awarded})"
    )

    prob_split_core_zone_august_group_size.append(
        [
            group,
            prob_split_core_zone_august_awarded,
            total_split_core_zone_august_group_awarded,
            total_split_core_zone_august_awarded,
        ]
    )

# Create dataframe from list
df_prob_split_core_zone_august_group_size = zone_probabilities_to_crosstab(
    prob_split_core_zone_august_group_size,
    ["Group Size", "Probability", "Total Awarded Group Size", "Total Awarded August"],
)

# Show crosstab of the new dataframe
df_prob_split_core_zone_august_group_size

Probability of being awarded a permit for the Core Enchantments in August with a group size of 2: 21.60% (27/125)
Probability of being awarded a permit for the Core Enchantments in August with a group size of 4: 24.80% (31/125)
Probability of being awarded a permit for the Core Enchantments in August with a group size of 6: 12.00% (15/125)
Probability of being awarded a permit for the Core Enchantments in August with a group size of 8: 16.00% (20/125)
Probability of being awarded a permit for the Core Enchantments in August with a group size of 1: 8.80% (11/125)
Probability of being awarded a permit for the Core Enchantments in August with a group size of 5: 11.20% (14/125)
Probability of being awarded a permit for the Core Enchantments in August with a group size of 3: 4.80% (6/125)
Probability of being awarded a permit for the Core Enchantments in August with a group size of 7: 0.80% (1/125)


Unnamed: 0,Group Size,Probability,Total Awarded Group Size,Total Awarded August,Probability (%)
0,4,0.248,31,125,24.80%
1,2,0.216,27,125,21.60%
2,8,0.16,20,125,16.00%
3,6,0.12,15,125,12.00%
4,5,0.112,14,125,11.20%
5,1,0.088,11,125,8.80%
6,3,0.048,6,125,4.80%
7,7,0.008,1,125,0.80%


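Given that an entry was awarded a Core Enchantment Zone permit for August, there was a 24.80% chance its group size was 4. Remarkably, groups of 8 were awarded more often than groups of 1, 3, 5, 6, or 7. On its face, this runs contrary to the claim that a "smaller" group size is better for winning a permit in the Core Enchantments, although these figures are shares of the awards, not win rates, so they also reflect how many people applied with each group size.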
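The percentages above are P(group size | awarded), not the chance of winning with a given group size, P(awarded | group size). Bayes' rule links the two, and they can diverge sharply when some group sizes are far more popular than others. A minimal sketch of both calculations with pandas, using a tiny made-up dataframe that borrows this notebook's `minimum_acceptable_group_size` and `awarded` column names (the rows are hypothetical, not the 2021 data):

```python
import pandas as pd

# Hypothetical toy data: group size 4 is popular, group size 1 is rare
entries = pd.DataFrame(
    {
        "minimum_acceptable_group_size": [4, 4, 4, 4, 4, 4, 1, 1],
        "awarded": [True, True, False, False, False, False, True, False],
    }
)

# Share of the awarded entries by group size: P(group size | awarded)
share_of_awards = entries.loc[
    entries["awarded"], "minimum_acceptable_group_size"
].value_counts(normalize=True)

# Award rate within each group size: P(awarded | group size)
award_rate = entries.groupby("minimum_acceptable_group_size")["awarded"].mean()

print(share_of_awards)
print(award_rate)
```

In this toy example, group size 4 makes up two-thirds of the awards yet wins only a third of the time, while group size 1 wins half the time.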

Let's look at all the months and not just August.


In [50]:
split_core_zone_awarded = df_split_core_zone[(df_split_core_zone["awarded"] == True)]

prob_split_core_zone_group_size = []

# Loop over each group size and compute its share of all Core Zone awards
for group in group_sizes:
    split_core_zone_group_size_filter = (
        split_core_zone_awarded["minimum_acceptable_group_size"] == group
    )

    total_split_core_zone_awarded = len(split_core_zone_awarded)

    split_core_zone_group_awarded = split_core_zone_awarded[
        split_core_zone_group_size_filter
    ]

    total_split_core_zone_group_awarded = len(split_core_zone_group_awarded)

    prob_split_core_zone_awarded = (
        total_split_core_zone_group_awarded / total_split_core_zone_awarded
    )

    print(
        f"Probability of being awarded a permit for the Core Enchantments with a group size of {group}: {prob_split_core_zone_awarded:.2%} ({total_split_core_zone_group_awarded}/{total_split_core_zone_awarded})"
    )

    prob_split_core_zone_group_size.append(
        [
            group,
            prob_split_core_zone_awarded,
            total_split_core_zone_group_awarded,
            total_split_core_zone_awarded,
        ]
    )

# Create dataframe from list
df_prob_split_core_zone_group_size = zone_probabilities_to_crosstab(
    prob_split_core_zone_group_size,
    [
        "Group Size",
        "Probability",
        "Total Awarded Group Size Core Zone",
        "Total Awarded Core Zone",
    ],
)

# Show crosstab of the new dataframe
df_prob_split_core_zone_group_size

Probability of being awarded a permit for the Core Enchantments with a group size of 2: 24.37% (164/673)
Probability of being awarded a permit for the Core Enchantments with a group size of 4: 28.38% (191/673)
Probability of being awarded a permit for the Core Enchantments with a group size of 6: 15.01% (101/673)
Probability of being awarded a permit for the Core Enchantments with a group size of 8: 14.56% (98/673)
Probability of being awarded a permit for the Core Enchantments with a group size of 1: 6.24% (42/673)
Probability of being awarded a permit for the Core Enchantments with a group size of 5: 5.20% (35/673)
Probability of being awarded a permit for the Core Enchantments with a group size of 3: 5.35% (36/673)
Probability of being awarded a permit for the Core Enchantments with a group size of 7: 0.89% (6/673)


Unnamed: 0,Group Size,Probability,Total Awarded Group Size Core Zone,Total Awarded Core Zone,Probability (%)
0,4,0.283804,191,673,28.38%
1,2,0.243685,164,673,24.37%
2,6,0.150074,101,673,15.01%
3,8,0.145617,98,673,14.56%
4,1,0.062407,42,673,6.24%
5,3,0.053492,36,673,5.35%
6,5,0.052006,35,673,5.20%
7,7,0.008915,6,673,0.89%


Smaller group sizes don't appear to improve your chances of being awarded a Core Zone permit. Even group sizes make up more of the awards, but that is probably because far more people apply with even group sizes in the Core Zone.

**A better comparison is the ratio of awarded entries to skipped entries for each group size, which accounts for how many entries with each group size were passed over, not just how many won.**
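Cells 51, 53, 55, and 57 all repeat the same merge-and-divide pattern. A minimal sketch of one way to factor it out, using a hypothetical helper `awarded_to_skipped_ratio` (not defined anywhere in this notebook) that takes two count Series indexed by the same category; the example counts are the Core Zone totals for group sizes 1, 3, and 7 from the next cell's output:

```python
import pandas as pd


def awarded_to_skipped_ratio(awarded: pd.Series, skipped: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: align awarded and skipped counts by category
    and return their ratio, sorted from best to worst."""
    combined = pd.concat({"awarded": awarded, "skipped": skipped}, axis=1)
    combined["ratio"] = combined["awarded"] / combined["skipped"]
    return combined.sort_values("ratio", ascending=False)


# Core Zone counts by group size (index = group size)
awarded = pd.Series({1: 42, 3: 36, 7: 6})
skipped = pd.Series({1: 32, 3: 50, 7: 19})

ratios = awarded_to_skipped_ratio(awarded, skipped)
print(ratios)
```

Aligning the two Series on their index with `pd.concat` replaces the explicit `pd.merge` on a shared column, and a category missing from either side shows up as `NaN` instead of being silently dropped by an inner join. Percentage formatting can then stay a separate display step, so the ratio column remains numeric and sortable.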


In [51]:
# Compare the skipped group size entries to the awarded group size entries in the Core Enchantment Zone
# Get the total and group size columns from the group size skipped dataframe
df_total_skipped_by_group_size = df_prob_group_size_skipped_core_zone[
    ["Total Skipped Group Size Core Zone", "Group Size"]
]
df_total_awarded_by_group_size = df_prob_split_core_zone_group_size[
    ["Total Awarded Group Size Core Zone", "Group Size"]
]

# Merge the total and group size columns with the awarded group size dataframe
df_prob_group_size_skipped_core_zone_total = pd.merge(
    df_total_skipped_by_group_size, df_total_awarded_by_group_size, on="Group Size"
)

# Calculate the percentage of awarded group size entries compared to the skipped group size entries
df_prob_group_size_skipped_core_zone_total["Percentage"] = (
    df_prob_group_size_skipped_core_zone_total["Total Awarded Group Size Core Zone"]
    / df_prob_group_size_skipped_core_zone_total["Total Skipped Group Size Core Zone"]
)

# Sort the dataframe by the percentage column
df_prob_group_size_skipped_core_zone_total = (
    df_prob_group_size_skipped_core_zone_total.sort_values(
        by="Percentage", ascending=False
    )
)

# Show the percentage column as a percentage
df_prob_group_size_skipped_core_zone_total["Percentage"] = (
    df_prob_group_size_skipped_core_zone_total["Percentage"].map("{:.2%}".format)
)

# Reorder columns
df_prob_group_size_skipped_core_zone_total = df_prob_group_size_skipped_core_zone_total[
    [
        "Group Size",
        "Total Awarded Group Size Core Zone",
        "Total Skipped Group Size Core Zone",
        "Percentage",
    ]
]

# Show the dataframe
df_prob_group_size_skipped_core_zone_total

Unnamed: 0,Group Size,Total Awarded Group Size Core Zone,Total Skipped Group Size Core Zone,Percentage
6,1,42,32,131.25%
5,3,36,50,72.00%
1,2,164,234,70.09%
3,6,101,167,60.48%
0,4,191,347,55.04%
2,8,98,215,45.58%
4,5,35,82,42.68%
7,7,6,19,31.58%


This is a much better comparison, in my opinion. Here, we're taking into account the total number of awarded permits AND the total number of times an entry with that group size was skipped.

Additionally, this ordering is closer to what we would expect from a lottery zone that takes group size into account. The larger odd-numbered groups (5 and 7) fare the worst. Again, this is most likely because of the large number of people who apply with even-numbered groups.

It's notable that single-person entries were awarded more often than they were skipped (42 vs. 32).
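If every entry counted here ended up either awarded or skipped (an assumption; the notebook's definition of "skipped" may not cover every unsuccessful entry), an awarded-to-skipped ratio r converts to an award rate of r / (1 + r), which is the same as awarded / (awarded + skipped). A minimal sketch using the group-size counts from the table above:

```python
def ratio_to_rate(awarded: int, skipped: int) -> float:
    """Awarded share of awarded + skipped entries, computed as r / (1 + r).

    Assumes every entry counted is either awarded or skipped.
    """
    ratio = awarded / skipped
    return ratio / (1 + ratio)


# Single-person groups in the Core Zone: 42 awarded, 32 skipped
single_rate = ratio_to_rate(42, 32)
# Groups of 7 in the Core Zone: 6 awarded, 19 skipped
seven_rate = ratio_to_rate(6, 19)

print(f"Group size 1: {single_rate:.2%}")
print(f"Group size 7: {seven_rate:.2%}")
```

Under that assumption, the 131.25% ratio for single-person groups corresponds to an award rate of roughly 57%, versus about 24% for groups of 7. Any ratio above 100% means a better-than-even rate.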


In [52]:
split_core_zone_awarded = df_split_core_zone[(df_split_core_zone["awarded"] == True)]

prob_split_core_zone_day = []

# Loop over each day of the week and compute its share of all Core Zone awards
for day in days_of_week:
    split_core_zone_day_filter = (
        split_core_zone_awarded["preferred_entry_date_day"] == day
    )

    total_split_core_zone_awarded = len(split_core_zone_awarded)

    split_core_zone_day_awarded = split_core_zone_awarded[split_core_zone_day_filter]

    total_split_core_zone_day_awarded = len(split_core_zone_day_awarded)

    prob_split_core_zone_awarded = (
        total_split_core_zone_day_awarded / total_split_core_zone_awarded
    )

    print(
        f"Probability of being awarded a permit for the Core Enchantments on {day}: {prob_split_core_zone_awarded:.2%} ({total_split_core_zone_day_awarded}/{total_split_core_zone_awarded})"
    )

    prob_split_core_zone_day.append(
        [
            day,
            prob_split_core_zone_awarded,
            total_split_core_zone_day_awarded,
            total_split_core_zone_awarded,
        ]
    )

# Create dataframe from list
df_prob_split_core_zone_day = zone_probabilities_to_crosstab(
    prob_split_core_zone_day,
    [
        "Day",
        "Probability",
        "Total Awarded Day of Week Core Zone",
        "Total Awarded Core Zone",
    ],
)

# Show crosstab of the new dataframe
df_prob_split_core_zone_day

Probability of being awarded a permit for the Core Enchantments on Wednesday: 13.22% (89/673)
Probability of being awarded a permit for the Core Enchantments on Monday: 12.63% (85/673)
Probability of being awarded a permit for the Core Enchantments on Sunday: 18.87% (127/673)
Probability of being awarded a permit for the Core Enchantments on Saturday: 14.41% (97/673)
Probability of being awarded a permit for the Core Enchantments on Friday: 12.48% (84/673)
Probability of being awarded a permit for the Core Enchantments on Thursday: 13.82% (93/673)
Probability of being awarded a permit for the Core Enchantments on Tuesday: 14.56% (98/673)


Unnamed: 0,Day,Probability,Total Awarded Day of Week Core Zone,Total Awarded Core Zone,Probability (%)
0,Sunday,0.188707,127,673,18.87%
1,Tuesday,0.145617,98,673,14.56%
2,Saturday,0.144131,97,673,14.41%
3,Thursday,0.138187,93,673,13.82%
4,Wednesday,0.132244,89,673,13.22%
5,Monday,0.1263,85,673,12.63%
6,Friday,0.124814,84,673,12.48%


In [53]:
# Repeat the awarded-to-skipped comparison, this time by day of the week
# Get the total and day columns from the day skipped dataframe
df_total_skipped_by_day = df_prob_skipped_core_zone_day[
    ["Total Skipped by Day Core Zone", "Day"]
]
df_total_awarded_by_day = df_prob_split_core_zone_day[
    ["Total Awarded Day of Week Core Zone", "Day"]
]

# Merge the skipped and awarded totals on the day column
df_prob_day_skipped_core_zone_total = pd.merge(
    df_total_skipped_by_day, df_total_awarded_by_day, on="Day"
)

# Calculate the ratio of awarded entries to skipped entries for each day
df_prob_day_skipped_core_zone_total["Percentage"] = (
    df_prob_day_skipped_core_zone_total["Total Awarded Day of Week Core Zone"]
    / df_prob_day_skipped_core_zone_total["Total Skipped by Day Core Zone"]
)

# Sort the dataframe by the percentage column
df_prob_day_skipped_core_zone_total = df_prob_day_skipped_core_zone_total.sort_values(
    by="Percentage", ascending=False
)

# Show the percentage column as a percentage
df_prob_day_skipped_core_zone_total["Percentage"] = df_prob_day_skipped_core_zone_total[
    "Percentage"
].map("{:.2%}".format)

# Reorder columns
df_prob_day_skipped_core_zone_total = df_prob_day_skipped_core_zone_total[
    [
        "Day",
        "Total Awarded Day of Week Core Zone",
        "Total Skipped by Day Core Zone",
        "Percentage",
    ]
]

# Show the dataframe
df_prob_day_skipped_core_zone_total

Unnamed: 0,Day,Total Awarded Day of Week Core Zone,Total Skipped by Day Core Zone,Percentage
6,Sunday,127,144,88.19%
5,Saturday,97,148,65.54%
3,Tuesday,98,155,63.23%
4,Wednesday,89,154,57.79%
2,Thursday,93,172,54.07%
1,Monday,85,184,46.20%
0,Friday,84,189,44.44%


This is a very surprising lineup of days for the Core Zone.

By this measure, Sunday is the best day for being awarded rather than skipped. However, Saturday comes in second! **Furthermore, Core Zone entries for Monday had a worse awarded-to-skipped ratio (46.20%) than Thursday (54.07%)!**
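Before reading too much into the day-of-week ordering, a rough check is whether the awarded share differs across days by more than chance would suggest. A minimal sketch of a Pearson chi-square test of independence, written in plain Python with the counts copied from the table above. It assumes each entry is an independent trial, which is not strictly true since the three preferences on one application are related, so treat the result as a rough signal only:

```python
# Core Zone awarded and skipped counts by day, copied from the table above
awarded = {
    "Sunday": 127, "Saturday": 97, "Tuesday": 98, "Wednesday": 89,
    "Thursday": 93, "Monday": 85, "Friday": 84,
}
skipped = {
    "Sunday": 144, "Saturday": 148, "Tuesday": 155, "Wednesday": 154,
    "Thursday": 172, "Monday": 184, "Friday": 189,
}


def chi_square_statistic(awarded: dict, skipped: dict) -> float:
    """Pearson chi-square statistic for a (days x {awarded, skipped}) table."""
    total_awarded = sum(awarded.values())
    total = total_awarded + sum(skipped.values())
    stat = 0.0
    for day in awarded:
        row_total = awarded[day] + skipped[day]
        # Expected counts if the awarded share were the same on every day
        expected_awarded = row_total * total_awarded / total
        expected_skipped = row_total - expected_awarded
        stat += (awarded[day] - expected_awarded) ** 2 / expected_awarded
        stat += (skipped[day] - expected_skipped) ** 2 / expected_skipped
    return stat


stat = chi_square_statistic(awarded, skipped)
# 7 days x 2 outcomes -> (7 - 1) * (2 - 1) = 6 degrees of freedom;
# 12.592 is the 5% critical value of the chi-square distribution with 6 df
print(f"chi-square = {stat:.1f}, exceeds 12.592: {stat > 12.592}")
```

With these counts the statistic comes out around 20.7, above the 5% critical value, which suggests the day-to-day spread is not just noise under the independence assumption. If SciPy is available, `scipy.stats.chi2_contingency` computes the same statistic along with a p-value.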


In [54]:
# One more look: each month's share of the Core Zone awards (compared to skipped in the next cell)
split_core_zone_awarded = df_split_core_zone[(df_split_core_zone["awarded"] == True)]

prob_split_core_zone_month = []

# Loop over each month and compute its share of all Core Zone awards
for month in months:
    split_core_zone_month_filter = (
        split_core_zone_awarded["preferred_entry_date_month"] == month
    )

    total_split_core_zone_awarded = len(split_core_zone_awarded)

    split_core_zone_month_awarded = split_core_zone_awarded[
        split_core_zone_month_filter
    ]

    total_split_core_zone_month_awarded = len(split_core_zone_month_awarded)

    prob_split_core_zone_awarded = (
        total_split_core_zone_month_awarded / total_split_core_zone_awarded
    )

    print(
        f"Probability of being awarded a permit for the Core Enchantments on {month}: {prob_split_core_zone_awarded:.2%} ({total_split_core_zone_month_awarded}/{total_split_core_zone_awarded})"
    )

    prob_split_core_zone_month.append(
        [
            month,
            prob_split_core_zone_awarded,
            total_split_core_zone_month_awarded,
            total_split_core_zone_awarded,
        ]
    )

# Create dataframe from list
df_prob_split_core_zone_month = zone_probabilities_to_crosstab(
    prob_split_core_zone_month,
    [
        "Month",
        "Probability",
        "Total Awarded Month Core Zone",
        "Total Awarded Core Zone",
    ],
)

# Show crosstab of the new dataframe
df_prob_split_core_zone_month

Probability of being awarded a permit for the Core Enchantments on August: 18.57% (125/673)
Probability of being awarded a permit for the Core Enchantments on September: 16.94% (114/673)
Probability of being awarded a permit for the Core Enchantments on July: 18.13% (122/673)
Probability of being awarded a permit for the Core Enchantments on May: 11.29% (76/673)
Probability of being awarded a permit for the Core Enchantments on October: 16.94% (114/673)
Probability of being awarded a permit for the Core Enchantments on June: 18.13% (122/673)


Unnamed: 0,Month,Probability,Total Awarded Month Core Zone,Total Awarded Core Zone,Probability (%)
0,August,0.185736,125,673,18.57%
1,July,0.181278,122,673,18.13%
2,June,0.181278,122,673,18.13%
3,September,0.169391,114,673,16.94%
4,October,0.169391,114,673,16.94%
5,May,0.112927,76,673,11.29%


In [55]:
# Repeat the awarded-to-skipped comparison, this time by month
# Get the total and months columns from the month skipped dataframe
df_total_skipped_by_month = df_prob_skipped_core_zone_month[
    ["Total Month Skipped Core Zone", "Month"]
]
df_total_awarded_by_month = df_prob_split_core_zone_month[
    ["Total Awarded Month Core Zone", "Month"]
]

# Merge the skipped and awarded totals on the month column
df_prob_month_skipped_core_zone_total = pd.merge(
    df_total_skipped_by_month, df_total_awarded_by_month, on="Month"
)

# Calculate the ratio of awarded entries to skipped entries for each month
df_prob_month_skipped_core_zone_total["Percentage"] = (
    df_prob_month_skipped_core_zone_total["Total Awarded Month Core Zone"]
    / df_prob_month_skipped_core_zone_total["Total Month Skipped Core Zone"]
)

# Sort the dataframe by the percentage column
df_prob_month_skipped_core_zone_total = (
    df_prob_month_skipped_core_zone_total.sort_values(by="Percentage", ascending=False)
)

# Show the percentage column as a percentage
df_prob_month_skipped_core_zone_total["Percentage"] = (
    df_prob_month_skipped_core_zone_total["Percentage"].map("{:.2%}".format)
)

# Reorder columns
df_prob_month_skipped_core_zone_total = df_prob_month_skipped_core_zone_total[
    [
        "Month",
        "Total Awarded Month Core Zone",
        "Total Month Skipped Core Zone",
        "Percentage",
    ]
]

# Show the dataframe
df_prob_month_skipped_core_zone_total

Unnamed: 0,Month,Total Awarded Month Core Zone,Total Month Skipped Core Zone,Percentage
5,May,76,58,131.03%
4,October,114,114,100.00%
3,June,122,141,86.52%
2,July,122,257,47.47%
0,August,125,291,42.96%
1,September,114,285,40.00%


Not very surprising. If you want to win, you should apply for May or October. Core Zone entries for May were awarded more times than they were skipped, and entries for October were awarded exactly as often as they were skipped (114 each).

The gap between the three less busy (or snowier) months and the summer months is striking: May, October, and June all sit above 85%, while July, August, and September are bunched between 40% and 48%.
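To put a number on that gap, a minimal sketch that pools the months into two seasons. The grouping (May, June, and October as the less busy months; July, August, and September as the peak) is an assumption based on the paragraph above, and the counts are copied from the table above:

```python
import pandas as pd

# Core Zone awarded and skipped counts by month, copied from the table above
months = pd.DataFrame(
    {
        "awarded": [76, 122, 122, 125, 114, 114],
        "skipped": [58, 141, 257, 291, 285, 114],
    },
    index=["May", "June", "July", "August", "September", "October"],
)

# Assumed grouping: the three less busy months vs. the summer peak
season = {
    "May": "shoulder",
    "June": "shoulder",
    "October": "shoulder",
    "July": "peak",
    "August": "peak",
    "September": "peak",
}

# groupby with a dict maps each index label (month) to its season
pooled = months.groupby(season).sum()
pooled["ratio"] = pooled["awarded"] / pooled["skipped"]
print(pooled)
```

Pooled, the less busy months were awarded about as often as they were skipped (312 vs. 313), while the peak months were awarded less than half as often (361 vs. 833).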


In [56]:
# Create an awarded to skipped comparison for the zones

prob_zone_skipped = []

for zone in zones_values:
    zone_filter = df_split_skipped["preferred_division"] == zone

    zone_data = df_split_skipped[zone_filter]

    total_zone = len(zone_data)

    total_skipped = len(df_split_skipped)

    prob_zone_df_split_skipped = total_zone / total_skipped

    prob_zone_skipped.append(
        [
            zone,
            prob_zone_df_split_skipped,
            total_skipped,
            total_zone,
        ]
    )

    print(
        f"Probability of being awarded but skipped given {zone}: {prob_zone_df_split_skipped:.2%} ({total_zone}/{total_skipped})"
    )

# Create dataframe from list
df_prob_zone_skipped = zone_probabilities_to_crosstab(
    prob_zone_skipped,
    ["Zone", "Probability", "Total Skipped", "Total Zone Skipped"],
)

# Show crosstab of the new dataframe
df_prob_zone_skipped

Probability of being awarded but skipped given Core Enchantment Zone: 61.58% (1146/1861)
Probability of being awarded but skipped given Colchuck Zone: 20.10% (374/1861)
Probability of being awarded but skipped given Snow Zone: 8.76% (163/1861)
Probability of being awarded but skipped given Stuart  Zone: 6.72% (125/1861)
Probability of being awarded but skipped given Eightmile/Caroline Zone: 2.20% (41/1861)
Probability of being awarded but skipped given Eightmile/Caroline Zone (stock): 0.38% (7/1861)
Probability of being awarded but skipped given Stuart Zone (stock): 0.27% (5/1861)


Unnamed: 0,Zone,Probability,Total Skipped,Total Zone Skipped,Probability (%)
0,Core Enchantment Zone,0.615798,1861,1146,61.58%
1,Colchuck Zone,0.200967,1861,374,20.10%
2,Snow Zone,0.087587,1861,163,8.76%
3,Stuart Zone,0.067168,1861,125,6.72%
4,Eightmile/Caroline Zone,0.022031,1861,41,2.20%
5,Eightmile/Caroline Zone (stock),0.003761,1861,7,0.38%
6,Stuart Zone (stock),0.002687,1861,5,0.27%


In [57]:
# Create an awarded to skipped ratio for zones
# Get the total and zone columns from the zone skipped dataframe
df_total_skipped_by_zone = df_prob_zone_skipped[["Total Zone Skipped", "Zone"]]
df_total_awarded_by_zone = df_prob_awarded_zone_split[["Total Awarded", "Zone"]]

# Merge the skipped and awarded totals on the zone column
df_prob_zone_skipped_total = pd.merge(
    df_total_skipped_by_zone, df_total_awarded_by_zone, on="Zone"
)

# Calculate the ratio of awarded entries to skipped entries for each zone
df_prob_zone_skipped_total["Percentage"] = (
    df_prob_zone_skipped_total["Total Awarded"]
    / df_prob_zone_skipped_total["Total Zone Skipped"]
)

# Sort the dataframe by the percentage column
df_prob_zone_skipped_total = df_prob_zone_skipped_total.sort_values(
    by="Percentage", ascending=False
)

# Show the percentage column as a percentage
df_prob_zone_skipped_total["Percentage"] = df_prob_zone_skipped_total["Percentage"].map(
    "{:.2%}".format
)

# Reorder columns
df_prob_zone_skipped_total = df_prob_zone_skipped_total[
    [
        "Zone",
        "Total Awarded",
        "Total Zone Skipped",
        "Percentage",
    ]
]

# Show the dataframe
df_prob_zone_skipped_total

This was the first awarded-to-skipped ratio where the majority of the values were over 1, or 100%. It appears that only the Colchuck Zone and the Core Enchantment Zone are skipped more than they are awarded.
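The saved output for the cell above doesn't include its table, so here is a minimal sketch of the same ratio computed from counts that appear elsewhere in this notebook. It assumes the awarded totals in `df_prob_awarded_zone_split` match the "All" column of the zone-by-month crosstab further down (the Core Zone's 673 does match), and it takes the skipped totals from cell 56's output:

```python
import pandas as pd

# Awarded totals: "All" column of the zone-by-month crosstab (assumed to
# match df_prob_awarded_zone_split); skipped totals: output of cell 56
zone_counts = pd.DataFrame(
    {
        "awarded": [673, 348, 642, 454, 294, 20, 14],
        "skipped": [1146, 374, 163, 125, 41, 7, 5],
    },
    index=[
        "Core Enchantment Zone",
        "Colchuck Zone",
        "Snow Zone",
        "Stuart Zone",
        "Eightmile/Caroline Zone",
        "Eightmile/Caroline Zone (stock)",
        "Stuart Zone (stock)",
    ],
)

zone_counts["ratio"] = zone_counts["awarded"] / zone_counts["skipped"]
zone_counts = zone_counts.sort_values("ratio", ascending=False)
print(zone_counts["ratio"].map("{:.2%}".format))

# Zones skipped more often than awarded (ratio below 100%)
below_even = list(zone_counts.index[zone_counts["ratio"] < 1])
print(below_even)
```

Under that assumption, only the Colchuck Zone (about 93%) and the Core Enchantment Zone (about 59%) fall below 100%, which matches the observation above.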


In [None]:
# Find all the entries in July, on a Friday, with a group size of 3, then keep the awarded ones
july_friday_3 = df_split[
    (df_split["preferred_entry_date_month"] == "July")
    & (df_split["preferred_entry_date_day"] == "Friday")
    & (df_split["minimum_acceptable_group_size"] == 3)
]

july_friday_3_awarded_filter = july_friday_3["awarded"] == True

july_friday_3_awarded = july_friday_3[july_friday_3_awarded_filter]

july_friday_3_awarded

Unnamed: 0,preferred_division,preferred_entry_date,minimum_acceptable_group_size,results_status,awarded_preference,awarded_entry_date,awarded_entrance_code_name,awarded_group_size,awarded,preferred_option,preferred_entry_date_month,preferred_entry_date_day
13565,Snow Zone,2021-07-30,3,Accepted,1,2021-07-30,Snow Zone,3,True,1,July,Friday
23295,Eightmile/Caroline Zone,2021-07-09,3,Accepted,1,2021-07-09,Eightmile/Caroline Zone,3,True,1,July,Friday
24951,Snow Zone,2021-07-02,3,Accepted,1,2021-07-02,Snow Zone,3,True,1,July,Friday
23511,Snow Zone,2021-07-09,3,Accepted,2,2021-07-09,Snow Zone,3,True,2,July,Friday
16378,Snow Zone,2021-07-16,3,Accepted,3,2021-07-16,Snow Zone,3,True,3,July,Friday


In [None]:
# Find all the entries in August, on a Friday, with a group size of 1
august_friday_1 = df_split[
    (df_split["preferred_entry_date_month"] == "August")
    & (df_split["preferred_entry_date_day"] == "Friday")
    & (df_split["minimum_acceptable_group_size"] == 1)
]

august_friday_1_awarded_filter = august_friday_1["awarded"] == True

august_friday_1_awarded = august_friday_1[august_friday_1_awarded_filter]

august_friday_1_awarded

Unnamed: 0,preferred_division,preferred_entry_date,minimum_acceptable_group_size,results_status,awarded_preference,awarded_entry_date,awarded_entrance_code_name,awarded_group_size,awarded,preferred_option,preferred_entry_date_month,preferred_entry_date_day
11135,Core Enchantment Zone,2021-08-27,1,Accepted,2,2021-08-27,Core Enchantment Zone,1,True,2,August,Friday


In [None]:
# Find all the entries in August, on a Tuesday, with a group size of 2
august_tuesday_2 = df_split[
    (df_split["preferred_entry_date_month"] == "August")
    & (df_split["preferred_entry_date_day"] == "Tuesday")
    & (df_split["minimum_acceptable_group_size"] == 2)
]

august_tuesday_2_awarded_filter = august_tuesday_2["awarded"] == True

august_tuesday_2_awarded = august_tuesday_2[august_tuesday_2_awarded_filter]

august_tuesday_2

Unnamed: 0,preferred_division,preferred_entry_date,minimum_acceptable_group_size,results_status,awarded_preference,awarded_entry_date,awarded_entrance_code_name,awarded_group_size,awarded,preferred_option,preferred_entry_date_month,preferred_entry_date_day
8,Colchuck Zone,2021-08-31,2,Unsuccessful,0,1970-01-01,,0,False,1,August,Tuesday
47,Core Enchantment Zone,2021-08-03,2,Unsuccessful,0,1970-01-01,,0,False,1,August,Tuesday
76,Core Enchantment Zone,2021-08-17,2,Cancelled,0,1970-01-01,,0,False,1,August,Tuesday
135,Core Enchantment Zone,2021-08-24,2,Unsuccessful,0,1970-01-01,,0,False,1,August,Tuesday
181,Core Enchantment Zone,2021-08-24,2,Unsuccessful,0,1970-01-01,,0,False,1,August,Tuesday
...,...,...,...,...,...,...,...,...,...,...,...,...
36087,Colchuck Zone,2021-08-24,2,Unsuccessful,0,1970-01-01,,0,False,3,August,Tuesday
36206,Core Enchantment Zone,2021-08-24,2,Unsuccessful,0,1970-01-01,,0,False,3,August,Tuesday
36217,Core Enchantment Zone,2021-08-10,2,Unsuccessful,0,1970-01-01,,0,False,3,August,Tuesday
36322,Snow Zone,2021-08-03,2,Unsuccessful,0,1970-01-01,,0,False,3,August,Tuesday


In [None]:
len(august_tuesday_2_awarded)

19

In [None]:
# Let's look at September on a Monday with a group size of 5
september_monday_5 = df_split[
    (df_split["preferred_entry_date_month"] == "September")
    & (df_split["preferred_entry_date_day"] == "Monday")
    & (df_split["minimum_acceptable_group_size"] == 5)
]

september_monday_5_awarded_filter = september_monday_5["awarded"] == True

september_monday_5_awarded = september_monday_5[september_monday_5_awarded_filter]

len(september_monday_5_awarded)

8

In [None]:
# Let's look at September on a Monday with a group size of 2
september_monday_2 = df_split[
    (df_split["preferred_entry_date_month"] == "September")
    & (df_split["preferred_entry_date_day"] == "Monday")
    & (df_split["minimum_acceptable_group_size"] == 2)
]

september_monday_2_awarded_filter = september_monday_2["awarded"] == True

september_monday_2_awarded = september_monday_2[september_monday_2_awarded_filter]

september_monday_2

Unnamed: 0,preferred_division,preferred_entry_date,minimum_acceptable_group_size,results_status,awarded_preference,awarded_entry_date,awarded_entrance_code_name,awarded_group_size,awarded,preferred_option,preferred_entry_date_month,preferred_entry_date_day
167,Core Enchantment Zone,2021-09-06,2,Unsuccessful,0,1970-01-01,,0,False,1,September,Monday
177,Colchuck Zone,2021-09-06,2,Unsuccessful,0,1970-01-01,,0,False,1,September,Monday
550,Core Enchantment Zone,2021-09-20,2,Unsuccessful,0,1970-01-01,,0,False,1,September,Monday
936,Core Enchantment Zone,2021-09-06,2,Unsuccessful,0,1970-01-01,,0,False,1,September,Monday
993,Core Enchantment Zone,2021-09-20,2,Unsuccessful,0,1970-01-01,,0,False,1,September,Monday
...,...,...,...,...,...,...,...,...,...,...,...,...
35828,Core Enchantment Zone,2021-09-27,2,Unsuccessful,0,1970-01-01,,0,False,3,September,Monday
35902,Core Enchantment Zone,2021-09-13,2,Unsuccessful,0,1970-01-01,,0,False,3,September,Monday
36028,Stuart Zone,2021-09-06,2,Unsuccessful,0,1970-01-01,,0,False,3,September,Monday
36103,Core Enchantment Zone,2021-09-13,2,Unsuccessful,0,1970-01-01,,0,False,3,September,Monday


In [None]:
len(september_monday_2_awarded)

10

In [None]:
# Let's look at September on a Sunday with a group size of 2
september_sunday_2 = df_split[
    (df_split["preferred_entry_date_month"] == "September")
    & (df_split["preferred_entry_date_day"] == "Sunday")
    & (df_split["minimum_acceptable_group_size"] == 2)
]

september_sunday_2_awarded_filter = september_sunday_2["awarded"] == True

september_sunday_2_awarded = september_sunday_2[september_sunday_2_awarded_filter]

september_sunday_2

Unnamed: 0,preferred_division,preferred_entry_date,minimum_acceptable_group_size,results_status,awarded_preference,awarded_entry_date,awarded_entrance_code_name,awarded_group_size,awarded,preferred_option,preferred_entry_date_month,preferred_entry_date_day
75,Core Enchantment Zone,2021-09-05,2,Unsuccessful,0,1970-01-01,,0,False,1,September,Sunday
153,Colchuck Zone,2021-09-19,2,Unsuccessful,0,1970-01-01,,0,False,1,September,Sunday
174,Core Enchantment Zone,2021-09-05,2,Unsuccessful,0,1970-01-01,,0,False,1,September,Sunday
802,Core Enchantment Zone,2021-09-19,2,Unsuccessful,0,1970-01-01,,0,False,1,September,Sunday
1436,Core Enchantment Zone,2021-09-05,2,Unsuccessful,0,1970-01-01,,0,False,1,September,Sunday
...,...,...,...,...,...,...,...,...,...,...,...,...
35739,Core Enchantment Zone,2021-09-05,2,Unsuccessful,0,1970-01-01,,0,False,3,September,Sunday
35900,Core Enchantment Zone,2021-09-19,2,Unsuccessful,0,1970-01-01,,0,False,3,September,Sunday
36215,Core Enchantment Zone,2021-09-19,2,Unsuccessful,0,1970-01-01,,0,False,3,September,Sunday
36471,Core Enchantment Zone,2021-09-19,2,Unsuccessful,0,1970-01-01,,0,False,3,September,Sunday


In [None]:
len(september_sunday_2_awarded)

15

In [None]:
# Let's look at July on a Sunday with a group size of 2
july_sunday_2 = df_split[
    (df_split["preferred_entry_date_month"] == "July")
    & (df_split["preferred_entry_date_day"] == "Sunday")
    & (df_split["minimum_acceptable_group_size"] == 2)
]

july_sunday_2_awarded_filter = july_sunday_2["awarded"] == True

july_sunday_2_awarded = july_sunday_2[july_sunday_2_awarded_filter]

july_sunday_2

Unnamed: 0,preferred_division,preferred_entry_date,minimum_acceptable_group_size,results_status,awarded_preference,awarded_entry_date,awarded_entrance_code_name,awarded_group_size,awarded,preferred_option,preferred_entry_date_month,preferred_entry_date_day
1273,Core Enchantment Zone,2021-07-11,2,Unsuccessful,0,1970-01-01,,0,False,1,July,Sunday
1564,Core Enchantment Zone,2021-07-04,2,Unsuccessful,0,1970-01-01,,0,False,1,July,Sunday
1884,Core Enchantment Zone,2021-07-25,2,Unsuccessful,0,1970-01-01,,0,False,1,July,Sunday
2033,Core Enchantment Zone,2021-07-11,2,Unsuccessful,0,1970-01-01,,0,False,1,July,Sunday
2273,Snow Zone,2021-07-25,2,Unsuccessful,0,1970-01-01,,0,False,1,July,Sunday
...,...,...,...,...,...,...,...,...,...,...,...,...
33979,Core Enchantment Zone,2021-07-25,2,Unsuccessful,0,1970-01-01,,0,False,3,July,Sunday
34200,Core Enchantment Zone,2021-07-18,2,Unsuccessful,0,1970-01-01,,0,False,3,July,Sunday
34233,Colchuck Zone,2021-07-25,2,Unsuccessful,0,1970-01-01,,0,False,3,July,Sunday
35720,Eightmile/Caroline Zone,2021-07-25,2,Unsuccessful,0,1970-01-01,,0,False,3,July,Sunday


In [None]:
len(july_sunday_2_awarded)

17

In [None]:
# Let's look at August on a Sunday with a group size of 2
august_sunday_2 = df_split[
    (df_split["preferred_entry_date_month"] == "August")
    & (df_split["preferred_entry_date_day"] == "Sunday")
    & (df_split["minimum_acceptable_group_size"] == 2)
]

august_sunday_2_awarded_filter = august_sunday_2["awarded"] == True

august_sunday_2_awarded = august_sunday_2[august_sunday_2_awarded_filter]

august_sunday_2

Unnamed: 0,preferred_division,preferred_entry_date,minimum_acceptable_group_size,results_status,awarded_preference,awarded_entry_date,awarded_entrance_code_name,awarded_group_size,awarded,preferred_option,preferred_entry_date_month,preferred_entry_date_day
244,Stuart Zone,2021-08-29,2,Unsuccessful,0,1970-01-01,,0,False,1,August,Sunday
382,Core Enchantment Zone,2021-08-08,2,Accepted,1,2021-08-08,Core Enchantment Zone,2,True,1,August,Sunday
392,Colchuck Zone,2021-08-15,2,Unsuccessful,0,1970-01-01,,0,False,1,August,Sunday
445,Snow Zone,2021-08-29,2,Unsuccessful,0,1970-01-01,,0,False,1,August,Sunday
553,Core Enchantment Zone,2021-08-01,2,Accepted,1,2021-08-01,Core Enchantment Zone,2,True,1,August,Sunday
...,...,...,...,...,...,...,...,...,...,...,...,...
35523,Core Enchantment Zone,2021-08-01,2,Unsuccessful,0,1970-01-01,,0,False,3,August,Sunday
35881,Core Enchantment Zone,2021-08-01,2,Unsuccessful,0,1970-01-01,,0,False,3,August,Sunday
36033,Stuart Zone,2021-08-08,2,Unsuccessful,0,1970-01-01,,0,False,3,August,Sunday
36452,Core Enchantment Zone,2021-08-01,2,Unsuccessful,0,1970-01-01,,0,False,3,August,Sunday


In [None]:
len(august_sunday_2_awarded)

22

In [None]:
# Compare the award rate for a group size of 2 across several month and day combinations
print(
    f"Probability of being awarded a permit for August on a Tuesday with a group size of 2: {len(august_tuesday_2_awarded)/len(august_tuesday_2):.2%} ({len(august_tuesday_2_awarded)}/{len(august_tuesday_2)})"
)

print(
    f"Probability of being awarded a permit for August on a Sunday with a group size of 2: {len(august_sunday_2_awarded)/len(august_sunday_2):.2%} ({len(august_sunday_2_awarded)}/{len(august_sunday_2)})"
)

print(
    f"Probability of being awarded a permit for September on a Monday with a group size of 2: {len(september_monday_2_awarded)/len(september_monday_2):.2%} ({len(september_monday_2_awarded)}/{len(september_monday_2)})"
)

print(
    f"Probability of being awarded a permit for September on a Sunday with a group size of 2: {len(september_sunday_2_awarded)/len(september_sunday_2):.2%} ({len(september_sunday_2_awarded)}/{len(september_sunday_2)})"
)

print(
    f"Probability of being awarded a permit for July on a Sunday with a group size of 2: {len(july_sunday_2_awarded)/len(july_sunday_2):.2%} ({len(july_sunday_2_awarded)}/{len(july_sunday_2)})"
)

Probability of being awarded a permit for August on a Tuesday with a group size of 2: 1.67% (19/1136)
Probability of being awarded a permit for August on a Sunday with a group size of 2: 3.88% (22/567)
Probability of being awarded a permit for September on a Monday with a group size of 2: 1.53% (10/652)
Probability of being awarded a permit for September on a Sunday with a group size of 2: 4.60% (15/326)
Probability of being awarded a permit for July on a Sunday with a group size of 2: 4.94% (17/344)


The probabilities above should come as a shock to anyone who applied for a Monday in September thinking it was a safe bet! They could have traded it for a Tuesday in August and had similar odds against their peers (1.67% vs. 1.53%), or substantially improved their odds by applying for a Sunday in August (3.88%).

Equally surprising is that, holding group size constant, a Sunday in September (4.60%) and a Sunday in July (4.94%) had similar odds.
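Several of these rates rest on only 10 to 22 awarded entries, so it can help to put a rough uncertainty band around each one before reading much into the gaps. The sketch below is a swapped-in technique, not part of the original analysis: it computes an approximate 95% Wilson score interval for each proportion, assuming each option entry behaves like an independent yes/no trial (the real lottery draw does not strictly work that way). The helper name `wilson_interval` is hypothetical; the counts are copied from the printed output above.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate Wilson score interval for a binomial proportion (z=1.96 -> ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    # Wilson pulls the centre of the interval slightly toward 0.5
    centre = (p_hat + z**2 / (2 * trials)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)
    )
    return centre - half_width, centre + half_width


# (awarded, total option entries), copied from the printed output above
groups = {
    "August, Tuesday, group of 2": (19, 1136),
    "August, Sunday, group of 2": (22, 567),
    "September, Monday, group of 2": (10, 652),
    "September, Sunday, group of 2": (15, 326),
    "July, Sunday, group of 2": (17, 344),
}

for label, (awarded, total) in groups.items():
    low, high = wilson_interval(awarded, total)
    print(f"{label}: {awarded / total:.2%} (95% CI {low:.2%} to {high:.2%})")
```

When two intervals overlap heavily, as the September and July Sunday intervals do, these counts alone don't separate those choices very well.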


In [None]:
df_split_awarded = df_split[df_split["awarded"] == True]

pd.crosstab(
    df_split_awarded["awarded_entrance_code_name"],
    df_split_awarded["preferred_entry_date_month"],
    margins=True,
)

| awarded_entrance_code_name | August | July | June | May | October | September | All |
|---|---|---|---|---|---|---|---|
| Colchuck Zone | 67 | 66 | 64 | 37 | 50 | 64 | 348 |
| Core Enchantment Zone | 125 | 122 | 122 | 76 | 114 | 114 | 673 |
| Eightmile/Caroline Zone | 64 | 62 | 58 | 27 | 25 | 58 | 294 |
| Eightmile/Caroline Zone (stock) | 3 | 4 | 5 | 3 | 1 | 4 | 20 |
| Snow Zone | 129 | 128 | 124 | 63 | 74 | 124 | 642 |
| Stuart Zone | 98 | 97 | 90 | 37 | 49 | 83 | 454 |
| Stuart Zone (stock) | 0 | 0 | 0 | 0 | 3 | 11 | 14 |
| All | 486 | 479 | 463 | 243 | 316 | 458 | 2445 |


In [None]:
# Check probability of being awarded a permit for Monday, Tuesday, Wednesday in July
july_mon_tues_wed_split = df_split[
    (df_split["preferred_entry_date_month"] == "July")
    & (
        (df_split["preferred_entry_date_day"] == "Monday")
        | (df_split["preferred_entry_date_day"] == "Tuesday")
        | (df_split["preferred_entry_date_day"] == "Wednesday")
    )
]

july_mon_tues_wed_awarded_filter = july_mon_tues_wed_split["awarded"] == True


july_mon_tues_wed_awarded = july_mon_tues_wed_split[july_mon_tues_wed_awarded_filter]

total_july_mon_tues_wed_awarded = len(july_mon_tues_wed_awarded)

total_july_mon_tues_wed = len(july_mon_tues_wed_split)

prob_july_mon_tues_wed_awarded = (
    total_july_mon_tues_wed_awarded / total_july_mon_tues_wed
)

print(
    f"Probability of being awarded a permit for Monday, Tuesday, Wednesday in July: {prob_july_mon_tues_wed_awarded:.2%} ({total_july_mon_tues_wed_awarded}/{total_july_mon_tues_wed})"
)

# Check probability of being awarded permit for Thursday, Friday, Saturday in August
august_thurs_fri_sat_split = df_split[
    (df_split["preferred_entry_date_month"] == "August")
    & (
        (df_split["preferred_entry_date_day"] == "Thursday")
        | (df_split["preferred_entry_date_day"] == "Friday")
        | (df_split["preferred_entry_date_day"] == "Saturday")
    )
]

august_thurs_fri_sat_awarded_filter = august_thurs_fri_sat_split["awarded"] == True

august_thurs_fri_sat_awarded = august_thurs_fri_sat_split[
    august_thurs_fri_sat_awarded_filter
]

total_august_thurs_fri_sat_awarded = len(august_thurs_fri_sat_awarded)

total_august_thurs_fri_sat = len(august_thurs_fri_sat_split)

prob_august_thurs_fri_sat_awarded = (
    total_august_thurs_fri_sat_awarded / total_august_thurs_fri_sat
)

print(
    f"Probability of being awarded a permit for Thursday, Friday, Saturday in August: {prob_august_thurs_fri_sat_awarded:.2%} ({total_august_thurs_fri_sat_awarded}/{total_august_thurs_fri_sat})"
)

Probability of being awarded a permit for Monday, Tuesday, Wednesday in July: 1.56% (178/11434)
Probability of being awarded a permit for Thursday, Friday, Saturday in August: 0.94% (178/18968)
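The two filters above differ only in the month and the set of weekdays, so the same calculation could be folded into one helper built on `Series.isin`. This is a minimal sketch: `award_probability` and the `toy` frame are hypothetical, and it assumes `df_split` has the `preferred_entry_date_month`, `preferred_entry_date_day`, and boolean `awarded` columns used in the cells above.

```python
import pandas as pd


def award_probability(df: pd.DataFrame, month: str, days: list[str]) -> tuple[float, int, int]:
    """Return (probability, awarded count, total count) for option entries
    whose preferred month is `month` and whose weekday is one of `days`."""
    subset = df[
        (df["preferred_entry_date_month"] == month)
        & (df["preferred_entry_date_day"].isin(days))
    ]
    total = len(subset)
    # Summing a boolean column counts the True (awarded) rows
    awarded = int(subset["awarded"].sum())
    probability = awarded / total if total else float("nan")
    return probability, awarded, total


# Hypothetical toy data standing in for df_split
toy = pd.DataFrame(
    {
        "preferred_entry_date_month": ["July", "July", "July", "August", "August", "August"],
        "preferred_entry_date_day": ["Monday", "Tuesday", "Sunday", "Friday", "Saturday", "Friday"],
        "awarded": [True, False, True, False, False, True],
    }
)

prob, awarded, total = award_probability(toy, "July", ["Monday", "Tuesday", "Wednesday"])
print(f"{prob:.2%} ({awarded}/{total})")  # prints 50.00% (1/2)
```

Called as `award_probability(df_split, "July", ["Monday", "Tuesday", "Wednesday"])`, it should reproduce the July result above if the columns match those assumptions.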


## Key Takeaways

### General

-   July has the second-most application option entries by month (~41k for August, ~30k for July, ~24k for September).
-   May, October, and June had by far the best awarded-to-skipped ratios. There's not a huge difference between July and August/September. I don't think you gain that much by applying for September or July over August.
-   Sunday sees by far the least competition.
-   Mondays are very competitive! Avoid Thursdays, Fridays, and Mondays.
-   Saturdays are less competitive than Mondays.
-   Colchuck isn't that much less competitive than Core.

### Core Zone

-   Applicants should submit _at least one_ option for the Core Enchantments out of their three options.
-   Friday, Tuesday, and Sunday had the best awarded-to-skipped ratios for the Core Enchantments.
-   Group sizes of 1, 3, and 2 had the best awarded-to-skipped ratios for the Core Enchantments.
-   Avoid large odd group sizes in the Core Enchantments: a group size of 8 fares better than 5 or 7, which have the worst awarded-to-skipped ratios.

-   There's a bit of game theory going on with the lottery. Applicants should consider what other people do in terms of group size (odd vs. even) and day of the week.

## What I'd Do

-   If I had to apply today based on these results, this is what I'd do:
    -   Option 1: Core Enchantments Zone, Group Size of 3, Sunday, August
    -   Option 2: Colchuck Zone, Group Size of 3, Sunday, late July
    -   Option 3: Stuart Zone, Group Size of 3, Sunday, August

Let me explain each of these choices and my reasoning behind them.

### Core Enchantments Zone, Group Size of 3, Sunday, August

A total of 1,039 applicants were awarded permits for a zone other than the Core Zone **despite** listing the Core Zone in at least one option (~42.5% of all permits awarded). Adding the 673 applicants who applied for the Core Zone and won it brings that number to 1,712 (70%). In other words, you don't lose much by spending one option on the Core Zone; you still have a good chance of winning a permit. What matters is that you hedge your chances with your later options.
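The percentages in this paragraph follow directly from the counts. The snippet only redoes that arithmetic on the quoted figures (1,039 from the text, 673 and 2,445 from the zone-by-month crosstab); it does not recompute anything from the raw data.

```python
# Figures quoted in the paragraph above and in the zone-by-month crosstab
total_awarded = 2445          # all permits awarded (crosstab "All" total)
core_awarded = 673            # permits awarded for the Core Enchantment Zone
non_core_despite_core = 1039  # awarded another zone despite listing Core at least once

# Share of all permits that went to Core applicants in a different zone
print(f"{non_core_despite_core / total_awarded:.1%}")  # prints 42.5%

# Every Core applicant who won anything, in any zone
any_win_with_core = non_core_despite_core + core_awarded
print(any_win_with_core)  # prints 1712
print(f"{any_win_with_core / total_awarded:.1%}")  # prints 70.0%
```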

Knowing that I should apply for the Core Zone, I'd reduce my chances of being skipped in two ways. First, I'd select a group size of 3. A group size of 3 had the second-best awarded-to-skipped ratio, ahead of a group size of 2. Group sizes of 4 had a worse ratio than group sizes of 6. A group size of 1 would be the best option, but I'd rather not go alone. Second, I'd choose a Sunday. Starting a trip on a Sunday had, by far, the best awarded-to-skipped ratio. Trips starting on a Friday or Monday were about twice as likely to be skipped rather than awarded compared to trips starting on a Sunday.
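The awarded-to-skipped ratio drives most of these choices, but its calculation isn't shown in this section. The sketch below is one plausible way to compute it with `groupby`, under a loud assumption: an option row counts as "skipped" when its `awarded` flag is False. The notebook's own definition may differ. The function name `awarded_to_skipped`, the `group_size` column, and the toy data are all hypothetical.

```python
import pandas as pd


def awarded_to_skipped(df: pd.DataFrame, by: str) -> pd.Series:
    """Ratio of awarded to non-awarded option rows for each value of `by`.

    ASSUMPTION: a row is "skipped" when `awarded` is False; this may not
    match the definition used elsewhere in the notebook.
    """
    counts = df.groupby(by)["awarded"].agg(awarded="sum", options="count")
    skipped = counts["options"] - counts["awarded"]
    return (counts["awarded"] / skipped).sort_values(ascending=False)


# Hypothetical toy data; "group_size" is an illustrative column name
toy = pd.DataFrame(
    {
        "group_size": [1, 1, 3, 3, 3, 5, 5, 5, 5],
        "awarded": [True, False, True, False, False, True, False, False, False],
    }
)

ratios = awarded_to_skipped(toy, "group_size")
print(ratios)  # group 1: 1.0, group 3: 0.5, group 5: ~0.333
```

A higher value means more awards per skipped option, so sorting in descending order puts the least competitive choices first.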

As for the month, the best months for avoiding being skipped are May, October, and June. Unfortunately, these are all months with questionable weather, and they're not worth the effort or risk to me. By contrast, July, August, and September have better weather. So why would I choose August instead of July or September? Of those three, September had the worst awarded-to-skipped ratio, and the difference in awarded-to-skipped ratio between July and August was pretty small. The benefits of August seem to greatly outweigh the small gain of choosing July over August.

The choice of group size is only relevant for the Core Zone. The entry date of Sunday is repeated in all three options so I won't repeat my reasoning in later explanations.

### Colchuck Zone, Group Size of 3, Sunday, late July

The Snow Zone provided the second most permits in the lottery behind the Core Enchantment Zone. Additionally, applicants were more than twice as likely to receive a permit for the Snow Zone (7.46%) compared to the Colchuck Zone (2.97%). So why would I still choose the Colchuck Zone over the Snow Zone given these probabilities?

There are two reasons why applying for the Snow Zone is less desirable than the Colchuck Zone:

1. The hike into the Snow Zone is longer than the hike into the Colchuck Zone, and the climb from the Snow Zone into the Lower Enchantments is worse, in my opinion, than the climb up Aasgard Pass (above Colchuck) to the Upper Enchantments. The route from the Snow Lakes trailhead into the Core Enchantments is some 8 or 9 miles, if I remember correctly. The rock faces, big step-ups, and uneven terrain between Snow Lakes and the Lower Enchantments are brutal and, _in my opinion_, more dangerous than Aasgard Pass. There's a reason many more people go up Aasgard than through Snow Lakes. It's enough for me to ignore the probability data.

2. Colchuck Lake is beautiful. Even if you don't go up Aasgard Pass into the Upper Enchantments, a trip to Colchuck Lake is stunning, crowds, crazy parking lots, and all. The scenery at Colchuck Lake is far more rewarding than at the Snow Lakes. The Snow Lakes are a loved part of the Enchantment Permit Zone, but they are not why people go there. I'll take my chances.

By choosing the Colchuck Zone for Option 2, I'd slightly increase my chances of being awarded a permit compared to the Core Enchantment Zone.

Choosing late July instead of August is a break from my Option 1 reasoning. What gives? Colchuck is a few thousand feet lower than the Core Enchantment Zone, so the climb up would be easier and the conditions would _probably_ be better. In this situation I'd take the small increase in awarded-to-skipped ratio because I'd be camping at a lower, easier-to-reach elevation. Camping at Colchuck Lake would still provide the opportunity to hike up to the Enchantments during the day.

The day selection is the same as in Option 1.

### Stuart Zone, Group Size of 3, Sunday, August

Across the first two options, I changed the zone and month to improve my chances with each successive option. Option 3 is no exception: I again want to increase my chances by choosing a less competitive zone.

The zones of the Enchantments are hard to understand, and "complete" guides online that don't explain how different each zone is make things worse. For example, the Eightmile/Caroline Zone is nowhere near the beautiful lakelets and sharp peaks of the Core Enchantments or the Colchuck Zone. The Jack Creek Fire burned the landscape around Eightmile Lake, making it, as the numbers show, less desirable. The Forest Service has also had to reduce permits in the area due to the fire.

For the uninitiated, the Snow Zone and the Colchuck Zone may seem to be on a level playing field as starting points for getting up into the Core Enchantments, but they are quite different in difficulty and beauty. There's a reason the Colchuck Zone is almost as competitive as the Core Zone.

Likewise, the Stuart Zone is miles away from the Colchuck Zone and very different from all the other zones. Although it doesn't have the most lakes or jagged peaks, it's a very underrated zone. As mentioned in an online trip report, the lake is not dammed, giving it a very natural shoreline. Because of its unpopularity, it also offers more solitude than the other zones while still providing pretty spectacular views of the surrounding mountains (including Mount Stuart).

The trail to Stuart Lake is certainly not as steep, rocky, or long as the trails to the other, more popular zones.

Camping at Stuart Lake provides opportunities for a grueling obstacle course hike up to the magnificent Horseshoe Lake, or a worthy day hike to Colchuck Lake. It's not _THE_ Enchantments, but it's a great plan B.

The reasoning behind the day of the week and the month is the same as in Option 1.


In [None]:
df_split_awarded["preferred_entry_date_day"].value_counts()

preferred_entry_date_day
Sunday       433
Saturday     351
Tuesday      338
Thursday     335
Monday       330
Wednesday    330
Friday       328
Name: count, dtype: int64

In [None]:
# Merge awarded counts with total counts; both Series are indexed by month
df_prob_month_awarded = pd.merge(
    df_split_awarded["preferred_entry_date_month"].value_counts(),
    df_split["preferred_entry_date_month"].value_counts(),
    on="preferred_entry_date_month",
    suffixes=("_awarded", "_total"),
)

# Divide count_awarded by count_total and store the result in a new column
df_prob_month_awarded["Probability"] = (
    df_prob_month_awarded["count_awarded"] / df_prob_month_awarded["count_total"]
)

# Sort the probability values
df_prob_month_awarded = df_prob_month_awarded.sort_values(
    by="Probability", ascending=False
)

# Show probability as percent
df_prob_month_awarded["Probability"] = df_prob_month_awarded["Probability"].map(
    "{:.2%}".format
)


# Show the dataframe
df_prob_month_awarded

| preferred_entry_date_month | count_awarded | count_total | Probability |
|---|---|---|---|
| May | 243 | 1638 | 14.84% |
| October | 316 | 3518 | 8.98% |
| June | 463 | 8579 | 5.40% |
| September | 458 | 23842 | 1.92% |
| July | 479 | 30188 | 1.59% |
| August | 486 | 40877 | 1.19% |
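The merge above works, but the same table can be built with a single `groupby` over the boolean `awarded` column. The same helper could also be pointed at `preferred_entry_date_day` to turn the weekday award counts above into probabilities. This is a minimal sketch with a hypothetical helper, `award_rate_table`, and toy data standing in for `df_split`:

```python
import pandas as pd


def award_rate_table(df: pd.DataFrame, by: str) -> pd.DataFrame:
    """Awarded count, total option entries, and award probability for each value of `by`."""
    # Summing a boolean column counts the True values; "count" counts non-null rows
    table = df.groupby(by)["awarded"].agg(count_awarded="sum", count_total="count")
    table["Probability"] = table["count_awarded"] / table["count_total"]
    return table.sort_values("Probability", ascending=False)


# Hypothetical toy data standing in for df_split
toy = pd.DataFrame(
    {
        "preferred_entry_date_month": ["May", "May", "August", "August", "August", "August"],
        "awarded": [True, False, True, False, False, False],
    }
)

result = award_rate_table(toy, "preferred_entry_date_month")
# Format Probability as a percentage for display only; `result` stays numeric
print(result.assign(Probability=result["Probability"].map("{:.2%}".format)))
```

One design difference worth knowing: `pd.merge` defaults to an inner join, so a month with zero awards would silently drop out of the merged table, while the `groupby` version keeps it with a `count_awarded` of 0.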
