# Motivation and Problem Statement
My original Plan A was too ambitious and I was unable to receive proper permission for using the website data (see Data Selected for Analysis for details on why that is), so I decided to go with Plan B! 

The U.S. Forest Service (USFS) operates an annual lottery for overnight camping permits in the Enchantments. The Enchantments are a popular area for hikers in the state of Washington, but to protect the plants and animals there, the USFS limits entry via a permit system. Each year, hopeful hikers must register for the lottery for a chance to get a permit. Hikers select their top 3 choices of entry day and zone within the Enchantments, with the “Core Zone” being the most popular. For the 2019 and 2020 cycles, the raw applicant data was also released publicly as PDFs. While the USFS posts a summary PDF of the most popular days, which gives potential applicants an idea of what days to avoid, I wonder if analysis can identify the combinations that optimize a hiker’s chances of being selected. Since the lottery strives to fill each day’s allotted permits to the maximum, I also wonder if single-hiker applicants have an increased chance of entry because they can be used to fill in the remainders, and if so, by how much. 

I am interested in Plan B because its topic relates to the outdoors. There are also personal benefits to learning which day, zone, and party size combinations might be best for scoring an Enchantments permit, which I have not managed to do yet. 


# Data Selected for Analysis
For Plan B, the permit lottery data is available from [here](https://web.archive.org/web/20201020211744/https://www.fs.usda.gov/Internet/FSE_DOCUMENTS/fseprd695975.pdf) (2019) and [here](https://www.fs.usda.gov/Internet/FSE_DOCUMENTS/fseprd695975.pdf) (2020). The PDFs can be converted into spreadsheet format via Google Sheets. While the license is not explicitly stated within the page or the PDF, the USFS is part of the U.S. Department of Agriculture, whose website has a “Digital Rights and Copyright” section that states, “Most information presented on the USDA Web site is considered public domain information. Public domain information may be freely distributed or copied, but use of appropriate byline/photo/image credits is requested. Attribution may be cited as follows: ‘U.S. Department of Agriculture’… Some material on the USDA Web site are protected by copyright, trademark, or patent, and/or are provided for personal use only… and USDA has made every attempt to identify and clearly label them.” The lottery data is assumed to be in the public domain, given that the lottery is held by the USFS, the document is not labeled as being copyrighted, and the dataset does not contain any personal information.

The dataset is appropriate for analyzing Plan B because it is the only known dataset containing Enchantments permit lottery data, and it comes directly from the lottery operator (the USFS). There do not appear to be any immediate ethical concerns with using the dataset, aside from potentially increasing one’s chances of gaining a permit should the resulting analysis be accurate. If the best days to select or other methods of increasing one’s chances of receiving a permit are released publicly, an influx of applicants may try to submit applications tailored to increase their chances of obtaining a permit. 


# Unknowns and Dependencies
Plan A contained more unknowns, since it was a lengthier project that would have required scraping for data. The need to scrape already introduces complications, such as permission not being granted or the scraper not working as intended. Naturally, because Plan A contained an additional data scraping step, it was also at higher risk than Plan B of running behind schedule or not being completed by the end of the quarter. As a graduating senior, I have commitments to my capstone, to finding a full-time job post-graduation, and to moving out of my current apartment and back home, all of which will require a significant time investment. 

Overall, while Plan A was more interesting, Plan B seemed more feasible since it relies on an already-provided dataset, the dataset is smaller, and it has a smaller set of possible questions to answer (i.e., less time is needed to choose what exactly to analyze). 



# Research Questions/Hypotheses
The Washington Trails Association's (WTA) webmaster did not grant permission to scrape the site for its trail-related data. In addition, the time commitment for Plan A was already long. As a result, I am choosing Plan B, which explores the Enchantments permit lottery system. Questions include:

* What is the average probability of winning the permit lottery for trips starting on each day of the week?
* What is the likelihood of winning the permit lottery on a weekend day compared to a weekday?
* What is the likelihood of winning the permit lottery depending on group size?
* Is it more likely to win the lottery as a single-person party versus a group with 2+ people?
* Which days provide the greatest chance of winning the permit lottery? 

# Background
As the population grows and scenic viewpoints are increasingly shared on social media, traffic to recreational areas increases, putting greater strain on the resources within them. As a result, permit systems have been implemented by various land-managing agencies like the National Park Service, Bureau of Land Management, and U.S. Forest Service. With the creation of the Recreation.gov website in the early 2000s, it has become even easier for land-managing agencies to set up and run reservation lotteries via the integrated system. Since each permit lottery varies slightly, there does not appear to be a specific study of the recreational permit lottery system as a whole; meanwhile, the Enchantments lottery is too small in scale to have attracted much research. However, some related real-life phenomena lend possible credence to the research questions being asked. For instance, single-person parties might have a greater chance of winning the permit lottery because of their increased flexibility: if only one space remains for the day, no group of 2 or more can "win" that last spot. In the realm of theme parks, places like Disneyland offer single rider queues where lone individuals can fill in the remaining seats on a ride. On [Mousehacking.com](https://www.mousehacking.com/blog/disney-world-single-rider-lines), it's claimed that the single rider line can save 25-50% off wait times. There's also clear evidence that outdoor permit lotteries are tougher to win on weekends. People are likely to prefer venturing outdoors when they're not working, or when they can minimize time off, and for most people, that's Friday night to Sunday. 
Yosemite National Park runs a [similar lottery system](https://www.nps.gov/yose/planyourvisit/hdpermits.htm) for Half Dome where they saw that in the 2018 season, the "average success rate on weekdays was 47%, but only 24% on weekends." Likewise, the Mt. Whitney lottery [shows similar trends](https://www.fs.usda.gov/Internet/FSE_DOCUMENTS/fseprd617167.pdf) where Fridays and Saturdays experienced increased demand and competition. 

Thus, the name of the game seems to be flexibility, and increased flexibility allows individuals to choose and participate in the permit lottery in a way that maximizes their chance of winning. Yet, how much of an advantage applies in the case of the Enchantments lottery in terms of the day and group size? Is there even an advantage as imagined? That's what the research hopes to find out.  

# Methodology
Much of the analysis will involve summarizing the raw row-by-row data into more digestible forms and then comparing the numbers to spot any differences. To check whether differences between variables and scenarios are statistically significant, the t-test will be used. The data may be visualized in a variety of diagrams and charts, including a table of the days with the best odds of winning the permit lottery, a bar chart comparing successful and unsuccessful application percentages by group size, and a time series graph showing the number of permit applications over the course of the season. The table will help answer the question of which days provide the greatest chance of winning the lottery. Meanwhile, the bar charts can help convey the likelihood of winning a permit by group size and day of the week. The time series chart can give a more holistic view of the hiking season than single day-by-day views or summaries. For instance, a time series can show the highest peaks (most competitive periods) and lowest valleys (least competitive periods). To conduct the analysis, the data from the U.S. Forest Service will likely be cleaned up and stored inside a Python dictionary, which will allow the statistical work to be done with Python's statistical and visualization libraries. 
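
As a rough illustration of the planned t-test, the sketch below computes a t statistic for two groups of daily award rates using only Python's standard library. It uses Welch's unequal-variance variant, which is my assumption here since the plan above does not name a specific form of the test. The function name `welch_t` and the sample numbers are made up for illustration and are not taken from the USFS data.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each sample mean (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Made-up daily award rates, for illustration only (NOT real lottery results).
weekday_rates = [0.10, 0.12, 0.09, 0.11, 0.13]
weekend_rates = [0.04, 0.05, 0.03, 0.06]

t_stat, dof = welch_t(weekday_rates, weekend_rates)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

The statistic would still need to be compared against a t distribution with `dof` degrees of freedom to obtain a p-value, which the standard library does not provide.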

# Disclaimer

Due to time limitations, I opted to complete the preparation, cleaning, and analysis of the dataset in Google Sheets. I am aware that it's different from code; however, given my available time and energy (e.g., opting to not do any all-nighters), and preference for seeing the results, I made the intentional decision to conduct work in Google Sheets, and am ready to accept any penalties associated with that. I've tried to explain the process done in Google Sheets to the best of my ability to aid in any reproducing and replication of the results. In the future, I might revisit the data and try the analysis again with Python. But, alas, for now, Google Sheets will have to suffice. 

# Phase 1: Preparation, Cleaning, and Loading Datasets

The permit lottery data was retrieved from [here](https://web.archive.org/web/20201020211744/https://www.fs.usda.gov/Internet/FSE_DOCUMENTS/fseprd695975.pdf) (2019) and [here](https://www.fs.usda.gov/Internet/FSE_DOCUMENTS/fseprd695975.pdf) (2020).

The 2019 and 2020 permit lottery data are both provided by the USFS as PDFs, which are not immediately analysis-friendly. Thus, both files were converted to XLSX format via Adobe Acrobat, which offers a file-type conversion option via "Export". The files were then uploaded to Google Sheets for initial observation and cleaning. Adobe Acrobat was chosen because third-party conversion services proved unreliable. For example, converting the 2019 file with a third-party service resulted in over 3,000 cell-related errors where data was misplaced.

Initial observations had to be done to ensure the conversion process did not remove any columns or introduce any anomalies. Cleaning was necessary because of the original PDF format, which contained unnecessary repetition of the headers on each page of data. 

These repeated headers required removal. While the headers could be removed via Python (e.g., by searching for any dictionary entries where the key and value were equal), I found it more time-efficient to complete the cleaning in Google Sheets. In Sheets, the process was simply setting up a filter for the entire spreadsheet. Then, using the filter for the "Preferred Entry Date 1" column, I opted to only show rows where the value was also "Preferred Entry Date 1," since that would indicate that it was a header row. I then highlighted and deleted all duplicated header rows. I repeated the process with the "Permit Type" column to minimize chances of missing a duplicate. 
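
For comparison, here is a minimal Python sketch of the same header cleanup, assuming each row has already been loaded as a dictionary keyed by column name (for example with `csv.DictReader`, which uses the first header row for the keys). The function name `drop_repeated_headers` and the sample rows are hypothetical; only the two column names come from the steps above.

```python
# Columns checked for repeated headers, mirroring the two filters used in Sheets.
HEADER_COLUMNS = ("Preferred Entry Date 1", "Permit Type")


def drop_repeated_headers(rows):
    """Drop rows where a cell simply repeats its own column name."""
    return [
        row for row in rows
        if not any(row.get(col) == col for col in HEADER_COLUMNS)
    ]


# Made-up sample rows (the cell values are placeholders, not real data).
rows = [
    {"Preferred Entry Date 1": "8/14/2020", "Permit Type": "Overnight"},
    {"Preferred Entry Date 1": "Preferred Entry Date 1", "Permit Type": "Permit Type"},
    {"Preferred Entry Date 1": "9/1/2020", "Permit Type": "Overnight"},
]

cleaned = drop_repeated_headers(rows)
print(len(rows) - len(cleaned), "repeated header row(s) removed")
```

Checking both columns, as in the Sheets process, lowers the chance of missing a header row that was only partially preserved by the PDF conversion.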

**2020 Dataset:**
* The 2020 dataset started with 26,956 rows, which includes the first (not duplicated) instance of the header row.
* 518 duplicate header rows were identified and removed.
* The dataset ended with 26,438 rows.



**2019 Dataset:**
* The 2019 dataset started with 24,897 rows, which includes the first (not duplicated) instance of the header row.
* 282 duplicate header rows were identified and removed.
* The dataset ended with 24,615 rows.
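
As a quick sanity check, the reported counts can be reconciled with a few lines of Python. The dictionary name `reported` is just for illustration; the numbers are the ones listed above.

```python
# year -> (rows before cleaning, duplicate header rows removed, rows after cleaning)
reported = {
    2020: (26_956, 518, 26_438),
    2019: (24_897, 282, 24_615),
}

for year, (before, removed, after) in reported.items():
    # The starting count minus the removed duplicates should equal the ending count.
    assert before - removed == after, f"{year}: counts do not reconcile"
    print(f"{year}: {before} - {removed} = {after}")
```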


# Phase 2: Data Analysis and Findings

As mentioned in the Disclaimer, all analysis was conducted in Google Sheets using both manual calculation techniques (e.g., subtraction, division, multiplication) and functions (e.g., COUNT, COUNTIF, ISBLANK). The Google Sheets can be directly accessed here: https://drive.google.com/file/d/1gDTxfdES6L-W56ebMElB8dT9aL3IL0-5/view?usp=sharing or via the included Excel file ("enchantments_allsheets.xlsx").

## Group Size

One of the first factors explored was group size. I wanted to understand how group size might impact one's chances of winning the permit lottery, with the hypothesis being that smaller group sizes should increase probability of winning due to their increased flexibility (e.g., single member parties can fill in the remainder). 

**Steps to conduct the analysis:**
1. Click on the "Results Status" column in the "Copy of 2020" sheet. Filter "Results Status" to "Awarded."
2. Copy all the values in the "Awarded Group Size" column to a sheet called "# Pref + Group Size (2020)" in Column J.
3. In Column K, list all possible group sizes (1-8).
4. In Column L, count the matches in Column J for each group size in Column K. A function like "=COUNTIF($J$2:$J, 8)" can be used, where $J$2:$J is the range of cells to search and 8 is the group size to count (repeat for each size down to 1).
5. Divide the value of each cell in Column L by the total number of awarded permits (the sum of all cells in Column L). Save the resulting percentage in Column M.
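
The same tally can be sketched in Python, assuming the "Awarded Group Size" values have been read into a list of integers. The function name `group_size_shares` and the sample list are hypothetical and only illustrate the COUNTIF-then-divide steps.

```python
from collections import Counter


def group_size_shares(awarded_sizes, sizes=range(1, 9)):
    """Return {group size: (count, share of all awarded permits)}."""
    counts = Counter(awarded_sizes)          # plays the role of one COUNTIF per size
    total = sum(counts[s] for s in sizes)    # sum of Column L
    return {s: (counts[s], counts[s] / total) for s in sizes}


# Made-up awarded group sizes, for illustration only.
awarded_sizes = [2, 4, 4, 8, 1, 2, 4, 6]

for size, (count, share) in group_size_shares(awarded_sizes).items():
    print(f"group of {size}: {count} awarded ({share:.1%})")
```

Note that these shares describe the mix of awarded permits rather than a per-application win rate, since group sizes for unsuccessful applications were not available.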

**Result:**

<img src="enchantments_groupsize.png">



## Number of Preferences

Another factor to explore was the number of preferences a party had listed. Leaders are able to input between 1 and 3 different zone and date combinations. Logic dictates that the more options entered, the better the chances; however, how much better is it to have all 3 versus 1 preference? How about 3 versus 2? 
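
One way this comparison could be computed in Python is sketched below. It assumes each application is a dictionary, that the second and third choices live in columns named "Preferred Entry Date 2" and "Preferred Entry Date 3" (my guess, by analogy with "Preferred Entry Date 1"), and that a blank cell means the preference was not filled in. The "Unsuccessful" status value and the sample rows are made up.

```python
from collections import defaultdict

# Column names for choices 2 and 3 are assumed, not confirmed from the dataset.
DATE_COLUMNS = [f"Preferred Entry Date {i}" for i in (1, 2, 3)]


def award_rate_by_preference_count(rows):
    """Return {number of preferences filled in: share of applications awarded}."""
    tallies = defaultdict(lambda: [0, 0])  # n_prefs -> [awarded, total]
    for row in rows:
        n_prefs = sum(1 for col in DATE_COLUMNS if (row.get(col) or "").strip())
        tallies[n_prefs][1] += 1
        if row.get("Results Status") == "Awarded":
            tallies[n_prefs][0] += 1
    return {n: awarded / total for n, (awarded, total) in sorted(tallies.items())}


def make_row(status, *dates):
    """Build a made-up application row; missing preferences become blank cells."""
    padded = list(dates) + [""] * (3 - len(dates))
    row = dict(zip(DATE_COLUMNS, padded))
    row["Results Status"] = status
    return row


# Made-up applications, for illustration only.
rows = [
    make_row("Awarded", "8/14/2020", "8/15/2020", "9/1/2020"),
    make_row("Unsuccessful", "8/14/2020"),
    make_row("Unsuccessful", "7/4/2020", "7/5/2020", "7/6/2020"),
    make_row("Unsuccessful", "6/1/2020"),
]

rates = award_rate_by_preference_count(rows)
print(rates)
```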

**Steps to conduct the analysis:**
1. 
2.
3.


**Result:**

<img src="enchantments_preferences.png">


<img src="enchantments_preferences2.png">

Interestingly, 

<img src="enchantments_preferencediscussion.png">



## Day of the Week
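
A minimal Python sketch of how demand by day of the week could be tallied is shown below. It assumes the preferred entry dates are strings in month/day/year form (the actual format in the converted spreadsheet may differ), and the function name `demand_by_weekday` and sample dates are hypothetical.

```python
from collections import Counter
from datetime import datetime

DATE_FORMAT = "%m/%d/%Y"  # assumed date format; adjust to match the spreadsheet
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def demand_by_weekday(date_strings):
    """Count how many requested entry dates fall on each day of the week."""
    counts = Counter(
        WEEKDAYS[datetime.strptime(s, DATE_FORMAT).weekday()]
        for s in date_strings
    )
    # Include days with zero requests so every weekday appears in the result.
    return {day: counts[day] for day in WEEKDAYS}


# Made-up requested entry dates, for illustration only.
requested = ["8/14/2020", "8/21/2020", "8/15/2020"]

for day, count in demand_by_weekday(requested).items():
    print(f"{day}: {count}")
```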


**Steps to conduct the analysis:**


**Result:**

<img src="enchantments_dayofweek.png">


<img src="enchantments_dayofweek2.png">



## Day in the Season (May 15-October 31)
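
The season-long view could be built in Python along the lines of the sketch below, which counts requests for every date from May 15 through October 31 and picks out the busiest one. The month/day/year date format, the function name `daily_demand`, and the sample dates are assumptions for illustration.

```python
from collections import Counter
from datetime import date, datetime, timedelta

DATE_FORMAT = "%m/%d/%Y"  # assumed date format; adjust to match the spreadsheet


def daily_demand(date_strings, year):
    """Return [(date, request count)] for every day from May 15 to October 31."""
    counts = Counter(
        datetime.strptime(s, DATE_FORMAT).date() for s in date_strings
    )
    day, end = date(year, 5, 15), date(year, 10, 31)
    series = []
    while day <= end:
        series.append((day, counts[day]))  # days with no requests get 0
        day += timedelta(days=1)
    return series


# Made-up requested entry dates, for illustration only.
requested = ["8/14/2020", "8/14/2020", "6/1/2020", "10/10/2020"]

series = daily_demand(requested, 2020)
peak_day, peak_count = max(series, key=lambda pair: pair[1])
print(f"{len(series)} days in season; busiest: {peak_day} ({peak_count} requests)")
```

Keeping zero-count days in the series means the result can be plotted directly as a time series without gaps.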


**Steps to conduct the analysis:**


**Result:**

<img src="enchantments_season.png">

# Phase 3: Discussion



It is important to note the limitations of the study, of which there are a few. First off, the 2019 and 2020 datasets included different data fields, which limited analysis between the two (e.g., one had awarded permits data and group size data, but the other didn't). 

The lottery process itself also hasn't been released. For instance, does the lottery choose a person and then look at their preferences to decide whether they get a permit based on available space? Or does the lottery go day-by-day through the permit season, selecting randomly among those who chose that day? While assumptions can be made based on the results, they remain assumptions, and every assumption carries caveats. Patterns may emerge, but they may be caused by random chance.

# Phase 4: Conclusion and Next Steps



Ultimately, the research seeks to better understand the factors involved in winning the Enchantments permit lottery, and the results do appear to follow the logic someone might apply when thinking about ways to increase their chances of receiving an Enchantments permit. To increase one's chances of winning the lottery, an applicant should be as flexible as possible. In terms of preferences, that means filling out ALL 3 options. While filling out only 2 options does not greatly decrease chances, filling out only 1 option almost halves one's chances of getting a permit. Meanwhile, applicants should consider starting their trips on Sunday, which is by far the least in-demand day of the week. Permits are most competitive during the middle of the season around August 14th; however, the Enchantments are often mostly snow-free well before and after the peak time, so individuals can try to schedule trips earlier in the season (for wildflowers) or later in the season (for larches). 

Unfortunately, the dataset's limitations and my personal time available limited the amount of analysis that could be conducted. In the future, researchers may consider submitting a FOIA request to the USFS to gain access to additional permit lottery data to allow for greater comparison across the years. They may also be able to obtain data like the group sizes for applications that were unsuccessful. 