<h1>Outdoor Recreation: Overnight Reservations</h1>
Here we would like to get a sense of:
<ul>
<li>how many overnight reservations occur, </li>
<li>in what type of facilities the guests stay,</li>
<li>the spread of where they stay,</li>
<li>when they stay,</li>
<li>and how many people stay.</li>
</ul>


Put simply, we would like to verify that there is enough variation in our data that it is reasonable to expect our models to provide differentiated recommendations depending on the attributes of a given host.

In [1]:
import pandas as pd
import numpy as np

In [2]:
res19 = pd.read_csv('reservations2019.csv')



In [3]:
res19.shape

(3479643, 36)

In [4]:
res19.columns

Index(['historicalreservationid', 'ordernumber', 'agency', 'orgid',
       'codehierarchy', 'regioncode', 'regiondescription', 'parentlocationid',
       'parentlocation', 'legacyfacilityid', 'park', 'sitetype', 'usetype',
       'productid', 'inventorytype', 'facilityid', 'facilityzip',
       'facilitystate', 'facilitylongitude', 'facilitylatitude', 'customerzip',
       'customerstate', 'customercountry', 'tax', 'usefee', 'tranfee',
       'attrfee', 'totalbeforetax', 'discount', 'totalpaid', 'startdate',
       'enddate', 'orderdate', 'numberofpeople', 'equipmentdescription',
       'equipmentlength'],
      dtype='object')

For the exploratory questions at hand, we are interested in the following columns:
'regiondescription',
'parentlocationid',
'sitetype', 
'usetype',
'startdate',
'numberofpeople'
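If only these columns are needed, the load step could be restricted to them. The sketch below is illustrative only: it reads a tiny in-memory CSV with made-up rows rather than `reservations2019.csv`, and the names `csv_text`, `cols`, and `subset` are hypothetical.

```python
import io

import pandas as pd

# Tiny in-memory CSV standing in for the real file; the rows are made up.
csv_text = """regiondescription,parentlocationid,sitetype,usetype,startdate,numberofpeople,totalpaid
Region A,11111,STANDARD NONELECTRIC,Overnight,2019-07-04,4,52.0
Region B,22222,CABIN ELECTRIC,Overnight,2019-10-12,2,120.0
"""

# Columns of interest for the exploratory questions.
cols = ["regiondescription", "parentlocationid", "sitetype",
        "usetype", "startdate", "numberofpeople"]

# usecols limits parsing to the listed columns; other columns are skipped.
subset = pd.read_csv(io.StringIO(csv_text), usecols=cols)
print(subset.columns.tolist())
```

On a file with several million rows, reading six columns instead of 36 should also reduce memory use noticeably.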

<h3>What proportion of reservations are for overnight stays?</h3>

In [5]:
res19['usetype'].unique()

array(['Overnight', nan, 'Day', 'Multi'], dtype=object)

In [6]:
(res19['usetype']=='Overnight').sum()

2459649

In [9]:
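# Proportion of all reservations in each use type
# (rows with a missing usetype still count toward the total)
n_total = len(res19)
day_prop = (res19['usetype'] == 'Day').sum() / n_total
overnight_prop = (res19['usetype'] == 'Overnight').sum() / n_total
multi_prop = (res19['usetype'] == 'Multi').sum() / n_total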

In [16]:
print(f'Day:{day_prop}\nOvernight:{overnight_prop}\nMulti:{multi_prop}')

Day:0.006335707427457357
Overnight:0.7068682045830563
Multi:6.322487680489062e-06


So we can see that about 2.46 million reservations are overnight: roughly 71% of all rows and about 99% of the rows that carry a usetype label (about 29% of rows have no label). This confirms that there is a market for overnight lodging options.
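The same proportions, including the share of unlabelled rows, can be read off in one step with `value_counts`. A minimal sketch, assuming a small made-up frame `demo` in place of `res19`:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for res19; the values are made up for illustration.
demo = pd.DataFrame({
    "usetype": ["Overnight", "Overnight", "Day", np.nan,
                "Overnight", np.nan, "Multi", "Overnight"]
})

# Share of every category, with missing labels counted as their own group.
shares_all = demo["usetype"].value_counts(normalize=True, dropna=False)

# Share among labelled rows only (NaN rows are dropped by default).
shares_labelled = demo["usetype"].value_counts(normalize=True)

print(shares_all)
print(shares_labelled)
```

Comparing the two results shows how much of the apparent gap between categories comes from missing labels rather than from the labelled categories themselves.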

<h3>In what type of facilities do guests stay?</h3>

In [17]:
res19['sitetype'].unique()

array(['CABIN NONELECTRIC', 'Entrance', 'Historic Tour', nan,
       'Hiking Zone', 'Campsite', 'Shelter Nonelectric',
       'Standard Nonelectric', 'RV Nonelectric',
       'GROUP SHELTER NONELECTRIC', 'SHELTER NONELECTRIC',
       'GROUP PICNIC AREA', 'STANDARD ELECTRIC', 'TENT ONLY NONELECTRIC',
       'GROUP SHELTER ELECTRIC', 'CABIN ELECTRIC', 'TENT ONLY ELECTRIC',
       'GROUP STANDARD ELECTRIC', 'STANDARD NONELECTRIC',
       'GROUP STANDARD AREA NONELECTRIC', 'EQUESTRIAN NONELECTRIC',
       'GROUP TENT ONLY AREA NONELECTRIC', 'River', 'Entry Point',
       'Segment', 'Trailhead', 'WALK TO', 'GROUP STANDARD NONELECTRIC',
       'GROUP EQUESTRIAN', 'MOTOR', 'NONMOTOR', 'RV NONELECTRIC',
       'GROUP WALK TO', 'SHELTER ELECTRIC', 'RV ELECTRIC',
       'GROUP RV AREA NONELECTRIC', 'MANAGEMENT', 'Cabin', 'Campground',
       'Cave Tour', 'Hidden', 'Nature Tour', 'Tent Only Nonelectric',
       'Boat Tour', 'BOAT IN', 'HIKE TO', 'Zone', 'Houseboat',
       'GROUP HIKE TO', 'COURT

In [19]:
res19['sitetype'][res19['usetype']=='Overnight'].value_counts()

STANDARD NONELECTRIC                1030915
STANDARD ELECTRIC                    844828
TENT ONLY NONELECTRIC                244415
RV NONELECTRIC                        72310
WALK TO                               55277
RV ELECTRIC                           46370
CABIN NONELECTRIC                     30789
GROUP STANDARD NONELECTRIC            20424
HIKE TO                               19612
GROUP TENT ONLY AREA NONELECTRIC      18907
CABIN ELECTRIC                        10743
BOAT IN                               10229
MANAGEMENT                             9942
TENT ONLY ELECTRIC                     9345
EQUESTRIAN NONELECTRIC                 6582
GROUP STANDARD AREA NONELECTRIC        4116
OVERNIGHT SHELTER ELECTRIC             4094
EQUESTRIAN ELECTRIC                    3582
GROUP STANDARD ELECTRIC                2918
YURT                                   1634
GROUP HIKE TO                          1505
Shelter Nonelectric                    1451
Standard Nonelectric            

Guests select a wide variety of lodging options.  This means host recommendations can be varied as well and, more pointedly, suggests gaps in market offerings that hosts near popular destinations could fill.  Standard nonelectric sites (meaning no built facilities at the location) are the most popular at about 1.03 million reservations, but given the variety of locations guests choose, this more likely reflects cost and impact savings on the part of site managers than guest preference.  Standard electric being the second most popular choice (about 845,000 reservations) supports this reading: guests want some conveniences and would likely respond favorably to additional market offerings.
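Some labels in the output above appear in more than one casing (for example `Standard Nonelectric` and `STANDARD NONELECTRIC`). If these are the same facility type, which is an assumption rather than something the data confirms, the counts could be merged by normalizing case first. A minimal sketch on a made-up sample:

```python
import pandas as pd

# Hypothetical sample of site-type labels with inconsistent casing.
sitetype = pd.Series([
    "STANDARD NONELECTRIC", "Standard Nonelectric",
    "Tent Only Nonelectric", "TENT ONLY NONELECTRIC",
    "YURT", None,
])

# Strip whitespace and upper-case so case variants collapse into one label.
# Missing values stay missing and are excluded from the counts.
normalized = sitetype.str.strip().str.upper()
counts = normalized.value_counts()
print(counts)
```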

<h3>Where do people stay?</h3>

Here we are interested in verifying whether the marketplace is diverse enough for a national market.  We will look at both the geographic spread of our overnight guests and the number of locations with a reasonable threshold of activity.  That is, if everyone stayed near Yosemite, we would not have much hope of making informative recommendations to hosts across the country (as we will see, this is not the case).

In [21]:
res19['regiondescription'][res19['usetype']=='Overnight'].value_counts()

Southwestern Div.          285289
Pacific Southwest          264833
Pacific West Region        221246
Intermountain Region       179465
South Atlantic Div         161021
Pacific Northwest          158333
Mississippi Valley         137886
Great Lakes / Ohio R       120409
Rocky Mountain RE          113232
Southeast Region           102295
Intermountain Reg.          94705
Northwestern Div.           91728
Southern Region             80112
Northeast Region            62208
Eastern Region              57062
Southwest Reg.              56655
Midwest Region              45019
Northern Region             41334
Pwr                         36257
South Pacific Div.          33706
Imr                         24222
North Atlantic Div          21069
Alaska Region               17635
National Capitol Region     12176
Great Plains                 5297
Mid Pacific                  5121
Oregon (BLM)                 4612
Nevada (BLM)                 3102
Washington (BLM)             1630
Alaska (BLM)  

In [28]:
locations = res19['parentlocationid'][res19['usetype']=='Overnight'].value_counts()

In [29]:
locations[locations > 1000]

74296       96729
74282       59684
74409       42688
74283       41417
74297       39776
            ...  
74651        1139
10008687     1110
74511        1089
72438        1035
72618        1025
Name: parentlocationid, Length: 303, dtype: int64

Our results indicate that while there is some concentration in the West (as we would expect), overnight stays are spread substantially across the country and across many different sites: 303 parent locations each have more than 1,000 overnight reservations.  This will allow for both regional and site-specific recommendations, improving the granularity of our modelling results.
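Beyond counting the locations above the threshold, it can help to check what share of reservations those locations account for. A minimal sketch on made-up location ids; the names `location_ids`, `threshold`, and `share_covered` are hypothetical, and the threshold is scaled down to suit the toy data:

```python
import pandas as pd

# Hypothetical location ids, one per overnight reservation.
location_ids = pd.Series([101, 101, 101, 101, 202, 202, 202, 303, 404])

counts = location_ids.value_counts()
threshold = 2  # the notebook uses 1000 on the full data
active = counts[counts > threshold]

# Number of locations above the threshold and the share of
# reservations that those locations account for.
n_active = len(active)
share_covered = active.sum() / counts.sum()
print(n_active, share_covered)
```

A high covered share with many active locations would suggest activity is broad; a high share with few locations would suggest it is concentrated.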

<h3>When do people stay?</h3>
Here we quantify seasonality.

In [49]:
dates = res19['startdate'][res19['usetype']=='Overnight']

In [50]:
months = dates.str.slice(5, 7)  # keep characters 5-6 of each date string (the two-digit month)

In [61]:
months.isin(['05', '06', '07', '08', '09', '10']).sum() / len(dates)

0.8653051715915564

In [62]:
len(dates)*.14

344350.86000000004

As we would expect, overnight stays are heavily seasonal: about 87% start in May through October.  The stays from Nov. to Apr. are not insignificant, though, at roughly 13% of overnight reservations, or somewhat over 300,000 (the 14% figure above rounds this up to about 344,000).  Furthermore, future inspection would need to account for regional differences in May, Sep., and Oct.  Overall, there is enough spread here that we can tailor recommendations in our model to where a host site is located.
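A per-month breakdown would make the seasonality easier to inspect than a single peak-season share. The sketch below parses the dates rather than slicing strings. It uses made-up dates and assumes `startdate` begins with an ISO-like `YYYY-MM-DD` prefix, as the slicing above implies.

```python
import pandas as pd

# Hypothetical start dates; the format is an assumption (see lead-in).
startdate = pd.Series([
    "2019-07-04", "2019-07-20", "2019-01-15",
    "2019-10-01", "2019-12-24", None,
])

# Unparseable or missing values become NaT instead of raising.
parsed = pd.to_datetime(startdate, errors="coerce")
month = parsed.dt.month

# Reservations per calendar month (rows with missing dates are excluded).
per_month = month.dropna().astype(int).value_counts().sort_index()

# Share of all rows that start in May through October.
peak_share = month.between(5, 10).sum() / len(startdate)
print(per_month)
print(peak_share)
```

Grouping the same monthly counts by `regiondescription` would be one way to examine the regional differences in the shoulder months.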

<h3>How many people stay?</h3>
Finally, we would like to inspect the variation in how many people use any given reservation.

In [63]:
res19['numberofpeople'][res19['usetype']=='Overnight'].value_counts()

2.0      968812
4.0      410822
6.0      279683
3.0      218532
5.0      160007
          ...  
87.0          1
86.0          1
68.0          1
185.0         1
94.0          1
Name: numberofpeople, Length: 121, dtype: int64

As desired, there is significant variation in the number of guests per reservation.  This raises the possibility that hosts may focus on a particular segment of guests (e.g., couples, families, groups) and, by extension, that we can adjust our recommendations based on all of the above considerations.
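One way to turn party size into guest segments is to bin it with `pd.cut`. The sketch below uses made-up party sizes. The bin edges and segment names are illustrative assumptions, not values taken from the data.

```python
import pandas as pd

# Hypothetical party sizes, one per reservation.
numberofpeople = pd.Series([1, 2, 2, 4, 6, 8, 25, 2.0, 5])

# Illustrative bin edges and labels; intervals are right-inclusive,
# i.e. (0, 2], (2, 6], (6, inf).
bins = [0, 2, 6, float("inf")]
labels = ["1-2 (solo/couple)", "3-6 (family)", "7+ (group)"]
segment = pd.cut(numberofpeople, bins=bins, labels=labels)

segment_counts = segment.value_counts()
print(segment_counts)
```

The resulting segment column could then be crossed with site type, region, or month to see whether the segments behave differently.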