# Capstone Project Introduction and Data

This notebook contains an introduction to the Capstone Data Science project that I have devised, the business problem it intends to address, and a summary of the data that I will use to complete the project.

### Business Problem

I am a keen outdoor enthusiast, as are many others, and one of my favourite hobbies is hiking. Whenever I am on holiday I try to find local hikes that combine the right level of challenge with natural beauty. Relying on random internet searches or individual recommendations, however, often leaves me wondering whether I have made the most of the area I am visiting, or indeed whether I have chosen the best area in which to pursue my hobby.

The problem this project will address is how to select an appropriate location for a hiking holiday. By 'appropriate' I mean a location with access to the right number of hikes, of the right quality and difficulty.

![alt text](https://www.backpacker.com/.image/c_limit%2Ccs_srgb%2Cq_auto:good%2Cw_860/MTQ5NTkxODMyNDA4MzY4NjA0/35541767156_8ba52234a0_o.webp "Logo Title Text 1")

### Data

There are two main data sources I will use for this project:
+ **Hiking Project:** This is a website that offers a free-to-use API that will allow me to retrieve data relating to different hiking routes. I have given an example below. The web address is: https://www.hikingproject.com/data
+ **USA counties:** I will use the counties of the USA as potential 'holidaying locations' for this project. County data, including latitude and longitude co-ordinates, is available at: https://en.wikipedia.org/w/index.php?title=User:Michael_J/County_table&oldid=368803236

### Data retrieval example (hikingproject.com)

The following code sections demonstrate how data relating to hiking trails can be retrieved from Hiking Project.

1. First import libraries:

In [19]:
import requests
import pandas as pd

2. Define API parameters and create url for API request:

In [50]:
lat = 40.0274 # Test location near Boulder, Colorado
long = -105.2519 # Test location near Boulder, Colorado
maxDistance = 15 # Max distance (in miles) from the location above
maxResults = 500 # Max number of results; the API caps this at 500
key = '200581789-f40ea73ef2b0bac87cf5c431217fc969' # My Hiking Project API key

# create the API request URL
url = 'https://www.hikingproject.com/data/get-trails?lat={}&lon={}&maxDistance={}&maxResults={}&key={}'.format(
    lat, 
    long,
    maxDistance,
    maxResults,
    key)
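
As an alternative to formatting the query string by hand, the same URL can be assembled with `urllib.parse.urlencode`, which escapes each value. This is a minimal sketch rather than the method used in this notebook; the helper name `build_trails_url` and the `YOUR_API_KEY` placeholder are assumptions introduced for illustration.

```python
from urllib.parse import urlencode

BASE_URL = 'https://www.hikingproject.com/data/get-trails'


def build_trails_url(lat, lon, max_distance, max_results, key):
    """Return the get-trails request URL for the given search parameters."""
    params = {
        'lat': lat,
        'lon': lon,
        'maxDistance': max_distance,
        'maxResults': max_results,
        'key': key,
    }
    # urlencode escapes each value and joins the pairs with '&'
    return BASE_URL + '?' + urlencode(params)


# 'YOUR_API_KEY' is a placeholder, not a real key
example_url = build_trails_url(40.0274, -105.2519, 15, 500, 'YOUR_API_KEY')
print(example_url)
```

Keeping the parameters in a dictionary also makes it easier to swap in a different latitude and longitude for each county later on.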

3. Make API call:

In [51]:
results = requests.get(url).json()["trails"]
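
The one-line call above assumes that the request succeeds and that the response contains a `trails` key. A more defensive variant might look like the sketch below. The function name `fetch_trails`, the 10-second timeout, and the injectable `get` argument are assumptions added for illustration and are not part of the original notebook.

```python
import requests


def fetch_trails(url, get=requests.get, timeout=10):
    """Request `url` and return the list of trails, or [] if none are present."""
    response = get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of failing later
    response.raise_for_status()
    payload = response.json()
    # Fall back to an empty list if the 'trails' key is missing
    return payload.get('trails', [])
```

With this helper, the call in this step could be written as `results = fetch_trails(url)`.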

4. Check results:

In [52]:
results

[{'id': 7000130,
  'name': 'Bear Peak Out and Back',
  'type': 'Featured Hike',
  'summary': 'A must-do hike for Boulder locals and visitors alike!',
  'difficulty': 'blueBlack',
  'stars': 4.6,
  'starVotes': 108,
  'location': 'Boulder, Colorado',
  'url': 'https://www.hikingproject.com/trail/7000130/bear-peak-out-and-back',
  'imgSqSmall': 'https://cdn-files.apstatic.com/hike/7061992_sqsmall_1566068327.jpg',
  'imgSmall': 'https://cdn-files.apstatic.com/hike/7061992_small_1566068327.jpg',
  'imgSmallMed': 'https://cdn-files.apstatic.com/hike/7061992_smallMed_1566068327.jpg',
  'imgMedium': 'https://cdn-files.apstatic.com/hike/7061992_medium_1566068327.jpg',
  'length': 5.7,
  'ascent': 2541,
  'descent': -2540,
  'high': 8342,
  'low': 6103,
  'longitude': -105.2755,
  'latitude': 39.9787,
  'conditionStatus': 'All Clear',
  'conditionDetails': '',
  'conditionDate': '2019-08-10 16:37:58'},
 {'id': 7004226,
  'name': "Sunshine Lion's Lair Loop",
  'type': 'Featured Hike',
  'summary

5. Loop through results to load a pandas dataframe, and check this has completed successfully:

In [53]:
columns = ['name', 'type', 'difficulty', 'stars', 'starVotes',
           'location', 'length', 'ascent', 'descent',
           'high', 'low', 'latitude', 'longitude']

# Collect one row per trail, keeping only the columns of interest
rows = [[trail[col] for col in columns] for trail in results]

# Build the DataFrame in one step (DataFrame.append was removed in pandas 2.0)
df = pd.DataFrame(rows, columns=columns)

df

| | name | type | difficulty | stars | starVotes | location | length | ascent | descent | high | low | latitude | longitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Bear Peak Out and Back | Featured Hike | blueBlack | 4.6 | 108 | Boulder, Colorado | 5.7 | 2541 | -2540 | 8342 | 6103 | 39.9787 | -105.2755 |
| 1 | Sunshine Lion's Lair Loop | Featured Hike | blue | 4.5 | 103 | Boulder, Colorado | 5.3 | 1261 | -1282 | 6800 | 5530 | 40.0200 | -105.2979 |
| 2 | Boulder Skyline Traverse | Featured Hike | black | 4.7 | 70 | Superior, Colorado | 16.3 | 5409 | -5492 | 8492 | 5417 | 39.9388 | -105.2582 |
| 3 | Royal Arch Out and Back | Featured Hike | blueBlack | 4.4 | 145 | Boulder, Colorado | 3.3 | 1311 | -1312 | 6917 | 5691 | 39.9997 | -105.2830 |
| 4 | Walker Ranch | Featured Hike | blueBlack | 4.5 | 118 | Coal Creek, Colorado | 7.6 | 1594 | -1585 | 7335 | 6439 | 39.9511 | -105.3378 |
| 5 | Green Mountain via Ranger/Saddle Rock Loop | Featured Hike | blueBlack | 4.5 | 76 | Boulder, Colorado | 4.9 | 2305 | -2277 | 8099 | 5806 | 39.9975 | -105.2928 |
| 6 | Mount Sanitas Loop | Featured Hike | blueBlack | 4.1 | 102 | Boulder, Colorado | 3.2 | 1281 | -1280 | 6780 | 5521 | 40.0202 | -105.2977 |
| 7 | Betasso Preserve | Featured Hike | blue | 4.2 | 58 | Boulder, Colorado | 6.7 | 776 | -778 | 6575 | 6178 | 40.0164 | -105.3446 |
| 8 | Mountain Lion Trail | Featured Hike | blue | 4.5 | 50 | Coal Creek, Colorado | 6.9 | 1531 | -1509 | 8901 | 7627 | 39.8505 | -105.3606 |
| 9 | Golden Gate Canyon State Park Loop | Featured Hike | blueBlack | 4.4 | 24 | Coal Creek, Colorado | 13.2 | 2232 | -2230 | 9493 | 8228 | 39.8330 | -105.4084 |
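
Once the trails are in a DataFrame, they can be summarised per location to compare the number and quality of hikes, which is the comparison the business problem calls for. The sketch below is one possible approach, not the method this project has settled on. It re-creates a few columns from the ten rows shown above so that it runs on its own; the names `trails` and `summary` are chosen for illustration.

```python
import pandas as pd

# A few columns copied from the ten trails in the output above
trails = pd.DataFrame({
    'name': ['Bear Peak Out and Back', "Sunshine Lion's Lair Loop",
             'Boulder Skyline Traverse', 'Royal Arch Out and Back',
             'Walker Ranch', 'Green Mountain via Ranger/Saddle Rock Loop',
             'Mount Sanitas Loop', 'Betasso Preserve',
             'Mountain Lion Trail', 'Golden Gate Canyon State Park Loop'],
    'location': ['Boulder, Colorado', 'Boulder, Colorado',
                 'Superior, Colorado', 'Boulder, Colorado',
                 'Coal Creek, Colorado', 'Boulder, Colorado',
                 'Boulder, Colorado', 'Boulder, Colorado',
                 'Coal Creek, Colorado', 'Coal Creek, Colorado'],
    'stars': [4.6, 4.5, 4.7, 4.4, 4.5, 4.5, 4.1, 4.2, 4.5, 4.4],
    'length': [5.7, 5.3, 16.3, 3.3, 7.6, 4.9, 3.2, 6.7, 6.9, 13.2],
})

# Count trails and average the rating and length within each location
summary = (
    trails.groupby('location')
          .agg(n_trails=('name', 'count'),
               mean_stars=('stars', 'mean'),
               mean_length=('length', 'mean'))
          .sort_values('n_trails', ascending=False)
)
print(summary)
```

The same grouping could later be applied per county once trails have been retrieved for each county's co-ordinates.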
