### Making Main: How to ping an API, make a Distance Matrix, and call both at once
Today, we are going to work through the data collection method for this project, step by step. In order, here are the steps we are looking to accomplish:
-  Find the distance between two locations
-  Repeat for every possible route in our list
-  Save the collected data in matrix format
-  Cache the data for later
-  Make sure that we can load the cache

Let's start with the import statements:

In [4]:
# Import statements
import json # Used to grab certain parts of the collected data, load key
import requests # How we actually contact the API
import pandas as pd # Excel but for python
import numpy as np # Set up for later math
import os # Navigating the file system

# Open API Key - For telling Google we are indeed allowed to do this stuff
with open("../Keys.json", "r") as data:
    Keys = json.load(data)
key = Keys['googleMaps']

Now that we have all the required packages, we can start to do the actual programming. First step, define a function that goes and gets the data from Google Maps' directions API. An API is sort of like a library. If you go and ask the librarian for a specific book, they'll give it to you. With an API, the librarian is our ```requests``` module. The librarian in a real library can help you best if you give it the dewey decimal associated with the book. The dewey decimal for us is a url, which we change based on what information we want. Let's define a function that will get the data for us:

In [14]:
# Comments within the function represent what steps need to be taken for this function
def RouteCaller(loc1:str, loc2:str, units:str='metric', routeType:str='Bicycle', key:str=None):
    # print(f'Getting data for {loc1} to {loc2}')
    # Verify Key exists
    if key == None:
        print('Error: You do not have an API key, and cannot access the Library')
        return
    # Ping the API
    if loc1 != loc2:
        returnValue = requests.get(
            f"https://maps.googleapis.com/maps/api/directions/json?destination={loc1}&origin={loc2}&units={units}&mode={routeType}&key={key}").json()
        # Return only the distance of the route
        return float(returnValue['routes'][0]['legs'][0]['distance']['text'][:-3])
    else:
        return

Now that we have a function that calls the API, let's test it and see what we get! The results for these should be 1.2 and 0. respectively

In [7]:
# Result in 1.2
test1 = RouteCaller('305+E+23rd+St+Austin+TX+78712', '300+W+21st+St+Austin+TX+78712', key=key)
print(test1)
# Result in 0.
test2 = RouteCaller('305+E+23rd+St+Austin+TX+78712', '305+E+23rd+St+Austin+TX+78712',key=key)
print(test2)

Getting data for 305+E+23rd+St+Austin+TX+78712 to 300+W+21st+St+Austin+TX+78712
2.0
Getting data for 305+E+23rd+St+Austin+TX+78712 to 305+E+23rd+St+Austin+TX+78712
None


### This function is great, but what if I have a bunch of places to go to?
Great question Stephen! The first part of answering this question is to ask another question- **How do we store where we want to go?** This is where we begin to use Pandas. Pandas is pretty much just google sheets, but in Python. Instead of a spreadsheet, we use this fancy thing called a ```DataFrame```.  ```DataFrames``` are pretty cool, and can read data from .csv files super easy. We can start by importing our data.

To access data, we use a ```with open('name of file', 'mode') as ____:``` statement. What this does is only keep the file open as long as we need to. If we keep it open, then it can start to mess with the memory of our computer until killed. The ```'name of file'``` argument is just giving python directions on where to find the info. Next, we have ```mode```, which tells python what we are going to be doing to this file. The two main ```modes``` we can use are ```"r"```, read mode, or ```"w"```, write mode. Finally, we need to give the file a temporary name, which goes in the blank spot. This is just how we will refer to it when grabbing information. A common name to give it in this application is ```data```. Let's try this out: 

In [8]:
# For ease of use, lets give our file a generic name so we can access it later
FILENAME = 'AddressesTest'

with open(f"../Data/{FILENAME}.csv", "r") as data: # f"" is the same as "".format(), but cleaner
    df = pd.read_csv(data)
    # Make the addresses API friendly 
    # (API does not like the spaces or commas. I have already removed the commas but not the spaces from our test)
    df = df.replace(' ', '+', regex=True)
print(df)

  Name                                            Address
0  RLP                      305+E+23rd+St+Austin+TX+78712
1  HRC                      300+W+21st+St+Austin+TX+78712
2  CNS  Will+C.Hogg+Bldg,+120+Inner+Campus+Drive,+Aust...
3  BUR              2505+University+Ave,+Austin,+TX+78712
4  CPE             200+E+Dean+Keeton+St,+Austin,+TX+78712


Now we have the data in an easy location to manipulate. We can begin writing a function that iterates over the data and spits out the distances we are looking for, since we know what format the data is in now.

In [9]:
def generateDistanceMatrix(df:pd.DataFrame, key:str=None):
    # Set up local storage
    distDict = {}
    arrays = []
    # Iterate through the DataFrame
    for index1, row1 in df.iterrows():
        distDict[index1] = []
        for index2, row2 in df.iterrows():
            if index1 == index2:
                # If same location, we already know the distance, just append 0
                distDict[index1].append(0)
            else:
                # Ping the API using the function we just made
                RouteDistance = RouteCaller(df.iloc[index1]['Address'], df.iloc[index2]['Address'], key=key)
                distDict[index1].append(RouteDistance)
    # Process the data into a form we want
    print(distDict)
    for key in distDict.keys():
        arrays.append(np.array(distDict[key]))
    print(arrays)
    distMatrix = np.vstack(arrays)
    # Return
    return distMatrix

Again, let's test our function to make sure that we are getting what we want:

In [15]:
test3 = generateDistanceMatrix(df, key)
print(type(test3))

{0: [0, 2.0, 2.3, 1.1, 0.9], 1: [1.8, 0, 1.5, 1.0, 1.3], 2: [1.7, 1.0, 0, 0.8, 1.0], 3: [1.2, 0.9, 1.2, 0, 0.5], 4: [0.9, 1.3, 1.6, 0.4, 0]}
[array([0. , 2. , 2.3, 1.1, 0.9]), array([1.8, 0. , 1.5, 1. , 1.3]), array([1.7, 1. , 0. , 0.8, 1. ]), array([1.2, 0.9, 1.2, 0. , 0.5]), array([0.9, 1.3, 1.6, 0.4, 0. ])]
<class 'numpy.ndarray'>


### Awesome! We have a distance matrix all set up, so we're done right?

Not quite! But we're really close. If we were to have a stupid big list of addresses, which we might, this API call stuff will just eat time every time we want to run the problem. So now, we need to do this thing called ```caching```. ```Caching``` is just the term for saving data. We're going to put all of our saved data into a ```.npy``` file, so that we can quickly pull it up later when we want to. The way you save to a ```.npy``` file is pretty simple:

In [16]:
# Since we already have test3 setup as a Distance Matrix, we can just use it 
np.save(FILENAME,test3)

### But, how do we access it now that it's stuck in that file? And if we have a different data set, how do we make go get that data and save it as well?

The last thing we have to do for this part is called **validation**. It's just a fancy word for checking that our data exists, and if it doesn't, make it. Also, we want to store all the cached data in the same place, so let's also make a folder for that too.

In [17]:
def validateCache(filename:str,df:np.ndarray,key:str=None):
    # Define where the cache should be
    cache_filename = os.path.splitext(filename)[0] + ".npy"
    # Check for if the parent folder exists, if not make one
    if not os.path.exists(os.path.join(os.path.dirname(__file__),"CachedDistances")):
        os.mkdir(os.path.join(os.path.dirname(__file__),"CachedDistances"))
    # Check if the distance matrix doesn't exist, if so make one
    if not os.path.exists(os.path.join(os.path.dirname(__file__),"CachedDistances",cache_filename)):
        distMatrix = generateDistanceMatrix(df, key)
        np.save(os.path.join(os.path.dirname(__file__),"CachedDistances",cache_filename),distMatrix)
    # If the matrix exists, just load it
    else:
        distMatrix = np.load(os.path.join(os.path.dirname(__file__), "CachedDistances", cache_filename))
    # Return the matrix
    return distMatrix

validateCache('AddressesTest',df=df, key=key)



'''
adsfadsf
afdsadsf
'''

NameError: name '__file__' is not defined

### And that's pretty much it.

There's some other fancy stuff that we can do in a ```.py``` file, which we can get into later. Next up, we can start to turn our eyes on formulating the constraints...