# Scraping [Centro API](https://www.centro.org/misc-group/bus-tracker) to Gather Visualization Data:

### Goal:
   My goal for this project is to write a relatively concise function/script that scrapes Centro's Bus Tracker API and records geodata and transportation data for buses in the Syracuse, NY area.

### Function:

   "centro_scrape" takes 5 arguments:
        timein: number of seconds to wait before calling the API again
        url: URL used to call the API (includes the key)
        i: counter; pass 0 to start from the first scrape
        lines: how many lines of data you want scraped
        csv: name of the output CSV file

### Results: 

   The function does what was asked. It could be tweaked (the starting value of i, the polling interval, etc.) and made more concise, but for my purposes it works as required and produces a CSV that can be easily evaluated in pandas.
  
### Further Thoughts:

   Now that I have this information, I would love to begin a visualization in ArcGIS to view patterns. It might also be interesting to compare visualizations of a normal day versus a blizzard (very appropriate for 'Cuse!)
    
   Inspiration for visualization is [linked](https://tjukanov.org/gulfoffinland)
    
   Also, I am interested in cleaning up the function that calls multiple vehicles.
   
   
### Scroll to the bottom for a first attempt at animated data visualizations!





In [None]:
#if this cell fails on start, check the bus map and use a currently active vid in the api call

import requests
import csv
import datetime
import calendar
import time
import json



key = '???'
#api key

url = 'https://bus-time.centro.org/bustime/api/v3/getvehicles?key='+key+'&vid=1217&format=json'
#url through which we call api


r = requests.get(url)
#call the api

data = r.json()['bustime-response']['vehicle'][0]
#first (and only) vehicle in the response; inspect r.json() to view the full json format



values = data['vid'],data['spd'],data['lat'],data['lon'],data['tmstmp']
#now let's put the features I'm interested in into values

values
#check
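
The lookup above assumes the response always contains a `vehicle` list. As a minimal sketch, a small helper can return an empty list when that key is missing, for example when the requested vid is not currently active. The helper name `extract_vehicles` and the sample payloads are made up for illustration and only mirror the JSON shape used in this notebook.

```python
def extract_vehicles(payload):
    """Return the list of vehicle dicts from a decoded response, or [] if absent."""
    response = payload.get('bustime-response', {})
    return response.get('vehicle', [])


# Hypothetical payloads shaped like the JSON used in this notebook
ok = {'bustime-response': {'vehicle': [
    {'vid': '1217', 'spd': 12, 'lat': '43.05', 'lon': '-76.15',
     'tmstmp': '20200101 12:00'}]}}
empty = {'bustime-response': {}}

print(len(extract_vehicles(ok)))   # one vehicle in the sample payload
print(extract_vehicles(empty))     # no 'vehicle' key, so an empty list
```

With a helper like this, the scraping loop could skip a cycle when nothing comes back instead of stopping with a `KeyError`.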




In [None]:


#Function 
from csv import writer
def centro_scrape(timein,url,i,lines,csv):
#define function with parameters

    with open(csv, 'a+',newline ='') as f:
        csv_writer = writer(f)
        csv_writer.writerow(['VID','MPH','LAT','LON','TIME'])            
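#writes header (note: the file is opened in append mode, so calling the function again on the same file adds another header row)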
    
    while i < lines:
        r = requests.get(url)
        data = r.json()['bustime-response']['vehicle'][0]
        values = data['vid'], str(data['spd']), data['lat'], data['lon'], data['tmstmp']

#stores each feature that I'm interested in from each call into a tuple
        
        with open(csv, 'a+',newline ='') as f:
            csv_writer = writer(f)
            csv_writer.writerow(list(values))

#write rows 

        time.sleep(timein)
        
#function sleeps before we start again (timein gives sleep value)
        
        print("scrapes completed:",i+1)
        i +=1
#counter

    else:
        print('done')


In [None]:
centro_scrape(5,url,0,20,'filename.csv')
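
Because `centro_scrape` opens the file in append mode and writes the header every time it is called, running it twice against the same file leaves a second header row in the middle of the data. One possible way around this, sketched here with a hypothetical helper `append_rows`, is to write the header only when the file is missing or empty.

```python
import os
from csv import writer


def append_rows(path, header, rows):
    """Append rows to a CSV, writing the header only if the file is new or empty."""
    needs_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, 'a', newline='') as f:
        csv_writer = writer(f)
        if needs_header:
            csv_writer.writerow(header)
        csv_writer.writerows(rows)
```

Inside the scraping loop, a call such as `append_rows(csv, ['VID','MPH','LAT','LON','TIME'], [values])` would then cover both the header and the row.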

# Below I attempt to scrape multiple buses:

Inefficient method

In [None]:

key = '???'
#api key

busses = '1626,1251,1755,1203,1774'
url = 'https://bus-time.centro.org/bustime/api/v3/getvehicles?key='+key+'&vid='+busses+'&format=json'
#url through which we call api
#note multiple vids in call

r = requests.get(url)
#call 

data = r.json()['bustime-response']['vehicle']
#list of vehicles, one dict per bus; inspect r.json() to view the full json format
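
Building the URL by hand makes it easy for the list of vids and the URL to drift apart. As a rough sketch, a hypothetical helper `build_url` can assemble the same query string from a Python list. The base URL and parameter names are copied from the cell above; the key `'MY_KEY'` is a placeholder.

```python
from urllib.parse import urlencode

BASE = 'https://bus-time.centro.org/bustime/api/v3/getvehicles'


def build_url(key, vids):
    """Build a getvehicles URL for a list of vehicle ids."""
    params = {'key': key, 'vid': ','.join(str(v) for v in vids), 'format': 'json'}
    # safe=',' keeps the commas between vids unescaped, matching the URL above
    return BASE + '?' + urlencode(params, safe=',')


print(build_url('MY_KEY', [1626, 1251, 1755, 1203, 1774]))
```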



In [None]:

from csv import writer
def centro_scrape(timein,url,i,lines,csv):
#define function with parameters

    with open(csv, 'a+',newline ='') as f:
        csv_writer = writer(f)
        csv_writer.writerow(['SCRAPE','VID','MPH','LAT','LON','TIME'])            
#writes header (one name per value written below, including the scrape number)
    
    while i < lines:
        r = requests.get(url)
        vehicles = r.json()['bustime-response']['vehicle']
        bus_one = vehicles[0]
        bus_two = vehicles[1]
        bus_three = vehicles[2]
        bus_four = vehicles[3]
        
        values_one = i+1,bus_one['vid'], str(bus_one['spd']), bus_one['lat'], bus_one['lon'], bus_one['tmstmp']
        values_two = i+1,bus_two['vid'], str(bus_two['spd']), bus_two['lat'], bus_two['lon'], bus_two['tmstmp']
        values_three = i+1,bus_three['vid'], str(bus_three['spd']), bus_three['lat'], bus_three['lon'], bus_three['tmstmp']
        values_four = i+1,bus_four['vid'], str(bus_four['spd']), bus_four['lat'], bus_four['lon'], bus_four['tmstmp']
        
        
#stores each feature that I'm interested in from each call into a tuple (only the first four buses returned are recorded)
        
        with open(csv, 'a+',newline ='') as f:
            csv_writer = writer(f)
            csv_writer.writerow(list(values_one))
            csv_writer.writerow(list(values_two))
            csv_writer.writerow(list(values_three))
            csv_writer.writerow(list(values_four))


#write rows 

        time.sleep(timein)
        
#function sleeps before we start again (timein gives sleep value)
        
        print("scrapes completed:",i+1)
        i +=1
#counter

    else:
        print('done')

In [None]:
centro_scrape(5,url,0,90,'filename.csv')

#use function to scrape multiple bus locations and speed for approx 7.5 mins
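
The four near-identical blocks in the function above can be collapsed into a loop. A rough sketch, using a hypothetical helper `rows_from_vehicles` and made-up sample data, that builds one row per vehicle however many the API returns:

```python
FIELDS = ('vid', 'spd', 'lat', 'lon', 'tmstmp')


def rows_from_vehicles(scrape_number, vehicles):
    """Return one [scrape_number, vid, spd, lat, lon, tmstmp] row per vehicle."""
    return [[scrape_number] + [bus[field] for field in FIELDS] for bus in vehicles]


# Made-up sample with two buses
sample = [
    {'vid': '1626', 'spd': 20, 'lat': '43.04', 'lon': '-76.14', 'tmstmp': '20200101 12:00'},
    {'vid': '1251', 'spd': 0, 'lat': '43.05', 'lon': '-76.15', 'tmstmp': '20200101 12:00'},
]
for row in rows_from_vehicles(1, sample):
    print(row)
```

Inside the loop, `csv_writer.writerows(rows_from_vehicles(i + 1, vehicles))` would replace the four separate `writerow` calls, and a bus that drops out of the response would no longer raise an `IndexError`.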

In [None]:
import pandas as pd

df = pd.read_csv('filename.csv')
#read the csv into a DataFrame

In [None]:
df['TIME'] = df['TIME'].astype('datetime64[ns]')

#cast TIME to a datetime type for QGIS modeling
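
If the automatic cast ever misreads the timestamps, they can be parsed explicitly. The sketch below assumes the `tmstmp` strings look like `YYYYMMDD HH:MM` or `YYYYMMDD HH:MM:SS`; that format is an assumption and should be checked against a row of the actual CSV. `parse_tmstmp` is a hypothetical helper name.

```python
from datetime import datetime

# Assumed formats; verify against the real data before relying on them
FORMATS = ('%Y%m%d %H:%M:%S', '%Y%m%d %H:%M')


def parse_tmstmp(text):
    """Parse a timestamp string using the first assumed format that matches."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError('unrecognized timestamp: ' + repr(text))
```

In pandas this could be applied with `df['TIME'].map(parse_tmstmp)`.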

In [None]:
df.to_csv('for_map.csv')

#write the converted data back out for mapping in QGIS

### A prototype of time series visualization using QGIS:

![here](https://github.com/mrgonzal-SU/Visualizations/blob/master/centro_bus_vis.gif?raw=true)