# Scraping [Centro API](https://www.centro.org/misc-group/bus-tracker) to Gather Visualization Data:

### Goal:
   The goal of this project is to write a relatively concise function/script that scrapes Centro's Bus Tracker API and records geodata and transportation data for buses in the Syracuse, NY area.

### Function:

   "centro_scrape" takes 5 arguments:
        timein: number of seconds to wait before calling the API again
        url: URL used to call the API (includes the key)
        i: counter; pass 0 to start from the first scrape
        lines: how many scrapes (rows of data) to record
        csv: name of the output CSV file

### Results: 

   The function does what was asked. It can be altered (starting `i`, polling speed, etc.) and could be more concise. For my purposes it works as required and produces a CSV that can be easily evaluated in pandas.
  
### Further Thoughts:

   Now that I have this information, I would love to begin a visualization in ArcGIS to look for patterns. It might also be interesting to compare visualizations of a normal day versus a blizzard (very appropriate for 'Cuse!).
    Inspiration for the visualization is [linked](https://tjukanov.org/gulfoffinland).
    
   Next I want to make this function scalable so that I can scrape information for MULTIPLE buses (different VIDs).
    
   This alteration would require modifying the URL, i.e. the API call.
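
A minimal sketch of that URL change, assuming the `getvehicles` endpoint accepts a comma-separated `vid` list (which is how the multi-bus call later in this notebook is written). `build_url` and its parameters are hypothetical names, and `YOUR_KEY` is a placeholder for a real key.

```python
BASE = 'https://bus-time.centro.org/bustime/api/v3/getvehicles'

def build_url(key, vids):
    """Return a getvehicles URL for one or more vehicle ids (hypothetical helper)."""
    # join the ids into the comma-separated form used by the vid parameter
    vid_param = ','.join(str(v) for v in vids)
    return BASE + '?key=' + key + '&vid=' + vid_param + '&format=json'

# a single bus and several buses go through the same helper
print(build_url('YOUR_KEY', [1217]))
print(build_url('YOUR_KEY', [1626, 1251, 1755]))
```

Keeping the ids in a Python list means the set of buses can be changed without editing the URL string by hand.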
    
   





In [185]:
# if this fails on start, check the bus map and use an active vid in the API call

import requests
import csv
import datetime
import calendar
import time
import json



key = '6ZVpUKhRXYJp2tqhdMNCsKUVM'
#api key

url = 'https://bus-time.centro.org/bustime/api/v3/getvehicles?key='+key+'&vid=1217&format=json'
#url through which we call api


r = requests.get(url)
#call 

data = r.json()['bustime-response']['vehicle'][0]
# the vehicle record is nested under 'bustime-response' -> 'vehicle';
# indexing the top-level r.json() directly raises KeyError: 'vid'
# (inspect r.json() to view the full JSON format)



values = data['vid'],data['spd'],data['lat'],data['lon'],data['tmstmp']
# put the features I'm interested in into values

values
# check



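
The lookup for `'vid'` only works once the nested vehicle record has been reached. Below is a small sketch of the response shape that the scraping function indexes into. The `sample` payload is hand-made for illustration (its values are copied from rows recorded in this notebook, not from a live call), and `extract_values` is a hypothetical helper; only the key names come from the code in this notebook.

```python
# hand-made sample mirroring the nesting used in this notebook
sample = {
    'bustime-response': {
        'vehicle': [
            {'vid': '1217', 'spd': 24, 'lat': '43.013712',
             'lon': '-76.117752', 'tmstmp': '20200210 21:11'},
        ]
    }
}

def extract_values(payload, index=0):
    """Pull the five fields of interest from one vehicle record."""
    record = payload['bustime-response']['vehicle'][index]
    return (record['vid'], record['spd'], record['lat'],
            record['lon'], record['tmstmp'])

print(extract_values(sample))

# the top-level object has no 'vid' key, so this lookup fails
try:
    sample['vid']
except KeyError as err:
    print('KeyError:', err)
```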

In [139]:


#Function


from csv import writer

def centro_scrape(timein, url, i, lines, csv):
    # timein: seconds between calls, url: API call, i: starting counter,
    # lines: number of scrapes, csv: output file name

    # write the header row (the file is opened in append mode, so calling
    # the function again on the same file adds another header row)
    with open(csv, 'a+', newline='') as f:
        csv_writer = writer(f)
        csv_writer.writerow(['VID', 'MPH', 'LAT', 'LON', 'TIME'])

    while i < lines:
        r = requests.get(url)
        data = r.json()['bustime-response']['vehicle'][0]

        # store the features I'm interested in from this call
        values = data['vid'], str(data['spd']), data['lat'], data['lon'], data['tmstmp']

        # append one row per call
        with open(csv, 'a+', newline='') as f:
            csv_writer = writer(f)
            csv_writer.writerow(list(values))

        # sleep before the next call (timein gives the sleep value in seconds)
        time.sleep(timein)

        print("scrapes completed:", i+1)
        i += 1  # counter

    else:
        print('done')
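
Because the CSV is opened in append mode, running the function twice against the same file name writes the header row twice. One possible way around that, sketched here under the assumption that an existing non-empty file already has its header, is to write the header only when the file is new or empty. `append_row` and `HEADER` are hypothetical names, not part of the function above.

```python
import os
from csv import writer

HEADER = ['VID', 'MPH', 'LAT', 'LON', 'TIME']

def append_row(path, row, header=HEADER):
    """Append one row to path; write the header first only if the file is new or empty."""
    needs_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, 'a', newline='') as f:
        csv_writer = writer(f)
        if needs_header:
            csv_writer.writerow(header)
        csv_writer.writerow(row)
```

Inside the scraping loop, a call such as `append_row(csv, list(values))` could replace both `with open(...)` blocks.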


In [141]:
centro_scrape(5,url,0,20,'try_me.csv')

scrapes completed: 1
scrapes completed: 2
scrapes completed: 3
scrapes completed: 4
scrapes completed: 5
scrapes completed: 6
scrapes completed: 7
scrapes completed: 8
scrapes completed: 9
scrapes completed: 10
scrapes completed: 11
scrapes completed: 12
scrapes completed: 13
scrapes completed: 14
scrapes completed: 15
scrapes completed: 16
scrapes completed: 17
scrapes completed: 18
scrapes completed: 19
scrapes completed: 20
done


In [157]:
import pandas as pd

df = pd.read_csv('try_me.csv')
df['TIME']

0     20200210 21:11
1     20200210 21:11
2     20200210 21:11
3     20200210 21:11
4     20200210 21:11
5     20200210 21:11
6     20200210 21:11
7     20200210 21:11
8     20200210 21:11
9     20200210 21:12
10    20200210 21:12
11    20200210 21:12
12    20200210 21:12
13    20200210 21:12
14    20200210 21:12
15    20200210 21:12
16    20200210 21:12
17    20200210 21:12
18    20200210 21:12
19    20200210 21:12
Name: TIME, dtype: object

In [153]:

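df.to_csv('use_me.csv')
# to_csv also writes the index; when the file is read back it shows up as an
# 'Unnamed: 0' column (pass index=False to to_csv to avoid this)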

newp = pd.read_csv('use_me.csv')


Unnamed: 0.1,Unnamed: 0,MPH,LAT,LON,TIME
0,0,24,43.013712,-76.117752,20200210 21:11
1,1,19,43.011478,-76.1168,20200210 21:11
2,2,19,43.011478,-76.1168,20200210 21:11
3,3,11,43.010921,-76.116261,20200210 21:11
4,4,11,43.010921,-76.116261,20200210 21:11
5,5,11,43.010921,-76.116261,20200210 21:11
6,6,11,43.010921,-76.116261,20200210 21:11
7,7,11,43.010921,-76.116261,20200210 21:11
8,8,11,43.010921,-76.116261,20200210 21:11
9,9,0,43.010877,-76.116194,20200210 21:12


# Below I attempt to scrape multiple buses:

Inefficient method

In [186]:

key = '6ZVpUKhRXYJp2tqhdMNCsKUVM'
#api key

busses = '1237,1906,1212,1251,1629'
# note: this list is not used below; the vids actually requested are the ones written into the url
url = 'https://bus-time.centro.org/bustime/api/v3/getvehicles?key='+key+'&vid=1626,1251,1755,1203,1774&format=json'
#url through which we call api


r = requests.get(url)
#call 

data = r.json()['bustime-response']['vehicle']
# list of vehicle records, one per bus returned (inspect r.json() to view the full JSON format)



In [187]:

from csv import writer

def centro_scrape(timein, url, i, lines, csv):
    # same parameters as before; url now requests several vids

    # write the header row; each data row below also carries a leading
    # scrape number that has no header name
    with open(csv, 'a+', newline='') as f:
        csv_writer = writer(f)
        csv_writer.writerow(['VID', 'MPH', 'LAT', 'LON', 'TIME'])

    while i < lines:
        r = requests.get(url)

        # parse the response once, then take the first four vehicles returned
        vehicles = r.json()['bustime-response']['vehicle']
        bus_one = vehicles[0]
        bus_two = vehicles[1]
        bus_three = vehicles[2]
        bus_four = vehicles[3]

        # store the features I'm interested in for each bus, tagged with the scrape number
        values_one = i+1, bus_one['vid'], str(bus_one['spd']), bus_one['lat'], bus_one['lon'], bus_one['tmstmp']
        values_two = i+1, bus_two['vid'], str(bus_two['spd']), bus_two['lat'], bus_two['lon'], bus_two['tmstmp']
        values_three = i+1, bus_three['vid'], str(bus_three['spd']), bus_three['lat'], bus_three['lon'], bus_three['tmstmp']
        values_four = i+1, bus_four['vid'], str(bus_four['spd']), bus_four['lat'], bus_four['lon'], bus_four['tmstmp']

        # write one row per bus
        with open(csv, 'a+', newline='') as f:
            csv_writer = writer(f)
            csv_writer.writerow(list(values_one))
            csv_writer.writerow(list(values_two))
            csv_writer.writerow(list(values_three))
            csv_writer.writerow(list(values_four))

        # sleep before the next call (timein gives the sleep value in seconds)
        time.sleep(timein)

        print("scrapes completed:", i+1)
        i += 1  # counter

    else:
        print('done')
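
The function above names each bus by position, so it assumes at least four vehicles come back on every call. A less repetitive alternative, sketched here, is to loop over whatever the response contains. `rows_from_response` is a hypothetical helper, the `sample` payload is hand-made for illustration, and treating a missing `vehicle` key as "no buses returned" is an assumption about the API rather than documented behavior.

```python
def rows_from_response(payload, scrape_no):
    """Build one CSV row per vehicle in the payload, tagged with the scrape number."""
    # assumption: if no vehicles are returned, the 'vehicle' key may be absent
    vehicles = payload['bustime-response'].get('vehicle', [])
    return [
        [scrape_no, v['vid'], str(v['spd']), v['lat'], v['lon'], v['tmstmp']]
        for v in vehicles
    ]

# hand-made sample with two vehicles, using the key names from this notebook
sample = {
    'bustime-response': {
        'vehicle': [
            {'vid': '1774', 'spd': 20, 'lat': '43.005692',
             'lon': '-76.168037', 'tmstmp': '20200210 22:36'},
            {'vid': '1755', 'spd': 15, 'lat': '43.039398',
             'lon': '-76.131599', 'tmstmp': '20200210 22:36'},
        ]
    }
}

for row in rows_from_response(sample, 1):
    print(row)
```

The rows can then be written in one go with `csv_writer.writerows(rows)`, whatever number of buses is active.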

In [189]:
centro_scrape(5,url,0,90,'qgis_centro2')

scrapes completed: 1
scrapes completed: 2
scrapes completed: 3
scrapes completed: 4
scrapes completed: 5
scrapes completed: 6
scrapes completed: 7
scrapes completed: 8
scrapes completed: 9
scrapes completed: 10
scrapes completed: 11
scrapes completed: 12
scrapes completed: 13
scrapes completed: 14
scrapes completed: 15
scrapes completed: 16
scrapes completed: 17
scrapes completed: 18
scrapes completed: 19
scrapes completed: 20
scrapes completed: 21
scrapes completed: 22
scrapes completed: 23
scrapes completed: 24
scrapes completed: 25
scrapes completed: 26
scrapes completed: 27
scrapes completed: 28
scrapes completed: 29
scrapes completed: 30
scrapes completed: 31
scrapes completed: 32
scrapes completed: 33
scrapes completed: 34
scrapes completed: 35
scrapes completed: 36
scrapes completed: 37
scrapes completed: 38
scrapes completed: 39
scrapes completed: 40
scrapes completed: 41
scrapes completed: 42
scrapes completed: 43
scrapes completed: 44
scrapes completed: 45
scrapes completed: 

SSLError: HTTPSConnectionPool(host='bus-time.centro.org', port=443): Max retries exceeded with url: /bustime/api/v3/getvehicles?key=6ZVpUKhRXYJp2tqhdMNCsKUVM&vid=1626,1251,1755,1203,1774&format=json (Caused by SSLError(SSLZeroReturnError(6, 'TLS/SSL connection has been closed (EOF) (_ssl.c:841)'),))
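
The run stopped at the 46th call because one failed connection raised an exception inside the loop. A possible guard, sketched here, is to retry a failed call a few times before giving up. `call_with_retry` and its parameters are hypothetical names, and the retry count and wait time are arbitrary choices, not values recommended by Centro.

```python
import time

def call_with_retry(fetch, retries=3, wait=5, exceptions=(Exception,)):
    """Call fetch(); on a listed exception, wait and try again, up to `retries` attempts."""
    for attempt in range(1, retries + 1):
        try:
            return fetch()
        except exceptions as err:
            if attempt == retries:
                raise  # out of attempts: let the error surface
            print('attempt', attempt, 'failed:', err)
            time.sleep(wait)
```

In the scraping loop this could be used as `r = call_with_retry(lambda: requests.get(url, timeout=10), exceptions=(requests.exceptions.RequestException,))`, since `SSLError` is a subclass of `RequestException` in `requests`.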

In [190]:
import pandas as pd

df = pd.read_csv('qgis_centro2')

In [192]:
df['TIME'] = df['TIME'].astype('datetime64[ns]')
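
The `astype` call leaves pandas to infer the timestamp layout. As a sketch of what an explicit parse looks like, assuming every `tmstmp` value follows the `YYYYMMDD HH:MM` pattern seen in the first CSV (for example `20200210 21:11`), the standard library can do it with a format string. `parse_tmstmp` and `TMSTMP_FORMAT` are hypothetical names.

```python
from datetime import datetime

# assumed layout of the API's tmstmp field, based on the values recorded above
TMSTMP_FORMAT = '%Y%m%d %H:%M'

def parse_tmstmp(text):
    """Turn a string like '20200210 21:11' into a datetime."""
    return datetime.strptime(text, TMSTMP_FORMAT)

print(parse_tmstmp('20200210 21:11'))  # 2020-02-10 21:11:00
```

The same format string can be passed to pandas as `pd.to_datetime(df['TIME'], format='%Y%m%d %H:%M')`.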

In [194]:
df.head()

Unnamed: 0,VID,MPH,LAT,LON,TIME
1,1774,20,43.005692,-76.168037,2020-02-10 22:36:00
1,1755,15,43.039398,-76.131599,2020-02-10 22:36:00
1,1251,18,43.01925,-76.123019,2020-02-10 22:36:00
1,1626,8,43.018654,-76.118112,2020-02-10 22:35:00
2,1774,19,43.006011,-76.166081,2020-02-10 22:36:00


In [195]:
df.to_csv('for_map.csv')