# Web-scrape Austin's 311 Website with Beautiful Soup

__[Austin's 311 Public Dataset](https://data.austintexas.gov/Government/311-Unified-Data/i26j-ai4z)__ is a comprehensive, openly available summary of 311 reports that gives citizens a wealth of opportunities to dig into local concerns and track the progress being made on them. Publicizing 311 reports increases transparency and community engagement, and gives citizens opportunities to develop __[public service applications](http://www.austintexas.gov/sites/default/files/images/Communications/Web_Content/Austin_311_Socrata_Fact_Sheet.pdf)__.

However, understanding homeless-related 311 reports poses a unique challenge. At the moment (Sept 2017) there isn't an explicit _SR Description_ that encompasses homeless-related incidents, and the _Owning Departments_ assigned to these tickets vary. The only indicator of a homeless-related ticket is the _Issue_ description found on the 311 website. To supplement the 311 datasets on Socrata's Open Data Network, a quick solution is to scrape the 311 website.

Scraping the website is split into two parts. Part 1 scrapes the landing page and generates ticket report URLs. Part 2 scrapes the features of each ticket report found in Part 1. All features are appended to a csv file.


# Part 1: Scrape the landing page

This algorithm finds the ticket numbers in the search results on the 311 landing page, generates a custom URL pointing to each ticket's report page, and appends it to a csv file (ticketurlTEST2.csv in the cell below).
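The cell below requests a single results URL. If the search results span more than one page, the query string could be built per page along these lines. This is only a sketch: the `page` parameter and the `build_results_url` helper are assumptions made for illustration and have not been confirmed against the 311 site, so check how the site actually paginates before relying on it.

```python
from urllib.parse import urlencode

BASE_URL = "http://311.austintexas.gov/reports"

def build_results_url(query, page=None):
    """Build a search-results URL; `page` is a hypothetical parameter."""
    # "\u2713" is the check mark that percent-encodes to %E2%9C%93
    params = {"utf8": "\u2713", "q": query}
    if page is not None:
        params["page"] = page  # assumption: the site accepts ?page=N
    return BASE_URL + "?" + urlencode(params)

print(build_results_url("homeless"))
```

Called without `page`, the helper reproduces the URL used in the cell below, so it can be swapped in without changing the existing request.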

In [6]:
import csv
import urllib.request  #Python 3; in Python 2 the call was urllib.urlopen

from bs4 import BeautifulSoup

#Identify Website
site = urllib.request.urlopen('http://311.austintexas.gov/reports?utf8=%E2%9C%93&q=homeless').read()

#HTML Parser
soup = BeautifulSoup(site, "html.parser")

#Identify the timestamp spans in the table of ticket requests
spans = soup.find("tbody").find_all("span", {"class": "activity-timestamp"})

#Open the csv once in append mode; each URL becomes a new row
with open('ticketurlTEST2.csv', 'a', newline='') as csv_file:
    writer = csv.writer(csv_file)

    #Extract the ticket number and generate a URL that points to the ticket report
    for span in spans:
        for text in span.strings:#text nodes inside the span
            tokens = text.split()
            if not tokens:
                continue#skip whitespace-only nodes

            #the last token is the ticket number preceded by one extra character
            ticket_num = tokens[-1][1:]

            #create a unique url that points to the ticket report
            ticket_url = "http://311.austintexas.gov/reports/" + ticket_num
            print(ticket_url)
            writer.writerow([ticket_url])


http://311.austintexas.gov/reports/17-00278917
http://311.austintexas.gov/reports/17-00277186
http://311.austintexas.gov/reports/17-00277163
http://311.austintexas.gov/reports/17-00163036
http://311.austintexas.gov/reports/17-00274887
http://311.austintexas.gov/reports/17-00274451
http://311.austintexas.gov/reports/17-00272363
http://311.austintexas.gov/reports/17-00272949
http://311.austintexas.gov/reports/17-00272815
http://311.austintexas.gov/reports/17-00222028
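
The string slicing above depends on the ticket number being the last token of the timestamp text. A slightly more defensive sketch matches the number by pattern instead. The two-digit, dash, eight-digit pattern is inferred from the URLs printed above; the sample timestamp text and the `ticket_url_from_text` name are made up for illustration.

```python
import re

# Pattern inferred from the printed URLs, e.g. 17-00278917
TICKET_RE = re.compile(r"\b\d{2}-\d{8}\b")

def ticket_url_from_text(text):
    """Return the report URL for the first ticket number in text, or None."""
    match = TICKET_RE.search(text)
    if match is None:
        return None
    return "http://311.austintexas.gov/reports/" + match.group(0)

# Hypothetical timestamp text; the real markup may differ
print(ticket_url_from_text("Opened 2 days ago #17-00278917"))
```

Returning `None` for text without a ticket number lets the caller skip that row instead of writing a malformed URL to the csv.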


# Part 2: Scrape the Ticket Report

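Scrape each ticket report listed in the URL csv from Part 1 and populate an output csv with these features: Ticket Number, Service Request Title, Description, Address, XY coordinates, Lat-Long coordinates. The cell below reads the URLs from ticketurlTEST2.csv and writes the features to 311HomelessScrapeTEST2.csv.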

In [7]:
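import csv
import urllib.request  #Python 3; in Python 2 the call was urllib.urlopen

from bs4 import BeautifulSoup

with open('ticketurlTEST2.csv', 'r', newline='') as f:#csv of clean urls
    reader = csv.reader(f)
    with open('311HomelessScrapeTEST2.csv', 'w', newline='') as csv_file:
        writer = csv.writer(csv_file)
        for row in reader:#Each row is a separate list
            for url in row:
                site = urllib.request.urlopen(url).read()#open each url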

                #HTML Parser
                soup = BeautifulSoup(site, "html.parser")

                #Ticket Number
                ticket_number_box = soup.find("div", {"id":"report-source"}).strong
                ticket_number = ticket_number_box.text.strip()

                #Service Request Title
                service_request_box = soup.find("div", {"class":"content-head"})
                service_request = service_request_box.text.strip()

                #Description
                description_box = soup.blockquote
                description = description_box.text.strip()

                #Address, XY coordinates, Lat-Long coordinates
                loctab = soup.find(id="location-tab")
                address = loctab.p.next_element.next_element.next_element
                xy_coord = loctab.p.next_element.next_element.next_element.next_element.next_element.next_element.next_element.next_element
                latlong_coord = loctab.p.next_element.next_element.next_element.next_element.next_element.next_element.next_element.next_element.next_element.next_element.next_element.next_element.next_element.next_element.next_element

                #Collect the features for this ticket in column order
                row_content = [ticket_number, service_request, description, address, xy_coord, latlong_coord]

                #Populate the output csv
                writer.writerow(row_content)
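
The output csv above has no header row, so the meaning of each column lives only in this write-up. One option is `csv.DictWriter` with the feature names listed at the start of Part 2. The field names, the `write_reports` helper, and the sample record below are placeholders chosen for illustration, not values taken from the site.

```python
import csv
import io

FIELDS = ["ticket_number", "service_request", "description",
          "address", "xy_coord", "latlong_coord"]

def write_reports(records, csv_file):
    """Write a header row followed by one row per scraped ticket report."""
    writer = csv.DictWriter(csv_file, fieldnames=FIELDS)
    writer.writeheader()
    for record in records:
        writer.writerow(record)

# Placeholder record; an in-memory buffer stands in for the real csv file
sample = {name: "" for name in FIELDS}
sample["ticket_number"] = "17-00278917"
buffer = io.StringIO()
write_reports([sample], buffer)
print(buffer.getvalue())
```

In the scraping loop, each ticket's features would be collected into a dict keyed by these names and passed to the same writer.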