# Introduction
What's the last school shooting you remember? If you follow the news closely, you'll know it was the Pleasantville, New Jersey shooting, a little over a week ago (at the time of writing).
Do you remember the details, though?
Probably not.

What about the shooting the day right before Pleasantville? Or the two in October?
The problem is clear: school shootings are a regular occurrence and it's hard to keep track of them all, let alone the details of each one.
In fact, according to Wikipedia, there were 64 US school shootings from 2000 to 2009, 87 from 2010 to 2014, and 113 from 2015 to November 2019.
If these numbers seem lower than you expected, remember that they count only school shootings, not mass shootings in general. The mass shooting numbers are staggering: 2,138 since 2013, roughly one a day. The count changes a bit depending on how you define a mass shooting, but either way the point remains: the number of shootings that occur in the U.S. is unacceptably high.
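
To make the three school-shooting counts above comparable, it helps to divide each by the length of its period. The sketch below does that arithmetic. The period lengths are our own assumption (full calendar years, with 2019 counted through the end of November); only the counts come from Wikipedia.

```python
# Counts quoted above from Wikipedia. The period lengths (in years) are an
# assumption: full calendar years, with 2019 counted through end of November.
periods = {
    "2000-2009": (64, 10.0),
    "2010-2014": (87, 5.0),
    "2015-Nov 2019": (113, 4 + 11 / 12),
}

# Average number of school shootings per year in each period
rates = {label: count / years for label, (count, years) in periods.items()}

for label, rate in rates.items():
    print(f"{label}: {rate:.1f} school shootings per year")
```

Under these assumptions the rates come out at roughly 6.4, 17.4, and 23.0 per year, which is the spike this project sets out to examine.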

For this project we want to focus on school shootings: the term is much more clearly defined than "mass shooting", and the events are easier to analyze over a smaller timeframe. Our goal is to find causes of the spike in shootings, especially in the past decade.
It's important to note that none of us want to push any sort of policy or idea or agenda here: we want to stay as neutral as possible.

        
# Installation


# References
https://en.wikipedia.org/wiki/List_of_school_shootings_in_the_United_States
https://en.wikipedia.org/wiki/Mass_shootings_in_the_United_States
https://en.wikipedia.org/wiki/List_of_school_shootings_in_the_United_States_(before_2000)#20th_century
https://en.wikipedia.org/wiki/List_of_unsuccessful_attacks_related_to_schools


## Beginning our Analysis
Our goal is to identify significant school shootings by taking the data compiled on Wikipedia and recording certain characteristics of each event. With those characteristics, we want to scrape various news sources, such as Fox News and CNN, and see whether the characteristics of a shooting affect how much it is reported across sources.
We recognize that with such a small dataset we may not be able to reach any significant conclusions, but we believe it is a first step toward identifying these reporting patterns.

In [107]:
import pandas as pd
import requests as rq
import numpy as np
import re
from datetime import datetime
from bs4 import BeautifulSoup


In [96]:
def makeDF(header, body):
    # build a DataFrame from the scraped rows, using the table header as column names
    sol = pd.DataFrame.from_records(body)
    sol.columns = header
    return sol

def parseWiki(url):
    html = rq.get(url).text
    # name the parser explicitly so the result does not depend on what is installed
    soup = BeautifulSoup(html, "html.parser")
    # wikipedia gives its data tables the classes "wikitable" and "sortable";
    # a CSS selector matches them regardless of the order of the class names
    tables = soup.select("table.wikitable.sortable")
    header = []
    body = []
    # wikipedia separates by year chunks so we iterate through
    for table in tables:
        for row in table.find_all("tr"):
            temp = []
            # take the column names from the first header row only
            if not header:
                for h in row.find_all("th"):
                    header.append(h.get_text().rstrip())
            for col in row.find_all("td"):
                cur = col.get_text()
                # want to get rid of wiki references like [12]; the pattern stops at
                # the first closing bracket so text between two references is kept
                cur = re.sub(r'\[[^\]]*\]', '', cur)
                cur = cur.replace("\n", " ")
                noline = cur.rstrip()
                temp.append(noline)
            # header rows have no <td> cells, so skip the empty lists they produce
            if temp:
                body.append(temp)
    return header, body

header, body = parseWiki("https://en.wikipedia.org/wiki/List_of_school_shootings_in_the_United_States")
makeDF(header, body)


Unnamed: 0,Date,Location,Deaths,Injuries,Description
0,"February 29, 2000","Flint, Michigan",1,0,Shooting of Kayla Rolland: At Buell Elementary...
1,"May 26, 2000","Lake Worth, Florida",1,0,"13-year-old honor student, Nathaniel Brazill, ..."
2,"June 28, 2000","Seattle, Washington",2,0,58-year-old Director of the Division of Pathol...
3,"August 28, 2000","Fayetteville, Arkansas",2,0,"36-year-old James Easton Kelly, a PhD candidat..."
4,"September 26, 2000","New Orleans, Louisiana",0,2,"13 year-olds Darrel Johnson, and Alfred Anders..."
5,"December 1, 2000","San Diego, California",0,1,A 15-year-old Junipero Serra High School stude...
6,"March 5, 2001","Santee, California",2,13,Santana High School shooting: 15-year-old stud...
7,"March 7, 2001","Williamsport, Pennsylvania",0,1,"14-year-old student, Elizabeth Catherine Bush,..."
8,"May 16, 2001","Parkland, Washington",2,0,40-year-old music instructor and organist Jame...
9,"March 22, 2001","El Cajon, California",0,5,"18-year-old former student, Jason Hoffman, ope..."
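
Every cell scraped this way is a string, so the dates and counts usually need converting before any analysis. The sketch below shows one way that could look, using three rows copied from the output above. The `Casualties` column is a hypothetical addition for illustration, and `errors="coerce"` is our assumption about how to treat cells that fail to parse (they become missing values rather than raising).

```python
import pandas as pd

# Three rows copied from the output above (descriptions omitted for brevity)
df = pd.DataFrame({
    "Date": ["February 29, 2000", "March 5, 2001", "May 16, 2001"],
    "Location": ["Flint, Michigan", "Santee, California", "Parkland, Washington"],
    "Deaths": ["1", "2", "2"],
    "Injuries": ["0", "13", "0"],
})

# Parse "Month day, year" strings; unparseable cells become NaT instead of raising
df["Date"] = pd.to_datetime(df["Date"], format="%B %d, %Y", errors="coerce")

# Convert the count columns from strings to numbers; bad cells become NaN
for col in ["Deaths", "Injuries"]:
    df[col] = pd.to_numeric(df[col], errors="coerce")

# Hypothetical derived column: total people killed or injured
df["Casualties"] = df["Deaths"] + df["Injuries"]

# Sort chronologically, since the scraped rows are not guaranteed to be in order
df = df.sort_values("Date").reset_index(drop=True)
print(df.dtypes)
```

With typed columns, filtering by year or ranking events by casualties becomes a one-line operation.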


### Using APIs to get news sources and attempt to parse data
Now that we have the Wikipedia data parsed, we want to set up scraping from the various news sources. We will start with Google News, searching for articles published around the date of each shooting, and then do some more analysis on what we find.
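
Before sending any requests, each row has to be turned into a search query and a date window. The following is a minimal sketch of that step under our own assumptions: `build_query` and `date_window` are hypothetical helper names, and the choice of "Location plus fixed keywords" as the query is only one possible starting point.

```python
from datetime import datetime, timedelta
from urllib.parse import quote_plus


def build_query(location, keywords="school shooting"):
    """Hypothetical helper: combine a row's Location with fixed keywords."""
    return f"{location} {keywords}"


def date_window(date_text, days=1):
    """Hypothetical helper: return (start, end) as M/D/YYYY strings.

    timedelta takes care of month and year boundaries, which plain
    day + 1 arithmetic does not.
    """
    start = datetime.strptime(date_text, "%B %d, %Y")
    end = start + timedelta(days=days)

    def fmt(d):
        return f"{d.month}/{d.day}/{d.year}"

    return fmt(start), fmt(end)


query = build_query("Santee, California")
print(quote_plus(query))                  # URL-safe form of the query
print(date_window("February 29, 2000"))   # window that crosses a month boundary
```

Encoding the query matters because locations contain commas and spaces, and the leap-day row from the table shows why the end of the window cannot be computed by adding one to the day number.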

In [118]:
from datetime import datetime, timedelta

def parseResp(**params):
    # google news search (tbm=nws) limited to a custom date range:
    # tbs=cdr:1 switches the range on, cd_min/cd_max are M/D/YYYY dates
    query = {
        "q": params["query"],
        "tbm": "nws",
        "hl": "en",
        "gl": "us",
        "tbs": "cdr:1,cd_min:{from_date},cd_max:{to_date}".format(**params),
    }
    # requests URL-encodes the parameters for us
    response = rq.get("https://www.google.com/search", params=query)
    return response

# df: the dataframe that has date, location, injuries, deaths, description of shooting
def googlenews(df):
    for i, row in df.iterrows():
        start = datetime.strptime(row['Date'], '%B %d, %Y')
        # timedelta handles month and year boundaries, unlike date.day + 1
        end = start + timedelta(days=1)
        resp = parseResp(query = "test",
                         from_date = "{}/{}/{}".format(start.month, start.day, start.year),
                         to_date = "{}/{}/{}".format(end.month, end.day, end.year))
        # google returns an HTML page rather than JSON, so calling .json() on the
        # result fails; print the status code and page size instead
        print(resp.status_code, len(resp.text))

googlenews(makeDF(header, body))