# Assignment 1 - Automated News Writing

This skeleton notebook offers helper functions and example template-writing code that you may find helpful for producing and testing your automated story-writing algorithm.

The data for this assignment can be [downloaded here](https://github.com/comp-journalism/UMD-J479V-J779V-Spring2017/raw/master/Data/fatal-police-shootings-2015-16.csv). It contains fatal police shootings tracked by the Washington Post, filtered to include only data from 2015 and 2016. More context on the original dataset can be found [here](https://github.com/washingtonpost/data-police-shootings) and on the methodology [here](https://www.washingtonpost.com/national/how-the-washington-post-is-examining-police-shootings-in-the-united-states/2016/07/07/d9c52238-43ad-11e6-8856-f26de2537a9d_story.html?utm_term=.9583e14255cc#comments).

The data includes the following fields: 
- id
- name
- date (in format YYYY-MM-DD)
- manner_of_death (e.g., "shot")
- armed (i.e., unarmed or what type of weapon)
- age
- gender ("M" or "F")
- race ("A" = Asian; "W" = white; "H" = Hispanic; "B" = black; "N" = Native American; "O" = other race)
- city
- state
- signs_of_mental_illness (whether the person fatally shot showed signs of mental illness, True or False)
- threat_level ("attack", "other", "undetermined")
- flee ("Not fleeing", "Car", "Foot", "Other")
- body_camera (whether the police officer was wearing a body camera, True or False)
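
The coded fields above are easier to write about once they are expanded into readable words. The following is a minimal sketch of one way to do that with a plain dictionary. `RACE_LABELS` and `race_label` are hypothetical names chosen for illustration, and the fallback wording for a missing or unrecognized code is an assumption, not part of the dataset.

```python
# Hypothetical lookup table built from the field list above.
RACE_LABELS = {
    "A": "Asian",
    "W": "white",
    "H": "Hispanic",
    "B": "black",
    "N": "Native American",
    "O": "other race",
}


def race_label(code, default="unknown race"):
    """Return a readable label for a race code, or `default` if the code is not in the table."""
    return RACE_LABELS.get(code, default)


print(race_label("H"))   # Hispanic
print(race_label(None))  # unknown race
```

Keeping the mapping in one dictionary means a story template only needs the finished label, rather than a chain of comparisons against each code.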


### Getting Started with Jinja
Jinja is a template engine that can be used to take your data and writing structure and output various text files. Often it is used to generate HTML pages for the web, but here we'll use it to generate textual news stories. 

You should read the [Jinja documentation on templates](http://jinja.pocoo.org/docs/dev/templates/) before getting started. The next cells show some basic examples to get you thinking in the right direction. 

In [2]:
import pandas as pd
import jinja2 as jj
import numpy as np
from datetime import datetime

In [119]:
# A template is a string that when rendered by Jinja outputs some text. 
# In the next template the double curly brackets indicate there is a variable that will get substituted there. 
template = jj.Template("Hello {{ variable }}.")

# Rendering the template outputs the final text, with the value substituted for the variable.
print(template.render(variable="World"))

# A different value of the variable can be passed as a parameter.
print(template.render(variable="Professor Diakopoulos"))

Hello World.
Hello Professor Diakopoulos.


**Filters & Synsets**  
Jinja also has the concept of filters, which modify the variables before they are rendered. They are indicated with the pipe symbol "|" followed by the name of the filter in the template string. The set of possible filters is listed [here in the documentation](http://jinja.pocoo.org/docs/dev/templates/#builtin-filters). Filters that may be useful for this assignment include:
- round (for rounding off numbers)
- random (for randomly selecting a value from a list variable)
- title (for converting to title case, e.g. for the start of a sentence)
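
As a rough illustration of the `round` and `title` filters listed above, here is a small sketch. The variable names `pct` and `city` and their values are made up for the example.

```python
import jinja2 as jj

# round takes an optional precision; with no argument it rounds to a whole
# number but still returns a float, so "| int" is chained to drop the ".0".
template = jj.Template("{{ pct | round(1) }} percent, or about {{ pct | round | int }} in 100.")
print(template.render(pct=42.456))
# 42.5 percent, or about 42 in 100.

# title capitalizes the first letter of each word.
template = jj.Template("{{ city | title }}")
print(template.render(city="san francisco"))
# San Francisco
```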

In [120]:
# To apply a filter that lower cases the text in the variable I can add "| lower" to the template. See how even though the variable is in capital case, the filter converts it to lowercase when rendered?
template = jj.Template("Hello {{ variable | lower }}.")
print(template.render(variable="Professor Diakopoulos"))

Hello professor diakopoulos.


In [121]:
# We could use a filter to randomly select from a synset that we author. Note that each time you run this code it may output a different random selection from the synset. 
synset = ["Prof", "Dr.", "Professor"]
template = jj.Template("Hello {{ variable_title | random }} {{ variable_name }}.")
print(template.render(variable_title=synset, variable_name="Diakopoulos"))

Hello Prof Diakopoulos.
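
Because the selection is random, the same row can produce a different story on every run, which makes testing harder. The sketch below shows one possible way to get repeatable output. It assumes that Jinja's `random` filter draws from Python's module-level `random` generator, so treat the seeding trick as an assumption to verify against your installed Jinja version.

```python
import random

import jinja2 as jj

synset = ["Prof", "Dr.", "Professor"]
template = jj.Template("Hello {{ variable_title | random }} {{ variable_name }}.")

# Assumption: the random filter uses Python's global random module,
# so re-seeding before each render repeats the same choice.
random.seed(42)
first = template.render(variable_title=synset, variable_name="Diakopoulos")

random.seed(42)
second = template.render(variable_title=synset, variable_name="Diakopoulos")

print(first == second)  # True if the assumption holds
```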


**Conditionals**  
You'll most likely need "if" statements in order to output different types of text based on different values.

In [122]:
# Note the template string is spread across multiple lines only to make it easier to read. As a result the "\" character is added to the end of a line so that python knows the same variable continues on to the next line. To make a multi-line string we use a triple quote at the beginning and ending of the string. 
template_string = """Today there was a \
{% if earthquake_size <= 3.0 %}\
small\
{% elif earthquake_size <= 5 %}\
medium\
{% else %}\
large\
{% endif %}\
 size earthquake. \n
It had a magnitude of {{ earthquake_size }}."""

template = jj.Template(template_string)
print(template.render(earthquake_size=4.0))

Today there was a medium size earthquake. 

It had a magnitude of 4.0.
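
The backslashes in the template above exist only to keep line breaks out of the rendered text. An alternative sketch uses Jinja's whitespace control, where a minus sign at the start of a tag strips the whitespace before it. The variable name is reused from the example above, and the layout is just one possible arrangement.

```python
import jinja2 as jj

# A minus sign inside the tag ("{%-") strips the whitespace, including the
# line break, that comes right before the tag.
template_string = """Today there was a
{%- if earthquake_size <= 3.0 %} small
{%- elif earthquake_size <= 5 %} medium
{%- else %} large
{%- endif %} size earthquake."""

template = jj.Template(template_string)
print(template.render(earthquake_size=4.0))
# Today there was a medium size earthquake.
```

The `elif` branch only needs the upper bound here, since any value of 3.0 or less has already been caught by the first branch.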


In [4]:
df = pd.read_csv("Datasets/fatal-police-shootings-2015-16.csv")
row_num = 100 # change this value to test with a different row of data
row_as_dict = df.iloc[row_num].to_dict()

In [7]:
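from decimal import Decimal

# Number of shootings per race code; groupby leaves out rows with a missing race.
df_racegrouped = df.groupby("race").size()

# Total number of rows with a recorded race. read_csv loads empty cells as NaN,
# and count() skips NaN values.
total_race = int(df["race"].count())
white_race = int(df_racegrouped["W"])
black_race = int(df_racegrouped["B"])

# Percentages of rows with a known race, rounded to two decimal places.
white_percent = round(100.0 * white_race / total_race, 2)
black_percent = round(100.0 * black_race / total_race, 2)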
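
As a cross-check on the percentages, pandas can compute the shares directly. This is a sketch on a toy DataFrame rather than the assignment data; the name `toy` and its four rows are invented for illustration.

```python
import pandas as pd

# Toy stand-in for the real data; None marks a missing race.
toy = pd.DataFrame({"race": ["W", "W", "B", None]})

# normalize=True returns shares instead of counts, and missing values
# are left out by default, so the shares sum to 1.
race_percent = (toy["race"].value_counts(normalize=True) * 100).round(2)

print(race_percent["W"])  # 66.67
print(race_percent["B"])  # 33.33
```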


In [8]:
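# Count people aged 25 to 50 (inclusive) and the total number of records.
count_25_to_50 = len(df[(df['age'] >= 25) & (df['age'] <= 50)])
count_total = len(df)

# Mean age and the 25-to-50 share of all records, both rounded to two decimals.
mean_age = round(float(df["age"].mean()), 2)
ratio_age = round(float(count_25_to_50 / Decimal(count_total) * 100), 2)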


In [9]:
df_stategrouped=df.groupby("state").size()
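# idxmax returns the state label; in recent pandas versions argmax returns an integer position instead.
max_state=df_stategrouped.idxmax()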
max_count=df_stategrouped.max()

### Test Function
The following `write_story` function takes as a parameter a row of data from a dataframe. For ease of testing (and grading), implement the `write_story` function so that it returns the story you've created for the given row of data.
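
One thing to watch when passing a row to a template: pandas loads empty CSV cells as NaN rather than as empty strings, so a template test such as `arms_var == ''` would not match a missing value. The sketch below shows one possible way to normalize a row first. `clean_row` is a hypothetical helper, not part of the assignment skeleton, and an empty string as the default is an assumption that may not suit numeric fields such as age.

```python
import math


def clean_row(row, default=""):
    """Return a copy of `row` with missing (NaN or None) values replaced by `default`."""
    cleaned = {}
    for key, value in row.items():
        is_nan = isinstance(value, float) and math.isnan(value)
        cleaned[key] = default if value is None or is_nan else value
    return cleaned


row = {"name": "Example Person", "armed": float("nan"), "age": 32.0}
print(clean_row(row))
# {'name': 'Example Person', 'armed': '', 'age': 32.0}
```

A story function could call such a helper on the row dictionary before rendering, so that the template's empty-string branches behave as intended.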

In [53]:
def write_story(row):
    day = datetime.strptime(row["date"], '%Y-%m-%d').strftime('%A')
    
    line1 = jj.Template("{% if gender_var == 'M' %}MAN {% else %}WOMAN {% endif %}\
KILLED IN A FATAL ASSAULT WITH A POLICE OFFICER IN {{city_var|upper}}, {{state_var|upper}}.\n\
POSTED: {{date_var}}, By Shashank Kava.\n\
A {% if gender_var == 'M' %}man{% else %}woman{% endif %} \
has died following a \
{{ day_var }} morning \
{% if arms_var == 'gun' %}shootout{% else %}assault{% endif %} \
between the police and the suspect in {{state_var|upper}}, {{city_var}}. \
The suspect has been identified by civilians in the {{vic_var|random}} as {{name_var}}, a {{age_var|int}} year old \
{%if race_var=='A'%}asian{%elif race_var=='W'%}white{%elif race_var=='H'%}hispanic{%elif race_var=='B'%}black{%elif race_var=='N'%}native american{%else%}of unknown race{%endif%} \
and died before any medical help could arrive at the crime scene.\n\
{% if bcam_var == False %}\
Witnesses present at the crime scene have confirmed that the suspect was \
\
{%if arms_var=='unarmed'%}unarmed{%elif arms_var=='vehicle'%}unarmed\
{%elif arms_var=='undetermined'%}unarmed{%elif arms_var=='unknown weapon'%}unarmed\
{%elif arms_var==''%}unarmed{% else %}armed with a {{arms_var}}{% endif %}.\
\
{% else %}\
The footage from the officer's body camera confirmed that the suspect was \
\
{%if arms_var=='unarmed'%}unarmed{%elif arms_var=='vehicle'%}unarmed\
{%elif arms_var=='undetermined'%}unarmed{%elif arms_var=='unknown weapon'%}unarmed\
{%elif arms_var==''%}unarmed{% else %}armed with a {{arms_var|lower}}{% endif %}.\
\
{% endif %} \
The witnesses also confirmed that the identified suspect was \
{% if flee_var == 'Not fleeing' %}\
not trying to flee the crime scene\
{%elif flee_var=='Other'%}\
not trying to flee the crime scene\
{%elif flee_var==''%}\
not trying to flee the crime scene\
{% else %}\
trying to flee the crime scene by {{flee_var|lower}}\
{% endif %} when the shots were fired. \
Further analysis of the incident based on \
{%if bcam_var == False %}\
testimonies of the witnesses\
{% else %}\
the body camera's footage\
{% endif %} \
helped the police department of {{state_var|upper}} conclude that the suspect \
{%if attack_var == 'attack'%}\
{{att_var|random}}\
{% else %}\
did not attack\
{% endif %} the police officer before being shot to death. \
Based on medical reports obtained from the health center in the city of {{city_var}}, it was also proven that the suspect was \
{%if mental_var == False %}\
\
{% else %}\
not \
{% endif %}mentally stable. \n\
The Washington Post has managed to maintain a compiling database of every fatal shooting in the United States by a \
police officer in the line of duty since Jan. 1, 2015. \
According to this data, {{white_var}} \
percent of fatal police shootings were white, while {{black_var}} percent were black.\n\
Statistics show that the average age of civilians killed in the United States in fatal shootings is {{mean_age_var}}. \
The police department in the state of {{max_state_var|upper}} is looking meticulously into these statistics as the \
count ({{max_count_var}}) of people killed in fatal shootings is maximum for this state. \
Washington Post found that civilians between the ages of 25 and 50 formed {{ratio_age_var}} percent of the total \
population and are more likely to be killed by police than any other demographic.\
"
                        )
    
    return line1.render(day_var=day,gender_var=row["gender"],arms_var=row["armed"],state_var=row["state"],city_var=row["city"],
                        name_var=row["name"],age_var=row["age"],race_var=row["race"],date_var=row["date"],flee_var=row["flee"],
                        attack_var=row["threat_level"],mental_var=row["signs_of_mental_illness"],bcam_var=row["body_camera"],
                        white_var=white_percent,black_var=black_percent,mean_age_var=mean_age,max_state_var=max_state,
                        max_count_var=max_count,ratio_age_var=ratio_age,vic_var=synset_vicinity,att_var=synset_attack)


df = pd.read_csv("Datasets/fatal-police-shootings-2015-16.csv")

synset_vicinity = ["vicinity", "neighborhood", "locality"]
synset_attack = ["attacked", "assaulted"]


row_num = 3 # change this value to test with a different row of data
row_as_dict = df.iloc[row_num].to_dict()

story_str=write_story(row_as_dict)

print(story_str)

# Save the story to a text file.
with open("temp2.txt", "w") as f:
    f.write(story_str)

MAN KILLED IN A FATAL ASSAULT WITH A POLICE OFFICER IN SAN FRANCISCO, CA.
POSTED: 2015-01-04, By Shashank Kava.
A man has died following a Sunday morning assault between the police and the suspect in CA, San Francisco. The suspect has been identified by civilians in the neighborhood as Matthew Hoffman, a 32 year old white and died before any medical help could arrive at the crime scene.
Witnesses present at the crime scene have confirmed that the suspect was armed with a toy weapon. The witnesses also confirmed that the identified suspect was not trying to flee the crime scene when the shots were fired. Further analysis of the incident based on testimonies of the witnesses helped the police department of CA conclude that the suspect assaulted the police officer before being shot to death. Based on medical reports obtained from the health center in the city of San Francisco, it was also proven that the suspect was not mentally stable. 
The Washington Post has managed to maintain a compili