# Newsflash-Terror-ChartCreator

This script was developed to put a terrorist attack into a first context.

It uses two databases:

* <b>Wikipedia's list of terrorist attacks &nbsp; from 2015 up until now</b> 
* <b>The Global Terrorism Database &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; from 1970 up until 2016 </b> 

<br>
Both databases have their advantages and disadvantages.
* While Wikipedia is constantly updated, it is nowhere near complete. A rough comparison with the Global Terrorism Database shows that Wikipedia only covers a tenth of all attacks in the timespan of 2015 to 2016 that were listed in the Global Terrorism Database. To be clear: Absolute numeric statements are not possible. However, all big events seem to show up. Hence, this scraped database is a viable option for and ONLY for rankings such as:

<b>Right:</b> "This was the 5th deadliest attack of Boko Haram within the last six months"

<b>Wrong:</b>"While there were X attacks in Iraq in March, there were only Y attacks in April."

<br>

* On the other end, the Global Terrorism Database (GTD) is the gold-standard in social science about terrorism.  However, it is only updated once a year with a lag of approximately six months. So for example, as I am writing this script, the next version of the GTD will be published in summer 2018 and it will be only about 2017. Hence, it is <b>NOT POSSIBLE</b> to make statements based on the data such as:
<br>

<b>Right:</b><i> The Iraq has seen an increase of terrorist attacks over the last 20 years </i>

<b>Wrong:</b><i> This is the Xth terrorist attack in Iraq this year.</i>

<br>
Also, please do not try to compare data from the scraper against the GTD. Best thing to do is to not mix up the both data sources for any graphic.

# I. Getting ready

<br>
## Loading Libraries

In [203]:
import re
import os
import csv
import time
import requests
import datetime as dt
from datetime import datetime, timedelta
from bs4 import BeautifulSoup as BS
import pandas as pd
import numpy as np
import glob
pd.options.display.max_columns = 100

## First, we load in the Global Terrorism Database

In [204]:
GTDclean=pd.read_csv("GTDData/GDT_clean_database.csv")
criteria_array=[]
GTDclean["Qdate"]=pd.to_datetime(GTDclean["Qdate"])
GTDclean.head()

  interactivity=interactivity, compiler=compiler, result=result)


Unnamed: 0,eventid,Qdate,year,month,day,approxdate,extended,resolution,deaths,injured,casualties,country_id,country,region_id,region,provstate,city,latitude,longitude,specificity,vicinity,location,summary,crit1,crit2,crit3,doubtterr,alternative,alternative_txt,multiple,success,suicide,attacktype1,attacktype1_txt,attacktype2,attacktype2_txt,attacktype3,attacktype3_txt,targtype1,targtype1_txt,targsubtype1,targsubtype1_txt,corp1,target1,natlty1,natlty1_txt,targtype2,targtype2_txt,targsubtype2,targsubtype2_txt,...,weapsubtype1_txt,weaptype2,weaptype2_txt,weapsubtype2,weapsubtype2_txt,weaptype3,weaptype3_txt,weapsubtype3,weapsubtype3_txt,weaptype4,weaptype4_txt,weapsubtype4,weapsubtype4_txt,weapdetail,nkillus,nkillter,nwoundus,nwoundte,property,propextent,propextent_txt,propvalue,propcomment,ishostkid,nhostkid,nhostkidus,nhours,ndays,divert,kidhijcountry,ransom,ransomamt,ransomamtus,ransompaid,ransompaidus,ransomnote,hostkidoutcome,hostkidoutcome_txt,nreleased,addnotes,scite1,scite2,scite3,dbsource,INT_LOG,INT_IDEO,INT_MISC,INT_ANY,related,incidentno
0,197000000001,1970-07-02,1970,7,2,,0,,1.0,0.0,1.0,58,Dominican Republic,2,Central America & Caribbean,,Santo Domingo,18456792,-69951164,1.0,0,,,1,1,1,0,,,0,1,0,1,Assassination,,,,,14,Private Citizens & Property,68.0,Named Civilian,,Julio Guzman,58.0,Dominican Republic,,,,,...,,,,,,,,,,,,,,,,,,,0,,,,,0.0,,,,,,,0.0,,,,,,,,,,,,,PGIS,0,0,0,0,,1
1,197000000002,NaT,1970,0,0,,0,,0.0,0.0,0.0,130,Mexico,1,North America,,Mexico city,19432608,-99133207,1.0,0,,,1,1,1,0,,,0,1,0,6,Hostage Taking (Kidnapping),,,,,7,Government (Diplomatic),45.0,"Diplomatic Personnel (outside of embassy, cons...",Belgian Ambassador Daughter,"Nadine Chaval, daughter",21.0,Belgium,,,,,...,,,,,,,,,,,,,,,,,,,0,,,,,1.0,1.0,0.0,,,,Mexico,1.0,800000.0,,,,,,,,,,,,PGIS,0,1,1,1,,1
2,197001000001,NaT,1970,1,0,,0,,1.0,0.0,1.0,160,Philippines,5,Southeast Asia,Tarlac,Unknown,15478598,120599741,4.0,0,,,1,1,1,0,,,0,1,0,1,Assassination,,,,,10,Journalists & Media,54.0,Radio Journalist/Staff/Facility,Voice of America,Employee,217.0,United States,,,,,...,,,,,,,,,,,,,,,,,,,0,,,,,0.0,,,,,,,0.0,,,,,,,,,,,,,PGIS,-9,-9,1,1,,1
3,197001000002,NaT,1970,1,0,,0,,,,0.0,78,Greece,8,Western Europe,Attica,Athens,37983773,23728157,1.0,0,,,1,1,1,0,,,0,1,0,3,Bombing/Explosion,,,,,7,Government (Diplomatic),46.0,Embassy/Consulate,,U.S. Embassy,217.0,United States,,,,,...,Unknown Explosive Type,,,,,,,,,,,,,Explosive,,,,,1,,,,,0.0,,,,,,,0.0,,,,,,,,,,,,,PGIS,-9,-9,1,1,,1
4,197001000003,NaT,1970,1,0,,0,,,,0.0,101,Japan,4,East Asia,,Fukouka,33580412,130396361,1.0,0,,,1,1,1,-9,,,0,1,0,7,Facility/Infrastructure Attack,,,,,7,Government (Diplomatic),46.0,Embassy/Consulate,,U.S. Consulate,217.0,United States,,,,,...,,,,,,,,,,,,,,Incendiary,,,,,1,,,,,0.0,,,,,,,0.0,,,,,,,,,,,,,PGIS,-9,-9,1,1,,1


## Then the scraper gathers the Wikipedia data and stores them in the folder "ScrapedData"

In [205]:
#cover for 2015 to 2018
localtime = datetime.now().strftime("%Y-%b-%d--%H-%M-%S")
current_month = datetime.now().strftime("%B")
current_year = datetime.now().strftime("%Y")
months_choices=[]
for i in range(1,13):
     months_choices.append((dt.date(2017, i, 1).strftime('%B')))

print(str(current_month))
print(str(current_year))
print("Wait for it...")
stopvariable=0
with open(("ScrapedData/terror_{}.csv").format(localtime), 'w') as resultsfile:
    resultsfileWriter = csv.writer(resultsfile)
    firstrow=("day","month","year","attacktype","deaths","injured","location","details","attacker","context","country","link")
    resultsfileWriter.writerow(firstrow)
    wrongformattedrow=("8","November","2015","Suicide bombing","3","14","Ngouboua, Chad",
               "Two suicide bombers, suspected to be sent by Boko Haram, have detonated themselves in a village on the shores of Lake Chad. 3 people were killed in the blast, including two kids and another 14 were wounded.",
               "Boko Haram (suspected)","Boko Haram insurgency","Chad","https://en.wikipedia.org/wiki/List_of_terrorist_incidents_in_November_2015")
    resultsfileWriter.writerow(wrongformattedrow)

    
    for year in range(2015,(int(current_year)+1)):
        #goes through the list of months that are needed to generate the link
        
        for month in months_choices:
            print("...loading...")
            year=str(year)
            month=str(month)
            #used to prevent that months and years are retrieved that have not passed yet
            #the stopvariable is set at the end of the loops
            if stopvariable==1:
                if month != current_month:
                        break
            
            else:
                print(month+"/"+year)
                #checks if we are at the current year and month
                if year == current_year:
                    if month == current_month:
                        stopvariable=1
                link=("https://en.wikipedia.org/wiki/List_of_terrorist_incidents_in_"+(month)+"_"+(year))
                page = requests.get(link).text
                soup = BS(page, 'html.parser')
                tablesA = soup.find('table', class_="wikitable sortable")
                tablesB = soup.find('table', class_="wikitable")

                if tablesA == []:
                    tables=tablesB
                    print("triggered")
                    if tablesB == []:
                        print("Both empty")
                else:
                    tables=tablesA
                j=0
                rows = tables.find_all('tr')
                for row in rows:
                    i=0
                    cells=(row.find_all("td"))
                    for cell in cells:
                        i+=1
                        if i==1:
                            daycell=cell.text
                            daylist=re.split("–|\+|-|\s", daycell)
                            try:
                                type(int(daylist[0]))
                                day=daylist[0]                                
                            except ValueError:
                                
                                break
                        if i==2:
                            attacktype=cell.text
                        if i==3:
                            deathcell=cell.text
                            deathlist=re.split("–|\+|-|\s|(|)", deathcell)
                            deathlist=filter(None, deathlist)
                            deathlist = [ x for x in deathlist if x.isdigit() ]
                            try:
                                type(int(deathlist[0]))
                                deaths=deathlist[0]
                            except (ValueError, IndexError) as e:
                                deaths=""

                        if i==4:
                            injuredcell=cell.text
                            injuredlist=re.split("–|\+|-|\s", injuredcell)
                            injuredlist=filter(None, injuredlist)
                            injuredlist = [ x for x in injuredlist if x.isdigit() ]

                            try:
                                type(int(injuredlist[0]))
                                injured=injuredlist[0]
                            except (ValueError, IndexError) as e:
                                injured=""

                        if i==5:
                            location=cell.text
                            loclist = location.split(", ")
                            country=loclist[-1]
                            locationspec=loclist[0]
                        if i==6:
                            details=cell.text
                        if i==7:
                            attacker=cell.text
                            attacker=attacker.replace("\n","")
                        if i==8:
                            context=cell.text
                        
                    if cells!=[]:
                        output_list=(day,month,year,attacktype,deaths,injured,locationspec,details,attacker,context,country, link)
                        resultsfileWriter.writerow(output_list)

                        
terror=pd.read_csv(("ScrapedData/terror_{}.csv").format(localtime))
terror["attacker"]=terror["attacker"].astype(str)
terror["country"]=terror["country"].astype(str)
terror["context"]=terror["context"].astype(str)
terror["attacktype"]=terror["attacktype"].astype(str)

terror["month"]=pd.to_datetime(terror["month"], format='%B')
terror["month"]=terror["month"].dt.month
terror=terror.sort_values(by=["year","month","day"])
terror=terror.reset_index(drop=True)
terror["Qdate"]=terror["year"].astype(str)+"-"+terror["month"].astype(str).str.zfill(2)+"-"+terror["day"].astype(str).str.zfill(2)
terror["Qdate"]=pd.to_datetime(terror["Qdate"])
terror["casualties"]=terror.fillna(0)['deaths']+terror.fillna(0)['injured']
terror=terror[['day', 'month', 'year', "Qdate" ,'attacktype',"deaths","injured","casualties","country","location","details", "attacker","context", "link"]]
terror.to_csv(("ScrapedData/terror_{}.csv").format(localtime))
terrorWiki=terror
terrorWiki.head()

March
2018
Wait for it...
...loading...
January/2015


  return _compile(pattern, flags).split(string, maxsplit)


...loading...
February/2015
...loading...
March/2015
...loading...
April/2015
...loading...
May/2015
...loading...
June/2015
...loading...
July/2015
...loading...
August/2015
...loading...
September/2015
...loading...
October/2015
...loading...
November/2015
...loading...
December/2015
...loading...
January/2016
...loading...
February/2016
...loading...
March/2016
...loading...
April/2016
...loading...
May/2016
...loading...
June/2016
...loading...
July/2016
...loading...
August/2016
...loading...
September/2016
...loading...
October/2016
...loading...
November/2016
...loading...
December/2016
...loading...
January/2017
...loading...
February/2017
...loading...
March/2017
...loading...
April/2017
...loading...
May/2017
...loading...
June/2017
...loading...
July/2017
...loading...
August/2017
...loading...
September/2017
...loading...
October/2017
...loading...
November/2017
...loading...
December/2017
...loading...
January/2018
...loading...
February/2018
...loading...
March/2018
...lo

Unnamed: 0,day,month,year,Qdate,attacktype,deaths,injured,casualties,country,location,details,attacker,context,link
0,1,1,2015,2015-01-01,Shooting,15.0,10.0,25.0,Cameroon,Waza Region,A group of militants attacked a crowded bus tr...,Boko Haram (suspected),Boko Haram insurgency,https://en.wikipedia.org/wiki/List_of_terroris...
1,3,1,2015,2015-01-03,Attack,1.0,3.0,4.0,Philippines,Maguindanao and Sultan Kudarat,An undetermined number of Bangsamoro Islamic F...,BIFF,Moro conflict,https://en.wikipedia.org/wiki/List_of_terroris...
2,3,1,2015,2015-01-03,Shooting,3.0,2.0,5.0,Iraq,Basra Governorate,Shooting in Basra Governorate left 5 people de...,Unknown,Iraqi Civil War,https://en.wikipedia.org/wiki/List_of_terroris...
3,3,1,2015,2015-01-03,Melee attack,1.0,0.0,1.0,Tunisia,El Fahs,A Tunisian policeman was ambushed by a group o...,Islamist militants,Insurgency in the Maghreb,https://en.wikipedia.org/wiki/List_of_terroris...
4,3,1,2015,2015-01-03,Massacre,2000.0,,2000.0,Nigeria,Baga,2015 Baga massacre: Boko Haram militants opene...,Boko Haram,Boko Haram insurgency,https://en.wikipedia.org/wiki/List_of_terroris...


#  II. PUTTTING IN YOUR SEARCH CRITERIA AND START THE FILTERING
##  Find the country where the attack happened
Now, we can filter both databases by the country where the most recent attack has happened. As we use two databases the name of the country might differ between the two databases.

Hence, below this text block four code-blocks follow. The first two are for the Wikipedia-data. The two after that for the Global Terrorism Database. The principle for both duplets of code-blocks is the same.

One lists up all the countries available in the database. Simply copy the name from the country that you are interested from this list and post it into the brackets of the following code-block.

Just like:

<i>countryfilterGTD=["Afghanistan"]</i>

and

<i>countryfilterWiki=["Afghanistan"]</i>

<b>OR</b>

<i>countryfilterGTD=["Israel","Palestine"]</i>

and
<i>countryfilterWiki=["Israel","Israel."]</i>


In case the data is messy and has several different ways of spelling a countries name or you want to look at more than one country at once.

## List up for the Global Terrorism Database and bracket to filter

In [206]:
countryrlistGTD=GTDclean["country"].unique().tolist()
countryrlistGTD.sort()
for i in countryrlistGTD:
    print(i)

Afghanistan
Albania
Algeria
Andorra
Angola
Antigua and Barbuda
Argentina
Armenia
Australia
Austria
Azerbaijan
Bahamas
Bahrain
Bangladesh
Barbados
Belarus
Belgium
Belize
Benin
Bhutan
Bolivia
Bosnia-Herzegovina
Botswana
Brazil
Brunei
Bulgaria
Burkina Faso
Burundi
Cambodia
Cameroon
Canada
Central African Republic
Chad
Chile
China
Colombia
Comoros
Costa Rica
Croatia
Cuba
Cyprus
Czech Republic
Czechoslovakia
Democratic Republic of the Congo
Denmark
Djibouti
Dominica
Dominican Republic
East Germany (GDR)
East Timor
Ecuador
Egypt
El Salvador
Equatorial Guinea
Eritrea
Estonia
Ethiopia
Falkland Islands
Fiji
Finland
France
French Guiana
French Polynesia
Gabon
Gambia
Georgia
Germany
Ghana
Greece
Grenada
Guadeloupe
Guatemala
Guinea
Guinea-Bissau
Guyana
Haiti
Honduras
Hong Kong
Hungary
Iceland
India
Indonesia
International
Iran
Iraq
Ireland
Israel
Italy
Ivory Coast
Jamaica
Japan
Jordan
Kazakhstan
Kenya
Kosovo
Kuwait
Kyrgyzstan
Laos
Latvia
Lebanon
Lesotho
Liberia
Libya
Lithuania
Luxembourg
Macau
Mac

# Select the country here!!!

In [207]:
countryfilterGTD=["Afghanistan"]

## List up for the Wikipedia data, followed by the bracket to set the filter

In [208]:
countrylistWiki=terrorWiki["country"].unique().tolist()
countrylistWiki.sort()
for i in countrylistWiki:
    print(i)

Abkhazia
Afghanistan
Alau.Nigeria
Algeria
Angola
Argentina
Armenia
Australia
Austria
Azerbaijan
Baghdad Iraq
Bahrain
Bangkok
Bangladesh
Belarus
Belfast Northern Ireland
Belgium
Bolivia
Border Region Niger
Bosnia and Herzegovina
Burkina Faso
Burundi
Cameroon
Canada
Central African Republic
Chad
Chile
China
Colombia
Concepción Paraguay
Crimea
Dagestan
Damascus Syria
Deir ez-Zor Syria
Democratic Republic of Congo
Democratic Republic of the Congo
Denmark
East Jerusalem
East Jerusalem,
Ecuador
Egypt
Egypt.
El-Baraf Somalia
Ethiopia
Finland
France
Gabon
Gaza Strip
Georgia
Germany
Greece
Guatemala
Haiti
Hebron
Honduras
Hungary
India
Indonesia
Iran
Iraq
Israel
Israel.
Italy
Ivory Coast
Japan
Jerusalem
Jordan
Kazakhstan
Kenya
Khost Province
Kibirizi Democratic Republic of Congo
Kosovo
Kurdistan
Kuwait
Kyrgyzstan
Lahj Governorate Yemen
Laos
Lebanon
Libya
Luqa Malta
Madagascar
Malaysia
Mali
Mexico
Mozambique
Muradiye Turkey
Myanmar
Nepal
Netherlands
Niger
Niger-Nigeria border
Nigeria
Northern Ire

# And select the country here!!!

In [209]:
countryfilterWiki=["Afghanistan"]


# Additionally, you have to define:
* How many days we want to look back. Per default, the value is 365, so an entire year
* The German name of the country that you are interested in
* The details of the attack that has just happened

In [210]:
daystogoback=30
GermanName="Afghanistan"

In [211]:
adddayofthemonth=19
addmonth=3
addyear=2018
addattacktype="bla"
adddeaths=12
addinjured=2
addcountry="Afghanistan"
addcity="Kandahar"
adddetails="blub"
addattacker="Taliban"
addcontext="Afghanistan War"

## Now we start filtering both databases for the country of interest


In [212]:
subframe=GTDclean
criteria_txt_array=[]
criteria_array=[]


if countryfilterGTD[0]!="XYZ":
    subframe=subframe[subframe["country"].isin(countryfilterGTD)]
    sub_array=[]
    sub_array.append("for the country(ies):")
    sub_array.extend(countryfilterGTD)
    criteria_txt_array.append(sub_array)
    criteria_array.append("country")

subframe=subframe.sort_values(by="Qdate", ascending=False)
terrorGTD=subframe.reset_index(drop=True)
terrorGTD.head()

Unnamed: 0,eventid,Qdate,year,month,day,approxdate,extended,resolution,deaths,injured,casualties,country_id,country,region_id,region,provstate,city,latitude,longitude,specificity,vicinity,location,summary,crit1,crit2,crit3,doubtterr,alternative,alternative_txt,multiple,success,suicide,attacktype1,attacktype1_txt,attacktype2,attacktype2_txt,attacktype3,attacktype3_txt,targtype1,targtype1_txt,targsubtype1,targsubtype1_txt,corp1,target1,natlty1,natlty1_txt,targtype2,targtype2_txt,targsubtype2,targsubtype2_txt,...,weapsubtype1_txt,weaptype2,weaptype2_txt,weapsubtype2,weapsubtype2_txt,weaptype3,weaptype3_txt,weapsubtype3,weapsubtype3_txt,weaptype4,weaptype4_txt,weapsubtype4,weapsubtype4_txt,weapdetail,nkillus,nkillter,nwoundus,nwoundte,property,propextent,propextent_txt,propvalue,propcomment,ishostkid,nhostkid,nhostkidus,nhours,ndays,divert,kidhijcountry,ransom,ransomamt,ransomamtus,ransompaid,ransompaidus,ransomnote,hostkidoutcome,hostkidoutcome_txt,nreleased,addnotes,scite1,scite2,scite3,dbsource,INT_LOG,INT_IDEO,INT_MISC,INT_ANY,related,incidentno
0,201612310039,2016-12-31,2016,12,31,2016-12-31 00:00:00,0,,1.0,6.0,7.0,4,Afghanistan,6,South Asia,Helmand,Lashkar Gah,31583664,64368699,1.0,0,,12/31/2016: An explosive device detonated in L...,1,1,1,0,,,0,1,0,3,Bombing/Explosion,,,,,14,Private Citizens & Property,67.0,Unnamed Civilian/Unspecified,Not Applicable,Civilians,4.0,Afghanistan,,,,,...,Land Mine,,,,,,,,,,,,,,0.0,0.0,0.0,0.0,-9,,,,,0.0,,,,,,,,,,,,,,,,,"""Program Summary: Afghanistan-Pul-e Alam Zinat...",,,START Primary Collection,-9,-9,0,-9,,1
1,201612310022,2016-12-31,2016,12,31,2016-12-31 00:00:00,0,,0.0,1.0,1.0,4,Afghanistan,6,South Asia,Farah,Bagh Kafi,32372459,62120092,2.0,0,,12/31/2016: Assailants shot and injured an Afg...,1,1,1,0,,,0,1,0,2,Armed Assault,,,,,3,Police,25.0,Police Security Forces/Officers,Afghan Local Police (ALP),Deputy Head: Mohammad Esa,4.0,Afghanistan,,,,,...,Unknown Gun Type,,,,,,,,,,,,,,0.0,0.0,0.0,0.0,0,,,,,0.0,,,,,,,,,,,,,,,,,"""Farah ALP officer wounded in gun attack,"" Paj...",,,START Primary Collection,-9,-9,0,-9,,1
2,201612310015,2016-12-31,2016,12,31,,0,,2.0,4.0,6.0,4,Afghanistan,6,South Asia,Kabul,Kabul,34533342,69078022,1.0,0,The incident occurred in the Qambar Square are...,12/31/2016: An explosive device attached to a ...,1,1,1,0,,,0,1,0,3,Bombing/Explosion,,,,,14,Private Citizens & Property,73.0,Vehicles/Transportation,Not Applicable,Vehicle,4.0,Afghanistan,,,,,...,Sticky Bomb,,,,,,,,,,,,,A magnetic bomb was used in the attack.,0.0,0.0,0.0,0.0,1,3.0,Minor (likely < $1 million),-99.0,A vehicle was damaged in this attack.,0.0,,,,,,,,,,,,,,,,The victims included Nazia. Casualty numbers c...,"""Wounded bride: All my dreams shattered,"" Pajh...","""Highlights: Pakistan Pashto Press 1 January D...",,START Primary Collection,-9,-9,0,-9,,1
3,201612310008,2016-12-31,2016,12,31,,0,,,1.0,1.0,4,Afghanistan,6,South Asia,Faryab,Band-e Abgardan,35842058,64539948,2.0,0,The incident occurred in Almar district.,12/31/2016: Assailants attacked Band-e Abgarda...,1,1,1,0,,,1,1,0,2,Armed Assault,,,,,14,Private Citizens & Property,75.0,Village/City/Town/Suburb,Band-e Abgardan Village,Village,4.0,Afghanistan,,,,,...,Unknown Gun Type,,,,,,,,,,,,,,0.0,,0.0,,-9,,,,,0.0,,,,,,,,,,,,,,,,Casualty numbers represent a division of the t...,"""Afghan forces foiled Taliban group attack on ...",,,START Primary Collection,0,0,0,0,"201612310006, 201612310007, 201612310008",1
4,201612310007,2016-12-31,2016,12,31,,0,,,1.0,1.0,4,Afghanistan,6,South Asia,Faryab,Bukhari Qala,35856925,64506511,1.0,0,The incident occurred in Almar district.,12/31/2016: Assailants attacked Bukhari Qala v...,1,1,1,0,,,1,1,0,2,Armed Assault,,,,,14,Private Citizens & Property,75.0,Village/City/Town/Suburb,Bukhari Qala Village,Village,4.0,Afghanistan,,,,,...,Unknown Gun Type,,,,,,,,,,,,,,0.0,,0.0,,-9,,,,,0.0,,,,,,,,,,,,,,,,Casualty numbers represent a division of the t...,"""Afghan forces foiled Taliban group attack on ...",,,START Primary Collection,0,0,0,0,"201612310006, 201612310007, 201612310008",1


In [213]:
timeback=(datetime.now())- timedelta(days=daystogoback)
print(timeback)
terror["Qdate"]=terror["Qdate"].astype(str).astype('datetime64[ns]')
subframe=terror[terror["Qdate"]>timeback]
subframe.reset_index(drop=True)
if countryfilterWiki[0]!="XYZ":
    subframe=subframe[subframe["country"].isin(countryfilterWiki)]
subframe=subframe.sort_values(by="Qdate", ascending=False)
terrorWiki_red=subframe.reset_index(drop=True)
terrorWiki_red.tail()

2018-02-17 17:39:34.522133


Unnamed: 0,day,month,year,Qdate,attacktype,deaths,injured,casualties,country,location,details,attacker,context,link
33,19,2,2018,2018-02-19,Shootings,24.0,0.0,24.0,Afghanistan,Farah Province,At least 24 policemen were killed in western F...,Taliban,War in Afghanistan,https://en.wikipedia.org/wiki/List_of_terroris...
34,19,2,2018,2018-02-19,Shooting,5.0,6.0,11.0,Afghanistan,Nad Ali District,Five police officers were killed and six other...,Taliban,War in Afghanistan,https://en.wikipedia.org/wiki/List_of_terroris...
35,18,2,2018,2018-02-18,Shooting,1.0,0.0,1.0,Afghanistan,Sharana,"In Paktika province, a policeman was killed af...",Taliban,War in Afghanistan,https://en.wikipedia.org/wiki/List_of_terroris...
36,18,2,2018,2018-02-18,Shooting,1.0,3.0,4.0,Afghanistan,Dih Yak District,"In Ghazni province, Taliban insurgents stormed...",Taliban,War in Afghanistan,https://en.wikipedia.org/wiki/List_of_terroris...
37,18,2,2018,2018-02-18,Bombing,3.0,0.0,3.0,Afghanistan,Shib Koh District,"Three civilians, including a woman, were kille...",Taliban (suspected),War in Afghanistan,https://en.wikipedia.org/wiki/List_of_terroris...


## For the Wikipedia database, we check if most recent incident is included, it should be at the top
This is the data for your context. But most likely without the attack that you want to report on is not included.  So check, if the most recent event is already listed in the dataframe up there.

If so, just keep or set the variable "dayofthemonth" in the codeblock below to 0.

If not, add the different variables in the codeblock below. Numbers for each variable that goes without quotes. Otherwise, if there are " "-signs, write words fitting to the variable name listed in between the quotes.

So for example
dayofthemonth=<i>12</i>
    
attacktype=<i>"bombing"</i>

In [214]:
addedframe=terrorWiki_red
addedframe["mostrecent"]=0
if dayofthemonth != 0:
    extraQdate=str(addyear)+"-"+str(addmonth).zfill(2)+"-"+str(adddayofthemonth).zfill(2)
    addextraQdate=datetime.strptime(extraQdate, '%Y-%m-%d')
    addlocation=city
    addcasualties=deaths+injured
    addlink="This is our incident"
    extraarray=[adddayofthemonth,addmonth,addyear, addextraQdate, addattacktype,adddeaths,addinjured,addcasualties,addcountry,addlocation,adddetails,addattacker,addcontext,addlink,1]
    addedframe.loc[-1] = extraarray  # adding a row
    addedframe.index = addedframe.index + 1  # shifting index
    addedframe = addedframe.sort_index()  # sorting by index
addedframe["Qdate"]=pd.to_datetime(addedframe["Qdate"])
addedframe

Unnamed: 0,day,month,year,Qdate,attacktype,deaths,injured,casualties,country,location,details,attacker,context,link,mostrecent
0,19,3,2018,2018-03-19,bla,12.0,2.0,10,Afghanistan,Kandahar,blub,Taliban,Afghanistan War,This is our incident,1
1,18,3,2018,2018-03-18,"Grenade attack, attempted suicide bombing",0.0,6.0,6,Afghanistan,Kabul,An attacker dressed in a school uniform set of...,Islamic State (suspected),War in Afghanistan,https://en.wikipedia.org/wiki/List_of_terroris...,0
2,17,3,2018,2018-03-17,Shooting,5.0,4.0,9,Afghanistan,Ghazni Province,Five police personnel were killed and four oth...,Taliban,War in Afghanistan,https://en.wikipedia.org/wiki/List_of_terroris...,0
3,17,3,2018,2018-03-17,Bombing,1.0,5.0,6,Afghanistan,Ghor Province,A roadside bomb killed a young shepherd and wo...,Taliban,War in Afghanistan,https://en.wikipedia.org/wiki/List_of_terroris...,0
4,17,3,2018,2018-03-17,Bombing,2.0,17.0,19,Afghanistan,Nadir Shah Kot District,Two children were killed and 17 others injured...,Taliban (suspected),War in Afghanistan,https://en.wikipedia.org/wiki/List_of_terroris...,0
5,17,3,2018,2018-03-17,Suicide car bombing,3.0,4.0,7,Afghanistan,Kabul,At least three people were killed and four oth...,Taliban,War in Afghanistan,https://en.wikipedia.org/wiki/List_of_terroris...,0
6,14,3,2018,2018-03-14,Suicide car bombing,2.0,3.0,5,Afghanistan,Nad Ali District,At least two Afghan police personnel were kill...,Taliban,War in Afghanistan,https://en.wikipedia.org/wiki/List_of_terroris...,0
7,14,3,2018,2018-03-14,Shooting,7.0,7.0,14,Afghanistan,Farah,At least seven security personnel were killed ...,Taliban,War in Afghanistan,https://en.wikipedia.org/wiki/List_of_terroris...,0
8,12,3,2018,2018-03-12,Shooting,5.0,2.0,7,Afghanistan,Farah,Five police officers were killed and two other...,Taliban,War in Afghanistan,https://en.wikipedia.org/wiki/List_of_terroris...,0
9,12,3,2018,2018-03-12,Bombing,0.0,2.0,2,Afghanistan,Sirkanay District,Two police officers were injured in a donkey b...,Taliban (suspected),War in Afghanistan,https://en.wikipedia.org/wiki/List_of_terroris...,0


## Next, we dump the filtered data into the folder "filtered_datasets"
The script gives you four different csv-files then in the filtered_datasets-folder.
1. "top5_incidents_sortedby_deaths.csv": Attacks within the last x days, sorted by number of deaths.
2. "top5_incidents_sortedby_date.csv": Your top 5 Attacks ordered chronologically.
3. "GTD_incidents_sortedby_date.csv": Attacks chronologically ordered with all background information.
4. "GTD_days_summed_sortedby_date.csv": Days with attacks. The number of deaths, injured and casualties is summed for each day. Days are listed in chronological order.

In [215]:
sortcriteria="deaths"
topx=5
sortedframe=addedframe.sort_values(by=sortcriteria, ascending=False)
sortedframe=sortedframe.reset_index(drop=True)
rankedframe=sortedframe.head(topx)
if any(rankedframe.mostrecent == 1):
    rankedframe.to_csv(("filtered_datasets/scraper_recentattack_intop{}_incidents_sortedby_{}.csv").format(topx,sortcriteria), index=False)
    chronframe=rankedframe.sort_values(by="Qdate", ascending=False)
    chronframe.to_csv(("filtered_datasets/scraper_recentattack_intop{}_incidents_sortedby_date.csv").format(topx), index=False)
else:
    mostrecentevent=sortedframe[sortedframe["mostrecent"]==1]
    rankedframe=pd.concat([rankedframe,mostrecentevent])
    rankedframe.to_csv(("filtered_datasets/scraper_recentattack_vs_top{}_incidents_sortedby_{}.csv").format(topx,sortcriteria), index=False)
    chronframe=rankedframe.sort_values(by="Qdate", ascending=False)
    chronframe.to_csv(("filtered_datasets/scraper_recentattack_vs_top{}_incidents_sortedby_date.csv").format(topx), index=False)

rankedframe

Unnamed: 0,day,month,year,Qdate,attacktype,deaths,injured,casualties,country,location,details,attacker,context,link,mostrecent
0,23,2,2018,2018-02-23,Shooting,25.0,0.0,25,Afghanistan,Bala Buluk District,25 Afghan army members died after Taliban mili...,Taliban,War in Afghanistan,https://en.wikipedia.org/wiki/List_of_terroris...,0
1,19,2,2018,2018-02-19,Shootings,24.0,0.0,24,Afghanistan,Farah Province,At least 24 policemen were killed in western F...,Taliban,War in Afghanistan,https://en.wikipedia.org/wiki/List_of_terroris...,0
2,9,3,2018,2018-03-09,Shooting,24.0,,24,Afghanistan,Bala Buluk District,At least 24 members of the Afghan security for...,Taliban,War in Afghanistan,https://en.wikipedia.org/wiki/List_of_terroris...,0
3,8,3,2018,2018-03-08,Shooting,17.0,13.0,30,Afghanistan,Khwaja Ghar District,"At least seventeen security forces, including ...",Taliban,War in Afghanistan,https://en.wikipedia.org/wiki/List_of_terroris...,0
4,19,3,2018,2018-03-19,bla,12.0,2.0,10,Afghanistan,Kandahar,blub,Taliban,Afghanistan War,This is our incident,1


In [216]:
terrorGTD.to_csv("filtered_datasets/GTD_incidents_sortedby_date.csv", index=False)


In [217]:
terrorGTD.to_csv("filtered_datasets/GTD_incidents_sortedby_date.csv", index=False)
criteria_array.append("Qdate")
sumoverdaysGTD=terrorGTD.groupby(criteria_array).sum()
sumoverdaysGTD=sumoverdaysGTD.reset_index()
sumoverdaysGTD=sumoverdaysGTD.sort_values(by="Qdate", ascending=True)
sumoverdaysGTD=sumoverdaysGTD.reset_index()
criteria_array.extend(["deaths","injured","casualties","incidentno"])
sumoverdaysGTD=sumoverdaysGTD[criteria_array]
sumoverdaysGTD.to_csv("filtered_datasets/GTD_days_summed_sortedby_date.csv", index=False)
sumoverdaysGTD.head()

Unnamed: 0,country,Qdate,deaths,injured,casualties,incidentno
0,Afghanistan,1973-05-01,0.0,1.0,1.0,1
1,Afghanistan,1979-02-14,1.0,0.0,1.0,1
2,Afghanistan,1979-08-27,50.0,0.0,50.0,1
3,Afghanistan,1979-09-09,2.0,1.0,3.0,1
4,Afghanistan,1987-05-31,0.0,2.0,2.0,1


# III. Preparing the data to just copy & paste it into our five prepared Q-graphics

## 1) Data For Isotype Ranking

Now we want to prepare the data for the isotype-ranking shown below, which lists up the five incidents with the most deaths including the most recent event or the five incidents with the most deaths PLUS the most recent event.


[This is the link to the chart template](https://q-playground.st.nzz.ch/item/39f4ce5dea12c793cf9e33cdaf87e761)


![title](Screenshots/isotype1.png)

In [218]:
import datetime as dt
sortcriteria="deaths"
mostrecentdf=rankedframe[rankedframe["mostrecent"]==1]
mostrecentdf=mostrecentdf[["Qdate",sortcriteria,"location"]]
newcolumname="recent_"+str(sortcriteria)
mostrecentdf=mostrecentdf.rename(columns={sortcriteria: newcolumname})
contextdf=rankedframe[rankedframe["mostrecent"]==0]
contextdf=contextdf[["Qdate",sortcriteria,"location"]]
contextdf=contextdf.sort_values(by=sortcriteria, ascending=False)
isotypedf=pd.concat([mostrecentdf,contextdf])
isotypedf=isotypedf[["Qdate","location",sortcriteria,newcolumname]]
isotypedf=isotypedf.reset_index(drop=True)
isotypedf['Qdate']=isotypedf['Qdate'].astype(str).astype('datetime64[ns]').dt.strftime(' (am %d.%m.%Y)')
isotypedf["Label"]=isotypedf['location'].astype(str) + isotypedf['Qdate']
isotypedf=isotypedf[["Label","deaths","recent_deaths"]]
isotypedf.to_csv("chartdata/1isotype.csv", index=False)
isotypedf

Unnamed: 0,Label,deaths,recent_deaths
0,Kandahar (am 19.03.2018),,12.0
1,Bala Buluk District (am 23.02.2018),25.0,
2,Farah Province (am 19.02.2018),24.0,
3,Bala Buluk District (am 09.03.2018),24.0,
4,Khwaja Ghar District (am 08.03.2018),17.0,


## 2) Data For Timeline

Now we want to prepare the data for a timeline as shown below, which shows when the five deadliest attacks happened.

[This is the link to the chart template](https://q-playground.st.nzz.ch/item/6294a4200868c7fb37b1cd796ac6875b)



![title](Screenshots/Timeline.png)

In [219]:
mostrecentdf=rankedframe[rankedframe["mostrecent"]==1]
mostrecentdf["mostrecent"]="Aktuelle Attacke"
mostrecentdf=mostrecentdf[["Qdate","deaths","location","mostrecent"]]
contextdf=rankedframe[rankedframe["mostrecent"]==0]
contextdf["mostrecent"]="Vorherige Attacken"
contextdf=contextdf[["Qdate","deaths","location","mostrecent"]]
contextdf=contextdf.sort_values(by="Qdate",ascending=False)
timebackdate=addextraQdate - timedelta(days=daystogoback)
startdf=pd.DataFrame(data=[[timebackdate,0,"","Startdatum"]],columns=["Qdate","deaths","location","mostrecent"])

timeline=pd.concat([mostrecentdf,contextdf,startdf])

timeline=timeline[["Qdate","deaths","mostrecent","location"]]
timeline.to_csv("chartdata/2timeline.csv", index=False)
timeline

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """


Unnamed: 0,Qdate,deaths,mostrecent,location
4,2018-03-19,12.0,Aktuelle Attacke,Kandahar
2,2018-03-09,24.0,Vorherige Attacken,Bala Buluk District
3,2018-03-08,17.0,Vorherige Attacken,Khwaja Ghar District
0,2018-02-23,25.0,Vorherige Attacken,Bala Buluk District
1,2018-02-19,24.0,Vorherige Attacken,Farah Province
0,2018-02-17,0.0,Startdatum,


## 3) Stacked Bars along X-Axis as Time

Now we want to prepare the data for a stacked bar-chart timeline as shown below, which shows how the number of deaths and injured has changed over the years for the cases that we are interested in. 

[This is the link to the chart template](https://q-playground.st.nzz.ch/item/7c3adc1beb537ab7c73fe0b0cc4d1e29)

![title](Screenshots/StackedBar.png)

In [220]:
stackedbars=terrorGTD[["year","deaths","injured"]]
stackedbars=stackedbars.groupby("year").sum()
stackedbars=stackedbars.reset_index()
for year in range(1970, 2017):
    if (stackedbars["year"]==year).any():
        pass
    else:
        stackedbars=stackedbars.append({"year":year,"deaths":0,"injured":0}, ignore_index=True)
stackedbars=stackedbars.sort_values(by="year")   
stackedbars=stackedbars.reset_index(drop=True)
stackedbars.to_csv("chartdata/3stackedbars.csv", index=False)
stackedbars.head()

Unnamed: 0,year,deaths,injured
0,1970,0.0,0.0
1,1971,0.0,0.0
2,1972,0.0,0.0
3,1973,0.0,1.0
4,1974,0.0,0.0


## 4)  Bars along X-Axis as Time

Now we want to prepare the data for a bar-chart timeline as shown below, which shows the number of attacks over the years for the cases that we are interested in.

[This is the link to the chart template](https://q-playground.st.nzz.ch/item/6294a4200868c7fb37b1cd796a5d5ac5)
![title](Screenshots/IncidentsProYear.png)

In [221]:
barchart=terrorGTD[["year","incidentno"]]
barchart=barchart.groupby("year").sum()
barchart=barchart.reset_index()
for year in range(1970, 2017):
    if (barchart["year"]==year).any():
        pass
    else:
        barchart=barchart.append({"year":year,"incidentno":0}, ignore_index=True)
barchart=barchart.sort_values(by="year")   
barchart=barchart.reset_index(drop=True)
barchart.to_csv("chartdata/4barchart.csv", index=False)
barchart.head()

Unnamed: 0,year,incidentno
0,1970,0
1,1971,0
2,1972,0
3,1973,1
4,1974,0


# 5) Lines / Grouped Bar Chart /Small Multiples 
Here, the data is prepared for a comparison betwee the country of interest and the country in the region with the highest number of deaths. The same CSV can be used for 
* 5a) [Lines Comparison Template](https://q-playground.st.nzz.ch/item/be0222c276c3186c7b4bbc773a2afac2)
* 5b) [Small Multiples Comparison Template](https://q-playground.st.nzz.ch/item/5d6ccfb7837c146c07735192f71b70f4)

![title](Screenshots/SmallMultiples.png)

In [222]:
countryofinterest=countryfilterGTD[0]
smallmultiplesdfREG=pd.read_csv("GTDData/smallmultiplesdfREG.csv")
medianpercountry=pd.read_csv("GTDData/medianpercountry_deaths.csv")
compareregion=(medianpercountry[medianpercountry["country"]==countryofinterest]["region"]).iloc[0]
regionalcomparison=smallmultiplesdfREG[(smallmultiplesdfREG["region"]==compareregion) & (smallmultiplesdfREG["country"]!=countryofinterest)]
regionalcomparison=regionalcomparison.sort_values(by="deaths", ascending=False)
worstcasecomp=regionalcomparison["country"].iloc[0]
countrydf=smallmultiplesdfREG[smallmultiplesdfREG["country"]==countryofinterest]
countrydf=countrydf.sort_values(by="year").reset_index(drop=True).rename(columns={"deaths":countryofinterest})
comparedcountrydf=smallmultiplesdfREG[smallmultiplesdfREG["country"]==worstcasecomp]
comparedcountrydf=comparedcountrydf.sort_values(by="year").reset_index(drop=True).rename(columns={"deaths":worstcasecomp})
smallmultipledf1on1=pd.concat([countrydf[["year",countryofinterest]],comparedcountrydf[worstcasecomp]],axis=1)
smallmultipledf1on1.to_csv("ChartData/5TwoCountriesComparison.csv", index=False)
smallmultipledf1on1.head()

Unnamed: 0,year,Afghanistan,Pakistan
0,1970,0.0,1.0
1,1971,0.0,0.0
2,1972,0.0,0.0
3,1973,0.0,0.0
4,1974,0.0,0.0


# 6) Stripplot

This part creates a stripplot that allows to compare the country of interest against three countries in the world with high, medium and low number of attacks in the last year.

[This is the link to the Stripplot Template](https://q-playground.st.nzz.ch/item/be0222c276c3186c7b4bbc773a3b9d3e)
![title](Screenshots/Stripplot.png)

In [223]:
GTDclean=GTDclean.sort_values(by="year", ascending=False)
latestyear=GTDclean["year"].iloc[0]
latestyeardf=GTDclean[GTDclean["year"]==latestyear]
stripplotdf=latestyeardf[(latestyeardf["country"]==countryofinterest)|(latestyeardf["country"]=="Iraq")|(latestyeardf["country"]=="France")|(latestyeardf["country"]=="Turkey")]
stripplotdf["order"]=0
stripplotdf.loc[stripplotdf['country']==countryofinterest, "order"]=1
stripplotdf.loc[stripplotdf['country']=="Iraq", "order"]=2
stripplotdf.loc[stripplotdf['country']=="Turkey", "order"]=3
stripplotdf.loc[stripplotdf['country']=="France", "order"]=4
stripplotdf.loc[stripplotdf['country']=="Iraq", "country"]="Irak"
stripplotdf.loc[stripplotdf['country']=="Turkey", "country"]="Türkei"
stripplotdf.loc[stripplotdf['country']=="France", "country"]="Frankreich"
stripplotdf.loc[stripplotdf['country']=="countryofinterest", "country"]=GermanName

stripplotdf=stripplotdf[["Qdate","country","order"]]
stripplotdf=stripplotdf.sort_values(by="Qdate")
stripplotdf.to_csv("ChartData/6stripplot.csv",index=False, encoding="utf-8")
stripplotdf

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self.obj[item] = s


Unnamed: 0,Qdate,country,order
132562,2016-01-01,Afghanistan,1
132571,2016-01-01,Afghanistan,1
132567,2016-01-01,Irak,2
132563,2016-01-01,Türkei,3
132558,2016-01-01,Irak,2
132557,2016-01-01,Irak,2
132559,2016-01-01,Irak,2
132589,2016-01-02,Irak,2
132594,2016-01-02,Irak,2
132588,2016-01-02,Irak,2
