This is a Python-based notebook to present the new change variable created by Subject to Change. It presents a CSV with the change information, along with some introductory plots.  
This serves as an easy-to-access distribution point for the project, the full details of which can be found in the accompanying publication and code repository.

In [51]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
from plotly.subplots import make_subplots
pio.renderers.default = "plotly_mimetype+notebook"

## Data

The following code block imports the dataset of modeled group-years. The same file includes a `region` column that maps each group actor to its region of operation.  
Several actors have multiple regions of operation, which we manage downstream.

In [3]:
group_years = pd.read_csv("./data/group_years_regions.csv")

print(group_years.shape)
print(group_years.columns)

(2180, 22)
Index(['Unnamed: 0', 'ucdp_name', 'ucdp_dset_id', 'year', 'modeledarticles',
       'countT1', 'countT2', 'propT1', 'propT2', 'propdiff', 'propdif.L1',
       'propdif.L2', 'delta1', 'delta1.5', 'delta2', 'delta1_L2', 'gap25',
       'gap50', 'counter', 'frexWords', 'side_b_dset_id', 'region'],
      dtype='object')


## Data Description and Introduction

**Name** ('ucdp_name'): The human-readable group name, corresponding to the UCDP "side b" name.

**UCDP Dataset ID** ('ucdp_dset_id'): The UCDP new dataset ID code, included to help users integrate the data with other research.

**Year** ('year'): Year of activity being modeled 

**Modeled Articles** ('modeledarticles'): Number of articles modeled in this year

**Count Topic One** ('countT1'): Number of articles assigned to the group-specific Topic One

**Count Topic Two** ('countT2'): Number of articles assigned to the group-specific Topic Two.

**Proportion Topic One** ('propT1'): Proportion of articles in the group-year assigned to the group-specific Topic One. Constructed as (Count Topic One/Modeled Articles).

**Proportion Topic Two** ('propT2'): Proportion of articles in the group-year assigned to the group-specific Topic Two. Constructed as (Count Topic Two/Modeled Articles). 

**Proportion Difference** ('propdiff'): The yearly difference between the proportions of articles assigned to the group's Topic Two and Topic One. Created as (Proportion Topic Two - Proportion Topic One). This variable quantifies how concentrated the framing is for a group in any given year. Higher absolute values indicate a year in which articles about a group's activity are more concentrated toward the Topic Two (1) or Topic One (-1) end of their specific scale.

**Lagged Difference** ('propdif.L1'): The change in Proportion Difference relative to the previous year (lag = 1). Constructed as (Proportion Difference - Previous Year's Proportion Difference). This variable serves as the primary driver for pinpointing representational changes.

**Lagged Difference 2** ('propdif.L2'): This variable measures how much the representation has changed over a two-year window. It is constructed as (Proportion Difference in the current year - Proportion Difference two years earlier). This variable captures cases in which a group's representation changed dramatically, but over two years rather than one. This basic window can be arbitrarily expanded in 02tinyThreshTransformVarRep.R, or using the data in this notebook.
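
The constructions above can be sketched in pandas. This is a minimal sketch, not the project's R implementation: it assumes rows are identified by `ucdp_dset_id` and `year`, that each modeled article is assigned to exactly one of the two topics (so `propdiff` lies in [-1, 1]), and that each group's years are consecutive, because `diff()` compares a row with the previous row rather than with calendar year t-1. The toy counts are illustrative, not taken from the dataset.

```python
import pandas as pd

# Toy group-years for two groups (illustrative values only, not from the dataset)
df = pd.DataFrame({
    "ucdp_dset_id":    [1, 1, 1, 1, 2, 2, 2],
    "year":            [2000, 2001, 2002, 2003, 2000, 2001, 2002],
    "modeledarticles": [10, 20, 8, 4, 8, 8, 8],
    "countT1":         [10, 15, 2, 0, 3, 3, 7],
    "countT2":         [0, 5, 6, 4, 5, 5, 1],
})

# Share of each year's modeled articles assigned to each group-specific topic
df["propT1"] = df["countT1"] / df["modeledarticles"]
df["propT2"] = df["countT2"] / df["modeledarticles"]

# Framing concentration: +1 means all Topic Two, -1 means all Topic One
df["propdiff"] = df["propT2"] - df["propT1"]

# Lagged differences are computed within each group, in year order.
# diff() compares with the previous *row*, so this assumes no missing years.
df = df.sort_values(["ucdp_dset_id", "year"])
by_group = df.groupby("ucdp_dset_id")["propdiff"]
df["propdif.L1"] = by_group.diff(1)  # change from t-1; NaN in a group's first year
df["propdif.L2"] = by_group.diff(2)  # change from t-2; NaN in a group's first two years

print(df[["ucdp_dset_id", "year", "propdiff", "propdif.L1", "propdif.L2"]])
```

If a group has gaps in its years, one option is to reindex each group onto a full range of years before differencing, so that a gap yields NaN instead of a comparison across the gap.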

The next three variables are indicators that serve as a shorthand for bins of year-on-year shifts in framing:

**Small Change** ('delta1'): An indicator for whether the group-year has a one-year lagged proportion difference of at least |1|
 
**Medium Change** ('delta1.5'): An indicator for whether the group-year has a one-year lagged proportion difference of at least |1.5|

**Large Change** ('delta2'): An indicator for whether the group-year has a one-year lagged proportion difference of at least |2|. This is the largest possible shift in framing, and captures a situation in which all articles about a group's activity fall on one end of the scale in the first year and entirely on the other end in the following year.

**Lagged Large Change** ('delta1_L2'): An indicator for whether the group-year has a two-year lagged proportion difference ('propdif.L2') of at least |1|.
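
Given the lagged differences, the four indicators reduce to absolute-value thresholds. A minimal sketch, assuming the thresholds are inclusive ("at least") and that group-years with no lagged value (a group's first year or two) are coded 0; the released data may code those years differently. The values are toys.

```python
import pandas as pd

# Toy lagged differences (illustrative only); NaN marks a group's first year(s)
df = pd.DataFrame({
    "propdif.L1": [float("nan"), 0.5, -1.2, 1.6, -2.0],
    "propdif.L2": [float("nan"), float("nan"), -0.7, 1.1, -0.4],
})

# Absolute values, so shifts toward either topic count as changes.
# NaN >= threshold evaluates to False, so missing lags become 0.
abs_l1 = df["propdif.L1"].abs()
df["delta1"] = (abs_l1 >= 1).astype(int)
df["delta1.5"] = (abs_l1 >= 1.5).astype(int)
df["delta2"] = (abs_l1 >= 2).astype(int)
df["delta1_L2"] = (df["propdif.L2"].abs() >= 1).astype(int)

print(df)
```

Because the bins are nested, every `delta2` year is also a `delta1.5` year and every `delta1.5` year is also a `delta1` year.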

The difference variables (which start with 'gap') are conceptually the inverse of the "delta" variables. They are binary variables indicating years without a large separation between topics. The goal is to pinpoint years where news articles are about equally likely to fall on each side of the group scale. This can indicate periods of slow framing transition, groups that mix different operating modes, or a scale whose two sides are similar.  

**Small Difference** ('gap25'): A binary variable that takes the value of 1 when the difference between the proportions of group Topic Two and Topic One is |0.25| or less for a given year. In other words, these are years in which articles about the group's operations were roughly evenly split on its specific axis. 

**Medium Difference** ('gap50'): A binary variable that takes the value of 1 when the difference between the proportions of group Topic Two and Topic One is |0.50| or less. This captures the same substantive concept as the "Small Difference" variable, but with a slightly wider aperture. 
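
Both gap indicators are thresholds on the absolute value of 'propdiff' in the same year (not on a lag). A minimal sketch, assuming inclusive thresholds ("or less"); the values are toys.

```python
import pandas as pd

# Toy framing-concentration values (illustrative only)
df = pd.DataFrame({"propdiff": [0.0, -0.2, 0.25, 0.4, -0.5, 0.9]})

# Years whose articles are roughly evenly split between the two topics
abs_diff = df["propdiff"].abs()
df["gap25"] = (abs_diff <= 0.25).astype(int)
df["gap50"] = (abs_diff <= 0.50).astype(int)

print(df)
```

Since 0.25 is inside the 0.50 window, every `gap25` year is also a `gap50` year.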

**Years Since Last Change** ('counter'): This variable is 0 in years with a change (a one-year lagged proportion difference of at least |1|, i.e. 'delta1' = 1) and otherwise counts the years elapsed since the group's most recent change.
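
The counter can be sketched as follows. `years_since_change` is a hypothetical helper, not the project's code; it assumes rows are sorted by year within each group with no missing years, and, because the description above does not say how years before a group's first change are coded, it leaves those years as NaN.

```python
import pandas as pd

def years_since_change(delta1: pd.Series) -> pd.Series:
    """Hypothetical reconstruction of 'counter' for one group's rows, sorted by year.

    Returns 0 in change years (delta1 == 1) and adds one per subsequent row.
    Rows before the group's first change are left as NaN (an assumption).
    """
    counts = []
    current = None  # stays None until the first change year is seen
    for flag in delta1:
        if flag == 1:
            current = 0
        elif current is not None:
            current += 1
        counts.append(current)
    return pd.Series(counts, index=delta1.index, dtype="float")

# Toy data (illustrative only): rows must be sorted by year within each group
df = pd.DataFrame({
    "ucdp_dset_id": [1, 1, 1, 1, 2, 2, 2],
    "year":         [2000, 2001, 2002, 2003, 2000, 2001, 2002],
    "delta1":       [0, 1, 0, 0, 1, 0, 1],
}).sort_values(["ucdp_dset_id", "year"])

# Apply per group so the count resets for each actor
df["counter"] = df.groupby("ucdp_dset_id")["delta1"].transform(years_since_change)
print(df)
```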

**Characterization of Dominant Topic** ('frexWords'): These are the "FREX" words that summarize the dominant topic assigned to the group-specific model and year. 

**Region** ('region'): The region of activity associated with each group. It is derived by extracting the region associated with each group's events in the UCDP dataset. Fourteen groups had associated events spanning more than one region; these groups were coded as "Multiple".
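
The "Multiple" coding rule can be sketched as a per-group aggregation. This assumes a hypothetical event-level table with one row per UCDP event, carrying the group's `side_b_dset_id` and the event's `region`; the table, its toy values, and the helper `collapse_regions` are illustrative, not the project's code.

```python
import pandas as pd

# Hypothetical event-level table: one row per event, with the acting group's
# UCDP side B ID and the event's region (toy values only)
events = pd.DataFrame({
    "side_b_dset_id": [101, 101, 102, 102, 103],
    "region": ["Africa", "Africa", "Asia", "Middle East", "Europe"],
})

def collapse_regions(regions: pd.Series):
    """Return a group's single region, "Multiple" if its events span several, or None if none are recorded."""
    unique_regions = regions.dropna().unique()
    if len(unique_regions) == 0:
        return None
    if len(unique_regions) == 1:
        return unique_regions[0]
    return "Multiple"

# One row per group with its collapsed region
group_region = (events.groupby("side_b_dset_id")["region"]
                .apply(collapse_regions)
                .reset_index(name="region"))
print(group_region)
```

The per-group result could then be merged onto the group-year table on `side_b_dset_id`.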


## Prepare Features for Plotting and Summaries

Here we prepare a dataframe, `numchanges`, with a high-level summary of the number of identified changes per group.

We also prepare the dataframe `yearsums`, which collects the number of group changes recorded in a given year.

In [5]:
## Helper dataframe that counts the number of changes per group:
numchanges = group_years.groupby(['ucdp_dset_id',
                                  'ucdp_name', 'region'])['delta1'].apply(lambda x:
                                                                (x == 1).sum()).reset_index(name='numchanges')

yearsums = group_years.groupby("year")['delta1'].apply(lambda x:
                                                       (x == 1).sum()).reset_index(name='year_total')

## Plots

We start with a high-level summary plot.
The first plot presents the number of changes across all groups in a given year. 

There is a clear temporal trend: yearly change counts increase over time.
The trend has local peaks in 2002 and 2015, which may reflect attention and global framing effects driven by the War on Terror and the rise and spread of the ISIS global insurgency.
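
Local peaks like these can also be flagged programmatically. A minimal sketch, assuming `yearsums` has one row per consecutive year (as built earlier in the notebook); a toy stand-in is defined here so the snippet runs on its own, and its counts are not the real data.

```python
import pandas as pd

# Toy stand-in for `yearsums` (illustrative counts, not the real data)
yearsums = pd.DataFrame({
    "year":       [2000, 2001, 2002, 2003, 2004, 2005],
    "year_total": [3, 5, 2, 2, 4, 1],
}).sort_values("year")

totals = yearsums["year_total"]
# A local peak is strictly higher than both the previous and the next year.
# Comparisons against the missing neighbour of the first/last year are False,
# so endpoints are never flagged; ties (plateaus) are not flagged either.
is_peak = (totals > totals.shift(1)) & (totals > totals.shift(-1))
peak_years = yearsums.loc[is_peak, "year"].tolist()
print(peak_years)  # [2001, 2004]
```

On the real data, drop the toy definition and run the last three lines against the `yearsums` built above.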

In [18]:
## Start by creating static plots, and eventually move towards
## a dynamic interface


fig_all = px.scatter(yearsums, x = "year", y="year_total", 
                 title="Yearly Count of Framing Changes of at Least |1|",
                labels={"year_total": "Number of Changes",
                        "year": "Year"})
fig_all.update_layout(plot_bgcolor='white')
## Marking for guidelines:
fig_all.update_xaxes(
    mirror=True,
    ticks='outside',
    dtick=1,
    showline=True,
    linecolor='black',
   # gridcolor='lightgrey'
)
fig_all.update_yaxes(
    mirror=True,
    ticks='outside',
    showline=True,
    linecolor='black',
    tick0=0,
    dtick =1
    #gridcolor='lightgrey'
)
fig_all.show()

## Frequency of changes by region:

(Mouse over the plot for the yearly tally)

In [43]:
region_sums = group_years.groupby(["year", "region"])['delta1'].apply(lambda x:
                                                       (x == 1).sum()).reset_index(name='year_total')
fig_rs = px.line(region_sums, x = "year", y="year_total", color="region", 
                 title="Yearly Count of Framing Changes of at Least |1|, by Region",
                labels={"year_total": "Number of Changes",
                        "year": "Year", 
                       "region": "Region"})

fig_rs.update_layout(plot_bgcolor='white')
fig_rs.show()

## Visualize by Region

The second set of plots presents the number of changes per group, segmented by region.

First, some data preparation to summarize by region. 

Create summary dataframes for each region, keeping only the columns we want to display.

This changes the unit of analysis, but allows us to quickly visualize the number of changes associated with specific militant groups.

In [22]:
## Keep only the columns we care about:

cols = ["ucdp_dset_id", "ucdp_name", "numchanges", "region"]

region_changes = numchanges[cols].drop_duplicates()

## Adjust BRA which is in twice with different IDs:

region_changes.loc[(region_changes["ucdp_dset_id"] == 289), "ucdp_name" ] = "BRA1"
region_changes.loc[(region_changes["ucdp_dset_id"] == 328), "ucdp_name" ] = "BRA2"

## Some of the names are too long for the plot, so make a label column with truncated names.
## Note that 21 characters is the min length to not have artificial duplicates in Europe.
## Only names longer than 21 characters are truncated and marked with "..".
region_changes["label"] = [(item[:21] + "..") if
                           len(item) > 21 else
                           item for item in region_changes["ucdp_name"].values]
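
The comment above says 21 characters is the shortest truncation that avoids artificial duplicates in Europe. A sketch of how such a length could be checked, assuming the requirement is that distinct group names within a region stay distinct after truncation; `min_unique_prefix` is a hypothetical helper and the names below are toys.

```python
import pandas as pd

def min_unique_prefix(names: pd.Series) -> int:
    """Smallest n such that truncating every distinct name to n characters keeps them distinct."""
    unique_names = pd.Series(names.unique())
    max_len = int(unique_names.str.len().max())
    for n in range(1, max_len + 1):
        if unique_names.str[:n].nunique() == len(unique_names):
            return n
    return max_len  # full-length names are always distinct

# Toy region/name table (illustrative names only)
toy = pd.DataFrame({
    "region": ["Europe", "Europe", "Africa"],
    "ucdp_name": ["Government of A", "Government of B", "Front X"],
})
by_region = toy.groupby("region")["ucdp_name"].apply(min_unique_prefix)
print(by_region)
```

On the notebook's data, `region_changes.groupby("region")["ucdp_name"].apply(min_unique_prefix)` would report the shortest safe prefix for each region. The check uses the bare prefix and ignores the appended `..`.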

## Africa

In [30]:
df_africa = region_changes.query('region in ["Africa"]')

dat = df_africa.sort_values(by="numchanges", ascending = False).drop_duplicates()

fig_africas = px.bar(dat, #sort values in df
              y="label",
             x = "numchanges",
              hover_name="ucdp_name",
            labels={ "label": "Group Name (Abbrev.)",
                     "numchanges": "Number of Associated Changes",
                     "ucdp_name": "Group Name"},
                title="Change Distribution: Africa")

fig_africas.update_layout(plot_bgcolor='white')

fig_africas.show()

## Americas

In [31]:
df_am = region_changes.query('region in ["Americas"]')

dat = df_am.sort_values(by="numchanges", ascending = False).drop_duplicates()

fig_am = px.bar(dat, #sort values in df
              y="label",
             x = "numchanges",
            hover_name="ucdp_name",
            labels={"label": "Group Name (Abbrev.)",
                     "numchanges": "Number of Associated Changes",
                     "ucdp_name": "Group Name"},
                title="Change Distribution: Americas")
fig_am.update_layout(plot_bgcolor='white')
fig_am.show()

## Asia

In [35]:
df_asia = region_changes.query('region in ["Asia"]')

dat = df_asia.sort_values(by="numchanges", ascending = False)

fig_asia = px.bar(dat, #sort values in df
              y="label",
             x = "numchanges",
            hover_name="ucdp_name",
            labels={"numchanges": "Number of Associated Changes",
                    "label": "Group Name (Abbrev.)",
                    "ucdp_name": "Group Name"},
                title="Change Distribution: Asia")
fig_asia.update_layout(plot_bgcolor='white')
fig_asia.show()



## Europe 

In [36]:
df_euro = region_changes.query('region in ["Europe"]').sort_values(by="numchanges", ascending = False)

fig_euro = px.bar(df_euro, #sort values in df
              y="label",
             x = "numchanges",
              hover_name="ucdp_name",
            labels={"label": "Group Name (Abbrev.)",
                     "numchanges": "Number of Associated Changes",
                     "ucdp_name": "Group Name"},
                title="Change Distribution: Europe")
fig_euro.update_layout(plot_bgcolor='white')
fig_euro.show()

## Middle East

In [39]:
df_me = region_changes.query('region in ["Middle East"]').sort_values(by="numchanges", ascending = False)

fig_me = px.bar(df_me, #sort values in df
              y="label",
             x = "numchanges",
              hover_name="ucdp_name",
            labels={"label": "Group Name (Abbrev.)",
                     "numchanges": "Number of Associated Changes",
                     "ucdp_name": "Group Name"},
                title="Change Distribution: Middle East")
fig_me.update_layout(plot_bgcolor='white')
fig_me.show()

## Groups Spanning Multiple Regions

In [41]:

df_mult = region_changes.query('region in ["Multiple"]').sort_values(by="numchanges", ascending = False).drop_duplicates()

## Plot df_mult (this region's data), not the `dat` left over from an earlier cell
fig_mult = px.bar(df_mult,
              y="label",
              x="numchanges",
              hover_name="ucdp_name",
              labels={"label": "Group Name (Abbrev.)",
                      "numchanges": "Number of Associated Changes",
                      "ucdp_name": "Group Name"},
              title="Change Distribution: Groups Spanning Multiple Regions")
fig_mult.update_layout(plot_bgcolor='white')
fig_mult.show()

## Plot all at once

The previous plots are repetitive, so the following code block produces them via a single loop:

In [55]:
#print(region_changes.columns)

regions = [r for r in region_changes["region"].unique()]

print(regions)

['Multiple', 'Asia', 'Middle East', 'Europe', 'Africa', 'Americas']


In [63]:
## Function + Loop:

def plot_regions(data, r):
    """Return a bar chart of the number of changes per group in region `r`."""
    df_region = data[data["region"] == r].sort_values(by="numchanges",
                                                      ascending=False).drop_duplicates()
    figr = px.bar(df_region,
                  y="label",
                  x="numchanges",
                  hover_name="ucdp_name",
                  labels={"numchanges": "Number of Associated Changes",
                          "label": "Group Name (Abbrev.)",
                          "ucdp_name": "Group Name"},
                  title="Change Distribution: " + r)
    figr.update_layout(plot_bgcolor='white')
    return figr

# Plot all regions; a plain loop avoids echoing a list of None return values
for r in regions:
    plot_regions(region_changes, r).show()


## Spotlight on Groups of Interest

This section presents code for spotlighting groups of interest. A more interactive version is demonstrated in 08_Dash_Dev.ipynb and (eventually) via the accompanying Dash app. 

The Subject to Change manuscript presents a series of validation plots showing the topic assignment of all stories associated with several groups (Abu Sayyaf, AQAP, LRA, ONLF, PKK). These were chosen as illustrations because they are associated with strong pre-existing narratives about organizational trajectories, which we can use to validate the modeling approach. However, interested users might want to investigate the distribution of other groups in the data. 

In [75]:
test_graph = group_years.query('ucdp_name in ["KDPI", "KNU", "AQAP"]')
print(test_graph.shape) # (52, 22)

(52, 22)


In [76]:
groupdb = ["AQAP"] ## insert group(s) of interest
mask = group_years.ucdp_name.isin(groupdb)

fig = px.scatter(group_years[mask], 
              x="year", 
              y="propdiff",
              color="ucdp_name",
              title='Group Trajectory',
                labels = {"ucdp_name" : "Selected Group", 
                         "propdiff" : "Framing Concentration", 
                         "year" : "Year"})
fig.update_layout(plot_bgcolor='white')
fig.update_xaxes(
    mirror=True,
    ticks='outside',
    dtick=1,
    showline=True,
    linecolor='black',
   # gridcolor='lightgrey'
)
fig.update_yaxes(
    mirror=True,
    ticks='outside',
    showline=True,
    linecolor='black',
    tick0=0,
    dtick =1
    #gridcolor='lightgrey'
)
fig.add_hline(y=0.0, line_dash="dash", line_color="lightgray")
fig.show()
