# Machines of healing grace?

Code for the basic analysis and results in the AI v Covid paper.

**Sections**

1. **Descriptive analysis**
  * How much Covid and AI activity do we detect in our data sources?
  * Is AI over- or underrepresented in Covid research?
  * How has AI activity evolved over time?
2. Topical analysis
  * What is the topical composition of Covid research and in what areas is AI focusing?
  * What are some examples of AI research to tackle Covid?
  * How has it evolved over time?
3. Geography
  * Where is AI research happening?
  * Who is doing it?
  * Do we find any differences in the topics that different countries focus on?
  * What explains whether a country focuses on Covid research? Demand pull or supply push?
4. Knowledge base
  * What topics do AI researchers draw on?
5. Analysis of diffusion
  * What determines the focus of AI researchers on particular topics?
  * Does Covid-oriented AI research reflect the composition of the broader field?
  * Which researchers have been attracted to AI research and why?
  

## Preamble

In [None]:
%run ../notebook_preamble.ipy

In [None]:
import altair as alt
from altair_saver import save
from toolz.curried import *

In [None]:
FIG_PATH = f"{project_dir}/reports/figures/report_1"
SRC_PATH = f"{project_dir}/data/processed/ai_research"


In [None]:
pd.options.mode.chained_assignment = None

In [None]:
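def save_fig(figure, name):
    """Save an Altair chart as a PNG in FIG_PATH.

    DRIVER is not defined in this notebook; it is presumably set in notebook_preamble.ipy.
    """
    save(figure, f'{FIG_PATH}/{name}.png', method='selenium',
         webdriver=DRIVER, scale_factor=3)


def preview(x):
    """Print the head and shape of a dataframe and return it, so it can be used in .pipe()."""
    print(x.head())
    print(x.shape)
    return x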

## 1. Read data

In [None]:
xiv = pd.read_csv(f"{SRC_PATH}/xiv_papers_labelled.csv").pipe(preview)

In [None]:
xiv.columns = [x.lower() for x in xiv.columns]

## 2. Analysis

In [None]:
alt.data_transformers.disable_max_rows()

In [None]:
# Parse the creation date into a pandas datetime
xiv['date'] = pd.to_datetime(xiv['created'])

In [None]:
xiv_recent = xiv.query("year > 2016")

In [None]:
xiv_daily = xiv_recent['date'].value_counts().rename('all_xiv')

In [None]:
queries = ["is_covid == 1","is_ai == 1","(is_covid ==1) & (is_ai ==1)"]
names = ['covid','ai','covid_ai']

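# Daily paper counts: all papers plus one column per category (days without papers count as 0)
all_series = pd.concat([xiv_daily,
    pd.concat([xiv_recent.query(q)['date'].value_counts().rename(n) for n,q in zip(names,queries)],axis=1)],axis=1).fillna(0)

# Name the index explicitly: recent pandas versions label the value_counts index 'date' rather than 'index'
all_series_long = all_series.rename_axis('index').reset_index(drop=False).melt(id_vars='index')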

In [None]:
sort_variables = ['covid','ai','covid_ai','all_xiv']

plot_all = (alt
            .Chart(all_series_long)
            .mark_line()
            .encode(row=alt.Row('variable:O'),
                    x='yearmonth(index)',
                    y='mean(value)',
                    color='variable'))

plot_all.resolve_scale(y='independent').properties(width=150,height=100)

In [None]:
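# Rescale each series so that it sums to 100, which makes categories of very different sizes comparable
all_series_long_norm = ((100*all_series.apply(lambda x: x/x.sum()))
                        .rename_axis('index').reset_index(drop=False).melt(id_vars='index'))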

plot_all_norm = (alt
           .Chart(all_series_long_norm)
           .mark_line()
           .encode(x=alt.X('yearmonth(index)',title=''),
                   y=alt.Y('mean(value)',title='% of all papers in category (monthly mean)'),
                   color=alt.Color('variable',title='Category')))

times_norm = plot_all_norm.properties(width=300,height=200)

save_fig(times_norm,"fig_1_trends")

times_norm
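
The normalisation used for the trends figure can be checked on a toy example. This is a minimal sketch with made-up counts (the `toy` dataframe and its values are hypothetical, not taken from the data): each column is divided by its own total, so every series sums to 100 regardless of how many papers the category contains.

```python
import pandas as pd

# Hypothetical monthly counts, for illustration only
toy = pd.DataFrame(
    {"all_xiv": [10, 30, 60], "covid": [0, 1, 3]},
    index=pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01"]),
)

# Same transformation as in the cell above: column-wise share of the column total, in %
toy_norm = 100 * toy.apply(lambda x: x / x.sum())

# Long format for Altair; the index is named explicitly so the id column is called 'index'
toy_long = toy_norm.rename_axis("index").reset_index().melt(id_vars="index")

print(toy_norm)
print(toy_long.shape)
```

Because each series is scaled by its own total, the lines show when activity in a category happened, not how large the category is relative to the others.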

In [None]:
# Representation: share of AI papers among all recent papers vs among Covid papers

rep = pd.concat([xiv_recent['is_ai'].value_counts(normalize=True),
                 xiv_recent.query('is_covid == 1')['is_ai'].value_counts(normalize=True)],
                axis=1).loc[1]

rep.index = ['share_of_all','share_of_covid']
rep
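
One way to summarise the two shares in `rep` as a single number is their ratio. The sketch below uses made-up data and assumes `is_ai` and `is_covid` are coded as 0/1, so the mean of `is_ai` equals the share of AI papers; the name `representation_ratio` is introduced here for illustration and is not part of the original analysis. A ratio above 1 would suggest AI is overrepresented in Covid research, a ratio below 1 that it is underrepresented.

```python
import pandas as pd

# Hypothetical toy data: 8 papers, 3 of them AI, 4 of them Covid
toy = pd.DataFrame({
    "is_ai":    [1, 0, 0, 0, 1, 1, 0, 0],
    "is_covid": [0, 0, 0, 0, 1, 1, 1, 1],
})

# Share of AI papers in the whole corpus (3 of 8)
share_of_all = toy["is_ai"].mean()

# Share of AI papers among Covid papers (2 of 4)
share_of_covid = toy.loc[toy["is_covid"] == 1, "is_ai"].mean()

# Values above 1 indicate overrepresentation of AI in Covid research
representation_ratio = share_of_covid / share_of_all
print(round(representation_ratio, 2))  # 1.33
```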

In [None]:
# By data source

source_shares = 100*pd.concat(
    [xiv_recent.query(q)['article_source'].value_counts(normalize=True).rename(n) for n,q in zip(names,queries)],axis=1)

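# Name the index explicitly so the melted id column is called 'index' regardless of the pandas version
source_shares_long = source_shares.rename_axis('index').reset_index(drop=False).melt(id_vars=['index'])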

In [None]:
prop = alt.Chart(
    source_shares_long).mark_bar().encode(y=alt.Y('variable',title='Category'),
                                          x=alt.X('value',title='% of papers in category'),
                                          color=alt.Color('index:N',title='Source'))
source_prop = prop.properties(width=200,height=100)

save_fig(source_prop,'fig_2_source_shares')