# Impact of the 21st Century Cures Act on Stimulating Collaboration
________________________________________________________________________

## 2. Understanding New and Ongoing Collaborations

**About:** 

This notebook is provided to meet the data availability requirements for a scientific publication.

The data are derived from NIH's internal datasets and from Digital Science's Dimensions platform (https://app.dimensions.ai/), using Dimensions metadata as of March 2024. Access was granted under a license agreement with the National Cancer Institute. Researchers interested in exploring the data further should visit the Dimensions platform website.

**Notebook Goals:**

This notebook helps answer the following two research questions:
1) Did new collaborations form between 21st Century Cures Act funded investigators?  
2) Did collaborations that began before 21st Century Cures Act funding persist with Cures Act funding support?

This notebook is part of a series of notebooks on evaluating the impact of the 21st Century Cures Act on stimulating collaboration in the cancer research community.

**Key Definitions:**
- Cures Act PI Network: a network of investigators who were funded through the 21st Century Cures Act in FY 2017-2023.
- New collaboration: a pairwise collaboration between PIs that began in or after the fiscal year during which at least one PI in the collaborative pair received 21st Century Cures Act funding.
- Ongoing collaboration: a pairwise collaboration between PIs that occurred both before and after the fiscal year during which at least one PI in the collaborative pair received 21st Century Cures Act funding.  

**Required Packages:**
- Pandas

**Notebook Input Files:**

This notebook assumes you used the filepaths recommended in Notebook 1. If you did not, be sure to change the input paths to the location where you saved the Cures Act PI Network nodes and aggregated edge tables.

- Input Filepath 1: "../data/cures_nodes.csv"
    - The nodes table for the Cures Act PI Network
- Input Filepath 2: "../data/cures_agg_edges.csv"
    - The aggregated edge table for the Cures Act PI Network

**Notebook Output Files:**

No outputs are generated in this notebook.

## Import Packages

In [None]:
import pandas as pd

## Read in Data

In [None]:
# Read in the node data
nodes_df = pd.read_csv("../data/cures_nodes.csv")
print(nodes_df.shape)

nodes_df.head()

In [None]:
# Read the aggregated edge data 
agg_edges_df = pd.read_csv("../data/cures_agg_edges.csv")
print(agg_edges_df.shape)

agg_edges_df.head()

## Analyze and Extract Insights from Data

### Categorizing Collaboration Periods

We define a new collaboration as one that began in or after the fiscal year during which at least one PI in the collaborative pair received 21st Century Cures Act funding. We define an ongoing collaboration as one that occurred both before and after that fiscal year. This categorization is available in the "ms_collab_period" column of the aggregated edge table, where new collaborations are labeled "Post-Moonshot" and ongoing collaborations are labeled "Pre&Post-Moonshot".
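
As a rough illustration of the definitions above, the sketch below assigns a period label to toy collaboration pairs. It is not the procedure used to build the aggregated edge table. The columns "first_collab_fy", "last_collab_fy", and "earliest_cures_fy" and the "Pre-Moonshot" label are hypothetical names introduced only for this example, and the sketch assumes that a collaboration occurring in the funding fiscal year itself counts as "after".

```python
import pandas as pd

# Toy edge table. "first_collab_fy", "last_collab_fy", and "earliest_cures_fy"
# are hypothetical columns, not columns of the real aggregated edge table.
toy_edges = pd.DataFrame({
    "source": ["A", "A", "B"],
    "target": ["B", "C", "C"],
    "first_collab_fy": [2015, 2018, 2012],
    "last_collab_fy": [2020, 2021, 2014],
    "earliest_cures_fy": [2017, 2018, 2019],
})


def classify_period(row):
    """Label one pair using the definitions in the text (assumed logic)."""
    if row["first_collab_fy"] >= row["earliest_cures_fy"]:
        # The pair first collaborated in or after the funding fiscal year
        return "Post-Moonshot"
    if row["last_collab_fy"] >= row["earliest_cures_fy"]:
        # The pair collaborated before the funding fiscal year and again in or after it
        return "Pre&Post-Moonshot"
    # Hypothetical label for pairs that only collaborated before funding
    return "Pre-Moonshot"


toy_edges["ms_collab_period"] = toy_edges.apply(classify_period, axis=1)
print(toy_edges[["source", "target", "ms_collab_period"]])
```

In this toy table, the A-B pair spans the funding year and is treated as ongoing, the A-C pair starts in the funding year and is treated as new, and the B-C pair falls entirely before funding.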

In [None]:
# Save a value counts of "ms_collab_period" to a series
collab_period_series = agg_edges_df["ms_collab_period"].value_counts(dropna=False)

collab_period_series

In [None]:
# Total new and ongoing
collab_period_series["Post-Moonshot"] + collab_period_series["Pre&Post-Moonshot"]
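
Indexing the value counts by label raises a `KeyError` if a label is absent from the data. A minimal sketch of a more defensive total, using `Series.get` with a default of 0 and a toy series in place of the real "ms_collab_period" column, might look like this (the variable names are illustrative only):

```python
import pandas as pd

# Toy stand-in for agg_edges_df["ms_collab_period"]; values are made up
toy_periods = pd.Series(
    ["Post-Moonshot", "Post-Moonshot", "Pre&Post-Moonshot", None]
)

counts = toy_periods.value_counts(dropna=False)

# Series.get returns the default instead of raising when a label is missing
n_new = counts.get("Post-Moonshot", 0)
n_ongoing = counts.get("Pre&Post-Moonshot", 0)
n_missing_label = counts.get("Some-Absent-Label", 0)

total_new_and_ongoing = n_new + n_ongoing
print(total_new_and_ongoing)

# normalize=True reports each category as a fraction of all rows
shares = toy_periods.value_counts(normalize=True, dropna=False)
print(shares)
```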

### Characterizing New Collaborations

In [None]:
# Subset the aggregated edge data to the post-21st Century Cures Act funded group (new collaborations)
new_collabs_df = agg_edges_df[agg_edges_df["ms_collab_period"] == "Post-Moonshot"]

new_collabs_df.head()

In [None]:
# How many PIs are part of new collabs?
# This is the total number of unique PIs across the "source" and "target" columns

pis = list(new_collabs_df["source"]) + list(new_collabs_df["target"])

unique_new_collaborators = list(set(pis))

# Count of PIs part of new collabs
print(len(unique_new_collaborators))

# Share of all PIs in the network that are part of new collabs (a fraction between 0 and 1)
print(len(unique_new_collaborators)/len(nodes_df))
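
The same count can also be written with pandas alone. The sketch below uses toy tables in place of the real subset and nodes tables. The "pi_id" column name is a hypothetical placeholder, and the example assumes the nodes table has one row per PI.

```python
import pandas as pd

# Toy stand-ins for new_collabs_df and nodes_df; values are made up
toy_new_collabs = pd.DataFrame({
    "source": ["A", "A", "B"],
    "target": ["B", "C", "C"],
})
toy_nodes = pd.DataFrame({"pi_id": ["A", "B", "C", "D"]})  # "pi_id" is hypothetical

# Stack both endpoint columns into one series, then count distinct PIs
all_endpoints = pd.concat([toy_new_collabs["source"], toy_new_collabs["target"]])
n_unique_pis = all_endpoints.nunique()

# Fraction of all PIs in the network, assuming one row per PI in the nodes table
share_of_network = n_unique_pis / len(toy_nodes)

print(n_unique_pis)
print(f"{share_of_network:.1%}")
```

Formatting with `:.1%` prints the fraction as a percentage, which avoids confusion between a share and a percent in the output.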

In [None]:
# Now, look at the funding of these pairs
# How many new collaborations were supported by funding through the 21st Century Cures Act? 
# This information is available in the "ms_collab_funding_overall" column
new_collab_funded_series = new_collabs_df["ms_collab_funding_overall"].value_counts(dropna=False)

new_collab_funded_series

### Characterizing Ongoing Collaborations

In [None]:
# Subset the aggregated edge data to the pre&post Cures Act funded group (ongoing collaborations)
ongoing_collabs_df = agg_edges_df[agg_edges_df["ms_collab_period"] == "Pre&Post-Moonshot"]

ongoing_collabs_df.head()

In [None]:
# How many PIs are part of ongoing collabs?
# This is the total number of unique PIs across the "source" and "target" columns

ongoing_collaborators = list(ongoing_collabs_df["source"]) + list(ongoing_collabs_df["target"])

unique_ongoing_collaborators = list(set(ongoing_collaborators))

# Count of PIs part of ongoing collabs
print(len(unique_ongoing_collaborators))

# Share of all PIs in the network that are part of ongoing collabs (a fraction between 0 and 1)
print(len(unique_ongoing_collaborators)/len(nodes_df))

In [None]:
# Now look at funding for these pairs
# How many ongoing collaborations were supported by funding through the 21st Century Cures Act? 
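# This information is available in the "ms_collab_funding_overall" column
# dropna=False keeps missing values in the counts, matching the new-collaboration cell above
ongoing_collab_funded_series = ongoing_collabs_df["ms_collab_funding_overall"].value_counts(dropna=False)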

ongoing_collab_funded_series

In [None]:
# Further subset to those where the collaboration was sustained with 21st Century Cures Act funding

# Filter ongoing_collabs_df to those that had 21st Century Cures Act support
subset_ongoing_collabs_df = ongoing_collabs_df[ongoing_collabs_df["ms_collab_funding_overall"].isin(["Moonshot Funded"])]
print(subset_ongoing_collabs_df.shape)

In [None]:
# For these 507 collaborations, 21st Century Cures Act support may have been on publications, projects, or both

# To determine the pairs (number of collaborations) that had this support only on their publications,
# we can filter the aggregated table to where "n_ms_proj_collabs_excl_type3" == 0 (no funded projects)
# and "n_ms_pub_collabs" > 0 (at least one funded publication)

only_ms_pubs = subset_ongoing_collabs_df[
    (subset_ongoing_collabs_df["n_ms_proj_collabs_excl_type3"] == 0)
    & (subset_ongoing_collabs_df["n_ms_pub_collabs"] > 0)
]

only_ms_pubs.shape

In [None]:
# To determine the pairs (number of collaborations) that had 21st Century Cures Act support only on their projects,
# we can again filter the aggregated table, this time to where "n_ms_proj_collabs_excl_type3" > 0 (at least one funded project)
# and "n_ms_pub_collabs" == 0 (no funded publications)
only_ms_projs = subset_ongoing_collabs_df[
    (subset_ongoing_collabs_df["n_ms_proj_collabs_excl_type3"] > 0)
    & (subset_ongoing_collabs_df["n_ms_pub_collabs"] == 0)
]

only_ms_projs.shape

In [None]:
# To determine the pairs (number of collaborations) that had 21st Century Cures Act support on both their projects
# and their publications, we can again filter the aggregated table, 
# this time to where "n_ms_proj_collabs_excl_type3" > 0 and "n_ms_pub_collabs" > 0
both = subset_ongoing_collabs_df[
    (subset_ongoing_collabs_df["n_ms_proj_collabs_excl_type3"] > 0)
    & (subset_ongoing_collabs_df["n_ms_pub_collabs"] > 0)
]

both.shape
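
The three filters above (publications only, projects only, both) are mutually exclusive, so together with a possible "neither" group they should account for every funded ongoing pair. A minimal sketch of that sanity check on a toy table is shown below. The toy values are made up, and only the two column names are taken from the notebook.

```python
import pandas as pd

# Toy stand-in for subset_ongoing_collabs_df; values are made up
toy_funded = pd.DataFrame({
    "n_ms_proj_collabs_excl_type3": [0, 2, 1, 0],
    "n_ms_pub_collabs": [3, 0, 4, 0],
})

has_proj = toy_funded["n_ms_proj_collabs_excl_type3"] > 0
has_pub = toy_funded["n_ms_pub_collabs"] > 0

# Count each mutually exclusive group with boolean masks
n_only_pubs = int((has_pub & ~has_proj).sum())
n_only_projs = int((has_proj & ~has_pub).sum())
n_both = int((has_proj & has_pub).sum())
n_neither = int((~has_proj & ~has_pub).sum())

print(n_only_pubs, n_only_projs, n_both, n_neither)

# The four groups should add up to the number of rows in the table
assert n_only_pubs + n_only_projs + n_both + n_neither == len(toy_funded)
```

A nonzero "neither" count on the real data would suggest that some pairs are marked as funded for a reason not captured by these two columns, which may be worth inspecting.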