network analysis of network analysis publications --- split by software

incertae-sedis/cavatica

Initial Commit: July 2016

***** Cavatica has been adopted by the incertae-sedis group. *****

Cavatica

Code and pipeline for fetching PubMed and PubMed Central data and running co-author network analysis. This tool can be used to identify author trends among several search terms.

As an example, I've used these scripts to do a multi-network analysis of network analysis papers and their software; see the project wiki page for details.

The name comes from Charlotte's Web, in which the spider's full name is Charlotte A. Cavatica. Cavatica also refers to the barn spider.

Pipeline

***** The Cavatica pipeline has been modified so that it no longer relies on Ebot. *****

Dependencies

  • A terminal where you can run Bash (Cygwin on Windows; Terminal is preinstalled on macOS)
  • R (check if installed by typing Rscript --version)
  • Perl (check if installed by typing perl --version)
  • Mango Graph Studio for multi-network analysis
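
The command-line dependencies above can be checked in one pass. The following is a minimal sketch and not part of the Cavatica repository: the function name check_deps is hypothetical, and it only tests whether each command is on the PATH. Mango Graph Studio is a desktop application and is not checked.

```shell
# Hypothetical helper (not shipped with Cavatica): report which commands
# are available on the PATH and return non-zero if any are missing.
check_deps() {
    missing=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found:   $cmd"
        else
            echo "missing: $cmd"
            missing=1
        fi
    done
    return "$missing"
}

# Example call for the dependencies listed above:
#   check_deps bash Rscript perl
```

The function prints one line per command, so a missing R or Perl installation is visible before the pipeline is started.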

Installation

git clone https://github.com/incertae-sedis/cavatica.git

Basic Example

Here is a basic example that fetches PubMed and PMC papers for the search terms "Neo4j" and "Cytoscape".

cd cavatica/data
mkdir test
cd test
echo "Neo4j" > config.txt
echo "Cytoscape" >> config.txt
../../code/script.sh

This will create tabular files for each search term (for Neo4j, the list of papers Neo4j_papers_pm.tsv and the list of authors Neo4j_authors_pm.tsv). Open the PNG file Neo4j_pm.png to see a bar chart of the number of papers by year.
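
To confirm that the first stage produced its files, a small check such as the one below may help. It is a sketch under assumptions: the function name check_outputs is hypothetical, and the file naming pattern (TERM_papers_pm.tsv, TERM_authors_pm.tsv, TERM_pm.png) is inferred from the Neo4j example above rather than taken from the Cavatica code. Any year range after the first comma in config.txt is ignored.

```shell
# Hypothetical helper: for every term in DIR/config.txt, report whether the
# PubMed output files named in the example above exist in DIR.
# Assumption: files are named <term>_papers_pm.tsv, <term>_authors_pm.tsv
# and <term>_pm.png, as inferred from the Neo4j example.
check_outputs() {
    dir=${1:-.}
    missing=0
    # IFS=, keeps only the text before the first comma as the term
    while IFS=, read -r term rest || [ -n "$term" ]; do
        [ -n "$term" ] || continue
        for suffix in papers_pm.tsv authors_pm.tsv pm.png; do
            file="$dir/${term}_${suffix}"
            if [ -e "$file" ]; then
                echo "ok      $file"
            else
                echo "missing $file"
                missing=1
            fi
        done
    done < "$dir/config.txt"
    return "$missing"
}

# Example call from inside cavatica/data/test:
#   check_outputs .
```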

You can also open the HTML files to check the individual sentences in which Neo4j and Cytoscape are mentioned.

Sentences that contain Neo4j

2018 29377902 Reactome graph database: Efficient access to complex pathway data.

  • Here we present the rationale behind the adoption of a graph database (Neo4j) as well as the new ContentService (REST API) that provides access to these data.

2018 28936969 Systematic integration of biomedical knowledge prioritizes drugs for repurposing.

  • First, we constructed Hetionet (neo4j.het.io), an integrative network encoding knowledge from millions of biomedical studies.

2017 28416946 Use of Graph Database for the Integration of Heterogeneous Biological Data.

  • Here, we demonstrate the feasibility of using a graph-based database for complex biological relationships by comparing the performance between MySQL and Neo4j, one of the most widely used graph databases.
  • When we tested the query execution performance of MySQL versus Neo4j, we found that Neo4j outperformed MySQL in all cases.
  • These results show that using graph-based databases, such as Neo4j, is an efficient way to store complex biological relationships.

...

Sentences that contain Cytoscape

2018 29894068 Identification of potential miRNAs and candidate genes of cervical intraepithelial neoplasia by bioinformatic analysis.

  • Then the miRNA- mRNA regulatory network was constructed using Cytoscape software.

2018 29872319 An integrated analysis of key microRNAs, regulatory pathways and clinical relevance in bladder cancer.

  • Protein-protein interaction (PPI) and miRNA-mRNA regulatory networks were established by using the Search Tool for the Retrieval of Interacting Genes/Proteins and Cytoscape tool.

2018 29760609 Identification of potential crucial genes and construction of microRNA-mRNA negative regulatory networks in osteosarcoma.

  • Protein-protein interaction (PPI) network was constructed by STRING and visualized in Cytoscape.

...

The script also generates a file named pubmed.gel. Open Mango Graph Studio, open pubmed.gel, and type the following into the Mango Console.

run "pubmed.gel";

This will create a transition table and export the file. It will also load and visualize the author-paper networks.

Going back to your terminal, rerun the script file and it will continue.
../../code/script.sh

The transitions should be saved in trends_pm.txt. The following trends_pm.txt indicates that authors switched from Cytoscape to Neo4j 9 times and from Neo4j to Cytoscape 3 times.

Cytoscape:Neo4j 9
Neo4j:Cytoscape 3
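
The two directions of a pair can be collapsed into a single net flow. The sketch below is not part of Cavatica: the function name net_transitions is hypothetical, and it assumes every data line has the form SOURCE:TARGET COUNT with no spaces in the tool names, as in the listing above. Pairs whose two directions are equal are omitted.

```shell
# Hypothetical helper: read "A:B N" lines (from files or stdin) and print the
# net direction of movement for each pair of tools.
# Assumption: "A:B N" means N authors switched from A to B.
net_transitions() {
    awk '
        NF == 2 && split($1, pair, ":") == 2 {
            count[pair[1] SUBSEP pair[2]] = $2
        }
        END {
            for (key in count) {
                split(key, p, SUBSEP)
                rkey = p[2] SUBSEP p[1]
                rev = (rkey in count) ? count[rkey] : 0
                net = count[key] - rev
                # print each pair once, in its dominant direction
                if (net > 0) print p[1] " -> " p[2], net
            }
        }
    ' "$@" | sort
}

# Example call:
#   net_transitions trends_pm.txt
```

For the trends_pm.txt shown above, this prints a single line, Cytoscape -> Neo4j 6, since 9 - 3 = 6.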

The script then searches PMC, fetches the lists of papers and authors, and generates a "pmc.gel" file. Once again, open the "pmc.gel" file in Mango and type the following into the Mango Console.

run "pmc.gel";

Then rerun the script to continue tabulating the trends, which should be saved in trends_pmc.txt.

The output of a 2017 run comparing "Neo4j", "Gephi", "GraphViz" and "iGraph" is shown below:

=============PubMed Transitions
Neo4j:Gephi 1
Neo4j:GraphViz 1
Neo4j:iGraph 1
=============PubMed Central Transitions
Gephi:GraphViz 2
Gephi:Neo4j 3
Gephi:iGraph 31
GraphViz:Gephi 19
GraphViz:Neo4j 10
GraphViz:iGraph 58
Neo4j:Gephi 4
Neo4j:GraphViz 4
Neo4j:iGraph 1
iGraph:Gephi 34
iGraph:GraphViz 9
iGraph:Neo4j 13
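
With more than two tools, it can be easier to read the table as per-tool totals. The following sketch sums, for each tool, the transitions arriving at it ("gained") and leaving it ("lost"). It is illustrative only: tool_totals is a hypothetical name, and it assumes the SOURCE:TARGET COUNT line format shown above. Section header lines beginning with "=" are skipped, but the PubMed and PubMed Central sections are not separated, so pass one section at a time.

```shell
# Hypothetical helper: total incoming and outgoing transitions per tool.
# Assumption: "A:B N" means N authors moved from A (lost) to B (gained).
tool_totals() {
    awk '
        NF == 2 && split($1, pair, ":") == 2 {
            lost[pair[1]] += $2
            gained[pair[2]] += $2
            seen[pair[1]] = 1
            seen[pair[2]] = 1
        }
        END {
            for (t in seen)
                printf "%s\tgained=%d\tlost=%d\n", t, gained[t], lost[t]
        }
    ' "$@" | sort
}

# Example call on the PubMed Central results only:
#   tool_totals trends_pmc.txt
```

Fed only the PubMed Central lines above, it would report iGraph with gained=90 and lost=56.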

PMC searches usually return more papers because search terms like "Neo4j" or "Cytoscape" are matched against the full text instead of just the title and abstract. This may yield more accurate trend tables, since software names are sometimes mentioned only in the methods section and not in the abstract.

Singularity Container

This repo provides containers for easily reproducing and running Cavatica. The pipeline for both Singularity and Docker was run on an Ubuntu 18.04 instance on Jetstream, a national science and engineering cloud led by the Indiana University Pervasive Technology Institute.

A Singularity container of Cavatica is available on Singularity Hub. Using Singularity, you can download the container with the following command:

singularity pull shub://TeamMango/cavatica:latest

When run, the container will look for a text file called config.txt in a directory called output in the same directory as the .simg you just downloaded. Place the terms that you want Cavatica to search for in this file. In Ubuntu, you can use the following commands to create this file:

mkdir output
echo "YOURSEARCHTERM" > ./output/config.txt

Your search terms can also be followed by a year range, separated by commas:

echo "YOURSEARCHTERM,1996,2006" > ./output/config.txt

Each search term and year range should occupy its own line. If you want to search for uses of the term VisANT between 1999 and 2006 and Cytoscape between 1994 and 2003, config.txt would look like this:

visant,1999,2006
cytoscape,1994,2003
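
Before starting a long run, the file can be checked against the format described above. This is a sketch under stated assumptions and not Cavatica's own parser: validate_config is a hypothetical name, years are assumed to be four digits with the start no later than the end, search terms are assumed not to contain commas, and blank lines are skipped.

```shell
# Hypothetical helper: check that each line of a config file is either a
# bare search term or "term,START,END" with four-digit years, START <= END.
validate_config() {
    awk -F',' '
        /^[ \t]*$/ { next }                  # assumption: blank lines are harmless
        NF == 1 { next }                     # a bare search term
        NF == 3 && $1 != "" &&
            $2 ~ /^[0-9][0-9][0-9][0-9]$/ &&
            $3 ~ /^[0-9][0-9][0-9][0-9]$/ &&
            $2 + 0 <= $3 + 0 { next }        # term,start,end with start <= end
        { printf "line %d: unexpected format: %s\n", NR, $0; bad = 1 }
        END { exit bad + 0 }
    ' "$1"
}

# Example call:
#   validate_config ./output/config.txt && echo "config.txt looks fine"
```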

Once you have entered the terms in the config.txt file, return to the same directory as the .simg image and run the following command:

singularity run --bind output:/cavatica/data/output TeamMango-cavatica-master-latest.simg

The results of the search will appear in the output directory next to your config.txt file.

Docker Container

A Docker container of Cavatica is available on Docker Hub. You can pull the container with the following command:

docker pull incertaesedis/cavatica

To run the docker container, move into the directory where you want to generate output from Cavatica. Create three files called multitool-pubmed.tsv, multitool-pmc.tsv, and config.txt. In Ubuntu you can do this with the following command:

touch multitool-pubmed.tsv multitool-pmc.tsv config.txt

All three files must be present in the directory where you run the container. In config.txt enter the search terms that you want Cavatica to search for, with each term on a new line. Optional year ranges can be indicated with commas:

visant,1999,2006
cytoscape,1994,2003

In the same directory as config.txt, run the docker container:

docker run -v "${PWD}":/cavatica/data/output incertaesedis/cavatica

On Windows, ${PWD} should be replaced with the absolute path to your current directory. The files produced by Cavatica should appear in that directory once the container has run. If you wish to rerun the search with different terms, make sure that the multitool-pubmed.tsv and multitool-pmc.tsv files are still in the folder.
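
Because the container expects all three files to be present, a small preparation step can guard each run. The sketch below is an illustration only: prepare_output_dir is a hypothetical name, and the check that config.txt is non-empty is an added convenience, not something the Cavatica documentation requires.

```shell
# Hypothetical helper: make sure the three files the Docker container expects
# exist in DIR, and warn if config.txt has no search terms yet.
prepare_output_dir() {
    dir=${1:-.}
    for f in multitool-pubmed.tsv multitool-pmc.tsv config.txt; do
        # touch creates a missing file and leaves existing contents alone
        touch "$dir/$f" || return 1
    done
    if [ ! -s "$dir/config.txt" ]; then
        echo "config.txt is empty: add one search term per line" >&2
        return 2
    fi
}

# Example call, followed by the docker command from above:
#   prepare_output_dir . && docker run -v "${PWD}":/cavatica/data/output incertaesedis/cavatica
```

Chaining the two commands with && means the container starts only when the files are in place and config.txt contains at least one line.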

Value of Reproducible Research

Accomplishments and opportunities of reproducing and containerizing this project

Publications