# Getting started: citation searching (a.k.a. snowball-search) with paperfetcher

*Written on Oct 3, 2021 by Akash Pallath.* 

*Last updated on May 22, 2022 by Akash Pallath.*

---

To get started, let's import paperfetcher's snowballsearch package.

```python
from paperfetcher import snowballsearch
```

## Snowballing backwards (also called backward reference chasing, backward reference search, or backward citation search) with Crossref

Backward reference chasing involves retrieving all articles which are referenced (cited) by a set of starting articles. 
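Conceptually, a backward search over several starting articles returns the union of their reference lists. The sketch below illustrates that idea on a small, hypothetical in-memory reference table. The IDs `seed-A`, `paper-1`, and so on are placeholders, not real DOIs. This is not paperfetcher's implementation, which queries Crossref or COCI instead.

```python
# Toy illustration of backward reference chasing (not paperfetcher's code).
# The reference table is hypothetical; IDs are placeholders, not real DOIs.
references = {
    "seed-A": ["paper-1", "paper-2", "paper-3"],
    "seed-B": ["paper-3", "paper-4"],
}


def backward_snowball(seeds, references):
    """Return the deduplicated union of works cited by the seeds, in first-seen order."""
    found = {}
    for seed in seeds:
        for cited in references.get(seed, []):
            found.setdefault(cited, None)  # dict preserves insertion order
    return list(found)


print(backward_snowball(["seed-A", "seed-B"], references))
# ['paper-1', 'paper-2', 'paper-3', 'paper-4']
```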

Let's use the Crossref service to fetch all the references from two review papers with the following DOIs:
- 10.1021/acs.jpcb.1c02191
- 10.1073/pnas.2018234118

First, we create a search object and initialize it with a list of DOI strings.

```python
search = snowballsearch.CrossrefBackwardReferenceSearch(["10.1021/acs.jpcb.1c02191",
                                                         "10.1073/pnas.2018234118"])
```
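The search constructors take bare DOI strings. If your starting DOIs come from copied links such as `https://doi.org/...`, a small helper can strip the resolver prefix first. `normalize_doi` is a hypothetical helper, not part of paperfetcher. This tutorial does not say whether paperfetcher accepts URL-form DOIs, so passing bare DOIs is the safe assumption.

```python
import re

# Hypothetical helper (not part of paperfetcher): strip common resolver
# prefixes and surrounding whitespace so each seed is a bare DOI string.
_PREFIX = re.compile(r"^(https?://(dx\.)?doi\.org/|doi:)", re.IGNORECASE)


def normalize_doi(raw: str) -> str:
    doi = _PREFIX.sub("", raw.strip())
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {raw!r}")
    return doi


seeds = [normalize_doi(s) for s in [
    "https://doi.org/10.1021/acs.jpcb.1c02191",
    "doi:10.1073/pnas.2018234118",
]]
print(seeds)  # ['10.1021/acs.jpcb.1c02191', '10.1073/pnas.2018234118']
```

You can then pass the resulting `seeds` list to `snowballsearch.CrossrefBackwardReferenceSearch(seeds)`, just as in the cell above.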

Let's run the search!

(You can ignore the warning: it just says that one of the retrieved articles does not have a DOI in Crossref. Read paperfetcher's documentation to learn more about this warning.)

```python
search()
```



How many works did our search return?

```python
len(search)
```

```
140
```

### Extracting data from the search results

Just as we did for handsearching, we can get a Dataset of DOIs from the search results:

```python
doi_ds = search.get_DOIDataset()
```

We can display this as a DataFrame:

```python
doi_ds.to_df()
```

|     | DOI |
|-----|-----|
| 0   | 10.1021/ar2000869 |
| 1   | 10.1073/pnas.93.17.8951 |
| 2   | 10.1021/jp2107523 |
| 3   | 10.1021/acs.jpcb.6b10797 |
| 4   | 10.1073/pnas.1113256108 |
| ... | ... |
| 135 | 10.1073/pnas.1110703108 |
| 136 | 10.2174/138920308785132712 |
| 137 | 10.1021/jz200319g |
| 138 | 10.1146/annurev-chembioeng-061010-114156 |


Or save it to a text file:

```python
doi_ds.save_txt("out/snowball_back.txt")
```
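You can read a saved DOI list back into Python, for example to use the results as the starting set for another round of snowballing. The minimal sketch below assumes that `save_txt` writes one DOI per line; this tutorial does not state the file layout, so treat that as an assumption. `parse_doi_lines` and `load_dois` are hypothetical helper names.

```python
from pathlib import Path


def parse_doi_lines(text):
    """Return the stripped, non-blank lines of `text` as a list of DOIs.

    Assumes one DOI per line (an assumption about save_txt's output format).
    """
    return [line.strip() for line in text.splitlines() if line.strip()]


def load_dois(path):
    """Read a DOI text file, such as out/snowball_back.txt, into a list."""
    return parse_doi_lines(Path(path).read_text(encoding="utf-8"))


# Possible next round of snowballing (requires paperfetcher and network access):
# next_seeds = load_dois("out/snowball_back.txt")
# search2 = snowballsearch.CrossrefBackwardReferenceSearch(next_seeds)
```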

We can also convert the search results to RIS format:

```python
ris_ds = search.get_RISDataset()
```



And save it to an RIS file:

```python
ris_ds.save_ris("out/snowball_back.ris")
```
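The saved `.ris` file is plain text in the RIS tagged format. Each line has a two-character tag, two spaces, a hyphen, a space, and a value, and an `ER` line closes each record. The minimal parser below is a sketch that handles only single-line values. Check your own file to see which tags paperfetcher actually writes (for example, whether each record has a `DO` line); the sketch does not assume either way. The sample record uses the placeholder DOI `10.1000/example`.

```python
import re

# A RIS line: two-character tag, two spaces, a hyphen, an optional space, the value.
_RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")


def parse_ris(text):
    """Parse RIS text into a list of records, each a dict of tag -> list of values."""
    records, current = [], {}
    for line in text.splitlines():
        m = _RIS_LINE.match(line)
        if not m:
            continue  # this minimal sketch skips blank and continuation lines
        tag, value = m.group(1), m.group(2).strip()
        if tag == "ER":  # end of record
            records.append(current)
            current = {}
        else:
            current.setdefault(tag, []).append(value)
    return records


sample = "TY  - JOUR\nTI  - Example title\nDO  - 10.1000/example\nER  - \n"
records = parse_ris(sample)
print(records)
# [{'TY': ['JOUR'], 'TI': ['Example title'], 'DO': ['10.1000/example']}]
```

To inspect the file saved above, you could call `parse_ris(Path("out/snowball_back.ris").read_text(encoding="utf-8"))` after importing `Path` from `pathlib`.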

## Snowballing backwards with COCI

We can also perform backward snowballing with COCI, the OpenCitations Index of Crossref open DOI-to-DOI citations.

The syntax is similar to that of the Crossref search:

```python
search = snowballsearch.COCIBackwardReferenceSearch(["10.1021/acs.jpcb.1c02191",
                                                     "10.1073/pnas.2018234118"])
search()
doi_ds = search.get_DOIDataset()
doi_ds.to_df()
```



|     | DOI |
|-----|-----|
| 0   | 10.1021/ar2000869 |
| 1   | 10.1073/pnas.93.17.8951 |
| 2   | 10.1021/jp2107523 |
| 3   | 10.1021/acs.jpcb.6b10797 |
| 4   | 10.1073/pnas.1113256108 |
| ... | ... |
| 135 | 10.1073/pnas.1110703108 |
| 136 | 10.2174/138920308785132712 |
| 137 | 10.1021/jz200319g |
| 138 | 10.1146/annurev-chembioeng-061010-114156 |
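COCI only covers DOI-to-DOI citations, so its results need not match Crossref's exactly. The truncated tables above also show only the first and last rows. The sketch below compares two DOI lists case-insensitively, since DOIs are case-insensitive. `compare_doi_sets` is a hypothetical helper. The usage comment assumes that `to_df()` returns a pandas DataFrame with a `DOI` column, as the displayed output suggests. `crossref_doi_ds` is a hypothetical name for the Crossref dataset, saved before `doi_ds` was overwritten.

```python
def compare_doi_sets(a, b):
    """Compare two DOI collections case-insensitively.

    Returns (only_in_a, only_in_b, in_both) as sorted lists of lowercased DOIs.
    """
    set_a = {d.strip().lower() for d in a}
    set_b = {d.strip().lower() for d in b}
    return sorted(set_a - set_b), sorted(set_b - set_a), sorted(set_a & set_b)


# Hypothetical usage with the two searches in this tutorial:
# crossref_dois = crossref_doi_ds.to_df()["DOI"].tolist()
# coci_dois = doi_ds.to_df()["DOI"].tolist()
# only_crossref, only_coci, both = compare_doi_sets(crossref_dois, coci_dois)
```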


## Snowballing forwards (also called forward citation chasing or forward citation search) with COCI

Forward citation chasing involves retrieving all articles which cite a set of starting articles. 
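Forward chasing works in the reverse direction. An article's metadata typically lists what it cites, but not what cites it. Finding citing works therefore needs an index built over other articles' reference lists. The toy sketch below scans a hypothetical reference table to illustrate the idea. The IDs are placeholders, and this is not how paperfetcher or COCI is implemented.

```python
# Toy illustration of forward citation chasing (not paperfetcher's code).
# Hypothetical table mapping each work to the works it cites; IDs are placeholders.
references = {
    "paper-X": ["seed-A"],
    "paper-Y": ["seed-A", "seed-B"],
    "paper-Z": ["unrelated"],
}


def forward_snowball(seeds, references):
    """Return works whose reference lists contain at least one seed."""
    seed_set = set(seeds)
    return [work for work, refs in references.items() if seed_set.intersection(refs)]


print(forward_snowball(["seed-A", "seed-B"], references))  # ['paper-X', 'paper-Y']
```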

Let's use the COCI service to fetch all the citations of the same two review papers, with DOIs:
- 10.1021/acs.jpcb.1c02191
- 10.1073/pnas.2018234118

We cannot use the Crossref service for this task.

The syntax is similar to that of backward search:

```python
search = snowballsearch.COCIForwardCitationSearch(["10.1021/acs.jpcb.1c02191",
                                                   "10.1073/pnas.2018234118"])
search()
doi_ds = search.get_DOIDataset()
doi_ds.to_df()
```



|   | DOI |
|---|-----|
| 0 | 10.1002/pol.20210526 |
| 1 | 10.1021/acs.jpcb.1c02191 |
| 2 | 10.1016/j.bpj.2021.07.016 |
| 3 | 10.1021/acs.jpcb.1c08603 |
| 4 | 10.1021/acsomega.1c05064 |
| 5 | 10.1101/2021.03.17.435885 |
| 6 | 10.1038/s41557-021-00864-2 |
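One of the starting articles, 10.1021/acs.jpcb.1c02191, appears in the forward results above, presumably because it cites the other starting article. If you only want articles outside the starting set, you can filter the seeds out. `drop_seeds` is a hypothetical helper. The sample list is copied from the first three rows of the table above.

```python
def drop_seeds(results, seeds):
    """Remove seed DOIs from a list of result DOIs (case-insensitive), keeping order."""
    seed_keys = {s.strip().lower() for s in seeds}
    return [d for d in results if d.strip().lower() not in seed_keys]


seeds = ["10.1021/acs.jpcb.1c02191", "10.1073/pnas.2018234118"]
forward = [
    "10.1002/pol.20210526",
    "10.1021/acs.jpcb.1c02191",
    "10.1016/j.bpj.2021.07.016",
]
print(drop_seeds(forward, seeds))
# ['10.1002/pol.20210526', '10.1016/j.bpj.2021.07.016']
```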


Again, we can save the search results to a text file:

```python
doi_ds.save_txt("out/snowball_fwd.txt")
```

Or to an RIS file:

```python
ris_ds = search.get_RISDataset()
ris_ds.save_ris("out/snowball_fwd.ris")
```

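For screening, you may want to pool the backward and forward results into a single list. The minimal sketch below assumes the DOI lists have already been loaded into Python, for example from `out/snowball_back.txt` and `out/snowball_fwd.txt`. `merge_dois` is a hypothetical helper. It keeps the first spelling it sees of each DOI and drops case-insensitive duplicates.

```python
def merge_dois(*lists):
    """Merge DOI lists, dropping case-insensitive duplicates.

    Keeps the first-seen spelling of each DOI and the first-seen order.
    """
    seen = set()
    merged = []
    for dois in lists:
        for doi in dois:
            cleaned = doi.strip()
            key = cleaned.lower()
            if key and key not in seen:
                seen.add(key)
                merged.append(cleaned)
    return merged


backward = ["10.1021/ar2000869", "10.1021/JP2107523"]
forward = ["10.1021/jp2107523", "10.1002/pol.20210526"]
print(merge_dois(backward, forward))
# ['10.1021/ar2000869', '10.1021/JP2107523', '10.1002/pol.20210526']
```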
