# Getting started: a quick hand-search using the Crossref service

*Written on Aug 2, 2021 by Akash Pallath.* 

*Last updated on Sep 12, 2021 by Akash Pallath.*

---

To get started, let's import the paperfetcher handsearch package.

In [1]:
from paperfetcher import handsearch

Let's perform a simple task: find all journal articles published in the *Journal of Physical Chemistry B* (JPCB) between January 1, 2021 and June 1, 2021.

A quick Google search reveals that the ISSN for the web edition of JPCB is 1520-5207.
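Since the ISSN is typed in by hand, it can be worth checking it for typos before searching. The last character of an ISSN is a check digit computed from the first seven digits (weights 8 down to 2, modulo 11). Here is a minimal sketch of that check; `issn_is_valid` is a hypothetical helper written for this notebook, not part of paperfetcher.

```python
def issn_is_valid(issn: str) -> bool:
    """Check the ISSN check digit (hypothetical helper, not a paperfetcher function)."""
    compact = issn.replace("-", "").upper()
    # After removing hyphens we expect seven digits plus a check character (0-9 or X).
    if len(compact) != 8 or not compact[:7].isdigit():
        return False
    if not (compact[7].isdigit() or compact[7] == "X"):
        return False
    # Weights 8, 7, ..., 2 are applied to the first seven digits.
    total = sum(int(d) * w for d, w in zip(compact[:7], range(8, 1, -1)))
    remainder = total % 11
    expected = 0 if remainder == 0 else 11 - remainder
    expected_char = "X" if expected == 10 else str(expected)
    return compact[7] == expected_char


# The JPCB web-edition ISSN used in this notebook
print(issn_is_valid("1520-5207"))
```

For `1520-5207` the weighted sum is 81, the remainder modulo 11 is 4, and 11 - 4 = 7 matches the final digit, so the check passes.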

Now let's use this information to create a search object:

In [2]:
# Create a search object
search = handsearch.CrossrefSearch(ISSN="1520-5207",
                                   from_date="2021-01-01",
                                   until_date="2021-06-01")
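To get a feel for what a search like this could look like at the Crossref level, here is a rough sketch that builds a Crossref REST API URL for the same journal and date range. The `/journals/{issn}/works` route and the `from-pub-date` / `until-pub-date` filters are part of the public Crossref API, but whether paperfetcher issues exactly this request is an assumption. `build_journal_works_url` is a hypothetical helper, and no network request is made.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

CROSSREF_API = "https://api.crossref.org"


def build_journal_works_url(issn, from_date, until_date, rows=20):
    """Build a Crossref works URL for one journal and a publication-date range.

    Hypothetical helper for illustration; not a paperfetcher function.
    """
    filters = f"from-pub-date:{from_date},until-pub-date:{until_date}"
    query = urlencode({"filter": filters, "rows": rows})
    return f"{CROSSREF_API}/journals/{issn}/works?{query}"


url = build_journal_works_url("1520-5207", "2021-01-01", "2021-06-01")
print(url)

# Decode the query string again to show the parameters in readable form
print(parse_qs(urlsplit(url).query))
```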

Let's run the search!

(If a warning appears, you can ignore it for now. You can read the paperfetcher documentation to learn more about it later!)

In [3]:
search()

100%|██████████| 29/29 [00:20<00:00,  1.41it/s]


How many works did our search return?

In [4]:
len(search)

568
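The progress bar above went through 29 steps for 568 works. One plausible reading, inferred from the output rather than taken from the paperfetcher documentation, is that results are fetched in equal-sized batches and the bar counts batches. Under that assumption, the arithmetic below shows which batch sizes would be consistent with these numbers.

```python
import math


def batches_needed(n_works: int, batch_size: int) -> int:
    """Number of batches needed to fetch n_works items, batch_size at a time."""
    return math.ceil(n_works / batch_size)


# Which batch sizes between 1 and 100 give exactly 29 batches for 568 works?
consistent_sizes = [b for b in range(1, 101) if batches_needed(568, b) == 29]
print(consistent_sizes)
```

Only a batch size of 20 fits, since 568 / 20 = 28.4 rounds up to 29. If the bar counts something other than batches, this reasoning does not apply.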

We can refine our search to select papers matching the keywords 'polymer' and 'solvation'.

As before, we create a search object, but this time, pass the two keywords to the search using the `keyword_list` argument:

In [5]:
# Create a search object, this time restricted by keywords
search = handsearch.CrossrefSearch(ISSN="1520-5207",
                                   keyword_list=["polymer", "solvation"],
                                   from_date="2021-01-01",
                                   until_date="2021-06-01")

Let's run the search!

In [6]:
search()

100%|██████████| 2/2 [00:01<00:00,  1.12it/s]


How many works did our search return?

In [7]:
len(search)

22

## Extracting data from the search results

paperfetcher gives you access to the search results through special data structures called Datasets.

For example, you can get a Dataset of DOIs from the search results:

In [8]:
doi_ds = search.get_DOIDataset()

You can display this as a DataFrame:

In [9]:
doi_ds.to_df()

|   | DOI |
|---|-----|
| 0 | 10.1021/acs.jpcb.1c01837 |
| 1 | 10.1021/acs.jpcb.1c01460 |
| 2 | 10.1021/acs.jpcb.1c02191 |
| 3 | 10.1021/acs.jpcb.1c01177 |
| 4 | 10.1021/acs.jpcb.1c01953 |
| 5 | 10.1021/acs.jpcb.1c01070 |
| 6 | 10.1021/acs.jpcb.1c00885 |
| 7 | 10.1021/acs.jpcb.0c10283 |
| 8 | 10.1021/acs.jpcb.1c01898 |
| 9 | 10.1021/acs.jpcb.0c10831 |


Or save it to a text file!

In [10]:
doi_ds.save_txt("DOI_dataset.txt")
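Once the DOIs are in a text file, other scripts can pick them up. The sketch below reads such a file and turns each DOI into a resolver link. It assumes the file holds one DOI per line, which is a guess about the `save_txt` output format, so check your own file first. `read_dois` and `doi_to_url` are hypothetical helpers, and the sample file is a stand-in written by the example itself.

```python
import tempfile
from pathlib import Path


def read_dois(path):
    """Read DOIs from a text file, assuming one DOI per line; blank lines are skipped."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def doi_to_url(doi):
    """Turn a DOI into a link through the doi.org resolver."""
    return f"https://doi.org/{doi}"


# Stand-in for DOI_dataset.txt, using two DOIs from the table above
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "DOI_dataset_sample.txt"
    sample.write_text(
        "10.1021/acs.jpcb.1c01837\n10.1021/acs.jpcb.1c01460\n",
        encoding="utf-8",
    )
    dois = read_dois(sample)

urls = [doi_to_url(d) for d in dois]
print(urls)
```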

**What if you want more information?**

You can extract any of the fields that Crossref stores and collect them in a `CitationsDataset`. Crossref stores some of these fields in fairly complex structures, so paperfetcher provides 'parsers' to convert them into human-readable strings.

Here's a quick-and-dirty example: let's create a dataset containing the DOI, URL, article title, author list, and publication date. As per the Crossref API, these fields are:
`DOI`, `URL`, `title`, `author`, and `issued`.

`title`, `author`, and `issued` require special parsers. `DOI` and `URL` don't.
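To see why parsers are needed: in Crossref's JSON, `title` is a list of strings, `author` is a list of dictionaries, and `issued` wraps the date in a nested `date-parts` list. The sketch below uses stand-in parsers to flatten a made-up record, pairing each field with its parser by position. The three functions are illustrative guesses at what such parsers do, not paperfetcher's implementations, and the record is invented for this example.

```python
def title_to_str(title):
    """Crossref gives `title` as a list of strings; join them into one."""
    return " ".join(title)


def authors_to_str(authors):
    """Crossref gives `author` as a list of dicts; keep the family names."""
    return ", ".join(a.get("family", a.get("name", "")) for a in authors)


def issued_to_str(issued):
    """Crossref gives `issued` as {"date-parts": [[year, month, day]]}."""
    return "-".join(str(part) for part in issued["date-parts"][0])


# Made-up record shaped like a Crossref work (not a real article)
record = {
    "DOI": "10.0000/example",
    "URL": "http://dx.doi.org/10.0000/example",
    "title": ["An example title"],
    "author": [
        {"given": "Jane", "family": "Doe"},
        {"given": "Richard", "family": "Roe"},
    ],
    "issued": {"date-parts": [[2021, 5, 3]]},
}

# Parallel lists: the parser at position i handles the field at position i.
# None means "use the value as-is".
field_list = ["DOI", "URL", "title", "author", "issued"]
parser_list = [None, None, title_to_str, authors_to_str, issued_to_str]

row = {
    field: (record[field] if parser is None else parser(record[field]))
    for field, parser in zip(field_list, parser_list)
}
print(row)
```

Keeping the two lists the same length and in the same order is what ties each field to the right parser.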

Let's jump right to it!

In [11]:
# Import the parsers module
from paperfetcher import parsers

In [12]:
ds = search.get_CitationsDataset(field_list=['DOI', 'URL', 'title', 'author', 'issued'],
                                 field_parsers_list=[None, None, parsers.crossref_title_parser,
                                                     parsers.crossref_authors_parser, 
                                                     parsers.crossref_date_parser])

In [13]:
ds.to_df()

|   | DOI | URL | title | author | issued |
|---|-----|-----|-------|--------|--------|
| 0 | 10.1021/acs.jpcb.1c01837 | http://dx.doi.org/10.1021/acs.jpcb.1c01837 | Continuous Illumination of a Conjugated Polyme... | AlShetwi, Schiefer, Sommer, Reiter | 2021-5-24 |
| 1 | 10.1021/acs.jpcb.1c01460 | http://dx.doi.org/10.1021/acs.jpcb.1c01460 | Hydrogen Bonding Strength Determines Water Dif... | Bayles, Fisher, Valentine, Nowbahar, Helgeson,... | 2021-5-12 |
| 2 | 10.1021/acs.jpcb.1c02191 | http://dx.doi.org/10.1021/acs.jpcb.1c02191 | Characterizing the Interplay between Polymer S... | Dhabal, Jiang, Pallath, Patel | 2021-5-12 |
| 3 | 10.1021/acs.jpcb.1c01177 | http://dx.doi.org/10.1021/acs.jpcb.1c01177 | Concentration-Dependent Solvation Structure an... | Bazak, Wong, Duanmu, Han, Reed, Murugesan | 2021-5-10 |
| 4 | 10.1021/acs.jpcb.1c01953 | http://dx.doi.org/10.1021/acs.jpcb.1c01953 | Multiscale Approaches for Confined Ring Polyme... | Chubak, Likos, Egorov | 2021-5-3 |
| 5 | 10.1021/acs.jpcb.1c01070 | http://dx.doi.org/10.1021/acs.jpcb.1c01070 | Length-Scale Effects in Hydrophobic Polymer Co... | van der Vegt | 2021-4-27 |
| 6 | 10.1021/acs.jpcb.1c00885 | http://dx.doi.org/10.1021/acs.jpcb.1c00885 | Glassy and Polymer Dynamics of Elastomers by 1... | Nardelli, Martini, Carignani, Rossi, Borsacchi... | 2021-4-22 |
| 7 | 10.1021/acs.jpcb.0c10283 | http://dx.doi.org/10.1021/acs.jpcb.0c10283 | Investigating Primary Charge Separation in the... | Brütting, Foerster, Kümmel | 2021-3-31 |
| 8 | 10.1021/acs.jpcb.1c01898 | http://dx.doi.org/10.1021/acs.jpcb.1c01898 | Correction to “Imaging Switchable Protein Inte... | Dutta, Bishop, Zepeda O, Chatterjee, Flatebo, ... | 2021-3-30 |
| 9 | 10.1021/acs.jpcb.0c10831 | http://dx.doi.org/10.1021/acs.jpcb.0c10831 | Real-Time Observation of Solvation Dynamics of... | Bahry, Denisov, Moisy, Ma, Mostafavi | 2021-3-2 |


We can also save this to a text file using the `save_txt` method. But let's try something new: saving this information to an Excel file! It's very easy:

In [14]:
ds.save_excel("results.xlsx")

That's all for this Getting Started notebook! 

Check out the paperfetcher documentation or the source code on GitHub for more!