## Words

Another way to scrape the data is to collect data from the papers found for each search term.

In [1]:
# Import LISC - Words
from lisc.words import Words
from lisc.scrape import scrape_words

In [8]:
# Set up some test data
#  Note that each entry should be a list
terms_a = [['brain'], ['cognition']]
terms_b = [['body'], ['biology'], ['disease']]

## Function Approach: scrape_words

In [51]:
# Scrape words data - set the scrape to return data for at most 5 papers per term
dat, meta_dat = scrape_words(terms_b, retmax='5', use_hist=False, save_n_clear=False, verbose=True)

Scraping words for:  body
Scraping words for:  biology
Scraping words for:  disease


In [52]:
# The function returns a list of LISC Data objects
dat

[<lisc.data.Data at 0x1a1444b470>,
 <lisc.data.Data at 0x1a14514b00>,
 <lisc.data.Data at 0x1a14177198>]

In [61]:
# Grab the data object for the first search term
dd = dat[0]

In [62]:
# Check how many requests the scrape made
meta_dat['req'].n_requests

7

In [63]:
# Each data object is labelled with its search term
dd.label

'body'

In [59]:
# Each data object holds the data for the scraped papers
d1 = dat[0]

# Print out some of the data
print(d1.n_articles, '\n')
print('\n'.join(d1.titles), '\n')

2 

Comparative study between: Carboxytherapy, platelet-rich plasma, and tripolar radiofrequency, their efficacy and tolerability in striae distensae.
Pharmacokinetics, Pharmacodynamics, and Safety of Subcutaneous Bapineuzumab: A Single-Ascending-Dose Study in Patients With Mild to Moderate Alzheimer Disease. 



## Object Approach: Words

In [13]:
# Initialize Words object
words = Words()

# Set terms to search
words.set_terms(terms_a)

In [14]:
# Run words scrape
words.run_scrape(retmax='5', save_n_clear=False)

In [15]:
# The Words object stores its results as the same kind of list of Data objects
words.results

[<lisc.data.Data at 0x1a13f2c048>, <lisc.data.Data at 0x1a1438b2b0>]

Synonyms and exclusion words, demonstrated above for counts, work the same way when scraping words.
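
As a rough illustration of how synonyms and exclusion words can shape a search, the sketch below builds a boolean query string from a term list and an exclusion list. The `build_query` helper, the `'cortex'` synonym, and the exact query syntax are assumptions made for this example; this is not LISC's own code, and the query LISC actually sends may differ.

```python
# Illustrative only: `build_query` is a hypothetical helper, not part of LISC
def build_query(term, exclusions=()):
    """Join a term and its synonyms with OR, then rule out exclusion words with NOT."""
    query = '(' + ' OR '.join('"{}"'.format(word) for word in term) + ')'
    if exclusions:
        query += ' NOT (' + ' OR '.join('"{}"'.format(word) for word in exclusions) + ')'
    return query

# Each entry is a list: here, the first item is the main term and the rest are synonyms
terms = [['brain', 'cortex'], ['cognition']]

# One list of exclusion words per term (an empty list means no exclusions)
exclusions = [['body'], []]

queries = [build_query(term, excl) for term, excl in zip(terms, exclusions)]
for query in queries:
    print(query)
```

This is also one reason each entry in a terms list is itself a list: it leaves room for synonyms alongside the main term.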

### Exploring Words Data

The Words object also has a couple of convenience methods for exploring the data.

In [16]:
# Indexing with labels
print(words['brain'])

<lisc.data.Data object at 0x1a13f2c048>


In [17]:
# Iterating through papers found from a particular search term
#  The iteration returns a dictionary with all the paper data
for art in words['cognition']:
    print(art['title'])

Sleep-disordered breathing, brain volume, and cognition in older individuals with heart failure.
Licorice root components mimic estrogens in an object location task but not an object recognition task.
Animal Cognition: Chimps Use Human Knowledge When Reasoning Statistically.
Application of motor learning in neurorehabilitation: a framework for health-care professionals.
The association between gait speed and cognitive status in community-dwelling older people: A systematic review and meta-analysis.
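
To make the two access patterns above concrete, here is a minimal sketch of how label indexing and per-paper iteration can be supported in plain Python. The `PaperData` and `Results` classes and the placeholder titles are hypothetical stand-ins written for this example; they are not LISC's classes and say nothing about how LISC implements these features.

```python
# Hypothetical stand-ins for illustration; these are not LISC's classes
class PaperData:
    """Holds the papers found for one search term."""

    def __init__(self, label, titles):
        self.label = label
        self.titles = titles
        self.n_articles = len(titles)

    def __iter__(self):
        # Yield one dictionary per paper
        for title in self.titles:
            yield {'title': title}


class Results:
    """Holds a list of PaperData objects and supports lookup by label."""

    def __init__(self, results):
        self.results = results

    def __getitem__(self, label):
        # Return the first data object whose label matches
        for data in self.results:
            if data.label == label:
                return data
        raise KeyError(label)


# Placeholder titles, not real papers
results = Results([
    PaperData('brain', ['Placeholder title A']),
    PaperData('cognition', ['Placeholder title B', 'Placeholder title C']),
])

# Indexing with a label, then iterating through that term's papers
for art in results['cognition']:
    print(art['title'])
```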


### Metadata

Regardless of what you are scraping, or how you run it through LISC, some metadata is saved.

This metadata is collected in a dictionary that is returned by the scrape functions (and saved to the objects, if applicable).

In [23]:
# The metadata includes some information on the database that was scraped
meta_dat['db_info']

{'count': '28559796',
 'dbbuild': 'Build180618-2212m.3',
 'dbname': 'pubmed',
 'description': 'PubMed bibliographic record',
 'lastupdate': '2018/06/19 16:48',
 'menuname': 'PubMed'}

In [25]:
# This data is also saved to the object
words.meta_dat['db_info']

{'count': '28559796',
 'dbbuild': 'Build180618-2212m.3',
 'dbname': 'pubmed',
 'description': 'PubMed bibliographic record',
 'lastupdate': '2018/06/19 16:48',
 'menuname': 'PubMed'}
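
Note that every value in the database information printed above is a string, including the record count and the update time. The sketch below copies that dictionary by hand, so that it runs without a scrape, and converts the two fields before using them. The variable names are chosen for this example only.

```python
from datetime import datetime

# Copied from the output above, so this runs without launching a scrape
db_info = {'count': '28559796',
           'dbbuild': 'Build180618-2212m.3',
           'dbname': 'pubmed',
           'description': 'PubMed bibliographic record',
           'lastupdate': '2018/06/19 16:48',
           'menuname': 'PubMed'}

# The count is stored as a string, so convert it before doing arithmetic
n_records = int(db_info['count'])

# Parse the update time into a datetime object
last_update = datetime.strptime(db_info['lastupdate'], '%Y/%m/%d %H:%M')

print('{} held {:,} records as of {}'.format(
    db_info['menuname'], n_records, last_update.date()))
```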

In [19]:
# It also includes the Requester object, which is used to launch URL requests
#   This object also stores some details about the scrape
#   It can be used, for example, to track how long scrapes take, and how many requests they include
print('Start time:    ', meta_dat['req'].st_time)
print('End time:      ', meta_dat['req'].en_time)
print('# of requests: ', meta_dat['req'].n_requests)

Start time:     16:36 Wednesday 20 June
End time:       16:36 Wednesday 20 June
# of requests:  5
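
For intuition about what such an object tracks, the sketch below shows a small class that records a start time, an end time, and a request count. `RequestTracker`, `fake_fetch`, and the `example.org` URLs are hypothetical names made up for this example; this is not LISC's Requester, and no network requests are made.

```python
import time


class RequestTracker:
    """Hypothetical stand-in for a requester object; not LISC's implementation."""

    def __init__(self, wait_time=0.0):
        self.wait_time = wait_time   # Seconds to pause before each request
        self.st_time = None
        self.en_time = None
        self.n_requests = 0

    @staticmethod
    def _now():
        # Format the current local time, e.g. '16:36 Wednesday 20 June'
        return time.strftime('%H:%M %A %d %B')

    def open(self):
        self.st_time = self._now()

    def close(self):
        self.en_time = self._now()

    def get(self, fetch, url):
        # Count the request, pause to stay polite to the server, then fetch
        self.n_requests += 1
        time.sleep(self.wait_time)
        return fetch(url)


def fake_fetch(url):
    # Stand-in for a real URL request, so the example makes no network calls
    return 'response for ' + url


req = RequestTracker()
req.open()
pages = [req.get(fake_fetch, 'https://example.org/page/{}'.format(ind)) for ind in range(5)]
req.close()

print('Start time:    ', req.st_time)
print('End time:      ', req.en_time)
print('# of requests: ', req.n_requests)
```

Routing every request through a single object like this is what makes it possible to report totals for a whole scrape, whichever function or object launched it.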
