# pubmed_tool Demonstration

## Installation

The `pubmed_tool` package can be installed with a single pip command:

In [1]:
pip install pubmed_tool@git+https://github.com/intro-to-ds-capstone/capstone-project

Collecting pubmed_tool@ git+https://github.com/intro-to-ds-capstone/capstone-project
Note: you may need to restart the kernel to use updated packages.

  Cloning https://github.com/intro-to-ds-capstone/capstone-project to c:\users\morri\appdata\local\temp\pip-install-5egwig1r\pubmed-tool_8dff25107ccc4b3990028da7bcd1c4af
  Resolved https://github.com/intro-to-ds-capstone/capstone-project to commit 960b8de0147c1eeba87ce89025e73a0445f633a3
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Building wheels for collected packages: pubmed_tool
  Building wheel for pubmed_tool (setup.py): started
  Building wheel for pubmed_tool (setup.py): finished with status 'done'
  Created wheel for pubmed_tool: filename=pubmed_tool-1.0.0-py3-none-any.whl size=33352 sha256=494f1d0444cea60402977fbe5f6aa99a6e43e0c42bf942dbc73eb7d3162f06f6
  Stored in directory: C:\Users\morri\AppData\Local\Temp\pip-ephem-wheel-cache-cqyxhk8x\wheels\ed\28\96\31cb1d611a89c4cb

  Running command git clone --filter=blob:none --quiet https://github.com/intro-to-ds-capstone/capstone-project 'C:\Users\morri\AppData\Local\Temp\pip-install-5egwig1r\pubmed-tool_8dff25107ccc4b3990028da7bcd1c4af'


Alternatively, the contents of the `/pubmed_tool` folder in the project [GitHub repository](https://github.com/intro-to-ds-capstone/capstone-project) can be downloaded manually and extracted into the project directory. Working from the downloaded files also makes it possible to customize the functions further.

## Import

`pubmed_tool` can be imported with a single command:

In [2]:
import pubmed_tool

## Main Functions

### Scraping PubMed with `pubmed_tool.scraper()`

#### Basic Scrape

For this demonstration, we queried PubMed for the records that contained our keyword 'HIV', published between January 1 and August 30 of 2020. We exported our results to `'demo_data.csv'`, and returned the data set for examination in this notebook.

`input()` is used to capture an email address at run time, since every user should supply their own address for Entrez query logging.

In [3]:
records = pubmed_tool.scraper(keyword = 'hiv', 
                              start_date = '2020/01/01', 
                              end_date = '2020/08/30',
                              email = input('email:').lower().strip(),
                              path = 'demo_data.csv',
                              return_df = True)

2023-12-05 21:06:10 - INFO: ['No project directory given. Directory set to the current working directory.']
2023-12-05 21:06:10 - INFO: Initiating PubMed search.
 	Query: (hiv) AND ("2020/01/01"[Date - Publication] : "2020/08/30"[Date - Publication])
 	Entrez email: morrigan.mahady@uth.tmc.edu
 	Max returns: 200000
2023-12-05 21:06:11 - INFO: Search Successful! Obtained 9999 PMID(s).
2023-12-05 21:06:11 - INFO: Initiating PubMed search. Requesting article records for 9999 PMID(s)
2023-12-05 21:07:33 - INFO: Search Successful! Obtained records for  9984 PMID(s).
2023-12-05 21:07:35 - INFO: Success! 
 9999 records for PubMed Search: 
(hiv) AND ("2020/01/01"[Date - Publication] : "2020/08/30"[Date - Publication]) 
 processed and written to c:\Users\morri\Documents\demo\demo_data.csv


As we can see, the function logs each stage of the process. Our search matched 9,999 PubMed IDs (PMIDs), well below the `max_returns` limit of 200,000. However, article records were obtained for only 9,984 of them.
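The query string in the log combines the keyword with the publication date range. As a minimal sketch of how such a string could be assembled (`build_query` is a hypothetical helper written for this illustration, not a function of `pubmed_tool`), assuming dates are given in the `YYYY/MM/DD` format used above:

```python
from datetime import datetime


def build_query(keyword, start_date, end_date):
    """Hypothetical helper: rebuild the search string shown in the log."""
    # Raise ValueError early if a date is not in the YYYY/MM/DD format.
    for value in (start_date, end_date):
        datetime.strptime(value, "%Y/%m/%d")
    return (f'({keyword}) AND ("{start_date}"[Date - Publication] : '
            f'"{end_date}"[Date - Publication])')


print(build_query("hiv", "2020/01/01", "2020/08/30"))
# (hiv) AND ("2020/01/01"[Date - Publication] : "2020/08/30"[Date - Publication])
```

Validating the dates before building the string means a malformed date fails locally rather than producing an unexpected search on PubMed.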

When we inspect the shape of the record data frame, we find that we have 9,984 rows and 14 columns, which is what we expected based on our logs.

In [4]:
records.shape

(9984, 14)

We inspect our 14 columns. We find our expected results: `title`, `pubdate`, `authors`, `keywords`, `journal`, `isoabbrev`, `volume`, `issue`, `page_start`, `page_end`, `language`, `abstract`, `other_type`, and `other_val`. These correspond to the Article Title, Date of Publication, Author List, Keyword List, Journal Name, Journal ISO Abbreviation, Volume, Issue, Start Page, End Page, List of Languages, Abstract, and Other information that was stored in Volume or Issue fields such as 'Part' or 'Special No.'.

We find that `pubdate`, the date of publication, is appropriately stored as a datetime object.

In [5]:
records.dtypes

title                 object
pubdate       datetime64[ns]
authors               object
keywords              object
journal               object
isoabbrev             object
volume               float64
issue                float64
page_start            object
page_end              object
language              object
abstract              object
other_type            object
other_val             object
dtype: object

By inspecting the first five records in our results, we find that `pmid` serves as our index and that all content appears as expected. Each `authors` entry is stored as a list of dictionaries, one per author, which will be used in later processing.

In [6]:
records.head()

pmid,title,pubdate,authors,keywords,journal,isoabbrev,volume,issue,page_start,page_end,language,abstract,other_type,other_val
33193428,major scientific hurdles in hiv vaccine develo...,2020-01-01,"[{'order': '1', 'first': 'tiza', 'last': 'ng'u...","[history of hiv-1 vaccines, hiv, hiv-1 vaccine...",frontiers in immunology,front immunol,11.0,,590780,590780,[eng],Following the discovery of HIV as a causative ...,,
32692406,hiv/sars-cov-2 coinfection: a global perspective.,2021-02-01,"[{'order': '1', 'first': 'osman', 'last': 'kan...","[hiv, covid-19, antiretroviral therapy/coinfec...",journal of medical virology,j med virol,93.0,2.0,726,732,[eng],"Since its first appearance in Wuhan, China, se...",,
31936859,block-and-lock strategies to cure hiv infection.,2020-01-01,"[{'order': '1', 'first': 'gerlinde', 'last': '...","[hiv, latency, block-and-lock, cure]",viruses,viruses,12.0,1.0,,,[eng],Today HIV infection cannot be cured due to the...,,
32066532,hiv transmission and source-sink dynamics in s...,2020-03-01,"[{'order': '1', 'first': 'justin', 'last': 'ok...",,the lancet. hiv,lancet hiv,7.0,3.0,e209,e214,[eng],Multiple phylogenetic studies of HIV in sub-Sa...,,
32612233,clinical targeting of hiv capsid protein with ...,2020-08-01,"[{'order': '1', 'first': 'john', 'last': 'link...",,nature,nature,584.0,7822.0,614,618,[eng],Oral antiretroviral agents provide life-saving...,,
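Because each `authors` cell holds a list of dictionaries, author names can be pulled out with ordinary Python. The sketch below assumes only the `order`, `first`, and `last` keys visible in the output above; the sample names are invented for illustration, and both helper functions are hypothetical rather than part of `pubmed_tool`.

```python
# Illustrative record only: the keys match the notebook output, the names are invented.
authors = [
    {"order": "2", "first": "alex", "last": "doe"},
    {"order": "1", "first": "jane", "last": "roe"},
]


def first_author(author_list):
    """Return 'first last' for the author whose order is 1, or None if absent."""
    for author in author_list:
        # 'order' is stored as a string in the output, so convert before comparing.
        if int(author["order"]) == 1:
            return f"{author['first']} {author['last']}"
    return None


def author_names(author_list):
    """Return all author names sorted by their position on the paper."""
    ordered = sorted(author_list, key=lambda a: int(a["order"]))
    return [f"{a['first']} {a['last']}" for a in ordered]


print(first_author(authors))   # jane roe
print(author_names(authors))   # ['jane roe', 'alex doe']
```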


For our final check of the returned data frame, we verify that each PMID is, in fact, unique.

In [7]:
records.index.nunique() == records.shape[0]

True

We can import our CSV to check that the records are the same. We have the same number of rows and columns, and they have the same content. 

Note: `pd.DataFrame.equals()` would not return True, because the original data frame stores `authors`, `language`, and `keywords` as lists, whereas the CSV stores them as strings. The data frame read in from the CSV would require additional processing to be fully equivalent on all measures.

In [8]:
import pandas as pd

file_records = pd.read_csv('demo_data.csv', 
                    sep = ',', 
                    header = 0, 
                    dtype={'pmid': int, 'volume': 'Int64', 'issue': 'Int64'})\
                        .set_index('pmid', drop = True)
print(file_records.shape)
display(file_records.head())

(9984, 14)


pmid,title,pubdate,authors,keywords,journal,isoabbrev,volume,issue,page_start,page_end,language,abstract,other_type,other_val
33193428,major scientific hurdles in hiv vaccine develo...,2020-01-01,"[{'order': '1', 'first': 'tiza', 'last': ""ng'u...","['history of hiv-1 vaccines', 'hiv', 'hiv-1 va...",frontiers in immunology,front immunol,11,,590780,590780,['eng'],Following the discovery of HIV as a causative ...,,
32692406,hiv/sars-cov-2 coinfection: a global perspective.,2021-02-01,"[{'order': '1', 'first': 'osman', 'last': 'kan...","['hiv', 'covid-19', 'antiretroviral therapy/co...",journal of medical virology,j med virol,93,2.0,726,732,['eng'],"Since its first appearance in Wuhan, China, se...",,
31936859,block-and-lock strategies to cure hiv infection.,2020-01-01,"[{'order': '1', 'first': 'gerlinde', 'last': '...","['hiv', 'latency', 'block-and-lock', 'cure']",viruses,viruses,12,1.0,,,['eng'],Today HIV infection cannot be cured due to the...,,
32066532,hiv transmission and source-sink dynamics in s...,2020-03-01,"[{'order': '1', 'first': 'justin', 'last': 'ok...",,the lancet. hiv,lancet hiv,7,3.0,e209,e214,['eng'],Multiple phylogenetic studies of HIV in sub-Sa...,,
32612233,clinical targeting of hiv capsid protein with ...,2020-08-01,"[{'order': '1', 'first': 'john', 'last': 'link...",,nature,nature,584,7822.0,614,618,['eng'],Oral antiretroviral agents provide life-saving...,,
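The list-valued columns come back from the CSV as strings such as `"['eng']"`. One possible way to restore them is `ast.literal_eval` from the standard library. This is a sketch of the additional processing, not the package's own method; `parse_list_cell` is a hypothetical helper, and returning an empty list for missing values is a choice made for this example.

```python
import ast


def parse_list_cell(cell):
    """Convert a CSV cell such as "['hiv', 'cure']" back into a Python list."""
    # Missing values arrive as NaN (a float) or as an empty string.
    if not isinstance(cell, str) or not cell.strip():
        return []
    # literal_eval only evaluates Python literals, so it is safer than eval().
    return ast.literal_eval(cell)


keywords = parse_list_cell("['hiv', 'latency', 'block-and-lock', 'cure']")
print(keywords)  # ['hiv', 'latency', 'block-and-lock', 'cure']
```

Applied to the data frame, this could look like `file_records['keywords'].apply(parse_list_cell)`, and likewise for `authors` and `language`.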


#### Exploration of Options

There are several options available in the `pubmed_tool.scraper()` function. The only mandatory fields are `keyword`, `start_date`, `end_date`, and `email`. For a full exploration of these options, we encourage you to view the [User Manual](https://github.com/intro-to-ds-capstone/capstone-project/blob/main/docs/user_manual.pdf).

As an additional brief demonstration, we will perform another query. This time, we will search for articles with the term 'genomics' published between January 1, 2020 and February 1, 2023, and restrict our results to the first 300 articles returned. On an older system, we may also need to 'chunk' the data processing so that memory holds the full data for only 100 records at a time before they are written to disk.

In [9]:
pubmed_tool.scraper(keyword = 'genomics', 
                    start_date = '2020/01/01', 
                    end_date = '2023/02/01',
                    email = input('email:').lower().strip(), 
                    project_dir = None,
                    path = 'demo_expanded.csv', 
                    chunksize = 100, max_returns = 300,
                    overwrite = True, return_df = False)

2023-12-05 21:07:48 - INFO: ['No project directory given. Directory set to the current working directory.']
2023-12-05 21:07:48 - INFO: Initiating PubMed search.
 	Query: (genomics) AND ("2020/01/01"[Date - Publication] : "2023/02/01"[Date - Publication])
 	Entrez email: morrigan.mahady@uth.tmc.edu
 	Max returns: 300
2023-12-05 21:07:49 - INFO: Search Successful! Obtained 300 PMID(s).
2023-12-05 21:07:49 - INFO: Initiating PubMed search. Requesting article records for 100 PMID(s)
2023-12-05 21:07:50 - INFO: Search Successful! Obtained records for  100 PMID(s).
2023-12-05 21:07:50 - INFO: Initiating PubMed search. Requesting article records for 100 PMID(s)
2023-12-05 21:07:52 - INFO: Search Successful! Obtained records for  100 PMID(s).
2023-12-05 21:07:52 - INFO: Initiating PubMed search. Requesting article records for 100 PMID(s)
2023-12-05 21:07:53 - INFO: Search Successful! Obtained records for  100 PMID(s).
2023-12-05 21:07:53 - INFO: Success! 
 300 records for PubMed Search: 
(gen
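The three 100-PMID requests logged above follow from `chunksize = 100` and `max_returns = 300`. As a minimal sketch of that batching idea (not the package's actual implementation; `chunked` is a hypothetical helper and the IDs are placeholders):

```python
def chunked(items, chunksize):
    """Yield successive slices of `items`, each holding at most `chunksize` entries."""
    if chunksize < 1:
        raise ValueError("chunksize must be a positive integer")
    for start in range(0, len(items), chunksize):
        yield items[start:start + chunksize]


pmids = list(range(1, 301))  # Placeholder IDs, not real PMIDs
batches = list(chunked(pmids, 100))
print([len(batch) for batch in batches])  # [100, 100, 100]
```

Because each batch is processed and written before the next is requested, only one batch of full records needs to be held in memory at a time.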

Our log indicates the chunking steps, and all appears well. We read in our file, and find it does contain the expected 300 records in the expected format.

In [10]:
import pandas as pd

file_records = pd.read_csv('demo_expanded.csv', 
                    sep = ',', 
                    header = 0,
                    dtype={'pmid': int, 'volume': 'Int64', 'issue': 'Int64'})\
                        .set_index('pmid', drop = True)
print(file_records.shape)
display(file_records.head())

(300, 14)


pmid,title,pubdate,authors,keywords,journal,isoabbrev,volume,issue,page_start,page_end,language,abstract,other_type,other_val
32034321,pan-genomics in the human genome era.,2020-04-01,"[{'order': '1', 'first': 'rachel', 'last': 'sh...",,nature reviews. genetics,nat rev genet,21,4.0,243,254.0,['eng'],"Since the early days of the genome era, the sc...",,
36523157,plant pan-genomics and its applications.,2023-01-01,"[{'order': '1', 'first': 'junpeng', 'last': 's...","['genome assembly', 'plant pan-genome', 't-2-t...",molecular plant,mol plant,16,1.0,168,186.0,['eng'],Plant genomes are so highly diverse that a sub...,,
35145307,a roadmap to increase diversity in genomic stu...,2022-02-01,"[{'order': '1', 'first': 'segun', 'last': 'fat...",,nature medicine,nat med,28,2.0,243,250.0,['eng'],"Two decades ago, the sequence of the first hum...",,
36302390,an expanded arsenal of immune systems that pro...,2022-11-01,"[{'order': '1', 'first': 'adi', 'last': 'millm...","['bacterial immunity', 'microbial genomics', '...",cell host & microbe,cell host microbe,30,11.0,1556,156900000.0,['eng'],Bacterial anti-phage systems are frequently cl...,,
31584170,plasmidfinder and in silico pmlst: identificat...,2020-01-01,"[{'order': '1', 'first': 'alessandra', 'last':...","['bacterial typing', 'wgs', 'genomics', 'repli...","methods in molecular biology (clifton, n.j.)",methods mol biol,2075,,285,294.0,['eng'],PlasmidFinder and in silico plasmid multiLocus...,,


### SQL Upload and Query with `pubmed_tool.sql_full()`

#### Basic Upload and Query

Now we would like to upload our data to a database and query an author name. We use our CSV file for this process, but we could also pass the output from `pubmed_tool.scraper()` directly.

Here, we are searching for any author with the name 'mary', in any field. If we had records with an author of 'Mary Jones' and 'Jackson Mary', we should return both of their results.

In [12]:
mary_records = pubmed_tool.sql_full('demo_data.csv', any_nm = 'mary')

2023-12-05 21:08:29 - INFO: ['No project directory given. Directory set to the current working directory.']
2023-12-05 21:09:18 - INFO: ['No project directory given. Directory set to the current working directory.']
2023-12-05 21:09:18 - INFO: Attempting to connect to SQLite database c:\Users\morri\Documents\demo\publications.db
2023-12-05 21:09:18 - INFO: Connection successful
2023-12-05 21:09:18 - INFO: Table <papers> successfully created.
2023-12-05 21:09:19 - INFO: 9984 unique papers sucessfully uploaded to SQLite table <papers>.
2023-12-05 21:09:19 - INFO: Table <authors> successfully created.
2023-12-05 21:09:19 - INFO: 49267 unique papers sucessfully uploaded to SQLite table <authors>.
2023-12-05 21:09:19 - INFO: Table <pairs_authorpapers> successfully created.
2023-12-05 21:09:19 - INFO: 69807 unique pairs of author-paperkeys successfully uploaded to SQLite table <pairs_authorpapers>.
2023-12-05 21:09:19 - INFO: ['No project directory given. Directory set to the current working

any


2023-12-05 21:09:20 - INFO: Query successful. Returning 190 records as pandas dataframe.


As we can see, our log output was again very informative. It reported the name of the database file and each stage of processing as it occurred. It also gave us the full text of the SQL query, should we wish to use it outside of the function itself. We know to expect 190 matching records.

If we check the results, we find that we did receive 190 records. Each row provides the author's full name, whether they were first author on the paper, and the details of the paper.

In [None]:
print(mary_records.shape)
display(mary_records.head())

(190, 17)


,fullname,firstauthor,pmid,title,pubdate,abstract,journal,isoabbrev,numauthors,volume,issue,page_start,page_end,other_type,other_val,keywords,language
0,em okoli eberechukwu maryann,False,31885351,determination of antibodies to human immunodef...,2020-01-01,The need for a cure against HIV infection and ...,journal of immunoassay & immunochemistry,j immunoassay immunochem,2,41,2.0,208,218,,,"['nigeria', 'elisa', 'p24 antigen', 'pregnant ...",['English']
1,f al ammary fawaz,False,32681603,early steroid withdrawal in hiv-infected kidne...,2021-02-01,Kidney transplant (KT) outcomes for HIV-infect...,american journal of transplantation : official...,am j transplant,6,21,2.0,717,726,,,"['clinical research/practice', 'immunosuppress...",['English']
2,im poynten isobel mary,False,32621759,a meta-analysis of anal cancer incidence by ri...,2021-01-01,Certain population groups are known to have hi...,international journal of cancer,int j cancer,9,148,1.0,38,47,,,"['incidence', 'anal cancer', 'msm', 'transplan...",['English']
3,im poynten mary,False,32109409,hiv treatment and anal cancer: emerging clarity.,2020-04-01,,the lancet. hiv,lancet hiv,3,7,4.0,e220,e221,,,[],['English']
4,j molden jhomary,False,32163523,the human il-15 superagonist n-803 promotes mi...,2020-03-01,Despite the success of antiretroviral therapy ...,plos pathogens,plos pathog,22,16,3.0,e1008339,e1008339,,,[],['English']
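The results above include names such as 'jhomary' and 'ammary', which suggests the search matches 'mary' as a substring of a name field. The sketch below illustrates that kind of matching with an in-memory SQLite table. The table layout, the `last` column, and the `search_any_name` helper are assumptions made for this example, and the simplified `fullname` values do not follow the package's own format.

```python
import sqlite3

# Toy in-memory table; the schema is an assumption for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE authors (fullname TEXT, first TEXT, last TEXT)")
conn.executemany(
    "INSERT INTO authors VALUES (?, ?, ?)",
    [
        ("mary jones", "mary", "jones"),
        ("jackson mary", "jackson", "mary"),
        ("john smith", "john", "smith"),
    ],
)


def search_any_name(connection, term):
    """Return full names whose first OR last name contains `term`."""
    pattern = f"%{term}%"  # % wildcards on both sides give substring matching
    rows = connection.execute(
        "SELECT fullname FROM authors "
        "WHERE first LIKE ? OR last LIKE ? ORDER BY fullname",
        (pattern, pattern),
    ).fetchall()
    return [row[0] for row in rows]


matches = search_any_name(conn, "mary")
print(matches)  # ['jackson mary', 'mary jones']
```

Both 'Mary Jones' and 'Jackson Mary' are returned, while 'john smith' is not.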


#### Expanded Options

There are several options available in the `pubmed_tool.sql_full()` function. The only mandatory field is `t_df`, wherein one would specify the data frame for upload. For a full exploration of these options, we encourage you to view the [User Manual](https://github.com/intro-to-ds-capstone/capstone-project/blob/main/docs/user_manual.pdf).

As a small demonstration, we will create another database based on our genomics scrape, with custom names for each of the tables. We want to query any author with the initial 'A'.

In [7]:
genomics_records = pubmed_tool.sql_full('demo_expanded.csv', 
                    project_dir = None, 
                    db_name = 'genomics.db', 
                    paper_name = 'genomics_papers', 
                    authors_name = 'genomics_authors',
                    pairs_name = 'genomics_pairs',
                    any_nm = None, first_nm = None, 
                    last_nm = None, initials_nm = 'A'
                    )

2023-12-05 21:17:05 - INFO: ['No project directory given. Directory set to the current working directory.']
2023-12-05 21:17:07 - INFO: ['No project directory given. Directory set to the current working directory.']
2023-12-05 21:17:07 - INFO: Attempting to connect to SQLite database c:\Users\morri\Documents\demo\genomics.db
2023-12-05 21:17:07 - INFO: Connection successful
2023-12-05 21:17:07 - INFO: Table <genomics_papers> successfully created.
2023-12-05 21:17:07 - INFO: 300 unique papers sucessfully uploaded to SQLite table <genomics_papers>.
2023-12-05 21:17:07 - INFO: Table <genomics_authors> successfully created.
2023-12-05 21:17:07 - INFO: 1583 unique papers sucessfully uploaded to SQLite table <genomics_authors>.
2023-12-05 21:17:07 - INFO: Table <genomics_pairs> successfully created.
2023-12-05 21:17:07 - INFO: 1624 unique pairs of author-paperkeys successfully uploaded to SQLite table <genomics_pairs>.
2023-12-05 21:17:07 - INFO: ['No project directory given. Directory set t

Again, our logs were very useful. We found that 166 of our records had an author with an initial of 'A'. 

In [8]:
print(genomics_records.shape)
display(genomics_records.head())

(166, 17)


,fullname,firstauthor,pmid,title,pubdate,abstract,journal,isoabbrev,numauthors,volume,issue,page_start,page_end,other_type,other_val,keywords,language
0,a afshinfard amirhossein,False,35729491,rresolver: efficient short-read repeat resolut...,2022-06-01,De novo genome assembly is essential to modern...,bmc bioinformatics,bmc bioinformatics,8,23,1.0,246,246,,,"['scalable', 'repeat resolution', 'short reads...",['English']
1,a agarwal akshay,False,32877338,"functional genomics platform, a cloud-based pl...",2022-01-01,The rapid growth in biological sequence data i...,ieee/acm transactions on computational biology...,ieee/acm trans comput biol bioinform,11,19,2.0,940,952,,,[],['English']
2,a bag aishee,False,36131352,genomics and epigenetics guided identification...,2022-09-01,Genomic safe harbors are regions of the genome...,genome biology,genome biol,8,23,1.0,199,199,,,"['genetic engineering', 'chromatin organizatio...",['English']
3,a bashir ali,False,31397844,mspac: a tool for haplotype-phased structural ...,2020-02-01,While next-generation sequencing (NGS) has dra...,"bioinformatics (oxford, england)",bioinformatics,4,36,3.0,922,924,,,[],['English']
4,a bauer armin,False,35442080,genomic and chemical decryption of the bactero...,2022-06-01,With progress in genome sequencing and data sh...,microbiology spectrum,microbiol spectr,16,10,3.0,e0247921,e0247921,,,"['natural products', 'bacteroidetes', 'antifun...",['English']


#### Query Only

The primary subfunction of the `pubmed_tool.sql` module that might be called outside of the main function of `pubmed_tool.sql_full()` is `pubmed_tool.sql.query()`, which performs only the query portion of the SQL processing. Its signature, with default values, is:

def query(db_name = 'publications.db', 
          project_dir = None,
          paper_name = 'papers', 
          authors_name = 'authors', 
          pairs_name = 'pairs_authorpapers', 
          any_nm = None, last_nm = None, 
          first_nm = None, initials_nm = None):

To demonstrate, we search our publications database for any author with a first name of 'xiaoling', which returns 2 records.

In [13]:
pubmed_tool.sql.query(db_name = 'publications.db', first_nm = 'xiaoling')

2023-12-05 21:20:26 - INFO: ['No project directory given. Directory set to the current working directory.']
2023-12-05 21:20:26 - INFO: Attempting SQLite query on database:c:\Users\morri\Documents\demo\publications.db 
 	 Query text: 
 	SELECT pairs_authorpapers.fullname, pairs_authorpapers.firstauthor, papers.* 
FROM papers, pairs_authorpapers, authors 
WHERE (authors.first LIKE '%xiaoling%')AND pairs_authorpapers.fullname == authors.fullname AND papers.pmid == pairs_authorpapers.pmid
GROUP BY pairs_authorpapers.fullname;
2023-12-05 21:20:26 - INFO: Query successful. Returning 2 records as pandas dataframe.


,fullname,firstauthor,pmid,title,pubdate,abstract,journal,isoabbrev,numauthors,volume,issue,page_start,page_end,other_type,other_val,keywords,language
0,x guo xiaoling,False,32646373,effect of a multi-dimensional case management ...,2020-07-01,This paper introduces a comprehensive case man...,bmc infectious diseases,bmc infect dis,16,20,1.0,489,489,,,"['hiv', 'retention rate', 'case management', '...",['English']
1,x wang xiaoling,False,32278789,stavudine exposure results in developmental ab...,2020-06-01,Stavudine is an anti-AIDS drug widely used to ...,toxicology,toxicology,9,439,,152443,152443,,,"['development', 'dna damage', 'hiv', 'abnormal...",['English']
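To see how the logged query joins the three tables, we can run an adapted copy of it against a toy in-memory database. The table and column names come from the logged query text, but the reduced column sets, the sample rows other than 'x guo xiaoling', and the use of a bound parameter in place of the inlined pattern are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced, assumed schema: only the columns the logged query touches, plus a title.
conn.executescript("""
    CREATE TABLE papers (pmid INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE authors (fullname TEXT, first TEXT, last TEXT);
    CREATE TABLE pairs_authorpapers (fullname TEXT, pmid INTEGER, firstauthor INTEGER);
    INSERT INTO papers VALUES (1, 'paper one'), (2, 'paper two');
    INSERT INTO authors VALUES ('x guo xiaoling', 'xiaoling', 'guo'),
                               ('j doe jane', 'jane', 'doe');
    INSERT INTO pairs_authorpapers VALUES ('x guo xiaoling', 1, 0),
                                          ('j doe jane', 1, 1),
                                          ('j doe jane', 2, 1);
""")

# Same structure as the logged query, with the name pattern passed as a parameter.
QUERY = """
SELECT pairs_authorpapers.fullname, pairs_authorpapers.firstauthor, papers.*
FROM papers, pairs_authorpapers, authors
WHERE (authors.first LIKE ?)
  AND pairs_authorpapers.fullname == authors.fullname
  AND papers.pmid == pairs_authorpapers.pmid
GROUP BY pairs_authorpapers.fullname;
"""

rows = conn.execute(QUERY, ("%xiaoling%",)).fetchall()
print(rows)  # [('x guo xiaoling', 0, 1, 'paper one')]

# GROUP BY fullname yields one row per author, even for an author on two papers.
jane_rows = conn.execute(QUERY, ("%jane%",)).fetchall()
print(len(jane_rows))  # 1
```

In this toy data, the `GROUP BY` clause returns a single row for the author linked to two papers, so the row count reflects matching authors rather than matching author-paper pairs.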


### Visualization with `pubmed_tool.full_visual()`

There are several options available in the `pubmed_tool.full_visual()` function. The only mandatory field is `t_df`, wherein one would specify the data to visualize. For a full exploration of these options, we encourage you to view the [User Manual](https://github.com/intro-to-ds-capstone/capstone-project/blob/main/docs/user_manual.pdf).

Because the visualization modules are powered by HoloViz and Bokeh, output is limited to display in a Jupyter notebook, temporary local hosting of a generated HTML webpage, or an HTML file with all requisite data fully embedded. This avoids complex requirements for Selenium and PhantomJS imports. However, any of the generated visuals can be saved using the interactive 'save' button on the Bokeh visuals, or captured as a screenshot.

#### Static Visualization

The primary visual is the static visual pane, which generates summary statistics and descriptive plots of publications over time. It has options to customize the primary, secondary, and accent colors. For this demonstration, we generate our visual with the default colors (explicitly passed) and a royalty-free image from Pexels. We also use our genomics set, as it is smaller and provides clearer histogram visuals.

In [17]:
logo_path = r'https://images.pexels.com/photos/3109167/pexels-photo-3109167.jpeg'

pubmed_tool.full_visual('demo_expanded.csv', mode = 'jupyter', 
                        keyword = 'genomics',
                        primary_color = 'blue', secondary_color = 'grey', 
                        accent_color = 'grey',
                        logo_path = logo_path)

2023-12-05 21:30:11 - INFO: ['No project directory given. Directory set to the current working directory.']
2023-12-05 21:30:11 - INFO: ['No project directory given. Directory set to the current working directory.']


BokehModel(combine_events=True, render_bundle={'docs_json': {'71946e57-a5e8-43a3-a57b-6ff54cb61dd2': {'version…

#### Interactive Visualization

Our more robust visualization is created in the interactive mode, which allows for dynamic additional filtering of material in active, interactive exploration.

In [16]:
logo_path = r'https://images.pexels.com/photos/3109167/pexels-photo-3109167.jpeg'

pubmed_tool.full_visual('demo_data.csv', mode = 'jupyter', keyword = 'hiv', 
                        start_date = '2020/01/01', end_date = '2020/08/30',
                        primary_color = 'blue', secondary_color = 'grey', 
                        accent_color = 'grey', interactive = True,
                        logo_path = logo_path)

2023-12-05 21:28:51 - INFO: ['No project directory given. Directory set to the current working directory.']
2023-12-05 21:28:51 - INFO: ['No project directory given. Directory set to the current working directory.']
2023-12-05 21:29:42 - ERROR: Error occured in path validation: 
 	Error message(s):
 	[WinError 123] The filename, directory name, or volume label syntax is incorrect: 'https:'


BokehModel(combine_events=True, render_bundle={'docs_json': {'49b17cbf-f520-4cc1-9891-c02e0df15a34': {'version…