<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#NCSES-class---FedRePORTER-and-IPEDS-data" data-toc-modified-id="NCSES-class---FedRePORTER-and-IPEDS-data-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>NCSES class - FedRePORTER and IPEDS data</a></span><ul class="toc-item"><li><span><a href="#Introduction" data-toc-modified-id="Introduction-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Introduction</a></span></li><li><span><a href="#Python-Setup" data-toc-modified-id="Python-Setup-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Python Setup</a></span></li></ul></li><li><span><a href="#Load-the-data" data-toc-modified-id="Load-the-data-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Load the data</a></span><ul class="toc-item"><li><span><a href="#Federal-RePORTER---Abstracts-(https://federalreporter.nih.gov/FileDownload)" data-toc-modified-id="Federal-RePORTER---Abstracts-(https://federalreporter.nih.gov/FileDownload)-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Federal RePORTER - Abstracts (<a href="https://federalreporter.nih.gov/FileDownload" target="_blank">https://federalreporter.nih.gov/FileDownload</a>)</a></span></li><li><span><a href="#NMF-method---Non-negative-matrix-factorization" data-toc-modified-id="NMF-method---Non-negative-matrix-factorization-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>NMF method - Non-negative matrix factorization</a></span></li></ul></li></ul></div>

## NCSES class - FedRePORTER and IPEDS data

### Introduction

**Federal RePORTER** (https://federalreporter.nih.gov) is a collaborative effort led by STAR METRICS® to create a searchable database of scientific awards from federal agencies. Awards can be searched across agencies or fiscal years, by the award's project leader, or by a text search of a project's title, terms, or abstracts.

### Python Setup

In [1]:
# Data manipulation
import pandas as pd

# Reading in files
import glob

# Text analysis (topic modeling)
import numpy as np
import sklearn
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer
import string

## Load the data

### Federal RePORTER - Abstracts (https://federalreporter.nih.gov/FileDownload)

In [2]:
"""Get all files with project abstracts."""

# glob.glob returns files in arbitrary order; sort so fiscal years are read in sequence
abstracts_files = sorted(glob.glob('FedRePORTER_PRJABS_C_FY20*.csv'))
print(abstracts_files)

['FedRePORTER_PRJABS_C_FY2000.csv', 'FedRePORTER_PRJABS_C_FY2001.csv', 'FedRePORTER_PRJABS_C_FY2002.csv', 'FedRePORTER_PRJABS_C_FY2003.csv', 'FedRePORTER_PRJABS_C_FY2004.csv', 'FedRePORTER_PRJABS_C_FY2005.csv', 'FedRePORTER_PRJABS_C_FY2006.csv', 'FedRePORTER_PRJABS_C_FY2007.csv', 'FedRePORTER_PRJABS_C_FY2008.csv', 'FedRePORTER_PRJABS_C_FY2009.csv', 'FedRePORTER_PRJABS_C_FY2010.csv', 'FedRePORTER_PRJABS_C_FY2011.csv', 'FedRePORTER_PRJABS_C_FY2012.csv', 'FedRePORTER_PRJABS_C_FY2013.csv', 'FedRePORTER_PRJABS_C_FY2014.csv', 'FedRePORTER_PRJABS_C_FY2015.csv', 'FedRePORTER_PRJABS_C_FY2016.csv', 'FedRePORTER_PRJABS_C_FY2017.csv', 'FedRePORTER_PRJABS_C_FY2018.csv']


In [None]:
"""Read them in, concatenate and convert to a dataframe."""

list_data = []
for filename in abstracts_files:
    # Skip malformed rows; on_bad_lines requires pandas >= 1.3 (older versions used error_bad_lines=False)
    data = pd.read_csv(filename, on_bad_lines='skip', encoding="ISO-8859-1")
    list_data.append(data)
    
abstracts = pd.concat(list_data)

In [6]:
"""Drop missing abstracts."""

abstracts = abstracts.dropna()

In [11]:
abstracts = abstracts.drop_duplicates()

In [14]:
"""Get abstracts as a list to feed in to TfidfVectorizer in the next step."""

merged_abstracts_list = abstracts[' ABSTRACT'].values.tolist()

In [15]:
len(merged_abstracts_list)

1034189
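
The column name `' ABSTRACT'` used above carries a leading space, which is easy to mistype. As a minimal sketch, assuming a toy dataframe (the `PROJECT_ID` column and its values are hypothetical, not taken from the Federal RePORTER files), column names can be normalized before selecting the abstracts:

```python
import pandas as pd

# Toy frame mimicking a header with a stray leading space (hypothetical data)
toy = pd.DataFrame({
    "PROJECT_ID": [1, 2],
    " ABSTRACT": ["first abstract", "second abstract"],
})

# Strip surrounding whitespace from every column name
toy.columns = toy.columns.str.strip()

# The column can now be selected without the leading space
toy_abstracts_list = toy["ABSTRACT"].tolist()
print(toy.columns.tolist())
```

If the same normalization were applied to `abstracts`, later cells would need to refer to `'ABSTRACT'` rather than `' ABSTRACT'`.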

### NMF method - Non-negative matrix factorization

NMF is a model used for topic extraction. While the LDA model uses raw counts of unique words per document, the NMF model uses a normalized representation of those raw counts (the TF-IDF representation).

TF stands for term frequency, and TF-IDF is term frequency times inverse document frequency. In other words, we look not only at how often a word appears in a given document, but also at how distinctive that word is across the whole collection of documents (the corpus). For example, words like "often" or "use" occur frequently, but they are less informative (more semantically vacuous) for discerning the topic of a document, because they are likely to appear in most documents in the corpus. On the other hand, words that appear in only a few documents are more likely to be specific to those documents and can therefore form the basis of a topic.

More here: 

- https://scikit-learn.org/stable/modules/decomposition.html#non-negative-matrix-factorization-nmf-or-nnmf
- https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html
- https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfTransformer.html#sklearn.feature_extraction.text.TfidfTransformer
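
To make the TF-IDF intuition above concrete, here is a minimal sketch on a toy corpus. The three sentences and the `toy_` names are made up for illustration; only `TfidfVectorizer` and its `vocabulary_` and `idf_` attributes come from scikit-learn. A word that appears in every document should receive a lower inverse-document-frequency weight than a word that appears in only one.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus (hypothetical, not from Federal RePORTER)
toy_docs = [
    "mice study of influenza infection",
    "study of alcohol consumption",
    "study of solar materials",
]

toy_vectorizer = TfidfVectorizer()
toy_tfidf = toy_vectorizer.fit_transform(toy_docs)

# vocabulary_ maps each term to its column index; idf_ holds the learned IDF weights
idf_by_term = {
    term: toy_vectorizer.idf_[col]
    for term, col in toy_vectorizer.vocabulary_.items()
}

print("study:", idf_by_term["study"])  # appears in all three documents
print("mice:", idf_by_term["mice"])    # appears in a single document
```

The term "study" is down-weighted relative to "mice", which is the behavior that lets rarer, more topic-specific words drive the factorization.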

In [16]:
"""Convert a collection of raw documents to a matrix of TF-IDF features."""

vectorizer = TfidfVectorizer(stop_words='english')
tfidf = vectorizer.fit_transform(merged_abstracts_list)

In [17]:
"""Get feature names."""

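# get_feature_names_out() requires scikit-learn >= 1.0; older versions used get_feature_names()
vectorizer_feature_names = vectorizer.get_feature_names_out()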

In [None]:
"""Run the model with 100 topics."""

nmf = NMF(n_components=100, verbose=2).fit(tfidf)

In [None]:
"""Get topics to documents and word to topics matrices."""

nmf_W = nmf.transform(tfidf) # documents-by-topics matrix (one row per abstract)
nmf_H = nmf.components_ # topics-by-words matrix (one row per topic)
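
As a rough illustration of what these two matrices hold, the following sketch factorizes a four-document toy corpus into two topics. The documents, the `toy_` names, and the choices `init="nndsvd"` and `random_state=0` are assumptions made for a reproducible small example; they are not the settings used for the full run above.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus with two obvious themes (hypothetical data)
toy_docs = [
    "mice mouse transgenic mice model",
    "mouse model transgenic animals",
    "cocaine addiction dopamine reward",
    "addiction relapse cocaine dopamine",
]

toy_vectorizer = TfidfVectorizer()
toy_tfidf = toy_vectorizer.fit_transform(toy_docs)

toy_nmf = NMF(n_components=2, init="nndsvd", random_state=0)
toy_W = toy_nmf.fit_transform(toy_tfidf)  # shape: (n_documents, n_topics)
toy_H = toy_nmf.components_               # shape: (n_topics, n_terms)

print(toy_W.shape, toy_H.shape)

# The strongest topic for each document is the column with the largest weight
dominant_topic = np.argmax(toy_W, axis=1)
print(dominant_topic)
```

Each row of `toy_W` gives one document's weights over the topics, and each row of `toy_H` gives one topic's weights over the vocabulary, which is why the cells below sort rows of `nmf_H` to find top words and columns of `nmf_W` to find top documents.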

In [20]:
"""View the list of topics (10 top words per topic)"""

for topic_idx, topic in enumerate(nmf_H):
    print("Topic %d:" % (topic_idx))
    print('----------------------------')
    print(" ".join([vectorizer_feature_names[i]
                for i in topic.argsort()[:-10 - 1:-1]]))
    print('----------------------------')

Topic 0:
----------------------------
mice mouse models human animal model transgenic mutant animals vivo
----------------------------
Topic 1:
----------------------------
training trainees faculty postdoctoral biology fellows doctoral predoctoral scientists medicine
----------------------------
Topic 2:
----------------------------
subproject institution nih center isfor andinvestigator crisp theresources entries necessarily
----------------------------
Topic 3:
----------------------------
intervention interventions randomized trial adherence self group based physical study
----------------------------
Topic 4:
----------------------------
cancer nci cancers pancreatic ovarian prevention oncology colorectal members colon
----------------------------
Topic 5:
----------------------------
hiv aids infected infection antiretroviral prevention transmission art cd4 cfar
----------------------------
Topic 6:
----------------------------
core projects investigators cores project provide sp

students graduate undergraduate student faculty college biomedical underrepresented summer school
----------------------------
Topic 56:
----------------------------
clinical trials translational trial phase studies protocol conduct oncology monitoring
----------------------------
Topic 57:
----------------------------
cocaine addiction dopamine abuse da self relapse seeking reward administration
----------------------------
Topic 58:
----------------------------
genetic genes variants genome variation sequencing genetics genomic association identify
----------------------------
Topic 59:
----------------------------
center university pilot programs faculty director new resources support investigators
----------------------------
Topic 60:
----------------------------
signaling pathway pathways cell activation wnt growth kinase mechanisms notch
----------------------------
Topic 61:
----------------------------
memory cd8 hippocampus hippocampal memories working episodic cd4 retrieval 

In [21]:
"""View a top document related to a given topic"""

for topic_idx, topic in enumerate(nmf_H):
    print('--------------------')
    print("Topic %d:" % (topic_idx))
    print('--------------------')
    print(" ".join([vectorizer_feature_names[i]
                    for i in topic.argsort()[:-10 - 1:-1]]))
    # Index of the single document with the largest weight for this topic
    top_doc_indices = np.argsort(nmf_W[:, topic_idx])[::-1][:1]
    for doc_index in top_doc_indices:
        print('--------------------')
        print(merged_abstracts_list[doc_index])

--------------------
Topic 0:
--------------------
mice mouse models human animal model transgenic mutant animals vivo
--------------------
The study of human immunology is often difficult because certain types of experiments can't be performed on humans for obvious ethical reasons. To facilitate human research, many scientists utilize animal models. The animal model of choice for immunologists is the mouse. This animal model is very useful because of the large number of reagents available as well as the creation of transgenic and knockout mice. Both types of mice are required for this program project grant. Transgenic carrying  humanized  MHC molecules will be iused to discover new epitopes for influenza proteins in project 2 and in project 3, they will be used to model human influenza infection in mice. Knockout mice are required so that the mouse endogenous MHC class Ior II genes are not expressed in the MHC transgenic mice. Two projects will require some of thesame strains of mice 

--------------------
The Data Management/Statistical Core is a key component of the Center. The Core will serve as a support mechanism to the Center's overall research program. The overall objective of the Core is to facilitate the ability of the Center's investigators to conduct research that is of the highest standards and to disseminate the research outcomes to the academic and service communities and industry. The Core will provide statistical and analytical support to the research programs at all of the sites. The Core will also provide technical support for the research projects. In addition, the Core will serve as the depository for the cross-site core battery of measures. The Core will build on the structures developed in CREATE I for data collection, storage, transfer, and management, and quality control for the core battery. The methods developed in CREATE I will be further enhanced during the proposed CREATE II project to ensure that they reflect recent strategies in data ma

--------------------
DESCRIPTION (provided by applicant): A main focus of systems neuroscience is to understand how sensory information is encoded and used to guide behavior. Perceptual decision-making, like nearly all normal behavioral processes and disorders of the nervous system, is thought to involve the activity of large groups of neurons. Technical limitations, however, have forced most physiological studies to focus on single neurons. These studies have provided many important insights, but they necessarily miss key information about the relationship between groups of sensory neurons and decisions. For example, single neuron responses cannot tell us how that neuron's activity interacts and is combined with that of other neurons within or between cortical areas. Furthermore, my prior work showed that the cognitive state of even a well-trained subject fluctuates greatly from moment to moment, with striking consequences on performance on perceptual tasks. Therefore, combining infor

abstract provided available summary description applicant application project end component
--------------------
ABSTRACT NOT PROVIDED
--------------------
Topic 15:
--------------------
alcohol drinking use consumption alcoholism abuse dependence heavy binge related
--------------------
Alcohol
--------------------
Topic 16:
--------------------
stem differentiation hematopoietic progenitor hsc adult self niche regeneration transplantation
--------------------
DESCRIPTION (provided by applicant): Continuous replacement and repair of adult epithelial tissues such as the skin, intestine, and lung depend on self-renewing stem cells which generate the specialized cells necessary for tissue maintenance. Recent work has shown that the local environment, or niche, in which stem cells reside is critical for their maintenance and function. Specifically, positioning of the stem cell within the niche exposes it to signals that promote its survival and maintenance and guide the production of spec

data analysis statistical methods management database collection information tools software
--------------------
C. Data Management and Analysis Core1. ObjectiveThe objective of the Data Management and Analysis Core is to continue to provide data managementservices and statistical expertise to Program Project investigators in a wide range of data acquisition andanalysis activities. Integral to the goals of each project is the management and analysis of Core observational,interview, questionnaire, and behavioral data, as well as management and analysis of data from the individualprojects. Data management activities draw on the considerable resources of the Data Management andAnalysis Center (DMAC) at the Frank Porter Graham Child Development Center (FPG) at UNC and TheMethodology Center (TMC) at PSD. Specific Aims of the Data Management and Analysis Core are to:1) Implement the planned missing design in conjunction with the Executive Committee.2) Develop and maintain data management str

--------------------
Forest ecosystems in the northeastern U.S. provide many important water resource benefits to people living in the region, including high quality drinking water, a reliable supply of water year-round, and protection against flooding. We know that past fluctuations in climate and land use have caused important changes to the species composition and structure of these forests. However, scientists predict that climate change processes currently underway due to human activity will likely result in an even greater frequency and intensity of extreme climate events than experienced in the past, especially more severe droughts, flooding, and temperature extremes. We currently know very little about how such changes in climate conditions will impact the northeastern forests and the critical water resources they provide to local populations. For example, climate change may significantly alter the water cycle by influencing how much water trees use or by changing the species c

materials energy properties quantum material solar metal magnetic polymer devices
--------------------
TECHNICALA workshop, to be held in the University of California Santa Barbara in March 2011 on the topic of Materials by Design, is supported by programs in the Division of Materials Research (Solid State and Materials Chemistry and Condensed Matter Physics), Division of Chemistry (Macromolecular, Supramolecular, and Nanochemistry), and the Mathematical and Physical Sciences Directorate (Office of Multidisciplinary Activities).  Research in Solid State and Materials Chemistry spans the full spectrum of exploration from constituent atoms, to bonding, to physical properties in extended materials. The strong links to physical properties and the functional behavior of materials is a key aspect that distinguishes solid state and materials chemists from other practitioners of chemistry, physics, and materials science: a unique approach that includes inorganic and organic materials research,

--------------------
DESCRIPTION (provided by applicant): Insulin-dependent diabetes mellitus (IDDM) is characterized by the infiltration of T-lymphocytes into the islets of Langerhans of the pancreas (insulitis), followed by selective destruction of insulin-secreting beta cells leading to overt diabetes. Preliminarily, we observed the important association of IDDM with AIM factor (a secretion molecule we initially identified as an apoptosis inhibitory factor), which are: (1) AIM [-/-] mice backcrossed to non-obese diabetes (NOD) background showed complete prevention of IDDM; (2) AIM is expressed by infiltrating macrophages in the pancreatic islets from the very early stage of the disease; (3) AIM strongly induces TNF-alpha, IL-1 beta, IL-6 and IL-12 in macrophages and dendritic cells (DCs). Based on these, a hypothesis has emerged that AIM may accelerate IDDM by inducing pro-inflammatory- and type I- cytokines in initially infiltrating macrophages and DCs in the islets at the onset st

--------------------
DESCRIPTION (provided by applicant): The care of patients with complex healthcare needs is often fragmented because they receive care from multiple providers across disparate care locations and because information related to this care is frequently not transmitted between providers or locations. Inadequate inter-provider communication and care coordination significantly lessen care quality and compromise patient safety. This three-year project seeks to improve outcomes, quality and coordination of care for patients with complex healthcare needs by facilitating the availability of information following three types of care transitions into the ambulatory care setting. Specifically, information regarding care transitions will be made available to patients, primary care practitioners and care managers following hospitalizations, emergency department (ED) encounters, and specialty clinic evaluations. This project will build upon a regional Health Information Exchange (H

--------------------
DESCRIPTION (provided by applicant):  This proposal seeks to identify the exact step(s) of adenovirus (Ad) infection that is (are) responsible for the initiation of an anti-Ad acute inflammatory response upon systemic virus application. Over the last two decades numerous Ad-based vectors have been developed for gene therapy applications and many are currently being tested in clinical trials. Most recently, interest in Ad has further expanded due to its potential as a vector for vaccination against life threatening infectious agents such as anthrax. While natural infections with Ad are largely harmless to humans, intravenous Ad administration may result in a severe inflammatory response, which can lead to fatal outcomes. It is currently recognized that the initiation of this acute systemic inflammation depends on interactions of the Ad capsid with host cells. Despite significant knowledge regarding Ad interactions with cells in vitro, the molecular mechanisms govern

--------------------
It is widely accepted that systems engineering is a key element of all large engineering projects; however, one estimate is that the Department of Defense alone loses about $200 million per day as a result of poor systems engineering, and other Federal agencies acknowledge that they cannot meet their projected missions using current systems engineering practices.  There has been a proliferation of calls for systems engineering, including the Secretary of Defense and the National Academies.  Simultaneously over the past 25 years, elements of a theory of systems engineering have emerged.  These elements point to an opportunity to formulate systems engineering as a rigorous engineering discipline.  The elements span mathematics, economics, business, and psychology, as well as significant elements of engineering and the sciences.  Numerous Universities and professional societies offer diverse educational programs and certifications in systems engineering.  This worksho

--------------------
The discovery of antibiotics nearly 80 years ago was a major milestone in the battle against infectious disease, yet bacterial infections continue to be a significant cause of death worldwide. In fact, management of many bacterial infections is becoming progressively more difficult due to the emergence of new and rapidly evolving pathogens with increased virulence, resistance to antibiotics, a greater ability to evade host responses, and heightened transmissibility. To reverse this trend, a systematic understanding of the complex dynamics between the pathogen and host is needed at every level of interaction, including those between cells, individuals, microbial communities, and populations. We will develop and implement an integrated experimental framework that provides systematic and complementary insights into bacterial infections encompassing single cells, animal models, and human patients, to investigate cellular genomics, transcriptional networks, and host mic

--------------------
Obesity is one of the most serious health problems in the United States affecting about 33% of adult Americans 20-74 years old. Morbidity and mortality increases significantly with increasing weight, especially with abdominal obesity. The increased morbidity and mortality results primarily from increased incidence of cardiovascular diseases such as hypertension, stroke and coronary artery disease. Obesity is also associated with abnormal insulin levels, diabetes mellitus, and lipid abnormalities. However there is very limited information about the possible mechanisms by which obesity may lead to increased risk for cardiovascular diseases. A more thorough understanding of physiological relationship between overweight and cardiovascular disease would be facilitated by identification of an appropriate experimental model exhibiting important characteristics associated with human obesity and yet suitable for study of cardiovascular function. We hypothesize that diet-ind

--------------------
ABSTRACT  Influenza virus is a serious public health threat. Seasonal outbreaks cause significant morbidity and mortality, and  pandemic  outbreaks  have  the  potential  for  widespread  infection  and  disease.  Novel pandemic viruses emerge from animal reservoirs to rapidly become the dominant circulating strains, as was the case during the emergence of the 2009 H1N1 pandemic virus. Like all viruses, influenza virus is completely dependent upon the host cell for replication.  Influenza  virus  exploits  and  subverts  host  processes  while  at  the  same  time evading  cellular  antiviral  responses.  The  balance  between  these  pro-  and  antiviral  forces  influences  the outcome of a viral infection, yet we have limited knowledge on the host factors engaged during replication and how they impact disease severity and the emergence of new viruses. This information is required to define the molecular mechanisms underpinning a productive infection. The overa

--------------------
DESCRIPTION (provided by applicant): Abuse of cocaine is a widespread and severely deleterious public health problem. Although it is well established that there is substantial variability in an individual's response to cocaine exposure, we do not yet understand the mechanisms that mediate individual differences in the etiology of cocaine abuse and addiction. In particular, we do not understand why some individuals are resistant to the effects of cocaine and will consume cocaine at low rates whereas other individuals are relatively vulnerable to the effects of cocaine and will consume cocaine at high rates. In the present proposal, we will evaluate several novel biomarkers that may predict the vulnerability of an individual to abuse cocaine or to relapse following abstinence. We will evaluate these processes using drug self-administration (SA) and reinstatement of previously drug-maintained behavior in nonhuman primates, as these are well established animal models o

memory cd8 hippocampus hippocampal memories working episodic cd4 retrieval neural
--------------------
DESCRIPTION (provided by applicant):     Studies are proposed to advance understanding of the organization of human memory and the nature of memory disorders. The work involves six studies, which are organized around three topics that have been prominent in recent discussions of memory and memory impairment: A) Recent memory and remote memory; B) Recognition memory and the human hippocampus; C) Working memory and the medial temporal lobe. A salient aspect of the proposed work is the opportunity to continue study of our population of memory-impaired patients with bilateral lesions limited to the hippocampus or with larger medial temporal lobe lesions. Many of these patients are veterans. Our population of amnesic study patients, including the veterans, is one of the very few such populations available anywhere and is the best characterized in terms of detailed, quantitative neuroanatom

--------------------
DESCRIPTION (provided by applicant): Asthma, a chronic inflammatory disorder of the airways, is estimated by the World Health Organization to affect 150 million people worldwide and its global pharmacotherapeutic costs exceed $5 billion per year. Cincinnati Children's Hospital Medical Center (CCHMC) provides clinical care to -7000 asthmatic children in the primary care and specialty clinics. Last year, over 3000 children were treated in the CCHMC Emergency Department with the primary diagnosis of an acute asthma exacerbation, and 885 patients (29.5%) were admitted to the hospital for management of acute asthma exacerbations. CCHMC has invested considerable resources to promote asthma research including the establishment of the Division of Asthma Research, which has partnered with the Asthma Center to create a comprehensive Asthma Program, which now provides a central base for the clinical and research activities for asthma at CCHMC. Patients suffering from asthma s

--------------------
Since 1995, over 40 countries have adopted legislation that mandates women's participation in government, and these policies enjoy considerable support from international organizations, politicians, and activists.  Practitioners who seek to increase women's presence in office do so in large part because they believe it will provide normative benefits for women through improved policy representation.  Despite this widely held assumption, it is clear that in a number of cases increasing women's representation fails to transform the legislative agenda.  While in some countries women's increased participation alters policy representation-defined in this project as the adoption of legislation related to women's interests-in others the addition of female legislators has had only limited consequences.Why does increasing women's participation lead to such mixed results?  This project represents one of the first systematic efforts to carefully develop and test theoretical e

--------------------
DESCRIPTION (provided by applicant): A significant challenge for the development of new influenza vaccines is to identify strategies that can both accelerate vaccine production and protect against the emergence of epidemic and pandemic strains. This proposal will develop a DNA vaccine to meet these needs. DNA vaccines can be rapidly designed and manufactured to express multiple antigens and induce antibody and cell-mediated protection against distant drift variants. This proposal will employ particle-mediated epidermal delivery (PMED) of the DNA vaccine and builds on the recent success of a seasonal influenza PMED DNA vaccine that induced protective levels of antibody in vaccinated subjects in a phase I human clinical trial. The purpose of this proposal is to increase PMED DNA vaccine immunogenicity and further develop the vaccine as a pandemic flu product. The primary objectives of the proposal are to: 1) Broaden the specificity of the vaccine against genetically 

--------------------
DESCRIPTION (provided by applicant): Parkinson disease (PD) is the second most common neurodegenerative disorder after Alzheimer's disease. Although dopamine replacement therapy improves the functional prognosis of PD, there is currently no treatment that prevents the progression of the disease. The etiology of PD is not well understood. Most of the PD cases are sporadic. However, approximately 5-10% of PD patients may have a clear familial history, exhibiting a classical recessive or dominant Mendelian mode of inheritance. Genetic studies in the past more than 10 years have played a vital role in understanding the etiology and pathogenic mechanism of PD. But, the power of genetic study is greatly limited because it requires sufficient number of informative meiosis, i.e. large families with both normal and affected members, especially when great genetic heterogeneity exists. It is particularly a difficult task for most of the late onset neurodegenerative disorders,

--------------------
DESCRIPTION (provided by applicant): Project Summary/Abstract Cardiovascular disease (CVD) is a significant problem for HIV-infected patients, yet the extent to which the newest CVD risk prediction tools accurately predict risk for HIV patients is not known. In this grant, we willevaluate the new American College of Cardiology (ACC)/ American Heart Association (AHA) CVD risk prediction algorithm, released in November 2013, to assess its performance in HIV patients. We will then develop a new risk prediction algorithm incorporating HIV and HIV-related factors to attempt to improve risk prediction. The rationale for performing this study is the uniqu pathophysiology underlying HIV-associated CVD, which is thought to be incompletely explained by traditional risk factors and driven in large part by inflammation and immune dysregulation. While established CVD risk prediction tools have been applied to HIV groups, there is not evidence that they are appropriate for use a

--------------------
DESCRIPTION (provided by applicant):     This proposal is in response to the NHLBI's call for  Novel Methods of Monitoring Health Disparities.  The University of Wisconsin (UW) School of Medicine and Public Health (SMPH) and its partners propose to build an innovative research network to monitor the effects of economic and policy changes on cardiovascular and respiratory health in communities. The main focus will be to identify the determinants of the state's significant health inequities according to place of residence, race/ethnicity, gender, and socioeconomic status.  We will create a model information network called the Wisconsin Health Equity Network (WHEN) by linking unique existing resources that assess the health of Wisconsin individuals and communities at multiple levels. These resources include: (1) the Survey of the Health of Wisconsin (SHOW): an annual survey of representative samples of state communities and their adult residents including data on demo

--------------------
THIS HAS NO SCIENCE
--------------------
Topic 86:
--------------------
hcv infection hepatitis infected chronic ifn virus antiviral interferon viral
--------------------
DESCRIPTION (provided by applicant): Hepatitis C virus (HCV) is the leading cause of chronic hepatitis, cirrhosis, and hepatocellular carcinoma (HCC). HCV chronically infects approximately 4 million people in the U.S. and 170 million people worldwide. HCV co-infection with HIV is very common with overall 25-30% of HIV-positive persons, particularly among drug abusers with up to 70% co-infection (2). HCV infection is a major risk factor for HCC development. HCV- associated end-stage of liver diseases is the leading indication for liver transplantation. Advances on HCV research have been significantly hampered by the lack of a robust cell culture HCV propagation system and reliable small animal models of HCV infection and replication. Recent breakthroughs have been the development of robust cell cul

--------------------
Stroke remains the most disabling of any neurological disease, as well as one of the leading causes of death in the United States. The annual incidence of strokes in the US is now neariy 750,000 and the most recent cost estimates of stroke care have been calculated as a staggering $50 billion. Over the next decade, the impact of stroke is likely to increase. The aging of our population and the changing race-ethnic composition could lead to an increased stroke incidence, mortality, morbidity and cost. The only approved acute therapy for stroke is IV tPA. Despite the excellent 3-month outcomes among those acute stroke patients treated within 3 hours, IV tPA is used in only 3-4% of patients suffering from stroke in the US. Part of the explanation for the inadequate penetration of IV tPA into stroke treatment stems from the public's lack of recognition of acute stroke symptoms, as well as the lack of infrastructure at the hospital level to expeditiously detect, triage 

--------------------
This contract provides for the development and standardization of small animal models for infectious diseases, and may include efficacy testing of candidate products.
--------------------
Topic 94:
--------------------
malaria parasite falciparum parasites transmission plasmodium vector control mosquito resistance
--------------------
DESCRIPTION (provided by applicant): Epidemiology of Clinical Malaria in Western Kenya Highlands Malaria is a major public health problem in Africa. An estimated one million people in Africa die from malaria each year, with the majority of fatalities occurring in children under the age of 5 in areas south of the Sahara. The African highlands (areas with elevation above 1500 m above sea level), where malaria used to be absent or very limited, has experienced periodic epidemics since the 1980's, with more than 110,000 fatalities each year. With the support of Global Funds to Fight Malaria, Presidential Malaria Initiatives and other priv

--------------------
DESCRIPTION (provided by applicant): As a cofactor for thousands of enzymes, iron is an essential micronutrient. Yet, free iron is toxic because it catalyzes rapid formation of damaging reactive oxygen species. Therefore, homeostasis systems exert tight control on iron levels in all organisms and gene expression is adjusted in response to iron deprivation and iron abundance: expression of at least 100 genes is known to be iron-dependent in Escherichia coli. In humans, disruption of iron homeostasis contributes to severe diseases: iron accumulation in the brain is linked to neurodegenerative diseases, iron overload causes the liver disease hemochromatosis, and iron deficiency leads to anemia and impaired cognitive development. Furthermore, invading bacterial pathogens hijack iron out of human proteins to establish infections. These considerations underscore the essential role of iron homeostasis to human health. Because of a lack of studies examining the total

The first topic model (100 topics) yields 2 topics related to opioids and drug abuse: topics 25 and 57.

In [22]:
"""Get topic weights per document."""

topics_weights = []
for doc_index, doc_weights in enumerate(nmf_W):  # one row of topic weights per document
    # Keep the weights for the 2 opioid- and drug abuse-related topics (25 and 57)
    topics_weights.append([doc_index, doc_weights[25], doc_weights[57]])
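
The loop above can also be expressed as one column selection. This is a minimal sketch, assuming `nmf_W` is a dense NumPy array of shape (documents, topics); the toy matrix and the names `toy_W`, `opioid_topics`, and `toy_weights` are illustrative and not part of the notebook.

```python
import numpy as np
import pandas as pd

# Toy stand-in for nmf_W: 4 documents x 60 topics (values are made up).
rng = np.random.default_rng(0)
toy_W = rng.random((4, 60))

opioid_topics = [25, 57]  # the two topic indices used in the loop above

# Select both topic columns at once; the row position plays the role of `index`.
toy_weights = pd.DataFrame(toy_W[:, opioid_topics], columns=opioid_topics)
toy_weights.insert(0, 'index', np.arange(len(toy_W)))
```

The resulting frame has one row per document with the row position and the two selected weights, which is the same information the loop collects.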

In [23]:
"""Get those abstracts which have at least some value (not zero) for either of 2 topics."""

topics_list_dataframe = pd.DataFrame(topics_weights)

abstracts = abstracts.reset_index()
topics_list_dataframe = topics_list_dataframe.rename(columns={0:'index'})
concat = pd.concat([abstracts,topics_list_dataframe],axis=1)

filtered = concat[(concat[1] != 0) | (concat[2] != 0)] 
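
The filter keeps a row when either topic weight is non-zero. Because NMF weights are non-negative, `!= 0` here amounts to "greater than zero". A small sketch of the same condition on made-up data (the frame `toy` and its values are hypothetical):

```python
import pandas as pd

# Columns 1 and 2 stand in for the two topic-weight columns of `concat`.
toy = pd.DataFrame({
    ' ABSTRACT': ['a', 'b', 'c', 'd'],
    1: [0.0, 0.2, 0.0, 0.1],
    2: [0.0, 0.0, 0.3, 0.4],
})

# True where at least one of the two weights is non-zero.
mask = (toy[[1, 2]] != 0).any(axis=1)
toy_filtered = toy[mask]
```

Only abstract `'a'`, whose weights are both zero, is dropped. Note that `pd.concat(..., axis=1)` aligns rows on the index, which is why `abstracts` is reset to a fresh 0..n-1 index before concatenating.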

In [24]:
"""Get a list of abstracts from the filtered dataframe above."""

filtered_abstracts = filtered[' ABSTRACT'].values.tolist()

In [25]:
len(filtered_abstracts)

422713

In [None]:
"""Run another topic model with 100 topics on the filtered abstracts."""

"""Convert a collection of raw documents to a matrix of TF-IDF features."""

vectorizer = TfidfVectorizer(stop_words='english')
tfidf = vectorizer.fit_transform(filtered_abstracts)

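"""Get feature names"""
# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() is available from 1.0 onward
vectorizer_feature_names = vectorizer.get_feature_names_out()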

"""Run the model with 100 topics"""
nmf = NMF(n_components=100, verbose=2).fit(tfidf)

In [None]:
"""Get the document-topic and topic-word matrices."""

nmf_W = nmf.transform(tfidf) # document-topic matrix: one row per document, one column per topic
nmf_H = nmf.components_ # topic-word matrix: one row per topic, one column per vocabulary term
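
To make the two matrices concrete, here is a toy run on four invented one-line documents. It is a sketch only: the corpus, `n_components=2`, `init`, `max_iter`, and `random_state` are assumptions chosen to keep the example small and repeatable, not the settings used in this notebook.

```python
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical mini-corpus, for illustration only.
toy_docs = [
    "opioid addiction treatment",
    "opioid overdose prevention",
    "heart failure treatment",
    "cardiac heart disease",
]

toy_vectorizer = TfidfVectorizer(stop_words='english')
toy_tfidf = toy_vectorizer.fit_transform(toy_docs)

toy_nmf = NMF(n_components=2, init='nndsvd', random_state=0, max_iter=500)
toy_W = toy_nmf.fit_transform(toy_tfidf)  # documents x topics
toy_H = toy_nmf.components_               # topics x vocabulary terms
```

Row `d` of `toy_W` holds the topic weights of document `d`, and row `t` of `toy_H` holds the term weights of topic `t`. Both matrices are non-negative by construction.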

In [28]:
"""View the list of topics (10 top words per topic)"""

for topic_idx, topic in enumerate(nmf_H):
    print("Topic %d:" % (topic_idx))
    print('----------------------------')
    print(" ".join([vectorizer_feature_names[i]
                for i in topic.argsort()[:-10 - 1:-1]]))
    print('----------------------------')

Topic 0:
----------------------------
genetic genes variants genome variation genetics identify traits association genomic
----------------------------
Topic 1:
----------------------------
center research administrative support director administration leadership resources activities management
----------------------------
Topic 2:
----------------------------
substance use risk adolescent abuse youth adolescents adolescence sexual behaviors
----------------------------
Topic 3:
----------------------------
subproject institution nih center isfor andinvestigator crisp theresources entries necessarily
----------------------------
Topic 4:
----------------------------
hiv infected aids infection antiretroviral art prevention transmission msm tat
----------------------------
Topic 5:
----------------------------
cancer pancreatic cancers nci prevention colon colorectal survivors ovarian members
----------------------------
Topic 6:
----------------------------
core projects investigators 

decision making decisions choice reward information choices make policy value
----------------------------
Topic 59:
----------------------------
exposure effects exposures environmental prenatal exposed arsenic bpa air radiation
----------------------------
Topic 60:
----------------------------
prostate men pca cancer androgen ar prostatic psa african bph
----------------------------
Topic 61:
----------------------------
memory cognitive schizophrenia deficits pfc prefrontal task cortex cognition hippocampal
----------------------------
Topic 62:
----------------------------
kidney renal ckd chronic cric injury aki hypertension esrd transplant
----------------------------
Topic 63:
----------------------------
cardiac heart failure myocardial hf ventricular hypertrophy remodeling mi infarction
----------------------------
Topic 64:
----------------------------
training trainees research faculty postdoctoral fellows scientists doctoral basic medicine
----------------------------
Topi
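
The slice `topic.argsort()[:-10 - 1:-1]` used to print the top words walks the ascending argsort result backwards and stops after 10 positions, so it returns the indices of the 10 largest weights in descending order. A small sketch with a made-up weight vector and `n_top = 3` (both illustrative):

```python
import numpy as np

toy_topic = np.array([0.1, 0.7, 0.0, 0.4, 0.9])  # made-up term weights for one topic
n_top = 3

# Same slicing pattern as in the notebook, with 3 in place of 10.
top_idx = toy_topic.argsort()[:-n_top - 1:-1]

# Equivalent and arguably easier to read: reverse first, then take the first n_top.
same_idx = np.argsort(toy_topic)[::-1][:n_top]
```

Both expressions give `[4, 1, 3]`, the positions of 0.9, 0.7, and 0.4.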

In [29]:
"""View a top document related to a given topic"""

for topic_idx, topic in enumerate(nmf_H):
    print('--------------------')
    print("Topic %d:" % (topic_idx))
    print('--------------------')
    print(" ".join([vectorizer_feature_names[i]
                    for i in topic.argsort()[:-10 - 1:-1]]))
    top_doc_indices = np.argsort(nmf_W[:,topic_idx] )[::-1][0:1]
    for doc_index in top_doc_indices:
        print('--------------------')
        print(filtered_abstracts[doc_index])

--------------------
Topic 0:
--------------------
genetic genes variants genome variation genetics identify traits association genomic
--------------------
The goal of the laboratory is to develop new approaches to the study of the genetic basis of cancer and its outcomes. Previously, the major focus was the analysis of common genetic variation in candidate genes in cancer and its related outcomes, particularly in immunocompromised individuals. Emphasis was on conducting pilot association studies and annotating candidate genes drawn from key pathways in innate immunity and cancer biology, such as telomere stability or nutrient transport (i.e., Vitamin C sodium dependent transport). The laboratory has developed expertise in bio-informatics and advanced genetic analyses with new platforms designed to test dense sets of single nucleotide polymorphisms (SNPs), which are the most common genetic variants in the human genome. Specifically, the laboratory has integrated approaches to identify

--------------------
This subproject is one of many research subprojects utilizing theresources provided by a Center grant funded by NIH/NCRR. The subproject andinvestigator (PI) may have received primary funding from another NIH source,and thus could be represented in other CRISP entries. The institution listed isfor the Center, which is not necessarily the institution for the investigator.Preparation of the present report.
--------------------
Topic 4:
--------------------
hiv infected aids infection antiretroviral art prevention transmission msm tat
--------------------
DESCRIPTION (provided by applicant): HIV dementia (HIV-D) and HIV-associated sensory neuropathy (HIV-SN) are the most common neurological manifestations of advanced HIV infection. The prevalence of HIV-D and HIV-SN in Sub- Saharan Africa where the majority of HIV cases reside globally is largely unknown. In addition, HIV subtype may have an impact on HIV disease progression, suggesting the possibility that HIV subtyp

cocaine addiction relapse seeking administration self nac reinstatement accumbens craving
--------------------
The most difficult aspect of treating cocaine dependence is the propensity for relapse to cocaine use after a period of abstinence. Cocaine dependent individuals often describe their relapse as being precipitated by cocaine craving which might be triggered by a  priming  dose of cocaine itself. Indeed, studies in laboratory animals have shown that low dose cocaine can trigger  relapse  in cocaine-seeking behavior. In rodents, dopamine (DA) D2 receptors agonists augment the priming effect of cocaine on cocaine-seeking behavior, while DA D1 receptors agonists inhibit this effect. In the current cycle of this Center, we measured with PET both D1 and D2 receptors in cocaine dependent participants and matched controls, and studied the relationship between PET measurements and increased vulnerability to cocaine primed cocaine-taking behavior. Low D1 receptor availability in the vent

--------------------
DESCRIPTION (provided by applicant): My goal for the K25 award is to establish myself as an independent neuroimaging researcher with expertise in brain network analysis and an integral member of multidisciplinary research teams devoted to addressing diseases of the brain. Attaining these objectives will require focused didactic training and research guidance. Research We will develop new methodology to improve whole-brain connectivity analyses of normal and abnormal brain function. The launching of the Human Connectome Project by the NIH in 2009 underscores the importance of whole-brain connectivity analyses. Appropriately conducting these analyses is paramount in our understanding normal brain function as well as alterations due to conditions such as aging, dyslexia, and substance abuse. Before we can glean useful information from functional brain network differences in these conditions, methods need to be developed in order to permit 1) assessing several network 

--------------------
DESCRIPTION (provided by applicant):  Project Summary: Research on efficacious adolescent substance abuse interventions over the past two decades has consistently demonstrated heterogeneity of response to treatment. Key challenges for the .field include understanding these various response trajectories and establishing strategies to adapt treatments to enhance their effectiveness for those who are not responsive to standard interventions. Such research is critically needed to inform treatment services providers on optimal care strategies. An important advance in treatment services research efforts is the increasing movement away from a  one-size-fits-all  perspective inherent in standardized interventions and a call for the development of adaptive interventions that incorporate treatment algorithms to aid clinical decision-making. The Sequential Multiple Assignment Randomized Trial (SMART) outlines a process whereby a series of randomizations within an individual s

--------------------
DESCRIPTION (provided by applicant): Mice have triumphed as the in vivo experimental model system of choice in biomedical research. However, there are clear limitations to mouse models. The literature is full of therapeutic approaches that worked in mice but failed in humans. Further, mice often provide less than optimal mimics of the human diseases being modeled, atherosclerosis being a prime example. Compared to humans, mice are very resistant to atherosclerosis. In part, this is thought to be due to fundamental differences in lipoprotein metabolism between species. Available mouse models are thus based on dietary or genetic perturbations in lipoprotein metabolism; all have drawbacks. While mouse models have provided important insights into pathogenesis, fundamental biological and practical problems with available models have hindered translation of principles derived from mouse studies to human disease. As with other difficult-to-model diseases, this has been pr

data statistical analysis management collection database methods design study analyses
--------------------
C. Data Management and Analysis Core1. ObjectiveThe objective of the Data Management and Analysis Core is to continue to provide data managementservices and statistical expertise to Program Project investigators in a wide range of data acquisition andanalysis activities. Integral to the goals of each project is the management and analysis of Core observational,interview, questionnaire, and behavioral data, as well as management and analysis of data from the individualprojects. Data management activities draw on the considerable resources of the Data Management andAnalysis Center (DMAC) at the Frank Porter Graham Child Development Center (FPG) at UNC and TheMethodology Center (TMC) at PSD. Specific Aims of the Data Management and Analysis Core are to:1) Implement the planned missing design in conjunction with the Executive Committee.2) Develop and maintain data management strategi

inflammatory inflammation pro macrophages anti immune macrophage cytokines sepsis atherosclerosis
--------------------
DESCRIPTION (provided by applicant): Atherosclerosis is the leading cause of death in the United States and is now widely recognized as a chronic inflammatory disease occurring within the artery wall. The macrophage is the key innate immune cell type implicated in atherogenic inflammation, a pathophysiology largely driven by alterations in transcription. Work over the past two decades has identified pathways initiated through cell surface toll-like and cytokine receptors which culminate in the activation of NF-kB, AP1, and Stat proteins and subsequent transcriptional activation of pro-inflammatory genes encoding cytokines, chemokines, and matrix remodeling enzymes. The means and mechanisms by which pro-inflammatory genes are attenuated, however, remain poorly understood. Peroxisome proliferator activated receptors (PPARs) are lipid-sensing nuclear receptors and represe

--------------------
Early onset intellectual disabilities (ID) affect 1-3% of the population and results in a major burden to families and society, with lifetime costs estimated to be $1-2 million. There are many causes, some of which are preventable such as malnutrition and fetal alcohol syndrome. However, the most severe forms of ID have genetic causes, and approximately 25% of all cases have been mapped to chromosomal deletions, rearrangements, and mutations. X-linked intellectual disabilities (XLIDs) account for approximately 10-12% of male ID cases. Identification of the responsible genes holds out the promise that having an inventory of potentially defective genes, and understanding the molecular defects will lead to better tests and treatments to help patients and their families. Se

tumor tumors metastasis growth progression metastatic microenvironment melanoma radiation anti
--------------------
DESCRIPTION (provided by applicant): The purpose of this research is to determine how radiation therapy alters the various components of solid tumors that may affect the immune system's ability to mount effective anti-tumor responses. The goal of these studies is to gain a better understanding of the basic mechanisms involved in radiation induced changes in the tumor microenvironment to allow more effective treatment strategies combining radiation and immunotherapy. The differences in these two therapy modalities make it highly likely that their combined use could have synergistic effects on tumor destruction. Radiation is highly effective at killing large numbers of tumor cells and controlling primary disease, however, it is not easily used for the treatment of dissem- inated metastatic disease. In contrast immunotherapy, because of its specificity and systemic nature, h

--------------------
DESCRIPTION (provided by applicant): Candidate: Dr. Ariadne Letra is a Postdoctoral Associate fellow at the University of Pittsburgh School of Dental Medicine. Her main research interests are genetics of oral-facial clefts. Her short-term goals are to improve knowledge of the multifactorial etiology of this condition and perform genetic studies of complex oral traits. The candidate also wishes to gain expertise in advanced statistical genetics and molecular biology to develop biological functional assays. Her long-term goal is to become an independent scientist and occupy a tenure track position at an Institution. This proposal reflects the plan of the candidate to engage in mentored-research meanwhile building a strong foundation for career development. The research plan is innovative as it proposes to study the family of matrix metalloproteinase genes which have been shown to have important roles during craniofacial development and implicated in cleft lip/palate.

--------------------
The number of people directly affected by diabetes continues to grow. In Washington State, 350,000 people are living with diabetes; another 150,000 are living with diabetes but have not been diagnosed. Some people in our state are affected by diabetes more than others. People with lower incomes and education levels are at greater risk to develop diabetes. These same individuals often do not have access to the diabetes education and health care needed to manage diabetes and prevent the complications often caused by uncontrolled diabetes. Research shows that treatments available today dramatically improve the management of diabetes and prevent long-term complications. The WSU Diabetes Awareness and Detection project has shown improvements in participants' knowledge and confidence to manage their diabetes and improvements in diabetes control as measured by A1c and blood pressure. The WSU Diabetes Detection and Prevention project will continue to offer diabetes awarene

muscle skeletal muscles atrophy smooth strength force function satellite mass
--------------------
DESCRIPTION (provided by applicant): The loss of skeletal muscle mass is of clinical importance because it is associated with increased morbidity and mortality, as well as a marked deterioration in the quality of life. A broad patient population is affected by significant losses in muscle mass including those afflicted by various systemic diseases (cancer, sepsis, HIV- AIDS), chronic physical inactivity as a result of long term bed rest, rheumatoid arthritis and limb immobilization, and sarcopenia, the age associated loss in muscle mass and strength. Satellite cells are currently an attractive therapeutic target given their stem cell characteristics and essential role in post-natal muscle growth and regeneration. What remains controversial is the necessity of satellite cells in other aspects of muscle plasticity such as hypertrophy, re-growth following atrophy and muscle maintenance with 

--------------------
NON-TECHNICAL SUMMARY Remarkable levels of sophistication have been reached in linking properties of a given material to its microstructure, crystal structure and electronic structure. A substantially bigger challenge, though, is predicting the dynamic evolution of a material taken out of equilibrium and determining what external stimuli must be imposed to shepherd the material into a desired end state. The desirable properties from a particular chemistry are usually manifested in metastable crystal structures and microstructures rather than in the true equilibrium state of that chemistry. In many applications it is necessary to know how a material in a particular state will evolve over time either because it is metastable or unstable, such as in high temperature applications, or due to changing boundary conditions, as in electrochemical energy storage applications.This award supports computational research and education to develop highly automated statistical mech

--------------------
DESCRIPTION (provided by applicant): The overall goal of the proposed epidemiologic study is to examine the causes and consequences of impaired decision-making in old age. Decision-making refers to the ability to generate and process multiple competing alternatives and choose a favorable behavior. Virtually all behaviors result from some decision-making process, and efficient decision-making is thought to be critical for maintaining independence, health and well-being in modern society. Emerging data suggest that older persons, even some without dementia, exhibit impaired decision-making, and impaired decision-making may be a sign of preclinical Alzheimer's disease (AD). However, surprisingly few studies have rigorously examined decision-making in older persons and longitudinal data are sorely lacking. The proposed study will quantify the rate of change in financial, healthcare, and socioemotional decision-making in a large cohort of community-based older persons w

cardiac heart failure myocardial hf ventricular hypertrophy remodeling mi infarction
--------------------
DESCRIPTION (provided by applicant):  Imbalances in neurohumoral control, especially those leading to excessive sympathetic efferent neuronal activation, are associated with adverse short- and long-term alterations in cardiac function - including cardiac arrhythmias and pump failure. As a corollary, stabilization of such imbalances within select components of the cardiac neuronal hierarchy can reduce the arrhythmic substrate, maintain myocyte viability and prolong survival. Thus, the primary objective for this competitive renewal is to first determine the role of interdependent interactions within and between central and peripheral components of the cardiac neuronal hierarchy and secondly how such linkages remodel in response to acute and chronic cardiac stress (e.g. myocardial ischemia/infarction). As the intrinsic cardiac nervous system represents the final common integrator of c

--------------------
--------------------
Topic 68:
--------------------
subproject sources ncrr nih grant subprojectand resourcesprovided likelyrepresents center staff
--------------------
This subproject is one of many research subprojects utilizing the resourcesprovided by a Center grant funded by NIH/NCRR. Primary support for the subprojectand the subproject's principal investigator may have been provided by other sources,including other NIH sources.  The Total Cost listed for the subproject likelyrepresents the estimated amount of Center infrastructure utilized by the subproject,not direct funding provided by the NCRR grant to the subproject or subproject staff.Preparation of the present report.
--------------------
Topic 69:
--------------------
stroke motor rehabilitation recovery ischemic gait disability walking stimulation limb
--------------------
DESCRIPTION (provided by applicant):  Stroke is the leading cause of disability in the United States. It is estimated that 700,000

host infection bacterial infections bacteria biofilm virulence pathogen pathogens tb
--------------------
DESCRIPTION (provided by applicant): This application seeks to address the impact of bacterial colonization and persistence in chronic wounds. The formation of biofilms has clearly been linked to chronic and persistent bacterial infections. This considerably delays and complicates wound healing. Unlike acute bacterial infections, which are often cleared by the host, biofilm-related chronic infections are not easily resolved even with high dose antibiotics and intact immunity. The bacterial pathogens Pseudomonas aeruginosa and Staphylococcus aureus, which are the focus of this application, cause an array of biofilm-related clinical diseases including persistent airway infections, burn wound infections, endocarditis, and surgical site infections. Unresolved infected wounds also contribute to nosocomial persistence and the spread of bacteria in health care settings. The abundance and 

--------------------
climate species change changes forest ecosystem ecosystems ecological land ocean
--------------------
Understanding the structural and functional relationships and interactions between past and present climate, the geomorphic setting, natural disturbance regimes, and the anthropogenic history for impacting grassland, shrubland and desert ecosystems is necessary to determine ecosystem stability , resilience, or change in response to future climate changes.  To sustain or restore both ecosystem function and ecosystem services we need to better understand the interactions between the mix of human and natural disturbances, the critical thresholds associated with each, and how they will be affected by climate change.    A major complication of this goal is the increasing presence of invasive species and our limited understanding of what controls community stability.  Effective restoration following disturbance also requires better understanding of why the observed chang

network networks wireless ctn users internet security infrastructure node community
--------------------
Part 1.The optical network of the future will have orders of magnitude increase in data rates, due at least partially to the increase in big-data transactions. These create the need for fast scheduling of network resources and agile network adaptation to most efficiently move the data across the network.  This project proposes to investigate a cognitive network management and control system, which 'senses' current network conditions and uses this information to satisfy overall performance goals. This project will be the first comprehensive research on cognitive optical network management and control. The goal is for agile automated adaptation to replace current slow, manually-driven management and control practices. The fruits of this research will have implications for next generation wireless networks and power grid systems and for fast detection of extreme events that can signifi

--------------------
The administrative shell has an organizational rather than a scientific focus. While the goals of the individual Cores are to serve the user base, the goal of the administrative shell is to serve the individual cores, overarching needs and the overall Research Core Center mission. The charge to the administrative shell is therefore to:  Specific Aim 1: Integrate and supervise the Research Core Center activities Specific Aim 2: Facilitate the operation of the individual cores by resolving issues of space and facilities Specific Aim 3: Assume administrative tasks for all cores such as personnel management and financial operations Specific Aim 4: Carry out or facilitate contacts with University of Michigan administration and granting agencies Specific Aim 5: Coordinate and facilitate the submission of progress reports and renewal grant applications Specific Aim 6: Track user group research and publications associated with their use of the Core Specific Aim 7: Coordina

women risk pregnancy study maternal men factors birth reproductive hpv
--------------------
DESCRIPTION (provided by applicant): More than 1 million women of childbearing age in the U.S. report disabilities or needing assistance with activities of daily living, primarily because of chronic physical impairments that cause mobility difficulties. Anecdotal reports suggest that growing numbers of women with physical disabilities are choosing to become pregnant and bear children. Nonetheless, little information is available about the prevalence of women in the U.S. with physical disabilities who become pregnant and about their obstetrical experiences. The overall goal of this mixed-methods study is to provide systematic albeit exploratory evidence about pregnancy and childbearing experiences among U.S. women with chronic physical disabilities, with three specific aims: 1. To analyze four data sources, each of which contains information on different sets of women - the National Health Interv

nerve regeneration injury glaucoma peripheral optic axons axon axonal nerves
--------------------
Background:  Traumatic nerve injuries often require surgical repair with an autologous nerve graft, increasing peri-operative risks and adding morbidity at the harvesting site.  Furthermore, donor sites are often less accessible in war wounds and, even with autologous nerve grafts, functional recovery is often imperfect.  Multiple strategies are being developed as a replacement for the sural nerve grafts, but none of the currently FDA-approved products in the market is effective as a replacement for autologous sural nerve grafts.  Recently, we have demonstrated that biodegradable fibers, when placed longitudinally in nerve guides, improve peripheral nerve regeneration by providing contact guidance to regenerating axons and their support cells, Schwann cells.  In addition, neurotrophic factors delivered through nanofiber in situ further enhance regenerative outcomes.  However, trophic facto

--------------------
DESCRIPTION (provided by applicant): Parkinson disease (PD) is characterized by cardinal symptoms of bradykinesia, rigidity, and resting tremor considered reflective of dopaminergic (DA) neuronal loss in the nigrostriatal tract. These motor symptoms can be accompanied by impairments in cognition, although there is heterogeneity in the cognitive deficits associated with PD. While DA depletion has been correlated with the severity of motor impairment in PD, prior studies investigating the effects of DA medication on cognitive tasks have been inconsistent and the role of the basal ganglia in cognition is controversial. Few studies have evaluated patients in both on and off medication states, and there are inconsistent results in the studies which have been conducted. Thus, the neuroanatomical and neurochemical correlates of cognitive dysfunction in PD are uncertain. The specific hypothesis behind the research is that only some PD disordered behaviors are mediated by D
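
The cell that prints the top document for each topic picks, per topic, the document with the largest weight in that topic's column of `nmf_W`. When only the single best document is needed, `argmax` over the document axis gives the same pick without a full sort (ties aside: `argmax` returns the first maximum, the reversed `argsort` the last). A sketch on invented values:

```python
import numpy as np

# Made-up document-topic weights: 3 documents x 2 topics.
toy_W = np.array([
    [0.9, 0.0],
    [0.1, 0.3],
    [0.2, 0.8],
])
toy_docs = ["doc A", "doc B", "doc C"]

# For each topic (column), the row index of the highest-weighted document.
top_doc_per_topic = toy_W.argmax(axis=0)
top_docs = [toy_docs[i] for i in top_doc_per_topic]
```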

The second topic model (100 topics, fit on the filtered abstracts) yields 4 related topics: topics 2, 7, 8, and 51.

In [30]:
"""Get abstracts with 4 related topics."""

topics_weights = []
for index,i in enumerate(nmf_W): # for every document
    topics_weights.append([index, i[2], i[7], i[8], i[51]]) # get topic weights for 4 related topics

topics_weights_dataframe = pd.DataFrame(topics_weights)

"""Sum the weights for all 4 related topics per document."""

topics_weights_dataframe['sum'] = topics_weights_dataframe[1] + topics_weights_dataframe[2] + topics_weights_dataframe[3] + topics_weights_dataframe[4]

In [37]:
"""Keep the abstracts whose summed weight across the 4 topics is at least 0.01."""

filtered_by_threshold = topics_weights_dataframe[topics_weights_dataframe['sum'] >= 0.01]
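
The sum-then-threshold step can be checked on a few hand-made rows. In this sketch the weights are invented, columns 1 to 4 stand in for the four topic-weight columns, and the 0.01 cut-off is the one used above:

```python
import pandas as pd

toy_weights = pd.DataFrame({
    0: [0, 1, 2],                # document position
    1: [0.000, 0.004, 0.020],
    2: [0.001, 0.003, 0.000],
    3: [0.000, 0.002, 0.000],
    4: [0.002, 0.003, 0.000],
})

# Row-wise sum over the four topic columns, then keep rows at or above the cut-off.
toy_weights['sum'] = toy_weights[[1, 2, 3, 4]].sum(axis=1)
toy_kept = toy_weights[toy_weights['sum'] >= 0.01]
```

Document 0 (sum 0.003) is dropped; documents 1 and 2 (sums of about 0.012 and 0.020) are kept. A document can pass either through one strong topic or through several weak ones.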

In [None]:
"""Merge with the original dataframe. Make sure that the 'index' column is in both dataframes to merge on."""

filtered_abstracts = pd.DataFrame(filtered_abstracts)
filtered_abstracts = filtered_abstracts.reset_index()

filtered_by_threshold = filtered_by_threshold.rename(columns={0:'index'})
filtered_by_threshold_finalized = filtered_by_threshold.merge(filtered_abstracts,on='index')

filtered_by_threshold_finalized = filtered_by_threshold_finalized.rename(columns={0:' ABSTRACT'})
filtered_by_threshold_updated = filtered_by_threshold_finalized.merge(abstracts, on=' ABSTRACT')

filtered_by_threshold_updated = filtered_by_threshold_updated[['PROJECT_ID',' ABSTRACT']]

filtered_by_threshold_updated = filtered_by_threshold_updated.drop_duplicates()
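
Merging back on the abstract text means that every selected row matches every project sharing that exact text, which is why `drop_duplicates()` is needed at the end. A sketch of the effect with invented project IDs and texts:

```python
import pandas as pd

# Two hypothetical projects share the same abstract text.
toy_abstracts = pd.DataFrame({
    'PROJECT_ID': [101, 102, 103],
    ' ABSTRACT': ['same text', 'same text', 'other text'],
})
# Both copies of that text were selected by the topic filter.
toy_selected = pd.DataFrame({' ABSTRACT': ['same text', 'same text']})

toy_merged = toy_selected.merge(toy_abstracts, on=' ABSTRACT')  # 2 x 2 = 4 rows
toy_result = toy_merged[['PROJECT_ID', ' ABSTRACT']].drop_duplicates()
```

The merge yields four rows (each selected copy pairs with both projects), and `drop_duplicates()` reduces them to one row per project.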

In [61]:
"""Export results to .CSV"""

filtered_by_threshold_updated.to_csv('Filtered_Results_Abstracts.csv')
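
By default `to_csv` also writes the DataFrame's row index as an unnamed first column. Whether that column is wanted in the export is a judgment call; the sketch below, on a one-row made-up frame written to in-memory buffers, only shows the difference that `index=False` makes.

```python
import io
import pandas as pd

toy = pd.DataFrame({'PROJECT_ID': [101], ' ABSTRACT': ['text']})

buf_default = io.StringIO()
toy.to_csv(buf_default)                  # row index written as first column

buf_no_index = io.StringIO()
toy.to_csv(buf_no_index, index=False)    # row index omitted

header_default = buf_default.getvalue().splitlines()[0]
header_no_index = buf_no_index.getvalue().splitlines()[0]
```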