<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#NCSES-class---FedRePORTER-and-IPEDS-data" data-toc-modified-id="NCSES-class---FedRePORTER-and-IPEDS-data-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>NCSES class - FedRePORTER and IPEDS data</a></span><ul class="toc-item"><li><span><a href="#Introduction" data-toc-modified-id="Introduction-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Introduction</a></span></li><li><span><a href="#Python-Setup" data-toc-modified-id="Python-Setup-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Python Setup</a></span></li></ul></li><li><span><a href="#Load-the-data" data-toc-modified-id="Load-the-data-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Load the data</a></span><ul class="toc-item"><li><span><a href="#Federal-RePORTER---Projects-(https://federalreporter.nih.gov/FileDownload)" data-toc-modified-id="Federal-RePORTER---Projects-(https://federalreporter.nih.gov/FileDownload)-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Federal RePORTER - Projects (<a href="https://federalreporter.nih.gov/FileDownload" target="_blank">https://federalreporter.nih.gov/FileDownload</a>)</a></span></li><li><span><a href="#Federal-RePORTER---Abstracts-(https://federalreporter.nih.gov/FileDownload)" data-toc-modified-id="Federal-RePORTER---Abstracts-(https://federalreporter.nih.gov/FileDownload)-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>Federal RePORTER - Abstracts (<a href="https://federalreporter.nih.gov/FileDownload" target="_blank">https://federalreporter.nih.gov/FileDownload</a>)</a></span></li></ul></li><li><span><a href="#Filter-the-data" data-toc-modified-id="Filter-the-data-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Filter the data</a></span></li><li><span><a href="#Text-analysis-(Topic-modeling)" data-toc-modified-id="Text-analysis-(Topic-modeling)-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Text analysis (Topic modeling)</a></span><ul class="toc-item"><li><span><a 
href="#NMF-method---Non-negative-matrix-factorization" data-toc-modified-id="NMF-method---Non-negative-matrix-factorization-4.1"><span class="toc-item-num">4.1&nbsp;&nbsp;</span>NMF method - Non-negative matrix factorization</a></span></li></ul></li></ul></div>

## NCSES class - FedRePORTER and IPEDS data

### Introduction

**Federal RePORTER** (https://federalreporter.nih.gov) is a collaborative effort led by STAR METRICS® to create a searchable database of scientific awards from federal agencies. Awards can be searched across agencies or fiscal years, by the award's project leader, or by a text search of a project's title, terms, or abstracts.

### Python Setup

In [1]:
# Data manipulation
import pandas as pd

# Reading in files
import glob

# Text analysis (topic modeling)
import numpy as np
import sklearn
from sklearn.decomposition import NMF, LatentDirichletAllocation
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
import string

## Load the data

### Federal RePORTER - Abstracts (https://federalreporter.nih.gov/FileDownload)

In [2]:
abstracts_files = glob.glob('FedRePORTER_PRJABS_C_FY20*.csv')
print(abstracts_files)

['FedRePORTER_PRJABS_C_FY2009.csv', 'FedRePORTER_PRJABS_C_FY2008.csv', 'FedRePORTER_PRJABS_C_FY2018.csv', 'FedRePORTER_PRJABS_C_FY2017.csv', 'FedRePORTER_PRJABS_C_FY2003.csv', 'FedRePORTER_PRJABS_C_FY2002.csv', 'FedRePORTER_PRJABS_C_FY2016.csv', 'FedRePORTER_PRJABS_C_FY2000.csv', 'FedRePORTER_PRJABS_C_FY2014.csv', 'FedRePORTER_PRJABS_C_FY2015.csv', 'FedRePORTER_PRJABS_C_FY2001.csv', 'FedRePORTER_PRJABS_C_FY2005.csv', 'FedRePORTER_PRJABS_C_FY2011.csv', 'FedRePORTER_PRJABS_C_FY2010.csv', 'FedRePORTER_PRJABS_C_FY2004.csv', 'FedRePORTER_PRJABS_C_FY2012.csv', 'FedRePORTER_PRJABS_C_FY2006.csv', 'FedRePORTER_PRJABS_C_FY2007.csv', 'FedRePORTER_PRJABS_C_FY2013.csv']


In [3]:
"""Read them in, concatenate and convert to a dataframe."""

list_data = []
for filename in abstracts_files:
    data = pd.read_csv(filename)
    list_data.append(data)
    
abstracts = pd.concat(list_data)
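The read-and-concatenate pattern above can be sketched on throwaway files. This is a minimal illustration, assuming hypothetical file names (`demo_FY2000.csv`, `demo_FY2001.csv`) and made-up rows. It also shows two optional variations that the cell above does not use: sorting the `glob` result so the file order is reproducible, and `ignore_index=True` so the combined frame gets unique row labels.

```python
import glob
import os
import tempfile

import pandas as pd

# Hypothetical stand-ins for the yearly abstract files (made-up content)
tmp_dir = tempfile.mkdtemp()
for year, rows in [(2001, ["b1", "b2"]), (2000, ["a1"])]:
    demo = pd.DataFrame({"PROJECT_ID": range(len(rows)), " ABSTRACT": rows})
    demo.to_csv(os.path.join(tmp_dir, f"demo_FY{year}.csv"), index=False)

# glob returns files in arbitrary order; sorting makes the result reproducible
demo_files = sorted(glob.glob(os.path.join(tmp_dir, "demo_FY20*.csv")))

# ignore_index=True renumbers rows 0..n-1 instead of repeating each file's index
demo_abstracts = pd.concat(
    [pd.read_csv(f) for f in demo_files], ignore_index=True
)
print(demo_abstracts)
```

Without `ignore_index=True`, each file contributes its own 0-based index, so row labels repeat across files until the frame is re-indexed.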

In [4]:
abstracts = abstracts.dropna()

In [5]:
merged_abstracts_list = abstracts[' ABSTRACT'].values.tolist()

### NMF method - Non-negative matrix factorization

NMF is another model used for topic extraction. While the LDA model uses raw counts of unique words per document, the NMF model uses a normalized representation of those counts (the TF-IDF representation).

TF stands for term frequency, and TF-IDF is term frequency times inverse document frequency. In other words, we look not only at how often a word appears in a given document, but also at how distinctive that word is across the whole collection of documents (the corpus). For example, words like "often" or "use" are encountered frequently, but they are less informative (more semantically vacuous) for discerning the topic of a document, because they appear across most documents in a corpus. On the other hand, words that appear in only a few documents are likely specific to those documents and therefore constitute a basis for a topic.

More here: 

- https://scikit-learn.org/stable/modules/decomposition.html#non-negative-matrix-factorization-nmf-or-nnmf
- https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html
- https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfTransformer.html#sklearn.feature_extraction.text.TfidfTransformer
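To make the TF-IDF weighting concrete, here is a minimal sketch on a made-up three-document corpus (the documents and the `toy_` names are illustrative assumptions, not part of the Federal RePORTER data). The word "grant" appears in every document, while "opioid" appears in only one, so "opioid" should receive the larger inverse-document-frequency weight.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus (made up): "grant" is in every document, "opioid" in only one
toy_docs = [
    "grant opioid addiction",
    "grant bone fracture",
    "grant hearing loss",
]

toy_vectorizer = TfidfVectorizer(stop_words="english")
toy_tfidf = toy_vectorizer.fit_transform(toy_docs)

# vocabulary_ maps each term to its column; idf_ holds one weight per column
col_grant = toy_vectorizer.vocabulary_["grant"]
col_opioid = toy_vectorizer.vocabulary_["opioid"]
idf_grant = toy_vectorizer.idf_[col_grant]
idf_opioid = toy_vectorizer.idf_[col_opioid]

print("idf(grant) =", idf_grant)
print("idf(opioid) =", idf_opioid)

# Within the first document both words occur once, so the rarer word
# ends up with the larger TF-IDF weight
print("tfidf(grant, doc 0) =", toy_tfidf[0, col_grant])
print("tfidf(opioid, doc 0) =", toy_tfidf[0, col_opioid])
```

With scikit-learn's default smoothing, a term that occurs in every document gets the minimum IDF of 1, and rarer terms get progressively larger weights.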

In [12]:
# Convert a collection of raw documents to a matrix of TF-IDF features
vectorizer = TfidfVectorizer(stop_words='english')
tfidf = vectorizer.fit_transform(filtered_abstracts)

In [13]:
# Get feature names (scikit-learn >= 1.0; older versions use get_feature_names())
vectorizer_feature_names = vectorizer.get_feature_names_out()

In [None]:
# Run the model with 100 topics
nmf = NMF(n_components=100, verbose=2).fit(tfidf)

In [15]:
nmf_100_2 = nmf

In [16]:
%store nmf_100_2

Stored 'nmf_100_2' (NMF)


In [6]:
%store -r nmf_100_2

In [9]:
nmf_W = nmf_100_2.transform(tfidf) 

In [10]:
topics_weights = []
for doc_index, doc_weights in enumerate(nmf_W):  # for every document
    # keep the document position and its weights for topics 25 and 57
    topics_weights.append([doc_index, doc_weights[25], doc_weights[57]])

In [11]:
topics_list_dataframe = pd.DataFrame(topics_weights)
topics_list_dataframe.head()

|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 0 | 0.0 | 0.0 |
| 1 | 1 | 0.0 | 0.0 |
| 2 | 2 | 0.0 | 0.0 |
| 3 | 3 | 0.000108 | 0.0 |
| 4 | 4 | 0.0 | 0.0 |


In [12]:
abstracts = abstracts.reset_index()

In [None]:
topics_list_dataframe = topics_list_dataframe.rename(columns={0:'index'})

In [24]:
topics_list_dataframe.index.nunique()

1032895

In [25]:
abstracts.index.nunique()

1032895

In [13]:
concat = pd.concat([abstracts,topics_list_dataframe],axis=1)

In [14]:
concat.head()

|   | index | PROJECT_ID | ABSTRACT | 0 | 1 | 2 |
|---|---|---|---|---|---|---|
| 0 | 0 | 103915 | EDUCATION IN ACTION NASA Exchange City Learnin... | 0 | 0.0 | 0.0 |
| 1 | 1 | 103916 | Educational Advancement Alliance Inc Math Scie... | 1 | 0.0 | 0.0 |
| 2 | 2 | 103917 | CUBRC, Inc FY09 Earmark Entitled, to continue... | 2 | 0.0 | 0.0 |
| 3 | 3 | 103918 | University Corporation for Atmospheric Researc... | 3 | 0.000108 | 0.0 |
| 4 | 4 | 103919 | Proposal Number: 0850898PI: John Doyle ... | 4 | 0.0 | 0.0 |
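`pd.concat(..., axis=1)` matches rows by index label, not by position, which is presumably why `abstracts` is re-indexed before the two frames are joined. The sketch below uses made-up frames (`left`, `right`) to show the difference. Note that it calls `reset_index(drop=True)`, whereas the notebook keeps the old index as an `index` column.

```python
import pandas as pd

# Index with gaps, as it would look after rows were dropped with dropna()
left = pd.DataFrame({"ABSTRACT": ["a", "b", "c"]}, index=[0, 2, 5])
# Topic weights built from a plain list get a positional index 0..2
right = pd.DataFrame({"weight": [0.1, 0.2, 0.3]})

# Labels 0, 1, 2, 5 are unioned, so rows are mismatched and padded with NaN
misaligned = pd.concat([left, right], axis=1)

# After resetting the index, both frames share labels 0..2 and line up row by row
aligned = pd.concat([left.reset_index(drop=True), right], axis=1)

print(misaligned)
print(aligned)
```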


In [15]:
filtered = concat[(concat[1] != 0) | (concat[2] != 0)]

In [16]:
len(filtered)

416312

In [26]:
filtered.to_csv('filtered_results.csv')

In [8]:
filtered = pd.read_csv('filtered_results.csv')

In [9]:
filtered_abstracts = filtered[' ABSTRACT'].values.tolist()

In [10]:
len(filtered_abstracts)

416312

In [30]:
pd.set_option('display.max_colwidth', None)  # None shows full text (-1 is deprecated since pandas 1.0)

In [19]:
opioid_terms = ['opioid','opiate','morphine','heroin',
                'percocet','vicoprofen','dextromethorphan','loperamide',
                'naloxegol','hydrocodone','oxycodone','fentanyl',
                'naloxone','analgesics','carfentanil','benzodiazepines',
                'narcotic','opium','cocaine','codeine',
                'pain relief','cancer pain','anesthesia','chronic pain',
                'nerve pain','fibromyalgia','overdose','addiction',
                'withdrawal','dependence','recreational use','euphoria',
                'tolerance','controlled substance','over-prescription',
                'peripheral nervous system','psychoactive','agonist',
                'antagonist','blood-brain']

In [21]:
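results = []
for term in opioid_terms:
    for abstract in filtered[' ABSTRACT']:
        # keep only (term, abstract) pairs where the abstract mentions the term
        if term in abstract.lower():
            results.append([term, abstract])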

In [None]:
len(results)

In [24]:
pd.set_option('display.max_colwidth', None)  # None shows full text (-1 is deprecated since pandas 1.0)

In [None]:
results = pd.DataFrame(results)
results
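A search for abstracts that mention any of the terms can also be written as a vectorized, case-insensitive substring match. This is only a sketch on made-up abstracts, using a shortened, hypothetical term list (`demo_terms`); it is not the approach used in the cells above.

```python
import re

import pandas as pd

# Made-up abstracts for illustration only
demo = pd.DataFrame({" ABSTRACT": [
    "Fentanyl overdose deaths are rising.",
    "Bone fracture healing in mice.",
    "Chronic pain and opioid tolerance.",
]})
demo_terms = ["opioid", "fentanyl", "chronic pain"]

# One regex that matches any term; re.escape guards against regex metacharacters
pattern = "|".join(re.escape(term) for term in demo_terms)

# Boolean mask: True where the abstract contains at least one term
mask = demo[" ABSTRACT"].str.contains(pattern, case=False, regex=True)
matches = demo[mask]
print(matches)
```

Substring matching is deliberately loose: a term such as "tolerance" or "dependence" will also match abstracts that use the word in an unrelated sense.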

In [23]:
filtered = topics_list_dataframe[(topics_list_dataframe[1] != 0) | (topics_list_dataframe[2] != 0)]

In [31]:
filtered = filtered.rename(columns={0:'index'})

In [37]:
filtered = filtered.reset_index()

In [None]:
abstracts = abstracts.reset_index()

In [39]:
topics_abstracts = abstracts.merge(filtered,on='index')

In [17]:
nmf_W = nmf_100_2.transform(tfidf) # document-topic matrix (documents x topics)
nmf_H = nmf_100_2.components_ # topic-term matrix (topics x words)

violation: 1.0
violation: 0.5176337571181686
violation: 0.050164562381222834
violation: 0.011162182993669261
violation: 0.0025927295912233244
violation: 0.0006365453148000184
violation: 0.00018226397383737058
violation: 4.611443097461428e-05
Converged at iteration 8
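The two matrices can be inspected on a small example. The sketch below factorizes a random non-negative matrix (a stand-in for the TF-IDF matrix; the sizes, `init`, and `random_state` values are arbitrary assumptions) to show how the shapes of `W` and `H` relate to the numbers of documents, topics, and terms.

```python
import numpy as np
from sklearn.decomposition import NMF

# Random non-negative stand-in for a TF-IDF matrix: 6 "documents" x 8 "terms"
rng = np.random.RandomState(0)
X = rng.rand(6, 8)

toy_nmf = NMF(n_components=2, init="nndsvda", random_state=0, max_iter=500)
W = toy_nmf.fit_transform(X)  # documents x topics: weight of each topic in each document
H = toy_nmf.components_       # topics x terms: weight of each term in each topic

print(W.shape, H.shape)

# W @ H approximates X; both factors are constrained to be non-negative
print(np.linalg.norm(X - W @ H))
```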


In [20]:
# View the list of topics (10 top words per topic)

for topic_idx, topic in enumerate(nmf_H):
    print("Topic %d:" % (topic_idx))
    print('----------------------------')
    print(" ".join([vectorizer_feature_names[i]
                for i in topic.argsort()[:-10 - 1:-1]]))
    print('----------------------------')

Topic 0:
----------------------------
genetic genes variants genome variation identify genetics traits association genomic
----------------------------
Topic 1:
----------------------------
research investigators translational support development university biomedical researchers new career
----------------------------
Topic 2:
----------------------------
intervention interventions adherence randomized trial group based efficacy months participants
----------------------------
Topic 3:
----------------------------
subproject institution nih center isfor andinvestigator crisp theresources entries necessarily
----------------------------
Topic 4:
----------------------------
hiv infected aids infection antiretroviral art msm prevention transmission risk
----------------------------
Topic 5:
----------------------------
cancer nci cancers pancreatic colon prevention colorectal survivors members oncology
----------------------------
Topic 6:
----------------------------
core projects inve

decision making decisions choice information reward choices make policy value
----------------------------
Topic 58:
----------------------------
animal animals models human primate mice laboratory model mouse studies
----------------------------
Topic 59:
----------------------------
stem differentiation renewal hematopoietic hsc cells niche hscs progenitor adult
----------------------------
Topic 60:
----------------------------
prostate men pca cancer androgen ar prostatic progression african bph
----------------------------
Topic 61:
----------------------------
clinical trials trial studies translational phase protocol medical conduct safety
----------------------------
Topic 62:
----------------------------
kidney renal ckd hypertension disease blood pressure chronic cardiovascular vascular
----------------------------
Topic 63:
----------------------------
program programs members year programmatic funding scientific biomedical departments progress
----------------------------
T
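The slice `topic.argsort()[:-10 - 1:-1]` used to pick the top words is compact but easy to misread. A small sketch with made-up weights shows that it returns the indices of the largest values in descending order, and that it is equivalent to reversing the sorted indices and taking the first `n_top` of them.

```python
import numpy as np

# Made-up term weights for a single topic
weights = np.array([0.1, 0.9, 0.3, 0.7, 0.5])
n_top = 3

# argsort is ascending; walking backwards from the end yields the largest first
top_a = weights.argsort()[:-n_top - 1:-1]

# Equivalent, more explicit form: reverse the order, then take the first n_top
top_b = np.argsort(weights)[::-1][:n_top]

print(top_a)  # indices of the three largest weights, largest first
print(top_b)
```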

In [22]:
# View a top document related to a given topic

for topic_idx, topic in enumerate(nmf_H):
    print('--------------------')
    print("Topic %d:" % (topic_idx))
    print('--------------------')
    print(" ".join([vectorizer_feature_names[i]
                    for i in topic.argsort()[:-10 - 1:-1]]))
    top_doc_indices = np.argsort(nmf_W[:,topic_idx] )[::-1][0:1]
    for doc_index in top_doc_indices:
        print('--------------------')
        print(filtered_abstracts[doc_index])

--------------------
Topic 0:
--------------------
genetic genes variants genome variation identify genetics traits association genomic
--------------------
The goal of the laboratory is to develop new approaches to the study of the genetic basis of cancer and its outcomes. Previously, the major focus was the analysis of common genetic variation in candidate genes in cancer and its related outcomes, particularly in immunocompromised individuals. Emphasis was on conducting pilot association studies and annotating candidate genes drawn from key pathways in innate immunity and cancer biology, such as telomere stability or nutrient transport (i.e., Vitamin C sodium dependent transport). The laboratory has developed expertise in bio-informatics and advanced genetic analyses with new platforms designed to test dense sets of single nucleotide polymorphisms (SNPs), which are the most common genetic variants in the human genome. Specifically, the laboratory has integrated approaches to identify

hiv infected aids infection antiretroviral art msm prevention transmission risk
--------------------
DESCRIPTION (provided by applicant): HIV dementia (HIV-D) and HIV-associated sensory neuropathy (HIV-SN) are the most common neurological manifestations of advanced HIV infection. The prevalence of HIV-D and HIV-SN in Sub- Saharan Africa where the majority of HIV cases reside globally is largely unknown. In addition, HIV subtype may have an impact on HIV disease progression, suggesting the possibility that HIV subtypes may differ with respect to their ability to cause neurological disease. The project will assemble a cohort of HIV+ individuals in Uganda: 1) to determine the prevalence of and risk factors associated with HIV-D and HIV-SN among untreated HIV+ individuals with moderate advanced immunosuppression, 2) to determine whether untreated HIV+ individuals decline from baseline in neuropsychological test performance, and peripheral nerve function, and 3) to obtain preliminary data t

--------------------
DESCRIPTION (provided by applicant): Older adults are at increased risk to develop prolonged pain and experience greater pain-related loss of physical and psychosocial function compared to younger cohorts. We propose that changes in endogenous pain modulatory capacity accounts for increased incidence of pain and disability in older adults. That endogenous pain modulation dysfunction is related to persistent pain is supported by a number of studies comparing chronic pain patients with healthy controls using a  pain-inhibition-by-pain  experimental model. Although this research group (and others) has shown age deficiencies using the  pain inhibits pain  model, other human laboratory models that are known to engage pain modulatory systems have not been tested across the lifespan, each potentially involving different mechanisms. The overarching goal of the proposed research is to characterize age-related changes in pain inhibitory and facilitatory function and to inves

--------------------
DESCRIPTION (provided by applicant): My goal for the K25 award is to establish myself as an independent neuroimaging researcher with expertise in brain network analysis and an integral member of multidisciplinary research teams devoted to addressing diseases of the brain. Attaining these objectives will require focused didactic training and research guidance. Research We will develop new methodology to improve whole-brain connectivity analyses of normal and abnormal brain function. The launching of the Human Connectome Project by the NIH in 2009 underscores the importance of whole-brain connectivity analyses. Appropriately conducting these analyses is paramount in our understanding normal brain function as well as alterations due to conditions such as aging, dyslexia, and substance abuse. Before we can glean useful information from functional brain network differences in these conditions, methods need to be developed in order to permit 1) assessing several network 

--------------------
DESCRIPTION (provided by applicant): Evidence for the effectiveness of community-based psychosocial treatments for adolescent substance use disorders is mixed, at best. Researchers' inability to detect strong or replicable treatment effects may result from their focus on the post-treatment effects of a single incident treatment episode. Recent conceptualizations of addiction and treatment suggest that these approaches which compare pre-post treatment effects may obscure some of treatment's most salient effects. For one, treatment effects should be expected to be greatest during treatment or concurrently. Second, from a treatment careers perspective, multiple treatment episodes over time are likely to lead to cumulative effects, which would be greater than those observed for any individual treatment episode. Finally, considerations of client heterogeneity suggest that effects of treatment may be greatest for a subgroup of patients (moderated effects), with such effe

protein proteins membrane structure structural binding rna molecular folding interactions
--------------------
DESCRIPTION (provided by applicant): How proteins fold, that is attain their three-dimensional structure, is a fundamental biological process with important implications for human health. Misfolded proteins are often toxic, as illustrated by the number of neurodegenerative diseases referred to as  protein folding diseases . Molecular chaperones play vital roles in remodeling protein structure -- assisting de novo protein folding, preventing protein aggregation and disassembling protein complexes. Hsp70-based machineries, having J-proteins as obligate components, are amongst the most highly conserved molecular chaperone systems. J-proteins are a very diverse set of proteins, having only the 70 amino acid J-domain in common. All J-proteins share the ability to stimulate the ATPase activity of their partner Hsp70s, allowing them to capture client proteins. But it is their functio

--------------------
The role of the Burnham Cancer Center Administration is to provide the administrative support and management for the Center to effectively pursue its mission of cancer research within the CCSG Guidelines. Because of the extraordinarily close integration of the Burnham Institute and the Cancer Center, the Institute provides all basic general administrative functions for the Cancer Center. This allows the Cancer Center Administration to focus on comprehensive management of the CCSG, and other Cancer Center-specific issues. The Cancer Center Administration also serves as a conduit to the Institute's administrative services, assuring the Center receives the needed quantity and quality of services to optimally support Cancer Center operations and comply with all relevant regulations. Key functions of the Cancer Center Administration include: Administrative support for Cancer Center Planning, management and evaluation Management of the CCSG budget, providing monitoring, 

--------------------
DESCRIPTION (provided by applicant): Summary The purpose of this project is to demonstrate major improvements in care quality through redesign of care delivery in the University of Utah Community Clinics. The Community Clinics (CC) are a fee-for-service 10-site primary- and secondary-care system with about 50 primary care physicians, 350,000 annual visits, and 120,000 active patients. The CC have led primary care delivery reform since 2003 when we began the development and implementation of a new model of care called Care By Design (CBD). The three organizing principles of CBD are: Appropriate Access (AA), Care teams (CT), and Planned care (PC). It is within this existing system design that we will implement additional redesigns. In this project we will implement a comprehensive care management program targeted to patients with multiple chronic conditions. Our delivery redesign will include strategies for effectively managing care transitions and for aggressively s

neurons sensory neural neuronal olfactory circuits synaptic activity motor circuit
--------------------
DESCRIPTION (provided by applicant): Peripheral sensory neurons are responsible for detecting chemical, thermal and mechanical stimuli in the skin. Their ability to recognize and process these touch sensations is influenced by the territories in which their peripheral arbors innervate the skin, how they connect in the central nervous system, and their ability to relay information to downstream circuits. Defects in any of these components can result in a range of debilitating disorders collectively known as peripheral neuropathies. Extensive research has gone into cataloguing the different subtypes of sensory neurons based on the expression of specific molecular markers. However, the mechanisms by which sensory neuron subtypes identify process and transmit different kinds of sensory information are still largely unknown. We have discovered that two subclasses of peripheral sensory neu

--------------------
DESCRIPTION (provided by applicant): Gammaherpesviruses result in lifelong infection associated with malignancies and other chronic disease in immune deficient individuals. The human gammaherpesviruses include Epstein Barr virus and Kaposi's Sarcoma associated virus, which are associated with Burkitt's lymphoma, Hodgkin's lymphoma, post-transplant lymphoproliferative disorder, nasopharyngeal carcinoma, peritoneal effusion lymphoma and Kaposi's sarcoma. Given the strict host specificity of the human gammaherpesviruses, a major challenge is to understand the host and viral factors that regulate the outcome of gammaherpesvirus infection in vivo, in both healthy and immune compromised individuals. This proposal makes extensive use of the mouse gammaherpesvirus 68, to investigate the genetic contribution of viral and host genes in shaping the outcome of infection. Through our previous work, we developed an extensive knowledge of the precise in vivo contexts in which a v

wound healing tissue wounds repair corneal matrix mechanical regeneration diabetic
--------------------
DESCRIPTION (provided by applicant): Wound-healing complications are an important health concern that can be associated with diabetes, bed sores, and infection. The inability to form a stable provisional matrix over the wound site is a common hallmark of poor wound healing. Without a stable matrix, the migration of inflammation responsive cells such as endothelial cells, neutrophils and macrophages, needed to produce new blood vessels1 and fight infections is not possible. The goal of this project is to develop an imaging method to quantify the stability of theprovisional matrix during early-stage wound healing. This biological process has been largely invisible and the proposed work is expected to provide significant insight into molecular events that delay wound healing. Wound healing involves an intricate set of precisely timed processes that begin with the formation of a fibrin c

bone fracture osteoporosis marrow fractures pth loss skeletal resorption formation
--------------------
DESCRIPTION (provided by applicant): 1. Osteoporosis is  a pediatric disease with geriatric consequences . Simply stated, suboptimal skeletal.  2. Development in childhood and adolescence may result in decreased bone strength and an increase in lifetime 3 fracture incidence. A delay in the onset of puberty (primary amenorrhea) correlates with both low bone mass; 4. and an increased incidence of stress fracture. Suboptimal bone accrual may have long term consequences.  5. Even with current treatment options as studies that treated amenorrheic dancers for 2 years with hormone 6 replacement therapy found no difference in bone mineral density between treated and placebo groups. The 7 most significant factors during development may be nutritional and lifestyle factors. Therefore, our overall goal 8 is to ascertain the affect of delayed pubertal development on the mechanism of bone loss at

community communities disparities based african education american rural participatory cbpr
--------------------
University and community engagement in research exists along a spectrum. Minimally, community-placed research exists when community members are asked to individually participate in research, but no attempt is made to either engage the community or understand community research needs. At the other end of the spectrum, there is true community-based participatory research (CBPR) in which the community has engaged in the development of the research question, as well as in the research design, implementation, analysis, and eventually dissemination of the results. In reality, universities primarily engage in community-placed research; few truly community-based partnership projects are performed. As a result, underserved communities have suffered from research that disregards community needs, often resulting in harm from inappropriate research that may be stigmatizing.Underserved c

mir mirnas mirna micrornas expression microrna rnas mrna rna mirs
--------------------
This subproject is one of many research subprojects utilizing theresources provided by a Center grant funded by NIH/NCRR. The subproject andinvestigator (PI) may have received primary funding from another NIH source,and thus could be represented in other CRISP entries. The institution listed isfor the Center, which is not necessarily the institution for the investigator.Background & Aims: The Gastrointestinal tract (GIT) is a major target of HIV/SIV infection. Although our understanding of HIV/SIV enteropathy has greatly improved, the recent discovery of miRNAs has added yet another novel and complex regulator of gene expression with potential roles in the molecular pathogenesis of this disorder. microRNAs (miRNAs) are genomically transcribed, ~21-23 nucleotide noncoding RNAs that are highly conserved and suppress gene expression by targeting mRNAs for translational repression or degradation. We inve

diabetes diabetic type complications risk t1d glycemic glucose t2dm islet
--------------------
The number of people directly affected by diabetes continues to grow. In Washington State, 350,000 people are living with diabetes; another 150,000 are living with diabetes but have not been diagnosed. Some people in our state are affected by diabetes more than others. People with lower incomes and education levels are at greater risk to develop diabetes. These same individuals often do not have access to the diabetes education and health care needed to manage diabetes and prevent the complications often caused by uncontrolled diabetes. Research shows that treatments available today dramatically improve the management of diabetes and prevent long-term complications. The WSU Diabetes Awareness and Detection project has shown improvements in participants' knowledge and confidence to manage their diabetes and improvements in diabetes control as measured by A1c and blood pressure. The WSU Diabete

--------------------
DESCRIPTION (provided by applicant): Prostate cancer is a heterogeneous disease with significant variations in its clinical outcome. Current methods to assess prostate cancer risk combine prostate specific antigen (PSA) screening and random prostate biopsy. Unfortunately, this strategy fails to reveal the lesion's location and does not accurately differentiate between aggressive and non-aggressive prostate cancers. As a result, most patients will receive unnecessary active treatment for low-risk prostate cancer in order to avoid under treatment. Active treatment involves surgery or radiation, often causing long-term side effects such as urinary incontinence, erectile dysfunction, or bowel urgency. Therefore, non-invasive and accurate diagnostic methods to determine the location of prostate cancer and to assess its immediate risk are needed.  The mission of Prostate Theranostics is to address this problem by developing targeting MRI contrast enhancing agents spe

--------------------
DESCRIPTION (provided by applicant): Exercising self-control is a process that requires individuals to override or inhibit their thoughts, emotions, urges, and behaviors. Self-control failures are frequently linked with and offered as explanations for a variety of negative outcomes, including substance use. In experimental studies of self-control capacity, state levels of self-control are inferred through performance on behavioral measures thought to reflect this resource. However, constraining definitions of self-control capacity to strictly behavioral measurements limits the applicability of models of self-control to the phenomena of self-control failure as it occurs in everyday life. The proposed work makes use of self-report measures of state self-control in the context of diary and experience sampling studies to glean a more accurate representation of the variation people experience in their capacity to exhibit self-control in everyday life. Specifically, the 

--------------------
DESCRIPTION (provided by applicant):  Stroke is the leading cause of disability in the United States. It is estimated that 700,000 people in the United States will experience a stroke each year and that there are over 5 million Americans living with a stroke. Regaining the ability to walk is an important goal for individuals who have experienced a stroke and it is often a primary focus of the rehabilitation of these individuals. Current research suggests that rehabilitation strategies that are based on task oriented, intensive training are necessary to induce use dependent neurologic reorganization in order to enhance motor and functional recovery after stroke. Constraint induced movement therapy (CIMT) has been shown to be effective in improving upper extremity motor control and functional use of the affected limb in real world situations in people with stroke. Our long-term goal is development and testing of a comprehensive, CIMT based intervention protocol to im

--------------------
DESCRIPTION (provided by applicant): The loss of skeletal muscle mass is of clinical importance because it is associated with increased morbidity and mortality, as well as a marked deterioration in the quality of life. A broad patient population is affected by significant losses in muscle mass including those afflicted by various systemic diseases (cancer, sepsis, HIV- AIDS), chronic physical inactivity as a result of long term bed rest, rheumatoid arthritis and limb immobilization, and sarcopenia, the age associated loss in muscle mass and strength. Satellite cells are currently an attractive therapeutic target given their stem cell characteristics and essential role in post-natal muscle growth and regeneration. What remains controversial is the necessity of satellite cells in other aspects of muscle plasticity such as hypertrophy, re-growth following atrophy and muscle maintenance with aging. In an effort to resolve this fundamental issue, a novel mouse line was 

auditory hearing speech cochlear noise processing sound loss hair perception
--------------------
DESCRIPTION (provided by applicant): This proposal will study a recently recognized form of hearing disorder called auditory neuropathy (AN) that is due to a disorder of auditory nerve functions in the presence of normal cochlear receptor activities. AN subjects have normal measures of cochlear outer hair cell activities but abnormal measures of the central auditory pathway functions beginning with auditory nerve. The hearing disorder typically affects speech comprehension out-of-proportion to the pure tone loss, particularly speech recognition in noise. AN is not rare and accounts for 10 percent of newborns identified as having hearing loss and also develops in childhood and in adults. In adults the disorder of auditory nerve is commonly associated with a peripheral neuropathy. In addition, AN and sensory hearing loss occur together. Loss of neural synchrony and decreased auditory nerve i

network networks ctn wireless node connectivity internet regulatory infrastructure trials
--------------------
Part 1.The optical network of the future will have orders of magnitude increase in data rates, due at least partially to the increase in big-data transactions. These create the need for fast scheduling of network resources and agile network adaptation to most efficiently move the data across the network.  This project proposes to investigate a cognitive network management and control system, which 'senses' current network conditions and uses this information to satisfy overall performance goals. This project will be the first comprehensive research on cognitive optical network management and control. The goal is for agile automated adaptation to replace current slow, manually-driven management and control practices. The fruits of this research will have implications for next generation wireless networks and power grid systems and for fast detection of extreme events that can s

--------------------
NON-TECHNICAL SUMMARY Remarkable levels of sophistication have been reached in linking properties of a given material to its microstructure, crystal structure and electronic structure. A substantially bigger challenge, though, is predicting the dynamic evolution of a material taken out of equilibrium and determining what external stimuli must be imposed to shepherd the material into a desired end state. The desirable properties from a particular chemistry are usually manifested in metastable crystal structures and microstructures rather than in the true equilibrium state of that chemistry. In many applications it is necessary to know how a material in a particular state will evolve over time either because it is metastable or unstable, such as in high temperature applications, or due to changing boundary conditions, as in electrochemical energy storage applications.This award supports computational research and education to develop highly automated statistical mech

ptsd veterans tbi va fear trauma traumatic symptoms extinction military
--------------------
DESCRIPTION (provided by applicant):        Posttraumatic Stress Disorder (PTSD), a highly prevalent, chronic psychiatric disorder, often co-occurs with other psychiatric disorders, is associated with occupational and social dysfunction, and can lead to chronic disability. PTSD is also a risk factor for a number of health-related concerns including chronic pain and cardiovascular disease and is associated with significant economic burden. Veterans are at increased risk for PTSD with prevalence rates up to 3 times those who are non-Veterans. Existing PTSD treatments have been found to be less effective for Veterans with PTSD, in fact, combat-related trauma has been associated with the lowest treatment effect sizes in studies of PTSD treatments. Veterans with PTSD often experience difficulty engaging in traditional CBT exposure based approaches, which can trigger the uncomfortable emotional arous

--------------------
DESCRIPTION (provided by applicant): The overall goal of The Clinical Profile of Parkinson's Disease (PD) Pathology, is to characterize the clinical profile of PD pathology in older person's without a diagnosis of PD. Showing that PD pathology is associated with a distinct and progressive condition among persons without a clinical diagnosis of PD, would have a transformative effect on PD studies. Although, PD only affects up to 5% of persons by age 85, compelling preliminary data shows that indices of PD pathology including nigral degeneration and Lewy bodies are present in nearly 20% of older persons without PD and are associated with the severity of parkinsonism proximate to death. This suggests that PD pathology is common and causes clinical signs in persons who do not meet clinical criteria for PD. Like the recent reclassification of AD, PD may also have an asymptomatic PD pathology phase, followed by a stage in which PD pathology results in mild motor and non-

mitochondrial mitochondria ros dysfunction oxidative oxygen mtdna energy iron metabolism
--------------------
DESCRIPTION (provided by applicant): The impact of mitochondrial biology on human cancers is broad because these organelles are critical regulators of metabolism, proliferation, and apoptosis. Indeed, mitochondrial aberrations are common in multiple cancer types --- not only do mitochondrial dysfunctions correlate with disease pathogenesis, but aberrant mitochondria also negatively impact upon chemotherapeutic success. Within a cell, mitochondrial homeostasis is maintained by a process referred to as  mitochondrial dynamics , which is essential for mitochondrial genome integrity, efficient ATP generation, managing ROS, and the rapid distribution of mitochondrial metabolites. Mitochondrial dynamics result from the cumulative nature of two opposing forces: mitochondrial division and mitochondrial fusion. Recent published work from my group demonstrated: (1) mitochondrial divi

exercise physical aerobic activity training fitness intensity sedentary walking effects
--------------------
DESCRIPTION (provided by applicant): Given the rising proportion of older adults worldwide and the progressive decline in brain function with advancing age, there is a pressing need to develop novel interventions that protect the aging brain. The predominant approach for implementing exercise training to improve brain function is to increase cardiovascular fitness. However, there is mixed empirical support for the effectiveness of this approach. Further, there are also acute effects of exercise within one hour of the cessation of a single exercise session. These effects occur before adaptations related to fitness could occur and animal studies have shown they occur in the same brain regions that benefit from longer-term exercise training. Therefore, we propose the acute paradigm is a tool to probe this early, direct response from exercise in order to determine how best to ma

oral dental caries hpv tooth periodontal craniofacial mutans taste cavity
--------------------
DESCRIPTION (provided by applicant): Oral health is vital to a person's overall health, and yet the incidence and prevalence of dental problems and oral diseases remain quite high. Statistics show that severe gum disease affects almost 15% of adults, and a large proportion of adults show at least some signs of gum disease. Furthermore, preliminary evidence also indicates that oral health is closely related to systemic diseases through a number of routes such as blood circulation; due the complexity involved with the process and procedures of clinical encounters and the increasing overall costs of health care, dental exam and treatment are more complicated and expensive. The reality is that underserved populations do not have the ability to identify a usual source of care, lack dental insurance to support the provision of care, and are burdened by competing needs. Self-report is the most effic

In [401]:
# Get the index of documents and topic probabilities per document
topics_probabilities = []
for index, weights in enumerate(nmf_W): # for every document
    topics_probabilities.append([weights, index]) # get all topic probabilities

In [390]:
# Get the index of documents and associated (most probable) topic index
topics_list = []
for index, weights in enumerate(nmf_W): # for every document
    topics_list.append([np.argsort(weights)[::-1][0], index]) # get most probable topic
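
The sort-and-take-first step above can be checked on a small matrix. The sketch below is only an illustration and assumes nothing beyond NumPy: `toy_W` is an invented stand-in for `nmf_W` (rows are documents, columns are topic weights). It compares the sort-based lookup with `argmax` along each row, which should agree whenever a row has a single largest weight.

```python
import numpy as np

# Hypothetical stand-in for nmf_W: 4 documents x 3 topics (values are made up)
toy_W = np.array([
    [0.10, 0.70, 0.20],
    [0.00, 0.05, 0.90],
    [0.60, 0.30, 0.10],
    [0.25, 0.15, 0.40],
])

# Sort each row in descending order of weight and keep the first index
by_argsort = [int(np.argsort(row)[::-1][0]) for row in toy_W]

# Vectorized alternative: index of the largest weight in each row
by_argmax = toy_W.argmax(axis=1).tolist()

print(by_argsort)
print(by_argmax)
```

If two topics tie for the largest weight in a row, the two approaches may pick different indices, so the equivalence holds only for rows without ties.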

In [391]:
topics_list_dataframe = pandas.DataFrame(topics_list)
topics_list_dataframe = topics_list_dataframe.rename(columns={0:'topic',1:'index'})
merged_abstracts = merged_abstracts.reset_index()
merged_topics_abstracts = merged_abstracts.merge(topics_list_dataframe,on='index')

In [None]:
# View a distribution of abstracts per topic
merged_topics_abstracts.groupby('topic')[' ABSTRACT'].count()
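
As a rough illustration of what this count returns, the sketch below builds a tiny frame with the same `topic` and ` ABSTRACT` column names (the rows are invented, not taken from the Federal RePORTER data). It also shows that `count()` skips missing abstracts, whereas `value_counts()` on the `topic` column counts every row.

```python
import pandas as pd

# Invented rows; only the column names mirror the notebook
toy = pd.DataFrame({
    "topic": [11, 25, 11, 43, 11, 25],
    " ABSTRACT": ["a", "b", None, "d", "e", "f"],
})

# Non-missing abstracts per topic (same call pattern as the cell above)
counts = toy.groupby("topic")[" ABSTRACT"].count()

# All rows per topic, including rows whose abstract is missing
sizes = toy["topic"].value_counts().sort_index()

print(counts)
print(sizes)
```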

In [None]:
topics_weights = []
for index, weights in enumerate(nmf_W): # for every document
    topics_weights.append([index, weights[11], weights[25], weights[32], weights[43]]) # get topic weights for 4 related topics

topics_weights_dataframe = pandas.DataFrame(topics_weights)
topics_weights_dataframe = topics_weights_dataframe.set_index(0)
# Keep only documents with a nonzero weight on at least one of the four topics
topics_weights_dataframe = topics_weights_dataframe.loc[(topics_weights_dataframe != 0).any(axis=1)]

topics_weights_dataframe.columns = [11, 25, 32, 43]

# Find the maximum value for a given topic
topics_weights_dataframe[topics_weights_dataframe[43]==topics_weights_dataframe[43].max()]

filtered_abstracts = set(filtered_abstracts)

# View a given abstract based on a threshold
list(filtered_abstracts)[556]
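
The thresholding step itself does not appear in the cells above, so the following is a minimal sketch of one way it could be done, not the notebook's actual method. The names `toy_W`, `toy_abstracts`, `topic_index`, and `threshold` are all hypothetical, and the cutoff of 0.5 is arbitrary.

```python
import numpy as np

# Hypothetical document-topic weights: 4 documents x 2 topics
toy_W = np.array([
    [0.00, 0.80],
    [0.30, 0.10],
    [0.05, 0.60],
    [0.00, 0.00],
])
toy_abstracts = ["abstract A", "abstract B", "abstract C", "abstract D"]

topic_index = 1   # hypothetical topic of interest
threshold = 0.5   # hypothetical cutoff, chosen only for illustration

# Boolean mask of documents whose weight on the topic meets the cutoff
mask = toy_W[:, topic_index] >= threshold

# Keep the matching abstracts in their original document order
selected = [text for text, keep in zip(toy_abstracts, mask) if keep]
print(selected)
```

Keeping the result as a list preserves document order. Converting to a `set` removes duplicates but does not guarantee a stable order, so a positional lookup such as `list(filtered_abstracts)[556]` may return a different abstract from one session to the next.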