<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#NCSES-class---FedRePORTER-and-IPEDS-data" data-toc-modified-id="NCSES-class---FedRePORTER-and-IPEDS-data-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>NCSES class - FedRePORTER and IPEDS data</a></span><ul class="toc-item"><li><span><a href="#Introduction" data-toc-modified-id="Introduction-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Introduction</a></span></li><li><span><a href="#Python-Setup" data-toc-modified-id="Python-Setup-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Python Setup</a></span></li></ul></li><li><span><a href="#Load-the-data" data-toc-modified-id="Load-the-data-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Load the data</a></span><ul class="toc-item"><li><span><a href="#Federal-RePORTER---Abstracts-(https://federalreporter.nih.gov/FileDownload)" data-toc-modified-id="Federal-RePORTER---Abstracts-(https://federalreporter.nih.gov/FileDownload)-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Federal RePORTER - Abstracts (<a href="https://federalreporter.nih.gov/FileDownload" target="_blank">https://federalreporter.nih.gov/FileDownload</a>)</a></span></li><li><span><a href="#NMF-method---Non-negative-matrix-factorization" data-toc-modified-id="NMF-method---Non-negative-matrix-factorization-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>NMF method - Non-negative matrix factorization</a></span></li></ul></li></ul></div>

## NCSES class - FedRePORTER and IPEDS data

### Introduction

**Federal RePORTER** (https://federalreporter.nih.gov) is a collaborative effort led by STAR METRICS® to create a searchable database of scientific awards from federal agencies. Awards can be searched across agencies or fiscal years, by the award's project leader, or by a text search of a project's title, terms, or abstract.

### Python Setup

In [1]:
# Data manipulation
import pandas as pd

# Reading in files
import glob

# Text analysis (topic modeling)
import numpy as np
import sklearn
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer
import string

## Load the data

### Federal RePORTER - Abstracts (https://federalreporter.nih.gov/FileDownload)

In [2]:
"""Get all files with project abstracts."""

abstracts_files = glob.glob('FedRePORTER_PRJABS_C_FY20*.csv')
print(abstracts_files)

['FedRePORTER_PRJABS_C_FY2009.csv', 'FedRePORTER_PRJABS_C_FY2008.csv', 'FedRePORTER_PRJABS_C_FY2018.csv', 'FedRePORTER_PRJABS_C_FY2017.csv', 'FedRePORTER_PRJABS_C_FY2003.csv', 'FedRePORTER_PRJABS_C_FY2002.csv', 'FedRePORTER_PRJABS_C_FY2016.csv', 'FedRePORTER_PRJABS_C_FY2000.csv', 'FedRePORTER_PRJABS_C_FY2014.csv', 'FedRePORTER_PRJABS_C_FY2015.csv', 'FedRePORTER_PRJABS_C_FY2001.csv', 'FedRePORTER_PRJABS_C_FY2005.csv', 'FedRePORTER_PRJABS_C_FY2011.csv', 'FedRePORTER_PRJABS_C_FY2010.csv', 'FedRePORTER_PRJABS_C_FY2004.csv', 'FedRePORTER_PRJABS_C_FY2012.csv', 'FedRePORTER_PRJABS_C_FY2006.csv', 'FedRePORTER_PRJABS_C_FY2007.csv', 'FedRePORTER_PRJABS_C_FY2013.csv']


In [3]:
"""Read them in, concatenate and convert to a dataframe."""

list_data = []
for filename in abstracts_files:
    data = pd.read_csv(filename)
    list_data.append(data)
    
abstracts = pd.concat(list_data)

In [4]:
"""Drop missing abstracts."""

abstracts = abstracts.dropna()

In [5]:
"""Get abstracts as a list to feed in to TfidfVectorizer in the next step."""

merged_abstracts_list = abstracts[' ABSTRACT'].values.tolist()

### NMF method - Non-negative matrix factorization

NMF (non-negative matrix factorization) is a model used for topic extraction. While the LDA model uses raw counts of unique words per document, the NMF model uses a normalized representation of those raw counts (the TF-IDF representation).

TF stands for term frequency, and TF-IDF is term frequency times inverse document frequency. In other words, we look not only at how often a word appears in a given document, but also at how distinctive that word is across the whole collection of documents (the corpus). For example, words like "often" or "use" are encountered frequently, but they are less informative (more semantically vacuous) for discerning the topic of a particular document, because they tend to appear across all the documents in a corpus. On the other hand, words that appear in only a few documents are likely specific to those documents and can therefore form the basis for a topic.
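To make the weighting concrete, here is a minimal pure-Python sketch of TF-IDF on a toy corpus. It assumes the smoothed formula idf(t) = ln((1 + n) / (1 + df(t))) + 1 followed by L2 normalization, which is the documented default behaviour of `TfidfVectorizer`. The tokenizer is a plain lowercase split with no stop-word removal, and the function name `tfidf` and the three toy documents are hypothetical, not part of the notebook's data.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document, L2-normalized per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency (assumed formula, see lead-in).
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1 for term, count in df.items()}
    rows = []
    for tokens in tokenized:
        tf = Counter(tokens)
        raw = {term: count * idf[term] for term, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in raw.values()))
        rows.append({term: w / norm for term, w in raw.items()})
    return rows


# Hypothetical toy corpus: "we often use" appears everywhere, the rest is specific.
toy_docs = [
    "we often use zebrafish genes",
    "we often use cocaine models",
    "we often use liver cells",
]
weights = tfidf(toy_docs)
for doc, row in zip(toy_docs, weights):
    ranked = sorted(row, key=row.get, reverse=True)
    print(doc, "->", ranked[:2])
```

In each toy document the two document-specific words outrank "we", "often", and "use", because those three occur in every document and so receive the smallest IDF.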

More here: 

- https://scikit-learn.org/stable/modules/decomposition.html#non-negative-matrix-factorization-nmf-or-nnmf
- https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html
- https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfTransformer.html#sklearn.feature_extraction.text.TfidfTransformer
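
As a rough illustration of what the factorization does, the sketch below approximates a small non-negative matrix V (documents by terms) as the product of a document-topic matrix W and a topic-term matrix H. It uses the classic multiplicative update rules in pure Python. This is an assumption made for readability: scikit-learn's `NMF` uses its own solvers and initialization, so this is not the implementation run in the cells below. The matrix `V` and all function names are hypothetical.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frobenius_error(V, W, H):
    """Frobenius norm of V - WH, the reconstruction error."""
    WH = matmul(W, H)
    return sum((v - r) ** 2 for rv, rr in zip(V, WH) for v, r in zip(rv, rr)) ** 0.5


def nmf_multiplicative(V, k, n_iter=300, seed=0, eps=1e-9):
    """Factor V (n x m) into W (n x k) and H (k x m), all entries non-negative."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num = matmul(Wt, V)
        den = matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num = matmul(V, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return W, H


# Hypothetical 4-document, 4-term matrix with two obvious "topics".
V = [
    [1.0, 0.8, 0.0, 0.0],
    [0.9, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.7],
    [0.0, 0.0, 0.8, 1.0],
]
W, H = nmf_multiplicative(V, k=2)
print("reconstruction error:", round(frobenius_error(V, W, H), 4))
```

Each row of `H` plays the role of a topic (a weight per term), and each row of `W` says how strongly a document loads on each topic. The notebook does the same with `n_components=100` on the full TF-IDF matrix.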

In [6]:
"""Convert a collection of raw documents to a matrix of TF-IDF features."""

vectorizer = TfidfVectorizer(stop_words='english')
tfidf = vectorizer.fit_transform(merged_abstracts_list)

In [7]:
"""Get feature names."""

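# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() is available from 1.0 onward.
vectorizer_feature_names = vectorizer.get_feature_names_out()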

In [None]:
"""Run the model with 100 topics."""

nmf = NMF(n_components=100, verbose=2).fit(tfidf)

In [None]:
"""Get the document-topic and topic-word matrices."""

nmf_W = nmf.transform(tfidf)  # document-topic matrix, shape (n_documents, n_topics)
nmf_H = nmf.components_  # topic-word matrix, shape (n_topics, n_words)
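
The following cells rank words within a topic (a row of `nmf_H`) and documents within a topic (a column of `nmf_W`) by weight, using the NumPy idiom `argsort()[:-10 - 1:-1]`, which takes the last ten positions of an ascending sort in reverse order. A pure-Python sketch of the same top-k selection is shown here. The vocabulary and weights are hypothetical, and the ordering of exactly tied weights may differ from NumPy's.

```python
def top_k_indices(weights, k):
    """Return the indices of the k largest weights, largest first."""
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return order[:k]


# Hypothetical vocabulary and one topic row (one weight per word).
vocabulary = ["cancer", "gene", "the", "tumor", "student"]
topic_row = [0.9, 0.1, 0.0, 0.7, 0.05]

top_words = [vocabulary[i] for i in top_k_indices(topic_row, 3)]
print(" ".join(top_words))  # cancer tumor gene
```

Applied to a column of the document-topic matrix instead of a row of the topic-word matrix, the same ranking returns the documents most strongly associated with a topic.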

In [10]:
"""View the list of topics (10 top words per topic)"""

for topic_idx, topic in enumerate(nmf_H):
    print("Topic %d:" % (topic_idx))
    print('----------------------------')
    print(" ".join([vectorizer_feature_names[i]
                for i in topic.argsort()[:-10 - 1:-1]]))
    print('----------------------------')

Topic 0:
----------------------------
genes mutations identify candidate gene identified zebrafish genetic drosophila defects
----------------------------
Topic 1:
----------------------------
training trainees faculty postdoctoral biology fellows doctoral predoctoral scientists trainee
----------------------------
Topic 2:
----------------------------
subproject institution nih center isfor andinvestigator crisp theresources entries necessarily
----------------------------
Topic 3:
----------------------------
health disparities public mental policy minority population relevance racial ethnic
----------------------------
Topic 4:
----------------------------
cancer nci cancers pancreatic ovarian prevention oncology colorectal members colon
----------------------------
Topic 5:
----------------------------
hiv aids infected infection antiretroviral prevention transmission art cfar cd4
----------------------------
Topic 6:
----------------------------
core investigators services project

students graduate undergraduate student faculty college school underrepresented summer biomedical
----------------------------
Topic 56:
----------------------------
clinical trials translational trial phase protocol studies conduct monitoring basic
----------------------------
Topic 57:
----------------------------
cocaine addiction dopamine abuse da self relapse seeking reward administration
----------------------------
Topic 58:
----------------------------
genetic variants genome variation sequencing genomic genetics disease association traits
----------------------------
Topic 59:
----------------------------
administrative committee scientific advisory meetings management external activities oversight core
----------------------------
Topic 60:
----------------------------
vascular endothelial blood hypertension flow cardiovascular inflammation angiogenesis atherosclerosis inflammatory
----------------------------
Topic 61:
----------------------------
network networks wireless c

In [13]:
"""View a top document related to a given topic"""

for topic_idx, topic in enumerate(nmf_H):
    print('--------------------')
    print("Topic %d:" % (topic_idx))
    print('--------------------')
    print(" ".join([vectorizer_feature_names[i]
                    for i in topic.argsort()[:-10 - 1:-1]]))
    top_doc_indices = np.argsort(nmf_W[:,topic_idx] )[::-1][0:1]
    for doc_index in top_doc_indices:
        print('--------------------')
        print(merged_abstracts_list[doc_index])

--------------------
Topic 0:
--------------------
genes mutations identify candidate gene identified zebrafish genetic drosophila defects
--------------------
﻿   DESCRIPTION (provided by applicant): Our goal is to identify genes important to schizophrenia. In the first cycle of our project, we demonstrated that damaging de novo mutations in persons with schizophrenia, from otherwise healthy families, disrupt genes that are co-expressed in the dorsolateral and ventrolateral prefrontal cortex during fetal development. Compared to their unaffected siblings, schizophrenia patients were significantly more likely to harbor such alleles. Proteins encoded by these genes functioned in neuronal migration, synaptic transmission, signaling, and transcriptional regulation. Integration of genetic data and expression data suggested possible schizophrenia-related processes and even potential targets for treatment. In the next cycle of our project, we ask whether these and other candidate genes are e

cancer nci cancers pancreatic ovarian prevention oncology colorectal members colon
--------------------
The SEER Program was initiated in 1972 in response to requirements of the National Cancer Program for assessing the magnitude of the cancer burden in the United States, and for identifying factors related to cancer risk and/or patient survival. The SEER Program has among its objectives: 1. 	To assemble and report, on a periodic basis, estimates of cancer incidence, especially among the following key cancer sites: breast cancer, lung cancer, colorectal cancer, prostate cancer, pancreatic cancer, and urinary bladder cancer.2. 	To monitor annual cancer incidence trends to identify unusual changes in specific forms of cancer occurring in population subgroups defined by geographic, demographic, and social characteristics. 3. 	To provide continuing information on changes over time in extent of disease at diagnosis, trends in therapy, and changes in cancer patient survival. 4. 	To identify 

cells cd4 cd8 antigen nk epithelial cell differentiation progenitor effector
--------------------
White blood cells called T lymphocytes play critical roles in immune defense against viruses, bacteria, fungi, protozoa, and cancer cells. In the unactivated state, these cells circulate in the blood and accumulate in lymphoid tissues such as lymph nodes and spleen. Upon encounter with foreign materials (antigens) on the membranes of specialized antigen presenting cells (dendritic cells), these resting T-cells become activated, undergo numerous cell divisions, and differentiate into effector cells. The effector cells leave the lymphoid tissues and blood, entering sites of infection to combat pathogens. They can also invade normal tissues where their activity can cause autoimmune pathology. After elimination of an infecting organism, most of the activated T-cells die, but some remain as memory cells, to provide a more rapid and vigorous response if the same pathogen is encountered in the fu

--------------------
DESCRIPTION (provided by applicant): The success of neuronal cell replacement therapy depends on the ability of transplanted cells to synaptically integrate with host tissue. Complete integration requires that neurons can both send and receive synaptic information, as well as to modify their synaptic strength in response to changes in the cellular behavior of synaptically connected neurons. Due to limited cell tracking and stimulation techniques, previous reports have shown only that transplanted neurons can receive information from host neurons via synaptic stimulation. Thus, no direct evidence exists for their ability to send information to host cells or undergo synaptic plasticity. To test these hypotheses, we propose to use the light-activated Channelrhodopsin-2 (ChR2) ion channel linked to the mCherry fluorophore in human embryonic stem cell (hESC)-derived neurons. Following transplantation of hESC-derived forebrain-patterned neurons to the mouse hippocampus, 

--------------------
ABSTRACT NOT PROVIDED
--------------------
Topic 15:
--------------------
alcohol drinking use alcoholism consumption abuse dependence heavy substance binge
--------------------
alcohol
--------------------
Topic 16:
--------------------
stem differentiation hematopoietic progenitor hsc adult niche self regeneration transplantation
--------------------
DESCRIPTION (provided by applicant): Continuous replacement and repair of adult epithelial tissues such as the skin, intestine, and lung depend on self-renewing stem cells which generate the specialized cells necessary for tissue maintenance. Recent work has shown that the local environment, or niche, in which stem cells reside is critical for their maintenance and function. Specifically, positioning of the stem cell within the niche exposes it to signals that promote its survival and maintenance and guide the production of specialized daughter cells that perform the normal tissue functions. Self-renewing cells that 

--------------------
The discovery of antibiotics nearly 80 years ago was a major milestone in the battle against infectious disease, yet bacterial infections continue to be a significant cause of death worldwide. In fact, management of many bacterial infections is becoming progressively more difficult due to the emergence of new and rapidly evolving pathogens with increased virulence, resistance to antibiotics, a greater ability to evade host responses, and heightened transmissibility. To reverse this trend, a systematic understanding of the complex dynamics between the pathogen and host is needed at every level of interaction, including those between cells, individuals, microbial communities, and populations. We will develop and implement an integrated experimental framework that provides systematic and complementary insights into bacterial infections encompassing single cells, animal models, and human patients, to investigate cellular genomics, transcriptional networks, and host mic

--------------------
DESCRIPTION (provided by applicant):  This proposal focuses on DNA repair enzymes and signaling proteins with multiple roles in cellular responses to DNA damage. We view these multitasking proteins as potential control nodes that may integrate DNA damage-specific signals and marshal the appropriate repair process(es) to the sites of DNA damage. A frontier area in the field of DNA repair is to understand how the biochemical pathways of DNA repair are coordinated with one another, in a manner analogous to other intracellular signaling pathways. Our immediate efforts are focused on 1) defining the mechanism of substrate selection by mammalian DNA ligase III in DNA damage responses, 2) creating a  chemical genetic switch  to shut off the NER pathway in order to explore other diverse functions of the repair endonuclease ERCC1-XPF, and 3) the enzymatic regulation of Sir2, a protein deacetylase with diverse activities, including DNA repair functions. We are using a struct

--------------------
DESCRIPTION (provided by applicant): This SBIR Phase I project will develop novel polymers as a drug delivery carrier for pharmacological applications. The carrier is a highly-structured copolymer that can form micelle-like structures with the drug in aqueous media. The polymer/drug micelle contains specially tailored interaction functionalities for drug encapsulation and slow release. The features of this type of molecular structure will minimize early degradation of the encapsulated drug and the polymer carriers are fully diodegradable. The polymer/drug micelles will exhibit uniform size and drug loading levels, and a predictable, well-controlled drug release rate.  The overall objective in the Phase I program is to develop a drug delivery system based on the proposed copolymer for pharmacological applications, and to demonstrate the enhanced drug encapsulation capability and biodegradbility as a drug delivery carrier. The specific technical aims are: 1) to synth

--------------------
DESCRIPTION (provided by applicant): The goal of this R34 application is to create and formative evaluate an intervention that promotes healthful lifestyle for Chinese immigrants at risk for developing diabetes based on the principles of Diabetes Prevention Program intervention. The acceptability and feasibility of the intervention will be assessed in a two-arm pilot randomized trial (n=60) that will compare the effects of a lifestyle intervention (n=30) to a minimal control intervention (n=30). The R34 evaluation at the end of 6 months and at 12 months will examine changes in weight and biomarkers including fasting blood glucose and insulin, HbA1c, lipid profile (total, HDL and LDL cholesterol, and triglycerides), and blood pressure. Measures of lifestyle changes will include dietary intake and physical activity. The study will also assess change in psychological measures. The pilot study will use the RE-AIM evaluation framework to focus on: Reach - How many of th

care patient quality outcomes medical providers services healthcare primary hospital
--------------------
DESCRIPTION (provided by applicant): The care of patients with complex healthcare needs is often fragmented because they receive care from multiple providers across disparate care locations and because information related to this care is frequently not transmitted between providers or locations. Inadequate inter-provider communication and care coordination significantly lessen care quality and compromise patient safety. This three-year project seeks to improve outcomes, quality and coordination of care for patients with complex healthcare needs by facilitating the availability of information following three types of care transitions into the ambulatory care setting. Specifically, information regarding care transitions will be made available to patients, primary care practitioners and care managers following hospitalizations, emergency department (ED) encounters, and specialty clini

lung pulmonary copd fibrosis alveolar epithelial airway cf respiratory injury
--------------------
DESCRIPTION (provided by applicant): Lung diseases, including lung cancer and chronic lung diseases such as chronic obstructive pulmonary disease, together account for some 280,000 deaths annually (American Lung Association). Contributing to this mortality is the fact that remediation of all forms of lung disease is hampered by the limited ability of lung to regenerate. Hence, lung tissue that is damaged by degeneration or infection, or lung tissue that is surgically resected, is not functionally replaced in vivo. Currently, the only way to replace lung tissue is to perform lung transplantation, an expensive procedure that is achieves only a 10% survival at 10 years, and one that is hampered by a severe shortage of organs.  Over the past 3 years, we have worked to address some fundamental challenges in lung tissue engineering. In order to produce a lung scaffold that has suitable geometry

subproject sources ncrr nih grant subprojectand likelyrepresents resourcesprovided center staff
--------------------
This subproject is one of many research subprojects utilizing the resourcesprovided by a Center grant funded by NIH/NCRR. Primary support for the subprojectand the subproject's principal investigator may have been provided by other sources,including other NIH sources.  The Total Cost listed for the subproject likelyrepresents the estimated amount of Center infrastructure utilized by the subproject,not direct funding provided by the NCRR grant to the subproject or subproject staff.N/A
--------------------
Topic 41:
--------------------
technology systems software project design performance power devices applications cost
--------------------
Published reports on software-development costs and our interactions with many companies reveal that testing and debugging consume too much of the development costs. Furthermore, the ubiquitous nature of software requires that it be h

--------------------
The study of human immunology is often difficult because certain types of experiments can't be performed on humans for obvious ethical reasons. To facilitate human research, many scientists utilize animal models. The animal model of choice for immunologists is the mouse. This animal model is very useful because of the large number of reagents available as well as the creation of transgenic and knockout mice. Both types of mice are required for this program project grant. Transgenic carrying  humanized  MHC molecules will be iused to discover new epitopes for influenza proteins in project 2 and in project 3, they will be used to model human influenza infection in mice. Knockout mice are required so that the mouse endogenous MHC class Ior II genes are not expressed in the MHC transgenic mice. Two projects will require some of thesame strains of mice and procedures, so for continuity between the projects an animal core providing the necessary services is required. The

stress oxidative er ptsd depression response anxiety induced hpa chronic
--------------------
DESCRIPTION (provided by applicant): All organisms must protect their internal system from cellular stress. Whether stress arises from external toxins or mutation and disease, cells must sensitively monitor stress signals and mount the appropriate responses to maintain internal homeostasis. Despite the importance of stress defense, much remains unknown about the mechanisms eukaryotes use to survive stressful situations. Functional genomics has uncovered functions for many genes in various genomes, largely by characterizing gene function under standard conditions. However, a substantial fraction of genes remains uncharacterized, and many of these are likely to be involved in stress defense and thus have not been uncovered through traditional studies. This proposal will use high-throughput functional genomics, genomic expression analysis, computational biology, and techniques in genetics and bio

liver hepatic hcc fibrosis hepatocytes nafld fatty disease alcoholic nash
--------------------
DESCRIPTION (provided by applicant): Liver diseases are among the 10 leading causes of death in the United States. The liver's extraordinary regenerative capacity is critical to understanding the mechanisms leading to developmental defects, acute and chronic liver diseases, and liver carcinogenesis. Stimulation of hepatocyte proliferation while preventing apoptosis is essential to the liver's regenerative process. Recently we uncovered a tumor suppressor pathway has potent effects upon liver cell division and death. Our long range goal is to define the cellular mechanisms underlying liver regeneration in health and disease and to apply these findings to developing improved therapies for liver disease. To that end, we are uniquely capable to address the objectives of this application, which are to uncover the mechanism that leads to the novel tumor suppressors' profound effects upon the liver.

clinical trials translational trial phase protocol studies conduct monitoring basic
--------------------
PROJECT SUMMARY (See instructions): Clinical Trials Office The Clinical Trials Office (CTO) provides a centralized service to support the development and execution of clinical trials in all of the Center's defined Research Programs. The CTO oversees all trials from a variety of venues and sponsors, including pharmaceutical, local institutional trials, and cooperative group trials. It has oversight of trials from a diverse group of medical specialties including, but not limited to, medical oncology, radiation oncology, surgery, hematology oncology, gynecologic oncology, palliative care, supportive care, and basic science. This includes protocol preparation, data acquisition, safety monitoring and reporting, quality assurance, regulatory compliance, overall study management and personnel training. The CTO serves the overall clinical research needs of investigators in protocol developm

administrative committee scientific advisory meetings management external activities oversight core
--------------------
The Administrative Core will coordinate the activities of the three Projects and three Cores of this Program Project. The Specific Aims of the Administrative Core are: 1) to provide administrative assistance and fiscal oversight to all Projects and Cores B and C; 2) to facilitate communication both within the Program and between the Program members and other investigators, centers, programs and institutions; and 3) to establish and maintain an Executive Committee, an external Scientific Advisory Committee, and an Internal Advisory Committee. The services offered by the Core will include administrative and secretarial support (ordering of supplies, financial reports, assistance with budget management), organization of regular meetings and of the annual meeting with the Scientific Advisory Committee, and preparation of progress reports and other documents. The Director

--------------------
DESCRIPTION (provided by applicant): Heart disease caused by the loss or dysfunction of cardiomyocytes is the leading cause of death worldwide. The adult mammalian heart possesses little regenerative potential and therefore displays fatal loss of function following myocardial infarction (MI) and other heart diseases. Fibrosis and scar formation due to activation of cardiac fibroblasts serve as barriers to cardiac regeneration and contribute to loss of contractile function, pathological remodeling and susceptibility to arrhythmias. Recently, combinations of cardiogenic transcription factors were shown to be capable of activating cardiac gene expression in fibroblasts in vitro. Moreover, we have shown that forced expression of four transcription factors in cardiac fibroblasts is sufficient to activate cardiac gene expression in vivo, leading to improvement of cardiac function and reduction of adverse ventricular remodeling following MI in mice. Although these reprogr

--------------------
The IL-2 receptor and related cytokine receptor systems are being studied to clarify the T cell immune response in normal, neoplastic, and immunodeficient states. Following T-cell activation by antigen, the magnitude and duration of the T-cell immune response is determined by the amount of IL-2 produced, levels of receptors expressed, and time course of each event. The IL-2 receptor contains three chains, IL-2Ra, IL-2Rb, and gc. Dr. Leonard cloned IL-2Ra in 1984, we discovered IL-2Rb in 1986, and reported in 1993 that mutation of the gc chain results in X-linked severe combined immunodeficiency (XSCID, which has a T-B+NK- phenotype) in humans. We reported in 1995 that mutations of the gc-associated kinase, JAK3, result in an autosomal recessive form of SCID indistinguishable from XSCID and in 1998 that T-B+NK+ SCID results from mutations in the IL7R gene. Based on work in our lab and others, gc was previously shown to be shared by the receptors for IL-2, IL-4, IL-7

--------------------
PROJECT SUMMARY (See instructions): This is an application for renewal of the George M. O'Brien Kidney Center at Yale. This center was established with the overarching goal to facilitate basic, translational and clinical research that will advance the prevention and treatment of kidney diseases. Major research areas of emphasis are renal epithelial cell biology and physiology; inherited kidney disease and kidney development; acute kidney injury (AKI) and chronic kidney disease (CKD); and vascular biology, inflammation and glomerular disease. A critically important benefit of the Center is to provide renal investigators both at Yale and across the country with access to highly specialized services not otherwise routinely available to support their research. To this end, the Center includes three cores to provide small animal physiology and phenotyping services to enable detailed characterization of renal function at the level of the tubule, the kidney, and the intac

--------------------
﻿   DESCRIPTION (provided by applicant): The fundamental characteristic of epithelia is cell-cell adhesion, which regulates signaling pathways involved in cell organization, migration and gene expression. Disruption of cell-cell adhesion leads to loss of epithelial cell organization, increasd cell migration, and loss of contact-inhibition of cell proliferation, which are characteristic of mny genetic diseases including cancer. Thus the RATIONALE for our work is that a deep mechanistic understanding of cell-cell adhesion will provide fundamental insights into the regulation of epithelial tissue organization in normal and disease states. Our CENTRAL HYPOTHESIS is that cell-cell adhesion maintains epithelial homeostasis by controlling cytoskeleton organization and cell migration, and sequestering key signaling proteins that regulate cell proliferation. Our LONG- TERM OBJECTIVES are organized under 2 broad themes that address KEY CHALLENGES about: A. Mechanisms involve

--------------------
﻿   DESCRIPTION (provided by applicant): The impact of mitochondrial biology on human cancers is broad because these organelles are critical regulators of metabolism, proliferation, and apoptosis. Indeed, mitochondrial aberrations are common in multiple cancer types --- not only do mitochondrial dysfunctions correlate with disease pathogenesis, but aberrant mitochondria also negatively impact upon chemotherapeutic success. Within a cell, mitochondrial homeostasis is maintained by a process referred to as  mitochondrial dynamics , which is essential for mitochondrial genome integrity, efficient ATP generation, managing ROS, and the rapid distribution of mitochondrial metabolites. Mitochondrial dynamics result from the cumulative nature of two opposing forces: mitochondrial division and mitochondrial fusion. Recent published work from my group demonstrated: (1) mitochondrial division is chronically enhanced in RAS-transformed murine cells and human cancer lines harbo

--------------------
﻿   DESCRIPTION (provided by applicant): The study of the factors that control RNA expression to give rise to diverse cell types, from the same genome, has occupied scientists for more than 50 years. However, transcriptional regulation is just the beginning. Each expressed RNA can potentially adopt a new purpose as a function of its spatial position within a cell. RNA localization has been analyzed one RNA at a time because assaying RNA spatial organization on a systems-level is currently not possible. This Innovator proposal is focused on developing the methods to produce the first ever view of RNA localization on a transcriptome-wide level within cells.  We will develop a novel methodology to seek and find the cellular localization of every RNA within specific cellular locations. These methods are designed to be applicable to any cell type. By merging our technology with RNA sequencing we will construct  RNA Localization Heatmaps  to identify sites where certain 

women pregnancy maternal reproductive hpv pregnant birth fetal men ovarian
--------------------
Since 1995, over 40 countries have adopted legislation that mandates women's participation in government, and these policies enjoy considerable support from international organizations, politicians, and activists.  Practitioners who seek to increase women's presence in office do so in large part because they believe it will provide normative benefits for women through improved policy representation.  Despite this widely held assumption, it is clear that in a number of cases increasing women's representation fails to transform the legislative agenda.  While in some countries women's increased participation alters policy representation-defined in this project as the adoption of legislation related to women's interests-in others the addition of female legislators has had only limited consequences.Why does increasing women's participation lead to such mixed results?  This project represents one 

program members programs funding programmatic year scholars departments peer reviewed
--------------------
The Program Leaders for the Markey Cancer Center (MCC) provide cohesive leadership for promoting the research activities within and among the program areas. They assure continued interchange, program planning and evaluation, and educational exchange within their program. The Program Leaders identify areas of need within their research program and work with leadership of the cancer center to establish priorities and pursue recruitment in their respective areas. They are involved in identifying the shared resource needs of their program members and encourage collaborations between programs by communicating with leaders of other programs on possible areas of interactions and collaborations. They inform the appropriate Associate Director and the MCC Director on program activities and serve on the MCC's Scientific Advisory Panel and the Program and Shared Resource Leaders Committee. Th

--------------------
The development of HIV vaccines and other prevention strategies relies on the use of nonhuman primates in preclinical studies to advance the development of effective AIDS vaccine candidates and to advance development of effective topical microbicides and other prevention modalities or immune-based therapies. The NIAID Simian Vaccine Evaluation Unit (SVEU) contracts shall provide nonhuman primate resources that primarily support preclinical evaluation of AIDS vaccines. The SVEUs conduct studies in support of vaccines being developed by a wide range of investigators. These studies complement NIAID-supported basic vaccine research and vaccine evaluation studies funded through R01 (investigator-initiated research) grants, HIVRAD (HIV Vaccine Research and Design Program) grants, IPCAVD (Integrated Preclinical-Clinical AIDS Vaccine Development) grants, and NHP Consortium awards. The SVEUs also perform studies of candidate vaccines offered by companies or other researcher

iron heme fe hepcidin deficiency anemia metabolism metal transport overload
--------------------
﻿   DESCRIPTION (provided by applicant): As a cofactor for thousands of enzymes, iron is an essential micronutrient. Yet, free iron is toxic because it catalyzes rapid formation of damaging reactive oxygen species. Therefore, homeostasis systems exert tight control on iron levels in all organisms and gene expression is adjusted in response to iron deprivation and iron abundance: expression of at least 100 genes is known to be iron-dependent in Escherichia coli. In humans, disruption of iron homeostasis contributes to severe diseases: iron accumulation in the brain is linked to neurodegenerative diseases, iron overload causes the liver disease hemochromatosis, and iron deficiency leads to anemia and impaired cognitive development. Furthermore, invading bacterial pathogens hijack iron out of human proteins to establish infections. These considerations underscore the essential role of iron hom

projects spore project pilot developmental support investigators cores funds biostatistics
--------------------
The objectives of the Developmental Research Projects Program are to provide a continuous flow of new ideas and projects to stimulate myeloma research in the context of the Myeloma SPORE. It encourages new research directions and methodologies and facilitates collaborations. By providing initial support to pilot projects, it will foster the development of new translational projects. It also allows the Myeloma SPORE to have participation and recruitment of new investigators not only from the DF/HCC but also from outside institutions. Our Developmental Research Program during the previous funding period was extremely successful, with three projects in this renewal SPORE application directly evolving from prior Developmental Projects. This Program will continue to rely on scientific and programmatic review by the Governance Committee, which will assure selection of the most prom

The first topic model, fit with 100 topics, surfaces two topics related to opioids and drug abuse: topics 25 and 57.

In [14]:
"""Get topic weights per document."""

topics_weights = []
for index, doc_weights in enumerate(nmf_W):  # one row of nmf_W per document
    # keep the weights for the 2 opioid- and drug abuse-related topics (25 and 57)
    topics_weights.append([index, doc_weights[25], doc_weights[57]])

In [15]:
"""Keep the abstracts that have a nonzero weight for at least one of the 2 topics."""

topics_list_dataframe = pd.DataFrame(topics_weights)

# Reset the index so both frames share the same 0..n-1 row labels before concatenating
abstracts = abstracts.reset_index()
topics_list_dataframe = topics_list_dataframe.rename(columns={0: 'index'})
concat = pd.concat([abstracts, topics_list_dataframe], axis=1)

# Columns 1 and 2 hold the weights for the two selected topics
filtered = concat[(concat[1] != 0) | (concat[2] != 0)]
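The loop-and-concat steps above can also be expressed as a boolean mask over the document-topic matrix. This is a minimal sketch, not the notebook's code: `W` is a small made-up matrix standing in for `nmf_W`, the abstracts are placeholders, and topic columns 0 and 2 stand in for topics 25 and 57.

```python
import numpy as np
import pandas as pd

# Toy stand-ins (assumptions, not the notebook's data)
W = np.array([
    [0.0, 0.4, 0.0],   # no weight on the topics of interest
    [0.2, 0.0, 0.0],   # weight on the first topic of interest
    [0.0, 0.1, 0.3],   # weight on the second topic of interest
])
toy_abstracts = pd.DataFrame({"ABSTRACT": ["doc a", "doc b", "doc c"]})
topic_ids = [0, 2]  # hypothetical; the notebook uses 25 and 57

# True for documents with a nonzero weight on at least one selected topic
mask = (W[:, topic_ids] != 0).any(axis=1)

# Boolean indexing keeps the matching rows along with their original row labels
toy_filtered = toy_abstracts[mask]
print(toy_filtered["ABSTRACT"].tolist())
```

Working on the matrix directly avoids building a Python list with one entry per document, which may matter when the corpus has hundreds of thousands of abstracts.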

In [16]:
"""Get a list of abstracts from the filtered dataframe above."""

filtered_abstracts = filtered[' ABSTRACT'].values.tolist()

In [17]:
len(filtered_abstracts)

430340

In [None]:
"""Run another topic model with 100 topics on the filtered abstracts."""

# Convert the collection of raw documents to a matrix of TF-IDF features
vectorizer = TfidfVectorizer(stop_words='english')
tfidf = vectorizer.fit_transform(filtered_abstracts)

# Get feature names (get_feature_names() was removed in scikit-learn 1.2;
# on versions older than 1.0, call get_feature_names() instead)
vectorizer_feature_names = vectorizer.get_feature_names_out()

# Run the model with 100 topics
nmf = NMF(n_components=100, verbose=2).fit(tfidf)
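To see what this cell produces without waiting on the full corpus, the same pipeline can be run on a handful of documents. This is a minimal sketch, assuming scikit-learn is installed. The four-sentence corpus, `n_components=2`, `init="nndsvd"`, `random_state=0`, and `max_iter=500` are illustrative choices, not the notebook's settings.

```python
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up documents, used only to illustrate the shapes involved
toy_docs = [
    "opioid overdose treatment and opioid addiction",
    "opioid addiction and drug abuse prevention",
    "kidney injury and chronic renal disease",
    "renal fibrosis in chronic kidney disease",
]

toy_vectorizer = TfidfVectorizer(stop_words="english")
toy_tfidf = toy_vectorizer.fit_transform(toy_docs)   # sparse matrix, documents x terms

toy_nmf = NMF(n_components=2, init="nndsvd", random_state=0, max_iter=500)
toy_W = toy_nmf.fit_transform(toy_tfidf)             # documents x topics
toy_H = toy_nmf.components_                          # topics x terms

print(toy_tfidf.shape, toy_W.shape, toy_H.shape)
```

Fixing `random_state` makes repeated runs comparable, which helps when topic numbers are later hard-coded as column indices.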

In [21]:
"""Get topics to documents and word to topics matrices."""

nmf_W = nmf.transform(tfidf)  # document-topic matrix: one row per abstract, one column per topic
nmf_H = nmf.components_  # topic-word matrix: one row per topic, one column per vocabulary term

violation: 1.0
violation: 0.5476233303884369
violation: 0.059274689694468134
violation: 0.01423237577474501
violation: 0.003332604999603872
violation: 0.0008879660903467299
violation: 0.0002619486792995731
violation: 6.933238294311101e-05
Converged at iteration 8


In [22]:
"""View the list of topics (10 top words per topic)"""

for topic_idx, topic in enumerate(nmf_H):
    print("Topic %d:" % (topic_idx))
    print('----------------------------')
    print(" ".join([vectorizer_feature_names[i]
                for i in topic.argsort()[:-10 - 1:-1]]))
    print('----------------------------')

Topic 0:
----------------------------
compounds inhibitors agents chemical novel activity new synthesis effects drugs
----------------------------
Topic 1:
----------------------------
training trainees faculty postdoctoral fellows scientists doctoral predoctoral medicine biology
----------------------------
Topic 2:
----------------------------
treatment intervention trial randomized efficacy cbt therapy treatments study placebo
----------------------------
Topic 3:
----------------------------
subproject institution nih center isfor andinvestigator crisp theresources entries necessarily
----------------------------
Topic 4:
----------------------------
hiv infected aids infection antiretroviral art prevention transmission msm tat
----------------------------
Topic 5:
----------------------------
cancer cancers nci pancreatic colon prevention colorectal bladder survivors members
----------------------------
Topic 6:
----------------------------
core projects investigators project core

management water soil production crop practices crops agricultural pest plant
----------------------------
Topic 57:
----------------------------
et al 2007 2005 2008 2006 2004 2009 2003 2010
----------------------------
Topic 58:
----------------------------
liver hepatic fibrosis hcc hbv nafld hepatitis fatty nash hepatocytes
----------------------------
Topic 59:
----------------------------
aging age older adults related aged changes life decline lifespan
----------------------------
Topic 60:
----------------------------
kidney renal ckd chronic cric injury aki hypertension esrd fibrosis
----------------------------
Topic 61:
----------------------------
signaling activation pathway aim pathways mechanisms induced role kinase receptor
----------------------------
Topic 62:
----------------------------
cardiac heart failure myocardial hf ventricular hypertrophy remodeling mi hearts
----------------------------
Topic 63:
----------------------------
program members programs year fun
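The slice `topic.argsort()[:-10 - 1:-1]` used in the cell above is a compact way to take the indices of the 10 largest weights in descending order. The sketch below uses made-up weights and a made-up vocabulary (both are assumptions) to show that it matches the more explicit reverse-then-truncate form.

```python
import numpy as np

toy_topic = np.array([0.1, 0.9, 0.3, 0.7, 0.0])   # made-up word weights for one topic
toy_vocab = np.array(["cell", "opioid", "pain", "drug", "soil"])
n_top = 2

# argsort() sorts ascending, so the largest weights sit at the end of the result
compact = toy_topic.argsort()[:-n_top - 1:-1]   # walk backwards from the largest
explicit = toy_topic.argsort()[::-1][:n_top]    # reverse, then keep the first n_top

print(" ".join(toy_vocab[compact]))
```

Both forms select the same indices; the explicit one is usually easier to read when the number of top words changes.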

In [23]:
"""View a top document related to a given topic"""

import numpy as np  # used by np.argsort below; harmless if already imported

for topic_idx, topic in enumerate(nmf_H):
    print('--------------------')
    print("Topic %d:" % (topic_idx))
    print('--------------------')
    # the 10 highest-weighted words for this topic
    print(" ".join([vectorizer_feature_names[i]
                    for i in topic.argsort()[:-10 - 1:-1]]))
    # index of the single document with the largest weight for this topic
    top_doc_indices = np.argsort(nmf_W[:, topic_idx])[::-1][0:1]
    for doc_index in top_doc_indices:
        print('--------------------')
        print(filtered_abstracts[doc_index])

--------------------
Topic 0:
--------------------
compounds inhibitors agents chemical novel activity new synthesis effects drugs
--------------------
1. To evaluate promising lead compounds for treating drug dependence through a broad panel of receptor and enzyme targets (Profile) in order to identify potential side effect liabilities and/or characterize the selectivity of the compounds for the desired pharmacological target set (Task 1). 2. To perform binding affinity and/or functional activity assessments of active compounds (Task 2). 3. To evaluate test compounds in a set of toxicological and pharmacokinetic assays in order to judge their potential for development as future therapeutic agents. These compounds and assays will be evaluated under Task 3. 4. Large compound libraries will be screened against a single target. (Task 4).
--------------------
Topic 1:
--------------------
training trainees faculty postdoctoral fellows scientists doctoral predoctoral medicine biology
------

--------------------
The Data Management/Statistical Core is a key component of the Center. The Core will serve as a support mechanism to the Center's overall research program. The overall objective of the Core is to facilitate the ability of the Center's investigators to conduct research that is of the highest standards and to disseminate the research outcomes to the academic and service communities and industry. The Core will provide statistical and analytical support to the research programs at all of the sites. The Core will also provide technical support for the research projects. In addition, the Core will serve as the depository for the cross-site core battery of measures. The Core will build on the structures developed in CREATE I for data collection, storage, transfer, and management, and quality control for the core battery. The methods developed in CREATE I will be further enhanced during the proposed CREATE II project to ensure that they reflect recent strategies in data ma

cognitive schizophrenia deficits measures control psychosis pfc prefrontal symptoms functional
--------------------
﻿   DESCRIPTION (provided by applicant): Cognitive deficits are an intrinsic part of schizophrenia, occurring independently of positive symptoms, and often persisting even when psychotic symptoms of schizophrenia have been successfully treated. Cognitive functioning is moderately to severely impaired in patients with schizophrenia and is typically present even in the prodromal phase of the disorder, in young drug-naïve patients. The deficits are in the domain of executive function largely controlled by the prefrontal cortex (PFC). However, there is only a fragmentary understanding of biochemical dysfunctions in brain that leads to cognitive impairment in schizophrenia. Furthermore, even though atypical antipsychotic drugs can improve certain aspects of cognition, many patients do not achieve remission. Therefore, the development of new therapeutic drugs for cognitive impa

--------------------
DESCRIPTION (provided by applicant): Hunter College proposes to continue, expand and improve its 27year old (1981) MBRS supported RISE program which has produced approximately 66 PhDs (38 in progress) from students who are underrepresented in science. In this Revision, we seek a supplement to support our Graduate Student Training program. In the competitive renewal of this grant (submitted May 2007), only the Undergraduate Research Training Program was supported. At both the undergraduate and graduate level, students who are underrepresented in science receive an intensive research experience in a laboratory engaged in nationally funded, competitive, state of the art, biomedical science research. At the undergraduate level (10 supported students), it also involves intensive mentoring by a faculty member and other enrichment activities intended to enhance science performance and to inspire and motivate students to seek advanced degrees in science (PhD) and a researc

--------------------
DESCRIPTION (provided by applicant): Type 2 diabetes is characterized by insulin resistance and disordered beta-cell function, especially a defect in glucose-stimulated insulin-secretion. Insulin receptors and signaling proteins are found within the beta-cells themselves. Recent evidence in rodents suggests the insulin signaling pathway is functional in islets and is important for glucose sensing. The physiologic role of insulin signaling within beta-cells in humans remains unknown. Our preliminary studies demonstrate that in healthy humans beta-cells are insulin sensitive tissues and raising circulating insulin levels by exogenous insulin administration can enhance glucose induced insulin secretion. We hypothesize that dysfunctional insulin signaling, i.e. insulin resistance, at the level of the beta-cell may be one mechanism underlying blunted insulin secretion in persons with type 2 diabetes and insulin resistant syndromes, which would be manifest as reduced ins

--------------------
C. Data Management and Analysis Core1. ObjectiveThe objective of the Data Management and Analysis Core is to continue to provide data managementservices and statistical expertise to Program Project investigators in a wide range of data acquisition andanalysis activities. Integral to the goals of each project is the management and analysis of Core observational,interview, questionnaire, and behavioral data, as well as management and analysis of data from the individualprojects. Data management activities draw on the considerable resources of the Data Management andAnalysis Center (DMAC) at the Frank Porter Graham Child Development Center (FPG) at UNC and TheMethodology Center (TMC) at PSD. Specific Aims of the Data Management and Analysis Core are to:1) Implement the planned missing design in conjunction with the Executive Committee.2) Develop and maintain data management strategies that process data collected in the common protocol andthe individual projects effici

--------------------
DESCRIPTION (provided by applicant): Lung diseases, including lung cancer and chronic lung diseases such as chronic obstructive pulmonary disease, together account for some 280,000 deaths annually (American Lung Association). Contributing to this mortality is the fact that remediation of all forms of lung disease is hampered by the limited ability of lung to regenerate. Hence, lung tissue that is damaged by degeneration or infection, or lung tissue that is surgically resected, is not functionally replaced in vivo. Currently, the only way to replace lung tissue is to perform lung transplantation, an expensive procedure that is achieves only a 10% survival at 10 years, and one that is hampered by a severe shortage of organs.  Over the past 3 years, we have worked to address some fundamental challenges in lung tissue engineering. In order to produce a lung scaffold that has suitable geometry and mechanics for lung regeneration, we have developed technologies to decell

dna repair damage replication methylation recombination genome strand dsb chromatin
--------------------
Title: Biological physics of DNA bending  DNA is one of the essential molecules of life, carrying the genetic information of a cell.  The mechanical properties of DNA are related to the ability of different protein complexes to bind to DNA, and control gene expression and DNA repair.  Understanding DNA mechanics, especially at short length scales, ends up being more complicated than the mechanics of more classic materials such as an elastic rod, given the double-helical nature and inhomogeneous building blocks of DNA.  This research will produce physical insights that will advance our understanding of the relationship between DNA mechanics and several important DNA-related biological processes. In this project, the PI will investigate how sharp DNA bending promotes tight packaging of the genome, local defects in the double helix, and protein transport between DNA sites, all features

--------------------
DESCRIPTION (provided by applicant): Abuse of cocaine is a widespread and severely deleterious public health problem. Although it is well established that there is substantial variability in an individual's response to cocaine exposure, we do not yet understand the mechanisms that mediate individual differences in the etiology of cocaine abuse and addiction. In particular, we do not understand why some individuals are resistant to the effects of cocaine and will consume cocaine at low rates whereas other individuals are relatively vulnerable to the effects of cocaine and will consume cocaine at high rates. In the present proposal, we will evaluate several novel biomarkers that may predict the vulnerability of an individual to abuse cocaine or to relapse following abstinence. We will evaluate these processes using drug self-administration (SA) and reinstatement of previously drug-maintained behavior in nonhuman primates, as these are well established animal models o

women sex pregnancy men female reproductive sexual male differences estrogen
--------------------
This report includes work arising from the following clinical protocols: NCT00026832, NCT00100360, NCT00001177, and NCT00001322.Behavioral observations from this and a related study in healthy young men, show that clinically significant depressive symptoms are rare accompaniments of induced hypogonadism in these healthy premenopausal women and men. Although hypogonadism is accompanied by hot flushes and disturbed sleep in approximately 80% and decreased sexual function in approximately 30% of these women, neither night-time hot flushes nor disturbed sleep are sufficient to cause depressive symptoms in more than 5% of hypogonadal young women (or men).  Thus this paradigm serves as an excellent comparison group for women with reproductive endocrine-related mood disorders who undergo identical hormone manipulations. Additionally, in a naturalistic study, we have demonstrated that healthy prem

oa cartilage knee joint osteoarthritis articular acl chondrocytes progression chondrocyte
--------------------
DESCRIPTION (provided by applicant): Osteoarthritis (OA) is the most common form of joint disease and a major cause of long-term disability in the United States (US). It is estimated that 2.5% of the adult population have symptomatic knee or hip OA. Over two-thirds of the 7.8 million OA patients in the US who seek treatment have moderate to severe joint involvement and would benefit from a therapy which arrests or delays cartilage loss. The etiology of OA is still partially unclear: While genetic factors are believed to underlie a significant proportion of OA cases, the majority of occurrences may not be genetically predetermined. OA is influenced by diet, body condition, or physical stress experienced (due to injury or overuse of a joint). Patient condition may therefore likely be improved or further progression prevented by an early identification of OA progression, combined

--------------------
﻿   DESCRIPTION (provided by applicant):    The regulation of gene expression dictates when, where, and to what degree protein isoforms are generated, thereby meeting the metabolic needs of various cell types that make up organs and living beings. Given the constantly changing microenvironment, this process is highly dynamic, capable of adjusting instantly to multiple external cues, or mounting gene expression responses within extended periods of time. To achieve diverse gene expression responses cells rely on multiple points of regulation that include modulating the origin and rate of transcription,altering pre-mRNA processing to induce the generation of alternatively spliced mRNA isoforms, changing the length or location of the 3' UTR through alternative polyadenylation, selective mRNA isoform translation, and altering the stability of mRNA pools. All of these gene expression steps are integrated and co-dependent. However, our understanding of the dynamic nature 

--------------------
DESCRIPTION (provided by applicant): Wound-healing complications are an important health concern that can be associated with diabetes, bed sores, and infection. The inability to form a stable provisional matrix over the wound site is a common hallmark of poor wound healing. Without a stable matrix, the migration of inflammation responsive cells such as endothelial cells, neutrophils and macrophages, needed to produce new blood vessels1 and fight infections is not possible. The goal of this project is to develop an imaging method to quantify the stability of theprovisional matrix during early-stage wound healing. This biological process has been largely invisible and the proposed work is expected to provide significant insight into molecular events that delay wound healing. Wound healing involves an intricate set of precisely timed processes that begin with the formation of a fibrin clot, followed by an inflammatory response, and eventually ending with extensive tis

--------------------
DESCRIPTION (provided by applicant): Human infants are confronted with a complex world that is filled with ambiguity. Not only are many different features and dimensions of information present in the environment, but these cues are often unrelated to any reinforcement or feedback. There are two solutions to learning in a complex and ambiguous environment: (a) innate constraints on the cues selected for processing (bottom-up), or (b) rapid learning-to-learn mechanisms that assess cues (top-down). Learned top-down mechanisms of information selection may be tuned more to specific task demands, and thus more useful for learning. Given how much infants have to learn over the first two years of life, it is not efficient to use mainly slow but precise (top-down) search methods. My hypothesis is that the developmental progression of learning how to learn requires using bottom-up information in a systematic way, while creating top-down buffers against bottom- up distraction

--------------------
DESCRIPTION (provided by applicant): Identifying genetic mechanisms that are protective or detrimental to age-dependent events would have profound consequences for preventing or delaying age-related functional declines. Studies on rare heritable human disorders and targeted gene mutations in rodents demonstrate that both the rate of age-related changes and longevity can be modulated through genetic mutation in single genes. In particular, there is a critical role of longevity genes that modulate hormonal, metabolic, and cellular insult repair pathways for the rate of aging in model organisms and potentially also in humans. While the genetics of aging has been predominantly studied in dividing peripheral tissues in relation to senescence, or in the context of neurodegenerative disorders, here we propose to explore a less studied area,  normal  human brain aging. We present a concise approach to defining age in the human central nervous system (CNS) using postmortem 

skin melanoma cutaneous uv keratinocytes epidermal hair uvb human dermatology
--------------------
DESCRIPTION (provided by applicant): In the first two funding cycles of this grant, we have characterized the molecular structure and function of Cutaneous Lymphocyte Antigen or CLA. We have also made the surprising observation that in humans, the vast majority (>90 percent) of CLA+ skin homing T cells are not in peripheral blood, but rather are in skin at any given time. These cells appear to enter skin directly from blood, through constitutively expressed E selectin, chemokine ligands, and LFA-1 on dermal microvasculature. These observations suggest that memory responses in skin to antigens originally encountered through skin may not require acute extravasation of circulating memory T cells. To more fully define the immunophysiology of these phenomena, we have used murine models of cutaneous infection with vaccinia virus to track a T cell mediated, antigen-specific immune response from 

--------------------
This subproject is one of many research subprojects utilizing the resourcesprovided by a Center grant funded by NIH/NCRR. Primary support for the subprojectand the subproject's principal investigator may have been provided by other sources,including other NIH sources.  The Total Cost listed for the subproject likelyrepresents the estimated amount of Center infrastructure utilized by the subproject,not direct funding provided by the NCRR grant to the subproject or subproject staff.Preparation of the present report.
--------------------
Topic 69:
--------------------
spinal sci cord injury motor recovery bladder cervical locomotor functional
--------------------
--------------------
Topic 70:
--------------------
clinical trials trial network studies translational ctn protocol phase conduct
--------------------
DESCRIPTION (provided by applicant): Our objective is to become a participating Clinical Research Site in the Network for Excellence in Neuroscience Clinical 

--------------------
DESCRIPTION (provided by applicant): Peripheral sensory neurons are responsible for detecting chemical, thermal and mechanical stimuli in the skin. Their ability to recognize and process these touch sensations is influenced by the territories in which their peripheral arbors innervate the skin, how they connect in the central nervous system, and their ability to relay information to downstream circuits. Defects in any of these components can result in a range of debilitating disorders collectively known as peripheral neuropathies. Extensive research has gone into cataloguing the different subtypes of sensory neurons based on the expression of specific molecular markers. However, the mechanisms by which sensory neuron subtypes identify process and transmit different kinds of sensory information are still largely unknown. We have discovered that two subclasses of peripheral sensory neurons in zebrafish larvae can be distinguished by their pattern of axon projection i

--------------------
DESCRIPTION (provided by applicant):     Although most patients are believed to have mild-to-moderate asthma that responds to inhaled corticosteroids, there are subpopulations of asthma patients with severe disease whose symptoms and control are largely unresponsive to treatment including systemic corticosteroids. Severe asthma affects 10% of all asthmatic subjects, but these patients have greater morbidity and a disproportionate need for health care support. The mechanisms of severe asthma are not determined, but respiratory infections cause the majority of asthma exacerbations and have profound and potentially long-lasting, effects on asthma including its severity and response to treatment. In attempting to evaluate mechanisms of severe asthma, striking similarities exist between features of severe asthma and the response to respiratory infections in asthma, including intensity of airflow obstruction, neutrophilic inflammation, and a diminished response to cortic

--------------------
Depression is a common psychiatric illness that often begins in adolescence and adds to high mental health costs in the United States. Some depression can be normal and short-lived, but having a depressive disorder in adolescence can lead to other mental illnesses such as anxiety disorders, substance abuse, and suicide in adolescence and adulthood. Thus, it is important to study depression during adolescence in order to identify risk factors for depressive disorders before functioning is impaired. The ways in which the body physiologically responds and adapts to stress has been linked with the development of depression. Little is known how these biological processes work in Mexican-origin youth, which is important given that Mexican-origin adolescents show higher rates of depression symptoms and diagnoses relative to peers of other racial/ethnic groups and suicide is the third leading cause of death among 10-20 year old Hispanics in the U.S. If it is not treated ea

--------------------
DESCRIPTION (provided by applicant): The overall goal of The Clinical Profile of Parkinson's Disease (PD) Pathology, is to characterize the clinical profile of PD pathology in older person's without a diagnosis of PD. Showing that PD pathology is associated with a distinct and progressive condition among persons without a clinical diagnosis of PD, would have a transformative effect on PD studies. Although, PD only affects up to 5% of persons by age 85, compelling preliminary data shows that indices of PD pathology including nigral degeneration and Lewy bodies are present in nearly 20% of older persons without PD and are associated with the severity of parkinsonism proximate to death. This suggests that PD pathology is common and causes clinical signs in persons who do not meet clinical criteria for PD. Like the recent reclassification of AD, PD may also have an asymptomatic PD pathology phase, followed by a stage in which PD pathology results in mild motor and non-

--------------------
Biologists have been successful in revealing many of the molecular components involved in cell biology. Attention is now moving from identification of components to understanding the emergent, or system, behavior of the connected components. It has been hoped that general principles would emerge via the study of mathematical models of specific systems, but to date this form of study has not generally proven effective at discovering the hidden principles of biology. How the complex networks found in biological systems produce their emergent properties and behaviors remains elusive. Theoretical mathematics offers a possible route forward, and one which could, in time, have a profound influence on biology. This research project aims to cut through the complexity of biological models to elucidate the general principles of cell biology. Additionally, the project aims to develop new computational methods that can address currently infeasible problems. There is high deman

--------------------
﻿   DESCRIPTION (provided by applicant): Inflammatory bowel diseases are chronic and progressive inflammatory disorders that affect more than 1.4 million Americans. There is currently no cure for IBD, and over two-thirds of IBD patients become refractory to treatment with traditional therapies at some point in their lives and require surgical bowel resection. IBD i driven by inappropriate immune responses to the intestinal microbiota, which consists of trillions of bacteria that constitutively colonize the human intestine. The human gut microbiota shows considerable interindividual complexity and diversity with each person displaying a unique consortium of hundreds of bacterial species. Studies in mouse models of IBD have shown that select members of the microbiota have dramatic effects on colitis: some species exacerbate colitis by driving chronic inflammatory responses, while others protect against colitis by inducingimmunoregulatory responses. However, identifyi

--------------------
Our work focuses on two major research themes. One theme attempts to elucidate normative psychological and neurobiological mechanisms of fear and anxiety in healthy individuals in order to identify which of these mechanisms are perturbed in anxiety disorders. Anxiety is an adaptive response to threat that enables organisms to efficiently confront challenges either via rapid flight or fight responses (fear) or via sustained increased vigilance and behavioral inhibition (anxiety). When anxiety is evoked by anticipation of an unpleasant stimulus such as a mild electric shock (induced-anxiety), our sensorimotor system becomes primed to detect and react quickly to any threat. Our past work shows that this effect is mediated by increased communication between an aversive amplification circuit that includes the dorsal-medial prefrontal cortex (dmPFC) and the amygdala.  Phasic increased dmPFC-amygdala communication is responsible for our ability to rapidly detect and react

--------------------
Hepatitis C virus is both difficult to treat and a significant cause of morbidity and mortality. It is not understood how hepatitis C establishes infection nor how some patients clear infection. To this end 83 humans exposed to hepatitis C were studied in protocol 00-DK-0221. The immune response (both antibody and T cell) was closely followed and is being characterized. The study is in collaboration with Dr. Rehermann. As an extension of this 28 patients with acute hepatitis C have been followed and treated as indicated. Twenty-five patients (16 women; mean age 43 years) had adequate follow up to be included in an analysis.  Symptoms in the acute phase were reported by 80% and jaundice by 40%. Two patients (8%) developed ascites and acute liver failure but both survived.  Genotype 1 was most frequent (72%) but many patients (20%) could not be genotyped.   Serum aminotransferase levels and HCV RNA levels fluctuated greatly; 18% of patients were intermittently negati

Four related topics were identified in a second round of topic modeling with 100 topics.

In [24]:
"""Get abstracts with 4 related topics."""

topics_weights = []
for index, weights in enumerate(nmf_W):  # one row of nmf_W per document
    # keep the document position plus the weights of the 4 related topics (8, 11, 33, 53)
    topics_weights.append([index, weights[8], weights[11], weights[33], weights[53]])

# columns: 0 = document position, 1-4 = weights of the 4 related topics
topics_weights_dataframe = pd.DataFrame(topics_weights)

"""Sum the weights for all 4 related topics per document."""

topics_weights_dataframe['sum'] = topics_weights_dataframe[[1, 2, 3, 4]].sum(axis=1)
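
The same selection can be sketched without an explicit loop. This is only an illustration: it assumes `nmf_W` is a two-dimensional NumPy array with one row per document and one column per topic, and it uses a random stand-in matrix (`nmf_W_demo`, a hypothetical name) rather than the real model output.

```python
import numpy as np
import pandas as pd

# Stand-in for nmf_W (assumption): 5 documents x 100 topics of non-negative weights.
rng = np.random.default_rng(0)
nmf_W_demo = rng.random((5, 100))

related_topics = [8, 11, 33, 53]  # topic indices used in the cell above

# Select the four topic columns in one step and label them 1-4, as in the notebook.
weights_demo = pd.DataFrame(nmf_W_demo[:, related_topics], columns=[1, 2, 3, 4])

# Column 0 holds the document position, matching the loop-based version.
weights_demo.insert(0, 0, weights_demo.index)

# Row-wise sum of the four topic weights.
weights_demo['sum'] = weights_demo[[1, 2, 3, 4]].sum(axis=1)
```

Column slicing keeps the topic indices in one list, so changing the set of related topics means editing a single line.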

In [32]:
filtered.columns = ['index','PROJECT_ID','ABSTRACT','index_copy',1,2]

In [37]:
filtered_abstracts = pd.DataFrame(filtered_abstracts)

In [38]:
filtered_abstracts = filtered_abstracts.reset_index()

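In [None]:
# Resetting the index a second time adds a 'level_0' column (the name pandas uses
# when an 'index' column already exists); the merge below joins on that column.
filtered_abstracts = filtered_abstracts.reset_index()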

In [53]:
"""Keep only the abstracts whose summed weight is at least 0.01."""

filtered_by_threshold = topics_weights_dataframe[topics_weights_dataframe['sum'] >= 0.01]

"""Merge with the original dataframe. Make sure that level_0 column is in both dataframes to merge on."""

filtered_by_threshold = filtered_by_threshold.rename(columns={0:'level_0'})
filtered_by_threshold_finalized = filtered_by_threshold.merge(filtered_abstracts,on='level_0')
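
A minimal sketch of the filter-and-merge step on toy data may help when checking the result. The dataframes and values here (`abstracts_demo`, `weights_demo`) are hypothetical stand-ins, and `validate='one_to_one'` is an optional safeguard added for illustration, not part of the original cell.

```python
import pandas as pd

# Hypothetical stand-in for filtered_abstracts: one unnamed text column (label 0).
abstracts_demo = pd.DataFrame(["abstract a", "abstract b", "abstract c"])

# Two reset_index calls: the first adds 'index', the second adds 'level_0'.
abstracts_demo = abstracts_demo.reset_index().reset_index()

# Hypothetical stand-in for topics_weights_dataframe (document position + summed weight).
weights_demo = pd.DataFrame({0: [0, 1, 2], 'sum': [0.02, 0.0, 0.01]})

# Keep documents with a summed weight of at least 0.01, then join on document position.
kept = weights_demo[weights_demo['sum'] >= 0.01].rename(columns={0: 'level_0'})
merged_demo = kept.merge(abstracts_demo, on='level_0', validate='one_to_one')
```

With `validate='one_to_one'`, pandas raises a `MergeError` if either side contains a repeated `level_0` value, which would indicate that the positions no longer line up.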

In [54]:
filtered_by_threshold_finalized = filtered_by_threshold_finalized.rename(columns={0:'ABSTRACT'})

In [55]:
filtered_by_threshold_updated = filtered_by_threshold_finalized.merge(filtered, on='ABSTRACT')
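
Merging on the abstract text is a many-to-many join whenever the same abstract appears under more than one project, so rows can multiply. The sketch below uses hypothetical data to show the effect and how selecting the needed columns and dropping duplicates brings the row count back down.

```python
import pandas as pd

# Hypothetical data: the abstract "a" is shared by two projects.
left_demo = pd.DataFrame({'ABSTRACT': ['a', 'a', 'b']})
right_demo = pd.DataFrame({'PROJECT_ID': [1, 2, 3], 'ABSTRACT': ['a', 'a', 'b']})

# Each "a" on the left matches both "a" rows on the right: 2 x 2 + 1 = 5 rows.
joined_demo = left_demo.merge(right_demo, on='ABSTRACT')

# Keeping only the identifying columns and dropping repeats leaves 3 unique rows.
unique_demo = joined_demo[['PROJECT_ID', 'ABSTRACT']].drop_duplicates()
```

Comparing `len(joined_demo)` with `len(unique_demo)` is a quick way to see how much duplication the text-based join introduced.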

In [63]:
filtered_by_threshold_updated = filtered_by_threshold_updated[['PROJECT_ID','ABSTRACT']]

filtered_by_threshold_updated = filtered_by_threshold_updated.drop_duplicates()

"""Export results to .CSV"""

filtered_by_threshold_updated.to_csv('Filtered_Results_Abstracts.csv')
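
By default, `to_csv` also writes the dataframe's row index as an unnamed first column. If that column is not wanted in the exported file, passing `index=False` is one option. The sketch below uses hypothetical data and a temporary file, not the notebook's actual results.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the final results table.
results_demo = pd.DataFrame({
    'PROJECT_ID': [101, 102],
    'ABSTRACT': ['abstract a', 'abstract b'],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'demo_results.csv')
    results_demo.to_csv(path, index=False)  # omit the row index from the file
    reloaded = pd.read_csv(path)            # read it back to check the columns
```

Reading the file back confirms that only `PROJECT_ID` and `ABSTRACT` were written.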