In [1]:
#import tensorflow as tf
import pandas as pd
from gensim import corpora
import re
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
import string
from tqdm import tqdm
from gensim.corpora.dictionary import Dictionary
from gensim.models.ldamodel import LdaModel
import pandas as pd
import nltk
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.corpus import stopwords, wordnet
from nltk.stem import WordNetLemmatizer
!pip install scikit-learn
import sklearn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import LatentDirichletAllocation
import torch
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
#from transformers import BertTokenizer, BertModel
from transformers import pipeline



  from .autonotebook import tqdm as notebook_tqdm


## NLP Project Walkthrough Notebook:
# Creating an Academic Paper Browsing Tool using Natural Language Processing Techniques

### 5/16/2023
### Jeffrey Bailey

#### Overview:
For this project, I want to make a model or set of models and tools that can be used to easily parse through dense academic papers and assist with finding interesting papers for the user to read. These are to be packaged into a prototype "app" that runs in python, that demonstrates how the models and created database could be used to develop this paper-browsing tool.

Specifically, I am making a transformer-based summarization model, doc2vec embeddings model, and topic selection model to make a database of scientific articles that includes short summaries, preprocessed versions of article text, vectorized document embeddings, and topic tags. This database will then be used by the set of functions and prototype app to create our paper-browser.

#### About the Data
I am using a very large set of data taken from the huggingface "datasets" library. It contains nearly 200,000 full academic articles from the arXiv repository of math and physics papers. To create my pipeline and prototype app, I used a random subset of 2,500 articles so I could quickly test different models and code architectures.

![pipeline](https://raw.githubusercontent.com/jbailey424/340_03_Project/main/pipeline.png)


In [2]:
df = pd.read_csv("scientific_papers_arxiv.csv")
df = df.sample(n=2500, random_state=42)
df = df.reset_index().drop('index', axis = 1)


In [3]:
#this is how our data started!
df.head()

Unnamed: 0,article,abstract,section_names
0,two decades ago bender and boettcher have foun...,we study the effect of @xmath0-symmetric comp...,introduction\nmodels\nnon-hermitian @xmath0-sy...
1,consider a family of bounded domains @xmath0 i...,consider a family of bounded domains @xmath0 ...,introduction\ncircle to square homotopy\nsierp...
2,"several years ago , an interesting observation...",we consider collision of two massive particle...,introduction\nenergy in cm frame versus killin...
3,the parallel chip - firing game or candy - pas...,the parallel chip - firing game is a periodic...,introduction\nparallel chip-firing on simple c...
4,"human physiology , as a science , aims to unde...",studying physiology over a broad population f...,introduction\nmaterials and methods\nresults\n...


## Part 1: Preprocessing!
Before any models could be made, text had to be preprocessed. To do this, I made a preprocessing function which removes formatting tags, non-alphanumeric or punctuation characters, converts text to lowercase, and then tokenizes it. Stopwords are then removed, and the tokens are lemmatized using the wordnet lemmatizer and part-of-speech tagger.

In [6]:
def preprocess(text):
    if isinstance(text, str):
        text = re.sub(r'(@xmath\d+|\\n|@xcite)', '', text) # remove formatting and special character tags
        text = re.sub(r'[^a-zA-Z0-9\s.,;:?!-]', '', text) # retain punctuation, remove other non-alphanumeric characters
        tokens = word_tokenize(text.lower())  # convert to lowercase before tokenizing
        tokens = [token for token in tokens if token not in stop_words] # drop stop words
        tokens = [lemmatizer.lemmatize(token, get_wordnet_pos(token)) for token in tokens] # use a POS tagging function for more accurate lemmas
        preprocessed_text = ' '.join(tokens)
        return preprocessed_text
    else:
        return ''

def get_wordnet_pos(word):
    tag = pos_tag_dict.get(nltk.pos_tag([word])[0][1][0].upper(), wordnet.NOUN)
    return tag

#define the stop_word list and lemmatizer
stop_words = set(stopwords.words('english'))
lemmatizer = WordNetLemmatizer()

#define POS dictionary
pos_tag_dict = {
    "J": wordnet.ADJ,
    "N": wordnet.NOUN,
    "V": wordnet.VERB,
    "R": wordnet.ADV
}

preprocessed_articles = []
for article in tqdm(df['article']):
    preprocessed_article = preprocess(article)
    preprocessed_articles.append(preprocessed_article)

df['articlepreprocessed'] = preprocessed_articles

100%|██████████| 2500/2500 [1:30:38<00:00,  2.18s/it]  


## Part 2: Embeddings

After text is preprocessed, the first step towards making my functions is to make vectorized embeddings for the documents. This was done using the gensim Doc2Vec model, which can be imported pre-trained. With some light coding and waiting, we can generate vector embeddings for every document!

In [114]:
from gensim.models.doc2vec import TaggedDocument
from gensim.models.doc2vec import Doc2Vec

progress_bar = tqdm(total=len(df), desc="Creating Tagged Documents")
documents = []

#first, we need to tag the documents using the gensim function
for i, doc in enumerate(df['articlepreprocessed']):
    tagged_doc = TaggedDocument([str(tok) for tok in doc], [str(i)])
    documents.append(tagged_doc)
    progress_bar.update(1)

progress_bar.close()

#make a doc2vec model from our vectorized tokenized articles
doc2vecmodel = Doc2Vec(documents, vector_size=100, window=5, min_count=1, epochs=10)

doc2vecmodel.train(documents, total_examples=doc2vecmodel.corpus_count, epochs=doc2vecmodel.epochs)

#create doc vectors from the trained model and put them in our df
doc2vecs = []
for i in range(len(documents)):
    doc2vecs.append(model.dv[str(i)])
doc2vecs

Creating Tagged Documents: 100%|██████████| 2398/2398 [00:07<00:00, 305.93it/s]


[array([ 0.14345333, -0.5762746 , -0.6705191 ,  1.749033  , -0.08577247,
         0.9452673 , -0.7761058 ,  0.22611226, -1.0667661 ,  0.5009656 ,
         0.5620699 , -1.0295006 , -0.62735975,  0.12629114,  0.6831548 ,
        -0.5607723 ,  1.1495582 , -0.12214264, -0.08829594, -0.6494633 ,
         0.08825265, -0.33638024, -0.41168812,  0.10715834,  0.997938  ,
        -0.49096358, -1.3144943 ,  1.3361001 , -0.37984622, -0.168224  ,
         0.04087993,  0.83333206, -0.569373  ,  0.482072  , -0.5172734 ,
         0.35694447, -0.31452212,  0.15081766, -0.01062046, -0.4480339 ,
        -0.3882343 , -0.68731654, -0.41067213,  0.14214025,  0.8019014 ,
         1.2323159 , -1.1215405 ,  0.11879256,  0.0066396 , -0.3702025 ,
        -1.2418287 , -0.23889193, -0.99922574, -0.41744503, -0.31699383,
        -0.15924625, -0.40653506,  0.7826689 ,  0.94256425,  0.01129316,
         0.08220161, -0.43269092, -0.16232117,  0.8940847 ,  0.6492745 ,
        -0.05660123, -0.4107487 ,  0.5118683 , -0.8

In [70]:
#create doc vectors from the trained model and put them in our df
doc2vecs = []
for i in range(len(documents)):
    doc2vecs.append(doc2vecmodel.dv[str(i)])
doc2vecs

[array([-5.11873662e-01,  4.37892973e-01, -1.02749634e+00,  7.11872354e-02,
        -7.22468734e-01, -5.36378026e-01, -7.49973655e-01, -1.01734541e-01,
         4.81254309e-01, -2.00443640e-01,  1.83207631e-01, -7.63724521e-02,
        -9.00939524e-01,  3.01085174e-01,  8.07661116e-02, -6.63589779e-03,
         7.76158333e-01, -9.71933454e-02, -9.40027297e-01, -5.38942099e-01,
        -8.43590438e-01,  4.54482228e-01, -3.07552129e-01, -4.52302992e-02,
        -3.15270424e-01,  2.29428887e-01, -6.32156670e-01, -9.30249393e-01,
         6.56355143e-01, -1.37349117e+00,  1.38888335e+00,  6.23860776e-01,
        -6.80528939e-01,  2.00898498e-01, -1.72575474e-01,  5.25241077e-01,
        -3.07773024e-01, -8.65580082e-01,  1.01616874e-01, -1.04079962e-01,
         2.45870620e-01, -6.51523829e-01,  1.29366433e-03, -3.20173562e-01,
         3.35324883e-01, -4.39218313e-01, -1.30593288e+00,  3.05166036e-01,
         3.23403269e-01,  2.24625245e-01,  5.42919576e-01, -4.74976420e-01,
        -7.9

In [8]:
#df['doc2vec'] = doc2vecs
df.head()
#our dataframe is filling up :)

Unnamed: 0,index,article,abstract,articlepreprocessed,doc2vec
0,0,two decades ago bender and boettcher have foun...,we study the effect of @xmath0-symmetric comp...,two decade ago bender boettcher found broad fa...,"[0.14345333, -0.5762746, -0.6705191, 1.749033,..."
1,1,consider a family of bounded domains @xmath0 i...,consider a family of bounded domains @xmath0 ...,consider family bound domain plane generally e...,"[-0.47792354, 0.26594213, 0.5366033, -0.604406..."
2,2,"several years ago , an interesting observation...",we consider collision of two massive particle...,"several year ago , interest observation make ,...","[0.04416266, -0.8174544, 0.36274335, 0.1008200..."
3,3,the parallel chip - firing game or candy - pas...,the parallel chip - firing game is a periodic...,parallel chip - fire game candy - passing game...,"[1.1534754, -0.6944586, -0.255986, -0.13721564..."
4,4,"human physiology , as a science , aims to unde...",studying physiology over a broad population f...,"human physiology , science , aim understand me...","[0.732659, 0.7404116, -0.07270055, -0.15618636..."


## Part 3: Topic Generator!
The next model we will utilize is an LDA model, which will be used for topic identification and tagging. After this, I will have preprocessed tokens, document vectors, and topic tags for every document. I found that the LDA model makes funky results when documents include alphanumeric characters and the single-letter words frequently used to refer to figures or references, so we do some specialized preprocessing before giving our documents to the LDA model. 

In [17]:
# Create a dictionary representation of the articles
articles = df['articlepreprocessed']

def preprocessforlda(text):
    if isinstance(text, str):
        text = re.sub(r'(@xmath\d+|\\n|@xcite)', '', text)  # remove formatting and special character tags
        text = re.sub(r'[^a-zA-Z0-9\s]', '', text)  # remove non-alphanumeric characters
        text = re.sub(r'\b\w{1}\b', '', text)  # remove single-character words
        return text.strip()  # strip leading/trailing whitespaces
    else:
        return ''

preprocessed_for_lda = []
for article in tqdm(articles):
    preprocessed_article = preprocessforlda(article)
    tokens = list(tokenize(preprocessed_article))  # tokenize the preprocessed text
    preprocessed_for_lda.append(tokens)

articles = preprocessed_for_lda


100%|██████████| 2498/2498 [00:09<00:00, 251.32it/s]


In [None]:
from gensim import models

#create a BOW corpus of the articles
dictionary = corpora.Dictionary(articles)
corpus = [dictionary.doc2bow(tokens) for tokens in articles]

#initiate and train the model
num_topics = 10
lda_model = models.LdaModel(corpus, num_topics=num_topics, id2word=dictionary, passes=10)

#assign topics to the articles
df['topic_tag'] = None
for i, doc in enumerate(corpus):
    topic_dist = lda_model[doc]
    sorted_topics = sorted(topic_dist, key=lambda x: x[1], reverse=True)
    top_topic = sorted_topics[0][0]
    df.at[i, 'topic_tag'] = top_topic


In [141]:
import pyLDAvis.gensim

#visualize the LDA model
lda_vis = pyLDAvis.gensim.prepare(lda_model, corpus, dictionary)
pyLDAvis.display(lda_vis)


If you want to see a cool visualization of the topic model, expand the above cell!

In [139]:
#now, i would like to replace the numerical topic tags with "descriptive" titles
#first we sort articles into their categories and make a list of keywords for each topic
from collections import defaultdict

topic_keywords = defaultdict(list)
corpus = [dictionary.doc2bow(tokens) for tokens in articles]
for index, doc in enumerate(corpus):
    topic_distribution = lda_model[doc]
    dominant_topic = max(topic_distribution, key=lambda x: x[1])[0]
    topic_keywords[dominant_topic].extend(doc)

In [150]:
#generate "descriptive" titles for each topic based on the extracted keywords
topic_titles = {}
for topic, keywords in topic_keywords.items():
    top_keywords = sorted(keywords, key=lambda x: x[1], reverse=True)[:5]
    title = ' '.join([dictionary[word_id] for word_id, _ in top_keywords])  #generate a title by combining the top keywords
    topic_titles[topic] = title

#print the descriptive titles for each topic
for topic, title in topic_titles.items():
    print(f"Topic {topic}: {title}")


Topic 3: laser fig receptor pe one
Topic 4: e model program one thr
Topic 0: disc solution event model mass
Topic 9: oalignmedskip element vertex curve smallmatrix
Topic 5: d p f ii v
Topic 8: qquad theory right cdots field
Topic 7: j e q j hop
Topic 6: velocity disk distribution planet m
Topic 1: de de field de et
Topic 2: reaction fig mass cluster time


In [20]:
topic_titles
#so these titls dont make the most sense, but they're better than meaningless numbers

{3: 'laser fig receptor pe one',
 4: 'e model program one thr',
 0: 'disc solution event model mass',
 9: 'oalignmedskip element vertex curve smallmatrix',
 5: 'd p f ii v',
 8: 'qquad theory right cdots field',
 7: 'j e q j hop',
 6: 'velocity disk distribution planet m',
 1: 'de de field de et',
 2: 'reaction fig mass cluster time'}

In [21]:
df['topic_title'] = df['topic_tag'].map(topic_titles)
df.head()

Unnamed: 0,index,article,abstract,articlepreprocessed,doc2vec,topic_tag,summary,topic_title
0,0,two decades ago bender and boettcher have foun...,we study the effect of @xmath0-symmetric comp...,two decade ago bender boettcher found broad fa...,"[0.14345333, -0.5762746, -0.6705191, 1.749033,...",3,We study the effect of @xmath0-symmetric compl...,laser fig receptor pe one
1,1,consider a family of bounded domains @xmath0 i...,consider a family of bounded domains @xmath0 ...,consider family bound domain plane generally e...,"[-0.47792354, 0.26594213, 0.5366033, -0.604406...",4,Consider a family of bounded domains @xmath0 i...,e model program one thr
2,2,"several years ago , an interesting observation...",we consider collision of two massive particle...,"several year ago , interest observation make ,...","[0.04416266, -0.8174544, 0.36274335, 0.1008200...",0,Collision of two massive particles in the equa...,disc solution event model mass
3,3,the parallel chip - firing game or candy - pas...,the parallel chip - firing game is a periodic...,parallel chip - fire game candy - passing game...,"[1.1534754, -0.6944586, -0.255986, -0.13721564...",9,The parallel chip - firing game is a periodic ...,oalignmedskip element vertex curve smallmatrix
4,4,"human physiology , as a science , aims to unde...",studying physiology over a broad population f...,"human physiology , science , aim understand me...","[0.732659, 0.7404116, -0.07270055, -0.15618636...",5,The electronic health record ( ehr ) data prom...,d p f ii v


## Part 4: Summary Generator
For the final model, the summary generator, I used a pretrained BART transformer model. This was downloaded pretrained from the spacy transformers library, and used to generate short summaries from the article abstracts. While abstracts are themselves short summaries, for this browsing tool I need summaries that can be understood at a glance. Using the BART model, I generated new summaries by condensing the articles' abstracts into one or two sentences. 

While many of the summaries were just trimmed and shortened versions of the raw abstract text, some of them are surprisingly robust and original summaries!

In [63]:
from transformers import pipeline

summarizer = pipeline(
    "summarization",
    model="facebook/bart-large-cnn",
    max_length=50,
    min_length=20,
    do_sample=False
)

text = df['abstract'][3]
summary = summarizer(text)[0]['summary_text']
print(summary)


Some weights of the model checkpoint at allenai/scibert_scivocab_uncased were not used when initializing BertModel: ['cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.bias', 'cls.predictions.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [17]:
from tqdm import tqdm
summary_list = []

#iterate through the abstracts and generate summaries
for abstract in tqdm(df['abstract']):
    abstract = abstract[:1023]
    summary = summarizer(abstract)[0]['summary_text']
    summary_list.append(summary)

#add the summaries to a new column 'summary' in the DataFrame
df['summary'] = summary_list

#this code was ran on a separate google colab notebook while I had local python issues, and then joined back onto our df

  0%|          | 0/2498 [00:00<?, ?it/s]


NameError: name 'summarizer' is not defined

In [158]:
df.head()

Unnamed: 0,index,article,abstract,articlepreprocessed,doc2vec,topic_tag,summary,topic_title
0,0,two decades ago bender and boettcher have foun...,we study the effect of @xmath0-symmetric comp...,two decade ago bender boettcher found broad fa...,"[0.14345333, -0.5762746, -0.6705191, 1.749033,...",3,We study the effect of @xmath0-symmetric compl...,laser fig receptor pe one
1,1,consider a family of bounded domains @xmath0 i...,consider a family of bounded domains @xmath0 ...,consider family bound domain plane generally e...,"[-0.47792354, 0.26594213, 0.5366033, -0.604406...",4,Consider a family of bounded domains @xmath0 i...,e model program one thr
2,2,"several years ago , an interesting observation...",we consider collision of two massive particle...,"several year ago , interest observation make ,...","[0.04416266, -0.8174544, 0.36274335, 0.1008200...",0,Collision of two massive particles in the equa...,disc solution event model mass
3,3,the parallel chip - firing game or candy - pas...,the parallel chip - firing game is a periodic...,parallel chip - fire game candy - passing game...,"[1.1534754, -0.6944586, -0.255986, -0.13721564...",9,The parallel chip - firing game is a periodic ...,oalignmedskip element vertex curve smallmatrix
4,4,"human physiology , as a science , aims to unde...",studying physiology over a broad population f...,"human physiology , science , aim understand me...","[0.732659, 0.7404116, -0.07270055, -0.15618636...",5,The electronic health record ( ehr ) data prom...,d p f ii v


In [16]:
#observe this example of a very long and annoyingly wordy abstract, condensed into a two sentence blurb!
#while the transformer does cut itself off when it reaches the length limit and leaves cut-off sentences, the summaries are still useful
print(df['abstract'][4])
print('\n', df['summary'][4])

 studying physiology over a broad population for long periods of time is difficult primarily because collecting human physiologic data is intrusive , dangerous , and expensive . 
 one solution is to use data that has been collected for a different purpose . 
 electronic health record ( ehr ) data promise to support the development and testing of mechanistic physiologic models on diverse populations and allow correlation with clinical outcomes , but limitations in the data have thus far thwarted such use . 
 for example , using uncontrolled population - scale ehr data to verify the outcome of time dependent behavior of mechanistic , constructive models can be difficult because : ( i ) aggregation of the population can obscure or generate a signal , ( ii ) there is often no control population with a well understood health state , and ( iii ) diversity in how the population is measured can make the data difficult to fit into conventional analysis techniques . 
 this paper shows that it is

## Part 5: Putting it Together into the Prototype App!

The prototype article-browsing app has three main parts:
* the main loop gives the user options to search, browse, or exit. Searching prompts for an input, which is fed to the search_function, browsing suggests articles (random at first) to read, and as the user chooses which articles to read it provides subsequent similar articles to browse. 
* the browse_article function suggests a number of random articles, displays their summaries, and allows the user to pick one to look at. After the reader is done with the selected paper, the function returns a set of new articles that are similar to what the user read, and we return to the main loop. Finding similar articles to suggest for the next iteration is done using the search function.
* the search_function function takes an inputted phrase or query (or full article from the browse_article function) and returns a set of articles with the nearest cosine similarities. This is done by preprocessing and vectorizing the input query as if it were a document, and then comparing the distance of the query vector to the database's document embedding vectors. 

See the output below for a usage demonstration, where I browse and read the abstract of a random paper, and then search for articles that could be relevant to physical chemistry. 

In [15]:
import random

def browse_articles(stored_articles = None):
    #suggest a set of articles to the user
    if stored_articles != None:
        suggestions = stored_articles
    else:
        suggestions = random.sample(range(len(df)), 5)
    suggested_articles = df.iloc[suggestions]
    
    #print the article options with full summary and topic title
    print("Please pick an article from the following options:")
    current_option = 0
    for index, article in suggested_articles.iterrows():
        print(f"Press {current_option} to read article {index}:")
        current_option += 1
        print("Summary:", article['summary'])
        print("Topic Title:", topic_titles[article['topic_tag']])
        print()

    while True:
        try:
            choice = int(input("Enter the index of the chosen article (0-4): "))
            if choice in range(5):
                break
            else:
                print("Invalid input. Please enter a valid index.")
        except ValueError:
            print("Invalid input. Please enter a valid index.")
    
    chosen_article = suggested_articles.iloc[choice]
    
    #display the abstract of the chosen article and prompt the user to read more or continue
    print("\nAbstract:\n", chosen_article['abstract'])
    while True:
        option = input("Do you want to read the full article? (Y/N): ")
        if option.lower() == 'y':
            print("\nBody Text:\n", chosen_article['article'])
            break
        elif option.lower() == 'n':
            break
        else:
            print("Invalid input. Please enter 'Y' to read the full article or 'N' to continue browsing.")
    
    #suggest similar articles based on the chosen article
    similar_articles_indices = search_function(chosen_article['article'], top_n=5)
    similar_articles = df.iloc[similar_articles_indices]
    
    # print("\nYou have finished reading the article. Here are some similar articles:")
    # for index, article in similar_articles.iterrows():
    #     print(f"Option {index}:")
    #     print("Summary:", article['summary'])
    #     print("Topic Title:", topic_titles[article['topic_tag']])
    #     print()
    return similar_articles_indices

def search_function(phrase, top_n=5):
    # Process the input phrase into a document vector
    query = preprocess(phrase)
    query_vector = doc2vecmodel.infer_vector(query.split())
    
    # Compute cosine similarity between the input vector and all document vectors in the model
    similar_documents = doc2vecmodel.docvecs.most_similar([query_vector], topn=top_n)
    
    # Get the indices of the top-n most similar documents
    top_indices = [int(similarity[0]) for similarity in similar_documents]
    
    return top_indices

# Run the article browsing app
while True:
    print("Please choose an option:")
    print("1. Search for articles")
    print("2. Browse articles")
    print("3. Exit")
    
    choice = input("Enter the number of your choice: ")
    
    if choice == "1":
        search_phrase = input("Enter a phrase to search for articles: ")
        search_results = search_function(search_phrase)
        print("Search Results:")
        for index, article in df.iloc[search_results].iterrows():
            print(f"Option {index}:")
            print("Summary:", article['summary'])
            print("Topic Title:", topic_titles[article['topic_tag']])
            print()
    elif choice == "2":
        browse_articles()
    elif choice == "3":
        print("Exiting the app. Goodbye!")
        break
    else:
        print("Invalid choice. Please enter a valid option.")


Please choose an option:
1. Search for articles
2. Browse articles
3. Exit


Enter the number of your choice:  2


Please pick an article from the following options:
Press 0 to read article 531:
Summary: The exact confidence interval for a proportion of @xmath0 is known to be unnecessarily conservative. We propose coverage - adjustments of the clopper - pearson interval. The adjusted intervals have improved coverage and are often shorter than competing intervals found
Topic Title: laser fig receptor pe one

Press 1 to read article 1703:
Summary: In this paper, we give a polynomial lower bound for the resonances of @xmath0 perturbed by an obstacle in even - dimensional euclidean spaces. The proof is based on a poisson summation formula
Topic Title: qquad theory right cdots field

Press 2 to read article 1687:
Summary: There is now strong evidence that many llagns contain accreting massive black holes and that the nuclear radio emission is dominated by parsec - scale jets. The results provide further support for jet dominance of the core radio emission.
Topic Title: j e q j hop

Press 3 to read artic

Enter the index of the chosen article (0-4):  3



Abstract:
  the velocity profiles of weak metal absorption lines can be used to observationally probe the kinematic state of gas in damped lyman-@xmath0 systems . 
 prochaska and wolfe @xcite have argued that the flat distribution of velocity widths ( @xmath1 ) combined with the asymmetric line profiles indicate that the dlas are disks with large rotation velocities ( @xmath2200 km / s ) . 
 an alternative explanation has been proposed by haehnelt , steinmetz , and rauch ( hsr)@xcite , in which the observed large velocity widths and asymmetric profiles can be produced by lines of sight passing through two or more clumps each having relatively small internal velocity dispersions . 
 we investigate the plausibility of this scenario in the context of semi - analytic models based on hierarchical merging trees and including simple treatments of gas dynamics , star formation , supernova feedback , and chemical evolution . 
 we find that all the observed properties of the metal - line system

Do you want to read the full article? (Y/N):  n


  similar_documents = doc2vecmodel.docvecs.most_similar([query_vector], topn=top_n)


Please choose an option:
1. Search for articles
2. Browse articles
3. Exit


Enter the number of your choice:  1
Enter a phrase to search for articles:  quantum spin state atomic eigenvalue wave function


Search Results:
Option 1531:
Summary: In the stochastic mean - field ( smf ) approach, an ensemble of initial values for a selected set of one - body observables is formed. This ensemble is formed from a phase - space distribution that reproduces the initial
Topic Title: reaction fig mass cluster time

Option 2079:
Summary: Ground - based measurements of the transmission and emission spectra of the hot - jupiter wasp-19b in nine spectroscopic channels from 1.25 to 2.35@xmath0 m. The measurements are based on
Topic Title: reaction fig mass cluster time

Option 2171:
Summary:  abundance anomalies observed in globular cluster stars indicate pollution with material processed by hydrogen burning. Two main sources have been suggested : asymptotic giant branch ( agb ) stars and massive stars rotating near the break-up limit ( spin
Topic Title: laser fig receptor pe one

Option 1778:
Summary: Small single - domain proteins often exhibit only a single free - energy barrier , or transition state

Enter the number of your choice:  3


Exiting the app. Goodbye!
