* The topic modeling application is adapted from the work of Daniel Wolffram (https://www.kaggle.com/danielwolffram).

* The iSearch test collection built by Lykke et al. (2010) was chosen as the corpus. It consists of 434,817 documents (PF + PN), 65 topics, 200 graded relevance assessments for each topic, and more than 3.7 million internal references. It is made up of full-text and metadata sets taken from arXiv.org.

References:

Kyberd, P. (2015). Explainer: what are fundamental particles? (No. PRESSCUT-H-2015-109).

Lykke, M., Larsen, B., Lund, H., & Ingwersen, P. (2010). Developing a Test Collection for the Evaluation of Integrated Search. In: Gurrin, C. et al. (eds) Advances in Information Retrieval. ECIR 2010. Lecture Notes in Computer Science, vol 5993. Springer, Berlin, Heidelberg.

![](https://images.theconversation.com/files/75083/original/image-20150317-22294-qb87ml.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=240&h=300&fit=crop&dpr=2)

I chose the term quark specifically. There are six types of quark:

* up
* down
* charm
* strange
* top
* bottom

These six quarks fall into three pairs: "up and down", "charm and strange", and "top and bottom" (formerly "truth and beauty"). Up and down quarks combine to form the protons and neutrons at the heart of every atom. Only the lightest pair, up and down, is found in ordinary matter. The charm/strange and top/bottom pairs appear to play no role in the present-day universe; but, like the heavier leptons, they took part in the first moments of the universe, helping to shape the universe that produced us. (https://evrimagaci.org/parcacik-fizigi-standart-model-nedir-temel-parcaciklar-nelerdir-3733)



# Installing Packages

In [4]:
from IPython.utils import io
with io.capture_output() as captured:
    !pip install scispacy
    !pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.2.4/en_core_sci_lg-0.2.4.tar.gz

In [5]:
import numpy as np 
import pandas as pd

from sklearn.feature_extraction import text
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.decomposition import LatentDirichletAllocation

import scispacy
import spacy
import en_core_sci_lg

from scipy.spatial.distance import jensenshannon

import joblib

from IPython.display import HTML, display

from ipywidgets import interact, Layout, HBox, VBox, Box
import ipywidgets as widgets
from IPython.display import clear_output

from tqdm import tqdm
from os.path import isfile

import seaborn as sb
import matplotlib.pyplot as plt
plt.style.use("dark_background")

# Loading and Preparing the Data

In [6]:
#df = pd.read_excel('http://mugeakbulut.com/phd/dataset.xlsx',
#df = pd.read_excel('http://mugeakbulut.com/phd/iSearch_full.xlsx',
#df = pd.read_excel('http://mugeakbulut.com/phd/100.xlsx',
# encoding="ISO-8859-1",  (disabled: it sat outside the call, and current pandas read_excel has no encoding parameter)
df = pd.read_excel('/Users/mugeakbulut/Academia/Right Now/PhD_Tez/Programlar/Kaggle/iSearch.xlsx',
                   header=0,
                   index_col=False,
                   keep_default_na=True
)

FileNotFoundError: [Errno 2] No such file or directory: '/Users/mugeakbulut/Academia/Right Now/PhD_Tez/Programlar/Kaggle/iSearch.xlsx'

In [None]:
# how many rows should be listed?
df.head(n = 50) # df.head(50) is the same as df.head(n = 50)

#df.head()

How many rows and columns does the DataFrame (df) have?

In [None]:
df.shape

I am working on the abstracts. NOTE: once the download finishes, I will convert the text to UTF-8 and remove the "\n" characters.
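
The planned clean-up could look roughly like the sketch below. It is not the notebook's own code: `clean_abstract` is a hypothetical helper, and it assumes each abstract arrives either as a `str` or as UTF-8 encoded `bytes`.

```python
import re
import unicodedata


def clean_abstract(raw):
    """Return an abstract as normalized text with line breaks collapsed."""
    # Assumption: bytes input is UTF-8; undecodable bytes are replaced, not dropped
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8", errors="replace")
    # Normalize composed/decomposed characters to a single form
    text = unicodedata.normalize("NFC", raw)
    # Turn newlines and any other run of whitespace into one space
    return re.sub(r"\s+", " ", text).strip()


print(clean_abstract("Top quark\npair production\n at the LHC"))
```

With a helper like this, something along the lines of `df.Abstract.dropna().map(clean_abstract)` could be applied before vectorizing.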

In [None]:
all_texts = df.Abstract

In [None]:
# example snippet
all_texts[0][:200]

# LDA (Latent Dirichlet Allocation)

I used the [scispaCy](https://allenai.github.io/scispacy/) package for preprocessing.
https://www.kaggle.com/alexandruuu/spacy-preprocessing explains it well.

In [None]:
# large scispaCy model (en_core_sci_lg)
nlp = en_core_sci_lg.load(disable=["tagger", "parser", "ner"])
nlp.max_length = 2000000

In [None]:
def spacy_tokenizer(sentence):
    return [word.lemma_ for word in nlp(sentence) if not (word.like_num or word.is_stop or word.is_punct or word.is_space or len(word)==1)]

In [None]:
# New stop words list 
customize_stop_words = [
    'doi', '\n', 'preprint', 'copyright', 'peer', 'reviewed', 'org', 'https', 'et', 'al', 'author', 'figure', 
    'rights', 'reserved', 'result','permission', 'used', 'using', 'license', 'fig', 'ArXiv', 'fig.', 'al.', 'Elsevier', 'PMC', 'CZI',
    '-PRON-'
]

# Mark them as stop words
for w in customize_stop_words:
    nlp.vocab[w].is_stop = True

In [None]:
filepath = '../input/topic-modeling-finding-related-articles/'

# Creating the Files

In [None]:
vectorizer = CountVectorizer(tokenizer = spacy_tokenizer, min_df=2)
data_vectorized = vectorizer.fit_transform(tqdm(all_texts))

In [None]:
data_vectorized.shape

In [None]:
# vectorizer = CountVectorizer(tokenizer = spacy_tokenizer, max_features=800000)
# data_vectorized = vectorizer.fit_transform(tqdm(all_texts))

In [None]:
# data_vectorized.shape # with bigrams: 6428134

# data_vectorized.shape # all 1.2 mio?

In [None]:
# most frequent words
word_count = pd.DataFrame({'word': vectorizer.get_feature_names(), 'count': np.asarray(data_vectorized.sum(axis=0))[0]})

word_count.sort_values('count', ascending=False).set_index('word')[:20].sort_values('count', ascending=True).plot(kind='barh')

In [None]:
#joblib.dump(vectorizer, 'vectorizer.csv')
joblib.dump(data_vectorized, 'data_vectorized.csv')
joblib.dump(vectorizer, 'vectorizer.xlsx')

In [None]:
# if not (isfile(filepath + 'vectorizer.csv') & isfile(filepath + 'data_vectorized.csv')):
#     print('Files not there: generating')
#     vectorizer = CountVectorizer(tokenizer = spacy_tokenizer, max_features=800000)
#     data_vectorized = vectorizer.fit_transform(tqdm(all_texts))
#     joblib.dump(vectorizer, 'vectorizer.csv')
#     joblib.dump(data_vectorized, 'data_vectorized.csv')

# else:
#     vectorizer = joblib.load(filepath + 'vectorizer.csv')
#     data_vectorized = joblib.load(filepath + 'data_vectorized.csv')

# How Many Topics?

In [None]:
lda = LatentDirichletAllocation(n_components=20, random_state=0)
lda.fit(data_vectorized)
joblib.dump(lda, 'lda.csv')

In [None]:
# # Train/Load Model
# if not (isfile(filepath + 'lda.csv')):
#     print('File not there: generating')
#     lda = LatentDirichletAllocation(n_components=50, random_state=0)
#     lda.fit(data_vectorized)

#     joblib.dump(lda, 'lda.csv')

# else:
#     lda = joblib.load(filepath + 'lda.csv') 

# Identifying the Topics

In [None]:
def print_top_words(model, vectorizer, n_top_words):
    feature_names = vectorizer.get_feature_names()
    for topic_idx, topic in enumerate(model.components_):
        message = "\nTopic #%d: " % topic_idx
        message += " ".join([feature_names[i]
                             for i in topic.argsort()[:-n_top_words - 1:-1]])
        print(message)
    print()

In [None]:
print_top_words(lda, vectorizer, n_top_words=30)

Each article is a mixture of topics / a distribution over topics
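
To make the idea concrete: in scikit-learn, `lda.transform` returns one row per document whose entries are non-negative and sum to 1. The sketch below reproduces only that normalization step in plain Python; `to_distribution` and the sample weights are hypothetical and not part of the notebook.

```python
def to_distribution(weights):
    """Scale non-negative topic weights so that they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("topic weights must have a positive sum")
    return [w / total for w in weights]


# Hypothetical unnormalized weights of one article over four topics
raw_weights = [2.0, 6.0, 0.5, 1.5]
topic_dist = to_distribution(raw_weights)

# Index of the topic that contributes most to this article
dominant_topic = max(range(len(topic_dist)), key=topic_dist.__getitem__)
print(topic_dist, dominant_topic)
```

Each row of `doc_topic_dist` can be read the same way: the largest entry marks the article's dominant topic, and the remaining entries show how much of the article is spread over the other topics.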

In [None]:
doc_topic_dist = pd.DataFrame(lda.transform(data_vectorized))
doc_topic_dist.to_csv('doc_topic_dist.csv', index=False)

In [None]:
#if not (isfile(filepath + 'doc_topic_dist.csv')):
#       print('File not there: generating')
#        doc_topic_dist = pd.DataFrame(lda.transform(data_vectorized))
#         doc_topic_dist.to_csv('doc_topic_dist.csv', index=False)
# else:
#        doc_topic_dist = pd.read_csv(filepath + 'doc_topic_dist.csv')  

# Nearest Articles (in Topic Space)

In [None]:
# list 50 rows
#doc_topic_dist.head(n = 50) # head(50) is the same as head(n = 50)

doc_topic_dist.head()

In [None]:
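# Boolean mask of abstracts that mention quarks; the function below uses it but it was never defined.
# Assumption: the same 'quark' match as the quark widget; rows with a missing abstract count as False.
is_quark_article = df.Abstract.str.contains('quark', na=False)

def get_k_nearest_docs(doc_dist, k=10, lower=1950, upper=2020, only_quark=False, get_dist=False):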
    '''
    doc_dist: topic distribution (sums to 1) of one article
    
    Returns the index of the k nearest articles (as by Jensen–Shannon divergence in topic space). 
    '''
    
    relevant_time = df['Year '].between(lower, upper)
    
    if only_quark:
        temp = doc_topic_dist[relevant_time & is_quark_article]
        
    else:
        temp = doc_topic_dist[relevant_time]
         
    distances = temp.apply(lambda x: jensenshannon(x, doc_dist), axis=1)
    k_nearest = distances[distances != 0].nsmallest(n=k).index
    
    if get_dist:
        k_distances = distances[distances != 0].nsmallest(n=k)
        return k_nearest, k_distances
    else:
        return k_nearest

In [None]:
#d = get_k_nearest_docs(doc_topic_dist[df.paper_id == 'PN018446'].iloc[0])

#sb.kdeplot(d)

In [None]:
def plot_article_dna(paper_id, width=20):
    t = df[df.DOCNO == paper_id].Title.values[0]
    doc_topic_dist[df.DOCNO == paper_id].T.plot(kind='bar', legend=None, title=t, figsize=(width, 4))
    plt.xlabel('Topic')

def compare_dnas(paper_id, recommendation_id, width=20):
    t = df[df.DOCNO == recommendation_id].Title.values[0]
    temp = doc_topic_dist[df.DOCNO == paper_id]
    ymax = temp.max(axis=1).values[0]*1.25
    temp = pd.concat([temp, doc_topic_dist[df.DOCNO == recommendation_id]])
    temp.T.plot(kind='bar', title=t, figsize=(width, 4), ylim= [0, ymax])
    plt.xlabel('Topic')
    plt.legend(['Selection', 'Recommendation'])

# compare_dnas('90b5ecf991032f3918ad43b252e17d1171b4ea63', 'a137eb51461b4a4ed3980aa5b9cb2f2c1cf0292a')

def dna_tabs(paper_ids):
    k = len(paper_ids)
    outs = [widgets.Output() for i in range(k)]

    tab = widgets.Tab(children = outs)
    tab_titles = ['Article ' + str(i+1) for i in range(k)]
    for i, t in enumerate(tab_titles):
        tab.set_title(i, t)
    display(tab)

    for i, t in enumerate(tab_titles):
        with outs[i]:
            ax = plot_article_dna(paper_ids[i])
            plt.show(ax)

def compare_tabs(paper_id, recommendation_ids):
    k = len(recommendation_ids)
    outs = [widgets.Output() for i in range(k)]

    tab = widgets.Tab(children = outs)
    tab_titles = ['Article ' + str(i+1) for i in range(k)]
    for i, t in enumerate(tab_titles):
        tab.set_title(i, t)
    display(tab)

    for i, t in enumerate(tab_titles):
        with outs[i]:
            ax = compare_dnas(paper_id, recommendation_ids[i])
            plt.show(ax)

# Ranking Articles Related to a Selected Article

The **Jensen-Shannon distance** was used as the similarity measure. It measures how far apart two probability distributions are. It is the square root of the Jensen-Shannon divergence, which is also known as the information radius (IRad) or the total divergence to the average.
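
A minimal plain-Python sketch of this measure, assuming natural logarithms (the default base of `scipy.spatial.distance.jensenshannon`). The helper names `kl_divergence` and `js_distance` are illustrative only; the notebook itself calls the SciPy function.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q); terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_distance(p, q):
    """Jensen-Shannon distance: square root of the Jensen-Shannon divergence."""
    # m is the average of the two distributions
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    js_div = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    # Guard against tiny negative values caused by floating-point rounding
    return math.sqrt(max(js_div, 0.0))


same = js_distance([0.5, 0.3, 0.2], [0.5, 0.3, 0.2])
disjoint = js_distance([1.0, 0.0], [0.0, 1.0])
print(same, disjoint)
```

With natural logarithms the distance is bounded by sqrt(ln 2), about 0.83, which is reached when the two distributions share no topic. A similarity computed as `1 - distance` therefore never drops to 0 under this base.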

In [None]:
def recommendation(paper_id, k=5, lower=1950, upper=2020, only_quark=False, plot_dna=False):
    '''
    Returns the title of the k papers that are closest (topic-wise) to the paper given by paper_id.
    '''
    
    print(df.Title[df.DOCNO == paper_id].values[0])

    recommended, dist = get_k_nearest_docs(doc_topic_dist[df.DOCNO == paper_id].iloc[0], k, lower, upper, only_quark, get_dist=True)
    recommended = df.iloc[recommended].copy()
    recommended['similarity'] = 1 - dist 
    
    h = '<br/>'.join(['<a href="' + l + '" target="_blank">'+ n + '</a>' +' (Similarity: ' + "{:.2f}".format(s) + ')' for l, n, s in recommended[['Link','Title', 'similarity']].values])
    display(HTML(h))
    
    if plot_dna:
        compare_tabs(paper_id, recommended.DOCNO.values)


In [None]:
recommendation('PN018464', k=5, plot_dna=True)

In [None]:
recommendation('PN018548', k=10, plot_dna=True)

In [None]:
recommendation('PN018456', k=1, plot_dna=True)

In [None]:
recommendation('PN018444', k=5, only_quark=False, plot_dna=True)

In [None]:
recommendation('PN018501', k=10, plot_dna=True)#PN018501

# Widget: Quark-Related Papers

This makes it easier to pick a paper (you don't have to search the paper_id).

In [None]:
def related_papers():
    '''
    Creates a widget where you can select one of many papers about quarks and then displays related articles from the whole dataset.
    '''
    justquark_papers = df[df.Abstract.str.contains('quark|top quark|up quark|charm quark|strange quark')][['DOCNO', 'Title']] # are there more names?
    title_to_id = justquark_papers.set_index('Title')['DOCNO'].to_dict()
    
    def main_function(bullet, k=5, year_range=[1950, 2020], only_quark=False):
        recommendation(title_to_id[bullet], k, lower=year_range[0], upper=year_range[1], only_quark=only_quark)
    
    yearW = widgets.IntRangeSlider(min=1950, max=2020, value=[2010, 2020], description='Year Range', 
                                   continuous_update=False, layout=Layout(width='40%'))
    justquarkW = widgets.Checkbox(value=False,description='Only quark-related',disabled=False, indent=False, layout=Layout(width='20%'))
    kWidget = widgets.IntSlider(value=10, description='k', max=50, min=1, layout=Layout(width='20%'))

    bulletW = widgets.Select(options=title_to_id.keys(), layout=Layout(width='90%', height='200px'), description='Title:')

    widget = widgets.interactive(main_function, bullet=bulletW, k=kWidget, year_range=yearW, only_quark=justquarkW)

    controls = VBox([Box(children=[widget.children[:-1][1], widget.children[:-1][2], widget.children[:-1][3]], 
                         layout=Layout(justify_content='space-around')), widget.children[:-1][0]])
    output = widget.children[-1]
    display(VBox([controls, output]))

In [None]:
related_papers()

# Tasks

Let's match the tasks to topics and rank the related articles.
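
The matching step can be pictured as a nearest-neighbour search in topic space: the task text gets a topic distribution, and the articles with the smallest distance to it are listed first. The sketch below uses toy data and hypothetical names (`k_nearest`, `doc_a`, ...), with a compact Jensen-Shannon distance standing in for the SciPy call used in the notebook.

```python
import heapq
import math


def js_distance(p, q):
    """Jensen-Shannon distance with natural logarithms."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)

    return math.sqrt(max(0.5 * kl(p, m) + 0.5 * kl(q, m), 0.0))


def k_nearest(query_dist, doc_dists, k=3):
    """Return the k (distance, doc_id) pairs closest to query_dist, nearest first."""
    scored = ((js_distance(query_dist, dist), doc_id)
              for doc_id, dist in doc_dists.items())
    return heapq.nsmallest(k, scored)


# Hypothetical topic distributions over three topics
doc_dists = {
    "doc_a": [0.7, 0.2, 0.1],
    "doc_b": [0.1, 0.1, 0.8],
    "doc_c": [0.6, 0.3, 0.1],
}
task_dist = [0.68, 0.22, 0.10]

for distance, doc_id in k_nearest(task_dist, doc_dists, k=2):
    print(doc_id, round(distance, 3))
```

Unlike `get_k_nearest_docs`, this sketch does not filter by year or drop zero distances; it only illustrates the ranking itself.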

In [None]:
# taskID is the topic number in iSearch
# within each task:
#   line 1: current_information_need
#   line 2: work_task
#   line 3: background_knowledge
#   line 4: ideal_answer
#   line 5: search_terms
# so line 3 and the like are not needed.

#author_id 85
task1 = ["I am looking for information about manipulation and immobilisation of nano spheres and peptide nano particles.",
"I am starting my master thesis in which I will fabricate self-assembled peptide nano spheres, which needs to be manipulated and immobilized. This is intended done by filling them with metals e.g. gold (Au) or iron (Fe) and use the electrical and magnetic properties to manipulate and immobilise the spheres. This could be by using dielectrophoresis on a chip or micro fluidic device. The nano spheres are intended for biomedical use in which techniques for manipulating biological and biomedical materials are interesting.",
"The background knowledge is limited since the thesis is starting up this week. But I have been working with sorting of blood cells in micro fluidic devices and flow cytometry.",
"An ideal answer could be an article showing how to manipulate peptide nano spheres. But in it would in fact might be better if there isnt any articles on the subject since this would mean the research is new.",
"Manipulation, nano spheres, peptides, immobilisation"]

#author_id 85
task2 = ['I am looking for information about manipulation and sorting of magnetic particles, beads or spheres on nanoscale. This might be in a micro fluidic system.',
'As a part of my master thesis it is interesting to fabricate a sorting device which can sort magnetic nano spheres from a sample. This will often be in a micro fluidic device because the nano sphere/particles often will be diluted in some sort of solution.',
'I have been making sorting devices for micro particles based on flow profiles in a microfluidic system.',
'Published material on how to sort magnetic beads, particles or spheres on nanoscale.',
'Nano spheres, beads, magnetic, sorting']

#task = ['',
#'',
#'',
#'',
#'']

tasks={'I am looking for information about manipulation and immobilisation of nano spheres and peptide nano particles': task1,
       'I am looking for information about manipulation and sorting of magnetic particles, beads or spheres on nanoscale. This might be in a micro fluidic system.': task2} 
       #'I am looking for information on how to make an on chip flow cytometry using an LED as a light source and an APD (Avalanche photodiode) as a detector.': task3, 
       #'I want information about protein-protein interaction, the surface charge distribution of these proteins and how this has been investigated with Electrostatic Force Microscopy (EFM). The proteins of interest are the Avidin-Biotin and IgG-anti-IgG systems.The most important for the scenario is how this has been investigated with EFM.': task4,
       #'I would like some basic knowledge about two methods called Elisa and Immunoassay.': task5, 
       #'I would like some basic knowledge of the electrical properties of Avidin-Biotin and IgG - anti-IgG, the electrostatic behaviour, surface charge distribution and there interaction.': task6, 
       #'I have a certain equation that I would like to solve numerically. I need to find out what people have done previously with similar equations, preferably of the exact same form. It is a system of nonlinear coupled ordinary  integro-differential equations with delay terms. This means that the derivatives of the functions  to solve for depend on a combination of integrals, function values and derivatives of these functions, whose arguments (independent variables) can be smaller (hence the term "delay"). I am looking for theory on this type of equations or sufficiently similar types with emphasis on numerical solution and implementation.': task7,
       #'Descriptions of models and theory concerning passive mode-locking in linear cavities. It is important that it is for linear and not ring cavities or some sort of coupled cavities or mode-locking using coupled external cavities.': task8, 
       #'I want information on how to measure dielectric properties on cells, example in microfluidic systems.': task9,
       #'Literature on sorting of cells in microfludic systems. One example could be cancer cells.': task10,
       #'Theory on dielectricphoresis for cell sorting.': task11,
       #'Im looking for reaction kinetics of borate and phosphate buffers. More specifically, I need information on the forward and backward rate constants.': task12,
       #'Im looking for information on possible ways to achieve significant slip-lengths in micro- and nanochannels.': task13,
      # 'Im looking for the dynamics in induced-charge electro-osmotic flow with a finite electric double layer.': task14,
      # 'I am looking for examples of intracellular recordings on cells. Especially Im looking for recordings of the intracellular electrical potential of living neurons. I want to investigate which methods that previously have been used to study the intracellular environment of living cells. And again especially methods that has been used to record the intracellular electrical potential': task15,
       #'I am looking for a general expression for the heat transfer coefficient in a liquid solid interface. I want to incorporate this expression into a simulation as a boundary condition during a thermodynamic numerical simulation.': task16,
       #'I am looking for a general theory of thermo pneumatic actuation. Especially I am interested in am overview over devices that utilize thermo pneumatic actuation. It would also be interesting to how this actuation principle has been incorporated into a fabrication sequence of a final device.': task17,
      # 'I am looking for results for experimental data where researchers have built a vertical cavity surface emitting laser (VCSEL), deposited a chemical coating (an organic thin film polymer, e.g. MAH) on these and then monitored the power output, or voltage as function of exposure to a gaseous compound.': task18,
       #'I am looking for all an extensive research on the articles that exist on making tunable vertical cavity surface emitting laser diodes (tunable VCSEL) by integration a micro-electromechanical system with a VCSEL.': task19,
       #'I am looking for both theoretical and experimental articles describing the properties of single laser cavities in photonic crystals - The electromagnetic mode profiles, lasing curves, threshold powers and emission spectra.': task20,
       #'I am looking for both theoretical and experimental articles describing the properties of coupled arrays of laser cavities in photonic crystals - The electromagnetic mode profiles, lasing curves, threshold powers, emission spectra and coupling strength.': task21,
      # 'I am looking for a numerical method to calculate the far-field (or far-zone) emission spectrum from the electromagnetic modes from Finite Difference Time Domain (FDTD) calculations.': task22,
       #'I am looking for research papers investigating the properties of coupled resonator optical waveguides (sometimes abbreviated CROW) or equivalently coupled resonators in photonic crystals. The papers can investigate active or passive structures, but in particular, I am interested in the optical steady state intensity transmission properties or general dynamical properties of the structures.': task23,
       #'The effect of the Raman coefficient on the super continuum spectrum generated by soliton propagation in non-liner fibres, influenced by self-steepening or optical shock effect. In particular I look for results obtained by using a Fourier Split Step Method for solving the non-linear Schrvdinger equation.': task24,
       #'An example of an implementation of Fast Fourier Split Step method, and in particular an example of how to use Fast Fourier Transformation (FFT) algorithms in Matlab (a commercial software package) correctly.': task25,
       #'I want information about piezoelectric energy harvesting. I would like reviews and research articles about the topic vibrational energy harvesting with special emphasize on piezo electric conversion.  ': task26,
       #'I want information about low resonance frequency cantilevers, membranes, micro structures and MEMS structures.': task27,
       #'I want information about the application of non-equilibrium Greens functions to optical phenomena in semiconductors. Especially quantum dot systems with the interaction with phonons and photons.': task28,
       #'I want information about single-photon indistinguishability in connection with semiconductor single-photon sources. Especially theoretical models describing the important physics while including non-Markovian effects arising from electron-phonon interactions.': task29,
       #'I need data on wet oxidation of GaAs (galiumarsenide).': task30,
       #'I need information on the piezoelectric coefficient of AlGaAs.': task31,
       #'I am looking for information high-index-constrast (HCG) subwavelength grating mirrors.': task32,
       #'I am looking for some description of the basic properties of the SO(N), SU(N), O(N), and U(N) groups and their generators.More specifically I need to know about the SU(2), the SU(3), the SU(4), and the SO(6) groups.Some introductory material on the associated Lie Algebras would be nice.': task33,
       #'I am looking for derivations concerning the vertex structure and the correlation functions of N=4 Supersymmetric Yang-Mills Theory with a U(N) gauge group (often called SYM).More specifically I need information on derivations originating from the scalar interaction terms of the action of the theory.': task34,
       #'A brief introduction to String Theory in an AdS background (which is short for Anti-de-Sitter space). Connections with this topic to the AdS/CFT-correspondence would be helpful, too. Specifically I am looking for details on quantization of the theory (though this has not been done successfully yet) and the holography and the properties of the AdS manifold.': task35,
       #'I want information about atmospheric disturbances of light waves and strehl ratio.': task36,
       #'I want information about techniques to calculate and subtract bias, flatfield, darkcurrent and other sources of error from CCDs.': task37,
       #'I want information about Lucky Imaging and Speckle Imaging techniques using L3-CCD chips.': task38,
       #'I  am looking for literature that can give me an overview on the research field symbolic dynamics and biological networks especially work concerning the Hastings-Powell (HP) model and the Blausius-Huppert-Stone (BHS) model. I want to find out, by varying the parameters in these two models, if they are ultimately identical except from one term. I think this has not been done before, but I am very interested in finding out if  the parameter space of these two models has been investigated by others. By this I mean, if anybody else has reported findings from varying the parameters of the Hastings-Powell model and/or the Blausius-Huppert-Stone model. The findings could for instance be: If  this parameter in the HP-model is assigned this value, the system goes chaotic - if the parameter is assigned this other value the system becomes periodic. I am interested in the parameter space where these two models are periodic.NB: There is uncertainty about the spelling of Blausius from the Blausius-Huppert-Stone model, sometimes he is also spelled Blasius. I do not know which of these is the correct spelling.': task39,
       #'Effect of octanol and other primary alchohols on phase transition in 1,2-Dipalmitoyl-sn-Glycero-3-Phosphocholine (DPPC) and other phospholipids. Has any effect on transition enthalpy been reported? I wish to find out if the presence of alcohol affects the transition enthalpy of the main transition of DPPC and other phospholipids. The transition is also called melting and freezing.': task40,
       #'How much energy can be generated globally from wind energy systems?This is related to the question of how much wind energy is dissipated globally.And equivalently, how much mechanical energy is dissipated in the ocean?What are the effects of large scale wind energy systems on the atmosphere?': task41,
       #'Typical dates for melting and freezing of ice/snow in the polar region? Are polar sea ice and snow on land melting more and faster than before?I want to know what are the main points of debate within different scientific communities concerned with ice/snow melting or freezing in the polar region.I have access to new data and a new method that gives new information in a more consistent way. I want to relate this to more traditional studies.': task42,
       #'The solar wind and its interaction with planetary magnetospheres as well as the interstellar medium, but with special focus in the shielding effects of the wind with respect to galactic cosmic rays from supernovae.': task43,
       #'Models of emerging magnetic flux tubes.': task44,
       #'Information about nano-structured anti-reflective surfaces.': task45,
       #'Implementaion of diffractive optics on multi-processor systems, especially on graphics processing units (GPU) using the CUDA-framework.': task46,
       #'I am looking for information on manufactoring of ZnO films by rf magnetron sputtering and specifically highly doped ZnO': task47,
       #'Information on characterization by photo luminescence of highly doped ZnO films.': task48,
       #'Information on characterization of resistivity of highly doped ZnO films.': task49,
       #'I am looking for articles/books describing the production of RF magnetron sputter coated zinc oxide films doped with Al. The articles should include descriptions of the influence different parameters have on the film quality, especially the RF effect, the deposition rate, the amount of oxygen in the chamber, the sputter pressure, and the deposition temperature.': task50,
       #'I am looking for articles/books describing the post annealing of RF magnetron sputter coated zinc oxide films. They should describe the electrical properties of the films and how they are changed due to the post annealing. ': task51,
       #'I am looking for articles/books describing the characterisation of post annealed RF magnetron sputter coated zinc oxide film, possibly doped with Al. The characterisation should include; surface characterisation, electrical characterisation, impurity characterisation, and optically characterisation. ': task52,
       #'The economy/financing of  the development of new transparent electrodes for use in solar cells.': task53,
       #'I am looking for the radiative lifetime of erbium ions embedded in SiO2 deposited by magnetron sputtering': task54,
       #'I am looking for experimental evidence for the energy transfer mechanism between silicon nanocrystals and erbium ions in a SiO2 matrix': task55,
       #'I am looking for theoretical models of silicon nanocrystals preferably embedded in SiO2 and their optical properties': task56,
       #'I would like to find work done on "trions" (also known as "charged excitons") especially with focus on trions in carbon nanotubes.': task57,
       #'I am looking for Hartree-Fock models for trion or helium-like problems and extension to this kind of problems in terms of perturbation theory. Other kind of extensions would be great too.': task58,
       #'I am looking for biexcitonic models for second order optical response in semiconductor.': task59,
       #'An algorithm capable of performing decomposition of non-convex, triangulated polyhedrons (3-Dimensional polygons) into their convex parts.': task60,
       #'Algorithms for efficient Ray-Tracing of non-convex polyhedrons. More specifically, Ray-Tracing of Constructive Solid Geometry (CSG) unions and differences of convex polyhedrons. ': task61,
       #'An algorithm for calculating a triangulated minimal surface problem of 3D curves represented as polygons (stepwise linear approximation of curves).': task62,
       #'An algorithm for automatic generation of Constructive Solid Geometry (CGS) trees by providing information about the convex and concave regions of a surface.': task63,
       #'Information about fabrication and testing of field effect transistors (FETs) on silicon-on-insulator (SOI) wafers.': task64,
       #'Finite difference simulations of the potential inside a metal on oxide field effect transistors, MOSFETs. Solving Poissons equation inside a MOSFET with the use of finite difference method.': task65}

In [None]:
def relevant_articles(tasks, k=3, lower=1950, upper=2020, only_quark=False):
    # Accept a single string as well as a list of strings
    tasks = [tasks] if isinstance(tasks, str) else tasks 
    
    tasks_vectorized = vectorizer.transform(tasks)
    tasks_topic_dist = pd.DataFrame(lda.transform(tasks_vectorized))

    for index, bullet in enumerate(tasks):
        print(bullet)
        recommended = get_k_nearest_docs(tasks_topic_dist.iloc[index], k, lower, upper, only_quark)
        recommended = df.iloc[recommended]
        

        h = '<br/>'.join(['<a href="' + l + '" target="_blank">'+ n + '</a>' for l, n in recommended[['Link','Title']].values])
        display(HTML(h))



# Task 1

In [None]:
# list five articles for each line of task 1
relevant_articles(task1, 5)

# Task 2

In [None]:
relevant_articles(task2, 3)

# Widget 2: Tasks

Ranking the papers related to a specific task

In [None]:
def relevant_articles_for_task():
    def main_function(bullet, task, k=5, year_range=[1950, 2020], only_quark=False):
        relevant_articles([bullet], k, lower=year_range[0], upper=year_range[1], only_quark=only_quark)
        bulletW.options = tasks[task]    

    yearW = widgets.IntRangeSlider(min=1950, max=2020, value=[2010, 2020], description='Year Range', 
                                   continuous_update=False, layout=Layout(width='40%'))
    quarkW = widgets.Checkbox(value=True,description='Only Quark-Papers',disabled=False, indent=False, layout=Layout(width='20%'))
    kWidget = widgets.IntSlider(value=10, description='k', max=50, min=1, layout=Layout(width='30%'))

    taskW = widgets.Dropdown(options=tasks.keys(), layout=Layout(width='90%', height='50px'), description='Task:')
    init = taskW.value
    bulletW = widgets.Select(options=tasks[init], layout=Layout(width='90%', height='200px'), description='Bullet Point:')

    widget = widgets.interactive(main_function, task=taskW, bullet=bulletW, k=kWidget, year_range=yearW, only_quark=quarkW)
    
    controls = VBox([HBox([widget.children[2], widget.children[3], widget.children[4]], layout=Layout(width='90%', justify_content='space-around')),
                     widget.children[1],
                     widget.children[0]], layout=Layout(align_items='center'))
    
    output = widget.children[-1]
    display(VBox([controls, output]))

In [None]:
relevant_articles_for_task()

# Widget: Free Text Search

In this widget you can insert any kind of text (abstract, paragraph, full text, keywords, questions, ...) and find related articles.
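
Before free text can be compared with the articles, it has to be expressed over the vocabulary learned from the corpus. scikit-learn's `CountVectorizer.transform` does this and ignores words it did not see during fitting. The sketch below imitates that behaviour in plain Python; the regular-expression tokenizer and the four-word vocabulary are simplifying assumptions, not the notebook's scispaCy tokenizer.

```python
import re
from collections import Counter


def count_vector(text, vocabulary):
    """Count how often each vocabulary term occurs in text; other words are ignored."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(token for token in tokens if token in vocabulary)
    return [counts[term] for term in vocabulary]


# Hypothetical vocabulary; the real one comes from the fitted vectorizer
vocabulary = ["quark", "magnetic", "sorting", "laser"]
query_vector = count_vector("Sorting magnetic beads; magnetic nano spheres", vocabulary)
print(query_vector)
```

A query made up only of out-of-vocabulary words yields an all-zero vector, so its topic distribution carries no information about the query; very short or unusual inputs should be read with that in mind.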

In [None]:
def relevant_articles_for_text():    
    textW = widgets.Textarea(
        value='',
        placeholder='Type something',
        description='',
        disabled=False,
        layout=Layout(width='90%', height='200px')
    )

    yearW = widgets.IntRangeSlider(min=1950, max=2020, value=[2010, 2020], description='Year Range', 
                               continuous_update=False, layout=Layout(width='40%'))
    quarkW = widgets.Checkbox(value=True,description='Only quark papers',disabled=False, indent=False, layout=Layout(width='25%'))
    kWidget = widgets.IntSlider(value=10, description='k', max=50, min=1, layout=Layout(width='25%'))

    button = widgets.Button(description="Search")

    display(VBox([HBox([kWidget, yearW, quarkW], layout=Layout(width='90%', justify_content='space-around')),
        textW, button], layout=Layout(align_items='center')))

    def on_button_clicked(b):
        clear_output()
        display(VBox([HBox([kWidget, yearW, quarkW], layout=Layout(width='90%', justify_content='space-around')),
            textW, button], layout=Layout(align_items='center')))        
        relevant_articles(textW.value, kWidget.value, yearW.value[0], yearW.value[1], quarkW.value)

    button.on_click(on_button_clicked)

In [None]:
relevant_articles_for_text()

In [None]:
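# The model is scikit-learn's LatentDirichletAllocation, so the scikit-learn helper is used
# instead of the gensim one. In pyLDAvis 3.4+ this module is named pyLDAvis.lda_model.
import pyLDAvis
import pyLDAvis.sklearn

pyLDAvis.enable_notebook()
panel = pyLDAvis.sklearn.prepare(lda, data_vectorized, vectorizer, mds='tsne')
panel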