# Case study 2: Visualization of the global COVID-19 document space

The second case study visualizes all abstracts in the CORD-19 database of COVID-19-related literature. We computed UMAP projections of the doc2vec vectors of each of the >30k abstracts. The interactive visualization is shown below.

In [1]:
import pandas as pd

docspace = pd.read_csv("../document_embeddings/euclidean_0.9_9_coordinates.tsv", sep = "\t")
docspace.columns = ['doi','c1','c2']
docspace.head()

| | doi | c1 | c2 |
|---|---|---|---|
| 0 | 10.1038/srep18030 | 0.735898 | 8.712381 |
| 1 | 10.1371/journal.pone.0018669 | 1.859369 | 12.605724 |
| 2 | 10.1186/1476-069x-12-115 | -3.113907 | 2.69468 |
| 3 | 10.1371/journal.ppat.1003248 | 4.025132 | 13.922592 |
| 4 | 10.1371/journal.pntd.0006628 | 0.885949 | 9.406822 |
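
Before plotting, it can be worth confirming that the loaded table is complete. The sketch below is a minimal integrity check, assuming the three columns `doi`, `c1` and `c2` set above. The helper name `check_docspace` and the toy rows are hypothetical and only serve to make the example self-contained.

```python
import pandas as pd


def check_docspace(df: pd.DataFrame) -> dict:
    """Return basic integrity counts for a table with columns doi, c1, c2."""
    return {
        "n_rows": len(df),
        # Rows where at least one coordinate is missing
        "n_missing_coords": int(df[["c1", "c2"]].isna().any(axis=1).sum()),
        # Repeated DOIs beyond their first occurrence
        "n_duplicate_dois": int(df["doi"].duplicated().sum()),
    }


# Toy stand-in for `docspace`; the DOIs are placeholders, not real records.
toy = pd.DataFrame(
    {
        "doi": ["10.0000/a", "10.0000/b", "10.0000/b", "10.0000/c"],
        "c1": [0.1, 1.2, 1.2, None],
        "c2": [8.7, 12.6, 12.6, 9.4],
    }
)
report = check_docspace(toy)
print(report)
```

On the real data one would call `check_docspace(docspace)` and compare `n_rows` against the expected number of abstracts.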


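Next, we can visualize the document space interactively. As an example, we first cluster the projected coordinates so that the points can be colored by cluster.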

In [2]:
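## Assuming we wish to cluster the space (this is just an example)
from sklearn.cluster import KMeans

# Fix random_state so the cluster assignment is reproducible across runs
km = KMeans(n_clusters=15, n_init=10, random_state=0).fit(docspace[['c1','c2']])
# Cast labels to strings so plotly treats them as categories, not a continuous scale
docspace['label'] = km.labels_.astype(str)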
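
The choice of 15 clusters above is only illustrative. One common way to compare candidate values of `n_clusters` is the silhouette score. The following is a minimal sketch on synthetic 2-D data, not the procedure used for this case study; the helper `best_k_by_silhouette` and the blob data are assumptions introduced for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def best_k_by_silhouette(points, candidates, random_state=0):
    """Return (best_k, scores), where scores maps each k to its silhouette score."""
    scores = {}
    for k in candidates:
        labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(points)
        scores[k] = silhouette_score(points, labels)
    # Higher silhouette means tighter, better-separated clusters
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Synthetic stand-in for the c1/c2 coordinates: three well-separated blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
points = np.vstack([c + rng.normal(scale=0.5, size=(100, 2)) for c in centers])

best_k, scores = best_k_by_silhouette(points, candidates=range(2, 7))
print(best_k, scores)
```

For a table the size of `docspace`, the pairwise distances behind the silhouette score become expensive. `silhouette_score` accepts a `sample_size` argument to score a random subset instead.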

Let's create an interactive visualization of the obtained clustering!

In [4]:
import plotly.express as px
fig = px.scatter(docspace, x="c1", y="c2", hover_data=['doi'], color = "label")
fig.update_traces(marker=dict(size=3,
                              opacity=0.5,
                              line=dict(width=1,
                                        color='DarkSlateGrey')),
                  selector=dict(mode='markers'))
fig.update_layout(showlegend=False)
fig.update_layout(
    title="Visualization of >30k COVID-19 related documents",
    xaxis_title="Dimension 1",
    yaxis_title="Dimension 2",
    font=dict(
        family="Courier New, monospace",
        size=12,
        color="#7f7f7f"
    )
)
fig.show()