# Case Study: VASyR 2018

In this section, we present an end-to-end example using a sample dataset: the 2018 Vulnerability Assessment of Syrian Refugees in Lebanon (VASyR), catalog ID 189. Information about the dataset can be found on [this page](https://microdata.unhcr.org/index.php/catalog/189).

## Query generation

For the sample dataset, the six query formulations returned between one and 18 potentially relevant papers.

| **#** | **Query String** | **Number of results** |
|---|:---:|:---:|
| 1 | 'UNHCR Lebanon 2018 VASYR' | 18 |
| 2 | 'UNHCR Lebanon 2018 Vulnerability Assessment of Syrian Refugees (VASYR)' | 5 |
| 3 | 'UNHCR Lebanon 2018 VASYR Vulnerability Assessment of Syrian Refugees (VASYR)' | 2 |
| 4 | 'UNHCR, WFP, UNICEF Lebanon 2018 VASYR' | 6 |
| 5 | 'UNHCR, WFP, UNICEF Lebanon 2018 Vulnerability Assessment of Syrian Refugees (VASYR)' | 2 |
| 6 | 'UNHCR, WFP, UNICEF Lebanon 2018 VASYR Vulnerability Assessment of Syrian Refugees (VASYR)' | 1 |
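The six formulations follow a regular pattern: one of two publisher strings, then `Lebanon 2018`, then the acronym, the full title, or both. The minimal sketch below regenerates them with `itertools.product`. The combination rule is an assumption inferred from the table rather than stated in the text, and `build_queries` is a hypothetical helper.

```python
from itertools import product

# Assumption: the queries combine these components, as read off the query table.
PUBLISHERS = ["UNHCR", "UNHCR, WFP, UNICEF"]
CONTEXT = "Lebanon 2018"
ACRONYM = "VASYR"
FULL_NAME = "Vulnerability Assessment of Syrian Refugees (VASYR)"
TITLE_VARIANTS = [
    ACRONYM,                   # acronym only
    FULL_NAME,                 # full title only
    f"{ACRONYM} {FULL_NAME}",  # acronym followed by the full title
]


def build_queries() -> list[str]:
    """Hypothetical helper: return the six query strings in table order (1-6)."""
    return [f"{pub} {CONTEXT} {title}" for pub, title in product(PUBLISHERS, TITLE_VARIANTS)]


for number, query in enumerate(build_queries(), start=1):
    print(number, query)
```

Iterating publishers in the outer loop reproduces the table's ordering, where queries 1-3 use `UNHCR` alone and queries 4-6 use all three agencies.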

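```python
import pandas as pd

# Per-dataset result counts for each query type, plus the count from the JDC website.
df = pd.read_csv("data/semantic_scholar_query_results_with_web_count.csv")
# Show the row for the VASyR 2018 dataset (catalog ID 189).
print(df[df["id"] == 189])
```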

| id | query_type1_pubs | query_type2_pubs | query_type3_pubs | query_type4_pubs | query_type5_pubs | query_type6_pubs | no_related_pubs_jdc_website |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| 189 | 16 | 3 | 1 | 6 | 0 | 1 | 13 |
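The printed row stores one count per query formulation in wide format. To compare the six queries side by side, a minimal pandas sketch can reshape that row with `DataFrame.melt`. The values below are copied from the output above. Reading `no_related_pubs_jdc_website` as the number of related publications listed on the JDC website is an assumption based on the column name.

```python
import pandas as pd

# One row of semantic_scholar_query_results_with_web_count.csv (dataset 189), copied from the output above.
row = pd.DataFrame([{
    "id": 189,
    "query_type1_pubs": 16, "query_type2_pubs": 3, "query_type3_pubs": 1,
    "query_type4_pubs": 6, "query_type5_pubs": 0, "query_type6_pubs": 1,
    "no_related_pubs_jdc_website": 13,
}])

# Reshape the six query_typeN_pubs columns into one row per query type.
per_query = row.melt(
    id_vars=["id", "no_related_pubs_jdc_website"],
    value_vars=[f"query_type{n}_pubs" for n in range(1, 7)],
    var_name="query_type",
    value_name="n_pubs",
)
# Assumption: the JDC column is a reference count, so express each query's count relative to it.
per_query["ratio_to_jdc"] = per_query["n_pubs"] / per_query["no_related_pubs_jdc_website"]
print(per_query[["query_type", "n_pubs", "ratio_to_jdc"]])

# Query type that returned the most papers.
best = per_query.loc[per_query["n_pubs"].idxmax(), "query_type"]
print(best)
```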


## Semantic Search
For query 1, Semantic Scholar returned 18 potentially relevant papers. The records saved in `data/dataset189_query1.csv` are shown below:

```python
from myst_nb import glue

# Read as UTF-8: decoding as Latin-1 garbles curly quotes and accented characters.
df = pd.read_csv("data/dataset189_query1.csv", encoding="utf-8")
# Drop leftover index columns ("Unnamed: 0", "Unnamed: 0.1") from earlier CSV round-trips.
df = df.loc[:, ~df.columns.str.startswith("Unnamed")]
glue("df_tbl", df)
```

| corpus_ID | dataset_id | title | abstract | authors | DOI | years | journal |
|---|---|---|---|---|---|---|---|
| 195504488 | 189 | Dignity and displaced Syrians in Lebanon. 'T... | Since the popular uprising and subsequent war ... | [{'authorId': '7702531', 'name': 'Francesca Gr... | | 2018 | |
| 211116634 | 189 | Resettled Syrian Refugees in Oxford | This policy brief presents preliminary finding... | [] | 10.1163/2210-7975_hrd-3181-20180005 | 2018 | |
| 188442469 | 189 | Educate, Empower, Employ | Since the start of the Syrian Civil War in 201... | [{'authorId': '71465164', 'name': 'Daniel Bern... | | 2018 | Journal of International Relations |
| 203291058 | 189 | From security to resilience: New vistas for in... | International institutions and organizations l... | [{'authorId': '83370432', 'name': 'R. Anholt'}... | 10.5075/EPFL-IRGC-262527 | 2018 | |
| 211103442 | 189 | MENA PS RESEARCH PROJECT PROTECTION IMPACTS OF... | UNHCR has been promoting research projects aim... | [] | | 2020 | |
| 254071595 | 189 | Using Facebook advertising data to describe th... | While the fighting in the Syrian civil war has... | [{'authorId': '72339731', 'name': 'M. Fatehkia... | 10.3389/fdata.2022.1033530 | 2022 | Frontiers in Big Data |
| 235475057 | 189 | Multi-purpose cash transfers and health among ... | | [{'authorId': '3751306', 'name': 'E. Lyles'}, ... | 10.1186/s12889-021-11196-8 | 2021 | BMC Public Health |
| 244240562 | 189 | Sudanese Refugees and the 'Syrian Refugee Re... | By focusing on Sudanese refugees and asylum... | [{'authorId': '74726527', 'name': 'Maja Janmyr'}] | 10.1093/rsq/hdab012 | 2021 | Refugee Survey Quarterly |
| 250652270 | 189 | Syrian Refugees In Lebanon: 'New Community'... | The paper analyzes the Syrian refugee crisis i... | [{'authorId': '9899978', 'name': 'L. Harutyuny... | 10.52837/2579-2970-2022.11.1-5 | 2022 | BULLETIN OF THE INSTITUTE OF ORIENTAL STUDIES |
| 148645104 | 189 | No Country of Asylum: 'Legitimizing' Leban... | How do States 'legitimize' their non-ratif... | [{'authorId': '74726527', 'name': 'Maja Janmyr'}] | 10.1093/IJRL/EEX026 | 2017 | International Journal of Refugee Law |
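The text does not say which Semantic Scholar interface produced `data/dataset189_query1.csv`. As an assumption, the search step can be sketched against the Semantic Scholar Graph API `paper/search` endpoint, whose `authors` field returns `authorId`/`name` pairs like those in the `authors` column. The mapping from API fields (`corpusId`, `externalIds`, `year`, `journal`) to the CSV columns is inferred from the column names. `search_url`, `fetch_papers`, and `to_row` are hypothetical helpers.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical helpers; assumes the Semantic Scholar Graph API paper-search endpoint.
S2_SEARCH_URL = "https://api.semanticscholar.org/graph/v1/paper/search"
# Fields requested so that every column of dataset189_query1.csv can be filled (inferred mapping).
S2_FIELDS = "corpusId,title,abstract,authors,externalIds,year,journal"


def search_url(query: str, limit: int = 100) -> str:
    """Build a paper-search request URL; the endpoint caps `limit` at 100 per page."""
    params = {"query": query, "fields": S2_FIELDS, "limit": limit}
    return f"{S2_SEARCH_URL}?{urllib.parse.urlencode(params)}"


def fetch_papers(query: str, limit: int = 100) -> list[dict]:
    """Run one search and return its paper records (requires network access)."""
    with urllib.request.urlopen(search_url(query, limit)) as resp:
        return json.load(resp).get("data", [])


def to_row(paper: dict, dataset_id: int) -> dict:
    """Map one API record onto the column names used in dataset189_query1.csv."""
    external_ids = paper.get("externalIds") or {}
    journal = paper.get("journal") or {}
    return {
        "corpus_ID": paper.get("corpusId"),
        "dataset_id": dataset_id,
        "title": paper.get("title"),
        "abstract": paper.get("abstract"),
        "authors": paper.get("authors") or [],
        "DOI": external_ids.get("DOI"),
        "years": paper.get("year"),
        "journal": journal.get("name"),
    }
```

Under these assumptions, `[to_row(p, 189) for p in fetch_papers("UNHCR Lebanon 2018 VASYR")]` could be passed to `pd.DataFrame` and written to CSV. Missing fields (for example, the empty abstract of paper 235475057) come through as `None`.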


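The query-generation results for dataset 189 can also be explored interactively in the Observable notebook `@microdata-citation-explorer/query-generation`:

```python
import observable_jupyter_widget

# Embed the Observable notebook with its typed_entry input set to dataset 189.
w = observable_jupyter_widget.ObservableWidget(
    '@microdata-citation-explorer/query-generation',
    cells=['viewof table', 'typed_entry'],    # optional
    inputs={'typed_entry': 189},              # optional
    outputs=['viewof table', 'typed_entry'],  # optional
)
w
```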


## Topic Modeling
Of the XX papers returned across the six query types, YYY had abstracts available in either the NLP4Dev or the Semantic Scholar corpus, and ZZZ had full body text available.

These texts were used as inputs to the NLP4Dev API, as described in [Section 3.5](methods/topic-modeling-and-sentiment-analysis.md).

Nine papers were deemed relevant.
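Before topic modeling, each paper needs a single text input. The minimal sketch below assumes full text is preferred over the abstract when both exist. The `full_text` column and the toy rows are hypothetical; `abstract` matches the Semantic Scholar export shown earlier.

```python
import pandas as pd


def select_text_inputs(papers: pd.DataFrame) -> pd.DataFrame:
    """Return papers that have usable text, preferring full text over the abstract.

    Expects an `abstract` column and a hypothetical `full_text` column;
    missing values are None/NaN.
    """
    # Take the full text where present, otherwise fall back to the abstract.
    text = papers["full_text"].combine_first(papers["abstract"])
    # Record which field supplied the text.
    source = papers["full_text"].notna().map({True: "full_text", False: "abstract"})
    out = papers.assign(text=text, text_source=source)
    # Papers with neither field cannot be passed to the topic model.
    return out[out["text"].notna()]


# Toy rows for illustration only (hypothetical IDs and text).
papers = pd.DataFrame({
    "paper": ["A", "B", "C"],
    "abstract": ["abstract of A", None, "abstract of C"],
    "full_text": [None, None, "full text of C"],
})
inputs = select_text_inputs(papers)
print(inputs[["paper", "text_source"]])
```

Keeping a `text_source` column makes it easy to report how many inputs came from abstracts and how many from full text.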

```python
from myst_nb import glue

# Read as UTF-8: decoding as Latin-1 garbles curly quotes and accented characters.
df = pd.read_csv("data/sample_results.csv", encoding="utf-8")
# Drop the leftover "Unnamed: 0" index column from an earlier CSV round-trip.
df = df.loc[:, ~df.columns.str.startswith("Unnamed")]
glue("df_tbl", df)
```

| Dataset Title | Paper Title | Relevance | Topic 39 Percentage | JDC Tags Mentioned |
|---|---|:---:|:---:|:---:|
| Vulnerability Assessment of Syrian Refugees in... | Dignity and displaced Syrians in Lebanon. 'T... | 1.0 | 54% | 5 |
| Vulnerability Assessment of Syrian Refugees in... | Resettled Syrian Refugees in Oxford | 0.5 | 49% | 4 |
| Vulnerability Assessment of Syrian Refugees in... | Educate, Empower, Employ | 1.0 | 29% | 5 |
| Vulnerability Assessment of Syrian Refugees in... | Sudanese Refugees and the 'Syrian Refugee Re... | 1.0 | 52% | 4 |
| Vulnerability Assessment of Syrian Refugees in... | Syrian Refugees In Lebanon: 'New Community'... | 1.0 | 43% | 6 |
| Vulnerability Assessment of Syrian Refugees in... | No Country of Asylum: 'Legitimizing' Leban... | 1.0 | 39% | 7 |
| Vulnerability Assessment of Syrian Refugees in... | Vulnerable Permanency in Mass Influx The Cas... | 0.5 | 49% | 2 |
| Vulnerability Assessment of Syrian Refugees in... | Análisis derecho de asilo: Colombia | 0.5 | 39% | 3 |
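`Topic 39 Percentage` is stored as text such as `54%`. The small pandas sketch below converts it to a numeric share, then averages it together with `JDC Tags Mentioned` for each `Relevance` value, which takes the values 1.0 and 0.5 in this sample. The rows are copied from the output above. The summary is illustrative and is not part of the documented pipeline.

```python
import pandas as pd

# Rows copied from the sample_results.csv output above (titles truncated as displayed).
results = pd.DataFrame({
    "Paper Title": [
        "Dignity and displaced Syrians in Lebanon. 'T...",
        "Resettled Syrian Refugees in Oxford",
        "Educate, Empower, Employ",
        "Sudanese Refugees and the 'Syrian Refugee Re...",
        "Syrian Refugees In Lebanon: 'New Community'...",
        "No Country of Asylum: 'Legitimizing' Leban...",
        "Vulnerable Permanency in Mass Influx The Cas...",
        "Análisis derecho de asilo: Colombia",
    ],
    "Relevance": [1.0, 0.5, 1.0, 1.0, 1.0, 1.0, 0.5, 0.5],
    "Topic 39 Percentage": ["54%", "49%", "29%", "52%", "43%", "39%", "49%", "39%"],
    "JDC Tags Mentioned": [5, 4, 5, 4, 6, 7, 2, 3],
})

# Convert "54%" strings to fractions so the column can be sorted and averaged.
results["topic39_share"] = results["Topic 39 Percentage"].str.rstrip("%").astype(float) / 100

# Mean topic share and tag mentions for each relevance score.
summary = results.groupby("Relevance")[["topic39_share", "JDC Tags Mentioned"]].mean()
print(summary)
```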


## Model Output
Following manual review of the papers identified automatically, 7 were confirmed as relevant. Of these, YYY had previously been identified through the manual procedure, and ZZZ were new.

This represents a XXX percent improvement over the baseline.
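The split into previously identified and new papers, and the percent improvement, reduce to set operations on paper identifiers. The sketch below assumes the baseline is the set of papers found by the manual procedure, and that "improvement" means newly found papers as a percentage of that baseline; the section does not define either. `compare_with_manual` and the IDs in the usage line are hypothetical.

```python
def compare_with_manual(model_confirmed: set, manual: set) -> dict:
    """Split model-confirmed papers into previously known and new ones.

    Assumption: improvement = new papers as a percentage of the manual baseline.
    """
    previously_known = model_confirmed & manual
    new = model_confirmed - manual
    improvement_pct = 100 * len(new) / len(manual) if manual else float("nan")
    return {
        "previously_known": len(previously_known),
        "new": len(new),
        "improvement_pct": improvement_pct,
    }


# Hypothetical IDs for illustration only.
print(compare_with_manual({"p1", "p2", "p3"}, {"p1", "p4"}))
# → {'previously_known': 1, 'new': 2, 'improvement_pct': 100.0}
```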

## Evaluating Model Performance
Based on our selected evaluation metric, Query X performed best.
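The evaluation metric is not named in this section. One common choice, assumed here, is to score each query's result set against the manually confirmed relevant papers using precision, recall, and F1. `score_queries`, `best_query`, and the toy IDs are hypothetical.

```python
def score_queries(results_by_query: dict[str, set], relevant: set) -> dict[str, dict]:
    """Precision, recall, and F1 of each query's result set against the relevant papers."""
    scores = {}
    for query, returned in results_by_query.items():
        hits = len(returned & relevant)
        precision = hits / len(returned) if returned else 0.0
        recall = hits / len(relevant) if relevant else 0.0
        # F1 is the harmonic mean of precision and recall; it is 0 when there are no hits.
        f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
        scores[query] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


def best_query(scores: dict[str, dict], metric: str = "f1") -> str:
    """Name of the query with the highest value of `metric` (the first one wins ties)."""
    return max(scores, key=lambda q: scores[q][metric])


# Hypothetical result sets for illustration only.
relevant = {"a", "b", "c"}
scores = score_queries(
    {"Query 1": {"a", "b", "x", "y"}, "Query 2": {"a"}, "Query 3": set()},
    relevant,
)
print(best_query(scores), best_query(scores, "precision"))
```

The choice of metric matters: in this toy example a broad query wins on F1, while a narrow query with a single correct hit wins on precision.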

## Network analysis
Of the papers that referenced this dataset, XYZ came from ABC geographies and ABC institutions.
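The counts in this sentence can be tallied once each citing paper carries its authors' geographies and institutions. The sketch below assumes hypothetical `countries` and `institutions` lists per paper; these are not columns of the Semantic Scholar export shown earlier. It also counts institution pairs that appear on the same paper as edges of a simple collaboration network. `summarize_origins` is a hypothetical helper.

```python
from collections import Counter
from itertools import combinations


def summarize_origins(papers: list[dict]) -> dict:
    """Tally geographies and institutions across papers that cite a dataset.

    Each paper is a dict with hypothetical `countries` and `institutions` lists.
    """
    # Count each country/institution at most once per paper.
    countries = Counter(c for p in papers for c in set(p.get("countries", [])))
    institutions = Counter(i for p in papers for i in set(p.get("institutions", [])))
    # Institutions appearing on the same paper form an undirected edge (sorted pair).
    edges = Counter(
        pair
        for p in papers
        for pair in combinations(sorted(set(p.get("institutions", []))), 2)
    )
    return {
        "n_papers": len(papers),
        "n_countries": len(countries),
        "n_institutions": len(institutions),
        "top_countries": countries.most_common(3),
        "top_institutions": institutions.most_common(3),
        "edges": edges,
    }
```

`n_countries` and `n_institutions` correspond to the two ABC placeholders, while `edges` can be loaded into a graph library for further network analysis.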