# HAL

The [HAL API](https://api.archives-ouvertes.fr/docs/search) is used to extract the countries of all the institutions appearing in the journal articles (document type `ART`) of INPHYNI.

The typical request is:
[http://api.archives-ouvertes.fr/search/INPHYNI/?q=*:*&rows=2000&fq=docType_s:ART&fl=instStructCountry_s](http://api.archives-ouvertes.fr/search/INPHYNI/?q=*:*&rows=2000&fq=docType_s:ART&fl=instStructCountry_s)

with the following parameters:
* `/INPHYNI/`: restrict the search to the [INPHYNI collection](https://cnrs.hal.science/INPHYNI)

* `q=*:*`: match all documents (any value in any field)

* `rows=2000`: return at most 2000 documents

* `fq=docType_s:ART`: keep only journal articles

* `fl=instStructCountry_s`: return only the country of each institution (structure) the authors are affiliated with
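For reference, the same URL can be assembled from these parameters instead of being written by hand. The sketch below uses only the standard library; `build_hal_url` is a hypothetical helper name, and its default arguments simply mirror the request above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_hal_url(collection="INPHYNI", doc_type="ART",
                  fields=("instStructCountry_s",), rows=2000):
    """Assemble a HAL search URL from the parameters listed above."""
    params = {
        "q": "*:*",                     # match every document
        "rows": rows,                   # maximum number of documents returned
        "fq": f"docType_s:{doc_type}",  # filter on the document type
        "fl": ",".join(fields),         # comma-separated list of returned fields
    }
    # urlencode percent-escapes reserved characters such as '*' and ':'
    return f"http://api.archives-ouvertes.fr/search/{collection}/?{urlencode(params)}"


url = build_hal_url()
print(url)
# Decoding the query string gives back the original parameter values
print(parse_qs(urlsplit(url).query))
```

Changing `collection` or `doc_type` reuses the same structure. Note that `requests.get` also accepts a `params` dict and performs the same encoding itself.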

### Import libraries

```python
import requests
from collections import Counter
import pandas as pd
import pycountry
import plotly.express as px
```

### Define the HAL request and get the response

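```python
inphyni_art = "http://api.archives-ouvertes.fr/search/INPHYNI/?q=*:*&rows=2000&fq=docType_s:ART&fl=instStructCountry_s"

# Query the API and fail loudly on HTTP errors instead of parsing an error page
response = requests.get(inphyni_art, timeout=5)
response.raise_for_status()

# Decode the JSON body once and keep the Solr "response" object
result = response.json()['response']
docs = result['docs']     # one dict per article
num = result['numFound']  # total number of matching articles

# rows=2000 caps the number of returned docs: warn if some were left out
if num > len(docs):
    print("Warning: only {} of {} articles were returned".format(len(docs), num))

print("Number of articles found: {}".format(num))
```

```
Number of articles found: 1765
```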
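This request works because `rows=2000` is larger than `numFound` (1765 here); if the collection grows past that, the remaining articles are silently dropped. A minimal sketch of paging through the results, assuming the HAL endpoint honours Solr's standard `start` offset parameter alongside `rows` (worth checking in the HAL API documentation). To keep it runnable offline, the HTTP call is abstracted as a hypothetical `fetch_page(start, rows)` function returning the decoded `response` object, and a fake fetcher serves invented documents.

```python
def fetch_all_docs(fetch_page, page_size=500):
    """Collect every document by requesting consecutive pages.

    fetch_page(start, rows) is a hypothetical helper returning the decoded
    Solr 'response' object, i.e. a dict with 'numFound' and 'docs' keys.
    """
    docs = []
    start = 0
    while True:
        page = fetch_page(start, page_size)
        docs.extend(page["docs"])
        start += page_size
        # Stop once every match has been requested, or if a page comes back empty
        if start >= page["numFound"] or not page["docs"]:
            return docs


# Offline stand-in for the API: 1234 invented articles served page by page
FAKE_DOCS = [{"instStructCountry_s": ["fr"]} for _ in range(1234)]


def fake_fetch(start, rows):
    return {"numFound": len(FAKE_DOCS), "docs": FAKE_DOCS[start:start + rows]}


all_docs = fetch_all_docs(fake_fetch)
print(len(all_docs))
```

With the real API, `fetch_page` would call `requests.get` on the collection URL with `params` such as `{"q": "*:*", "fq": "docType_s:ART", "fl": "instStructCountry_s", "start": start, "rows": rows}` and return `response.json()['response']`.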


### Count the occurrences per country

```python
country_list = []

# For each article, get its list of institution countries (empty if the field
# is missing), remove duplicates so that each country counts once per article,
# and append them to country_list
for d in docs:
    country_list += list(set(d.get("instStructCountry_s", [])))

# Count the number of occurrences of each country
country_occ = Counter(country_list)
```
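Because duplicates are removed within each article, `country_occ` counts articles per country, not authors or affiliations. The toy example below illustrates this on invented data: an article with two French laboratories still adds 1 to `fr`. `count_countries` is a hypothetical helper doing the same as the loop above, while also tolerating articles without the field.

```python
from collections import Counter


def count_countries(docs, field="instStructCountry_s"):
    """Count, for each country code, the number of articles it appears in."""
    counts = Counter()
    for doc in docs:
        # set() keeps one entry per country, however many labs it has
        counts.update(set(doc.get(field, [])))
    return counts


toy_docs = [
    {"instStructCountry_s": ["fr", "fr", "us"]},  # two French labs + one US lab
    {"instStructCountry_s": ["fr", "it"]},
    {},                                           # no country recorded
]
print(count_countries(toy_docs))
```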

### Convert to dataframe for processing

```python
def convert_iso(row):
    '''Convert an ISO 3166-1 alpha-2 code to alpha-3 (None if pycountry does not know it).'''
    country = pycountry.countries.get(alpha_2=row['iso-alpha-2'])
    return country.alpha_3 if country is not None else None

# Make a DataFrame from the country_occ Counter
df = pd.DataFrame(country_occ.items(), columns=['iso-alpha-2', 'occurence'])

# Convert to alpha-3 codes, the format plotly expects for `locations` by default
df['iso-alpha-3'] = df.apply(convert_iso, axis=1)

# Keep only countries outside France (filter on the code rather than on a row
# label, which depends on insertion order), and drop unconverted codes
df = df[df['iso-alpha-3'].notna() & (df['iso-alpha-3'] != 'FRA')]

# Sort by decreasing number of occurrences
df = df.sort_values(by='occurence', ascending=False)

# Keep the alpha-3 code (renamed to 'country') and the count, then reset the index
df = df.rename(columns={'iso-alpha-3': 'country'})
df = df[['country', 'occurence']].reset_index(drop=True)

df.head(5)
```

|   | country | occurence |
|---|---------|-----------|
| 0 | USA | 158 |
| 1 | ITA | 95 |
| 2 | DEU | 85 |
| 3 | ESP | 80 |
| 4 | GBR | 78 |
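As an alternative sketch, the table can also be built directly from `Counter.most_common()`, which already returns the codes by decreasing count, so no separate sort or index reset is needed. The converter is passed in as a function; here a small hand-written dictionary stands in for pycountry so the example runs on its own (`foreign_country_table`, `TOY_ISO` and the counts are hypothetical, for illustration only).

```python
from collections import Counter

import pandas as pd


def foreign_country_table(counts, to_alpha3, home="FRA"):
    """Build a (country, occurence) table sorted by decreasing count.

    counts: Counter of ISO alpha-2 codes; to_alpha3: function returning the
    alpha-3 code or None. The home country and unresolved codes are skipped.
    """
    rows = []
    for code, n in counts.most_common():  # most_common() is sorted by count
        alpha3 = to_alpha3(code)
        if alpha3 is None or alpha3 == home:
            continue
        rows.append((alpha3, n))
    # Column name spelled 'occurence' to match the notebook's DataFrame
    return pd.DataFrame(rows, columns=["country", "occurence"])


# Hypothetical stand-in for pycountry and invented counts
TOY_ISO = {"fr": "FRA", "us": "USA", "it": "ITA"}
toy_counts = Counter({"fr": 40, "us": 7, "it": 3, "zz": 1})

toy_table = foreign_country_table(toy_counts, TOY_ISO.get)
print(toy_table)
```

With pycountry, the converter could be `lambda c: getattr(pycountry.countries.get(alpha_2=c), "alpha_3", None)`, assuming a recent pycountry version where `get` returns `None` for unknown codes.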


```python
# Total number of (article, country) pairs outside France
num_occ_wo_fr = df["occurence"].sum()

print("Number of occurrences outside France: {}".format(num_occ_wo_fr))
```

```
Number of occurrences outside France: 991
```
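The figure of 991 counts (article, country) pairs: an article co-authored with laboratories in the USA and in Italy contributes 2. If the quantity of interest is rather the number of articles with at least one institution abroad, it has to be computed at the article level. A minimal sketch on invented toy documents, assuming France is coded `fr` in `instStructCountry_s` (`foreign_pairs` and `articles_with_foreign_partner` are hypothetical helpers):

```python
def articles_with_foreign_partner(docs, home="fr", field="instStructCountry_s"):
    """Count articles listing at least one institution outside `home`."""
    return sum(
        1
        for doc in docs
        if any(code.lower() != home for code in doc.get(field, []))
    )


def foreign_pairs(docs, home="fr", field="instStructCountry_s"):
    """Count distinct (article, foreign country) pairs."""
    return sum(
        len({code.lower() for code in doc.get(field, [])} - {home})
        for doc in docs
    )


toy_docs = [
    {"instStructCountry_s": ["fr", "us", "it"]},  # 2 foreign countries
    {"instStructCountry_s": ["fr", "fr"]},        # French-only article
    {"instStructCountry_s": ["fr", "de"]},        # 1 foreign country
]
print(foreign_pairs(toy_docs), articles_with_foreign_partner(toy_docs))  # → 3 2
```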


### Use `plotly.express` module to map the results

```python
fig = px.scatter_geo(df, locations="country", size="occurence", projection="natural earth")
fig.show()
```