<a href="https://colab.research.google.com/github/restrepo/medicion/blob/master/cienciometria/Query_CTR.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# WOS+SCI+SCP+PTJ+CTR queries for UdeA

Queries on the bibliographic databases
* Web of Science (WOS)
* Scielo (SCI)
* Scopus (SCP)
* Puntaje (UDEA)
* Center (CTR)

for the scientific articles of UdeA.

The database was created with:

[WOS_SCI_SCP_PTJ_CTR.ipynb](./WOS_SCI_SCP_PTJ_CTR.ipynb)

In [1]:
import os
VERSION='NEW'
if os.getcwd()=='/content':
    !pip install openpyxl xlrd wosplus fuzzywuzzy[speedup] > /dev/null

## functions

In [2]:
import pandas as pd
import wosplus as wp
pd.set_option('display.max_colwidth',200)
from venn import draw_venn, generate_colors
import numpy as np
import fuzzywuzzy.process as fwp
from fuzzywuzzy import fuzz

## Configure public links of files in Google Drive
* If the file is a Google Spreadsheet, it is downloaded as CSV
* If it is an Excel, JSON, or text file, it is downloaded directly

To define your own labeled IDs for public Google Drive files, edit the next cell:

In [3]:
%%writefile drive.cfg
[FILES]
WOS_SCI_SCP_PTJ_CTR.json.gz=19E1C1kRk4I0V3uXojqko8-NEicWaPp1j

Overwriting drive.cfg
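
As a rough illustration of what this configuration holds, the sketch below parses the same `[FILES]` section with Python's standard `configparser` and builds a download URL for each labeled ID. The `uc?export=download&id=` URL pattern and the helper name `drive_url` are assumptions made for illustration; `wosplus` may resolve the IDs differently.

```python
import configparser

CFG_TEXT = """
[FILES]
WOS_SCI_SCP_PTJ_CTR.json.gz=19E1C1kRk4I0V3uXojqko8-NEicWaPp1j
"""

def drive_url(file_id):
    # Hypothetical helper: assumed public-download URL pattern for Google Drive
    return f"https://drive.google.com/uc?export=download&id={file_id}"

parser = configparser.ConfigParser()
parser.optionxform = str  # keep the case of the file labels (keys are lowercased by default)
parser.read_string(CFG_TEXT)

# Map each label to its Google Drive ID
files = dict(parser["FILES"])
for label, file_id in files.items():
    print(label, "->", drive_url(file_id))
```

Setting `optionxform = str` matters here because the labels are later used as case-sensitive file names.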


## Load databases

In [4]:
affil='Univ Antioquia'
drive_files=wp.wosplus('drive.cfg')

In [5]:
UDEAjsonfile='WOS_SCI_SCP_PTJ_CTR.json.gz'
tmp=drive_files.load_biblio(UDEAjsonfile,compression='gzip')
UDEA=drive_files.biblio['WOS'].copy().reset_index(drop=True)



In [6]:
#from check_quality import *
#check_quality(UDEA)

## Indices
Information obtained from the column `json_column='UDEA_authors'`:

In [7]:
json_column='UDEA_authors'

This column contains lists of dictionaries with the information of each UdeA author:

`{'DEPARTAMENTO': 'Instituto de Biología',
  'FACULTAD': 'Facultad de Ciencias Exactas y Naturales',
  'GRUPO': 'Sin Grupo Asociado',
  'INICIALES': 'I.',
  'NOMBRE COMPLETO': 'Idalyd Fonseca Gonzalez',
  'NOMBRES': 'Idalyd',
  'PRIMER APELLIDO': 'Fonseca',
  'SEGUNDO APELLIDO': 'Gonzalez',
  'WOS_affiliation': ['Univ Antioquia, Colombia.'],
  'WOS_author': ['FONSECA, IDALYD',
   'FONSECA-GONZALEZ, IDALYD',
   'Fonseca-Gonzalez, Idalyd',
   'Fonseca-Gonzalez, I.'],
  'full_name': 'FONSECA GONZALEZ IDALYD'}`

Other columns: `['OA', 'Z9', 'SCP_Cited by']`, where `Z9` is the WOS cited-by count.

See also [WOS field tags](https://images.webofknowledge.com/images/help/WOS/hs_wos_fieldtags.html)
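
To make the structure concrete, here is a minimal sketch, using only the standard library, of how the values of one key can be counted across such a column. The toy column, the placeholder author names, and the helper `count_key` are assumptions for illustration, not part of the notebook; each distinct value is counted once per article, and non-list entries (articles without an identified UdeA author) are skipped.

```python
from collections import Counter

# Toy stand-in for the 'UDEA_authors' column: one entry per article.
# An empty string marks an article with no identified UdeA author.
udea_authors = [
    [{'FACULTAD': 'Facultad de Ciencias Exactas y Naturales',
      'full_name': 'FONSECA GONZALEZ IDALYD'}],
    [{'FACULTAD': 'Facultad de Medicina', 'full_name': 'AUTHOR ONE'},   # placeholder names
     {'FACULTAD': 'Facultad de Medicina', 'full_name': 'AUTHOR TWO'}],
    '',
]

def count_key(column, key):
    """Count in how many articles each value of `key` appears."""
    counts = Counter()
    for authors in column:
        if not isinstance(authors, list):
            continue  # skip unidentified articles
        # A set ensures each distinct value is counted once per article
        counts.update({d[key] for d in authors if d.get(key)})
    return counts

faculty_counts = count_key(udea_authors, 'FACULTAD')
print(faculty_counts.most_common())
```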

# Overall results

Unidentified articles:

In [8]:
UDEA_NOT=UDEA[UDEA[json_column]==''].reset_index(drop=True)
UDEA_NOT.shape[0]

4019

Identified articles:

In [9]:
UDEA_YES=UDEA[UDEA[json_column]!=''].reset_index(drop=True)
UDEA_YES.shape[0]

11681

### Analysis of identified articles

In [10]:
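def flatten_if_nested(l):
    '''
    Flatten l by one level into a NumPy array if any of its items is a list;
    otherwise return l unchanged
    '''
    flatten=False
    for i in l:
        if type(i)==list:
            flatten=True
    if flatten:
        l=[item for sublist in l for item in sublist]
        l=np.array(l)  # pd.np was removed in pandas 2.0; use NumPy directly
    return l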
def extract_key(df,key,json_column='UDEA_authors'):
    '''
    Extract all the unique key values of the list of dictionaries in 
    a json column when the key value is a string or another list
    '''
    ll=df[json_column].apply(lambda l: np.unique([ d.get(key) for d in l 
                                if d.get(key) ]) if type(l)==list else l)
    if ll.str[0].apply(lambda l: l if type(l)==list else None).dropna().shape[0]:
        ll=ll.apply(flatten_if_nested)
    ll=ll.apply(pd.Series).stack().values
    return pd.DataFrame( {key:list(ll)} ).groupby(key)[key].count().sort_values(ascending=False)

In [11]:
extract_key(UDEA_YES,'FACULTAD')

FACULTAD
Facultad de Medicina                        3077
Facultad de Ciencias Exactas y Naturales    2255
Facultad de Ingeniería                      1778
Facultad de Ciencias Agrarias                656
Facultad de Ciencias Sociales y Humanas      215
Facultad de Artes                             13
Name: FACULTAD, dtype: int64

In [12]:
extract_key(UDEA_YES,'DEPARTAMENTO')

DEPARTAMENTO
Departamento de Microbiología y Parasitología                   886
Instituto de Física                                             862
Instituto de Investigaciones Médicas                            717
Instituto de Biología                                           656
Instituto de Química                                            644
Departamento de Medicina Interna                                642
Departamento de  Producción Agropecuaria                        406
Departamento de Pediatría y Puericultura                        388
Departamento de Ingeniería Metalúrgica                          334
Departamento de Ingeniería Sanitaria  y Ambiental               317
Escuela de Medicina Veterinaria                                 305
Departamento de Ingeniería Mecánica                             279
Departamento de Ingeniería Quimica                              276
Departamento de Cirugía                                         210
Departamento de Fisiología         

In [13]:
extract_key(UDEA_YES,'GRUPO')

GRUPO
Sin Grupo Asociado                                                                                                                                                                                436
Grupo de Materia Condensada-UdeA                                                                                                                                                                  261
Inmunovirología                                                                                                                                                                                   236
Centro de Investigación, Innovación y Desarrollo de Materiales - CIDEMAT - Anteriormente: Grupo de Corrosión y Protección                                                                         232
Grupo de Manejo Eficiente de la Energía, GIMEL                                                                                                                                                    219
Grup

In [14]:
extract_key(UDEA_YES,'full_name')

full_name
DUQUE ECHEVERRI CARLOS ALBERTO        261
BEDOYA BERRIO GABRIEL DE JESUS        123
LOPERA RESTREPO FRANCISCO JAVIER      120
CERON MUÑOZ MARIO FERNANDO            120
RUGELES LOPEZ MARIA TERESA            116
JAIMES BARRAGAN FABIAN ALBERTO        113
CARMONA FONSECA JAIME DE JESUS        113
PEÑUELA MESA GUSTAVO ANTONIO          110
VELEZ BERNAL IVAN DARIO               109
OLIVERA ANGEL MARTHA EUFEMIA          101
AMARILES MUÑOZ PEDRO JOSE              95
RIOS LUIS ALBERTO                      92
RESTREPO BETANCUR LUIS FERNANDO        92
CARDONA MAYA WALTER DARIO              91
CARDONA ARIAS JAIBERTH ANTONIO         91
ARDILA MEDINA CARLOS MARTIN            86
MORALES ARAMBURO ALVARO LUIS           84
BLAIR TRUJILLO SILVIA VICTORIA         82
ROBLEDO RESTREPO SARA MARIA            80
MONDRAGON PEREZ FANOR                  79
CADAVID JARAMILLO ANGELA PATRICIA      78
AGUDELO SUAREZ ANDRES ALONSO           77
CORNEJO OCHOA JOSE WILLIAM             75
TRIANA CHAVEZ OMAR      

# Queries

In [15]:
def extract_key_unique(*args,**kwargs):
    keys=extract_key(*args,**kwargs).keys()
    return [ k for k in keys if k]

def get_groups(l,g):
    for d in l:
        gt=d.get('GRUPO')
        if gt and type( gt )==str:
            gs=gt.replace(
                ', Grupo','; Grupo'
            ).split('; ')
            for gg in gs:
                if gg not in g:
                    g.append(gg)
    return g

facultades={'key':'FACULTAD',
            'values' : extract_key_unique(UDEA,'FACULTAD',json_column='UDEA_authors') }
departamentos={'key':'DEPARTAMENTO',
            'values' :extract_key_unique(UDEA,'DEPARTAMENTO',json_column='UDEA_authors')}
nombre_completo={'key'    : 'NOMBRE COMPLETO',
            'values' : extract_key_unique(UDEA,'NOMBRE COMPLETO',json_column='UDEA_authors')}
full_name={'key'    : 'full_name',
            'values' : extract_key_unique(UDEA,'full_name',json_column='UDEA_authors')}
udea_affiliations={'key'    : 'WOS_affiliation',
            'values' : extract_key_unique(UDEA,'WOS_affiliation',json_column='UDEA_authors')}
# The dictionaries in 'authors_WOS' store affiliations under the 'affiliation' key
wos_affiliations={'key'    : 'affiliation',
            'values' : extract_key_unique(UDEA,'affiliation',json_column='authors_WOS')}
udea_author={'key'    : 'WOS_author',
            'values' : extract_key_unique(UDEA,'WOS_author',json_column='UDEA_authors')}
wos_author={'key'    : 'WOS_author',
            'values' : extract_key_unique(UDEA,'WOS_author',json_column='authors_WOS')}


#.apply(....) is a loop!
g=[]
#append to g
tmp=UDEA.UDEA_authors.apply(lambda l: 
                        get_groups(l,g)
        if type(l)==list else None
                        )
grupos={'key':'GRUPO',
            'values' :g}
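
The group-splitting step in `get_groups` can be checked in isolation. The sketch below reproduces the same normalization (`', Grupo'` is rewritten to `'; Grupo'` before splitting on `'; '`) on a few hypothetical `GRUPO` labels; the helper names `split_groups` and `collect_groups` and the sample labels are assumptions for illustration.

```python
def split_groups(label):
    # Same normalization as get_groups: treat ', Grupo' as a separator too
    return label.replace(', Grupo', '; Grupo').split('; ')

def collect_groups(records, seen=None):
    """Accumulate the distinct group names found in a list of author dictionaries."""
    seen = [] if seen is None else seen
    for d in records:
        label = d.get('GRUPO')
        if label and isinstance(label, str):
            for name in split_groups(label):
                if name not in seen:
                    seen.append(name)
    return seen

records = [
    {'GRUPO': 'Grupo A, Grupo B'},          # hypothetical labels
    {'GRUPO': 'Grupo B; Inmunovirología'},
    {'GRUPO': None},                        # ignored: no group string
]
groups = collect_groups(records)
print(groups)
```

Only commas followed by ` Grupo` are treated as separators, so a comma inside a single group name is left untouched.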


## Query function

Searches the string or list value stored under a given key in each dictionary of a list-of-dictionaries column, such as `UDEA_authors` in the `UDEA` DataFrame.

In [16]:
def query_json_column(q,df=UDEA,json_column='UDEA_authors',
                        choices=nombre_completo,scorer=fuzz.partial_token_sort_ratio,**kwargs):
    # Find the best fuzzy match for q among the indexed values.
    # Note: extra keyword arguments (e.g. score_cutoff) are accepted but not forwarded to extractOne
    fchoices=fwp.extractOne(q,choices['values'],scorer=scorer)[0]
    # Exact substring search in the indexed subcolumn converted to strings (e.g. list → string if necessary)
    dfF=df[df[json_column].apply(lambda l: True in [ str(d.get(choices['key'])).find(fchoices)>-1 
                                    for d in l if d.get(choices['key'])] if type(l)==list else False)]
    return dfF.reset_index(drop=True)
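
The two-step logic of `query_json_column` (a fuzzy pick of the closest indexed value, followed by an exact substring filter) can be sketched without pandas or fuzzywuzzy. In the sketch below the standard-library `difflib` is swapped in as a stand-in for the fuzzywuzzy scorer, so the scores differ from the notebook's; the function name `query_records` and the toy rows are assumptions for illustration.

```python
import difflib

def query_records(q, rows, key, choices):
    # Step 1: pick the closest known value (difflib stands in for fuzzywuzzy here)
    best = difflib.get_close_matches(q, choices, n=1, cutoff=0.0)
    if not best:
        return []
    target = best[0]
    # Step 2: keep the rows where any dictionary holds the target as a substring
    return [row for row in rows
            if isinstance(row, list)
            and any(target in str(d.get(key)) for d in row if d.get(key))]

# Toy rows mimicking the 'UDEA_authors' column; '' marks an unidentified article
rows = [
    [{'NOMBRE COMPLETO': 'Diego Alejandro Restrepo Quintero'}],
    [{'NOMBRE COMPLETO': 'Oscar Alberto Zapata Noreña'}],
    '',
]
choices = ['Diego Alejandro Restrepo Quintero', 'Oscar Alberto Zapata Noreña']

hits = query_records('Diego Restrepo', rows, 'NOMBRE COMPLETO', choices)
print(len(hits))
```

Because the second step is a substring test, a short target that is contained in longer values also matches those longer values.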

### Author

In [17]:
r=query_json_column('Diego Alejandro Restrepo Quintero',df=UDEA,json_column='UDEA_authors',
                        choices=nombre_completo,scorer=fuzz.partial_token_sort_ratio,score_cutoff=79)

In [18]:
r.shape

(37, 181)

In [19]:
r[['TI','AU','authors_WOS',json_column]].reset_index(drop=True)[5:7]

Unnamed: 0,TI,AU,authors_WOS,UDEA_authors
5,Gravitino dark matter and neutrino masses with bilinear R-parity violation,"Restrepo, D\nTaoso, M\nValle, JWF\nZapata, O\n","[{'i': 0, 'WOS_author': 'Restrepo, Diego', 'affiliation': ['Univ Antioquia, Inst Fis, Medellin 1226, Colombia.']}]","[{'WOS_author': ['Zapata, Oscar', 'Zapata, O.'], 'SEGUNDO APELLIDO': 'Noreña', 'INICIALES': 'O. A.', 'NOMBRE COMPLETO': 'Oscar Alberto Zapata Noreña', 'FACULTAD': 'Facultad de Ciencias Exactas y N..."
6,Radiative type III seesaw model and its collider phenomenology,"von der Pahlen, F\nPalacio, G\nRestrepo, D\nZapata, O\n","[{'i': 0, 'WOS_author': 'von der Pahlen, Federico', 'affiliation': ['Univ Antioquia, Inst Fis, Calle 70 52-21, Medellin 050010, Colombia.']}, {'i': 1, 'WOS_author': 'Palacio, Guillermo', 'affiliat...","[{'WOS_author': ['Restrepo, Diego', 'Restrepo, D.'], 'SEGUNDO APELLIDO': 'Quintero', 'INICIALES': 'D. A.', 'NOMBRE COMPLETO': 'Diego Alejandro Restrepo Quintero', 'FACULTAD': 'Facultad de Ciencias..."


## Groups

Example

In [20]:
r=query_json_column('Grupo de Fenomenología de Interacciones Fundamentales',df=UDEA,json_column='UDEA_authors',
                        choices=grupos,scorer=fuzz.partial_token_sort_ratio,score_cutoff=79)

In [21]:
r.shape

(82, 181)

Query all groups

In [22]:
rows=[]
for g in grupos['values']:
    r=query_json_column(g,df=UDEA,json_column='UDEA_authors',choices=grupos,
                        scorer=fuzz.partial_token_sort_ratio,score_cutoff=79)
    rows.append( {'Group':g,'articles':r.shape[0]} )
# DataFrame.append was removed in pandas 2.0: build the frame from the collected rows
gdf=pd.DataFrame(rows,columns=['Group','articles'])
gdf['articles']=gdf['articles'].astype(int)

In [23]:
gdf.sort_values('articles',ascending=False).reset_index(drop=True)[:10]

Unnamed: 0,Group,articles
0,Sin Grupo Asociado,436
1,Grupo de Materia Condensada-UdeA,300
2,"Grupo Reproducción, Inmunovirología, Infección y Cáncer",284
3,Inmunovirología,284
4,Grupo de Estado Sólido,272
5,"Centro de Investigación, Innovación y Desarrollo de Materiales - CIDEMAT - Anteriormente: Grupo de Corrosión y Protección,",252
6,"Centro de Investigación, Innovación y Desarrollo de Materiales - CIDEMAT - Anteriormente: Grupo de Corrosión y Protección",252
7,Grupo Académico de Epidemiología Clínica,240
8,"Grupo Académico de Epidemiología Clínica, Nacer, Salud Sexual y Reproductiva",240
9,Grupo de Neurociencias de Antioquia,240


## Department

In [24]:
r=query_json_column('Instituto de Física',df=UDEA,json_column='UDEA_authors',
                        choices=departamentos,scorer=fuzz.partial_token_sort_ratio,score_cutoff=79)

In [25]:
r.shape

(862, 181)

## Center

Example

In [26]:
cen=query_json_column('Facultad de Ciencias Exactas y Naturales',df=UDEA,json_column='UDEA_authors',
                        choices=facultades,scorer=fuzz.partial_token_sort_ratio,score_cutoff=79)

In [27]:
cen.shape

(2255, 181)

All

In [28]:
rows=[]
for f in facultades['values']:
    r=query_json_column(f,df=UDEA,json_column='UDEA_authors',choices=facultades,
                        scorer=fuzz.partial_token_sort_ratio,score_cutoff=79)
    rows.append( {'Facultad':f,'articles':r.shape[0]} )
# DataFrame.append was removed in pandas 2.0: build the frame from the collected rows
fdf=pd.DataFrame(rows,columns=['Facultad','articles'])
fdf['articles']=fdf['articles'].astype(int)

In [29]:
fdf.sort_values('articles',ascending=False)

Unnamed: 0,Facultad,articles
0,Facultad de Medicina,3077
1,Facultad de Ciencias Exactas y Naturales,2255
2,Facultad de Ingeniería,1778
3,Facultad de Ciencias Agrarias,656
4,Facultad de Ciencias Sociales y Humanas,215
5,Facultad de Artes,13


## Citations

In [30]:
UDEA_YES.sort_values('Z9',ascending=False)[['Z9','TI','SO','AU','PY']].reset_index(drop=True)[:10]

Unnamed: 0,Z9,TI,SO,AU,PY
0,3610,"An integrated map of genetic variation from 1,092 human genomes",NATURE,"Altshuler, DM\nDurbin, RM\nAbecasis, GR\nBentley, DR\nChakravarti, A\nClark, AG\nDonnelly, P\nEichler, EE\nFlicek, P\nGabriel, SB\nGibbs, RA\nGreen, ED\nHurles, ME\nKnoppers, BM\nKorbel, JO\nLande...",2012
1,1526,Leishmaniasis Worldwide and Global Estimates of Its Incidence,PLOS ONE,"Alvar, J\nVelez, ID\nBern, C\nHerrero, M\nDesjeux, P\nCano, J\nJannin, J\nden Boer, M\n",2012
2,1271,A global reference for human genetic variation,NATURE,"Altshuler, DM\nDurbin, RM\nAbecasis, GR\nBentley, DR\nChakravarti, A\nClark, AG\nDonnelly, P\nEichler, EE\nFlicek, P\nGabriel, SB\nGibbs, RA\nGreen, ED\nHurles, ME\nKnoppers, BM\nKorbel, JO\nLande...",2015
3,901,Human papillomavirus genotype attribution in invasive cervical cancer: a retrospective cross-sectional worldwide study,LANCET ONCOLOGY,"de Sanjose, S\nQuint, WGV\nAlemany, L\nGeraets, DT\nKlaustermeier, JE\nLloveras, B\nTous, S\nFelix, A\nBravo, LE\nShin, HR\nVallejos, CS\nde Ruiz, PA\nLima, MA\nGuimera, N\nClavero, O\nAlejo, M\nL...",2010
4,711,The Genome Sequence of Taurine Cattle: A Window to Ruminant Biology and Evolution,SCIENCE,"Elsik, CG\nTellam, RL\nWorley, KC\nGibbs, RA\nAbatepaulo, ARR\nAbbey, CA\nAdelson, DL\nAerts, J\nAhola, V\nAlexander, L\nAlioto, T\nAlmeida, IG\nAmadio, AF\nAnatriello, E\nAntonarakis, SE\nAnzola,...",2009
5,601,GENETIC ABSOLUTE DATING BASED ON MICROSATELLITES AND THE ORIGIN OF MODERN HUMANS,PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF\nAMERICA,"GOLDSTEIN, DB\nLINARES, AR\nCAVALLISFORZA, LL\nFELDMAN, MW\n",1995
6,474,Mutations in IRF6 cause Van der Woude and popliteal pterygium syndromes,NATURE GENETICS,"Kondo, S\nSchutte, BC\nRichardson, RJ\nBjork, BC\nKnight, AS\nWatanabe, Y\nHoward, E\nde Lima, RLLF\nDaack-Hirsch, S\nSander, A\nMcDonald-McGinn, DM\nZackai, EH\nLammer, EJ\nAylsworth, AS\nArdinge...",2002
7,429,Leptogenesis,PHYSICS REPORTS-REVIEW SECTION OF PHYSICS LETTERS,"Davidson, S\nNardi, E\nNir, Y\n",2008
8,410,Temperature sensitivity of drought-induced tree mortality portends increased regional die-off under global-change-type drought,PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF\nAMERICA,"Adams, HD\nGuardiola-Claramonte, M\nBarron-Gafford, GA\nVillegas, JC\nBreshears, DD\nZou, CB\nTroch, PA\nHuxman, TE\n",2009
9,376,Electron localization following attosecond molecular photoionization,NATURE,"Sansone, G\nKelkensberg, F\nPerez-Torres, JF\nMorales, F\nKling, MF\nSiu, W\nGhafur, O\nJohnsson, P\nSwoboda, M\nBenedetti, E\nFerrari, F\nLepine, F\nSanz-Vicario, JL\nZherebtsov, S\nZnakovskaya, ...",2010


In [31]:
UDEA_YES.Z9.sum()

75281

In [32]:
UDEA_YES.sort_values('SCP_Cited by',ascending=False)[[
    'SCP_Cited by','TI','SO','AU','PY']].reset_index(drop=True)[:10]

Unnamed: 0,SCP_Cited by,TI,SO,AU,PY
0,1586,Leishmaniasis Worldwide and Global Estimates of Its Incidence,PLOS ONE,"Alvar, J\nVelez, ID\nBern, C\nHerrero, M\nDesjeux, P\nCano, J\nJannin, J\nden Boer, M\n",2012
1,1160,"Effects of tranexamic acid on death, vascular occlusive events, and blood transfusion in trauma patients with significant haemorrhage (CRASH-2): A randomised, placebo-controlled trial",The Lancet,"Olldashi F., Kerçi M., Zhurda T., Ruçi K., Banushi A., Traverso M.S., Jiménez J., Balbi J., Dellera C., Svampa S., Quintana G., Piñero G., Teves J., Seppelt I., Mountain D., Hunter J., Balogh Z., ...",2010
2,994,Human papillomavirus genotype attribution in invasive cervical cancer: a retrospective cross-sectional worldwide study,LANCET ONCOLOGY,"de Sanjose, S\nQuint, WGV\nAlemany, L\nGeraets, DT\nKlaustermeier, JE\nLloveras, B\nTous, S\nFelix, A\nBravo, LE\nShin, HR\nVallejos, CS\nde Ruiz, PA\nLima, MA\nGuimera, N\nClavero, O\nAlejo, M\nL...",2010
3,626,The Genome Sequence of Taurine Cattle: A Window to Ruminant Biology and Evolution,SCIENCE,"Elsik, CG\nTellam, RL\nWorley, KC\nGibbs, RA\nAbatepaulo, ARR\nAbbey, CA\nAdelson, DL\nAerts, J\nAhola, V\nAlexander, L\nAlioto, T\nAlmeida, IG\nAmadio, AF\nAnatriello, E\nAntonarakis, SE\nAnzola,...",2009
4,598,GENETIC ABSOLUTE DATING BASED ON MICROSATELLITES AND THE ORIGIN OF MODERN HUMANS,PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF\nAMERICA,"GOLDSTEIN, DB\nLINARES, AR\nCAVALLISFORZA, LL\nFELDMAN, MW\n",1995
5,485,Mutations in IRF6 cause Van der Woude and popliteal pterygium syndromes,NATURE GENETICS,"Kondo, S\nSchutte, BC\nRichardson, RJ\nBjork, BC\nKnight, AS\nWatanabe, Y\nHoward, E\nde Lima, RLLF\nDaack-Hirsch, S\nSander, A\nMcDonald-McGinn, DM\nZackai, EH\nLammer, EJ\nAylsworth, AS\nArdinge...",2002
6,439,The importance of early treatment with tranexamic acid in bleeding trauma patients: An exploratory analysis of the CRASH-2 randomised controlled trial,The Lancet,"Olldashi F., Kerçi M., Zhurda T., Ruçi K., Banushi A., Traverso M.S., Jiménez J., Balbi J., Dellera C., Svampa S., Quintana G., Piñero G., Teves J., Seppelt I., Mountain D., Balogh Z., Zaman M., D...",2011
7,432,Leptogenesis,PHYSICS REPORTS-REVIEW SECTION OF PHYSICS LETTERS,"Davidson, S\nNardi, E\nNir, Y\n",2008
8,424,Temperature sensitivity of drought-induced tree mortality portends increased regional die-off under global-change-type drought,PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF\nAMERICA,"Adams, HD\nGuardiola-Claramonte, M\nBarron-Gafford, GA\nVillegas, JC\nBreshears, DD\nZou, CB\nTroch, PA\nHuxman, TE\n",2009
9,405,THE STRUCTURE OF THE PRESENILIN-1 (S182) GENE AND IDENTIFICATION OF 6 NOVEL MUTATIONS IN EARLY-ONSET AD FAMILIES,NATURE GENETICS,"CLARK, RF\nHUTTON, M\nFULDNER, RA\nFROELICH, S\nKARRAN, E\nTALBOT, C\nCROOK, R\nLENDON, C\nPRIHAR, G\nHE, C\nKORENBLAT, K\nMARTINEZ, A\nWRAGG, M\nBUSFIELD, F\nBEHRENS, MI\nMYERS, A\nNORTON, J\nMOR...",1995


In [33]:
UDEA_YES['SCP_Cited by'].sum()

78299

# TMP

In [34]:
aun=extract_key(UDEA_NOT,'WOS_author',json_column='authors_WOS').keys()

In [35]:
aun

Index(['Builes, J. J.', '', 'Restrepo, A.', 'Ines Saldarriaga, C. I. Clara',
       'Montegranario, Hebert', 'Ibarra, A.', 'Calderón-Vélez, J. C.',
       'Botero, D.', 'Narvaez-Sanchez, R.', 'Moncada-Velez, Marcela',
       ...
       'Orrego Rodriguez, M. Á.', 'Orozco-Hoyos, Nataly',
       'Orozco-Arroyave, Juan Rafael', 'Orozco, Lina P.', 'Orozco, L. P.',
       'Orozco, L. E.', 'Orozco, J. S.', 'Orozco, J. R.', 'Orozco, I. C.',
       'López, J. A.'],
      dtype='object', name='WOS_author', length=4205)

In [36]:
extract_key(UDEA_YES,'WOS_author',json_column='authors_WOS')[:3]

WOS_author
Duque, C. A.         202
Lopera, Francisco     67
Bedoya, Gabriel       65
Name: WOS_author, dtype: int64

In [37]:
n=aun[2]
n

'Restrepo, A.'

In [38]:
fwp.extractOne( n, posib,scorer=fuzz.ratio )

NameError: name 'posib' is not defined

In [None]:
fwp.extractOne( n, posib,scorer=fuzz.token_set_ratio )

In [None]:
kk=query_json_column(n,df=UDEA,json_column='UDEA_authors',
                        choices=udea_author,scorer=fuzz.ratio,score_cutoff=79).reset_index(drop=True)

In [None]:
extract_key(kk,'WOS_affiliation',json_column='UDEA_authors')

In [None]:
query_json_column(n,df=UDEA_NOT,json_column='authors_WOS',
                        choices=udea_author,scorer=fuzz.ratio,score_cutoff=79)[['authors_WOS','SO']].loc[0]

Homonym detected.

In [None]:
fuzz.token_set_ratio(  'the guys that developed fuzzywuzzy are super responsive',
                       'update: the guys that developed fuzzywuzzy super responsive 2016'  )

In [None]:
fuzz.token_sort_ratio(  'the guys that developed fuzzywuzzy are super responsive',
                       'update: the guys that developed fuzzywuzzy super responsive 2016'  )