getLattes
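The getLattes R package, written by Roney Fraga Souza and Winicius Sabino, was built to extract data from curricula exported as XML from the Lattes curriculum platform.
The platform delivers each curriculum as a .zip archive, so the XML file must be extracted from the .zip before it can be imported.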
To automate the download process, please see Captchas Negated by Python reQuests - CNPQ.
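The extraction step mentioned above can be sketched in base R. This is a minimal sketch, not part of getLattes: the helper name `extract_lattes_xml`, the archive name in the usage comment and the choice of `datatest` as the target directory are assumptions made for illustration. It relies only on `utils::unzip`.

```r
# Hypothetical helper (not part of getLattes): extract the XML entries
# of one Lattes .zip archive into a target directory.
extract_lattes_xml <- function(zip_path, exdir = "datatest") {
  if (!file.exists(zip_path)) {
    stop("zip file not found: ", zip_path)
  }
  # list the archive contents without extracting anything
  contents <- utils::unzip(zip_path, list = TRUE)
  xml_files <- contents$Name[grepl("\\.xml$", contents$Name, ignore.case = TRUE)]
  if (length(xml_files) == 0) {
    stop("no XML file found in: ", zip_path)
  }
  dir.create(exdir, showWarnings = FALSE, recursive = TRUE)
  # extract only the XML entries; returns the paths of the extracted files
  utils::unzip(zip_path, files = xml_files, exdir = exdir)
}

# usage (assumed archive name):
# extract_lattes_xml("4984859173592703.zip", exdir = "datatest")
```

If several archives contain an XML entry with the same internal name, extracting them into one directory would overwrite earlier files, so a separate directory or a rename per archive may be needed.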
Installation
Stable version from CRAN.
install.packages('getLattes')
library(getLattes)
Development version from GitHub.
# install and load devtools from CRAN
install.packages("devtools")
library(devtools)
# install and load getLattes
devtools::install_github("roneyfraga/getLattes")
library(getLattes)
Import XML file as R list
# the file 4984859173592703.xml is stored in the datatest directory
# cl <- readLattes(filexml='4984859173592703.xml', path='datatest/')
# import all Lattes XML files in the datatest directory
# cls <- readLattes(filexml='*.xml$', path='datatest/')
# import all Lattes XML files in the working directory
cls <- readLattes(filexml='*.xml$')
Loaded data
To load two Lattes curricula, from researchers important in my academic journey, already imported as an R list:
data(xmlsLattes)
length(xmlsLattes)
Import general data
# dplyr provides bind_rows, used to combine a list of data frames into one data frame
library(dplyr)
# to import from one curriculum
getDadosGerais(xmlsLattes[[2]])
# to import from two or more curricula
lt <- lapply(xmlsLattes, getDadosGerais)
head(bind_rows(lt))
Import Published Academic Papers
# to import from one curriculum
getArtigosPublicados(xmlsLattes[[2]])
# to import from two or more curricula
lt <- lapply(xmlsLattes, getArtigosPublicados)
head(bind_rows(lt))
Normalize information
See normalizeByDoi, normalizeByJournal and normalizeByYear to normalize publication data (journal title, ISSN and year).
