bibnets is an R package for constructing bibliometric networks from scholarly
metadata. It reads common export formats, converts multi-valued fields such as
authors, references, keywords, countries, and affiliations into sparse incidence
matrices, and returns edge lists for co-authorship, co-citation, bibliographic
coupling, keyword co-occurrence, direct citation, historiograph, and custom
co-occurrence analyses.
The package is designed to keep the computational core small and inspectable:
the only CRAN imports are Matrix, stats, and utils. Optional graph and
bibliometric packages are used only when the user explicitly requests their
formats or runs equivalence tests.
- 8 dedicated network builders plus one generic builder: author_network(), document_network(), reference_network(), keyword_network(), source_network(), institution_network(), country_network(), historiograph(), and conetwork().
- Counting methods for full, fractional, paper-level, strength, and position-aware authorship weighting, including harmonic, arithmetic, geometric, adaptive geometric, golden-ratio, first-author, last-author, first-last, and custom position-weighted schemes.
- Attention-style position weights through attention = "lead", "last", "proximity", or "circular" for author, keyword, country, and institution networks.
- 6 similarity measures: none, association strength, cosine, Jaccard, inclusion, and equivalence.
- Readers for scholarly exports: Scopus, Web of Science, OpenAlex nested data, OpenAlex flat CSV, BibTeX, RIS, Lens.org, Dimensions, Crossref, and generic CSV files.
- Network reduction and export: backbone(), prune(), filter_top(), to_gephi(), to_graphml(), to_igraph(), to_tbl_graph(), to_matrix(), and to_cograph().
- Temporal and historical analysis: temporal_network(), local_citations(), and historiograph().
- Standard output: all network builders return a bibnets_network edge list with from, to, weight, and count columns.
Once accepted on CRAN:
install.packages("bibnets")
Development version from GitHub:
# install.packages("remotes")
remotes::install_github("mohsaqr/bibnets")
library(bibnets)
# Read a file or folder. The reader is detected from file content.
data <- read_biblio("export.csv")
# Common networks
authors <- author_network(data, type = "collaboration")
refs <- reference_network(data, type = "co_citation", min_occur = 2)
docs <- document_network(data, type = "coupling", similarity = "cosine")
keys <- keyword_network(data, similarity = "association")
# Inspect the standard edge-list schema
head(authors)
summary(authors)
The edge list separates two quantities:
- count: the raw binary co-occurrence count for the pair.
- weight: the analysis weight after counting and optional similarity normalization.
With similarity = "none" and counting = "full", weight and count are
usually the same. Once fractional counting or similarity normalization is used,
they intentionally diverge.
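The difference between count and weight can be illustrated in base R without calling bibnets. The sketch below uses a hypothetical three-paper corpus and assumes the fractional convention described by Perianes-Rodríguez et al. (2016), where a paper with n authors adds 1 / (n - 1) to each of its pairs. Whether bibnets applies exactly this scaling is an assumption, not something stated above.

```r
# Toy corpus (hypothetical data): author lists of three papers.
papers <- list(
  p1 = c("A", "B"),
  p2 = c("A", "B", "C"),
  p3 = c("B", "C")
)
authors <- sort(unique(unlist(papers)))

# Binary papers x authors incidence matrix.
inc <- matrix(0, nrow = length(papers), ncol = length(authors),
              dimnames = list(names(papers), authors))
for (i in seq_along(papers)) inc[i, papers[[i]]] <- 1

# count: number of papers in which each pair appears together.
count <- crossprod(inc)
diag(count) <- 0

# weight under fractional counting (assumed convention): a paper with
# n authors adds 1 / (n - 1) to each of its pairs. Single-author papers
# form no pairs and would need to be dropped first to avoid dividing by zero.
n_auth <- rowSums(inc)
weight <- crossprod(inc / sqrt(n_auth - 1))
diag(weight) <- 0

count["A", "B"]   # 2 shared papers
weight["A", "B"]  # 1/1 + 1/2 = 1.5
```

Here the pair A-B keeps a count of 2, while its weight is 1.5 because the three-author paper contributes only half a unit.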
Use read_biblio() when you have a file, a vector of files, or a directory:
data <- read_biblio("scopus_export.csv")
data <- read_biblio(c("wos_1.txt", "wos_2.txt"))
data <- read_biblio("exports/")
Use a format-specific reader when the source is already known:
scopus <- read_scopus("scopus.csv")
wos <- read_wos("savedrecs.txt")
oa <- read_openalex_csv("openalex_works.csv")
dims <- read_dimensions("dimensions.csv") # avoid masking base::dim()
lens <- read_lens("lens.csv")
For custom CSV files, tell read_biblio() which columns should be split into
list-columns:
data <- read_biblio(
  "my_data.csv",
  format = "generic",
  id = "paper_id",
  actors = c("Authors", "Keywords"),
  sep = ";"
)
All readers try to return the same core columns: id, title, year,
journal, doi, cited_by_count, abstract, type, authors,
references, and keywords. Source-specific extras such as countries,
affiliations, index_keywords, and keywords_plus are preserved when
available.
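To make the idea of list-columns concrete, here is a minimal base R sketch of splitting delimited fields by hand. The helper split_field() and the toy data frame are hypothetical and only illustrate the resulting shape; they are not the package's internal code.

```r
# Hypothetical raw table as it might come out of read.csv().
raw <- data.frame(
  paper_id = c("p1", "p2"),
  Authors  = c("Smith, J.; Lee, K.", "Lee, K."),
  Keywords = c("networks; citation", NA),
  stringsAsFactors = FALSE
)

# Split a delimited character column into a list of trimmed values.
# Missing or empty cells become character(0).
split_field <- function(x, sep = ";") {
  lapply(strsplit(x, sep, fixed = TRUE), function(v) {
    v <- trimws(v)
    v[!is.na(v) & nzchar(v)]
  })
}

raw$Authors  <- split_field(raw$Authors)
raw$Keywords <- split_field(raw$Keywords)

lengths(raw$Authors)   # 2 1
lengths(raw$Keywords)  # 2 0
```

Each row now holds a character vector per multi-valued field, which is the shape an incidence matrix can be built from.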
edges <- author_network(data, type = "collaboration")
Two authors are linked when they appear on the same paper. Use counting to
change how each paper contributes:
author_network(data, "collaboration", counting = "full")
author_network(data, "collaboration", counting = "fractional")
author_network(data, "collaboration", counting = "harmonic")
author_network(data, "collaboration", counting = "first_last")
Use attention instead of counting when the goal is position-based
weighting independent of the standard counting families. The same option is
available on keyword_network(), country_network(), and
institution_network():
author_network(data, attention = "lead") # first author dominant
author_network(data, attention = "last") # last author dominant
author_network(data, attention = "proximity") # middle authors weighted most
author_network(data, attention = "circular") # first and last upweighted
attention and counting are mutually exclusive: when attention is
non-NULL, the network is built directly from positional weights and the
type/counting arguments are ignored.
refs <- reference_network(data, type = "co_citation", min_occur = 2)
Two references are linked when they are cited together by the same paper. This
is a column-mode projection of the papers x references matrix.
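The column-mode projection can be sketched in base R as follows. The toy reference lists are hypothetical, and the filtering step assumes that min_occur keeps references cited by at least that many papers, which is a reading of the argument name rather than documented behaviour.

```r
# Hypothetical corpus: cited references per paper.
refs_by_paper <- list(
  p1 = c("R1", "R2", "R3"),
  p2 = c("R1", "R2"),
  p3 = c("R2", "R3"),
  p4 = c("R4")
)
refs <- sort(unique(unlist(refs_by_paper)))

# Binary papers x references incidence matrix.
inc <- matrix(0, nrow = length(refs_by_paper), ncol = length(refs),
              dimnames = list(names(refs_by_paper), refs))
for (i in seq_along(refs_by_paper)) inc[i, refs_by_paper[[i]]] <- 1

# Column-mode projection: t(inc) %*% inc counts, for every pair of
# references, the papers that cite both.
cocit <- crossprod(inc)
diag(cocit) <- 0

# Assumed meaning of min_occur = 2: drop references cited fewer than twice.
keep <- colSums(inc) >= 2
cocit <- cocit[keep, keep]

# Convert the upper triangle into an edge list.
idx <- which(upper.tri(cocit) & cocit > 0, arr.ind = TRUE)
edges <- data.frame(
  from  = rownames(cocit)[idx[, 1]],
  to    = colnames(cocit)[idx[, 2]],
  count = cocit[idx],
  stringsAsFactors = FALSE
)
edges
```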
coupling <- document_network(data, type = "coupling", similarity = "cosine")
direct <- document_network(data, type = "citation")
Coupling links two papers when they cite the same references. Direct citation returns directed within-corpus citation edges from citing paper to cited paper.
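Both ideas can be sketched in base R on a hypothetical corpus in which some cited references are themselves papers in the corpus. This is an illustration of the two definitions, not the package's implementation; in particular, matching references to papers by identical id strings is an assumption.

```r
# Hypothetical corpus: ids starting with "p" are papers, "x" are outside works.
refs_by_paper <- list(
  p1 = c("x1", "x2"),
  p2 = c("x1", "x2", "p1"),
  p3 = c("x2", "p1", "p2")
)
ids  <- names(refs_by_paper)
refs <- sort(unique(unlist(refs_by_paper)))

# Binary papers x references incidence matrix.
inc <- matrix(0, nrow = length(ids), ncol = length(refs),
              dimnames = list(ids, refs))
for (i in seq_along(refs_by_paper)) inc[i, refs_by_paper[[i]]] <- 1

# Bibliographic coupling: row-mode projection, inc %*% t(inc), counts the
# references shared by each pair of papers.
coupling <- tcrossprod(inc)
diag(coupling) <- 0

# Direct citation: keep only cited references that are corpus papers,
# giving directed edges from citing paper to cited paper.
direct <- do.call(rbind, lapply(ids, function(id) {
  cited <- intersect(refs_by_paper[[id]], ids)
  data.frame(from = rep(id, length(cited)), to = cited,
             stringsAsFactors = FALSE)
}))

coupling["p1", "p2"]  # 2 shared references (x1, x2)
direct                # p2 -> p1, p3 -> p1, p3 -> p2
```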
keyword_network(data, field = "keywords")
source_network(data, type = "coupling", min_occur = 2)
country_network(data, type = "collaboration", counting = "fractional")
institution_network(data, type = "collaboration", counting = "fractional")
Entity labels are trimmed and uppercased before matrix construction so that minor casing differences do not create separate nodes.
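The effect of this normalization is easy to check in base R. The snippet below applies the two steps named above (trim, then uppercase) to hypothetical keyword labels; any further cleaning the package may do is not shown.

```r
# Hypothetical raw labels with casing and whitespace variants.
labels <- c("Machine Learning", " machine learning",
            "MACHINE LEARNING ", "Deep learning")

# Trim surrounding whitespace, then uppercase.
normalized <- toupper(trimws(labels))

unique(normalized)  # "MACHINE LEARNING" "DEEP LEARNING"
```

Four raw labels collapse to two nodes. Spelling variants or synonyms are not merged by these two steps.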
conetwork() is useful when a dedicated wrapper is not needed:
conetwork(data, "keywords")
conetwork(data, "authors", by = "keywords")
conetwork(data, "journal", by = "references", similarity = "cosine")
With one field, entities are linked when they co-occur in the same paper. With
by, entities are linked through shared values of another field.
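One way to read the by argument is as a two-step projection: first relate entities to the values of the other field, then link entities that share those values. The base R sketch below follows that reading for authors linked through keywords. The incidence() helper, the toy data, and the choice to count distinct shared values are all assumptions made for illustration.

```r
# Hypothetical corpus: one author per paper, so authors never co-occur
# directly and can only be linked through shared keywords.
authors  <- list(p1 = "A", p2 = "B", p3 = "C")
keywords <- list(p1 = c("k1", "k2"), p2 = c("k2", "k3"), p3 = "k4")

# Binary papers x values incidence matrix for a list of character vectors.
incidence <- function(x) {
  vals <- sort(unique(unlist(x)))
  m <- matrix(0, nrow = length(x), ncol = length(vals),
              dimnames = list(names(x), vals))
  for (i in seq_along(x)) m[i, x[[i]]] <- 1
  m
}

# Authors x keywords: 1 if the author has used the keyword in any paper.
author_kw <- (crossprod(incidence(authors), incidence(keywords)) > 0) * 1

# Authors x authors: number of distinct keywords two authors share.
linked <- tcrossprod(author_kw)
diag(linked) <- 0

linked["A", "B"]  # 1 (both used k2)
linked["A", "C"]  # 0
```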
Counting controls how each paper contributes to edge weights. Similarity normalization controls how pair-level totals are rescaled after projection.
| Method | Main use | Interpretation |
|---|---|---|
| "full" | all networks | Each observed co-occurrence contributes 1. |
| "fractional" | all networks | Contribution is scaled by list size so large teams or long reference lists do not dominate. |
| "paper" | co-occurrence networks | Each paper contributes a fixed total amount spread across pairs. |
| "strength" | coupling networks | Uses reference-frequency weighting and row-list-size normalization for Perianes-Rodríguez-style coupling strength. |
| "harmonic" | author collaboration | Authorship credit decreases by harmonic rank and sums to 1 per paper. |
| "arithmetic" | author collaboration | Authorship credit decreases linearly by position. |
| "geometric" | author collaboration | Authorship credit decays geometrically by position. |
| "adaptive_geometric" | author collaboration | Geometric decay adapts to the number of authors. |
| "golden" | author collaboration | Geometric decay based on the golden ratio. |
| "first" / "last" | author collaboration | Only the first or last author receives credit. |
| "first_last" | author collaboration | First and last authors are upweighted. |
| "position_weighted" | author collaboration | User-supplied position weights. |
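For intuition about the position-aware schemes, the sketch below computes per-author credit vectors for three of them using their textbook definitions: harmonic credit proportional to 1/i (Hagen, 2008), arithmetic credit proportional to n + 1 - i, and geometric credit proportional to 2^(n - i). The function names are hypothetical, and the package's exact formulas, as well as how credits are turned into edge weights, may differ.

```r
# Credit for authors in byline positions 1..n; each vector sums to 1.
harmonic_credit <- function(n) {
  i <- seq_len(n)
  (1 / i) / sum(1 / i)
}

arithmetic_credit <- function(n) {
  i <- seq_len(n)
  (n + 1 - i) / sum(n + 1 - i)
}

geometric_credit <- function(n) {
  i <- seq_len(n)
  2^(n - i) / sum(2^(n - i))
}

harmonic_credit(3)    # 6/11, 3/11, 2/11
arithmetic_credit(3)  # 3/6, 2/6, 1/6
geometric_credit(3)   # 4/7, 2/7, 1/7
```

All three give the first author the largest share; they differ in how quickly credit falls off along the byline.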
Similarity options:
keyword_network(data, similarity = "association")
keyword_network(data, similarity = "cosine")
keyword_network(data, similarity = "jaccard")
keyword_network(data, similarity = "inclusion")
keyword_network(data, similarity = "equivalence")
Association strength is often useful for co-occurrence data because it compares the observed co-occurrence against the product of the two marginal frequencies. Cosine normalization is often easier to interpret for coupling because it scales shared references by the geometric mean of the two reference-list lengths.
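The five measures can be written compactly in base R using the definitions discussed by van Eck and Waltman (2009), where c is the pair co-occurrence and s_i, s_j are the item occurrence totals. The function normalize_cooc() and the toy numbers are hypothetical. The package may scale some measures differently (for example, multiplying association strength by a constant), so treat this as a sketch of the formulas only.

```r
# cooc: symmetric co-occurrence matrix; occ: occurrence total per item.
normalize_cooc <- function(cooc, occ,
                           method = c("association", "cosine", "jaccard",
                                      "inclusion", "equivalence")) {
  method <- match.arg(method)
  prod_occ <- outer(occ, occ)               # s_i * s_j
  switch(method,
    association = cooc / prod_occ,           # c / (s_i * s_j)
    cosine      = cooc / sqrt(prod_occ),     # c / sqrt(s_i * s_j)
    jaccard     = cooc / (outer(occ, occ, "+") - cooc),  # c / (s_i + s_j - c)
    inclusion   = cooc / outer(occ, occ, pmin),          # c / min(s_i, s_j)
    equivalence = cooc^2 / prod_occ          # c^2 / (s_i * s_j)
  )
}

# Hypothetical keywords: a occurs 4 times, b 9 times, c twice.
occ  <- c(a = 4, b = 9, c = 2)
cooc <- matrix(c(0, 3, 1,
                 3, 0, 0,
                 1, 0, 0),
               nrow = 3, byrow = TRUE, dimnames = list(names(occ), names(occ)))

normalize_cooc(cooc, occ, "cosine")["a", "b"]     # 3 / sqrt(36) = 0.5
normalize_cooc(cooc, occ, "jaccard")["a", "b"]    # 3 / (13 - 3) = 0.3
normalize_cooc(cooc, occ, "inclusion")["a", "b"]  # 3 / 4 = 0.75
```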
edges <- author_network(data, "collaboration")
strong_edges <- prune(edges, threshold = 3)
local_top <- prune(edges, top_n = 5)
top_nodes <- filter_top(edges, n = 50)
backbone_net <- backbone(edges, alpha = 0.05)
Use:
- prune(threshold = x) for an absolute edge-weight cutoff.
- prune(top_n = k) to retain locally strong edges for each node.
- filter_top(n = k) to keep only the most connected nodes.
- backbone(alpha = x) to apply the Serrano-Boguñá-Vespignani disparity filter for multiscale weighted networks.
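As background for backbone(), the disparity filter of Serrano et al. (2009) assigns each edge a p-value from the viewpoint of each endpoint: for a node with degree k and strength s, an edge of weight w gets (1 - w / s)^(k - 1). The base R sketch below keeps an edge when it is significant for at least one endpoint. The helper disparity_p() is hypothetical, assumes an undirected edge list with each pair listed once, and is not the package's code.

```r
# Hypothetical undirected edge list: one dominant tie and several weak ones.
edges <- data.frame(
  from   = c("A", "A", "A", "A", "B"),
  to     = c("B", "C", "D", "E", "C"),
  weight = c(20, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)

disparity_p <- function(edges) {
  # Each edge counts once for each of its two endpoints.
  nodes    <- c(edges$from, edges$to)
  w        <- c(edges$weight, edges$weight)
  strength <- vapply(split(w, nodes), sum, numeric(1))
  degree   <- lengths(split(w, nodes))

  # p-value of every edge as seen from the given endpoint.
  p_side <- function(node) {
    unname((1 - edges$weight / strength[node])^(degree[node] - 1))
  }
  # An edge is retained if it is significant for either endpoint.
  pmin(p_side(edges$from), p_side(edges$to))
}

edges$p <- disparity_p(edges)
backbone_edges <- edges[edges$p < 0.05, ]
backbone_edges  # only A-B survives at alpha = 0.05
```

Nodes with a single edge get a p-value of 1 from their own side, so such an edge survives only if it is significant for the other endpoint.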
temporal_network(data, keyword_network, window = 3)
temporal_network(data, author_network, "collaboration",
                 window = 2, strategy = "sliding")
lcs <- local_citations(data)
h <- historiograph(data, n = 30)
temporal_network() supports fixed, sliding, and cumulative windows. If a
window cannot be built, it warns with the window label instead of failing
silently.
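The three window strategies can be pictured as different ways of cutting the publication years into intervals. The sketch below is one plausible reading, with a hypothetical make_windows() helper: fixed windows are non-overlapping blocks, sliding windows advance one year at a time, and cumulative windows always start at the first year. The exact boundaries used by temporal_network() are an assumption here.

```r
make_windows <- function(years, window,
                         strategy = c("fixed", "sliding", "cumulative")) {
  strategy <- match.arg(strategy)
  first <- min(years)
  last  <- max(years)

  # Sliding windows advance one year at a time; the others advance by `window`.
  step       <- if (strategy == "sliding") 1 else window
  last_start <- if (strategy == "sliding") max(first, last - window + 1) else last
  starts     <- seq(first, last_start, by = step)
  ends       <- pmin(starts + window - 1, last)

  # Cumulative windows keep the same end points but always begin at `first`.
  if (strategy == "cumulative") starts <- rep(first, length(starts))

  data.frame(start = starts, end = ends)
}

years <- 2015:2020
make_windows(years, 3, "fixed")       # 2015-2017, 2018-2020
make_windows(years, 3, "sliding")     # 2015-2017, 2016-2018, 2017-2019, 2018-2020
make_windows(years, 3, "cumulative")  # 2015-2017, 2015-2020
```

A network for each window would then be built from the papers whose year falls between start and end.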
local_citations() counts within-corpus citations. historiograph() then
builds a directed network among the top locally cited documents.
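The two steps can be sketched in base R on a hypothetical corpus. References are matched to corpus papers by identical id strings, which is an assumption, and real exports usually need cleaning before such matching works.

```r
# Hypothetical corpus: ids starting with "p" are papers, "x" are outside works.
refs_by_paper <- list(
  p1 = c("x1"),
  p2 = c("p1", "x1"),
  p3 = c("p1", "p2"),
  p4 = c("p1", "p3", "x2")
)
ids <- names(refs_by_paper)

# Local citation score: how often each paper is cited from within the corpus.
cited <- unlist(lapply(refs_by_paper, function(r) intersect(r, ids)))
lcs <- table(factor(cited, levels = ids))

# Keep the n most locally cited papers (n = 3 here).
top <- names(sort(lcs, decreasing = TRUE))[1:3]

# Directed citing -> cited edges among the retained papers only.
hist_edges <- do.call(rbind, lapply(top, function(id) {
  to <- intersect(refs_by_paper[[id]], top)
  data.frame(from = rep(id, length(to)), to = to, stringsAsFactors = FALSE)
}))

lcs[["p1"]]  # 3 local citations
hist_edges
```

In this example p4 is never cited locally, so it drops out and the remaining edges are p2 -> p1, p3 -> p1, and p3 -> p2.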
to_matrix(edges)
to_gephi(edges)
to_graphml(edges, file = "network.graphml")
if (requireNamespace("igraph", quietly = TRUE)) {
  g <- to_igraph(edges)
}
Optional converters are guarded by requireNamespace(), so packages such as
igraph, tidygraph, and cograph are not required unless their output
formats are requested.
- Batagelj, V., & Cerinšek, M. (2013). On bibliographic networks. Scientometrics, 96(3), 845-864. doi:10.1007/s11192-012-0940-1.
- Campos, R., Mangaravite, V., Pasquali, A., Jorge, A. M., Nunes, C., & Jatowt, A. (2020). YAKE! Keyword extraction from single documents using multiple local features. Information Sciences, 509, 257-289. doi:10.1016/j.ins.2019.09.013.
- Hagen, N. T. (2008). Harmonic allocation of authorship credit: Source-level correction of bibliometric bias assures accurate publication and citation analysis. PLOS ONE, 3(12), e4021. doi:10.1371/journal.pone.0004021.
- Liu, X. Z., & Fang, H. (2023). A geometric counting method adaptive to the author number. Journal of Informetrics, 17(2), 101404. doi:10.1016/j.joi.2023.101404.
- Perianes-Rodríguez, A., Waltman, L., & van Eck, N. J. (2016). Constructing bibliometric networks: A comparison between full and fractional counting. Journal of Informetrics, 10(4), 1178-1195. doi:10.1016/j.joi.2016.10.006.
- Serrano, M. Á., Boguñá, M., & Vespignani, A. (2009). Extracting the multiscale backbone of complex weighted networks. Proceedings of the National Academy of Sciences, 106(16), 6483-6488. doi:10.1073/pnas.0808904106.
- Small, H. (1973). Co-citation in the scientific literature: A new measure of the relationship between two documents. Journal of the American Society for Information Science, 24(4), 265-269. doi:10.1002/asi.4630240406.
- van Eck, N. J., & Waltman, L. (2009). How to normalize co-occurrence data? An analysis of some well-known similarity measures. Journal of the American Society for Information Science and Technology, 60(8), 1635-1651. doi:10.1002/asi.21075.
Two vignettes ship with the package:
- vignette("bibnets"): end-to-end workflow covering builders, counting and similarity options, attention weighting, network reduction, temporal windows, historiograph, and exports.
- vignette("reading-data", package = "bibnets"): readers for each supported source (Scopus, Web of Science, OpenAlex JSON and flat CSV, BibTeX, RIS, Lens, Dimensions, Crossref), the standard schema, generic CSV input, and multi-source merging.
License: MIT