Unusual spike in 23.02 literature data #2904
Labels
Data
Relates to Open Targets data team
Literature
Relates to EPMC literature pipeline
Platform
Issues related to Open Targets Platform
When looking at the data in gs://open-targets-data-releases/23.02/output/etl/parquet/literature/matches, we noticed some coarse patterns over time in the included publications that look like large departures from older versions. For example, this shows the number of PMIDs by year in that dataset for 23.02 vs 22.06:

[Figure: number of PMIDs by year, 23.02 vs 22.06]

Has anything changed drastically in the underlying corpus that might make this expected? To be clear, I don't think this is definitely a problem. It does seem to merit a little digging though; I'm not aware of any reason to expect a spike like that in the mid-1970s.
Code
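The original snippet is not preserved here. As a rough sketch of the kind of comparison described above, the following counts distinct PMIDs per year for two releases and flags years where the newer release has grown sharply. It assumes the rows of the matches dataset have already been loaded as dictionaries. The keys `pmid` and `year`, the helper names `pmids_per_year` and `flag_spikes`, and the `min_ratio` threshold are all hypothetical and may not match the real schema.

```python
from collections import defaultdict


def pmids_per_year(rows, pmid_key="pmid", year_key="year"):
    """Count distinct PMIDs per publication year.

    `rows` is any iterable of dicts. The key names are assumptions and
    should be adjusted to whatever the dataset actually uses.
    """
    seen = defaultdict(set)
    for row in rows:
        pmid = row.get(pmid_key)
        year = row.get(year_key)
        if pmid is None or year is None:
            continue  # skip rows missing either field
        seen[int(year)].add(str(pmid))
    return {year: len(ids) for year, ids in sorted(seen.items())}


def flag_spikes(new_counts, old_counts, min_ratio=2.0):
    """Return (year, old_count, new_count) for years that look like spikes.

    A year is flagged when it is absent from the old release or when the
    new count is at least `min_ratio` times the old count. The threshold
    is arbitrary and only meant to start a closer look.
    """
    flagged = []
    for year in sorted(new_counts):
        new = new_counts[year]
        old = old_counts.get(year, 0)
        if old == 0 or new / old >= min_ratio:
            flagged.append((year, old, new))
    return flagged
```

Running `flag_spikes(pmids_per_year(rows_2302), pmids_per_year(rows_2206))` on the two releases would list the years worth inspecting first. Counting distinct PMIDs rather than rows matters if one publication can produce many match rows.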