## Introduction

This notebook finds Rogue Scholar blog posts about the Retraction Watch project using the [Rogue Scholar API](https://api.rogue-scholar.org/posts). [Retraction Watch](https://retractionwatch.com/) reports on retractions of scientific papers. The project was started in 2010 by Ivan Oransky and Adam Marcus.

:::{.callout-note}
* We use the query `retraction watch`.
* We limit results to posts published since `2010` (the year Retraction Watch launched) and written in English (language `en`).
* We retrieve the fields `title`, `authors`, `published_at` (publication date), `summary` (abstract), `blog_name`, `blog_slug`, `doi`, and `url`.
* We sort the results in reverse chronological order (newest first).
:::
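The query settings listed above map onto URL parameters. The following is a minimal sketch of how such a request URL could be assembled with the standard library, assuming the parameter names used in this notebook's request (`query`, `published_since`, `language`, `sort`, `order`, `per_page`, `include_fields`). The helper `build_search_url` is a hypothetical name introduced here for illustration and is not part of the Rogue Scholar API.

```python
from urllib.parse import urlencode

# Endpoint linked in the introduction
BASE_URL = "https://api.rogue-scholar.org/posts"


def build_search_url(query: str, since: str = "2010", language: str = "en") -> str:
    """Assemble a search URL (hypothetical helper, for illustration only)."""
    params = {
        "query": query,
        "published_since": since,
        "language": language,
        "sort": "published_at",
        "order": "desc",  # newest first
        "per_page": 50,
        "include_fields": "title,authors,published_at,summary,blog_name,blog_slug,doi,url",
    }
    # urlencode turns spaces into "+"; safe="," keeps the field list readable
    return f"{BASE_URL}?{urlencode(params, safe=',')}"


search_url = build_search_url("retraction watch")
print(search_url)
```

Building the query string from a dictionary avoids hand-escaping the search phrase and makes it easier to change a single setting, such as the start year.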

## Results

We found 22 blog posts mentioning `retraction watch` out of 10560 total posts, and ended up with 17 posts after manual curation removed 5 of them:

```{mermaid}
flowchart LR
  A[10560] -- Query: retraction watch --> B(22)
  B -- Manual curation --> C(17)
```
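The manual curation step amounts to dropping search hits by their position in the result list. A minimal sketch of that idea follows, using placeholder data rather than real API results; `drop_positions` is a hypothetical helper name.

```python
from typing import Iterable, List, Sequence, TypeVar

T = TypeVar("T")


def drop_positions(hits: Sequence[T], excluded: Iterable[int]) -> List[T]:
    """Return the hits whose zero-based position is not in `excluded`."""
    excluded_set = set(excluded)
    return [hit for position, hit in enumerate(hits) if position not in excluded_set]


# Placeholder hits standing in for API results (not real data)
hits = [f"post-{i}" for i in range(6)]
kept = drop_positions(hits, excluded=[1, 3])
print(kept)
print(f"{len(hits)} hits, {len(kept)} kept")
```

One caveat with position-based exclusion: the positions depend on the sort order and on which posts exist when the query runs, so the list probably needs to be reviewed whenever new posts are indexed. Excluding by a stable identifier such as the DOI or URL would likely be more robust.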

```{python}
import datetime
import locale
import re
from typing import Optional

import requests
from IPython.display import Markdown

locale.setlocale(locale.LC_ALL, "en_US")
baseUrl = "https://api.rogue-scholar.org/"
query = "retraction watch"
include_fields = "title,authors,published_at,summary,blog_name,blog_slug,doi,url"
url = baseUrl + f"posts?query={query.replace(' ', '+')}&published_since=2010&language=en&sort=published_at&order=desc&per_page=50&include_fields={include_fields}"
response = requests.get(url)
response.raise_for_status()  # stop early on HTTP errors
result = response.json()

# Zero-based positions of the hits removed after manual curation
curated = [1, 3, 9, 12, 16]
found = result["found"]
out_of = result["out_of"]
print(f"Found {found} out of {out_of} results.")

def get_post(post):
    """Unwrap the post record from a search hit."""
    return post["document"]

def format_post(post):
    """Render one post as a Markdown section."""
    # Prefer the DOI and fall back to the post URL when there is no DOI
    url = post.get("doi", None) or post.get("url", None)
    # Keep the raw URL for the title link; wrapping it twice nests the links
    link = f"[{url}]({url})"
    title = f"[{post['title']}]({url})"
    # Timezone-aware conversion; avoids the platform-specific "%-d" directive
    published = datetime.datetime.fromtimestamp(post["published_at"], tz=datetime.timezone.utc)
    published_at = f"{published:%B} {published.day}, {published.year}"
    blog = f"[{post['blog_name']}](https://rogue-scholar.org/blogs/{post['blog_slug']})"
    author = ", ".join([ f"{x['name']}" for x in post.get("authors", None) or [] ])
    summary = post["summary"]
    return f"### {title}\n{link}<br />Published {published_at} in {blog}<br />{author}<br /><br />{summary}\n"

posts = [ get_post(x) for i, x in enumerate(result["hits"]) if i not in curated ]
posts_as_string = "\n".join([ format_post(x) for x in posts])

def doi_from_url(url: str) -> Optional[str]:
    """Return a DOI from a URL"""
    match = re.search(
        r"\A(?:(http|https)://(dx\.)?(doi\.org|handle\.stage\.datacite\.org|handle\.test\.datacite\.org)/)?(doi:)?(10\.\d{4,5}/.+)\Z",
        url,
    )
    if match is None:
        return None
    return match.group(5).lower()

# Get bibtex-formatted metadata for all posts
def get_bibtex(post):
    """Fetch the BibTeX record for a post, or None if its DOI cannot be parsed."""
    doi = doi_from_url(post["doi"])
    if doi is None:
        return None
    res = requests.get(baseUrl + "posts/" + doi + "?format=bibtex")
    return res.text

# Only posts with a DOI have a BibTeX record to fetch
records = [ get_bibtex(x) for x in posts if x.get("doi", None) is not None ]
bibtex = "\n".join([ r for r in records if r ])
with open('references.bib', 'w') as f:
    f.write(bibtex)

Markdown(posts_as_string)
```

Found 22 out of 10560 results.


### [Generating Overlay blog posts](https://doi.org/10.53731/gzrse-p5d35)
[https://doi.org/10.53731/gzrse-p5d35](https://doi.org/10.53731/gzrse-p5d35)<br />Published October 11, 2023 in [Front Matter](https://rogue-scholar.org/blogs/front_matter)<br />Martin Fenner<br /><br />On Monday the Rogue Scholar science blog archive launched a dedicated API.

### [Two new browser extensions that help with delivery or access - ebsco passport and GetFTR browser extension and other developments](https://api.follow.it/track-rss-story-click/v3/seIFwoSimx5_rT3db7hkaYMOI2eu70Rg)
[https://api.follow.it/track-rss-story-click/v3/seIFwoSimx5_rT3db7hkaYMOI2eu70Rg](https://api.follow.it/track-rss-story-click/v3/seIFwoSimx5_rT3db7hkaYMOI2eu70Rg)<br />Published September 4, 2023 in [Musings about Librarianship - Aaron Tay](https://rogue-scholar.org/blogs/musings)<br /><br /><br />

### [(The?) 3 kinds of papermills](https://doi.org/10.59350/dakrb-j7a75)
[https://doi.org/10.59350/dakrb-j7a75](https://doi.org/10.59350/dakrb-j7a75)<br />Published October 31, 2022 in [Stories by Adam Day on Medium](https://rogue-scholar.org/blogs/clearskiesadam)<br />Adam Day<br /><br />TL;DR: Join me at ConTech Live to hear about a recent project with Open Credo to see if we could detect unusual co-authorships in a dataset created by Anna Abalkina. Sign up&nbsp;here! Papermilling has a few definitions which you see here and there.

### [This blog turned 15 (years old) this month](https://doi.org/10.53731/bs60jms-sqaehsk)
[https://doi.org/10.53731/bs60jms-sqaehsk](https://doi.org/10.53731/bs60jms-sqaehsk)<br />Published August 11, 2022 in [Front Matter](https://rogue-scholar.org/blogs/front_matter)<br />Martin Fenner<br /><br />The first post on this blog was published on August 3, 2007 (Open access may become mandatory for NIH-funded research). This is post number 465, and in the past 15 years the blog has seen changes in technology and hosting location – but I wrote all posts (with the exception of a few guest posts). The overall theme remained unchanged: technology used in scholarly communication.

### [When your journal reads you](https://elephantinthelab.org/when-your-journal-reads-you)
[https://elephantinthelab.org/when-your-journal-reads-you](https://elephantinthelab.org/when-your-journal-reads-you)<br />Published April 14, 2021 in [Elephant in the Lab](https://rogue-scholar.org/blogs/elephantinthelab)<br />Elias Koch<br /><br /><strong>Introduction</strong> Renke Siems  In December 2018, a University of Minnesota web librarian, Cody Hanson, participated in a workshop hosted by the Coalition for Networked Information. The topic of this, and a number of other events to date, is the drive by major scholarly publishers to more fully integrate authentication systems for accessing electronic media into their platforms.

### [What’s going on with <i>Oculudentavis</i>?](https://doi.org/10.59350/hk3jx-6sv77)
[https://doi.org/10.59350/hk3jx-6sv77](https://doi.org/10.59350/hk3jx-6sv77)<br />Published July 22, 2020 in [Sauropod Vertebra Picture of the Week](https://rogue-scholar.org/blogs/svpow)<br />Mike Taylor<br /><br />Back in March, Nature published “Hummingbird-sized dinosaur from the Cretaceous period of Myanmar” by Xing et al. (2020), which described and named a tiny putative bird that was preserved in amber from Myanmar (formerly Burma). It’s a pretty spectacular find. Today, though, that paper is retracted. That’s a very rare occurrence for a palaeontology paper.

### [Suppression as a form of liberation?](https://doi.org/10.59350/v5rp0-nde12)
[https://doi.org/10.59350/v5rp0-nde12](https://doi.org/10.59350/v5rp0-nde12)<br />Published July 3, 2020 in [A blog by Ross Mounce](https://rogue-scholar.org/blogs/rossmounce)<br />Ross Mounce<br /><br />On Monday 29th June 2020, I learned from Retraction Watch that Clarivate, the for-profit proprietor of <em>Journal Impact Factor</em> ™  has newly “suppressed”   33 journals from their indexing service.

### [The R2R debate, part 1: opening statement in support](https://doi.org/10.59350/c8fz7-kyc20)
[https://doi.org/10.59350/c8fz7-kyc20](https://doi.org/10.59350/c8fz7-kyc20)<br />Published February 27, 2020 in [Sauropod Vertebra Picture of the Week](https://rogue-scholar.org/blogs/svpow)<br />Mike Taylor<br /><br />This Monday and Tuesday, I was at the R2R (Researcher to Reader) conference at BMA House in London.

### [Guest Blog: Data in the time of Coronavirus](https://doi.org/10.59350/qh3na-ehy20)
[https://doi.org/10.59350/qh3na-ehy20](https://doi.org/10.59350/qh3na-ehy20)<br />Published February 4, 2020 in [GigaBlog](https://rogue-scholar.org/blogs/gigablog)<br />Scott Edmunds<br /><br /><strong><em>With much of the GigaScience team spanning the Hong Kong-Shenzhen border and now confined to remote working, the current 2019-novel coronavirus outbreak has been particularly disruptive and close to home.</em></strong>

### [The lowest common denominator: marketing science with jIF](https://doi.org/10.59350/8wafm-6yc04)
[https://doi.org/10.59350/8wafm-6yc04](https://doi.org/10.59350/8wafm-6yc04)<br />Published July 8, 2016 in [GigaBlog](https://rogue-scholar.org/blogs/gigablog)<br />Scott Edmunds<br /><br /><strong>Shallow Impact. Tis the season. </strong>In case people didn’t know— the world of scientific publishing has seasons: There is the Inundation season, which starts in November as authors rush to submit their papers before the end of year. Then there is the Recovery season beginning in January as editors come back from holidays to tackle the glut.

### [My Blank Pages V: Raw Data](https://doi.org/10.59350/j2nba-89g15)
[https://doi.org/10.59350/j2nba-89g15](https://doi.org/10.59350/j2nba-89g15)<br />Published March 31, 2016 in [quantixed](https://rogue-scholar.org/blogs/quantixed)<br />Stephen Royle<br /><br /><strong>Raw Data: A novel on Life in Science</strong> by <strong>Pernille Rørth</strong> (Springer, 2016)    I was keen to read this “lab lit” novel written by renowned cell biologist Pernille Rørth. I’d seen lots of enthusiastic comments about the book, and it didn’t disappoint.

### [What Difference Does It Make?](https://doi.org/10.59350/cmyz9-ms451)
[https://doi.org/10.59350/cmyz9-ms451](https://doi.org/10.59350/cmyz9-ms451)<br />Published January 1, 2016 in [quantixed](https://rogue-scholar.org/blogs/quantixed)<br />Stephen Royle<br /><br />A few days ago, Retraction Watch published the top ten most-cited retracted papers. I saw this post with a bar chart to visualise these citations. It didn’t quite capture what the effect (if any) a retraction has on citations. I thought I’d quickly plot this out for the number one article on the list.    The plot is pretty depressing. The retraction has no effect on citations.

### [The <i>Medical Journal of Australia</i> vs Elsevier](https://doi.org/10.59350/6m0m9-bng28)
[https://doi.org/10.59350/6m0m9-bng28](https://doi.org/10.59350/6m0m9-bng28)<br />Published May 6, 2015 in [Sauropod Vertebra Picture of the Week](https://rogue-scholar.org/blogs/svpow)<br />Matt Wedel<br /><br />While Mike’s been off having fun at the Royal Society, this has been happening: Lots of feathers flying right now over the situation at the Medical Journal of Australia (MJA). The short, short version is that AMPCo, the company that publishes MJA, made plans to outsource production of the journal, and apparently some sub-editing and […]

### [Getting Techy With It: GigaScience Technology Update 2014](https://doi.org/10.59350/tzayy-hvn20)
[https://doi.org/10.59350/tzayy-hvn20](https://doi.org/10.59350/tzayy-hvn20)<br />Published November 27, 2014 in [GigaBlog](https://rogue-scholar.org/blogs/gigablog)<br />Nicole Nogoy<br /><br />When it comes to technology, <em>GigaScience</em> has always been open and willing to embrace new ways of integrating technology in its publishing processes, with the ultimate goal of working towards more reproducible, interactive and executable papers.

### [Continuing the push beyond static documents. ISMB, and more on our “What Bioinformaticians need to know about digital publishing beyond the PDF2” workshop](https://doi.org/10.59350/gtw2x-fc921)
[https://doi.org/10.59350/gtw2x-fc921](https://doi.org/10.59350/gtw2x-fc921)<br />Published July 31, 2014 in [GigaBlog](https://rogue-scholar.org/blogs/gigablog)<br />Scott Edmunds<br /><br /><strong>Boston 2014: More than a (Bioinformatics) Feeling</strong> Following from our previous posting on BOSC, our birthday and the BMC Open Data award party in Boston, on top of having to dash between the many great talks and sessions at ISMB, we were kept even busier than usual helping to organize and present in a special Beyond-the-PDF inspired “What Bioinformaticians need to know about digital publishing beyond the PDF” workshop at the end

### [Rewarding Reproducibility: First Papers in our Galaxy Series utilizing our GigaGalaxy platform](https://doi.org/10.59350/w8kyr-0e629)
[https://doi.org/10.59350/w8kyr-0e629](https://doi.org/10.59350/w8kyr-0e629)<br />Published February 6, 2014 in [GigaBlog](https://rogue-scholar.org/blogs/gigablog)<br />Scott Edmunds<br /><br /><strong>Push the button! <em>GigaScience</em> moves toward more interactive articles</strong> Research articles are being published with increasingly large and complicated supporting datasets, together with the software code used in analyses of the data.

### [The difficulties sharing neuroscience data: can data publishing help?](https://doi.org/10.59350/kmne3-5xg86)
[https://doi.org/10.59350/kmne3-5xg86](https://doi.org/10.59350/kmne3-5xg86)<br />Published May 9, 2013 in [GigaBlog](https://rogue-scholar.org/blogs/gigablog)<br />Scott Edmunds<br /><br />Last week we published our first neuroscience data note containing 10GB of fMRI data hosted and integrated into the paper by a DOI to our GigaDB database. While we have published a number of genomics datasets and data notes (see the Puerto Rican Parrot genome data note and its associated data DOI), this is a nice example of us providing a home for “orphan data”, the long tail of data types without community agreed curated repositories.


### References

::: {#refs}
:::