## Introduction

This notebook searches the [Rogue Scholar](https://rogue-scholar.org) science blog archive,
retrieves the matching posts, and formats them. We are interested in posts that mention Retraction Watch.

:::{.callout-note}
* We use the query `retraction watch`.
* We limit results to English-language (`en`) posts published since `2010` (the year Retraction Watch launched).
* We retrieve the title, authors, publication date (`published_at`), abstract (`summary`), blog name (`blog_name`), and DOI (`doi`).
* We sort the results in reverse chronological order (newest first).
:::
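
The settings listed above map directly onto a query string. As a minimal sketch (not the code this notebook runs), the standard library's `urllib.parse.urlencode` can assemble the same URL while taking care of escaping. The helper name `build_search_url` is hypothetical; the parameter names are the ones used in the request in the next cell.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.rogue-scholar.org/"

def build_search_url(query, since="2010", language="en", per_page=50):
    """Build a Rogue Scholar search URL (hypothetical helper for illustration)."""
    params = {
        "query": query,
        "published_since": since,
        "language": language,
        "sort": "published_at",
        "order": "desc",  # newest first
        "per_page": per_page,
        "include_fields": "title,authors,published_at,summary,blog_name,doi",
    }
    # urlencode escapes spaces as "+" and commas as "%2C"
    return BASE_URL + "posts?" + urlencode(params)

search_url = build_search_url("retraction watch")
print(search_url)
```

Building the query from a dictionary keeps each setting on its own line, which makes it easier to change the query or the date limit later.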

In [5]:
import requests
import pydash as py_
from markdown_it import MarkdownIt
baseUrl = "https://api.rogue-scholar.org/"
query = "retraction watch"
include_fields = "title,authors,published_at,summary,blog_name,doi"
url = baseUrl + f"posts?query={query.replace(' ', '+')}&published_since=2010&language=en&sort=published_at&order=desc&per_page=50&include_fields={include_fields}"
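response = requests.get(url, timeout=30)
# Stop early on HTTP errors instead of failing later on a missing key
response.raise_for_status()
result = response.json()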

# Zero-based indices of hits removed after manual curation
curated = [1,3,9,12,16]
found = result["found"]
out_of = result["out_of"]

def get_post(post):
    return post["document"]

def format_post(post):
    md = MarkdownIt('commonmark' ,{'breaks':True,'html':True})
    title = post["title"]
    blog = post["blog_name"]
    url = post.get("doi", "")
    summary = post["summary"]
    return f"### {title}\nPublished in {blog}\n{url}\n{summary}\n"  # md.render

posts = [ get_post(x) for i, x in enumerate(result["hits"]) if i not in curated ]
posts_as_string = "\n".join([ format_post(x) for x in posts])

In [None]:
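# Get bibtex-formatted metadata for all posts
def doi_from_url(url):
    # Assumed helper (not defined in the original notebook): strip the
    # resolver prefix, e.g. https://doi.org/10.1234/abc -> 10.1234/abc
    return url.split("doi.org/", 1)[-1]

def get_bibtex(post):
    doi = doi_from_url(post["doi"])
    res = requests.get(baseUrl + "posts/" + doi + "?format=bibtex")
    return res.text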

bibtex = "\n".join([ get_bibtex(x) for x in posts if x.get("doi", None) is not None ])
with open('references.bib', 'w') as f:
    f.write(bibtex)


## Results

We found 17 blog posts mentioning `retraction watch` out of 9086 total posts, and ended up with 12 posts after manual curation:

```{mermaid}
flowchart LR
  A[9086] -- Query: retraction watch --> B(17)
  B -- Manual curation --> C(12)
```
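
The counts in the flowchart follow from simple arithmetic: the number of hits minus the number of manually excluded hits. A minimal sketch, assuming the excluded hits are given as zero-based indices as in the search cell above (the function name `curation_counts` is hypothetical):

```python
def curation_counts(found, removed_indices):
    """Return (found, kept) after dropping manually excluded hits."""
    # Only indices that exist in the result set can actually be removed
    removed = {i for i in removed_indices if 0 <= i < found}
    return found, found - len(removed)

found, kept = curation_counts(17, [1, 3, 9, 12, 16])
print(f"{found} posts found, {kept} kept after curation")
# prints "17 posts found, 12 kept after curation"
```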


## References


In [None]:
print(posts_as_string)