# Python and APIs

The problems in this notebook draw on the material covered in the `Lecture 2: Python and APIs` notebook.

In [1]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
from time import sleep

##### 1. scite_

We start by continuing the final problem from `3. Web Scraping`. While our direct requests for `www.science.org` HTML data may have been stymied, there is another path.

If we have the DOIs for these articles, we can request the article metadata from the `scite_` API for free. First we load the articles and demonstrate how to extract the DOIs from the Science articles.

In [2]:
articles = pd.read_csv("journal_article_urls.csv")

In [3]:
articles.loc[articles.domain=='www.science.org'].url.values[1]

'https://www.science.org/doi/10.1126/scisignal.abk3067'

In the example URL above, the text following `doi/` is the DOI for that particular article. To see this, first look at the article via its link, <a href="https://www.science.org/doi/10.1126/scisignal.abk3067">https://www.science.org/doi/10.1126/scisignal.abk3067</a>, and then access it with this DOI URL, <a href="https://www.doi.org/10.1126/scisignal.abk3067">https://www.doi.org/10.1126/scisignal.abk3067</a>.
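
As a minimal sketch of that extraction, the snippet below pulls the DOI out of a Science URL using only the standard library. The helper name `extract_doi` is an assumption made for illustration, and the sketch assumes the URL path has the form `/doi/<doi>`.

```python
from urllib.parse import urlparse


def extract_doi(url):
    """Return the text after '/doi/' in the URL path (hypothetical helper)."""
    path = urlparse(url).path  # e.g. '/doi/10.1126/scisignal.abk3067'
    marker = "/doi/"
    if marker not in path:
        raise ValueError(f"no '/doi/' segment found in {url!r}")
    return path.split(marker, 1)[1].strip("/")


science_url = "https://www.science.org/doi/10.1126/scisignal.abk3067"
doi = extract_doi(science_url)
print(doi)  # 10.1126/scisignal.abk3067
print("https://www.doi.org/" + doi)
```

Parsing the URL first means a query string or fragment, if one were present, would not end up inside the DOI.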

Unfortunately `scite_` does not have a nice Python API wrapper, but we can still submit requests to their API with Python. We demonstrate how below.

In [4]:
## The basic request string looks like this
'https://api.scite.ai/{endpoint}/{doi}'

## For us the API "endpoint" we want is 'papers/'
## and for this example we will use the doi from above, '10.1126/scisignal.abk3067'
endpoint = 'papers/'
doi = '10.1126/scisignal.abk3067'


## then you just call requests.get for the string
r = requests.get('https://api.scite.ai/' + endpoint + doi)

In [5]:
## We can get the returned data with
## r.json()
r.json()

{'id': 11371189169,
 'doi': '10.1126/scisignal.abk3067',
 'slug': 'march8-attenuates-cgas-mediated-innate-immune-5GEVWzR2',
 'type': 'journal-article',
 'title': 'MARCH8 attenuates cGAS-mediated innate immune responses through ubiquitylation',
 'abstract': 'Cyclic GMP-AMP synthase (cGAS) binds to microbial and self-DNA in the cytosol and synthesizes cyclic GMP-AMP (cGAMP), which activates stimulator of interferon genes (STING) and downstream mediators to elicit an innate immune response. Regulation of cGAS activity is essential for immune homeostasis. Here, we identified the E3 ubiquitin ligase MARCH8 (also known as MARCHF8, c-MIR, and RNF178) as a negative regulator of cGAS-mediated signaling. The immune response to double-stranded DNA was attenuated by overexpression of MARCH8 and enhanced by knockdown or knockout of MARCH8. MARCH8 interacted with the enzymatically active core of cGAS through its conserved RING-CH domain and catalyzed the lysine-63 (K63)–linked polyubiquitylation of 

Write a script that uses the `scite_` API to get the title, authors, and DOI for each `www.science.org` paper.

In [6]:
def science(url):
    ## the doi is everything after "doi/" in the url
    doi = url.split("doi/")[-1]
    endpoint = 'papers/'
    r = requests.get('https://api.scite.ai/' + endpoint + doi)

    ## decode the response once and reuse it
    data = r.json()

    ## fall back to "NA" when a field is missing
    title = data.get('title', "NA")

    if 'authors' in data:
        authors = ", ".join([author['given'] + " " + author['family'] for author in data['authors']])
    else:
        authors = "NA"

    return title, authors, doi

In [7]:
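## loop over every www.science.org url in the data
for url in articles.loc[articles.domain=='www.science.org'].url.values:
    print(url)
    title,authors,doi = science(url)
    print(title)
    print(authors)
    print(doi)
    print()
    ## pause between requests so we do not flood the API
    sleep(3)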

https://www.science.org/doi/10.1126/sciimmunol.abo2159
ILC killer: Qu’est-ce que c’est?
David R. Withers, Matthew R. Hepworth
10.1126/sciimmunol.abo2159

https://www.science.org/doi/10.1126/scisignal.abk3067
MARCH8 attenuates cGAS-mediated innate immune responses through ubiquitylation
Xikang Yang, Chengrui Shi, Hongpeng Liu, Siqi Shen, Chaofei Su, Hang Yin
10.1126/scisignal.abk3067

https://www.science.org/doi/10.1126/sciimmunol.abm8161
Succinate dehydrogenase/complex II is critical for metabolic and epigenetic regulation of T cell proliferation and inflammation
Xuyong Chen, Benjamin D. Sunkel, Meng Wang, Siwen Kang, Tingting Wang, JN Rashida Gnanaprakasam, Lingling Liu, Teresa Cassel, David A. Scott, Ana M. Muñoz-Cabello, José López‐Barneo, Jun Yang, Andrew N. Lane, Gang Xu, Teresa W.‐M. Fan, Ruoning Wang
10.1126/sciimmunol.abm8161

https://www.science.org/doi/10.1126/scitranslmed.abo5395
The rapid replacement of the SARS-CoV-2 Delta variant by Omicron (B.1.1.529) in England
Robert S
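
As a hedged sketch, the parsing step can be separated from the network request so that it can be tried on a plain dictionary. The helper name `summarize_paper` and the sample payload are assumptions made for illustration. Only the `title`, `authors`, `given`, and `family` keys come from the solution above. This version also tolerates an author entry that lacks one of the two name parts.

```python
def summarize_paper(payload, doi):
    """Build a (title, authors, doi) tuple from a decoded JSON payload."""
    title = payload.get("title", "NA")
    names = []
    for author in payload.get("authors") or []:
        # Join whichever name parts are present and skip empty entries.
        parts = [author.get("given", ""), author.get("family", "")]
        name = " ".join(part for part in parts if part)
        if name:
            names.append(name)
    authors = ", ".join(names) if names else "NA"
    return title, authors, doi


# Made-up payload for illustration; it mimics only the keys used above.
sample = {
    "title": "An example title",
    "authors": [
        {"given": "Ada", "family": "Lovelace"},
        {"family": "Consortium"},
    ],
}
print(summarize_paper(sample, "10.0000/example"))
print(summarize_paper({}, "10.0000/example"))
```

With a real response, the call would look like `summarize_paper(r.json(), doi)`.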

##### 2. Book Reviews

Use the `pynytimes` package to get any New York Times reviews for books by the author David Graeber.

##### Sample Solution

In [12]:
from pynytimes import NYTAPI
from steve_api_info import get_nytimes_key

In [13]:
nytapi = NYTAPI(get_nytimes_key(), parse_dates=True) 

In [20]:
nytapi.book_reviews(author = "David Graeber")

[{'url': 'https://www.nytimes.com/2018/06/26/books/review/david-graeber-bullshit-jobs.html',
  'publication_dt': datetime.date(2018, 6, 26),
  'byline': 'ALANA SEMUELS',
  'book_title': 'Bullshit Jobs: A Theory',
  'book_author': 'David Graeber',
  'summary': 'In “Bull__ Jobs,” the anthropologist David Graeber argues that technological advances have led to people working more, not fewer, hours at useless jobs.',
  'uuid': '00000000-0000-0000-0000-000000000000',
  'uri': 'nyt://book/00000000-0000-0000-0000-000000000000',
  'isbn13': ['9781501143311']}]
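
The call returns a list of dictionaries, one per review. A minimal sketch of pulling a few fields out of that list follows. The helper name `review_rows` is hypothetical, and the sample record copies a subset of the fields from the output above so that the snippet runs without an API key.

```python
import datetime


def review_rows(reviews):
    """Return (date, title, byline, url) tuples, oldest review first."""
    rows = [
        (r["publication_dt"], r["book_title"], r["byline"], r["url"])
        for r in reviews
    ]
    return sorted(rows, key=lambda row: row[0])


# Subset of the record shown in the output above.
reviews = [
    {
        "url": "https://www.nytimes.com/2018/06/26/books/review/david-graeber-bullshit-jobs.html",
        "publication_dt": datetime.date(2018, 6, 26),
        "byline": "ALANA SEMUELS",
        "book_title": "Bullshit Jobs: A Theory",
    }
]

for date, title, byline, url in review_rows(reviews):
    print(f"{date.isoformat()} | {title} | {byline.title()}")
    print(f"  {url}")
```

Sorting by `publication_dt` relies on the dates having been parsed into `datetime.date` objects, which is what `parse_dates=True` produced in the output above.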

##### 3. IMDB Data
Use `Cinemagoer` to find the rating of <a href="https://www.imdb.com/title/tt8097030/">Turning Red</a> on IMDB. Also produce a list of all the cast members.

<i>Hint: once you have gotten the movie returned from IMDB, try doing `variable.data`, where you should replace `variable` with whatever variable name you used to store the movie.</i>

In [21]:
from imdb import Cinemagoer

In [22]:
ia = Cinemagoer()

In [23]:
ia.search_movie('Turning Red')

[<Movie id:8097030[http] title:_Turning Red (2022)_>,
 <Movie id:16026664[http] title:_Embrace the Panda: Making Turning Red (2022)_>,
 <Movie id:1086640[http] title:_"Red Chapters: Turning Points in the History of Communism" (1999) (mini)_>,
 <Movie id:5370536[http] title:_Turning on the Red Lights: Making of 'Red Lights' (2012) (V)_>,
 <Movie id:0087010[http] title:_The Burning Bed (1984) (TV)_>,
 <Movie id:18688690[http] title:_Turning Red (2022)_>,
 <Movie id:28736864[http] title:_Turning Red (2022)_>,
 <Movie id:27180222[http] title:_Turning Red (2022)_>,
 <Movie id:29027953[http] title:_Turning Red (2023)_>,
 <Movie id:18518800[http] title:_Turning Red (2022)_>,
 <Movie id:18688286[http] title:_Turning Red (2022)_>,
 <Movie id:18952048[http] title:_Turning Red (2022)_>,
 <Movie id:18548220[http] title:_Turning Red (2022)_>,
 <Movie id:24020516[http] title:_Turning Red (2022)_>,
 <Movie id:25563134[http] title:_Turning Red (2022)_>,
 <Movie id:27316358[http] title:_Turning Red (20

In [24]:
turningred_id = '8097030'

turningred = ia.get_movie(turningred_id)
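
The ID passed to `get_movie` is the numeric part of the `tt` identifier in the film's IMDb URL. Below is a small sketch of extracting it with a regular expression rather than copying it by hand. The helper name `imdb_id_from_url` is an assumption made for illustration.

```python
import re


def imdb_id_from_url(url):
    """Return the digits following 'tt' in an IMDb title URL."""
    match = re.search(r"/title/tt(\d+)", url)
    if match is None:
        raise ValueError(f"no IMDb title ID found in {url!r}")
    return match.group(1)


turningred_id = imdb_id_from_url("https://www.imdb.com/title/tt8097030/")
print(turningred_id)  # 8097030
```

Taking the ID from the URL avoids having to pick the right entry out of the many identically titled search results.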

In [25]:
print("IMDB Rating:", turningred['rating'])

IMDB Rating: 7.0


In [26]:
[cast_member['name'] for cast_member in turningred['cast']]

['Rosalie Chiang',
 'Sandra Oh',
 'Ava Morse',
 'Hyein Park',
 'Maitreyi Ramakrishnan',
 'Orion Lee',
 'Wai Ching Ho',
 'Tristan Allerick Chen',
 'Lori Tan Chinn',
 'Mia Tagano',
 'Sherry Cola',
 'Lillian Lim',
 'James Hong',
 'Jordan Fisher',
 "Finneas O'Connell",
 'Topher Ngo',
 'Grayson Villanueva',
 'Josh Levi',
 'Sasha Roiz',
 'Addison Chandler',
 'Lily Sanfelippo',
 'Anne-Marie',
 'Brian Cummings']

##### 4. Python Wrapper for the Reddit API

In this problem you will become more familiar with the `praw` package, <a href="https://praw.readthedocs.io/en/stable/">https://praw.readthedocs.io/en/stable/</a>.

`praw` is a Python wrapper for Reddit's API, which allows you to scrape Reddit data without having to write much code.

The first step for using `praw` is creating a Reddit application with your Reddit account. Instructions on how to do so can be found here, <a href="https://github.com/reddit-archive/reddit/wiki/OAuth2-Quick-Start-Example#first-steps">https://github.com/reddit-archive/reddit/wiki/OAuth2-Quick-Start-Example#first-steps</a>.

The second step is installing `praw`. You can find instructions here, <a href="https://praw.readthedocs.io/en/stable/getting_started/installation.html">https://praw.readthedocs.io/en/stable/getting_started/installation.html</a>, for `pip` and here, <a href="https://anaconda.org/conda-forge/praw">https://anaconda.org/conda-forge/praw</a>, for `conda`.

Once you think that you have successfully installed `praw`, try running the code chunks below.

In [27]:
import praw

ModuleNotFoundError: No module named 'praw'

In [None]:
print(praw.__version__)

Next you need to connect to the API using your app's credentials. <b>As always, never share your credentials with anyone, especially online. Store these in a safe place on your computer</b>. I have stored mine in the file `matt_api_info.py`, which can only be found on my personal laptop.
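
One common alternative to a private Python module is to keep the credentials in environment variables. The sketch below is an illustration, not part of `praw`: the helper `get_credential` and the variable name `EXAMPLE_CLIENT_ID` are hypothetical, and the placeholder value is set in the code only so that the snippet runs as written.

```python
import os


def get_credential(name):
    """Read a credential from the environment, failing loudly if it is unset."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"environment variable {name} is not set")
    return value


# Demo only: in practice, set the variable in your shell or in a local,
# untracked file instead of writing the value into the notebook.
os.environ["EXAMPLE_CLIENT_ID"] = "placeholder-id"

client_id = get_credential("EXAMPLE_CLIENT_ID")
# Print the length only, so the value never lands in the notebook output.
print("loaded credential of length", len(client_id))
```

The value returned by such a helper could then be passed as `client_id=` or `client_secret=` when creating the `praw.Reddit` object.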

In [None]:
from matt_api_info import get_reddit_client_id, get_reddit_client_secret

In [None]:
## Connect to the API
reddit = praw.Reddit(
    ## input your client_id here
    client_id=get_reddit_client_id(),
    ## input your client_secret here
    client_secret=get_reddit_client_secret(),
    ## put in a string for your user_agent here
    user_agent="testscript"
)

Once you have a connection to the Reddit API, you can start to request data.

For example, with `.subreddit`, <a href="https://praw.readthedocs.io/en/stable/code_overview/models/subreddit.html">https://praw.readthedocs.io/en/stable/code_overview/models/subreddit.html</a>, you can get the information for a particular subreddit. Choose your favorite subreddit below.

In [None]:
## place the name of your favorite subreddit here,
## this should not include r/
## for example, "books" leads to the books subreddit, https://www.reddit.com/r/books/
subreddit_name = "books"

## here we get the subreddit data
subreddit = reddit.subreddit(subreddit_name)

Here is some of the data you can get on a subreddit.

In [None]:
## The name of the subreddit
subreddit.display_name

In [None]:
## The description of the subreddit
print(subreddit.description)

In [None]:
## The number of subscribers
subreddit.subscribers

Read the `praw` 'Quick Start' documentation, <a href="https://praw.readthedocs.io/en/stable/getting_started/quick_start.html">https://praw.readthedocs.io/en/stable/getting_started/quick_start.html</a>, to find out how to get the top "hot" submission for your favorite subreddit.

Store this in a variable named `top_post`.

In [None]:
top_post = [post for post in subreddit.hot(limit=1)][0]

Read the `praw` submission documentation, <a href="https://praw.readthedocs.io/en/latest/code_overview/models/submission.html">https://praw.readthedocs.io/en/latest/code_overview/models/submission.html</a>, to return the:
- Author of the post,
- Title of the post,
- Text of the post (if there is any),
- Number of comments, and
- Number of upvotes.

In [None]:
print(top_post.author)

In [None]:
print(top_post.title)

In [None]:
print(top_post.selftext)

In [None]:
print(top_post.num_comments)

In [None]:
print(top_post.score)
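
The five attributes printed above can also be gathered into a single dictionary. This is a minimal sketch that assumes only the attribute names used above. The helper `summarize_post` is hypothetical, and a `SimpleNamespace` stands in for a real submission so that the snippet runs without Reddit credentials.

```python
from types import SimpleNamespace


def summarize_post(post):
    """Collect the fields of interest from a submission-like object."""
    return {
        # The author can be None (for example, a deleted account).
        "author": str(post.author) if post.author is not None else "[deleted]",
        "title": post.title,
        # Link posts have an empty selftext, so substitute a readable marker.
        "text": post.selftext or "(no text)",
        "num_comments": post.num_comments,
        "score": post.score,
    }


# Stand-in object with made-up values, used here in place of top_post.
fake_post = SimpleNamespace(
    author="example_user",
    title="An example title",
    selftext="",
    num_comments=3,
    score=42,
)

for field, value in summarize_post(fake_post).items():
    print(f"{field}: {value}")
```

With a live connection, `summarize_post(top_post)` should produce the same dictionary for the real submission.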

You can learn more about `praw` by reading the documentation, <a href="https://praw.readthedocs.io/en/latest/index.html">https://praw.readthedocs.io/en/latest/index.html</a>.

--------------------------

This notebook was written for the Erdős Institute Cőde Data Science Boot Camp by Matthew Osborne, Ph.D., 2023.

Any potential redistributors must seek and receive permission from Matthew Tyler Osborne, Ph.D. prior to redistribution. Redistribution of the material contained in this repository is conditional on acknowledgement of Matthew Tyler Osborne, Ph.D.'s original authorship and sponsorship of the Erdős Institute as subject to the license (see License.md).