# Data Retrieval II

In this notebook, we will work with the following:

- API access.
    - NYTimes API.
    - FRED API.
- Commercial databases.
- SQL.

In [None]:
import datetime
import os

import plotly.express as px
import pandas as pd
import pyfredapi as pf
from pynytimes import NYTAPI

In [None]:
pd.set_option("mode.copy_on_write", True)

# APIs

As we're about to see, it's really nice when sites help us out.

We'll be using the New York Times API, and you'll need a key to use it. You can get one [here](https://developer.nytimes.com/accounts/create).
Once you sign up for an account, do the following:

1. Sign in to their [developer site](https://developer.nytimes.com) with your new account.
1. In the upper right, click on your user id (it's the email address you signed up with).
1. Then, in the popover, click "Apps".
1. On the resulting page, click the "New App" button.
1. I used the following info:
    - App name: workshop-example
    - Description: Text analysis workshop API example
1. Click the "Enable" button to turn on the "Article Search API".
1. Click "Save" in the bottom right of the page.
1. On your app page, click the "Copy API Key" button to the right of your API key.
1. Store the API key in an environment variable named `API_NYT`; the code below reads it from there rather than from a hard-coded string.
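
The next cell looks up the key in the `API_NYT` environment variable. If that variable is missing, `os.environ[...]` raises a bare `KeyError`, which can be confusing in a workshop setting. Here is a minimal sketch of a helper that fails with a more readable message; `require_env` is a hypothetical name used for illustration and is not part of pynytimes.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a readable error.

    Hypothetical helper for this notebook; not part of any library.
    """
    # Treat a missing variable and an empty/whitespace-only one the same way.
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set. "
            "Set it before starting Jupyter so the key stays out of the notebook."
        )
    return value
```

With this in place, the client could be created as `NYTAPI(require_env("API_NYT"))`.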

In [None]:
nyt_api = NYTAPI(os.environ["API_NYT"])

msft_articles = nyt_api.article_search(
    query="Microsoft", dates={"begin": datetime.datetime(2018, 8, 1)}, results=20
)

In [None]:
msft_nyt = pd.DataFrame(msft_articles)
msft_nyt.head()

In [None]:
# While we're at it, let's add Microsoft's ticker.
# We'd usually add an identifier when getting query results.
msft_nyt["id_ticker"] = "msft"

In [None]:
# We should also clean up the publication date.
type(msft_nyt["pub_date"][0])

In [None]:
msft_nyt["pub_date"] = pd.to_datetime(msft_nyt["pub_date"])
type(msft_nyt["pub_date"][0])

This is only one example, and there are a lot of APIs out there.
Many of them have packages, official or unofficial, that will make access easy.
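
To see what such a package saves us from, here is a rough sketch of building an Article Search request URL by hand. The endpoint and the parameter names (`q`, `begin_date` in `YYYYMMDD` form, `page`, `api-key`) reflect my understanding of the NYT developer documentation, so treat them as assumptions and check the docs before relying on them. `build_article_search_url` is a hypothetical helper, and the sketch only builds the URL; it does not send a request.

```python
from urllib.parse import urlencode

# Assumed endpoint for the NYT Article Search API (v2); verify against the docs.
ARTICLE_SEARCH_URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"


def build_article_search_url(query, begin_date, api_key, page=0):
    """Build (but do not send) an Article Search request URL.

    begin_date is a string in YYYYMMDD form, e.g. "20180801".
    """
    params = {
        "q": query,
        "begin_date": begin_date,
        "page": page,
        "api-key": api_key,
    }
    # urlencode handles escaping of spaces and special characters.
    return f"{ARTICLE_SEARCH_URL}?{urlencode(params)}"


url = build_article_search_url("Microsoft", "20180801", "YOUR_KEY")
```

A wrapper package also takes care of paging through results, rate limits, and parsing the JSON response, all of which we would otherwise write ourselves.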

# FRED

Another API example is FRED, the Federal Reserve Economic Data system.
It's a fairly typical API in that you register for a key, and then you can access the data.

Let's try a less-guided example of obtaining your API key.

- [FRED API key](https://fred.stlouisfed.org/docs/api/api_key.html)
- [pyfredapi documentation](https://pyfredapi.readthedocs.io/en/latest/)

Below is a simple example that retrieves the monthly 10-year US Treasury yield (series `GS10`) from FRED.
pyfredapi looks for your key in the `FRED_API_KEY` environment variable unless you pass it explicitly.

In [None]:
fred_gs10 = pf.get_series("GS10")
fred_gs10.head()

In [None]:
px.line(
    fred_gs10[fred_gs10["date"] >= pd.to_datetime("2012-01-01")],
    x="date",
    y="value",
    template="plotly_dark",
).show()

In [None]:
len(fred_gs10[fred_gs10["date"] >= pd.to_datetime("2012-01-01")])

# Commercial databases

This is a more difficult topic to cover in a hands-on workshop (especially one with openly distributed materials) because of copyright.
Commercial databases tend to fall into two types:

1. Those that come in tabular formats and simply require cleaning.
2. Those (like LexisNexis and Factiva) that come in semi-structured form and require extensive parsing.

For the second type, it's best to either recruit one of the few coauthors with this skillset (and perhaps even a written or partially written implementation) or hire a programmer or student who can write one.
It's worth noting that it's not all that hard to get a reasonable implementation written, but going from 90 percent parsing accuracy to 99 percent to 99.9+ percent is difficult, painstaking work.
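
To make the parsing problem concrete, here is a minimal sketch of a parser for a made-up export layout. The layout (uppercase `FIELD: value` header lines, a blank line, body text, and `---` between articles) and the sample text are invented for illustration; they are not the actual LexisNexis or Factiva format. `parse_export` is a hypothetical function name.

```python
import re

# Header lines look like "HEADLINE: some text" in this invented layout.
FIELD_RE = re.compile(r"^([A-Z]+):\s*(.+)$")

SAMPLE = """\
HEADLINE: First example headline
SOURCE: Example Wire
DATE: 1 August 2018

First body paragraph.
Second line of the body.
---
HEADLINE: Second example headline
SOURCE: Example Daily

Only one line here.
"""


def parse_export(text):
    """Split an export into one dict per article (header fields plus body)."""
    articles = []
    # Articles are separated by a line containing only "---".
    for chunk in re.split(r"^---\s*$", text, flags=re.MULTILINE):
        chunk = chunk.strip()
        if not chunk:
            continue
        # The first blank line separates the header from the body.
        header, _, body = chunk.partition("\n\n")
        record = {}
        for line in header.splitlines():
            match = FIELD_RE.match(line.strip())
            if match:
                record[match.group(1).lower()] = match.group(2).strip()
        record["body"] = body.strip()
        articles.append(record)
    return articles


articles = parse_export(SAMPLE)
```

Note that the second sample article has no `DATE` line, so its record has no `date` key. Real exports are full of this kind of irregularity (missing fields, a body line that happens to look like a header, a separator inside the text), and handling each case is where the painstaking work goes.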

With that in mind, feel free to ask questions about this topic during the workshop.
If time permits, I'll demonstrate some non-sharable stuff.

# SQL

For some data sources, it is helpful to use Structured Query Language ("SQL") to query a database.
Wharton Research Data Services ("WRDS") has a [Python package](https://github.com/wharton/wrds) that makes it easy to pass queries to their servers and get back a pandas dataframe.
We will work with this in a later segment.
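
In the meantime, here is a small sketch of what a SQL query looks like, using Python's built-in `sqlite3` module and an in-memory database rather than WRDS. The `prices` table, its columns, and the rows are made up for illustration.

```python
import sqlite3

# An in-memory database that disappears when the connection closes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (ticker TEXT, date TEXT, close REAL)")

# Made-up rows for two hypothetical tickers.
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?)",
    [
        ("aaa", "2018-07-31", 10.0),
        ("aaa", "2018-08-01", 12.0),
        ("aaa", "2018-08-02", 14.0),
        ("bbb", "2018-08-01", 5.0),
    ],
)

# Count trading days and average the closing price per ticker from August 2018 on.
query = """
SELECT ticker, COUNT(*) AS n_days, AVG(close) AS avg_close
FROM prices
WHERE date >= '2018-08-01'
GROUP BY ticker
ORDER BY ticker
"""
rows = conn.execute(query).fetchall()
conn.close()
```

The same `SELECT ... FROM ... WHERE ... GROUP BY` pattern carries over to most SQL databases. If you run `pd.read_sql_query(query, conn)` before closing the connection, you get the result as a pandas dataframe instead of a list of tuples.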

# Breakout Exercises (time permitting)

If time permits, choose one of the following as a group.

1. Experiment with other searches with the NY Times API by adapting the code above. You may want to create new cells below.