# MOZ Links API Documentation
https://moz.com/help/links-api

Welcome to the MOZ Links API! With it, you can access the same data that
powers the MOZ Link Explorer. The environment you're in is called a Jupyter
Notebook, though the software actually running the Notebook is called
JupyterLab. There are a variety of ways to run Notebooks, including
cloud-hosted services such as Google Colab.

This documentation is embedded in the Notebook alongside live, runnable code.
The cells below contain Python code. When you run a cell, it executes in a
full-fledged Python kernel running a modern release of the official CPython
interpreter from Python.org.

## Learn a Little Python

Don't be scared. Many people using APIs like this will do everything in their
power to avoid looking at a line of code, leaning instead on tools like the
Screaming Frog API integration or a Google Sheets or Excel plug-in. I'm here to
show you it's not so bad, and that you can do plenty of things those tools
don't allow.

## Black Uncompromising Code Formatting

JupyterLab does not automatically tend to stylistic code-formatting issues; it
basically lets you use whatever coding style you like. The next line loads a
pre-installed extension called "black", which takes care of line wraps,
indentation and other matters of PEP 8 style compliance. Running it is
optional, but once it is loaded, each code cell is reformatted after you run
it.

In [None]:
%load_ext lab_black

# Do global imports

The import lines below load resources that are not loaded into the running
Python interpreter by default but that we know we will need later on. Once
imported, they can be used anywhere in the Notebook.

In [1]:
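# Import Python libraries
import json  # serialize request dicts into JSON strings
import requests  # make HTTP calls to the API
from headlines import *  # presumably provides h1() and h4(), used below
from pprint import pprint  # pretty-print nested API responses
from sqlitedict import SqliteDict as sqldict  # dict-like cache stored in SQLite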

# Load login values from file

We keep login information out of the part of this folder that gets
synchronized to a GitHub repo.

In [None]:
# Get credentials from external file
with open("mozcreds.txt") as fh:
    ACCESSID, SECRETKEY = [x.strip().split(": ")[1] for x in fh.readlines()]
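The one-liner above assumes `mozcreds.txt` holds exactly two `label: value` lines, access ID first and secret key second; an extra blank line makes it fail with an `IndexError`. Below is a slightly more forgiving parser, sketched on an in-memory string. `parse_creds` is a hypothetical helper, and the `access_id`/`secret_key` labels are placeholders, since the source doesn't show the file's actual labels.

```python
def parse_creds(text):
    """Return (access_id, secret_key) from 'label: value' lines.

    Hypothetical helper: assumes the access ID comes first and the secret
    key second, matching the unpacking in the cell above.
    """
    values = [
        line.split(": ", 1)[1].strip()  # split once, so values may contain ": "
        for line in text.splitlines()
        if line.strip()  # skip blank lines
    ]
    if len(values) != 2:
        raise ValueError(f"expected 2 credential lines, got {len(values)}")
    return values[0], values[1]


# Placeholder contents; the real values live in mozcreds.txt.
sample = "access_id: my-access-id\nsecret_key: my-secret-key\n\n"
access_id, secret_key = parse_creds(sample)
print(access_id, secret_key)  # my-access-id my-secret-key
```

In the Notebook you would call it as `ACCESSID, SECRETKEY = parse_creds(fh.read())` inside the same `with open(...)` block.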

# Configure variables

By convention, names in ALL CAPS are constants that never change, although
Python does not enforce this. The remaining variables are just enough to
define an API call.

In [None]:
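ENDPOINT = "https://lsapi.seomoz.com/v2/"  # base URL shared by all v2 endpoints
end_sub = "anchor_text"  # which sub-endpoint to call

url = ENDPOINT + end_sub
auth = (ACCESSID, SECRETKEY)  # (access ID, secret key) pair
data = {"target": "moz.com/blog", "scope": "page", "limit": 1}
data = json.dumps(data)  # the request body is sent as a JSON string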
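Passing a `(user, password)` tuple as `auth=` to requests means HTTP Basic authentication: requests base64-encodes `"user:password"` into an `Authorization` header. To show what actually goes over the wire, the stdlib sketch below builds that header and the JSON body by hand with placeholder credentials. `basic_auth_header` is a hypothetical helper, not part of requests; for ASCII credentials it produces the same header requests sends.

```python
import base64
import json


def basic_auth_header(user, password):
    """Build an HTTP Basic Authorization header value (RFC 7617)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


# Placeholder credentials, not real Moz keys.
header = basic_auth_header("my-access-id", "my-secret-key")
body = json.dumps({"target": "moz.com/blog", "scope": "page", "limit": 1})

print(header)
print(body)  # {"target": "moz.com/blog", "scope": "page", "limit": 1}
```

Basic auth is only base64, not encryption, which is one more reason the credentials stay out of the GitHub repo and the calls go over `https://`.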

# Hit API (Ensure Success)

This step actually uses the requests package to reach out over the
Internet and hit the MOZ API. The data that comes back is real.

In [None]:
r = requests.post(url, data=data, auth=auth)
print(r)
r.json()
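`print(r)` shows the status as something like `<Response [200]>`, which is easy to miss in a longer run. A small guard can stop the Notebook before a failed call's error body gets treated as data. The sketch below is duck-typed on the `status_code` and `text` attributes that a requests `Response` has, and uses stand-in objects so it runs offline. `ensure_success` is a hypothetical name; requests' own `r.raise_for_status()`, which raises for 4xx and 5xx codes, is the built-in alternative.

```python
from types import SimpleNamespace


def ensure_success(r):
    """Raise if a requests-style response is outside the 2xx range; else return it."""
    if not 200 <= r.status_code < 300:
        # Include the start of the body; APIs usually explain the error there.
        raise RuntimeError(
            f"API call failed with HTTP {r.status_code}: {r.text[:200]}"
        )
    return r


# Stand-in objects so the sketch runs without network access.
ok = SimpleNamespace(status_code=200, text='{"results": []}')
denied = SimpleNamespace(status_code=401, text="Unauthorized")

ensure_success(ok)  # returns ok unchanged
try:
    ensure_success(denied)
except RuntimeError as err:
    print(err)  # API call failed with HTTP 401: Unauthorized
```

In the Notebook it would wrap the real call, e.g. `r = ensure_success(requests.post(url, data=data, auth=auth))`.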

In [None]:
# This is no longer needed. It was used to convert the stringified JSON
# request examples from the MOZ website into Python dict objects.

def objectify(strjson):
    """Returns dict object given stringified JSON."""
    adict = json.loads(strjson)
    return adict

# List Endpoints

These are the various sub-endpoints. The common part of the service's
address, which never changes, is in the ENDPOINT constant. Appending a
different sub-endpoint to it reaches a different service:

In [None]:
points = [
    "anchor_text",
    "final_redirect",
    "global_top_pages",
    "global_top_root_domains",
    "index_metadata",
    "link_intersect",
    "link_status",
    "linking_root_domains",
    "links",
    "top_pages",
    "url_metrics",
    "usage_data",
]

In [None]:
# List all available endpoints
for i, point in enumerate(points, start=1):
    print(i, point)

We can make a parallel list with the "human" labels, plus one with descriptions.
As long as they stay in the same order as `points`, we can "zip" the lists together for documentation.

In [None]:
names = [
    "Anchor Text",
    "Final Redirect",
    "Global Top Pages",
    "Global Top Root Domains",
    "Index Metadata",
    "Link Intersect",
    "Link Status",
    "Linking Root Domains",
    "Links",
    "Top Pages",
    "URL Metrics",
    "Usage Data",
]

In [None]:
descriptions = [
    "Use this endpoint to get data about anchor text used by followed external links to a target. Results are ordered by external_root_domains descending.",
    "Use this endpoint to get the final redirect destination of a page.",
    "This endpoint returns the top 500 pages in the entire index with the highest Page Authority values, sorted by Page Authority. (Visit the Top 500 Sites list to explore the top root domains on the web, sorted by Domain Authority.)",
    "This endpoint returns the top 500 root domains in the entire index with the highest Domain Authority values, sorted by Domain Authority.",
    "Use this endpoint to get metadata about the current index.",
    "Use this endpoint to get sources that link to at least one of a list of positive targets and don't link to any of a list of negative targets.",
    "Use this endpoint to get information about links from many sources to a single target.",
    "Use this endpoint to get linking root domains to a target.",
    "Use this endpoint to get links to a target.",
    "This endpoint returns top pages on a target domain.",
    "Use this endpoint to get metrics about one or more urls.",
    "This endpoint returns the number of rows consumed so far in the current billing period. The count returned might not reflect rows consumed in the last hour. The count returned reflects rows consumed by requests to both the v1 (Moz Links API) and v2 Links APIs.",
]

In [None]:
# Simple zipping example
list(zip(names, points))
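`zip` stops at the shortest list without complaint, so if `points`, `names` and `descriptions` drift apart, or an endpoint gets pasted in twice, the pairing goes silently wrong. A cheap guard, shown here on a three-item excerpt, checks lengths and duplicates before zipping. `check_parallel` is a hypothetical helper, and the `sample_*` names keep the sketch from overwriting the real lists if run in the Notebook.

```python
from collections import Counter


def check_parallel(points, names, descriptions):
    """Return {endpoint: (name, description)}, raising ValueError on misalignment."""
    if not len(points) == len(names) == len(descriptions):
        raise ValueError(
            f"length mismatch: {len(points)} points, {len(names)} names, "
            f"{len(descriptions)} descriptions"
        )
    dupes = [p for p, count in Counter(points).items() if count > 1]
    if dupes:
        raise ValueError(f"duplicate endpoints: {dupes}")
    return dict(zip(points, zip(names, descriptions)))


# A three-item excerpt of the real lists.
sample_points = ["links", "top_pages", "url_metrics"]
sample_names = ["Links", "Top Pages", "URL Metrics"]
sample_descriptions = [
    "Use this endpoint to get links to a target.",
    "This endpoint returns top pages on a target domain.",
    "Use this endpoint to get metrics about one or more urls.",
]
lookup = check_parallel(sample_points, sample_names, sample_descriptions)
print(lookup["links"][0])  # Links
```

Run against the real lists as `check_parallel(points, names, descriptions)`, it also gives a handy endpoint-to-documentation lookup.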

# Define an example request for each endpoint

In [None]:
ed = {
    "anchor_text": {"target": "moz.com/blog", "scope": "page", "limit": 5},
    "links": {
        "target": "moz.com/blog",
        "target_scope": "page",
        "filter": "external+nofollow",
        "limit": 1,
    },
    "final_redirect": {"page": "seomoz.org/blog"},
    "global_top_pages": {"limit": 5},
    "global_top_root_domains": {"limit": 5},
    "index_metadata": {},
    "link_intersect": {
        "positive_targets": [
            {"target": "latimes.com", "scope": "root_domain"},
            {"target": "blog.nytimes.com", "scope": "subdomain"},
        ],
        "negative_targets": [{"target": "moz.com", "scope": "root_domain"}],
        "source_scope": "page",
        "sort": "source_domain_authority",
        "limit": 1,
    },
    "link_status": {
        "target": "moz.com/blog",
        "sources": ["twitter.com", "linkedin.com"],
        "source_scope": "root_domain",
        "target_scope": "page",
    },
    "linking_root_domains": {
        "target": "moz.com/blog",
        "target_scope": "page",
        "filter": "external",
        "sort": "source_domain_authority",
        "limit": 5,
    },
    "top_pages": {"target": "moz.com", "scope": "root_domain", "limit": 5},
    "url_metrics": {"targets": ["moz.com", "nytimes.com"]},
    "usage_data": {},
}
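The display loop below looks up `ed[point]` for every entry in `points`, so an endpoint without an example request raises `KeyError` partway through. A set comparison catches that up front. This sketch uses toy data; `toy_points` and `toy_ed` are hypothetical stand-ins for the real `points` and `ed`.

```python
# Toy stand-ins for the real points list and ed dict.
toy_points = ["anchor_text", "links", "usage_data"]
toy_ed = {
    "anchor_text": {"target": "moz.com/blog", "scope": "page", "limit": 5},
    "usage_data": {},
}

# Endpoints listed in points but lacking an example request...
missing = [p for p in toy_points if p not in toy_ed]
# ...and example requests whose endpoint is not listed in points.
extra = sorted(set(toy_ed) - set(toy_points))

print("missing:", missing)  # missing: ['links']
print("extra:", extra)  # extra: []
```

Against the real data, `assert not [p for p in points if p not in ed]` placed before the loop does the same job.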

In [None]:
for i, point in enumerate(points):
    h1(f"{i + 1}. {names[i]} ({point})")
    print(descriptions[i])
    h4("Example request:")
    pprint(ed[point])
    print()

# About The Above

The output above has not actually touched the API. It only lists each
endpoint along with enough information to run a sample request. We don't want
to run these requests over and over, because API usage has a cost.

To run each sample query only once, we're going to store the results as we
receive them and ***check*** whether we already have them before attempting
to retrieve them again.
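This check-before-fetch idea is plain memoization keyed on the endpoint name. Here is a minimal sketch, with an ordinary dict standing in for the SqliteDict file and a fake fetch function that counts its calls. `fetch_once` and `fake_api` are hypothetical names. The SqliteDict version works the same way, except that the store survives kernel restarts and you commit after each write.

```python
calls = []


def fake_api(endpoint, request):
    """Stand-in for a real API call; records each call so we can count hits."""
    calls.append(endpoint)
    return {"endpoint": endpoint, "request": request}


def fetch_once(store, endpoint, request, fetch):
    """Return store[endpoint], calling fetch() only when it is not cached yet."""
    if endpoint not in store:
        store[endpoint] = fetch(endpoint, request)
    return store[endpoint]


cache = {}  # stands in for the SqliteDict database
for _ in range(3):
    result = fetch_once(cache, "usage_data", {}, fake_api)

print(len(calls))  # 1
```

Three lookups, one "API hit": only the first call reaches `fake_api`, and the later two are served from `cache`.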

# Define a Function

Before executing the requests above the way we did in the opening example,
where we first touched the API, we're going to create a function so that we
don't repeat the same code. Each time we hit the MOZ Links API, the only
things that change are the sub-endpoint and the request. And so:

In [None]:
def moz(subend, datadict):
    """Hits MOZ Links API with specified endpoint and request and returns results."""
    data = json.dumps(datadict)
    url = ENDPOINT + subend
    r = requests.post(url, data=data, auth=auth)
    return r

In [None]:
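import os

# SQLite cannot create the database file if its folder is missing
os.makedirs("dbs", exist_ok=True)

# Only hit the API for endpoints that aren't already cached in the database
with sqldict("dbs/mozlinksapi.db") as db:
    for endpoint in points:
        if endpoint not in db:
            print(endpoint)
            result = moz(endpoint, ed[endpoint])
            db[endpoint] = result
            db.commit()
            print("API hit and response saved!")
            print()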

In [None]:
with sqldict("dbs/mozlinksapi.db") as db:
    for key in db:
        print(key)

In [None]:
with sqldict("dbs/mozlinksapi.db") as db:
    for key in db:
        print(db[key])

In [3]:
with sqldict("dbs/mozlinksapi.db") as db:
    for key in db:
        pprint(db[key].json())

{'results': [{'deleted_pages_to_page': 1952554,
              'deleted_pages_to_root_domain': 18974072,
              'deleted_pages_to_subdomain': 18511335,
              'deleted_root_domains_to_page': 6523,
              'deleted_root_domains_to_root_domain': 27521,
              'deleted_root_domains_to_subdomain': 27276,
              'domain_authority': 91,
              'external_indirect_pages_to_root_domain': 45290022,
              'external_nofollow_pages_to_page': 9695079,
              'external_nofollow_pages_to_root_domain': 17454717,
              'external_nofollow_pages_to_subdomain': 17298845,
              'external_pages_to_page': 14962736,
              'external_pages_to_root_domain': 69285993,
              'external_pages_to_subdomain': 68654664,
              'external_redirect_pages_to_page': 3632557,
              'external_redirect_pages_to_root_domain': 41080305,
              'external_redirect_pages_to_subdomain': 41076921,
              'http_code': 200