An API (application programming interface) lets a program automatically request specific information from a website, which can then be used in a visualization.

API call: a request to a specific URL that asks for certain information, typically returned in JSON or CSV format

We will use the GitHub API to request information about Python projects.

https://
api.github.com/      --> part of GitHub that responds to API calls
search/repositories  --> conduct the search through the repositories
?q=language:python   --> q stands for "query"; only repositories whose primary language is Python
&sort=stars          --> sort the projects by number of stars
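
The same URL can be assembled from its parts instead of typed by hand. This is a minimal sketch using only the standard library; `build_search_url` is a hypothetical helper name, not part of the GitHub API or of requests.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.github.com/search/repositories"


def build_search_url(language, sort="stars"):
    """Assemble the search URL from its parts (hypothetical helper)."""
    # safe=":" keeps the colon in "language:python" readable instead of "%3A"
    query = urlencode({"q": f"language:{language}", "sort": sort}, safe=":")
    return f"{BASE_URL}?{query}"


url = build_search_url("python")
print(url)
```

Changing the `language` argument is then enough to search for projects in another language.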

In [1]:
# request information from a website
import requests

In [2]:
url = "https://api.github.com/search/repositories?q=language:python&sort=stars"

# ask explicitly for version 3 of the API via the Accept header
headers = {"Accept": "application/vnd.github.v3+json"}

# make the API call and assign the result to a response object
r = requests.get(url, headers=headers)
# status code 200 indicates a successful response
print(f"Status code: {r.status_code}")

# the response body is JSON; the json() method converts it to a Python dict
response_dict = r.json()

print(response_dict.keys())

Status code: 200
dict_keys(['total_count', 'incomplete_results', 'items'])
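
The three keys mean: `total_count` is the number of repositories that match the query, `incomplete_results` tells whether GitHub could not finish the search, and `items` holds the repositories actually returned. A small sketch of how they could be summarized; `summarize_search` is a hypothetical helper and `sample` is made-up data with the same three keys, not a real response.

```python
def summarize_search(response_dict):
    """Return a one-line summary of a search response (hypothetical helper)."""
    total = response_dict["total_count"]
    returned = len(response_dict["items"])
    status = "incomplete" if response_dict["incomplete_results"] else "complete"
    return f"{returned} of {total} repositories returned ({status} results)"


# made-up sample with the same three keys as the real response
sample = {"total_count": 3, "incomplete_results": False, "items": [{}, {}]}
print(summarize_search(sample))
```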


In [3]:
repo_dicts = response_dict["items"]

# repositories returned
len(repo_dicts)

30
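
Only 30 repositories come back because that is the API's default page size. The `per_page` (up to 100) and `page` query parameters ask for more results per call or for later pages. A sketch of how such URLs could be built; `page_url` is a hypothetical helper, and nothing is requested from the network here.

```python
from urllib.parse import urlencode


def page_url(page, per_page=100):
    """Build the search URL for one page of results (hypothetical helper)."""
    params = {
        "q": "language:python",
        "sort": "stars",
        "per_page": per_page,  # results per page
        "page": page,          # page numbers start at 1
    }
    return "https://api.github.com/search/repositories?" + urlencode(params, safe=":")


for page in (1, 2):
    print(page_url(page))
```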

In [4]:
# examine the first repository
repo_dict = repo_dicts[0]
print(f"\nKeys: {len(repo_dict)}")
for key in sorted(repo_dict.keys()):
    print(key)


Keys: 74
archive_url
archived
assignees_url
blobs_url
branches_url
clone_url
collaborators_url
comments_url
commits_url
compare_url
contents_url
contributors_url
created_at
default_branch
deployments_url
description
disabled
downloads_url
events_url
fork
forks
forks_count
forks_url
full_name
git_commits_url
git_refs_url
git_tags_url
git_url
has_downloads
has_issues
has_pages
has_projects
has_wiki
homepage
hooks_url
html_url
id
issue_comment_url
issue_events_url
issues_url
keys_url
labels_url
language
languages_url
license
merges_url
milestones_url
mirror_url
name
node_id
notifications_url
open_issues
open_issues_count
owner
private
pulls_url
pushed_at
releases_url
score
size
ssh_url
stargazers_count
stargazers_url
statuses_url
subscribers_url
subscription_url
svn_url
tags_url
teams_url
trees_url
updated_at
url
watchers
watchers_count


In [5]:
# we have 74 keys

# Pull out the values for some of the keys in the repo_dict

print("\nSelected information about first repository:")
print(f"Name: {repo_dict['name']}")
print(f"Owner: {repo_dict['owner']['login']}")
print(f"Stars: {repo_dict['stargazers_count']}")
print(f"Repository: {repo_dict['html_url']}")
print(f"Created: {repo_dict['created_at']}")
print(f"Updated: {repo_dict['updated_at']}")
print(f"Description: {repo_dict['description']}")


Selected information about first repository:
Name: system-design-primer
Owner: donnemartin
Stars: 87146
Repository: https://github.com/donnemartin/system-design-primer
Created: 2017-02-26T16:15:28Z
Updated: 2020-04-02T15:46:16Z
Description: Learn how to design large-scale systems. Prep for the system design interview.  Includes Anki flashcards.
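
The `created_at` and `updated_at` values are plain strings. To compare or sort them as dates they can be converted with the standard library. A minimal sketch, assuming every timestamp has the `YYYY-MM-DDTHH:MM:SSZ` shape shown above; `parse_github_time` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def parse_github_time(timestamp):
    """Convert a string such as '2017-02-26T16:15:28Z' to an aware datetime."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    # the trailing "Z" means UTC, so attach that time zone explicitly
    return parsed.replace(tzinfo=timezone.utc)


created = parse_github_time("2017-02-26T16:15:28Z")
updated = parse_github_time("2020-04-02T15:46:16Z")
print(f"Created in {created.year}")
print(f"Days between creation and last update: {(updated - created).days}")
```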


In [6]:
print("\nSelected information about each repository:")
for repo_dict in repo_dicts:
    print(f"Name: {repo_dict['name']}")
    print(f"Owner: {repo_dict['owner']['login']}")
    print(f"Stars: {repo_dict['stargazers_count']}")
    print(f"Repository: {repo_dict['html_url']}")
    print(f"Created: {repo_dict['created_at']}")
    print(f"Updated: {repo_dict['updated_at']}")
    print(f"Description: {repo_dict['description']}")


Selected information about each repository:
Name: system-design-primer
Owner: donnemartin
Stars: 87146
Repository: https://github.com/donnemartin/system-design-primer
Created: 2017-02-26T16:15:28Z
Updated: 2020-04-02T15:46:16Z
Description: Learn how to design large-scale systems. Prep for the system design interview.  Includes Anki flashcards.
Name: awesome-python
Owner: vinta
Stars: 80826
Repository: https://github.com/vinta/awesome-python
Created: 2014-06-27T21:00:06Z
Updated: 2020-04-02T15:44:05Z
Description: A curated list of awesome Python frameworks, libraries, software and resources
Name: public-apis
Owner: public-apis
Stars: 74102
Repository: https://github.com/public-apis/public-apis
Created: 2016-03-20T23:49:42Z
Updated: 2020-04-02T15:46:21Z
Description: A collective list of free APIs for use in software and web development.
Name: Python
Owner: TheAlgorithms
Stars: 69079
Repository: https://github.com/TheAlgorithms/Python
Created: 2016-07-16T09:44:01Z
Updated: 2020-04-02T15:

## Visualizing Repositories using Plotly

Bar chart in which the height of each bar is the repository's number of stars.

In [7]:
import plotly
from plotly import offline

In [8]:
url = "https://api.github.com/search/repositories?q=language:python&sort=stars"
headers = {"Accept": "application/vnd.github.v3+json"}
r = requests.get(url, headers=headers)
print(f"Status code: {r.status_code}")

Status code: 200


In [9]:
# Process results
response_dict = r.json()
repo_dicts = response_dict["items"]
repo_names, stars, labels = [], [], []
for repo_dict in repo_dicts:
    repo_names.append(repo_dict["name"])
    stars.append(repo_dict["stargazers_count"])
        
    # add info to the labels
    owner = repo_dict["owner"]["login"]
    description = repo_dict["description"]
    label = f"{owner}<br />{description}"
    labels.append(label)
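
A repository without a description comes back with `None` for that key, so the hover label would show the text "None". A small sketch of one way to guard against that; `build_label` and the fallback text are assumptions, not part of the original code.

```python
def build_label(owner, description):
    """Build hover text, tolerating a missing description (hypothetical helper)."""
    text = description or "No description provided."
    return f"{owner}<br />{text}"


print(build_label("vinta", "A curated list of awesome Python frameworks"))
print(build_label("vinta", None))
```

Inside the loop this would be used as `labels.append(build_label(owner, description))`.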

In [10]:
# Make visualization
data = [{
    "type":"bar",
    "x": repo_names,
    "y": stars,
    "hovertext": labels,   # text shown when hovering over a bar
    "marker":{
        "color":"rgb(60,100,150)",
        "line": {"width":1.5, "color": "rgb(25,25,25)"}
    },
    "opacity":0.6
}]

my_layout = {
    "title": "Most-Starred Python Projects on GitHub",
    # nested title/font form; the older "titlefont" key is deprecated in Plotly
    "xaxis": {"title": {"text": "Repository", "font": {"size": 24}},
              "tickfont": {"size": 14}
             },
    "yaxis": {"title": {"text": "Stars", "font": {"size": 24}},
              "tickfont": {"size": 14}
             },
}

fig = {"data": data, "layout":my_layout}



Adding clickable links: repeat the process, this time using an HTML link to each repository as the x-axis label.

In [11]:

repo_links, stars, labels = [], [], []
for repo_dict in repo_dicts:
    repo_name = repo_dict["name"]
    repo_url = repo_dict["html_url"]
    repo_link = f"<a href='{repo_url}'>{repo_name}</a>"
    repo_links.append(repo_link)
    
    stars.append(repo_dict["stargazers_count"])
        
    # add info to the labels
    owner = repo_dict["owner"]["login"]
    description = repo_dict["description"]
    label = f"{owner}<br />{description}"
    labels.append(label)
    

In [12]:
# Make visualization
data = [{
    "type":"bar",
    "x": repo_links,
    "y": stars,
    "hovertext": labels,   # text shown when hovering over a bar
    "marker":{
        "color":"rgb(60,100,150)",
        "line": {"width":1.5, "color": "rgb(25,25,25)"}
    },
    "opacity":0.6
}]

my_layout = {
    "title": "Most-Starred Python Projects on GitHub",
    # nested title/font form; the older "titlefont" key is deprecated in Plotly
    "xaxis": {"title": {"text": "Repository", "font": {"size": 24}},
              "tickfont": {"size": 14}
             },
    "yaxis": {"title": {"text": "Stars", "font": {"size": 24}},
              "tickfont": {"size": 14}
             },
}

fig = {"data": data, "layout":my_layout}

offline.plot(fig, filename="python_repos.html")

'python_repos.html'
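
The two visualization cells differ only in the x values (plain names versus links), so the shared part could live in one function. A sketch under that assumption; `make_figure` is a hypothetical helper, the sample values are taken from the output above, and the nested `title`/`font` form for the axis titles is assumed to suit the installed Plotly version.

```python
def make_figure(x_values, stars, labels, title):
    """Return a Plotly figure dict for a bar chart (hypothetical helper)."""
    data = [{
        "type": "bar",
        "x": x_values,
        "y": stars,
        "hovertext": labels,  # text shown when hovering over a bar
        "marker": {
            "color": "rgb(60,100,150)",
            "line": {"width": 1.5, "color": "rgb(25,25,25)"},
        },
        "opacity": 0.6,
    }]
    layout = {
        "title": title,
        "xaxis": {"title": {"text": "Repository", "font": {"size": 24}},
                  "tickfont": {"size": 14}},
        "yaxis": {"title": {"text": "Stars", "font": {"size": 24}},
                  "tickfont": {"size": 14}},
    }
    return {"data": data, "layout": layout}


fig = make_figure(
    ["awesome-python"],
    [80826],
    ["vinta<br />A curated list of awesome Python frameworks"],
    "Most-Starred Python Projects on GitHub",
)
print(fig["layout"]["title"])
```

The returned dict can be passed to `offline.plot` exactly like the hand-built `fig`, once with `repo_names` and once with `repo_links` as `x_values`.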