## Workshop: Getting Data from an API
### DSEA: Data Science & Education Association
### Teachers College, Columbia University

**Author:** Nicolás Dussaillant (MS in Learning Analytics)

## Preparation

In [36]:
import requests
import pandas as pd

## College Scorecard
Info:
* [Main Website](https://collegescorecard.ed.gov/): Information about this dataset from the US Department of Education.
* [API Documentation](https://collegescorecard.ed.gov/data/documentation/): Data dictionary, technical documentation, API documentation, and the registration form for an API key.
* [HTTP-specific API Documentation](https://github.com/RTICWDT/open-data-maker/blob/master/API.md): Details on how to build queries for this API.

Read the API key stored in the file "collegescoreboard_api_key.txt".

In [21]:
# Open the file; the with block closes it automatically
with open('collegescoreboard_api_key.txt', 'r') as fd:
    # Read the key from the first line and drop the trailing newline
    cs_api_key = fd.readline().strip()
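
A trailing newline or a missing key file is a common source of failed requests. The sketch below is one possible way to make the loading step more defensive. The helper name `load_api_key` and the environment variable `COLLEGE_SCORECARD_API_KEY` are assumptions made for illustration; they are not part of the original workshop.

```python
import os
from pathlib import Path


def load_api_key(path="collegescoreboard_api_key.txt",
                 env_var="COLLEGE_SCORECARD_API_KEY"):
    """Return the API key from `path`, falling back to an environment variable.

    Both the function name and the environment variable name are hypothetical.
    """
    key_file = Path(path)
    if key_file.exists():
        # Keep only the first line and strip whitespace and newlines
        lines = key_file.read_text().splitlines()
        if lines and lines[0].strip():
            return lines[0].strip()

    # Fall back to the environment variable if the file is missing or empty
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"No API key found in {path!r} or ${env_var}")
    return key


# Usage (commented out so the cell does not fail when no key is present):
# cs_api_key = load_api_key()
```

Keeping the key in a separate file or environment variable, rather than in the notebook itself, means the notebook can be shared without exposing the key.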

## Raw request

Build the request URL as a single string, then send the request.

In [41]:
base_url = 'https://api.data.gov/ed/collegescorecard/v1/schools.json'

# Append the query string (filter, fields to return, and API key) to the base URL
test_url = f'{base_url}?school.degrees_awarded.predominant=frog&_fields=id,school.name,wombat&api_key={cs_api_key}'

# Send the GET request and show the raw response body (bytes)
x = requests.get(test_url)
x.content

b'{"metadata":{"page":0,"total":507,"per_page":20},"results":[{"school.name":"Stockton Christian Life College","id":112093},{"school.name":"HSHS St. John\'s Hospital School of Clinical Laboratory Science","id":364122},{"school.name":"Troy University-Phenix City Campus","id":10236801},{"school.name":"Troy University-Montgomery Campus","id":10236802},{"school.name":"Troy University-Dothan Campus","id":10236803},{"school.name":"Troy University-Online","id":10236808},{"school.name":"Troy University-Support Sites","id":10236809},{"school.name":"Arkansas College of Barbering and Hair Design","id":10635101},{"school.name":"Harding School of Theology","id":10704401},{"school.name":"Career Academy of Hair Design-Siloam Springs","id":10722001},{"school.name":"Career Academy of Hair Design-Rogers","id":10722002},{"school.name":"Career Academy of Hair Design-Fayetteville","id":10722003},{"school.name":"Career Academy of Hair Design - Fort Smith","id":10722004},{"school.name":"University of Arkansa
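
Concatenating the URL by hand works, but it is easy to misplace a `?` or `&`. As a minimal sketch, the same URL can be assembled from a dictionary of parameters with the standard library. The helper `build_url` and the placeholder key `YOUR_API_KEY` are assumptions for illustration; the parameter names and values are copied from the request above.

```python
from urllib.parse import urlencode

base_url = "https://api.data.gov/ed/collegescorecard/v1/schools.json"


def build_url(base, params):
    """Join a base URL and a dict of query parameters (hypothetical helper)."""
    # safe="," keeps the commas in `_fields` readable instead of encoding them as %2C
    return f"{base}?{urlencode(params, safe=',')}"


params = {
    "school.degrees_awarded.predominant": "frog",  # filter, as in the request above
    "_fields": "id,school.name,wombat",            # fields to return
    "api_key": "YOUR_API_KEY",                     # placeholder, not a real key
}

url = build_url(base_url, params)
print(url)
```

`requests` can also do this step itself: `requests.get(base_url, params=params)` encodes the dictionary into the query string before sending the request.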

In [35]:
# Parse the JSON body into a Python dict, which is easier to read and work with
obj = x.json()
obj

{'metadata': {'page': 0, 'total': 507, 'per_page': 20},
 'results': [{'school.name': 'Stockton Christian Life College', 'id': 112093},
  {'school.name': "HSHS St. John's Hospital School of Clinical Laboratory Science",
   'id': 364122},
  {'school.name': 'Troy University-Phenix City Campus', 'id': 10236801},
  {'school.name': 'Troy University-Montgomery Campus', 'id': 10236802},
  {'school.name': 'Troy University-Dothan Campus', 'id': 10236803},
  {'school.name': 'Troy University-Online', 'id': 10236808},
  {'school.name': 'Troy University-Support Sites', 'id': 10236809},
  {'school.name': 'Arkansas College of Barbering and Hair Design',
   'id': 10635101},
  {'school.name': 'Harding School of Theology', 'id': 10704401},
  {'school.name': 'Career Academy of Hair Design-Siloam Springs',
   'id': 10722001},
  {'school.name': 'Career Academy of Hair Design-Rogers', 'id': 10722002},
  {'school.name': 'Career Academy of Hair Design-Fayetteville',
   'id': 10722003},
  {'school.name': 'Caree
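
The `metadata` entry shows that this response holds only the first page: 507 schools match, but each page carries 20 of them. As a rough sketch, the number of pages needed to collect everything can be worked out from those three values. The helper `page_count` is hypothetical, and the zero-based page numbering is inferred from `'page': 0` in the output above; how to request a specific page is described in the linked API documentation.

```python
import math

# Values copied from the 'metadata' entry of the response above
metadata = {"page": 0, "total": 507, "per_page": 20}


def page_count(meta):
    """Number of pages needed to cover all results (hypothetical helper)."""
    return math.ceil(meta["total"] / meta["per_page"])


n_pages = page_count(metadata)

# Pages still to be fetched after the current one, assuming zero-based numbering
remaining_pages = list(range(metadata["page"] + 1, n_pages))

print(n_pages)               # 26
print(len(remaining_pages))  # 25
```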

In [40]:
# Load the list of results into a pandas DataFrame
df = pd.DataFrame(obj["results"])
df

| | school.name | id |
|---|---|---|
| 0 | Stockton Christian Life College | 112093 |
| 1 | HSHS St. John's Hospital School of Clinical La... | 364122 |
| 2 | Troy University-Phenix City Campus | 10236801 |
| 3 | Troy University-Montgomery Campus | 10236802 |
| 4 | Troy University-Dothan Campus | 10236803 |
| 5 | Troy University-Online | 10236808 |
| 6 | Troy University-Support Sites | 10236809 |
| 7 | Arkansas College of Barbering and Hair Design | 10635101 |
| 8 | Harding School of Theology | 10704401 |
| 9 | Career Academy of Hair Design-Siloam Springs | 10722001 |


In [42]:
# Save it as CSV; index=False leaves out the row-number column
df.to_csv("college_scoreboard.csv", index=False)
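
By default, `to_csv` writes the row index as an extra, unnamed first column, which comes back as `Unnamed: 0` when the file is read again. The sketch below illustrates the difference using two rows taken from the results above and an in-memory buffer in place of a real file; the variable names are chosen here for illustration.

```python
import io

import pandas as pd

# Two rows taken from the results above
df = pd.DataFrame([
    {"school.name": "Stockton Christian Life College", "id": 112093},
    {"school.name": "Troy University-Online", "id": 10236808},
])

# Default behaviour: the index is written as an extra unnamed column
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
cols_with_index = list(pd.read_csv(with_index).columns)

# index=False: only the data columns are written
without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = list(pd.read_csv(without_index).columns)

print(cols_with_index)
print(cols_without_index)
```

Passing `index=False` is usually the simpler choice when the index is just a row counter, as it is here.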