# Project: Milestone 4

## Connecting to an API/Pulling in the Data and Cleaning/Formatting

***Instructions)***

Perform at least 5 data transformation and/or cleansing steps on your API data. The examples below are not required; they are just potential transformations you could do. If your data doesn't work for these scenarios, complete different transformations. You can do the same transformation multiple times if you need to clean your data. The goal is a clean dataset at the end of the milestone.

- Replace Headers
- Format data into a more readable format
- Identify outliers and bad data
- Find duplicates
- Fix casing or inconsistent values
- Conduct Fuzzy Matching

Make sure you clearly label each transformation (Step #1, Step #2, etc.) in your code and describe what it is doing in 1-2 sentences. You can submit a Jupyter Notebook or a PDF of your code. If you submit a .py file you need to also include a PDF or attachment of your results.


***Answer)***

**#0. Intro Work**

First, I must write my API key and secret to a JSON file.

In [1]:
import json
import pprint
import getpass

# defining the data (including the API Key and the Secret Key)
data = {
    "test_api_key": "---",
    "test_secret": "---"  # Adding the test_secret field
}

# specifying the JSON file name
filename = 'hc_test_api_key.json'

# writing the data to a JSON file
with open(filename, 'w') as file:
    json.dump(data, file, indent=4)




In [2]:
# This JSON file holds my API key and secret
with open('hc_test_api_key.json') as f:
    keys = json.load(f)
    test_key = keys['test_api_key']
    test_secret = keys['test_secret']
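
The load step above can be wrapped in a small helper that fails early when a field is missing or empty. This is only a sketch: the function name `load_credentials` and the throwaway demo file are assumptions for illustration, and the values written here are placeholders, not real credentials.

```python
import json
import os
import tempfile


def load_credentials(path):
    """Return (api_key, secret) from a JSON file, or raise if a field is absent or empty."""
    with open(path) as f:
        keys = json.load(f)
    missing = [name for name in ("test_api_key", "test_secret") if not keys.get(name)]
    if missing:
        raise KeyError(f"Missing credential field(s): {', '.join(missing)}")
    return keys["test_api_key"], keys["test_secret"]


# Demonstration with a temporary file and placeholder values (hypothetical data).
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo_keys.json")
    with open(demo_path, "w") as f:
        json.dump({"test_api_key": "demo-key", "test_secret": "demo-secret"}, f)
    demo_key, demo_secret = load_credentials(demo_path)

print(demo_key, demo_secret)
```

In the notebook itself, the same helper could be pointed at `hc_test_api_key.json` in place of the demo file.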

In [3]:
# api test url
test_url = 'https://api.housecanary.com/v2/property/test_addresses?'

These test addresses are used to make test calls to the API.

In [6]:
import requests

# request the list of test addresses
response = requests.get(test_url, auth=(test_key, test_secret))

test_addresses = response.json()

pprint.pprint(test_addresses)

[{'address': '517 N Chugach St', 'zipcode': '99645'},
 {'address': '12731 Schooner Dr', 'zipcode': '99515'},
 {'address': '87 Misty Forest Dr', 'zipcode': '36869'},
 {'address': '5212 Short Leaf Ln', 'zipcode': '35071'},
 {'address': '209 N Highway 45', 'zipcode': '72916'},
 {'address': '1207 Cardinal Rd', 'zipcode': '72401'},
 {'address': '10004 E Vogel Ave', 'zipcode': '85258'},
 {'address': '321 E Marlette Ave', 'zipcode': '85012'},
 {'address': '4010 Martis St', 'zipcode': '95691'},
 {'address': '2439 Russell St', 'zipcode': '94705'},
 {'address': '5647 Calgary St', 'zipcode': '80547'},
 {'address': '6823 Ridgeway Cir', 'zipcode': '80134'},
 {'address': '4 Northbridge', 'zipcode': '06416'},
 {'address': '66 Fairfield St', 'zipcode': '06515'},
 {'address': '3367 18th St NW', 'zipcode': '20010'},
 {'address': '4806 7th St NE', 'zipcode': '20017'},
 {'address': '18 Old Fence Ln', 'zipcode': '19702'},
 {'address': '2846 Leipsic Rd', 'zipcode': '19901'},
 {'address': '8691 Flowersong Cv

In [7]:
# Now that we've retrieved some test addresses, we can use one of them
# to test out the HouseCanary API endpoints.
# Let's try one of the Analytics API endpoints.

# we'll just take the first address from the list.
sample_address = test_addresses[0]

url = 'https://api.housecanary.com/v2/property/details'

params = {'address': sample_address['address'],
          'zipcode': sample_address['zipcode']}

response = requests.get(url, params=params, auth=(test_key, test_secret))
response_output = response.json()

# We can see what the url looks like
print(response.url)
pprint.pprint(response_output)

https://api.housecanary.com/v2/property/details?address=517+N+Chugach+St&zipcode=99645
[{'address_info': {'address': '517 N Chugach St',
                   'address_full': '517 N Chugach St Palmer AK 99645',
                   'block_id': '021700012011002',
                   'blockgroup_id': '021700012011',
                   'city': 'Palmer',
                   'county_fips': '02170',
                   'geo_precision': 'rooftop',
                   'lat': 61.6128922,
                   'lng': -149.1104737,
                   'metrodiv': None,
                   'msa': '11260',
                   'slug': '517-N-Chugach-St-Palmer-AK-99645',
                   'state': 'AK',
                   'status': {'changes': ['Locality (city, municipality) added '
                                          'or changed',
                                          'State added or changed'],
                              'details': ['Address fully verified',
                                          

In [8]:
# Let's try the Value Report API.

url = 'https://api.housecanary.com/v2/property/value_report'

params = {'address': sample_address['address'],
          'zipcode': sample_address['zipcode'],
          'format': 'json'}

response = requests.get(url, params=params, auth=(test_key, test_secret))

print(response.url)
pprint.pprint(response.json())

# use response.json() to get the Value Report json. The content is too long to display here.

https://api.housecanary.com/v2/property/value_report?address=517+N+Chugach+St&zipcode=99645&format=json
{'active_listings': [{'active_days_on_market': None,
                      'address_id': 2865554,
                      'address_slug': '9830-E-Tern-Dr-Palmer-AK-99645',
                      'age': 40,
                      'apn': '1631B03L048',
                      'basement': False,
                      'bathrooms': 2.0,
                      'bedrooms': 3,
                      'bg_id': '021700011003',
                      'cerberus_id': '502446d5c588cc3b',
                      'city': 'Palmer',
                      'comp': 4,
                      'construction_type': 'Wood Siding',
                      'county_fips': '02170',
                      'cumulative_days_on_market': None,
                      'current_value': 415742,
                      'days_on_market': 23.0,
                      'distance_miles': 3.0620296887918155,
                      'flips': None,
   

In [9]:
# Finally, let's try the Rental Report API.

url = 'https://api.housecanary.com/v2/property/rental_report'

params = {'address': sample_address['address'],
          'zipcode': sample_address['zipcode'],
          'format': 'json'}

response = requests.get(url, params=params, auth=(test_key, test_secret))

print(response.url)
pprint.pprint(response.json())

# use response.json() to get the Rental Report json. The content is too long to display here.

https://api.housecanary.com/v2/property/rental_report?address=517+N+Chugach+St&zipcode=99645&format=json
{'active_listings': [{'active_days_on_market': None,
                      'address_id': 311013903,
                      'address_slug': '1735-N-Fanciful-Pl-Unit-A-Wasilla-AK-99654',
                      'age': None,
                      'apn': '',
                      'basement': None,
                      'bathrooms': 1.5,
                      'bedrooms': 2,
                      'bg_id': '021700008002',
                      'cerberus_id': '3c09f0dc72a6e640',
                      'city': 'Wasilla',
                      'comp': 9,
                      'construction_type': 'Unknown',
                      'county_fips': '02170',
                      'cumulative_days_on_market': None,
                      'current_value': 247491,
                      'days_on_market': 31,
                      'distance_miles': 11.754431225625051,
                      'flips': None,
   

In [11]:
# Now let's try our actual intended call: MSA Batch Gross Rental Yield

url = 'https://api.housecanary.com/v2/msa/hcri'

# Note: a dict cannot hold two 'msa' keys, so only the last one ('29820') is sent.
params = {'msa': '38060',
          'msa': '29820',
          'format': 'json'}

response = requests.get(url, params=params, auth=(test_key, test_secret))

print(response.url)
pprint.pprint(response.json())


https://api.housecanary.com/v2/msa/hcri?msa=29820&format=json
{'message': 'Parameter not allowed in query string: format'}
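
The URL printed above contains only `msa=29820` because a Python dict keeps a single value per key, so the second `'msa'` entry replaces the first. The sketch below previews the query string without calling the API. It uses `urllib.parse.urlencode` from the standard library as a stand-in, on the assumption that it encodes these simple inputs the same way `requests` does.

```python
from urllib.parse import urlencode

# Duplicate keys in a dict literal: the last value silently wins.
dict_params = {'msa': '38060',
               'msa': '29820'}
single_query = urlencode(dict_params)

# A list of (key, value) tuples preserves every pair.
tuple_params = [('msa', '35620'), ('msa', '31080')]
repeated_query = urlencode(tuple_params)

# A dict with a list value also works when doseq=True expands the list.
list_query = urlencode({'msa': ['35620', '31080']}, doseq=True)

print(single_query)    # msa=29820
print(repeated_query)  # msa=35620&msa=31080
print(list_query)      # msa=35620&msa=31080
```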


In [13]:
# Now let's retry our actual intended call: MSA Batch Gross Rental Yield
url = 'https://api.housecanary.com/v2/msa/hcri'

# Assuming the API allows multiple 'msa' parameters
params = [('msa', '35620'), ('msa', '31080')]

response = requests.get(url, params=params, auth=(test_key, test_secret))

print(response.url)
pprint.pprint(response.json())

https://api.housecanary.com/v2/msa/hcri?msa=35620&msa=31080
{'code': 403,
 'message': 'This is an invalid MSA.  Test API keys will only work with MSA '
            'IDs available from this endpoint: '
            'https://api.housecanary.com/v2/msa/test_msas'}
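
Both failed calls above returned a JSON object with a top-level `message` field, so a small check can flag such payloads before any cleaning starts. This is a rough sketch: `describe_api_error` is a hypothetical helper, and it assumes that successful payloads do not carry a top-level `message` key, which the outputs shown here do not confirm.

```python
def describe_api_error(payload):
    """Return a readable error string if the payload looks like an API error, else None."""
    if isinstance(payload, dict) and "message" in payload:
        code = payload.get("code", "unknown")
        return f"API error (code {code}): {payload['message']}"
    return None


# Payloads copied (and shortened) from the two failed calls above.
format_error = {'message': 'Parameter not allowed in query string: format'}
msa_error = {'code': 403, 'message': 'This is an invalid MSA.'}

# A list payload, like the test-address response, is not treated as an error.
address_payload = [{'address': '517 N Chugach St', 'zipcode': '99645'}]

print(describe_api_error(format_error))
print(describe_api_error(msa_error))
print(describe_api_error(address_payload))
```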


**#1. Remove 'MSA' from the primary Name Column**
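
A minimal sketch of this step, assuming the name column holds strings with a standalone "MSA" token. The sample values and the helper name `strip_msa_token` are made up for illustration, since the real column contents are not shown. The same pattern could be applied to a DataFrame column with pandas `Series.str.replace(..., regex=True)`.

```python
import re


def strip_msa_token(name):
    """Remove the standalone word 'MSA' (any casing) and tidy leftover whitespace."""
    without_token = re.sub(r"\bMSA\b", "", name, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", without_token).strip()


# Hypothetical sample values.
sample_names = ["Las Vegas-Henderson-Paradise, NV MSA",
                "MSA Phoenix-Mesa-Scottsdale, AZ"]
names_without_msa = [strip_msa_token(name) for name in sample_names]

print(names_without_msa)
```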

**#2. Remove integers from the primary Name Column**
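
One way this step might look, assuming the numbers to remove are digit runs such as an MSA ID embedded in the name. The sample strings and the helper name `strip_digits` are hypothetical.

```python
import re


def strip_digits(name):
    """Delete every run of digits, then collapse the whitespace left behind."""
    without_digits = re.sub(r"\d+", "", name)
    return re.sub(r"\s+", " ", without_digits).strip()


# Hypothetical sample values with an ID before or after the name.
numbered_names = ["38060 Phoenix-Mesa-Scottsdale, AZ",
                  "Las Vegas-Henderson-Paradise, NV 29820"]
names_without_digits = [strip_digits(name) for name in numbered_names]

print(names_without_digits)
```

Note that this would also strip digits that legitimately belong to a name, so it is worth checking the column for such cases first.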

**#3. Clean up the Column Names**
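
A sketch of one common convention for this step: lower-case snake_case headers. The raw header strings below are invented examples, and `clean_column_name` is a hypothetical helper; the actual column names in the dataset may differ.

```python
import re


def clean_column_name(raw):
    """Lower-case a header and replace runs of non-alphanumeric characters with '_'."""
    lowered = raw.strip().lower()
    snake = re.sub(r"[^a-z0-9]+", "_", lowered)
    return snake.strip("_")


# Hypothetical raw headers.
raw_columns = [" Metropolitan Area ", "Gross Rental Yield (%)", "MSA-ID"]
clean_columns = [clean_column_name(col) for col in raw_columns]

print(clean_columns)
```

With pandas, the cleaned list could be assigned back via `df.columns = clean_columns`.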

**#4. Perform fuzzy matching to match Metropolitan Area to the official Census MSA Name**
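
A minimal sketch of this step using `difflib.get_close_matches` from the standard library. The source does not say which fuzzy-matching tool is intended, so `difflib` is a stand-in here. The `reference_names` list is a small made-up substitute for the official Census list, and the 0.6 cutoff is simply the `difflib` default.

```python
import difflib


def match_to_reference(raw_name, reference_names, cutoff=0.6):
    """Return the closest reference name, or None if nothing clears the cutoff."""
    # Compare in lower case, but return the reference name in its original casing.
    lookup = {name.lower(): name for name in reference_names}
    hits = difflib.get_close_matches(raw_name.lower(), list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None


# Hypothetical reference list standing in for the official names.
reference_names = ["Las Vegas-Henderson-Paradise, NV",
                   "Phoenix-Mesa-Scottsdale, AZ"]

vegas_match = match_to_reference("Las Vegas Henderson Paradise NV", reference_names)
phoenix_match = match_to_reference("phoenix mesa scottsdale az", reference_names)
no_match = match_to_reference("Qqqq", reference_names)

print(vegas_match)
print(phoenix_match)
print(no_match)
```

Rows that come back as `None` are worth reviewing by hand instead of forcing a match with a lower cutoff.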