 # Week 12: Final Project
 File: DSC540_Paulovici_Final_Porject.py (.ipynb)<br>
 Name: Kevin Paulovici<br>
 Date: 2/29/2020<br>
 Course: DSC 540 Data Preparation (2203-1)<br>
 Assignment: Final Project

 ## Project Outline
 During the course, you will be working on a term project to either pull data from an API or scrape a webpage. You will need to select either an API (different than Twitter) or a Webpage and create a process in Python that will extract data into a formatted dataset. <br><br>
 Part 1: Your formatted dataset with at least 15-20 variables (if the API or Webpage you selected doesn’t have that many fields available on it, you will want to search again, or do multiple!) <br> <br>
 Part 2: Your code or screenshots of your code outlining the steps and process you had to take to pull data from the API or web page and the steps you took to format the data.<br><br>
 Part 3: 2 Data Transformation/Clean-up Steps (can be any that we learned in class)<br><br>
 Part 4: A 250-word paper summarizing your steps and any challenges you ran into during the project. Discuss the importance and relevance of this type of process if you were a data scientist. How often do you think you would have to do this to get the data you need?

 ### Overview
 I'll be working with the https://openweathermap.org/api for this project. <br><br>
 I'll make the assumption that the user will use a zip code from the US to request weather data. Some checks will ensure a valid zip code is provided. <br><br>
 Select variables will be collected from the api, data will be cleaned and transformed into a more usable format (e.g., csv/json/excel)

 ### Part 1

In [1]:
# a) Start by requesting a zip code from the user.
# The zip code is kept as a string so leading zeros (e.g., 02134) are preserved;
# converting it to an int would turn 02134 into 2134.
import re

user_val = ""
while True:
    user_val = input("Welcome to the Weather App\nEnter a zip code to get started:").strip()

    # a valid US zip code is exactly five ASCII digits (this also rejects values like -1234)
    if re.fullmatch(r"[0-9]{5}", user_val):
        break

    print("Value entered is not a valid zip code")

print("The zip code to be used is: {}".format(user_val))


Welcome to the Weather App
Enter a zip code to get started:12302
The zip code to be used is: 12302
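The check in step a can also be written as a small function that doesn't call `input()`, which makes it easy to test on its own. This is a sketch, with `is_valid_zip` as a hypothetical helper name: it checks only the five-digit format, not whether the zip code exists, and it keeps the value as a string so leading zeros such as `02134` are preserved.

```python
import re


def is_valid_zip(value):
    """Return True if value looks like a five-digit US zip code.

    Only the format is checked; whether the zip code exists is left to the API.
    """
    # [0-9] rather than \d so that non-ASCII digits are rejected
    return re.fullmatch(r"[0-9]{5}", value.strip()) is not None


print(is_valid_zip("12302"))  # True
print(is_valid_zip("02134"))  # True
print(is_valid_zip("1230"))   # False
print(is_valid_zip("-1234"))  # False
```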


In [2]:
# b) Before we request data from the API, we'll import our key and set some parameters for the request.
# The units and country follow from our assumption (a US zip code) and are used to build the URL.
unit = 'imperial' # units for data (imperial: temperature in F, wind speed in miles/hr)
country = 'US' # country
zipFormat = 'http://api.openweathermap.org/data/2.5/weather?zip='

# to protect the API key we'll pull it from another file
from key import getKey
APIKEY = getKey()

# print only the length, which confirms the key loaded without exposing it
print(len(APIKEY))


39
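The key and parameters set in step b could also be combined into the request URL with the standard library's `urllib.parse.urlencode`, which percent-encodes each value (the comma in `12302,US` becomes `%2C`) and inserts the `&` separators, so the raw key can be passed as its own parameter. This is a sketch under assumptions: the environment variable `OWM_API_KEY`, its `"demo-key"` fallback, and `build_weather_url` are hypothetical names, not part of the original project.

```python
import os
from urllib.parse import urlencode

BASE_URL = "http://api.openweathermap.org/data/2.5/weather"


def build_weather_url(zip_code, api_key, country="US", units="imperial"):
    """Return a current-weather request URL for a zip code (sketch)."""
    params = {
        "zip": "{},{}".format(zip_code, country),  # e.g. "12302,US"
        "appid": api_key,                           # the raw key, with no "&appid=" prefix
        "units": units,
    }
    return "{}?{}".format(BASE_URL, urlencode(params))


# Hypothetical: read the raw key from an environment variable instead of key.py
api_key = os.environ.get("OWM_API_KEY", "demo-key")
print(build_weather_url("12302", api_key))
```

With `requests`, passing the same dict as `requests.get(BASE_URL, params=params)` applies the same kind of encoding, so the URL never has to be assembled by hand.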


In [3]:
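# c) Use the zip code to request data from the API

import requests
from pprint import pprint

# build the url 
# api.openweathermap.org/data/2.5/weather?zip={zip code},{country code}&appid={your api key}
# APIKEY is appended right after the country code, so it must already start with "&appid="
url = '{}{},{}{}&units={}'.format(zipFormat, user_val, country, APIKEY, unit)

try:
    response = requests.get(url, timeout=10)
    # raise an error for 4xx/5xx responses (e.g., an unknown zip code or an invalid key)
    response.raise_for_status()
    data = response.json()
    pprint(data)
except (requests.exceptions.RequestException, ValueError) as err:
    # RequestException covers connection, timeout, and HTTP errors; ValueError covers invalid JSON
    print("Failed to retrieve weather data: {}".format(err))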


{'base': 'stations',
 'clouds': {'all': 40},
 'cod': 200,
 'coord': {'lat': 42.88, 'lon': -73.99},
 'dt': 1583079713,
 'id': 0,
 'main': {'feels_like': 12.02,
          'humidity': 53,
          'pressure': 1020,
          'temp': 26.15,
          'temp_max': 30,
          'temp_min': 21},
 'name': 'Schenectady',
 'sys': {'country': 'US',
         'id': 5782,
         'sunrise': 1583062273,
         'sunset': 1583102731,
         'type': 1},
 'timezone': -18000,
 'visibility': 24140,
 'weather': [{'description': 'scattered clouds',
              'icon': '03d',
              'id': 802,
              'main': 'Clouds'}],
 'wind': {'deg': 280, 'gust': 18.34, 'speed': 14.99}}
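The response is nested JSON: dicts inside dicts, plus a list under `weather`. Step d picks keys out by hand; a generic alternative is to flatten the whole response into dotted column names and choose the columns afterwards. This is a sketch with a hypothetical `flatten` helper; list items are indexed by position (so the description becomes `weather.0.description`), and empty dicts or lists are dropped.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts/lists into a single dict with dotted keys."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)
    else:
        # a plain value: store it under the key built so far
        return {prefix: obj}

    flat = {}
    for key, value in items:
        new_key = "{}.{}".format(prefix, key) if prefix else str(key)
        flat.update(flatten(value, new_key))
    return flat


# a subset of the response above
sample = {
    "coord": {"lat": 42.88, "lon": -73.99},
    "weather": [{"description": "scattered clouds", "id": 802}],
    "name": "Schenectady",
}
print(flatten(sample))
# prints: {'coord.lat': 42.88, 'coord.lon': -73.99, 'weather.0.description': 'scattered clouds',
#          'weather.0.id': 802, 'name': 'Schenectady'}
```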


In [4]:
# d) Next we'll need to parse the data into a format that is usable. 
# Additionally, we don't need all the values returned. 
# As we go, we can clean and transform the data (e.g., convert numbers to floats and 
# convert the sunrise/sunset Unix timestamps to readable date and time strings).

from datetime import datetime, timezone

# These lists hold the dict keys for the data we retrieved from the API
coord = ["lat", "lon"]
main = ["feels_like", "humidity", "pressure", "temp", "temp_max", "temp_min"]
name = ["name"]
sys = ["country", "id", "sunrise", "sunset"]
weather = ["description"]
wind = ["speed"]
data_dicts = {"coord":coord, "main":main, "name":name, "sys":sys, "weather":weather, "wind":wind}

# We'll collect the selected data into a dictionary, keyed first by the API names
select_data = {} 

for d in data_dicts.keys():
    for item in data_dicts[d]:
        if d in ("coord", "main"):
            # numeric values are stored as floats
            select_data[item] = float(data[d][item])
        elif d == "name":
            # "name" is a top-level string, not a nested dict
            select_data[item] = data[d]
        elif d == "sys":
            val = data[d][item]

            # fix the time stamp: split the Unix time into a date and a time (both in UTC)
            if item in ("sunrise", "sunset"):
                date_time = datetime.fromtimestamp(val, tz=timezone.utc).strftime('%Y-%m-%d %H:%M:%S').split()
                select_data["Date"] = date_time[0]
                val = date_time[1]

            select_data[item] = val
        elif d == "weather":
            # "weather" is a list of conditions; use the first (primary) one
            select_data[item] = data[d][0][item]
        elif d == "wind":
            select_data[item] = data[d][item]

pprint(select_data)


{'Date': '2020-03-01',
 'country': 'US',
 'description': 'scattered clouds',
 'feels_like': 12.02,
 'humidity': 53.0,
 'id': 5782,
 'lat': 42.88,
 'lon': -73.99,
 'name': 'Schenectady',
 'pressure': 1020.0,
 'speed': 14.99,
 'sunrise': '11:31:13',
 'sunset': '22:45:31',
 'temp': 26.15,
 'temp_max': 30.0,
 'temp_min': 21.0}
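The timestamps in step d are converted as UTC, so the sunrise of `11:31:13` above is 06:31:13 local time in Schenectady. The response also includes a `timezone` field, the shift from UTC in seconds (-18000 here, i.e. UTC-5), which can be used to convert to local time instead. A minimal sketch; `to_local_time` is a hypothetical helper, and the timestamps and offset are taken from the response above.

```python
from datetime import datetime, timedelta, timezone


def to_local_time(unix_ts, offset_seconds):
    """Convert a Unix timestamp to a local 'YYYY-MM-DD HH:MM:SS' string.

    offset_seconds is the API's "timezone" field (shift from UTC in seconds).
    """
    local_tz = timezone(timedelta(seconds=offset_seconds))
    return datetime.fromtimestamp(unix_ts, tz=local_tz).strftime("%Y-%m-%d %H:%M:%S")


# sunrise, sunset, and timezone values from the response above
print(to_local_time(1583062273, -18000))  # 2020-03-01 06:31:13
print(to_local_time(1583102731, -18000))  # 2020-03-01 17:45:31
```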


In [5]:
# e) Lastly, we'll create better headers and save the data. 

# These are the headers we'll want at the end, in output order
cols = ["City", "Country", "Country_id", "Sunrise", "Sunset", 
        "Pressure_(hPa)", "Temperature_Feels_like_(F)", "Temperature_(F)", 
        "Min_Temperature_(F)", "Max_Temperature_(F)", "Humidity_(%)", 
        "Weather_Description", "Wind_Speed_(miles/hr)", "Latitude", "Longitude"]

# mapping of new headers to the API keys used in select_data
# note: "Country_id" comes from sys.id, which OpenWeatherMap documents as an internal parameter
# note: Sunrise/Sunset hold UTC times (see step d)
col_dict = {"City": "name",
            "Country": "country",
            "Country_id": "id",
            "Sunrise": "sunrise",
            "Sunset": "sunset",
            "Pressure_(hPa)": "pressure",
            "Temperature_Feels_like_(F)": "feels_like",
            "Temperature_(F)": "temp",
            "Min_Temperature_(F)": "temp_min",
            "Max_Temperature_(F)": "temp_max",
            "Humidity_(%)": "humidity",
            "Weather_Description": "description",
            "Wind_Speed_(miles/hr)": "speed",
            "Latitude": "lat",
            "Longitude": "lon"}

# new dict with the descriptive headers, built in the order given by cols
clean_data = {}

for k in cols:
    clean_data[k] = select_data[col_dict[k]]

pprint(clean_data)

# save the clean data, one (header, value) pair per row
import csv

with open("weather_data.csv", "w", newline="") as fOut:
    writer = csv.writer(fOut)
    for k, v in clean_data.items():
        writer.writerow([k, v])


{'City': 'Schenectady',
 'Country': 'US',
 'Country_id': 5782,
 'Humidity_(%)': 53.0,
 'Latitude': 42.88,
 'Longitude': -73.99,
 'Max_Temperature_(F)': 30.0,
 'Min_Temperature_(F)': 21.0,
 'Pressure_(hPa)': 1020.0,
 'Sunrise': '11:31:13',
 'Sunset': '22:45:31',
 'Temperature_(F)': 26.15,
 'Temperature_Feels_like_(F)': 12.02,
 'Weather_Description': 'scattered clouds',
 'Wind_Speed_(miles/hr)': 14.99}
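Step e writes one `header, value` pair per row. If the process is run for several zip codes, a wide layout (one header row, then one row per request) is easier to load into a spreadsheet or pandas. This is a sketch using `csv.DictWriter`; the `append_row` helper and the idea of appending repeated runs to one file are assumptions, not part of the original project.

```python
import csv
import io


def append_row(f, record, fieldnames):
    """Write record as one CSV row, adding the header row first if the file is empty."""
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    if f.tell() == 0:
        # nothing written yet: start with the header row
        writer.writeheader()
    writer.writerow(record)


fields = ["City", "Temperature_(F)", "Humidity_(%)"]
buffer = io.StringIO()  # stands in for open("weather_data.csv", "a", newline="")
append_row(buffer, {"City": "Schenectady", "Temperature_(F)": 26.15, "Humidity_(%)": 53.0}, fields)
print(buffer.getvalue())
```

With the `clean_data` dict and `cols` list from step e, the call would be `append_row(fOut, clean_data, cols)` on a file opened in append mode, so each new zip code adds one row under the same headers.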


 ### Part 2
 Part 1, steps a through e, walks through the process of retrieving data from the weather API based on a zip code. The data cleaning and transformations are included in these steps and are further explained in Parts 3 and 4.

 ### Part 3
 The two data transformations were 1) converting the sunrise/sunset Unix timestamps into separate date and time strings in Part 1 step d, and 2) renaming the API keys to descriptive headers with units in Part 1 step e.

 ### Part 4
 See DSC540_Paulovici_Final_Project.docx for a summary of this project.