**Copyright: © NexStream Technical Education, LLC**.  
All rights reserved


# USGS Earthquake Scraper Introduction
In this project, you will create a 'web scraper' to access and retrieve real-time data from the US Geological Service (USGS) reflecting the latest active earthquakes around the world which are equal or above a user input magnitude.

The data is in JSON format so you'll need to convert the output into a user-readable (friendly) format.

The feed is from the USGS database here:  https://earthquake.usgs.gov/earthquakes/feed/.  You should become familiar with this site.

The format of the feed summary is here: https://earthquake.usgs.gov/earthquakes/feed/v1.0/geojson.php.  You should become familiar with the fields for the JSON data.  

Note you can use a JSON viewer for a more readable format of the data.  






# Part 1a:  Setup the environment and script and prompt the user for input.
Setup the script imports and prompt the user for the magnitude from which the USGS data will be accessed.  That is, any earthquake greater than or equal to the input magnitude will be retrieved from the database.  
You'll need to import the urllib.request library to get to the web site.
You also can input the json library to utilize the functions in that library.
Check out both API's for reference.


In [1]:
#Import the urllib.request and json libraries

import urllib.request
import json
import csv

from google.colab import drive
drive.mount('/content/drive')
%cd /content/drive/MyDrive/ML_S23-Mauro/USGS-Web-Scraper 

#Prompt the user to input a magnitude parameter of type floating point.  
#Limit the range that user can input to realistic magnitudes (check the magnitude entered and if it doesn't fall within a range, print out a message and prompt again.)
#Provide a prompt to the user to end the program or input another magnitude number (this code can be in a later cell).

global minMag
minMag = minMag = float(input("Enter a minimum earthquake magnitude to check: "))

# def getMinMag():
#   minMag = float(input("Enter a minimum earthquake magnitude to check: "))
#   print(minMag)
#   while(minMag < 2.5 or minMag > 10.0):
#       getMinMag()

# def getMaxMag():
#   maxMag = input("Enter a maximum earthquake magnitude to check (OPTIONAL): ")

#   if(maxMag == ''):
#     maxMag = 10.0

#   while((maxMag < '2.5' or maxMag > '10.0') and (minMag != maxMag)):
#       getMaxMag()

# getMinMag()
# getMaxMag()

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
/content/drive/MyDrive/ML_S23-Mauro/USGS-Web-Scraper
Enter a minimum earthquake magnitude to check: 4


# Part 1b:  Write the printResults function.  
In this function, you should print the output of the data you retrieved from the site:  http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_day.geojson      
See the code comments for guided instruction.


Note you can use a JSON viewer for a more readable format of the data if you want to view it before processing it with your function.



In [2]:
#Function printResults(data)
#In Python 3.x we need to explicitly decode the response to a string 
#i.e. data is output from data.decode("utf-8") 

def printResults(data):
  data.decode('utf-8')

  # 1.  Use the json "loads" api  to load the string data into a dictionary
  dictonary = json.loads(data)
  
  # 2.  Access the contents of the JSON data
  #     and print out the metadata title
  print(dictonary['metadata']['title'])
  
  #3.  Output the number of events
  print(dictonary['metadata']['count'])
  
  #4.  For each event, print the place where it occurred
  for i in len(dictonary['features']):
    print(dictonary[i]['properties']['place'])

  print("\n")
  #5 For each event, if the magnitude is greater than the user input
  #  print both the magnitude and the place it occurred. 
  #  HINT: use the "title" field that each feature has.
  for j in len(dictonary['features']):
    info = dictonary[j]['properties']
    if(info['mag'] >= minMag): # and info['mag'] <= maxMag
      print(info['title'])

      


# Part 1c:  Write the runner
In this code (either main or in a function), you should setup the URL from the USGS site, open the URL and read the data, call the printResults function.
http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_day.geojson  
See the code comments for guided instruction.  
 
Note you can use a JSON viewer for a more readable format of the data if you want to view it before processing it with your function.

In [6]:
# Define a variable to hold the source URL (see the notes for the URL)
# This feed lists all earthquakes for the last day larger than Mag 2.5 (this is your minimum input)
url = 'http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_day.geojson'

  
# Open the URL and read the data
# See urllib.request.urlopen API

# # Print the HTTP status code of the response (200 is a valid response)
# # See urllib.request.urlopen API


# # If the HTTP status code of the response is valid (hint: 200) 
# #    then read the data (hint: .read API) and convert to a string (hint: .decode("utf-8") API), 
# #    and print the results using your printResults function from step 1b
# # Make sure your code handles an error condition (i.e. non-valid status code) 
# #    and print out the error code in that case.
# ####Your code here....

try:
  data = urllib.request.urlopen(url, timeout=10)

  error_code = data.status
  print(data)

  if(error_code == 200):
    data = str(data.read().decode("utf-8"))
    
    printResults2(data)

except urllib.request.HTTPError as HTTPError:
        print(HTTPError.status, HTTPError.reason)
except urllib.request.URLError as URLError:
        print(URLError.reason)
except TimeoutError:
        print("Request timed out")

<http.client.HTTPResponse object at 0x7fe8c69d1d60>
Appending collumn to csv file
Appending collumn to csv file
Appending collumn to csv file
Appending collumn to csv file
Appending collumn to csv file
Appending collumn to csv file
Appending collumn to csv file
Appending collumn to csv file
Appending collumn to csv file
Appending collumn to csv file
Appending collumn to csv file
Appending collumn to csv file
Appending collumn to csv file
Appending collumn to csv file
Appending collumn to csv file
Appending collumn to csv file
Appending collumn to csv file
Appending collumn to csv file
Appending collumn to csv file
Appending collumn to csv file
Appending collumn to csv file


# Part 2:  Output data to spreadsheet
Convert output to CSV format.  

Rewrite the printResults function.  Call it printResults2(data) where a list or dictionary (your choice) is returned from the function to the runner then the data is converted to CSV format and saved to a file.

Change your runner to assign the returned data from your printResults2 function to a variable that you then convert to CSV format and save to a file.

Include at least the 4 retrieved from the database from Part 1.  
Include exception handling in your file IO processing.   

In [3]:
####Your code here....
#### print(data['features'][0]['properties'].keys())  -->  to get the keys
def printResults2(data):
  hit_counter = 0
  data = json.loads(data)

  with open('earthquakes.csv', 'w', newline='') as file:
    csv_writer = csv.writer(file)

    csv_writer.writerow(data['features'][0]['properties'].keys())

    for i in range(len(data["features"])):
      if(round(float(data['features'][i]['properties']['mag']), 1) >= minMag):
        print("Appending collumn to csv file")
        csv_writer.writerow(data['features'][i]['properties'].values())
        hit_counter += 1
    
    if(hit_counter == 0):
      for i in range(len(data["features"])):
        csv_writer.writerow(data['features'][i]['properties'].values())

  # return data

# Part 3:  Search on another field
Create a new printResults function called printResults3(data, searchField) where:  
'data' is the 'scraped' data from the usgs site as in the previous parts and  
'searchField' is a field defined at the geojson.php site below. 

The search field may be input from a selection provided to the user or may be fixed (programmer's choice).  Use a meaningful field that you can glean some information from (think about how a data scientist may want to analyze certain types of data from the set).  

Change your runner to search the database for the different field and print out the results based on that field.  For example you might want to search for all the earthquakes that occurred within a particular latitude and longitude bounding box.   

See https://earthquake.usgs.gov/earthquakes/feed/v1.0/geojson.php for the list of parameters that can be retrieved.


In [7]:
####Your code here....
def printResults3(data, searchField):
  data = json.loads(data)
  
  for i in range(len(data["features"])):
    if(data['features'][i]['properties']['sig']) > searchField:
      print(data['features'][i]['properties']['title'], "with a significance of", data['features'][i]['properties']['sig'])

significance = int(input("Enter a significance between 1-1000: "))
while(significance < 0 or significance > 1000):
    significance = int(input("ERROR, PLEASE TRY AGAIN - Enter a significance between 1-1000: "))

printResults3(data, significance)

Enter a significance between 1-1000: 15
M 5.3 - Molucca Sea with a significance of 432
M 4.6 - Molucca Sea with a significance of 326
M 5.8 - 171 km NW of Tobelo, Indonesia with a significance of 518
M 5.2 - Molucca Sea with a significance of 416
M 3.4 - Puerto Rico region with a significance of 174
M 3.0 - off the coast of Oregon with a significance of 138
M 4.5 - 90 km SW of Kotabumi, Indonesia with a significance of 312
M 3.9 - 64 km W of Ovalle, Chile with a significance of 234
M 4.2 - 67 km W of Abra Pampa, Argentina with a significance of 271
M 4.4 - Fiji region with a significance of 298
M 4.0 - 78 km SSW of Kaktovik, Alaska with a significance of 246
M 2.5 - 43 km SSW of Mākena, Hawaii with a significance of 97
M 5.5 - 162 km NW of Tobelo, Indonesia with a significance of 466
M 2.6 - 17 km ESE of Naalehu, Hawaii with a significance of 102
M 3.2 - 59 km NNE of Cruz Bay, U.S. Virgin Islands with a significance of 156
M 5.0 - Molucca Sea with a significance of 385
M 4.6 - 155 km N