### Data Preparation
### Term Project - Milestone 4
### Submitter - Himanshu Singh
### Connecting to an API/Pulling in the Data and Cleaning/Formatting

In [44]:
# covid_dataset provides latitude- and longitude-level information, with data from Jan 22 to March 24 2021.
# Weekly COVID cases and weekly COVID deaths come from https://ourworldindata.org/search?q=covid+data
# The data was merged into output_data2.csv; the next step is to extract temperature information for each location (latitude/longitude) and date.


# Load the DataFrame from the file created in Milestone 2
import pandas as pd

# Define the path to the .csv file
file_path = 'output_data2.csv'

# Load the data into a pandas DataFrame
try:
    # Reading the file
    df = pd.read_csv(file_path)
    print("Data loaded successfully!")
    # Reading first 10 rows
    #print(df.head(10)) # Print the first 10 rows of the DataFrame
except FileNotFoundError:
    print(f"Error: The file at {file_path} was not found.")
except Exception as e:
    print(f"An error occurred: {e}")






Data loaded successfully!


## Calling the API by date and geographic location, with data transformation tasks
### Step 1 : Add the new columns temperature, feels_like and humidity to the dataframe (one-time activity)
### Step 2 : Convert the string date to a UTC timestamp
### Step 3 : Convert the float UTC timestamp to an integer
### Step 4 : Get the temperature in Kelvin and convert it to Celsius
### Step 5 : Round the Celsius value to 2 decimal places
### Step 6 : Because we have more than 13K records and the free limit is 1000 calls per day, each run is capped at 950 API calls and the remaining data is fetched on subsequent days
### Step 7 : Because the .csv data was already cleaned, there were no issues with the API and few transformations were needed
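
Steps 2 to 5 can be illustrated in isolation with a minimal sketch. The helper name `date_to_utc_timestamp` is hypothetical and not part of the notebook; it assumes dates arrive as `YYYY-MM-DD` strings, as they do in the `Date_String` column.

```python
from datetime import datetime, timezone


def date_to_utc_timestamp(date_string):
    """Parse a 'YYYY-MM-DD' string as midnight UTC and return whole seconds (Steps 2-3)."""
    dt = datetime.strptime(date_string, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


def kelvin_to_celsius(kelvin):
    """Convert Kelvin to Celsius, rounded to 2 decimal places (Steps 4-5)."""
    return round(kelvin - 273.15, 2)


print(date_to_utc_timestamp("2021-03-24"))  # 1616544000
print(kelvin_to_celsius(300.0))             # 26.85
```

Attaching `timezone.utc` before calling `timestamp()` matters: without it, Python would interpret the naive datetime in the machine's local time zone, and the resulting timestamp would shift depending on where the notebook runs.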

In [45]:
# The API will need three parameters
# Latitude
# longitude
# UTC Time

# The API returns the temperature in Kelvin, which must be converted to Celsius




import requests
from datetime import datetime, timezone
import numpy as np



def kelvin_to_celsius(kelvin):
  """
  Converts a temperature from Kelvin to Celsius.

  Args:
    kelvin: The temperature in Kelvin (float or int).

  Returns:
    The temperature in Celsius (float).
  """
  celsius = kelvin - 273.15
  return round(celsius,2)

# --- Configuration ---

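import os

# The key is read from an environment variable so that it is never stored in the notebook.
# The variable name OPENWEATHER_API_KEY is an assumed name; set it before running this cell.
API_KEY = os.environ["OPENWEATHER_API_KEY"]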
BASE_URL = "https://api.openweathermap.org/data/3.0/onecall/timemachine"

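# One-time setup (Step 1): uncomment on the first run to create the empty columns.
# NaN is used instead of '' so that the np.isnan check in the loop works.
#df['temperature'] = np.nan
#df['feels_like'] = np.nan
#df['humidity'] = np.nan
i = 0  # number of API calls made in this run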
for index, row in df.iterrows():
    lat = row['lat']
    lon = row['long']
    datestring = row['Date_String']
    dt_object = datetime.strptime(datestring, '%Y-%m-%d')


    # Converting to UTC time stamp
    res = dt_object.replace(tzinfo=timezone.utc).timestamp()
    res= int(res)

    # --- Construct the API Request URL ---
    # The parameters (params) carry the coordinates, the UTC timestamp, and the API key.
    # No units parameter is set, so temperatures come back in Kelvin.
    params = {
        'lat': lat,
        'lon': lon,
        'dt': res,
        'appid': API_KEY
    }


    # --- Make the API Call ---
    try:
        # The free tier allows only 1000 calls per day, so each daily run fetches at most 950 rows and saves them to the file.
        # The same file is read the next day, and rows whose temperature is still blank are filled in then.
        # For the submission, 10 calls were made to show the output in the console.
        if (i < 950) and np.isnan(df.loc[index, 'temperature']):
            i=i+1
            response = requests.get(BASE_URL, params=params)
            #print(params)
            response.raise_for_status() # Raise an exception for bad status codes (4xx or 5xx)

            # --- Process the JSON Response ---
            data = response.json()
            #print(data)
            # Check whether the request was successful
            if data.get("cod") != 404:
                # Extract relevant information
                main_data = data['data']
                weather_data = main_data[0]

                df.loc[index,'temperature'] = kelvin_to_celsius(weather_data['temp'])
                df.loc[index,'feels_like'] = kelvin_to_celsius(weather_data['feels_like'])
                df.loc[index,'humidity'] = weather_data['humidity']


                # --- Display Results ---

                #print(f"Temperature: {df.loc[index,'temperature']}°C")
                #print(f"Feels Like: {df.loc[index,'feels_like']}°C")
                #print(f"Humidity: {df.loc[index,'humidity'] }%")

            else:
                print("Error: Parameters not found.")
        #else:
            #print("Error: Will continue tomorrow")

    except requests.exceptions.HTTPError as err:
        print(f"HTTP Error: {err}")
    except requests.exceptions.RequestException as err:
        print(f"An error occurred: {err}")
    except KeyError:
        print("Error: Could not parse all expected data from the API response.")

df.drop_duplicates().to_csv('output_data2.csv', index=False)

* What changes were made to the data?

To use the API, each date was converted to a UTC timestamp, and the temperatures returned in Kelvin were converted to Celsius. Because the dataset has more than 13K records and the free tier is limited to 1000 API calls per day, the loop makes at most 950 calls per run and is rerun daily. Each API response is written back to the .csv file, so the next run only fetches rows that are still blank.
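
The resumable daily batch described above can be sketched as follows. This is an illustration on toy data, not the notebook's own loop: `select_pending_rows` and `DAILY_CALL_BUDGET` are hypothetical names, and the sketch assumes the `temperature` column holds NaN for rows that have not been fetched yet.

```python
import numpy as np
import pandas as pd

DAILY_CALL_BUDGET = 950  # assumed cap, kept below the 1000-call free limit


def select_pending_rows(df, budget=DAILY_CALL_BUDGET):
    """Return the index labels of up to `budget` rows that still lack a temperature."""
    pending = df.index[df["temperature"].isna()]
    return pending[:budget]


# Toy data standing in for output_data2.csv
demo = pd.DataFrame({
    "lat": [10.0, 20.0, 30.0, 40.0],
    "long": [1.0, 2.0, 3.0, 4.0],
    "temperature": [21.5, np.nan, np.nan, 18.0],
})

print(list(select_pending_rows(demo, budget=1)))  # [1]
print(list(select_pending_rows(demo)))            # [1, 2]
```

Selecting the pending rows up front means the loop only visits rows that need an API call, instead of iterating over all 13K records and skipping most of them.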

* Are there any legal or regulatory guidelines for your data or project topic?

Yes, there are critical legal and regulatory guidelines when using data from the OpenWeather API, primarily governed by their Terms of Service and Licensing Agreement.

The key restrictions are:

- API Key and Rate Limits: You must use a valid API key and adhere to the call limits defined by the specific plan (the free tier has strict limits).
- Commercial Use: The Free plan is generally restricted to non-commercial use only. For commercial products (like apps with subscriptions or ads), you must purchase a paid subscription.
- Attribution: Free and some paid plans require you to display an attribution (e.g., "Weather data provided by OpenWeather") in the visible part of the solution.
- Redistribution: The license dictates whether you can redistribute the raw data or only "non-retrievable value-added services" built using the data. Reselling the original data is usually prohibited.

Failure to comply can result in the API key being revoked or legal action. You must check the current OpenWeather license for the specific plan.



* What risks could be created based on the transformations done?

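The main risk is inconsistent units. Every temperature must be in Celsius after conversion, and the time must be in the form the API requires (an integer Unix timestamp in UTC). Because each date string is converted to midnight UTC, the API is queried for that single moment at each location, not for a daily average.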
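
One way to catch a unit mix-up is a plausibility check on the stored values. The sketch below is an assumption-laden illustration: `find_implausible_rows` is a hypothetical helper, and the bounds of -90 °C to 60 °C are assumed sanity limits, not values taken from the API documentation. A temperature accidentally left in Kelvin (around 250 to 320) would fall outside them.

```python
import numpy as np
import pandas as pd


def find_implausible_rows(df, low=-90.0, high=60.0):
    """Return rows whose Celsius temperature or percent humidity is out of range.

    Rows that have not been fetched yet (NaN) are not flagged.
    """
    temp = df["temperature"]
    hum = df["humidity"]
    bad_temp = temp.notna() & ~temp.between(low, high)
    bad_hum = hum.notna() & ~hum.between(0, 100)
    return df[bad_temp | bad_hum]


# Toy data: the second row was left in Kelvin by mistake
demo = pd.DataFrame({
    "temperature": [22.4, 295.55, np.nan],
    "humidity": [40.0, 55.0, np.nan],
})

print(list(find_implausible_rows(demo).index))  # [1]
```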


* Did you make any assumptions in cleaning/transforming the data?

Because the free tier allows only 1000 API calls per day, I capped the script at 950 calls per run and assumed it would be run every day until all the information required by the dataset has been fetched.


* How was your data sourced/verified for credibility?

The COVID-19 case and death data comes from https://ourworldindata.org, which sources its data from authoritative sources. I did not independently verify the dataset beyond relying on that source.


* Was your data acquired in an ethical way?

OpenWeather's own data is typically aggregated from national weather services, satellite imagery, radar data, and commercial sensor networks. The ethical requirement for a user is to adhere strictly to their Terms of Service (e.g., attribution, commercial use restrictions) to respect their licensing agreements and data collection efforts.


* How would you mitigate any of the ethical implications you have identified?

To mitigate ethical implications when using the OpenWeather API:

- Strict Compliance & Licensing: Purchase the appropriate commercial license if the project generates revenue (even via ads), and strictly adhere to all API call limits to respect their infrastructure investment.
- Attribution: Clearly and prominently display the required attribution ("Weather data provided by OpenWeather") in the user interface where the data is presented.
- Privacy: Since weather data is generally not personal, focus on internal data security. Never expose or log the API key in public repositories or client-side code.
- Transparency: If the solution uses the data for critical decisions (e.g., automated farming), clearly communicate the data source and its inherent limitations (e.g., forecast accuracy) to end-users.
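
The privacy point above (keeping the API key out of repositories and logs) could be handled along these lines. This is a minimal sketch: the environment variable name `OPENWEATHER_API_KEY` and the helpers `load_api_key` and `mask_key` are assumptions made for illustration, not part of the notebook or of any OpenWeather tooling.

```python
import os


def load_api_key(env_var="OPENWEATHER_API_KEY"):
    """Read the API key from an environment variable instead of hardcoding it."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running.")
    return key


def mask_key(key, visible=4):
    """Hide all but the last `visible` characters so the key is safe to print."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]


# A dummy value is used here; a real key should never appear in the notebook.
print(mask_key("abcd1234efgh"))  # ********efgh
```

Failing fast with a clear error when the variable is missing is a deliberate choice in this sketch: it avoids spending part of the daily call budget on requests that would be rejected anyway.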