#### **Project Objective**
###### This project demonstrates the extraction and transformation of weather API data to provide insights into weather conditions across different countries and continents.

#### **Project Summary**
###### The project gave insights into which countries are experiencing high and low temperature, pressure and humidity, broken down by type of weather condition (clear, cloudy, rainy, etc.).

.
#### **Tools/Skills demonstrated**
###### Python, CSV, JSON, API, Web scraping, ETL, Power BI and Excel

.
#### **Data:**
Data extracted on: Thu 1st February, 2024 at 09:14:18AM UTC
###### **Data source 1:** *https://simple.wikipedia.org/wiki/List_of_countries_by_continents*
###### Web scraping was performed to profile countries with their respective capital cities and continents

###### **Data source 2:** *https://openweathermap.org/*
###### Using the API, current weather information was extracted for the capital cities of different countries. See https://openweathermap.org/current#fields_json for an explanation of the fields.

.
#### **Project Limitations:**
###### **Historical data:** *Historical weather data is not available in the free version of the API used in this project, so the analysis is restricted to the current weather information, which was extracted on Thu 1st February, 2024 at 09:14:18AM UTC.*
###### **Bulk data download:** *Bulk weather data download is not available in the free version of the API used in this project, so a separate API call was made for each location.*
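
Because each location needs its own request, the calls can be paced so they stay within whatever rate limit the API plan imposes. The following is a minimal sketch, not the project's code: `fetch_all` is a hypothetical helper, and the one-second default gap is an illustrative assumption that should be replaced with a value suited to the plan in use.

```python
import time


def fetch_all(locations, fetch, min_interval=1.0, sleep=time.sleep, clock=time.monotonic):
    """Call fetch(location) once per location, at least min_interval seconds apart.

    `sleep` and `clock` are injectable so the pacing can be tested without waiting.
    """
    results = {}
    last_call = None
    for location in locations:
        if last_call is not None:
            # Wait only for the part of the interval that has not already elapsed
            wait = min_interval - (clock() - last_call)
            if wait > 0:
                sleep(wait)
        last_call = clock()
        results[location] = fetch(location)
    return results


# Usage with a stand-in fetch function (no network access, no waiting)
results = fetch_all(["Kabul", "Tirana"], fetch=lambda city: {"name": city}, min_interval=0.0)
print(results)
```

Passing the real request function as `fetch` keeps the pacing logic separate from the HTTP code.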
.

#### **Project Requirements:**
###### * Which continent has the highest temperature, pressure and humidity?
###### * Which continent has the lowest temperature, pressure and humidity?
###### * Which country has the highest temperature, pressure and humidity?
###### * Which country has the lowest temperature, pressure and humidity?
###### * Which country has the highest wind speed, and what is the speed?
###### * Are any countries experiencing extreme weather conditions (e.g., very high or very low temperatures)?
###### * What is the most common type of weather condition (e.g., clear, cloudy, rainy) for each continent?
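
The questions above map naturally onto pandas aggregations. The sketch below is illustrative rather than the project's actual analysis: it uses five rows copied from the weather output in this notebook with a `continent` column added by hand, it assumes that "highest" for a continent means the highest mean across its capitals, and the thresholds for "extreme" temperatures are arbitrary placeholders.

```python
import pandas as pd

# Five sample rows; a real run would use weatherData.csv joined with countries.csv
weather = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "continent": ["Asia", "Europe", "Africa", "Europe", "Africa"],
    "temp": [1.76, 8.49, 12.9, -0.65, 25.0],
    "pressure": [1018, 1028, 1034, 1036, 1015],
    "humidity": [24, 40, 82, 75, 94],
    "wind_speed": [2.6, 2.06, 1.03, 0.51, 4.12],
    "main": ["Clear", "Clear", "Mist", "Clear", "Rain"],
})
metrics = ["temp", "pressure", "humidity"]

# Continent level: compare the mean of each metric (an assumption about the intent)
continent_means = weather.groupby("continent")[metrics].mean()
highest_by_continent = continent_means.idxmax()
lowest_by_continent = continent_means.idxmin()

# Country level: one row per country, so take the extremes directly
by_country = weather.set_index("country")
highest_by_country = by_country[metrics].idxmax()
lowest_by_country = by_country[metrics].idxmin()

# Highest wind speed and the country it belongs to
windiest = by_country["wind_speed"].idxmax()
windiest_speed = by_country["wind_speed"].max()

# Extreme temperatures; the 40 and -20 degree Celsius cut-offs are placeholders
extreme = weather[(weather["temp"] > 40) | (weather["temp"] < -20)]

# Most common condition per continent; ties resolve to the alphabetically first value
most_common_condition = weather.groupby("continent")["main"].agg(lambda s: s.mode().iloc[0])

print(highest_by_continent)
print(most_common_condition)
```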
.


#### **Suggested improvement**
###### * The outcome of the analysis may be compared with historical weather information to understand how weather conditions around the world have changed in the context of global warming.
###### * Explore correlations between weather data and other environmental indicators.

### Libraries & Settings

In [2]:
from datetime import datetime, timedelta, timezone
import csv
import json

import requests
from bs4 import BeautifulSoup
import pandas as pd

pd.set_option('display.max_columns', 30)

### Methods

In [3]:
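def apiUrl(baseUrl, location, appid):
    """Build the request URL for a location, asking for metric units."""
    return baseUrl + "q=" + location + "&appid=" + appid + "&units=metric"


def getWeatherData(api):
    """Return the parsed JSON on success, or an error string on failure."""
    try:
        # A timeout keeps one unresponsive request from stalling the whole run
        response = requests.get(api, timeout=30)
    except requests.RequestException:
        return "Error: Unable to fetch data from the API"
    if response.status_code == 200:
        return response.json()
    return "Error: Unable to fetch data from the API"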
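
Some capital names contain spaces or other characters that are not valid in a raw query string. One way to make the encoding explicit is to build the query with the standard library, as sketched below. `build_weather_url` is a hypothetical alternative to `apiUrl`, not part of the project; unlike `weatherBaseUrl`, it expects the base URL without the trailing `?`.

```python
from urllib.parse import quote, urlencode


def build_weather_url(base_url, location, appid, units="metric"):
    """Return base_url plus a percent-encoded query string."""
    query = urlencode({"q": location, "appid": appid, "units": units}, quote_via=quote)
    return f"{base_url}?{query}"


url = build_weather_url(
    "https://api.openweathermap.org/data/2.5/weather",
    "Andorra la Vella",
    "YOUR_API_KEY",  # placeholder, not a real key
)
print(url)
# https://api.openweathermap.org/data/2.5/weather?q=Andorra%20la%20Vella&appid=YOUR_API_KEY&units=metric
```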


### Variables

In [4]:
with open('important/credentials.txt', 'r') as file:
    appid = file.readline().strip()

apiBaseUrl = "https://api.openweathermap.org/data/2.5/"
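
Keeping the key in a file outside the notebook, as above, avoids committing it with the code. A common variation is to check an environment variable first and fall back to the file. The sketch below assumes a hypothetical variable name, `OPENWEATHER_API_KEY`, and a hypothetical helper, `load_api_key`; neither comes from the project.

```python
import os
from pathlib import Path


def load_api_key(env_var="OPENWEATHER_API_KEY", path="important/credentials.txt"):
    """Return the API key from the environment, or from the first line of a file."""
    key = os.environ.get(env_var, "").strip()
    if key:
        return key
    key_file = Path(path)
    if key_file.is_file():
        lines = key_file.read_text(encoding="utf-8").splitlines()
        if lines and lines[0].strip():
            return lines[0].strip()
    raise RuntimeError(f"No API key found in ${env_var} or {path}")


# Usage (raises RuntimeError if neither source provides a key):
# appid = load_api_key()
```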

###### .

### Web scraping
#### To extract countries with their respective capital cities and continents

In [30]:
# In case the page below has changed by the time you view this project, a downloaded copy of the HTML file is included in the project as a workaround.
url = "https://simple.wikipedia.org/wiki/List_of_countries_by_continents"

# Send a GET request to the URL
html = requests.get(url)

# Check if the request was successful (status code 200)
if html.status_code != 200:
    print("Something went wrong with the url")
else:
    soup = BeautifulSoup(html.content, "html.parser")
    bodyContent = soup.find(id='bodyContent')
    results = bodyContent.find(id='mw-content-text', class_='mw-body-content')
    results = results.find(class_='mw-content-ltr mw-parser-output')
    continent_names = results.find_all('span', class_='mw-headline')

    continent_index = 0
    countries_data = []
    
    
    all_continent_table = results.find_all('table', class_='wikitable')
    for this_continent_table in all_continent_table:
        this_continent = continent_names[continent_index].text.strip()  # Heading text for this table
        this_continent = this_continent.replace(",", "")  # Remove commas before writing to the CSV

        this_country_row = this_continent_table.find('tbody').find_all('tr')
        if continent_index > 1:
            this_country_row = this_country_row[1:]

        for this_country in this_country_row:
            cell = this_country.find_all('td')

            if cell:
                country_flag = cell[1].find('img')
                country_flag = country_flag['src']  # Image source link
                country_flag = country_flag.replace('//upload.wikimedia.org/wikipedia/commons/thumb/', '')  # Drop the common URL prefix
                country_flag = country_flag.replace(",", "")  # Remove commas before writing to the CSV

                # The table cells are arranged differently for each continent, which is the reason for this if...else statement
                if continent_index < 4:
                    country_name = cell[2].find('a', class_='extiw').text.strip()
                    country_name = country_name.replace(",", "")  # Remove commas before writing to the CSV

                    country_capital = cell[3].find('a').text.strip()
                    country_capital = country_capital.replace(",", "")  # Remove commas before writing to the CSV
                else:
                    country_name = cell[1].find_all('a')
                    country_name = country_name[1].text.strip()
                    country_name = country_name.replace(",", "")  # Remove commas before writing to the CSV

                    country_capital = cell[2].find('a').text.strip()
                    country_capital = country_capital.replace(",", "")  # Remove commas before writing to the CSV

                countries_data.append([country_flag, country_name, country_capital, this_continent])
                
        continent_index += 1
        
    # Write the data to a CSV file
    with open("countries.csv", "w", newline="", encoding="utf-8") as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(["country_flag", "country", "capital", "continent"])
        writer.writerows(countries_data)
    
    print("Data extracted and saved to countries.csv")

            

Data extracted and saved to countries.csv
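
Stripping commas from the scraped values is a safe precaution, but it is worth knowing that `csv.writer` already quotes any field containing the delimiter, so such values survive a write and read round trip. A small sketch using an in-memory buffer; the row is a made-up example, not scraped data:

```python
import csv
import io

rows = [
    ["country_flag", "country", "capital", "continent"],
    ["a/b/Flag.svg/40px-Flag.svg.png", "Korea, South", "Seoul", "Asia"],  # made-up row
]

# Write to an in-memory buffer instead of a file
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)
text = buffer.getvalue()
print(text)

# Reading the text back yields the original fields, comma included
read_back = list(csv.reader(io.StringIO(text, newline="")))
print(read_back[1][1])
```

The same applies when writing to `countries.csv`, since the file is opened with `newline=""` as the `csv` module expects.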


###### .

### Current weather information
##### To get the current weather information for the capital cities of the selected countries

In [23]:
weatherBaseUrl = apiBaseUrl + "weather?"
countries = pd.read_csv("countries.csv")

all_weather = []
bad_api = []
for index, row in countries.iterrows():
    # Print the progress on a single line that is overwritten on each iteration
    print(f"Progress: {index + 1} of {len(countries)} locations", end='\r', flush=True)

    # Column names match the header written to countries.csv
    country = row['country']
    capital = row['capital']
    url = apiUrl(weatherBaseUrl, capital, appid)
    weatherResponse = getWeatherData(url)

    # On failure the response is an error string; on success it is the parsed JSON (a dict)
    if isinstance(weatherResponse, str):
        bad_api.append(f"{country} ({capital}) | {weatherResponse}")
    else:
        # Build the record as a dict directly, so quotes or backslashes in a
        # name cannot break an intermediate JSON string
        all_weather.append({
            "country": country,
            "capital": weatherResponse['name'],
            "lon": weatherResponse['coord']['lon'],
            "lat": weatherResponse['coord']['lat'],
            "main": weatherResponse['weather'][0]['main'],
            "description": weatherResponse['weather'][0]['description'],
            "icon": weatherResponse['weather'][0]['icon'],
            "base": weatherResponse['base'],
            "temp": weatherResponse['main']['temp'],
            "feels_like": weatherResponse['main']['feels_like'],
            "temp_min": weatherResponse['main']['temp_min'],
            "temp_max": weatherResponse['main']['temp_max'],
            "pressure": weatherResponse['main']['pressure'],
            "humidity": weatherResponse['main']['humidity'],
            "visibility": weatherResponse.get('visibility'),  # guard in case the field is missing
            "wind_speed": weatherResponse['wind']['speed'],
            "wind_deg": weatherResponse['wind'].get('deg'),  # guard in case the field is missing
            "clouds": weatherResponse['clouds']['all'],
            "dt": weatherResponse['dt'],
            "sys_country_code": weatherResponse['sys']['country'],
            "sys_sunrise": weatherResponse['sys']['sunrise'],
            "sys_sunset": weatherResponse['sys']['sunset'],
            "timezone": weatherResponse['timezone'],
        })

# Clear the progress line before printing anything else
print(" " * 60, end='\r')

# Create a dataframe from the collected records
df = pd.DataFrame(all_weather)

# Sort by country in ascending order
df = df.sort_values(by='country', ascending=True)

# Save the dataframe in a CSV file
df.to_csv('weatherData.csv', index=False)

# Display the failed API calls
if bad_api:
    print(f"You have {len(bad_api)} bad API call(s):")
    for this_bad_api in bad_api:
        print(this_bad_api)
    print()

# Format the current UTC time with an ordinal day suffix (1st, 2nd, 3rd, 4th, ...)
now_utc = datetime.now(timezone.utc)
day = now_utc.day
suffix = "th" if 11 <= day % 100 <= 13 else {1: "st", 2: "nd", 3: "rd"}.get(day % 10, "th")
formatted_time = now_utc.strftime(f"%a {day}{suffix} %B, %Y at %I:%M:%S%p UTC")
print("Data extracted on:", formatted_time)

df.head()



You have 1 bad API call(s):
Palau (Ngerulmud) | Error: Unable to fetch data from the API

Data extracted on: Thu 1st February, 2024 at 09:14:18AM UTC


Unnamed: 0,country,capital,lon,lat,main,description,icon,base,temp,feels_like,temp_min,temp_max,pressure,humidity,visibility,wind_speed,wind_deg,clouds,dt,sys_country_code,sys_sunrise,sys_sunset,timezone
54,Afghanistan,Kabul,69.4167,34.5,Clear,clear sky,01d,stations,1.76,-1.03,1.76,1.76,1018,24,10000,2.6,260,0,1706778613,AF,1706754054,1706791834,16200
103,Albania,Tirana,19.8189,41.3275,Clear,clear sky,01d,stations,8.49,7.38,8.49,8.49,1028,40,10000,2.06,150,0,1706778330,AL,1706766769,1706802928,3600
0,Algeria,Algiers,3.042,36.7525,Mist,mist,50d,stations,12.9,12.39,12.9,12.9,1034,82,3000,1.03,0,75,1706778610,DZ,1706770229,1706807523,3600
104,Andorra,Andorra la Vella,1.4941,42.5045,Clear,clear sky,01d,stations,-0.65,-0.65,-2.39,7.59,1036,75,10000,0.51,0,0,1706778819,AD,1706771323,1706807171,3600
1,Angola,Luanda,13.2343,-8.8368,Rain,moderate rain,10d,stations,25.0,26.01,25.0,25.0,1015,94,8000,4.12,340,75,1706778777,AO,1706763953,1706808906,3600
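
The `dt`, `sys_sunrise` and `sys_sunset` columns are Unix timestamps in UTC, and `timezone` is the location's offset from UTC in seconds. Combining them gives local times that are easier to read. The sketch below uses the values from the Afghanistan (Kabul) row above; `to_local_time` is a hypothetical helper, not part of the project.

```python
from datetime import datetime, timedelta, timezone


def to_local_time(unix_seconds, utc_offset_seconds):
    """Convert a UTC Unix timestamp to an aware datetime in the location's local time."""
    local_zone = timezone(timedelta(seconds=utc_offset_seconds))
    return datetime.fromtimestamp(unix_seconds, tz=local_zone)


# Values copied from the Afghanistan (Kabul) row
kabul_offset = 16200  # seconds, i.e. UTC+04:30
observed = to_local_time(1706778613, kabul_offset)
sunrise = to_local_time(1706754054, kabul_offset)
sunset = to_local_time(1706791834, kabul_offset)

print(observed.isoformat())  # 2024-02-01T13:40:13+04:30
print(sunrise.isoformat())   # 2024-02-01T06:50:54+04:30
print(sunset.isoformat())    # 2024-02-01T17:20:34+04:30
```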


In [7]:
print(len(df))

198
