## Exercise 3 - Data sources
- All files used in this exercise can be found under the Exercises/data_files directory

1 Use gamedata.json for this task. This file contains information about games sold through Steam. Parse out the following information from the data (Important: do not combine these filters; apply each one separately!):
- The 3 games with the highest metacritic scores. Present results using the following format: *Title* has metacritic score of *Score* (for example Bayonetta has metacritic score of 90)
- Games with a price discount of 90 % or more. Present results using the following format: *Title* | Discount: *Savings* (for example Metal Gear Solid V: Ground Zeroes | Discount: 90.090090)
- Games with a metacritic score higher than their steam score. Present results using the following format: *Title* has metacritic score of *MetacriticScore* and steam score of *SteamRatingPercent*

In [12]:
# TOP 3 highest metacritic score

import json

with open('data_files/gamedata.json') as gamedata:
    games = json.load(gamedata)

sorted_games = sorted(games, key=lambda x: int(x["metacriticScore"]), reverse=True)

for game in sorted_games[:3]:
    print(f"{game['title']} has metacritic score of {game['metacriticScore']}")


Star Wars: Knights of the Old Republic has metacritic score of 93
Metal Gear Solid V: The Phantom Pain has metacritic score of 91
Bayonetta has metacritic score of 90
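
Sorting the whole list works well here. As an alternative, the same top-3 selection can be sketched with `heapq.nlargest`, which avoids sorting every entry. The small `sample_games` list below is a hypothetical stand-in for the contents of gamedata.json; it assumes the scores are stored as strings, which is why they are converted with `int`.

```python
import heapq

# Hypothetical sample mirroring the assumed structure of gamedata.json
sample_games = [
    {"title": "NBA 2K21", "metacriticScore": "67"},
    {"title": "Bayonetta", "metacriticScore": "90"},
    {"title": "Star Wars: Knights of the Old Republic", "metacriticScore": "93"},
    {"title": "Metal Gear Solid V: The Phantom Pain", "metacriticScore": "91"},
]

# Pick the three entries with the largest numeric score, highest first
top_three = heapq.nlargest(3, sample_games, key=lambda g: int(g["metacriticScore"]))

for game in top_three:
    print(f"{game['title']} has metacritic score of {game['metacriticScore']}")
```

For a list this small the difference is negligible; `nlargest` mainly pays off when only a few items are needed from a large collection.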


In [10]:
# Games with price discount being 90 % or more

discount_game = [game for game in games if float(game["savings"]) >= 90]

for game in discount_game:
    print(f"{game['title']} | Discount: {game['savings']}")


Shadow Tactics: Blades of the Shogun | Discount: 90.022506
Airscape: The Fall of Gravity | Discount: 90.180361
Making History: The Calm and the Storm | Discount: 90.180361
Avencast: Rise of the Mage | Discount: 90.090090
Metal Gear Solid V: Ground Zeroes | Discount: 90.045023
The Way | Discount: 90.060040
Teslagrad | Discount: 90.090090
White Wings  | Discount: 90.045023
Phantaruk | Discount: 90.180361
Oozi Earth Adventure | Discount: 90.180361
Lucius | Discount: 90.090090
The Long Journey Home | Discount: 90.045023
NEON STRUCT | Discount: 90.050028
House of Caravan | Discount: 90.180361


In [15]:
# Games having metacritic score higher than steam score

def compare_scores(data):
    higher_metacritic = []
    for game in data:
        if int(game["metacriticScore"]) > int(game["steamRatingPercent"]):
            higher_metacritic.append(game)
    return higher_metacritic

for game in compare_scores(games):
    print(f"{game['title']} has metacritic score of {game['metacriticScore']} and steam score of {game['steamRatingPercent']}")

NBA 2K21 has metacritic score of 67 and steam score of 39
Commander 85 has metacritic score of 45 and steam score of 35
Inversion has metacritic score of 59 and steam score of 57
Bionic Commando: Rearmed has metacritic score of 86 and steam score of 71
Metal Gear Solid V: The Phantom Pain has metacritic score of 91 and steam score of 90
Port Royale 2 has metacritic score of 75 and steam score of 68
Project Cars 2 has metacritic score of 84 and steam score of 79
Full Spectrum Warrior has metacritic score of 80 and steam score of 65
The Long Journey Home has metacritic score of 68 and steam score of 60
Star Wars: Knights of the Old Republic has metacritic score of 93 and steam score of 90
Starpoint Gemini Warlords has metacritic score of 73 and steam score of 72
Tidalis has metacritic score of 75 and steam score of 70


2 Use earthquakes.csv for this task. This file contains information about earthquakes recorded between 1965 and 2016. The magnitude value describes how strong an earthquake is. Magnitudes can be categorized as shown in the table below (*Source: http://www.geo.mtu.edu/UPSeis/magnitude.html*).

| Magnitude       | Class | Effects |
|-----------------|-------|---------|
| 2.49 or less    | Minor | Usually not felt, but can be recorded by seismograph. |
| 2.50 to 5.49    | Light | Often felt, but only causes minor damage. |
| 5.50 to 6.09    | Moderate | Slight damage to buildings and other structures. |
| 6.10 to 6.99    | Strong | May cause a lot of damage in very populated areas. |
| 7.00 to 7.99    | Major | Major earthquake. Serious damage. |
| 8.00 or greater | Great | Great earthquake. Can totally destroy communities near the epicenter. |

Count how many earthquakes have occurred in each class.

<b style="color:red;">Notice:</b> The first value has been changed to 2.49 or less; the original source uses 2.5 or less.

In [None]:
import csv

magnitude_classes = {
    "Minor": (0, 2.49),
    "Light": (2.5, 5.49),
    "Moderate": (5.5, 6.09),
    "Strong": (6.1, 6.99),
    "Major": (7.0, 7.99),
    "Great": (8.0, float("inf")),
}

eq_counts = {category: 0 for category in magnitude_classes}

with open('data_files/earthquakes.csv', newline="") as eq:
    eq_csv = csv.DictReader(eq)
    for row in eq_csv:
        try:
            eq_magnitude = float(row["Magnitude"])
            for category, (low, high) in magnitude_classes.items():
                if low <= eq_magnitude <= high:
                    eq_counts[category] += 1
                    break
        except ValueError:
            continue

for category, count in eq_counts.items():
    print(f"Earthquake magnitude class {category}: {count}")


Earthquake magnitude class Minor: 0
Earthquake magnitude class Light: 0
Earthquake magnitude class Moderate: 17639
Earthquake magnitude class Strong: 5035
Earthquake magnitude class Major: 698
Earthquake magnitude class Great: 40
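
The closed ranges in the table leave small gaps between classes (a magnitude of 5.495, for instance, matches no range). A minimal sketch of one way to avoid that, assuming each class simply starts at its lower bound, uses `bisect_right` over the sorted lower bounds. The names `LOWER_BOUNDS`, `CLASSES`, `classify`, and the sample magnitudes are illustrative and not part of the exercise data.

```python
from bisect import bisect_right
from collections import Counter

# Lower bound of every class except the first, taken from the table above
LOWER_BOUNDS = [2.5, 5.5, 6.1, 7.0, 8.0]
CLASSES = ["Minor", "Light", "Moderate", "Strong", "Major", "Great"]

def classify(magnitude):
    """Return the class whose lower bound is the largest one <= magnitude."""
    return CLASSES[bisect_right(LOWER_BOUNDS, magnitude)]

# Hypothetical magnitudes, including boundary values and one "gap" value
sample_magnitudes = [2.4, 2.5, 5.495, 5.5, 6.09, 6.1, 7.0, 7.99, 8.0, 9.1]
sample_counts = Counter(classify(m) for m in sample_magnitudes)

for category in CLASSES:
    print(f"Earthquake magnitude class {category}: {sample_counts[category]}")
```

For values with at most two decimals this gives the same classes as the range lookup, so it should not change the counts shown above.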


3 Use netflix_titles.xml for this task. This file contains information about Netflix movies and TV shows. **Important:** Movie durations are given in minutes, while TV show durations are given as a number of seasons! Parse out the following information from the data and **show only counts** for these (how many instances are returned):
- Movies released in 2017
- Number of TV shows and number of movies (present each count on its own line)
- Movies with a length between 15 and 20 minutes (values 15 and 20 included)

In [23]:
# Movies released in 2017

import xml.etree.ElementTree as ET

tree = ET.parse('data_files/netflix_titles.xml')
root = tree.getroot()

movies_2017 = 0

for row in root.findall("row"):
    content_type = row.find("type").text
    release_year = row.find("release_year").text

    if content_type == "Movie":
        if release_year == "2017":
            movies_2017 += 1

print(f"Movies released in 2017: {movies_2017}")


Movies released in 2017: 744


In [28]:
# TV show count

tv_shows = 0

for row in root.findall("row"):
    content_type = row.find("type").text

    if content_type == "TV Show":
        tv_shows += 1

print(f"Number of TV Shows: {tv_shows}")

# Movie count

movies = 0

for row in root.findall("row"):
    content_type = row.find("type").text

    if content_type == "Movie":
        movies += 1
        
print(f"Number of Movies: {movies}")

Number of TV Shows: 2410
Number of Movies: 5377
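
Both counts can also be collected in a single pass. The sketch below uses `collections.Counter` on a tiny inline document; `SAMPLE_XML` is hypothetical and only assumes the same `row`, `type`, `release_year`, and `duration` element names that the cells above read from netflix_titles.xml.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical miniature document with the assumed element layout
SAMPLE_XML = """<root>
  <row><type>Movie</type><release_year>2017</release_year><duration>90 min</duration></row>
  <row><type>TV Show</type><release_year>2017</release_year><duration>2 Seasons</duration></row>
  <row><type>Movie</type><release_year>2016</release_year><duration>18 min</duration></row>
</root>"""

sample_root = ET.fromstring(SAMPLE_XML)

# findtext returns the text of the first matching child, or None if it is missing
type_counts = Counter(row.findtext("type") for row in sample_root.findall("row"))

print(f"Number of TV Shows: {type_counts['TV Show']}")
print(f"Number of Movies: {type_counts['Movie']}")
```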


In [30]:
# Movies with a length between 15 and 20 minutes

short_movies = 0

for row in root.findall("row"):
    content_type = row.find("type").text
    duration = row.find("duration").text
    
    if content_type == "Movie":
        if "min" in duration:
            minutes = int(duration.split(" ")[0])
            if 15 <= minutes <= 20:
                short_movies += 1

print(f"Number of movies with a length between 15 and 20 minutes: {short_movies}")

Number of movies with a length between 15 and 20 minutes: 11
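
If a `duration` element were empty, its `.text` would be `None` and the `"min" in duration` check would raise a `TypeError`. A small helper can make the parsing step more defensive. This is only a sketch: `movie_minutes` is a hypothetical name, and it assumes movie durations always look like `"<number> min"`.

```python
def movie_minutes(duration):
    """Return the length in minutes, or None if the text is not in 'N min' form."""
    if not duration:
        return None  # covers both None and the empty string
    parts = duration.split()
    if len(parts) == 2 and parts[1] == "min" and parts[0].isdigit():
        return int(parts[0])
    return None

# Hypothetical duration strings, including TV-show and missing values
sample_durations = ["15 min", "20 min", "90 min", "2 Seasons", None, ""]

short_samples = [
    d for d in sample_durations
    if (minutes := movie_minutes(d)) is not None and 15 <= minutes <= 20
]

print(f"Number of sample durations between 15 and 20 minutes: {len(short_samples)}")
```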


4 Use the following REST API for this task: https://tie.digitraffic.fi/api/weather/v1/stations/data. Calculate the average air temperature (ILMA) and average humidity (ILMAN_KOSTEUS), rounded to two decimals.

In [None]:
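import requests

url = "https://tie.digitraffic.fi/api/weather/v1/stations/data"

# Default to None so later cells can check whether the request succeeded
weather_data = None

# A timeout keeps the cell from hanging if the API does not respond
response = requests.get(url, timeout=30)
if response.status_code == 200:
    weather_data = response.json()

weather_data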

{'dataUpdatedTime': '2025-03-31T17:20:02Z',
 'stations': [{'id': 1001,
   'dataUpdatedTime': '2025-03-31T17:16:02Z',
   'sensorValues': [{'id': 1,
     'stationId': 1001,
     'name': 'ILMA',
     'shortName': 'Ilma ',
     'measuredTime': '2025-03-31T17:15:45Z',
     'value': 8.7,
     'unit': '°C'},
    {'id': 2,
     'stationId': 1001,
     'name': 'ILMA_DERIVAATTA',
     'shortName': 'DIlm',
     'measuredTime': '2025-03-31T17:15:45Z',
     'value': -1.2,
     'unit': '°C/h'},
    {'id': 3,
     'stationId': 1001,
     'name': 'TIE_1',
     'shortName': 'Tie1',
     'measuredTime': '2025-03-31T17:15:45Z',
     'value': 10.1,
     'unit': '°C'},
    {'id': 4,
     'stationId': 1001,
     'name': 'TIE_1_DERIVAATTA',
     'shortName': 'DTie1',
     'measuredTime': '2025-03-31T17:15:45Z',
     'value': -1.5,
     'unit': '°C/h'},
    {'id': 5,
     'stationId': 1001,
     'name': 'TIE_2',
     'shortName': 'Tie2',
     'measuredTime': '2025-03-31T17:15:45Z',
     'value': 9.8,
     'un

In [46]:
# Running totals and counts for both sensors
total_ilma = 0.0
ilma_count = 0
total_kosteus = 0.0
kosteus_count = 0

if weather_data:
    for station in weather_data.get('stations', []):
        for sensor in station.get('sensorValues', []):
            if sensor['name'] == 'ILMA':
                total_ilma += float(sensor['value'])
                ilma_count += 1
            elif sensor['name'] == 'ILMAN_KOSTEUS':
                total_kosteus += float(sensor['value'])
                kosteus_count += 1

# Leave an average as None when no readings were found (avoids division by zero)
avg_ilma = round(total_ilma / ilma_count, 2) if ilma_count else None
avg_kosteus = round(total_kosteus / kosteus_count, 2) if kosteus_count else None

if avg_ilma is not None:
    print(f"Average Air Temperature (ILMA): {avg_ilma:.2f}°C")
else:
    print("No temperature data available.")

if avg_kosteus is not None:
    print(f"Average Humidity (ILMAN_KOSTEUS): {avg_kosteus:.2f}%")
else:
    print("No humidity data available.")

Average Air Temperature (ILMA): 5.25°C
Average Humidity (ILMAN_KOSTEUS): 85.28%
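
The averaging logic can be tried without calling the API. The sketch below runs it on a hand-written payload that mirrors the `stations` / `sensorValues` layout shown in the response above; `sample_payload`, its values, and the `sensor_average` helper are hypothetical and only meant to illustrate the calculation.

```python
from statistics import mean

# Hypothetical payload with the same nesting as the API response
sample_payload = {
    "stations": [
        {"id": 1001, "sensorValues": [
            {"name": "ILMA", "value": 8.7},
            {"name": "ILMAN_KOSTEUS", "value": 80.0},
        ]},
        {"id": 1002, "sensorValues": [
            {"name": "ILMA", "value": 2.3},
            {"name": "ILMAN_KOSTEUS", "value": 91.0},
            {"name": "TIE_1", "value": 10.1},
        ]},
    ]
}

def sensor_average(payload, sensor_name):
    """Average all non-missing readings of one sensor, rounded to two decimals."""
    values = [
        sensor["value"]
        for station in payload.get("stations", [])
        for sensor in station.get("sensorValues", [])
        if sensor.get("name") == sensor_name and sensor.get("value") is not None
    ]
    # Return None instead of failing when the sensor never appears
    return round(mean(values), 2) if values else None

print(f"Average Air Temperature (ILMA): {sensor_average(sample_payload, 'ILMA'):.2f}°C")
print(f"Average Humidity (ILMAN_KOSTEUS): {sensor_average(sample_payload, 'ILMAN_KOSTEUS'):.2f}%")
```

Collecting the values in a list first keeps the "no data" case explicit: an empty list yields `None` rather than a division by zero.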
