This notebook loads and analyzes the datasets used in this project. We work primarily with three types of datasets: housing prices, food costs, and overall living expenses. The housing prices data covers three specific cities: Moscow, Melbourne, and Utrecht. The other two types are each a single dataset. I will use pandas and numpy to analyze the prices in each dataset, and I will store the results in Python dictionaries, the Python equivalent of C++ maps.
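
As a minimal sketch of the dictionary structure described above, the snippet below groups a few rows by region into lists of `(price, rooms)` tuples. The sample rows and the name `listings_by_region` are hypothetical and are not taken from the project's datasets.

```python
from collections import defaultdict

# Hypothetical sample rows: (region, price, number_of_rooms)
rows = [
    ("North", 250_000, 2),
    ("South", 180_000, 1),
    ("North", 410_000, 3),
]

# Each key maps to a list of (price, rooms) tuples.
# defaultdict(list) creates the empty list the first time a key is seen.
listings_by_region = defaultdict(list)
for region, price, rooms in rows:
    listings_by_region[region].append((price, rooms))

# Sort one region's listings by price, highest first
north_sorted = sorted(listings_by_region["North"], key=lambda item: item[0], reverse=True)
print(north_sorted)
```

One difference worth keeping in mind: a Python dictionary is a hash table that preserves insertion order, so it behaves more like `std::unordered_map` than the key-ordered `std::map`.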

Now, we will analyze the Moscow housing dataset and create a Python dictionary (C++ map equivalent) from it.

In [7]:
# import pandas
import pandas as pd
# Read CSV
moscow_df = pd.read_csv('moscow_data.csv')
moscow_data = moscow_df[['Region', 'Price', 'Number of rooms']]
# Convert the Data from the CSV to a Dictionary
moscow_dict = {}
for index, row in moscow_data.iterrows():
    # General Entry into the Dictionary
    region = row['Region']
    price = row['Price']
    num_rooms = row['Number of rooms']

    # If the region (key) is not in the dictionary yet, create an empty list for it; then append the listing
    if region not in moscow_dict:
        moscow_dict[region] = []
    moscow_dict[region].append((price, num_rooms))

# Sort each region's listings by price (descending) and print the 5 most and 5 least expensive apartments
for region in moscow_dict:
    moscow_sorted = sorted(moscow_dict[region], key=lambda x: x[0], reverse=True)
    most_expensive = moscow_sorted[:5]
    least_expensive = moscow_sorted[-5:]

    # Printing sorted list
    print(f"Region: {region} (rubles)")
    print("Most Expensive Apartments:")
    for apt in most_expensive:
        print(f"Price: {apt[0]}, Number of Rooms: {apt[1]+1.0}")
    print("Least Expensive Apartments:")
    for apart in least_expensive:
        print(f"Price: {apart[0]}, Number of Rooms: {apart[1]+1.0}")
    print("\n")

Region: Moscow region (rubles)
Most Expensive Apartments:
Price: 53000000.0, Number of Rooms: 3.0
Price: 51500000.0, Number of Rooms: 3.0
Price: 49500000.0, Number of Rooms: 3.0
Price: 24398774.0, Number of Rooms: 4.0
Price: 24340000.0, Number of Rooms: 5.0
Least Expensive Apartments:
Price: 1941650.0, Number of Rooms: 1.0
Price: 1939125.0, Number of Rooms: 1.0
Price: 1939125.0, Number of Rooms: 1.0
Price: 1939125.0, Number of Rooms: 1.0
Price: 1939125.0, Number of Rooms: 1.0


Region: Moscow (rubles)
Most Expensive Apartments:
Price: 2455020000.0, Number of Rooms: 7.0
Price: 1732170825.0, Number of Rooms: 1.0
Price: 1517997000.0, Number of Rooms: 7.0
Price: 1475430000.0, Number of Rooms: 6.0
Price: 1422120000.0, Number of Rooms: 6.0
Least Expensive Apartments:
Price: 2390000.0, Number of Rooms: 1.0
Price: 1750000.0, Number of Rooms: 1.0
Price: 1560000.0, Number of Rooms: 1.0
Price: 1420000.0, Number of Rooms: 1.0
Price: 1150000.0, Number of Rooms: 1.0
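
The same per-region ranking can also be done without building a dictionary first. The following is a minimal sketch, assuming a DataFrame with the same column names as the Moscow cell; the sample values and the names `sample_df`, `top_by_region`, and `bottom_by_region` are hypothetical.

```python
import pandas as pd

# Hypothetical sample data using the same column names as the Moscow cell
sample_df = pd.DataFrame({
    "Region": ["A", "A", "A", "B", "B"],
    "Price": [300.0, 100.0, 200.0, 50.0, 75.0],
    "Number of rooms": [2, 0, 1, 0, 1],
})

n = 2  # how many listings to keep at each end
top_by_region = {}
bottom_by_region = {}
for region, group in sample_df.groupby("Region"):
    # nlargest / nsmallest pick the n rows with the highest / lowest 'Price'
    top_by_region[region] = group.nlargest(n, "Price")["Price"].tolist()
    bottom_by_region[region] = group.nsmallest(n, "Price")["Price"].tolist()
    print(f"Region: {region}")
    print(f"  Most expensive: {top_by_region[region]}")
    print(f"  Least expensive: {bottom_by_region[region]}")
```

Note that `nsmallest` returns prices in ascending order, whereas slicing the end of a descending sort (as in the cell above) lists the cheapest listing last.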




Next, we will analyze the Melbourne housing dataset and create a Python dictionary (C++ map equivalent) from it.

In [8]:
# import pandas
import pandas as pd
# Read CSV
melb_df = pd.read_csv('melb_data.csv')
melb_data = melb_df[['Suburb', 'Price', 'Rooms']]
# Convert the Data from the CSV to a Dictionary
melb_dict = {}
for index, row in melb_data.iterrows():
  # General Entry into the Dictionary
  suburb = row['Suburb']
  price = row['Price']
  num_rooms = row['Rooms']

  # If the suburb (key) is not in the dictionary yet, create an empty list for it; then append the listing
  if suburb not in melb_dict:
    melb_dict[suburb] = []
  melb_dict[suburb].append((price, num_rooms))

# Sort each suburb's listings by price (descending) and print the 5 most and 5 least expensive apartments
for region in melb_dict:
  melb_sorted = sorted(melb_dict[region], key=lambda x: x[0], reverse=True)
  most_expensive = melb_sorted[:5]
  least_expensive = melb_sorted[-5:]

  # Printing sorted list
  print(f"Region: {region} (AUDs)")
  print("Most Expensive Apartments:")
  for apt in most_expensive:
    print(f"Price: {apt[0]}, Number of Rooms: {apt[1]+1.0}")
  print("Least Expensive Apartments:")
  for apart in least_expensive:
    print(f"Price: {apart[0]}, Number of Rooms: {apart[1]+1.0}")
  print("\n")

Region: Abbotsford (AUDs)
Most Expensive Apartments:
Price: 1876000.0, Number of Rooms: 4.0
Price: 1636000.0, Number of Rooms: 3.0
Price: 1635000.0, Number of Rooms: 4.0
Price: 1600000.0, Number of Rooms: 5.0
Price: 1542000.0, Number of Rooms: 5.0
Least Expensive Apartments:
Price: 470000.0, Number of Rooms: 2.0
Price: 457000.0, Number of Rooms: 2.0
Price: 441000.0, Number of Rooms: 2.0
Price: 426000.0, Number of Rooms: 2.0
Price: 300000.0, Number of Rooms: 2.0


Region: Airport West (AUDs)
Most Expensive Apartments:
Price: 1250000.0, Number of Rooms: 4.0
Price: 1064000.0, Number of Rooms: 4.0
Price: 1042000.0, Number of Rooms: 4.0
Price: 1026000.0, Number of Rooms: 5.0
Price: 1000001.0, Number of Rooms: 4.0
Least Expensive Apartments:
Price: 480000.0, Number of Rooms: 3.0
Price: 462500.0, Number of Rooms: 3.0
Price: 454000.0, Number of Rooms: 3.0
Price: 450000.0, Number of Rooms: 3.0
Price: 440000.0, Number of Rooms: 3.0


Region: Albert Park (AUDs)
Most Expensive Apartments:
Price: 4

Next, we will analyze the Utrecht housing dataset and create a Python dictionary (C++ map equivalent) from it.

In [9]:
# import pandas
import pandas as pd
# Read CSV
utr_df = pd.read_csv('utrechthousinghuge.csv')
utr_data = utr_df[['retailvalue', 'bathrooms']]
# Convert the Data from the CSV to a Dictionary
utr_dict = {}
for index, row in utr_data.iterrows():
  # General Entry into the Dictionary
  suburb = "Utrecht" # Setting every region to Utrecht because no regional specifics were provided
  price = row['retailvalue']
  num_rooms = row['bathrooms']

  # If the key is not in the dictionary yet, create an empty list for it; then append the listing
  if suburb not in utr_dict:
    utr_dict[suburb] = []
  utr_dict[suburb].append((price, num_rooms))

# Sort the listings by price (descending) and print the 5 most and 5 least expensive apartments
for region in utr_dict:
  utr_sorted = sorted(utr_dict[region], key=lambda x: x[0], reverse=True)
  most_expensive = utr_sorted[:5]
  least_expensive = utr_sorted[-5:]

  # Printing sorted list
  print(f"Region: {region} (euros)")
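  print("Most Expensive Apartments:")
  for apt in most_expensive:
    # 'bathrooms' stands in for the room count; this loop scales it as bathrooms * 2 + 1
    print(f"Price: {apt[0]}, Number of Rooms: {apt[1]*2.0 + 1.0}")
  print("Least Expensive Apartments:")
  for apart in least_expensive:
    # This loop uses bathrooms + 1, a different formula from the loop above
    print(f"Price: {apart[0]}, Number of Rooms: {apart[1]+1.0}")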
  print("\n")

Region: Utrecht (euros)
Most Expensive Apartments:
Price: 1428000, Number of Rooms: 3.0
Price: 1389000, Number of Rooms: 3.0
Price: 1383000, Number of Rooms: 3.0
Price: 1374000, Number of Rooms: 5.0
Price: 1372000, Number of Rooms: 5.0
Least Expensive Apartments:
Price: 430000, Number of Rooms: 2.0
Price: 426000, Number of Rooms: 2.0
Price: 424000, Number of Rooms: 2.0
Price: 422000, Number of Rooms: 2.0
Price: 419000, Number of Rooms: 2.0
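
The Utrecht cell derives its "Number of Rooms" from the `bathrooms` column and applies `bathrooms*2.0 + 1.0` to the most expensive listings but `bathrooms + 1.0` to the least expensive ones, so the two lists are not directly comparable. One way to keep the estimate consistent is to route both through a single helper. This is only a sketch: the function `estimated_rooms`, its parameters, and the sample listings are hypothetical, and which formula is the right one is left as an open choice.

```python
def estimated_rooms(bathrooms, per_bathroom=1.0, offset=1.0):
    """Hypothetical helper: estimate a room count from a bathroom count."""
    return bathrooms * per_bathroom + offset


# Hypothetical (price, bathrooms) listings, not taken from the dataset
listings = [(900_000, 2), (650_000, 1), (400_000, 1)]

# Every listing goes through the same formula, so the printed values are comparable
room_estimates = [estimated_rooms(bathrooms) for _, bathrooms in listings]
for (price, _), rooms in zip(listings, room_estimates):
    print(f"Price: {price}, Estimated Rooms: {rooms}")
```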




Now we will look at food prices around the world using the world food prices database.

In [10]:
# import pandas
import pandas as pd
# Read CSV
food_df = pd.read_csv('wfp_food_prices_database.csv')
food_data = food_df[['adm0_name', 'mp_price']]
# Calculate the average food price for each country
average_prices = food_data.groupby('adm0_name')['mp_price'].mean()
# Convert the per-country averages to a dictionary (country -> average price)
food_dict = average_prices.to_dict()
# Sort the dictionary by average prices and find the top 5 and bottom 5
sorted_prices = sorted(food_dict.items(), key=lambda x: x[1])
top_5 = sorted_prices[-5:]
bottom_5 = sorted_prices[:5]
# Printing the results
print("Top 5 countries with highest average food prices (not accounting for currency exchange):")
for country, price in reversed(top_5):
    print(f"{country}: {price:.2f}")
print("\nBottom 5 countries with lowest average food prices:")
for country, price in bottom_5:
    print(f"{country}: {price:.2f}")


Top 5 countries with highest average food prices (not accounting for currency exchange):
Somalia: 136534.46
United Republic of Tanzania: 102524.25
Colombia: 78511.15
Iran  (Islamic Republic of): 64883.91
Afghanistan: 35021.93

Bottom 5 countries with lowest average food prices:
Azerbaijan: 0.53
Italy: 0.67
Belarus: 0.96
Zimbabwe: 1.01
Timor-Leste: 1.49
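
Because the averages above are in each country's local currency, the ranking mostly reflects currency denominations rather than real cost. A minimal sketch of one way to normalize is shown below. It assumes the dataset carries a currency column (called `cur_name` here, which is an assumption), and the exchange rates and sample rows are made up for illustration.

```python
import pandas as pd

# Hypothetical rows; 'cur_name' is assumed to hold the currency of each price
sample = pd.DataFrame({
    "adm0_name": ["Country A", "Country A", "Country B", "Country B"],
    "cur_name": ["AAA", "AAA", "BBB", "BBB"],
    "mp_price": [1000.0, 3000.0, 2.0, 4.0],
})

# Made-up exchange rates: units of local currency per one US dollar
local_per_usd = {"AAA": 1000.0, "BBB": 2.0}

# Convert every price to a common currency before averaging
sample["price_usd"] = sample["mp_price"] / sample["cur_name"].map(local_per_usd)
average_usd = (
    sample.groupby("adm0_name")["price_usd"].mean().sort_values(ascending=False)
)
print(average_usd)
```

Even after conversion, averaging across different commodities and units of measure would still mix unlike quantities, so a fuller comparison would probably also group by commodity and unit.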


Finally, we will look at the overall cost of living using the cost of living index dataset, which compares countries by a single overall index covering all expenses.

In [11]:
# import pandas
import pandas as pd
# Read CSV
global_df = pd.read_csv('Cost_of_Living_Index_2022.csv')
global_data = global_df[['Unnamed: 1', 'Unnamed: 2']]
# Convert the Data from the CSV to a Dictionary
global_dict = {}
for index, row in global_data.iterrows():
  # General Entry into the Dictionary
  country = row['Unnamed: 1']
  cli = row['Unnamed: 2']
  global_dict[country] = cli

# Sort the countries by cost of living index (descending) and take the 5 highest and 5 lowest.
# One sort is enough; each item is a (country, index) tuple, so item[0] is the country name.
global_sort = sorted(global_dict.items(), key=lambda x: x[1], reverse=True)
most_expensive = global_sort[:5]
least_expensive = global_sort[-5:]

# Printing sorted list
print("Most Expensive Countries:")
for apt in most_expensive:
  print(f"Cost of Living Index: {apt[0]}")
print("Least Expensive Countries:")
for apart in least_expensive:
  print(f"Cost of Living Index: {apart[0]}")
print("\n")

Most Expensive Countries:
Cost of Living Index: Country
Cost of Living Index: Iceland
Cost of Living Index: Barbados
Cost of Living Index: Jersey
Cost of Living Index: Israel
Least Expensive Countries:
Cost of Living Index: Afghanistan
Cost of Living Index: Pakistan
Cost of Living Index: Bermuda
Cost of Living Index: Switzerland
Cost of Living Index: Norway
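
The output above lists `Country` as the most expensive entry and puts Bermuda, Switzerland, and Norway among the least expensive. That pattern is what one would expect if the file's real header row was read as data and the index values were compared as strings rather than numbers. The sketch below reproduces the effect on hypothetical data and shows one possible fix using `pd.to_numeric`. The country names and values are made up, and the layout of the real CSV is an assumption.

```python
import pandas as pd

# Hypothetical frame mimicking a load where the header row ended up in the data
raw = pd.DataFrame({
    "Unnamed: 1": ["Country", "Country A", "Country B", "Country C"],
    "Unnamed: 2": ["Cost of Living Index", "146.04", "94.86", "19.92"],
})

# String comparison: the header text sorts first and "146.04" sorts last
string_sorted = sorted(raw["Unnamed: 2"], reverse=True)
print(string_sorted)

# Convert to numbers; the header text becomes NaN and is dropped
raw["index_value"] = pd.to_numeric(raw["Unnamed: 2"], errors="coerce")
clean = raw.dropna(subset=["index_value"])

# Numeric sort, highest index first
ranked = clean.sort_values("index_value", ascending=False)
for country, value in zip(ranked["Unnamed: 1"], ranked["index_value"]):
    print(f"{country}: {value:.2f}")
```

If the real file does have its column names on the second row, passing `header=1` to `pd.read_csv` may be a simpler alternative, since pandas would then parse the index column as numeric directly.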


