This notebook presents the cost of living in multiple cities across various countries.
Let's look at which cities do better on cost of living and debt-to-income ratio, and therefore offer better living conditions.
In this analysis, I focus on the attributes below because they are among the most important factors that determine cost of living and economic balance.


1. Mortgage Interest Rate in Percentages (%)
2. Price per Square Meter to Buy Apartment in City Centre
3. Average Monthly Net Salary
4. International Primary School Fee
5. Basic (Electricity, Heating, Cooling, Water, Garbage) cost



In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
# Walk the input directory; if it holds several files, `inp` ends up as the last one visited.
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        inp=os.path.join(dirname, filename)

# Any results you write to the current directory are saved as output.

In [None]:
inpfile=pd.read_csv(inp)

#Subset the data with rows having attributes of interest
inp_subset=inpfile[inpfile[inpfile.columns.values[0]].isin([
    'Mortgage Interest Rate in Percentages (%), Yearly, for 20 Years Fixed-Rate',
    'Price per Square Meter to Buy Apartment in City Centre',
    'Average Monthly Net Salary (After Tax)','Gasoline (1 liter)', 
    'Internet (60 Mbps or More, Unlimited Data, Cable/ADSL)',
    'International Primary School, Yearly for 1 Child',
    'Basic (Electricity, Heating, Cooling, Water, Garbage) for 85m2 Apartment'])].reset_index().drop(columns=['index'])

#Rename the first column explicitly instead of mutating columns.values in place
inp_subset=inp_subset.rename(columns={inp_subset.columns[0]: 'Attribute'})

#Unpivot the city columns into rows: one row per (Attribute, City) pair
inp_subset=inp_subset.melt(id_vars='Attribute', var_name='City', value_name='Cost in EUR')

#Pivot back so we have a 'City' column and a separate column for each attribute (cost of living factor)
inp_subset=inp_subset.pivot_table(index=['City'],columns='Attribute',values='Cost in EUR',aggfunc='first').reset_index()
inp_subset=inp_subset[[
    'City',
    'Average Monthly Net Salary (After Tax)',
    'Gasoline (1 liter)',
    'International Primary School, Yearly for 1 Child',
    'Internet (60 Mbps or More, Unlimited Data, Cable/ADSL)',
    'Mortgage Interest Rate in Percentages (%), Yearly, for 20 Years Fixed-Rate',
    'Price per Square Meter to Buy Apartment in City Centre',
    'Basic (Electricity, Heating, Cooling, Water, Garbage) for 85m2 Apartment']]

inp_subset.columns=['City', 'Monthly Salary', 'Gasoline','Yearly School Fee', 'Internet', 'Mortgage %', 'House Price per sqm','Basic Amenities']

#Derive 'Country' from the last comma-separated part of the 'City' field
inp_subset['Country']=inp_subset['City'].apply(lambda x: x.split(',')[-1].strip(' '))

#Remove Country from City field
inp_subset['City']=inp_subset['City'].apply(lambda x: x.split(',')[0].strip(' '))

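#School fee as a percentage of annual salary
inp_subset['School Fee (% of Salary)']=inp_subset['Yearly School Fee']*100/(inp_subset['Monthly Salary']*12)
#Price of a 100 sqm apartment in the city centre
inp_subset['Cost of 100 sqm house']=inp_subset['House Price per sqm']*100
#Simplified yearly payment: the mortgage % is added once to the price and the total is spread over 20 years
inp_subset['Annual house payment']=((inp_subset['Cost of 100 sqm house']*inp_subset['Mortgage %']/100)+inp_subset['Cost of 100 sqm house'])/20
#Annual house payment as a percentage of annual salary
inp_subset['Housing Debt to income ratio']=inp_subset['Annual house payment']*100/(inp_subset['Monthly Salary']*12)
#Basic amenities cost is overwritten as a percentage of monthly salary
inp_subset['Basic Amenities']=inp_subset['Basic Amenities']*100/inp_subset['Monthly Salary']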
inp_subset.head()

We derived several cost-of-living attributes: school fee as a percentage of annual salary, the cost of a 100 sqm house (roughly a 2BHK of about 1,100 sq ft), the annual house payment (the major debt factor), which adds the mortgage percentage once to the purchase price and spreads the total over a fixed 20-year period, and the housing debt-to-income ratio. Basic amenities are likewise expressed as a percentage of monthly salary. Keep in mind that the housing payment is the only debt considered in this analysis, because we don't have data for any other debt factors.
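
To make the housing arithmetic concrete, here is a small worked example of the same simplified formula used in the cell above. The input figures (3,000 EUR per sqm, a 5% rate, a 2,500 EUR monthly salary) are invented for illustration and are not taken from the dataset, and the function name is hypothetical. The formula adds the interest rate once to the price and divides by 20 years, so it is a rough approximation rather than a true amortization schedule.

```python
def housing_debt_to_income(price_per_sqm, mortgage_pct, monthly_salary,
                           area_sqm=100, years=20):
    """Simplified housing debt-to-income ratio in %, mirroring the notebook's formula."""
    house_cost = price_per_sqm * area_sqm
    # Interest is applied once to the full price (a simplification, not amortization).
    total_payable = house_cost + house_cost * mortgage_pct / 100
    annual_payment = total_payable / years
    annual_salary = monthly_salary * 12
    return annual_payment * 100 / annual_salary


# Illustrative inputs only: 300,000 EUR house, 315,000 EUR payable, 15,750 EUR per year.
ratio = housing_debt_to_income(price_per_sqm=3000, mortgage_pct=5, monthly_salary=2500)
print(round(ratio, 1))  # 52.5
```

In this made-up case, a little over half of the annual salary would go to the house payment.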

In [None]:
sns.pairplot(inp_subset[['Monthly Salary', 'Yearly School Fee', 'Mortgage %', 'House Price per sqm','Basic Amenities','Housing Debt to income ratio']])

There appears to be a clear positive correlation between Monthly Salary and Yearly School Fee, and between Monthly Salary and House Price per sqm.
This is somewhat expected, because a population with higher income can afford higher-priced housing.
The relationship between Basic Amenities (as a share of salary) and Monthly Salary is interesting. Some cities have a high average salary and low basic amenities cost, while others have a low average salary and high amenities cost. Living conditions in the latter cities are likely to be harder.
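
The visual impression from the pair plot can be backed by numbers. The sketch below computes a Pearson correlation matrix with `DataFrame.corr()` on a toy frame whose values are invented for illustration; with the real data, the same call would be applied to the relevant columns of `inp_subset`.

```python
import pandas as pd

# Toy stand-in for inp_subset; the numbers are invented for illustration only.
toy = pd.DataFrame({
    'Monthly Salary':      [400, 900, 1500, 2800, 4000],
    'Yearly School Fee':   [1500, 4000, 7000, 12000, 18000],
    'House Price per sqm': [800, 1500, 2500, 4500, 7000],
})

# Pearson correlation between every pair of columns (values range from -1 to 1).
corr = toy.corr()
print(corr.round(2))
```

Values close to 1 indicate a strong positive linear relationship, which is what the scatter panels for salary against school fee and house price suggest.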

Let's draw box plots to look at the interquartile range (IQR) and outliers of each column.
For the attributes we considered:
1. 'Monthly Salary': 25% of the cities have a salary below ~500 EUR and 50% are below ~1000 EUR. The high-salary cities are spread from about 2500 to 4000 EUR, so there is a wide gap in salaries.
2. 'Mortgage %': 75% of the cities have a mortgage rate below 10% and 50% are below 5%, which is great. High mortgage rates push borrowers into higher debt, which can eventually contribute to an economic slowdown.
3. 'House Price per sqm': This is interesting because the prices fall into distinct groups. About 50% of the cities have house prices in the range 1000-2500 EUR per sqm, but the high-end cities differ significantly, with prices averaging around 7500 EUR per sqm.
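
The quartiles read off a box plot can also be computed directly. The sketch below uses `Series.quantile` and the common 1.5 x IQR rule (the default whisker length in seaborn box plots) to flag outliers; the salary values are invented for illustration and do not come from the dataset.

```python
import pandas as pd

# Invented salaries in EUR, for illustration only.
salary = pd.Series([300, 450, 500, 800, 1000, 1400, 2000, 2600, 3900, 9000])

# 25th, 50th and 75th percentiles (linear interpolation by default).
q1, median, q3 = salary.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are drawn as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = salary[(salary < lower_fence) | (salary > upper_fence)]
print(q1, median, q3, outliers.tolist())
```

On the real data, the same steps could be applied to a column such as `inp_subset['Monthly Salary']`.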



In [None]:
for columns in ['Monthly Salary', 'Mortgage %', 'House Price per sqm','Basic Amenities', 'Yearly School Fee','Housing Debt to income ratio']:
    plt.figure()
    sns.boxplot(y = columns, data = inp_subset)

I am interested in comparing cost-of-living conditions in India and the United States.
1. There is a large difference in average salary between India and the United States. Salaries in India are much lower than in the US, yet the mortgage rate is much higher.
2. Housing cost is higher in the US.
3. Basic amenities cost varies quite a bit between cities within India, whereas in the US it is almost the same across the country, which is great.
4. Education fees are higher in the US (these are probably private schools, since public schools are free), so India is the cheaper option for education.
5. The housing debt-to-income ratios for India and the US are fairly close: although salaries are lower in India, housing is cheaper too. Debt rises in the US because of the high cost of housing in the city centre.
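
A per-country summary table is a compact complement to the side-by-side box plots. The sketch below groups a toy frame by country and takes the median of each column; the rows are invented for illustration, and with the real data the same `groupby` could be applied to `inp_indiaandus`.

```python
import pandas as pd

# Toy rows standing in for inp_indiaandus; values are invented for illustration only.
toy = pd.DataFrame({
    'Country': ['India', 'India', 'India',
                'United States', 'United States', 'United States'],
    'Monthly Salary': [450, 520, 900, 3200, 3600, 4100],
    'Mortgage %': [9.0, 9.5, 8.8, 4.0, 4.2, 3.9],
})

# Median of each attribute per country; the median is less sensitive to outlier cities than the mean.
summary = toy.groupby('Country')[['Monthly Salary', 'Mortgage %']].median()
print(summary)
```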



In [None]:
inp_indiaandus=inp_subset[(inp_subset['Country'] == 'India') | (inp_subset['Country'] == 'United States')]

for columns in ['Monthly Salary', 'Mortgage %', 'House Price per sqm','Basic Amenities', 'Yearly School Fee','Housing Debt to income ratio']:
    plt.figure()
    sns.boxplot(x='Country' , y=columns, data = inp_indiaandus)


In [None]:
inp_india = inp_indiaandus[inp_indiaandus['Country'] == 'India']
inp_us = inp_indiaandus[inp_indiaandus['Country'] == 'United States']

#India
for columns in ['Monthly Salary', 'Mortgage %', 'House Price per sqm','Basic Amenities', 'Yearly School Fee','Housing Debt to income ratio']:
    plt.figure(figsize=(20,2))
    sns.scatterplot(x='City' , y=columns, data = inp_india)

Now, let's see whether individual cities tell us anything interesting in India and the United States.
There is certainly a significant difference between cities in India.
1. Gurgaon, a financial and industrial city close to New Delhi (the capital of India), has a high salary.
2. All other cities average around 500 EUR.
3. The mortgage rate does not differ significantly across cities.
4. Housing prices are very high in Mumbai, which pushes an individual into high debt because salaries there are not high either.
5. Yearly school fees are higher in cities with higher salary ranges.

In [None]:
#United States
for columns in ['Monthly Salary', 'Mortgage %', 'House Price per sqm','Basic Amenities', 'Yearly School Fee','Housing Debt to income ratio']:
    plt.figure(figsize=(20,2))
    sns.scatterplot(x='City' , y=columns, data = inp_us)

1. Look at how high Bay Area salaries are. Housing is somewhat expensive too, but it is still affordable given the low basic amenities cost and the medium debt factor.
2. On the other hand, New York has decent salaries, but because real estate is limited, housing prices are sky-high. Hence, the debt factor is high too.
3. Las Vegas, a gambling and vacation destination, has lower salaries.
4. I feel Seattle is a great place to move to, given its good salaries, low mortgage rate, reasonable house prices and living conditions. Last but not least, it is a beautiful city too.
5. Dallas, Chicago and San Diego are great options too.

Now, let's decide which US city is the best for me to move to.

To do that, I gather more data from the dataset for just the cities I am interested in (Seattle, Dallas, San Diego, Chicago).


In [None]:
#Load dataset and filter data for the cities I am interested in.
inp_cities=pd.read_csv(inp)
#Rename the first column explicitly instead of mutating columns.values in place
inp_cities=inp_cities.rename(columns={inp_cities.columns[0]: 'Attribute'})

#Unpivot the city columns into rows: one row per (Attribute, City) pair
inp_cities=inp_cities.melt(id_vars='Attribute', var_name='City', value_name='Cost in EUR')

#Pivot back so we have a 'City' column and a separate column for each attribute (cost of living factor)
inp_cities=inp_cities.pivot_table(index=['City'],columns='Attribute',values='Cost in EUR',aggfunc='first').reset_index()

#Derive 'Country' from the last comma-separated part of the 'City' field
inp_cities['Country']=inp_cities['City'].apply(lambda x: x.split(',')[-1].strip(' '))

#Remove Country from City field
inp_cities['City']=inp_cities['City'].apply(lambda x: x.split(',')[0].strip(' '))

inp_cities['School Fee (% of Salary)']=inp_cities['International Primary School, Yearly for 1 Child']*100/(inp_cities['Average Monthly Net Salary (After Tax)']*12)
inp_cities['Cost of 100 sqm house']=inp_cities['Price per Square Meter to Buy Apartment in City Centre']*100
inp_cities['Annual house payment']=((inp_cities['Cost of 100 sqm house']*inp_cities['Mortgage Interest Rate in Percentages (%), Yearly, for 20 Years Fixed-Rate']/100)+inp_cities['Cost of 100 sqm house'])/20
inp_cities['Housing Debt to income ratio']=inp_cities['Annual house payment']*100/(inp_cities['Average Monthly Net Salary (After Tax)']*12)
inp_cities['Basic Amenities']=inp_cities['Basic (Electricity, Heating, Cooling, Water, Garbage) for 85m2 Apartment']*100/inp_cities['Average Monthly Net Salary (After Tax)']

#Keep only the four candidate cities
inp_cities=inp_cities[inp_cities['City'].isin(['Dallas', 'Chicago', 'Seattle', 'San Diego'])]

In [None]:
inp_cities