# Extra Practice - Basics

In this optional practice session, I thought it would be fun to look at some cost of living data from, you guessed it, Kaggle: https://www.kaggle.com/stephenofarrell/cost-of-living

Here are the objectives:

1. Rename the "index" column to "location"
2. Utilise apply to generate two new columns from the location - city and country
3. Realise the easy solution doesn't work for the United States, and create a function for apply to remove the state from those locations.
4. Figure out which country has the most cities listed, and create a dataset from only that country
5. Sort that dataset by the 'Apartment (1 bedroom) in City Centre' column
6. Cry over housing prices if you live in the Bay Area.

After that, feel free to keep playing with the data yourself.


In [32]:
# Code to start you off and manipulate the data. .T is transpose - swap columns and rows
import pandas as pd

df = pd.read_csv("cost-of-living.csv", index_col=0).T.reset_index()
df.head()

Unnamed: 0,index,"Meal, Inexpensive Restaurant","Meal for 2 People, Mid-range Restaurant, Three-course",McMeal at McDonalds (or Equivalent Combo Meal),Domestic Beer (0.5 liter draught),Imported Beer (0.33 liter bottle),Coke/Pepsi (0.33 liter bottle),Water (0.33 liter bottle),"Milk (regular), (1 liter)",Loaf of Fresh White Bread (500g),...,Lettuce (1 head),Cappuccino (regular),"Rice (white), (1kg)",Tomato (1kg),Banana (1kg),Onion (1kg),Beef Round (1kg) (or Equivalent Back Leg Red Meat),Toyota Corolla 1.6l 97kW Comfort (Or Equivalent New Car),"Preschool (or Kindergarten), Full Day, Private, Monthly for 1 Child","International Primary School, Yearly for 1 Child"
0,"Saint Petersburg, Russia",7.34,29.35,4.4,2.2,2.2,0.76,0.53,0.98,0.71,...,0.86,1.96,0.92,1.91,0.89,0.48,7.18,19305.29,411.83,5388.86
1,"Istanbul, Turkey",4.58,15.28,3.82,3.06,3.06,0.64,0.24,0.71,0.36,...,0.61,1.84,1.3,0.8,1.91,0.62,9.73,20874.72,282.94,6905.43
2,"Izmir, Turkey",3.06,12.22,3.06,2.29,2.75,0.61,0.22,0.65,0.38,...,0.57,1.56,1.31,0.7,1.78,0.58,8.61,20898.83,212.18,4948.41
3,"Helsinki, Finland",12.0,65.0,8.0,6.5,6.75,2.66,1.89,0.96,2.27,...,2.3,3.87,2.13,2.91,1.61,1.25,12.34,24402.77,351.6,1641.0
4,"Chisinau, Moldova",4.67,20.74,4.15,1.04,1.43,0.64,0.44,0.68,0.33,...,0.84,1.25,0.93,1.56,1.37,0.59,5.37,17238.13,210.52,2679.3
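
To see what the starter code is doing, here is a minimal sketch on a toy frame. The name `raw` and the shortened item label "Meal for 2 People" are made up for illustration; the numbers are copied from the output above. It assumes the raw CSV has one row per item and one column per location, which is what the transpose implies.

```python
import pandas as pd

# Toy stand-in for the raw CSV: one row per item, one column per location.
raw = pd.DataFrame(
    {"Helsinki, Finland": [12.0, 65.0], "Izmir, Turkey": [3.06, 12.22]},
    index=["Meal, Inexpensive Restaurant", "Meal for 2 People"],
)

# .T swaps rows and columns, so each location becomes a row.
# reset_index() then moves the location labels out of the index and into a
# regular column, which pandas calls "index" because the index had no name.
tidy = raw.T.reset_index()
print(tidy.columns.tolist())
print(tidy)
```

That unnamed index is why the first objective is to rename the "index" column.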


## Rename column

In [33]:
df2 = df.rename(columns={"index": "location"})

## Get city and country

In [47]:
# Split on ", " at most once, expand it to a dataframe, and then assign each column like a tuple
df2[["City", "Country"]] = df2.location.str.split(", ", n=1, expand=True)
df2.head()

Unnamed: 0,location,"Meal, Inexpensive Restaurant","Meal for 2 People, Mid-range Restaurant, Three-course",McMeal at McDonalds (or Equivalent Combo Meal),Domestic Beer (0.5 liter draught),Imported Beer (0.33 liter bottle),Coke/Pepsi (0.33 liter bottle),Water (0.33 liter bottle),"Milk (regular), (1 liter)",Loaf of Fresh White Bread (500g),...,"Rice (white), (1kg)",Tomato (1kg),Banana (1kg),Onion (1kg),Beef Round (1kg) (or Equivalent Back Leg Red Meat),Toyota Corolla 1.6l 97kW Comfort (Or Equivalent New Car),"Preschool (or Kindergarten), Full Day, Private, Monthly for 1 Child","International Primary School, Yearly for 1 Child",City,Country
0,"Saint Petersburg, Russia",7.34,29.35,4.4,2.2,2.2,0.76,0.53,0.98,0.71,...,0.92,1.91,0.89,0.48,7.18,19305.29,411.83,5388.86,Saint Petersburg,Russia
1,"Istanbul, Turkey",4.58,15.28,3.82,3.06,3.06,0.64,0.24,0.71,0.36,...,1.3,0.8,1.91,0.62,9.73,20874.72,282.94,6905.43,Istanbul,Turkey
2,"Izmir, Turkey",3.06,12.22,3.06,2.29,2.75,0.61,0.22,0.65,0.38,...,1.31,0.7,1.78,0.58,8.61,20898.83,212.18,4948.41,Izmir,Turkey
3,"Helsinki, Finland",12.0,65.0,8.0,6.5,6.75,2.66,1.89,0.96,2.27,...,2.13,2.91,1.61,1.25,12.34,24402.77,351.6,1641.0,Helsinki,Finland
4,"Chisinau, Moldova",4.67,20.74,4.15,1.04,1.43,0.64,0.44,0.68,0.33,...,0.93,1.56,1.37,0.59,5.37,17238.13,210.52,2679.3,Chisinau,Moldova
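
The first five rows look fine, but it is worth checking what `n=1` does to a location with two commas. A small sketch on two location strings taken from this dataset (the `locations` and `parts` names are just for illustration):

```python
import pandas as pd

locations = pd.Series(["Helsinki, Finland", "Austin, TX, United States"])

# n=1 stops after the first ", ", so everything to the right of it
# ends up in the second column, state included.
parts = locations.str.split(", ", n=1, expand=True)
parts.columns = ["City", "Country"]
print(parts)
```

For Austin the "Country" value comes out as "TX, United States", which is the problem the next cell deals with.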


In [56]:
# However, there is an issue here, because United States locations include a state as well,
# e.g. "Austin, TX, United States", so we keep only the text after the last ", " as the country
df2.Country = df2.Country.apply(lambda x: x if "," not in x else x.split(", ")[-1])
df2.head()

Unnamed: 0,location,"Meal, Inexpensive Restaurant","Meal for 2 People, Mid-range Restaurant, Three-course",McMeal at McDonalds (or Equivalent Combo Meal),Domestic Beer (0.5 liter draught),Imported Beer (0.33 liter bottle),Coke/Pepsi (0.33 liter bottle),Water (0.33 liter bottle),"Milk (regular), (1 liter)",Loaf of Fresh White Bread (500g),...,"Rice (white), (1kg)",Tomato (1kg),Banana (1kg),Onion (1kg),Beef Round (1kg) (or Equivalent Back Leg Red Meat),Toyota Corolla 1.6l 97kW Comfort (Or Equivalent New Car),"Preschool (or Kindergarten), Full Day, Private, Monthly for 1 Child","International Primary School, Yearly for 1 Child",City,Country
0,"Saint Petersburg, Russia",7.34,29.35,4.4,2.2,2.2,0.76,0.53,0.98,0.71,...,0.92,1.91,0.89,0.48,7.18,19305.29,411.83,5388.86,Saint Petersburg,Russia
1,"Istanbul, Turkey",4.58,15.28,3.82,3.06,3.06,0.64,0.24,0.71,0.36,...,1.3,0.8,1.91,0.62,9.73,20874.72,282.94,6905.43,Istanbul,Turkey
2,"Izmir, Turkey",3.06,12.22,3.06,2.29,2.75,0.61,0.22,0.65,0.38,...,1.31,0.7,1.78,0.58,8.61,20898.83,212.18,4948.41,Izmir,Turkey
3,"Helsinki, Finland",12.0,65.0,8.0,6.5,6.75,2.66,1.89,0.96,2.27,...,2.13,2.91,1.61,1.25,12.34,24402.77,351.6,1641.0,Helsinki,Finland
4,"Chisinau, Moldova",4.67,20.74,4.15,1.04,1.43,0.64,0.44,0.68,0.33,...,0.93,1.56,1.37,0.59,5.37,17238.13,210.52,2679.3,Chisinau,Moldova
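
The objectives ask for a function to pass to apply, so here is one possible alternative to the lambda, written as a sketch rather than the intended solution. `split_location` is a hypothetical helper name, and it assumes every location has the city first and the country last, with an optional state in between.

```python
import pandas as pd


def split_location(location: str) -> pd.Series:
    """Return the first and last ", "-separated pieces as City and Country."""
    parts = location.split(", ")
    # parts[0] is the city, parts[-1] the country; any state in the middle is dropped.
    return pd.Series({"City": parts[0], "Country": parts[-1]})


toy = pd.DataFrame({"location": ["Izmir, Turkey", "Austin, TX, United States"]})

# Applying a function that returns a Series gives back a DataFrame,
# which can be assigned to two columns at once.
toy[["City", "Country"]] = toy["location"].apply(split_location)
print(toy)
```

This handles both steps in one pass, at the cost of being slower than the vectorised `str.split` on a large frame.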


## Figure out which country has the most cities

In [57]:
df2.Country.value_counts()

United States    13
India            11
Canada            8
Poland            6
Australia         5
                 ..
Latvia            1
Colombia          1
Iceland           1
Uzbekistan        1
New Zealand       1
Name: Country, Length: 82, dtype: int64

In [63]:
most_cities = df2.Country.value_counts().index[0]
most_cities

'United States'
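
Taking `.index[0]` works because `value_counts` sorts from most to least frequent by default. If you prefer something that says what it means, `idxmax` is another option. A quick sketch on a made-up Series:

```python
import pandas as pd

countries = pd.Series(["Turkey", "Turkey", "Finland", "Russia"])
counts = countries.value_counts()

# idxmax returns the label of the largest count, the same label that
# sits first in the sorted value_counts result.
top = counts.idxmax()
print(top, counts[top])
```

If two countries tie for first place, check which one you get rather than relying on the order.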

## Create a subset of only that country

In [65]:
df3 = df2[df2.Country == most_cities]
df3

Unnamed: 0,location,"Meal, Inexpensive Restaurant","Meal for 2 People, Mid-range Restaurant, Three-course",McMeal at McDonalds (or Equivalent Combo Meal),Domestic Beer (0.5 liter draught),Imported Beer (0.33 liter bottle),Coke/Pepsi (0.33 liter bottle),Water (0.33 liter bottle),"Milk (regular), (1 liter)",Loaf of Fresh White Bread (500g),...,"Rice (white), (1kg)",Tomato (1kg),Banana (1kg),Onion (1kg),Beef Round (1kg) (or Equivalent Back Leg Red Meat),Toyota Corolla 1.6l 97kW Comfort (Or Equivalent New Car),"Preschool (or Kindergarten), Full Day, Private, Monthly for 1 Child","International Primary School, Yearly for 1 Child",City,Country
38,"Austin, TX, United States",13.48,44.92,7.19,4.49,5.39,1.87,1.37,0.75,2.4,...,2.6,3.19,1.04,1.82,10.6,18743.38,890.67,17537.93,Austin,United States
47,"Boston, MA, United States",13.47,62.85,7.18,6.29,6.73,1.75,1.53,0.78,2.62,...,3.68,5.14,1.39,3.4,14.07,18411.02,1534.76,24642.48,Boston,United States
48,"Chicago, IL, United States",13.47,53.87,7.18,4.94,6.29,1.75,1.52,0.7,2.54,...,3.92,3.52,1.47,2.39,12.4,18842.65,1025.73,15972.64,Chicago,United States
53,"Dallas, TX, United States",13.48,44.92,6.51,4.49,5.39,1.69,1.42,0.64,2.2,...,3.54,2.87,1.07,1.99,9.9,17561.12,800.09,16891.52,Dallas,United States
62,"Houston, TX, United States",13.48,53.91,6.74,5.39,5.84,1.81,1.59,0.68,2.03,...,3.09,2.07,1.25,1.59,9.17,18944.36,806.88,18762.48,Houston,United States
69,"Las Vegas, NV, United States",13.47,53.87,7.18,5.39,6.29,1.45,1.06,0.7,1.81,...,2.98,2.99,1.32,1.79,10.05,19092.7,786.6,12507.1,Las Vegas,United States
71,"Los Angeles, CA, United States",13.47,58.36,7.18,5.39,6.29,2.03,1.59,0.87,2.99,...,3.25,2.98,1.81,1.82,11.6,18702.58,951.17,19335.55,Los Angeles,United States
78,"New York, NY, United States",17.97,76.37,8.09,6.29,7.19,1.85,1.59,1.04,3.33,...,5.7,5.33,2.17,3.07,13.56,18118.42,2106.38,34441.93,New York,United States
86,"Phoenix, AZ, United States",10.77,53.87,6.73,3.59,4.49,1.53,1.07,0.51,2.18,...,4.53,3.19,1.38,1.91,9.6,18257.94,678.35,13498.49,Phoenix,United States
88,"Portland, OR, United States",12.58,44.92,6.74,4.49,4.49,1.78,1.35,0.78,2.83,...,3.91,3.67,1.3,1.9,14.63,20494.34,1076.09,16387.06,Portland,United States
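
The filter above is a boolean mask: the comparison produces one True/False per row, and indexing with it keeps the True rows. If you plan to modify the subset afterwards, taking an explicit `.copy()` is a common way to make sure the original frame is left alone. A minimal sketch with made-up rows:

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "City": ["Austin", "Izmir", "Boston"],
        "Country": ["United States", "Turkey", "United States"],
    }
)

# One boolean per row: True where the country matches.
mask = toy["Country"] == "United States"

# .copy() gives an independent frame, so edits to it do not touch toy.
subset = toy[mask].copy()
subset["City"] = subset["City"].str.upper()
print(subset)
```

Note that the subset keeps the original row labels (0 and 2 here), just like the 38, 47, 48 labels in the output above.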


## Sort by housing accommodation

In [70]:
key = "Apartment (1 bedroom) in City Centre"
df4 = df3.sort_values(key, ascending=False)
df4[["location", key]]

Unnamed: 0,location,Apartment (1 bedroom) in City Centre
100,"San Francisco, CA, United States",3131.06
78,"New York, NY, United States",2854.26
47,"Boston, MA, United States",2275.95
71,"Los Angeles, CA, United States",1980.11
102,"Seattle, WA, United States",1919.17
99,"San Diego, CA, United States",1816.1
48,"Chicago, IL, United States",1702.25
38,"Austin, TX, United States",1581.66
88,"Portland, OR, United States",1399.8
53,"Dallas, TX, United States",1322.45
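
If you only want the top few rows, `nlargest` is a shorter route to the same place as sorting and taking the head. A sketch using three of the values from the table above (the `toy` frame is just for illustration):

```python
import pandas as pd

key = "Apartment (1 bedroom) in City Centre"
toy = pd.DataFrame(
    {
        "location": [
            "Dallas, TX, United States",
            "San Francisco, CA, United States",
            "New York, NY, United States",
        ],
        key: [1322.45, 3131.06, 2854.26],
    }
)

# Full sort, most expensive first.
ranked = toy.sort_values(key, ascending=False)

# Just the two most expensive rows, already in descending order.
top_two = toy.nlargest(2, key)
print(top_two[["location", key]])
```

Both approaches return new frames and leave `toy` in its original order.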
