# Regional Effects on Thermal Preferences

This notebook explores the indoor conditions that people in different parts of the world prefer. To do that, we prepare the dataset to focus on three types of information:

* Location where the trials took place (what the climate is like, who provided the data)
* Internal temperature and other conditions of the room itself
* Preference of those who were surveyed at that location (too hot, too cold)

This is just one of many potential areas to focus in on with this dataset, so feel free to edit this one or check out any of the other kernels!

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os, random, math, glob
from IPython.display import Image as IM
from IPython.display import clear_output
from matplotlib import pyplot as plt
%matplotlib inline
plt.rcParams['figure.figsize'] = [16, 10]
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

We start by reading in the data, holding it under the general name "data"

In [None]:
data = pd.read_csv('/kaggle/input/ashrae-global-thermal-comfort-database-ii/ashrae_db2.01.csv')
#reading in the data

# Re-organizing Columns
* We can start by focusing specifically on air temperature, so we will drop some of the humidity data and use mostly the raw values of air temperature
* We'll stick with Celsius for our measurements (the Fahrenheit measurements didn't appear to fill in any missing data anyway)
* We're going to generalize the climates a little better to simplify the type of environment we're looking at

I attached a description of each column below to show the different pieces of information available in the data set. For now, we're going to focus on a smaller subset of columns, but there is obviously a lot to glean from the other sections of the data set!

# Column Description Reference

- Thermal Sensation: ASHRAE thermal sensation vote, from -3 (cold) to +3 (hot)
- Thermal Sensation acceptability: 0 = unacceptable, 1 = acceptable (Do the participants find the temperature acceptable)
- Thermal preference: cooler, no changes, warmer (Would the participants want the room to be cooler, no change or warmer)
- Air Movement acceptability: 0 = unacceptable, 1 = acceptable (Does the air movement the participants feel indoors seem acceptable)
- Air movement preference: less, no change, more
- Thermal Comfort: From 1 (very uncomfortable) to 6 (very comfortable)
- PMV (Predicted Mean Vote): predicted mean response of a large group of people on the ASHRAE thermal sensation scale: +3 hot, +2 warm, +1 slightly warm, 0 neutral, -1 slightly cool, -2 cool, -3 cold
- PPD (Predicted Percentage of Dissatisfied): an index that predicts the percentage of thermally dissatisfied people who feel too cool or too warm; it is calculated from the predicted mean vote (PMV)
- SET (Standard Effective Temperature in Celsius degree): In simpler terms, SET is a temperature metric that factors in relative humidity, mean radiant temperature, and air velocity, while also considering the anticipated activity rate and clothing levels.
- CLO: Intrinsic clothing ensemble insulation of the subject (clo): Thermal Insulation provided by clothing
- MET: Average Metabolic rate of Subject
- activity_10: Metabolic Activity in the last 10 minutes
- activity_20: Metabolic Activity in the last 20 minutes
- activity_30: Metabolic Activity in the last 30 minutes
- activity_60: Metabolic Activity in the last 60 minutes
- Air temperature (C): Air temperature measured in the occupied zone in Celsius degree
- Ta_h (C): Air temperature at 1.1 m above the floor in Celsius degree
- Ta_m (C): Air temperature at 0.6 m above the floor in Celsius degree
- Ta_l (C): Air temperature at 0.1 m above the floor in Celsius degree
- Operative Temperature (C): Calculated operative temperature in the occupied zone in Celsius degree
- Radiant Temperature (C): Radiant temperature measured in the occupied zone in Celsius degree
- Globe Temperature (C): Globe temperature measured in the occupied zone in Celsius degree
- Tg_h (C): Globe temperature at 1.1 m above the floor in Celsius degree
- Tg_m (C): Globe temperature at 0.6 m above the floor in Celsius degree
- Tg_l (C): Globe temperature at 0.1 m above the floor in Celsius degree
- Relative Humidity:
- Humidity Preference: 
- Humidity Sensation: 3 = very dry, 2 = dry, 1 = slightly dry, 0 = just right, -1 = slightly humid, -2 = humid, -3 = very humid
- Air velocity (m/s): Air speed in meter/seconds
- Velocity_h: Air speed at 1.1 m above the floor in meter per second
- Velocity_m: Air speed at 0.6 m above the floor in meter per second
- Velocity_l: Air speed at 0.1 m above the floor in meter per second
- Subject's height
- Subject's weight
- Blind(curtain): State of blinds or curtains if known (0 = open, 1 = closed); otherwise NA
- Fan: Fan mode if known (0 = off, 1 = on); otherwise NA
- Window: State of window if known (0 = open, 1 = closed); otherwise NA
- Door: State of doors if known (0 = open, 1 = closed); otherwise NA
- Heater: Heater mode if known (0 = off, 1 = on); otherwise NA
- Outdoor monthly air temperature (C): Outdoor monthly average temperature when the field study was done in Celsius degree
- Database: Database 1 or 2

From reading these descriptions, there are certainly a few columns that we will likely not need for this query.
* That said, feel free to add these columns back as needed

In [None]:
drop_col = ['Publication (Citation)','Data contributor','activity_10','activity_20','activity_30','activity_60','Air temperature (F)','Operative temperature (F)','Ta_h (F)','Ta_m (F)','Ta_l (F)','Radiant temperature (F)','Globe temperature (F)',
    'Tg_h (F)','Tg_m (F)','Tg_l (F)','Tg_h (C)','Tg_m (C)','Tg_l (C)','Air velocity (fpm)','Velocity_h (m/s)','Velocity_m (m/s)','Velocity_l (m/s)','Velocity_h (fpm)','Velocity_m (fpm)','Velocity_l (fpm)','Outdoor monthly air temperature (F)','Database']
data = data.drop(drop_col,axis=1)

Another quick piece of cleaning is to generalize the climate column so that we get smaller, simpler groups of biomes
* We accomplish this by taking the Koppen climate classification of each data point and using its first letter to group the point into one of five categories

1. Tropical
2. Dry
3. Temperate
4. Continental
5. Polar

In [None]:
tropical_A = []
dry_B = []
temperate_C = []
continental_D = []
polar_E = []

for climate in data['Koppen climate classification'].unique():
    if climate[0] == 'A':
        tropical_A.append(climate)
    elif climate[0] == 'B':
        dry_B.append(climate)
    elif climate[0] == 'C':
        temperate_C.append(climate)
    elif climate[0] == 'D':
        continental_D.append(climate)
    elif climate[0] == 'E':
        polar_E.append(climate)

data.loc[data['Koppen climate classification'].isin(tropical_A), 'Climate'] = 'Tropical'
data.loc[data['Koppen climate classification'].isin(dry_B), 'Climate'] = 'Dry'
data.loc[data['Koppen climate classification'].isin(temperate_C),'Climate'] = 'Temperate'
data.loc[data['Koppen climate classification'].isin(continental_D), 'Climate'] = 'Continental'
data.loc[data['Koppen climate classification'].isin(polar_E),'Climate'] = 'Polar'
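
The same grouping can also be written without the intermediate lists. The sketch below is an alternative under stated assumptions, not the approach used above: it maps the first letter of each Koppen code to a group name and leaves missing or unrecognized codes as NaN. The toy codes and the `KOPPEN_GROUPS` name are made up for illustration.

```python
import pandas as pd

# First letter of a Koppen code -> broad climate group (the same five groups as above)
KOPPEN_GROUPS = {
    'A': 'Tropical',
    'B': 'Dry',
    'C': 'Temperate',
    'D': 'Continental',
    'E': 'Polar',
}

# Toy stand-in for data['Koppen climate classification']; the codes are illustrative
codes = pd.Series(['Af', 'BSh', 'Cfa', 'Dfb', 'ET', None])

# .str[0] takes the first character and propagates missing values instead of raising
climate = codes.str[0].map(KOPPEN_GROUPS)
print(climate.tolist())
```

On the real frame this would read `data['Climate'] = data['Koppen climate classification'].str[0].map(KOPPEN_GROUPS)`, assuming the column holds strings.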

In [None]:
data.head(50)

Now that we have the data we want and an idea of what it looks like, we can start cleaning and tuning it to focus in on the reported temperatures and thermal sensations.

In [None]:
import missingno as msno

msno.matrix(data.select_dtypes(include='number'));
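
The matrix is easier to read next to actual numbers. As a rough sketch, the share of missing values per column can be computed directly with pandas; the toy frame below stands in for `data` and its values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for `data`; values are made up for illustration
toy = pd.DataFrame({
    'Air temperature (C)': [22.5, np.nan, 24.0, np.nan],
    'Ta_h (C)': [np.nan, 23.1, np.nan, np.nan],
    'Thermal sensation': [0.0, 1.0, np.nan, -1.0],
})

# isnull() gives a True/False frame; the column mean of that is the missing fraction
missing_pct = (toy.isnull().mean() * 100).sort_values(ascending=False)
print(missing_pct)
```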

* We're missing some temperature data.
* To salvage those points, we could use all the various readings to fill one temperature column.
* The SET metric is a really interesting one, as it factors in clothing, air speed, metabolic rate, and relative humidity to get a standardized value
* It may be worth exploring some of these other metrics in another notebook

* For this analysis, however, let's start with just the standard **"Air Temperature"** data

In [None]:
data['Air temperature (C)'].isnull().sum()
# we're missing a decent chunk of the data

There are two ways we can go with this:

1. Continue with this collection of air temperature data, dropping the data points that are missing a value
2. Fill in these data points using other values (as much as we can), giving us a larger but slightly more generalized dataset

Let's try option 2, but we can be picky about which values we actually use, since we still have ~100,000 filled data points.

* The Fahrenheit values could be useful, but you'll find that rows with Fahrenheit values also have Celsius values, so no dice

* Notice that at the bottom of the missing values matrix there is a large section that has temperature data in the "Ta_h (C)" column but is missing air temperature. "Ta_h (C)" gives us the air temperature at 1.1 m above the floor, which seems like a reasonable substitute for our standard air temperature value.

In [None]:
# where air temperature is missing, fall back to the reading at 1.1 m ('Ta_h (C)')
data['Air temperature (C)'] = data['Air temperature (C)'].fillna(data['Ta_h (C)'])

data = data.dropna(how='any', subset=['Air temperature (C)'])
#at this point, we can drop any row that's missing temperature data

msno.matrix(data);
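
If more rows need salvaging, the same idea extends to several fallback columns. This is only a sketch: the priority order (occupied-zone reading first, then the 1.1 m and 0.6 m readings) is an assumption of mine rather than something the dataset prescribes, and the toy values are invented.

```python
import numpy as np
import pandas as pd

# Toy rows: each one has its first usable reading in a different column
toy = pd.DataFrame({
    'Air temperature (C)': [22.5, np.nan, np.nan, np.nan],
    'Ta_h (C)': [np.nan, 23.1, np.nan, np.nan],
    'Ta_m (C)': [np.nan, 22.8, 21.9, np.nan],
})

# Assumed priority order, most trusted column first
priority = ['Air temperature (C)', 'Ta_h (C)', 'Ta_m (C)']

# bfill(axis=1) pulls the first non-missing value in each row into the leftmost column
combined = toy[priority].bfill(axis=1).iloc[:, 0]
print(combined.tolist())
```

Rows with no reading in any of the listed columns stay NaN and can still be dropped afterwards.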

Now let's take a look at the preference data reported for each of these locations. The relevant columns are:

**Thermal Sensation**
* Thermal Sensation data tells us whether the subject found the room too warm, too cold, or alright
* The scale is -3.0 to 3.0, where -3 is the room is way too cold, 3 the room is way too hot, and 0 means it is acceptable

**Thermal Preference**
* A more straightforward, qualitative value. It asks: would you like the room to be warmer, cooler, or no change?

**Thermal Sensation Acceptability**
* Simplest metric, boolean for if the room's temperature is acceptable: 1 = acceptable, 0 = unacceptable

**Thermal Comfort**
* Allowed the subject to rate their comfort from 1 (very uncomfortable) to 6 (very comfortable)

Thermal Sensation gives us the most information, so let's see how complete of a dataset we can get with that column

In [None]:
data['Thermal sensation'].isnull().sum()

Thermal Sensation is missing some values

* From the missing value matrix it looks like Thermal sensation acceptability is missing even more
* Thermal Preference appears to be missing most of the same values as well

Thermal comfort appears to hold data in all the places where thermal sensation is missing values. Perhaps they asked the person to rate their comfort instead of sensation for those trials?

It's a little tricky to use comfort to fill in for sensation, since if the subject was uncomfortable, we don't know whether they found the room too hot or too cold. Let's look at the responses we got for each column:

In [None]:
data['Thermal sensation'].value_counts()

In [None]:
data['Thermal comfort'].value_counts()

We see in both columns that the most common response is the person is comfortable, with slightly uncomfortable being the next most common. I think it's reasonable to say if someone responded with a 5 or a 6 for comfort, meaning they are comfortable or very comfortable with the room's temperature, then they would have responded with a 0 for sensation as well. Let's use that to fill in some of those missing points

* Note: To do this, we have to fix up the thermal comfort column a bit, since the column is currently filled with 'objects' (they look like strings). We need these values to be floats to merge with thermal sensation

In [None]:
data['Thermal comfort'].describe()

In [None]:
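# Optional fill, left disabled. The comparison needs its own parentheses because & binds more tightly than >=
#comfort = pd.to_numeric(data['Thermal comfort'], errors='coerce')
#data.loc[data['Thermal sensation'].isnull() & (comfort >= 5.0), 'Thermal sensation'] = 0.0
#data['Thermal sensation'].isnull().sum()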
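
To show how that fill would behave, here is a minimal sketch on a toy frame. It assumes comfort ratings of 5 or 6 imply a neutral sensation (the rule proposed above) and that the comfort column may contain strings that do not parse as numbers; both the toy values and the `'Na'` entry are invented.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'Thermal sensation': [np.nan, np.nan, 1.0, np.nan],
    # stored as strings, including one entry that is not a number
    'Thermal comfort': ['6', '2', '5', 'Na'],
})

# errors='coerce' turns unparseable strings into NaN instead of raising
comfort = pd.to_numeric(toy['Thermal comfort'], errors='coerce')

# Only touch rows where sensation is missing AND comfort is 5 or 6
mask = toy['Thermal sensation'].isnull() & (comfort >= 5.0)
toy.loc[mask, 'Thermal sensation'] = 0.0
print(toy['Thermal sensation'].tolist())
```

Existing sensation votes are never overwritten, and rows with low or unreadable comfort ratings stay missing.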

In [None]:
data['Thermal sensation'].describe()
# standard deviation is ~1.25
# if we want to fill these data points, we could potentially map thermal preference to a corresponding sensation value:
# 'warmer'    : -1.25
# 'cooler'    :  1.25
# 'no change' :  0.0
# some code that may help with this:
# data['Thermal sensation'] = data['Thermal sensation'].fillna(data['Thermal preference'].map({'no change': 0.0, 'warmer': -1.25, 'cooler': 1.25}))

# Thermal sensation is still missing about 10,000 data points, so keep that in mind
# for now, let's drop the missing values
# Let's call this data set "dataTemp" (.copy() so later column assignments don't hit a view of `data`)
dataTemp = data.dropna(how='any', subset=['Thermal sensation']).copy()
dataTemp['Thermal sensation'].isnull().sum()
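
For anyone who would rather fill than drop, the preference mapping described in the comments above could look roughly like this. The ±1.25 values come from those comments (about one standard deviation); the toy frame and the `PREFERENCE_TO_SENSATION` name are illustrative assumptions.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'Thermal sensation': [np.nan, 2.0, np.nan, np.nan],
    'Thermal preference': ['warmer', 'cooler', 'no change', np.nan],
})

# Wanting the room warmer implies it currently feels cool, hence the negative value
PREFERENCE_TO_SENSATION = {'warmer': -1.25, 'no change': 0.0, 'cooler': 1.25}

# fillna only touches missing votes; reported votes are kept as they are
filled = toy['Thermal sensation'].fillna(
    toy['Thermal preference'].map(PREFERENCE_TO_SENSATION)
)
print(filled.tolist())
```

Rows missing both sensation and preference remain NaN, so a final `dropna` would still be needed.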

In [None]:
import seaborn as sns

sns.violinplot(x = dataTemp['Thermal sensation'].round(0), y=dataTemp['Air temperature (C)'])

A couple of things we can glean from this violin plot of our data:

1. As expected, higher air temperatures led to higher thermal sensation values
2. There are some potential outliers that we should handle

Notice that there are air temperature values below 10°C, and some values above 40°C. Even if these are legitimate results, they're not very useful to us. It seems unlikely that a building would ever be intentionally kept at those temperatures by preference, regardless of regional factors.

There are various definitions for the 'acceptable' range of temperatures of a building. In its 2018 Housing and Health Guidelines, the World Health Organization proposed 18°C as a minimum for a "safe and well-balanced indoor temperature to protect the health of general populations." On the other end, maximum acceptable indoor temperatures ranged from 25-30°C. We could potentially limit our data to temperatures in that range, but for now, let's stick with a more general approach.

For the sake of preserving data while still removing outliers, let's focus on air temperature values between **10°C and 40°C**, adding a little rounding to avoid removing too much

In [None]:
dataTemp = dataTemp[dataTemp['Air temperature (C)'].round(0) >= 10.0]
dataTemp = dataTemp[dataTemp['Air temperature (C)'].round(0) <= 40.0]

dataTemp['Air temperature (C)'].describe()
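
The effect of rounding before comparing is easiest to see on a few edge values. In this sketch (toy temperatures, invented for illustration) the two filters are combined into one inclusive range check with `Series.between`:

```python
import pandas as pd

# Toy temperatures sitting just inside and just outside the 10-40 range
toy = pd.DataFrame({'Air temperature (C)': [9.4, 9.6, 22.0, 40.4, 40.6]})

# Round first, then keep rows whose rounded value lies in [10, 40] (both ends inclusive)
rounded = toy['Air temperature (C)'].round(0)
kept = toy[rounded.between(10.0, 40.0)]
print(kept['Air temperature (C)'].tolist())
```

Here 9.6 and 40.4 survive because they round to 10 and 40, while 9.4 and 40.6 are removed.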

This removed a few thousand data points, but again, those are pretty extreme temperatures for a building. Feel free to expand or condense the range as you see fit!

Now that we have the data focused in a little bit, we can start breaking it into regional data to start to understand how different parts of the world prefer their indoor temperatures!

We can start by looking at where our data comes from...

In [None]:
dataTemp['Country'].value_counts().plot(kind='barh', figsize=(20,6))

We can see that a large portion of our data comes from the UK, India, USA, Australia, Brazil, and China. This is a pretty good mix of different parts of the world, although it might be worth looking at any data from the Middle East or Sub-Saharan Africa to cover more regions. That said, this is a good starting point.

We can start our comparisons by focusing in on these six nations. (Feel free to change/add code to grab data from other countries or group countries together)

In [None]:
uk = dataTemp[dataTemp['Country'] == "UK"]
india = dataTemp[dataTemp['Country'] == "India"]
usa = dataTemp[dataTemp['Country'] == "USA"]
australia = dataTemp[dataTemp['Country'] == "Australia"]
brazil = dataTemp[dataTemp['Country'] == "Brazil"]
china = dataTemp[dataTemp['Country'] == "China"]

With the data now sectioned off, we can do some quick correlation checks to see if any non-temperature columns also seem to significantly affect thermal sensation. This will help with any tuning we want to do.

In [None]:
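def checkCorr(df):
    # restrict to numeric columns; recent pandas versions raise if corr() meets text columns
    return df.select_dtypes(include='number').corr()['Thermal sensation'].sort_values(ascending=False).head(10)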

checkCorr(usa)

Air temperature and other temperature measurements naturally have the strongest correlations with thermal sensation for each of the data sets. There are a few other columns to keep in mind:

* Predicted Mean Vote (PMV): This column attempts to predict thermal sensation, but it looks like Air temperature alone does a better job in most cases
* Standard Effective Temperature (SET): This metric is a weighted temperature that factors in humidity, air speed, clothing, and metabolic rate. Could be something interesting to explore in the future
* Fan/Window: These mark whether a fan is used in the room or a window is open. These definitely influence how a room feels, but could there be a placebo effect that makes people less likely to call a room too warm if a fan is on?

# Calculating Stats by Country

Now that the data is segmented, we can start calculating some basic stats for each country

In [None]:
countries = [uk,india,usa,australia,brazil,china]

In [None]:
c = []
temp = []
sens = []
for country in countries:
    c.append(country["Country"].iloc[0])
    temp.append(country["Air temperature (C)"].mean())
    sens.append(country["Thermal sensation"].mean())

meanDF=pd.DataFrame()
meanDF["Country"] = c
meanDF["Avg Room Temp"] = temp
meanDF["Avg Thermal Sensation"] = sens
meanDF.head(6)

# meanData = pd.DataFrame(np.array([c, temp, sens]),columns=['Country', 'Mean Room Temperature (C)', 'Mean Thermal Sensation'])
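
The same table can be produced in one step with `groupby`, which also makes it easy to change the list of countries. This is a sketch on toy rows (invented values), not a replacement for the cell above:

```python
import pandas as pd

toy = pd.DataFrame({
    'Country': ['UK', 'UK', 'USA', 'USA', 'India'],
    'Air temperature (C)': [21.0, 23.0, 22.0, 24.0, 28.0],
    'Thermal sensation': [0.0, 1.0, -1.0, 0.0, 1.0],
})

selected = ['UK', 'USA']  # countries to compare; edit as needed

# Filter to the selected countries, then average both columns per country
means = (
    toy[toy['Country'].isin(selected)]
    .groupby('Country')[['Air temperature (C)', 'Thermal sensation']]
    .mean()
    .rename(columns={
        'Air temperature (C)': 'Avg Room Temp',
        'Thermal sensation': 'Avg Thermal Sensation',
    })
)
print(means)
```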


This is obviously very crude, but nonetheless interesting. A few notes from this generalized way of looking at the averages:

1. UK participants had the lowest average temperature, yet still had the highest average thermal sensation
2. The USA was the only group that leaned towards their room being too cold rather than too warm
3. Even with slight differences in temperature, Brazil, China, and Australia had roughly the same average sensation

It might be interesting to see what temperatures triggered certain thermal sensation responses. For example, a response above 1.0 represents someone finding a room more than a little too warm. What temperature elicits this response? Is it different in different populations? To investigate this, I decided to break up sensation responses into 5 groups

1. 0.0 (Room temperature is acceptable)
2. 0 to 1.0 (Room is a little too warm)
3. -1.0 to <0 (Room is a little too cold)
4. 1.0+ (Room is noticeably warm)
5. <-1.0 (Room is noticeably cold)

I set the cutoff at +/- 1.0 because aside from 0.0, 1.0 and -1.0 were two of the most common responses. This suggested to me that most participants held back from giving a response outside that range unless the room's condition was noticeably uncomfortable. With these ranges set, I decided to look at the median temperature rather than the mean, since I didn't want outliers shifting the value significantly. This will hopefully give an idea of what temperature it typically takes to elicit different sensation responses in different parts of the world.

* I also decided to find how the responses were distributed for each country by calculating the ratio of the total responses that each type of response had. This will give some context to the median values and offer some insight into how often each group picked each type of response.

In [None]:
med_0to1 = []
med_1to0 = []
med_gt1 = []
med_lt1 = []
med_0 = []

for country in countries:
    med_0to1.append(country[(country["Thermal sensation"] <= 1.0) & (country["Thermal sensation"] > 0.0)]["Air temperature (C)"].median())
    med_1to0.append(country[(country["Thermal sensation"] >= -1.0) & (country["Thermal sensation"] < 0.0)]["Air temperature (C)"].median())
    med_gt1.append(country[country["Thermal sensation"] > 1.0]["Air temperature (C)"].median())
    med_lt1.append(country[country["Thermal sensation"] < -1.0]["Air temperature (C)"].median())
    med_0.append(country[country["Thermal sensation"] == 0.0]["Air temperature (C)"].median())
    
pct_0to1 = []
pct_1to0 = []
pct_gt1 = []
pct_lt1 = []
pct_0 = []

for country in countries:
    pct_0to1.append(country[(country["Thermal sensation"] <= 1.0) & (country["Thermal sensation"] > 0.0)].shape[0]/country.shape[0] * 100)
    pct_1to0.append(country[(country["Thermal sensation"] >= -1.0) & (country["Thermal sensation"] < 0.0)].shape[0]/country.shape[0] * 100)
    pct_gt1.append(country[country["Thermal sensation"] > 1.0]["Air temperature (C)"].shape[0]/country.shape[0] * 100)
    pct_lt1.append(country[country["Thermal sensation"] < -1.0]["Air temperature (C)"].shape[0]/country.shape[0] * 100)
    pct_0.append(country[country["Thermal sensation"] == 0.0]["Air temperature (C)"].shape[0]/country.shape[0] * 100)
    
meanDF["Median Temp 0 to 1.0"] = med_0to1
meanDF["Median Temp -1.0 to 0"] = med_1to0
meanDF["Median Temp > 1.0"] = med_gt1
meanDF["Median Temp < -1.0"] = med_lt1
meanDF["Median Temp 0.0"] = med_0

meanDF["Pct Temp 0 to 1.0"] = pct_0to1
meanDF["Pct Temp -1.0 to 0"] = pct_1to0
meanDF["Pct Temp > 1.0"] = pct_gt1
meanDF["Pct Temp < -1.0"] = pct_lt1
meanDF["Pct Temp 0.0"] = pct_0

meanDF.head(6)
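
The five-band breakdown can also be written as a label column plus a `groupby`, which avoids keeping ten parallel lists in sync. The sketch below is one possible formulation under the band definitions above; the `sensation_band` helper and the toy rows are hypothetical, and it assumes missing sensation votes have already been dropped.

```python
import numpy as np
import pandas as pd


def sensation_band(s):
    """Label each vote with one of the five bands used above.

    np.select takes the first matching condition, so order matters.
    Assumes `s` has no missing values (NaN would fall into the default band).
    """
    conditions = [s < -1.0, s < 0.0, s == 0.0, s <= 1.0]
    labels = ['< -1.0', '-1.0 to 0', '0.0', '0 to 1.0']
    return pd.Series(np.select(conditions, labels, default='> 1.0'), index=s.index)


toy = pd.DataFrame({
    'Country': ['UK', 'UK', 'UK', 'USA', 'USA', 'USA'],
    'Thermal sensation': [-2.0, 0.0, 1.0, -1.0, 0.0, 2.0],
    'Air temperature (C)': [18.0, 21.0, 24.0, 20.0, 23.0, 27.0],
})
toy['Band'] = sensation_band(toy['Thermal sensation'])

# Median temperature per country and band (NaN where a country has no votes in a band)
medians = toy.groupby(['Country', 'Band'])['Air temperature (C)'].median().unstack()

# Share of each country's responses falling in each band, in percent
shares = toy.groupby('Country')['Band'].value_counts(normalize=True).unstack(fill_value=0) * 100

print(medians)
print(shares)
```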

In [None]:
df = pd.DataFrame({'Avg Room Temp': temp,'% Positive': pct_0}, index=c)
axes = df.plot.bar(rot=0, subplots=True)
axes[1].legend(loc=0)

Some more interesting notes from this:

* Australia and the US had very similar median temperatures for each of these responses, and roughly similar distributions of each type of response. The UK responded similarly, although their temperatures are shifted slightly lower. It's worth noting that Australia and the US had almost identical average room temperatures in the data, and that both countries have a wide range of climates. Could these similar conditions be the reason that the two nations seem to have similar preferences?
* China had by far the largest range between temperatures that caused subjects to say the room was significantly warm or cold. It took a median of 19.7°C for subjects to say the room was too cold, and 29.4°C to say the room was too warm. That leaves a roughly 10°C range that did not seem to frequently make subjects uncomfortable. Is it a cultural difference or just happenstance?
* Brazil's subjects seemed to be pretty comfortable. Only around 10% of subjects from Brazil thought the room was significantly too warm or too cold. In fact, just over 50% thought the room temperature was just right, even though the average room temperature for trials in Brazil was greater than the median "too hot" temperature for the UK, USA, and Australia

# Next Steps

With these basic observations on the thermal preferences of different regions, there are a few different applications that might be interesting:

1. Could we use this data to find a useful way to predict thermal sensation? The PMV (Predicted Mean Vote) column attempts to do this, but we found that air temperature alone correlates more strongly with thermal sensation than PMV does. PMV also gives a decimal, while the vast majority of our thermal sensation responses are integers. Could we improve upon this prediction model or develop a newer, better one that factors in the subject's location?

2. Can we take an even closer zoom in on these segmented data sets? Perhaps if we have enough data, we could compare region by region or city by city? For example, do New York and Phoenix exhibit the same differences we saw between India and the UK? Or is there something besides local climate or average room temperature that drives thermal sensation?


# Option 1: Predicting
Let's start with Option 1. We can check how good the estimates made using PMV were using root mean squared error

In [None]:
country = uk #change as needed
in_cols = country['PMV']
in_cols



from sklearn.metrics import mean_squared_error
from math import sqrt
country1 = country.dropna(subset=['PMV','Thermal sensation'])
#np.any(np.isnan(country['Thermal sensation']))
print("RMSE:", sqrt(mean_squared_error(country1['Thermal sensation'][-1000:], country1['PMV'][-1000:])))
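
For reference, RMSE is the square root of the mean squared difference between the actual and predicted votes. A tiny worked example in plain Python, with made-up votes, shows the arithmetic that `mean_squared_error` and `sqrt` perform above:

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))


# Made-up votes and predictions, for illustration only
votes = [0.0, 1.0, -1.0, 2.0]
pmv = [0.5, 0.5, -0.5, 1.0]

# Squared errors are 0.25, 0.25, 0.25 and 1.0, so the mean is 0.4375
print("RMSE:", rmse(votes, pmv))
```

Because errors are squared before averaging, a single large miss (the last pair here) weighs more than several small ones.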


Now let's try making predictions using the medians and country-specific distribution we found.

In [None]:
med_0to1 = country[(country["Thermal sensation"] <= 1.0) & (country["Thermal sensation"] > 0.0)]["Air temperature (C)"].median()
med_1to0 = country[(country["Thermal sensation"] >= -1.0) & (country["Thermal sensation"] < 0.0)]["Air temperature (C)"].median()
med_gt1 = country[(country["Thermal sensation"] > 1.0)]["Air temperature (C)"].median()
med_lt1 = country[(country["Thermal sensation"] < -1.0)]["Air temperature (C)"].median()
med_0 = country[(country["Thermal sensation"] == 0.0)]["Air temperature (C)"].median()

pct_0to1 = country[(country["Thermal sensation"] <= 1.0) & (country["Thermal sensation"] > 0.0)].shape[0]/country.shape[0]
pct_1to0 = country[(country["Thermal sensation"] >= -1.0) & (country["Thermal sensation"] < 0.0)].shape[0]/country.shape[0]
pct_0 = country[country["Thermal sensation"] == 0.0]["Air temperature (C)"].shape[0]/country.shape[0]
leftLine = med_0 - ((pct_1to0 / (pct_0 + pct_1to0)) * (med_0 - med_1to0))
rightLine = med_0 + ((pct_0to1 / (pct_0 + pct_0to1)) * (med_0to1 - med_0))


prediction = []
for temp in country['Air temperature (C)']:
    if(temp > med_gt1):
        prediction.append(2.0)
    elif(temp > rightLine):
        prediction.append(1.0)
    elif(temp > leftLine):
        prediction.append(0.0)
    elif(temp > med_lt1):
        prediction.append(-1.0)
    else:
        prediction.append(-2.0)

country = country.copy()  # work on a copy so the new column isn't written to a slice of dataTemp
country['Prediction'] = prediction
print("RMSE:", sqrt(mean_squared_error(country['Thermal sensation'][-1000:], country['Prediction'][-1000:])))
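
The cut-point logic above can be packaged into two small functions, which makes it easier to reuse for another country or city. This is a sketch of the same rule, not a tuned model: the function names are my own, and the medians and response shares passed in below are hypothetical numbers rather than results from the dataset.

```python
def weighted_cut(med_neutral, med_neighbor, pct_neutral, pct_neighbor):
    """Cut point between the neutral band and a neighboring band.

    Starts at the neutral median and moves toward the neighbor's median
    in proportion to the neighbor's share of the two bands' responses.
    This matches the leftLine / rightLine expressions above.
    """
    weight = pct_neighbor / (pct_neutral + pct_neighbor)
    return med_neutral + weight * (med_neighbor - med_neutral)


def predict_sensation(temp_c, med_lt1, left_line, right_line, med_gt1):
    """Map an air temperature to an integer vote from -2 to 2."""
    if temp_c > med_gt1:
        return 2.0
    if temp_c > right_line:
        return 1.0
    if temp_c > left_line:
        return 0.0
    if temp_c > med_lt1:
        return -1.0
    return -2.0


# Hypothetical medians (deg C) and response shares, for illustration only
left_line = weighted_cut(23.0, 21.0, 0.50, 0.25)   # 23 - (1/3) * 2
right_line = weighted_cut(23.0, 25.0, 0.50, 0.25)  # 23 + (1/3) * 2

votes = [
    predict_sensation(t, 19.0, left_line, right_line, 27.0)
    for t in (18.0, 20.0, 23.0, 25.0, 28.0)
]
print(votes)
```

One design consequence worth noting: the larger a neighboring band's share of responses, the further its cut point moves away from the neutral median, so the neutral zone narrows for populations that rarely vote 0.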

        

We can see that the RMSE is slightly higher, but what is nice is that this prediction model does essentially give each data point a verdict on what subjects in the room would likely think of the temperature. Instead of providing a decimal point between values, this model gives an integer from -2 to 2. We could interpret this as:

* -2: Room is uncomfortably cold
* -1: Room is slightly chilly
* 0: Room is a good temperature
* 1: Room is slightly warm
* 2: Room is uncomfortably hot

Considering that the RMSE was still very similar to the PMV values despite the crudeness of this model, it definitely seems that grouping by region provided some value. If we did some missing value handling, we could try some ensemble methods to find an actual fit for the sensation column. That could be a great next step!

# Option 2: City by City


In [None]:
country = usa
country['City'].value_counts().plot(kind='barh', figsize=(20,6))

Honolulu and Philly definitely have differing climates, so let's compare their results with a couple of other cities!

In [None]:

honolulu = country[country['City'] == "Honolulu"]
philly = country[country['City'] == "Philadelphia"]
sanfran = country[country['City'] == "San Francisco"]
grandrap = country[country['City'] == "Grand Rapids"]

cities = [honolulu,philly,sanfran,grandrap]
c2 = []
temp2 = []
sens2 = []
for city in cities:
    c2.append(city["City"].iloc[0])
    temp2.append(city["Air temperature (C)"].mean())
    sens2.append(city["Thermal sensation"].mean())

meanDF=pd.DataFrame()
meanDF["City"] = c2
meanDF["Avg Room Temp"] = temp2
meanDF["Avg Thermal Sensation"] = sens2
meanDF.head(4)

In [None]:
med_0to1 = []
med_1to0 = []
med_gt1 = []
med_lt1 = []
med_0 = []

for city in cities:
    med_0to1.append(city[(city["Thermal sensation"] <= 1.0) & (city["Thermal sensation"] > 0.0)]["Air temperature (C)"].median())
    med_1to0.append(city[(city["Thermal sensation"] >= -1.0) & (city["Thermal sensation"] < 0.0)]["Air temperature (C)"].median())
    med_gt1.append(city[city["Thermal sensation"] > 1.0]["Air temperature (C)"].median())
    med_lt1.append(city[city["Thermal sensation"] < -1.0]["Air temperature (C)"].median())
    med_0.append(city[city["Thermal sensation"] == 0.0]["Air temperature (C)"].median())
    
pct_0to1 = []
pct_1to0 = []
pct_gt1 = []
pct_lt1 = []
pct_0 = []

for city in cities:
    pct_0to1.append(city[(city["Thermal sensation"] <= 1.0) & (city["Thermal sensation"] > 0.0)].shape[0]/city.shape[0] * 100)
    pct_1to0.append(city[(city["Thermal sensation"] >= -1.0) & (city["Thermal sensation"] < 0.0)].shape[0]/city.shape[0] * 100)
    pct_gt1.append(city[city["Thermal sensation"] > 1.0]["Air temperature (C)"].shape[0]/city.shape[0] * 100)
    pct_lt1.append(city[city["Thermal sensation"] < -1.0]["Air temperature (C)"].shape[0]/city.shape[0] * 100)
    pct_0.append(city[city["Thermal sensation"] == 0.0]["Air temperature (C)"].shape[0]/city.shape[0] * 100)
    
meanDF["Median Temp 0 to 1.0"] = med_0to1
meanDF["Median Temp -1.0 to 0"] = med_1to0
meanDF["Median Temp > 1.0"] = med_gt1
meanDF["Median Temp < -1.0"] = med_lt1
meanDF["Median Temp 0.0"] = med_0

meanDF["Pct Temp 0 to 1.0"] = pct_0to1
meanDF["Pct Temp -1.0 to 0"] = pct_1to0
meanDF["Pct Temp > 1.0"] = pct_gt1
meanDF["Pct Temp < -1.0"] = pct_lt1
meanDF["Pct Temp 0.0"] = pct_0

meanDF.head(4)