[seperate_covered_floor_or_not.ipynb] instructions

This is the third step of our pipeline. Running this notebook requires [Stations.csv], [Kitchens.csv], and [updated_wifi_data.csv] in the [new_data] folder, as well as [Building and Kitchen List.xlsx] and [High Volume Event Spaces.xlsx - Sheet1.csv] in the [original_data] folder.


In this [seperate_covered_floor_or_not.ipynb], we do further selection before answer our key question, includes:

1. Exclude covered floors (water station and kitchen floors). Floors with existing water bottle stations or kitchens are excluded from new station consideration because they already have drinking water access, and kitchen floors often restrict public access.
2. Add a high-volume flag and air-conditioning information (Mechanical Ventilation).
3. Delete dining hall floors, because dining halls already offer many types of beverages.
 



After running this notebook, we get [covered_floors.csv] and [uncovered_floors.csv] in the [new_data] folder. They contain the floors with and without an existing water station or kitchen, respectively, after data filtering and processing.

In [23]:
import pandas as pd

In [24]:
stations_df = pd.read_csv('../new_data/Stations.csv')

In [25]:
kitchens_df = pd.read_csv('../new_data/Kitchens.csv')

In [26]:
stations_df.head()

Unnamed: 0,Floor,Building Description,Space@Bu room,Quantity,Date Installed,Type
0,1,1 UNIVERSITY ROAD,190A,1,,
1,1,1 UNIVERSITY ROAD,,1,,
2,3,1 UNIVERSITY ROAD,390,1,,
3,2,1 UNIVERSITY ROAD,284,1,,
4,1,100 ASHFORD STREET,191,1,,Combo Bottle Filler/Drinking Fountain


In [27]:
kitchens_df.head()

Unnamed: 0,Floor,Building Code,Building Description,Room #
0,1,623,1 UNIVERSITY ROAD,104
1,1,623,1 UNIVERSITY ROAD,150G
2,1,623,1 UNIVERSITY ROAD,150K
3,10,500,10 BUICK STREET,1001
4,5,500,10 BUICK STREET,505


In [28]:
# track buildings and floors that have either a kitchen or bottle station.
# the set stores tuples of the form (f, b), where f is a floor and b is a building description
covered_building_floors = set()

In [29]:
# add all stations first
for f, b in zip(stations_df['Floor'], stations_df['Building Description']):
    covered_building_floors.add((str(f), b))

In [30]:
# then add all kitchens
for f, b in zip(kitchens_df['Floor'], kitchens_df['Building Description']):
    covered_building_floors.add((str(f), b))

In [31]:
# it seems the building addresses contain both ST and STREET, AVE and AVENUE, RD and ROAD...
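
One way to handle the mixed suffixes noted above is to normalize every address to a single spelling before building the lookup set. The following is a minimal sketch and is not part of the original notebook: the helper name `normalize_address` and the suffix table `SUFFIXES` are assumptions, and the table would need to be checked against the real data.

```python
import re

# Hypothetical suffix table: abbreviation -> full spelling.
SUFFIXES = {'ST': 'STREET', 'AVE': 'AVENUE', 'RD': 'ROAD'}

def normalize_address(address):
    """Upper-case the address and expand a trailing street-suffix abbreviation."""
    # Collapse repeated whitespace, then split into tokens.
    tokens = re.sub(r'\s+', ' ', address.strip().upper()).split(' ')
    # Only the last token is treated as a suffix, so "ST. MARY'S" is left alone.
    last = tokens[-1].rstrip('.')
    if last in SUFFIXES:
        tokens[-1] = SUFFIXES[last]
    return ' '.join(tokens)

print(normalize_address('1 University Rd'))
print(normalize_address("8 ST. MARY'S ST"))
```

Applying the same function to the station, kitchen, and foot traffic addresses would make the exact-match lookup less sensitive to spelling differences.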

In [32]:
# get the foot traffic table
foot_traffic_df = pd.read_csv('../new_data/updated_wifi_data.csv')

In [33]:
# create mapping for the floors in the foot traffic data
floor_mapping = {
    'B': -1, 'b': -1, 'g': 1, 'l': -1, 'm': 1
}

In [34]:
# replace letter codes with their mapped floor number; leave every other value unchanged
foot_traffic_df['building_floor'] = foot_traffic_df['building_floor'].map(lambda x: floor_mapping.get(x, x))
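
The effect of the mapping step can be checked on a small example. This is a sketch on invented floor labels, not on the real Wi-Fi data; it also shows that the mapped column mixes integers and strings, which is why a cast to `str` is needed before comparing floors.

```python
import pandas as pd

floor_mapping = {'B': -1, 'b': -1, 'g': 1, 'l': -1, 'm': 1}

# Toy floor labels, invented for illustration.
floors = pd.Series(['B', '2', 'g', 'p', 'm'])

# dict.get(x, x) returns the mapped value, or x itself when there is no mapping.
normalized = floors.map(lambda x: floor_mapping.get(x, x))

# The result holds both ints (-1, 1) and strings ('2', 'p'), so cast to str.
print(normalized.astype(str).tolist())
```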

In [35]:
# create a new column that has both the floor and building desc
foot_traffic_df['F,B'] = list(zip(foot_traffic_df['building_floor'].astype(str), foot_traffic_df['building_desc']))

In [36]:
# load the ventilation data (joined to the foot traffic table in the next cell)
building_and_kitchen = pd.read_excel("../original_data/Building and Kitchen List.xlsx")[['Building Description', 'Mechanical Ventilation']]

In [37]:
foot_traffic_df = pd.merge(
    foot_traffic_df, building_and_kitchen, how='left',
    left_on='building_desc', right_on='Building Description',
).drop(columns=['Building Description'])
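
A left join keeps every foot traffic row, but it would silently duplicate rows if a building description appeared more than once in the building list. A minimal sketch on invented toy data shows how the `validate` argument of `pd.merge` can guard against that; whether the real building list has duplicates is not known here.

```python
import pandas as pd

# Toy stand-ins for the real tables (values invented for illustration).
traffic = pd.DataFrame({
    'building_desc': ['10 EXAMPLE STREET', '10 EXAMPLE STREET', '20 EXAMPLE STREET'],
    'building_floor': ['1', '2', '5'],
})
buildings = pd.DataFrame({
    'Building Description': ['10 EXAMPLE STREET'],
    'Mechanical Ventilation': ['Yes'],
})

# validate='many_to_one' raises MergeError if a building appears twice on the right.
merged = pd.merge(
    traffic, buildings, how='left',
    left_on='building_desc', right_on='Building Description',
    validate='many_to_one',
).drop(columns=['Building Description'])

# Buildings missing from the list get NaN for ventilation.
print(merged)
```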

In [38]:
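# handle foot traffic addresses that contain a hyphen, representing a (contiguous?) range of street numbers
def is_covered(floor, address):
    # exact match on (floor, address)
    if (floor, address) in covered_building_floors:
        return True
    # no hyphen means there is no range to expand, so the floor is not covered
    if '-' not in address:
        return False
    # split e.g. '10-14 X STREET' into low=10, high=14, street='X STREET'
    low, rest = address.split('-')
    high, street = rest.split()[0], ' '.join(rest.split()[1:])
    low, high = int(low), int(high)
    # covered if any street number in the range has a station or kitchen on this floor
    for i in range(low, high + 1):
        if (floor, f'{i} {street}') in covered_building_floors:
            return True
    return False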
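
The range handling is easiest to see on a small example. The sketch below repeats the same logic with the lookup set passed in as an argument; the name `is_covered_demo` and the set entries are invented for illustration.

```python
# Toy coverage set (invented entries) in the same (floor, address) form.
covered = {('1', '19 EXAMPLE STREET'), ('3', '1 SAMPLE ROAD')}

def is_covered_demo(floor, address, covered_set):
    """Same logic as is_covered, with the set passed in explicitly."""
    if (floor, address) in covered_set:
        return True
    if '-' not in address:
        return False
    low, rest = address.split('-')
    high, street = rest.split()[0], ' '.join(rest.split()[1:])
    # Every street number in the range is tried, odd and even alike.
    return any((floor, f'{n} {street}') in covered_set
               for n in range(int(low), int(high) + 1))

print(is_covered_demo('3', '1 SAMPLE ROAD', covered))         # True: exact match
print(is_covered_demo('1', '17-21 EXAMPLE STREET', covered))  # True: 19 is in 17..21
print(is_covered_demo('2', '17-21 EXAMPLE STREET', covered))  # False: wrong floor
```

Note that an address with more than one hyphen, or with a non-numeric part before the street name, would raise a `ValueError` in both versions.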

In [39]:
# evaluate coverage once per row using the (floor, description) pair
covered_mask = foot_traffic_df['F,B'].map(lambda x: is_covered(*x))
# rows without a station or kitchen on that floor; drop() returns a new DataFrame, which avoids SettingWithCopyWarning
foot_traffic_not_covered_df = foot_traffic_df[~covered_mask].drop(columns=['F,B'])
# rows that are covered
foot_traffic_covered_df = foot_traffic_df[covered_mask].drop(columns=['F,B'])


In [40]:
# also filter out parking lots from the candidates
foot_traffic_not_covered_df = foot_traffic_not_covered_df[foot_traffic_not_covered_df['building_floor']!='p']

In [41]:
# sort by mean_density_cnt and export the covered floors
foot_traffic_not_covered_df = foot_traffic_not_covered_df.sort_values('mean_density_cnt', ascending=False)
#foot_traffic_not_covered_df.to_csv('uncovered_floors_withoutvolumne.csv', index=False)

foot_traffic_covered_df = foot_traffic_covered_df.sort_values('mean_density_cnt', ascending=False)
foot_traffic_covered_df.to_csv('../new_data/covered_floors.csv', index=False)


In [42]:
# now flag whether each uncovered floor is a high-volume event space
df = foot_traffic_not_covered_df

df2 = pd.read_csv('../original_data/High Volume Event Spaces.xlsx - Sheet1.csv')

column_mapping = {'Address': 'building_desc', 'Floor': 'building_floor'}
df2['Address'] = df2['Address'].str.upper().str.strip()
df2 = df2.rename(columns=column_mapping)


merged_df = pd.merge(df, df2, on=['building_desc', 'building_floor'], how='left')
merged_df['high_volumn'] = ~merged_df['High Volume'].isna()


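# display only: drop() returns a new DataFrame, so merged_df itself keeps 'Room Name' and 'High Volume'
merged_df.drop(['Room Name', 'High Volume'], axis=1)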



Unnamed: 0,building_floor,building_desc,latitude,longitude,building_type,capacity,mean_density_cnt,max_density_cnt,average_dc_ratio,max_dc_ratio,Residential Building Type,Mechanical Ventilation,Candidacy,high_volumn
0,-1,700 COMMONWEALTH AVENUE,42.3493,-71.1040,Residential,0,53.274023,363,inf,inf,Large Dormitory-Style,Partial,,False
1,c,925 COMMONWEALTH AVENUE,42.3522,-71.1177,Athletic,0,48.954518,375,inf,inf,,Yes,Maybe,True
2,1,949 COMMONWEALTH AVENUE,42.3519,-71.1187,Academic,45,45.248874,432,1.005531,9.600000,,Yes,,False
3,3,925 COMMONWEALTH AVENUE,42.3522,-71.1177,Athletic,29,41.111856,1857,1.417650,64.034483,,Yes,,False
4,2,8 ST. MARY'S STREET,42.3492,-71.1061,Research,145,39.097705,402,0.269639,2.772414,,Yes,,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
219,1,735 COMMONWEALTH AVENUE,42.3506,-71.1064,Student Support,53,1.298220,11,0.024495,0.207547,,No,,False
220,4,118 BAY STATE ROAD,42.3500,-71.0986,Academic,2,1.293103,4,0.646552,2.000000,,No,,False
221,21,273 BABCOCK STREET,42.3525,-71.1198,Residential,0,1.246964,8,inf,inf,Large Dormitory-Style,Yes,,False
222,r,820 COMMONWEALTH AVENUE,42.3500,-71.1122,Academic,0,1.205128,3,inf,inf,,Yes,,False


In [43]:
#merged_df.to_csv('uncovered_floors.csv', index=False)

print(merged_df.groupby(['high_volumn']).size())

high_volumn
False    220
True       4
dtype: int64
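
The flag counted above comes from a left join: a row is marked high volume when it finds a match in the event-space table. The sketch below reproduces that idea on invented toy data. Casting both key columns to `str` is an added precaution against mixed int/str floor labels and is an assumption, not something the notebook does.

```python
import pandas as pd

# Toy candidate floors; building_floor mixes an int and a string on purpose.
floors = pd.DataFrame({
    'building_desc': ['10 EXAMPLE STREET', '20 EXAMPLE STREET'],
    'building_floor': [1, '1'],
})
# Toy event-space list (invented row).
events = pd.DataFrame({
    'Address': [' 10 Example Street '],
    'Floor': ['1'],
    'High Volume': ['Yes'],
})

events['Address'] = events['Address'].str.upper().str.strip()
events = events.rename(columns={'Address': 'building_desc', 'Floor': 'building_floor'})

# Cast join keys to str on both sides so that 1 and '1' compare equal.
floors['building_floor'] = floors['building_floor'].astype(str)
events['building_floor'] = events['building_floor'].astype(str)

merged = pd.merge(floors, events, on=['building_desc', 'building_floor'], how='left')
# A non-null 'High Volume' value means the floor matched an event space.
merged['high_volumn'] = merged['High Volume'].notna()
print(merged[['building_desc', 'building_floor', 'high_volumn']])
```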


In [44]:
# now remove two dining hall floors; the reasons for choosing these 2 floors are given in Mannually_exlcuded_report.pdf
# in merged_df:
# delete rows where [building_desc] is [100 BAY STATE ROAD] and [building_floor] is [2]
merged_df = merged_df[~((merged_df['building_desc'] == '100 BAY STATE ROAD') & (merged_df['building_floor'] == '2'))]
# delete rows where [building_desc] is [213 BAY STATE ROAD] and [building_floor] is [3]
merged_df = merged_df[~((merged_df['building_desc'] == '213 BAY STATE ROAD') & (merged_df['building_floor'] == '3'))]
merged_df.to_csv('../new_data/uncovered_floors.csv', index=False)
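
If more floors need to be excluded by hand later, the two filters above could be driven by a single list of (building, floor) pairs. This is a sketch of that alternative on a toy frame; the name `EXCLUDED` is hypothetical, and the two pairs in it are the ones removed above.

```python
import pandas as pd

# Hypothetical exclusion list holding the two dining hall floors removed above.
EXCLUDED = {('100 BAY STATE ROAD', '2'), ('213 BAY STATE ROAD', '3')}

# Toy frame standing in for merged_df.
df = pd.DataFrame({
    'building_desc': ['100 BAY STATE ROAD', '100 BAY STATE ROAD', '213 BAY STATE ROAD'],
    'building_floor': ['2', '1', '3'],
})

# Mark each row whose (building, floor) pair is in the exclusion set.
is_excluded = pd.Series(
    [(b, str(f)) in EXCLUDED for b, f in zip(df['building_desc'], df['building_floor'])],
    index=df.index,
)
kept = df[~is_excluded]
print(kept)
```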