### EDA
Our project is a proof-of-concept model, so we fix the geometry of the surrounding buildings and strictly limit how the building whose radiation we want to predict can vary. There is a single target building, made up of three fixed subunits that differ in height. Five buildings of different shapes surround the target building; unlike the target, these surrounding buildings have no subunits.

With the boundary conditions (the surrounding buildings) held fixed, we vary the heights of the target building's subunits and predict the monthly and annual radiation received by each face of the target building.

FIXED VARIABLES: 
- NUMBER OF BOUNDARY BUILDINGS
- WIDTH AND HEIGHT OF THESE BOUNDARY BUILDINGS
- WIDTH (FOOTPRINT) OF THE SUB-BUILDINGS OF OUR TARGET BUILDING

NON-FIXED VARIABLE:
- THE HEIGHTS OF THE SUB-BUILDINGS OF OUR TARGET BUILDING

PREDICTION: MONTHLY/ANNUAL RADIATION AMOUNT FOR THE FACES OF OUR TARGET BUILDING

###### TOTAL NUMBER OF RESULT FILES SO FAR (WE PLAN TO ADD MORE AFTER FURTHER SIMULATIONS)

In [None]:
!ls -1 result | wc -l
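The shell command above only works inside IPython. A plain-Python sketch that does the same count, and also decodes the three sub-building heights from each filename, might look like the following. It assumes that every result file is a `.txt` file named after the integer heights joined by commas (as in `12,27,6.txt`); `list_result_files` is a hypothetical helper, not part of the project code. Unlike `ls -1 | wc -l`, it ignores files that do not end in `.txt`.

```python
from pathlib import Path


def list_result_files(result_dir="result"):
    """Return (filename, heights) pairs for every .txt file in result_dir.

    Hypothetical helper. Assumes each filename is the three sub-building
    heights joined by commas, e.g. '12,27,6.txt' -> (12, 27, 6), and that
    the heights are integers.
    """
    pairs = []
    for path in sorted(Path(result_dir).glob("*.txt")):
        # path.stem drops the .txt extension: '12,27,6'
        heights = tuple(int(h) for h in path.stem.split(","))
        pairs.append((path.name, heights))
    return pairs
```

`len(list_result_files())` should then match the shell count when the `result` folder holds only result files.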

In [None]:
import pandas as pd
import numpy as np
import sys
import calendar
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

In [None]:
# Column layout of a result file: x, y, z, twelve monthly values, annual sum
mapper = {0: "x", 1: "y", 2: "z"}
mapper.update({i + 3: calendar.month_name[i + 1] for i in range(12)})
mapper[15] = "Annual Sum"

# read_fwf loads each comma-separated line as a single text column,
# which is then split on commas into 16 columns and converted to numbers
df = pd.read_fwf("result/12,27,6.txt")
df = df[df.columns[0]].str.split(",", n=15, expand=True)
df_building = df.rename(columns=mapper).apply(pd.to_numeric, errors="coerce")
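Splitting a single fixed-width column on commas works, but `pandas.read_csv` can parse the comma-separated rows directly. The sketch below assumes, as the `convertToDF` helper defined further down also does, that the first line of a result file is a header to skip and that every other line holds 16 numbers; `read_result_file` is a hypothetical name.

```python
import calendar

import pandas as pd


def read_result_file(path):
    """Read one result file into a numeric DataFrame (hypothetical helper).

    Assumes the first line is a header to skip and every other line holds
    16 comma-separated numbers: x, y, z, twelve monthly values, annual sum.
    """
    # calendar.month_name[1:] gives 'January' ... 'December'
    columns = ["x", "y", "z"] + list(calendar.month_name[1:]) + ["Annual Sum"]
    return pd.read_csv(path, header=None, skiprows=1, names=columns)
```

`read_result_file("result/12,27,6.txt")` should then give the same numeric table as `df_building`, without the separate `to_numeric` step.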

###### Sample model data
The first three columns hold the coordinates of the faces of our building, followed by the twelve monthly radiation values and the annual radiation sum. <u>It is important to understand that these x, y, z coordinates represent the center point of each 'face' of the building.</u>
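If the annual column is the sum of the twelve monthly columns, as its name suggests, that relationship is easy to check. This is a sketch under that assumption; `annual_sum_mismatch` and its tolerance `rtol` are illustrative choices, not part of the original pipeline.

```python
import calendar

import numpy as np
import pandas as pd

MONTHS = list(calendar.month_name[1:])  # 'January' ... 'December'


def annual_sum_mismatch(df, rtol=1e-3):
    """Return the rows whose 'Annual Sum' differs from the sum of the twelve
    monthly columns by more than the relative tolerance rtol (an assumed value)."""
    monthly_total = df[MONTHS].sum(axis=1)
    close = np.isclose(df["Annual Sum"], monthly_total, rtol=rtol)
    return df[~close]
```

An empty result from `annual_sum_mismatch(df_building)` would mean the two agree up to rounding.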

In [None]:
df_building.head(5)

###### Sample boundary data
The boundary files contain only the geometry of each surrounding building and no radiation values, because we are not interested in how much radiation these boundary buildings themselves receive; they matter only because they shade the target building. The cell below shows the format of one of the five boundary buildings.

In [None]:
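mapper = {0: "x", 1: "y", 2: "z", 15: "Annual Sum"}

df = pd.read_fwf("boundary/bd1.txt")
# The one-line header becomes the column name; select that single
# text column by position rather than by the header's value
df = df[df.columns[0]].str.split(",", n=15, expand=True)
df_boundary = df.rename(columns=mapper).apply(pd.to_numeric, errors="coerce")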

In [None]:
df_boundary.head(5)
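Only `bd1.txt` is loaded above. A minimal sketch for stacking all five boundary files into one table, assuming they are named `bd1.txt` to `bd5.txt` and share the layout of `bd1.txt` (a one-line header followed by comma-separated x, y, z rows); `load_boundaries` is a hypothetical helper.

```python
import glob
import os

import pandas as pd


def load_boundaries(pattern="boundary/bd*.txt"):
    """Stack every boundary file matching pattern into one DataFrame with a
    'building' column (hypothetical helper; the bd*.txt naming is assumed)."""
    frames = []
    for path in sorted(glob.glob(pattern)):
        # Skip the one-line header, then parse comma-separated rows
        df = pd.read_csv(path, header=None, skiprows=1)
        df = df.rename(columns={0: "x", 1: "y", 2: "z"})
        # Label rows with the file name without extension, e.g. 'bd1'
        df["building"] = os.path.splitext(os.path.basename(path))[0]
        frames.append(df)
    if not frames:
        raise FileNotFoundError(f"no files match {pattern!r}")
    return pd.concat(frames, ignore_index=True)
```

Assuming z is the vertical axis, `load_boundaries().groupby("building")["z"].max()` would then list the top of each surrounding building.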

###### Annual radiation change for different coordinates

In [None]:
from IPython.display import Image
Image("Images/nocoord.JPG",width=500, height=500)

In [None]:
fig, axes = plt.subplots(1, 3, figsize=(18, 3))
for ax, coord in zip(axes, ["x", "y", "z"]):
    # Average the annual sum over all faces that share the same coordinate value
    df_building.groupby(coord)[["Annual Sum"]].mean().plot(ax=ax)
    ax.set_title(f"Annual radiation change for different {coord} coordinate")
plt.show()
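Each point on these curves is a mean over however many faces share that coordinate value, so a point backed by only a few faces deserves less weight. A sketch that reports the count next to the mean; `coordinate_profile` is a hypothetical helper.

```python
import pandas as pd


def coordinate_profile(df, coord):
    """Mean annual radiation and number of faces for each value of one
    coordinate column ('x', 'y' or 'z')."""
    return df.groupby(coord)["Annual Sum"].agg(["mean", "count"])
```

For example, `coordinate_profile(df_building, "z")` tabulates the third panel.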

###### Monthly radiation changes

We expect radiation to be larger during the summer months, because there is more sunlight during that season. The monthly averages plotted below confirm this intuition.

In [None]:
# Monthly columns sit between the x, y, z coordinates and the "Annual Sum" column
cols = list(df_building.columns[3:-1])

df_building[cols].mean().plot(figsize=(15, 5))
plt.title("Monthly average radiation")
plt.xticks(range(len(cols)), cols)
plt.show()
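To back the visual impression with a number, the same monthly averages can be tabulated and the peak month read off directly. A short sketch; `monthly_means` is a hypothetical helper that assumes the month columns are named as in `df_building`.

```python
import calendar

import pandas as pd


def monthly_means(df):
    """Mean radiation per month over all faces, in calendar order."""
    months = list(calendar.month_name[1:])  # 'January' ... 'December'
    return df[months].mean()
```

`monthly_means(df_building).idxmax()` returns the name of the month with the highest average radiation.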

### Functions we will use

In [None]:
import sys, os, glob
import calendar
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
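from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older matplotlib versions

# Show full cell contents (None replaces the deprecated -1)
pd.set_option('display.max_colwidth', None)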

In [None]:
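def getBuildingHeights(filename):
    '''
    input: path of a result file, e.g. "result/12,27,6.txt"
    output: list of the 3 sub-building heights encoded in the filename, as strings (e.g. ['12', '27', '6'])
    '''
    # Strip the directory and the .txt extension, then split on commas
    return os.path.splitext(os.path.basename(filename))[0].split(",")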

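def createColumnMapper():
    '''
    output: dict mapping positional column indices to names:
    x, y, z, January ... December, Annual Sum
    '''
    mapper = {0: "x", 1: "y", 2: "z"}
    mapper.update({i + 3: calendar.month_name[i + 1] for i in range(12)})
    mapper[15] = "Annual Sum"
    return mapper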

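def convertToDF(fname):
    '''
    input: name of result file
    output: pandas df of strings, one column per comma-separated field
    (the first line of the file is treated as a header and skipped)
    '''
    with open(fname) as f:
        content = f.readlines()
    # Skip the header line and blank lines, strip newlines, split each row on commas
    content = np.array([line.rstrip("\n").split(",") for line in content[1:] if line.strip()])
    return pd.DataFrame(data=content)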

def matplotlibViz(df):
    '''
    Scatter-plots the face centers (x, y, z columns) of a dataframe in 3D
    input: dataframe with 'x', 'y', 'z' columns
    output: None
    '''
    fig = plt.figure(figsize=(8,8))
    ax = fig.add_subplot(111, projection='3d')
    ax.scatter(df['x'], df['y'], df['z'])
    plt.title("3-D rendered building")
    plt.show()

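def convertToGrid(df, size=(100, 100, 100), increment=1):
    '''
    Marks which points of a regular 3D grid are face centers of the building.
    All grid points without a face are zero padded (even the points inside the building).
    input: dataframe with 'x', 'y', 'z' columns; grid extent per axis; step between grid points
    output: dataframe with columns [x, y, z, flag], flag being 1 for a face center and 0 otherwise
    '''
    res = []
    # Use the dataframe passed in; doubling maps half-unit face centers (e.g. 12.5 -> 25) onto integers
    temp = df[['x', 'y', 'z']] * 2
    temp = temp.apply(pd.to_numeric, errors='coerce', downcast="integer")
    face_centers = set(temp.itertuples(index=False, name=None))

    # Each axis runs over its own extent
    xs = range(0, size[0] + increment, increment)
    ys = range(0, size[1] + increment, increment)
    zs = range(0, size[2] + increment, increment)
    for x in xs:
        for y in ys:
            for z in zs:
                res.append([x, y, z, 1 if (x, y, z) in face_centers else 0])

    assert len(res) == len(xs) * len(ys) * len(zs)
    return pd.DataFrame(data=res)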
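The triple loop in `convertToGrid` visits every grid point, about a million for the default 100-unit cube, and returns a long table. If the grid is meant to be used as a 3D array, a vectorized sketch can build the same 0/1 occupancy directly with NumPy indexing. `occupancy_array` is a hypothetical helper; it reuses the doubling of the coordinates and the same axis ranges, but rounds the doubled coordinates to the nearest integer, which only agrees with the exact matching in `convertToGrid` when those doubled values are whole numbers.

```python
import numpy as np
import pandas as pd


def occupancy_array(df, size=(100, 100, 100), increment=1):
    """Hypothetical vectorized counterpart of convertToGrid.

    Returns a uint8 array where grid[i, j, k] == 1 if (i*increment,
    j*increment, k*increment) is a doubled face center.
    """
    # Same axis ranges as convertToGrid: 0 .. size inclusive, stepping by increment
    axes = [range(0, s + increment, increment) for s in size]
    shape = tuple(len(a) for a in axes)
    grid = np.zeros(shape, dtype=np.uint8)

    # Same doubling; rounding assumes the doubled values are whole numbers
    coords = np.rint(df[["x", "y", "z"]].to_numpy(dtype=float) * 2).astype(int)

    # Keep points that lie on the lattice (multiples of increment) and inside the grid
    on_lattice = np.all(coords % increment == 0, axis=1)
    idx = coords[on_lattice] // increment
    inside = np.all((idx >= 0) & (idx < np.array(shape)), axis=1)
    idx = idx[inside]

    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1
    return grid
```

Because `convertToGrid` loops over x, then y, then z, reshaping its last column to `occupancy_array(...).shape` should reproduce the same array for the default cubic grid.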