# Temple Dataset Curation
This notebook builds a clean dataset of the temples of The Church of Jesus Christ of Latter-day Saints. For each temple, the dataset records the name, square footage, lot size (acreage), "dimensions" (defined here as the number of each type of ordinance room), and the elevation of the site where the temple was built.

In [78]:
import pandas as pd
import numpy as np
import re

#### Creating Subdata Set 1
##### Temple Dimension Dataset

In [79]:
urlDimensions = 'https://churchofjesuschristtemples.org/statistics/dimensions/'

dimensionTable = pd.read_html(urlDimensions)[0]
dimensionTable

Unnamed: 0,Temple,Instruction Rooms,Sealing Rooms,Baptismal Fonts,Square Footage,Acreage
0,Aba Nigeria Temple,2,2,1,11500,6.3
1,Abidjan Ivory Coast Temple,-,-,-,-,0.55
2,Accra Ghana Temple,2,2,1,17500,6
3,Adelaide Australia Temple,2,2,1,10700,6.94
4,Alabang Philippines Temple,-,-,-,-,2.6
...,...,...,...,...,...,...
236,Willamette Valley Oregon Temple,2,2,1,30000,10.5
237,Winnipeg Manitoba Temple,1,1,1,16100,7.7
238,Winter Quarters Nebraska Temple,2,2,1,16000,1.92
239,Yigo Guam Temple,1,1,1,6861,5.8


#### Creating Subdata Set 2
##### Temple Elevation Dataset

In [80]:
urlElevation = 'https://churchofjesuschristtemples.org/statistics/elevations/'

elevationTable = pd.read_html(urlElevation)[0]
elevationTable

Unnamed: 0,Temple,Elevation (Feet),Elevation (Meters)
0,Aba Nigeria Temple,192.2 ft.,58.6 m
1,Abidjan Ivory Coast Temple,173.1 ft.,52.8 m
2,Accra Ghana Temple,131.8 ft.,40.2 m
3,Adelaide Australia Temple,137.7 ft.,42.0 m
4,Alabang Philippines Temple,57.5 ft.,17.5 m
...,...,...,...
293,Winchester Virginia Temple,820.0 ft.,249.9 m
294,Winnipeg Manitoba Temple,766.2 ft.,233.5 m
295,Winter Quarters Nebraska Temple,"1,158.6 ft.",353.1 m
296,Yigo Guam Temple,487.1 ft.,148.5 m


## Merging the Datasets
The Church of Jesus Christ of Latter-day Saints often announces a temple's site location before releasing other details, such as its size or an architectural rendering. As a result, the elevation dataset is considerably larger (297 rows versus 240): temples that are announced or under construction already have a known location, and therefore a known elevation.

For that reason, temples that do not appear in the dimensions dataset are omitted. A left merge on the temple name keeps only the rows of the dimensions table.

In [81]:
finalDataset = pd.merge(dimensionTable, elevationTable, on='Temple', how='left')
finalDataset

Unnamed: 0,Temple,Instruction Rooms,Sealing Rooms,Baptismal Fonts,Square Footage,Acreage,Elevation (Feet),Elevation (Meters)
0,Aba Nigeria Temple,2,2,1,11500,6.3,192.2 ft.,58.6 m
1,Abidjan Ivory Coast Temple,-,-,-,-,0.55,173.1 ft.,52.8 m
2,Accra Ghana Temple,2,2,1,17500,6,131.8 ft.,40.2 m
3,Adelaide Australia Temple,2,2,1,10700,6.94,137.7 ft.,42.0 m
4,Alabang Philippines Temple,-,-,-,-,2.6,57.5 ft.,17.5 m
...,...,...,...,...,...,...,...,...
236,Willamette Valley Oregon Temple,2,2,1,30000,10.5,429.4 ft.,130.9 m
237,Winnipeg Manitoba Temple,1,1,1,16100,7.7,766.2 ft.,233.5 m
238,Winter Quarters Nebraska Temple,2,2,1,16000,1.92,"1,158.6 ft.",353.1 m
239,Yigo Guam Temple,1,1,1,6861,5.8,487.1 ft.,148.5 m
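
To double-check what the left merge keeps and drops, one option is to ask pandas to record where each row came from. The sketch below uses small stand-in tables rather than the scraped ones; the `dims` and `elev` names and the 'Hypothetical Temple' row are made up for illustration.

```python
import pandas as pd

# Toy stand-ins for dimensionTable and elevationTable.
# 'Hypothetical Temple' is an invented row used only to show an unmatched name.
dims = pd.DataFrame({
    'Temple': ['Aba Nigeria Temple', 'Accra Ghana Temple', 'Hypothetical Temple'],
    'Square Footage': ['11500', '17500', '-'],
})
elev = pd.DataFrame({
    'Temple': ['Aba Nigeria Temple', 'Accra Ghana Temple', 'Winchester Virginia Temple'],
    'Elevation (Feet)': ['192.2 ft.', '131.8 ft.', '820.0 ft.'],
})

# how='left' keeps every row of dims and discards elevation-only temples;
# indicator=True adds a _merge column recording where each row came from.
merged = pd.merge(dims, elev, on='Temple', how='left', indicator=True)

# Rows flagged 'left_only' have dimensions but no matching elevation.
unmatched = merged.loc[merged['_merge'] == 'left_only', 'Temple'].tolist()
print(unmatched)
```

If `unmatched` is not empty on the real tables, the two pages probably spell a temple name differently, and those rows will carry a missing elevation.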


## Cleaning the Dataset
The merged dataset contains missing values, shown as '-' in the table. It is still worth keeping other aspects of the data, such as the ordinance rooms, for later investigation. The ordinance-room counts are the values most likely to be missing, because they can only be inferred from a publicly available floor plan (hard to obtain outside the US) or from a report by someone who has visited the temple (also difficult, especially while the building is under construction).

However, the main objective is to compare the size of each temple with its elevation. Rows with missing "dimension" values are therefore kept as long as a square footage is provided; only rows without a square footage are dropped.

The Elevation (Meters) column is also dropped because it is redundant. The feet column is kept instead because all the other measurement columns use imperial units.

In [82]:
finalDataset = finalDataset.drop(columns = ['Elevation (Meters)'])

In [83]:
finalDataset = finalDataset.replace('-', np.nan)
finalDataset = finalDataset.dropna(subset = ['Square Footage']).reset_index(drop = True)
finalDataset

Unnamed: 0,Temple,Instruction Rooms,Sealing Rooms,Baptismal Fonts,Square Footage,Acreage,Elevation (Feet)
0,Aba Nigeria Temple,2,2,1,11500,6.3,192.2 ft.
1,Accra Ghana Temple,2,2,1,17500,6,131.8 ft.
2,Adelaide Australia Temple,2,2,1,10700,6.94,137.7 ft.
3,Albuquerque New Mexico Temple,2,3,1,34245,8.5,"5,729.5 ft."
4,Anchorage Alaska Temple,2,1,1,11937,5.4,205.1 ft.
...,...,...,...,...,...,...,...
229,Willamette Valley Oregon Temple,2,2,1,30000,10.5,429.4 ft.
230,Winnipeg Manitoba Temple,1,1,1,16100,7.7,766.2 ft.
231,Winter Quarters Nebraska Temple,2,2,1,16000,1.92,"1,158.6 ft."
232,Yigo Guam Temple,1,1,1,6861,5.8,487.1 ft.
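
The effect of the two cleaning steps can be seen on a tiny example. This is only a sketch on made-up rows (the `toy` frame and the 'Hypothetical Temple' entry are not from the scraped data): the '-' placeholder becomes a real missing value, and a row is dropped only when its square footage is missing.

```python
import numpy as np
import pandas as pd

# Toy frame mimicking the scraped table, where '-' marks an unpublished value.
# 'Hypothetical Temple' is invented to show a row with partial data.
toy = pd.DataFrame({
    'Temple': ['Aba Nigeria Temple', 'Abidjan Ivory Coast Temple', 'Hypothetical Temple'],
    'Instruction Rooms': ['2', '-', '-'],
    'Square Footage': ['11500', '-', '9000'],
})

# Turn the placeholder into a real missing value so pandas can detect it.
toy = toy.replace('-', np.nan)

# Drop rows only when Square Footage is missing; gaps in other columns are tolerated.
kept = toy.dropna(subset=['Square Footage']).reset_index(drop=True)
print(kept['Temple'].tolist())
```

The hypothetical row survives even though its room count is unknown, which matches the intent of keeping every temple that has a square footage.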


## Cleaning Data Part 2
First, every numeric column should be stored as a float or an integer. For the elevation column, this means removing the ' ft.' suffix and the thousands separator (as in "1,158.6 ft.") from each value before converting.

Second, to make it easier to compare temples by location, the trailing word 'Temple' is removed from each name.

In [84]:
# Strip the trailing ' ft.' (4 characters), then drop thousands separators and convert to float.
finalDataset['Elevation (Feet)'] = finalDataset['Elevation (Feet)'].str[:-4]
finalDataset['Elevation (Feet)'] = finalDataset['Elevation (Feet)'].str.replace(',', '', regex=False).astype(float)


In [85]:
finalDataset['Square Footage'] = finalDataset['Square Footage'].astype(int)
finalDataset['Acreage'] = finalDataset['Acreage'].astype(float)
finalDataset['Instruction Rooms'] = finalDataset['Instruction Rooms'].astype(float).astype(pd.Int64Dtype())
finalDataset['Sealing Rooms'] = finalDataset['Sealing Rooms'].astype(float).astype(pd.Int64Dtype())
finalDataset['Baptismal Fonts'] = finalDataset['Baptismal Fonts'].astype(float).astype(pd.Int64Dtype())
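
The room-count columns go through `float` first because a plain integer column cannot hold missing values, while the nullable `Int64` dtype can. A minimal sketch on a made-up series (the `rooms` name is illustrative) shows the same idea using `pd.to_numeric`:

```python
import numpy as np
import pandas as pd

# Room counts as they look after '-' has been replaced with NaN: strings plus gaps.
rooms = pd.Series(['2', np.nan, '1'], dtype=object)

# Convert the strings to numbers (NaN stays NaN), then move to the
# nullable integer dtype, which stores the gap as <NA>.
rooms_int = pd.to_numeric(rooms).astype('Int64')
print(rooms_int.dtype)
print(rooms_int.isna().sum())
```

Either route should give whole-number counts with the gaps preserved, rather than floats such as 2.0.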

In [86]:
finalDataset['Temple'] = finalDataset['Temple'].str[:-7]
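
Slicing off the last seven characters works here because every name ends in ' Temple'. If that assumption ever fails, the slice silently cuts off part of the name. The sketch below compares the slice with an anchored pattern on made-up names; 'Hypothetical Visitors Center' is invented to show the failure case.

```python
import pandas as pd

# The third name is invented and deliberately does not end in ' Temple'.
names = pd.Series(['Aba Nigeria Temple', 'Yigo Guam Temple', 'Hypothetical Visitors Center'])

# Fixed-width slicing removes the last 7 characters no matter what they are.
sliced = names.str[:-7]

# An anchored pattern removes ' Temple' only when it is actually the suffix.
stripped = names.str.replace(r' Temple$', '', regex=True)

print(sliced.tolist())
print(stripped.tolist())
```

On the current data both approaches should agree; the anchored version is simply more forgiving if the source page changes its naming.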

## Creating the CSV

In [87]:
finalDataset.to_csv('templeDimensionElevation.csv')
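
Note that `to_csv` writes the row index as an unnamed first column by default, which shows up as 'Unnamed: 0' when the file is read back. Whether that column is wanted depends on how the CSV will be used. The following sketch, on a made-up two-row frame written to in-memory buffers instead of a file, shows the difference that `index=False` makes:

```python
import io
import pandas as pd

# Small stand-in for finalDataset.
toy = pd.DataFrame({'Temple': ['Aba Nigeria', 'Yigo Guam'], 'Square Footage': [11500, 6861]})

# Default: the row index is written as an unnamed first column.
with_index = io.StringIO()
toy.to_csv(with_index)

# index=False writes only the data columns.
without_index = io.StringIO()
toy.to_csv(without_index, index=False)

# Reading the default output back yields an extra 'Unnamed: 0' column.
cols_default = pd.read_csv(io.StringIO(with_index.getvalue())).columns.tolist()
cols_clean = pd.read_csv(io.StringIO(without_index.getvalue())).columns.tolist()
print(cols_default)
print(cols_clean)
```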