# D209: Data Mining I - Task 2: Predictive Analysis
***

### By: Leng Yang
### Student ID: 012298452
### Date: August 30, 2024
***
<br>
<br>
<br>

## Table of Contents
* [A1. Proposal of Question](#A1)
* [A2. Defined Goal](#A2)
* [B1. Explanation of Prediction](#B1)
* [B2. Summary of Method Assumption](#B2)
* [B3. Packages or Libraries List](#B3)
* [C1. Data Preprocessing](#C1)
* [C2. Data Set Variables](#C2)
* [C3. Steps for Analysis](#C3)
* [C4. Cleaned Data Set](#C4)
* [D1. Splitting the Data](#D1)
* [D2. Output and Intermediate Calculations](#D2)
* [D3. Code Execution](#D3)
* [E1. Accuracy and MSE](#E1)
* [E2. Results and Implications](#E2)
* [E3. Limitation](#E3)
* [E4. Course of Action](#E4)
* [F. Panopto Recording](#F)
* [G. Sources for Third-Party Code](#G)
* [H. Sources](#H)

<BR>

## A1. Proposal of Question <a class="anchor" id="A1"></a>

The research question for this paper is: can a decision tree model be used to estimate the length of an initial hospital stay?

<BR>

## A2. Defined Goal <a class="anchor" id="A2"></a>

My analysis aims to estimate the length of an initial hospital stay using significant patient factors from a medical data set. This analysis is relevant because hospitals, each serving a large community, continually struggle with patient capacity and staffing needs. The capacity issue is more prevalent in a post-Covid era, where more patients are hospitalized with severe symptoms. Additionally, staffing issues continue as healthcare providers leave the profession due to long working hours and burnout. Identifying the factors that drive the length of an initial hospital stay may help hospital systems plan around constraints such as patient capacity and staffing shortages.

<BR>

## B1. Explanation of Prediction <a class="anchor" id="B1"></a>

A decision tree model works well for this analysis because it can work with both categorical and numerical data and, in its regression form, predicts a continuous outcome. Additionally, it is robust to outliers, which allows better use of the original data, although data preparation is still needed to ensure no outstanding errors exist. The model is structured like a tree with a root, branches, and leaves. It starts at a root node, which in this analysis holds all of the training data and features, and then repeatedly splits the data into decision nodes based on conditions on the features. Splitting continues until a stopping criterion is met, at which point a leaf node is reached and provides the predicted outcome. The expected outcome of this analysis is a decision tree regression model that predicts the length of stay with minimal prediction error. Such a model would help the business with resource management and related planning.
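
To make the splitting step concrete, here is a minimal sketch of how a regression tree could choose a single split on one numeric feature by minimizing the summed squared error of the two resulting groups. The helper names (`sse`, `best_split`) and the toy values are hypothetical and are not part of this analysis. scikit-learn's `DecisionTreeRegressor` uses a squared-error criterion by default, but its implementation is considerably more involved than this illustration.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, total_sse) for the single split on x that
    minimizes the summed squared error of the two resulting groups."""
    pairs = sorted(zip(x, y))
    best = (None, sse(y))  # no split: one leaf predicting the overall mean
    for i in range(1, len(pairs)):
        left_x, right_x = pairs[i - 1][0], pairs[i][0]
        if left_x == right_x:
            continue  # identical feature values cannot be separated
        threshold = (left_x + right_x) / 2  # midpoint between neighbors
        left = [t for f, t in pairs if f <= threshold]
        right = [t for f, t in pairs if f > threshold]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


# Hypothetical toy data: a charge-like feature and length of stay in days
charges = [2000, 2200, 2500, 7000, 7400, 7900]
days = [2, 3, 4, 60, 62, 65]

threshold, error = best_split(charges, days)
print("Best threshold:", threshold)
print("Summed squared error after the split:", error)
```

A full tree repeats this search within each resulting group, over every feature, until a stopping criterion such as `max_depth` or `min_samples_leaf` is reached; each leaf then predicts the mean of the training targets it contains.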

<BR>

## B2. Summary of Method Assumption <a class="anchor" id="B2"></a>

A key property of the decision tree model is that it makes no assumptions about the training data or the prediction residuals (Abhigyan, 2021). For example, it makes no assumptions about the distribution of the data or the independence of the variables. There is no requirement for the data to be normally distributed, so the model can work with skewed data and data that contains outliers. Additionally, some models, such as linear regression, assume no multicollinearity among the predictors, but decision tree models do not.

<BR>

## B3. Packages or Libraries List <a class="anchor" id="B3"></a>

Listed below are the Python packages and libraries used in this analysis:
* Pandas: used to load the data into a tabular dataframe and for data manipulation
* NumPy: used for matrix and mathematical calculations
* Matplotlib and Seaborn: used for visualizations
* Sklearn provides several classes and functions that aid significantly in this analysis:
    * DecisionTreeRegressor is used as the main algorithm and regression model for this analysis 
    * train_test_split is used to split the data into training and testing sets
    * GridSearchCV is used for hyperparameter tuning of the model
    * r2_score is used to calculate the $R^{2}$ of the model and used for model evaluation
    * mean_squared_error is used to calculate the mean squared error of the model and used for model evaluation

In [19]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import r2_score, mean_squared_error

import warnings
warnings.filterwarnings('ignore')

pd.set_option('display.max_columns', None)

In [20]:
#Print working versions
from platform import python_version
print("python:", python_version())
print("pandas:", pd.__version__)
print("numpy:", np.__version__)

python: 3.11.7
pandas: 2.1.4
numpy: 1.26.4


<BR>

## C1. Data Preprocessing <a class="anchor" id="C1"></a>

One of the primary data preprocessing steps is encoding the categorical predictors as numerical values. Categorical variables with "Yes/No" values are encoded as 1/0, and categorical variables with more than two levels are dummy encoded. This ensures that these variables can be passed to the model. Because the model can work with features of different scales, no normalization or standardization of the data is needed.
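
As an illustration of the two encodings described above, the sketch below applies them to a small hypothetical frame. The `toy` data is made up for the example, and the use of `.map` and `dtype=int` is one possible way to get 1/0 values, not necessarily how the notebook does it.

```python
import pandas as pd

# Hypothetical three-row frame with one binary and one multi-level column
toy = pd.DataFrame({
    "high_blood": ["Yes", "No", "Yes"],
    "complication_risk": ["Medium", "High", "Low"],
})

# Binary Yes/No column -> 1/0
toy["high_blood"] = toy["high_blood"].map({"Yes": 1, "No": 0})

# Multi-level column -> one 1/0 indicator column per level
encoded = pd.get_dummies(toy, columns=["complication_risk"], dtype=int)
print(encoded)
```

Each level of the multi-level column becomes its own indicator column, so every row has exactly one 1 across the `complication_risk_*` columns.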

<BR>

## C2. Data Set Variables <a class="anchor" id="C2"></a>

The variables selected for this analysis are from my prior analysis in D208, where statistically significant variables were determined (Yang, 2024). Below is a list of the variables and their respective data types.
* Initial Days Hospitalized: Numeric
* Total charge: Numeric
* Initial admission: Categorical
* Complication risk: Categorical
* High blood pressure: Categorical
* Arthritis: Categorical
* Diabetes: Categorical
* Hyperlipidemia: Categorical
* Back pain: Categorical

<BR>

## C3. Steps for Analysis <a class="anchor" id="C3"></a>

The data was first loaded into a Pandas dataframe, and then the column names were reformatted using snake case for consistency. Afterward, checks were performed to look for any missing values or outliers. Upon checking, there were no missing values. Outliers were kept as they were within a reasonable range. Once cleaning was complete, only the relevant variables were kept, and the data was briefly explored to get a sense of the structure.

After data cleaning and exploration were complete, the data was preprocessed before modeling. Categorical predictors with "Yes/No" values were encoded to 1/0. Categorical variables with more than two levels, namely `initial_admin` and `complication_risk`, were dummy encoded so that every level is represented by its own column.

In [30]:
#Read data file into dataframe
df = pd.read_csv('medical_clean.csv')

In [31]:
# Standardize column names using snake_case and rename them for clarity (Yang, 2024).
col_names = ['case_order', 'customer_id', 'interaction', 'uid', 'city', 'state', 'county', 'zip', 'latitude', 'longitude',
             'population', 'area', 'timezone', 'job', 'children', 'age', 'income', 'marital', 'gender', 'readmission', 
             'vit_d_level', 'doc_visits', 'full_meals_eaten', 'vit_d_supp', 'soft_drink', 'initial_admin', 'high_blood', 'stroke', 'complication_risk', 'overweight',
             'arthritis', 'diabetes', 'hyperlipidemia', 'back_pain', 'anxiety', 'allergic_rhinitis', 'reflux_esophagitis', 'asthma', 'services', 'initial_days', 
             'total_charge', 'additional_charges', 'item_1', 'item_2', 'item_3', 'item_4', 'item_5', 'item_6', 'item_7', 'item_8']
df.columns = col_names
df.head()

Unnamed: 0,case_order,customer_id,interaction,uid,city,state,county,zip,latitude,longitude,population,area,timezone,job,children,age,income,marital,gender,readmission,vit_d_level,doc_visits,full_meals_eaten,vit_d_supp,soft_drink,initial_admin,high_blood,stroke,complication_risk,overweight,arthritis,diabetes,hyperlipidemia,back_pain,anxiety,allergic_rhinitis,reflux_esophagitis,asthma,services,initial_days,total_charge,additional_charges,item_1,item_2,item_3,item_4,item_5,item_6,item_7,item_8
0,1,C412403,8cd49b13-f45a-4b47-a2bd-173ffa932c2f,3a83ddb66e2ae73798bdf1d705dc0932,Eva,AL,Morgan,35621,34.3496,-86.72508,2951,Suburban,America/Chicago,"Psychologist, sport and exercise",1,53,86575.93,Divorced,Male,No,19.141466,6,0,0,No,Emergency Admission,Yes,No,Medium,No,Yes,Yes,No,Yes,Yes,Yes,No,Yes,Blood Work,10.58577,3726.70286,17939.40342,3,3,2,2,4,3,3,4
1,2,Z919181,d2450b70-0337-4406-bdbb-bc1037f1734c,176354c5eef714957d486009feabf195,Marianna,FL,Jackson,32446,30.84513,-85.22907,11303,Urban,America/Chicago,Community development worker,3,51,46805.99,Married,Female,No,18.940352,4,2,1,No,Emergency Admission,Yes,No,High,Yes,No,No,No,No,No,No,Yes,No,Intravenous,15.129562,4193.190458,17612.99812,3,4,3,4,4,4,3,3
2,3,F995323,a2057123-abf5-4a2c-abad-8ffe33512562,e19a0fa00aeda885b8a436757e889bc9,Sioux Falls,SD,Minnehaha,57110,43.54321,-96.63772,17125,Suburban,America/Chicago,Chief Executive Officer,3,53,14370.14,Widowed,Female,No,18.057507,4,1,0,No,Elective Admission,Yes,No,Medium,Yes,No,Yes,No,No,No,No,No,No,Blood Work,4.772177,2434.234222,17505.19246,2,4,4,4,3,4,3,3
3,4,A879973,1dec528d-eb34-4079-adce-0d7a40e82205,cd17d7b6d152cb6f23957346d11c3f07,New Richland,MN,Waseca,56072,43.89744,-93.51479,2162,Suburban,America/Chicago,Early years teacher,0,78,39741.49,Married,Male,No,16.576858,4,1,0,No,Elective Admission,No,Yes,Medium,No,Yes,No,No,No,No,No,Yes,Yes,Blood Work,1.714879,2127.830423,12993.43735,3,5,5,3,4,5,5,5
4,5,C544523,5885f56b-d6da-43a3-8760-83583af94266,d2f0425877b10ed6bb381f3e2579424a,West Point,VA,King William,23181,37.59894,-76.88958,5287,Rural,America/New_York,Health promotion specialist,1,22,1209.56,Widowed,Female,No,17.439069,5,0,2,Yes,Elective Admission,No,No,Low,No,No,No,Yes,No,No,Yes,No,No,CT Scan,1.254807,2113.073274,3716.525786,2,1,3,3,5,3,4,3


In [32]:
#Check for missing values and initial data types
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 50 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   case_order          10000 non-null  int64  
 1   customer_id         10000 non-null  object 
 2   interaction         10000 non-null  object 
 3   uid                 10000 non-null  object 
 4   city                10000 non-null  object 
 5   state               10000 non-null  object 
 6   county              10000 non-null  object 
 7   zip                 10000 non-null  int64  
 8   latitude            10000 non-null  float64
 9   longitude           10000 non-null  float64
 10  population          10000 non-null  int64  
 11  area                10000 non-null  object 
 12  timezone            10000 non-null  object 
 13  job                 10000 non-null  object 
 14  children            10000 non-null  int64  
 15  age                 10000 non-null  int64  
 16  incom

In [33]:
#Look for outliers from statistical measures
df.describe()

Unnamed: 0,case_order,zip,latitude,longitude,population,children,age,income,vit_d_level,doc_visits,full_meals_eaten,vit_d_supp,initial_days,total_charge,additional_charges,item_1,item_2,item_3,item_4,item_5,item_6,item_7,item_8
count,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0
mean,5000.5,50159.3239,38.751099,-91.24308,9965.2538,2.0972,53.5117,40490.49516,17.964262,5.0122,1.0014,0.3989,34.455299,5312.172769,12934.528587,3.5188,3.5067,3.5111,3.5151,3.4969,3.5225,3.494,3.5097
std,2886.89568,27469.588208,5.403085,15.205998,14824.758614,2.163659,20.638538,28521.153293,2.017231,1.045734,1.008117,0.628505,26.309341,2180.393838,6542.601544,1.031966,1.034825,1.032755,1.036282,1.030192,1.032376,1.021405,1.042312
min,1.0,610.0,17.96719,-174.2097,0.0,0.0,18.0,154.08,9.806483,1.0,0.0,0.0,1.001981,1938.312067,3125.703,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
25%,2500.75,27592.0,35.25512,-97.352982,694.75,0.0,36.0,19598.775,16.626439,4.0,0.0,0.0,7.896215,3179.374015,7986.487755,3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0
50%,5000.5,50207.0,39.419355,-88.39723,2769.0,1.0,53.0,33768.42,17.951122,5.0,1.0,0.0,35.836244,5213.952,11573.977735,4.0,3.0,4.0,4.0,3.0,4.0,3.0,3.0
75%,7500.25,72411.75,42.044175,-80.43805,13945.0,3.0,71.0,54296.4025,19.347963,6.0,2.0,1.0,61.16102,7459.69975,15626.49,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0
max,10000.0,99929.0,70.56099,-65.29017,122814.0,10.0,89.0,207249.1,26.394449,9.0,7.0,5.0,71.98149,9180.728,30566.07,8.0,7.0,8.0,7.0,7.0,7.0,7.0,7.0


In [34]:
#Select for interested variables
df = df[['initial_days', 'total_charge', 'initial_admin', 'complication_risk', 'high_blood', 'arthritis', 'diabetes', 'hyperlipidemia', 'back_pain']]
df.head()

Unnamed: 0,initial_days,total_charge,initial_admin,complication_risk,high_blood,arthritis,diabetes,hyperlipidemia,back_pain
0,10.58577,3726.70286,Emergency Admission,Medium,Yes,Yes,Yes,No,Yes
1,15.129562,4193.190458,Emergency Admission,High,Yes,No,No,No,No
2,4.772177,2434.234222,Elective Admission,Medium,Yes,No,Yes,No,No
3,1.714879,2127.830423,Elective Admission,Medium,No,Yes,No,No,No
4,1.254807,2113.073274,Elective Admission,Low,No,No,No,Yes,No


In [35]:
#EDA of numeric variables
df.describe()

Unnamed: 0,initial_days,total_charge
count,10000.0,10000.0
mean,34.455299,5312.172769
std,26.309341,2180.393838
min,1.001981,1938.312067
25%,7.896215,3179.374015
50%,35.836244,5213.952
75%,61.16102,7459.69975
max,71.98149,9180.728


In [36]:
#EDA of categorical variables
df.initial_admin.value_counts()

initial_admin
Emergency Admission      5060
Elective Admission       2504
Observation Admission    2436
Name: count, dtype: int64

In [37]:
df.complication_risk.value_counts()

complication_risk
Medium    4517
High      3358
Low       2125
Name: count, dtype: int64

In [38]:
df.high_blood.value_counts()

high_blood
No     5910
Yes    4090
Name: count, dtype: int64

In [39]:
df.arthritis.value_counts()

arthritis
No     6426
Yes    3574
Name: count, dtype: int64

In [40]:
df.diabetes.value_counts()

diabetes
No     7262
Yes    2738
Name: count, dtype: int64

In [41]:
df.hyperlipidemia.value_counts()

hyperlipidemia
No     6628
Yes    3372
Name: count, dtype: int64

In [42]:
df.back_pain.value_counts()

back_pain
No     5886
Yes    4114
Name: count, dtype: int64

In [43]:
#Generate dummy variables for categoricals with more than two levels and assign 1/0s to True/Yes and False/No responses
df = pd.get_dummies(df, columns=['initial_admin', 'complication_risk']).replace({True:1, 'Yes': 1, False:0, 'No':0})
#Format column names
df.columns = df.columns.str.lower().str.replace(' ', '_')

df.head()

Unnamed: 0,initial_days,total_charge,high_blood,arthritis,diabetes,hyperlipidemia,back_pain,initial_admin_elective_admission,initial_admin_emergency_admission,initial_admin_observation_admission,complication_risk_high,complication_risk_low,complication_risk_medium
0,10.58577,3726.70286,1,1,1,0,1,0,1,0,0,0,1
1,15.129562,4193.190458,1,0,0,0,0,0,1,0,1,0,0
2,4.772177,2434.234222,1,0,1,0,0,1,0,0,0,0,1
3,1.714879,2127.830423,0,1,0,0,0,1,0,0,0,0,1
4,1.254807,2113.073274,0,0,0,1,0,1,0,0,0,1,0


<BR>

## C4. Cleaned Data Set <a class="anchor" id="C4"></a>

Attached to the submission is the cleaned data set, named "D209_Task2_Data.csv."

In [47]:
#Generate csv file of prepared data set
df.to_csv('D209_Task2_Data.csv', index=False)

<BR>

## D1. Splitting the Data <a class="anchor" id="D1"></a>

The prepared data set will be partitioned into an 80/20 split, where 80% of the data will be used for training and the other 20% for testing purposes. Also attached are the train and test set data.

In [51]:
#Set X and y
X = df.drop('initial_days', axis=1)
y = df.initial_days
#Use 80/20 split. Also set random state for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8, test_size=0.2, random_state=69)

In [52]:
#Generate train and test files
med_train = pd.concat([X_train, y_train], axis=1)
med_test = pd.concat([X_test, y_test], axis=1)

med_train.to_csv('D209_Task2_Train.csv', index=False)
med_test.to_csv('D209_Task2_Test.csv', index=False)

<BR>

## D2. Output and Intermediate Calculations <a class="anchor" id="D2"></a>

The decision tree model employed in this analysis builds a model that is tree-like in structure. It contains a root node, which acts like the root or base of a tree. Also included are decision nodes and leaf nodes, which act like the branches and leaves of a tree. The model is built by recursively breaking down the data into smaller subsets until a decision is made, governed by a stopping criterion. This process creates a tree-like structure where the decision nodes contain the branches, and at the end are the leaf nodes, which include the results of the decisions.

During the modeling process, hyperparameter tuning was performed to select the combination of hyperparameters that gave the best score on a chosen metric. The two hyperparameters tuned were `max_depth` and `min_samples_leaf`, and a range of values was explored for each. The tuning was completed with `GridSearchCV`, using 5-fold cross-validation and mean squared error (MSE) as the scoring metric (specified as `neg_mean_squared_error`). The best combination was the one that produced the lowest cross-validated MSE: a `max_depth` of `None` and a `min_samples_leaf` of 5.
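
One detail worth illustrating is the sign convention: scikit-learn scorers are maximized, so with `scoring='neg_mean_squared_error'` the value stored in `best_score_` is the negated mean MSE across folds. The sketch below shows this on synthetic stand-in data from `make_regression` (an assumption for illustration, not the medical data set), with a smaller grid than the one used in this analysis.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (an assumption; not the medical data set)
X_demo, y_demo = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=0)

grid = GridSearchCV(
    estimator=DecisionTreeRegressor(random_state=0),
    param_grid={"max_depth": [None, 2, 4], "min_samples_leaf": [1, 5, 10]},
    cv=5,
    scoring="neg_mean_squared_error",
)
grid.fit(X_demo, y_demo)

# best_score_ is the mean negated MSE across folds; flip the sign to read it as an MSE
cv_mse = -grid.best_score_
print("Best hyperparameters:", grid.best_params_)
print("Cross-validated MSE:", cv_mse)
```

The cross-validated MSE is computed only from the training folds, so it is a separate quantity from the test-set MSE reported for the final model.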

MSE scores on the test set were also calculated for a model with the default hyperparameters and for a model with the best hyperparameters, and then compared. Lower MSE scores imply a better model, as the score represents the error being minimized. With an MSE of 1.71 for the default model and 1.39 for the tuned model, the tuned model performs better because it produces a smaller error.

In [56]:
#Hyperparameter grid used for tuning
param_dict = {'max_depth':[None,2,4,6,8,10], 'min_samples_leaf':[1,2,5,10,50,100]}

#Instantiate model with random_state for reproducibility
dtr = DecisionTreeRegressor(random_state=69)

#Instantiate GridSearchCV object, fit the model, and determine the best hyperparameters
grid_dtr = GridSearchCV(estimator=dtr, param_grid=param_dict, cv=5, scoring='neg_mean_squared_error')
grid_dtr.fit(X_train, y_train)
print("The best hyperparameters are:", grid_dtr.best_params_)

The best hyperparameters are: {'max_depth': None, 'min_samples_leaf': 5}


In [57]:
#Instantiate and fit model with default hyperparameters, then calculate MSE score.
dtr = DecisionTreeRegressor(random_state=69)
dtr.fit(X_train, y_train)
y_pred = dtr.predict(X_test)
print('The MSE of the default model is:', mean_squared_error(y_test, y_pred))

The MSE of the default model is: 1.7112371728571327


In [107]:
#Instantiate and fit model with the best hyperparameters, then calculate metric scores.
dtr_best = DecisionTreeRegressor(min_samples_leaf=5, random_state=69)
dtr_best.fit(X_train, y_train)
y_pred = dtr_best.predict(X_test)
#Calculate MSE, RMSE, and R-squared scores for the tuned model
print('The MSE of the tuned model is:', mean_squared_error(y_test, y_pred))
print('The RMSE of the tuned model is:', np.sqrt(mean_squared_error(y_test, y_pred)))
print('The R-squared value of the tuned model is:', r2_score(y_test, y_pred))

The MSE of the tuned model is: 1.393993007309578
The RMSE of the tuned model is: 1.1806748101444267
The R-squared value of the tuned model is: 0.9979969535222221


<BR>

## D3. Code Execution <a class="anchor" id="D3"></a>

A copy of the code is submitted alongside the report, named "D209_Task_2_Leng_Yang.ipynb."

<BR>

## E1. Accuracy and MSE <a class="anchor" id="E1"></a>

Mean squared error (MSE) is calculated as the average of the squared residuals. This value can be used as a model evaluation metric to determine how well a model performs. The MSE on the test set was calculated to be 1.39. Although MSE is a good starting point, it is not easy to interpret in practice because it is not in the same units as the response variable; in this case, it is expressed in $days^{2}$. Root mean squared error (RMSE), the square root of the MSE, is a better metric for judging the model's accuracy as it _is_ in the same units as the response variable. The RMSE was calculated to be 1.18 days. This result can be interpreted as the typical size of the difference between predicted and actual lengths of stay. The model is accurate since predictions are typically only a bit over a day off.
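
The arithmetic behind these two metrics can be sketched in plain Python. The `mse` helper and the four toy values are hypothetical; the last line simply applies the square root to the test MSE printed in section D2.

```python
import math


def mse(actual, predicted):
    """Average of the squared residuals."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    return sum(r ** 2 for r in residuals) / len(residuals)


# Hypothetical lengths of stay in days
actual_days = [10.0, 15.0, 5.0, 2.0]
predicted_days = [11.0, 13.0, 5.0, 3.0]

toy_mse = mse(actual_days, predicted_days)  # (1 + 4 + 0 + 1) / 4, in days squared
toy_rmse = math.sqrt(toy_mse)               # back in days

# The same conversion applied to the test MSE reported in D2
reported_rmse = math.sqrt(1.393993007309578)
print(toy_mse, toy_rmse, reported_rmse)
```

Because the residuals are squared before averaging, a few large misses raise the RMSE more than many small ones, so it is usually somewhat larger than the mean absolute error.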

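The $R^{2}$ score was also calculated for model evaluation. This metric indicates the proportion of variance in the response that the model explains and is easily interpretable. The score is at most 1, with higher scores indicating a better fit; a score of 0 corresponds to always predicting the mean, and on held-out data the score can be negative for a model that does worse than that. The $R^{2}$ was calculated to be approximately 0.998, which suggests that the model performs exceptionally well.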
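
A minimal sketch of the $R^{2}$ calculation, using a hypothetical `r_squared` helper and made-up values, shows how the score compares the model's squared error with that of always predicting the mean. It also illustrates that a model predicting worse than the mean yields a negative score.

```python
def r_squared(actual, predicted):
    """1 minus the ratio of residual to total sum of squares."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical lengths of stay in days (mean is 8.0)
actual_days = [10.0, 15.0, 5.0, 2.0]

good = r_squared(actual_days, [11.0, 13.0, 5.0, 3.0])    # close predictions
baseline = r_squared(actual_days, [8.0, 8.0, 8.0, 8.0])  # always predict the mean
poor = r_squared(actual_days, [2.0, 5.0, 15.0, 10.0])    # worse than the mean

print(good, baseline, poor)
```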

<BR>

## E2. Results and Implications <a class="anchor" id="E2"></a>

A decision tree regression model was used to estimate the length of initial hospitalizations. The developed model had a testing $R^{2}$ of approximately 0.998, an MSE of 1.39, and an RMSE of 1.18 days. With a high $R^{2}$ and relatively low MSE and RMSE scores, the model is reliable in its predictions. Based on the GridSearchCV results, the best hyperparameters were max_depth = None and min_samples_leaf = 5. In a hospital setting, predictions that are typically within about a day of the actual length of stay are useful, as they allow hospitals much better resource planning and utilization.

Although the model performs well, it could still be improved. The hyperparameters could be re-tuned, as only two were explored in this analysis. Additionally, the candidate values for tuning were chosen arbitrarily, so a more informed choice of values may yield better results in the future.

<BR>

## E3. Limitation <a class="anchor" id="E3"></a>

One limitation of using a decision tree model in this analysis is that it suffers from high variance (GeeksforGeeks, 2024). Small changes in the training data can produce a substantially different tree. This instability can make the model unreliable if it is ever re-trained on different training data. However, the issue could be mitigated with a Random Forest, which averages many trees to reduce variance and increase stability.
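
The instability can be illustrated with a small experiment, sketched below under stated assumptions: the data is synthetic (a noisy sine curve, not the medical data set), and the `prediction_spread` helper is hypothetical. It refits a model on bootstrap resamples of the same data and measures how much the predictions move between refits. `RandomForestRegressor` was not used in this analysis and appears here only for comparison.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic noisy data (an assumption for illustration only)
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(300, 1))
y_demo = np.sin(X_demo[:, 0]) + rng.normal(scale=0.5, size=300)
X_grid = np.linspace(0, 10, 50).reshape(-1, 1)


def prediction_spread(make_model, n_rounds=20):
    """Refit on bootstrap resamples and return the average standard
    deviation of the predictions across refits."""
    preds = []
    for seed in range(n_rounds):
        idx = np.random.default_rng(seed).integers(0, len(X_demo), size=len(X_demo))
        model = make_model().fit(X_demo[idx], y_demo[idx])
        preds.append(model.predict(X_grid))
    return float(np.mean(np.std(preds, axis=0)))


tree_spread = prediction_spread(lambda: DecisionTreeRegressor(random_state=0))
forest_spread = prediction_spread(
    lambda: RandomForestRegressor(n_estimators=50, random_state=0)
)
print("Single tree spread:", tree_spread)
print("Random forest spread:", forest_spread)
```

On noisy data like this, the averaged forest would be expected to show a noticeably smaller spread than the single fully grown tree, which is the stability benefit described above.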

<BR>

## E4. Course of Action <a class="anchor" id="E4"></a>

Based on the results of this analysis, the model can be used to predict initial hospitalization lengths reliably. Before deployment, however, it is recommended that the model first be validated against other out-of-sample data for completeness. Furthermore, the business should be mindful of the model's limitations. This is important because new data constantly emerges and models are retrained to maintain efficacy, and in that process the reliability and stability of decision tree models may fall short.

<BR>

## F. Panopto Recording <a class="anchor" id="F"></a>

A recording is submitted alongside the report and can also be found at: https://wgu.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=836ac165-43ca-43c6-9c3a-b1dc00f6d08b

<BR>

## G. Sources for Third-Party Code <a class="anchor" id="G"></a>

Yang, L. (2024). _D208: Predictive Modeling - Performance Assessment - Task 1: Linear Regression Modeling_. [Unpublished assignment submitted for D208]. Western Governors University.

<BR>

## H. Sources <a class="anchor" id="H"></a>

Abhigyan. (2021, November 8). _Understanding Decision Tree!!_. Medium. https://medium.com/analytics-vidhya/understanding-decision-tree-3591922690a6 

GeeksforGeeks. (2024, February 15). _Pros and Cons of Decision Tree Regression in Machine Learning_. https://www.geeksforgeeks.org/pros-and-cons-of-decision-tree-regression-in-machine-learning/ 

Yang, L. (2024). _D208: Predictive Modeling - Performance Assessment - Task 1: Linear Regression Modeling_. [Unpublished assignment submitted for D208]. Western Governors University.