# DX 704 Week 6 Project

This project will develop a treatment plan for a fictious illness "Twizzleflu".
Twizzleflu is a mild illness caused by a virus.
The main symptoms are a mild fever, fidgeting, and kicking the blankets off the bed or couch.
Mild dehydration has also been reported in more severe cases.
These symptoms typically last 1-2 weeks without treatment.
Word on the internet says that Twizzleflu can be cured faster by drinking copious orange juice, but this has not been supported by evidence so far.
You will be provided with a theoretical model of Twizzleflu modeled as a Markov decision process.
Based on the model, you will compute optimal treatment plans to optimize different criteria, and compare patient discomfort with the different plans.

The full project description, a template notebook, and raw data are available on GitHub: [Project 6 Materials](https://github.com/bu-cds-dx704/dx704-project-06).

We will model Twizzleflu as a Markov decision process.
The model transition probabilities are provided in the file "twizzleflu-transitions.tsv" and the expected rewards are in "twizzleflu-rewards.tsv".
The goal for Twizzleflu is to minimize the expected discomfort of the patient which is expressed as negative rewards in the file.

## Example Code

You may find it helpful to refer to these GitHub repositories of Jupyter notebooks for example code.

* https://github.com/bu-cds-omds/dx601-examples
* https://github.com/bu-cds-omds/dx602-examples
* https://github.com/bu-cds-omds/dx603-examples
* https://github.com/bu-cds-omds/dx704-examples

Any calculations demonstrated in code examples or videos may be found in these notebooks, and you are allowed to copy this example code in your homework answers.

## Part 1: Evaluate a Do Nothing Plan

One of the treatment actions is to do nothing.
Calculate the expected discomfort (not rewards) of a policy that always does nothing.

Hint: for this value calculation and later ones, use value iteration.
The analytical solution has difficulties in practice when there is no discount factor.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
# Read the datafiles into dataframes
path = r"C:\Users\JT von Seggern\DS Masters Repos\Module-8-Projects\dx704-project-06"
df_rewards = pd.read_csv(rf"{path}\twizzleflu-rewards.tsv", sep='\t')
df_transitions = pd.read_csv(rf"{path}\twizzleflu-transitions.tsv", sep='\t')

In [3]:
df_rewards.head()

Unnamed: 0,action,state,reward
0,do-nothing,exposed-1,0.0
1,do-nothing,exposed-2,0.0
2,do-nothing,exposed-3,0.0
3,do-nothing,symptoms-1,-0.5
4,do-nothing,symptoms-2,-1.0


In [4]:
df_transitions.head()

Unnamed: 0,action,state,next_state,probability
0,do-nothing,exposed-1,exposed-2,0.8
1,do-nothing,exposed-1,recovered,0.2
2,do-nothing,exposed-2,exposed-3,0.8
3,do-nothing,exposed-2,recovered,0.2
4,do-nothing,exposed-3,symptoms-1,0.8


Save the expected discomfort by state to a file "do-nothing-discomfort.tsv" with columns state and expected_discomfort.

In [5]:
def compute_qT_once(R, P, gamma, v):
    return R + gamma * P @ v

In [6]:
def iterate_values_once(R, P, gamma, v):
    return np.max(compute_qT_once(R, P, gamma, v), axis=0)

In [7]:
def value_iteration(R, P, gamma, max_iterations=100, tolerance=0.001):
    # initial approximation v_0
    v_old = np.zeros(R.shape[-1])

    for i in range(max_iterations):
        # compute v_{i+1}
        v_new = iterate_values_once(R, P, gamma, v_old)

        # check if values did not change much
        if np.max(np.abs(v_new - v_old)) < tolerance:
            return v_new

        v_old = v_new

    # return v_{max_iterations}
    return v_old

In [12]:
# Get all states (including terminal states that may only appear in next_state)
states = sorted(set(df_transitions['state']).union(df_transitions['next_state']))
state_idx = {s: i for i, s in enumerate(states)}
S = len(states)

# Build reward vector for "do-nothing" action
r_1 = np.zeros(S, dtype=float)
do_nothing_rewards = df_rewards[df_rewards['action'] == 'do-nothing']
for _, row in do_nothing_rewards.iterrows():
    r_1[state_idx[row['state']]] = row['reward']

# Build transition matrix for "do-nothing" action
P_1 = np.zeros((S, S), dtype=float)
do_nothing_trans = df_transitions[df_transitions['action'] == 'do-nothing']
for _, row in do_nothing_trans.iterrows():
    i = state_idx[row['state']]
    j = state_idx[row['next_state']]
    P_1[i, j] = row['probability']

# Reshape for value_iteration (expects shape (num_actions, num_states, ...))
r_1 = r_1.reshape(1,-1)
P_1 = P_1.reshape(1, S, S)

# Run value iteration
gamma_1 = 1
v_1 = value_iteration(r_1, P_1, gamma_1, max_iterations=10000, tolerance=1e-8) * -1

# Save results
df_nothing = pd.DataFrame({
    'state': states,
    'expected_discomfort': v_1.flatten()
})

df_nothing


Unnamed: 0,state,expected_discomfort
0,exposed-1,3.413333
1,exposed-2,4.266667
2,exposed-3,5.333333
3,recovered,-0.0
4,symptoms-1,6.666667
5,symptoms-2,5.0
6,symptoms-3,1.666667


In [13]:
df_nothing.to_csv("submission/do-nothing-discomfort.tsv", sep='\t', index=False)

Submit "do-nothing-discomfort.tsv" in Gradescope.

## Part 2: Compute an Optimal Treatment Plan

Compute an optimal treatment plan for Twizzleflu.
It should minimize the expected discomfort (maximize the rewards).

In [19]:
# Compute the optimal treatment for each state
R_2 = df_rewards.pivot(index='action', columns='state', values='reward')
P_2 = np.zeros((R_2.shape[0], R_2.shape[1], R_2.shape[1]))
gamma_2 = 1
v_2 = value_iteration(R_2.values, P_2, gamma_2)
q_2 = compute_qT_once(R_2.values, P_2, gamma_2, v_2)
a_2 = np.argmin(q_2, axis=0)

# Save the actions to tsv file
df_optimal = pd.DataFrame({'state': R_2.columns, 'action': R_2.index[a_2]})
df_optimal

Unnamed: 0,state,action
0,exposed-1,do-nothing
1,exposed-2,do-nothing
2,exposed-3,do-nothing
3,recovered,do-nothing
4,symptoms-1,sleep-8
5,symptoms-2,sleep-8
6,symptoms-3,sleep-8


Save the optimal actions for each state to a file "minimum-discomfort-actions.tsv" with columns state and action.

In [20]:
df_optimal.to_csv("submission/minimum-discomfort-actions.tsv", sep='\t')

Submit "minimum-discomfort-actions.tsv" in Gradescope.

## Part 3: Expected Discomfort

Using your previous optimal policy, compute the expected discomfort for each state.

In [None]:
# Get the optimal action for each state (from Part 2)
# a_2 contains the action indices, R_2.index contains action names
optimal_actions = R_2.index[a_2]
optimal_policy = dict(zip(R_2.columns, optimal_actions))

# Build the reward vector for the optimal policy
r_opt = np.zeros(S, dtype=float)
for state in states:
    if state in optimal_policy:
        action = optimal_policy[state]
        # Get reward for this state-action pair
        reward_row = df_rewards[(df_rewards['state'] == state) & (df_rewards['action'] == action)]
        if not reward_row.empty:
            r_opt[state_idx[state]] = reward_row['reward'].values[0]

# Build transition matrix for the optimal policy
P_opt = np.zeros((S, S), dtype=float)
for state in states:
    if state in optimal_policy:
        action = optimal_policy[state]
        # Get transitions for this state-action pair
        trans_rows = df_transitions[(df_transitions['state'] == state) & (df_transitions['action'] == action)]
        for _, row in trans_rows.iterrows():
            i = state_idx[row['state']]
            j = state_idx[row['next_state']]
            P_opt[i, j] = row['probability']

# Reshape for value_iteration
r_opt = r_opt.reshape(1, -1)
P_opt = P_opt.reshape(1, S, S)

# Run value iteration for the fixed optimal policy
gamma_3 = 1.0
v_opt = value_iteration(r_opt, P_opt, gamma_3, max_iterations=10000, tolerance=1e-8) * -1

# 7. Save results
df_opt_vals = pd.DataFrame({
    'state': states,
    'expected_discomfort': v_opt.flatten()
})

df_opt_vals

Unnamed: 0,state,expected_discomfort
0,exposed-1,10.24
1,exposed-2,12.8
2,exposed-3,16.0
3,recovered,-0.0
4,symptoms-1,20.0
5,symptoms-2,15.0
6,symptoms-3,5.0


Save your results in a file "minimum-discomfort-values.tsv" with columns state and expected_discomfort.

In [23]:
df_opt_vals.to_csv("submission/minimum-discomfort-values.tsv", sep='\t', index=False)

Submit "minimum-discomfort-values.tsv" in Gradescope.

## Part 4: Minimizing Twizzleflu Duration

Modifiy the Markov decision process to minimize the days until the Twizzle flu is over.
To do so, change the reward function to always be -1 if the current state corresponds to being sick and 0 if the current state corresponds to being better.
To be clear, the action does not matter for this reward function.


In [None]:
# Modify the Markov decision process to minimize days until the Twizzle flu is over
# Change rewards to -1 for sick and 0 for health states


Save your new reward function in a file "duration-rewards.tsv" in the same format as "twizzleflu-rewards.tsv".

In [None]:
# YOUR CHANGES HERE

...

Ellipsis

Submit "duration-rewards.tsv" in Gradescope.

## Part 5: Optimize for Shorter Twizzleflu

Compute an optimal policy to minimize the duration of Twizzleflu.

In [None]:
# YOUR CHANGES HERE

...

Ellipsis

Save the optimal actions for each state to a file "minimum-duration-actions.tsv" with columns state and action.

In [None]:
# YOUR CHANGES HERE

...

Ellipsis

Submit "minimum-duration-actions.tsv" in Gradescope.

## Part 6: Shorter Twizzleflu?

Compute the expected number of days sick for each state to a file.

In [None]:
# YOUR CHANGES HERE

...

Ellipsis

Save the expected sick days for each state to a file "minimum-duration-days.tsv" with columns state and expected_sick_days.

In [None]:
# YOUR CHANGES HERE

...

Ellipsis

Submit "minimum-duration-days.tsv" in Gradescope.

## Part 7: Speed vs Pampering

Compute the expected discomfort using the policy to minimize days sick, and compare the results to the expected discomfort when optimizing to minimize discomfort.

In [None]:
# YOUR CHANGES HERE

...

Ellipsis

Save the results to a file "policy-comparison.tsv" with columns state, speed_discomfort, and minimize_discomfort.

In [None]:
# YOUR CHANGES HERE

...

Ellipsis

Submit "policy-comparison.tsv" in Gradescope.

## Part 8: Code

Please submit a Jupyter notebook that can reproduce all your calculations and recreate the previously submitted files.

## Part 9: Acknowledgements

If you discussed this assignment with anyone, please acknowledge them here.
If you did this assignment completely on your own, simply write none below.

If you used any libraries not mentioned in this module's content, please list them with a brief explanation what you used them for. If you did not use any other libraries, simply write none below.

If you used any generative AI tools, please add links to your transcripts below, and any other information that you feel is necessary to comply with the generative AI policy. If you did not use any generative AI tools, simply write none below.