<a href="https://colab.research.google.com/github/rija-ansari/MSE1003H_RijaAnsari/blob/main/Assignment_2/MSE1003_Assignment2_RA.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Introduction

- talk about the objective of this assignment 
- Opentron
- measure response surface
- talk about statistical design of experiments: what it is and how it works

## Analysis

### Set up environment and assignment folder

Before we begin, let's set up our environment and import the necessary libraries.

If we are using an API key, we need to set it up as an environment variable:
- Create a .env text file in the root directory of your project and save the key as MPI_KEY=your_api_key_here
- In a Jupyter cell run the following:



In [None]:
"""
import os
from dotenv import load_dotenv
load_dotenv()
MPI_KEY = os.getenv("MPI_KEY")
"""

Create a virtual environment in the terminal:
- python -m venv .venv

Create a new text file named `.gitignore`:
- add the entries `.venv/`, `__pycache__/` and `.env` (if used), one per line

**Issues arose from multiple Python versions that kept conflicting with each other. Before beginning, ensure that Python 3.13.9 and pymatgen 2025.10.7 are installed.**


In [None]:
import sys
# pkg_resources is deprecated; importlib.metadata is the standard-library replacement
from importlib.metadata import version

print("Python version:", sys.version)
print("pymatgen version:", version("pymatgen"))

Check that everything is in order:
- make sure we are in the main repository on the local drive
    - pwd
- make sure the remote URL points to the main repository
    - git remote -v
- stage the new files from the assignment folder
    - cd /Users/rija/MSE1003H_RijaAnsari/Assignment_2
    - git add . 
- move back to the main repo, commit, then sync with the remote
    - cd .. 
    - git commit -m "Assignment 2 structure update"
    - git pull origin main
    - git push origin main

### Import data

In [None]:
%pip install python-ternary

In [None]:
import os
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import ternary
import plotly.express as px
import colorsys

In [None]:
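cwd = os.getcwd()
print("Current working directory:", cwd)

# Load the dye-volume design and the sensor results
# (named input_df to avoid shadowing Python's built-in input())
input_df = pd.read_csv("colors3.csv")
output = pd.read_csv("color_results.csv")

In [None]:
volumes = input_df.copy()
volumes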

In [None]:
#ternary plot 
fig = px.scatter_ternary(
    volumes, 
    a="R", 
    b="Y", 
    c="B",
)

fig.update_traces(marker=dict(color='black', size=5))

fig.update_layout(
    ternary={
        'sum': 100,
        "aaxis": {
            "title": {"text": "Red", "font": {"color": "red", "size": 20}},
            "tickfont": {"color": "red"},
            "linecolor": "red"
        },
        "baxis": {
            "title": {"text": "Yellow", "font": {"color": "yellow", "size": 20}},
            "tickfont": {"color": "black"},
            "linecolor": "yellow"
        },
        "caxis": {
            "title": {"text": "Blue", "font": {"color": "blue", "size": 20}},
            "tickfont": {"color": "blue"},
            "linecolor": "blue"
        }
    }
)
fig.show()

The colour ratios for the 26 data points were chosen to cover the search space evenly throughout the triangle, so that the effect of adding each dye was measured evenly. The robot imposed volume limits: the total volume had to be 300 uL and each dye had to be at least 10 uL. As a result, the vertices sit at (280, 10, 10) and its permutations rather than at the ideal (300, 0, 0) that would give an "absolute" signal for each dye.
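To see how these volume limits reshape the design space, here is a sketch (not the 26-point design in `colors3.csv`) that builds a hypothetical {3, m} simplex-lattice design: each dye first receives its 10 uL minimum and the remaining 270 uL is split in steps of 1/m. The function name and the choice of a lattice design are assumptions for illustration.

```python
import numpy as np

TOTAL_UL = 300  # total well volume stated above
MIN_UL = 10     # minimum volume of each dye stated above


def constrained_lattice(m):
    """Hypothetical {3, m} simplex-lattice design mapped onto the robot's volume limits.

    Each dye first gets MIN_UL; the remaining volume is split in steps of 1/m.
    Returns an array of (R, Y, B) volumes in uL.
    """
    free_ul = TOTAL_UL - 3 * MIN_UL  # 270 uL left to distribute
    rows = []
    for i in range(m + 1):
        for j in range(m + 1 - i):
            k = m - i - j
            rows.append([MIN_UL + free_ul * i / m,
                         MIN_UL + free_ul * j / m,
                         MIN_UL + free_ul * k / m])
    return np.array(rows)


design = constrained_lattice(6)
print(len(design))          # (6 + 1)(6 + 2) / 2 = 28 points
print(design.max(axis=0))   # each dye tops out at 280 uL, not 300 uL
```

Every row sums to 300 uL, and the corners land at (280, 10, 10) and its permutations, matching the 280 uL maximum volumes used later to locate the max-dye samples.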

The order of the samples was also intentional: the experiment proceeds in order of descending yellow volume. Because the Opentron 2 does not rinse between samples, this order was chosen to lessen the accumulation of the more pigmented dyes (red and blue) in the pipette.
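A minimal sketch of that run ordering with pandas, using a small made-up design in the same R, Y, B column layout as `colors3.csv`:

```python
import pandas as pd

# Made-up design rows (uL of each dye); same R, Y, B columns as colors3.csv
design = pd.DataFrame({
    "R": [10, 280, 100, 10],
    "Y": [10, 10, 100, 280],
    "B": [280, 10, 100, 10],
})

# Descending yellow volume: the run order described above
run_order = design.sort_values("Y", ascending=False, kind="stable").reset_index(drop=True)
print(run_order)
```

With the real data, `volumes["Y"].is_monotonic_decreasing` is a quick way to check whether the run order follows this rule.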

### 8-channel Output Response

In [None]:
results = output.copy()
results

In [None]:
# Find the rows where each dye is at its maximum volume (280 uL)
yellow_signal = results[results['Yellow'] == 280].index[0]
red_signal = results[results['Red'] == 280].index[0]
blue_signal = results[results['Blue'] == 280].index[0]

red_signal, yellow_signal, blue_signal

In [None]:
#create a new dataframe with only channel values from last 8 columns
channels = ['ch410', 'ch440', 'ch470', 'ch510', 'ch550', 'ch583', 'ch620', 'ch670']
results_ch = results[channels]

results_ch

In [None]:
# 1. Sum the channels in each colour band
# Red: Orange-Red to Red wavelengths
results['ro_raw'] = results['ch620'] + results['ch670']

# Yellow: Green-Yellow to Amber wavelengths
results['yo_raw'] = results['ch510'] + results['ch550'] + results['ch583']

# Blue: Violet to Blue-Cyan wavelengths
results['bo_raw'] = results['ch410'] + results['ch440'] + results['ch470']

# 2. Calculate the total intensity for normalization
results['total_intensity'] = results['ro_raw'] + results['yo_raw'] + results['bo_raw']

# 3. Convert to percentages (ro, yo, bo)
# Multiplying by 100 makes each row sum to 100 on the ternary plot
results['ro'] = (results['ro_raw'] / results['total_intensity']) * 100
results['yo'] = (results['yo_raw'] / results['total_intensity']) * 100
results['bo'] = (results['bo_raw'] / results['total_intensity']) * 100

# 4. Clean up: keep only the outputs we want
#results_out = results[['ro_raw', 'yo_raw', 'bo_raw']].copy()
results_out = results[['ro', 'yo', 'bo']].copy()

print(results_out)
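The same band grouping is repeated three times in this notebook (here and for each center-referenced version below). A sketch of a reusable helper, assuming the channel-to-band mapping used above; `BANDS` and `band_fractions` are hypothetical names and the readings are made up:

```python
import pandas as pd

# Channel-to-band grouping used in the cell above
BANDS = {
    "ro": ["ch620", "ch670"],           # red band
    "yo": ["ch510", "ch550", "ch583"],  # yellow band
    "bo": ["ch410", "ch440", "ch470"],  # blue band
}


def band_fractions(ch_df, scale=100):
    """Sum each band's channels, then rescale every row so the three bands add up to `scale`.

    Rows whose band sums total zero come out as NaN (0 / 0).
    """
    raw = pd.DataFrame({band: ch_df[cols].sum(axis=1) for band, cols in BANDS.items()})
    total = raw.sum(axis=1)
    return raw.div(total, axis=0) * scale


# Made-up readings for two samples
toy = pd.DataFrame({
    "ch410": [1.0, 0.0], "ch440": [1.0, 0.0], "ch470": [1.0, 0.0],
    "ch510": [1.0, 0.0], "ch550": [1.0, 0.0], "ch583": [1.0, 0.0],
    "ch620": [1.0, 0.0], "ch670": [1.0, 2.0],
})
print(band_fractions(toy))
```

With the notebook's data, `band_fractions(results_ch)` should reproduce `results_out`, and `band_fractions(results_ch - results_ch.iloc[10], scale=300)` should reproduce `results_center1` before its center row is overwritten.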

In [None]:
#fig = px.scatter_ternary(results_out, a="ro_raw", b="yo_raw", c="bo_raw")
fig = px.scatter_ternary(results_out, a="ro", b="yo", c="bo")
fig.update_layout(title="Ternary Plot of Results")
fig.show()

When the channel values are normalized by total intensity, the points are very congested on the ternary plot.
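One possible contributor, offered as an illustration rather than a diagnosis: if every band carries a large shared offset (for example ambient light or a sensor baseline), normalizing by total intensity pulls all points toward the center of the triangle. A sketch with made-up band sums; `closure` is a hypothetical helper:

```python
import numpy as np

# Made-up band sums (red, yellow, blue) for two clearly different samples
red_rich = np.array([30.0, 10.0, 10.0])
blue_rich = np.array([10.0, 10.0, 30.0])


def closure(v):
    """Normalize band intensities so they sum to 100, as in the cell above."""
    return 100 * v / v.sum()


for offset in (0, 100, 1000):
    # Add the same offset to every band, then normalize
    print(offset, closure(red_rich + offset).round(1), closure(blue_rich + offset).round(1))
```

As the offset grows, both samples approach equal thirds, so points that are well separated without the offset end up crowded together near the center.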

Let's see if there's another way of transforming our channels.

Here we compare the max red, max yellow and max blue samples with the center point of the ternary diagram to see how the sensor responds to an increase in each dye.

This helps us understand how the intensity in each wavelength channel changes relative to the center point.
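The comparison cells below all follow one pattern, so here is a sketch of a helper that computes them in a single table; `response_vs_center` is a hypothetical name and the toy readings are made up:

```python
import pandas as pd


def response_vs_center(ch_df, signal_rows, center_pos):
    """Channel-by-channel difference between selected samples and one center sample.

    signal_rows maps a label (e.g. "red") to a row position; center_pos is the
    position of the center sample. Returns channels as rows and labels as columns.
    """
    center = ch_df.iloc[center_pos]
    return pd.DataFrame({label: ch_df.iloc[pos] - center for label, pos in signal_rows.items()})


# Made-up readings: row 0 = max blue, row 1 = max red, row 2 = center
toy = pd.DataFrame({"ch410": [5.0, 1.0, 2.0], "ch620": [1.0, 6.0, 2.0]})
print(response_vs_center(toy, {"blue": 0, "red": 1}, center_pos=2))

# In the notebook (assuming the default 0-based index, as the cells below also do):
# response_vs_center(results_ch, {"red": red_signal, "yellow": yellow_signal, "blue": blue_signal}, 10)
```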

In [None]:
yellow1 = results_ch.iloc[yellow_signal] - results_ch.iloc[10]
yellow2 = results_ch.iloc[yellow_signal] - results_ch.iloc[25]
yellow1, yellow2

Here we can see that ch550, ch583 and ch620 show the largest differences for the max-yellow sample.

We also see that the signal is greatly reduced as the experiment progresses.

In [None]:
red1 = results_ch.iloc[red_signal] - results_ch.iloc[10]
red2 = results_ch.iloc[red_signal] - results_ch.iloc[25]
red1, red2

Red shows moderate signals at ch583 and ch620, but they are not as high as expected.

Again, we see a reduction in signal.

In [None]:
blue1 = results_ch.iloc[blue_signal] - results_ch.iloc[10]
blue2 = results_ch.iloc[blue_signal] - results_ch.iloc[25]
blue1, blue2

The max-blue sample does not show a strong signal in the blue channels (ch410, ch440, ch470); instead, it appears mainly as a decrease in the red and yellow channels.

Here too we see a reduction in overall signal.

In [None]:
results_relative_center1 = results_ch - results_ch.iloc[10]
results_relative_center1

In [None]:
# 1. Sum the channels in each colour band (differences from the center sample)
# Red: Orange-Red to Red wavelengths
results_relative_center1['ro_raw'] = results_relative_center1['ch620'] + results_relative_center1['ch670']

# Yellow: Green-Yellow to Amber wavelengths
results_relative_center1['yo_raw'] = results_relative_center1['ch510'] + results_relative_center1['ch550'] + results_relative_center1['ch583']

# Blue: Violet to Blue-Cyan wavelengths
results_relative_center1['bo_raw'] = results_relative_center1['ch410'] + results_relative_center1['ch440'] + results_relative_center1['ch470']

# 2. Calculate the total intensity for normalization
# Note: differences from the center can be negative, and the center row itself is all zeros
results_relative_center1['total_intensity'] = results_relative_center1['ro_raw'] + results_relative_center1['yo_raw'] + results_relative_center1['bo_raw']

# 3. Convert to fractions (ro, yo, bo)
# Multiplying by 300 makes each row sum to 300; the center row becomes 0/0 = NaN
results_relative_center1['ro'] = (results_relative_center1['ro_raw'] / results_relative_center1['total_intensity']) * 300
results_relative_center1['yo'] = (results_relative_center1['yo_raw'] / results_relative_center1['total_intensity']) * 300
results_relative_center1['bo'] = (results_relative_center1['bo_raw'] / results_relative_center1['total_intensity']) * 300

# 4. Clean up: keep only the outputs we want
#results_center1 = results_relative_center1[['ro_raw', 'yo_raw', 'bo_raw']].copy()
results_center1 = results_relative_center1[['ro', 'yo', 'bo']].copy()

print(results_center1)

In [None]:
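# The center sample minus itself is zero in every channel, so its fractions are NaN.
# Equal values place it at the middle of the ternary plot instead.
results_center1.iloc[10] = [1, 1, 1]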

In [None]:
#fig = px.scatter_ternary(results_out, a="ro_raw", b="yo_raw", c="bo_raw")
fig = px.scatter_ternary(results_center1, a="ro", b="yo", c="bo")
fig.update_layout(title="Ternary Plot of Results")
fig.show()

In [None]:
results_relative_center2 = results_ch - results_ch.iloc[25]
results_relative_center2

In [None]:
# 1. Sum the channels in each colour band (differences from the center sample)
# Red: Orange-Red to Red wavelengths
results_relative_center2['ro_raw'] = results_relative_center2['ch620'] + results_relative_center2['ch670']

# Yellow: Green-Yellow to Amber wavelengths
results_relative_center2['yo_raw'] = results_relative_center2['ch510'] + results_relative_center2['ch550'] + results_relative_center2['ch583']

# Blue: Violet to Blue-Cyan wavelengths
results_relative_center2['bo_raw'] = results_relative_center2['ch410'] + results_relative_center2['ch440'] + results_relative_center2['ch470']

# 2. Calculate the total intensity for normalization
# Note: differences from the center can be negative, and the center row itself is all zeros
results_relative_center2['total_intensity'] = results_relative_center2['ro_raw'] + results_relative_center2['yo_raw'] + results_relative_center2['bo_raw']

# 3. Convert to fractions (ro, yo, bo)
# Multiplying by 300 makes each row sum to 300; the center row becomes 0/0 = NaN
results_relative_center2['ro'] = (results_relative_center2['ro_raw'] / results_relative_center2['total_intensity']) * 300
results_relative_center2['yo'] = (results_relative_center2['yo_raw'] / results_relative_center2['total_intensity']) * 300
results_relative_center2['bo'] = (results_relative_center2['bo_raw'] / results_relative_center2['total_intensity']) * 300

# 4. Clean up: keep only the outputs we want
#results_center2 = results_relative_center2[['ro_raw', 'yo_raw', 'bo_raw']].copy()
results_center2 = results_relative_center2[['ro', 'yo', 'bo']].copy()

print(results_center2)

In [None]:
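# The center sample minus itself is zero in every channel, so its fractions are NaN.
# Equal values place it at the middle of the ternary plot instead.
results_center2.iloc[25] = [1, 1, 1]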

In [None]:
#fig = px.scatter_ternary(results_out, a="ro_raw", b="yo_raw", c="bo_raw")
fig = px.scatter_ternary(results_center2, a="ro", b="yo", c="bo")
fig.update_layout(title="Ternary Plot of Results")
fig.show()

Comparing the plot referenced to the first center point (the 11th sample, `iloc[10]`) with the plot referenced to the second center point (the 26th sample, `iloc[25]`), there is a huge variation between the two ternary plots.

Because both rows serve as the center point of the design, this highlights significant issues with the precision and reliability of our data.
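If the two center rows are meant to be replicates of the same composition, their raw channel readings give a direct measure of repeatability. A sketch; `replicate_spread` is a hypothetical helper and the readings are made up:

```python
import pandas as pd


def replicate_spread(ch_df, pos_a, pos_b):
    """Percent difference per channel between two replicate rows, relative to their mean."""
    a, b = ch_df.iloc[pos_a], ch_df.iloc[pos_b]
    return 100 * (a - b).abs() / ((a + b) / 2)


# Made-up readings for two nominally identical center samples
toy = pd.DataFrame({"ch410": [10.0, 12.0], "ch440": [20.0, 20.0]})
print(replicate_spread(toy, 0, 1))

# In the notebook: replicate_spread(results_ch, 10, 25)
```

Large percentages in any channel would put a number on the precision problem seen in the two plots.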

In the first ternary plot, we can also see the data skewed towards higher yellow concentration, since this center sample is measured after the high-concentration yellow samples.

In the second plot we see a slightly better spread, but it still does not mimic the expected response surface at all.

There are several systematic errors at play:
- The run order of the design of experiments affects the intensity of yellow recorded.
- The Opentron sampling system used pipette tips that were not disposed of or rinsed between samples.
- The light sensor was not calibrated for the varying intensities of each dye.
- The light sensor was not sampling in a controlled environment. The overhead lights of the room were on for some of the samples, and other activities in the room could also affect the results.
- There was no background correction.
- There was no standard procedure for preparing the dye solutions. Since they were uncapped for most of the week, their concentrations may have varied over the course of the experiment depending on where each sample fell in the sampling rotation.
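For reference, a common form of background correction for optical readings subtracts a dark reading (light source off or blocked) and scales by a blank reference (for example a water-only well). Neither reading was collected in this experiment, so the sketch below uses made-up numbers and a hypothetical `correct_reading` function:

```python
import numpy as np


def correct_reading(sample, dark, reference):
    """Dark-subtract a reading and scale it by a blank reference reading, channel by channel.

    Gives 1.0 where the sample matches the blank and 0.0 where it matches the dark reading.
    """
    sample, dark, reference = (np.asarray(x, dtype=float) for x in (sample, dark, reference))
    return (sample - dark) / (reference - dark)


# Made-up readings for two channels
dark = [50, 50]       # light source blocked
blank = [100, 150]    # water-only well
sample = [75, 150]    # dyed sample
print(correct_reading(sample, dark, blank))
```

Taking the dark and blank readings at the start of each run would also absorb some of the ambient-light and sensor-drift effects listed above.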

Random errors include factors such as the temperature and humidity of the room during the experiment. The state in which the prior experiment left the system would also affect how it operated during this experiment (e.g. whether it was working properly or needed to be shut down). Any interaction by the staff scientist with the apparatus during the experiment would also affect the results.