**Threshold Exceedance**

In [2]:
import pandas as pd
import numpy as np
from openai import OpenAI

file = "/content/PdM_telemetry - machine1 volt.csv"
data = pd.read_csv(file)

voltages = data['volt']
sizes = [50, 100, 150, 200]
results = {}

for size in sizes:
    subset = voltages[:size]
    threshold_95 = np.percentile(subset, 95)
    exceed_count = (subset > threshold_95).sum()
    results[size] = {'95th Percentile': threshold_95, 'Exceed Count': exceed_count}

results_df = pd.DataFrame.from_dict(results, orient='index')

client = OpenAI(api_key="DeepSeek API Key", base_url="https://api.deepseek.com")

summary_prompt = f"Here are the results of a 95th percentile analysis of voltage readings:\n{results_df.to_string()}\n\nProvide an interpretation of these results."

response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[
        {"role": "system", "content": "You are a data analysis assistant"},
        {"role": "user", "content": summary_prompt},
    ],
    stream=False
)
print(response.choices[0].message.content)

The results of the 95th percentile analysis of voltage readings can be interpreted as follows:

### Key Observations:
1. **95th Percentile Trend**:
   - The 95th percentile voltage **increases with larger sample sizes** (from ~193.14 V at n=50 to ~197.16 V at n=200). This suggests:
     - Higher voltage readings are observed as more data is collected, potentially indicating occasional voltage spikes or non-stationary behavior (e.g., increasing voltage over time or under specific conditions).
     - The system may experience variability or transient events that become more apparent in larger datasets.

2. **Exceed Count Validation**:
   - The "exceed count" (number of readings above the 95th percentile) aligns with expectations for a 95th percentile metric:
     - For n=50: 5% of 50 = 2.5 → **3 exceeds** (rounded up).
     - For n=100: 5% of 100 = **5 exceeds**.
     - For n=150: 5% of 150 = 7.5 → **8 exceeds** (rounded up).
     - For n=200: 5% of 200 = **10 exceeds**.
   - This consis
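
The model's exceed counts match what `np.percentile`'s default linear interpolation predicts. With n distinct readings, the 95th percentile falls at the 0-based sorted position 0.95(n - 1), so exactly n - 1 - floor(0.95(n - 1)) readings lie strictly above it. The rounding rule the model gives ("rounded up") is not the mechanism. The sketch below checks this locally. The readings are synthetic and hypothetical, with an assumed mean and spread, because the CSV is not reproduced here.

```python
import math

import numpy as np


def expected_exceed_count(n, q=95):
    """Readings strictly above the q-th percentile for n distinct values,
    under np.percentile's default 'linear' method."""
    position = q / 100 * (n - 1)  # fractional 0-based index into the sorted data
    return n - 1 - math.floor(position)


# Hypothetical stand-in for data['volt']: distinct continuous values.
rng = np.random.default_rng(0)
voltages = rng.normal(loc=170.0, scale=15.0, size=200)

checks = {}
for size in [50, 100, 150, 200]:
    subset = voltages[:size]
    threshold_95 = np.percentile(subset, 95)
    observed = int((subset > threshold_95).sum())
    checks[size] = (observed, expected_exceed_count(size))

print(checks)
```

Tied readings can push the observed count below this value. For the voltage subsets, the formula gives 3, 5, 8 and 10, which matches the table the model interpreted.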

**Slope Calculation**

In [3]:
client = OpenAI(api_key="DeepSeek API Key", base_url="https://api.deepseek.com")

y = np.array([
    120.2961212, 109.2080162, 98.9537533, 87.3951084, 78.5215728, 71.5487645,
    62.1327063, 59.215158, 47.8672713, 52.1641521, 21.27311897, 20.9583709,
    4.62114204, 0.50534558, -5.4009138, -5.7863442, -28.43758437, -24.2008175,
    -54.34674664, -39.1959767, -55.95967942, -73.8228554, -60.4839082,
    -76.2550529, -74.1188103, -105.3065618, -113.2859763, -107.1281457,
    -114.2962236, -116.9062917, -133.9619619, -144.1898013, -137.0047106,
    -151.1097519, -154.4189764, -185.8445145, -184.4541504, -193.7439903,
    -210.52955, -206.1784129, -193.2500736, -215.9068514, -230.2125345,
    -228.7670146, -250.2199457, -266.1070973, -262.9228339, -288.4576993,
    -269.12947, -283.5241858, -294.5584899, -307.9799177, -312.8651144,
    -344.1481041, -321.5574975, -329.4373664, -328.4464111, -350.2971704,
    -369.5624025, -387.273018, -382.7558586, -390.9705394, -389.1991072,
    -409.0580274, -413.6738399, -398.5579103, -412.2268511, -442.9758933,
    -439.7998292, -438.252361, -446.1181744, -454.7550086, -469.7885275,
    -481.0593032, -475.5254844, -496.8041008, -500.8903476, -515.2757517,
    -507.2368661, -531.9039554, -537.5425866, -549.0310138, -537.730355,
    -564.0519484, -563.6181999, -576.6469249, -561.3313691, -583.8259648,
    -594.4479037, -615.3990507, -615.299238, -629.5286155, -640.9007857,
    -678.2662762, -663.919456, -643.4235936, -679.0742379, -658.7583244,
    -676.3862704, -687.8975981
])

x = np.arange(len(y))

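data_string = f"x = {x.tolist()}\ny = {y.tolist()}\n"  # .tolist() gives plain Python numbers, so NumPy 2 reprs like np.float64(...) stay out of the prompt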

response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[
        {"role": "system", "content": "You are a data analysis assistant"},
        {"role": "user", "content": f"Given the following dataset:\n{data_string}\nCompute the slope of the best-fit line using linear regression."},
    ],
    stream=False
)
print(response.choices[0].message.content) # takes 12 min

The slope of the best-fit line using linear regression is approximately **-8.076**.

**Step-by-Step Explanation:**

1. **Identify the formula for the slope (m) in linear regression:**
   
   \[
   m = \frac{N \sum (xy) - \sum x \sum y}{N \sum x^2 - (\sum x)^2}
   \]

2. **Compute required sums:**
   - \( N = 100 \) (number of data points).
   - \( \sum x = 4950 \) (sum of integers from 0 to 99).
   - \( \sum y \approx -29,010.24 \) (sum of all y-values).
   - \( \sum x^2 = 328,350 \) (sum of squares from 0 to 99).
   - \( \sum xy \approx -2,109,006.33 \) (sum of products of corresponding x and y values).

3. **Calculate the numerator:**
   
   \[
   N \sum xy - \sum x \sum y = 100(-2,109,006.33) - 4950(-29,010.24) = -67,299,945
   \]

4. **Calculate the denominator:**
   
   \[
   N \sum x^2 - (\sum x)^2 = 100(328,350) - (4950)^2 = 8,332,500
   \]

5. **Divide numerator by denominator to find the slope:**
   
   \[
   m = \frac{-67,299,945}{8,332,500} \approx -8.076
   \]

**Final Answ
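
The model carries out this regression step by step, but the same calculation takes a few lines of NumPy and runs in milliseconds rather than minutes. That makes it a cheap way to check each LLM answer. The sketch below implements the closed-form formula from step 1 and cross-checks it against `np.polyfit`. The noisy line is hypothetical data.

```python
import numpy as np


def ols_slope(x, y):
    """Slope from m = (N*sum(xy) - sum(x)*sum(y)) / (N*sum(x^2) - (sum(x))^2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    numerator = n * np.sum(x * y) - np.sum(x) * np.sum(y)
    denominator = n * np.sum(x ** 2) - np.sum(x) ** 2
    return numerator / denominator


# Hypothetical data: a line with slope -8 plus Gaussian noise.
rng = np.random.default_rng(1)
x = np.arange(100)
y = 120.0 - 8.0 * x + rng.normal(0.0, 10.0, size=x.size)

closed_form = ols_slope(x, y)
polyfit_slope = np.polyfit(x, y, deg=1)[0]  # coefficients come highest degree first
print(round(closed_form, 4), round(polyfit_slope, 4))
```

Swap in the `x` and `y` arrays from the cell above to get a local reference slope for checking the model's -8.076.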

In [5]:
client = OpenAI(api_key="DeepSeek API Key", base_url="https://api.deepseek.com")

y2 = np.array([
    113.972021, 96.8425417, 79.66548259, 116.8825082, 117.9882828, 103.6591404,
    124.3820187, 113.7961487, 122.8226025, 120.4973816, 119.2478919, 113.572426,
    106.0056189, 111.2260341, 112.9426683, 109.1883457, 118.4875615, 119.5830766,
    124.4114252, 107.273729, 126.8954537, 126.2115186, 132.4562815, 128.2934688,
    152.3993581, 144.2275388, 127.1433646, 154.9712733, 135.1302049, 144.263522,
    153.6343992, 153.1753224, 153.5606369, 145.390646, 168.5879094, 157.0493412,
    185.7548808, 156.2630919, 145.4990369, 178.8320882, 146.7587357, 151.9250376,
    177.7076008, 183.0418563, 169.9980566, 163.786648, 169.3729645, 169.7193797,
    174.7429577, 180.3129633, 175.7040688, 166.7623875, 180.9558295, 185.098228,
    176.6577398, 206.1803802, 178.9754782, 176.8255279, 200.1317998, 201.0369923,
    188.551254, 185.213703, 194.5855167, 203.2992118, 211.2455342, 194.2414645,
    209.3108035, 226.1038455, 223.9243755, 222.7375063, 218.9648207, 219.8055892,
    212.469638, 238.0222394, 215.162087, 233.9814363, 215.6033942, 216.9381807,
    224.7386822, 213.5673761, 238.160055, 212.821627, 220.1293233, 212.9667076,
    243.2293706, 240.9306931, 240.8027919, 233.8294231, 227.2869887, 230.8675768,
    232.1003134, 240.8508533, 238.5507094, 248.8676498, 241.8583785, 247.2165277,
    263.1905395, 255.8322099, 261.9058438, 281.2687446
])
x2 = np.arange(len(y2))  # index positions for y2, not y

data_string_2 = f"x = {x2.tolist()}\ny = {y2.tolist()}\n"

response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[
        {"role": "system", "content": "You are a data analysis assistant"},
        {"role": "user", "content": f"Given the following dataset:\n{data_string_2}\nCompute the slope of the best-fit line using linear regression."},
    ],
    stream=False
)
print(response.choices[0].message.content) # takes 7 min

The slope of the best-fit line using linear regression is calculated as follows:

1. **Compute the necessary sums:**
   - \( N = 100 \)
   - \( \sum x = 4950 \)
   - \( \sum y \approx 17,839.01386 \)
   - \( \sum xy \approx 1,017,068.9917 \)
   - \( \sum x^2 = 328,350 \)

2. **Apply the slope formula:**
   \[
   \text{slope} = \frac{N \sum xy - \sum x \sum y}{N \sum x^2 - (\sum x)^2}
   \]
   - Numerator: \( 100 \times 1,017,068.9917 - 4950 \times 17,839.01386 \approx 13,403,780.563 \)
   - Denominator: \( 100 \times 328,350 - 4950^2 = 8,332,500 \)
   - Slope: \( \frac{13,403,780.563}{8,332,500} \approx 1.6086 \)

**Final Answer:**  
The slope of the best-fit line is approximately \(\boxed{1.6086}\).
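
Several sums in both answers depend only on N, not on the data. For x = 0, ..., N - 1 they follow from closed forms, and a short script can confirm them for N = 100. Only the data-dependent sums (the sum of y and the sum of xy) need the actual values.

```python
N = 100
xs = range(N)

sum_x = sum(xs)                         # N(N-1)/2
sum_x2 = sum(i * i for i in xs)         # (N-1)N(2N-1)/6
denominator = N * sum_x2 - sum_x ** 2   # simplifies to N^2(N^2-1)/12

assert sum_x == N * (N - 1) // 2
assert sum_x2 == (N - 1) * N * (2 * N - 1) // 6
assert denominator == N * N * (N * N - 1) // 12
print(sum_x, sum_x2, denominator)  # 4950 328350 8332500
```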


In [7]:
np.random.seed(42)

file_path = "/content/PdM_telemetry - machine1 pressure.csv"
dataset = pd.read_csv(file_path)

dataset['time_marker'] = pd.to_datetime(dataset['datetime'])
dataset = dataset.sort_values('time_marker')

time_steps = np.arange(len(dataset))
sensor_readings = dataset['pressure'].values

growth_rate = 1.5
fluctuation_level = 1.8
cyclic_variation = 7
cycle_duration = 336

trend_effect = growth_rate * time_steps
random_variation = np.random.normal(0, fluctuation_level, len(dataset))  # Fixed random variation
cyclic_effect = cyclic_variation * np.sin(2 * np.pi * time_steps / cycle_duration)

dataset['modified_sensor_readings'] = sensor_readings + trend_effect + random_variation + cyclic_effect

X_subset = time_steps[100:200].tolist()
y_subset = dataset['modified_sensor_readings'].iloc[100:200].tolist()  # positional rows 100-199, aligned with time_steps
data_string_3 = f"X = {X_subset}\nY = {y_subset}\n"

client = OpenAI(api_key="DeepSeek API Key", base_url="https://api.deepseek.com")

response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[
        {"role": "system", "content": "You are a data analysis assistant"},
        {"role": "user", "content": f"Given the following dataset:\n{data_string_3}\nCompute the slope of the best-fit line using linear regression."}
    ],
    stream=False
)

print(response.choices[0].message.content) # takes 6 min; the model answered 1.5, but the ground-truth slope is 1.389

To compute the slope of the best-fit line using linear regression, we use the formula:

\[
\text{slope} = \frac{n \sum (xy) - \sum x \sum y}{n \sum x^2 - (\sum x)^2}
\]

**Given Data:**
- \( X = [100, 101, \ldots, 199] \) (100 values)
- \( Y \) is provided as a list of 100 values.

**Step-by-Step Calculations:**
1. **Sum of X (\( \sum x \)):**
   \[
   \sum x = \frac{(100 + 199) \times 100}{2} = 14,950
   \]

2. **Sum of Y (\( \sum y \)):**
   \[
   \sum y = 32,722.291 \quad (\text{calculated by summing all Y values})
   \]

3. **Sum of \( X^2 \) (\( \sum x^2 \)):**
   \[
   \sum x^2 = 2,318,350 \quad (\text{calculated using the sum of squares formula for the sequence})
   \]

4. **Sum of \( XY \) (\( \sum xy \)):**
   \[
   \sum xy = 4,894,123.78 \quad (\text{calculated by summing each } x_i y_i)
   \]

**Plugging into the Formula:**
\[
\text{slope} = \frac{100 \times 4,894,123.78 - 14,950 \times 32,722.291}{100 \times 2,318,350 - (14,950)^2}
\]
\[
\text{Numerator} = 489,412,378 - 489
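
The raw-sum formula becomes numerically fragile once x is offset from zero. With X = 100, ..., 199, both terms of the numerator are roughly 4.89 × 10^8, so the slope depends on their small difference, and any error in the hand-computed sums dominates the result. The equivalent centered form, Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², avoids that cancellation.

The sketch below also rebuilds the trend, noise and cyclic terms from the cell above, on top of a constant baseline pressure. That baseline is an assumption, since the CSV is not reproduced here, and the noise draws differ from the notebook's. Even so, the sketch shows why the fitted slope over rows 100 to 199 sits below the 1.5 growth rate: the sine term with period 336 is falling across that window.

```python
import numpy as np


def centered_slope(x, y):
    """OLS slope via sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean) ** 2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    return np.sum(dx * (y - y.mean())) / np.sum(dx ** 2)


# Synthetic terms from the cell above. The constant baseline (100.0) and the
# series length (1000) are hypothetical stand-ins for the real pressure data.
rng = np.random.default_rng(42)
time_steps = np.arange(1000)
growth_rate, fluctuation_level = 1.5, 1.8
cyclic_variation, cycle_duration = 7, 336

readings = (
    100.0
    + growth_rate * time_steps
    + rng.normal(0, fluctuation_level, time_steps.size)
    + cyclic_variation * np.sin(2 * np.pi * time_steps / cycle_duration)
)

X_subset = time_steps[100:200]
y_subset = readings[100:200]
slope = centered_slope(X_subset, y_subset)
print(round(slope, 3))  # the falling sine term pulls this below growth_rate
```

Run with the real pressure readings in place of the constant baseline, `centered_slope(X_subset, y_subset)` gives the local reference to compare against the model's answer.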

In [13]:
from google.colab import files

uploaded = files.upload()
file_name = list(uploaded.keys())[0]

with open(file_name, 'r', encoding='utf-8') as file:
    descriptions = file.readlines()
descriptions = [desc.strip() for desc in descriptions if desc.strip()]

client = OpenAI(api_key="DeepSeek API Key", base_url="https://api.deepseek.com")

def classify_maintenance(description):
    prompt = (
        "You are asked to classify the following maintenance record as either 'pass' or 'completed':\n\n"
        f"Description: {description}\n\n"
        "A 'pass' record means that there is no issue for the machine and thus no maintenance is required.\n"
        "A 'completed' record means there are anomalies or issues for the machine and thus maintenance is required.\n"
        "Return only 'pass' or 'completed' as the response."
    )

    response = client.chat.completions.create(
        model="deepseek-reasoner",
        messages=[{"role": "user", "content": prompt}],
        stream=False
    )

    return response.choices[0].message.content.strip()

results = {"Description": [], "Classification": []}

for desc in descriptions:
    classification = classify_maintenance(desc)
    results["Description"].append(desc)
    results["Classification"].append(classification)

results_df = pd.DataFrame(results) # takes 23 min

Saving short.txt to short.txt
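
Each call to `deepseek-reasoner` is independent of the others, so the 23-minute sequential loop is a natural candidate for a thread pool. The sketch below assumes the account's rate limits tolerate a handful of concurrent requests. `classify_all` and `fake_classifier` are hypothetical names; the fake classifier stands in for `classify_maintenance` so the example runs offline.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def classify_all(classify, descriptions, max_workers=8):
    """Run `classify` over all descriptions concurrently.

    executor.map yields results in input order, so each label stays
    aligned with its description.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        labels = list(executor.map(classify, descriptions))
    return {"Description": list(descriptions), "Classification": labels}


def fake_classifier(description):
    # Stand-in for classify_maintenance: sleep to mimic network latency.
    time.sleep(0.1)
    return "completed" if "replaced" in description else "pass"


sample = ["Pump seals replaced", "Routine lubrication performed", "Air filter replaced"]
results = classify_all(fake_classifier, sample, max_workers=3)
print(results["Classification"])  # ['completed', 'pass', 'completed']
```

In the notebook, `pd.DataFrame(classify_all(classify_maintenance, descriptions))` would take the place of the loop.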


In [19]:
results_df

|     | Description | Classification |
|-----|-------------|----------------|
| 91  | R091,M08,2024-12-13,Gearbox realigned after de... | completed |
| 92  | R092,M05,2024-06-06,Air filter replaced after ... | completed |
| 93  | R093,M07,2025-01-22,Faulty relay switch replac... | completed |
| 94  | R094,M03,2024-01-27,Pump seals replaced to res... | completed |
| 95  | R095,M03,2025-02-05,System recalibrated follow... | completed |
| 96  | R096,M01,2024-02-20,Faulty relay switch replac... | completed |
| 97  | R097,M10,2024-11-01,Air filter replaced after ... | completed |
| 98  | R098,M04,2024-09-08,Technicians replaced a wor... | completed |
| 99  | R099,M08,2024-07-16,System recalibrated follow... | completed |
| 100 | R100,M06,2024-09-01,Technicians replaced a wor... | completed |
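
The prompt asks for a bare 'pass' or 'completed', but nothing forces the model to comply exactly. Variations in casing, trailing punctuation or Markdown emphasis would split one label into several categories. The small normalizer below (a hypothetical helper) maps such variants back to the two labels and flags anything else.

```python
import re


def normalize_label(raw):
    """Map a model reply to 'pass' or 'completed'; return None if ambiguous."""
    words = set(re.findall(r"[a-z]+", raw.lower()))
    found = words & {"pass", "completed"}
    return found.pop() if len(found) == 1 else None


for reply in ["completed", "**Completed**", "Pass.", "pass or completed", "unclear"]:
    print(repr(reply), "->", normalize_label(reply))
```

Applied as `normalize_label(classify_maintenance(desc))`, any `None` result marks a record worth inspecting by hand.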


In [20]:
# Reuses `files`, `client` and classify_maintenance() defined in In [13]
uploaded = files.upload()
file_name = list(uploaded.keys())[0]

with open(file_name, 'r', encoding='utf-8') as file:
    descriptions = file.readlines()
descriptions = [desc.strip() for desc in descriptions if desc.strip()]

results = {"Description": [], "Classification": []}

for desc in descriptions:
    classification = classify_maintenance(desc)
    results["Description"].append(desc)
    results["Classification"].append(classification)

results_df = pd.DataFrame(results) # takes 23 min

Saving detailed.txt to detailed.txt


In [21]:
results_df

|     | Description | Classification |
|-----|-------------|----------------|
| 0   | Record ID,Machine ID,Date,Description,Status | pass |
| 1   | R001,M01,2025-01-26,"Safety checks completed; ... | pass |
| 2   | R002,M05,2024-12-13,Hydraulic pressure levels ... | pass |
| 3   | R003,M06,2024-12-10,"Safety checks completed; ... | pass |
| 4   | R004,M09,2025-01-25,"Routine lubrication perfo... | pass |
| ... | ... | ... |
| 96  | R096,M03,2025-01-09,"Pump seals replaced to re... | completed |
| 97  | R097,M03,2024-06-27,"Electrical panel wiring i... | completed |
| 98  | R098,M09,2024-03-22,"Electrical panel wiring i... | completed |
| 99  | R099,M03,2024-04-28,"Gearbox realigned after d... | completed |
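
Row 0 shows that the CSV header line was sent to the model and classified as if it were a record. The header also lists a `Status` column, so each line passed as the "description" probably carries its own label too. That would let the model read the answer rather than infer it.

The sketch below parses the file with Python's `csv` module, sends only the `Description` field, and scores predictions against `Status`. Both sample records are hypothetical. The lambda is a trivial stand-in for `classify_maintenance`.

```python
import csv
import io


def load_records(text):
    """Parse CSV text; DictReader consumes the header row itself."""
    return list(csv.DictReader(io.StringIO(text)))


def score(records, classify):
    """Classify only the Description field and compare with Status."""
    predictions = [classify(r["Description"]) for r in records]
    correct = sum(p == r["Status"] for p, r in zip(predictions, records))
    return predictions, correct / len(records)


# Hypothetical records in the same column layout as the uploaded file.
sample_csv = (
    "Record ID,Machine ID,Date,Description,Status\n"
    'H001,M01,2025-01-01,"Routine inspection, no anomalies",pass\n'
    "H002,M02,2025-01-02,Bearing replaced after vibration alarm,completed\n"
)

records = load_records(sample_csv)
predictions, accuracy = score(
    records, lambda d: "completed" if "replaced" in d else "pass"
)
print(len(records), predictions, accuracy)  # 2 ['pass', 'completed'] 1.0
```

With the uploaded file, `load_records(open(file_name, encoding='utf-8').read())` would replace the `readlines()` step.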


**LangChain**

In [24]:
!pip uninstall -y langchain langchain-community langchain-core
!pip install --upgrade langchain langchain-community langchain-core openai

Found existing installation: langchain 0.3.20
Uninstalling langchain-0.3.20:
  Successfully uninstalled langchain-0.3.20
Found existing installation: langchain-core 0.3.43
Uninstalling langchain-core-0.3.43:
  Successfully uninstalled langchain-core-0.3.43
Collecting langchain
  Downloading langchain-0.3.20-py3-none-any.whl.metadata (7.7 kB)
Collecting langchain-community
  Downloading langchain_community-0.3.19-py3-none-any.whl.metadata (2.4 kB)
Collecting langchain-core
  Downloading langchain_core-0.3.44-py3-none-any.whl.metadata (5.9 kB)
Collecting openai
  Downloading openai-1.66.2-py3-none-any.whl.metadata (25 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain-community)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting pydantic-settings<3.0.0,>=2.4.0 (from langchain-community)
  Downloading pydantic_settings-2.8.1-py3-none-any.whl.metadata (3.5 kB)
Collecting httpx-sse<1.0.0,>=0.4.0 (from langchain-community)
  Downloading httpx_sse-0

In [26]:
from langchain_community.chat_models import ChatOpenAI
from langchain_core.messages import HumanMessage

deepseek_llm = ChatOpenAI(
    model="deepseek-reasoner",
    openai_api_key="DeepSeek API Key",
    openai_api_base="https://api.deepseek.com"
)

response = deepseek_llm.invoke([HumanMessage(content="What is LangChain?")])  # .invoke() replaces the deprecated __call__
print(response.content)


LangChain is a framework designed to streamline the development of applications using large language models (LLMs) by providing modular components and abstractions. Here's a structured breakdown of its key aspects:

1. **Core Purpose**: 
   - Facilitates integration of LLMs (like GPT, Claude) into applications by managing complexities such as context retention, data retrieval, and multi-step workflows.

2. **Key Components**:
   - **Models**: Supports various LLMs and embedding models, allowing easy swapping between providers (OpenAI, Anthropic, etc.).
   - **Prompts**: Offers prompt templates for dynamic input structuring and optimization techniques like few-shot learning.
   - **Memory**: Enables short-term (conversation history) and long-term (database-stored) context retention for coherent interactions.
   - **Chains**: Predefined sequences that combine LLM calls, prompts, and tools for tasks like summarization or Q&A. Custom chains can be created for specific workflows.
   - **Age