# Evaluation Module Tutorial



This tutorial demonstrates how to use the `ccp.Evaluation` class to analyze compressor performance by comparing operational data against reference performance curves.

The evaluation module is particularly useful for:
- Comparing actual compressor performance to design specifications
- Calculating efficiency deviations from expected performance

## Overview

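An `Evaluation` takes a DataFrame of operational data, the units of its columns, the operating fluid composition and one or more reference impellers. It returns the data augmented with expected values and deviations for efficiency, head, power and discharge pressure.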

## Setup and Imports

First, let's import the necessary libraries and point to the test data located inside the `ccp` package:

In [1]:
import pandas as pd
import ccp
from pathlib import Path

# Quantity shortcut
Q_ = ccp.Q_

# Path to test data
data_path = Path(ccp.__file__).parent / "tests/data"

In [2]:
# change pandas plot backend to plotly
pd.options.plotting.backend = "plotly"

## Loading Operational Data

Load the operational data that we want to evaluate:

In [3]:
# Load operational data from parquet file
df = pd.read_parquet(data_path / "data.parquet")

print(f"Loaded {len(df)} data points")
print("\nColumns in dataset:")
print(df.columns.tolist())
print("\nFirst few rows:")
df.head()

Loaded 30 data points

Columns in dataset:
['ps', 'Ts', 'pd', 'Td', 'delta_p', 'speed', 'flow_m', 'flow_v']

First few rows:


| | ps | Ts | pd | Td | delta_p | speed | flow_m | flow_v |
|---|---|---|---|---|---|---|---|---|
| 2023-04-04 11:30:00 | 5.289741 | 31.15196 | 6.043586 | 39.982773 | 110.686142 | 2858.396973 | 8.272493 | 1.245845 |
| 2023-04-04 20:15:00 | 4.525424 | 30.464081 | 7.522561 | 45.116444 | 891.66803 | 6441.448242 | 21.561627 | 3.795705 |
| 2023-04-04 20:45:00 | 5.124909 | 31.742037 | 5.871001 | 39.12587 | 107.371078 | 2852.725342 | 8.01002 | 1.248304 |
| 2023-04-04 20:52:30 | 4.850587 | 32.355854 | 4.923274 | 48.629524 | 1.418498 | 16.785841 | 0.90159 | 0.148893 |
| 2023-04-04 21:30:00 | 5.213386 | 31.804134 | 5.98125 | 41.777378 | 110.2071 | 2870.0 | 8.184707 | 1.253805 |


## Defining Fluid Compositions

We need to define two fluid compositions:
1. **Test fluid**: The fluid composition used when the reference curves were generated
2. **Operation fluid**: The actual fluid composition during operation

In [4]:
# Test fluid composition (used for reference curves)
fluid_a = {
    "methane": 58.976,
    "ethane": 3.099,
    "propane": 0.6,
    "n-butane": 0.08,
    "i-butane": 0.05,
    "n-pentane": 0.01,
    "i-pentane": 0.01,
    "n2": 0.55,
    "h2s": 0.02,
    "co2": 36.605,
}

# Operation fluid composition (actual operating conditions)
operation_fluid = {
    "methane": 44.04,
    "ethane": 3.18,
    "propane": 0.66,
    "n-butane": 0.15,
    "i-butane": 0.05,
    "n-pentane": 0.03,
    "i-pentane": 0.02,
    "n2": 0.25,
    "h2s": 0.06,
    "co2": 51.55,
}

print("Test fluid composition:")
for comp, frac in fluid_a.items():
    print(f"  {comp}: {frac:.3f}%")

print("\nOperation fluid composition:")
for comp, frac in operation_fluid.items():
    print(f"  {comp}: {frac:.3f}%")

Test fluid composition:
  methane: 58.976%
  ethane: 3.099%
  propane: 0.600%
  n-butane: 0.080%
  i-butane: 0.050%
  n-pentane: 0.010%
  i-pentane: 0.010%
  n2: 0.550%
  h2s: 0.020%
  co2: 36.605%

Operation fluid composition:
  methane: 44.040%
  ethane: 3.180%
  propane: 0.660%
  n-butane: 0.150%
  i-butane: 0.050%
  n-pentane: 0.030%
  i-pentane: 0.020%
  n2: 0.250%
  h2s: 0.060%
  co2: 51.550%
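
The two compositions are entered as percentages, so a quick sanity check is to confirm that each one adds up to 100. The sketch below does that and also shows one way to rescale a composition to fractions that sum to 1. The `normalize` helper is hypothetical (it is not part of ccp), and whether ccp normalizes the composition internally is not stated here.

```python
# Compositions copied from the cell above, in percent.
fluid_a = {
    "methane": 58.976, "ethane": 3.099, "propane": 0.6,
    "n-butane": 0.08, "i-butane": 0.05, "n-pentane": 0.01,
    "i-pentane": 0.01, "n2": 0.55, "h2s": 0.02, "co2": 36.605,
}
operation_fluid = {
    "methane": 44.04, "ethane": 3.18, "propane": 0.66,
    "n-butane": 0.15, "i-butane": 0.05, "n-pentane": 0.03,
    "i-pentane": 0.02, "n2": 0.25, "h2s": 0.06, "co2": 51.55,
}


def normalize(composition):
    """Hypothetical helper: scale the values so that they sum to 1."""
    total = sum(composition.values())
    return {name: value / total for name, value in composition.items()}


# Report the raw totals; a total far from 100 usually signals a typo.
for label, composition in [("test", fluid_a), ("operation", operation_fluid)]:
    print(f"{label} fluid total: {sum(composition.values()):.3f}%")

operation_normalized = normalize(operation_fluid)
```

With the values above, the test fluid adds up to 100% while the operation fluid adds up to 99.99%, so rescaling changes the operation fractions only marginally.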


## Creating Reference Impeller

Now we'll create a reference impeller by loading performance curves from CSV files. These curves were digitized from performance charts:

In [5]:
# Create suction state for the reference curves
suc_a = ccp.State(
    p=Q_(4, "bar"),
    T=Q_(40, "degC"),
    fluid=fluid_a,
)

print("Reference suction state:")
print(f"  Pressure: {suc_a.p()}")
print(f"  Temperature: {suc_a.T()}")
print(f"  Density: {suc_a.rho():.2f}")
print(f"  Molar mass: {suc_a.molar_mass():.2f}")

Reference suction state:
  Pressure: 400000.0 pascal
  Temperature: 313.15 kelvin
  Density: 4.19 kilogram / meter ** 3
  Molar mass: 0.03 kilogram / mole
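
The molar mass above is printed in kg/mol with two decimals, which hides most of the value. As a rough cross-check, the sketch below estimates the mixture molar mass directly from the test fluid composition. It treats the percentages as mole percent and uses rounded textbook molar masses; both are assumptions introduced here, not values taken from ccp.

```python
# Rounded textbook molar masses in g/mol (assumed values, not read from ccp).
MOLAR_MASS_G_PER_MOL = {
    "methane": 16.043, "ethane": 30.070, "propane": 44.097,
    "n-butane": 58.123, "i-butane": 58.123,
    "n-pentane": 72.150, "i-pentane": 72.150,
    "n2": 28.013, "h2s": 34.081, "co2": 44.010,
}

# Test fluid composition from the tutorial, assumed to be mole percent.
fluid_a = {
    "methane": 58.976, "ethane": 3.099, "propane": 0.6,
    "n-butane": 0.08, "i-butane": 0.05, "n-pentane": 0.01,
    "i-pentane": 0.01, "n2": 0.55, "h2s": 0.02, "co2": 36.605,
}

total = sum(fluid_a.values())
# Mole-fraction-weighted average of the component molar masses.
mixture_molar_mass = sum(
    MOLAR_MASS_G_PER_MOL[name] * percent / total
    for name, percent in fluid_a.items()
)

print(f"{mixture_molar_mass:.2f} g/mol")
print(f"{mixture_molar_mass / 1000:.2f} kg/mol")  # same two-decimal format as above
```

The estimate comes out at about 27 g/mol, which is consistent with the `0.03 kilogram / mole` shown in the cell output once it is rounded to two decimals.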


In [6]:
# Load impeller performance curves from CSV files
imp_a = ccp.Impeller.load_from_engauge_csv(
    suc=suc_a,
    curve_name="eval-lp-sec1-caso-a",
    curve_path=data_path,
    flow_units="m³/h",
    head_units="kJ/kg",
    number_of_points=4,  # Use 4 points for interpolation
)

print(f"Loaded impeller with {len(imp_a.points)} performance points")
# `.magnitude` is expressed in ccp's internal speed unit, assumed here to be rad/s
print(f"Speed range: {min(p.speed.magnitude for p in imp_a.points):.0f} - {max(p.speed.magnitude for p in imp_a.points):.0f} rad/s")

Loaded impeller with 8 performance points
Speed range: 927 - 1029 rad/s
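
The speed range is printed from `p.speed.magnitude`, which is the bare number in whatever unit ccp stores internally. Assuming that unit is rad/s (ccp works in SI base units), the two curve speeds can be converted to RPM with the short sketch below. The helper name `rad_s_to_rpm` is hypothetical.

```python
import math


def rad_s_to_rpm(omega):
    """Convert an angular speed from rad/s to revolutions per minute."""
    return omega * 60 / (2 * math.pi)


# Magnitudes printed in the cell output above, assumed to be in rad/s.
curve_speeds_rad_s = (927, 1029)
curve_speeds_rpm = [rad_s_to_rpm(omega) for omega in curve_speeds_rad_s]

for omega, rpm in zip(curve_speeds_rad_s, curve_speeds_rpm):
    print(f"{omega} rad/s is about {rpm:.0f} RPM")
```

Under that assumption the reference curves span roughly 8850 to 9830 RPM, which brackets the operating speed of about 9060 RPM seen in the evaluation results later in this tutorial.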


## Visualizing Reference Curves

Let's plot the reference performance curves to understand what we're comparing against:

In [7]:
# Plot the reference curves
imp_a.head_plot()

## Creating the Evaluation

Now we can create the `Evaluation` object that will compare our operational data against the reference curves:

In [8]:
# Create evaluation object
evaluation = ccp.Evaluation(
    data=df,
    operation_fluid=operation_fluid,
    data_units={
        "ps": "bar",        # Suction pressure
        "Ts": "degC",       # Suction temperature
        "pd": "bar",        # Discharge pressure
        "Td": "degC",       # Discharge temperature
        "flow_v": "m³/s",   # Volumetric flow rate
        "speed": "RPM",     # Rotational speed
    },
    impellers=[imp_a],      # Reference impeller(s)
    n_clusters=2,           # Number of clusters for data analysis
)

print(f"Evaluation created with {len(evaluation.df)} data points")
print(f"Average efficiency deviation: {evaluation.df['delta_eff'].mean():.2f}%")

Converting curves
Calculating points...
Calculating expected points...

Evaluation created with 2 data points
Average efficiency deviation: 11.27%


## Understanding the Results

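The evaluation calculates several key metrics for each valid data point. Note that only 2 of the 30 loaded rows remain in `evaluation.df`; the others were presumably dropped as invalid, as happens in the manual calculation with `drop_invalid_values=True` at the end of this tutorial.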

In [9]:
# Results are available in the `evaluation.df` DataFrame
evaluation.df

| | ps | Ts | pd | Td | delta_p | speed | flow_m | flow_v | valid | v_s | ... | p_disch | expected_eff | expected_head | expected_power | expected_p_disch | delta_eff | delta_head | delta_power | delta_p_disch | timescale |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2023-04-05 02:30:00 | 3.807444 | 24.541362 | 16.192327 | 138.779867 | 1256.971639 | 9062.537109 | 23.593979 | 4.847396 | True | 0.20545 | ... | 16.192327 | 0.826488 | 151553.975693 | 4326463.0 | 18.416955 | 11.426007 | -11.621656 | -22.35579 | -12.079239 | 0 |
| 2023-04-05 03:07:30 | 3.788713 | 24.206667 | 16.086901 | 138.611476 | 1257.790161 | 9063.27474 | 23.555144 | 4.857905 | True | 0.206237 | ... | 16.086901 | 0.826538 | 151480.630354 | 4333480.0 | 18.405535 | 11.117314 | -11.738841 | -22.49968 | -12.597482 | 1 |


### Key Metrics Explained:

- **`expected_eff`**: Efficiency expected from the reference curve at the operating point
- **`eff`**: Efficiency calculated from the measured operational data
- **`delta_eff`**: Efficiency deviation from the reference curve, in percent (positive = better than expected)

The same `expected_*` and `delta_*` columns are available for head, power and discharge pressure (`p_disch`).
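
The deviation columns appear to be relative deviations in percent, that is 100 × (actual − expected) / expected. This formula is inferred from the numbers in the results table, not quoted from the ccp source. The sketch below applies it to the `p_disch` and `expected_p_disch` values of the two rows shown above; `relative_deviation` is a hypothetical helper name.

```python
def relative_deviation(actual, expected):
    """Deviation of `actual` from `expected`, in percent of `expected`."""
    return 100 * (actual - expected) / expected


# (p_disch, expected_p_disch) for the two rows of the results table.
rows = [
    (16.192327, 18.416955),
    (16.086901, 18.405535),
]

deviations = [relative_deviation(actual, expected) for actual, expected in rows]
for deviation in deviations:
    print(f"delta_p_disch is about {deviation:.2f}%")
```

The results agree with the `delta_p_disch` column (-12.079239 and -12.597482) to within the rounding of the displayed inputs, which supports reading the other `delta_*` columns the same way.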

## Analyzing Performance by Cluster

The evaluation automatically clusters the data to identify different operating regimes:

In [10]:
# Analyze performance by cluster
if 'cluster' in evaluation.df.columns:
    cluster_analysis = evaluation.df.groupby('cluster')['delta_eff'].agg(['count', 'mean', 'std'])
    cluster_analysis.columns = ['Count', 'Mean Δeff (%)', 'Std Δeff (%)']
    
    print("Performance by cluster:")
    print(cluster_analysis.round(2))
else:
    print("Cluster information not available in results")

Performance by cluster:
         Count  Mean Δeff (%)  Std Δeff (%)
cluster                                    
0            1          11.43           NaN
1            1          11.12           NaN
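
The `NaN` in the standard deviation column is expected: each cluster holds a single point, and pandas computes the sample standard deviation (denominator n − 1) by default, which is undefined for one observation. A minimal pure-Python sketch of that calculation, with a hypothetical `sample_std` helper standing in for the pandas aggregation:

```python
import math


def sample_std(values):
    """Sample standard deviation (n - 1 denominator); NaN when n < 2."""
    n = len(values)
    if n < 2:
        return math.nan
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


# One point per cluster, as in the table above: the spread is undefined.
single_point_std = sample_std([11.43])

# Pooling both cluster means gives a defined (if not very meaningful) spread.
pooled_std = sample_std([11.43, 11.12])

print(single_point_std, round(pooled_std, 3))
```

With more valid points per cluster, the `Std Δeff (%)` column would show how much the efficiency deviation scatters within each operating regime.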


## Visualizing Results

Let's plot the efficiency deviation over time to better understand the evaluation results:

In [11]:
evaluation.df["delta_eff"].plot()

## Working with Flow Orifice Data

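The evaluation module also supports flow measurement via orifice plates. In that case the dataset has no `flow_v` column, and the flow is derived from the differential pressure across the orifice together with the orifice geometry. Here's an example using differential pressure data: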

In [12]:
# Load data with differential pressure measurements
df_delta_p = pd.read_parquet(data_path / "data_delta_p.parquet")

print(f"Loaded {len(df_delta_p)} data points with differential pressure")
print("Columns:", df_delta_p.columns.tolist())

Loaded 30 data points with differential pressure
Columns: ['ps', 'Ts', 'pd', 'Td', 'delta_p', 'speed', 'p_downstream']


In [13]:
# Create evaluation with flow orifice parameters
evaluation_orifice = ccp.Evaluation(
    data=df_delta_p,
    operation_fluid=operation_fluid,
    data_units={
        "ps": "bar",
        "Ts": "degC",
        "pd": "bar",
        "Td": "degC",
        "delta_p": "mmH2O",        # Differential pressure across orifice
        "p_downstream": "bar",     # Downstream pressure
        "speed": "RPM",
    },
    impellers=[imp_a],
    D=Q_(0.590550, "m"),           # Pipe diameter
    d=Q_(0.366130, "m"),           # Orifice diameter
    tappings="flange",             # Pressure tapping type
    n_clusters=2,
)

print(f"Orifice evaluation delta_eff: {evaluation_orifice.df['delta_eff'].mean():.2f}%")

Converting curves
Calculating points...
Calculating expected points...

Orifice evaluation delta_eff: 11.33%
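
Two quantities help when checking orifice inputs by hand: the diameter ratio β = d / D and the differential pressure in SI units. The sketch below computes both from the geometry passed to `Evaluation` above. The differential pressure value is taken from the results table earlier in this tutorial purely for illustration, and the conversion factor is the conventional 1 mmH2O = 9.80665 Pa; the unit definition ccp actually uses may differ slightly.

```python
# Geometry passed to ccp.Evaluation in the cell above.
D = 0.590550  # pipe internal diameter, m
d = 0.366130  # orifice bore diameter, m

# Diameter ratio used in orifice flow equations.
beta = d / D

# Conventional conversion factor (assumed; ccp's own definition may differ).
MMH2O_TO_PA = 9.80665

# Illustrative differential pressure, taken from the first results row.
delta_p_mmh2o = 1256.971639
delta_p_pa = delta_p_mmh2o * MMH2O_TO_PA

print(f"beta = {beta:.3f}")
print(f"delta_p is about {delta_p_pa:.0f} Pa ({delta_p_pa / 1e5:.3f} bar)")
```

A β close to 0.62 and a differential pressure of roughly 0.12 bar are plausible magnitudes for this kind of measurement, so values far from these would suggest a unit mix-up in `data_units`.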


## Saving and Loading Evaluations

Evaluation results can be saved and loaded for later analysis:

In [14]:
from tempfile import NamedTemporaryFile
import os

# Save evaluation to file
with NamedTemporaryFile(suffix='.ccp_eval', delete=False) as tmp_file:
    tmp_path = tmp_file.name

evaluation.save(tmp_path)
print(f"Evaluation saved to: {tmp_path}")

# Load evaluation from file
loaded_evaluation = ccp.Evaluation.load(tmp_path)
print(f"Loaded evaluation delta_eff: {loaded_evaluation.df['delta_eff'].mean():.2f}%")

# Clean up temporary file
os.unlink(tmp_path)

Evaluation saved to: /tmp/tmpapo9dwi_.ccp_eval
Loaded evaluation delta_eff: 11.27%
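
An alternative to deleting the temporary file by hand is `tempfile.TemporaryDirectory`, which removes the directory and everything in it when the `with` block exits, even if an exception is raised in between. The sketch below shows only the file handling: the `write_bytes` call is a stand-in for `evaluation.save(...)`, and whether `save` accepts a `Path` object (rather than a string) is an assumption.

```python
from pathlib import Path
from tempfile import TemporaryDirectory

with TemporaryDirectory() as tmp_dir:
    tmp_path = Path(tmp_dir) / "evaluation.ccp_eval"

    # Stand-in for `evaluation.save(str(tmp_path))` in a real session.
    tmp_path.write_bytes(b"placeholder")
    existed_inside = tmp_path.exists()

    # A real session would reload here, e.g. ccp.Evaluation.load(str(tmp_path)).

# The directory and the file are gone once the block exits.
existed_after = tmp_path.exists()
print(existed_inside, existed_after)
```

This keeps the save and load steps inside one block and avoids a leftover file if loading fails.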


## Advanced Usage: Manual Point Calculation

For more control, you can calculate evaluation points manually:

In [15]:
# Create evaluation without automatic calculation
evaluation_manual = ccp.Evaluation(
    data=df,
    operation_fluid=operation_fluid,
    data_units={
        "ps": "bar",
        "Ts": "degC",
        "pd": "bar",
        "Td": "degC",
        "flow_v": "m³/s",
        "speed": "RPM",
    },
    impellers=[imp_a],
    calculate_points=False,  # Don't calculate automatically
    n_clusters=2,
)

# Calculate points for the rows passed in (here the full DataFrame;
# pass a slice of `df` instead to process only a subset)
subset_results = evaluation_manual.calculate_points(
    df,
    drop_invalid_values=True  # Remove invalid points
)

print(f"Manual calculation: {len(subset_results)} valid points")
print(f"Average deviation: {subset_results['delta_eff'].mean():.2f}%")

Converting curves
Calculating points...
Calculating expected points...

Manual calculation: 2 valid points
Average deviation: 11.28%
