## yaml database
I have a YAML database with six example components: [YamlLib.yaml]

In [1]:
with open('YamlLib.yaml', 'r') as file:
    yaml_content = file.read()
print(yaml_content)

0:
    name: splitter_50_50
    description: |
        this is a balanced splitter.
        the splitting ratio is 50/50.
        the input optical power from either of the input ports splits into the two output ports with a 50/50 ratio.
        it is based on an MMI interferometer.
        it is designed for telecom c-band.
    number_of_input_ports: 2
    number_of_output_ports: 2
    x_y_size_µm: [15, 3]
    yaml_code: |
        component: mmi2x2
        settings:
            width_taper: 1
            length_taper: 5
            length_mmi: 5
            width_mmi: 2.5
            gap_mmi: 0.50

1:
    name: splitter_90_10
    description: |
        this is an unbalanced splitter.
        the splitting ratio is 90/10.
        the input optical power from either of the input ports splits into the two output ports with a 90/10 ratio.
        it is based on an MMI interferometer.
        it is designed for telecom c-band.
    number_of_input_ports: 2
    number_of_output_ports: 2
    

The objective is to select suitable items from this database according to a given prompt. Only the `name` and `description` fields are useful for that, so I'll strip the rest.

In [2]:
import yaml
with open('YamlLib.yaml', 'r') as file:
    yaml_content = yaml.safe_load(file)

properties_to_remove = ['yaml_code', 'x_y_size_µm', 'number_of_output_ports', 'number_of_input_ports']
for item in yaml_content:
    for prop in properties_to_remove:
        yaml_content[item].pop(prop, None)

yaml_content = yaml.dump(yaml_content, default_flow_style=False, sort_keys=False)

print(yaml_content)

0:
  name: splitter_50_50
  description: 'this is a balanced splitter.

    the splitting ratio is 50/50.

    the input optical power from either of the input ports splits into the two output
    ports with a 50/50 ratio.

    it is based on an MMI interferometer.

    it is designed for telecom c-band.

    '
1:
  name: splitter_90_10
  description: 'this is an unbalanced splitter.

    the splitting ratio is 90/10.

    the input optical power from either of the input ports splits into the two output
    ports with a 90/10 ratio.

    it is based on an MMI interferometer.

    it is designed for telecom c-band.

    '
2:
  name: grating_coupler_elliptical
  description: 'this is a grating coupler.

    it has an ellicpical desgin.

    it is designed to couple light between the chip and single-mode c-band optical
    fibers, with flat polished facets.

    '
3:
  name: ring_resonator_single
  description: 'this is a single ring resonator coupled to a bus waveguide, only from
    one
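In the dump above, PyYAML renders each multi-line description as a single-quoted scalar with a blank line between sentences. That is extra noise in the text that is later pasted into the LLM prompt. Below is a minimal sketch that assumes PyYAML is installed. It keeps only `name` and `description` as a whitelist, so any new fields added to the library are dropped automatically. It also registers a representer that writes multi-line strings in literal block style (`|`). The names `strip_to_name_and_description`, `str_presenter` and `LiteralDumper` are hypothetical.

```python
import yaml


def str_presenter(dumper, data):
    # multi-line strings -> literal block style (|); everything else -> default style.
    # PyYAML silently falls back to a quoted style if a line has trailing spaces.
    if "\n" in data:
        return dumper.represent_scalar("tag:yaml.org,2002:str", data, style="|")
    return dumper.represent_scalar("tag:yaml.org,2002:str", data)


class LiteralDumper(yaml.SafeDumper):
    """SafeDumper subclass, so the custom representer does not change yaml.dump globally."""


LiteralDumper.add_representer(str, str_presenter)


def strip_to_name_and_description(library: dict) -> str:
    # whitelist: keep only the fields the LLM needs to choose a component
    keep = ("name", "description")
    stripped = {key: {field: item[field] for field in keep if field in item}
                for key, item in library.items()}
    return yaml.dump(stripped, Dumper=LiteralDumper,
                     default_flow_style=False, sort_keys=False, allow_unicode=True)


# usage with an in-memory entry; in the notebook, pass yaml.safe_load(file) instead
example = {0: {"name": "splitter_50_50",
               "description": "this is a balanced splitter.\nthe splitting ratio is 50/50.\n",
               "number_of_input_ports": 2}}
print(strip_to_name_and_description(example))
```

The dumped text still loads back to the same stripped dictionary. The only change is how it looks in the prompt.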

## benchmark LLMs through Replicate

Models tested:
- meta/llama-2-70b-chat
- meta/llama-2-13b-chat
- meta/llama-2-7b-chat
- mistralai/mixtral-8x7b-instruct-v0.1
- mistralai/mistral-7b-instruct-v0.2
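The cell below switches between models by commenting lines in and out. Below is a sketch of a loop over all five models instead. It assumes the `replicate` Python client is installed and `REPLICATE_API_TOKEN` is set in the environment. The function names are hypothetical. `run_fn` is injectable, so the loop can be dry-run without network access.

The sketch also builds the prompt from parenthesized string literals. In the original cell, the backslash at the end of each line continues the string literal, so the tabs that indent the following line end up inside the prompt. The wording here also fixes two typos ("your are", "a list items"). Answers produced with it are therefore not directly comparable with the runs logged below.

```python
from typing import Callable, Dict, List

MODELS = [
    "meta/llama-2-70b-chat",
    "meta/llama-2-13b-chat",
    "meta/llama-2-7b-chat",
    "mistralai/mixtral-8x7b-instruct-v0.1",
    "mistralai/mistral-7b-instruct-v0.2",
]


def build_prompt(library_yaml: str, request: str, n_items: int) -> str:
    # adjacent literals inside parentheses are joined at compile time,
    # without the indentation that a backslash continuation would keep
    return (
        "instructions: You are a photonic chip layout developer. The following yaml data, "
        "which starts with [[ and ends with ]], lists all "
        f"{n_items} available photonic items/components. "
        "When you are asked to design or find suitable components, you can only choose "
        "from the components included in this yaml list. In your answer, do not explain "
        "details of the process; only provide a list of the items you find suitable "
        "for the prompt.\n[[" + library_yaml + "]]\nPrompt: " + request
    )


def replicate_run(model: str, prompt: str) -> str:
    # assumes the replicate client is installed and REPLICATE_API_TOKEN is set;
    # replicate.run streams text chunks for these models, so join them
    import replicate
    return "".join(replicate.run(model, input={"prompt": prompt}))


def run_benchmark(prompt: str, models: List[str] = MODELS, n_runs: int = 10,
                  run_fn: Callable[[str, str], str] = replicate_run) -> Dict[str, List[str]]:
    # collect n_runs answers per model so they can be compared side by side
    return {model: [run_fn(model, prompt) for _ in range(n_runs)] for model in models}
```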


In [3]:
import os
os.environ["REPLICATE_API_TOKEN"] = "??????????????????"
import replicate
import yaml

with open('YamlLib.yaml', 'r') as file:
    yaml_content = yaml.safe_load(file)

properties_to_remove = ['yaml_code', 'x_y_size_µm', 'number_of_output_ports', 'number_of_input_ports']
for item in yaml_content:
    for prop in properties_to_remove:
        yaml_content[item].pop(prop, None)  

yaml_content = yaml.dump(yaml_content, default_flow_style=False, sort_keys=False)

prompt = 	'instructions: You are a photonic chip layout developer. The following yaml data, \
			starts with [[ and ends with ]], lists all available 6 photonic items/components. \
			When your are asked to design or find suitable components, you can only choose \
			from the components included in this yaml list. In your answer, do not explain \
			details of the process, only provide a list items you find suitable, corresponding \
			to the prompt. \n[[' + yaml_content + ']]\n' + \
			'Prompt: '

# prompt += 'Extract suitable components for an unbalanced splitter' # A
prompt += 'Extract suitable components for a balanced splitter' # B
# prompt += 'Extract suitable components for a balanced splitter at telecom wavelengths' # C

# print(prompt)

for i in range(10):
    output = replicate.run(
        # "meta/llama-2-70b-chat",
        # "meta/llama-2-13b-chat",
        'meta/llama-2-7b-chat',
        # "mistralai/mixtral-8x7b-instruct-v0.1",
        # "mistralai/mistral-7b-instruct-v0.2",
        input={
            # a few options can be passed here, but since the arguments vary between LLMs I am leaving them all at their defaults
            "prompt": prompt,
        }
    )
    print('\n#########\n######### run', i)
    print( "".join(output) )



#########
######### run 0
Based on the provided YAML data, suitable components for a balanced splitter are:

1. [[0]] - Splitter_50_50
2. [[5]] - Green_splitter_50_50

#########
######### run 1
Based on the YAML data provided, suitable components for a balanced splitter are:

1. Splitter_50_50
2. Green_splitter_50_50

#########
######### run 2
Based on the provided YAML data, suitable components for a balanced splitter are:

1. [[0]] - Splitter_50_50
2. [[5]] - Green_splitter_50_50

#########
######### run 3
Based on the given YAML data, suitable components for a balanced splitter are:

1. splitter_50_50
2. green_splitter_50_50

#########
######### run 4
Based on the given YAML data, suitable components for a balanced splitter are:

1. [[0]]: Splitter_50_50
2. [[4]]: Spiral

#########
######### run 5
Based on the provided YAML data, suitable components for a balanced splitter are:

1. [[0]] - Splitter_50_50
2. [[5]] - Green_splitter_50_50

#########
######### run 6
Based on the given 
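Checking ten runs per model by eye gets tedious across five models. Below is a minimal sketch for scoring each answer automatically. It matches known component names as whole words, case-insensitively, then computes precision and recall against an expected set. Case-insensitive matching means `Splitter_50_50` counts. Whole-word matching means `splitter_50_50` is not falsely found inside `green_splitter_50_50`.

Two parts of the sketch are assumptions. `KNOWN_NAMES` lists only the names visible in this notebook, and the expected set for prompt B is inferred from the item names; in practice both would come from `YamlLib.yaml`. The score is also lenient: a name outside `KNOWN_NAMES`, such as `Spiral` in run 4, is ignored rather than counted as a false positive.

```python
from __future__ import annotations

import re

# assumption: names visible in this notebook; load the full list from YamlLib.yaml in practice
KNOWN_NAMES = [
    "splitter_50_50",
    "splitter_90_10",
    "grating_coupler_elliptical",
    "ring_resonator_single",
    "green_splitter_50_50",
]


def find_components(answer: str, known_names: list[str]) -> set[str]:
    # \b treats "_" as a word character, so "splitter_50_50" does not match
    # inside "green_splitter_50_50"; IGNORECASE accepts "Splitter_50_50"
    return {name for name in known_names
            if re.search(r"\b" + re.escape(name) + r"\b", answer, flags=re.IGNORECASE)}


def score_answer(answer: str, expected: set[str], known_names: list[str] = KNOWN_NAMES):
    found = find_components(answer, known_names)
    hits = len(found & expected)
    precision = hits / len(found) if found else 0.0
    recall = hits / len(expected) if expected else 0.0
    return found, precision, recall


# assumption: the expected answer for prompt B (balanced splitter)
expected_b = {"splitter_50_50", "green_splitter_50_50"}
run_4 = "1. [[0]]: Splitter_50_50\n2. [[4]]: Spiral"
print(score_answer(run_4, expected_b))  # → ({'splitter_50_50'}, 1.0, 0.5)
```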

## semantic similarity
Using the `sentence_transformers` library to rank components by how similar their descriptions are to the prompt.

In [4]:
from sentence_transformers import SentenceTransformer, util
import yaml
from tabulate import tabulate
import numpy as np
import warnings
warnings.filterwarnings("ignore")  # hide library warnings in the notebook output

with open('YamlLib.yaml', 'r') as file:
    yaml_content = yaml.safe_load(file)

components = []
for key, value in yaml_content.items():
    components.append( {'name': value['name'], 'description': value['description']} )

model = SentenceTransformer('paraphrase-albert-small-v2')

component_descriptions = [component['description'] for component in components]
component_embeddings = model.encode(component_descriptions, convert_to_tensor=True)


prompt = 'Extract suitable components for an unbalanced splitter' # A
# prompt = 'Extract suitable components for a balanced splitter' # B
# prompt = 'Extract suitable components for a balanced splitter at telecom wavelengths' # C

prompt_embedding = model.encode(prompt, convert_to_tensor=True)
cosine_scores = util.cos_sim(prompt_embedding, component_embeddings)

top_results = cosine_scores.topk(k=3)  # (values, indices) of the 3 highest-scoring items

table_data = []
for score, idx in zip(top_results[0][0], top_results[1][0]):
    component_name = components[idx.item()]['name']  # tensor index -> Python int
    similarity_score = np.round(score.item(), 4)
    table_data.append([component_name, similarity_score])
print()
print(tabulate(table_data, headers=["item", "score"], tablefmt="presto"))
print()



 item                 |   score
----------------------+---------
 splitter_90_10       |  0.5586
 splitter_50_50       |  0.5455
 green_splitter_50_50 |  0.5082
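`util.cos_sim` computes the cosine of the angle between the prompt embedding and each description embedding, and `topk` keeps the three largest. Below is a NumPy sketch of the same computation with an optional `min_score` cutoff added. The cutoff is an assumption, not part of the cell above: it lets the number of returned items vary with the prompt instead of always being three. The toy 2-D vectors and the function names are made up for illustration. A real cutoff would have to be tuned on prompts such as A, B and C. In the table above, all three scores fall within about 0.05 of each other, so a fixed cutoff may not separate them cleanly.

```python
import numpy as np


def cosine_scores(query: np.ndarray, items: np.ndarray) -> np.ndarray:
    # cos(q, x_i) = q . x_i / (|q| |x_i|) for every row x_i of items
    q = query / np.linalg.norm(query)
    x = items / np.linalg.norm(items, axis=1, keepdims=True)
    return x @ q


def top_k(names, query, items, k=3, min_score=None):
    scores = cosine_scores(query, items)
    order = np.argsort(-scores)[:k]  # indices of the k highest scores, best first
    return [(names[i], float(scores[i])) for i in order
            if min_score is None or scores[i] >= min_score]


names = ["a", "b", "c"]
items = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query = np.array([1.0, 0.0])
print(top_k(names, query, items, k=3))                 # ranks a, c, b
print(top_k(names, query, items, k=3, min_score=0.5))  # drops b (score 0.0)
```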

