## The `semantic_search` method

krixik's `semantic_search` method is a convenience function that handles both embedding and querying, so it can only be used with pipelines in which a `text-embedder` module is immediately followed by a `vector-db` module.
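The "in succession" requirement can be expressed as a simple check on a module chain. The helper below, `supports_semantic_search`, is hypothetical and not part of krixik. It is a sketch that only illustrates the rule, assuming "in succession" means `vector-db` comes directly after `text-embedder`.

```python
from typing import Sequence


def supports_semantic_search(module_chain: Sequence[str]) -> bool:
    """Return True if 'text-embedder' is immediately followed by 'vector-db'.

    Hypothetical helper that mirrors the rule described above; not a krixik API.
    """
    # zip pairs each module with the module right after it
    return any(
        current == "text-embedder" and nxt == "vector-db"
        for current, nxt in zip(module_chain, module_chain[1:])
    )


print(supports_semantic_search(["parser", "text-embedder", "vector-db"]))  # True
print(supports_semantic_search(["parser", "vector-db"]))  # False
```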


### a simple semantic search pipeline

Below we construct the simplest custom pipeline that satisfies this requirement: a standard vector search pipeline consisting of three modules, a `parser`, a `text-embedder`, and a `vector-db` index.

```python
# create a pipeline with multiple modules
pipeline = krixik.create_pipeline(name="vector-search-system-intro",
                                  module_chain=["parser", "text-embedder", "vector-db"])
```

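```python
# clear out any files previously processed by this pipeline (helper defined elsewhere in these docs)
reset_pipeline(pipeline)
```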

### invoking the `semantic_search` method

We can now perform any of the core system methods on our custom pipeline (e.g., `.process` and `.list`). Additionally, we can invoke the `semantic_search` method.

Let's first process a file with our new pipeline. The pipeline takes in a text file and returns a `faiss` vector database containing the embeddings of all non-trivial `(snippet, line_numbers)` tuples from the input.
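To make the `(snippet, line_numbers)` idea concrete, here is a minimal local sketch of what a parsing step could produce. It is not krixik's `parser` module: the function `parse_snippets`, splitting on `.`, `!` and `?`, and the `min_words` threshold used for "non-trivial" are all assumptions for illustration.

```python
import re
from typing import List, Tuple


def parse_snippets(text: str, min_words: int = 3) -> List[Tuple[str, List[int]]]:
    """Split text into sentence snippets and record the 1-based lines each spans.

    Hypothetical stand-in for the `parser` module: sentence splitting on
    '.', '!' or '?' and the `min_words` cutoff are assumptions.
    """
    snippets = []
    # each match is a run of text ending in sentence punctuation (or at end of text)
    for match in re.finditer(r"[^.!?]+[.!?]?", text):
        raw = match.group()
        snippet = raw.strip()
        if len(snippet.split()) < min_words:
            continue  # skip trivial fragments
        # character offsets of the snippet's first and last characters in `text`
        start = match.start() + (len(raw) - len(raw.lstrip()))
        end = start + len(snippet) - 1
        # a line number is one more than the count of newlines before that offset
        first_line = text.count("\n", 0, start) + 1
        last_line = text.count("\n", 0, end) + 1
        snippets.append((snippet, list(range(first_line, last_line + 1))))
    return snippets


sample = "It was a bright cold day in April.\nWinston Smith slipped quickly\nthrough the glass doors."
for snippet, lines in parse_snippets(sample):
    print(lines, repr(snippet))
```

A multi-line sentence keeps its embedded `\n` and lists every line it touches, which matches the shape of the snippets shown in the search output later in this section.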

```python
# define path to an input file from the examples directory
test_file = "../../data/input/1984_very_short.txt"

# process the file for search
process_output = pipeline.process(local_file_path=test_file,
                                  local_save_directory="../../data/output",  # save output to the data/output subdirectory
                                  expire_time=60 * 10,      # set all process data to expire in 10 minutes
                                  wait_for_process=True,    # wait for the process to complete before regaining IDE control
                                  verbose=False)            # set verbosity to False

# nicely print the output of this process
json_print(process_output)
```

```json
{
  "status_code": 200,
  "pipeline": "vector-search-pipeline-1",
  "request_id": "1a09068c-872a-4389-a399-7281e2d1764e",
  "file_id": "f69aac3d-e674-45d5-ab33-f16196ce82b2",
  "message": "SUCCESS - output fetched for file_id f69aac3d-e674-45d5-ab33-f16196ce82b2.Output saved to location(s) listed in process_output_files.",
  "process_output": null,
  "process_output_files": [
    "./f69aac3d-e674-45d5-ab33-f16196ce82b2.faiss"
  ]
}
```


Note that we did not define a `file_name` or `symbolic_directory_path` ourselves, so defaults will be given as described in the `.process` walkthrough [LINK HERE].

Here, the value of the `process_output` key is `null` because the output is a database rather than text. The database is saved locally at the path listed under `process_output_files`.
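If you want to locate the saved database programmatically, you can filter `process_output_files` by the `.faiss` suffix. The sketch below uses a copy of the printed `process_output` (trimmed to the keys it needs) so it runs on its own. Assuming the `faiss` Python package is installed (the snippet itself does not need it), `faiss.read_index(str(path))` should load the file, and the index's `ntotal` attribute reports how many vectors it holds.

```python
from pathlib import Path

# process_output as printed above, trimmed to the keys used here
process_output = {
    "file_id": "f69aac3d-e674-45d5-ab33-f16196ce82b2",
    "process_output": None,
    "process_output_files": ["./f69aac3d-e674-45d5-ab33-f16196ce82b2.faiss"],
}

# keep only the saved vector databases, identified by their .faiss suffix
faiss_files = [
    Path(p) for p in process_output["process_output_files"] if p.endswith(".faiss")
]
print(faiss_files[0].name)  # f69aac3d-e674-45d5-ab33-f16196ce82b2.faiss
```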

With `.process` complete, we can run `semantic_search` on our input file.

The `semantic_search` method takes exactly the same arguments as `.list` [LINK HERE] (`file_ids`, `file_names`, etc.), plus one additional argument: `query`. The `query` is a string that is embedded and then compared against the vectors stored for the selected files.
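The two steps that `semantic_search` bundles (embed the query, then find the nearest stored vectors) can be mimicked locally with a toy model. This is only a sketch under explicit assumptions: `toy_embed` is a hashed bag-of-words vector, not krixik's `text-embedder`; `build_index` and `toy_semantic_search` are hypothetical names; and Euclidean distance is an assumed metric. Its distances are therefore not comparable to the ones krixik returns.

```python
import math
import re
import zlib
from collections import Counter
from typing import Dict, List, Tuple

DIM = 64  # size of the toy embedding space (assumption)


def toy_embed(text: str) -> List[float]:
    """Hashed bag-of-words vector, L2-normalized (hypothetical stand-in for an embedder)."""
    vec = [0.0] * DIM
    for word, count in Counter(re.findall(r"[a-z']+", text.lower())).items():
        # crc32 gives each word a stable bucket across runs (unlike built-in hash())
        vec[zlib.crc32(word.encode()) % DIM] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_index(snippets: List[Tuple[str, List[int]]]) -> List[Tuple[str, List[int], List[float]]]:
    """Embed every snippet once, as the processing step would (sketch)."""
    return [(snippet, lines, toy_embed(snippet)) for snippet, lines in snippets]


def toy_semantic_search(query: str, index, top_k: int = 2) -> List[Dict]:
    """Embed the query, then rank stored snippets by Euclidean distance (smaller = closer)."""
    query_vec = toy_embed(query)  # step 1: embed the query
    results = [
        {
            "snippet": snippet,
            "line_numbers": lines,
            "distance": round(math.dist(query_vec, vec), 3),  # step 2: compare vectors
        }
        for snippet, lines, vec in index
    ]
    return sorted(results, key=lambda r: r["distance"])[:top_k]


toy_index = build_index([
    ("It was a bright cold day in April.", [1]),
    ("The glass doors of Victory Mansions.", [2]),
])
for hit in toy_semantic_search("it was cold night", toy_index):
    print(hit["distance"], hit["snippet"])
```

Embedding snippets once in `build_index` and only the query at search time reflects why the pipeline does the heavy lifting during `.process`.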

Let's look at an example.

```python
# perform semantic search over the input file
vector_output = pipeline.semantic_search(query="it was cold night",
                                         file_ids=[process_output["file_id"]])

# nicely print the output of this process
json_print(vector_output)
```

```json
{
  "status_code": 200,
  "request_id": "10503c1c-3959-4897-9315-a69438ecce2b",
  "message": "Successfully queried 1 user file.",
  "items": [
    {
      "file_id": "f69aac3d-e674-45d5-ab33-f16196ce82b2",
      "file_metadata": {
        "file_name": "krixik_generated_file_name_awiouirlff.txt",
        "symbolic_directory_path": "/etc",
        "file_tags": [],
        "num_vectors": 2,
        "created_at": "2024-04-26 21:10:50",
        "last_updated": "2024-04-26 21:10:50"
      },
      "search_results": [
        {
          "snippet": "It was a bright cold day in April, and the clocks were striking thirteen.",
          "line_numbers": [
            1
          ],
          "distance": 0.224
        },
        {
          "snippet": "Winston Smith, his chin nuzzled into his breast in an effort to escape the\nvile wind, slipped quickly through the glass doors of Victory Mansions,\nthough not quickly enough to prevent a swirl of gritty dust from entering\nalong with him.",
```

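Here we can see one entry in `items`, corresponding to the single file we queried. Its `search_results` list holds the matching snippets, the line numbers they come from, and each snippet's `distance` from the query.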
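To work with these results in code, iterate over `items` and then over each item's `search_results`. The sketch below runs on a trimmed copy of the printed output that keeps only the first, complete search result; with a live pipeline you would loop over the `vector_output` returned by `semantic_search` the same way.

```python
# vector_output trimmed to the fields used below; values copied from the printed output
vector_output = {
    "status_code": 200,
    "items": [
        {
            "file_id": "f69aac3d-e674-45d5-ab33-f16196ce82b2",
            "search_results": [
                {
                    "snippet": "It was a bright cold day in April, and the clocks were striking thirteen.",
                    "line_numbers": [1],
                    "distance": 0.224,
                },
            ],
        }
    ],
}

# flatten to (file_id, line_numbers, distance, snippet): one entry per queried file, many results each
hits = [
    (item["file_id"], result["line_numbers"], result["distance"], result["snippet"])
    for item in vector_output["items"]
    for result in item["search_results"]
]
for file_id, lines, distance, snippet in hits:
    print(f"{file_id} lines {lines} (distance {distance}): {snippet}")
```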

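```python
# clear out any files previously processed by this pipeline (helper defined elsewhere in these docs)
reset_pipeline(pipeline)
```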