In [25]:
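import os

import dspy
from dotenv import load_dotenv

# Read OPENAI_API_KEY from a local .env file
load_dotenv()

# Language model used by every DSPy module in this notebook
lm = dspy.LM('openai/gpt-4o-mini', api_key=os.getenv('OPENAI_API_KEY'))

# ColBERTv2 retriever over the wiki17 abstracts index
colbertv2_wiki17_abstracts = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')

# Register the language model and the retrieval model in one call
dspy.configure(lm=lm, rm=colbertv2_wiki17_abstracts)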

In [2]:
lm("Say this is a test!", temperature=0.7)

['This is a test! How can I assist you today?']

In [3]:
qa = dspy.Predict('question: str -> response: str')
response = qa(question="what are high memory and low memory on linux?")

print(response.response)

In Linux, "high memory" and "low memory" refer to different regions of the system's memory address space, particularly in the context of 32-bit architectures.

- **Low Memory**: This typically refers to the memory that is directly accessible by the kernel. In a 32-bit system, this is usually the first 896 MB of RAM (from 0 to 896 MB). The kernel can directly map this memory, which allows for efficient access and management. Low memory is used for kernel data structures and for user processes that require memory allocation.

- **High Memory**: This refers to the memory above the low memory limit, which is not directly accessible by the kernel in a 32-bit system. This area is typically above 896 MB and requires special handling to access. The kernel uses a mechanism called "high memory management" to allow user processes to utilize this memory. When a process needs to access high memory, the kernel must map it into the kernel's address space temporarily.

In summary, low memory is direct

In [4]:
dspy.inspect_history(n=1)





[2024-12-01T11:51:49.525695]

System message:

Your input fields are:
1. `question` (str)

Your output fields are:
1. `response` (str)

All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## question ## ]]
{question}

[[ ## response ## ]]
{response}

[[ ## completed ## ]]

In adhering to this structure, your objective is: 
        Given the fields `question`, produce the fields `response`.


User message:

[[ ## question ## ]]
what are high memory and low memory on linux?

Respond with the corresponding output fields, starting with the field `[[ ## response ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.


Response:

[[ ## response ## ]]
In Linux, "high memory" and "low memory" refer to different regions of the system's memory address space, particularly in the context of 32-bit architectures.

- **Low Memory**: This typically refers to the memory that is directly accessib
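
The history above shows the delimiter layout the prompt asks the model to follow: each field starts with a `[[ ## name ## ]]` header and the reply ends with `[[ ## completed ## ]]`. As a rough illustration of how such text can be split back into fields, here is a minimal sketch. `parse_fields` and `FIELD_HEADER` are hypothetical names introduced for this example; this is not DSPy's own parser, and it assumes headers always sit on their own line.

```python
import re

# Matches a header line such as "[[ ## response ## ]]" and captures the field name.
FIELD_HEADER = re.compile(r"^\[\[ ## (\w+) ## \]\]\s*$", re.MULTILINE)


def parse_fields(text: str) -> dict[str, str]:
    """Split text in the '[[ ## name ## ]]' layout into a {name: value} dict."""
    fields = {}
    matches = list(FIELD_HEADER.finditer(text))
    # Pair each header with the next one so we know where its value ends.
    for current, following in zip(matches, matches[1:] + [None]):
        name = current.group(1)
        if name == "completed":  # end-of-response marker, carries no value
            continue
        end = following.start() if following else len(text)
        fields[name] = text[current.end():end].strip()
    return fields


# Hand-written sample in the same layout (not a real model reply).
sample = (
    "[[ ## response ## ]]\n"
    "Low memory is directly mapped by the kernel.\n"
    "\n"
    "[[ ## completed ## ]]\n"
)
parsed = parse_fields(sample)
print(parsed)
```

In the notebook itself this step is not needed: `qa(...)` already returns a `Prediction` whose `response` attribute holds the parsed value.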

In [5]:
cot = dspy.ChainOfThought('question -> response')
cot(question="should curly braces appear on their own line?")

Prediction(
    reasoning='The placement of curly braces on their own line is largely a matter of coding style and conventions. In some programming languages and style guides, such as those used in Java and C#, it is common to place opening curly braces on a new line to enhance readability and maintain a clear structure. This style is often referred to as "Allman style" or "BSD indent style." Conversely, other styles, like the "K&R style," place the opening brace on the same line as the control statement, which can save vertical space. Ultimately, the decision should be guided by the coding standards of the project or team you are working with.',
    response="Curly braces can appear on their own line depending on the coding style you choose to follow. Some styles prefer them on a new line for better readability, while others keep them on the same line. It's best to adhere to the conventions of your specific project or team."
)

In [21]:
class CausalRelation(dspy.Signature):
    """Given a probable causal relation between two nodes, output the probability of the relation."""
    context: list[str] = dspy.InputField(desc="May contain relevant facts")
    causal_relation: str = dspy.InputField(desc="The causal relation between two nodes in a Markov equivalence class. Format: 'A triggers B'")
    probability: float = dspy.OutputField(desc="The probability (between 0 and 1) that the causal relation is true.")

class LLMExpert(dspy.Module):
    def __init__(self):
        # Conventional for dspy.Module subclasses so sub-modules are registered
        super().__init__()
        # Chain-of-thought predictor that scores one causal statement
        self.causal_relation = dspy.ChainOfThought(CausalRelation)
        # Retrieve the top-3 passages for the given causal relation from the configured retriever
        self.retriever = dspy.Retrieve(k=3)

    def forward(self, causal_relation: str):
        # Use the retrieved passages as context for the probability estimate
        context = self.retriever(causal_relation).passages
        return self.causal_relation(context=context, causal_relation=causal_relation)

llm_expert = LLMExpert()
llm_expert(causal_relation="age triggers socioeconomic status")

Prediction(
    reasoning='The causal relation "age triggers socioeconomic status" suggests that as individuals age, their socioeconomic status may change. However, socioeconomic status is influenced by a variety of factors including education, occupation, and income, which are not solely determined by age. While age can play a role in the accumulation of wealth and experience, it is not a direct trigger for socioeconomic status. Other factors such as education level and job opportunities are more significant determinants. Therefore, the probability that age directly triggers socioeconomic status is low.',
    probability=0.2
)

In [10]:
llm_expert(causal_relation="neighbourhood type catalyzes socioeconomic status")

Prediction(
    reasoning='The relationship between neighbourhood type and socioeconomic status is supported by various studies that indicate that the characteristics of a neighbourhood, such as its resources, safety, and social networks, can significantly influence the economic opportunities and outcomes of its residents. For instance, affluent neighbourhoods often provide better access to quality education and employment opportunities, which can lead to higher socioeconomic status. Conversely, disadvantaged neighbourhoods may perpetuate cycles of poverty. Therefore, it is reasonable to conclude that neighbourhood type can indeed catalyze socioeconomic status.',
    probability=0.75
)

In [13]:
retriever = dspy.Retrieve(k=3)
retriever("age initiates socioeconomic status").passages

["Socioeconomic status | Socioeconomic status (SES) is an economic and sociological combined total measure of a person's work experience and of an individual's or family's economic and social position in relation to others, based on income, education, and occupation. When analyzing a family's SES, the household income, earners' education, and occupation are examined, as well as combined income, whereas for an individual's SES only their own attributes are assessed. However, SES is more commonly used to depict an economic difference in society as a whole.",
 'Survey of Health, Ageing and Retirement in Europe | The Survey of Health, Ageing and Retirement in Europe (SHARE) is a multidisciplinary and cross-national panel database of micro data on health, socio-economic status as well as social and family networks of more than 45,000 individuals aged 50 or over. As such, it responds to a Communication by the European Commission calling to "examine the possibility of establishing, in co-oper

In [6]:
import pandas as pd

# gpt_call is a project-local helper, not part of dspy
from utils.language_models import gpt_call

# Codebook for the 'asia' dataset
codebook = pd.read_csv('codebooks/asia.csv')
gpt_call(("smoke", "lung"), codebook)

array([0.95, 0.  ])

In [7]:
import pickle
cache_file='dspy_cache.pickle'
# load the data from the cache file
with open(cache_file, 'rb') as f:
    data = pickle.load(f)
print(data)

{'\nsmoking cigarettes causes lung cancer\n': 0.95, '\nlung cancer causes smoking cigarettes\n': 0.0}
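
The printed cache suggests that each direction of a variable pair is stored under its own statement, wrapped in newlines, with the probability as the value. Under that assumption, a minimal sketch of looking up both directions at once could look as follows. `cache_key` and `lookup_pair` are hypothetical helpers written for this example; they are not part of `utils.language_models`, and the key format is inferred only from the two entries shown above.

```python
# Sample entries copied from the printed cache above.
cache = {
    "\nsmoking cigarettes causes lung cancer\n": 0.95,
    "\nlung cancer causes smoking cigarettes\n": 0.0,
}


def cache_key(cause: str, effect: str) -> str:
    """Build a key in the newline-wrapped 'A causes B' form (assumed format)."""
    return f"\n{cause} causes {effect}\n"


def lookup_pair(cache: dict, a: str, b: str) -> tuple:
    """Return (P(a causes b), P(b causes a)); None marks a direction that is not cached."""
    return cache.get(cache_key(a, b)), cache.get(cache_key(b, a))


forward, backward = lookup_pair(cache, "smoking cigarettes", "lung cancer")
print(forward, backward)
```

Returning `None` for a missing direction, rather than a default such as `0.0`, keeps "not yet queried" distinct from "the model judged this relation impossible".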
