We experiment with different tasks revolving around causality in LLMs such as GPT-3. Our goal is to measure the quality of their causal modelling capabilities on real-world tasks, toy problems, and adversarial examples.
The final report can be found on the Alignment Forum.

To reproduce the results:
- Set your OpenAI key as an environment variable by typing `export OPENAI_KEY="<insert_your_key_here>"` in your console before running your experiments.
- (optional) Check the playground notebooks to get a better feeling for the experiments.
- Run all of the `experiment.py` scripts to produce the results (don't forget that running experiments costs money).
- Run the evaluation Jupyter notebooks.
- (optional) Run the analysis for report.ipynb to reproduce the exact figures of the report.
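As a sanity check before spending money on API calls, it can help to verify that the key from the setup step above is actually visible to Python. A minimal sketch (the helper name `load_api_key` is hypothetical; the variable name `OPENAI_KEY` matches the export command above):

```python
import os

def load_api_key(env_var: str = "OPENAI_KEY") -> str:
    """Read the OpenAI key from the environment, failing loudly if absent."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it before running experiments"
        )
    return key
```

Calling this at the top of each experiment script gives a clear error message instead of a failed API request partway through a paid run.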
Please note that this is just a small side project and the code has not been optimized for efficiency or readability.