# Language Models Trained on Media Diets Can Predict Public Opinion

Chu et al., in [Language Models Trained on Media Diets Can Predict Public Opinion](https://arxiv.org/abs/2303.16779), show that language models trained on the media diets of different groups of people can predict those groups' responses to public-opinion survey questions on a variety of topics.
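The notebook below hides the step that turns a language model into survey answers inside `llments`. As a rough sketch of one common way to do this (an assumption, not necessarily the exact procedure used by Chu et al. or by `llments`), you can score each answer option with the model and softmax-normalize the log-probabilities into a distribution over options. The helper `answer_distribution` and its input format are hypothetical.

```python
import math


def answer_distribution(option_logprobs: dict[str, float]) -> dict[str, float]:
    """Normalize per-option log-probabilities into a distribution over answer options.

    Hypothetical helper: `option_logprobs` maps each answer option (e.g. "agree")
    to the log-probability a language model assigns it for one survey question.
    """
    # Subtract the max log-probability before exponentiating for numerical stability.
    max_lp = max(option_logprobs.values())
    exp_scores = {opt: math.exp(lp - max_lp) for opt, lp in option_logprobs.items()}
    total = sum(exp_scores.values())
    # Dividing by the total makes the option probabilities sum to 1.
    return {opt: score / total for opt, score in exp_scores.items()}
```

For example, if a model gives "agree" probability 0.3 and "disagree" probability 0.1, `answer_distribution({"agree": math.log(0.3), "disagree": math.log(0.1)})` returns approximately `{"agree": 0.75, "disagree": 0.25}`.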

In [None]:
from llments.lm import empirical, hugging_face
from llments.distance.norm import L1Distance

Load the base language model.

In [None]:
base_lm = hugging_face.load_from_spec('base_lm_spec.json')

Load the survey questions and, for each news audience, the empirical probability of each answer to each question.

In [None]:
news_source_names = ["cnn", "wsj", "fox", "npr"]
survey_questions = empirical.load_from_text_file('survey_questions.txt')
survey_answer_probs = {
    source_name: empirical.load_from_json_file(f'survey_answers_{source_name}.json')
    for source_name in news_source_names
}

Run the experiment: fine-tune the base model on each news source's text, then measure the distance between each audience's empirical survey answers and the answers predicted by each fine-tuned model.

In [None]:
# Define the distance we want to use
distance_function = L1Distance()
# Map each (news_source, survey_source) pair to the distance between them
answer_distances = {}

for news_source in news_source_names:
    # Load the dataset (empirical distribution) for this news source
    news_dataset = empirical.load_from_text_file(f'news_data_{news_source}.txt')
    # Fine-tune the base LM on this news source's text
    news_lm = base_lm.fit(news_dataset)
    # Measure the distance between each audience's empirical survey answers
    # and the answers predicted by the model fine-tuned on this news source
    for survey_source in news_source_names:
        answer_distances[news_source, survey_source] = distance_function(
            survey_answer_probs[survey_source], news_lm
        )
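What `L1Distance` computes internally is not shown in this notebook. As a minimal sketch, assuming each audience's answers and each model's predictions are stored per question as `{option: probability}` dictionaries, the L1 distance between two answer distributions is the sum of absolute probability differences (0 for identical distributions, at most 2), and one natural summary is its average over the questions both sides answered. The helper names `l1_distance` and `mean_l1_over_questions` are hypothetical.

```python
def l1_distance(p: dict[str, float], q: dict[str, float]) -> float:
    """L1 distance between two categorical distributions over answer options."""
    # An option missing from one distribution is treated as having probability 0.
    options = set(p) | set(q)
    return sum(abs(p.get(o, 0.0) - q.get(o, 0.0)) for o in options)


def mean_l1_over_questions(
    survey: dict[str, dict[str, float]],
    predicted: dict[str, dict[str, float]],
) -> float:
    """Average the per-question L1 distance over questions present in both inputs."""
    shared = survey.keys() & predicted.keys()
    if not shared:
        raise ValueError("survey and predicted answers share no questions")
    return sum(l1_distance(survey[q], predicted[q]) for q in shared) / len(shared)
```

Averaging per question, rather than pooling all options together, keeps questions with many answer options from dominating the score.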


Print the distances.

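In [None]:
for (news_source, survey_source), distance in answer_distances.items():
    print(f'{news_source} model vs. {survey_source} audience: {distance}')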
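If the media-diet hypothesis holds, each audience's survey answers should be closest to the model fine-tuned on that audience's own news source. The sketch below is one way to read the results that way. It is an assumption about how to interpret the distances, not a step in the original experiment. It takes a dictionary keyed by `(news_source, survey_source)` pairs, like the one built in the loop above, and picks, for each audience, the news source with the lowest distance. The function name `closest_news_source` is hypothetical.

```python
def closest_news_source(
    answer_distances: dict[tuple[str, str], float],
) -> dict[str, str]:
    """For each survey audience, return the news source whose model is closest.

    Keys are (news_source, survey_source) pairs; a lower distance means a closer match.
    """
    best: dict[str, tuple[float, str]] = {}
    for (news_source, survey_source), distance in answer_distances.items():
        # Keep the smallest distance seen so far for this audience.
        if survey_source not in best or distance < best[survey_source][0]:
            best[survey_source] = (distance, news_source)
    return {survey: news for survey, (_, news) in best.items()}
```

Comparing the returned mapping with the identity mapping (each audience matched to its own source) gives a quick check of whether the fine-tuned models separate the audiences at all.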