# CommunityLM

This is a replication of the experiments from [CommunityLM](https://arxiv.org/abs/2209.07065) (Jiang et al. 2022), which probes partisan worldviews from language models, based on the [original repo](https://github.com/hjian42/communitylm).

Running all the experiments on a single GPU takes about 3-4 hours.

Before running the notebook, please install requirements and download the data.
```bash
pip install -r requirements.txt
bash download_data.sh
```

In [4]:
from llments.lm.base.hugging_face import HuggingFaceLM, HuggingFaceLMFitter
# from llments.lm.base.empirical import load_from_text_file
from llments.eval.sentiment import HuggingFaceSentimentEvaluator
import pandas as pd
import numpy as np
from examples.community_lm.community_lm_constants import politician_feelings, groups_feelings, anes_df
from examples.community_lm.community_lm_utils import generate_community_opinion, compute_group_stance

device = 'cuda'  # NVIDIA GPU (use 'cuda:0' to pick a specific one); change to 'mps' on a Mac
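
If you would rather not hard-code the device, the sketch below picks one automatically. It is only an illustration: the helper name `pick_device` is hypothetical and not part of `llments`, and it assumes PyTorch is the backend, falling back to `'cpu'` when no accelerator (or no PyTorch install) is found.

```python
def pick_device() -> str:
    """Return 'cuda', 'mps', or 'cpu' depending on what PyTorch reports."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed, so fall back to CPU
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps only exists in newer PyTorch releases, hence the getattr guard
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"

device = pick_device()
print(device)
```

Generation on `'cpu'` will be much slower than the GPU timing quoted above.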

## Train a CommunityLM model (optional)

The CommunityLM authors have released their pre-trained models on Hugging Face, and this notebook uses them by default. To train a CommunityLM model yourself by fine-tuning GPT-2, place the training data at `data/{democrat,republican}-tweets.txt`, uncomment the `load_from_text_file` import in the first cell and the lines in the next cell, and set `lm_name` in the generation cell to `./data/{party}-twitter-gpt2`.

In [None]:
# base_model = HuggingFaceLM("gpt2", device=device)
# for party in ['democrat', 'republican']:
#     dataset = load_from_text_file(f"data/{party}-tweets.txt")
#     fit_model = HuggingFaceLMFitter.fit(base_model, dataset, output_dir=f"data/{party}-twitter-gpt2")

## Generate Opinions using CommunityLM

The following code generates opinions using CommunityLM.

In [None]:
for run in range(1, 6):
    for party in ['democrat', 'republican']:
        # Uses the released pre-trained CommunityLM model; to use a model you trained yourself, swap in the commented-out line below
        lm_name = f'CommunityLM/{party}-twitter-gpt2'
        # lm_name = f'./data/{party}-twitter-gpt2'
        lm = HuggingFaceLM(lm_name, device=device)
        for prompt_option in ['Prompt1', 'Prompt2', 'Prompt3', 'Prompt4']:
            print(f'generating {party} opinion for {prompt_option} run {run}...')
            output_path = f'output/CommunityLM_{party}-twitter-gpt2/run_{run}'
            generate_community_opinion(lm, prompt_option, output_path, run)

## Perform Group-level Sentiment Analysis

In [None]:
evaluator = HuggingFaceSentimentEvaluator(
    "cardiffnlp/twitter-roberta-base-sentiment-latest",
    device=device
)
for party in ['democrat', 'republican']:
    compute_group_stance(
        evaluator=evaluator,
        data_folder=f'output/CommunityLM_{party}-twitter-gpt2',
        output_filename=f'output/CommunityLM_{party}-twitter-gpt2/stance_prediction.csv',
    )


In [None]:
df_dem = pd.read_csv("output/CommunityLM_democrat-twitter-gpt2/stance_prediction.csv")
df_repub = pd.read_csv("output/CommunityLM_republican-twitter-gpt2/stance_prediction.csv")
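
Each row of these files carries a `group_sentiment` score, the column used later in the notebook. How `compute_group_stance` derives that score is not shown here. The toy sketch below illustrates one plausible aggregation, purely as an assumption: the label-to-score mapping, the rescaling to 0-100, and the function name are hypothetical and may differ from the actual implementation.

```python
from statistics import mean

# Hypothetical mapping from classifier labels to numeric scores.
LABEL_SCORE = {"negative": 0, "neutral": 1, "positive": 2}

def group_sentiment(labels):
    """Average the label scores and rescale from [0, 2] to [0, 100]."""
    return mean(LABEL_SCORE[label] for label in labels) * 50

# Toy labels standing in for the classifier output on four generated completions.
toy_labels = ["positive", "neutral", "negative", "positive"]
print(group_sentiment(toy_labels))  # 62.5
```

A 0-100 scale is convenient because the ANES feeling thermometers use the same range, but the evaluation below only compares the two parties' scores with each other, so any monotone scale would give the same predictions.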

## Preparing ANES2020 Questions

The data come from the American National Election Studies (ANES).

Website: https://electionstudies.org/
Email:   anes@electionstudies.org


In [None]:
df = pd.read_csv("data/anes_pilot_2020ets_csv.csv")

print(f"Number of Rows Total {df.shape}")

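# only look at self-identified partisans (2144 of 3080 respondents). 1: Republican; 2: Democrat
df = df[df.pid1r < 3].copy()  # copy so the assignment below does not write to a slice of the original frame
df['pid1r'] = df.pid1r.map({1: "Republican", 2: "Democrat"})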
print(f"Number of Rows for Partisans {df.shape}")

In [None]:
# 999 stands for missing values and 'pid1r' is the partisanship
df_politician_results = df[['pid1r']+politician_feelings+groups_feelings].replace(999, np.nan).groupby("pid1r").mean().T
df_politician_results['is_repub_leading'] = (df_politician_results.Republican > df_politician_results.Democrat)
# df_politician_results
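
The cell above can be read as three steps: treat `999` as missing, average each question per party, and flag the questions where Republicans give the higher mean. The plain-Python sketch below mirrors that logic on toy data so it can be run without the ANES file. The column name `ft_example` and the ratings are made up for illustration.

```python
MISSING = 999  # code the notebook treats as a missing answer

def party_means(rows, column):
    """Mean of `column` per party, skipping rows coded as missing."""
    totals = {}
    for row in rows:
        if row[column] == MISSING:
            continue
        total, count = totals.get(row["pid1r"], (0, 0))
        totals[row["pid1r"]] = (total + row[column], count + 1)
    return {party: total / count for party, (total, count) in totals.items()}

# Toy thermometer ratings for one hypothetical question column.
toy_rows = [
    {"pid1r": "Republican", "ft_example": 80},
    {"pid1r": "Republican", "ft_example": 999},  # missing, ignored
    {"pid1r": "Republican", "ft_example": 60},
    {"pid1r": "Democrat", "ft_example": 20},
    {"pid1r": "Democrat", "ft_example": 40},
]
means = party_means(toy_rows, "ft_example")
is_repub_leading = means["Republican"] > means["Democrat"]
print(means, is_repub_leading)
```

Leaving the `999` codes in place would inflate the averages, which is why the notebook replaces them with `np.nan` before calling `mean()`.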


In [None]:
for prompt in ['Prompt1', 'Prompt2', 'Prompt3', 'Prompt4']:
    df_politician_results[prompt] = anes_df[prompt].to_list()

df_politician_results['pid'] = df_politician_results.index
df_politician_results.to_csv("output/anes2020_pilot_prompt_probing_ft.csv", index=False)
# df_politician_results

In [None]:
df_politician_results['diff'] = (df_politician_results.Democrat - df_politician_results.Republican).abs()
df_politician_results.sort_values(by=['diff'])

## Evaluate fine-tuned GPT-2 CommunityLM models

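This section scores the fine-tuned models against the ANES survey. For each question, the models predict that Republicans rate the target more favorably whenever the Republican model's group sentiment exceeds the Democratic model's. These predictions are then compared with the survey's `is_repub_leading` labels for every run and prompt format.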

In [None]:
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_recall_fscore_support

def compute_scores(df_anes, df_dem, df_repub):
    # Predict "Republicans lead" wherever the Republican model is more positive.
    # The comparison aligns rows by index, so df_dem and df_repub must list
    # the same (run, prompt_format, question) rows in the same order.
    # Note that this adds a 'prediction' column to df_repub in place.
    df_repub['prediction'] = (df_repub['group_sentiment'] > df_dem['group_sentiment'])

    # Gold labels come from the ANES survey means: 1 if Republicans lead, else 0.
    gold_labels = df_anes.is_repub_leading.astype(int).values
    rows = []
    for run_idx in range(1, 6):
        run = f"run_{run_idx}"
        for prompt_idx in range(1, 5):
            prompt_format = f"Prompt{prompt_idx}"
            df_ = df_repub[(df_repub.run == run) & (df_repub.prompt_format == prompt_format)]
            pred_labels = df_.prediction.astype(int).values
            acc = accuracy_score(gold_labels, pred_labels)
            p, r, f1, _ = precision_recall_fscore_support(gold_labels, pred_labels, average='weighted')
            rows.append([run, prompt_format, acc, p, r, f1])
    df_scores = pd.DataFrame(rows, columns=["run", "prompt_format", "accuracy", "precision", "recall", "f1"])
    return df_scores
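
To make the accuracy computed in `compute_scores` concrete, here is a minimal plain-Python sketch of the same idea for a single run and prompt. The function name `partisan_accuracy` and all numbers are hypothetical toy values, not results from the models.

```python
def partisan_accuracy(repub_scores, dem_scores, gold_repub_leading):
    """Share of questions where the predicted leading party matches the survey."""
    # Predict "Republicans lead" when the Republican model's score is higher.
    predictions = [r > d for r, d in zip(repub_scores, dem_scores)]
    hits = sum(p == g for p, g in zip(predictions, gold_repub_leading))
    return hits / len(gold_repub_leading)

# Toy values for four hypothetical questions.
repub_scores = [70.0, 30.0, 55.0, 10.0]
dem_scores = [40.0, 60.0, 50.0, 20.0]
gold = [True, False, False, False]
print(partisan_accuracy(repub_scores, dem_scores, gold))  # 0.75
```

The third question is the one miss: the toy Republican score is higher, but the gold label says Democrats lead.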

In [None]:
df_repub

In [None]:
df = pd.read_csv("output/anes2020_pilot_prompt_probing_ft.csv")
df_scores = compute_scores(df, df_dem, df_repub)
df_scores

In [None]:
# extract gold ranks (rows sorted by pid so they line up with the question-sorted model ranks)
df_politician_results = df_politician_results.sort_values(by=["pid"])
gold_dem_rank = df_politician_results['Democrat'].rank().values
gold_repub_rank = df_politician_results['Republican'].rank().values

from scipy import stats
def extract_ranking(df_):
    # rank the questions by their mean Prompt4 sentiment across runs
    df_ = df_.sort_values(by=['question'])
    return df_[df_.prompt_format == "Prompt4"].groupby(['question']).group_sentiment.mean().rank().values

dem_rank = extract_ranking(df_dem)
repub_rank = extract_ranking(df_repub)

gold_dem_rank
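
The cell above extracts gold and model rankings but does not compare them. One common way to quantify their agreement is Spearman's rank correlation, sketched below in plain Python for tie-free rankings. This is an illustrative addition rather than part of the original analysis, and the rankings are toy values. For real data, which may contain ties, `scipy.stats.spearmanr` is the safer choice.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman correlation for two tie-free rankings of the same items."""
    n = len(rank_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Toy rankings of five hypothetical items.
toy_gold_rank = [1, 2, 3, 4, 5]
toy_model_rank = [2, 1, 3, 5, 4]
print(spearman_rho(toy_gold_rank, toy_model_rank))
```

A value of 1 means the two rankings are identical and -1 means one is the exact reverse of the other.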

In [None]:
## plot the rankings

def extract_ranking_for_politicians(df_):
    df_ = df_[df_.question.isin(politician_feelings)]
    df_ = df_.sort_values(by=['question', 'run'])
    return df_[df_.prompt_format == "Prompt4"]

def build_politician_rank(df_party):
    # Prompt4 rows for politicians, joined with the ANES results and the per-question mean sentiment
    df_rank = extract_ranking_for_politicians(df_party)
    df_avg = df_rank.groupby("question").group_sentiment.mean().rename("group_avg_sentiment").reset_index()
    df_rank = df_rank.merge(df_politician_results, left_on="question", right_on="pid")
    return df_rank.merge(df_avg, on="question")

df_politician_results = df_politician_results[df_politician_results.pid.isin(politician_feelings)].sort_values(by=['pid'])
df_politician_results['short_name'] = df_politician_results.Prompt1.apply(lambda x: x.split(" ")[-1])

dem_politician_rank = build_politician_rank(df_dem)
repub_politician_rank = build_politician_rank(df_repub)

dem_politician_rank

In [None]:
# df_politician_results.to_csv("rank_plots.csv")
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

os.makedirs('rankings', exist_ok=True)  # plt.savefig does not create missing directories

sns.set(rc={'figure.figsize':(5,5)})

palette = sns.color_palette("Blues",n_colors=20)
palette.reverse()

# Note: seaborn >= 0.12 deprecates `ci`; there the equivalent argument is errorbar=("ci", 90)
ax = sns.barplot(data=dem_politician_rank.sort_values(by="group_avg_sentiment", ascending=False), x="group_sentiment", y="short_name", palette=palette, estimator=np.mean, ci=90)

ax.set_xlabel(None, fontsize=15)
ax.set_ylabel(None)
plt.tick_params(axis='both', which='major', labelsize=14)
plt.tight_layout()
plt.savefig('rankings/finetuned_gpt2_pred_dem_rank.png', bbox_inches = "tight")

In [None]:
sns.set(rc={'figure.figsize':(5,5)})

palette = sns.color_palette("Blues",n_colors=20)
palette.reverse()

ax = sns.barplot(data=dem_politician_rank.sort_values(by="Democrat", ascending=False), x="Democrat", y="short_name", palette=palette)

ax.set_xlabel(None, fontsize=15)
ax.set_ylabel(None)
plt.tick_params(axis='both', which='major', labelsize=14)
plt.tight_layout()
plt.savefig('rankings/gold_dem_rank.png', bbox_inches = "tight")

In [None]:
palette = sns.color_palette("Reds", n_colors=20)
palette.reverse()

ax = sns.barplot(data=repub_politician_rank.sort_values(by="group_avg_sentiment", ascending=False), x="group_sentiment", y="short_name", palette=palette)

ax.set_xlabel(None, fontsize=15)
ax.set_ylabel(None)
plt.tick_params(axis='both', which='major', labelsize=14)
plt.tight_layout()
plt.savefig('rankings/finetuned_gpt2_pred_repub_rank.png', bbox_inches = "tight")

In [None]:
palette = sns.color_palette("Reds", n_colors=20)
palette.reverse()

ax = sns.barplot(data=repub_politician_rank.sort_values(by="Republican", ascending=False), x="Republican", y="short_name", palette=palette)

ax.set_xlabel(None, fontsize=15)
ax.set_ylabel(None)
plt.tick_params(axis='both', which='major', labelsize=14)
plt.tight_layout()
plt.savefig('rankings/gold_repub_rank.png', bbox_inches = "tight")