## A Tutorial on Analyzing Political Insight with LLMs

This case study demonstrates how to leverage Large Language Models (LLMs) to gain political insight from a leaked [email](https://github.com/benhamner/hillary-clinton-emails?tab=readme-ov-file) dataset from Hillary Clinton's private email server.
- The email dataset is a comprehensive collection of communications covering her entire tenure as Secretary of State from 2009 to 2013. 
- It includes approximately 30,000 emails covering a wide range of topics, from official diplomatic communications to personal correspondence.
- The release and subsequent analysis of these emails have played a crucial role in political debates, legal inquiries, and public discussions about transparency and security in government communications.

### Goals of analysis with an LLM
- Input for the LLM: emails about various political scenarios, historical events, or current affairs related to Israel
- Task for the LLM: analyze the emails from political, social, and economic perspectives
    - provide insights into the implications of these scenarios,
    - explain how they reflect on Israel's domestic and foreign policy, and
    - identify potential outcomes or future developments that could arise from them.
- Output: analysis results
    - No specific format is required.


### Dataset in this study
A set of email summaries (138 paragraphs) from the leaked email dataset
- each summary condenses one email containing the keyword "Israel"
    - summaries are used because some emails are very long, and LLMs have token (context length) limits
- the summarization was done with Gemini
    - a Gemini API key can be obtained for [free](https://aistudio.google.com/app/apikey)
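
The notebook does not include the code that produced the summaries. As a rough sketch of the two mechanical steps described above, the snippet below keeps only emails that mention the keyword and splits a long email into chunks under an approximate token budget before it would be sent for summarization. The function names `filter_by_keyword` and `chunk_text`, the sample emails, and the 4-characters-per-token heuristic are assumptions for illustration, not the author's pipeline.

```python
import re

# Rough heuristic (assumption): about 4 characters per token for English text.
CHARS_PER_TOKEN = 4


def filter_by_keyword(emails, keyword="Israel"):
    """Keep emails that mention the keyword; the prefix match also catches 'Israeli'."""
    pattern = re.compile(rf"\b{re.escape(keyword)}", re.IGNORECASE)
    return [email for email in emails if pattern.search(email)]


def chunk_text(text, max_tokens=1000):
    """Split text on blank lines into chunks that stay under an approximate token budget.

    A single paragraph longer than the budget is kept whole as its own chunk.
    """
    max_chars = max_tokens * CHARS_PER_TOKEN
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        # Close the current chunk if adding this paragraph would exceed the budget.
        if current and len(current) + 2 + len(paragraph) > max_chars:
            chunks.append(current)
            current = paragraph
        else:
            current = f"{current}\n\n{paragraph}" if current else paragraph
    if current:
        chunks.append(current)
    return chunks


# Hypothetical sample emails.
emails = [
    "12:30 meeting with Israeli PM Netanyahu.",
    "Lunch order for the Tuesday staff meeting.",
]
print(filter_by_keyword(emails))  # ['12:30 meeting with Israeli PM Netanyahu.']
```

Each chunk could then be summarized separately and the partial summaries combined, which keeps every request within the model's context limit.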


### Implementation Plan
- [langchain](https://www.langchain.com/)
    - a popular open-source framework 
    - designed to simplify the development of applications using LLMs
- Gemini: an API key can be obtained for [free](https://aistudio.google.com/app/apikey)
    - summarization
    - political analysis 
- Can we use DSPy?

### Step 0: Download and check the summarized email dataset

In [1]:
```
! wget -q https://raw.githubusercontent.com/frankwxu/digital-forensics-lab/main/AI4Forensics/CKIM2024/HillaryEmails/results_email_summary.txt
```

In [2]:
```python
file_path = "results_email_summary.txt"

# Open the file and read its content
with open(file_path, "r", encoding="utf-8") as file:
    provided_data = file.read()

# Display the content
print(provided_data)
```

No information related to Israel or Israeli affairs was found in the provided email.
12:30 Israeli PM Netanyahu
- Dennis Ross recently visited Israel and will share information with the recipient before upcoming meetings.
There is no mention of Israel or Israeli in the email provided.
**Subject:** The Vice President's Residence

**Date:** N/A

**Summary:**

- 9:00 am: Bilateral meeting with Israeli President Shimon Peres at the Omni Shoreham Hotel.
This email does not mention Israel or Israeli-related topics.
No information related to Israel or Israeli is present in the provided email.
In a discussion about arrangements on settlements with Israel, two options are presented:

1. Describe it as an agreement, which would raise concerns about legitimizing Israeli activity in the West Bank and triggering complaints from Arabs and Palestinians.

2. The administration could acknowledge progress made, express differences with the Israeli government on their intention to complete housing units,
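
Several summaries in this printout only state that the email contains nothing about Israel. A hypothetical cleanup step could drop those lines before the text is sent to the model, shortening the prompt. The sketch below assumes each such sentence sits on its own line and matches one of a few phrasings seen above; `drop_empty_summaries` is an illustrative name, and the phrase list would need extending for other wordings.

```python
import re

# Phrasings (taken from the printout above) that signal a summary found nothing relevant.
NO_MATCH_PHRASES = [
    "no information related to israel",
    "no mention of israel",
    "does not mention israel",
]
NO_MATCH = re.compile("|".join(map(re.escape, NO_MATCH_PHRASES)), re.IGNORECASE)


def drop_empty_summaries(text):
    """Remove lines containing a phrase that reports no Israel-related content."""
    kept = [line for line in text.splitlines() if not NO_MATCH.search(line)]
    return "\n".join(kept)


sample = (
    "No information related to Israel or Israeli affairs was found in the provided email.\n"
    "12:30 Israeli PM Netanyahu\n"
    "This email does not mention Israel or Israeli-related topics."
)
print(drop_empty_summaries(sample))  # 12:30 Israeli PM Netanyahu
```

For example, `cleaned = drop_empty_summaries(provided_data)` followed by comparing `len(provided_data.splitlines())` with `len(cleaned.splitlines())` shows how many lines were removed. The analysis in this notebook was run on the uncleaned text.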

### Step 1: Install libraries
- Make sure you use `pip` to install the necessary libraries
- If you are using Google Colab, all downloaded and saved files are located in the `/content` folder
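
After installing, it can help to confirm which versions are actually present, since LangChain's package layout changes between releases. A minimal sketch using only the standard library's `importlib.metadata`; the package list mirrors the `pip` commands in this step, and a missing package is reported rather than raising an error.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names from the pip commands in this step.
PACKAGES = [
    "google-generativeai",
    "langchain-google-genai",
    "python-dotenv",
    "langchain-experimental",
    "langchain-core",
    "langchain",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name:25} {ver or 'NOT INSTALLED'}")
```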

In [3]:
```python
# Install the required packages (uncomment on the first run)
# !pip -q install google-generativeai
# !pip -q install langchain-google-genai
# !pip -q install python-dotenv
# !pip -q install langchain_experimental langchain_core
# !pip -q install --upgrade langchain

import os

import google.generativeai as genai
from dotenv import load_dotenv
from IPython.display import Markdown, display
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_google_genai import ChatGoogleGenerativeAI, HarmBlockThreshold, HarmCategory
```

### Step 2: Configure Gemini
- replace the API key with your own Gemini API key, which is passed to `genai.configure(api_key=GOOGLE_AI_STUDIO)`
- set up the Gemini generation settings
- configure the safety settings
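
`os.getenv` returns `None` when a variable is missing, which only surfaces later as an authentication error from the API. A small, hypothetical helper can fail fast instead. It assumes the key is stored in `my_config.env` on a line such as `GOOGLE_AI_STUDIO2=<your key>`, matching the variable name the next cell reads.

```python
import os


def require_env(name):
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set. "
            "Add it to your .env file (for example my_config.env) or export it."
        )
    return value
```

After `load_dotenv("my_config.env")`, the line `GOOGLE_AI_STUDIO = require_env("GOOGLE_AI_STUDIO2")` could replace the `os.getenv` call.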

In [4]:
```python
# ================ Key configuration ================
# Load environment variables from the .env file
load_dotenv("my_config.env")

# Read the API key from the environment (do not hardcode it in the notebook)
GOOGLE_AI_STUDIO = os.getenv("GOOGLE_AI_STUDIO2")
genai.configure(api_key=GOOGLE_AI_STUDIO)

# ================ Generation configuration ================
generation_config = {
    "temperature": 0.0,  # Controls the randomness of the output; 0.0 makes it (nearly) deterministic
    "top_p": 1,  # Keep the smallest set of tokens whose cumulative probability reaches p; 1 considers all tokens
    "top_k": 16,  # Consider only the k most likely next tokens
    "max_output_tokens": 4096,  # Upper bound on the length of the response
}

# ================ Safety configuration ================
# Disable Gemini's content-blocking thresholds (BLOCK_NONE) through LangChain
safety_settings = {
    HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE,
}
```
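
The `top_k` and `top_p` comments describe how the pool of candidate next tokens is narrowed. The toy function below applies both filters to a made-up probability table: first it keeps the `top_k` most likely tokens, then the smallest prefix whose cumulative probability reaches `top_p`. It only illustrates the idea and is not Gemini's implementation; with `temperature` set to 0.0 the model largely picks the most likely token anyway, so these two settings have little effect in this notebook.

```python
def filter_candidates(probs, top_k, top_p):
    """Toy top-k then top-p (nucleus) filtering over next-token probabilities.

    probs maps token -> probability (assumed to sum to 1).
    Returns the tokens still eligible for sampling, most likely first.
    """
    # Rank tokens from most to least likely and keep only the top_k best.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:top_k]

    # Keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append(token)
        cumulative += p
        if cumulative >= top_p:
            break
    return kept


# A made-up distribution for the word following "The settlement ..."
probs = {"freeze": 0.5, "talks": 0.3, "expansion": 0.15, "banana": 0.05}

print(filter_candidates(probs, top_k=16, top_p=1.0))   # ['freeze', 'talks', 'expansion', 'banana']
print(filter_candidates(probs, top_k=16, top_p=0.75))  # ['freeze', 'talks']
print(filter_candidates(probs, top_k=1, top_p=1.0))    # ['freeze']
```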

### Step 3: Build a Gemini model with these configurations

In [5]:
```python
model = ChatGoogleGenerativeAI(
    model="gemini-pro",
    **generation_config,  # temperature, top_p, top_k, max_output_tokens
    safety_settings=safety_settings,
    # Do not hardcode the key here; it is loaded from the environment in Step 2
    google_api_key=GOOGLE_AI_STUDIO,
)
```
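
Free-tier Gemini keys are rate limited, so a long series of calls can fail with transient errors. A generic retry wrapper with exponential backoff is one way to cope. The helper below is a hypothetical sketch, not part of LangChain, and it retries on any exception, which you would narrow in practice. LangChain's chat models may already retry some errors internally, so check the wrapper's own retry settings before adding another layer.

```python
import random
import time


def call_with_retries(fn, *args, max_attempts=4, base_delay=2.0, **kwargs):
    """Call fn(*args, **kwargs), retrying with exponential backoff when it raises.

    Hypothetical helper: it retries on any Exception, which is deliberately broad;
    narrow it to the transient errors your client actually raises.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn(*args, **kwargs)
        except Exception:
            if attempt == max_attempts:
                raise  # Give up and surface the last error.
            # Wait base_delay * 2**(attempt - 1) seconds plus random jitter of up to that amount.
            delay = base_delay * 2 ** (attempt - 1)
            time.sleep(delay + random.uniform(0, delay))
```

For a quick smoke test of the configured model, `reply = call_with_retries(model.invoke, "Reply with one word: hello")` returns a chat message whose text is in `reply.content`.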

### Step 4: Create a prompt template
- The template is a multi-line string containing placeholders in curly braces, which are filled in when the prompt is formatted, for example:
```
formatted_prompt = prompt.format(
    role="You are a helpful assistant.",
    provided_data="Here's some context: ...",
    start="Please answer the following question:",
)
```
- `{role}`, `{provided_data}`, and `{start}` are the placeholders:
    - `{role}`: defines the role, including its name, overall objective, task-specific context, and any applicable constraints
    - `{provided_data}`: supplies the data required to complete the task
    - `{start}`: the initiation instruction, which triggers the role to carry out the task
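
`ChatPromptTemplate.from_template` uses `str.format`-style placeholders by default. Plain Python can show which placeholders a template expects and how they are filled; the snippet below uses `string.Formatter().parse` and `str.format` with the same three fields. `demo_template` and `placeholder_names` are illustrative names. One practical consequence: curly braces inside the *values* (for example in `provided_data`) are inserted literally, whereas literal braces in the template text itself would need to be doubled as `{{` and `}}`.

```python
from string import Formatter

# Same three placeholders as the notebook's template, as a plain Python string.
demo_template = "{role}\n\n{provided_data}\n\n{start}\n"


def placeholder_names(text):
    """Return the {field} names in a format string, in order of appearance."""
    return [field for _, field, _, _ in Formatter().parse(text) if field]


print(placeholder_names(demo_template))  # ['role', 'provided_data', 'start']

filled = demo_template.format(
    role="You are a helpful assistant.",
    provided_data="Here's some context: {braces in data are kept as-is}",
    start="Please answer the following question:",
)
print(filled)
```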

In [6]:
```python
# Blank lines separate the role definition, the data, and the start instruction
template = """{role}

{provided_data}

{start}
"""
prompt = ChatPromptTemplate.from_template(template)
```

### Step 5: Use LangChain to create a simple processing chain

The chain is composed as `chain = prompt | model | output_parser`:
- The prompt is first formatted and sent to the model.
- The model processes the prompt and generates a response.
- The output parser then extracts the text of the model's response message as a plain string.
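
The `|` operator works because LangChain components overload Python's `__or__` method to build a sequence. The toy class below imitates that idea with plain functions: each `Step` wraps a callable, and `a | b` returns a new `Step` that feeds the output of `a` into `b`. `Step`, `fake_prompt`, `fake_model`, and `fake_parser` are hypothetical stand-ins; LangChain's real `RunnableSequence` also handles batching, streaming, and async calls.

```python
class Step:
    """Toy stand-in for a LangChain runnable: wraps a function and supports `|`."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # a | b returns a new Step that runs a, then passes its output to b.
        return Step(lambda x: other.fn(self.fn(x)))

    def invoke(self, x):
        return self.fn(x)


# Hypothetical stand-ins for the prompt, the model, and the output parser.
fake_prompt = Step(lambda d: f"{d['role']} {d['start']}")
fake_model = Step(lambda text: {"content": text.upper()})
fake_parser = Step(lambda message: message["content"])

toy_chain = fake_prompt | fake_model | fake_parser
print(toy_chain.invoke({"role": "analyst:", "start": "go"}))  # ANALYST: GO
```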

In [7]:
```python
# A LangChain utility that parses the output of a chat model into a plain string
output_parser = StrOutputParser()

# Create a processing chain using the pipe (|) operator
chain = prompt | model | output_parser

role = "I want you to act as a political analyst specializing in Israel. I will present you with various political scenarios, historical events, or current affairs related to Israel, and your task is to analyze them from a political, social, and economic perspective. Provide insights into the implications of these scenarios, how they reflect on Israel's domestic and foreign policy, and what potential outcomes or future developments could arise from them. Remember, your responses should be based on factual analysis, taking into account Israel's political landscape, its relationships with neighboring countries, and the broader geopolitical context. Do not provide personal opinions or speculative predictions that cannot be supported by current facts or historical precedent."

start = "My first scenario involves analyzing a collection of leaked email summaries obtained from Hillary Clinton's private email server. These emails are specifically related to Israel. Describe political insights based on these emails."

result = chain.invoke(
    {
        "role": role,
        "provided_data": provided_data,
        "start": start,
    }
)

# Render the model's Markdown response in the notebook
display(Markdown(result))
```

**Political Insights from Leaked Hillary Clinton Emails on Israel:**

**1. US-Israel Diplomatic Tensions:**

* Several emails reveal tensions between the US and Israel, particularly regarding Israel's settlement expansion and military actions.
* Israel's Ambassador to the US, Michael Oren, expressed concerns about the US reprimanding Israel over its conduct during Vice President Biden's visit.
* Israel disputed the Obama administration's claim that it had warned Israeli officials to exercise caution during the Gaza Flotilla interception.

**2. Internal Israeli Politics:**

* Emails discuss the political dynamics within Israel's ruling coalition, including the pressure on Prime Minister Netanyahu from right-wing parties to block the renewal of the settlement freeze.
* Netanyahu's motivations for not extending the settlement freeze are analyzed, suggesting concerns about keeping coalition partners satisfied.
* The potential impact of Kadima leader Tzipi Livni's willingness to join the government without demanding rotation is examined.

**3. Peace Negotiations and International Pressure:**

* Emails highlight the US administration's efforts to facilitate peace negotiations between Israel and the Palestinians.
* The international community's concerns about Israel's settlement policy and its impact on peace prospects are evident.
* Former Shin Bet chief Yuval Diskin's warning about the potential erosion of Palestinian security motivation if Netanyahu does not demonstrate seriousness about peace is noteworthy.

**4. Public Opinion and Political Strategy:**

* Polls discussed in the emails indicate American support for a two-state solution and concern about the humanitarian crisis in Gaza.
* Netanyahu's negotiating tactics are scrutinized, with analysts suggesting they contribute to distrust on the Palestinian side.
* The Israeli public's readiness for a peace deal is highlighted, with warnings that failure to make a serious move could further delegitimize Israel internationally.

**Social and Economic Implications:**

* The emails do not provide significant insights into the social or economic implications of the discussed political issues. However, the humanitarian crisis in Gaza and the potential impact of settlement expansion on the Palestinian economy are mentioned in passing.

**Potential Future Developments:**

Based on the analysis of these emails, potential future developments could include:

* Continued tensions between the US and Israel over settlement policy and other issues.
* Further political instability within Israel's coalition government.
* Stalled peace negotiations and international pressure on Israel.
* Growing public dissatisfaction in Israel if Netanyahu fails to make progress towards peace.
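
The role prompt asks for analysis grounded in facts, but an LLM can still introduce details that are not in its input. A quick, hypothetical spot check is to count how often names cited in the analysis occur in the source summaries; a count of 0 flags a claim worth verifying by hand. The term list is picked manually from the output above, and simple substring counting will miss spelling variants.

```python
def grounding_report(terms, source):
    """Count case-insensitive occurrences of each term in the source text."""
    lowered = source.lower()
    return {term: lowered.count(term.lower()) for term in terms}


# Names picked by hand from the analysis above (an illustrative, incomplete list).
terms_to_check = ["Michael Oren", "Yuval Diskin", "Tzipi Livni", "Biden"]

# Tiny made-up source text to show the output shape.
demo_source = "Ambassador Michael Oren raised concerns. Oren later met State officials."
print(grounding_report(terms_to_check[:2], demo_source))
# {'Michael Oren': 1, 'Yuval Diskin': 0}
```

In the notebook, `grounding_report(terms_to_check, provided_data)` would run the same check against the real summaries.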

In [8]:
```python
# Open the file for writing ("w" mode creates it if it doesn't exist)
with open("result_political.txt", "w", encoding="utf-8") as file:
    # Write the analysis to the file
    file.write(result)

print("File saved successfully.")
```

File saved successfully.
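
A plain-text file keeps only the answer. Because LLM output can change across runs and model versions, one option is to store the result together with the settings that produced it. The `save_run` helper and its JSON layout are hypothetical; the sketch uses only the standard library.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def save_run(path, result, **metadata):
    """Write the model output plus run metadata (model name, settings, prompt parts) as JSON."""
    record = {
        "saved_at": datetime.now(timezone.utc).isoformat(),
        **metadata,
        "result": result,
    }
    Path(path).write_text(json.dumps(record, ensure_ascii=False, indent=2), encoding="utf-8")
    return record
```

For example, `save_run("result_political.json", result, model="gemini-pro", generation_config=generation_config, role=role, start=start)` records the prompt parts and settings next to the answer.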
