## A tutorial on gaining political insight with LLMs

This case study demonstrates how to leverage Large Language Models (LLMs) to gain political insight from a leaked [email](https://github.com/benhamner/hillary-clinton-emails?tab=readme-ov-file) dataset from Hillary Clinton's private email server.
- The email dataset is a comprehensive collection of communications covering her entire tenure as Secretary of State from 2009 to 2013.
- It includes approximately 30,000 emails covering a wide range of topics, from official diplomatic communications to personal correspondence.
- The release and subsequent analysis of these emails have played a crucial role in political debates, legal inquiries, and public discussions about transparency and security in government communications.

### Goals of analysis with an LLM
- Input for LLM: emails with various political scenarios, historical events, or current affairs related to Israel
- Task for LLM: analyze the emails from a political, social, and economic perspective, and
    - provide insights into the implications of these scenarios,
    - explain how they reflect on Israel's domestic and foreign policy, and
    - identify potential outcomes or future developments that could arise from them.
- Output: analysis results
    - No specific format is required.


### Dataset in this study
A set of email summaries (138 paragraphs) drawn from the leaked email dataset
- each summary condenses one email that contains the keyword "Israel"
    - emails are summarized first because some of them are very long, and LLMs have a limited context window (token limit)
- summarization is done with Gemini
    - Gemini API is [free](https://aistudio.google.com/app/apikey)
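The notebook does not show how the 138 summaries were produced. The sketch below illustrates the two preparatory steps described above under stated assumptions: it keeps rows whose text mentions "Israel" (a case-insensitive substring match, so "Israeli" also counts) and splits overly long emails into pieces below a character budget before they are summarized. The column name `RawText` and the 12,000-character budget are assumptions, not values from the original study; a rough rule of thumb for English text is about four characters per token.

```python
import csv
import io


def filter_by_keyword(csv_text, keyword="Israel", text_field="RawText"):
    """Return the CSV rows whose text_field contains keyword (case-insensitive).

    Assumption: the email text lives in a column called "RawText"; check the
    column names of your copy of the dataset.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    needle = keyword.lower()
    return [row for row in reader if needle in (row.get(text_field) or "").lower()]


def chunk_text(text, max_chars=12000):
    """Split text into pieces of at most max_chars characters.

    Paragraph boundaries ("\n\n") are preferred; a single paragraph longer than
    the budget is hard-split. max_chars is a hypothetical budget.
    """
    chunks, current = [], ""
    for para in text.split("\n\n"):
        # Hard-split any paragraph that alone exceeds the budget.
        while len(para) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(para[:max_chars])
            para = para[max_chars:]
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate  # paragraph fits into the current chunk
        else:
            chunks.append(current)  # close the current chunk, start a new one
            current = para
    if current:
        chunks.append(current)
    return chunks


sample = "Id,RawText\n1,Talks with Israel resumed\n2,Budget memo\n3,ISRAELI settlements\n"
print([row["Id"] for row in filter_by_keyword(sample)])  # ['1', '3']
```

With a local copy of the dataset, the same function could be applied to the file contents, for example `filter_by_keyword(open("Emails.csv", encoding="utf-8").read())`, where `Emails.csv` is an assumed file name; each chunk returned by `chunk_text` would then be summarized separately.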


### Implementation Plan
- [langchain](https://www.langchain.com/)
    - a popular open-source framework 
    - designed to simplify the development of applications using LLMs
- Gemini - API is [free](https://aistudio.google.com/app/apikey)
    - summarization
    - political analysis 
- Can we use DSPy?

### Step 1: Install libraries
- Make sure you use `pip` to install the necessary libraries (the install commands are commented out in the cell below)
- If you use Google Colab, all downloaded and saved files are located in the `/content` folder
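After running the install commands, one way to confirm that the packages are present is the standard library's `importlib.metadata`. This is an optional helper that is not part of the original notebook; the list below simply reuses the distribution names from the `pip` commands.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the pip commands in this notebook.
REQUIRED = [
    "google-generativeai",
    "langchain-google-genai",
    "python-dotenv",
    "langchain-core",
    "langchain",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(REQUIRED).items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```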

In [15]:
```python
# !pip -q install google-generativeai
# !pip -q install langchain-google-genai
# !pip -q install python-dotenv
# !pip -q install langchain_experimental langchain_core
# !pip -q install --upgrade langchain

import os  # used by os.getenv in Step 2

import google.generativeai as genai
from IPython.display import Markdown, display
from dotenv import load_dotenv
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_google_genai import ChatGoogleGenerativeAI, HarmBlockThreshold, HarmCategory
```

### Step 2: Configure Gemini
- Replace the key with your own Gemini API key: `genai.configure(api_key=GOOGLE_AI_STUDIO)`
- Set up the Gemini generation parameters
- Configure the safety settings
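A common pitfall is passing a quoted name such as `"GOOGLE_AI_STUDIO"` to `genai.configure` instead of the variable that holds the key. The small helper below is a hypothetical addition (not from the original notebook) that reads the key from the environment and fails with a clear message if it is missing; it assumes the key has already been loaded, for example with `load_dotenv`.

```python
import os


def get_api_key(var_name="GOOGLE_AI_STUDIO"):
    """Read an API key from the environment and fail loudly if it is missing.

    Hypothetical helper: var_name must match the entry in your .env file
    (this notebook's file uses GOOGLE_AI_STUDIO2).
    """
    key = os.getenv(var_name)
    if not key:
        raise RuntimeError(f"Environment variable {var_name!r} is not set or empty")
    return key
```

Usage in this notebook would look like `genai.configure(api_key=get_api_key("GOOGLE_AI_STUDIO2"))`, so a missing key surfaces immediately rather than as an authentication error on the first request.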

In [16]:
```python
# ================ Key configuration ================
# Load environment variables from the .env file
load_dotenv("my_config.env")

# Read the API key stored under GOOGLE_AI_STUDIO2 in my_config.env
GOOGLE_AI_STUDIO = os.getenv("GOOGLE_AI_STUDIO2")

# Pass the variable (not the string "GOOGLE_AI_STUDIO") so the real key is used
genai.configure(api_key=GOOGLE_AI_STUDIO)


# ================ Generation configuration ================
generation_config = {
    "temperature": 0.0,  # Controls randomness; 0.0 gives (near-)deterministic output
    "top_p": 1,  # Smallest set of tokens whose cumulative probability reaches p; 1 considers all tokens
    "top_k": 16,  # Sample only from the k most likely next tokens
    "max_output_tokens": 4096,  # Upper bound on the length of the response
}

# ================ Safety configuration ================
# Disable safety blocking for all four harm categories (applied through LangChain)
safety_settings = {
    HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE,
}
```
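To see what `temperature`, `top_k`, and `top_p` do, the toy function below applies them to a small hand-made score table. It is a generic illustration of these decoding controls, not Gemini's implementation; the order in which Gemini applies them is not specified here, and treating temperature 0 as greedy selection is a simplification of this sketch.

```python
import math


def filter_distribution(logits, temperature=1.0, top_k=None, top_p=1.0):
    """Toy illustration of temperature, top-k and top-p (nucleus) filtering.

    logits: dict mapping token -> unnormalised score.
    Returns token -> probability over the tokens that survive filtering.
    """
    if temperature <= 0:
        # Simplification: temperature 0 means always pick the best-scoring token.
        best = max(logits, key=logits.get)
        return {best: 1.0}
    # Temperature-scaled softmax (subtracting the max keeps exp() stable).
    m = max(logits.values())
    weights = {t: math.exp((s - m) / temperature) for t, s in logits.items()}
    total = sum(weights.values())
    ranked = sorted(((w / total, t) for t, w in weights.items()), reverse=True)
    # top_k: keep only the k most likely tokens.
    if top_k is not None:
        ranked = ranked[:top_k]
    # top_p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for p, t in ranked:
        kept.append((p, t))
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalise over the surviving tokens.
    norm = sum(p for p, _ in kept)
    return {t: p / norm for p, t in kept}


logits = {"a": 2.0, "b": 1.0, "c": 0.0}
print(filter_distribution(logits, temperature=0.0))  # {'a': 1.0}
print(filter_distribution(logits, top_k=2))
```

In this toy model, a temperature of 0.0 (as in the configuration above) already reduces the choice to a single token, so `top_p` and `top_k` only matter at higher temperatures.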

### Step 3: Build a Gemini model with these configurations

In [18]:
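```python
# ChatGoogleGenerativeAI takes the generation parameters (temperature, top_p,
# top_k, max_output_tokens) as individual keyword arguments, so unpack the dict
model = ChatGoogleGenerativeAI(
    model="gemini-pro",
    **generation_config,
    safety_settings=safety_settings,
    google_api_key=GOOGLE_AI_STUDIO,
)
```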

### Step 4: Create a prompt template
- The template is a multi-line string containing placeholders in curly braces, which are filled in at call time, for example:
```python
formatted_prompt = prompt.format(
    role="You are a helpful assistant.",
    provided_data="Here's some context: ...",
    start="Please answer the following question:",
)
```
- `{role}`, `{provided_data}`, and `{start}` are placeholders that will be filled in later.
    - `{role}`: defines the role, including its name, overall objective, task-specific context, and any applicable constraints.
    - `{provided_data}`: supplies the data required to complete the task.
    - `{start}`: the initiation instruction, which acts as a trigger prompting the role to carry out the task.
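The notebook reads the role and start instructions from `role_political_analyst.txt` and `start_political_analyst.txt`, but their contents are not shown. The texts below are hypothetical examples written from the goals listed at the top of this tutorial, to illustrate the three-part structure; they are not the author's actual prompts.

```python
from pathlib import Path

# Hypothetical example texts; the notebook's actual files are not shown.
ROLE_TEXT = (
    "You are a political analyst. Your objective is to analyze emails about Israel "
    "from a political, social, and economic perspective. Explain the implications of "
    "each scenario, how it reflects on Israel's domestic and foreign policy, and what "
    "outcomes or future developments could arise. Base your analysis only on the "
    "provided data.\n"
)
START_TEXT = "Using the email summaries above, write your political analysis now.\n"


def write_prompt_parts(directory="."):
    """Write the example texts to the file names the notebook reads.

    Note: this overwrites existing files with the same names.
    """
    base = Path(directory)
    base.mkdir(parents=True, exist_ok=True)
    (base / "role_political_analyst.txt").write_text(ROLE_TEXT, encoding="utf-8")
    (base / "start_political_analyst.txt").write_text(START_TEXT, encoding="utf-8")
    return base
```

Call `write_prompt_parts()` only if you do not already have these files; both texts end with a newline so the sections stay separated in the final prompt.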

In [None]:
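```python
# One placeholder per line, so the role, data, and start instruction are
# separated by newlines in the final prompt
template = """
{role}
{provided_data}
{start}
"""
prompt = ChatPromptTemplate.from_template(template)
```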
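In Python, a backslash at the end of a line inside a normal (non-raw) string literal is a line continuation: the backslash and the newline are both dropped. The comparison below uses plain `str.format`, which follows the same `{placeholder}` syntax that `ChatPromptTemplate.from_template` uses by default, to show the difference. If each loaded text already ends with a newline the joining is harmless; otherwise the role, data, and start instruction run together.

```python
# Trailing backslashes join the lines inside the string literal.
with_backslashes = """
{role}\
{provided_data}\
{start}
"""
# Without them, every placeholder keeps its own line.
without_backslashes = """
{role}
{provided_data}
{start}
"""

parts = {"role": "ROLE.", "provided_data": "DATA.", "start": "START."}
print(repr(with_backslashes.format(**parts)))     # '\nROLE.DATA.START.\n'
print(repr(without_backslashes.format(**parts)))  # '\nROLE.\nDATA.\nSTART.\n'
```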

### Step 5: Use LangChain to create a simple processing chain

Flow of operations in `chain = prompt | model | output_parser`:
- The prompt template is filled in with the input values and sent to the model.
- The model processes the prompt and generates a response (a chat message).
- The output parser extracts the text of the model's response as a plain string.
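To make the left-to-right flow concrete, here is a toy, dependency-free sketch of composition with `|`. The `Step` class and the fake stages are hypothetical stand-ins that only mimic the idea; this is not how LangChain implements its runnables. The names are prefixed so they do not overwrite the notebook's real `prompt`, `model`, and `chain`.

```python
class Step:
    """Toy stand-in for a pipeable stage: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # (self | other) runs self first, then feeds its output into other.
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value):
        return self.func(value)


# Hypothetical stages mirroring prompt | model | output_parser.
toy_prompt = Step(lambda d: f"{d['role']}\n{d['provided_data']}\n{d['start']}")
fake_model = Step(lambda text: {"content": f"Analysis of {len(text)} characters"})
toy_parser = Step(lambda message: message["content"])

toy_chain = toy_prompt | fake_model | toy_parser
print(toy_chain.invoke({"role": "R", "provided_data": "D", "start": "S"}))
# Analysis of 5 characters
```

Each `|` wraps the previous pipeline in a new function, so the input dictionary is formatted first, then "answered", then reduced to a string, in the same order as the real chain.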

In [13]:
```python
# StrOutputParser is a LangChain utility that turns the model's message into a plain string
output_parser = StrOutputParser()

# Build the processing chain with the pipe (|) operator
chain = prompt | model | output_parser

# Load the three prompt parts (relative paths work on Windows, macOS, and Linux)
with open("role_political_analyst.txt", "r", encoding="utf-8") as file:
    role = file.read()

with open("results_email_summary.txt", "r", encoding="utf-8") as file:
    provided_data = file.read()

with open("start_political_analyst.txt", "r", encoding="utf-8") as file:
    start = file.read()

# Fill the template, call Gemini, and parse the reply
result = chain.invoke(
    {
        "role": role,
        "provided_data": provided_data,
        "start": start,
    }
)

# Render the model's Markdown answer in the notebook
display(Markdown(result))
```

**Political Insights Based on Leaked Hillary Clinton Emails Related to Israel**

The leaked email summaries provide valuable insights into the political dynamics surrounding Israel during Hillary Clinton's tenure as Secretary of State. These insights can be categorized as follows:

**1. Diplomatic Challenges and Negotiations:**

* The emails reveal ongoing diplomatic efforts to facilitate peace talks between Israel and the Palestinians, highlighting the complexities and challenges involved in negotiations.
* They shed light on the delicate balance between maintaining good relations with Israel while also addressing concerns from Arab and Palestinian partners.

**2. Settlement Freeze and Construction:**

* The emails discuss the controversial issue of Israeli settlement construction in the West Bank, including the Obama administration's efforts to secure a settlement freeze and Israel's reluctance to fully comply.
* They provide evidence of ongoing tensions between the US and Israel over this issue, which remains a significant obstacle to peace efforts.

**3. Public Perception and International Pressure:**

* The emails reflect the challenges faced by Israel in managing its international image, particularly in the wake of incidents like the Gaza Flotilla raid.
* They show how the US administration attempted to mediate between Israel and the international community, emphasizing the importance of accountability and restraint.

**4. Domestic Political Considerations:**

* The emails provide glimpses into the domestic political dynamics within Israel, including the influence of right-wing parties and the challenges faced by Prime Minister Netanyahu in balancing their demands with international pressure.
* They highlight the complexities of Israeli coalition politics and the impact on decision-making.

**5. Security Concerns:**

* The emails touch on security-related issues, such as the humanitarian crisis in Gaza and the need for a two-state solution to address both Israeli security concerns and Palestinian aspirations.
* They demonstrate the interconnectedness of political and security matters in the region.

Overall, these leaked emails offer valuable insights into the complexities of US-Israel relations, the challenges of peace negotiations, and the political dynamics shaping Israel's domestic and foreign policy. They underscore the importance of diplomacy, dialogue, and a balanced approach to address the multifaceted issues in the region.

In [5]:
```python
# Open the file in "w" mode: it is created if missing and overwritten if it exists
with open("result_political.txt", "w", encoding="utf-8") as file:
    # Write the analysis text to the file
    file.write(result)

print("File saved successfully.")
```

File saved successfully.
