<a href="https://colab.research.google.com/github/heger-valter/Deux_Ex_Nupic/blob/master/WAR_HACKATHON.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Step 1: **Data Collection**: Scrape headlines from popular news sources.

In [4]:
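import requests
from bs4 import BeautifulSoup

# A browser-like User-Agent; some news sites may reject the default python-requests agent
HEADERS = {"User-Agent": "Mozilla/5.0"}

def get_headlines(url, tag='h3'):
    response = requests.get(url, headers=HEADERS, timeout=10)
    response.raise_for_status()  # Fail loudly on HTTP errors instead of parsing an error page
    soup = BeautifulSoup(response.text, 'html.parser')
    # Adjust the tag based on the website structure; strip surrounding whitespace
    headlines = [h.get_text(strip=True) for h in soup.find_all(tag)]
    return [h for h in headlines if h]  # Drop empty elements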

# Example usage
url = "https://news.yahoo.com"
headlines = get_headlines(url)
print(headlines)

['  News', ' Life', ' Entertainment', ' Finance', ' Sports', 'New on Yahoo', 'Biden addresses debate performance, rejects calls to end 2024 bid', 'Texas guests jump off Six Flags ride into the water after malfunction', 'Tractor Supply Co. eliminates DEI after conservative backlash', 'Oklahoma orders public schools to start teaching the Bible in classrooms', "The days of 'plausibly live' Olympics coverage on NBC are coming to an end", "Theodore Roosevelt's stolen pocket watch recovered after 4 decades", 'Supreme Court latest decisions: Justices rule on Jan. 6 defendants, criminalization of homelessness and power of federal agencies', 'Biden-Trump debate: What you didn’t see on TV from inside — and outside — the showdown in Atlanta', 'Presidential debate live updates: Fact checks, reaction after Biden, Trump face off', 'Fact-check: Biden and Trump trade falsehoods and context-free claims at 1st presidential debate of 2024', "Who won the Biden-Trump debate? Biden's freeze draws age concer
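The output above mixes navigation labels ('News', 'Life', ...) with real headlines, and the step calls for several news sources rather than one. The following is a minimal sketch, assuming that a real headline has at least four words (a hypothetical threshold) and that each source's headlines have already been fetched into a list, for example with `get_headlines`. `merge_headlines` is an illustrative helper, not part of the original notebook.

```python
def merge_headlines(headline_lists, min_words=4):
    """Combine headline lists from several sources into one cleaned, de-duplicated list.

    min_words is a hypothetical cut-off: navigation labels such as 'News' or
    'Sports' are short, while real headlines usually run to several words.
    """
    seen = set()
    merged = []
    for headlines in headline_lists:
        for raw in headlines:
            text = " ".join(raw.split())  # Collapse stray spaces/newlines, e.g. '  News' -> 'News'
            if len(text.split()) < min_words:
                continue  # Probably a menu label, not a headline
            key = text.lower()  # Case-insensitive duplicate check across outlets
            if key not in seen:
                seen.add(key)
                merged.append(text)
    return merged

# Usage with literal lists (in the notebook these would come from get_headlines)
source_a = ['  News', ' Life', 'Biden addresses debate performance, rejects calls to end 2024 bid']
source_b = ['Sports', 'Biden addresses debate performance,  rejects calls to end 2024 bid']
print(merge_headlines([source_a, source_b]))
# ['Biden addresses debate performance, rejects calls to end 2024 bid']
```

With a hypothetical list of source URLs, the call would be `merge_headlines([get_headlines(u) for u in urls])`. The word-count filter is crude; a per-site CSS selector is the more precise fix when the page structure is known.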

Step 2: **Preprocessing**: Clean the text data by removing unwanted characters, converting it to lowercase, and tokenizing each headline into words.


In [None]:
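import re
import nltk
from nltk.tokenize import word_tokenize

# Tokenizer models used by word_tokenize; recent NLTK releases look up 'punkt_tab' instead of 'punkt'
nltk.download('punkt', quiet=True)
nltk.download('punkt_tab', quiet=True)

def preprocess_text(text):
    text = re.sub(r'\W', ' ', text)  # Replace non-word characters (punctuation, symbols) with spaces
    text = text.lower()  # Convert to lowercase
    tokens = word_tokenize(text)  # Split into word tokens
    return tokens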

# Example usage
clean_headlines = [preprocess_text(h) for h in headlines]
print(clean_headlines)
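Because `re.sub(r'\W', ' ', ...)` already leaves only word characters separated by spaces, the tokenization step here is nearly equivalent to splitting on whitespace (`word_tokenize` additionally applies a few Treebank rules, for example splitting 'cannot' into 'can' and 'not'). A minimal NLTK-free sketch of the same preprocessing, assuming you want to avoid downloading tokenizer models (for example, on an offline machine); `simple_tokenize` is a hypothetical helper name. Note that `\w` matches letters, digits, and the underscore, and that a curly apostrophe splits a contraction into two tokens.

```python
import re

def simple_tokenize(text):
    """Lowercase `text` and return its runs of word characters (letters, digits, '_')."""
    return re.findall(r"\w+", text.lower())

headline = "Biden-Trump debate: What you didn’t see on TV"
print(simple_tokenize(headline))
# ['biden', 'trump', 'debate', 'what', 'you', 'didn', 't', 'see', 'on', 'tv']
```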

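Step 3: **Sentiment Analysis**: Use a pre-trained model to classify the sentiment of each headline. The default English sentiment model only distinguishes POSITIVE from NEGATIVE, so its NEGATIVE score serves as a rough proxy for hostility rather than a direct measure of it.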


In [None]:
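from transformers import pipeline

# Load a pre-trained sentiment model. Naming it explicitly (this is the pipeline's
# default English model, fine-tuned on SST-2) avoids the "no model was supplied" warning.
sentiment_pipeline = pipeline(
    "sentiment-analysis",
    model="distilbert/distilbert-base-uncased-finetuned-sst-2-english",
)

def analyze_sentiment(headlines):
    if not headlines:
        return []  # Nothing scraped: skip the model call
    # truncation=True guards against inputs longer than the model's maximum sequence length
    sentiments = sentiment_pipeline(headlines, truncation=True)
    return sentiments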

# Example usage
sentiments = analyze_sentiment(headlines)
print(sentiments)
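The pipeline returns only the winning label and its confidence for each headline. Because the default model is a two-class (POSITIVE/NEGATIVE) softmax classifier, the probability of NEGATIVE can be recovered as `1 - score` whenever the winning label is POSITIVE. A minimal sketch, assuming that two-label output format; `negative_probability` and `hostility_scores` are hypothetical helper names. Unlike keeping only results with `label == 'NEGATIVE'`, this gives every headline a score, so a mostly positive news day is not summarized by its few negative items.

```python
def negative_probability(result):
    """P(NEGATIVE) for one pipeline result such as {'label': 'POSITIVE', 'score': 0.98}.

    Assumes a binary POSITIVE/NEGATIVE model whose two scores sum to 1.
    """
    if result["label"] == "NEGATIVE":
        return result["score"]
    if result["label"] == "POSITIVE":
        return 1.0 - result["score"]
    raise ValueError(f"Unexpected label: {result['label']!r}")

def hostility_scores(results):
    """One score in [0, 1] per headline; higher means more negative."""
    return [negative_probability(r) for r in results]

# Usage with hand-written results in the pipeline's output format
example = [
    {"label": "NEGATIVE", "score": 0.9},
    {"label": "POSITIVE", "score": 0.75},
]
print(hostility_scores(example))  # [0.9, 0.25]
```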


Step 4: **Probability Estimation**

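For this step, you'll need historical hostility scores for the periods leading up to past wars, computed the same way as the current scores so that the two are comparable. Assuming you have such data, you can compare the average of the current sentiment scores with the historical average.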
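Written out, the comparison in the next cell is a ratio of two means. With the five example historical scores used there, the arithmetic is as follows; the current mean of 0.9 is a hypothetical value, chosen because a classifier's confidence on headlines it labels NEGATIVE tends to be high.

$$
r = \frac{\bar{s}_{\text{current}}}{\bar{s}_{\text{hist}}}, \qquad
\bar{s}_{\text{hist}} = \frac{0.2 + 0.4 + 0.5 + 0.3 + 0.6}{5} = \frac{2.0}{5} = 0.4
$$

$$
\bar{s}_{\text{current}} = 0.9 \;\Rightarrow\; r = \frac{0.9}{0.4} = 2.25
$$

Since $r$ can exceed 1, it is not a probability. Clipping with $\hat{p} = \min(1, r)$ keeps the reported value in $[0, 1]$, but it remains an uncalibrated heuristic score.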

In [None]:
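import numpy as np

# Example historical hostility scores
historical_scores = np.array([0.2, 0.4, 0.5, 0.3, 0.6])  # Replace with actual data

def estimate_probability(current_scores, historical_scores):
    if len(current_scores) == 0:
        return 0.0  # No negative headlines: nothing to compare
    current_avg = np.mean(current_scores)
    historical_avg = np.mean(historical_scores)
    ratio = current_avg / historical_avg  # Simple ratio, adjust as needed
    # The raw ratio can exceed 1, so clip it to [0, 1]; this is a heuristic score, not a calibrated probability
    return float(np.clip(ratio, 0.0, 1.0))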

# Example usage
current_scores = [s['score'] for s in sentiments if s['label'] == 'NEGATIVE']
probability = estimate_probability(current_scores, historical_scores)
print(f"Estimated Probability of War: {probability:.2f}")
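A ratio of means says little about how unusual today's score is. One alternative, sketched here under the assumption that `historical_scores` holds one aggregate hostility score per past period (computed the same way as today's), is the empirical percentile: the share of historical periods whose score was at or below the current one. `percentile_rank` is a hypothetical helper, and its result is still a relative ranking, not a calibrated probability of war.

```python
import numpy as np

def percentile_rank(current_score, historical_scores):
    """Fraction of historical scores that are <= current_score (a value in [0, 1])."""
    historical = np.asarray(historical_scores, dtype=float)
    if historical.size == 0:
        raise ValueError("historical_scores must not be empty")
    return float(np.mean(historical <= current_score))

historical_scores = np.array([0.2, 0.4, 0.5, 0.3, 0.6])  # Same placeholder data as above
print(percentile_rank(0.45, historical_scores))  # 0.6
```

In the notebook, the current score could be the mean of the scores computed above, e.g. `percentile_rank(np.mean(current_scores), historical_scores)`.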

Step 5: **Prototype App**: Build a simple Flask application that exposes the headlines, sentiment results, and probability estimate as JSON endpoints.

In [None]:
from flask import Flask, jsonify

app = Flask(__name__)

NEWS_URL = "https://news.yahoo.com"  # Single source used by all endpoints

@app.route('/headlines', methods=['GET'])
def get_headlines_route():
    headlines = get_headlines(NEWS_URL)
    return jsonify(headlines)

@app.route('/sentiment', methods=['GET'])
def get_sentiment_route():
    headlines = get_headlines(NEWS_URL)
    sentiments = analyze_sentiment(headlines)
    return jsonify(sentiments)

@app.route('/probability', methods=['GET'])
def get_probability_route():
    headlines = get_headlines(NEWS_URL)
    sentiments = analyze_sentiment(headlines)
    current_scores = [s['score'] for s in sentiments if s['label'] == 'NEGATIVE']
    probability = estimate_probability(current_scores, historical_scores)
    return jsonify({"probability": float(probability)})  # Plain float for JSON serialization

if __name__ == '__main__':
    # The debug auto-reloader tries to restart the whole process, which breaks inside a notebook kernel
    app.run(debug=True, use_reloader=False)
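Every request to these endpoints scrapes Yahoo News again and reruns the model, and `/probability` repeats the work of `/sentiment`. A minimal sketch of a time-based cache, assuming a few minutes of staleness is acceptable for a prototype; `ttl_cache` is a hypothetical decorator written for this example (Python's `functools.lru_cache` has no expiry), and it keys on positional arguments only.

```python
import functools
import time

def ttl_cache(seconds, clock=time.monotonic):
    """Cache a function's results per positional-argument tuple for `seconds` seconds."""
    def decorator(func):
        cache = {}  # args -> (timestamp, value)

        @functools.wraps(func)
        def wrapper(*args):
            now = clock()
            hit = cache.get(args)
            if hit is not None and now - hit[0] < seconds:
                return hit[1]  # Fresh enough: skip the expensive call
            value = func(*args)
            cache[args] = (now, value)
            return value

        return wrapper
    return decorator

# Usage with a fake clock so the expiry can be observed without waiting
fake_time = [0.0]
calls = []

@ttl_cache(300, clock=lambda: fake_time[0])
def fetch(url):
    calls.append(url)
    return f"headlines from {url}"

fetch("https://news.yahoo.com")  # Miss: runs fetch
fetch("https://news.yahoo.com")  # Hit: served from the cache
fake_time[0] = 301.0
fetch("https://news.yahoo.com")  # Expired: runs fetch again
print(len(calls))  # 2
```

In the app, wrapping the scraper once (for example `cached_headlines = ttl_cache(300)(get_headlines)`) and calling `cached_headlines(NEWS_URL)` in each route would let all three endpoints share one scrape per five minutes. The cache has no locking, so two simultaneous requests may both trigger a scrape.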