In [8]:
pip install readability-lxml

Collecting readability-lxml
  Downloading readability_lxml-0.8.1-py3-none-any.whl.metadata (3.6 kB)
Collecting chardet (from readability-lxml)
  Using cached chardet-5.2.0-py3-none-any.whl.metadata (3.4 kB)
Collecting lxml (from readability-lxml)
  Using cached lxml-5.3.1-cp310-cp310-win_amd64.whl.metadata (3.8 kB)
Collecting cssselect (from readability-lxml)
  Downloading cssselect-1.3.0-py3-none-any.whl.metadata (2.6 kB)
Downloading readability_lxml-0.8.1-py3-none-any.whl (20 kB)
Using cached chardet-5.2.0-py3-none-any.whl (199 kB)
Downloading cssselect-1.3.0-py3-none-any.whl (18 kB)
Using cached lxml-5.3.1-cp310-cp310-win_amd64.whl (3.8 MB)
Installing collected packages: lxml, cssselect, chardet, readability-lxml
Successfully installed chardet-5.2.0 cssselect-1.3.0 lxml-5.3.1 readability-lxml-0.8.1
Note: you may need to restart the kernel to use updated packages.


In [6]:
pip install newspaper3k

Note: you may need to restart the kernel to use updated packages.


In [7]:
pip install lxml[html_clean]

Note: you may need to restart the kernel to use updated packages.


In [8]:
pip install langchain-groq

Collecting httpx<1,>=0.23.0 (from groq<1,>=0.4.1->langchain-groq)
  Using cached httpx-0.28.1-py3-none-any.whl.metadata (7.1 kB)
Collecting httpcore==1.* (from httpx<1,>=0.23.0->groq<1,>=0.4.1->langchain-groq)
  Using cached httpcore-1.0.7-py3-none-any.whl.metadata (21 kB)
Collecting h11<0.15,>=0.13 (from httpcore==1.*->httpx<1,>=0.23.0->groq<1,>=0.4.1->langchain-groq)
  Using cached h11-0.14.0-py3-none-any.whl.metadata (8.2 kB)
Using cached httpx-0.28.1-py3-none-any.whl (73 kB)
Using cached httpcore-1.0.7-py3-none-any.whl (78 kB)
Using cached h11-0.14.0-py3-none-any.whl (58 kB)
Installing collected packages: h11, httpcore, httpx
  Attempting uninstall: h11
    Found existing installation: h11 0.9.0
    Uninstalling h11-0.9.0:
      Successfully uninstalled h11-0.9.0
  Attempting uninstall: httpcore
    Found existing installation: httpcore 0.9.1
    Uninstalling httpcore-0.9.1:
      Successfully uninstalled httpcore-0.9.1
  Attempting uninstall: httpx
    Found existing installation:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
googletrans 4.0.0rc1 requires httpx==0.13.3, but you have httpx 0.28.1 which is incompatible.


In [25]:
import requests
from bs4 import BeautifulSoup
from newspaper import Article
import time
import os
import spacy
from dotenv import load_dotenv
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains.summarize import load_summarize_chain
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain_groq import ChatGroq
from langchain.schema import Document
from textblob import TextBlob  
from sklearn.feature_extraction.text import TfidfVectorizer

# Load environment variables
load_dotenv()
api_key = os.getenv("GROQ_API_KEY")

# Load NLP model for topic extraction
nlp = spacy.load("en_core_web_sm")

# Initialize LLM
llm = ChatGroq(groq_api_key=api_key, model="Gemma2-9b-It")

def fetch_news_links_bing(company, max_articles=5):
    """Fetch up to max_articles unique news article links from Bing News"""
    search_url = f"https://www.bing.com/news/search?q={company.replace(' ', '+')}"
    headers = {"User-Agent": "Mozilla/5.0"}

    response = requests.get(search_url, headers=headers)
    soup = BeautifulSoup(response.text, "html.parser")

    news_links = set()
    for link in soup.find_all("a", href=True):
        url = link["href"]
        if url.startswith("http") and "bing.com" not in url and url not in news_links:
            news_links.add(url)
        if len(news_links) >= max_articles:
            break

    if len(news_links) < max_articles:
        print(f"Warning: Only {len(news_links)} unique articles found.")
    
    return list(news_links)[:max_articles]

def extract_article_text(url):
    """Extract clean article text using newspaper3k"""
    try:
        article = Article(url)
        article.download()
        article.parse()
        
        return article.title, article.text if article.text else "Could not extract article content."
    except Exception as e:
        return "Error extracting content", f"Error extracting content: {str(e)}"

def extract_key_topics(text):
    """Extracts key topics using spaCy NLP"""
    doc = nlp(text)
    topics = set()
    
    for ent in doc.ents:
        if ent.label_ in ["ORG", "GPE", "EVENT", "PERSON"]:
            topics.add(ent.text)
    
    return list(topics)

def summarize_text(article_text):
    """Summarizes the article content with a map-reduce chain"""
    
    docs = [Document(page_content=article_text)]
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=100)
    final_documents = text_splitter.split_documents(docs)

    # Map step: summarize each chunk on its own
    chunks_prompt = PromptTemplate(
        input_variables=['text'],
        template="Please summarize the article excerpt below:\nArticle: `{text}`\nSummary:"
    )

    # Reduce step: merge the per-chunk summaries into one
    final_prompt_template = PromptTemplate(
        input_variables=['text'],
        template="Combine the partial summaries below into one concise summary of the article:\n\n{text}"
    )

    summary_chain = load_summarize_chain(
        llm=llm,
        chain_type="map_reduce",
        map_prompt=chunks_prompt,
        combine_prompt=final_prompt_template,
        verbose=False
    )

    summary_output = summary_chain.invoke({"input_documents": final_documents})['output_text']
    return summary_output

def analyze_sentiment(text):
    """Performs sentiment analysis using TextBlob"""
    sentiment_score = TextBlob(text).sentiment.polarity
    if sentiment_score > 0:
        return "Positive"
    elif sentiment_score < 0:
        return "Negative"
    else:
        return "Neutral"

def overall_sentiment_analysis(articles):
    """Determines overall sentiment from multiple articles."""
    sentiment_counts = {"Positive": 0, "Negative": 0, "Neutral": 0}

    for article in articles:
        sentiment_counts[article["Sentiment"]] += 1

    if sentiment_counts["Positive"] > sentiment_counts["Negative"]:
        return "The latest news coverage is mostly positive. Potential stock growth expected."
    elif sentiment_counts["Negative"] > sentiment_counts["Positive"]:
        return "The latest news coverage is mostly negative. Stock decline possible."
    else:
        return "The news coverage is balanced. Market stability expected."

def compare_articles(articles):
    """Computes pairwise TF-IDF cosine similarity between article summaries"""
    texts = [article["Summary"] for article in articles]
    vectorizer = TfidfVectorizer(stop_words="english")
    tfidf_matrix = vectorizer.fit_transform(texts)
    
    comparison_results = []
    for i in range(len(articles)):
        for j in range(i + 1, len(articles)):
            similarity = (tfidf_matrix[i].dot(tfidf_matrix[j].T)).toarray()[0, 0]
            comparison_results.append({
                "article_1": articles[i]["Title"],
                "article_2": articles[j]["Title"],
                "similarity_score": round(float(similarity), 2)
            })
    
    return comparison_results

if __name__ == "__main__":
    company_name = input("Enter company name: ")
    news_links = fetch_news_links_bing(company_name, max_articles=10)

    articles_data = []

    if news_links:
        for idx, link in enumerate(news_links):
            print(f"\nFetching article [{idx+1}]: {link}")
            time.sleep(1)
            
            title, article_text = extract_article_text(link)
            summary = summarize_text(article_text)
            key_topics = extract_key_topics(summary)
            sentiment = analyze_sentiment(summary)
            
            articles_data.append({
                "Title": title,
                "Summary": summary,
                "Sentiment": sentiment,
                "Topics": key_topics
            })

    final_sentiment = overall_sentiment_analysis(articles_data)

    output_data = {
        "Company": company_name,
        "Articles": articles_data,
        "Final Sentiment Analysis": final_sentiment
    }

    print("\n\nFinal Output:\n")
    print(output_data)



Fetching article [1]: https://www.indianweb2.com/2025/03/tesla-launching-2-e-car-models-in-india.html

Fetching article [2]: https://www.telegraphindia.com/world/as-tesla-tanks-elon-musks-chosen-board-chair-thrives/cid/2089317

Fetching article [3]: https://www.timesnownews.com/auto/electric-vehicles/tesla-initiates-homologation-of-model-y-and-model-3-for-india-article-119108577

Fetching article [4]: https://www.msn.com/en-in/news/other/internet-slams-kim-kardashian-s-tesla-photoshoot-as-anti-musk-protests-gain-momentum/ar-AA1B26lN?ocid=BingNewsVerp

Fetching article [5]: https://www.ndtv.com/auto/tesla-model-3-model-ys-homologation-process-initiated-in-india-7940909

Fetching article [6]: https://www.dailymail.co.uk/sciencetech/article-14506405/Elon-Musk-HALT-Cybertruck-deliveries.html

Fetching article [7]: https://www.msn.com/en-ie/technology/tech-companies/more-cybertruck-problems-for-elon-musk-tesla-and-customers/ar-AA1B1a66?ocid=BingNewsVerp

Fetching article [8]: https://www.m
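
The link-collection step in `fetch_news_links_bing` can be exercised offline. The sketch below is a minimal stand-in, not the notebook's method: it uses the standard library's `html.parser` instead of BeautifulSoup, and it keeps results in a list rather than a set so their order is preserved. `LinkCollector`, `collect_external_links`, and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def collect_external_links(html, base_url, blocked_domain, max_links=5):
    """Return up to max_links unique http(s) links outside blocked_domain."""
    parser = LinkCollector()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        url = urljoin(base_url, href)  # resolve relative links
        parts = urlparse(url)
        if parts.scheme not in ("http", "https"):
            continue  # skip javascript:, mailto:, etc.
        host = parts.netloc.lower()
        if host == blocked_domain or host.endswith("." + blocked_domain):
            continue  # skip links that stay on the search site
        if url not in links:
            links.append(url)
        if len(links) >= max_links:
            break
    return links


# Hypothetical sample page, used only to exercise the filter.
SAMPLE_HTML = """
<a href="/news/search?q=tesla">More</a>
<a href="https://example.com/story-1">Story 1</a>
<a href="https://example.com/story-1">Story 1 again</a>
<a href="https://www.bing.com/images">Images</a>
<a href="javascript:void(0)">Menu</a>
<a href="https://example.org/story-2">Story 2</a>
"""

links = collect_external_links(SAMPLE_HTML, "https://www.bing.com", "bing.com")
print(links)
```

Matching on the parsed host rather than on a substring of the whole URL avoids discarding an external article whose path or query string happens to contain `bing.com`.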

In [12]:
pip install groq

Note: you may need to restart the kernel to use updated packages.


In [None]:
import requests
from bs4 import BeautifulSoup
from newspaper import Article
import time
import os
import spacy
from dotenv import load_dotenv
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains.summarize import load_summarize_chain
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain_groq import ChatGroq
from langchain.schema import Document
from textblob import TextBlob  
from sklearn.feature_extraction.text import TfidfVectorizer
from deep_translator import GoogleTranslator
from gtts import gTTS 
# Load environment variables
load_dotenv()
api_key = os.getenv("GROQ_API_KEY")

# Load NLP model for topic extraction
nlp = spacy.load("en_core_web_sm")

# Initialize LLM
llm = ChatGroq(groq_api_key=api_key, model="Gemma2-9b-It")

def fetch_news_links_bing(company, max_articles=10):
    """Fetch up to max_articles unique news article links from Bing News"""
    search_url = f"https://www.bing.com/news/search?q={company.replace(' ', '+')}"
    headers = {"User-Agent": "Mozilla/5.0"}

    response = requests.get(search_url, headers=headers)
    soup = BeautifulSoup(response.text, "html.parser")

    news_links = set()
    for link in soup.find_all("a", href=True):
        url = link["href"]
        if url.startswith("http") and "bing.com" not in url and url not in news_links:
            news_links.add(url)
        if len(news_links) >= max_articles:
            break

    if len(news_links) < max_articles:
        print(f"Warning: Only {len(news_links)} unique articles found.")
    
    return list(news_links)[:max_articles]

def extract_article_text(url):
    """Extract clean article text using newspaper3k"""
    try:
        article = Article(url)
        article.download()
        article.parse()
        
        return article.title, article.text if article.text else "Could not extract article content."
    except Exception as e:
        return "Error extracting content", f"Error extracting content: {str(e)}"

def extract_key_topics(text):
    """Extracts key topics using spaCy NLP"""
    doc = nlp(text)
    topics = set()
    
    for ent in doc.ents:
        if ent.label_ in ["ORG", "GPE", "EVENT", "PERSON"]:
            topics.add(ent.text)
    
    return list(topics)

def summarize_text(article_text):
    """Summarizes the article content with a map-reduce chain"""
    
    docs = [Document(page_content=article_text)]
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=100)
    final_documents = text_splitter.split_documents(docs)

    # Map step: summarize each chunk on its own
    chunks_prompt = PromptTemplate(
        input_variables=['text'],
        template="Please summarize the article excerpt below:\nArticle: `{text}`\nSummary:"
    )

    # Reduce step: merge the per-chunk summaries into one
    final_prompt_template = PromptTemplate(
        input_variables=['text'],
        template="Combine the partial summaries below into one concise summary of the article:\n\n{text}"
    )

    summary_chain = load_summarize_chain(
        llm=llm,
        chain_type="map_reduce",
        map_prompt=chunks_prompt,
        combine_prompt=final_prompt_template,
        verbose=False
    )

    summary_output = summary_chain.invoke({"input_documents": final_documents})['output_text']
    return summary_output

def analyze_sentiment(text):
    """Performs sentiment analysis using TextBlob"""
    sentiment_score = TextBlob(text).sentiment.polarity
    if sentiment_score > 0:
        return "Positive"
    elif sentiment_score < 0:
        return "Negative"
    else:
        return "Neutral"



def compare_articles(articles):
    """Computes pairwise TF-IDF cosine similarity between article summaries"""
    texts = [article["Summary"] for article in articles]
    vectorizer = TfidfVectorizer(stop_words="english")
    tfidf_matrix = vectorizer.fit_transform(texts)
    
    comparison_results = []
    for i in range(len(articles)):
        for j in range(i + 1, len(articles)):
            similarity = (tfidf_matrix[i].dot(tfidf_matrix[j].T)).toarray()[0, 0]
            comparison_results.append({
                "article_1": articles[i]["Title"],
                "article_2": articles[j]["Title"],
                "similarity_score": round(float(similarity), 2)
            })
    
    return comparison_results

if __name__ == "__main__":
    company_name = input("Enter company name: ")
    news_links = fetch_news_links_bing(company_name, max_articles=10)

    articles_data = []

    if news_links:
        for idx, link in enumerate(news_links):
            print(f"\nFetching article [{idx+1}]: {link}")
            time.sleep(1)
            
            title, article_text = extract_article_text(link)
            summary = summarize_text(article_text)
            key_topics = extract_key_topics(summary)
            sentiment = analyze_sentiment(summary)
            
            articles_data.append({
                "Title": title,
                "Summary": summary,
                "Sentiment": sentiment,
                "Topics": key_topics
            })

    # Pairwise similarity between article summaries (not a sentiment verdict)
    comparison_results = compare_articles(articles_data)

    output_data = {
        "Company": company_name,
        "Articles": articles_data,
        "Comparative Analysis": comparison_results
    }

    print("\n\nFinal Output:\n")
    print(output_data)
    
    




Fetching article [1]: https://www.msn.com/en-in/entertainment/bollywood/as-tesla-tanks-musk-s-chosen-board-chair-stands-strong-as-a-badass-woman-in-business-world/ar-AA1B4QO5?ocid=BingNewsVerp

Fetching article [2]: https://www.dailymail.co.uk/sciencetech/article-14506405/Elon-Musk-HALT-Cybertruck-deliveries.html

Fetching article [3]: https://www.msn.com/en-xl/news/other/tesla-planned-to-sell-the-cybertruck-for-100000-forever-now-it-s-already-selling-for-much-less/ar-AA1AYPpX?ocid=BingNewsVerp

Fetching article [4]: https://www.indianweb2.com/2025/03/tesla-launching-2-e-car-models-in-india.html

Fetching article [5]: https://www.msn.com/en-ie/technology/tech-companies/more-cybertruck-problems-for-elon-musk-tesla-and-customers/ar-AA1B1a66?ocid=BingNewsVerp

Fetching article [6]: https://www.ndtv.com/auto/tesla-model-3-model-ys-homologation-process-initiated-in-india-7940909

Fetching article [7]: https://www.timesnownews.com/auto/electric-vehicles/tesla-begins-certification-process-fo
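
`compare_articles` scores each pair of summaries by the dot product of their TF-IDF rows. Because `TfidfVectorizer` L2-normalizes rows by default, that dot product is the cosine similarity. The sketch below runs the same pairwise loop with a deliberately simplified stand-in: raw term counts, no IDF weighting, no stop-word removal, standard library only. `cosine` and `pairwise_similarity` are hypothetical names, and the scores will differ from what scikit-learn produces.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty summary is similar to nothing
    return dot / (norm_a * norm_b)


def pairwise_similarity(articles):
    """Compare every pair of article summaries exactly once."""
    vectors = [Counter(a["Summary"].lower().split()) for a in articles]
    results = []
    for i, j in combinations(range(len(articles)), 2):
        results.append({
            "article_1": articles[i]["Title"],
            "article_2": articles[j]["Title"],
            "similarity_score": round(cosine(vectors[i], vectors[j]), 2),
        })
    return results


# Hypothetical summaries, used only to exercise the loop.
articles = [
    {"Title": "A", "Summary": "tesla launches model y in india"},
    {"Title": "B", "Summary": "tesla launches model y in india"},
    {"Title": "C", "Summary": "cybertruck deliveries halted"},
]
results = pairwise_similarity(articles)
for row in results:
    print(row)
```

Identical summaries score 1.0 and summaries with no shared terms score 0.0; a list of n articles yields n(n-1)/2 comparisons.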

In [None]:
english_text = final_sentiment
hindi_text = GoogleTranslator(source='en', target='hi').translate(english_text)

print("Hindi Translation:", hindi_text)



tts = gTTS(text=hindi_text, lang='hi')  
tts.save("hindi_audio.mp3")  # Save the audio file  

# Play the audio file (the `start` command is Windows-only)
os.system("start hindi_audio.mp3") 

In [3]:
pip install gtts

Note: you may need to restart the kernel to use updated packages.


In [6]:
pip uninstall googletrans httpcore httpx -y


Found existing installation: googletrans 4.0.0rc1
Uninstalling googletrans-4.0.0rc1:
  Successfully uninstalled googletrans-4.0.0rc1
Found existing installation: httpcore 1.0.7
Uninstalling httpcore-1.0.7:
  Successfully uninstalled httpcore-1.0.7
Found existing installation: httpx 0.28.1
Uninstalling httpx-0.28.1:
  Successfully uninstalled httpx-0.28.1
Note: you may need to restart the kernel to use updated packages.


In [7]:
pip install googletrans==4.0.0-rc1 httpx==0.23.0 httpcore==0.15.0


Collecting googletrans==4.0.0-rc1
  Using cached googletrans-4.0.0rc1-py3-none-any.whl
Collecting httpx==0.23.0
  Downloading httpx-0.23.0-py3-none-any.whl.metadata (52 kB)
Collecting httpcore==0.15.0
  Downloading httpcore-0.15.0-py3-none-any.whl.metadata (15 kB)
INFO: pip is looking at multiple versions of googletrans to determine which version is compatible with other requirements. This could take a while.

The conflict is caused by:
    The user requested httpx==0.23.0
    googletrans 4.0.0rc1 depends on httpx==0.13.3

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip to attempt to solve the dependency conflict

Note: you may need to restart the kernel to use updated packages.


ERROR: Cannot install googletrans==4.0.0rc1 and httpx==0.23.0 because these package versions have conflicting dependencies.
ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts
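
The failure reported here comes down to two exact pins on the same package that disagree: the command line asks for `httpx==0.23.0`, while `googletrans 4.0.0rc1` declares `httpx==0.13.3`. The sketch below reproduces only that comparison. `find_pin_conflicts` is a hypothetical helper that handles `==` pins and nothing else; pip's real resolver works with full version specifiers and transitive dependencies.

```python
def find_pin_conflicts(requested, dependencies):
    """Report packages pinned to different exact versions.

    requested:    {"package": "version"} pins given on the command line
    dependencies: {"dependent": {"package": "version"}} exact pins declared
                  by other packages
    """
    conflicts = []
    for dependent, pins in dependencies.items():
        for package, version in pins.items():
            wanted = requested.get(package)
            if wanted is not None and wanted != version:
                conflicts.append((package, wanted, dependent, version))
    return conflicts


# Pins taken from the pip output in this notebook.
requested = {"googletrans": "4.0.0rc1", "httpx": "0.23.0", "httpcore": "0.15.0"}
dependencies = {"googletrans": {"httpx": "0.13.3"}}

conflicts = find_pin_conflicts(requested, dependencies)
for package, wanted, dependent, version in conflicts:
    print(f"{package}=={wanted} requested, but {dependent} requires {package}=={version}")
```

Since the pin sits inside `googletrans` itself, no choice of `httpx` version on the command line other than 0.13.3 can satisfy it. The later cells avoid the problem by using `deep-translator` for translation instead.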


In [8]:
final_sentiment

'The latest news coverage is mostly positive. Potential stock growth expected.'

In [11]:
pip install deep-translator


Collecting deep-translator
  Downloading deep_translator-1.11.4-py3-none-any.whl.metadata (30 kB)
Downloading deep_translator-1.11.4-py3-none-any.whl (42 kB)
Installing collected packages: deep-translator
Successfully installed deep-translator-1.11.4
Note: you may need to restart the kernel to use updated packages.


In [25]:
import requests
from bs4 import BeautifulSoup
from newspaper import Article
import time
import os
import spacy
from dotenv import load_dotenv
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains.summarize import load_summarize_chain
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain_groq import ChatGroq
from langchain.schema import Document
from textblob import TextBlob  
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from deep_translator import GoogleTranslator
from gtts import gTTS
from urllib.parse import urljoin

# Load environment variables
load_dotenv()
api_key = os.getenv("GROQ_API_KEY")

# Load NLP model for topic extraction
nlp = spacy.load("en_core_web_sm")

# Initialize LLM
llm = ChatGroq(groq_api_key=api_key, model="Gemma2-9b-It")

def fetch_news_links_bing(company, max_articles=5):
    """Fetch up to max_articles unique news article links from Bing News"""
    search_url = f"https://www.bing.com/news/search?q={company.replace(' ', '+')}"
    headers = {"User-Agent": "Mozilla/5.0"}

    response = requests.get(search_url, headers=headers)
    soup = BeautifulSoup(response.text, "html.parser")

    news_links = set()
    base_url = "https://www.bing.com"

    for link in soup.find_all("a", href=True):
        url = urljoin(base_url, link["href"])  # Convert to absolute URL
        if url.startswith("http") and "bing.com" not in url and url not in news_links:
            news_links.add(url)
        if len(news_links) >= max_articles:
            break

    if len(news_links) < max_articles:
        print(f"Warning: Only {len(news_links)} unique articles found.")
    
    return list(news_links)[:max_articles]

def extract_article_text(url):
    """Extract clean article text using newspaper3k"""
    try:
        article = Article(url)
        article.download()
        article.parse()

        if not article.text.strip():
            return "Error extracting content", "No content extracted."

        return article.title, article.text
    except Exception as e:
        return "Error extracting content", f"Error extracting content: {str(e)}"

def extract_key_topics(text):
    """Extracts key topics using spaCy NLP"""
    doc = nlp(text)
    topics = {ent.text for ent in doc.ents if ent.label_ in ["ORG", "GPE", "EVENT", "PERSON"]}
    return list(topics)

def summarize_text(article_text):
    """Summarizes the article content using LLM"""
    
    if not article_text.strip():
        return "Summary not available due to extraction failure."

    text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=100)
    docs = text_splitter.split_documents([Document(page_content=article_text)])

    prompt_template = PromptTemplate(
        input_variables=['text'],
        template="Summarize the following article:\n{text}"
    )

    summary_chain = LLMChain(llm=llm, prompt=prompt_template)

    summary_output = summary_chain.run({"text": docs[0].page_content})  # Summarizing first chunk
    return summary_output.strip() if summary_output else "Summary not generated."

def analyze_sentiment(text):
    """Performs sentiment analysis using TextBlob"""
    sentiment_score = TextBlob(text).sentiment.polarity
    return "Positive" if sentiment_score > 0 else "Negative" if sentiment_score < 0 else "Neutral"

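def generate_sentiment_summary(articles_data):
    """Generates summary insights based on sentiment analysis.

    Reads the company name from the module-level `company_name` variable,
    so it must be called after that variable is set.
    """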
    total_articles = len(articles_data)
    
    if total_articles == 0:
        return "No valid articles were processed for analysis."

    sentiment_counts = {"Positive": 0, "Negative": 0, "Neutral": 0}

    for article in articles_data:
        sentiment_counts[article["Sentiment"]] += 1

    # Calculate percentages
    positive_pct = round((sentiment_counts["Positive"] / total_articles) * 100, 2)
    negative_pct = round((sentiment_counts["Negative"] / total_articles) * 100, 2)
    neutral_pct = round((sentiment_counts["Neutral"] / total_articles) * 100, 2)

    summary = f"""
    Sentiment Analysis Summary for {company_name} News Coverage:
    - Positive Articles: {sentiment_counts['Positive']} ({positive_pct}%)
    - Negative Articles: {sentiment_counts['Negative']} ({negative_pct}%)
    - Neutral Articles: {sentiment_counts['Neutral']} ({neutral_pct}%)

    Insights:
    - The majority of news articles are {'positive' if positive_pct > negative_pct else 'negative' if negative_pct > positive_pct else 'neutral'}.
    - There is {abs(positive_pct - negative_pct)}% difference between positive and negative coverage.
    - This analysis suggests that media perception of {company_name} is {('generally favorable' if positive_pct > negative_pct else 'somewhat critical' if negative_pct > positive_pct else 'balanced')}.
    """
    return summary.strip()
    
    
if __name__ == "__main__":
    company_name = input("Enter company name: ")
    news_links = fetch_news_links_bing(company_name, max_articles=10)

    articles_data = []

    if news_links:
        for idx, link in enumerate(news_links):
            print(f"\nFetching article [{idx+1}]: {link}")
            time.sleep(1)
            
            title, article_text = extract_article_text(link)
            if title == "Error extracting content":
                continue  # Skip articles that failed to download or had no text
            
            summary = summarize_text(article_text)
            key_topics = extract_key_topics(summary)
            sentiment = analyze_sentiment(summary)
            
            articles_data.append({
                "Title": title,
                "Summary": summary,
                "Sentiment": sentiment,
                "Topics": key_topics
            })

    sentiment_summary = generate_sentiment_summary(articles_data)
    output_data = {
        "Company": company_name,
        "Articles": articles_data,
        "Final Sentiment Analysis": sentiment_summary
    }

    print("\n\nFinal Output:\n")
    print(output_data)
    
    english_text = sentiment_summary
    hindi_text = GoogleTranslator(source='en', target='hi').translate(english_text)

    print("Hindi Translation:", hindi_text)



    tts = gTTS(text=hindi_text, lang='hi')  
    tts.save("hindi_audio.mp3")  # Save the audio file  

    # Play the audio file (the `start` command is Windows-only)
    os.system("start hindi_audio.mp3") 



Fetching article [1]: https://www.timesnownews.com/auto/electric-vehicles/tesla-begins-certification-process-for-model-y-model-3-in-india-article-119120768

Fetching article [2]: https://www.msn.com/en-us/money/companies/tesla-investor-christopher-tsai-hopes-musk-s-doge-role-is-short-lived/ar-AA1AZsHA?ocid=BingNewsVerp

Fetching article [3]: https://www.indianweb2.com/2025/03/tesla-launching-2-e-car-models-in-india.html

Fetching article [4]: https://edition.cnn.com/2025/03/15/business/elon-musk-tesla-demonstrations-doge/index.html

Fetching article [5]: https://www.ndtv.com/auto/tesla-model-3-model-ys-homologation-process-initiated-in-india-7940909

Fetching article [6]: https://www.msn.com/en-in/entertainment/bollywood/as-tesla-tanks-musk-s-chosen-board-chair-stands-strong-as-a-badass-woman-in-business-world/ar-AA1B4QO5?ocid=BingNewsVerp

Fetching article [7]: https://www.reuters.com/investigations/tesla-tanks-musks-hand-picked-board-chair-is-doing-just-fine-2025-03-17/

Fetching ar
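
The percentage report in `generate_sentiment_summary` reads `company_name` from the global scope, and its triple-quoted template carries the source indentation into the output. One possible variant, sketched under the assumption that the company name is passed in explicitly, builds the report line by line so no template indentation leaks into the text. The name `build_sentiment_report` and its `company` parameter are not in the original, and the wording of the final line is shortened.

```python
from collections import Counter


def build_sentiment_report(articles, company):
    """Return a plain-text sentiment breakdown for a list of article dicts."""
    total = len(articles)
    if total == 0:
        return "No valid articles were processed for analysis."

    counts = Counter(article["Sentiment"] for article in articles)
    labels = ("Positive", "Negative", "Neutral")
    pct = {label: round(counts[label] / total * 100, 2) for label in labels}

    # Same three-way comparison as the original insight lines.
    if pct["Positive"] > pct["Negative"]:
        tone = "generally favorable"
    elif pct["Negative"] > pct["Positive"]:
        tone = "somewhat critical"
    else:
        tone = "balanced"

    lines = [f"Sentiment Analysis Summary for {company} News Coverage:"]
    lines += [f"- {label} Articles: {counts[label]} ({pct[label]}%)" for label in labels]
    lines.append(f"Media perception of {company} is {tone}.")
    return "\n".join(lines)


# Hypothetical input: three positive articles and one negative.
sample = [{"Sentiment": "Positive"}] * 3 + [{"Sentiment": "Negative"}]
report = build_sentiment_report(sample, "Tesla")
print(report)
```

Passing the company name as an argument also lets the function be called from another module or a test without first setting a global.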

In [23]:
english_text = sentiment_summary
hindi_text = GoogleTranslator(source='en', target='hi').translate(english_text)

print("Hindi Translation:", hindi_text)



tts = gTTS(text=hindi_text, lang='hi')  
tts.save("hindi_audio.mp3")  # Save the audio file  

# Play the audio file (the `start` command is Windows-only)
os.system("start hindi_audio.mp3") 

Hindi Translation: टेस्ला समाचार कवरेज के लिए भावना विश्लेषण सारांश:
    - सकारात्मक लेख: 8 (100.0%)
    - नकारात्मक लेख: 0 (0.0%)
    - तटस्थ लेख: 0 (0.0%)

    अंतर्दृष्टि:
    - अधिकांश समाचार लेख सकारात्मक हैं।
    - सकारात्मक और नकारात्मक कवरेज के बीच 100.0% अंतर है।
    - इस विश्लेषण से पता चलता है कि टेस्ला की मीडिया धारणा ** आम तौर पर अनुकूल है **।


0
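
The playback step relies on `start`, which exists only in the Windows command shell. A rough sketch of a portable alternative follows, assuming `open` is available on macOS and `xdg-open` on Linux desktops. `player_command` and `play_audio` are hypothetical helpers, not part of the notebook.

```python
import subprocess
import sys


def player_command(path, platform=sys.platform):
    """Return the command that opens `path` with the default application."""
    if platform.startswith("win"):
        # `start` is a cmd.exe builtin; the empty string is the window title.
        return ["cmd", "/c", "start", "", path]
    if platform == "darwin":
        return ["open", path]
    return ["xdg-open", path]  # assumed present on Linux desktops


def play_audio(path):
    """Open the audio file with whatever the OS associates with it."""
    return subprocess.run(player_command(path), check=False).returncode


# Only build the command here; calling play_audio would launch a player.
print(player_command("hindi_audio.mp3", platform="linux"))
```

The trailing `0` in the cell output is the exit status returned by `os.system`, which the notebook displays because the call is the last expression in the cell.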

In [None]:
import requests
from bs4 import BeautifulSoup
from newspaper import Article
import time
import os
import spacy
from dotenv import load_dotenv
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains.summarize import load_summarize_chain
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain_groq import ChatGroq
from langchain.schema import Document
from textblob import TextBlob  
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from deep_translator import GoogleTranslator
from gtts import gTTS
from urllib.parse import urljoin
from langchain_community.vectorstores import FAISS
from langchain.embeddings import SentenceTransformerEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain_core.prompts import MessagesPlaceholder
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain, create_history_aware_retriever
from langchain.prompts import ChatPromptTemplate
# Load environment variables
load_dotenv()
api_key = os.getenv("GROQ_API_KEY")

# Load NLP model for topic extraction
nlp = spacy.load("en_core_web_sm")

# Initialize LLM
llm = ChatGroq(groq_api_key=api_key, model="Gemma2-9b-It")

def fetch_news_links_bing(company, max_articles=10):
    """Fetch up to max_articles unique news article links from Bing News"""
    search_url = f"https://www.bing.com/news/search?q={company.replace(' ', '+')}"
    headers = {"User-Agent": "Mozilla/5.0"}

    response = requests.get(search_url, headers=headers)
    soup = BeautifulSoup(response.text, "html.parser")

    news_links = set()
    base_url = "https://www.bing.com"

    for link in soup.find_all("a", href=True):
        url = urljoin(base_url, link["href"])  # Convert to absolute URL
        if url.startswith("http") and "bing.com" not in url and url not in news_links:
            news_links.add(url)
        if len(news_links) >= max_articles:
            break

    if len(news_links) < max_articles:
        print(f"Warning: Only {len(news_links)} unique articles found.")
    
    return list(news_links)[:max_articles]

def extract_article_text(url):
    """Extract clean article text using newspaper3k"""
    try:
        article = Article(url)
        article.download()
        article.parse()

        if not article.text.strip():
            return "Error extracting content", "No content extracted."

        return article.title, article.text
    except Exception as e:
        return "Error extracting content", f"Error extracting content: {str(e)}"



# Function to answer queries over the supplied document text (passed in as a single string)
def process_query(uploaded_files, query):
    if uploaded_files:
        contents = uploaded_files
        
        # Split the text into chunks
        text_splitter = CharacterTextSplitter(
            separator="\n", chunk_size=800, chunk_overlap=200, length_function=len
        )
        split_texts = text_splitter.split_text(contents)

        embeddings = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
        db = FAISS.from_texts(split_texts, embeddings)

        retriever = db.as_retriever()
        model = ChatGroq(model="Gemma2-9b-It", groq_api_key=api_key)

        # Contextualization prompt
        contextualize_q_system_prompt = (
            "Given a chat history and the latest user question "
            "which might reference context in the chat history, "
            "formulate a standalone question which can be understood "
            "without the chat history. Do NOT answer the question, "
            "just reformulate it if needed and otherwise return it as is."
        )
        contextualize_q_prompt = ChatPromptTemplate.from_messages(
            [
                ("system", contextualize_q_system_prompt),
                MessagesPlaceholder(variable_name="chat_history"),
                ("human", "{input}"),
            ]
        )

        history_aware_retriever = create_history_aware_retriever(model, retriever, contextualize_q_prompt)

        # System prompt for answering questions
        system_prompt = (
            "You are an AI assistant that helps summarize and answer questions from documents.\n\n"
            "Context:\n{context}\n\n"
            "Chat History:\n{chat_history}\n\n"
            "User Question:\n{input}"
        )

        qa_prompt = ChatPromptTemplate.from_template(system_prompt)

        question_answer_chain = create_stuff_documents_chain(model, qa_prompt)
        rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)

        chat_history = []
        response = rag_chain.invoke({"input": query, "chat_history": chat_history})

        return response['answer']

def extract_key_topics(text):
    """Extracts key topics using spaCy NLP"""
    doc = nlp(text)
    topics = {ent.text for ent in doc.ents if ent.label_ in ["ORG", "GPE", "EVENT", "PERSON"]}
    return list(topics)

def summarize_text(article_text):
    """Summarizes the article content using LLM"""
    
    if not article_text.strip():
        return "Summary not available due to extraction failure."

    text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=100)
    docs = text_splitter.split_documents([Document(page_content=article_text)])

    prompt_template = PromptTemplate(
        input_variables=['text'],
        template="Summarize the following article:\n{text}"
    )

    summary_chain = LLMChain(llm=llm, prompt=prompt_template)

    summary_output = summary_chain.run({"text": docs[0].page_content})  # Summarizing first chunk
    return summary_output.strip() if summary_output else "Summary not generated."

def analyze_sentiment(text):
    """Performs sentiment analysis using TextBlob"""
    sentiment_score = TextBlob(text).sentiment.polarity
    return "Positive" if sentiment_score > 0 else "Negative" if sentiment_score < 0 else "Neutral"

def generate_sentiment_summary(articles_data):
    """Generates summary insights based on sentiment analysis"""
    total_articles = len(articles_data)
    
    if total_articles == 0:
        return "No valid articles were processed for analysis."

    sentiment_counts = {"Positive": 0, "Negative": 0, "Neutral": 0}

    for article in articles_data:
        sentiment_counts[article["Sentiment"]] += 1

    # Calculate percentages
    positive_pct = round((sentiment_counts["Positive"] / total_articles) * 100, 2)
    negative_pct = round((sentiment_counts["Negative"] / total_articles) * 100, 2)
    neutral_pct = round((sentiment_counts["Neutral"] / total_articles) * 100, 2)

    summary = f"""
    Sentiment Analysis Summary for {company_name} News Coverage:
    - Positive Articles: {sentiment_counts['Positive']} ({positive_pct}%)
    - Negative Articles: {sentiment_counts['Negative']} ({negative_pct}%)
    - Neutral Articles: {sentiment_counts['Neutral']} ({neutral_pct}%)

    Insights:
    - The majority of news articles are {'positive' if positive_pct > negative_pct else 'negative' if negative_pct > positive_pct else 'neutral'}.
    - Positive and negative coverage differ by {round(abs(positive_pct - negative_pct), 2)} percentage points.
    - This analysis suggests that media perception of {company_name} is {('generally favorable' if positive_pct > negative_pct else 'somewhat critical' if negative_pct > positive_pct else 'balanced')}.
    """
    return summary.strip()
    
    
if __name__ == "__main__":
    company_name = input("Enter company name: ")
    news_links = fetch_news_links_bing(company_name, max_articles=10)

    articles_data = []
    articals_text=[]
    texts=' '.join(articals_text)

    if news_links:
        for idx, link in enumerate(news_links):
            print(f"\nFetching article [{idx+1}]: {link}")
            time.sleep(1)
            
            title, article_text = extract_article_text(link)
            if article_text.startswith("Error"):
                continue  # Skip faulty articles
            articals_text.append(article_text)
            summary = summarize_text(article_text)
            key_topics = extract_key_topics(summary)
            sentiment = analyze_sentiment(summary)
            
            articles_data.append({
                "Title": title,
                "Summary": summary,
                "Sentiment": sentiment,
                "Topics": key_topics
            })

    sentiment_summary = generate_sentiment_summary(articles_data)
    output_data = {
        "Company": company_name,
        "Articles": articles_data,
        "Final Sentiment Analysis":sentiment_summary
        
    }
    
    
    

    print("\n\nFinal Output:\n")
    print(output_data)
    
    english_text = sentiment_summary
    hindi_text = GoogleTranslator(source='en', target='hi').translate(english_text)

    print("Hindi Translation:", hindi_text)



    tts = gTTS(text=hindi_text, lang='hi')  
    tts.save("hindi_audio.mp3")  # Save the audio file  

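    # Play the audio file (the `start` command is Windows-only)
    os.system("start hindi_audio.mp3")

    # Join the article texts here, after the loop has filled the list
    texts = ' '.join(articals_text)
    query = input("Please enter your query related to the article: ")
    query_process = process_query(texts, query)
    print(query_process)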
    
    
    



Fetching article [1]: https://www.livemint.com/global/the-reason-tesla-doesn-t-pay-taxes-11742180949384.html

Fetching article [2]: https://www.msn.com/en-in/autos/news/never-physically-hurt-anyone-elon-musk-reacts-to-tesla-coming-under-attack/ar-AA1B60Am?ocid=BingNewsVerp

Fetching article [3]: https://www.msn.com/en-in/autos/news/a-mechanical-engineer-on-twitter-claims-facebook-link-in-tesla-protests-elon-musk-responds/ar-AA1B6xlH?ocid=BingNewsVerp

Fetching article [4]: https://www.theverge.com/tesla/631308/mark-rober-tesla-youtube-autopilot-lidar-fake-claims

Fetching article [5]: https://www.financialexpress.com/auto/car-news/tesla-allegedly-exploiting-canadas-ev-rebate-program-all-you-need-to-know/3779345/

Fetching article [6]: https://www.ndtv.com/auto/tesla-model-3-model-ys-homologation-process-initiated-in-india-7940909

Fetching article [7]: https://www.msn.com/en-in/autos/news/elon-musk-confirms-tesla-s-entry-into-india-as-soon-as-humanly-possible/ar-AA1B6msj?ocid=BingNews

In [None]:
import requests
from bs4 import BeautifulSoup
from newspaper import Article
import time
import os
import spacy
from dotenv import load_dotenv
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain_groq import ChatGroq
from langchain.schema import Document
from textblob import TextBlob  
from deep_translator import GoogleTranslator
from gtts import gTTS
from urllib.parse import urljoin
from langchain_community.vectorstores import FAISS
from langchain.embeddings import SentenceTransformerEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain, create_history_aware_retriever
from langchain_core.prompts import MessagesPlaceholder

# Load environment variables
load_dotenv()
api_key = os.getenv("GROQ_API_KEY")

# Load NLP model for topic extraction
nlp = spacy.load("en_core_web_sm")

# Initialize LLM
llm = ChatGroq(groq_api_key=api_key, model="Gemma2-9b-It")

def fetch_news_links_bing(company, max_articles=10):
    """Fetch at least 5 unique news article links from Bing News"""
    search_url = f"https://www.bing.com/news/search?q={company.replace(' ', '+')}"
    headers = {"User-Agent": "Mozilla/5.0"}

    response = requests.get(search_url, headers=headers)
    soup = BeautifulSoup(response.text, "html.parser")

    news_links = set()
    base_url = "https://www.bing.com"

    for link in soup.find_all("a", href=True):
        url = urljoin(base_url, link["href"])  # Convert to absolute URL
        if url.startswith("http") and "bing.com" not in url and url not in news_links:
            news_links.add(url)
        if len(news_links) >= max_articles:
            break

    if len(news_links) < max_articles:
        print(f"Warning: Only {len(news_links)} unique articles found.")
    
    return list(news_links)[:max_articles]

def extract_article_text(url):
    """Extract clean article text using newspaper3k"""
    try:
        article = Article(url)
        article.download()
        article.parse()

        if not article.text.strip():
            return "Error extracting content", "No content extracted."

        return article.title, article.text
    except Exception as e:
        return "Error extracting content", f"Error extracting content: {str(e)}"

# Function to process query related to articles
def process_query(texts, query):
    # Split the text into chunks
    text_splitter = CharacterTextSplitter(separator="\n", chunk_size=800, chunk_overlap=200, length_function=len)
    split_texts = text_splitter.split_text(texts)

    embeddings = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
    db = FAISS.from_texts(split_texts, embeddings)

    retriever = db.as_retriever()
    model = ChatGroq(model="Gemma2-9b-It", groq_api_key=api_key)

    # Contextualization prompt
    contextualize_q_system_prompt = (
        "Given a chat history and the latest user question "
        "which might reference context in the chat history, "
        "formulate a standalone question which can be understood "
        "without the chat history. Do NOT answer the question, "
        "just reformulate it if needed and otherwise return it as is."
    )
    contextualize_q_prompt = ChatPromptTemplate.from_messages(
        [
            ("system", contextualize_q_system_prompt),
            MessagesPlaceholder(variable_name="chat_history"),
            ("human", "{input}"),
        ]
    )

    history_aware_retriever = create_history_aware_retriever(model, retriever, contextualize_q_prompt)

    # System prompt for answering questions
    system_prompt = (
        "You are an AI assistant that helps summarize and answer questions from documents.\n\n"
        "Context:\n{context}\n\n"
        "Chat History:\n{chat_history}\n\n"
        "User Question:\n{input}"
    )

    qa_prompt = ChatPromptTemplate.from_template(system_prompt)

    question_answer_chain = create_stuff_documents_chain(model, qa_prompt)
    rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)

    chat_history = []
    response = rag_chain.invoke({"input": query, "chat_history": chat_history})

    return response['answer']

def extract_key_topics(text):
    """Extracts key topics using spaCy NLP"""
    doc = nlp(text)
    topics = {ent.text for ent in doc.ents if ent.label_ in ["ORG", "GPE", "EVENT", "PERSON"]}
    return list(topics)

def summarize_text(article_text):
    """Summarizes the article content using LLM"""
    if not article_text.strip():
        return "Summary not available due to extraction failure."

    text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=100)
    docs = text_splitter.split_documents([Document(page_content=article_text)])

    prompt_template = PromptTemplate(
        input_variables=['text'],
        template="Summarize the following article:\n{text}"
    )

    summary_chain = LLMChain(llm=llm, prompt=prompt_template)

    summary_output = summary_chain.run({"text": docs[0].page_content})  # Summarizing first chunk
    return summary_output.strip() if summary_output else "Summary not generated."

def analyze_sentiment(text):
    """Performs sentiment analysis using TextBlob"""
    sentiment_score = TextBlob(text).sentiment.polarity
    return "Positive" if sentiment_score > 0 else "Negative" if sentiment_score < 0 else "Neutral"

def generate_sentiment_summary(articles_data):
    """Generates summary insights based on sentiment analysis"""
    total_articles = len(articles_data)
    
    if total_articles == 0:
        return "No valid articles were processed for analysis."

    sentiment_counts = {"Positive": 0, "Negative": 0, "Neutral": 0}

    for article in articles_data:
        sentiment_counts[article["Sentiment"]] += 1

    # Calculate percentages
    positive_pct = round((sentiment_counts["Positive"] / total_articles) * 100, 2)
    negative_pct = round((sentiment_counts["Negative"] / total_articles) * 100, 2)
    neutral_pct = round((sentiment_counts["Neutral"] / total_articles) * 100, 2)

    summary = f"""
    Sentiment Analysis Summary for {company_name} News Coverage:
    - Positive Articles: {sentiment_counts['Positive']} ({positive_pct}%)
    - Negative Articles: {sentiment_counts['Negative']} ({negative_pct}%)
    - Neutral Articles: {sentiment_counts['Neutral']} ({neutral_pct}%)

    Insights:
    - The majority of news articles are {'positive' if positive_pct > negative_pct else 'negative' if negative_pct > positive_pct else 'neutral'}.
    - Positive and negative coverage differ by {round(abs(positive_pct - negative_pct), 2)} percentage points.
    - This analysis suggests that media perception of {company_name} is {('generally favorable' if positive_pct > negative_pct else 'somewhat critical' if negative_pct > positive_pct else 'balanced')}.
    """
    return summary.strip()

if __name__ == "__main__":
    company_name = input("Enter company name: ")
    news_links = fetch_news_links_bing(company_name, max_articles=10)

    articles_data = []
    articals_text = []

    if news_links:
        for idx, link in enumerate(news_links):
            print(f"\nFetching article [{idx+1}]: {link}")
            time.sleep(1)
            
            title, article_text = extract_article_text(link)
            if article_text.startswith("Error"):
                continue  # Skip faulty articles
            articals_text.append(article_text)
            summary = summarize_text(article_text)
            key_topics = extract_key_topics(summary)
            sentiment = analyze_sentiment(summary)
            
            articles_data.append({
                "Title": title,
                "Summary": summary,
                "Sentiment": sentiment,
                "Topics": key_topics
            })

    sentiment_summary = generate_sentiment_summary(articles_data)
    output_data = {
        "Company": company_name,
        "Articles": articles_data,
        "Final Sentiment Analysis": sentiment_summary
    }

    print("\n\nFinal Output:\n")
    print(output_data)
    
    english_text = sentiment_summary
    hindi_text = GoogleTranslator(source='en', target='hi').translate(english_text)

    print("Hindi Translation:", hindi_text)

    tts = gTTS(text=hindi_text, lang='hi')  
    tts.save("hindi_audio.mp3")  # Save the audio file  

    # Play the audio file (the `start` command is Windows-only)
    os.system("start hindi_audio.mp3")

    query = input("Please enter your query related to the article: ")
    # Join the raw article texts; articles_data holds dicts, which str.join rejects
    query_process = process_query(' '.join(articals_text), query)
    print(query_process)



Fetching article [1]: https://www.livemint.com/global/the-reason-tesla-doesn-t-pay-taxes-11742180949384.html

Fetching article [2]: https://www.msn.com/en-in/autos/news/never-physically-hurt-anyone-elon-musk-reacts-to-tesla-coming-under-attack/ar-AA1B60Am?ocid=BingNewsVerp

Fetching article [3]: https://www.msn.com/en-in/autos/news/a-mechanical-engineer-on-twitter-claims-facebook-link-in-tesla-protests-elon-musk-responds/ar-AA1B6xlH?ocid=BingNewsVerp

Fetching article [4]: https://www.theverge.com/tesla/631308/mark-rober-tesla-youtube-autopilot-lidar-fake-claims

Fetching article [5]: https://www.financialexpress.com/auto/car-news/tesla-allegedly-exploiting-canadas-ev-rebate-program-all-you-need-to-know/3779345/

Fetching article [6]: https://www.ndtv.com/auto/tesla-model-3-model-ys-homologation-process-initiated-in-india-7940909

Fetching article [7]: https://www.msn.com/en-in/autos/news/elon-musk-confirms-tesla-s-entry-into-india-as-soon-as-humanly-possible/ar-AA1B6msj?ocid=BingNews

Created a chunk of size 950, which is longer than the specified 800


Here are the main points from the text:

* **YouTuber Mark Rober conducted a test comparing Tesla's camera-only Autopilot system to a vehicle equipped with lidar.** 
* **The lidar-equipped vehicle stopped before hitting a fake wall, while the Tesla Model Y crashed into it.**
* **Some criticism of Rober's video claims he faked the test, manipulated footage, and received payment from lidar company Luminar (which he denies).**
* **Rober released "raw footage" showing Autopilot was engaged before the crash, addressing claims it wasn't.**


The text also touches on Tesla's alleged involvement in fraud with Canada's EV rebate program, but this is not directly related to Rober's video. 
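
The "Created a chunk of size 950, which is longer than the specified 800" line in the output above is a warning from the character splitter used in process_query. That splitter cuts only at the separator ("\n" here), so one line longer than chunk_size is kept whole. The sketch below is a simplified pure-Python illustration of that behaviour, not LangChain's actual implementation. The function name split_on_separator is hypothetical, and chunk overlap is left out for brevity.

```python
def split_on_separator(text, separator="\n", chunk_size=800):
    """Greedily merge separator-delimited pieces into chunks.

    Pieces are never cut, so a single piece longer than chunk_size
    ends up as an oversized chunk of its own.
    """
    chunks = []
    current = ""
    for piece in text.split(separator):
        # Try to extend the current chunk with the next piece
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            # Flush what has been collected so far and start a new chunk
            if current:
                chunks.append(current)
            current = piece  # may itself exceed chunk_size
    if current:
        chunks.append(current)
    return chunks


# One 950-character line followed by a short line
text = "a" * 950 + "\n" + "b" * 100
chunks = split_on_separator(text, chunk_size=800)
print([len(c) for c in chunks])  # [950, 100]
```

If a hard upper bound on chunk size matters, one option is the RecursiveCharacterTextSplitter that the script already imports, since it falls back to finer separators when a piece is too long.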

