# **Web Scraping of Stock Market News for Sentiment Analysis**
### *Implementation of a Buy/Sell Recommendation Model for Stocks in the Financial Market Based on Sentiment Analysis*

[Robert Garcia Rey](https://www.notion.so/Robert-Garcia-Rey-Data-Analyst-6d7b578d2bf848d585dc9d1a97b1036c?pvs=4)
- garcia.robert.0514@eam.edu.co
- https://www.linkedin.com/in/robert-garcia-rey/

### 1. Introduction

The goal is to collect stock market news articles published between 2019-10-21 and 2023-11-09 from Investing.com by web scraping. Each listing page is requested with `Request` and `urlopen` from Python's ***urllib.request***, sending a browser-like `User-Agent` header, and the returned HTML is parsed with ***Beautiful Soup*** to extract the article links.

### 2. Import libraries

In [1]:
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np
import yfinance as yf
import time
from newspaper import Article
from htmldate import find_date
import warnings
warnings.filterwarnings('ignore')

### 3. Data collection

##### *Scraping Investing.com to build the news dataset*

In [2]:
def get_newslinks(company, page_number):
    """Scrapes article URLs for a given company and page number from a news website.

    :param company: name of the company to scrape articles for
    :param page_number: page number on the news website to iterate over

    :return: list of article URLs
    """
    url = f"https://www.investing.com/equities/{company}-news/{page_number}"
    
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
    }

    req = Request(url, headers=headers)
    html_content = urlopen(req).read()

    soup = BeautifulSoup(html_content, "html.parser")
    articles = soup.find_all('article')

    cleaned_links = []
    
    for article in articles:
        links = article.find_all('a')
        for link in links:
            partial_link = link.get('href')
            if partial_link is not None:  # Skip anchors that have no href
                if 'https' in partial_link:
                    cleaned_links.append(partial_link)
                elif partial_link.startswith('/'):
                    cleaned_links.append('https://www.investing.com' + partial_link)

    return list(set(cleaned_links))
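
The collected links can include variants such as a trailing `#comments` fragment, which point to the same article as the fragment-free URL. A minimal sketch of one way to normalize hrefs before deduplicating, using only the standard library's `urljoin` and `urldefrag`. The helper name `normalize_links`, its `base_url` default, and the sample hrefs are assumptions for illustration and are not part of the original notebook.

```python
from urllib.parse import urljoin, urldefrag


def normalize_links(hrefs, base_url="https://www.investing.com"):
    """Resolve hrefs against base_url, drop fragments, keep first-seen order.

    Hypothetical helper, not part of the original notebook.
    """
    seen = {}
    for href in hrefs:
        if not href:
            continue  # skip None and empty strings
        absolute = urljoin(base_url, href)      # '/news/...' becomes an absolute URL
        clean, _fragment = urldefrag(absolute)  # strips '#comments' and similar
        if clean.startswith("https://"):
            seen[clean] = None                  # dict preserves insertion order
    return list(seen)


# Made-up hrefs that mimic the shapes found on a listing page
sample_hrefs = [
    "/news/economy/example-1",
    "/news/economy/example-1#comments",
    "https://www.investing.com/news/example-2",
    None,
    "javascript:void(0)",
]
print(normalize_links(sample_hrefs))
```

Unlike `list(set(...))`, the dictionary keeps the links in the order they were found, which makes the output reproducible between runs.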

In [3]:
all_company_urls = []
for page in range(1, 119):
    results = get_newslinks('apple-computer-inc', page)
    all_company_urls.extend(results)
    time.sleep(5)  # Wait 5 seconds between requests
all_company_urls

['https://www.investing.com/news/stock-market-news/apple-and-amazon-shares-surge-amid-tech-market-challenges-93CH-3237875',
 'https://www.investing.com/news/stock-market-news/analysisus-consumer-watchdog-hands-wall-street-rare-win-with-big-tech-crackdown-3237513',
 'https://www.investing.com/news/economy/gaps-holiday-season-outlook-alibaba-slips-in-hong-kong--whats-moving-markets-3237401#comments',
 'https://www.investing.com/news/stock-market-news/apple-to-support-rcs-messaging-boosting-android-interoperability-93CH-3237183',
 'https://www.investing.com/news/stock-market-news/apples-stock-performance-stalls-as-tech-rivals-advance-93CH-3237374',
 'https://www.investing.com/news/stock-market-news/apple-files-legal-challenge-to-eus-digital-markets-act-3237883',
 'https://www.investing.com/news/stock-market-news/wedbush-expects-short-covering-for-the-ages-says-new-tech-bull-market-has-now-begun-3238057',
 'https://www.investing.com/news/stock-market-news/apple-to-support-rcs-messaging-boo
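
Because `get_newslinks` removes duplicates only within a single page, an article that appears on more than one listing page can end up in `all_company_urls` several times. The sketch below shows one possible way to merge pages without duplicates while keeping the delay between requests. The function `collect_links` and the stand-in `fake_pages` data are hypothetical; in the notebook, `fetch_page` would be something like `lambda page: get_newslinks('apple-computer-inc', page)`.

```python
import time


def collect_links(fetch_page, pages, delay_seconds=5.0):
    """Call fetch_page(page) for every page and merge results without duplicates.

    Hypothetical helper: fetch_page is any callable returning a list of URLs.
    """
    unique = {}
    for page in pages:
        for url in fetch_page(page):
            unique.setdefault(url, None)  # keep only the first occurrence
        time.sleep(delay_seconds)         # be polite to the server
    return list(unique)


# Stand-in data instead of real network calls
fake_pages = {1: ["url-a", "url-b"], 2: ["url-b", "url-c"]}
links = collect_links(lambda page: fake_pages[page], pages=[1, 2], delay_seconds=0)
print(links)
```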

In [4]:
# Save URLs to a text file

with open('apple_article_investing.txt', 'w') as f:
    for link in all_company_urls:
        f.write("%s\n" % link)

In [5]:
ticker = 'AAPL'
columns = ['ticker', 'publish_date', 'title', 'body_text', 'url']
rows = []

# Loop over all the articles, keeping one row per successfully parsed article
for link in all_company_urls:
    article = Article(link)
    article.download()

    try:
        article.parse()
    except Exception as e:
        print(f"Error processing URL {link}: {str(e)}")
        continue

    #sid = SentimentIntensityAnalyzer()
    #polarity = sid.polarity_scores(article.text)

    tmpdic = {'ticker': ticker, 'publish_date': find_date(link), 'title': article.title, 'body_text': article.text, 'url': link}
    #tmpdic.update(polarity)
    rows.append(tmpdic)

# Build the DataFrame once instead of concatenating inside the loop
article_sentiments = pd.DataFrame(rows, columns=columns)

Error processing URL https://www.investing.com/news/economy/factboxhow-the-eus-digital-markets-act-challenges-big-tech-3168869: Article `download()` failed with 502 Server Error: Bad Gateway for url: https://www.investing.com/news/economy/factboxhow-the-eus-digital-markets-act-challenges-big-tech-3168869 on URL https://www.investing.com/news/economy/factboxhow-the-eus-digital-markets-act-challenges-big-tech-3168869
Error processing URL https://www.investing.com/news/stock-market-news/sp-500-falls-on-fresh-inflation-jitters-apple-slump-3168963#comments: Article `download()` failed with 502 Server Error: Bad Gateway for url: https://www.investing.com/news/stock-market-news/sp-500-falls-on-fresh-inflation-jitters-apple-slump-3168963#comments on URL https://www.investing.com/news/stock-market-news/sp-500-falls-on-fresh-inflation-jitters-apple-slump-3168963#comments
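
The failures logged above are `502 Bad Gateway` responses, which are often transient. One option, sketched here under that assumption, is to retry a failed step a few times with an increasing delay before giving up. The helper `with_retries` and the `flaky` stand-in are hypothetical; in the notebook, the wrapped callable would perform `article.download()` followed by `article.parse()` for a fresh `Article`.

```python
import time


def with_retries(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); on an exception wait base_delay * 2**n and try again.

    Hypothetical helper. The last failure is re-raised to the caller.
    """
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # exponential backoff


# Stand-in that fails twice and then succeeds, instead of a real download
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("502 Server Error: Bad Gateway")
    return "ok"


waits = []  # record the delays instead of actually sleeping
result = with_retries(flaky, attempts=3, base_delay=1.0, sleep=waits.append)
print(result, waits)
```

Passing `sleep` as a parameter keeps the example fast to run; with the default `time.sleep` the same call would pause between attempts.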


Investing.com dataset

In [6]:
article_sentiments

| | ticker | publish_date | title | body_text | url |
|---|---|---|---|---|---|
| 0 | AAPL | 2023-11-17 | Apple and Amazon shares surge amid tech market... | Published Nov 17, 2023 09:29AM ET\n\n© Reuters... | https://www.investing.com/news/stock-market-ne... |
| 1 | AAPL | 2023-11-17 | Analysis-US consumer watchdog hands Wall Stree... | Published Nov 17, 2023 06:02AM ET Updated Nov ... | https://www.investing.com/news/stock-market-ne... |
| 2 | AAPL | 2023-11-17 | Gap's holiday season outlook, Alibaba slips in... | Published Nov 17, 2023 04:43AM ET\n\n© Reuters... | https://www.investing.com/news/economy/gaps-ho... |
| 3 | AAPL | 2023-11-17 | Apple to support RCS messaging, boosting Andro... | Published Nov 16, 2023 10:27PM ET Updated Nov ... | https://www.investing.com/news/stock-market-ne... |
| 4 | AAPL | 2023-11-17 | Apple's stock performance stalls as tech rival... | Published Nov 17, 2023 03:51AM ET\n\n© Reuters... | https://www.investing.com/news/stock-market-ne... |
| ... | ... | ... | ... | ... | ... |
| 1422 | AAPL | 2023-07-11 | Fed officials discuss rates, Amazon Prime Day ... | Published Jul 11, 2023 05:45AM ET\n\n© Reuters... | https://www.investing.com/news/economy/fed-off... |
| 1423 | AAPL | 2023-07-10 | EU seals new US data transfer pact, but challe... | Published Jul 10, 2023 09:58AM ET Updated Jul ... | https://www.investing.com/news/stock-market-ne... |
| 1424 | AAPL | 2023-07-07 | Why a Bitcoin ETF approval would be a big deal... | Published Jul 07, 2023 08:00AM ET Updated Jul ... | https://www.investing.com/news/cryptocurrency-... |
| 1425 | AAPL | 2023-07-10 | Market heavyweights dip ahead of Nasdaq 100 re... | Published Jul 10, 2023 06:12PM ET\n\n© Reuters... | https://www.investing.com/news/stock-market-ne... |


In [9]:
article_sentiments['publish_date'].min(), article_sentiments['publish_date'].max()

('2023-07-07', '2023-11-17')

In [13]:
article_sentiments['url'].iloc[1351]

'https://www.investing.com/news/stock-market-news/apple-tests-generative-ai-tools-to-rival-openais-chatgpt--bloomberg-news-3129115'

In [14]:
article_sentiments['body_text'].iloc[1351]

'Published Jul 19, 2023 12:46PM ET Updated Jul 19, 2023 02:06PM ET\n\n© Reuters. FILE PHOTO: The Apple Inc. logo is seen hanging at the entrance to the Apple store on 5th Avenue in Manhattan, New York, U.S., October 16, 2019. REUTERS/Mike Segar/File Photo\n\nMSFT -1.68% Add to/Remove from Watchlist GOOG -1.27% Add to/Remove from Watchlist NVDA -0.37% Add to/Remove from Watchlist AAPL -0.01% Add to/Remove from Watchlist GOOGL -1.18% Add to/Remove from Watchlist\n\n(Reuters) -Apple is working on artificial intelligence (AI) offerings similar to OpenAI\'s ChatGPT and Google\'s Bard, Bloomberg News reported on Wednesday, sending its shares up as much as 2% to a record high.\n\nThe iPhone maker has built its own framework, known as "Ajax", to create large language models (LLMs) and is also testing a chatbot that some engineers call "Apple (NASDAQ: ) GPT", the report said, citing people with knowledge of the matter.\n\nThe company did not respond to a Reuters request for comment.\n\nApple ha
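
As the sample above shows, each `body_text` starts with a header of the form `Published <date> <time> ET`. A minimal sketch of extracting that timestamp with a regular expression, assuming the header always has this shape. The names `PUBLISHED_RE` and `parse_published` are hypothetical, and the returned value is a naive `datetime` in Eastern Time with no timezone attached.

```python
import re
from datetime import datetime

# Matches e.g. "Published Jul 19, 2023 12:46PM ET" at the start of the text
PUBLISHED_RE = re.compile(r"Published (\w{3} \d{2}, \d{4} \d{2}:\d{2}[AP]M) ET")


def parse_published(body_text):
    """Return the publication time as a naive datetime, or None if absent."""
    match = PUBLISHED_RE.match(body_text)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%b %d, %Y %I:%M%p")


sample = "Published Jul 19, 2023 12:46PM ET Updated Jul 19, 2023 02:06PM ET\n\n© Reuters."
print(parse_published(sample))
```

Parsing the header keeps the time of day, which a date-only value such as the one stored in `publish_date` does not.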

##### *Downloading Apple (AAPL) stock prices from Yahoo Finance to build the price dataset*

In [16]:
df_aapl_price = yf.download("AAPL", start="2023-07-07", end="2023-11-17")
df_aapl_price

[*********************100%***********************]  1 of 1 completed


| Date | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|
| 2023-07-07 | 191.410004 | 192.669998 | 190.240005 | 190.679993 | 190.172302 | 46778000 |
| 2023-07-10 | 189.259995 | 189.990005 | 187.039993 | 188.610001 | 188.107834 | 59922200 |
| 2023-07-11 | 189.160004 | 189.300003 | 186.600006 | 188.080002 | 187.579254 | 46638100 |
| 2023-07-12 | 189.679993 | 191.699997 | 188.470001 | 189.770004 | 189.264740 | 60750200 |
| 2023-07-13 | 190.500000 | 191.190002 | 189.779999 | 190.539993 | 190.032669 | 41342300 |
| ... | ... | ... | ... | ... | ... | ... |
| 2023-11-10 | 183.970001 | 186.570007 | 183.529999 | 186.399994 | 186.399994 | 66133400 |
| 2023-11-13 | 185.820007 | 186.029999 | 184.210007 | 184.800003 | 184.800003 | 43627500 |
| 2023-11-14 | 187.699997 | 188.110001 | 186.300003 | 187.440002 | 187.440002 | 60108400 |
| 2023-11-15 | 187.850006 | 189.500000 | 187.779999 | 188.009995 | 188.009995 | 53790500 |


### 4. Save both DataFrames

In [17]:
article_sentiments.to_csv("../data/raw/apple_article_investing.csv", sep=',', encoding='utf-8', header=True)

print("DataFrame saved as 'apple_article_investing.csv'")

DataFrame saved as 'apple_article_investing.csv'


In [18]:
# Save the DataFrame to a CSV file
df_aapl_price.to_csv('../data/raw/apple_price_yfinance.csv')

print("DataFrame saved as 'apple_price_yfinance.csv'")

DataFrame saved as 'apple_price_yfinance.csv'


In [21]:
article_sentiments.to_pickle("../data/raw/apple_article_investing.pkl")
df_aapl_price.to_pickle("../data/raw/apple_price_yfinance.pkl")
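
Both `to_csv` calls above write the DataFrame index as an unnamed first column, so reading the file back without further arguments yields an extra `Unnamed: 0` column. The sketch below, using a small made-up DataFrame and an in-memory buffer in place of the real files, shows one way to restore the original shape with `index_col=0` and to parse the date column on load.

```python
import io

import pandas as pd

# Small made-up frame standing in for article_sentiments
df = pd.DataFrame({
    "ticker": ["AAPL", "AAPL"],
    "publish_date": ["2023-11-17", "2023-07-10"],
})

buffer = io.StringIO()
df.to_csv(buffer)  # the index is written as an unnamed first column

buffer.seek(0)
naive = pd.read_csv(buffer)  # index comes back as 'Unnamed: 0'

buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0, parse_dates=["publish_date"])

print(list(naive.columns))
print(list(restored.columns))
print(restored["publish_date"].dtype)
```

The pickle files avoid this issue because they store the index and column types as they are, at the cost of being readable only from Python.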