<h2>VALE3 - Twitter Sentiment Analysis</h2>

<p>To build a machine learning model that estimates whether VALE3 (Vale S.A.) stock is trending up or down based on tweets, you will need to follow several steps. Note, however, that social media sentiment is only one of many factors that can influence the stock market, so the results should be interpreted with caution. Here is a general outline of the process:</p>

<ol>
    <li><strong>Data Collection</strong>
        <ul>
            <li><strong>Twitter API</strong>: Use the Twitter API to collect tweets about VALE3. You will need a Twitter developer account to access the API.</li>
            <li><strong>Keywords</strong>: Define relevant search keywords, such as "VALE3", "Vale", "mercado de ações" (stock market), etc.</li>
        </ul>
    </li>
    <li><strong>Data Preprocessing</strong>
        <ul>
            <li><strong>Cleaning</strong>: Remove URLs, mentions, hashtags, and special characters from the tweets.</li>
            <li><strong>Tokenization</strong>: Split the text into words or phrases.</li>
            <li><strong>Normalization</strong>: Convert the text to lowercase, strip accents, etc.</li>
        </ul>
    </li>
    <li><strong>Sentiment Analysis</strong>
        <ul>
            <li><strong>NLP Models</strong>: Use natural language processing (NLP) models to score the sentiment of each tweet. Models such as BERT, TextBlob, or VADER can be useful.</li>
            <li><strong>Classification</strong>: Classify each tweet as positive, negative, or neutral.</li>
        </ul>
    </li>
    <li><strong>Analysis and Interpretation</strong>
        <ul>
            <li><strong>Trends</strong>: Analyze the ratio of positive to negative sentiment to estimate an upward or downward trend.</li>
            <li><strong>Context</strong>: Consider the context of each tweet. A negative tweet is sometimes not directly related to the stock's performance.</li>
        </ul>
    </li>
    <li><strong>Integration with Market Data</strong>
        <ul>
            <li><strong>Historical Data</strong>: Compare your sentiment analysis with the stock's historical price data to validate how well the model works.</li>
        </ul>
    </li>
    <li><strong>Ethical and Legal Considerations</strong>
        <ul>
            <li><strong>Privacy and Ethics</strong>: Make sure you follow ethical and legal guidelines when using social media data.</li>
            <li><strong>Limitations</strong>: Be aware of the limitations and uncertainties inherent in sentiment analysis and in the stock market.</li>
        </ul>
    </li>
</ol>
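<p>The preprocessing step (cleaning, normalization, tokenization) can be sketched with the Python standard library alone. This is a minimal illustration, not a definitive pipeline: the function names <code>strip_accents</code> and <code>preprocess_tweet</code> and the regular expressions are assumptions made for this example.</p>

```python
import re
import unicodedata

# Patterns for the elements the cleaning step removes
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
NON_WORD_RE = re.compile(r"[^a-z0-9\s]")


def strip_accents(text):
    """Remove diacritics, e.g. 'ação' becomes 'acao'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def preprocess_tweet(text):
    """Clean, normalize, and tokenize one tweet into lowercase words."""
    text = URL_RE.sub(" ", text)       # cleaning: drop URLs
    text = MENTION_RE.sub(" ", text)   # cleaning: drop @mentions
    text = HASHTAG_RE.sub(" ", text)   # cleaning: drop #hashtags
    text = strip_accents(text.lower())  # normalization
    text = NON_WORD_RE.sub(" ", text)  # cleaning: drop special characters
    return text.split()                # tokenization on whitespace


tokens = preprocess_tweet(
    "A ação #VALE3 subiu 2% hoje! Veja: https://example.com @investidor"
)
print(tokens)
```

<p>Dropping whole hashtags follows the outline above, but it also discards a ticker written as <code>#VALE3</code>; removing only the <code>#</code> character is a reasonable alternative if the tag text matters.</p>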

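<p>One simple way to compare sentiment with historical prices is to correlate each day's net sentiment with the following day's return. The sketch below assumes you already have one sentiment score per day and a list of closing prices; the helper names and the numbers are made up for illustration and are not real VALE3 data.</p>

```python
import math


def daily_returns(closes):
    """Simple day-over-day returns: (today - yesterday) / yesterday."""
    return [(curr - prev) / prev for prev, curr in zip(closes, closes[1:])]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined for a constant series; reported as 0 here
    return cov / math.sqrt(var_x * var_y)


# Synthetic values purely for illustration (not real market data)
net_sentiment = [0.4, -0.2, 0.6, 0.0]         # one score per day, days 0..3
closes = [100.0, 102.0, 101.0, 104.0, 104.0]  # closing prices, days 0..4

next_day_returns = daily_returns(closes)      # return from day t to day t+1
print(round(pearson(net_sentiment, next_day_returns), 3))
```

<p>A high correlation on a handful of days proves little; a longer history and an out-of-sample check are needed before treating sentiment as predictive.</p>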

<h2>Keyword Analysis for VALE3 Stock</h2>
<p>Analysis of the PDF document yielded several relevant pieces of information about VALE3 (Vale S.A.). The main keywords and themes identified below may be useful for collecting and analyzing tweets related to VALE3:</p>
<ul>
    <li><strong>Steel &amp; Mining</strong>: This theme recurs throughout and shows how central the mining and steel sector is to VALE3.</li>
    <li><strong>Price Performance</strong>: The document mentions VALE3's price performance several times, which suggests it is a topic of significant interest.</li>
    <li><strong>Peer Comparison</strong>: The document compares VALE3 with other companies in the sector, such as CSN and Gerdau. This can help in understanding VALE3's position in the market.</li>
    <li><strong>Iron Ore and Copper</strong>: The document discusses iron ore and copper prices, indicating how relevant these commodities are to VALE3.</li>
    <li><strong>Market Analysis and Investment</strong>: The document contains market analysis and investment recommendations, which may be a point of interest for sentiment analysis.</li>
    <li><strong>EBITDA and Financial Results</strong>: EBITDA and other financial results are mentioned frequently, reflecting the importance of these indicators in valuing the company.</li>
    <li><strong>Risks and Projections</strong>: The document covers risks and projections for VALE3, which may be a relevant topic for future analysis.</li>
</ul>
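<p>These themes can be turned into a single search query. The sketch below assumes the Twitter standard search operators <code>OR</code> and quoted exact phrases; the helper name <code>build_search_query</code> and the keyword list are hypothetical choices for this example.</p>

```python
def build_search_query(terms):
    """Join terms with OR, quoting multi-word phrases so they match exactly."""
    parts = []
    for term in terms:
        term = term.strip()
        if not term:
            continue  # skip empty entries
        parts.append(f'"{term}"' if " " in term else term)
    return " OR ".join(parts)


# Hypothetical keyword list drawn from the themes above (Portuguese search terms)
keywords = ["VALE3", "minério de ferro", "cobre", "EBITDA"]
query = build_search_query(keywords)
print(query)
```

<p>Broad terms such as "cobre" will match many tweets unrelated to Vale, so it may be worth pairing them with the ticker or filtering the results afterwards.</p>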






In [1]:
# Requires the third-party packages tweepy and textblob (for example: pip install tweepy textblob)
import tweepy
from textblob import TextBlob

# Authenticate with the Twitter API
consumer_key = 'YOUR_CONSUMER_KEY'
consumer_secret = 'YOUR_CONSUMER_SECRET'
access_token = 'YOUR_ACCESS_TOKEN'
access_token_secret = 'YOUR_ACCESS_TOKEN_SECRET'

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

api = tweepy.API(auth)

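# Classify a tweet's sentiment from its TextBlob polarity score.
# Note: TextBlob's default analyzer is built on an English lexicon, so
# Portuguese tweets may need translation or a Portuguese-capable model.
def analisar_sentimento(tweet):
    polarity = TextBlob(tweet.text).sentiment.polarity
    if polarity > 0:
        return 'Positive'
    elif polarity == 0:
        return 'Neutral'
    else:
        return 'Negative'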

# Collect tweets
tweets = api.search_tweets(q="VALE3", lang="pt", count=100)

# Classify the sentiment of each tweet
for tweet in tweets:
    print(f'Tweet: {tweet.text}\nSentiment: {analisar_sentimento(tweet)}\n')


ModuleNotFoundError: No module named 'tweepy'
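
<p>Per-tweet polarity scores, such as the values TextBlob returns in the range -1 to 1, can be aggregated into a rough up/down signal. The sketch below is one possible approach under stated assumptions: the function names, the <code>neutral_band</code> parameter, and the 0.2 threshold are illustrative choices, not values taken from the notebook.</p>

```python
def summarize_polarity(polarities, neutral_band=0.0):
    """Count positive/negative/neutral scores and compute a net score in [-1, 1]."""
    positive = sum(1 for p in polarities if p > neutral_band)
    negative = sum(1 for p in polarities if p < -neutral_band)
    neutral = len(polarities) - positive - negative
    opinionated = positive + negative
    # Net score ignores neutral tweets; 0.0 when no tweet expresses an opinion
    net = (positive - negative) / opinionated if opinionated else 0.0
    return {"positive": positive, "negative": negative, "neutral": neutral, "net": net}


def trend_signal(net, threshold=0.2):
    """Map a net score to 'up', 'down', or 'flat' using a symmetric threshold."""
    if net > threshold:
        return "up"
    if net < -threshold:
        return "down"
    return "flat"


# Example polarity scores (made up for illustration)
summary = summarize_polarity([0.5, 0.2, 0.0, -0.4, 0.1])
print(summary, trend_signal(summary["net"]))
```

<p>Widening <code>neutral_band</code> treats weakly scored tweets as neutral, which can reduce noise from tweets the analyzer barely recognizes.</p>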

In [None]:
import tweepy
import csv
import datetime
import pandas as pd

def twt_creds(kw_number):
    # One credential set per slot (0-3); replace the placeholders with real keys.
    creds = [
        dict(API_KEY='API_KEY', API_SECRET='API_SECRET',
             ACCESS_TOKEN='ACCESS_TOKEN', ACCESS_TOKEN_SECRET='ACCESS_TOKEN_SECRET'),
        dict(API_KEY='API_KEY', API_SECRET='API_SECRET',
             ACCESS_TOKEN='ACCESS_TOKEN', ACCESS_TOKEN_SECRET='ACCESS_TOKEN_SECRET'),
        dict(API_KEY='API_KEY', API_SECRET='API_SECRET',
             ACCESS_TOKEN='ACCESS_TOKEN', ACCESS_TOKEN_SECRET='ACCESS_TOKEN_SECRET'),
        dict(API_KEY='API_KEY', API_SECRET='API_SECRET',
             ACCESS_TOKEN='ACCESS_TOKEN', ACCESS_TOKEN_SECRET='ACCESS_TOKEN_SECRET'),
    ]
    # Fail with a clear message instead of an UnboundLocalError on an unknown slot
    if not 0 <= kw_number < len(creds):
        raise ValueError('kw_number must be between 0 and %d' % (len(creds) - 1))
    return creds[kw_number]




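def twitter_csvcreatefile_header(keyword):
    # Create (or overwrite) the CSV for this keyword and write the header row.
    # The _data/ directory must already exist.
    with open('_data/%s_tweets.csv' % keyword, 'w', newline='', encoding='utf-8') as file:
        w = csv.writer(file)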
        w.writerow(['contributors',
                    'coordinates',
                    'created_at',
                    'entities_hashtags',
                    'entities_symbols',
                    'entities_urls',
                    'entities_user_mentions',
                    'favorite_count',
                    'favorited',
                    'geo',
                    'id',
                    'id_str',
                    'in_reply_to_screen_name',
                    'in_reply_to_status_id',
                    'in_reply_to_status_id_str',
                    'in_reply_to_user_id_iso_language_code',
                    'in_reply_to_user_id_str_result_type',
                    'is_quote_status',
                    'lang',
                    'metadata_iso_language_code',
                    'metadata_result_type',
                    'place',
                    'retweet_count',
                    'retweeted',
                    'retweeted_status_contributors',
                    'retweeted_status_coordinates',
                    'retweeted_status_created_at',
                    'retweeted_status_entities',
                    'retweeted_status_favorite_count',
                    'retweeted_status_favorited',
                    'retweeted_status_geo',
                    'retweeted_status_id',
                    'retweeted_status_id_str',
                    'retweeted_status_in_reply_to_screen_name',
                    'retweeted_status_in_reply_to_status_id',
                    'retweeted_status_in_reply_to_status_id_str',
                    'retweeted_status_in_reply_to_user_id',
                    'retweeted_status_in_reply_to_user_id_str',
                    'retweeted_status_is_quote_status',
                    'retweeted_status_lang',
                    'retweeted_status_metadata',
                    'retweeted_status_place',
                    'retweeted_status_retweet_count',
                    'retweeted_status_retweeted',
                    'retweeted_status_source',
                    'retweeted_status_text',
                    'retweeted_status_truncated',
                    'retweeted_status_user',
                    'source',
                    'text',
                    'truncated',
                    'user_contributors_enabled',
                    'user_created_at',
                    'user_default_profile',
                    'user_default_profile_image',
                    'user_description',
                    'user_favourites_count',
                    'user_follow_request_sent',
                    'user_followers_count',
                    'user_following',
                    'user_friends_count',
                    'user_geo_enabled',
                    'user_has_extended_profile',
                    'user_id',
                    'user_id_str',
                    'user_is_translation_enabled',
                    'user_is_translator',
                    'user_lang',
                    'user_listed_count',
                    'user_location',
                    'user_name',
                    'user_notifications',
                    'user_profile_background_color',
                    'user_profile_background_image_url',
                    'user_profile_background_image_url_https',
                    'user_profile_background_tile',
                    'user_profile_banner_url',
                    'user_profile_image_url',
                    'user_profile_image_url_https',
                    'user_profile_link_color',
                    'user_profile_sidebar_border_color',
                    'user_profile_sidebar_fill_color',
                    'user_profile_text_color',
                    'user_profile_use_background_image',
                    'user_protected',
                    'user_screen_name',
                    'user_statuses_count',
                    'user_time_zone',
                    'user_translator_type',
                    'user_url',
                    'user_utc_offset',
                    'user_verified',
                    'time_crawled'
                    ])





def update_tweets(keyword):

    def if_empty(json_input):
        if json_input == '':
            return ''
        else:
            return json_input


    def json_check_keys(jsono):
        print(jsono.keys())

    jsono = ['contributors','coordinates','created_at','entities','favorite_count','favorited',
             'geo','id','id_str','in_reply_to_screen_name','in_reply_to_status_id',
             'in_reply_to_status_id_str','in_reply_to_user_id','in_reply_to_user_id_str',
             'is_quote_status','lang','metadata','place','retweet_count','retweeted',
             'retweeted_status','source','text','truncated','user']

    fields_with_subfields = ['entities','in_reply_to_user_id','in_reply_to_user_id_str',
                             'metadata','retweeted_status','user']

    subfields= {'entities':['hashtags','symbols','urls','user_mentions'],
                'in_reply_to_user_id':['iso_language_code'],
                'in_reply_to_user_id_str':['result_type'],
                'metadata':['iso_language_code','result_type'],
                'retweeted_status': ['contributors','coordinates','created_at','entities',
                                     'favorite_count','favorited','geo','id','id_str',
                                     'in_reply_to_screen_name','in_reply_to_status_id',
                                     'in_reply_to_status_id_str','in_reply_to_user_id',
                                     'in_reply_to_user_id_str','is_quote_status','lang',
                                     'metadata','place','retweet_count','retweeted',
                                     'source','text','truncated','user'],
                'user':['contributors_enabled','created_at','default_profile',
                        'default_profile_image','description','favourites_count',
                        'follow_request_sent','followers_count','following','friends_count',
                        'geo_enabled','has_extended_profile','id','id_str',
                        'is_translation_enabled','is_translator','lang','listed_count','location',
                        'name','notifications','profile_background_color',
                        'profile_background_image_url','profile_background_image_url_https',
                        'profile_background_tile','profile_banner_url','profile_image_url',
                        'profile_image_url_https','profile_link_color',
                        'profile_sidebar_border_color','profile_sidebar_fill_color',
                        'profile_text_color','profile_use_background_image','protected',
                        'screen_name','statuses_count','time_zone','translator_type','url',
                        'utc_offset','verified']}

    # Look up the credentials once and authenticate
    creds = twt_creds(0)
    auth = tweepy.OAuthHandler(creds['API_KEY'], creds['API_SECRET'])
    auth.set_access_token(creds['ACCESS_TOKEN'], creds['ACCESS_TOKEN_SECRET'])
    api = tweepy.API(auth)

    max_tweets = 2000
    print('Processing %s Tweets containing the term "%s": %s' % (max_tweets, keyword, datetime.datetime.now()))

    try:
        # Tweepy 4.x renamed API.search to API.search_tweets
        searched_tweets = list(tweepy.Cursor(api.search_tweets, q=keyword).items(max_tweets))

        with open('_data/%s_tweets.csv' % keyword, 'a', newline='', encoding='utf-8') as file:
            i=0
            w = csv.writer(file)
            for tweet in searched_tweets:
                i=i+1
                data_row=[]
                for field in jsono:
                    if field in tweet._json.keys():
                        if field in fields_with_subfields:
                            for subfield in subfields[field]:
                                try:
                                    data_row.append(tweet._json[field][subfield])
                                except (KeyError, TypeError):
                                    # Subfield missing, or the parent value is not a dict
                                    data_row.append('')
                        else:
                            data_row.append(if_empty(tweet._json[field]))

                    else:
                        data_row.append('')

                if 'retweeted_status' not in tweet._json.keys():
                    # One blank was appended above; pad the other 23 retweeted_status columns
                    data_row[25:25] = [''] * 23
                data_row.append(datetime.datetime.now())
                w.writerow(data_row)
        df = pd.read_csv('_data/%s_tweets.csv' % keyword)
        df['id'] = df['id'].apply(str)
        df.sort_values(['time_crawled'], ascending=False).drop_duplicates(['id'], keep='first').sort_values(['created_at'], ascending=False).to_csv('_data/%s_tweets.csv' % keyword, index=False)
        print('Done! %s Tweets processed: %s' % (i, datetime.datetime.now()))
    except Exception as e:
        # Report the actual error rather than assuming a timeout
        print('Failed to update tweets: %s' % e)


def get_tweets(keyword):
    def if_empty(json_input):
        if json_input == '':
            return ''
        else:
            return json_input


    def json_check_keys(jsono):
        print(jsono.keys())

    jsono = ['contributors','coordinates','created_at','entities','favorite_count','favorited',
             'geo','id','id_str','in_reply_to_screen_name','in_reply_to_status_id',
             'in_reply_to_status_id_str','in_reply_to_user_id','in_reply_to_user_id_str',
             'is_quote_status','lang','metadata','place','retweet_count','retweeted',
             'retweeted_status','source','text','truncated','user']

    fields_with_subfields = ['entities','in_reply_to_user_id','in_reply_to_user_id_str',
                             'metadata','retweeted_status','user']

    subfields= {'entities':['hashtags','symbols','urls','user_mentions'],
                'in_reply_to_user_id':['iso_language_code'],
                'in_reply_to_user_id_str':['result_type'],
                'metadata':['iso_language_code','result_type'],
                'retweeted_status': ['contributors','coordinates','created_at','entities',
                                     'favorite_count','favorited','geo','id','id_str',
                                     'in_reply_to_screen_name','in_reply_to_status_id',
                                     'in_reply_to_status_id_str','in_reply_to_user_id',
                                     'in_reply_to_user_id_str','is_quote_status','lang',
                                     'metadata','place','retweet_count','retweeted',
                                     'source','text','truncated','user'],
                'user':['contributors_enabled','created_at','default_profile',
                        'default_profile_image','description','favourites_count',
                        'follow_request_sent','followers_count','following','friends_count',
                        'geo_enabled','has_extended_profile','id','id_str',
                        'is_translation_enabled','is_translator','lang','listed_count','location',
                        'name','notifications','profile_background_color',
                        'profile_background_image_url','profile_background_image_url_https',
                        'profile_background_tile','profile_banner_url','profile_image_url',
                        'profile_image_url_https','profile_link_color',
                        'profile_sidebar_border_color','profile_sidebar_fill_color',
                        'profile_text_color','profile_use_background_image','protected',
                        'screen_name','statuses_count','time_zone','translator_type','url',
                        'utc_offset','verified']}

    # Look up the credentials once and authenticate
    creds = twt_creds(0)
    auth = tweepy.OAuthHandler(creds['API_KEY'], creds['API_SECRET'])
    auth.set_access_token(creds['ACCESS_TOKEN'], creds['ACCESS_TOKEN_SECRET'])
    api = tweepy.API(auth)

    max_tweets = 2000
    print('Processing %s Tweets containing the term "%s": %s' % (max_tweets, keyword, datetime.datetime.now()))

    # Tweepy 4.x renamed API.search to API.search_tweets
    searched_tweets = list(tweepy.Cursor(api.search_tweets, q=keyword).items(max_tweets))

    with open('_data/%s_tweets.csv' % keyword, 'a', newline='', encoding='utf-8') as file:
        i=0
        w = csv.writer(file)
        for tweet in searched_tweets:
            i=i+1
            data_row=[]
            for field in jsono:
                if field in tweet._json.keys():
                    if field in fields_with_subfields:
                        for subfield in subfields[field]:
                            try:
                                data_row.append(tweet._json[field][subfield])
                            except (KeyError, TypeError):
                                # Subfield missing, or the parent value is not a dict
                                data_row.append('')
                    else:
                        data_row.append(if_empty(tweet._json[field]))

                else:
                    data_row.append('')

            if 'retweeted_status' not in tweet._json.keys():
                # One blank was appended above; pad the other 23 retweeted_status columns
                data_row[25:25] = [''] * 23
            data_row.append(datetime.datetime.now())
            w.writerow(data_row)
    print('Done! %s Tweets processed: %s' % (i, datetime.datetime.now()))


def update_tweets_gui(keyword, kw_number):

    def if_empty(json_input):
        if json_input == '':
            return ''
        else:
            return json_input


    def json_check_keys(jsono):
        print(jsono.keys())

    jsono = ['contributors','coordinates','created_at','entities','favorite_count','favorited',
             'geo','id','id_str','in_reply_to_screen_name','in_reply_to_status_id',
             'in_reply_to_status_id_str','in_reply_to_user_id','in_reply_to_user_id_str',
             'is_quote_status','lang','metadata','place','retweet_count','retweeted',
             'retweeted_status','source','text','truncated','user']

    fields_with_subfields = ['entities','in_reply_to_user_id','in_reply_to_user_id_str',
                             'metadata','retweeted_status','user']

    subfields= {'entities':['hashtags','symbols','urls','user_mentions'],
                'in_reply_to_user_id':['iso_language_code'],
                'in_reply_to_user_id_str':['result_type'],
                'metadata':['iso_language_code','result_type'],
                'retweeted_status': ['contributors','coordinates','created_at','entities',
                                     'favorite_count','favorited','geo','id','id_str',
                                     'in_reply_to_screen_name','in_reply_to_status_id',
                                     'in_reply_to_status_id_str','in_reply_to_user_id',
                                     'in_reply_to_user_id_str','is_quote_status','lang',
                                     'metadata','place','retweet_count','retweeted',
                                     'source','text','truncated','user'],
                'user':['contributors_enabled','created_at','default_profile',
                        'default_profile_image','description','favourites_count',
                        'follow_request_sent','followers_count','following','friends_count',
                        'geo_enabled','has_extended_profile','id','id_str',
                        'is_translation_enabled','is_translator','lang','listed_count','location',
                        'name','notifications','profile_background_color',
                        'profile_background_image_url','profile_background_image_url_https',
                        'profile_background_tile','profile_banner_url','profile_image_url',
                        'profile_image_url_https','profile_link_color',
                        'profile_sidebar_border_color','profile_sidebar_fill_color',
                        'profile_text_color','profile_use_background_image','protected',
                        'screen_name','statuses_count','time_zone','translator_type','url',
                        'utc_offset','verified']}

    # Look up the credentials for this slot once and authenticate
    creds = twt_creds(kw_number)
    auth = tweepy.OAuthHandler(creds['API_KEY'], creds['API_SECRET'])
    auth.set_access_token(creds['ACCESS_TOKEN'], creds['ACCESS_TOKEN_SECRET'])
    api = tweepy.API(auth)

    max_tweets = 2000
    print('Processing %s Tweets containing the term "%s": %s' % (max_tweets, keyword, datetime.datetime.now()))

    try:
        # Tweepy 4.x renamed API.search to API.search_tweets
        searched_tweets = list(tweepy.Cursor(api.search_tweets, q=keyword).items(max_tweets))

        with open('_data/%s_tweets.csv' % keyword, 'a', newline='', encoding='utf-8') as file:
            i=0
            w = csv.writer(file)
            for tweet in searched_tweets:
                i=i+1
                data_row=[]
                for field in jsono:
                    if field in tweet._json.keys():
                        if field in fields_with_subfields:
                            for subfield in subfields[field]:
                                try:
                                    data_row.append(tweet._json[field][subfield])
                                except (KeyError, TypeError):
                                    # Subfield missing, or the parent value is not a dict
                                    data_row.append('')
                        else:
                            data_row.append(if_empty(tweet._json[field]))

                    else:
                        data_row.append('')

                if 'retweeted_status' not in tweet._json.keys():
                    # One blank was appended above; pad the other 23 retweeted_status columns
                    data_row[25:25] = [''] * 23
                data_row.append(datetime.datetime.now())
                w.writerow(data_row)
        df = pd.read_csv('_data/%s_tweets.csv' % keyword)
        df['id'] = df['id'].apply(str)
        df.sort_values(['time_crawled'], ascending=False).drop_duplicates(['id'], keep='first').sort_values(['created_at'], ascending=False).to_csv('_data/%s_tweets.csv' % keyword, index=False)
        print('Done! %s Tweets processed: %s' % (i, datetime.datetime.now()))
    except Exception as e:
        print(e)
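
<p>The three functions above repeat the same row-flattening loop and pad missing <code>retweeted_status</code> columns by position. A shared helper could build a fixed-width row directly from the field and subfield lists. The sketch below is one possible refactoring, not the notebook's own code: <code>flatten_tweet</code> is a hypothetical name, and the small field lists are trimmed for illustration.</p>

```python
def flatten_tweet(tweet_json, fields, subfields):
    """Flatten one tweet dict into a fixed-width row; missing values become ''."""
    row = []
    for field in fields:
        value = tweet_json.get(field)
        if field in subfields:
            # Always emit one column per subfield, even if the parent is absent
            nested = value if isinstance(value, dict) else {}
            row.extend(nested.get(sub, '') for sub in subfields[field])
        else:
            row.append('' if value is None else value)
    return row


# Trimmed-down field lists, for illustration only
fields = ['id', 'text', 'user', 'retweeted_status']
subfields = {'user': ['screen_name', 'followers_count'],
             'retweeted_status': ['id', 'text']}

tweet = {'id': 1, 'text': 'VALE3 em alta',
         'user': {'screen_name': 'trader', 'followers_count': 10}}
print(flatten_tweet(tweet, fields, subfields))
```

<p>Because every field with subfields always contributes the same number of columns, the row width no longer depends on hard-coded insert positions, and the header could be generated from the same two structures.</p>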