# Data Collection

#### We collect tweets about **Meta Platforms** (ticker: `META`) using the StockTwits API.

### Set up the environment

In [1]:
#!pip install "pymongo[srv]"==3.11
#!pip install pymongo --upgrade
#!pip install openai

### Import libs

In [2]:
#### COMMON ####
import pandas as pd
import numpy as np

import json
import requests
from datetime import datetime

### MONGO DB ####
from pymongo.mongo_client import MongoClient

### Import utils function ###
from utils import test_connection_db, collect_tweets, insert_tweets, get_tweets_from_db, URI, DB_NAME

### STEP 1: Load `META` tweets from StockTwits

In [3]:
df_meta_tweets = collect_tweets(ticker="META", nb_url=10)

STARTING TO COLLECT TWEET META...


COLLECTNG FROM:... 
1: https://api.stocktwits.com/api/2/streams/symbol/META.json
COLLECTNG FROM:... 
2: https://api.stocktwits.com/api/2/streams/symbol/META.json?max=565501459
COLLECTNG FROM:... 
3: https://api.stocktwits.com/api/2/streams/symbol/META.json?max=565490721
COLLECTNG FROM:... 
4: https://api.stocktwits.com/api/2/streams/symbol/META.json?max=565466496
COLLECTNG FROM:... 
5: https://api.stocktwits.com/api/2/streams/symbol/META.json?max=565457223
COLLECTNG FROM:... 
6: https://api.stocktwits.com/api/2/streams/symbol/META.json?max=565444544
COLLECTNG FROM:... 
7: https://api.stocktwits.com/api/2/streams/symbol/META.json?max=565434112
COLLECTNG FROM:... 
8: https://api.stocktwits.com/api/2/streams/symbol/META.json?max=565415066
COLLECTNG FROM:... 
9: https://api.stocktwits.com/api/2/streams/symbol/META.json?max=565395690
COLLECTNG FROM:... 
10: https://api.stocktwits.com/api/2/streams/symbol/META.json?max=565378771

112 TWEETS ARE SUCCESFULLY
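
The log above shows every request after the first carrying a `max` parameter, which suggests that the collector pages backwards through the stream by message ID. The real `collect_tweets` lives in `utils.py` and is not shown here, so the following is only a minimal sketch of that idea. The names `collect_tweets_sketch` and `fetch_json` are hypothetical, and the JSON field names (`messages`, `id`, `created_at`, `body`, `entities.sentiment.basic`) are assumptions about the API response.

```python
import json
from typing import Callable, Dict, List
from urllib.request import urlopen

BASE_URL = "https://api.stocktwits.com/api/2/streams/symbol/{ticker}.json"


def fetch_json(url: str) -> dict:
    """Download one page and decode it as JSON."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def collect_tweets_sketch(
    ticker: str,
    nb_url: int,
    fetch: Callable[[str], dict] = fetch_json,
) -> List[Dict[str, str]]:
    """Collect sentiment-tagged messages from up to `nb_url` pages."""
    base = BASE_URL.format(ticker=ticker)
    url = base
    seen_ids = set()
    rows: List[Dict[str, str]] = []
    for _ in range(nb_url):
        messages = fetch(url).get("messages", [])
        if not messages:
            break
        for message in messages:
            # Skip a message already seen on the previous page, in case the
            # API treats `max` as an inclusive bound (assumption).
            if message["id"] in seen_ids:
                continue
            seen_ids.add(message["id"])
            # Assumed layout: `sentiment` is null when the author set no tag.
            sentiment = (message.get("entities") or {}).get("sentiment") or {}
            label = sentiment.get("basic")
            if label:
                rows.append(
                    {
                        "date": message["created_at"],
                        "content": message["body"],
                        "true_sentiment": label.lower(),
                    }
                )
        # Ask the next page for messages no newer than the oldest one seen.
        oldest_id = min(message["id"] for message in messages)
        url = f"{base}?max={oldest_id}"
    return rows
```

Passing the result to `pd.DataFrame(rows)` would give a frame with the same three columns as the one displayed in this notebook. Injecting `fetch` keeps the paging logic testable without network access.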

In [4]:
df_meta_tweets

Unnamed: 0,date,content,true_sentiment
6,2024-03-12T20:39:18Z,$SPY $NVDA $META $ORCL Only sad BEARS on here ...,bullish
7,2024-03-12T20:35:09Z,$META 510 Tomorrow Bears Was that over the top...,bullish
15,2024-03-12T19:46:15Z,$PSTG $NTAP $META that’s a big hit to PSTG pro...,bearish
18,2024-03-12T19:37:31Z,$ATNF looks primed for a big Squeeze 🔥 \n \nW...,bullish
19,2024-03-12T19:36:29Z,$META it’s over. Thx for playing,bearish
...,...,...,...
289,2024-03-12T00:23:08Z,$META &quot;the enemy of the people&quot;,bearish
290,2024-03-12T00:20:07Z,$META we don’t need any one to tell us who is ...,bullish
292,2024-03-12T00:08:53Z,$META I see downtrend forming but the oversold...,bullish
296,2024-03-11T23:49:38Z,$META so so OVERSOLD Will be UP like a yoyo,bullish
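
The preview shows two details worth handling before any modelling: the `date` column holds ISO 8601 strings in UTC, and some `content` values still contain HTML entities such as `&quot;`. The snippet below is a small sketch of how both could be normalised with the standard library. The helper names `parse_stocktwits_date` and `clean_content` are hypothetical and not part of `utils.py`.

```python
import html
from datetime import datetime, timezone


def parse_stocktwits_date(value: str) -> datetime:
    """Parse a timestamp such as '2024-03-12T00:23:08Z' into an aware UTC datetime."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def clean_content(text: str) -> str:
    """Replace HTML entities (e.g. &quot;) with the characters they stand for."""
    return html.unescape(text)


# Values copied from one row of the preview above.
created = parse_stocktwits_date("2024-03-12T00:23:08Z")
content = clean_content("$META &quot;the enemy of the people&quot;")
```

On the DataFrame itself, `pd.to_datetime(df_meta_tweets["date"], utc=True)` and `df_meta_tweets["content"].map(html.unescape)` should achieve the same result column-wise.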


#### NOTE:

* The sentiment tag attached to each StockTwits message (`bullish` or `bearish`) serves as the ground-truth label for our data.
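
Because these tags become the training labels, it can help to check how balanced the two classes are and to fix a numeric encoding early. The sketch below does this for the nine rows visible in the preview only, not for the full 112-row set. The `LABEL_TO_ID` mapping is an illustrative assumption, not something defined elsewhere in this notebook.

```python
from collections import Counter

# Labels of the nine rows visible in the preview, in display order.
visible_labels = [
    "bullish", "bullish", "bearish", "bullish", "bearish",
    "bearish", "bullish", "bullish", "bullish",
]

# Class counts and relative shares.
counts = Counter(visible_labels)
shares = {label: n / len(visible_labels) for label, n in counts.items()}

# Hypothetical numeric encoding for a binary classifier.
LABEL_TO_ID = {"bearish": 0, "bullish": 1}
encoded = [LABEL_TO_ID[label] for label in visible_labels]
```

For the whole dataset, `df_meta_tweets["true_sentiment"].value_counts()` gives the same kind of summary.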

### Store the collected tweets in the database

##### Test the connection to the MongoDB cluster

In [5]:
test_connection_db(URI)

Pinged your deployment. You successfully connected to MongoDB!
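
The helper `test_connection_db` is defined in `utils.py` and not shown here. Its output suggests that it sends a `ping` command to the deployment, so a connection check along those lines could look like the sketch below. The name `ping_cluster` and the `client_factory` parameter are hypothetical; the factory exists only so that the function can be exercised without a live cluster.

```python
from typing import Any, Callable, Optional


def ping_cluster(
    uri: str,
    client_factory: Optional[Callable[..., Any]] = None,
    timeout_ms: int = 5000,
) -> bool:
    """Return True if the deployment behind `uri` answers a ping command."""
    if client_factory is None:
        # Imported lazily so that the sketch loads even without pymongo installed.
        from pymongo import MongoClient

        client_factory = MongoClient
    client = None
    try:
        # Fail after `timeout_ms` instead of waiting for the default timeout.
        client = client_factory(uri, serverSelectionTimeoutMS=timeout_ms)
        client.admin.command("ping")
        return True
    except Exception as exc:  # any failure counts as "not connected" here
        print(f"Connection failed: {exc}")
        return False
    finally:
        if client is not None:
            client.close()
```

Calling `ping_cluster(URI)` with the connection string imported from `utils` would run the check against the real cluster.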


In [6]:
insert_tweets(df_meta_tweets, uri=URI, db_name=DB_NAME, collection_name="META")

STARTING TO CLEAN DATA...
tweets_db.META COLLECTION IS SUCCESSFULLY CLEANED 


STARTING TO INSERT DATA...
row: 6 inserted
row: 7 inserted
row: 15 inserted
row: 18 inserted
row: 19 inserted
row: 22 inserted
row: 24 inserted
row: 28 inserted
row: 32 inserted
row: 44 inserted
row: 45 inserted
row: 47 inserted
row: 48 inserted
row: 49 inserted
row: 51 inserted
row: 52 inserted
row: 53 inserted
row: 55 inserted
row: 56 inserted
row: 57 inserted
row: 64 inserted
row: 66 inserted
row: 67 inserted
row: 68 inserted
row: 70 inserted
row: 73 inserted
row: 75 inserted
row: 77 inserted
row: 79 inserted
row: 81 inserted
row: 84 inserted
row: 87 inserted
row: 89 inserted
row: 92 inserted
row: 95 inserted
row: 102 inserted
row: 106 inserted
row: 108 inserted
row: 109 inserted
row: 113 inserted
row: 114 inserted
row: 116 inserted
row: 120 inserted
row: 122 inserted
row: 125 inserted
row: 128 inserted
row: 129 inserted
row: 131 inserted
row: 133 inserted
row: 134 inserted
row: 135 inserted
row: 136 inse
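
The log indicates that `insert_tweets` first empties the target collection and then writes the rows one at a time. Its source is not shown, so the following is only a sketch of an alternative that does the same clear-then-load step with a single batched write. The function name `replace_collection` is hypothetical; `delete_many` and `insert_many` are the PyMongo collection methods it relies on.

```python
from typing import Any, Dict, Iterable


def replace_collection(collection: Any, records: Iterable[Dict[str, Any]]) -> int:
    """Empty `collection`, insert `records` in one batch, return the number inserted."""
    # Copy each record so that the caller's dictionaries are not modified
    # when the driver adds an `_id` field to the inserted documents.
    documents = [dict(record) for record in records]
    # Remove every existing document so that reruns do not create duplicates.
    collection.delete_many({})
    if not documents:
        # insert_many rejects an empty list, so stop here.
        return 0
    result = collection.insert_many(documents)
    return len(result.inserted_ids)
```

A possible call would be `replace_collection(MongoClient(URI)[DB_NAME]["META"], df_meta_tweets.to_dict("records"))`. One batched write avoids a network round trip per row, at the cost of losing the per-row progress log.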

### Extract data from the MongoDB database

In [8]:
# Retrieve data from MongoDB
data_from_mongodb = get_tweets_from_db(uri=URI, db_name=DB_NAME, collection_name="META")

# Create DataFrame from retrieved data
df_meta_tweets_ = pd.DataFrame(data_from_mongodb)

df_meta_tweets_

Unnamed: 0,_id,date,content,true_sentiment
0,65f0bf79cdbba6c28e28124b,2024-03-12T20:39:18Z,$SPY $NVDA $META $ORCL Only sad BEARS on here ...,bullish
1,65f0bf7acdbba6c28e28124d,2024-03-12T20:35:09Z,$META 510 Tomorrow Bears Was that over the top...,bullish
2,65f0bf7acdbba6c28e28124f,2024-03-12T19:46:15Z,$PSTG $NTAP $META that’s a big hit to PSTG pro...,bearish
3,65f0bf7bcdbba6c28e281251,2024-03-12T19:37:31Z,$ATNF looks primed for a big Squeeze 🔥 \n \nW...,bullish
4,65f0bf7ccdbba6c28e281253,2024-03-12T19:36:29Z,$META it’s over. Thx for playing,bearish
...,...,...,...,...
107,65f0bfc2cdbba6c28e281321,2024-03-12T00:23:08Z,$META &quot;the enemy of the people&quot;,bearish
108,65f0bfc3cdbba6c28e281323,2024-03-12T00:20:07Z,$META we don’t need any one to tell us who is ...,bullish
109,65f0bfc4cdbba6c28e281325,2024-03-12T00:08:53Z,$META I see downtrend forming but the oversold...,bullish
110,65f0bfc4cdbba6c28e281327,2024-03-11T23:49:38Z,$META so so OVERSOLD Will be UP like a yoyo,bullish
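
The retrieved frame has a fresh 0-based index and an extra `_id` column, which MongoDB adds to every stored document. If that column is not needed downstream, it can be excluded at query time with a projection. The sketch below shows one way to do so; `load_documents` is a hypothetical helper and not the `get_tweets_from_db` function from `utils.py`.

```python
from typing import Any, Dict, List


def load_documents(collection: Any, keep_id: bool = False) -> List[Dict[str, Any]]:
    """Read every document, dropping `_id` unless `keep_id` is set."""
    # A projection of {"_id": 0} asks the server to leave that field out.
    projection = None if keep_id else {"_id": 0}
    documents = [dict(doc) for doc in collection.find({}, projection)]
    if keep_id:
        # Convert ObjectId values to plain strings for display or export.
        for doc in documents:
            doc["_id"] = str(doc["_id"])
    return documents
```

Wrapping the result in `pd.DataFrame(load_documents(collection))` would yield a frame with only the `date`, `content`, and `true_sentiment` columns.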
