# Data Collection

#### We collect tweets about **Meta** (ticker: `META`) using the StockTwits API.

### Set up the environment

In [1]:
#!pip install "pymongo[srv]"==3.11
#!pip install pymongo --upgrade
#!pip install openai

### Import libraries

In [2]:
#### COMMON ####
import pandas as pd
import numpy as np

import json
import requests
from datetime import datetime

### MONGO DB ####
from pymongo.mongo_client import MongoClient

### Import utils function ###
from utils import test_connection_db, collect_tweets, insert_tweets, retrieve_from_mongodb, URI, DB_NAME

### STEP 1: Load `META` tweets from StockTwits

In [3]:
df_meta_tweets = collect_tweets(ticker="META", nb_url=10)

STARTING TO COLLECT TWEET META...


COLLECTNG FROM:... 
1: https://api.stocktwits.com/api/2/streams/symbol/META.json
COLLECTNG FROM:... 
2: https://api.stocktwits.com/api/2/streams/symbol/META.json?max=565498474
COLLECTNG FROM:... 
3: https://api.stocktwits.com/api/2/streams/symbol/META.json?max=565476183
COLLECTNG FROM:... 
4: https://api.stocktwits.com/api/2/streams/symbol/META.json?max=565459532
COLLECTNG FROM:... 
5: https://api.stocktwits.com/api/2/streams/symbol/META.json?max=565450575
COLLECTNG FROM:... 
6: https://api.stocktwits.com/api/2/streams/symbol/META.json?max=565441413
COLLECTNG FROM:... 
7: https://api.stocktwits.com/api/2/streams/symbol/META.json?max=565428221
COLLECTNG FROM:... 
8: https://api.stocktwits.com/api/2/streams/symbol/META.json?max=565409077
COLLECTNG FROM:... 
9: https://api.stocktwits.com/api/2/streams/symbol/META.json?max=565385558
COLLECTNG FROM:... 
10: https://api.stocktwits.com/api/2/streams/symbol/META.json?max=565372464

116 TWEETS ARE SUCCESFULLY
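The logged URLs above suggest cursor-based pagination: each request after the first carries a `max` query parameter. The internals of `collect_tweets` are not shown in this notebook, so the following is only a minimal sketch of how such a loop could work. The helper names (`build_page_url`, `next_max_id`, `collect_pages`) are hypothetical, and the assumption that each response exposes the next cursor under `cursor.max` should be checked against the actual payload.

```python
from typing import Callable, List, Optional, Tuple

# URL pattern taken from the log output above.
BASE_URL = "https://api.stocktwits.com/api/2/streams/symbol/{ticker}.json"


def build_page_url(ticker: str, max_id: Optional[int] = None) -> str:
    """Build the stream URL, adding `?max=<id>` for every page after the first."""
    url = BASE_URL.format(ticker=ticker)
    if max_id is not None:
        url += f"?max={max_id}"
    return url


def next_max_id(payload: dict) -> Optional[int]:
    """Read the cursor for the next (older) page.

    ASSUMPTION: the response carries it under payload["cursor"]["max"].
    """
    cursor = payload.get("cursor") or {}
    return cursor.get("max")


def collect_pages(
    fetch_json: Callable[[str], dict], ticker: str, nb_url: int
) -> List[Tuple[str, dict]]:
    """Fetch up to `nb_url` pages, following the cursor from page to page."""
    pages = []
    max_id = None
    for _ in range(nb_url):
        url = build_page_url(ticker, max_id)
        payload = fetch_json(url)
        pages.append((url, payload))
        max_id = next_max_id(payload)
        if max_id is None:  # no further cursor: stop early
            break
    return pages
```

Passing the fetch function in as an argument keeps the loop testable without network access. With `requests`, a call could look like `collect_pages(lambda url: requests.get(url, timeout=10).json(), "META", 10)`.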

In [4]:
df_meta_tweets

Unnamed: 0,date,content,true_sentiment
0,2024-03-12T19:46:15Z,$PSTG $NTAP $META that’s a big hit to PSTG pro...,bearish
3,2024-03-12T19:37:31Z,$ATNF looks primed for a big Squeeze 🔥 \n \nW...,bullish
4,2024-03-12T19:36:29Z,$META it’s over. Thx for playing,bearish
7,2024-03-12T19:30:43Z,$META $600 on tok ban tomm,bullish
9,2024-03-12T19:28:36Z,$SPY $NVDA $META $ORCL Trump was an idiot to t...,bullish
...,...,...,...
286,2024-03-11T23:41:49Z,"$META *TIKTOK A US SECURITY THREAT, SAYS TRUMP...",bullish
287,2024-03-11T23:40:45Z,$META I don’t care what political affiliation ...,bullish
289,2024-03-11T23:36:55Z,$META don’t forget how Trump makes fun of hand...,bullish
296,2024-03-11T22:31:23Z,$META Enemy of the People! \nhttps://www.youtu...,bearish


#### NOTE:

* The sentiment reported by StockTwits (`bullish` or `bearish`, stored in `true_sentiment`) is used as the ground-truth label for our data.
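The gaps in the DataFrame index (0, 3, 4, 7, 9, ...) may indicate that messages without a sentiment tag were dropped, although the notebook does not confirm this. The sketch below shows one way labeled rows could be extracted from raw messages. The function name `extract_labeled_rows` is hypothetical, and the field names (`created_at`, `body`, `entities.sentiment.basic`) are assumptions about the API payload rather than something taken from `utils`.

```python
from typing import Dict, List


def extract_labeled_rows(messages: List[dict]) -> List[Dict[str, str]]:
    """Keep only messages that carry a sentiment tag and flatten them to rows.

    ASSUMPTION: each message looks like
    {"created_at": ..., "body": ..., "entities": {"sentiment": {"basic": "Bullish"}}}
    and untagged messages have a missing or null "sentiment" entry.
    """
    rows = []
    for message in messages:
        entities = message.get("entities") or {}
        sentiment = entities.get("sentiment") or {}
        label = sentiment.get("basic")
        if label is None:
            continue  # no label, so the message cannot serve as ground truth
        rows.append(
            {
                "date": message.get("created_at"),
                "content": message.get("body"),
                "true_sentiment": label.lower(),
            }
        )
    return rows
```

The resulting list of dictionaries can be passed straight to `pd.DataFrame(rows)` to obtain the three columns shown above.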

### Store the tweet collection in the DB

##### Test the connection with the MongoDB cluster

In [5]:
test_connection_db(URI)

Pinged your deployment. You successfully connected to MongoDB!
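The message above matches the standard MongoDB connectivity check, which sends a `ping` command to the `admin` database. The body of `test_connection_db` is not shown here, so the following is a sketch of what such a check could look like, not the actual helper. The function name `ping_deployment` is hypothetical. It accepts any client object that exposes `admin.command`, as `pymongo.MongoClient` does.

```python
def ping_deployment(client) -> bool:
    """Return True if the deployment answers a `ping` command, False otherwise.

    `client` is expected to behave like pymongo.MongoClient, for example:
        from pymongo import MongoClient
        ping_deployment(MongoClient(URI))
    """
    try:
        # `ping` is a cheap server command that needs no specific collection.
        client.admin.command("ping")
    except Exception as exc:  # broad on purpose: network, auth and DNS errors
        print(f"Connection failed: {exc}")
        return False
    print("Pinged your deployment. You successfully connected to MongoDB!")
    return True
```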


In [6]:
insert_tweets(df_meta_tweets, uri=URI, db_name=DB_NAME, collection_name="META")

STARTING TO CLEAN DATA...
tweets_db.META COLLECTION IS SUCCESSFULLY CLEANED 


STARTING TO INSERT DATA...
row: 0 inserted
row: 3 inserted
row: 4 inserted
row: 7 inserted
row: 9 inserted
row: 13 inserted
row: 17 inserted
row: 29 inserted
row: 30 inserted
row: 32 inserted
row: 33 inserted
row: 34 inserted
row: 36 inserted
row: 37 inserted
row: 38 inserted
row: 40 inserted
row: 41 inserted
row: 42 inserted
row: 49 inserted
row: 51 inserted
row: 52 inserted
row: 53 inserted
row: 55 inserted
row: 58 inserted
row: 60 inserted
row: 62 inserted
row: 64 inserted
row: 66 inserted
row: 69 inserted
row: 72 inserted
row: 74 inserted
row: 77 inserted
row: 80 inserted
row: 87 inserted
row: 91 inserted
row: 93 inserted
row: 94 inserted
row: 98 inserted
row: 99 inserted
row: 101 inserted
row: 105 inserted
row: 107 inserted
row: 110 inserted
row: 113 inserted
row: 114 inserted
row: 116 inserted
row: 118 inserted
row: 119 inserted
row: 120 inserted
row: 121 inserted
row: 125 inserted
row: 126 inserted
ro
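The log above shows two phases: the collection is emptied first, then rows are inserted one at a time. As an alternative sketch (not the `insert_tweets` implementation, which is not shown), the same clean-then-insert pattern could use a single `insert_many` call, which avoids one round trip per row. The function name `replace_collection` is hypothetical. `collection` is assumed to behave like a `pymongo` collection.

```python
from typing import List


def replace_collection(collection, records: List[dict]) -> int:
    """Empty the collection, then bulk-insert `records`.

    Returns the number of inserted documents. `collection` is expected to
    behave like a pymongo Collection (delete_many / insert_many).
    """
    # Clean phase: an empty filter matches every document.
    collection.delete_many({})
    if not records:
        # pymongo's insert_many rejects an empty list, so return early.
        return 0
    # Insert phase: one bulk call instead of one call per row.
    result = collection.insert_many(records)
    return len(result.inserted_ids)
```

With a DataFrame, the records could be produced by `df_meta_tweets.to_dict("records")` before being passed in.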

### Extract data from the MongoDB database

In [7]:
# Retrieve data from MongoDB
data_from_mongodb = retrieve_from_mongodb(uri=URI, db_name=DB_NAME, collection_name="META")

# Create DataFrame from retrieved data
df_meta_tweets_ = pd.DataFrame(data_from_mongodb)

df_meta_tweets_

Unnamed: 0,_id,date,content,true_sentiment
0,65f0b1f3785c5752056d15f1,2024-03-12T19:46:15Z,$PSTG $NTAP $META that’s a big hit to PSTG pro...,bearish
1,65f0b1f4785c5752056d15f3,2024-03-12T19:37:31Z,$ATNF looks primed for a big Squeeze 🔥 \n \nW...,bullish
2,65f0b1f4785c5752056d15f5,2024-03-12T19:36:29Z,$META it’s over. Thx for playing,bearish
3,65f0b1f5785c5752056d15f7,2024-03-12T19:30:43Z,$META $600 on tok ban tomm,bullish
4,65f0b1f6785c5752056d15f9,2024-03-12T19:28:36Z,$SPY $NVDA $META $ORCL Trump was an idiot to t...,bullish
...,...,...,...,...
111,65f0b239785c5752056d16cf,2024-03-11T23:41:49Z,"$META *TIKTOK A US SECURITY THREAT, SAYS TRUMP...",bullish
112,65f0b23a785c5752056d16d1,2024-03-11T23:40:45Z,$META I don’t care what political affiliation ...,bullish
113,65f0b23b785c5752056d16d3,2024-03-11T23:36:55Z,$META don’t forget how Trump makes fun of hand...,bullish
114,65f0b23b785c5752056d16d5,2024-03-11T22:31:23Z,$META Enemy of the People! \nhttps://www.youtu...,bearish
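The retrieved DataFrame includes MongoDB's auto-generated `_id` column, which is not needed for modeling. If that column is unwanted, one option is to exclude it at query time with a projection. This is a sketch with a hypothetical function name (`fetch_documents`). It is not how `retrieve_from_mongodb` is known to work, and `collection` is assumed to behave like a `pymongo` collection.

```python
from typing import List


def fetch_documents(collection) -> List[dict]:
    """Return every document in the collection without its `_id` field.

    The second argument to `find` is a projection: {"_id": 0} tells the
    server to omit `_id` and return all other fields.
    """
    return list(collection.find({}, {"_id": 0}))
```

The returned list can be passed to `pd.DataFrame(...)` exactly as in the cell above.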
