## Extract data from MongoDB with pymongo

set_mongo_connection() sets up a connection to a MongoDB database from a connection string, a database name, and a collection name. It returns the client, db, and col objects, which can be used to perform further operations on the database.
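
The implementation of set_mongo_connection() is not shown here, so the following is only a minimal sketch of what such a helper might look like with pymongo. The name connect_sketch is hypothetical, and the body is an assumption based on the description above, not the actual code in python_utils.

```python
def connect_sketch(connection_string, database_name, collection_name):
    """Hypothetical stand-in for set_mongo_connection(); returns (client, db, col)."""
    # Imported inside the function so this sketch can be loaded
    # even when pymongo is not installed.
    from pymongo import MongoClient

    client = MongoClient(connection_string)  # does not block while connecting
    db = client[database_name]               # database handle, created lazily
    col = db[collection_name]                # collection handle, created lazily
    return client, db, col
```

Because MongoClient connects in the background, a wrong connection string typically surfaces only on the first real operation, for example the first find() call.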

get_node_users() retrieves information about Twitter users from the specified collection. It searches for documents that contain the field "includes.users" and extracts each user's ID, follower count, and username. The function returns a list of unique user objects sorted by the date their information was last updated (most recent first in the sample output below).
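
The deduplicate-and-sort step described above can be sketched on plain dictionaries. This is an illustration under assumptions, not the actual get_node_users() code: the function name users_from_documents is hypothetical, the user fields are assumed to follow the Twitter API v2 layout (public_metrics.followers_count), and the top-level timestamp field fetched_at is an invented placeholder for wherever the real documents store their update time.

```python
from datetime import datetime


def users_from_documents(documents):
    """Return one record per user ID, newest copy kept, sorted newest first."""
    latest = {}
    for doc in documents:
        updated = doc["fetched_at"]  # ASSUMPTION: hypothetical timestamp field
        for user in doc.get("includes", {}).get("users", []):
            record = {
                "id": user["id"],
                "followers": user.get("public_metrics", {}).get("followers_count"),
                "username": user.get("username"),
                "info_last_updated": updated,
            }
            seen = latest.get(user["id"])
            # Keep only the most recently updated copy of each user.
            if seen is None or updated > seen["info_last_updated"]:
                latest[user["id"]] = record
    return sorted(latest.values(), key=lambda r: r["info_last_updated"], reverse=True)


# With a real collection, the input could come from a query such as:
#   collection.find({"includes.users": {"$exists": True}})
sample_docs = [
    {"fetched_at": datetime(2023, 1, 25, 23, 58),
     "includes": {"users": [{"id": "1", "username": "a",
                             "public_metrics": {"followers_count": 10}}]}},
    {"fetched_at": datetime(2023, 1, 25, 23, 59),
     "includes": {"users": [{"id": "1", "username": "a",
                             "public_metrics": {"followers_count": 12}}]}},
    {"fetched_at": datetime(2023, 1, 25, 23, 57),
     "includes": {"users": [{"id": "2", "username": "b",
                             "public_metrics": {"followers_count": 3}}]}},
]
sample_users = users_from_documents(sample_docs)
```

Keeping the records in a dictionary keyed by user ID is one simple way to guarantee uniqueness before sorting.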

get_node_tweets() retrieves tweets from the specified collection. It searches for documents that contain the field "includes.tweets" and extracts each tweet's ID, author ID, creation time, and retweet count. If a tweet references other tweets, those references are included as a list; otherwise the field is None. The function returns a list of tweet objects.
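
As a rough illustration of the extraction described above, the sketch below builds tweet records with the same keys that appear in the printed output. The function name tweets_from_documents is hypothetical, the nested field names are assumed to follow the Twitter API v2 tweet object, and emitting every tweet in includes.tweets (rather than only some of them) is also an assumption.

```python
def tweets_from_documents(documents):
    """Flatten includes.tweets of each document into a list of tweet records."""
    tweets = []
    for doc in documents:
        for tweet in doc.get("includes", {}).get("tweets", []):
            tweets.append({
                "tweet_id": tweet["id"],
                "author_id": tweet.get("author_id"),
                "created_at": tweet.get("created_at"),
                "retweet_count": tweet.get("public_metrics", {}).get("retweet_count"),
                # None when the tweet does not reference any other tweet.
                "referenced_tweets": tweet.get("referenced_tweets"),
            })
    return tweets


# With a real collection, the input could come from a query such as:
#   collection.find({"includes.tweets": {"$exists": True}})
sample_docs = [
    {"includes": {"tweets": [
        {"id": "10", "author_id": "1", "created_at": "2023-01-25T23:59:30.000Z",
         "public_metrics": {"retweet_count": 0}},
        {"id": "11", "author_id": "2", "created_at": "2023-01-25T23:59:16.000Z",
         "public_metrics": {"retweet_count": 15},
         "referenced_tweets": [{"type": "retweeted", "id": "9"}]},
    ]}},
]
sample_tweets = tweets_from_documents(sample_docs)
```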

get_node_hashtags() retrieves a list of unique hashtags from the first tweet of each document in the specified collection. It uses collection.find() to select documents that contain the field "includes.tweets.0.entities.hashtags", which indicates that the first tweet in the document has at least one hashtag. It then loops over the matching documents, extracts the hashtag tags from the first tweet, and adds them to a set to remove duplicates. Finally, the function converts the set to a list and returns it.
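
The set-based deduplication described above might look roughly like the following. The function name hashtags_from_documents is hypothetical, and the sketch assumes each hashtag entity stores its text under the key tag, as in the Twitter API v2 entities object. It works on plain dictionaries so that it can be tried without a database.

```python
def hashtags_from_documents(documents):
    """Collect the unique hashtag tags found in the first tweet of each document."""
    unique_tags = set()
    for doc in documents:
        tweets = doc.get("includes", {}).get("tweets", [])
        if not tweets:
            continue  # nothing to read in this document
        first_tweet = tweets[0]
        for hashtag in first_tweet.get("entities", {}).get("hashtags", []):
            unique_tags.add(hashtag["tag"])
    # A set has no defined order, so the resulting list order is arbitrary.
    return list(unique_tags)


# With a real collection, the input could come from a query such as:
#   collection.find({"includes.tweets.0.entities.hashtags": {"$exists": True}})
sample_docs = [
    {"includes": {"tweets": [{"id": "10", "entities": {"hashtags": [
        {"tag": "nft"}, {"tag": "art"}]}}]}},
    {"includes": {"tweets": [{"id": "11", "entities": {"hashtags": [
        {"tag": "nft"}]}}]}},
]
sample_hashtags = hashtags_from_documents(sample_docs)
```

Since the list order is not deterministic, callers that need stable output may want to sort the result.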

get_relationship_has_hashtag() retrieves the relationships between tweets and hashtags in the specified collection. It searches for documents that contain the field "includes.tweets.0.entities.hashtags" and extracts the tweet ID together with its hashtags. The function returns a list of tweet objects, each paired with its associated hashtags.
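
A possible shape for these relationship records is sketched below. The exact structure returned by get_relationship_has_hashtag() is not shown in this section, so the keys tweet_id and hashtags, the function name has_hashtag_from_documents, and the choice to read only the first tweet of each document are all assumptions made for illustration.

```python
def has_hashtag_from_documents(documents):
    """Pair the first tweet of each document with the hashtags it contains."""
    relations = []
    for doc in documents:
        tweets = doc.get("includes", {}).get("tweets", [])
        if not tweets:
            continue
        first_tweet = tweets[0]
        tags = [h["tag"] for h in first_tweet.get("entities", {}).get("hashtags", [])]
        if tags:  # skip tweets without hashtags
            # ASSUMPTION: the record layout below is illustrative only.
            relations.append({"tweet_id": first_tweet["id"], "hashtags": tags})
    return relations


# With a real collection, the input could come from a query such as:
#   collection.find({"includes.tweets.0.entities.hashtags": {"$exists": True}})
sample_docs = [
    {"includes": {"tweets": [{"id": "10", "entities": {"hashtags": [
        {"tag": "nft"}, {"tag": "art"}]}}]}},
    {"includes": {"tweets": [{"id": "11"}]}},
]
sample_relations = has_hashtag_from_documents(sample_docs)
```

Each record can then be turned into one edge per hashtag when loading a graph.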

In [4]:
from python_utils.mongo_python import set_mongo_connection, get_node_users, get_node_tweets, get_node_hashtags, get_relationship_has_hashtag

Examples:

In [5]:
client, database, collection = set_mongo_connection(
    "mongodb://localhost:27017/",
    "local",
    "test"
)

# Nodes
users = get_node_users(collection)
tweets = get_node_tweets(collection)
hashtags = get_node_hashtags(collection)

# Relationships
has_hashtag = get_relationship_has_hashtag(collection)

# Print the first five items of each result
for user in users[:5]:
    print(user)
print("--------------")
for tweet in tweets[:5]:
    print(tweet)
print("--------------")
for hashtag in hashtags[:5]:
    print(hashtag)
print("--------------")
for tweet_hashtag_relation in has_hashtag[:5]:
    print(tweet_hashtag_relation)

{'id': '860025761167572993', 'followers': 35, 'username': 'Sukhmanjeetkau2', 'info_last_updated': datetime.datetime(2023, 1, 25, 23, 59, 30)}
{'id': '1389705763208237056', 'followers': 12552, 'username': 'LauraDoodlesToo', 'info_last_updated': datetime.datetime(2023, 1, 25, 23, 59, 16)}
{'id': '1495625777605488642', 'followers': 7701, 'username': 'AnastasiaNFTart', 'info_last_updated': datetime.datetime(2023, 1, 25, 23, 59, 16)}
{'id': '1613967463', 'followers': 439, 'username': 'esquivel_gifted', 'info_last_updated': datetime.datetime(2023, 1, 25, 23, 58)}
{'id': '959355565', 'followers': 342, 'username': 'DevonGarritt', 'info_last_updated': datetime.datetime(2023, 1, 25, 23, 58)}
--------------
{'tweet_id': '1618398184312639491', 'author_id': '860025761167572993', 'created_at': '2023-01-25T23:59:30.000Z', 'retweet_count': 0, 'referenced_tweets': None}
{'tweet_id': '1618398126364299264', 'author_id': '1389705763208237056', 'created_at': '2023-01-25T23:59:16.000Z', 'retweet_count': 15,