# Task Description

- Use at least one NoSQL DB (Docker, Docker Compose)
- Lessons learned matter more than an optimal solution (What would I have done differently?)

## Deliverables:
- Dockerfile / Docker Compose file per DB
- PDF of the data model -> system architecture
- Script / program for loading data into the DB
- Queries for the scenarios
- PDF of lessons learned (`lessons-learned.pdf`)

## DB:
- Should run on multiple containers / nodes
  - If that is not possible, explain why it did not work / what would be needed

## App (Optional)
- Load initial data
- Run queries
- Display content

## System Requirements
- There are:
  - Follower relationships
  - Posts by celebrities
- Tasks:
  - Distribute the celebrity posts across the 100 IDs with the most followers (influencers)
  - Posts can be liked (which user liked which other user's post)
    - Randomly generated
- Queries:
  - List the posts assigned to an account
  - List the 100 accounts with the most followers (influencers)
  - List the 100 accounts that follow the most influencers
  - Home page for any account (influencers work well here):
    - Number of followers
    - Number of followed accounts
    - 25 posts:
      - Newest
      - Most liked posts of followed accounts
    - Caching of posts for the home page:
      - Fan-out into each follower's cache when a new post is written (queries do not hit a central table; instead, each account has its own tweet feed)
    - List 25 posts that contain a given word:
      - (Optional: AND-combined words)


# Setup

## Install needed libraries and import components

In [None]:
# Install needed libraries and import components
%pip install "pymongo[srv]"

from pymongo import MongoClient
from pymongo.errors import ConnectionFailure
from datetime import datetime

import csv
import random

## Define the MongoDB server details

In [None]:
# Define the MongoDB server details
host = 'localhost'
port = 27017
username = 'devroot'  # Replace with your MongoDB username
password = 'devroot'  # Replace with your MongoDB password

# Create the connection string
connection_string = f'mongodb://{username}:{password}@{host}:{port}'

## Connect to DB and test connection

In [None]:
# Connect to the MongoDB server
client = MongoClient(connection_string)

In [None]:
try:
    # Verify connection
    client.admin.command('ping')
    print("Connected successfully to MongoDB")
    
    # List all databases
    databases = client.list_database_names()
    print("Databases:", databases)
        
except ConnectionFailure as e:
    print(f"Could not connect to MongoDB: {e}")

# Global Definitions

## Collection definition

In [None]:
# Select the database and collections
db = client['social_network']
users_collection = db['users']
followers_collection = db['followers']
posts_collection = db['posts']
likes_collection = db['likes']
feeds_collection = db['feeds']
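The queries in this notebook filter and sort on a small set of fields, so indexes on those fields are likely to help once the full data set is loaded. The following is a minimal sketch, not part of the original setup: `ensure_indexes` is a hypothetical helper, and the index choices are assumptions derived from the query shapes used further down (edge lookups by `followed_id`, per-user reads sorted by `timestamp`, ranking by `followers_count`).

```python
def ensure_indexes(db):
    """Create indexes matching the query shapes used in this notebook.

    `db` is expected to behave like a pymongo Database (hypothetical helper).
    Returns whatever each create_index call returns (index names in pymongo).
    """
    created = []
    # Fan-out looks up follower edges by the followed account.
    created.append(db["followers"].create_index([("followed_id", 1)]))
    # Profile pages read one user's posts, newest first.
    created.append(db["posts"].create_index([("user_id", 1), ("timestamp", -1)]))
    # Feeds are read per user, newest first.
    created.append(db["feeds"].create_index([("user_id", 1), ("timestamp", -1)]))
    # Likes update every feed entry that carries a given post.
    created.append(db["feeds"].create_index([("post_id", 1)]))
    # Influencer ranking sorts users by follower count, highest first.
    created.append(db["users"].create_index([("followers_count", -1)]))
    return created
```

With the notebook's connection this would be called as `ensure_indexes(db)`. Whether each index pays off depends on the data volume and should be checked against the actual query plans.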

## Helper functions

In [None]:
# Function to add a user
def add_user(user_id):
    user = {"_id": user_id, "following_count": 0, "followers_count": 0}
    users_collection.insert_one(user)

# Function to check if user already exists
def user_exists(user_id):
    return users_collection.count_documents({"_id": user_id}) > 0

# Function to follow a user
def follow_user(follower_id, followed_id):
    relationship = {"follower_id": follower_id, "followed_id": followed_id}
    followers_collection.insert_one(relationship)
    users_collection.update_one({"_id": follower_id}, {"$inc": {"following_count": 1}})
    users_collection.update_one({"_id": followed_id}, {"$inc": {"followers_count": 1}})

# Function to add a post
def add_post(user_id, content, date):
    post = {
        "user_id": user_id,
        "content": content,
        "timestamp": date,
        "likes": 0
    }
    post_id = posts_collection.insert_one(post).inserted_id
    propagate_post_to_followers(post_id, user_id, content, date)
    return post_id

# Function to propagate a post to all followers' feeds
def propagate_post_to_followers(post_id, user_id, content, date):
    followers = followers_collection.find({"followed_id": user_id})
    for follower in followers:
        feed_entry = {
            "user_id": follower["follower_id"],
            "post_id": post_id,
            "poster_id": user_id,
            "content": content,
            "timestamp": date,
            "likes": 0
        }
        feeds_collection.insert_one(feed_entry)

# Function to like a post
def like_post(user_id, post_id):
    like = {"user_id": user_id, "post_id": post_id}
    likes_collection.insert_one(like)
    posts_collection.update_one({"_id": post_id}, {"$inc": {"likes": 1}})
    feeds_collection.update_many({"post_id": post_id}, {"$inc": {"likes": 1}})

def get_top_influencers(n):
    return users_collection.find().sort("followers_count", -1).limit(n)

def get_user_posts(user_id):
    return posts_collection.find({"user_id": user_id})

# TODO: this ranks users by the total number of accounts they follow;
# the task asks for the number of followed influencers only
def get_top_follower_users(n):
    return users_collection.find().sort("following_count", -1).limit(n)

def get_user_profile(user_id, n):
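    user = users_collection.find_one({"_id": user_id})
    if user is None:
        # find_one returns None for unknown ids; fail with a clear message
        raise ValueError(f"No user with id {user_id}")
    followers_count = user["followers_count"]
    following_count = user["following_count"]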
    
    recent_posts = posts_collection.find({"user_id": user_id}).sort("timestamp", -1).limit(n)
    popular_posts = posts_collection.find({"user_id": user_id}).sort("likes", -1).limit(n)
    feed = feeds_collection.find({"user_id": user_id}).sort("timestamp", -1).limit(n)
    
    profile = {
        "user_id": user_id,
        "followers_count": followers_count,
        "following_count": following_count,
        "recent_posts": list(recent_posts),
        "popular_posts": list(popular_posts),
        "feed": list(feed)
    }
    return profile

def print_user_profile(user_profile):
    print("User profile with id:", user_profile["user_id"])
    print("Followers count:", user_profile["followers_count"])
    print("Following count:", user_profile["following_count"])
    print("Recent posts:")
    for post in user_profile["recent_posts"]:
        print(post["content"], "date:", post["timestamp"])
    print("Popular posts:")
    for post in user_profile["popular_posts"]:
        print(post["content"], "likes", post["likes"])
    print("Feed:")
    for post in user_profile["feed"]:
        print(post["content"], "date:", post["timestamp"])

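def get_posts_with_word(word, n):
    import re
    # Escape the search term so regex metacharacters in it are matched literally
    pattern = re.escape(word)
    return posts_collection.find({"content": {"$regex": pattern, "$options": "i"}}).limit(n)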

# Function to read the CSV file and process the data
def process_csv(file_path):
    data_map = {}  # Dictionary to store the concatenated string and date

    with open(file_path, mode='r', encoding='utf-8') as file:
        reader = csv.DictReader(file)
        
        # Iterate through each row in the CSV file
        for index, row in enumerate(reader):
            author = row['author']
            content = row['content']
            date_time_str = row['date_time']
            
            # Concatenate author and content
            concatenated = f"{author}: {content}"
            
            # Convert the date_time string to a datetime object
            date_time = datetime.strptime(date_time_str, '%d/%m/%Y %H:%M')
            
            # Store the concatenated string and date in the dictionary
            data_map[index] = {'tweet': concatenated, 'date': date_time}
    
    return data_map

# Function to get a random user ID from the users collection
def get_random_user_id():
    count = users_collection.count_documents({})
    if count > 0:
        random_index = random.randint(0, count - 1)
        random_user = users_collection.find().skip(random_index).limit(1)
        for user in random_user:
            return user['_id']
    else:
        return None
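The fan-out in `propagate_post_to_followers` issues one `insert_one` per follower, which means one round trip per feed entry. A possible alternative is to write the feed entries in batches with `insert_many`. The sketch below assumes the same document layout as above; `chunked`, `propagate_post_batched`, and the default `batch_size` of 1000 are hypothetical names and values, and `post` is assumed to be the stored post document including its `_id`.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield lists of at most `size` items from `iterable`."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def propagate_post_batched(followers_collection, feeds_collection, post, batch_size=1000):
    """Fan a stored post out to every follower's feed in batches.

    Returns the number of feed entries written.
    """
    cursor = followers_collection.find({"followed_id": post["user_id"]})
    written = 0
    for batch in chunked(cursor, batch_size):
        entries = [
            {
                "user_id": edge["follower_id"],
                "post_id": post["_id"],
                "poster_id": post["user_id"],
                "content": post["content"],
                "timestamp": post["timestamp"],
                "likes": post.get("likes", 0),
            }
            for edge in batch
        ]
        # ordered=False lets the server continue past individual failed inserts.
        feeds_collection.insert_many(entries, ordered=False)
        written += len(entries)
    return written
```

The batch size trades memory per request against the number of round trips; a suitable value would have to be measured on the actual cluster.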

# Insert Data into DB

## Data insertion

In [None]:
# Add some posts (add_post requires a timestamp as its third argument)
alice_post_id = add_post(1, "Hello, world!", datetime.now())  # Alice's post
bob_post_id = add_post(2, "It's a beautiful day!", datetime.now())  # Bob's post
charlie_post_id = add_post(3, "Learning MongoDB with Python!", datetime.now())  # Charlie's post

# Like some posts
like_post(1, alice_post_id)  
like_post(3, alice_post_id)
like_post(2, charlie_post_id)

## Insert users and following relationships to db

In [None]:
file_path = './InputData/twitter_combined.txt'

with open(file_path, 'r') as file:
    lines = file.readlines()
    user_pairs = [tuple(map(int, line.strip().split())) for line in lines]

    for user1, user2 in user_pairs:
        if not user_exists(user1):
            add_user(user1)
        
        if not user_exists(user2):
            add_user(user2)
        
        follow_user(user1, user2)
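The loop above performs several database round trips for every line of the input file: two existence checks, up to two user inserts, one relationship insert, and two counter updates. One way to reduce this is to compute the user documents and follower counts in memory first and then write them with `insert_many`. The sketch below is an assumption about how that could look; `build_user_and_edge_docs` is a hypothetical helper. Unlike the loop above, it also collapses duplicate edges, so counts can differ if the input file repeats a pair.

```python
from collections import Counter


def build_user_and_edge_docs(lines):
    """Turn 'follower followed' lines into user and relationship documents.

    Duplicate edges and malformed lines are skipped.
    """
    edges = set()
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        edges.add((int(parts[0]), int(parts[1])))

    # Count outgoing and incoming edges per user.
    following = Counter(follower for follower, _ in edges)
    followers = Counter(followed for _, followed in edges)

    users = [
        {
            "_id": user_id,
            "following_count": following[user_id],
            "followers_count": followers[user_id],
        }
        for user_id in sorted(set(following) | set(followers))
    ]
    relationships = [
        {"follower_id": follower, "followed_id": followed}
        for follower, followed in sorted(edges)
    ]
    return users, relationships
```

The two returned lists could then be passed to `users_collection.insert_many(...)` and `followers_collection.insert_many(...)`, possibly in chunks for large files.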

## Find out the top 100 most followed users (Influencers)

In [None]:
top_influencers = get_top_influencers(100)
top_influencers_list = list(top_influencers)
print("Top influencers:")
for influencer in top_influencers_list:
    print("user id:", influencer["_id"], "Follower count", influencer["followers_count"])

## Assign the input tweets to the influencers

In [None]:
# Path to the CSV file
file_path = './InputData/tweets.csv'

# Get the concatenated strings
tweets_map = process_csv(file_path)
tweets_ids = []

# Assign each tweet to a randomly chosen influencer and store it as a post
for key, value in tweets_map.items():
    random_influencer_document = random.choice(top_influencers_list)
    influencer_id = random_influencer_document['_id']
    temp_id = add_post(influencer_id, value['tweet'], value['date'])
    tweets_ids.append(temp_id)


## Create random likes

In [None]:
likes_count = 2000

for i in range(likes_count):
    random_tweet_id = random.choice(tweets_ids)
    random_user = get_random_user_id()
    like_post(random_user, random_tweet_id)
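Because both the tweet and the user are drawn independently in every iteration, the loop above can produce the same like twice. If each user should like a given post at most once, the pairs can be generated up front and de-duplicated. This is a minimal sketch under that assumption; `generate_unique_likes` and its `seed` parameter are hypothetical and not part of the original notebook.

```python
import random


def generate_unique_likes(user_ids, post_ids, count, seed=None):
    """Return `count` distinct (user_id, post_id) pairs in generation order."""
    if count > len(user_ids) * len(post_ids):
        raise ValueError("count exceeds the number of possible (user, post) pairs")
    rng = random.Random(seed)  # a fixed seed makes the run repeatable
    seen = set()
    likes = []
    while len(likes) < count:
        pair = (rng.choice(user_ids), rng.choice(post_ids))
        if pair not in seen:
            seen.add(pair)
            likes.append(pair)
    return likes
```

Each resulting pair could then be passed to `like_post(user_id, post_id)`. The rejection loop slows down when `count` approaches the number of possible pairs, which is not a concern for 2000 likes on a data set of this size.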

# Request data from DB

## Find out top 100 most followed

In [None]:
# Get and print top influencers
top_influencers = get_top_influencers(100)
print("Top influencers:")
for influencer in top_influencers:
    print("user id:", influencer["_id"], "Follower count", influencer["followers_count"])

## Find out top 100 influencer followers

In [None]:
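# Get and print top followers
# Note: get_top_follower_users currently ranks by total accounts followed,
# not by the number of influencers followed
top_followers = get_top_follower_users(100)
print("Top followers:")
for follower in top_followers:
    print("user id:", follower["_id"], "Accounts followed", follower["following_count"])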
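The task asks for the accounts that follow the most influencers, which `get_top_follower_users` does not yet compute. One possible approach is an aggregation over the `followers` collection that keeps only edges pointing at an influencer and counts them per follower. The sketch below is an assumption about how this could be done; the function names and the `influencers_followed` field are hypothetical, and it assumes each (follower, followed) pair is stored only once, since repeated edges would be counted repeatedly.

```python
def build_influencer_follower_pipeline(influencer_ids, n):
    """Build an aggregation pipeline for the `followers` collection."""
    return [
        # Keep only edges that point at one of the influencers.
        {"$match": {"followed_id": {"$in": list(influencer_ids)}}},
        # Count the remaining edges per follower.
        {"$group": {"_id": "$follower_id", "influencers_followed": {"$sum": 1}}},
        # Highest counts first; ties are broken by user id for a stable order.
        {"$sort": {"influencers_followed": -1, "_id": 1}},
        {"$limit": n},
    ]


def get_top_influencer_followers(followers_collection, influencer_ids, n):
    """Return the `n` accounts that follow the most of the given influencers."""
    pipeline = build_influencer_follower_pipeline(influencer_ids, n)
    return list(followers_collection.aggregate(pipeline))
```

In this notebook the influencer ids could come from `[u["_id"] for u in get_top_influencers(100)]`, and each result document would carry the follower's id in `_id` and the count in `influencers_followed`.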

## Show user profile

In [None]:
# Get and print user profile
user_id = 28465635
tweets_to_show = 25
user_profile = get_user_profile(user_id, tweets_to_show)
print_user_profile(user_profile)

## Posts containing word

In [None]:
# Get and print posts with a given word
posts_with_word = get_posts_with_word("beautiful", 25)
print("Posts containing 'beautiful':")
for post in posts_with_word:
    print(post["content"])
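The optional part of the task asks for AND-combined words. A simple way to express this with the same regex approach is one case-insensitive condition per word, joined with `$and`. The sketch below is an assumption, not the notebook's implementation; `build_all_words_query` is a hypothetical helper, and like the single-word query it matches substrings rather than whole words.

```python
import re


def build_all_words_query(words):
    """Build a filter matching posts whose content contains every word."""
    words = [w for w in words if w]
    if not words:
        # An empty $and array is rejected by MongoDB, so fail early.
        raise ValueError("at least one non-empty word is required")
    return {
        "$and": [
            # re.escape keeps regex metacharacters in the word literal.
            {"content": {"$regex": re.escape(word), "$options": "i"}}
            for word in words
        ]
    }
```

The resulting filter could be used as `posts_collection.find(build_all_words_query(["beautiful", "day"])).limit(25)`. Regex filters of this kind generally cannot use an ordinary index efficiently, so a text index may be worth evaluating for larger data sets.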

# Cleanup

In [None]:
db.users.drop()
db.followers.drop()
db.posts.drop()
db.likes.drop()
db.feeds.drop()
print("Database cleared.")

In [None]:
# Close the connection
client.close()