## What is this notebook about?
The donated data may contain chats without real interaction, e.g. Facebook chats that consist only of requests, advertisements, etc.
For this reason, we defined "interactive chats" as chats in which the ego (i.e. the donor) contributes no less than 10% and no more than 90% of the text, measured in words. The filtered data are available in the data file `messages_filtered_table.csv`, and this notebook contains the code used to create that file. You can run the code to double-check our filtering process, or skip this notebook and move on to the next one.


In [1]:
# Some imports to get things started
import sys
import os
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings("ignore")

sys.path.insert(1, os.path.abspath('../'))
from pathlib import Path


In [2]:
# Load the donation info from the data table
donation_fp = Path('../data/donation_table.csv')
donation_table = pd.read_csv(donation_fp)

# Load the messaging info from the data table
message_fp = Path('../data/messages_table.csv')
messages_table = pd.read_csv(message_fp)

# Filepath for saving the filtering output
messages_filtered_fp = Path('../data/messages_filtered_table1.csv')

In [3]:
def calculate_bias(ego_wc, total_wc):
    """
    Calculate the chat bias by subtracting the ego's share of the words
    (ego word count divided by total word count) from 0.5. This results in
    values between -0.5 and 0.5. Zero indicates no bias, while negative and
    positive values indicate more ego (donor) and more alter (contact)
    contributions, respectively.
    """
    return np.subtract(0.5, np.divide(ego_wc, total_wc))
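
To make the link between the bias value and the 10%/90% rule concrete, the sketch below repeats the formula from the cell above and applies it to three made-up chats of 100 words each. The helper `is_non_interactive` and its `limit` parameter are hypothetical names introduced only for this illustration; they are not part of the notebook.

```python
import numpy as np

def calculate_bias(ego_wc, total_wc):
    # Same formula as in the cell above: 0.5 minus the ego's share of the words
    return np.subtract(0.5, np.divide(ego_wc, total_wc))

def is_non_interactive(ego_wc, total_wc, limit=0.4):
    # Hypothetical helper: True when the ego wrote <10% or >90% of the words,
    # i.e. when the bias lies outside the interval [-limit, limit]
    bias = calculate_bias(ego_wc, total_wc)
    return bool(bias < -limit or bias > limit)

# Made-up word counts: the ego writes 5, 50 and 95 of 100 words
for ego_wc in (5, 50, 95):
    print(ego_wc, calculate_bias(ego_wc, 100), is_non_interactive(ego_wc, 100))
```

A bias above 0.4 therefore corresponds to an ego share below 10%, and a bias below -0.4 to an ego share above 90%.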

In [None]:
non_interactive_chats = []
for donationID in donation_table['donation_id'].values:
    # Separate messages associated with a given donation
    donation_messages = messages_table[messages_table['donation_id']==donationID]
    
    # Separate the chats in the donation
    chatIDs = donation_messages['conversation_id'].unique()
    
    # Get the donor_id for the donation to be able to separate the donor messages
    egoID = donation_table[donation_table['donation_id']==donationID]['donor_id'].iloc[0]
    
    for chatID in chatIDs:
        chat_messages = donation_messages[donation_messages['conversation_id']==chatID]
        ego_chat_messages = chat_messages[chat_messages['sender_id']==egoID]
        
        # Calculate the chat bias to identify chats where ego contributes <10% or >90% of the words
        chat_bias = calculate_bias(ego_chat_messages['word_count'].sum(), chat_messages['word_count'].sum())
        if chat_bias < -0.4 or chat_bias > 0.4:
            non_interactive_chats.append(chatID)
            
# Drop the non-interactive chats and save the filtered messages to a file
messages_filtered = messages_table[~messages_table['conversation_id'].isin(non_interactive_chats)]
messages_filtered.to_csv(messages_filtered_fp)
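
As an optional cross-check of the loop above, the same filter can be sketched with a merge and a `groupby`. This is not the code used to create the released file. It is a minimal sketch that assumes the column names used in this notebook and one `donor_id` per donation, and it runs on a small made-up table rather than the real data.

```python
import pandas as pd

# Made-up toy tables with the same column names as the real data
donation_table = pd.DataFrame({"donation_id": [1, 2], "donor_id": ["a", "x"]})
messages_table = pd.DataFrame({
    "donation_id":     [1, 1, 1, 1, 2],
    "conversation_id": ["c1", "c1", "c2", "c2", "c3"],
    "sender_id":       ["a", "b", "a", "b", "x"],
    "word_count":      [50, 50, 1, 99, 100],
})

# Attach each donation's donor_id to its messages (first row per donation)
donors = donation_table[["donation_id", "donor_id"]].drop_duplicates("donation_id")
merged = messages_table.merge(donors, on="donation_id", how="inner")

# Words written by the ego; messages from alters count as 0
merged["ego_wc"] = merged["word_count"].where(merged["sender_id"] == merged["donor_id"], 0)

# Sum ego and total word counts per chat and compute the bias
per_chat = merged.groupby(["donation_id", "conversation_id"])[["ego_wc", "word_count"]].sum()
per_chat["bias"] = 0.5 - per_chat["ego_wc"] / per_chat["word_count"]

# Flag chats outside [-0.4, 0.4] and drop them from the messages
flagged = per_chat[(per_chat["bias"] < -0.4) | (per_chat["bias"] > 0.4)]
non_interactive = set(flagged.index.get_level_values("conversation_id"))
filtered = messages_table[~messages_table["conversation_id"].isin(non_interactive)]
print(sorted(non_interactive))
```

In the toy data, the ego writes half of the words in `c1`, 1% in `c2` and all of them in `c3`, so only `c1` remains after filtering. As in the loop above, a chat with a total word count of zero yields an undefined bias and is not flagged.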

### The re-created file with the filtered messages is now stored in the data folder as `messages_filtered_table1.csv`.