# Analysis for calculating score for Alerts


## Proposed formulas

This section introduces the formulas proposed for each type of alert (influencer, trend and popular topic).

### Alert type - Influencer

- Legend:
    - k = klout
    - f = followers
    - l = listed count
    - v = verb
    - kw = klout weight
    - fw = followers count weight
    - lw = listed count weight

- Values:
    - kw = 3
    - fw = 4
    - lw = 3
    - v = (tweet=1, retweet=0.85)

**influencer_score** = kw \* (k^2) / 10000 +
                        v \* fw \* log(f) / 20 +
                        v \* lw \* log(l) / 15


### Alert type - Trends
* Reach of tweets
* segment: marketing (categories: range from 1 to 10) vs business(15% more important than Marketing)

**trend_score** = reach + segment


### Alert type - Popular Tweet
* Engagement of tweet
* Potential of engajamento

**popular_score** = engagement + potential


## Getting Data

This section loads sample data for 2015-05-21 records of activities, rules and actors.

In [5]:
import pandas as pd
import numpy as np
import math
from datetime import datetime

### Activities

In [6]:
activities_cols_names = ["id","body", "country", "country_code", "place_type",
                       "sub_region", "actor_id", "source", "share_count",
                       "in_reply_to_native_id", "created_at", "updated_at",
                       "klout", "native_id", "verb", "latitude", "longitude",
                       "sharing_activity_native_id", "region", "favorites_count",
                       "replies_count", "in_reply_to_screen_name", "link"]

In [8]:
activities_relevant_cols = ["actor_id", "source", "share_count",
                           "in_reply_to_native_id", "klout", "verb", "favorites_count",
                           "replies_count"]

In [11]:
activities = pd.read_csv('../../s3/2015-05-21-01-00-00-activities.csv', 
                         header=None, parse_dates=True,
                         names=activities_cols_names, index_col="id")

In [12]:
activities = activities[activities_relevant_cols] 

In [112]:
activities.head()

Unnamed: 0_level_0,actor_id,source,share_count,in_reply_to_native_id,klout,verb,favorites_count,replies_count
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
443640355408254904,2419819735,twitter,0,,48,post,0,0
443640355408254905,2405587887,twitter,0,,54,post,0,0
443640355408254907,2425857238,twitter,0,,31,post,0,0
443640355408254908,2422152815,twitter,9,,19,share,14,0
443640355408254909,2406991214,twitter,0,,35,post,0,0


In [14]:
activities.describe()

Unnamed: 0,share_count,klout,favorites_count
count,1280768.0,1255925.0,1280865.0
mean,1878510000000.0,48364120000000.0,3338.222079
std,1062963000000000.0,5392960000000000.0,13146.340361
min,0.0,10.0,0.0
25%,1.0,29.0,0.0
50%,60.0,40.0,54.0
75%,709.0,45.0,782.0
max,6.015506e+17,6.015513e+17,462900.0


### Association Activities-Rules

In [15]:
activities_rules_cols_names = ["id", "rule_id", "activity_id", "created_at", "updated_at", "ignored"]

In [17]:
activities_rules = pd.read_csv('../../s3/2015-05-21-01-00-00-activities-rules.csv', 
                               parse_dates=True, names=activities_rules_cols_names, index_col="id")

In [18]:
activities_rules_relevant_cols = ["rule_id", "activity_id", "ignored"]

In [19]:
activities_rules = activities_rules[activities_rules_relevant_cols]

In [20]:
activities_rules.head()

Unnamed: 0_level_0,rule_id,activity_id,ignored
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
108642970,5772,4.436404e+17,0
108642971,5052,4.436404e+17,0
108642972,5428,4.436404e+17,0
108642973,1506,4.436404e+17,0
108642974,5077,4.436404e+17,0


### Rules

In [22]:
rules = pd.read_csv('../../s3/rule.csv', index_col="id")

In [23]:
rules_relevant_cols = ["business_id","segment","volume", "category", "source", "type", "direct"]

In [24]:
rules = rules[rules_relevant_cols]

In [25]:
rules.head()

Unnamed: 0_level_0,business_id,segment,volume,category,source,type,direct
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
1,144,1,0,2,twitter,TwitterRule,False
2,144,1,0,2,twitter,TwitterRule,False
3,144,1,0,2,twitter,TwitterRule,False
4,144,1,0,2,twitter,TwitterRule,False
5,144,1,0,2,twitter,TwitterRule,False


In [26]:
rules.describe()

Unnamed: 0,business_id,segment,volume,category,direct
count,4786.0,4786.0,4464.0,4786.0,4786
mean,314.314668,0.376097,232681.2,3.243627,0.0215211
std,125.258811,0.484455,8564815.0,1.819087,0.1451287
min,1.0,0.0,0.0,0.0,False
25%,262.0,0.0,16.0,2.0,0
50%,354.0,0.0,297.0,3.0,0
75%,412.0,1.0,3140.25,5.0,0
max,460.0,1.0,475071400.0,6.0,True


### Actors

In [189]:
actors = pd.read_csv('../../s3/actors.csv', index_col="id")

In [119]:
actors.head()

Unnamed: 0_level_0,lang,favourites_count,statuses_count,friends_count,followers_count,listed_count
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
2419819735,en,,16139,761,1065,71
2405587887,en,,3043,1368,140,5
2425857238,en,,15668,7,545,12
2422152815,en,,171,57,30,0
2406991214,en,,1341,306,190,7


In [82]:
actors.describe()

Unnamed: 0,favourites_count,statuses_count,friends_count,followers_count,listed_count
count,146204.0,1278952.0,1278953.0,1278953.0,1278650.0
mean,2288.747558,29485.832285,1787.903539,17574.086707,147.851316
std,8553.651691,63277.951238,11217.130245,358911.933053,2558.850142
min,0.0,0.0,-438.0,0.0,0.0
25%,15.0,2208.0,170.0,157.0,1.0
50%,225.0,9225.5,396.0,426.0,3.0
75%,1488.0,29864.0,1012.0,1242.0,14.0
max,492244.0,2051862.0,1593447.0,64199466.0,821163.0


## Computing score

This section implements the functions for computing the scores and generates the scores for subsequent validation.

### Alert type - Influencer

In [211]:
def generate_influencer_score(activity_id, k=None, f=None, l=None, verb=None):
    kw = 3
    fw = 4
    lw = 3
    
    if activity_id is not None:
        activity = activities.loc[str(activity_id)]
        actor = actors.loc[int(activity.actor_id)]
    
    if verb is None:
        verb = activity.verb
    
    if k is None: k = float(activity.klout)
    if f is None: f = float(actor.followers_count + 1.0)
    if l is None: l = float(actor.listed_count + 1.0)
    
    v = 1.0 if verb=="post" else 0.8
    
    return kw * (k ** 2) / 10000 + \
            v * fw * math.log(f) / 20 + \
            v * lw * math.log(l) / 15

In [209]:
for index, row in activities.head(30).iterrows():
    print "Generated score for activity {0}: {1}".format(index,generate_influencer_score(index))

Generated score for activity 443640355408254904: 2.94086694475
Generated score for activity 443640355408254905: 2.22290387192
Generated score for activity 443640355408254907: 2.06181366664
Generated score for activity 443640355408254908: 0.657737952718
Generated score for activity 443640355408254909: 1.83384299395
Generated score for activity 443640355408254910: 1.2453809881
Generated score for activity 443640355408254912: 2.81284157266
Generated score for activity 443640355408254913: 0.724298843186
Generated score for activity 443640355408254914: 3.06949359829
Generated score for activity 443640355408254916: 2.35859939049
Generated score for activity 443640355408254917: 0.980105195945
Generated score for activity 443640355408254920: 2.57045500031
Generated score for activity 443640355408254922: 1.43877247026
Generated score for activity 443640355408254923: 1.87936467721
Generated score for activity 443640355408254926: nan
Generated score for activity 443640355408254927: 4.78063858473


Now testing with predefined values for evaluating limits:

In [222]:
generate_influencer_score(activity_id=None, k=99, f=50000000, l=50000000, verb='post')

9.091013425356968

In [223]:
generate_influencer_score(activity_id=None, k=90, f=50000000, l=50000000, verb='post')

9.091013425356968

In [224]:
generate_influencer_score(activity_id=None, k=99, f=50000000, l=50000000, verb='share')

7.672810740285575

In [225]:
generate_influencer_score(activity_id=None, k=90, f=50000000, l=50000000, verb='share')

7.672810740285575

In [226]:
generate_influencer_score(activity_id=None, k=70, f=5000000, l=5000000, verb='post')

7.16997938815935

In [227]:
generate_influencer_score(activity_id=None, k=70, f=5000000, l=5000000, verb='share')

5.93598351052748

In [228]:
generate_influencer_score(activity_id=None, k=50, f=5000000, l=5000000, verb='post')

6.16997938815935

In [229]:
generate_influencer_score(activity_id=None, k=50, f=5000000, l=5000000, verb='share')

4.93598351052748

In [230]:
generate_influencer_score(activity_id=None, k=50, f=1000000, l=1000000, verb='post')

5.52620422318571

In [231]:
generate_influencer_score(activity_id=None, k=50, f=1000000, l=1000000, verb='share')

4.420963378548568