#### A database of 10k rows of random user data

#### Objective: To show users the most relevant posts at the top, based on their past interactions with the app.
#### Relevant post: The post a user is most likely to interact with.
Step 1: Determine the input variables a user generates for every post, such as upvotes, downvotes and time spent on the post (use the app to find more input variables).

Step 2: Populate these variables with random data for each user.

Step 3: Based on an analysis of the data, create a mathematical formula to rank each post (the top-ranked post is shown to the user first).

#### The columns we created: ["User_ID", "upvote", "downvote", "comments", "impressions", "followers", "timespent", "age", "gender", "clan", "rank"]

#### User_ID- 
It is the unique ID given to each user on the app

#### upvote-
It is the number of users who have liked or given a positive response to the post [random integer from 0 to 999]

#### downvote-
It is the number of users who have disliked or given a negative response to the post [random integer from 0 to 1000]

#### comments-
It is the total number of comments on a post by various users [assumed to be 10 percent of the sum of upvotes and downvotes]

#### impressions-
It is the total number of times the post has been seen by various users [assumed to be 150 percent of the sum of upvotes and downvotes]

#### followers-
It is the total number of users following a particular user [assumed to be 5 times the number of impressions]

#### viewers-
It is the total number of people who have viewed the post but have not upvoted or downvoted it [assumed to be impressions - (upvote + downvote)]

#### timespent-
It is the total time spent by various users on a single post [assuming 15 seconds per upvote or downvote, 30 seconds per comment and 10 seconds per viewer; converted to hours]

#### age-
It is the age of each user [random integer from 20 to 49]

#### gender-
The gender of each user, either male or female [random]

#### clan-
The community the post is posted in, such as cricket, politics or any other topic

#### Rank_value-
The total score of each post, given by the formula upvote - downvote + comments + impressions

#### rank-
The position of each post when all posts are ordered by Rank_value (highest first); this is the order in which posts are shown in the app
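For reference, the column definitions above can be written compactly, with $r$ denoting the number of users who reacted to a post. The floors reflect the `np.floor` calls used in the code cells; this is a summary of the notebook's own assumptions, not an established engagement model.

```latex
\begin{aligned}
r &= \text{upvote} + \text{downvote} \\
\text{comments} &= \lfloor r / 10 \rfloor \\
\text{impressions} &= \lfloor 1.5\, r \rfloor \\
\text{followers} &= 5 \times \text{impressions} \\
\text{viewers} &= \text{impressions} - r \\
\text{timespent} &= \frac{15\, r + 30\, \text{comments} + 10\, \text{viewers}}{3600} \quad \text{(hours)} \\
\text{Rank\_value} &= \text{upvote} - \text{downvote} + \text{comments} + \text{impressions}
\end{aligned}
```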

In [261]:
import numpy as np
import pandas as pd
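Because every column is random and no seed is set, rerunning the notebook produces different numbers each time. A minimal sketch of a reproducible alternative, assuming NumPy's `Generator` API and an arbitrary seed of 42 (neither is part of the original notebook; `make_votes` is a hypothetical helper). `rng.integers(0, 1000)` draws from the same 0 to 999 range that `np.floor(np.random.random() * 1000)` produces.

```python
import numpy as np
import pandas as pd

# Assumption: the seed value is arbitrary and only makes the data reproducible.
SEED = 42

def make_votes(seed: int, n: int = 10_000) -> pd.DataFrame:
    """Hypothetical helper: seeded upvote/downvote counts in 0..999."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame(
        {
            "User_ID": np.arange(1, n + 1),
            # integers(low, high) excludes high, so values fall in 0..999
            "upvote": rng.integers(0, 1000, size=n),
            "downvote": rng.integers(0, 1000, size=n),
        },
        index=range(1, n + 1),
    )

a = make_votes(SEED)
b = make_votes(SEED)
print(a.equals(b))  # True: same seed, same data
```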

In [262]:
column_names=["User_ID","upvote","downvote","comments","impressions","followers","timespent","age","gender","clan","rank"]
data=pd.DataFrame(index=range(1,10001),columns=column_names)

In [263]:
data.head()

Unnamed: 0,User_ID,upvote,downvote,comments,impressions,followers,timespent,age,gender,clan,rank
1,,,,,,,,,,,
2,,,,,,,,,,,
3,,,,,,,,,,,
4,,,,,,,,,,,
5,,,,,,,,,,,


In [264]:
data.shape

(10000, 11)

In [265]:
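# Assign IDs 1 to 10000 from the index in one step; chained assignment such as
# data["User_ID"][i]=i may not update the DataFrame under pandas copy-on-write
data["User_ID"]=data.index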

In [266]:
data.head()

Unnamed: 0,User_ID,upvote,downvote,comments,impressions,followers,timespent,age,gender,clan,rank
1,1,,,,,,,,,,
2,2,,,,,,,,,,
3,3,,,,,,,,,,
4,4,,,,,,,,,,
5,5,,,,,,,,,,


In [267]:
# One uniform random value in [0, 1000) per row
data["upvote"]=np.random.random(len(data))*1000

In [268]:
data["upvote"]=data["upvote"].apply(np.floor)

In [269]:
data.head()

Unnamed: 0,User_ID,upvote,downvote,comments,impressions,followers,timespent,age,gender,clan,rank
1,1,200.0,,,,,,,,,
2,2,569.0,,,,,,,,,
3,3,677.0,,,,,,,,,
4,4,630.0,,,,,,,,,
5,5,927.0,,,,,,,,,


In [270]:
# One uniform random value in [0, 1000) per row
data["downvote"]=np.random.random(len(data))*1000

In [271]:
data["downvote"]=data["downvote"].round(decimals=0)  # unlike floor (used for upvote), rounding lets values reach 1000

In [272]:
data.head()

Unnamed: 0,User_ID,upvote,downvote,comments,impressions,followers,timespent,age,gender,clan,rank
1,1,200.0,330.0,,,,,,,,
2,2,569.0,430.0,,,,,,,,
3,3,677.0,317.0,,,,,,,,
4,4,630.0,407.0,,,,,,,,
5,5,927.0,890.0,,,,,,,,


In [273]:
data.describe()

Unnamed: 0,upvote,downvote
count,10000.0,10000.0
mean,501.5117,499.5449
std,288.272429,288.085159
min,0.0,0.0
25%,251.0,251.0
50%,501.0,495.0
75%,754.0,749.0
max,999.0,1000.0
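The summary shows a maximum of 999 for upvote but 1000 for downvote. This follows from the two cells above: upvote is floored while downvote is rounded. A small sketch with hand-picked example values (not taken from the dataset):

```python
import numpy as np

# Example values of np.random.random() * 1000, which always lies in [0, 1000)
draws = np.array([0.0, 499.4, 999.6, 999.99])

floored = np.floor(draws)  # as used for upvote: the largest possible result is 999
rounded = np.round(draws)  # as used for downvote: anything from 999.5 up becomes 1000

print(floored.tolist())  # [0.0, 499.0, 999.0, 999.0]
print(rounded.tolist())  # [0.0, 499.0, 1000.0, 1000.0]
```

Using the same operation for both columns (for example `np.floor` everywhere) would give them identical ranges.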


#### As the data are completely random, we had to make several assumptions. Here we assume that the number of comments is roughly 1/10 of the number of users who reacted to the post.

#### comments = floor((upvote + downvote) / 10)
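As a quick check of the comments formula, the sketch below applies it to the first five upvote/downvote pairs from the head() output above. It also shows that, for non-negative integer counts, floor division `//` gives the same result as `np.floor(x / 10)` while keeping an integer dtype; that swap is a suggestion, not what the notebook does.

```python
import numpy as np
import pandas as pd

# First five upvote/downvote pairs from the head() output above
votes = pd.DataFrame({"upvote": [200, 569, 677, 630, 927],
                      "downvote": [330, 430, 317, 407, 890]})
reactions = votes["upvote"] + votes["downvote"]

# np.floor on floats, as in the notebook ...
comments_floor = np.floor(reactions / 10)
# ... equals integer floor division for non-negative counts, and stays int
comments_int = reactions // 10

print(comments_int.tolist())  # [53, 99, 99, 103, 181]
```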

In [274]:
data["comments"]=(data["upvote"]+data["downvote"])/10

In [275]:
data["comments"]=data["comments"].apply(np.floor)

In [276]:
data.head()

Unnamed: 0,User_ID,upvote,downvote,comments,impressions,followers,timespent,age,gender,clan,rank
1,1,200.0,330.0,53.0,,,,,,,
2,2,569.0,430.0,99.0,,,,,,,
3,3,677.0,317.0,99.0,,,,,,,
4,4,630.0,407.0,103.0,,,,,,,
5,5,927.0,890.0,181.0,,,,,,,


#### Not everyone who sees a post reacts to it, so we assume that total impressions are roughly 1.5 times the number of users who reacted to that post.

In [279]:
data["impressions"]=(data["upvote"]+data["downvote"])*1.5

In [280]:
data["impressions"]=data["impressions"].apply(np.floor)

In [281]:
data.head()

Unnamed: 0,User_ID,upvote,downvote,comments,impressions,followers,timespent,age,gender,clan,rank
1,1,200.0,330.0,53.0,795.0,,,,,,
2,2,569.0,430.0,99.0,1498.0,,,,,,
3,3,677.0,317.0,99.0,1491.0,,,,,,
4,4,630.0,407.0,103.0,1555.0,,,,,,
5,5,927.0,890.0,181.0,2725.0,,,,,,


#### The same holds for followers: not every follower will react to a particular post.
#### E.g. if a user has 1 million followers, not all of them will see or react to a post, for reasons such as not using the app often enough or simply missing it. So we set the total followers to 5 times the impressions.

#### followers = impressions * 5
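Because followers is defined as a fixed multiple of impressions, the two columns are perfectly correlated in this synthetic data, so followers adds no independent information as a feature. A quick check using the impressions of the first five rows shown above:

```python
import numpy as np
import pandas as pd

# impressions values from the first five rows shown above
impressions = pd.Series([795.0, 1498.0, 1491.0, 1555.0, 2725.0])
followers = impressions * 5

# Pearson correlation of x and 5x is 1 (up to floating-point error)
corr = impressions.corr(followers)
print(followers.tolist())  # [3975.0, 7490.0, 7455.0, 7775.0, 13625.0]
```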

In [282]:
data["followers"]=data["impressions"]*5

In [283]:
data["followers"]=data["followers"].apply(np.floor)

In [285]:
data["age"]=np.random.randint(20,50,size=len(data))  # ages 20 to 49: the upper bound is exclusive
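`np.random.randint(20, 50)` never returns 50, so ages run from 20 to 49. A sketch of the same draw with NumPy's `Generator`, assuming an arbitrary seed; `endpoint=True` is the documented way to make the upper bound inclusive if 50 should be possible.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for the sketch

# Like np.random.randint(20, 50), Generator.integers(20, 50) excludes the upper bound
ages = rng.integers(20, 50, size=10_000)

# endpoint=True includes the upper bound, so 50 becomes a possible age
ages_inclusive = rng.integers(20, 50, size=10_000, endpoint=True)

print(ages.min(), ages.max())
```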

In [286]:
data.head()

Unnamed: 0,User_ID,upvote,downvote,comments,impressions,followers,timespent,age,gender,clan,rank
1,1,200.0,330.0,53.0,795.0,3975.0,,26,,,
2,2,569.0,430.0,99.0,1498.0,7490.0,,42,,,
3,3,677.0,317.0,99.0,1491.0,7455.0,,46,,,
4,4,630.0,407.0,103.0,1555.0,7775.0,,40,,,
5,5,927.0,890.0,181.0,2725.0,13625.0,,39,,,


#### We simply randomize the gender column (1 = Male, 2 = Female)

In [287]:
data["gender"]=np.random.randint(1,3,size=len(data))  # 1 or 2

In [288]:
data.head()

Unnamed: 0,User_ID,upvote,downvote,comments,impressions,followers,timespent,age,gender,clan,rank
1,1,200.0,330.0,53.0,795.0,3975.0,,26,1,,
2,2,569.0,430.0,99.0,1498.0,7490.0,,42,2,,
3,3,677.0,317.0,99.0,1491.0,7455.0,,46,1,,
4,4,630.0,407.0,103.0,1555.0,7775.0,,40,1,,
5,5,927.0,890.0,181.0,2725.0,13625.0,,39,2,,


In [289]:
data["gender"].value_counts()

1    5009
2    4991
Name: gender, dtype: int64

In [290]:
data["gender"]=data["gender"].replace({1:"Male",2:"Female"})
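Drawing codes 1 and 2 and relabelling them works, but the labels can also be drawn directly. A minimal sketch using NumPy's `Generator.choice`, where the seed is an assumption added only for reproducibility; each label is equally likely, matching `randint(1, 3)` followed by `replace()`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed is an assumption, for reproducibility only

# Draw "Male"/"Female" directly with equal probability
gender = pd.Series(rng.choice(["Male", "Female"], size=10_000), name="gender")
print(gender.value_counts())
```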

In [291]:
data.head()

Unnamed: 0,User_ID,upvote,downvote,comments,impressions,followers,timespent,age,gender,clan,rank
1,1,200.0,330.0,53.0,795.0,3975.0,,26,Male,,
2,2,569.0,430.0,99.0,1498.0,7490.0,,42,Female,,
3,3,677.0,317.0,99.0,1491.0,7455.0,,46,Male,,
4,4,630.0,407.0,103.0,1555.0,7775.0,,40,Male,,
5,5,927.0,890.0,181.0,2725.0,13625.0,,39,Female,,


In [292]:
data["viewers"]=((data["impressions"])-(data["upvote"]+data["downvote"]))
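Since impressions is floor(1.5 × reactions) and reactions is an integer, viewers simplifies to floor(0.5 × reactions): in this data, the number of non-reacting viewers is about half the number of reactors. A check on the first five rows, whose reaction totals (upvote + downvote) come from the earlier head() output; the results match the viewers column in the next output.

```python
import numpy as np
import pandas as pd

# upvote + downvote for the first five rows
reactions = pd.Series([530, 999, 994, 1037, 1817])
impressions = np.floor(reactions * 1.5)

viewers = impressions - reactions
# For integer r: floor(1.5 r) - r == floor(0.5 r)
print(viewers.tolist())  # [265.0, 499.0, 497.0, 518.0, 908.0]
```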


In [293]:
data.head()

Unnamed: 0,User_ID,upvote,downvote,comments,impressions,followers,timespent,age,gender,clan,rank,viewers
1,1,200.0,330.0,53.0,795.0,3975.0,,26,Male,,,265.0
2,2,569.0,430.0,99.0,1498.0,7490.0,,42,Female,,,499.0
3,3,677.0,317.0,99.0,1491.0,7455.0,,46,Male,,,497.0
4,4,630.0,407.0,103.0,1555.0,7775.0,,40,Male,,,518.0
5,5,927.0,890.0,181.0,2725.0,13625.0,,39,Female,,,908.0


#### timespent = (upvote + downvote)*15 + comments*30 + viewers*10 seconds, divided by 3600 to get hours

In [294]:
data["timespent"]=((data["upvote"]+data["downvote"])*15+(data["comments"]*30)+(data["viewers"]*10))

In [295]:
data["timespent"]=data["timespent"]/3600  # convert seconds to hours

In [296]:
data.head()

Unnamed: 0,User_ID,upvote,downvote,comments,impressions,followers,timespent,age,gender,clan,rank,viewers
1,1,200.0,330.0,53.0,795.0,3975.0,3.386111,26,Male,,,265.0
2,2,569.0,430.0,99.0,1498.0,7490.0,6.373611,42,Female,,,499.0
3,3,677.0,317.0,99.0,1491.0,7455.0,6.347222,46,Male,,,497.0
4,4,630.0,407.0,103.0,1555.0,7775.0,6.618056,40,Male,,,518.0
5,5,927.0,890.0,181.0,2725.0,13625.0,11.601389,39,Female,,,908.0


In [297]:
data["clan"]=np.random.randint(1,14,size=len(data))  # clan codes 1 to 13

In [298]:
data["clan"].value_counts()

2     839
9     833
5     802
10    788
11    784
8     770
4     761
6     746
13    743
1     738
3     735
7     734
12    727
Name: clan, dtype: int64

### The clans (communities) used for the posts are:
1: "Technology", 2: "Travel", 3: "Internships", 4: "Jobs", 5: "Startups", 6: "Trending", 7: "Fashion", 8: "Politics", 9: "Music", 10: "Bollywood", 11: "Cricket", 12: "News", 13: "Food"

In [299]:
data["clan"]=data["clan"].replace({1:"Technology",2:"Travel",3:"Internships",4:"Jobs",5:"Startups",6:"Trending",7:"Fashion",8:"Politics",9:"Music",10:"Bollywood",11:"Cricket",12:"News",13:"Food"})
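Typing the 13 code/name pairs by hand invites typos. A sketch that builds the same 1 to 13 mapping from a list with `enumerate` and applies it with `Series.map`; note that `map` turns codes missing from the dict into NaN, whereas `replace` leaves them unchanged. The example codes are the ones matching the first five rows in the next output.

```python
import pandas as pd

CLANS = ["Technology", "Travel", "Internships", "Jobs", "Startups", "Trending",
         "Fashion", "Politics", "Music", "Bollywood", "Cricket", "News", "Food"]

# Same 1..13 -> name mapping as the replace() call above
clan_map = dict(enumerate(CLANS, start=1))

codes = pd.Series([2, 2, 10, 10, 3])  # codes for the first five rows
print(codes.map(clan_map).tolist())
# ['Travel', 'Travel', 'Bollywood', 'Bollywood', 'Internships']
```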

In [300]:
data.head()

Unnamed: 0,User_ID,upvote,downvote,comments,impressions,followers,timespent,age,gender,clan,rank,viewers
1,1,200.0,330.0,53.0,795.0,3975.0,3.386111,26,Male,Travel,,265.0
2,2,569.0,430.0,99.0,1498.0,7490.0,6.373611,42,Female,Travel,,499.0
3,3,677.0,317.0,99.0,1491.0,7455.0,6.347222,46,Male,Bollywood,,497.0
4,4,630.0,407.0,103.0,1555.0,7775.0,6.618056,40,Male,Bollywood,,518.0
5,5,927.0,890.0,181.0,2725.0,13625.0,11.601389,39,Female,Internships,,908.0


In [301]:
data["clan"].value_counts()

Travel         839
Music          833
Startups       802
Bollywood      788
Cricket        784
Politics       770
Jobs           761
Trending       746
Food           743
Technology     738
Internships    735
Fashion        734
News           727
Name: clan, dtype: int64

#### The viewers column was only needed for the timespent calculation, so we drop it

In [302]:
data.drop("viewers",axis=1,inplace=True)

In [303]:
data.head()

Unnamed: 0,User_ID,upvote,downvote,comments,impressions,followers,timespent,age,gender,clan,rank
1,1,200.0,330.0,53.0,795.0,3975.0,3.386111,26,Male,Travel,
2,2,569.0,430.0,99.0,1498.0,7490.0,6.373611,42,Female,Travel,
3,3,677.0,317.0,99.0,1491.0,7455.0,6.347222,46,Male,Bollywood,
4,4,630.0,407.0,103.0,1555.0,7775.0,6.618056,40,Male,Bollywood,
5,5,927.0,890.0,181.0,2725.0,13625.0,11.601389,39,Female,Internships,


#### To get the rank of each post, we need a mathematical formula that scores each post.

#### Rank_value = upvote - downvote + impressions + comments

#### Downvotes are subtracted from upvotes so that posts with more upvotes, rather than more downvotes, are shown at the top; impressions and comments are added so that popular posts are shown first.
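Under this notebook's assumptions, comments and impressions are themselves computed from upvote + downvote, so Rank_value can be expanded. Ignoring the floors, upvote - downvote + 1.5(upvote + downvote) + 0.1(upvote + downvote) = 2.6·upvote + 0.6·downvote. The net weight on downvotes is therefore positive, which is consistent with the top-ranked posts below having downvote counts close to 1000. A check on the first five rows from the earlier head() output; the floors make the exact score at most about 1.4 lower than the approximation.

```python
import pandas as pd

# First five rows from the earlier head() output
df = pd.DataFrame({"upvote": [200, 569, 677, 630, 927],
                   "downvote": [330, 430, 317, 407, 890]})
r = df["upvote"] + df["downvote"]
df["comments"] = r // 10
df["impressions"] = (r * 3) // 2  # floor(1.5 r) in integer arithmetic
df["rank_value"] = df["upvote"] - df["downvote"] + df["impressions"] + df["comments"]

# Expanded form without the floors
df["approx"] = 2.6 * df["upvote"] + 0.6 * df["downvote"]
print(df[["rank_value", "approx"]])
```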

In [305]:
data["Rank_value"]=data["upvote"]-data["downvote"]+data["impressions"]+data["comments"]

In [306]:
data.head()

Unnamed: 0,User_ID,upvote,downvote,comments,impressions,followers,timespent,age,gender,clan,rank,Rank_value
1,1,200.0,330.0,53.0,795.0,3975.0,3.386111,26,Male,Travel,,718.0
2,2,569.0,430.0,99.0,1498.0,7490.0,6.373611,42,Female,Travel,,1736.0
3,3,677.0,317.0,99.0,1491.0,7455.0,6.347222,46,Male,Bollywood,,1950.0
4,4,630.0,407.0,103.0,1555.0,7775.0,6.618056,40,Male,Bollywood,,1881.0
5,5,927.0,890.0,181.0,2725.0,13625.0,11.601389,39,Female,Internships,,2943.0


In [307]:
data["rank"]=data["Rank_value"].rank(ascending=False)  # highest Rank_value gets rank 1; ties share the average rank
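`Series.rank` defaults to `method="average"`, which is why tied posts share ranks such as 3.5 and 5.5 in the output below. If the feed needs a strict order, `method="first"` breaks ties by position, and `method="min"` gives competition-style ranks. A sketch on the top five Rank_value scores shown below (in the full data 3148 is also tied with another post, hence 5.5 there rather than 5):

```python
import pandas as pd

# Rank_value of the top rows shown below; two posts tie at 3160
scores = pd.Series([3166, 3162, 3160, 3160, 3148])

for method in ["average", "min", "first"]:
    print(method, scores.rank(ascending=False, method=method).tolist())
# average [1.0, 2.0, 3.5, 3.5, 5.0]
# min [1.0, 2.0, 3.0, 3.0, 5.0]
# first [1.0, 2.0, 3.0, 4.0, 5.0]
```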

In [309]:
data=data.sort_values("rank",ascending=True)

In [310]:
data.head(5)

Unnamed: 0,User_ID,upvote,downvote,comments,impressions,followers,timespent,age,gender,clan,rank,Rank_value
8594,8594,991.0,983.0,197.0,2961.0,14805.0,12.608333,23,Female,News,1.0,3166.0
3284,3284,997.0,952.0,194.0,2923.0,14615.0,12.443056,21,Female,Music,2.0,3162.0
3699,3699,989.0,981.0,197.0,2955.0,14775.0,12.586111,29,Female,Bollywood,3.5,3160.0
8274,8274,996.0,952.0,194.0,2922.0,14610.0,12.438889,28,Female,Fashion,3.5,3160.0
7286,7286,991.0,954.0,194.0,2917.0,14585.0,12.420833,48,Male,Travel,5.5,3148.0


In [311]:
data.drop("Rank_value",axis=1,inplace=True)

### Top 10 posts

In [312]:
data.head(10)

Unnamed: 0,User_ID,upvote,downvote,comments,impressions,followers,timespent,age,gender,clan,rank
8594,8594,991.0,983.0,197.0,2961.0,14805.0,12.608333,23,Female,News,1.0
3284,3284,997.0,952.0,194.0,2923.0,14615.0,12.443056,21,Female,Music,2.0
3699,3699,989.0,981.0,197.0,2955.0,14775.0,12.586111,29,Female,Bollywood,3.5
8274,8274,996.0,952.0,194.0,2922.0,14610.0,12.438889,28,Female,Fashion,3.5
7286,7286,991.0,954.0,194.0,2917.0,14585.0,12.420833,48,Male,Travel,5.5
2249,2249,995.0,935.0,193.0,2895.0,14475.0,12.330556,38,Male,Jobs,5.5
1538,1538,979.0,999.0,197.0,2967.0,14835.0,12.630556,33,Female,Internships,7.0
6609,6609,981.0,972.0,195.0,2929.0,14645.0,12.473611,26,Female,Trending,8.0
9303,9303,978.0,982.0,196.0,2940.0,14700.0,12.522222,41,Female,Jobs,9.0
2649,2649,979.0,974.0,195.0,2929.0,14645.0,12.473611,34,Male,Bollywood,10.0


### Bottom 10 posts

In [313]:
data.tail(10)

Unnamed: 0,User_ID,upvote,downvote,comments,impressions,followers,timespent,age,gender,clan,rank
7794,7794,13.0,36.0,4.0,73.0,365.0,0.304167,41,Male,Internships,9991.0
3498,3498,17.0,14.0,3.0,46.0,230.0,0.195833,36,Female,Startups,9992.5
1370,1370,7.0,57.0,6.0,96.0,480.0,0.405556,33,Female,News,9992.5
1194,1194,3.0,71.0,7.0,111.0,555.0,0.469444,28,Female,Music,9994.0
7403,7403,6.0,50.0,5.0,84.0,420.0,0.352778,39,Male,Startups,9995.0
4287,4287,10.0,31.0,4.0,61.0,305.0,0.259722,33,Male,Trending,9996.0
1642,1642,3.0,59.0,6.0,93.0,465.0,0.394444,31,Female,Bollywood,9997.0
9609,9609,7.0,28.0,3.0,52.0,260.0,0.218056,24,Male,Trending,9998.0
4307,4307,1.0,45.0,4.0,69.0,345.0,0.288889,27,Female,Jobs,9999.0
188,188,6.0,18.0,2.0,36.0,180.0,0.15,21,Male,Food,10000.0


### With this method we can rank users' posts, so that whenever a person logs into their account they see posts in order of rank
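Serving a feed from the ranked table then amounts to selecting the best-ranked rows. A minimal sketch with a hypothetical `top_posts` helper (not part of the notebook); the optional clan filter is an illustrative assumption that uses the existing clan column, and the tiny frame reuses three rows from the tables above.

```python
from typing import Optional

import pandas as pd

def top_posts(df: pd.DataFrame, n: int = 10, clan: Optional[str] = None) -> pd.DataFrame:
    """Hypothetical feed helper: the n best-ranked posts, optionally within one clan.

    Assumes df has the notebook's numeric "rank" column (1 = best).
    """
    if clan is not None:
        df = df[df["clan"] == clan]
    return df.nsmallest(n, "rank")

posts = pd.DataFrame({
    "User_ID": [8594, 3284, 188],
    "clan": ["News", "Music", "Food"],
    "rank": [1.0, 2.0, 10000.0],
})
print(top_posts(posts, n=2)["User_ID"].tolist())               # [8594, 3284]
print(top_posts(posts, n=2, clan="Food")["User_ID"].tolist())  # [188]
```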

In [304]:
data.to_csv("Revidly_Analytics.csv",index=False)

#### I also built this project in Excel: you enter just the upvotes and downvotes of a post (plus some user information such as gender, clan and age), and the sheet automatically computes the other columns: comments, impressions, followers, time spent on the post, Rank_value and rank.
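The same "enter upvote and downvote, get everything else" workflow can be sketched in Python. The hypothetical `derive_columns` function below applies the notebook's assumed ratios to a single post; it is not a reproduction of the Excel sheet. The rank itself is left out because it depends on all posts.

```python
import math

def derive_columns(upvote: int, downvote: int) -> dict:
    """Hypothetical helper: derived columns for one post from its vote counts."""
    reactions = upvote + downvote
    comments = reactions // 10                  # 10% of reactors
    impressions = math.floor(reactions * 1.5)   # 150% of reactors
    followers = impressions * 5
    viewers = impressions - reactions           # saw the post but did not vote
    seconds = reactions * 15 + comments * 30 + viewers * 10
    return {
        "comments": comments,
        "impressions": impressions,
        "followers": followers,
        "timespent": seconds / 3600,            # hours
        "Rank_value": upvote - downvote + impressions + comments,
    }

# First post of the dataset: upvote 200, downvote 330
print(derive_columns(200, 330))
```

The result for (200, 330) matches the first row of the tables above: 53 comments, 795 impressions, 3975 followers, about 3.386 hours and a Rank_value of 718.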