In [2]:
import pandas as pd
import matplotlib.pyplot as plt
pd.set_option('display.max_colwidth', None)  # show full cell contents (-1 is deprecated in newer pandas)
import pickle

In [82]:
with open("insta_data.pickle", 'rb') as f:
    df = pickle.load(f)

## Engagement Rate

An interesting statistic that Instagram "influencers" always focus on is the <b>engagement rate</b>. <br>

[Scrunch](https://www.scrunch.com/blog/what-is-a-good-engagement-rate-on-instagram) describes this as:

<i> To calculate an engagement rate on Instagram, follow the steps below:

1. Look at all of the influencers posts for the last 30 days and add up the total number of likes and comments on each post (e.g. if there are 17 posts in the last 30 days, add up the number of likes and comments on each of the 17 posts). 
2. Divide that number by the number of posts there are in the last 30 days (e.g. divide the total number of likes and comments from above, by the number of posts - 17 in this example). 
3. Now that you have the average engagements per post, divide that by the number of followers the influencer has. 
4. Finally, times the number above by 100, to turn the numbers into a percentage (the percentage will usually be between 0 and 10). This is the engagement rate of the influencer on Instagram.

</i>
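The four steps quoted above can be sketched as a small function. This is only an illustration of the arithmetic, not code from Scrunch; the function name `engagement_rate` and the example numbers are made up for the sketch.

```python
def engagement_rate(posts, followers):
    """Return the engagement rate (in percent) for one account.

    posts: list of (likes, comments) tuples for the last 30 days.
    followers: the account's follower count.
    """
    if not posts or followers <= 0:
        return 0.0
    # Step 1: add up likes and comments over all posts.
    total_engagements = sum(likes + comments for likes, comments in posts)
    # Step 2: divide by the number of posts.
    average_per_post = total_engagements / len(posts)
    # Steps 3 and 4: divide by followers and convert to a percentage.
    return 100 * average_per_post / followers


# Hypothetical account: 3 posts in the last 30 days, 2,000 followers.
example_posts = [(50, 5), (70, 10), (40, 5)]
print(engagement_rate(example_posts, 2000))  # 3.0
```

With a single post in `posts`, this reduces to the per-post rate `100 * (likes + comments) / followers`.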

This is a statistic that sponsors use to determine whether or not to work with a particular user. Unfortunately, given that our dataset is centered more around posts rather than users, we can't calculate an accurate engagement rate for a user. Rather, we'll calculate an engagement rate for a particular post and use that.  


In [83]:
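# Flag rows with a usable follower count; a count of exactly 6164 is treated as invalid here.
df['valid_followers'] = (df['followers'].astype(int) != 6164).astype(int)
# Per-post engagement rate in percent; rows flagged as invalid get a rate of 0.
df['engagement_rate'] = df['valid_followers'] * 100 * (df['likes'] + df['comments']) / df['followers']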

Scrunch then goes on to describe how these engagement rates are interpreted:

- Less than 1% = low engagement rate 
- Between 1% and 3.5% = average/good engagement rate 
- Between 3.5% and 6% = high engagement rate 
- Above 6% = very high engagement rate 

A low engagement rate is generally associated with unengaged followers, which can mean that the user has "fake" or "ghost" followers.
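As a rough sketch, the bands above can be turned into a lookup. The list does not say which band a rate of exactly 1%, 3.5% or 6% belongs to, so this sketch assumes each boundary value falls into the higher band; the names `BAND_EDGES`, `BAND_LABELS` and `engagement_band` are hypothetical.

```python
from bisect import bisect_right

# Lower edges of the "average/good", "high" and "very high" bands, in percent.
BAND_EDGES = [1.0, 3.5, 6.0]
BAND_LABELS = ["low", "average/good", "high", "very high"]


def engagement_band(rate):
    """Map an engagement rate (in percent) to its band label."""
    # bisect_right counts how many edges are <= rate, which is the band index.
    return BAND_LABELS[bisect_right(BAND_EDGES, rate)]


for rate in (0.4, 1.0, 3.5, 7.2):
    print(rate, engagement_band(rate))
```

Applied to this notebook, something like `df['engagement_rate'].apply(engagement_band)` would label every post.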

### Low Engagement
The count below shows that 8,371 of our posts have an engagement rate under 1%, which suggests that their authors have very low engagement, ghost followers, or both.

In [57]:
ghost_df[ghost_df['engagement_rate'] < 1.0].count()

caption                      8371
comments                     8371
display_url                  8371
id                           8371
likes                        8371
owner_name                   8371
shortcode                    8371
taken_at_timestamp           8371
caption_rating               8371
followers                    8371
normalized_likes             8371
normalized_comments          8371
english                      8371
normalized_caption_rating    8371
hashtags                     8371
caption_no_hashtags          8371
number_of_hashtags           8371
engagement_rate              8371
valid_followers              8371
dtype: int64

In [58]:
ghost_df[ghost_df['engagement_rate'] < 1.0]

Unnamed: 0,caption,comments,display_url,id,likes,owner_name,shortcode,taken_at_timestamp,caption_rating,followers,normalized_likes,normalized_comments,english,normalized_caption_rating,hashtags,caption_no_hashtags,number_of_hashtags,engagement_rate,valid_followers
32,Dottore per un secondo😂,0,https://scontent-iad3-1.cdninstagram.com/vp/ed1ab754bf6ae4b08353641b732cc8bb/5DC9A40F/t51.2885-15/e35/p1080x1080/61986741_383158782335324_1143395176447631610_n.jpg?_nc_ht=scontent-iad3-1.cdninstagram.com,2087149407867797248,24,valerio_di_cara_,Bz3C1QJIH7j,1563027619,1.20,2463.0,0.009744,0.000000,True,0.097442,[],Dottore per un secondo😂,0,0.974421,1
63,Time flies when your having #fun... #summervacation,0,https://scontent-iad3-1.cdninstagram.com/vp/fda410cbb4ee1760b066a4d996aab927/5DAC8C25/t51.2885-15/e35/67495814_414261665849266_7711077849897837445_n.jpg?_nc_ht=scontent-iad3-1.cdninstagram.com,2090311162169603328,1,sullyb85,B0CRu1TJzTk,1563404530,0.05,243.0,0.004115,0.000000,True,0.041152,"[#fun..., #summervacation]",Time flies when your having,2,0.411523,1
69,Rpg snipe lol Dont mind tags😉😉 #controllergang #og #fortnite #rarefortniteskins #renegaderaider #fun #remotecontrolgrinder #controlleronpc #pc #legend #grind #grinder #fortnitecontrollergang #fortniteclips #fortnitevbucks #fortniteaccountsforsale #followforfollowback #follow4followback #followme #likeforlikes #like4likes #controller #worldstar #smallbusiness #noticeme #notice #1v1 #controllergang🎮 #fortnitemontage #rpg,0,https://scontent-iad3-1.cdninstagram.com/vp/774e6d0f882d2ebbcd9045bd1d6baca0/5D323B8C/t51.2885-15/e35/54277491_310461989632120_8712804156913365042_n.jpg?_nc_ht=scontent-iad3-1.cdninstagram.com,2015001952166709248,4,mystical_savez,Bv2uZl8nYht,1554426993,0.20,506.0,0.007905,0.000000,True,0.079051,"[#controllergang, #og, #fortnite, #rarefortniteskins, #renegaderaider, #fun, #remotecontrolgrinder, #controlleronpc, #pc, #legend, #grind, #grinder, #fortnitecontrollergang, #fortniteclips, #fortnitevbucks, #fortniteaccountsforsale, #followforfollowback, #follow4followback, #followme, #likeforlikes, #like4likes, #controller, #worldstar, #smallbusiness, #noticeme, #notice, #1v1, #controllergang🎮, #fortnitemontage, #rpg]",Rpg snipe lol Dont mind tags😉😉,30,0.790514,1
85,#spatz #sparrow #bird #vogel #canon #birds #nature #kr #sunshine #natur #s #takt #sperling #simson #gel #fun #garden #birdlife #spatzen #enduro #ddr #oldschool #stoke #tunning #birdstagram #star #ostblech #ostalgie #birdsofinstagram #bhfyp,0,https://scontent-iad3-1.cdninstagram.com/vp/52d53dfaf39c9614eaf154f4dc472a8d/5DAF552C/t51.2885-15/e35/67259030_877234225972318_9160792364117302747_n.jpg?_nc_ht=scontent-iad3-1.cdninstagram.com,2090311000629786880,2,michaelkaistefankern,B0CRse2oTiR,1563404511,0.10,2524.0,0.000792,0.000000,True,0.007924,"[#spatz, #sparrow, #bird, #vogel, #canon, #birds, #nature, #kr, #sunshine, #natur, #s, #takt, #sperling, #simson, #gel, #fun, #garden, #birdlife, #spatzen, #enduro, #ddr, #oldschool, #stoke, #tunning, #birdstagram, #star, #ostblech, #ostalgie, #birdsofinstagram, #bhfyp]",,30,0.079239,1
89,To the charming eyes and to the 😝 . . . . 🎵7 rings - Ariana Grande 🎵 . . . . #red #instagood #cute #photooftheday #beautiful #inspiration #she #girl #picoftheday #summer #fun #smile #friends #instadaily #instafashion #igers #sydney #wanderlust #shoottokill #portrait #candid #love #portraits_shots #pursuitofportraits #ig_australia #life,0,https://scontent-iad3-1.cdninstagram.com/vp/a113d66b7573b38ba6f847c4151895af/5DED4A09/t51.2885-15/e35/66463128_2532663373463557_7678586276760378928_n.jpg?_nc_ht=scontent-iad3-1.cdninstagram.com,2090311349159917056,1,dalravi,B0CRxjcpPZ5,1563404552,0.05,740.0,0.001351,0.000000,True,0.013514,"[#red, #instagood, #cute, #photooftheday, #beautiful, #inspiration, #she, #girl, #picoftheday, #summer, #fun, #smile, #friends, #instadaily, #instafashion, #igers, #sydney, #wanderlust, #shoottokill, #portrait, #candid, #love, #portraits_shots, #pursuitofportraits, #ig_australia, #life]",To the charming eyes and to the 😝 . . . . 🎵7 rings - Ariana Grande 🎵 . . . .,26,0.135135,1
90,Working on a new painting! #artist #art #watercolor #originals #soleizcreations #inspired #colors #learning #fun #instagram #stayinspired #facebook #tumblr #twitter,0,https://scontent-iad3-1.cdninstagram.com/vp/6ee7feb4d26f0ef802d9b44dda9b67d0/5DE9C19A/t51.2885-15/e35/65662835_668334106990430_7030765836764808268_n.jpg?_nc_ht=scontent-iad3-1.cdninstagram.com,2090311395582580480,1,soleiz_creations,B0CRyOrppb1,1563404558,0.05,175.0,0.005714,0.000000,True,0.057143,"[#artist, #art, #watercolor, #originals, #soleizcreations, #inspired, #colors, #learning, #fun, #instagram, #stayinspired, #facebook, #tumblr, #twitter]",Working on a new painting!,14,0.571429,1
93,🐶❤🐾 . . . . . #selfie #selfienation #selfies #TagsForLikes #TFLers #TagsForLikesApp #me #love #pretty #handsome #stagood #instaselfie #selfietime #face #shamelessselefie #life #hair #portrait #igers #fun #followme #instalove #smile #sigileily #yasy #seguir,0,https://scontent-iad3-1.cdninstagram.com/vp/c3372879d8fbd46d150aa3cb3569656a/5DB2DC80/t51.2885-15/e35/66691050_411828946090694_2171251868928016294_n.jpg?_nc_ht=scontent-iad3-1.cdninstagram.com,2090311411855332608,3,soylizbethojeda,B0CRyd1lRS9,1563404560,0.15,524.0,0.005725,0.000000,True,0.057252,"[#selfie, #selfienation, #selfies, #TagsForLikes, #TFLers, #TagsForLikesApp, #me, #love, #pretty, #handsome, #stagood, #instaselfie, #selfietime, #face, #shamelessselefie, #life, #hair, #portrait, #igers, #fun, #followme, #instalove, #smile, #sigileily, #yasy, #seguir]",🐶❤🐾 . . . . .,26,0.572519,1
105,#love #instagood #photooftheday #fashion #beautiful #happy #cute #tbt #like4like #followme #picoftheday #follow #me #selfie #summer #art #instadaily #friends #repost #nature #girl #fun #style #smile #food,0,https://scontent-iad3-1.cdninstagram.com/vp/e27d2e24fc1b59e441b03188cadfc031/5DBCF471/t51.2885-15/e35/66156596_367486670629977_1872985289968357677_n.jpg?_nc_ht=scontent-iad3-1.cdninstagram.com,2090311503860208384,5,francoisenterpriseus,B0CRzzhgBM0,1563404571,0.25,1454.0,0.003439,0.000000,True,0.034388,"[#love, #instagood, #photooftheday, #fashion, #beautiful, #happy, #cute, #tbt, #like4like, #followme, #picoftheday, #follow, #me, #selfie, #summer, #art, #instadaily, #friends, #repost, #nature, #girl, #fun, #style, #smile, #food]",,25,0.343879,1
107,tntnTNNNNNN . . . . . . #meme #memes #dankmemes #memesdaily #fun #funny #funnymemes #marvel #marvelmemes #captainamerica #america #movie #netflux #movies #hbo #got #music #happy #art #goth,0,https://scontent-iad3-1.cdninstagram.com/vp/ca896bb244f93be7f79344519b546196/5DA6E4E5/t51.2885-15/e35/66058638_321947022024186_5259161162852912593_n.jpg?_nc_ht=scontent-iad3-1.cdninstagram.com,2090311526016038912,1,random_memes165,B0CR0IKFzgF,1563404573,0.05,1049.0,0.000953,0.000000,True,0.009533,"[#meme, #memes, #dankmemes, #memesdaily, #fun, #funny, #funnymemes, #marvel, #marvelmemes, #captainamerica, #america, #movie, #netflux, #movies, #hbo, #got, #music, #happy, #art, #goth]",tntnTNNNNNN . . . . . .,20,0.095329,1
108,Sunflower 💛 #makeup #beauty #cute #makeuplook #eyeshadow #morphe #jaclynhillpalette #art #model #pose #selfie #yellow #followforfollow #inspo #love #fun #bright #color,0,https://scontent-iad3-1.cdninstagram.com/vp/e2e93b620f632bca2b1d6aab332da5f7/5DBD5CC6/t51.2885-15/e35/64935976_552602261939854_6743560522900431396_n.jpg?_nc_ht=scontent-iad3-1.cdninstagram.com,2090311526377792256,1,makeup_by_bxcca,B0CR0IfpyMm,1563404573,0.05,321.0,0.003115,0.000000,True,0.031153,"[#makeup, #beauty, #cute, #makeuplook, #eyeshadow, #morphe, #jaclynhillpalette, #art, #model, #pose, #selfie, #yellow, #followforfollow, #inspo, #love, #fun, #bright, #color]",Sunflower 💛,18,0.311526,1


As you can see, these users look suspicious: most of them have several hundred followers or more, yet their posts have far fewer likes (at most low two-digit values).

### Influencers and High Engagement

Let's look at who could potentially be influencers though. Users generally have to have around 15k followers before they get their first offer from a sponsor, so let's first identify those users before looking at their engagement rate.

In [59]:
influencer_df = df[df['followers'] > 15000]

In [60]:
influencer_df.count()

caption                      1024
comments                     1024
display_url                  1024
id                           1024
likes                        1024
owner_name                   1024
shortcode                    1024
taken_at_timestamp           1024
caption_rating               1024
followers                    1024
normalized_likes             1024
normalized_comments          1024
english                      1024
normalized_caption_rating    1024
hashtags                     1024
caption_no_hashtags          1024
number_of_hashtags           1024
engagement_rate              1024
valid_followers              1024
dtype: int64

In [61]:
influencer_df

Unnamed: 0,caption,comments,display_url,id,likes,owner_name,shortcode,taken_at_timestamp,caption_rating,followers,normalized_likes,normalized_comments,english,normalized_caption_rating,hashtags,caption_no_hashtags,number_of_hashtags,engagement_rate,valid_followers
239,"""There are two types of people who will tell you that you cannot make it: those who are afraid to try and those who are afraid you will succeed."" Don't let anyone stop you from chasing your dreams. #love #instagood #me #cute #tbt #photooftheday #instamood #iphonesia #tweegram #picoftheday #igers #model #beautiful #instadaily #summer #instagramhub #iphoneonly #follow #igdaily #bestoftheday #happy #juniormodel #armani #jiujitsu #sky #nofilter #fashion #followme #fun #sun",0,https://scontent-iad3-1.cdninstagram.com/vp/28141e793a062d52fdfc9bd7a12fef65/5DB9FD3B/t51.2885-15/e35/66071830_1579020355566824_5036096857841393114_n.jpg?_nc_ht=scontent-iad3-1.cdninstagram.com,2090310392705783040,66,_brandonblake_,B0CRjoriRVN,1563404438,3.30,31520.0,0.002094,0.000000,True,0.020939,"[#love, #instagood, #me, #cute, #tbt, #photooftheday, #instamood, #iphonesia, #tweegram, #picoftheday, #igers, #model, #beautiful, #instadaily, #summer, #instagramhub, #iphoneonly, #follow, #igdaily, #bestoftheday, #happy, #juniormodel, #armani, #jiujitsu, #sky, #nofilter, #fashion, #followme, #fun, #sun]","""There are two types of people who will tell you that you cannot make it: those who are afraid to try and those who are afraid you will succeed."" Don't let anyone stop you from chasing your dreams.",30,0.209391,1
480,Something special 🌟 #work #anfitriona #bike #boots #happy #picoftheday #instagram #followme #style #follow #instadaily #body #best #beautiful #black #peruvian #cute #fitness #nature #beauty #girl #fun #photo #amazing #likeforlike #instagood #sexy,0,https://scontent-iad3-1.cdninstagram.com/vp/343755bd39170afeac7ba35e03073a95/5DE87D5B/t51.2885-15/e35/67106274_901334966885700_7894343354345948638_n.jpg?_nc_ht=scontent-iad3-1.cdninstagram.com,2090310845087436544,21,claubegazopf,B0CRqN_lU73,1563404492,1.05,45303.0,0.000464,0.000000,True,0.004635,"[#work, #anfitriona, #bike, #boots, #happy, #picoftheday, #instagram, #followme, #style, #follow, #instadaily, #body, #best, #beautiful, #black, #peruvian, #cute, #fitness, #nature, #beauty, #girl, #fun, #photo, #amazing, #likeforlike, #instagood, #sexy]",Something special 🌟,27,0.046355,1
587,Follow for more!,0,https://scontent-iad3-1.cdninstagram.com/vp/1c68b07029aecccffe0495db592d3179/5D31FF02/t51.2885-15/e35/66020468_579053302630368_7986130505926221900_n.jpg?_nc_ht=scontent-iad3-1.cdninstagram.com,2090307228054011136,114,singledestiny,B0CQ1lXlzyL,1563404070,5.70,35313.0,0.003228,0.000000,True,0.032283,[],Follow for more!,0,0.322827,1
654,How many kids do you wanna have 🤔😭 Follow ~~~~~~~~~~~~~~~~~~~~~~~~~~ #mood #litdances #lit #dance #funnymoods #dance #moody #zoom #great #funny #explorepage #shoot #worldstar #view #litty #football #fun #explore #ke #uzi ##viral #trends #trending #itslit #dance #moodedits #moods,0,https://scontent-iad3-1.cdninstagram.com/vp/49ed01ebe9310992c3bad7a045c04441/5D31BD74/t51.2885-15/e15/66788618_658713601313772_4397701175024512833_n.jpg?_nc_ht=scontent-iad3-1.cdninstagram.com,2090303419608492288,42,comikazed,B0CP-KejGjK,1563404171,2.10,20495.0,0.002049,0.000000,True,0.020493,"[#mood, #litdances, #lit, #dance, #funnymoods, #dance, #moody, #zoom, #great, #funny, #explorepage, #shoot, #worldstar, #view, #litty, #football, #fun, #explore, #ke, #uzi, ##viral, #trends, #trending, #itslit, #dance, #moodedits, #moods]",How many kids do you wanna have 🤔😭 Follow ~~~~~~~~~~~~~~~~~~~~~~~~~~,27,0.204928,1
669,Brunch Set ✨New Arrivals ✨|| Weekend Essentials || CLICK ON THE PHOTO TO SHOP || || 💻 Weekendr.us . . #sunglasses #accessories #theweekend #Fashion #Streetstyle #style #glamour #menseyewear #weekend__rush #beyDay #pink #neon #neonlights #drip #denim #denimjacket #style #neoncolors #fun #party #drip #turnup #vacay #sunny #yeezy #yeezyseason #beach #sexy #neon,0,https://scontent-iad3-1.cdninstagram.com/vp/bdf4ed922e81aa84f2da0be3b8d3bf59/5DB40D20/t51.2885-15/e35/p1080x1080/66468731_149849479464736_3867257127914223742_n.jpg?_nc_ht=scontent-iad3-1.cdninstagram.com,2090311100519205632,4,weekend__rush,B0CRt74gMqr,1563404522,0.20,22959.0,0.000174,0.000000,True,0.001742,"[#sunglasses, #accessories, #theweekend, #Fashion, #Streetstyle, #style, #glamour, #menseyewear, #weekend__rush, #beyDay, #pink, #neon, #neonlights, #drip, #denim, #denimjacket, #style, #neoncolors, #fun, #party, #drip, #turnup, #vacay, #sunny, #yeezy, #yeezyseason, #beach, #sexy, #neon]",Brunch Set ✨New Arrivals ✨|| Weekend Essentials || CLICK ON THE PHOTO TO SHOP || || 💻 Weekendr.us . .,29,0.017422,1
800,New series 🤩 Follow me ~~~~~~~~~~~~~~~~~~~~~~~~~~ #mood #litdances #lit #dances #funnymoods #dance #moody #zoom #great #funny #explorepage #shoot #worldstar #view #litty #football #fun #explore #smoke #beef #uzi #dog #viral #trends #trending #itslit #dance #moodedits #moods,0,https://scontent-iad3-1.cdninstagram.com/vp/83e190c115b8400e333a222dbd04911c/5D31ED61/t51.2885-15/e35/67191718_877178959308219_7840838386379501050_n.jpg?_nc_ht=scontent-iad3-1.cdninstagram.com,2090306544851494144,29,postz.burberry,B0CQrpFjCV4,1563404037,1.45,52100.0,0.000557,0.000000,True,0.005566,"[#mood, #litdances, #lit, #dances, #funnymoods, #dance, #moody, #zoom, #great, #funny, #explorepage, #shoot, #worldstar, #view, #litty, #football, #fun, #explore, #smoke, #beef, #uzi, #dog, #viral, #trends, #trending, #itslit, #dance, #moodedits, #moods]",New series 🤩 Follow me ~~~~~~~~~~~~~~~~~~~~~~~~~~,29,0.055662,1
1073,"Lights, Camera, SMACKDOWN! Make your own #BoxSumo videos and share them with us! Click the link in our bio to learn how! *********************************************** #HEXBUG #toy #robot #robotics #play #fun #science #tech #stem #toys #engineering #robots #learn #imagination #creative #innovation #construction #omgrobots #VEX #VEXRobotics",0,https://scontent-iad3-1.cdninstagram.com/vp/a16aa2b7c6c7cd196329444a16f993b6/5DEC12C4/t51.2885-15/e35/66687894_211410106487419_6934383019342725241_n.jpg?_nc_ht=scontent-iad3-1.cdninstagram.com,2090305361312410112,10,hexbug,B0CQaa1HupE,1563403838,0.50,22272.0,0.000449,0.000000,True,0.004490,"[#BoxSumo, #HEXBUG, #toy, #robot, #robotics, #play, #fun, #science, #tech, #stem, #toys, #engineering, #robots, #learn, #imagination, #creative, #innovation, #construction, #omgrobots, #VEX, #VEXRobotics]","Lights, Camera, SMACKDOWN! Make your own videos and share them with us! Click the link in our bio to learn how! ***********************************************",21,0.044899,1
1247,Some things are better left alone. . . . . . . . . . . #meme #memes #funny #dankmemes #lol #memesdaily #funnymemes #dank #follow #like #fortnite #lmao #dankmeme #love #anime #edgymemes #humor #comedy #edgy #offensivememes #fun #f #cringe #instagram #art #bhfyp #offensive #sad #gay,0,https://scontent-iad3-1.cdninstagram.com/vp/789216f46d240f0d13fb937a21264e5d/5D31CEE7/t51.2885-15/e35/65640772_1348754661949099_3110950822081721374_n.jpg?_nc_ht=scontent-iad3-1.cdninstagram.com,2090300706498434816,5,ovu.16,B0CPWrsllck,1563403575,0.25,27120.0,0.000184,0.000000,True,0.001844,"[#meme, #memes, #funny, #dankmemes, #lol, #memesdaily, #funnymemes, #dank, #follow, #like, #fortnite, #lmao, #dankmeme, #love, #anime, #edgymemes, #humor, #comedy, #edgy, #offensivememes, #fun, #f, #cringe, #instagram, #art, #bhfyp, #offensive, #sad, #gay]",Some things are better left alone. . . . . . . . . . .,29,0.018437,1
1363,Calling all Louisiana girls! This top is perfect to show off your Louisiana pride 💜,0,https://scontent-iad3-1.cdninstagram.com/vp/73e44b48dbb320d72d5cf953bb505020/5DA92B5D/t51.2885-15/e35/s1080x1080/67282881_340788710185784_8334489338372351843_n.jpg?_nc_ht=scontent-iad3-1.cdninstagram.com,2090301858942154240,26,jujusboutique,B0CPnc_lfWL,1563403421,1.30,19963.0,0.001302,0.000000,True,0.013024,[],Calling all Louisiana girls! This top is perfect to show off your Louisiana pride 💜,0,0.130241,1
1449,babe🔥,0,https://scontent-iad3-1.cdninstagram.com/vp/0f0dadcb696fb8c623adc24a3efce8cc/5DEDFD62/t51.2885-15/e35/65652593_149231619499016_5540670252161374475_n.jpg?_nc_ht=scontent-iad3-1.cdninstagram.com,2090301117239076864,36,fentyshiit,B0CPcqOoyfo,1563403332,1.80,19099.0,0.001885,0.000000,True,0.018849,[],babe🔥,0,0.188492,1


Looking at these results, I realized a major flaw that I had overlooked when creating this dataset: when I scrape the data, I look at the top hashtags, whose posts are ordered by <i>recency</i>. A post scraped this way has only just been published, so it won't yet have as many likes as the user's posts usually get. This could be a major reason why my results aren't as clean as I'd expect. With this in mind, we'll reorganize the pipeline for obtaining this data.

- We'll scrape the posts of the top 15 hashtags to obtain <i>users</i> who have recently been active.
- We'll extract the users who are posting captions in English.
- We'll then go to their page and extract their follower count as well as their 10 most recent posts (excluding the very latest one, since its like count would be a misleading indicator).
- We'll work with that data.
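The aggregation at the end of that pipeline could look roughly like the sketch below. It assumes the scraped posts end up in a DataFrame with the same column names used in this notebook (`owner_name`, `taken_at_timestamp`, `likes`, `comments`, `followers`, `english`); the function name `per_user_engagement` and the toy data are hypothetical, and the scraping itself is not shown. Whether the window should hold 9 or 10 posts after dropping the newest one is a judgment call; here it keeps up to `n_posts`.

```python
import pandas as pd


def per_user_engagement(posts, n_posts=10):
    """Average engagement rate (percent) per user, skipping each user's newest post."""
    english = posts[posts["english"]]
    ordered = english.sort_values("taken_at_timestamp", ascending=False)
    # Position of each post within its owner's timeline (0 = newest).
    rank = ordered.groupby("owner_name").cumcount()
    # Drop the newest post, then keep up to n_posts of the following ones.
    window = ordered[(rank >= 1) & (rank <= n_posts)]
    rate = 100 * (window["likes"] + window["comments"]) / window["followers"]
    return rate.groupby(window["owner_name"]).mean()


# Toy data: user "a" has three posts, user "b" has two.
toy_posts = pd.DataFrame({
    "owner_name": ["a", "a", "a", "b", "b"],
    "taken_at_timestamp": [3, 2, 1, 2, 1],
    "likes": [1, 40, 60, 0, 9],
    "comments": [0, 10, 10, 0, 1],
    "followers": [1000, 1000, 1000, 500, 500],
    "english": [True, True, True, True, True],
})
print(per_user_engagement(toy_posts))
```

In the toy data each user's newest post has almost no likes yet, which is exactly the effect the pipeline is meant to exclude.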

This is considerably harder to do, so I will hold off on it and instead change my rating system to be more influenced by the number of followers. If my results end up being catastrophic, I will return to this and regenerate a training set.

In [62]:
influencer_df[influencer_df['engagement_rate'] > 1].sort_values(by=['engagement_rate'], ascending=False)[:10]

Unnamed: 0,caption,comments,display_url,id,likes,owner_name,shortcode,taken_at_timestamp,caption_rating,followers,normalized_likes,normalized_comments,english,normalized_caption_rating,hashtags,caption_no_hashtags,number_of_hashtags,engagement_rate,valid_followers
5135,#sad #english #line #status #instagram #trend #viralvideos #insta #likeforfollow #videos #englishbulldog #engagementring #background #sad #videostar #instagramers #instamood #love #explore #picoftheday #travel #nature #travelgram #instadaily #africa #style #viral #singer #song #black #blackandwhite,0,https://scontent-mia3-1.cdninstagram.com/vp/7bac37b86212ec0befe60db8082e4d59/5D31A0C6/t51.2885-15/e35/64422823_436573023588341_2657649877972912703_n.jpg?_nc_ht=scontent-mia3-1.cdninstagram.com,2074022564003131904,12779,love_series_official9,BzIaIiHjTG2,1561462805,638.95,40108.0,0.318615,0.0,True,3.186147,"[#sad, #english, #line, #status, #instagram, #trend, #viralvideos, #insta, #likeforfollow, #videos, #englishbulldog, #engagementring, #background, #sad, #videostar, #instagramers, #instamood, #love, #explore, #picoftheday, #travel, #nature, #travelgram, #instadaily, #africa, #style, #viral, #singer, #song, #black, #blackandwhite]",,31,31.861474,1
84,#nature #comeback #sunset,0,https://scontent-iad3-1.cdninstagram.com/vp/4f4e1450625f22de1bf0de1a7db7c51a/5DEA8E3E/t51.2885-15/e35/54512648_129569274819734_918707443011864625_n.jpg?_nc_ht=scontent-iad3-1.cdninstagram.com,2018960017740137216,4033,gokselcetin,BwEyXBxHgKH,1554898811,201.65,25361.0,0.159024,0.0,True,1.590237,"[#nature, #comeback, #sunset]",,3,15.90237,1
14326,"One we've never shared on social media earlier.....one of the most special moments of our lives, when we 'officially' became life partners! A new journey started on this day and each day with you has been special....I would love to say 'each day with you by my side' but we both know that while metaphorically we are ALWAYS there for each other, distance has existed which has strengthened our relationship but also thrown lots of challenges our way. The journey started 9 years ago but each year has been a different learning. The first year, 2010, was our year of understanding each other and what marriage really comprises of. The following year brought in adjustments to each other, to living together and to essentially 'being one'. By 2012, we had accepted each other for who we are, embraced the weaknesses and settled into martial bliss. 2013 was a year of support, while 2014 was the year when we bought our first home, a dream which we had seen for long. 2015 was a year of full trust and before we knew it in 2016, we realized our family was expanding. Pregnancy wasn't an easy phase as Ridhima delivered our little twins with me being away at shoot literally in a different state. The following year, 2017, we were getting aquatinted to parenthood but at the same time, yes I was away shooting. I missed a lot of 'firsts' but Ridhima made sure that she would keep me involved in every stage. With parenthood came sleepless nights in 2018 as we had not one but two little ones to take care of. And now with 2019, a new journey is about to begin! They say your blessings come in disguise, I say mine is by my side! You have truly been my pillar of strength, my support system, my life partner, my soul mate and so much more! I am looking forward to the time we will finally spend together this year, I am looking forward to taking on new challenges together, I am looking forward to new beginnings in this journey of ours! 
#Throwback #Marriage #Adjustments #Companionship #Journey #Parenthood #LongDistance #Love #Support #NewBeginnings #Soulmate #Togetherness #Memories #BlessedTogether #TogetherForever #9YearsOfTogetherness",0,https://scontent-mia3-1.cdninstagram.com/vp/0d9cc57a4e95505bcd3e242a652666ff/5DB95DD3/t51.2885-15/e35/66479204_118107796137806_4181918050469161195_n.jpg?_nc_ht=scontent-mia3-1.cdninstagram.com,2089838662910371584,12914,sourabhraaj.jain,B0AmTEEpmb1,1563348204,645.7,90283.0,0.143039,0.0,True,1.430391,"[#Throwback, #Marriage, #Adjustments, #Companionship, #Journey, #Parenthood, #LongDistance, #Love, #Support, #NewBeginnings, #Soulmate, #Togetherness, #Memories, #BlessedTogether, #TogetherForever, #9YearsOfTogetherness]","One we've never shared on social media earlier.....one of the most special moments of our lives, when we 'officially' became life partners! A new journey started on this day and each day with you has been special....I would love to say 'each day with you by my side' but we both know that while metaphorically we are ALWAYS there for each other, distance has existed which has strengthened our relationship but also thrown lots of challenges our way. The journey started 9 years ago but each year has been a different learning. The first year, 2010, was our year of understanding each other and what marriage really comprises of. The following year brought in adjustments to each other, to living together and to essentially 'being one'. By 2012, we had accepted each other for who we are, embraced the weaknesses and settled into martial bliss. 2013 was a year of support, while 2014 was the year when we bought our first home, a dream which we had seen for long. 2015 was a year of full trust and before we knew it in 2016, we realized our family was expanding. Pregnancy wasn't an easy phase as Ridhima delivered our little twins with me being away at shoot literally in a different state. 
The following year, 2017, we were getting aquatinted to parenthood but at the same time, yes I was away shooting. I missed a lot of 'firsts' but Ridhima made sure that she would keep me involved in every stage. With parenthood came sleepless nights in 2018 as we had not one but two little ones to take care of. And now with 2019, a new journey is about to begin! They say your blessings come in disguise, I say mine is by my side! You have truly been my pillar of strength, my support system, my life partner, my soul mate and so much more! I am looking forward to the time we will finally spend together this year, I am looking forward to taking on new challenges together, I am looking forward to new beginnings in this journey of ours!",16,14.303911,1
110,#vscocam #sun #ootd 🌤🌿,0,https://scontent-iad3-1.cdninstagram.com/vp/c2a345d7e6970926cf0b080357f34cfc/5DEBF494/t51.2885-15/e35/57097121_392546731475494_5070550833320168500_n.jpg?_nc_ht=scontent-iad3-1.cdninstagram.com,2032778644486462720,3512,gokselcetin,Bw14WgznZyT,1556546119,175.6,25361.0,0.13848,0.0,True,1.384803,"[#vscocam, #sun, #ootd]",🌤🌿,3,13.848034,1
8570,. Without blending . Step by step tutorial is uploaded on my YouTube channel (link in bio) . #robertdowneyjr #avenger #avengers #marvel #dailysketch #draw #sketch #dailysketch #youtuber #youtube #india #indian #photo #graphite #pencilwork #realisticdrawing #souravjoshi_arts #onlyindia #love #passion #work #hardwork #hollywood #souravjoshi_arts #child #graphite #graphitesketch #pencil #art #artistsoninstagram,0,https://scontent-mia3-1.cdninstagram.com/vp/331e385551ffabe329deab8aed875d56/5DED8E4E/t51.2885-15/e35/62271625_2516249665054887_3029331978852940030_n.jpg?_nc_ht=scontent-mia3-1.cdninstagram.com,2089842461723063552,2173,souravjoshi_arts,B0AnKV_h_Vb,1563348656,108.65,30827.0,0.07049,0.0,True,0.704902,"[#robertdowneyjr, #avenger, #avengers, #marvel, #dailysketch, #draw, #sketch, #dailysketch, #youtuber, #youtube, #india, #indian, #photo, #graphite, #pencilwork, #realisticdrawing, #souravjoshi_arts, #onlyindia, #love, #passion, #work, #hardwork, #hollywood, #souravjoshi_arts, #child, #graphite, #graphitesketch, #pencil, #art, #artistsoninstagram]",. Without blending . Step by step tutorial is uploaded on my YouTube channel (link in bio) .,30,7.049015,1
9386,"Ladies, who wants a sugar daddy? 🤔😉 He's old, rich and handsome 🤣 . . . . #oldpic #oldman #rapperbigdeal #rapmonster #OneKidWithADream #OKWAD #Indianrapper #rap #hiphop #rapper #rhymes #NEO",0,https://scontent-mia3-1.cdninstagram.com/vp/7b9d2191431d316359d369084d6e9078/5DB88691/t51.2885-15/e35/66839233_2133451540115128_868441452599873911_n.jpg?_nc_ht=scontent-mia3-1.cdninstagram.com,2089837971655543296,995,rapperbigdeal,B0AmJASpsmn,1563348121,49.75,15510.0,0.064152,0.0,True,0.641522,"[#oldpic, #oldman, #rapperbigdeal, #rapmonster, #OneKidWithADream, #OKWAD, #Indianrapper, #rap, #hiphop, #rapper, #rhymes, #NEO]","Ladies, who wants a sugar daddy? 🤔😉 He's old, rich and handsome 🤣 . . . .",12,6.415216,1
238,#nature 🍃,0,https://scontent-iad3-1.cdninstagram.com/vp/9a6019bb420cd8e0a30f9e28cda33fbb/5DB58851/t51.2885-15/e35/p1080x1080/66314422_906458573022953_8878522325455051653_n.jpg?_nc_ht=scontent-iad3-1.cdninstagram.com,2090152962635452672,1588,gokselcetin,B0BtwugHdEi,1563385671,79.4,25361.0,0.062616,0.0,True,0.626158,[#nature],🍃,1,6.261583,1
8799,,0,https://scontent-mia3-1.cdninstagram.com/vp/c97cbb2cc8d7a842fbd3f7c260eb6b22/5DCC2B58/t51.2885-15/e35/37606932_2164406227121874_2334784788071710720_n.jpg?_nc_ht=scontent-mia3-1.cdninstagram.com,1833306005057114880,1125,e.k.6,BlxNhksBRN8,1532767128,56.25,17982.0,0.062563,0.0,True,0.625626,[],,0,6.256256,1
22485,How am I looking ? 😉 Kya aap sab tab bhi mujhse itna pyar karenge? #pvp #buddha #greyhair #stylish #potd #amusing #raghbir #love #life #bepanahpyaarr #old #oldagechallenge,0,https://scontent-mia3-1.cdninstagram.com/vp/6f405c9ed7488753fd7157dabdbdeca2/5DED7733/t51.2885-15/e35/p1080x1080/66114975_613305872493744_5830119668265920184_n.jpg?_nc_ht=scontent-mia3-1.cdninstagram.com,2089853325155976704,88240,pearlvpuri,B0ApobWl0Xg,1563349951,4412.0,1440875.0,0.061241,0.0,True,0.612406,"[#pvp, #buddha, #greyhair, #stylish, #potd, #amusing, #raghbir, #love, #life, #bepanahpyaarr, #old, #oldagechallenge]",How am I looking ? 😉 Kya aap sab tab bhi mujhse itna pyar karenge?,12,6.124057,1
25864,Switches are controlled by our arms #goodfriends #happy #love #life #enjoy #lifestyle #photography #photooftheday #picoftheday #mine #costume #goodvibes #travel #trend #vines #insta #instagram #red #swag #world #famous #insta #india #indian,0,https://scontent-mia3-1.cdninstagram.com/vp/3652c0b1a1db863efe8ef721f29ccefb/5DA8CEDB/t51.2885-15/e35/66483500_332630997677364_6923733659969393067_n.jpg?_nc_ht=scontent-mia3-1.cdninstagram.com,2089859166461950208,984,___1950__,B0Aq9bfjvkw,1563350648,49.2,16660.0,0.059064,0.0,True,0.590636,"[#goodfriends, #happy, #love, #life, #enjoy, #lifestyle, #photography, #photooftheday, #picoftheday, #mine, #costume, #goodvibes, #travel, #trend, #vines, #insta, #instagram, #red, #swag, #world, #famous, #insta, #india, #indian]",Switches are controlled by our arms,24,5.906363,1


## Popular Hashtags

The users above have a high engagement rate on the post that I've scraped, and from a cursory glance it seems as though they use a large number of hashtags. This makes sense, given that you would want to use as many hashtags as possible in order to reach as wide an audience as possible.

In [66]:
df['engagement_rate'].corr(df['number_of_hashtags'])

0.01302099834963125

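But once again, even with our dataset slightly corrected, it doesn't seem like this is the case: a correlation of about 0.013 means there is essentially no linear relationship between the number of hashtags and the engagement rate. Maybe it would be better to look at which specific hashtags are being used?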

In [73]:
# Count how many times each hashtag appears across all posts.
hashtag_count = {}
for list_hashtag in df['hashtags']:
    for hashtag in list_hashtag:
        hashtag_count[hashtag] = hashtag_count.get(hashtag, 0) + 1

In [74]:
max(hashtag_count, key=hashtag_count.get)  # the most commonly used hashtag

'#love'

It makes sense that the most commonly used hashtag is one of the 15 that we used to collect the posts in the first place. Let's now try weighting these counts by the engagement rate.

In [85]:
# Sum the engagement rate of every post in which each hashtag appears.
hashtag_count_weighted = {}
for list_hashtag, engagement in zip(df['hashtags'], df['engagement_rate']):
    for hashtag in list_hashtag:
        hashtag_count_weighted[hashtag] = hashtag_count_weighted.get(hashtag, 0) + engagement

In [87]:
print(hashtag_count)



In [90]:
df[df['number_of_hashtags'] != 0].count() 

caption                      26018
comments                     26018
display_url                  26018
id                           26018
likes                        26018
owner_name                   26018
shortcode                    26018
taken_at_timestamp           26018
caption_rating               26018
followers                    26018
normalized_likes             26018
normalized_comments          26018
english                      26018
normalized_caption_rating    26018
hashtags                     26018
caption_no_hashtags          26018
number_of_hashtags           26018
valid_followers              26018
engagement_rate              26018
dtype: int64

From this, we see that the 5 most impactful hashtags for getting a high engagement rate (associated with a high number of followers, likes and comments) are #love, #instagood, #photooftheday, #fashion and #beautiful. In fact, about 72% of all of our posts that use hashtags use #love (the other four appear in 30.8%, 26.3%, 17.7% and 16.7% of them, respectively). There's some talk out there about avoiding these top hashtags because a post gets buried very quickly under the sheer volume of people using them, but the stats show that these hashtags are still the best to use.
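For reference, here is one way a ranking and percentages like the ones above could be computed. It is a sketch rather than the exact cells that were run; the helper names `hashtag_shares` and `top_weighted` and the toy data are hypothetical. `top_weighted` mirrors the weighted loop from earlier (a hashtag repeated within one post is counted each time), while `hashtag_shares` counts each post at most once per hashtag.

```python
from collections import Counter


def hashtag_shares(hashtag_lists):
    """Fraction of hashtag-using posts that contain each hashtag."""
    # Ignore posts without hashtags; de-duplicate hashtags within a post.
    tagged = [set(tags) for tags in hashtag_lists if tags]
    posts_per_tag = Counter(tag for tags in tagged for tag in tags)
    return {tag: n / len(tagged) for tag, n in posts_per_tag.items()}


def top_weighted(hashtag_lists, engagement_rates, n=5):
    """The n hashtags with the largest summed engagement rate."""
    weighted = Counter()
    for tags, rate in zip(hashtag_lists, engagement_rates):
        for tag in tags:
            weighted[tag] += rate
    return [tag for tag, _ in weighted.most_common(n)]


# Toy data: four posts, one of them without hashtags.
toy_hashtags = [["#love", "#fun"], ["#love"], [], ["#fun", "#fun", "#art"]]
toy_rates = [2.0, 1.5, 5.0, 0.5]
print(top_weighted(toy_hashtags, toy_rates, n=2))
print(hashtag_shares(toy_hashtags))
```

On the real data this would be called as `top_weighted(df['hashtags'], df['engagement_rate'])` and `hashtag_shares(df['hashtags'])`. Note that a summed engagement rate grows with how often a hashtag is used, so very common hashtags tend to rank high on this measure.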