# Ad Impression Analyzer for Twitter
Released 2022-11-23 under an MIT license. 

This notebook looks at targeting data for Twitter ads.  To use it, you need to be able to run Jupyter notebooks, and you need to request, download, and unzip your Twitter archive.  You can request your archive in your Twitter settings.  Delivery, for me, took about a week. 

Once you have that downloaded and unzipped, put this file in your main archive directory and run all the cells below.  

Once you get to the section called "Let's actually do it", you can run those cells as many times as you'd like to see different results.  

Please note that I'm a hobbyist, not a professional programmer, so this code likely has issues and may break.  Sorry if so.  Mainly I'm releasing it after a few folks on Twitter (@jeffemtman) requested it.  If you see issues in my code, please let me know or help me improve it. 

## Open file and process data
The file we're looking for is called `ad-impressions.js`.  It's in the "data" folder.  It's almost JSON, and we can use a regular expression to strip the one thing that keeps Python's json library from parsing it: the variable name at the top of the file.  

In [None]:
import json
import random
from textwrap import indent
from collections import OrderedDict
import re

In [None]:
# Path to your file called "ad-impressions.js"
# If you've placed this notebook in the main directory of your Twitter archive, the default path below shouldn't need changing
file_path = "data/ad-impressions.js"

In [None]:
# UTF-8 encoding is important; otherwise you'll get Unicode errors.
with open(file_path, encoding="utf-8") as f: 
    
    # There's a variable name up top that isn't valid JSON. Load the file as a string.
    string_data = f.read()

    # Nasty regex: drops everything on the first line up to its last space (the variable name and the "="),
    # then parses what's left as JSON
    data = json.loads(re.sub(r'^(.*)(?= )', "", string_data))
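
To see what that regex does before pointing it at the real file, here's a minimal sketch on a tiny stand-in string. The `window.YTD.ad_impressions.part0 = ` prefix and the sample contents are assumptions about what the top of the file looks like, so check them against your own archive.

```python
import json
import re

# Hypothetical miniature of ad-impressions.js; the real file is far larger.
sample_text = 'window.YTD.ad_impressions.part0 = [\n  {"ad": {}}\n]'

# Same regex as the notebook: remove everything on the first line
# up to (but not including) its last space.
json_text = re.sub(r'^(.*)(?= )', "", sample_text)

# What remains starts with " [", which json.loads accepts.
sample_data = json.loads(json_text)
print(sample_data)
```

If the first line of your file ends differently (for example, with more than one token after the `=`), the regex would eat too much and `json.loads` would raise an error, which is a good hint to look at the top of the file.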


## Make a list of all the ads
This list is called `library`.

In [None]:
library = []
for entry in data: 
    ad_clump = entry['ad']['adsUserData']['adImpressions']['impressions']

    #print(len(ad_clump))
    for i in ad_clump: 
        library.append(i)
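
The nesting that loop walks through is easier to see on a small example. This is a sketch only: the key path is the one used in the cell above, but `sample_data`, `sample_library`, and the advertiser names are made up for illustration.

```python
# Hypothetical, heavily trimmed stand-in for the parsed contents of ad-impressions.js
sample_data = [
    {"ad": {"adsUserData": {"adImpressions": {"impressions": [
        {"advertiserInfo": {"advertiserName": "Example Co"}},
        {"advertiserInfo": {"advertiserName": "Another Co"}},
    ]}}}},
    {"ad": {"adsUserData": {"adImpressions": {"impressions": [
        {"advertiserInfo": {"advertiserName": "Example Co"}},
    ]}}}},
]

# Flatten every entry's "impressions" list into one flat list of ads
sample_library = [
    impression
    for entry in sample_data
    for impression in entry["ad"]["adsUserData"]["adImpressions"]["impressions"]
]

print(len(sample_library))
```

Each top-level entry holds a clump of impressions, and the flat list just strings those clumps together in order.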

## Make a list and count of all advertisers
Here you can see all the advertisers you've had and how many times their ads have been served to you in the data's period. 

## Functions
There are four main functions here:
- `collateAdvertisers()` creates a dictionary of all advertisers and how many ads they've served you.
- `printAd()` formats an ad's data in a more visually friendly way.
- `randomAd()` randomly selects an ad from your library.
- `lookupAdvertiser()` lets you see all ads from a specific advertiser (use it in conjunction with the list of advertisers).

In [None]:
def collateAdvertisers(print_output = False):
    '''Returns a sorted dictionary of all advertisers and how many ads they've served you'''
    advertisers = {}
    for ad in library: 
        advertiser = (ad['advertiserInfo']['advertiserName'])
        
        if advertiser in advertisers: 
            advertisers[advertiser] += 1
        
        else: 
            advertisers[advertiser] = 1


    # Sort the output from highest count to lowest
    advertisers_sorted = OrderedDict(sorted(advertisers.items(), key=lambda t: t[1])[::-1])

    if print_output: 
        for k,v in advertisers_sorted.items(): 
            print(v, "\t", k)

    return advertisers_sorted
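
For comparison, the same kind of tally can be sketched with `collections.Counter` from the standard library. This isn't a replacement for the function above, just another way to get the counts; `sample_library` and its advertiser names are invented for the example.

```python
from collections import Counter

# Hypothetical stand-in for the real library of ads
sample_library = [
    {"advertiserInfo": {"advertiserName": "Example Co"}},
    {"advertiserInfo": {"advertiserName": "Another Co"}},
    {"advertiserInfo": {"advertiserName": "Example Co"}},
]

# Count how many ads each advertiser served
counts = Counter(ad["advertiserInfo"]["advertiserName"] for ad in sample_library)

# most_common() returns (name, count) pairs, highest count first
for name, count in counts.most_common():
    print(count, "\t", name)
```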

In [None]:
def printAd(ad): 
    '''Prints out relevant info on an ad.'''
    date = ad["impressionTime"].split(' ')[0] #discard the time
    device = ad['deviceInfo']['osType']
    advertiser = ad['advertiserInfo']['advertiserName']
    handle = ad['advertiserInfo']['screenName']
    display_loc = ad['displayLocation']
    criteria = ""

    for c in ad['matchedTargetingCriteria']: 
        try:
            criteria += "\t" + c['targetingType'] + " = " + c['targetingValue'] + "\n"
        except KeyError: 
            criteria += "\t" + c['targetingType'] + " = " + "NO AVAILABLE INFO" + "\n"

    print(f"""
👁️ Advertiser: {advertiser} ({handle})
         Served on {date} on {device} 
         Display location: {display_loc}
    """)

    print("🔵 Tweet Text:")

    if "promotedTweetInfo" in ad:
        print(indent(ad["promotedTweetInfo"]["tweetText"], "\t")) #indents multiline tweets

        if len(ad["promotedTweetInfo"]["urls"]) > 0: 
            print("\tAdditional URLS:", ad["promotedTweetInfo"]['urls'])

    else: 
        print("\tNo Tweet Text Provided.")

    print("\n🎯 Targeting:\n", criteria)

    print(" - - ✂ - - - - ✂ - - - - ✂ - - - - ✂ - - - - ✂ - - - - ✂ - - - - ✂ - - ")



In [None]:
def lookupAdvertiser(lookup):
    '''Prints all ads from a specific company. The full advertiser name must match (case is ignored), otherwise nothing will be printed'''

    for ad in library: 
        if (ad['advertiserInfo']['advertiserName'].lower()) == lookup.lower(): 
            printAd(ad)
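
Since the lookup needs the full advertiser name, it can help to search for names by fragment first. Here's a minimal sketch of that idea; `find_advertisers()` is a hypothetical helper that isn't part of this notebook, and the sample ads are made up.

```python
def find_advertisers(ads, fragment):
    """Return the distinct advertiser names that contain `fragment`, ignoring case."""
    needle = fragment.lower()
    names = {ad["advertiserInfo"]["advertiserName"] for ad in ads}
    return sorted(name for name in names if needle in name.lower())


# Hypothetical stand-in for the real library of ads
sample_library = [
    {"advertiserInfo": {"advertiserName": "Example Co"}},
    {"advertiserInfo": {"advertiserName": "Another Co"}},
    {"advertiserInfo": {"advertiserName": "Example Labs"}},
    {"advertiserInfo": {"advertiserName": "Example Co"}},
]

matches = find_advertisers(sample_library, "example")
print(matches)
```

Any name it returns can then be passed to `lookupAdvertiser()`.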


In [None]:
def randomAd(print_raw=False):
    '''Returns a random ad from the library'''
    rando_ad = random.choice(library)
    if print_raw: 
        for k,v in rando_ad.items(): 
            print(k, v)

    return rando_ad

## Let's actually do it

### Print the list of advertisers
Let's see who's advertising to us and how many times they've done it in the period covered by our data. 

In [None]:
# create a dictionary called "advertisers" that holds all this data
# you can print this data out, or hide it by setting "print_output" to False
advertisers = collateAdvertisers(print_output=True)

### Print out some random ads
Each time the next cell is run, you'll see as many random ads as you've requested via the `how_many` variable.

In [None]:
how_many = 3

for i in range(how_many): 
    our_ad = randomAd()
    printAd(our_ad)
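
Because `randomAd()` picks with `random.choice()`, the same ad can come up more than once in a batch. If you'd rather not see repeats, `random.sample()` is one option. A minimal sketch, with a made-up `sample_library` standing in for the real one:

```python
import random

# Hypothetical stand-in for the real library of ads
sample_library = [{"id": n} for n in range(10)]

how_many = 3

# random.sample() picks distinct items; min() guards against asking
# for more ads than the library holds
picked = random.sample(sample_library, min(how_many, len(sample_library)))

for ad in picked:
    print(ad)
```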

### See all ads from a specific advertiser
Note that `the_company` has to match the advertiser's full name (case doesn't matter).  It helps to copy-paste from the output of `collateAdvertisers()`.

In [None]:
the_company = "CVS Pharmacy"

lookupAdvertiser(the_company)