**PLEASE MAKE A COPY BEFORE CHANGING**

Copyright 2021 Google LLC

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.


# Important
This content is intended for educational and informational purposes only.

# Configuration 


## ADH APIs Configuration Steps
 - Enable the ADH v1 API in the Google Cloud project you use to access the API. When searching for the API in your GCP Console API Library, use the search term “adsdatahub”.
 - Go to the [Google Developers Console](https://console.developers.google.com/) and verify that you have access to your Google Cloud project via the drop-down menu at the top of the page. **If you don't see the right Google Cloud project, you should reach out to your Ads Data Hub team to get access.**
 - From the project drop-down menu, select your BigQuery project.
 - Click the hamburger button in the top left corner of the page and click **APIs & services > Credentials**.
 - If you have not done so already, create an API key by clicking the **Create credentials** drop-down menu and selecting **API key**. You will need this API key in a later step.
 - If you have not done so already, create a new OAuth 2.0 client ID by clicking the **Create credentials** button and selecting **OAuth client ID**. For the **Application type**, select **Other** and optionally enter a name to be associated with the client ID. Click **Create** to create the new client ID; a dialog will appear showing your client ID and secret. On the [Credentials page](https://console.cloud.google.com/apis/credentials) for your project, find your new client ID listed under **OAuth 2.0 client IDs** and click the corresponding download icon. The downloaded file contains your credentials, which are needed to step through the OAuth 2.0 installed application flow.
 - Update the `DEVELOPER_KEY` field in the cell below to match the API key you created earlier.
 - Rename the credentials file you downloaded earlier to `adh-key.json` and upload it to this colab (in the left menu, click the "Files" tab and then click the "Upload" button).
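Before running the cells below, it can help to confirm that the uploaded file looks like an OAuth client secrets file. The following is a minimal sketch and not part of the original notebook: the helper name `check_client_secrets` is hypothetical, and it assumes the downloaded file uses the usual layout with a top-level `installed` (or `web`) object that contains `client_id` and `client_secret`.

```python
import json
import os


def check_client_secrets(path):
    """Return a list of problems found in a client secrets file (empty if none).

    Hypothetical helper for illustration; it only inspects the JSON layout and
    never contacts Google.
    """
    if not os.path.exists(path):
        return ['file not found: %s' % path]

    with open(path, 'r') as handler:
        try:
            data = json.load(handler)
        except ValueError as ex:
            return ['not valid JSON: %s' % ex]

    if not isinstance(data, dict):
        return ['expected a JSON object at the top level']

    # Assumption: the client secrets are nested under "installed" or "web".
    section = data.get('installed') or data.get('web')
    if not section:
        return ['missing top-level "installed" or "web" section']

    return ['missing field: %s' % field
            for field in ('client_id', 'client_secret')
            if field not in section]


# Prints an empty list when the file has the expected layout.
print(check_client_secrets('adh-key.json'))
```

An empty list means the file has the fields this sketch looks for; it does not prove that the credentials are valid for your project.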

In [None]:
# The Developer Key is used to retrieve a discovery document for the
# non-public adsdatahub v1 API (the helper code below still uses the older
# Full Circle Query / FCQ naming). This is used to build the service used
# in the samples to make API requests. Please see the README for instructions
# on how to configure your Google Cloud Project for access to the API.
DEVELOPER_KEY = 'yourkey' #'INSERT_DEVELOPER_KEY_HERE'

# The client secrets file can be downloaded from the Google Cloud Console.
CLIENT_SECRETS_FILE = 'adh-key.json' #'Make sure you have correctly renamed this file and you have uploaded it in this colab'


## Install Dependencies

In [None]:
import json
import sys
import argparse
import pprint
import random
import datetime
import pandas as pd
# Note: plotly.plotly (moved to the chart_studio package in plotly 4) is not
# used in this notebook, so it is not imported.
import plotly.graph_objs as go

from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient import discovery
from oauthlib.oauth2.rfc6749.errors import InvalidGrantError
from google.auth.transport.requests import AuthorizedSession
from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials
from plotly.offline import iplot
from plotly.graph_objs import Contours, Histogram2dContour, Marker, Scatter
from googleapiclient.errors import HttpError

In [None]:
from google.colab import auth
auth.authenticate_user()
print('Authenticated')

## Define function to enable charting library

In [None]:
# Allow plot images to be displayed
%matplotlib inline

# Functions
def enable_plotly_in_cell():
    import IPython
    from plotly.offline import init_notebook_mode
    display(IPython.core.display.HTML('''
          <script src="/static/components/requirejs/require.js"></script>
    '''))
    init_notebook_mode(connected=False)

## Authenticate against the ADH API

[ADH documentation](https://developers.google.com/ads-data-hub/)

In [None]:
#!/usr/bin/python
#
# Copyright 2017 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Utilities used to step through OAuth 2.0 flow.

These are intended to be used for stepping through samples for the Full Circle
Query v2 API.
"""

_APPLICATION_NAME = 'ADH Campaign Overlap'
_CREDENTIALS_FILE = 'fcq-credentials.json'
_SCOPES = 'https://www.googleapis.com/auth/adsdatahub' 
_DISCOVERY_URL_TEMPLATE = 'https://%s/$discovery/rest?version=%s&key=%s'
_FCQ_DISCOVERY_FILE = 'fcq-discovery.json'
_FCQ_SERVICE = 'adsdatahub.googleapis.com'
_FCQ_VERSION = 'v1'
_REDIRECT_URI = 'urn:ietf:wg:oauth:2.0:oob'
_SCOPE = ['https://www.googleapis.com/auth/adsdatahub']
_TOKEN_URI = 'https://accounts.google.com/o/oauth2/token'

MAX_PAGE_SIZE = 50


def _GetCredentialsFromInstalledApplicationFlow():
  """Get new credentials using the installed application flow."""
  flow = InstalledAppFlow.from_client_secrets_file(
      CLIENT_SECRETS_FILE, scopes=_SCOPE)
  flow.redirect_uri = _REDIRECT_URI  # Set the redirect URI used for the flow.

  auth_url, _ = flow.authorization_url(prompt='consent')

  print('Log into the Google Account you use to access the adsdatahub Query '
        'v1 API and go to the following URL:\n%s\n' % auth_url)
  print('After approving the token, enter the verification code (if specified).')
  code = input('Code: ')

  try:
    flow.fetch_token(code=code)
  except InvalidGrantError as ex:
    print('Authentication has failed: %s' % ex)
    sys.exit(1)

  credentials = flow.credentials
  _SaveCredentials(credentials)
  return credentials


def _LoadCredentials():
  """Loads and instantiates Credentials from JSON credentials file."""
  with open(_CREDENTIALS_FILE, 'rb') as handler:
    stored_creds = json.loads(handler.read())

  creds = Credentials(client_id=stored_creds['client_id'],
                      client_secret=stored_creds['client_secret'],
                      token=None,
                      refresh_token=stored_creds['refresh_token'],
                      token_uri=_TOKEN_URI)

  return creds


def _SaveCredentials(creds):
  """Save credentials to JSON file."""
  stored_creds = {
      'client_id': getattr(creds, '_client_id'),
      'client_secret': getattr(creds, '_client_secret'),
      'refresh_token': getattr(creds, '_refresh_token')
  }

  # json.dumps returns a str, so the file is opened in text mode.
  with open(_CREDENTIALS_FILE, 'w') as handler:
    handler.write(json.dumps(stored_creds))


def GetCredentials():
  """Get stored credentials if they exist, otherwise return new credentials.

  If no stored credentials are found, new credentials will be produced by
  stepping through the Installed Application OAuth 2.0 flow with the specified
  client secrets file. The credentials will then be saved for future use.

  Returns:
    A configured google.oauth2.credentials.Credentials instance.
  """
  try:
    creds = _LoadCredentials()
    creds.refresh(Request())
  except IOError:
    creds = _GetCredentialsFromInstalledApplicationFlow()

  return creds


def GetDiscoveryDocument():
  """Downloads the adsdatahub v1 discovery document.

  Downloads the adsdatahub v1 discovery document to fcq-discovery.json
  if it is accessible. If the file already exists, it will be overwritten.

  Raises:
    ValueError: raised if the discovery document is inaccessible for any reason.
  """
  credentials = GetCredentials()
  discovery_url = _DISCOVERY_URL_TEMPLATE % (
      _FCQ_SERVICE, _FCQ_VERSION, DEVELOPER_KEY)

  auth_session = AuthorizedSession(credentials)

  discovery_response = auth_session.get(discovery_url)

  if discovery_response.status_code == 200:
    # discovery_response.text is a str, so the file is opened in text mode.
    with open(_FCQ_DISCOVERY_FILE, 'w') as handler:
      handler.write(discovery_response.text)
  else:
    raise ValueError('Unable to retrieve discovery document for api name "%s" '
                     'and version "%s" via discovery URL: %s'
                     % (_FCQ_SERVICE, _FCQ_VERSION, discovery_url))


def GetService():
  """Builds a configured adsdatahub v1 API service.

  Returns:
    A googleapiclient.discovery.Resource instance configured for the adsdatahub v1 service.
  """
  credentials = GetCredentials()
  discovery_url = _DISCOVERY_URL_TEMPLATE % (
      _FCQ_SERVICE, _FCQ_VERSION, DEVELOPER_KEY)

  service = discovery.build(
      'adsdatahub', _FCQ_VERSION, credentials=credentials,
      discoveryServiceUrl=discovery_url)
  return service


def GetServiceFromDiscoveryDocument():
  """Builds a configured adsdatahub v1 API service via discovery file.

  Returns:
    A googleapiclient.discovery.Resource instance configured for the
      adsdatahub v1 service.
  """
  credentials = GetCredentials()

  with open(_FCQ_DISCOVERY_FILE, 'rb') as handler:
    discovery_doc = handler.read()

  service = discovery.build_from_document(
      service=discovery_doc, credentials=credentials)

  return service

try:
  full_circle_query = GetService()
except IOError as ex:
  print ('Unable to create ads data hub service - %s' % ex)
  print ('Did you specify the client secrets file?')
  sys.exit(1)

try:
  # Execute the request.
  response = full_circle_query.customers().list().execute()
except HttpError as e:
  print (e)
  sys.exit(1)

if 'customers' in response:
  print ('ADH API Returned {} Ads Data Hub customers for the current user!'.format(len(response['customers'])))
  for customer in response['customers']:
    print(json.dumps(customer))
else:
  print ('No customers found for current user.')


# Frequency Analysis 

<b>Purpose:</b> This tool helps you define an optimal frequency cap based on the CTR curve. Because of that, it is most useful in awareness use cases.

**Key notes**

*   For some campaigns the user ID will be <b>zeroed</b> (e.g. Google Data, ITP browsers and YouTube Data), and those events are therefore <b>excluded</b> from the analysis. For more information click <a href="https://support.google.com/dcm/answer/9006418">here</a>;
*   Only campaigns for which both clicks and impressions were tracked are included in the analysis.

**Instructions**
*   First of all: <b>MAKE A COPY</b> =);
*   Fill in the query parameters in Box 1;
*   In the menu above, click Runtime > Run All;
*   Authorize your credentials;
*   Go to the end of the colab and your figures will be ready;
*   After deciding what the optimal frequency cap should be, fill it in Box 2 and press play.

### Step 1 - Instructions - Defining parameters to find the optimal frequency

*  <b>max_freq:</b> The highest frequency to include in the charts (e.g. if you enter 50, the charts cover users who were shown up to 50 impressions);
*  <b>id_type:</b> The field you want to filter your data by (if you don't want to filter, leave it blank);
*  <b>IDs:</b> The IDs to filter on, matching the id_type chosen above, in this pattern: 'id-1111', 'id-2222', ...


In [None]:
#@title Define ADH configuration parameters
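# Replace the placeholders with your own IDs. Integer literals with leading
# zeros (such as 000000001) are a syntax error in Python 3.
customer_id = 1 #@param
dataset_id = 1 #@param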
query_name = "query_name" #@param {type:"string"}
big_query_project = 'bq_project_id' #@param Destination Project ID {type:"string"}
big_query_dataset = 'dataset_name' #@param Destination Dataset {type:"string"}
big_query_destination_table = 'table_name' #@param Destination Table {type:"string"}
start_date = '2019-09-01' #@param {type:"date", allow-input: true}
end_date = '2019-09-30' #@param {type:"date", allow-input: true}
max_freq = 100 #@param {type:"integer", allow-input: true}
cpm = 5 #@param {type:"number", allow-input: true}
id_type = "campaign_id" #@param ["", "advertiser_id", "campaign_id", "placement_id", "ad_id"] {type: "string", allow-input: false}
IDs = "" #@param {type: "string", allow-input: true}

### Step 2 - Create a function for the final calculations

Calculate metrics from the DT data using pandas.
Pass in the pandas dataframe when you call this function.

In [None]:

def df_calc_fields(df):

    df['ctr'] = df.clicks / df.impressions
    df['cost'] = (df.impressions / 1000 ) * cpm
    df['cpc'] = df.cost / df.clicks
    df['cumulative_clicks'] = df.clicks.cumsum()
    df['cumulative_impressions'] = df.impressions.cumsum()
    df['cumulative_reach'] = df.reach.cumsum()
    df['cumulative_cost'] = df.cost.cumsum()
    df['coverage_clicks'] = df.cumulative_clicks / df.clicks.sum()
    df['coverage_impressions'] = df.cumulative_impressions / df.impressions.sum()
    df['coverage_reach'] = df.cumulative_reach / df.reach.sum()

    return df
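To see what these fields look like, here is a small sketch on made-up numbers. It uses a cut-down variant of the function above that takes `cpm` as an argument instead of reading the notebook-level variable; the name `calc_fields` and the sample data are assumptions for illustration only.

```python
import pandas as pd


def calc_fields(df, cpm):
    """A subset of the df_calc_fields calculations, with cpm passed in explicitly."""
    df = df.copy()  # Avoid modifying the caller's dataframe.
    df['ctr'] = df.clicks / df.impressions
    df['cost'] = (df.impressions / 1000) * cpm
    df['cpc'] = df.cost / df.clicks
    df['cumulative_clicks'] = df.clicks.cumsum()
    df['coverage_clicks'] = df.cumulative_clicks / df.clicks.sum()
    return df


# Made-up per-frequency data, ordered by frequency.
sample = pd.DataFrame({
    'frequency': [1, 2, 3],
    'clicks': [10, 4, 1],
    'impressions': [1000, 800, 300],
    'reach': [1000, 400, 100],
})

result = calc_fields(sample, cpm=5)
print(result[['frequency', 'ctr', 'cost', 'cpc', 'coverage_clicks']])
```

Because the cumulative and coverage columns rely on `cumsum`, the rows need to be ordered by frequency before the function is called.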

### Step 3 - Build the query

Set up the variables

In [None]:
# Build the query

dc = {}


if (IDs == ""):
  dc['ID_filters'] = ""

else:
  dc['id_type'] = id_type
  dc['IDs'] = IDs
  dc['ID_filters'] = '''AND event.{id_type} IN ({IDs})'''.format(**dc)
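The filter logic above can also be wrapped in a small function. This is a sketch rather than the notebook's own code: `build_id_filter` and `ALLOWED_ID_TYPES` are hypothetical names, and the allowed values are copied from the `id_type` drop-down in Step 1. Checking `id_type` against that list keeps unexpected text out of the generated SQL.

```python
# Allowed values copied from the id_type drop-down in Step 1.
ALLOWED_ID_TYPES = ('advertiser_id', 'campaign_id', 'placement_id', 'ad_id')


def build_id_filter(id_type, ids):
    """Return the extra WHERE clause for the query, or '' when no filter is set.

    ids is the raw string from the form, e.g. "'id-1111', 'id-2222'".
    """
    if ids.strip() == '' or id_type == '':
        return ''
    if id_type not in ALLOWED_ID_TYPES:
        raise ValueError('Unsupported id_type: %r' % id_type)
    return 'AND event.{id_type} IN ({ids})'.format(id_type=id_type, ids=ids)


print(build_id_filter('campaign_id', "'id-1111', 'id-2222'"))
print(repr(build_id_filter('campaign_id', '')))
```

Unlike the cell above, this variant also returns an empty filter when `id_type` is blank, which avoids generating `event. IN (...)`.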

Part 1 - Find all impressions from the impression table: 
* Select all user IDs from the impression table
* Select the event_time
* Mark the interaction type as 'imp' for all of these rows
* The date range set in Step 1 is not filtered in the SQL itself; it is passed to ADH when the query is started in Step 4
* Filter out any user IDs that are 0
* If specific ID filters were applied in Step 1, filter the data for those IDs

In [None]:
q1 = """
    WITH
    imp_u_clicks AS (
    SELECT
        User_ID,
        event.event_time AS interaction_time,
        'imp' AS interaction_type
    FROM
        adh.cm_dt_impressions
    WHERE
        user_id != '0'
        {ID_filters}
"""

Part 2 - Find all clicks from the clicks table: 

* Select all User IDs from the click table
* Select the event_time
* Mark the interaction type as 'click' for all of these rows
* The date range set in Step 1 is not filtered in the SQL itself; it is passed to ADH when the query is started in Step 4
* If specific ID filters were applied in Step 1, filter the data for those IDs
* **Use a union to create a single table with both impressions and clicks**


In [None]:
q2 = """
    UNION ALL (
        SELECT
          User_ID,
          event.event_time AS interaction_time,
          'click' AS interaction_type
        FROM
          adh.cm_dt_clicks
        WHERE
           user_id != '0'
          {ID_filters} ) ),
"""

output example: 

<table>
  <tr>
    <th>USER_ID</th>
    <th>interaction_time</th>
    <th>interaction_type</th>
  </tr>
  <tr>
    <td>001</td>
    <td>timestamp</td>
    <td>imp</td>
  </tr>
  <tr>
    <td>001</td>
    <td>timestamp</td>
    <td>imp</td>
  </tr>
  <tr>
    <td>001</td>
    <td>timestamp</td>
    <td>click</td>
  </tr>
  <tr>
    <td>002</td>
    <td>timestamp</td>
    <td>imp</td>
  </tr>
  <tr>
    <td>002</td>
    <td>timestamp</td>
    <td>click</td>
  </tr>
  <tr>
    <td>003</td>
    <td>timestamp</td>
    <td>imp</td>
  </tr>
  <tr>
    <td>001</td>
    <td>timestamp</td>
    <td>imp</td>
  </tr>
</table>

Part 3 - Calculate impressions and clicks per user: 

* For each user, calculate the number of impressions and clicks using the table created in Part 1 and 2 

In [None]:
q3 = """        
    user_level_data AS (
    SELECT
        user_id,
        SUM(IF(interaction_type = 'imp',
            1,
            0)) AS impressions,
        SUM(IF(interaction_type = 'click',
            1,
            0)) AS clicks
    FROM
        imp_u_clicks
    GROUP BY
        user_id)

"""

output example: 

<table>
  <tr>
    <th>USER_ID</th>
    <th>impressions</th> 
    <th>clicks</th>
  </tr>
  <tr>
    <td>001</td>
    <td>3</td> 
    <td>1</td>
  </tr>
    <tr>
    <td>002</td>
    <td>1</td> 
    <td>1</td>
  </tr>
      <tr>
    <td>003</td>
    <td>1</td> 
    <td>0</td>
  </tr>
  
</table>

Part 4 - Calculate metrics per frequency: 

* Use the table created in Part 3 with metrics at user level to calculate metrics per each frequency
* Frequency: The number of impressions served to each user
* Clicks: The sum of clicks that occurred at each frequency
* Impressions: The sum of all impressions that occurred at each frequency
* Reach: The number of unique users (the count of user IDs) at each frequency
* Group by Frequency

In [None]:
q4 = """
    SELECT
    impressions AS frequency,
    SUM(clicks) AS clicks,
    SUM(impressions) AS impressions,
    COUNT(*) AS reach
    FROM
    user_level_data
    GROUP BY
    1

    ORDER BY
    frequency ASC
"""

output example: 

<table>
  <tr>
    <th>frequency</th>
    <th>clicks</th> 
    <th>impressions</th>
    <th>reach</th>
  </tr>
  <tr>
    <td>1</td>
    <td>1</td> 
    <td>2</td>
     <td>2</td>
  </tr>
 <tr>
    <td>2</td>
    <td>0</td> 
    <td>0</td>
     <td>0</td>
  </tr>
   <tr>
    <td>3</td>
    <td>1</td> 
    <td>3</td>
     <td>1</td>
  </tr>
  
</table>
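The aggregation in Parts 3 and 4 can be mirrored on the small example tables above with pandas. This is only an illustration of the logic on made-up rows, not a replacement for running the query in ADH; the variable names are assumptions.

```python
import pandas as pd

# The example rows from Parts 1 and 2 (timestamps omitted).
events = pd.DataFrame({
    'user_id': ['001', '001', '001', '002', '002', '003', '001'],
    'interaction_type': ['imp', 'imp', 'click', 'imp', 'click', 'imp', 'imp'],
})

# Part 3: impressions and clicks per user.
user_level = (
    events.assign(
        impressions=(events.interaction_type == 'imp').astype(int),
        clicks=(events.interaction_type == 'click').astype(int))
    .groupby('user_id')[['impressions', 'clicks']]
    .sum()
)

# Part 4: group users by how many impressions they received.
per_frequency = (
    user_level.groupby('impressions')
    .agg(clicks=('clicks', 'sum'), reach=('clicks', 'count'))
    .reset_index()
    .rename(columns={'impressions': 'frequency'})
)
# Every user in a group has exactly `frequency` impressions.
per_frequency['impressions'] = per_frequency.frequency * per_frequency.reach

print(per_frequency)
```

Note that a frequency that no user reached (2 in this example) produces no row in a grouped result.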

Join the query parts and use Python's format method to pass in the parameters set in Step 1

In [None]:
query_text = (q1 + q2 + q3 + q4).format(**dc)
print(query_text)  

Create the query required for ADH 
* When working with ADH the standard BigQuery query needs to be adapted to run in ADH
* This can be done via the API

In [None]:

try:
  full_circle_query = GetService()
except IOError as ex:
  print('Unable to create ads data hub service - %s' % ex)
  print('Did you specify the client secrets file?')
  sys.exit(1)



query_create_body = {
        'name': query_name,
        'title': query_name,
        'queryText': query_text
}

try:
  # Execute the request.
  new_query = full_circle_query.customers().analysisQueries().create(body=query_create_body, parent='customers/' + str(customer_id)).execute()
  new_query_name = new_query["name"]
except HttpError as e:
  print(e)
  sys.exit(1)

print('New query %s created for customer ID "%s":' % (new_query_name, customer_id))
print(json.dumps(new_query))

#### Full Query

In [None]:
# Build the query

dc = {}


if (IDs == ""):
  dc['ID_filters'] = ""

else:
  dc['id_type'] = id_type
  dc['IDs'] = IDs
  dc['ID_filters'] = '''AND event.{id_type} IN ({IDs})'''.format(**dc)

query_text = """
    WITH
    imp_u_clicks AS (
    SELECT
        User_ID,
        event.event_time AS interaction_time,
        'imp' AS interaction_type
    FROM
        adh.cm_dt_impressions
    WHERE
        user_id != '0'
        {ID_filters}


    UNION ALL (
        SELECT
        User_ID,
        event.event_time AS interaction_time,
        'click' AS interaction_type
        FROM
          adh.cm_dt_clicks
        WHERE
        user_id != '0'
        {ID_filters} ) ),
        
    user_level_data AS (
    SELECT
        user_id,
        SUM(IF(interaction_type = 'imp',
            1,
            0)) AS impressions,
        SUM(IF(interaction_type = 'click',
            1,
            0)) AS clicks
    FROM
        imp_u_clicks
    GROUP BY
        user_id)


    SELECT
    impressions AS frequency,
    SUM(clicks) AS clicks,
    SUM(impressions) AS impressions,
    COUNT(*) AS reach
    FROM
    user_level_data
    GROUP BY
    1

    ORDER BY
    frequency ASC

""".format(**dc)
print(query_text)  

try:
  full_circle_query = GetService()
except IOError as ex:
  print('Unable to create ads data hub service - %s' % ex)
  print('Did you specify the client secrets file?')
  sys.exit(1)


query_create_body = {
        'name': query_name,
        'title': query_name,
        'queryText': query_text
}

try:
  # Execute the request.
  new_query = full_circle_query.customers().analysisQueries().create(body=query_create_body, parent='customers/'+ str(customer_id)).execute()
  new_query_name = new_query["name"]
except HttpError as e:
  print(e)
  sys.exit(1)

print('New query %s for customer ID "%s":' % (new_query_name, customer_id))
print(json.dumps(new_query))

#### Check your query exists

https://adsdatahub.google.com/u/0/#/jobs

1. Find your query in the My queries tab
2. Check that your query is valid (there will be a green tick in the top right corner)
3. If your query is not valid, hover over the red exclamation mark to see the issues that need to be resolved


### Step 4 - Run the query

#### Start the query

* Pass the query to ADH using the full_circle_query service created at the start
* Pass in the dates, the destination table name in BigQuery and the customer ID

In [None]:
destination_table_full_path = big_query_project + '.' + big_query_dataset + '.' + big_query_destination_table

CUSTOMER_ID = customer_id
DATASET_ID = dataset_id
QUERY_NAME = query_name
DEST_TABLE = destination_table_full_path

#Dates
format_str = '%Y-%m-%d' # The format
start_date_obj = datetime.datetime.strptime(start_date, format_str)
end_date_obj = datetime.datetime.strptime(end_date, format_str)

START_DATE = {
  "year": start_date_obj.year,
  "month": start_date_obj.month,
  "day": start_date_obj.day
}
END_DATE = {
  "year": end_date_obj.year,
  "month": end_date_obj.month,
  "day": end_date_obj.day
}

try:
  full_circle_query = GetService()
except IOError as ex:
  print('Unable to create ads data hub service - %s' % ex)
  print('Did you specify the client secrets file?')
  sys.exit(1)

query_start_body = {
  'spec': {
      'startDate': START_DATE,
      'endDate': END_DATE,
      'adsDataCustomerId': DATASET_ID
        },
  'destTable': DEST_TABLE,
  'customerId': CUSTOMER_ID
}

try:
  # Execute the request.
  operation = full_circle_query.customers().analysisQueries().start(body=query_start_body, name=new_query_name).execute()

except HttpError as e:
  print(e)
  sys.exit(1)

print('Running query with name "%s" via the following operation:' % query_name)
print(json.dumps(operation))
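The date handling and table path in the cell above can be expressed as two small helpers. This is a minimal sketch, assuming the `year`/`month`/`day` layout used in the request body above; the function names `to_date_dict` and `table_path` are hypothetical.

```python
import datetime


def to_date_dict(date_str, format_str='%Y-%m-%d'):
    """Convert 'YYYY-MM-DD' into the year/month/day dict used in the request body."""
    parsed = datetime.datetime.strptime(date_str, format_str)
    return {'year': parsed.year, 'month': parsed.month, 'day': parsed.day}


def table_path(project, dataset, table):
    """Join project, dataset and table into 'project.dataset.table'."""
    return '.'.join([project, dataset, table])


print(to_date_dict('2019-09-01'))
print(table_path('bq_project_id', 'dataset_name', 'table_name'))
```

`strptime` raises a `ValueError` when the string does not match the format, so a mistyped date is caught before the request is sent.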


#### Retrieve the results from BigQuery

Check that the query has finished running and that the results are saved in the new BigQuery table.
When it is done, we can retrieve them.

In [None]:
import time
statusDone = False

while statusDone is False:
  print("waiting for the job to complete...")
  updatedOperation = full_circle_query.operations().get(name=operation['name']).execute()
  # dict.has_key() was removed in Python 3; .get() returns None if 'done' is absent.
  if updatedOperation.get('done') == True:
    statusDone = True
  else:
    time.sleep(5)

print("Job completed... Getting results")
#run bigQuery query
dc = {}
dc['table'] = big_query_dataset + '.' + big_query_destination_table
q1 = '''
select * from {table} 
  '''.format(**dc)
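The polling loop above waits indefinitely and does not check whether the operation failed. The sketch below shows one way to add a timeout and an error check. It rests on assumptions: `wait_for_operation` is a hypothetical helper, `fetch` stands in for a call such as `full_circle_query.operations().get(name=operation['name']).execute()`, and the operation resource is assumed to report failures in an `error` field once `done` is true.

```python
import time


def wait_for_operation(fetch, poll_seconds=5, timeout_seconds=600, sleep=time.sleep):
    """Poll fetch() until the returned operation dict is done.

    fetch: callable returning the latest operation as a dict.
    Raises RuntimeError on timeout or if the finished operation has an 'error'.
    """
    waited = 0
    while True:
        operation = fetch()
        if operation.get('done'):
            if 'error' in operation:
                raise RuntimeError('Operation failed: %s' % operation['error'])
            return operation
        if waited >= timeout_seconds:
            raise RuntimeError('Timed out after %s seconds' % waited)
        sleep(poll_seconds)
        waited += poll_seconds


# Example with a fake operation that finishes on the third poll.
responses = iter([{'name': 'op'}, {'name': 'op'}, {'name': 'op', 'done': True}])
result = wait_for_operation(lambda: next(responses), sleep=lambda seconds: None)
print(result)
```

Passing `sleep` as a parameter keeps the example fast to run; in the notebook the default `time.sleep` would be used.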


We use the pandas library to run the query.

We pass in the query (q1) and the project ID, and set the SQL dialect to 'standard' (as opposed to legacy SQL).

In [None]:
# Run the query and save the result as a table (also known as a dataframe)
df = pd.io.gbq.read_gbq(q1, project_id=big_query_project, dialect='standard', reauth=True)
print(df)


Save the output as a CSV

In [None]:
# Save the original dataframe as a csv file in case you need to recover the original data
df.to_csv('base_final_user.csv', index=False)

## Step 6 - Set up the data and all the charts that will be plotted

### 6.1 Transform data
 Use the calculation function created in Step 2 to calculate all the derived values from your data

In [None]:
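# Keep only frequencies from 1 up to max_freq, ordered by frequency. Filtering on
# the frequency column (rather than on row position) does not depend on the
# order in which BigQuery returned the rows.
df = df[(df.frequency >= 1) & (df.frequency <= max_freq)].sort_values('frequency').reset_index(drop=True)
df = df_calc_fields(df)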
df2=df.copy() # Copy the dataframe you calculated the fields in case you need to recover it
graphs = [] # Variable to save all graphics

# Analysis 1: Frequency Analysis by user


### Step 1: Set up graphs

In [None]:
# Save all data into a list to plot the graphics

impressions = dict(type='bar', x=df.frequency, y=df.impressions,
                   name='impressions',
                   marker=dict(color='rgb(0, 29, 255)',
                   line=dict(width=1)))

ctr = dict(
    type='scatter',
    x=df.frequency,
    y=df.ctr,
    name='ctr',
    marker=dict(color='rgb(255, 148, 0)', line=dict(width=1)),
    xaxis='x1',
    yaxis='y2',
    )

layout = dict(
    title='Impressions and CTR Comparison on Each Frequency',
    autosize=True,
    legend=dict(x=1.15, y=1),
    hovermode='x',
    xaxis=dict(tickangle=-45, autorange=True, tickfont=dict(size=10),
               title='frequency', type='category'),
    yaxis=dict(showgrid=True, title='impressions'),
    yaxis2=dict(overlaying='y', anchor='x', side='right',
                showgrid=False, title='ctr'),
    )

fig = dict(data=[impressions, ctr], layout=layout)
graphs.append(fig)


			
clicks = dict(type='bar',
              x= df.frequency, 
              y= df.clicks, 
              name='Clicks',            
              marker=dict(color= 'rgb(0, 29, 255)', line= dict(width= 1))
  )
 
ctr = dict(type='scatter',
              x= df.frequency, 
              y= df.cpc, 
              name='cpc',            
              marker=dict(color= 'rgb(255, 148, 0)', line= dict(width= 1)),
              xaxis='x1',
              yaxis='y2'
              
  )

layout = dict(autosize= True, 
              title='Clicks and CPC Comparison on Each Frequency',                   
              legend= dict(x= 1.15, 
                           y= 1
                          ),
              hovermode='x',
              xaxis=dict(tickangle= -45, 
                         autorange=True,
                         tickfont=dict(size= 10), 
                         title= 'frequency', 
                         type= 'category'
                        ), 

              yaxis=dict(
                         showgrid=True, 
                         title= 'clicks'
                        ), 
              yaxis2=dict( 
                          overlaying= 'y', 
                          anchor= 'x', 
                          side= 'right', 
                          showgrid= False, 
                          title= 'cpc'
                         )
             )

fig = dict(data=[clicks, ctr], layout=layout)
graphs.append(fig)



ctr = dict(type='scatter',
              x= df.frequency, 
              y= df.ctr, 
              name='ctr',            
              marker=dict(color= 'rgb(0, 29, 255)', line= dict(width= 1))
  )
 
cpc = dict(type='scatter',
              x= df.frequency, 
              y= df.cpc, 
              name='cpc',            
              marker=dict(color= 'rgb(255, 148, 0)', line= dict(width= 1)),
              xaxis='x1',
              yaxis='y2'
              
  )

layout = dict(autosize= True, 
              title='CTR and CPC Comparison on Each Frequency',
              legend= dict(x= 1.15, 
                           y= 1
                          ),
              hovermode='x',
              xaxis=dict(tickangle= -45, 
                         autorange=True,
                         tickfont=dict(size= 10), 
                         title= 'frequency', 
                         type= 'category',
                         showgrid =False
                        ), 

              yaxis=dict(
                         showgrid=False, 
                         title= 'ctr'
                        ), 
              yaxis2=dict( 
                          overlaying= 'y', 
                          anchor= 'x', 
                          side= 'right', 
                          showgrid= False, 
                          title= 'cpc'
                         )
             )

fig = dict(data=[ctr, cpc], layout=layout)
graphs.append(fig)



pareto = dict(type='scatter',
              x= df.frequency, 
              y= df.coverage_clicks, 
              name='Cumulative % Clicks',            
              marker=dict(color= 'rgb(0, 29, 255)', line= dict(width= 1))
  )
 
cpc = dict(type='scatter',
              x= df.frequency, 
              y= df.cpc, 
              name='cpc',            
              marker=dict(color= 'rgb(255, 148, 0)', line= dict(width= 1)),
              xaxis='x1',
              yaxis='y2'
              
  )

layout = dict(autosize= True, 
              title='Cumulative Clicks and CPC Comparison on Each Frequency',
              legend= dict(x= 1.15, 
                           y= 1
                          ),
              hovermode='x',
              xaxis=dict(tickangle= -45, 
                         autorange=True,
                         tickfont=dict(size= 10), 
                         title= 'frequency', 
                         type= 'category'
                        ), 

              yaxis=dict(
                         showgrid=True, 
                         title= 'cum clicks'
                        ), 
              yaxis2=dict( 
                          overlaying= 'y', 
                          anchor= 'x', 
                          side= 'right', 
                          showgrid= False, 
                          title= 'cpc'
                         )
             )

fig = dict(data=[pareto, cpc], layout=layout)
graphs.append(fig)



pareto = dict(type='scatter',
              x= df.frequency, 
              y= df.coverage_clicks, 
              name='Cumulative % Clicks',            
              marker=dict(color= 'rgb(0, 29, 255)', line= dict(width= 1))
  )
 
cpc = dict(type='scatter',
              x= df.frequency, 
              y= df.ctr, 
              name='ctr',            
              marker=dict(color= 'rgb(255, 148, 0)', line= dict(width= 1)),
              xaxis='x1',
              yaxis='y2'
              
  )

layout = dict(autosize= True, 
              title='Cumulative Clicks and CTR Comparison on Each Frequency',
              legend= dict(x= 1.15, 
                           y= 1
                          ),
              hovermode='x',
              xaxis=dict(tickangle= -45, 
                         autorange=True,
                         tickfont=dict(size= 10), 
                         title= 'frequency', 
                         type= 'category'
                        ), 

              yaxis=dict(
                         showgrid=True, 
                         title= 'cum clicks'
                        ), 
              yaxis2=dict( 
                          overlaying= 'y', 
                          anchor= 'x', 
                          side= 'right', 
                          showgrid= False, 
                          title= 'ctr'
                         )
             )

fig = dict(data=[pareto, cpc], layout=layout)
graphs.append(fig)



pareto = dict(type='scatter',
              x= df.frequency, 
              y= df.coverage_reach, 
              name='Cumulative % Reach',            
              marker=dict(color= 'rgb(0, 29, 255)', line= dict(width= 1))
  )
 
cpc = dict(type='scatter',
              x= df.frequency, 
              y= df.cost, 
              name='cost',            
              marker=dict(color= 'rgb(255, 148, 0)', line= dict(width= 1)),
              xaxis='x1',
              yaxis='y2'
              
  )

layout = dict(autosize= True, 
              title='Cumulative Reach and Cost Comparison on Each Frequency',
              legend= dict(x= 1.15, 
                           y= 1
                          ),
              hovermode='x',
              xaxis=dict(tickangle= -45, 
                         autorange=True,
                         tickfont=dict(size= 10), 
                         title= 'frequency', 
                         type= 'category'
                        ), 

              yaxis=dict(
                         showgrid=True, 
                         title= 'cumulative reach'
                        ), 
              yaxis2=dict( 
                          overlaying= 'y', 
                          anchor= 'x', 
                          side= 'right', 
                          showgrid= False, 
                          title= 'cost'
                         )
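             )

# Register the reach/cost chart as well; without this the definitions above are never used
fig = dict(data=[pareto, cpc], layout=layout)
graphs.append(fig)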


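The chart cells above repeat the same trace and layout boilerplate for every figure. As a minimal sketch, a helper could build the same kind of figure dict from its inputs. The name `dual_axis_figure` and its parameters are hypothetical and not part of the original notebook, and the sample values are made up for illustration.

```python
def dual_axis_figure(x, y_left, y_right, left_name, right_name, title):
    """Build a Plotly figure dict: two scatter traces sharing the x-axis,
    the second one plotted against a right-hand y-axis (hypothetical helper)."""
    left_trace = dict(type='scatter', x=list(x), y=list(y_left), name=left_name,
                      marker=dict(color='rgb(0, 29, 255)', line=dict(width=1)))
    right_trace = dict(type='scatter', x=list(x), y=list(y_right), name=right_name,
                       marker=dict(color='rgb(255, 148, 0)', line=dict(width=1)),
                       xaxis='x', yaxis='y2')
    layout = dict(autosize=True,
                  title=title,
                  legend=dict(x=1.15, y=1),
                  hovermode='x',
                  xaxis=dict(tickangle=-45, autorange=True,
                             tickfont=dict(size=10), title='frequency',
                             type='category'),
                  yaxis=dict(showgrid=True, title=left_name),
                  yaxis2=dict(overlaying='y', anchor='x', side='right',
                              showgrid=False, title=right_name))
    return dict(data=[left_trace, right_trace], layout=layout)


# Made-up sample values, only to show the shape of the result
example_fig = dual_axis_figure([1, 2, 3], [40.0, 70.0, 100.0], [12.5, 20.0, 27.5],
                               'Cumulative % Reach', 'cost',
                               'Cumulative Reach and Cost Comparison on Each Frequency')
```

In the notebook, the same call could take `df.frequency`, `df.coverage_reach` and `df.cost` as inputs, and the returned dict could be appended to `graphs`.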

### Step 2: Export all the data (optional) 

In [None]:
# Export the whole dataframe to a csv file that can be used in an external environment
df.to_csv('freq_analysis.csv', index=False)

# Show the first 5 rows of the dataframe (data matrix) with the final data
# (kept as the last expression so that the notebook renders it)
df.head()

## Output: Visualise the data

**Impression and CTR on each frequency**

**Clicks and CPC Comparison on Each Frequency**

**CTR and CPC Comparison on Each Frequency**

**Cumulative Clicks and CPC Comparison on Each Frequency**

**Cumulative Clicks and CTR Comparison on Each Frequency**


### Impression and CTR on each frequency


1. Consider your frequency range and ensure that frequency management is in place.
2. Where is your CTR floor? At what point does your CTR drop below a level that you care about?
3. Determine how many impressions are wasted if you don't change your frequency.
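The questions above can be sketched in code. This is a minimal example on made-up numbers: the column names `impressions`, `clicks` and `ctr`, the `ctr_floor` threshold and the helper `wasted_impressions` are assumptions for illustration, not names confirmed by the notebook. It also treats every impression served at or beyond the first below-floor frequency as wasted, which is a simplification.

```python
import pandas as pd

# Hypothetical per-frequency data for illustration only
sample = pd.DataFrame({
    "frequency": [1, 2, 3, 4, 5],
    "impressions": [1000, 800, 600, 400, 200],
    "clicks": [50, 32, 12, 4, 1],
})
sample["ctr"] = sample["clicks"] / sample["impressions"]

ctr_floor = 0.03  # assumed threshold: the lowest CTR you consider acceptable


def wasted_impressions(frame, floor):
    """Return (first frequency whose CTR is below the floor,
    impressions served at that frequency or higher)."""
    below = frame[frame["ctr"] < floor]
    if below.empty:
        return None, 0
    first_bad = int(below["frequency"].min())
    wasted = int(frame.loc[frame["frequency"] >= first_bad, "impressions"].sum())
    return first_bad, wasted


floor_frequency, wasted = wasted_impressions(sample, ctr_floor)
print(floor_frequency, wasted)
```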

In [None]:
enable_plotly_in_cell()
iplot(graphs[0])

### Clicks and CPC Comparison on Each Frequency


1. What is your CPC ceiling?
2. Understand what the frequency is at that level.
3. Determine what impact changing your frequency will have on clicks.

In [None]:
enable_plotly_in_cell()
iplot(graphs[1])

### CTR and CPC Comparison on Each Frequency


1. How do your CTR and CPC affect each other?
2. Make an informed decision regarding suitable goals.


In [None]:
enable_plotly_in_cell()
iplot(graphs[2])

### Cumulative Clicks and CPC Comparison on Each Frequency


Understand what a suitable CPC goal might be:
1. What is the change in cost for increased clicks?
2. What are the incremental gains for an increased cost?
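One way to quantify these two questions is the marginal cost per click between consecutive frequencies. The sketch below runs on made-up numbers. The `cumulative_cost` column is an assumption (the notebook only shows a `cost` column), and `marginal_cpc` is a name introduced here for illustration.

```python
import pandas as pd

# Hypothetical cumulative values per frequency, for illustration only
sample = pd.DataFrame({
    "frequency": [1, 2, 3, 4],
    "cumulative_clicks": [100, 160, 190, 200],
    "cumulative_cost": [50.0, 90.0, 120.0, 140.0],
})

# Difference to the previous frequency: extra clicks gained and extra cost paid
sample["incremental_clicks"] = sample["cumulative_clicks"].diff()
sample["incremental_cost"] = sample["cumulative_cost"].diff()

# Cost of each additional click gained by allowing one more exposure
sample["marginal_cpc"] = sample["incremental_cost"] / sample["incremental_clicks"]

print(sample[["frequency", "incremental_clicks", "incremental_cost", "marginal_cpc"]])
```

The first row has no previous frequency to compare against, so its incremental values are `NaN`. A marginal CPC that rises with frequency suggests that each further exposure buys clicks at a higher price.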
 

In [None]:
enable_plotly_in_cell()
iplot(graphs[3])

### Cumulative Clicks and CTR Comparison on Each Frequency


1. At what frequency does your CTR drop below an acceptable value?

In [None]:
enable_plotly_in_cell()
iplot(graphs[4])

# Analysis 2: Understanding optimal frequency

## Step 1: Set up charts

In [None]:
# Understand the logic behind the calculation
graphs2 = []
pareto = dict(type='scatter',
              x= df.frequency, 
              y= df.coverage_reach, 
              name='Cumulative % Reach',            
              marker=dict(color= 'rgb(0, 29, 255)', line= dict(width= 1))
  )
 
ccm_imp = dict(type='scatter',
              x= df.frequency, 
              y= df.coverage_impressions, 
              name='Cumulative % Impressions',            
              marker=dict(color= 'rgb(255, 148, 0)', line= dict(width= 1)),
              xaxis='x1',
              yaxis='y'
              
  )

layout = dict(autosize= True, 
              title='Cumulative Impressions and Cumulative Reach on Each Frequency',
              legend= dict(x= 1.15, 
                           y= 1
                          ),
              hovermode='x',
              xaxis=dict(tickangle= -45, 
                         autorange=True,
                         tickfont=dict(size= 10), 
                         title= 'frequency', 
                         type= 'category'
                        ), 

              yaxis=dict(
                         showgrid=True, 
                         title= 'cumulative %'
                        )
             )
fig = dict(data=[pareto, ccm_imp], layout=layout)
graphs2.append(fig)


pareto = dict(type='scatter',
              x= df.frequency, 
              y= df.coverage_clicks, 
              name='Cumulative % Clicks',            
              marker=dict(color= 'rgb(0, 29, 255)', line= dict(width= 1))
  )
 
ccm_imp = dict(type='scatter',
              x= df.frequency, 
              y= df.coverage_impressions, 
              name='Cumulative % Impressions',            
              marker=dict(color= 'rgb(255, 148, 0)', line= dict(width= 1)),
              xaxis='x1',
              yaxis='y'
              
  )

layout = dict(autosize= True, 
              title='Cumulative Impressions and Cumulative Clicks on Each Frequency',
              legend= dict(x= 1.15, 
                           y= 1
                          ),
              hovermode='x',
              xaxis=dict(tickangle= -45, 
                         autorange=True,
                         tickfont=dict(size= 10), 
                         title= 'frequency', 
                         type= 'category'
                        ), 

              yaxis=dict(
                         showgrid=True, 
                         title= 'cumulative %'
                        )
             )
fig = dict(data=[pareto, ccm_imp], layout=layout)
graphs2.append(fig)

## Output: Visualise the results

### Cumulative Impressions and Cumulative Reach on Each Frequency


1. How do you maximise your reach without drastically increasing your impressions?
2. To reach your goals, what frequency do you need, and at what cost in impressions?

With higher frequency caps, you will need more impressions to maximise your reach.
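The second question can be answered by scanning the cumulative series for the first frequency that meets a reach target. The following is a minimal sketch on made-up numbers. It assumes the coverage values are percentages on a 0 to 100 scale, sorted by ascending frequency, and `frequency_for_reach` is a hypothetical helper, not part of the notebook.

```python
def frequency_for_reach(frequencies, coverage_reach, coverage_impressions, target_reach):
    """Return (frequency, cumulative % impressions) for the first frequency
    whose cumulative % reach meets the target, or None if it is never met."""
    for freq, reach, imps in zip(frequencies, coverage_reach, coverage_impressions):
        if reach >= target_reach:
            return freq, imps
    return None


# Hypothetical values for illustration only
frequencies = [1, 2, 3, 4, 5]
coverage_reach = [55.0, 75.0, 88.0, 95.0, 100.0]
coverage_impressions = [30.0, 52.0, 70.0, 85.0, 100.0]

result = frequency_for_reach(frequencies, coverage_reach, coverage_impressions, 85.0)
print(result)
```

With the notebook's dataframe, the same function could be called as `frequency_for_reach(df.frequency, df.coverage_reach, df.coverage_impressions, 85.0)`, since iterating a pandas Series yields its values.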

In [None]:
enable_plotly_in_cell()
iplot(graphs2[0])

### Cumulative Impressions and Cumulative Clicks on Each Frequency

1. To reach your click goals, what frequency do you need, and at what cost in impressions?



In [None]:
enable_plotly_in_cell()
iplot(graphs2[1])

# Analysis 3: Determine impressions outside optimal frequency

## Step 1: Define parameter to be the Optimal Frequency
The parameter below drives the media-loss analysis in terms of impressions. We calculate the percentage of impressions served beyond the frequency you set as optimal.

In [None]:
#@title 1.1 - Optimal Frequency
optimal_freq = 3 #@param {type:"integer", allow-input: true}

## Output: Calculate impression loss

In [None]:
from __future__ import division

df2 = df_calc_fields(df2)
df_opt, df_not_opt = df[1:optimal_freq+1], df[optimal_freq+1:]

total_impressions = list(df2.cumulative_impressions)[-1]
total_imp_not_opt = list(df_not_opt.cumulative_impressions)[-1] - list(df_opt.cumulative_impressions)[-1]

imp_not_opt_ratio = total_imp_not_opt / total_impressions

total_clicks = list(df2.cumulative_clicks)[-1]
total_clicks_not_opt = list(df_not_opt.cumulative_clicks)[-1] - list(df_opt.cumulative_clicks)[-1]

clicks_within_opt_ratio = 1-(total_clicks_not_opt / total_clicks)


print("{:.1f}% of your total impressions are beyond the optimal frequency.".format(imp_not_opt_ratio*100))
print("{:,} of your impressions are beyond the optimal frequency.".format(total_imp_not_opt))
# CPM is the cost per 1,000 impressions, so divide by 1,000 to get the saving
print("At a CPM of {} - preventing these would result in a cost saving of {:,}".format(cpm, cpm*total_imp_not_opt/1000))
print("")
print("If you limited frequency to {}, you would still achieve {:.1f}% of your clicks".format(optimal_freq, clicks_within_opt_ratio*100))
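
The calculation above can be sketched as a small standalone function, which makes the arithmetic easier to check on known numbers. This is an illustration under stated assumptions, not the notebook's own code: `impressions_beyond` is a hypothetical helper, the sample values are made up, and CPM is taken to mean cost per 1,000 impressions.

```python
def impressions_beyond(frequencies, cumulative_impressions, optimal_freq, cpm):
    """Summarise impressions served above the optimal frequency.

    frequencies and cumulative_impressions are parallel sequences sorted by
    ascending frequency; cpm is assumed to be the cost per 1,000 impressions.
    """
    total = cumulative_impressions[-1]
    # Cumulative impressions at the highest frequency that is still within the optimum
    within = max((cum for freq, cum in zip(frequencies, cumulative_impressions)
                  if freq <= optimal_freq), default=0)
    beyond = total - within
    return {
        "beyond": beyond,
        "ratio": beyond / total,
        "saving": cpm * beyond / 1000,
    }


# Hypothetical values for illustration only
summary = impressions_beyond([1, 2, 3, 4, 5],
                             [1000, 1800, 2400, 2800, 3000],
                             optimal_freq=3, cpm=2.5)
print(summary)
```

Taking the sequences as arguments, rather than slicing the dataframe by row position, avoids relying on the row index lining up with the frequency value.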