We're going to build a text classifier, much like a spam classifier or sentiment analyzer. This use case will focus on predicting whether or not we'll like job descriptions in job offer emails from recruiters. By using two labels to represent good & bad job offers, we'll train our classifier to know what kinds of jobs we like.

First, we need data. Our initial input will be labeled email from Gmail.

In [37]:
# Label _names_ for good and bad job offers.
POS_LABEL = 'job-offers/yes'
NEG_LABEL = 'job-offers-no'

In [47]:
import argparse
import httplib2
import os

from apiclient import discovery
import oauth2client
from oauth2client import client
from oauth2client import tools

# Fake the CLI flags
flags = argparse.ArgumentParser(parents=[tools.argparser]).parse_args(
    ['--noauth_local_webserver'])

# If modifying these scopes, delete your previously saved credentials
# at ~/.credentials/gmail-python-quickstart.json
SCOPES = [
    'https://mail.google.com/',
    'https://www.googleapis.com/auth/gmail.readonly',
    'https://www.googleapis.com/auth/gmail.labels'
]
APPLICATION_NAME = 'HeadHunt-Dev'


def get_credentials_path(dirname='~/.credentials', 
                         filename='gmail-python-quickstart.json'):
    creds_dir = os.path.expanduser(dirname)
    if not os.path.exists(creds_dir):
        os.makedirs(creds_dir)
    return os.path.join(creds_dir, filename)


def get_credentials(client_secrets='client_secret.json'):
    """Gets valid user credentials from storage.

    If nothing has been stored, or if the stored credentials are invalid,
    the OAuth2 flow is completed to obtain the new credentials.

    Returns:
        Credentials, the obtained credential.
    """
    credential_path = get_credentials_path()
    store = oauth2client.file.Storage(credential_path)
    credentials = store.get()
    if not credentials or credentials.invalid:
        flow = client.flow_from_clientsecrets(client_secrets, SCOPES)
        flow.user_agent = APPLICATION_NAME
        credentials = tools.run_flow(flow, store, flags)
        print('Storing credentials to ' + credential_path)
    return credentials

In [48]:
# Execute OAuth2 flow
credentials = get_credentials()
# Create an enhanced httplib2.Http instance that sends the correct headers
# and can do an access token refresh on a 401.
http = credentials.authorize(httplib2.Http())
# All of Google's APIs are described in documents that this client knows how 
# to retrieve, parse, and build an API client from.
service = discovery.build('gmail', 'v1', http=http)

Go to the following link in your browser:

    https://accounts.google.com/o/oauth2/auth?response_type=code&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&client_id=1032309364874-qukjfm3q9mjnnkqej0gj6gii19f6gqr1.apps.googleusercontent.com&access_type=offline&scope=https%3A%2F%2Fmail.google.com%2F+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fgmail.readonly+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fgmail.labels

Enter verification code: 4/7Fgvirms1t4QnltY_Qo4TSuXELknHsdmbyuumH-jxnA
Authentication successful.
Storing credentials to /home/jovyan/.credentials/gmail-python-quickstart.json


We've authenticated to Google's APIs, generating an access token (`credentials.access_token`). 

In [49]:
import json
with open(get_credentials_path()) as f:
    print(json.dumps(json.load(f), indent=2))

{
  "access_token": "ya29.Ci_-Ann_Jnjs2KN0TM7kR-L0i2-70LhK5pNrR3-9u-XTGzn8eUNLOS7UHMArZm0KQg",
  "token_uri": "https://accounts.google.com/o/oauth2/token",
  "client_secret": "3-i7BsbhSrj0-iA0ixfWawVu",
  "token_info_uri": "https://www.googleapis.com/oauth2/v3/tokeninfo",
  "client_id": "1032309364874-qukjfm3q9mjnnkqej0gj6gii19f6gqr1.apps.googleusercontent.com",
  "revoke_uri": "https://accounts.google.com/o/oauth2/revoke",
  "token_expiry": "2016-06-11T17:38:06Z",
  "id_token": null,
  "_module": "oauth2client.client",
  "scopes": [
    "https://www.googleapis.com/auth/gmail.readonly",
    "https://mail.google.com/",
    "https://www.googleapis.com/auth/gmail.labels"
  ],
  "refresh_token": "1/iIWpZ1RdvmdNmslc_HS9Gj9r7cQRxlpS5wAt-gwCEnE",
  "invalid": false,
  "user_agent": "HeadHunt-Dev",
  "token_response": {
    "access_token": "ya29.Ci_-Ann_Jnjs2KN0TM7kR-L0i2-70LhK5pNrR3-9u-XTGzn8eUNLOS7UHMArZm0KQg",
    "refresh_token": "1/iIWpZ1RdvmdNmslc_HS9Gj9r7cQRxlpS5wAt-gwCEnE",
    "token_

Now let's retrieve some emails.

Gmail lets you retrieve emails by a label's _ID_, not its name. Our first step will be to turn the human-readable label names defined at the top of this notebook into their label IDs.

In [50]:
def get_label_ids(gmail, user_id='me', labels=[]):
    """Build a dictionary of label names => label IDs for further requests.

    Google uses the label's ID when retrieving email, which is different from the label 
    name. We can figure the ID out by retrieving all labels w/ their respective IDs, and
    filtering for just the ones we want.

    Args:
        gmail: Authorized gmail API service instance.
        user_id (str): User's email address, or "me" to indicate the authenticated user.
        labels (list[str]): List of label _names_ to filter on. 
    
    Returns:
        Dictionary of {"label_name": "label_id"}.
        
    Examples:
    
        >>> get_label_ids(service, labels=['my-label', 'category/sub-label'])
        {'my-label': 'Label_34', 'category/sub-label': 'Label_26'}
    """
    # Retrieve all labels
    results = gmail.users().labels().list(userId=user_id).execute()
    _labels = results.get('labels', [])
    # filter for only the labels we want
    return {x['name']: x['id'] for x in _labels if x['name'] in labels}

In [52]:
# lbl_name_id maps {label names => label IDs}
lbl_name_id = get_label_ids(service, labels=[POS_LABEL, NEG_LABEL])
# lbl_id_name maps {label ID => label name}; convenient for readability
lbl_id_name = {v: k for k, v in lbl_name_id.items()}
print(lbl_name_id)

{'job-offers-no': 'Label_60', 'job-offers/yes': 'Label_62'}
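One thing to watch for: `get_label_ids` silently drops any requested name that Gmail doesn't know about, so a typo in a label name only shows up later as a missing key. A minimal sketch of a sanity check follows; `find_missing_labels` is a hypothetical helper, not part of the Gmail client, and the misspelled label is deliberate.

```python
def find_missing_labels(wanted, found):
    """Return the requested label names that the lookup did not resolve."""
    return sorted(set(wanted) - set(found))


# Mapping copied from the output above; 'job-offers/no' is a deliberate typo.
found = {'job-offers-no': 'Label_60', 'job-offers/yes': 'Label_62'}
missing = find_missing_labels(['job-offers/yes', 'job-offers/no'], found)
print(missing)
```

Running this right after building `lbl_name_id` makes a mistyped label fail loudly instead of quietly shrinking the training set.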


With the label IDs recovered, we grab every email we can from Gmail, using only a slight modification to their example code [here](https://developers.google.com/gmail/api/v1/reference/users/messages/list#response).

In [68]:
def get_messages_by_labels(gmail, user_id='me', label_ids=[]):
    """Retrieve lists of messages by label.
    
    Args:
        gmail (googleapiclient.discovery.Resource): API resource configured for Gmail. 
        user_id: User ID we're retrieving on behalf of. The default special-case of 'me' uses
            the authorizing user of the access token.
        label_ids (list[str]): Label IDs to retrieve messages under. 
            
    Returns:
        Dictionary mapping each label ID to a list of message stubs
        ({'id': ..., 'threadId': ...}); message bodies are not included.
    """
    from apiclient import errors
    d = {}
    for lbl in label_ids:
        try:
            # Retrieve all messages that match the given label
            response = gmail.users().messages().list(userId=user_id,
                                                     labelIds=[lbl]).execute()
            # The first page has no 'messages' key if nothing matched
            d[lbl] = list(response.get('messages', []))

            # Handle the pagination to get the rest
            while 'nextPageToken' in response:
                page_token = response['nextPageToken']
                response = gmail.users().messages().list(userId=user_id,
                                                         labelIds=[lbl],
                                                         pageToken=page_token).execute()
                d[lbl].extend(response.get('messages', []))
        except errors.HttpError as e:
            print('An error occurred: {}'.format(e))
    return d

In [69]:
msgs = get_messages_by_labels(service, label_ids=list(lbl_name_id.values()))

In [70]:
for lbl_id, m in msgs.items():
    print("Retrieved {:d} emails labeled '{}'".format(len(m), lbl_id_name[lbl_id]))

Retrieved 39 emails labeled 'job-offers/yes'
Retrieved 112 emails labeled 'job-offers-no'
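The pagination loop is the part of `get_messages_by_labels` most worth understanding in isolation. Here is a minimal offline sketch of the same token-following pattern. `list_all` and `FAKE_PAGES` are hypothetical names invented for illustration; the canned pages only imitate the shape of a `messages.list` response.

```python
def list_all(fetch_page):
    """Collect items from every page of a token-paginated listing.

    fetch_page is a hypothetical callable: it takes a page token (None for
    the first page) and returns a dict shaped like a messages.list response.
    """
    items = []
    page_token = None
    while True:
        response = fetch_page(page_token)
        # A page may carry no 'messages' key at all, e.g. for an empty label.
        items.extend(response.get('messages', []))
        page_token = response.get('nextPageToken')
        if not page_token:
            return items


# Canned pages standing in for the API so the sketch runs offline.
FAKE_PAGES = {
    None: {'messages': [{'id': 'a'}, {'id': 'b'}], 'nextPageToken': 't1'},
    't1': {'messages': [{'id': 'c'}], 'nextPageToken': 't2'},
    't2': {},
}

all_messages = list_all(lambda token: FAKE_PAGES[token])
print([m['id'] for m in all_messages])
```

The loop stops on the first response without a `nextPageToken`, which is the same exit condition the function above relies on.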


### Cleaning and storing 

With the emails downloaded, we'll need to pre-process them before we start running analyses. Since we don't want to have to keep re-downloading and processing the emails every time, we'll also need to figure out a storage solution.

We'll throw everything into a pandas DataFrame to make indexing simple, and see if we can't hook it up to a more persistent datastore down the line.

In [71]:
import pandas as pd

Currently, our data is grouped by label ID. Also, importantly, _there is no actual message body_. This will require another lookup from Gmail, which we'll get to in a minute. First, let's get this metadata in order.

In [72]:
msgs[lbl_name_id[NEG_LABEL]][:5]

[{'id': '1552c3134e732bff', 'threadId': '1552c3134e732bff'},
 {'id': '15507767e9b5725b', 'threadId': '15507767e9b5725b'},
 {'id': '154deb710bc21e40', 'threadId': '154deb710bc21e40'},
 {'id': '154d084eca46f99d', 'threadId': '154d084eca46f99d'},
 {'id': '154ce64cec2bfc18', 'threadId': '154ce64cec2bfc18'}]

We'll flatten this by inserting each message into the DataFrame, and making LabelID (and label name, for readability's sake) a column.

In [96]:
df = pd.DataFrame()
for lbl_id in msgs:
    # Build a DataFrame only containing labels of one kind
    tmp_df = pd.DataFrame(msgs[lbl_id])
    tmp_df['labelId'] = lbl_id
    tmp_df['labelName'] = lbl_id_name[lbl_id]
    # Append it to our master DataFrame (pd.concat also works on pandas
    # versions where DataFrame.append is no longer available)
    df = pd.concat([df, tmp_df], ignore_index=True)
    
rand_n = 5
print("Showing %d random rows" % rand_n)
df.sample(rand_n)

                 id          threadId   labelId       labelName
0  154e57999204ba74  154e57999204ba74  Label_62  job-offers/yes
1  154e43cfbc28473a  154e43cfbc28473a  Label_62  job-offers/yes
2  154dfeff4602a39c  154dfeff4602a39c  Label_62  job-offers/yes
3  154bf8d38e70bcd1  154bf8d38e70bcd1  Label_62  job-offers/yes
4  154ab4aa9b3a3328  154ab4aa9b3a3328  Label_62  job-offers/yes
                 id          threadId   labelId      labelName
0  1552c3134e732bff  1552c3134e732bff  Label_60  job-offers-no
1  15507767e9b5725b  15507767e9b5725b  Label_60  job-offers-no
2  154deb710bc21e40  154deb710bc21e40  Label_60  job-offers-no
3  154d084eca46f99d  154d084eca46f99d  Label_60  job-offers-no
4  154ce64cec2bfc18  154ce64cec2bfc18  Label_60  job-offers-no
Showing 5 random rows


                  id          threadId   labelId       labelName
13  1541c517be4661c3  1541c517be4661c3  Label_62  job-offers/yes
57  1547840f0d45f364  1547840f0d45f364  Label_60   job-offers-no
97  15385b9c3eac6ccb  15385b9c3eac6ccb  Label_60   job-offers-no
49  1549bf574c488d99  1549bf574c488d99  Label_60   job-offers-no
35  15319f3f845241fa  15319f3f845241fa  Label_62  job-offers/yes
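The same flattening can be done in a single pass by building a plain list of row dicts first. This is a sketch of an alternative, not what the cell above does; `flatten_messages` is a hypothetical helper and the sample IDs are copied from the outputs shown earlier.

```python
def flatten_messages(msgs_by_label, id_to_name):
    """Turn {label_id: [message stubs]} into one flat list of row dicts."""
    rows = []
    for label_id, stubs in msgs_by_label.items():
        for stub in stubs:
            row = dict(stub)  # copy so the original stub is left untouched
            row['labelId'] = label_id
            row['labelName'] = id_to_name[label_id]
            rows.append(row)
    return rows


sample = {
    'Label_62': [{'id': '154e57999204ba74', 'threadId': '154e57999204ba74'}],
    'Label_60': [{'id': '1552c3134e732bff', 'threadId': '1552c3134e732bff'},
                 {'id': '15507767e9b5725b', 'threadId': '15507767e9b5725b'}],
}
names = {'Label_62': 'job-offers/yes', 'Label_60': 'job-offers-no'}
rows = flatten_messages(sample, names)
print(len(rows), rows[0])
```

Passing `rows` to `pd.DataFrame(rows)` should give the same four columns in one call, without growing a DataFrame inside the loop.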


Whew! Our data is now tabular, a good starting point. Let's get the body of the emails now.

In [103]:
def get_message(gmail, msg_id, user_id='me'):
    """Get a Message with given ID.

    Taken from: https://developers.google.com/gmail/api/v1/reference/users/messages/get#examples

    Args:
        gmail: Authorized Gmail API service instance.
        msg_id: The ID of the Message required.
        user_id: User's email address. The special value "me"
            can be used to indicate the authenticated user.

    Returns:
        A Message, or None if the request failed.
    """
    # Imported here because the earlier import lives inside another function
    from apiclient import errors
    try:
        return gmail.users().messages().get(userId=user_id, id=msg_id).execute()
    except errors.HttpError as e:
        print('An error occurred: %s' % e)
        return None

In [None]:
# Grab whatever the first message is
first_msg = df.iloc[0]
msg = get_message(service, first_msg['id'])

In [122]:
# Pretty-print the result
print(json.dumps(msg, indent=2))

{
  "historyId": "2867343",
  "payload": {
    "body": {
      "size": 3441,
      "data": "PGh0bWw-CjxoZWFkPgoJPHRpdGxlPk5vd-KAmXMgYSBncmVhdCB0aW1lIGZvciBhIGNhcmVlciBjaGFuZ2U8L3RpdGxlPgo8L2hlYWQ-Cgo8Ym9keT4KCkhpIEplcmVteSwKPGJyLz48YnIvPgpJ4oCZdmUgZ290IGEgZ3JlYXQgcm9sZSB5b3UgbWF5IGJlIGludGVyZXN0ZWQgaW46IDxhIGhyZWY9Imh0dHA6Ly93d3cuQ3liZXJDb2RlcnMuY29tL3FjLmFzcHg_cG9zSWQ9Q0M2LTEyODk1MzcmYWQ9Q1MyQ2FtZXJvbi5DYXJsdG9uJnV0bV9zb3VyY2U9Y2FuZGlkYXRlJnV0bV9tZWRpdW09ZW1haWwmdXRtX2NhbXBhaWduPW5ldy1qb2JzMyI-U2VuaW9yIFN5c3RlbXMgRW5naW5lZXIgLSBQeXRob24gJiAiR28iIChsYW5nKTwvYT4KPGJyLz48YnIvPgpQbGVhc2UgcmVwbHkgd2l0aCBhbiB1cGRhdGVkIHJlc3VtZSBvciBhcHBseSBkaXJlY3RseSB0aHJvdWdoIHRoZSBsaW5rIGlmIGl0IHNvdW5kcyBsaWtlIGEgZ29vZCBtYXRjaC4gCjxici8-PGJyLz4KSWYgeW914oCZcmUgbm90IGxvb2tpbmcgYW5kIGtub3cgc29tZW9uZSBlbHNlIHdobyBtYXkgYmUgYSBmaXQsIGZlZWwgZnJlZSB0byBmb3J3YXJkIHRoZW0gdGhpcyBlbWFpbC4gCjxici8-PGJyLz4KTG9va2luZyBmb3J3YXJkIHRvIHlvdXIgcmVzcG9uc2UhCjxici8-PGJyLz4KQmVzdCByZWdhcmRzLAo8YnIvPjxici8-CkNhbWVyb24KPGJyLz48

That's kind of a lot, and the actual body appears to be base64-encoded. Let's fix that:

In [129]:
import base64
body_data_bytes = base64.urlsafe_b64decode(msg['payload']['body']['data'])
body_data = body_data_bytes.decode() # From bytes to string; Python 3, yo.
print(body_data)

<html>
<head>
	<title>Now’s a great time for a career change</title>
</head>

<body>

Hi Jeremy,
<br/><br/>
I’ve got a great role you may be interested in: <a href="http://www.CyberCoders.com/qc.aspx?posId=CC6-1289537&ad=CS2Cameron.Carlton&utm_source=candidate&utm_medium=email&utm_campaign=new-jobs3">Senior Systems Engineer - Python & "Go" (lang)</a>
<br/><br/>
Please reply with an updated resume or apply directly through the link if it sounds like a good match. 
<br/><br/>
If you’re not looking and know someone else who may be a fit, feel free to forward them this email. 
<br/><br/>
Looking forward to your response!
<br/><br/>
Best regards,
<br/><br/>
Cameron
<br/><br/>
<font size="2" style="font-family: arial; color: #FF8000; font-weight: bold;">
    Cameron.Carlton
</font>

<font size="2" style="font-family: arial;">
| Executive Recruiter | CyberCoders
</font>

<br />

<font size="2" style="font-family: arial;">
 | Follow Us: 
</font>
	
<a href="https://www.facebook.com/CyberCoders/

Nice. Now we've got the email shown at the HTML level, which is just one level below how we, as humans, read it. Certainly a more understandable form than a URL-safe base64-encoded string of gibberish. Let's see what pretty-printing it with the indented structure in place tells us about the structure of this email.
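A quick aside on the decode step before we do: `base64.urlsafe_b64decode` raises an error when the trailing `=` padding is missing. Whether Gmail ever omits it is an assumption here, not something this notebook ran into, but guarding against it is cheap. `decode_base64url` is a hypothetical helper for illustration.

```python
import base64


def decode_base64url(data):
    """Decode URL-safe base64 text, tolerating missing '=' padding."""
    # Base64 text must be a multiple of 4 characters long; top it up if needed.
    padded = data + '=' * (-len(data) % 4)
    return base64.urlsafe_b64decode(padded).decode('utf-8')


# Round-trip a made-up snippet with its padding stripped off.
encoded = base64.urlsafe_b64encode('Hi Jeremy, <br/>'.encode('utf-8')).decode('ascii')
stripped = encoded.rstrip('=')
print(decode_base64url(stripped))
```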

In [134]:
from bs4 import BeautifulSoup
soup = BeautifulSoup(body_data, 'html.parser')
# Print the first 30 lines
print('\n'.join(soup.prettify().split('\n')[:30]))

<html>
 <head>
  <title>
   Now’s a great time for a career change
  </title>
 </head>
 <body>
  Hi Jeremy,
  <br/>
  <br/>
  I’ve got a great role you may be interested in:
  <a href="http://www.CyberCoders.com/qc.aspx?posId=CC6-1289537&amp;ad=CS2Cameron.Carlton&amp;utm_source=candidate&amp;utm_medium=email&amp;utm_campaign=new-jobs3">
   Senior Systems Engineer - Python &amp; "Go" (lang)
  </a>
  <br/>
  <br/>
  Please reply with an updated resume or apply directly through the link if it sounds like a good match.
  <br/>
  <br/>
  If you’re not looking and know someone else who may be a fit, feel free to forward them this email.
  <br/>
  <br/>
  Looking forward to your response!
  <br/>
  <br/>
  Best regards,
  <br/>
  <br/>
  Cameron
  <br/>


In [136]:
print("All the URLs")
for link in soup.find_all('a'):
    print(link.get('href'))
    
print("All the text")
print(soup.get_text())

All the URLs
http://www.CyberCoders.com/qc.aspx?posId=CC6-1289537&ad=CS2Cameron.Carlton&utm_source=candidate&utm_medium=email&utm_campaign=new-jobs3
https://www.facebook.com/CyberCoders/
http://www.linkedin.com/company/cybercoders/cybercoders-recruiting-451541/product
http://www.twitter.com/CyberCoders
http://www.cybercoders.com/jobs/by-recruiter/Cameron.Carlton/?utm_source=signature&utm_medium=email&utm_content=recruiterbio&utm_campaign=new-jobs
http://www.cybercoders.com/?utm_source=candidate&utm_medium=email&utm_campaign=new-jobs
mailto:cameron.carlton.b1@cybercoders.com
http://www.cybercoders.com/unsubscribe/byemail?utm_source=candidate&utm_medium=email&utm_campaign=new-jobs
All the text


Now’s a great time for a career change



Hi Jeremy,

I’ve got a great role you may be interested in: Senior Systems Engineer - Python & "Go" (lang)

Please reply with an updated resume or apply directly through the link if it sounds like a good match. 

If you’re not looking and know someone els

Al-_right_, it looks like the text from the email will serve to get started. Let's grab every email, store it in our table, and strip just the text from it.

In [143]:
print("Columns before email body retrieval: {}".format(df.columns))

Columns before email body retrieval: Index(['id', 'threadId', 'labelId', 'labelName'], dtype='object')


In [165]:
# Add an empty email column
none_col = pd.Series([None] * len(df), index=df.index, dtype=object)
df = df.assign(email=none_col)

In [191]:
# Retrieve each email and store in the 'email' column of our dataframe
import time
start = time.perf_counter()
for i, row in df.iterrows():
    # .at sets a single cell; it stands in for the older df.set_value
    df.at[i, 'email'] = get_message(service, row['id'])
print("Retrieved {:d} emails in {:.3f} seconds".format(len(df), time.perf_counter() - start))

Retrieved 151 emails in 25.865 seconds


In [368]:
# Let's save our work so far.

df.to_csv('emails.csv')
!ls -lh emails.csv

-rw-r--r-- 1 jovyan users 3.4M Jun 11 23:31 emails.csv


In [369]:
!head -n 1 emails.csv

,id,threadId,labelId,labelName,email


In [394]:
df.loc[:, 'email'].apply(print)

{'historyId': '2867343', 'payload': {'body': {'size': 3441, 'data': 'PGh0bWw-CjxoZWFkPgoJPHRpdGxlPk5vd-KAmXMgYSBncmVhdCB0aW1lIGZvciBhIGNhcmVlciBjaGFuZ2U8L3RpdGxlPgo8L2hlYWQ-Cgo8Ym9keT4KCkhpIEplcmVteSwKPGJyLz48YnIvPgpJ4oCZdmUgZ290IGEgZ3JlYXQgcm9sZSB5b3UgbWF5IGJlIGludGVyZXN0ZWQgaW46IDxhIGhyZWY9Imh0dHA6Ly93d3cuQ3liZXJDb2RlcnMuY29tL3FjLmFzcHg_cG9zSWQ9Q0M2LTEyODk1MzcmYWQ9Q1MyQ2FtZXJvbi5DYXJsdG9uJnV0bV9zb3VyY2U9Y2FuZGlkYXRlJnV0bV9tZWRpdW09ZW1haWwmdXRtX2NhbXBhaWduPW5ldy1qb2JzMyI-U2VuaW9yIFN5c3RlbXMgRW5naW5lZXIgLSBQeXRob24gJiAiR28iIChsYW5nKTwvYT4KPGJyLz48YnIvPgpQbGVhc2UgcmVwbHkgd2l0aCBhbiB1cGRhdGVkIHJlc3VtZSBvciBhcHBseSBkaXJlY3RseSB0aHJvdWdoIHRoZSBsaW5rIGlmIGl0IHNvdW5kcyBsaWtlIGEgZ29vZCBtYXRjaC4gCjxici8-PGJyLz4KSWYgeW914oCZcmUgbm90IGxvb2tpbmcgYW5kIGtub3cgc29tZW9uZSBlbHNlIHdobyBtYXkgYmUgYSBmaXQsIGZlZWwgZnJlZSB0byBmb3J3YXJkIHRoZW0gdGhpcyBlbWFpbC4gCjxici8-PGJyLz4KTG9va2luZyBmb3J3YXJkIHRvIHlvdXIgcmVzcG9uc2UhCjxici8-PGJyLz4KQmVzdCByZWdhcmRzLAo8YnIvPjxici8-CkNhbWVyb24KPGJyLz48YnIvPgo8Zm9udCBzaXplPSI

0      None
1      None
2      None
3      None
4      None
5      None
6      None
7      None
8      None
9      None
10     None
11     None
12     None
13     None
14     None
15     None
16     None
17     None
18     None
19     None
20     None
21     None
22     None
23     None
24     None
25     None
26     None
27     None
28     None
29     None
       ... 
121    None
122    None
123    None
124    None
125    None
126    None
127    None
128    None
129    None
130    None
131    None
132    None
133    None
134    None
135    None
136    None
137    None
138    None
139    None
140    None
141    None
142    None
143    None
144    None
145    None
146    None
147    None
148    None
149    None
150    None
Name: email, dtype: object

In [370]:
# And let's reload from disk - which *also* makes for a convenient checkpoint
# for resuming this work!
df = pd.read_csv('emails.csv', index_col=0)
df.head()

                 id          threadId   labelId       labelName                                              email
0  154e57999204ba74  154e57999204ba74  Label_62  job-offers/yes  {'historyId': '2867343', 'payload': {'body': {...
1  154e43cfbc28473a  154e43cfbc28473a  Label_62  job-offers/yes  {'historyId': '2867416', 'payload': {'body': {...
2  154dfeff4602a39c  154dfeff4602a39c  Label_62  job-offers/yes  {'historyId': '2867731', 'payload': {'body': {...
3  154bf8d38e70bcd1  154bf8d38e70bcd1  Label_62  job-offers/yes  {'historyId': '2845012', 'payload': {'body': {...
4  154ab4aa9b3a3328  154ab4aa9b3a3328  Label_62  job-offers/yes  {'historyId': '2846609', 'payload': {'body': {...
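A caveat about this checkpoint: CSV has no notion of nested objects, so each message dict is written out as its string form and comes back from disk as a `str`, not a `dict`. Below is a minimal standard-library sketch of what that means and of one way back. It uses a made-up message and assumes the stored text is a plain Python literal, which is what the `email` column above looks like.

```python
import ast

# Made-up stand-in for one message; real ones carry many more fields.
original = {'payload': {'mimeType': 'text/html',
                        'body': {'size': 5, 'data': 'aGVsbG8='}}}

# What a CSV round trip leaves in the cell: the dict's string form.
cell = str(original)

try:
    cell['payload']  # indexing a str with a key fails
    lookup_failed = False
except TypeError:
    lookup_failed = True

# literal_eval parses Python literals only, so it is far safer than eval().
restored = ast.literal_eval(cell)
print(lookup_failed, restored['payload']['mimeType'])
```

Applied to the reloaded table, this would look like `df['email'].apply(ast.literal_eval)`; skipping CSV for this column in favor of a format that preserves nesting is another option.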


In [384]:
def decode_email(string):
    return base64.urlsafe_b64decode(string).decode()


# Add a 'text' column to our dataframe w/ just the email's body text
def extract_email_body(row):
    print(type(row))
    try:
        body = row.email['payload']['body']['data']
    except KeyError as e:
        print("{} not found for msg ID {}".format(e, row['id']))
        return None
    data = decode_email(body)
    return BeautifulSoup(data, 'html.parser').get_text()

In [386]:
df.apply(extract_email_body, axis=1)

<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>


TypeError: ('string indices must be integers', 'occurred at index 0')

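Oh, goddammit. I'm going to save you the time of troubleshooting and cut to the chase. The `TypeError` itself comes from the reload: `read_csv` hands the `email` column back as strings, so `row.email['payload']` is indexing a string rather than a dict. On top of that, not every email we got conveniently stores its message body under `email.payload.body.data`, as those with the MIME type text/html or text/plain do.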

In [323]:
# Groupby calls the function on each row index, which we use to look up
# that row's MIME type. Open to better ways of doing this, as I find it unsatisfying.
df.groupby(lambda x: df.loc[x, 'email']['payload']['mimeType']).count()

                       id  threadId  labelId  labelName  email  text
multipart/alternative  40        40       40         40     40     0
multipart/mixed        69        69       69         69     69     0
multipart/related      27        27       27         27     27     0
text/html              13        13       13         13     13    13
text/plain              2         2        2          2      2     2


In [331]:
# Sigh. Let's look at the rows whose text field is missing.
df[df.text.isnull()]

Unnamed: 0,id,threadId,labelId,labelName,email,text
1,154e43cfbc28473a,154e43cfbc28473a,Label_62,job-offers/yes,"{'historyId': '2867416', 'payload': {'body': {...",
2,154dfeff4602a39c,154dfeff4602a39c,Label_62,job-offers/yes,"{'historyId': '2867731', 'payload': {'body': {...",
3,154bf8d38e70bcd1,154bf8d38e70bcd1,Label_62,job-offers/yes,"{'historyId': '2845012', 'payload': {'body': {...",
4,154ab4aa9b3a3328,154ab4aa9b3a3328,Label_62,job-offers/yes,"{'historyId': '2846609', 'payload': {'body': {...",
5,15487ec6f07dc487,1548799c2c39ebf3,Label_62,job-offers/yes,"{'historyId': '2850721', 'payload': {'body': {...",
6,1548799c2c39ebf3,1548799c2c39ebf3,Label_62,job-offers/yes,"{'historyId': '2850721', 'payload': {'body': {...",
7,1548167c8e780133,154815291dd58008,Label_62,job-offers/yes,"{'historyId': '2851221', 'payload': {'body': {...",
8,154815291dd58008,154815291dd58008,Label_62,job-offers/yes,"{'historyId': '2851221', 'payload': {'body': {...",
9,1545d9035e8d41d0,1545d9035e8d41d0,Label_62,job-offers/yes,"{'historyId': '2829134', 'payload': {'body': {...",
11,1543e72bc882b878,1543e72bc882b878,Label_62,job-offers/yes,"{'historyId': '2816528', 'payload': {'body': {...",
