The data sources are:

### 1. Liam's SPSS coded data
**File:** The Loop 2017 Final Interventions.xlsx

Exported as Excel from SPSS, keeping the variable names.

This file contains 1325 entries.

27 have a null festival or sample number and so can't be used, leaving 1298.

One has sample number 12151 and two have sample number 0; these cannot be merged.

This leaves 1295, all of which can be merged.
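A minimal sketch of why those three sample numbers fail to match. It applies the same `'F{:04d}'` formatting used in the SPSS cell below, and assumes (hypothetically) that lab codes are 'F' plus exactly four digits and that no lab sample is numbered F0000. The helper name `normalise_sample_number` and the values are illustrative only:

```python
import pandas as pd

def normalise_sample_number(value):
    """Format a numeric SPSS sample number as 'F' followed by at least four digits."""
    return 'F{:04d}'.format(int(value))

# Hypothetical SPSS values (floats, as read_excel returns when NaNs are present):
# 465 is an ordinary code, 12151 and 0 are the problem cases
spss_numbers = pd.Series([465.0, 12151.0, 0.0])
# Hypothetical lab codes, assumed to be 'F' plus exactly four digits
lab_codes = {'F0001', 'F0465', 'F1900'}

normalised = spss_numbers.apply(normalise_sample_number)
print(normalised.tolist())                  # ['F0465', 'F12151', 'F0000']
print(normalised.isin(lab_codes).tolist())  # [True, False, False]
```

12151 becomes a five-digit code that no four-digit lab code can equal, and 0 becomes F0000.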

### 2. Guy's cleaned up lab data
**File:** Loop 2017 Lab fixed data.xlsm

From: Dropbox/Testing/2017 results processing/Loop 2017 Lab fixed data.xlsm


Data is in the ‘Raw Lab Data’ sheet


This file contains 2544 entries.

1900 entries start with F.

621 entries begin with A (amnesty), so can't be merged.

23 begin with W (it is unclear what W denotes), so can't be merged.

Entry SGP2017 F0465 needs editing, as its 'Client gender' is FemaleaMalee.
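The prefix breakdown above can be reproduced with a short count. This sketch uses hypothetical sample numbers in place of the lab sheet and uppercases them first, as the lab cell below does:

```python
import pandas as pd

# Hypothetical sample numbers standing in for the lab sheet
lab = pd.DataFrame({'SampleNumber': ['F0001', 'f0002', 'A0001', 'W0001', 'F0003']})

# Uppercase first so that 'f0002' is counted with the F codes
prefix = lab['SampleNumber'].str.upper().str[0]
counts = prefix.value_counts()
print(counts)

# Only F-prefixed samples can have a matching SPSS record
mergeable = lab[prefix == 'F']
```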

### 3. Boomtown Intervention Questionnaire
**File:** BTReport 2017 - Form responses 3.csv

Exported from: https://docs.google.com/spreadsheets/d/15pdETY0HK-VbBcV-N0swt6ZrRBbeDnZR5RGDzfq95dg

This file contains 194 entries.

### Merging the data

Merging the data on Festival and SampleNumber resulted in 1295 entries, i.e. every usable SPSS entry.
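One way to confirm which rows drop out of the inner merge is an outer merge with `indicator=True`, which adds a `_merge` column marking each row as `left_only`, `right_only` or `both`. The sketch uses small hypothetical frames in place of `spss_df` and `lab_df`; `validate='one_to_one'` is an optional extra check (not used in the notebook) that raises a `MergeError` if either side has duplicate keys:

```python
import pandas as pd

# Hypothetical miniature versions of spss_df and lab_df
spss = pd.DataFrame({'Festival': ['BT2017', 'BT2017', 'SGP2017'],
                     'SampleNumber': ['F0001', 'F12151', 'F0002'],
                     'Age': [21, 30, 25]})
lab = pd.DataFrame({'Festival': ['BT2017', 'SGP2017', 'SGP2017'],
                    'SampleNumber': ['F0001', 'F0002', 'A0001'],
                    'Client age': [21, 25, 40]})

# Outer merge on the shared keys; '_merge' records which side each row came from
check = pd.merge(spss, lab, how='outer', on=['Festival', 'SampleNumber'],
                 indicator=True, validate='one_to_one')

spss_only = check[check['_merge'] == 'left_only']   # SPSS rows with no lab record
lab_only = check[check['_merge'] == 'right_only']   # lab rows with no SPSS record
print(len(check[check['_merge'] == 'both']), len(spss_only), len(lab_only))  # 2 1 1
```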



In [1]:
# Module imports
import os
import numpy as np
import pandas as pd

In [2]:
spssdata = '/opt/random/The Loop 2017 Final Interventions.xlsx'

spss_df = pd.read_excel(spssdata)

# Rename festivals to match the lab data's event names
spss_df['Festival'] = spss_df['Festival'].replace(['BoomTown', 'KC', 'SGP'], ['BT2017', 'KC2017', 'SGP2017'])

# Ensure all Sample numbers are consistent
# 1. Delete any rows where SampleNumber or Festival is NA as we can't do anything with it
spss_df.dropna(subset=['SampleNumber', 'Festival'], inplace=True)

# 2. Make all sample numbers a 4-digit code starting with F
spss_df['SampleNumber'] = spss_df['SampleNumber'].apply(lambda x: 'F{:04d}'.format(int(x)))

# Combine date and time columns into new single column
spss_df['Date'] = pd.to_datetime(spss_df['Date']) # Convert Date to datetime object
spss_df['Date & Time of intervention'] = spss_df.apply(
    lambda r: pd.Timestamp.combine(r['Date'], r['Time']), axis=1)

# Remove Day, Date, Time and SurveyID columns
spss_df.drop(['Day', 'Date', 'Time', 'SurveyID'], axis=1, inplace=True)

# 1298 rows remain after dropping those with no Festival or SampleNumber
print(len(spss_df))

1298
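The row-wise `apply` in the cell above combines date and time one row at a time. A vectorised alternative is sketched here on hypothetical values, assuming the `Time` column holds `datetime.time` objects: each time is converted to a `Timedelta` and added to the date.

```python
import datetime
import pandas as pd

# Hypothetical Date and Time columns; Time is assumed to hold datetime.time objects
df = pd.DataFrame({'Date': ['2017-07-28', '2017-07-29'],
                   'Time': [datetime.time(14, 5), datetime.time(23, 30)]})

dates = pd.to_datetime(df['Date'])
# str(datetime.time(14, 5)) is '14:05:00', which to_timedelta parses; bad values become NaT
offsets = pd.to_timedelta(df['Time'].astype(str), errors='coerce')
df['Date & Time of intervention'] = dates + offsets
print(df['Date & Time of intervention'].tolist())
```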


In [3]:
labdata = '/opt/random/Loop 2017 Lab fixed data.xlsm'
lab_df = pd.read_excel(labdata, sheet_name='Raw LabData')

# Rename the 'Event  Name' and 'Sample Number' columns to match spss_df
lab_df.rename(columns={'Event  Name': 'Festival', 'Sample Number': 'SampleNumber'}, inplace=True)

# Delete any rows where SampleNumber or Festival is NA as we can't do anything with it
lab_df.dropna(subset=['SampleNumber', 'Festival'], inplace=True) # This just drops one case

# Uppercase all sample numbers
lab_df['SampleNumber'] = lab_df['SampleNumber'].str.upper()

# Besides F, sample numbers can begin with A (amnesty) or W; this counts those starting with neither F nor A
#print(len(lab_df[ ~ (lab_df['SampleNumber'].str.startswith('F') | lab_df['SampleNumber'].str.startswith('A')) ]))

In [4]:
dft = pd.merge(spss_df, lab_df, how='inner', on=['Festival','SampleNumber'])
print("%d entries were merged" % len(dft))

# To see which SPSS entries can't be merged, do an outer merge and look for right_only rows
#pd.merge(lab_df, spss_df, how='outer', on=['Festival', 'SampleNumber'], indicator=True)

# Sort first by Festival, then SampleNumber
dft.sort_values(['Festival', 'SampleNumber'], ascending=True, inplace=True)

# Move the columns that should hold the same information to the front, so we can:
# 1. spot data errors
# 2. remove duplicate columns once we're happy the data is consistent
prefix_cols = ['Festival', 'SampleNumber',
             'Sample submission time', 'Date & Time of return', 'Date & Time of intervention', 
             'Client age', 'Age', 'Client gender', 'Gender', 'Bought as', 'SubmittedSubstanceAs']

# Get the list of columns excluding the ones in prefix_cols
cols = [c for c in dft.columns.tolist() if c not in prefix_cols]
# Prepend prefix_cols to create the new list
cols = prefix_cols + cols
# Reorder columns (an explicit copy avoids a SettingWithCopyWarning on the edits below)
dft = dft[cols].copy()

# Uppercase all genders for consistency
labels = ['Client gender', 'Gender']
dft.loc[:, labels] = dft[labels].apply(lambda x: x.str.upper())
# Set any MISSING to be nan
dft.loc[:, labels] = dft.loc[:, labels].replace({'MISSING':np.nan})

# Dump to Excel
with pd.ExcelWriter('merged.xlsx') as writer:
    dft.to_excel(writer, sheet_name='MergedData', index=False)

1295 entries were merged


In [5]:
# # See which non-na ages don't match
# # 540 entries have valid ages
# print(len(dft))
# df = dft[pd.notnull(dft['Client age']) & pd.notnull(dft['Age'])]
# print(len(df))
# df = df[df['Client age'] != df['Age']]
# # 127 don't match
# print(len(df))
# # df.to_csv('foo.csv')


# Cross-tabulate 'Client gender' and 'Gender'
#print(dft['Client gender'].unique())
#print(dft['Gender'].unique())

# # Look where they don't match
# df = dft[pd.notnull(dft['Client gender']) & pd.notnull(dft['Gender'])]
# df = df[df['Client gender'] != df['Gender']]
# # 127 don't match
# print(len(df))
# df.to_csv('foo.csv')
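The commented-out cross-tab above can be done directly with `pd.crosstab`, which counts each pairing of the lab's 'Client gender' and the SPSS 'Gender' and drops missing values by default. A sketch with hypothetical values, already uppercased as in the merge cell:

```python
import numpy as np
import pandas as pd

# Hypothetical gender values after uppercasing
dft = pd.DataFrame({'Client gender': ['MALE', 'FEMALE', 'MALE', np.nan],
                    'Gender': ['MALE', 'FEMALE', 'FEMALE', 'MALE']})

# Rows are the lab's 'Client gender', columns are the SPSS 'Gender'
table = pd.crosstab(dft['Client gender'], dft['Gender'])
print(table)

# Rows where both values are present but disagree
both = dft.dropna(subset=['Client gender', 'Gender'])
mismatches = both[both['Client gender'] != both['Gender']]
print(len(mismatches))  # 1
```

Off-diagonal cells of `table` are the disagreements, so the same count can be read from the table.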



### Strategy for merging Google Forms interventions with SPSS data

<pre>
cocaine_ever	Ever had this drug
cocaine_year	Had this drug in the past year
cocaine_month	Had this drug in the past month
cocaine_week	Had this drug in the past week
cocaine_yesterday	Had this drug yesterday
cocaine_today	Had this drug today
cocaine_tonight	Are you having this drug tonight?
cocaine_today_tonight	Have they had this drug today or will they have it tonight?


Google Sheets also has:

2-CB
Amphetamine (speed)
Codeine
Valium or other benzodiazepines

These are not in Liam's set

Key to Liam's data
Legal:	balloons, poppers, spice, other legal highs
core drugs:	cannabis, cocaine, ecstasy, mdma, ketamine, mephedrone, speed, heroin
polydrug:	2+ illegal drugs
polysubstance:	2+ illegal drugs and usual alcohol frequency

FriendsPresent	Were there friends present when the sample was handed in?
ConsumedAlcohol	Has the respondent consumed alcohol?
UnitsConsumed	How many units?
PrescribedDrugs	Are they on any prescribed medication?
OverTheCounter	Are they taking any over the counter medication?
ConcernsWithCurrentFeelings	Do they have any current concerns with how they're feeling at the moment?
WhatConcerns	If so, what concerns?
SubmittedSubstanceAs	What has the substance been submitted as?
Obtained	Where have the drugs been obtained?
EverHadSubstance	Have they ever taken the drug before?
WhereAndWhen	If so, when and where?
NegativeExperieces	Have you or anyone you know ever had any negative experiences taking this substance?
ConsumedFromBatchAlready	Have they consumed from this batch already?
PriorConcerns	Do you have any prior concerns about this particular sample from this particular batch?
Why	If yes, what concerns?
AccessedSupportBefore	Have they accessed support from a treatment service for your alcohol or drug use?
WantFurtherAdvice	After our conversation today, would you like to have any further advice or support from a treatment service for your alcohol or drug use?
a	I will ask The Loop to safely dispose of the further substances in my possession
b	I will throw them away
c	I will take a smaller amount of it
d	I will take a larger amount of it
e	I will take the same amount as usual
f	I will take it over a longer time period
g	I will be more careful about mixing it with other substances
h	I will give it away
i	I will sell it
j	I will obtain more on site
k	I will warn my friends
l	I will warn others via social media and public websites
m	I will tell my dealer
n	I will return it to my dealer
o	I will ask for a refund from my dealer
p	I will go to another dealer
q	I will keep it to take it elsewhere after the festival
r	I will do something else
    

    

Map columns in SPSS to Google Forms (None means no matching question has been identified yet)
d = {
    'Festival': None,
    'SampleNumber': 'Sample Number',
    'FriendsPresent': 'Number of friends present with primary respondent',
    'Gender': 'Gender of primary respondent',
    'Ethnicity': 'Ethnicity',
    'Ethnictiy_other': None,
    'Age': 'Age',
    'ConsumedAlcohol': 'Have you had any alcohol to drink today?',
    'UnitsConsumed': None,
    # BeerCider: ADD the beer and cider answers together
    'BeerCider': ['How much beer have you had today?', 'How much cider have you had today?'],
    'Spirits': 'How much spirits have you had today?',
    'Wine': 'How much wine have you had today?',
    'Alcopops': 'How many alcopops have you had today?',
    'PrescribedDrugs': 'Are you currently taking any prescribed medication?',
    'OverTheCounter': 'Are you currently taking any "Over the Counter" medication?',
    'ConcernsWithCurrentFeelings': 'Do you have any concerns about how you are feeling at the moment?',
    'WhatConcerns': None,  # derive from the ConcernsWithCurrentFeelings answer above
    'SubmittedSubstanceAs': 'You submitted a substance of concern for analysis, what do you believe it to be?',
    'As_expected': None,
    'Obtained': 'Where did you obtain the sample?',
    'EverHadSubstance': 'Very roughly, how often do you use this drug?',
    'WhereAndWhen': 'When did you first use this batch?',
    'NegativeExperieces': 'Have you or anyone you know ever had negative experiences taking this substance?',
    'ConsumedFromBatchAlready': 'How many times have you used this batch?',
    'PriorConcerns': 'Do you have any concerns about using this sample from this batch or any other concerns about the result?',
    'Why': None,  # derive from the PriorConcerns answer above
    'AccessedSupportBefore': 'Have you ever accessed a treatment service for your alcohol or drug use?',
    'WantFurtherAdvice': 'After our conversation today, would you like to have any further advice or support from a treatment service for your alcohol or drug use?',
    # a to r (planned actions): not yet mapped
    'a': None, 'b': None, 'c': None, 'd': None, 'e': None, 'f': None,
    'g': None, 'h': None, 'i': None, 'j': None, 'k': None, 'l': None,
    'm': None, 'n': None, 'o': None, 'p': None, 'q': None, 'r': None,
}

    
The following two questions need to be factored into a-r:
After hearing today’s test results and harm reduction advice from The Loop, what do you plan to do with the sample?
What other actions will you do?

The following are in Google Forms but not in SPSS:
Volunteer Name
When was the last time you used this service?
What was your first sample number at this event? Did you take a photo or keep the ticket?
Have you had any other legal or illegal drugs today?
Have you ever taken any other drugs I didn't mention?
Which drugs have you used? [Non-prescribed opiods]
Are you planning to take any of these drugs later?

</pre>
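The key above defines 'polydrug' as 2+ illegal drugs. A minimal sketch of deriving such a flag from per-drug Yes/No columns like those `parse_cell` produces, assuming (hypothetically) a single period (`today`) and a column list restricted to illegal drugs; the exact period and drug list behind Liam's coding are not stated here:

```python
import pandas as pd

# Hypothetical per-drug columns in the 'Yes'/'No' form produced by parse_cell
df = pd.DataFrame({'cannabis_today': ['Yes', 'No', 'Yes'],
                   'cocaine_today': ['Yes', 'No', 'No'],
                   'mdma_today': ['No', 'No', 'Yes']})

# Assumed: only illegal drugs are listed here (legal highs would be excluded)
drug_cols = ['cannabis_today', 'cocaine_today', 'mdma_today']
# Number of different drugs each respondent answered 'Yes' to
n_drugs = df[drug_cols].eq('Yes').sum(axis=1)
# Polydrug when two or more illegal drugs are reported
df['polydrug_today'] = (n_drugs >= 2).map({True: 'Yes', False: 'No'})
print(df['polydrug_today'].tolist())  # ['Yes', 'No', 'Yes']
```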



In [5]:
#
# Attempt to disentangle the Form responses into a form they can be merged with Liam's data
#
bt_interventions = '/opt/random/BTReport 2017 - Form responses 3.csv'
date_cols = ['Timestamp']
bt_df = pd.read_csv(bt_interventions, engine="python", parse_dates=date_cols)

def add_columns(df, drug, dmap):
    """For a given drug, split its 'Which drugs have you used?' column into the per-period
    Yes/No columns used in Liam's SPSS dataset, delete the original column and append
    the new columns to the dataset
    """
    
#     l = ['cannabis', 'cocaine', 'ecstasy', 'mdma', 'ketamine', 'lsd', 'nitrous_oxide', 'mushrooms', 
#          'Mephedrone', 'spice_legals', 'unknown_powder', 'other', 'any', 'any_legal', 'core', 
#          'polydrug', 'polysubstance']

    assert drug in dmap.keys()
    
    # Create a DataFrame that holds the columns containing the drug usage info for this drug
    column = 'Which drugs have you used? [{}]'.format(drug)
    prefix = dmap[drug] 
    categories = ['ever', 'year', 'month', 'week', 'yesterday', 'today', 'tonight', 'today_tonight']
    keys = [ "{0}_{1}".format(prefix, c) for c in categories ]
    values = list(zip(*df[column].apply(parse_cell)))
    data_dict = dict(zip(keys, values))
    # Use the parent's index so the new columns line up row for row
    df_tmp = pd.DataFrame(data_dict, columns=keys, index=df.index)
    
    # Remove the Google Forms column from the parent DataFrame
    df.drop([column], axis=1, inplace=True)
    
    # Add the new columns to the parent
    df = pd.concat([df, df_tmp], axis=1)
    return df

# c = 'Never had, Had today, (Probably) planning later'
# c = 'Had in last month, Had in last year'
# c = np.nan
def parse_cell(cell):
    """Parse cell containing the response of drug use question from the google forms intervention.
    Return Yes/No for:
    ['ever', 'year', 'month', 'week', 'yesterday', 'today', 'tonight', 'today_tonight']
    """
    form_responses = ['(Probably) planning later', 'Had today', 'Had yesterday', 'Had in last week', 'Had in last month', 'Had in last year', 'Had in my life']
    len_responses = len(form_responses) 
    if isinstance(cell, float) and np.isnan(cell): # If NaN assume all No's
        return ['No'] * (len_responses + 1) # One extra 'No' for the today_tonight value
    
    values = [v.strip() for v in cell.split(',')]
    flags = [True] * len_responses
    # Everything starts True: walk from the most recent response back, setting False until the
    # first one the respondent ticked, since once a respondent has said yes to (e.g.) week,
    # month, year and ever are also True
    for i, response in enumerate(form_responses):
        if response in values:
            break
        flags[i] = False
    
    # Add 'today_tonight' (flags[0] is '(Probably) planning later', flags[1] is 'Had today')
    tonight = 0
    today = 1
    today_tonight = flags[today] or flags[tonight]
    flags = [today_tonight] + flags # Prepend today_tonight value
    
    # Reverse list as columns are in opposite order
    flags.reverse()
    return list(map(lambda x: 'Yes' if x else 'No', flags))

dmap = { 'Cannabis' : 'cannabis',
      'Cocaine' : 'cocaine',
      'Ecstasy pills' : 'ecstasy',
      'Nitrous (NOS, laughing gas)' : 'nitrous_oxide',
      'MDMA crystal/powder' : 'mdma',
      'Ketamine' : 'ketamine',
      'Magic mushrooms' : 'mushrooms',
      'LSD' : 'lsd',
      'Mephedrone (M-Cat)' : 'Mephedrone',
      'Synthetic cannabinoids ("Spice")' : 'spice_legals',
      'A powder which I had no idea what it was' : 'unknown_powder',
    }

for drug in dmap.keys():
    bt_df = add_columns(bt_df, drug, dmap)

bt_df.to_csv('foo.csv')
print("DONE")
# df = pd.DataFrame(bt_df[clabel])
# #print(df)
# #df.pivot(index='date', columns='variable', values='value')
# df = df.pivot(columns=clabel, values=clabel)
# # Remove any nan columns and the label we've used
# df.drop(np.nan, axis=1, inplace=True)

# print(df)

DONE
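`parse_cell` builds the flags one cell at a time. As an alternative sketch (not the notebook's method), the same cumulative rule, where ticking a response also implies every less recent period, can be computed for a whole column with a reversed cumulative maximum. `drug_flags` is a hypothetical name:

```python
import numpy as np
import pandas as pd

# Form answers from least to most recent, aligned with the SPSS period suffixes
RESPONSES = ['Had in my life', 'Had in last year', 'Had in last month', 'Had in last week',
             'Had yesterday', 'Had today', '(Probably) planning later']
PERIODS = ['ever', 'year', 'month', 'week', 'yesterday', 'today', 'tonight']


def drug_flags(column, prefix):
    """Return Yes/No columns <prefix>_<period>: a period is 'Yes' when that answer,
    or any more recent one, was ticked (the same rule as parse_cell)."""
    # Split each cell into its stripped answers; NaN cells become an empty list
    answers = column.apply(
        lambda cell: [v.strip() for v in cell.split(',')] if isinstance(cell, str) else [])
    # One boolean column per period: was that exact answer ticked?
    ticked = pd.DataFrame({p: answers.apply(lambda a, r=r: r in a)
                           for p, r in zip(PERIODS, RESPONSES)})
    # Reverse the columns, take a running maximum so 'tonight' propagates back to 'ever',
    # then restore the original order
    flags = ticked.iloc[:, ::-1].astype(int).cummax(axis=1).iloc[:, ::-1].astype(bool)
    flags['today_tonight'] = flags['today'] | flags['tonight']
    flags.columns = ['{}_{}'.format(prefix, c) for c in flags.columns]
    return flags.apply(lambda s: s.map({True: 'Yes', False: 'No'}))


responses = pd.Series(['Never had, Had today, (Probably) planning later',
                       'Had in last month, Had in last year', np.nan])
print(drug_flags(responses, 'cannabis'))
```

The column order matches `parse_cell`'s return order, so the result could be concatenated onto `bt_df` in the same way.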


In [76]:
# Attempt to refactor calculation of drug frequency columns
def includes_frequency(cell, period):
    "Return boolean indicating whether this cell contains a frequency at or more recent than period"
    if isinstance(cell, float) and np.isnan(cell):
        return False # NaN is treated as not having the value
    
    periods = ['ever', 'year', 'month', 'week', 'yesterday', 'today', 'tonight']
    form_responses = ['Had in my life', 'Had in last year', 'Had in last month', 'Had in last week', 
                      'Had yesterday', 'Had today', '(Probably) planning later']
    
    assert period in periods, "Invalid period: {0}".format(period)
    
    values = [v.strip() for v in cell.split(',')]   
    idx = periods.index(period)
    # Check whether this period or any more recent one has been ticked
    # (the range runs to len(periods) so that 'tonight' is included)
    for i in range(idx, len(periods)):
        if form_responses[i] in values:
            return True
    return False

def get_value(column, period):
    "Return boolean Series indicating whether this column contains frequencies >= period"
    if period == 'today_tonight':
        today = column.apply(includes_frequency, period='today')
        tonight = column.apply(includes_frequency, period='tonight')
        return today | tonight
    return column.apply(includes_frequency, period=period)

df = pd.DataFrame({'Which drugs have you used? [Cannabis]' : ['Never had, Had today, (Probably) planning later',
                                                              'Had in last month, Had in last year', np.nan],
                    'Which drugs have you used? [Cocaine]' : [np.nan,
                                                              'Had in last month, Had in last year',
                                                               'Never had, Had today, (Probably) planning later']
                   })
# column = df['Which drugs have you used? [Cannabis]']
# for period in ['ever', 'year', 'month', 'week', 'yesterday', 'today', 'tonight', 'today_tonight']:
#     column_name = 'cocaine_{}'.format(period)
#     df[column_name] = get_value(column, period)
# print(df)

c1 = df['Which drugs have you used? [Cannabis]']
c2 = df['Which drugs have you used? [Cocaine]']
for period in ['ever', 'year', 'month', 'week', 'yesterday', 'today', 'tonight', 'today_tonight']:
    column_name = 'core_{}'.format(period)
    df[column_name] = get_value(c1, period) | get_value(c2, period)
print(df)

             Which drugs have you used? [Cannabis]  \
0  Never had, Had today, (Probably) planning later   
1              Had in last month, Had in last year   
2                                              NaN   

              Which drugs have you used? [Cocaine]  core_ever  core_year  \
0                                              NaN       True       True   
1              Had in last month, Had in last year       True       True   
2  Never had, Had today, (Probably) planning later       True       True   

   core_month  core_week  core_yesterday  core_today  core_tonight  \
0        True       True            True        True          True   
1        True      False           False       False         False   
2        True       True            True        True          True   

   core_today_tonight  
0                True  
1               False  
2                True  
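The `core_*` columns above OR together just two drugs. With more drugs, a row-wise `any` over the per-drug boolean columns avoids chaining `|`. A sketch using hypothetical `*_week` columns and only three of the core drugs from the key; `any_flag` is a hypothetical helper:

```python
import pandas as pd

def any_flag(df, columns):
    """Row-wise OR across boolean columns (a KeyError is raised if one is missing)."""
    return df[columns].any(axis=1)

# Hypothetical boolean per-drug columns for one period, such as get_value() returns
df = pd.DataFrame({'cannabis_week': [True, False, False],
                   'cocaine_week': [False, False, True],
                   'ketamine_week': [False, False, False]})

# Only three of the core drugs from the key, for illustration
core_drugs = ['cannabis', 'cocaine', 'ketamine']
df['core_week'] = any_flag(df, ['{}_week'.format(d) for d in core_drugs])
print(df['core_week'].tolist())  # [True, False, True]
```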
