# CalcMTurkDiffs #
<br>

**Summary:** Calculate difference in perceptions between Amazon Mechanical Turk (MTurk) participants, stratified by geography and demographics <br>
**Author:** Andrew Larkin <br>
**Date Created:** Aug 2nd, 2023 <br>
**Affiliation:** Oregon State University, College of Health

In [1]:
import pandas as ps

In [2]:
PARENT_FOLDER = 'insert absolute filepath to folder where files are stored'
image_meta = ps.read_csv(PARENT_FOLDER + "meta_link.csv")
print(image_meta.head())
print(image_meta.count()[0])

                     image_id  psp_code  gis_code
0   wKku7JX9oLkYwqUh__KSKQ_81     33344      2312
1  2Yx-Q0olEsjx65z-xE8Wog_163     42434      2311
2  ZH1fSO2nyQbRXznFoXaVKw_113     44213      2312
3   da_ZMkAiRJF7CgGj4Z-qTQ_41     12433      2311
4  wksq-PLNXyuua30xVXy2SQ_110     21413      2312
49145


In [3]:
mturk = ps.read_csv(PARENT_FOLDER + "mturk_May11_21.csv")
print(mturk.head())
print(mturk.count()[0])
print(mturk.groupby('urban_compare').count())

                        l_img                       r_img  vote  vote_norm  \
0  ZKdv7hpLZQqgmBfn7Ssesg_325  ZBIOyKV2TwDW92xDn2099A_186     1 -30.083333   
1  AKZQPDCGlaRgrb1ycEv-sQ_312  woAr_Xst6xKbUoTy9PaW3g_226    99  25.083333   
2  VCpz7BUH4mn92HPhVEp9cg_318  fvTSslWM2jmNcEnBzLVQ_Q_149    71  19.716049   
3   etX1pIb6ALZyhC0BihdfHw_30  Zg4Lu80vPCV8HbQ4FcGtIg_181     1 -33.716049   
4   6taBIQR0cErHoITbGQyq0A_90  jJI22KXOnXuSUDiJe5MQ4w_177    69  26.694444   

        label  geo1  geo2  urban_compare  road_compare  angle_compare  ...  \
0      beauty    17     3              3             1              1  ...   
1   safe_walk    11     1              3             4              0  ...   
2      nature    10    10              3             4              0  ...   
3  safe_crime     5     9              1             3              1  ...   
4  safe_crime     4     4              5             4              0  ...   

   ethnicity race  education covid  green_compare built_compar

### given a 4 digit classification code, extract the urban classification value from the second digit
**Inputs:**
- **code** (int) - 4 digit classification code

**Outputs:**
- **urbanNum** (int) - number indicating whether the code corresponds to an urban center, urban cluster, or other location

In [4]:
def getUrbanFromCode(code):
    urbanNum = str(code)[1]
    return(urbanNum)

### given a 4 digit classification code, extract the census division number from the first digit ###
**Inputs:**
- **code** (int) - 4 digit classification code

**Outputs:**
- **division** (int) - number indicating which US census division the code corresponds to

In [5]:
def getDivisionFromCode(code):
    division = str(code)[0]
    return(division)

### extract all individual classifications from 4 digit classification codes ###
**Inputs:**
- **imgId** (string) - unique identifier for a Google Street View image
- **pspCode** (int) - 5 digit classification code of PSPNet quartiles
- **gisCode** (int) - 4 digit classification code of GIS properties
- **score** (float) - perception score, ranging from -50 to 50
- **userInfo** (pandas dataframe) - contains demographic information about the participant who provided the score

**Outputs:**
- **df** (pandas dataframe) - contains all individual classification extracted from the demographic dictionary and classification codes

In [6]:
def getPropertiesForImg(imgId,pspCode,gisCode,score,userInfo):
    urban = int(str(gisCode)[1])
    division = int(str(gisCode)[0])
    road = int(str(gisCode)[2])
    green = int(str(pspCode)[0])
    built = int(str(pspCode)[1])
    v_road = int(str(pspCode)[2])
    access = int(str(pspCode)[3])
    trees = int(str(pspCode)[4])
    df = ps.DataFrame({
        'image_id':imgId,
        'urban':[urban],
        'division':[division],
        'road':[road],
        'green':[green],
        'built':built,
        'v_road':v_road,
        'trees':trees,
        'access':access,
        'h_division':userInfo['hit_division'],
        'age':userInfo['age'],
        'gender':userInfo['gender'],
        'ethnicity':userInfo['ethnicity'],
        'race':userInfo['race'],
        'education':userInfo['education'],
        'covid':userInfo['covid'],
        'score':score
    })
    return(df) 

### create a new categorical level for instances where image comparisons are between the same classification level
**Inputs:**
- combined (pandas dataframe) - data extracted from a single image comparison

**Outputs:**
- input dataset, where classifications coded -1 if both images in the comparison contained the same classification level

In [7]:
def screenCombined(combined):
    if(len(list(set(combined['urban'])))==1):
        combined['urban']=-1
    if(len(list(set(combined['division'])))==1):
        combined['division']=-1
    if(len(list(set(combined['road'])))==1):
        combined['road']=-1
    if(len(list(set(combined['green'])))==1):
        combined['green']=-1
    if(len(list(set(combined['built'])))==1):
        combined['built']=-1
    if(len(list(set(combined['v_road'])))==1):
        combined['v_road']=-1
    if(len(list(set(combined['trees'])))==1):
        combined['trees']=-1
    if(len(list(set(combined['access'])))==1):
        combined['access']=-1
    return(combined)

### given a vote for an image comparison, assign scores to each of the 2 images in the comparison.  Winning images gets a positive magnitue score, losing image gets a negative magnitude score
**Inputs:**
- **vote** (float) - vote value.  <0: the left image wins.  >0: right image wins
- **meta** (pandas dataframe) - metadata about the left and right image, including classification codes

**Outputs:**
- **combined** (pandas dataframe) - contains 2 rows, with row 1 corresponding to the left image and row 2 the right image

In [93]:
def convertVoteToImageScores(vote,meta):
    leftImgVals = meta[meta['image_id']==vote['l_img']]
    rightImgVals = meta[meta['image_id']==vote['r_img']]
    leftVote = vote['vote_norm']*-1
    rightVote = vote['vote_norm']
    leftVals = getPropertiesForImg(
        vote['l_img'],
        leftImgVals['psp_code'].iloc[0],
        leftImgVals['gis_code'].iloc[0],
        leftVote,
        vote
    )
    rightVals = getPropertiesForImg(
        vote['r_img'],
        rightImgVals['psp_code'].iloc[0],
        rightImgVals['gis_code'].iloc[0],
        rightVote,
        vote
    )
    combined = ps.concat([leftVals,rightVals])
    combined = screenCombined(combined)
    return(combined)

### combine all non-binary gender specifications into an Other category.  Necessary for statistical power ###
**Inputs:**
- **inData** (pandas dataframe) - contains participant demographics, including self-reported gender

**Outputs:**
- **df** (pandas dataframe) - input data, with non-binary gender specification recoded as 'Other'

In [9]:
def redefineGender(inData):
    maleData = inData[inData['gender']=='Male']
    femaleData  = inData[inData['gender']=='Female']
    otherData = inData[~inData['gender'].isin(['Male','Female'])]
    otherData['gender'] = 'Other'
    df = ps.concat([maleData,femaleData,otherData])
    return(df)

### combine all other race specifications into an OT category.  Necessary for statistical power ###
**Inputs:**
- **inData** (pandas dataframe) - contains participant demographics, including self-reported race

**Outputs:**
- **df** (pandas dataframe) - input data, with other race specifications recoded as 'OT'

In [10]:
def redefineEthnicity(inData):
    mainData = inData[inData['race'].isin(['WH','AI','AS','BL','NH'])]
    otherData = inData[~inData['race'].isin(['WH','AI','AS','BL','NH'])]
    otherData['race']= 'OT'
    df = ps.concat([mainData,otherData])
    return(df)

### calculate summary statistics for each urbanicity level, stratified by a category of interest ###
**Inputs:**
- **inData** (pandas dataframe) - contains image scores and urban classifications 
- **cat** (string) - category to stratify summary statistics by
- **val** (int) - specific value of the stratified category to calulcate summary statistics for

**Outputs:**
- **mean** (pandas dataframe) - contains mean, std, and n for all urbanicity levels of the stratified analysis

In [11]:
def calcPerfByUrban(inData,cat,val):
    mean = inData.groupby('urban',as_index=False)['score'].mean()
    count = inData.groupby('urban',as_index=False)['score'].count()
    std = inData.groupby('urban',as_index=False)['score'].std()
    mean['cat'] = cat
    mean['val'] = val
    mean['urban'] = ['NA','urban','suburban','rural']
    mean['std'] = std['score']
    mean['n'] = count['score']
    return(mean)

### calculate summary statistics for each road level, stratified by a category of interest ###
**Inputs:**
- **inData** (pandas dataframe) - contains image scores and road classifications 
- **cat** (string) - category to stratify summary statistics by
- **val** (int) - specific value of the stratified category to calulcate summary statistics for

**Outputs:**
- **mean** (pandas dataframe) - contains mean, std, and n for all road levels of the stratified analysis

In [12]:
def calcPerfByRoad(inData,cat,val):
    mean = inData.groupby('road',as_index=False)['score'].mean()
    count = inData.groupby('road',as_index=False)['score'].count()
    std = inData.groupby('road',as_index=False)['score'].std()
    mean['cat'] = cat
    mean['val'] = val
    mean['road'] = ['NA','primary','secondary','residential']
    mean['std'] = std['score']
    mean['n'] = count['score']
    return(mean)

### calculate summary statistics for each US census division, stratified by a category of interest ###
**Inputs:**
- **inData** (pandas dataframe) - contains image scores and census division 
- **cat** (string) - category to stratify summary statistics by
- **val** (int) - specific value of the stratified category to calulcate summary statistics for

**Outputs:**
- **mean** (pandas dataframe) - contains mean, std, and n for all census divisions of the stratified analysis

In [13]:
def calcPerfByHit(inData,cat,val):
    mean = inData.groupby('division',as_index=False)['score'].mean()
    count = inData.groupby('division',as_index=False)['score'].count()
    std = inData.groupby('division',as_index=False)['score'].std()
    mean['cat'] = cat
    mean['val'] = val
    mean['division'] = [
        'NA','New England','Middle Atlantic','East North Central','West North Central','South Atlantic',
        'East South Central','West South Central','Mountain','Pacific'               
                       ]
    mean['std'] = std['score']
    mean['n'] = count['score']
    return(mean)

### calculate summary statistics for each PSPNet nature quartile, stratified by a category of interest ###
**Inputs:**
- **inData** (pandas dataframe) - contains image scores and nature quartile classifications 
- **cat** (string) - category to stratify summary statistics by
- **val** (int) - specific value of the stratified category to calulcate summary statistics for

**Outputs:**
- **mean** (pandas dataframe) - contains mean, std, and n for all nature quartiles of the stratified analysis

In [14]:
def calcPerfByGreen(inData,cat,val):
    mean = inData.groupby('green',as_index=False)['score'].mean()
    count = inData.groupby('green',as_index=False)['score'].count()
    std = inData.groupby('green',as_index=False)['score'].std()
    mean['cat'] = cat
    mean['val'] = val
    mean['green'] = [
        'NA','Q1','Q2','Q3','Q4'               
                       ]
    mean['std'] = std['score']
    mean['n'] = count['score']
    return(mean)

### calculate summary statistics for each PSPNet built environment quartile, stratified by a category of interest ###
**Inputs:**
- **inData** (pandas dataframe) - contains image scores and built environment quartile classifications 
- **cat** (string) - category to stratify summary statistics by
- **val** (int) - specific value of the stratified category to calulcate summary statistics for

**Outputs:**
- **mean** (pandas dataframe) - contains mean, std, and n for all built environment quartiles of the stratified analysis

In [15]:
def calcPerfByBuilt(inData,cat,val):
    mean = inData.groupby('built',as_index=False)['score'].mean()
    count = inData.groupby('built',as_index=False)['score'].count()
    std = inData.groupby('built',as_index=False)['score'].std()
    mean['cat'] = cat
    mean['val'] = val
    mean['built'] = [
        'NA','Q1','Q2','Q3','Q4'               
                       ]
    mean['std'] = std['score']
    mean['n'] = count['score']
    return(mean)

### calculate summary statistics for each visible road quartile, stratified by a category of interest ###
**Inputs:**
- **inData** (pandas dataframe) - contains image scores and visible road quartile classifications 
- **cat** (string) - category to stratify summary statistics by
- **val** (int) - specific value of the stratified category to calulcate summary statistics for

**Outputs:**
- **mean** (pandas dataframe) - contains mean, std, and n for all visible road quartiles of the stratified analysis

In [16]:
def calcPerfByVRoad(inData,cat,val):
    mean = inData.groupby('v_road',as_index=False)['score'].mean()
    count = inData.groupby('v_road',as_index=False)['score'].count()
    std = inData.groupby('v_road',as_index=False)['score'].std()
    mean['cat'] = cat
    mean['val'] = val
    mean['v_road'] = [
        'NA','Q1','Q2','Q3','Q4'               
                       ]
    mean['std'] = std['score']
    mean['n'] = count['score']
    return(mean)

### calculate summary statistics for each visible tree quartile, stratified by a category of interest ###
**Inputs:**
- **inData** (pandas dataframe) - contains image scores and visible tree quartile classifications 
- **cat** (string) - category to stratify summary statistics by
- **val** (int) - specific value of the stratified category to calulcate summary statistics for

**Outputs:**
- **mean** (pandas dataframe) - contains mean, std, and n for all visible tree quartiles of the stratified analysis

In [17]:
def calcPerfByTrees(inData,cat,val):
    mean = inData.groupby('trees',as_index=False)['score'].mean()
    count = inData.groupby('trees',as_index=False)['score'].count()
    std = inData.groupby('trees',as_index=False)['score'].std()
    mean['cat'] = cat
    mean['val'] = val
    mean['trees'] = [
        'NA','Q1','Q2','Q3','Q4'               
                       ]
    mean['std'] = std['score']
    mean['n'] = count['score']
    return(mean)

### calculate summary statistics for each visible accessibility features quartile, stratified by a category of interest ###
**Inputs:**
- **inData** (pandas dataframe) - contains image scores and visible accessibility features quartile classifications 
- **cat** (string) - category to stratify summary statistics by
- **val** (int) - specific value of the stratified category to calulcate summary statistics for

**Outputs:**
- **mean** (pandas dataframe) - contains mean, std, and n for all visible accessibiilty features quartiles of the stratified analysis

In [18]:
def calcPerfByAccess(inData,cat,val):
    mean = inData.groupby('access',as_index=False)['score'].mean()
    count = inData.groupby('access',as_index=False)['score'].count()
    std = inData.groupby('access',as_index=False)['score'].std()
    mean['cat'] = cat
    mean['val'] = val
    mean['access'] = [
        'NA','Q1','Q2','Q3','Q4'               
                       ]
    mean['std'] = std['score']
    mean['n'] = count['score']
    return(mean)

### calculate summary statistics, stratified by urbanicity and a second stratification category ###
**Inputs:**
- **cat** (string) - second stratification category
- **inData** (pandas dataframe) - contains image scores and urbanicity features

**Outputs:**
- **df** (pandas dataframe) - contains calculated summary statistics

In [19]:
def calcUrbanByCat(cat,inData):
    uniquecats = list(set(inData[cat]))
    outData = []
    for curcat in uniquecats:
        subsetData = inData[inData[cat]==curcat]
        outData.append(calcPerfByUrban(subsetData,cat,curcat))
    df = ps.concat(outData)
    return(df)

### calculate summary statistics, stratified by road type and a second stratification category ###
**Inputs:**
- **cat** (string) - second stratification category
- **inData** (pandas dataframe) - contains image scores and road type features

**Outputs:**
- **df** (pandas dataframe) - contains calculated summary statistics

In [20]:
def calcRoadByCat(cat,inData):
    uniquecats = list(set(inData[cat]))
    outData = []
    for curcat in uniquecats:
        subsetData = inData[inData[cat]==curcat]
        outData.append(calcPerfByRoad(subsetData,cat,curcat))
    df = ps.concat(outData)
    return(df)

### calculate summary statistics, stratified by pariticipant census division and a second stratification category ###
**Inputs:**
- **cat** (string) - second stratification category
- **inData** (pandas dataframe) - contains image scores and participant census division

**Outputs:**
- **df** (pandas dataframe) - contains calculated summary statistics

In [20]:
def calcHitByCat(cat,inData):
    uniquecats = list(set(inData[cat]))
    outData = []
    for curcat in uniquecats:
        subsetData = inData[inData[cat]==curcat]
        outData.append(calcPerfByHit(subsetData,cat,curcat))
    df = ps.concat(outData)
    return(df)

### calculate summary statistics, stratified by visible nature quartile and a second stratification category ###
**Inputs:**
- **cat** (string) - second stratification category
- **inData** (pandas dataframe) - contains image scores and visible nature quartile

**Outputs:**
- **df** (pandas dataframe) - contains calculated summary statistics

In [21]:
def calcGreenByCat(cat,inData):
    uniquecats = list(set(inData[cat]))
    outData = []
    for curcat in uniquecats:
        subsetData = inData[inData[cat]==curcat]
        outData.append(calcPerfByGreen(subsetData,cat,curcat))
    df = ps.concat(outData)
    return(df)

### calculate summary statistics, stratified by visible built environment and a second stratification category ###
**Inputs:**
- **cat** (string) - second stratification category
- **inData** (pandas dataframe) - contains image scores and visible built environment

**Outputs:**
- **df** (pandas dataframe) - contains calculated summary statistics

In [23]:
def calcBuiltByCat(cat,inData):
    uniquecats = list(set(inData[cat]))
    outData = []
    for curcat in uniquecats:
        subsetData = inData[inData[cat]==curcat]
        outData.append(calcPerfByBuilt(subsetData,cat,curcat))
    df = ps.concat(outData)
    return(df)

### calculate summary statistics, stratified by visible road quartile and a second stratification category ###
**Inputs:**
- **cat** (string) - second stratification category
- **inData** (pandas dataframe) - contains image scores and visible road quartile

**Outputs:**
- **df** (pandas dataframe) - contains calculated summary statistics

In [24]:
def calcVRoadByCat(cat,inData):
    uniquecats = list(set(inData[cat]))
    outData = []
    for curcat in uniquecats:
        subsetData = inData[inData[cat]==curcat]
        outData.append(calcPerfByBuilt(subsetData,cat,curcat))
    df = ps.concat(outData)
    return(df)

### calculate summary statistics, stratified by visible tree quartile and a second stratification category ###
**Inputs:**
- **cat** (string) - second stratification category
- **inData** (pandas dataframe) - contains image scores and visible tree quartile

**Outputs:**
- **df** (pandas dataframe) - contains calculated summary statistics

In [25]:
def calcTreesByCat(cat,inData):
    uniquecats = list(set(inData[cat]))
    outData = []
    for curcat in uniquecats:
        subsetData = inData[inData[cat]==curcat]
        outData.append(calcPerfByTrees(subsetData,cat,curcat))
    df = ps.concat(outData)
    return(df)

### calculate summary statistics, stratified by visible accessibility features quartile and a second stratification category ###
**Inputs:**
- **cat** (string) - second stratification category
- **inData** (pandas dataframe) - contains image scores and visible accessibility quartile

**Outputs:**
- **df** (pandas dataframe) - contains calculated summary statistics

In [26]:
def calcAccessByCat(cat,inData):
    uniquecats = list(set(inData[cat]))
    outData = []
    for curcat in uniquecats:
        subsetData = inData[inData[cat]==curcat]
        outData.append(calcPerfByAccess(subsetData,cat,curcat))
    df = ps.concat(outData)
    return(df)

### wrapper function to calculate summary statistics for all second stratification levels ###
**Inputs:**
- **fxn1** (function) - calculatie summary statistics for a first stratification without a second strat level
- **fxn2** (function) - calculate summary statistics using a second stratification level
- **data** (pandas dataframe) - contains image scores and all stratification variables
- **outputFile** (string) - relative filepath for storing compiled summary statistics

In [41]:
def calcPerfForExp(fxn1,fxn2,data,outputFile):
    allVals = fxn1(data,'all','all')
    age = fxn2('age',data)
    gender = fxn2('gender',data)
    race = fxn2('race',data)
    ethnicity = fxn2('ethnicity',data)
    division = fxn2('h_division',data)
    covid = fxn2('covid',data)
    education = fxn2('education',data)
    df = ps.concat([allVals,age,gender,race,ethnicity,division,covid,education])
    df.to_csv(PARENT_FOLDER + outputFile,index=False)

### calculate all summary statistics for a single perception, including all first and second stratification levels ###
**Inputs:**
- **inData** (pandas dataframe) - contains image scores and all stratification variables
- **label** (string) - name of the perception to calculate summary statistics for

In [45]:
def calcValsSingleOutcome(inData,label):
    calcPerfForExp(calcPerfByUrban,calcUrbanByCat,inData,label + '_urban_mturk.csv')
    calcPerfForExp(calcPerfByRoad,calcRoadByCat,inData,label + '_road_mturk.csv')
    calcPerfForExp(calcPerfByAccess,calcAccessByCat,inData,label + '_access_mturk.csv')
    calcPerfForExp(calcPerfByTrees,calcTreesByCat,inData,label + '_trees_mturk.csv')
    calcPerfForExp(calcPerfByVRoad,calcVRoadByCat,inData,label + '_vroad_mturk.csv')
    calcPerfForExp(calcPerfByBuilt,calcBuiltByCat,inData,label + '_built_mturk.csv')
    calcPerfForExp(calcPerfByGreen,calcGreenByCat,inData,label + '_green_mturk.csv')
    calcPerfForExp(calcPerfByHit,calcHitByCat,inData,label + '_hit_mturk.csv')

### preprocess and then calculate summary statistics for a single perception ###
**Inputs:**
- outcome (string) - name of the perception to calculate summary statistics for
- inData (pandas dataframe) - contains image scores and all stratification variables

In [60]:
def calcSingleOutcome(outcome,inData):
    subsetData = inData[inData['label']==outcome]
    nVotes = subsetData.count()[0]
    converted = []
    for row in range(nVotes):
        converted.append(convertVoteToImageScores(subsetData.iloc[row],image_meta))
    converted2 = ps.concat(converted)
    converted3 = redefineGender(converted2)
    converted3 = redefineEthnicity(converted3)
    calcValsSingleOutcome(converted3,outcome)

### calculate summary statistics for all perceptions ###

In [48]:
outcomes = list(set(mturk['label']))
for outcome in outcomes:
    print("processing outcome %s" %(outcome))
    calcSingleOutcome(outcome,mturk)

processing outcome relaxing


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  otherData['gender'] = 'Other'
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  otherData['race']= 'OT'


processing outcome nature


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  otherData['gender'] = 'Other'
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  otherData['race']= 'OT'


processing outcome safe_walk


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  otherData['gender'] = 'Other'
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  otherData['race']= 'OT'


processing outcome beauty


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  otherData['gender'] = 'Other'
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  otherData['race']= 'OT'


processing outcome safe_crime


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  otherData['gender'] = 'Other'
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  otherData['race']= 'OT'
