# GSVMturkv2Counts #
<br>

**Summary:** Count the number of GSV images per category for the dataset used for online training of the urban environment perception model <br>
**Author:** Andrew Larkin <br>
**Date Created:** Dec 21, 2020 <br>
**Affiliation:** Oregon State University, College of Health

#### Note: For those interested in reproducing this analysis for their own studies, the pandas `groupby` function is faster and more thoroughly validated than our code shown below. ####
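Here is a rough sketch of the `groupby` approach the note recommends. The function name `count_by_variable` and the toy data are hypothetical and not part of the original analysis. With this approach, the count table for one variable comes from a single call:

```python
import pandas as pd


def count_by_variable(df: pd.DataFrame, variable: str) -> pd.DataFrame:
    """Count rows (images) per level of `variable` with pandas groupby."""
    # dropna=False keeps a row for missing values instead of dropping them
    counts = df.groupby(variable, dropna=False).size()
    # turn the Series into a two-column table: <variable>, count
    return counts.reset_index(name="count")


# hypothetical toy metadata; only the column name mirrors this notebook
images = pd.DataFrame({"division": [1, 1, 2, 3, 3, 3]})
division_counts = count_by_variable(images, "division")
print(division_counts)
```

Unlike the loop-based functions below, `groupby` returns the levels in sorted order and computes all counts in one vectorized operation.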

```python
# pandas is imported under the alias `ps` throughout this notebook
import pandas as ps
```

```python
# PARENT_FOLDER must end with a path separator (e.g. "/"), because the
# paths below are built by string concatenation
PARENT_FOLDER = 'insert parent folder here'
SCREENED_IMGS_CSV = PARENT_FOLDER + "GSV_screened_images.csv"
STATS_FOLDER = PARENT_FOLDER + "Statistics/"
# output files: one count table per stratification variable
DIVISON_CSV = STATS_FOLDER + "GSV_screened_byDivision.csv"
ROAD_CSV = STATS_FOLDER + "GSV_screened_byRoad.csv"
URBAN_CSV = STATS_FOLDER + "GSV_screened_byUrban.csv"
CODE_CSV = STATS_FOLDER + "GSV_screened_bySampleCode.csv"
ANGLE_CSV = STATS_FOLDER + "GSV_screened_byViewingAngle.csv"
```
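The path constants above rely on `PARENT_FOLDER` ending in a slash. Building the paths with the standard-library `pathlib` module removes that requirement. The sketch below keeps the original placeholder folder name and shows only a few of the constants:

```python
from pathlib import Path

# placeholder kept from the original notebook; replace with the real folder
PARENT_FOLDER = Path("insert parent folder here")
# the / operator inserts the path separator, so no trailing slash is needed
SCREENED_IMGS_CSV = PARENT_FOLDER / "GSV_screened_images.csv"
STATS_FOLDER = PARENT_FOLDER / "Statistics"
ROAD_CSV = STATS_FOLDER / "GSV_screened_byRoad.csv"

# create the output folder before writing, e.g.:
# STATS_FOLDER.mkdir(parents=True, exist_ok=True)
```

`pandas.read_csv` and `DataFrame.to_csv` accept `Path` objects directly, so the rest of the notebook would not need to change.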

### get number of GSV images for all classification levels of one variable ###
**Inputs:**
- **inputData** (pandas dataframe) - contains a list of all downloaded images and accompanying metadata
- **variable** (string) - name of the column to stratify the dataset by and count

**Outputs:**
- **divisionDF** (pandas dataframe) - contains counts for all classification levels of the input variable

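```python
def getCountByVariable(inputData,variable):
    # unique classification levels of the variable (order is arbitrary)
    uniqueDivisions = list(set(inputData[variable]))
    count = []
    for division in uniqueDivisions:
        # rows (images) belonging to this classification level
        divisionData = inputData[inputData[variable]==division]
        # len() counts rows; DataFrame.count()[0] would count only the
        # non-null values in the first column
        count.append(len(divisionData))
    divisionDF = ps.DataFrame({
        variable:uniqueDivisions,
        'count':count
    })
    return(divisionDF)
```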
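`DataFrame.count()` returns the number of non-null values in each column. Indexing its first element therefore counts the non-missing entries in the first column, not the rows. The hypothetical subset below has a missing value in its first column and shows the difference. `len()` (or `DataFrame.shape[0]`) gives the row count:

```python
import numpy as np
import pandas as pd

# hypothetical subset of image metadata; the first column has a missing value
subset = pd.DataFrame({
    "imageId": ["a", np.nan, "c"],
    "division": [1, 1, 1],
})

# non-null values in the first column only
non_null_first_col = subset.count().iloc[0]
# number of rows, i.e. number of images
row_count = len(subset)
print(non_null_first_col, row_count)  # prints: 2 3
```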

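### get number of GSV images by road type ###
**Inputs:**
- **inputData** (pandas dataframe) - contains a list of all downloaded images and accompanying metadata

**Outputs:**
- **roadDF** (pandas dataframe) - contains counts for all road types

This function gives the same counts as `getCountByVariable(inputData,'roadType')`, which is the version `main()` uses.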

```python
def getCountByRoad(inputData):
    uniqueDivisions = list(set(inputData['roadType']))
    count = []
    for division in uniqueDivisions:
        # rows (images) with this road type
        divisionData = inputData[inputData['roadType']==division]
        count.append(len(divisionData))
    # label the first column by the variable it holds (road type)
    roadDF = ps.DataFrame({
        'roadType':uniqueDivisions,
        'count':count
    })
    return(roadDF)
```

### subset dataset by a classification level, and return the subset data and subset data size ###
**Inputs:**
- **rawData** (pandas dataframe) - contains a list of all downloaded images and accompanying metadata
- **cat** (string) - name of the column to subset the data by
- **value** (int or string) - value to filter the dataset by

**Outputs:**
- **tempData** (pandas dataframe) - subset of the input data
- **tempCount** (int) - number of rows in the subset data

```python
def getSubsetCat(rawData,cat,value):
    # keep only the rows where column `cat` equals `value`
    tempData = rawData[rawData[cat]==value]
    # number of rows (images) in the subset
    tempCount = len(tempData)
    return([tempData,tempCount])
```
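`processUrbanCat` calls `getSubsetCat` repeatedly, filtering a subset of a subset. The sketch below uses hypothetical toy data to show that two sequential filters select the same rows as a single combined boolean mask:

```python
import pandas as pd

# hypothetical toy metadata using this notebook's column names
images = pd.DataFrame({
    "urban": [1, 1, 2, 1, 2],
    "viewingAngle": ["side", "straight", "side", "side", "straight"],
})

# sequential filtering, as two nested getSubsetCat calls would do
urban_only = images[images["urban"] == 1]
urban_side = urban_only[urban_only["viewingAngle"] == "side"]

# one combined mask; each comparison needs its own parentheses with &
combined = images[(images["urban"] == 1) & (images["viewingAngle"] == "side")]

print(urban_side.equals(combined))  # True
```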

### get number of GSV images for one urban category, stratified by viewing angle and road type ###
**Inputs:**
- **rawData** (pandas dataframe) - contains a list of all downloaded images and accompanying metadata
- **value** (int) - the urban category to count images for

**Outputs:**
- **tempDF** (pandas dataframe) - number of GSV images for the urban category, stratified by straight and side viewing angles and, within each angle, by road type (1, 2, 3)

```python
def processUrbanCat(rawData,value):
    cat,count = [],[]
    # all images in the selected urban category
    # (the row label is 'urbanCore' regardless of `value`)
    urbanData,urbanCount = getSubsetCat(rawData,'urban',value)
    cat.append('urbanCore')
    count.append(urbanCount)
    for angle in ['straight','side']:
        # images in this urban category taken at this viewing angle
        tempData,tempCount = getSubsetCat(urbanData,'viewingAngle',angle)
        cat.append(angle)
        count.append(tempCount)
        for road in [1,2,3]:
            # further split by road type, labelled e.g. 'straight1'
            tempData2,tempCount2 = getSubsetCat(tempData,'roadType',road)
            cat.append(angle + str(road))
            count.append(tempCount2)
    tempDF = ps.DataFrame({
        'category':cat,
        'count':count
    })
    return(tempDF)
```
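The nested loops in `processUrbanCat` produce a long list of per-angle and per-angle-and-road counts. As an alternative sketch (not the authors' code), `pandas.crosstab` builds the same breakdown for one urban category as a two-way table. Setting `margins=True` adds the per-angle and overall totals. The toy data and the helper name `urban_angle_road_table` are hypothetical:

```python
import pandas as pd

# hypothetical toy metadata using this notebook's column names
images = pd.DataFrame({
    "urban": [1, 1, 1, 1, 2, 2],
    "viewingAngle": ["straight", "side", "side", "straight", "side", "straight"],
    "roadType": [1, 2, 2, 3, 1, 1],
})


def urban_angle_road_table(df: pd.DataFrame, urban_value) -> pd.DataFrame:
    """Count images in one urban category by viewing angle (rows) and road type (columns)."""
    subset = df[df["urban"] == urban_value]
    # margins=True appends a "total" row and a "total" column
    return pd.crosstab(subset["viewingAngle"], subset["roadType"],
                       margins=True, margins_name="total")


table = urban_angle_road_table(images, 1)
print(table)
```

Unlike `processUrbanCat`, `crosstab` only includes road types that occur in the subset. A road type with no images gets no column rather than a zero count.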

```python
import os

def main():
    rawData = ps.read_csv(SCREENED_IMGS_CSV)
    # make sure the output folder exists before writing
    os.makedirs(STATS_FOLDER, exist_ok=True)
    # one count table per stratification variable
    divisionData = getCountByVariable(rawData,'division')
    divisionData.to_csv(DIVISON_CSV,index=False)
    roadData = getCountByVariable(rawData,'roadType')
    roadData.to_csv(ROAD_CSV,index=False)
    urbanData = getCountByVariable(rawData,'urban')
    urbanData.to_csv(URBAN_CSV,index=False)
    codeData = getCountByVariable(rawData,'sampleCode')
    codeData.to_csv(CODE_CSV,index=False)
    angleData = getCountByVariable(rawData,'viewingAngle')
    angleData.to_csv(ANGLE_CSV,index=False)
```

```python
main()
```
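`main()` repeats the same count-and-write step for five variables. The data-driven sketch below keeps the variable list in one place. It assumes a hypothetical mapping from column name to output path and writes toy data to a temporary folder:

```python
import os
import tempfile

import pandas as pd


def write_counts(df: pd.DataFrame, outputs: dict) -> None:
    """Write one count table per variable; `outputs` maps column name -> CSV path."""
    for variable, path in outputs.items():
        counts = df.groupby(variable, dropna=False).size().reset_index(name="count")
        counts.to_csv(path, index=False)


# hypothetical toy metadata and output locations
images = pd.DataFrame({
    "roadType": [1, 2, 2, 3],
    "viewingAngle": ["side", "side", "straight", "side"],
})
out_dir = tempfile.mkdtemp()
outputs = {
    "roadType": os.path.join(out_dir, "counts_byRoad.csv"),
    "viewingAngle": os.path.join(out_dir, "counts_byViewingAngle.csv"),
}
write_counts(images, outputs)
print(pd.read_csv(outputs["viewingAngle"]))
```

With this layout, adding a new stratification variable means adding one entry to the mapping instead of two new lines in `main()`.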