### Notebook name: MarkRecaptureSim.ipynb
#### Author: Sreejith Menon (smenon8@uic.edu)

#### Mark Recapture Simulation Notebook

This notebook identifies individuals that appeared on day 1 and again on day 2.
Individuals that appear on day 1 are **marks**.    
If the same individuals appear again on day 2, they are **recaptures**.

*Here, "appeared" means that the individual was photographed on that day; a recapture was photographed on day 1 as well as on day 2.*
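
As a minimal sketch of these definitions, assume each individual is mapped to the day indices (1 and/or 2) on which it was photographed. The `sightings` dictionary and the `count_marks_recaptures` helper are hypothetical illustrations and are not part of `MarkRecapHelper`.

```python
def count_marks_recaptures(sightings):
    """Count marks, day-2 captures and recaptures.

    sightings: dict mapping an individual id to the day indices (1 and/or 2)
    on which it was photographed (hypothetical structure for illustration).
    """
    # Marks: everyone photographed on day 1
    marks = sum(1 for days_seen in sightings.values() if 1 in days_seen)
    # Day-2 captures: everyone photographed on day 2
    day2_captures = sum(1 for days_seen in sightings.values() if 2 in days_seen)
    # Recaptures: photographed on both days
    recaptures = sum(1 for days_seen in sightings.values()
                     if 1 in days_seen and 2 in days_seen)
    return marks, day2_captures, recaptures


# Toy data: four individuals, made up for this example
sightings = {"a": [1], "b": [1, 2], "c": [2], "d": [1, 2]}
marks, day2_captures, recaptures = count_marks_recaptures(sightings)
print(marks, day2_captures, recaptures)
```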

To change the behavior of the script, change only the values in the `days` dictionary. The `days` dictionary restricts the analysis to the images taken on the listed days.

The first-level calculations are based on which pictures were taken and apply the Petersen-Lincoln index.
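
For reference, the Petersen-Lincoln estimate is N = M * C / R, where M is the number of marks (individuals seen on day 1), C is the number of individuals seen on day 2 and R is the number of recaptures. The sketch below is a standalone illustration with made-up numbers. The function name is hypothetical, and returning 0 when there are no recaptures is an assumption, since the internals of `MR.applyMarkRecap` are not shown in this notebook.

```python
def petersen_lincoln(marks, day2_captures, recaptures):
    """Estimate population size as marks * day2_captures / recaptures."""
    if recaptures == 0:
        # Assumption for this sketch: report 0 rather than dividing by zero
        return 0.0
    return marks * day2_captures / recaptures


# Made-up numbers: 100 marks, 50 individuals seen on day 2, 10 of them recaptures
print(petersen_lincoln(100, 50, 10))
```

A small number of recaptures makes the estimate very sensitive to a single additional recapture, which is worth keeping in mind when reading the estimates further down.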

The second-level calculations keep only the images that were shared, that is, only highly shared images with a share proportion >= 80.
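
The share filter can be pictured as a simple threshold on a per-image proportion. The sketch below uses made-up rows, and the column names `GID` and `Proportion` are assumptions chosen for illustration; they are not confirmed to match the columns of `rankListImages_expt2.csv`.

```python
import csv
import io

# Made-up share data; the column names are assumptions for this sketch
raw = io.StringIO(
    "GID,Proportion\n"
    "101,92.5\n"
    "102,40.0\n"
    "103,80.0\n"
)

SHARE_THRESHOLD = 80  # keep images with proportion >= 80

# Keep the ids of the images whose share proportion meets the threshold
highly_shared = [row["GID"] for row in csv.DictReader(raw)
                 if float(row["Proportion"]) >= SHARE_THRESHOLD]
print(highly_shared)
```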

In [60]:
import json
from datetime import datetime
import importlib

import pandas as pd
import cufflinks as cf # this is necessary to link pandas to plotly
cf.go_online()
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot

import DataStructsHelperAPI as DS
import MarkRecapHelper as MR
import DeriveFinalResultSet as DRS

# Reload the local helper modules so that edits to them are picked up without restarting the kernel
importlib.reload(DS)
importlib.reload(MR)

In [None]:
days = {'2015-02-18' : '2015-02-18',
 '2015-02-19' : '2015-02-19',
 '2015-02-20' : '2015-02-20',
 '2015-02-25' : '2015-02-25',
 '2015-02-26' : '2015-02-26',
 '2015-03-01' : '2015-03-01',
 '2015-03-02' : '2015-03-02'}

nidMarkRecapSet = MR.genNidMarkRecapDict("../data/imgs_exif_data_full.json","../data/full_gid_aid_map.json","../data/full_aid_features.json",days)

#### Visualizations on how pictures were taken.
These visualizations show how individuals were identified across the days of the Great Zebra Count (GZC) rally: how many individuals were identified on each day, how many were seen only on that day, and how many were first seen on that day.
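
To make the three counts concrete, here is a small standalone sketch on made-up data. It assumes the same shape that the cells below iterate over, namely a mapping from an individual id to the list of days on which it was seen; the ids and dates are invented.

```python
from collections import Counter

# Made-up sightings: individual id -> list of days on which it was seen
sightings = {
    "n1": ["2015-03-01"],
    "n2": ["2015-03-01", "2015-03-02"],
    "n3": ["2015-03-02"],
}

# Individuals identified on each day
seen_per_day = Counter(day for days_seen in sightings.values() for day in days_seen)
# Individuals seen on that day and on no other day
only_that_day = Counter(days_seen[0] for days_seen in sightings.values()
                        if len(days_seen) == 1)
# Individuals whose earliest sighting falls on that day
first_seen = Counter(min(days_seen) for days_seen in sightings.values())

# Every individual has exactly one first day, so these counts sum to the number of individuals
assert sum(first_seen.values()) == len(sightings)
print(dict(seen_per_day), dict(only_that_day), dict(first_seen))
```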

In [None]:
# How many individuals were identified on each day, 
# i.e. how many different individuals did we see each day?

indsPerDay = {}
for nid in nidMarkRecapSet:
    for day in nidMarkRecapSet[nid]:
        indsPerDay[day] = indsPerDay.get(day,0) + 1
        
df1 = pd.DataFrame(indsPerDay,index=['IndsIdentified']).transpose()

fig1 = df1.iplot(kind='bar',filename='Individuals seen per day',title='Individuals seen per day')
iframe1 = fig1.embed_code

In [None]:
# How many individuals did we see only on that day, 
# i.e. how many individuals were only seen that day and not any other day.

uniqIndsPerDay = {}
for nid in nidMarkRecapSet:
    if len(nidMarkRecapSet[nid]) == 1:
        onlyDay = nidMarkRecapSet[nid][0]
        uniqIndsPerDay[onlyDay] = uniqIndsPerDay.get(onlyDay,0) + 1
        
df2 = pd.DataFrame(uniqIndsPerDay,index=['IndsIdentifiedOnlyOnce']).transpose()

fig2 = df2.iplot(kind='bar',filename='Individuals seen only that day',title='Individuals seen only that day')
iframe2 = fig2.embed_code

In [None]:
# How many individuals were first seen on that day, i.e. the number of new individuals identified on that day.
# Summed across all the days, these counts equal the number of unique individuals in the database. We have 1997 identified individuals.
indsSeenFirst = {}
for nid in nidMarkRecapSet:
    firstDay = min(nidMarkRecapSet[nid])
    indsSeenFirst[firstDay] = indsSeenFirst.get(firstDay,0) + 1
    
df3 = pd.DataFrame(indsSeenFirst,index=['FirstTimeInds']).transpose()

fig3 = df3.iplot(kind='bar',filename='Individuals first seen on that day',title='Individuals first seen on that day')
iframe3 = fig3.embed_code

In [None]:
df1['IndsIdentifiedOnlyOnce'] = df2['IndsIdentifiedOnlyOnce']
df1['FirstTimeInds'] = df3['FirstTimeInds']

df1.columns = ['Total inds seen today','Inds seen only today','Inds first seen today']
fig4 = df1.iplot(kind='bar',filename='Distribution of sightings',title='Distribution of sightings')
iframe4 = fig4.embed_code

### Actual Mark-Recapture Calculations

#### The snippets below consider only the photos taken, _NOT_ the share data.

In [4]:
days = {'2015-03-01' : 1,
        '2015-03-02' : 2 }

In [15]:
# Entire population estimate (includes giraffes and zebras)
nidMarkRecapSet = MR.genNidMarkRecapDict("../data/imgs_exif_data_full.json","../data/full_gid_aid_map.json","../data/full_aid_features.json","../FinalResults/rankListImages_expt2.csv",days,shareData=None)
marks,recaptures,population = MR.applyMarkRecap(nidMarkRecapSet)
print("Population of all animals = %f" %population)
marks,recaptures

Population of all animals = 3620.930233


(1038, 129)

In [16]:
nidMarkRecapSet_Zebras = MR.genNidMarkRecapDict("../data/imgs_exif_data_full.json","../data/full_gid_aid_map.json","../data/full_aid_features.json","../FinalResults/rankListImages_expt2.csv",days,'zebra_plains',shareData=None)
marks,recaptures,population = MR.applyMarkRecap(nidMarkRecapSet_Zebras)
print("Population of zebras = %f" %population)
marks,recaptures

Population of zebras = 3468.352941


(951, 119)

In [17]:
nidMarkRecapSet_Giraffes = MR.genNidMarkRecapDict("../data/imgs_exif_data_full.json","../data/full_gid_aid_map.json","../data/full_aid_features.json","../FinalResults/rankListImages_expt2.csv",days,'giraffe_masai',shareData=None)
marks,recaptures,population = MR.applyMarkRecap(nidMarkRecapSet_Giraffes)
print("Population of giraffes = %f" %population)
marks,recaptures

Population of giraffes = 176.800000


(104, 10)

#### The snippets below consider the share data

In [25]:
nidMarkRecapSet_share = MR.genNidMarkRecapDict("../data/imgs_exif_data_full.json",
                       "../data/full_gid_aid_map.json",
                       "../data/full_aid_features.json",
                       "../FinalResults/rankListImages_expt2.csv",
                       days,
                       None,
                       shareData='proportion')
# Unpack into marks/recaptures so that the tuple displayed by this cell comes from this call
marks,recaptures,population = MR.applyMarkRecap(nidMarkRecapSet_share)
print("Population of all animals = %f" %population)
marks,recaptures

Population of all animals = 2709.000000

In [26]:
nidMarkRecapSet_share = MR.genNidMarkRecapDict("../data/imgs_exif_data_full.json",
                       "../data/full_gid_aid_map.json",
                       "../data/full_aid_features.json",
                       "../FinalResults/rankListImages_expt2.csv",
                       days,
                       'zebra_plains',
                       shareData='proportion')
# Unpack into marks/recaptures so that the tuple displayed by this cell comes from this call
marks,recaptures,population = MR.applyMarkRecap(nidMarkRecapSet_share)
print("Population of zebras = %f" %population)
marks,recaptures

Population of zebras = 0.000000

There are no recaptures for this case.

In [27]:
nidMarkRecapSet_share = MR.genNidMarkRecapDict("../data/imgs_exif_data_full.json",
                       "../data/full_gid_aid_map.json",
                       "../data/full_aid_features.json",
                       "../FinalResults/rankListImages_expt2.csv",
                       days,
                       'giraffe_masai',
                       shareData='proportion')
# Unpack into marks/recaptures so that the tuple displayed by this cell comes from this call
marks,recaptures,population = MR.applyMarkRecap(nidMarkRecapSet_share)
print("Population of giraffes = %f" %population)
marks,recaptures

Population of giraffes = 69.000000

In [69]:
# Each dict maps a pair of years to the first (1) and second (2) capture occasion
days = [{'2006' : 1, '2007' : 2 },
        {'2007' : 1, '2008' : 2 },
        {'2008' : 1, '2009' : 2 },
        {'2009' : 1, '2010' : 2 },
        {'2010' : 1, '2011' : 2 },
        {'2014' : 1, '2015' : 2 }]
for dayPair in days:
    nidMarkRecapSet = MR.genNidMarkRecapDict("../data/Flickr_EXIF_full.json",
                                             "../data/Flickr_IBEIS_Ftrs_gid_aid_map.json",
                                             "../data/Flickr_IBEIS_Ftrs_aid_features.json",
                                             "../FinalResults/rankListImages_expt2.csv", # not used in this run
                                             dayPair,
                                             shareData='other',
                                             filterBySpecies='zebra_grevys')
    
    marks, recaps, population = MR.applyMarkRecap(nidMarkRecapSet)
    
    print("Estimate for the year : "  + ' & '.join(list(dayPair.keys())))
    print("Number of marks : %i" %marks)
    print("Number of recaptures : %i" %recaps)
    print("Estimated population : %f" %population)
    print()

Estimate for the year : 2006 & 2007
Number of marks : 90
Number of recaptures : 6
Estimated population : 2205.000000

Estimate for the year : 2007 & 2008
Number of marks : 147
Number of recaptures : 7
Estimated population : 2310.000000

Estimate for the year : 2008 & 2009
Number of marks : 110
Number of recaptures : 11
Estimated population : 1850.000000

Estimate for the year : 2010 & 2009
Number of marks : 185
Number of recaptures : 7
Estimated population : 2669.285714

Estimate for the year : 2011 & 2010
Number of marks : 101
Number of recaptures : 3
Estimated population : 5050.000000

Estimate for the year : 2015 & 2014
Number of marks : 57
Number of recaptures : 3
Estimated population : 2432.000000
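
The list of year pairs used in the loop above can also be generated from the list of years instead of being written out by hand. This is a small sketch only; the names `years` and `year_pairs` are introduced here for illustration. Pairs are formed only between consecutive calendar years, so the gap between 2011 and 2014 produces no pair, which matches the hand-written list.

```python
# Years for which estimates are computed in the cell above
years = [2006, 2007, 2008, 2009, 2010, 2011, 2014, 2015]

# Pair each year with the next one, keeping only consecutive calendar years;
# the first year of a pair is occasion 1 and the second is occasion 2
year_pairs = [{str(first): 1, str(second): 2}
              for first, second in zip(years, years[1:])
              if second - first == 1]

print(len(year_pairs))
```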



In [77]:
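# Assumption: inGidAidMapFl and inAidFtrFl are not defined in the cells shown;
# the Flickr map and feature files from the previous cell are assumed here.
inGidAidMapFl = "../data/Flickr_IBEIS_Ftrs_gid_aid_map.json"
inAidFtrFl = "../data/Flickr_IBEIS_Ftrs_aid_features.json"

gidNid = DRS.getCountingLogic(inGidAidMapFl,inAidFtrFl,"NID",False)
# Flatten the per-image lists of NIDs into a single list
flickr_nids = [nid for nidList in gidNid.values() for nid in nidList]

print("Number of unique individuals identified : %i" %len(set(flickr_nids)))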

Number of unique individuals identified : 1355


['1478',
 '45',
 '722',
 '103',
 '45',
 '173',
 '1337',
 '1338',
 '1291',
 '1292',
 '621',
 '958',
 '45',
 '1130',
 '810',
 '864',
 '317',
 '146',
 '1273',
 '527',
 '528',
 '700',
 '1121',
 '1364',
 '45',
 '1365',
 '45',
 '943',
 '301',
 '45',
 '300',
 '173',
 '950',
 '70',
 '45',
 '812',
 '1327',
 '1328',
 '134',
 '173',
 '1179',
 '45',
 '250',
 '249',
 '377',
 '45',
 '166',
 '478',
 '1058',
 '1099',
 '1100',
 '45',
 '1146',
 '330',
 '1183',
 '45',
 '376',
 '870',
 '1428',
 '970',
 '428',
 '1268',
 '1267',
 '395',
 '45',
 '45',
 '45',
 '1043',
 '207',
 '45',
 '1470',
 '3',
 '1198',
 '1197',
 '19',
 '1370',
 '1372',
 '471',
 '1371',
 '505',
 '3',
 '2',
 '1003',
 '534',
 '45',
 '45',
 '1445',
 '1469',
 '45',
 '355',
 '828',
 '1260',
 '1238',
 '610',
 '1265',
 '233',
 '25',
 '1239',
 '938',
 '549',
 '1403',
 '673',
 '45',
 '45',
 '44',
 '45',
 '386',
 '1429',
 '202',
 '807',
 '806',
 '171',
 '1514',
 '45',
 '36',
 '488',
 '45',
 '1086',
 '45',
 '834',
 '375',
 '368',
 '45',
 '45',
 '241'