# Visual Genome: Spatial Semantic Analysis, an attempt
* Spatial semantics is the study of the meaning of spatial language, but what is to be regarded
as ‘spatial language’?

* Ankur Dutta, Miroslav Vitkov, MS Cognitive Systems

References: https://pdfs.semanticscholar.org/2bed/7800e931181a6dbb9acba104242b2dd9905f.pdf
*          (To appear in Hubert Cuyckens and Dirk Geeraerts (eds.) Handbook in Cognitive Linguistics, Chapter 13, Spatial Semantics, Jordan Zlatev) 

In [2]:
## libraries
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
from visual_genome import api as vg ## visual_genome is the name of the toolbar folder
from PIL import Image as PIL_Image
import requests
try:
    from StringIO import StringIO ## python 2
except ImportError:
    from io import StringIO ## python 3
import pandas as pd
import spacy
nlp = spacy.load('en_core_web_sm')

In [3]:
## STEP 0: reasoning
# In the VG dataset annotation bundle, spatial language can be found in any kind of region description or
# image description. Spatial semantics will not appear as attributes, because it needs a verb and
# prepositions rather than adjectives. Since some spatial expressions are
# Prepositional Phrases or Noun Phrases, they can only be present in the relationship part of the annotation.
# So for now we look only into the
# various descriptions in the image annotation and break them into their POS tags, to check
# the text for a few categories of verbs and prepositions and slowly explore the data.

### for the 1st image
## region description 
reg_desc = vg.get_region_descriptions_of_image(id=1)
print(len(reg_desc))
## relationships within an image
# reg_rel = vg.get_scene_graph_of_image(id=1)
# print(len(reg_rel.relationships))

262


In [4]:
### DATA FORMAT OF POS
# text      : lemma_ : pos_ : tag_ : dep_  : shape_ : is_alpha : is_stop
# the       : the    : DET  : DT   : det   : xxx    : True     : True
# clock     : clock  : NOUN : NN   : nsubj : xxxx   : True     : False
# is        : be     : VERB : VBZ  : ROOT  : xx     : True     : True
# green     : green  : ADJ  : JJ   : acomp : xxxx   : True     : False
# in        : in     : ADP  : IN   : prep  : xx     : True     : True
# colour    : colour : NOUN : NN   : pobj  : xxxx   : True     : False
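
The table above can be mirrored in plain Python to make the column positions explicit. This is a minimal sketch, not part of the original notebook: `TokenRow` is a hypothetical helper name, and the rows are copied by hand from the table rather than produced by spaCy. Only the first five columns are kept, since those are the ones stored later in `POS_table`.

```python
from collections import namedtuple

# Hypothetical helper: names the five columns that the notebook keeps per token.
TokenRow = namedtuple("TokenRow", ["text", "lemma", "pos", "tag", "dep"])

# Rows copied by hand from the format table above (not live spaCy output).
rows = [
    TokenRow("the", "the", "DET", "DT", "det"),
    TokenRow("clock", "clock", "NOUN", "NN", "nsubj"),
    TokenRow("is", "be", "VERB", "VBZ", "ROOT"),
    TokenRow("green", "green", "ADJ", "JJ", "acomp"),
    TokenRow("in", "in", "ADP", "IN", "prep"),
    TokenRow("colour", "colour", "NOUN", "NN", "pobj"),
]

# Index 4 (dep) is the column used later to spot prepositions.
preps = [row.text for row in rows if row.dep == "prep"]
print(preps)
```

Because a namedtuple still supports positional access, `row[4]` and `row.dep` refer to the same field, which matches the `desc[4]` indexing used in the filters further down.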

### Step 0. POS tagging image descriptions of first 100 images

In [5]:
## collecting the region descriptions of the first 100 images into a list (one sub-list per image)
reg_descpritions = []
for i in range(1,101):
    reg_desc = vg.get_region_descriptions_of_image(id=i)
    sent_pos = []
    for j in range(0,len(reg_desc)):
        sent_pos.append(reg_desc[j].phrase)
    reg_descpritions.append(sent_pos)

In [6]:
len(reg_descpritions)

100

In [7]:
## storage arrays
## FORMAT:
## POS_table = [[[a,b,c,a,s],[a,s,d,q,w]].....,
##              [[q,w,e,y,u],[e,w,q,b,n]].....,
##               ...............
##              [[6,6,6,0,1],[7,7,7,1,0]]....]
POS_table = []

## POS tagging of all region descriptions for a holistic analysis
for i_rel in reg_descpritions:
    # i_rel holds all region descriptions of one image
    # POS tagging every description of that image
    for relation in i_rel:
        doc = nlp(relation)
#         print("text:lemma:pos:tag:dep:shape:is_alpha:is_stop") ## POS format
        sent_pos = []
        rel_count = 0
        for token in doc:    
#             print(token.text,':', token.lemma_,':', token.pos_,':', token.tag_,':', 
#                   token.dep_,':',token.shape_,':', token.is_alpha,':', token.is_stop)
            sent_pos.append([token.text,token.lemma_,token.pos_,token.tag_,token.dep_])
        POS_table.append(sent_pos)
        rel_count+=1
#         break ## per relationship 
#     break ## per image

In [12]:
len(POS_table)

8827

### Filter the data using verbs and prepositions

##### The idea now is to filter using the various verb tags (VERB):
* The following are the verb forms in the data
* a. BES	VERB		auxiliary “be”
* b. HVS	VERB		forms of “have”
* c. MD	VERB	VerbType=mod	verb, modal auxiliary
* d. VB	VERB	VerbForm=inf	verb, base form
* e. VBD	VERB	VerbForm=fin Tense=past	verb, past tense
* f. VBG	VERB	VerbForm=part Tense=pres Aspect=prog	verb, gerund or present participle
* g. VBN	VERB	VerbForm=part Tense=past Aspect=perf	verb, past participle
* h. VBP	VERB	VerbForm=fin Tense=pres	verb, non-3rd person singular present
* i. VBZ	VERB	VerbForm=fin Tense=pres Number=sing Person=3	verb, 3rd person singular present
 
* and prepositions: ADP (coarse POS) and IN (fine-grained tag)

source: https://spacy.io/api/annotation#pos-tagging
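
The tag lists above can be turned into a small filter over the `[text, lemma, pos, tag, dep]` rows. The following is a sketch under assumptions: `VERB_TAGS` and `verbs_and_preps` are hypothetical names, and the sample sentence is tagged by hand for illustration. Prepositions are matched on the coarse `ADP` label only, because the fine-grained `IN` tag also covers subordinating conjunctions.

```python
# Fine-grained verb tags listed above (hypothetical constant name).
VERB_TAGS = {"BES", "HVS", "MD", "VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}

def verbs_and_preps(sentence):
    """Split one tagged sentence into its verb tokens and preposition tokens.

    Each token is a row [text, lemma, pos, tag, dep].
    """
    verbs = [tok[0] for tok in sentence if tok[3] in VERB_TAGS]
    preps = [tok[0] for tok in sentence if tok[2] == "ADP"]
    return verbs, preps

# Hand-tagged example row set (an assumption, not actual spaCy output).
sentence = [
    ["two", "two", "NUM", "CD", "nummod"],
    ["men", "man", "NOUN", "NNS", "ROOT"],
    ["standing", "stand", "VERB", "VBG", "acl"],
    ["on", "on", "ADP", "IN", "prep"],
    ["the", "the", "DET", "DT", "det"],
    ["road", "road", "NOUN", "NN", "pobj"],
]

verbs, preps = verbs_and_preps(sentence)
print(verbs, preps)
```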

### Filtering Spatial language data

### a. Searching for PP

In [9]:
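## Step 1a filter
## finding PP (Prepositional Phrases) using the fine-grained tags,
## that is:
# looking for 'PP' in the <tag> part of the annotation
PP_collection = []
x = 0
for image in POS_table:
    y = 0  ## token index restarts for every description
    for desc in image:
        if (desc[3]=='PP'):
            print(x,y)
            PP_collection.append([x,y])
#             break
        y+=1
#         break
    x+=1
### didn't work: tag_ holds word-level tags (DT, NN, IN, ...), while 'PP' is a phrase-level label, so nothing matches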
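
Since no token carries a `PP` tag, one possible workaround is to rebuild prepositional phrases from the dependency labels that are already stored. The sketch below is a heuristic under assumptions, not the notebook's method: `prep_phrases` is a hypothetical helper that collects the tokens from a `prep` token up to the next `pobj` token, and the sample rows are tagged by hand. It ignores the real dependency tree, so nested or discontinuous phrases will be cut short.

```python
def prep_phrases(sentence):
    """Heuristically collect 'prep ... pobj' spans from one tagged sentence.

    Each token is a row [text, lemma, pos, tag, dep].
    """
    phrases = []
    current = None  # tokens of the phrase being built, or None
    for tok in sentence:
        if tok[4] == "prep":
            current = [tok[0]]  # a preposition opens a new span
        elif current is not None:
            current.append(tok[0])
            if tok[4] == "pobj":  # the object of the preposition closes it
                phrases.append(" ".join(current))
                current = None
    return phrases

# Hand-tagged example row set (an assumption, not actual spaCy output).
sentence = [
    ["two", "two", "NUM", "CD", "nummod"],
    ["men", "man", "NOUN", "NNS", "ROOT"],
    ["standing", "stand", "VERB", "VBG", "acl"],
    ["on", "on", "ADP", "IN", "prep"],
    ["the", "the", "DET", "DT", "det"],
    ["road", "road", "NOUN", "NN", "pobj"],
]

print(prep_phrases(sentence))
```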

### b. Searching for Prepositions and spatial root verbs like "stand", "look", "walk", "run" and "sit" 

In [28]:
## flags:
prep_flag = 0
verb_flag = 0
verb_list = ["stand", "look", "walk", "run" ,"sit"]
## storage variables
prep_col = []
x = 0
for image in POS_table:
    y = 0
    prep_flag = 0
    verb_flag = 0
    for desc in image:
        if (desc[4] == "prep"):
            prep_flag = 1
        if desc[1] in verb_list:
            if(desc[2]=="VERB"):
                verb_flag = 1
        if(verb_flag == 1 and prep_flag == 1):
            prep_col.append(x)
#             print(x) ## detected spatial sentences/phrases
            break
        y+=1
#         break
    x+=1
#     break
### found some!!!!
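
The flag-based loop above could also be written as a small predicate, which makes the two conditions (a `prep` dependency and a listed verb lemma tagged `VERB`) easier to test in isolation. This is a sketch, assuming the same `[text, lemma, pos, tag, dep]` row layout; `is_spatial`, `spatial_indices` and the toy table are hypothetical and hand-tagged.

```python
SPATIAL_VERBS = {"stand", "look", "walk", "run", "sit"}

def is_spatial(sentence, verbs=SPATIAL_VERBS):
    """True if the sentence has a preposition and one of the listed verbs."""
    has_prep = any(tok[4] == "prep" for tok in sentence)
    has_verb = any(tok[1] in verbs and tok[2] == "VERB" for tok in sentence)
    return has_prep and has_verb

def spatial_indices(pos_table):
    """Indices of all sentences in pos_table that pass is_spatial."""
    return [i for i, sentence in enumerate(pos_table) if is_spatial(sentence)]

# Toy table with hand-tagged rows (an assumption, not actual spaCy output).
toy_table = [
    [  # has a preposition but no listed verb
        ["the", "the", "DET", "DT", "det"],
        ["clock", "clock", "NOUN", "NN", "nsubj"],
        ["is", "be", "VERB", "VBZ", "ROOT"],
        ["green", "green", "ADJ", "JJ", "acomp"],
        ["in", "in", "ADP", "IN", "prep"],
        ["colour", "colour", "NOUN", "NN", "pobj"],
    ],
    [  # has both a listed verb and a preposition
        ["two", "two", "NUM", "CD", "nummod"],
        ["men", "man", "NOUN", "NNS", "ROOT"],
        ["standing", "stand", "VERB", "VBG", "acl"],
        ["on", "on", "ADP", "IN", "prep"],
        ["the", "the", "DET", "DT", "det"],
        ["road", "road", "NOUN", "NN", "pobj"],
    ],
]

print(spatial_indices(toy_table))
```

Like the original loop, this only checks that both elements occur somewhere in the description. It does not check that the preposition attaches to the verb, which is one reason non-spatial phrases can slip through.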

In [29]:
## some spatially filtered sentences
for i in prep_col:
    temp_append_str = ''
    for j in POS_table[i]:
        temp_append_str += " "+str(j[0])
    print(i,". ")
    print(temp_append_str)

116 . 
 two men standing on the road
163 . 
 Man standing on a sidewalk
204 . 
 two men stand on an urban sidewalk
282 . 
 a man beginning to walk across the street
286 . 
 man about to walk across street
315 . 
 the street is empty without people walking
356 . 
 the side walks is empty without people
471 . 
 Walk light on
489 . 
 man walking in a crosswalk 

529 . 
 girl sitting in front of a monitor
572 . 
 cables run down from the outlets onto the floor
577 . 
 dark - haired woman sitting at desk
594 . 
 Computer keyboard sitting on the desk
615 . 
 woman looking at a computer monitor
616 . 
 bag sitting on desk
637 . 
 Woman sitting in a chair
654 . 
 woman is looking at the computer
673 . 
 A woman with black hair sitting at a desk .
677 . 
 A black case sitting on a desk .
689 . 
 black telephone sitting on the desk
734 . 
 cordless phone sitting in its base
823 . 
 teddy bear sitting on the sofa
872 . 
 Teddy Bear sitting on futon
875 . 
 Lamp sitting in corner
876 . 
 TV sittin

In [None]:
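# Let's print those prepositions
# prep_col holds description indices, so the prepositions are looked up by their dependency label
for x in prep_col:
    for token in POS_table[x]:
        if token[4] == "prep":
            print(x, token)
#     break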
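
Beyond printing, the prepositions could be tallied to see which ones dominate the filtered descriptions. The sketch below uses `collections.Counter` on hand-tagged rows; `preposition_counts` is a hypothetical helper and the sample sentences are illustrative, not results from the dataset.

```python
from collections import Counter

def preposition_counts(sentences):
    """Count preposition lemmas (dep == 'prep') over a list of tagged sentences."""
    return Counter(
        tok[1].lower()
        for sentence in sentences
        for tok in sentence
        if tok[4] == "prep"
    )

# Hand-tagged rows [text, lemma, pos, tag, dep] (an assumption, not spaCy output).
sentences = [
    [
        ["two", "two", "NUM", "CD", "nummod"],
        ["men", "man", "NOUN", "NNS", "ROOT"],
        ["standing", "stand", "VERB", "VBG", "acl"],
        ["on", "on", "ADP", "IN", "prep"],
        ["the", "the", "DET", "DT", "det"],
        ["road", "road", "NOUN", "NN", "pobj"],
    ],
    [
        ["bag", "bag", "NOUN", "NN", "ROOT"],
        ["sitting", "sit", "VERB", "VBG", "acl"],
        ["on", "on", "ADP", "IN", "prep"],
        ["desk", "desk", "NOUN", "NN", "pobj"],
    ],
    [
        ["Woman", "woman", "NOUN", "NN", "ROOT"],
        ["sitting", "sit", "VERB", "VBG", "acl"],
        ["in", "in", "ADP", "IN", "prep"],
        ["a", "a", "DET", "DT", "det"],
        ["chair", "chair", "NOUN", "NN", "pobj"],
    ],
]

counts = preposition_counts(sentences)
print(counts.most_common())
```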