# Visual Genome: Spatial sematic Analysis, an attempt
* Spatial semantics is the study of the meaning of spatial language, but what is to be regarded
as ‘spatial language’?

* Ankur Dutta, Miroslav Vitkov, MS Cognitive Sytems

References: https://pdfs.semanticscholar.org/2bed/7800e931181a6dbb9acba104242b2dd9905f.pdf
*          (To appear in Hubert Cuyckens and Dirk Geeraerts (eds.) Handbook in Cognitive Linguistics, Chapter 13, Spatial Semantics, Jordan Zlatev) 

In [1]:
## libraries
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
from visual_genome import api as vg ## visual_genome is the name of the toolbar folder
from PIL import Image as PIL_Image
import requests
try:
    from StringIO import StringIO ## python 2
except ImportError:
    from io import StringIO ## python 3
import pandas as pd
import spacy
nlp = spacy.load('en_core_web_sm')

In [2]:
## STEP 0 thinking
# In the VG datset-annotation bundle, spatial languages can be found in any kind of region description or 
# image description. However since spatial sematics will not appear as attributes (has to have a verb and
# prepositions, and not adjectives) since some of them will be 
# Prepositional Phrases or Noun Phrases, it has to be only present as a relationship part of the annotation,
# so now we will look only into the 
# various descriptions in image annotation and break them into their POS tags, to check 
# across text for a few categories of Verbs and Prepositions and slowly explore the data.

### for the 1st image
## region description 
reg_desc = vg.get_region_descriptions_of_image(id=1)
print(len(reg_desc))
## relationships within an image
reg_rel = vg.get_scene_graph_of_image(id=1)
print(len(reg_rel.relationships))

262
41


In [3]:
### DATA FORMAT OF POS
# text      : lemma_ : pos_ : tag_ : dep_  : shape_ : is_alpha : is_stop
# the       : the    : DET  : DT   : det   : xxx    : True     : True
# clock     : clock  : NOUN : NN   : nsubj : xxxx   : True     : False
# is        : be     : VERB : VBZ  : ROOT  : xx     : True     : True
# green     : green  : ADJ  : JJ   : acomp : xxxx   : True     : False
# in        : in     : ADP  : IN   : prep  : xx     : True     : True
# colour    : colour : NOUN : NN   : pobj  : xxxx   : True     : False

### Step 0. POS tagging image descriptions of first 100 images

In [17]:
## collecting 100 region descriptions into
reg_descpritions = []
for i in range(1,101):
    reg_desc = vg.get_region_descriptions_of_image(id=i)
    sent_pos = []
    for j in range(0,len(reg_desc)):
        sent_pos.append(reg_desc[j].phrase)
    reg_descpritions.append(sent_pos)

In [19]:
len(reg_descpritions)

100

In [24]:
## storage arrays
## FORMAT:
## POS_table = [[[a,b,c,a,s],[a,s,d,q,w]].....,
##              [[q,w,e,y,u],[e,w,q,b,n]].....,
##               ...............
##              [[6,6,6,0,1],[7,7,7,1,0]]....]
POS_table = []

## POS tagging of all relationships to do a hollistic analysis
for i_rel in reg_descpritions:
    # Image's all object/subject relationships 
    # POS tagging all image object/subject relationships
    for relation in i_rel:
        doc = nlp(relation)
#         print("text:lemma:pos:tag:dep:shape:is_alpha:is_stop") ## POS format
        sent_pos = []
        rel_count = 0
        for token in doc:    
#             print(token.text,':', token.lemma_,':', token.pos_,':', token.tag_,':', 
#                   token.dep_,':',token.shape_,':', token.is_alpha,':', token.is_stop)
            sent_pos.append([token.text,token.lemma_,token.pos_,token.tag_,token.dep_])
        POS_table.append(sent_pos)
        rel_count+=1
#         break ## per relationship 
#     break ## per image

In [45]:
len(POS_table)

8827

### Filter the data using verbs and prespositions

##### the idea is now to filter using various verbs(VERBS):
* The following being the verb forms in the data
* a. BES	VERB		auxiliary “be”
* b. HVS	VERB		forms of “have”
* c. MD	VERB	VerbType=mod	verb, modal auxiliary
* d. VB	VERB	VerbForm=inf	verb, base form
* e. VBD	VERB	VerbForm=fin Tense=past	verb, past tense
* f. VBG	VERB	VerbForm=part Tense=pres Aspect=prog	verb, gerund or present participle
* h. VBN	VERB	VerbForm=part Tense=past Aspect=perf	verb, past participle
* h. VBP	VERB	VerbForm=fin Tense=pres	verb, non-3rd person singular present
* i. VBZ	VERB	VerbForm=fin Tense=pres Number=sing Person=3	verb, 3rd person singular present
 
* and prepositions like : ADP,IN

source: https://spacy.io/api/annotation#pos-tagging

### Filtering Spatial language data

##### a. searching for PP

In [33]:
## Step 1 a filter
## finding PP: Prepositional Phrases using the a tags
## means:
# finding PP in <tag> part of the annotaion
PP_collection = []
x = 0
y = 0
for image in POS_table:
    for desc in image:
        if (desc[3]=='PP'):
            print(x,y)
            PP_collection.append([x,y])
#             break
        y+=1
#         break
    x+=1
### didnt work

#### b.searching for Prepositions and sspatial root verbs like "stand", "look", "walk", "run" and "sit" 

In [41]:
prep_col = []
x = 0
for image in POS_table:
    y = 0
    for desc in image:
        if (desc[4]=="prep"):
            print(desc)
            prep_col.append([x,y])
            break
        y+=1
#         break
    x+=1
    break
### found some!!!!

['in', 'in', 'ADP', 'IN', 'prep']


In [40]:
# Lets print those preposotions
for x,y in prep_col:
    print(POS_table[x][y])

['in', 'in', 'ADP', 'IN', 'prep']
['along', 'along', 'ADP', 'IN', 'prep']
['at', 'at', 'ADP', 'IN', 'prep']
['on', 'on', 'ADP', 'IN', 'prep']
['of', 'of', 'ADP', 'IN', 'prep']
['on', 'on', 'ADP', 'IN', 'prep']
['in', 'in', 'ADP', 'IN', 'prep']
['beside', 'beside', 'ADP', 'IN', 'prep']
['of', 'of', 'ADP', 'IN', 'prep']
['on', 'on', 'ADP', 'IN', 'prep']
['in', 'in', 'ADP', 'IN', 'prep']
['on', 'on', 'ADP', 'IN', 'prep']
['in', 'in', 'ADP', 'IN', 'prep']
['on', 'on', 'ADP', 'IN', 'prep']
['on', 'on', 'ADP', 'IN', 'prep']
['on', 'on', 'ADP', 'IN', 'prep']
['on', 'on', 'ADP', 'IN', 'prep']
['of', 'of', 'ADP', 'IN', 'prep']
['on', 'on', 'ADP', 'IN', 'prep']
['on', 'on', 'ADP', 'IN', 'prep']
['to', 'to', 'ADP', 'IN', 'prep']
['of', 'of', 'ADP', 'IN', 'prep']
['near', 'near', 'ADP', 'IN', 'prep']
['of', 'of', 'ADP', 'IN', 'prep']
['in', 'in', 'ADP', 'IN', 'prep']
['near', 'near', 'ADP', 'IN', 'prep']
['at', 'at', 'ADP', 'IN', 'prep']
['to', 'to', 'ADP', 'IN', 'prep']
['along', 'along', 'ADP', 

['at', 'at', 'ADP', 'IN', 'prep']
['on', 'on', 'ADP', 'IN', 'prep']
['on', 'on', 'ADP', 'IN', 'prep']
['on', 'on', 'ADP', 'IN', 'prep']
['with', 'with', 'ADP', 'IN', 'prep']
['in', 'in', 'ADP', 'IN', 'prep']
['in', 'in', 'ADP', 'IN', 'prep']
['into', 'into', 'ADP', 'IN', 'prep']
['on', 'on', 'ADP', 'IN', 'prep']
['with', 'with', 'ADP', 'IN', 'prep']
['in', 'in', 'ADP', 'IN', 'prep']
['under', 'under', 'ADP', 'IN', 'prep']
['of', 'of', 'ADP', 'IN', 'prep']
['of', 'of', 'ADP', 'IN', 'prep']
['on', 'on', 'ADP', 'IN', 'prep']
['of', 'of', 'ADP', 'IN', 'prep']
['on', 'on', 'ADP', 'IN', 'prep']
['on', 'on', 'ADP', 'IN', 'prep']
['on', 'on', 'ADP', 'IN', 'prep']
['through', 'through', 'ADP', 'IN', 'prep']
['in', 'in', 'ADP', 'IN', 'prep']
['on', 'on', 'ADP', 'IN', 'prep']
['on', 'on', 'ADP', 'IN', 'prep']
['in', 'in', 'ADP', 'IN', 'prep']
['on', 'on', 'ADP', 'IN', 'prep']
['in', 'in', 'ADP', 'IN', 'prep']
['on', 'on', 'ADP', 'IN', 'prep']
['at', 'at', 'ADP', 'IN', 'prep']
['in', 'in', 'ADP', 

#### Getting the image data
Next, we will get some data about the image. We specifically want to know the image's url.

In [None]:
# image = vg.GetImageData(id=image_id)
# print "The url of the image is: %s" % image.url

#### Getting the region descriptions
Now, let's get all the region descriptions for this image.

In [None]:
# regions = vg.GetRegionDescriptionsOfImage(id=image_id)
# print "The first region descriptions is: %s" % regions[0].phrase
# print "It is located in a bounding box specified by x:%d, y:%d, width:%d, height:%d" % (regions[0].x, regions[0].y, regions[0].width, regions[0].height)

#### Visualizing some regions
Now, we will visualize some of the regions. The x,y coordinates of a region refer to the top left corner of the region. Since there are many regions, we will only visualize the first 4.

In [None]:
# fig = plt.gcf()
# fig.set_size_inches(18.5, 10.5)
# def visualize_regions(image, regions):
#     response = requests.get(image.url)
#     img = PIL_Image.open(StringIO(response.content))
#     plt.imshow(img)
#     ax = plt.gca()
#     for region in regions:
#         ax.add_patch(Rectangle((region.x, region.y),
#                                region.width,
#                                region.height,
#                                fill=False,
#                                edgecolor='red',
#                                linewidth=3))
#         ax.text(region.x, region.y, region.phrase, style='italic', bbox={'facecolor':'white', 'alpha':0.7, 'pad':10})
#     fig = plt.gcf()
#     plt.tick_params(labelbottom='off', labelleft='off')
#     plt.show()
# visualize_regions(image, regions[:8])