## Exploring Places

![Screen%20Shot%202022-01-30%20at%2011.24.35.png](attachment:Screen%20Shot%202022-01-30%20at%2011.24.35.png)

## Experiments

1. Visualizing the places dataset and exploring place IDs
2. Exploring place tags
3. Exploring towns and place names
4. Exploring properties
5. Exploring similarities between place descriptions
6. Topic modelling of place descriptions

In [6]:
import json
import pandas as pd
import plotly.express as px
import os
import plotly.graph_objects as go
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
from bertopic import BERTopic

In [7]:
#data_path = "places.json"
data_path = "dataset/sample_20180501.json"
with open(data_path, 'r') as f:
    data = json.load(f)
places = data["places"]
print(len(places))
df = pd.DataFrame(places)

1224


### Visualizing the places DataFrame

In [10]:
df["properties"].iloc[0]

{'place.child-restrictions': True,
 'place.facilities.free-wifi': True,
 'place.facilities.dogs-allowed': False,
 'place.facilities.parking': True,
 'place.facilities.toilets': True,
 'place.facilities.toilets.disabled': False,
 'place.facilities.wheelchair-access': False,
 'place.capacity.max': '160'}

In [4]:
df.shape[0]

1224

### Experiment 1: Exploring Place IDs

In [5]:
df_ids=df.groupby(['place_id']).size().reset_index()
df_ids=df_ids.rename(columns={0: "number_of_times"}).sort_values(by=['number_of_times'], ascending=False)
df_ids

| | place_id | number_of_times |
|---|---|---|
| 0 | 1 | 1 |
| 813 | 60849 | 1 |
| 820 | 61545 | 1 |
| 819 | 61451 | 1 |
| 818 | 61204 | 1 |
| ... | ... | ... |
| 407 | 22375 | 1 |
| 406 | 22225 | 1 |
| 405 | 22215 | 1 |
| 404 | 22207 | 1 |
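Since every `place_id` in the output above occurs exactly once, a more direct uniqueness check is `Series.is_unique`, with `Series.duplicated(keep=False)` to list any offending rows. A minimal sketch on a hypothetical toy frame (the IDs are illustrative, not rows from the dataset):

```python
import pandas as pd

# Hypothetical toy frame standing in for `df`; only the place_id column matters here.
toy = pd.DataFrame({"place_id": [1, 371, 372, 371]})

# True only if no place_id occurs more than once.
ids_unique = toy["place_id"].is_unique

# keep=False marks every copy of a repeated place_id, not just the later ones.
duplicates = toy[toy["place_id"].duplicated(keep=False)]

print(ids_unique)
print(duplicates)
```

On the real `df`, the counts above suggest `df["place_id"].is_unique` would return `True`.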


### Experiment 2: Exploring Place Tags

We separate the elements stored in each place's `tags` list into individual rows using `DataFrame.explode`.


In [6]:
df["tags"][0:5]

0        [Bar & pub food, Comedy, Restaurants, Venues]
1    [Cinemas, Community centre, Public buildings, ...
2    [Arts Centre, Galleries, Language School, Publ...
3                         [Conference Centres, Venues]
4                                   [Theatres, Venues]
Name: tags, dtype: object

In [7]:
df_tags=df.explode('tags')

In [8]:
df_tags

| | address | email | postal_code | properties | sort_name | town | website | place_id | modified_ts | created_ts | name | loc | country_code | tags | descriptions | phone_numbers | status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5 York Place | admin@thestand.co.uk | EH1 3EB | {'place.child-restrictions': True, 'place.faci... | Stand | Edinburgh | http://www.thestand.co.uk | 1 | 2021-11-24T12:18:33Z | 2021-11-24T12:18:33Z | The Stand | {'latitude': '55.955806109395006', 'longitude'... | GB | Bar & pub food | [{'type': 'description.list.default', 'descrip... | {'info': '0131 558 7272', 'box_office': '0131 ... | live |
| 0 | 5 York Place | admin@thestand.co.uk | EH1 3EB | {'place.child-restrictions': True, 'place.faci... | Stand | Edinburgh | http://www.thestand.co.uk | 1 | 2021-11-24T12:18:33Z | 2021-11-24T12:18:33Z | The Stand | {'latitude': '55.955806109395006', 'longitude'... | GB | Comedy | [{'type': 'description.list.default', 'descrip... | {'info': '0131 558 7272', 'box_office': '0131 ... | live |
| 0 | 5 York Place | admin@thestand.co.uk | EH1 3EB | {'place.child-restrictions': True, 'place.faci... | Stand | Edinburgh | http://www.thestand.co.uk | 1 | 2021-11-24T12:18:33Z | 2021-11-24T12:18:33Z | The Stand | {'latitude': '55.955806109395006', 'longitude'... | GB | Restaurants | [{'type': 'description.list.default', 'descrip... | {'info': '0131 558 7272', 'box_office': '0131 ... | live |
| 0 | 5 York Place | admin@thestand.co.uk | EH1 3EB | {'place.child-restrictions': True, 'place.faci... | Stand | Edinburgh | http://www.thestand.co.uk | 1 | 2021-11-24T12:18:33Z | 2021-11-24T12:18:33Z | The Stand | {'latitude': '55.955806109395006', 'longitude'... | GB | Venues | [{'type': 'description.list.default', 'descrip... | {'info': '0131 558 7272', 'box_office': '0131 ... | live |
| 1 | 10 Orwell Terrace |  | EH11 2DY |  | St Bride's Centre | Edinburgh | http://stbrides.wordpress.com | 371 | 2019-12-04T13:27:26Z | 2019-12-04T13:27:26Z | St Bride's Centre | {'latitude': '55.94255035', 'longitude': '-3.2... | GB | Cinemas | [{'type': 'description.list.default', 'descrip... | {'info': '0131 346 1405'} | live |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1220 |  |  | EH32 0QB |  | Aberlady Local Nature Reserve | Longniddry |  | 112611 | 2018-10-12T15:32:19Z | 2018-10-12T15:32:19Z | Aberlady Local Nature Reserve | {'latitude': '56.01454821598324', 'longitude':... | GB | Outdoors |  |  | live |
| 1221 | 4 Picardy Place |  | EH1 3JT |  | Tokyo Bar And Nightclub | Edinburgh | https://www.facebook.com/tokyonightclubedin/ | 112723 | 2018-10-15T16:54:59Z | 2018-10-15T16:54:59Z | Tokyo Bar And Nightclub | {'latitude': '55.9569983', 'longitude': '-3.18... | GB | Clubs |  | {'info': '07378 413630'} | live |
| 1222 | Edinburgh Road |  | EH49 6AB |  | Old Pavilion at Linlithgow Cricket Ground | Linlithgow |  | 113099 | 2018-10-26T17:04:32Z | 2018-10-26T17:04:32Z | The Old Pavilion at Linlithgow Cricket Ground | {'latitude': '55.97670900', 'longitude': '-3.5... | GB | Outdoors |  |  | live |
| 1223 | 19-21 George Street |  | EH2 2PB |  | Principal George Street | Edinburgh | https://www.phcompany.com/principal/edinburgh-... | 114042 | 2018-11-29T17:04:07Z | 2018-11-29T17:04:07Z | The Principal George Street | {'latitude': '55.95414000', 'longitude': '-3.1... | GB | Accommodation |  | {'info': '0131 225 1251'} | live |
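The repeated index label `0` in the output above comes from how `explode` works: it emits one row per list element and keeps the original row's index. A small sketch on hypothetical data also shows what happens to places with an empty or missing tag list:

```python
import pandas as pd

# Hypothetical toy frame: one list of tags per place, plus an empty list and a missing value.
toy = pd.DataFrame({
    "name": ["Place A", "Place B", "Place C", "Place D"],
    "tags": [["Comedy", "Venues"], ["Cinemas"], [], None],
})

# One row per list element; the original index label is repeated for each element.
exploded = toy.explode("tags")
print(exploded)
```

Empty lists become a single missing value, and `groupby(['tags'])` drops missing keys by default, so untagged places silently disappear from the tag counts.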


In [9]:
g_tags=df_tags.groupby(['tags']).size().reset_index()
g_tags=g_tags.rename(columns={0: "number_of_times"}).sort_values(by=['number_of_times'], ascending=False)
g_tags

| | tags | number_of_times |
|---|---|---|
| 233 | Public buildings | 224 |
| 317 | Venues | 224 |
| 217 | Outdoors | 180 |
| 242 | Restaurants | 155 |
| 234 | Pubs & bars | 150 |
| ... | ... | ... |
| 163 | Housing Association | 1 |
| 164 | IT | 1 |
| 165 | Ice Cream | 1 |
| 166 | Ice cream | 1 |
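The tail of this table counts `Ice Cream` and `Ice cream` as two separate tags. Assuming such case variants are meant to be the same tag, one option is to normalise whitespace and case before counting. A sketch on a hypothetical tag column (the leading-space `" Venues"` entry is invented for illustration):

```python
import pandas as pd

# Hypothetical exploded tag column; "Ice Cream" / "Ice cream" mirror the near-duplicates above.
tags = pd.Series(["Ice Cream", "Ice cream", "Venues", " Venues", "IT"])

# Strip surrounding whitespace and casefold (a more aggressive lower()) before counting.
normalised = tags.str.strip().str.casefold()
counts = normalised.value_counts()
print(counts)
```

Casefolding also turns acronyms such as `IT` into `it`; if readable labels matter, one could keep the most frequent original spelling of each normalised tag for display.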


In [10]:
px.histogram(g_tags, x="tags", y="number_of_times", histfunc="sum", color="tags", title='Frequency of place tags')

### Experiment 3: Exploring Towns & Names


In [11]:
df["town"][1:10]

1    Edinburgh
2    Edinburgh
3    Edinburgh
4    Edinburgh
5    Edinburgh
6    Edinburgh
7    Edinburgh
8    Edinburgh
9    Edinburgh
Name: town, dtype: object

#### 3.1 Frequency of places grouped by town

In [12]:
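df_town=df.dropna(subset=['town'])
town=df_town.groupby(['town']).size().reset_index()
town=town.rename(columns={0: "number_of_times"})
# Drop the group at index 0, i.e. the town name that sorts first alphabetically
# (presumably a blank entry in this sample)
town=town.drop([0])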

In [13]:
town=town.sort_values(by=['number_of_times'], ascending=False)
town

| | town | number_of_times |
|---|---|---|
| 45 | Edinburgh | 736 |
| 38 | Dunfermline | 38 |
| 121 | St Andrews | 31 |
| 37 | Dunbar | 17 |
| 71 | Kirkcaldy | 17 |
| ... | ... | ... |
| 86 | Lothianburn | 1 |
| 33 | Dairsie | 1 |
| 34 | Dalgety Bay | 1 |
| 40 | EH8 8BL | 1 |
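`town.drop([0])` removes whichever group happens to sort first, which only works if that group is the unwanted one. Assuming the intent is to discard blank town names, filtering them explicitly is more robust. A sketch on a hypothetical `town` column:

```python
import pandas as pd

# Hypothetical sample of the `town` column, including empty, whitespace-only and missing values.
towns = pd.DataFrame({"town": ["Edinburgh", "", "  ", None, "Dunbar", "Edinburgh"]})

# Treat missing, empty and whitespace-only names as "no town" and drop them explicitly.
cleaned = towns["town"].fillna("").str.strip()
valid = towns[cleaned != ""]

town_counts = (
    valid.groupby("town").size()
    .reset_index(name="number_of_times")
    .sort_values("number_of_times", ascending=False)
)
print(town_counts)
```

The table above also lists `EH8 8BL`, which is a postcode rather than a town, so further cleaning of the `town` field may be needed.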


In [14]:
px.scatter(town, x='town', y='number_of_times', color='number_of_times',  size="number_of_times", size_max=60, title="Frequency of places grouped by towns")


#### 3.2 Frequency of places grouped by name

In [15]:
df_name=df.groupby(['name']).size().reset_index()
df_name=df_name.rename(columns={0: "number_of_times"})
df_name=df_name.sort_values(by=['number_of_times'], ascending=False)
df_name.reset_index()

| | index | name | number_of_times |
|---|---|---|---|
| 0 | 1167 | Waterstones | 7 |
| 1 | 1121 | University of Edinburgh | 2 |
| 2 | 308 | Edinburgh Napier University | 2 |
| 3 | 437 | Holy Trinity Church | 2 |
| 4 | 879 | St Mary's Parish Church | 2 |
| ... | ... | ... | ... |
| 1206 | 404 | Halliwell’s House Museum | 1 |
| 1207 | 403 | Hallhill Healthy Living Centre | 1 |
| 1208 | 402 | Haddington School of Dance and Music | 1 |
| 1209 | 401 | Haddington Corn Exchange | 1 |


#### 3.3 Frequency of places grouped by name and town

In [16]:
df_name_town=df.groupby(['name', 'town']).size().reset_index()
df_name_town=df_name_town.rename(columns={0: "number_of_times"})
df_name_town=df_name_town.sort_values(by=['number_of_times'], ascending=False)
df_name_town

| | name | town | number_of_times |
|---|---|---|---|
| 1170 | Waterstones | Edinburgh | 3 |
| 308 | Edinburgh Napier University | Edinburgh | 2 |
| 1206 | ZOO Charteris | Edinburgh | 2 |
| 1123 | University of Edinburgh | Edinburgh | 2 |
| 13 | 52 Canoes | Edinburgh | 2 |
| ... | ... | ... | ... |
| 406 | Harehead Farm | Cranshaws | 1 |
| 405 | Hanover Tap | Edinburgh | 1 |
| 404 | Halliwell’s House Museum | Selkirk | 1 |
| 403 | Hallhill Healthy Living Centre | Dunbar | 1 |
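Comparing 3.2 and 3.3 shows that `Waterstones` appears 7 times by name but only 3 times in Edinburgh, so some names span several towns, while entries such as `52 Canoes` repeat within a single town and may be duplicates worth checking. A minimal sketch that counts distinct towns per name, on hypothetical rows (`Chain A`, `Venue B`, `Venue C` are invented):

```python
import pandas as pd

# Hypothetical rows: a chain in two towns and a venue listed twice in the same town.
toy = pd.DataFrame({
    "name": ["Chain A", "Chain A", "Chain A", "Venue B", "Venue B", "Venue C"],
    "town": ["Edinburgh", "Edinburgh", "Dunbar", "Edinburgh", "Edinburgh", "Edinburgh"],
})

# Number of distinct towns per name; values > 1 indicate chains or shared names.
towns_per_name = toy.groupby("name")["town"].nunique().sort_values(ascending=False)
multi_town = towns_per_name[towns_per_name > 1]
print(multi_town)
```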


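### Experiment 4: Exploring Properties

In [17]:
# Expand each place's `properties` dict into one column per property key.
# Missing properties (NaN) are replaced by an empty dict first, so that
# apply(pd.Series) does not create a spurious column named 0.
props = df['properties'].apply(lambda p: p if isinstance(p, dict) else {}).apply(pd.Series)
df_properties=pd.concat([df.drop(['properties'], axis=1), props], axis=1)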

In [18]:
df_properties[0:3]

| | address | email | postal_code | sort_name | town | website | place_id | modified_ts | created_ts | name | ... | place.child-restrictions | place.facilities.dogs-allowed | place.facilities.free-wifi | place.facilities.guide-dogs | place.facilities.hearing-loop | place.facilities.parking | place.facilities.toilets | place.facilities.toilets.baby-changing | place.facilities.toilets.disabled | place.facilities.wheelchair-access |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5 York Place | admin@thestand.co.uk | EH1 3EB | Stand | Edinburgh | http://www.thestand.co.uk | 1 | 2021-11-24T12:18:33Z | 2021-11-24T12:18:33Z | The Stand | ... | True | False | True |  |  | True | True |  | False | False |
| 1 | 10 Orwell Terrace |  | EH11 2DY | St Bride's Centre | Edinburgh | http://stbrides.wordpress.com | 371 | 2019-12-04T13:27:26Z | 2019-12-04T13:27:26Z | St Bride's Centre | ... |  |  |  |  |  |  |  |  |  |  |
| 2 | West Parliament Square | ifecosse.edimbourg-cslt@diplomatie.gouv.fr | EH1 1RN | Institut Français d'Ecosse | Edinburgh | http://www.ifecosse.org.uk | 372 | 2021-02-23T16:57:44Z | 2021-02-23T16:57:44Z | Institut Français d'Ecosse | ... |  |  | False |  |  | False | False |  | False | True |


#### 4.1 Frequency of places grouped by wheelchair-access and town 

In [19]:
df_properties_wc=df_properties.groupby(['place.facilities.wheelchair-access', 'town']).size().reset_index()
df_properties_wc=df_properties_wc.rename(columns={0: "number_of_times"})
df_properties_wc=df_properties_wc.sort_values(by=['number_of_times'], ascending=False)
df_properties_wc

| | place.facilities.wheelchair-access | town | number_of_times |
|---|---|---|---|
| 23 | True | Edinburgh | 129 |
| 6 | False | Edinburgh | 69 |
| 22 | True | Dunfermline | 7 |
| 45 | True | St Andrews | 5 |
| 38 | True | Musselburgh | 3 |
| 30 | True | Kirkcaldy | 2 |
| 32 | True | Livingston | 2 |
| 12 | False | South Queensferry | 2 |
| 34 | True | Loanhead | 2 |
| 35 | True | Lochgelly | 2 |
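`groupby` drops rows whose key is missing by default, so places that say nothing about wheelchair access are absent from the counts above. One way to keep them visible is to map the flag to explicit labels before grouping; a sketch on a hypothetical subset of `df_properties` (the `yes`/`no`/`unknown` labels are an illustrative choice):

```python
import numpy as np
import pandas as pd

# Hypothetical subset of df_properties: True / False / missing wheelchair access per place.
toy = pd.DataFrame({
    "town": ["Edinburgh", "Edinburgh", "Edinburgh", "Dunbar"],
    "place.facilities.wheelchair-access": [True, False, np.nan, np.nan],
})

# Turn the flag into plain strings so missing values form their own "unknown" group.
access = toy["place.facilities.wheelchair-access"].map({True: "yes", False: "no"}).fillna("unknown")
counts = (
    toy.assign(wheelchair_access=access)
    .groupby(["wheelchair_access", "town"])
    .size()
    .reset_index(name="number_of_times")
)
print(counts)
```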


#### 4.2 Frequency of places grouped by toilets.disabled and town

In [20]:
df_properties_td=df_properties.groupby(['place.facilities.toilets.disabled', 'town']).size().reset_index()
df_properties_td=df_properties_td.rename(columns={0: "number_of_times"})
df_properties_td=df_properties_td.sort_values(by=['number_of_times'], ascending=False)
df_properties_td

| | place.facilities.toilets.disabled | town | number_of_times |
|---|---|---|---|
| 23 | True | Edinburgh | 117 |
| 7 | False | Edinburgh | 73 |
| 22 | True | Dunfermline | 6 |
| 30 | True | Kirkcaldy | 3 |
| 44 | True | St Andrews | 3 |
| 5 | False | Dunfermline | 3 |
| 37 | True | Musselburgh | 2 |
| 13 | False | Peebles | 2 |
| 25 | True | Falkland | 2 |
| 32 | True | Livingston | 2 |


### Experiment 5: Exploring Descriptions

In [21]:
df_descriptions=df.explode('descriptions')
df_descriptions=pd.concat([df_descriptions.drop(['descriptions'], axis=1), df_descriptions['descriptions'].apply(pd.Series)], axis=1)
df_descriptions=df_descriptions.dropna(subset=['description']).reset_index()
documents=df_descriptions["description"].values
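A more compact alternative for building one row per description, while keeping the owning place's identifiers, is `pd.json_normalize` with `record_path` and `meta`. This sketch assumes each place record has a `descriptions` list of `{'type': ..., 'description': ...}` dicts, as the output above suggests; places without descriptions are filtered out first, because `record_path` expects the key to be present in every record. The records and the `description.list.other` type are hypothetical:

```python
import pandas as pd

# Hypothetical records shaped like data["places"] (only the fields used here).
places = [
    {"place_id": 1, "name": "Place A",
     "descriptions": [
         {"type": "description.list.default", "description": "A comedy club."},
         {"type": "description.list.other", "description": "Stand-up every night."},
     ]},
    {"place_id": 2, "name": "Place B"},  # no descriptions at all
]

# Keep only places with descriptions, then flatten to one row per description.
with_desc = [p for p in places if p.get("descriptions")]
desc_df = pd.json_normalize(with_desc, record_path="descriptions", meta=["place_id", "name"])
documents = desc_df["description"].tolist()
print(desc_df[["place_id", "type", "description"]])
```

Keeping `place_id` next to each document makes it easy to check later whether two similar descriptions belong to the same place.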

In [22]:
len(documents)

404

In [23]:
#documents

#### Generating Text Embeddings

In [24]:
# Load the pre-trained all-MiniLM-L6-v2 sentence-transformer (no training or fine-tuning happens here)
model = SentenceTransformer('all-MiniLM-L6-v2')
# Encode each description into a 384-dimensional embedding vector
text_embeddings = model.encode(documents, batch_size = 8, show_progress_bar = True)

In [25]:
np.shape(text_embeddings)

(404, 384)

#### Description Similarity 

In [26]:
similarities = cosine_similarity(text_embeddings)
# argsort orders each row by ascending similarity: the last position is normally
# the document itself (similarity 1.0), so array[-2] is its nearest neighbour.
similarities_sorted = similarities.argsort()
id_1 = []
id_2 = []
score = []
for index, array in enumerate(similarities_sorted):
    id_1.append(index)
    id_2.append(array[-2])
    score.append(similarities[index][array[-2]])
index_df = pd.DataFrame({'id_1': id_1,
                         'id_2': id_2,
                         'score': score})
print(index_df)



     id_1  id_2     score
0       0   155  0.465363
1       1   195  0.513309
2       2   400  0.524068
3       3    12  0.541578
4       4   231  0.574484
..    ...   ...       ...
399   399   372  0.595814
400   400   392  0.670403
401   401   267  0.495739
402   402   299  0.577223
403   403   371  0.285343

[404 rows x 3 columns]
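The loop relies on each document's own column sorting last in its row. If two descriptions were exact duplicates, their similarity would tie with the self-similarity and `array[-2]` could point back to the document itself. A vectorised sketch that masks the diagonal explicitly instead, using hypothetical toy embeddings in place of `text_embeddings`:

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical 2-D embeddings standing in for the (404, 384) text_embeddings matrix.
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])

sim = cosine_similarity(emb)
# Exclude self-matches by setting the diagonal to -inf before the row-wise argmax.
np.fill_diagonal(sim, -np.inf)
nearest = sim.argmax(axis=1)

rows = np.arange(len(emb))
nn_df = pd.DataFrame({"id_1": rows, "id_2": nearest, "score": sim[rows, nearest]})
print(nn_df)
```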


In [27]:
index_df["score"].sort_values(ascending=False)

85     0.864806
86     0.864806
132    0.859234
133    0.859234
61     0.847924
         ...   
232    0.312205
318    0.296475
403    0.285343
349    0.265690
321    0.231058
Name: score, Length: 404, dtype: float32
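The top scores come in identical pairs (rows 85/86 and 132/133) because when two documents are each other's nearest neighbour, the same pair is reported from both sides. Listing unique pairs from the upper triangle of the similarity matrix avoids this duplication; for 404 documents that is 81,406 pairs, small enough to sort directly. A sketch on a hypothetical 4×4 matrix:

```python
import numpy as np
import pandas as pd

# Hypothetical symmetric cosine-similarity matrix for four documents.
sim = np.array([
    [1.00, 0.86, 0.20, 0.10],
    [0.86, 1.00, 0.30, 0.15],
    [0.20, 0.30, 1.00, 0.70],
    [0.10, 0.15, 0.70, 1.00],
])

# k=1 skips the diagonal, so each unordered pair (i, j) with i < j appears exactly once.
i, j = np.triu_indices_from(sim, k=1)
pairs = (
    pd.DataFrame({"id_1": i, "id_2": j, "score": sim[i, j]})
    .sort_values("score", ascending=False)
    .reset_index(drop=True)
)
print(pairs.head(3))
```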

In [28]:
index_df.iloc[85]

id_1     85.000000
id_2     86.000000
score     0.864806
Name: 85, dtype: float64

**NOTE:** Documents 85 and 86 appear to be the most similar pair (cosine similarity ≈ 0.865). Let's see what they contain.

In [29]:
documents[85]

'Docked in Leith, the Royal Yacht Britannia proudly served the Royal Family as its only floating residence for 44 years, travelling on nearly 1000 voyages. The magnificent ship is now open to the public and visitors vote it as one of the best Scottish attractions over and over again for decades. With the self-guided tours (available in 30 languages) you can see for yourself what life was like on the Britannia back in the days. Explore the richly decorated royal rooms, experience what it feels like to descend under the deck or simply enjoy the view of the sea. The ship serves as a luxury hotel and holds various events throughout the year. Getting a lovely cup of tea in the Royal Deck Tea Room or buying a charming souvenir from the gift shop will crown the visit.'

In [30]:
documents[86]

"Rated Scotland’s Best Visitor Attraction for 13 years by national tourism agency VisitScotland, Britannia was home to Her Majesty The Queen and the Royal Family for over 40 years, sailing one million miles around the world. Berthed in Edinburgh's historic Port of Leith, Britannia is now a five-star visitor attraction, as well as an exclusive evening events venue.\n\nStep aboard and follow in the footsteps of royalty to see where Prince Charles and Princess Diana honeymooned and Liz Taylor, Frank Sinatra and presidents Reagan, Mandela and Gorbachev were wined and dined.\n\nThere are five main decks to explore with a fascinating audio guide included in the ticket price, as well as a version for children. We also provide British Sign Language tablets, Braille script and each deck is fully accessible for buggies and wheelchairs. Take in stunning waterfront views from The Royal Deck Tea Room where delicious home-made cakes, soups and sandwiches are served. Experience The Royal Yacht Britan

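### Experiment 6: Topic Modelling

In [31]:
# Fit BERTopic on the precomputed description embeddings; min_topic_size=20 sets the minimum number of documents per topic
topic_model = BERTopic(min_topic_size=20).fit(documents, text_embeddings)
# Assign each description to a topic (topic -1 collects outliers)
topics, probs = topic_model.transform(documents, text_embeddings)
topic_model.visualize_topics()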

In [32]:
topic_model.visualize_barchart()

In [33]:
topic_model.visualize_heatmap()

In [34]:
topic_model.get_topic_freq()

| | Topic | Count |
|---|---|---|
| 0 | -1 | 157 |
| 1 | 0 | 97 |
| 2 | 1 | 63 |
| 3 | 2 | 48 |
| 4 | 3 | 39 |
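In BERTopic, topic `-1` collects outlier documents that were not assigned to any topic. The counts above sum to 404, i.e. every description, so the outlier share can be read straight off the frequency table. A sketch rebuilding that table by hand:

```python
import pandas as pd

# Topic frequencies copied from the get_topic_freq() output above.
freq = pd.DataFrame({"Topic": [-1, 0, 1, 2, 3], "Count": [157, 97, 63, 48, 39]})

total = freq["Count"].sum()
outliers = freq.loc[freq["Topic"] == -1, "Count"].sum()
outlier_share = outliers / total
print(f"{outliers} of {total} documents ({outlier_share:.1%}) are outliers")
```

With `min_topic_size=20` on only 404 documents, a large outlier bucket is not surprising; lowering `min_topic_size` is one setting that may reduce it, at the cost of smaller, possibly noisier topics.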
