## Generating the LIST Dataframes for the KG


In this notebook we generate one dataframe per CE ontology node and save each of them to disk.
The dataframes saved to file are listed below. They contain **only the attributes needed later to generate the graph and to link information between nodes**:

- df_events	
- df_event_category 
- df_event_tags
- df_event_description
- df_event_properties
- df_event_phonenumber	

- df_schedules
- df_schedules_phone_numbers

- df_performances			
- df_performances_properties
- df_performances_descriptions	
- df_performances_links		
			
- df_tickets

- df_places
- df_places_tags
- df_places_properties
- df_places_description
- df_places_loc	
- df_places_pn



We also generate some intermediate dataframes (those with 'total' in their names), but these are not saved to disk.

The saved dataframes will later be read back from disk to create the knowledge graph.

To create them, we use two previously computed dataframes:
 - df_new_events
 - df_places
 
 

In [1]:
import yaml
import string
import copy
from datetime import datetime
import pandas as pd
from yaml import safe_load
from pandas import json_normalize
from difflib import SequenceMatcher
import pickle
import numpy as np
import collections

We save the dataframes in a directory called dataframes_final.

In [2]:
!mkdir ./dataframes_final

mkdir: ./dataframes_final: File exists


In [3]:
dataframes_final="./dataframes_final"
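
The `mkdir` call above reports `File exists` when the notebook is re-run. As a minimal sketch of an alternative that only uses the standard library, the directory can be created idempotently. The helper `ensure_dir` and the names `out_dir` and `events_path` are illustrative and do not appear elsewhere in this notebook.

```python
from pathlib import Path


def ensure_dir(path):
    """Create `path` (and any missing parents) if it does not exist yet."""
    directory = Path(path)
    # exist_ok=True turns a repeated call into a no-op instead of an error.
    directory.mkdir(parents=True, exist_ok=True)
    return directory


out_dir = ensure_dir("./dataframes_final")

# Joining with "/" avoids doubled or missing path separators.
events_path = out_dir / "df_events"
```

A `Path` object can be passed directly to `DataFrame.to_pickle`, so the string concatenation used in the cells below is not strictly required.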

## 1. EVENTS DATAFRAME

We read the events dataframe and work with it.

In [4]:
with open("./dataframe/df_new_events","rb") as f:
    df_new_events=pickle.load(f)

In [5]:
df_new_events.iloc[0]

event_id                                                        157884
modified_ts                                       2021-01-13T18:46:26Z
created_ts                                        2007-12-06T17:18:12Z
name                                                             Väsen
sort_name                                                        Väsen
status                                                            live
id                                                              157884
schedules            [{'start_ts': '2018-04-26T20:00:00+01:00', 'en...
descriptions         [{'type': 'description.list.default', 'descrip...
website                                     http://www.twoforjoy.co.uk
tags                                                     [Folk, Music]
category                                                         Music
properties           {'list:website:comments-end-date': '2013-01-31...
ranking_level                                                        3
rankin

In [6]:
df_events=df_new_events[['event_id','id',  'created_ts', 'modified_ts','website', 'ranking_in_level', 'ranking_level', 'sort_name', 'status']]

**Important** Use this dataframe, not df_new_events, when reading the properties of events for your knowledge graph.

In [7]:
df_events.to_pickle(dataframes_final+"/df_events")
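
A later notebook can load a saved dataframe with `pd.read_pickle`. The following is a minimal round-trip sketch on a toy frame, written to a temporary directory so that nothing real is overwritten. `df_toy`, `df_loaded` and `round_trip_ok` are illustrative names, and the toy frame is not one of the dataframes listed above.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy frame; the two rows reuse ids and categories shown in this notebook.
df_toy = pd.DataFrame({"event_id": [157884, 194419],
                       "category": ["Music", "Music"]})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "df_toy"
    df_toy.to_pickle(path)            # same call the notebook uses to save
    df_loaded = pd.read_pickle(path)  # what a later notebook would run

# The loaded frame should match the saved one in values and dtypes.
round_trip_ok = df_loaded.equals(df_toy)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.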

## 2. EVENTS_DESCRIPTION DATAFRAME

In [8]:
df_new_events['descriptions']

0       [{'type': 'description.list.default', 'descrip...
1       [{'type': 'description.list.default', 'descrip...
2       [{'type': 'description.list.default', 'descrip...
3       [{'type': 'description.list.default', 'descrip...
4       [{'type': 'description.list.default', 'descrip...
                              ...                        
2272    [{'type': 'description.list.default', 'descrip...
2276    [{'type': 'description.list.default', 'descrip...
2278    [{'type': 'description.official', 'description...
2279    [{'type': 'description.list.default', 'descrip...
2280    [{'type': 'description.list.default', 'descrip...
Name: descriptions, Length: 38700, dtype: object

In [9]:
df_e_desc=df_new_events[['event_id','descriptions']].explode('descriptions')
df_e_desc=pd.concat([df_e_desc.drop(['descriptions'], axis=1), df_e_desc['descriptions'].apply(pd.Series)], axis=1)
df_e_desc=df_e_desc.drop(0, axis=1)
df_e_desc

Unnamed: 0,event_id,description,type
0,157884,Swedish trio combining acoustic instrumentatio...,description.list.default
1,194419,Brilliant mix of English tradition and America...,description.list.default
1,194419,Nominated for Musician of the Year and for Bes...,description.official
2,240818,"Robbie Burns was funny, right? So toast the ba...",description.list.default
3,345866,"The Stand's spankingly good new talent night, ...",description.list.default
...,...,...,...
2276,1586592,Tour starting at Edinburgh's The Elephant Hous...,description.list.default
2278,1595055,"The tour will be led by Lisa Williams, directo...",description.official
2278,1595055,"The tour will be led by Lisa Williams, directo...",description.list.default
2279,1599103,"Pull on your wellies, wrap up warm and come pi...",description.list.default


In [10]:
# Comment out the next line if you do not want to save the dataframe to file
df_e_desc.to_pickle(dataframes_final+"/df_event_description")
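
The explode-then-expand pattern used in this section recurs throughout the notebook, so here is a small sketch of what each step does. It runs on invented toy data (the description texts are made up), and it also shows where the column named `0` that the cells drop comes from: rows whose value is NaN rather than a dict.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df_new_events; ids and texts are invented.
df_toy = pd.DataFrame({
    "event_id": [1, 2, 3],
    "descriptions": [
        [{"type": "description.list.default", "description": "first text"},
         {"type": "description.official", "description": "second text"}],
        [{"type": "description.list.default", "description": "third text"}],
        np.nan,  # event without any description
    ],
})

# Step 1: one row per list element; the NaN row stays as a single NaN row.
exploded = df_toy.explode("descriptions")

# Step 2: each dict becomes a row of columns; a NaN becomes a lone column 0.
expanded = exploded["descriptions"].apply(pd.Series)
has_placeholder_column = 0 in expanded.columns

# Step 3: put the id back next to the new columns, drop the placeholder.
df_desc = pd.concat([exploded.drop(columns="descriptions"), expanded], axis=1)
df_desc = df_desc.drop(columns=0)
```

Note that `explode` repeats the original index label for every element of a list, which is why the outputs in this notebook show repeated values in the first column.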

## 3. EVENTS_CATEGORY DATAFRAME

In [11]:
df_new_events['category']

0          Music
1          Music
2         Comedy
3         Comedy
4         Comedy
          ...   
2272       Sport
2276    Days out
2278    Days out
2279    Days out
2280    Days out
Name: category, Length: 38700, dtype: object

In [12]:
df_e_category=df_new_events[['event_id','category']]
df_e_category

Unnamed: 0,event_id,category
0,157884,Music
1,194419,Music
2,240818,Comedy
3,345866,Comedy
4,347164,Comedy
...,...,...
2272,1584208,Sport
2276,1586592,Days out
2278,1595055,Days out
2279,1599103,Days out


In [13]:
df_e_category.to_pickle(dataframes_final+"/df_event_category")

## 4. EVENTS_PROPERTIES DATAFRAME

In [14]:
df_e_prop=df_new_events[['event_id','properties']]
df_e_prop=pd.concat([df_e_prop.drop(['properties'], axis=1), df_e_prop['properties'].apply(pd.Series)], axis=1)
df_e_prop

Unnamed: 0,event_id,actor,actor:sample,affiliate:getmein,affiliate:seatwave,author,awards:fringe-sustainable-practice:2015,awards:fringe-sustainable-practice:2017,booking_essential,cast,...,list:website:comments-enabled,list:website:comments-end-date,list:website:company,list:website:hitlisted,list:website:list-of-sites,organisation,pa:rating,place:capacity:max,simpleview:original:categories,writer
0,157884,,,,,,,,False,,...,,2013-01-31 00:00:00,,,,,,,,
1,194419,,,,,,,,False,,...,,2020-01-28 05:01:07,,,,,,,,
2,240818,,,,,,,,False,,...,,,,,,,,,,
3,345866,,,,,,,,False,,...,,,,,,,,,,
4,347164,,,,,,,,False,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2272,1584208,,,,,,,,False,,...,,,,,,,,,,
2276,1586592,,,,,,,,False,,...,,,,,,,,,,
2278,1595055,,,,,,,,False,,...,,,,,,,,,,
2279,1599103,,,,,,,,False,,...,,,,,,,,,,


In [15]:
df_e_prop.to_pickle(dataframes_final+"/df_event_properties")

## 5. EVENTS_TAGS Dataframe

In [16]:
df_e_tags=df_new_events[['event_id','tags']].explode('tags')
df_e_tags

Unnamed: 0,event_id,tags
0,157884,Folk
0,157884,Music
1,194419,Blues
1,194419,Jazz
1,194419,Folk
...,...,...
2279,1599103,Activities
2279,1599103,Days out
2279,1599103,Food & Drink
2280,1603922,Days out


In [17]:
df_e_tags.to_pickle(dataframes_final+"/df_event_tags")

## 6. EVENTS_PHONE_NUMBER DATAFRAME

In [18]:
df_events_pn=df_new_events[['event_id', 'phone_numbers']]
df_events_pn=pd.concat([df_events_pn.drop(['phone_numbers'], axis=1), df_events_pn['phone_numbers'].apply(pd.Series)], axis=1)
df_events_pn=df_events_pn.drop(0, axis=1)
df_events_pn

Unnamed: 0,event_id,box_office,info
0,157884,,
1,194419,,
2,240818,,
3,345866,,
4,347164,,
...,...,...,...
2272,1584208,,
2276,1586592,,0131 555 5558
2278,1595055,,
2279,1599103,,07793 600 289


In [19]:
df_events_pn.to_pickle(dataframes_final+"/df_event_phonenumber")

## 7. Schedules Dataframe 

In [20]:
df_schedules=df_new_events[['event_id', 'schedules']].explode('schedules')
df_schedules_total=pd.concat([df_schedules.drop(['schedules'], axis=1), df_schedules['schedules'].apply(pd.Series)], axis=1)
df_schedules_total

Unnamed: 0,event_id,start_ts,end_ts,place_id,performances,performance_space,phone_numbers
0,157884,2018-04-26T20:00:00+01:00,2018-04-26T20:00:00+01:00,383,"[{'ts': '2018-04-26T20:00:00+01:00', 'duration...",,
1,194419,2018-03-10T19:30:00+00:00,2018-03-10T19:30:00+00:00,11092,"[{'ts': '2018-03-10T19:30:00+00:00', 'duration...",,
1,194419,2018-03-08T20:00:00+00:00,2018-03-08T20:00:00+00:00,11200,"[{'ts': '2018-03-08T20:00:00+00:00', 'links': ...",,
1,194419,2018-05-07T20:00:00+01:00,2018-05-07T20:00:00+01:00,386,"[{'ts': '2018-05-07T20:00:00+01:00', 'links': ...",,
2,240818,2018-01-24T20:30:00+00:00,2018-01-28T20:30:00+00:00,1,"[{'ts': '2018-01-24T20:30:00+00:00', 'links': ...",,
...,...,...,...,...,...,...,...
2272,1584208,2020-10-10T16:00:00+01:00,2020-10-10T16:00:00+01:00,127508,"[{'ts': '2020-10-10T16:00:00+01:00', 'duration...",,
2276,1586592,2020-08-12T08:00:00+01:00,2020-10-30T08:00:00+00:00,127571,"[{'ts': '2020-08-12T08:00:00+01:00', 'duration...",,
2278,1595055,2020-09-12T10:30:00+01:00,2020-10-24T10:30:00+01:00,127985,"[{'ts': '2020-09-12T10:30:00+01:00', 'duration...",,
2279,1599103,2020-10-16T09:00:00+01:00,2020-10-19T09:00:00+01:00,128231,"[{'ts': '2020-10-16T09:00:00+01:00', 'duration...",,


In [21]:
df_schedules=df_schedules_total[['event_id', 'start_ts', 'end_ts', 'place_id', 'performance_space']]
df_schedules

Unnamed: 0,event_id,start_ts,end_ts,place_id,performance_space
0,157884,2018-04-26T20:00:00+01:00,2018-04-26T20:00:00+01:00,383,
1,194419,2018-03-10T19:30:00+00:00,2018-03-10T19:30:00+00:00,11092,
1,194419,2018-03-08T20:00:00+00:00,2018-03-08T20:00:00+00:00,11200,
1,194419,2018-05-07T20:00:00+01:00,2018-05-07T20:00:00+01:00,386,
2,240818,2018-01-24T20:30:00+00:00,2018-01-28T20:30:00+00:00,1,
...,...,...,...,...,...
2272,1584208,2020-10-10T16:00:00+01:00,2020-10-10T16:00:00+01:00,127508,
2276,1586592,2020-08-12T08:00:00+01:00,2020-10-30T08:00:00+00:00,127571,
2278,1595055,2020-09-12T10:30:00+01:00,2020-10-24T10:30:00+01:00,127985,
2279,1599103,2020-10-16T09:00:00+01:00,2020-10-19T09:00:00+01:00,128231,


In [22]:
df_schedules.to_pickle(dataframes_final+"/df_schedules")
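
Before mapping schedules onto the ontology, it can help to check which fields the schedule entries actually carry, for instance whether any of them has a `tags` key. The sketch below counts the keys over a toy series; the series stands in for `df_new_events['schedules']`, the phone number is a placeholder, and `key_counts` and `has_tags` are illustrative names.

```python
from collections import Counter

import pandas as pd

# Toy stand-in for df_new_events['schedules'] (one list of dicts per event).
schedules = pd.Series([
    [{"start_ts": "2018-04-26T20:00:00+01:00",
      "end_ts": "2018-04-26T20:00:00+01:00",
      "place_id": 383}],
    [{"start_ts": "2018-03-10T19:30:00+00:00",
      "end_ts": "2018-03-10T19:30:00+00:00",
      "place_id": 11092,
      "phone_numbers": {"info": "0000 000 0000"}}],  # placeholder number
])

# Flatten to one dict per row, skip missing entries, count every key.
key_counts = Counter(
    key
    for entry in schedules.explode().dropna()
    for key in entry
)

has_tags = "tags" in key_counts
```

On the real data the same counter would list every field that occurs in at least one schedule, together with how often it occurs.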

## 8. Schedules_PhoneNumber Dataframe

NOTE: Schedules do not have tags, so there is no Schedules_Tags dataframe. That part of the ontology is wrong and should be corrected.

In [23]:
df_schedules_pn=df_schedules_total[['event_id', 'start_ts', 'end_ts', 'phone_numbers']]
df_schedules_pn=pd.concat([df_schedules_pn.drop(['phone_numbers'], axis=1), df_schedules_pn['phone_numbers'].apply(pd.Series)], axis=1)
df_schedules_pn=df_schedules_pn.drop(0, axis=1)
df_schedules_pn

Unnamed: 0,event_id,start_ts,end_ts,info
0,157884,2018-04-26T20:00:00+01:00,2018-04-26T20:00:00+01:00,
1,194419,2018-03-10T19:30:00+00:00,2018-03-10T19:30:00+00:00,
1,194419,2018-03-08T20:00:00+00:00,2018-03-08T20:00:00+00:00,
1,194419,2018-05-07T20:00:00+01:00,2018-05-07T20:00:00+01:00,
2,240818,2018-01-24T20:30:00+00:00,2018-01-28T20:30:00+00:00,
...,...,...,...,...
2272,1584208,2020-10-10T16:00:00+01:00,2020-10-10T16:00:00+01:00,
2276,1586592,2020-08-12T08:00:00+01:00,2020-10-30T08:00:00+00:00,
2278,1595055,2020-09-12T10:30:00+01:00,2020-10-24T10:30:00+01:00,
2279,1599103,2020-10-16T09:00:00+01:00,2020-10-19T09:00:00+01:00,


In [24]:
df_schedules_pn.to_pickle(dataframes_final+"/df_schedules_phone_numbers")

## 9. Performances Dataframe

In [25]:
df_total_performances=df_schedules_total[['event_id', 'start_ts', 'end_ts', 'performances']].explode('performances')
df_total_performances=pd.concat([df_total_performances.drop(['performances'], axis=1), df_total_performances['performances'].apply(pd.Series)], axis=1)
# Drop repeated performances, keeping the first occurrence of each (event_id, start_ts, end_ts, ts)
df_total_performances=df_total_performances.drop_duplicates(subset=['ts', 'start_ts', 'end_ts', 'event_id'], keep="first")
df_total_performances

Unnamed: 0,event_id,start_ts,end_ts,ts,duration,links,tickets,properties,descriptions,time_unknown
0,157884,2018-04-26T20:00:00+01:00,2018-04-26T20:00:00+01:00,2018-04-26T20:00:00+01:00,150.0,"[{'type': 'booking', 'url': 'http://www.theque...","[{'type': 'Standard', 'currency': 'GBP', 'min_...",,,
1,194419,2018-03-10T19:30:00+00:00,2018-03-10T19:30:00+00:00,2018-03-10T19:30:00+00:00,120.0,,"[{'type': 'Standard', 'currency': 'GBP', 'min_...",,,
1,194419,2018-03-08T20:00:00+00:00,2018-03-08T20:00:00+00:00,2018-03-08T20:00:00+00:00,,"[{'type': 'booking', 'url': 'https://www.ticke...","[{'type': 'Standard', 'currency': 'GBP', 'desc...",{'performance.sold-out': True},,
1,194419,2018-05-07T20:00:00+01:00,2018-05-07T20:00:00+01:00,2018-05-07T20:00:00+01:00,,"[{'type': 'booking', 'url': 'https://www.trave...","[{'type': 'Standard', 'currency': 'GBP', 'min_...",,,
2,240818,2018-01-24T20:30:00+00:00,2018-01-28T20:30:00+00:00,2018-01-24T20:30:00+00:00,,"[{'type': 'booking', 'url': 'http://www.thesta...","[{'type': 'Standard', 'currency': 'GBP', 'min_...",,"[{'type': 'list.description.default', 'descrip...",
...,...,...,...,...,...,...,...,...,...,...
2279,1599103,2020-10-16T09:00:00+01:00,2020-10-19T09:00:00+01:00,2020-10-16T09:00:00+01:00,480.0,"[{'type': 'booking', 'url': 'https://www.kildu...","[{'type': 'Standard', 'currency': 'GBP', 'min_...",,,
2279,1599103,2020-10-16T09:00:00+01:00,2020-10-19T09:00:00+01:00,2020-10-17T09:00:00+01:00,480.0,"[{'type': 'booking', 'url': 'https://www.kildu...","[{'type': 'Standard', 'currency': 'GBP', 'min_...",,,
2279,1599103,2020-10-16T09:00:00+01:00,2020-10-19T09:00:00+01:00,2020-10-18T09:00:00+01:00,480.0,"[{'type': 'booking', 'url': 'https://www.kildu...","[{'type': 'Standard', 'currency': 'GBP', 'min_...",,,
2279,1599103,2020-10-16T09:00:00+01:00,2020-10-19T09:00:00+01:00,2020-10-19T09:00:00+01:00,480.0,"[{'type': 'booking', 'url': 'https://www.kildu...","[{'type': 'Standard', 'currency': 'GBP', 'min_...",,,


In [26]:
df_performances=df_total_performances[['event_id', 'start_ts', 'end_ts', 'ts', 'duration', 'time_unknown' ]]

In [27]:
# Comment out the next line if you do not want to save the dataframe to file
df_performances.to_pickle(dataframes_final+"/df_performances")
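
The de-duplication step in this section treats two performances as the same when they share `event_id`, `start_ts`, `end_ts` and `ts`. A minimal sketch of that behaviour on a toy frame follows; the repeated first row is invented to show the effect, and `n_repeated` and `df_unique` are illustrative names.

```python
import pandas as pd

key = ["event_id", "start_ts", "end_ts", "ts"]

# Toy frame: rows 0 and 1 describe the same performance.
df_toy = pd.DataFrame({
    "event_id": [1599103, 1599103, 1599103],
    "start_ts": ["2020-10-16T09:00:00+01:00"] * 3,
    "end_ts": ["2020-10-19T09:00:00+01:00"] * 3,
    "ts": ["2020-10-16T09:00:00+01:00",
           "2020-10-16T09:00:00+01:00",
           "2020-10-17T09:00:00+01:00"],
    "duration": [480.0, 480.0, 480.0],
})

# Count the rows that repeat an earlier key before removing them.
n_repeated = int(df_toy.duplicated(subset=key).sum())

# keep="first" retains the earliest row of each group of duplicates.
df_unique = df_toy.drop_duplicates(subset=key, keep="first")
```

Counting the repeats first gives a quick sanity check on how much data the step discards.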

## 10. PERFORMANCES_PROPERTIES DATAFRAME


**IMPORTANT** We have realised that we no longer need these three nodes: PROPERTYEVENTS, THEATRE and FILM.
All the properties of these three nodes are now PERFORMANCE_PROPERTY.

The ontology should be updated to reflect this.

Note: The following cell takes 2 or 3 minutes to run.

In [28]:
df_p_prop_total=df_total_performances[['event_id', 'start_ts', 'end_ts', 'ts', 'properties']]
df_p_prop=pd.concat([df_p_prop_total.drop(['properties'], axis=1), df_p_prop_total['properties'].apply(pd.Series)], axis=1)
df_p_prop=df_p_prop.drop(0, axis=1)
df_p_prop

Unnamed: 0,event_id,start_ts,end_ts,ts,event.festival,event.film.3d,event.film.autism-friendly,event.film.imax,event.film.over-18s,event.film.parent-and-baby,...,event.film.senior,event.film.subtitled,event.minimum-age,event.session,event.support,event.theatre.bsl-interpreted,event.theatre.captioned,list.hitlisted,performance.cancelled,performance.sold-out
0,157884,2018-04-26T20:00:00+01:00,2018-04-26T20:00:00+01:00,2018-04-26T20:00:00+01:00,,,,,,,...,,,,,,,,,,
1,194419,2018-03-10T19:30:00+00:00,2018-03-10T19:30:00+00:00,2018-03-10T19:30:00+00:00,,,,,,,...,,,,,,,,,,
1,194419,2018-03-08T20:00:00+00:00,2018-03-08T20:00:00+00:00,2018-03-08T20:00:00+00:00,,,,,,,...,,,,,,,,,,True
1,194419,2018-05-07T20:00:00+01:00,2018-05-07T20:00:00+01:00,2018-05-07T20:00:00+01:00,,,,,,,...,,,,,,,,,,
2,240818,2018-01-24T20:30:00+00:00,2018-01-28T20:30:00+00:00,2018-01-24T20:30:00+00:00,,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2279,1599103,2020-10-16T09:00:00+01:00,2020-10-19T09:00:00+01:00,2020-10-16T09:00:00+01:00,,,,,,,...,,,,,,,,,,
2279,1599103,2020-10-16T09:00:00+01:00,2020-10-19T09:00:00+01:00,2020-10-17T09:00:00+01:00,,,,,,,...,,,,,,,,,,
2279,1599103,2020-10-16T09:00:00+01:00,2020-10-19T09:00:00+01:00,2020-10-18T09:00:00+01:00,,,,,,,...,,,,,,,,,,
2279,1599103,2020-10-16T09:00:00+01:00,2020-10-19T09:00:00+01:00,2020-10-19T09:00:00+01:00,,,,,,,...,,,,,,,,,,


In [29]:
df_p_prop.to_pickle(dataframes_final+"/df_performances_properties")
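
Most of the running time of the cell above is likely spent in `.apply(pd.Series)`, which builds one Series per row. A possible alternative, sketched here on toy data and not what the notebook actually ran, is to build the expanded columns in a single `pd.DataFrame` call from a list of dicts. Any speed-up on the real data is an assumption that has not been measured here.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df_total_performances; only one row has properties.
df_toy = pd.DataFrame({
    "event_id": [157884, 194419, 194419],
    "ts": ["2018-04-26T20:00:00+01:00",
           "2018-03-10T19:30:00+00:00",
           "2018-03-08T20:00:00+00:00"],
    "properties": [np.nan, np.nan, {"performance.sold-out": True}],
})

# Replace missing entries by empty dicts so that every row is a dict.
records = [p if isinstance(p, dict) else {} for p in df_toy["properties"]]

# One DataFrame construction for all rows; reusing the original index
# keeps the rows aligned for the concat below.
expanded = pd.DataFrame(records, index=df_toy.index)

df_fast = pd.concat([df_toy.drop(columns="properties"), expanded], axis=1)
```

Because missing entries become empty dicts, no placeholder column `0` is created and the extra `drop(0, axis=1)` step is not needed with this approach.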

## 11. PERFORMANCES DESCRIPTION

In [30]:
df_p_desc=df_total_performances[['event_id', 'start_ts', 'end_ts', 'ts','descriptions']].explode('descriptions')
df_p_desc=pd.concat([df_p_desc.drop(['descriptions'], axis=1), df_p_desc['descriptions'].apply(pd.Series)], axis=1)
df_p_desc=df_p_desc.drop(0, axis=1)
df_p_desc

Unnamed: 0,event_id,start_ts,end_ts,ts,description,type
0,157884,2018-04-26T20:00:00+01:00,2018-04-26T20:00:00+01:00,2018-04-26T20:00:00+01:00,,
1,194419,2018-03-10T19:30:00+00:00,2018-03-10T19:30:00+00:00,2018-03-10T19:30:00+00:00,,
1,194419,2018-03-08T20:00:00+00:00,2018-03-08T20:00:00+00:00,2018-03-08T20:00:00+00:00,,
1,194419,2018-05-07T20:00:00+01:00,2018-05-07T20:00:00+01:00,2018-05-07T20:00:00+01:00,,
2,240818,2018-01-24T20:30:00+00:00,2018-01-28T20:30:00+00:00,2018-01-24T20:30:00+00:00,"With Vladimir McTavish, Jim Smith, Wisarut Jan...",list.description.default
...,...,...,...,...,...,...
2279,1599103,2020-10-16T09:00:00+01:00,2020-10-19T09:00:00+01:00,2020-10-16T09:00:00+01:00,,
2279,1599103,2020-10-16T09:00:00+01:00,2020-10-19T09:00:00+01:00,2020-10-17T09:00:00+01:00,,
2279,1599103,2020-10-16T09:00:00+01:00,2020-10-19T09:00:00+01:00,2020-10-18T09:00:00+01:00,,
2279,1599103,2020-10-16T09:00:00+01:00,2020-10-19T09:00:00+01:00,2020-10-19T09:00:00+01:00,,


In [31]:
df_p_desc.to_pickle(dataframes_final+"/df_performances_descriptions")

## 12. PERFORMANCE LINKS

This cell takes 5 minutes to run.

In [32]:
df_p_links=df_total_performances[['event_id', 'start_ts', 'end_ts', 'ts','links']].explode('links')
df_p_links=pd.concat([df_p_links.drop(['links'], axis=1), df_p_links['links'].apply(pd.Series)], axis=1)
df_p_links=df_p_links.drop(0, axis=1)
df_p_links

Unnamed: 0,event_id,start_ts,end_ts,ts,type,url
0,157884,2018-04-26T20:00:00+01:00,2018-04-26T20:00:00+01:00,2018-04-26T20:00:00+01:00,booking,http://www.thequeenshall.net/whats-on/shows/va...
1,194419,2018-03-10T19:30:00+00:00,2018-03-10T19:30:00+00:00,2018-03-10T19:30:00+00:00,,
1,194419,2018-03-08T20:00:00+00:00,2018-03-08T20:00:00+00:00,2018-03-08T20:00:00+00:00,booking,https://www.ticketsource.co.uk/booking/date/44...
1,194419,2018-05-07T20:00:00+01:00,2018-05-07T20:00:00+01:00,2018-05-07T20:00:00+01:00,booking,https://www.traverse.co.uk/whats-on/event-deta...
2,240818,2018-01-24T20:30:00+00:00,2018-01-28T20:30:00+00:00,2018-01-24T20:30:00+00:00,booking,http://www.thestand.co.uk/show/29395/burns_nig...
...,...,...,...,...,...,...
2279,1599103,2020-10-16T09:00:00+01:00,2020-10-19T09:00:00+01:00,2020-10-16T09:00:00+01:00,booking,https://www.kilduff.co.uk/patch/
2279,1599103,2020-10-16T09:00:00+01:00,2020-10-19T09:00:00+01:00,2020-10-17T09:00:00+01:00,booking,https://www.kilduff.co.uk/patch/
2279,1599103,2020-10-16T09:00:00+01:00,2020-10-19T09:00:00+01:00,2020-10-18T09:00:00+01:00,booking,https://www.kilduff.co.uk/patch/
2279,1599103,2020-10-16T09:00:00+01:00,2020-10-19T09:00:00+01:00,2020-10-19T09:00:00+01:00,booking,https://www.kilduff.co.uk/patch/


In [33]:
df_p_links.to_pickle(dataframes_final+"/df_performances_links")

## 13. TICKETS

This cell takes 5 minutes to run


In [34]:
df_tickets=df_total_performances[['event_id', 'start_ts', 'end_ts', 'ts','tickets']].explode('tickets')
df_tickets=pd.concat([df_tickets.drop(['tickets'], axis=1), df_tickets['tickets'].apply(pd.Series)], axis=1)
df_tickets=df_tickets.drop(0, axis=1)
df_tickets

Unnamed: 0,event_id,start_ts,end_ts,ts,currency,description,max_price,min_price,type
0,157884,2018-04-26T20:00:00+01:00,2018-04-26T20:00:00+01:00,2018-04-26T20:00:00+01:00,GBP,,,14.0,Standard
1,194419,2018-03-10T19:30:00+00:00,2018-03-10T19:30:00+00:00,2018-03-10T19:30:00+00:00,GBP,,,15.0,Standard
1,194419,2018-03-10T19:30:00+00:00,2018-03-10T19:30:00+00:00,2018-03-10T19:30:00+00:00,GBP,,,13.0,Concession
1,194419,2018-03-10T19:30:00+00:00,2018-03-10T19:30:00+00:00,2018-03-10T19:30:00+00:00,GBP,,,6.0,Children
1,194419,2018-03-08T20:00:00+00:00,2018-03-08T20:00:00+00:00,2018-03-08T20:00:00+00:00,GBP,tbc,,,Standard
...,...,...,...,...,...,...,...,...,...
2279,1599103,2020-10-16T09:00:00+01:00,2020-10-19T09:00:00+01:00,2020-10-16T09:00:00+01:00,GBP,,,1.0,Standard
2279,1599103,2020-10-16T09:00:00+01:00,2020-10-19T09:00:00+01:00,2020-10-17T09:00:00+01:00,GBP,,,1.0,Standard
2279,1599103,2020-10-16T09:00:00+01:00,2020-10-19T09:00:00+01:00,2020-10-18T09:00:00+01:00,GBP,,,1.0,Standard
2279,1599103,2020-10-16T09:00:00+01:00,2020-10-19T09:00:00+01:00,2020-10-19T09:00:00+01:00,GBP,,,1.0,Standard


In [35]:
# Comment out the next line if you do not want to save df_tickets to file
df_tickets.to_pickle(dataframes_final+"/df_tickets")
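
The ticket rows carry `event_id`, `start_ts`, `end_ts` and `ts`, which suggests these four columns can serve as the link back to a performance. This notebook does not define the linking step itself, so the following is only a sketch under that assumption, using toy frames built from rows shown above.

```python
import pandas as pd

key = ["event_id", "start_ts", "end_ts", "ts"]
ts = "2018-03-10T19:30:00+00:00"

# Toy stand-in for df_performances: a single performance.
df_perf_toy = pd.DataFrame({
    "event_id": [194419], "start_ts": [ts], "end_ts": [ts], "ts": [ts],
    "duration": [120.0],
})

# Toy stand-in for df_tickets: three ticket types for that performance.
df_tickets_toy = pd.DataFrame({
    "event_id": [194419] * 3, "start_ts": [ts] * 3,
    "end_ts": [ts] * 3, "ts": [ts] * 3,
    "type": ["Standard", "Concession", "Children"],
    "min_price": [15.0, 13.0, 6.0],
})

# validate="many_to_one" raises if the key is not unique on the
# performance side, i.e. if duplicate performances slipped through.
linked = df_tickets_toy.merge(df_perf_toy, on=key, how="left",
                              validate="many_to_one")
```

The `validate` argument doubles as a check that the earlier de-duplication of performances did what was intended.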

## 14. PLACES

In [36]:
with open("./dataframe/df_places","rb") as f:
    df_places_total=pickle.load(f)

In [37]:
df_places_total.iloc[0]

address                                               5 York Place
email                                         admin@thestand.co.uk
postal_code                                                EH1 3EB
properties       {'place.child-restrictions': True, 'place.faci...
sort_name                                                    Stand
town                                                     Edinburgh
website                                  http://www.thestand.co.uk
place_id                                                         1
modified_ts                                   2021-11-24T12:18:33Z
created_ts                                    2021-11-24T12:18:33Z
name                                                     The Stand
loc              {'latitude': '55.955806109395006', 'longitude'...
country_code                                                    GB
tags                 [Bar & pub food, Comedy, Restaurants, Venues]
descriptions     [{'type': 'description.list.default', 'descri

In [38]:
df_places=df_places_total[['place_id', 'created_ts', 'modified_ts', 'name', 'sort_name', 'address', 'town', 'postal_code', 'country_code', 'website', 'email', 'status']]
df_places

Unnamed: 0,place_id,created_ts,modified_ts,name,sort_name,address,town,postal_code,country_code,website,email,status
0,1,2021-11-24T12:18:33Z,2021-11-24T12:18:33Z,The Stand,Stand,5 York Place,Edinburgh,EH1 3EB,GB,http://www.thestand.co.uk,admin@thestand.co.uk,live
1,371,2019-12-04T13:27:26Z,2019-12-04T13:27:26Z,St Bride's Centre,St Bride's Centre,10 Orwell Terrace,Edinburgh,EH11 2DY,GB,http://stbrides.wordpress.com,,live
2,372,2021-02-23T16:57:44Z,2021-02-23T16:57:44Z,Institut Français d'Ecosse,Institut Français d'Ecosse,West Parliament Square,Edinburgh,EH1 1RN,GB,http://www.ifecosse.org.uk,ifecosse.edimbourg-cslt@diplomatie.gouv.fr,live
3,375,2015-02-18T15:59:38Z,2015-02-18T15:59:38Z,Meadowbank Sports Centre,Meadowbank Sports Centre,139 London Road,Edinburgh,EH7 6AE,GB,http://www.edinburghleisure.co.uk,,live
4,376,2020-01-27T10:18:15Z,2020-01-27T10:18:15Z,Royal Highland Centre,Royal Highland Centre,Ingliston,Edinburgh,EH28 8NB,GB,http://www.royalhighlandcentre.co.uk,,live
...,...,...,...,...,...,...,...,...,...,...,...,...
512,127508,2020-07-22T16:37:19Z,2020-07-22T16:37:19Z,Lochgelly Raceway,Lochgelly Raceway,A92,Lochgelly,KY5 9HG,GB,https://www.hardieracepromotions.co.uk/,,live
514,127985,2020-09-06T21:18:42Z,2020-09-06T21:18:42Z,Melville Monument,Melville Monument,42 St Andrew Square,Edinburgh,EH2 2AD,GB,,,live
515,128007,2020-09-08T17:49:40Z,2020-09-08T17:49:40Z,Edinburgh Technopole,Edinburgh Technopole,Milton Bridge,Edinburgh,EH26 0BB,GB,https://edinburghtechnopole.co.uk/,,live
516,128231,2020-09-22T17:58:11Z,2020-09-22T17:58:11Z,Kilduff Farm,Kilduff Farm,Kilduff Farm Drem,North Berwick,EH39 5BD,GB,,,live


In [39]:
df_places.to_pickle(dataframes_final+"/df_places")
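
Schedules refer to places through `place_id`, so a quick consistency check is to look for schedules whose `place_id` has no matching row in the places dataframe. The sketch below uses toy frames; place 383 is deliberately left out of the toy places frame to show a miss, and the real data may well contain it.

```python
import pandas as pd

# Toy stand-ins for df_schedules and df_places.
df_schedules_toy = pd.DataFrame({
    "event_id": [157884, 240818, 1584208],
    "place_id": [383, 1, 127508],
})
df_places_toy = pd.DataFrame({
    "place_id": [1, 371, 127508],
    "name": ["The Stand", "St Bride's Centre", "Lochgelly Raceway"],
})

# Cast both sides to the same dtype before comparing, in case one
# source stores the ids as strings and the other as integers.
sched_ids = df_schedules_toy["place_id"].astype(int)
place_ids = df_places_toy["place_id"].astype(int)

# Schedules pointing at a place that is missing from the places frame.
unmatched = df_schedules_toy[~sched_ids.isin(place_ids)]
```

An empty `unmatched` frame on the real data would mean every schedule can be linked to a place node.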

## 15. PLACES PROPERTIES DATAFRAME

In [40]:
df_place_prop=df_places_total[['place_id','properties']]
df_place_prop=pd.concat([df_place_prop.drop(['properties'], axis=1), df_place_prop['properties'].apply(pd.Series)], axis=1)
df_place_prop=df_place_prop.drop(0, axis=1)
df_place_prop

Unnamed: 0,place_id,place.capacity.max,place.child-friendly,place.child-restrictions,place.facilities.dogs-allowed,place.facilities.free-wifi,place.facilities.guide-dogs,place.facilities.hearing-loop,place.facilities.parking,place.facilities.toilets,place.facilities.toilets.baby-changing,place.facilities.toilets_disabled,place.facilities.wheelchair-access
0,1,160,,True,False,True,,,True,True,,False,False
1,371,,,,,,,,,,,,
2,372,,,,,False,,,False,False,,False,True
3,375,16500.0,,,,,,,,,,,
4,376,35000,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
512,127508,,,,,,,,,,,,
514,127985,,,,,,,,,,,,
515,128007,,,,,,,,,,,,
516,128231,,,,,,,,,,,,


In [41]:

df_place_prop.to_pickle(dataframes_final+"/df_places_properties")

## 16. PLACE PHONE NUMBER

In [42]:
df_places_pn=df_places_total[['place_id','phone_numbers']]
df_places_pn=pd.concat([df_places_pn.drop(['phone_numbers'], axis=1), df_places_pn['phone_numbers'].apply(pd.Series)], axis=1)
df_places_pn=df_places_pn.drop(0, axis=1)
df_places_pn

Unnamed: 0,place_id,box_office,info
0,1,0131 558 7272,0131 558 7272
1,371,,0131 346 1405
2,372,,0131 285 6030
3,375,,0131 661 5351
4,376,0131 335 6200,
...,...,...,...
512,127508,,07584 837 445
514,127985,,
515,128007,,0131 445 8600
516,128231,,


In [43]:
df_places_pn.to_pickle(dataframes_final+"/df_places_pn")

## 17. PLACE LOCATION DATAFRAME

In [44]:
df_places_loc=df_places_total[['place_id','loc']]
df_places_loc=pd.concat([df_places_loc.drop(['loc'], axis=1), df_places_loc['loc'].apply(pd.Series)], axis=1)
df_places_loc=df_places_loc.drop(0, axis=1)
df_places_loc

Unnamed: 0,place_id,latitude,longitude
0,1,55.955806109395006,-3.1923184844646357
1,371,55.94255035,-3.22056693
2,372,55.94930633508542,-3.192111771011355
3,375,55.95640000,-3.15627000
4,376,55.94067800,-3.36880500
...,...,...,...
512,127508,56.12694697199462,-3.281544714233391
514,127985,55.95418700,-3.19310200
515,128007,55.85879500,-3.20775100
516,128231,55.98810000,-2.77096100


In [45]:

df_places_loc.to_pickle(dataframes_final+"/df_places_loc")
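
The first place's `loc` entry shows the latitude as a quoted string. Assuming all coordinates are stored as strings, a later step that needs numbers (distance calculations, for example) could convert them as sketched below. The third toy row, with missing coordinates, is invented.

```python
import pandas as pd

# Toy stand-in for df_places_loc; place_id 0 is an invented row
# without coordinates.
df_loc_toy = pd.DataFrame({
    "place_id": [1, 371, 0],
    "latitude": ["55.955806109395006", "55.94255035", None],
    "longitude": ["-3.1923184844646357", "-3.22056693", None],
})

# errors="coerce" turns anything unparseable into NaN instead of raising.
for col in ["latitude", "longitude"]:
    df_loc_toy[col] = pd.to_numeric(df_loc_toy[col], errors="coerce")
```

Whether to convert before or after saving depends on how the graph-building code expects to receive the values.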

## 18. PLACES DESCRIPTION

In [46]:
df_places_desc=df_places_total[['place_id','descriptions']].explode('descriptions')
df_places_desc=pd.concat([df_places_desc.drop(['descriptions'], axis=1), df_places_desc['descriptions'].apply(pd.Series)], axis=1)
df_places_desc=df_places_desc.drop(0, axis=1)
df_places_desc

Unnamed: 0,place_id,description,type
0,1,Cheerful cavern with all the ingredients requi...,description.list.default
1,371,The St Brides Community Centre is a former chu...,description.list.default
2,372,The Institut Francais d'Ecosse in Edinburgh's ...,description.list.default
3,375,,
4,376,"A popular large-scale events venue, the Royal ...",description.list.default
...,...,...,...
512,127508,,
514,127985,,
515,128007,,
516,128231,,


In [47]:
# Comment out the next line if you do not want to save df_places_desc to file
df_places_desc.to_pickle(dataframes_final+"/df_places_description")

## 19. PLACES TAGS

In [48]:
df_places_tags=df_places_total[['place_id','tags']].explode('tags')
df_places_tags

Unnamed: 0,place_id,tags
0,1,Bar & pub food
0,1,Comedy
0,1,Restaurants
0,1,Venues
1,371,Cinemas
...,...,...
515,128007,Business centre
516,128231,Farm
516,128231,Outdoors
517,128392,Pubs & bars


In [49]:
df_places_tags.to_pickle(dataframes_final+"/df_places_tags")