# Illegal STL Detection and Reporting

This notebook shows the process used to find illegal short-term lets (STLs) in Galway City. Additional information is available for Galway County, Conamara, and the rest of Ireland.

Note: Airbnb blocks web scrapers from searching its site, so we rely on [Inside Airbnb](https://insideairbnb.com/get-the-data)'s data for Ireland. We can run Scrapy on specific listing URLs, but most of the information we want is already provided in Inside Airbnb's listings.csv (which just needs to be unzipped from listings.csv.gz).

Questions
- We were focusing on listings for entire homes, but what about private or shared rooms in guest houses and similar properties, where the owner is letting all the individual rooms of an entire property?
- I haven't been able to find the 81 approved STL planning permissions. Where did we get that information, and can we get the list of permission reference IDs?
- Any proposals to get around "Exact location provided after booking"? (I'm wondering whether looking at planning permissions for guest houses and similar properties would let us figure out if planning permission was obtained for the development but not for STL use.)

In [56]:
import os
import pandas as pd
cwd = os.getcwd()
input_dir = cwd+"/inputs"
output_dir = cwd+"/outputs"

## 1. Getting Data

### Airbnb
We can download listings.csv.gz from [Inside Airbnb](https://insideairbnb.com/get-the-data), which includes over 80 fields of information.
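
The archive does not strictly need unpacking first: pandas' `read_csv` infers gzip compression from the `.gz` extension, and the standard library can peek at the header without loading the whole file. The sketch below shows the latter; the helper name `gz_csv_columns` is made up for illustration and the file location is an assumption.

```python
import csv
import gzip


def gz_csv_columns(path):
    """Return the header row of a gzip-compressed CSV without unpacking it to disk."""
    # "rt" opens the compressed stream as text; newline="" is what the csv module expects
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        return next(csv.reader(handle))
```

For example, `gz_csv_columns(input_dir + "/listings.csv.gz")` would list the available fields, and `pd.read_csv(input_dir + "/listings.csv.gz")` should then load the same data directly.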

In [57]:
# we load the data into a pandas data frame, and print the list of columns
# we won't be interested in all of the fields right now, but there is a lot to explore
df = pd.read_csv(input_dir+"/listings.csv")
df.columns

Index(['id', 'listing_url', 'scrape_id', 'last_searched', 'last_scraped',
       'source', 'name', 'description', 'neighborhood_overview', 'picture_url',
       'host_id', 'host_url', 'host_name', 'host_since', 'host_location',
       'host_about', 'host_response_time', 'host_response_rate',
       'host_acceptance_rate', 'host_is_superhost', 'host_thumbnail_url',
       'host_picture_url', 'host_neighbourhood', 'host_listings_count',
       'host_total_listings_count', 'host_verifications',
       'host_has_profile_pic', 'host_identity_verified', 'neighbourhood',
       'latitude', 'longitude', 'property_type', 'room_type', 'accommodates',
       'bathrooms', 'bathrooms_text', 'bedrooms', 'beds', 'amenities', 'price',
       'minimum_nights', 'maximum_nights', 'minimum_minimum_nights',
       'maximum_minimum_nights', 'minimum_maximum_nights',
       'maximum_maximum_nights', 'minimum_nights_avg_ntm',
       'maximum_nights_avg_ntm', 'calendar_updated', 'has_availability',
       'ava

In [58]:
# now we will filter our data frame (which covers all of Ireland) to focus on Galway
# we can do this by filtering by the field 'region_parent_name' to get all of Galway County's listings
galway_county_df = df[df['region_parent_name'].str.contains("Galway")] #3123 listings
# by the field 'region_name' for Conamara
conamara_df = df[df['region_name'].str.contains("Conamara")] #1228 listings
# or by the field 'region_name' for Galway City, which we'll focus on
galway_city_df = df[df['region_name'].str.contains("Galway")] #1119 listings

In [4]:
# now we'll look at all of the property types contained in our dataset
# and decide which ones we want to filter for STLs
set(list(galway_city_df["property_type"]))

{'Boat',
 'Castle',
 'Entire bungalow',
 'Entire cabin',
 'Entire condo',
 'Entire cottage',
 'Entire guest suite',
 'Entire guesthouse',
 'Entire home',
 'Entire loft',
 'Entire place',
 'Entire rental unit',
 'Entire serviced apartment',
 'Entire townhouse',
 'Entire villa',
 'Houseboat',
 'Private room',
 'Private room in bed and breakfast',
 'Private room in bungalow',
 'Private room in condo',
 'Private room in guest suite',
 'Private room in guesthouse',
 'Private room in home',
 'Private room in hostel',
 'Private room in rental unit',
 'Private room in serviced apartment',
 'Private room in tiny home',
 'Private room in townhouse',
 'Room in aparthotel',
 'Room in bed and breakfast',
 'Room in boutique hotel',
 'Room in hostel',
 'Room in hotel',
 'Shared room in hostel',
 'Shared room in hotel',
 'Tiny home'}

In [5]:
# I decided to go with anything that includes "Entire", as well as "Tiny home",
# "Private room in guest suite", or "Private room in guesthouse"
# But we can absolutely add or change these selections if necessary
gal_entire = galway_city_df[galway_city_df["property_type"].str.contains("Entire")]
gal_othertypes = galway_city_df[galway_city_df["property_type"].isin(["Tiny home", "Private room in guest suite", "Private room in guesthouse"])]
gal_df = pd.concat([gal_entire, gal_othertypes]) # 742 listings
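
The selection rule above ("Entire" anywhere in the type, plus a short list of extra types) can also be written as a single predicate, which makes it easier to reuse for another region with a different list. This is only a sketch: `is_candidate_stl` and `EXTRA_STL_TYPES` are hypothetical names, and the extra types simply mirror the choices made in the cell above.

```python
# Hypothetical helper; the extra types mirror the selections made in the cell above
EXTRA_STL_TYPES = {
    "Tiny home",
    "Private room in guest suite",
    "Private room in guesthouse",
}


def is_candidate_stl(property_type, extra_types=EXTRA_STL_TYPES):
    """True for any 'Entire ...' type or one of the explicitly listed extra types."""
    if not isinstance(property_type, str):
        return False  # missing values (NaN) are never candidates
    return "Entire" in property_type or property_type in extra_types
```

Applied as `galway_city_df[galway_city_df["property_type"].map(is_candidate_stl)]`, this should select the same listings as the two-step filter and `pd.concat`, but in their original row order.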

Next, we drop the columns we aren't interested in this time around, to keep the table easier to read.

In [6]:
desired_columns = ['id', 'listing_url', 'scrape_id', 'last_searched', 'last_scraped',
       'source', 'name', 'description', 'host_id', 'host_url', 'host_name', 'host_since', 'host_location',
       'host_neighbourhood', 'host_listings_count','host_total_listings_count', 'neighbourhood',
       'latitude', 'longitude', 'property_type', 'room_type', 'accommodates',
       'bathrooms', 'bedrooms', 'beds', 'price', 'estimated_occupancy_l365d',
       'estimated_revenue_l365d','calculated_host_listings_count',
       'calculated_host_listings_count_entire_homes',
       'calculated_host_listings_count_private_rooms',
       'calculated_host_listings_count_shared_rooms', 'region_id',
       'region_name', 'region_parent_id', 'region_parent_name',
       'region_parent_parent_id', 'region_parent_parent_name'
       ]
# note: this rebinds df, which until now held the listings for all of Ireland
df = gal_df.filter(desired_columns, axis=1)

Now we have our final Airbnb dataset, filtered to focus on Galway City and potential STL property types. We can save it as a CSV or Excel file now; in the next section we will work on making a map of this data.

In [None]:
#df.to_csv(output_dir+"/airbnb_filtered_260925.csv")
df.to_excel(output_dir+'/airbnb_filtered_260925.xlsx')

OK, now let's do the same for Conamara.

In [60]:
set(list(conamara_df["property_type"]))

{'Barn',
 'Camper/RV',
 'Campsite',
 'Entire bungalow',
 'Entire cabin',
 'Entire chalet',
 'Entire condo',
 'Entire cottage',
 'Entire guest suite',
 'Entire guesthouse',
 'Entire home',
 'Entire loft',
 'Entire place',
 'Entire rental unit',
 'Entire townhouse',
 'Entire vacation home',
 'Entire villa',
 'Farm stay',
 'Hut',
 'Private room',
 'Private room in bed and breakfast',
 'Private room in bungalow',
 'Private room in casa particular',
 'Private room in castle',
 'Private room in condo',
 'Private room in cottage',
 'Private room in farm stay',
 'Private room in guest suite',
 'Private room in guesthouse',
 'Private room in home',
 'Private room in hostel',
 'Private room in hut',
 'Private room in loft',
 'Private room in nature lodge',
 'Private room in rental unit',
 'Private room in tent',
 'Private room in townhouse',
 'Room in aparthotel',
 'Room in bed and breakfast',
 'Room in boutique hotel',
 'Room in hotel',
 'Shared room in guesthouse',
 'Shared room in hostel',
 '

In [62]:
# I decided to go with anything that includes "Entire", as well as "Tiny home",
# "Shipping container", "Shepherd’s hut", "Hut",
# "Private room in guest suite", or "Private room in guesthouse"
con_entire = conamara_df[conamara_df["property_type"].str.contains("Entire")]
con_othertypes = conamara_df[conamara_df["property_type"].isin(["Tiny home", "Shipping container", "Shepherd’s hut", "Hut", "Private room in guest suite", "Private room in guesthouse"])]
con_df = pd.concat([con_entire, con_othertypes]) # 930 listings
con_df_flt = con_df.filter(desired_columns, axis=1)
#con_df_flt.to_csv(output_dir+"/airbnb_connemara_270925.csv")
con_df_flt.to_excel(output_dir+'/airbnb_connemara_270925.xlsx')

### Booking.com
On the one hand, it's nice that we can use Scrapy to crawl Booking.com for listings; on the other hand, it means we need to do a bit more work to get the information.

In [None]:
# will include information about the structure of the Scrapy spider and how to run it

## 2. Geospatial Data
To automatically evaluate whether a listing has corresponding planning permission, we're going to do some geospatial calculations. First, we download the PACE_Planning_Sites_With_Info shapefile from the [City Council Planning Map on ArcGIS](https://experience.arcgis.com/experience/4878ca4a845945db8b3c1af302acbebf). Next, we convert our tables of listings into what's called a point shapefile. Finally, we check whether the two overlap.

It may also be helpful to view the files on free GIS software like QGIS.
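
The overlap check comes down to two steps: put both layers in the same coordinate reference system, then ask whether each listing point falls inside a planning polygon. In geopandas this would normally be `to_crs` followed by `sjoin`; the sketch below spells out the same idea with the standard library only. It assumes the planning polygons are stored in Web Mercator metres (EPSG:3857), which should be confirmed from the layer's `.crs`. The function names and the sample square are illustrative, not real data.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by Web Mercator (EPSG:3857)


def lonlat_to_web_mercator(lon, lat):
    """Project WGS84 degrees (EPSG:4326) to Web Mercator metres (EPSG:3857)."""
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def point_in_polygon(x, y, ring):
    """Ray casting: count polygon edges crossed by a ray going right from (x, y)."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through the point
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Illustrative data: a made-up 200 m square around a made-up listing location
px, py = lonlat_to_web_mercator(-9.05, 53.27)
site = [(px - 100, py - 100), (px + 100, py - 100),
        (px + 100, py + 100), (px - 100, py + 100)]
print(point_in_polygon(px, py, site))        # listing at the centre of the site
print(point_in_polygon(px + 500, py, site))  # listing 500 m further east
```

Skipping the reprojection step is the usual reason a spatial join returns no matches, since degrees and metres never overlap.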

### Expedia?

### Listing Table to Shapefile

In [14]:
import geopandas as gp
gdf = gp.GeoDataFrame(
    df, geometry=gp.points_from_xy(df.longitude, df.latitude, crs="EPSG:4326"))

In [63]:
cnm_gdf = gp.GeoDataFrame(
    con_df_flt, geometry=gp.points_from_xy(con_df_flt.longitude, con_df_flt.latitude, crs="EPSG:4326")
)

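Shapefile field names are limited to 10 characters, so longer column names get truncated when the file is written. It is cleaner to rename the columns ourselves first.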

In [15]:
mapper = {
    'listing_url':'list_url', 
    'last_searched':"srch_date", 
    'last_scraped':"scrpe_date",
    'description':"descrpt", 
    'host_location':"host_loc",
    'host_neighbourhood':"host_nbhd", 
    'host_listings_count':"hst_lcount",
    'host_total_listings_count':"hst_t_lcnt", 
    'neighbourhood':"nbhd",
    'property_type':"prop_type", 
    'accommodates':"max_guests",
    'estimated_occupancy_l365d':"est_occ_yr",
    'estimated_revenue_l365d': "est_rev_yr",
    'calculated_host_listings_count':"htlc",
    'calculated_host_listings_count_entire_homes':"htlc_eh",
    'calculated_host_listings_count_private_rooms':"htlc_pr",
    'calculated_host_listings_count_shared_rooms':"htlc_sr",
    'region_name':"reg_name", 
    'region_parent_id':"reg_pid", 
    'region_parent_name':"reg_pname",
    'region_parent_parent_id':"reg_ppid", 
    'region_parent_parent_name':"reg_ppname"
}
gdf.rename(mapper, axis=1, inplace=True)
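
Because shapefile attribute names are capped at 10 characters, long column names can end up colliding once truncated. A small check like the one below shows which names in a column list would be affected. It assumes plain truncation to the first 10 characters; the actual writer may resolve duplicates differently, and `truncation_report` is a hypothetical helper name.

```python
from collections import defaultdict

SHAPEFILE_FIELD_LIMIT = 10  # maximum field-name length in the shapefile (DBF) format


def truncation_report(columns, limit=SHAPEFILE_FIELD_LIMIT):
    """Return (too_long, collisions) for a list of column names.

    too_long   -- names longer than `limit` characters
    collisions -- truncated name -> original names that share it
    """
    too_long = [name for name in columns if len(name) > limit]
    by_prefix = defaultdict(list)
    for name in columns:
        by_prefix[name[:limit]].append(name)
    collisions = {short: names for short, names in by_prefix.items() if len(names) > 1}
    return too_long, collisions
```

Running `truncation_report(list(gdf.columns))` after the rename should come back with nothing too long if the mapper covers every long attribute name (the `geometry` column is not written as an attribute field, so it can be ignored).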

In [12]:
gdf.to_file(output_dir+"/shapefiles/airbnbs.shp")


In [64]:
cnm_gdf.rename(mapper, axis=1, inplace=True)
cnm_gdf.to_file(output_dir+"/shapefiles/cnmra_airbnbs.shp")


### Galway City Council Planning Permission Map

We gather our data from the [ArcGIS Experience Map Site](https://experience.arcgis.com/experience/4878ca4a845945db8b3c1af302acbebf), downloading the shapefile of Planning Applications (Last 10 Years).

In [16]:
gcc_planmap_orig_addr = "C:\\Users\\Ales\\Documents\\galway planning permission map\\PACE_Planning_Sites_With_Info_-8595002616335958008"
gp_og = gp.read_file(gcc_planmap_orig_addr)


contains polygon(s) with rings with invalid winding order
shapefile should be corrected using ogr2ogr

In [None]:
gp_og  # 18822 rows

Unnamed: 0,ReferenceN,Applicatio,Applicati,DateReceiv,DecisionDu,YearReceiv,EPlanInfo,IDCount,Developm00,Developmen,MergeKey,geometry
0,00557,APPLICATION FINALISED,PERMISSION,25/07/2000,24/09/2000,2000,https://www.eplanning.ie/GalwayCity/AppFileRef...,1000,Permission to extend residence,"1, Claremont Park, Circul ar Road, Galway.",00557,"POLYGON ((-1011338.509 7034375.363, -1011338.5..."
1,00558,APPLICATION FINALISED,RETENTION,25/07/2000,24/09/2000,2000,https://www.eplanning.ie/GalwayCity/AppFileRef...,1001,Permission for retention of 1 no. fascia sign ...,"Aldi Stores, Westside Ret ail Park, Galway.",00558,"POLYGON ((-1009895.973 7034369.625, -1009896.0..."
2,00559,APPLICATION FINALISED,PERMISSION,25/07/2000,07/11/2000,2000,https://www.eplanning.ie/GalwayCity/AppFileRef...,1002,Permission to 1. Demolish existing dwellinghou...,"Ballinfoile, Galway.",00559,"POLYGON ((-1005946.79 7037466.046, -1005946.84..."
3,0056,APPLICATION FINALISED,OUTLINE PERMISSION,10/02/2000,09/04/2000,2000,https://www.eplanning.ie/GalwayCity/AppFileRef...,1003,"Outline permission for dwellinghouse, septic t...","Ballagh, Galway.",0056,"POLYGON ((-1012858.398 7038386.894, -1012892.2..."
4,00560,APPLICATION FINALISED,PERMISSION,26/07/2000,25/09/2000,2000,https://www.eplanning.ie/GalwayCity/AppFileRef...,1004,Permission for the erection of a single storey...,University College Hospit al Galway.,00560,"POLYGON ((-1009145.19 7034206.53, -1009145.248..."
...,...,...,...,...,...,...,...,...,...,...,...,...
18817,2560272,NEW APPLICATION,RETENTION,01/09/2025,26/10/2025,2025,https://www.eplanning.ie/GalwayCity/AppFileRef...,19817,Permission for development which consists of: ...,Galway Harbour Enterprise Park New Docks Galwa...,2560272,"POLYGON ((-1006522.152 7033294.362, -1006522.5..."
18818,2560275,NEW APPLICATION,PERMISSION,03/09/2025,28/10/2025,2025,https://www.eplanning.ie/GalwayCity/AppFileRef...,19818,Permission for development which consists of: ...,No 15 Gleann Na Coille Barna Road Galway H91FY5V,2560275,"POLYGON ((-1016247.689 7031434.843, -1016209.8..."
18819,2560257,NEW APPLICATION,RETENTION,21/08/2025,15/10/2025,2025,https://www.eplanning.ie/GalwayCity/AppFileRef...,19819,Permission for development which consists of: ...,35 Árd Na Mara Salthill Galway H91 HPK8,2560257,"POLYGON ((-1010945.651 7031687.175, -1010945.4..."
18820,2560091,APPLICATION FINALISED,PERMISSION,27/03/2025,21/05/2025,2025,https://www.eplanning.ie/GalwayCity/AppFileRef...,19820,Permission for development which consists of p...,"Coláiste Éinde, Threadneedle Road, Salthill, G...",2560091,"POLYGON ((-1012128.906 7031764.292, -1011976.7..."


In [None]:
# planning applications whose description mentions "short term let" (case-sensitive match)
pp_stl = gp_og[gp_og["Developm00"].str.contains("short term let")]
# other spellings tried, each returning 0 rows:
#pp_s_t = gp_og[gp_og["Developm00"].str.contains("short-term")] #0
#pp_st2 = gp_og[gp_og["Developm00"].str.contains("Short Term")] #0
#pp_st3 = gp_og[gp_og["Developm00"].str.contains("Short-term")] #0
#pp_stl = gp_og[gp_og["Developm00"].str.contains("STL")] #0
# "change of use" applications
pp_cou = gp_og[gp_og["Developm00"].str.contains("change of use")] # 1461, many irrelevant
len(pp_stl), len(pp_cou)

(17, 1461)

This is far fewer than the 81 mentioned [in this article](https://catuireland.org/airbnb/2025/04/30/how-to-report-illegal-short-term-lets/). Was that number about Galway County? Where did we get that number? *Can we have the permission reference numbers?*

When searching the [Galway City Planning site](https://www.eplanning.ie/GalwayCity/searchresults) for "short term let", there are only 22 applications, some refused or invalid.
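
Since the search above is case-sensitive and tries one spelling at a time, a single case-insensitive pattern may be a more forgiving way to catch variants such as "Short-term let" or "short term letting". The sketch below is one possible pattern, not a validated one; `mentions_stl` is a hypothetical helper, and the counts it produces on the real data may differ from those above.

```python
import re

# "short term let", "short-term let", "Short Term Letting", "shortterm let", ...
STL_PATTERN = re.compile(r"short[\s-]?term\s+let", re.IGNORECASE)


def mentions_stl(description):
    """True if a development description mentions short-term letting."""
    if not isinstance(description, str):
        return False  # missing descriptions never match
    return STL_PATTERN.search(description) is not None
```

The pandas equivalent would be along the lines of `gp_og["Developm00"].str.contains(r"short[\s-]?term\s+let", case=False, na=False)`.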

In [None]:
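from collections import Counter

# for each "change of use" description, keep the text from the last "to" onward,
# which is usually the proposed new use ("... from X to Y")
# caveat: rfind also matches "to" inside words such as "storey"
after_cou = []
for description in pp_cou["Developm00"]:
    idx = description.rfind("to")
    # keep the whole description when there is no "to" at all
    after_cou.append(description[idx:] if idx != -1 else description)
options_c = Counter(after_cou)
options_c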
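
Taking everything after the last "to" can pick up fragments of unrelated words, so a pattern anchored on "change of use" may give cleaner tallies. The sketch below assumes descriptions loosely follow the shape "change of use from X to Y"; the helper names are hypothetical and the pattern will miss other phrasings.

```python
import re
from collections import Counter

# Capture what follows the first standalone "to" after "change of use"
COU_PATTERN = re.compile(
    r"change of use\b.*?\bto\s+(?P<new_use>.+)",
    re.IGNORECASE | re.DOTALL,
)


def new_use_after_change(description):
    """Return the proposed new use (lower-cased), or None if the pattern is absent."""
    if not isinstance(description, str):
        return None
    match = COU_PATTERN.search(description)
    if match is None:
        return None
    return match.group("new_use").strip().rstrip(".").lower()


def tally_new_uses(descriptions):
    """Count proposed new uses across an iterable of descriptions."""
    uses = (new_use_after_change(text) for text in descriptions)
    return Counter(use for use in uses if use is not None)
```

Something like `tally_new_uses(pp_cou["Developm00"])` would then stand in for the loop above.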

In [None]:
### VALID we tolerate for now
#'to short term let for a period not exceeding 90 days per calendar year'
### RED FLAG if nearby listings are "Exact location provided after booking"
#'bedsit'
#'granny flat'
#'apartment'
#'guesthouse'
#'guest house'
#'self-contained apartment'
### YES we're looking at 
#'residential apartment'
#'student accommodation'
#'living accommodation'
#'guest bedroom'
#'guest room'
#'bedroom'
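
The keyword notes above could be turned into a rough classifier for proposed uses. This is a sketch only: the bucket names are taken from the notes, the matching is plain substring search, and `classify_new_use` is a hypothetical helper. Longer keywords are checked first so that "residential apartment" is not swallowed by "apartment".

```python
# Keyword lists copied from the notes above; bucket names follow those notes
BUCKETS = {
    "VALID": ["short term let for a period not exceeding 90 days per calendar year"],
    "RED FLAG": ["bedsit", "granny flat", "apartment", "guesthouse",
                 "guest house", "self-contained apartment"],
    "YES": ["residential apartment", "student accommodation", "living accommodation",
            "guest bedroom", "guest room", "bedroom"],
}

# Longest keywords first, so the most specific phrase decides the bucket
_KEYWORDS = sorted(
    ((keyword, bucket) for bucket, keywords in BUCKETS.items() for keyword in keywords),
    key=lambda pair: len(pair[0]),
    reverse=True,
)


def classify_new_use(text):
    """Return the bucket of the first (longest) keyword found in text, else None."""
    lowered = text.lower()
    for keyword, bucket in _KEYWORDS:
        if keyword in lowered:
            return bucket
    return None
```

Descriptions that return `None` would still need a manual look, since the keyword lists are far from exhaustive.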

In [65]:
gael_file = "C:\\Users\\Ales\\Documents\\galway planning permission map\\gaeltacht"
gael_gdf = gp.read_file(gael_file)
gael_df = pd.DataFrame(gael_gdf.drop(columns='geometry'))
gael_df.to_excel(output_dir+'/airbnb_gaeltct_270925.xlsx')

Next, examine both this layer and the county council layers in QGIS.