# A spatial study of InPost's deployment strategy of parcel pick-up points in relation to competitors 

**Authors**: Michał Woźniak (385190) & Michał Wrzesiński (385197)

# 1. Introduction

The e-commerce market has been growing rapidly over the past few years, and the Covid-19 pandemic has amplified this trend by permanently changing consumer habits. Poland is currently one of the fastest-growing e-commerce markets in Europe. Companies such as Allegro, OLX, Amazon and AliExpress handle, on average, several million baskets per day. [PwC](https://www.pwc.pl/pl/media/2021-02-09-analiza-pwc-prognoza-rozwoju-rynku-ecommerce-w-polsce.html) estimated that by 2026 the gross value of the Polish e-commerce market will reach 162 billion PLN.

Handling such a volume of parcels requires well-functioning logistics companies that can deliver an order shortly after it is placed. Several international logistics companies operate in Poland (e.g. DHL, DPD), and Poczta Polska is an important local player; however, InPost enjoys the greatest recognition among customers. Its success has two sources: firstly, InPost has developed a network of parcel machines (Paczkomat) that has revolutionized the way we receive parcels; secondly, InPost has streamlined its logistics operations and is therefore able to deliver parcels in a very short time.

One of the key elements of InPost's successful logistics is an appropriate deployment strategy for parcel pick-up points, since the location of a point determines how convenient the InPost service is for the end user. We suppose that the models used to choose the location of a Paczkomat are not ideal, because they do not naturally account for constraints such as the inability to lease a given location. As a result, the departments responsible for expanding the pick-up point network may often place a parcel machine intuitively (for instance, close to a shopping mall) rather than on a statistical basis; presumably they have their own targets tied to bonuses and are less concerned with evaluating their choices in the short or medium term. Moreover, a growing number of InPost's competitors, such as Ruch or even Poczta Polska, have recently adopted the same model of delivery to parcel machines, and it seems plausible that they copy InPost's deployment choices. It is therefore interesting to analyze how InPost's pick-up point deployment compares to that of its competitors. It is also worth verifying whether the deployment determinants of InPost's Paczkomats are consistent with the literature. Here we refer to [Morganti et al. (2014)](https://www.sciencedirect.com/science/article/pii/S2210539514000078), who suggest the following groups of pick-up point deployment indicators:

* demographic indicators: population density, employment rate, computer ownership, Internet access;
* centers and nodes for city users: "parameters related to end-consumers' mobility and accessibility to socio-economic activities, in particular end-consumers' use of both public transport and private cars, and the density of retail outlets and commercial services, business and employment sites, cultural and leisure centers and public transportation nodes";
* parcel flows within the network, mostly related to the transport system and user preferences.

The purpose of this project is to examine the deployment strategy of InPost parcel pick-up points compared to the competition in a multivariate setting. Furthermore, we want to test whether the deployment determinants (hereafter referred to as control variables) indicated by the literature and by our modeling intuition are relevant to InPost's strategy.

We state the following research questions:

1. Does InPost have parcel pick-up points deployed in line with the competition?
2. Do the control variables affect the number of InPost pick-up points in a given area?
3. Are spatial effects significant in multivariate econometric models?
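Research question 3 asks whether spatial effects matter. A common first check is global Moran's I computed on grid-cell counts, I = (n / S0) · Σᵢⱼ wᵢⱼ zᵢ zⱼ / Σᵢ zᵢ², where zᵢ are deviations from the mean and S0 is the sum of all weights. The sketch below is a plain NumPy illustration, assuming binary rook-contiguity weights on a regular grid and a made-up toy grid; `morans_i` and `rook_weights` are hypothetical helpers, not code from this project. Dedicated libraries (e.g. PySAL's `esda`) additionally provide permutation-based significance tests.

```python
import numpy as np

def morans_i(values, weights) -> float:
    """Global Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2.

    `weights` is an n x n spatial weights matrix with a zero diagonal.
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()              # deviations from the mean
    s0 = w.sum()                  # S0: sum of all weights
    return (x.size / s0) * (z @ w @ z) / (z @ z)

def rook_weights(n_rows: int, n_cols: int) -> np.ndarray:
    """Binary rook-contiguity weights for a regular grid, cells in row-major order."""
    n = n_rows * n_cols
    w = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            # Neighbours share an edge: up, down, left, right
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    w[r * n_cols + c, rr * n_cols + cc] = 1.0
    return w

# Made-up 4 x 4 grid of pick-up point counts: high counts cluster on the left
grid = np.array([[5, 4, 0, 0],
                 [6, 5, 1, 0],
                 [4, 5, 0, 1],
                 [5, 6, 1, 0]])
w = rook_weights(*grid.shape)
print(morans_i(grid.ravel(), w))  # positive: similar counts sit next to each other
```

Values close to +1 indicate clustering of similar counts, values close to -1 a checkerboard-like pattern, and values near the expectation -1/(n-1) suggest no spatial autocorrelation.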

To answer these research questions, we conducted a full econometric analysis based primarily on spatial statistics and spatial models. To reduce the computational complexity of the task, we focused on only two cities in Poland: Warsaw and Krakow. These are the two key cities from InPost's point of view: Warsaw is the capital of Poland, and Krakow is where InPost was established. Depending on the modeling approach, we used different levels of data aggregation: a 1 km × 1 km grid for the spatial dependence models and point data for the spatial drift models. We test both approaches because spatial autocorrelation and spatial drift both seem plausible in this problem.
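The 1 km × 1 km aggregation used for the spatial dependence models can be sketched as snapping each point to the cell that contains it and counting points per operator. This is a minimal pandas sketch, assuming the points have already been projected to a metric CRS (coordinates in metres in hypothetical columns `x` and `y`) and that the grid is aligned to multiples of 1 km; both assumptions should be checked against the actual Inspire grid files. `count_points_per_cell` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

CELL_SIZE_M = 1_000  # 1 km x 1 km cells

def count_points_per_cell(points: pd.DataFrame, cell_size: float = CELL_SIZE_M) -> pd.DataFrame:
    """Count pick-up points per operator in square grid cells.

    Assumes projected coordinates in metres in columns 'x' and 'y'
    and the operator name in column 'operator'.
    """
    cells = points.assign(
        # Integer cell indices: the cell containing each point
        cell_x=np.floor(points["x"] / cell_size).astype(int),
        cell_y=np.floor(points["y"] / cell_size).astype(int),
    )
    return (
        cells.groupby(["cell_x", "cell_y", "operator"])
        .size()
        .unstack("operator", fill_value=0)  # one column per operator
        .reset_index()
    )

# Made-up example points
pts = pd.DataFrame({
    "x": [500, 900, 1500, 1200, 300],
    "y": [200, 800, 100, 400, 1700],
    "operator": ["INPOST", "RUCH", "INPOST", "INPOST", "POCZTA"],
})
print(count_points_per_cell(pts))
```

Cells without any pick-up point do not appear in the output, so the counts would still need to be joined onto the full Inspire grid (filling missing values with 0) before modeling.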

We present the following table of contents for this research:

1. Introduction
2. Data gathering
3. Dataset construction
4. Spatial visualizations
5. Spatial statistical explanatory analysis
6. Non-spatial statistical explanatory analysis
7. Spatial dependence models
8. Spatial drift models
9. Conclusions

# 2. Data gathering

This part is devoted to the data collection process. Its output is raw data, which is transformed into the final dataset in Section 3 (Dataset construction).

We divided the data into four categories:

* pick-up points data
* spatial shapes data
* demographic data
* points of interest data

Pick-up point data come from the APIs of two websites: [Bliskapaczka.pl](https://bliskapaczka.pl) and [DHL](https://www.dhl.com/pl-pl/home.html?locale=true).

Spatial shape data (administrative boundaries) come from [GUGIK](https://gis-support.pl/baza-wiedzy-2/dane-do-pobrania/granice-administracyjne/) (Head Office of Geodesy and Cartography in Poland).

Demographic data are taken from the [Inspire repository](https://geo.stat.gov.pl/inspire) and contain indicators for a 1 km² grid covering Poland.

Finally, we obtained points of interest data from [OSM](https://download.geofabrik.de/europe/poland.html) snapshots hosted by Geofabrik, a site that provides regional extracts of OpenStreetMap as shapefiles.

## Import dependencies

In [None]:
from google_drive_downloader import GoogleDriveDownloader
import requests
import json
import numpy as np
import pandas as pd

%config Completer.use_jedi = False

## Utilities

We define a few utilities that make the data download reproducible.

In [None]:
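def download_gd_data_from_dict(dictionary: dict):
    '''
    Download, save and unzip data from Google Drive.
    Keys are dataset names (used for folder and file names), values are Google Drive file IDs.
    '''
    for name, file_id in dictionary.items():
        # overwrite=False skips files that are already on disk
        GoogleDriveDownloader.download_file_from_google_drive(file_id=file_id,
                                            dest_path=f"../datasets/raw_data/{name}/{name}.zip",
                                            unzip=True,
                                            showsize=True,
                                            overwrite=False)

def download_json_data_from_url(name: str, url: str):
    '''
    Download JSON from an API endpoint and save it as a CSV file
    '''
    response = requests.get(url, timeout=60)
    response.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
    df = pd.DataFrame(json.loads(response.text))
    df.to_csv(f"../datasets/raw_data/{name}.csv")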

## Download data collected and stored on our Google Drive

We downloaded the data from GUGIK, OSM and Inspire (using the links given at the beginning of this section) and stored them on our academic Google Drive so that the study can be reproduced at any time. With the utilities above, we only need to pass each file's Google Drive ID to download, store and unzip the data. These data are not stored in our remote git repository because of their size, but thanks to Google Drive they are available to anyone.

You can also inspect a file in the browser by appending its file_id to https://drive.google.com/file/d/, for instance: https://drive.google.com/file/d/1BZCmADIZhJuf1_Jh-f6D8vSpI8p5-2wd
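The URL pattern described above can be wrapped in a tiny helper. `drive_view_url` and `drive_view_urls` are hypothetical names introduced here for illustration; they only concatenate strings and do not contact Google Drive.

```python
DRIVE_VIEW_PREFIX = "https://drive.google.com/file/d/"

def drive_view_url(file_id: str) -> str:
    """Browser URL of a Google Drive file, built from its file_id."""
    return DRIVE_VIEW_PREFIX + file_id

def drive_view_urls(dictionary: dict) -> dict:
    """Map each dataset name in a {name: file_id} dictionary to its browser URL."""
    return {name: drive_view_url(file_id) for name, file_id in dictionary.items()}

print(drive_view_url("1BZCmADIZhJuf1_Jh-f6D8vSpI8p5-2wd"))
# https://drive.google.com/file/d/1BZCmADIZhJuf1_Jh-f6D8vSpI8p5-2wd
```

For example, `drive_view_urls(gugik)` would list browser links for all GUGIK files once the GUGIK cell has run.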

### Source: GUGIK

In [None]:
gugik = {'guigk_voi':'1BZCmADIZhJuf1_Jh-f6D8vSpI8p5-2wd',
        'guigk_pov':'1wX99dmNUbiEKYKh-qAfxDipT9oC6DLzE',
        'guigk_com':'1URjb9NM6Fm_qES5kC4QPPXZGERzarUIa'}

download_gd_data_from_dict(gugik)

### Source: OSM

In [None]:
osm = {'osm_mazowieckie':'195E_n9JlgavFWp4mbaOCHAYKFWziBkc0',
        'osm_wielkopolskie':'1oik_ia4hFeG0zYPjzswCohlrzy70r3Yy',
        'osm_malopolskie':'1KG6uPhCZ-jKDgEpBU46WKHXVG_Mc-dBS'}

download_gd_data_from_dict(osm)

### Source: Inspire

In [None]:
inspire = {"inspire":"1avnBMziIn9uLetSbucMrZlZadhnSUvPE"}
download_gd_data_from_dict(inspire)
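After the downloads, a quick check that every expected dataset folder is present and non-empty helps catch a failed or skipped download. `missing_datasets` is a hypothetical helper; it assumes the `../datasets/raw_data/<name>/` layout produced by `download_gd_data_from_dict` and only checks that files exist, not that they are complete.

```python
from pathlib import Path

# Same root folder that download_gd_data_from_dict writes to
RAW_DATA_DIR = Path("../datasets/raw_data")

def missing_datasets(dictionary: dict, root: Path = RAW_DATA_DIR) -> list:
    """Return the dataset names whose folder under `root` is absent or empty."""
    missing = []
    for name in dictionary:  # keys are dataset (and folder) names
        folder = Path(root) / name
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(name)
    return missing
```

For instance, `missing_datasets({**gugik, **osm, **inspire})` should return an empty list when all folders have been populated.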

## Scrape pick-up points data from websites

We collect pick-up point data from two websites, or more precisely from their APIs, which lets us gather the data in seconds.

### Source: Bliskapaczka.pl

In [None]:
url = 'https://pos.bliskapaczka.pl/api/v1/pos?fields=operator%2Ccode%2Clatitude%2Clongitude%2Cbrand%2CbrandPretty%2CoperatorPretty%2Ccod%2Cavailable%2C+city%2C+street&operators=RUCH%2CINPOST%2CPOCZTA%2CDPD%2CUPS%2CFEDEX'
download_json_data_from_url("bliska_paczka", url)
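The long query string above is easier to read and modify when built from its parameters. This sketch uses only `urllib.parse.urlencode` from the standard library and reproduces the original URL exactly. Note that the `city` and `street` fields carry a leading space (encoded as `+` in the original URL); we have not verified whether the API requires it. `BLISKAPACZKA_POS_API`, `fields` and `operators` are names introduced here for illustration.

```python
from urllib.parse import urlencode

BLISKAPACZKA_POS_API = "https://pos.bliskapaczka.pl/api/v1/pos"

# Leading spaces in " city" and " street" mirror the "+" in the original URL
fields = ["operator", "code", "latitude", "longitude", "brand", "brandPretty",
          "operatorPretty", "cod", "available", " city", " street"]
operators = ["RUCH", "INPOST", "POCZTA", "DPD", "UPS", "FEDEX"]

# urlencode percent-encodes "," as %2C and spaces as "+"
query = urlencode({"fields": ",".join(fields), "operators": ",".join(operators)})
bliska_url = f"{BLISKAPACZKA_POS_API}?{query}"
print(bliska_url)
```

The same parameter dictionary could also be passed to `requests.get(BLISKAPACZKA_POS_API, params=...)`, which encodes it into the query string.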

### Source: DHL

In [None]:
url = 'https://parcelshop.dhl.pl/index.php/points?type=lm&country=pl&ptype=parcelShop&hours_from=10&hours_to=16&week_days_PON=T&week_days_WT=T&week_days_SR=T&week_days_CZW=T&week_days_PT=T&week_days_SOB=N&week_days_NIEDZ=N&options_pickup_cod=N&show_on_map_parcelshop=T&show_on_map_parcelstation=T&show_on_map_pok=T&tab=pickup'
download_json_data_from_url("dhl", url)
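As a crude first look at research question 1, the Bliskapaczka data can be cross-tabulated by city and operator. This is a minimal sketch assuming the saved CSV has `city` and `operator` columns and that the cities are spelled `Warszawa` and `Kraków`; both the column names and the spellings are assumptions to check against the downloaded file. `points_per_operator_and_city` is a hypothetical helper, and the demo rows are made up.

```python
import pandas as pd

def points_per_operator_and_city(pos: pd.DataFrame,
                                 cities=("Warszawa", "Kraków")) -> pd.DataFrame:
    """Count pick-up points per operator in the selected cities.

    Assumes `pos` has 'city' and 'operator' columns.
    """
    city = pos["city"].astype(str).str.strip()  # tolerate stray whitespace
    mask = city.isin(cities)
    # Rows: cities, columns: operators, values: number of points
    return pd.crosstab(city[mask], pos.loc[mask, "operator"])

demo = pd.DataFrame({
    "operator": ["INPOST", "INPOST", "RUCH", "POCZTA", "INPOST"],
    "city": ["Warszawa", " Kraków", "Warszawa", "Gdańsk", "Warszawa"],
})
print(points_per_operator_and_city(demo))
```

On the real data this could be called as `points_per_operator_and_city(pd.read_csv("../datasets/raw_data/bliska_paczka.csv", index_col=0))`, where `index_col=0` reads back the index column written by `to_csv`.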