# Analysis of procurements

The purpose of this work is to analyze Ukrainian public procurement data.

**About the dataset**

I will be using the **Competitive_procurements** dataset. It includes all competitive procurements and all of their participants in 2015-2019. Each row relates to one procurement and one participant, so a procurement spans multiple rows if more than one company competed to become the supplier.

In this work I will try to answer 3 questions that Oleksa Stepaniuk asked in Research ideas ([link](https://www.kaggle.com/oleksastepaniuk/prozorro-public-procurement-dataset)).

First, let's see what we are working with:

In [None]:
import pandas as pd
PATH = '../input/prozorro-public-procurement-dataset/Competitive_procurements.csv'
import warnings
warnings.filterwarnings('ignore')

In [None]:
procs = pd.read_csv(PATH)
procs.head()

Here we can see a lot with 5 participants. The last column, **supplier_dummy**, indicates whether the participant won this lot.

# Competition and savings

The first question we will try to answer is:

**Is there an evidence of increased competition? Does increased competition always lead to the increase of public savings?**

Let's build a plot to see the number of lots in each month. We will create a new column in yyyy-mm format to use as the X axis in plots:

In [None]:
# Keep the first 7 characters (yyyy-mm) of the announce date
procs['lot_announce_month'] = procs['lot_announce_date'].str[:7]

In [None]:
lots_per_month = procs.groupby(['lot_announce_month', 'lot_id']).size().groupby('lot_announce_month').count().reset_index()
lots_per_month.columns = ['lot_announce_month', 'number_of_lots']

In [None]:
import matplotlib.pyplot as plt
import numpy as np
%matplotlib notebook

plt.figure(figsize=(12,5))
plt.plot(lots_per_month.lot_announce_month, lots_per_month.number_of_lots)
plt.xticks(['2015-02','2015-12','2016-12',
            '2017-12','2018-12', '2019-12'], 
           ['2015-02','15-12','16-12',
            '17-12','18-12','2019-12'])
plt.yticks([30000, 25000, 20000, 15000, 10000, 5000, 0],
           ['30 000', '25 000', '20 000', '15 000', '10 000', '5 000', 0])
plt.xlabel('month')
plt.ylabel('number of lots')
plt.title('Number of lots')
plt.grid(True)
plt.show()

From the plot we can see that the lot count peaked at the end of 2016 and then gradually decreased until 2020. The lot count is also close to periodic in the last 3 years: it peaks at the beginning and at the end of each year.

Now let's see whether competition is growing. We will plot the average number of participants per lot in each month:

In [None]:
supps_per_lot = procs.groupby(['lot_announce_month', 'lot_id']).size().groupby('lot_announce_month').agg('mean').reset_index()
supps_per_lot.columns = ['lot_announce_month', 'supps_per_lot']
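
The two monthly aggregations above (number of lots and average participants per lot) are easy to check by hand on a tiny frame. The sketch below is only an illustration: the `toy` data is made up, and only the column names come from the dataset. It also shows that counting distinct `lot_id` values with `nunique` gives the lot count directly.

```python
import pandas as pd

# Hypothetical toy data: one row per (lot, participant), as in the real dataset
toy = pd.DataFrame({
    'lot_announce_month': ['2019-01', '2019-01', '2019-01', '2019-02', '2019-02'],
    'lot_id': ['a', 'a', 'b', 'c', 'c'],
})

# Number of distinct lots announced in each month
toy_lots_per_month = (toy.groupby('lot_announce_month')['lot_id']
                         .nunique()
                         .reset_index(name='number_of_lots'))

# Rows per lot = participants per lot; then average over lots within a month
toy_supps_per_lot = (toy.groupby(['lot_announce_month', 'lot_id']).size()
                        .groupby('lot_announce_month').mean()
                        .reset_index(name='supps_per_lot'))

print(toy_lots_per_month)
print(toy_supps_per_lot)
```

In the toy frame January has two lots (with 2 and 1 participants) and February has one lot with 2 participants, so the averages are easy to verify by eye.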

In [None]:
plt.figure(figsize=(12,5))
plt.plot(supps_per_lot.lot_announce_month, supps_per_lot.supps_per_lot, color='#4f5c8a')
plt.xticks(['2015-02', '2015-12', '2016-12', '2017-12', '2018-12', '2019-12'], 
           ['2015-02', '15-12', '16-12', '17-12', '18-12', '2019-12'])
plt.xlabel('month')
plt.ylabel('supps per lot')
plt.title('Suppliers per lot')
plt.grid(True)
plt.show()

We see that competition was highest at the start, when the lot count was not very high yet. In 2019 the number of suppliers per lot reached its yearly maximum of 2.5.

I would say that the number of suppliers per lot has been fairly consistent over the last 3 years.

Let's see how total sales and savings are changing:

In [None]:
sales = procs[procs.supplier_dummy == 1].groupby('lot_announce_month').agg({'lot_final_value':'sum'}).reset_index()
sales.columns = ['lot_announce_month', 'sales']

We will create a new column for ease of use:

In [None]:
# Savings = initial (expected) value minus final value of the lot
procs['savings'] = procs.lot_initial_value - procs.lot_final_value

In [None]:
savings = procs[procs.supplier_dummy == 1].groupby('lot_announce_month').agg({'savings':'sum'}).reset_index()

In [None]:
plt.figure(figsize=(12,5))
plt.plot(sales.lot_announce_month, sales.sales, color='#395838')
plt.plot(savings.lot_announce_month, savings.savings, color='#4f8a56')
plt.xticks(['2015-02', '2015-12', '2017-02', '2017-12', '2018-11', '2019-12'], 
           ['2015-02', '15-12', '17-02', '17-12', '18-11', '2019-12'])
plt.yticks([50*10**9, 40*10**9, 30*10**9, 20*10**9, 10*10**9, 0],
           ['50B', '40B', '30B', '20B', '10B', '0'])
plt.xlabel('month')
plt.ylabel('UAH')
plt.title('Sales and Savings')
plt.legend(['sales', 'savings'])
plt.grid(True)
plt.show()

Sales peaked in 2018-11 at more than 50B UAH. (UAH, the hryvnia, is the Ukrainian currency; roughly 27 UAH = 1 USD.)

By the end of 2019 sales had dropped to less than 10B.

Let's also plot savings on a separate plot:

In [None]:
plt.figure(figsize=(12,5))
plt.plot(savings.lot_announce_month, savings.savings, color='#4f8a56')
plt.xticks(['2015-02', '2015-12', '2017-02', '2017-12', '2018-11', '2019-12'], 
           ['2015-02', '15-12', '17-02', '17-12', '18-11', '2019-12'])
plt.yticks([5*10**9, 4*10**9, 3*10**9, 2*10**9, 10**9, 0],['5B', '4B', '3B', '2B', '1B', '0'])
plt.xlabel('month')
plt.ylabel('savings (UAH)')
plt.title('Savings')
plt.grid(True)
plt.show()

We see that savings have tended to grow gradually over the last 3 years, with some peaks every winter.

So we can't see any evidence of increased competition. Also, sales dropped a lot in 2019 and returned to the 2016 level.
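
The second half of the question (whether more competition goes together with more savings) could also be examined per lot rather than per month. The following is a minimal sketch on made-up data, assuming, as the code above does, that `lot_initial_value` and `lot_final_value` are repeated on every row of a lot. The names `toy`, `n_participants`, `winners` and `savings_rate` are introduced here for illustration only.

```python
import pandas as pd

# Hypothetical toy data: one row per (lot, participant)
toy = pd.DataFrame({
    'lot_id':            ['a', 'a', 'a', 'b', 'c', 'c'],
    'supplier_dummy':    [1, 0, 0, 1, 0, 1],
    'lot_initial_value': [100.0, 100.0, 100.0, 200.0, 50.0, 50.0],
    'lot_final_value':   [80.0, 80.0, 80.0, 200.0, 45.0, 45.0],
})

# Number of participants in each lot, broadcast back to every row of that lot
toy['n_participants'] = toy.groupby('lot_id')['lot_id'].transform('size')

# Keep one row per lot (the winner) and compute savings relative to the initial value
winners = toy[toy.supplier_dummy == 1].copy()
winners['savings_rate'] = ((winners.lot_initial_value - winners.lot_final_value)
                           / winners.lot_initial_value)

# Average relative savings for lots with 1, 2, 3, ... participants
rate_by_competition = winners.groupby('n_participants')['savings_rate'].mean()
print(rate_by_competition)
```

Applied to `procs`, a table like `rate_by_competition` would show whether lots with more participants tend to have a higher savings rate. The toy numbers say nothing about the real data.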

# Companies

**Which companies dominate the market? Can any procurement markets be classified as monopoly / oligopoly?**

Let's see a list of the largest companies:

In [None]:
times_participated = procs.groupby(['participant_name']).filter(lambda row: row['supplier_dummy'].sum() > 0).groupby(['participant_name']).size().reset_index()
times_participated.columns = ['participant_name', 'times_participated']

In [None]:
times_won = procs[procs.supplier_dummy == 1].groupby(['participant_name']).size().reset_index()
times_won.columns = ['participant_name', 'times_won']

In [None]:
total_sales = procs[procs.supplier_dummy == 1].groupby(['participant_name']).agg({'lot_final_value':'sum'}).reset_index()
total_sales.columns = ['participant_name', 'total_sales']

Here we can see the 20 largest companies ordered by total sales:

In [None]:
pd.options.display.float_format = '{:,.0f}'.format
pd.options.display.max_colwidth = 80
top_suppliers = pd.DataFrame({'participant_name':times_participated.participant_name,
                                 'times_participated':times_participated.times_participated,
                                 'times_won':times_won.times_won,
                                 'total_sales':total_sales.total_sales})

top_suppliers.sort_values('total_sales', ascending=False).head(20)

The first company is the main fuel company in Ukraine; it has almost 38B in sales over the last 4 years. The third one is interesting: I don't really know what DEIC is. It participated only once, but with a 19B sale.
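
The table above is assembled from three separately grouped frames that are lined up by row position, which works only as long as all three contain the same companies in the same order. As a hedged alternative, the same three columns can be computed in a single `groupby` with named aggregation, so every value stays attached to its company name. The `toy` frame and the helper column `won_value` are made up for this sketch.

```python
import pandas as pd

# Hypothetical toy data: one row per (lot, participant)
toy = pd.DataFrame({
    'participant_name': ['A', 'A', 'B', 'B', 'C'],
    'supplier_dummy':   [1, 0, 0, 1, 0],
    'lot_final_value':  [100.0, 50.0, 100.0, 50.0, 70.0],
})

# The lot value counts towards a company's sales only on the row where it won
toy['won_value'] = toy.lot_final_value * toy.supplier_dummy

toy_suppliers = (toy.groupby('participant_name')
                    .agg(times_participated=('supplier_dummy', 'count'),
                         times_won=('supplier_dummy', 'sum'),
                         total_sales=('won_value', 'sum'))
                    .reset_index())

# Keep only companies that won at least once, largest first
toy_suppliers = (toy_suppliers[toy_suppliers.times_won > 0]
                 .sort_values('total_sales', ascending=False))
print(toy_suppliers)
```

Company `C` never won, so it is dropped, mirroring the `filter` step used for `times_participated`.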

Now let's make a list of the largest companies in each market.

For this, we will create a new column:

In [None]:
# Drop the first 11 characters of lot_cpv_2_digs and keep the rest as the market name
procs['market'] = procs['lot_cpv_2_digs'].str[11:]

In [None]:
top_suppliers_by_market = procs[procs.supplier_dummy == 1].groupby(['market','participant_name']).agg({'lot_final_value':'sum'}).reset_index()
top_suppliers_by_market.columns = ['market','participant_name','total_sales']

In [None]:
market_total_sales = top_suppliers_by_market.groupby(['market']).agg({'total_sales':'sum'}).reset_index()

In [None]:

top_suppliers_by_market = top_suppliers_by_market.groupby(['market']).apply(lambda row: row.nlargest(5,['total_sales'])).reset_index(drop=True)
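# Market share = supplier's sales divided by the total sales of its market
# (looked up by market name, so it does not depend on each market having exactly 5 rows)
market_totals = market_total_sales.set_index('market').total_sales
top_suppliers_by_market['market_share'] = top_suppliers_by_market.total_sales / top_suppliers_by_market.market.map(market_totals)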

def insert_row_(row_number, df, row_value): 
    df1 = df[0:row_number] 
    df2 = df[row_number:] 
    df1.loc[row_number]=row_value 
    df_result = pd.concat([df1, df2]) 
    df_result.index = [*range(df_result.shape[0])] 
    return df_result 

        
for i in range(len(top_suppliers_by_market)+46):
    if (i%6 == 0):
        top_suppliers_by_market = insert_row_(i,  top_suppliers_by_market, ['','','',''])

In [None]:
pd.set_option('display.max_rows', 500)
pd.options.display.max_colwidth = 100
pd.options.display.float_format = '{:,.2f}'.format

Here we see the top 5 companies in each market with their market shares, with markets separated by a blank line:

In [None]:
top_suppliers_by_market

According to Ukrainian law, a company is considered to hold a monopoly position when its share of a market exceeds 35%. A market is also treated as monopolized when the aggregate share of no more than three business entities exceeds 50%, or the aggregate share of no more than five business entities exceeds 70%.


1. The **Financial and insurance services** market has a company with a 52% market share;
2. **Agricultural machinery** has 3 companies with a total market share of 51%;
3. **Public utilities** has 3 companies with a total market share of 54%.
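
The three thresholds quoted above can be checked mechanically for any market. The helper below is a small sketch, not part of the original analysis: `concentration_flags` is a hypothetical name, and the shares passed to it are made-up fractions (0.52 means 52%).

```python
def concentration_flags(shares):
    """Check the 35% / 50% / 70% thresholds for one market.

    `shares` is an iterable of market shares as fractions of total sales.
    """
    top = sorted(shares, reverse=True)
    return {
        # Largest single company holds more than 35%
        'single_over_35': bool(top) and top[0] > 0.35,
        # Three largest companies together hold more than 50%
        'top3_over_50': sum(top[:3]) > 0.50,
        # Five largest companies together hold more than 70%
        'top5_over_70': sum(top[:5]) > 0.70,
    }

# Made-up example: one dominant company and four small ones
flags = concentration_flags([0.52, 0.10, 0.05, 0.03, 0.02])
print(flags)
```

On the real table this could be applied per market, for example by grouping on `market` and passing each group's `market_share` values to the function.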

# Markets

**How implementation of the new electronic system affected the public procurement market? Are the changes the same across different markets (e.g. markets of fuel procurement vs. procurement of professional services)?**

Let's see a list of markets with their sales:

In [None]:
market_total_sales.sort_values('total_sales', ascending=False)

Now let's analyze 3 markets in more detail:

In [None]:
construction_market = procs[procs.market.isin(['Construction work',
                                               'Construction structures and materials; auxiliary products to construction (except electric apparatus)',
                                               'Electrical machinery, apparatus, equipment and consumables; lighting',
                                               'Architectural, construction, engineering and inspection services'])]

fuel_market = procs[procs.market.isin(['Petroleum products, fuel, electricity and other sources of energy',
                                       'Services related to the oil and gas industry'])]

agricultural_market = procs[procs.market.isin(['Agricultural, farming, fishing, forestry and related products',
                                               'Agricultural, forestry, horticultural, aquacultural and apicultural services',
                                               'Agricultural machinery'])]

The choice of categories for each of the three markets above is debatable, but I think it is representative enough.

In [None]:
construction_market_sales = construction_market[construction_market.supplier_dummy == 1].groupby('lot_announce_month').agg({'lot_final_value':'sum'}).reset_index()
construction_market_sales.columns = ['lot_announce_month', 'construction']
fuel_market_sales = fuel_market[fuel_market.supplier_dummy == 1].groupby('lot_announce_month').agg({'lot_final_value':'sum'}).reset_index()
fuel_market_sales.columns = ['lot_announce_month', 'fuel']
agricultural_market_sales = agricultural_market[agricultural_market.supplier_dummy == 1].groupby('lot_announce_month').agg({'lot_final_value':'sum'}).reset_index()
agricultural_market_sales.columns = ['lot_announce_month', 'agricultural']

In [None]:
plt.figure(figsize=(12,5))
plt.plot(construction_market_sales.lot_announce_month, construction_market_sales.construction, color='#ac0000')
plt.plot(fuel_market_sales.lot_announce_month, fuel_market_sales.fuel, color='#007dac')
plt.plot(agricultural_market_sales.lot_announce_month, agricultural_market_sales.agricultural, color='#007810')
plt.xticks(['2015-02', '2015-12', '2016-12', '2017-12', '2018-12', '2019-12'], 
           ['2015-02', '15-12', '16-12', '17-12', '18-12', '2019-12'])
plt.yticks([35*10**9, 30*10**9, 25*10**9, 20*10**9, 15*10**9, 10*10**9, 5*10**9, 0],
           ['35B', '30B', '25B', '20B', '15B', '10B', '5B', '0'])
plt.xlabel('month')
plt.ylabel('UAH')
plt.title('Sales of market')
plt.legend(['construction', 'fuel', 'agricultural'])
plt.grid(True)
plt.show()

In [None]:
construction_market_savings = construction_market[construction_market.supplier_dummy == 1].groupby('lot_announce_month').agg({'savings':'sum'}).reset_index()
construction_market_savings.columns = ['lot_announce_month', 'construction']
fuel_market_savings = fuel_market[fuel_market.supplier_dummy == 1].groupby('lot_announce_month').agg({'savings':'sum'}).reset_index()
fuel_market_savings.columns = ['lot_announce_month', 'fuel']
agricultural_market_savings = agricultural_market[agricultural_market.supplier_dummy == 1].groupby('lot_announce_month').agg({'savings':'sum'}).reset_index()
agricultural_market_savings.columns = ['lot_announce_month', 'agricultural']

In [None]:
plt.figure(figsize=(12,5))
plt.plot(construction_market_sales.lot_announce_month, construction_market_sales.construction, color='#ac0000')
plt.plot(construction_market_savings.lot_announce_month, construction_market_savings.construction, color='#ab4d4d')
plt.xticks(['2015-02', '2015-12', '2016-12', '2017-12', '2018-12', '2019-12'], 
           ['2015-02', '15-12', '16-12', '17-12', '18-12', '2019-12'])
plt.yticks([35*10**9, 30*10**9, 25*10**9, 20*10**9, 15*10**9, 10*10**9, 5*10**9, 0],
           ['35B', '30B', '25B', '20B', '15B', '10B', '5B', '0'])
plt.xlabel('month')
plt.ylabel('UAH')
plt.title('Sales and savings - Construction')
plt.legend(['sales', 'savings'])
plt.grid(True)
plt.show()

In [None]:
plt.figure(figsize=(12,5))
plt.plot(fuel_market_sales.lot_announce_month, fuel_market_sales.fuel, color='#007dac')
plt.plot(fuel_market_savings.lot_announce_month, fuel_market_savings.fuel, color='#4d86ab')
plt.xticks(['2015-12', '2016-12', '2017-12', '2018-12', '2019-12'], 
           ['15-12', '16-12', '17-12', '18-12', '2019-12'])
plt.yticks([ 20*10**9, 15*10**9, 10*10**9, 5*10**9, 0],
           ['20B','15B', '10B', '5B', '0'])
plt.xlabel('month')
plt.ylabel('UAH')
plt.title('Sales and savings - Fuel')
plt.legend(['sales', 'savings'])
plt.grid(True)
plt.show()

We see that savings in the fuel market are quite high. We will compare savings across the three markets at the end.

In [None]:
plt.figure(figsize=(12,5))
plt.plot(agricultural_market_sales.lot_announce_month, agricultural_market_sales.agricultural, color='#007810')
plt.plot(agricultural_market_savings.lot_announce_month, agricultural_market_savings.agricultural, color='#3f7847')
plt.xticks(['2015-12', '2016-12', '2017-12', '2018-12', '2019-12'], 
           ['15-12', '16-12', '17-12', '18-12', '2019-12'])
plt.yticks([700*10**6, 550*10**6, 400*10**6, 250*10**6, 100*10**6, 0],
           ['700M', '550M', '400M','250M', '100M', '0'])
plt.xlabel('month')
plt.ylabel('UAH')
plt.title('Sales and savings - Agricultural')
plt.legend(['sales', 'savings'])
plt.grid(True)
plt.show()

The plot below shows savings as a share of sales for each market:

In [None]:
plt.figure(figsize=(12,5))
plt.plot(construction_market_sales.lot_announce_month, 
         construction_market_savings.construction/construction_market_sales.construction, color='#ac0000')
plt.plot(fuel_market_sales.lot_announce_month, 
         fuel_market_savings.fuel/fuel_market_sales.fuel, color='#007dac')
plt.plot(agricultural_market_sales.lot_announce_month, 
         agricultural_market_savings.agricultural/agricultural_market_sales.agricultural, color='#007810')
plt.xticks(['2015-02', '2015-12', '2016-12', '2017-12', '2018-12', '2019-12'], 
           ['2015-02', '15-12', '16-12', '17-12', '18-12', '2019-12'])
plt.yticks([0.5, 0.4, 0.3, 0.2, 0.1, 0],
           ['50%', '40%', '30%', '20%', '10%', 0])
plt.xlabel('month')
plt.ylabel('percent')
plt.title('% of savings')
plt.legend(['construction', 'fuel', 'agricultural'])
plt.grid(True)
plt.show()

We see that savings in the fuel market are the highest overall: they exceeded 40% at the end of the last year and went above 20% and 30% several times within the last 3 years. The agricultural market shows a peak in 2017 with over 40% savings. Savings were also high in all three markets in 2016.
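
The ratios in this plot divide a savings column by a sales column taken from two different frames, which pandas lines up by row index. Here both frames come from the same filtered rows, so the months match. If they ever did not, an explicit merge on the month would be safer. A minimal sketch with made-up numbers (`toy_sales`, `toy_savings` and `savings_share` are illustrative names):

```python
import pandas as pd

# Hypothetical monthly totals; February is missing from the savings frame on purpose
toy_sales = pd.DataFrame({'lot_announce_month': ['2019-01', '2019-02', '2019-03'],
                          'sales': [100.0, 200.0, 50.0]})
toy_savings = pd.DataFrame({'lot_announce_month': ['2019-01', '2019-03'],
                            'savings': [10.0, 5.0]})

# Align on the month explicitly instead of relying on row position
ratio = toy_sales.merge(toy_savings, on='lot_announce_month', how='left')
ratio['savings_share'] = ratio.savings / ratio.sales
print(ratio)
```

With the left merge, a month without savings data shows up as `NaN` instead of silently shifting the values of later months.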

We have analyzed only 3 markets, so there is still a lot of room for further exploration.