# Assessing borrower welfare levels (rather than understanding borrowers and their poverty levels)
Poznan University of Technology:<br/>
Czesław Jędrzejek, academic teacher<br/>
Mateusz Norel, student

## I. Description of the method

### 1. Introduction
To start, we recall the objective of the Data Science for Good challenge:
>“Kiva.org is an online crowdfunding platform to extend financial services to poor and financially excluded people around the world. Kiva lenders have provided over $1 billion dollars in loans to over 2 million people. In order to set investment priorities, help inform lenders, and understand their target communities, knowing the level of poverty of each borrower is critical. However, this requires inference based on a limited set of information for each borrower.
In Kaggle Datasets' inaugural Data Science for Good challenge, Kiva is inviting the Kaggle community to help them build more localized models to estimate the poverty levels of residents in the regions where Kiva has active loans. Unlike traditional machine learning competitions with rigid evaluation criteria, participants will develop their own creative approaches to addressing the objective. <br/>
Kiva has provided a dataset of loans issued over the last two years, and participants are invited to use this data as well as source external public datasets to help Kiva build models for assessing borrower welfare levels. Participants will write kernels on this dataset to submit as solutions to this objective and five winners will be selected by Kiva judges at the close of the event. In addition, awards will be made to encourage public code and data sharing. With a stronger understanding of their borrowers and their poverty levels, Kiva will be able to better assess and maximize the impact of their work.”

The problem in this challenge is very different from standard credit classification studies. For example, the original “German credit dataset” [1], prepared by Prof. Hofmann, contains 1000 entries with 20 categorical/symbolic attributes. Each entry represents a person who takes out a loan from a bank, and each person is classified as a good or bad credit risk according to the set of attributes. The German credit dataset is also available on Kaggle as a classification dataset [2]. Machine learning was applied by [3] to Lending Club peer-to-peer lending data [4].<br/>

The data provided by Kiva contain little information about individual borrowers and only some information on the intermediary microfinance institutions (MFIs).

### 2. Main microfinance studies

The microfinance concept triad concerns savings, loans and investments.<br/>

Loans can be disbursed using various microfinance methodologies: Group/Solidarity Lending, Village Banking and Individual Lending. The literature provides numerous studies on the characteristics of microfinance clients and on the impact of microfinance on the poor based on actual impact assessment results. A few studies use randomised controlled trials (RCTs), pipeline designs, with/without comparisons (in panel or cross-section form), natural experiments and general-purpose surveys. Contrary to earlier findings, recent results from respected institutions suggest that there is as yet no clear evidence that microfinance programmes have positive impacts or lead to financial inclusion [5], [6], [7]. In India, this led to the Indian microfinance crisis [8]. The study of the benefits and costs of microfinance in Bangladesh [10] and its re-investigation [9] led to a real battle between Duvendack & Palmer-Jones on one side and Chemin and Pitt on the other [9-14]. <br/>
In [9-14] the authors focused on the intervention (e.g. provision of microcredit), the measurement of outcomes (e.g. income, expenditure, assets, health and education, empowerment, and so on), and contextual factors likely to cause differences in outcomes across contexts, including other microfinance services. In addition, they considered heterogeneous categories of persons, as well as the likely significance of factors that might obscure observed relationships.

In any case, in Bangladesh, where in 2001 approximately one out of four households had at least one microloan, microcredit seems to have had little impact on the country’s relative development performance. In 1991, for example, Bangladesh ranked 136th on the UN Development Programme’s Human Development Index (a measure of societal well-being); 15 years later it ranked 137th, despite growth of 4.5% a year (compared to 2.5% a year in Pakistan).

Work [9] also addressed the fact that most loans are simply used for consumption, which even the Consultative Group to Assist the Poor (CGAP) recognizes in its attempts to redefine microfinance in terms of “financial inclusion”, while ignoring the problem of these loans’ unsustainability. This is linked to the risk of over-indebtedness and debt trapping [15]. Schicks [16] showed, for a sample of urban micro-borrowers in Ghana, that 30% of them are over-indebted.

We believe there is a serious problem with Microfinance Institutions (MFIs) in the microfinance model (Kiva included). Even allowing for the higher cost of microlending, loan interest rates are generally predatory (25% and often much higher), and there is little transparency. How can the default rate on loans in Kenya be 2%, whereas World Bank statistics for bank loan defaults are above 10%? How is it possible to repay a loan at 50% interest?
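
To make the last question concrete, here is a minimal sketch that applies the standard equal-installment (annuity) formula to a hypothetical loan of 1,000 USD repaid in 12 monthly installments. The loan size, the term and the function name `monthly_installment` are illustrative assumptions; real MFI loans are often priced differently (e.g. with fees or weekly schedules), so this only indicates the order of magnitude of the repayment burden.

```python
def monthly_installment(principal: float, annual_rate: float, months: int) -> float:
    """Equal monthly payment for an amortizing loan (standard annuity formula).

    annual_rate is the nominal yearly interest rate, e.g. 0.5 for 50%.
    """
    r = annual_rate / 12.0  # monthly interest rate
    if r == 0:
        return principal / months
    return principal * r / (1.0 - (1.0 + r) ** -months)


# Hypothetical loan: 1,000 USD over 12 months at 25% and at 50% nominal interest
for rate in (0.25, 0.50):
    payment = monthly_installment(1000.0, rate, 12)
    total = payment * 12
    print(f"rate {rate:.0%}: installment {payment:.2f} USD, total repaid {total:.2f} USD")
```

The difference between the two totals is the extra return that the borrower's investment must generate, on top of household needs, just to service the higher rate.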

Some influential voices argue that the sector is healthy and that the MFIs’ problems have bottomed out [25]. The authors of this contribution take no political position on this issue.

Nevertheless, we propose here a simplified model whose parameters could be changed to alleviate some of the problems mentioned in [17].

### 3. Understanding Poverty levels of Borrowers
First, we recall two globally accepted welfare indices, the HDI and the MPI.

A. The <b>Human Development Index (HDI)</b> is a composite statistic (composite index) of life expectancy, education, and per capita income indicators, which are used to rank countries into four tiers of human development. A country scores a higher HDI when life expectancy is higher, the education level is higher, and income (GNI) per capita is higher.
The figure below shows how the sub-indices are aggregated into the HDI.

![HDI](http://hdr.undp.org/sites/default/files/hdi.png)
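
A minimal sketch of the aggregation shown in the figure, assuming the post-2010 UNDP method: each dimension index is (value - min)/(max - min), the income index uses logarithms, the education index is the mean of the two schooling indices, and the HDI is the geometric mean of the three dimension indices. The goalpost values are an assumption taken from recent UNDP technical notes and should be checked against the edition in use; the function names are illustrative.

```python
import math

# Goalposts assumed from the UNDP HDR technical notes (2014-2019 editions);
# check the current technical notes before relying on them.
LIFE_EXP = (20.0, 85.0)         # years
EXPECTED_SCHOOL = (0.0, 18.0)   # years
MEAN_SCHOOL = (0.0, 15.0)       # years
GNI_PC = (100.0, 75000.0)       # GNI per capita, 2011 PPP USD


def dimension_index(value: float, lo: float, hi: float) -> float:
    """(value - min) / (max - min), clipped to [0, 1]."""
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))


def hdi(life_expectancy, expected_schooling, mean_schooling, gni_per_capita):
    health = dimension_index(life_expectancy, *LIFE_EXP)
    education = (dimension_index(expected_schooling, *EXPECTED_SCHOOL)
                 + dimension_index(mean_schooling, *MEAN_SCHOOL)) / 2.0
    # Income enters in logarithms, so extra income matters less for rich countries
    income = dimension_index(math.log(gni_per_capita), *map(math.log, GNI_PC))
    # The three dimension indices are combined by a geometric mean
    return (health * education * income) ** (1.0 / 3.0)


print(round(hdi(72.0, 12.0, 7.0, 6000.0), 3))
```

The geometric mean means that a very low value in one dimension cannot be fully compensated by high values in the others.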

B. <b>Multidimensional Poverty Index (MPI)</b> for different regions<br/>
The Multidimensional Poverty Index (MPI) is an international measure of acute poverty covering over 100 developing countries.

<b>Formula:</b><br/>
The MPI is calculated as follows:
\begin{equation*}
\mathrm{MPI} = H \times A
\end{equation*}
![MPI](http://hdr.undp.org/sites/default/files/mpi.png)

* H: the percentage of people who are MPI poor (incidence of poverty)<br/>
* A: the average intensity of MPI poverty across the poor
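
To illustrate the formula, assume each person i has a weighted deprivation score c_i between 0 and 1 over the MPI indicators (ten indicators in the global MPI) and is counted as MPI poor when c_i ≥ 1/3, the cutoff used in the global MPI. The sketch below computes H, A and MPI from such scores; the scores and the function name are toy assumptions.

```python
import numpy as np


def mpi(deprivation_scores, k=1.0 / 3.0):
    """Return (MPI, H, A) from per-person weighted deprivation scores c_i in [0, 1]."""
    c = np.asarray(deprivation_scores, dtype=float)
    poor = c >= k                                # identification: MPI poor if c_i >= k
    H = poor.mean()                              # incidence: share of people who are poor
    A = c[poor].mean() if poor.any() else 0.0    # intensity: average score among the poor
    return H * A, H, A


# Toy population of four people: two of them are MPI poor (scores 0.4 and 0.6),
# so H = 0.5, A = 0.5 and MPI = 0.25
mpi_value, H, A = mpi([0.0, 0.2, 0.4, 0.6])
print(mpi_value, H, A)
```

Equivalently, the MPI is the mean of the deprivation scores after setting the scores of the non-poor to zero.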

### 4. The basic model of loan success
The model uses four factors, addressed in the literature, that affect <b>loan success for poor people</b><br/>

It is based on several assumptions: <br/>

Loan success probability depends on a borrower’s loan eligibility: the poorer a borrower is, the more difficult it is to pay a loan back. The Kiva dataset contains no data that would allow estimating credit risk or whether a person qualifies for credit. Such studies were performed elsewhere and are inconclusive [6-14]. Therefore, plotting the distribution of loans by sector, or plotting HDI by country, is in itself not constructive.



1. Loan success probability depends on HDI, or on an inverse-like function of MPI. In [18], Planet Rating, a global rating agency specialized in microfinance, uses the Global Findex dataset [19] to regress the frequency of loans on HDI. We disregard the MIX Market database [20], which is self-reported by MFIs with little control. The Planet Rating MIMOSA tool (now version 2.0) calculates the Microfinance Index of Market Outreach and Saturation.

2. Loan success probability depends on the enabling environment for financial inclusion. Such data are provided by the Global Microscope 2016 [21] of The Economist Intelligence Unit. The overall score ranges from 0 to 100, where 100 indicates the best policy and the best regulatory and supervisory capacity for monitoring financial activities.

3. In [22], the authors study the global productivity surplus (GPS), which is related both to the efficiency of MFIs and to their social performance. The propensity to share this surplus with clients depends on the size of the MFI; more socially oriented MFIs would decrease their clients’ interest rates. Based on this work, we assume a logistic (sigmoid) dependence of f(MFI) on MFI size.

Here we propose the concept of an unnormalized probability of repayment (PoR) as a function of the MFI, the country and region, and the borrower i.

\begin{equation*}
\mathrm{PoR}(i, \mathrm{MFI}, \mathrm{region}(\mathrm{country})) = A \cdot f(\mathrm{MFI}) \cdot \mathrm{GMOS} \cdot \frac{1}{\mathrm{MPI}(\mathrm{region}) + 1}
\end{equation*}

Here A = 1.814 is a normalization factor, chosen so that the highest score (obtained by the MFI Interactuar from Colombia) equals 1.000. GMOS is originally reported on a 0-100 scale; in the equation it is rescaled to the range 0-1. The lowest scores are obtained by the Haitian MFI, because all factors entering the PoR formula are very low for this country.


\begin{equation*}
f(\mathrm{MFI}) = \frac{1}{1 + 6\left|\,\text{interest or portfolio yield} - 0.25\,\right|} \cdot \frac{1}{1 + e^{-a(x - x_{0})}}
\end{equation*}

Here x is the total amount of loans raised by the MFI, in USD million. The shape of the sigmoid is set by a = 0.25 and x<sub>0</sub> = 2 (USD million).

The optimal interest rate or portfolio yield was set to 25% (0.25).
GMOS is the Global Microscope Overall Score, estimated by The Economist Intelligence Unit.
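
The two formulas above can be sketched in Python as follows, assuming yields are fractions (0.25 = 25%), x is the total amount of loans raised in USD million, GMOS has already been rescaled to 0-1, and the MPI is the region's MPI. The function names are hypothetical, and this is not the authors' code that produced the scores, so it is not guaranteed to reproduce the values in the table below.

```python
import math

# Parameter values as given in the text
A_NORM = 1.814        # normalization factor A
OPTIMAL_YIELD = 0.25  # optimal interest rate / portfolio yield
SLOPE = 0.25          # sigmoid steepness a
X0 = 2.0              # sigmoid midpoint x0, USD million


def f_mfi(yield_rate: float, total_loans_musd: float) -> float:
    """MFI factor: yield penalty times a logistic term in MFI size."""
    yield_factor = 1.0 / (1.0 + 6.0 * abs(yield_rate - OPTIMAL_YIELD))
    size_factor = 1.0 / (1.0 + math.exp(-SLOPE * (total_loans_musd - X0)))
    return yield_factor * size_factor


def por_score(yield_rate, total_loans_musd, gmos, region_mpi, a_norm=A_NORM):
    """PoR = A * f(MFI) * GMOS / (MPI(region) + 1), with GMOS in [0, 1]."""
    return a_norm * f_mfi(yield_rate, total_loans_musd) * gmos / (region_mpi + 1.0)


def normalization_factor(raw_scores):
    """A chosen so that the best raw score (computed with A = 1) becomes exactly 1."""
    return 1.0 / max(raw_scores)


# Toy example: optimal yield, x = x0 (size term 0.5), GMOS 0.5, MPI 0,
# i.e. about 1.814 * 0.5 * 0.5 = 0.4535
print(por_score(0.25, 2.0, 0.5, 0.0))
```

Keeping A as a parameter makes it easy to recompute the normalization whenever the set of scored MFIs changes.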

It is interesting to compare the PoR of two of Kiva’s largest MFIs:

<table>
    <tr>
        <th>MFI name</th>
        <th>Country</th>
        <th>GMOS</th>
        <th>Total loans (USD million)</th>
        <th>Region’s MPI</th>
        <th>Average cost to borrower</th>
        <th>PoR</th>
        <th>Delinquency / default rate</th>
        <th>Profitability</th>
    </tr>
    <tr>
        <td>Urwego Bank, id=166</td>
        <td>Rwanda</td>
        <td>0.61</td>
        <td>19.9</td>
        <td>Karongi 0.266</td>
        <td>0.33</td>
        <td>61%</td>
        <td>7.95% / 0.49%</td>
        <td>-8.5%</td>
    </tr>
    <tr>
        <td>Credi Campo, id=199</td>
        <td>El Salvador</td>
        <td>0.56</td>
        <td>1.5</td>
        <td>Average 0.03</td>
        <td>0.32</td>
        <td>73%</td>
        <td>2.74% / 1.29%</td>
        <td>3.9%</td>
    </tr>
</table>
Unlike in the results for all Kiva MFIs, here the loan interest rate is used instead of the portfolio yield, and the most appropriate region is assigned (in the results for all Kiva MFIs, the country’s MPI was averaged over regions). In the table, the low average MPI of the several (much less poor) provinces where CrediCampo operates outweighs the other factors, which favor Urwego Bank.

Notably, we use neither MFI rankings (including Kiva’s [23]; the MIX Market one is totally unreliable) nor reported default rates. The Kiva MFI ranking does not correlate well with GMOS, although more work is needed here. The default rates reported by MFIs are suspiciously low (they are not independently verified); MFIs report them for publicity purposes.

As described in [24], the Rural Bank of Mabitac aimed to provide sustainable financial assistance to microentrepreneurs in the areas of Laguna and Quezon in the Philippines. The bank, which had over 4,000 microfinance clients, had a very high default rate of 71 percent (default defined as not completing full payment on or before the maturity date of the loan). This study took place from 2005 to 2006 in various locations across the island of Luzon in the Philippines. The bank was certified by MIX Market. For this bank, the introduction of SMS text messaging on loan repayment proved very significant (reported in 2012 [24]). Blockchain deployment could cut costs very significantly in the near future, so that interest rates of 25% or lower could become prevalent.
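
The definition of default used in [24] can be made explicit with a short sketch. The column names `maturity_date` and `fully_repaid_date` are hypothetical; under this definition a loan that is repaid even one day late counts as a default, which is one reason why such a rate can be much higher than the share of money that is never recovered.

```python
import pandas as pd


def default_rate(loans: pd.DataFrame) -> float:
    """Share of loans NOT fully repaid on or before their maturity date.

    Hypothetical columns: 'maturity_date' and 'fully_repaid_date'
    (NaT when the loan was never fully repaid).
    """
    repaid_on_time = loans["fully_repaid_date"].notna() & (
        loans["fully_repaid_date"] <= loans["maturity_date"]
    )
    return 1.0 - repaid_on_time.mean()


# Toy data: early, on the maturity date, late, never -> 2 of 4 loans default
loans = pd.DataFrame({
    "maturity_date": pd.to_datetime(["2006-01-31"] * 4),
    "fully_repaid_date": pd.to_datetime(["2006-01-15", "2006-01-31", "2006-03-01", None]),
})
print(default_rate(loans))
```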

With more data (complete data on individual borrowers), we could use machine learning to compare the theoretical PoR function with actual repayment outcomes.

In future work, we could consider jointly optimizing the welfare of both MFIs and borrowers. MFIs employ many people who add to GNP in their communities. It should be stressed that lenders mostly get their money back and are generally satisfied with helping poor people [25].

## II. Data use
<br/>

In our computations we mostly used the Kiva provided data:
1. kiva_loans.csv 
2. kiva_mpi_region_locations.csv 
3. loan_themes_ids.csv
4. loan_themes_by_region.csv

The kiva_loans.csv file matches an MFI partner to the region in which it operates (note that the region information enters our PoR score through the MPI of the region). If a single region cannot be assigned, we take the average MPI over all regions of the country (strictly, we should weight by the population of each region). A partner that serves many countries is not included in our data.
The kiva_mpi_region_locations.csv file provides the MPI data for a given region. The loan_themes_ids.csv and loan_themes_by_region.csv files are not used because they provide limited data on borrowers that cannot be used in our PoR analytics.
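
The country-level fallback described above, and the population-weighted variant that would be stricter, can be sketched as follows. The sketch assumes the `country` and `MPI` columns of kiva_mpi_region_locations.csv; the `population` column used for weighting is hypothetical, since that file does not provide it.

```python
import pandas as pd


def country_mpi(regions: pd.DataFrame, weight_col=None) -> pd.Series:
    """Average regional MPI per country.

    Uses the 'country' and 'MPI' columns; weight_col (e.g. a hypothetical
    'population' column, not present in kiva_mpi_region_locations.csv)
    switches to a weighted average.
    """
    subset = ["country", "MPI"] + ([weight_col] if weight_col else [])
    regions = regions.dropna(subset=subset)
    if weight_col is None:
        # Simple mean over regions, as used for the PoR scores
        return regions.groupby("country")["MPI"].mean()
    weighted_sum = (regions["MPI"] * regions[weight_col]).groupby(regions["country"]).sum()
    total_weight = regions[weight_col].groupby(regions["country"]).sum()
    return weighted_sum / total_weight


# Toy data: in country X the poorer region is also the more populous one
toy = pd.DataFrame({"country": ["X", "X", "Y"],
                    "MPI": [0.1, 0.3, 0.05],
                    "population": [100, 300, 50]})
print(country_mpi(toy))
print(country_mpi(toy, "population"))
```

In the toy data the simple mean understates the poverty of country X (0.2 instead of 0.25), because the poorer region has more inhabitants.
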
We found the following external data very useful:
1. Global Microscope 2016 [21] by The Economist Intelligence Unit: data on the financial environment in poor countries. This is the latest available data.
2. partners.csv from http://kivatools.com/downloads (item no. 9, All Partner Data).
3. We inspected Kiva’s ranking of 323 Field Partners [23], https://www.kiva.org/about/where-kiva-works, but eventually decided not to use these data.
4. During the analysis we used data from https://www.kiva.org/about/where-kiva-works/partners/n (n from 1 to 568). These data (e.g. the loan interest rate) were used in the comparison table in Section I; in the calculations for all MFIs we used the portfolio yield from partners.csv.

## III. Numerical results
In the beginning, we present the MFIs with the best and worst PoR scores, the list of countries covered and the distribution of scores (subsections 1-4). Then we present results for all MFIs for which we were able to determine the parameters entering PoR.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

sns.set(color_codes=True)

### 1. 10 MFIs with best PoR scores
The table below presents the top 10 partners sorted by normalized PoR.
*Interactuar* from Colombia achieved the best score; scores are normalized so that the maximum is 1.0.

In [None]:
por = pd.read_csv('../input/por-for-kiva-partners/por_for_kiva_partners.csv',usecols=['partner_id','name','portfolio_yield','total_amount_raised','countries','score','average_mpi','POR_norm'])
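# Drop MFIs whose portfolio yield is zero (treated as missing) and rank by normalized PoR
por = por[por.portfolio_yield > 0].sort_values('POR_norm', ascending=False)
por.index = range(1, por.shape[0] + 1)  # 1-based rank
por[['name', 'countries', 'POR_norm']].head(10)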

El Salvador appears three times in the top 10. We can assume that MFIs in this country are doing good work, so that lending money there can actually make an impact on poor people.<br/>
It is worth mentioning that the table contains results for 87 MFIs.

### 2. 10 MFIs with worst PoR scores
The table below shows the tail of the ranking: the MFIs with the lowest PoR scores. Only 87 MFIs are ranked. We removed from the computations MFIs with a portfolio yield equal to zero (we treat such cases as missing values). These are not KivaZip cases.

In [None]:
# Bottom of the ranking (the index is the 1-based rank assigned above)
por[['name', 'countries', 'POR_norm']].tail(10)

Haiti and Madagascar have the lowest measured PoR levels.<br/>
One could think that lending money in regions with the highest MPI would actually make an impact on poor people. The issue is that we are not talking about help in the form of a hand-out. In the case of loans, a borrower should have a chance to (1) repay the loan and (2) make an investment that works. The history of southern Italy showed that simply transferring money does not work.

### 3. List of countries for which PoR was calculated:

In [None]:
sorted(por.countries.unique())

### 4. Normalized PoR score distribution among MFIs
The chart below shows the number of MFIs as a function of the normalized PoR score (horizontal axis). Individual scores are marked as dark blue ticks on the horizontal axis; the light blue bars give the number of scores falling into each of 15 bins.

In [None]:
# Histogram (15 bins) with a rug of individual scores; distplot is deprecated since seaborn 0.11
sns.distplot(por.POR_norm, bins=15, kde=False, rug=True);
plt.ylabel('Number of MFIs with specific PoR');
plt.xlabel('Normalized PoR score');

Most of the distribution lies at low PoR values.<br/><br/>
Next, we present four bivariate results (subsections 5-8; each figure relates to the title of its subsection). Each "PoR vs X" plot displays MFIs as points, e.g. the PoR vs MPI relation; the other parameters of an MFI can be looked up by its id.

### 5. Bivariate distribution of PoR score and MPI of MFI's country<br/>
In the Figure each dot represents a single MFI. 

In [None]:
# Copy with human-readable column names for the plots (renamed by name, not by column position)
tmp = por.rename(columns={
    'portfolio_yield': 'Portfolio Yield',
    'total_amount_raised': 'Total Amount Raised in USD million',
    'average_mpi': "Average MPI in MFI's country",
    'POR_norm': 'Normalized PoR',
})
sns.jointplot(x=tmp["Average MPI in MFI's country"], y=tmp['Normalized PoR']);

The Pearson coefficient indicates a weak negative correlation between PoR and MPI. The spread of scores is widest in countries with low MPI; as MPI grows, MFIs are more constrained and the spread of scores narrows. Most MFIs tend to achieve low scores. The graph shows one MFI (CAURIE Microfinance) that achieved a PoR score above 0.4 despite difficult circumstances (the MPI of its region exceeds 0.3).
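
Newer seaborn releases no longer print the Pearson coefficient on `jointplot` figures, so the r values referred to in this and the following subsections may not appear on the plots. A minimal way to compute them directly with pandas is sketched below; the helper name `pearson_with` and the toy data are illustrative assumptions.

```python
import pandas as pd


def pearson_with(df: pd.DataFrame, target: str, columns) -> pd.Series:
    """Pearson r between `target` and each column in `columns` (NaN pairs are skipped)."""
    return df[list(columns)].corrwith(df[target])


# Toy data standing in for the per-MFI table
toy = pd.DataFrame({
    "mpi": [0.0, 0.1, 0.2, 0.3],
    "gmos": [70.0, 60.0, 50.0, 40.0],
    "por": [0.8, 0.6, 0.4, 0.2],
})
print(pearson_with(toy, "por", ["mpi", "gmos"]))
```

In this notebook the same helper could be applied to `tmp` with `'Normalized PoR'` as the target and, for example, `"Average MPI in MFI's country"` and `'Portfolio Yield'` as columns.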

### 6. Bivariate distribution of PoR score and GMOS score for MFI's country
In the Figure each dot represents a single MFI. 

In [None]:
gmos = pd.read_csv('../input/gmos-data/gmos.csv')
# In gmos.csv the 'Overall score' column holds the country names and '2016' holds the 2016 scores
gmos_by_country = gmos.groupby('Overall score')['2016'].mean()
tmp['GMOS score for country'] = tmp['countries'].map(gmos_by_country)  # NaN if a country is missing
sns.jointplot(x=tmp['GMOS score for country'], y=tmp['Normalized PoR']);

A few countries have a high GMOS score (Colombia, El Salvador) and a few have a really low one (Haiti, Madagascar). The average score is 50.42 and the median is 51. Our formula for PoR is linear in GMOS, which is probably too strong a dependence (as the high Pearson r value suggests). Possibly GMOS<sup>α</sup> with α = 1/2 or α = 1/3 would be more appropriate.

In [None]:
print(tmp['GMOS score for country'].mean())
print(tmp['GMOS score for country'].median())
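
Because PoR is a product of factors, the effect of the proposed exponent can be seen by comparing two MFIs that differ only in GMOS: the ratio of their scores is (g_high / g_low)<sup>α</sup>. A sketch with hypothetical rescaled GMOS values of 0.3 and 0.6: for α = 1 the better environment doubles PoR, whereas α = 1/2 or α = 1/3 gives factors of about 1.41 and 1.26.

```python
def gmos_effect(g_low: float, g_high: float, alpha: float = 1.0) -> float:
    """Ratio PoR(g_high) / PoR(g_low) for two MFIs identical except for GMOS,
    when GMOS enters PoR as GMOS**alpha (alpha = 1 is the current formula)."""
    return (g_high / g_low) ** alpha


for alpha in (1.0, 1.0 / 2.0, 1.0 / 3.0):
    # Hypothetical GMOS values of 0.3 and 0.6 (rescaled to 0-1)
    print(f"alpha = {alpha:.2f}: PoR ratio = {gmos_effect(0.3, 0.6, alpha):.2f}")
```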

### 7. Bivariate distribution of PoR score and MFI's portfolio yield
In the Figure each dot represents a single MFI. 

In [None]:
tmp.columns

In [None]:
sns.jointplot(x=tmp['Portfolio Yield'], y=tmp['Normalized PoR']);

The PoR formula penalizes both too high (harmful for a borrower) and too low (harmful for an MFI) portfolio yields. The average portfolio yield in this dataset is 33.21%.
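
The size of this penalty can be read off the yield term of f(MFI). The sketch below (the function name is illustrative) evaluates it for a few yields: at the dataset's average of 33.21% the factor is about 0.67, so an MFI with an average yield loses roughly a third of the score it would have at the 25% optimum, other things being equal; a yield equally far below 25% is penalized by the same amount.

```python
def yield_factor(portfolio_yield: float, optimum: float = 0.25) -> float:
    """Yield term of f(MFI): 1 at the optimum, decreasing symmetrically on both sides."""
    return 1.0 / (1.0 + 6.0 * abs(portfolio_yield - optimum))


for y in (0.15, 0.25, 0.3321, 0.50):
    print(f"yield {y:.2%}: factor {yield_factor(y):.3f}")
```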

### 8. Bivariate distribution of PoR score and total amount raised by MFI
In the Figure each dot represents a single MFI. 

In [None]:
sns.jointplot(x=tmp['Total Amount Raised in USD million']/1000000., y=tmp['Normalized PoR']);

The higher the total loan amount an MFI has raised since it began operating, the higher its PoR score. A Pearson correlation coefficient above 0.5 indicates a strong relation between PoR and the total amount raised. This is partly by construction, since the total amount raised enters PoR through the sigmoid term of f(MFI).

### 9. PoR distribution for MFIs operating in poor countries
This figure presents the distribution of PoR values for MFIs within each country. Coloured bars show the mean PoR score of the MFIs in a country; black lines show the uncertainty of that mean (by default, seaborn draws a 95% confidence interval). The blue line indicates the average PoR over all MFIs.

In [None]:
por.POR_norm.mean()

In [None]:
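# Bars: mean normalized PoR per country; error bars: seaborn's default confidence interval
ax = sns.barplot(x=por.POR_norm, y=por.countries)
ax.axvline(por.POR_norm.mean(), color='blue')  # average PoR over all MFIs
plt.ylabel('Countries');
plt.xlabel('Normalized PoR');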

The chart shows that although Colombia has the best MFI (as measured by PoR), its average score is slightly lower than El Salvador’s.

## IV. Conclusion

Our model is designed to concentrate on the welfare of poor borrowers. Hence, a loan score is low if the portfolio yield is much larger than 25%. One can argue that sustainability (the benefit for the intermediary) is better the higher the yield is. It is absolutely critical to decide when a loan is beneficial for a borrower. The success of a loan is not a low default ratio; it is when the money is invested (not consumed) and the borrower’s assets increase. Our calculations show that the situation is far from this goal. Improving MPI is difficult; increasing the GMOS factor and decreasing portfolio yields through better use of technology (e.g. blockchain) is easier to achieve.

### References

[1] German credit dataset,  https://archive.ics.uci.edu/ml/datasets/statlog+(german+credit+data)<br/>
[2] https://www.kaggle.com/uciml/german-credit<br/>
[3] Wolfson, Ben, Microfinance and Machine Learning: A Study of Loan Classification and Risk Princeton University Senior Theses (2017), https://dataspace.princeton.edu/jspui/handle/88435/dsp012j62s7474<br/>
[4] Largest P2P lending institution in US. Datasets from 2007 on: https://www.lendingclub.com/info/download-data.action.<br/>
[5] Hugh Sinclair, Confessions of a Microfinance Heretic: How Microlending Lost Its Way and Betrayed the Poor, Berrett-Koehler Publishers, 2012.<br/>
[6] Duvendack, M.; Palmer-Jones, R.; Copestake, J. G.; Hooper, L.; Loke, Y.; Rao, N. Systematic Review. What is the evidence of the impact of microfinance on the well-being of poor people? EPPI-Centre, Social Science Research Unit, Institute of Education, University of London, London, UK (2011) 184 pp. ISBN 978-1-907345-19-7 https://www.gov.uk/dfid- Department for International Development <br/>
[7]  Mader, Philip, The Political Economy of Microfinance: Financializing Poverty. London: Palgrave Macmillan, 2015. <br/>
[8] Ramesh S. Arunachalam, 2011: The Journey of Indian Micro-Finance: Lessons for the Future. Chennai: Aapti Publications.<br/>
[9] Duvendack, Maren, and Richard Palmer-Jones (2012), “High Noon for Microfinance Impact Evaluations: Re-investigating the Evidence from Bangladesh”, Journal of Development Studies, 2012, 1-17.<br/>
[10] Chemin, Matthieu (2008), “The Benefits and Costs of Microfinance: Evidence from Bangladesh”, Journal of Development Studies, Vol. 44, No. 4, 463-484, April 2008.<br/>
[11] Pitt, Mark and Shahidur Khandker (1998), “The Impact of Group-Based Credit Programs on Poor Households in Bangladesh: Does the Gender of Participants Matter?”, Journal of Political Economy, Vol. 106(5) (Oct. 1998), 958-996.<br/>
[12] Mark M. Pitt, Gunfight at the Not OK Corral: Reply to ‘High Noon for Microfinance’, Journal of Development Studies, Volume 48, 2012, Issue 12.<br/>
[13] Matthieu Chemin, Response to ‘High Noon for Microfinance Impact Evaluations’, Journal of Development Studies, Volume 48, 2012, Issue 12.<br/>
[14] Maren Duvendack & Richard Palmer-Jones, Response to Chemin and to Pitt, Journal of Development Studies, Volume 48, 2012, Issue 12.<br/>
[15] CSFI, Microfinance Banana Skins 2014, https://static1.squarespace.com/static/54d620fce4b049bf4cd5be9b/t/55339d1fe4b09a66347a3abe/1429445919704/Microfinance+Banana+Skins+2014+-+WEB.pdf<br/>
[16] Jessica Schicks (2013) The Sacrifices of Micro-Borrowers in Ghana – A Customer-Protection Perspective on Measuring Over-Indebtedness, The Journal of Development Studies, 49:9, 1238-1255, DOI: 10.1080/00220388.2013.775421<br/>
[17] Philip Mader, Questioning Three Fundamental Assumptions in Financial Inclusion, February 2016, https://opendocs.ids.ac.uk/opendocs/bitstream/handle/123456789/9566/ER176_ QuestioningThreeFundamentalAssumptionsinFinancialInclusion.pdf<br/>
[18] Planet Rating, MIMOSA Microfinance Index of Market Outreach and Saturation, mimosaindex.org/wp-content/uploads/2017/08/MIMOSA-1_0_final-110313.pdf<br/>
[19] World Bank, Global Findex dataset, https://globalfindex.worldbank.org<br/>
[20] https://www.themix.org/mixmarket/profiles<br/>
[21] Center for Financial Inclusion, Global Microscope 2016, https://www.centerforfinancialinclusion.org/publications-a-resources/global-microscope<br/>
[22] Hudon, M., Périlleux, A. (2014), What Explains Microfinance Distribution Surplus? A Stakeholder-oriented Approach, Quarterly Review of Economics and Finance, 54(2), pp. 147-157.<br/>
[23] Kiva ranking of 323 Field Partners , https://www.kiva.org/about/where-kiva-works<br/>
[24] https://www.poverty-action.org/study/determinants-microcredit-delinquency-philippines; Dean Karlan, Melanie Morten, Jonathan Zinman, A Personal Touch: Text Messaging for Loan Repayment, NBER Working Paper No. 17952, March 2012.<br/>
[25] Ira Lieberman, Paul DiLeo, Todd A. Watkins, and Anna Kanze, The Future of Microfinance Over the Next Ten years. https://responsiblefinanceforum.org/wp-content/uploads/2017/12/Ira-Microfinance-Revolution-or-Footnote_The-Future-of-Microfinance-Over-the-Next-10-Years-11-Dec-2017.pdf<br/>
[26] How does Kiva's loan repayment rate compare with other micro-finance lenders?, https://www.quora.com/How-does-Kivas-loan-repayment-rate-compare-with-other-micro-finance-lenders<br/>