# Data Science for Good: Kiva Crowdfunding - Augustin Jossa

### EXECUTIVE SUMMARY

Over the 671 205 loans studied on Kiva's platform, 423 081 were funded (__63%__).

There are three types of loans on Kiva's platform : 
-  Small loans (25 - 1000\$) representing 73% of the platform loans but only 37% of the amount shared on the platform. 
-  Big loans for micro entrepreneurs (1000 - 10 000\$) representing 26% of the platform loans but 61% of the amount shared on the platform. 
-  Few big investments projects (more than 10 000\$).

#### Small loans study (25 - 1000\$)
- The average funded amount among these loans is equal to __432\$__
- Most of these loans are used to build projects in food, retail, personal needs, services, housing and education
- __These loans are used for personal projects__ (medical check-up, buying a rickshaw for a son, buying medicines) and small microentrepreneurial projects (building new classes in a school, buying a new sewing machine or purchasing beverages, breads, vegetables and fruits...)
- Most of these loans are __funded by an average 13 people, and refunded in 13 months__
- Most of these activities are in Kenya, Cambodia, Pakistan, Tajikistan, and Colombia

#### Medium loans study (1000 - 10 000\$)
- Most of these loans are __financed by an average 50 people, and refunded in 17 months__ 
- The main part of these loans are used to build projects in food, retail, education services and clothing
- These loans are used for __professional projects needing a significant capital investment__ to purchase tools, equipments, and commodities.  
- Most of these loans are in Cambodia, Paraguay, Ecuador, Palestin and Lebanon. 

#### Big loans study (higher than 10 000\$)
- There are few big projects on Kiva platform (one 100 000\$ project and thirty 50 000\$ projects)
- The 100 000\$ project is __a 13 years investment to create jobs for women and farmers in Haïti__. 368 people decided to take part to this very impactful project.
- Retail, Education and services are the three main project categories for these loans
- These big loans are __used for highly impactful humanitarian projects__ (creating activity for 600 families in Ecuador, 600 fishermen in Tanzania, 800 farmers in Guatemala, developing solar homes in Zimbabwe, developing farming cooperatives in Rwanda...)
- Most of these loans are in __Rwanda, Tanzania, and Myanmar__

#### Unfunded loans
Studying unfunded loans on Kiva platform, it appears that: 
- Small loans are more likely to get financed on Kiva than big investment projects (most of unfunded loans are 10 000 to 50 000\$ loans)
- Loans implying more than 250 investors are less likely to get funded

Two facts could explain this situation : 
- __Kiva's marketing positioning__. The first message people can read on Kiva’s landing page is : "Lend to a woman entrepreneur in honor of International Women's Day". Users think that Kiva focuses on microfunding investments, even if these loans represent only 37% of the amount shared on the platform.
- __The average basket on Kiva platform__ is equal to 36\$. With 36\$ people may prefer to support a small project rather than contributing to a bigger project not depending on their own contribution

#### Ideas 
- __Creating a page to discover selected projects__ could be a good way to promote big humanitarian project
- __Inciting people to split a big loan into smaller recuring loans__ could be a good way to fund big projects and to increase customers annual average basket

### DATA IMPORTATION AND PREPROCESSING

In [7]:
#Import libraries
import pandas as pd
import numpy as np
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style="whitegrid")

#Import Kiva datasets
loans = pd.read_csv('../input/kiva_loans.csv')

#Crating views
studied_loans = loans.where((loans['funded_amount']>0)).dropna()
small_loans = loans.where((loans['funded_amount']>0)&(loans['funded_amount']<1000)).dropna()
medium_loans = loans.where((loans['funded_amount']>=1000)&(loans['funded_amount']<10000)).dropna()
big_loans = loans.where((loans['funded_amount']>10000)).dropna()
unfunded_loans = loans[loans['funded_amount'].isin([0])]

#Visualization functions
def top10(value,selection,dataset, operation=1, number=10):
    df = dataset[:][[selection,value]]
    if operation == 2: 
        df = df.groupby(selection).mean()
    else:
        df = df.groupby(selection).sum()
    df = df.sort_values(by=value, ascending=False)
    df_x = df.iloc[1:number+1][value]
    df_y = df.iloc[1:number+1].index.values
    plot=sns.barplot(x=df_x, y=df_y) 
    plt.xlabel('')
    if operation == 2: 
        plot.set(title="Mean of " +value + " by " + selection)
    else:
        plot.set(title=value + " by " + selection)
    return

In [8]:
# Understanding loans on Kiva platform
print(len(small_loans)/len(studied_loans))
print(sum(small_loans['funded_amount'])/sum(studied_loans['funded_amount']))
print(len(medium_loans)/len(studied_loans))
print(sum(medium_loans['funded_amount'])/sum(studied_loans['funded_amount']))
print(len(big_loans)/len(studied_loans))
print(sum(big_loans['funded_amount'])/sum(studied_loans['funded_amount']))

### DATA EXPLORATION

### Small loans study (25 - 1000\$)
- The average funded amount among these loans is equal to 432$
- Most of these loans are used to build projects in food, retail, personal needs, services, housing and education
- These loans are used for personal projects (medical check-up, buying a rickshaw for a son, buying medicines) and small microentrepreneurial projects (building new classes in a school, buying a new sewing machine or purchasing beverages, breads, vegetables and fruits...)
- Most of these loans are funded by an average 13 people, and refunded in 13 months
- Most of these activities are in Kenya, Cambodia, Pakistan, Tajikistan, and Colombia

In [9]:
small_loans.describe()

In [11]:
plt.figure(figsize=(20,4))
sns.distplot(small_loans[:]['funded_amount'])
plt.show()

plt.figure(figsize=(20,4))
plt.subplot(1,2,1)
top10('id','sector',small_loans, operation=1, number=10)
plt.subplot(1,2,2)
top10('funded_amount','sector',small_loans, operation=1, number=10)
plt.show()

plt.figure(figsize=(20,4))
plt.subplot(1,2,1)
top10('id','country',small_loans, operation=1, number=10)
plt.subplot(1,2,2)
top10('funded_amount','country',small_loans, operation=1, number=10)
plt.show()

small_loans.sort_values(by='funded_amount', ascending=False)[['funded_amount','use']].head(20)

### Medium loans study (1000 - 10 000\$)
- Most of these loans are __financed by an average 50 people, and refunded in 17 months__ 
- The main part of these loans are used to build projects in food, retail, education services and clothing
- These loans are used for __professional projects needing a significant capital investment__ to purchase tools, equipments, and commodities.  
- Most of these loans are in Cambodia, Paraguay, Ecuador, Palestin and Lebanon. 

In [13]:
medium_loans.describe()

In [14]:
plt.figure(figsize=(20,4))
sns.distplot(medium_loans[:]['funded_amount'])
plt.show()

plt.figure(figsize=(20,4))
plt.subplot(1,2,1)
top10('id','sector',medium_loans, operation=1, number=10)
plt.subplot(1,2,2)
top10('funded_amount','sector',medium_loans, operation=1, number=10)
plt.show()

plt.figure(figsize=(20,4))
plt.subplot(1,2,1)
top10('id','country',medium_loans, operation=1, number=10)
plt.subplot(1,2,2)
top10('funded_amount','country',medium_loans, operation=1, number=10)
plt.show()

medium_loans.sort_values(by='funded_amount', ascending=False)[['funded_amount','use']].head(20)

### Big loans study (higher than 10 000\$)
- There are few big projects on Kiva platform (one 100 000\$ project and thirty 50 000\$ projects)
- The 100 000\$ project is __a 13 years investment to create jobs for women and farmers in Haïti__. 368 people decided to take part to this very impactful project.
- Retail, Education and services are the three main project categories for these loans
- These big loans are __used for highly impactful humanitarian projects__ (creating activity for 600 families in Ecuador, 600 fishermen in Tanzania, 800 farmers in Guatemala, developing solar homes in Zimbabwe, developing farming cooperatives in Rwanda...)
- Most of these loans are in __Rwanda, Tanzania, and Myanmar__

In [15]:
print(len(big_loans[big_loans['funded_amount'].isin([100000])]))
print(len(big_loans[big_loans['funded_amount'].isin([50000])]))

In [17]:
plt.figure(figsize=(20,4))
sns.distplot(big_loans[:]['funded_amount'])
plt.show()

plt.figure(figsize=(20,4))
plt.subplot(1,2,1)
top10('id','sector',big_loans, operation=1, number=10)
plt.subplot(1,2,2)
top10('funded_amount','sector',big_loans, operation=1, number=10)
plt.show()

plt.figure(figsize=(20,4))
plt.subplot(1,2,1)
top10('id','country',big_loans, operation=1, number=10)
plt.subplot(1,2,2)
top10('funded_amount','country',big_loans, operation=1, number=10)
plt.show()

big_loans.sort_values(by='funded_amount', ascending=False)[['funded_amount','use', 'country']].head(20)

### Unfunded loans
Studying unfunded loans on Kiva platform, it appears that: 
- Small loans are more likely to get financed on Kiva than big investment projects (most of unfunded loans are 10 000 to 50 000\$ loans)
- Loans implying more than 250 investors are less likely to get funded

Two facts could explain this situation : 
- __Kiva's marketing positioning__. The first message people can read on Kiva’s landing page is : "Lend to a woman entrepreneur in honor of International Women's Day". Users think that Kiva focuses on microfunding investments, even if these loans represent only 37% of the amount shared on the platform.
- __The average basket on Kiva platform__ is equal to 36\$. With 36\$ people may prefer to support a small project rather than contributing to a bigger project not depending on their own contribution

In [18]:
sum(studied_loans['funded_amount'])/sum(studied_loans['lender_count'])

In [None]:
plt.figure(figsize=(20,4))
plt.subplot(1,2,1)
sns.distplot(big_loans[:]['loan_amount'])

plt.subplot(1,2,2)
sns.distplot(big_loans[:]['lender_count'])
plt.show()