## KIVA LOANS ANALYSIS

#### Introduction 

Kiva is an online crowdfunding platform that extends financial services to poor and financially excluded people around the world. Kiva lenders have provided over $1 billion in loans to over 2 million people. To set investment priorities, help inform lenders, and understand their target communities, some fundamental questions need to be answered. However, this requires inference from a limited set of information about each borrower.

In this project, the country in focus is the Philippines (though Kenya will be featured in some questions). The key questions to be answered are as follows:

1. What were the top 10 loan uses?
2. Who are the most dominant field partners?
3. What are the names of the partners and their total loan amount?
4. What are the names of the partners and their total funded amount?
5. What are the names of the partners and their total number of loans?
6. Which are the Top Regions by loan amount?
7. What is the correlation of the numerical values using a heatmap?
8. What is the distribution of loan amount?
9. What is the distribution of funded amount? 
10. What is the distribution of the repayment term?
11. What is the lender count against the funded amount? Explain the obtained results. 
12. What is the distribution of funded amount by region?
13. What is the funded amount with the sector as the hue?
14. By using a boxplot show the regions against funded amount.
15. What is the repayment interval for different regions?
16. By using a boxplot show the loan amount in sectors.
17. What is the sector-wise classification of loans based on the top sectors of the top 20 regions?
18. What is the number of field partners by region (MPI data set)?
19. What is the total amount of loans by month? 
20. How many regions are there?
21. What is the number of loans per region?
22. What is the number of loans per sector?
23. Show the distribution of the disbursed time of the loans.
24. Show the distribution of the funded time of the loans.

##### I hope these findings will be an eye-opener and a motivation to take the necessary action.

#### Import the necessary libraries.
For EDA questions and data cleaning, please have a look at my other project "...

In [None]:
import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt
import plotly.express as px
import seaborn as sns 
import warnings 
warnings.filterwarnings('ignore')

In [None]:
kiva = pd.read_csv ('../input/data-science-for-good-kiva-crowdfunding/kiva_loans.csv')

In [None]:
philippines = kiva[kiva['country'] == 'Philippines'].reset_index(drop=True)

In [None]:
# Some loan uses are repeated, with only a "." or a " " differentiating them.
# Normalise the text so that these variants are counted together.

philippines['use'] = (
    philippines['use']
    .astype(str)                         # change to string type (missing values become 'nan')
    .str.replace('.', '', regex=False)   # remove periods
    .str.replace('-', ' ', regex=False)  # replace hyphens with spaces
    .str.strip()                         # trim leading and trailing spaces
)
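To illustrate why this normalisation matters, here is a minimal sketch on a small, made-up sample (the strings and the helper name `normalize_use` are assumptions for illustration, not part of the Kiva data or the code above). It shows how variants that differ only by a period or by spacing collapse into a single category once cleaned.

```python
import pandas as pd

# Hypothetical loan-use strings that differ only by punctuation or spacing.
sample_use = pd.Series([
    "to build a sanitary toilet for her family.",
    "to build a sanitary toilet for her family",
    " to build a sanitary toilet for her family ",
    "to buy feeds and other supplies to raise her pigs.",
])

def normalize_use(series):
    """Drop periods, turn hyphens into spaces, and trim surrounding whitespace."""
    return (
        series.astype(str)
        .str.replace(".", "", regex=False)
        .str.replace("-", " ", regex=False)
        .str.strip()
    )

normalized_use = normalize_use(sample_use)
use_counts = normalized_use.value_counts()
print(use_counts)
```

Without the cleaning step, `value_counts()` would treat the first three strings as three separate uses.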

#### 1. The top 10 loan uses in the data.

The table below shows the top 10 loan uses. The most common use is "to build a sanitary toilet for her family".

In [None]:
philippines.columns

In [None]:
philippines ['use'].value_counts().head(10). reset_index()

In [None]:
top10_loan_uses = philippines.groupby('use')['loan_amount'].count().sort_values (ascending = False).reset_index(). head (10)
top10_loan_uses

In [None]:
plt.figure (figsize = (12,6))
plt.xticks (rotation = 75)
sns.barplot (x= 'use', y= 'loan_amount', data = top10_loan_uses). set(ylabel= 'Count')

#### 2. The most dominant field partners.

The field partner with `partner_id` 145 is the most dominant partner, measured by number of loans.

In [None]:
philippines ['partner_id'].value_counts(). reset_index(). head(10)

In [None]:
dominant_field_partners = philippines.groupby ('partner_id')['lender_count'].count(). sort_values (ascending = False). reset_index().head(10)
dominant_field_partners

In [None]:
plt.figure (figsize = (12,6))
plt.xticks (rotation = 75)
sns.barplot (x= 'partner_id', y= 'lender_count', data = dominant_field_partners). set(ylabel = 'Count')

#### 3. The partner names and their total loan amount.

NWTF, a women-focused organization, has the highest total loan amount among the field partners.

In [None]:
loan_themes = pd.read_csv ('../input/data-science-for-good-kiva-crowdfunding/loan_themes_by_region.csv')
loan_themes.columns

In [None]:
phil_partners = loan_themes [loan_themes['country']== 'Philippines'].reset_index(drop = True)

In [None]:
partner_loans = phil_partners.groupby ('Field Partner Name')['amount'].sum(). sort_values (ascending = False). reset_index()
partner_loans

In [None]:
plt.figure (figsize = (12,6))
plt.xticks (rotation = 75)
sns.barplot (x ='amount', y = 'Field Partner Name', data = partner_loans)

#### 4. The partner names and their total funded amount.



In [None]:
# I had to merge loan_themes_by_region with Kiva loans in order to get the partner names and the funded amount.
loan_themes_by_region = pd.read_csv ('../input/data-science-for-good-kiva-crowdfunding/loan_themes_by_region.csv')

In [None]:
philippines_themes_region = loan_themes_by_region[loan_themes_by_region['country'] == 'Philippines'].reset_index(drop=True)

In [None]:
philippines_partner_name_ID = philippines_themes_region [["Partner ID", "Field Partner Name"]].drop_duplicates()
philippines_partner_name_ID.head()

In [None]:
# The key column name must match the one in the philippines DataFrame for the merge to be successful.
philippines_partner_name_ID.columns = ["partner_id", "field_partner_name"]

In [None]:
merge = pd.merge(philippines, philippines_partner_name_ID, on = 'partner_id').reset_index()

In [None]:
funded_amount_by_partner = merge. groupby('field_partner_name')['funded_amount'].sum().sort_values (ascending = False).reset_index()
funded_amount_by_partner
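The merge step can be sanity-checked on a toy example. The sketch below uses made-up partner IDs, names, and amounts (all hypothetical) and relies on the `how`, `validate`, and `indicator` options of `DataFrame.merge` to keep unmatched loans visible and to confirm that each partner ID maps to a single name. It is one possible way to guard the merge, not the approach used above.

```python
import pandas as pd

# Hypothetical loans table: partner 999 has no entry in the lookup table.
loans = pd.DataFrame({
    "partner_id": [145, 145, 126, 999],
    "funded_amount": [300, 250, 400, 100],
})

# Hypothetical partner lookup with the original column names.
partners = pd.DataFrame({
    "Partner ID": [145, 126],
    "Field Partner Name": ["Partner A", "Partner B"],
})

# Rename so the key column matches the loans table.
lookup = partners.rename(
    columns={"Partner ID": "partner_id", "Field Partner Name": "field_partner_name"}
).drop_duplicates()

# A left merge keeps every loan; validate raises if a partner_id appears twice in lookup.
merged = loans.merge(
    lookup, on="partner_id", how="left", validate="many_to_one", indicator=True
)

# Loans whose partner_id was not found in the lookup table.
unmatched = merged[merged["_merge"] == "left_only"]

# Total funded amount per partner name (rows without a name are dropped by groupby).
totals = (
    merged.groupby("field_partner_name")["funded_amount"]
    .sum()
    .sort_values(ascending=False)
)
print(totals)
print(len(unmatched), "loan(s) without a matching partner name")
```

With the default inner merge, the unmatched loan would disappear silently, so checking `unmatched` first shows how much funding the per-partner totals leave out.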

#### 5. The partner names and their total number of loans

In [None]:
# Note: this counts the rows of loan_themes_by_region (one per loan theme and region) for each partner.
no_loans_by_partner = phil_partners.groupby('Field Partner Name')['amount'].count().sort_values(ascending=False).reset_index()
no_loans_by_partner

In [None]:
plt.figure (figsize = (12,6))
plt.xticks (rotation = 75)
sns.barplot (x ='amount', y = 'Field Partner Name', data = no_loans_by_partner).set (xlabel = 'No.of Loans')

#### 6. Top Regions by loan amount

In [None]:
loan_amount_by_regions = philippines.groupby ('region')['loan_amount'].sum().sort_values(ascending = False). reset_index().head(10)
loan_amount_by_regions 

In [None]:
fig = px.pie(loan_amount_by_regions, values ='loan_amount', names='region', title='Loan Amount by Regions')
fig

#### 7. The correlation of the numerical values using a heatmap.

In [None]:
philippines.columns

In [None]:
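# Keep only the numeric columns and use every row: the *_time columns hold timestamp strings,
# and calling .head() here would restrict the correlation to the first five loans.
num_values = philippines[['funded_amount', 'loan_amount', 'term_in_months', 'lender_count']]
num_values.head()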

In [None]:
num_values.corr()

In [None]:
plt.figure(figsize = (5,5))
sns.heatmap (num_values.corr(), annot = True, cmap = 'coolwarm')
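As a hedged illustration of the column selection behind a correlation heatmap, the sketch below builds a tiny made-up DataFrame (the values are hypothetical) and uses `select_dtypes` to keep only the numeric columns before calling `corr()`. Text columns such as timestamps stored as strings are excluded automatically.

```python
import pandas as pd

# Hypothetical sample with numeric columns and one string timestamp column.
sample = pd.DataFrame({
    "funded_amount": [100, 200, 300, 400],
    "loan_amount": [100, 200, 300, 400],
    "term_in_months": [12, 8, 14, 10],
    "lender_count": [4, 8, 12, 16],
    "posted_time": ["2014-01-01", "2014-01-02", "2014-01-03", "2014-01-04"],
})

# Keep only numeric columns; posted_time is a string column and is dropped.
numeric_cols = sample.select_dtypes(include="number")

# Pairwise Pearson correlation between the numeric columns.
corr_matrix = numeric_cols.corr()
print(corr_matrix.round(2))
```

The resulting `corr_matrix` is a square DataFrame that can be passed directly to `sns.heatmap`.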

#### 8. The distribution of loan amount.

In [None]:
philippines.dropna (subset =['borrower_genders'], inplace = True)

In [None]:
fig = px.histogram (philippines, x = 'loan_amount', facet_row = 'borrower_genders')
fig.update_layout(yaxis_range = [0,1000],xaxis_range=[0, 4000])
fig

#### 9. The distribution of funded amount.

In [None]:
px.histogram (philippines, x = 'funded_amount',range_x = [0,1000], color = 'borrower_genders')

#### 10. The distribution of repayment term.

In [None]:
# The repayment term is stored in term_in_months (repayment_interval is covered in question 15).
px.histogram(philippines, x='term_in_months')

#### 11. The lender count against the funded amount. Explain the obtained results. 

In [None]:
lender_count_funded_amount = philippines.groupby('borrower_genders')[['lender_count', 'funded_amount']].sum().sort_values (by ='funded_amount').reset_index()
lender_count_funded_amount

In [None]:
px.scatter (philippines, x = 'funded_amount', y = 'lender_count', color = 'borrower_genders', size = 'funded_amount', hover_data = ['loan_amount'])
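One way to put numbers behind the scatter plot is to compute the correlation between the two columns and the average amount contributed per lender. The sketch below does this on made-up values (hypothetical, not taken from the Kiva data), so it shows only the mechanics and not the actual result for the Philippines.

```python
import pandas as pd

# Hypothetical loans: funded amount and number of lenders per loan.
sample = pd.DataFrame({
    "funded_amount": [100, 250, 500, 1000],
    "lender_count": [4, 9, 18, 35],
})

# Pearson correlation between funded amount and lender count.
pearson_r = sample["funded_amount"].corr(sample["lender_count"])

# Average contribution per lender for each loan.
sample["amount_per_lender"] = sample["funded_amount"] / sample["lender_count"]

print(f"Pearson r: {pearson_r:.3f}")
print(sample)
```

A correlation close to 1 together with a roughly constant `amount_per_lender` would suggest that larger loans are funded by more lenders rather than by larger individual contributions.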

#### 12. The distribution of funded amount by region. 

In [None]:
dist_by_region = philippines.groupby ('region')['funded_amount'].sum().sort_values (ascending = False). reset_index().head (10)
dist_by_region

In [None]:
plt.figure (figsize = (12,6))
plt.xticks (rotation = 75)
sns.barplot (x ='region', y = 'funded_amount', data = dist_by_region)

#### 13. The funded amount with the sector as the hue. 

In [None]:
funded_amount_by_sector = philippines.groupby ('sector')['funded_amount'].sum().sort_values (ascending = False). reset_index().head (10)
funded_amount_by_sector

In [None]:
px.scatter (funded_amount_by_sector, x = 'funded_amount', y = 'sector', color = 'sector', size = 'funded_amount')

#### 14. A box plot of regions against funded amount.

In [None]:
dist_by_region.head()

In [None]:
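# A box plot needs the individual loans, not one summed value per region,
# so filter the raw rows to the top 10 regions found in dist_by_region.
top_region_loans = philippines[philippines['region'].isin(dist_by_region['region'])]
fig2 = px.box(top_region_loans, y="funded_amount", x='region')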
fig2.show()
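A box plot summarises the spread of individual values, so it needs one row per loan rather than one total per region. The sketch below uses made-up regions and amounts (hypothetical) to pick the top region by total funded amount and then summarise its raw loans with `describe()`, which reports the same quartiles a box plot draws.

```python
import pandas as pd

# Hypothetical loans in two regions.
sample = pd.DataFrame({
    "region": ["A", "A", "A", "A", "B", "B", "B"],
    "funded_amount": [100, 200, 300, 400, 150, 150, 600],
})

# Rank regions by total funded amount and keep the top one.
top_regions = sample.groupby("region")["funded_amount"].sum().nlargest(1).index

# Go back to the individual loans for those regions.
raw_top = sample[sample["region"].isin(top_regions)]

# Quartiles per region: the statistics a box plot is drawn from.
summary = raw_top.groupby("region")["funded_amount"].describe()
print(summary)
```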

#### 15. The repayment interval for different regions.

In [None]:
repayment_interval = philippines.groupby (['region', 'repayment_interval'])['loan_amount'].sum().sort_values(ascending = False). reset_index().head (10)
repayment_interval

In [None]:
philippines.columns

In [None]:
plt.figure (figsize=(10,5))
plt.xticks (rotation=75)
plt.title ('Repayment_interval by Region')
sns.barplot(x='region', y='loan_amount', data=repayment_interval, ci =None, estimator=np.sum, hue= 'repayment_interval')

#### 16. Boxplots of the loan amount in sectors.

In [None]:
fig3 = px.box (philippines, y = 'loan_amount', x = 'sector', height = 800, color = 'sector')
fig3.update_yaxes (
    range = (0,2000), constrain= 'domain')
fig3.show ()

#### 17. Sector-wise classification of loans based on the top sectors of the top 20 regions.

In [None]:
loan_amount_by_sector_tp20regions = philippines.groupby (['sector', 'region'])['loan_amount'].sum().sort_values (ascending = False). reset_index().head (20)
loan_amount_by_sector_tp20regions

In [None]:
px.bar (data_frame =loan_amount_by_sector_tp20regions, x = 'region', y = 'loan_amount', title = 'Loan Amount by Region', color = 'sector')

#### 18. The number of field partners by region (MPI data set).

In [None]:
phil_partners.columns

In [None]:
# Use nunique() so that each field partner is counted once per region, however many rows it has.
phil_partners_by_mpiregion = phil_partners.groupby('mpi_region')['Partner ID'].nunique().sort_values(ascending=False).reset_index()
phil_partners_by_mpiregion

In [None]:
px.bar (data_frame =phil_partners_by_mpiregion, x = 'mpi_region', y = 'Partner ID')
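When counting field partners per region, counting rows and counting distinct partner IDs can give different answers, because a partner may appear in several rows (for example, once per loan theme). The sketch below shows the difference on a made-up table (region names and IDs are hypothetical).

```python
import pandas as pd

# Hypothetical rows: partner 145 appears twice in Region A.
sample_partners = pd.DataFrame({
    "mpi_region": ["Region A", "Region A", "Region A", "Region B"],
    "Partner ID": [145, 145, 126, 145],
})

# Number of rows per region (repeated partners are counted every time).
row_counts = sample_partners.groupby("mpi_region")["Partner ID"].count()

# Number of distinct partners per region.
unique_counts = sample_partners.groupby("mpi_region")["Partner ID"].nunique()

print(pd.DataFrame({"rows": row_counts, "unique_partners": unique_counts}))
```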

#### 19. The months and the total number of loans in that month.

In [None]:
philippines ['date']

In [None]:
philippines['date'] = pd.to_datetime (philippines['date'])
philippines['date']

In [None]:
philippines['date'].dt.year

In [None]:
philippines['date'].dt.month

In [None]:
philippines['date'].dt.day

In [None]:
philippines['year']= philippines['date'].dt.year
philippines['month']= philippines['date'].dt.month
philippines['day']= philippines['date'].dt.day
philippines.sample()

In [None]:
loans_by_month= philippines.groupby('month')['loan_amount'].count(). sort_values (ascending = False ). reset_index()
loans_by_month

In [None]:
px.bar(data_frame = loans_by_month, x = 'month', y = 'loan_amount', title= 'Loans by Month')
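Since "loans by month" can mean either the number of loans or the total amount lent, the sketch below computes both on a small made-up sample (dates and amounts are hypothetical). It uses `pd.to_datetime`, the `.dt.month` accessor, and a single `groupby(...).agg(...)` call, and keeps the months in calendar order with `sort_index()`.

```python
import pandas as pd

# Hypothetical loans with a date string and an amount.
sample_loans = pd.DataFrame({
    "date": ["2014-01-15", "2014-01-20", "2014-02-03", "2015-01-09"],
    "loan_amount": [200, 350, 125, 400],
})

# Parse the strings into datetimes and extract the calendar month.
sample_loans["date"] = pd.to_datetime(sample_loans["date"])
sample_loans["month"] = sample_loans["date"].dt.month

# Number of loans and total loan amount per month, in calendar order.
monthly = (
    sample_loans.groupby("month")["loan_amount"]
    .agg(["count", "sum"])
    .sort_index()
)
print(monthly)
```

Note that grouping by month alone pools the same month across different years; grouping by both year and month would keep them separate.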

In [None]:
philippines.columns

#### 20. The number of regions.

In [None]:
philippines['region']. nunique()

In [None]:
region_nos = philippines.groupby ('region')['id']. count (). reset_index()
region_nos

#### 21. The number of loans per region.

In [None]:
loan_nos_by_region = philippines.groupby ('region')['id'].count().sort_values(ascending = False). reset_index().head(10)
loan_nos_by_region

In [None]:
px.bar(data_frame = loan_nos_by_region, x = 'region', y = 'id', title= 'No. of Loans by Region')

#### 22. The number of loans per sector.

In [None]:
loan_nos_by_sector = philippines.groupby ('sector')['id'].count().sort_values(ascending = False). reset_index().head(10)
loan_nos_by_sector

In [None]:
px.bar(data_frame = loan_nos_by_sector, x = 'sector', y = 'id', title= 'No. of Loans by Sector')

In [None]:
philippines.columns

#### 23. The distribution of the disbursed time of the loans.

In [None]:
px.histogram (philippines, x = 'disbursed_time')

#### 24. The distribution of the funded time of the loans.

In [None]:
px.histogram (philippines, x = 'funded_time')