# INTRODUCTION

We will be looking at a Kaggle competition: Data Science for Good: Kiva Crowdfunding.
In this challenge, Kiva, an online crowdfunding platform, invites the community to help it build more localized models to estimate the poverty levels of residents in the regions where Kiva has active loans.
The aim is to explore the data using Python to help Kiva understand its borrowers and their poverty levels, so as to better assess and maximize the impact of its work. Participants are expected to develop their own creative approaches to the objective.
The columns of kiva_loans are explained below:
id - Unique ID for loan
funded_amount - The amount disbursed by Kiva to the field agent (USD)
loan_amount - The amount disbursed by the field agent to the borrower (USD)
activity - More granular category
sector - High level category
use - Exact usage of loan amount
country_code - ISO country code of country in which loan was disbursed
country - Full country name of country in which loan was disbursed
region - Full region name within the country
currency - The currency in which the loan was disbursed
partner_id - ID of partner organization
posted_time - The time at which the loan is posted on Kiva by the field agent
disbursed_time - The time at which the loan is disbursed by the field agent to the borrower
funded_time - The time at which the loan posted to Kiva gets funded by lenders completely
term_in_months - The duration for which the loan was disbursed in months
lender_count - The total number of lenders that contributed to this loan
tags
borrower_genders - Comma separated M,F letters, where each instance represents a single male/female in the group
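
Because borrower_genders packs a whole group into one string, it usually has to be split before it can be counted. The sketch below shows one way to do that on a small made-up frame. The `sample` frame, the `count_token` helper and the `female`/`male` spellings are assumptions for illustration; check how the real column spells its values before reusing it.

```python
import pandas as pd

# Hypothetical sample rows; the real column may use other spellings (e.g. 'M'/'F')
sample = pd.DataFrame({'borrower_genders': ['female', 'male, female, female', None]})

def count_token(value, token):
    """Count how often `token` appears in one comma-separated string."""
    if not isinstance(value, str):  # missing values count as zero borrowers
        return 0
    return sum(1 for t in value.split(',') if t.strip() == token)

sample['n_female'] = sample['borrower_genders'].apply(count_token, token='female')
sample['n_male'] = sample['borrower_genders'].apply(count_token, token='male')
sample['group_size'] = sample['n_female'] + sample['n_male']
print(sample)
```

Comparing whole tokens rather than substrings matters here, since 'male' is contained in 'female'.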

# Importing libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
import plotly.graph_objs as go 
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
init_notebook_mode(connected=True) 

In [None]:
# py is not used below; in plotly 4+ this module lives in the separate chart_studio package
import plotly.plotly as py

# Reading and Previewing the data

## kiva_loans 

In [None]:
kiva_loans=pd.read_csv('kiva_loans.csv')
kiva_loans.head()
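
The time columns (posted_time, disbursed_time, funded_time) are read as plain strings by `pd.read_csv` unless they are converted. A minimal sketch of the conversion, using a tiny inline CSV instead of the real file; the two rows and the `days_to_fund` column are made up for illustration:

```python
import io
import pandas as pd

# Hypothetical two-row extract standing in for kiva_loans.csv
csv_text = """id,posted_time,funded_time
1,2014-01-01 06:12:39+00:00,2014-01-02 10:06:32+00:00
2,2014-01-01 06:51:08+00:00,
"""
loans = pd.read_csv(io.StringIO(csv_text))

# Convert the string columns to timezone-aware timestamps (missing values become NaT)
for col in ['posted_time', 'funded_time']:
    loans[col] = pd.to_datetime(loans[col], utc=True)

# Time between posting and full funding, in days (NaN for loans not yet funded)
loans['days_to_fund'] = (loans['funded_time'] - loans['posted_time']).dt.total_seconds() / 86400
print(loans)
```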

## kiva_mpi_region_locations

In [None]:
kiva_mpi_region_locations=pd.read_csv('kiva_mpi_region_locations.csv')
kiva_mpi_region_locations.head()

## loan_theme_ids

In [None]:
loan_theme_ids=pd.read_csv('loan_theme_ids.csv')
loan_theme_ids.head()

## loan_themes_by_region

In [None]:
loan_themes_by_region=pd.read_csv('loan_themes_by_region.csv')
loan_themes_by_region.head()

# Unique values in each column per Dataframe

## kiva_loans

In [None]:
unique_values=kiva_loans.nunique()
print('Count of unique values in each column :')
print(unique_values)

## kiva_mpi_region_locations

In [None]:
unique_values_1=kiva_mpi_region_locations.nunique()
print('Count of unique values in each column :')
print(unique_values_1)

## loan_theme_ids

In [None]:
unique_values_2=loan_theme_ids.nunique()
print('Count of unique values in each column :')
print(unique_values_2)

## loan_themes_by_region

In [None]:
unique_values_3=loan_themes_by_region.nunique()
print('Count of unique values in each column :')
print(unique_values_3)

# Statistical Overview of the data

## Checking the info of the dataset

In [None]:
kiva_loans.info()

## Description for numerical features

In [None]:
kiva_loans.describe()

## Description to include categorical features

In [None]:
kiva_loans.describe(include=['O'])

# Cleaning the data

## Checking for missing values

Missing values in kiva_loans

In [None]:
total=kiva_loans.isnull().sum().sort_values(ascending = False)
# share of missing values per column, in percent
percentage=(kiva_loans.isnull().mean()*100).sort_values(ascending = False)
missing_kiva_loans  = pd.concat([total, percentage], axis=1, keys=['Total', 'Percentage'])
missing_kiva_loans

Missing values in kiva_mpi_region_locations 

In [None]:
total=kiva_mpi_region_locations.isnull().sum().sort_values(ascending = False)
# share of missing values per column, in percent
percentage=(kiva_mpi_region_locations.isnull().mean()*100).sort_values(ascending = False)
missing_kiva_location  = pd.concat([total, percentage], axis=1, keys=['Total', 'Percentage'])
missing_kiva_location

Missing values in loan_theme_ids

In [None]:
total=loan_theme_ids.isnull().sum().sort_values(ascending = False)
# share of missing values per column, in percent
percentage=(loan_theme_ids.isnull().mean()*100).sort_values(ascending = False)
missing_loan_theme_ids  = pd.concat([total, percentage], axis=1, keys=['Total', 'Percentage'])
missing_loan_theme_ids

Missing values in loan_themes_by_region

In [None]:
total=loan_themes_by_region.isnull().sum().sort_values(ascending = False)
# share of missing values per column, in percent
percentage=(loan_themes_by_region.isnull().mean()*100).sort_values(ascending = False)
missing_loan_themes_by_region  = pd.concat([total, percentage], axis=1, keys=['Total', 'Percentage'])
missing_loan_themes_by_region
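
The four cells above repeat the same three steps, so they could be folded into one helper. A minimal sketch, assuming a hypothetical function name `missing_summary` and a made-up `demo` frame; it reports the share of missing values on a 0 to 100 scale:

```python
import numpy as np
import pandas as pd

def missing_summary(df):
    """Return the count and percentage of missing values per column, largest first."""
    total = df.isnull().sum()
    percentage = df.isnull().mean() * 100  # mean of a boolean mask = fraction missing
    summary = pd.concat([total, percentage], axis=1, keys=['Total', 'Percentage'])
    return summary.sort_values('Total', ascending=False)

# Made-up frame to show the output shape
demo = pd.DataFrame({
    'a': [1, np.nan, 3, np.nan],
    'b': ['x', 'y', None, 'z'],
    'c': [1, 2, 3, 4],
})
demo_missing = missing_summary(demo)
print(demo_missing)
```

In the notebook this would be called as `missing_summary(kiva_loans)`, and likewise for the other three frames.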

# Exploratory Data Analysis

In [None]:
kiva_loans.head(2)

## The relationship among numerical variables 

In [None]:
sns.pairplot(kiva_loans,palette='rainbow')

## Distribution of Loan amount

In [None]:
# histplot replaces the deprecated distplot (seaborn >= 0.11)
sns.histplot(kiva_loans['loan_amount'],bins=30)
plt.title('Distribution of Loan amount')

All loans are below 20,000 USD.
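
A 30-bin histogram can hide a handful of very large values, because their bars are too short to see, so a reading like this is worth confirming numerically. A minimal sketch on made-up amounts; the `amounts` series and its values are hypothetical and say nothing about the real data:

```python
import pandas as pd

# Made-up loan amounts with one large outlier
amounts = pd.Series([25, 500, 1200, 3000, 50000], name='loan_amount')
threshold = 20000

largest = amounts.max()                         # the single largest amount
share_below = (amounts < threshold).mean()      # fraction of rows under the threshold
tail_quantiles = amounts.quantile([0.5, 0.99])  # median and upper tail

print(largest, share_below)
print(tail_quantiles)
```

On the real frame the same three lines would run against `kiva_loans['loan_amount']`.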

## Distribution of loan amount per sector

In [None]:
plt.figure(figsize=(12,8))
sns.barplot(x='sector',y='loan_amount',data=kiva_loans)
plt.xticks(rotation=45)
plt.title('Distribution of loan amount per sector')

The bars show the mean loan amount per sector. The entertainment sector has the highest mean loan amount, followed by the wholesale sector. Personal use has the lowest.
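
Since `sns.barplot` plots the mean of `y` for each category by default, the same ranking can be read off a `groupby`, and the median and row count help judge how much a few large loans drive it. A minimal sketch on made-up rows; the `toy` frame and its values are hypothetical:

```python
import pandas as pd

# Made-up rows standing in for kiva_loans
toy = pd.DataFrame({
    'sector': ['Retail', 'Retail', 'Housing', 'Housing', 'Housing'],
    'loan_amount': [100, 300, 50, 150, 100],
})

# Mean (what the bar height shows), median and number of loans per sector
by_sector = (toy.groupby('sector')['loan_amount']
                .agg(['mean', 'median', 'count'])
                .sort_values('mean', ascending=False))
print(by_sector)
```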

## Distribution of funded amount

In [None]:
# histplot replaces the deprecated distplot (seaborn >= 0.11)
sns.histplot(kiva_loans['funded_amount'],bins=30)
plt.title('Distribution of Funded amount')

All funded amounts are below 20,000 USD.

## Distribution of funded amount per sector

In [None]:
plt.figure(figsize=(12,8))
sns.barplot(x='sector',y='funded_amount',data=kiva_loans)
plt.xticks(rotation=45)
plt.title('Distribution of funded amount per sector')

The wholesale sector has the highest mean funded amount, followed by the entertainment sector. Personal use has the lowest, followed by housing.

In [None]:
kiva_loans.head(2)

In [None]:
data = dict(
        type = 'choropleth',
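        # assumes country_code holds two-letter codes, which plotly's default 'ISO-3' mode cannot match
        locations = kiva_loans['country'],
        locationmode = 'country names',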
        z = kiva_loans['loan_amount'],
        text = kiva_loans['country'],
        colorbar = {'title' : 'loan amount'},
      ) 
#Layout
layout = dict(
    title = 'Kiva Global Loan distribution',
    geo = dict(
        showframe = True,
        projection = {'type':'mercator'}
    )
)
#Plotting the graph
choromap = go.Figure(data = [data],layout = layout)
iplot(choromap)

In [None]:
choropleth = go.Choropleth(z = kiva_loans['loan_amount'],
                           locations = kiva_loans['country'],
                           locationmode = 'country names')
#Layout
layout = go.Layout(title = 'Loan amount distribution in the world',
                   geo = {'projection': {'type': 'mercator'}})  # projection type assumed
#Drawing the graph
figure = go.Figure(data = [choropleth], layout = layout)
plot(figure)
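
A choropleth colors each country by a single value, so passing one row per loan leaves it unclear which loan a country's color reflects. Aggregating to one row per country first makes the map well defined. A minimal sketch on made-up rows, building the trace as a plain dict in the style of the first map; the `toy` frame, the choice of the sum as the aggregate, and the titles are assumptions:

```python
import pandas as pd

# Made-up rows standing in for kiva_loans
toy = pd.DataFrame({
    'country': ['Kenya', 'Kenya', 'Peru'],
    'loan_amount': [300, 200, 1000],
})

# One row per country: total loan amount (mean or count would work the same way)
by_country = toy.groupby('country', as_index=False)['loan_amount'].sum()

trace = dict(
    type='choropleth',
    locations=by_country['country'],
    locationmode='country names',  # match on full country names
    z=by_country['loan_amount'],
    colorbar={'title': 'Total loan amount (USD)'},
)
layout = dict(
    title='Total loan amount per country',
    geo=dict(showframe=True, projection={'type': 'mercator'}),
)
print(by_country)
```

In the notebook, `iplot(go.Figure(data=[trace], layout=layout))` would then draw the map from `trace` and `layout`.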