
**Objective: Estimate the welfare level of borrowers in specific regions based on shared economic and demographic characteristics.**


A good solution would connect the features of each loan or product to one of several poverty mapping datasets, which indicate the average level of welfare in a region on as granular a level as possible. Many datasets indicate the poverty rate in a given area, with varying levels of granularity. Kiva would like to be able to disaggregate these regional averages by gender, sector, or borrowing behavior in order to estimate a Kiva borrower’s level of welfare using all of the relevant information about them. Strong submissions will attempt to map vaguely described locations to more accurate geocodes.
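
As one possible starting point for mapping loosely described locations onto the region names used in a poverty dataset, the sketch below uses fuzzy string matching from Python's standard library. The helper `match_region`, the `cutoff` value, and the sample names are hypothetical and only illustrate the idea; real geocoding would likely need more than string similarity.

```python
import difflib

def match_region(name, known_regions, cutoff=0.6):
    """Return the closest known region name, or None if nothing is close enough."""
    # Normalise case and surrounding whitespace before comparing
    lookup = {r.strip().lower(): r for r in known_regions}
    hits = difflib.get_close_matches(name.strip().lower(), list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None

# Made-up names for illustration only
mpi_regions = ["Badakhshan", "Nairobi", "Central Luzon"]
print(match_region("nairobi ", mpi_regions))      # differs only in case and whitespace
print(match_region("Centrl Luzon", mpi_regions))  # misspelled name
print(match_region("zzzz", mpi_regions))          # nothing similar
```

A stricter `cutoff` reduces false matches at the cost of leaving more locations unmatched, so the threshold is worth checking against a hand-labelled sample.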

Kernels submitted will be evaluated based on the following criteria:

1. Localization - How well does a submission account for highly localized borrower situations? Leveraging a variety of external datasets and successfully building them into a single submission will be crucial.

2. Execution - Submissions should be efficiently built and clearly explained so that Kiva’s team can readily employ them in their impact calculations.

3. Ingenuity - While there are many best practices to learn from in the field, there is no one way of using data to assess welfare levels. It’s a challenging, nuanced field and participants should experiment with new methods and diverse datasets.

Some questions to answer:

<a href='#who_barrowed_most'>1. Top 10 highest-funded countries.</a>

<a href='#sectrors_barrowed'>2. Top 10 highest-funded countries and the sectors they borrow for.</a>

<a href='#activities_funded'>3. Top 50 funded activities.</a>

<a href='#sectors_breakup'>4. Sectors and their funded activities.</a>

5. Which geographies and regions are taking loans?
6. What is the purpose of the loans, and can we gradually classify them?
7. How can we assess the welfare levels of the borrowers?
8. Who is poor?
9. Who deserves a loan?
10. What are the evaluation criteria given the current data?



In [1]:
print('Import all the needed libraries')


In [2]:
# Import all the needed libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from plotly.offline  import download_plotlyjs,init_notebook_mode,plot, iplot
import cufflinks as cf
init_notebook_mode(connected = True)
cf.go_offline()

In [3]:
print("Read the csv files into dataframes")


In [4]:
df1 = pd.read_csv("../input/kiva_loans.csv")
df2 = pd.read_csv("../input/loan_themes_by_region.csv")
df3 = pd.read_csv("../input/kiva_mpi_region_locations.csv")
df4 = pd.read_csv("../input/loan_theme_ids.csv")

In [5]:
# Total loan and funded amounts per country, highest funded first
country_loans = df1[['country', 'loan_amount', 'funded_amount']]
table = country_loans.groupby(by = 'country').sum().sort_values(by = 'funded_amount', ascending = False)
# Flatten to a plain table with integer amounts for printing and plotting
table1 = table.astype(int).reset_index().rename(
    columns = {'funded_amount': 'fund_amt', 'loan_amount': 'loan_amt'})
table1 = table1[['country', 'fund_amt', 'loan_amt']]


<a id='who_barrowed_most'></a>
### Top 10 highest-funded countries


In [6]:
print(table1.head(10))

In [7]:
table1.iplot(kind = 'bar', x = 'country', y = ['fund_amt','loan_amt'],xTitle = 'COUNTRIES', yTitle = 'FUNDED-AMOUNT', theme = 'solar')

### Observations:
#### 1. The United States, known to be a rich country, is surprisingly the 5th highest borrower.
#### 2. For the United States, the gap between the loan amount applied for and the funded amount is also large. Does that mean the USA has more loan applications?
#### 3. The Philippines is the highest borrower.
#### 4. The Virgin Islands is the lowest borrower, with a funded amount of 0: not funded even though loan applications exist.
#### 5. Guam has the lowest funded amount among the countries that received funding.


<a id='sectors_barrowed'></a>
### Top 10 highest-funded countries - their sectors of borrowing.


In [8]:
sec = df1['sector'].value_counts()
# Top 10 sectors with the highest number of loans.
print(sec.head(10))

In [9]:
x = df1.pivot_table(index = 'country', columns = 'sector', values = 'funded_amount', aggfunc = 'sum')
x = x[x>500]
x.fillna(0, inplace=True)
array = list(table1.head(10).country)
y = x.loc[x.index.isin(array)]
y.iplot(kind = 'heatmap', theme = 'solar')

<a id='sectrors_barrowed'></a>
### Top 10 funded countries and their sectors of funding

In [10]:
top_ten = list(table1.head(10).country.values)
sector_fund = df1[['country', 'sector', 'funded_amount']]
# Keep only loans from the top 10 countries (DataFrame.append was removed in pandas 2.0)
sector_max = sector_fund[sector_fund['country'].isin(top_ten)]
# Total funded amount per country (rows) and sector (columns)
sec_max = sector_max.pivot_table(index = 'country', columns = 'sector', values= 'funded_amount', aggfunc = 'sum')
sec_max.iplot(kind = 'bar',title = 'Country Funded Sectors',theme = 'solar')

### Observations:
#### 1. Agriculture, Food, and Retail are the highest funded sectors.
#### 2. Wholesale is the least funded sector.
#### 3. In the United States as well, the highest funded sector is Agriculture.

<a id='activities_funded'></a>
####  Top 50 activities  funded 

In [11]:
top_sectors = list(sec.head(10).index)
activity_fund = df1[['sector','activity', 'funded_amount']]
# Keep only loans from the top 10 sectors (DataFrame.append was removed in pandas 2.0)
activity_max = activity_fund[activity_fund['sector'].isin(top_sectors)]
# Total funded amount per (sector, activity) pair, keeping the 50 largest
activity_max = activity_max.groupby(['sector','activity']).sum().sort_values(by='funded_amount', ascending = False).head(50)
activity_max.iplot(kind = 'bar',title = 'Sectors Funded for Activities ',theme = 'solar')


#### Observations:
#### 1. Farming, in the Agriculture sector, is the most highly funded activity.
#### 2. Celebrations, under Personal Use, is the least funded activity (Celebrations?).
#### 3. General Stores, in Retail, is the second most highly funded activity.

<a id='sectors_breakup'></a>
#### Various sectors and their activity-wise breakup

In [12]:
sectors = list(df1['sector'].unique())
for item in sectors:
    trace = df1[df1['sector'] == item].activity.value_counts()
    trace.iplot(kind = 'bar', theme = 'solar', title = item )

#### Add Female, Male Counts columns 

In [13]:
# borrower_genders is a comma-separated list such as "female, male, female"
temp = df1['borrower_genders'].astype(str)
Male_count = []
Female_count = []
for item in temp.values:
    # Remove spaces so every entry is exactly 'male' or 'female'
    lst = item.strip().replace(' ', '').split(',')
    Male_count.append(lst.count('male'))
    Female_count.append(lst.count('female'))
df1['Male_count'] = Male_count
df1['Female_count'] = Female_count
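
The cell above counts genders with a Python loop. As an alternative sketch, the same counts can be computed with pandas' vectorised string methods. The toy series below is made up for illustration; on the real data the input would be `df1['borrower_genders']`.

```python
import pandas as pd

# Toy data for illustration; the real column is df1['borrower_genders']
genders = pd.Series(["female, male, female", "male", None])

# Missing values become empty strings so they count as zero borrowers
clean = genders.fillna("")
# \b word boundaries keep 'male' from matching inside 'female'
male_count = clean.str.count(r"\bmale\b")
female_count = clean.str.count(r"\bfemale\b")
print(pd.DataFrame({"male": male_count, "female": female_count}))
```

The word boundaries matter: a plain `str.count('male')` would also count every `female` entry.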

#### Countries and gender-wise distribution of funds

In [14]:
# Work on a copy so adding columns does not trigger chained-assignment warnings
country_gender = df1[['country', 'funded_amount', 'Male_count', 'Female_count']].copy()
# Split each loan's funded amount across genders in proportion to borrower counts
per_borrower = country_gender['funded_amount'] / (country_gender['Male_count'] + country_gender['Female_count'])
country_gender['Female_fund'] = per_borrower * country_gender['Female_count']
country_gender['Male_fund'] = per_borrower * country_gender['Male_count']
country_gender.drop(['funded_amount', 'Male_count', 'Female_count'], axis = 1, inplace = True)
gen = country_gender.groupby('country').sum().astype(int).sort_values(by = 'Female_fund', ascending = False)
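
The split used above assumes that every borrower on a group loan receives an equal share, which the data does not confirm. A minimal sketch of that arithmetic for a single loan, using a hypothetical helper `split_by_gender`:

```python
def split_by_gender(funded_amount, male_count, female_count):
    """Split one loan's funded amount pro rata by borrower head count."""
    total = male_count + female_count
    if total == 0:
        # No recorded borrowers: nothing can be attributed to either gender
        return 0.0, 0.0
    per_borrower = funded_amount / total
    return per_borrower * male_count, per_borrower * female_count

# A 300 loan shared by one man and two women
male_fund, female_fund = split_by_gender(300, 1, 2)
print(male_fund, female_fund)
```

In the pandas version, loans with no recorded genders produce NaN instead of zero, and the per-country `sum()` skips them, so both approaches leave such loans out of the totals.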


In [15]:
gen.iplot(kind = 'bar',xTitle = 'COUNTRIES', yTitle = 'FUNDED AMOUNT', theme='solar')

#### Observations: In almost every country, female borrowers receive more funding than male borrowers.

#### Countries where the Males are funded more than Females

In [16]:
gen[gen['Female_fund']< gen ['Male_fund']].iplot(kind = 'bar', theme = 'solar')


#### Countries, their regions, and their Multidimensional Poverty Index (MPI).

#### Countries and the number of regions within each country.

In [17]:

a = df1[['country', 'region']].drop_duplicates()
a.groupby(by = 'country').count().sort_values(by = 'region', ascending = False)

#a[a['country'] == 'Pakistan'].shape

#### World regions and their MPI.

In [18]:
b = df3[['country','MPI', 'region', 'world_region']]
#b.groupby('country').mean().sort_values('MPI', ascending = True)
# Standard deviation of MPI within each world region (spread, not average level)
world_mpi = b[['world_region', 'MPI']].groupby('world_region').std()


In [19]:
world_mpi.iplot(kind = 'bar', theme = 'solar')
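
The cell above aggregates MPI per world region with `std()`, which measures how much MPI varies within a region rather than how poor the region is on average. A small sketch showing both statistics side by side, on made-up values (the real data is in `df3`):

```python
import pandas as pd

# Made-up MPI values for illustration only
toy = pd.DataFrame({
    "world_region": ["A", "A", "B", "B"],
    "MPI": [0.10, 0.30, 0.50, 0.50],
})
# One row per world region, with the average level and the spread of MPI
summary = toy.groupby("world_region")["MPI"].agg(["mean", "std"])
print(summary)
```

Here region B is poorer on average but has no spread, so a chart of `std()` alone would rank it below region A.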

#### Countries and their MPI - poorest countries first

In [20]:
st = b[['country', 'MPI']].groupby('country').mean().dropna()
st = st.sort_values(by = 'MPI', ascending = False)
st.head(20).iplot(kind = 'bar', theme = 'solar')


#### Observation: The poorest countries (highest MPI) are African countries.

#### Country  regions and the MPI

In [21]:
region_mpi = b[['region', 'MPI']].groupby('region').mean().dropna()
region_mpi = region_mpi.sort_values(by = 'MPI', ascending = False)
region_mpi.head(50).iplot(kind = 'bar', theme = 'solar')
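
A possible next step toward the objective is to attach the regional MPI to each loan. The sketch below assumes the loan and MPI tables can be joined on exact `country` and `region` names, which is unlikely to hold for every row of the real data; the toy rows and MPI values are made up.

```python
import pandas as pd

# Toy rows for illustration; the real frames are df1 (loans) and df3 (MPI regions)
loans = pd.DataFrame({
    "country": ["Kenya", "Kenya", "Peru"],
    "region": ["Nairobi", "Kisumu", "Cusco"],
    "funded_amount": [500, 250, 1000],
})
mpi = pd.DataFrame({
    "country": ["Kenya", "Peru"],
    "region": ["Nairobi", "Cusco"],
    "MPI": [0.02, 0.07],  # made-up values
})

# Left join keeps every loan; indicator=True marks loans with no MPI match
joined = loans.merge(mpi, on=["country", "region"], how="left", indicator=True)
unmatched = joined[joined["_merge"] == "left_only"]
print(len(unmatched), "of", len(joined), "loans have no regional MPI")
```

Checking the unmatched share first shows how much location cleaning is needed before any welfare estimate built on the join can be trusted.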

2nd April : 12:36