> Kiva.org is an online crowdfunding platform to extend financial services to poor and financially excluded people around the world. [...] In Kaggle Datasets' inaugural Data Science for Good challenge, Kiva is inviting the Kaggle community to help them build more localized models to estimate the poverty levels of residents in the regions where Kiva has active loans.

# Problem Statement
For the locations in which Kiva has active loans, your objective is to pair Kiva's data with additional data sources to estimate the welfare level of borrowers in specific regions, based on shared economic and demographic characteristics.

A good solution would connect the features of each loan or product to one of several poverty mapping datasets, which indicate the average level of welfare in a region on as granular a level as possible. Many datasets indicate the poverty rate in a given area, with varying levels of granularity. Kiva would like to be able to disaggregate these regional averages by gender, sector, or borrowing behavior in order to estimate a Kiva borrower’s level of welfare using all of the relevant information about them. Strong submissions will attempt to map vaguely described locations to more accurate geocodes.

"The challenge will be finding existing poverty measurement and mapping projects and building a model that predicts poverty using the kinds of variables that Kiva collects. As you say, you can use whatever indicator allows for meaningful and relatively granular estimates when we feed Kiva's data into your model." - Elliot Collins

Kernels submitted will be evaluated based on the following criteria:

>1. Localization - How well does a submission account for highly localized borrower situations? Leveraging a variety of external datasets and successfully building them into a single submission will be crucial.

>2. Execution - Submissions should be efficiently built and clearly explained so that Kiva’s team can readily employ them in their impact calculations.

>3. Ingenuity - While there are many best practices to learn from in the field, there is no one way of using data to assess welfare levels. It’s a challenging, nuanced field and participants should experiment with new methods and diverse datasets.

```python
import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
```

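```python
# Oblast names plus spelling variants; used below to recognize
# region strings that name one of the oblasts
regions = ["Bishkek", "Chui", "Chuy", "Issyk-Kul", "Issykkul", "Naryn",
           "Batken", "Batket", "Jalal-Abad", "Jalalabad", "Jalalabat",
           "Talas", "Osh"]
```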

```python
# Load all Kiva loans, then keep only the loans made in Kyrgyzstan
loans_kyrgyzstan = pd.read_csv('../input/kiva_loans.csv')
loans_kyrgyzstan = loans_kyrgyzstan.loc[loans_kyrgyzstan.country == 'Kyrgyzstan']
```

# Factors Indicating Poverty in Kyrgyzstan
Based on our analysis of research conducted by the World Bank, we have identified several factors that indicate whether a person in Kyrgyzstan lives in relative poverty. Considered independently, none of these factors is a significant indicator of poverty; considered together, however, they can provide a meaningful one.
 
### Rural vs Urban
In 2008, 74 percent of the country's poor were rural residents, equal to 1.2 million people, while the rest of the poor (26 percent) resided in urban areas, totaling 430,000 people (source: World Bank study).
Based on these shares alone, the poor are far more likely to live in rural areas than in urban ones.
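As a quick check, the two head counts quoted above can be turned back into shares. The counts are rounded, so the result is approximate, but it confirms that 1.2 million out of roughly 1.63 million poor people is about 74 percent, leaving about 26 percent in urban areas.

```python
# Head counts quoted above from the World Bank study (2008), rounded in the source
rural_poor = 1_200_000
urban_poor = 430_000

total_poor = rural_poor + urban_poor
rural_share = rural_poor / total_poor
urban_share = urban_poor / total_poor

print(f"total poor: {total_poor:,}")      # total poor: 1,630,000
print(f"rural share: {rural_share:.1%}")  # rural share: 73.6%
print(f"urban share: {urban_share:.1%}")  # urban share: 26.4%
```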

### Mountainous vs Plains
The mountainous terrain of the Kyrgyz Republic increases the likelihood of poverty: the incidence of poverty is greatest in the mountains and lowest on the plains (source: World Bank study).

<a href="https://imgur.com/TKWnwLj"><img src="https://i.imgur.com/TKWnwLj.png" title="source: imgur.com" /></a>

### Without Children vs With Children (Household Size)
Nationally, 85% of poor households had one or more children, compared with 52% of non-poor households. Poor households are therefore more likely to have children.

<a href="https://imgur.com/cL8uuvB"><img src="https://i.imgur.com/cL8uuvB.png" title="source: imgur.com" /></a>


### Regional Levels of Poverty
Poverty levels also vary widely across Kyrgyzstan's regions, which are called oblasts. Some oblasts have much higher levels of poverty than others.
<a href="https://imgur.com/Z0uH3aq"><img src="https://i.imgur.com/Z0uH3aq.png" title="source: imgur.com" /></a>





---

## With all of these factors put together, we can more accurately map the level of poverty in which a person lives.

If a loan from Kyrgyzstan is not flagged for any of the poverty indicators, it receives a score of 0 out of 4. A score of 0 indicates that the borrower lives in an urban area on the plains rather than in the mountains, probably has no children, and does not live in a city or region with a higher level of poverty.

Each indicator that applies adds one point, so a score of 4 out of 4 indicates that the borrower lives in an environment where every indicator points to a high likelihood of poverty.
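The scoring rule can be sketched as a small function over the four True/False columns built later in this notebook. Because `urban` and `non_mount` are True when the poverty indicator is absent, they are inverted before summing. The function name `poverty_score` and the example rows are hypothetical, and every indicator is weighted equally, as described above.

```python
import pandas as pd

def poverty_score(flags: pd.DataFrame) -> pd.Series:
    """Count how many of the four poverty indicators apply to each loan (0-4).

    `urban` and `non_mount` are True when the indicator is ABSENT, so they are
    inverted; `children` and `high_poverty_region` are True when it is PRESENT.
    """
    indicators = pd.DataFrame({
        "rural": ~flags["urban"],
        "mountainous": ~flags["non_mount"],
        "children": flags["children"],
        "high_poverty_region": flags["high_poverty_region"],
    })
    # Each True counts as one point
    return indicators.sum(axis=1).astype(int)

# Hypothetical example rows, not real Kiva loans
example = pd.DataFrame({
    "urban": [True, False, True],
    "non_mount": [True, False, False],
    "children": [False, True, True],
    "high_poverty_region": [False, True, False],
})
print(poverty_score(example).tolist())  # [0, 4, 2]
```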
 


```python
# Reload only the columns needed for the region and loan-use analysis
file = pd.read_csv('../input/kiva_loans.csv', usecols=["country", "region", "use"])
```

##  These are the major cities within the country. We will use these as indicators of an urban area.

```python
# Major cities, including alternative spellings (with and without hyphens)
cities = ["Balykchy", "Batken", "Bishkek", "Bordunskiy", "Cholpon-Ata", "Cholponata",
          "Gulcha", "Isfana", "Jalal-Abad", "Jalalabad", "Kadamjay", "Kaindy", "Kant",
          "Kara-Balta", "Karabalta", "Karakol", "Kara-Suu", "Karasuu", "Kemin",
          "Kerben", "Ketmen'tebe", "Ketmentebe", "Khaidarkan", "Kochkor-Ata", "Kochkorata",
          "Kok-Janggak", "Kokjanggak", "Kok-Tash", "Koktash", "Kyzyl-Jar", "Kyzyljar",
          "Kyzyl-Kiya", "Kyzylkiya", "Mailuu-Suu", "Mailuusuu", "Naryn", "Nookat",
          "Orlovka", "Orto-Toyok", "Ortotoyok", "Osh", "Pristan'-Przheval'sk",
          "Pristanprzhevalsk", "Shamaldy-Say", "Shamaldysay", "Shopokov",
          "Sulukta", "Talas", "Tash-Komur", "Tashkomur", "Tokmok", "Toktogul", "Uzgen",
          "Vostochny"]
```

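```python
# Urban loans: the region names a major city and is not described as a village
city_pattern = '|'.join(cities)
urban_loans = loans_kyrgyzstan[
    loans_kyrgyzstan['region'].str.contains(city_pattern, na=False)
    & ~loans_kyrgyzstan['region'].str.contains('village', na=False)
]
urban_loans
```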
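`str.contains` treats the joined city list as a regular expression and matches it anywhere in the string, so a short name such as "Kant" also matches any longer name that merely contains it. One possible tightening, not what the cell above does, is to require word boundaries around each name and escape the names with `re.escape`. The region strings below are hypothetical, and the trade-off is that suffixed forms of a city name would no longer match.

```python
import re
import pandas as pd

# A few of the city names used above
sample_cities = ["Bishkek", "Osh", "Kant"]

# Hypothetical region strings, for illustration only
sample_regions = pd.Series(["Bishkek", "Osh region", "Kanty village", None])

# Plain substring matching: "Kant" also matches "Kanty village"
substring_match = sample_regions.str.contains("|".join(sample_cities), na=False)

# Word-boundary matching: each name must stand on its own
pattern = r"\b(?:" + "|".join(re.escape(c) for c in sample_cities) + r")\b"
word_match = sample_regions.str.contains(pattern, na=False)

print(substring_match.tolist())  # [True, True, True, False]
print(word_match.tolist())       # [True, True, False, False]
```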

```python
# Kyrgyz loans from the reduced-column table
subFile = file[file["country"] == "Kyrgyzstan"]
```

```python
# Count loans per oblast (Osh city is counted separately from Osh oblast).
# Checks run in order, so "Osh region" must be tested before the bare "Osh".
allReg = {
    "Bishkek": 0,
    "Chui": 0,
    "Issyk-Kul": 0,
    "Naryn": 0,
    "Batken": 0,
    "Jalal-Abad": 0,
    "Talas": 0,
    "Osh (city)": 0,
    "Osh": 0,
}

for x in subFile["region"]:
    x = str(x)
    if "Bishkek" in x: allReg["Bishkek"] += 1
    elif "Chui" in x or "Chuy" in x: allReg["Chui"] += 1
    elif "Issyk-Kul" in x or "Issykkul" in x: allReg["Issyk-Kul"] += 1
    elif "Naryn" in x: allReg["Naryn"] += 1
    elif "Batken" in x or "Batket" in x: allReg["Batken"] += 1
    elif "Jalal-Abad" in x or "Jalalabad" in x or "Jalalabat" in x: allReg["Jalal-Abad"] += 1
    elif "Talas" in x: allReg["Talas"] += 1
    elif "Osh region" in x: allReg["Osh"] += 1
    elif "Osh" in x: allReg["Osh (city)"] += 1

# Collect region strings that name none of the known oblasts
dicty = {}
for x in subFile["region"]:
    if not any(y in str(x) for y in regions):
        dicty[x] = dicty.get(x, 0) + 1
```
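The `if`/`elif` chain above can also be expressed as an ordered table of spelling variants, which keeps the mapping in one place and makes it easy to add variants as they turn up. This is a sketch: `OBLAST_VARIANTS` and `to_oblast` are hypothetical names, the variant lists only cover spellings that appear in this notebook, and the sample region strings are made up.

```python
import pandas as pd

# Canonical oblast -> spelling variants (assumed; extend as new spellings appear).
# Checked in order: "Osh region" must come before the bare "Osh" (treated as the city).
OBLAST_VARIANTS = [
    ("Bishkek", ["Bishkek"]),
    ("Chui", ["Chui", "Chuy"]),
    ("Issyk-Kul", ["Issyk-Kul", "Issykkul"]),
    ("Naryn", ["Naryn"]),
    ("Batken", ["Batken", "Batket"]),
    ("Jalal-Abad", ["Jalal-Abad", "Jalalabad", "Jalalabat"]),
    ("Talas", ["Talas"]),
    ("Osh", ["Osh region"]),
    ("Osh (city)", ["Osh"]),
]

def to_oblast(region):
    """Return the canonical oblast named in a free-text region, or None."""
    text = str(region)
    for oblast, variants in OBLAST_VARIANTS:
        if any(v in text for v in variants):
            return oblast
    return None

# Hypothetical region strings, for illustration only
sample_regions = pd.Series(["Bishkek", "Chuy region", "Jalal-Abad",
                            "Osh region", "Osh", "Ak-Tala village", None])

# value_counts drops the None results (unrecognized regions) by default
counts = sample_regions.map(to_oblast).value_counts()
print(counts.to_dict())
```

The first oblast whose variant appears in the string wins, which is how "Osh region" is kept apart from Osh city.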

## These are the number of loans in each region of the country. Region strings that name none of these regions (for example, many village entries) are not counted here.

```python
for name, count in allReg.items():
    print(name, count)
```

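```python
# Poverty figure for each oblast, presumably read from the regional chart above;
# Osh city and Osh oblast share one value
poor_dist = {
    "Bishkek": 8,
    "Chui": 7,
    "Issyk-Kul": 14,
    "Naryn": 7,
    "Batken": 5,
    "Jalal-Abad": 24,
    "Talas": 6,
    "Osh (city)": 30,
    "Osh": 30,
}
```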
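`poor_dist` is defined above but not yet attached to the loans. Assuming its values are per-oblast poverty figures from the regional chart, and that each loan's region has already been normalized to one of its keys, `Series.map` can attach the figure to every loan. The sample loans below are hypothetical.

```python
import pandas as pd

# Same values as the poor_dist dictionary above
poor_dist = {"Bishkek": 8, "Chui": 7, "Issyk-Kul": 14, "Naryn": 7, "Batken": 5,
             "Jalal-Abad": 24, "Talas": 6, "Osh (city)": 30, "Osh": 30}

# Hypothetical loans whose region has already been normalized to an oblast name
sample_loans = pd.DataFrame({"id": [1, 2, 3],
                             "oblast": ["Naryn", "Osh (city)", "Unknown"]})

# Series.map with a dict leaves unmatched keys as NaN, so gaps stay visible
sample_loans["regional_poverty"] = sample_loans["oblast"].map(poor_dist)
print(sample_loans)
```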


## Since most of Kyrgyzstan is mountainous, below we have included a list of the non-mountainous regions within the country

```python
non_mount_regions = ["Karabalta", "Kara-Balta", "Bishkek", "Tokmok", "Talas", "Kirov",
                     "Tash-Komur", "Tashkomur", "Jalal-Abad", "Jalalabad", "Osh",
                     "Kyzyl-Kiya", "Kyzylkiya"]
```

```python
# Loans whose region names a non-mountainous area
non_mountainous_loans = loans_kyrgyzstan[
    loans_kyrgyzstan['region'].str.contains('|'.join(non_mount_regions), na=False)
]
non_mountainous_loans
```

## In Kyrgyzstan, having children in the household can indicate a higher level of poverty.
### Because the loan data does not record the number of children in the household, we use the description of the loan's use as a proxy for whether the borrower has children.

The code queries the "use" column for phrases such as "her child", "her children", "his child" etc.
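Plain substring matching has a pitfall here: "another child" contains the substring "her child". This sketch, using hypothetical `use` strings, compares a substring pattern with one that requires a word boundary (`\b`) before the pronoun. Neither version catches other phrasings such as "their children" or "my son", so this remains a rough proxy.

```python
import pandas as pd

# Hypothetical loan "use" descriptions, for illustration only
sample_uses = pd.Series([
    "to pay school fees for her children",
    "to buy clothes for his child",
    "to pay school fees for another child",
    "to buy a cow",
    None,
])

# Substring pattern: "another child" is a false positive
loose = sample_uses.str.contains("her child|his child", na=False)

# Word boundary before the pronoun rules out matches inside "another", "other", ...
strict = sample_uses.str.contains(r"\b(?:her|his) child", na=False)

print(loose.tolist())   # [True, True, True, False, False]
print(strict.tolist())  # [True, True, False, False, False]
```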

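```python
# Flag loans whose "use" text mentions the borrower's child or children.
# \b prevents false matches inside words, e.g. "another child".
children_loans = loans_kyrgyzstan[
    loans_kyrgyzstan["use"].str.contains(r"\b(?:her|his) child", na=False)
]
children_loans
```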

## Certain cities and regions have a lower rate of poverty while some other cities and regions have a higher rate. 

### These are the cities and regions with the lowest poverty rates.

```python
# Bishkek and Chui oblast (two spellings)
not_poor_cities_or_region = ["Bishkek", "Chuy", "Chui"]
```

```python
low_poverty_regional_loans = loans_kyrgyzstan[
    loans_kyrgyzstan['region'].str.contains('|'.join(not_poor_cities_or_region), na=False)
]
low_poverty_regional_loans
```

### These are the cities and regions with the highest poverty rates.

```python
# Issyk-Kul oblast (two spellings) and localities within it
poor_cities_or_regions = ["Issyk-Kul", "Issykkul", "Cholpon-Ata", "Cholponata", "Karakol",
                          "Kyzyl-Suu", "Kyzylsuu", "Balykchy", "Kara-Koo", "Karakoo",
                          "Tosor", "Bosteri", "Bokonbaev", "Tup"]
```

```python
high_poverty_regional_loans = loans_kyrgyzstan[
    loans_kyrgyzstan['region'].str.contains('|'.join(poor_cities_or_regions), na=False)
]
high_poverty_regional_loans
```

## Flagging every loan against each criterion with a True or False value

```python
# True where the loan appears in urban_loans (isin aligns on row index and column labels)
urban_positive = loans_kyrgyzstan.copy(deep=True)
urban_positive["urban"] = loans_kyrgyzstan.isin(urban_loans)['id']
urban_positive
```
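`DataFrame.isin` with a DataFrame argument compares element by element after aligning on both index and column labels, so the cell above gives the right answer only because `urban_loans` keeps the row index of `loans_kyrgyzstan`. A more direct sketch tests index membership with `Index.isin`; the small DataFrame here is hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for loans_kyrgyzstan
sample_loans = pd.DataFrame({
    "id": [101, 102, 103, 104],
    "region": ["Bishkek", "Naryn village", "Osh", "Kochkor village"],
})

# A filtered subset keeps the original index labels...
sample_subset = sample_loans[~sample_loans["region"].str.contains("village", na=False)]

# ...so membership can be tested directly on the index
sample_loans["urban"] = sample_loans.index.isin(sample_subset.index)
print(sample_loans["urban"].tolist())  # [True, False, True, False]
```

The same pattern applies to the `non_mount`, `children`, and `high_poverty_region` columns. Assigning the boolean mask from the filtering step directly would avoid building the subsets at all.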

```python
# True where the loan appears in non_mountainous_loans
non_mount_positive = loans_kyrgyzstan.copy(deep=True)
non_mount_positive["non_mount"] = loans_kyrgyzstan.isin(non_mountainous_loans)['id']
non_mount_positive
```

```python
# True where the loan appears in children_loans
children_positive = loans_kyrgyzstan.copy(deep=True)
children_positive["children"] = loans_kyrgyzstan.isin(children_loans)['id']
children_positive
```

```python
# True where the loan appears in high_poverty_regional_loans
high_poverty_region_positive = loans_kyrgyzstan.copy(deep=True)
high_poverty_region_positive["high_poverty_region"] = loans_kyrgyzstan.isin(high_poverty_regional_loans)['id']
high_poverty_region_positive
```

# Final result 
## Now we have the original loans dataset with extra columns indicating whether each poverty factor is True or False

```python
# Start from the urban flags and add the other three flag columns (aligned on the row index)
final_loans = urban_positive.copy(deep=True)
final_loans["non_mount"] = non_mount_positive["non_mount"]
final_loans["children"] = children_positive["children"]
final_loans["high_poverty_region"] = high_poverty_region_positive["high_poverty_region"]
final_loans
```

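## The four rightmost columns of the dataframe show the value of each factor we analyzed

Note that `urban` and `non_mount` are True when the poverty indicator is absent, while `children` and `high_poverty_region` are True when it is present.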
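The four copy-and-combine cells above can be collapsed into a single `DataFrame.assign` call that builds every flag from a boolean mask. In this sketch the lists and loans are hypothetical, shortened stand-ins for the notebook's own (hence the `sample_` prefix), and the children pattern requires a word boundary (`\b`) before the pronoun.

```python
import pandas as pd

# Hypothetical, shortened stand-ins for the notebook's lists
sample_cities = ["Bishkek", "Osh"]
sample_non_mount = ["Bishkek", "Osh"]
sample_poor = ["Issyk-Kul", "Karakol"]

# Hypothetical loans, for illustration only
sample_loans = pd.DataFrame({
    "region": ["Bishkek", "Karakol", "Ak-Suu village", "Osh"],
    "use": ["to buy stock", "to pay for her children's school",
            "to buy a cow", "for his child"],
})

region = sample_loans["region"]
final = sample_loans.assign(
    urban=region.str.contains("|".join(sample_cities), na=False)
          & ~region.str.contains("village", na=False),
    non_mount=region.str.contains("|".join(sample_non_mount), na=False),
    children=sample_loans["use"].str.contains(r"\b(?:her|his) child", na=False),
    high_poverty_region=region.str.contains("|".join(sample_poor), na=False),
)
print(final[["urban", "non_mount", "children", "high_poverty_region"]])
```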