## Project Assignment

#### Introduction 

Kiva is an online crowdfunding platform that extends financial services to poor and financially excluded people around the world. Kiva lenders have provided over $1 billion in loans to over 2 million people. To set investment priorities, inform lenders, and understand its target communities, Kiva needs answers to some fundamental questions. Answering them requires inference from the limited information available for each borrower.

This project focuses on the Philippines, though Kenya is featured in some questions. The questions to be answered are as follows:

1. What country got the most loans? Does the number of times a country is referenced relate to the quantity of loans it got?
2. What sector got the most loans? Does the number of times a sector is referenced relate to the quantity of loans it got?
3. For the top sector, what activity had the highest amount of loans? What does that say about that activity?
4. Which region had the highest loan amounts?
5. What is the general repayment habit?
6. What were the numbers between male and female recipients? Does that communicate anything?
7. What sector in Kenya got the highest loans?
8. What region in Kenya got the highest loans?

#### Importing the libraries, reading in the data, and cleaning the data

In [None]:
import pandas as pd
import numpy as np
import plotly.express as px
import seaborn as sns
import matplotlib.pyplot as plt
import warnings

warnings.filterwarnings('ignore')

In [None]:
kiva = pd.read_csv('../input/data-science-for-good-kiva-crowdfunding/kiva_loans.csv')

In [None]:
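# This subset is taken before the cleaning steps below, so it still contains rows with missing borrower_genders.
philippines = kiva[kiva['country'] == 'Philippines'].reset_index(drop=True)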


In [None]:
kiva.info()  # several columns have fewer non-null entries than rows, so some values are missing

In [None]:
kiva.isnull().sum()  # number of missing values per column

In [None]:
kiva.duplicated().sum()  # 0 means there are no duplicated rows

In [None]:
kiva.dropna(subset=['borrower_genders'], inplace=True)  # drop rows with missing borrower_genders; removing them makes no significant difference to the dataset
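
Before dropping rows, it can help to check what share of them is actually missing. The snippet below is a minimal sketch on a small made-up frame (the values are hypothetical and not taken from `kiva_loans.csv`); only the column name `borrower_genders` comes from the dataset.

```python
import pandas as pd

# Toy data with hypothetical values; one of four rows has no recorded gender.
toy = pd.DataFrame({
    'borrower_genders': ['female', None, 'male, female', 'female'],
    'loan_amount': [300, 250, 1000, 175],
})

# Share of rows where borrower_genders is missing
missing_share = toy['borrower_genders'].isna().mean()

# Keep only the rows where borrower_genders is present
cleaned = toy.dropna(subset=['borrower_genders'])

print(f"missing share: {missing_share:.2%}, rows kept: {len(cleaned)}")
```

If the missing share is small, dropping those rows is unlikely to change the aggregate results much.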

In [None]:
kiva.info()

In [None]:
kiva['borrower_genders'].sample(10)

In [None]:
kiva['borrower_genders'].nunique()  # many unique values: a cell lists the gender of every borrower on a group loan

In [None]:
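def gender_lead(gender):
    """Label a loan by the gender of its first-listed borrower."""
    # A cell may list several borrowers, e.g. 'female, female, male'.
    gender = str(gender)
    if gender.startswith('f'):
        return 'female'
    return 'male'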

In [None]:
kiva['gender_lead'] = kiva['borrower_genders'].apply(gender_lead)
kiva.head()
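
To make the behaviour of `gender_lead` concrete, here is a small sketch that applies the same logic to a few hypothetical `borrower_genders` strings (the sample strings are illustrative, not rows from the dataset).

```python
def gender_lead(gender):
    """Return 'female' if the first listed borrower is female, otherwise 'male'."""
    gender = str(gender)
    return 'female' if gender.startswith('f') else 'male'

# Hypothetical examples of raw borrower_genders strings
samples = ['female', 'male, female', 'female, female, male', 'male']
labels = [gender_lead(s) for s in samples]
print(labels)

# A missing value becomes the string 'nan', which does not start with 'f',
# so it would be labelled 'male'.
print(gender_lead(float('nan')))
```

Because a missing value falls through to `'male'`, the function should only be applied after rows with missing `borrower_genders` have been dropped.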

### 1. What country got the most loans? Does the number of times a country is referenced relate to the quantity of loans it got?

In [None]:
kiva.columns

In [None]:
kiva['country'].value_counts().head(10).reset_index()  # ANS: Philippines

In [None]:
loans = kiva.groupby('country')['loan_amount'].sum().sort_values(ascending=False).reset_index().head(10)
loans  # ANS: No. Not all of the countries in the table above also appear in this table.
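
Comparing two top-10 tables by eye can be made more systematic by putting the loan count and the total amount side by side and correlating their ranks. The following is a minimal sketch on toy data (country labels and amounts are hypothetical); correlating the ranks is equivalent to a Spearman correlation.

```python
import pandas as pd

# Toy data with hypothetical values, not taken from kiva_loans.csv
toy = pd.DataFrame({
    'country': ['A', 'A', 'A', 'B', 'B', 'C'],
    'loan_amount': [100, 150, 200, 1000, 1200, 300],
})

# Number of loans and total amount per country, in one table
summary = (
    toy.groupby('country')['loan_amount']
       .agg(n_loans='count', total_amount='sum')
       .sort_values('n_loans', ascending=False)
)

# Pearson correlation of the ranks, i.e. a Spearman rank correlation
rho = summary['n_loans'].rank().corr(summary['total_amount'].rank())

print(summary)
print(rho)
```

A value of `rho` near 1 would mean that countries with more loans also tend to receive larger totals; in this toy example, country A has the most loans but country B has the largest total.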

### 2. What sector got the most loans? Does the number of times a sector is referenced relate to the quantity of loans it got?

In [None]:
kiva['sector'].value_counts().head(10).reset_index()  # ANS: Agriculture

In [None]:
ksector_loans = kiva.groupby('sector')['loan_amount'].sum().sort_values(ascending=False).reset_index().head(10)
ksector_loans  # ANS: Yes, there is a relationship between the number of loans and the amount received.

In [None]:
philippines['sector'].value_counts().head(10).reset_index()  # ANS: Retail

In [None]:
psector_loans = philippines.groupby('sector')['loan_amount'].sum().sort_values(ascending=False).head(10).reset_index()
psector_loans  # ANS: Yes, there is a relationship between the number of loans and the amount received.

In [None]:

plt.figure(figsize=(10, 5))
plt.xticks(rotation=75)
plt.title('Loan Amount by Sector')
sns.barplot(x='sector', y='loan_amount', data=philippines, ci=None, color='lightblue', estimator=np.sum)
plt.show()

### 3. For the top sector, what activity had the highest amount of loans? What does that say about that activity?

In [None]:
retail = philippines[philippines['sector'] == 'Retail'].reset_index(drop=True)
retail_activities = retail.groupby('activity')['loan_amount'].sum().sort_values(ascending=False).reset_index().head(10)
retail_activities  # ANS: General Store. This suggests that borrowers in the Philippines depend on loans even for basic needs.

In [None]:
px.bar(retail_activities, x='activity', y='loan_amount')


### 4. Which region had the highest loan amounts?

In [None]:
region = philippines.groupby('region')['loan_amount'].sum().sort_values(ascending=False).head(10).reset_index()
region  # ANS: Narra, Palawan

### 5. What is the general repayment habit?

In [None]:
loan_themes = pd.read_csv('../input/data-science-for-good-kiva-crowdfunding/loan_themes_by_region.csv')
philippines_themes = loan_themes[loan_themes['country'] == 'Philippines'].reset_index(drop=True)


In [None]:
# repayment_interval is categorical, so a count per category is plotted rather than a violin.
fig2 = px.histogram(philippines, x='repayment_interval')
fig2.show()  # mostly irregular in the Philippines

As the plot above shows, and somewhat surprisingly, most loans in the Philippines have an irregular repayment interval.
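
The claim that one repayment interval dominates can be backed by explicit shares rather than by reading a plot. Below is a minimal sketch on a toy series (the values are hypothetical; only the column name `repayment_interval` and its category labels come from the dataset).

```python
import pandas as pd

# Toy repayment intervals with hypothetical frequencies
toy = pd.Series(
    ['irregular', 'irregular', 'monthly', 'irregular', 'bullet', 'monthly'],
    name='repayment_interval',
)

# Share of loans per repayment interval, largest first
shares = toy.value_counts(normalize=True)
print(shares)
```

Applied to the real data, the same call on `philippines['repayment_interval']` would give the exact share of irregular repayments.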

### 6. What were the numbers between male and female recipients? Does that communicate anything?

In [None]:
philippines.groupby('borrower_genders')['id'].count().sort_values(ascending=False).head(10).reset_index()

ANS: Almost 95% of borrowers in the Philippines are women.

In [None]:
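# Count loans by lead borrower gender; values='id' would sum the loan IDs instead of counting loans.
lead = philippines['borrower_genders'].dropna().apply(gender_lead)
gender_counts = lead.value_counts().rename_axis('gender').reset_index(name='loans')
px.pie(gender_counts, values='loans', names='gender', title='Number of Male vs Female Recipients')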

### 7. What sector in Kenya got the highest loans?

In [None]:
kenya = kiva[kiva['country'] == 'Kenya'].reset_index(drop=True)
# Sort before taking the top 10; calling head(10) first keeps only the first 10 sectors alphabetically.
kenya_sector = kenya.groupby('sector')['loan_amount'].sum().sort_values(ascending=False).head(10).reset_index()
kenya_sector

ANS: Agriculture received the highest total loan amount.
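
When building a top-N table from a `groupby` result, the order of `sort_values` and `head` matters, because `groupby` returns the groups in alphabetical order. The sketch below illustrates this on a toy series with hypothetical totals.

```python
import pandas as pd

# Hypothetical sector totals, indexed alphabetically as groupby(...).sum() would return them
totals = pd.Series({'Agriculture': 50, 'Arts': 10, 'Retail': 80})

# head() first keeps the first two sectors alphabetically, then sorts only those
head_then_sort = totals.head(2).sort_values(ascending=False)

# sort_values() first ranks all sectors, then keeps the two largest
sort_then_head = totals.sort_values(ascending=False).head(2)

# nlargest() expresses the same top-N selection in one call
top_two = totals.nlargest(2)

print(list(head_then_sort.index))
print(list(sort_then_head.index))
print(list(top_two.index))
```

Only the last two variants return the sectors with the largest totals.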

In [None]:
px.bar(kenya_sector, y='sector', x='loan_amount')

In [None]:
plt.figure(figsize=(10, 5))
plt.xticks(rotation=75)
plt.title('Loan Amount by Sector')
sns.barplot(y='sector', x='loan_amount', data=kenya_sector, ci=None, color='lightblue', estimator=np.sum)
plt.show()

### 8. What region in Kenya got the highest loans? 

In [None]:
kenya_regions = kenya.groupby('region')['loan_amount'].sum().sort_values(ascending=False).head(10).reset_index()
kenya_regions

ANS: Webuye was the region in Kenya that received the highest amount of loans. 

In [None]:
px.bar(kenya_regions, x='region', y='loan_amount')