# Introduction to Pandas

##### In this introduction we'll learn how to load simple data files, such as Excel sheets and CSV files (plain-text tables similar to Excel sheets), using pandas. We'll also learn how to explore the data and find out more about it.
##### First, let's import the library.

In [1]:
import pandas as pd 

##### Now we can load the data. The data can be stored anywhere, on your laptop or on the web; here we load a file that is available online on GitHub, using a link.
##### The data file is a CSV, so we load it with the pandas function read_csv. Excel files are not read by read_csv; pandas provides the similar function read_excel for them.

In [2]:
df = pd.read_csv('https://raw.githubusercontent.com/StarBoy01/IndabaX-Sudan-2019/master/Train_v2.csv')
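Besides a URL, read_csv also accepts a local path or any file-like object. The sketch below uses a small, made-up CSV held in memory (the text in `csv_text` is hypothetical, not part of the real dataset) so it runs without a network connection.

```python
import io

import pandas as pd

# Hypothetical two-row CSV kept in memory; no file or network access is needed.
csv_text = "country,year,bank_account\nKenya,2018,Yes\nUganda,2018,No\n"

# read_csv accepts a path, a URL, or a file-like object such as StringIO.
sample = pd.read_csv(io.StringIO(csv_text))
print(sample.shape)  # (2, 3)

# An Excel workbook needs a different reader (and an engine such as openpyxl):
# sample_xlsx = pd.read_excel("survey.xlsx", sheet_name=0)  # hypothetical file name
```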

# The dataset
##### Limited financial inclusion remains one of the main obstacles to economic and human development in Africa. For example, across Kenya, Rwanda, Tanzania, and Uganda only 9.1 million adults (or 13.9% of the adult population) have access to or use a commercial bank account.

Traditionally, access to bank accounts has been regarded as an indicator of financial inclusion. Despite the proliferation of mobile money in Africa and the growth of innovative fintech solutions, banks still play a pivotal role in facilitating access to financial services. Access to bank accounts enables households to save and make payments, while also helping businesses build up their creditworthiness and improve their access to other financial services. Therefore, access to bank accounts is an essential contributor to long-term economic growth.

The dataset contains demographic information and the financial services used by approximately 33,610 individuals across East Africa. The data was extracted from various Finscope surveys ranging from 2016 to 2018. You are asked to predict, for each unique id in the test dataset, the likelihood of the person having a bank account.

In [None]:
df.head()

Unnamed: 0,country,year,uniqueid,bank_account,location_type,cellphone_access,household_size,age_of_respondent,gender_of_respondent,relationship_with_head,marital_status,education_level,job_type
0,Kenya,2018,uniqueid_1,Yes,Rural,Yes,3,24,Female,Spouse,Married/Living together,Secondary education,Self employed
1,Kenya,2018,uniqueid_2,No,Rural,No,5,70,Female,Head of Household,Widowed,No formal education,Government Dependent
2,Kenya,2018,uniqueid_3,Yes,Urban,Yes,5,26,Male,Other relative,Single/Never Married,Vocational/Specialised training,Self employed
3,Kenya,2018,uniqueid_4,No,Rural,Yes,5,34,Female,Head of Household,Married/Living together,Primary education,Formally employed Private
4,Kenya,2018,uniqueid_5,No,Urban,No,8,26,Male,Child,Single/Never Married,Primary education,Informally employed


In [None]:
df.tail()

Unnamed: 0,country,year,uniqueid,bank_account,location_type,cellphone_access,household_size,age_of_respondent,gender_of_respondent,relationship_with_head,marital_status,education_level,job_type
23519,Uganda,2018,uniqueid_2113,No,Rural,Yes,4,48,Female,Head of Household,Divorced/Seperated,No formal education,Other Income
23520,Uganda,2018,uniqueid_2114,No,Rural,Yes,2,27,Female,Head of Household,Single/Never Married,Secondary education,Other Income
23521,Uganda,2018,uniqueid_2115,No,Rural,Yes,5,27,Female,Parent,Widowed,Primary education,Other Income
23522,Uganda,2018,uniqueid_2116,No,Urban,Yes,7,30,Female,Parent,Divorced/Seperated,Secondary education,Self employed
23523,Uganda,2018,uniqueid_2117,No,Rural,Yes,10,20,Male,Child,Single/Never Married,Secondary education,No Income


In [None]:
set(df.country)

{'Kenya', 'Rwanda', 'Tanzania', 'Uganda'}

In [None]:
set(df.year)

{2016, 2017, 2018}

In [None]:
set(df.household_size)

{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 21}

In [None]:
set(df.relationship_with_head)

{'Child',
 'Head of Household',
 'Other non-relatives',
 'Other relative',
 'Parent',
 'Spouse'}

In [None]:
set(df.marital_status)

{'Divorced/Seperated',
 'Dont know',
 'Married/Living together',
 'Single/Never Married',
 'Widowed'}

In [None]:
set(df.education_level)

{'No formal education',
 'Other/Dont know/RTA',
 'Primary education',
 'Secondary education',
 'Tertiary education',
 'Vocational/Specialised training'}

In [None]:
set(df.job_type)

{'Dont Know/Refuse to answer',
 'Farming and Fishing',
 'Formally employed Government',
 'Formally employed Private',
 'Government Dependent',
 'Informally employed',
 'No Income',
 'Other Income',
 'Remittance Dependent',
 'Self employed'}
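Wrapping a column in set() lists its distinct values. pandas also has built-in methods that report how often each value occurs. A minimal sketch on a made-up column (the `toy` frame is hypothetical):

```python
import pandas as pd

# Hypothetical miniature of the 'country' column.
toy = pd.DataFrame({"country": ["Kenya", "Rwanda", "Kenya", "Uganda", "Kenya"]})

distinct = sorted(toy["country"].unique())   # distinct values, like set(...)
n_distinct = toy["country"].nunique()        # how many distinct values there are
counts = toy["country"].value_counts()       # rows per value, most frequent first

print(distinct)
print(n_distinct)
print(counts)
```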

In [None]:
df.isnull().sum()

country                   0
year                      0
uniqueid                  0
bank_account              0
location_type             0
cellphone_access          0
household_size            0
age_of_respondent         0
gender_of_respondent      0
relationship_with_head    0
marital_status            0
education_level           0
job_type                  0
dtype: int64

In [None]:
df.describe()

Unnamed: 0,year,household_size,age_of_respondent
count,23524.0,23524.0,23524.0
mean,2016.975939,3.797483,38.80522
std,0.847371,2.227613,16.520569
min,2016.0,1.0,16.0
25%,2016.0,2.0,26.0
50%,2017.0,3.0,35.0
75%,2018.0,5.0,49.0
max,2018.0,21.0,100.0


In [None]:
df.dtypes

country                   object
year                       int64
uniqueid                  object
bank_account              object
location_type             object
cellphone_access          object
household_size             int64
age_of_respondent          int64
gender_of_respondent      object
relationship_with_head    object
marital_status            object
education_level           object
job_type                  object
dtype: object

In [None]:
df.iloc[0]

country                                     Kenya
year                                         2018
uniqueid                               uniqueid_1
bank_account                                  Yes
location_type                               Rural
cellphone_access                              Yes
household_size                                  3
age_of_respondent                              24
gender_of_respondent                       Female
relationship_with_head                     Spouse
marital_status            Married/Living together
education_level               Secondary education
job_type                            Self employed
Name: 0, dtype: object

In [None]:
df.iloc[0,[2,4]]

uniqueid         uniqueid_1
location_type         Rural
Name: 0, dtype: object

In [None]:
#slicing
df.iloc[0:5] #slicing rows

Unnamed: 0,country,year,uniqueid,bank_account,location_type,cellphone_access,household_size,age_of_respondent,gender_of_respondent,relationship_with_head,marital_status,education_level,job_type
0,Kenya,2018,uniqueid_1,Yes,Rural,Yes,3,24,Female,Spouse,Married/Living together,Secondary education,Self employed
1,Kenya,2018,uniqueid_2,No,Rural,No,5,70,Female,Head of Household,Widowed,No formal education,Government Dependent
2,Kenya,2018,uniqueid_3,Yes,Urban,Yes,5,26,Male,Other relative,Single/Never Married,Vocational/Specialised training,Self employed
3,Kenya,2018,uniqueid_4,No,Rural,Yes,5,34,Female,Head of Household,Married/Living together,Primary education,Formally employed Private
4,Kenya,2018,uniqueid_5,No,Urban,No,8,26,Male,Child,Single/Never Married,Primary education,Informally employed


In [None]:
#slicing
df.iloc[0:5,[2,3,7]] #slicing rows 0-4 and selecting columns by position

Unnamed: 0,uniqueid,bank_account,age_of_respondent
0,uniqueid_1,Yes,24
1,uniqueid_2,No,70
2,uniqueid_3,Yes,26
3,uniqueid_4,No,34
4,uniqueid_5,No,26


In [None]:
df.loc[2]

country                                             Kenya
year                                                 2018
uniqueid                                       uniqueid_3
bank_account                                          Yes
location_type                                       Urban
cellphone_access                                      Yes
household_size                                          5
age_of_respondent                                      26
gender_of_respondent                                 Male
relationship_with_head                     Other relative
marital_status                       Single/Never Married
education_level           Vocational/Specialised training
job_type                                    Self employed
Name: 2, dtype: object

In [None]:
df.loc[0:5,["uniqueid", "bank_account", "age_of_respondent"]]

Unnamed: 0,uniqueid,bank_account,age_of_respondent
0,uniqueid_1,Yes,24
1,uniqueid_2,No,70
2,uniqueid_3,Yes,26
3,uniqueid_4,No,34
4,uniqueid_5,No,26
5,uniqueid_6,No,26
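Note that the two outputs above differ in length: iloc[0:5] returns five rows while loc[0:5] returns six. iloc slices by position and excludes the end, as Python lists do, whereas loc slices by label and includes the end label. A small sketch on a made-up frame (`toy` is hypothetical):

```python
import pandas as pd

# Hypothetical frame with the default integer index 0..7.
toy = pd.DataFrame({
    "uniqueid": [f"uniqueid_{i}" for i in range(1, 9)],
    "age": [24, 70, 26, 34, 26, 26, 32, 42],
})

by_position = toy.iloc[0:5]   # positions 0, 1, 2, 3, 4 (end excluded)
by_label = toy.loc[0:5]       # labels 0, 1, 2, 3, 4, 5 (end included)

print(len(by_position), len(by_label))  # 5 6
```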


In [None]:
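# Count respondents with and without a bank account
a = len(df[df.bank_account=='Yes'])
b = len(df[df.bank_account=='No'])
c = len(df)
# Truncate the 'No' share to a whole percent; the remainder is the 'Yes' share, so the two sum to 100
no_pct = int(b/c*100)
print('We have an imbalanced dataset with a %i/%i ratio'%(no_pct, 100-no_pct))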

We have an imbalanced dataset with a 85/15 ratio
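The class shares can also be read directly from value_counts with normalize=True, which returns fractions instead of counts. A minimal sketch on a made-up column with 17 'No' and 3 'Yes' answers (the data is hypothetical and only chosen to give round numbers):

```python
import pandas as pd

# Hypothetical target column: 17 respondents without an account, 3 with one.
toy = pd.DataFrame({"bank_account": ["No"] * 17 + ["Yes"] * 3})

# normalize=True returns fractions; multiply by 100 for percentages.
shares = toy["bank_account"].value_counts(normalize=True) * 100
print(shares.round(1))
```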


In [None]:
male_data = df.loc[df["gender_of_respondent"]=="Male"]
female_data = df.loc[df["gender_of_respondent"]=="Female"]

In [None]:
print(len(male_data))
print(len(female_data))

9647
13877


In [None]:
older_than_40 = df.loc[df["age_of_respondent"] > 40]

In [None]:
older_than_40

Unnamed: 0,country,year,uniqueid,bank_account,location_type,cellphone_access,household_size,age_of_respondent,gender_of_respondent,relationship_with_head,marital_status,education_level,job_type
1,Kenya,2018,uniqueid_2,No,Rural,No,5,70,Female,Head of Household,Widowed,No formal education,Government Dependent
7,Kenya,2018,uniqueid_8,No,Rural,Yes,1,42,Female,Head of Household,Married/Living together,Tertiary education,Formally employed Government
8,Kenya,2018,uniqueid_9,Yes,Rural,Yes,3,54,Male,Head of Household,Married/Living together,Secondary education,Farming and Fishing
9,Kenya,2018,uniqueid_10,No,Urban,Yes,3,76,Female,Head of Household,Divorced/Seperated,No formal education,Remittance Dependent
11,Kenya,2018,uniqueid_12,Yes,Rural,Yes,3,69,Male,Head of Household,Married/Living together,Secondary education,Other Income
...,...,...,...,...,...,...,...,...,...,...,...,...,...
23497,Uganda,2018,uniqueid_2085,No,Urban,No,1,74,Female,Head of Household,Widowed,Secondary education,Other Income
23505,Uganda,2018,uniqueid_2095,No,Rural,Yes,7,45,Male,Head of Household,Married/Living together,Primary education,Self employed
23508,Uganda,2018,uniqueid_2098,No,Rural,Yes,6,65,Female,Head of Household,Widowed,Primary education,Self employed
23512,Uganda,2018,uniqueid_2102,No,Rural,No,2,57,Female,Head of Household,Divorced/Seperated,No formal education,Other Income


In [None]:
younger_than_20 = df.loc[df["age_of_respondent"] < 20]

In [None]:
younger_than_20

Unnamed: 0,country,year,uniqueid,bank_account,location_type,cellphone_access,household_size,age_of_respondent,gender_of_respondent,relationship_with_head,marital_status,education_level,job_type
24,Kenya,2018,uniqueid_25,No,Urban,No,7,18,Male,Other relative,Single/Never Married,Secondary education,Remittance Dependent
41,Kenya,2018,uniqueid_42,No,Urban,Yes,4,19,Female,Other relative,Single/Never Married,Secondary education,Remittance Dependent
52,Kenya,2018,uniqueid_53,No,Urban,No,5,16,Female,Other relative,Single/Never Married,Primary education,Remittance Dependent
53,Kenya,2018,uniqueid_54,No,Rural,Yes,2,17,Male,Other relative,Single/Never Married,Secondary education,Remittance Dependent
76,Kenya,2018,uniqueid_77,No,Urban,No,4,17,Male,Other relative,Single/Never Married,Secondary education,Remittance Dependent
...,...,...,...,...,...,...,...,...,...,...,...,...,...
23499,Uganda,2018,uniqueid_2087,No,Urban,No,6,16,Female,Parent,Single/Never Married,Secondary education,No Income
23502,Uganda,2018,uniqueid_2090,No,Urban,Yes,3,17,Female,Parent,Single/Never Married,Vocational/Specialised training,Other Income
23511,Uganda,2018,uniqueid_2101,No,Rural,No,6,19,Female,Parent,Single/Never Married,Secondary education,No Income
23515,Uganda,2018,uniqueid_2108,No,Rural,No,6,16,Male,Parent,Single/Never Married,Primary education,Other Income


In [None]:
younger_than_20_female = df.loc[(df["age_of_respondent"] < 20) & (df["gender_of_respondent"] == "Female")]

In [None]:
younger_than_20_female

Unnamed: 0,country,year,uniqueid,bank_account,location_type,cellphone_access,household_size,age_of_respondent,gender_of_respondent,relationship_with_head,marital_status,education_level,job_type
41,Kenya,2018,uniqueid_42,No,Urban,Yes,4,19,Female,Other relative,Single/Never Married,Secondary education,Remittance Dependent
52,Kenya,2018,uniqueid_53,No,Urban,No,5,16,Female,Other relative,Single/Never Married,Primary education,Remittance Dependent
83,Kenya,2018,uniqueid_84,No,Rural,Yes,7,18,Female,Child,Single/Never Married,Secondary education,Remittance Dependent
104,Kenya,2018,uniqueid_105,Yes,Urban,No,4,16,Female,Other relative,Single/Never Married,Secondary education,Remittance Dependent
106,Kenya,2018,uniqueid_107,No,Rural,No,4,19,Female,Spouse,Married/Living together,Primary education,Informally employed
...,...,...,...,...,...,...,...,...,...,...,...,...,...
23471,Uganda,2018,uniqueid_2056,No,Rural,Yes,15,17,Female,Other relative,Married/Living together,Secondary education,Self employed
23482,Uganda,2018,uniqueid_2070,No,Rural,No,10,18,Female,Parent,Married/Living together,Secondary education,Self employed
23499,Uganda,2018,uniqueid_2087,No,Urban,No,6,16,Female,Parent,Single/Never Married,Secondary education,No Income
23502,Uganda,2018,uniqueid_2090,No,Urban,Yes,3,17,Female,Parent,Single/Never Married,Vocational/Specialised training,Other Income
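The filters above are boolean masks, and masks can be combined with & (and), | (or) and ~ (not). Each condition needs its own parentheses because these operators bind more tightly than comparisons. The sketch below uses a made-up frame (`toy` is hypothetical) and also shows isin and query as alternatives.

```python
import pandas as pd

# Hypothetical miniature using the same column names as the survey data.
toy = pd.DataFrame({
    "age_of_respondent": [18, 45, 19, 30, 16],
    "gender_of_respondent": ["Male", "Female", "Female", "Female", "Male"],
    "country": ["Kenya", "Uganda", "Rwanda", "Kenya", "Tanzania"],
})

young = toy["age_of_respondent"] < 20
female = toy["gender_of_respondent"] == "Female"

young_and_female = toy.loc[young & female]   # both conditions hold
young_or_female = toy.loc[young | female]    # at least one holds
not_young = toy.loc[~young]                  # negation of a mask

# isin tests membership in a list of values.
kenya_or_rwanda = toy.loc[toy["country"].isin(["Kenya", "Rwanda"])]

# query expresses the same 'and' filter as a string.
same_with_query = toy.query("age_of_respondent < 20 and gender_of_respondent == 'Female'")
```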


# Timeseries Data and Location Data
##### For a country like Rwanda, obtaining accurate measurements of vegetation is necessary for policy makers to lay out plans to counter climate change and enhance agricultural productivity. In recent years, satellite-based imagery has proved quite effective in obtaining such measurements, especially for areas that are very hard to access and cover.
An equally important goal is the ability to forecast vegetation, and one popular feature that researchers use for this purpose is rainfall.
In this brief project, I use three datasets: one for the vegetation index, one for rainfall, and one for the longitude and latitude of every district in Rwanda. I use them to investigate how to model the relation between rainfall and vegetation, as well as the correlation between districts and the distances between them. The data provides monthly measurements of rainfall and vegetation between 2000 and 2014.

In [None]:
rain = pd.read_csv("https://raw.githubusercontent.com/MhmdGaffar/MCF-Spring2022/main/RwandaDistrictRainfall.csv", index_col=0)
vegetation = pd.read_csv("https://raw.githubusercontent.com/MhmdGaffar/MCF-Spring2022/main/RwandaDistrictVegetation.csv", index_col=0)
location = pd.read_csv("https://raw.githubusercontent.com/MhmdGaffar/MCF-Spring2022/main/RwandaDistrictCentroidsLongitude_Latitude.csv")

In [None]:
rain

Unnamed: 0,Year,2000,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,...,Unnamed: 171,Unnamed: 172,Unnamed: 173,Unnamed: 174,Unnamed: 175,Unnamed: 176,Unnamed: 177,Unnamed: 178,Unnamed: 179,Unnamed: 180
0,Month,JAN,FEB,MAR,APR,MAY,JUN,JUL,AUG,SEP,...,MAR,APR,MAY,JUN,JUL,AUG,SEP,OCT,NOV,DEC
1,Nyarugenge,52.9,52.9,100.0,103.9,84.4,15.6,11.8,17.6,29.4,...,135.3,121.7,43.2,23.5,11.8,50.9,72.6,105.9,113.8,66.7
2,Gasabo,52.9,60.4,113.3,103.1,74.9,17.6,11.8,23.5,29.4,...,156.0,132.5,47.1,19.2,11.8,51.4,74.9,113.3,123.5,72.1
3,Kicukiro,52.2,60.8,107.1,106.1,84.6,17.2,11.8,21.8,29.4,...,146.1,122.8,40.9,22.8,11.8,50.5,69.4,110.5,117.6,68.9
4,Nyanza,52.9,52.9,100.0,105.9,82.4,17.6,11.8,17.6,29.4,...,135.3,117.6,41.2,23.5,11.8,52.9,70.6,105.9,111.8,64.7
5,Gisagara,51.1,58.8,104.6,107.7,74.4,17.6,11.8,21.4,29.4,...,142.0,127.6,37.9,21.3,11.8,43.8,59.7,118.3,124.6,65.9
6,Nyaruguru,52.9,52.9,100.0,100.0,88.2,11.8,11.8,17.6,29.4,...,135.3,129.4,47.1,23.5,11.8,47.1,76.5,105.9,117.6,70.6
7,Huye,52.9,52.9,100.0,105.9,82.4,17.6,11.8,17.6,29.4,...,135.3,117.6,41.2,23.5,11.8,52.9,70.6,105.9,111.8,64.7
8,Nyamagabe,52.9,53.9,101.0,105.9,81.4,17.6,11.8,18.6,29.4,...,136.3,118.6,41.2,23.5,11.8,52.0,68.6,107.8,113.7,64.7
9,Ruhango,52.9,52.9,100.0,104.7,83.5,16.5,11.8,17.6,29.4,...,135.3,120.0,42.4,23.5,11.8,51.8,71.8,105.9,112.9,65.9


In [None]:
vegetation

Unnamed: 0.1,Unnamed: 0,2000,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,...,Unnamed: 171,Unnamed: 172,Unnamed: 173,Unnamed: 174,Unnamed: 175,Unnamed: 176,Unnamed: 177,Unnamed: 178,Unnamed: 179,Unnamed: 180
0,District,JAN,FEB,MAR,APR,MAY,JUN,JUL,AUG,SEP,...,MAR,APR,MAY,JUN,JUL,AUG,SEP,OCT,NOV,DEC
1,Nyarugenge,,,,,127.1335578,114.1946582,97.70739397,95.6993281,88.72996508,...,113.5968727,133.5323565,128.0716025,116.1183069,102.4158842,98.55087277,113.8274487,119.2340887,130.368276,119.3100344
2,Gasabo,,,,,134.2743651,121.7468629,101.7610175,96.82107909,89.28289474,...,115.2305092,133.1912077,131.8758071,114.5605264,98.38763667,94.16093452,113.13286,116.2225775,129.540706,125.1903701
3,Kicukiro,,,,,127.1976424,112.7092868,95.87081035,93.06527811,86.07576657,...,112.6646457,130.9590942,126.1412432,108.4875877,96.01755808,93.20558862,107.2355295,107.6467842,127.4850245,118.0183972
4,Nyanza,,,,,130.4075654,115.4760824,100.6323569,95.5655591,92.3354103,...,124.8438243,141.7545787,127.0757298,117.4069556,107.4815961,101.1194758,112.5017959,116.7108726,123.9542577,124.504017
5,Gisagara,,,,,124.8776285,113.0878269,100.303528,97.35343347,94.04727794,...,122.378951,139.4817396,129.7025921,115.649987,101.0564055,95.38798995,99.38707174,116.9339847,127.2393951,126.9944472
6,Nyaruguru,,,,,129.6223645,122.6006955,111.2817847,102.5606031,104.0604661,...,131.5708759,137.8287837,117.217116,123.9767329,115.8049681,106.4813293,118.1106207,114.1687235,114.5449266,125.7016144
7,Huye,,,,,126.3618628,113.8752434,97.9911595,93.18905041,92.2657809,...,122.2654198,137.088027,121.4025916,114.5778402,104.6253319,97.37199998,104.3536074,109.8755958,122.7300579,124.6258138
8,Nyamagabe,,,,,128.071805,122.7266892,109.5242075,101.4714064,101.832852,...,131.7217154,137.309751,117.1247211,122.6311793,117.40781,108.9784361,117.1179184,111.469853,114.9533912,127.9292606
9,Ruhango,,,,,132.4418465,117.9845853,102.5604878,97.98432886,93.23430608,...,122.0068246,141.3005874,128.3861364,119.7785914,110.4851098,105.0989208,115.4977838,115.889171,125.5708164,124.8245938


In [None]:
rain = rain.transpose()
vegetation = vegetation.transpose()

In [None]:
rain

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,21,22,23,24,25,26,27,28,29,30
Year,Month,Nyarugenge,Gasabo,Kicukiro,Nyanza,Gisagara,Nyaruguru,Huye,Nyamagabe,Ruhango,...,Musanze,Burera,Gicumbi,Rwamagana,Nyagatare,Gatsibo,Kayonza,Kirehe,Ngoma,Bugesera
2000,JAN,52.9,52.9,52.2,52.9,51.1,52.9,52.9,52.9,52.9,...,58.3,53.4,52.9,58.8,56.1,57.6,54.5,52.9,53.9,52.9
Unnamed: 2,FEB,52.9,60.4,60.8,52.9,58.8,52.9,52.9,53.9,52.9,...,52.9,58.4,52.9,52.9,58.0,56.9,55.6,52.9,53.9,52.9
Unnamed: 3,MAR,100.0,113.3,107.1,100.0,104.6,100.0,100.0,101.0,100.0,...,115.5,114.2,103.5,118.2,115.0,101.2,101.1,100.0,101.0,105.6
Unnamed: 4,APR,103.9,103.1,106.1,105.9,107.7,100.0,105.9,105.9,104.7,...,99.9,105.9,100.0,101.6,105.9,98.8,101.1,100.0,99.0,97.4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Unnamed: 176,AUG,50.9,51.4,50.5,52.9,43.8,47.1,52.9,52.0,51.8,...,40.8,46.6,49.4,39.4,43.9,45.1,50.3,52.9,50.0,41.8
Unnamed: 177,SEP,72.6,74.9,69.4,70.6,59.7,76.5,70.6,68.6,71.8,...,55.3,64.2,64.7,60.3,61.5,52.9,60.4,64.7,60.8,53.5
Unnamed: 178,OCT,105.9,113.3,110.5,105.9,118.3,105.9,105.9,107.8,105.9,...,113.0,121.1,103.5,124.4,120.2,115.7,108.0,100.0,104.9,105.3
Unnamed: 179,NOV,113.8,123.5,117.6,111.8,124.6,117.6,111.8,113.7,112.9,...,124.2,119.2,115.3,119.6,124.4,123.5,117.1,111.8,115.7,122.9


In [None]:
vegetation

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,21,22,23,24,25,26,27,28,29,30
Unnamed: 0,District,Nyarugenge,Gasabo,Kicukiro,Nyanza,Gisagara,Nyaruguru,Huye,Nyamagabe,Ruhango,...,Musanze,Burera,Gicumbi,Rwamagana,Nyagatare,Gatsibo,Kayonza,Kirehe,Ngoma,Bugesera
2000,JAN,,,,,,,,,,...,,,,,,,,,,
Unnamed: 2,FEB,,,,,,,,,,...,,,,,,,,,,
Unnamed: 3,MAR,,,,,,,,,,...,,,,,,,,,,
Unnamed: 4,APR,,,,,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Unnamed: 176,AUG,98.55087277,94.16093452,93.20558862,101.1194758,95.38798995,106.4813293,97.37199998,108.9784361,105.0989208,...,99.90443778,98.72514412,101.3524284,92.95137986,97.56669862,99.36832326,86.17205492,78.66157696,90.48359181,89.36380903
Unnamed: 177,SEP,113.8274487,113.13286,107.2355295,112.5017959,99.38707174,118.1106207,104.3536074,117.1179184,115.4977838,...,117.8109488,114.697183,116.1353836,110.9870261,119.403915,118.3117874,105.0228725,93.95008702,112.5852871,103.8903267
Unnamed: 178,OCT,119.2340887,116.2225775,107.6467842,116.7108726,116.9339847,114.1687235,109.8755958,111.469853,115.889171,...,121.214655,115.3335714,119.8266879,112.0550634,121.5340752,117.9763528,102.2660595,95.46246348,109.2140601,106.101497
Unnamed: 179,NOV,130.368276,129.540706,127.4850245,123.9542577,127.2393951,114.5449266,122.7300579,114.9533912,125.5708164,...,133.7574998,132.2963088,135.5282211,131.5404202,143.1184966,133.0589542,124.1030551,120.4565205,133.5623833,126.9493872
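After the transpose, the first row of each table holds the real column names (Month and the district names). The tables shown later use those names as headers, so that row must have been promoted to the header at some point, although the cell that did it is not shown. The sketch below is one way to do this on a made-up three-row miniature; it is an assumption about the missing step, not the confirmed code.

```python
import pandas as pd

# Hypothetical miniature of the transposed table: row 'Year' holds the column names.
raw = pd.DataFrame(
    [["Month", "Nyarugenge", "Gasabo"],
     ["JAN", "52.9", "52.9"],
     ["FEB", "52.9", "60.4"]],
    index=["Year", "2000", "Unnamed: 2"],
)

tidy = raw.copy()
tidy.columns = tidy.iloc[0]   # promote the first row to the header
tidy = tidy.iloc[1:]          # drop that row from the data

print(tidy)
```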


In [None]:
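import numpy as np
#Creating date features: one 'month/year' string per month from 1/2000 to 12/2014
years = list(np.arange(2000, 2015))
n_y = []
for year in years:
    new = [year]*12  # repeat each year once per month
    n_y += new
months = list(np.arange(1,13))*15  # months 1..12 repeated for the 15 years
dates = []
for i in range(180):  # 15 years * 12 months
    date = str(months[i]) + '/' + str(n_y[i])
    dates.append(date)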

In [None]:
dates

['1/2000',
 '2/2000',
 '3/2000',
 '4/2000',
 '5/2000',
 '6/2000',
 '7/2000',
 '8/2000',
 '9/2000',
 '10/2000',
 '11/2000',
 '12/2000',
 '1/2001',
 '2/2001',
 '3/2001',
 '4/2001',
 '5/2001',
 '6/2001',
 '7/2001',
 '8/2001',
 '9/2001',
 '10/2001',
 '11/2001',
 '12/2001',
 '1/2002',
 '2/2002',
 '3/2002',
 '4/2002',
 '5/2002',
 '6/2002',
 '7/2002',
 '8/2002',
 '9/2002',
 '10/2002',
 '11/2002',
 '12/2002',
 '1/2003',
 '2/2003',
 '3/2003',
 '4/2003',
 '5/2003',
 '6/2003',
 '7/2003',
 '8/2003',
 '9/2003',
 '10/2003',
 '11/2003',
 '12/2003',
 '1/2004',
 '2/2004',
 '3/2004',
 '4/2004',
 '5/2004',
 '6/2004',
 '7/2004',
 '8/2004',
 '9/2004',
 '10/2004',
 '11/2004',
 '12/2004',
 '1/2005',
 '2/2005',
 '3/2005',
 '4/2005',
 '5/2005',
 '6/2005',
 '7/2005',
 '8/2005',
 '9/2005',
 '10/2005',
 '11/2005',
 '12/2005',
 '1/2006',
 '2/2006',
 '3/2006',
 '4/2006',
 '5/2006',
 '6/2006',
 '7/2006',
 '8/2006',
 '9/2006',
 '10/2006',
 '11/2006',
 '12/2006',
 '1/2007',
 '2/2007',
 '3/2007',
 '4/2007',
 '5/2007',
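The same 180 monthly dates can be generated in one call with pd.date_range, which returns real timestamps rather than strings. A minimal sketch; the variable names `month_starts` and `labels` are illustrative.

```python
import pandas as pd

# 180 month-start timestamps beginning in January 2000 ("MS" = month start).
month_starts = pd.date_range(start="2000-01-01", periods=180, freq="MS")

# If the 'month/year' strings are still wanted, they can be derived from the timestamps.
labels = [f"{d.month}/{d.year}" for d in month_starts]

print(month_starts[0], month_starts[-1])
print(labels[:3])
```

Because the values are already timestamps, a later conversion with pd.to_datetime would not be needed when using this approach.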


In [None]:
#Adding created date features to dataframe
rain['Date'] = dates
vegetation['Date'] = dates

In [None]:
rain

Year,Month,Nyarugenge,Gasabo,Kicukiro,Nyanza,Gisagara,Nyaruguru,Huye,Nyamagabe,Ruhango,...,Burera,Gicumbi,Rwamagana,Nyagatare,Gatsibo,Kayonza,Kirehe,Ngoma,Bugesera,Date
2000,JAN,52.9,52.9,52.2,52.9,51.1,52.9,52.9,52.9,52.9,...,53.4,52.9,58.8,56.1,57.6,54.5,52.9,53.9,52.9,1/2000
Unnamed: 2,FEB,52.9,60.4,60.8,52.9,58.8,52.9,52.9,53.9,52.9,...,58.4,52.9,52.9,58.0,56.9,55.6,52.9,53.9,52.9,2/2000
Unnamed: 3,MAR,100.0,113.3,107.1,100.0,104.6,100.0,100.0,101.0,100.0,...,114.2,103.5,118.2,115.0,101.2,101.1,100.0,101.0,105.6,3/2000
Unnamed: 4,APR,103.9,103.1,106.1,105.9,107.7,100.0,105.9,105.9,104.7,...,105.9,100.0,101.6,105.9,98.8,101.1,100.0,99.0,97.4,4/2000
Unnamed: 5,MAY,84.4,74.9,84.6,82.4,74.4,88.2,82.4,81.4,83.5,...,81.6,85.9,75.9,87.4,74.5,79.7,82.4,79.4,71.2,5/2000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Unnamed: 176,AUG,50.9,51.4,50.5,52.9,43.8,47.1,52.9,52.0,51.8,...,46.6,49.4,39.4,43.9,45.1,50.3,52.9,50.0,41.8,8/2014
Unnamed: 177,SEP,72.6,74.9,69.4,70.6,59.7,76.5,70.6,68.6,71.8,...,64.2,64.7,60.3,61.5,52.9,60.4,64.7,60.8,53.5,9/2014
Unnamed: 178,OCT,105.9,113.3,110.5,105.9,118.3,105.9,105.9,107.8,105.9,...,121.1,103.5,124.4,120.2,115.7,108.0,100.0,104.9,105.3,10/2014
Unnamed: 179,NOV,113.8,123.5,117.6,111.8,124.6,117.6,111.8,113.7,112.9,...,119.2,115.3,119.6,124.4,123.5,117.1,111.8,115.7,122.9,11/2014


In [None]:
#Converting date features from str to datetime
rain['Date'] = pd.to_datetime(rain['Date'], format='%m/%Y')
vegetation['Date'] = pd.to_datetime(vegetation['Date'], format='%m/%Y')

In [None]:
rain.dtypes

Year
Month                 object
Nyarugenge            object
Gasabo                object
Kicukiro              object
Nyanza                object
Gisagara              object
Nyaruguru             object
Huye                  object
Nyamagabe             object
Ruhango               object
Muhanga               object
Kamonyi               object
Karongi               object
Rutsiro               object
Rubavu                object
Nyabihu               object
Ngororero             object
Rusizi                object
Nyamasheke            object
Rulindo               object
Gakenke               object
Musanze               object
Burera                object
Gicumbi               object
Rwamagana             object
Nyagatare             object
Gatsibo               object
Kayonza               object
Kirehe                object
Ngoma                 object
Bugesera              object
Date          datetime64[ns]
dtype: object

In [None]:
vegetation = vegetation.dropna()
rain = rain.dropna()

In [None]:
# District names: every column except the first ('Month') and the last ('Date')
districts = list(rain.columns[1:-1])
districts

['Nyarugenge',
 'Gasabo',
 'Kicukiro',
 'Nyanza',
 'Gisagara',
 'Nyaruguru',
 'Huye',
 'Nyamagabe',
 'Ruhango',
 'Muhanga',
 'Kamonyi',
 'Karongi',
 'Rutsiro',
 'Rubavu',
 'Nyabihu',
 'Ngororero',
 'Rusizi',
 'Nyamasheke',
 'Rulindo',
 'Gakenke',
 'Musanze',
 'Burera',
 'Gicumbi',
 'Rwamagana',
 'Nyagatare',
 'Gatsibo',
 'Kayonza',
 'Kirehe',
 'Ngoma',
 'Bugesera']

In [None]:
for d in districts:
  rain[d] = rain[d].astype(float)
  vegetation[d] = vegetation[d].astype(float)
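The loop converts one column at a time. Several columns can also be converted in a single step, and pd.to_numeric with errors="coerce" is an option when some entries may not parse as numbers. A sketch on made-up data (`toy` and `messy` are hypothetical):

```python
import pandas as pd

# Hypothetical miniature: numbers stored as text, as after the transpose.
toy = pd.DataFrame({
    "Month": ["JAN", "FEB"],
    "Nyarugenge": ["52.9", "52.9"],
    "Gasabo": ["52.9", "60.4"],
})

# Convert a list of columns in one step.
cols = ["Nyarugenge", "Gasabo"]
toy[cols] = toy[cols].astype(float)
print(toy.dtypes)

# errors="coerce" turns entries that cannot be parsed into NaN instead of raising.
messy = pd.Series(["1.5", "n/a", "3"])
cleaned = pd.to_numeric(messy, errors="coerce")
print(cleaned)
```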


In [None]:
rain.dtypes

Year
Nyarugenge    float64
Gasabo        float64
Kicukiro      float64
Nyanza        float64
Gisagara      float64
Nyaruguru     float64
Huye          float64
Nyamagabe     float64
Ruhango       float64
Muhanga       float64
Kamonyi       float64
Karongi       float64
Rutsiro       float64
Rubavu        float64
Nyabihu       float64
Ngororero     float64
Rusizi        float64
Nyamasheke    float64
Rulindo       float64
Gakenke       float64
Musanze       float64
Burera        float64
Gicumbi       float64
Rwamagana     float64
Nyagatare     float64
Gatsibo       float64
Kayonza       float64
Kirehe        float64
Ngoma         float64
Bugesera      float64
dtype: object

In [None]:
rain = rain.set_index("Date")
vegetation = vegetation.set_index("Date")

In [None]:
rain

Year,Nyarugenge,Gasabo,Kicukiro,Nyanza,Gisagara,Nyaruguru,Huye,Nyamagabe,Ruhango,Muhanga,...,Musanze,Burera,Gicumbi,Rwamagana,Nyagatare,Gatsibo,Kayonza,Kirehe,Ngoma,Bugesera
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2000-01-01,52.9,52.9,52.2,52.9,51.1,52.9,52.9,52.9,52.9,52.9,...,58.3,53.4,52.9,58.8,56.1,57.6,54.5,52.9,53.9,52.9
2000-02-01,52.9,60.4,60.8,52.9,58.8,52.9,52.9,53.9,52.9,52.9,...,52.9,58.4,52.9,52.9,58.0,56.9,55.6,52.9,53.9,52.9
2000-03-01,100.0,113.3,107.1,100.0,104.6,100.0,100.0,101.0,100.0,100.0,...,115.5,114.2,103.5,118.2,115.0,101.2,101.1,100.0,101.0,105.6
2000-04-01,103.9,103.1,106.1,105.9,107.7,100.0,105.9,105.9,104.7,105.9,...,99.9,105.9,100.0,101.6,105.9,98.8,101.1,100.0,99.0,97.4
2000-05-01,84.4,74.9,84.6,82.4,74.4,88.2,82.4,81.4,83.5,82.4,...,78.3,81.6,85.9,75.9,87.4,74.5,79.7,82.4,79.4,71.2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2014-08-01,50.9,51.4,50.5,52.9,43.8,47.1,52.9,52.0,51.8,52.9,...,40.8,46.6,49.4,39.4,43.9,45.1,50.3,52.9,50.0,41.8
2014-09-01,72.6,74.9,69.4,70.6,59.7,76.5,70.6,68.6,71.8,70.6,...,55.3,64.2,64.7,60.3,61.5,52.9,60.4,64.7,60.8,53.5
2014-10-01,105.9,113.3,110.5,105.9,118.3,105.9,105.9,107.8,105.9,105.9,...,113.0,121.1,103.5,124.4,120.2,115.7,108.0,100.0,104.9,105.3
2014-11-01,113.8,123.5,117.6,111.8,124.6,117.6,111.8,113.7,112.9,111.8,...,124.2,119.2,115.3,119.6,124.4,123.5,117.1,111.8,115.7,122.9


In [None]:
vegetation.describe()

Unnamed: 0,Nyarugenge,Gasabo,Kicukiro,Nyanza,Gisagara,Nyaruguru,Huye,Nyamagabe,Ruhango,Muhanga,...,Musanze,Burera,Gicumbi,Rwamagana,Nyagatare,Gatsibo,Kayonza,Kirehe,Ngoma,Bugesera
count,176.0,176.0,176.0,176.0,176.0,176.0,176.0,176.0,176.0,176.0,...,176.0,176.0,176.0,176.0,176.0,176.0,176.0,176.0,176.0,176.0
mean,117.673097,120.916785,117.789774,124.334651,122.516529,124.525833,119.745814,124.214489,121.898057,120.239673,...,126.036186,121.78917,122.250618,121.469079,119.873444,119.325087,109.3065,105.619196,121.174691,111.15771
std,11.950647,14.355193,14.034153,13.651175,14.325425,10.22236,13.137031,9.626895,12.638478,9.777133,...,11.142542,11.680516,12.033339,16.24367,17.949726,15.075111,14.201544,14.069105,15.691485,13.401941
min,88.729965,89.282895,86.075767,92.33541,94.047278,102.560603,92.265781,101.471406,93.234306,98.709076,...,94.725593,90.89451,93.914034,83.392169,80.959234,84.875235,80.866587,78.661577,88.514729,80.818187
25%,108.96546,110.379368,107.924534,114.015147,111.333977,116.013301,109.880345,117.12302,112.477427,112.552701,...,118.272874,114.219178,113.305345,109.836589,105.405811,107.143556,96.936271,94.184879,108.69702,101.381032
50%,117.916967,121.138878,118.570669,125.756044,123.278615,124.298115,120.732938,125.200653,122.73506,121.080996,...,127.092164,122.748343,122.267208,123.358683,119.470208,120.031448,109.53844,107.423297,123.598082,111.298605
75%,126.177026,131.57127,127.512742,134.551921,132.82789,132.724708,129.146504,131.621163,130.322388,127.015881,...,133.651004,130.776672,131.393471,134.009255,137.359668,131.886295,120.55663,117.479901,133.670699,120.415292
max,141.844063,149.150931,146.537968,150.380443,152.008141,144.31496,149.681245,145.046715,149.006357,139.661664,...,151.525474,144.766987,146.622064,152.635095,155.625355,149.058439,137.492614,132.738837,149.754878,141.224086


In [None]:
rain.describe()

Year,Nyarugenge,Gasabo,Kicukiro,Nyanza,Gisagara,Nyaruguru,Huye,Nyamagabe,Ruhango,Muhanga,...,Musanze,Burera,Gicumbi,Rwamagana,Nyagatare,Gatsibo,Kayonza,Kirehe,Ngoma,Bugesera
count,180.0,180.0,180.0,180.0,180.0,180.0,180.0,180.0,180.0,180.0,...,180.0,180.0,180.0,180.0,180.0,180.0,180.0,180.0,180.0,180.0
mean,90.478889,94.986111,93.267222,90.637222,92.091667,90.190556,90.637222,90.997222,90.549444,90.637222,...,91.185,94.425,91.063889,91.558333,93.404444,91.578333,91.095,89.938889,90.406111,89.572778
std,55.226489,58.906774,57.530493,55.135879,56.895633,55.530277,55.135879,55.444542,55.182145,55.135879,...,58.917486,58.799647,56.425624,59.979807,58.911134,56.757501,55.873975,55.029222,55.482255,56.144034
min,11.8,11.8,11.8,11.8,11.8,11.8,11.8,11.8,11.8,11.8,...,11.8,11.8,11.8,11.8,11.8,11.8,11.8,11.8,11.8,11.8
25%,50.9,50.325,50.375,52.9,48.6,47.1,52.9,52.675,51.8,52.9,...,41.85,46.975,47.1,43.0,46.15,48.525,49.7,47.1,48.75,44.1
50%,85.3,91.95,90.2,85.3,85.0,85.3,85.3,85.3,85.3,85.3,...,84.65,87.25,88.2,86.9,84.4,89.4,86.4,88.2,87.75,87.8
75%,124.025,131.0,126.575,123.5,125.7,123.5,123.5,124.75,123.8,123.5,...,126.5,132.275,121.775,126.175,130.3,124.0,123.775,117.6,121.075,122.975
max,286.4,273.1,288.2,282.4,260.7,294.1,282.4,280.4,284.7,282.4,...,298.5,264.9,303.5,280.7,270.1,263.1,277.0,282.4,275.5,266.2


In [None]:
#Averaging the columns together before getting monthly groups
rain_avg = rain.mean(axis=1)
veg_avg = vegetation.mean(axis=1)

In [None]:
#Getting monthly groups
monthly_rain = rain_avg.groupby(by=[rain_avg.index.month])
monthly_veg = veg_avg.groupby(by=[veg_avg.index.month])

In [None]:
rain_avg

Date
2000-01-01     53.993333
2000-02-01     54.816667
2000-03-01    105.773333
2000-04-01    102.820000
2000-05-01     81.696667
                 ...    
2014-08-01     47.743333
2014-09-01     66.023333
2014-10-01    110.296667
2014-11-01    118.373333
2014-12-01     66.710000
Length: 180, dtype: float64

In [None]:
# Each item yielded by a groupby is a (month, values) tuple
for group in monthly_rain:
    print(group)

(1, Date
2000-01-01    53.993333
2001-01-01    85.536667
2002-01-01    74.730000
2003-01-01    57.150000
2004-01-01    84.456667
2005-01-01    64.963333
2006-01-01    60.340000
2007-01-01    72.086667
2008-01-01    55.443333
2009-01-01    68.163333
2010-01-01    97.666667
2011-01-01    78.883333
2012-01-01    47.976667
2013-01-01    77.653333
2014-01-01    57.206667
dtype: float64)
(2, Date
2000-02-01     54.816667
2001-02-01     69.426667
2002-02-01     80.496667
2003-02-01     64.516667
2004-02-01    100.556667
2005-02-01     85.803333
2006-02-01    100.786667
2007-02-01    113.506667
2008-02-01     75.146667
2009-02-01    181.623333
2010-02-01    211.570000
2011-02-01    117.650000
2012-02-01     72.170000
2013-02-01     77.653333
2014-02-01     97.873333
dtype: float64)
(3, Date
2000-03-01    105.773333
2001-03-01    238.500000
2002-03-01     99.850000
2003-03-01     57.150000
2004-03-01    150.353333
2005-03-01    155.693333
2006-03-01    142.453333
2007-03-01    117.926667
2008-0
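
Printing every group produces a lot of output. A lighter way to inspect the groups, sketched here on made-up data (`toy_avg` and its values are invented for illustration), is to unpack each pair in the loop header or to fetch one group with `get_group`.

```python
import pandas as pd

# Hypothetical toy series: two years of January and February values
dates = pd.to_datetime(["2000-01-01", "2000-02-01", "2001-01-01", "2001-02-01"])
toy_avg = pd.Series([50.0, 60.0, 70.0, 100.0], index=dates)
toy_monthly = toy_avg.groupby(toy_avg.index.month)

# Each item is a (key, sub-series) pair, so unpack it in the loop header
for month, values in toy_monthly:
    print(month, len(values), values.mean())

# Pull out a single month directly instead of looping
january = toy_monthly.get_group(1)
print(january)
```

The same calls should work on `monthly_rain`, for example `monthly_rain.get_group(3)` for all the March values.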

In [None]:
location

Unnamed: 0,wkt_geom,Prov_ID,Province,Dist_ID,District,Longitude,Latitude
0,POINT(30.44539187624978283 -1.61907593351633294),5,Eastern Province,53,Gatsibo,30.445392,1.619076
1,POINT(29.72272538213003301 -1.9548907110369993),2,Southern Province,27,Muhanga,29.722725,1.954891
2,POINT(30.11387116565671107 -1.62157664604772656),4,Northern Province,45,Gicumbi,30.113871,1.621577
3,POINT(30.14372498206437356 -2.00886367940993305),1,Kigali City,13,Kicukiro,30.143725,2.008864
4,POINT(29.98722685498238505 -1.73928358683540596),4,Northern Province,41,Rulindo,29.987227,1.739284
5,POINT(29.51687590310958598 -2.69484387034010142),2,Southern Province,23,Nyaruguru,29.516876,2.694844
6,POINT(30.35473585167836674 -1.97549651070343946),5,Eastern Province,51,Rwamagana,30.354736,1.975497
7,POINT(29.90240320690512732 -2.00944646085529977),2,Southern Province,28,Kamonyi,29.902403,2.009446
8,POINT(30.14221204833539147 -1.89144733707354451),1,Kigali City,12,Gasabo,30.142212,1.891447
9,POINT(29.56946501894480406 -1.87784485514414223),3,Western Province,35,Ngororero,29.569465,1.877845


In [None]:
location.transpose()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,20,21,22,23,24,25,26,27,28,29
wkt_geom,POINT(30.44539187624978283 -1.61907593351633294),POINT(29.72272538213003301 -1.9548907110369993),POINT(30.11387116565671107 -1.62157664604772656),POINT(30.14372498206437356 -2.00886367940993305),POINT(29.98722685498238505 -1.73928358683540596),POINT(29.51687590310958598 -2.69484387034010142),POINT(30.35473585167836674 -1.97549651070343946),POINT(29.90240320690512732 -2.00944646085529977),POINT(30.14221204833539147 -1.89144733707354451),POINT(29.56946501894480406 -1.87784485514414223),...,POINT(29.70879513426893581 -2.52464457930509045),POINT(29.60659743764841068 -1.49853529150656861),POINT(29.08743061811287944 -2.56537162354070913),POINT(30.45718802733669506 -2.18299296045439828),POINT(30.02886536134083428 -1.99200475508649077),POINT(29.84360338606871821 -2.61811188825145225),POINT(30.64197861093327546 -1.84512559534216347),POINT(29.77176053219154284 -2.19359772453678392),POINT(29.79346313648000333 -2.33586083018547708),POINT(29.39904174077238252 -1.90372509877994078)
Prov_ID,5,2,4,1,4,2,5,2,1,3,...,2,4,3,5,1,2,5,2,2,3
Province,Eastern Province,Southern Province,Northern Province,Kigali City,Northern Province,Southern Province,Eastern Province,Southern Province,Kigali City,Western Province,...,Southern Province,Northern Province,Western Province,Eastern Province,Kigali City,Southern Province,Eastern Province,Southern Province,Southern Province,Western Province
Dist_ID,53,27,45,13,41,23,51,28,12,35,...,24,43,36,56,11,22,54,26,21,32
District,Gatsibo,Muhanga,Gicumbi,Kicukiro,Rulindo,Nyaruguru,Rwamagana,Kamonyi,Gasabo,Ngororero,...,Huye,Musanze,Rusizi,Ngoma,Nyarugenge,Gisagara,Kayonza,Ruhango,Nyanza,Rutsiro
Longitude,30.445392,29.722725,30.113871,30.143725,29.987227,29.516876,30.354736,29.902403,30.142212,29.569465,...,29.708795,29.606597,29.087431,30.457188,30.028865,29.843603,30.641979,29.771761,29.793463,29.399042
Latitude,1.619076,1.954891,1.621577,2.008864,1.739284,2.694844,1.975497,2.009446,1.891447,1.877845,...,2.524645,1.498535,2.565372,2.182993,1.992005,2.618112,1.845126,2.193598,2.335861,1.903725


In [None]:
location = location.transpose()

In [None]:
# Clean the dataset so that only the longitude and latitude data remain.
# Row 4 of the transposed table holds the district names, which become the column headers.
headers = location.iloc[4]
headers

0        Gatsibo
1        Muhanga
2        Gicumbi
3       Kicukiro
4        Rulindo
5      Nyaruguru
6      Rwamagana
7        Kamonyi
8         Gasabo
9      Ngororero
10        Rubavu
11     Nyamagabe
12        Burera
13       Nyabihu
14       Gakenke
15    Nyamasheke
16        Kirehe
17     Nyagatare
18      Bugesera
19       Karongi
20          Huye
21       Musanze
22        Rusizi
23         Ngoma
24    Nyarugenge
25      Gisagara
26       Kayonza
27       Ruhango
28        Nyanza
29       Rutsiro
Name: District, dtype: object

In [None]:
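# Rows 5 and 6 are Longitude and Latitude; rebuild them as a float DataFrame with districts as columns
new_df = pd.DataFrame(location.values[5:], columns=headers).astype(float)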
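
The `.astype(float)` call is needed because transposing a table that mixes text and numbers leaves every column with the generic `object` dtype. The sketch below shows this on a two-row made-up table (`toy` is hypothetical, with coordinates copied from the output above).

```python
import pandas as pd

# Hypothetical two-row table mixing a text column and a numeric column
toy = pd.DataFrame({
    "District": ["Gatsibo", "Muhanga"],
    "Longitude": [30.445392, 29.722725],
})
print(toy.dtypes)  # Longitude starts out as float64

# After transposing, each column holds both text and numbers
flipped = toy.transpose()
print(flipped.dtypes)  # every column is now object

# Drop the text row, reuse it as headers, and convert back to floats
restored = pd.DataFrame(flipped.values[1:], columns=flipped.iloc[0]).astype(float)
print(restored.dtypes)
```

Without the conversion, numeric operations on the coordinates could fail or behave unexpectedly.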

In [None]:
new_df

District,Gatsibo,Muhanga,Gicumbi,Kicukiro,Rulindo,Nyaruguru,Rwamagana,Kamonyi,Gasabo,Ngororero,...,Huye,Musanze,Rusizi,Ngoma,Nyarugenge,Gisagara,Kayonza,Ruhango,Nyanza,Rutsiro
0,30.445392,29.722725,30.113871,30.143725,29.987227,29.516876,30.354736,29.902403,30.142212,29.569465,...,29.708795,29.606597,29.087431,30.457188,30.028865,29.843603,30.641979,29.771761,29.793463,29.399042
1,1.619076,1.954891,1.621577,2.008864,1.739284,2.694844,1.975497,2.009446,1.891447,1.877845,...,2.524645,1.498535,2.565372,2.182993,1.992005,2.618112,1.845126,2.193598,2.335861,1.903725
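
The same shape can also be reached by selecting columns by name rather than by row position, which keeps the `Longitude` and `Latitude` labels on the rows. This is only an alternative sketch, not the approach used above, and `toy_location` is a made-up three-row stand-in for the original (untransposed) `location` table.

```python
import pandas as pd

# Hypothetical three-row stand-in for the `location` table before transposing
toy_location = pd.DataFrame({
    "Province": ["Eastern Province", "Southern Province", "Kigali City"],
    "District": ["Gatsibo", "Muhanga", "Kicukiro"],
    "Longitude": [30.445392, 29.722725, 30.143725],
    "Latitude": [1.619076, 1.954891, 2.008864],
})

# Index by district, keep the two coordinate columns, then transpose
coords = toy_location.set_index("District")[["Longitude", "Latitude"]].T
print(coords)

# Look up one value by row label and district name
print(coords.loc["Latitude", "Muhanga"])
```

Because only numeric columns are selected before transposing, no `astype(float)` step is needed in this variant.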
