## <a class="anchor" id="main">FoCS Lab</a>
### Contributors: Marco Distrutti, Santosh Anand

* [1 Normalization](#normalize)
* [2 Duration Field](#duration)
* [3 Search lenders](#search-lenders)
* [4 How many loans](#borrowers)
* [5 Overall amount](#overall-amount)
* [6 Overall percentage](#overall-percentage)
* [7 Overall by country/year](#year-overall-percentage)
* [8 Lenders overall](#lenders-overall)
* [9 Country, lent vs borrowed](#country-lent-borrowed)
* [10 Country, Difference lent-borrowed in respect of population](#country-lent-borrowed-population)
* [11 Country, Difference lent-borrowed in respect of population below poverty line](#country-lent-borrowed-population-line)
* [12 Country, For each year, compute the total amount of loans](#country-loan)

In [1]:
import pandas as pd
import numpy as np
import itertools

df_loan_lenders = pd.read_csv('kiva-kaggle/loans_lenders.csv')
df_loan_lenders_original = df_loan_lenders

In [2]:
loan_lenders = df_loan_lenders
df_loan_lenders

Unnamed: 0,loan_id,lenders
0,483693,"muc888, sam4326, camaran3922, lachheb1865, reb..."
1,483738,"muc888, nora3555, williammanashi, barbara5610,..."
2,485000,"muc888, terrystl, richardandsusan8352, sherri4..."
3,486087,"muc888, james5068, rudi5955, daniel9859, don92..."
4,534428,"muc888, niki3008, teresa9174, mike4896, david7..."
...,...,...
1387427,678999,"michael43411218, carol5987, gooddogg1, chris41..."
1387428,1207353,"rjhoward1986, jeffrey6870, trolltech4460, elys..."
1387429,1206220,"vicky7746, gooddogg1, fairspirit, craig9729960..."
1387430,1206425,"rich6705, sergiiy9766, angela7509, barbara5610..."


## 1 <a class="anchor" id="normalize">Normalization</a>
#### Normalize the loan_lenders table. In the normalized table, each row must have one loan_id and one lender.
[⇑ index](#main)

**Vectorized** operations are much faster than looping over the dataset in Python, because pandas and NumPy implement them in optimized **C** routines. The strategy is to build both columns (loan_id and lender) separately: the lender column comes from splitting each row, and the loan_id column is generated by repeating each id once per lender in its row.

In [3]:
%%time

#FIRST 
## SANTOSH -> Note that the lenders are separated by ", "
lenders = [lender for lenders in df_loan_lenders['lenders'] for lender in lenders.split(', ')]
# lenders


Wall time: 3.55 s


In [4]:
#SECOND AXIS - the loan ids as a plain list (each id is repeated once per lender in In [6])
loan_ids = df_loan_lenders['loan_id'].tolist()
#loan_ids

In [5]:
%%time
cardinality = [len(lenders.split(", ")) for lenders in df_loan_lenders['lenders']]
# cardinality


Wall time: 1.87 s


In [6]:
%%time

#create a new list with repeated ids for each loan in the same original row
flatted_ids = list(itertools.chain(*[[loan_ids[i]] * cardinality[i] for i in range(0, len(loan_ids))]))
# flatted_ids


Wall time: 5.32 s
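As a possible alternative to the list multiplication above, NumPy's `np.repeat` repeats each id by its cardinality in a single vectorized call. This is only a sketch: `sample_ids` and `sample_cardinality` are small made-up stand-ins for `loan_ids` and `cardinality`, and it has not been timed on the real dataset.

```python
import numpy as np

# Hypothetical stand-ins for loan_ids and cardinality
sample_ids = np.array([483693, 483738, 485000])
sample_cardinality = np.array([2, 1, 3])

# Repeat each id as many times as its row has lenders
sample_flat_ids = np.repeat(sample_ids, sample_cardinality)
print(sample_flat_ids.tolist())
```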


In [7]:
#DATAFRAME
df_loan_lenders = pd.DataFrame({'loan_id':flatted_ids, 'lender':lenders})

#More than 28 million records
df_loan_lenders

Unnamed: 0,loan_id,lender
0,483693,muc888
1,483693,sam4326
2,483693,camaran3922
3,483693,lachheb1865
4,483693,rebecca3499
...,...,...
28293926,1206425,trogdorfamily7622
28293927,1206425,danny6470
28293928,1206425,don6118
28293929,1206486,alan5175
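The same normalization can probably also be written with pandas alone, using `str.split` followed by `DataFrame.explode` (available from pandas 0.25). The sketch below runs on a tiny made-up frame named `raw`, not on the real file, and we have not compared its speed with the approach above.

```python
import pandas as pd

# Hypothetical two-row sample in the same shape as loans_lenders.csv
raw = pd.DataFrame({
    "loan_id": [483693, 483738],
    "lenders": ["muc888, sam4326, camaran3922", "muc888, nora3555"],
})

# Split each string into a list, then emit one row per list element
normalized = (
    raw.assign(lender=raw["lenders"].str.split(", "))
       .explode("lender")[["loan_id", "lender"]]
       .reset_index(drop=True)
)
print(normalized)
```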


## 2 <a class="anchor" id="duration">Duration</a>
#### For each loan, add a column duration corresponding to the number of days between the disburse time and the planned expiration time. If any of those two dates is missing, also the duration must be missing.
[⇑ index](#main)

Specifying the column names and dtypes up front avoids loading unnecessary data and improves performance: only the columns we need are read, and pandas does not have to infer the data types during the scan.

In [8]:
%%time
df_loans = pd.read_csv('kiva-kaggle/loans.csv',
                       usecols=['loan_id', 'disburse_time', 'planned_expiration_time', 'country_code', 'country_name', 'loan_amount', 'num_lenders_total', 'funded_amount'],
                       dtype={'loan_id': np.int32, 'disburse_time': 'str', 'planned_expiration_time': 'str', 'country_code': 'str', 'country_name': 'str', 'loan_amount': 'float', 'num_lenders_total': np.int32, 'funded_amount': 'float'})

#The dates could be parsed inside read_csv by passing a custom parser, but converting
#the columns afterwards with the vectorized pd.to_datetime saved more than 2 minutes on this load.
df_loans['planned_expiration_time'] = pd.to_datetime(df_loans['planned_expiration_time'])
df_loans['disburse_time']= pd.to_datetime(df_loans['disburse_time'])
df_loans["duration"] = df_loans['planned_expiration_time'] - df_loans['disburse_time']

df_loans[['disburse_time', 'planned_expiration_time', 'duration']]

Wall time: 10.3 s


Unnamed: 0,disburse_time,planned_expiration_time,duration
0,2013-12-22 08:00:00+00:00,2014-02-14 03:30:06+00:00,53 days 19:30:06
1,2013-12-20 08:00:00+00:00,2014-03-26 22:25:07+00:00,96 days 14:25:07
2,2014-01-09 08:00:00+00:00,2014-02-15 21:10:05+00:00,37 days 13:10:05
3,2014-01-17 08:00:00+00:00,2014-02-21 03:10:02+00:00,34 days 19:10:02
4,2013-12-17 08:00:00+00:00,2014-02-13 06:10:02+00:00,57 days 22:10:02
...,...,...,...
1419602,2015-11-23 08:00:00+00:00,2016-01-02 01:00:03+00:00,39 days 17:00:03
1419603,2015-11-24 08:00:00+00:00,2016-01-02 16:40:07+00:00,39 days 08:40:07
1419604,2015-11-13 08:00:00+00:00,2016-01-03 22:20:04+00:00,51 days 14:20:04
1419605,2015-11-03 08:00:00+00:00,2016-01-05 08:50:02+00:00,63 days 00:50:02
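The exercise asks for a number of days, while the column above holds full timedeltas. One way to get whole days, assuming truncation (not rounding) is acceptable, is the `.dt.days` accessor. The sketch below uses a made-up two-row frame named `sample` and also shows that a missing date yields a missing duration.

```python
import pandas as pd

# Hypothetical sample: the second loan has no disburse time
sample = pd.DataFrame({
    "disburse_time": ["2013-12-22 08:00:00", None],
    "planned_expiration_time": ["2014-02-14 03:30:06", "2014-03-26 22:25:07"],
})
start = pd.to_datetime(sample["disburse_time"])
end = pd.to_datetime(sample["planned_expiration_time"])

# NaT propagates through the subtraction, so a missing date gives a missing duration
duration_days = (end - start).dt.days
print(duration_days)
```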


## 3 <a class="anchor" id="search-lenders">Search lenders</a>
#### Find the lenders that have funded at least twice.
[⇑ index](#main)

Pandas aggregation methods let us build a grouped dataframe and apply aggregation functions to it, such as counting occurrences.

In [9]:
%%time

df_lenders_multifunder = df_loan_lenders.groupby(['lender']).count().rename(columns={"loan_id": "funds"}).sort_values(by=["funds"])
df_lenders_multifunder = df_lenders_multifunder[df_lenders_multifunder["funds"] >= 2]
df_lenders_multifunder

Wall time: 9.15 s


Unnamed: 0_level_0,funds
lender,Unnamed: 1_level_1
clifford1150,2
sofe8281,2
barbi3519,2
ernesto5398,2
awgmqi9423,2
...,...
themissionbeltco,81434
nms,104314
gmct,128159
trolltech4460,150762


## 4 <a class="anchor" id="borrowers">How many loans</a>
#### For each country, compute how many loans have involved that country as borrowers.
[⇑ index](#main)

There is a problem with Namibia: none of the records for this country have a **country_code**. We took this opportunity to load the country data, and noticed that Namibia has a null **country_code** in country_stats.csv as well. So we decided to force Namibia's two-character ISO code to 'NA' in order to have consistent data in the original **df_loans** dataset.

In [10]:
df_loans[df_loans["country_code"].isnull()][['loan_id', 'country_code', 'country_name']]

Unnamed: 0,loan_id,country_code,country_name
82889,991853,,Namibia
156970,513472,,Namibia
598087,851360,,Namibia
684876,1068159,,Namibia
971827,998555,,Namibia
1134818,1147866,,Namibia
1214923,851368,,Namibia
1281022,1147852,,Namibia
1415763,1068167,,Namibia


In [11]:
df_countries = pd.read_csv('kiva-kaggle/country_stats.csv')
df_countries[df_countries["country_code"].isnull()]

Unnamed: 0,country_name,country_code,country_code3,continent,region,population,population_below_poverty_line,hdi,life_expectancy,expected_years_of_schooling,mean_years_of_schooling,gni,kiva_country_name
115,Namibia,,NAM,Africa,Southern Africa,2533794,28.7,0.640007,65.062,11.657589,6.676,9769.848507,Namibia
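One likely explanation, stated here as an assumption because we did not inspect the raw files, is that the CSV stores the literal string `NA` and `read_csv` treats it as a missing value, since `NA` is in pandas' default list of NA markers. The sketch below reproduces the effect on an in-memory CSV (`csv_text` is made up) and shows that `keep_default_na=False` keeps the code as text.

```python
import io
import pandas as pd

# Hypothetical CSV content; the real files were not checked
csv_text = "country_name,country_code\nNamibia,NA\nKenya,KE\n"

# Default parsing: the string "NA" becomes NaN
default_read = pd.read_csv(io.StringIO(csv_text))

# Only treat empty fields as missing, so "NA" survives as a string
literal_read = pd.read_csv(io.StringIO(csv_text), keep_default_na=False, na_values=[""])

print(default_read["country_code"].tolist())
print(literal_read["country_code"].tolist())
```

If the assumption holds, reading the files this way would make the manual fix below unnecessary.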


In [12]:
#FIXING DF_LOANS
#The fix could be done with apply, setting 'NA' wherever a null value is found,
#but this two-step solution (split into two subsets and concatenate them again) was much faster than apply.

#Vectorized operations are better optimized than apply.

#Take an explicit copy so the assignment does not raise SettingWithCopyWarning
df_namibia = df_loans[df_loans["country_code"].isnull()].copy()
df_namibia["country_code"] = "NA"
#pd.concat replaces DataFrame.append, which was removed in pandas 2.0
df_loans = pd.concat([df_loans[df_loans["country_code"].notnull()], df_namibia])


In [13]:
%%time
df_countries_borrows = df_loans[['country_code', 'country_name', 'loan_id']].groupby(['country_name']).count().rename(columns={"loan_id": "borrows"}).sort_values(by=["borrows"])
df_countries_borrows

Wall time: 208 ms


Unnamed: 0_level_0,country_code,borrows
country_name,Unnamed: 1_level_1,Unnamed: 2_level_1
Mauritania,1,1
Papua New Guinea,1,1
Uruguay,1,1
Botswana,1,1
Canada,1,1
...,...,...
El Salvador,64037,64037
Cambodia,79701,79701
Peru,86000,86000
Kenya,143699,143699


## 5 <a class="anchor" id="overall-amount">Overall amount</a>
#### For each country, compute the overall amount of money borrowed
[⇑ index](#main)

We noticed that funded_amount can differ from loan_amount (in some cases it is even higher). For that reason we treat funded_amount as the borrowed amount; the 12 cases where it exceeds loan_amount are considered overfunding.

In [15]:
when_less  = len(df_loans[df_loans["funded_amount"] < df_loans["loan_amount"]])
when_equal = len(df_loans[df_loans["funded_amount"] == df_loans["loan_amount"]])
when_more  = len(df_loans[df_loans["funded_amount"] > df_loans["loan_amount"]])

print("TOTAL: " + str(when_less + when_equal + when_more))
print("LESS: " + str(when_less) + " | EQUAL = " + str(when_equal) + " | MORE = " + str(when_more))

TOTAL: 1419607
LESS: 64279 | EQUAL = 1355316 | MORE = 12


In [16]:
%%time
df_countries_borrowed = df_loans[['country_code', 'country_name', 'funded_amount']].groupby(['country_code', 'country_name'], as_index=False).sum().rename(columns={"funded_amount": "borrowed"}).sort_values(by=["borrowed"])
df_countries_borrowed

Wall time: 219 ms


Unnamed: 0,country_code,country_name,borrowed
87,VI,Virgin Islands,0.0
30,GZ,Gaza,5000.0
85,UY,Uruguay,8000.0
12,BW,Botswana,8000.0
29,GU,Guam,8700.0
...,...,...,...
40,KH,Cambodia,50829425.0
66,PY,Paraguay,53715200.0
38,KE,Kenya,63650255.0
60,PE,Peru,78702800.0


## 6 <a class="anchor" id="overall-percentage">Overall percentage</a>
#### Like the previous point, but expressed as a percentage of the overall amount lent.
[⇑ index](#main)

In [17]:
total_borrowed = df_countries_borrowed.borrowed.sum()
#vectorized operation
total_borrowed_perc = (df_countries_borrowed.borrowed / total_borrowed) * 100

df_countries_borrowed_perc = pd.DataFrame({'country_name': df_countries_borrowed.country_name, 'borrowed':total_borrowed_perc})
df_countries_borrowed_perc

Unnamed: 0,country_name,borrowed
87,Virgin Islands,0.000000
30,Gaza,0.000442
85,Uruguay,0.000708
12,Botswana,0.000708
29,Guam,0.000770
...,...,...
40,Cambodia,4.497442
66,Paraguay,4.752779
38,Kenya,5.631843
60,Peru,6.963708


## 7 <a class="anchor" id="year-overall-percentage">Overall by country/year</a>
#### Like the three previous points, but split for each year (with respect to disburse time).
[⇑ index](#main)

The following dataset is the amount grouped by **country_name** and **year**. For the same reason explained in Exercise 5, we use the funded amount as the borrowed amount.

In [29]:
df_loans_year = df_loans[['country_name', 'disburse_time', 'funded_amount']].rename(columns={"funded_amount": "borrowed"})
# keep only the rows with a non-null disburse_time
df_loans_year = df_loans_year[df_loans_year.disburse_time.notnull()]
df_loans_year['year'] = df_loans_year.disburse_time.dt.year

df_countries_borrowed_year = df_loans_year.groupby(['country_name', 'year']).sum().sort_values(by=["borrowed"]).reset_index()
df_countries_borrowed_year

Unnamed: 0,country_name,year,borrowed
0,Paraguay,2018,25.0
1,Mexico,2018,50.0
2,Pakistan,2018,75.0
3,Philippines,2018,100.0
4,Bolivia,2018,350.0
...,...,...,...
743,Kenya,2015,9789250.0
744,Philippines,2014,13751850.0
745,Philippines,2015,15847375.0
746,Philippines,2016,15951625.0


We need the total amounts for each year

In [32]:
overall_years = df_loans_year.groupby(['year']).sum().reset_index()
overall_years

Unnamed: 0,year,borrowed
0,2005,102850.0
1,2006,1375175.0
2,2007,15439425.0
3,2008,39384425.0
4,2009,59477925.0
5,2010,72187300.0
6,2011,93601250.0
7,2012,115427450.0
8,2013,128301200.0
9,2014,146668375.0


## 8 <a class="anchor" id="lenders-overall">Lenders overall</a>
#### For each lender, compute the overall amount of money lent. For each loan that has more than one lender, you must assume that all lenders contributed the same amount.
[⇑ index](#main)

We noticed that **num_lenders_total**, which should represent the number of lenders of a loan, differs from the actual number of lenders associated with that loan in the loan/lender relationship. The following cell shows the difference:

In [57]:
# Show the difference explained

df_loan_lenders_grouped = df_loan_lenders.groupby(["loan_id"]).count().rename(columns={"lender": "n_lenders"}).reset_index()
df_loan_lenders_merged = df_loans.merge(df_loan_lenders_grouped)
df_loan_lenders_merged_subset = df_loan_lenders_merged[["loan_id", "funded_amount", "num_lenders_total", "n_lenders"]]
df_loan_lenders_merged_subset

Unnamed: 0,loan_id,funded_amount,num_lenders_total,n_lenders
0,657307,125.0,3,3
1,657259,400.0,11,7
2,658010,400.0,16,14
3,659347,625.0,21,17
4,656933,425.0,15,14
...,...,...,...,...
1387423,998555,3325.0,126,116
1387424,1147866,5000.0,183,170
1387425,851368,4150.0,159,156
1387426,1147852,5100.0,183,174


In [58]:
# Add the funded_per_person column

funded_per_person = df_loan_lenders_merged_subset.funded_amount / df_loan_lenders_merged_subset.n_lenders
df_loan_lenders_merged_subset = df_loan_lenders_merged_subset.assign(funded_per_person = funded_per_person)
df_loan_lenders_merged_subset

Unnamed: 0,loan_id,funded_amount,num_lenders_total,n_lenders,funded_per_person
0,657307,125.0,3,3,41.666667
1,657259,400.0,11,7,57.142857
2,658010,400.0,16,14,28.571429
3,659347,625.0,21,17,36.764706
4,656933,425.0,15,14,30.357143
...,...,...,...,...,...
1387423,998555,3325.0,126,116,28.663793
1387424,1147866,5000.0,183,170,29.411765
1387425,851368,4150.0,159,156,26.602564
1387426,1147852,5100.0,183,174,29.310345


In [59]:
df_loan_lenders_merged_subset = df_loan_lenders_merged_subset[["loan_id", "funded_per_person"]].set_index("loan_id")
df_lender_funds = pd.merge(df_loan_lenders, df_loan_lenders_merged_subset, on="loan_id").drop(columns=["loan_id"]).groupby(['lender']).sum()
df_lender_funds = df_lender_funds.rename(columns={"funded_per_person": "total_funded"}).reset_index()
df_lender_funds

Unnamed: 0,lender,total_funded
0,000,1703.868411
1,00000,1379.750248
2,0002,2472.563566
3,00mike00,52.631579
4,0101craign0101,2623.565117
...,...,...
1383794,zzmcfate,63381.546705
1383795,zzpaghetti9994,51.020408
1383796,zzrvmf8538,513.213719
1383797,zzzsai,267.667370
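The equal-share assumption can be illustrated on a tiny example. The sketch below uses made-up frames (`loans`, `loan_lender`) and counts the lenders of each loan with `transform("size")`, which is one possible way to get the same per-lender share as the cells above.

```python
import pandas as pd

# Hypothetical data: loan 1 has two lenders, loan 2 has three
loans = pd.DataFrame({"loan_id": [1, 2], "funded_amount": [100.0, 90.0]})
loan_lender = pd.DataFrame({
    "loan_id": [1, 1, 2, 2, 2],
    "lender": ["alice", "bob", "alice", "bob", "carol"],
})

merged = loan_lender.merge(loans, on="loan_id")
# Number of lenders of each loan, aligned with every (loan, lender) row
n_lenders = merged.groupby("loan_id")["lender"].transform("size")
merged["share"] = merged["funded_amount"] / n_lenders

# Sum the equal shares over all loans of each lender
total_lent = merged.groupby("lender")["share"].sum()
print(total_lent)
```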


## 9 <a class="anchor" id="country-lent-borrowed">Country, lent vs borrowed</a>
#### For each country, compute the difference between the overall amount of money lent and the overall amount of money borrowed. Since the country of the lender is often unknown, you can assume that the true distribution among the countries is the same as the one computed from the rows where the country is known.
[⇑ index](#main)

In [169]:
df_lenders = pd.read_csv('kiva-kaggle/lenders.csv')
df_lenders

Unnamed: 0,permanent_name,display_name,city,state,country_code,member_since,occupation,loan_because,loan_purchase_num,invited_by,num_invited
0,qian3013,Qian,,,,1461300457,,,1.0,,0
1,reena6733,Reena,,,,1461300634,,,9.0,,0
2,mai5982,Mai,,,,1461300853,,,,,0
3,andrew86079135,Andrew,,,,1461301091,,,5.0,Peter Tan,0
4,nguyen6962,Nguyen,,,,1461301154,,,,,0
...,...,...,...,...,...,...,...,...,...,...,...
2349169,janet7309,Janet,,,,1342097163,,,,,0
2349170,pj4198,,,,,1342097515,,,,,0
2349171,maria2141,Maria,,,US,1342099723,,,2.0,,0
2349172,simone9846,Simone,,,,1342100213,,,,,0


In [170]:
df_lenders.count()

permanent_name       2349174
display_name         2346406
city                  729868
state                 635693
country_code          890539
member_since         2349174
occupation            504660
loan_because          174322
loan_purchase_num    1454893
invited_by            496825
num_invited          2349174
dtype: int64

In [171]:
(df_lenders.country_code.isnull().sum() / df_lenders.permanent_name.count()) * 100

62.091398934263694

The outputs above show that about 62% of the **country_code** values are missing, and that **permanent_name** appears to be the primary key, since it is populated in every row. The most important step now is to fill those missing values following the same distribution as the known countries.

In [172]:
## Get only country code and lenders

lenders = df_lenders[['permanent_name', 'country_code']]

## lenders w/ country code
lenders_wCountry = lenders[lenders['country_code'].notnull()]
print(lenders_wCountry)

## lenders w/o country code
lenders_woCountry = lenders[lenders['country_code'].isnull()]
print(lenders_woCountry)

            permanent_name country_code
16              naresh2074           US
31       christina27976796           US
37               vikas1098           IN
39                qian1385           US
42                xigg8769           US
...                    ...          ...
2349158              rakhi           US
2349159          vicki5374           US
2349161       jennifer5879           CA
2349171          maria2141           US
2349173         laurie1160           US

[890539 rows x 2 columns]
         permanent_name country_code
0              qian3013          NaN
1             reena6733          NaN
2               mai5982          NaN
3        andrew86079135          NaN
4            nguyen6962          NaN
...                 ...          ...
2349167        todd5695          NaN
2349168    kate40761039          NaN
2349169       janet7309          NaN
2349170          pj4198          NaN
2349172      simone9846          NaN

[1458635 rows x 2 columns]


In [173]:
lenders_wCountry_1 = lenders_wCountry.groupby(['country_code']).count()
lenders_wCountry_1

Unnamed: 0_level_0,permanent_name
country_code,Unnamed: 1_level_1
AD,15
AE,1043
AF,228
AG,8
AI,4
...,...
YE,195
YT,2
ZA,1051
ZM,65


In [174]:
#distribution of known countries

lenders_wCountry
lenders_wCountry_1 = lenders_wCountry.groupby(['country_code']).count().rename(columns={"permanent_name": "total"}).reset_index()
lenders_wCountry_1['frac'] = lenders_wCountry_1.total / lenders_wCountry_1.total.sum()
print(lenders_wCountry_1)

totalToFill = (len(lenders_woCountry))

## fill the countries according to the fraction
lenders_wCountry_1['toFill'] = round(lenders_wCountry_1.frac * totalToFill).astype(int)
lenders_wCountry_1
print(lenders_wCountry_1)

## Rounding leaves a small difference from the number of lenders to fill
diff1 = totalToFill - sum(lenders_wCountry_1.toFill)
diff1



    country_code  total      frac
0             AD     15  0.000017
1             AE   1043  0.001171
2             AF    228  0.000256
3             AG      8  0.000009
4             AI      4  0.000004
..           ...    ...       ...
229           YE    195  0.000219
230           YT      2  0.000002
231           ZA   1051  0.001180
232           ZM     65  0.000073
233           ZW     54  0.000061

[234 rows x 3 columns]
    country_code  total      frac  toFill
0             AD     15  0.000017      25
1             AE   1043  0.001171    1708
2             AF    228  0.000256     373
3             AG      8  0.000009      13
4             AI      4  0.000004       7
..           ...    ...       ...     ...
229           YE    195  0.000219     319
230           YT      2  0.000002       3
231           ZA   1051  0.001180    1721
232           ZM     65  0.000073     106
233           ZW     54  0.000061      88

[234 rows x 4 columns]


7

7 is the difference produced by rounding: the per-country `toFill` values add up to 7 fewer than `totalToFill`, so 7 lenders still need a country.

In [175]:
# build a list of lists [['AD', 'AD', ...], ..., ['ZW', ...]] where each country code is repeated toFill times
ll = [[lenders_wCountry_1.country_code[i]] * lenders_wCountry_1.toFill[i] for i in range(0, len(lenders_wCountry_1))]

# flatten the list
ll = list(itertools.chain.from_iterable(ll))

# The leftover difference is distributed randomly among the countries
import random
random.seed(102)
extra = random.sample(list(lenders_wCountry_1['country_code']), diff1)
ll.extend(extra)

# Shuffle this list now
random.shuffle(ll)
#ll[1:100]
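A deterministic alternative to drawing the leftover countries at random is the largest-remainder method: take the floor of every quota, then give one extra unit to the countries with the largest fractional parts. This is a different technique from the one used above, shown only as a sketch; the function name `apportion` is hypothetical.

```python
import numpy as np

def apportion(counts, total):
    """Split `total` units proportionally to `counts` (largest-remainder method)."""
    counts = np.asarray(counts, dtype=float)
    quotas = counts / counts.sum() * total
    # Start from the floor of each quota
    base = np.floor(quotas).astype(int)
    leftover = total - base.sum()
    # Hand the remaining units to the largest fractional parts
    order = np.argsort(-(quotas - base), kind="stable")
    base[order[:leftover]] += 1
    return base

# Known counts for AD, AE, AF from the table above; 100 is an arbitrary total
filled = apportion([15, 1043, 228], 100)
print(filled.tolist())
```

With this approach the allocations always sum exactly to the requested total, so no separate correction step is needed.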

In [176]:
lenders_woCountry = lenders_woCountry.assign(country_code = ll)
lenders_woCountry

Unnamed: 0,permanent_name,country_code
0,qian3013,US
1,reena6733,DK
2,mai5982,CR
3,andrew86079135,NZ
4,nguyen6962,US
...,...,...
2349167,todd5695,US
2349168,kate40761039,US
2349169,janet7309,LA
2349170,pj4198,US


In [177]:
# Compute for each lender the association lender -> country

df_lenders_countries = pd.concat([lenders_woCountry, lenders_wCountry]).rename(columns={"permanent_name": "lender"})
df_lenders_countries

Unnamed: 0,lender,country_code
0,qian3013,US
1,reena6733,DK
2,mai5982,CR
3,andrew86079135,NZ
4,nguyen6962,US
...,...,...
2349158,rakhi,US
2349159,vicki5374,US
2349161,jennifer5879,CA
2349171,maria2141,US


In [178]:
df_lender_funds_countries = pd.merge(df_lender_funds, df_lenders_countries, how="left", on="lender")
df_lender_funds_countries

Unnamed: 0,lender,total_funded,country_code
0,000,1703.868411,US
1,00000,1379.750248,DE
2,0002,2472.563566,US
3,00mike00,52.631579,US
4,0101craign0101,2623.565117,US
...,...,...,...
1383794,zzmcfate,63381.546705,US
1383795,zzpaghetti9994,51.020408,US
1383796,zzrvmf8538,513.213719,AT
1383797,zzzsai,267.667370,US


In [179]:
df_countries_lent = df_lender_funds_countries.groupby(["country_code"]).sum().reset_index().sort_values(by=["total_funded"])

In [180]:
df_countries_borrowed

Unnamed: 0,country_code,country_name,borrowed
87,VI,Virgin Islands,0.0
30,GZ,Gaza,5000.0
85,UY,Uruguay,8000.0
12,BW,Botswana,8000.0
29,GU,Guam,8700.0
...,...,...,...
40,KH,Cambodia,50829425.0
66,PY,Paraguay,53715200.0
38,KE,Kenya,63650255.0
60,PE,Peru,78702800.0


In the next dataset we merge all the data needed for the requested difference. As the result shows, some country names are NaN: this happens because the data for that specific country is missing from country_stats.csv. Since questions 10 and 11 need the population and the percentage of population below the poverty line, we pull them in here.

In [181]:
df_countries_lent_borrowed = pd.merge(df_countries_lent, df_countries_borrowed, on="country_code", how="outer").drop(columns=["country_name"])
df_countries_lent_borrowed = pd.merge(df_countries_lent_borrowed, df_countries[["country_code", "country_name", "population", "population_below_poverty_line"]], on="country_code", how="left")
df_countries_lent_borrowed = df_countries_lent_borrowed.reindex(columns=["country_code", "country_name", "population", "population_below_poverty_line", "total_funded", "borrowed"])

After this we cannot compute the difference directly: null values must first be converted to 0, because in this situation we treat a missing value as 0. We tested two strategies: **loc**, which updates the values in place, and **numpy.where**.

The following benchmark shows why we used **np.where**.

```python
%timeit df_country_diff.loc[df_country_diff['borrowed'].isnull(), 'borrowed'] = 0
#1.05 ms ± 7.49 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit df_country_diff['borrowed'] = np.where(df_country_diff['borrowed'].isnull(), 0, df_country_diff['borrowed'])
#221 µs ± 12.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
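A third option, not included in the benchmark above and therefore not timed here, is `fillna`, which states the intent directly. The sketch below uses a made-up frame named `sample` with the same column names.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: one country only borrows, one only lends
sample = pd.DataFrame({
    "borrowed": [5000.0, np.nan],
    "total_funded": [np.nan, 28.57],
})

# Treat missing amounts as 0 before computing the difference
sample[["borrowed", "total_funded"]] = sample[["borrowed", "total_funded"]].fillna(0)
sample["difference"] = sample["total_funded"] - sample["borrowed"]
print(sample)
```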

In [182]:
df_countries_lent_borrowed.borrowed = np.where(df_countries_lent_borrowed.borrowed.isnull(), 0, df_countries_lent_borrowed.borrowed)
df_countries_lent_borrowed.total_funded = np.where(df_countries_lent_borrowed.total_funded.isnull(), 0, df_countries_lent_borrowed.total_funded)
df_countries_lent_borrowed = df_countries_lent_borrowed.assign(difference = df_countries_lent_borrowed.total_funded - df_countries_lent_borrowed.borrowed)
df_countries_lent_borrowed

Unnamed: 0,country_code,country_name,population,population_below_poverty_line,total_funded,borrowed,difference
0,TF,,,,2.857143e+01,0.0,2.857143e+01
1,NU,,,,5.340909e+01,0.0,5.340909e+01
2,FK,,,,7.631579e+01,0.0,7.631579e+01
3,TK,,,,9.499561e+01,0.0,9.499561e+01
4,MS,,,,1.309375e+02,0.0,1.309375e+02
...,...,...,...,...,...,...,...
231,AU,Australia,24450561.0,,5.844393e+07,0.0,5.844393e+07
232,CA,Canada,36624199.0,9.4,9.166545e+07,50000.0,9.161545e+07
233,US,United States,324459463.0,15.1,6.879652e+08,36365340.0,6.515999e+08
234,TD,Chad,14899994.0,46.7,0.000000e+00,20075.0,-2.007500e+04


For some of the country codes in our table there is no entry in the country table.

## 10 <a class="anchor" id="country-lent-borrowed-population">Difference lent-borrowed in respect of population</a>
#### Which country has the highest ratio between the difference computed at the previous point and the population?
[⇑ index](#main)

As noted in the previous exercise, some country codes have no entry in the country table, so we cannot get the population of those countries; all records with a missing population are dropped.

In [196]:
df_countries_lent_borrowed_nomissing = df_countries_lent_borrowed[df_countries_lent_borrowed.population.notnull()]
df_countries_lent_borrowed_nomissing

Unnamed: 0,country_code,country_name,population,population_below_poverty_line,total_funded,borrowed,difference
7,GA,Gabon,2025137.0,34.3,2.946429e+02,0.0,2.946429e+02
9,GW,Guinea-Bissau,1861283.0,67.0,4.799047e+02,0.0,4.799047e+02
10,DJ,Djibouti,956985.0,23.0,4.811473e+02,0.0,4.811473e+02
12,MR,Mauritania,4420184.0,31.0,5.031316e+02,15000.0,-1.449687e+04
15,NE,Niger,21477348.0,45.4,5.729012e+02,0.0,5.729012e+02
...,...,...,...,...,...,...,...
230,GB,United Kingdom,66181585.0,15.0,5.612169e+07,0.0,5.612169e+07
231,AU,Australia,24450561.0,,5.844393e+07,0.0,5.844393e+07
232,CA,Canada,36624199.0,9.4,9.166545e+07,50000.0,9.161545e+07
233,US,United States,324459463.0,15.1,6.879652e+08,36365340.0,6.515999e+08


In [197]:
df_countries_lent_borrowed_nomissing = df_countries_lent_borrowed_nomissing.drop(columns=["country_code", "population_below_poverty_line", "total_funded", "borrowed"])
df_countries_lent_borrowed_nomissing = df_countries_lent_borrowed_nomissing.assign(ratio_diff_pop=df_countries_lent_borrowed_nomissing.difference/df_countries_lent_borrowed_nomissing.population)
df_countries_lent_borrowed_nomissing.sort_values(by=["ratio_diff_pop"], ascending=False, inplace=True)
df_countries_lent_borrowed_nomissing

Unnamed: 0,country_name,population,difference,ratio_diff_pop
227,Norway,5305383.0,1.977643e+07,3.727616
232,Canada,36624199.0,9.161545e+07,2.501500
231,Australia,24450561.0,5.844393e+07,2.390290
194,Iceland,335025.0,7.257102e+05,2.166137
233,United States,324459463.0,6.515999e+08,2.008263
...,...,...,...,...
141,Mongolia,3075647.0,-1.481962e+07,-4.818376
125,El Salvador,6377853.0,-3.783202e+07,-5.931781
112,Armenia,2930450.0,-2.025041e+07,-6.910343
131,Paraguay,6811297.0,-5.365651e+07,-7.877577


In the output above, the highest ratio is 3.7276 for Norway and the lowest (most negative) ratio is -7.8776 for Paraguay.

## 11 <a class="anchor" id="country-lent-borrowed-population-line">Difference lent-borrowed in respect of population below poverty line</a>
#### Which country has the highest ratio between the difference computed at point 9 and the population that is not below the poverty line?
[⇑ index](#main)

In [227]:
df_countries_lent_borrowed_nomissing_line = df_countries_lent_borrowed[df_countries_lent_borrowed.population.notnull() & df_countries_lent_borrowed.population_below_poverty_line.notnull()]
df_countries_lent_borrowed_nomissing_line = df_countries_lent_borrowed_nomissing_line.drop(columns=["country_code", "total_funded", "borrowed"])
df_countries_lent_borrowed_nomissing_line

Unnamed: 0,country_name,population,population_below_poverty_line,difference
7,Gabon,2025137.0,34.3,2.946429e+02
9,Guinea-Bissau,1861283.0,67.0,4.799047e+02
10,Djibouti,956985.0,23.0,4.811473e+02
12,Mauritania,4420184.0,31.0,-1.449687e+04
15,Niger,21477348.0,45.4,5.729012e+02
...,...,...,...,...
229,Germany,82114224.0,16.7,3.404227e+07
230,United Kingdom,66181585.0,15.0,5.612169e+07
232,Canada,36624199.0,9.4,9.161545e+07
233,United States,324459463.0,15.1,6.515999e+08


Population below the poverty line is expressed as a percentage in [0, 100]. So we take the complementary percentage to get the share of the population that is not below the poverty line, and from it compute the absolute number:

`not_below = (100 - population_below_poverty_line) * population / 100`
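As a quick check of the formula, the sketch below plugs in the Gabon row from the table above (population 2025137, 34.3% below the poverty line). The variable names are chosen for illustration only.

```python
# Values taken from the Gabon row of the table above
population = 2025137.0
population_below_poverty_line = 34.3  # percentage in [0, 100]

# Complementary percentage applied to the population, truncated to an integer
not_below = int((100 - population_below_poverty_line) * population / 100)
print(not_below)
```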

In [228]:
not_below = (100 - df_countries_lent_borrowed_nomissing_line.population_below_poverty_line) * df_countries_lent_borrowed_nomissing_line.population / 100
df_countries_lent_borrowed_nomissing_line = df_countries_lent_borrowed_nomissing_line.assign(not_below = not_below)
df_countries_lent_borrowed_nomissing_line.not_below = df_countries_lent_borrowed_nomissing_line.not_below.astype(int)
df_countries_lent_borrowed_nomissing_line

Unnamed: 0,country_name,population,population_below_poverty_line,difference,not_below
7,Gabon,2025137.0,34.3,2.946429e+02,1330515
9,Guinea-Bissau,1861283.0,67.0,4.799047e+02,614223
10,Djibouti,956985.0,23.0,4.811473e+02,736878
12,Mauritania,4420184.0,31.0,-1.449687e+04,3049926
15,Niger,21477348.0,45.4,5.729012e+02,11726632
...,...,...,...,...,...
229,Germany,82114224.0,16.7,3.404227e+07,68401148
230,United Kingdom,66181585.0,15.0,5.612169e+07,56254347
232,Canada,36624199.0,9.4,9.161545e+07,33181524
233,United States,324459463.0,15.1,6.515999e+08,275466084


In [229]:
df_countries_lent_borrowed_nomissing_line = df_countries_lent_borrowed_nomissing_line.assign(ratio_diff_pop_not_below=df_countries_lent_borrowed_nomissing_line.difference/df_countries_lent_borrowed_nomissing_line.not_below).drop(columns=["population_below_poverty_line"])
df_countries_lent_borrowed_nomissing_line.sort_values(by=["ratio_diff_pop_not_below"], ascending=False, inplace=True)
df_countries_lent_borrowed_nomissing_line

Unnamed: 0,country_name,population,difference,not_below,ratio_diff_pop_not_below
232,Canada,36624199.0,9.161545e+07,33181524,2.761038
233,United States,324459463.0,6.515999e+08,275466084,2.365445
226,Sweden,9910701.0,1.620481e+07,8424095,1.923627
225,Switzerland,8476005.0,1.334647e+07,7916588,1.685886
228,Netherlands,17035938.0,2.610836e+07,15536775,1.680423
...,...,...,...,...,...
119,Bolivia,11051600.0,-4.177520e+07,6785682,-6.156375
127,Nicaragua,6217581.0,-2.899084e+07,4377177,-6.623181
125,El Salvador,6377853.0,-3.783202e+07,4151982,-9.111799
131,Paraguay,6811297.0,-5.365651e+07,5299189,-10.125420


In the output above, the highest ratio is 2.7610 for Canada and the lowest (most negative) ratio is -10.1254 for Paraguay.

## 12 <a class="anchor" id="country-loan">For each year, compute the total amount of loans</a>
#### For each year, compute the total amount of loans. Each loan that has planned expiration time and disburse time in different years must have its amount distributed proportionally to the number of days in each year. For example, a loan with disburse time December 1st, 2016, planned expiration time January 30th 2018, and amount 5000USD has an amount of 5000USD * 31 / (31+365+30) = 363.85 for 2016, 5000USD * 365 / (31+365+30) = 4284.04 for 2017, and 5000USD * 30 / (31+365+30) = 352.11 for 2018.
[⇑ index](#main)
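
Before moving to the vectorized solution, the worked example in the statement can be checked with a few lines of plain Python. This is a minimal sketch: `split_amount_by_year` is a hypothetical helper, not part of the notebook, and it assumes that both the disburse day and the expiration day are counted, which is what the 31 + 365 + 30 in the statement implies.

```python
from datetime import date


def split_amount_by_year(amount, start, end):
    """Split `amount` across calendar years, proportionally to inclusive day counts."""
    days_per_year = {}
    for year in range(start.year, end.year + 1):
        # Clip the loan period to the current calendar year
        seg_start = max(start, date(year, 1, 1))
        seg_end = min(end, date(year, 12, 31))
        # +1 because both endpoints are counted as full days
        days_per_year[year] = (seg_end - seg_start).days + 1
    total_days = sum(days_per_year.values())
    return {year: amount * days / total_days for year, days in days_per_year.items()}


shares = split_amount_by_year(5000, date(2016, 12, 1), date(2018, 1, 30))
for year, share in shares.items():
    print(year, round(share, 2))
```

The per-year shares always add up to the full amount, which is a useful property to check on the real data as well.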

In [690]:
# Work on a copy so that the column assignments below do not trigger SettingWithCopyWarning
df_loans_years = df_loans[["loan_id", "disburse_time", "planned_expiration_time", "funded_amount", "duration"]].copy()

# If one of the two timestamps is missing, fall back to the other one (the duration then becomes 0)
df_loans_years.disburse_time = np.where(df_loans_years.disburse_time.isnull(), df_loans_years.planned_expiration_time, df_loans_years.disburse_time)
df_loans_years.planned_expiration_time = np.where(df_loans_years.planned_expiration_time.isnull(), df_loans_years.disburse_time, df_loans_years.planned_expiration_time)
df_loans_years.duration = df_loans_years.planned_expiration_time - df_loans_years.disburse_time

# Drop the loans for which both timestamps are missing
df_loans_years = df_loans_years[df_loans_years.disburse_time.notnull() & df_loans_years.planned_expiration_time.notnull()]
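
The same fallback logic can be written with `fillna`, which some readers may find easier to follow than `np.where`. The sketch below runs on a hypothetical three-row frame called `toy`, not on `df_loans`; the column names mirror the ones used above.

```python
import pandas as pd

# Toy data: one complete loan, one with a missing disburse time, one with both missing
toy = pd.DataFrame({
    "disburse_time": pd.to_datetime(["2016-12-01", None, None]),
    "planned_expiration_time": pd.to_datetime(["2018-01-30", "2017-05-01", None]),
})

# Fill each missing timestamp with the other one from the same row
toy["disburse_time"] = toy["disburse_time"].fillna(toy["planned_expiration_time"])
toy["planned_expiration_time"] = toy["planned_expiration_time"].fillna(toy["disburse_time"])

# Rows where both timestamps were missing are still NaT and get dropped
toy = toy.dropna(subset=["disburse_time", "planned_expiration_time"])
toy["duration"] = toy["planned_expiration_time"] - toy["disburse_time"]
print(toy)
```

A loan with only one known timestamp ends up with a zero duration, so its whole amount falls in a single year.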

In [691]:
years_cardinality = df_loans_years.planned_expiration_time.dt.year - df_loans_years.disburse_time.dt.year + 1
df_loans_years = df_loans_years.assign(years_cardinality = years_cardinality)
df_loans_years.years_cardinality = np.where(df_loans_years.duration.dt.days < 0, 1, df_loans_years.years_cardinality).astype(int)

# Both time columns are deliberately kept as datetime64: the following cells rely on the
# .dt accessor, which raises AttributeError on columns of plain datetime.date objects.

#df_loans_years.disburse_time = np.where(df_loans_years.disburse_time > df_loans_years.planned_expiration_time, df_loans_years.planned_expiration_time, df_loans_years.disburse_time)
#df_loans_years.years_cardinality = np.where(df_loans_years.disburse_time > df_loans_years.planned_expiration_time, 1, df_loans_years.years_cardinality)
#df_loans_years = df_loans_years[df_loans_years["planned_expiration_time"] >= df_loans_years["disburse_time"]]

In [706]:
df_loans_years[["disburse_time", "years_cardinality"]]

type(df_loans_years.disburse_time)

pandas.core.series.Series

In [693]:
# One entry per calendar year covered by each loan, starting from its disburse year
years = [year for cardinality,disburse_year in zip(df_loans_years.years_cardinality, df_loans_years.disburse_time.dt.year) for year in range(disburse_year, disburse_year+cardinality)]

In [None]:
loans = [loan for loan_id, cardinality in zip(df_loans_years.loan_id, df_loans_years.years_cardinality) for loan in [loan_id] * cardinality]

In [None]:
df_loans_years_exploded = pd.DataFrame(data= {"loan_id": loans, "year": years})
df_loans_years_exploded
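
In the spirit of the vectorized approach used in section 1, the two list comprehensions can also be replaced by `Index.repeat` and `cumcount`. This is only a sketch on a hypothetical two-loan frame (`toy`), under the assumption that `years_cardinality` is a positive integer for every row.

```python
import pandas as pd

# Toy data: loan 1 spans 2016-2018, loan 2 stays within 2017
toy = pd.DataFrame({
    "loan_id": [1, 2],
    "disburse_time": pd.to_datetime(["2016-12-01", "2017-03-10"]),
    "planned_expiration_time": pd.to_datetime(["2018-01-30", "2017-04-10"]),
})
toy["years_cardinality"] = (
    toy["planned_expiration_time"].dt.year - toy["disburse_time"].dt.year + 1
)

# Repeat every row once per calendar year it covers
exploded = toy.loc[toy.index.repeat(toy["years_cardinality"])].copy()

# cumcount() numbers the copies of each original row 0, 1, 2, ...
offset = exploded.groupby(level=0).cumcount().to_numpy()
exploded["year"] = exploded["disburse_time"].dt.year.to_numpy() + offset

exploded = exploded.reset_index(drop=True)
print(exploded[["loan_id", "year"]])
```

This produces the exploded frame with all original columns already attached, so the later merge on `loan_id` would not be needed in this variant.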

In [None]:
df_loans_years_f = pd.merge(df_loans_years_exploded, df_loans_years, on="loan_id")
df_loans_years_f

In [None]:
# Equal years

df_same_years = df_loans_years_f[df_loans_years_f.disburse_time.dt.year == df_loans_years_f.planned_expiration_time.dt.year]
df_same_years = df_same_years.assign(days=(df_same_years.planned_expiration_time-df_same_years.disburse_time))
df_same_years

In [None]:
df_disequal = df_loans_years_f[df_loans_years_f.disburse_time.dt.year != df_loans_years_f.planned_expiration_time.dt.year]

# Year == First Year

df_disequal_first_year = df_disequal[df_disequal.disburse_time.dt.year == df_disequal.year]
end_of_year = pd.to_datetime(pd.Series(df_disequal.disburse_time.dt.year.astype(str) + '-12-31 23:59:59'), format="%Y-%m-%d %H:%M:%S")

df_disequal_first_year = df_disequal_first_year.assign(days=end_of_year - df_disequal_first_year.disburse_time)
df_disequal_first_year

In [694]:
# Year == Last Year

df_disequal_last_year = df_disequal[df_disequal.planned_expiration_time.dt.year == df_disequal.year]
previous_year = pd.to_datetime(pd.Series((df_disequal.planned_expiration_time.dt.year-1).astype(str) + '-12-31'), format="%Y-%m-%d")

df_disequal_last_year = df_disequal_last_year.assign(days=df_disequal_last_year.planned_expiration_time - previous_year)
df_disequal_last_year

Unnamed: 0,loan_id,year,disburse_time,planned_expiration_time,funded_amount,duration,years_cardinality,days
1,657307,2014,2013-12-22 08:00:00,2014-02-14 03:30:06,125.0,53 days 19:30:06,2,45 days 03:30:06
3,657259,2014,2013-12-20 08:00:00,2014-03-26 22:25:07,400.0,96 days 14:25:07,2,85 days 22:25:07
7,656933,2014,2013-12-17 08:00:00,2014-02-13 06:10:02,425.0,57 days 22:10:02,2,44 days 06:10:02
16,660363,2014,2013-12-23 08:00:00,2014-02-21 17:10:02,1175.0,60 days 09:10:02,2,52 days 17:10:02
18,661165,2014,2013-12-26 08:00:00,2014-03-26 22:24:34,300.0,90 days 14:24:34,2,85 days 22:24:34
...,...,...,...,...,...,...,...,...
1571179,988180,2016,2015-11-23 08:00:00,2016-01-02 01:00:03,400.0,39 days 17:00:03,2,2 days 01:00:03
1571181,988213,2016,2015-11-24 08:00:00,2016-01-02 16:40:07,300.0,39 days 08:40:07,2,2 days 16:40:07
1571183,989109,2016,2015-11-13 08:00:00,2016-01-03 22:20:04,2425.0,51 days 14:20:04,2,3 days 22:20:04
1571185,989143,2016,2015-11-03 08:00:00,2016-01-05 08:50:02,100.0,63 days 00:50:02,2,5 days 08:50:02


In [695]:
# Year strictly between the disburse year and the planned expiration year

df_in_range = df_loans_years_f[(df_loans_years_f.year != df_loans_years_f.disburse_time.dt.year) & (df_loans_years_f.year != df_loans_years_f.planned_expiration_time.dt.year)]


first_day = pd.to_datetime(pd.Series((df_in_range.year).astype(str) + '-01-01'), format="%Y-%m-%d")
last_day = pd.to_datetime(pd.Series((df_in_range.year+1).astype(str) + '-01-01'), format="%Y-%m-%d")

df_in_range = df_in_range.assign(days=last_day-first_day)
df_in_range

Unnamed: 0,loan_id,year,disburse_time,planned_expiration_time,funded_amount,duration,years_cardinality,days
37163,1077942,2012,2011-12-22 16:49:11,2016-06-29 01:46:30,650.0,1650 days 08:57:19,6,366 days
37164,1077942,2013,2011-12-22 16:49:11,2016-06-29 01:46:30,650.0,1650 days 08:57:19,6,365 days
37165,1077942,2014,2011-12-22 16:49:11,2016-06-29 01:46:30,650.0,1650 days 08:57:19,6,365 days
37166,1077942,2015,2011-12-22 16:49:11,2016-06-29 01:46:30,650.0,1650 days 08:57:19,6,365 days
37169,1078000,2013,2012-05-07 23:55:05,2016-06-29 01:47:03,525.0,1513 days 01:51:58,5,365 days
...,...,...,...,...,...,...,...,...
1548477,1078002,2014,2012-05-07 23:54:15,2016-06-29 01:47:04,525.0,1513 days 01:52:49,5,365 days
1548478,1078002,2015,2012-05-07 23:54:15,2016-06-29 01:47:04,525.0,1513 days 01:52:49,5,365 days
1548481,1078150,2013,2012-07-26 05:36:36,2016-06-29 01:48:15,5000.0,1433 days 20:11:39,5,365 days
1548482,1078150,2014,2012-07-26 05:36:36,2016-06-29 01:48:15,5000.0,1433 days 20:11:39,5,365 days
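
The three cases (first year, last year, year in between) can also be handled by a single clipping rule: for each (loan, year) row, intersect the loan period with that calendar year. The sketch below is an alternative formulation, not the method used in the cells above. It runs on a hypothetical frame `toy` and assumes half-open intervals, so the last day is not counted and the day counts differ by one from the inclusive count in the exercise statement.

```python
import pandas as pd

# Toy data: the loan from the exercise statement, already exploded into one row per year
toy = pd.DataFrame({
    "year": [2016, 2017, 2018],
    "disburse_time": pd.to_datetime(["2016-12-01"] * 3),
    "planned_expiration_time": pd.to_datetime(["2018-01-30"] * 3),
    "funded_amount": [5000.0] * 3,
})

year_start = pd.to_datetime(toy["year"].astype(str) + "-01-01")
next_year_start = pd.to_datetime((toy["year"] + 1).astype(str) + "-01-01")

# Clip the loan period to [year_start, next_year_start)
seg_start = toy["disburse_time"].where(toy["disburse_time"] > year_start, year_start)
seg_end = toy["planned_expiration_time"].where(
    toy["planned_expiration_time"] < next_year_start, next_year_start
)

toy["days"] = seg_end - seg_start
duration = toy["planned_expiration_time"] - toy["disburse_time"]

# Dividing two timedeltas gives a plain float fraction
toy["loan_year"] = toy["funded_amount"] * (toy["days"] / duration)
print(toy[["year", "days", "loan_year"]])
```

Because the clipped segments partition the loan period exactly, the per-year amounts of a loan with a positive duration add up to its funded amount.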


In [696]:
df_loans_years_f_with_days = pd.concat([df_same_years, df_disequal_first_year, df_in_range, df_disequal_last_year])
assert len(df_loans_years_f_with_days) == len(df_loans_years_f), "An error occurred during the computation: the number of records does not match the number of rows in df_loans_years_f"

In [697]:
len(df_loans_years_f_with_days)

1571197

In [698]:
len(df_loans_years_f)

1571197

In [699]:
df_loans_years_f_with_days = df_loans_years_f_with_days.assign(loan_year=df_loans_years_f_with_days.funded_amount * df_loans_years_f_with_days.days.dt.days / df_loans_years_f_with_days.duration.dt.days)
df_loans_years_f_with_days.loan_year = np.where(df_loans_years_f_with_days.loan_year.isnull(), df_loans_years_f_with_days.funded_amount, df_loans_years_f_with_days.loan_year)

df_loans_years_f_with_days.sort_values(["loan_id", "year"])

Unnamed: 0,loan_id,year,disburse_time,planned_expiration_time,funded_amount,duration,years_cardinality,days,loan_year
383329,84,2005,2005-04-14 05:27:55,2005-04-14 05:27:55,500.0,0 days 00:00:00,1,0 days 00:00:00,500.0
918095,85,2005,2005-04-14 05:27:55,2005-04-14 05:27:55,500.0,0 days 00:00:00,1,0 days 00:00:00,500.0
1167760,86,2005,2005-04-14 05:27:55,2005-04-14 05:27:55,500.0,0 days 00:00:00,1,0 days 00:00:00,500.0
1507883,88,2005,2005-04-14 05:27:55,2005-04-14 05:27:55,300.0,0 days 00:00:00,1,0 days 00:00:00,300.0
96233,89,2005,2005-04-14 05:27:55,2005-04-14 05:27:55,500.0,0 days 00:00:00,1,0 days 00:00:00,500.0
...,...,...,...,...,...,...,...,...,...
1251100,1444072,2018,2018-01-10 08:00:00,2018-02-10 00:35:02,0.0,30 days 16:35:02,1,30 days 16:35:02,0.0
1251587,1444082,2018,2018-01-10 08:00:00,2018-02-10 05:05:07,0.0,30 days 21:05:07,1,30 days 21:05:07,0.0
1408551,1444083,2018,2018-01-10 08:00:00,2018-02-10 05:35:07,0.0,30 days 21:35:07,1,30 days 21:35:07,0.0
281011,1444084,2018,2018-01-10 08:00:00,2018-02-10 07:35:04,0.0,30 days 23:35:04,1,30 days 23:35:04,0.0


In [700]:
df_loans_years_f_with_days[ (df_loans_years_f_with_days["funded_amount"] != 0) & (df_loans_years_f_with_days["days"].dt.days != 0) ]

Unnamed: 0,loan_id,year,disburse_time,planned_expiration_time,funded_amount,duration,years_cardinality,days,loan_year
4,658010,2014,2014-01-09 08:00:00,2014-02-15 21:10:05,400.0,37 days 13:10:05,1,37 days 13:10:05,400.000000
5,659347,2014,2014-01-17 08:00:00,2014-02-21 03:10:02,625.0,34 days 19:10:02,1,34 days 19:10:02,625.000000
8,659605,2014,2014-01-15 08:00:00,2014-02-20 02:30:07,350.0,35 days 18:30:07,1,35 days 18:30:07,350.000000
9,660240,2014,2014-01-20 08:00:00,2014-02-21 07:50:11,125.0,31 days 23:50:11,1,31 days 23:50:11,125.000000
10,661601,2014,2014-01-10 08:00:00,2014-02-25 09:50:03,1600.0,46 days 01:50:03,1,46 days 01:50:03,1600.000000
...,...,...,...,...,...,...,...,...,...
1571179,988180,2016,2015-11-23 08:00:00,2016-01-02 01:00:03,400.0,39 days 17:00:03,2,2 days 01:00:03,20.512821
1571181,988213,2016,2015-11-24 08:00:00,2016-01-02 16:40:07,300.0,39 days 08:40:07,2,2 days 16:40:07,15.384615
1571183,989109,2016,2015-11-13 08:00:00,2016-01-03 22:20:04,2425.0,51 days 14:20:04,2,3 days 22:20:04,142.647059
1571185,989143,2016,2015-11-03 08:00:00,2016-01-05 08:50:02,100.0,63 days 00:50:02,2,5 days 08:50:02,7.936508


In [701]:
# Select loan_year before aggregating so that the datetime/timedelta columns are not summed
df_loans_years_f_with_days.groupby("year")["loan_year"].sum().reset_index()

Unnamed: 0,year,loan_year
0,2005,102850.0
1,2006,1375175.0
2,2007,15439420.0
3,2008,39384420.0
4,2009,59477920.0
5,2010,72187300.0
6,2011,91608150.0
7,2012,109766300.0
8,2013,120862800.0
9,2014,148004400.0


In [702]:
df_loans_years_f_with_days[df_loans_years_f_with_days["year"] == 2018]

Unnamed: 0,loan_id,year,disburse_time,planned_expiration_time,funded_amount,duration,years_cardinality,days,loan_year
5238,1444070,2018,2018-01-10 08:00:00,2018-02-10 00:35:02,0.0,30 days 16:35:02,1,30 days 16:35:02,0.000000
18858,1431631,2018,2018-02-01 08:00:00,2018-01-13 13:00:05,1200.0,-19 days +05:00:05,1,-19 days +05:00:05,1200.000000
22039,1431467,2018,2018-02-01 08:00:00,2018-01-12 14:50:08,400.0,-20 days +06:50:08,1,-20 days +06:50:08,400.000000
22040,1435193,2018,2018-02-01 08:00:00,2018-01-19 18:10:02,525.0,-13 days +10:10:02,1,-13 days +10:10:02,525.000000
22122,1434376,2018,2018-02-01 08:00:00,2018-01-18 06:40:04,1150.0,-15 days +22:40:04,1,-15 days +22:40:04,1150.000000
...,...,...,...,...,...,...,...,...,...
1557858,1439323,2018,2017-11-28 08:00:00,2018-01-24 11:50:01,200.0,57 days 03:50:01,2,24 days 11:50:01,84.210526
1557860,1439616,2018,2017-12-18 08:00:00,2018-01-22 08:10:03,50.0,35 days 00:10:03,2,22 days 08:10:03,31.428571
1557862,1441388,2018,2017-12-20 08:00:00,2018-01-26 01:50:05,450.0,36 days 17:50:05,2,26 days 01:50:05,325.000000
1557864,1441556,2018,2017-12-15 08:00:00,2018-01-25 17:30:06,100.0,41 days 09:30:06,2,25 days 17:30:06,60.975610


In [703]:
df_loans_years_f_with_days.loan_year.sum()

1106551253.9235284

In [704]:
df_loans_years.funded_amount.sum()-df_loans_years_f_with_days.loan_year.sum()

23633866.076471567

In [705]:
df_loans_years_f_with_days[(df_loans_years_f_with_days.days.dt.days == -1)]

Unnamed: 0,loan_id,year,disburse_time,planned_expiration_time,funded_amount,duration,years_cardinality,days,loan_year
993,1140926,2016,2016-09-26 19:52:15,2016-09-26 15:40:02,10000.0,-1 days +19:47:47,1,-1 days +19:47:47,10000.0
7264,465042,2012,2012-09-27 07:00:00,2012-09-26 19:20:02,1200.0,-1 days +12:20:02,1,-1 days +12:20:02,1200.0
10436,1387119,2017,2017-11-01 07:00:00,2017-11-01 03:40:03,175.0,-1 days +20:40:03,1,-1 days +20:40:03,175.0
10646,1387111,2017,2017-11-01 07:00:00,2017-11-01 03:40:03,750.0,-1 days +20:40:03,1,-1 days +20:40:03,750.0
10876,1387143,2017,2017-11-01 07:00:00,2017-11-01 06:40:03,375.0,-1 days +23:40:03,1,-1 days +23:40:03,375.0
...,...,...,...,...,...,...,...,...,...
1559386,756098,2014,2014-10-06 07:00:00,2014-10-05 15:50:09,12700.0,-1 days +08:50:09,1,-1 days +08:50:09,12700.0
1566447,1043514,2016,2016-04-29 07:00:00,2016-04-29 00:50:04,1700.0,-1 days +17:50:04,1,-1 days +17:50:04,1700.0
1566663,1271610,2017,2017-05-24 22:52:36,2017-05-23 23:00:48,7000.0,-1 days +00:08:12,1,-1 days +00:08:12,7000.0
1566954,1096708,2016,2016-07-15 07:00:00,2016-07-14 12:20:03,150.0,-1 days +05:20:03,1,-1 days +05:20:03,150.0
