<h2><center> Welcome to the TotalEnergies rEVolution Hackathon</center></h2>
<figure>
<!--<img src ="https://drive.google.com/uc?export=view&id=1hSOAfRhJ_jo-MZAjq81VYJu5bZNL7EjD" width = "800" height = '500'/> -->
</figure>

***Prologue***
>TotalEnergies Uganda welcomes you to the TotalEnergies rEVolution Hackathon, implemented by Outbox Uganda, to identify optimal locations for EV charging points in Kampala, taking into account constraints such as traffic, coverage, population density, and usage.

> NB: This challenge serves as a qualifier for the [TotalEnergies rEVolution Hackathon ($10 000 in prizes)](https://zindi.africa/competitions/total-energies-revolution-hackathon-finals), to be hosted virtually on 18-20 September 2023. To qualify for the hackathon, you need to be in a team of 4 members and make a submission to this challenge.

***About the problem***
>The lack of optimal locations for electric vehicle (EV) charging points in Kampala presents a significant challenge for the adoption and usage of electric vehicles. Factors such as traffic congestion, optimum coverage, population density, and usage patterns need to be considered to ensure the effective implementation of EV charging infrastructure. Therefore, there is a need to address this problem by identifying and delivering the most suitable locations for EV charging points in Kampala.

>Nairobi is one of the most heavily congested cities in Africa. Each day thousands of Kenyans make the trip into Nairobi from towns such as Kisii, Keroka, and beyond for work, business, or to visit friends and family. The journey can be long, and the final approach into the city can impact the length of the trip significantly depending on traffic. How do traffic patterns influence people's decisions to come into the city by bus and which bus to take? Does knowing the traffic patterns in Nairobi help anticipate the demand for particular routes at particular times?

***Objective of this challenge***
> The primary objective of the hackathon is to develop innovative solutions that can identify the optimal locations for EV charging points in Kampala. Participants will be encouraged to leverage data analytics, machine learning, and other relevant technologies to analyse various constraints, such as traffic patterns, coverage requirements, population density, and EV usage.

>The aim of the competition is to create a predictive model using traffic data provided from Uber Movement and historic bus ticket sales data from Mobiticket to predict the number of tickets that will be sold for buses into Nairobi from cities in "up country" Kenya.

***About the Data***
>The data used to train the model consists of historic hourly traffic patterns in Nairobi and historic ticket purchasing data for 14 bus routes into Nairobi from October 2017 to April 2018. The ticket data includes the place of origin, the scheduled time of departure, the channel used for the purchase, the type of vehicle, the capacity of the vehicle, and the assigned seat number. Zindi competitors will be allowed to create their own customized traffic datasets using the Uber Movement platform.

***Evaluation metric***
> The Mean Absolute Error (MAE) will be used to evaluate the accuracy of the submitted solutions, so the lower the score, the better!
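
As a quick illustration of the metric, the sketch below computes the Mean Absolute Error by hand for a few made-up ticket counts. The numbers and the helper name `mean_abs_error` are hypothetical and are not taken from the competition data.

```python
# Mean Absolute Error: the average of |actual - predicted| over all rides.
# The values below are invented purely for illustration.
actual = [10, 3, 7, 1]
predicted = [8, 5, 7, 4]


def mean_abs_error(y_true, y_pred):
    """Return the mean of absolute differences between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


mae = mean_abs_error(actual, predicted)
print(mae)  # 1.75
```

The absolute errors here are 2, 2, 0 and 3, so the MAE is 7 / 4 = 1.75. A perfect prediction would score 0.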

***Relevance of the Challenge***
>The resulting model can be used to anticipate customer demand for certain rides, to manage resources and vehicles more efficiently, to offer promotions and sell other services (such as micro-insurance) more effectively, or even to improve customer service by sending alerts and other useful information to customers.

>The solutions to this challenge are the first step towards solving Nairobi's traffic problems. We look forward to taking this journey with you!

# Install libraries

In [14]:
%%capture
!pip install watermark
!pip install pandas-profiling
!pip install pandas
!pip install numpy
!pip install matplotlib
!pip install seaborn
!pip install scikit-learn
!pip install xgboost
!pip install lightgbm
!pip install missingno
!pip install tqdm

# Import libraries

In [41]:
import pandas as pd
import numpy as np
import os
import random
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

### Seed

In [42]:
# Set seed for reproducibility
SEED = 2023
random.seed(SEED)
np.random.seed(SEED)

### Load data

In [43]:
# Load files
# Raw copies of the train and test files, kept for reference
train_1 = pd.read_csv('../TotalEnergies/data/Train.csv')
test_1 = pd.read_csv('../TotalEnergies/data/Test.csv')
test = pd.read_csv('../TotalEnergies/data/Test.csv', parse_dates=['travel_date'], dayfirst=True).drop(['car_type', 'travel_to'], axis=1)
df = pd.read_csv('../TotalEnergies/data/Train.csv', parse_dates=['travel_date'], dayfirst=True)
# Collapse the ticket records to one row per ride; Count is the number of tickets sold for that ride
train = df.groupby(['ride_id', 'travel_date', 'travel_time', 'travel_from', 'max_capacity']).size().reset_index(name='Count')

submission = pd.read_csv('../TotalEnergies/submissions/SampleSubmission.csv')

# Preview sample submission
submission.head()

Unnamed: 0,ride_id,number_of_ticket
0,4446,0
1,13962,0
2,5569,0
3,1675,0
4,5711,0
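
The `groupby(...).size()` step in the load cell turns ticket-level records into one row per ride, with the row count becoming the prediction target. A minimal sketch of that idea on a toy frame follows; the `tickets` data is invented and only mimics a few of the real column names.

```python
import pandas as pd

# Hypothetical ticket-level records: one row per ticket sold.
tickets = pd.DataFrame({
    "ride_id": [1, 1, 1, 2, 3, 3],
    "travel_from": ["Kisii", "Kisii", "Kisii", "Rongo", "Kisii", "Kisii"],
    "max_capacity": [49, 49, 49, 11, 11, 11],
})

# Count rows per ride. The grouping keys are constant within a ride,
# so each ride_id yields exactly one output row.
rides = (
    tickets.groupby(["ride_id", "travel_from", "max_capacity"])
    .size()
    .reset_index(name="Count")
)
print(rides)
```

Because rides with no recorded tickets never appear in the ticket-level data, the smallest possible `Count` after this step is 1, which matches the `min` shown by `train.describe()`.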


In [44]:
# Preview test dataset
test.head()

Unnamed: 0,ride_id,travel_date,travel_time,travel_from,max_capacity
0,4446,2018-04-27,09:00,Kisii,11
1,13962,2018-04-23,07:10,Homa Bay,49
2,5569,2018-04-24,07:20,Kisii,11
3,1675,2018-05-01,11:01,Kisii,11
4,5711,2018-04-22,10:51,Kisii,11


In [46]:
# Summary statistics of the train dataset
train.describe()

Unnamed: 0,ride_id,max_capacity,Count
count,6249.0,6249.0,6249.0
mean,9963.644583,30.392223,8.264522
std,2296.304872,18.997471,8.632968
min,1442.0,11.0,1.0
25%,7989.0,11.0,2.0
50%,10024.0,49.0,7.0
75%,11917.0,49.0,11.0
max,20117.0,49.0,50.0


### Exploratory Data Analysis

### Data Description

In [18]:
# check the shape of the data
print("Train shape: ", train.shape)
print("Test shape: ", test.shape)
print("Submission shape: ", submission.shape)

Train shape:  (6249, 6)
Test shape:  (889, 5)
Submission shape:  (889, 2)


# Data Cleaning

In [19]:

test["travel_time"] = test["travel_time"].str.split(':').apply(lambda x: int(x[0]) * 60 + int(x[1]))
test['day'] = test['travel_date'].dt.dayofweek
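
The time conversion above maps a clock string to minutes since midnight (hours times 60, plus minutes). The same logic as a small standalone helper is sketched below; the name `to_minutes` is hypothetical and is not used elsewhere in this notebook.

```python
def to_minutes(hhmm: str) -> int:
    """Convert an 'H:MM' or 'HH:MM' string to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


print(to_minutes("7:15"))   # 435
print(to_minutes("09:00"))  # 540
```

After splitting on `:`, the hour is at index 0 and the minute at index 1. Indexing positions 1 and 2 instead raises an `IndexError` for two-part strings such as `7:15`.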

In [20]:
train['t'] = 0
test['t'] = 1

In [21]:
X = pd.concat([train, test], sort=False)

In [22]:
Xd = pd.get_dummies(X, columns=['travel_from', 'day'])

In [23]:
Xd.head()

Unnamed: 0,ride_id,travel_date,travel_time,max_capacity,Count,t,travel_from_Awendo,travel_from_Homa Bay,travel_from_Kehancha,travel_from_Kendu Bay,...,travel_from_Rongo,travel_from_Sirare,travel_from_Sori,day_0.0,day_1.0,day_2.0,day_3.0,day_4.0,day_5.0,day_6.0
0,1442,2017-10-17,7:15,49,1.0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,5437,2017-11-19,7:12,49,1.0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,5710,2017-11-26,7:05,49,1.0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,5777,2017-11-27,7:10,49,5.0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
4,5778,2017-11-27,7:12,49,31.0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
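
Concatenating train and test before calling `pd.get_dummies` is what keeps the encoded columns identical across both sets, even when a category appears in only one of them. A toy sketch of the difference follows; the frames `train_toy` and `test_toy` are made up for illustration.

```python
import pandas as pd

train_toy = pd.DataFrame({"travel_from": ["Kisii", "Rongo"], "t": [0, 0]})
test_toy = pd.DataFrame({"travel_from": ["Kisii", "Sori"], "t": [1, 1]})

# Encoding each frame separately gives mismatched columns...
sep_train = pd.get_dummies(train_toy, columns=["travel_from"])
sep_test = pd.get_dummies(test_toy, columns=["travel_from"])

# ...whereas encoding the concatenated frame gives one shared column set.
combined = pd.get_dummies(
    pd.concat([train_toy, test_toy], ignore_index=True), columns=["travel_from"]
)
enc_train = combined[combined["t"] == 0].drop(columns="t")
enc_test = combined[combined["t"] == 1].drop(columns="t")

print(list(sep_train.columns) == list(sep_test.columns))  # False
print(list(enc_train.columns) == list(enc_test.columns))  # True
```

The `t` flag plays the same role as in the notebook: it marks which rows to split back out after encoding.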


In [37]:
# Convert 'H:MM' departure times to minutes since midnight
train["travel_time"] = train["travel_time"].str.split(':').apply(lambda x: int(x[0]) * 60 + int(x[1]))
train["travel_time"].head()

In [36]:
train['day'] = train['travel_date'].dt.dayofweek

# Modeling

In [34]:
# Rebuild the combined frame: X and Xd above were created before train was cleaned,
# so they still hold string travel times and no 'day' values for the train rows
X = pd.concat([train, test], sort=False)
Xd = pd.get_dummies(X, columns=['travel_from', 'day'])

X_train = Xd.loc[Xd['t'] == 0].drop(['Count', 'ride_id', 'travel_date'], axis=1)
y_train = Xd.loc[Xd['t'] == 0]['Count']
regr = RandomForestRegressor(n_estimators=500, criterion="absolute_error", max_depth=10, n_jobs=-1, random_state=SEED)
regr.fit(X_train, y_train)

In [None]:
# Training-set MAE (an optimistic estimate, since the model has already seen these rows)
print(mean_absolute_error(y_train, regr.predict(X_train)))
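
An MAE computed on the training rows usually understates the error on unseen rides. One common way to get a more honest estimate is to hold out part of the training data. The sketch below shows this on synthetic stand-in data rather than the competition data; the split size, the model settings and all `*_demo` names are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 200 rows, 3 features, noisy linear target.
rng = np.random.default_rng(2023)
X_demo = rng.uniform(0, 1, size=(200, 3))
y_demo = 10 * X_demo[:, 0] + rng.normal(0, 1, size=200)

# Keep 20% of the rows aside; the model never sees them during fitting.
X_fit, X_val, y_fit, y_val = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=2023
)

model = RandomForestRegressor(n_estimators=50, max_depth=10, random_state=2023)
model.fit(X_fit, y_fit)

train_mae = mean_absolute_error(y_fit, model.predict(X_fit))
val_mae = mean_absolute_error(y_val, model.predict(X_val))
print(f"train MAE: {train_mae:.3f}, validation MAE: {val_mae:.3f}")
```

The gap between the two numbers gives a rough sense of how much the training-set score flatters the model.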

# Model Evaluation

In [None]:
X_test = Xd.loc[Xd['t'] == 1].drop(['Count', 'ride_id', 'travel_date'], axis=1)
pred = regr.predict(X_test)


# Create Solution

In [None]:
# Fill every row with its prediction (assumes submission rows are in the same order as test)
submission['number_of_ticket'] = pred
submission.head(10)
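
Positional assignment only works when the sample submission and the test file list rides in the same order. If that were ever not the case, matching on `ride_id` would be safer. The sketch below shows one way to do it on toy frames; all names ending in `_toy` are hypothetical.

```python
import pandas as pd

# Hypothetical frames: the submission and the test rows are in different orders.
submission_toy = pd.DataFrame({"ride_id": [30, 10, 20], "number_of_ticket": [0, 0, 0]})
test_toy = pd.DataFrame({"ride_id": [10, 20, 30]})
pred_toy = [5.0, 7.0, 9.0]  # predictions in test_toy row order

# Build a ride_id -> prediction lookup, then map it onto the submission rows.
lookup = pd.Series(pred_toy, index=test_toy["ride_id"])
submission_toy["number_of_ticket"] = submission_toy["ride_id"].map(lookup)
print(submission_toy)
```

Any `ride_id` missing from the lookup would come back as `NaN`, which makes a mismatch easy to spot before the file is written.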

# Create Submission File

In [21]:
submission.to_csv('number_of_ticket.csv', index=False)
