# Competition

[Kaggle Link](https://www.kaggle.com/competitions/tabular-playground-series-mar-2022/data)

# Description

- A timestamp is given
- x, y coordinates are given
- A direction is given
- Traffic congestion is given in the training data and must be predicted
  
# Preprocessing

- One-hot encode the direction data
- There are no NaN values
- The data is normalized
- Convert the time data into a usable form
  - Separate out each 20-minute interval
  - Separate out each hour
  - Separate out each day
  - Drop the day information and work only with the remaining data
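
The "no NaN" item in the list above can be checked with a quick sanity pass. The sketch below uses a hypothetical three-row `toy` frame with made-up values shaped like the competition columns; it is not the real `train.csv`.

```python
import pandas as pd

# Hypothetical toy rows shaped like the competition data (values are made up).
toy = pd.DataFrame({
    "time": ["1991-04-01 00:00:00", "1991-04-01 00:20:00", "1991-04-01 00:40:00"],
    "x": [0, 1, 2],
    "y": [0, 1, 3],
    "direction": ["EB", "NB", "SW"],
    "congestion": [70, 49, 24],
})

# Count missing values per column, and reduce to a single yes/no flag.
nan_counts = toy.isna().sum()
has_nan = bool(toy.isna().any().any())

# Min and max of the numeric columns, to eyeball the value ranges.
value_ranges = toy[["x", "y", "congestion"]].agg(["min", "max"])
```

On the real data, the same three lines would be run on `raw_data` instead of `toy`.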
  
# Gruplar

## Preprocessing

- Ayfer Sinem Çoban
- Onur Ümit Şener
  
## Modelling

- Ata Güneş
- Mertcan Duran
- Oğulcan Akca
  
## Presentation

- Başak Topçuoğlu
- Saitcan Yıldırım


# Code

## Imports

In [103]:
import pandas as pd

## Data Init

In [104]:
raw_data = pd.read_csv('train.csv')
TOTAL_ROWS = raw_data.shape[0]

## Datetime Conversion


In [105]:
raw_data['time'] = pd.to_datetime(raw_data['time'])

In [106]:
df = raw_data.drop('row_id', axis=1).copy()

## Adding Hours, Minutes, Months

In [107]:
hours_list = []
minutes_list = []
month_list = []
for t in df['time']:
    hours_list.append(t.hour)
    minutes_list.append(t.minute // 20)  # 20-minute slot within the hour: 0, 1, or 2
    month_list.append(t.month)


In [108]:
time_df = pd.DataFrame({'hours': hours_list,
                        'minutes': minutes_list,
                        'month': month_list,
                        })
df = pd.concat([df, time_df], axis=1)
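
As an alternative to looping over every timestamp, the same three features can probably be derived with the vectorized `.dt` accessor, which tends to be faster on large frames. The sketch below assumes a hypothetical `toy` frame standing in for `df`, with `time` already converted to datetime.

```python
import pandas as pd

# Hypothetical stand-in for df; 'time' is already a datetime column.
toy = pd.DataFrame({
    "time": pd.to_datetime([
        "1991-06-24 07:40:00",
        "1991-08-02 04:00:00",
        "1991-09-17 21:20:00",
    ])
})

# Derive hour, 20-minute slot (0, 1, or 2), and month without a Python loop.
toy = toy.assign(
    hours=toy["time"].dt.hour,
    minutes=toy["time"].dt.minute // 20,
    month=toy["time"].dt.month,
)
```

This produces the columns directly on the frame, so the intermediate lists and the `pd.concat` step are not needed in this variant.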

In [109]:
df = df[['time', 'hours', 'minutes', 'month', 'x', 'y', 'direction', 'congestion']]

In [110]:
df.sample(5)

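| | time | hours | minutes | month | x | y | direction | congestion |
|---|---|---|---|---|---|---|---|---|
| 393355 | 1991-06-24 07:40:00 | 7 | 2 | 6 | 2 | 0 | NB | 69 |
| 574980 | 1991-08-02 04:00:00 | 4 | 0 | 8 | 2 | 2 | SB | 43 |
| 791208 | 1991-09-17 21:20:00 | 21 | 1 | 9 | 1 | 2 | NB | 49 |
| 149766 | 1991-05-03 02:40:00 | 2 | 2 | 5 | 0 | 1 | WB | 49 |
| 639392 | 1991-08-15 23:00:00 | 23 | 0 | 8 | 2 | 2 | NB | 55 |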


### Adding Days

In [111]:
weekdays = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")
# NOTE: this labels rows by position (row index mod 7), not by calendar date,
# so the labels do not generally match the dates in 'time'
# (e.g. 1991-09-18 was a Wednesday, but row 792714 is labeled Sunday).
days_list = [weekdays[i % 7] for i in range(TOTAL_ROWS)]
df.insert(1, 'days', days_list)
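
Because row position does not track the calendar, one option is to read the weekday straight from the timestamp with `Series.dt.day_name()`. The sketch below uses a hypothetical `toy` frame; applying it to the real `df` would change the `days` labels, and therefore the one-hot `days_*` columns, relative to the outputs recorded in this notebook.

```python
import pandas as pd

# Hypothetical stand-in for df; 'time' is already a datetime column.
toy = pd.DataFrame({
    "time": pd.to_datetime([
        "1991-04-01 00:00:00",
        "1991-04-03 12:40:00",
        "1991-09-18 05:00:00",
    ])
})

# Derive the English weekday name from each timestamp and place it after 'time'.
toy.insert(1, "days", toy["time"].dt.day_name())
```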

In [112]:
df.sample(5)

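| | time | days | hours | minutes | month | x | y | direction | congestion |
|---|---|---|---|---|---|---|---|---|---|
| 792714 | 1991-09-18 05:00:00 | Sunday | 5 | 0 | 9 | 2 | 0 | EB | 63 |
| 211553 | 1991-05-16 09:20:00 | Sunday | 9 | 1 | 5 | 2 | 1 | EB | 69 |
| 499140 | 1991-07-16 22:20:00 | Saturday | 22 | 1 | 7 | 0 | 1 | SB | 56 |
| 11832 | 1991-04-03 12:40:00 | Wednesday | 12 | 2 | 4 | 0 | 0 | SB | 54 |
| 757432 | 1991-09-10 11:00:00 | Friday | 11 | 0 | 9 | 2 | 2 | NB | 55 |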


## One-Hot

In [113]:
# df_oh['time'].str.get_dummies(' ') # Another method for one-hot, might be useful

In [114]:
df_oh = pd.get_dummies(df, dtype=int)

In [115]:
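# One-hot encode the integer-coded categorical columns as well
int_cols = ['minutes', 'month', 'x', 'y']
int_df = pd.get_dummies(df, columns=int_cols, dtype=int)
# The first 5 columns of int_df (time, days, hours, direction, congestion)
# are not dummies; append only the new dummy columns
df_oh = pd.concat([df_oh, int_df.iloc[:, 5:]], axis=1)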
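
The two-step encoding above could likely be collapsed into a single `pd.get_dummies` call by listing every categorical column in `columns=`. This is a sketch on a hypothetical `toy` frame; unlike the cells above, it drops the original integer columns instead of keeping them next to their dummies.

```python
import pandas as pd

# Hypothetical toy frame with both string and integer-coded categoricals.
toy = pd.DataFrame({
    "days": ["Monday", "Tuesday"],
    "direction": ["EB", "NB"],
    "minutes": [0, 2],
    "x": [0, 1],
    "congestion": [70, 49],
})

# Encode all categorical columns in one call; untouched columns come first,
# followed by the dummy columns in the order given in cat_cols.
cat_cols = ["days", "direction", "minutes", "x"]
toy_oh = pd.get_dummies(toy, columns=cat_cols, dtype=int)
```

With this variant no positional `iloc` slice is needed to separate the dummy columns from the originals.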

In [116]:
df_oh.sample(5)

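| | time | hours | minutes | month | x | y | congestion | days_Friday | days_Monday | days_Saturday | ... | month_7 | month_8 | month_9 | x_0 | x_1 | x_2 | y_0 | y_1 | y_2 | y_3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 487960 | 1991-07-14 13:00:00 | 13 | 0 | 7 | 0 | 1 | 58 | 1 | 0 | 0 | ... | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| 839082 | 1991-09-28 09:40:00 | 9 | 2 | 9 | 2 | 3 | 70 | 0 | 0 | 0 | ... | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| 240335 | 1991-05-22 13:20:00 | 13 | 1 | 5 | 1 | 2 | 58 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| 374051 | 1991-06-20 04:40:00 | 4 | 2 | 6 | 2 | 0 | 40 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
| 532559 | 1991-07-24 02:40:00 | 2 | 2 | 7 | 0 | 3 | 12 | 0 | 0 | 0 | ... | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |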


## Model DataFrame

In [117]:
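model_df = df_oh.copy()
# Drop the first 6 columns (time, hours, minutes, month, x, y);
# keep congestion and the one-hot columns.
# Note that 'hours' is dropped here without having been one-hot encoded.
model_df = model_df.iloc[:, 6:]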
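
Positional slicing breaks silently if the column order changes, so dropping the raw columns by name may be more robust. The sketch below uses a hypothetical one-row `toy_oh` frame with the same leading columns as `df_oh`; the name `raw_cols` is introduced only for illustration.

```python
import pandas as pd

# Hypothetical one-row stand-in for df_oh (values are made up).
toy_oh = pd.DataFrame({
    "time": pd.to_datetime(["1991-04-01 00:00:00"]),
    "hours": [0], "minutes": [0], "month": [4], "x": [0], "y": [0],
    "congestion": [70], "days_Monday": [1], "direction_EB": [1],
})

# Drop the raw (non-encoded) columns by name rather than by position.
raw_cols = ["time", "hours", "minutes", "month", "x", "y"]
model_toy = toy_oh.drop(columns=raw_cols)
```

On this toy frame the result matches `toy_oh.iloc[:, 6:]`, but the named version keeps working if columns are added or reordered upstream.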

In [118]:
print(model_df.columns.values)

['congestion' 'days_Friday' 'days_Monday' 'days_Saturday' 'days_Sunday'
 'days_Thursday' 'days_Tuesday' 'days_Wednesday' 'direction_EB'
 'direction_NB' 'direction_NE' 'direction_NW' 'direction_SB'
 'direction_SE' 'direction_SW' 'direction_WB' 'minutes_0' 'minutes_1'
 'minutes_2' 'month_4' 'month_5' 'month_6' 'month_7' 'month_8' 'month_9'
 'x_0' 'x_1' 'x_2' 'y_0' 'y_1' 'y_2' 'y_3']


## Export

Uncomment and run the cell below to export the result as a _.csv_ file.

In [119]:
# model_df.to_csv('df.csv')
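
By default `to_csv` also writes the index, which shows up as an extra unnamed column when the file is read back. The sketch below round-trips a hypothetical `toy_model` frame through a temporary directory with `index=False`; the file name `df.csv` matches the cell above, and everything else is illustrative.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for model_df (values are made up).
toy_model = pd.DataFrame({"congestion": [70, 49], "direction_EB": [1, 0]})

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "df.csv")
    # index=False keeps the row index out of the file.
    toy_model.to_csv(out_path, index=False)
    # Read the file back to confirm the columns survive unchanged.
    reloaded = pd.read_csv(out_path)
```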