# Competition

[Kaggle Link](https://www.kaggle.com/competitions/tabular-playground-series-mar-2022/data)

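# Description

- A timestamp (`time`) is given
- `x`, `y` coordinates are given
- The travel direction (`direction`) is given
- Traffic (`congestion`) is given for training and is the value to predict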
  
# Preprocessing

- One-hot encode the direction column
- There are no NaN values
- The data is already normalized
- Convert the time column into a usable form
  - Split out each 20-minute slot
  - Split out each hour
  - Split out each day
  - Drop the date information and work only with the remaining data
  
# Groups

## Preprocessing

- Ayfer Sinem Çoban
- Onur Ümit Şener
  
## Modelling

- Ata Güneş
- Mertcan Duran
- Oğulcan Akca
  
## Presentation

- Başak Topçuoğlu
- Saitcan Yıldırım


# Code

## Imports

In [96]:
import pandas as pd

## Data Init

In [41]:
raw_data = pd.read_csv('train.csv')

In [42]:
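# Row counts per time span, assuming 65 rows for every 20-minute step.
INTERVAL_20_MINUTES = 65
INTERVAL_HOUR = 3 * INTERVAL_20_MINUTES   # 195: three 20-minute steps per hour
INTERVAL_DAY = 24 * INTERVAL_HOUR         # 4680
INTERVAL_WEEK = 7 * INTERVAL_DAY          # 32760
TOTAL_ROWS = raw_data.shape[0]
INTERVALS = (INTERVAL_20_MINUTES,
             INTERVAL_HOUR,
             INTERVAL_DAY,
             INTERVAL_WEEK
             )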
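
The interval constants above only hold if every 20-minute step contributes the same number of rows. A minimal sketch of how that assumption could be checked, using a small hypothetical frame (`toy`, with made-up roadways) rather than `train.csv`:

```python
import pandas as pd

# Hypothetical data: 3 time steps, each with the same 4 roadways (x, y, direction).
times = pd.date_range("1991-04-01 00:00", periods=3, freq="20min")
roadways = [(0, 0, "EB"), (0, 0, "WB"), (1, 2, "NB"), (1, 2, "SB")]
toy = pd.DataFrame(
    [(t, x, y, d) for t in times for (x, y, d) in roadways],
    columns=["time", "x", "y", "direction"],
)

# Number of rows recorded at each timestamp.
rows_per_step = toy.groupby("time").size()

# The constants are only valid if this count never changes.
is_constant = rows_per_step.nunique() == 1
step = int(rows_per_step.iloc[0])

# Larger intervals follow from the 20-minute cadence.
interval_hour = 3 * step
interval_day = 24 * interval_hour
print(is_constant, step, interval_hour, interval_day)
```

On the real data, the same `groupby("time").size()` call would show whether any timestamp has fewer rows than expected.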

## Datetime Conversion


In [43]:
raw_data['time'] = pd.to_datetime(raw_data['time'])

In [44]:
df = raw_data.drop('row_id', axis=1).copy()

## Adding Hours, Minutes, Months

In [45]:
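# Collect time-based features from each timestamp.
hours_list = []
minutes_list = []
month_list = []
for t in df['time']:
    hours_list.append(t.hour)            # hour of day, 0-23
    minutes_list.append(t.minute // 20)  # 20-minute slot within the hour: 0, 1 or 2
    month_list.append(t.month)           # calendar month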


In [46]:
time_df = pd.DataFrame({'hours': hours_list,
                        'minutes': minutes_list,
                        'month': month_list,
                        })
df = pd.concat([df, time_df], axis=1)
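
The same three features can also be computed with the pandas `.dt` accessor, which works on the whole column at once instead of looping. A sketch on a few timestamps (the frame `toy` is hypothetical and not part of the notebook):

```python
import pandas as pd

toy = pd.DataFrame({"time": pd.to_datetime([
    "1991-06-22 10:40:00",
    "1991-09-16 09:20:00",
    "1991-08-04 03:00:00",
])})

# Same three features as the loop, computed column-wise.
toy = toy.assign(
    hours=toy["time"].dt.hour,
    minutes=toy["time"].dt.minute // 20,  # 20-minute slot: 0, 1 or 2
    month=toy["time"].dt.month,
)
print(toy)
```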

In [47]:
df = df[['time', 'hours', 'minutes', 'month', 'x', 'y', 'direction', 'congestion']]

In [48]:
df.sample(5)

Unnamed: 0,time,hours,minutes,month,x,y,direction,congestion
384551,1991-06-22 10:40:00,10,2,6,0,3,EB,29
382228,1991-06-21 22:40:00,22,2,6,1,2,NB,60
784377,1991-09-16 09:20:00,9,1,9,1,0,WB,32
211838,1991-05-16 11:00:00,11,0,5,0,1,EB,21
584101,1991-08-04 03:00:00,3,0,8,0,3,EB,33


### Adding Days

In [49]:
# Derive the weekday name from each timestamp rather than from the row position.
df.insert(1, 'days', df['time'].dt.day_name())

In [50]:
df.sample(5)

Unnamed: 0,time,days,hours,minutes,month,x,y,direction,congestion
765863,1991-09-12 08:20:00,Thursday,8,1,9,1,3,EB,26
248153,1991-05-24 05:20:00,Friday,5,1,5,2,1,SE,34
828616,1991-09-26 04:00:00,Thursday,4,0,9,2,3,NE,33
831494,1991-09-26 19:00:00,Thursday,19,0,9,0,3,SB,60
256702,1991-05-26 01:20:00,Sunday,1,1,5,1,0,EB,39


## One-Hot

In [51]:
# df_oh['time'].str.get_dummies(' ') # Another method for one-hot, might be useful

In [78]:
# One-hot encode the string columns (days, direction); numeric columns are left as-is.
df_oh = pd.get_dummies(df, dtype=int)

In [79]:
# One-hot encode the integer-coded columns as well and append their dummy columns.
int_cols = ['minutes', 'month', 'x', 'y']
int_df = pd.get_dummies(df, columns=int_cols, dtype=int)
# The first 5 columns of int_df (time, days, hours, direction, congestion) are not dummies.
df_oh = pd.concat([df_oh, int_df.iloc[:,5:]], axis=1)

In [80]:
df_oh.sample(5)

Unnamed: 0,time,hours,minutes,month,x,y,congestion,days_Friday,days_Monday,days_Saturday,...,month_7,month_8,month_9,x_0,x_1,x_2,y_0,y_1,y_2,y_3
790156,1991-09-17 16:00:00,16,0,9,0,3,44,0,0,0,...,0,0,1,1,0,0,0,0,0,1
27586,1991-04-06 21:20:00,21,1,4,1,1,56,0,0,1,...,0,0,0,0,1,0,0,1,0,0
28038,1991-04-06 23:40:00,23,2,4,1,1,52,0,0,1,...,0,0,0,0,1,0,0,1,0,0
821659,1991-09-24 16:20:00,16,1,9,2,3,36,0,0,0,...,0,0,1,0,0,1,0,0,0,1
510740,1991-07-19 10:00:00,10,0,7,1,3,84,1,0,0,...,1,0,0,0,1,0,0,0,0,1
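
As an alternative sketch, `pd.get_dummies` accepts a `columns` list, so string and integer-coded columns can be encoded in one call. Unlike the cells above, this replaces the listed columns instead of keeping the originals next to the dummies. The frame `toy` and its values are hypothetical.

```python
import pandas as pd

toy = pd.DataFrame({
    "days": ["Monday", "Tuesday", "Monday"],
    "direction": ["EB", "NB", "EB"],
    "minutes": [0, 1, 2],
    "x": [0, 1, 1],
    "congestion": [29, 60, 32],
})

# Encode every categorical column, including the integer-coded ones.
cat_cols = ["days", "direction", "minutes", "x"]
encoded = pd.get_dummies(toy, columns=cat_cols, dtype=int)

print(encoded.columns.tolist())
```

Columns that are not listed in `columns` (here `congestion`) pass through unchanged and come first in the result.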


# Model DataFrame

In [86]:
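model_df = df_oh.copy()
# Drop the first six raw columns (time, hours, minutes, month, x, y);
# note that hours has no one-hot counterpart, so it is not in the model frame.
model_df = model_df.iloc[:,6:]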

In [97]:
print(model_df.columns.values)

['congestion' 'days_Friday' 'days_Monday' 'days_Saturday' 'days_Sunday'
 'days_Thursday' 'days_Tuesday' 'days_Wednesday' 'direction_EB'
 'direction_NB' 'direction_NE' 'direction_NW' 'direction_SB'
 'direction_SE' 'direction_SW' 'direction_WB' 'minutes_0' 'minutes_1'
 'minutes_2' 'month_4' 'month_5' 'month_6' 'month_7' 'month_8' 'month_9'
 'x_0' 'x_1' 'x_2' 'y_0' 'y_1' 'y_2' 'y_3']
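
Positional slicing such as `iloc[:, 6:]` depends on the column order staying fixed. A label-based sketch on a hypothetical frame (`toy`) that also separates the target from the features; the names `X` and `y_target` are illustrative and do not appear in the notebook:

```python
import pandas as pd

toy = pd.DataFrame({
    "time": pd.to_datetime(["1991-04-01 00:00", "1991-04-01 00:20"]),
    "hours": [0, 0],
    "congestion": [70, 49],
    "direction_EB": [1, 0],
    "direction_NB": [0, 1],
})

# Keep everything from 'congestion' onward, selected by label.
model_toy = toy.loc[:, "congestion":]

# Split into target and feature matrix.
y_target = model_toy["congestion"]
X = model_toy.drop(columns="congestion")
print(X.columns.tolist())
```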


# Export

Run this cell to export the result as a _.csv_ file.

In [101]:
model_df.to_csv('df.csv')
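
By default `to_csv` writes the row index as an unnamed first column. A sketch of the round trip, using an in-memory buffer in place of `df.csv` and a hypothetical two-row frame; passing `index_col=0` on read restores the index rather than producing an `Unnamed: 0` column:

```python
import io
import pandas as pd

toy = pd.DataFrame({"congestion": [29, 60], "x_0": [1, 0]}, index=[384551, 382228])

# Write to an in-memory buffer in place of a file on disk.
buffer = io.StringIO()
toy.to_csv(buffer)

# Read it back, treating the first column as the index.
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)
print(restored.equals(toy))
```

If the index carries no information for the modelling step, `to_csv(..., index=False)` leaves it out of the file entirely.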