# Competition

[Kaggle Link](https://www.kaggle.com/competitions/tabular-playground-series-mar-2022/data)

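# Definition

- The date and time are given
- The x, y coordinates are given
- The direction is given
- The traffic congestion is given in the training data and is the value to predict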
  
# Preprocessing

- One-hot encode the direction data
- There are no NaN values
- The data is normalized
- Convert the time data into a usable form
  - Separate each 20-minute interval
  - Separate each hour
  - Separate each day
  - Drop the date itself and work only on the derived data
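A minimal sketch of the time-related steps in this plan, run on a hypothetical toy frame. The column names (`time`, `x`, `y`, `direction`, `congestion`) follow the competition data, while the helper `add_time_features` is illustrative and not part of the notebook:

```python
import pandas as pd


def add_time_features(df: pd.DataFrame) -> pd.DataFrame:
    """Derive 20-minute slot, hour and weekday from 'time', then drop the raw date."""
    # The plan assumes there are no missing values; fail loudly if that is wrong.
    assert not df.isna().any().any(), "unexpected NaN values"
    out = df.copy()
    t = pd.to_datetime(out['time'])
    out['minutes'] = t.dt.minute // 20   # 20-minute slot within the hour: 0, 1 or 2
    out['hours'] = t.dt.hour             # hour of day: 0..23
    out['days'] = t.dt.day_name()        # weekday name, e.g. 'Monday'
    return out.drop(columns='time')      # keep only the derived features


# Hypothetical toy rows in the competition's column layout.
toy = pd.DataFrame({
    'time': ['1991-04-01 00:00:00', '1991-04-01 00:20:00', '1991-04-02 13:40:00'],
    'x': [0, 0, 1],
    'y': [0, 1, 2],
    'direction': ['EB', 'NB', 'SB'],
    'congestion': [70, 49, 24],
})
features = add_time_features(toy)
print(features)
```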
  
# Groups

## Preprocessing

- Ayfer Sinem Çoban
- Onur Ümit Şener
  
## Modelling

- Ata Güneş
- Mertcan Duran
- Oğulcan Akca
  
## Presentation

- Başak Topçuoğlu
- Saitcan Yıldırım


# Code

## Imports

```python
import datetime

import matplotlib.pyplot as plt
import pandas as pd
import plotly.express as px
```

## Data Init

```python
raw_data = pd.read_csv('train.csv')
```

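```python
# Each timestamp has 65 rows (one per x, y, direction combination),
# and there are three 20-minute timestamps per hour.
INTERVAL_20_MINUTES = 65
INTERVAL_HOUR = INTERVAL_20_MINUTES * 3   # 195 rows per hour
INTERVAL_DAY = INTERVAL_HOUR * 24         # 4680 rows per day
INTERVAL_WEEK = INTERVAL_DAY * 7          # 32760 rows per week
TOTAL_ROWS = raw_data.shape[0]
INTERVALS = (INTERVAL_20_MINUTES,
             INTERVAL_HOUR,
             INTERVAL_DAY,
             INTERVAL_WEEK,
             )
```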
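These row-count constants only hold if every timestamp contributes exactly 65 rows and the rows are sorted by time. A small check, sketched on a synthetic frame; `check_rows_per_timestamp` and the `segment` column are hypothetical, and on the real data the call would be `check_rows_per_timestamp(raw_data)` after the datetime conversion:

```python
import itertools

import pandas as pd

INTERVAL_20_MINUTES = 65


def check_rows_per_timestamp(df: pd.DataFrame, expected: int = INTERVAL_20_MINUTES) -> bool:
    """Return True if every timestamp has `expected` rows and 'time' is sorted."""
    counts = df.groupby('time').size()
    return bool((counts == expected).all() and df['time'].is_monotonic_increasing)


# Synthetic data: 3 timestamps x 65 hypothetical road segments (ids 0..64).
times = pd.date_range('1991-04-01', periods=3, freq='20min')
rows = [{'time': t, 'segment': s} for t, s in itertools.product(times, range(65))]
synthetic = pd.DataFrame(rows)

print(check_rows_per_timestamp(synthetic))           # True
print(check_rows_per_timestamp(synthetic.iloc[1:]))  # False: first timestamp has 64 rows
```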

## Datetime Conversion


```python
raw_data['time'] = pd.to_datetime(raw_data['time'])
```
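As an alternative to a separate conversion step, the timestamps can be parsed while the file is read, via the `parse_dates` argument of `pd.read_csv`. Sketched here with a hypothetical two-row in-memory CSV standing in for `train.csv`:

```python
import io

import pandas as pd

# A two-row stand-in for train.csv (same column layout as the competition file).
csv_text = """row_id,time,x,y,direction,congestion
0,1991-04-01 00:00:00,0,0,EB,70
1,1991-04-01 00:00:00,0,0,NB,49
"""

# parse_dates converts 'time' to datetime64 during loading,
# so no separate pd.to_datetime call is needed afterwards.
data = pd.read_csv(io.StringIO(csv_text), parse_dates=['time'])
print(data.dtypes)
```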

```python
df = raw_data.drop('row_id', axis=1).copy()
```

## Adding Hours, Minutes, Months

```python
# Derive the time features with the vectorized .dt accessor.
# 'minutes' is the 20-minute slot within the hour (0, 1 or 2), not the raw minute.
df['hours'] = df['time'].dt.hour
df['minutes'] = df['time'].dt.minute // 20
df['month'] = df['time'].dt.month
```

```python
df = df[['time', 'hours', 'minutes', 'month', 'x', 'y', 'direction', 'congestion']]
```

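```python
df.sample(5)
```

| | time | hours | minutes | month | x | y | direction | congestion |
|---|---|---|---|---|---|---|---|---|
| 208301 | 1991-05-15 16:40:00 | 16 | 2 | 5 | 2 | 0 | SB | 43 |
| 41890 | 1991-04-09 22:40:00 | 22 | 2 | 4 | 1 | 2 | SB | 58 |
| 748643 | 1991-09-08 14:00:00 | 14 | 0 | 9 | 1 | 3 | WB | 47 |
| 49331 | 1991-04-11 12:40:00 | 12 | 2 | 4 | 2 | 3 | NE | 26 |
| 584523 | 1991-08-04 05:00:00 | 5 | 0 | 8 | 2 | 1 | EB | 63 |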


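## Adding Days

```python
# Take the weekday name (e.g. 'Monday') from each row's own timestamp.
# Row position cannot be used for this: every timestamp spans 65 rows,
# so consecutive rows belong to the same date.
df.insert(1, 'days', df['time'].dt.day_name())
```

```python
df.sample(5)
```

| | time | days | hours | minutes | month | x | y | direction | congestion |
|---|---|---|---|---|---|---|---|---|---|
| 789213 | 1991-09-17 10:00:00 | Tuesday | 10 | 0 | 9 | 2 | 1 | SE | 34 |
| 134492 | 1991-04-29 20:20:00 | Monday | 20 | 1 | 4 | 0 | 2 | EB | 48 |
| 510252 | 1991-07-19 07:40:00 | Friday | 7 | 2 | 7 | 0 | 0 | SB | 46 |
| 59255 | 1991-04-13 15:40:00 | Saturday | 15 | 2 | 4 | 2 | 0 | NB | 68 |
| 725509 | 1991-09-03 15:20:00 | Tuesday | 15 | 1 | 9 | 2 | 1 | NB | 43 |


## One-Hot

```python
# Another method for one-hot encoding, might be useful: split the timestamp string
# into its date and time-of-day tokens and create one 0/1 column per token.
# df['time'].astype(str).str.get_dummies(' ')
```

```python
# One-hot encode the string columns ('days' and 'direction') as 0/1 integers,
# then drop the raw timestamp, which the derived time features now replace.
df_oh = pd.get_dummies(df, dtype=int)
df_oh.drop('time', axis=1, inplace=True)
```

```python
df_oh
```

| | hours | minutes | month | x | y | congestion | days_Friday | days_Monday | days_Saturday | days_Sunday | ... | days_Tuesday | days_Wednesday | direction_EB | direction_NB | direction_NE | direction_NW | direction_SB | direction_SE | direction_SW | direction_WB |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 4 | 0 | 0 | 70 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 4 | 0 | 0 | 49 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0 | 4 | 0 | 0 | 24 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 3 | 0 | 0 | 4 | 0 | 1 | 18 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 4 | 0 | 1 | 60 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 848830 | 11 | 2 | 9 | 2 | 3 | 54 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 848831 | 11 | 2 | 9 | 2 | 3 | 28 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 848832 | 11 | 2 | 9 | 2 | 3 | 68 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 848833 | 11 | 2 | 9 | 2 | 3 | 17 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |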
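For a single column, `Series.str.get_dummies` (the alternative mentioned in the commented-out cell) yields the same indicator columns as `pd.get_dummies`, apart from the column prefix. A small check on hypothetical toy direction values; the explicit `int64` dtype is an assumption made so that the two results compare equal on every platform:

```python
import pandas as pd

directions = pd.Series(['EB', 'NB', 'SB', 'EB'], name='direction')

# pd.get_dummies prefixes each indicator column with the given prefix.
a = pd.get_dummies(directions, prefix='direction', dtype='int64')

# Series.str.get_dummies splits each string on `sep` (default '|') and
# creates one 0/1 column per token; here every value is a single token.
b = directions.str.get_dummies().add_prefix('direction_')

print(a.equals(b.astype('int64')))
```

Both methods sort the categories alphabetically, so the column order matches as well.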


# Export

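Uncomment and run this cell to export the result as a _.csv_ file.

```python
# Write without the index so that re-reading the file does not add an 'Unnamed: 0' column.
# df_oh.to_csv('df.csv', index=False)
```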
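With the default `index=True`, `to_csv` stores the index as an unnamed first column, which comes back as `Unnamed: 0` when the file is read again. A small round-trip check with in-memory buffers and a hypothetical two-row frame:

```python
import io

import pandas as pd

frame = pd.DataFrame({'hours': [0, 11], 'congestion': [70, 17]})

# Default index=True: the RangeIndex is written as an extra, unnamed first column.
with_index = io.StringIO()
frame.to_csv(with_index)
with_index.seek(0)
print(pd.read_csv(with_index).columns.tolist())  # ['Unnamed: 0', 'hours', 'congestion']

# index=False: only the data columns are written, so this round trip is lossless.
without_index = io.StringIO()
frame.to_csv(without_index, index=False)
without_index.seek(0)
print(pd.read_csv(without_index).columns.tolist())  # ['hours', 'congestion']
```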