# This notebook will try to get a high score in Bike Sharing Demand using just 10 cells of code!
### This dataset is tough: it is a multi-target regression problem with a time-series component. It also requires substantial feature engineering to get a high score, but we will do it using advanced Python libraries such as Featurewiz and LazyTransform.

In [None]:
!pip install featurewiz --ignore-installed --no-deps
!pip install xlrd --ignore-installed --no-deps

### Install this pinned version because the Pillow version preinstalled on Kaggle does not work here ##
!pip install Pillow==9.0.0

In [None]:
!pip install category_encoders==2.4.0

In [None]:
!pip install lazytransform --ignore-installed --no-deps

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
datapath = '/kaggle/input/bike-sharing-demand/'
filename = 'train.csv'
trainfile = datapath+filename

In [None]:
train = pd.read_csv(trainfile)
print(train.shape)
train.head(1)


In [None]:
target = ['casual','registered','count']

### We have been given multiple targets, so this is a multi-target (multi-output) regression. featurewiz can handle it with a new estimator called SuloRegressor, which is available exclusively in the featurewiz library.
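
To make the multi-target setup concrete, here is a minimal sketch of what a three-column target looks like and how a single column can be pulled back out of the predictions. The rows are made-up toy values (only the column names come from the dataset), and `fake_preds` is a hypothetical stand-in for model output. It rests on the assumption, implied by how the submission is built later in this notebook, that predictions come back as a 2-D array with columns in the same order as `target`.

```python
import numpy as np
import pandas as pd

# Toy rows with made-up numbers; only the column names match the competition data.
toy = pd.DataFrame({
    "temp": [9.84, 9.02, 9.02],
    "casual": [3, 8, 5],
    "registered": [13, 32, 27],
    "count": [16, 40, 32],
})
target = ["casual", "registered", "count"]

# Selecting with a list of names gives a 2-D target: one column per output.
y = toy[target]
print(y.shape)

# Assumption: a multi-output model returns predictions in the same column order,
# so the column for "count" sits at target.index("count") (here, the last one).
fake_preds = y.to_numpy(dtype=float)  # hypothetical stand-in for model output
count_preds = fake_preds[:, target.index("count")]
print(count_preds)
```

Looking the column up by name instead of hard-coding `-1` keeps the selection correct if the order of `target` ever changes.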

In [None]:
testfile = datapath + 'test.csv'
test = pd.read_csv(testfile)
print(test.shape)
test.head(1)


In [None]:
filename = 'sampleSubmission.csv'
subm = pd.read_csv(datapath+filename)
print(subm.shape)
subm.head()

In [None]:
idvars = []
idvars

In [None]:
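# Normalize target to a list so one expression handles a single target or several
target_cols = [target] if isinstance(target, str) else list(target)
# Predictors are all columns that are neither ID variables nor targets
preds = [x for x in list(train) if x not in idvars + target_cols]
len(preds)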

In [None]:
### We need to set the X and y variables here
X_train = train[preds]
y_train = train[target]
y_train[-3:]

# You can now count the number of cells from here. We will crack the problem in 10 cells.

### The target varies a lot, from single digits to well over 150, so it helps to predict a log-transformed target. Luckily for us, SuloRegressor, a new estimator in the featurewiz library, can apply that transformation. You can click the logo below to check it out:
<a href="https://github.com/AutoViML/featurewiz"><img src="https://i.ibb.co/ZLdZMZg/featurewiz-logos.png" alt="featurewiz-logos" border="0"></a>
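
As a rough illustration of why a log transformation helps with a wide-ranging target, the sketch below compresses some made-up counts with `log(1 + y)` and then inverts the transform. This shows the general idea only; it is not taken from SuloRegressor's source, and the helper names `to_log` and `from_log` are hypothetical.

```python
import numpy as np

def to_log(y):
    """Compress a non-negative target with log(1 + y); log1p is safe at y = 0."""
    return np.log1p(np.asarray(y, dtype=float))

def from_log(y_log):
    """Invert to_log so predictions are back on the original count scale."""
    return np.expm1(np.asarray(y_log, dtype=float))

# Made-up counts spanning several orders of magnitude.
counts = np.array([0, 3, 16, 150, 900])

logged = to_log(counts)      # values now lie in a much narrower range
restored = from_log(logged)  # round trip back to the original scale

print(np.round(logged, 2))
print(np.round(restored, 6))
```

A model fitted on the logged values is penalized for relative rather than absolute errors, which suits count data where busy hours dwarf quiet ones.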

In [None]:
### Featurewiz has 2 new estimators
import featurewiz as FW
from featurewiz import SuloClassifier, SuloRegressor

In [None]:
date_col = 'datetime'
train[date_col] = pd.to_datetime(train[date_col])
test[date_col] = pd.to_datetime(test[date_col])

In [None]:
X_train = train[preds]
print(X_train.shape, y_train.shape)
X_test = test[preds]
print(X_test.shape)

# We are going to use LazyTransformer to convert all categorical and date-time variables in this dataset to numeric variables
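
To show what turning a date-time column into numeric variables can look like, here is a small pandas-only sketch. It is an assumption about the kind of features involved, not LazyTransformer's actual output; the helper `expand_datetime`, the generated column names, and the toy temperature values are all hypothetical.

```python
import pandas as pd

def expand_datetime(df, col):
    """Replace a date-time column with a few numeric calendar features."""
    out = df.copy()
    stamps = pd.to_datetime(out[col])
    out[col + "_hour"] = stamps.dt.hour
    out[col + "_dayofweek"] = stamps.dt.dayofweek  # Monday = 0, Sunday = 6
    out[col + "_month"] = stamps.dt.month
    out[col + "_year"] = stamps.dt.year
    # Drop the raw column so only numeric features remain.
    return out.drop(columns=[col])

# Two toy rows in the same timestamp format as the competition files.
toy = pd.DataFrame({
    "datetime": ["2011-01-01 00:00:00", "2011-01-01 17:00:00"],
    "temp": [9.84, 15.58],
})

features = expand_datetime(toy, "datetime")
print(features)
```

Hour of day and day of week tend to matter for hourly demand data like this, since rentals follow commuting and weekend patterns.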

In [None]:
from lazytransform import LazyTransformer

In [None]:
lazy = LazyTransformer(encoders='label', combine_rare=False, verbose=2)
X_train, y_train = lazy.fit_transform(X_train, y_train)

In [None]:
X_test = lazy.transform(X_test)
print(X_test.shape)

# SuloRegressor will do the job. Notice that we set `log_transform=True` to convert the targets to log(targets) automatically

In [None]:
spe = SuloRegressor(base_estimator=None, n_estimators=None,
                    pipeline=True, imbalanced=False,
                    log_transform=True,
                    integers_only=False, verbose=1)

In [None]:
spe.fit(X_train, y_train)

In [None]:
y_preds = spe.predict(X_test)
y_preds

In [None]:
subm = pd.read_csv(datapath+'sampleSubmission.csv')
# The last prediction column corresponds to target[-1], i.e. 'count'
subm[target[-1]] = y_preds[:,-1]
subm.head()

# We finished the problem in exactly 10 cells. Now we submit and get a strong score of 0.40. If you liked this notebook, please take a look at the GitHub page:
https://github.com/AutoViML
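
For context on the score: the Bike Sharing Demand competition is evaluated with Root Mean Squared Logarithmic Error (RMSLE), where lower is better, so the 0.40 above is presumably an RMSLE value. Since the test labels are not public, a local check would have to use a held-out slice of the training data. The helper below is a minimal sketch of the metric; the name `rmsle` and the sample numbers are illustrative only.

```python
import numpy as np

def rmsle(y_true, y_pred):
    """Root Mean Squared Logarithmic Error for non-negative targets."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Compare log(1 + value) so errors are measured on a relative scale.
    diff = np.log1p(y_pred) - np.log1p(y_true)
    return float(np.sqrt(np.mean(diff ** 2)))

# Made-up actual and predicted counts for a quick check.
actual = [16, 40, 32, 13]
predicted = [20, 35, 30, 10]
print(round(rmsle(actual, predicted), 4))
```

Because the metric works on logs, a model trained on a log-transformed target is optimizing something very close to what the leaderboard measures.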

In [None]:
subm.to_csv('submission.csv', index=False)