### Predict the power (kW/h) produced by windmills
Moving from traditional energy plans powered by fossil fuels to unlimited renewable energy subscriptions gives instant access to clean energy without heavy investment in infrastructure such as solar panels.
<br>
One clean energy source that has been gaining popularity around the world is wind turbines. Turbines are massive structures that are strategically placed in perpetually windy places to generate the most energy. Wind energy is generated when the power of the atmosphere’s airflow is harnessed to create electricity. Wind turbines do this by capturing the kinetic energy of the wind. Factors such as temperature, wind direction, turbine status, weather, blade length, etc. influence the amount of power generated.

### TASK
Predict the power generated by each windmill (`windmill_generated_power(kW/h)`) from the other features in the dataset.

### Challenge 
Challenge hosted at https://www.hackerearth.com/challenges/competitive/hackerearth-machine-learning-challenge-predict-windmill-power/

In [62]:
#Importing pandas module to read dataset
import pandas as pd

In [63]:
#Reading the dataset into memory
dataset=pd.read_csv("dataset/train.csv")

In [64]:
dataset.head(5)

Unnamed: 0,tracking_id,datetime,wind_speed(m/s),atmospheric_temperature(°C),shaft_temperature(°C),blades_angle(°),gearbox_temperature(°C),engine_temperature(°C),motor_torque(N-m),generator_temperature(°C),...,windmill_body_temperature(°C),wind_direction(°),resistance(ohm),rotor_torque(N-m),turbine_status,cloud_level,blade_length(m),blade_breadth(m),windmill_height(m),windmill_generated_power(kW/h)
0,WM_33725,2019-08-04 14:33:20,94.820023,-99.0,41.723019,-0.903423,82.410573,42.523015,2563.124522,76.66556,...,,239.836388,2730.310605,42.084666,BA,Medium,2.217542,0.314065,24.281689,6.766521
1,WM_698,2018-11-05 10:13:20,241.832734,27.764785,-99.0,-99.0,44.104919,46.25887,2372.384119,78.129803,...,,337.944723,1780.2072,107.888643,A2,Medium,4.210346,0.448494,27.262139,5.966275
2,WM_39146,2019-09-14 14:03:20,95.484724,,41.855473,12.652763,42.322098,42.878552,1657.169646,67.654469,...,45.033197,227.850294,1666.0499,-42.931459,ABC,Medium,2.719475,0.302321,27.366127,2.874342
3,WM_6757,2018-12-25 15:33:20,238.819424,-99.0,45.443914,15.115323,44.759643,47.282101,2888.134079,95.389974,...,44.827154,492.08152,1964.502895,42.744596,ABC,,4.857385,0.36714,24.287767,14.851089
4,WM_21521,2019-05-04 03:13:20,10.72289,,41.981183,1.715696,-17.616459,43.469852,781.695419,37.423065,...,-99.0,259.274601,1177.516152,13.387289,AAA,Medium,,0.453374,27.97165,3.519074


In [65]:
dataset.isnull().sum()

tracking_id                          0
datetime                             0
wind_speed(m/s)                    273
atmospheric_temperature(°C)       3450
shaft_temperature(°C)                2
blades_angle(°)                    216
gearbox_temperature(°C)              1
engine_temperature(°C)              12
motor_torque(N-m)                   24
generator_temperature(°C)           12
atmospheric_pressure(Pascal)      2707
area_temperature(°C)                 0
windmill_body_temperature(°C)     2363
wind_direction(°)                 5103
resistance(ohm)                      1
rotor_torque(N-m)                  572
turbine_status                    1759
cloud_level                        276
blade_length(m)                   5093
blade_breadth(m)                     0
windmill_height(m)                 543
windmill_generated_power(kW/h)     207
dtype: int64

In [66]:
dataset.dtypes

tracking_id                        object
datetime                           object
wind_speed(m/s)                   float64
atmospheric_temperature(°C)       float64
shaft_temperature(°C)             float64
blades_angle(°)                   float64
gearbox_temperature(°C)           float64
engine_temperature(°C)            float64
motor_torque(N-m)                 float64
generator_temperature(°C)         float64
atmospheric_pressure(Pascal)      float64
area_temperature(°C)              float64
windmill_body_temperature(°C)     float64
wind_direction(°)                 float64
resistance(ohm)                   float64
rotor_torque(N-m)                 float64
turbine_status                     object
cloud_level                        object
blade_length(m)                   float64
blade_breadth(m)                  float64
windmill_height(m)                float64
windmill_generated_power(kW/h)    float64
dtype: object

### Preprocessing 
Here we preprocess the dataset. Null values in the float64 columns are replaced with the column mean.
Null values in the object columns (`turbine_status` and `cloud_level`) are replaced with the string "NA".

In [67]:
#Replacing null values of numeric columns with the column mean
num_cols=dataset.select_dtypes(include="number").columns
dataset[num_cols]=dataset[num_cols].fillna(dataset[num_cols].mean())
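To make the effect of mean imputation concrete, here is a minimal sketch on a made-up three-row frame. The column names and values are invented for illustration and are not taken from `train.csv`. Only the numeric column is filled; the object column keeps its null until it is handled separately.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame, not taken from the competition data
toy = pd.DataFrame({
    "wind_speed": [10.0, np.nan, 30.0],
    "cloud_level": ["Low", None, "Medium"],
})

# Pick the numeric columns and fill their gaps with each column's mean
num_cols = toy.select_dtypes(include="number").columns
toy[num_cols] = toy[num_cols].fillna(toy[num_cols].mean())

print(toy)
```

The missing `wind_speed` becomes 20.0, the mean of the two observed values, while `cloud_level` still contains one null.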

In [68]:
dataset.isnull().sum()

tracking_id                          0
datetime                             0
wind_speed(m/s)                      0
atmospheric_temperature(°C)          0
shaft_temperature(°C)                0
blades_angle(°)                      0
gearbox_temperature(°C)              0
engine_temperature(°C)               0
motor_torque(N-m)                    0
generator_temperature(°C)            0
atmospheric_pressure(Pascal)         0
area_temperature(°C)                 0
windmill_body_temperature(°C)        0
wind_direction(°)                    0
resistance(ohm)                      0
rotor_torque(N-m)                    0
turbine_status                    1759
cloud_level                        276
blade_length(m)                      0
blade_breadth(m)                     0
windmill_height(m)                   0
windmill_generated_power(kW/h)       0
dtype: int64

In [69]:
#Replacing null values of object-type columns with the string "NA"
dataset["turbine_status"]=dataset["turbine_status"].fillna("NA")
dataset["cloud_level"]=dataset["cloud_level"].fillna("NA")

In [70]:
#Parsing datetime strings such as 2019-08-04 14:33:20
dataset["datetime"]=pd.to_datetime(dataset["datetime"],format="%Y-%m-%d %H:%M:%S")

In [71]:
dataset["datetime"]=dataset["datetime"].astype(str)

In [72]:
#Label encoding the non-numeric columns as integer codes (this is not one-hot encoding)
from sklearn.preprocessing import LabelEncoder
le=LabelEncoder()
dataset["turbine_status"]=le.fit_transform(dataset["turbine_status"])
dataset["cloud_level"]=le.fit_transform(dataset["cloud_level"])
dataset["datetime"]=le.fit_transform(dataset["datetime"])
dataset["tracking_id"]=le.fit_transform(dataset["tracking_id"])
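As a small illustration of what `LabelEncoder` does to a column, the sketch below encodes a handful of sample status strings. The list is a hypothetical sample, and the real column may contain other categories. Each distinct label is mapped to its position in the sorted list of unique labels, so the result is a single integer column rather than one indicator column per category.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample values; the real column may contain other categories
status = ["BA", "A2", "ABC", "ABC", "NA"]

le = LabelEncoder()
codes = le.fit_transform(status)

# classes_ holds the sorted unique labels; each label's code is its index there
print(list(le.classes_))
print(codes.tolist())
```

Because the codes follow alphabetical order, they carry no meaningful ranking of the categories.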

In [73]:
#Extracting target columns
target_col=dataset["windmill_generated_power(kW/h)"]
dataset.drop(["windmill_generated_power(kW/h)"],axis=1,inplace=True)

### Model Selection
Here we fit a Random Forest Regressor on the training data (X, y) and then use it to make predictions on our test dataset.

In [49]:
from sklearn.ensemble import RandomForestRegressor
model=RandomForestRegressor()
model.fit(dataset.values,target_col.values)



RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
           max_features='auto', max_leaf_nodes=None,
           min_impurity_decrease=0.0, min_impurity_split=None,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=None,
           oob_score=False, random_state=None, verbose=0, warm_start=False)
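The model above is fit on every training row, so the notebook has no local estimate of how well it generalizes. One common way to get such an estimate is a hold-out split. The sketch below uses synthetic data; the 80/20 split, `random_state=0`, `n_estimators=50`, and the R² metric are all assumptions made for illustration, and the challenge's own scoring metric is not stated in this notebook.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the feature matrix and target (not the real data)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

# Hold back 20% of the rows for validation
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# Score predictions on rows the model has not seen
val_pred = model.predict(X_val)
score = r2_score(y_val, val_pred)
print(score)
```

With the real data, `X` and `y` would be `dataset.values` and `target_col.values` from the cells above.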

### Working on Test Dataset
Now we extract data from our test dataset and perform all the operations on the dataset as stated above. 

In [76]:
test_dataset=pd.read_csv("dataset/test.csv")

In [77]:
df=test_dataset.copy()

In [78]:
#Replacing null values of numeric columns with the column mean
num_cols=df.select_dtypes(include="number").columns
df[num_cols]=df[num_cols].fillna(df[num_cols].mean())

In [79]:
df["datetime"]=pd.to_datetime(df["datetime"],format="%Y-%m-%d %H:%M:%S")
df["datetime"]=df["datetime"].astype(str)

In [80]:
df["turbine_status"]=df["turbine_status"].fillna("NA")
df["cloud_level"]=df["cloud_level"].fillna("NA")

In [81]:
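#Label encoding the test columns; the encoder is refit here, so integer codes can differ from those used in training
from sklearn.preprocessing import LabelEncoder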
le=LabelEncoder()
df["turbine_status"]=le.fit_transform(df["turbine_status"])
df["cloud_level"]=le.fit_transform(df["cloud_level"])
df["datetime"]=le.fit_transform(df["datetime"])
df["tracking_id"]=le.fit_transform(df["tracking_id"])
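Refitting an encoder on the test set can assign a different integer to the same category than it received during training. One possible way to keep the codes aligned is to learn the category list from the training column only and reuse it for the test column. The sketch below does this with `pd.Categorical` rather than `LabelEncoder`; the two small frames and the unseen value "ZZ" are hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins for the train and test columns
train = pd.DataFrame({"turbine_status": ["BA", "A2", "ABC", "NA"]})
test = pd.DataFrame({"turbine_status": ["ABC", "BA", "ZZ"]})  # "ZZ" is unseen

# Learn the category list from the training data only
categories = sorted(train["turbine_status"].unique())

# Reuse the same category list for both frames so codes line up
train_codes = pd.Categorical(train["turbine_status"], categories=categories).codes
test_codes = pd.Categorical(test["turbine_status"], categories=categories).codes

print(train_codes.tolist())
print(test_codes.tolist())
```

Categories that never appeared in training receive the code -1, which makes them easy to spot before prediction.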

### Predictions 
Here we feed the processed test dataset to the trained model to predict the generated power.

In [82]:
predictions=model.predict(df.values)



### Exporting to DataFrame 
Here we collect the tracking IDs, timestamps, and predictions in a pandas DataFrame and export it to a .csv file.

In [83]:
result_frame=pd.DataFrame({"tracking_id":test_dataset["tracking_id"],"datetime":test_dataset["datetime"],"windmill_generated_power(kW/h)":pd.Series(predictions)})

In [84]:
result_frame.to_csv("output.csv",index=False)