# Electricity Consumption Prediction (ECP)

In this project, we predict the electricity consumption of each user for future periods.

Here are some basic statistics about the data:
* **Number of unique users: 124,675**
* **Total number of records: 5,601,193**
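The two figures above could be reproduced with something like the sketch below. It assumes that a user is identified by `xSubscriptionId_fk`. The README does not say which column defines a user, so this is an assumption. The sketch runs on a tiny frame built from the subscription IDs in the first printed rows, not on the full `dbBills.csv`. The helper name `dataset_stats` is hypothetical.

```python
import pandas as pd


def dataset_stats(df: pd.DataFrame, user_col: str = "xSubscriptionId_fk") -> dict:
    """Return the number of unique users and the total number of records.

    Assumes (hypothetically) that `user_col` uniquely identifies a user.
    """
    return {
        "unique_users": int(df[user_col].nunique()),
        "total_records": len(df),
    }


# Tiny frame using the subscription IDs from the first five printed rows
toy = pd.DataFrame({"xSubscriptionId_fk": [9397665, 9396214, 9396214, 8952093, 8952093]})
print(dataset_stats(toy))  # {'unique_users': 3, 'total_records': 5}
```

On the full dataset, `dataset_stats(df)` would return the real counts, provided the user-column assumption holds.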

The `dataset` directory also contains a `headers.xslx` file with further descriptions of the header names (features).

```python
# Import required packages
import pandas as pd
```

```python
# Load the dataset
df = pd.read_csv("../dataset/dbBills.csv")
```
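The cleaning step drops `xIsTropical` and `xOmorCode` because each holds a constant value. One way to find such columns right after loading is to count the distinct values in each column and keep those with at most one. The sketch below uses made-up values, and `constant_columns` is a hypothetical helper. Passing `dropna=False` counts NaN as a distinct value, so a column that mixes a single value with missing entries is not reported as constant.

```python
import pandas as pd


def constant_columns(df: pd.DataFrame) -> list:
    """Return the names of columns with at most one distinct value (NaN counted as a value)."""
    counts = df.nunique(dropna=False)
    return counts[counts <= 1].index.tolist()


# Made-up frame: two constant columns and one varying column
toy = pd.DataFrame({
    "xIsTropical": [0, 0, 0],
    "xOmorCode": [15, 15, 15],
    "xBakhshCode": [1, 1, 4],
})
print(constant_columns(toy))  # ['xIsTropical', 'xOmorCode']
```

If the comments in the cleaning cell hold, `constant_columns(df)` on the real data should include `xIsTropical` and `xOmorCode`.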

```python
# Print the top 5 records of the dataset
print(df.head())
```

```
   xSubscriptionId_fk           xIdentityNo  xOmorCode  xBakhshCode  \
0             9397665  2/6/15/01/25/10/7651         15            1   
1             9396214  2/6/15/01/44/10/4321         15            1   
2             9396214  2/6/15/01/44/10/4321         15            1   
3             8952093  2/6/15/04/32/04/4870         15            4   
4             8952093  2/6/15/04/32/04/4870         15            4   

   xCycleCode  xMamorCode xRegionName  xIsTropical xUsageGroupName  \
0          25          10        شهری            0           عمومي   
1          44          10        شهری            0           عمومي   
2          44          10        شهری            0           عمومي   
3          32           4     روستایی            0           عمومي   
4          32           4     روستایی            0           عمومي   

   xFamilyNum  xTariffOldCode  xFaze  xAmper  xCounterBuldingNo  \
0           1            2990      3      25           12606909   
1           1     
```
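The printed values appear to mix character forms. `xRegionName` values such as `شهری` end in the Persian yeh (U+06CC). `xUsageGroupName` values such as `عمومي` use the Arabic yeh (U+064A), and `كشاورزي` also starts with the Arabic kaf (U+0643). `Series.replace` only matches exact strings, so a mapping key typed with the other form of a letter would silently fail to match. The sketch below normalizes both the column and the mapping keys to the Persian forms before mapping. It assumes that this normalization is acceptable for the data. `normalize_persian` is a hypothetical helper, and the strings are written as escapes to make the code points explicit.

```python
import pandas as pd

# Map Arabic code points to their Persian counterparts
_ARABIC_TO_PERSIAN = str.maketrans({
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH (Persian kaf)
})


def normalize_persian(text: str) -> str:
    """Replace the Arabic yeh and kaf with their Persian forms (hypothetical helper)."""
    return text.translate(_ARABIC_TO_PERSIAN)


# Mapping key typed with the Persian yeh; column value uses the Arabic yeh
usage_group_maps = {"\u0639\u0645\u0648\u0645\u06CC": "Omoomi"}  # public
col = pd.Series(["\u0639\u0645\u0648\u0645\u064A"])             # same word, Arabic yeh

print(col.replace(usage_group_maps).tolist())  # unchanged: no exact match

normalized_maps = {normalize_persian(k): v for k, v in usage_group_maps.items()}
print(col.map(normalize_persian).replace(normalized_maps).tolist())  # ['Omoomi']
```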

```python
# Clean dataset

# df = df.drop(columns='xSubscriptionId_fk')

# Drop columns that carry no useful information for modelling.
# (Keyword form: the positional axis argument, e.g. drop('col', 1), fails in pandas 2.x.)
df = df.drop(columns=[
    'xIsTropical',        # constant value
    'xOmorCode',          # constant value
    'xCounterBuldingNo',  # 'xTimeControlCode' can be used instead
    'xIdentityNo',        # the other decomposed fields can be used instead
    'xMamorCode',         # appears to carry no useful information
])

# 'xBillStartDate' and 'xBillEndDate' are kept for now: month and season / cycle
# still have to be extracted from them before they can be dropped.
# df = df.drop(columns=['xBillStartDate', 'xBillEndDate'])

# Map Persian strings to English transliterations for 'xRegionName'.
# Assign the result back instead of using inplace=True on a column (chained assignment).
region_maps = {'شهری': 'Shahri',      # urban
               'روستایی': 'Roustaei'}  # rural
df['xRegionName'] = df['xRegionName'].replace(region_maps)

# Map Persian strings to English transliterations for 'xUsageGroupName'
usage_group_maps = {'عمومي': 'Omoomi',        # public
                    'خانگي': 'Khanegi',       # residential
                    'كشاورزي': 'Keshavarzi',  # agricultural
                    'ساير مصارف': 'Sayer',    # other uses
                    'صنعتي': 'Sanati'}        # industrial
df['xUsageGroupName'] = df['xUsageGroupName'].replace(usage_group_maps)

# One-hot encode the categorical columns. get_dummies appends one indicator column
# per category (e.g. 'xRegionName_Shahri') and drops the original columns.
categorical_cols = ['xRegionName', 'xUsageGroupName', 'xBakhshCode', 'xTimeControlCode']
df = pd.get_dummies(df, columns=categorical_cols, prefix=categorical_cols)

print(df.head())

# Open questions: how should 'xCycleCode' and 'xTariffOldCode' be handled?

# Save cleaned dataset
# df.to_csv('../dataset/dbBills_cleaned.csv')
```

```
   xSubscriptionId_fk  xCycleCode  xFamilyNum  xTariffOldCode  xFaze  xAmper  \
0             9397665          25           1            2990      3      25   
1             9396214          44           1            2990      3      25   
2             9396214          44           1            2990      3      25   
3             8952093          32           1            2990      3      25   
4             8952093          32           1            2990      3      25   

  xBillStartDate xBillEndDate  xMeduimKw  xHighKw         ...          \
0     1391/08/25   1391/09/09        538        0         ...           
1     1391/08/18   1391/09/09       1999        0         ...           
2     1391/09/09   1391/10/03          0        0         ...           
3     1386/11/27   1387/02/05       1065        0         ...           
4     1387/02/05   1387/04/05        560        0         ...           

   xUsageGroupName_Khanegi  xUsageGroupName_Omoomi  xUsageGroupName_Sanati  \
0 
```
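The cleaning cell keeps `xBillStartDate` and `xBillEndDate` because month and season / cycle still have to be extracted from them. The printed dates are Solar Hijri (Persian calendar) strings in `YYYY/MM/DD` form, such as `1391/08/25`. The sketch below extracts the month and season. It assumes that every value follows this format and that the usual grouping of Persian months into seasons is wanted: months 1 to 3 are spring, 4 to 6 summer, 7 to 9 autumn, and 10 to 12 winter. The helper `add_month_and_season` and the new column names are hypothetical.

```python
import pandas as pd

# Solar Hijri seasons by quarter of the year: months 1-3, 4-6, 7-9, 10-12
SEASONS = {1: "spring", 2: "summer", 3: "autumn", 4: "winter"}


def add_month_and_season(df: pd.DataFrame, date_col: str, prefix: str) -> pd.DataFrame:
    """Add '<prefix>Month' and '<prefix>Season' columns parsed from a 'YYYY/MM/DD' column.

    Hypothetical helper; assumes every value is a well-formed Solar Hijri date string.
    """
    parts = df[date_col].str.split("/", expand=True).astype(int)  # year, month, day
    month = parts[1]
    out = df.copy()
    out[prefix + "Month"] = month
    out[prefix + "Season"] = ((month - 1) // 3 + 1).map(SEASONS)
    return out


# End dates taken from the printed rows above
bills = pd.DataFrame({"xBillEndDate": ["1391/09/09", "1391/10/03", "1387/02/05", "1387/04/05"]})
bills = add_month_and_season(bills, "xBillEndDate", "xBillEnd")
print(bills[["xBillEndMonth", "xBillEndSeason"]])
```

Because the month is read directly from the string, this avoids converting to the Gregorian calendar. Computing bill durations in days would still need a proper Solar Hijri conversion.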