## Setup
### Importing the required libraries

Run the next cell to import and configure the Python libraries and set up the coding environment.

In [None]:
## Pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python.
import pandas as pd
## Matplotlib is a Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms. Matplotlib can be used in Python scripts, the Python and IPython shells, the Jupyter notebook, web application servers, and four graphical user interface toolkits.
import matplotlib.pyplot as plt
## `%matplotlib inline` is an IPython magic command. It displays the output of plotting commands inline in frontends like the Jupyter notebook, directly below the code cell that produced it, and stores the resulting plots in the notebook document.
%matplotlib inline
## Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.
import seaborn as sns
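## NumPy is the fundamental package for numerical computing in Python (n-dimensional arrays and vectorized math).
import numpy as np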
print("Setup Complete")

Run the following cell to set up the feedback system.

In [None]:
# Set up code checking
## `learntools` provides the checking code used in Kaggle Learn courses.
from learntools.core import binder
binder.bind(globals())
from learntools.data_viz_to_coder.ex7 import *
print("Setup Complete")

## Step 1: Attach a dataset to the notebook

Begin by selecting a CSV dataset from [Kaggle Datasets](https://www.kaggle.com/datasets).  If you're unsure how to do this or would like to work with your own data, please revisit the instructions in the previous tutorial.

Once you have selected a dataset, click on the **[+ ADD DATASET]** option in the top right corner.  This will generate a pop-up window that you can use to search for your chosen dataset.  

![ex6_search_dataset](https://i.imgur.com/QDEKwYp.png)

Once you have found the dataset, click on the **[Add]** button to attach it to the notebook.  You can check that it was successful by looking at the **Workspace** dropdown menu to the right of the notebook -- look for an **input** folder containing a subfolder that matches the name of the dataset.

![ex6_dataset_added](https://i.imgur.com/oVlEBPx.png)

You can click on the caret to the right of the name of the dataset to double-check that it contains a CSV file.  For instance, the image below shows that the example dataset contains two CSV files: (1) **dc-wikia-data.csv**, and (2) **marvel-wikia-data.csv**.

![ex6_dataset_dropdown](https://i.imgur.com/4gpFw71.png)

Once you've uploaded a dataset with a CSV file, run the code cell below **without changes** to receive credit for your work!

In [None]:
# Check for a dataset with a CSV file
step_1.check()

## Step 2: Specify the filepath

Now that the dataset is attached to the notebook, you can find its filepath.  To do this, use the **Workspace** menu to list the set of files, and click on the CSV file you'd like to use.  This will open the CSV file in a tab below the notebook.  You can find the filepath towards the top of this new tab.  

![ex6_filepath](https://i.imgur.com/pWe0sVb.png)

After you find the filepath corresponding to your dataset, fill it in as the value for `my_filepath` in the code cell below, and run the code cell to check that you've provided a valid filepath.  For instance, in the case of this example dataset, we would set
```
my_filepath = "../input/dc-wikia-data.csv"
```  
Note that **you must enclose the filepath in quotation marks**; otherwise, the code will return an error.

Once you've entered the filepath, you can close the tab below the notebook by clicking on the **[X]** at the top of the tab.

In [None]:
# Specify the path of the CSV file to read
my_filepath = '../input/smart-home-dataset-with-weather-information/HomeC.csv'
# Check for a valid filepath to a CSV file in a dataset
step_2.check()

## Step 3: Load the data

Use the next code cell to load the data file into `my_data`, using the filepath specified in the previous step.

### Add a datetime index to make the dataset more meaningful
#### The dataset records time as Unix epoch timestamps at one-minute intervals, so I calculate the start time and generate a datetime index from it.
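
A minimal sketch of that start-time calculation, assuming the house is in the US Eastern time zone (an assumption: it is the zone that places the first timestamp, 1451624400, at midnight local time, matching the hard-coded `'2016-01-01 00:00'` start used below). It converts the epoch value with `pd.Timestamp(..., unit='s', tz='UTC')` and then builds the one-minute index for the 503911 rows.

```python
import pandas as pd

FIRST_EPOCH = 1451624400  # first value of the raw `time` column (seconds since 1970-01-01 UTC)
N_ROWS = 503911           # number of rows before the invalid last row is dropped

# Epoch seconds -> UTC timestamp -> local wall-clock time.
# Assumption: the meter is in US Eastern time ("America/New_York").
start = (pd.Timestamp(FIRST_EPOCH, unit="s", tz="UTC")
         .tz_convert("America/New_York")
         .tz_localize(None))  # drop the time zone, keep the local clock time

# One reading per minute, as stated by the data publisher
time_index = pd.date_range(start, periods=N_ROWS, freq="min")
print(time_index[0], time_index[-1])
```

With these values the index runs from 2016-01-01 00:00 to 2016-12-15 22:30; once the invalid last row is removed, the final timestamp is 2016-12-15 22:29.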

In [None]:
# Read the data file into a variable my_data
## pandas.read_csv: Read a comma-separated values (csv) file into DataFrame.
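## parse_dates=True would only try to parse the index, which is a plain RangeIndex here, so it is omitted;
## the datetime index is built by hand below.
my_data = pd.read_csv(my_filepath)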
my_data.info()

Some columns are not numeric. This project does not need them, so we drop them.

In [None]:
# Drop the columns whose dtype is object (the non-numeric columns)
home_dat = my_data.select_dtypes(exclude=['object'])

## You can convert the first Unix epoch timestamp to a readable date with the time module
## (the result depends on the local time zone of the machine running it):
## import time
## print('start', time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(1451624400)))

# The data publisher says the dataset contains one-minute readings of house appliances in kW from a smart meter,
# together with the weather conditions of that region. So I set freq='min' and convert the Unix time to a readable date.
time_index = pd.date_range('2016-01-01 00:00', periods=len(home_dat), freq='min')  # one timestamp per row
home_dat = home_dat.set_index(time_index)
# Check that a dataset has been uploaded into my_data
step_3.check()

## Data Preparation

In [None]:
# Print the first 10 rows of the data
## pandas.DataFrame.head: This function returns the first n rows for the object based on position. It is useful for quickly testing if your object has the right type of data in it.
home_dat.head(10)

In [None]:
# Print the last 10 rows of the data
## This function returns last n rows from the object based on position. It is useful for quickly verifying data, for example, after sorting or appending rows.
home_dat.tail(10)

We see that the last row is invalid, so let's remove it.

In [None]:
home_dat = home_dat.iloc[:-1]  ## drop the last (invalid) row; same as home_dat.iloc[0:len(home_dat) - 1]
home_dat.tail()

### The output above shows that the recorded time spans 2016-01-01 00:00:00 to 2016-12-15 22:29:00.

In [None]:
# Split the columns into two groups: energy readings and weather readings
energy_data = home_dat.filter(items=['use [kW]', 'gen [kW]', 'House overall [kW]', 
                                     'Dishwasher [kW]', 'Furnace 1 [kW]', 'Furnace 2 [kW]', 
                                     'Home office [kW]', 'Fridge [kW]', 'Wine cellar [kW]', 
                                     'Garage door [kW]', 'Kitchen 12 [kW]', 'Kitchen 14 [kW]', 
                                     'Kitchen 38 [kW]', 'Barn [kW]', 'Well [kW]',
                                     'Microwave [kW]', 'Living room [kW]', 'Solar [kW]'])

weather_data = home_dat.filter(items=['temperature','humidity', 'apparentTemperature'])
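
If every energy column name contains the `[kW]` unit suffix (an assumption worth checking against `home_dat.columns`), `DataFrame.filter(like=...)` selects them by substring instead of listing each name. A minimal sketch on a toy frame with made-up values:

```python
import pandas as pd

# Toy frame (hypothetical values) mixing energy and weather columns
df = pd.DataFrame([[0.93, 0.0, 36.1, 0.62]],
                  columns=["House overall [kW]", "Solar [kW]", "temperature", "humidity"])

energy = df.filter(like="[kW]")                         # substring match on column labels
weather = df.filter(items=["temperature", "humidity"])  # explicit list, as in the cell above
print(list(energy.columns), list(weather.columns))
```

The explicit `items` list is safer when column names are irregular; `like` is shorter and automatically picks up any additional `[kW]` column.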

In [None]:
# Print the first 5 rows of the energy data
energy_data.head()

In [None]:
# Print the first 5 rows of the weather data
weather_data.head()

## Aggregate data per day and month

In [None]:
# Generate data per day
## pandas.DataFrame.resample: Convenience method for frequency conversion and resampling of time series.
energy_per_day = energy_data.resample('D').sum()  # for energy, sum the readings to get the overall consumption in each period
energy_per_day.head()
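
Assuming each reading is the average power in kW over its one-minute interval, the daily sum has units of kW·min rather than kWh; dividing by 60 converts it to energy in kWh. A minimal sketch with a made-up constant 1.2 kW load sampled once per minute for two days:

```python
import numpy as np
import pandas as pd

# Hypothetical load: constant 1.2 kW, one reading per minute for two days
idx = pd.date_range("2016-01-01", periods=2 * 24 * 60, freq="min")
power_kw = pd.Series(np.full(len(idx), 1.2), index=idx, name="House overall [kW]")

kw_minutes = power_kw.resample("D").sum()  # what the cell above computes (kW·min per day)
daily_kwh = kw_minutes / 60                # 60 one-minute readings per hour
print(daily_kwh)
```

Each day comes out at about 28.8 kWh (1.2 kW × 24 h). The scaling does not change the shape of any plot, only the units on the y-axis.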

> Here are the `rule` aliases you can pass to `resample`:

| Alias | Frequency |
|---|---|
| `B` | business day |
| `C` | custom business day (experimental) |
| `D` | calendar day |
| `W` | weekly |
| `M` | month end |
| `SM` | semi-month end (15th and end of month) |
| `BM` | business month end |
| `CBM` | custom business month end |
| `MS` | month start |
| `SMS` | semi-month start (1st and 15th) |
| `BMS` | business month start |
| `CBMS` | custom business month start |
| `Q` | quarter end |
| `BQ` | business quarter end |
| `QS` | quarter start |
| `BQS` | business quarter start |
| `A` | year end |
| `BA`, `BY` | business year end |
| `AS`, `YS` | year start |
| `BAS`, `BYS` | business year start |
| `BH` | business hour |
| `H` | hourly |
| `T`, `min` | minutely |
| `S` | secondly |
| `L`, `ms` | milliseconds |
| `U`, `us` | microseconds |
| `N` | nanoseconds |

In pandas 2.2 and later, several of these aliases are deprecated in favour of new spellings (for example `M` → `ME`, `Q` → `QE`, `H` → `h`, `T` → `min`, `S` → `s`), so the `'M'` used in this notebook may emit a `FutureWarning` on newer versions.

It seems `use [kW]` and `House overall [kW]` show the same data. Let's visualize these two columns.

In [None]:
fig, axes = plt.subplots(nrows=2, ncols=1, figsize=(20, 10), sharex=True)
energy_per_day['use [kW]'].plot(ax=axes[0], title='use [kW]')
energy_per_day['House overall [kW]'].plot(ax=axes[1], title='House overall [kW]')

The two series are identical, so it is better to drop one of them. We keep `House overall [kW]`.
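
A plot only suggests that two columns match; a numerical check is more reliable. In this notebook it would compare `energy_data['use [kW]']` with `energy_data['House overall [kW]']`. A minimal sketch on made-up values:

```python
import numpy as np
import pandas as pd

# Hypothetical readings standing in for the two columns being compared
df = pd.DataFrame({"use [kW]": [0.93, 0.94, 1.10, np.nan],
                   "House overall [kW]": [0.93, 0.94, 1.10, np.nan]})

a, b = df["use [kW]"], df["House overall [kW]"]
max_abs_diff = (a - b).abs().max()             # NaN rows are skipped by max()
identical = np.allclose(a, b, equal_nan=True)  # tolerant to tiny floating-point differences
print(max_abs_diff, identical)
```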

In [None]:
energy_data = energy_data.drop(columns=['use [kW]'])
energy_per_day = energy_per_day.drop(columns=['use [kW]'])

Similarly, it seems `gen [kW]` and `Solar [kW]` show the same data. Let's visualize these two columns.

In [None]:
fig, axes = plt.subplots(nrows=2, ncols=1, figsize=(20, 10), sharex=True)
energy_per_day['gen [kW]'].plot(ax=axes[0], title='gen [kW]')
energy_per_day['Solar [kW]'].plot(ax=axes[1], title='Solar [kW]')

The two series are identical, so it is better to drop one of them. We keep `Solar [kW]`.

In [None]:
energy_data = energy_data.drop(columns=['gen [kW]'])
energy_per_day = energy_per_day.drop(columns=['gen [kW]'])

## Step 4: Visualise the data

Use the next code cell to create a figure that tells a story behind the dataset.

First, plot the daily energy consumption.

In [None]:
# Set the width and height of the figure
plt.figure(figsize=(20,10))

# Add title
plt.title("Overall energy consumption per day")
sns.lineplot(data = energy_per_day.filter(items=['House overall [kW]']), dashes=False)

In [None]:
energy_per_month = energy_data.resample('M').sum()
plt.figure(figsize=(20,10))
plt.title("Overall energy consumption per month")
sns.lineplot(data = energy_per_month.filter(items=['House overall [kW]']), dashes=False)
# Note: `use [kW]` duplicated `House overall [kW]`, and `gen [kW]` duplicated `Solar [kW]`

### August and September have the highest consumption of the year, while July and December have the lowest. Note that December only contains data up to 2016-12-15, so its total covers about half a month.
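
Monthly totals depend on how many days each month contains, and the data stop on 2016-12-15. Averaging the daily totals within each month makes months comparable. A minimal sketch with made-up daily values (a constant 20 per day from 2016-11-01 to 2016-12-15); it groups by monthly periods via `to_period('M')`, where `M` is the monthly period alias and is not affected by the pandas 2.2 renaming of the resampling alias `M` to `ME`:

```python
import pandas as pd

# Hypothetical daily totals: the same 20 units every day, November complete, December partial
days = pd.date_range("2016-11-01", "2016-12-15", freq="D")
daily = pd.Series(20.0, index=days, name="House overall [kW]")

per_month = daily.groupby(daily.index.to_period("M")).agg(["sum", "mean"])
print(per_month)
```

The `sum` column halves from November to December only because December is half covered, while the `mean` column (average per day) stays at 20 for both months.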

In [None]:
plt.figure(figsize=(20,10))
plt.title("Each appliance energy consumption per day")
sns.lineplot(data = energy_per_day.filter(items=['Dishwasher [kW]', 'Furnace 1 [kW]', 'Furnace 2 [kW]', 
                                     'Home office [kW]', 'Fridge [kW]', 'Wine cellar [kW]', 
                                     'Garage door [kW]', 'Kitchen 12 [kW]', 'Kitchen 14 [kW]', 
                                     'Kitchen 38 [kW]', 'Barn [kW]', 'Well [kW]',
                                     'Microwave [kW]', 'Living room [kW]']), dashes=False)

In [None]:
plt.figure(figsize=(20,10))
plt.title("Each appliance energy consumption per month")
sns.lineplot(data = energy_per_month.filter(items=['Dishwasher [kW]', 'Furnace 1 [kW]', 'Furnace 2 [kW]', 
                                     'Home office [kW]', 'Fridge [kW]', 'Wine cellar [kW]', 
                                     'Garage door [kW]', 'Kitchen 12 [kW]', 'Kitchen 14 [kW]', 
                                     'Kitchen 38 [kW]', 'Barn [kW]', 'Well [kW]',
                                     'Microwave [kW]', 'Living room [kW]']), dashes=False)

In [None]:
energy_per_month.head(12)

In [None]:
plt.figure(figsize=(20,10))
plt.title("Devices energy consumption")

# Plot the devices consumption
sns.lineplot(data = energy_per_month.filter(items=['Dishwasher [kW]', 'Furnace 1 [kW]', 'Furnace 2 [kW]', 
                                     'Fridge [kW]', 'Garage door [kW]', 'Well [kW]',
                                     'Microwave [kW]']), dashes=False)

In [None]:
plt.figure(figsize=(20,10))
plt.title("Rooms energy consumption")

# Plot the rooms' consumption (devices excluded)
sns.lineplot(data=energy_per_month.filter(items=['Home office [kW]', 'Wine cellar [kW]', 'Kitchen 12 [kW]',
                                                 'Kitchen 14 [kW]', 'Kitchen 38 [kW]', 'Barn [kW]',
                                                 'Living room [kW]']), dashes=False)

### As we can see, the home office has the highest consumption in the home, and the kitchen has the lowest.

In [None]:
plt.figure(figsize=(20,7))
plt.title("Solar generation per month")
sns.lineplot(data = energy_per_day.filter(['Solar [kW]']).resample('M').sum(),dashes=False)

#### The plot shows that solar generation is highest in April and May.

## Home activity on 2016-10-04

In [None]:
plt.figure(figsize=(20,10))
plt.title("Home activity on 2016-10-04")
sns.lineplot(data = energy_data.loc['2016-10-04 00:00' : '2016-10-04 23:59'].filter(['Home office [kW]', 
                                     'Wine cellar [kW]', 'Kitchen 12 [kW]',
                                     'Kitchen 14 [kW]', 'Kitchen 38 [kW]', 'Barn [kW]',
                                     'Living room [kW]']),dashes=False)
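
On a `DatetimeIndex`, a date string passed to `.loc` selects every row of that day, so the explicit minute range above could be shortened to `energy_data.loc['2016-10-04']`. A minimal sketch on a made-up one-minute index that spills past midnight on both sides:

```python
import numpy as np
import pandas as pd

# Hypothetical one-minute data from just before to just after 2016-10-04
idx = pd.date_range("2016-10-03 23:58", "2016-10-05 00:01", freq="min")
df = pd.DataFrame({"Barn [kW]": np.arange(len(idx), dtype=float)}, index=idx)

whole_day = df.loc["2016-10-04"]                          # partial-string indexing
explicit = df.loc["2016-10-04 00:00":"2016-10-04 23:59"]  # range used in the cell above
print(len(whole_day), whole_day.equals(explicit))
```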

In [None]:
weather_per_day = weather_data.resample('D').mean()  # weather values are averaged, not summed; 'D' = calendar day
weather_per_day.head()
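
Energy readings are summed per period while weather readings are averaged. Both rules can be applied in a single pass by passing a column-to-function mapping to `resample(...).agg()`. A minimal sketch on made-up values sampled every 12 hours:

```python
import pandas as pd

# Hypothetical readings: two per day for two days
idx = pd.date_range("2016-01-01", periods=4, freq="12h")
df = pd.DataFrame({"House overall [kW]": [1.0, 2.0, 3.0, 4.0],
                   "temperature": [30.0, 40.0, 50.0, 60.0]}, index=idx)

# Sum the energy column and average the weather column for each calendar day
daily = df.resample("D").agg({"House overall [kW]": "sum", "temperature": "mean"})
print(daily)
```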

In [None]:
weather_per_month = weather_data.resample('M').mean()                # 'M' = month end
plt.figure(figsize=(15,5))
plt.ylabel('°F')
plt.title("Temperature mean per month")
sns.lineplot(data = weather_per_month.filter(items=['temperature', 'apparentTemperature']),dashes=False)

In [None]:
weather_per_month = weather_data.resample('M').mean()                # 'M' = month end
plt.figure(figsize=(15,5))
plt.title("Humidity mean per month")
sns.lineplot(data = weather_per_month.filter(items=['humidity']),dashes=False)

In [None]:
rooms_energy = energy_per_month.filter(items=[      # rooms only (devices excluded)
                                     'Home office [kW]', 'Wine cellar [kW]', 'Kitchen 12 [kW]',
                                     'Kitchen 14 [kW]', 'Kitchen 38 [kW]', 'Barn [kW]',
                                     'Living room [kW]'])
devices_energy = energy_per_month.filter(items=[    # devices only (rooms excluded)
                                     'Dishwasher [kW]',
                                     'Furnace 1 [kW]', 'Furnace 2 [kW]', 'Fridge [kW]',
                                     'Garage door [kW]', 'Well [kW]',
                                     'Microwave [kW]'])

all_rooms_consum = rooms_energy.sum()
all_devices_consum = devices_energy.sum()
print(all_rooms_consum)
print(all_devices_consum)

In [None]:
plot = all_rooms_consum.plot(kind="pie", autopct='%.2f', figsize=(10,10))
plot.set_title("Consumption for rooms")
plot.set_ylabel('%')

### This plot shows that the home office has the highest consumption in the home.

In [None]:
plot = all_devices_consum.plot(kind="pie", autopct='%.2f', figsize=(10,10))
plot.set_title("Consumption for devices")
plot.set_ylabel('%')

### Among the devices, the furnace has the highest consumption, nearly half of all device consumption.
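
The percentages shown in the pie charts can also be computed directly, which makes claims like "nearly half" easy to check. A minimal sketch with made-up totals for three hypothetical devices; in the notebook, `all_devices_consum` or `all_rooms_consum` would take the place of `totals`:

```python
import pandas as pd

# Hypothetical consumption totals (made-up numbers, not from the dataset)
totals = pd.Series({"device A": 30.0, "device B": 20.0, "device C": 50.0})

# Share of the total in percent, largest first
share_pct = (totals / totals.sum() * 100).sort_values(ascending=False)
print(share_pct.round(2))
```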

In [None]:
sns.regplot(x = energy_per_day['Furnace 2 [kW]'], y = weather_per_day['temperature'])

#### Furnace consumption and temperature are inversely related: the colder the day, the more energy the furnace uses.

In [None]:
sns.regplot(x = energy_per_day['Wine cellar [kW]'], y = weather_per_day['temperature'])

In [None]:
sns.regplot(x = energy_per_day['Fridge [kW]'], y = weather_per_day['temperature'])

### Fridge consumption depends strongly on temperature.

In [None]:
sns.regplot(x = energy_per_day['Barn [kW]'], y = weather_per_day['temperature'])

### Temperature has only a weak effect on barn consumption.
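
The regression plots can be summarized with a single number: `Series.corr` returns the Pearson correlation coefficient, for example `energy_per_day['Furnace 2 [kW]'].corr(weather_per_day['temperature'])`. Values near -1 or 1 indicate a strong linear relation and values near 0 a weak one. A minimal sketch on synthetic data (an assumption: one load that falls linearly with temperature and one that ignores it):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
days = pd.date_range("2016-01-01", periods=200, freq="D")

# Synthetic daily series, not taken from the dataset
temperature = pd.Series(rng.uniform(10, 90, len(days)), index=days)
heating_load = 50 - 0.5 * temperature + rng.normal(0, 2, len(days))  # falls as it gets warmer
unrelated_load = pd.Series(rng.normal(5, 1, len(days)), index=days)  # independent of temperature

r_heating = heating_load.corr(temperature)
r_unrelated = unrelated_load.corr(temperature)
print(round(r_heating, 3), round(r_unrelated, 3))
```

A strongly negative `r_heating` corresponds to the inverse furnace relation above, and an `r_unrelated` close to zero corresponds to a weak effect like the one seen for the barn.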