# Assignment 4

## Introduction
This assignment uses data from the UC Irvine Machine Learning Repository, a popular repository for machine learning datasets. In particular, we will be using the "Individual household electric power consumption Data Set" which I have made available on the course web site:

* Dataset: Electric power consumption [20 MB]

* Description: Measurements of electric power consumption in one household with a one-minute sampling rate over a period of almost 4 years. Different electrical quantities and some sub-metering values are available.

The following descriptions of the 9 variables in the dataset are taken from the UCI web site:

- Date: Date in format dd/mm/yyyy
- Time: time in format hh:mm:ss
- Global_active_power: household global minute-averaged active power (in kilowatt)
- Global_reactive_power: household global minute-averaged reactive power (in kilowatt)
- Voltage: minute-averaged voltage (in volt)
- Global_intensity: household global minute-averaged current intensity (in ampere)
- Sub_metering_1: energy sub-metering No. 1 (in watt-hour of active energy). It corresponds to the kitchen, containing mainly a dishwasher, an oven and a microwave (hot plates are not electric but gas powered).
- Sub_metering_2: energy sub-metering No. 2 (in watt-hour of active energy). It corresponds to the laundry room, containing a washing-machine, a tumble-drier, a refrigerator and a light.
- Sub_metering_3: energy sub-metering No. 3 (in watt-hour of active energy). It corresponds to an electric water-heater and an air-conditioner.
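The units above mix power and energy: Global_active_power is a one-minute average in kilowatts, while the sub-meters report watt-hours. A minute at an average of P kW corresponds to P × 1000 / 60 Wh, so subtracting the three sub-meters from that figure gives the energy used in each minute by equipment that no sub-meter covers. A minimal sketch of that arithmetic (the helper name `unmetered_wh` and the example rows are made up for illustration):

```python
import pandas as pd

def unmetered_wh(df: pd.DataFrame) -> pd.Series:
    """Per-minute active energy (Wh) not captured by any of the three sub-meters."""
    # Global_active_power is a one-minute average in kW: kW * 1000 W/kW * (1/60) h = Wh
    total_wh = df['Global_active_power'] * 1000 / 60
    return total_wh - df['Sub_metering_1'] - df['Sub_metering_2'] - df['Sub_metering_3']

# Made-up example rows (not from the dataset)
demo = pd.DataFrame({
    'Global_active_power': [1.2, 0.6],
    'Sub_metering_1': [1.0, 0.0],
    'Sub_metering_2': [2.0, 0.0],
    'Sub_metering_3': [12.0, 0.0],
})
print(unmetered_wh(demo).tolist())  # [5.0, 10.0]
```

Because it works column-wise, the same function can be applied to the full loaded DataFrame.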

## Dataset
We will only be using data from the dates 2007-02-01 and 2007-02-02. One alternative is to read the data from just those dates rather than reading in the entire dataset and subsetting to those dates.
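A minimal sketch of that alternative, assuming the raw file keeps the UCI layout described above (semicolon-separated, `Date` as day/month/year, `?` for missing values): stream the file in chunks with `pandas.read_csv(..., chunksize=...)` and keep only the rows for the two target days, so the whole dataset never has to be held in memory at once. The helper name `read_two_days` and the chunk size are illustrative choices, not part of the assignment.

```python
import io
import pandas as pd

def read_two_days(source, start='2007-02-01', end='2007-02-03', chunksize=100_000):
    """Stream a ';'-separated file and keep only rows with start <= Date < end."""
    kept = []
    reader = pd.read_csv(source, sep=';', na_values=['?'],
                         dtype={'Date': str, 'Time': str}, chunksize=chunksize)
    for chunk in reader:
        # '%d/%m/%Y' also accepts non-zero-padded days and months such as '1/2/2007'
        dates = pd.to_datetime(chunk['Date'], format='%d/%m/%Y')
        kept.append(chunk[(dates >= start) & (dates < end)])
    return pd.concat(kept, ignore_index=True)

# Tiny made-up file: one row before, two rows inside, one row after the target days
sample = io.StringIO(
    "Date;Time;Global_active_power\n"
    "31/1/2007;23:59:00;0.1\n"
    "1/2/2007;00:00:00;0.2\n"
    "2/2/2007;23:59:00;0.3\n"
    "3/2/2007;00:00:00;0.4\n"
)
two_days = read_two_days(sample, chunksize=2)
print(two_days['Date'].tolist())  # ['1/2/2007', '2/2/2007']
```

On the real data, `source` would be the path to the unzipped text file.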

### Dataset Domain 
Analysis of household energy consumption

## Research Question
- Our overall goal here is simply to examine how household energy usage varies over a 2-day period in February 2007.



## Loading The Data

```python
# Load the semicolon-separated file; '?' marks missing values
import pandas as pd

dtypes = {'Date': 'str', 'Time': 'str', 'Global_active_power': 'float',
          'Global_reactive_power': 'float', 'Voltage': 'float', 'Global_intensity': 'float',
          'Sub_metering_1': 'float', 'Sub_metering_2': 'float', 'Sub_metering_3': 'float'}

hoursePowerConsumption = pd.read_csv('household_power_consumption_Sample.txt', sep=';',
                                     dtype=dtypes, na_values=['?'])

# Combine Date (dd/mm/yyyy) and Time (hh:mm:ss) into one explicitly parsed datetime column
date_time = pd.to_datetime(hoursePowerConsumption.pop('Date') + ' ' + hoursePowerConsumption.pop('Time'),
                           format='%d/%m/%Y %H:%M:%S')
hoursePowerConsumption.insert(0, 'Date_Time', date_time)

hoursePowerConsumption.dtypes
```



```
Date_Time                datetime64[ns]
Global_active_power             float64
Global_reactive_power           float64
Voltage                         float64
Global_intensity                float64
Sub_metering_1                  float64
Sub_metering_2                  float64
Sub_metering_3                  float64
dtype: object
```

```python
hoursePowerConsumption.head(10)
```


| | Date_Time | Global_active_power | Global_reactive_power | Voltage | Global_intensity | Sub_metering_1 | Sub_metering_2 | Sub_metering_3 |
|---|---|---|---|---|---|---|---|---|
| 0 | 2007-01-31 23:52:00 | 0.228 | 0.0 | 243.28 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1 | 2007-01-31 23:53:00 | 0.302 | 0.068 | 243.57 | 1.4 | 0.0 | 0.0 | 0.0 |
| 2 | 2007-01-31 23:54:00 | 0.338 | 0.128 | 243.43 | 1.4 | 0.0 | 0.0 | 0.0 |
| 3 | 2007-01-31 23:55:00 | 0.334 | 0.13 | 243.8 | 1.4 | 0.0 | 0.0 | 0.0 |
| 4 | 2007-01-31 23:56:00 | 0.332 | 0.126 | 243.26 | 1.4 | 0.0 | 0.0 | 0.0 |
| 5 | 2007-01-31 23:57:00 | 0.328 | 0.124 | 242.59 | 1.4 | 0.0 | 0.0 | 0.0 |
| 6 | 2007-01-31 23:58:00 | 0.328 | 0.126 | 242.87 | 1.4 | 0.0 | 0.0 | 0.0 |
| 7 | 2007-01-31 23:59:00 | 0.326 | 0.126 | 242.8 | 1.4 | 0.0 | 0.0 | 0.0 |
| 8 | 2007-02-01 00:00:00 | 0.326 | 0.128 | 243.15 | 1.4 | 0.0 | 0.0 | 0.0 |
| 9 | 2007-02-01 00:01:00 | 0.326 | 0.13 | 243.32 | 1.4 | 0.0 | 0.0 | 0.0 |


```python
# Keep only the two days of interest: 2007-02-01 and 2007-02-02
hoursePowerConsumption = hoursePowerConsumption[(hoursePowerConsumption['Date_Time'] >= '2007-02-01') &
                                                (hoursePowerConsumption['Date_Time'] < '2007-02-03')]
hoursePowerConsumption.head(10)
```

| | Date_Time | Global_active_power | Global_reactive_power | Voltage | Global_intensity | Sub_metering_1 | Sub_metering_2 | Sub_metering_3 |
|---|---|---|---|---|---|---|---|---|
| 8 | 2007-02-01 00:00:00 | 0.326 | 0.128 | 243.15 | 1.4 | 0.0 | 0.0 | 0.0 |
| 9 | 2007-02-01 00:01:00 | 0.326 | 0.13 | 243.32 | 1.4 | 0.0 | 0.0 | 0.0 |
| 10 | 2007-02-01 00:02:00 | 0.324 | 0.132 | 243.51 | 1.4 | 0.0 | 0.0 | 0.0 |
| 11 | 2007-02-01 00:03:00 | 0.324 | 0.134 | 243.9 | 1.4 | 0.0 | 0.0 | 0.0 |
| 12 | 2007-02-01 00:04:00 | 0.322 | 0.13 | 243.16 | 1.4 | 0.0 | 0.0 | 0.0 |
| 13 | 2007-02-01 00:05:00 | 0.32 | 0.126 | 242.29 | 1.4 | 0.0 | 0.0 | 0.0 |
| 14 | 2007-02-01 00:06:00 | 0.32 | 0.126 | 242.46 | 1.4 | 0.0 | 0.0 | 0.0 |
| 15 | 2007-02-01 00:07:00 | 0.32 | 0.126 | 242.63 | 1.4 | 0.0 | 0.0 | 0.0 |
| 16 | 2007-02-01 00:08:00 | 0.32 | 0.128 | 242.7 | 1.4 | 0.0 | 0.0 | 0.0 |
| 17 | 2007-02-01 00:09:00 | 0.236 | 0.0 | 242.89 | 1.0 | 0.0 | 0.0 | 0.0 |
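The filter above compares `Date_Time` against timestamp strings. An alternative, sketched here on synthetic one-minute timestamps, is to index by `Date_Time` and use pandas partial-string slicing on a sorted `DatetimeIndex`, where `.loc['2007-02-01':'2007-02-02']` includes every minute of both end days. Two complete days at one-minute sampling give 2 × 1440 = 2880 rows, so the row count doubles as a quick completeness check: minutes absent from the file would show up as a smaller count. `select_days` is a hypothetical helper name.

```python
import pandas as pd

def select_days(df, first='2007-02-01', last='2007-02-02'):
    # With a sorted DatetimeIndex, a day-resolution string slice covers the whole day at each end
    return df.set_index('Date_Time').sort_index().loc[first:last]

# Synthetic one-minute timestamps spanning a little more than the two target days
minutes = pd.date_range('2007-01-31 23:00', '2007-02-03 01:00', freq='min')
demo = pd.DataFrame({'Date_Time': minutes, 'Global_active_power': 0.0})

subset = select_days(demo)
print(len(subset))  # 2880, i.e. 2 days * 1440 minutes
```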


## Plotting

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

%matplotlib notebook
```

## Global Active Power Histogram



```python
hoursePowerConsumption.hist(column="Global_active_power", alpha=0.7, bins=12)
plt.xlabel("Global Active Power (kilowatts)", fontsize=15)
plt.ylabel("Frequency", fontsize=15)
```
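`DataFrame.hist` returns the matplotlib axes it drew on, so the labels can also be set on that `Axes` object directly rather than through `plt.xlabel` and `plt.ylabel`, which act on whichever axes pyplot currently considers active. A sketch on made-up values; the Agg backend line is an assumption so the snippet runs outside a notebook, and should be dropped inside the notebook above. `np.ravel` is used so the sketch works whether a single `Axes` or an array of them comes back.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Made-up power readings (kW), not taken from the dataset
demo = pd.DataFrame({'Global_active_power': [0.2, 0.3, 0.3, 1.5, 2.4, 3.6]})

axes = demo.hist(column='Global_active_power', alpha=0.7, bins=12)
ax = np.ravel(axes)[0]  # take the only Axes that was drawn on
ax.set_xlabel('Global Active Power (kilowatts)', fontsize=15)
ax.set_ylabel('Frequency', fontsize=15)
```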

## Global Active Power over Time

```python
hoursePowerConsumption.plot(x="Date_Time", y="Global_active_power")
plt.xlabel("DateTime", fontsize=15)
plt.ylabel("Global Active Power (kilowatts)", fontsize=15)
```

## Energy Sub-Metering over Time

```python
# Plot all three sub-meters on the same axes
hoursePowerConsumption.plot(x="Date_Time", y=["Sub_metering_1", "Sub_metering_2", "Sub_metering_3"])
plt.xlabel("DateTime", fontsize=15)
plt.ylabel("Energy sub metering (Wh)", fontsize=15)
plt.gcf().subplots_adjust(bottom=0.25)  # leave extra room below the axes for the date tick labels
```

## Discussion

The energy sub-metering plot (figure 3) shows that household energy use over the two days is most closely related to two of the sub-meters:

- First, Sub_metering_1: the kitchen, containing mainly a dishwasher, an oven and a microwave (the hot plates are gas powered, not electric).
- Second, Sub_metering_3: an electric water-heater and an air-conditioner.
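One way to back this visual reading with numbers is to total the watt-hours each sub-meter recorded over the two days and rank them. A minimal sketch on made-up values; on the filtered data it would be called as `sub_meter_totals(hoursePowerConsumption)`, where `sub_meter_totals` is a hypothetical helper name.

```python
import pandas as pd

def sub_meter_totals(df: pd.DataFrame) -> pd.Series:
    # Total Wh recorded by each sub-meter, largest first (sum() skips NaN by default)
    cols = ['Sub_metering_1', 'Sub_metering_2', 'Sub_metering_3']
    return df[cols].sum().sort_values(ascending=False)

# Made-up example rows (not from the dataset)
demo = pd.DataFrame({
    'Sub_metering_1': [0.0, 30.0],
    'Sub_metering_2': [1.0, 1.0],
    'Sub_metering_3': [17.0, 18.0],
})
print(sub_meter_totals(demo))
```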

## Links


- UC Irvine Machine Learning Repository: http://archive.ics.uci.edu/ml/index.php
- Dataset: Electric power consumption [20 MB]: https://d396qusza40orc.cloudfront.net/exdata%2Fdata%2Fhousehold_power_consumption.zip
- UCI dataset description: https://archive.ics.uci.edu/ml/datasets/Individual+household+electric+power+consumption