In [4]:
from os import chdir
chdir('/home/jovyan')

# Downloading the Datasets

First things first: download the datasets that will be used.

## 1. [Individual household electric power consumption Data Set](https://archive.ics.uci.edu/ml/datasets/individual+household+electric+power+consumption)

### Abstract 

Measurements of electric power consumption in one household with a one-minute sampling rate over a period of almost 4 years. Different electrical quantities and some sub-metering values are available.

Property | Description
:---: | :--- 
Type | Multivariate, Time-Series
Area | Physical
Attribute Characteristics | Real
Number of Attributes | 9
Date Donated | 2012-08-30
Associated Tasks | Regression, Clustering
Missing Values | Yes

### Source

Georges Hebrail (georges.hebrail '@' edf.fr), Senior Researcher, EDF R&D, Clamart, France
Alice Berard, TELECOM ParisTech Master of Engineering Internship at EDF R&D, Clamart, France


### Data Set Information

This archive contains **2075259 measurements** gathered in a house located in Sceaux (7 km from Paris, France) between December 2006 and November 2010 (47 months).

#### **Notes**:

1. `global_active_power*1000/60 - sub_metering_1 - sub_metering_2 - sub_metering_3` represents the active energy consumed every minute (in watt-hours) in the household by electrical equipment not measured in sub-meterings 1, 2 and 3. The factor 1000/60 converts the minute-averaged power in kilowatts into watt-hours per minute.


2. The dataset contains some missing values in the measurements (nearly 1.25% of the rows). All calendar timestamps are present in the dataset, but for some timestamps the measurement values are missing: a missing value is represented by the absence of a value between two consecutive semicolon attribute separators. For instance, the dataset shows missing values on April 28, 2007.
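
The formula in note 1 can be checked by hand on a single reading. The sketch below is only an illustration: the function name `unmetered_energy_wh` is hypothetical, and the sample values (4.216 kW with sub-meterings of 0, 1 and 17 Wh) are the first reading in the file.

```python
def unmetered_energy_wh(global_active_power_kw, sub_1_wh, sub_2_wh, sub_3_wh):
    """Active energy (Wh) used in one minute by equipment outside the three sub-meters."""
    # Power in kW averaged over one minute -> energy in Wh consumed during that minute
    total_wh = global_active_power_kw * 1000 / 60
    return total_wh - sub_1_wh - sub_2_wh - sub_3_wh


# First reading in the file: 4.216 kW, sub-meterings 0, 1 and 17 Wh
remainder = unmetered_energy_wh(4.216, 0.0, 1.0, 17.0)
print(round(remainder, 2))  # 52.27
```

Of the roughly 70.27 Wh drawn during that minute, about 52.27 Wh went to equipment that none of the three sub-meters covers.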



Attribute | Information
:---: | :---
date | date in format dd/mm/yyyy
time | time in format hh:mm:ss
global_active_power | household global minute-averaged active power (in kilowatt)
global_reactive_power | household global minute-averaged reactive power (in kilowatt)
voltage | minute-averaged voltage (in volt)
global_intensity | household global minute-averaged current intensity (in ampere)
sub_metering_1 | energy sub-metering No. 1 (in watt-hour of active energy). It corresponds to the kitchen, containing mainly a dishwasher, an oven and a microwave (hot plates are not electric but gas powered).
sub_metering_2 | energy sub-metering No. 2 (in watt-hour of active energy). It corresponds to the laundry room, containing a washing-machine, a tumble-drier, a refrigerator and a light.
sub_metering_3 | energy sub-metering No. 3 (in watt-hour of active energy). It corresponds to an electric water-heater and an air-conditioner.
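
Because `date` and `time` are stored as separate day-first text fields, they usually need to be combined before any time-series work. A minimal sketch using only the standard library, where the helper name `parse_timestamp` is hypothetical:

```python
from datetime import datetime


def parse_timestamp(date_str, time_str):
    """Combine the dd/mm/yyyy date and hh:mm:ss time fields into one datetime."""
    # An explicit format keeps the day-first order unambiguous
    return datetime.strptime(f"{date_str} {time_str}", "%d/%m/%Y %H:%M:%S")


stamp = parse_timestamp("16/12/2006", "17:24:00")
print(stamp.isoformat())  # 2006-12-16T17:24:00
```

Passing an explicit format avoids the day/month confusion that a month-first parser would introduce for dates such as 01/02/2007.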


In [4]:
!wget -P data/ \
https://archive.ics.uci.edu/ml/machine-learning-databases/00235/household_power_consumption.zip

--2019-10-04 01:11:16--  https://archive.ics.uci.edu/ml/machine-learning-databases/00235/household_power_consumption.zip
Resolving archive.ics.uci.edu (archive.ics.uci.edu)... 128.195.10.252
Connecting to archive.ics.uci.edu (archive.ics.uci.edu)|128.195.10.252|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 20640916 (20M) [application/x-httpd-php]
Saving to: ‘data/household_power_consumption.zip’


2019-10-04 01:11:18 (15.5 MB/s) - ‘data/household_power_consumption.zip’ saved [20640916/20640916]



In [3]:
%%bash
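# Extract into data/ so that the paths used below resolve
unzip -u data/household_power_consumption.zip -d data/
rm -f data/*.zip
# Drop blank lines and save the result with a .csv extension
sed '/^\s*$/d' data/household_power_consumption.txt > data/household_power_consumption.csv
rm -f data/*.txt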
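
Where `sed` is not available, the blank-line filter can be approximated in plain Python. This is a sketch rather than the notebook's own method: the function name `strip_blank_lines` and the temporary sample file are hypothetical and stand in for the real `data/` paths.

```python
import tempfile
from pathlib import Path


def strip_blank_lines(src, dst):
    """Copy src to dst, skipping empty or whitespace-only lines; return lines kept."""
    kept = 0
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            if line.strip():  # same effect as deleting lines matching ^\s*$
                fout.write(line)
                kept += 1
    return kept


# Small demonstration on a throwaway file instead of the real dataset
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "sample.txt"
    dst = Path(tmp) / "sample.csv"
    src.write_text("Date;Time\n\n16/12/2006;17:24:00\n   \n", encoding="utf-8")
    kept = strip_blank_lines(src, dst)
    cleaned = dst.read_text(encoding="utf-8")

print(kept)  # 2
```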

In [3]:
import pandas as pd

dataset = pd.read_csv('data/household_power_consumption.csv',
                      sep=";")



In [4]:
dataset.head()

Unnamed: 0,Date,Time,Global_active_power,Global_reactive_power,Voltage,Global_intensity,Sub_metering_1,Sub_metering_2,Sub_metering_3
0,16/12/2006,17:24:00,4.216,0.418,234.84,18.4,0.0,1.0,17.0
1,16/12/2006,17:25:00,5.36,0.436,233.63,23.0,0.0,1.0,16.0
2,16/12/2006,17:26:00,5.374,0.498,233.29,23.0,0.0,2.0,17.0
3,16/12/2006,17:27:00,5.388,0.502,233.74,23.0,0.0,1.0,17.0
4,16/12/2006,17:28:00,3.666,0.528,235.68,15.8,0.0,1.0,17.0


In [5]:
dataset.describe()

Unnamed: 0,Sub_metering_3
count,2049280.0
mean,6.458447
std,8.437154
min,0.0
25%,0.0
50%,1.0
75%,17.0
max,31.0


In [6]:
dataset.dtypes

Date                      object
Time                      object
Global_active_power       object
Global_reactive_power     object
Voltage                   object
Global_intensity          object
Sub_metering_1            object
Sub_metering_2            object
Sub_metering_3           float64
dtype: object
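
The `object` dtypes above suggest that the numeric columns contain some non-numeric text. A likely cause, stated here as an assumption about the raw file, is that missing measurements are written as `?`. The sketch below uses a two-row inline sample (the second row is made up for illustration) to show how the `na_values` argument of `pd.read_csv` lets pandas parse those columns as floats:

```python
import io

import pandas as pd

sample = io.StringIO(
    "Date;Time;Global_active_power;Global_reactive_power;Voltage;"
    "Global_intensity;Sub_metering_1;Sub_metering_2;Sub_metering_3\n"
    "16/12/2006;17:24:00;4.216;0.418;234.84;18.4;0.0;1.0;17.0\n"
    # Hypothetical row: '?' placeholders and an empty last field
    "28/04/2007;00:21:00;?;?;?;?;?;?;\n"
)

# Treat '?' as missing so the measurement columns are parsed as floats
parsed = pd.read_csv(sample, sep=";", na_values="?")
print(parsed.dtypes)
print(parsed["Global_active_power"].isna().sum())  # 1
```

With the placeholder declared as missing, `describe()` would summarise every measurement column rather than only `Sub_metering_3`.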

CREATE TABLE statement:

`CREATE TABLE adult (
    _id SERIAL PRIMARY KEY,
    Date TEXT,
    Time TEXT,
    Global_active_power TEXT,
    Global_reactive_power TEXT,
    Voltage TEXT,
    Global_intensity TEXT,
    Sub_metering_1 TEXT,
    Sub_metering_2 TEXT,
    Sub_metering_3 TEXT
);
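COPY adult (Date, Time, Global_active_power, Global_reactive_power, Voltage,
            Global_intensity, Sub_metering_1, Sub_metering_2, Sub_metering_3)
FROM '/tmp/household_power_consumption.csv' DELIMITER ';' CSV HEADER;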
`