# Extracting Time Series

[▲ Overview](0.0-Overview.ipynb)

[◀ Loading and Decoding Dataset](2.0-Loading-dataset.ipynb)

[▶ Data Exploration](3.0-Exploring-timeseries.ipynb)

In [1]:
import pandas as pd
from australian_housing import paths
from australian_housing.data.extract_timeseries import new_south_wales_index

In [2]:
# Load the interim data file; the first column (Time) becomes the index.
df = pd.read_csv(paths.manager.interim_data_file, index_col=0)

In [3]:
df.head()

Time,Measure,Sector of Ownership,Type of work,Type of building,Geography Level,Region,Frequency,Value
2011-07-01,Total number of dwelling units,Total Sectors,New,Houses,Statistical Area Level 3,Gosford,Monthly,14.0
2011-08-01,Total number of dwelling units,Total Sectors,New,Houses,Statistical Area Level 3,Gosford,Monthly,17.0
2011-09-01,Total number of dwelling units,Total Sectors,New,Houses,Statistical Area Level 3,Gosford,Monthly,21.0
2011-10-01,Total number of dwelling units,Total Sectors,New,Houses,Statistical Area Level 3,Gosford,Monthly,15.0
2011-11-01,Total number of dwelling units,Total Sectors,New,Houses,Statistical Area Level 3,Gosford,Monthly,15.0


In [4]:
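# Convert the descriptive columns to categoricals so that describe()
# reports count, number of unique values, most frequent value, and its frequency.
for head in ('Measure', 'Sector of Ownership', 'Type of work', 'Type of building', 'Geography Level', 'Region', 'Frequency'):
    df[head] = df[head].astype('category')
df.describe(include='category')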

,Measure,Sector of Ownership,Type of work,Type of building,Geography Level,Region,Frequency
count,2920,2920,2920,2920,2920,2920,2920
unique,1,1,1,10,4,4,1
top,Total number of dwelling units,Total Sectors,New,Total Residential,Statistical Area Level 4,New South Wales,Monthly
freq,2920,2920,2920,292,730,730,2920


In this dataset, `Measure`, `Sector of Ownership`, `Type of work`, and `Frequency` each take a single value across all 2920 rows, so they carry no information for telling rows apart. `Type of building`, `Geography Level`, and `Region` do vary, so we need to select the values requested in the exercise: `Type of building` = `Houses`, `Geography Level` = `States and Territories`, and `Region` = `New South Wales`.
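The source of `new_south_wales_index` is not shown in this notebook. As a minimal sketch of how the selection described above could be expressed, assuming it amounts to a boolean row mask over the three varying columns, consider the following. The function name `select_series_mask` and the toy frame are hypothetical and exist only for illustration; the third `Value` is invented.

```python
import pandas as pd


def select_series_mask(df, building="Houses",
                       level="States and Territories",
                       region="New South Wales"):
    """Hypothetical stand-in: True for rows matching all three criteria."""
    return (
        (df["Type of building"] == building)
        & (df["Geography Level"] == level)
        & (df["Region"] == region)
    )


# Toy frame with the same column names as the dataset (illustrative values).
toy = pd.DataFrame(
    {
        "Type of building": ["Houses", "Houses", "Total Residential"],
        "Geography Level": ["Statistical Area Level 3",
                            "States and Territories",
                            "States and Territories"],
        "Region": ["Gosford", "New South Wales", "New South Wales"],
        "Value": [14.0, 1511.0, 2000.0],  # last value is made up
    },
    index=["2011-07-01", "2011-07-01", "2011-07-01"],
)

mask = select_series_mask(toy)   # boolean Series aligned with toy's rows
selected = toy[mask]             # keeps only the NSW state-level houses row
```

Indexing a DataFrame with a boolean Series keeps exactly the rows where the mask is `True`, which is the same pattern as `df[new_south_wales_index(df)]`.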

In [5]:
df['Measure'].unique()

[Total number of dwelling units]
Categories (1, object): [Total number of dwelling units]

In [6]:
df['Sector of Ownership'].unique()

[Total Sectors]
Categories (1, object): [Total Sectors]

In [7]:
df['Type of work'].unique()

[New]
Categories (1, object): [New]

In [8]:
df['Type of building'].unique()

[Houses, Semi-detached, row or terrace houses, townhous..., Semi-detached, row or terrace houses, townhous..., Semi-detached, row or terrace houses, townhous..., Flats units or apartments - In a one or two st..., Flats units or apartments - In a three storey ..., Flats units or apartments - In a four or more ..., Flats units or apartments - Total including th..., Total Other Residential, Total Residential]
Categories (10, object): [Houses, Semi-detached, row or terrace houses, townhous..., Semi-detached, row or terrace houses, townhous..., Semi-detached, row or terrace houses, townhous..., ..., Flats units or apartments - In a four or more ..., Flats units or apartments - Total including th..., Total Other Residential, Total Residential]

In [9]:
df['Geography Level'].unique()

[Statistical Area Level 3, Statistical Area Level 4, States and Territories, Australia]
Categories (4, object): [Statistical Area Level 3, Statistical Area Level 4, States and Territories, Australia]

In [10]:
df['Region'].unique()

[Gosford, Central Coast, New South Wales, Australia]
Categories (4, object): [Gosford, Central Coast, New South Wales, Australia]
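The cells above inspect one column at a time. As a hedged alternative, the same question (which columns are constant and which vary) can be answered in one pass with `nunique()`. The toy frame below is a hypothetical stand-in that borrows a few values from the outputs above; it is not the real dataset.

```python
import pandas as pd

# Small illustrative frame: two constant columns and one that varies.
toy = pd.DataFrame(
    {
        "Measure": ["Total number of dwelling units"] * 4,
        "Region": ["Gosford", "Central Coast", "New South Wales", "Australia"],
        "Frequency": ["Monthly"] * 4,
    }
)

# Number of distinct values per column.
unique_counts = toy.nunique()

# Columns with a single distinct value carry no information for filtering.
constant_columns = [col for col in toy.columns if unique_counts[col] == 1]
varying_columns = [col for col in toy.columns if unique_counts[col] > 1]
```

On the real `df`, the same two list comprehensions would separate the columns that can be ignored from the ones that need an explicit selection.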

In [11]:
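# Use the result of new_south_wales_index(df) as a row selector;
# the output shows only Houses at state level for New South Wales.
nsw = df[new_south_wales_index(df)]
nsw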

Time,Measure,Sector of Ownership,Type of work,Type of building,Geography Level,Region,Frequency,Value
2011-07-01,Total number of dwelling units,Total Sectors,New,Houses,States and Territories,New South Wales,Monthly,1511.0
2011-08-01,Total number of dwelling units,Total Sectors,New,Houses,States and Territories,New South Wales,Monthly,1634.0
2011-09-01,Total number of dwelling units,Total Sectors,New,Houses,States and Territories,New South Wales,Monthly,1561.0
2011-10-01,Total number of dwelling units,Total Sectors,New,Houses,States and Territories,New South Wales,Monthly,1485.0
2011-11-01,Total number of dwelling units,Total Sectors,New,Houses,States and Territories,New South Wales,Monthly,1594.0
2011-12-01,Total number of dwelling units,Total Sectors,New,Houses,States and Territories,New South Wales,Monthly,1207.0
2012-01-01,Total number of dwelling units,Total Sectors,New,Houses,States and Territories,New South Wales,Monthly,982.0
2012-02-01,Total number of dwelling units,Total Sectors,New,Houses,States and Territories,New South Wales,Monthly,1243.0
2012-03-01,Total number of dwelling units,Total Sectors,New,Houses,States and Territories,New South Wales,Monthly,1327.0
2012-04-01,Total number of dwelling units,Total Sectors,New,Houses,States and Territories,New South Wales,Monthly,1057.0


In [12]:
nsw[['Value']].head()

Time,Value
2011-07-01,1511.0
2011-08-01,1634.0
2011-09-01,1561.0
2011-10-01,1485.0
2011-11-01,1594.0
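Because the file was read with `index_col=0` and no date parsing, the `Time` index is most likely still made of strings. A minimal sketch of turning the `Value` column into a monthly time series follows, assuming the dates are month starts as displayed above. The small frame rebuilds only the five rows shown; the variable names are illustrative.

```python
import pandas as pd

# The five NSW values displayed above, with a string index as read_csv would give.
values = pd.DataFrame(
    {"Value": [1511.0, 1634.0, 1561.0, 1485.0, 1594.0]},
    index=pd.Index(
        ["2011-07-01", "2011-08-01", "2011-09-01", "2011-10-01", "2011-11-01"],
        name="Time",
    ),
)

# Take the single column as a Series and parse the index into timestamps.
series = values["Value"].copy()
series.index = pd.to_datetime(series.index)

# Declare a month-start frequency; any missing month would appear as NaN.
series = series.asfreq("MS")
```

With a `DatetimeIndex` and an explicit frequency, date-based slicing and resampling become available, and gaps in the monthly data are easy to spot.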

